And in my condition, will Hibernate search be faster than search by HQL?
It depends on many things, like the analyzers you use, the size of your data set, the RDBMS you use, hardware, contention, … I can’t possibly answer that kind of question; you just have to try it and find out. I can only say that Lucene is quite fast in general, so performance issues are likely to be configuration issues.
That being said, your use case is not exactly the standard full-text use case: you’re looking for arbitrary sequences of characters, not for actual words. Full-text search (Hibernate Search + Lucene) will work, but that’s not what it’s best at.
So, in your case, to make things faster you probably do need ngrams. I hope your dataset isn’t too large, or that you have a lot of disk space for your indexes, because ngrams take a lot of space. That’s the trade-off: it may be faster, but it will require a lot of disk space.
I’ve never used non-edge ngrams myself, but I would advise the following.
When querying, you should not apply the same n-gram tokenizer. If you do, then when searching for “ab”, for example, Lucene would try to find any document that contains “a”, OR “b”, OR “ab”. That is three term queries instead of the one you want, i.e. just the documents that contain “ab”.
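To make that concrete, here is a rough plain-Java sketch (no Lucene involved) of what the two tokenization styles do to the search term “ab”. The class and method names are made up for illustration, and the gram sizes 1 and 2 are an assumption; your actual minGramSize/maxGramSize may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, only meant to illustrate the idea; it is not Lucene code.
class NGramQueryDemo {

    // Emits every substring of length minGram..maxGram, ordered by start position, then length.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // Keyword-style analysis: the whole input stays a single term.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        // n-gram tokenization at query time: three terms, OR-ed together
        System.out.println(ngrams("ab", 1, 2)); // prints [a, ab, b]
        // keyword tokenization at query time: exactly one term
        System.out.println(keyword("ab"));      // prints [ab]
    }
}
```

With the n-gram version, a document containing only “a” somewhere would match a search for “ab”, which is not what you want.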
Thus, when querying, you’ll probably want to apply an analyzer that has the same filters, but no tokenizer at all (i.e. that uses the KeywordTokenizerFactory, which does not tokenize at all). This way, you’ll get the correct behavior: the search term will have to match against one ngram.
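As a rough sketch, assuming Hibernate Search 5.x annotations, such a pair of analyzers could be declared along these lines. I don’t have your actual “customAnalyzer” definition in front of me, so the filter list (a single LowerCaseFilterFactory), the gram sizes and the id field are placeholders; keep whatever you already have and just mirror the filters in the second definition. The name “customAnalyzer_query” matches the steps that follow.

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.apache.lucene.analysis.core.KeywordTokenizerFactory;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.ngram.NGramTokenizerFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.AnalyzerDefs;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDefs({
    // Index-time analyzer: n-gram tokenizer. Gram sizes and filters are assumed values.
    @AnalyzerDef(name = "customAnalyzer",
        tokenizer = @TokenizerDef(factory = NGramTokenizerFactory.class, params = {
            @Parameter(name = "minGramSize", value = "1"),
            @Parameter(name = "maxGramSize", value = "8")
        }),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) }),
    // Query-time analyzer: same filters, but the search term is kept as one token.
    @AnalyzerDef(name = "customAnalyzer_query",
        tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
        filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class) })
})
@Entity
@Indexed
public class Driver {

    @Id
    private Long id; // placeholder identifier

    // The field still uses the n-gram analyzer for indexing.
    @Field(analyzer = @Analyzer(definition = "customAnalyzer"))
    private String remarks;
}
```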
So:
- Define an additional analyzer named “customAnalyzer_query”, which is the same as “customAnalyzer”, but with a KeywordTokenizerFactory instead of NGramTokenizerFactory. Do not remove the current analyzer definition (“customAnalyzer”) and leave it on the remarks field. The new analyzer is an addition, not a replacement.
- When you create the query builder, make sure to override the analyzer:
QueryBuilder qb = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Driver.class)
.overridesForField( "remarks", "customAnalyzer_query" )
.get();
- Build your query like this (do not ignore the analyzer):
Query query = qb.keyword().onField("remarks").matching(searchText).createQuery();
The reason you should not ignore the analyzer is that if some filters are not applied, for example the LowerCaseFilter, then your search terms may unexpectedly not match the ngrams. For example, a document containing “SOMEWORD” will be indexed with the ngram “someword” (lowercased), so a query for “SOMEWORD” that ignores the analyzer will not match, since the search term is not lowercased.
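Here is a small plain-Java simulation of that mismatch, again without Lucene. It assumes the index-time analyzer lowercases and produces ngrams of length 1 to 8 (both assumptions about your configuration), and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation of index-time vs. query-time analysis; not Lucene code.
class AnalyzerMismatchDemo {

    // Index-time analysis (assumed): lowercased n-grams of length minGram..maxGram.
    // Lucene tokenizes first and then lowercases each gram; the resulting terms are the same.
    static Set<String> indexTerms(String text, int minGram, int maxGram) {
        String lower = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= lower.length(); len++) {
                grams.add(lower.substring(start, start + len));
            }
        }
        return grams;
    }

    // Query-time analysis: same lowercasing, but the term is kept whole (keyword tokenizer).
    static String analyzedQueryTerm(String searchText) {
        return searchText.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTerms("SOMEWORD", 1, 8);

        // Analyzer applied to the search term: "someword" is one of the indexed ngrams.
        System.out.println(indexed.contains(analyzedQueryTerm("SOMEWORD"))); // prints true

        // Analyzer ignored: the raw term "SOMEWORD" is not in the index.
        System.out.println(indexed.contains("SOMEWORD")); // prints false
    }
}
```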