I am using a whitespace tokenizer to tokenize my records into the index.
I have records containing postal codes, e.g. PL10 1AA and PL11 1AA.
Suppose I have just these two records for now. When I run a NOT query on the keyword PL10, it also excludes the record containing PL11, which is not correct for my requirements. The same happens with an exact-match search: searching for PL10 also returns records that contain PL11.
NOT query for PL10 1AA: the search is performed with the query -(fullRecordStandard:"(PL pl10) 1aa")
Please suggest how to achieve this, or which tokenizer is best suited for exact or partial matching without eliminating/returning such non-matching records.
Your help will be appreciated.
Neither the WhitespaceTokenizer nor the StandardTokenizer will help you in this situation,
because they split words on whitespace.
You have two possible approaches as workarounds for your problem:
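To illustrate why whitespace splitting causes the over-matching, here is a minimal plain-Java sketch of what a whitespace tokenizer effectively does (this is not actual Lucene code; the class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceTokenDemo {
    // Mimics whitespace tokenization: each space-separated chunk of the
    // record becomes its own searchable token (lowercased, as a
    // lowercase filter in the analyzer chain would do).
    static List<String> tokenize(String record) {
        return Arrays.asList(record.toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("PL10 1AA")); // [pl10, 1aa]
        System.out.println(tokenize("PL11 1AA")); // [pl11, 1aa]
        // Both records share the token "1aa", so a query on the full
        // postcode "PL10 1AA" can match (or, in a NOT query, exclude)
        // the PL11 record as well.
    }
}
```

The shared "1aa" token is exactly why the NOT query on the full postcode removes both records.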
1: Write your own analyzer that splits the data on spaces, but when you search for something, it matches on the basis of the first token: if that token matches, return the result; otherwise, do not.
This approach sounds good, but such an analyzer is quite hard to write.
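The matching rule that approach 1 describes can be sketched in plain Java like this (a hypothetical illustration of the rule only; a real implementation would be a custom Lucene Analyzer and query, and the names here are made up):

```java
public class FirstTokenMatch {
    // Approach 1 as a matching rule: split the record on whitespace,
    // but only count it as a hit when its FIRST token equals the
    // query term exactly (case-insensitively).
    static boolean matchesFirstToken(String record, String queryTerm) {
        String firstToken = record.trim().split("\\s+")[0];
        return firstToken.equalsIgnoreCase(queryTerm);
    }

    public static void main(String[] args) {
        System.out.println(matchesFirstToken("PL10 1AA", "PL10")); // true
        System.out.println(matchesFirstToken("PL11 1AA", "PL10")); // false
    }
}
```

Under this rule, a search for PL10 matches only the PL10 record, which is the behavior asked for in the question.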
2: Store the data without any spaces and return results using a wildcard/prefix query such as PL10*.
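Approach 2 can be sketched as follows (again a plain-Java illustration of the idea, not Lucene code; in Lucene you would index the normalized value and use a wildcard/prefix query):

```java
public class NoSpaceIndexDemo {
    // Approach 2: normalize the whole postcode into a single token
    // with no spaces, then match with a prefix (PL10*-style) test.
    static String normalize(String record) {
        return record.replaceAll("\\s+", "").toLowerCase();
    }

    static boolean prefixMatch(String record, String queryPrefix) {
        return normalize(record).startsWith(normalize(queryPrefix));
    }

    public static void main(String[] args) {
        System.out.println(prefixMatch("PL10 1AA", "PL10"));     // true
        System.out.println(prefixMatch("PL11 1AA", "PL10"));     // false
        System.out.println(prefixMatch("PL10 1AA", "PL10 1AA")); // true
    }
}
```

One caveat of prefix matching: a shorter query such as PL1* would still match both PL10 and PL11, so the user needs to type the full outward code for an unambiguous result.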
I agree with your advice; I don't have much knowledge of Lucene,
as I am a newbie to Hibernate Search and Lucene.
I am trying to learn and explore them as much as possible.
I have just run into a requirement that my current knowledge of tokenizers and analyzers cannot satisfy.
Please let me know where I am going wrong in achieving this requirement.
Yes, I tried a lot with the whitespace/standard tokenizers but failed to achieve this.
I have to achieve an exact match, meaning that if I search for PL10, I should only get back the records that contain PL10.
The same applies when I am eliminating records. The user might also search for a whole postcode, and in that case I also have to check for an exact match.
I just wanted to know what needs to go into the custom analyzer.