elasticsearch - How to give tokens from certain tokenizers more weight?
I have the following (simplified) data:
[
  { "id": 1, "customernumber": "0008", "name": "bob" },
  { "id": 2, "customernumber": "0854", "name": "sue" },
  { "id": 3, "customernumber": "0041", "name": "larry" }
]
The context is an auto-complete search bar at the top of the application.
I'm using a custom regex tokenizer to trim leading zeros so that users don't need to enter them. That gets me these tokens (a rough sketch of the trimming step follows the list below):
id 1 => "8"
id 2 => "854"
id 3 => "41"
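However the trimming step is implemented exactly, it boils down to a pattern_replace rule along these lines. This is only a hedged sketch using _analyze with an ad-hoc token filter, not my actual configuration, and the regex is illustrative:

POST _analyze
{
  "tokenizer": "keyword",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "^0+",
      "replacement": ""
    }
  ],
  "text": "0008"
}

This returns the single token "8"; with "0854" as the text it returns "854".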
I also have an edge n-gram step applied, which gives me these tokens (see the settings sketch after the list):
id 1 => "8"
id 2 => "854", "85", "8"
id 3 => "41", "4"
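Put together, the index-time analysis I'm describing looks roughly like the sketch below. The index name, the analyzer/filter names, and the gram sizes are illustrative assumptions, not my literal settings:

PUT customers
{
  "settings": {
    "analysis": {
      "filter": {
        "trim_leading_zeros": {
          "type": "pattern_replace",
          "pattern": "^0+",
          "replacement": ""
        },
        "customernumber_edge_ngrams": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "customernumber_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "trim_leading_zeros", "customernumber_edge_ngrams" ]
        }
      }
    }
  }
}

With that analyzer, "0008" is indexed as just "8" and "0854" as "854", "85", "8", matching the token lists above.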
Our users consider "0008" a better match for the query "8" than "0854". But when they search for "8" they get tons of results, with the "08**" numbers ranking higher than "0008".
How can I make "0008" rank higher than "0854" when searching for "8"?
- Sometimes users do include the leading zeros in their query.
- I think the problem is that both id 1 and id 2 tokenize down to a single "8", so from that point on they are scored equally; I don't know how to remedy that (see the _analyze sketch below).
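This is easy to check with _analyze against the analyzer sketched above (again, the index and analyzer names are my placeholders):

POST customers/_analyze
{
  "analyzer": "customernumber_analyzer",
  "text": [ "0008", "0854" ]
}

The response contains the term "8" for both inputs, so for the query "8" the two documents look like equally good matches.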
The query:
POST _search
{
  "size": 24,
  "from": 0,
  "query": {
    "multi_match": {
      "query": "8",
      "fields": [ "customernumber", "name" ],
      "type": "best_fields"
    }
  }
}
I ended up achieving the desired ranking with two changes: the "leading zeros trimmer" went from a token filter to a character filter, and the "edge n-gram token filter" was replaced with an "edge n-gram tokenizer".
Together, these two changes gave me the result I wanted.
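For reference, the shape of the settings after those two changes is roughly as follows. This is only a sketch: the index, analyzer, and component names plus the gram sizes are illustrative, and the mapping is reduced to the two fields from the example data:

PUT customers
{
  "settings": {
    "analysis": {
      "char_filter": {
        "trim_leading_zeros": {
          "type": "pattern_replace",
          "pattern": "^0+",
          "replacement": ""
        }
      },
      "tokenizer": {
        "customernumber_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "customernumber_analyzer": {
          "type": "custom",
          "char_filter": [ "trim_leading_zeros" ],
          "tokenizer": "customernumber_edge_ngram"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "customernumber": {
        "type": "text",
        "analyzer": "customernumber_analyzer"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

With the character filter running before tokenization, "0008" is indexed as the single token "8", while "0854" is indexed as "8", "85", "854".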