elasticsearch - How to give tokens from certain tokenizers more weight?


I have the following (simplified) data:

[
    { id: 1, customernumber: "0008", name: "bob" },
    { id: 2, customernumber: "0854", name: "sue" },
    { id: 3, customernumber: "0041", name: "larry" }
]

The context is the auto-complete search bar at the top of the application.

I'm using a custom regex tokenizer to trim the leading zeros, so the user need not enter them. That gets me the tokens:

id 1 => "8" id 2 => "854" id 3 => "41" 

On top of that I have an edge n-gram token filter applied, which gives me the tokens:

id 1 => "8" id 2 => "854", "85", "8" id 3 => "41", "4" 

Our users consider "0008" a better match for the query "8" than "0854". When searching for "8" I'm getting tons of results where "08**" values rank higher than "0008".

How can I make "0008" rank higher than "0854" when searching for "8"?

  • Sometimes users include the leading zeros in their query.
  • I think the problem is that both id 1 and id 2 tokenize down to a single "8", so from that point on they are equal. I don't know how to remedy this.

The query:

POST _search
{
    "size": 24,
    "from": 0,
    "query": {
        "multi_match": {
            "query": "8",
            "fields": [
                "customernumber",
                "name"
            ],
            "type": "best_fields"
        }
    }
}

I ended up achieving the desired result by changing the "leading zeros trimmer" from a token filter to a character filter.
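The trimming is now done by a pattern_replace character filter along these lines (the filter name and the exact pattern here are illustrative):

"char_filter": {
    "trim_leading_zeros": {
        "type": "pattern_replace",
        "pattern": "^0+",
        "replacement": ""
    }
}

Since a character filter runs before the tokenizer, the same analyzer applied at search time also strips leading zeros when users include them in the query.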

I also changed from using an "edge n-gram token filter" to using an "edge n-gram tokenizer" instead.
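The edge n-grams are now produced by the tokenizer itself, roughly like this (gram sizes and token_chars are illustrative):

"tokenizer": {
    "autocomplete_edge": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 10,
        "token_chars": ["letter", "digit"]
    }
}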

These two changes gave me the desired result.
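Put together, the relevant index settings look roughly like this (a simplified sketch with illustrative names and index name, not the full production mapping):

PUT customers
{
    "settings": {
        "analysis": {
            "char_filter": {
                "trim_leading_zeros": {
                    "type": "pattern_replace",
                    "pattern": "^0+",
                    "replacement": ""
                }
            },
            "tokenizer": {
                "autocomplete_edge": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"]
                }
            },
            "analyzer": {
                "customernumber_autocomplete": {
                    "type": "custom",
                    "char_filter": ["trim_leading_zeros"],
                    "tokenizer": "autocomplete_edge"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "customernumber": {
                "type": "text",
                "analyzer": "customernumber_autocomplete"
            },
            "name": {
                "type": "text"
            }
        }
    }
}

My best guess at why the ranking improved: with the edge_ngram tokenizer each gram is a real token, so "0854" now contributes three tokens ("8", "85", "854") to the field length while "0008" contributes only one ("8"), and length normalization favors the shorter, exact-length match for the query "8".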

