MongoDB - Searching with Precedence on Array Order
My gut feeling is that the answer is no, but is it possible to perform a search in MongoDB that compares the similarity of arrays where order is important?
E.g. I have 3 documents like so:
{'_id':1, "my_list": ["a",2,6,8,34,90]},
{'_id':2, "my_list": ["a","f",2,6,19,8,90,55]},
{'_id':3, "my_list": [90,34,8,6,3,"a"]}
Documents 1 and 2 are similar, while 3 is wildly different, irrespective of the fact that it contains many of the same values as 1.
Ideally, a search similar to {"my_list" : ["a",2,6,8,34,90] } would return documents 1 and 2.
It's almost like a regex search with wildcards. I know I can do this in Python easily enough, but speed is important and I'm dealing with 1.3 million documents.
Any "comparison" or "selection" is more or less subjective to the actual logic applied. As a general principle, you could consider the product of the matched indices from the array to test against and the array present in the document. For example:
var sample = ["a",2,6,8,34,90];

db.getCollection('source').aggregate([
  { "$match": { "my_list": { "$in": sample } } },
  { "$addFields": {
    "score": {
      "$add": [
        // bonus of 100 when every sample value is present in the array
        { "$cond": {
          "if": {
            "$eq": [
              { "$size": { "$setIntersection": [ "$my_list", sample ] }},
              { "$size": { "$literal": sample } }
            ]
          },
          "then": 100,
          "else": 0
        }},
        // sum of products of the reversed indices in both arrays
        { "$sum": {
          "$map": {
            "input": "$my_list",
            "as": "ml",
            "in": {
              "$multiply": [
                { "$indexOfArray": [ { "$reverseArray": "$my_list" }, "$$ml" ]},
                { "$indexOfArray": [ { "$reverseArray": { "$literal": sample } }, "$$ml" ]}
              ]
            }
          }
        }}
      ]
    }
  }},
  { "$sort": { "score": -1 } }
])
This would return the documents in an order like this:
/* 1 */
{ "_id" : 1.0, "my_list" : [ "a", 2, 6, 8, 34, 90 ], "score" : 155.0 }
/* 2 */
{ "_id" : 2.0, "my_list" : [ "a", "f", 2, 6, 19, 8, 90, 55 ], "score" : 62.0 }
/* 3 */
{ "_id" : 3.0, "my_list" : [ 90, 34, 8, 6, 3, "a" ], "score" : 15.0 }
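The key is that when $reverseArray is applied, the values returned by $indexOfArray are "larger" for elements nearer the start of the array, because the index is counted from "last to first" (reversed). This gives a larger "weight" to matches at the beginning of the array, with the weight shrinking as the match moves towards the end.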
Of course, you should take into consideration that the second document does in fact contain "most" of the matches, and that having more array entries places a "larger" weight on its initial matches than on those in the first document.
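To see the weighting outside of the database, here is a minimal plain JavaScript sketch of what the $reverseArray and $indexOfArray combination computes. The helper names reversedIndex and positionalScore are hypothetical and exist only for illustration, and the sketch assumes the array values are primitives (strings and numbers).

```javascript
// Index of `value` in the reversed array, or -1 when it is absent
// (the same result $indexOfArray gives over $reverseArray for primitives).
function reversedIndex(arr, value) {
  return arr.slice().reverse().indexOf(value);
}

// Sum of (weight in the document) * (weight in the sample)
// over every element of the document array.
function positionalScore(docList, sample) {
  return docList.reduce(
    (sum, v) => sum + reversedIndex(docList, v) * reversedIndex(sample, v),
    0
  );
}

const sample = ["a", 2, 6, 8, 34, 90];
console.log(reversedIndex(sample, "a")); // 5: the first element carries the largest weight
console.log(reversedIndex(sample, 90));  // 0: the last element carries none
console.log(positionalScore(["a", 2, 6, 8, 34, 90], sample)); // 55
console.log(positionalScore([90, 34, 8, 6, 3, "a"], sample)); // 15
```

The 55 for the first document is the positional part only; the remaining 100 of its score of 155 comes from the separate "exact match" term in the pipeline.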
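From the above, "a" scores more in the second document than in the first (7 * 5 = 35 against 5 * 5 = 25) because the array is longer, even though both matched "a" in the first position. There is also the effect that "f" is a mismatch: $indexOfArray returns -1 for it against the sample, so it has a greater negative effect than it would if it were later in the array. The same applies to "a" in the last document, where a match at the end of the array has little bearing on the overall weight.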
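The per-element effect described above can be checked with a short plain JavaScript sketch. The function name contributions is hypothetical, and the sketch assumes primitive array values.

```javascript
// Contribution of each document element to the positional sum:
// reversed index in the document times reversed index in the sample
// (-1 when the sample does not contain the value).
function contributions(docList, sample) {
  const rev = (arr, v) => arr.slice().reverse().indexOf(v);
  return docList.map((v) => ({ value: v, points: rev(docList, v) * rev(sample, v) }));
}

const sample = ["a", 2, 6, 8, 34, 90];
const doc1 = contributions(["a", 2, 6, 8, 34, 90], sample);
const doc2 = contributions(["a", "f", 2, 6, 19, 8, 90, 55], sample);
const doc3 = contributions([90, 34, 8, 6, 3, "a"], sample);

console.log(doc1[0].points); // 25: "a" first in a 6-element array
console.log(doc2[0].points); // 35: "a" first in an 8-element array
console.log(doc2[1].points); // -6: "f" is missing from the sample, early in the array
console.log(doc3[5].points); // 0: "a" matched, but at the very end
```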
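The counter to this is to add logic that considers the "exact match" case, such as the comparison here between the $size of the $setIntersection of the sample and the current array, and the $size of the sample itself. This adjusts the scores to ensure that a document matching all provided elements scores higher than a document with fewer positional matches but more elements overall.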
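As a rough illustration of that "exact match" term on its own, the following plain JavaScript sketch mirrors the $setIntersection and $size comparison. The name exactMatchBonus is hypothetical, the value 100 is simply the constant used in the pipeline, and primitive array values are assumed.

```javascript
// Returns 100 when the number of distinct values shared by both arrays
// equals the length of the sample, otherwise 0.
function exactMatchBonus(docList, sample) {
  const docSet = new Set(docList);
  const shared = new Set(sample.filter((v) => docSet.has(v)));
  return shared.size === sample.length ? 100 : 0;
}

const sample = ["a", 2, 6, 8, 34, 90];
console.log(exactMatchBonus(["a", 2, 6, 8, 34, 90], sample));           // 100
console.log(exactMatchBonus(["a", "f", 2, 6, 19, 8, 90, 55], sample));  // 0 (34 is missing)
console.log(exactMatchBonus([90, 34, 8, 6, 3, "a"], sample));           // 0 (2 is missing)
console.log(exactMatchBonus([90, 34, 8, 6, 2, "a"], sample));           // 100
```

Note that this term ignores order: as the last call shows, any array containing every sample value receives the bonus, so ordering is expressed only through the positional sum.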
With a "score" in place you can then filter the results (e.g. with $limit) or apply whatever other logic is needed in order to return only the results actually wanted. The first step is calculating a "score" to work from.
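One possible way to do that filtering is to append a threshold and a cap after the "$sort" stage. This is only a sketch: the helper name topMatchStages, the threshold of 50 and the cap of 10 are assumptions chosen for illustration, not values from the answer.

```javascript
// Hypothetical helper: builds stages to append after the "$sort" stage.
function topMatchStages(minScore, maxResults) {
  return [
    { $match: { score: { $gte: minScore } } }, // drop weakly matching documents
    { $limit: maxResults }                     // keep only the best few
  ];
}

const stages = topMatchStages(50, 10);
console.log(JSON.stringify(stages));

// In the mongo shell the stages would be appended to the scoring pipeline,
// where `scoringPipeline` is a hypothetical variable holding that array:
//   db.getCollection('source').aggregate(scoringPipeline.concat(stages))
```

With the scores shown earlier (155, 62 and 15), a threshold of 50 would keep documents 1 and 2 and drop document 3, which is the result the question asked for. A suitable threshold depends on the array lengths in the real data.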
So it's all subjective as to what logic actually means "nearest match", but the $reverseArray and $indexOfArray operations are key to putting "more weight" on earlier index matches rather than the last ones.
Overall, you are looking for a "calculation" of logic. The aggregation framework has a number of available operators, and which ones apply depends on your end implementation. I'm just showing something that "logically works" to put more weight on "earlier matches" in an array comparison rather than "later matches", and of course the "most weight" where the arrays are the same.
Note: Similar logic could be achieved using the includeArrayIndex option of $unwind in an earlier version of MongoDB without the main operators used above. However, that process requires using $unwind to deconstruct the arrays in the first place, and the performance hit this would incur would probably negate the effectiveness of the operation.
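For reference, this is roughly what the $unwind stage with includeArrayIndex looks like, together with a plain JavaScript emulation of the rows it would emit for one document. The field name "pos" and the function unwindWithIndex are hypothetical; the rest of such a pipeline (regrouping and scoring) is not shown.

```javascript
// Stage as it would appear in a pipeline; "pos" receives the array index.
const unwindStage = { $unwind: { path: "$my_list", includeArrayIndex: "pos" } };

// Emulation of the stage output: one row per array element,
// with the element replacing the array and its index stored in "pos".
function unwindWithIndex(doc) {
  return doc.my_list.map((value, pos) => ({ _id: doc._id, my_list: value, pos: pos }));
}

const rows = unwindWithIndex({ _id: 3, my_list: [90, 34, 8, 6, 3, "a"] });
console.log(rows.length);  // 6 rows from a single document
console.log(rows[5].pos);  // 5: "a" sits at the last index
```

Each document becomes one row per element, which is where the performance cost comes from. For arrays without repeated values, the reversed weight used earlier would then be the array length minus 1 minus pos.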