mysql - Is there an optimal/efficient way to compare multiple rows for similarity/difference? -
i trying compare list of values - objective work out total difference/similarity between items, , grade items how similar each other along several criteria, measured swing default (0).
the criteria value ever varies between -8 , +8 (or null if not specified). example:
colour: flavour: temperature: item a: -4 2 5 item b: 3 0 3 item c: 2 -3 1
so work out difference between items , c: colour (2 - -4) = -6, flavour (-3 - 2) = -5, , temperature (1 - 5)=-4. -6 + -5 + -4 = gives score of 15. assume abs() used standardise differences before getting item total / score. i'll simplify problem bit leaving off of maths - main issue here score, , number of criteria matched, each row.
please give problem thought before looking @ current solution, avoid preconceptions: http://sqlfiddle.com/#!9/11c0f/1
is there nicer way of doing it, i'm ignorant of? potentially there around 150 criteria, , few hundred items/rows needing displayed , sorted similarity each other, quite bit of crunching.
caching results of these expensive queries given. question more difference/similarity calculation - there deep sql magic actual comparison more efficiently using pure arithmetic? data stored way ease of processing if need be.
there catch - not items have criteria, field null if none applicable.
i've deliberately omitted sql body of question avoid precolouring viewers opinions. i'm using mysql if system offers efficient way handle entirely changeable.
thanks in advance experience , insights.
consider encapsulating similarity calculation function takes 2 arguments. assuming function named similarity()
possible solution be:
select item_1, item_2, sum(sim) similarity ( select a.item_name item_1, b.item_name item_2, similarity(a.criteria_value, b.criteria_value) sim ourdata inner join ourdata b on b.item_name > a.item_name , b.attribute_name = a.attribute_name ) x group item_1, item_2;
if appropriate, run on items_attributes
table instead of ourdata
table--it's not clear me why have different numbers of rows.
Comments
Post a Comment