algorithm - Efficient approximation of most frequent items across multiple large item-sets


I would like to know whether there is an efficient algorithm that can approximate the top-k frequent items across many large item-sets.

For example, suppose I have 99 sets:

s1 = (i1, i2, i3, i5, i7, i9, ... , i230231)

s2 = (i2, i4, i7, i11, i17, i21, ... , i230231)

s3 = (i1, i5, i9, i12, i13, i55, ... , i334234)

...

s99 = (i2, i3, i4, i6, i22, i32, ... , i96574)

I want to find the most frequent items, e.g. i2, i1, i3, i7.

An underlying assumption is that any item I care about (i.e. that should be included in the top-k) has at least 10% frequency across the sets: if there are 100 sets, an item I care about should appear in at least 10 of them. I hope this assumption can make the problem easier.

A straightforward solution is to build a dictionary with items as keys and frequencies as values, then iterate through each item-set and increase the frequency count of each item. The problem is that this costs running time linear in the total size of the sets, ∑|si|, and the item-sets are large.
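For reference, here is a minimal sketch of that exact-counting baseline (the function name and the tiny data are my own, purely for illustration):

```python
from collections import Counter

def top_k_exact(sets, k):
    """Exact top-k by counting every item in every set.
    Runs in O(sum of |s_i|) time -- the cost we want to avoid."""
    counts = Counter()
    for s in sets:
        counts.update(s)  # each item in s adds 1 to its frequency
    return [item for item, _ in counts.most_common(k)]

# Tiny illustrative example (not the real item-sets):
sets = [
    {"i1", "i2", "i3"},
    {"i2", "i4", "i7"},
    {"i1", "i2", "i9"},
]
print(top_k_exact(sets, 2))  # i2 appears in 3 sets, i1 in 2
```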

I tried to find an approximation by randomly sampling pairs of sets and intersecting each pair, based on the idea that if an item is frequent enough, it should show up across multiple intersections. But this was not faster (if not slower).
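A sketch of that pair-intersection heuristic, for concreteness (`num_pairs` and `min_hits` are tuning knobs I made up for illustration; this is not a recommended method, just what I tried):

```python
import random

def frequent_by_pair_intersections(sets, num_pairs, min_hits):
    """Sample random pairs of sets, intersect each pair, and keep
    items that appear in at least min_hits intersections."""
    hits = {}
    for _ in range(num_pairs):
        a, b = random.sample(sets, 2)
        for item in a & b:  # set intersection, O(min(|a|, |b|))
            hits[item] = hits.get(item, 0) + 1
    return {item for item, c in hits.items() if c >= min_hits}
```

Note that each intersection still costs O(min(|a|, |b|)), so with large sets the total work is comparable to just counting, which matches my observation that it was not faster.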

I would greatly appreciate any ideas, and I am keen to read relevant research as well.

