python 3.x - How to compare columns in a pandas dataframe


I have a pandas DataFrame that looks like this, with "word" as the column header for all columns:

       word    word    word    word
    0  nap     nap     nap     cat
    1  cat     cat     cat     flower
    2  peace   kick    kick    go
    3  phone   fin     fin     nap

How can I return the words that appear in all 4 columns?

Expected output:

      word
    0 nap
    1 cat

  • Use apply(set) to turn each column into a set of words
  • Use set.intersection to find the words that are in every column's set
  • Turn the result into a list, and then into a Series

    pd.Series(list(set.intersection(*df.apply(set))))

    0    cat
    1    nap
    dtype: object
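The three steps above can be sketched end to end on the question's data. This is a minimal sketch, not the answer's exact one-liner: it takes the columns by position with `iloc` (an assumption made here because all four labels are "word") rather than relying on how `DataFrame.apply` wraps set results, which may differ between pandas versions. The names `rows`, `column_sets`, `common`, and `result` are illustrative.

```python
import pandas as pd

# Rebuild the question's frame; every column is labelled "word".
rows = [
    ["nap", "nap", "nap", "cat"],
    ["cat", "cat", "cat", "flower"],
    ["peace", "kick", "kick", "go"],
    ["phone", "fin", "fin", "nap"],
]
df = pd.DataFrame(rows, columns=["word"] * 4)

# Step 1: one set per column (columns taken by position, since labels repeat).
column_sets = [set(df.iloc[:, k]) for k in range(df.shape[1])]

# Step 2: words present in every column's set.
common = set.intersection(*column_sets)

# Step 3: back to a Series. Sets are unordered, so sort for a stable result.
result = pd.Series(sorted(common), name="word")
print(result)
```

Sorting is an addition for reproducibility; without it the order of the words in the resulting Series is not guaranteed.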

We can accomplish the same task with some Python functional magic and get a performance benefit.

    pd.Series(list(
        set.intersection(*map(set, map(lambda c: df[c].values.tolist(), df)))
    ))

    0    cat
    1    nap
    dtype: object
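A small runnable sketch of this list-based variant follows. It assumes unique column labels (the hypothetical names `word_a` through `word_d`), because `df[c]` returns a whole DataFrame when a label is repeated, and its rows would then come back as unhashable lists.

```python
import pandas as pd

# Same words as the question, but with unique (hypothetical) column labels.
df = pd.DataFrame({
    "word_a": ["nap", "cat", "peace", "phone"],
    "word_b": ["nap", "cat", "kick", "fin"],
    "word_c": ["nap", "cat", "kick", "fin"],
    "word_d": ["cat", "flower", "go", "nap"],
})

# Iterating a DataFrame yields its column labels; each column is converted
# to a plain Python list first, then to a set.
column_lists = map(lambda c: df[c].tolist(), df)
common = set.intersection(*map(set, column_lists))

# Sort so the output order does not depend on set iteration order.
result = pd.Series(sorted(common))
print(result)
```

Converting to plain lists before building the sets keeps the work in ordinary Python objects, which is the likely source of the speedup the answer reports.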

Timing
The code used to produce the timings is below.


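    import numpy as np
    import pandas as pd
    from timeit import timeit

    # NOTE: `words` is a sequence of distinct words defined elsewhere;
    # its definition is not shown in the original post.

    pir1 = lambda d: pd.Series(list(set.intersection(*d.apply(set))))
    pir2 = lambda d: pd.Series(list(set.intersection(*map(set, map(lambda c: d[c].values.tolist(), d)))))
    # took liberties with @Anton vbr's solution.
    vbr = lambda d: pd.Series((lambda x: x.index[x.values == len(d.columns)])(pd.value_counts(d.values.ravel())))

    # One row per sample size, one column per solution.
    results = pd.DataFrame(
        index=pd.Index([10, 30, 100, 300, 1000, 3000, 10000, 30000]),
        columns='pir1 pir2 vbr'.split()
    )

    for i in results.index:
        # Four columns of i words each, sampled without replacement
        # from the first 2*i words.
        d = pd.concat(dict(enumerate(
            [pd.Series(np.random.choice(words[:i*2], i, False)) for _ in range(4)]
        )), axis=1)
        for j in results.columns:
            stmt = '{}(d)'.format(j)
            setp = 'from __main__ import d, {}'.format(j)
            # .at replaces DataFrame.set_value, which was removed in pandas 1.0
            results.at[i, j] = timeit(stmt, setp, number=100)

    results.plot(loglog=True)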
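Since `words` is not defined in the post, here is a reduced, self-contained sketch of the same kind of comparison. It makes several assumptions: a synthetic vocabulary (`vocab`), a single sample size (`n = 1000`), and two illustrative helper functions, `via_sets` and `via_counts`, that follow the set-intersection and value-counting ideas above without being the original lambdas. The counting approach is only valid when no word repeats within a column, which sampling without replacement guarantees.

```python
import timeit

import numpy as np
import pandas as pd


def via_sets(d):
    """Intersect one set per column, taking columns by position."""
    return set.intersection(
        *(set(d.iloc[:, k].tolist()) for k in range(d.shape[1]))
    )


def via_counts(d):
    """Keep values whose total count equals the number of columns.

    Only valid when no value repeats within a single column.
    """
    counts = pd.Series(d.to_numpy().ravel()).value_counts()
    return set(counts.index[counts.to_numpy() == d.shape[1]])


rng = np.random.default_rng(0)
n = 1000
vocab = np.array([f"w{k}" for k in range(2 * n)])  # hypothetical vocabulary

# Four columns, each sampling n distinct words without replacement.
d = pd.DataFrame({k: rng.choice(vocab, n, replace=False) for k in range(4)})

# Both approaches should agree before their timings are compared.
assert via_sets(d) == via_counts(d)

timings = {
    f.__name__: timeit.timeit(lambda f=f: f(d), number=20)
    for f in (via_sets, via_counts)
}
print(timings)
```

The absolute numbers depend on the machine and the pandas version, so they are best read as a relative comparison only.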
