python 3.x - How to compare columns in a pandas DataFrame
I have a pandas DataFrame that looks like this, with "word" as the header of all four columns:

        word    word    word    word
    0   nap     nap     nap     cat
    1   cat     cat     cat     flower
    2   peace   kick    kick    go
    3   phone   fin     fin     nap

How can I return the words that appear in all 4 columns?

Expected output:

       word
    0  nap
    1  cat
- Use apply(set) to turn each column into a set of words
- Use set.intersection to find the words that are in every column's set
- Turn the result into a list, and then into a Series
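The three steps above can be sketched on the question's data as follows. This is only a minimal sketch: the column labels `a` to `d` are an assumption made here so that each column can be selected unambiguously (the question labels all four columns `word`), step 1 is written as a list comprehension rather than `apply(set)`, and the result is sorted only because sets have no guaranteed order.

```python
import pandas as pd

# Sample data from the question; the labels a-d are hypothetical stand-ins
# for the four columns that are all called "word" in the original.
df = pd.DataFrame({
    "a": ["nap", "cat", "peace", "phone"],
    "b": ["nap", "cat", "kick", "fin"],
    "c": ["nap", "cat", "kick", "fin"],
    "d": ["cat", "flower", "go", "nap"],
})

# Step 1: one set of words per column.
column_sets = [set(df[col]) for col in df.columns]

# Step 2: words present in every column's set.
common = set.intersection(*column_sets)

# Step 3: back to a list, then a Series (sorted, since sets are unordered).
result = pd.Series(sorted(common))
print(result)
```

Here `result` holds `cat` and `nap`, the two words the question expects.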
 
    pd.Series(list(set.intersection(*df.apply(set))))

    0    cat
    1    nap
    dtype: object

We can accomplish the same task with some Python functional magic and get a performance benefit.

    # Note: df[c] selects by label, so this form assumes unique column labels.
    pd.Series(list(
        set.intersection(*map(set, map(lambda c: df[c].values.tolist(), df)))
    ))

    0    cat
    1    nap
    dtype: object

Timing
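Code below:

    from timeit import timeit

    import numpy as np
    import pandas as pd

    pir1 = lambda d: pd.Series(list(set.intersection(*d.apply(set))))
    pir2 = lambda d: pd.Series(list(set.intersection(*map(set, map(lambda c: d[c].values.tolist(), d)))))
    # I took some liberties with @Anton vBR's solution.
    vbr = lambda d: pd.Series((lambda x: x.index[x.values == len(d.columns)])(pd.Series(d.values.ravel()).value_counts()))

    results = pd.DataFrame(
        index=pd.Index([10, 30, 100, 300, 1000, 3000, 10000, 30000]),
        columns='pir1 pir2 vbr'.split()
    )

    # `words` is a sequence of distinct words; its definition is not shown in the post.
    for i in results.index:
        # Four columns of i words each, sampled without replacement.
        d = pd.concat(dict(enumerate(
            [pd.Series(np.random.choice(words[:i*2], i, False)) for _ in range(4)]
        )), axis=1)
        for j in results.columns:
            stmt = '{}(d)'.format(j)
            setp = 'from __main__ import d, {}'.format(j)
            # DataFrame.set_value was removed in pandas 1.0; .at is the replacement.
            results.at[i, j] = timeit(stmt, setp, number=100)

    # The frame holds object dtype, so cast to float before plotting.
    results.astype(float).plot(loglog=True)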
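The `vbr` variant in the timing code takes a different route from the set intersection: it flattens all cells, counts each word, and keeps the words whose count equals the number of columns. A minimal sketch of that idea on the question's data follows. It assumes, which the original does not state, that no word repeats within a single column; otherwise a word could reach the required count without appearing in every column. The labels `a` to `d` are again hypothetical.

```python
import pandas as pd

# Sample data from the question, with hypothetical unique labels a-d.
df = pd.DataFrame({
    "a": ["nap", "cat", "peace", "phone"],
    "b": ["nap", "cat", "kick", "fin"],
    "c": ["nap", "cat", "kick", "fin"],
    "d": ["cat", "flower", "go", "nap"],
})

# Flatten every cell into one Series and count occurrences of each word.
counts = pd.Series(df.to_numpy().ravel()).value_counts()

# Keep the words seen as many times as there are columns
# (valid only if a word never repeats inside one column).
in_all = sorted(counts[counts == len(df.columns)].index)
print(in_all)
```

The set-based versions do not need the no-repeats assumption, since building a set per column already discards duplicates.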