Python - nlargest on groupby with MultiIndex and multiple agg columns
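I am struggling to apply .nlargest() to grouped data in order to show the 10 largest gross revenue rows per index[0], i.e. per value of the first index level (new_category).
The grouped data is indexed by new_category and cdi_cus_nm, with one column per aggregate.
When I run:
    grp_data.nlargest(10, 'grossrevenue_gbp')
it doesn't seem to work for me. Full code snippet below: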
    import numpy as np
    from scipy import stats

    # 10% trimmed mean, used for the two "average" columns
    tmean = lambda x: stats.trim_mean(x, 0.1)

    data = data.loc[(data['yyyy'] == 2016) & (data['new_category_id'] != 0)]

    cols = ['grossrevenue_gbp', 'ordercount', '% rev', 'movc_gbp', 'average order size']
    grp_data = (data.groupby(['new_category', 'cdi_cus_nm'])[cols]
                    .aggregate({'grossrevenue_gbp': np.sum,
                                'ordercount': np.sum,
                                '% rev': np.sum,
                                'movc_gbp': tmean,
                                'average order size': tmean})
                    .nlargest(10, 'grossrevenue_gbp'))
    grp_data['country'] = 'eu'

    # sort by category, then by revenue rank within the whole frame
    key1 = grp_data.index.codes[0]  # spelled .labels in pandas before 0.24
    key2 = grp_data['grossrevenue_gbp'].rank(ascending=False)
    sorter = np.lexsort((key2, key1))
    grp_data = grp_data.take(sorter)
    grp_data = grp_data[['% rev', 'grossrevenue_gbp', 'movc_gbp',
                         'average order size', 'ordercount', 'country']]

I would appreciate any help.
Thanks,
I think you need to group by the first level of the MultiIndex and then apply the function nlargest within each group:
    grp_data = (data.groupby(['new_category', 'cdi_cus_nm'])
                    .aggregate({'grossrevenue_gbp': np.sum,
                                'ordercount': np.sum,
                                '% rev': np.sum,
                                'movc_gbp': tmean,
                                'average order size': tmean}))

    # group again on the first index level and keep the largest rows per group
    # (1 here; pass 10 to get the top ten per category)
    df = (grp_data.groupby('new_category')
                  .apply(lambda x: x.nlargest(1, 'grossrevenue_gbp'))
                  .reset_index(level=0, drop=True))
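
To see why the second groupby matters, here is a minimal sketch on made-up data. The rows and values are hypothetical; only the column names come from the question. Calling nlargest directly on the aggregated frame ranks all rows together, so one category can take every slot. Grouping on the first index level first ranks within each category. The sketch passes group_keys=False, which I am using in place of the reset_index(level=0, drop=True) step above, so the group key is not added to the index a second time.

```python
import pandas as pd

# Hypothetical toy data standing in for the real order table.
data = pd.DataFrame({
    "new_category": ["a", "a", "a", "a", "b", "b", "b"],
    "cdi_cus_nm": ["c1", "c1", "c2", "c3", "c4", "c5", "c5"],
    "grossrevenue_gbp": [10.0, 5.0, 30.0, 20.0, 7.0, 1.0, 2.0],
    "ordercount": [1, 1, 2, 1, 1, 1, 1],
})

# First aggregation: one row per (category, customer) pair.
grp_data = data.groupby(["new_category", "cdi_cus_nm"]).agg(
    {"grossrevenue_gbp": "sum", "ordercount": "sum"}
)

# nlargest on the whole frame: top 2 rows overall, ignoring categories.
top_overall = grp_data.nlargest(2, "grossrevenue_gbp")

# nlargest inside each category: top 2 rows per first index level.
top_per_category = grp_data.groupby(level="new_category", group_keys=False).apply(
    lambda x: x.nlargest(2, "grossrevenue_gbp")
)

print(top_overall)
print(top_per_category)
```

With this toy data the overall top 2 contains only category "a" rows, while the per-category result keeps rows from both "a" and "b".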