python - Aggregating over multiple categories with sub results -
this seems simple enough problem, couldn’t find on it.
have dataframe:
df = pd.dataframe({'x':[1,3,1,3,1,5,1,5], 'cat1':['a','a','b','b','a','a','b','b'], 'cat2':['a','a','a','a','b','b','b','b']}) cat1 cat2 x 0 1 1 3 2 b 1 3 b 3 4 b 1 5 b 5 6 b b 1 7 b b 5
where cat1 , cat 2 categories, , want aggregate x on them using aggregate function (sum, weighted average or defined). obvious solution, like
res = df.groupby(['cat1','cat2']).sum()
results in
x cat1 cat2 4 b 6 b 4 b 6
but want this:
cat1 cat2 x 0 4 1 b 6 2 10 3 b 4 4 b b 6 5 b 10 6 8 7 b 12 8 20
one way go this:
res2 = df.groupby(['cat1','cat2']).sum().reset_index() res1a = df.groupby(['cat1']).sum().reset_index() res1b = df.groupby(['cat2']).sum().reset_index() res1a['cat2'] = 'any' res1b['cat1'] = 'any' res = res1a.append(res1b).append(res2).set_index(['cat1','cat2'])
but since actual task involves 10 categories, creating separate groupbys various category subsets , appending not convenient. hope there better way go
you can use unstack
reshape, add new column , new row any
sum
, last reshape stack
:
#add `['x']` series output res = df.groupby(['cat1','cat2'])['x'].sum() print (res) cat1 cat2 4 b 6 b 4 b 6 name: x, dtype: int64 = res.unstack() a['any'] = a.sum(axis=1) a.loc['any'] = a.sum() print (a) cat2 b cat1 4 6 10 b 4 6 10 8 12 20 df1 = a.stack() print (df1) cat1 cat2 4 b 6 10 b 4 b 6 10 8 b 12 20 dtype: int64
edit:
res2 = df.groupby(['cat1','cat2']).sum() res1a = df.groupby(['cat1']).sum() res1b = df.groupby(['cat2']).sum() res1a = res1a.set_index(pd.series(['any'] * len(res1a), name='cat2'), append=true) res1b=res1b.set_index(pd.series(['any'] * len(res1b), name='cat1'), append=true).swaplevel() res1ab=pd.dataframe(res1a.sum().values, index=pd.multiindex.from_arrays([['any'],['any']]), columns=['x']) res2 = pd.concat([res2, res1a, res1b, res1ab]) print (res2) x cat1 cat2 4 b 6 b 4 b 6 10 b 10 8 b 12 20
Comments
Post a Comment