python - Group over Series of lists in pandas DataFrame
I have a dataframe with a list in each cell. For each row of the dataframe, I want to group on the first element of the lists and average the second element. Here is some dummy data and a printout of the df to illustrate the problem:
    import pandas as pd

    df = pd.DataFrame({"column a": [["winter 2012", 5], ["sommer 2012", 10]],
                       "column b": [["sommer 2012", 20], ["winter 2012", 10]],
                       "column c": [["winter 2012", 15], ["sommer 2012", 30]]})
    df

                column a           column b           column c
    0   [winter 2012, 5]  [sommer 2012, 20]  [winter 2012, 15]
    1  [sommer 2012, 10]  [winter 2012, 10]  [sommer 2012, 30]
The desired output should look like this:

                column d           column e
    0  [winter 2012, 10]  [sommer 2012, 20]
    1  [sommer 2012, 20]  [winter 2012, 10]
Being new to Python, I cannot wrap my head around how to approach this.
Here's one way:

    In [410]: df.apply(lambda x: pd.Series(
                  x.apply(pd.Series)
                   .groupby(0, as_index=False, sort=False)
                   .mean()
                   .values.tolist(),
                  index=['column d', 'column e']), axis=1)
    Out[410]:
                column d           column e
    0  [winter 2012, 10]  [sommer 2012, 20]
    1  [sommer 2012, 20]  [winter 2012, 10]
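For readers who find the nested apply hard to follow, the same row-wise grouping can be sketched in plain Python. This is not the answer's method, just an illustrative alternative: the helper name average_by_label is made up for this example, and the sketch assumes every cell is a [label, value] pair and that each row contains exactly two distinct labels, as in the dummy data. Averages come out as floats (10.0 rather than 10).

```python
import pandas as pd

df = pd.DataFrame({
    "column a": [["winter 2012", 5], ["sommer 2012", 10]],
    "column b": [["sommer 2012", 20], ["winter 2012", 10]],
    "column c": [["winter 2012", 15], ["sommer 2012", 30]],
})


def average_by_label(row):
    """Group the [label, value] pairs of one row by label and average the values.

    Hypothetical helper for illustration; labels keep their order of
    first appearance because dicts preserve insertion order.
    """
    groups = {}
    for label, value in row:
        groups.setdefault(label, []).append(value)
    return [[label, sum(values) / len(values)] for label, values in groups.items()]


# One list of [label, mean] pairs per original row.
averaged = [average_by_label(row) for row in df.itertuples(index=False)]

# Assumption: exactly two labels per row, so two output columns suffice.
result = pd.DataFrame({
    "column d": [pairs[0] for pairs in averaged],
    "column e": [pairs[1] for pairs in averaged],
})
print(result)
```

The explicit loop trades brevity for readability and avoids building an intermediate DataFrame for every row, which may matter on larger frames.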