python - Group over Series of lists in pandas DataFrame

I have a DataFrame with a list in each cell. For each row of the DataFrame, I want to group on the first element of the lists and average the second element. Here is some dummy data and a printout of the df to illustrate my problem:

    import pandas as pd

    df = pd.DataFrame({"column a": [["winter 2012", 5], ["sommer 2012", 10]],
                       "column b": [["sommer 2012", 20], ["winter 2012", 10]],
                       "column c": [["winter 2012", 15], ["sommer 2012", 30]]})
    df

                column a           column b           column c
    0   [winter 2012, 5]  [sommer 2012, 20]  [winter 2012, 15]
    1  [sommer 2012, 10]  [winter 2012, 10]  [sommer 2012, 30]

The desired output should be this (in the first row, for example, the two winter 2012 values 5 and 15 average to 10):

                column d           column e
    0  [winter 2012, 10]  [sommer 2012, 20]
    1  [sommer 2012, 20]  [winter 2012, 10]

Being new to Python, I cannot wrap my head around how to approach this.

Here's one way:

    In [410]: df.apply(lambda x: pd.Series(
                       x.apply(pd.Series)
                        .groupby(0, as_index=False, sort=False)
                        .mean()
                        .values.tolist(), index=['column d', 'column e']),
                       axis=1)
    Out[410]:
                column d           column e
    0  [winter 2012, 10]  [sommer 2012, 20]
    1  [sommer 2012, 20]  [winter 2012, 10]
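The one-liner above packs several steps into a single expression. For anyone new to Python, the same grouping can be sketched more explicitly with a plain dictionary for each row. This is only an illustrative alternative, not the answerer's method: the names `average_by_label`, `rows`, and `result` are made up here, and the sketch assumes that every row produces exactly two groups, as in the dummy data.

```python
import pandas as pd

df = pd.DataFrame({"column a": [["winter 2012", 5], ["sommer 2012", 10]],
                   "column b": [["sommer 2012", 20], ["winter 2012", 10]],
                   "column c": [["winter 2012", 15], ["sommer 2012", 30]]})


def average_by_label(row):
    """Group one row's [label, value] pairs by label and average the values.

    Hypothetical helper for illustration; groups keep first-seen order
    because Python dicts preserve insertion order.
    """
    groups = {}  # label -> list of values seen for that label
    for label, value in row:
        groups.setdefault(label, []).append(value)
    return [[label, sum(values) / len(values)]
            for label, values in groups.items()]


# One list of [label, mean] pairs per original row.
rows = [average_by_label(row) for row in df.itertuples(index=False)]

# Assumption: each row yields exactly two groups, so two output columns suffice.
result = pd.DataFrame({"column d": [r[0] for r in rows],
                       "column e": [r[1] for r in rows]},
                      index=df.index)
print(result)
```

Because the averages come from true division, the values in this sketch are floats (10.0 rather than 10). If rows can contain a varying number of distinct labels, the fixed two-column layout would need to be rethought.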
