python - Create new columns by grouping and aggregating multicolumns in pandas -


i have dataframe 50 columns, of them period_start_time, id, speed_throughput, etc. dataframe sample:

    id     period_start_time         speed_througput    ... 0    1     2017-06-14 20:00:00              6 1    1     2017-06-14 20:00:00              10 2    1     2017-06-14 21:00:00              2 3    1     2017-06-14 21:00:00              5 4    2     2017-06-14 20:00:00              8 5    2     2017-06-14 20:00:00              12 ... 

i have tried go create 2 new columns grouping 2 columns(id , period_start_time) , find avg , min of speed_trhoughput. code i've tried:

df['throughput_avg']=df.sort_values(['period_start_time'],ascending=false).groupby(['period_start_time','id'])[['speed_trhoughput']].max() df['throughput_min'] = df.groupby(['period_start_time', 'id'])[['speed_trhoughput']].min() 

as can see, there 2 ways i've tried, nothing works. error message received both attempts:

 typeerror:incompatible index of inserted column frame index 

i suppose know output needs be, there no need post it.

option 1
use agg in groupby , join attach main dataframe

df.join(     df.groupby(['id', 'period_start_time']).speed_througput.agg(         ['mean', 'min']     ).rename(columns={'mean': 'avg'}).add_prefix('throughput_'),     on=['id', 'period_start_time'] )     id    period_start_time  speed_througput  throughput_avg  throughput_min 0   1  2017-06-14 20:00:00                6             8.0               6 1   1  2017-06-14 20:00:00               10             8.0               6 2   1  2017-06-14 21:00:00                2             3.5               2 3   1  2017-06-14 21:00:00                5             3.5               2 4   2  2017-06-14 20:00:00                8            10.0               8 5   2  2017-06-14 20:00:00               12            10.0               8 

option 2
use transform in groupby context , use assign add new columns

g = df.groupby(['id', 'period_start_time']).speed_througput.transform df.assign(throughput_avg=g('mean'), throughput_min=g('min'))     id    period_start_time  speed_througput  throughput_avg  throughput_min 0   1  2017-06-14 20:00:00                6             8.0               6 1   1  2017-06-14 20:00:00               10             8.0               6 2   1  2017-06-14 21:00:00                2             3.5               2 3   1  2017-06-14 21:00:00                5             3.5               2 4   2  2017-06-14 20:00:00                8            10.0               8 5   2  2017-06-14 20:00:00               12            10.0               8 

Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -