python - Pandas if/then aggregation
I've been searching and haven't figured this out yet. Hoping someone can aid a Python newb in solving this problem.
I'm trying to figure out how to write an if/then statement in Python and perform an aggregation based on that if/then statement. The end goal: if the date is 1/7/2017, use the value in the "fake" column; if the date is anything else, average the 2 columns together.
Here is what I have so far:
import pandas as pd
import numpy as np
import datetime

np.random.seed(42)
dte = pd.date_range(start=datetime.date(2017, 1, 1), end=datetime.date(2017, 1, 15))
fake = np.random.randint(15, 100, size=15)
fake2 = np.random.randint(300, 1000, size=15)
so_df = pd.DataFrame({'date': dte, 'fake': fake, 'fake2': fake2})
so_df['avg'] = so_df[['fake', 'fake2']].mean(axis=1)
so_df.head()
Assuming you have already computed the average column:
so_df['fake'].where(so_df['date']=='20170107', so_df['avg'])
Out:
0     375.5
1     260.0
2     331.0
3     267.5
4     397.0
5     355.0
6      89.0
7     320.5
8     449.0
9     395.5
10    197.0
11    438.5
12    498.5
13    409.5
14    525.5
Name: fake, dtype: float64
If not, you can replace the column reference with the same calculation:
so_df['fake'].where(so_df['date']=='20170107', so_df[['fake','fake2']].mean(axis=1))
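To make the behavior of where easier to follow, here is a minimal sketch on a hypothetical three-row frame (the values are made up for illustration and are not the data from the question). Series.where keeps the original value wherever the condition is True and takes the value from the second argument everywhere else:

```python
import pandas as pd

# Hypothetical three-row frame, not the data from the question
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-06', '2017-01-07', '2017-01-08']),
    'fake': [10, 25, 30],
    'fake2': [30, 45, 50],
})

# Row-wise average of the two columns: 20.0, 35.0, 40.0
avg = df[['fake', 'fake2']].mean(axis=1)

# True only for the row dated 2017-01-07
is_target = df['date'] == '2017-01-07'

# Keep 'fake' where the condition holds, otherwise use the average
result = df['fake'].where(is_target, avg)
print(result.tolist())  # [20.0, 25.0, 40.0]
```

Only the middle row matches the date, so it keeps its "fake" value of 25 while the other two rows fall back to the average.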
To check multiple dates, you need to use the element-wise version of the or operator (which is the pipe: |). Otherwise it will raise an error.
so_df['fake'].where((so_df['date']=='20170107') | (so_df['date']=='20170109'), so_df['avg'])
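As a small illustration of why the pipe is needed, the sketch below uses a hypothetical three-element Series (not the question's data). The | operator combines the two boolean Series element by element, while the plain Python or asks for a single truth value of a whole Series, which pandas rejects with a ValueError:

```python
import pandas as pd

# Hypothetical dates for illustration only
s = pd.Series(pd.to_datetime(['2017-01-07', '2017-01-08', '2017-01-09']))
a = s == '2017-01-07'
b = s == '2017-01-09'

# Element-wise or: True where either comparison is True
combined = a | b
print(combined.tolist())  # [True, False, True]

# Plain `or` tries to reduce the whole Series to one bool and fails
try:
    a or b
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The parentheses around each comparison in the where call matter as well, because | binds more tightly than ==.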
The above checks for 2 dates. In the case of 3 or more, you may want to use isin with a list:
so_df['fake'].where(so_df['date'].isin(['20170107', '20170109', '20170112']), so_df['avg'])
Out[42]:
0     375.5
1     260.0
2     331.0
3     267.5
4     397.0
5     355.0
6      89.0
7     320.5
8      38.0
9     395.5
10    197.0
11     67.0
12    498.5
13    409.5
14    525.5
Name: fake, dtype: float64
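If it helps, here is a rough sketch showing that an isin mask selects the same rows as the chained | comparisons, and how the result could be stored back on the frame. The frame, its values, and the column name "result" are assumptions made for this example only:

```python
import pandas as pd

# Hypothetical four-row frame, not the data from the question
df = pd.DataFrame({
    'date': pd.date_range('2017-01-06', periods=4),
    'fake': [10, 25, 30, 45],
    'fake2': [30, 45, 50, 55],
})

# Dates that should keep the 'fake' value
targets = pd.to_datetime(['2017-01-07', '2017-01-09'])

mask_isin = df['date'].isin(targets)
mask_or = (df['date'] == '2017-01-07') | (df['date'] == '2017-01-09')

# Both approaches flag the same rows
same_mask = mask_isin.equals(mask_or)

# 'result' is a hypothetical column name chosen for this sketch
df['result'] = df['fake'].where(mask_isin, df[['fake', 'fake2']].mean(axis=1))
print(df)
```

With a growing list of dates, the isin version stays a single call, whereas the | version needs one more comparison per date.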