Panda dataframe column conditional on another column -
import pandas pd import urllib.request import numpy np url="https://www.misoenergy.org/library/repository/market%20reports/20170811_da_bc.xls" cnstxls = urllib.request.urlopen(url) xl = pd.excelfile(cnstxls) df = xl.parse("sheet1",skiprows=3) constr = df.iloc[:,1:7] constr['class'] = np.where(constr['hour of occurrence'] == (1,2,3,4,5,6), 'offpeak', 'onpeak') sumsp=constr.groupby('constraint_id','class',axis=0)['shadow price'].sum().sort_values(ascending=true)`
1) new column class giving errors - says typeerror: invalid type comparison
. how set new column based on multiple hours? works when put 1 hour (1 or 2 or 3...)
2) typeerror: groupby() got multiple values argument 'axis'
. groupby
using 2 columns. works 1 column.
let's try:
constr['class'] = np.where(constr['hour of occurrence'].isin([1,2,3,4,5,6]),'offpeak','onpeak') sumsp = constr.groupby(['constraint_id','class'],axis=0)['shadow price'].sum().sort_values(ascending=true) print(sumsp)
output:
constraint_id class 281292 onpeak -780.05 1049 onpeak -364.68 4636 onpeak -276.62 201082 onpeak -245.44 1607 onpeak -237.36 98333 onpeak -112.05 107318 onpeak -96.10 270366 onpeak -80.71 267644 onpeak -73.25 285770 onpeak -59.53 1049 offpeak -46.52 281292 offpeak -33.80 270888 onpeak -19.68 289484 offpeak -10.41 onpeak -4.52 1607 offpeak -2.60 9712 onpeak 0.84 268470 onpeak 1.14 248010 onpeak 1.48 287090 onpeak 1.63 offpeak 11.78 188144 offpeak 26.32 4862 onpeak 28.03 285770 offpeak 50.21 name: shadow price, dtype: float64
edit unstack
pivot class:
sumsp.unstack('class')
output:
class offpeak onpeak constraint_id 1049 -46.52 -364.68 1607 -2.60 -237.36 4636 nan -276.62 4862 nan 28.03 9712 nan 0.84 98333 nan -112.05 107318 nan -96.10 188144 26.32 nan 201082 nan -245.44 248010 nan 1.48 267644 nan -73.25 268470 nan 1.14 270366 nan -80.71 270888 nan -19.68 281292 -33.80 -780.05 285770 50.21 -59.53 287090 11.78 1.63 289484 -10.41 -4.52
Comments
Post a Comment