python 3.x - How to sum the numbers in a comma-separated string in a pandas DataFrame column
I have a string that contains comma-delimited int values, such as x = "1,2,3,4,5,6". How do I calculate the sum of the values contained in x?
I tried:

    values = x.split(",").map(lambda a: int(a))
    sum(values)

but got:

    AttributeError: 'list' object has no attribute 'map'
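For the plain-string case, the error arises because a Python list has no .map method; map is a built-in function that takes the callable as its first argument. A minimal sketch of the sum, reusing the x from above (the names total and total_gen are only illustrative):

```python
x = "1,2,3,4,5,6"

# Split on commas, convert each piece to int with the built-in map, then sum.
total = sum(map(int, x.split(",")))
print(total)  # 21

# Equivalent form using a generator expression.
total_gen = sum(int(part) for part in x.split(","))
```

Both forms assume every piece between commas is a valid integer; an empty piece or stray text would raise ValueError.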
Actually, I have a pandas DataFrame with data in this format:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [100, 101, 201],
                       'prices_a': ['1,2,3', '4,5,6', '7,8,9'],
                       'prices_b': ['1,2,3', '2,6,6', '3,5,8']})
which looks like this:
        id prices_a prices_b
    0  100    1,2,3    1,2,3
    1  101    4,5,6    2,6,6
    2  201    7,8,9    3,5,8
I want to add a new column diff that compares prices_a and prices_b: if they are the same, df['diff'] = 'match'; otherwise, df['diff'] = sum(prices_a values) - sum(prices_b values).
You can use numpy.where. To sum the values in each column, use str.split with expand=True, then astype, then sum per row (axis=1):
    # split each string into columns, convert to float, sum across each row
    a = df['prices_a'].str.split(',', expand=True).astype(float).sum(axis=1)
    b = df['prices_b'].str.split(',', expand=True).astype(float).sum(axis=1)
    print (a)
    0     6.0
    1    15.0
    2    24.0
    dtype: float64

    print (b)
    0     6.0
    1    14.0
    2    16.0
    dtype: float64

    df['diff'] = np.where(df['prices_a'] == df['prices_b'], 'match', a - b)
    print (df)
        id prices_a prices_b   diff
    0  100    1,2,3    1,2,3  match
    1  101    4,5,6    2,6,6    1.0
    2  201    7,8,9    3,5,8    8.0
However, it is better not to mix strings and numeric values in one column, so you could use e.g. NaNs instead of 'match':
    df['diff'] = np.where(df['prices_a'] == df['prices_b'], np.nan, a - b)
    print (df)
        id prices_a prices_b  diff
    0  100    1,2,3    1,2,3   NaN
    1  101    4,5,6    2,6,6   1.0
    2  201    7,8,9    3,5,8   8.0
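Putting the pieces together, the NaN variant can be sketched as one self-contained script. The helper name row_sums is hypothetical and only introduced here to avoid repeating the split/astype/sum chain; it assumes every cell holds comma-separated numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [100, 101, 201],
    'prices_a': ['1,2,3', '4,5,6', '7,8,9'],
    'prices_b': ['1,2,3', '2,6,6', '3,5,8'],
})


def row_sums(col):
    """Sum the comma-separated numbers in each string of a Series.

    Hypothetical helper: splits into columns, converts to float,
    then sums across each row.
    """
    return col.str.split(',', expand=True).astype(float).sum(axis=1)


a = row_sums(df['prices_a'])
b = row_sums(df['prices_b'])

# NaN where the two strings are identical, otherwise the difference of sums.
df['diff'] = np.where(df['prices_a'] == df['prices_b'], np.nan, a - b)
print(df)
```

Note that the match test compares the raw strings, so two rows with equal sums but different strings (for example '1,2,3' and '3,2,1') would get a diff of 0.0 rather than NaN.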