pandas - Create a dictionary by grouping by values from a dataframe column in python


I have a dataframe with the following columns:

    bank_acct firstname | bank_acct lastname | bank_acctnumber | firstname | lastname | id | date1    | date2
    b1                  | last1              | 123             | abc       | efg      | 12 | somedate | somedate
    b2                  | last2              | 245             | abc       | efg      | 12 | somedate | somedate
    b1                  | last1              | 123             | def       | efg      | 12 | somedate | somedate
    b3                  | last3              | 356             | abc       | ghi      | 13 | somedate | somedate
    b4                  | last4              | 478             | xyz       | fhj      | 13 | somedate | somedate
    b5                  | last5              | 599             | xyz       | dfi      | 13 | somedate | somedate

I want to create a dictionary of the form:

    {id1: (count of distinct bank_acct firstname,
           count of distinct bank_acct lastname,
           {bank_acctnumber1: itscount, bank_acctnumber2: itscount},
           count of distinct firstname,
           count of distinct lastname),
     id2: (...),
    }

For the above example, the result should be:

{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13 : (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)} 

The code below attempts this:

    cols = ['bank_acct firstname', 'bank_acct lastname', 'bank_acctnumber', 'firstname', 'lastname']
    df1 = df.groupby('id').apply(lambda x: tuple(x[c].nunique() for c in cols))
    d = df1.to_dict()

But the above code gives this output:

 {12: (2, 2, 2, 2, 1), 13 : (3, 3, 3, 2, 3)} 

It gives the count of distinct bank_acctnumber values instead of the inner dictionary.

How can I get the required dictionary instead? Thanks!

You can define the columns and the function to apply to each one in a list:

    In [15]: cols = [
        ...:     {'col': 'bank_acct firstname', 'func': pd.Series.nunique},
        ...:     {'col': 'bank_acct lastname', 'func': pd.Series.nunique},
        ...:     {'col': 'bank_acctnumber', 'func': lambda x: x.value_counts().to_dict()},
        ...:     {'col': 'firstname', 'func': pd.Series.nunique},
        ...:     {'col': 'lastname', 'func': pd.Series.nunique}
        ...:     ]

    In [16]: df.groupby('id').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
    Out[16]:
    id
    12            (2, 2, {123: 2, 245: 1}, 2, 1)
    13    (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)
    dtype: object

    In [17]: (df.groupby('id')
                .apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
                .to_dict())
    Out[17]:
    {12: (2, 2, {123: 2, 245: 1}, 2, 1),
     13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
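For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question (the two date columns are left out, since they only hold placeholders and are not used) and loops over the groups explicitly instead of calling `apply`. The helper name `summarize` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data from the question; date1/date2 are omitted because they are unused.
df = pd.DataFrame({
    "bank_acct firstname": ["b1", "b2", "b1", "b3", "b4", "b5"],
    "bank_acct lastname": ["last1", "last2", "last1", "last3", "last4", "last5"],
    "bank_acctnumber": [123, 245, 123, 356, 478, 599],
    "firstname": ["abc", "abc", "def", "abc", "xyz", "xyz"],
    "lastname": ["efg", "efg", "efg", "ghi", "fhj", "dfi"],
    "id": [12, 12, 12, 13, 13, 13],
})


def summarize(group):
    """Hypothetical helper: build the per-id tuple described in the question."""
    return (
        group["bank_acct firstname"].nunique(),
        group["bank_acct lastname"].nunique(),
        # value_counts() counts occurrences per account number;
        # to_dict() turns that Series into {account_number: count}.
        group["bank_acctnumber"].value_counts().to_dict(),
        group["firstname"].nunique(),
        group["lastname"].nunique(),
    )


# Iterating over a groupby on a single column yields (key, sub-frame) pairs.
result = {int(key): summarize(group) for key, group in df.groupby("id")}
print(result)
```

The only difference from the attempt in the question is that `bank_acctnumber` goes through `value_counts().to_dict()` rather than `nunique()`, which is what produces the inner dictionary.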
