python - Comparing one row in a data frame to multiple rows in a different data frame to generate similarity score


I am new to Python, pandas, and fuzzywuzzy. I'm trying to use the fuzzywuzzy package to generate similarity scores in order to assign the correct values to rows in a data frame.

I have 2 data frames, as follows:

df:                             df2:
user_id    employer             employer_list    score
1          google               amazon           nan
2          citibank             citibank         nan
3          b m                  google           nan
4          amazon inc           ibm              nan
5          dell corp

The first data frame (df) is an export that contains each user_id and the respective employer. The goal of the second data frame is to store the fuzz.ratio obtained by comparing each distinct employer value in the first data frame with the employer_list in the second one.

For example, the first employer, 'google', is compared with all the values in df2['employer_list']:

df:                             df2:
user_id    employer             employer_list    score
1          google               amazon           17
2          citibank             citibank         0
3          b m                  google           100
4          amazon inc           ibm              0
5          dell corp

Finally, the first data frame is updated with a third column that takes the value from df2 whose fuzz.ratio against 'google' is higher than 90. The process is iterated for each value in df['employer'], and the values in df2['score'] are replaced by the newer ones.

The final result would look like:

df:
user_id    employer      final_employer
1          google        google
2          citibank      citibank
3          b m           ibm
4          amazon inc    amazon
5          dell corp     nan
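The matching step described above (score one employer against every candidate, keep the best candidate only if it scores higher than 90) could be sketched roughly as follows. This is only an illustration under stated assumptions: `similarity` and `best_match` are hypothetical helper names, and `similarity` is a standard-library stand-in for `fuzz.ratio` so the sketch runs without fuzzywuzzy installed; its scores may differ from the real function.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in for fuzz.ratio (assumption): integer similarity from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def best_match(employer, candidates, threshold=90):
    """Return the highest-scoring candidate if its score exceeds threshold, else None."""
    scored = [(similarity(employer, candidate), candidate) for candidate in candidates]
    score, candidate = max(scored)
    return candidate if score > threshold else None


# Values taken from the two data frames in the question.
employers = ["google", "citibank", "b m", "amazon inc", "dell corp"]
employer_list = ["amazon", "citibank", "google", "ibm"]

# One lookup per employer; None plays the role of nan in the final table.
final_employer = {e: best_match(e, employer_list) for e in employers}
print(final_employer)
```

With this stand-in scorer and a strict threshold of 90, only the exact matches ('google', 'citibank') are assigned. A pair such as 'amazon inc' and 'amazon' scores 75 here, so reproducing the table above would probably need a lower threshold or a different scorer.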

So far I have tried:

df2['score'] = fuzz.ratio(df['employer'].iloc[0],df2['employer_list']) 

but it returns the same score in every row of df2['score'].
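A likely reason is that the scorer works on two strings and returns a single number, so assigning that one number to the column repeats it in every row. A minimal sketch of scoring row by row with `Series.apply` is shown below. It assumes pandas is available, and `similarity` is again a hypothetical standard-library stand-in for `fuzz.ratio`, not the fuzzywuzzy function itself.

```python
from difflib import SequenceMatcher

import pandas as pd


def similarity(a, b):
    # Stand-in for fuzz.ratio(a, b) (assumption): two strings in, one integer 0-100 out.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


df = pd.DataFrame({"user_id": [1, 2], "employer": ["google", "citibank"]})
df2 = pd.DataFrame({"employer_list": ["amazon", "citibank", "google", "ibm"]})

first = df["employer"].iloc[0]

# Score each candidate separately instead of passing the whole column at once.
df2["score"] = df2["employer_list"].apply(lambda candidate: similarity(first, candidate))
print(df2)
```

Because the function is called once per row, each entry in df2['score'] reflects its own comparison with 'google'. Swapping `similarity` for `fuzz.ratio` inside the lambda should behave the same way, since only the two-string call is used.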

