python - Comparing one row in a data frame to multiple rows in a different data frame to generate similarity score
I am new to Python, pandas, and fuzzywuzzy. I'm trying to use the fuzzywuzzy package to generate similarity scores in order to assign the correct values to rows in a data frame.
I have 2 data frames as follows:
df:
user_id  employer
1        google
2        citibank
3        b m
4        amazon inc
5        dell corp

df2:
employer_list  score
amazon         nan
citibank       nan
google         nan
ibm            nan
The first data frame (df) is an export that contains user_id and the respective employers. The goal of the second data frame is to store the fuzz.ratio obtained by comparing each distinct employer value in the first data frame with employer_list in the second one.
For example, the first employer, 'google', is compared with the values in df2['employer_list']:
df:
user_id  employer
1        google
2        citibank
3        b m
4        amazon inc
5        dell corp

df2:
employer_list  score
amazon         17
citibank       0
google         100
ibm            0
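Finally, the first data frame is updated with a third column that takes the value 'google' from df2, since that is the entry with a fuzz.ratio higher than 90. The process is iterated for each value in df['employer'], and the values in df2['score'] are replaced by the newer ones.
The final result would look like: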
df:
user_id  employer    final_employer
1        google      google
2        citibank    citibank
3        b m         ibm
4        amazon inc  amazon
5        dell corp   nan
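The process described above (score one employer against every entry in employer_list, keep the best entry if its score is higher than 90, repeat for each row) could be sketched roughly as follows. This is only an illustration, not a definitive solution: the names `ratio`, `best_match`, and `threshold` are made up for this sketch, and the `difflib` fallback is an assumption that stands in for `fuzz.ratio` when fuzzywuzzy is not installed.

```python
import difflib

import numpy as np
import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: a difflib-based stand-in that scores two strings from 0 to 100.
    def ratio(a, b):
        return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))

df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "employer": ["google", "citibank", "b m", "amazon inc", "dell corp"],
})
df2 = pd.DataFrame({"employer_list": ["amazon", "citibank", "google", "ibm"]})


def best_match(employer, candidates, threshold=90):
    """Return the best-scoring candidate if its score exceeds threshold, else NaN."""
    # Score this one employer against every candidate, one pair at a time.
    scores = candidates.apply(lambda candidate: ratio(employer, candidate))
    best = scores.idxmax()
    return candidates.loc[best] if scores.loc[best] > threshold else np.nan


df["final_employer"] = df["employer"].apply(
    lambda employer: best_match(employer, df2["employer_list"])
)
print(df)
```

With a plain character-level ratio and a cutoff of 90, only near-exact matches such as 'google' and 'citibank' get a value here. Entries like 'b m' or 'amazon inc' appear to score below 90 against 'ibm' and 'amazon', so reaching the table above would likely need a lower `threshold` or a different scorer.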
So far I have tried:
df2['score'] = fuzz.ratio(df['employer'].iloc[0],df2['employer_list'])
but it returns the same score in all rows of df2['score'].
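A likely reason is that fuzz.ratio compares two strings and returns a single number, so it does not work element by element over a column, and assigning one number to df2['score'] fills every row with it. A minimal sketch of scoring each row separately with `Series.apply` is shown here. The name `ratio` and the `difflib` fallback are assumptions added for this illustration so that it also runs without fuzzywuzzy installed.

```python
import difflib

import pandas as pd

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption: a difflib-based stand-in that scores two strings from 0 to 100.
    def ratio(a, b):
        return int(round(100 * difflib.SequenceMatcher(None, a, b).ratio()))

df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "employer": ["google", "citibank", "b m", "amazon inc", "dell corp"],
})
df2 = pd.DataFrame({"employer_list": ["amazon", "citibank", "google", "ibm"]})

# Compare the first employer with each entry of employer_list, one pair at a time.
first_employer = df["employer"].iloc[0]
df2["score"] = df2["employer_list"].apply(
    lambda candidate: ratio(first_employer, candidate)
)
print(df2)
```

For 'google' this should give the scores from the example table in the question (17, 0, 100, 0).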