Python - Finding Row Discrepancies Between Two DataFrames


I have two DataFrames with the same number of columns, d1 and d2.

Note: d1 and d2 may have a different number of rows. Note: d1 and d2 may not be indexed the same way, so the same row may sit at a different index in each DataFrame.

What is the best way to check whether or not the two DataFrames have the same data?

My current solution consists of appending the two DataFrames and dropping the rows that match:

    import pandas as pd

    d_combined = pd.concat([d1, d2])  # DataFrame.append was removed in pandas 2.0
    d_discrepancy = d_combined.drop_duplicates(keep=False)
    print(d_discrepancy)

I am new to Python and the pandas library. Because I am working with DataFrames of millions of rows and 8-10 columns, is there a faster, more efficient way to check for discrepancies? Can the result also show which initial DataFrame each discrepancy row came from?
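One way to keep the concatenate-and-drop approach above while also recording which frame each leftover row came from is sketched below. It assumes both frames share the same column names. `pd.concat` with `keys` adds an outer index level naming the source frame, and `DataFrame.duplicated(keep=False)` flags every row that also occurs in the other frame. The helper name `find_discrepancies` is hypothetical, and its speed on millions of rows has not been measured here.

```python
import pandas as pd


def find_discrepancies(d1: pd.DataFrame, d2: pd.DataFrame) -> pd.DataFrame:
    """Return rows that do not appear in both frames, labelled by source.

    Hypothetical helper: assumes d1 and d2 have the same columns.
    Duplicates are dropped within each frame first, so a row repeated
    inside one frame is not mistaken for a row shared by both.
    """
    # Deduplicate each frame, then stack them; `keys` adds an outer index
    # level ('d1' or 'd2') that records which frame each row came from.
    combined = pd.concat(
        [d1.drop_duplicates(), d2.drop_duplicates()], keys=["d1", "d2"]
    )
    # keep=False marks every copy of a duplicated row, so the negation
    # keeps only rows that occur in exactly one of the two frames.
    mask = combined.duplicated(keep=False)
    return combined[~mask]


d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))
print(find_discrepancies(d1, d2))
```

With this data the result holds the row `a=1` under the `d1` label and the row `a=5` under the `d2` label; an empty result means both frames contain the same distinct rows.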

Setup

    import pandas as pd

    d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
    d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))

Option 1
Use pd.merge. I'll include the parameter indicator=True to show where the data came from.

    d1.merge(d2, how='outer', indicator=True)

       a      _merge
    0  1   left_only
    1  2        both
    2  3        both
    3  4        both
    4  5  right_only
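To list only the discrepant rows together with the frame they came from, the merged result can be filtered on the `_merge` indicator column. A minimal sketch using the setup frames; the extra `source` column is an illustrative addition, not part of pandas' output.

```python
import pandas as pd

d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))

# Outer merge on all shared columns; indicator=True adds a categorical
# `_merge` column with values 'left_only', 'right_only' or 'both'.
merged = d1.merge(d2, how="outer", indicator=True)

# Keep only rows that were not found in both frames.
discrepancies = merged[merged["_merge"] != "both"]

# Illustrative relabelling: name the source frame instead of left/right.
discrepancies = discrepancies.assign(
    source=discrepancies["_merge"].astype(str).map(
        {"left_only": "d1", "right_only": "d2"}
    )
)
print(discrepancies)
```

For the setup data this keeps the row `a=1` (from d1) and the row `a=5` (from d2).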

If they have the same data, I'd expect the _merge column to be 'both' for every row. You can check this with:

    d1.merge(d2, how='outer', indicator=True)._merge.eq('both').all()

    False

In this case it returned False, so the two DataFrames do not have the same data.


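Option 2
Use drop_duplicates.
You need to make sure you drop duplicates from the initial DataFrames first; otherwise a row repeated within one DataFrame is dropped as a "match" even when the other DataFrame does not contain it.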

    # True when both frames contain the same distinct rows; False for the setup above
    (pd.concat([d1.drop_duplicates(), d2.drop_duplicates()])
       .drop_duplicates(keep=False)
       .empty)
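A small hypothetical example shows why the inner drop_duplicates() calls matter. Here d1 repeats a row that d2 lacks; without deduplicating first, the two copies cancel each other out and the check wrongly reports that the frames match.

```python
import pandas as pd

# Hypothetical data: d1 contains the row a=1 twice, d2 does not contain it.
d1 = pd.DataFrame(dict(a=[1, 1, 2]))
d2 = pd.DataFrame(dict(a=[2]))

# Without deduplicating first, the two copies of a=1 are duplicates of
# each other, so keep=False drops them and nothing is left.
naive = pd.concat([d1, d2]).drop_duplicates(keep=False).empty

# Deduplicating each frame first leaves a single a=1, which survives
# keep=False because d2 has no matching row.
checked = (
    pd.concat([d1.drop_duplicates(), d2.drop_duplicates()])
    .drop_duplicates(keep=False)
    .empty
)

print(naive, checked)  # prints: True False
```

Note that both this check and the merge check in Option 1 compare distinct rows only: frames that differ solely in how many times a row is repeated (for example `a=[1, 1]` versus `a=[1]`) are reported as having the same data.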
