Python - Finding Row Discrepancies Between Two DataFrames
I have two DataFrames, d1 and d2, with the same number of columns.
Note: d1 and d2 may have a different number of rows. Note: d1 and d2 may not be indexed by the same rows.
What is the best way to check whether the two DataFrames contain the same data?
My current solution consists of appending the two DataFrames and dropping the rows that match.
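    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    d_combined = pd.concat([d1, d2])
    # keep=False drops every row that occurs more than once
    d_discrepancy = d_combined.drop_duplicates(keep=False)
    print(d_discrepancy)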
I am new to Python and the pandas library. Because I am working with DataFrames of millions of rows and 8-10 columns, is there a faster and more efficient way to check for discrepancies? Can it also be shown which initial DataFrame each resulting discrepancy row came from?
Setup
    import pandas as pd

    d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
    d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))
Option 1
Use pd.merge. I'll include the parameter indicator=True to show where the data came from.
    d1.merge(d2, how='outer', indicator=True)

       a      _merge
    0  1   left_only
    1  2        both
    2  3        both
    3  4        both
    4  5  right_only
If they have the same data, I'd expect the _merge column to be both for every row. We can check with
    d1.merge(d2, how='outer', indicator=True)._merge.eq('both').all()

    False
In this case it returned False, so the two DataFrames do not have the same data.
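To get at the "which DataFrame did each row come from" part of the question, the indicator column can be filtered down to just the mismatches. This is only a rough sketch using the setup data; the names discrepancies and source are my own, chosen for illustration, and it assumes the rows within each DataFrame are unique (duplicated rows get multiplied by the merge).

```python
import pandas as pd

d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))

# Outer merge on all shared columns; _merge records where each row was found
merged = d1.merge(d2, how='outer', indicator=True)

# Keep only the rows that are missing from one side
discrepancies = merged[merged['_merge'] != 'both'].copy()

# Relabel the indicator with the (illustrative) names of the source frames
discrepancies['source'] = (
    discrepancies['_merge'].astype(str).map({'left_only': 'd1', 'right_only': 'd2'})
)

print(discrepancies[['a', 'source']])
```

With the setup data this should leave the row with a=1 (only in d1) and the row with a=5 (only in d2).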
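Option 2
Use drop_duplicates. You need to make sure you drop the duplicates from the initial DataFrames first, otherwise a row repeated within a single DataFrame would be dropped as if it matched. The result is True only when both DataFrames contain the same set of rows.
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    pd.concat([d1.drop_duplicates(), d2.drop_duplicates()]) \
        .drop_duplicates(keep=False).empty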
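Option 2 can also keep track of where each leftover row came from. A minimal sketch, assuming pd.concat with its keys argument is used to label the rows (DataFrame.append no longer exists in pandas 2.0); the labels 'd1' and 'd2' and the variable names are just for illustration.

```python
import pandas as pd

d1 = pd.DataFrame(dict(a=[1, 2, 3, 4]))
d2 = pd.DataFrame(dict(a=[2, 3, 4, 5]))

# keys= adds an outer index level naming the frame each row came from
combined = pd.concat(
    [d1.drop_duplicates(), d2.drop_duplicates()],
    keys=['d1', 'd2'],
)

# drop_duplicates compares column values only, so the source labels
# in the index do not stop matching rows from being dropped
discrepancy = combined.drop_duplicates(keep=False)

same_data = discrepancy.empty
sources = discrepancy.index.get_level_values(0).tolist()

print(discrepancy)
print(same_data)
```

Here the outer index level of discrepancy tells you which DataFrame each unmatched row belongs to, and same_data is the same check as in Option 2.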