python - Trying to understand how pandas merge works, right join specifically


I ran into a confusing result with a larger DataFrame, and have made a toy one that captures what's confusing me:

    import pandas as pd

    big_index = [123, 124, 125, 126, 127, 128, 129, 130]
    big_dat = {'year': pd.Series([2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004], index=big_index),
               'other': pd.Series(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], index=big_index)}
    big_df = pd.DataFrame(big_dat)

    year_index = [2003, 2000, 2001, 2002]
    year_dat = {'a': pd.Series([1, 2, 3, 4], index=year_index),
                'b': pd.Series([5, 6, 7, 8], index=year_index)}
    year_df = pd.DataFrame(year_dat)

The left and inner merges work as I'd expect, but the right and outer merges produce odd results:

    merged_right = pd.merge(
        big_df,
        year_df,
        how='right',
        left_on='year',
        right_index=True
        )
    merged_right

        other  year  a  b
    123     a  2000  2  6
    124     b  2000  2  6
    125     c  2000  2  6
    126     d  2001  3  7
    127     e  2002  4  8
    128     f  2002  4  8
    129     g  2002  4  8
    130   NaN  2003  1  5

    merged_outer = pd.merge(
        big_df,
        year_df,
        how='outer',
        left_on='year',
        right_index=True
        )
    merged_outer

        other  year    a    b
    123     a  2000  2.0  6.0
    124     b  2000  2.0  6.0
    125     c  2000  2.0  6.0
    126     d  2001  3.0  7.0
    127     e  2002  4.0  8.0
    128     f  2002  4.0  8.0
    129     g  2002  4.0  8.0
    130     h  2004  NaN  NaN
    130   NaN  2003  1.0  5.0

In both cases, index 130 gets associated with the 2003 year entry for no apparent reason. I understand there's no "good" way to handle this, since I'm assuming an index can't have NaN in it. I'd have expected it to throw an error, though, rather than returning an incorrect last column. I'm probably misunderstanding what pandas is doing under the hood. Any tips or resources to figure out why this is going wrong would be appreciated, as would code showing how to do this right.
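One possible way to sidestep the misleading row label, offered as a sketch and not as an account of what pandas does internally: move both indexes into ordinary columns before merging, so the key is a plain column on each side and unmatched rows get NaN instead of a borrowed label. The names `left`, `right`, and `orig_index` are my own for illustration; `indicator=True` is a standard `pd.merge` option that adds a `_merge` column saying which side each row came from.

```python
import pandas as pd

# Same toy data as in the question, built directly from lists.
big_df = pd.DataFrame(
    {"year": [2000, 2000, 2000, 2001, 2002, 2002, 2002, 2004],
     "other": list("abcdefgh")},
    index=[123, 124, 125, 126, 127, 128, 129, 130],
)
year_df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
    index=[2003, 2000, 2001, 2002],
)

# Turn each index into a regular column ("orig_index" is a hypothetical name).
left = big_df.rename_axis("orig_index").reset_index()
right = year_df.rename_axis("year").reset_index()

# Merge on the shared "year" column; the result gets a fresh default index.
merged_right = pd.merge(left, right, how="right", on="year")

# indicator=True adds a "_merge" column: left_only / right_only / both.
merged_outer = pd.merge(left, right, how="outer", on="year", indicator=True)

print(merged_right)
print(merged_outer)
```

With this layout the 2003 row carries NaN in `orig_index` and `other`, which makes it visible that no row of `big_df` matched it, and the 2004 row keeps its original label 130 in the outer merge.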

