Python - Intersection of two pandas DataFrames
In my problem I have two DataFrames, mydataframe1 and mydataframe2, shown below.

mydataframe1
Out[13]:
 start  end  remove
    50   60       1
    61  105       0
   106  150       1
   151  160       0
   161  180       1
   181  200       0
   201  400       1

mydataframe2
Out[14]:
 start  end
    55  100
   105  140
   151  154
   155  185
   220  240

From mydataframe2 I would like to remove the rows whose start-end interval is contained (even partially) in any of the "remove" = 1 intervals in mydataframe1. In other words, there should be no intersection between the intervals of mydataframe2 and each of those intervals in mydataframe1.

In this case mydataframe2 becomes:

mydataframe2
Out[15]:
 start  end
   151  154
You can use pd.IntervalIndex to find the intersections.

Get the rows to be removed:

In [313]: dfr = df1.query('remove == 1')

Construct an IntervalIndex from the ranges to be removed:

In [314]: s1 = pd.IntervalIndex.from_arrays(dfr.start, dfr.end, 'both')

Construct an IntervalIndex from the ranges to be tested:

In [315]: s2 = pd.IntervalIndex.from_arrays(df2.start, df2.end, 'both')

Select the rows of s2 that are not in the s1 ranges:

In [316]: df2.loc[[x not in s1 for x in s2]]
Out[316]:
   start  end
2    151  154

Details:
In [320]: df1
Out[320]:
   start  end  remove
0     50   60       1
1     61  105       0
2    106  150       1
3    151  160       0
4    161  180       1
5    181  200       0
6    201  400       1

In [321]: df2
Out[321]:
   start  end
0     55  100
1    105  140
2    151  154
3    155  185
4    220  240

In [322]: dfr
Out[322]:
   start  end  remove
0     50   60       1
2    106  150       1
4    161  180       1
6    201  400       1

IntervalIndex details:

In [323]: s1
Out[323]:
IntervalIndex([[50, 60], [106, 150], [161, 180], [201, 400]]
              closed='both',
              dtype='interval[int64]')

In [324]: s2
Out[324]:
IntervalIndex([[55, 100], [105, 140], [151, 154], [155, 185], [220, 240]]
              closed='both',
              dtype='interval[int64]')

In [326]: [x not in s1 for x in s2]
Out[326]: [False, False, True, False, False]
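One caveat for newer pandas releases: since pandas 0.25, the `in` operator on an IntervalIndex only matches an Interval exactly, so `x not in s1` no longer tests for overlap. The sketch below is one possible adaptation rather than the original answer's code. It rebuilds the sample data from the question and swaps the membership test for `IntervalIndex.overlaps`; the names `keep` and `result` are introduced here purely for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "start":  [50, 61, 106, 151, 161, 181, 201],
    "end":    [60, 105, 150, 160, 180, 200, 400],
    "remove": [1, 0, 1, 0, 1, 0, 1],
})
df2 = pd.DataFrame({
    "start": [55, 105, 151, 155, 220],
    "end":   [100, 140, 154, 185, 240],
})

# Intervals flagged for removal, closed on both ends as in the answer.
dfr = df1.query("remove == 1")
s1 = pd.IntervalIndex.from_arrays(dfr["start"], dfr["end"], closed="both")

# Intervals to be tested.
s2 = pd.IntervalIndex.from_arrays(df2["start"], df2["end"], closed="both")

# Keep a row of df2 only if its interval overlaps none of the s1 intervals.
# s1.overlaps(x) returns one boolean per interval in s1.
keep = [not s1.overlaps(x).any() for x in s2]
result = df2.loc[keep]
print(result)
```

This loops over s2 in Python, which should be fine for small frames; for large inputs a vectorized approach would probably be worth looking into.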