python - Efficient way to determine total time taking overlap into account -
i using pandas dataframe following:
i trying find best way determine total time spent ship @ particular berth taking account overlap in duration of visit. here data looks like:
in out berth 2015-01-14 13:57:00 2015-01-15 17:15:00 01 2015-01-14 14:30:00 2015-01-15 02:50:00 01 2015-01-14 14:30:00 2015-01-16 06:10:00 01 2015-01-25 02:15:00 2015-01-26 13:41:00 01
what want find out total time particular berth used. looking @ data there overlaps , cant add times each record.
looking @ above data can see 2nd ship timing within first time recorded 0, , 3rd ship comes before first stays till after 1st 1 leaves here time = (out of 3rd ship - in of 1st) , move next 1 there no overlap there , add [out of 4 - in of 4] total time spent on berth 1, , continue till end producing :
berth hours worked 01 7.750 02 10.275 03 5.585 08 31.980
here's solution 1 berth. hope can expand multiple berths.
split dataframe arrivals , departures:
dfin = df[['in']] dfout = df[['out']] dfin['direction'] = 1 dfout['diretcion'] = -1
as of now, in , out timestamps:
dfin.columns = ('ts', 'direction') dfout.columns = ('ts', 'direction')
combine 2 parts 1 tall sorted dataframe of traffic:
traffic = pd.concat([dfin, dfout]).sort_values('ts') # ts direction #0 2015-01-14 13:57:00 1 #1 2015-01-14 14:30:00 1 #2 2015-01-14 14:30:00 1 #1 2015-01-15 02:50:00 -1 #0 2015-01-15 17:15:00 -1 #2 2015-01-16 06:10:00 -1 #3 2015-01-25 02:15:00 1 #3 2015-01-26 13:41:00 -1
calculate number of ships @ berth, arrive , depart:
traffic['ships'] = traffic['direction'].cumsum()
identify time periods when berth empty. calculate sequence number of each "busy period."
traffic['empty'] = (traffic['ships'] == 0).shift().fillna(0).astype(int) traffic['busy_id'] = traffic['empty'].cumsum() # ts direction ships empty busy_id #0 2015-01-14 13:57:00 1 1 0 0 #1 2015-01-14 14:30:00 1 2 0 0 #2 2015-01-14 14:30:00 1 3 0 0 #1 2015-01-15 02:50:00 -1 2 0 0 #0 2015-01-15 17:15:00 -1 1 0 0 #2 2015-01-16 06:10:00 -1 0 0 0 #3 2015-01-25 02:15:00 1 1 1 1 #3 2015-01-26 13:41:00 -1 0 0 1
calculate start , end of each "busy period":
busy_data = traffic.groupby('busy_id')['ts'].agg([min, max]) # min max #busy_id #0 2015-01-14 13:57:00 2015-01-16 06:10:00 #1 2015-01-25 02:15:00 2015-01-26 13:41:00
calculate total length of "busy periods":
(busy_data['max'] - busy_data['min']).sum() #timedelta('3 days 03:39:00')
Comments
Post a Comment