python - Efficient way to determine total time taking overlap into account -


i using pandas dataframe following:

i trying find best way determine total time spent ship @ particular berth taking account overlap in duration of visit. here data looks like:

 in                    out                        berth  2015-01-14 13:57:00   2015-01-15 17:15:00         01 2015-01-14 14:30:00   2015-01-15 02:50:00         01 2015-01-14 14:30:00   2015-01-16 06:10:00         01 2015-01-25 02:15:00   2015-01-26 13:41:00         01 

what want find out total time particular berth used. looking @ data there overlaps , cant add times each record.

looking @ above data can see 2nd ship timing within first time recorded 0, , 3rd ship comes before first stays till after 1st 1 leaves here time = (out of 3rd ship - in of 1st) , move next 1 there no overlap there , add [out of 4 - in of 4] total time spent on berth 1, , continue till end producing :

berth   hours worked   01    7.750   02    10.275   03    5.585   08    31.980 

here's solution 1 berth. hope can expand multiple berths.

split dataframe arrivals , departures:

dfin = df[['in']] dfout = df[['out']] dfin['direction'] = 1 dfout['diretcion'] = -1 

as of now, in , out timestamps:

dfin.columns = ('ts', 'direction') dfout.columns = ('ts', 'direction') 

combine 2 parts 1 tall sorted dataframe of traffic:

traffic = pd.concat([dfin, dfout]).sort_values('ts') #                   ts  direction #0 2015-01-14 13:57:00          1 #1 2015-01-14 14:30:00          1 #2 2015-01-14 14:30:00          1 #1 2015-01-15 02:50:00         -1 #0 2015-01-15 17:15:00         -1 #2 2015-01-16 06:10:00         -1 #3 2015-01-25 02:15:00          1 #3 2015-01-26 13:41:00         -1 

calculate number of ships @ berth, arrive , depart:

traffic['ships'] = traffic['direction'].cumsum() 

identify time periods when berth empty. calculate sequence number of each "busy period."

traffic['empty'] = (traffic['ships'] == 0).shift().fillna(0).astype(int) traffic['busy_id'] = traffic['empty'].cumsum() #                   ts  direction  ships  empty busy_id #0 2015-01-14 13:57:00          1      1      0       0 #1 2015-01-14 14:30:00          1      2      0       0 #2 2015-01-14 14:30:00          1      3      0       0 #1 2015-01-15 02:50:00         -1      2      0       0 #0 2015-01-15 17:15:00         -1      1      0       0 #2 2015-01-16 06:10:00         -1      0      0       0 #3 2015-01-25 02:15:00          1      1      1       1 #3 2015-01-26 13:41:00         -1      0      0       1 

calculate start , end of each "busy period":

busy_data = traffic.groupby('busy_id')['ts'].agg([min, max]) #                      min                 max #busy_id                                         #0     2015-01-14 13:57:00 2015-01-16 06:10:00 #1     2015-01-25 02:15:00 2015-01-26 13:41:00 

calculate total length of "busy periods":

(busy_data['max'] - busy_data['min']).sum() #timedelta('3 days 03:39:00') 

Comments

Popular posts from this blog

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

python Tkinter Capturing keyboard events save as one single string -

sql server - Why does Linq-to-SQL add unnecessary COUNT()? -