How to cast time columns and find timedelta with condition in python pandas -
i have column time non null object , cannot convert timedelta or datetime.
time msg 12:29:36.306000 setup 12:29:36.507000 alerting 12:29:38.207000 service 12:29:39.194000 setup 12:30:05.773000 alerting 12:30:06.205000 service 12:32:07.315000 setup 12:32:17.194000 service 12:32:26.889000 setup 12:36:06.274000 alerting 12:36:08.523000 service 12:37:59.200000 setup 12:47:10.652000 alerting 12:47:43.921000 setup when type df.info(), got 'time' column non null object , couldn't convert timedelta or datetime(for it's obvious why can't it). so, what's solution find difference between consecutive msg (time delta), if timedelta < 5sec pass. output:
time msg diff 12:29:36.306000 setup 12:29:36.507000 alerting 12:29:38.207000 service 12:29:39.194000 setup 12:30:05.773000 alerting 12:30:06.205000 service 12:32:07.315000 setup 12:32:17.194000 service 12:32:26.889000 setup 12:36:06.274000 alerting 6.30*** 12:36:08.523000 service 12:37:59.200000 setup 12:47:10.652000 alerting 11.02*** 12:47:43.921000 setup i've tried this:
df['diff'] = (df['time']df['time'].shift()).fillna(0) but didn't know write condition 5sec interval.
i think first need convert str , call to_timedelta.
then diff , comapre 5s.
last new column use mask mask:
df['time'] = pd.to_timedelta(df['time'].astype(str)) df['diff'] = df['time'].diff() df['mask'] = df['time'].diff() > pd.timedelta(5, unit='s') print (df) time msg diff mask 0 12:29:36.306000 setup nat false 1 12:29:36.507000 alerting 00:00:00.201000 false 2 12:29:38.207000 service 00:00:01.700000 false 3 12:29:39.194000 setup 00:00:00.987000 false 4 12:30:05.773000 alerting 00:00:26.579000 true 5 12:30:06.205000 service 00:00:00.432000 false 6 12:32:07.315000 setup 00:02:01.110000 true 7 12:32:17.194000 service 00:00:09.879000 true 8 12:32:26.889000 setup 00:00:09.695000 true 9 12:36:06.274000 alerting 00:03:39.385000 true 10 12:36:08.523000 service 00:00:02.249000 false 11 12:37:59.200000 setup 00:01:50.677000 true 12 12:47:10.652000 alerting 00:09:11.452000 true 13 12:47:43.921000 setup 00:00:33.269000 true df['time'] = pd.to_timedelta(df['time']) diff = df['time'].diff() mask = df['time'].diff() > pd.timedelta(5, unit='s') df['new'] = diff.where(mask) print (df) time msg new 0 12:29:36.306000 setup nat 1 12:29:36.507000 alerting nat 2 12:29:38.207000 service nat 3 12:29:39.194000 setup nat 4 12:30:05.773000 alerting 00:00:26.579000 5 12:30:06.205000 service nat 6 12:32:07.315000 setup 00:02:01.110000 7 12:32:17.194000 service 00:00:09.879000 8 12:32:26.889000 setup 00:00:09.695000 9 12:36:06.274000 alerting 00:03:39.385000 10 12:36:08.523000 service nat 11 12:37:59.200000 setup 00:01:50.677000 12 12:47:10.652000 alerting 00:09:11.452000 13 12:47:43.921000 setup 00:00:33.269000
Comments
Post a Comment