python - Split a dataframe by time column - pandas
I want to select the right portion of a dataset, as explained in the following example:
input df:
id_b, ts_b,value
id1,2017-04-27 01:35:30,0
id1,2017-04-27 01:35:40,0
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
id1,2017-04-27 01:39:00,0
id1,2017-04-27 01:39:10,0
id1,2017-04-27 01:39:20,0
id1,2017-04-27 01:39:30,0
id1,2017-04-27 01:39:40,0
id1,2017-04-27 01:39:50,0
id1,2017-04-27 01:40:00,0
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
Consider the following: segment_number = 1
duration = 3 minutes
I want to select the first segment of the dataframe, starting at the first non-zero df.value and continuing until the last value covered by a duration of 3 minutes.
output:
id1,2017-04-27 01:35:50,1
id1,2017-04-27 01:36:00,4
id1,2017-04-27 01:36:10,5
id1,2017-04-27 01:36:20,100
id1,2017-04-27 01:36:30,155
id1,2017-04-27 01:36:40,235
id1,2017-04-27 01:36:50,0
id1,2017-04-27 01:36:60,0
id1,2017-04-27 01:37:00,2353
id1,2017-04-27 01:37:10,221
id1,2017-04-27 01:37:20,2432
id1,2017-04-27 01:37:30,2654
id1,2017-04-27 01:37:40,12
id1,2017-04-27 01:37:50,5
id1,2017-04-27 01:38:00,5
id1,2017-04-27 01:38:10,23
id1,2017-04-27 01:38:20,5
id1,2017-04-27 01:38:30,2
id1,2017-04-27 01:38:40,2
id1,2017-04-27 01:38:50,1
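Consider the following: segment_number = 2
duration = 1.40 minutes (that is, 1 minute 40 seconds)
I want to select the second segment of the dataframe, starting at the next non-zero df.value after the first segment and continuing until the last value covered by a duration of 1.40 minutes.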
output:
id1,2017-04-27 01:40:10,1
id1,2017-04-27 01:40:20,5
id1,2017-04-27 01:40:30,221
id1,2017-04-27 01:40:40,2432
id1,2017-04-27 01:40:50,2654
id1,2017-04-27 01:40:60,12
id1,2017-04-27 01:41:00,5
id1,2017-04-27 01:41:10,5
id1,2017-04-27 01:41:20,23
id1,2017-04-27 01:41:30,5
id1,2017-04-27 01:41:40,2
id1,2017-04-27 01:41:50,1
So far, I have indexed df by ts_b using `pd.to_datetime` and `set_index`, and I use a variable "last_end_point" that keeps track of the index of the previous segment.
However, I do not get the right output.
Any help is appreciated.
This answer was formulated:
import datetime

import pandas as pd

# skipinitialspace=True tolerates a space after the comma in the header row
df = pd.read_csv("filename.csv", skipinitialspace=True)
df['ts_b'] = pd.to_datetime(df['ts_b'])

def find_the_energenies_segment(key_mapped, duration, energenie_df, threshold):
    # Row labels whose value exceeds the threshold
    non_zero_indexs = energenie_df[energenie_df["value"] > threshold].index
    # Test the length rather than the label itself: a first label of 0 is falsy
    if len(non_zero_indexs) == 0:
        return {"sub_df": None, "start_index": None, "end_index": None, "duration": duration}
    first_index = non_zero_indexs[0]
    start_time = energenie_df.loc[first_index, "ts_b"]
    # duration is a string in "HH:MM:SS" form
    hours, minutes, seconds = duration.split(":")
    end_time = start_time + datetime.timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    # Last row at or before end_time; assumes ts_b is sorted and the index is the default 0..n-1
    after_end = energenie_df[energenie_df["ts_b"] > end_time].index
    last_index = after_end[0] - 1 if len(after_end) > 0 else energenie_df.index[-1]
    return {"sub_df": energenie_df.loc[first_index:last_index], "start_index": first_index, "end_index": last_index, "duration": duration}

out = find_the_energenies_segment("id1", "00:03:00", df, 0)
print(out)
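The function above only returns the first segment. To get the second segment from the question, the search has to start after the end of the previous one, which is what the "last_end_point" variable was meant to track. Below is a minimal sketch of that idea, not the original answer's code. The helper name `select_segment`, its `start_pos` parameter, and the small sample frame are made up for illustration, and it assumes ts_b is sorted and the frame has the default 0..n-1 index.

```python
import pandas as pd

def select_segment(df, start_pos, duration, threshold=0):
    """Return (segment, next_pos) for the first segment at or after start_pos.

    Hypothetical helper: assumes a default integer index and sorted ts_b.
    duration is an "HH:MM:SS" string.
    """
    tail = df.iloc[start_pos:]
    above = tail.index[tail["value"] > threshold]
    if len(above) == 0:
        return None, start_pos
    start_time = df.loc[above[0], "ts_b"]
    end_time = start_time + pd.Timedelta(duration)
    # Keep every row from the first non-zero value up to and including end_time
    segment = df[(df["ts_b"] >= start_time) & (df["ts_b"] <= end_time)]
    return segment, segment.index[-1] + 1

# Made-up sample: one row every 10 seconds
ts = pd.Timestamp("2017-04-27 01:35:30") + pd.to_timedelta(
    [10 * i for i in range(12)], unit="s"
)
df = pd.DataFrame({
    "id_b": "id1",
    "ts_b": ts,
    "value": [0, 0, 1, 4, 5, 0, 0, 0, 7, 8, 9, 0],
})

seg1, next_pos = select_segment(df, 0, "00:00:20")
seg2, next_pos2 = select_segment(df, next_pos, "00:00:10")
print(seg1["value"].tolist())  # [1, 4, 5]
print(seg2["value"].tolist())  # [7, 8]
```

Each call returns the position to resume from, so the segments can be collected one after another with a different duration for each.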