Python 3.x - Efficiently reading multiple CSV files into a Pandas DataFrame
I'm trying to read 3 years of data files (one per date), and the portion I'm interested in is quite small (~1.4 million rows in total) compared to the parent files (each ~90 MB and ~1.5 million rows). The code below has worked pretty well for me in the past with a smaller number of files, but with 1095 files to process it is crawling (taking 3-4 seconds to read one file). Any suggestions for making it more efficient/faster?
import pandas as pd
from glob import glob

file_list = glob(r'c:\temp2\dl*.csv')
for file in file_list:
    print(file)
    df = pd.read_csv(file, header=None)
    df = df[[0, 1, 3, 4, 5]]
    df2 = df[df[0].isin(det_list)]
    if file_list[0] == file:
        rawdf = df2
    else:
        rawdf = rawdf.append(df2)
IIUC, the two main costs in your loop are appending to rawdf on every iteration (each append copies the entire accumulated frame, so the cost grows quadratically with the number of files) and parsing columns you immediately throw away. Reading only the needed columns with usecols and concatenating once at the end addresses both:
import pandas as pd
from glob import glob

file_list = glob(r'c:\temp2\dl*.csv')
cols = [0, 1, 3, 4, 5]

df = pd.concat([pd.read_csv(f, header=None, usecols=cols)
                  .add_prefix('c')            # integer labels become 'c0', 'c1', ...
                  .query("c0 in @det_list")   # keep only rows whose first column is in det_list
                for f in file_list],
               ignore_index=True)
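If the method-chaining/query syntax feels opaque, here is an equivalent loop-based sketch of the same idea (assuming, as in the question, that det_list is defined elsewhere in your script; converting it to a set is a small extra assumption on my part, since isin accepts either):

import pandas as pd
from glob import glob

file_list = glob(r'c:\temp2\dl*.csv')
cols = [0, 1, 3, 4, 5]
det_set = set(det_list)  # assumed defined elsewhere, as in the question

frames = []
for f in file_list:
    # usecols means pandas never parses the discarded columns
    df = pd.read_csv(f, header=None, usecols=cols)
    # filter each file down to the wanted rows before accumulating
    frames.append(df[df[0].isin(det_set)])

# a single concat at the end avoids the quadratic cost of appending in a loop
rawdf = pd.concat(frames, ignore_index=True)

The key design choice in both versions is the same: filter each file as soon as it is read, accumulate the small pieces, and concatenate exactly once, rather than growing rawdf with append inside the loop.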