python 3.x - Efficiently reading multiple CSV files into a Pandas dataframe


I am trying to read 3 years of data files (one for each date), and the portion I am interested in is quite small (~1.4 million rows total) compared to the parent files (each 90 MB and ~1.5 million rows). The code below has worked pretty well for me in the past with a smaller number of files, but with 1095 files to process it is crawling (taking 3-4 seconds to read one file). Any suggestions for making it more efficient/faster?

    import pandas as pd
    from glob import glob

    file_list = glob(r'c:\temp2\dl*.csv')

    for file in file_list:
        print(file)
        df = pd.read_csv(file, header=None)
        df = df[[0, 1, 3, 4, 5]]
        df2 = df[df[0].isin(det_list)]

        if file_list[0] == file:
            rawdf = df2
        else:
            rawdf = rawdf.append(df2)

IIUC, try this:

    import pandas as pd
    from glob import glob

    file_list = glob(r'c:\temp2\dl*.csv')

    cols = [0, 1, 3, 4, 5]

    df = pd.concat([pd.read_csv(f, header=None, usecols=cols)
                      .add_prefix('c')
                      .query("c0 in @det_list")
                    for f in file_list],
                   ignore_index=True)
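Two things make this faster than the original loop: `usecols` tells the parser to skip the unneeded columns entirely, and building a list and calling `pd.concat` once avoids the repeated copying that `rawdf.append(df2)` performs on every iteration (`DataFrame.append` was also removed in pandas 2.0, so the concat approach is the forward-compatible one). If holding even one whole file in memory is a concern, you can also stream each file in chunks and filter as you go; a minimal sketch, assuming `det_list` is the same collection of column-0 values used in the question:

    import pandas as pd
    from glob import glob

    file_list = glob(r'c:\temp2\dl*.csv')
    cols = [0, 1, 3, 4, 5]

    parts = []
    for f in file_list:
        # Read each file in pieces so only the filtered rows stay in memory
        for chunk in pd.read_csv(f, header=None, usecols=cols, chunksize=100_000):
            parts.append(chunk[chunk[0].isin(det_list)])

    # Single concat at the end; rename columns to match the answer above
    df = pd.concat(parts, ignore_index=True).add_prefix('c')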
