python 3.x - Efficiently reading multiple CSV files into a Pandas DataFrame


I am trying to read 3 years of data files (one per date), and the portion I am interested in is quite small (~1.4 million rows in total) compared to the parent files (each ~90 MB and ~1.5 million rows). The code below has worked pretty well for me in the past with a smaller number of files, but with 1095 files to process it is crawling (taking 3-4 seconds to read one file). Any suggestions for making it more efficient/fast?

import pandas as pd
from glob import glob

file_list = glob(r'c:\temp2\dl*.csv')

for file in file_list:
    print(file)
    df = pd.read_csv(file, header=None)
    df = df[[0, 1, 3, 4, 5]]
    df2 = df[df[0].isin(det_list)]

    if file_list[0] == file:
        rawdf = df2
    else:
        rawdf = rawdf.append(df2)

IIUC, the main cost here is appending to rawdf inside the loop (each append copies every row accumulated so far, so the work grows quadratically with the number of files) and parsing columns you immediately drop. Try reading only the columns you need and concatenating once at the end:

import pandas as pd
from glob import glob

file_list = glob(r'c:\temp2\dl*.csv')

cols = [0, 1, 3, 4, 5]

df = pd.concat([pd.read_csv(f, header=None, usecols=cols)
                  .add_prefix('c')
                  .query("c0 in @det_list")
                for f in file_list],
               ignore_index=True)
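This should be noticeably faster: usecols tells the CSV parser to skip the unneeded columns entirely, and a single pd.concat replaces the repeated append calls (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so concat is also the forward-compatible choice).

If you prefer plain boolean indexing over query(), here is an equivalent sketch; det_list is assumed to be defined earlier in your script as the collection of IDs to keep:

import pandas as pd
from glob import glob

cols = [0, 1, 3, 4, 5]

def load_one(path):
    # parse only the needed columns, then keep the rows whose first
    # column appears in det_list (assumed defined elsewhere)
    df = pd.read_csv(path, header=None, usecols=cols)
    return df[df[0].isin(det_list)]

df = pd.concat((load_one(f) for f in glob(r'c:\temp2\dl*.csv')),
               ignore_index=True)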
