python - Multicore processing of multiple files and writing to a shared output file


So, I have 30 files, each 1 GB in size, and I am reading them sequentially on a Mac with 16 GB RAM and 4 CPU cores. The processing of each one of them is independent of the others, and the whole run takes 2 hours to complete. Each file holds one day of data (a 24-hour time series), so there are 30 days of data. After processing, I append to an output file day-wise (i.e. day 1, day 2, and so on).

Can I solve this problem using multiprocessing? Will it have side effects (like thrashing etc.)? It would be great if someone could guide me on a pattern. I have read about multiprocessing, pools and imap, but it is still not clear to me how to write the output file sequentially (i.e. day-wise).

My approach (either one of the below):

  1. Use imap, since it yields the ordered output I am looking for (sketched below), or
  2. Write an individual output file for each input file and merge them into one by sorting.

Is there a better pattern to solve this problem? Do I need to use a queue here? I am confused!
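For reference, a minimal sketch of approach one, assuming one input file per day and a process_day function that returns that day's processed output as a string (the file names and function are hypothetical, just to show the imap pattern):

from multiprocessing import Pool

# hypothetical file names, one input file per day
inputs = ['day01.input.txt', 'day02.input.txt']


def process_day(path):
    """Process one day's file and return the processed output as a string."""
    with open(path) as f:
        data = f.read()
    # ... heavy per-day processing goes here ...
    return data


if __name__ == '__main__':
    # imap yields results in the same order as `inputs`, even if the workers
    # finish out of order, so the combined file stays day-wise.
    with Pool(processes=4) as pool, open('combined.output.txt', 'w') as out:
        for result in pool.imap(process_day, inputs):
            out.write(result)

One thing to keep in mind with this pattern is that each worker's result is pickled and sent back to the parent process, so if the per-day output is large it may be cheaper to have each worker write its own file and merge them afterwards (i.e. approach two).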

Basic demo of approach two:

from concurrent.futures import ProcessPoolExecutor

inputs = ['a.input.txt', ]
outputs = ['a.output.txt', ]


def process(input, output):
    """Process one file at a time."""
    pass


def merge(files):
    """Merge the output files."""
    pass


with ProcessPoolExecutor(max_workers=10) as executor:
    for i in range(len(inputs)):
        executor.submit(process, inputs[i], outputs[i])

# the with-block waits for all submitted tasks to finish before merging
merge(outputs)
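And a minimal sketch of the merge step, assuming each output file name encodes its day so that sorting the list gives day order (a hypothetical naming convention such as day01.output.txt, day02.output.txt):

import shutil


def merge(files, combined='combined.output.txt'):
    """Concatenate the per-day output files, in day order, into one file."""
    with open(combined, 'wb') as out:
        for path in sorted(files):  # assumes names sort in day order, e.g. day01, day02, ...
            with open(path, 'rb') as f:
                shutil.copyfileobj(f, out)  # copies in chunks, so large files are not loaded into memory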
