python - Multicore processing of multiple files and writing to a shared output file
So, I have 30 files, each 1 GB in size, and I am reading them sequentially on a Mac with 16 GB RAM and 4 CPU cores. Processing each one of them is independent of the others, and the full run takes 2 hours. Each file holds one day of data (a 24-hour time series), so there are 30 days of data in total. After processing, I append to the output file day-wise (i.e. day 1, day 2, and so on).
Can I solve this problem using multiprocessing? Will it have side effects (like thrashing, etc.)? It would be great if someone could guide me on the pattern. I have read about multiprocessing, pools, and imap, but it is still not clear to me how to write the file sequentially (i.e. day-wise).
My approach (either one of the below):
- Use imap, which yields output in the order I am looking for (a sketch follows below), or
- Write an individual output file for each input file, then merge them into one by sorting.
Is there a better pattern to solve this problem? Do I need to use a queue here? I am confused!
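A minimal sketch of approach one, assuming a process_day function that returns one day's processed output as a string (the function name and file names here are placeholders, not from the original): Pool.imap yields results in submission order, so the parent process can append to the single output file day by day even if a later day finishes processing first.

import multiprocessing as mp

def process_day(input_path):
    """Placeholder: process one day's 1 GB file and return its output as a string."""
    with open(input_path) as f:
        data = f.read()
    return data  # replace with the real per-day processing

if __name__ == '__main__':
    # assumed naming scheme; substitute the real 30 input file names, in day order
    inputs = ['day%02d.input.txt' % d for d in range(1, 31)]
    with mp.Pool(processes=4) as pool, open('combined.output.txt', 'w') as out:
        # imap yields results in submission order, so the parent writes
        # day 1, day 2, ... even if a later day finishes first
        for result in pool.imap(process_day, inputs):
            out.write(result)

If the per-day output is large, returning it through the pool's pipe is heavy; having each worker write a temporary file and return its path may be lighter.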
Basic demo of approach two:

from concurrent.futures import ProcessPoolExecutor

inputs = ['a.input.txt', ]
outputs = ['a.output.txt', ]

def process(input_file, output_file):
    """Process one input file at a time, writing its own output file."""
    pass

def merge(files):
    """Merge the per-day output files into one."""
    pass

if __name__ == '__main__':
    # guard needed so worker processes do not re-run the submissions on import
    with ProcessPoolExecutor(max_workers=10) as executor:
        for i in range(len(inputs)):
            executor.submit(process, inputs[i], outputs[i])
    # the with-block waits for all days to finish before merging
    merge(outputs)
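For the merge step of approach two, a minimal sketch (assuming the per-day output files are plain text, the outputs list is already in day order, and the combined file name is a placeholder) is a straight concatenation:

import shutil

def merge(files, combined='final.output.txt'):
    """Concatenate the per-day output files into one file, in list (day) order."""
    with open(combined, 'wb') as out:
        for path in files:
            with open(path, 'rb') as f:
                shutil.copyfileobj(f, out)

Because the outputs list is built in day order, no sorting is needed at merge time.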