@parallel vs. native loops in Julia
I ran the examples below and got these results. With a large number of iterations the parallel version comes out far ahead, while with a smaller number of iterations it turns out worse than the sequential one.
I know there is a little overhead and that is absolutely fine, but is there a way to run a loop with a smaller number of iterations in a parallel way that is better than the sequential way?
x = 0
@time for i = 1:200000000
    x = Int(rand(Bool)) + x
end
7.503359 seconds (200.00 M allocations: 2.980 GiB, 2.66% gc time)

x = @time @parallel (+) for i = 1:200000000
    Int(rand(Bool))
end
0.432549 seconds (3.91 k allocations: 241.138 KiB)
Here I got a better result with the parallel version, but in the following example I did not.
x2 = 0
@time for i = 1:100000
    x2 = Int(rand(Bool)) + x2
end
0.006025 seconds (98.97 k allocations: 1.510 MiB)

x2 = @time @parallel (+) for i = 1:100000
    Int(rand(Bool))
end
0.084736 seconds (3.87 k allocations: 239.122 KiB)
Q: Is there a way to run a loop with a smaller number of iterations in a parallel way that is better than the sequential way?
A: Yes.
1) Acquire more resources (processors to compute, memory to store), if that makes sense for the task; a minimal sketch of doing so is given after this list.
2) Arrange the workflow smarter: benefit from register-based code, harness the full cache-line size upon each first fetch, and deploy re-use wherever possible (hard work? Yes, it is hard work, but why repetitively pay 150+ [ns] instead of having paid once and then re-using the well-aligned neighbouring cells at ~30 [ns] latency cost, if NUMA permits?). A smarter workflow also means code re-designs aimed at increasing the "density"-of-computations in the resulting assembly code, and tweaking the code to better by-pass the (optimising) superscalar processor hardware design tricks, which bring no use / positive benefit in highly-tuned HPC computing payloads. A chunking sketch is shown after this list.
3) Avoid headbangs on blocking resources & bottlenecks (central singularities such as the host's unique hardware source-of-randomness, IO-devices et al.); a task-local-randomness sketch is shown after this list.
4) Get familiar with the optimising compiler's internal options and "shortcuts" -- and with the anti-patterns it generates at the cost of extended run-times.
5) Get the maximum out of the underlying operating system's tweaking. Without doing this, even the optimised code still waits (and a lot) in the O/S-scheduler's queue.
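
For point 1, a minimal sketch of acquiring more compute resources before the reduction runs. It assumes the same pre-0.7 Julia as the question (newer versions need `using Distributed` and `@distributed` instead of `@parallel`), and the worker count is only illustrative:

# spawn extra worker processes so @parallel has something to distribute onto
addprocs(4)          # illustrative count -- match it to the free physical cores
println(nworkers())  # number of workers that will serve the @parallel reduction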
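
For point 2, a sketch (not from the original answer) of one way to raise the per-task density of computations: let each parallel task run a long, tight sequential inner loop over its own contiguous chunk, so the scheduling and communication overhead is paid once per chunk instead of once per iteration. chunked_sum and its parameters are illustrative names:

function chunked_sum(n, nchunks)
    len = div(n, nchunks)           # iterations handled by one task (remainder ignored for brevity)
    @parallel (+) for c = 1:nchunks
        s = 0
        for i = 1:len               # tight, cache/register-friendly inner loop
            s += Int(rand(Bool))
        end
        s                           # the chunk's partial sum feeds the (+) reduction
    end
end

@time chunked_sum(100000, nworkers())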
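
For point 3, a sketch of keeping the source of randomness off the contended path: every task seeds and uses its own generator instead of all tasks funnelling through one shared one (the seeding scheme and chunk sizes are illustrative only):

nchunks = nworkers()                 # one chunk per worker
len     = div(100000, nchunks)
x2 = @parallel (+) for c = 1:nchunks
    rng = MersenneTwister(1234 + c)  # task-local RNG: no contention on a shared generator
    s = 0
    for i = 1:len
        s += Int(rand(rng, Bool))
    end
    s
end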