python - Installing pandas 0.20.3 on Google Cloud Dataflow takes a very long time -


when using apache beam python sdk 2.0.0 on google cloud dataflow, takes forever (about 8 minutes) install pandas 0.20.3. install hangs on message running setup.py bdist_wheel pandas: still running.... on machine, however, installing same version of pandas doesn't take 30 seconds (even after clearing pip cache). installing pandas takes third of cost of running pipeline right now. ideas on why takes time?

dataflow sdk stages dependencies in source form because client architecture not match vms used dataflow workers. cause pandas installed sources , compiled on vms taking long time.

it possible solve using --extra_package flag , pointing whl file. pandas can use corresponding whl file (py27, x86_64) pypi page of pandas.


Comments

Popular posts from this blog

python Tkinter Capturing keyboard events save as one single string -

android - InAppBilling registering BroadcastReceiver in AndroidManifest -

javascript - Z-index in d3.js -