pyspark - Spark not using all nodes after resizing
I've got an EMR cluster that I'm trying to use to execute large text-processing jobs. I had it running on a smaller cluster, but after resizing, the master keeps running the jobs locally and crashing due to memory issues.
This is the current configuration I have for the cluster:
[ { "classification":"capacity-scheduler", "properties": { "yarn.scheduler.capacity.resource-calculator":"org.apache.hadoop.yarn.util.resource.dominantresourcecalculator" }, "configurations":[] }, { "classification":"spark", "properties": { "maximizeresourceallocation":"true" }, "configurations":[] }, { "classification":"spark-defaults", "properties": { "spark.executor.instances":"0", "spark.dynamicallocation.enabled":"true" }, "configurations":[] } ]
This was a potential solution I saw in this question, and it did work before I resized.
Now, whenever I attempt to submit a Spark job with spark-submit mytask.py, I see tons of log entries where the work doesn't seem to leave the master host, like so:
17/08/14 23:49:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 405141 bytes)
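As a quick sanity check, I added a snippet like the following to mytask.py (this is just my own debugging code, assuming the script builds its own SparkSession; the app name is arbitrary) to print where Spark thinks it's running:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mytask").getOrCreate()
    sc = spark.sparkContext

    # On a proper YARN submission this should print "yarn"; when the
    # job runs locally on the master it shows a local master instead
    print("master:", sc.master)

    # With the core nodes attached, this should be well above the
    # master node's own core count
    print("default parallelism:", sc.defaultParallelism)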
I've tried different parameters, like setting --deploy-mode cluster and --master yarn, since YARN is running on the master node, but I'm still seeing the work being done by the master host while the core nodes sit idle.
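For reference, this is roughly the full command I've been trying; the executor counts and sizes here are just guesses on my part, not values from any documentation:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-cores 4 \
      --executor-memory 8g \
      mytask.py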
Is there some configuration I'm missing, preferably one that doesn't require rebuilding the cluster?