pyspark - Spark not using all nodes after resizing


I've got an EMR cluster that I'm trying to use to execute large text processing jobs. I had this working on a smaller cluster, but after resizing, the master keeps running the jobs locally and crashing due to memory issues.

This is the current configuration I have for the cluster:

 [
     {
         "Classification": "capacity-scheduler",
         "Properties": {
             "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
         },
         "Configurations": []
     },
     {
         "Classification": "spark",
         "Properties": {
             "maximizeResourceAllocation": "true"
         },
         "Configurations": []
     },
     {
         "Classification": "spark-defaults",
         "Properties": {
             "spark.executor.instances": "0",
             "spark.dynamicAllocation.enabled": "true"
         },
         "Configurations": []
     }
 ]

This was a potential solution I saw in this question, and it did work before I resized the cluster.
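
For what it's worth, one way to confirm whether those classifications actually took effect after the resize is to inspect the rendered Spark defaults on the master node. The path below is the usual location on a standard EMR release, so treat it as an assumption for your setup:

 # Usual location of the rendered Spark defaults on an EMR node (assumption: standard EMR release)
 grep -E 'spark.master|spark.executor|dynamicAllocation' /etc/spark/conf/spark-defaults.conf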

Now, whenever I attempt to submit a Spark job with spark-submit mytask.py, I see tons of log entries where the work doesn't seem to leave the master host, like so:

 17/08/14 23:49:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 405141 bytes)
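
A quick way to confirm this from inside the script itself is to print where the SparkContext thinks it is running; if it reports local[*] instead of yarn, the job never reached the resource manager at all. A rough sketch (the app name here is just a placeholder):

 from pyspark.sql import SparkSession

 spark = SparkSession.builder.appName("mytask").getOrCreate()
 sc = spark.sparkContext

 # Expect "yarn" when the job is handed to the resource manager;
 # "local[*]" means everything is running inside the driver on the master.
 print("master:", sc.master)
 print("default parallelism:", sc.defaultParallelism)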

I've tried different parameters, like setting --deploy-mode cluster and --master yarn, since YARN is running on the master node, but I'm still seeing all the work being done by the master host while the core nodes sit idle.
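
For reference, a fully explicit submission would look something like the sketch below; the executor count and sizes are placeholder values (they should normally be unnecessary with maximizeResourceAllocation and dynamic allocation), but pinning them is one way to rule out the allocation settings:

 spark-submit \
   --master yarn \
   --deploy-mode cluster \
   --num-executors 4 \
   --executor-memory 4g \
   --executor-cores 2 \
   mytask.py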

Is there a configuration setting I'm missing, preferably one that doesn't require rebuilding the cluster?

