pyspark - Spark not using all nodes after resizing
I've got an EMR cluster that I'm trying to use to execute large text-processing jobs. I had it running on a smaller cluster, but after resizing, the master keeps running the jobs locally and crashing due to memory issues.
This is the current configuration I have for the cluster:
[ { "classification":"capacity-scheduler", "properties": { "yarn.scheduler.capacity.resource-calculator":"org.apache.hadoop.yarn.util.resource.dominantresourcecalculator" }, "configurations":[] }, { "classification":"spark", "properties": { "maximizeresourceallocation":"true" }, "configurations":[] }, { "classification":"spark-defaults", "properties": { "spark.executor.instances":"0", "spark.dynamicallocation.enabled":"true" }, "configurations":[] } ]
This was a potential solution I saw in this question, and it did work before I resized.
Now, whenever I attempt to submit a Spark job with spark-submit mytask.py, I see tons of log entries where the work doesn't seem to leave the master host, like so:
17/08/14 23:49:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 405141 bytes)
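As a quick sanity check, I added a snippet like the following to mytask.py (this is just my own debugging code, assuming the script builds its own SparkSession; the app name is arbitrary) to print where Spark thinks it's running:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mytask").getOrCreate()
    sc = spark.sparkContext

    # On a proper YARN submission this should print "yarn"; when the
    # job runs locally on the master it shows a local master instead
    print("master:", sc.master)

    # With the core nodes attached, this should be well above the
    # master node's own core count
    print("default parallelism:", sc.defaultParallelism)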
I've tried different parameters, like setting --deploy-mode cluster and --master yarn, since YARN is running on the master node, but I'm still seeing the work being done by the master host while the core nodes sit idle.
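For reference, this is roughly the full command I've been trying; the executor counts and sizes here are just guesses on my part, not values from any documentation:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 4 \
      --executor-cores 4 \
      --executor-memory 8g \
      mytask.py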
Is there some configuration I'm missing, preferably one that doesn't require rebuilding the cluster?