python - Why does PySpark keep calling Iterator.next after the TaskContext is marked as completed?
I've built a database connector, and in the custom RDD.compute
function I'm adding a listener via TaskContext.addTaskCompletionListener
that closes the database connection (this seems to be common practice among Spark connectors; Spark's HadoopRDD
uses the pattern as well).
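For reference, a minimal sketch of that pattern, assuming a hypothetical database client (the DbClient/DbConnection types below are stand-ins, not the actual connector's API):

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for the real database client used by the connector.
trait DbConnection {
  def rows(): Iterator[String]
  def close(): Unit
}
object DbClient {
  def connect(url: String): DbConnection = new DbConnection {
    def rows(): Iterator[String] = Iterator("row1", "row2", "row3")
    def close(): Unit = println(s"closing connection to $url")
  }
}

case class DbPartition(index: Int) extends Partition

class DatabaseRDD(sc: SparkContext, url: String) extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = Array(DbPartition(0))

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val conn = DbClient.connect(url)
    // Close the connection once the task is marked as completed, whether or
    // not the iterator was fully consumed -- the same cleanup idea HadoopRDD
    // uses. (This listener signature assumes Spark 2.4+.)
    context.addTaskCompletionListener[Unit] { _ => conn.close() }
    conn.rows()
  }
}
```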
I have a PySpark application that creates an RDD from the database data and calls rdd.take(1)
on that RDD (1 partition, 1 local worker thread). The problem is that PySpark keeps calling Iterator.next
(on the custom iterator as well) after the task completion handler has fired.
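To make the interaction concrete, here is a minimal sketch (not the connector's actual code) of a defensive wrapper that stops serving rows once the TaskContext is completed, so next() calls arriving after the listener has closed the connection fail fast instead of touching a closed connection:

```scala
import org.apache.spark.TaskContext

// Hypothetical wrapper around the connector's row iterator: once the task
// is marked completed, report exhaustion / fail instead of delegating to
// the underlying iterator, whose connection has already been closed.
class TaskBoundIterator[T](context: TaskContext, underlying: Iterator[T])
  extends Iterator[T] {

  override def hasNext: Boolean =
    !context.isCompleted() && underlying.hasNext

  override def next(): T = {
    if (context.isCompleted()) {
      throw new IllegalStateException("next() called after task completion")
    }
    underlying.next()
  }
}
```

In the sketch above, compute would return new TaskBoundIterator(context, conn.rows()) instead of the raw iterator; this only guards against the symptom, though, not the underlying question of why PySpark keeps pulling.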
This doesn't happen when a Scala application calls take(1) on the same RDD,
and in the Spark source code PySpark's rdd.take
indeed differs from Scala's RDD.take.

I'd appreciate any thoughts on this issue.