python - Why does PySpark keep calling Iterator.next after the TaskContext is marked as completed?
I've built a database connector, and in the custom RDD.compute
function I'm adding a listener via TaskContext.addTaskCompletionListener
that closes the database connection (this seems to be common practice among Spark connectors; Spark's HadoopRDD
uses the pattern as well).
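For reference, a minimal sketch of that pattern, assuming a hypothetical database client (the DbClient/DbConnection types below are stand-ins, not the actual connector's API):

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-ins for the real database client used by the connector.
trait DbConnection {
  def rows(): Iterator[String]
  def close(): Unit
}
object DbClient {
  def connect(url: String): DbConnection = new DbConnection {
    def rows(): Iterator[String] = Iterator("row1", "row2", "row3")
    def close(): Unit = println(s"closing connection to $url")
  }
}

case class DbPartition(index: Int) extends Partition

class DatabaseRDD(sc: SparkContext, url: String) extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] = Array(DbPartition(0))

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val conn = DbClient.connect(url)
    // Close the connection once the task is marked as completed, whether or
    // not the iterator was fully consumed -- the same cleanup idea HadoopRDD
    // uses. (This listener signature assumes Spark 2.4+.)
    context.addTaskCompletionListener[Unit] { _ => conn.close() }
    conn.rows()
  }
}
```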
I have a PySpark application that creates an RDD from the database data and calls rdd.take(1)
on that RDD (1 partition, 1 local worker thread). The problem is that PySpark keeps calling Iterator.next
(on the custom iterator as well) after the task completion handler has fired.
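To make the interaction concrete, here is a minimal sketch (not the connector's actual code) of a defensive wrapper that stops serving rows once the TaskContext is completed, so next() calls arriving after the listener has closed the connection fail fast instead of touching a closed connection:

```scala
import org.apache.spark.TaskContext

// Hypothetical wrapper around the connector's row iterator: once the task
// is marked completed, report exhaustion / fail instead of delegating to
// the underlying iterator, whose connection has already been closed.
class TaskBoundIterator[T](context: TaskContext, underlying: Iterator[T])
  extends Iterator[T] {

  override def hasNext: Boolean =
    !context.isCompleted() && underlying.hasNext

  override def next(): T = {
    if (context.isCompleted()) {
      throw new IllegalStateException("next() called after task completion")
    }
    underlying.next()
  }
}
```

In the sketch above, compute would return new TaskBoundIterator(context, conn.rows()) instead of the raw iterator; this only guards against the symptom, though, not the underlying question of why PySpark keeps pulling.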
This doesn't happen when a Scala application calls take(1) on the same RDD,
and in the Spark source code PySpark's rdd.take
indeed differs from Scala's RDD.take.

I'd appreciate any thoughts on this issue.