postgresql - PySpark: read a column of JSONs as an RDD -


I'm wondering what the most performant (and correct) way to do this is.

I have a Postgres table containing multiple columns, one of which is jsonb, and I want to load it into a Spark RDD.

Here is what I have so far:

from pyspark.sql import DataFrameReader

url = mypostgresurl
df = sqlContext.read.json(
    DataFrameReader(sqlContext)
    .jdbc(url='jdbc:%s' % url, table='"myschema".mytable', properties=properties)
    .select('myjsoncolumn')         # keep only the jsonb column
    .rdd
    .map(lambda r: r.myjsoncolumn)  # RDD of raw JSON strings
)

This works, but it is slow (several minutes for 1M rows).
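For what it's worth, this pipeline reads the table through a single JDBC connection (no partitioning options are given), and sqlContext.read.json then has to scan the strings an extra time just to infer a schema. Below is a minimal sketch of two common mitigations; it assumes Spark 2.1+ for from_json, a hypothetical numeric id column to partition the read on, and placeholder field names in the schema:

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Partitioned JDBC read: without column/lowerBound/upperBound/numPartitions,
# Spark pulls the whole table through one connection in a single task.
df = sqlContext.read.jdbc(
    url='jdbc:%s' % url,
    table='"myschema".mytable',
    column='id',          # hypothetical numeric column to split the read on
    lowerBound=1,
    upperBound=1000000,   # rough min/max of id; only affects partition sizing
    numPartitions=16,
    properties=properties,
)

# Parse the jsonb column in place with an explicit schema, so Spark never
# rescans the data to infer one (the fields here are placeholders).
json_schema = StructType([
    StructField('some_field', StringType(), True),
])

parsed = df.select(from_json(col('myjsoncolumn'), json_schema).alias('j')) \
           .select('j.*')

If an RDD of parsed rows is what's needed downstream, parsed.rdd should still be cheaper than the original approach, since the schema is supplied up front rather than inferred.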

Thanks a lot.

