postgresql - PySpark: read a column of JSONs as an RDD
I am wondering what the most performant (and correct) way to do this is.
I have a Postgres table containing multiple columns, one of which is jsonb, and I want to load that column into a Spark RDD.
Here is what I have:

from pyspark.sql import DataFrameReader

url = myPostgresUrl
df = sqlContext.read.json(
    DataFrameReader(sqlContext).jdbc(
        url='jdbc:%s' % url,
        table='"mySchema".myTable',
        properties=properties,
    ).select('myJsonColumn').rdd.map(lambda r: r.myJsonColumn)
)
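For reference, `properties` here is just the standard JDBC properties dict; mine looks roughly like this (connection details are placeholders, not my real values):

# Placeholder connection details for illustration
properties = {
    'user': 'myUser',
    'password': 'myPassword',
    'driver': 'org.postgresql.Driver',
}
myPostgresUrl = 'postgresql://myHost:5432/myDb'  # prefixed with 'jdbc:' above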
This works, but it is slow (several minutes for 1M rows).
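I also wondered whether partitioning the JDBC read would help, since the code above pulls everything through a single connection. A sketch of what I mean, assuming the table has a numeric id column to split on (the column name, bounds, and partition count are made up):

# Partitioned JDBC read: Spark opens numPartitions parallel connections,
# each fetching one slice of the id range.
df = sqlContext.read.jdbc(
    url='jdbc:%s' % url,
    table='"mySchema".myTable',
    column='id',          # hypothetical numeric partitioning column
    lowerBound=1,
    upperBound=1000000,
    numPartitions=16,
    properties=properties,
)
json_rdd = df.select('myJsonColumn').rdd.map(lambda r: r.myJsonColumn)
parsed = sqlContext.read.json(json_rdd)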
Thanks a lot.