pyspark - Extract value from Cloudant IBM Bluemix NoSQL Database


How do I extract values from a Cloudant (IBM Bluemix NoSQL database) document that is stored in JSON format?

I tried this code:

def readdataframefromcloudant(host, user, pw, database):
    # Load the Cloudant database into a Spark DataFrame
    cloudantdata = spark.read.format("com.cloudant.spark"). \
        option("cloudant.host", host). \
        option("cloudant.username", user). \
        option("cloudant.password", pw). \
        load(database)

    # Register a temporary view so the data can be queried with SQL
    cloudantdata.createOrReplaceTempView("washing")
    spark.sql("select * from washing").show()
    return cloudantdata

hostname = ""
user = ""
pw = ""
database = "database"
cloudantdata = readdataframefromcloudant(hostname, user, pw, database)

The data is stored in this format:

{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}

I want this result:

[Image: expected]

Actual outcome:

[Image: actual outcome]

Create a dummy dataset that reproduces the issue:

cloudantdata = spark.read.json(sc.parallelize(["""
{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}
"""]))
cloudantdata.take(1)

This returns:

[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', d=Row(count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234))]

Now flatten the nested "d" struct so that its fields become top-level columns:

flat_df = cloudantdata.select("_id", "_rev", "d.*")
flat_df.take(1)

This returns:

[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234)]
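To make the effect of the "d.*" selection easier to see, here is a minimal plain-Python sketch of the same flattening applied to a single document, without Spark. The helper name flatten_d and its nested_key parameter are hypothetical and only illustrate the idea; they are not part of PySpark or the Cloudant connector.

```python
import json

# Sample document copied from the question.
doc = json.loads("""
{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}
""")


def flatten_d(document, nested_key="d"):
    """Hypothetical helper: keep the top-level fields and lift the
    fields of the nested object up one level."""
    # Copy every top-level field except the nested object itself.
    flat = {k: v for k, v in document.items() if k != nested_key}
    # Merge the nested fields in as top-level keys.
    flat.update(document.get(nested_key, {}))
    return flat


flat_doc = flatten_d(doc)
print(sorted(flat_doc))         # field names after flattening
print(flat_doc["temperature"])  # a value that was nested under "d"
```

In Spark the select("_id", "_rev", "d.*") call does this for every row at once, so the helper above is only useful for reasoning about the shape of the result.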

I tested this code in an IBM Data Science Experience notebook using Python 3.5 (experimental) with Spark 2.0.

This answer is based on: https://stackoverflow.com/a/45694796/1033422

