pyspark - Extract values from a Cloudant IBM Bluemix NoSQL database
How can I extract values from a Cloudant IBM Bluemix NoSQL database where the data is stored in JSON format?
I tried this code:
def readDataFrameFromCloudant(host, user, pw, database):
    # Load the Cloudant database into a Spark DataFrame
    cloudantdata = spark.read.format("com.cloudant.spark"). \
        option("cloudant.host", host). \
        option("cloudant.username", user). \
        option("cloudant.password", pw). \
        load(database)

    # Register a temporary view so the data can be queried with SQL
    cloudantdata.createOrReplaceTempView("washing")
    spark.sql("SELECT * FROM washing").show()
    return cloudantdata

hostname = ""
user = ""
pw = ""
database = "database"
cloudantdata = readDataFrameFromCloudant(hostname, user, pw, database)
It is stored in this format:
{
  "_id": "31c24a382f3e4d333421fc89ada5361e",
  "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
  "d": {
    "count": 9,
    "hardness": 72,
    "temperature": 85,
    "flowrate": 11,
    "fluidlevel": "acceptable",
    "ts": 1502677759234
  }
}
I want this result:
Expected:
Actual outcome:
Create a dummy dataset to reproduce the issue:
cloudantdata = spark.read.json(sc.parallelize(["""
{
    "_id": "31c24a382f3e4d333421fc89ada5361e",
    "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
    "d": {
        "count": 9,
        "hardness": 72,
        "temperature": 85,
        "flowrate": 11,
        "fluidlevel": "acceptable",
        "ts": 1502677759234
    }
}
"""]))
cloudantdata.take(1)
Returns:
[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', d=Row(count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234))]
Now flatten the nested "d" struct into top-level columns:
flat_df = cloudantdata.select("_id", "_rev", "d.*")
flat_df.take(1)
Returns:
[Row(_id='31c24a382f3e4d333421fc89ada5361e', _rev='1-8ba1be454fed5b48fa493e9fe97bedae', count=9, flowrate=11, fluidlevel='acceptable', hardness=72, temperature=85, ts=1502677759234)]
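To see what selecting "d.*" does to the document without needing a Spark session, here is a minimal plain-Python sketch of the same flattening applied to the sample document. It is only an illustration: flatten_doc is a hypothetical helper, not part of Spark or the Cloudant connector, and it works on a single dict rather than a DataFrame.

```python
import json

# The sample Cloudant document from the question.
doc = json.loads("""
{
    "_id": "31c24a382f3e4d333421fc89ada5361e",
    "_rev": "1-8ba1be454fed5b48fa493e9fe97bedae",
    "d": {
        "count": 9,
        "hardness": 72,
        "temperature": 85,
        "flowrate": 11,
        "fluidlevel": "acceptable",
        "ts": 1502677759234
    }
}
""")


def flatten_doc(document, nested_key="d"):
    """Hypothetical helper: lift the fields under nested_key to the top level."""
    # Keep every top-level field except the nested object itself.
    flat = {k: v for k, v in document.items() if k != nested_key}
    # Add the nested fields as top-level fields.
    flat.update(document.get(nested_key, {}))
    return flat


flat = flatten_doc(doc)
print(sorted(flat))
```

In this sketch a nested field with the same name as a top-level field would overwrite it, so it is only suitable for documents shaped like the sample above.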
I tested this code in an IBM Data Science Experience notebook using Python 3.5 (Experimental) with Spark 2.0.
This answer is based on: https://stackoverflow.com/a/45694796/1033422