apache spark - How to format JSON string after conversion from pyspark dataframe


I have converted a DataFrame to JSON using toJSON in PySpark, which gives me one JSON string per row. I want to reformat the result a bit.

My code is given below:

from pyspark.sql import SparkSession

# Raw string so the backslashes in the Windows path are kept literally
spark = (
    SparkSession.builder
    .config("spark.sql.warehouse.dir", r"c:\spark\spark-warehouse")
    .appName("testapp")
    .enableHiveSupport()
    .getOrCreate()
)

sqlstring = "select lflow1.leasetype leasetype, lflow1.status status, lflow1.property property, lflow1.city city, lesflow2.dealtype dealtype, lesflow2.area area, lflow1.did did, lesflow2.mid mid from lflow1, lesflow2 where lflow1.did = lesflow2.mid"

def querybuilder(sqlval):
    # Run the query, display the result, and return the DataFrame
    df = spark.sql(sqlval)
    df.show()
    return df

result = querybuilder(sqlstring)
resultlist = result.toJSON().collect()  # list with one JSON string per row
print(resultlist)
print("type of", type(resultlist))

After this, the output is:

[
    '{"leasetype":"offer lease","status":"fully executed","property":"10230104","city":"edmonton","dealtype":"renewal","area":"2312","did":"79cc3959ffc8403f943ff0e7e93584f8","mid":"79cc3959ffc8403f943ff0e7e93584f8"}',
    '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"784","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"2223","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"leasetype":"offer lease","status":"conditional","property":"106portw","city":"toronto","dealtype":"renewal","area":"2212","did":"69c3af0527014fd99d1ccf156c0bce93","mid":"69c3af0527014fd99d1ccf156c0bce93"}',
    '{"leasetype":"offer lease","status":"fully executed","property":"106portw","city":"toronto","dealtype":"0","area":"","did":"04aedb01da5d44fead7e1315115c2530","mid":"04aedb01da5d44fead7e1315115c2530"}'
]

But I want it formatted as a JSON array. For example, the following for 2 rows:

[
    {
        "leasetype": "offer lease",
        "status": "fully executed",
        "property": "10230104",
        "city": "edmonton",
        "dealtype": "renewal",
        "area": "2312",
        "did": "79cc3959ffc8403f943ff0e7e93584f8",
        "mid": "79cc3959ffc8403f943ff0e7e93584f8"
    },
    {
        "leasetype": "offer renew",
        "status": "fully executed",
        "property": "1040hami",
        "city": "vancouver",
        "dealtype": "renewal",
        "area": "784",
        "did": "ecf922d0583247c0a4cb591bd4b3843e",
        "mid": "ecf922d0583247c0a4cb591bd4b3843e"
    }
]

I want to omit the ' (the single quotes around each row) here.

Kindly help me figure this out.

Hope this helps!

import re
import json

resultlist = [
    '{"leasetype":"offer lease","status":"fully executed","property":"10230104","city":"edmonton","dealtype":"renewal","area":"2312","did":"79cc3959ffc8403f943ff0e7e93584f8","mid":"79cc3959ffc8403f943ff0e7e93584f8"}',
    '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"784","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"2223","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"leasetype":"offer lease","status":"conditional","property":"106portw","city":"toronto","dealtype":"renewal","area":"2212","did":"69c3af0527014fd99d1ccf156c0bce93","mid":"69c3af0527014fd99d1ccf156c0bce93"}',
    '{"leasetype":"offer lease","status":"fully executed","property":"106portw","city":"toronto","dealtype":"0","area":"","did":"04aedb01da5d44fead7e1315115c2530","mid":"04aedb01da5d44fead7e1315115c2530"}'
]

# Strip the single quotes from the list's string form, leaving JSON array text.
# Note: this breaks if any value itself contains an apostrophe.
data_to_dump = re.sub(r"'", "", str(resultlist))

# Parse the text, then dump it with indentation. Calling json.dumps on the raw
# string would only wrap it in one escaped JSON string.
json_data = json.dumps(json.loads(data_to_dump), indent=4)
print(json_data)
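As a possible alternative that avoids editing the list's string form, each row can be parsed with json.loads and the resulting list of dicts serialized once with json.dumps. The single quotes in the printed output are only how Python displays str elements when a list is printed; they are not part of the data. The following is a minimal sketch, not the only way to do it: the names rows_to_json_array and sample_rows are hypothetical, and the two sample rows are shortened versions of the rows shown above.

```python
import json


def rows_to_json_array(rows, indent=4):
    """Parse each per-row JSON string and serialize all rows as one JSON array.

    `rows` is assumed to be a list of JSON object strings, such as the
    list returned by toJSON().collect().
    """
    records = [json.loads(row) for row in rows]
    return json.dumps(records, indent=indent)


# Hypothetical, shortened sample rows in the same shape as the output above.
sample_rows = [
    '{"leasetype":"offer lease","city":"edmonton","area":"2312"}',
    '{"leasetype":"offer renew","city":"vancouver","area":"784"}',
]

json_array = rows_to_json_array(sample_rows)
print(json_array)
```

Because every row is parsed as real JSON, values that contain an apostrophe are handled correctly, which the quote-stripping approach cannot guarantee.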
