Apache Spark - How to format JSON string after conversion from PySpark DataFrame
I have converted a data frame to JSON using toJSON in PySpark, which gives me each row as a JSON string. I want to reformat it a bit.
My code is given below:
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.sql.warehouse.dir", r"c:\spark\spark-warehouse")
             .appName("testapp")
             .enableHiveSupport()
             .getOrCreate())

    sqlstring = ("select lflow1.leasetype as leasetype, lflow1.status as status, "
                 "lflow1.property as property, lflow1.city as city, "
                 "lesflow2.dealtype as dealtype, lesflow2.area as area, "
                 "lflow1.did as did, lesflow2.mid as mid "
                 "from lflow1, lesflow2 where lflow1.did = lesflow2.mid")

    def querybuilder(sqlval):
        df = spark.sql(sqlval)
        df.show()
        return df

    result = querybuilder(sqlstring)
    resultlist = result.toJSON().collect()
    print(resultlist)
    print("type of", type(resultlist))

After this, the output is:
    [
      '{"leasetype":"offer lease","status":"fully executed","property":"10230104","city":"edmonton","dealtype":"renewal","area":"2312","did":"79cc3959ffc8403f943ff0e7e93584f8","mid":"79cc3959ffc8403f943ff0e7e93584f8"}',
      '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"784","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
      '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"2223","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
      '{"leasetype":"offer lease","status":"conditional","property":"106portw","city":"toronto","dealtype":"renewal","area":"2212","did":"69c3af0527014fd99d1ccf156c0bce93","mid":"69c3af0527014fd99d1ccf156c0bce93"}',
      '{"leasetype":"offer lease","status":"fully executed","property":"106portw","city":"toronto","dealtype":"0","area":"","did":"04aedb01da5d44fead7e1315115c2530","mid":"04aedb01da5d44fead7e1315115c2530"}'
    ]

But I want the format to be a JSON array. For example, for the first 2 rows:
    [
      {
        "leasetype": "offer lease",
        "status": "fully executed",
        "property": "10230104",
        "city": "edmonton",
        "dealtype": "renewal",
        "area": "2312",
        "did": "79cc3959ffc8403f943ff0e7e93584f8",
        "mid": "79cc3959ffc8403f943ff0e7e93584f8"
      },
      {
        "leasetype": "offer renew",
        "status": "fully executed",
        "property": "1040hami",
        "city": "vancouver",
        "dealtype": "renewal",
        "area": "784",
        "did": "ecf922d0583247c0a4cb591bd4b3843e",
        "mid": "ecf922d0583247c0a4cb591bd4b3843e"
      }
    ]

I want to omit the ' (single quote) around each row here.
Kindly help me figure this out.
Hope this helps!
    import re
    import json

    resultlist = [
        '{"leasetype":"offer lease","status":"fully executed","property":"10230104","city":"edmonton","dealtype":"renewal","area":"2312","did":"79cc3959ffc8403f943ff0e7e93584f8","mid":"79cc3959ffc8403f943ff0e7e93584f8"}',
        '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"784","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
        '{"leasetype":"offer renew","status":"fully executed","property":"1040hami","city":"vancouver","dealtype":"renewal","area":"2223","did":"ecf922d0583247c0a4cb591bd4b3843e","mid":"ecf922d0583247c0a4cb591bd4b3843e"}',
        '{"leasetype":"offer lease","status":"conditional","property":"106portw","city":"toronto","dealtype":"renewal","area":"2212","did":"69c3af0527014fd99d1ccf156c0bce93","mid":"69c3af0527014fd99d1ccf156c0bce93"}',
        '{"leasetype":"offer lease","status":"fully executed","property":"106portw","city":"toronto","dealtype":"0","area":"","did":"04aedb01da5d44fead7e1315115c2530","mid":"04aedb01da5d44fead7e1315115c2530"}'
    ]

    # Strip the single quotes that Python's list repr puts around each row,
    # leaving the text of a JSON array. This breaks if a value contains an apostrophe.
    data_to_dump = re.sub(r"'", "", str(resultlist))
    # Parse that text and dump it again, pretty-printed. Calling json.dumps on the
    # string itself would only wrap it in quotes and escape the inner ones.
    json_data = json.dumps(json.loads(data_to_dump), indent=4)
    print(json_data)
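As a side note, the single quotes are not part of the data: they are just how Python displays strings when you print a list. A possibly more robust variant of the same idea is to parse each row with json.loads and serialize the whole list once, which avoids the regex entirely. This is only a minimal sketch; the helper name rows_to_json_array and the shortened two-row sample are made up for illustration, and in the real job the input would be the list returned by toJSON().collect().

```python
import json


def rows_to_json_array(rows, indent=None):
    """Combine per-row JSON strings into one JSON array string."""
    # Parse every row string into a dict so we work with data, not text.
    records = [json.loads(row) for row in rows]
    # Serialize the whole list once; json.dumps only emits double quotes.
    return json.dumps(records, indent=indent)


# Hypothetical sample, shortened from the rows shown in the question.
sample_rows = [
    '{"leasetype":"offer lease","city":"edmonton","area":"2312"}',
    '{"leasetype":"offer renew","city":"vancouver","area":"784"}',
]

json_array = rows_to_json_array(sample_rows, indent=4)
print(json_array)
```

Because each row is parsed rather than edited as text, an apostrophe inside a value (a city such as "st. john's", say) should come through unchanged.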