python - PySpark: splitting a list inside a list of tuples
I have the following list:

    [('homicide', [('2017', 1)]), ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]), ('robbery', [('2017', 1)])]

How can I convert it to this?

    [('homicide', ('2017', 1)), ('deceptive practice', ('2015', 10)), ('deceptive practice', ('2014', 3)), ('deceptive practice', ('2017', 14)), ('deceptive practice', ('2016', 14))]

When I tried using map, it threw "AttributeError: 'list' object has no attribute 'map'". This is what I have so far:
    rdd = sc.parallelize([('homicide', [('2017', 1)]), ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)])])
    y = rdd.map(lambda x: (x[0], tuple(x[1])))
map is a method on an RDD, not on a Python list, so you need to parallelize the list first. You can then use flatMap to flatten the inner lists, pairing each key with every item of its list:
    rdd = sc.parallelize([('homicide', [('2017', 1)]), ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]), ('robbery', [('2017', 1)])])

    # Emit one (key, item) pair per element of each inner list
    rdd.flatMap(lambda x: [(x[0], y) for y in x[1]]).collect()

    # [('homicide', ('2017', 1)),
    #  ('deceptive practice', ('2017', 14)),
    #  ('deceptive practice', ('2016', 14)),
    #  ('deceptive practice', ('2015', 10)),
    #  ('deceptive practice', ('2013', 4)),
    #  ('deceptive practice', ('2014', 3)),
    #  ('robbery', ('2017', 1))]
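If you want to check the expected result without a Spark context, the same flattening can be sketched in plain Python. This is only an illustration of what flatMap does to each record; the helper name flatten_pairs is made up for this example and is not part of PySpark.

```python
# Plain-Python sketch of the flattening step (no Spark required).
data = [
    ('homicide', [('2017', 1)]),
    ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10),
                            ('2013', 4), ('2014', 3)]),
    ('robbery', [('2017', 1)]),
]


def flatten_pairs(records):
    """Pair each key with every item of its inner list.

    Hypothetical helper for illustration; it mirrors the lambda passed
    to flatMap, applied over an ordinary Python list.
    """
    return [(key, item) for key, items in records for item in items]


flat = flatten_pairs(data)
for pair in flat:
    print(pair)
```

On the RDD itself, since only the values are being expanded, rdd.flatMapValues(lambda v: v) should give the same pairs and may read more clearly than building the tuples by hand.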