python - PySpark: splitting a list inside a list of tuples


I have the following list:

[('homicide', [('2017', 1)]),   ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]),   ('robbery', [('2017', 1)])] 

How can I convert it to this?

[('homicide', ('2017', 1)),   ('deceptive practice', ('2015', 10)),   ('deceptive practice', ('2014', 3)),   ('deceptive practice', ('2017', 14)),   ('deceptive practice', ('2016', 14))] 

When I tried using map, it threw "AttributeError: 'list' object has no attribute 'map'".

rdd = sc.parallelize([('homicide', [('2017', 1)]), ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)])])
y = rdd.map(lambda x: (x[0], tuple(x[1])))

map is a method on an RDD, not on a Python list, so you need to parallelize the list first. You can then use flatMap to flatten the inner lists:

rdd = sc.parallelize([('homicide', [('2017', 1)]),
                      ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]),
                      ('robbery', [('2017', 1)])])

# Pair each key with every (year, count) tuple in its inner list
rdd.flatMap(lambda x: [(x[0], y) for y in x[1]]).collect()

# [('homicide', ('2017', 1)),
#  ('deceptive practice', ('2017', 14)),
#  ('deceptive practice', ('2016', 14)),
#  ('deceptive practice', ('2015', 10)),
#  ('deceptive practice', ('2013', 4)),
#  ('deceptive practice', ('2014', 3)),
#  ('robbery', ('2017', 1))]
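To check the flattening logic without a Spark session, the same step can be sketched in plain Python. This is only an illustration: explode is a hypothetical helper name, not part of PySpark, and the nested comprehension stands in for what flatMap does (apply a function to each record, then concatenate the resulting lists).

```python
# Plain-Python sketch of the flattening step (no Spark required).
data = [
    ('homicide', [('2017', 1)]),
    ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10),
                            ('2013', 4), ('2014', 3)]),
    ('robbery', [('2017', 1)]),
]


def explode(record):
    """Hypothetical helper: pair the key with each inner (year, count) tuple."""
    key, values = record
    return [(key, value) for value in values]


# Stand-in for flatMap: apply explode to every record and concatenate the results.
flattened = [pair for record in data for pair in explode(record)]

# A plain Python list has no .map attribute, which is the error in the question.
list_has_map = hasattr(data, "map")

for pair in flattened:
    print(pair)
```

With a SparkContext available, passing the same function as rdd.flatMap(explode) should produce the same pairs. On a key-value RDD, rdd.flatMapValues(lambda v: v) is another option that keeps the key and flattens only the value.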
