python - PySpark: splitting a list inside a list of tuples

July 15, 2013

I have the following:

[('homicide', [('2017', 1)]),
 ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]),
 ('robbery', [('2017', 1)])]

How can I convert it to this?

[('homicide', ('2017', 1)),
 ('deceptive practice', ('2015', 10)),
 ('deceptive practice', ('2014', 3)),
 ('deceptive practice', ('2017', 14)),
 ('deceptive practice', ('2016', 14))]

When I tried using map, it threw "AttributeError: 'list' object has no attribute 'map'":

rdd = sc.parallelize([('homicide', [('2017', 1)]),
                      ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)])])
y = rdd.map(lambda x: (x[0], tuple(x[1])))

map is a method on an RDD, not on a Python list, so you need to parallelize the list first. You can then use flatMap to flatten the inner lists:

rdd = sc.parallelize([('homicide', [('2017', 1)]),
                      ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10), ('2013', 4), ('2014', 3)]),
                      ('robbery', [('2017', 1)])])

# Emit one (key, (year, count)) pair for every element of each inner list
rdd.flatMap(lambda x: [(x[0], y) for y in x[1]]).collect()

# [('homicide', ('2017', 1)),
#  ('deceptive practice', ('2017', 14)),
#  ('deceptive practice', ('2016', 14)),
#  ('deceptive practice', ('2015', 10)),
#  ('deceptive practice', ('2013', 4)),
#  ('deceptive practice', ('2014', 3)),
#  ('robbery', ('2017', 1))]
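To see what the flattening step does without a Spark session, the same logic can be sketched in plain Python. This is only an illustration: flat_map below is a hypothetical helper, not part of PySpark. It applies a function to each item and concatenates the results, which is roughly what flatMap does to the elements of an RDD.

```python
from itertools import chain


def flat_map(func, items):
    """Hypothetical stand-in for RDD.flatMap: apply func to each item
    and concatenate the iterables it returns into one flat list."""
    return list(chain.from_iterable(func(item) for item in items))


data = [
    ('homicide', [('2017', 1)]),
    ('deceptive practice', [('2017', 14), ('2016', 14), ('2015', 10),
                            ('2013', 4), ('2014', 3)]),
    ('robbery', [('2017', 1)]),
]

# Same lambda as in the Spark answer: pair the key with each inner tuple
pairs = flat_map(lambda x: [(x[0], y) for y in x[1]], data)

for pair in pairs:
    print(pair)
```

On a real key/value RDD, rdd.flatMapValues(lambda v: v) should produce the same pairs, since it flattens each value while keeping its key.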
