Writing a Hive table from Spark, specifying CSV as the format
I'm having an issue writing a Hive table from Spark. The following code works fine; I can write the table (which defaults to Parquet format) and read it in Hive:
df.write.mode('overwrite').saveAsTable("db.table")

hive> describe table;
OK
val                     string
Time taken: 0.021 seconds, Fetched: 1 row(s)

However, if I specify that the format should be CSV:
df.write.mode('overwrite').format('csv').saveAsTable("db.table")

then I can save the table, but Hive doesn't recognize the schema:
hive> describe table;
OK
col                     array<string>           from deserializer
Time taken: 0.02 seconds, Fetched: 1 row(s)

It's worth noting that I can create the Hive table manually and insertInto it:
spark.sql("create table db.table(val string)")
df.select('val').write.mode("overwrite").insertInto("db.table")

Doing so, Hive seems to recognize the schema. But that's clunky, and I can't figure out a way to automate the schema string anyway.
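As a workaround for automating the schema string, the column definitions can be derived from the DataFrame itself: Spark's `df.dtypes` returns a list of `(column name, type)` tuples whose type names are already Hive-compatible for simple types. A minimal sketch (the helper `columns_ddl` is hypothetical, and the hard-coded tuple list stands in for a real `df.dtypes`):

```python
# Hypothetical helper: turn df.dtypes-style (name, type) pairs
# into the column list of a CREATE TABLE statement.
def columns_ddl(dtypes):
    return ", ".join(f"{name} {dtype}" for name, dtype in dtypes)

# With a real DataFrame you would pass df.dtypes; this example
# list stands in for it.
ddl = columns_ddl([("val", "string"), ("n", "int")])
print(f"create table db.table({ddl})")
# → create table db.table(val string, n int)
# Then: spark.sql(f"create table db.table({ddl})")
#       df.write.mode("overwrite").insertInto("db.table")
```

Note that complex types (structs, maps) may need extra handling; this only covers plain scalar columns.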
That's because the Hive SerDe does not support CSV by default.
If you insist on using the CSV format, create the table as below:
create table my_table(a string, b string, ...)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
stored as textfile;

and then insert the data through df.write.insertInto.
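From PySpark, the two steps can be combined: run the OpenCSVSerDe DDL once via spark.sql, then append with insertInto. A sketch, assuming an illustrative table db.table with a single val column (the Spark calls are shown commented out, since they require a live session with Hive support):

```python
# DDL for a CSV-backed Hive table using OpenCSVSerde.
# The doubled backslashes are Python escaping: the statement sent to
# Hive contains "\t" and "\\" literally, as in the DDL above.
ddl = """
create table db.table(val string)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = "\\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\\\"
)
stored as textfile
"""

# With a live SparkSession (spark) and DataFrame (df):
# spark.sql(ddl)                                      # create the table once
# df.write.mode("overwrite").insertInto("db.table")   # then load the data
print(ddl)
```

One caveat worth knowing about OpenCSVSerde: it reads every column back as string regardless of the declared types, so casts may be needed on the Hive side.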
for more info: