Writing Hive table from Spark specifying CSV as the format
I'm having an issue writing a Hive table from Spark. The following code works fine; I can write the table (which defaults to the Parquet format) and read it in Hive:
df.write.mode('overwrite').saveAsTable("db.table")

hive> describe table;
OK
val                     string
Time taken: 0.021 seconds, Fetched: 1 row(s)
However, if I specify that the format should be CSV:
df.write.mode('overwrite').format('csv').saveAsTable("db.table")
then I can save the table, but Hive doesn't recognize the schema:
hive> describe table;
OK
col                     array<string>           from deserializer
Time taken: 0.02 seconds, Fetched: 1 row(s)
It's worth noting that I can create the Hive table manually and insertInto it:

spark.sql("create table db.table(val string)")
df.select('val').write.mode("overwrite").insertInto("db.table")
Doing so, Hive seems to recognize the schema. But that's clunky, and I can't figure out a way to automate the schema string anyway.
That is because the default Hive SerDe does not support CSV.
If you insist on using the CSV format, create the table as below:
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
and insert the data through df.write.insertInto.
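The CREATE TABLE statement above still requires hand-writing the column list, which the question called clunky. A minimal sketch of automating it, assuming a DataFrame is available: the helper name build_csv_table_ddl and its arguments are illustrative, not a Spark or Hive API. With a real DataFrame, the columns list could be derived as [(f.name, f.dataType.simpleString()) for f in df.schema.fields].

```python
def build_csv_table_ddl(table, columns, sep=",", quote="'", escape="\\\\"):
    """Build a CREATE TABLE statement using OpenCSVSerDe.

    columns: list of (name, hive_type) pairs.
    Caveat: OpenCSVSerDe reads every column back as string regardless of
    the declared type, so casts may be needed on read.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' "
        "WITH SERDEPROPERTIES ("
        f'"separatorChar" = "{sep}", "quoteChar" = "{quote}", "escapeChar" = "{escape}"'
        ") STORED AS TEXTFILE"
    )

ddl = build_csv_table_ddl("db.table", [("val", "string")])
# With a live SparkSession `spark` and DataFrame `df`, the flow would be:
# spark.sql(ddl)                                              # create the table in Hive
# df.select('val').write.mode("overwrite").insertInto("db.table")  # then load the data
```

This only sketches the DDL-generation step; the spark.sql and insertInto calls are the same ones shown earlier in the post.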
For more info: