Writing a Hive table from Spark specifying CSV as the format


I'm having an issue writing a Hive table from Spark. The following code works fine; I can write the table (which defaults to the Parquet format) and read it in Hive:

df.write.mode('overwrite').saveAsTable("db.table")

hive> describe table;
OK
val                     string
Time taken: 0.021 seconds, Fetched: 1 row(s)

However, if I specify that the format should be CSV:

df.write.mode('overwrite').format('csv').saveAsTable("db.table")

then I can save the table, but Hive doesn't recognize the schema:

hive> describe table;
OK
col                     array<string>           from deserializer
Time taken: 0.02 seconds, Fetched: 1 row(s)

It's worth noting that I can create the Hive table manually and then insertInto it:

spark.sql("create table db.table(val string)")
df.select('val').write.mode("overwrite").insertInto("db.table")

Doing so, Hive seems to recognize the schema. But that's clunky, and I can't figure out a way to automate the schema string anyway.
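The schema string can in fact be automated from the DataFrame itself: PySpark exposes `df.dtypes` as a list of `(name, type)` pairs, which maps straight to a column spec. A minimal sketch (pure Python; the Spark calls are shown as comments since they need a live session, and simple Spark type names happen to match Hive's, which is not true for every type):

```python
# Build a CREATE TABLE statement from (name, type) pairs,
# i.e. the shape of PySpark's df.dtypes. Sketch only: Spark and Hive
# type names coincide for simple types (string, int, ...) but not all.
def schema_ddl(table, dtypes):
    cols = ", ".join(f"{name} {dtype}" for name, dtype in dtypes)
    return f"CREATE TABLE {table}({cols})"

# With a live SparkSession one could then do:
# spark.sql(schema_ddl("db.table", df.dtypes))
# df.write.mode("overwrite").insertInto("db.table")
```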

That is because the Hive SerDe does not support CSV by default.

If you insist on using the CSV format, create the table as below:

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

and insert the data through df.write.insertInto.
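Tying the two steps together, the OpenCSVSerde DDL above can also be templated so the table creation and the insert live in one place. A hedged sketch (the `spark` calls are comments, since they require a live SparkSession; column names and types here are placeholders):

```python
# Template the OpenCSVSerde CREATE TABLE statement from the answer above.
# The escape sequences render as \t and \\ in the emitted HiveQL.
SERDE_DDL = """\
CREATE TABLE {table}({cols})
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\\\"
)
STORED AS TEXTFILE"""

def csv_table_ddl(table, columns):
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return SERDE_DDL.format(table=table, cols=cols)

ddl = csv_table_ddl("db.table", [("val", "string")])
# spark.sql(ddl)
# df.select('val').write.mode("overwrite").insertInto("db.table")
```

Note that OpenCSVSerde reads every column back as string regardless of the declared type, which is one more reason Parquet is usually the better default.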

for more info:

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde

