Writing Hive table from Spark specifying CSV as the format
I'm having an issue writing a Hive table from Spark. The following code works fine; I can write the table (which defaults to the Parquet format) and read it in Hive:
df.write.mode('overwrite').saveAsTable("db.table")

hive> describe table;
OK
val                     string
Time taken: 0.021 seconds, Fetched: 1 row(s)
However, if I specify that the format should be CSV:
df.write.mode('overwrite').format('csv').saveAsTable("db.table")
then I can save the table, but Hive doesn't recognize the schema:
hive> describe table;
OK
col                     array<string>           from deserializer
Time taken: 0.02 seconds, Fetched: 1 row(s)
It's worth noting that I can create the Hive table manually and insertInto it:

spark.sql("create table db.table(val string)")
df.select('val').write.mode("overwrite").insertInto("db.table")
Doing so, Hive seems to recognize the schema. But that's clunky, and I can't figure out a way to automate the schema string anyway.
That is because the default Hive SerDe does not support CSV.
If you insist on using the CSV format, create the table as below:
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar"     = "'",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;
and insert the data through df.write.insertInto.
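The CREATE TABLE statement above still requires hand-writing the column list, which the question called clunky. A minimal sketch of automating it, assuming a DataFrame is available: the helper name build_csv_table_ddl and its arguments are illustrative, not a Spark or Hive API. With a real DataFrame, the columns list could be derived as [(f.name, f.dataType.simpleString()) for f in df.schema.fields].

```python
def build_csv_table_ddl(table, columns, sep=",", quote="'", escape="\\\\"):
    """Build a CREATE TABLE statement using OpenCSVSerDe.

    columns: list of (name, hive_type) pairs.
    Caveat: OpenCSVSerDe reads every column back as string regardless of
    the declared type, so casts may be needed on read.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' "
        "WITH SERDEPROPERTIES ("
        f'"separatorChar" = "{sep}", "quoteChar" = "{quote}", "escapeChar" = "{escape}"'
        ") STORED AS TEXTFILE"
    )

ddl = build_csv_table_ddl("db.table", [("val", "string")])
# With a live SparkSession `spark` and DataFrame `df`, the flow would be:
# spark.sql(ddl)                                              # create the table in Hive
# df.select('val').write.mode("overwrite").insertInto("db.table")  # then load the data
```

This only sketches the DDL-generation step; the spark.sql and insertInto calls are the same ones shown earlier in the post.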
For more info: