Apache Spark SQL - unable to load data from Parquet files into Hive external table


I have written the Scala code below to create a Parquet file:

scala> case class Person(name:String,age:Int,sex:String)
defined class Person

scala> val data = Seq(Person("jack",25,"m"),Person("john",26,"m"),Person("anu",27,"f"))
data: Seq[Person] = List(Person(jack,25,m), Person(john,26,m), Person(anu,27,f))

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> df.select("name","age","sex").write.format("parquet").mode("overwrite").save("sparksqloutput/person")

HDFS status:

[cloudera@quickstart ~]$ hadoop fs -ls sparksqloutput/person
Found 4 items
-rw-r--r--   1 cloudera cloudera          0 2017-08-14 23:03 sparksqloutput/person/_SUCCESS
-rw-r--r--   1 cloudera cloudera        394 2017-08-14 23:03 sparksqloutput/person/_common_metadata
-rw-r--r--   1 cloudera cloudera        721 2017-08-14 23:03 sparksqloutput/person/_metadata
-rw-r--r--   1 cloudera cloudera        773 2017-08-14 23:03 sparksqloutput/person/part-r-00000-2dd2f334-1985-42d6-9dbf-16b0a51e53a8.gz.parquet

Then I created an external Hive table using the command below:

hive> create external table person (name string,age int,sex string) stored as parquet location '/sparksqlouput/person/';
OK
Time taken: 0.174 seconds
hive> select * from person
    > ;
OK
Time taken: 0.125 seconds

But when I run the select query above, no rows are returned. Kindly help on this.

In general, the Hive SQL statement 'select * from <table>' simply locates the table directory where the table data exists and dumps the file contents from that HDFS directory.

In your case, select * is not working, which means the location is not correct.

Please note that in the Scala code, your last statement contains save("sparksqloutput/person"), where "sparksqloutput/person" is a relative path. On HDFS it will expand to "/user/<logged in username>/sparksqloutput/person" (i.e. "/user/cloudera/sparksqloutput/person").
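If you want to confirm where the relative path ends up, here is a minimal sketch using the Hadoop FileSystem API. It assumes you run it in spark-shell (or anywhere the cluster's Hadoop configuration is on the classpath); without that configuration the path would resolve against the local file system instead of HDFS.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumption: core-site.xml for the cluster is on the classpath,
// so the default file system is HDFS rather than the local disk.
val fs = FileSystem.get(new Configuration())

// Relative paths are resolved against the working directory, which
// defaults to the user's home directory.
println(fs.getHomeDirectory)

// Fully qualified form of the relative path passed to save(...)
val qualified = fs.makeQualified(new Path("sparksqloutput/person"))
println(qualified)

// Check whether the location given to Hive actually exists
println(fs.exists(new Path("/sparksqloutput/person")))
```

If the reasoning above holds, the qualified path should end in /user/cloudera/sparksqloutput/person, and the last check should report that the root-level directory is missing.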

Hence, while creating the Hive table you should use "/user/cloudera/sparksqloutput/person" instead of "/sparksqloutput/person". Practically, "/sparksqloutput/person" does not exist, and hence you did not get any output from select * from person.
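As a rough sketch of the fix, the table can be created against the absolute path and queried from the same spark-shell session. This assumes that sqlContext is Hive-enabled (a HiveContext), which I have not verified for your setup. The table name person_abs is hypothetical, picked only so that it does not clash with the person table you already created.

```scala
// Assumption: running in spark-shell with a Hive-enabled `sqlContext`.
// `person_abs` is a hypothetical table name used for illustration.
val location = "/user/cloudera/sparksqloutput/person"

// Same columns as the original table, but with an absolute HDFS location
val ddl =
  s"""CREATE EXTERNAL TABLE IF NOT EXISTS person_abs (name STRING, age INT, sex STRING)
     |STORED AS PARQUET
     |LOCATION '$location'""".stripMargin

sqlContext.sql(ddl)

// Read the rows back through the new table definition
sqlContext.sql("SELECT * FROM person_abs").show()
```

The same create statement can be run directly at the hive> prompt instead; the only part that matters is the absolute LOCATION.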

