linux - Collect logs for a function in python using shell script


I have a PySpark script that is working fine. The script fetches data from MySQL and creates Hive tables in HDFS.

The PySpark script is below:

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# The spark-submit command line must pass exactly this number of arguments
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py arguments"
    exit()

table = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

# Read the MySQL table over JDBC into a DataFrame
df = sqlContext.read.format("jdbc") \
    .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "{}".format(table)) \
    .option("user", "{}".format(username)) \
    .option("password", "{}".format(password)) \
    .load()

# Register the DataFrame as a temporary table
df.registerTempTable("mytemptable")

# Create the Hive table from the temp table
sqlContext.sql("create table {}.{} as select * from mytemptable".format(hivedb, table))

sc.stop()
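For reference, the JDBC url option is built with "{}:{}/{}".format(domain, port, mysqldb), so the domain argument presumably already carries the jdbc:mysql:// prefix. With purely illustrative placeholder values it expands like this:

# Illustrative placeholders only; the real values come from the spark-submit arguments
"{}:{}/{}".format("jdbc:mysql://db.example.com", "3306", "sourcedb")
# -> 'jdbc:mysql://db.example.com:3306/sourcedb'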

Now the PySpark script is invoked from a shell script, and the shell script passes the table names from an arguments file.

The shell script is below:

#!/bin/bash

source /home/$USER/spark/source.sh
[ $# -ne 1 ] && { echo "Usage : $0 table "; exit 1; }

args_file=$1

TIMESTAMP=`date "+%Y-%m-%d"`
touch /home/$USER/logs/${TIMESTAMP}.success_log
touch /home/$USER/logs/${TIMESTAMP}.fail_log
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log

# Function to log the status of each job
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
        #echo "Please find the attached log file for more details"
        exit 1
    else
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
    fi
}

while read -r table ; do
    spark-submit --name "${table}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${table} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${table}.log 2>&1
    g_STATUS=$?
    log_status $g_STATUS "spark job ${table} execution"
done < "${args_file}"

echo "************************************************************************************************************************************************************************"
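Since the wrapper reads the file with while read -r table, the args_file contains one table name per line, for example (placeholder names, not my real tables):

customers
orders
payments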

With the above shell script I am able to collect the logs for each individual table in the args_file.

Now I have more than 200 tables in MySQL, so I have modified the PySpark script as shown below: I created a function that iterates over the args_file and executes the code for each table.

New Spark script:

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# The spark-submit command line must pass exactly this number of arguments
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py arguments"
    exit()

args_file = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

def testing(table, hivedb, domain, port, mysqldb, username, password):

    print "********************************************************* table = {} ***************************".format(table)

    # Read the MySQL table over JDBC into a DataFrame
    df = sqlContext.read.format("jdbc") \
        .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", "{}".format(table)) \
        .option("user", "{}".format(username)) \
        .option("password", "{}".format(password)) \
        .load()

    # Register the DataFrame as a temporary table
    df.registerTempTable("mytemptable")

    # Create the Hive table (stored as parquet) from the temp table
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytemptable".format(hivedb, table))

# Read the list of table names from HDFS and run the import for each one
input = sc.textFile('/user/xxxxxxx/spark_args/%s' % args_file).collect()

for table in input:
    testing(table, hivedb, domain, port, mysqldb, username, password)

sc.stop()
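Note that the args_file now has to be in HDFS, because sc.textFile reads it from there; I put it there with something like the following (the target directory is the one from the script, the local file name is just an example):

hdfs dfs -put tables.txt /user/xxxxxxx/spark_args/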

Now I want to collect the logs for each individual table in the args_file, but I am getting a single log file that contains the logs for all the tables.

How can I achieve this requirement? Or is the method I am using wrong?

New shell script:

spark-submit --name "${args_file}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${args_file} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${args_file}.log 2>&1
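One idea I had for getting per-table logs back (just a rough sketch, not tested: it uses Python's standard logging module, a hypothetical run_with_table_log helper, and the same /tmp/logging directory as before, and it would only capture the messages the driver logs itself, not Spark's own log4j output) is to open a separate log file per table inside the loop of the new PySpark script:

import logging

def run_with_table_log(table):
    # Hypothetical helper: attach a fresh FileHandler for this table's log file
    logger = logging.getLogger("mysql_import")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler("/tmp/logging/{}.log".format(table))
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("spark job %s started", table)
        testing(table, hivedb, domain, port, mysqldb, username, password)
        logger.info("spark job %s finished : success", table)
    except Exception:
        logger.exception("spark job %s failed", table)
    finally:
        # Detach the handler so the next table writes to its own file
        logger.removeHandler(handler)
        handler.close()

for table in input:
    run_with_table_log(table)

Would something like that be a reasonable way to do it, or should I stick with one spark-submit per table as in the first shell script?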

