linux - Collect logs for a function in Python using a shell script
I have a PySpark script that is working fine. The script fetches data from MySQL and creates Hive tables in HDFS.

The PySpark script is below.
#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Condition to check the exact number of arguments in the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py arguments"
    exit()

table = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

df = sqlContext.read.format("jdbc").option("url", "{}:{}/{}".format(domain, port, mysqldb)).option("driver", "com.mysql.jdbc.Driver").option("dbtable", "{}".format(table)).option("user", "{}".format(username)).option("password", "{}".format(password)).load()

# Register the DataFrame as a temp table
df.registerTempTable("mytemptable")

# Create the Hive table from the temp table
sqlContext.sql("create table {}.{} as select * from mytemptable".format(hivedb, table))

sc.stop()
The PySpark script is invoked using a shell script. The shell script passes the table names as arguments, read from a file.

The shell script is below.
#!/bin/bash

source /home/$USER/spark/source.sh
[ $# -ne 1 ] && { echo "Usage : $0 table "; exit 1; }

args_file=$1

timestamp=`date "+%Y-%m-%d"`
touch /home/$USER/logs/${timestamp}.success_log
touch /home/$USER/logs/${timestamp}.fail_log
success_logs=/home/$USER/logs/${timestamp}.success_log
failed_logs=/home/$USER/logs/${timestamp}.fail_log

# Function to log the status of each job
function log_status
{
    status=$1
    message=$2
    if [ "$status" -ne 0 ]; then
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
        #echo "Please find the attached log file for more details"
        exit 1
    else
        echo "`date +\"%Y-%m-%d %H:%M:%S\"` [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
    fi
}

while read -r table; do
    spark-submit --name "${table}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${table} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${table}.log 2>&1
    g_status=$?
    log_status $g_status "Spark job ${table} execution"
done < "${args_file}"

echo "************************************************************************************************************************************************************************"
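For reference, the while read -r table loop above expects args_file to contain one table name per line, for example (the table names here are made up):

customers
orders
invoices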
I am able to collect the logs for each individual table in the args_file using the above shell script.

Now I have more than 200 tables in MySQL, so I have modified the PySpark script as below: I created a function that iterates over the args_file and executes the code for each table.

The new Spark script:
#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Condition to check the exact number of arguments in the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py arguments"
    exit()

args_file = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

def testing(table, hivedb, domain, port, mysqldb, username, password):
    print "*********************************************************table = {} ***************************".format(table)
    df = sqlContext.read.format("jdbc").option("url", "{}:{}/{}".format(domain, port, mysqldb)).option("driver", "com.mysql.jdbc.Driver").option("dbtable", "{}".format(table)).option("user", "{}".format(username)).option("password", "{}".format(password)).load()

    # Register the DataFrame as a temp table
    df.registerTempTable("mytemptable")

    # Create the Hive table from the temp table
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytemptable".format(hivedb, table))

input = sc.textFile('/user/xxxxxxx/spark_args/%s' % args_file).collect()

for table in input:
    testing(table, hivedb, domain, port, mysqldb, username, password)

sc.stop()
Now I want to collect the logs for each individual table in the args_file. Instead, I am getting a single log file that contains the logs for all the tables.

How can I achieve this requirement? Or is the method I am using wrong? (Two rough ideas are sketched below, after the new shell script.)

The new shell script:
spark-submit --name "${args_file}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${args_file} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${args_file}.log 2>&1
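One way per-table logs might be produced from inside the single Spark job is to have the testing() function write each table's messages to its own file with Python's logging module, instead of relying on the single stdout redirect of spark-submit. This is only a minimal sketch: it assumes sqlContext is the HiveContext created earlier in the script, that the driver can write to /tmp/logging, and the helper name get_table_logger is made up.

import logging
import os

LOG_DIR = '/tmp/logging'   # assumption: reuse the directory the shell script already logs to

def get_table_logger(table):
    # Hypothetical helper: one logger per table, writing to /tmp/logging/<table>.log on the driver
    logger = logging.getLogger(table)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers if called more than once
        handler = logging.FileHandler(os.path.join(LOG_DIR, '%s.log' % table))
        handler.setFormatter(logging.Formatter('%(asctime)s [%(levelname)s] %(message)s'))
        logger.addHandler(handler)
    return logger

def testing(table, hivedb, domain, port, mysqldb, username, password):
    log = get_table_logger(table)
    log.info("starting import for table %s", table)
    try:
        df = sqlContext.read.format("jdbc").option("url", "{}:{}/{}".format(domain, port, mysqldb)).option("driver", "com.mysql.jdbc.Driver").option("dbtable", table).option("user", username).option("password", password).load()
        df.registerTempTable("mytemptable")
        sqlContext.sql("create table {}.{} stored as parquet as select * from mytemptable".format(hivedb, table))
        log.info("finished import for table %s", table)
    except Exception as e:
        log.error("import failed for table %s: %s", table, e)
        raise

Note that this only captures driver-side messages; the executors' own output still goes through the usual YARN container logs.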
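Alternatively, since testing() already prints a banner line containing table = <name> before each import, the single combined log could be split into per-table files after the run. Another rough sketch; the file paths are assumptions based on the redirect above:

import re

combined_log = '/tmp/logging/tables.log'   # assumption: whatever ${args_file}.log resolved to
out_dir = '/tmp/logging'

current = None
with open(combined_log) as src:
    for line in src:
        m = re.search(r'table = (\S+)', line)   # matches the banner printed by testing()
        if m:
            if current:
                current.close()
            current = open('%s/%s.log' % (out_dir, m.group(1)), 'w')
        if current:                              # lines before the first banner are skipped
            current.write(line)
if current:
    current.close()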