Python version: 3.6.8
Spark version: 2.4.5
-
Create a Python virtual environment; for details, see
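The linked guide is not reproduced here. As a minimal sketch, assuming `python3` on the build machine is the 3.6.8 interpreter and using the standard-library `venv` module (the original post may have used another tool), the environment could be created like this. The helper name `make_venv` and the `PYTHON_BIN` override are hypothetical. A virtual environment does not bundle the standard library, so the same Python installation is assumed to exist on every YARN node.

```shell
#!/usr/bin/env bash
# Sketch only: make_venv is a hypothetical helper, not part of the original post.
# Assumes PYTHON_BIN (default: python3) is the Python 3.6.8 interpreter.
set -euo pipefail

make_venv() {
  local env_path="$1"
  shift
  # Create the environment with the standard-library venv module
  "${PYTHON_BIN:-python3}" -m venv "$env_path"
  # Install any packages given as extra arguments into the new environment
  if (( $# > 0 )); then
    "${env_path}/bin/pip" install "$@"
  fi
}
```

For example, `make_venv /usr/local/thirdparty/ai_test` followed by the package names the job needs.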
-
Compress the virtual environment
cd /usr/local/thirdparty/
zip -q -r ai_test.zip ai_test/
-
Upload the virtual environment to HDFS
hdfs dfs -put ai_test.zip /ai
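If /ai does not exist yet, or the environment is rebuilt and uploaded again, a slightly more defensive version of this step could look like the sketch below. `upload_env` is a hypothetical name; the flags are standard `hdfs dfs` options (`-mkdir -p`, `-put -f` to overwrite, `-ls`).

```shell
#!/usr/bin/env bash
# Sketch: upload_env is a hypothetical helper wrapping standard hdfs dfs commands.
set -euo pipefail

upload_env() {
  local local_zip="$1" hdfs_dir="$2"
  # Create the target directory (no error if it already exists)
  hdfs dfs -mkdir -p "$hdfs_dir"
  # -f overwrites an older archive with the same name
  hdfs dfs -put -f "$local_zip" "$hdfs_dir"
  # List the directory to confirm the upload
  hdfs dfs -ls "$hdfs_dir"
}
```

Usage: `upload_env /usr/local/thirdparty/ai_test.zip /ai`.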
-
Submit the job with spark-submit
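spark-submit --master yarn \
  --deploy-mode cluster \
  --archives hdfs:///ai/ai_test.zip#py3 \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON='py3/ai_test/bin/python' \
  xxx.py

In --archives hdfs:///ai/ai_test.zip#py3, py3 is an alias and can be any name. YARN unpacks the archive into a directory with that name in each container's working directory, so spark.yarn.appMasterEnv.PYSPARK_PYTHON can locate the interpreter through the relative path py3/ai_test/bin/python (ai_test/ is the top-level directory inside the zip).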
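To keep the alias after `#` and the PYSPARK_PYTHON path from drifting apart, the command can be wrapped so the path is derived from the alias and the environment's directory name. A sketch under those assumptions: `submit_with_env` and the `SPARK_SUBMIT` override (handy for a dry run) are hypothetical, while every spark-submit option is taken from the command above.

```shell
#!/usr/bin/env bash
# Sketch: submit_with_env and SPARK_SUBMIT are hypothetical, not part of Spark.
set -euo pipefail

submit_with_env() {
  local archive="$1" alias_name="$2" env_name="$3"
  shift 3
  # Relative path: YARN unpacks the archive into ./<alias_name>/ in each container
  local py_exec="${alias_name}/${env_name}/bin/python"
  "${SPARK_SUBMIT:-spark-submit}" --master yarn \
    --deploy-mode cluster \
    --archives "${archive}#${alias_name}" \
    --conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=${py_exec}" \
    "$@"
}
```

Usage: `submit_with_env hdfs:///ai/ai_test.zip py3 ai_test xxx.py` reproduces the command above; any remaining arguments are passed through to the application.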
-
Edit spark-env.sh (in $SPARK_HOME/conf) and add the following setting:
vim spark-env.sh
export PYSPARK_PYTHON=/usr/local/bin/python3
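spark-env.sh does not exist by default; it is usually created by copying conf/spark-env.sh.template. If the setting is applied by a provisioning script rather than by hand in vim, appending it only once keeps the file clean on re-runs. A sketch; `set_pyspark_python` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: set_pyspark_python is a hypothetical helper.
set -euo pipefail

set_pyspark_python() {
  local env_file="$1" py_path="$2"
  local line="export PYSPARK_PYTHON=${py_path}"
  # Append only when this exact line is not already there (a missing file counts as absent)
  if ! grep -qxF "$line" "$env_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}
```

Usage: `set_pyspark_python "$SPARK_HOME/conf/spark-env.sh" /usr/local/bin/python3`.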
-
Edit the pyspark script in Spark's bin directory
vim pyspark

if [[ -z "$PYSPARK_PYTHON" ]]; then
  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! $WORKS_WITH_IPYTHON ]]; then
    echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
    exit 1
  else
    PYSPARK_PYTHON=python3  # changed here
  fi
fi
export PYSPARK_PYTHON
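The same change can be scripted. This sketch assumes GNU sed (for `sed -i`) and that, before the edit, the default line reads exactly `PYSPARK_PYTHON=python`; check your copy of bin/pyspark first. `patch_pyspark_default` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: patch_pyspark_default is hypothetical; assumes GNU sed.
set -euo pipefail

patch_pyspark_default() {
  local script="$1"
  # Keep the first backup; later runs do not overwrite it
  [[ -e "${script}.bak" ]] || cp "$script" "${script}.bak"
  # Change only a line that is exactly PYSPARK_PYTHON=python, keeping its indentation
  sed -i 's/^\([[:space:]]*\)PYSPARK_PYTHON=python$/\1PYSPARK_PYTHON=python3/' "$script"
  # Return non-zero if the python3 default is not present afterwards
  grep -q '^[[:space:]]*PYSPARK_PYTHON=python3$' "$script"
}
```

Usage: `patch_pyspark_default "$SPARK_HOME/bin/pyspark"`. Running it twice is harmless, since the pattern no longer matches once the line reads python3.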
Feel free to share; please credit the source when reposting: 内存溢出