Submitting Spark jobs: notes

1. Submission format

spark-submit \
  --class com.data.Test \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 1G \
  --num-executors 8 \
  --executor-cores 2 \
  --queue test_queue \
  hdfs:user/test.jar arg1 arg2
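
To review a submission like the one above before it runs, the arguments can be assembled in a small shell function and printed first. This is only a sketch: `build_submit_cmd` is a made-up helper name, not part of Spark, and the class, queue, and jar path are the placeholder values from the example. It passes `--master yarn --deploy-mode cluster`, the form Spark 2.0 and later expect in place of the older `yarn-cluster` master value.

```shell
#!/bin/sh
# Dry-run sketch: assemble a spark-submit command line and print it.
# build_submit_cmd is a hypothetical helper, not part of Spark.

# The body runs in a subshell ( ... ) so its variables do not leak out.
build_submit_cmd() (
  main_class=$1
  queue=$2
  jar=$3
  shift 3   # whatever remains is passed through to the application
  # Rebuild the positional parameters as the full command line.
  set -- spark-submit \
    --class "$main_class" \
    --master yarn \
    --deploy-mode cluster \
    --executor-memory 1G \
    --num-executors 8 \
    --executor-cores 2 \
    --queue "$queue" \
    "$jar" "$@"
  # Print only; replace this line with "$@" to actually submit.
  printf '%s\n' "$*"
)

build_submit_cmd com.data.Test test_queue hdfs:user/test.jar arg1 arg2
```

Keeping the arguments as positional parameters, rather than one long string, means values containing spaces stay intact if the print line is later swapped for a real invocation.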


spark-sql \
  --queue test_queue \
  --deploy-mode client \
  --num-executors 10 \
  --executor-memory 10g \
  --executor-cores 5


spark-shell \
  --queue test_queue

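2. Parameter reference

| Parameter | Description | Example |
|---|---|---|
| --master | Address of the master, i.e. where the submitted job runs. | See the CSDN post "Spark启动时的master参数以及Spark的部署方式" (三丰的专栏) |
| --deploy-mode | Where the driver program runs. client: the driver runs on the client machine. cluster: the driver runs on one of the worker nodes. | |
| --queue | Queue to use when submitting to a YARN cluster. | --queue test |
| --num-executors | Number of executors to launch; default 2; used on YARN. | --num-executors 100 (if set too high, the queue may not be able to supply enough resources) |
| --executor-memory | Memory per executor; default 1G. | --executor-memory 10G |
| --executor-cores | Cores per executor; used on YARN or standalone. | --executor-cores 2 |
| --class | Main class of the application, mainly for Java or Scala. | |
| --jars | Jars that the Spark job depends on, comma-separated. | --jars hoodie-hive-0.4.7.jar,hoodie-common-0.4.7.jar |
| --py-files | Python files that the job depends on. | --py-files test.py |
| --driver-memory | Memory for the driver; default 1G. | --driver-memory 5G |
| --conf key=value | Sets a Spark property. | --conf spark.executor.memoryOverhead=4G |
| --packages | Maven coordinates of jars to include on the driver and executor classpaths, written as groupId:artifactId:version. They are downloaded automatically on the first run. | --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 |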
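
The remark that too many executors can exceed what the queue can supply is easier to judge with a little arithmetic. YARN has to find one container per executor, and each container is sized as the executor memory plus an overhead. The sketch below estimates that total, assuming the overhead is the larger of 384 MB and 10% of the executor memory (Spark's default for `spark.executor.memoryOverhead` when it is not set explicitly). The function name `estimate_queue_mb` is hypothetical, and the driver's own container and YARN's rounding of container sizes are ignored.

```shell
#!/bin/sh
# estimate_queue_mb is a hypothetical helper, not part of Spark.
# Arguments: number of executors, executor memory in MB.
estimate_queue_mb() {
  num_executors=$1
  executor_mb=$2
  # Assumed default overhead: 10% of executor memory, but at least 384 MB.
  overhead_mb=$(( executor_mb / 10 ))
  if [ "$overhead_mb" -lt 384 ]; then
    overhead_mb=384
  fi
  echo $(( num_executors * (executor_mb + overhead_mb) ))
}

# spark-sql example: --num-executors 10 --executor-memory 10g (10240 MB)
estimate_queue_mb 10 10240   # prints 112640
# spark-submit example: --num-executors 8 --executor-memory 1G (1024 MB)
estimate_queue_mb 8 1024     # prints 11264
```

Under these assumptions the spark-sql example asks the queue for about 110 GB for its executors, so a value like --num-executors 100 with the same memory setting would need roughly ten times that.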

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original address: http://outofmemory.cn/zaji/5708918.html
