1. Prerequisites

A working Spark Local mode installation (under /export/server/spark) is required first; see the Spark Local Mode installation tutorial.
2. Modify the Configuration Files

cd /export/server/spark/conf/
cp workers.template workers
vim workers
# Replace the contents with the following:
node1
node2
node3
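The workers file tells sbin/start-all.sh which hosts to launch Worker daemons on over SSH, so passwordless SSH from node1 to every listed host must already work. A quick check (a sketch, assuming the hostnames above resolve):

# each command should print the remote hostname without prompting for a password
for host in node1 node2 node3; do
  ssh "$host" hostname
done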
cd /export/server/spark/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
# Add the following:
JAVA_HOME=/export/server/jdk1.8.0_65
HADOOP_CONF_DIR=/export/server/hadoop-3.3.0/etc/hadoop/
YARN_CONF_DIR=/export/server/hadoop-3.3.0/etc/hadoop/
export SPARK_MASTER_HOST=node1
export SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://node1:8020/sparklog/ -Dspark.history.fs.cleaner.enabled=true"
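These settings pin the Master to node1:7077, give each Worker 1 core and 1 GB of memory, and point the history server at the HDFS event-log directory. They are only read when the daemons start, so it is worth confirming the referenced paths exist first (a minimal sanity check, using the versions from the config above):

# verify the JDK and Hadoop configuration directories referenced in spark-env.sh
test -d /export/server/jdk1.8.0_65             && echo "JDK OK"
test -d /export/server/hadoop-3.3.0/etc/hadoop && echo "Hadoop conf OK"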
Start Hadoop and create the event-log directory in HDFS, then set up spark-defaults.conf:

start-all.sh
hdfs dfs -mkdir -p /sparklog/
cd /export/server/spark/conf
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
# Add the following:
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://node1:8020/sparklog/
spark.eventLog.compress  true
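With event logging enabled, every application writes a compressed event log to hdfs://node1:8020/sparklog/, which is what the history server replays later. A quick check that the directory created earlier is in place:

# exits 0 and prints OK if the event-log directory exists
hdfs dfs -test -d /sparklog/ && echo "sparklog OK"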
cd /export/server/spark/conf
cp log4j.properties.template log4j.properties
vim log4j.properties
# Change the root logger line to the following:
log4j.rootCategory=WARN, console
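Lowering the root logger from INFO to WARN just quiets the console output; a one-line check that the edit took:

grep '^log4j.rootCategory' /export/server/spark/conf/log4j.properties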
3. Distribute to the Other Machines

cd /export/server/
scp -r spark/ node2:$PWD
scp -r spark/ node3:$PWD
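Since the same copy goes to every worker, the two scp commands can also be written as a loop (an equivalent sketch, assuming passwordless SSH and an identical /export/server layout on each node):

cd /export/server/
for host in node2 node3; do
  scp -r spark/ "$host:$PWD"
done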
4. Start Spark Standalone

4.1 Start Hadoop

start-all.sh
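Before moving on, it is worth confirming the HDFS daemons came up (a sketch, assuming the NameNode runs on node1):

# on node1: expect NameNode and DataNode (plus YARN daemons if start-all.sh starts them)
jps
# summary of live DataNodes
hdfs dfsadmin -report | head -n 20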
4.2 Start ZooKeeper on All Three Machines

cd /export/server/zookeeper/bin
./zkServer.sh start
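Each node can then be checked individually; in a healthy ensemble one node reports leader and the others follower:

./zkServer.sh status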
4.3 Start the Spark Cluster

cd /export/server/spark
sbin/start-all.sh
sbin/start-history-server.sh
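Once both scripts finish, the Master web UI should be reachable at http://node1:8080 (as configured in spark-env.sh) and the history server at http://node1:18080 (Spark's default history UI port). A process-level check (a sketch, assuming the node layout above):

# node1 should show Master, Worker, and HistoryServer; node2/node3 a Worker each
jps
ssh node2 jps
ssh node3 jps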
5. Connect to the Cluster

5.1 Spark Shell Connection

cd /export/server/spark
bin/spark-shell --master spark://node1:7077
5.2 PySpark Connection

cd /export/server/spark
conda activate pyspark_env
./bin/pyspark --master spark://node1:7077
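A minimal end-to-end smoke test can also be submitted non-interactively instead of using the REPL (a sketch; the file name and app name are mine, not from the original):

# write a tiny PySpark job and submit it to the standalone Master
cat > /tmp/smoke_test.py <<'EOF'
from pyspark import SparkContext

sc = SparkContext(master="spark://node1:7077", appName="smoke-test")
# the sum of 0..99 is 4950 if the Workers executed the job
print(sc.parallelize(range(100)).sum())
sc.stop()
EOF
/export/server/spark/bin/spark-submit /tmp/smoke_test.py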