Spark official downloads: https://spark.apache.org/downloads.html
Hadoop official downloads: https://hadoop.apache.org/releases.html
Versions used here: Spark spark-2.4.3, Hadoop hadoop-2.10.1.
2. Configure passwordless SSH login. Check whether you can SSH into the local machine; if no password is required, the configuration is already correct:
ssh localhost
When setting up the Hadoop environment you may hit the error localhost.localdomain: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
This happens because even an SSH connection to the local machine needs key-based authorization. Generate a key pair with ssh-keygen, then append the public key to the authorized_keys file:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
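A quick way to confirm that key-based login now works without a password prompt (a minimal check, assuming the default sshd configuration):
# BatchMode makes ssh fail instead of prompting, so this prints "ok" only when key auth works
ssh -o BatchMode=yes localhost 'echo ok'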
A permission error may still appear:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
Tighten the permissions on the SSH key files:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
3. Download and configure the Java environment
- Install Java
sudo yum -y install gcc gcc-c++ make openssl-devel gmp-devel mpfr-devel libmpc-devel emacs-filesystem libmpcdevel libaio numactl autoconf automake libtool libffi-devel snappy snappy-devel zlib zlib-devel bzip2 bzip2-devel lz4-devel libasan lsof sysstat telnet psmisc && sudo yum install -y which java-1.8.0-openjdk java-1.8.0-openjdk-devel && sudo yum clean all
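After the packages are installed, a quick sanity check (output varies by system):
java -version    # should report an OpenJDK 1.8.0 build
which java       # typically /usr/bin/java, a symlink managed by alternatives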
- On CentOS 7, installing OpenJDK via yum install java leaves JAVA_HOME unset: echo $JAVA_HOME returns nothing, so JAVA_HOME has to be configured in /etc/profile.
Find and configure JAVA_HOME
which java
ls -lrt /usr/bin/java
ls -lrt /etc/alternatives/java
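The symlink chain can also be resolved in one step; a minimal sketch (the resulting path differs per machine):
# Follows /usr/bin/java -> /etc/alternatives/java -> the real JDK installation
readlink -f "$(which java)"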
Once these commands reveal the OpenJDK installation path, edit /etc/profile and configure JAVA_HOME:
export JAVA_HOME=/data/etc/java/jdk1.8.0_291
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
export PATH=$PATH:${JAVA_PATH}
Run source /etc/profile afterwards so the new variables take effect.
4. Spark configuration
- Extract the package and prepare the configuration file:
tar -xzvf spark-2.4.3-bin-hadoop2.7.tgz
cd spark-2.4.3-bin-hadoop2.7/conf
cp spark-defaults.conf.template spark-defaults.conf
- Edit the Spark configuration:
vi spark-defaults.conf
spark.executor.heartbeatInterval 110s
spark.rpc.message.maxSize 1024
spark.hadoop.dfs.replication 1
# temporary file path
spark.local.dir /data/spark_test/temp/spark-tmp
spark.driver.memory 10g
spark.driver.maxResultSize 10g
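If the scratch path referenced by spark.local.dir does not exist yet, it can be created up front; a minimal sketch using the path from the configuration above:
mkdir -p /data/spark_test/temp/spark-tmp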
- Change the Spark master WebUI port in the startup script:
vi sbin/start-master.sh
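In Spark 2.4.x, sbin/start-master.sh falls back to port 8080 when SPARK_MASTER_WEBUI_PORT is unset; a sketch of the relevant block, edited here to use 19080 to match the rest of this guide:
# Default fallback in sbin/start-master.sh; change the port as needed
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=19080
fi
Setting SPARK_MASTER_WEBUI_PORT in conf/spark-env.sh achieves the same result without editing the script.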
- Start the Spark services:
./sbin/start-all.sh
lsof -i:19080
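The running JVM processes can also be listed with jps (bundled with the JDK); expected output looks roughly like this (PIDs will differ):
jps
# 12345 Master
# 12346 Worker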
- Open the Spark web UI:
Find your own IP:
ifconfig | grep inet
Open the web UI in a browser: 127.0.0.1:19080 (or <your-ip>:19080)
- Run an example to verify that Spark works:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://<master ip:port shown on the web UI> ./examples/jars/spark-examples_2.11-2.4.3.jar
5. HDFS configuration
- Extract the package:
tar -xzvf hadoop-2.10.1.tar.gz
cd hadoop-2.10.1
- Edit etc/hadoop/hdfs-site.xml and add the following properties inside the <configuration> element:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/spark_test/temp/hdfs/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/spark_test/temp/hdfs/dfs/data</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
- Edit etc/hadoop/core-site.xml and add the following properties inside the <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://<your-ip>:19000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/spark_test/temp/hdfs-tmp</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
</property>
- Edit JAVA_HOME in etc/hadoop/hadoop-env.sh
Set your own JAVA_HOME in hadoop-env.sh.
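hadoop-env.sh already contains an export JAVA_HOME line; point it at the same JDK path configured in /etc/profile above (the path below is the one used in this guide and will differ on your machine):
# In etc/hadoop/hadoop-env.sh
export JAVA_HOME=/data/etc/java/jdk1.8.0_291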
- Format HDFS:
./bin/hdfs namenode -format
- Start DFS:
./sbin/start-dfs.sh
- Check that the port is listening:
lsof -i:19000
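jps should now also show the HDFS daemons (PIDs will differ):
jps
# 23456 NameNode
# 23457 DataNode
# 23458 SecondaryNameNode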
- Open the HDFS web UI: the default port is 50070
Find your own IP:
ifconfig | grep inet
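As a final smoke test, write a small file into HDFS and list it back; a minimal sketch run from the hadoop-2.10.1 directory, using the address configured in core-site.xml above:
# Create a test directory, upload a file, and list it through the configured NameNode address
./bin/hdfs dfs -mkdir -p /tmp/smoke-test
./bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/smoke-test/
./bin/hdfs dfs -ls hdfs://<your-ip>:19000/tmp/smoke-test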