Before setting up Pseudo-Distributed Operation, it is recommended to first follow the earlier post on building a Hadoop platform on a Linux server and complete the Hadoop installation.
Setting up the pseudo-distributed mode can be broken into the following steps:
- Edit etc/hadoop/core-site.xml, adding the following properties inside the <configuration> element:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoop-3.3.1/data/tmp</value>
  </property>
Replace localhost with your own hostname, and /hadoop/hadoop-3.3.1/data/tmp with the directory where you want temporary files stored. You do not need to create this directory by hand; it is created automatically when the later commands are run.
- Edit etc/hadoop/hdfs-site.xml, adding the following inside the <configuration> element:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
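Both files above use the same Hadoop-style XML layout: a list of `<property>` entries, each with a `<name>` and a `<value>`. As a quick sanity check on an edited file (just a sketch using the standard library, not part of the official setup), the properties can be parsed back out with `xml.etree`:

```python
import xml.etree.ElementTree as ET

# The core-site.xml content from the step above (example values;
# substitute your own hostname and temp directory).
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoop-3.3.1/data/tmp</value>
  </property>
</configuration>"""

def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop config file."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

conf = read_hadoop_conf(CORE_SITE)
print(conf["fs.defaultFS"])
```

Reading the file back like this catches the most common mistake at this stage: a property pasted outside the `<configuration>` element, which Hadoop silently ignores.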
- Check that you can ssh to localhost without a password:
$ ssh localhost
If this fails, run the following commands in order:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
The point of testing ssh localhost is that the later steps connect to the local machine repeatedly, and this lets them do so without prompting for a password each time.
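The chmod 0600 step matters because sshd refuses to use an authorized_keys file that is readable or writable by other users. A small illustrative sketch of that permission check (it operates on a throwaway temp file, not your real ~/.ssh/authorized_keys):

```python
import os
import stat
import tempfile

def is_private_enough(path):
    """True if the file's mode is exactly 0600 (owner read/write only),
    the permission sshd expects on ~/.ssh/authorized_keys."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600

# Demonstrate on a temp file rather than the real key file.
fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o644)            # too permissive, like a freshly copied file
loose = is_private_enough(path)  # False
os.chmod(path, 0o600)            # what `chmod 0600` in the step above does
tight = is_private_enough(path)  # True

os.remove(path)
print(loose, tight)
```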
- In principle, after steps 1–3 the Pseudo-Distributed Operation setup is already complete. What follows is how to verify that it actually works; in my case it took many failed attempts, and quite a few pitfalls, before it finally ran successfully.
The first time you run it, you need to format the filesystem first:
$ bin/hdfs namenode -format
If you run this command again after having formatted once, you will see something like:
INFO util.GSet: capacity = 2^14 = 16384 entries
Re-format filesystem in Storage Directory root= /hadoop/hadoop-3.3.1/data/tmp/dfs/name; location= null ? (Y or N)
This is the system asking whether you really want to format again; enter N to decline.
- Start HDFS with $ sbin/start-dfs.sh. The execution logs are written to /hadoop/hadoop-3.3.1/logs by default.
At first this failed for me with the following errors:
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZbp1fuss5tg7yiacreowzZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
The fix is to add the following to both start-dfs.sh and stop-dfs.sh:
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Running it again still fails, but the error message changes:
[root@localhost hadoop-3.3.1]# sbin/start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
Last login: Sat Jan 8 18:54:02 CST 2022 from 111.192.165.72 on pts/7
localhost: Warning: Permanently added 'localhost,172.28.78.227' (ECDSA) to the list of known hosts.
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
Last login: Sat Jan 8 18:54:49 CST 2022 on pts/7
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [localhost]
Last login: Sat Jan 8 18:54:49 CST 2022 on pts/7
localhost: ERROR: JAVA_HOME is not set and could not be found.
The fix is to add export JAVA_HOME=/usr/java/jdk1.8.0_311 to etc/hadoop/hadoop-env.sh (adjust the path to your own JDK installation).
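An alternative worth knowing (an option, not what I did above): Hadoop 3 also reads these user variables from etc/hadoop/hadoop-env.sh, so both fixes can live in one place instead of patching start-dfs.sh and stop-dfs.sh:

```shell
# etc/hadoop/hadoop-env.sh
# Point Hadoop at the JDK (adjust to your own installation path).
export JAVA_HOME=/usr/java/jdk1.8.0_311
# Allow the HDFS daemons to be started as root (same effect as editing
# start-dfs.sh / stop-dfs.sh directly).
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
```

Keeping the settings in hadoop-env.sh survives re-extracting or upgrading the sbin scripts.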
Running $ sbin/start-dfs.sh again now succeeds with no errors. At this point you can run the jps command and see that the daemons really are up:
[root@localhost hadoop-3.3.1]# jps
55718 SecondaryNameNode
55355 NameNode
55500 DataNode
55935 Jps
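A quick way to confirm programmatically that all the expected daemons are present (a sketch that parses a captured transcript; in practice you would feed it the real output of jps, e.g. via subprocess):

```python
# Check that the expected HDFS daemons appear in `jps` output.
JPS_OUTPUT = """\
55718 SecondaryNameNode
55355 NameNode
55500 DataNode
55935 Jps
"""

EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode"}

def running_daemons(jps_text):
    """Return the set of JVM process names reported by jps
    (each line is '<pid> <name>')."""
    return {line.split()[1] for line in jps_text.splitlines() if line.strip()}

missing = EXPECTED - running_daemons(JPS_OUTPUT)
print("all daemons up" if not missing else "missing: %s" % sorted(missing))
```

If NameNode is missing, the usual culprits are a skipped format step or stale data under hadoop.tmp.dir; check the logs directory mentioned above.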
- Once these processes are running, you can test the setup the way the official documentation suggests:
# Create the HDFS directories the job needs
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -mkdir input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
# Run the example job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep input output 'dfs[a-z.]+'
Running this prints a long stream of output: the execution log for the job. There is a lot of it, so it is not reproduced in full here.
- When the job finishes, run bin/hdfs dfs -cat output/* to view the result:
[root@localhost hadoop-3.3.1]# bin/hdfs dfs -cat output/*
1	dfsadmin
1	dfs.replication
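Conceptually, the example job is a distributed version of a very simple computation: scan the input files for matches of the regex dfs[a-z.]+ and count each distinct match. A single-machine sketch of that logic (the input lines here are made up for illustration; the real counts depend on your config files):

```python
import re
from collections import Counter

# Sample lines standing in for the *.xml files copied into `input`.
lines = [
    "<name>dfs.replication</name>",
    "run `hdfs dfsadmin -report` to inspect the cluster",
]

# The same pattern the example job was invoked with.
pattern = re.compile(r"dfs[a-z.]+")

# Count every distinct match across all lines.
counts = Counter(m for line in lines for m in pattern.findall(line))
for word, n in counts.items():
    print(n, word)
```

This is why the output above lists dfs.replication (from hdfs-site.xml) with a count of 1: that string matches the pattern exactly once in the uploaded config files.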
- When you have finished all the jobs you need to run, shut the daemons down with $ sbin/stop-dfs.sh.
And that's the whole write-up. Confetti~