Learning Hadoop: Setting Up a Cluster

I. Virtual machine configuration

1. Network configuration

[root@localhost]# cd /etc/sysconfig/network-scripts

[root@localhost network-scripts]# vi ifcfg-ens33

[root@localhost network-scripts]# service network restart

[root@localhost network-scripts]# ip addr

[root@localhost network-scripts]# ping www.baidu.com
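The original does not show what was changed in ifcfg-ens33. A minimal sketch of the keys usually set for a static address on CentOS 7 follows. Every address in it is a hypothetical placeholder for a VMware NAT network, and the file is written to a scratch directory rather than /etc/sysconfig/network-scripts so it can be reviewed first.

```shell
# Sketch only: all addresses are hypothetical placeholders, not taken from the original post.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
IFCFG="$OUT_DIR/ifcfg-ens33"

# Keys to set or change in the existing file; leave its other keys (UUID and so on) alone.
cat > "$IFCFG" <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.121.134
NETMASK=255.255.255.0
GATEWAY=192.168.121.2
DNS1=192.168.121.2
EOF

echo "wrote $IFCFG"
```

BOOTPROTO=static turns off DHCP and ONBOOT=yes brings the interface up at boot; the remaining values must match the virtual network actually in use.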

2. Hostname configuration

[root@localhost network-scripts]# hostname

[root@localhost network-scripts]# hostnamectl set-hostname hadoop001

[root@localhost network-scripts]# hostname

3. Map the hostname to its IP address

[root@localhost ]# vi /etc/hosts

[root@localhost ]# ping hadoop001

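III. Connecting to the virtual machine remotely

1. Remote connection

2. Create directories (for downloading and installing software packages later)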

[root@hadoop001 ]# mkdir /export/    # create the base directory

[root@hadoop001 ]# cd /export

[root@hadoop001 export ]# mkdir data    # data files

[root@hadoop001 export ]# mkdir software    # installation packages

[root@hadoop001 export ]# mkdir servers    # installed services
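The same layout can be created in one idempotent step. This is a small sketch, not the author's method; EXPORT_ROOT is a hypothetical variable that defaults to a scratch directory so the snippet can be tried safely.

```shell
# EXPORT_ROOT defaults to a scratch location; set EXPORT_ROOT=/export on a real node.
EXPORT_ROOT="${EXPORT_ROOT:-$(mktemp -d)/export}"

for sub in data software servers; do
    # -p creates missing parents and does not fail if the directory already exists
    mkdir -p "$EXPORT_ROOT/$sub"
done

ls "$EXPORT_ROOT"
```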

IV. Cloning the virtual machine

1. Create two linked clones, modify their network configuration, and restart the network service

[root@hadoop001 ~]# cd /etc/sysconfig/network-scripts

[root@hadoop001 network-scripts]# vi ifcfg-ens33

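Repeat the same steps on hadoop003, and change the hostname on each clone.

The remote connections to the clones then succeed.

2. Add hostname mappings for the other two machines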

[root@hadoop001 ]# vi /etc/hosts

3. Copy the hosts file to the other virtual machines

[root@hadoop001 ]# scp /etc/hosts hadoop002:/etc

[root@hadoop001 ]# scp /etc/hosts hadoop003:/etc
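The entries added to /etc/hosts are not shown in the original. A sketch of what they might look like, assuming three consecutive addresses on a hypothetical 192.168.121.0/24 network; the entries go to a scratch file by default so nothing under /etc is touched.

```shell
# Sketch only: the IP addresses are hypothetical; use the ones assigned to your VMs.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

i=0
for host in hadoop001 hadoop002 hadoop003; do
    # One "IP hostname" pair per line, the format /etc/hosts expects
    printf '192.168.121.%d %s\n' $((134 + i)) "$host" >> "$HOSTS_FILE"
    i=$((i + 1))
done

cat "$HOSTS_FILE"
```

Every node needs the same three lines, which is why the file is copied to hadoop002 and hadoop003 with scp.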

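4. Add the mappings on the host machine

Open the host machine's hosts file in Notepad++ and add the hostname-to-IP mappings for the three virtual machines.

V. Passwordless SSH login among the three virtual machines

1. Go to the .ssh directory and generate a key pair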

[root@hadoop001 .ssh]# ssh-keygen -t rsa

2. Inspect the generated files

3. Copy the public key to hadoop001, hadoop002, and hadoop003

[root@hadoop001 .ssh]# ssh-copy-id hadoop001

[root@hadoop001 .ssh]# ssh-copy-id hadoop002

[root@hadoop001 .ssh]# ssh-copy-id hadoop003

4. Check that passwordless login works

[root@hadoop001 .ssh]# ssh hadoop001

[root@hadoop001 .ssh]# exit

[root@hadoop001 .ssh]# ssh hadoop002

[root@hadoop001 .ssh]# exit

[root@hadoop001 .ssh]# ssh hadoop003

[root@hadoop001 .ssh]# exit
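The manual login checks above can also be scripted. The sketch below relies on the ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password. The function name check_hosts and the DRY_RUN switch are hypothetical helpers introduced for illustration; by default the commands are only printed.

```shell
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 on hadoop001 to run them.
DRY_RUN="${DRY_RUN:-1}"

check_hosts() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key shows up as a failure
        cmd="ssh -o BatchMode=yes -o ConnectTimeout=5 $host hostname"
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: $cmd"
        elif $cmd >/dev/null 2>&1; then
            echo "$host: passwordless login OK"
        else
            echo "$host: passwordless login FAILED"
        fi
    done
}

check_hosts hadoop001 hadoop002 hadoop003
```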

VI. Installing the JDK

1. Install an FTP tool

Drag the installation package into the virtual machine.

2. Extract the archive

[root@hadoop001 software]# tar -xzvf jdk-linux-x64.tar.gz -C /export/servers/

3. Configure the environment variables

[root@hadoop001 jdk1.8.0_131]# vi /etc/profile

4. Apply the environment variables

[root@hadoop001 jdk1.8.0_131]# source /etc/profile

5. Verify the installation

[root@hadoop001 jdk1.8.0_131]# java -version
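The lines added to /etc/profile are not reproduced in the original. A likely form, assuming the JDK was extracted to /export/servers/jdk1.8.0_131 (inferred from the extraction target and the directory name in the prompt):

```shell
# Assumed install path, inferred from "tar ... -C /export/servers/" and the prompt directory.
export JAVA_HOME=/export/servers/jdk1.8.0_131
# Append the JDK tools (java, javac, jps) to the search path
export PATH=$PATH:$JAVA_HOME/bin

echo "JAVA_HOME=$JAVA_HOME"
```

Appending rather than prepending keeps any system Java first on the PATH; put $JAVA_HOME/bin in front if this JDK should take precedence.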

VII. Installing Hadoop

1. Upload the archive

2. Extract the archive

[root@hadoop001 software]# tar -xzvf hadoop-3.1.4.tar.gz -C /export/servers

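Extraction succeeded.

3. Configure the environment variables

[root@hadoop001 servers]# vi /etc/profile

[root@hadoop001 servers]# source /etc/profile    # apply the environment variables

4. Configuration succeeded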
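As with the JDK, the Hadoop entries in /etc/profile are not shown. A sketch assuming the install path /export/servers/hadoop-3.1.4, which matches the directory used later in this post. Both bin and sbin go on the PATH because start-dfs.sh and start-yarn.sh live in sbin.

```shell
# Assumed install path, matching /export/servers/hadoop-3.1.4 used later in the post.
export HADOOP_HOME=/export/servers/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Optional check; only meaningful on a node where Hadoop is actually installed.
if command -v hadoop >/dev/null 2>&1; then
    hadoop version
else
    echo "hadoop not found on PATH (expected outside the cluster)"
fi
```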

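VIII. Configuring the Hadoop master node

1. Configuration file layout

Edit the files with Notepad++.

2. Configure hadoop-env.sh

3. Configure core-site.xml

4. Configure mapred-site.xml

5. Configure yarn-site.xml

6. Configure the workers file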
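The contents of these files appeared only as screenshots in the original. The sketch below writes a minimal set that is commonly used for a three-node Hadoop 3 cluster. The property names are standard Hadoop keys, but the port 9000, the hadoop.tmp.dir path, the choice of hadoop001 as ResourceManager, and the worker list are assumptions. Files go to a scratch directory unless CONF_DIR (a hypothetical variable) points elsewhere.

```shell
# Sketch only: values marked "assumed" are not from the original post.
# On a real node CONF_DIR would be /export/servers/hadoop-3.1.4/etc/hadoop,
# and the XML files below would OVERWRITE the existing ones.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# hadoop-env.sh: append the JDK location (appending keeps the stock file intact).
echo 'export JAVA_HOME=/export/servers/jdk1.8.0_131' >> "$CONF_DIR/hadoop-env.sh"

# core-site.xml: NameNode address and base directory for Hadoop's data.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop001:9000</value> <!-- port 9000 is assumed -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop</value> <!-- assumed path -->
  </property>
</configuration>
EOF

# mapred-site.xml: run MapReduce jobs on YARN.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# yarn-site.xml: ResourceManager host (assumed) and the shuffle service MapReduce needs.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop001</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

# workers: one DataNode/NodeManager host per line (assumes all three nodes are workers).
printf '%s\n' hadoop001 hadoop002 hadoop003 > "$CONF_DIR/workers"

ls "$CONF_DIR"
```

hdfs-site.xml is left out here on purpose: the post only configures it later, after the first startup attempt.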

IX. Starting the Hadoop cluster

1. Distribute the configuration

a) Copy the configured JDK and Hadoop to hadoop002 and hadoop003

[root@hadoop001 servers]# scp -r /export/servers/ hadoop002:/export/

[root@hadoop001 servers]# scp -r /export/servers/ hadoop003:/export/

b) Distribute the environment-variable file

[root@hadoop001 servers]# scp /etc/profile hadoop002:/etc

[root@hadoop001 servers]# scp /etc/profile hadoop003:/etc

c) Reload the environment variables on every virtual machine (source /etc/profile)

d) Format the cluster file system

[root@hadoop001 servers]# hdfs namenode -format

At this point Hadoop has created its storage directory structure.

2. Start the Hadoop cluster

a) List the Java processes (jps)

[root@hadoop001 servers]# jps

The output shows that no Hadoop Java processes are running yet.

b) Start HDFS

[root@hadoop001 servers]# start-dfs.sh

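Startup fails, so the configuration needs to be changed.

Grant the root user permission to start the daemons.

Add the following lines: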

export HDFS_NAMENODE_USER=root

export HDFS_DATANODE_USER=root

export HDFS_SECONDARYNAMENODE_USER=root

export YARN_RESOURCEMANAGER_USER=root

export YARN_NODEMANAGER_USER=root

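Start HDFS again.

c) An error appears

The SecondaryNameNode should be running on hadoop002.

To resolve this, go back and check the configuration files.

It turns out that hdfs-site.xml had never been modified.

Add the following configuration: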
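The added configuration is not reproduced in the original. Since the goal is to place the SecondaryNameNode on hadoop002, a minimal hdfs-site.xml would plausibly set dfs.namenode.secondary.http-address. The port 9868 is the Hadoop 3 default for that service; whether the author used it is an assumption. The file is written to a scratch directory by default.

```shell
# Sketch only: the author's actual hdfs-site.xml is not shown in the post.
HDFS_CONF_DIR="${HDFS_CONF_DIR:-$(mktemp -d)}"

cat > "$HDFS_CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <!-- Host and port where the SecondaryNameNode serves HTTP -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop002:9868</value>
  </property>
</configuration>
EOF

cat "$HDFS_CONF_DIR/hdfs-site.xml"
```

The changed file has to be present on all three nodes before HDFS is restarted.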

After the change, restart HDFS.

d) Start YARN

[root@hadoop001 servers]# start-yarn.sh

X. Testing the Hadoop cluster

1. Firewall changes

[root@hadoop001 ~]# firewall-cmd --state

Stop the firewall for the current session

[root@hadoop001 ~]# systemctl stop firewalld

Disable the firewall at boot so it stays off after a restart

[root@hadoop001 ~]# systemctl disable firewalld

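2. View Hadoop in the web UI

a) View HDFS

The HDFS web UI can be reached on port 9870, the Hadoop 3 default.

It can also be served on port 50070, the Hadoop 2 default.

For that, uncomment the following setting in Notepad++: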
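The commented-out setting itself is not reproduced in the original. One property that has this effect is dfs.namenode.http-address in hdfs-site.xml; that this is the setting the author uncommented is an assumption. The sketch writes the fragment to a scratch file for review.

```shell
# Sketch only: assumes the relevant setting is dfs.namenode.http-address.
FRAGMENT_50070="${FRAGMENT_50070:-$(mktemp)}"

cat > "$FRAGMENT_50070" <<'EOF'
<property>
  <!-- Address of the NameNode web UI; Hadoop 3 defaults to port 9870 -->
  <name>dfs.namenode.http-address</name>
  <value>hadoop001:50070</value>
</property>
EOF

cat "$FRAGMENT_50070"
```

With this property set, the NameNode UI listens on 50070 in place of the default 9870, and HDFS must be restarted for the change to take effect.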

After that, HDFS can be viewed on port 50070.

b) View YARN

3. Using HDFS shell commands

[root@hadoop001 ~]# hdfs dfs -ls /

[root@hadoop001 ~]# hdfs dfs -mkdir /test

Check the new directory in the web UI.

Upload a local file

[root@hadoop001 ~]# hdfs dfs -put hello.java /test

The upload succeeded.

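4. Example: computing on the Hadoop cluster with word count

This example runs the MapReduce word-count program that ships with Hadoop.

a) Change to this directory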

[root@hadoop001 hadoop-3.1.4]# cd /export/servers/hadoop-3.1.4/share/hadoop/mapreduce

b) Create two files in the current directory

[root@hadoop001 mapreduce]# vi a.txt

[root@hadoop001 mapreduce]# vi b.txt

c) Create an input directory under the HDFS root

[root@hadoop001 mapreduce]# hdfs dfs -mkdir /input

[root@hadoop001 mapreduce]# hdfs dfs -ls /

d) Upload a.txt and b.txt, just created on the local Linux machine, to the input directory on the cluster

[root@hadoop001 mapreduce]# hdfs dfs -put a.txt /input

[root@hadoop001 mapreduce]# hdfs dfs -put b.txt /input

e) Run the word count on the Hadoop cluster

[root@hadoop001 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.1.4.jar wordcount /input /output

Check the job on the cluster.

f) Fix the error

Inspect the error message.

Print the Hadoop classpath

[root@hadoop001 ~]# hadoop classpath

Modify yarn-site.xml
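Neither the error message nor the yarn-site.xml change is shown in the original. A common reason for pairing `hadoop classpath` with a yarn-site.xml edit is a job that fails because the MapReduce application master class cannot be found; the usual fix is to put the classpath output into yarn.application.classpath. Assuming that was the problem here, the sketch below builds the property fragment. When the hadoop command is not available it writes a clearly marked placeholder instead.

```shell
# Sketch only: assumes the fix is the yarn.application.classpath property.
if command -v hadoop >/dev/null 2>&1; then
    CLASSPATH_VALUE="$(hadoop classpath)"
else
    # Placeholder for machines without Hadoop; replace it with the real output.
    CLASSPATH_VALUE="PASTE_OUTPUT_OF_hadoop_classpath_HERE"
fi

YARN_FRAGMENT="${YARN_FRAGMENT:-$(mktemp)}"

# Unquoted EOF so that $CLASSPATH_VALUE is expanded inside the heredoc
cat > "$YARN_FRAGMENT" <<EOF
<property>
  <name>yarn.application.classpath</name>
  <value>$CLASSPATH_VALUE</value>
</property>
EOF

cat "$YARN_FRAGMENT"
```

The fragment goes inside the configuration element of yarn-site.xml.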

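Distribute the modified file to the other hosts.

Stop YARN (stop-yarn.sh).

Start YARN again (start-yarn.sh).

Run the word-count command again and check the job.

The job now completes successfully.

View the results

Using shell commands: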

[root@hadoop001 ~]# hdfs dfs -ls /output

[root@hadoop001 ~]# hdfs dfs -get /output/part-r-00000

[root@hadoop001 ~]# cat part-r-00000
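To sanity-check part-r-00000, the same count can be computed locally with standard tools. The contents of a.txt and b.txt are not given in the original, so the sample text below is hypothetical. The pipeline splits on whitespace, sorts, and counts duplicates, then prints word and count separated by a tab, the same layout the wordcount example writes.

```shell
# Sketch only: the sample file contents are hypothetical.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
printf 'hello hadoop\nhello world\n' > "$WORK_DIR/a.txt"
printf 'hello hdfs\n' > "$WORK_DIR/b.txt"

# One word per line -> sort -> count identical neighbours -> "word<TAB>count"
cat "$WORK_DIR/a.txt" "$WORK_DIR/b.txt" \
  | tr -s ' \t' '\n\n' \
  | sort \
  | uniq -c \
  | awk '{ printf "%s\t%s\n", $2, $1 }' > "$WORK_DIR/local-counts.txt"

cat "$WORK_DIR/local-counts.txt"
```

With this sample input, hello is counted 3 times and every other word once. Run against the real a.txt and b.txt, the counts should agree with part-r-00000.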

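5. Example: estimating π (the two arguments are the number of map tasks, 2, and the number of samples per map, 500)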

[root@hadoop001 mapreduce]# hadoop jar hadoop-mapreduce-examples-3.1.4.jar pi 2 500

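Check the running processes.

When the job finishes, it prints the estimated value of Pi.

At this point the Hadoop cluster is set up and its basic functions work.

You can now begin learning Hadoop.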

Sharing is welcome; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original article: http://outofmemory.cn/zaji/5696351.html
