Counting Word Occurrences in a File with MapReduce


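I. Requirements

1. Upload the file to be analyzed to HDFS.
2. Run MapReduce to count the occurrences of each word in the file.
3. Download the results to the local machine.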
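
Before Hadoop is involved, it can help to see what the job in step 2 is expected to compute. The following is a minimal local sketch using standard Unix tools, not the MapReduce implementation itself; the function name `count_words` and the sample text are assumptions made for illustration. The three pipeline stages loosely mirror the map, shuffle, and reduce phases.

```shell
#!/usr/bin/env bash
# Local word count: a rough stand-in for what the MapReduce job computes.
# count_words is a hypothetical helper name, not part of Hadoop.
count_words() {
    # "map":     split the input into one word per line (whitespace-separated)
    # "shuffle": sort so that identical words end up next to each other
    # "reduce":  count each run of identical words, then print word<TAB>count
    tr -s '[:space:]' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{print $2 "\t" $1}'
}

# Small demonstration on a temporary sample file
sample=$(mktemp)
printf 'hello hadoop\nhello world\n' > "$sample"
count_words "$sample"
rm -f "$sample"
```

For the sample text this prints `hadoop 1`, `hello 2`, and `world 1`, one tab-separated pair per line, which is the same word/count layout the Hadoop example job writes to its output directory.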

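II. Environment

This experiment uses the VirtualBox virtual machine manager to install Ubuntu 18.04, and then installs and configures JDK 1.8 and Hadoop 3.3.1 inside Ubuntu.

2.1 Installing VirtualBox on Windows

(1) Get the VirtualBox 6.1.30 installer
Go to the VirtualBox website, https://www.virtualbox.org/, and download the package for Windows hosts. If anything is unclear, consult the official documentation.
(2) Install VirtualBox
Run the installer; when the setup wizard appears, click Next.

  • Set the installation path and click Next.
  • Select the features you need.

    If you are warned that the network connection will be interrupted during installation, click Yes.
    On the ready-to-install screen, click Install. To change any setting, click Back first.
    Wait 1 to 3 minutes for the installation to finish.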
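2.2 Installing Ubuntu 18.04 in VirtualBox

(1) Get Ubuntu 18.04

  • Go to the Ubuntu website and download the version you need.

  • This article uses the standard ubuntu-18.04.6-desktop-amd64.iso image; choose another version if you prefer.

    Note: an LTS release is recommended. Each Ubuntu LTS release is maintained for 10 years in total: 5 years of standard support plus 5 years of ESM. Also, as of December 24, 2021, download speed from the official website was normal.

(2) Create an Ubuntu virtual machine in VirtualBox

  • Open VirtualBox and press Ctrl+N to create a new virtual machine (you can also create one from a selected group). Set the machine folder, the type and name of the virtual machine, and the Ubuntu version, then click Next.
  • Allocate memory to the virtual machine according to the host's hardware, then click Next.

    Note: if you use Ubuntu with a graphical desktop and plan to run an IDE such as Eclipse, allocate more memory to keep the IDE from crashing.
  • Choose to create a new virtual hard disk and click Create.
  • Choose the virtual hard disk file type; the default is fine.
  • Choose dynamically allocated storage and click Next.
  • Choose the file location and size, then click Create.

    The Ubuntu virtual machine has now been created in VirtualBox.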

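(3) Attach the image file in VirtualBox

  • Start the virtual machine.
  • Click Register (注册) and browse to the folder where the Ubuntu image is saved.
  • Select the matching Ubuntu image.
  • Click Start.

    The Ubuntu image is now attached.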

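(4) Boot and install the Ubuntu virtual machine

  • Optionally set the language to Chinese, then choose Install Ubuntu.
  • Select the Chinese keyboard layout and click Continue.
  • On the "Updates and other software" screen, choose Normal installation and click Continue.

    Note: downloading updates during installation is not recommended, as it is very slow.
  • Keep the default installation type and click Install Now.
    Note: if a dialog asks whether to write the changes to disk, click Continue.
  • Select the time zone (Shanghai, China) and click Continue.
  • Set the computer name, user name, and password, then click Continue.
  • The installation starts and takes a while.

    Note: disconnecting from the network makes the installation faster.

    Ubuntu is now installed. Restart the virtual machine to boot into Ubuntu.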
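2.3 Preparation before installation

(1) Install Guest Additions

  • Choose Devices -> Optical Drives -> select VBoxGuestAdditions.iso.
  • Open Files -> select VBox_GAs_6.1.26 -> right-click anywhere -> choose Open in Terminal.
  • Run the VBoxLinuxAdditions.run script: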
bai@bai:/media/bai/VBox_GAs_6.1.26$ sudo ./VBoxLinuxAdditions.run 
[sudo] bai 的密码: 
Verifying archive integrity... All good.
Uncompressing VirtualBox 6.1.26 Guest Additions for Linux........
VirtualBox Guest Additions installer
Removing installed version 6.1.26 of VirtualBox Guest Additions...
Copying additional installer modules ...
Installing additional modules ...
VirtualBox Guest Additions: Starting.
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel 
modules.  This may take a while.
VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions:   /sbin/rcvboxadd quicksetup 
VirtualBox Guest Additions: or
VirtualBox Guest Additions:   /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Building the modules for kernel 5.4.0-91-generic.

This system is currently not set up to build kernel modules.
Please install the gcc make perl packages from your distribution.
VirtualBox Guest Additions: Running kernel modules will not be replaced until 
the system is restarted
 
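  • Reboot. The output above reports that the kernel modules could not be built because gcc, make, and perl are missing; if Guest Additions features such as drag and drop do not work after the reboot, install those packages and run the script again.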
bai@bai:/media/bai/VBox_GAs_6.1.26$ sudo reboot

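(2) Update APT so that packages can be installed without errors

  • After rebooting, the terminal should open in the home directory (~); if it does not, return to it with:
bai@bai:/media/bai/VBox_GAs_6.1.26$ cd
  • Update the package lists: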
bai@bai:~$ sudo apt-get update
命中:1 http://cn.archive.ubuntu.com/ubuntu bionic InRelease
获取:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
获取:3 http://cn.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB] 
获取:4 http://cn.archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
已下载 252 kB,耗时 2秒 (126 kB/s)                            
正在读取软件包列表... 完成

(3) Install vim with apt-get

bai@bai:~$ sudo apt-get install vim
正在读取软件包列表... 完成
正在分析软件包的依赖关系树       
正在读取状态信息... 完成       
将会同时安装下列软件:
  vim-runtime
建议安装:
  ctags vim-doc vim-scripts
下列【新】软件包将被安装:
  vim vim-runtime
升级了 0 个软件包,新安装了 2 个软件包,要卸载 0 个软件包,有 15 个软件包未被升级。
需要下载 6,588 kB 的归档。
解压缩后会消耗 32.0 MB 的额外空间。
您希望继续执行吗? [Y/n] Y
获取:1 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 vim-runtime all 2:8.0.1453-1ubuntu1.7 [5,435 kB]
获取:2 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 vim amd64 2:8.0.1453-1ubuntu1.7 [1,153 kB]
已下载 6,588 kB,耗时 27秒 (240 kB/s)                                         
正在选中未选择的软件包 vim-runtime。
(正在读取数据库 ... 系统当前共安装有 165575 个文件和目录。)
正准备解包 .../vim-runtime_2%3a8.0.1453-1ubuntu1.7_all.deb  ...
正在添加 vim-runtime 导致 /usr/share/vim/vim80/doc/help.txt 转移到 /usr/share/vim/vim80/doc/help.txt.vim-tiny
正在添加 vim-runtime 导致 /usr/share/vim/vim80/doc/tags 转移到 /usr/share/vim/vim80/doc/tags.vim-tiny
正在解包 vim-runtime (2:8.0.1453-1ubuntu1.7) ...
正在选中未选择的软件包 vim。
正准备解包 .../vim_2%3a8.0.1453-1ubuntu1.7_amd64.deb  ...
正在解包 vim (2:8.0.1453-1ubuntu1.7) ...
正在设置 vim-runtime (2:8.0.1453-1ubuntu1.7) ...
正在设置 vim (2:8.0.1453-1ubuntu1.7) ...
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/vim (vim)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/vimdiff (vimdiff)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/rvim (rvim)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/rview (rview)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/vi (vi)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/view (view)
update-alternatives: 使用 /usr/bin/vim.basic 来在自动模式中提供 /usr/bin/ex (ex)
正在处理用于 man-db (2.8.3-2ubuntu0.1) 的触发器 ...

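Note: vim is an enhanced version of vi. Its main improvements are unlimited undo, support for multiple platforms, syntax highlighting, and full compatibility with vi.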

(4) Install SSH and configure passwordless SSH login

  • Install the SSH server:
bai@bai:~$ sudo apt-get install openssh-server
正在读取软件包列表... 完成
正在分析软件包的依赖关系树       
正在读取状态信息... 完成       
将会同时安装下列软件:
  ncurses-term openssh-sftp-server ssh-import-id
建议安装:
  molly-guard monkeysphere rssh ssh-askpass
下列【新】软件包将被安装:
  ncurses-term openssh-server openssh-sftp-server ssh-import-id
升级了 0 个软件包,新安装了 4 个软件包,要卸载 0 个软件包,有 15 个软件包未被升级。
需要下载 637 kB 的归档。
解压缩后会消耗 5,320 kB 的额外空间。
您希望继续执行吗? [Y/n] Y
获取:1 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 ncurses-term all 6.1-1ubuntu1.18.04 [248 kB]
获取:2 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 openssh-sftp-server amd64 1:7.6p1-4ubuntu0.5 [45.5 kB]
获取:3 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 openssh-server amd64 1:7.6p1-4ubuntu0.5 [332 kB]
获取:4 http://cn.archive.ubuntu.com/ubuntu bionic-updates/main amd64 ssh-import-id all 5.7-0ubuntu1.1 [10.9 kB]
已下载 637 kB,耗时 3秒 (187 kB/s)  
正在预设定软件包 ...
正在选中未选择的软件包 ncurses-term。
(正在读取数据库 ... 系统当前共安装有 167342 个文件和目录。)
正准备解包 .../ncurses-term_6.1-1ubuntu1.18.04_all.deb  ...
正在解包 ncurses-term (6.1-1ubuntu1.18.04) ...
正在选中未选择的软件包 openssh-sftp-server。
正准备解包 .../openssh-sftp-server_1%3a7.6p1-4ubuntu0.5_amd64.deb  ...
正在解包 openssh-sftp-server (1:7.6p1-4ubuntu0.5) ...
正在选中未选择的软件包 openssh-server。
正准备解包 .../openssh-server_1%3a7.6p1-4ubuntu0.5_amd64.deb  ...
正在解包 openssh-server (1:7.6p1-4ubuntu0.5) ...
正在选中未选择的软件包 ssh-import-id。
正准备解包 .../ssh-import-id_5.7-0ubuntu1.1_all.deb  ...
正在解包 ssh-import-id (5.7-0ubuntu1.1) ...
正在设置 ncurses-term (6.1-1ubuntu1.18.04) ...
正在设置 openssh-sftp-server (1:7.6p1-4ubuntu0.5) ...
正在设置 ssh-import-id (5.7-0ubuntu1.1) ...
正在设置 openssh-server (1:7.6p1-4ubuntu0.5) ...

Creating config file /etc/ssh/sshd_config with new version
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:5kDSgl4H7cEHly/ZV3CCXl4iSRnnQIgq2JQtCeMBZF0 root@bai (RSA)
Creating SSH2 ECDSA key; this may take some time ...
256 SHA256:iNfpkF4SoqVxKQm0YVEEnmNEmqR+eqxjsVEg8AQOCW4 root@bai (ECDSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:pFr7Mn9sF5Bh1zF2zCdIp17mhBHhizdsnF2gT9u6oIw root@bai (ED25519)
Created symlink /etc/systemd/system/sshd.service → /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service → /lib/systemd/system/ssh.service.
正在处理用于 man-db (2.8.3-2ubuntu0.1) 的触发器 ...
正在处理用于 ufw (0.36-0ubuntu0.18.04.1) 的触发器 ...
正在处理用于 ureadahead (0.100.0-21) 的触发器 ...
正在处理用于 systemd (237-3ubuntu10.52) 的触发器 ...

Note: Ubuntu ships with the SSH client installed by default, so only the server needs to be installed.

  • Log in to the local machine:
bai@bai:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:iNfpkF4SoqVxKQm0YVEEnmNEmqR+eqxjsVEg8AQOCW4.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
bai@localhost's password: 
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-91-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

15 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

Your Hardware Enablement Stack (HWE) is supported until April 2023.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
  • Set up passwordless login:
bai@bai:~$ exit
注销
Connection to localhost closed.
bai@bai:~$ cd ~/.ssh/	# if this directory does not exist, run ssh localhost once first
bai@bai:~/.ssh$ ssh-keygen -t rsa	# press Enter at every prompt
Generating public/private rsa key pair.
Enter file in which to save the key (/home/bai/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/bai/.ssh/id_rsa.
Your public key has been saved in /home/bai/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:WB6pFjm39fJ7qvCpV76ObLPIdWsFHdhXCNxg1Vm6lyM bai@bai
The key's randomart image is:
+---[RSA 2048]----+
|           .+B.o*|
|       . . .o =oo|
|      + = .  ..o |
|       O + .. ...|
|      + S . .E.o.|
|     .     o. o..|
|        . .oo.   |
|       . *++oo.  |
|        ++B**+   |
+----[SHA256]-----+
bai@bai:~/.ssh$ cat ./id_rsa.pub >> ./authorized_keys  # authorize the key
  • Log in to the local machine again (this time no password is required):
bai@bai:~/.ssh$ ssh localhost
Welcome to Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-91-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

15 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

New release '20.04.3 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Your Hardware Enablement Stack (HWE) is supported until April 2023.
Last login: Sat Dec 25 11:42:49 2021 from 127.0.0.1
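
The passwordless-login steps above can be wrapped in a small function. The sketch below also tightens permissions, a step the transcript does not show: with its default StrictModes setting, sshd ignores keys when the `.ssh` directory or `authorized_keys` is writable by other users. The function name `authorize_key` and its two arguments are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# authorize_key PUBKEY_FILE SSH_DIR
# Appends a public key to SSH_DIR/authorized_keys unless it is already listed,
# then restricts permissions. Hypothetical helper, for illustration only.
authorize_key() {
    local pubkey="$1" ssh_dir="$2"
    mkdir -p "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -x: match whole lines, -F: fixed strings, -f: read the pattern from the key file
    if ! grep -qxFf "$pubkey" "$ssh_dir/authorized_keys"; then
        cat "$pubkey" >> "$ssh_dir/authorized_keys"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Typical call for the setup in this article (key created by ssh-keygen -t rsa):
# authorize_key ~/.ssh/id_rsa.pub ~/.ssh
```

Unlike a plain `cat >>`, running the function twice does not add the same key a second time.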

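2.4 Installing and configuring Java

(1) Get the JDK

  • Go to the Java website and download the matching package.
  • Tick the checkbox and click Download.

    Note: you may need to register an Oracle account.

(2) Transfer the JDK into Ubuntu

Method 1: drag and drop

  • Create a folder to hold installation packages: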
bai@bai:~$ mkdir Downloads

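Open Files -> select Downloads -> drag the required files from the host into the virtual machine.

Note: this method only works once Guest Additions is installed.

Method 2: File Manager

  • Choose Machine -> File Manager.
  • Transfer the files from the host to the virtual machine.
  • Wait until the transfer is shown as complete.

    Note: all later transfers into Ubuntu use one of these two methods.

(3) Install the JDK

  • Go to /usr/lib and create the jvm directory: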
bai@bai:~$ cd /usr/lib
bai@bai:/usr/lib$ sudo mkdir jvm	# create the jvm directory to hold the JDK files
[sudo] bai 的密码: 
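  • Go to the download folder and extract the JDK archive:
bai@bai:/usr/lib$ cd ~/Downloads	# enter the download directory
bai@bai:~/Downloads$ sudo tar -zxf ./jdk-8u311-linux-x64.tar.gz -C /usr/lib/jvm # extract the JDK archive into /usr/lib/jvm

Note: adding -v to tar prints each file as it is extracted; it is optional here.

  • Go to the jvm directory and check the extracted JDK version: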
bai@bai:~/Downloads$ cd /usr/lib/jvm
bai@bai:/usr/lib/jvm$ ls
jdk1.8.0_311
  • Configure the environment variables:
bai@bai:/usr/lib/jvm$ vim ~/.bashrc	# open the user's shell configuration file

Add the following lines:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_311
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Make the configuration take effect immediately:

bai@bai:/usr/lib/jvm$ source ~/.bashrc
  • Check the Java version:
bai@bai:/usr/lib/jvm$ java -version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.311-b11, mixed mode)
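
Editing `~/.bashrc` by hand works, but repeating the step tends to leave duplicate export lines behind. As a hedged alternative, the sketch below appends the same four exports only once. The function name `add_java_env` and the marker comment it writes are assumptions made for illustration; they are not part of the JDK or of bash.

```shell
#!/usr/bin/env bash
# add_java_env RC_FILE JAVA_HOME_DIR
# Appends the JAVA_HOME/JRE_HOME/CLASSPATH/PATH exports to RC_FILE once.
# A marker comment (an assumption of this sketch) detects earlier runs,
# so calling the function again leaves the file unchanged.
add_java_env() {
    local rc_file="$1" java_home="$2"
    local marker="# >>> java env (added by add_java_env) >>>"
    if grep -qF "$marker" "$rc_file" 2>/dev/null; then
        return 0
    fi
    {
        echo "$marker"
        echo "export JAVA_HOME=$java_home"
        # Single quotes keep the variable references literal in the rc file
        echo 'export JRE_HOME=${JAVA_HOME}/jre'
        echo 'export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib'
        echo 'export PATH=${JAVA_HOME}/bin:$PATH'
    } >> "$rc_file"
}

# Typical call for this article's layout:
# add_java_env ~/.bashrc /usr/lib/jvm/jdk1.8.0_311 && source ~/.bashrc
```

After sourcing the file, `java -version` should report the same version as shown above.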
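2.5 Installing Hadoop 3.3.1

(1) Download Hadoop from the official website and transfer it into Ubuntu

  • Download locations (the official site can be slow, so a mirror is recommended):
    1. The official Hadoop 3.3.1 download page
    2. The Huawei mirror for Hadoop 3.3.1

  • Transfer Hadoop into Ubuntu
    Use drag and drop or the File Manager as in section 2.4; the steps are not repeated here.

(2) Install Hadoop 3.3.1

  • Extract Hadoop: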
bai@bai:~$ sudo tar -zxf ~/Downloads/hadoop-3.3.1.tar.gz -C /usr/local
[sudo] bai 的密码:
  • Rename the folder to hadoop and change its ownership:
bai@bai:~$ cd /usr/local/
bai@bai:/usr/local$ sudo mv ./hadoop-3.3.1/ ./hadoop
bai@bai:/usr/local$ sudo chown -R bai:bai ./hadoop	

Note: ownership is given as user:group, here bai:bai, the user created when the system was installed.

  • Check the Hadoop version:
bai@bai:~$ cd /usr/local/hadoop/
bai@bai:/usr/local/hadoop$ ./bin/hadoop version
Hadoop 3.3.1
Source code repository https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2
Compiled by ubuntu on 2021-06-15T05:13Z
Compiled with protoc 3.7.1
From source with checksum 88a4ddb2299aca054416d6b7f81ca55
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar

(3) Pseudo-distributed Hadoop configuration

  • Go to the /usr/local/hadoop/etc/hadoop directory:
bai@bai:/usr/local/hadoop$ cd /usr/local/hadoop/etc/hadoop
  • Edit the core-site.xml configuration file
    Open core-site.xml:
bai@bai:/usr/local/hadoop/etc/hadoop$ vim core-site.xml

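Change the configuration to:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>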

  • Edit the hdfs-site.xml configuration file
    Open hdfs-site.xml:
bai@bai:/usr/local/hadoop/etc/hadoop$ vim hdfs-site.xml

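Change the configuration to:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>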
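
For repeatable setups, the two files can also be generated rather than typed in vim. The following is a sketch under stated assumptions: the function name `write_pseudo_dist_conf` is hypothetical, the property names and values are the ones used in this article, and the function overwrites any existing core-site.xml and hdfs-site.xml in the target directory.

```shell
#!/usr/bin/env bash
# write_pseudo_dist_conf CONF_DIR [HADOOP_DIR]
# Writes core-site.xml and hdfs-site.xml for a single-node, pseudo-distributed
# setup. Hypothetical helper; existing files in CONF_DIR are overwritten.
write_pseudo_dist_conf() {
    local conf_dir="$1" hadoop_dir="${2:-/usr/local/hadoop}"

    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:$hadoop_dir/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:$hadoop_dir/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:$hadoop_dir/tmp/dfs/data</value>
    </property>
</configuration>
EOF
}

# Typical call for this article's layout:
# write_pseudo_dist_conf /usr/local/hadoop/etc/hadoop
```

A replication factor of 1 fits this setup because a single machine runs the only DataNode, so there is nowhere to place additional copies of a block.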

  • Format the NameNode:
bai@bai:/usr/local/hadoop/etc/hadoop$ cd /usr/local/hadoop
bai@bai:/usr/local/hadoop$ ./bin/hdfs namenode -format
WARNING: /usr/local/hadoop/logs does not exist. Creating.
2021-12-25 14:14:33,160 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = bai/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.3.1
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/jul-to-slf4j-1.7.30.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.7.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-client-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-common-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-server-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration2-2.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-core-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.4.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-server-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/dnsjava-2.1.7.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-3.3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.10.6.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/token-provider-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/animal-sniffer-annotations-1.17.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/usr/local/hadoo
p/share/hadoop/common/lib/slf4j-api-1.7.30.jar:/usr/local/hadoop/share/hadoop/common/lib/j2objc-annotations-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/checker-qual-2.5.2.jar:/usr/local/hadoop/share/hadoop/common/lib/audience-annotations-0.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.55.jar:/usr/local/hadoop/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-http-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-3.3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/metrics-core-3.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-5.0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-servlet-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr311-api-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jakarta.activation-api-1.2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-config-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.1.8.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/
local/hadoop/share/hadoop/common/lib/jersey-core-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-jute-3.5.6.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.5.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/usr/local/hadoop/share/hadoop/common/lib/woodstox-core-5.3.0.jar:/usr/local/hadoop/share/hadoop/common/lib/failureaccess-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/accessors-smart-2.4.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.9.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-text-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-databind-2.10.5.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.11.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/nimbus-jose-jwt-9.8.1.jar:/usr/local/hadoop/share/hadoop/common/lib/json-smart-2.4.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-servlet-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-ajax-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-2.10.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-security-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-annotations-2.10.5.jar:/usr/local/hadoop/share/hadoop/common/lib/stax2-api-4.2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.5.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/ha
doop/share/hadoop/common/lib/re2j-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-xml-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang3-3.7.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-webapp-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-io-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/common/hadoop-kms-3.3.1.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/common/hadoop-registry-3.3.1.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.1.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/okio-1.6.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/avro-1.7.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-shaded-guava-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-net-3.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/httpcore-4.4.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-server-9.4.40.
v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/dnsjava-2.1.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-annotations-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.10.6.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/animal-sniffer-annotations-1.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/j2objc-annotations-1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/checker-qual-2.5.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/audience-annotations-0.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsch-0.1.55.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-http-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-auth-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-5.0.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-servlet-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jakarta.activation-api-1.2.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jettison-1.1.jar:/usr/l
ocal/hadoop/share/hadoop/hdfs/lib/curator-framework-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.1.61.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/curator-client-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/snappy-java-1.1.8.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/zookeeper-jute-3.5.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/zookeeper-3.5.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/woodstox-core-5.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/failureaccess-1.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/accessors-smart-2.4.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-beanutils-1.9.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-text-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.8.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-databind-2.10.5.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/curator-recipes-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/nimbus-jose-jwt-9.8.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/json-smart-2.4.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-u
til-ajax-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-2.10.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-compress-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-security-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-annotations-2.10.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/stax2-api-4.2.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/httpclient-4.5.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/re2j-1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-xml-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang3-3.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-webapp-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-io-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-client-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-native-client-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-native-client-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-3.3.1.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-client-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-nativetask-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-
app-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.1-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-uploader-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn:/usr/local/hadoop/share/hadoop/yarn/lib/jline-3.9.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/bcprov-jdk15on-1.60.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jna-5.2.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/snakeyaml-1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/HikariCP-java7-2.4.12.jar:/usr/local/hadoop/share/hadoop/yarn/lib/websocket-common-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/websocket-server-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-plus-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-commons-9.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax-websocket-client-impl-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-tree-9.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.websocket-client-api-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/bcpkix-jdk15on-1.60.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-4.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-base-2.10.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-annotations-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/objenesis-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/geronimo-jcache_1.0_spec-1.0-alpha-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.19.jar:/usr/local/hadoop/share/hadoop/yarn/lib/websocket-api-9.4.40.v20210413.jar:/usr/
local/hadoop/share/hadoop/yarn/lib/ehcache-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/websocket-servlet-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/metrics-core-3.2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-client-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-analysis-9.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-4.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/websocket-client-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/json-io-2.5.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/fst-2.50.jar:/usr/local/hadoop/share/hadoop/yarn/lib/java-util-1.9.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-json-provider-2.10.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/mssql-jdbc-6.2.1.jre7.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.19.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jakarta.xml.bind-api-2.3.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/swagger-annotations-1.5.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-module-jaxb-annotations-2.10.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-jndi-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax-websocket-server-impl-9.4.40.v20210413.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.websocket-api-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-router-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-
sharedcachemanager-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-services-api-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-services-core-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-mawo-core-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-3.3.1.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-3.3.1.jar
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r a3b9c37a397ad4188041dd80621bdeefc46885f2; compiled by 'ubuntu' on 2021-06-15T05:13Z
STARTUP_MSG:   java = 1.8.0_311
************************************************************/
2021-12-25 14:14:33,233 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-25 14:14:33,655 INFO namenode.NameNode: createNameNode [-format]
2021-12-25 14:14:35,948 INFO namenode.NameNode: Formatting using clusterid: CID-c17d9f47-eaed-4867-9209-368bd3d069f0
2021-12-25 14:14:36,150 INFO namenode.FSEditLog: Edit logging is async:true
2021-12-25 14:14:36,272 INFO namenode.FSNamesystem: KeyProvider: null
2021-12-25 14:14:36,274 INFO namenode.FSNamesystem: fsLock is fair: true
2021-12-25 14:14:36,279 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2021-12-25 14:14:36,305 INFO namenode.FSNamesystem: fsOwner                = bai (auth:SIMPLE)
2021-12-25 14:14:36,322 INFO namenode.FSNamesystem: supergroup             = supergroup
2021-12-25 14:14:36,322 INFO namenode.FSNamesystem: isPermissionEnabled    = true
2021-12-25 14:14:36,322 INFO namenode.FSNamesystem: isStoragePolicyEnabled = true
2021-12-25 14:14:36,323 INFO namenode.FSNamesystem: HA Enabled: false
2021-12-25 14:14:36,460 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2021-12-25 14:14:36,479 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2021-12-25 14:14:36,486 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2021-12-25 14:14:36,519 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2021-12-25 14:14:36,520 INFO blockmanagement.BlockManager: The block deletion will start around 2021 十二月 25 14:14:36
2021-12-25 14:14:36,525 INFO util.GSet: Computing capacity for map BlocksMap
2021-12-25 14:14:36,526 INFO util.GSet: VM type       = 64-bit
2021-12-25 14:14:36,554 INFO util.GSet: 2.0% max memory 2.4 GB = 48.2 MB
2021-12-25 14:14:36,555 INFO util.GSet: capacity      = 2^23 = 8388608 entries
2021-12-25 14:14:36,628 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
2021-12-25 14:14:36,634 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2021-12-25 14:14:36,649 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.999
2021-12-25 14:14:36,662 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2021-12-25 14:14:36,662 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2021-12-25 14:14:36,664 INFO blockmanagement.BlockManager: defaultReplication         = 1
2021-12-25 14:14:36,664 INFO blockmanagement.BlockManager: maxReplication             = 512
2021-12-25 14:14:36,667 INFO blockmanagement.BlockManager: minReplication             = 1
2021-12-25 14:14:36,667 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2021-12-25 14:14:36,668 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2021-12-25 14:14:36,668 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2021-12-25 14:14:36,668 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2021-12-25 14:14:36,767 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
2021-12-25 14:14:36,774 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
2021-12-25 14:14:36,775 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
2021-12-25 14:14:36,775 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
2021-12-25 14:14:36,971 INFO util.GSet: Computing capacity for map INodeMap
2021-12-25 14:14:36,972 INFO util.GSet: VM type       = 64-bit
2021-12-25 14:14:36,973 INFO util.GSet: 1.0% max memory 2.4 GB = 24.1 MB
2021-12-25 14:14:36,973 INFO util.GSet: capacity      = 2^22 = 4194304 entries
2021-12-25 14:14:36,976 INFO namenode.FSDirectory: ACLs enabled? true
2021-12-25 14:14:36,977 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2021-12-25 14:14:36,979 INFO namenode.FSDirectory: XAttrs enabled? true
2021-12-25 14:14:36,987 INFO namenode.NameNode: Caching file names occurring more than 10 times
2021-12-25 14:14:36,993 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2021-12-25 14:14:37,018 INFO snapshot.SnapshotManager: SkipList is disabled
2021-12-25 14:14:37,052 INFO util.GSet: Computing capacity for map cachedBlocks
2021-12-25 14:14:37,053 INFO util.GSet: VM type       = 64-bit
2021-12-25 14:14:37,058 INFO util.GSet: 0.25% max memory 2.4 GB = 6.0 MB
2021-12-25 14:14:37,059 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2021-12-25 14:14:37,090 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2021-12-25 14:14:37,095 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2021-12-25 14:14:37,134 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2021-12-25 14:14:37,152 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2021-12-25 14:14:37,163 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2021-12-25 14:14:37,174 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2021-12-25 14:14:37,175 INFO util.GSet: VM type       = 64-bit
2021-12-25 14:14:37,175 INFO util.GSet: 0.029999999329447746% max memory 2.4 GB = 740.0 KB
2021-12-25 14:14:37,175 INFO util.GSet: capacity      = 2^17 = 131072 entries
2021-12-25 14:14:37,345 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1137935780-127.0.1.1-1640412877306
2021-12-25 14:14:37,456 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
2021-12-25 14:14:37,564 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-12-25 14:14:38,019 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 398 bytes saved in 0 seconds .
2021-12-25 14:14:38,098 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-12-25 14:14:38,348 INFO namenode.FSNamesystem: Stopping services started for active state
2021-12-25 14:14:38,349 INFO namenode.FSNamesystem: Stopping services started for standby state
2021-12-25 14:14:38,378 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-12-25 14:14:38,379 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bai/127.0.1.1
************************************************************/
  • Start HDFS with start-dfs.sh
bai@bai:/usr/local/hadoop$ cd /usr/local/hadoop
bai@bai:/usr/local/hadoop$ ./sbin/start-dfs.sh  # start-dfs.sh is a single executable name with no spaces in it
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [bai]
  • Check the running processes
bai@bai:/usr/local/hadoop$ jps
2337 SecondaryNameNode
2146 DataNode
2012 NameNode
2556 Jps
  • Stop Hadoop:
bai@bai:/usr/local/hadoop$ ./sbin/stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [bai]
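
After start-dfs.sh, the jps listing shown earlier should contain NameNode, DataNode, and SecondaryNameNode. As a minimal sketch, the check can be automated on captured jps text. The class name JpsCheck and its method are hypothetical helpers written for this tutorial, not part of Hadoop.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: checks a captured jps listing for the three HDFS daemons. */
final class JpsCheck {
    // Daemons that start-dfs.sh launched in the transcript above.
    static final List<String> EXPECTED =
            Arrays.asList("NameNode", "DataNode", "SecondaryNameNode");

    /** Returns the expected daemon names that do not appear in the jps output. */
    static Set<String> missingDaemons(String jpsOutput) {
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            // Each jps line has the form "<pid> <main class name>".
            if (parts.length == 2) {
                missing.remove(parts[1]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "2337 SecondaryNameNode\n2146 DataNode\n2012 NameNode\n2556 Jps\n";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

With the listing from this tutorial the set is empty. A non-empty result usually means one of the daemons failed to start and its log is worth reading.
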
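2.6 Installing Eclipse

(1) Open the software center and search for eclipse.

Note: once it is installed, find Eclipse under the installed applications in the software center and launch it from there.
(2) Alternatively, download Eclipse from the official website and upload it to the Ubuntu system (same procedure as above).
Eclipse official website:

  • Extract the archive to the target directory: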
bai@bai:~$ sudo tar zxf ~/Downloads/eclipse-dsl-2021-12-R-linux-gtk-x86_64.tar.gz -C /usr/local
[sudo] bai 的密码: 
bai@bai:/usr/local$ sudo chown -R bai:bai eclipse # give the user ownership of the directory
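  • Open Files -> Other Locations -> go to the install directory -> select eclipse

    Note: an Eclipse build downloaded directly from the official website may fail to start, for example because it does not match the installed JDK version. While researching this I found a blogger who had already summarized the possible fixes, so they are not repeated here.

(3) What to do when Eclipse will not start
1) The error "An error has occurred. See the log file" appears (the log contains !MESSAGE FrameworkEvent ERROR !STACK 0).
2) If the only problem is a JDK version mismatch, reported as "Version XXX of the JVM is not suitable for this product. Version: XXX", point Eclipse at a specific JDK
by adding the following to eclipse.ini (before any -vmargs line):

-vm
/path/to/JDK

Note: in practice the quickest fix is to restart and check whether Eclipse opens normally. If it still does not, delete the downloaded packages, download them again, and retry.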

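2.7 Writing the program

(1) Create a new project and give it a name:

(2) Import the JAR packages the project needs

  • Select Libraries –> click Add External JARs
  • Click Other Locations –> go to /usr/local/hadoop/share/hadoop
  • Import the JARs under ./common
  • Then also import the JARs under ./common/lib, ./mapreduce, and ./mapreduce/lib

    (2) Write the application
  • Select File –> New –> click Class
  • Name the class (use camel case and a self-explanatory name)
  • Write the word-count code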
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options and keep only the input/output paths.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as the combiner because summing counts is associative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
    // Emits (word, 1) for every whitespace-separated token of an input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    // Sums the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
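
To see what the job computes without a cluster, here is a minimal plain-Java sketch of the same map, combine, and reduce steps. It is an illustration only: the class LocalWordCount and its methods are hypothetical, each string stands in for one input split, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory imitation of the WordCount data flow. */
final class LocalWordCount {
    /** Map + combine for one split: tokenize on whitespace and sum counts locally. */
    static Map<String, Integer> mapAndCombine(String split) {
        Map<String, Integer> partial = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            partial.merge(itr.nextToken(), 1, Integer::sum);
        }
        return partial;
    }

    /** Reduce: sum the partial counts of each word across all splits. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(mapAndCombine("the ghost of the past"));
        partials.add(mapAndCombine("the ghost of the future"));
        // prints {future=1, ghost=2, of=2, past=1, the=4}
        System.out.println(reduce(partials));
    }
}
```

The combiner step is why the real job can reuse IntSumReducer for both roles: adding partial sums gives the same total as adding the individual ones.
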

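ERROR: if Eclipse reports that packages are missing, re-import the JARs as follows

  • Right-click the WordCount project and click Properties
  • Click Java Build Path –> select Libraries –> click Add External JARs

    (3) Package the application
  • Create a directory to hold the application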
bai@bai:~$ cd /usr/local/hadoop
bai@bai:/usr/local/hadoop$ mkdir myapp
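  • Run the program: click the Run button –> Run As –> 1 Java Application
  • Result of the run:
  • Select File –> click Export
  • Choose the export type: click Java –> select Runnable JAR file –> click Next
  • Choose where to save the JAR (later steps run it from /usr/local/hadoop/myapp/WordCount.jar) –> click Finish
  • Click OK in the dialogs that appear


    The program is now packaged.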
3. Data sources and upload
3.1 Data sources:

(1) A CHRISTMAS CAROL
(2) HOLIDAY ROMANCE
(3) OLIVER TWIST

3.2 Uploading the data to HDFS
  • Start Hadoop
bai@bai:/usr/local/hadoop$ cd /usr/local/hadoop
bai@bai:/usr/local/hadoop$ ./sbin/start-dfs.sh
  • Delete anything left over from earlier runs (the job fails if the output directory already exists)
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r input
Deleted input
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r output 
rm: `output': No such file or directory # the directory does not exist yet
  • Create the user directory in HDFS, then the input directory
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -mkdir -p /user/bai

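Note: relative HDFS paths such as input resolve against /user/<user name>, so this directory must match the user name; otherwise later commands may fail.

  • Create input
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -mkdir input
  • Upload the files to the input directory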
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -put ~/text/wordfile* input
4. Checking the upload result

bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -ls input # check that the upload succeeded
Found 3 items
-rw-r--r--   1 bai supergroup     159257 2021-12-27 11:09 input/wordfile1.txt
-rw-r--r--   1 bai supergroup      77290 2021-12-27 11:09 input/wordfile2.txt
-rw-r--r--   1 bai supergroup     892579 2021-12-27 11:09 input/wordfile3.txt
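5. Description of the data processing
5.1 Java API

An API (Application Programming Interface) is a packaged collection of lower-level code that exposes a set of callable interfaces to its users. To program against an API you do not need to know how it is implemented; you only need to understand what each interface does and when to use it. This shortens the time a programmer needs to become productive, improves code readability and reuse, and reflects the encapsulation principle of object-oriented programming.
The Java API, as the name suggests, is the set of such interfaces called from Java, which lets core code be reused.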
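
As a small illustration of using an API without knowing its internals, the sketch below calls java.util.StringTokenizer, the same class the WordCount mapper relies on. The class name TokenizerDemo and the sample sentence are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical demo: splits a line the way the WordCount mapper does. */
final class TokenizerDemo {
    /** Returns the whitespace-separated tokens of a line, in order. */
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        // The one-argument constructor splits on space, tab, newline, CR and form feed.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(itr.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Marley, was, dead:, to, begin, with.]
        System.out.println(tokens("Marley was dead: to begin with."));
    }
}
```

Note that punctuation and letter case are preserved, so "dead:" and "dead" would be counted as different words by the job in this tutorial.
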

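5.2 Introduction to HDFS

(1) HDFS
HDFS is short for Hadoop Distributed File System. The rise of distributed file systems is closely tied to disk capacity and storage prices. HDFS dates from the early 2000s and, together with Hadoop's MapReduce, is modeled on Google's file system (GFS) and MapReduce framework. That implementation greatly improved the scalability of Nutch and later became Hadoop. In 2005 it entered the Apache Software Foundation as part of Nutch, itself a subproject of Lucene.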

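(2) Design goals of HDFS

  1. Large-scale storage on inexpensive commodity hardware.
  2. Streaming data reads and writes.
  3. A simple file model: write once, read many times.
  4. HDFS is implemented in Java, so the JVM makes it portable across platforms.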

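(3) Disadvantages and advantages of HDFS

Disadvantages:
  1. Not suited to low-latency data access.
  2. Cannot store large numbers of small files efficiently.
  3. No support for multiple concurrent writers or for modifying a file at arbitrary positions.
Advantages:
  1. Supports large-scale storage.
  2. Simplifies system design.
  3. Well suited to data backup.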

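(4) Components of the distributed file system
HDFS has DataNodes, a NameNode, and a SecondaryNameNode. DataNodes store and serve data blocks (the default block size was 64 MB in Hadoop 1.x and is 128 MB since Hadoop 2.x; larger sizes can be configured). The NameNode keeps two core structures: FsImage (the file system image) and EditLog (the operation log). EditLog records operations such as file creation, deletion, and renaming; FsImage holds the file system tree and the metadata of every file and directory in it. The NameNode also knows which DataNodes hold each block of each file, but it does not persist that information; it rebuilds it at startup from the block lists reported by the DataNodes.

In summary, the NameNode stores metadata, keeps it in memory, and holds the mapping between blocks and DataNodes. A DataNode stores file contents, keeps them on disk, and maintains its connection to the NameNode through heartbeats.

Note 1: Metadata in general covers physical metadata, data metadata, storage metadata, compute metadata, and so on, but the NameNode does not store all of these. The purpose of metadata is to give data "structure", so that we can find exactly where a piece of data is and what state it is in. Just as we attach labels to a person we meet so that a general picture helps us recognize the individual, we attach labels to a large, messy body of data to make it easier to operate on. (Reference: Baidu Baike)
Note 2: Reasons for splitting data into blocks: with a fixed block size it is easy to work out how many blocks a DataNode can store, and storage is easier to manage.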
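
The block arithmetic can be sketched in a few lines. This example assumes a 128 MB block size and uses the three file sizes from the hdfs dfs -ls listing in this tutorial; the class name BlockMath is hypothetical.

```java
/** Hypothetical helper: how many HDFS blocks a file of a given size occupies. */
final class BlockMath {
    // Assumed block size of 128 MB; the real value comes from the cluster configuration.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    /** Ceiling division: a partly filled last block still counts as one block. */
    static long blocksFor(long fileBytes, long blockSize) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long[] sizes = {159257L, 77290L, 892579L}; // wordfile1..3 from the listing
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + blocksFor(size, BLOCK_SIZE) + " block(s)");
        }
        // A 300 MB file would need three blocks: 128 MB + 128 MB + 44 MB.
        System.out.println(blocksFor(300L * 1024 * 1024, BLOCK_SIZE));
    }
}
```

Each of the three input files fits in a single block, which is consistent with the job log reporting "number of splits:3".
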

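5.3 NameNode and SecondaryNameNode

Recall that the NameNode holds an operation log (EditLog) and a file system image (FsImage). At time t1, or when EditLog grows too large (both triggers are configurable), the SecondaryNameNode has the NameNode start writing to a new EditLog. The old FsImage and the old EditLog are then merged into a new FsImage. Because the merge takes time, the old FsImage and the newly created EditLog keep the NameNode serving requests in the meantime. At some later time t2 the SecondaryNameNode sends the new FsImage back to the NameNode, where it replaces the old one. This keeps the log small.
A natural question at this point: the NameNode keeps its metadata in memory, so why merge the log into the image at all? While the NameNode runs, HDFS appends every update to EditLog, and after a long time the log becomes very large. This has no noticeable effect on a running NameNode, but when the NameNode restarts it loads the whole FsImage into memory and then replays the EditLog entries one by one; if EditLog is too large, startup becomes very slow. During that time HDFS is in safe mode, which rejects write operations, and this affects users.
Note: loosely speaking, memory holds most of what you see directly on screen while using a laptop. Memory reads and writes are far faster than disk, which cuts the time the CPU waits for data and makes the computer run more efficiently.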
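
The merge described above can be sketched as replaying a list of logged operations on top of a snapshot. This is a toy model under stated assumptions: the image is just a sorted set of paths, the three operation names are chosen for illustration, and CheckpointSketch is a hypothetical class, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical toy model of merging an EditLog into an FsImage. */
final class CheckpointSketch {
    /** One logged namespace operation; dst is only used by RENAME. */
    static final class Edit {
        final String op;
        final String src;
        final String dst;

        Edit(String op, String src, String dst) {
            this.op = op;
            this.src = src;
            this.dst = dst;
        }
    }

    /** Replays the log on a copy of the image and returns the merged image. */
    static TreeSet<String> merge(TreeSet<String> fsImage, List<Edit> editLog) {
        TreeSet<String> merged = new TreeSet<>(fsImage);
        for (Edit e : editLog) {
            switch (e.op) {
                case "CREATE":
                    merged.add(e.src);
                    break;
                case "DELETE":
                    merged.remove(e.src);
                    break;
                case "RENAME":
                    if (merged.remove(e.src)) {
                        merged.add(e.dst);
                    }
                    break;
                default:
                    throw new IllegalArgumentException("unknown op " + e.op);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeSet<String> image = new TreeSet<>(Arrays.asList("/user/bai/input/wordfile1.txt"));
        List<Edit> log = new ArrayList<>();
        log.add(new Edit("CREATE", "/user/bai/input/wordfile2.txt", null));
        log.add(new Edit("RENAME", "/user/bai/input/wordfile2.txt", "/user/bai/input/b.txt"));
        log.add(new Edit("DELETE", "/user/bai/input/wordfile1.txt", null));
        image = merge(image, log); // the checkpoint: a new image replaces the old one
        log.clear();               // and the merged log entries are no longer needed
        System.out.println(image + " pending edits: " + log.size());
    }
}
```

After the checkpoint a restart only has to load the new image, which is the startup-time saving the section describes.
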

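5.4 Overview of the HDFS architecture

HDFS uses a master/slave model: a cluster has exactly one NameNode and multiple DataNodes. Each DataNode periodically sends a "heartbeat" to the NameNode to report its status. A DataNode that fails to send heartbeats on time is marked "dead" by the NameNode and is no longer assigned I/O requests.
HDFS communication protocols are built on TCP/IP. DataNodes talk to the NameNode using the DataNode protocol; clients interact with DataNodes through remote procedure calls (RPC). The NameNode never initiates an RPC; it only responds to RPC requests from DataNodes and clients.
Limitations of this architecture: 1. The NameNode keeps its data in memory for speed, so the number of objects it can hold is bounded by memory size. 2. Performance is bounded by the throughput of the single NameNode. 3. With a single NameNode, applications cannot be isolated from one another. 4. If the NameNode fails, the whole cluster becomes unavailable.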
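
The heartbeat rule can be sketched as a table of last-seen timestamps checked against a timeout. This is a simplified model: the class HeartbeatMonitor, the node names, and the 10-second timeout are all assumptions for illustration, and a real cluster uses its own configured, considerably longer timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of how a NameNode could track DataNode liveness. */
final class HeartbeatMonitor {
    private final long timeoutMillis;
    // DataNode name -> time of the most recent heartbeat (sorted for stable output).
    private final Map<String, Long> lastSeen = new TreeMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a heartbeat from the given DataNode arrives. */
    void heartbeat(String dataNode, long nowMillis) {
        lastSeen.put(dataNode, nowMillis);
    }

    /** Returns the DataNodes whose last heartbeat is older than the timeout. */
    List<String> deadNodes(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(10_000);
        monitor.heartbeat("dn1", 0);
        monitor.heartbeat("dn2", 0);
        monitor.heartbeat("dn1", 9_000); // dn1 keeps reporting, dn2 goes silent
        System.out.println(monitor.deadNodes(12_000)); // prints [dn2]
    }
}
```

Nodes returned by deadNodes would receive no further I/O requests, and their blocks become candidates for re-replication.
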

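5.5 HDFS storage principles

(1) Benefits of redundant (multi-replica) storage:

  • Data can be read from a nearby replica, which speeds up transfer.
  • Errors in transmission are easier to detect.
  • Data reliability is ensured.

(2) Replica placement policy:

  • When a client issues a write: if the client runs on a node inside the cluster, the first replica is written to that node; if it is outside the cluster, a DataNode with enough free disk space and a lightly loaded CPU is chosen.
  • The second replica is placed on a DataNode in a different rack from the first.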
  • The third replica is placed on a different node in the same rack as the second replica.
  • Additional replicas are placed on randomly chosen DataNodes in the cluster.
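
A rack-aware choice of three targets can be sketched as follows. The sketch follows the policy in the Apache HDFS architecture guide, which places the third replica on the same remote rack as the second. It is a deterministic simplification (first match instead of a random pick), and the class ReplicaPlacementSketch, the node names, and the rack names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified rack-aware placement for three replicas. */
final class ReplicaPlacementSketch {
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return name + "@" + rack;
        }
    }

    /** Writer's node, then a node on another rack, then a second node on that remote rack. */
    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer); // replica 1: the node the client runs on
        Node second = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) {
                second = n; // replica 2: first node found on a different rack
                break;
            }
        }
        if (second == null) {
            return targets; // single-rack cluster: this sketch stops here
        }
        targets.add(second);
        for (Node n : cluster) {
            if (n != second && n.rack.equals(second.rack)) {
                targets.add(n); // replica 3: another node on the remote rack
                break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Node dn1 = new Node("dn1", "rackA");
        List<Node> cluster = Arrays.asList(dn1, new Node("dn2", "rackA"),
                new Node("dn3", "rackB"), new Node("dn4", "rackB"));
        // prints [dn1@rackA, dn3@rackB, dn4@rackB]
        System.out.println(chooseTargets(dn1, cluster));
    }
}
```

Spreading replicas over two racks survives the loss of a whole rack while keeping two of the three copies one switch hop apart.
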

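(3) Reading data:

  • HDFS provides an API that returns the ID of the rack a DataNode belongs to, and a client can call the API to obtain its own rack ID.
  • When reading, the client obtains the locations of the replicas from the NameNode and reads from a nearby replica (for example one in its own rack); if none is nearby, it picks one at random.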
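
Choosing the nearest replica can be sketched by ranking locations with a simple distance function. The convention used here (0 for the same node, 2 for the same rack, 4 for a different rack) and the "/rack/host" location strings are assumptions for illustration; NearestReplica is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical ranking of replica locations by distance from the client. */
final class NearestReplica {
    /** Locations look like "/rackA/dn1": 0 = same node, 2 = same rack, 4 = other rack. */
    static int distance(String client, String replica) {
        if (client.equals(replica)) {
            return 0;
        }
        String clientRack = client.substring(0, client.lastIndexOf('/'));
        String replicaRack = replica.substring(0, replica.lastIndexOf('/'));
        return clientRack.equals(replicaRack) ? 2 : 4;
    }

    /** Returns the replicas ordered from nearest to farthest; ties keep their input order. */
    static List<String> sortByDistance(String client, List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(r -> distance(client, r)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("/rackB/dn3", "/rackA/dn1", "/rackA/dn2");
        // prints [/rackA/dn2, /rackA/dn1, /rackB/dn3]
        System.out.println(sortByDistance("/rackA/dn2", replicas));
    }
}
```

The client would try the first entry and fall back to the next one if the read fails.
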

(4) Data replication (omitted):

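(5) Data errors and recovery:

  • NameNode failure: Hadoop protects the NameNode in two ways:
    1) synchronously storing the NameNode's data on another file system;
    2) using the metadata held by the SecondaryNameNode to recover the system.

  • DataNode failure:
    1) When some DataNodes become unavailable, the number of replicas of some blocks drops below the replication factor.
    2) The NameNode checks for this periodically; as soon as it finds a block with fewer replicas than the replication factor, it starts re-replication and creates new replicas (copied from other DataNodes that hold the same block).

  • Data corruption:
    After reading a block the client verifies its checksum. If verification fails, the client reads the block from another DataNode and reports the corrupt block to the NameNode, which periodically checks for such blocks and re-replicates them.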
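
The checksum test can be sketched with the CRC-32 class from the Java standard library. This is a simplification: HDFS keeps checksums per fixed-size chunk of a block, whereas the sketch uses one checksum for a whole buffer, and ChecksumSketch is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of verifying data against a stored checksum. */
final class ChecksumSketch {
    /** Computes the CRC-32 of the whole buffer. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** True if the data still matches the checksum recorded when it was written. */
    static boolean verify(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    public static void main(String[] args) {
        byte[] block = "Marley was dead".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block);     // recorded at write time

        byte[] corrupted = block.clone();
        corrupted[0] ^= 0x01;              // simulate a single flipped bit

        // prints "true false": the intact copy passes, the corrupted one does not
        System.out.println(verify(block, stored) + " " + verify(corrupted, stored));
    }
}
```

When verify returns false the client would switch to another replica and report the bad block, as described above.
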

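5.6 HDFS read and write processes

5.6.1 HDFS read process
(1) The HDFS client opens the file with FileSystem.open(). For HDFS, the request is handled by DistributedFileSystem, which creates the input stream FSDataInputStream; the concrete HDFS input stream wrapped inside it is DFSInputStream.
(2) DFSInputStream makes a remote call to the NameNode to get the locations of the file's blocks. The NameNode returns the DataNodes holding each block, sorted by distance from the client.
(3) With the input stream in hand, the client calls read(); the stream picks the nearest DataNode in the sorted list and reads from it.
(4) Data flows from the DataNode to the client; when a block has been read, FSDataInputStream closes the connection to that DataNode.
(5) The input stream calls getBlockLocations() to locate the next block.
(6) It finds the best DataNode for that block and reads the data.
(7) When all data has been read, the client calls close() on FSDataInputStream to close the input stream.
Note: if a read fails, the client tries the next DataNode that holds the block.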

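5.6.2 HDFS write process
(1) The client creates the file with FileSystem.create(). After the client calls create(), DistributedFileSystem creates the output stream FSDataOutputStream; the concrete HDFS output stream is DFSOutputStream.
(2) DistributedFileSystem calls the NameNode over RPC to create the new file in the file system namespace. The NameNode checks, among other things, whether the file already exists and whether the client has permission to create it. If the checks pass, the NameNode creates the new file entry and records its information. When the remote call returns, DistributedFileSystem wraps the DFSOutputStream in an FSDataOutputStream and returns it to the client, which writes data through it.
(3) With the FSDataOutputStream in hand, the client calls its write() method to write data to the file in HDFS.
(4) Data the client writes to FSDataOutputStream is split into packets and placed in an internal queue of the DFSOutputStream object. The output stream asks the NameNode for a set of DataNodes to hold the block replicas; these DataNodes form a pipeline, and the packets flow through it to each DataNode.
(5) Because the data travels over the network and may be lost or corrupted, each DataNode that receives a packet sends an ACK back toward the sender to confirm error-free receipt. Steps (3) to (5) repeat until all data is written.
(6) The client calls close() to close the output stream and writes no more data to it. Once every packet in the DFSOutputStream's internal queue has been acknowledged, ClientProtocol.complete() is called to tell the NameNode to close the file, which completes the write.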
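
Steps (4) to (6) can be sketched with two queues: packets wait in a data queue, move to an ack queue when sent, and leave it once every DataNode in the pipeline has acknowledged them. This is a single-threaded toy model with no failures; WritePipelineSketch and its fields are hypothetical and do not mirror the real DFSOutputStream internals.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model of the packet queue, pipeline and ACK handling. */
final class WritePipelineSketch {
    final Deque<String> dataQueue = new ArrayDeque<>(); // packets not yet sent
    final Deque<String> ackQueue = new ArrayDeque<>();  // packets sent but not fully acknowledged
    final List<List<String>> dataNodes = new ArrayList<>(); // what each pipeline node has stored

    WritePipelineSketch(int replicas) {
        for (int i = 0; i < replicas; i++) {
            dataNodes.add(new ArrayList<String>());
        }
    }

    /** Step (4): split the written data into packets and queue them. */
    void write(String data, int packetSize) {
        for (int i = 0; i < data.length(); i += packetSize) {
            dataQueue.add(data.substring(i, Math.min(data.length(), i + packetSize)));
        }
    }

    /** Step (5): send one packet through the pipeline and collect the ACKs. */
    boolean sendNext() {
        String packet = dataQueue.poll();
        if (packet == null) {
            return false;
        }
        ackQueue.add(packet);
        int acks = 0;
        for (List<String> node : dataNodes) {
            node.add(packet); // the packet is forwarded from node to node
            acks++;           // every node acknowledges (no failures simulated)
        }
        if (acks == dataNodes.size()) {
            ackQueue.remove(); // fully acknowledged, safe to forget
        }
        return true;
    }

    /** Step (6): flush the remaining packets; the file can be completed once both queues are empty. */
    boolean close() {
        while (sendNext()) {
            // keep sending until the data queue is drained
        }
        return dataQueue.isEmpty() && ackQueue.isEmpty();
    }

    public static void main(String[] args) {
        WritePipelineSketch pipeline = new WritePipelineSketch(3);
        pipeline.write("hello hdfs", 4);
        System.out.println(pipeline.close() + " " + pipeline.dataNodes.get(2));
    }
}
```

Keeping unacknowledged packets in a separate queue is what would let a client resend them if a DataNode in the pipeline failed.
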

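6. Downloading the results and showing them on the command line

6.1 Running the program

  • Go to the directory that holds the program
bai@bai:~$ cd /usr/local/hadoop/
  • Hadoop must be started first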
bai@bai:/usr/local/hadoop$ cd /usr/local/hadoop/
bai@bai:/usr/local/hadoop$ ./sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [bai]
  • Run the program
bai@bai:/usr/local/hadoop$ ./bin/hadoop jar ./myapp/WordCount.jar input output
2021-12-27 11:11:37,049 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-12-27 11:11:37,534 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-12-27 11:11:37,535 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2021-12-27 11:11:40,894 INFO input.FileInputFormat: Total input files to process : 3
2021-12-27 11:11:41,045 INFO mapreduce.JobSubmitter: number of splits:3
2021-12-27 11:11:41,800 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local848881555_0001
2021-12-27 11:11:41,800 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-12-27 11:11:42,451 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2021-12-27 11:11:42,452 INFO mapreduce.Job: Running job: job_local848881555_0001
2021-12-27 11:11:42,476 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2021-12-27 11:11:42,544 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2021-12-27 11:11:42,550 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-12-27 11:11:42,553 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2021-12-27 11:11:42,896 INFO mapred.LocalJobRunner: Waiting for map tasks
2021-12-27 11:11:42,896 INFO mapred.LocalJobRunner: Starting task: attempt_local848881555_0001_m_000000_0
2021-12-27 11:11:43,000 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2021-12-27 11:11:43,031 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-12-27 11:11:43,275 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2021-12-27 11:11:43,305 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/bai/input/wordfile3.txt:0+892579
2021-12-27 11:11:43,662 INFO mapreduce.Job: Job job_local848881555_0001 running in uber mode : false
2021-12-27 11:11:43,665 INFO mapreduce.Job:  map 0% reduce 0%
2021-12-27 11:11:43,868 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2021-12-27 11:11:43,869 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2021-12-27 11:11:43,869 INFO mapred.MapTask: soft limit at 83886080
2021-12-27 11:11:43,869 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2021-12-27 11:11:43,869 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2021-12-27 11:11:43,875 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2021-12-27 11:11:45,324 INFO mapred.LocalJobRunner: 
2021-12-27 11:11:45,330 INFO mapred.MapTask: Starting flush of map output
2021-12-27 11:11:45,330 INFO mapred.MapTask: Spilling map output
2021-12-27 11:11:45,331 INFO mapred.MapTask: bufstart = 0; bufend = 1517170; bufvoid = 104857600
2021-12-27 11:11:45,331 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25585700(102342800); length = 628697/6553600
2021-12-27 11:11:46,837 INFO mapred.MapTask: Finished spill 0
2021-12-27 11:11:46,920 INFO mapred.Task: Task:attempt_local848881555_0001_m_000000_0 is done. And is in the process of committing
2021-12-27 11:11:46,931 INFO mapred.LocalJobRunner: map
2021-12-27 11:11:46,931 INFO mapred.Task: Task 'attempt_local848881555_0001_m_000000_0' done.
2021-12-27 11:11:46,986 INFO mapred.Task: Final Counters for attempt_local848881555_0001_m_000000_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=75794958
		FILE: Number of bytes written=77320485
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=892579
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=5
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
		HDFS: Number of bytes read erasure-coded=0
	Map-Reduce framework
		Map input records=4040
		Map output records=157175
		Map output bytes=1517170
		Map output materialized bytes=309532
		Input split bytes=115
		Combine input records=157175
		Combine output records=21029
		Spilled Records=21029
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=188
		Total committed heap usage (bytes)=263065600
	File Input Format Counters 
		Bytes Read=892579
2021-12-27 11:11:46,989 INFO mapred.LocalJobRunner: Finishing task: attempt_local848881555_0001_m_000000_0
2021-12-27 11:11:47,004 INFO mapred.LocalJobRunner: Starting task: attempt_local848881555_0001_m_000001_0
2021-12-27 11:11:47,023 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2021-12-27 11:11:47,023 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-12-27 11:11:47,024 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2021-12-27 11:11:47,032 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/bai/input/wordfile1.txt:0+159257
2021-12-27 11:11:47,204 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2021-12-27 11:11:47,213 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2021-12-27 11:11:47,214 INFO mapred.MapTask: soft limit at 83886080
2021-12-27 11:11:47,214 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2021-12-27 11:11:47,214 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2021-12-27 11:11:47,221 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2021-12-27 11:11:47,402 INFO mapred.LocalJobRunner: 
2021-12-27 11:11:47,407 INFO mapred.MapTask: Starting flush of map output
2021-12-27 11:11:47,407 INFO mapred.MapTask: Spilling map output
2021-12-27 11:11:47,407 INFO mapred.MapTask: bufstart = 0; bufend = 272869; bufvoid = 104857600
2021-12-27 11:11:47,407 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26100004(104400016); length = 114393/6553600
2021-12-27 11:11:47,657 INFO mapred.MapTask: Finished spill 0
2021-12-27 11:11:47,668 INFO mapred.Task: Task:attempt_local848881555_0001_m_000001_0 is done. And is in the process of committing
2021-12-27 11:11:47,677 INFO mapreduce.Job:  map 33% reduce 0%
2021-12-27 11:11:47,703 INFO mapred.LocalJobRunner: map
2021-12-27 11:11:47,705 INFO mapred.Task: Task 'attempt_local848881555_0001_m_000001_0' done.
2021-12-27 11:11:47,713 INFO mapred.Task: Final Counters for attempt_local848881555_0001_m_000001_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=75795322
		FILE: Number of bytes written=77416938
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1051836
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
		HDFS: Number of bytes read erasure-coded=0
	Map-Reduce framework
		Map input records=751
		Map output records=28599
		Map output bytes=272869
		Map output materialized bytes=96421
		Input split bytes=115
		Combine input records=28599
		Combine output records=6999
		Spilled Records=6999
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=119
		Total committed heap usage (bytes)=310251520
	File Input Format Counters 
		Bytes Read=159257
2021-12-27 11:11:47,720 INFO mapred.LocalJobRunner: Finishing task: attempt_local848881555_0001_m_000001_0
2021-12-27 11:11:47,722 INFO mapred.LocalJobRunner: Starting task: attempt_local848881555_0001_m_000002_0
2021-12-27 11:11:47,730 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2021-12-27 11:11:47,732 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-12-27 11:11:47,733 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2021-12-27 11:11:47,745 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/bai/input/wordfile2.txt:0+77290
2021-12-27 11:11:47,886 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2021-12-27 11:11:47,887 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2021-12-27 11:11:47,887 INFO mapred.MapTask: soft limit at 83886080
2021-12-27 11:11:47,887 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2021-12-27 11:11:47,888 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2021-12-27 11:11:47,890 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2021-12-27 11:11:47,997 INFO mapred.LocalJobRunner: 
2021-12-27 11:11:47,998 INFO mapred.MapTask: Starting flush of map output
2021-12-27 11:11:47,998 INFO mapred.MapTask: Spilling map output
2021-12-27 11:11:47,998 INFO mapred.MapTask: bufstart = 0; bufend = 128859; bufvoid = 104857600
2021-12-27 11:11:47,999 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26161976(104647904); length = 52421/6553600
2021-12-27 11:11:48,131 INFO mapred.MapTask: Finished spill 0
2021-12-27 11:11:48,153 INFO mapred.Task: Task:attempt_local848881555_0001_m_000002_0 is done. And is in the process of committing
2021-12-27 11:11:48,168 INFO mapred.LocalJobRunner: map
2021-12-27 11:11:48,171 INFO mapred.Task: Task 'attempt_local848881555_0001_m_000002_0' done.
2021-12-27 11:11:48,177 INFO mapred.Task: Final Counters for attempt_local848881555_0001_m_000002_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=75795686
		FILE: Number of bytes written=77470432
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1129126
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
		HDFS: Number of bytes read erasure-coded=0
	Map-Reduce framework
		Map input records=388
		Map output records=13106
		Map output bytes=128859
		Map output materialized bytes=53462
		Input split bytes=115
		Combine input records=13106
		Combine output records=3802
		Spilled Records=3802
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=89
		Total committed heap usage (bytes)=295047168
	File Input Format Counters 
		Bytes Read=77290
2021-12-27 11:11:48,188 INFO mapred.LocalJobRunner: Finishing task: attempt_local848881555_0001_m_000002_0
2021-12-27 11:11:48,191 INFO mapred.LocalJobRunner: map task executor complete.
2021-12-27 11:11:48,210 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2021-12-27 11:11:48,211 INFO mapred.LocalJobRunner: Starting task: attempt_local848881555_0001_r_000000_0
2021-12-27 11:11:48,280 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2021-12-27 11:11:48,284 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2021-12-27 11:11:48,286 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2021-12-27 11:11:48,499 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@549533ed
2021-12-27 11:11:48,502 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2021-12-27 11:11:48,560 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1768167808, maxSingleShuffleLimit=442041952, mergeThreshold=1166990848, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2021-12-27 11:11:48,586 INFO reduce.EventFetcher: attempt_local848881555_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2021-12-27 11:11:48,682 INFO mapreduce.Job:  map 100% reduce 0%
2021-12-27 11:11:48,723 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local848881555_0001_m_000002_0 decomp: 53458 len: 53462 to MEMORY
2021-12-27 11:11:48,744 INFO reduce.InMemoryMapOutput: Read 53458 bytes from map-output for attempt_local848881555_0001_m_000002_0
2021-12-27 11:11:48,747 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 53458, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->53458
2021-12-27 11:11:48,766 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local848881555_0001_m_000001_0 decomp: 96417 len: 96421 to MEMORY
2021-12-27 11:11:48,784 INFO reduce.InMemoryMapOutput: Read 96417 bytes from map-output for attempt_local848881555_0001_m_000001_0
2021-12-27 11:11:48,784 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 96417, inMemoryMapOutputs.size() -> 2, commitMemory -> 53458, usedMemory ->149875
2021-12-27 11:11:48,798 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local848881555_0001_m_000000_0 decomp: 309528 len: 309532 to MEMORY
2021-12-27 11:11:48,812 INFO reduce.InMemoryMapOutput: Read 309528 bytes from map-output for attempt_local848881555_0001_m_000000_0
2021-12-27 11:11:48,814 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 309528, inMemoryMapOutputs.size() -> 3, commitMemory -> 149875, usedMemory ->459403
2021-12-27 11:11:48,814 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2021-12-27 11:11:48,815 INFO mapred.LocalJobRunner: 3 / 3 copied.
2021-12-27 11:11:48,815 INFO reduce.MergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
2021-12-27 11:11:48,855 INFO mapred.Merger: Merging 3 sorted segments
2021-12-27 11:11:48,860 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 459379 bytes
2021-12-27 11:11:49,104 INFO reduce.MergeManagerImpl: Merged 3 segments, 459403 bytes to disk to satisfy reduce memory limit
2021-12-27 11:11:49,105 INFO reduce.MergeManagerImpl: Merging 1 files, 459403 bytes from disk
2021-12-27 11:11:49,106 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2021-12-27 11:11:49,106 INFO mapred.Merger: Merging 1 sorted segments
2021-12-27 11:11:49,112 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 459391 bytes
2021-12-27 11:11:49,113 INFO mapred.LocalJobRunner: 3 / 3 copied.
2021-12-27 11:11:50,077 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2021-12-27 11:11:51,008 INFO mapred.Task: Task:attempt_local848881555_0001_r_000000_0 is done. And is in the process of committing
2021-12-27 11:11:51,044 INFO mapred.LocalJobRunner: 3 / 3 copied.
2021-12-27 11:11:51,044 INFO mapred.Task: Task attempt_local848881555_0001_r_000000_0 is allowed to commit now
2021-12-27 11:11:52,089 INFO output.FileOutputCommitter: Saved output of task 'attempt_local848881555_0001_r_000000_0' to hdfs://localhost:9000/user/bai/output
2021-12-27 11:11:52,106 INFO mapred.LocalJobRunner: reduce > reduce
2021-12-27 11:11:52,106 INFO mapred.Task: Task 'attempt_local848881555_0001_r_000000_0' done.
2021-12-27 11:11:52,114 INFO mapred.Task: Final Counters for attempt_local848881555_0001_r_000000_0: Counters: 30
	File System Counters
		FILE: Number of bytes read=76714600
		FILE: Number of bytes written=77929835
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1129126
		HDFS: Number of bytes written=279090
		HDFS: Number of read operations=14
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
		HDFS: Number of bytes read erasure-coded=0
	Map-Reduce framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=25821
		Reduce shuffle bytes=459415
		Reduce input records=31830
		Reduce output records=25821
		Spilled Records=31830
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=295047168
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=279090
2021-12-27 11:11:52,119 INFO mapred.LocalJobRunner: Finishing task: attempt_local848881555_0001_r_000000_0
2021-12-27 11:11:52,119 INFO mapred.LocalJobRunner: reduce task executor complete.
2021-12-27 11:11:52,686 INFO mapreduce.Job:  map 100% reduce 100%
2021-12-27 11:11:54,687 INFO mapreduce.Job: Job job_local848881555_0001 completed successfully
2021-12-27 11:11:54,756 INFO mapreduce.Job: Counters: 36
	File System Counters
		FILE: Number of bytes read=304100566
		FILE: Number of bytes written=310137690
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=4202667
		HDFS: Number of bytes written=279090
		HDFS: Number of read operations=35
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=6
		HDFS: Number of bytes read erasure-coded=0
	Map-Reduce framework
		Map input records=5179
		Map output records=198880
		Map output bytes=1918898
		Map output materialized bytes=459415
		Input split bytes=345
		Combine input records=198880
		Combine output records=31830
		Reduce input groups=25821
		Reduce shuffle bytes=459415
		Reduce input records=31830
		Reduce output records=25821
		Spilled Records=63660
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=396
		Total committed heap usage (bytes)=1163411456
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1129126
	File Output Format Counters 
		Bytes Written=279090
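The counters in this log tell a small story: the three map tasks emitted 198880 (word, 1) pairs, the combiner collapsed them to 31830 partial sums, and the single reducer merged those into 25821 distinct words. The sketch below uses plain Java with no Hadoop classes to show why "Combine output records" can be larger than "Reduce output records": the combiner only sees one map task's output at a time, so a word that appears in several inputs is counted once per task. The class name `CounterSketch` and the three tiny splits are hypothetical, and the sketch assumes one combiner pass per map task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration (no Hadoop classes) of how the job counters relate.
// The three splits in main() are hypothetical stand-ins for the three map inputs.
class CounterSketch {

    // Map phase: number of (word, 1) pairs one map task would emit.
    static int mapOutputRecords(String split) {
        int n = 0;
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    // Combine phase: sum the counts locally, inside a single map task.
    static Map<String, Integer> combine(String split) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce phase: merge the partial sums from every map task.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] splits = {"a b a", "b c", "a c c"};
        int mapOut = 0;
        int combineOut = 0;
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String split : splits) {
            mapOut += mapOutputRecords(split);
            Map<String, Integer> partial = combine(split);
            combineOut += partial.size();
            partials.add(partial);
        }
        Map<String, Integer> totals = reduce(partials);
        System.out.println("Map output records     = " + mapOut);
        System.out.println("Combine output records = " + combineOut);
        System.out.println("Reduce output records  = " + totals.size());
        System.out.println(totals);
    }
}
```

For the toy splits this gives 8 map output records, 6 combine output records, and 3 reduce output records, the same ordering as 198880 > 31830 > 25821 in the real job.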
  • View the word-count results
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -cat output/*

  • Write the word-count results to a local file
bai@bai:/usr/local/hadoop$ ./bin/hdfs dfs -cat output/* >> ~/text/wordcount.txt
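Once the results are saved locally, they can be inspected with a few lines of Java. The following is a minimal sketch, assuming each line of `~/text/wordcount.txt` has the form `word<TAB>count` (the default text output layout of a MapReduce job; adjust the parsing if your job writes something else). The class name `TopWords` and the method `topN` are hypothetical helpers, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists the most frequent words in a saved word-count file.
class TopWords {

    // Parses "word<TAB>count" lines and returns the n entries with the highest counts.
    static List<Map.Entry<String, Long>> topN(List<String> lines, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab <= 0) {
                continue; // skip blank or malformed lines
            }
            try {
                Long count = Long.valueOf(line.substring(tab + 1).trim());
                entries.add(new AbstractMap.SimpleEntry<String, Long>(line.substring(0, tab), count));
            } catch (NumberFormatException e) {
                // skip lines whose count field is not a number
            }
        }
        // Highest count first; ties are broken alphabetically.
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        // Default path matches the redirect target used in the command above.
        Path file = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"), "text", "wordcount.txt");
        for (Map.Entry<String, Long> e : topN(Files.readAllLines(file), 10)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Note that `>>` appends to the file, so running the export command twice leaves every word in the file twice; the sketch does not merge such duplicates.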

7. Lessons Learned

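For this experiment I reinstalled VirtualBox on the physical machine, followed by Ubuntu 18.04, JDK 1.8, Hadoop 3.3.1, and Eclipse 2019. Everything except Eclipse, which was downloaded directly inside Ubuntu, came from the official websites. Anyone working in tech should learn to use official resources such as documentation and packages. This does not require strong English: Chrome and most other browsers now offer automatic translation, and tools such as Youdao Dictionary can translate individual words on hover. The official documentation matters most, because it describes the basic installation procedure and some of the errors you may run into.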
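Next comes troubleshooting. The cause of most errors can be found in the log files, but many people do not know where the logs are. Taking Hadoop as an example, its logs are stored in /<install path>/hadoop/logs, where you can check the secondarynamenode, datanode, and namenode logs as needed. If you are unsure where a program keeps its logs, consult its official documentation (https://hadoop.apache.org/docs/). Of course, the official documentation does not resolve most errors. In that case, translate the error message first, then search Baidu or check GitHub for similar errors. Most explanations found through Baidu are vague, while those on GitHub tend to be more specific and systematic. Some errors still cannot be solved this way; then try restarting, which makes many problems disappear, or delete the relevant files and redeploy. If you have taken snapshots, you can also restore one. As a rule, take a snapshot each time you finish setting up an environment.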
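The log-checking advice above can be sketched in Java. This is a minimal sketch under stated assumptions: the log directory defaults to `/usr/local/hadoop/logs` (the install path used in the shell prompts of this tutorial), daemon logs end in `.log`, and problem lines carry the level ` ERROR ` or ` FATAL ` in the same layout as the console log shown earlier. The class name `LogScan` and the method `findProblems` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects ERROR and FATAL lines from the *.log files in one directory.
class LogScan {

    static List<String> findProblems(Path logDir) throws IOException {
        // Gather the *.log files first so the directory stream is closed promptly.
        List<Path> logFiles;
        try (Stream<Path> entries = Files.list(logDir)) {
            logFiles = entries
                    .filter(p -> p.getFileName().toString().endsWith(".log"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : logFiles) {
            // Read line by line, because daemon logs can grow large.
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.contains(" ERROR ") || line.contains(" FATAL ")) {
                        hits.add(file.getFileName() + ": " + line);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another directory as the first argument.
        Path logDir = Paths.get(args.length > 0 ? args[0] : "/usr/local/hadoop/logs");
        for (String hit : findProblems(logDir)) {
            System.out.println(hit);
        }
    }
}
```

Each hit is prefixed with the file name, so it is clear whether the namenode, datanode, or secondarynamenode reported the problem.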
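Finally, I did not run into any particularly difficult problems in this experiment. The hardest part was probably writing the document so that a beginner can follow it and deploy the environment. The document is still incomplete: Vim operations, basic Linux operations, HDFS operations, and comments on the Java code are not covered well. It can only help you deploy the environment reasonably successfully and serves no other purpose. If you need more, leave a comment and I will find time to write the related documentation.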



Original article: http://outofmemory.cn/zaji/5681078.html
