This article uses docker-compose on a CentOS 7 base image to build a single-machine HDFS cluster with one namenode and one datanode, as a proof of concept. At the end, the Java client library is used to write a file to HDFS, verifying that the cluster is remotely accessible.
Table of Contents

- Planning the Ports
- Building the Image
  1. Directory Layout
  2. File Contents
  3. Building the Image
- Configuration
- Starting HDFS
- Testing HDFS
- Testing Remote Access
- Java Example
## Planning the Ports
Docker needs to expose the appropriate ports so that Hadoop can be reached remotely:

- When a remote client accesses HDFS, it first fetches metadata from namenode port 9000; the metadata includes which datanode to write to. The client then connects to datanode port 9866 to write the data.
- Namenode port 9870 serves the namenode web UI.
- Datanode port 9864 serves the datanode web UI.
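As a quick sanity check once the compose stack from the later sections is running, the two web UI ports can be probed from the Docker host. A minimal sketch, assuming `curl` is available on the host:

```bash
# docker-compose publishes the ports on the host, so localhost works here.
curl -sf http://localhost:9870/ > /dev/null && echo "namenode UI reachable"
curl -sf http://localhost:9864/ > /dev/null && echo "datanode UI reachable"
```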
## Building the Image

The image is based on centos 7:

```bash
docker pull centos:centos7
```

### 1. Directory Layout
```bash
[root@bogon hadoop]# ls
Dockerfile  entrypoint-datanode.sh  entrypoint-namenode.sh  hadoop  jdk
```
Where:

- `hadoop`: the tar.gz downloaded from the official site, unpacked; its subdirectories include `bin`
- `jdk`: the Oracle JDK, unpacked; its subdirectories include `bin`
- `entrypoint-namenode.sh`: script that starts the namenode
- `entrypoint-datanode.sh`: script that starts the datanode

### 2. File Contents
Dockerfile
```dockerfile
FROM centos:centos7
ADD hadoop /opt/hadoop
ADD jdk /opt/jdk
ENV JAVA_HOME /opt/jdk
ENV HADOOP_HOME /opt/hadoop
ENV PATH=${PATH}:${JAVA_HOME}/bin:${HADOOP_HOME}/bin
ADD *.sh /
```
entrypoint-namenode.sh
entrypoint-namenode.sh starts the namenode process; if it detects that the metadata directory is empty, it formats it first and then starts the namenode.
```bash
#!/bin/bash
# If the metadata dir is empty, format it first.
DIR=${HADOOP_HOME}/metadata
if [ "$(ls -A $DIR)" ]; then
    echo "[INFO] namenode name dir is not empty, skip format"
else
    echo "[INFO] namenode name dir is empty, format it"
    hdfs namenode -format $(hostname) -nonInteractive
fi
# Start the namenode daemon; tail the log to keep the container running.
hdfs --daemon start namenode
tail -f ${HADOOP_HOME}/logs/hadoop-root-namenode-namenode.log
```
entrypoint-datanode.sh
entrypoint-datanode.sh simply starts the datanode process.
```bash
#!/bin/bash
hdfs --daemon start datanode
tail -f ${HADOOP_HOME}/logs/hadoop-root-datanode-datanode.log
```

### 3. Building the Image
Build the image and name it hadoop:

```bash
docker build -t hadoop .
```

## Configuration
The master and worker nodes share one set of configuration. To add more worker nodes, mount the same configuration directory into multiple datanode containers, as sketched below.
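A minimal sketch of adding a second datanode this way. It assumes the compose project below is named hadoop-cluster (so its default network is `hadoop-cluster_default`, consistent with the container names in the `docker ps` output later) and that the stack is already up; remote clients would additionally need this node's ports published and the hostname `datanode2` resolvable:

```bash
# Hypothetical second datanode reusing the shared config directory.
docker run -d \
  --name datanode2 \
  --hostname datanode2 \
  --network hadoop-cluster_default \
  -v "$PWD/datanode2:/opt/hadoop/data" \
  -v "$PWD/namenodeconfig:/opt/hadoop/etc/hadoop" \
  --entrypoint bash \
  hadoop /entrypoint-datanode.sh
```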
The configuration directory is located at ${HADOOP_HOME}/etc/hadoop.
core-site.xml
```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
    </property>
</configuration>
```
hdfs-site.xml
Configure the replica count, the namenode metadata directory, and the datanode data directory:
```xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/metadata</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/data</value>
    </property>
</configuration>
```
hadoop-env.sh
Change the default user for the daemon processes to root; otherwise startup fails with an error.
```bash
export HDFS_DATANODE_USER=root
export HADOOP_SECURE_DN_USER=root
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_SHELL_EXECNAME=root
```

## Starting HDFS
version: "3" services: namenode: image: hadoop container_name: namenode ports: - 9870:9870 - 9000:9000 volumes: - ./namenode:/opt/hadoop/metadata - ./namenodeconfig:/opt/hadoop/etc/hadoop entrypoint: "bash /entrypoint-namenode.sh" hostname: namenode datanode: image: hadoop ports: - 9864:9864 - 9866:9866 volumes: - ./datanode:/opt/hadoop/data - ./namenodeconfig:/opt/hadoop/etc/hadoop entrypoint: "bash /entrypoint-datanode.sh" hostname: datanode client: image: hadoop entrypoint: "tail -f /var/log/yum.log"
Run HDFS:
```bash
docker-compose up -d
```
## Testing HDFS

A Docker volume does not automatically copy configuration files into the mapped directory, so the configuration files must be copied into the mapped directory namenodeconfig by hand; they are found under etc/hadoop of the Hadoop distribution, as shown below.
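A sketch of that copy, assuming it is run from the compose project directory and that the Hadoop tarball was unpacked into `hadoop/` as in the build context; the containers should be (re)started after the directory is seeded:

```bash
# Seed the bind-mounted config dir with the stock Hadoop configuration.
mkdir -p namenodeconfig
cp -r hadoop/etc/hadoop/* namenodeconfig/
```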
docker-compose also started a client container, which is used for testing:
```bash
[root@bogon namenodeconfig]# docker ps
CONTAINER ID   IMAGE    COMMAND                  CREATED          STATUS          PORTS                                                                                  NAMES
737ab9caa73c   hadoop   "bash /entrypoint-na…"   27 minutes ago   Up 27 minutes   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp, 0.0.0.0:9870->9870/tcp, :::9870->9870/tcp   namenode
662f14bfcd99   hadoop   "tail -f /var/log/yu…"   27 minutes ago   Up 27 minutes                                                                                          hadoop-cluster_client_1
e8141341f075   hadoop   "bash /entrypoint-da…"   27 minutes ago   Up 27 minutes   0.0.0.0:9864->9864/tcp, :::9864->9864/tcp, 0.0.0.0:9866->9866/tcp, :::9866->9866/tcp   hadoop-cluster_datanode_1
[root@bogon namenodeconfig]# docker exec -it 662 /bin/bash
[root@662f14bfcd99 /]# hadoop fs -fs "namenode:9000" -mkdir /tmp
2022-01-11 03:32:58,204 WARN fs.FileSystem: "namenode:9000" is a deprecated filesystem name. Use "hdfs://namenode:9000/" instead.
mkdir: `/tmp': File exists
[root@662f14bfcd99 /]# hadoop fs -fs "namenode:9000" -ls /tmp
2022-01-11 03:33:05,045 WARN fs.FileSystem: "namenode:9000" is a deprecated filesystem name. Use "hdfs://namenode:9000/" instead.
```

## Testing Remote Access
On the remote machine, add entries for both hostnames to /etc/hosts (IP first, then hostname):

```
${DOCKER_HOST_IP} namenode
${DOCKER_HOST_IP} datanode
```
连接"namenode:9000"远程访问hdfs.
## Java Example

Import the dependencies with Maven:
```xml
<properties>
    <!-- property name assumed; only the value 1.8 survives in the original -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <hadoop.version>2.10.1</hadoop.version>
    <log4j.version>2.14.0</log4j.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>${log4j.version}</version>
    </dependency>
</dependencies>
```
Core code to upload a file to HDFS:
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// Connect to datanodes by hostname instead of the unroutable container IPs
// that the namenode would otherwise return.
conf.setBoolean("dfs.client.use.datanode.hostname", true);
conf.setBoolean("dfs.datanode.use.datanode.hostname", true);
System.setProperty("HADOOP_USER_NAME", "root");
conf.set("fs.defaultFS", "hdfs://namenode:9000");
FileSystem fileSystem = FileSystem.get(conf);
fileSystem.copyFromLocalFile(
        new Path("D:\\workbunch\\hdfs-demo\\test.log"), new Path("/user/test.log"));
fileSystem.close();
```
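To confirm the upload, the file can be listed from the test client container started by docker-compose (container name as shown in the `docker ps` output above):

```bash
# List the target directory from inside the cluster.
docker exec -it hadoop-cluster_client_1 \
  hadoop fs -fs "hdfs://namenode:9000" -ls /user
```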