How to add nodes to a Hadoop system

How to add a node in Hadoop

My actual procedure for adding a node:

1. First set up the environment on the new slave: ssh, the JDK, and copies of the relevant config, lib, and bin directories;

2. Add the new datanode's hostname to the namenode and the other datanodes in the cluster;

3. Add the new datanode's IP to conf/slaves on the master;

4. Restart the cluster and check that the new datanode shows up;

5. Run bin/start-balancer.sh; this can take quite a while.

Notes:

1. If you skip the balancing step, the cluster will place all new data on the new node, which lowers MapReduce efficiency;

2. bin/start-balancer.sh can also be run with the -threshold option, e.g. -threshold 5. The threshold is the balancing threshold, 10% by default; a lower value makes the nodes more evenly balanced but takes longer (see the sketch below);

3. The balancer can also run on a cluster that has MapReduce jobs running, but dfs.balance.bandwidthPerSec defaults to a low 1 MB/s. When no MR jobs are running, you can raise this setting to shorten the balancing time.
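A minimal sketch of notes 2 and 3; the 10 MB/s bandwidth value is only an illustration, pick whatever suits your network:

bin/start-balancer.sh -threshold 5

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
  <!-- bytes per second each datanode may use for balancing; 10485760 = 10 MB/s -->
</property>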

Other notes:

1. Make sure the firewall on the slave is turned off;

2. Make sure the new slave's IP has been added to /etc/hosts on the master and the other slaves, and conversely add the IPs of the master and the other slaves to /etc/hosts on the new slave.

Number of mappers and reducers

URL: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

HowManyMapsAndReduces

Partitioning your job into maps and reduces

Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but improves load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps / 1,000,000 reduces where the framework runs out of resources for the overhead.

Number of Maps

The number of maps is usually driven by the number of DFS blocks in the input files, although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.

The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
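As a hedged illustration of these knobs (the jar name, class name, paths, and numbers are all made up, and it assumes the job's driver uses ToolRunner so that -D options are picked up):

# 268435456 bytes = 256MB lower bound per split; mapred.map.tasks is only a hint
bin/hadoop jar my-job.jar MyJob \
    -D mapred.min.split.size=268435456 \
    -D mapred.map.tasks=200 \
    /input /output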

Number of Reduces

The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.

Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). This will be fixed at some point, but until it is, it provides a pretty firm upper bound.

The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.

The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).

My understanding:

Setting the number of mappers: it depends on the input files and on the file splits; the upper bound of a split is dfs.block.size, the lower bound can be set via mapred.min.split.size, and in the end the InputFormat decides.

A good rule of thumb:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>
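For instance (the node count is purely illustrative), with 10 worker nodes and mapred.tasktracker.reduce.tasks.maximum=2, the 0.95 rule gives 0.95 * 10 * 2 = 19 reduces; a job could then request that count explicitly (again assuming a ToolRunner-based driver, so -D is honored):

bin/hadoop jar my-job.jar MyJob -D mapred.reduce.tasks=19 /input /output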

Adding a new disk to a single node

1. On the node that gets the new disk, modify dfs.data.dir, separating the old and new data directories with commas (see the sketch below);

2. Restart DFS.
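A minimal sketch of step 1, with made-up mount points:

<property>
  <name>dfs.data.dir</name>
  <value>/data1/dfs/data,/data2/dfs/data</value>
  <!-- /data2/dfs/data lives on the newly added disk; keep the existing directory first -->
</property>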

Syncing the Hadoop code

hadoop-env.sh

# host:path where hadoop code should be rsync'd from. Unset by default.

# export HADOOP_MASTER=master:/home/$USER/src/hadoop

Merging small HDFS files with a command

hadoop fs -getmerge <src> <dest>
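For example (the paths are hypothetical), to concatenate all part files of a job's output directory into a single local file:

hadoop fs -getmerge /user/hadoop/output /tmp/output-merged.txt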

How to restart reduce jobs (job recovery on JobTracker restart)

Introduced recovery of jobs when JobTracker restarts. This facility is off by default.

Introduced config parameters "mapred.jobtracker.restart.recover", "mapred.jobtracker.job.history.block.size", and "mapred.jobtracker.job.history.buffer.size".

I have not verified this yet.
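If you do want to try it, a hedged sketch of turning the recovery flag on (the property goes into the JobTracker's configuration in this Hadoop generation) might look like:

<property>
  <name>mapred.jobtracker.restart.recover</name>
  <value>true</value>
  <!-- off by default; lets the JobTracker recover running jobs after a restart -->
</property>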

Problems with I/O write operations

0-1246359584298, infoPort=50075, ipcPort=50020):Got exception while serving blk_-5911099437886836280_1292 to /172.16.100.165:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.16.100.165:50010 remote=/172.16.100.165:50930]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)

It seems there are many reasons it can time out; the example given in HADOOP-3831 is a slow reading client.

Workaround: try setting dfs.datanode.socket.write.timeout=0 in hadoop-site.xml (see the sketch below).

My understanding is that this issue should be fixed in Hadoop 0.19.1, so that we should leave the standard timeout. However, until then this can help resolve issues like the one you're seeing.
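A hedged sketch of that workaround; note that 0 disables the datanode's write timeout entirely (the default is the 480000 ms seen in the log above), so use it with care:

<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
  <!-- 0 = no timeout; default 480000 ms -->
</property>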

How to decommission HDFS nodes

The dfsadmin help text in the current version does not explain this clearly (a bug has already been filed); the correct procedure is as follows:

1. Set dfs.hosts to the current slaves file, using the full path. Note that the hostnames in the list must be the full names, i.e., what uname -n returns.

2. Put the full names of the slaves to be decommissioned into another file, e.g. slaves.ex, and point the dfs.hosts.exclude parameter at that file's full path (see the config sketch below).

3. Run bin/hadoop dfsadmin -refreshNodes

4. On the web UI, or via bin/hadoop dfsadmin -report, the decommissioning nodes show the state "Decommission in progress" until all data that needs re-replication has been copied.

5. When that finishes, remove the decommissioned nodes from the slaves file (i.e., the file dfs.hosts points to).
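A minimal sketch of steps 1-3; the file paths are hypothetical:

<property>
  <name>dfs.hosts</name>
  <value>/home/hadoop/conf/slaves</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/conf/slaves.ex</value>
</property>

bin/hadoop dfsadmin -refreshNodes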

Incidentally, the -refreshNodes command has three other uses:

1. Adding allowed nodes to the list (add the hostname to dfs.hosts);

2. Removing a node directly, without re-replicating its data (remove the hostname from dfs.hosts);

3. The reverse of decommissioning: for a node listed both in the exclude file and in dfs.hosts, stop the decommissioning in progress, i.e., turn a "Decommission in progress" node back to Normal (shown as "In Service" in the web UI).


Note:

CDH 6.3.1

On the freshly installed CDH cluster, the UI reported both missing blocks and under-replicated blocks (screenshot omitted).

Test record (screenshots omitted): filtering the output for MISSING entries showed that all of the missing blocks belonged to the oozie space; the details of the missing file blocks confirmed this.

This happened because, right after the cluster was initialized, there were only two datanodes, so the replication factor had been lowered to dfs.replication=2 and the HDFS alert was suppressed after that change, which is why the warning appeared. A datanode has now been added back, restoring the 3-datanode, 3-replica setup.

Test record: the missing-block problem is now resolved (see the commands below for one way to check and repair replication).
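As a hedged sketch of how to verify and repair replication in a case like this (the /user/oozie path is only an example):

# list the files that still have missing or corrupt blocks
hdfs fsck / -list-corruptfileblocks

# raise the replication factor of the existing files back to 3 and wait for completion
hdfs dfs -setrep -w 3 /user/oozie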

1. https://www.cnblogs.com/hqt0731/articles/8804924.html

That did not work; the cluster first has to be upgraded to 4 (CDH4).

The upgrade procedure is as follows:

Step 1: Run a saveNamespace operation, stop the cluster, and back up the HDFS metadata.

1.1 Put the namenode into safe mode:

$ bin/hadoop dfsadmin -safemode enter

1.2 Run the saveNamespace operation:

$ bin/hadoop dfsadmin -saveNamespace

1.3 Stop the cluster.

1.4 Back up the metadata under dfs.name.dir (see the sketch below).
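A minimal sketch of step 1.4, assuming dfs.name.dir points at /data/dfs/name (a made-up path):

$ tar czf /backup/namenode-meta-$(date +%Y%m%d).tar.gz -C /data/dfs/name .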

Step 2: Download CDH4 and copy the CDH3 configuration over.

Note that the CDH3 configuration files live under the conf directory, while in CDH4 the configuration directory has moved to etc/hadoop.

Step 3: Upgrade the HDFS metadata.

3.1 In the CDH4 directory run:

sbin/hadoop-daemon.sh start namenode -upgrade -clusterid mycluster-test

Here mycluster-test is the cluster id; you can specify one or leave it out, in which case the system generates one automatically.

3.2 Check the namenode log under the log directory. If it contains:

Upgrade of ${dfs.namenode.name.dir} is complete

the metadata has been upgraded successfully.

3.3 Start the DataNodes:

On every datanode, start the datanode service:

sbin/hadoop-daemon.sh start datanode

The datanodes upgrade themselves automatically.

3.4 Wait for the namenode to leave safe mode, then run fsck:

bin/hdfs fsck /

3.5 Once the directories are confirmed healthy and no blocks are missing, run finalizeUpgrade and start the secondarynamenode:

bin/hdfs dfsadmin -finalizeUpgrade

# once finalized, you can no longer roll back

sbin/hadoop-daemon.sh start secondarynamenode

# clean the old-version files out of dfs.namenode.checkpoint.dir first, otherwise it will fail to start

Rollback:

If something goes wrong during the upgrade and you want to roll back to the CDH3 version, you must not run bin/hdfs dfsadmin -finalizeUpgrade. You can roll back at any point before finalizeUpgrade has been executed.

Under the CDH3 version, run:

(1) Roll back the Namenode; on the namenode machine run:

bin/hadoop-daemon.sh start namenode -rollback

(2) Roll back the DataNodes; on the namenode machine run:

bin/hadoop-daemons.sh start datanode -rollback

You can also do it by hand: move the data back, then start up normally (see the sketch below).

(1) Roll back the Namenode data:

remove the dfs.name.dir/current directory, then mv dfs.name.dir/previous to dfs.name.dir/current;

this restores the namenode metadata.

(2) Roll back the DataNode data:

remove the dfs.data.dir/current directory, then mv dfs.data.dir/previous to dfs.data.dir/current;

this restores the datanode data.
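A hedged sketch of the manual rollback, assuming dfs.name.dir=/data/dfs/name and dfs.data.dir=/data/dfs/data (made-up paths); run it only while the daemons are stopped:

# namenode metadata
rm -rf /data/dfs/name/current
mv /data/dfs/name/previous /data/dfs/name/current

# on every datanode
rm -rf /data/dfs/data/current
mv /data/dfs/data/previous /data/dfs/data/current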

After these operations, you can start the CDH3 version again.

In short, the upgrade does: mv current previous, create a new current, read the old metadata and write it into current in the new format; on the DataNodes the block data is placed into the new current directory via hard links.

The rollback is: rm current, mv previous current.

