Hadoop爬坑记——HDFS文件因Hadoop版本原因导致的追加问题

Hadoop爬坑记——HDFS文件因Hadoop版本原因导致的追加问题,第1张

今日在练习HDFS文件的读取输出,写入,追加写入时,

读取输出,写入都没问题,在追加写入时出现了问题。

报错如下:

2017-07-14 10:50:00,046 WARN [org.apache.hadoop.hdfs.DFSClient] - DataStreamer Exception

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]], original=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:918)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:992)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1160)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:455)

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]], original=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:918)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:992)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1160)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:455)

2017-07-14 10:50:00,052 ERROR [org.apache.hadoop.hdfs.DFSClient] - Failed to close inode 16762

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]], original=[DatanodeInfoWithStorage[192.168.153.138:50010,DS-1757abd0-ffc8-4320-a277-3dfb1022533e,DISK], DatanodeInfoWithStorage[192.168.153.137:50010,DS-ddd9a357-58d6-4c2a-921e-c16087974edb,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:918)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:992)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1160)

at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:455)

添加上

之后报错如下:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /weather/output/abc.txt for DFSClient_NONMAPREDUCE_1964095166_1 on 192.168.153.1 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-590151867_1 on 192.168.153.1

然后再添加

完事就没问题了。

你的hadoop参数配置正确了吗?在hdfs-site.xml中把以下属性修改为true才可以。

<property>

<name>dfs.support.append</name>

<value>true</value>

</property>

下面有一段测试代码,你可以参考一下:

package com.wyp

import org.apache.hadoop.conf.Configuration

import org.apache.hadoop.fs.FileSystem

import org.apache.hadoop.fs.Path

import org.apache.hadoop.io.IOUtils

import java.io.*

import java.net.URI

public class AppendContent {

public static void main(String[] args) {

String hdfs_path = "hdfs://mycluster/home/wyp/wyp.txt"//文件路径

Configuration conf = new Configuration()

conf.setBoolean("dfs.support.append", true)

String inpath = "/home/wyp/append.txt"

FileSystem fs = null

try {

fs = FileSystem.get(URI.create(hdfs_path), conf)

//要追加的文件流,inpath为文件

InputStream in = new

BufferedInputStream(new FileInputStream(inpath))

OutputStream out = fs.append(new Path(hdfs_path))

IOUtils.copyBytes(in, out, 4096, true)

} catch (IOException e) {

e.printStackTrace()

}

}

}

上半句话,访问文件不外乎读和写,需要读写时调用函数FileSystem&open()和FileSystem&create(),返回的对象是FSDataInputStream和FSDataOutputStream。 data直译成中文就是数据,stream直译成中文就是流。 这两个对象分别继承于java.io.DataInputStream和java.io.DataOutputStream, 是java的常用的文件读写类。 需要读时用DataInputStream的函数readInt(), readFloat()...,写时也差不多。

下半句话,两个关键词, ”单个客户“和”追加“。单个客户指不能有两个线程同时写;追加指写的形式只能是在文件后加内容(append),不能覆盖(overwrite)。 这两个限制都是设计上简化考虑。 多个线程同时append时,由于hdfs是一份文件存于多个机器,保证在每台机器上两个线程写的顺序一致(从而结果一致)是一个很难的问题(当然不是做不到), 出于简单考虑, 就不这么做了。 多个线程同时overwrite就更麻烦。


欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/tougao/11664123.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2023-05-17
下一篇 2023-05-17

发表评论

登录后才能评论

评论列表(0条)

保存