Hadoop: sending output to multiple directories


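You don't need a second job. I am currently using MultipleOutputs in one of my programs to create a large number of output directories. Even though there are more than 30 directories, I get by with only a couple of MultipleOutputs objects. This works because you set the output directory at write time, so each directory only has to be determined when a record is actually written. You only need more than one namedOutput if you want to write in different formats (for example, one with key: Text.class and value: Text.class, and another with key: Text.class and value: IntWritable.class).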

Setup (in the driver):

MultipleOutputs.addNamedOutput(job, "Output", TextOutputFormat.class, Text.class, Text.class);

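In the reducer's setup() method (mout is a field of type MultipleOutputs<Text, Text>):

mout = new MultipleOutputs<Text, Text>(context);

Close it in the reducer's cleanup() method, otherwise the extra output files may not be flushed and closed properly:

mout.close();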

Calling mout in the reducer (inside reduce()):

String key;            // set to whatever the output key will be
String value;          // set to whatever the output value will be
String outputFileName; // base output path, relative to the job output directory; the framework appends a unique suffix such as -r-00000
mout.write("Output", new Text(key), new Text(value), outputFileName);
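MultipleOutputs does not use outputFileName verbatim as the file name: the output format appends a task-type letter and a zero-padded partition number (plus an extension if compression is enabled). The sketch below only mimics that naming scheme so you can predict where a record lands; it assumes the scheme matches FileOutputFormat.getUniqueFile (name-r-NNNNN for reduce tasks, name-m-NNNNN for map tasks), does not call Hadoop, and the class and method names are hypothetical.

```java
/**
 * Hypothetical helper that mimics how Hadoop's FileOutputFormat turns a base
 * output name into a concrete file name (assumed scheme: name-{m|r}-NNNNN).
 * It does not call Hadoop; it only predicts the resulting file name.
 */
public class UniqueFileNameSketch {

    /** Task types as they are assumed to appear in the generated file name. */
    enum TaskType { MAP, REDUCE }

    static String uniqueFileName(String baseOutputPath, TaskType type, int partition) {
        char typeChar = (type == TaskType.MAP) ? 'm' : 'r';
        // The partition number is zero-padded to five digits, e.g. 00000.
        return String.format("%s-%c-%05d", baseOutputPath, typeChar, partition);
    }

    public static void main(String[] args) {
        // A reducer in partition 0 writing with baseOutputPath "base/2013/01".
        System.out.println(uniqueFileName("base/2013/01", TaskType.REDUCE, 0));
        // prints base/2013/01-r-00000
    }
}
```

In other words, with outputFileName = "base/2013/01" the last path component (01) becomes a file-name prefix inside base/2013, not a directory of its own.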

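You can use a small piece of code to decide the directory at write time. For example, say you want to organize the output by year and month: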

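int year;            // extract the year from the data
int month;           // extract the month from the data
String baseFileName; // parent directory for all outputs of this job
// Zero-pad the month so the directories come out as 01, 02, ... rather than 1, 2, ...
String outputFileName = baseFileName + "/" + year + "/" + String.format("%02d", month);
mout.write("Output", new Text(key), new Text(value), outputFileName);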
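The path logic can be pulled into a small helper. The sketch below zero-pads the month and, as an optional assumption, ends the path with a fixed file prefix (part, a hypothetical choice) so that, assuming the name-r-NNNNN naming scheme, each month becomes its own directory (base/2013/01/part-r-00000) instead of a file prefix inside the year directory. The class and method names are hypothetical, and LocalDate only stands in for a date parsed from a record.

```java
import java.time.LocalDate;

/** Hypothetical helper for building year/month base output paths for MultipleOutputs. */
public class MonthlyOutputPath {

    /**
     * Builds "base/YYYY/MM" with the month zero-padded. If asDirectory is true,
     * appends "/part" so the framework's unique-file suffix is assumed to land
     * inside the month directory (base/YYYY/MM/part-r-00000) rather than after it.
     */
    static String outputPathFor(String baseFileName, int year, int month, boolean asDirectory) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1-12, got " + month);
        }
        String path = String.format("%s/%d/%02d", baseFileName, year, month);
        return asDirectory ? path + "/part" : path;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2013, 2, 14); // stand-in for a date parsed from a record
        System.out.println(outputPathFor("base", date.getYear(), date.getMonthValue(), false));
        // prints base/2013/02
        System.out.println(outputPathFor("base", date.getYear(), date.getMonthValue(), true));
        // prints base/2013/02/part
    }
}
```

The returned string would then be passed as the last argument of mout.write(...). Keeping the path relative to the job output directory lets the output committer handle it like the regular part files.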

Hope this helps.

Edit: the output file structure for the example above:

base
    2013
        01
        02
        03
        ...
    2012
        01
        ...
    ...

Feel free to share; when reposting, please credit the source: 内存溢出

Original post: http://outofmemory.cn/zaji/5561196.html
