1. Multiple input paths
1) Call FileInputFormat.addInputPath once per path to load several different paths
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// args: two input paths followed by the output path
String in0 = args[0];
String in1 = args[1];
String out = args[2];
// each addInputPath call appends one more input path to the job
FileInputFormat.addInputPath(job, new Path(in0));
FileInputFormat.addInputPath(job, new Path(in1));
FileOutputFormat.setOutputPath(job, new Path(out));
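When the number of input directories is not fixed, the repeated addInputPath calls can be driven by a loop over the arguments. The following is a minimal sketch, assuming a convention that is not stated in the original: the last command-line argument is the output path and every argument before it is an input path. `JobArgs` is a hypothetical helper name, and the Hadoop calls appear only as comments so the snippet compiles without Hadoop on the classpath.

```java
import java.util.Arrays;

// Hypothetical helper: splits command-line args into input paths and one output path.
class JobArgs {
    final String[] inputs;
    final String output;

    JobArgs(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <in> [<in> ...] <out>");
        }
        // Assumed convention: every argument except the last is an input path.
        this.inputs = Arrays.copyOfRange(args, 0, args.length - 1);
        this.output = args[args.length - 1];
    }
}

// In the driver (requires Hadoop, shown for orientation only):
//   JobArgs parsed = new JobArgs(args);
//   for (String in : parsed.inputs) {
//       FileInputFormat.addInputPath(job, new Path(in));
//   }
//   FileOutputFormat.setOutputPath(job, new Path(parsed.output));
```

With this layout the same driver handles two inputs or twenty without code changes.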
2) Call FileInputFormat.addInputPaths once, passing all the paths as a single comma-separated string
FileInputFormat.addInputPaths(job, "hdfs://RS5-112:9000/cs/path1,hdfs://RS5-112:9000/cs/path2");
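If the paths are held in a collection rather than written out as a literal, the comma-separated string for addInputPaths can be built with String.join. This is a small sketch under stated assumptions: `InputPathJoiner` is a hypothetical helper, and it rejects any path containing a comma, because the comma is the separator. That check is a deliberate simplification and is stricter than strictly necessary.

```java
import java.util.List;

// Hypothetical helper: builds the comma-separated argument for addInputPaths.
class InputPathJoiner {
    static String join(List<String> paths) {
        if (paths.isEmpty()) {
            throw new IllegalArgumentException("at least one input path is required");
        }
        for (String p : paths) {
            // A comma inside a path would be read as a separator, so refuse it here.
            if (p.isEmpty() || p.contains(",")) {
                throw new IllegalArgumentException("invalid input path: '" + p + "'");
            }
        }
        return String.join(",", paths);
    }
}

// In the driver (requires Hadoop, shown for orientation only):
//   FileInputFormat.addInputPaths(job, InputPathJoiner.join(paths));
```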
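2. Multiple input types
MultipleInputs can load input files from different paths, and each path can be given its own InputFormat and its own Mapper.
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class, MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class, MultiTypeFileInput3Mapper.class);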
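The idea of binding each path to its own mapper can be illustrated without Hadoop. The sketch below is a plain-Java analogy only, not the Hadoop API: `PerPathMappers` and its methods are hypothetical names, and a mapper is reduced to a function from one input line to one output line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java analogy only; this is not how Hadoop implements MultipleInputs.
class PerPathMappers {
    // Registered input path -> mapper function for records from that path.
    private final Map<String, Function<String, String>> mappers = new LinkedHashMap<>();

    // Plays the role of registering one path together with its mapper.
    void addInputPath(String path, Function<String, String> mapper) {
        mappers.put(path, mapper);
    }

    // Applies the mapper registered for the path the record came from.
    String map(String path, String record) {
        Function<String, String> mapper = mappers.get(path);
        if (mapper == null) {
            throw new IllegalArgumentException("no mapper registered for " + path);
        }
        return mapper.apply(record);
    }
}
```

A typical use, again as an assumption about intent, is to have each mapper tag its output with the source it came from so that later stages can tell the two kinds of records apart.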