1. Compile hadoop-lzo to produce the jar. Note: a snapshot build yields hadoop-lzo-0.4.21-SNAPSHOT.jar, while the steps below use hadoop-lzo-0.4.20.jar; either rename the snapshot jar or build the 0.4.20 release, and keep the name consistent throughout.
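A typical build sketch from the hadoop-lzo source tree (assumptions: a RHEL/CentOS-style host, Maven and a JDK already installed; package names and library paths vary by distro):

```shell
# Native lzo headers/libs must be present before compiling the JNI bindings
yum install -y lzo lzo-devel            # assumption: yum-based system
export C_INCLUDE_PATH=/usr/include      # where lzo headers were installed
export LIBRARY_PATH=/usr/lib64          # where liblzo2 was installed
mvn clean package -Dmaven.test.skip=true
# The jar is written under target/, e.g. target/hadoop-lzo-0.4.21-SNAPSHOT.jar
```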
2. Upload hadoop-lzo-0.4.20.jar to /opt/module/hadoop-3.1.3/share/hadoop/common
3. Edit core-site.xml to register the LZO codecs:
<property>
    <name>io.compression.codecs</name>
    <value>
        org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        org.apache.hadoop.io.compress.SnappyCodec,
        com.hadoop.compression.lzo.LzoCodec,
        com.hadoop.compression.lzo.LzopCodec
    </value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
4. Distribute hadoop-lzo-0.4.20.jar and core-site.xml to the other nodes, then restart the cluster
scp ./hadoop-lzo-0.4.20.jar node02:`pwd`
scp ./core-site.xml node02:`pwd`
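To cover every worker in one pass, the two scp commands above can be wrapped in a loop and followed by a restart so the new codec list is loaded (the node names node02/node03 are assumptions — substitute your own host list):

```shell
HADOOP_HOME=/opt/module/hadoop-3.1.3
for host in node02 node03; do   # hypothetical worker list; adjust to your cluster
  scp $HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20.jar $host:$HADOOP_HOME/share/hadoop/common/
  scp $HADOOP_HOME/etc/hadoop/core-site.xml $host:$HADOOP_HOME/etc/hadoop/
done
# Restart so the updated core-site.xml takes effect (run start/stop-yarn.sh
# on the ResourceManager node if YARN jobs will use the codec)
stop-dfs.sh && start-dfs.sh
```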
5. Prepare test data
hdfs dfs -mkdir /input
hadoop fs -put README.txt /input
6. Run a WordCount job with LZO output compression
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec /input /output
7. The job output in /output is LZO-compressed (.lzo files)
8. Split support: a plain .lzo file is not splittable, so without an index the whole file is processed by a single map task. Delete any existing /output directory before rerunning.
hadoop fs -put bigtable.lzo /input
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output
9. Build an index file so the .lzo file becomes splittable (this writes bigtable.lzo.index next to the data file)
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo
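To confirm the indexer ran, list the input directory; the index should sit beside the data file:

```shell
hadoop fs -ls /input
# Expect both bigtable.lzo and bigtable.lzo.index to be listed
```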
10. Run the job again; with the index in place, LzoTextInputFormat splits the file across multiple map tasks (again, delete the previous /output first)
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output