Contents
1. Requirements
(1) Input data
(2) Expected output data
2. Implementation (local test)
(1) Environment setup
1) Create a Maven project named MapReduceDemo
2) Add the dependencies to pom.xml
3) Create log4j.properties under src/main/resources
4) Create the package com.atguigu.mapreduce.wordcount
(2) The three classes
1) Mapper class
2) Reducer class
3) Driver class
3. Submitting to the cluster for testing
(1) Packaging plugins to add for building the jar with Maven
(2) Package the program into a jar
(3) Copy the jar to /opt/module/hadoop-3.1.3 on the Hadoop cluster
(4) Run the WordCount program
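1. Requirements
Count and output the total number of occurrences of each word in a given text file.
(1) Input data: a txt file
(2) Expected output data
atguigu 2
banzhang 1
cls 2
hadoop 1
jiao 1
ss 2
xue 1
2. Implementation (local test)
Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver.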
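Before touching Hadoop, it can help to see the three phases in plain Java. The following is a minimal in-memory sketch, not the Hadoop implementation: the class name WordCountSketch, its method names, and the sample input lines are all assumptions made for illustration (the article does not show the contents of the input file). It mimics what the Mapper emits, how the framework groups values by key in sorted order, and what the Reducer sums.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of the map -> shuffle -> reduce flow.
class WordCountSketch {

    // Map phase: emit one (word, 1) pair per space-separated token of a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle phase: group the values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values that arrived for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three phases over all input lines and return word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input, chosen to be consistent with the expected output above.
        List<String> lines = Arrays.asList(
                "atguigu atguigu", "ss ss", "cls cls", "jiao", "banzhang", "xue", "hadoop");
        wordCount(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the real job the framework performs the grouping and sorting between the map and reduce phases, so only the map and reduce steps need to be written by hand.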
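(1) Environment setup
1) Create a Maven project named MapReduceDemo. (Download Maven from the official site; the Aliyun mirror is faster. Point the local repository at a folder of your own, since the default location is on the C: drive.)
2) Add the following dependencies to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
3) In the project's src/main/resources directory, create a file named "log4j.properties" with the following content:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
4) Create the package com.atguigu.mapreduce.wordcount
(2) The three classes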
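Note: check your imports carefully. The classes used here come from org.apache.hadoop.mapreduce and org.apache.hadoop.io; the older org.apache.hadoop.mapred package contains classes with the same names (Mapper, Reducer), and an IDE may suggest those instead.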
1) Mapper class
package com.atguigu.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every record to avoid allocating new objects per word
    private Text outK = new Text();
    private IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // String has more helper methods, so convert the Text to a String
        // 1 Read one line, e.g. "atguigu atguigu"
        String line = value.toString();

        // 2 Split on spaces, giving:
        // atguigu
        // atguigu
        String[] words = line.split(" ");

        // 3 Write out one (word, 1) pair per word
        for (String word : words) {
            // Wrap the word in outK
            outK.set(word);
            // Emit
            context.write(outK, outV);
        }
    }
}
2) Reducer class
package com.atguigu.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Incoming values, e.g. atguigu, (1,1): atguigu appeared twice
        int sum = 0;

        // Accumulate
        for (IntWritable value : values) {
            sum += value.get();
        }
        outV.set(sum);

        // Emit
        context.write(key, outV);
    }
}
3) Driver class
package com.atguigu.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar by the Driver class
        job.setJarByClass(WordCountDriver.class);

        // 3. Associate the Mapper and Reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // 4. Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the key/value types of the final output (not necessarily the Reducer's output types)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths (backslashes must be escaped in Java string literals)
        FileInputFormat.setInputPaths(job, new Path("D:\\code\\Hadoop\\input\\inputword"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\code\\Hadoop\\test\\output"));

        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
3. Submitting to the cluster for testing
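To make the input and output paths configurable, read them from args by changing step 6 of the Driver class:
// 6. Set the input and output paths
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
(1) To build the jar with Maven, add the following packaging plugins to pom.xml:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
(2) Package the program into a jar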
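(3) Copy the jar to /opt/module/hadoop-3.1.3 on the Hadoop cluster (you can drag the file straight into that directory in your shell client).
(4) Run the WordCount program (make sure the cluster is running first):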
[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar wc.jar com.atguigu.mapreduce.wordcount.WordCountDriver /user/atguigu/input /user/atguigu/output
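The two trailing paths in the command above are what reach the Driver's main method as args[0] and args[1]. Reading them without a check fails with an ArrayIndexOutOfBoundsException when one is missing, so a Driver could validate them first. The sketch below is one possible way to do that; the DriverArgs class and its parse method are hypothetical helpers for illustration and are not part of Hadoop or of the article's code.

```java
// Hypothetical helper: validates the two positional arguments a WordCount driver expects.
class DriverArgs {
    final String inputPath;
    final String outputPath;

    private DriverArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Returns the parsed paths, or throws with a usage hint if the argument count is wrong.
    static DriverArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 arguments: <input path> <output path>");
        }
        return new DriverArgs(args[0], args[1]);
    }

    public static void main(String[] ignored) {
        // Sample values taken from the command shown in this article.
        String[] sample = {"/user/atguigu/input", "/user/atguigu/output"};
        DriverArgs parsed = parse(sample);
        System.out.println("input  = " + parsed.inputPath);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

In the Driver, the parsed values would then be passed to new Path(...) in step 6. Keep in mind that the job fails if the output directory already exists, so use a fresh output path for each run.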
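Note: the command needs the fully qualified name of the Driver class. To copy it, select the file in the left-hand project panel, then right-click and choose Copy -> Copy Reference.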