The Reduce operator performs a rolling aggregation over a data stream, emitting the merged result after each rolling aggregation step.
Example environment
java.version: 1.8.x
flink.version: 1.11.1
Example data source (download the project from Gitee)
Flink Series: Setting Up the Development Environment and Data
Reduce.java
import com.flink.examples.DataSource;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.List;

public class Reduce {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        List<Tuple3<String, String, Integer>> tuple3List = DataSource.getTuple3ToList();
        // Note: keying on the Integer field directly can produce incorrect partitioning here;
        // converting the key to a String makes the output group correctly.
        KeyedStream<Tuple3<String, String, Integer>, String> keyedStream = env.fromCollection(tuple3List)
                .keyBy(new KeySelector<Tuple3<String, String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple3<String, String, Integer> tuple3) throws Exception {
                        // f1 is the gender field; records with the same f1 value (gender) go to the same partition
                        return String.valueOf(tuple3.f1);
                    }
                });
        SingleOutputStreamOperator<Tuple3<String, String, Integer>> result = keyedStream
                .reduce(new ReduceFunction<Tuple3<String, String, Integer>>() {
                    @Override
                    public Tuple3<String, String, Integer> reduce(Tuple3<String, String, Integer> t0,
                                                                  Tuple3<String, String, Integer> t1) throws Exception {
                        int totalAge = t0.f2 + t1.f2;
                        return new Tuple3<>("", t0.f1, totalAge);
                    }
                });
        result.print();
        env.execute("flink Reduce job");
    }
}
Printed output

Note: why does only the first record emitted for each partition still carry a value in every field (including the name)? Because the rolling aggregation only starts computing from the second record, which is folded into the first; the first record never passes through the reduce method and is emitted unchanged.

2> (张三,man,20)
2> (,man,49)
2> (,man,79)
4> (李四,girl,24)
4> (,girl,56)
4> (,girl,74)
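The emission pattern above can be simulated without Flink: a rolling reduce emits the first record of a partition as-is, then folds each later record into the running aggregate and emits every intermediate result. A minimal plain-Java sketch of that semantics (the class and method names are illustrative, not Flink API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RollingReduceSketch {

    // Rolling sum over one key partition: the first value is emitted
    // unchanged; every later value is merged into the running aggregate,
    // and each intermediate aggregate is emitted in turn.
    static List<Integer> rollingSum(List<Integer> ages) {
        List<Integer> emitted = new ArrayList<>();
        Integer acc = null;
        for (Integer age : ages) {
            acc = (acc == null) ? age : acc + age; // first record skips the merge
            emitted.add(acc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Ages of the "man" partition from the sample output: 20, 29, 30
        System.out.println(rollingSum(Arrays.asList(20, 29, 30))); // [20, 49, 79]
    }
}
```

With the sample ages 20, 29, 30 this reproduces the `2>` partition's aggregates 20, 49, 79, matching the printed output.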