- Basic transformation operators
  - 1. Map
  - 2. FlatMap
  - 3. Filter
- Aggregation operators
  - 1. KeyBy
  - 2. Rolling aggregation operators
  - 3. Reduce
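Basic transformation operators
1. Map
    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    // Return the length of each string
    DataStream<Integer> mapStream = dataStream.map(new MapFunction<String, Integer>() {
        @Override
        public Integer map(String value) throws Exception {
            return value.length();
        }
    });
2. FlatMap
    // Split each string on commas and emit every field as its own element
    DataStream<String> flatMapStream = dataStream.flatMap(new FlatMapFunction<String, String>() {
        @Override
        public void flatMap(String value, Collector<String> out) throws Exception {
            String[] fields = value.split(",");
            for (String field : fields) {
                out.collect(field);
            }
        }
    });
3. Filter
    // Keep only the elements that satisfy a condition
    DataStream<String> filterStream = dataStream.filter(new FilterFunction<String>() {
        @Override
        public boolean filter(String value) throws Exception {
            return value.startsWith("sensor_1");
        }
    });
Aggregation operators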
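DataStream itself has no aggregation methods such as reduce or sum, because Flink is designed so that data must be grouped before it can be aggregated. Call keyBy first to obtain a KeyedStream, then call its reduce, sum, and other aggregation methods (group first, then aggregate).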
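1. KeyBy
DataStream -> KeyedStream: logically splits one stream into disjoint partitions (note that this does not produce separate streams). All elements with the same key end up in the same partition, and the assignment is implemented internally by hashing the key.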
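The hash-based assignment described above can be illustrated with a minimal plain-Java sketch. It does not use Flink, and it assumes the simplest possible rule (key hash modulo the number of partitions); Flink's real assignment is more involved, so treat this only as a model of the idea. The class name `KeyPartitionSketch`, the methods `partitionFor` and `assign`, and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyPartitionSketch {

    // Simplifying assumption: partition index = non-negative key hash modulo partition count.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assigns each record to a partition based on its key (the text before the first comma).
    public static Map<Integer, List<String>> assign(List<String> records, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String record : records) {
            String key = record.split(",")[0];
            int index = partitionFor(key, numPartitions);
            partitions.computeIfAbsent(index, p -> new ArrayList<>()).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up sample records in "id,timestamp,temperature" form.
        List<String> records = Arrays.asList(
                "sensor_1,1,35.8", "sensor_2,2,15.4", "sensor_1,3,36.3", "sensor_3,4,6.7");
        System.out.println(assign(records, 2));
    }
}
```

Whatever the partition count, both `sensor_1` records land in the same partition, while a single partition may still hold several different keys. That is why the split is logical: the data is still one stream.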
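2. Rolling aggregation operators
- sum()
- min()
- minBy()
- max()
- maxBy()
These operators aggregate each key of a KeyedStream separately and emit an updated result for every incoming element.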
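The difference between max and maxBy is easy to miss, so here is a minimal plain-Java model of the two behaviors. It does not use Flink: the `RollingMaxSketch` class, its `Reading` type (a stand-in for a sensor POJO with id, timestamp, and temperature), and the sample values are all hypothetical and only approximate what the operators do per key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollingMaxSketch {

    // Hypothetical stand-in for a sensor record.
    public static final class Reading {
        public final String id;
        public final long timestamp;
        public final double temperature;

        public Reading(String id, long timestamp, double temperature) {
            this.id = id;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }
    }

    private final Map<String, Reading> maxState = new HashMap<>();
    private final Map<String, Reading> maxByState = new HashMap<>();

    // Models max("temperature"): only the compared field is updated,
    // the other fields stay as they were in the first record of the key.
    public Reading max(Reading in) {
        return maxState.merge(in.id, in, (cur, next) ->
                new Reading(cur.id, cur.timestamp, Math.max(cur.temperature, next.temperature)));
    }

    // Models maxBy("temperature"): the whole record that holds the maximum is kept.
    public Reading maxBy(Reading in) {
        return maxByState.merge(in.id, in, (cur, next) ->
                next.temperature > cur.temperature ? next : cur);
    }

    public static void main(String[] args) {
        RollingMaxSketch sketch = new RollingMaxSketch();
        List<Reading> input = Arrays.asList(
                new Reading("sensor_1", 1L, 35.8),
                new Reading("sensor_1", 2L, 36.3),
                new Reading("sensor_1", 3L, 34.0));
        for (Reading r : input) {
            Reading a = sketch.max(r);
            Reading b = sketch.maxBy(r);
            System.out.println("max: ts=" + a.timestamp + " temp=" + a.temperature
                    + " | maxBy: ts=" + b.timestamp + " temp=" + b.temperature);
        }
    }
}
```

After the second reading, both variants report 36.3 as the temperature, but the max variant still carries timestamp 1 while the maxBy variant carries timestamp 2, the timestamp of the record that actually holds the maximum.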
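    import org.apache.flink.api.java.tuple.Tuple;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.KeyedStream;

    // Parse each "id,timestamp,temperature" line into a SensorReading
    DataStream<SensorReading> sensorStream = dataStream.map(line -> {
        String[] fields = line.split(",");
        return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
    });

    // Group first, then aggregate.
    // Grouping (string field keys are deprecated in recent Flink releases in favor of a KeySelector)
    KeyedStream<SensorReading, Tuple> keyedStream = sensorStream.keyBy("id");

    // Rolling aggregation. max vs. maxBy: maxBy returns the whole record that holds the maximum,
    // whereas max only updates the compared field and leaves the other fields unchanged.
    DataStream<SensorReading> resultStream = keyedStream.maxBy("temperature");
3. Reduce
Reduce covers more general aggregation scenarios. In Java, you implement the ReduceFunction functional interface.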
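    import org.apache.flink.api.common.functions.ReduceFunction;

    DataStream<SensorReading> dataStream = inputStream.map(line -> {
        String[] fields = line.split(",");
        return new SensorReading(fields[0], Long.parseLong(fields[1]), Double.parseDouble(fields[2]));
    });

    // Group by sensor id
    KeyedStream<SensorReading, Tuple> keyedStream = dataStream.keyBy("id");

    // reduce: keep the maximum temperature together with the latest timestamp
    // (getTemperature() is assumed to be the getter of the temperature field)
    DataStream<SensorReading> resultStream = keyedStream.reduce(new ReduceFunction<SensorReading>() {
        @Override
        public SensorReading reduce(SensorReading curState, SensorReading newData) throws Exception {
            return new SensorReading(curState.getID(), newData.getTimestamp(),
                    Math.max(curState.getTemperature(), newData.getTemperature()));
        }
    });

    // The same logic written as a lambda
    DataStream<SensorReading> lambdaResultStream = keyedStream.reduce((curState, newData) ->
            new SensorReading(curState.getID(), newData.getTimestamp(),
                    Math.max(curState.getTemperature(), newData.getTemperature())));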
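To see what a keyed reduce emits element by element, the following plain-Java sketch models it with one stored value per key. It follows the intent stated in the code comment (maximum temperature so far, latest timestamp) and does not use Flink. `KeyedReduceSketch`, `Reading`, `accept`, and the sample values are hypothetical names and data introduced only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class KeyedReduceSketch {

    // Hypothetical stand-in for a sensor record.
    public static final class Reading {
        public final String id;
        public final long timestamp;
        public final double temperature;

        public Reading(String id, long timestamp, double temperature) {
            this.id = id;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }
    }

    // Combine rule: maximum temperature seen so far, timestamp of the newest element.
    public static final BinaryOperator<Reading> MAX_TEMP_LATEST_TS = (cur, next) ->
            new Reading(cur.id, next.timestamp, Math.max(cur.temperature, next.temperature));

    // One reduced value per key, mimicking per-key state.
    private final Map<String, Reading> state = new HashMap<>();

    // Folds the new element into the state of its key and returns the updated result.
    public Reading accept(Reading in) {
        return state.merge(in.id, in, MAX_TEMP_LATEST_TS);
    }

    public static void main(String[] args) {
        KeyedReduceSketch sketch = new KeyedReduceSketch();
        List<Reading> input = Arrays.asList(
                new Reading("sensor_1", 1L, 35.8),
                new Reading("sensor_1", 2L, 34.0),
                new Reading("sensor_2", 3L, 15.4),
                new Reading("sensor_1", 4L, 36.3));
        for (Reading r : input) {
            Reading out = sketch.accept(r);
            System.out.println(out.id + " ts=" + out.timestamp + " temp=" + out.temperature);
        }
    }
}
```

The first element of a key is emitted unchanged because there is nothing to combine it with yet. Every later element of that key is combined with the stored result, and keys never influence each other.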