The overall approach is:
- Build the data source
- Write the window aggregation code
First, construct the data type. Create a new file MyData2.java and write the following MyData2 class:
package create_data;

import java.util.Arrays;

public class MyData2 {
    public int keyId;
    public long timestamp;
    public int num;
    public double[] valueList;

    public MyData2() {
    }

    public MyData2(int accountId, long timestamp, int num, double[] valueList) {
        this.keyId = accountId;
        this.timestamp = timestamp;
        this.num = num;
        this.valueList = valueList;
    }

    public long getKeyId() {
        return keyId;
    }

    public void setKeyId(int keyId) {
        this.keyId = keyId;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public double[] getValueList() {
        return valueList;
    }

    public void setValueList(double[] valueList) {
        this.valueList = valueList;
    }

    public int getNum() {
        return num;
    }

    public void setNum(int num) {
        this.num = num;
    }

    @Override
    public String toString() {
        return "MyData{" +
                "keyId=" + keyId +
                ", timestamp=" + timestamp +
                ", num=" + num +
                ", valueList= " + Arrays.toString(valueList) +
                '}';
    }
}
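As a quick sanity check (a minimal sketch that is not part of the original post; the class name MyData2Check is made up), constructing a record and printing it should produce the same format that appears in the job output later:

package create_data;

public class MyData2Check {
    public static void main(String[] args) {
        // One record with key 0, the current time, a count of 1, and a single value.
        MyData2 d = new MyData2(0, System.currentTimeMillis(), 1, new double[]{0.5});
        System.out.println(d); // MyData{keyId=0, timestamp=..., num=1, valueList= [0.5]}
    }
}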
Next, we need a class that controls data generation. Create a new class, MyDataSource2.java, and write:
package create_data;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Random;

public class MyDataSource2 implements SourceFunction<MyData2> {
    // Flag used to control whether data generation keeps running
    private boolean isRunning = true;
    private final Random random = new Random(0);

    @Override
    public void run(SourceContext<MyData2> ctx) throws Exception {
        while (isRunning) {
            // ctx.collect(new MyData(random.nextInt(3), System.currentTimeMillis(), random.nextFloat()));
            ctx.collect(new MyData2(random.nextInt(3), System.currentTimeMillis(), 1, new double[]{random.nextDouble()}));
            Thread.sleep(1000L); // generate one record per second
        }
    }

    @Override
    public void cancel() {
        isRunning = false;
    }
}
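Before adding any windowing, the source can be verified on its own by wiring it into a local job and printing the raw records. This is a minimal sketch under the same project layout; the class name MyDataSource2Check is hypothetical and not part of the original post.

package create_data;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyDataSource2Check {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Expect one MyData2 record per second with keyId in {0, 1, 2}.
        env.addSource(new MyDataSource2()).print();
        env.execute();
    }
}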
Finally, create a FullWindowLearn2.java class that builds the full-window aggregation:
package windows_learn;

import create_data.MyData2;
import create_data.MyDataSource2;
import org.apache.commons.lang3.ArrayUtils;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FullWindowLearn2 {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3);

        DataStreamSource<MyData2> sourceStream = env.addSource(new MyDataSource2());

        SingleOutputStreamOperator<MyData2> outStream = sourceStream
                .keyBy("keyId")
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .reduce(new ReduceFunction<MyData2>() {
                    @Override
                    public MyData2 reduce(MyData2 value1, MyData2 value2) throws Exception {
                        return new MyData2(value1.keyId, value2.timestamp,
                                value1.getNum() + value2.getNum(),
                                ArrayUtils.addAll(value1.valueList, value2.valueList));
                    }
                });

        outStream.print();
        env.execute();
    }
}
The results after running are as follows:
3> MyData{keyId=0, timestamp=1634698715287, num=1, valueList= [0.8314409887870612]}
3> MyData{keyId=2, timestamp=1634698719302, num=4, valueList= [0.6374174253501083, 0.11700660880722513, 0.3332183994766498, 0.6130357680446138]}
3> MyData{keyId=2, timestamp=1634698723310, num=3, valueList= [0.8791825178724801, 0.17597680203548016, 0.7051747444754559]}
3> MyData{keyId=1, timestamp=1634698724310, num=1, valueList= [0.5467397571984656]}
3> MyData{keyId=0, timestamp=1634698722308, num=1, valueList= [0.12889715087377673]}
3> MyData{keyId=2, timestamp=1634698729327, num=3, valueList= [0.5629496738983792, 0.6251463634655593, 0.8676786682939737]}
3> MyData{keyId=0, timestamp=1634698728324, num=2, valueList= [0.01492708588111824, 0.990722785714783]}
3> MyData{keyId=0, timestamp=1634698733340, num=3, valueList= [0.7331520701949938, 0.5266994346048661, 0.9846741428068255]}
3> MyData{keyId=2, timestamp=1634698734342, num=1, valueList= [0.0830623982249149]}
3> MyData{keyId=1, timestamp=1634698731334, num=1, valueList= [0.012806651575719585]}
3> MyData{keyId=2, timestamp=1634698739353, num=2, valueList= [0.30687115672762866, 0.6895039878550204]}
3> MyData{keyId=1, timestamp=1634698737351, num=1, valueList= [0.3591653475606117]}
3> MyData{keyId=0, timestamp=1634698738351, num=2, valueList= [0.7150310138504744, 0.004485602182885184]}
3> MyData{keyId=0, timestamp=1634698743367, num=3, valueList= [0.3387696535357536, 0.8657458802140383, 0.04494430391472559]}
3> MyData{keyId=1, timestamp=1634698744371, num=2, valueList= [0.9323680992655007, 0.21757041220968598]}
3> MyData{keyId=0, timestamp=1634698748381, num=4, valueList= [0.08278636648764448, 0.6922930069529333, 0.9481847392423067, 0.2112353749298962]}
3> MyData{keyId=2, timestamp=1634698749384, num=1, valueList= [0.3952070466478651]}
You can see that, since the records are not produced exactly in step with the 5-second processing-time windows, the number of records aggregated into each window varies (sometimes 4, sometimes 6), but overall the behavior is as expected.
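The job above uses a ReduceFunction, which combines records incrementally as they arrive in the window. If you want a function that sees all of a window's elements at once (which is what "full window" usually refers to), the same per-window result can also be sketched with a ProcessWindowFunction. The example below is an illustration under assumptions rather than part of the original post: the class name FullWindowLearn2Process is made up, and it keys by a lambda instead of the field name so the key type is a plain Integer.

package windows_learn;

import create_data.MyData2;
import create_data.MyDataSource2;
import org.apache.commons.lang3.ArrayUtils;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class FullWindowLearn2Process {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(3);

        env.addSource(new MyDataSource2())
                .keyBy(data -> data.keyId)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .process(new ProcessWindowFunction<MyData2, MyData2, Integer, TimeWindow>() {
                    @Override
                    public void process(Integer key, Context context,
                                        Iterable<MyData2> elements, Collector<MyData2> out) {
                        // All elements of the window are available here at once.
                        int num = 0;
                        long lastTimestamp = 0L;
                        double[] values = new double[0];
                        for (MyData2 element : elements) {
                            num += element.getNum();
                            lastTimestamp = element.getTimestamp();
                            values = ArrayUtils.addAll(values, element.getValueList());
                        }
                        out.collect(new MyData2(key, lastTimestamp, num, values));
                    }
                })
                .print();

        env.execute();
    }
}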