方法一 :当面对全量表,使用自关联 --笛卡尔积
先看一下笛卡尔积的结果
首先看一下全量表的明细数据
对明细数据进行聚合
select * from ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date) a left join ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date) b on a.app_id=b.app_id
select a.down_date ,a.app_id ,sum(case when a.down_date - b.down_date=7 then a.cnt/b.cnt else 0 end) as seven_tb ,sum(case when add_months(a.down_date,-1) = b.down_date then a.cnt/b.cnt else 0 end) as month_tb from ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date) a left join ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date) b on a.app_id=b.app_id group by a.down_date ,a.app_id
方法二:面对每天的增量表,假如是hive大数据的情景下,且数据量特别大,一次只能扫描一天的分区。如要计算20211201当天的周月同比。先看一下所有分区的数据
select app_id ,nvl(max(cnt)/max(seven_cnt),0) as seven_tb ,nvl(max(cnt)/max(month_cnt),0) as month_tb from ( select app_id ,count(*) as cnt ,null as seven_cnt ,null as month_cnt from DWD_HISPACE_DOWN_DM where pt_d=20211201 group by app_id union all select app_id ,null as cnt ,count(*) as seven_cnt ,null as month_cnt from DWD_HISPACE_DOWN_DM where pt_d=to_char(to_date('20211201','yyyy-mm-dd')-7,'yyyymmdd') group by app_id union all select app_id ,null as cnt ,null as seven_cnt ,count(*) as month_cnt from DWD_HISPACE_DOWN_DM where pt_d=to_char(add_months(to_date('20211201','yyyy-mm-dd'),-1),'yyyymmdd') group by app_id ) group by app_id
方法三 关联时间维度表
select a1.app_id ,sum(a1.cnt)/sum(b1.cnt) as week_tb from ( select app_id ,cnt ,b.week_ago from ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date) a left join dim_time b on a.down_date=tO_date(b.today,'yyyymmdd') ) a1 inner join ( select count(*) as cnt,app_id,down_date from dwd_hispace_down_ds group by app_id,down_date ) b1 on tO_date(a1.week_ago,'yyyymmdd')=b1.down_date and a1.app_id=b1.app_id group by a1.app_id
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)