mysqlhive求实际活动时间

mysqlhive求实际活动时间,第1张

mysql/hive求实际活动时间 背景:

求每个品牌方活动实际持续时间,可能出现的情况有:

1)一个品牌方有多个重叠活动

2)一个品牌方有多个不重叠的活动,活动之间的断层不纳入总时间

数据格式:
tm_name  sdt        edt
xiaomi	2022-01-06	2022-01-09
xiaomi	2022-01-11	2022-01-15
xiaomi	2022-01-17	2022-01-20
huawei	2022-01-06	2022-01-09
huawei	2022-01-08	2022-01-15
meizu	2022-01-06	2022-01-20
meizu	2022-01-09	2022-01-15
meizu	2022-01-12	2022-01-25
TCL	2022-01-06	2022-01-20
TCL	2022-01-09	2022-01-12
TCL	2022-01-15	2022-01-25
建表语句 :
-- 建表
DROp TABLE IF EXISTS `trademark`;
CREATE TABLE `trademark`  (
  `tm_name` varchar(255) CHARACTER SET gbk COLLATE gbk_chinese_ci NULL DEFAULT NULL,
  `sdt` date NULL DEFAULT NULL,
  `edt` date NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = gbk COLLATE = gbk_chinese_ci ROW_FORMAT = Dynamic;

-- 数据装填
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-06', '2022-01-09');
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-11', '2022-01-15');
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-17', '2022-01-20');
INSERT INTO `trademark` VALUES ('huawei', '2022-01-06', '2022-01-09');
INSERT INTO `trademark` VALUES ('huawei', '2022-01-08', '2022-01-15');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-06', '2022-01-20');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-09', '2022-01-15');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-12', '2022-01-25');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-06', '2022-01-20');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-09', '2022-01-12');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-15', '2022-01-25');
具体步骤:

1、在写有时间重叠情况时的sql时,基本都是采用将开始时间和结束时间打散union在一起,如同求最大在线人数的思路一样,此处flag标记在下面的 *** 作会体现它的作用,注意union all和union的区别:

select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark

效果:

2、将union在一起的数据进行开窗处理,条件就是按品牌分组,累积flag求日期是否重叠,若是重叠的日期sum(1)会一直变大,而不重叠的话相当于开始之后紧跟着结束sum(1)与sum(-1)相当于0,用此标志来判断是否断层,而用lead是来获取下一个时间,用来标记断层的时间范围:

with t as(
select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark)
select tm_name,dt,
	sum(flag) over(partition by tm_name order by dt) f1,
	lead(dt,1,dt) over(partition by tm_name order by dt) last_dt
from t 

 效果:

如上图所示,xiaomi出现了2个断层,分别是【01-09,01-11】和【01-15,01-17】。而标记这一行的f1列数据正好都是0。 

3、用每个品牌最晚结束的时间 - 最开始的时间 - 断层的时间 = 实际活动时间

with t as(
select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark)
select
	tm_name,
	datediff(max(dt),min(dt))-sum(if(f1=0,datediff(last_dt,dt)-1,0)) real_time
from (
	select tm_name,dt,
		sum(flag) over(partition by tm_name order by dt) f1,
		lead(dt,1,dt) over(partition by tm_name order by dt) last_dt
	from t ) t1 group by tm_name
扩展法二:
with t as(
select
	tm_name,
	#若开始时间小于当前最大的结束时间就用结束时间+1作为开始时间,避免时间叠加
	if(max_edt is not null and sdt<=max_edt,date_add(max_edt,interval 1 day),sdt) sdt,
	edt
from (
select
	*,
	#取出当前数据前面最大的结束时间
	max(edt) over(partition by tm_name order by sdt rows between unbounded preceding and 1 preceding) max_edt
from trademark) t1)
select 
	tm_name,
  #因为上面改了sdt,所以会导致某些重叠数据sdt>edt,这里要过滤掉
	sum(if(datediff(edt,sdt)<0,0,datediff(edt,sdt)+1)) real_time
from t
group by tm_name
结果:

mysql 

欢迎分享,转载请注明来源:内存溢出

原文地址: https://outofmemory.cn/zaji/5701702.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-17
下一篇 2022-12-17

发表评论

登录后才能评论

评论列表(0条)

保存