Hive cuberollupgrouping setsGROUPING

Hive cuberollupgrouping setsGROUPING,第1张

Hive cube / rollup / grouping sets/GROUPING

Hive CUBE / ROLLUP / GROUPING SETS / GROUPING__ID用法详解
  • GROUPING SETS
  • GROUPING__ID(注意这里是两个下划线)
  • CUBE
  • ROLLUP

cube / rollup / grouping sets/GROUPING__ID,经常会被问到这几个函数的区别,今天就好好整理一下。

GROUPING SETS

grouping sets用法是给group by做辅助作用的函数,在SQL中的位置是跟在group by后面。grouping sets括号中的参数必须是group by后面的字段:group by 字段1,字段2 grouping sets(字段1,字段2)。使用grouping sets的区别在于,group by会根据括号中的字段进行分组去重,然后根据select中的聚合函数对字段聚合,而加上grouping sets相当于把括号中的字段分别进行上述group by *** 作,然后进行union all聚合。我们来实际 *** 作举例说明一下:

  • 准备数据
    select * from tab1;
    

结果:

monthdayvalues2015-032015-03-10value12015-032015-03-10value52015-032015-03-12value72015-042015-04-12value32015-042015-04-13value22015-042015-04-13value42015-042015-04-16value42015-032015-03-10value22015-032015-03-10value32015-042015-04-12value52015-042015-04-13value42015-042015-04-15value32015-042015-04-15value22015-042015-04-16value1

使用grouping sets

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day) ;

等价于

SELECt month,NULL,COUNT(DISTINCT values) AS uv FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv FROM tab1 GROUP BY day;

输出结果都是一样的

monthdayuv2015-03NULL52015-04NULL6NULL2015-03-104NULL2015-03-121NULL2015-04-122NULL2015-04-133NULL2015-04-152NULL2015-04-162 GROUPING__ID(注意这里是两个下划线)

GROUPING__ID,表示结果属于哪一个分组集合,一般和GROUPING SETS一起使用。
例如:

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day) 
ORDER BY GROUPING__ID;

查询结果:

monthdayuvGROUPING__ID2015-03NULL512015-04NULL61NULL2015-03-1042NULL2015-03-1212NULL2015-04-1222NULL2015-04-1332NULL2015-04-1522NULL2015-04-1622

也可以这样查询

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID;

等价于

SELECt month,NULL,COUNT(DISTINCT values) AS uv,1 AS GROUPING__ID FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv,2 AS GROUPING__ID FROM tab1 GROUP BY day
UNIOn ALL 
SELECt month,day,COUNT(DISTINCT values) AS uv,3 AS GROUPING__ID FROM tab1 GROUP BY month,day

查询结果

monthdayuvGROUPING__ID2015-03NULL512015-04NULL61NULL2015-03-1042NULL2015-03-1212NULL2015-04-1222NULL2015-04-1332NULL2015-04-1522NULL2015-04-16222015-032015-03-10432015-032015-03-12132015-042015-04-12232015-042015-04-13332015-042015-04-15232015-042015-04-1623 CUBE

根据GROUP BY的维度的所有组合进行聚合。

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
WITH CUBE 
ORDER BY GROUPING__ID;

等价于

SELECt NULL,NULL,COUNT(DISTINCT values) AS uv,0 AS GROUPING__ID FROM tab1
UNIOn ALL 
SELECt month,NULL,COUNT(DISTINCT values) AS uv,1 AS GROUPING__ID FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv,2 AS GROUPING__ID FROM tab1 GROUP BY day
UNIOn ALL 
SELECt month,day,COUNT(DISTINCT values) AS uv,3 AS GROUPING__ID FROM tab1 GROUP BY month,day

查询结果

monthdayuvGROUPING__IDNULLNULL702015-03NULL512015-04NULL61NULL2015-04-1222NULL2015-04-1332NULL2015-04-1522NULL2015-04-1622NULL2015-03-1042NULL2015-03-12122015-032015-03-10432015-032015-03-12132015-042015-04-16232015-042015-04-12232015-042015-04-13332015-042015-04-1523

这里的第一条数据就相当于没有group by任何字段,直接进行了聚合 *** 作的结果。

ROLLUP

是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。

比如,以month维度进行层级聚合

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID  
FROM tab1
GROUP BY month,day
WITH ROLLUP 
ORDER BY GROUPING__ID;

可以实现这样的上钻过程:
总UV->月的UV->月天的UV

monthdayuvGROUPING__IDNULLNULL702015-03NULL512015-04NULL612015-032015-03-10432015-032015-03-12132015-042015-04-12232015-042015-04-13332015-042015-04-15232015-042015-04-1623

从查询结果我们可以看出,GROUPING__ID为0的就是总UV;GROUPING__ID为1的为月的uv;GROUPING__ID为3的是月+天uv

把month和day调换顺序,则以day维度进行层级聚合

SELECt 
	day,
	month,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID  
FROM tab1
GROUP BY day,month 
WITH ROLLUP 
ORDER BY GROUPING__ID;

可以实现这样的上钻过程:
总UV->天的UV->天月的UV

daymonthuvGROUPING__IDNULLNULL702015-04-13NULL312015-03-12NULL112015-04-15NULL212015-03-10NULL412015-04-16NULL212015-04-12NULL212015-04-122015-04232015-03-102015-03432015-03-122015-03132015-04-132015-04332015-04-152015-04232015-04-162015-0423

从查询结果我们可以看出,GROUPING__ID为0的就是总UV;GROUPING__ID为1的为天的uv;GROUPING__ID为3的是天+月uv

欢迎分享,转载请注明来源:内存溢出

原文地址: https://outofmemory.cn/zaji/5699062.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-17
下一篇 2022-12-17

发表评论

登录后才能评论

评论列表(0条)

保存