Big Data, Hive Series: Notes

1. Business Background

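I recently received a fairly tricky requirement for building out a metrics system. In brief: a feature exists in a new version and an old version, and users can reach it through two paths, A and B. Path A is further divided into three entry points: A1, A2, and A3.

The metrics are broken down along three levels. The first level may be the version (new vs. old), but the same metrics must also be available with the entry path as the first level. Taking daily active users (DAU) as the metric being broken down,
the specific breakdown is shown in the table below: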

2. Introducing the Approach

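The solution I currently have in mind is WITH CUBE, one of the GROUPING SETS family of clauses in Hive.
Before going further, here are my notes on using GROUPING SETS; corrections are welcome.

In one sentence: GROUPING SETS is a compact way of writing several GROUP BY queries combined with UNION ALL. In each result row, the columns that are not part of that row's grouping set come back as NULL.
Common syntax: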

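```sql
-- Usage 1: GROUPING SETS
select a, b, count(1) as cnt from t
 group by a, b
GROUPING SETS (a, b);
-- is equivalent to (conceptually; the column outside each set comes back as NULL)
select a, null as b, count(1) as cnt from t group by a
union all
select null as a, b, count(1) as cnt from t group by b;

-- WITH ROLLUP: groups by each left-to-right prefix of the column list, down to the grand total ()
select a, b, c from t group by a, b, c WITH ROLLUP;
-- is equivalent to
select a, b, c from t group by a, b, c
GROUPING SETS ((a,b,c), (a,b), (a), ());

-- WITH CUBE: groups by every subset of the columns (2^n grouping sets for n columns)
select a, b, c from t group by a, b, c WITH CUBE;
-- is equivalent to
select a, b, c from t group by a, b, c
GROUPING SETS ((a,b,c), (a,b), (a,c), (b,c), (a), (b), (c), ());
```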
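The equivalences above can be checked outside Hive with a small Python emulation. This is a sketch under stated assumptions, not Hive's implementation: rows are plain dicts, None stands in for SQL NULL, only COUNT(*) is computed, and the helper names `rollup_sets`, `cube_sets` and `grouping_sets_count` are hypothetical. The two expansion helpers list the grouping sets that WITH ROLLUP and WITH CUBE stand for, and `grouping_sets_count` evaluates them as one GROUP BY per set glued together with UNION ALL.

```python
from collections import Counter
from itertools import combinations


def rollup_sets(cols):
    """WITH ROLLUP: every left-to-right prefix of cols, longest first, down to ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """WITH CUBE: every subset of cols (2**n sets), largest subsets first."""
    return [combo
            for size in range(len(cols), -1, -1)
            for combo in combinations(cols, size)]


def grouping_sets_count(rows, cols, sets):
    """Emulate SELECT <cols>, COUNT(*) ... GROUP BY <cols> GROUPING SETS (<sets>).

    Each grouping set is evaluated as its own GROUP BY, and the per-set
    results are concatenated (UNION ALL). Columns outside the current
    set are reported as None, the stand-in for SQL NULL.
    """
    result = []
    for gset in sets:
        counts = Counter(
            tuple(row[c] if c in gset else None for c in cols) for row in rows
        )
        result.extend((*key, n) for key, n in counts.items())
    return result
```

For example, with rows `[{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]`, `grouping_sets_count(rows, ["a", "b"], [("a",), ("b",)])` returns `[(1, None, 2), (2, None, 1), (None, 'x', 2), (None, 'y', 1)]`: the first two rows come from GROUP BY a and the last two from GROUP BY b, which is exactly the UNION ALL reading of GROUPING SETS (a, b).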

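```sql
-- HAVING filters the groups after aggregation;
-- repeating max(b) instead of the alias b1 is the portable form
select A
      ,max(b) as b1
  from t
 group by A
having max(b) > 1000;

-- is equivalent to aggregating in a subquery and filtering with WHERE
select s.A
      ,s.b1
  from (
        select A
              ,max(b) as b1
          from t
         group by A
       ) s
 where s.b1 > 1000;
```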
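The same equivalence, sketched in Python to make the order of evaluation visible: aggregate first, then filter the aggregated rows. Assumptions: the table `t` is modeled as a list of `(A, b)` pairs, and the function names `max_by_group`, `having_form` and `subquery_form` are hypothetical.

```python
from collections import defaultdict


def max_by_group(rows):
    """SELECT A, max(b) AS b1 FROM t GROUP BY A, returned as {A: b1}."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    return {a: max(bs) for a, bs in groups.items()}


def having_form(rows, threshold):
    """GROUP BY A HAVING max(b) > threshold: the filter runs on the aggregates."""
    return {a: b1 for a, b1 in max_by_group(rows).items() if b1 > threshold}


def subquery_form(rows, threshold):
    """Inner query aggregates; outer query filters its rows with WHERE."""
    aggregated = max_by_group(rows)        # inner SELECT ... GROUP BY A
    result = {}
    for a, b1 in aggregated.items():       # outer SELECT ... WHERE s.b1 > threshold
        if b1 > threshold:
            result[a] = b1
    return result
```

For `[("x", 500), ("x", 1500), ("y", 800), ("z", 2000)]` and a threshold of 1000, both forms return `{"x": 1500, "z": 2000}`: group `y` is dropped because its maximum, not any single row, fails the condition.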

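3. Solution

Suppose the three levels are the columns l1, l2, and l3, and users are identified by user_id. Since DAU counts users rather than log rows, the query counts distinct user_id values.
The query is: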

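```sql
-- DAU for every combination of the three levels;
-- a NULL in l1/l2/l3 in the output means "all values" of that level
select l1
      ,l2
      ,l3
      ,count(distinct user_id) as user_cnt
  from t
 group by l1
         ,l2
         ,l3
  with cube;
```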
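To see what the cube returns, here is a small Python emulation of the query above. It is a sketch under stated assumptions, not Hive itself: each log row is assumed to be a tuple `(user_id, l1, l2, l3)`, the sample values (new/old, A/B, A1/A2) only mirror the business background, and the function name `dau_cube` is hypothetical. Besides COUNT(DISTINCT user_id), each result is tagged with a bitmask of the levels that were rolled up, in the spirit of Hive's GROUPING__ID column. The bit layout here is this sketch's own choice and is not guaranteed to match the GROUPING__ID values of a given Hive version. The mask matters because a level that is genuinely NULL in the data (path B has no sub-entry) otherwise looks identical to a rolled-up "all" row.

```python
from collections import defaultdict
from itertools import combinations

LEVELS = ("l1", "l2", "l3")


def dau_cube(rows):
    """Emulate:
        SELECT l1, l2, l3, count(DISTINCT user_id) FROM t
        GROUP BY l1, l2, l3 WITH CUBE

    rows: iterable of (user_id, l1, l2, l3) tuples.
    Returns {(rollup_mask, l1, l2, l3): distinct_user_count}, where bit i
    of rollup_mask is set when LEVELS[i] was rolled up to "all".
    """
    rows = list(rows)  # iterated once per grouping set
    n = len(LEVELS)
    result = {}
    for size in range(n, -1, -1):
        for kept in combinations(range(n), size):   # one grouping set
            mask = sum(1 << i for i in range(n) if i not in kept)
            users = defaultdict(set)                # key -> distinct user_ids
            for user_id, *levels in rows:
                key = tuple(levels[i] if i in kept else None for i in range(n))
                users[key].add(user_id)
            for key, ids in users.items():
                result[(mask, *key)] = len(ids)
    return result


rows = [
    (1, "new", "A", "A1"),
    (1, "new", "A", "A2"),
    (2, "new", "B", None),   # path B has no sub-entry: a real NULL in l3
    (3, "old", "A", "A1"),
    (3, "old", "B", None),
]
cube = dau_cube(rows)
```

Here `cube[(7, None, None, None)]` is the overall DAU (3), while `cube[(3, None, None, None)]` is 2: it counts users whose l3 is NULL in the data, yet in Hive's output both rows would print as `NULL, NULL, NULL`. Filtering or labeling on GROUPING__ID, or filling real NULLs with a placeholder such as `coalesce(l3, 'none')` before the cube, keeps the two apart. Note also that `cube[(4, "new", "A", None)]` is 1 even though user 1 has two rows there, which is why the query uses count(distinct user_id) rather than count(1).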

Source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5618690.html
