目录
Window as 使用
Hive开窗函数
窗口帧:用于从分区中选择指定的多条数据,供窗口函数处理
Hive行
转列
Window as 使用
列子:
Window w as (partition by clazz order by score desc) ;
Hive开窗函数开窗函数一般使用和聚合函数一起
row_number:无并列排名
用法:select xxx,row_number() over(partition by 分组字段 order by 排序字段 desc) as rn from tb group by xxxx;
dense_rank:有并列排名,并且依次递增
rank:有并列排名,不依次递增
percent_rank:(rank的结果-1)/(分区内数据的个数-1)
cume_dist:计算某个窗口或分区某个值的累积分布
假定升序排序,则使用以下公式确定累积分布: 小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中,x 等于 order by 子句中指定的列的当前行中的值。
NTILE(n):对分区内数据再分成n组,然后打上组号
max、min、avg、count、sum:基于每个partition分区内的数据做对应的计算
窗口帧:用于从分区中选择指定的多条数据,供窗口函数处理ROWS和RANGE两种模式,都需要配置上界和下界
ROWS BETWEEN UNDOUNDED PRECEDING AND CURRENT ROW
表示选择分区起始记录到当前记录的所有行;
如果没有定义窗口帧
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
range between 3 PRECEDING and 11 FOLLOWING
Hive行 转列select name,s from student lateral view explode(weight) s1 as s;
select key from (select explode(map('key1',1,'key2',2,'key3',3)) as (key,value)) t;
key1
key2
key3
Hive列转行
select name,collect_list(weight) from students group by name;
Hive自定义函数
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)