Slow subqueries in Hive SQL

jupiter • 2022-12-16 • Notes

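When querying a table's latest partition, the newest data is not always from T-1 (the previous day), so the query has to find whichever partition was loaded last. I used to do this by selecting the maximum partition value: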

select user_no,score from table_a where pday=(select max(pday) from table_a)

This form uses a subquery, which scans every partition and runs very slowly. After some digging, I found that a join is faster, as follows:

select
    user_no
    ,score
from (select max(pday) pday from table_a where pday>='${three_day_ago}') t1
join table_a t2 on t1.pday=t2.pday
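
As a variation on the query above, the same lower bound can be repeated on the `t2` side. This is only a sketch: it assumes the engine does not already prune `t2` at run time. It relies on the latest partition being, by construction, no older than `${three_day_ago}`, so the extra filter does not change the result.

```sql
-- Sketch: same join as above, with the lower bound also applied to t2
-- so that partition pruning can be decided from the WHERE clause alone.
select
    t2.user_no
    ,t2.score
from (select max(pday) pday from table_a where pday>='${three_day_ago}') t1
join table_a t2 on t1.pday=t2.pday
where t2.pday>='${three_day_ago}'
```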

The lookback window (three days here) can be adjusted to the last x days as appropriate.
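
The original does not show how `${three_day_ago}` gets its value; it is presumably filled in by the job scheduler. For running the query by hand, one option is Hive's variable substitution. The sketch below assumes `pday` is a string in `yyyyMMdd` form, and the date value is purely illustrative.

```sql
-- Assumption: pday is stored as a 'yyyyMMdd' string; 20221213 is an example value.
set hivevar:three_day_ago=20221213;

select
    user_no
    ,score
from (select max(pday) pday from table_a where pday>='${hivevar:three_day_ago}') t1
join table_a t2 on t1.pday=t2.pday;
```

Widening the window to x days only means setting an earlier date. If no partition falls inside the window, the query returns no rows, so the window should cover the longest expected delay.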

Sharing is welcome; when reposting, please credit the source: 内存溢出
