在dwd层关联维度表时经常会遇到数据倾斜,假如某app商店的曝光数据超百亿,在关联app维度表表时,排在前面的app肯定占了大部分数据,如抖音微信,肯定会遇到数据倾斜。这个时候该怎么解决呢? mapjoin无疑是比较好的选择!
步骤1:先建个临时表,筛选3000万以上的数据
CREATE TABLE temp_store_exposure_dm AS SELECt t1.app_id ,t2.app_name ,t2.app_category FROM ( SELECt app_id FROM dwd_store_exposure_dm WHERe pt_d='$date' group by app_id having count(*)>30000000 ) t1 inner join ( SELECt app_id ,app_name ,app_category FROM dim_store_app_ds WHERe pt_d='$date' ) t2 ON t1.app_id=t2.app_id
步骤2:事实表关联临时表 , 走mapjoin
SELECt t1.app_id ,t2.app_name ,t2.app_category FROM ( SELECt app_id FROM dwd_store_exposure_dm WHERe pt_d='$date' ) t1 LEFT JOIN ( SELECt app_id ,app_name ,app_category FROM temp_store_exposure_dm ) t2 ON t1.app_id=t2.app_id UNIOn ALL SELECt t1.app_id ,t2.app_name ,t2.app_category FROM ( SELECt app_id FROM dwd_store_exposure_dm WHERe pt_d='$date' ) t1 LEFT JOIN ( SELECt app_id ,app_name ,app_category FROM temp_store_exposure_dm ) t2 ON t1.app_id=t2.app_id WHERe t2.app_id IS NULL
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)