Hive Lateral View、视图与索引

Hive Lateral View、视图与索引,第1张

Hive Lateral View、视图索引 1.Hive Lateral View

 Lateral View 用于和 UDTF 函数(explode、split)结合来使用。
 首先通过 UDTF 函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
 主要解决在 select 使用 UDTF 做查询过程中,查询只能包含单个 UDTF,不能包含其 他字段、以及多个 UDTF 的问题
语法:

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)
hive> select explode(likes) from person7;
OK
lol
book
movie
lol
book
movie
lol
book
movie
lol
book
hive> select id,name,mycol1,mycol2,mycol3 from person7 lateral view explode(likes) mytb1 as mycol1 lateral view explode(address) mytb2 as mycol2,mycol3;
OK
1	小明 1	lol	beijing	xisanqi
1	小明 1	lol	shanghai	pudong 
1	小明 1	book	beijing	xisanqi
1	小明 1	book	shanghai	pudong 
1	小明 1	movie	beijing	xisanqi
1	小明 1	movie	shanghai	pudong 
2	小明 2	lol	beijing	xisanqi
2	小明 2	lol	shanghai	pudong 
2	小明 2	book	beijing	xisanqi
2	小明 2	book	shanghai	pudong 
2	小明 2	movie	beijing	xisanqi
2	小明 2	movie	shanghai	pudong 
3	小明 3	lol	beijing	xisanqi
3	小明 3	lol	shanghai	pudong 
3	小明 3	book	beijing	xisanqi
3	小明 3	book	shanghai	pudong 
3	小明 3	movie	beijing	xisanqi
3	小明 3	movie	shanghai	pudong 
4	小明 4	lol	beijing	xisanqi
4	小明 4	lol	shanghai	pudong 
4	小明 4	book	beijing	xisanqi
4	小明 4	book	shanghai	pudong 
4	小明 4	movie	beijing	xisanqi
4	小明 4	movie	shanghai	pudong 
5	小明 5	lol	beijing	xisanqi
hive> select count(distinct(mycol2)),count(distinct(mycol3)) from person7 lateral view explode(address) mytb2 as mycol2,mycol3;
Query ID = root_20211112124842_626f1f8e-0764-4306-ab9d-dc404771b732
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1636690643978_0001, Tracking URL = http://node3:8088/proxy/application_1636690643978_0001/
Kill Command = /opt/hadoop-2.6.5/bin/hadoop job  -kill job_1636690643978_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-11-12 12:49:07,938 Stage-1 map = 0%,  reduce = 0%
2021-11-12 12:49:29,804 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.54 sec
2021-11-12 12:49:46,263 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.42 sec
MapReduce Total cumulative CPU time: 5 seconds 420 msec
Ended Job = job_1636690643978_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.42 sec   HDFS Read: 12719 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 420 msec
OK
2	3
Time taken: 66.565 seconds, Fetched: 1 row(s)
hive> create table yyy(
    > id int,
    > name string);
OK
Time taken: 2.269 seconds

2.Hive 视图


和关系型数据库中的普通视图一样,hive 也支持视图
特点:
 不支持物化视图
 只能查询,不能做加载数据 *** 作
 视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
 view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT 语句 *** 作,view 当中定义的优先级更高
 view 支持迭代视图

#mysql 中支持视图删除:
 CREATE VIEW v_users AS SELECt * FROM myusers; 
 DELETe FROM v_users WHERe id = '1316403900579872';

创建视图:

hive> create view v_psn as select * from person5;
OK
Time taken: 0.275 seconds

查询视图:
在对应元数据库中的 TBLS 中多出一条记录:

删除视图:

hive> drop view v_psn;
OK
Time taken: 1.22 seconds

3.Hive 索引

目的:优化查询以及检索性能
创建索引:

hive> create index t1_index
    > on table person5(name)
    > as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler"
    > with deferred rebuild
    > in table t1_index_table
    > comment "table person5 name index comment";
OK
Time taken: 1.732 seconds

as:指定索引器;
in table:指定索引表,若不指定默认生成在 default__person5_t1_index__表中

重建索引(建立索引之后必须重建索引才能生效)

ALTER INDEX t1_index ON person5 REBUILD;

重建完毕之后,再次查询有索引数据:select * from t1_index_table;

hive> select * from t1_index_table;
OK
NULL	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[513]	10
小明 1	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[0]	10
小明 2	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[59]	10
小明 3	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[118]	10
小明 4	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[177]	10
小明 5	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[236]	10
小明 6	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[290]	10
小明 7	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[349]	10
小明 8	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[402]	10
小明 9	hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt	[455]	10

删除索引

DROP INDEX IF EXISTS t1_index ON person5;

欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/zaji/5479792.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-12
下一篇 2022-12-12

发表评论

登录后才能评论

评论列表(0条)

保存