Lateral View 用于和 UDTF 函数(explode、split)结合来使用。
首先通过 UDTF 函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
主要解决在 select 使用 UDTF 做查询过程中,查询只能包含单个 UDTF,不能包含其 他字段、以及多个 UDTF 的问题
语法:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)
hive> select explode(likes) from person7; OK lol book movie lol book movie lol book movie lol book hive> select id,name,mycol1,mycol2,mycol3 from person7 lateral view explode(likes) mytb1 as mycol1 lateral view explode(address) mytb2 as mycol2,mycol3; OK 1 小明 1 lol beijing xisanqi 1 小明 1 lol shanghai pudong 1 小明 1 book beijing xisanqi 1 小明 1 book shanghai pudong 1 小明 1 movie beijing xisanqi 1 小明 1 movie shanghai pudong 2 小明 2 lol beijing xisanqi 2 小明 2 lol shanghai pudong 2 小明 2 book beijing xisanqi 2 小明 2 book shanghai pudong 2 小明 2 movie beijing xisanqi 2 小明 2 movie shanghai pudong 3 小明 3 lol beijing xisanqi 3 小明 3 lol shanghai pudong 3 小明 3 book beijing xisanqi 3 小明 3 book shanghai pudong 3 小明 3 movie beijing xisanqi 3 小明 3 movie shanghai pudong 4 小明 4 lol beijing xisanqi 4 小明 4 lol shanghai pudong 4 小明 4 book beijing xisanqi 4 小明 4 book shanghai pudong 4 小明 4 movie beijing xisanqi 4 小明 4 movie shanghai pudong 5 小明 5 lol beijing xisanqi hive> select count(distinct(mycol2)),count(distinct(mycol3)) from person7 lateral view explode(address) mytb2 as mycol2,mycol3; Query ID = root_20211112124842_626f1f8e-0764-4306-ab9d-dc404771b732 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=2.Hive 视图In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Starting Job = job_1636690643978_0001, Tracking URL = http://node3:8088/proxy/application_1636690643978_0001/ Kill Command = /opt/hadoop-2.6.5/bin/hadoop job -kill job_1636690643978_0001 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2021-11-12 12:49:07,938 Stage-1 map = 0%, reduce = 0% 2021-11-12 12:49:29,804 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.54 sec 2021-11-12 12:49:46,263 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.42 sec MapReduce Total cumulative CPU time: 5 seconds 420 msec Ended Job = job_1636690643978_0001 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.42 sec HDFS Read: 12719 HDFS Write: 4 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 420 msec OK 2 3 Time taken: 66.565 seconds, Fetched: 1 row(s) hive> create table yyy( > id int, > name string); OK Time taken: 2.269 seconds
和关系型数据库中的普通视图一样,hive 也支持视图
特点:
不支持物化视图
只能查询,不能做加载数据 *** 作
视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT 语句 *** 作,view 当中定义的优先级更高
view 支持迭代视图
#mysql 中支持视图删除: CREATE VIEW v_users AS SELECt * FROM myusers; DELETe FROM v_users WHERe id = '1316403900579872';
创建视图:
hive> create view v_psn as select * from person5; OK Time taken: 0.275 seconds
查询视图:
在对应元数据库中的 TBLS 中多出一条记录:
删除视图:
hive> drop view v_psn; OK Time taken: 1.22 seconds3.Hive 索引
目的:优化查询以及检索性能
创建索引:
hive> create index t1_index > on table person5(name) > as "org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler" > with deferred rebuild > in table t1_index_table > comment "table person5 name index comment"; OK Time taken: 1.732 seconds
as:指定索引器;
in table:指定索引表,若不指定默认生成在 default__person5_t1_index__表中
重建索引(建立索引之后必须重建索引才能生效)
ALTER INDEX t1_index ON person5 REBUILD;
重建完毕之后,再次查询有索引数据:select * from t1_index_table;
hive> select * from t1_index_table; OK NULL hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [513] 10 小明 1 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [0] 10 小明 2 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [59] 10 小明 3 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [118] 10 小明 4 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [177] 10 小明 5 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [236] 10 小明 6 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [290] 10 小明 7 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [349] 10 小明 8 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [402] 10 小明 9 hdfs://mycluster/user/hive_remote/warehouse/person5/age=10/person01.txt [455] 10
删除索引
DROP INDEX IF EXISTS t1_index ON person5;
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)