【Hive DQL: Table Joins】_Notes

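------------------------- Full join ---- rows present in both tables, rows only in the left table, rows only in the right table ---- every row of both tables is shown, with NULLs on the side that has no match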
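To illustrate the three kinds of rows, here is a sketch against the u1 and u2 tables queried in the session below. It assumes their id columns contain no NULLs of their own. A full join pads the missing side with NULL, so filtering on NULL keeps only the rows that exist in exactly one table:

```sql
-- Full join, then keep only the unmatched rows:
-- rows only in u1 (u2 side is NULL) and rows only in u2 (u1 side is NULL)
select u1.id, u1.name, u2.id, u2.name
from u1 full join u2 on u1.id = u2.id
where u1.id is null or u2.id is null;
```

With the data shown in the transcript, this should return ids 1, 2 and 3 from u1 and ids 7, 8 and 9 from u2. These are the six rows of the full join result that contain NULLs.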




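-- Cartesian product ----- every row of one table is joined with every row of the other ----- the data volume gets frighteningly large: 6 * 6 = 36 rows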
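Later in the session, the comma syntax (select * from u1,u2) is used. It has an explicit equivalent, CROSS JOIN. The following sketch assumes the same u1 and u2 tables. The strict check was enabled in this session, so it has to be switched off first, and hive.mapred.mode must also not be 'strict':

```sql
-- Allow Cartesian products for this session only
set hive.strict.checks.cartesian.product=false;

-- Explicit form of "select * from u1, u2": every u1 row paired with every u2 row
select * from u1 cross join u2;

-- Result size is rows(u1) * rows(u2); with 6 rows in each table this gives 6 * 6 = 36
select count(*) from u1 cross join u2;
```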


-------------------------------------------------
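The session below queries two tables, u1 and u2, in database mydb, but it does not show their definitions. The sketch below recreates equivalent data, reconstructed only from the query output. The schema (id int, name string) is an assumption, and INSERT ... VALUES requires Hive 0.14 or later:

```sql
-- Hypothetical setup: column names match the output headers; the types are assumed
use mydb;

create table if not exists u1 (id int, name string);
create table if not exists u2 (id int, name string);

-- u1 holds ids 1..6, u2 holds ids 4..9; ids 4, 5, 6 appear in both tables
insert into table u1 values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f');
insert into table u2 values (4, 'd'), (5, 'e'), (6, 'f'), (7, 'g'), (8, 'h'), (9, 'i');
```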


hive (mydb)> select * from u1 join u2 on u1.id = u2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220112222831_c8641f66-8d56-4cb7-bf33-93e2b1e70128
Total jobs = 1
2022-01-12 22:28:39     Starting to launch local task to process map join;      maximum memory = 518979584
2022-01-12 22:28:40     Dump the side-table for tag: 0 with group count: 6 into file: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-28-31_054_4724095481681766367-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile00--.hashtable
2022-01-12 22:28:40     Uploaded 1 File to: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-28-31_054_4724095481681766367-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile00--.hashtable (386 bytes)
2022-01-12 22:28:40     End of local task; Time Taken: 1.147 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1641995564638_0001, Tracking URL = http://linux123:8088/proxy/application_1641995564638_0001/
Kill Command = /opt/lagou/servers/hadoop-2.9.2/bin/hadoop job  -kill job_1641995564638_0001
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-01-12 22:28:51,469 Stage-3 map = 0%,  reduce = 0%
2022-01-12 22:28:59,717 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.18 sec
MapReduce Total cumulative CPU time: 1 seconds 180 msec
Ended Job = job_1641995564638_0001
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 1.18 sec   HDFS Read: 6092 HDFS Write: 147 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 180 msec
OK
u1.id   u1.name u2.id   u2.name
4       d       4       d
5       e       5       e
6       f       6       f
Time taken: 29.726 seconds, Fetched: 3 row(s)
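Two log lines show that Hive converted this inner join into a map join: "Starting to launch local task to process map join" and "number of reducers: 0". The smaller table was dumped into an in-memory hash table, and the join ran entirely in the mappers. The setting hive.auto.convert.join controls this behavior. The sketch below compares it with a regular shuffle join. Exact job statistics will differ by cluster:

```sql
-- Print the current value (the map join above suggests it is true here)
set hive.auto.convert.join;

-- Disable automatic map-join conversion for this session
set hive.auto.convert.join=false;

-- "join" and "inner join" are the same operation in HiveQL;
-- with conversion off, the job is expected to use reducers
select * from u1 inner join u2 on u1.id = u2.id;

-- Restore automatic conversion
set hive.auto.convert.join=true;
```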
hive (mydb)> select * from u1 left join u2 on u1.id = u2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220112223213_e0369e9c-ff56-4164-a46e-1fbd05c7a21b
Total jobs = 1
2022-01-12 22:32:20     Starting to launch local task to process map join;      maximum memory = 518979584
2022-01-12 22:32:21     Dump the side-table for tag: 1 with group count: 6 into file: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-32-13_210_5733734088434832898-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
2022-01-12 22:32:21     Uploaded 1 File to: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-32-13_210_5733734088434832898-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (386 bytes)
2022-01-12 22:32:21     End of local task; Time Taken: 1.054 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1641995564638_0002, Tracking URL = http://linux123:8088/proxy/application_1641995564638_0002/
Kill Command = /opt/lagou/servers/hadoop-2.9.2/bin/hadoop job  -kill job_1641995564638_0002
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-01-12 22:32:31,981 Stage-3 map = 0%,  reduce = 0%
2022-01-12 22:32:37,151 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.74 sec
MapReduce Total cumulative CPU time: 740 msec
Ended Job = job_1641995564638_0002
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 0.74 sec   HDFS Read: 5770 HDFS Write: 213 SUCCESS
Total MapReduce CPU Time Spent: 740 msec
OK
u1.id   u1.name u2.id   u2.name
1       a       NULL    NULL
2       b       NULL    NULL
3       c       NULL    NULL
4       d       4       d
5       e       5       e
6       f       6       f
Time taken: 25.01 seconds, Fetched: 6 row(s)
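The left join keeps all six u1 rows and fills the u2 columns with NULL where there is no match. The sketch below shows two related patterns against the same tables. LEFT SEMI JOIN is Hive's way of expressing an IN/EXISTS-style filter. A left join followed by an IS NULL filter finds the rows that have no match:

```sql
-- Left semi join: u1 rows whose id also appears in u2 (ids 4, 5, 6).
-- Only columns of the left table (u1) may be selected.
select u1.id, u1.name from u1 left semi join u2 on u1.id = u2.id;

-- Anti-join pattern: u1 rows with no match in u2 (ids 1, 2, 3)
select u1.id, u1.name from u1 left join u2 on u1.id = u2.id where u2.id is null;
```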
hive (mydb)> select * from u1 right join u2 on u1.id = u2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220112223714_8fb055ba-3d36-4ea7-a252-b9ff2bb2ec54
Total jobs = 1
2022-01-12 22:37:21     Starting to launch local task to process map join;      maximum memory = 518979584
2022-01-12 22:37:22     Dump the side-table for tag: 0 with group count: 6 into file: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-37-14_046_7767994758097546401-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile20--.hashtable
2022-01-12 22:37:22     Uploaded 1 File to: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-37-14_046_7767994758097546401-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile20--.hashtable (386 bytes)
2022-01-12 22:37:22     End of local task; Time Taken: 1.094 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1641995564638_0003, Tracking URL = http://linux123:8088/proxy/application_1641995564638_0003/
Kill Command = /opt/lagou/servers/hadoop-2.9.2/bin/hadoop job  -kill job_1641995564638_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-01-12 22:37:32,594 Stage-3 map = 0%,  reduce = 0%
2022-01-12 22:37:37,701 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
MapReduce Total cumulative CPU time: 960 msec
Ended Job = job_1641995564638_0003
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 0.96 sec   HDFS Read: 5770 HDFS Write: 213 SUCCESS
Total MapReduce CPU Time Spent: 960 msec
OK
u1.id   u1.name u2.id   u2.name
4       d       4       d
5       e       5       e
6       f       6       f
NULL    NULL    7       g
NULL    NULL    8       h
NULL    NULL    9       i
Time taken: 24.754 seconds, Fetched: 6 row(s)
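A right join is a left join with the two tables swapped. The sketch below should return the same six rows as the right join above. Only the column order changes, with u2's columns first:

```sql
-- Equivalent to: select * from u1 right join u2 on u1.id = u2.id
select * from u2 left join u1 on u2.id = u1.id;
```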
hive (mydb)> select * from u1 full join u2 on u1.id = u2.id;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220112223920_77ef5961-e704-43d2-9a6b-2a56e7064e42
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Defaulting to jobconf value of: 4
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
Starting Job = job_1641995564638_0004, Tracking URL = http://linux123:8088/proxy/application_1641995564638_0004/
Kill Command = /opt/lagou/servers/hadoop-2.9.2/bin/hadoop job  -kill job_1641995564638_0004
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 4
2022-01-12 22:39:28,499 Stage-1 map = 0%,  reduce = 0%
2022-01-12 22:39:40,692 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.7 sec
2022-01-12 22:39:48,647 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 5.34 sec
2022-01-12 22:39:51,866 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.94 sec
MapReduce Total cumulative CPU time: 7 seconds 940 msec
Ended Job = job_1641995564638_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2  Reduce: 4   Cumulative CPU: 7.94 sec   HDFS Read: 27544 HDFS Write: 540 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 940 msec
OK
u1.id   u1.name u2.id   u2.name
4       d       4       d
NULL    NULL    8       h
1       a       NULL    NULL
5       e       5       e
NULL    NULL    9       i
2       b       NULL    NULL
6       f       6       f
3       c       NULL    NULL
NULL    NULL    7       g
Time taken: 32.148 seconds, Fetched: 9 row(s)
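Unlike the previous queries, the full join was not converted to a map join. It ran with 2 mappers and 4 reducers ("Defaulting to jobconf value of: 4"). Each reducer wrote its own slice of the result, which is most likely why the rows above come back in no particular order. The sketch below sorts the output. The column aliases are illustrative; they also avoid having two output columns both named id:

```sql
-- coalesce takes whichever side's id is not NULL, so matched and unmatched rows sort together
select coalesce(u1.id, u2.id) as merged_id,
       u1.name as u1_name,
       u2.name as u2_name
from u1 full join u2 on u1.id = u2.id
order by merged_id;
```

ORDER BY produces a total order by routing all rows through a single reducer. That is fine for nine rows but worth keeping in mind on large tables.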
hive (mydb)> select * from u1,u2;
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please sethive.strict.checks.cartesian.product to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
hive (mydb)> set hive.strict.checks.cartesian.product;
hive.strict.checks.cartesian.product=true
hive (mydb)> set hive.strict.checks.cartesian.product=false;
hive (mydb)> select * from u1,u2;
Warning: Map Join MAPJOIN[9][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220112225121_2784f4ef-a0f1-4e44-95f5-d090246f17cc
Total jobs = 1
2022-01-12 22:51:29     Starting to launch local task to process map join;      maximum memory = 518979584
2022-01-12 22:51:30     Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-51-21_357_175171533818054252-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable
2022-01-12 22:51:30     Uploaded 1 File to: file:/tmp/root/d6200ad0-564c-4cd8-8a3a-2aa6255ab21d/hive_2022-01-12_22-51-21_357_175171533818054252-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile30--.hashtable (320 bytes)
2022-01-12 22:51:30     End of local task; Time Taken: 1.185 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1641995564638_0005, Tracking URL = http://linux123:8088/proxy/application_1641995564638_0005/
Kill Command = /opt/lagou/servers/hadoop-2.9.2/bin/hadoop job  -kill job_1641995564638_0005
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-01-12 22:51:37,443 Stage-3 map = 0%,  reduce = 0%
2022-01-12 22:51:43,602 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1641995564638_0005
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 5721 HDFS Write: 807 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
u1.id   u1.name u2.id   u2.name
1       a       4       d
2       b       4       d
3       c       4       d
4       d       4       d
5       e       4       d
6       f       4       d
1       a       5       e
2       b       5       e
3       c       5       e
4       d       5       e
5       e       5       e
6       f       5       e
1       a       6       f
2       b       6       f
3       c       6       f
4       d       6       f
5       e       6       f
6       f       6       f
1       a       7       g
2       b       7       g
3       c       7       g
4       d       7       g
5       e       7       g
6       f       7       g
1       a       8       h
2       b       8       h
3       c       8       h
4       d       8       h
5       e       8       h
6       f       8       h
1       a       9       i
2       b       9       i
3       c       9       i
4       d       9       i
5       e       9       i
6       f       9       i
Time taken: 24.318 seconds, Fetched: 36 row(s)
hive (mydb)>
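The strict check exists to catch accidental Cartesian products, so it makes sense to turn it back on after the experiment. A value changed with set only applies to the current session. A short sketch:

```sql
-- Re-enable the safety check for the rest of the session
set hive.strict.checks.cartesian.product=true;

-- Print the value to confirm, as done earlier in the session
set hive.strict.checks.cartesian.product;
```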
Original article: https://outofmemory.cn/zaji/5706340.html