day16hive分桶_随笔

day16hive分桶一. 分桶表

1.1概念：就是MapReduce的分区，将元数据分开存放到不同的文件，分桶就是分文件
1.2作用：抽样查询，提高join桶相同
1.3 *** 作过程：
开启分桶：

set hive.enforce.bucketing=true;

设置reduce的个数：

set mapreduce.job.reduce=3;//2.x之后不起作用

--创建分桶表
create table course(
    cid int,
    c_name string,
    tid string
)clustered by (cid)into 3 buckets row format delimited fields terminated by 't';

--创建临时表
create table course_common(
                       cid int,
                       c_name string,
                       tid string
) row format delimited fields terminated by 't';
--给临时表加载数据
load data  local inpath'/export/data/hivedata/course.data' into table course_common;
--将临时表数据查询插入到分桶表
insert overwrite table course select  * from course_common cluster by (cid)

1.4分区和分桶的区别：
分区的表现形式是文件夹，分桶的表现是文件。分区是为了将数据进行分类管理，分桶是为了抽样

1.5修改表
1.5.1重命名表：

alter table 旧表名 rename to  新表名;

1.5.2增加列：

alter table score add columns (mycol string,mysoc string);

1.5.3查看表结构

desc 表名

1.5.4更新列：
1.5.5清空表（内部表）：实际是删除HDFS上的表目录文件

truncate table score;

二.Hive的数据加载

insert into table name partition(month=‘20201’)values();hadoop fs -putload data local/ inpath ''into table course_commoninsert into table B partition(month=‘20206’)select * from a;create table score as select * from score; 三.Hive中的数据导出

将查询结果导出到本地

insert  overwrite  local  directory '/export/data/hivedata/ae'select * from score2;

指定分隔符导出

insert  overwrite  local  directory '/export/data/hivedata/ae' row format delimited fields terminated by 't'
select * from score2;

导出到hdfs

insert  overwrite  directory '/export/data/hivedata/ae' row format delimited fields terminated by 't'
select * from score2;

利用export导出到hdfs

export table score to ‘/路径’

四.hive的查询

全表扫描select *from score列别名：select sid as myid from score;常用函数：求总行数：count（distinct 重复）；求最大值max（）；最小值min（）；和sum（）；平均值avg（）；limit select * from score limit 0,5从索引0开始显示五条select * from score where sscore between 80 and 100;select * from score where sscore is null;select * from score where sscore in(80,90）select * from score where sscore like’_8%'注“_代表一个字符%代表多个”select * from score where sscore rlike‘[1]’//包含1的select * from score group by sid；分组select sid ,avg(sscore) from score score group by sid;select sid ,avg(sscore) from score score group by sid having sc>85;inner join求两表的交集select sid from score s,course c where s.tid=c.tidleft join;左边所有包含的记录返回select sid from score s left join course c on s.tid=c.tid

欢迎分享，转载请注明来源：内存溢出

原文地址: http://outofmemory.cn/zaji/5716988.html

day16hive分桶

发表评论

评论列表（0条）