1.1概念:就是MapReduce的分区,将元数据分开存放到不同的文件,分桶就是分文件
1.2作用:抽样查询,提高join桶相同
1.3 *** 作过程:
开启分桶:
set hive.enforce.bucketing=true;
设置reduce的个数:
set mapreduce.job.reduce=3;//2.x之后不起作用
--创建分桶表 create table course( cid int, c_name string, tid string )clustered by (cid)into 3 buckets row format delimited fields terminated by 't'; --创建临时表 create table course_common( cid int, c_name string, tid string ) row format delimited fields terminated by 't'; --给临时表加载数据 load data local inpath'/export/data/hivedata/course.data' into table course_common; --将临时表数据查询插入到分桶表 insert overwrite table course select * from course_common cluster by (cid)
1.4分区和分桶的区别:
分区的表现形式是文件夹,分桶的表现是文件。分区是为了将数据进行分类管理,分桶是为了抽样
1.5修改表
1.5.1重命名表:
alter table 旧表名 rename to 新表名;
1.5.2增加列:
alter table score add columns (mycol string,mysoc string);
1.5.3查看表结构
desc 表名
1.5.4更新列:
1.5.5清空表(内部表):实际是删除HDFS上的表目录文件
truncate table score;二.Hive的数据加载
- insert into table name partition(month=‘20201’)values();hadoop fs -putload data local/ inpath ''into table course_commoninsert into table B partition(month=‘20206’)select * from a;create table score as select * from score;
将查询结果导出到本地
insert overwrite local directory '/export/data/hivedata/ae'select * from score2;
指定分隔符导出
insert overwrite local directory '/export/data/hivedata/ae' row format delimited fields terminated by 't' select * from score2;
导出到hdfs
insert overwrite directory '/export/data/hivedata/ae' row format delimited fields terminated by 't' select * from score2;
利用export导出到hdfs
export table score to ‘/路径’四.hive的查询
- 全表扫描select *from score列别名:select sid as myid from score;常用函数:求总行数:count(distinct 重复);求最大值max();最小值min();和sum();平均值avg();limit select * from score limit 0,5从索引0开始显示五条select * from score where sscore between 80 and 100;select * from score where sscore is null;select * from score where sscore in(80,90)select * from score where sscore like’_8%'注“_代表一个字符%代表多个”select * from score where sscore rlike‘[1]’//包含1的select * from score group by sid;分组select sid ,avg(sscore) from score score group by sid;select sid ,avg(sscore) from score score group by sid having sc>85;inner join求两表的交集select sid from score s,course c where s.tid=c.tidleft join;左边所有包含的记录返回select sid from score s left join course c on s.tid=c.tid
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)