Hive Installation, Configuration, and Usage

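Preface

(1) Produced by Xiao Du (edition for the Haida (海大) computer science big-data course); for the recipient's personal use only!!!

(2) Pay attention to the directories in this tutorial and adjust them to your own setup!

(3) Paths containing "~" mostly refer to the home directory of the installation (regular) account, but "/etc/profile" and "~/.bashrc" must be edited as root!

(4) Use the root account sparingly; steps that require root are marked. Run "source /etc/profile" under both the installation (regular) account and root!

(5) Remember to change the table and database names; don't reuse Xiao Du's!!!

(6) You may run into a few bugs; please resolve them yourself. This tutorial does not cover bug fixes!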

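I. Setting up the Hive environment

(1) Install and configure MySQL

(2) Install Hive

II. Basic Hive operations

(1) Database operations

(2) Table operations

(3) Loading table data

III. Analyzing large-volume data with Hive

(1) Create an external table linked to the Sogou data and use HiveQL for the following analyses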

I. Setting up the Hive environment

(1) Install and configure MySQL

1. List any previously installed MySQL-related packages (root account):

 rpm -qa | grep mysql

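2. Uninstall them (root account). rpm -e needs the real package name, so the command below takes it from the same query as step 1:

rpm -e --nodeps $(rpm -qa | grep mysql-libs)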

3. Install the MySQL database, both the Server and the Client package (root account). [The Server and Client packages have already been uploaded to the Master node.]

rpm -ivh MySQL-server-5.6.14-1.linux_glibc2.5.x86_64.rpm
rpm -ivh MySQL-client-community-5.1.73-1.rhel5.x86_64.rpm

4. Start the MySQL service (root account):

service mysql start

5. Check the MySQL service status (root account):

service mysql status

6. Copy MySQL's default configuration file to /etc/my.cnf (root account):

cp /usr/share/mysql/my-default.cnf /etc/my.cnf

7. Add skip-grant-tables under the [mysqld] section of /etc/my.cnf so that MySQL can be entered without a password (root account).
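Step 7 can also be done without opening an editor. The sketch below assumes GNU sed (for `sed -i` and the one-line `a` command) and that /etc/my.cnf contains a `[mysqld]` header, as the copied default file should; the function name `add_skip_grant_tables` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: put skip-grant-tables on the line right after [mysqld].
# Assumes GNU sed; if the file has no [mysqld] header, nothing is inserted.
add_skip_grant_tables() {
  local cnf="${1:-/etc/my.cnf}"
  # Idempotent: skip the edit when the option is already present.
  if ! grep -q '^skip-grant-tables' "$cnf"; then
    sed -i '/^\[mysqld\]/a skip-grant-tables' "$cnf"
  fi
}

# Usage (root account): add_skip_grant_tables /etc/my.cnf
```

Running it twice leaves a single skip-grant-tables line, so it is safe to repeat.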

8. Stop the MySQL service (root account):

service mysql stop

9. Start the MySQL service again (root account):

service mysql start

10. Log in to MySQL without a password; just press Enter at the password prompt (root account):

mysql -uroot -p

11. Inside MySQL, reset the root password and clear root's password-expired flag (root account):

use mysql
update user set password=password("123456") where user='root';
flush privileges;
select user,password,password_expired from user where user= "root";
update user set password_expired="N" where user='root';
flush privileges;
exit

12. Remove the skip-grant-tables line added to /etc/my.cnf in step 7, then restart the MySQL service (root account):

service mysql restart
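Removing the option in step 12 can likewise be scripted. A minimal sketch, again assuming GNU `sed -i`; `remove_skip_grant_tables` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: delete every skip-grant-tables line (trailing spaces allowed).
remove_skip_grant_tables() {
  local cnf="${1:-/etc/my.cnf}"
  sed -i '/^skip-grant-tables[[:space:]]*$/d' "$cnf"
}

# Usage (root account), followed by the restart:
#   remove_skip_grant_tables /etc/my.cnf && service mysql restart
```

Leaving skip-grant-tables in place would keep MySQL open to anyone without a password, so this step should not be skipped.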

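13. Log back in with the new password (mysql -uroot -p), then create the MySQL account hadoop and the metastore database hive_1 for Hive (root account):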

grant all on *.* to hadoop@'localhost' identified by "hadoop";
grant all on *.* to hadoop@'master' identified by "hadoop";
flush privileges;
create database hive_1;
exit

(2) Install Hive

1. Extract (install) the Hive tarball [it has already been uploaded to the Master node]:

cp ~//resources/software/hive/apache-hive-0.13.1-bin.tar.gz ~/
cd
tar -xzvf ~/apache-hive-0.13.1-bin.tar.gz

2. Create the hive-site.xml configuration file:

gedit ~/apache-hive-0.13.1-bin/conf/hive-site.xml

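Add the following content:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive_1?characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
  </property>
</configuration>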
3. Install the MySQL JDBC driver (Connector/J):

cp ~//resources/software/mysql/mysql-connector-java-5.1.27.tar.gz ~/
cd
tar -xzvf ~/mysql-connector-java-5.1.27.tar.gz

4. Copy the driver jar into Hive's library directory, i.e. copy mysql-connector-java-5.1.27-bin.jar into ~/apache-hive-0.13.1-bin/lib/:

cp  ~/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ~/apache-hive-0.13.1-bin/lib/
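If Hive later cannot load `com.mysql.jdbc.Driver`, a first thing to check is whether the connector jar really landed in Hive's lib directory. A small sketch; the function name and the `mysql-connector-java-*.jar` pattern are assumptions based on the file name above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the MySQL connector jar(s) in a Hive lib directory,
# or fail with a message when none is present.
find_mysql_connector() {
  local libdir="$1" found=0 jar
  for jar in "$libdir"/mysql-connector-java-*.jar; do
    # Without nullglob an unmatched pattern stays literal, so check the file exists.
    if [ -f "$jar" ]; then
      echo "$jar"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no mysql-connector-java jar in $libdir" >&2
    return 1
  fi
}

# Usage: find_mysql_connector ~/apache-hive-0.13.1-bin/lib
```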

5. Open the system environment-variable file (root account):

gedit /etc/profile

6. Append the Hive installation path to the end of that file (root account); change the home directory to your own:

export HIVE_HOME=/home/2011921408dxb/apache-hive-0.13.1-bin
export PATH=$PATH:$HIVE_HOME/bin
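Because steps 5 and 6 are easy to repeat by accident, duplicate export lines can pile up in /etc/profile. A sketch of an idempotent append, assuming bash and GNU grep; `append_once` is a hypothetical helper and the user name in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: append a line to a file unless an identical line already exists.
append_once() {
  local line="$1" file="$2"
  # -x: whole-line match, -F: fixed string, so "$PATH" is not read as a regex.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (root account); single quotes keep $PATH and $HIVE_HOME unexpanded in the file:
#   append_once 'export HIVE_HOME=/home/<your-user>/apache-hive-0.13.1-bin' /etc/profile
#   append_once 'export PATH=$PATH:$HIVE_HOME/bin' /etc/profile
```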

7. Reload the system environment file so the settings take effect.

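[Special step (root account); this only needs to be done once per virtual machine]
vim ~/.bashrc
Add source /etc/profile as the last line
Restart the node
[Main step (regular account + root account)]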
source /etc/profile
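After `source /etc/profile`, a quick check that the variables took effect in the current shell can save a confusing "command not found" later. A minimal sketch; `check_hive_env` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: confirm HIVE_HOME is set and that `hive` resolves on PATH.
check_hive_env() {
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set; did you run: source /etc/profile ?" >&2
    return 1
  fi
  if ! command -v hive >/dev/null 2>&1; then
    echo "hive is not on PATH; expected it in $HIVE_HOME/bin" >&2
    return 1
  fi
  echo "ok: $(command -v hive)"
}

# Usage: run under both the regular account and root after sourcing /etc/profile.
#   check_hive_env
```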

II. Basic Hive operations

(1) Database operations

1. Start the Hive CLI (start Hadoop first); all following commands are run inside Hive:

hive

2. Create the database test:

create database test;

3. List databases by pattern (fuzzy lookup), here those whose names look like 'teXXXX':

show databases like 'te*';

4. Show the database details:

describe database test;

5. Drop the database test (cascade also drops any tables it contains):

drop database test cascade;

(2) Table operations

1. Create a student database stus and, inside it, a managed (internal) table Student with two columns: student ID (string) and name (string):

create database stus;
use stus;
create table Student(sno string , name string);
describe Student;

2. Create a table named Student2 with the same schema as the existing Student table:

create table Student2 like Student;
describe Student2;

3. Alter the schema of Student to add a new column: age (int):

alter table Student add columns (age int);

(3) Loading table data

1. Create a table named Employees; choose your own column names, but the schema must be able to load data in the following format:

1,hengdian,1000.0,13872787890,Zhejiang
2,hengqin,1234.0,18739292798,Guangdong
3,baishui,8797.0,13490980090,Hunan

create database company;
use company;
create table Employees(id int , name string, wages float , phone string , city string) row format delimited fields terminated by ',';
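Hive does not reject rows that fail to parse: with this delimited text format, a field that cannot be read as its column type (for example a non-numeric id) simply comes back as NULL. A rough pre-check of a local CSV against the five-column Employees layout is sketched below; the function name and the numeric patterns are assumptions, not Hive's own parsing rules.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report CSV lines that would not map cleanly onto
# Employees(id int, name string, wages float, phone string, city string).
# Prints "line N: reason" per problem and returns non-zero if any were found.
check_employees_csv() {
  awk -F',' '
    NF != 5                       { printf "line %d: expected 5 fields, got %d\n", NR, NF; bad++; next }
    $1 !~ /^-?[0-9]+$/            { printf "line %d: id is not an integer\n", NR; bad++ }
    $3 !~ /^-?[0-9]+(\.[0-9]+)?$/ { printf "line %d: wages is not a number\n", NR; bad++ }
    END                           { exit (bad > 0) }
  ' "$1"
}

# Usage: check_employees_csv emp.csv
```

A line holding several records run together (13 comma-separated fields instead of 5) is reported by the first rule.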

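2. Load the data above into the table with a LOAD statement (note that the source path is an HDFS path, not a local one; LOAD DATA INPATH moves the file from that path into the table's directory):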

load data inpath '/user/resources/data1/emp.csv' into table Employees;
select * from Employees;
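The steps above assume emp.csv is already in HDFS. One way to put it there is sketched below, assuming a Hadoop 2.x client where `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` are available; the local file name is an assumption, and the upload is skipped when no `hdfs` command is found.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write the three sample rows (one record per line) to a local file.
cat > emp.csv <<'EOF'
1,hengdian,1000.0,13872787890,Zhejiang
2,hengqin,1234.0,18739292798,Guangdong
3,baishui,8797.0,13490980090,Hunan
EOF

# Upload to the HDFS path used by the LOAD statement, if an HDFS client exists.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/resources/data1
  hdfs dfs -put -f emp.csv /user/resources/data1/emp.csv
else
  echo "hdfs not found; emp.csv was only written locally" >&2
fi
```

Since LOAD DATA INPATH moves rather than copies the file, /user/resources/data1 no longer holds emp.csv after the load, and re-running the LOAD needs a fresh upload.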

III. Analyzing large-volume data with Hive

(1) Create an external table linked to the Sogou data and use HiveQL for the following analyses

1. Create the database:

create database test;

2. Create an external table over the Sogou data. (The location is an HDFS path, and that directory must contain only the file sogou.500w.utf8 and nothing else.)

use test;
create external table sogou_500w_view(
     link_opening_time string, 
     uid string,
     search_keyword string,
     user_open_order int ,
     search_results_ranking int ,
     user_open_link string )
 comment 'this is the sogou_500w view table'
 row format delimited fields terminated by '\t'
 stored as textfile
 location '/user/resources/data1';

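Note: the directory '/user/resources/data1' must not contain any file other than the Sogou data file, because Hive reads every file under an external table's location as table data.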
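Assuming each line of sogou.500w.utf8 has six tab-separated fields, as the table definition expects, a line with fewer fields yields NULL in the trailing columns and extra fields are dropped. Before uploading, a sketch like this can summarize how many lines have each field count in a local copy of the file; `field_count_histogram` is a hypothetical helper.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<field count> <number of lines>" for a tab-separated
# file, sorted by field count. For sogou_500w_view every line should have 6 fields.
field_count_histogram() {
  awk -F'\t' '{ c[NF]++ } END { for (n in c) print n, c[n] }' "$1" | sort -n
}

# Usage: field_count_histogram sogou.500w.utf8
```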

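3. Count the queries whose keyword is non-empty (an empty field in a text file is read as an empty string, not NULL, so both conditions are needed):

select count(*) from sogou_500w_view where search_keyword is not null and search_keyword <> '';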

4. Count the queries per uid:

select uid ,count(uid) as num from sogou_500w_view group by uid;

5. Find the user IDs whose search keywords contained "仙剑" more than three times:

select uid, count(uid) as num from sogou_500w_view
where search_keyword like '%仙剑%' group by uid having count(uid) > 3;

6. Count the number of distinct uids:

select count(DISTINCT uid) from sogou_500w_view;
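To sanity-check what queries 3 to 6 compute, the same logic can be run with awk on a tiny hand-made sample in the table's six-column tab-separated layout. This is a sketch for understanding the queries, not a replacement for Hive; the sample rows and function names are made up, and the keyword filter is a plain substring match like `like '%仙剑%'`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Made-up sample: time, uid, keyword, two integer columns, URL (tab-separated).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  t1 u1 仙剑奇侠传 1 1 http://a \
  t2 u1 仙剑五 1 2 http://b \
  t3 u1 仙剑攻略 2 1 http://c \
  t4 u1 仙剑下载 1 1 http://d \
  t5 u2 仙剑 1 1 http://e \
  t6 u2 '' 1 1 http://f \
  t7 u3 天气 1 1 http://g > sample.tsv

# Query 3: rows whose keyword is non-empty.
nonempty_keywords() { awk -F'\t' '$3 != "" { n++ } END { print n + 0 }' "$1"; }

# Query 4: number of queries per uid ("uid<TAB>count", sorted by uid).
queries_per_uid() { awk -F'\t' '{ c[$2]++ } END { for (u in c) print u "\t" c[u] }' "$1" | sort; }

# Query 5: uids with more than three keywords containing 仙剑.
xianjian_over_three() {
  awk -F'\t' 'index($3, "仙剑") > 0 { c[$2]++ } END { for (u in c) if (c[u] > 3) print u "\t" c[u] }' "$1" | sort
}

# Query 6: number of distinct uids.
distinct_uids() { awk -F'\t' '!seen[$2]++ { n++ } END { print n + 0 }' "$1"; }
```

On this sample, `nonempty_keywords sample.tsv` prints 6, `xianjian_over_three sample.tsv` prints only u1 with a count of 4, and `distinct_uids sample.tsv` prints 3.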

Feel free to share; please credit the source (内存溢出) when reposting.

Original article: http://outofmemory.cn/zaji/5677728.html
