Installing and Configuring Sqoop for MySQL in a Hadoop Cluster Environment
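Sqoop is a tool for transferring data between Hadoop and relational databases. It can import data from a relational database (for example MySQL, Oracle, or Postgres) into Hadoop's HDFS, and it can export data from HDFS back into a relational database.
A highlight of Sqoop is that it uses Hadoop MapReduce to carry out the import from the relational database into HDFS.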
I. Installing Sqoop
1. Download the Sqoop archive and extract it
The files needed are sqoop-1.2.0-CDH3B4.tar.gz, hadoop-0.20.2-CDH3B4.tar.gz, and the MySQL JDBC driver mysql-connector-java-5.1.10-bin.jar.
[root@node1 ~]# ll
drwxr-xr-x 15 root root 4096 Feb 22 2011 hadoop-0.20.2-CDH3B4
-rw-r--r-- 1 root root 724225 Sep 15 06:46 mysql-connector-java-5.1.10-bin.jar
drwxr-xr-x 11 root root 4096 Feb 22 2011 sqoop-1.2.0-CDH3B4
2. Copy the MySQL JDBC driver and hadoop-core-0.20.2-CDH3B4.jar (found in hadoop-0.20.2-CDH3B4) into sqoop-1.2.0-CDH3B4/lib, change the owner of sqoop-1.2.0-CDH3B4 to the hadoop user, and move the directory to /home/hadoop.
[root@node1 ~]# cp mysql-connector-java-5.1.10-bin.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib
[root@node1 ~]# chown -R hadoop:hadoop sqoop-1.2.0-CDH3B4
[root@node1 ~]# mv sqoop-1.2.0-CDH3B4 /home/hadoop
[root@node1 ~]# ll /home/hadoop
total 35748
-rw-rw-r-- 1 hadoop hadoop 343 Sep 15 05:13 derby.log
drwxr-xr-x 13 hadoop hadoop 4096 Sep 14 16:16 hadoop-0.20.2
drwxr-xr-x 9 hadoop hadoop 4096 Sep 14 20:21 hive-0.10.0
-rw-r--r-- 1 hadoop hadoop 36524032 Sep 14 20:20 hive-0.10.0.tar.gz
drwxr-xr-x 8 hadoop hadoop 4096 Sep 25 2012 jdk1.7
drwxr-xr-x 12 hadoop hadoop 4096 Sep 15 00:25 mahout-distribution-0.7
drwxrwxr-x 5 hadoop hadoop 4096 Sep 15 05:13 metastore_db
-rw-rw-r-- 1 hadoop hadoop 406 Sep 14 16:02 scp.sh
drwxr-xr-x 11 hadoop hadoop 4096 Feb 22 2011 sqoop-1.2.0-CDH3B4
drwxrwxr-x 3 hadoop hadoop 4096 Sep 14 16:17 temp
drwxrwxr-x 3 hadoop hadoop 4096 Sep 14 15:59 user
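A jar that ends up in the wrong directory only shows up later as a class-loading error, so it can help to check the lib directory right after copying. The following is a minimal sketch and not part of the original procedure: `check_sqoop_lib` is a hypothetical helper name, and the jar names are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: verify that the two jars copied in step 2 are present
# in the lib directory of a Sqoop installation.
# Usage: check_sqoop_lib /home/hadoop/sqoop-1.2.0-CDH3B4
check_sqoop_lib() {
    sqoop_home=$1
    missing=0
    for jar in mysql-connector-java-5.1.10-bin.jar hadoop-core-0.20.2-CDH3B4.jar; do
        if [ -f "$sqoop_home/lib/$jar" ]; then
            echo "found:   $jar"
        else
            echo "missing: $jar"
            missing=1
        fi
    done
    # Exit status 0 only when both jars were found.
    return "$missing"
}
```

The function prints one line per jar and returns a non-zero status when either is missing, so it can be used in an `if` or before starting the later steps.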
3. Edit configure-sqoop and comment out the checks for HBase and ZooKeeper
[root@node1 bin]# pwd
/home/hadoop/sqoop-1.2.0-CDH3B4/bin
[root@node1 bin]# vi configure-sqoop
#!/bin/bash
#
# Licensed to Cloudera, Inc. under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
...
# Check: If we can't find our dependencies, give up here.
if [ ! -d "${HADOOP_HOME}" ]; then
  echo "Error: $HADOOP_HOME does not exist!"
  echo 'Please set $HADOOP_HOME to the root of your Hadoop installation.'
  exit 1
fi
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
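Commenting out the two checks by hand in vi works, but the same edit can be scripted. The sketch below assumes each check has the shape shown in the listing (a line starting with `if [ ! -d "${VAR}" ]; then`, closed by a line containing only `fi`); `comment_out_check` is a hypothetical helper name. It works as a filter and never modifies the file in place.

```shell
#!/bin/sh
# Hypothetical helper: read a script on stdin and print it with the
# 'if [ ! -d "${VAR}" ]; then ... fi' block for the given variable
# commented out.
comment_out_check() {
    var=$1
    # sed address range: from the line that opens the check for $var up to
    # the next line that is exactly "fi"; each line in the range gets a "#".
    # Lines that are already commented out do not match "^if" and are skipped.
    sed "/^if \[ ! -d .*${var}/,/^fi\$/ s/^/#/"
}

# Possible use, writing the result to a new file for review:
#   comment_out_check HBASE_HOME < configure-sqoop \
#       | comment_out_check ZOOKEEPER_HOME > configure-sqoop.new
```

Writing to a separate file keeps the original script available for comparison with `diff` before it is replaced.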
4. Edit /etc/profile and .bash_profile to add HADOOP_HOME and adjust PATH
[hadoop@node1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-0.20.2
PATH=$HADOOP_HOME/bin:$PATH:$HOME/bin
export HIVE_HOME=/home/hadoop/hive-0.10.0
export MAHOUT_HOME=/home/hadoop/mahout-distribution-0.7
export PATH HADOOP_HOME
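Since configure-sqoop exits when HADOOP_HOME does not point to a directory, it is worth confirming that a new login shell actually picked up the profile changes. A minimal sketch follows; `path_contains` and `check_hadoop_env` are hypothetical helper names, not part of Sqoop or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory is an entry of PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the Hadoop settings from .bash_profile are visible
# in the current shell.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    if ! path_contains "$HADOOP_HOME/bin"; then
        echo "$HADOOP_HOME/bin is not on PATH"
        return 1
    fi
    echo "HADOOP_HOME=$HADOOP_HOME and its bin directory is on PATH"
}
```

After logging in again as the hadoop user (or running `. ~/.bash_profile`), calling `check_hadoop_env` should print the last message.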
II. Testing Sqoop
1. List the databases in MySQL:
[hadoop@node1 bin]$ ./sqoop list-databases --connect jdbc:mysql://192.168.1.152:3306/ --username sqoop --password sqoop
13/09/15 07:17:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 07:17:17 INFO manager.MySQLManager: Executing SQL statement: SHOW DATABASES
information_schema
mysql
performance_schema
sqoop
test
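The warning in this output recommends `-P`, which makes Sqoop prompt for the password instead of taking it from the command line. The sketch below shows the same list-databases call with `-P`. The `run` wrapper and the `DRY_RUN` variable are hypothetical additions so the command can be printed for inspection without a Sqoop installation; they are not Sqoop features.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 the command is printed, otherwise
# it is executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# list_databases HOST USER: same call as above, but with -P so that Sqoop
# asks for the password interactively.
list_databases() {
    run ./sqoop list-databases --connect "jdbc:mysql://$1:3306/" --username "$2" -P
}

DRY_RUN=1
list_databases 192.168.1.152 sqoop
```

With `DRY_RUN` unset, the function has to be called from the Sqoop bin directory, as in the session above.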
2. Import a MySQL table into Hive:
[hadoop@node1 bin]$ ./sqoop import --connect jdbc:mysql://192.168.1.152:3306/sqoop --username sqoop --password sqoop --table test --hive-import -m 1
13/09/15 08:15:01 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/09/15 08:15:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
13/09/15 08:15:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
13/09/15 08:15:01 INFO tool.CodeGenTool: Beginning code generation
13/09/15 08:15:01 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2/bin/..
13/09/15 08:15:02 INFO orm.CompilationManager: Found hadoop core jar at: /home/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
13/09/15 08:15:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a71936fd2bb45ea6757df22751a320e3/test.jar
13/09/15 08:15:03 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/09/15 08:15:03 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/09/15 08:15:03 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/09/15 08:15:03 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/09/15 08:15:03 INFO mapreduce.ImportJobBase: Beginning import of test
13/09/15 08:15:04 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:05 INFO mapred.JobClient: Running job: job_201309150505_0009
13/09/15 08:15:06 INFO mapred.JobClient: map 0% reduce 0%
13/09/15 08:15:34 INFO mapred.JobClient: map 100% reduce 0%
13/09/15 08:15:36 INFO mapred.JobClient: Job complete: job_201309150505_0009
13/09/15 08:15:36 INFO mapred.JobClient: Counters: 5
13/09/15 08:15:36 INFO mapred.JobClient: Job Counters
13/09/15 08:15:36 INFO mapred.JobClient: Launched map tasks=1
13/09/15 08:15:36 INFO mapred.JobClient: FileSystemCounters
13/09/15 08:15:36 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=583323
13/09/15 08:15:36 INFO mapred.JobClient: Map-Reduce Framework
13/09/15 08:15:36 INFO mapred.JobClient: Map input records=65536
13/09/15 08:15:36 INFO mapred.JobClient: Spilled Records=0
13/09/15 08:15:36 INFO mapred.JobClient: Map output records=65536
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Transferred 569.6514 KB in 32.0312 seconds (17.7842 KB/sec)
13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.
13/09/15 08:15:36 INFO hive.HiveImport: Removing temporary files from import process: test/_logs
13/09/15 08:15:36 INFO hive.HiveImport: Loading uploaded data into Hive
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:36 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
13/09/15 08:15:41 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hadoop/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
13/09/15 08:15:41 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201309150815_1877092059.txt
13/09/15 08:16:10 INFO hive.HiveImport: OK
13/09/15 08:16:10 INFO hive.HiveImport: Time taken: 28.791 seconds
13/09/15 08:16:11 INFO hive.HiveImport: Loading data to table default.test
13/09/15 08:16:12 INFO hive.HiveImport: Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 583323, raw_data_size: 0]
13/09/15 08:16:12 INFO hive.HiveImport: OK
13/09/15 08:16:12 INFO hive.HiveImport: Time taken: 1.704 seconds
13/09/15 08:16:12 INFO hive.HiveImport: Hive import complete.
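The import log is long, and the figure that usually matters is the record count reported near the end of the MapReduce phase. The sketch below pulls that number out of a captured log so it can be compared with a row count taken on the MySQL side. `retrieved_records` is a hypothetical helper name, and the pattern assumes the message has the form "Retrieved N records." as in this run.

```shell
#!/bin/sh
# Hypothetical helper: read a Sqoop import log on stdin and print the number
# from the "Retrieved N records." line. Prints nothing if no such line exists.
retrieved_records() {
    sed -n 's/.*Retrieved \([0-9][0-9]*\) records\..*/\1/p'
}

# Example using the corresponding line from the run above.
echo '13/09/15 08:15:36 INFO mapreduce.ImportJobBase: Retrieved 65536 records.' \
    | retrieved_records
```

To use it on a live run, the Sqoop output has to be captured first, for example with `./sqoop import ... 2>&1 | tee import.log`; the `2>&1` is there on the assumption that the log messages may be written to standard error, depending on the logging configuration.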
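III. Sqoop Commands
Sqoop has about 13 commands and several groups of generic arguments that all of these commands support; in addition, each command has arguments of its own.
The generic arguments are grouped into Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments. A few commonly used commands are described below.
1. Common arguments
The common arguments are mainly the parameters for connecting to the relational database.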
1) List all databases on the MySQL server
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456
2) Connect to MySQL and list the tables in the test database
sqoop list-tables --connect jdbc:mysql://localhost:3306/test --username root --password 123456
In this command, test is the name of the MySQL database, and --username and --password give the MySQL user name and password.
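Both commands differ only in the connect string, which follows the pattern jdbc:mysql://host:port/database, with the database part left empty when listing databases. A small sketch of building that string is shown here; `jdbc_url` is a hypothetical helper name, and the default port 3306 is the one used throughout this article.

```shell
#!/bin/sh
# Hypothetical helper: build a MySQL JDBC connect string.
# Arguments: HOST [DATABASE] [PORT]; the database may be omitted and the
# port defaults to 3306.
jdbc_url() {
    host=$1
    db=${2:-}
    port=${3:-3306}
    echo "jdbc:mysql://$host:$port/$db"
}

jdbc_url localhost        # connect string for list-databases
jdbc_url localhost test   # connect string for list-tables on database test
```

The result can be substituted directly, for example `sqoop list-tables --connect "$(jdbc_url localhost test)" --username root -P`.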
3) Copy the structure of a relational table into Hive. Only the table definition is copied; the rows are not.
sqoop create-hive-table --connect jdbc:mysql://localhost:3306/test --table sqoop_test --username root --password 123456 --hive-table test
Here --table sqoop_test names the table in the MySQL database test, and --hive-table test is the name of the new table created in Hive.
4) Import a table from the relational database into Hive
sqoop import --connect jdbc:mysql://localhost:3306/zxtest --username root --password 123456 --table sqoop_test --hive-import --hive-table s_test -m 1
5) Export the data of a Hive table into MySQL. The MySQL table hive_test must already exist before the export is run.
sqoop export --connect jdbc:mysql://localhost:3306/zxtest --username root --password root --table hive_test --export-dir /user/hive/warehouse/new_test_partition/dt=2012-03-05
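The --export-dir value in this command is the HDFS directory of a single Hive partition: the warehouse root, the table name, and a column=value component. The sketch below builds such a path. `hive_partition_dir` is a hypothetical helper name, and it assumes the warehouse root /user/hive/warehouse and a table with a single partition column, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build the HDFS directory of one partition of a Hive
# table. Assumes the warehouse root /user/hive/warehouse and a single
# partition column.
# Arguments: TABLE PARTITION_COLUMN PARTITION_VALUE
hive_partition_dir() {
    table=$1
    column=$2
    value=$3
    echo "/user/hive/warehouse/$table/$column=$value"
}

hive_partition_dir new_test_partition dt 2012-03-05
```

Before starting an export, `hadoop fs -ls "$(hive_partition_dir new_test_partition dt 2012-03-05)"` is one way to confirm that the directory exists and contains data files.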
6) Import the data of a database table into files on HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test
7) Incrementally import table data from the database into HDFS
./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression --username=hadoop --password=123456 --table HADOOP_USER_INFO -m 1 --target-dir /user/test --check-column ID --incremental append --last-value 3
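With --incremental append, only rows whose check column is greater than --last-value are imported, so the value has to be carried over from one run to the next. The sketch below keeps it in a local state file and prints the next command rather than executing it. The state file location and the function names are hypothetical, the starting value 0 is an assumption about the ID column, and `-P` is used in place of the password shown in the command above.

```shell
#!/bin/sh
# Sketch: remember the last imported value of the check column between runs.
# STATE_FILE is a hypothetical location chosen for this example.
STATE_FILE=${STATE_FILE:-./sqoop_last_value.txt}

# Print the stored value, or 0 on the first run
# (assumption: all values of the ID column are greater than 0).
read_last_value() {
    if [ -f "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo 0
    fi
}

# Store the highest ID that has been imported so far.
save_last_value() {
    echo "$1" > "$STATE_FILE"
}

# Print the next incremental import command; nothing is executed here.
incremental_import_cmd() {
    echo "./sqoop import --connect jdbc:mysql://10.28.168.109:3306/compression" \
         "--username hadoop -P --table HADOOP_USER_INFO -m 1" \
         "--target-dir /user/test --check-column ID" \
         "--incremental append --last-value $(read_last_value)"
}
```

After a successful import, `save_last_value` would be called with the highest ID now present in HDFS, so that the following call to `incremental_import_cmd` starts from there.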