Sqoop1 architecture:
Sqoop2 architecture:
Functional differences between Sqoop1 and Sqoop2:
1. Prepare test data in MySQL

create database sqoop2 default character set utf8;
CREATE USER 'sqoop2'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON sqoop2.* TO 'sqoop2'@'%';
FLUSH PRIVILEGES;
use sqoop2;
create table widgets (
  id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
  widget_name VARCHAR(64) NOT NULL,
  price DECIMAL(10,2),
  design_date DATE,
  version INT,
  design_comment VARCHAR(100)
);
insert into widgets values (null, 'sprocket', 0.25, '2010-02-10', 1, 'connects two gizmos');
insert into widgets values (null, 'gizmo', 4.00, '2009-11-30', 4, null);
insert into widgets values (null, 'gadget', 99.99, '1983-08-13', 13, 'our flagship product');
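If you want to poke at this test data without a MySQL server, the sketch below rebuilds the widgets table with Python's sqlite3 as a stand-in for MySQL (types adjusted: SQLite has no AUTO_INCREMENT keyword, so INTEGER PRIMARY KEY is used instead). This is only to show the three rows Sqoop will later export.

```python
import sqlite3

# sqlite3 as a lightweight stand-in for MySQL; not the database the article uses.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE widgets (
        id INTEGER PRIMARY KEY,
        widget_name VARCHAR(64) NOT NULL,
        price DECIMAL(10,2),
        design_date DATE,
        version INT,
        design_comment VARCHAR(100)
    )""")
rows = [
    (None, 'sprocket', 0.25, '2010-02-10', 1, 'connects two gizmos'),
    (None, 'gizmo', 4.00, '2009-11-30', 4, None),   # NULL comment, as in the article
    (None, 'gadget', 99.99, '1983-08-13', 13, 'our flagship product'),
]
conn.executemany("INSERT INTO widgets VALUES (?,?,?,?,?,?)", rows)
print(conn.execute("SELECT COUNT(*) FROM widgets").fetchone()[0])  # 3
```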
2. Create the MySQL and HDFS links
- Point the Sqoop2 client at the server
[root@ip-186-31-16-68 ~]# sqoop2
Sqoop home directory: /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/sqoop2
Feb 28, 2018 9:47:31 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Sqoop Shell: Type 'help' or 'h' for help.
sqoop:000> set server --host ip-186-31-16-68.ap-southeast-1.compute.internal --port 12000
Server is set successfully
- List the available connectors
sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                | Supported Directions |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| 1  | kafka-connector        | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 2  | kite-connector         | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 3  | hdfs-connector         | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.5-cdh5.13.1 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
- Create the MySQL link
sqoop:000> create link -cid 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql link
link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://ip-186-31-16-68.ap-southeast-1.compute.internal/sqoop2
Username: sqoop2
Password: ********
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2
- Create the HDFS link
sqoop:000> create link --cid 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs link
link configuration
HDFS URI: hdfs://ip-186-31-16-68.ap-southeast-1.compute.internal:8020/
New link was successfully created with validation status OK and persistent id 1
3. Create the Sqoop2 job
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: sqoop2
From database configuration
Schema name: sqoop2
Table name: widgets
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 1
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Output directory:
Throttling resources
Extractors:
Loaders:
There are issues with entered data, please revise your input:
Name: sqoop2
From database configuration
Schema name: sqoop2
Table name: widgets
Table SQL statement:
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
ToJob configuration
Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Error message: Can't be null nor empty
Output directory: hdfs://ip-186-31-16-68.ap-southeast-1.compute.internal:8020/fayson
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 1
sqoop:000> show job
+----+--------+----------------+--------------+---------+
| Id | Name   | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 1  | sqoop2 | 4              | 3            | true    |
+----+--------+----------------+--------------+---------+
4. Start the Sqoop2 job
- Start the Sqoop2 job directly from the shell
sqoop:000> start job -j 1 -s
Submission details
Job ID: 1
Server URL: http://ip-186-31-16-68.ap-southeast-1.compute.internal:12000/sqoop/
Created by: fayson
Creation date: 2018-02-28 22:33:00 EST
Lastly updated by: fayson
External ID: job_1519874824956_0002
        http://ip-186-31-16-68.ap-southeast-1.compute.internal:8088/proxy/application_1519874824956_0002/
2018-02-28 22:33:00 EST: BOOTING  - Progress is not available
2018-02-28 22:33:11 EST: RUNNING  - 0.00 %
2018-02-28 22:33:21 EST: SUCCEEDED
Counters:
  org.apache.hadoop.mapreduce.JobCounter
    SLOTS_MILLIS_MAPS: 8800
    MB_MILLIS_MAPS: 9011200
    TOTAL_LAUNCHED_MAPS: 2
    MILLIS_MAPS: 8800
    VCORES_MILLIS_MAPS: 8800
    SLOTS_MILLIS_REDUCES: 0
    OTHER_LOCAL_MAPS: 2
  org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
    BYTES_READ: 0
  org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
    BYTES_WRITTEN: 0
  org.apache.hadoop.mapreduce.TaskCounter
    MAP_INPUT_RECORDS: 0
    MERGED_MAP_OUTPUTS: 0
    PHYSICAL_MEMORY_BYTES: 406069248
    SPILLED_RECORDS: 0
    FAILED_SHUFFLE: 0
    CPU_MILLISECONDS: 1890
    COMMITTED_HEAP_BYTES: 619708416
    VIRTUAL_MEMORY_BYTES: 3162779648
    MAP_OUTPUT_RECORDS: 3
    SPLIT_RAW_BYTES: 219
    GC_TIME_MILLIS: 75
  org.apache.hadoop.mapreduce.FileSystemCounter
    FILE_READ_OPS: 0
    FILE_WRITE_OPS: 0
    FILE_BYTES_READ: 0
    FILE_LARGE_READ_OPS: 0
    HDFS_BYTES_READ: 219
    FILE_BYTES_WRITTEN: 336174
    HDFS_LARGE_READ_OPS: 0
    HDFS_BYTES_WRITTEN: 146
    HDFS_READ_OPS: 2
    HDFS_WRITE_OPS: 2
  org.apache.sqoop.submission.counter.SqoopCounters
    ROWS_WRITTEN: 3
    ROWS_READ: 3
Job executed successfully
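A quick sanity check on output like the above: the SqoopCounters section should report the same number of rows read from MySQL and written to HDFS. If you capture the shell output when scripting, a small hypothetical helper (the function is my own, not part of any Sqoop API) can scrape those two counters:

```python
import re

def rows_counters(output):
    """Pull ROWS_READ / ROWS_WRITTEN out of captured `start job -s` output.
    Hypothetical helper for scripting around the sqoop2 shell."""
    counters = dict(re.findall(r"(ROWS_READ|ROWS_WRITTEN):\s*(\d+)", output))
    return int(counters["ROWS_READ"]), int(counters["ROWS_WRITTEN"])

sample = """
org.apache.sqoop.submission.counter.SqoopCounters
    ROWS_WRITTEN: 3
    ROWS_READ: 3
Job executed successfully
"""
read, written = rows_counters(sample)
assert read == written == 3  # all three widgets rows made it across
```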
- View the job in the YARN web UI on port 8088
- View the files imported into HDFS
[root@ip-186-31-16-68 ~]# hadoop fs -ls /fayson
Found 4 items
-rw-r--r--   3 sqoop2 supergroup         55 2018-02-28 22:33 /fayson/9c013cbb-3f05-49cb-a3c1-fe70ac47241c.txt
-rw-r--r--   3 sqoop2 supergroup         91 2018-02-28 22:33 /fayson/f5d66a1b-4c7f-4dde-8549-7139e2dc7543.txt
drwxr-xr-x   - fayson supergroup          0 2018-02-22 21:14 /fayson/jars
-rw-r--r--   3 fayson supergroup        674 2018-02-04 09:35 /fayson/krb5.conf
[root@ip-186-31-16-68 ~]# hadoop fs -cat /fayson/9c013cbb-3f05-49cb-a3c1-fe70ac47241c.txt /fayson/f5d66a1b-4c7f-4dde-8549-7139e2dc7543.txt
1,'sprocket',0.25,'2010-02-10',1,'connects two gizmos'
2,'gizmo',4.00,'2009-11-30',4,NULL
3,'gadget',99.99,'1983-08-13',13,'our flagship product'
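Note the shape of the exported TEXT_FILE records above: fields are comma-separated, string values are wrapped in single quotes, and SQL NULL comes out as the bare token NULL. A minimal sketch of parsing such a record back into Python values (a naive split that assumes no commas inside quoted fields, which holds for this sample; not a general CSV parser):

```python
def parse_record(line):
    """Parse one Sqoop2 TEXT_FILE record: comma-separated fields,
    single-quoted strings, bare NULL for SQL NULL.
    Naive: assumes no commas inside quoted values."""
    values = []
    for field in line.strip().split(","):
        if field == "NULL":
            values.append(None)          # SQL NULL
        elif field.startswith("'") and field.endswith("'"):
            values.append(field[1:-1])   # strip the single quotes
        else:
            values.append(field)         # numeric-looking field, kept as text
    return values

print(parse_record("2,'gizmo',4.00,'2009-11-30',4,NULL"))
# ['2', 'gizmo', '4.00', '2009-11-30', '4', None]
```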