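- 1 Introduction to DataX and Installation
- 1.1 Introduction to DataX
- 1.2 Supported Data Sources
- 1.3 How It Works
- 1.4 Installing DataX
- 2 Using DataX
- 2.1 streamTostream
- 2.1.1 Create the configuration file (JSON format)
- 2.1.2 Start DataX
- 2.1.3 Execution result
- 2.2 mysqlTohbase
- 2.2.1 Create the configuration file (JSON format)
- 2.2.2 Run DataX
- 2.2.3 Execution result
- 2.3 mysqlTohdfs
- 2.3.1 Create the configuration file
- 2.4 hbaseTomysql
- 2.4.1 Create the configuration file (same as above)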
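1 Introduction to DataX and Installation

1.1 Introduction to DataX

DataX is an offline synchronization tool for heterogeneous data sources, open-sourced by Alibaba. It aims to provide stable, efficient data synchronization between heterogeneous sources such as relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP.

To solve the heterogeneous-source synchronization problem, DataX turns a complex mesh of point-to-point synchronization links into a star-shaped topology, with DataX in the middle as the transport layer that connects every data source. When a new data source has to be supported, it only needs to be connected to DataX to synchronize seamlessly with all the sources that are already connected.

1.2 Supported Data Sources

1.3 How It Works

1.4 Installing DataX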
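Download: http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
Source code: https://github.com/alibaba/DataX
DataX does not depend on any other services: upload the archive, extract it, and configure the environment variables.
The archive can also be extracted directly on Windows.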
    # Extract to /usr/local/soft/
    tar -zxvf datax.tar.gz -C /usr/local/soft/
    # Configure the environment variables
    vim /etc/profile
    # Reload the configuration file
    source /etc/profile

2 Using DataX

2.1 streamTostream
2.1.1 Create the configuration file (JSON format)

This job reads records from a stream and prints them to the console.
stream2stream.json:

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "streamreader",
              "parameter": {
                "sliceRecordCount": 10,
                "column": [
                  { "type": "long", "value": "10" },
                  { "type": "string", "value": "hello,你好,世界-DataX" }
                ]
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": { "encoding": "UTF-8", "print": true }
            }
          }
        ],
        "setting": { "speed": { "channel": 5 } }
      }
    }

2.1.2 Start DataX
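The command in this section assumes that datax.py can be found through PATH after the environment variables have been configured. Where that is not the case, the launcher can be called by its full path. The sketch below only builds that command line. `DATAX_HOME` and its fallback of `/usr/local/soft/datax` are assumptions based on the extraction directory used during installation, and `bin/datax.py` is where the launcher is expected to sit inside the extracted archive.

```python
import os


def datax_command(job_file, datax_home=None, python_exe="python"):
    """Build the argument list for launching a DataX job.

    datax_home falls back to the DATAX_HOME environment variable and then to
    /usr/local/soft/datax. Both are assumptions for this sketch, not
    something DataX itself requires.
    """
    home = datax_home or os.environ.get("DATAX_HOME", "/usr/local/soft/datax")
    launcher = os.path.join(home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


if __name__ == "__main__":
    cmd = datax_command("stream2stream.json")
    print(" ".join(cmd))
    # To actually run the job: subprocess.run(cmd, check=True)
```

`python_exe` is left configurable because the interpreter has to be one that the bundled launcher script supports.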
    datax.py stream2stream.json

2.1.3 Execution result
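The totals reported in this section's log can be cross-checked against stream2stream.json. The sketch below assumes, as the 50 records in the log suggest, that each of the 5 channels emits `sliceRecordCount` records. The 10 s duration is copied from the log rather than computed.

```python
# Values copied from stream2stream.json
slice_record_count = 10   # reader.parameter.sliceRecordCount
channels = 5              # setting.speed.channel

# Value copied from the job log (total elapsed time, in seconds)
elapsed_seconds = 10

total_records = slice_record_count * channels
records_per_second = total_records / elapsed_seconds

print(total_records)        # 50
print(records_per_second)   # 5.0
```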
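    2021-12-07 20:08:30.673 [job-0] INFO JobContainer -
    任务启动时刻 : 2021-12-07 20:08:20  (job start time)
    任务结束时刻 : 2021-12-07 20:08:30  (job end time)
    任务总计耗时 : 10s  (total elapsed time)
    任务平均流量 : 95B/s  (average throughput)
    记录写入速度 : 5rec/s  (record write speed)
    读出记录总数 : 50  (total records read)
    读写失败总数 : 0  (total read/write failures)

2.2 mysqlTohbase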
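2.2.1 Create the configuration file (JSON format)

A student database containing a student table must already exist in MySQL.
A datax_test table must already exist in HBase; the table name has to match the "table" value of the writer in the job file.
For a different database or table, change the corresponding parameters accordingly.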
mysqlTohbase.json:

    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "mysqlreader",
              "parameter": {
                "username": "root",
                "password": "123456",
                "column": [ "id", "name", "age", "gender", "clazz", "last_mod" ],
                "splitPk": "id",
                "connection": [
                  {
                    "table": [ "student" ],
                    "jdbcUrl": [
                      "jdbc:mysql://master:3306/student?useSSL=false&characterEncoding=utf8"
                    ]
                  }
                ]
              }
            },
            "writer": {
              "name": "hbase11xwriter",
              "parameter": {
                "hbaseConfig": {
                  "hbase.zookeeper.quorum": "master:2181,node1:2181,node2:2181"
                },
                "table": "data_test",
                "mode": "normal",
                "rowkeyColumn": [
                  { "index": 0, "type": "string" }
                ],
                "column": [
                  { "index": 1, "name": "cf1:name", "type": "string" },
                  { "index": 2, "name": "cf1:age", "type": "string" },
                  { "index": 3, "name": "cf1:gender", "type": "string" },
                  { "index": 4, "name": "cf1:clazz", "type": "string" }
                ],
                "versionColumn": { "index": 5 },
                "encoding": "utf-8"
              }
            }
          }
        ],
        "setting": { "speed": { "channel": 5 } }
      }
    }

2.2.2 Run DataX
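Before launching the job, it may be worth checking that every `index` used by the HBase writer points at an existing reader column, since the writer addresses reader columns by zero-based position. The helper below is a hypothetical pre-flight check, not part of DataX. The column list and indexes are copied from mysqlTohbase.json, and treating an index of -1 as a constant that needs no reader column is an assumption about the hbase11xwriter conventions.

```python
def out_of_range_indexes(reader_columns, writer_parameter):
    """Return the writer indexes that do not refer to a reader column."""
    used = [c["index"] for c in writer_parameter.get("rowkeyColumn", [])]
    used += [c["index"] for c in writer_parameter.get("column", [])]
    version = writer_parameter.get("versionColumn")
    if version is not None:
        used.append(version["index"])
    # Assumption: index -1 marks a constant value, so it is skipped here.
    return [i for i in used if i != -1 and not 0 <= i < len(reader_columns)]


reader_columns = ["id", "name", "age", "gender", "clazz", "last_mod"]
writer_parameter = {
    "rowkeyColumn": [{"index": 0, "type": "string"}],
    "column": [
        {"index": 1, "name": "cf1:name", "type": "string"},
        {"index": 2, "name": "cf1:age", "type": "string"},
        {"index": 3, "name": "cf1:gender", "type": "string"},
        {"index": 4, "name": "cf1:clazz", "type": "string"},
    ],
    "versionColumn": {"index": 5},
}

bad = out_of_range_indexes(reader_columns, writer_parameter)
print(bad)  # an empty list means every index refers to a reader column
```

With the six reader columns above, index 0 (id) becomes the row key, indexes 1 to 4 fill the cf1 columns, and index 5 (last_mod) supplies the cell version.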
    datax.py mysqlTohbase.json

2.2.3 Execution result
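    2021-12-07 20:51:14.214 [job-0] INFO JobContainer -
    任务启动时刻 : 2021-12-07 20:51:03  (job start time)
    任务结束时刻 : 2021-12-07 20:51:14  (job end time)
    任务总计耗时 : 11s  (total elapsed time)
    任务平均流量 : 4.30KB/s  (average throughput)
    记录写入速度 : 100rec/s  (record write speed)
    读出记录总数 : 1000  (total records read)
    读写失败总数 : 0  (total read/write failures)

2.3 mysqlTohdfs

2.3.1 Create the configuration file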
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "mysqlreader",
              "parameter": {
                "username": "root",
                "password": "123456",
                "column": [ "id", "name", "age", "gender", "clazz", "last_mod" ],
                "splitPk": "age",
                "connection": [
                  {
                    "table": [ "student" ],
                    "jdbcUrl": [ "jdbc:mysql://master:3306/student" ]
                  }
                ]
              }
            },
            "writer": {
              "name": "hdfswriter",
              "parameter": {
                "defaultFS": "hdfs://master:9000",
                "fileType": "text",
                "path": "/user/hive/warehouse/datax.db/students",
                "fileName": "student",
                "column": [
                  { "name": "id", "type": "bigint" },
                  { "name": "name", "type": "string" },
                  { "name": "age", "type": "INT" },
                  { "name": "gender", "type": "string" },
                  { "name": "clazz", "type": "string" },
                  { "name": "last_mod", "type": "string" }
                ],
                "writeMode": "append",
                "fieldDelimiter": ","
              }
            }
          }
        ],
        "setting": { "speed": { "channel": 5 } }
      }
    }

2.4 hbaseTomysql

2.4.1 Create the configuration file (same as above)
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "hbase11xreader",
              "parameter": {
                "hbaseConfig": { "hbase.zookeeper.quorum": "master:2181" },
                "table": "student",
                "encoding": "utf-8",
                "mode": "normal",
                "column": [
                  { "name": "rowkey", "type": "string" },
                  { "name": "cf1:name", "type": "string" },
                  { "name": "cf1:age", "type": "string" },
                  { "name": "cf1:gender", "type": "string" },
                  { "name": "cf1:clazz", "type": "string" }
                ],
                "range": {
                  "startRowkey": "",
                  "endRowkey": "",
                  "isBinaryRowkey": false
                }
              }
            },
            "writer": {
              "name": "mysqlwriter",
              "parameter": {
                "writeMode": "insert",
                "username": "root",
                "password": "123456",
                "column": [ "id", "name", "age", "gender", "clazz" ],
                "preSql": [ "truncate student2" ],
                "connection": [
                  {
                    "jdbcUrl": "jdbc:mysql://master:3306/student2?useUnicode=true&characterEncoding=utf8",
                    "table": [ "student2" ]
                  }
                ]
              }
            }
          }
        ],
        "setting": { "speed": { "channel": 5 } }
      }
    }
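All four job files share the same skeleton: one reader and one writer under job.content, plus a channel count under job.setting.speed. As a rough sketch, that skeleton can be generated instead of copied by hand. `build_job` is a hypothetical helper, not a DataX API, and the parameters passed to it here are taken from the stream example in 2.1.1.

```python
import json


def build_job(reader_name, reader_param, writer_name, writer_param, channel=5):
    """Assemble a DataX job dictionary with a single reader/writer pair."""
    return {
        "job": {
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_param},
                    "writer": {"name": writer_name, "parameter": writer_param},
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


job = build_job(
    "streamreader",
    {
        "sliceRecordCount": 10,
        "column": [
            {"type": "long", "value": "10"},
            {"type": "string", "value": "hello,你好,世界-DataX"},
        ],
    },
    "streamwriter",
    {"encoding": "UTF-8", "print": True},
)

# ensure_ascii=False keeps the Chinese characters readable in the output
text = json.dumps(job, indent=2, ensure_ascii=False)
print(text)
```

Writing `text` to a file such as stream2stream.json should give a job file equivalent to the one in 2.1.1. Serializing through `json.dumps` also rules out syntax slips such as a stray trailing comma.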