Gbase 8a 数据加载工具有什么优点?

Gbase 8a 数据加载工具有什么优点?,第1张

加载工具具备如下一些特性和优点:

1)与集群高度集成,方便部署;

2)支持 SQL 及外部工具的加载方式;面向用户的 SQL 接口方式使集群和单机加载方式统一,更符合用户的使用习惯;

3)支持单表多数据源并行加载,支持多加载机对单表的并行加载,最大化加载性能;

4)支持从通用数据服务器拉取数据,支持 ftp/http/hdfs/RESTFUL/Kafka 等多种文件传输协议;

5)支持普通文本、gzip 压缩、snappy 压缩等多种格式数据文件;

6)支持普通文本、定长文本、宽松模式的加载;

7)支持错误数据溯源功能,可以准确定位错误数据在源文件中的位置;

8)加载性能可以随着集群规模的扩展而持续提升。

前提参数设置

// 默认值是3,这个是session级别的,无需恢复

gbase>set gcluster_log_level=7

Query OK, 0 rows affected (Elapsed: 00:00:00.00)

// 默认值是0,建议先show一下原始值,排查完了记得改回去。

gbase>set global gbase_sql_trace_level=15

Query OK, 0 rows affected (Elapsed: 00:00:00.12)

// 默认值是0,这个是session级别,无需恢复

gbase>set gbase_sql_trace=1

Query OK, 0 rows affected (Elapsed: 00:00:00.01)

执行语句

gbase>load data infile ‘ftp://gbase:gbase1234@192.168.163.100/t1.txt’ into table testdb.t1

Query OK, 4 rows affected (Elapsed: 00:00:00.35)

Task 394392 finished, Loaded 4 records, Skipped 0 records

查看日志

/opt/gbase/gcluster/log/gcluster/express.log

# SQL处于check permission状态

# 获得表的锁

# 如果长时间无法拿到锁,通过gcadmin showlock 看看对应的表的锁在哪个节点哪个ID持有

2019-07-03 15:02:41.352 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA, lwp:35693

2019-07-03 15:02:41.352 [LOCK][INFO ][S:2108][Q:13221]:acquired READ lock: testdb.t1, lwp:35693

2019-07-03 15:02:41.353 [LOCK][INFO ][S:2108][Q:13221]:acquired READ lock: testdb.t1.rsync, lwp:35693

# 读取数据源ftp文件信息

# 检查数据库和FTP之间的性能问题,包括网络和FTP服务器的设置

2019-07-03 15:02:41.354 [LOAD][INFO ][S:2108][Q:13221]:Start cluster load

2019-07-03 15:02:41.358 == Information : Trying 192.168.163.100...

2019-07-03 15:02:41.362 == Information : Connected to 192.168.163.100 (192.168.163.100) port 21 (#0)

2019-07-03 15:02:41.370 <= Recv header : 220 (vsFTPd 3.0.2)

2019-07-03 15:02:41.371 =>Send header : USER gbase

2019-07-03 15:02:41.371 <= Recv header : 331 Please specify the password.

2019-07-03 15:02:41.371 =>Send header : PASS gbase1234

2019-07-03 15:02:41.419 <= Recv header : 230 Login successful.

2019-07-03 15:02:41.419 =>Send header : PWD

2019-07-03 15:02:41.419 <= Recv header : 257 "/home/gbase"

2019-07-03 15:02:41.419 == Information : Entry path is '/home/gbase'

2019-07-03 15:02:41.419 =>Send header : TYPE I

2019-07-03 15:02:41.419 == Information : ftp_perform ends with SECONDARY: 0

2019-07-03 15:02:41.420 <= Recv header : 200 Switching to Binary mode.

2019-07-03 15:02:41.420 =>Send header : SIZE t1.txt

2019-07-03 15:02:41.420 <= Recv header : 213 8

2019-07-03 15:02:41.420 =>Send header : REST 0

2019-07-03 15:02:41.420 <= Recv header : 350 Restart position accepted (0).

2019-07-03 15:02:41.421 == Information : Remembering we are in dir ""

2019-07-03 15:02:41.421 == Information : Connection #0 to host 192.168.163.100 left intact

2019-07-03 15:02:41.422 == Information : Trying 192.168.163.100...

2019-07-03 15:02:41.422 == Information : Connected to 192.168.163.100 (192.168.163.100) port 21 (#0)

2019-07-03 15:02:41.424 <= Recv header : 220 (vsFTPd 3.0.2)

2019-07-03 15:02:41.424 =>Send header : USER gbase

2019-07-03 15:02:41.425 <= Recv header : 331 Please specify the password.

2019-07-03 15:02:41.425 =>Send header : PASS gbase1234

2019-07-03 15:02:41.465 <= Recv header : 230 Login successful.

2019-07-03 15:02:41.465 =>Send header : PWD

2019-07-03 15:02:41.465 <= Recv header : 257 "/home/gbase"

2019-07-03 15:02:41.465 == Information : Entry path is '/home/gbase'

2019-07-03 15:02:41.465 =>Send header : TYPE I

2019-07-03 15:02:41.465 == Information : ftp_perform ends with SECONDARY: 0

2019-07-03 15:02:41.465 <= Recv header : 200 Switching to Binary mode.

2019-07-03 15:02:41.465 =>Send header : SIZE t1_2.txt

2019-07-03 15:02:41.466 <= Recv header : 213 8

2019-07-03 15:02:41.466 =>Send header : REST 0

2019-07-03 15:02:41.466 <= Recv header : 350 Restart position accepted (0).

2019-07-03 15:02:41.466 == Information : Remembering we are in dir ""

2019-07-03 15:02:41.466 == Information : Connection #0 to host 192.168.163.100 left intact

#准备分派任务

2019-07-03 15:02:41.467 [LOAD][INFO ][S:2108][Q:13221]:AddNodeLoadFileNum host[::ffff:192.168.163.101]add num [1].

2019-07-03 15:02:41.474 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1

2019-07-03 15:02:41.474 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t16ef6f8a9-87f0-4d6c-8043-899367d02df3, lwp:35693

2019-07-03 15:02:41.475 [LOAD][INFO ][S:2108][Q:13221]:AddNodeLoadFileNum cur node state:[ip:::ffff:192.168.163.100:num:0][ip:::ffff:192.168.163.101:num:1].

# 此时SQL已经在loading状态,

# 开始下发任务。一般是某个节点性能导致,或者数据非常倾斜。

# a) 可以测试从每个数据库的数据节点,用ftp直接获取数据文件的性能

# 比如 curl ftp://XXXXX/XX.csv >1.txt

# 多跑几次,看看哪个节点比较慢

# b) 排查机器的内存是否出现SWAP, *** 作系统/var/log/messages是否有磁盘报错

# c) 查看CPUINFO是否有异常降频

# d) 用top、iotop、nmon等工具观察一段时间系统资源,是否有资源紧张

#

2019-07-03 15:02:41.476 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set scn = 395447.

2019-07-03 15:02:41.521 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:LOAD DATA INFILE 'ftp://gbase:*********@192.168.163.100/t1.txt#offset=0&length=8&firstblock&ffsize=8,ftp://gbase:*********@192.168.163.100/t1_2.txt#offset=0&length=8&firstblock&ffsize=8' INTO TABLE `testdb`.`t1_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED HOST '::ffff:192.168.163.101' CURRENT_TIMESTAMP 1562137361 SCN_NUMBER 395447 GCLUSTER_PORT 5258 INTO SERVER (HOST '::ffff:192.168.163.100, ::ffff:192.168.163.101', PORT '5050', USER 'root', DATABASE 'testdb', TABLE 't1_n1, t1_n2', COMMENT 'table_host 0 1 0 # 1 0 1, scn 395447, group -1').

2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum host[::ffff:192.168.163.100] reduce num [0].

2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum host[::ffff:192.168.163.101] reduce num [1].

2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:ReduceNodeLoadFileNum cur node state:[ip:::ffff:192.168.163.100:num:0][ip:::ffff:192.168.163.101:num:0].

2019-07-03 15:02:42.004 [LOAD][INFO ][S:2108][Q:13221]:Task 395447 finished, Loaded 8 records, Skipped 0 records

# 拿到排它锁,要切换版本了。

2019-07-03 15:02:42.008 [LOCK][INFO ][S:2108][Q:13221]:acquired WRITE lock: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F, lwp:35693

# 设置SCN

2019-07-03 15:02:42.008 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:SET SELF SCN = 395447.

2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:SET SELF SCN = 395447.

2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:SET SELF SCN = 395447.

2019-07-03 15:02:42.009 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:SET SELF SCN = 395447.

# 以指定的SCN来刷新数据

2019-07-03 15:02:42.011 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:flush commit "testdb"."t1_n1" scn_number 395447.

2019-07-03 15:02:42.011 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:flush commit "testdb"."t1_n2" scn_number 395447.

2019-07-03 15:02:42.019 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:flush commit "testdb"."t1_n2" scn_number 395447.

2019-07-03 15:02:42.055 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:flush commit "testdb"."t1_n1" scn_number 395447.

# 验证SCN

2019-07-03 15:02:42.060 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n1'.

2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n1'.

2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n2'.

2019-07-03 15:02:42.061 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:select scn from information_schema.tables where table_schema = 'testdb' and table_name = 't1_n2'.

2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set autocommit=1.

2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:set autocommit=1.

2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.100, SQL:set autocommit=1.

2019-07-03 15:02:42.063 [SQLDISP][INFO ][S:2108][Q:13221]:Target:::ffff:192.168.163.101, SQL:set autocommit=1.

# SQL处于commit阶段 或者 rollback阶段

# 提交commit成功

2019-07-03 15:02:42.067 [COMMIT][INFO ][S:2108][Q:13221]:commit table(testdb.t1) success for scn:395447.

2019-07-03 15:02:42.068 [LOAD][INFO ][S:2108][Q:13221]:successful to cluster load

# 释放锁

2019-07-03 15:02:42.068 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb

2019-07-03 15:02:42.069 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F

2019-07-03 15:02:42.070 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1.rsync

2019-07-03 15:02:42.071 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA

2019-07-03 15:02:42.072 [LOCK][INFO ][S:2108][Q:13221]:unlocked: testdb.t16ef6f8a9-87f0-4d6c-8043-899367d02df3


欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/sjk/10010527.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2023-05-04
下一篇 2023-05-04

发表评论

登录后才能评论

评论列表(0条)

保存