Master-slave replication UUID issue: found a zombie dump thread with the same UUID

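Symptoms:

Environment:

MySQL: 5.7.21. Architecture: one master, two slaves (M <--> M --> SLAVE).

Observations:

1. An alert fires once every 60 seconds.

2. The alerts all report the same UUID.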

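Root-cause analysis:

Possible causes:

Duplicate UUIDs

auto.cnf ID problems on the machines

Network problems

Replication being stopped during backups

The slave timeout, slave_net_timeout

Version or configuration problems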

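Troubleshooting approach:

1. Duplicate UUIDs: every UUID is unique.

2. auto.cnf on each machine: all unique.

3. Network problems: simulate with iptables.

4. Operations that stop replication during backup: the backup times do not match the alerts.

5. Slave timeout when the master has no updates: some DB groups with heavy write traffic still show the problem.

6. Slave restarts: the symptoms look very similar, but nobody restarts slaves in the early morning.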

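Test steps:

Prepare an environment and test each item from the troubleshooting approach above.

Environment:

Master: 101101031

Slave: 101101030 (the two replicate from each other)

Sysbench: used to insert test data. Replication setup is omitted.

Tests

Identical UUID or auto.cnf: replication is usually set up by copying the data, so a copied auto.cnf (and therefore an identical UUID) is possible. Checking each host one by one showed that they all differ.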
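The one-by-one check above can be automated once the UUIDs are collected. The following is a minimal sketch, assuming the values were gathered beforehand (for example with `SELECT @@server_uuid` on each host); the function name, host names, and UUID values are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def find_duplicate_uuids(host_uuids):
    """Return {uuid: [hosts]} for every UUID reported by more than one host."""
    hosts_by_uuid = defaultdict(list)
    for host, uuid in host_uuids.items():
        # Normalise so that case or stray whitespace does not hide a duplicate.
        hosts_by_uuid[uuid.strip().lower()].append(host)
    return {u: sorted(h) for u, h in hosts_by_uuid.items() if len(h) > 1}


# Hypothetical values for illustration only.
sample = {
    "db-a": "11111111-1111-1111-1111-111111111111",
    "db-b": "22222222-2222-2222-2222-222222222222",
    "db-c": "11111111-1111-1111-1111-111111111111",
}

if __name__ == "__main__":
    print(find_duplicate_uuids(sample))
```

An empty result corresponds to the outcome described above: no two hosts share a UUID.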

Test: restart the slave and watch the master log

Run the following on 101101030:

STOP SLAVE; START SLAVE;

The log shows:

Master log:

2020-08-05T17:48:46.033859+08:00 63510 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(8398).
2020-08-05T17:48:46.034067+08:00 63510 [Note] Start binlog_dump to master_thread_id(63510) slave_server(9013100), pos(, 4)

Explanation:

These entries match the production error closely. However, the production log shows them every 60 seconds, and nobody restarts the slaves that frequently in production.

Test: network

On 101101031, use iptables to block access from host 30, then check the logs.

iptables -I INPUT -s 101101030 -j DROP

Slave replication status:

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Reconnecting after a failed master event read
Master_Host: 101101031
Master_User: replic
Master_Port: 3070
Connect_Retry: 60
Master_Log_File: mysql-bin.000008
Read_Master_Log_Pos: 2659
Relay_Log_File: relay-bin.000010
Relay_Log_Pos: 2832
Relay_Master_Log_File: mysql-bin.000008
Slave_IO_Running: Connecting
Slave_SQL_Running: Yes

Log on host 30:

2020-08-05T20:08:04.931077+08:00 63498 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-08-05T20:08:34.869063+08:00 63510 [Note] Aborted connection 63510 to db: 'unconnected' user: 'replic' host: '101101030' (failed on flush_net())
2020-08-05T20:08:34.925537+08:00 63792 [Note] Start binlog_dump to master_thread_id(63792) slave_server(9013100), pos(, 4)

Explanation:

With the connection to the master blocked, the log contains the keyword flush_net(). That keyword never appears in the production logs, so network problems are ruled out.

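Operations that stop replication during backup (xtrabackup)

Backups are taken with Percona's xtrabackup, using the --slave-info option so that the slave's replication information stays consistent. The backup times do not line up with the alerts, so this is ruled out as well.

Dead end

At this point the usual suspects were exhausted, so the next step was to look at the replication parameters. The official documentation describes slave_net_timeout as the number of seconds the slave waits for data from the master before it gives up and reconnects to the master. That looked like a lead, so slave_net_timeout was tested next. Its default value is 60.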

Shorten the slave network timeout, with no data being written

Run the following on 101101030:

STOP SLAVE; SET GLOBAL slave_net_timeout=5; START SLAVE;

Test logs:

Log on 101101031:

2020-08-06T09:59:31.226156+08:00 65500 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(65499).
2020-08-06T09:59:31.226273+08:00 65500 [Note] Start binlog_dump to master_thread_id(65500) slave_server(901311100), pos(mysql-bin.000008, 828)
2020-08-06T09:59:32.237103+08:00 65501 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(65500).
2020-08-06T09:59:32.237253+08:00 65501 [Note] Start binlog_dump to master_thread_id(65501) slave_server(901311100), pos(mysql-bin.000008, 828)
2020-08-06T09:59:33.245805+08:00 65502 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(65501).
2020-08-06T09:59:33.245910+08:00 65502 [Note] Start binlog_dump to master_thread_id(65502) slave_server(901311100), pos(mysql-bin.000008, 828)

Log on 101101030:

2020-08-06T10:02:01.030159+08:00 18 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

Explanation:

The result matches expectations: when slave_net_timeout is too short, the slave reconnects frequently and produces the same log entries as the production problem. The next step is to insert data and see what happens.
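Comparing the reconnect cadence between the test and production logs is easier with the gaps computed rather than read by eye. The sketch below is one possible way to do that, assuming the error log uses the ISO 8601 timestamp format shown in this article; the function name and regular expression are illustrative and not part of any MySQL tooling.

```python
import re
from datetime import datetime

# Timestamp, thread id, then a [Note] line mentioning the zombie dump thread.
ZOMBIE_RE = re.compile(r"^(\S+)\s+\d+\s+\[Note\].*found a zombie dump thread")


def zombie_intervals(log_lines):
    """Return the gaps in seconds between consecutive zombie-dump-thread notes."""
    stamps = []
    for line in log_lines:
        match = ZOMBIE_RE.match(line)
        if match:
            stamps.append(datetime.fromisoformat(match.group(1)))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


# Lines abbreviated from the master log of this test.
sample_log = [
    "2020-08-06T09:59:31.226156+08:00 65500 [Note] While initializing dump thread "
    "for slave with UUID, found a zombie dump thread with the same UUID.",
    "2020-08-06T09:59:31.226273+08:00 65500 [Note] Start binlog_dump to "
    "master_thread_id(65500) slave_server(901311100), pos(mysql-bin.000008, 828)",
    "2020-08-06T09:59:32.237103+08:00 65501 [Note] While initializing dump thread "
    "for slave with UUID, found a zombie dump thread with the same UUID.",
    "2020-08-06T09:59:33.245805+08:00 65502 [Note] While initializing dump thread "
    "for slave with UUID, found a zombie dump thread with the same UUID.",
]

if __name__ == "__main__":
    print(zombie_intervals(sample_log))
```

Run against a production log, gaps clustering around 60 seconds would match the alert interval noted at the start.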

Shorten the slave network timeout, with data being inserted

Run the following on 101101030:

STOP SLAVE; SET GLOBAL slave_net_timeout=5; START SLAVE;

Insert data:

sysbench /root/sysbench-1.0.19/tests/include/oltp_legacy/oltp.lua --mysql-host=101101030 --mysql-user=root --mysql-password='xxx' --mysql-port=3070 --mysql-db=sbtest --oltp-table-size=10000000 --oltp-tables-count=1 --threads=1 --events=500000 --time=1200 --report-interval=1 prepare

Log:

Explanation:

No log entries appear any more. slave_net_timeout causes no trouble while the master is producing binlog events; the reconnects happen only when the master is idle.

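Another dead end

This confirms that slave_net_timeout is involved. But many production DB groups produce no binlog events and report no errors, while groups with heavy write traffic do show the problem. That was confusing, and the explanation so far would not convince management. Back to the official documentation, which turned up the following.

There is a MASTER_HEARTBEAT_PERIOD option. It sets the interval, in seconds, at which the master sends a heartbeat to the slave when there are no new binlog events, so the slave can tell that the master is still alive. The default is half of slave_net_timeout, that is, 30. Another lead, so testing continued.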
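The interaction between the two settings can be made concrete with a deliberately simplified model. The sketch below assumes that on an idle master the only traffic is one heartbeat every `heartbeat_period` seconds, and that each reconnect restarts both clocks; the function name is hypothetical and real timing may differ from this model.

```python
def idle_reconnects(slave_net_timeout, heartbeat_period, idle_seconds):
    """Estimate how often a slave reconnects while the master writes nothing.

    Simplified model (an assumption, not MySQL's exact behaviour): the slave
    reconnects whenever slave_net_timeout seconds pass without receiving
    anything, and a reconnect restarts the heartbeat clock as well.
    """
    if heartbeat_period < slave_net_timeout:
        # A heartbeat always arrives before the timeout can expire.
        return 0
    # No heartbeat ever arrives in time, so every timeout ends in a reconnect.
    return idle_seconds // slave_net_timeout


if __name__ == "__main__":
    # Defaults: timeout 60, heartbeat 30, one idle hour.
    print(idle_reconnects(60, 30, 3600))
    # Heartbeat longer than the timeout, one idle hour.
    print(idle_reconnects(60, 1800, 3600))
```

Under this model the default pairing never reconnects, while a heartbeat period longer than the timeout produces one reconnect per timeout interval.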

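MASTER_HEARTBEAT_PERIOD test

Querying MASTER_HEARTBEAT_PERIOD on the affected DB machines shows a value of 1800,

while all the healthy machines have 30. That is the catch.

mysql> select * from mysql.slave_master_info\G
*************************** 1. row ***************************
Number_of_lines: 25
Master_log_name: mysql-bin.000651
Master_log_pos: 131563718
Connect_retry: 60
Heartbeat: 1800

Run the following on the slave 101101030:

On the slave:

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> change master to master_heartbeat_period = 1800;
Query OK, 0 rows affected, 1 warning (0.11 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

Check the logs:

On the slave:

On the master:

The problem is reproduced.

The affected instances had indeed been upgraded from 5.5 to 5.7. Under the old setup slave_net_timeout was 3600 and the heartbeat was 1800. With slave_net_timeout now at 60 and the stored heartbeat still at 1800, an idle master sends no heartbeat within the timeout, so the slave reconnects every 60 seconds. The problem is now located.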
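The same comparison can be run across many replicas at once. The sketch below is an illustration under stated assumptions: the host names are hypothetical, the input rows are assumed to have been collected from `mysql.slave_master_info` and `slave_net_timeout` beforehand, and resetting the heartbeat to half the timeout is a suggested remedy that the original investigation does not spell out.

```python
def heartbeat_findings(replicas):
    """Return (host, suggested_period) for replicas whose heartbeat is not
    shorter than slave_net_timeout.

    replicas: iterable of (host, heartbeat_seconds, slave_net_timeout_seconds).
    The suggested period is half the timeout, mirroring the documented default.
    """
    findings = []
    for host, heartbeat, slave_net_timeout in replicas:
        if heartbeat >= slave_net_timeout:
            findings.append((host, slave_net_timeout / 2))
    return findings


def fix_statements(period):
    """Build the statements that would reset the heartbeat on one replica."""
    return [
        "STOP SLAVE;",
        f"CHANGE MASTER TO MASTER_HEARTBEAT_PERIOD = {period:g};",
        "START SLAVE;",
    ]


if __name__ == "__main__":
    # Hypothetical hosts; the heartbeat values mirror those found above.
    rows = [("db-upgraded", 1800, 60), ("db-healthy", 30, 60)]
    for host, period in heartbeat_findings(rows):
        print(host)
        print("\n".join(fix_statements(period)))
```

The statements are only printed, not executed, so they can be reviewed before anything is changed on a production replica.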

FileStream fs = new FileStream("d:\\a.txt", FileMode.Open);
StreamReader m_streamReader = new StreamReader(fs);
m_streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
string arry = "";
string strLine = m_streamReader.ReadLine();
// Stop at end of file or at the first empty line.
while (strLine != null && strLine != "")
{
    string[] split = strLine.Split('=');
    string a = split[0];
    // Keep only lines whose key is "ip" (case-insensitive).
    if (a.ToLower() == "ip")
    {
        arry += strLine + "\n";
    }
    strLine = m_streamReader.ReadLine();
}
m_streamReader.Close();
m_streamReader.Dispose();
fs.Close();
fs.Dispose();
Console.Write(arry);
Console.ReadLine();

If the match should also be case-sensitive, just remove ToLower().

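On Windows (I do not know whether this works on Linux):

obj = io.popen("cd") -- outside interactive mode, you can prefix this with local
path = obj:read("*all"):sub(1, -2) -- path holds the current directory
obj:close() -- close the handle

This works because the Windows cd command prints the working directory; sub(1, -2) strips the trailing newline.

Alternatively, if you have lua socks or lfs (make sure it matches your Lua version), you can use lfs (Lua File System):

require("lfs")
path = lfs.currentdir()

This is a function from the Lua file system library.

That is all.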

-- Truncate a number to a given number of decimal places
-- num: source number
-- n: number of decimal places
function GetPreciseDecimal(num, n)
    if type(num) ~= "number" then
        return num
    end
    n = n or 0
    n = math.floor(n)
    if n < 0 then n = 0 end
    local decimal = 10 ^ n
    local temp = math.floor(num * decimal)
    return temp / decimal
end

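Saving to local storage from Lua code takes the following steps:

1. Import the required libraries

First import the required libraries, namely the cocos2d-x library and the io library. Add the following at the top of the code:

local FileUtils = cc.FileUtils:getInstance()
local io = require("io")

2. Get the data

Next, get the data to be saved. This can be done with the Sprite class in cocos2d-x:

local sprite = cc.Sprite:create("image.png")
local texture2D = sprite:getTexture()
local size = texture2D:getContentSizeInPixels()
local data = texture2D:getData()

3. Write the data to a file

The last step is to write the data to a file, using the file functions of the io library:

local path = FileUtils:getWritablePath() .. "image.png"
local file = io.open(path, "wb")
file:write(data)
file:close()

This code writes the data to a file named image.png in the app's writable directory. A file object has to be opened with io.open before writing and closed once writing is done. The getData() call is kept from the original text; check that the Texture2D binding in your cocos2d-x version actually provides it.

Those are the detailed steps for saving to local storage from Lua code.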


Feel free to share. When reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/web/9269440.html
