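During crash recovery, MySQL opens the header page of every ibd file to verify that the data dictionary is accurate. If the instance contains a large number of tables, this validation takes a long time. Crash recovery time in MySQL therefore does depend on the table count: the more tables there are, the longer recovery takes. Disk IOPS matters as well. The development server here runs on an HDD with low IOPS, so validating a large number of tablespaces was very slow. Another finding: MySQL 8 also validates tablespaces on a normal startup, and crash recovery then runs one more validation pass, so the tablespaces end up being checked twice. MySQL 8.0 does add one feature, though: when the number of tables exceeds 50,000, it scans with multiple threads, which speeds up tablespace validation.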
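To get a feel for how much work this validation implies on a given server, one can count the ibd files under the data directory. The following is a minimal sketch, assuming file-per-table tablespaces and a hypothetical data directory path. The time estimate is only a back-of-envelope figure that assumes one random read per file; it is not how MySQL itself accounts for the work.

```python
import os


def count_ibd_files(datadir):
    """Count *.ibd files below datadir (one per file-per-table tablespace)."""
    total = 0
    for _root, _dirs, files in os.walk(datadir):
        total += sum(1 for name in files if name.endswith(".ibd"))
    return total


def rough_validation_seconds(n_files, iops, reads_per_file=1):
    """Back-of-envelope estimate: total reads divided by available IOPS.

    reads_per_file=1 is an assumption (only the header page is read).
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return n_files * reads_per_file / iops


if __name__ == "__main__":
    datadir = "/var/lib/mysql"  # hypothetical path, adjust to your datadir
    n = count_ibd_files(datadir)
    estimate = rough_validation_seconds(n, iops=100)
    print(f"{n} ibd files, roughly {estimate:.0f}s at 100 IOPS")
```

The IOPS figure passed in is a placeholder; measure the actual disk before trusting the estimate.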
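How to skip the validation
Is there a way in MySQL 5.7 to skip tablespace validation during crash recovery? After going through the available material, I found two main approaches:
1. Configure innodb_force_recovery
Setting it makes srv_force_recovery != 0, so validate = false and tablespace validation is skipped. In my test, setting innodb_force_recovery = 1 (forced recovery that skips corrupt pages) skipped the validation, and the restart after that was a normal startup. This temporary workaround avoids the very slow tablespace validation after a crash and brings MySQL up quickly. I have not found any hidden problems with it so far.
2. Use the shared tablespace instead of file-per-table tablespaces
MySQL then no longer has to open N ibd files, only a single ibdata file, which cuts validation time considerably. Ever since hearing Mr. Jiang explain how replacing file-per-table tablespaces with the shared tablespace avoids the performance jitter caused by dropping a large table, I have felt that the shared tablespace is actually the better choice in many business environments.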
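Another idea came to mind: debug crash recovery with GDB, temporarily change the value of the validate variable so that MySQL skips tablespace validation, then shut MySQL down cleanly so that the next start is a normal one. Testing showed that in debug mode the validate variable can indeed be changed on the fly and validation skipped, but code runs so much slower in debug mode that the whole process takes even longer. In non-debug mode the validate variable cannot be modified, so the idea did not work out.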
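The decision described above can be summarized as a small truth table. The sketch below is only a model of the behavior reported in this article (validation runs during crash recovery unless srv_force_recovery is non-zero). It is not MySQL source code, and the function and parameter names are made up for illustration.

```python
def should_validate_tablespaces(crash_recovery_needed, innodb_force_recovery):
    """Model of the MySQL 5.7 behavior described in the text.

    Every ibd header is validated only when crash recovery is needed
    and innodb_force_recovery is 0.
    """
    if not 0 <= innodb_force_recovery <= 6:
        raise ValueError("innodb_force_recovery must be between 0 and 6")
    return crash_recovery_needed and innodb_force_recovery == 0


for recovering in (False, True):
    for level in (0, 1):
        result = should_validate_tablespaces(recovering, level)
        print(f"recovering={recovering} force_recovery={level} validate={result}")
```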
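First, log in to the host and run top. CPU is almost completely consumed and many processes show high CPU usage, while memory and I/O usage are not high. The output: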
last pid: 26136;  load averages:  8.89,  8.91,  8.12
216 processes: 204 sleeping, 8 running, 4 on cpu
CPU states:  0.6% idle, 97.3% user,  1.8% kernel,  0.2% iowait,  0.0% swap
Memory: 8192M real, 1166M free, 14M swap in use, 8179M swap free
  PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
25725 oracle     1  50    0 4550M 4508M cpu2    12:23 11.23% oracle
25774 oracle     1  41    0 4550M 4508M run     14:25 10.66% oracle
26016 oracle     1  31    0 4550M 4508M run      5:41 10.37% oracle
26010 oracle     1  41    0 4550M 4508M run      4:40  9.81% oracle
26014 oracle     1  51    0 4550M 4506M cpu6     4:19  9.76% oracle
25873 oracle     1  41    0 4550M 4508M run     12:10  9.45% oracle
25723 oracle     1  50    0 4550M 4508M run     15:09  9.40% oracle
26121 oracle     1  41    0 4550M 4506M cpu0     1:13  9.28% oracle
25745 oracle     1  41    0 4551M 4512M run      9:33  9.28% oracle
26136 oracle     1  41    0 4550M 4506M run      0:06  5.61% oracle
  409 root      15  59    0 7168K 7008K sleep  173.1H  0.52% picld
25653 oracle     1  59    0 4550M 4508M sleep    1:01  0.46% oracle
25565 oracle     1  59    0 4550M 4508M sleep    0:07  0.24% oracle
25703 oracle     1  59    0 4550M 4506M sleep    0:08  0.13% oracle
25701 oracle     1  59    0 4550M 4509M sleep    0:23  0.10% oracle
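When many processes share the load like this, it helps to total CPU per command rather than read the list by eye. Below is a minimal sketch that parses process lines in the column layout of the top output above (PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND). The sample lines are a shortened illustration with decimal CPU values, and the helper name is hypothetical.

```python
from collections import defaultdict


def cpu_by_command(top_lines):
    """Sum the CPU column per command from top process lines."""
    totals = defaultdict(float)
    for line in top_lines:
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # skip the header and any summary or blank lines
        cpu = float(fields[9].rstrip("%"))
        totals[fields[10]] += cpu
    return dict(totals)


sample = [
    "PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND",
    "25725 oracle 1 50 0 4550M 4508M cpu2 12:23 11.23% oracle",
    "25774 oracle 1 41 0 4550M 4508M run 14:25 10.66% oracle",
    "409 root 15 59 0 7168K 7008K sleep 173.1H 0.52% picld",
]
ranked = sorted(cpu_by_command(sample).items(), key=lambda kv: -kv[1])
for command, cpu in ranked:
    print(f"{command:10s} {cpu:6.2f}%")
```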
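So I first checked the database alert log (the ALERT file). It contained no errors and showed the database running normally, which rules out a problem with the database itself.
Next, find out what the Oracle processes with high CPU usage are actually doing, using the following SQL statement: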
select sql_text, spid, v$session.program, process from
v$sqlarea, v$session, v$process
where v$sqlarea.address = v$session.sql_address
and v$sqlarea.hash_value = v$session.sql_hash_value
and v$session.paddr = v$process.addr
and v$process.spid in (PID);
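Instead of editing the PID list by hand, the IN list can be generated. The sketch below assumes the statement will later be run through a driver that accepts named bind variables in the :name style. It only builds the statement text and the bind dictionary and does not connect to a database. The function name and table aliases are introduced here for illustration.

```python
def build_sql_for_pids(pids):
    """Return (sql, binds) that look up the current SQL of OS processes."""
    if not pids:
        raise ValueError("at least one PID is required")
    names = [f"p{i}" for i in range(len(pids))]
    placeholders = ", ".join(f":{n}" for n in names)
    sql = (
        "select a.sql_text, p.spid, s.program, s.process "
        "from v$sqlarea a, v$session s, v$process p "
        "where a.address = s.sql_address "
        "and a.hash_value = s.sql_hash_value "
        "and s.paddr = p.addr "
        f"and p.spid in ({placeholders})"
    )
    # v$process.spid is a character column, so bind the PIDs as strings.
    binds = {n: str(pid) for n, pid in zip(names, pids)}
    return sql, binds


sql, binds = build_sql_for_pids([25725, 25774])
print(sql)
print(binds)
```

Using bind variables rather than pasting literals also avoids adding yet another distinct statement to the shared pool for every lookup.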
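Replace PID in the script with the PIDs of the high-CPU processes from top to get the SQL statement each Oracle process is executing. It turns out that all of the high-CPU processes are executing the same SQL statement: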
SELECT d.domainname, d.mswitchdomainid, a.SERVICEID, a.SERVICECODE, a.USERTYPE, a.STATUS, a.NOTIFYSTATUS,
to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED, VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID
FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn
AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10'
and a.servicecode like '010987654321%' and SubsidiaryID=999999999
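It is fairly certain that this SQL statement is what is eating the system's CPU. But why does it consume so much CPU? Let's first look at the wait events of the database sessions: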
SQL> select sid,event,p1,p1text from v$session_wait;
       SID EVENT                          P1         P1TEXT
---------- ------------------------------ ---------- --------
        12 latch free                     4.3982E+12 address
        36 latch free                     4.3982E+12 address
        37 latch free                     4.3982E+12 address
        84 latch free                     4.3982E+12 address
       102 latch free                     4.3982E+12 address
       101 latch free                     4.3982E+12 address
        85 latch free                     4.3982E+12 address
        41 latch free                     4.3982E+12 address
       106 latch free                     4.3982E+12 address
       155 latch free                     4.3982E+12 address
       151 latch free                     4.3982E+12 address
       149 latch free                     4.3982E+12 address
       147 latch free                     4.3982E+12 address
         1 pmon timer                            300 duration
The query above shows that almost all sessions are waiting on the latch free event. Next, check which processes these latch waits belong to:
SQL> select spid from v$process where addr in
(select paddr from v$session where sid in(84,102,101,106,155,151));
SPID
------------
25774
26010
25873
25725
26014
26016
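This shows that the latch free wait event is what the processes running the SQL statement above are waiting on, and that it is consuming a large amount of CPU. Let's see which type of latch accounts for most of the waits, using the following SQL statement: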
SQL> SELECT latch#, name, gets, misses, sleeps
FROM v$latch
WHERE sleeps>0
ORDER BY sleeps;
LATCH# NAME GETS MISSES SLEEPS
---------- ----------------------------------------------------------------
15 messages 96876 20 1
159 library cache pin allocation 407322 43 1
132 dml lock allocation 194533 213 2
4 session allocation 304897 48 3
115 redo allocation 238031 286 4
17 enqueue hash chains 277510 85 5
7 session idle bit 2727264 314 16
158 library cache pin 3881788 5586 58
156 shared pool 2771629 6184 662
157 library cache 5637573 25246 801
98 cache buffers chains 1722750424 758400 109837
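Raw sleep counts are easier to compare once they are related to gets and misses. The short sketch below computes two ratios for a few rows copied from the v$latch output above. The ratio definitions (misses/gets and sleeps/misses) are one common way to read these counters and are chosen here for illustration.

```python
latches = [
    # (name, gets, misses, sleeps) copied from the v$latch output
    ("library cache pin", 3881788, 5586, 58),
    ("shared pool", 2771629, 6184, 662),
    ("library cache", 5637573, 25246, 801),
    ("cache buffers chains", 1722750424, 758400, 109837),
]


def latch_ratios(gets, misses, sleeps):
    """Return (miss ratio, sleeps per miss); each is 0.0 when undefined."""
    miss_ratio = misses / gets if gets else 0.0
    sleeps_per_miss = sleeps / misses if misses else 0.0
    return miss_ratio, sleeps_per_miss


for name, gets, misses, sleeps in sorted(latches, key=lambda r: -r[3]):
    miss_ratio, per_miss = latch_ratios(gets, misses, sleeps)
    print(f"{name:22s} miss%={miss_ratio * 100:7.4f} sleeps/miss={per_miss:.3f}")
```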
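The query above shows that the dominant latch wait is cache buffers chains. Waits on this latch indicate contention on individual blocks in the database. Let's look at the child latches of this latch and their statistics: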
SQL> SELECT addr, latch#, gets, misses, sleeps
FROM v$latch_children
WHERE sleeps>0
and latch# = 98
ORDER BY sleeps desc;
ADDR LATCH# GETS MISSES SLEEPS
---------------- ---------- ---------- ---------- ----------
000004000A3DFD10 98 10840661 82891 389
000004000A698C70 98 159510 2 244
0000040009B21738 98 104269771 34926 209
0000040009B227A8 98 107604659 35697 185
000004000A3E0D70 98 5447601 18922 156
000004000A6C2BD0 98 853375 7 134
0000040009B24888 98 85538409 25752 106
000004000A36B250 98 1083351 199 96
000004000A79EC70 98 257970 64 35
000004000A356AD0 98 1184810 160 34
……………
Next, find out which objects the child latches with the most sleeps correspond to:
SQL> select distinct a.owner, a.segment_name, a.segment_type from
dba_extents a,
(select dbarfil, dbablk
from x$bh
where hladdr in
(select addr
from (select addr
from v$latch_children
order by sleeps desc)
where rownum < 5)) b
where a.RELATIVE_FNO = b.dbarfil
and a.BLOCK_ID <= b.dbablk and a.block_id + a.blocks > b.dbablk;
OWNER SEGMENT_NAME SEGMENT_TYPE
---------------------------------------------------------------------------
TEST I_SERVICE_SERVICESPECID INDEX
TEST I_SERVICE_SUBSIDIARYID INDEX
TEST SERVICE TABLE
TEST MSWITCHDOMAIN TABLE
TEST I_SERVICE_SC_S INDEX
TEST PK_MSWITCHDOMAIN INDEX
TEST GATEWAYLOC TABLE
…………………
Several of the objects used by the SQL statement we started with appear in this list, so let's look at the execution plan of that statement:
SQL> set autotrace trace explain
SQL> SELECT d.domainname, d.mswitchdomainid, a.SERVICEID, a.SERVICECODE, a.USERTYPE, a.STATUS, a.NOTIFYSTATUS,
to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED, VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID
FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn
AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10'
and a.servicecode like '010987654321%' and SubsidiaryID=999999999;
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 NESTED LOOPS
2 1 NESTED LOOPS
3 2 NESTED LOOPS
4 3 TABLE ACCESS (FULL) OF 'SUBBUREAUNUMBER'
5 3 TABLE ACCESS (BY INDEX ROWID) OF 'GATEWAYLOC'
1. Log in to the database host
Checking with vmstat shows that CPU is exhausted and a large number of tasks are sitting in the run queue:
bash-2.03$ vmstat 3
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s6 s9 s1 sd in sy cs us sy id
0 0 0 5504232 1464112 0 0 0 0 0 0 0 0 1 1 0 4294967196 0 0 -84 -5 -145
131 0 0 5368072 1518360 56 691 0 2 2 0 0 0 1 0 0 3011 7918 2795 97 3 0
131 0 0 5377328 1522464 81 719 0 2 2 0 0 0 1 0 0 2766 8019 2577 96 4 0
130 0 0 5382400 1524776 67 682 0 0 0 0 0 0 0 0 0 3570 8534 3316 97 3 0
134 0 0 5373616 1520512 127 1078 0 2 2 0 0 0 1 0 0 3838 9584 3623 96 4 0
136 0 0 5369392 1518496 107 924 0 5 5 0 0 0 0 0 0 2920 8573 2639 97 3 0
132 0 0 5364912 1516224 63 578 0 0 0 0 0 0 0 0 0 3358 7944 3119 97 3 0
129 0 0 5358648 1511712 189 1236 0 0 0 0 0 0 0 0 0 3366 10365 3135 95 5 0
129 0 0 5354528 1511304 120 1194 0 0 0 0 0 0 0 4 0 3235 8864 2911 96 4 0
128 0 0 5346848 1507704 99 823 0 0 0 0 0 0 0 3 0 3189 9048 3074 96 4 0
125 0 0 5341248 1504704 80 843 0 2 2 0 0 0 6 1 0 3563 9514 3314 95 5 0
133 0 0 5332744 1501112 79 798 0 0 0 0 0 0 0 1 0 3218 8805 2902 97 3 0
129 0 0 5325384 1497368 107 643 0 2 2 0 0 0 1 4 0 3184 8297 2879 96 4 0
126 0 0 5363144 1514320 81 753 0 0 0 0 0 0 0 0 0 2533 7409 2164 97 3 0
136 0 0 5355624 1510512 169 566 786 0 0 0 0 0 0 1 0 3002 8600 2810 96 4 0
130 1 0 5351448 1502936 267 580 1821 0 0 0 0 0 0 0 0 3126 7812 2900 96 4 0
129 0 0 5347256 1499568 155 913 2 2 2 0 0 0 0 1 0 2225 8076 1941 98 2 0
116 0 0 5338192 1495400 177 1162 0 0 0 0 0 0 0 1 0 1947 7781 1639 97 3 0
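The run-queue pressure can be quantified from this output. Below is a minimal sketch that averages the r column and the us/sy/id columns of vmstat data lines in the 22-column layout shown above. It skips the first sample by default, since vmstat reports averages since boot there. The sample uses a few lines from the output above, with the first line written as separate columns, which is an assumption about the original output.

```python
def summarize_vmstat(lines, skip_first=True):
    """Average run queue (r) and CPU split from vmstat data lines."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 22:  # a data line in this layout has 22 columns
            continue
        try:
            rows.append([int(f) for f in fields])
        except ValueError:
            continue  # column header line
    if skip_first:
        rows = rows[1:]  # first sample is the since-boot average
    if not rows:
        raise ValueError("no vmstat samples")
    n = len(rows)
    return {
        "r": sum(r[0] for r in rows) / n,
        "us": sum(r[-3] for r in rows) / n,
        "sy": sum(r[-2] for r in rows) / n,
        "id": sum(r[-1] for r in rows) / n,
    }


sample = """\
 r b w swap free re mf pi po fr de sr s6 s9 s1 sd in sy cs us sy id
 0 0 0 5504232 1464112 0 0 0 0 0 0 0 0 1 1 0 4294967196 0 0 -84 -5 -145
 131 0 0 5368072 1518360 56 691 0 2 2 0 0 0 1 0 0 3011 7918 2795 97 3 0
 131 0 0 5377328 1522464 81 719 0 2 2 0 0 0 1 0 0 2766 8019 2577 96 4 0
 130 0 0 5382400 1524776 67 682 0 0 0 0 0 0 0 0 0 3570 8534 3316 97 3 0
""".splitlines()

summary = summarize_vmstat(sample)
print({key: round(value, 2) for key, value in summary.items()})
```

A run queue that stays far above the number of CPUs while idle time is zero, as here, points to CPU saturation rather than an I/O bottleneck.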
2. Use the top command
Looking at per-process CPU usage, no single process stands out with exceptionally high CPU:
$ top
last pid: 28313;  load averages: 99.90, 117.54, 125.71    23:28:38
296 processes: 186 sleeping, 99 running, 2 zombie, 9 on cpu
CPU states:  0.0% idle, 96.5% user,  3.5% kernel,  0.0% iowait,  0.0% swap
Memory: 4096M real, 1404M free, 2185M swap in use, 5114M swap free
  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME   CPU COMMAND
27082 oracle8i   1  33    0 1328M 1309M run     0:17 1.29% oracle
26719 oracle8i   1  55    0 1327M 1306M sleep   0:29 1.11% oracle
28103 oracle8i   1  35    0 1327M 1304M run     0:06 1.10% oracle
28161 oracle8i   1  25    0 1327M 1305M run     0:04 1.10% oracle
26199 oracle8i   1  45    0 1328M 1309M run     0:42 1.10% oracle
26892 oracle8i   1  33    0 1328M 1310M run     0:24 1.09% oracle
27805 oracle8i   1  45    0 1327M 1306M cpu/1   0:10 1.04% oracle
23800 oracle8i   1  23    0 1327M 1306M run     1:28 1.03% oracle
25197 oracle8i   1  34    0 1328M 1309M run     0:57 1.03% oracle
21593 oracle8i   1  33    0 1327M 1306M run     2:12 1.01% oracle
27616 oracle8i   1  45    0 1329M 1311M run     0:14 1.01% oracle
27821 oracle8i   1  43    0 1327M 1306M run     0:10 1.00% oracle
26517 oracle8i   1  33    0 1328M 1309M run     0:33 0.97% oracle
25785 oracle8i   1  44    0 1328M 1309M run     0:46 0.96% oracle
26241 oracle8i   1  45    0 1327M 1306M run     0:42 0.96% oracle
3. Check the number of processes
bash-2.03$ ps -ef|grep ora|wc -l
258
bash-2.03$ ps -ef|grep ora|wc -l
275
bash-2.03$ ps -ef|grep ora|wc -l
274
bash-2.03$ ps -ef|grep ora|wc -l
278
bash-2.03$ ps -ef|grep ora|wc -l
277
bash-2.03$ ps -ef|grep ora|wc -l
366
The system is running a large number of Oracle processes, around 300, whereas the number of Oracle connections is normally around 100.
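Note that ps -ef|grep ora|wc -l counts every line containing the string, which can include the grep process itself and unrelated commands. The small sketch below counts processes by owner from ps -ef style text instead. The sample text is made up for illustration (including the instance name in the CMD column), the user name oracle8i is taken from the top output, and the helper name is hypothetical.

```python
def count_by_user(ps_text, user):
    """Count ps -ef lines whose first column (UID) equals user."""
    count = 0
    for line in ps_text.splitlines():
        fields = line.split()
        if fields and fields[0] == user:
            count += 1
    return count


# Made-up sample in ps -ef layout; the last line is the grep itself.
sample = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
oracle8i 27082     1  0 23:10:01 ?        0:17 oracleTEST (LOCAL=NO)
oracle8i 26719     1  0 23:05:12 ?        0:29 oracleTEST (LOCAL=NO)
    root 28313 28001  0 23:28:38 pts/1    0:00 grep ora
"""
print(count_by_user(sample, "oracle8i"))
```

Sampling this count at intervals, as the shell loop above does, shows whether the number of connections is still climbing.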
4. Check the database
Query v$session_wait to get the wait event of each process:
SQL> select sid,event,p1,p1text from v$session_wait;
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- --------
124 latch free 1.6144E+10 address
1 pmon timer 300 duration
2 rdbms ipc message 300 timeout
3 rdbms ipc message 300 timeout
11 rdbms ipc message 30000 timeout
6 rdbms ipc message 180000 timeout
4 rdbms ipc message 300 timeout
134 rdbms ipc message 6000 timeout
147 rdbms ipc message 6000 timeout
275 rdbms ipc message 17995 timeout
274 rdbms ipc message 6000 timeout
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- --------
118 rdbms ipc message 6000 timeout
7 buffer busy waits 17 file#
56 buffer busy waits 17 file#
161 buffer busy waits 17 file#
195 buffer busy waits 17 file#
311 buffer busy waits 17 file#
314 buffer busy waits 17 file#
205 buffer busy waits 17 file#
269 buffer busy waits 17 file#
200 buffer busy waits 17 file#
164 buffer busy waits 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- -------
140 buffer busy waits 17 file#
66 buffer busy waits 17 file#
10 db file sequential read 17 file#
18 db file sequential read 17 file#
54 db file sequential read 17 file#
49 db file sequential read 17 file#
48 db file sequential read 17 file#
46 db file sequential read 17 file#
45 db file sequential read 17 file#
35 db file sequential read 17 file#
30 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- -------
29 db file sequential read 17 file#
22 db file sequential read 17 file#
178 db file sequential read 17 file#
175 db file sequential read 17 file#
171 db file sequential read 17 file#
123 db file sequential read 17 file#
121 db file sequential read 17 file#
120 db file sequential read 17 file#
117 db file sequential read 17 file#
114 db file sequential read 17 file#
113 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
111 db file sequential read 17 file#
107 db file sequential read 17 file#
80 db file sequential read 17 file#
222 db file sequential read 17 file#
218 db file sequential read 17 file#
216 db file sequential read 17 file#
213 db file sequential read 17 file#
199 db file sequential read 17 file#
198 db file sequential read 17 file#
194 db file sequential read 17 file#
192 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
188 db file sequential read 17 file#
249 db file sequential read 17 file#
242 db file sequential read 17 file#
239 db file sequential read 17 file#
236 db file sequential read 17 file#
235 db file sequential read 17 file#
234 db file sequential read 17 file#
233 db file sequential read 17 file#
230 db file sequential read 17 file#
227 db file sequential read 17 file#
336 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
333 db file sequential read 17 file#
331 db file sequential read 17 file#
329 db file sequential read 17 file#
327 db file sequential read 17 file#
325 db file sequential read 17 file#
324 db file sequential read 17 file#
320 db file sequential read 17 file#
318 db file sequential read 17 file#
317 db file sequential read 17 file#
316 db file sequential read 17 file#
313 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
305 db file sequential read 17 file#
303 db file sequential read 17 file#
301 db file sequential read 17 file#
293 db file sequential read 17 file#
290 db file sequential read 17 file#
288 db file sequential read 17 file#
287 db file sequential read 17 file#
273 db file sequential read 17 file#
271 db file sequential read 17 file#
257 db file sequential read 17 file#
256 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
254 db file sequential read 17 file#
252 db file sequential read 17 file#
159 db file sequential read 17 file#
153 db file sequential read 17 file#
146 db file sequential read 17 file#
142 db file sequential read 17 file#
135 db file sequential read 17 file#
133 db file sequential read 17 file#
132 db file sequential read 17 file#
126 db file sequential read 17 file#
79 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
77 db file sequential read 17 file#
72 db file sequential read 17 file#
70 db file sequential read 17 file#
69 db file sequential read 17 file#
67 db file sequential read 17 file#
63 db file sequential read 17 file#
55 db file sequential read 17 file#
102 db file sequential read 17 file#
96 db file sequential read 17 file#
95 db file sequential read 17 file#
91 db file sequential read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
81 db file sequential read 17 file#
15 db file sequential read 17 file#
19 db file scattered read 17 file#
50 db file scattered read 17 file#
285 db file scattered read 17 file#
279 db file scattered read 17 file#
255 db file scattered read 17 file#
243 db file scattered read 17 file#
196 db file scattered read 17 file#
187 db file scattered read 17 file#
170 db file scattered read 17 file#
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ------
162 db file scattered read 17 file#
138 db file scattered read 17 file#
110 db file scattered read 17 file#
108 db file scattered read 17 file#
92 db file scattered read 17 file#
330 db file scattered read 17 file#
310 db file scattered read 17 file#
302 db file scattered read 17 file#
299 db file scattered read 17 file#
89 db file scattered read 17 file#
5 smon timer 300 sleep time
SID EVENT P1 P1TEXT
---------- ------------------------------ ---------- ---------
20 SQL*Net message to client 1952673792 driver id
103 SQL*Net message to client 1650815232 driver id
148 SQL*Net more data from client 1952673792 driver id
291 SQL*Net more data from client 1952673792 driver id
244 rows selected
The output shows a large number of db file scattered read and db file sequential read waits.
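With 244 rows, counting sessions per event makes the pattern clearer than scrolling through pages of output. Below is a minimal sketch that tallies events from (sid, event) pairs. The few rows are copied from the output above as a sample, and in practice the pairs would come from v$session_wait.

```python
from collections import Counter


def count_wait_events(rows):
    """Return (event, sessions) pairs, most frequent event first."""
    return Counter(event for _sid, event in rows).most_common()


rows = [
    (124, "latch free"),
    (7, "buffer busy waits"),
    (56, "buffer busy waits"),
    (10, "db file sequential read"),
    (18, "db file sequential read"),
    (54, "db file sequential read"),
    (19, "db file scattered read"),
    (50, "db file scattered read"),
]
for event, sessions in count_wait_events(rows):
    print(f"{sessions:4d}  {event}")
```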