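Many operations can trigger this kind of problem; sometimes it is a SQL statement run with a hint or with connect with. Start by checking the alert log and its trace file. I ran into the same problem while using Data Pump to restore an encrypted backup file. Posts online say it happens when a column name in the restored table contains a space, but I confirmed that none of my columns have spaces in their names.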
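How do you analyze a problem like this? Start with the operating system log. His system is HP-UX, so the system log is /var/adm/syslog/syslog.log (on AIX, use errpt). In the system log I saw: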
Nov 11 18:43:57 rx8640c syslog: Oracle CSS family monitor shutting down. 3
Nov 11 18:43:59 rx8640c su: + tty?? root-oracle
Nov 11 18:43:59 rx8640c syslog: Cluster Ready Services completed waiting on dependencies.
Comparing this against the alert log shows that the system restarted at about this time:
Wed Nov 11 18:43:28 2009
Trace dumping is performing id=[cdmp_20091111184328]
Wed Nov 11 18:57:17 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
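To make the comparison above less manual, here is a minimal Python sketch that finds the last alert-log timestamp before a "Starting ORACLE instance" entry and the timestamp of the startup itself. The function names (`parse_alert_timestamp`, `find_restart_gap`) are made up for illustration, and the sketch assumes the 10g-style timestamp lines shown in the excerpt and an English locale for the day and month names.

```python
from datetime import datetime

# Assumed format of alert-log timestamp lines, e.g. "Wed Nov 11 18:43:28 2009".
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_alert_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), ALERT_TS_FORMAT)
    except ValueError:
        return None


def find_restart_gap(lines):
    """Return (last_timestamp_before_startup, startup_timestamp), or None.

    The startup entry is the first "Starting ORACLE instance" line that is
    preceded by at least two timestamp lines.
    """
    previous = None
    current = None
    for line in lines:
        ts = parse_alert_timestamp(line)
        if ts is not None:
            previous, current = current, ts
        elif line.startswith("Starting ORACLE instance") and previous and current:
            return previous, current
    return None


# The alert-log excerpt from this post.
alert_excerpt = [
    "Wed Nov 11 18:43:28 2009",
    "Trace dumping is performing id=[cdmp_20091111184328]",
    "Wed Nov 11 18:57:17 2009",
    "Starting ORACLE instance (normal)",
]

before, after = find_restart_gap(alert_excerpt)
print("last entry before restart:", before)
print("instance startup:", after)
print("gap:", after - before)
```

For this excerpt the gap is a little under 14 minutes, and the last entry before it (18:43:28) falls just before the CSS shutdown message in syslog.log (18:43:57).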
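On AIX you can check with last shutdown; I don't know whether the same command works on HP-UX.
Here syslog.log shows the CSS process shutting down (that reading is my guess). When CSS shuts down or fails, the host is rebooted automatically, which matches what happened here.
The next step is to analyze the ocssd log under ORA_CRS_HOME: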
[CSSD]2009-11-11 18:39:18.460 [13] >WARNING: clssgmAssignMemberNo(): grock(#CSS_CLSSOMON) memberNo(1) already assigned
[CSSD]2009-11-11 18:39:34.313 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 50% heartbeat fatal, eviction in 14.807 seconds
[CSSD]2009-11-11 18:39:35.313 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 50% heartbeat fatal, eviction in 13.807 seconds
[CSSD]2009-11-11 18:39:42.313 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 75% heartbeat fatal, eviction in 6.807 seconds
[CSSD]2009-11-11 18:39:45.313 [14] >TRACE: clssnmPollingThread: node rx8640c (1) is impending reconfig
[CSSD]2009-11-11 18:39:45.314 [14] >TRACE: clssnmPollingThread: diskTimeout set to (27000)ms impending reconfig status(1)
[CSSD]2009-11-11 18:39:46.313 [14] >TRACE: clssnmPollingThread: node rx8640c (1) is impending reconfig
[CSSD]2009-11-11 18:39:46.314 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 90% heartbeat fatal, eviction in 2.807 seconds
[CSSD]2009-11-11 18:39:47.313 [14] >TRACE: clssnmPollingThread: node rx8640c (1) is impending reconfig
[CSSD]2009-11-11 18:39:47.314 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 90% heartbeat fatal, eviction in 1.807 seconds
[CSSD]2009-11-11 18:39:48.313 [14] >TRACE: clssnmPollingThread: node rx8640c (1) is impending reconfig
[CSSD]2009-11-11 18:39:48.314 [14] >WARNING: clssnmPollingThread: node rx8640c (1) at 90% heartbeat fatal, eviction in 0.807 seconds
[CSSD]2009-11-11 18:39:49.133 [14] >TRACE: clssnmPollingThread: node rx8640c (1) is impending reconfig
[CSSD]2009-11-11 18:39:49.134 [14] >TRACE: clssnmPollingThread: Eviction started for node rx8640c (1), flags 0x000f, state 3,
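The log makes it clear: the private-network heartbeat was lost and the node was evicted.
As for why the private network failed and the heartbeat was lost, I don't think that is something a DBA can resolve; write up a report and hand it to the network team.
As an aside, three processes can cause a node reboot: OCSSD, OPROCD, and OCLSOMON.
In general, OCSSD reboots a node because of a lost heartbeat (a problem with the network heartbeat or the voting disk), because the CSS process cannot get CPU time, or because of a bug. OPROCD and OCLSOMON reboot a node because a process cannot get CPU time or because of a bug.
In his case an ORA-00600 error was also reported just before the node rebooted: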
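When the ocssd log is long, the heartbeat countdown shown above can be pulled out with a few lines of Python. This is only a rough sketch: the regular expression is written against the message format in this excerpt, and the names `HEARTBEAT_RE` and `heartbeat_warnings` are my own, not part of any Oracle tool.

```python
import re

# Matches the "heartbeat fatal" warnings as they appear in this excerpt.
# The pattern stops at the number of seconds, so it also works on lines
# that were wrapped in the middle of the word "seconds".
HEARTBEAT_RE = re.compile(
    r"\[CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}).*?"
    r"node (?P<node>\S+) \((?P<num>\d+)\) at (?P<pct>\d+)% heartbeat fatal, "
    r"eviction in (?P<secs>\d+\.\d+)"
)


def heartbeat_warnings(lines):
    """Return (timestamp, node, percent, seconds_to_eviction) per warning line."""
    found = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            found.append(
                (match["ts"], match["node"], int(match["pct"]), float(match["secs"]))
            )
    return found


# Three lines taken from the ocssd log in this post.
ocssd_excerpt = [
    "[CSSD]2009-11-11 18:39:34.313 [14] >WARNING: clssnmPollingThread: "
    "node rx8640c (1) at 50% heartbeat fatal, eviction in 14.807 seconds",
    "[CSSD]2009-11-11 18:39:45.313 [14] >TRACE: clssnmPollingThread: "
    "node rx8640c (1) is impending reconfig",
    "[CSSD]2009-11-11 18:39:48.314 [14] >WARNING: clssnmPollingThread: "
    "node rx8640c (1) at 90% heartbeat fatal, eviction in 0.807 seconds",
]

for ts, node, pct, secs in heartbeat_warnings(ocssd_excerpt):
    print(ts, node, f"{pct}%", f"{secs}s to eviction")
```

The timestamp of the first warning gives the network team a starting point for when the interconnect stopped responding.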
Wed Nov 11 18:43:27 2009
Errors in file /oracle/app/oracle/admin/ora10g/udump/ora10g1_ora_24884.trc:
ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []
This is confirmed as Bug 5486074:
ORA-600 [keltnfy-ldminit] can occur in the Server Generated Alert
subsystem when it cannot determine the Host Name or
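Network Address. This can be caused by DNS server being unavailable.
I checked, and nothing says this error causes CSS to die or the host to reboot; also, the error appears to have been reported on the client side...
At the very least it confirms that there was a network problem.
During startup, these errors were reported: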
Wed Nov 11 18:58:06 2009
Errors in file /oracle/app/oracle/admin/ora10g/udump/ora10g1_ora_7203.trc:
ORA-00600: internal error code, arguments: [ksprlspeeq3], [65536], [], [], [], [], [], []
Wed Nov 11 18:58:07 2009
Errors in file /oracle/app/oracle/admin/ora10g/udump/ora10g1_ora_7203.trc:
ORA-07445: exception encountered: core dump [kgscDump()+801] [SIGSEGV] [Address not mapped to object] [0x000001004] [] []
ORA-00600: internal error code, arguments: [ksprlspeeq3], [65536], [], [], [], [], [], []
Wed Nov 11 18:58:08 2009
Errors in file /oracle/app/oracle/admin/ora10g/udump/ora10g1_ora_7203.trc:
ORA-07445: exception encountered: core dump [kgscDump()+801] [SIGSEGV] [Address not mapped to object] [0x000001004] [] []
ORA-07445: exception encountered: core dump [kgscDump()+801] [SIGSEGV] [Address not mapped to object] [0x000001004] [] []
ORA-00600: internal error code, arguments: [ksprlspeeq3], [65536], [], [], [], [], [], []
ORA-07445 [kgscDump] matches Bug 5508574 - OERI[504] / OERI[99999] / Dump [kgscdump] with >31 CPUs, but this system has only 15 CPUs (30 cores).
For ORA-00600 [ksprlspeeq3] I could not find a bug related to 10.2.0.3, so I am leaving it alone for now.
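Since a bug search usually starts from the first argument of an ORA-00600 or the function name in an ORA-07445, it can help to collect those keys from the alert log before searching. The sketch below is one way to do that; the regular expressions follow the message layout seen in this post, and `summarize_internal_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# First bracketed argument of each error, per the layout shown in this post.
ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: \[([^\]]*)\]")
ORA7445_RE = re.compile(r"ORA-07445: exception encountered: core dump \[([^\]]*)\]")


def summarize_internal_errors(lines):
    """Count occurrences of each (error code, search key) pair."""
    counts = Counter()
    for line in lines:
        match = ORA600_RE.search(line)
        if match:
            counts[("ORA-00600", match.group(1))] += 1
            continue
        match = ORA7445_RE.search(line)
        if match:
            # Drop the call offset: "kgscDump()+801" becomes "kgscDump".
            function_name = match.group(1).split("(")[0]
            counts[("ORA-07445", function_name)] += 1
    return counts


# Error lines copied from the alert log in this post.
alert_errors = [
    "ORA-00600: internal error code, arguments: [keltnfy-ldmInit], [46], [1], [], [], [], [], []",
    "ORA-00600: internal error code, arguments: [ksprlspeeq3], [65536], [], [], [], [], [], []",
    "ORA-07445: exception encountered: core dump [kgscDump()+801] [SIGSEGV] "
    "[Address not mapped to object] [0x000001004] [] []",
    "ORA-00600: internal error code, arguments: [ksprlspeeq3], [65536], [], [], [], [], [], []",
]

for (code, key), count in summarize_internal_errors(alert_errors).items():
    print(code, key, count)
```

Each distinct key can then be looked up separately, which keeps the repeated ORA-07445 and ORA-00600 lines from the startup sequence from being counted as different problems.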
I recommend MetaLink note 4.1. It is what used to be the Knowledge area, and it contains many categorized articles and a list of tools.