tbwshc

在线索引重建被取消导致大量ORA-600(kghstack_underflow_internal_3)错误

客户在比较繁忙的时刻执行了索引的REBUILD操作,导致大量会话被锁,最终对ONLINE REBUILD操作执行了取消操作,引发了一系列更严重的ORA-600错误。

 

 

登录数据库后,发现两个节点上ORACLE_BASE所在目录都已经100%占用。数据库无法正常通过/ AS SYSDBA方式登录。

查询告警日志发现大量的ORA-600和ORA-7445错误:

Tue May 08 21:20:26 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_1555.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_3891.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26190.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_2873.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_4518.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:27 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_22469.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Tue May 08 21:20:28 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26440.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_762.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:29 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_26106.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_1597.trc:
ORA-07445:
出现异常错误:核心转储[0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
Tue May 08 21:20:30 EAT 2012
Errors in file /oracle/app/admin/orcl/udump/orcl2_ora_856.trc:
ORA-07445: exception encountered: core dump [0000000000000000] [SIGSEGV] [Invalid permissions for mapped object] [0x000000000] [] []
.
.
.

检查对应的TRACE文件,发现导致错误语句执行的是TABLE_A表的插入:

ksedmp: internal or fatal error tb
ORA-00600:
内部错误代码,参数: [kghstack_underflow_internal_3], [0x60000000003002F0], [keybuf], [], [], [], [], []
Current SQL statement for this session:
INSERT INTO TABLE_A (O_ID, P_ID, S_ID, F_ITEM, F_NAME, T_MON, D_MON, I_MON, P_STATE, P_TIME, R1, R2) VALUES (:B10 , SUBSTR(:B10 , LENGTH(:B10 ) - 1, 2), :B9 , :B8 , :B7 , :B6 , :B5 , :B4 , :B3 , NULL, :B2 , :B1 )
----- PL/SQL Call Stack -----
 object     line object
 handle   number name
c00000203c2cc550      119 package body U1.P_O_I
c00000203b788200      288 procedure U1.U_B_O_I
c00000203719b8d0        1 anonymous block
----- Call Stack Trace -----
calling             call    entry               argument values in hex     
location            type    point               (? means dubious value)    
-------------------- -------- -------------------- ----------------------------
ksedst()+64         call    ksedst1()           000000000 ? 000000001 ?
ksedmp()+2176       call    ksedst()            000000000 ?
                                                  C000000000000D20 ?
                                                  4000000004037940 ?
                                                  000000000 ? 000000000 ?
                                                  000000000 ?
ksfdmp()+112        call    ksedmp()            000000003 ?
                                                  9FFFFFFFFFFF1230 ?
                                                  60000000000BA290 ?
                                                  9FFFFFFFFFFF1800 ?
                                                  C000000000000999 ?
                                                  400000000407F9B0 ?
kgerinv()+304       call    ksfdmp()            9FFFFFFFFFFF1D90 ?
                                                  000000003 ?
                                                  9FFFFFFFFFFF1810 ?
                                                  60000000000BA290 ?
                                                  C000000000000612 ?
                                                  40000000098C38B0 ?
kgeasnmierr()+144   call    kgerinv()           60000000000318D0 ?
                                                  4000000001AD98A0 ?
                                                  6000000000032988 ?
                                                  4000000001AD98A0 ?
                                                  9FFFFFFFFFFF1DD0 ?
$cold_kghstack_unde call    kgeasnmierr()       60000000000318D0 ?
rflow_internal()+36                               9FFFFFFFBF3B1168 ?
8                                                 9FFFFFFFBF3B1178 ?
                                                  6000000000032D00 ?
                                                  000000002 ?
                                                  60000000003002F0 ?
                                                  000000001 ? 000000006 ?
kghstack_free()+208 call    $cold_kghstack_unde 60000000000318D0 ?
                             rflow_internal()    60000000003002F0 ?
                                                  60000000000BA290 ?
                                                  C000000000000B1D ?
                                                  4000000002F7A510 ?
                                                  00002C87B ?
                                                  6000000000031A10 ?
ksmfrs()+48         call    kghstack_free()     60000000000318D0 ?
                                                  60000000003002F0 ?
rpiswu2()+1312      call    ksmfrs()            60000000003002F0 ?
                                                  C000000000001026 ?
                                                  4000000002F78960 ?
                                                  000000000 ? 000000000 ?
                                                  000000000 ?
rpidrv()+2352       call    rpiswu2()           9FFFFFFFFFFF2AF0 ?
                                                  4000000002F7AE60 ?
                                                  00002F833 ?
                                                  60000000000A7D20 ?
                                                  9FFFFFFFFFFF1E20 ?
                                                  C000000000001ABD ?
                                                  4000000000F14558 ?
                                                  60000000000C6CF0 ?
psddr0()+864        call    rpidrv()            000000018 ? 000000066 ?
                                                  9FFFFFFFFFFF3700 ?
                                                  000000038 ?
                                                  9FFFFFFFFFFF2B20 ?
                                                  60000000000BA290 ?
psdnal()+736        call    psddr0()            000000018 ? 000000066 ?
                                                  9FFFFFFFFFFF3700 ?
                                                  000000030 ?
pevm_EXECC()+832    call    psdnal()            9FFFFFFFFFFF54D0 ?
                                                  C00000203489A9F8 ?
                                                  C0000000000011AA ?
                                                  9FFFFFFFBE832840 ?
                                                  C00000203C2CC550 ?
                                                  400000000313C770 ?
                                                  000026035 ?
pfrinstr_EXECC()+16 call    pevm_EXECC()        9FFFFFFFBE83D1D0 ?
0                                                 9FFFFFFFBE8328B0 ?
                                                  000000020 ?
pfrrun_no_tool()+19 call    pfrinstr_EXECC()    9FFFFFFFBE83D1D0 ?
2                                                 C000001DA198D61C ?
                                                  9FFFFFFFBE83D238 ?
pfrrun()+1376       call    pfrrun_no_tool()    9FFFFFFFBE83D1D0 ?
                                                  000002001 ?
                                                  9FFFFFFFBE83D238 ?
                                                  60000000000BA290 ?
                                                  C000000000000A1C ?
                                                  4000000003163040 ?
                                                  9FFFFFFFBE83D620 ?
                                                  9FFFFFFFBE83D298 ?
plsql_run()+1328    call    pfrrun()            9FFFFFFFFFFF3788 ?
                                                  9FFFFFFFFFFF3770 ?
                                                  60000000000BA290 ?
                                                  9FFFFFFFFFFF4370 ?
                                                  9FFFFFFFFFFF4370 ?
                                                  C000000000000E23 ?
                                                  4000000002C34D70 ?
peicnt()+560        call    plsql_run()         9FFFFFFFFFFF4380 ?
                                                  000000001 ? 000000000 ?
                                                  9FFFFFFFFFFF4380 ?
                                                  60000000000BA290 ?
                                                  9FFFFFFFFFFF4E90 ?
kkxexe()+1008       call    peicnt()            9FFFFFFFFFFF54D0 ?
                                                  9FFFFFFFBE83D1D0 ?
                                                  9FFFFFFFFFFF4EA0 ?
                                                  60000000000BA290 ?
                                                  9FFFFFFFFFFF5420 ?
                                                  C000000000000A1C ?
                                                  000000020 ?
                                                  9FFFFFFFFFFF4F00 ?
opiexe()+8016       call    kkxexe()            0000051F0 ?
.
.
.

这个SQL语句是正常的业务操作,而这种简单的INSERT都会导致ORA-600和ORA-7445错误,说明插入的表或索引本身存在问题。

检查表和索引的状态,发现索引部分分区状态异常:

CREATE INDEX "U1"."IDX_F_2"
ON "U1"."TABLE_A" ("S_ID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(
BUFFER_POOL DEFAULT) LOCAL
(PARTITION "P1"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS7" ,
PARTITION "P2"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS8" ,
PARTITION "P3"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS9" ,
PARTITION "P4"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS10" ,
PARTITION "P5"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS1" ,
PARTITION "P6"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS2" ,
PARTITION "P7"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS3" ,
PARTITION "P8"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS4" ,
PARTITION "P9"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS5" ,
PARTITION "P10"
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "TBS6" )

ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P6" UNUSABLE
ALTER INDEX "U1"."IDX_F_2" MODIFY PARTITION "P7" UNUSABLE

和客户沟通后,确认今天问题发生之前有人对索引进行维护操作,操作语句包括:

alter index u1.IDX_F_2 rebuild partition p6 parallel

alter index U1.IDX_T_2 rebuild partition P9 nologging parallel online

alter index U1.IDX_T_1 rebuild partition P3 nologging parallel online

alter index U1.IDX_F_2 rebuild partition P1 nologging parallel online

alter index U1.IDX_T_4 rebuild partition P1 nologging parallel online

alter index U1.IDX_F_4 rebuild partition P5 nologging parallel online

在对索引进行维护之后,数据库中出现了大量的enq: TM – contention等待,TM锁出现的原和是索引IDX_F_2的PARTITION P6重建语句没有添加ONLINE参数有直接关系。随后大量会话被阻塞,而整个索引重建操作被人为中止,其中包括正在运行的ONLINE REBUILD操作,而ONLINE REBUILD操作被中止是十分危险的,很容易导致bug的产生,而当前就导致了ORA-600和ORA-7445的产生。

由于客户着急解决问题,因此对数据库进行了重启。重启后,ORA-600错误信息不再出现。但是前台应用报错,索引失效。

SELECT 'ALTER INDEX U1.' || INDEX_NAME || ' REBUILD PARTITION ' || PARTITION_NAME || ';'
FROM DBA_IND_PARTITIONS
WHERE INDEX_OWNER = 'U1'
AND STATUS != 'USABLE';

获取所有失效的索引重建语句,将索引重新编译后,数据库恢复正常。

在Oracle的MOS上没有找到与之最相关的bug信息,反而是找到了一个相关性很高的补丁信息,而对应的bug信息并没有公开:补丁程序13737888: ONLINE DDL:ORA-600[KGHSTACK_UNDERFLOW_INTERNAL_3], [0X2B7F4E1E7678], [KEYBUF]。

 


posted on 2012-09-13 17:18 chen11-1 阅读(830) 评论(0)  编辑  收藏