#
最近发现10g的RMAN备份保留REDUNDANCY策略和9i相比发生了改变。
在Oracle9i中,备份保留策略的REDUNDANCY的值,指的是备份冗余的个数。也就是说,如果REDUNDANCY设置为1,那么Oracle会保留2个备份。
但是在10g以后,REDUNDANCY的值,就是最终备份保留的值,手头没有10g的环境,用11g的rman做了一个例子:
solaris*orcl-/home/oracle$ rman target /
Recovery Manager: Release 11.2.0.3.0 - Production on Sun Jul 8 19:04:43 2012
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL (DBID=1299676637)
RMAN> show retention policy;
using target database control file instead of recovery catalog
RMAN configuration parameters for database with db_unique_name ORCL are:
CONFIGURE RETENTION POLICY TO REDUNDANCY 1; # default
RMAN> backup tablespace ts_32k;
Starting backup at 08-JUL-12
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=180 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datbafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
channel ORA_DISK_1: starting piece 1 at 08-JUL-12
channel ORA_DISK_1: finished piece 1 at 08-JUL-12
piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp tag=TAG20120708T190559 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 08-JUL-12
RMAN> backup tablespace ts_32k;
Starting backup at 08-JUL-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datbafile(s) in backup set
input datafile file number=00005 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
channel ORA_DISK_1: starting piece 1 at 08-JUL-12
channel ORA_DISK_1: finished piece 1 at 08-JUL-12
piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190609_7zltf22b_.bkp tag=TAG20120708T190609 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 08-JUL-12
RMAN> list backup of tablespace ts_32k;
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
20 Full 2.69M DISK 00:00:01 08-JUL-12
BP Key: 20 Status: AVAILABLE Compressed: NO Tag: TAG20120708T190559
Piece Name: /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp
List of Datafiles in backup set 20
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
5 Full 28932281 08-JUL-12 /u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
21 Full 2.69M DISK 00:00:01 08-JUL-12
BP Key: 21 Status: AVAILABLE Compressed: NO Tag: TAG20120708T190609
Piece Name: /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190609_7zltf22b_.bkp
List of Datafiles in backup set 21
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
5 Full 28932300 08-JUL-12 /u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts_32k_7w1w3zmb_.dbf
RMAN> delete obsolete;
RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Backup Set 20 08-JUL-12
Backup Piece 20 08-JUL-12 /u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp
Do you really want to delete the above objects (enter YES or NO)? yes
deleted backup piece
backup piece handle=/u01/app/oracle/flash_recovery_area/ORCL/backupset/2012_07_08/o1_mf_nnndf_TAG20120708T190559_7zltdqxy_.bkp RECID=20 STAMP=788123159
Deleted 1 objects
可以看到,从10g开始设置的REDUNDANCY的值,就是最终备份保留的个数。为了确认这个问题,特意查询了一下9i和10g的官方文档。
9i的说法是:
The REDUNDANCY parameter specifies that any number of backups or copies beyond a specified number need not be retained.
而10g的文档中,该参数的描述变为:
A redundancy-based backup retention policy determines whether a backup is obsolete based on how many backups of a file are currently on disk.
Oracle改变功能的实现很常见,但是没有想到,对于这种细节的定义也会调整。对于熟悉9i备份策略的DBA需要留神,在设置10g以后的RMAN备份保留策略时,需要在9i的基础上增加1。
客户数据库版本为9208 RAC FOR AIX,客户反应系统缓慢,检查告警日志,发现大量Library cache lock和Library cache load lock等待。
由于客户的原因,这个问题只是远程协助的方式帮忙检查了一下,因此没有留下任何的操作记录,这里只是简单描述一下问题。
客户反应数据库操作响应变慢,平时一个执行很快的基于主键的UPDATE操作也变得异常缓慢,且执行计划本身并未发生改变。
登录数据库后检查两个节点上的告警日志,并未发现任何异常报错。分别检查两个实例的等待信息,发现除了上面提到的大量Library cache lock和Library cache load lock以外,还有明显的gc等待。
但是随后发现,查询V$SESSION和GV$SESSION的结果居然没有区别,接着查询GV$INSTANTBCE视图,发现只有当前的实例存在,而此时恰好连接另一个节点的工具出现了断连,以至于我一度以为另外一个节点上的实例已经DOWN掉,但是随后重新登录到该节点上,发现数据库实例仍然存在,而且登录到数据库实例中也可以进行任何正常的操作。不过发现在当前节点所有的GV$视图都只会返回当前实例的信息,这与另外一个节点的情况完全一样。显然两个节点间的通信出现了问题,当前节点已经不清楚另外一个节点的状态的。
现在再去分析那些等待信息已经没有太多的意义了,因为整个数据库已经处于不正常的状态。不难推断,当前数据库的异常是由于节点间的通信异常导致。由于9i使用的操作系统的CLUSTER,还没有Oracle的clusterware,剩下只能由操作系统或硬件维护人员去进一步跟踪了。
最终数据库和系统在夜间闲时进行了重启操作,重启后数据库恢复正常,GV$视图的结果也恢复了正常。
正常情况下,DBMS_OUTPUT包无法直接输出一个空行。
以前还真没有注意这个问题,前两天想在输出结果的时候进行一下简单的格式化,发现了这个问题:
SQL> set serverout on
SQL> begin
2 dbms_output.put_line('a');
3 dbms_output.put_line(' ');
4 dbms_output.put_line('b');
5 dbms_output.new_line;
6 dbms_output.put_line('c');
7 end;
8 /
a
b
c
PL/SQL procedure successfully completed.
导致问题的原因在于,如果使用DBMS_OUTPUTB包输出的一行都是不可见字符,那么这行内容被DBMS_OUTPUT包忽略掉。
虽然DBMS_OUTPUT包本身并没有提供开关来屏蔽这个属性,不过这个问题依然很容易解决,最简单的方法莫过于直接把回车包含在字符串中:
SQL> begin
2 dbms_output.put_line('a
3
4 b');
5 dbms_output.put_line('
6 c');
7 end;
8 /
a
b
c
PL/SQL procedure successfully completed.
当然这种方法有可能导致PL/SQL代码的可读性变差,也容易影响代码的缩进格式,此外还有一种方式:
SQL> begin
2 dbms_output.put_line('a' || chr(10) || chr(13));
3 dbms_output.put_line('b');
4 dbms_output.put_line(chr(10) || chr(13) || 'c');
5 end;
6 /
a
b
c
PL/SQL procedure successfully completed.
客户10.2.0.4环境告警日志出现ORA-27468错误。
详细错误信息为:
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_j000_18724.trc:
ORA-12012: error on auto execute of job 42791
ORA-27468: "EXFSYS.RLM$EVTCLEANUP" is locked by another process
导致这个错误的原因在于升级时没有执行catupgrd.sql,而是执行了建库的部分脚本如catalog.sql和catproc.sql,这导致只有CATALOG视图和系统的PACKAGE以及TYPE的版本更新到10.2.0.4,而其他数据库中组件的版本并没有升级,仍然是10.2.0.1。
在MOS文档ORA-12012 ORA-27468: "SYS.PURGE_LOG" is Locked by Another Process [ID 751884.1]中介绍了这个错误,这个问题可能发生在10.2.0.2到10.2.0.5之间,解决问题的方法很简单,在闲时执行catbupgrd.sql,完成升级组件的后续操作既可。
客户的10.2.0.4 RAC for Hp-un环境碰到了这个错误。
错误信息为:
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_11261.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_32036.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_5935.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_5026.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:05 2012
Errors in file /opt/app/oracle/admin/orcl/udump/orcl1_ora_7620.trc:
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:42:08 2012
Trace dumping is performing id=[cdmp_20120229194207]
Wed Feb 29 19:42:17 2012
Trace dumping is performing id=[cdmp_20120229194217]
这个ORA-600[qersqCloseRem-2]错误非常罕见,在MOS上居然没有任何记载。不过从tb错误信息进行进一步的分析,这个错误发生在远端数据库的访问异常。
检查进一步的详细信息:
*** 2012-02-29 19:42:05.564
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [qersqCloseRem-2], [Invalid Handle], [], [], [], [], [], []
ORA-02068: following severe error from WEBDB.COM
ORA-03113: end-of-file on communication channel
Current SQL statement for this session:
SELECT ACCESS_LOG_SEQUENCE.NEXTVAL@WEBDB.COM FROM DUAL
----- PL/SQL Call Stack -----
object line object
handle number name
0x39b5c3720 5 ECOMMERCE.P_USER_AT
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
ksedmp()+610 call ksedst() 000000000 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ?
7FBFFF4370 ? 7FBFFF43D0 ?
7FBFFF4310 ? 000000000 ?
.
.
.
0059DF200 ? 683F6E400000001 ?
main()+116 call opimai_real() 000000002 ? 7FBFFFF4E0 ?
000000004 ? 7FBFFFF478 ?
0059DF200 ? 683F6E400000001 ?
__libc_start_main() call main() 000000002 ? 7FBFFFF4E0 ?
+219 000000004 ? 7FBFFFF478 ?
0059DF200 ? 683F6E400000001 ?
_start()+42 call __libc_start_main() 0007139F8 ? 000000002 ?
7FBFFFF628 ? 0052B4BD0 ?
000000000 ? 000000002 ?
--------------------- Binary Stack Dump ---------------------
从详细TRACE分析,在问题发生时刻,正在通过数据库链读取远端序列的值。而此时出现的ORA-3113通信错误,多半与远端数据库状态异常有关。
检查远端数据库的告警日志,果然发现在问题出现时刻,数据库状态异常并最终导致了实例重启:
Wed Feb 29 19:39:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:39:30 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:40:01 2012
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:28 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:29 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:43:30 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:45:26 2012
PMON failed to acquire latch, see PMON dump
Wed Feb 29 19:46:32 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:33 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:34 2012
PMON failed to acquire latch, see PMON dump
Wed Feb 29 19:46:40 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:43 2012
WARNING: inbound connection timed out (ORA-3136)
Wed Feb 29 19:46:44 2012
Errors in file /opt/app/oracle/admin/orcl/bdump/orcl1_asmb_14614.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Wed Feb 29 19:46:44 2012
ASMB: terminating instance due to error 15064
Wed Feb 29 19:46:44 2012
System state dump is made for local instance
System State dumped to trace file /opt/app/oracle/admin/orcl/bdump/orcl1_diag_14555.trc
Wed Feb 29 19:46:47 2012
Shutting down instance (abort)
License high water mark = 1623
Wed Feb 29 19:46:49 2012
Instance terminated by ASMB, pid = 14614
Wed Feb 29 19:46:52 2012
Instance terminated by USER, pid = 3684
显然远端数据库状态异常是这个ORA-600错误的直接原因。
简单描述一下Oracle VM Server的安装过程。
需要注意,VM 3.0以上版本才支持升级操作,因此在VM 2.2没有办法升级到当前版本,安装VM 3.0将会删除服务器上所有的数据。
将VM Server的光盘放入,并从光盘启动服务器。
在启动界面直输入Enter开始安装过程:
Oracle会提示是否监测截至,这里可以直接SKIP跳过;
键盘选择:选择us;
然后是版权声明,选择Accept后,开始正式的安装步骤;
如果服务器上没有系统,那么会直接进入后面的分区阶段,否则会提示重装系统还是在原有系统上升级;
选择ReInstall后,会显示当前系统磁盘分区信息,首先选择准备进行系统安装的分区,然后选择Remove all partitions and create a new defaultb partition layout,Oracle在格式化分区之前会要求再次确认,并询问是否预览分区空间详细配置,可以完全按照默认推荐值安装,因此这里可以跳过,也可以进入到分区空间修改页面进行自定义的修改;
随后选择Boot Loader配置,选择Master Boot Record;
然后选择一个管理网络接口,手工输入IP和掩码,在下一个页面输入网关、DNS信息,接着是主机名信息;
配置服务器所在时区,配置中找不到北京,可以tb设置Asia/Shanghai代替;
分别输入Agent密码和root密码后,安装操作完成,这是会提示整个安装的日志文件的位置。
在重启界面选择REBOOT,完成整个安装过程。
启动后,进入Oracle VM Server 3.0控制台界面,可以通过Alt + F2进入linux的登录界面。至此VM Server安装完成。
在VM Manager中搜索VM Server时出现这个错误。
按照VM Server以及VM Manager后,通过指定IP地址,让VM Manager自动寻找VM Server,结果JOB运行报错,详细的错误信息为:
Job Construction Phase
----------------------
begin()
Appended operation 'Discover Manager Server Discover' to object 'OVM Foundry : Discover Manager'.
commit()
Completed Step: COMMIT
Objects and Operations
----------------------
Object (IN_USE): [Server] 35:38:33:39:31:34:43:4e:47:31:33:30:53:37:33:42 (server2.zihexin.com)
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Operation: Discover Manager Server Discover
Job Running Phase at 18:05 on Fri, Nov 25, 2011
----------------------------------------------
Job Participants: []
Actioner
--------
Starting operation 'Discover Manager Server Discover' on object 'OVM Foundry : Discover Manager'
Setting Context to model only in job with id=1322215534120
Job Internal Error (Operation)com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Attempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:474)
at com.oracle.ovm.mgr.action.ActionEngine.sendDiscoverCommand(ActionEngine.java:283)
at com.oracle.ovm.mgr.action.ServerAction.getServerInfo(ServerAction.java:95)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:131)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:61)
at com.oracle.ovm.mgr.discover.ovm.DiscoverHandler.execute(DiscoverHandler.java:50)
at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:435)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:345)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:265)
at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:191)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:257)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1019)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:223)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:136)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:751)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:395)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:453)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:752)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:470)
... 24 more
FailedOperationCleanup
----------
Starting failed operation 'Discover Manager Server Discover' cleanup on object 'OVM Foundry : Discover Manager'
Complete rollback operation 'Discover Manager Server Discover' completed with direction=OVM Foundry : Discover Manager
Rollbacker
----------
Objects To Be Rolled Back
-------------------------
Object (IN_USE): [Server] 35:38:33:39:31:34:43:4e:47:31:33:30:53:37:33:42 (server2.zihexin.com)
Object (IN_USE): [DiscoverManager] OVM Foundry : Discover Manager
Completed Step: ROLLBACK
Job failed commit (internal) due to OVMAPI_4010E Attempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
com.oracle.ovm.mgr.api.exception.FailedOperationException: OVMAPI_4010E Atbtempt to send command: discover_server to server: 10.0.10.171 failed. OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActionEngine.java:474)
at com.oracle.ovm.mgr.action.ActionEngine.sendDiscoverCommand(ActionEngine.java:283)
at com.oracle.ovm.mgr.action.ServerAction.getServerInfo(ServerAction.java:95)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:131)
at com.oracle.ovm.mgr.discover.ovm.ServerBasicDiscoverHandler.query(ServerBasicDiscoverHandler.java:61)
at com.oracle.ovm.mgr.discover.ovm.DiscoverHandler.execute(DiscoverHandler.java:50)
at com.oracle.ovm.mgr.discover.DiscoverEngine.handleDiscover(DiscoverEngine.java:435)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverNewServer(DiscoverEngine.java:345)
at com.oracle.ovm.mgr.discover.DiscoverEngine.discoverServer(DiscoverEngine.java:265)
at com.oracle.ovm.mgr.op.manager.DiscoverManagerServerDiscover.action(DiscoverManagerServerDiscover.java:48)
at com.oracle.ovm.mgr.api.job.JobEngine.operationActioner(JobEngine.java:191)
at com.oracle.ovm.mgr.api.job.JobEngine.objectActioner(JobEngine.java:257)
at com.oracle.ovm.mgr.api.job.InternalJobDbImpl.objectCommitter(InternalJobDbImpl.java:1019)
at sun.reflect.GeneratedMethodAccessor1001.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.oracle.odof.core.AbstractVessel.invokeMethod(AbstractVessel.java:223)
at com.oracle.odof.core.BasicWork.invokeMethod(BasicWork.java:136)
at com.oracle.odof.command.InvokeMethodCommand.process(InvokeMethodCommand.java:100)
at com.oracle.odof.core.BasicWork.processCommand(BasicWork.java:81)
at com.oracle.odof.core.TransactionManager.processCommand(TransactionManager.java:751)
at com.oracle.odof.core.WorkflowManager.processCommand(WorkflowManager.java:395)
at com.oracle.odof.core.WorkflowManager.processWork(WorkflowManager.java:453)
at com.oracle.odof.io.AbstractClient.run(AbstractClient.java:42)
at java.lang.Thread.run(Thread.java:662)
Caused by: com.oracle.ovm.mgr.api.exception.IllegalOperationException: OVMAPI_4004E Server Failed Command: discover_server, Status:
Fri Nov 25 18:05:34 CST 2011
at com.oracle.ovm.mgr.action.ActionEngine.sendAction(ActionEngine.java:752)
at com.oracle.ovm.mgr.action.ActionEngine.sendCommandToServer(ActbionEngine.java:470)
... 24 more
----------
End of Job
----------
由于关键性信息确实,所以无法判断导致错误的原因。即使是在metalink或GOOGLE中查询,也得不到任何有价值的信息。
虽然在VM Manager中得不到有意义的信息,但是在VM Server上,却可以得到更详细的信息,通过检查var/log/ovs-agent.log文件,获取到下面的信息:
[2011-04-16 13:21:46 25970] ERROR (OVSAgentServer:108) Unauthorized access attempt from ('10.0.10.173', 59424)!
Traceback (most recent call last):
File "/opt/ovs-agent-3.0/OVSAgentServer.py", line 103, in do_POST
auth(username, password)
File "/opt/ovs-agent-3.0/OVSAgentServer.py", line 42, in auth
raise Exception('Authorization failed: user does not exist or password error.')
Exception: Authorization failed: user does not exist or password error.
[2011-04-16 13:21:46 25970] INFO (OVSAgentServer:169) code 403, message Unauthorized access attempt from ('10.0.10.173', 59424)!
这次信息就明确多了,显然是由于VM Manager中配置的密码不正确所致,在VM Server上修改oracle用户密码:
[root@server2 ~]# ovs-agent-passwd oracle
Password:
Again:
在搜索VM Server时使用这里修改的密码,VM Manager成功的发现了VM Server信息。
一个客户碰到的具体需求,分区表中有些分区所在的表空间被OFFLINE,tb导致在删除统计信息时报错。
下面通过例子来说明这个问题:
SQL> create table t_part_read (id number)
2 partition by range (id)
3 (partition p1 values less than (10) tablespace ts1,
4 partition p2 values less than (20) tablespace ts2,
5 partition pmax values less than (maxvalue) tablespace users);
Table created.
SQL> insert into t_part_read select rownum from tab;
54 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
PL/SQL procedure successfully completed.
SQL> alter tablespace ts1 read only;
Tablespace altered.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
PL/SQL procedure successfully completed.
SQL> alter tablespace ts1 offline;
Tablespace altered.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
BEGIN dbms_stats.gather_table_stats(user, 'T_PART_READ'); END;
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts1_7w8l5fz1_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 23829
ORA-06512: at "SYS.DBMS_STATS", line 23880
ORA-06512: at line 1
如果将表空间只读,并不会影响到表空间上的表或分区的统计信息的收集,因为收集过程只是读取,而收集的结果信息是写到SYSTEM表空间的。
但是如果分区所在的表空间处于OFFLINE状态,那么在统计信息收集的过程中就会报错。
有一个很简单的方法可以解决这个问题,就是将被OFFLINE影响的分区的统计信息锁定,这样Oracle在收集统计信息时就会跳过锁定的分区,通过这个办法就可以避免统计信息收集过程中的报错:
SQL> exec dbms_stats.lock_partition_stats(user, 'T_PART_READ', 'P1')
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ')
BEGIN dbms_stats.gather_table_stats(user, 'T_PART_READ'); END;
*
ERROR at line 1:
ORA-00376: file 6 cannot be read at this time
ORA-01110: data file 6: '/u01/app/oracle/oradata/ORCL/datafile/o1_mf_ts1_7w8l5fz1_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 23829
ORA-06512: at "SYS.DBMS_STATS", line 23880
ORA-06512: at line 1
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART_READ', granularity => 'PARTITION')
PL/SQL procedure successfully completed.
即使锁定分区后,尝试收集统计信息仍然报错,这是因为Oracle默认除了要收集分区上的统计信息以外,还要收集表级的统计信息,而这就会造成被OFFLINE影响的分区也要被读取。
解决方法就是在收集统计信息的时候指定收集的粒度是分区,不收集表上的GLOBAL信息。
当用户密码即将过期时,在登录时Oracle会提示ORA-28002错误,但是并不会影响正常的登录。
本来认为这个信息并没有太大的影响,但是没想到这个tb错误会导致SET AUTOTRACE功能失效:
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 11:27:28 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
ERROR:
ORA-28002: the password will expire within 1 days
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
ERROR:
ORA-28002: the password will expire within 1 days
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL> alter user test identified by test;
User altered.
SQL> set autot on
ERROR:
ORA-24315: illegal attribute type
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 13 11:27:52 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
SQL> insert into t values (1, 'a');
1 row created.
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 12 | 1 (0)| 00:00:01 |
| 1 | LOAD TABLE CONVENTIONAL | T | | | | |
---------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
71 recursive calls
9 db block gets
75 consistent gets
10 physical reads
1284 redo size
829 bytes sent via SQL*Net to client
785 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
9 sorts (memory)
0 sorts (disk)
1 rows processed
很显然,由于ORA-28002错误导致了SET AUTOTRACE ON功能启用时碰到了错误。当修改了当前的用户密码,则ORA-28002不再出现后,SET AUTOTRACE ON的功能恢复正常。
根据上面的信息其实可以判断,在启用SET AUTOTRACE ON功能时,sqlplus会自动创建一个新的会话来记录当前会话的统计信息。
而启用的新的会话会使用当前会话登录时保存的密码来进行登录,因此可以通过下面的例子来验证这个推论:
solaris*orcl-/home/oracle$ sqlplus test/test
SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 01:28:38 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
SQL> set autot on
SQL> set autot off
SQL> alter user test identified by test1;
User altered.
SQL> set autot on
ERROR:
ORA-01017: invalid username/password; logon denied
SP2-0619: Error while connecting
SP2-0611: Error enabling STATISTICS report
果然,在用户登录后,如果当前的密码被修改,是会导致SET AUTOTRACE ON启用时报错不正确的用户名密码错误的。
Oracle10g的DBA_TAB_STATISTICS视图的STATTYPE_LOCKED列没有tb正确的显示结果。
看一个简单的例子:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> create table t_part (id number, name varchar2(30))
2 partition by range (id)
3 (partition p1 values less than (10),
4 partition p2 values less than (20),
5 partition pmax values less than (maxvalue));
Table created.
SQL> select table_name, partition_name, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME STATT
------------------------------ ------------------------------ -----
T_PART
T_PART P1
T_PART P2
T_PART PMAX
SQL> exec dbms_stats.lock_partition_stats(user, 'T_PART', 'P1')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME STATT
------------------------------ ------------------------------ -----
T_PART
T_PART P1
T_PART P2
T_PART PMAX
SQL> exec dbms_stats.gather_table_stats(user, 'T_PART')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, last_analyzed, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME LAST_ANAL STATT
------------------------------ ------------------------------ --------- -----
T_PART 16-JUL-12
T_PART P1
T_PART P2 16-JUL-12
T_PART PMAX 16-JUL-12
可以看到在10.2环境中,LOCK_PARTITION_STATS过程是正常工作的,但是DBA_TAB_STATISTICS视图的STATTYPE_LOCKED列并没有正确的显示分区被锁定的结果。
而对于表来说,LOCK_TABLE_STATS过程执行后,STATTYPE_LOCKED的结果显示是正常的:
SQL> exec dbms_stats.lock_table_stats(user, 'T_PART')
PL/SQL procedure successfully completed.
SQL> select table_name, partition_name, last_analyzed, stattype_locked from all_tab_statistics where wner = user and table_name = 'T_PART';
TABLE_NAME PARTITION_NAME LAST_ANAL STATT
------------------------------ ------------------------------ --------- -----
T_PART 16-JUL-12 ALL
T_PART P1 ALL
T_PART P2 16-JUL-12 ALL
T_PART PMAX 16-JUL-12 ALL
这说明在10.2中,Oracle对于分区列的锁定的支持是存在问题的。查询了一下MOS,Oracle将这个问题确认为内部BUG:7240460,这个问题在11.1.0.7中被FIXED。
而在11.2中,这个问题以及不存在了:
SQL> select * from v$version;
BANNER
----------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Solaris: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select owner, table_name, partition_name, stattype_locked
2 from dba_tab_statistics
3 where wner = 'TEST'
4 and table_name = 'T_PART';
OWNER TABLE_NAME PARTITION_NAME STATT
---------- ------------ --------------- -----
TEST T_PART
TEST T_PART P2
TEST T_PART P3
TEST T_PART P4
TEST T_PART P5
TEST T_PART PMAX
6 rows selected.
SQL> exec dbms_stats.lock_partition_stats('TEST', 'T_PART', 'P2')
PL/SQL procedure successfully completed.
SQL> select owner, table_name, partition_name, stattype_locked
2 from dba_tab_statistics
3 where wner = 'TEST'
4 and table_name = 'T_PART';
OWNER TABLE_NAME PARTITION_NAME STATT
---------- ------------ --------------- -----
TEST T_PART
TEST T_PART P2 ALL
TEST T_PART P3
TEST T_PART P4
TEST T_PART P5
TEST T_PART PMAX
6 rows selected.