posts - 42,comments - 83,trackbacks - 0

        有个客户跟我说,他们把weblogic从923升级到923以后,总是提示failed to resume transaction的。当时没有看到任何信息,以为跟JTA超时有关,让客户把JTA timeout加大,同时把下面的设置加入jdbc-config.xml中。客户反馈过来说:还是不行。 :( 

    <keep-xa-conn-till-tx-complete>true</keep-xa-conn-till-tx-complete>
    <rollback-local-tx-upon-conn-close>true</rollback-local-tx-upon-conn-close>

        比较郁闷,后来想想,客户的jdbc是使用non-XA的driver, 同时将emulate-2pc设为了true,上面连个参数是针对XA的,测试没有效果应该是正常的。:)  客户把具体错误信息给我发了过来,如下:

####<2008-10-27 上午10时35分28秒 CST> <Error> <JDBC> <SZSEWEB-YSXAPP> <appServer11> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1225074928234> <BEA-001112> <Test "SELECT 1" set up for pool "szseWebDataSource" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]服务器无法继续执行该事务。说明: 3c00000047。".> 

        奇怪了,对于non-XA的connection, weblogic做测试的时候,应该是不会其transaction的,而且是直接拿底层的物理connection来做个select,怎么会有transaction resume呢? 是不是底层物理连接出现了问题? 跟客户沟通了一下,大概了解了他们的应用:通过jdbc调用SQL Server的stored procedure,而stored procedure中会起自己的transaction。 Tx中的操作分两种类型:
1:业务简单,数据量大
2:数据不大,但业务比较复杂
总之,TX中的SQL执行时间比较长。

        开始怀疑跟客户的stored procedure有关,建议他先去掉里面的TX,果然奏效。因为transaction一直是我比较模糊的董东,也不敢跟客户说:你不能这么写,这么写是不可以的。毕竟自己底气不足啊。不是很清楚jdbc connection的auto commit提交的到底是哪个事务,是driver的tx,还是stored procedure里的tx。应该是前者吧。花了一上午,自己搭了个测试环境,终于复现了这个问题:

存储过程:

1 create proc dbo.TestProc
2 as
3 begin transaction
4     waitfor delay '00:02:00'
5     insert into dbo.TestT_1 values('test')
6 commit


JDBC 代码:

  1 package com.bea.cs.test.jdbc;
  2 
  3 import com.bea.cs.test.utils.JNDIRetriver;
  4 import java.sql.*;
  5 import java.io.*;
  6 import javax.transaction.*;
  7 
  8 public class SQLServerJDBCTest {
  9     
 10     public static void main(String args[])
 11     {
 12         SQLServerJDBCTest test = new SQLServerJDBCTest();
 13         
 14         for(int loop=0; loop<15; loop++)
 15             test.callProc("jdbc/SQLServerNonXADS", loop);
 16             
 17         try{
 18             Thread.currentThread().sleep(10000);
 19         }catch(Exception e){}
 20         
 21         for(int loop=0; loop<15; loop++)
 22             test.checkAutoCommit("jdbc/SQLServerNonXADS");
 23     }
 24     
 25     public void checkAutoCommit(String dsName)
 26     {
 27         CheckAutoCommitThread cacThread = new CheckAutoCommitThread(dsName);
 28         cacThread.start();
 29     }
 30     
 31     class CheckAutoCommitThread extends Thread
 32     {
 33         private String dsName = null;
 34         
 35         public CheckAutoCommitThread(String ds){
 36             dsName = ds;
 37         }
 38         
 39     private void callProc(String dsName, int loop)
 40     {
 41         ProcThread procThread = new ProcThread(dsName, loop);
 42         procThread.start();
 43     }
 44     
 45     class ProcThread extends Thread
 46     {
 47         private String ds = null;
 48         private int id = -1;
 49         
 50         public ProcThread(String dsName, int loop)
 51         {
 52             ds = dsName;
 53             id = loop;
 54         }
 55         
 56         public void run()
 57         {
 58             String url = "t3://10.130.2.90:7001";
 59             String sql = "{ call TestProc() }";
 60             Connection conn = null;
 61             JNDIRetriver retriever = new JNDIRetriver(url);
 62             try{
 63                 conn = retriever.getJBDCConnection(ds);
 64                 boolean autoCommit = conn.getAutoCommit();
 65                 CallableStatement cstmt = conn.prepareCall(sql);
 66                 
 67                 //start a thread to close current connection, so that a connection
 68                 //that attachs a tx will be returned to connection pool and when
 69                 //it's retrieved from connection pool by other client, the error
 70                 //will be reproduced.
 71                 ConnCloseThread closeThread = new ConnCloseThread(conn, id);
 72                 closeThread.start();
 73                 long start = System.currentTimeMillis();
 74                 System.out.println( "execute-" + id + "starts at: " + start/1000.0);
 75                 cstmt.execute();
 76                 long end = System.currentTimeMillis();
 77                 System.out.println("statement " + id + " execute: " + (end-start)/1000.0);
 78                 conn.close();
 79             }catch(Exception e)
 80             {
 81                 try{
 82                     System.out.println("connection is closed for exception: " + e.getMessage());
 83                     conn.close();
 84                 }catch(Exception e1){}
 85                 e.printStackTrace();
 86             }
 87         }    
 88     }
 89     
 90     class ConnCloseThread extends Thread
 91     {
 92         private Connection connection = null;
 93         private int id = -1;
 94         
 95         public ConnCloseThread(Connection conn, int loop){
 96             connection = conn;
 97             id = loop;
 98         }
 99         
100         public void run()
101         {
102             try{
103                 Thread.currentThread().sleep(10000);
104                 //connection.rollback();
105                 long start = System.currentTimeMillis();
106                 System.out.println( "closeConn-" + id + "starts at: " + start/1000.0);
107                 connection.close();
108                 long end = System.currentTimeMillis();
109                 System.out.println("close connection " + id + " takes: " + (end -start)/1000.0);
110             }catch(Exception e){}
111         }
112     }
113 }
114 
115 


        测试结果和预想的有点出入:ConnCloseThread中关闭连接的时候,不是立刻返回的。Connection.close()会触发Connection.commit(),而因为调用的存储过程中,存储过程起了自己的事务,connection.commit()必须等到存储过程结束才能完成(这个是microsoft论坛上看到的)。如果所有connection.close()都等到tx commit或rollback完成才执行的话,这个问题就不会出现了。看看我的测试结果:

statement 5 execute:表示从存储过程调用开始,到调用返回的时间
close connection 5 takes:表示关闭连接耗费的时间(也就是connection.commit()等待存储过程事务结果的时间)

statement 5 execute: 125.922
close connection 5 takes: 148.39
statement 14 execute: 130.031
close connection 14 takes: 148.39
statement 2 execute: 134.031
close connection 2 takes: 148.39
statement 6 execute: 138.14
close connection 6 takes: 148.406
statement 8 execute: 142.14
close connection 8 takes: 148.406
statement 0 execute: 146.156
close connection 0 takes: 148.406
statement 3 execute: 162.39
close connection 3 takes: 168.625
statement 11 execute: 166.39
close connection 11 takes: 168.625

statement 13 execute: 120.0
close connection 13 takes: 115.359
statement 12 execute: 150.265
close connection 12 takes: 148.406
statement 9 execute: 154.281
close connection 9 takes: 148.406
statement 1 execute: 158.39
close connection 1 takes: 148.406
statement 4 execute: 170.5
close connection 4 takes: 168.625
statement 10 execute: 174.515
close connection 10 takes: 168.625
statement 7 execute: 178.609
close connection 7 takes: 168.625

####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966102> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3b00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966132> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3e00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '31' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966142> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3800000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966162> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3a00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '29' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966172> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3400000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966172> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3600000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966182> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3f00000001.".>


        从测试结果来看,凡是close connection耗时比execute statement短的,连接(物理连接)都会报出该问题。分析原因:通过weblogic datasource获取的connection并不是物理connection,而是由weblogic wrapped的connection。这些conection在被close后,并不会关闭物理连接,而只是将物理连接还池。我们对connection的所有操作,最终都会被delegated到底层物理连接上,即commit(),rollback()最终都是在物理连接上执行。如果上面的connection.close(),底层物理连接没有等到存储过程事务结束就返回的话,那么物理连接上应该还带有此次操作的事务,而weblogic这边不会关系物理连接的情况,直接将连接放入connection pool供其它客户端使用。这时候如果设定了test on reserve的话,下次客户端从data source获取连接时,weblogic会检查这个物理连接,作一个select操作的,这个有问题的连接就会暴露出来,也就是上面的异常。这个问题如果使用driver manager来获取连接的话(如果每次都关闭的话),则不会出现,因为使用的物理连接每次都是不同的。还好,weblogic会帮忙重新创建有问题的连接。原因大概了解了,但这是谁的问题呢? 为什么connection.close()不等存储过程的事务结束?

        结论:一般而言,我们不建议通过JDBC调用存储过程的时候,在存储过程中定义事务,应该将tx的管理工作交给jdbc去做。 non-xa如此,xa亦如此,毕竟事务嵌套了以后,管理起来是个问题,完整性更是个问题。

posted on 2008-10-28 19:38 走走停停又三年 阅读(8992) 评论(0)  编辑  收藏 所属分类: Database

只有注册用户登录后才能发表评论。


网站导航: