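A customer told me that after upgrading WebLogic to 923 they kept getting "failed to resume transaction" errors. At that point I had not seen any detailed information and assumed it was related to the JTA timeout, so I asked the customer to increase the JTA timeout and to add two XA-related settings to jdbc-config.xml. The customer's feedback: still not working. :(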
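Rather frustrating. Thinking about it afterwards: the customer's JDBC data source uses a non-XA driver with emulate-2pc set to true, while the two settings above only apply to XA, so it is only natural that they had no effect. :) The customer then sent me the actual error message (the SQL Server text in it is the Chinese-locale form of "The server failed to resume the transaction"):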
####<2008-10-27 上午10时35分28秒 CST> <Error> <JDBC> <SZSEWEB-YSXAPP> <appServer11> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1225074928234> <BEA-001112> <Test "SELECT 1" set up for pool "szseWebDataSource" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]服务器无法继续执行该事务。说明: 3c00000047。".>
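Strange. For a non-XA connection, WebLogic should not start a transaction when it tests the connection; it simply takes the underlying physical connection and runs a select on it. So where does a transaction resume come from? Is something wrong with the underlying physical connection? I talked with the customer and got a rough picture of their application: it calls SQL Server stored procedures through JDBC, and the stored procedures start their own transactions. The operations inside the transaction fall into two types:
I began to suspect the customer's stored procedures and suggested removing the transaction from them first. Sure enough, that worked. Transactions have always been a rather fuzzy area for me, so I did not dare tell the customer "you must not write it this way"; I simply did not have the confidence. I was not sure which transaction a JDBC connection's auto-commit actually commits: the driver's transaction, or the one inside the stored procedure. Presumably the former. I spent a morning setting up a test environment and finally reproduced the problem: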
create proc dbo.TestProc
as
    begin transaction
    waitfor delay '00:02:00'
    insert into dbo.TestT_1 values('test')
    commit
JDBC code:
package com.bea.cs.test.jdbc;

import com.bea.cs.test.utils.JNDIRetriver;
import java.sql.CallableStatement;
import java.sql.Connection;

public class SQLServerJDBCTest {

    public static void main(String[] args) {
        SQLServerJDBCTest test = new SQLServerJDBCTest();

        // Start 15 stored procedure calls, each on its own pooled connection.
        for (int loop = 0; loop < 15; loop++)
            test.callProc("jdbc/SQLServerNonXADS", loop);

        try {
            Thread.sleep(10000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // Then request connections from the same data source again.
        for (int loop = 0; loop < 15; loop++)
            test.checkAutoCommit("jdbc/SQLServerNonXADS");
    }

    public void checkAutoCommit(String dsName) {
        CheckAutoCommitThread cacThread = new CheckAutoCommitThread(dsName);
        cacThread.start();
    }

    class CheckAutoCommitThread extends Thread {
        private String dsName = null;

        public CheckAutoCommitThread(String ds) {
            dsName = ds;
        }

        // NOTE: run() was not part of the original listing. This body is an
        // assumption: reserve a connection from the data source (which triggers
        // test-on-reserve), print its auto-commit state, and return it to the pool.
        public void run() {
            Connection conn = null;
            try {
                conn = new JNDIRetriver("t3://").getJBDCConnection(dsName);
                System.out.println("autoCommit: " + conn.getAutoCommit());
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try { if (conn != null) conn.close(); } catch (Exception ignored) {}
            }
        }
    }

    private void callProc(String dsName, int loop) {
        ProcThread procThread = new ProcThread(dsName, loop);
        procThread.start();
    }

    class ProcThread extends Thread {
        private String ds = null;
        private int id = -1;

        public ProcThread(String dsName, int loop) {
            ds = dsName;
            id = loop;
        }

        public void run() {
            String url = "t3://";
            String sql = "{ call TestProc() }";
            Connection conn = null;
            JNDIRetriver retriever = new JNDIRetriver(url);
            try {
                conn = retriever.getJBDCConnection(ds);
                boolean autoCommit = conn.getAutoCommit();
                CallableStatement cstmt = conn.prepareCall(sql);

                // Start a thread that closes the current connection, so that a
                // connection still attached to a tx is returned to the connection
                // pool; when another client retrieves it from the pool, the
                // error is reproduced.
                ConnCloseThread closeThread = new ConnCloseThread(conn, id);
                closeThread.start();
                long start = System.currentTimeMillis();
                System.out.println("execute-" + id + " starts at: " + start / 1000.0);
                cstmt.execute();
                long end = System.currentTimeMillis();
                System.out.println("statement " + id + " execute: " + (end - start) / 1000.0);
                conn.close();
            } catch (Exception e) {
                System.out.println("connection is closed for exception: " + e.getMessage());
                try {
                    if (conn != null) conn.close();
                } catch (Exception e1) {
                    // ignore: the connection may already be closed
                }
                e.printStackTrace();
            }
        }
    }

    class ConnCloseThread extends Thread {
        private Connection connection = null;
        private int id = -1;

        public ConnCloseThread(Connection conn, int loop) {
            connection = conn;
            id = loop;
        }

        public void run() {
            try {
                // Give the stored procedure time to get inside its transaction.
                Thread.sleep(10000);
                //connection.rollback();
                long start = System.currentTimeMillis();
                System.out.println("closeConn-" + id + " starts at: " + start / 1000.0);
                connection.close();
                long end = System.currentTimeMillis();
                System.out.println("close connection " + id + " takes: " + (end - start) / 1000.0);
            } catch (Exception e) {
                // ignore: this thread only exists to force an early close
            }
        }
    }
}
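The test result was a little different from what I expected: closing the connection in ConnCloseThread does not return immediately. Connection.close() triggers Connection.commit(), and because the stored procedure being called starts its own transaction, connection.commit() has to wait until the stored procedure finishes (this is something I read on a Microsoft forum). If every connection.close() waited until the transaction had committed or rolled back, this problem would not occur. Here are my test results (all times in seconds):
statement 5 execute: the time from the start of the stored procedure call until the call returns
close connection 5 takes: the time spent closing the connection (that is, the time connection.commit() waits for the outcome of the stored procedure's transaction)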
statement 5 execute: 125.922
close connection 5 takes: 148.39
statement 14 execute: 130.031
close connection 14 takes: 148.39
statement 2 execute: 134.031
close connection 2 takes: 148.39
statement 6 execute: 138.14
close connection 6 takes: 148.406
statement 8 execute: 142.14
close connection 8 takes: 148.406
statement 0 execute: 146.156
close connection 0 takes: 148.406
statement 3 execute: 162.39
close connection 3 takes: 168.625
statement 11 execute: 166.39
close connection 11 takes: 168.625
statement 13 execute: 120.0
close connection 13 takes: 115.359
statement 12 execute: 150.265
close connection 12 takes: 148.406
statement 9 execute: 154.281
close connection 9 takes: 148.406
statement 1 execute: 158.39
close connection 1 takes: 148.406
statement 4 execute: 170.5
close connection 4 takes: 168.625
statement 10 execute: 174.515
close connection 10 takes: 168.625
statement 7 execute: 178.609
close connection 7 takes: 168.625
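The pairs above can be compared mechanically. The following is only a small sketch (the class and method names are made up for illustration) that lists the ids whose close() duration was shorter than the execute() duration, using the numbers copied from the results:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseVsExecute {
    // {id, execute seconds, close seconds}, copied from the results above.
    static final double[][] RESULTS = {
        {5, 125.922, 148.39}, {14, 130.031, 148.39}, {2, 134.031, 148.39},
        {6, 138.14, 148.406}, {8, 142.14, 148.406}, {0, 146.156, 148.406},
        {3, 162.39, 168.625}, {11, 166.39, 168.625}, {13, 120.0, 115.359},
        {12, 150.265, 148.406}, {9, 154.281, 148.406}, {1, 158.39, 148.406},
        {4, 170.5, 168.625}, {10, 174.515, 168.625}, {7, 178.609, 168.625},
    };

    /** Ids whose close() took less time than the statement's execute(). */
    static List<Integer> closedEarly() {
        List<Integer> ids = new ArrayList<>();
        for (double[] r : RESULTS) {
            if (r[2] < r[1]) {
                ids.add((int) r[0]);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints: close() shorter than execute(): [13, 12, 9, 1, 4, 10, 7]
        System.out.println("close() shorter than execute(): " + closedEarly());
    }
}
```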
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966102> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3b00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966132> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3e00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '31' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966142> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3800000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966162> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3a00000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '29' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966172> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3400000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966172> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3600000001.".>
####<Oct 28, 2008 5:59:26 PM CST> <Error> <JDBC> <fjin01> <AdminServer> <[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'> <<anonymous>> <> <> <1225187966182> <BEA-001112> <Test "SELECT 1" set up for pool "SQLServerNonXADS" failed with exception: "java.sql.SQLException: [BEA][SQLServer JDBC Driver][SQLServer]The server failed to resume the transaction. Desc:3f00000001.".>
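Judging from the test results, every connection (physical connection) whose close took less time than its statement execution reports this problem: there are seven such connections in this run (13, 12, 9, 1, 4, 10 and 7), matching the seven BEA-001112 errors above.
My analysis of the cause: a connection obtained from a WebLogic data source is not the physical connection but a connection wrapped by WebLogic. Closing such a connection does not close the physical connection; it only returns the physical connection to the pool. Every operation we perform on the connection is ultimately delegated to the underlying physical connection, so commit() and rollback() are in the end executed on the physical connection. If, during the connection.close() above, the underlying physical connection returns without waiting for the stored procedure's transaction to finish, then the physical connection should still carry the transaction of that operation. WebLogic does not care about the state of the physical connection and puts it straight back into the connection pool for other clients to use.
If test on reserve is enabled, then the next time a client gets a connection from the data source, WebLogic checks the physical connection by running a select on it, and the faulty connection is exposed. That is the exception shown above. The problem does not occur when connections are obtained through the driver manager (provided each one is closed after use), because a different physical connection is used every time. Fortunately, WebLogic recreates the faulty connections for us. So the cause is roughly understood, but whose problem is it? Why does connection.close() not wait for the stored procedure's transaction to finish?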
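To make the mechanism concrete, here is a toy model of a pool with wrapped connections and test on reserve. It is only a sketch under my own assumptions: every class, field and method name is hypothetical, and it does not reflect how WebLogic is actually implemented. It only shows why a physical connection that is returned with an unfinished transaction fails the test on the next reserve and gets replaced.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PoolSketch {
    /** Stand-in for a physical connection; it only tracks an unfinished server-side tx. */
    static class PhysicalConnection {
        boolean txOpen;   // the stored procedure's transaction has not finished yet
        boolean closed;

        void testQuery() {   // plays the role of the "SELECT 1" test
            if (txOpen) {
                throw new IllegalStateException("failed to resume the transaction");
            }
        }
    }

    /** What the application sees: close() hands the physical connection back to the pool. */
    static class PooledConnection {
        final PhysicalConnection physical;
        private final Pool pool;

        PooledConnection(PhysicalConnection physical, Pool pool) {
            this.physical = physical;
            this.pool = pool;
        }

        void close() {
            pool.release(physical);   // the physical connection stays open
        }
    }

    static class Pool {
        private final Deque<PhysicalConnection> idle = new ArrayDeque<>();
        int recreated;

        void release(PhysicalConnection p) {
            idle.addLast(p);          // the transaction state is not checked here
        }

        PooledConnection reserve() {
            PhysicalConnection p = idle.isEmpty() ? new PhysicalConnection() : idle.removeFirst();
            try {
                p.testQuery();        // test on reserve
            } catch (IllegalStateException e) {
                p.closed = true;      // discard the faulty connection and create a new one
                p = new PhysicalConnection();
                recreated++;
            }
            return new PooledConnection(p, this);
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        PooledConnection c = pool.reserve();
        c.physical.txOpen = true;     // the procedure is still inside its own transaction
        c.close();                    // wrapper closed early; the connection goes back dirty
        PooledConnection next = pool.reserve();
        // prints: recreated=1, samePhysical=false
        System.out.println("recreated=" + pool.recreated
                + ", samePhysical=" + (next.physical == c.physical));
    }
}
```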
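Conclusion: in general, we do not recommend defining a transaction inside a stored procedure that is called through JDBC; transaction management should be left to JDBC. This holds for non-XA and for XA alike. Once transactions are nested, managing them becomes a problem, and integrity even more so.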
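As a rough illustration of leaving the transaction to JDBC, here is a minimal sketch. It assumes the stored procedure no longer contains begin transaction / commit and that no JTA transaction is active on the connection; the class and method names are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class JdbcManagedTx {
    /**
     * Runs a stored procedure call inside a transaction controlled from JDBC.
     * Assumption: the procedure itself does not begin or commit a transaction.
     */
    static void callInTransaction(Connection conn, String call) throws SQLException {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(false);               // JDBC now owns the transaction boundary
        try (CallableStatement cstmt = conn.prepareCall(call)) {
            cstmt.execute();
            conn.commit();                       // commit only after the call has returned
        } catch (SQLException e) {
            conn.rollback();                     // undo the work if the call failed
            throw e;
        } finally {
            conn.setAutoCommit(previous);        // restore the state before returning to the pool
        }
    }
}
```

A caller would obtain the connection from the data source, invoke callInTransaction(conn, "{ call TestProc() }"), and close the connection only after the method has returned, so the connection goes back to the pool without an open transaction.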
