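JBoss Remoting 2.2.2 has a bug. When the client's timeout happens to be shorter than the time the server needs to process a request, one connection in the client's connection pool is used up for no reason and is never reclaimed, so the client connection pool soon fills up. After analyzing the JBoss Remoting 2.2.2 code I found the cause. A look at the JBoss Remoting 2.4 code shows that the bug has been fixed in 2.4.
Here is the problematic fragment of the JBoss Remoting 2.2.2 code: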
synchronized(usedPoolLock)
{
    if (pooled != null)
    {
        usedPooled++;
        if (trace) log.trace(this + " got a socket, usedPooled: " + usedPooled);
        break;
    }
    if (usedPooled < maxPoolSize)
    {
        // Try to get a socket.
        if (trace) log.trace(this + " getting a socket, usedPooled: " + usedPooled);
        usedPooled++;
    }
    else
    {
        retry = true;
        if (trace) log.trace(this + " will try again to get a socket");
    }
}
Socket socket = null;
long timestamp = System.currentTimeMillis();
try
{
    if (trace) { log.trace(this + " creating socket " + (counter++) + ", attempt " + (i + 1)); }
    socket = createSocket(address.address, address.port, timeRemaining);
    if (trace) log.trace(this + " created socket: " + socket);
}
catch (Exception ex)
{
    log.debug(this + " got Exception " + ex + ", creation attempt took " +
              (System.currentTimeMillis() - timestamp) + " ms");
    synchronized(usedPoolLock)
    {
        usedPooled--;
    }
    if (i + 1 < numberOfRetries)
    {
        Thread.sleep(1);
        continue;
    }
    throw ex;
}
socket.setTcpNoDelay(address.enableTcpNoDelay);
Map metadata = getLocator().getParameters();
if (metadata == null)
{
    metadata = new HashMap(2);
}
else
{
    metadata = new HashMap(metadata);
}
metadata.put(SocketWrapper.MARSHALLER, marshaller);
metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
if (timeAllowed > 0)
{
    timeRemaining = (int) (timeAllowed - (System.currentTimeMillis() - start));
    if (timeRemaining <= 0)
        break;
    metadata.put(SocketWrapper.TEMP_TIMEOUT, new Integer(timeRemaining));
}
pooled = createClientSocket(socket, address.timeout, metadata);
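So where is the problem in this code? It is in the last line. Equivalently, it is in the usedPooled++ that was performed in advance, before that line runs.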
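Here JBoss Remoting trusts createClientSocket too much. It assumes this line can never throw an exception, but in fact it can. If it does throw, usedPooled has already been incremented and no code path decrements it. The result is exactly the symptom described at the start: one slot in the client connection pool is used up, and nothing ever reclaims it.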
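To make the mechanism concrete, here is a minimal sketch. It is not JBoss code: `LeakyPool` and `acquire` are hypothetical names, and a `Callable` stands in for the createSocket/createClientSocket steps. The sketch reproduces only the 2.2.2 bookkeeping pattern: a counter guarded by a lock, incremented before a step that may throw.

```java
import java.util.concurrent.Callable;

// Hypothetical, simplified model of the 2.2.2 bookkeeping (not the JBoss Remoting API):
// a counter guarded by an object lock, incremented before a step that may throw.
class LeakyPool {
    private final Object usedPoolLock = new Object();
    private final int maxPoolSize;
    private int usedPooled = 0;

    LeakyPool(int maxPoolSize) {
        this.maxPoolSize = maxPoolSize;
    }

    int usedPooled() {
        synchronized (usedPoolLock) {
            return usedPooled;
        }
    }

    // Reserves a slot, then runs the creation step (standing in for createClientSocket).
    <T> T acquire(Callable<T> create) throws Exception {
        synchronized (usedPoolLock) {
            if (usedPooled >= maxPoolSize) {
                throw new IllegalStateException("pool exhausted, usedPooled=" + usedPooled);
            }
            usedPooled++;
        }
        // If call() throws here, nothing runs usedPooled--: the slot leaks.
        return create.call();
    }
}
```

Suppose the pool is created with maxPoolSize = 2 and two creation attempts fail. usedPooled stays at 2 even though no connection is in use, and the next acquire is rejected as if the pool were full.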
So the simplest fix is to wrap the line pooled = createClientSocket(socket, address.timeout, metadata); in a try...catch. If an exception is thrown, execute usedPooled--;, just as the code already does around createSocket.
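The fix above can be sketched in the same simplified model. The names are again hypothetical, not the JBoss Remoting API. The reservation is undone in a catch block when the creation step throws, mirroring what the 2.2.2 code already does around createSocket.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the suggested 2.2.2 fix (not the JBoss Remoting API):
// keep the counter + lock, but give the slot back if the creation step throws.
class FixedPool {
    private final Object usedPoolLock = new Object();
    private final int maxPoolSize;
    private int usedPooled = 0;

    FixedPool(int maxPoolSize) {
        this.maxPoolSize = maxPoolSize;
    }

    int usedPooled() {
        synchronized (usedPoolLock) {
            return usedPooled;
        }
    }

    <T> T acquire(Callable<T> create) throws Exception {
        synchronized (usedPoolLock) {
            if (usedPooled >= maxPoolSize) {
                throw new IllegalStateException("pool exhausted, usedPooled=" + usedPooled);
            }
            usedPooled++;
        }
        try {
            return create.call();
        } catch (Exception ex) {
            // Creation failed: undo the reservation before propagating.
            synchronized (usedPoolLock) {
                usedPooled--;
            }
            throw ex;
        }
    }

    // Called by the user of a successfully acquired connection when done with it.
    void release() {
        synchronized (usedPoolLock) {
            usedPooled--;
        }
    }
}
```

Any number of failed creations now leaves usedPooled at 0. Only successful acquisitions hold a slot, and those are given back through release(). Note that catch (Exception) does not cover Errors. A finally block guarded by a success flag would cover those as well.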
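In JBoss Remoting 2.4, JBoss no longer controls the connection pool with the long counter usedPooled plus the object lock usedPoolLock. It switched to a simpler, easier-to-use Semaphore, although it is still the one from the EDU package rather than the Java 5 one. Let's see what this code became in JBoss Remoting 2.4: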
boolean timedout = !semaphore.attempt(timeToWait);
if (trace) log.trace(this + " obtained semaphore: " + semaphore.permits());
if (timedout)
{
    throw new IllegalStateException("Timeout waiting for a free socket");
}
SocketWrapper pooled = null;
if (tryPool)
{
    synchronized (pool)
    {
        // if connection within pool, use it
        if (pool.size() > 0)
        {
            pooled = getPooledConnection();
            if (trace) log.trace(this + " reusing pooled connection: " + pooled);
        }
    }
}
else
{
    if (trace) log.trace(this + " avoiding connection pool, creating new socket");
}
if (pooled == null)
{
    //Need to create a new one
    Socket socket = null;
    if (trace) { log.trace(this + " creating socket "); }
    // timeAllowed < 0 indicates no per invocation timeout has been set.
    int timeRemaining = -1;
    if (0 <= timeAllowed)
    {
        timeRemaining = (int) (timeAllowed - (System.currentTimeMillis() - start));
    }
    socket = createSocket(address.address, address.port, timeRemaining);
    if (trace) log.trace(this + " created socket: " + socket);
    socket.setTcpNoDelay(address.enableTcpNoDelay);
    Map metadata = getLocator().getParameters();
    if (metadata == null)
    {
        metadata = new HashMap(2);
    }
    else
    {
        metadata = new HashMap(metadata);
    }
    metadata.put(SocketWrapper.MARSHALLER, marshaller);
    metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
    if (timeAllowed > 0)
    {
        timeRemaining = (int) (timeAllowed - (System.currentTimeMillis() - start));
        if (timeRemaining <= 0)
            throw new IllegalStateException("Timeout creating a new socket");
        metadata.put(SocketWrapper.TEMP_TIMEOUT, new Integer(timeRemaining));
    }
    pooled = createClientSocket(socket, address.timeout, metadata);
}
return pooled;
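The code above shows that JBoss first acquires a permit with semaphore.attempt and then catches no exceptions anywhere in the rest of the method. Instead, 2.4 catches every exception from this method in one place, in the caller, and calls semaphore.release() whenever one occurs: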
try
{
    boolean tryPool = retryCount < (numberOfCallRetries - 1)
                      || maxPoolSize == 1
                      || numberOfCallRetries == 1;
    long l = System.currentTimeMillis();
    socketWrapper = getConnection(marshaller, unmarshaller, tryPool, timeLeft);
    long d = System.currentTimeMillis() - l;
    if (trace) log.trace("took " + d + " ms to get socket " + socketWrapper);
}
catch (Exception e)
{
    // if (bailOut)
    //    return null;
    semaphore.release();
    if (trace) log.trace(this + " released semaphore: " + semaphore.permits(), e);
    sockEx = new CannotConnectException(
        "Can not get connection to server. Problem establishing " +
        "socket connection for " + locator, e);
    continue;
}
With this structure, the 2.2.2 bug naturally cannot happen any more.
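For comparison, here is a sketch of the 2.4-style approach written against java.util.concurrent.Semaphore from Java 5, rather than the EDU library that 2.4 uses. tryAcquire(timeout, unit) plays the role of attempt(timeToWait), and availablePermits() plays the role of permits(). SemaphorePool and getConnection are hypothetical names, and a Callable again stands in for socket creation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a semaphore-bounded pool using the Java 5 Semaphore
// (not the JBoss Remoting API; 2.4 itself uses the EDU Semaphore).
class SemaphorePool {
    private final Semaphore semaphore;

    SemaphorePool(int maxPoolSize) {
        // One permit per allowed connection; fair ordering for waiting callers.
        this.semaphore = new Semaphore(maxPoolSize, true);
    }

    int availablePermits() {
        return semaphore.availablePermits();
    }

    <T> T getConnection(Callable<T> create, long timeToWaitMs) throws Exception {
        // Counterpart of semaphore.attempt(timeToWait).
        if (!semaphore.tryAcquire(timeToWaitMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("Timeout waiting for a free socket");
        }
        try {
            return create.call();
        } catch (Exception e) {
            // Any failure after the permit was taken hands the permit back.
            semaphore.release();
            throw e;
        }
    }

    // Called when a successfully obtained connection is no longer needed.
    void release() {
        semaphore.release();
    }
}
```

In this sketch the release sits inside getConnection, right after a successful tryAcquire, so only a permit that was actually obtained is returned. The 2.4 code above does the release in the caller's catch block instead.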
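:) Since 2.4 was refactored to use a Semaphore, I can't tell whether this bug was fixed by coincidence or deliberately fixed by someone who knew about it, heh. Either way, the bug has been fixed.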
Anyone using JBoss Remoting should keep this bug in mind, because JBoss Remoting 2.2.2 is the version that ships by default with JBoss AS 4.2.2.
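The code above shows that a Semaphore works very well for bounding the size of a data structure under concurrency. It is simple and easy to use, while the old long counter plus object lock approach is really cumbersome. This bug is also a reminder for everyone: when controlling concurrent resources, pay close attention to where locks are acquired and where they are released, and never assume that some piece of code absolutely cannot fail.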