0

I am having trouble recovering from a broken database connection. My test case uses the following libraries.
guice-persist 4.0
Hibernate core 4.3.1
HikariCP-java6-2.3.9 (connection pool).

My test runs a simple read in a loop with a small sleep between each read.

@com.google.inject.persist.Transactional
protected void simpleRead() {
  dao = new MyDao(....)
  dao.findBy(...)
}

The Dao acquires its entityManager like this

@Inject
protected com.google.inject.Provider<EntityManager> entityManager;

The guice module is bound like this

@Override
public void configure() {
  install(new JpaPersistModule("test"));
  bind(JPAInitializer.class).asEagerSingleton();
}

My persistence.xml defines the transaction-type as "RESOURCE-LOCAL".

And I have a few hikari settings, but Im not sure if these are relevant

 <property name="hibernate.connection.provider_class" value="com.zaxxer.hikari.hibernate.HikariConnectionProvider" /> 
 <property name="hibernate.hikari.maxLifetime" value="1800000" />
 <property name="hibernate.hikari.connectionTestQuery" value="SELECT 1" />
 <property name="hibernate.hikari.leakDetectionThreshold" value="300000" />

While the test is looping through the reads, I break the SQLServer connection by stopping the service from task manager. If the loop was not in the simplRead() method, at the time, it gracefully handles the error on the next attempted read and re-connects when the service is started again. However, if it was in the simpleRead() method when the service is stopped, it fails and never recovers. The first failure is reported as follows


org.hibernate.TransactionException: JDBC begin transaction failed: javax.persistence.PersistenceException: org.hibernate.TransactionException: JDBC begin transaction failed: at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1763) at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1677) at org.hibernate.jpa.spi.AbstractEntityManagerImpl.throwPersistenceException(AbstractEntityManagerImpl.java:1771) at org.hibernate.jpa.internal.TransactionImpl.begin(TransactionImpl.java:64) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:66) at net.impro.portal.server.model.dao.TestConnectionStuff.testConnectionBroken(TestConnectionStuff.java:40) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80) at org.testng.internal.Invoker.invokeMethod(Invoker.java:714) at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901) at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111) at org.testng.TestRunner.privateRun(TestRunner.java:767) at org.testng.TestRunner.run(TestRunner.java:617) at org.testng.SuiteRunner.runTest(SuiteRunner.java:334) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291) at org.testng.SuiteRunner.run(SuiteRunner.java:240) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1197) at org.testng.TestNG.runSuitesLocally(TestNG.java:1122) at org.testng.TestNG.run(TestNG.java:1030) at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111) at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204) at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175) Caused by: org.hibernate.TransactionException: JDBC begin transaction failed: at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:76) at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:162) at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1431) at org.hibernate.jpa.internal.TransactionImpl.begin(TransactionImpl.java:61) ... 26 more Caused by: java.sql.SQLException: I/O Error: Connection reset by peer: socket write error at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1059) at net.sourceforge.jtds.jdbc.TdsCore.submitSQL(TdsCore.java:905) at net.sourceforge.jtds.jdbc.JtdsConnection.setAutoCommit(JtdsConnection.java:2291) at com.zaxxer.hikari.proxy.ConnectionProxy.setAutoCommit(ConnectionProxy.java:334) at com.zaxxer.hikari.proxy.ConnectionJavassistProxy.setAutoCommit(ConnectionJavassistProxy.java) at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:72) ... 29 more Caused by: java.net.SocketException: Connection reset by peer: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at java.io.DataOutputStream.write(Unknown Source) at net.sourceforge.jtds.jdbc.SharedSocket.sendNetPacket(SharedSocket.java:717) at net.sourceforge.jtds.jdbc.RequestStream.putPacket(RequestStream.java:570) at net.sourceforge.jtds.jdbc.RequestStream.flush(RequestStream.java:518) at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1046) ... 34 more ...

And because that now doesnt clean up properly, the subsequent requests give fail like this

org.hibernate.TransactionException: Already have an associated managed connection

Im not finding it easy to determine which layer is responsible for this problem. From what I have read, the connection pool should test broken connections and clean things up, but I suspect its guice-persist which forever has the broken EntityManager reference associated with its thread-local object. Of course, it may well also be the way Im using them.

EDIT

After further investigation, I noticed that if I wait > 1000ms between each test read, then the code behaves as expected. I get the StackTrace below, and then after I restore the SQLServer service, it happily reconnects. Anything less than 1000ms and I get the error listed above in the original post. It seems hibernate does not handle the IO/Error in the same way and the connection is forever left in a broken state.

javax.persistence.QueryTimeoutException: Could not open connection
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1725)
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1677)
    at org.hibernate.jpa.spi.AbstractEntityManagerImpl.throwPersistenceException(AbstractEntityManagerImpl.java:1771)
    at org.hibernate.jpa.internal.TransactionImpl.begin(TransactionImpl.java:64)
    at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:66)
    at net.impro.portal.server.model.dao.TestConnectionStuff.testConnectionBroken(TestConnectionStuff.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:80)
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:714)
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:901)
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1231)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:127)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:111)
    at org.testng.TestRunner.privateRun(TestRunner.java:767)
    at org.testng.TestRunner.run(TestRunner.java:617)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:334)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:329)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:291)
    at org.testng.SuiteRunner.run(SuiteRunner.java:240)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1197)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1122)
    at org.testng.TestNG.run(TestNG.java:1030)
    at org.testng.remote.RemoteTestNG.run(RemoteTestNG.java:111)
    at org.testng.remote.RemoteTestNG.initAndRun(RemoteTestNG.java:204)
    at org.testng.remote.RemoteTestNG.main(RemoteTestNG.java:175)
Caused by: org.hibernate.QueryTimeoutException: Could not open connection
    at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:83)
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:126)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:112)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:235)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:171)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:67)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:162)
    at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1471)
    at org.hibernate.jpa.internal.TransactionImpl.begin(TransactionImpl.java:61)
    ... 26 more
Caused by: java.sql.SQLTimeoutException: Timeout after 30004ms of waiting for a connection.
    at com.zaxxer.hikari.pool.BaseHikariPool.getConnection(BaseHikariPool.java:233)
    at com.zaxxer.hikari.pool.BaseHikariPool.getConnection(BaseHikariPool.java:183)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:93)
    at com.zaxxer.hikari.hibernate.HikariConnectionProvider.getConnection(HikariConnectionProvider.java:101)
    at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:380)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:228)
    ... 31 more
Caused by: java.sql.SQLException: Network error IOException: Connection refused: connect
    at net.sourceforge.jtds.jdbc.JtdsConnection.<init>(JtdsConnection.java:436)
    at net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:184)
    at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:95)
    at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:101)
    at com.zaxxer.hikari.pool.BaseHikariPool.addConnection(BaseHikariPool.java:444)
    at com.zaxxer.hikari.pool.BaseHikariPool$1.run(BaseHikariPool.java:419)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at net.sourceforge.jtds.jdbc.SharedSocket.createSocketForJDBC3(SharedSocket.java:288)
    at net.sourceforge.jtds.jdbc.SharedSocket.<init>(SharedSocket.java:251)
    at net.sourceforge.jtds.jdbc.JtdsConnection.<init>(JtdsConnection.java:331)
    ... 10 more
user2046211
  • 316
  • 1
  • 4
  • 20
  • Can you provide a *full-stack* trace, including sub-causes all the way down to the driver, for the "JDBC begin transaction failed" exception? – brettw Aug 07 '15 at 02:28
  • I have added the full stack. As a matter of interest, this message changed when I upgraded my Hikari & guice-persist libraries. Before the upgrade I think it said it couldnt rollback the transaction. – user2046211 Aug 07 '15 at 06:39

2 Answers2

2

What version of jTDS are you using? It looks like that version of the driver is not wrapping the SocketException with a SQLException containing a SQLState value starting with '08' (indicating disconnection).

HikariCP checks for these SQLStates and flags the connection for eviction. That connection should be closed and never returned again.

Newer versions of the jTDS driver appears to handle it correctly, if I am reading their code correctly.

UPDATE: This of course assumes that Hibernate still closes the Connection after the exception (like in a finally block).

UPDATE2: I missed the second exception part, that looks like a Hibernate issue to me ... not properly recovering from the first exception. If you enable logging for the com.zaxxer.hikari package, you should see a warning about the connection being evicted. Do you see it?

UPDATE3: I looked at jTDS 1.3.0, and it is doing the right thing. You can see here that it will generate a SQLException with SQLState="08S01", which will be checked by HikariCP and the Connection evicted.

More suspicion now lies on Hibernate. I do not have the time right now to look at their code, but I suggest you do, using the line numbers in the stacktrace.

brettw
  • 9,718
  • 2
  • 37
  • 52
  • I was using JTDS-1.3.0 but have just tried 1.3.1 and got the same results. – user2046211 Aug 07 '15 at 10:34
  • jTDS 1.3.x looked OK to me, but I'll look more closely. Can you put a breakpoint in HikariCP's ConnectionProxy.checkException() method, and inspect the SQLException and SQLState there? – brettw Aug 07 '15 at 16:23
  • I will l as soon as I'm back at the office on Tuesday. long weekend here. your help us much appreciated. – user2046211 Aug 07 '15 at 20:34
  • Sorry, I didnt see the updates to your post. The SQLState is as you said, and it does reach the warning message. I will try to start looking down the hibernate route. – user2046211 Aug 11 '15 at 10:02
  • I have added some findings to my original post if youre still interested. – user2046211 Aug 11 '15 at 14:36
0

I finally found the problem and it is a known hibernate bug with "blocker" status.

Without any reasonable workaround, database recovery seems impossible or at least very impractical.

user2046211
  • 316
  • 1
  • 4
  • 20