11

For months, my web application ran smoothly, but for the past week or two, it keeps dropping its connection to MySQL server. I'm not a DBA guy and have no idea how to debug this.

Here is what I know:

  1. The connection seems to drop every few hours. Sometimes during the day, but always during the night.
  2. My lab has a MySQL server that hosts databases for multiple applications.
  3. Currently, we have 46 connections to the MySQL server.
  4. To my knowledge, no other application is experiencing this issue.
  5. My application is using the same stack, configuration, and even code for connecting to the DB as another application—this other application supports around 200 users per day and has been running smoothly since 2013.
  6. Both applications use Hibernate ORM; this is the only configuration that I know of:

    <!-- TomcatJDBCConnectionProvider class is common to both applications -->
    <property name="hibernate.connection.provider_class">org.hibernate.connection.TomcatJDBCConnectionProvider</property>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.pool_size">5</property>
    <property name="hibernate.current_session_context_class">thread</property>
    <property name="hibernate.tomcatJdbcPool.validationQuery">SELECT 1</property>
    <property name="hibernate.tomcatJdbcPool.testOnBorrow">true</property>
    <property name="hibernate.enable_lazy_load_no_trans">true</property>
    
  7. The issue started around the same time as when someone tried to use the application's RESTful API to download our data. This user—actually a collaborator—has a small script iterates over every row in a specific table and requests all the metadata.

  8. The issue also started around the same time that my lab started offering a Coursera Massive Open Online Course. I don't know what the numbers are, but the actual usage on the site must have jumped.

I'm aware that this is a broad question, but I'm really at a loss as to how to go about debugging this. Any suggestions are appreciated.

EDIT:

Digging around the other application's ServletContextListener, I found this bit of code that my contextDestroyed function does not have:

// TODO: Find memory leak that requires server to be restarted after hot deploying several (3?) times.
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for (Thread t : threadSet) {
    if (t.getName().contains("Abandoned connection cleanup thread")) {
        synchronized (t) {
            System.out.println("Forcibly stopping thread to avoid memory leak: " + t.getName());
            t.stop(); // don't complain, it works
        }
    }
}

It appears to iterate over the stack traces, find the one with the text "Abandoned connection cleanup thread" and manually stop it. It seems probably that this is related to my issue?

EDIT 21/9/2015:

My application went down this weekend. Here is the stack trace from the error log from yesterday (when I believe it went down):

20-Sep-2015 14:22:18.160 SEVERE [http-apr-8080-exec-35] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [edu.mssm.pharm.maayanlab.Harmonizome.api.GeneMetadataApi] in context with path [/Harmonizome] threw exception
 org.hibernate.exception.GenericJDBCException: Could not open connection
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:304)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:169)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:67)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:160)
    at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1395)
    at org.hibernate.collection.internal.AbstractPersistentCollection.withTemporarySessionIfNeeded(AbstractPersistentCollection.java:224)
    at org.hibernate.collection.internal.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:545)
    at org.hibernate.collection.internal.AbstractPersistentCollection.read(AbstractPersistentCollection.java:124)
    at org.hibernate.collection.internal.PersistentSet.iterator(PersistentSet.java:180)
    at edu.mssm.pharm.maayanlab.Harmonizome.json.serdes.GeneMetadataSerializer.serialize(GeneMetadataSerializer.java:54)
    at edu.mssm.pharm.maayanlab.Harmonizome.json.serdes.GeneMetadataSerializer.serialize(GeneMetadataSerializer.java:23)
    at com.google.gson.TreeTypeAdapter.write(TreeTypeAdapter.java:70)
    at com.google.gson.Gson.toJson(Gson.java:600)
    at com.google.gson.Gson.toJson(Gson.java:579)
    at com.google.gson.Gson.toJson(Gson.java:534)
    at edu.mssm.pharm.maayanlab.Harmonizome.api.GeneMetadataApi.doGet(GeneMetadataApi.java:65)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:673)
    at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2440)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-apr-8080-exec-35] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:5; busy:5; idle:0; lastwait:30000].
    at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:672)
    at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:186)
    at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:127)
    at org.hibernate.connection.TomcatJDBCConnectionProvider.getConnection(TomcatJDBCConnectionProvider.java:208)
    at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:292)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:297)

Here is are my connection variables from MySQL:

mysql>  SHOW VARIABLES LIKE '%connect%';
+-----------------------------------------------+-----------------+
| Variable_name                                 | Value           |
+-----------------------------------------------+-----------------+
| character_set_connection                      | utf8            |
| collation_connection                          | utf8_general_ci |
| connect_timeout                               | 5               |
| default_master_connection                     |                 |
| extra_max_connections                         | 1               |
| init_connect                                  |                 |
| max_connect_errors                            | 100             |
| max_connections                               | 100             |
| max_user_connections                          | 0               |
| performance_schema_session_connect_attrs_size | 512             |
+-----------------------------------------------+-----------------+

mysql>  SHOW VARIABLES LIKE '%timeout%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| connect_timeout             | 5        |
| deadlock_timeout_long       | 50000000 |
| deadlock_timeout_short      | 10000    |
| delayed_insert_timeout      | 300      |
| innodb_flush_log_at_timeout | 1        |
| innodb_lock_wait_timeout    | 50       |
| innodb_rollback_on_timeout  | OFF      |
| interactive_timeout         | 28800    |
| lock_wait_timeout           | 31536000 |
| net_read_timeout            | 30       |
| net_write_timeout           | 60       |
| slave_net_timeout           | 3600     |
| thread_pool_idle_timeout    | 60       |
| wait_timeout                | 28800    |
+-----------------------------+----------+

EDIT 22/9/2015:

Would a SEVERE Tomcat error cause the issue? I am seeing an error, unrelated to the database, about parsing a date:

22-Sep-2015 10:09:53.481 SEVERE [http-apr-8080-exec-26] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [edu.mssm.pharm.maayanlab.Harmonizome.page.DatasetPage] in context with path [/Harmonizome] threw exception [javax.servlet.ServletException: javax.servlet.jsp.JspException: In &lt;parseDate&gt;, a parse locale can not be established] with root cause
 javax.servlet.jsp.JspException: In &lt;parseDate&gt;, a parse locale can not be established
    at org.apache.taglibs.standard.tag.common.fmt.ParseDateSupport.doEndTag(ParseDateSupport.java:147)

Attaching JConsole output of heap memory usage:

enter image description here

JConsole output for thread usage; it started around 24-25 and jumped up to 34 once I started using the site. Even after closing the browser window, it remained there:

enter image description here

EDIT 23/9/2015:

One thing I changed right before the issue began was how I deal with Hibernate transactions. Previously, I had enable_lazy_load_no_trans disabled (which is the default). Previously, I was using the "open session in view" pattern. It seemed like people didn't like the open session in view pattern, so I enabled enable_lazy_load_no_trans. Thus, I have code like this:

List<MyObjects> myObjects = null;
try {
    HibernateUtil.beginTransaction();
    myObjects = // fetch my objects from the DB
    HibernateUtil.commitTransaction();
} catch (HibernateException he) {
    HibernateUtil.rollbackTransaction();
} finally {
    HibernateUtil.close();
}

// render myObjects in JSP/JSTL
// this JSP may lazily load related objects

In retrospect, this seems... problematic. I have no idea when Hibernate "lets go" of the objects.

Community
  • 1
  • 1
gwg
  • 7,111
  • 8
  • 50
  • 87
  • Memory Issue ? Any other application were deployed on the machine where the Mysql and Application is deployed ? – Gaurav Sep 18 '15 at 14:43
  • I'm not experienced with this connection pool implementation, but I'd start debugging with implementing QueryTimeoutInterceptor : https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html#org.apache.tomcat.jdbc.pool.interceptor.QueryTimeoutInterceptor – a1ex07 Sep 18 '15 at 14:43
  • @Gaurav, a couple other small applications are running on the same machine. We have a cluster, and for historical reasons, I had a constraint in place to run the application on the same machine every time, but I've just redeployed so it can run anywhere, in case that one machine has issues. – gwg Sep 18 '15 at 14:53
  • Have you looked in the db logfiles, to see if there an error. Post the Result from: SHOW VARIABLES LIKE '%connect%'; – Bernd Buffen Sep 18 '15 at 14:59
  • Maybe it is a memory leak. Please see my latest edit. @BerndBuffen, I will look into your suggestion as well. – gwg Sep 18 '15 at 21:00
  • Do you have any application logs from when the error occurred? If so please can you post them. – Alex Barnes Sep 20 '15 at 20:12
  • @Alex, I've added the latest logs. – gwg Sep 21 '15 at 14:13
  • Just a wild guess. Java tries to fecth a free connection from the pool. The system is very busy and 30 seconds pass without a free connection.The system responds: "Timeout: Pool empty. Unable to fetch a connection". May be you could increment the size of the pool. If more connections solve the issue it may be the case – borjab Sep 21 '15 at 14:18
  • That seems reasonable, although I wonder why it only seems to happen to my application. Is there a good way to test this? I have noticed that while I've configured the pool to be 5, I typically only see 1 connection in `phpMyAdmin > Status > Processes`. Sometimes, after restarting, I'll see 5, but then in a few hours it will be back down to 1 or 0. – gwg Sep 21 '15 at 14:22
  • @BerndBuffen, I've added the results from `SHOW VARIABLES...`. – gwg Sep 21 '15 at 14:31
  • Could a developer forgot to close the connection after executing a query? Is a very typical mistake. Or it could be that your application starts with a lot of parallel queries. – borjab Sep 21 '15 at 14:40
  • Thanks, can you also send the result from: **SHOW VARIABLES LIKE '%timeout%';** the importend vars are **wait_timeout** and **interactive_timeout** . these are the time in second when the connection is inactive before the Server close the connection. wich one the connection used is dependent on the connection (there is a parameter). so in the most batch programms the connection use nteractive_timeout and this is by default 600 sec (10 minutes). you can change it by set global nteractive_timeout = 2800; – Bernd Buffen Sep 21 '15 at 17:15
  • @BerndBuffen, added. – gwg Sep 21 '15 at 19:50
  • Unfortunately, even after releasing a fix to close the Hibernate connection after a transaction, the issue persists. – gwg Sep 22 '15 at 15:04

3 Answers3

3

Hibernate errors are a bit abstract and sometimes it can be tricky to find the bug by the stack trace. I think that may be a problem of your application, maybe you're not closing Hibernate connections properly on some cases or your application may have a memory leak.

Have you tried to monitor the application with jconsole from the JDK?

You can set this on your Tomcat configuration console in the Java arguments (I'm assuming you're using Tomcat), to enable the jconsole

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8086
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false

Then connect to a remote process for example

localhost:8086 

and watch the threads as you go thru the operations that make the application stop.

Edit

If you're not using Tomcat and you're running your application in a Windows environment you can monitorize the threads using for example Process Explorer and monitorize your application.

pedromendessk
  • 3,190
  • 1
  • 23
  • 34
  • Whoa. I've never seen this. Using Eclipse on OS X, I put those arguments in `Open launch configuration > Arguments`. The port should be for my application? – gwg Sep 22 '15 at 17:38
  • I've added a screenshot from the memory usage. – gwg Sep 22 '15 at 17:56
  • You must set a port that it isn't being used, to connect with jconsole. But it seems that you've managed to connect :) Can you put a print of the threads instead of the memory? Take a print of the threads, than do some random actions in place where you think you have problems, and then take another print. And share both – pedromendessk Sep 22 '15 at 20:03
  • I've added a screenshot of my application's thread usage. Mind you, this is all on my local machine right now. – gwg Sep 22 '15 at 20:41
  • And if you open another instance of your application and navigate a little more does it creates more threads (that don't get closed) or does it continues at the 30s? Try to navigate a bit longer and do more actions. The more you do the better – pedromendessk Sep 22 '15 at 20:46
  • When I started the server, the thread count was in the mid-20s. It jumped to high-20s when I opened a browser window for the first time, and then jumped to 34 once I started clicking around. I closed the browser window and waited for a minute and it stayed at 34. I then opened 7 browser windows simultaneously (on 3 separate browsers) and the thread count stayed at 34. You'll see in the attached script above that when I run the API script—it requests every entity in my DB, programmatically—the thread count stayed at 34. – gwg Sep 22 '15 at 21:07
  • Which tomcat version are you using? – pedromendessk Sep 24 '15 at 05:48
3

From the stack-trace you provided, I can draw a single conclusion: you are simply running out of connections.

This can be caused by long running transactions, possibly due to slow queries or improper application transaction boundaries.

I suggest you start using FlexyPool, which supports Tomcat DBCP and get a better understanding of both the connection and transaction usage. FlexyPool provides many histograms you might be interested in, like connection acquisition time and lease time.

An, just to be on the safe side, check the MySQL driver version too and see if you're running on an outdated library.

Vlad Mihalcea
  • 103,297
  • 39
  • 432
  • 788
  • I'll check out FlexyPool and that article. Please take a look at my edit on 23/9/2015 and let me know if my lazy loading of objects seems problematic. – gwg Sep 23 '15 at 20:21
  • Using "Open Session In View" is not really a good idea, because transactions must wait for the response to be sent to the client, in order to commit. The longer the transaction, the more pressure it will put on the connection pool, since a connection will be leased for longer intervals. – Vlad Mihalcea Sep 25 '15 at 05:54
  • It may be a bad idea, and I can look into alternatives in the future, but I removed `enable_lazy_load_no_trans` and re-implemented the open session in view pattern. My application has run without a DB connection timeout for 5 days now. I'm fairly confident we're in the clear. I'll definitely look into FlexyPool—and that article looks useful—and if you have a suggestion on a clean alternative to open session in view I'd be glad to hear it. – gwg Sep 27 '15 at 20:50
1

It seems your connection pool cannot return a free connection to Hibernate within timeout duration. This happens because your application have very long transactions or transaction dead locks. You can try following options to fix the bug.

  1. change your connection pool size in following line

    <property name="hibernate.connection.pool_size">5</property>

make the pool size about 10 and test. You should keep your eye on the count of connections to your database. If it exceeds the mysql database connection limitations change max_connections of mysql server and keep testing.

  1. Use an another connection pool. I recommend to use apache commons dbcp2. Maven dependencies of dbcp2 as follows.

    <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-dbcp2</artifactId> <version>2.1</version> </dependency>

Add dbcp2 into your POM then config dbcp2 with your application.

If it was the solution your application had only long transactions. Sometimes it may minimize the occurrence, and if it is still happening definitely your application have transaction dead locks. So you have to identify what are the possible problems with your code.

There are other alternative solutions such changing the waiting timeout to a higher value. But it is not good for your application performance and it doesn't make any sense for transaction dead locks. Finally you should remember to care about transaction management and database structure in your further developments for better performance of database.

Channa Jayamuni
  • 1,796
  • 1
  • 15
  • 29