
I am facing an issue for the past few days in my application, which has been running for a few months now. In the Oracle data source in WebSphere Application Server I have configured the connection pool with a maximum of 20 connections, minimum of 1 connection, reap time 180 sec, unused timeout 1800 sec, and aged timeout 0 sec. For the past few days we have been seeing an issue where, suddenly in the mornings, we start getting timeouts in the application. This is followed by strange behavior where we don't get any exceptions, but the CallableStatement used to run a procedure returns zero results for all executions, even though the data is there in the DB.

There were no exceptions in the System Out or System Err logs at the time the issue started occurring. The only exceptions we could see were one StaleConnectionException and a database session kill exception:

java.sql.SQLRecoverableException: ORA-00028: your session has been killed 
ORA-00028: your session has been killed

But this happened 6 hours before the timeouts in the application.

Also, on checking the number of connections in the database, we could see only 9 sessions, even though we have configured the connection pool maximum to 20. We used the query below to check TOTAL_CON:

SELECT s.machine
 , s.username 
 , count(decode(s.STATUS, 'ACTIVE', 1))   as active_con
 , count(decode(s.STATUS, 'INACTIVE', 1)) as inactive_con
 , count(*)                               as total_con
FROM v$session s
WHERE type <> 'BACKGROUND'
GROUP BY username, machine 
ORDER BY total_con DESC;

We did a WAS server restart, which resolved the issue and also increased the number of sessions in the database to 20.

So I have 2 questions:

1. What can cause ORA-00028 from the application side? The database team says no session was killed from their side.
2. Is it normal that the total number of sessions in the database is less than the max connections configured in the data source, and if not, what can cause this?

Neel
  • You've configured your connection pool to have a *maximum* of 20 connections, so only having 9 at some point doesn't necessarily seem odd. If a session was killed, that happened from the server end. If the DBAs didn't manually kill one, are there perhaps resource limits that caused sessions to be killed automatically? Was there less traffic than usual so some sessions were idle (you haven't mentioned keep-alives), leading to the listener or DB or a firewall dropping them? Were there network issues that caused Oracle to think the client was dead? – Alex Poole Dec 29 '16 at 10:49
  • Connection Timeout is set to 180 secs. I don't think this happened because of resource limits, as it happened around 3 AM when traffic is lowest. You mentioned "If a session was killed, that happened from the server end" — what can cause this? – Neel Dec 29 '16 at 10:56
  • I am still checking if there is some firewall with TCP timeout less than reap time. Also no network issues were found during this time – Neel Dec 29 '16 at 10:59
  • I'm talking about a listener, database or network timeout - not a connection pool timeout. Something *may* have caused the network layer to be interrupted. Could be something I already thought of above, or something else I haven't thought of. I think it's impossible to tell from here. Was any database or network work done in that period? Were there any known issues with the network generally? I'm just guessing, unfortunately... But if it's a one-off thing, it's not likely to be a setting, unless it changed recently. – Alex Poole Dec 29 '16 at 10:59
  • The TCP Keepalive time is set at 1800 sec – Neel Dec 29 '16 at 11:27

1 Answer

  1. What can cause ORA-00028 from the application side, as the database team says no session was killed from their side?

I would recommend checking any firewall settings on your network. Since this issue occurred at 3 AM when traffic is lowest, the problem is likely caused by a single connection sitting idle for a while and then being terminated by an external timeout (such as a firewall idle timeout).

In the comments you said that your TCP keepalive time is set to 1800 sec (30 mins), so let's assume that after 20 mins a TCP connection goes bad.

Your configuration is set up to:

  • discard a connection that goes unused for 1800 sec (30 mins)
  • run pool maintenance every 180 sec (3 mins)
  • keep a single connection open for an unlimited amount of time, since aged timeout is 0

With these settings, a single connection could be pooled by WebSphere for more than 20 mins, so I would recommend setting the aged timeout to about 900 sec (15 mins) so that a single connection will not be pooled for more than 20 mins.

Note that the maximum amount of time a connection can be pooled by WebSphere is:

aged_timeout + reap_time = max_pooled_connection_time

And you always want firewall_timeout > max_pooled_connection_time
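The timing arithmetic above can be sketched in a few lines of Java. The 1200-sec firewall idle timeout here is a hypothetical value for illustration, not something from the question:

```java
// Hedged sketch of the pool-timing arithmetic from the answer above.
public class PoolTimeoutCheck {

    // Worst case a connection can sit in the pool before being reaped:
    // it reaches the aged timeout just after a reap cycle ran, so it
    // survives one extra reap interval before the next cleanup.
    static int maxPooledTime(int agedTimeoutSec, int reapTimeSec) {
        return agedTimeoutSec + reapTimeSec;
    }

    public static void main(String[] args) {
        int aged = 900;  // proposed aged timeout (15 mins); original was 0
        int reap = 180;  // reap time from the question (3 mins)
        int maxPooled = maxPooledTime(aged, reap); // 1080 sec = 18 mins
        int firewallIdleTimeout = 1200; // hypothetical 20-min firewall limit
        System.out.println("max pooled time: " + maxPooled + " sec");
        System.out.println("safe: " + (firewallIdleTimeout > maxPooled));
        // prints "max pooled time: 1080 sec" and "safe: true"
    }
}
```

With the original aged timeout of 0, `maxPooledTime` is effectively unbounded, which is why an external idle timeout can silently kill a pooled connection.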

  2. Is it normal that the total number of sessions in the database is less than the max connections configured in the data source, and if not, what can cause this?

Yes, this is perfectly normal, and in fact ideal. WebSphere creates pooled connections on demand, up to the max pool size. If you consistently hit the max pool size, performance bottlenecks start to occur.
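To see why 9 database sessions against a max of 20 is expected, here is a deliberately simplified on-demand pool (this is not WebSphere's actual implementation, just a sketch of the create-on-demand behavior): the number of connections ever created tracks peak concurrent usage, not the configured maximum.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal on-demand connection pool sketch. Connections are only created
// when a borrower arrives and no idle connection is available, so the
// database session count mirrors peak concurrent demand, not maxSize.
class TinyPool {
    private final int maxSize;
    private final Deque<Object> idle = new ArrayDeque<>();
    private int created = 0;

    TinyPool(int maxSize) { this.maxSize = maxSize; }

    Object borrow() {
        if (!idle.isEmpty()) return idle.pop();      // reuse idle connection
        if (created < maxSize) { created++; return new Object(); }
        throw new IllegalStateException("pool exhausted");
    }

    void release(Object conn) { idle.push(conn); }   // return to idle set

    int createdCount() { return created; }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(20);            // max connections = 20
        Object[] inUse = new Object[9];
        for (int i = 0; i < 9; i++) inUse[i] = pool.borrow(); // peak of 9
        for (Object c : inUse) pool.release(c);
        // Only 9 "sessions" were ever created, matching what the
        // question observed in v$session.
        System.out.println("created = " + pool.createdCount());
    }
}
```

After a restart, seeing the count jump to 20 just means the application's concurrent demand (or the pool's pre-fill behavior) reached the maximum at some point.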

Andy Guibert
  • I have now changed Min connections to 0, as I suspected the StaleConnectionException happened because of that. Should I still change the aged timeout from zero? Also, could you provide some input on what can cause this: "From past few days we are seeing an issue where suddenly in mornings we start getting timeouts in application which then follows a strange behavior where we don't get any exception but Callable Statement used to run a procedure return zero results for all executions even though data is there in DB" – Neel Dec 29 '16 at 15:23