[DBCP] Connection just obtained from datasource is invalid

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
I'm using DBCP 2.1.1 with POOL 2.4.3 (which are the latest releases as
of a few minutes ago) to handle connections to MySQL databases.  In my
program I have code like this:

  conn = ds.getConnection();
  conn.clearWarnings();
  if (conn.isValid(5))

Recently, I've been having problems where the connection is showing up
as *not* valid, and I couldn't figure out why.  My code uses
conn.getWarnings() on invalid connections to try and determine why the
connection is invalid, but it wasn't getting anything.  No exception was
being logged, so the return was likely null.  The validation query is
"SELECT 1".

After scratching my head for a while, I decided to have the code run a
"SELECT 1" query on the connection that reports invalid.  As expected
because the connection tests as invalid, that query failed, but the
specific failure was surprising:

Caused by:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No
operations allowed after connection closed.

How can a connection that I *just* retrieved from the pool be closed?

The only thing that comes to mind is that I have configured idle
connection eviction:

        opMaster.setTimeBetweenEvictionRunsMillis(Const.ONE_MINUTE);
        opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);

If the idle connection eviction is what's to blame for this (connection
becomes invalid between the time it is  retrieved and the validity is
checked), it seems like that shouldn't be possible.

Before I open an issue in Jira, I would like to know if there's anything
I can add to my code (logging, info retrieval, etc) to check what might
be going wrong -- try to confirm my theory that idle connection eviction
is to blame.  If that is indeed the problem, I would comment that
getConnection() on the datasource should probably reset the idle timer
so that the connection will not be quickly closed/pruned.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/15/17 12:50 PM, Shawn Heisey wrote:

> I'm using DBCP 2.1.1 with POOL 2.4.3 (which are the latest releases as
> of a few minutes ago) to handle connections to MySQL databases.  In my
> program I have code like this:
>
>   conn = ds.getConnection();
>   conn.clearWarnings();
>   if (conn.isValid(5))
>
> Recently, I've been having problems where the connection is showing up
> as *not* valid, and I couldn't figure out why.  My code uses
> conn.getWarnings() on invalid connections to try and determine why the
> connection is invalid, but it wasn't getting anything.  No exception was
> being logged, so the return was likely null.  The validation query is
> "SELECT 1".
>
> After scratching my head for a while, I decided to have the code run a
> "SELECT 1" query on the connection that reports invalid.  As expected
> because the connection tests as invalid, that query failed, but the
> specific failure was surprising:
>
> Caused by:
> com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No
> operations allowed after connection closed.
>
> How can a connection that I *just* retrieved from the pool be closed?

Should not happen.  In most cases like this, it turns out not to be
the pool that closed the connection.  The most common causes of this
are network connectivity problems or the server closing the
connection due to idle timeout.  The latter seems unlikely given
that you have idle eviction configured.  Is it possible that some
clients are holding connections for a very long time without using
them?  The evictor only acts on connections that are idle in the
pool (i.e. it does not do anything to connections that are checked
out to clients).

> d
>
> The only thing that comes to mind is that I have configured idle
> connection eviction:
>
>         opMaster.setTimeBetweenEvictionRunsMillis(Const.ONE_MINUTE);
>         opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
>
> If the idle connection eviction is what's to blame for this (connection
> becomes invalid between the time it is  retrieved and the validity is
> checked), it seems like that shouldn't be possible.

That should not be possible, as once a connection has been checked
out, it is not eligible for eviction.  And even if it were evicted
by the pool, the pool would set the DelegatingConnection's closed
property to true so isValid would return false rather than
throwing.  It looks like the situation here is that the pool thinks
the connection is open, but the underlying physical connection has
been closed.

It would be good to see your full pool config and a stack trace of
the exception above.  You might also look at the server logs to see
what is happening on the Mysql side.

Phil
>
> Before I open an issue in Jira, I would like to know if there's anything
> I can add to my code (logging, info retrieval, etc) to check what might
> be going wrong -- try to confirm my theory that idle connection eviction
> is to blame.  If that is indeed the problem, I would comment that
> getConnection() on the datasource should probably reset the idle timer
> so that the connection will not be quickly closed/pruned.


>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
On 11/16/2017 5:21 PM, Phil Steitz wrote:

>> How can a connection that I *just* retrieved from the pool be closed?
> Should not happen.  In most cases like this, it turns out not to be
> the pool that closed the connection.  The most common causes of this
> are network connectivity problems or the server closing the
> connection due to idle timeout.  The latter seems unlikely given
> that you have idle eviction configured.  Is it possible that some
> clients are holding connections for a very long time without using
> them?  The evictor only acts on connections that are idle in the
> pool (i.e. it does not do anything to connections that are checked
> out to clients).

Thank you for your reply.

My program does its work once a minute, and I had also configured the
idle eviction to run once a minute.

Just to see what it would do, I slightly changed the run interval for
the idle eviction -- for one of the object pools, adding 943
milliseconds, and 944 milliseconds for the other.  I've only seen the
problem happen once since making that change, where before it would
happen several times per day.  This isn't actual evidence, but it is
curious.

Although a network problem is always a possibility, it's not likely. 
It's a gigabit LAN with Cisco switches.  I will trace the network path
and check all the switch logs to see whether that might indeed be the
problem.

> That should not be possible, as once a connection has been checked
> out, it is not eligible for eviction.  And even if it were evicted
> by the pool, the pool would set the DelegatingConnection's closed
> property to true so isValid would return false rather than
> throwing.  It looks like the situation here is that the pool thinks
> the connection is open, but the underlying physical connection has
> been closed.

The code is within a try/catch block that traps all exceptions, so I am
100 percent certain that isValid is not throwing an exception, it's just
returning false.  When I first started investigating my strange alarms,
I couldn't tell what was wrong at all until I decided to add some code
to try a query on the connection that said it wasn't valid.   I fully
expected that query to fail, I was just doing it to try and discover why
isValid returned false.

This is a very recent problem, but the code hasn't had any changes since
well before the problem began.  That does lend credibility to the
possibility of a network problem, which as I said, I will investigate.

> It would be good to see your full pool config and a stack trace of
> the exception above.  You might also look at the server logs to see
> what is happening on the Mysql side.

Tell me exactly what I need to do in order to gather the pool config
you're after.  Are you looking for Java code, or some kind of
information from the objects that I need to gather?  Here's the code
that defines a datasource.  This is the section for the master server, I
have a virtually identical code section for the "main" server, which is
a replication slave on the MySQL side.

  /*
   * Create a datasource (connection pool) for the master database server.
   */
  ConnectionFactory cfMaster = new
DriverManagerConnectionFactory(masterUrl, dbUser, dbPass);
  PoolableConnectionFactory pcfMaster = new
PoolableConnectionFactory(cfMaster, null);
  pcfMaster.setValidationQuery(validationQuery);
  pcfMaster.setValidationQueryTimeout(Const.FIVE_SECONDS / 1000);
  opMaster = new GenericObjectPool<>(pcfMaster);
  opMaster.setMaxWaitMillis(Const.THIRTY_SECONDS);
  opMaster.setMaxIdle(numShards);
  opMaster.setMaxTotal(numShards * 5);
  opMaster.setNumTestsPerEvictionRun(numShards * 5);
  opMaster.setTimeBetweenEvictionRunsMillis(Const.ONE_MINUTE + 943);
  opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
  pcfMaster.setPool(opMaster);
  dsMaster = new PoolingDataSource<>(opMaster);

The numShards value is 6 on the server where I am doing the investigation.

The error log on all my MySQL servers isn't saying anything, but we
haven't changed the logging config, so I don't know if that's expected.

This is the stacktrace when I try a query on a connection that has just
failed the isValid check:

WARN  - 2017-11-16 13:10:20.229;   418; a; Problem with master DB server
noted.  Additional log may follow.
com.REDACTED.idxbuild.util.BuildException: Arbitrary DB query failed!
        at
com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:743)
        at
com.REDACTED.idxbuild.db.Database.getConnection(Database.java:408)
        at com.REDACTED.idxbuild.config.ConfigItem.load(ConfigItem.java:160)
        at com.REDACTED.idxbuild.config.ConfigDb.load(ConfigDb.java:251)
        at com.REDACTED.idxbuild.index.Chain$Flags.load(Chain.java:320)
        at com.REDACTED.idxbuild.index.Chain.run(Chain.java:2561)
Caused by:
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No
operations allowed after connection closed.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
        at com.mysql.jdbc.Util.getInstance(Util.java:408)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
        at
com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1187)
        at
com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1182)
        at
com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2374)
        at
com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2358)
        at
org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
        at
org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
        at
com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:736)
        ... 5 more

While reviewing the specifics of that stacktrace, I discovered a tiny
bug in my code, where problems talking to the main DB server were being
incorrectly classified as problems with the master server.  I think I've
fixed that problem, but I will have to wait for it to happen again
before I can be sure.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/17/17 12:22 PM, Shawn Heisey wrote:

> On 11/16/2017 5:21 PM, Phil Steitz wrote:
>>> How can a connection that I *just* retrieved from the pool be closed?
>> Should not happen.  In most cases like this, it turns out not to be
>> the pool that closed the connection.  The most common causes of this
>> are network connectivity problems or the server closing the
>> connection due to idle timeout.  The latter seems unlikely given
>> that you have idle eviction configured.  Is it possible that some
>> clients are holding connections for a very long time without using
>> them?  The evictor only acts on connections that are idle in the
>> pool (i.e. it does not do anything to connections that are checked
>> out to clients).
> Thank you for your reply.
>
> My program does its work once a minute, and I had also configured the
> idle eviction to run once a minute.
>
> Just to see what it would do, I slightly changed the run interval for
> the idle eviction -- for one of the object pools, adding 943
> milliseconds, and 944 milliseconds for the other.  I've only seen the
> problem happen once since making that change, where before it would
> happen several times per day.  This isn't actual evidence, but it is
> curious.

See comment below on the config.

>
> Although a network problem is always a possibility, it's not likely. 
> It's a gigabit LAN with Cisco switches.  I will trace the network path
> and check all the switch logs to see whether that might indeed be the
> problem.
>
>> That should not be possible, as once a connection has been checked
>> out, it is not eligible for eviction.  And even if it were evicted
>> by the pool, the pool would set the DelegatingConnection's closed
>> property to true so isValid would return false rather than
>> throwing.  It looks like the situation here is that the pool thinks
>> the connection is open, but the underlying physical connection has
>> been closed.
> The code is within a try/catch block that traps all exceptions, so I am
> 100 percent certain that isValid is not throwing an exception, it's just
> returning false.

OK, sorry I misread your initial post.  That means either the pool
agrees that the connection is closed (which would point to a pool or
DBCP bug) or the driver's isValid() on the underlying physical
connection is returning true (without throwing).  If it's not too
hard to do, it would be good to also log what comes back from
isClosed() on the DelegatingConnection (what dsMaster returns).  If
that returns true, then either this is a DBCP or pool bug or you
have somehow closed the DelegatingConnection via another reference
or something.  If it returns false, that means the pool thinks the
connection is open so it is unlikely that the pool closed it.

>  When I first started investigating my strange alarms,
> I couldn't tell what was wrong at all until I decided to add some code
> to try a query on the connection that said it wasn't valid.   I fully
> expected that query to fail, I was just doing it to try and discover why
> isValid returned false.
>
> This is a very recent problem, but the code hasn't had any changes since
> well before the problem began.  That does lend credibility to the
> possibility of a network problem, which as I said, I will investigate.
>
>> It would be good to see your full pool config and a stack trace of
>> the exception above.  You might also look at the server logs to see
>> what is happening on the Mysql side.
> Tell me exactly what I need to do in order to gather the pool config
> you're after.  Are you looking for Java code, or some kind of
> information from the objects that I need to gather?  Here's the code
> that defines a datasource.

What you have below is fine.  I just wanted to see the pool
configuration settings.  Looks OK, but might end up creating a lot
of connection churn having maxIdle set to 6 with maxTotal at 30.  I
guess the idea is that when it runs, you want to have an idle
connection available for each shard with the ability to add more up
to a total of 30, but you don't want those idle connections to stay
in the pool when the work isn't being done.  Do you know how many
connections actually get opened during the work period?  If that is
more than a few more than 6, you are kind of defeating the purpose
of the pool by closing them all when they get returned (as the
maxIdle setting will force).

The fact that changing the eviction interval to not exactly coincide
with when the work happens makes the problem go away is troubling. 
I can see how that might improve performance, but it does make the
pool bug scenario more likely.

One more question on the config.  Why are you running the evictor
every minute and using eviction to close idle connections?  The
maxIdle setting will keep the connection count down.  If what you
are worried about is server-side idle timeouts, you could use set
testWhileIdle to true and that would use the connections rather than
closing them.

Phil

>  This is the section for the master server, I
> have a virtually identical code section for the "main" server, which is
> a replication slave on the MySQL side.
>
>   /*
>    * Create a datasource (connection pool) for the master database server.
>    */
>   ConnectionFactory cfMaster = new
> DriverManagerConnectionFactory(masterUrl, dbUser, dbPass);
>   PoolableConnectionFactory pcfMaster = new
> PoolableConnectionFactory(cfMaster, null);
>   pcfMaster.setValidationQuery(validationQuery);
>   pcfMaster.setValidationQueryTimeout(Const.FIVE_SECONDS / 1000);
>   opMaster = new GenericObjectPool<>(pcfMaster);
>   opMaster.setMaxWaitMillis(Const.THIRTY_SECONDS);
>   opMaster.setMaxIdle(numShards);
>   opMaster.setMaxTotal(numShards * 5);
>   opMaster.setNumTestsPerEvictionRun(numShards * 5);
>   opMaster.setTimeBetweenEvictionRunsMillis(Const.ONE_MINUTE + 943);
>   opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
>   pcfMaster.setPool(opMaster);
>   dsMaster = new PoolingDataSource<>(opMaster);
>
> The numShards value is 6 on the server where I am doing the investigation.
>
> The error log on all my MySQL servers isn't saying anything, but we
> haven't changed the logging config, so I don't know if that's expected.
>
> This is the stacktrace when I try a query on a connection that has just
> failed the isValid check:
>
> WARN  - 2017-11-16 13:10:20.229;   418; a; Problem with master DB server
> noted.  Additional log may follow.
> com.REDACTED.idxbuild.util.BuildException: Arbitrary DB query failed!
>         at
> com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:743)
>         at
> com.REDACTED.idxbuild.db.Database.getConnection(Database.java:408)
>         at com.REDACTED.idxbuild.config.ConfigItem.load(ConfigItem.java:160)
>         at com.REDACTED.idxbuild.config.ConfigDb.load(ConfigDb.java:251)
>         at com.REDACTED.idxbuild.index.Chain$Flags.load(Chain.java:320)
>         at com.REDACTED.idxbuild.index.Chain.run(Chain.java:2561)
> Caused by:
> com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No
> operations allowed after connection closed.
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>         at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>         at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>         at com.mysql.jdbc.Util.getInstance(Util.java:408)
>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
>         at
> com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1187)
>         at
> com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1182)
>         at
> com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2374)
>         at
> com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2358)
>         at
> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>         at
> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>         at
> com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:736)
>         ... 5 more
>
> While reviewing the specifics of that stacktrace, I discovered a tiny
> bug in my code, where problems talking to the main DB server were being
> incorrectly classified as problems with the master server.  I think I've
> fixed that problem, but I will have to wait for it to happen again
> before I can be sure.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/17/17 3:27 PM, Phil Steitz wrote:

> On 11/17/17 12:22 PM, Shawn Heisey wrote:
>> On 11/16/2017 5:21 PM, Phil Steitz wrote:
>>>> How can a connection that I *just* retrieved from the pool be closed?
>>> Should not happen.  In most cases like this, it turns out not to be
>>> the pool that closed the connection.  The most common causes of this
>>> are network connectivity problems or the server closing the
>>> connection due to idle timeout.  The latter seems unlikely given
>>> that you have idle eviction configured.  Is it possible that some
>>> clients are holding connections for a very long time without using
>>> them?  The evictor only acts on connections that are idle in the
>>> pool (i.e. it does not do anything to connections that are checked
>>> out to clients).
>> Thank you for your reply.
>>
>> My program does its work once a minute, and I had also configured the
>> idle eviction to run once a minute.
>>
>> Just to see what it would do, I slightly changed the run interval for
>> the idle eviction -- for one of the object pools, adding 943
>> milliseconds, and 944 milliseconds for the other.  I've only seen the
>> problem happen once since making that change, where before it would
>> happen several times per day.  This isn't actual evidence, but it is
>> curious.
> See comment below on the config.
>> Although a network problem is always a possibility, it's not likely. 
>> It's a gigabit LAN with Cisco switches.  I will trace the network path
>> and check all the switch logs to see whether that might indeed be the
>> problem.
>>
>>> That should not be possible, as once a connection has been checked
>>> out, it is not eligible for eviction.  And even if it were evicted
>>> by the pool, the pool would set the DelegatingConnection's closed
>>> property to true so isValid would return false rather than
>>> throwing.  It looks like the situation here is that the pool thinks
>>> the connection is open, but the underlying physical connection has
>>> been closed.
>> The code is within a try/catch block that traps all exceptions, so I am
>> 100 percent certain that isValid is not throwing an exception, it's just
>> returning false.
> OK, sorry I misread your initial post.  That means either the pool
> agrees that the connection is closed (which would point to a pool or
> DBCP bug) or the driver's isValid() on the underlying physical
> connection is returning true (without throwing).
Sorry, I meant "false" above.  DelegatingConnection's isValid is just

public boolean isValid(int timeout) throws SQLException {
        if (isClosed()) {
            return false;
        }
        try {
            return _conn.isValid(timeout);
        }
        catch (SQLException e) {
            handleException(e);
            return false;
        }
    }

Note that the validation query is not executed here.  That query is
only used by the pool lifecycle methods when you have testOnBorrow,
testOnReturn or testWhileIdle set to true.  handleExeption as
overridden by PoolableConnection rethrows.  _conn is the underlying
physical connection,

Phil

>   If it's not too
> hard to do, it would be good to also log what comes back from
> isClosed() on the DelegatingConnection (what dsMaster returns).  If
> that returns true, then either this is a DBCP or pool bug or you
> have somehow closed the DelegatingConnection via another reference
> or something.  If it returns false, that means the pool thinks the
> connection is open so it is unlikely that the pool closed it.
>>  When I first started investigating my strange alarms,
>> I couldn't tell what was wrong at all until I decided to add some code
>> to try a query on the connection that said it wasn't valid.   I fully
>> expected that query to fail, I was just doing it to try and discover why
>> isValid returned false.
>>
>> This is a very recent problem, but the code hasn't had any changes since
>> well before the problem began.  That does lend credibility to the
>> possibility of a network problem, which as I said, I will investigate.
>>
>>> It would be good to see your full pool config and a stack trace of
>>> the exception above.  You might also look at the server logs to see
>>> what is happening on the Mysql side.
>> Tell me exactly what I need to do in order to gather the pool config
>> you're after.  Are you looking for Java code, or some kind of
>> information from the objects that I need to gather?  Here's the code
>> that defines a datasource.
> What you have below is fine.  I just wanted to see the pool
> configuration settings.  Looks OK, but might end up creating a lot
> of connection churn having maxIdle set to 6 with maxTotal at 30.  I
> guess the idea is that when it runs, you want to have an idle
> connection available for each shard with the ability to add more up
> to a total of 30, but you don't want those idle connections to stay
> in the pool when the work isn't being done.  Do you know how many
> connections actually get opened during the work period?  If that is
> more than a few more than 6, you are kind of defeating the purpose
> of the pool by closing them all when they get returned (as the
> maxIdle setting will force).
>
> The fact that changing the eviction interval to not exactly coincide
> with when the work happens makes the problem go away is troubling. 
> I can see how that might improve performance, but it does make the
> pool bug scenario more likely.
>
> One more question on the config.  Why are you running the evictor
> every minute and using eviction to close idle connections?  The
> maxIdle setting will keep the connection count down.  If what you
> are worried about is server-side idle timeouts, you could use set
> testWhileIdle to true and that would use the connections rather than
> closing them.
>
> Phil
>>  This is the section for the master server, I
>> have a virtually identical code section for the "main" server, which is
>> a replication slave on the MySQL side.
>>
>>   /*
>>    * Create a datasource (connection pool) for the master database server.
>>    */
>>   ConnectionFactory cfMaster = new
>> DriverManagerConnectionFactory(masterUrl, dbUser, dbPass);
>>   PoolableConnectionFactory pcfMaster = new
>> PoolableConnectionFactory(cfMaster, null);
>>   pcfMaster.setValidationQuery(validationQuery);
>>   pcfMaster.setValidationQueryTimeout(Const.FIVE_SECONDS / 1000);
>>   opMaster = new GenericObjectPool<>(pcfMaster);
>>   opMaster.setMaxWaitMillis(Const.THIRTY_SECONDS);
>>   opMaster.setMaxIdle(numShards);
>>   opMaster.setMaxTotal(numShards * 5);
>>   opMaster.setNumTestsPerEvictionRun(numShards * 5);
>>   opMaster.setTimeBetweenEvictionRunsMillis(Const.ONE_MINUTE + 943);
>>   opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
>>   pcfMaster.setPool(opMaster);
>>   dsMaster = new PoolingDataSource<>(opMaster);
>>
>> The numShards value is 6 on the server where I am doing the investigation.
>>
>> The error log on all my MySQL servers isn't saying anything, but we
>> haven't changed the logging config, so I don't know if that's expected.
>>
>> This is the stacktrace when I try a query on a connection that has just
>> failed the isValid check:
>>
>> WARN  - 2017-11-16 13:10:20.229;   418; a; Problem with master DB server
>> noted.  Additional log may follow.
>> com.REDACTED.idxbuild.util.BuildException: Arbitrary DB query failed!
>>         at
>> com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:743)
>>         at
>> com.REDACTED.idxbuild.db.Database.getConnection(Database.java:408)
>>         at com.REDACTED.idxbuild.config.ConfigItem.load(ConfigItem.java:160)
>>         at com.REDACTED.idxbuild.config.ConfigDb.load(ConfigDb.java:251)
>>         at com.REDACTED.idxbuild.index.Chain$Flags.load(Chain.java:320)
>>         at com.REDACTED.idxbuild.index.Chain.run(Chain.java:2561)
>> Caused by:
>> com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: No
>> operations allowed after connection closed.
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>         at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>         at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>>         at com.mysql.jdbc.Util.getInstance(Util.java:408)
>>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
>>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:897)
>>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:886)
>>         at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
>>         at
>> com.mysql.jdbc.ConnectionImpl.throwConnectionClosedException(ConnectionImpl.java:1187)
>>         at
>> com.mysql.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:1182)
>>         at
>> com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2374)
>>         at
>> com.mysql.jdbc.ConnectionImpl.createStatement(ConnectionImpl.java:2358)
>>         at
>> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>>         at
>> org.apache.commons.dbcp2.DelegatingConnection.createStatement(DelegatingConnection.java:262)
>>         at
>> com.REDACTED.idxbuild.db.Database.runArbitraryQueryOnConnection(Database.java:736)
>>         ... 5 more
>>
>> While reviewing the specifics of that stacktrace, I discovered a tiny
>> bug in my code, where problems talking to the main DB server were being
>> incorrectly classified as problems with the master server.  I think I've
>> fixed that problem, but I will have to wait for it to happen again
>> before I can be sure.
>>
>> Thanks,
>> Shawn
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
In reply to this post by Phil Steitz
On 11/17/2017 3:27 PM, Phil Steitz wrote:

> OK, sorry I misread your initial post.  That means either the pool
> agrees that the connection is closed (which would point to a pool or
> DBCP bug) or the driver's isValid() on the underlying physical
> connection is returning true (without throwing).  If it's not too
> hard to do, it would be good to also log what comes back from
> isClosed() on the DelegatingConnection (what dsMaster returns).  If
> that returns true, then either this is a DBCP or pool bug or you
> have somehow closed the DelegatingConnection via another reference
> or something.  If it returns false, that means the pool thinks the
> connection is open so it is unlikely that the pool closed it.

I'm pretty sure (will have to check) that when my code uses the
connections it gets from the pool, it does do a close on it when
complete, but it's my understanding that this is how to return
connections to the pool, that it doesn't *truly* close the connection as
it would if I were using JDBC directly instead of via DBCP.

I have changed my program to check isClosed instead of making the
"SELECT 1" query on the invalid connection, and to log something if that
returns true.  So far the problem hasn't happened again, even though I
did set the eviction interval back to exactly one minute.

> What you have below is fine.  I just wanted to see the pool
> configuration settings.  Looks OK, but might end up creating a lot
> of connection churn having maxIdle set to 6 with maxTotal at 30.  I
> guess the idea is that when it runs, you want to have an idle
> connection available for each shard with the ability to add more up
> to a total of 30, but you don't want those idle connections to stay
> in the pool when the work isn't being done.  Do you know how many
> connections actually get opened during the work period?  If that is
> more than a few more than 6, you are kind of defeating the purpose
> of the pool by closing them all when they get returned (as the
> maxIdle setting will force).

Here's a screenshot showing the open connections on the DB server at a
particular moment.  The connections in this case are coming from the
server named "idxa1".  There are two copies of the program running --
one where numShards is 2, and one where numShards is 6.  As you can see
in the screenshot, there are twelve connections from idxa1.

https://www.dropbox.com/s/g9swe4hrq19jbcc/mysql-dbcp-connections-workbench.png?dl=0

As the program does its work, it uses one connection to the main server
to gather some information and then it can open up numShards+1
connections to the main server to actually handle updating the Solr
index from the database.  (numShards refers to the number of cold
shards, but there is one more shard in each index -- the hot shard).  On
each cycle, position information is written to the master server.

Because we had to do some work on a DB server a while back, the main
pool and the master pool are both set to the same host -- the master
server.  This is slightly unusual, but that configuration pre-dates the
recent problems with closed connections by quite a bit of time. 
Eventually I will be switching it back so the main server is one of the
slaves, but I would like to figure this problem out first.

> The fact that changing the eviction interval to not exactly coincide
> with when the work happens makes the problem go away is troubling. 
> I can see how that might improve performance, but it does make the
> pool bug scenario more likely.

Do you think I should be doing things differently than I currently do? 
If so, please feel free to make recommendations.  I'm not really sure
what your statement about performance is saying.  I'm not doing eviction
for performance reasons -- in fact, it might actually *reduce*
performance, because it's more likely that "getConnection" is going to
have to actually spin up a new TCP connection to the database.  The time
required for that doesn't worry me.

> One more question on the config.  Why are you running the evictor
> every minute and using eviction to close idle connections?  The
> maxIdle setting will keep the connection count down.  If what you
> are worried about is server-side idle timeouts, you could use set
> testWhileIdle to true and that would use the connections rather than
> closing them.

Because we keep having problems where we have too many connections open
on the database server, and when that happens, everything breaks.  I
want to be absolutely sure that the software I am responsible for is not
contributing to the problem.  At the moment, the massive numbers of
connections are being kept open by our webapps, code that I am not
responsible for.  The idle connection eviction in my programs ensures
that if my software has a connection open, that it's because it has
actually *needed* that connection in the last five minutes.  Our
application developers need to implement something similar in the code
running under Tomcat, so that there are not dozens of connections open
from each webserver with hours and hours of idle time.  I went with one
minute for the eviction interval because if I were to configure it with
a shorter interval, the program would be spending a lot of resources to
keep the connection count down.  Once a minute is (for a program)
relatively infrequent.  I can increase the interval.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
In reply to this post by Phil Steitz
On 11/16/2017 5:21 PM, Phil Steitz wrote:
> That should not be possible, as once a connection has been checked
> out, it is not eligible for eviction.  And even if it were evicted
> by the pool, the pool would set the DelegatingConnection's closed
> property to true so isValid would return false rather than
> throwing.  It looks like the situation here is that the pool thinks
> the connection is open, but the underlying physical connection has
> been closed.

I am very glad to hear you describe the intended operation this way --
the gist of which seems to say that when getConnection is called, it
*should* immediately mark that connection as ineligible for eviction.

My current suspicion is that there is a bug, where getConnection isn't
marking the connection as ineligible like it should, or maybe where
there is insufficient synchronization to prevent the eviction thread
from closing/evicting the connection before getConnection *does* mark
the connection as ineligible.  This is just a theory, but it does fit
what I've seen.  I have tried to investigate the code, but the project
has used object inheritance and abstraction very effectively, which
makes it difficult for me to follow exactly what the code is doing,
because I am not familiar with all of that inheritance.

I am not aware of any way for one thread in my code to close a
connection obtained by another thread.  My design intent is to ensure
that this is completely impossible, by handling the full lifecycle of a
connection in one thread and never passing connections between threads. 
I am not aware of any places in the program that violate this design
intent ... but I've written plenty of bugs in my time, so it's always
possible that I *have* made a mistake like that.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/21/17 11:34 AM, Shawn Heisey wrote:

> On 11/16/2017 5:21 PM, Phil Steitz wrote:
>> That should not be possible, as once a connection has been checked
>> out, it is not eligible for eviction.  And even if it were evicted
>> by the pool, the pool would set the DelegatingConnection's closed
>> property to true so isValid would return false rather than
>> throwing.  It looks like the situation here is that the pool thinks
>> the connection is open, but the underlying physical connection has
>> been closed.
> I am very glad to hear you describe the intended operation this way --
> the gist of which seems to say that when getConnection is called, it
> *should* immediately mark that connection as ineligible for eviction.
>
> My current suspicion is that there is a bug, where getConnection isn't
> marking the connection as ineligible like it should, or maybe where
> there is insufficient synchronization to prevent the eviction thread
> from closing/evicting the connection before getConnection *does* mark
> the connection as ineligible.  This is just a theory, but it does fit
> what I've seen.  I have tried to investigate the code, but the project
> has used object inheritance and abstraction very effectively, which
> makes it difficult for me to follow exactly what the code is doing,
> because I am not familiar with all of that inheritance.

It's always possible that there is a bug. The relevant code is in
commons pool.  The code to look at is in
o.a.c.pool2.impl.GenericObjectPool#evict.  The guard is here:

if (!underTest.startEvictionTest()) {
    // Object was borrowed in another thread
    // Don't count this as an eviction test so reduce i;
    i--;
    continue;
 }

The objects that the pool works with are PooledObjects.  DBCP's
PoolableConnectionFactory creates DefaultPooledObjects, which
implement the method above by returning false unless the object is
idle in the pool.  If it is idle in the pool, its state is changed
to indicate that it is under test.  This method is synchronized (on
the PooledObject's monitor), as are the other methods that read or
mutate its PooledObjectState.

Before returning an object from the pool to a client,
GenericObjectPool's borrowObject calls the PooledObject's allocate
method.  That method, again synchronized on the object, checks to
make sure that the object's state is idle in the pool.  If so, it
changes its state to allocated and before returning it to the client.

As I said above, it's always possible that there is a bug somewhere,
but the setup is simple and it has been hammered pretty hard for a
few years now.
>
> I am not aware of any way for one thread in my code to close a
> connection obtained by another thread.  My design intent is to ensure
> that this is completely impossible, by handling the full lifecycle of a
> connection in one thread and never passing connections between threads. 
> I am not aware of any places in the program that violate this design
> intent ... but I've written plenty of bugs in my time, so it's always
> possible that I *have* made a mistake like that.

As I said in my first response, the most common explanation for this
kind of exception when using DBCP is that the underlying physical
connection is closed on the server (or network) side without DBCP
knowing about it.  And the most common cause of that is server-side
idle connection timeout.

It just occurred to me that since you do not have any of the testXxx
flags set to true, DBCP is never actually exercising the
connections.  Is it possible that sometimes your code checks out a
connection from the pool but does not use it?  "Idle" to DBCP/Pool
means idle in the pool - i.e., not checked out to a client.  So if
connections are checked out and checked back in without being used,
they will not be evicted, but could be timed out by the server.

Phil

>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
In reply to this post by Shawn Heisey
On 11/21/17 10:58 AM, Shawn Heisey wrote:

> On 11/17/2017 3:27 PM, Phil Steitz wrote:
>> OK, sorry I misread your initial post.  That means either the pool
>> agrees that the connection is closed (which would point to a pool or
>> DBCP bug) or the driver's isValid() on the underlying physical
>> connection is returning true (without throwing).  If it's not too
>> hard to do, it would be good to also log what comes back from
>> isClosed() on the DelegatingConnection (what dsMaster returns).  If
>> that returns true, then either this is a DBCP or pool bug or you
>> have somehow closed the DelegatingConnection via another reference
>> or something.  If it returns false, that means the pool thinks the
>> connection is open so it is unlikely that the pool closed it.
> I'm pretty sure (will have to check) that when my code uses the
> connections it gets from the pool, it does do a close on it when
> complete, but it's my understanding that this is how to return
> connections to the pool, that it doesn't *truly* close the connection as
> it would if I were using JDBC directly instead of via DBCP.

That is correct.
>
> I have changed my program to check isClosed instead of making the
> "SELECT 1" query on the invalid connection, and to log something if that
> returns true.  So far the problem hasn't happened again, even though I
> did set the eviction interval back to exactly one minute.

That will tell if something funny is going on.  Having isClosed
return true on a connection that you have just gotten from the pool
(and no other thread has messed with) would be DBCP/Pool badness.

>
>> What you have below is fine.  I just wanted to see the pool
>> configuration settings.  Looks OK, but might end up creating a lot
>> of connection churn having maxIdle set to 6 with maxTotal at 30.  I
>> guess the idea is that when it runs, you want to have an idle
>> connection available for each shard with the ability to add more up
>> to a total of 30, but you don't want those idle connections to stay
>> in the pool when the work isn't being done.  Do you know how many
>> connections actually get opened during the work period?  If that is
>> more than a few more than 6, you are kind of defeating the purpose
>> of the pool by closing them all when they get returned (as the
>> maxIdle setting will force).
> Here's a screenshot showing the open connections on the DB server at a
> particular moment.  The connections in this case are coming from the
> server named "idxa1".  There are two copies of the program running --
> one where numShards is 2, and one where numShards is 6.  As you can see
> in the screenshot, there are twelve connections from idxa1.
>
> https://www.dropbox.com/s/g9swe4hrq19jbcc/mysql-dbcp-connections-workbench.png?dl=0
>
> As the program does its work, it uses one connection to the main server
> to gather some information and then it can open up numShards+1
> connections to the main server to actually handle updating the Solr
> index from the database.  (numShards refers to the number of cold
> shards, but there is one more shard in each index -- the hot shard).  On
> each cycle, position information is written to the master server.
>
> Because we had to do some work on a DB server a while back, the main
> pool and the master pool are both set to the same host -- the master
> server.  This is slightly unusual, but that configuration pre-dates the
> recent problems with closed connections by quite a bit of time. 
> Eventually I will be switching it back so the main server is one of the
> slaves, but I would like to figure this problem out first.
>
>> The fact that changing the eviction interval to not exactly coincide
>> with when the work happens makes the problem go away is troubling. 
>> I can see how that might improve performance, but it does make the
>> pool bug scenario more likely.
> Do you think I should be doing things differently than I currently do? 
> If so, please feel free to make recommendations.  I'm not really sure
> what your statement about performance is saying.  I'm not doing eviction
> for performance reasons -- in fact, it might actually *reduce*
> performance, because it's more likely that "getConnection" is going to
> have to actually spin up a new TCP connection to the database.  The time
> required for that doesn't worry me.

The performance comment was just to point out that if you run the
evictor exactly at the time when the work happens, there is some
contention between the evictor and the borrowing threads.  But this
has really been optimized in pool2, so you probably can't measure
it.  So basically forget that comment.  The fact that the problem
goes away when you eliminate the contention remains troubling from
DBCP/Pool standpoint.

>
>> One more question on the config.  Why are you running the evictor
>> every minute and using eviction to close idle connections?  The
>> maxIdle setting will keep the connection count down.  If what you
>> are worried about is server-side idle timeouts, you could use set
>> testWhileIdle to true and that would use the connections rather than
>> closing them.
> Because we keep having problems where we have too many connections open
> on the database server, and when that happens, everything breaks.  I
> want to be absolutely sure that the software I am responsible for is not
> contributing to the problem.  At the moment, the massive numbers of
> connections are being kept open by our webapps, code that I am not
> responsible for.  The idle connection eviction in my programs ensures
> that if my software has a connection open, that it's because it has
> actually *needed* that connection in the last five minutes.  Our
> application developers need to implement something similar in the code
> running under Tomcat, so that there are not dozens of connections open
> from each webserver with hours and hours of idle time.  I went with one
> minute for the eviction interval because if I were to configure it with
> a shorter interval, the program would be spending a lot of resources to
> keep the connection count down.  Once a minute is (for a program)
> relatively infrequent.  I can increase the interval.

I get what you are trying to do now.  Should not be a problem to run
with this config if connections are dear.  It just means you end up
closing and reopening connections a lot in exchange for having fewer
of them idle at any moment in time.

Phil

>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
In reply to this post by Phil Steitz
On 11/21/2017 1:01 PM, Phil Steitz wrote:
> As I said in my first response, the most common explanation for this
> kind of exception when using DBCP is that the underlying physical
> connection is closed on the server (or network) side without DBCP
> knowing about it.  And the most common cause of that is server-side
> idle connection timeout.

I do have results from the isClosed method when the problem happens. 
That method *does* return true.

I am pretty sure that the server-side idle timeout is at MySQL's default
of 28800 seconds, or 8 hours.  The problem I am experiencing has
happened only a few minutes after starting my program, so in that case,
I would not expect a server-side timeout to happen.

> It just occurred to me that since you do not have any of the testXxx
> flags set to true, DBCP is never actually exercising the
> connections.

I have now enabled the flags for testing on create, borrow, return, and
while idle.  If this does eliminate problems, do you think my theory
might be correct, or that it's probably something else?  The
testWhileIdle flag does seem a little excessive, but I figured I would
go ahead with it.  The "SELECT 1" validation query should always be fast
as long as everything's working right ... and if it's not working right,
I am not going to mind a few seconds passing before the problem is reported.

> Is it possible that sometimes your code checks out a
> connection from the pool but does not use it?

I'm only calling the datasource getConnection in a getConnection method
of my own, and that method is only used in the methods which perform
work on a connection.  It seems unlikely that I am checking out a
connection and never using it, but to be certain I would need to do a
full review of the code in my Database class.  I should probably do that
anyway, but it's a fair amount of work, so I've been avoiding it.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/22/17 9:43 AM, Shawn Heisey wrote:
> On 11/21/2017 1:01 PM, Phil Steitz wrote:
>> As I said in my first response, the most common explanation for this
>> kind of exception when using DBCP is that the underlying physical
>> connection is closed on the server (or network) side without DBCP
>> knowing about it.  And the most common cause of that is server-side
>> idle connection timeout.
> I do have results from the isClosed method when the problem happens. 
> That method *does* return true.

That points to a Pool or DBCP bug, assuming you are sure that no
other thread has a reference to the PoolableConnection or some other
code path did not call close on it before you tested isClosed.  If
you are sure this is not happening, you should open a DBCP JIRA
(which may end up reassigned to pool).  Ideal would be to have a
test case that makes it happen.
>
> I am pretty sure that the server-side idle timeout is at MySQL's default
> of 28800 seconds, or 8 hours.  The problem I am experiencing has
> happened only a few minutes after starting my program, so in that case,
> I would not expect a server-side timeout to happen.

Makes sense, so can rule that out.
>
>> It just occurred to me that since you do not have any of the testXxx
>> flags set to true, DBCP is never actually exercising the
>> connections.
> I have now enabled the flags for testing on create, borrow, return, and
> while idle.  If this does eliminate problems, do you think my theory
> might be correct, or that it's probably something else?

Your theory that it is a pool bug?  If adding those tests makes the
problem go away, that might just be due to timing.

>  The
> testWhileIdle flag does seem a little excessive, but I figured I would
> go ahead with it.  The "SELECT 1" validation query should always be fast
> as long as everything's working right ... and if it's not working right,
> I am not going to mind a few seconds passing before the problem is reported.
>
>> Is it possible that sometimes your code checks out a
>> connection from the pool but does not use it?
> I'm only calling the datasource getConnection in a getConnection method
> of my own, and that method is only used in the methods which perform
> work on a connection.  It seems unlikely that I am checking out a
> connection and never using it, but to be certain I would need to do a
> full review of the code in my Database class.  I should probably do that
> anyway, but it's a fair amount of work, so I've been avoiding it.

Code review is always great ;) but given that the problem happens
close to startup, it is not likely that the connections are being
closed server-side.  The fact that isClosed is returning true when
the problem happens also rules that out - it means that DBCP knows
that the connection is closed. That doesn't happen when server-side
timeouts happen.

>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
In reply to this post by Shawn Heisey
On 11/22/17 9:43 AM, Shawn Heisey wrote:

> On 11/21/2017 1:01 PM, Phil Steitz wrote:
>> As I said in my first response, the most common explanation for this
>> kind of exception when using DBCP is that the underlying physical
>> connection is closed on the server (or network) side without DBCP
>> knowing about it.  And the most common cause of that is server-side
>> idle connection timeout.
> I do have results from the isClosed method when the problem happens. 
> That method *does* return true.
>
> I am pretty sure that the server-side idle timeout is at MySQL's default
> of 28800 seconds, or 8 hours.  The problem I am experiencing has
> happened only a few minutes after starting my program, so in that case,
> I would not expect a server-side timeout to happen.
>
>> It just occurred to me that since you do not have any of the testXxx
>> flags set to true, DBCP is never actually exercising the
>> connections.
> I have now enabled the flags for testing on create, borrow, return, and
> while idle.  If this does eliminate problems, do you think my theory
> might be correct, or that it's probably something else?  The
> testWhileIdle flag does seem a little excessive, but I figured I would
> go ahead with it.  The "SELECT 1" validation query should always be fast
> as long as everything's working right ... and if it's not working right,
> I am not going to mind a few seconds passing before the problem is reported.
>
>> Is it possible that sometimes your code checks out a
>> connection from the pool but does not use it?
> I'm only calling the datasource getConnection in a getConnection method
> of my own, and that method is only used in the methods which perform
> work on a connection.

You're not storing the connection somewhere in a possibly not
thread-safe way, are you? 

Phil

>   It seems unlikely that I am checking out a
> connection and never using it, but to be certain I would need to do a
> full review of the code in my Database class.  I should probably do that
> anyway, but it's a fair amount of work, so I've been avoiding it.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
On 11/22/2017 1:56 PM, Phil Steitz wrote:
> You're not storing the connection somewhere in a possibly not
> thread-safe way, are you?

Not as far as I can tell.  Connections are mostly obtained within my
"Database" class, which has static methods that get called from
elsewhere in my code.  The connection is obtained, used to do some work,
and closed, all within single methods that are very short and do not
spawn threads.  In the two other places where I obtain a connection, the
code is also very well contained.  Connections are never passed between
threads.

The review to verify how I'm obtaining and using connections was not as
time-consuming as I thought it would be.  I can now be certain that
every connection that is obtained *is* being used.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
In reply to this post by Phil Steitz
On 11/22/2017 12:04 PM, Phil Steitz wrote:
> On 11/22/17 9:43 AM, Shawn Heisey wrote:
>> I do have results from the isClosed method when the problem happens. 
>> That method *does* return true.
> That points to a Pool or DBCP bug, assuming you are sure that no
> other thread has a reference to the PoolableConnection or some other
> code path did not call close on it before you tested isClosed.  If
> you are sure this is not happening, you should open a DBCP JIRA
> (which may end up reassigned to pool).  Ideal would be to have a
> test case that makes it happen.

I would absolutely love to have a test case that can reproduce it, but
since I haven't got any idea what the root of the problem is, I wouldn't
know how to write such a test case.

What I'd really like to do is be able to look over dbcp2 and pool2 code
to see if I can spot a problem, but I'm having a hard time following the
code.  I expected to find some kind of synchronization in the code
branching from getConnection() to prevent different threads from being
able to step on each other, but haven't seen any so far.  I can't tell
if this means I'm looking in the wrong place or not.  The object
inheritance is pretty extensive, so I'm having a hard time finding the
right place to look.

If it turns out that there is zero synchronization happening between the
idle eviction thread and the depths of the code for things like
getConnection, then I don't see how any kind of guarantee can be made. 
So far the synchronization object in the eviction thread only seems to
pair with other parts of eviction, NOT with anything else in the library.

If the testXXX flags I've enabled do eliminate the problems I'm seeing,
that's awesome, and it's *A* solution, but I think I'm still running
into some kind of issue that needs to be fixed.  I just need to figure
out whether it's dbcp2, pool2, or something in my own environment.  I'm
willing to entertain the idea that it's my environment, but based on
everything that I understand about my own code and our database servers
(and I fully admit it's circumstantial evidence), it points to a problem
with the idle eviction thread.

I believe you when you say that the *intent* is for idle eviction to
never close/evict a connection that's been requested from the pool.  I
would like to verify whether the intent and what's actually implemented
are the same.  If they're not the same, then I would like to attempt a
patch.  I'm going to need help in figuring out exactly where I should be
looking in the code for dbcp2 and pool2.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Phil Steitz
On 11/22/17 2:43 PM, Shawn Heisey wrote:

> On 11/22/2017 12:04 PM, Phil Steitz wrote:
>> On 11/22/17 9:43 AM, Shawn Heisey wrote:
>>> I do have results from the isClosed method when the problem happens. 
>>> That method *does* return true.
>> That points to a Pool or DBCP bug, assuming you are sure that no
>> other thread has a reference to the PoolableConnection or some other
>> code path did not call close on it before you tested isClosed.  If
>> you are sure this is not happening, you should open a DBCP JIRA
>> (which may end up reassigned to pool).  Ideal would be to have a
>> test case that makes it happen.
> I would absolutely love to have a test case that can reproduce it, but
> since I haven't got any idea what the root of the problem is, I wouldn't
> know how to write such a test case.
>
> What I'd really like to do is be able to look over dbcp2 and pool2 code
> to see if I can spot a problem, but I'm having a hard time following the
> code.  I expected to find some kind of synchronization in the code
> branching from getConnection() to prevent different threads from being
> able to step on each other, but haven't seen any so far.  I can't tell
> if this means I'm looking in the wrong place or not.  The object
> inheritance is pretty extensive, so I'm having a hard time finding the
> right place to look.
>
> If it turns out that there is zero synchronization happening between the
> idle eviction thread and the depths of the code for things like
> getConnection, then I don't see how any kind of guarantee can be made. 
> So far the synchronization object in the eviction thread only seems to
> pair with other parts of eviction, NOT with anything else in the library.
>
> If the testXXX flags I've enabled do eliminate the problems I'm seeing,
> that's awesome, and it's *A* solution, but I think I'm still running
> into some kind of issue that needs to be fixed.  I just need to figure
> out whether it's dbcp2, pool2, or something in my own environment.  I'm
> willing to entertain the idea that it's my environment, but based on
> everything that I understand about my own code and our database servers
> (and I fully admit it's circumstantial evidence), it points to a problem
> with the idle eviction thread.
>
> I believe you when you say that the *intent* is for idle eviction to
> never close/evict a connection that's been requested from the pool.  I
> would like to verify whether the intent and what's actually implemented
> are the same.  If they're not the same, then I would like to attempt a
> patch.  I'm going to need help in figuring out exactly where I should be
> looking in the code for dbcp2 and pool2.

If the problem is the evictor closing a connection and having that
connection delivered to a client, the problem is almost certainly in
pool.  The thread-safety of the pool in this regard is engineered in
DefaultPooledObject, which is the wrapper that pool manages and
delivers to DBCP.  When the evictor visits a PooledObject (in
GenericObjectPool#evict) it tries to start the eviction test on the
object by calling its startEvictionTest method.  This method is
synchronized on the DefaultPooledObject.  Look at the code in that
method.  It checks to make sure that the object is in fact idle in
the pool.  The other half of the protection here is in
GenericObjectPool#borrowObject, which is what PoolingDataSource
calls to get a connection.  That method tries to get a PooledObject
from the pool and before handing it out (or validating it), it calls
the PooledObject's allocate method.  Look at the code for that in
DefaultPooledObject.  That method (also synchronized on the
PooledObject) checks that the object is not under eviction and sets
its state to allocated.  That is the core sync protection that
*should* make it impossible for the evictor to do anything to an
object that has been handed out to a client.

The logical place to start to get a test case that shows this
protection failing is to just set up a pool with very aggressive
eviction config (very small idle object timeout), frequent eviction
runs and a lot of concurrent borrowing.  Make sure the factory's
destroy method does something to simulate what PCF does to mark the
object as dead and see if you get any corpses handed out to
borrowers.  Also make sure that there are enough idle instances in
the pool for the evictor to visit.  For that, you probably want to
vary the borrowing load.  You can set up jmx to observe the pool
stats to see how many are idle at a given time or just log it using
the getNumIdle.  A quick look at the existing pool2 test cases does
not show exactly that scenario covered, so it would be good to add
in any case.

Phil


>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Erwin Hogeweg-2
All -

I am following this thread with some interest because many moons ago I had a weird problem after upgrading from pax-jdbc-0.9.0 to pax-jdbc-1.1.0. Intermittently the DB connection pool would get into a crazy loop and continued to spin off new OS threads until the OS (Windows Server in this case) ran out of threads/handles, at which point the whole application crashed and burned.

I was never able to reliably reproduce this, and it only seemed to happen on Windows OS when many new connections are opened ’simultaneous'. In the end we rolled back to 0.9.0 and life is good ever since.

DB server is mySQL-5.7
Java8

I have NO idea if this is related or not, but it might… so I figure I share the info.


Cheers,

Erwin


On Nov 22, 2017, at 19:00, Phil Steitz <[hidden email]<mailto:[hidden email]>> wrote:

On 11/22/17 2:43 PM, Shawn Heisey wrote:
On 11/22/2017 12:04 PM, Phil Steitz wrote:
On 11/22/17 9:43 AM, Shawn Heisey wrote:
I do have results from the isClosed method when the problem happens.
That method *does* return true.
That points to a Pool or DBCP bug, assuming you are sure that no
other thread has a reference to the PoolableConnection or some other
code path did not call close on it before you tested isClosed.  If
you are sure this is not happening, you should open a DBCP JIRA
(which may end up reassigned to pool).  Ideal would be to have a
test case that makes it happen.
I would absolutely love to have a test case that can reproduce it, but
since I haven't got any idea what the root of the problem is, I wouldn't
know how to write such a test case.

What I'd really like to do is be able to look over dbcp2 and pool2 code
to see if I can spot a problem, but I'm having a hard time following the
code.  I expected to find some kind of synchronization in the code
branching from getConnection() to prevent different threads from being
able to step on each other, but haven't seen any so far.  I can't tell
if this means I'm looking in the wrong place or not.  The object
inheritance is pretty extensive, so I'm having a hard time finding the
right place to look.

If it turns out that there is zero synchronization happening between the
idle eviction thread and the depths of the code for things like
getConnection, then I don't see how any kind of guarantee can be made.
So far the synchronization object in the eviction thread only seems to
pair with other parts of eviction, NOT with anything else in the library.

If the testXXX flags I've enabled do eliminate the problems I'm seeing,
that's awesome, and it's *A* solution, but I think I'm still running
into some kind of issue that needs to be fixed.  I just need to figure
out whether it's dbcp2, pool2, or something in my own environment.  I'm
willing to entertain the idea that it's my environment, but based on
everything that I understand about my own code and our database servers
(and I fully admit it's circumstantial evidence), it points to a problem
with the idle eviction thread.

I believe you when you say that the *intent* is for idle eviction to
never close/evict a connection that's been requested from the pool.  I
would like to verify whether the intent and what's actually implemented
are the same.  If they're not the same, then I would like to attempt a
patch.  I'm going to need help in figuring out exactly where I should be
looking in the code for dbcp2 and pool2.

If the problem is the evictor closing a connection and having that
connection delivered to a client, the problem is almost certainly in
pool.  The thread-safety of the pool in this regard is engineered in
DefaultPooledObject, which is the wrapper that pool manages and
delivers to DBCP.  When the evictor visits a PooledObject (in
GenericObjectPool#evict) it tries to start the eviction test on the
object by calling its startEvictionTest method.  This method is
synchronized on the DefaultPooledObject.  Look at the code in that
method.  It checks to make sure that the object is in fact idle in
the pool.  The other half of the protection here is in
GenericObjectPool#borrowObject, which is what PoolingDataSource
calls to get a connection.  That method tries to get a PooledObject
from the pool and before handing it out (or validating it), it calls
the PooledObject's allocate method.  Look at the code for that in
DefaultPooledObject.  That method (also synchronized on the
PooledObject) checks that the object is not under eviction and sets
its state to allocated.  That is the core sync protection that
*should* make it impossible for the evictor to do anything to an
object that has been handed out to a client.

The logical place to start to get a test case that shows this
protection failing is to just set up a pool with very aggressive
eviction config (very small idle object timeout), frequent eviction
runs and a lot of concurrent borrowing.  Make sure the factory's
destroy method does something to simulate what PCF does to mark the
object as dead and see if you get any corpses handed out to
borrowers.  Also make sure that there are enough idle instances in
the pool for the evictor to visit.  For that, you probably want to
vary the borrowing load.  You can set up jmx to observe the pool
stats to see how many are idle at a given time or just log it using
the getNumIdle.  A quick look at the existing pool2 test cases does
not show exactly that scenario covered, so it would be good to add
in any case.

Phil



Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]<mailto:[hidden email]>
For additional commands, e-mail: [hidden email]<mailto:[hidden email]>




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]<mailto:[hidden email]>
For additional commands, e-mail: [hidden email]<mailto:[hidden email]>

Erwin Hogeweg
CTO
3690 Airport Road
Boca Raton, FL 33431
P. +1 (954) 556-6565
M. +1 (561) 306-7395
F. +1 (561) 948-2730
[Seecago]<http://www.seecago.com>
Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
In reply to this post by Phil Steitz
On 11/22/2017 5:00 PM, Phil Steitz wrote:

> If the problem is the evictor closing a connection and having that
> connection delivered to a client, the problem is almost certainly in
> pool.  The thread-safety of the pool in this regard is engineered in
> DefaultPooledObject, which is the wrapper that pool manages and
> delivers to DBCP.  When the evictor visits a PooledObject (in
> GenericObjectPool#evict) it tries to start the eviction test on the
> object by calling its startEvictionTest method.  This method is
> synchronized on the DefaultPooledObject.  Look at the code in that
> method.  It checks to make sure that the object is in fact idle in
> the pool.  The other half of the protection here is in
> GenericObjectPool#borrowObject, which is what PoolingDataSource
> calls to get a connection.  That method tries to get a PooledObject
> from the pool and before handing it out (or validating it), it calls
> the PooledObject's allocate method.  Look at the code for that in
> DefaultPooledObject.  That method (also synchronized on the
> PooledObject) checks that the object is not under eviction and sets
> its state to allocated.  That is the core sync protection that
> *should* make it impossible for the evictor to do anything to an
> object that has been handed out to a client.

I see the synchronization you're talking about here.  It appears that
all of the critical methods in DefaultPooledObject are synchronized (on
the object).

If you're absolutely certain that DefaultPooledObject is involved with
all of the implementation my code is using, it all looks pretty complete
to me.  So I'm really curious as to why the connection is getting
closed.  I have seen the problem only minutes after restarting my
program, so it seems unlikely that the server side is closing the
connection, since the timeout for that is 8 hours.

I did add the code a while back to test on create, borrow, return, and
while idle, but it turns out that I hadn't actually pulled it down to
the test server and recompiled.  That is now done, so we'll see if that
makes any difference.

If testing the connection on pool actions does make a difference, then
what is your speculation about what was happening when I ran into the
closed connection only minutes after restart, and would it be worthy of
an issue in Jira?  The only theory I had was a race condition between
eviction and borrowing, but unless there's something amiss in how all
the object inheritance works out, it looks like that's probably not it. 
Some kind of issue with the TCP stack in Linux (either on the machines
running my code or the MySQL server) is the only other idea I can think
of.  Or maybe a hardware/firmware issue, since it's likely that at least
one of the NICs involved is doing TCP offload.  I think that virtually
every NIC in our infrastructure has that feature and that Linux enables it.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Charles Honton
Don’t forget about any proxies or routers between the client and the server. Most infrastructure components will close quiescent connections (within minutes)

Chas

> On Jan 8, 2018, at 3:23 PM, Shawn Heisey <[hidden email]> wrote:
>
>> On 11/22/2017 5:00 PM, Phil Steitz wrote:
>> If the problem is the evictor closing a connection and having that
>> connection delivered to a client, the problem is almost certainly in
>> pool.  The thread-safety of the pool in this regard is engineered in
>> DefaultPooledObject, which is the wrapper that pool manages and
>> delivers to DBCP.  When the evictor visits a PooledObject (in
>> GenericObjectPool#evict) it tries to start the eviction test on the
>> object by calling its startEvictionTest method.  This method is
>> synchronized on the DefaultPooledObject.  Look at the code in that
>> method.  It checks to make sure that the object is in fact idle in
>> the pool.  The other half of the protection here is in
>> GenericObjectPool#borrowObject, which is what PoolingDataSource
>> calls to get a connection.  That method tries to get a PooledObject
>> from the pool and before handing it out (or validating it), it calls
>> the PooledObject's allocate method.  Look at the code for that in
>> DefaultPooledObject.  That method (also synchronized on the
>> PooledObject) checks that the object is not under eviction and sets
>> its state to allocated.  That is the core sync protection that
>> *should* make it impossible for the evictor to do anything to an
>> object that has been handed out to a client.
>
> I see the synchronization you're talking about here.  It appears that
> all of the critical methods in DefaultPooledObject are synchronized (on
> the object).
>
> If you're absolutely certain that DefaultPooledObject is involved with
> all of the implementation my code is using, it all looks pretty complete
> to me.  So I'm really curious as to why the connection is getting
> closed.  I have seen the problem only minutes after restarting my
> program, so it seems unlikely that the server side is closing the
> connection, since the timeout for that is 8 hours.
>
> I did add the code a while back to test on create, borrow, return, and
> while idle, but it turns out that I hadn't actually pulled it down to
> the test server and recompiled.  That is now done, so we'll see if that
> makes any difference.
>
> If testing the connection on pool actions does make a difference, then
> what is your speculation about what was happening when I ran into the
> closed connection only minutes after restart, and would it be worthy of
> an issue in Jira?  The only theory I had was a race condition between
> eviction and borrowing, but unless there's something amiss in how all
> the object inheritance works out, it looks like that's probably not it.
> Some kind of issue with the TCP stack in Linux (either on the machines
> running my code or the MySQL server) is the only other idea I can think
> of.  Or maybe a hardware/firmware issue, since it's likely that at least
> one of the NICs involved is doing TCP offload.  I think that virtually
> every NIC in our infrastructure has that feature and that Linux enables it.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

Shawn Heisey
On 1/8/2018 11:19 PM, Chas Honton wrote:
> Don’t forget about any proxies or routers between the client and the server. Most infrastructure components will close quiescent connections (within minutes)

There are no proxies for the MySQL traffic.  The machines involved are
on the same subnet (vlan), so it's only layer 2 switches made by Cisco
that provide the networking -- no routing.  There is no QoS configured. 
The connections are gigabit and the switchports are not incrementing
error counters.  The inter-switch connections are trunk ports and also
show no incrementing error counters.

Cisco switches tend to have enough backplane bandwidth to run every port
at full line rate in both directions.  We don't run our infrastructure
anywhere near full switching capacity, so dropped packets are unlikely,
but even if that did happen, TCP is designed to handle that
transparently to applications.

Cisco routers and switches do not typically interfere in TCP
communication.  Their firewalls do get involved in TCP, but even if this
traffic was passing through our firewall (which it's not), our TCP idle
timeout on the firewall is set to two hours, to accommodate long-lived
TCP connections over the Internet.

Thanks,
Shawn


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [DBCP] Connection just obtained from datasource is invalid

garydgregory
Currious: are you using the current versions of DBCP and Pool?

Gary

On Jan 9, 2018 8:28 AM, "Shawn Heisey" <[hidden email]> wrote:

> On 1/8/2018 11:19 PM, Chas Honton wrote:
> > Don’t forget about any proxies or routers between the client and the
> server. Most infrastructure components will close quiescent connections
> (within minutes)
>
> There are no proxies for the MySQL traffic.  The machines involved are
> on the same subnet (vlan), so it's only layer 2 switches made by Cisco
> that provide the networking -- no routing.  There is no QoS configured.
> The connections are gigabit and the switchports are not incrementing
> error counters.  The inter-switch connections are trunk ports and also
> show no incrementing error counters.
>
> Cisco switches tend to have enough backplane bandwidth to run every port
> at full line rate in both directions.  We don't run our infrastructure
> anywhere near full switching capacity, so dropped packets are unlikely,
> but even if that did happen, TCP is designed to handle that
> transparently to applications.
>
> Cisco routers and switches do not typically interfere in TCP
> communication.  Their firewalls do get involved in TCP, but even if this
> traffic was passing through our firewall (which it's not), our TCP idle
> timeout on the firewall is set to two hours, to accommodate long-lived
> TCP connections over the Internet.
>
> Thanks,
> Shawn
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
12