[pool] New DBCP deadlock reported (DBCP-44)

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[pool] New DBCP deadlock reported (DBCP-44)

Phil Steitz
Pool 1.4 has made DBCP-44 worse. The synchronization changes
implemented to address other issues in pool 1.4 have created more
opportunities for Evictor / client contention for locks on the pool
and factory-related objects.  The stack trace added to DBCP-44 on
27-feb-08 shows a new deadlock.  That particular issue could be
resolved by (re-)combining the last two synchronized blocks in
addObjectToPool, but that has some performance downside and there may
be other exposures.

While the DBCP side of this (DBCP-44) needs work as well, I think we
need to do something like the above to patch pool.  More eyeballs on
this appreciated.  I will open a pool ticket to track once I have done
some more testing.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Christoph Kutzinski
As the 'inversed' locking order in evict() is obviously the source of
the problems, I would try to avoid calling any _factory methods while
holding the 'this' lock.

I thinking about something like this:

public void evict() {
   CursorableLinkedList tmpPool;
   Collection connsToEvict = new ArrayList();
   synchronized(this) {
     tmpPool = new CursorableLinkedList(_pool);
   }

   // checking for conns to evict, calling methods on _factory and adding
   // conns to evict to connsToEvict

   synchronized(this) {
     // remove conns in connsToEvict from _pool
     // ignoring conns that are (currently) not in pool
   }
  }

Obviuosly, this doesn't work with the evictionCursor as it is
implemented now, so it would require some work on that one.

just my 2 cents,
Christoph

Phil Steitz wrote:

> Pool 1.4 has made DBCP-44 worse. The synchronization changes
> implemented to address other issues in pool 1.4 have created more
> opportunities for Evictor / client contention for locks on the pool
> and factory-related objects.  The stack trace added to DBCP-44 on
> 27-feb-08 shows a new deadlock.  That particular issue could be
> resolved by (re-)combining the last two synchronized blocks in
> addObjectToPool, but that has some performance downside and there may
> be other exposures.
>
> While the DBCP side of this (DBCP-44) needs work as well, I think we
> need to do something like the above to patch pool.  More eyeballs on
> this appreciated.  I will open a pool ticket to track once I have done
> some more testing.
>
> Phil
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Phil Steitz
Thanks, Christoph

On Sat, Mar 1, 2008 at 4:21 AM, Christoph Kutzinski <[hidden email]> wrote:
> As the 'inversed' locking order in evict() is obviously the source of
>  the problems, I would try to avoid calling any _factory methods while
>  holding the 'this' lock.

+1
evict and addObject currently violate this and I think both should be fixed.

>
>  I thinking about something like this:
>
>  public void evict() {
>    CursorableLinkedList tmpPool;
>    Collection connsToEvict = new ArrayList();
>    synchronized(this) {
>      tmpPool = new CursorableLinkedList(_pool);
>    }
>
>    // checking for conns to evict, calling methods on _factory and adding
>    // conns to evict to connsToEvict
>
>    synchronized(this) {
>      // remove conns in connsToEvict from _pool
>      // ignoring conns that are (currently) not in pool
>    }
>   }
>
>  Obviuosly, this doesn't work with the evictionCursor as it is
>  implemented now, so it would require some work on that one.
>

I think something like this might work if we can solve:
1)  Maintaining the cycling behavior of the evictionCursor -  will be
tricky, but should be doable.  2)  "ignoring conns that are
(currently) not in pool" in the second synch block - hard to do
efficiently, but again should be doable
3) when to do _factory.destroyObject - can't do this is the
mid-section because borrowObject could jump in and grab the destroyed
object.  Need some way to prevent that, maybe by locking / marking the
ObjectTimeStampPair.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Christoph Kutzinski
Hi Phil,

Phil Steitz wrote:

> Thanks, Christoph
>
> On Sat, Mar 1, 2008 at 4:21 AM, Christoph Kutzinski <[hidden email]> wrote:
>> As the 'inversed' locking order in evict() is obviously the source of
>>  the problems, I would try to avoid calling any _factory methods while
>>  holding the 'this' lock.
>
> +1
> evict and addObject currently violate this and I think both should be fixed.
>
>>  I thinking about something like this:
>>
>>  public void evict() {
>>    CursorableLinkedList tmpPool;
>>    Collection connsToEvict = new ArrayList();
>>    synchronized(this) {
>>      tmpPool = new CursorableLinkedList(_pool);
>>    }
>>
>>    // checking for conns to evict, calling methods on _factory and adding
>>    // conns to evict to connsToEvict
>>
>>    synchronized(this) {
>>      // remove conns in connsToEvict from _pool
>>      // ignoring conns that are (currently) not in pool
>>    }
>>   }
>>
>>  Obviuosly, this doesn't work with the evictionCursor as it is
>>  implemented now, so it would require some work on that one.
>>
>
> I think something like this might work if we can solve:
> 1)  Maintaining the cycling behavior of the evictionCursor -  will be
> tricky, but should be doable.

I haven't looked into the implementation, but I think this shouldn't be
to hard. Especially if one would do only a 'best effort' approach, i.e.
it would be possible that pooled objects are skipped or are repeatedly
checked, a simple index into the _pool should be ok. Or am I missing
something?


>  2)  "ignoring conns that are
> (currently) not in pool" in the second synch block - hard to do
> efficiently, but again should be doable

What problems do you see here? I'm not sure why this shouldn't be
possible to do efficiently

> 3) when to do _factory.destroyObject - can't do this is the
> mid-section because borrowObject could jump in and grab the destroyed
> object.  Need some way to prevent that, maybe by locking / marking the
> ObjectTimeStampPair.

That could be done after the second synchronized block, if we remember
only the successful removed objects in connsToEvict (i.e. removing the
ones which couldn't be removed from _pool)

But that have made me think about a (probably) major flaw in this approach:
It is probably not ok to access a pooled object from within the evictor
thread while it is possibly concurrently accessed from a thread which
borrowed it from the pool (and AFAIS we cannot prevent this scenario
with this approach).
So I also thought about marking the ObjectTimeStampPair as either being
currently 'inEviction' or 'borrowed' or both (volatile boolean fields in
ObjectTimeStampPair). But I fear that this would be vulnerable to
various race conditions.

But I think that it might work if we would protect the pooled objects
via proxy objects (ObjectTimeStampPair could be reused for this) and
only access all factory methods via this one.

Sketch:
static class PooledObjectProxy implements Comparable {
   Object value;
   long tstamp;
   boolean borrowed = false;

   public synchronized setBorrowed(boolean borrowed) ...

   public synchronized boolean validateObject() {
    if(borrowed) {
      return true;
    } else {
      return _factory.validateObject( value );
    }
   }

   public synchronized void activateObject() {
    if(borrowed) {
      return;
    } else {
      _factory.activateObject( value );
    }
   }

   public synchronized void destroyObject() {
    if(borrowed) {
      // TODO: should not happen: throw Exception?
    } else {
      _factory.destroyObject( value );
    }
   }
}

public Object borrowObject() {
   // find PooledObjectProxy
   ...
   pooledObjectProxy.setBorrowed(true);
}

public void returnObject(Object obj) {
   ...
   pooledObjectProxy.setBorrowed(false);
}

However, I still think this might me vulnerable to race conditions,
maybe we also need an 'evicted' boolean in PooledObjectProxy to prevent
borrowing objects which are already designated to be evicted.
Of course this approach has also its performance downside, because every
access to PooledObjectProxy has to be synchronized on it, but it is
probably still better than the alternatives (at least I hope so).

Christoph




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Phil Steitz
On Sun, Mar 2, 2008 at 6:09 AM, Christoph Kutzinski <[hidden email]> wrote:

> Hi Phil,
>
>
>  Phil Steitz wrote:
>  > Thanks, Christoph
>  >
>  > On Sat, Mar 1, 2008 at 4:21 AM, Christoph Kutzinski <[hidden email]> wrote:
>  >> As the 'inversed' locking order in evict() is obviously the source of
>  >>  the problems, I would try to avoid calling any _factory methods while
>  >>  holding the 'this' lock.
>  >
>  > +1
>  > evict and addObject currently violate this and I think both should be fixed.
>  >
>  >>  I thinking about something like this:
>  >>
>  >>  public void evict() {
>  >>    CursorableLinkedList tmpPool;
>  >>    Collection connsToEvict = new ArrayList();
>  >>    synchronized(this) {
>  >>      tmpPool = new CursorableLinkedList(_pool);
>  >>    }
>  >>
>  >>    // checking for conns to evict, calling methods on _factory and adding
>  >>    // conns to evict to connsToEvict
>  >>
>  >>    synchronized(this) {
>  >>      // remove conns in connsToEvict from _pool
>  >>      // ignoring conns that are (currently) not in pool
>  >>    }
>  >>   }
>  >>
>  >>  Obviuosly, this doesn't work with the evictionCursor as it is
>  >>  implemented now, so it would require some work on that one.
>  >>
>  >
>  > I think something like this might work if we can solve:
>  > 1)  Maintaining the cycling behavior of the evictionCursor -  will be
>  > tricky, but should be doable.
>
>  I haven't looked into the implementation, but I think this shouldn't be
>  to hard. Especially if one would do only a 'best effort' approach, i.e.
>  it would be possible that pooled objects are skipped or are repeatedly
>  checked, a simple index into the _pool should be ok. Or am I missing
>  something?
>
No, you are not missing anything.  This should not be hard.
>
>
>  >  2)  "ignoring conns that are
>  > (currently) not in pool" in the second synch block - hard to do
>  > efficiently, but again should be doable
>
>  What problems do you see here? I'm not sure why this shouldn't be
>  possible to do efficiently
>
This gets the the issue you have identified below.

>
>  > 3) when to do _factory.destroyObject - can't do this is the
>  > mid-section because borrowObject could jump in and grab the destroyed
>  > object.  Need some way to prevent that, maybe by locking / marking the
>  > ObjectTimeStampPair.
>
>  That could be done after the second synchronized block, if we remember
>  only the successful removed objects in connsToEvict (i.e. removing the
>  ones which couldn't be removed from _pool)
>
>  But that have made me think about a (probably) major flaw in this approach:
>  It is probably not ok to access a pooled object from within the evictor
>  thread while it is possibly concurrently accessed from a thread which
>  borrowed it from the pool (and AFAIS we cannot prevent this scenario
>  with this approach).
>  So I also thought about marking the ObjectTimeStampPair as either being
>  currently 'inEviction' or 'borrowed' or both (volatile boolean fields in
>  ObjectTimeStampPair). But I fear that this would be vulnerable to
>  various race conditions.
>
What race conditions, exactly?
Not having thought this through all the way, I was thinking about an
integer property that could take one of four values
"BORROWED" - a ghost in the pool snapshot
"IDLE"  - in the pool, OK to borrow
"DEAD" - in the pool, marked for eviction
"VALIDATING" - in the pool, being validated
If we need to synchronize, similarly to AtomicInteger, we could define
a synchronized compareAndSet method that borrowObject, returnObject,
and evict could use to control access.  Synchronization would, as you
point out, cost something, but there should not be much contention for
these locks.  I need to work through the various race conditions,
though, to convince myself that a volatile field properly managed
would not suffice.

Thanks again for your thoughts on this.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Christoph Kutzinski
Phil Steitz wrote:

>>  > 3) when to do _factory.destroyObject - can't do this is the
>>  > mid-section because borrowObject could jump in and grab the destroyed
>>  > object.  Need some way to prevent that, maybe by locking / marking the
>>  > ObjectTimeStampPair.
>>
>>  That could be done after the second synchronized block, if we remember
>>  only the successful removed objects in connsToEvict (i.e. removing the
>>  ones which couldn't be removed from _pool)
>>
>>  But that have made me think about a (probably) major flaw in this approach:
>>  It is probably not ok to access a pooled object from within the evictor
>>  thread while it is possibly concurrently accessed from a thread which
>>  borrowed it from the pool (and AFAIS we cannot prevent this scenario
>>  with this approach).
>>  So I also thought about marking the ObjectTimeStampPair as either being
>>  currently 'inEviction' or 'borrowed' or both (volatile boolean fields in
>>  ObjectTimeStampPair). But I fear that this would be vulnerable to
>>  various race conditions.
>>
> What race conditions, exactly?

I was thinking to naive. I thought about e.g. getting an object from
_pool, checking if inEviction is false and then setting borrowed to
true. That would be of course vulnerable to race conditions.
But your approach with a single int status field and compareAndSet
operations to update the status should be safe.

> Not having thought this through all the way, I was thinking about an
> integer property that could take one of four values
> "BORROWED" - a ghost in the pool snapshot
> "IDLE"  - in the pool, OK to borrow
> "DEAD" - in the pool, marked for eviction
> "VALIDATING" - in the pool, being validated
> If we need to synchronize, similarly to AtomicInteger, we could define
> a synchronized compareAndSet method that borrowObject, returnObject,
> and evict could use to control access.  Synchronization would, as you
> point out, cost something, but there should not be much contention for
> these locks.  I need to work through the various race conditions,
> though, to convince myself that a volatile field properly managed
> would not suffice.

Maybe a single volatile boolean could work. I'm not sure either.
I don't think that a volatile int field would work as you never could
assure - without synchronization - that updates to that field are not
executed out of order.

greetings
Christoph


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [pool] New DBCP deadlock reported (DBCP-44)

Christoph Kutzinski

> > Not having thought this through all the way, I was thinking about an
> > integer property that could take one of four values
> > "BORROWED" - a ghost in the pool snapshot
> > "IDLE"  - in the pool, OK to borrow
> > "DEAD" - in the pool, marked for eviction
> > "VALIDATING" - in the pool, being validated
> > If we need to synchronize, similarly to AtomicInteger, we could define
> > a synchronized compareAndSet method that borrowObject, returnObject,
> > and evict could use to control access.  Synchronization would, as you
> > point out, cost something, but there should not be much contention for
> > these locks.  I need to work through the various race conditions,
> > though, to convince myself that a volatile field properly managed
> > would not suffice.
>
> Maybe a single volatile boolean could work. I'm not sure either.

Uhm, no it can't work. I see it this way: we have to check from either side, if the object was not concurrently marked by the other side. I.e. in borrowObject one has to check if the object is not currently being validated and only then may mark it as being borrowed. And vice versa for the evict method. Without synchronization or CAS this cannot work, being volatile alone is not enough.
Unless I missed something of course and you have a much smarter solution :)

greetings
Christoph

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]