Strange performance issues seen with Log.isDebugEnabled()

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
Hi, I'm seeing some very strange performance issues around the use of
commons logging, and I'm pretty much out of ideas at this point. (Long)
explanation follows...

First, two code samples (they're from a messaging system, the operation
of which shouldn't be terribly important; also, I removed some state
validation at the top of the method that's irrelevant), the only
difference is that A has two lines commented out:


SAMPLE A:
public void ackMessage(JCSMPXMLMessage msg) {
  long id = msg.getMessageIdLong();
  eAppAckResult ack_r = unackedList.applicationAck(id);
//if (Trace.isDebugEnabled()) {
    Trace.debug(String.format("appack>>> id=%s res=%s", id, ack_r));
//}
  // only increment ack stats if we're sending acks (prevent DUP-ack)
  if (ack_r == eAppAckResult.NOT_FOUND || ack_r == eAppAckResult.OK) {
    JCSMPSessionStats sessionStats = session.getSessionStats();
    sessionStats.incStat(StatType.RELIABLE_MSGS_RECVED_ACKED);
    if (msg.getMessageType().equals(MessageType.PERSISTENT)) {
 
sessionStats.incStat(StatType.RELIABLE_PERSISTENT_MSGS_RECVED_ACKED);
    } else if (msg.getMessageType().equals(MessageType.NON_PERSISTENT))
{
 
sessionStats.incStat(StatType.RELIABLE_NONPERSISTENT_MSGS_RECVED_ACKED);
    }
  }
}


SAMPLE B:
public void ackMessage(JCSMPXMLMessage msg) {
  long id = msg.getMessageIdLong();
  eAppAckResult ack_r = unackedList.applicationAck(id);
  if (Trace.isDebugEnabled()) {
    Trace.debug(String.format("appack>>> id=%s res=%s", id, ack_r));
  }
  // only increment ack stats if we're sending acks (prevent DUP-ack)
  if (ack_r == eAppAckResult.NOT_FOUND || ack_r == eAppAckResult.OK) {
    JCSMPSessionStats sessionStats = session.getSessionStats();
    sessionStats.incStat(StatType.RELIABLE_MSGS_RECVED_ACKED);
    if (msg.getMessageType().equals(MessageType.PERSISTENT)) {
 
sessionStats.incStat(StatType.RELIABLE_PERSISTENT_MSGS_RECVED_ACKED);
    } else if (msg.getMessageType().equals(MessageType.NON_PERSISTENT))
{
 
sessionStats.incStat(StatType.RELIABLE_NONPERSISTENT_MSGS_RECVED_ACKED);
    }
  }
}


The only difference between the two samples is whether or not that
"Trace.isDebugEnabled()" method is called (Trace is a
org.apache.commons.logging.Log). Know that in this scenario of course,
that method returns false (that is, debug logging is disabled, because
we care about throughput here).

You'd naturally expect that SAMPLE A would run slower, because it's
paying the cost of a String.format() on every call, and the logging
level will be checked downstream in .debug(...) anyways. And you'd be
right, if you use nanoTime() to time the execution of that whole logging
block, SAMPLE A averages ~4000ns and SAMPLE B averages ~400ns, just as
you'd expect.

But this is where expectations get thrown out of the window: when you
measure the throughput of the whole message reception system, in, let's
call them arbitrary "work units per second", in A you get 80000 WU/s and
in B you get 30000 WU/s. That's right, I didn't flip those around: the
slower A method, which runs on each message reception, yields more than
double the throughput, and I cannot explain why. Commenting out that
line is the only difference between the two libraries.

Pre-emptive Q/A:

Q) What type of Logger is instantiated?
A) Jdk14Logger, but I'm seeing the same results with my own NO-OP logger
(see below).

Q) Are you sure isDebugEnabled() isn't just ridiculously slow?
A) Yes. Validated by timing the calls, and also running under a
profiler: it uses a negligible amount of time. Also, I completely ripped
out the apache commons logging library and put in my own
"org.apache.commons.logging.Log" implementation in the classpath, where
isDebugEnabled() returns false, and the logging calls such as
.debug(...) are just an empty block. I'm seeing the exact same effect
with my zero-cost methods.

Q) Does isDebugEnabled() acquire any locks?
A) No. Stepped through the whole thing in a debugger, doesn't seem to be
taking any locks. Especially not with my hacked version, which does
nothing.

Q) What is this "Trace" variable?
A) It's a private, static instance of a Logger for the current class:
"private static final Log Trace =
LogFactory.getLog(FlowHandleImpl.class);". Just in case that matters,
I've also tried switching it to a private local instance variable, which
didn't affect the throughput in any way.

Q) How long have you been staring at this?
A) Far. Too. Long. I've narrowed down the difference to THAT ONE LINE.

Q) Anything else?
A) ackMessage() is always called by the same single thread. Also, using
perfmon on Windows shows that when the messaging system with SAMPLE A is
running, it's seeing, globally, 4000 ctx switches/sec. The same
messaging system built with SAMPLE B sees 35000 ctx switches/sec.
Clearly, this is related, but I have no idea why this is happening, and
it's happening even with my hacked Logger implementation that just
consists of "boolean isDebugEnabled() {return false;}".

Conclusion: I'm pretty much out of ideas for explaining this. Clearly
it's not the isDebugEnabled() call itself that's killing us, it's just
having some other weird effect on the system. Which is somehow resulting
in a 60% throughput reduction.

Thanks in advance to anyone that can offer opinions on what's going
on...
Jean-Philippe


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

simon.kitching@chello.at

On Wed, 2008-06-25 at 17:11 -0400, Jean-Philippe Daigle wrote:

> Hi, I'm seeing some very strange performance issues around the use of
> commons logging, and I'm pretty much out of ideas at this point. (Long)
> explanation follows...
>
> First, two code samples (they're from a messaging system, the operation
> of which shouldn't be terribly important; also, I removed some state
> validation at the top of the method that's irrelevant), the only
> difference is that A has two lines commented out:
>
>
> SAMPLE A:
> public void ackMessage(JCSMPXMLMessage msg) {
>   long id = msg.getMessageIdLong();
>   eAppAckResult ack_r = unackedList.applicationAck(id);
> //if (Trace.isDebugEnabled()) {
>     Trace.debug(String.format("appack>>> id=%s res=%s", id, ack_r));
> //}
>   // only increment ack stats if we're sending acks (prevent DUP-ack)
>   if (ack_r == eAppAckResult.NOT_FOUND || ack_r == eAppAckResult.OK) {
>     JCSMPSessionStats sessionStats = session.getSessionStats();
>     sessionStats.incStat(StatType.RELIABLE_MSGS_RECVED_ACKED);
>     if (msg.getMessageType().equals(MessageType.PERSISTENT)) {
>  
> sessionStats.incStat(StatType.RELIABLE_PERSISTENT_MSGS_RECVED_ACKED);
>     } else if (msg.getMessageType().equals(MessageType.NON_PERSISTENT))
> {
>  
> sessionStats.incStat(StatType.RELIABLE_NONPERSISTENT_MSGS_RECVED_ACKED);
>     }
>   }
> }
>
>
> SAMPLE B:
> public void ackMessage(JCSMPXMLMessage msg) {
>   long id = msg.getMessageIdLong();
>   eAppAckResult ack_r = unackedList.applicationAck(id);
>   if (Trace.isDebugEnabled()) {
>     Trace.debug(String.format("appack>>> id=%s res=%s", id, ack_r));
>   }
>   // only increment ack stats if we're sending acks (prevent DUP-ack)
>   if (ack_r == eAppAckResult.NOT_FOUND || ack_r == eAppAckResult.OK) {
>     JCSMPSessionStats sessionStats = session.getSessionStats();
>     sessionStats.incStat(StatType.RELIABLE_MSGS_RECVED_ACKED);
>     if (msg.getMessageType().equals(MessageType.PERSISTENT)) {
>  
> sessionStats.incStat(StatType.RELIABLE_PERSISTENT_MSGS_RECVED_ACKED);
>     } else if (msg.getMessageType().equals(MessageType.NON_PERSISTENT))
> {
>  
> sessionStats.incStat(StatType.RELIABLE_NONPERSISTENT_MSGS_RECVED_ACKED);
>     }
>   }
> }
>
>
> The only difference between the two samples is whether or not that
> "Trace.isDebugEnabled()" method is called (Trace is a
> org.apache.commons.logging.Log). Know that in this scenario of course,
> that method returns false (that is, debug logging is disabled, because
> we care about throughput here).
>
> You'd naturally expect that SAMPLE A would run slower, because it's
> paying the cost of a String.format() on every call, and the logging
> level will be checked downstream in .debug(...) anyways. And you'd be
> right, if you use nanoTime() to time the execution of that whole logging
> block, SAMPLE A averages ~4000ns and SAMPLE B averages ~400ns, just as
> you'd expect.
>
> But this is where expectations get thrown out of the window: when you
> measure the throughput of the whole message reception system, in, let's
> call them arbitrary "work units per second", in A you get 80000 WU/s and
> in B you get 30000 WU/s. That's right, I didn't flip those around: the
> slower A method, which runs on each message reception, yields more than
> double the throughput, and I cannot explain why. Commenting out that
> line is the only difference between the two libraries.
>
> Pre-emptive Q/A:
>
> Q) What type of Logger is instantiated?
> A) Jdk14Logger, but I'm seeing the same results with my own NO-OP logger
> (see below).
>
> Q) Are you sure isDebugEnabled() isn't just ridiculously slow?
> A) Yes. Validated by timing the calls, and also running under a
> profiler: it uses a negligible amount of time. Also, I completely ripped
> out the apache commons logging library and put in my own
> "org.apache.commons.logging.Log" implementation in the classpath, where
> isDebugEnabled() returns false, and the logging calls such as
> .debug(...) are just an empty block. I'm seeing the exact same effect
> with my zero-cost methods.
>
> Q) Does isDebugEnabled() acquire any locks?
> A) No. Stepped through the whole thing in a debugger, doesn't seem to be
> taking any locks. Especially not with my hacked version, which does
> nothing.
>
> Q) What is this "Trace" variable?
> A) It's a private, static instance of a Logger for the current class:
> "private static final Log Trace =
> LogFactory.getLog(FlowHandleImpl.class);". Just in case that matters,
> I've also tried switching it to a private local instance variable, which
> didn't affect the throughput in any way.
>
> Q) How long have you been staring at this?
> A) Far. Too. Long. I've narrowed down the difference to THAT ONE LINE.
>
> Q) Anything else?
> A) ackMessage() is always called by the same single thread. Also, using
> perfmon on Windows shows that when the messaging system with SAMPLE A is
> running, it's seeing, globally, 4000 ctx switches/sec. The same
> messaging system built with SAMPLE B sees 35000 ctx switches/sec.
> Clearly, this is related, but I have no idea why this is happening, and
> it's happening even with my hacked Logger implementation that just
> consists of "boolean isDebugEnabled() {return false;}".
>
> Conclusion: I'm pretty much out of ideas for explaining this. Clearly
> it's not the isDebugEnabled() call itself that's killing us, it's just
> having some other weird effect on the system. Which is somehow resulting
> in a 60% throughput reduction.
>
> Thanks in advance to anyone that can offer opinions on what's going
> on...

Wow. Really weird.

I'll try to find some time to stare at the Log class tomorrow. But
you've been so thorough I'm not sure what I might spot that you haven't
already seen.

I presume you've also just tried
  if (Trace.isDebugEnabled()) {
    // do nothing
  }
and that the performance impact is still there?

Log is just an interface, and you say you have put in your own
implementation that returns immediately (presumably you also wrote a
LogFactory implementation that returned an instance of your Log type?).
So in that case
  Trace.isDebugEnabled()
is exactly equivalent to
  private static final SomeClass Trace = new SomeClass();
  ...
  Trace.anyOldEmptyMethod();
and it doesn't make *any* sense that that would affect performance. So
are you *really* sure that your "replacement" class actually got used?

Regards,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Emmanuel Bourg-3
Could this be related to the JRE used ?

Emmanuel Bourg


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
In reply to this post by simon.kitching@chello.at


> -----Original Message-----
> From: simon [mailto:[hidden email]]
> Sent: Wednesday, June 25, 2008 5:34 PM
> To: Commons Users List
> Subject: Re: Strange performance issues seen with Log.isDebugEnabled()
>
>
> On Wed, 2008-06-25 at 17:11 -0400, Jean-Philippe Daigle wrote:
> > [snip]
>
> Wow. Really weird.
>
Thanks for replying, my thoughts exactly. Looks like gremlins to me.

> I'll try to find some time to stare at the Log class tomorrow. But
> you've been so thorough I'm not sure what I might spot that you
haven't
> already seen.
>
> I presume you've also just tried
>   if (Trace.isDebugEnabled()) {
>     // do nothing
>   }
> and that the performance impact is still there?

Yes. The 60% throughput degradation is there. If I remove that call to
isDebugEnabled() the throughput goes back up.


>
> Log is just an interface, and you say you have put in your own
> implementation that returns immediately (presumably you also wrote a
> LogFactory implementation that returned an instance of your Log
type?).

Sorry, should have said more about this, but my email was already too
long! I wrote three files: Log, LogImpl, and LogFactory. Log and
LogFactory clone the public interface of the equivalent "official"
versions so I don't need to recompile the whole world, just switch out a
jar (but most methods are just empty). LogFactory returns LogImpl
instances, which print out a debug line to stdout when constructed, to
make absolutely sure I'm really using my own version and not the apache
one. I then remove the official commons-logging-1.1.1.jar file from the
classpath, leaving only my version.


> So in that case
>   Trace.isDebugEnabled()
> is exactly equivalent to
>   private static final SomeClass Trace = new SomeClass();
>   ...
>   Trace.anyOldEmptyMethod();
> and it doesn't make *any* sense that that would affect performance. So
> are you *really* sure that your "replacement" class actually got used?
>

Yes, it *should* be equivalent, but I've also tried defining a Dummy
class, getting a static instance of it, and calling a dummy method (that
won't get optimized away) and it's fast (~80000 WU/s). Going back to my
custom isDebugEnabled(), boom, down to ~30000 WU/s. Unless you tell me
that calling a method on an interface is HUGELY more expensive than a
direct call (I don't think it is, and it wouldn't explain my
micro-timings anyways), I have no idea what's going on.

Thanks!
Jean-Philippe Daigle


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
In reply to this post by Emmanuel Bourg-3
> -----Original Message-----
> From: Emmanuel Bourg [mailto:[hidden email]]
> Sent: Wednesday, June 25, 2008 5:57 PM
> To: Commons Users List
> Subject: Re: Strange performance issues seen with Log.isDebugEnabled()
>
> Could this be related to the JRE used ?
>
> Emmanuel Bourg
>

Thanks for the suggestion!

I had thought about that (again, didn't mention it for fear of the email
being too long). I tried running again under JDK 1.6.0_06 (WinXP, dual
Xeon machine), and I'm seeing the same difference.

Original tests were with JDK 1.5.0_09 (<-- blast from the past, but it's
what we support).
Second wave of tests were with JDK 1.6.0_06.

The move to 1.6.0 had a "rising tide floats all boats" effect: both the
slow and the fast test case got faster, but in about the same
proportion, so there's still a large throughput difference between the
two.

Jean-Philippe


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
In reply to this post by Jean-Philippe Daigle-3
> -----Original Message-----
> From: Jean-Philippe Daigle
[mailto:[hidden email]]

> Sent: Wednesday, June 25, 2008 6:06 PM
> To: Commons Users List
> Subject: RE: Strange performance issues seen with Log.isDebugEnabled()
>
>
>
> > -----Original Message-----
> > From: simon [mailto:[hidden email]]
> > Sent: Wednesday, June 25, 2008 5:34 PM
> > To: Commons Users List
> > Subject: Re: Strange performance issues seen with
Log.isDebugEnabled()

> >
> >
> > On Wed, 2008-06-25 at 17:11 -0400, Jean-Philippe Daigle wrote:
> > > [snip]
> >
> > Wow. Really weird.
> >
> Thanks for replying, my thoughts exactly. Looks like gremlins to me.
>
> > I'll try to find some time to stare at the Log class tomorrow. But
> > you've been so thorough I'm not sure what I might spot that you
> haven't
> > already seen.
> >
> > I presume you've also just tried
> >   if (Trace.isDebugEnabled()) {
> >     // do nothing
> >   }
> > and that the performance impact is still there?
>
> Yes. The 60% throughput degradation is there. If I remove that call to
> isDebugEnabled() the throughput goes back up.
>
>
> >
> > Log is just an interface, and you say you have put in your own
> > implementation that returns immediately (presumably you also wrote a
> > LogFactory implementation that returned an instance of your Log
> type?).
>
> Sorry, should have said more about this, but my email was already too
> long! I wrote three files: Log, LogImpl, and LogFactory. Log and
> LogFactory clone the public interface of the equivalent "official"
> versions so I don't need to recompile the whole world, just switch out
a
> jar (but most methods are just empty). LogFactory returns LogImpl
> instances, which print out a debug line to stdout when constructed, to
> make absolutely sure I'm really using my own version and not the
apache
> one. I then remove the official commons-logging-1.1.1.jar file from
the

> classpath, leaving only my version.
>
>
> > So in that case
> >   Trace.isDebugEnabled()
> > is exactly equivalent to
> >   private static final SomeClass Trace = new SomeClass();
> >   ...
> >   Trace.anyOldEmptyMethod();
> > and it doesn't make *any* sense that that would affect performance.
So
> > are you *really* sure that your "replacement" class actually got
used?
> >
>
> Yes, it *should* be equivalent, but I've also tried defining a Dummy
> class, getting a static instance of it, and calling a dummy method
(that
> won't get optimized away) and it's fast (~80000 WU/s). Going back to
my
> custom isDebugEnabled(), boom, down to ~30000 WU/s. Unless you tell me
> that calling a method on an interface is HUGELY more expensive than a
> direct call (I don't think it is, and it wouldn't explain my
> micro-timings anyways), I have no idea what's going on.
>
> Thanks!
> Jean-Philippe Daigle
>

Ah, so there's one more thing I hadn't tried, which I just did, and
explained a bit of the situation: completely removing the logging block
(don't check isDebugEnabled(), don't call .debug(), etc.

Removing the whole logging block results in getting the same slow
throughput case that I was blaming on isDebugEnabled().

Replacing it with just this assignment (to a public var so it's not
optimized away):

   _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);

Gives me the fast throughput case. Wow. So apache commons logging is
blameless, what determines whether you get the fast or the slow message
throughput case is actually _slowing down the thread calling the
ackMessage() method_. Slowing down this frequently-called method with
useless String.format() work results in more than double the throughput.

I have no idea why (it's counterintuitive), I suspect I need to get more
visibility into the interactions between threads here.

Thanks to everyone that read this thread, looks like the problem's
somewhere else in my code. (Though if you DO have any suggestions that
could explain what I'm seeing, let me know, even if this is no longer a
logging issue :) )

Jean-Philippe


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

sebb-2-2
On 26/06/2008, Jean-Philippe Daigle
<[hidden email]> wrote:

> > -----Original Message-----
>  > From: Jean-Philippe Daigle
>  [mailto:[hidden email]]
>  > Sent: Wednesday, June 25, 2008 6:06 PM
>  > To: Commons Users List
>
> > Subject: RE: Strange performance issues seen with Log.isDebugEnabled()
>  >
>  >
>  >
>  > > -----Original Message-----
>  > > From: simon [mailto:[hidden email]]
>  > > Sent: Wednesday, June 25, 2008 5:34 PM
>  > > To: Commons Users List
>  > > Subject: Re: Strange performance issues seen with
>  Log.isDebugEnabled()
>  > >
>  > >
>  > > On Wed, 2008-06-25 at 17:11 -0400, Jean-Philippe Daigle wrote:
>  > > > [snip]
>  > >
>  > > Wow. Really weird.
>  > >
>  > Thanks for replying, my thoughts exactly. Looks like gremlins to me.
>  >
>  > > I'll try to find some time to stare at the Log class tomorrow. But
>  > > you've been so thorough I'm not sure what I might spot that you
>  > haven't
>  > > already seen.
>  > >
>  > > I presume you've also just tried
>  > >   if (Trace.isDebugEnabled()) {
>  > >     // do nothing
>  > >   }
>  > > and that the performance impact is still there?
>  >
>  > Yes. The 60% throughput degradation is there. If I remove that call to
>  > isDebugEnabled() the throughput goes back up.
>  >
>  >
>  > >
>  > > Log is just an interface, and you say you have put in your own
>  > > implementation that returns immediately (presumably you also wrote a
>  > > LogFactory implementation that returned an instance of your Log
>  > type?).
>  >
>  > Sorry, should have said more about this, but my email was already too
>  > long! I wrote three files: Log, LogImpl, and LogFactory. Log and
>  > LogFactory clone the public interface of the equivalent "official"
>  > versions so I don't need to recompile the whole world, just switch out
>  a
>  > jar (but most methods are just empty). LogFactory returns LogImpl
>  > instances, which print out a debug line to stdout when constructed, to
>  > make absolutely sure I'm really using my own version and not the
>  apache
>  > one. I then remove the official commons-logging-1.1.1.jar file from
>  the
>  > classpath, leaving only my version.
>  >
>  >
>  > > So in that case
>  > >   Trace.isDebugEnabled()
>  > > is exactly equivalent to
>  > >   private static final SomeClass Trace = new SomeClass();
>  > >   ...
>  > >   Trace.anyOldEmptyMethod();
>  > > and it doesn't make *any* sense that that would affect performance.
>  So
>  > > are you *really* sure that your "replacement" class actually got
>  used?
>  > >
>  >
>  > Yes, it *should* be equivalent, but I've also tried defining a Dummy
>  > class, getting a static instance of it, and calling a dummy method
>  (that
>  > won't get optimized away) and it's fast (~80000 WU/s). Going back to
>  my
>  > custom isDebugEnabled(), boom, down to ~30000 WU/s. Unless you tell me
>  > that calling a method on an interface is HUGELY more expensive than a
>  > direct call (I don't think it is, and it wouldn't explain my
>  > micro-timings anyways), I have no idea what's going on.
>  >
>  > Thanks!
>  > Jean-Philippe Daigle
>  >
>
>
> Ah, so there's one more thing I hadn't tried, which I just did, and
>  explained a bit of the situation: completely removing the logging block
>  (don't check isDebugEnabled(), don't call .debug(), etc.
>
>  Removing the whole logging block results in getting the same slow
>  throughput case that I was blaming on isDebugEnabled().
>
>  Replacing it with just this assignment (to a public var so it's not
>  optimized away):
>
>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>
>  Gives me the fast throughput case. Wow. So apache commons logging is
>  blameless, what determines whether you get the fast or the slow message
>  throughput case is actually _slowing down the thread calling the
>  ackMessage() method_. Slowing down this frequently-called method with
>  useless String.format() work results in more than double the throughput.
>
>  I have no idea why (it's counterintuitive), I suspect I need to get more
>  visibility into the interactions between threads here.
>
>  Thanks to everyone that read this thread, looks like the problem's
>  somewhere else in my code. (Though if you DO have any suggestions that
>  could explain what I'm seeing, let me know, even if this is no longer a
>  logging issue :) )
>

What happens if you simplify the format to remove the ack_r parameter?
Is it possible that the toString() method on that parameter is having an effect?

Or perhaps try replacing the format call with a short sleep.

Maybe there is some odd thread interaction that is worse when the
enclosing method runs faster.

Or perhaps the format code is touching a volatile variable or using a
lock that allows a shared variable to be published?

>
>  Jean-Philippe
>
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
sebb wrote:

> On 26/06/2008, Jean-Philippe Daigle
> <[hidden email]> wrote:
>  
>>> -----Original Message-----
>>>      
>>  > From: Jean-Philippe Daigle
>>  [mailto:[hidden email]]
>>  > Sent: Wednesday, June 25, 2008 6:06 PM
>>  > To: Commons Users List
>>
>>    
>>> Subject: RE: Strange performance issues seen with Log.isDebugEnabled()
>>>      
>>  >
>>  >
>>  >
>>  > > -----Original Message-----
>>  > > From: simon [mailto:[hidden email]]
>>  > > Sent: Wednesday, June 25, 2008 5:34 PM
>>  > > To: Commons Users List
>>  > > Subject: Re: Strange performance issues seen with
>>  Log.isDebugEnabled()
>>  > >
>>  > >
>>  > > On Wed, 2008-06-25 at 17:11 -0400, Jean-Philippe Daigle wrote:
>>  > > > [snip]
>>  > >
>>  > > Wow. Really weird.
>>  > >
>>  > Thanks for replying, my thoughts exactly. Looks like gremlins to me.
>>  >
>>  > > I'll try to find some time to stare at the Log class tomorrow. But
>>  > > you've been so thorough I'm not sure what I might spot that you
>>  > haven't
>>  > > already seen.
>>  > >
>>  > > I presume you've also just tried
>>  > >   if (Trace.isDebugEnabled()) {
>>  > >     // do nothing
>>  > >   }
>>  > > and that the performance impact is still there?
>>  >
>>  > Yes. The 60% throughput degradation is there. If I remove that call to
>>  > isDebugEnabled() the throughput goes back up.
>>  >
>>  >
>>  > >
>>  > > Log is just an interface, and you say you have put in your own
>>  > > implementation that returns immediately (presumably you also wrote a
>>  > > LogFactory implementation that returned an instance of your Log
>>  > type?).
>>  >
>>  > Sorry, should have said more about this, but my email was already too
>>  > long! I wrote three files: Log, LogImpl, and LogFactory. Log and
>>  > LogFactory clone the public interface of the equivalent "official"
>>  > versions so I don't need to recompile the whole world, just switch out
>>  a
>>  > jar (but most methods are just empty). LogFactory returns LogImpl
>>  > instances, which print out a debug line to stdout when constructed, to
>>  > make absolutely sure I'm really using my own version and not the
>>  apache
>>  > one. I then remove the official commons-logging-1.1.1.jar file from
>>  the
>>  > classpath, leaving only my version.
>>  >
>>  >
>>  > > So in that case
>>  > >   Trace.isDebugEnabled()
>>  > > is exactly equivalent to
>>  > >   private static final SomeClass Trace = new SomeClass();
>>  > >   ...
>>  > >   Trace.anyOldEmptyMethod();
>>  > > and it doesn't make *any* sense that that would affect performance.
>>  So
>>  > > are you *really* sure that your "replacement" class actually got
>>  used?
>>  > >
>>  >
>>  > Yes, it *should* be equivalent, but I've also tried defining a Dummy
>>  > class, getting a static instance of it, and calling a dummy method
>>  (that
>>  > won't get optimized away) and it's fast (~80000 WU/s). Going back to
>>  my
>>  > custom isDebugEnabled(), boom, down to ~30000 WU/s. Unless you tell me
>>  > that calling a method on an interface is HUGELY more expensive than a
>>  > direct call (I don't think it is, and it wouldn't explain my
>>  > micro-timings anyways), I have no idea what's going on.
>>  >
>>  > Thanks!
>>  > Jean-Philippe Daigle
>>  >
>>
>>
>> Ah, so there's one more thing I hadn't tried, which I just did, and
>>  explained a bit of the situation: completely removing the logging block
>>  (don't check isDebugEnabled(), don't call .debug(), etc.
>>
>>  Removing the whole logging block results in getting the same slow
>>  throughput case that I was blaming on isDebugEnabled().
>>
>>  Replacing it with just this assignment (to a public var so it's not
>>  optimized away):
>>
>>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>>
>>  Gives me the fast throughput case. Wow. So apache commons logging is
>>  blameless, what determines whether you get the fast or the slow message
>>  throughput case is actually _slowing down the thread calling the
>>  ackMessage() method_. Slowing down this frequently-called method with
>>  useless String.format() work results in more than double the throughput.
>>
>>  I have no idea why (it's counterintuitive), I suspect I need to get more
>>  visibility into the interactions between threads here.
>>
>>  Thanks to everyone that read this thread, looks like the problem's
>>  somewhere else in my code. (Though if you DO have any suggestions that
>>  could explain what I'm seeing, let me know, even if this is no longer a
>>  logging issue :) )
>>
>>    
>
> What happens if you simplify the format to remove the ack_r parameter?
> Is it possible that the toString() method on that parameter is having an effect?
>
> Or perhaps try replacing the format call with a short sleep.
>
>  
Removing ack_r wouldn't change the timing IMHO, it's just an enum.

I think replacing it with a sleep wouldn't tell me much for testing -
this method gets called tens of thousands of times a second, and the
shortest sleep granularity is something like 16ms. Yielding for 16ms at
a time would be pretty much like not having this thread run at all, as
far as the others are concerned.
> Maybe there is some odd thread interaction that is worse when the
> enclosing method runs faster.
>  
Bingo - The "odd thread interaction" conclusion is correct, I think.
This particular method running faster results in a catastrophe for the
system as a whole, for a reason I currently don't understand but will
have to figure out.
> Or perhaps the format code is touching a volatile variable or using a
> lock that allows a shared variable to be published?
>  
Yeah, touching a volatile would force synchronization of the thread's
memory AFAIK, and could explain some timing weirdness, but since the
arguments are a long and an enum, I don't think this can be happening.



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

simon.kitching@chello.at
In reply to this post by Jean-Philippe Daigle-3
Jean-Philippe Daigle schrieb:
> Replacing it with just this assignment (to a public var so it's not
> optimized away):
>
>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>
> Gives me the fast throughput case.

I wasn't too surprised to hear that your custom Log implementation was
causing a difference when it was doing IO (writing the message out). Any
IO operation requires a call into the operating system kernel which can
then trigger a context switch.

But the fact that you see a difference just when calling String.format
is rather surprising. I've had a brief look at the implementation of
String.format and don't see any operations that would require a call to
the kernel, or thread synchronization operations.

Nevertheless it does look like there is some thread scheduling issue
involved here, and that something that String.format call is doing is
affecting which threads are running and which are blocked, and that in
turn is affecting the throughput of your application.

And of course you *are* running this on Windows.

Good luck trying to tune this; if it is indeed a thread-scheduling issue
then you may well find that changes to OS version, JVM version, and
apparently unrelated parts of the app cause performance to alter.

Regards,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Strange performance issues seen with Log.isDebugEnabled()

Jörg Schaible-2
In reply to this post by Jean-Philippe Daigle-3
Jean-Philippe Daigle wrote:
[snip]

>
> Ah, so there's one more thing I hadn't tried, which I just did, and
> explained a bit of the situation: completely removing the
> logging block
> (don't check isDebugEnabled(), don't call .debug(), etc.
>
> Removing the whole logging block results in getting the same slow
> throughput case that I was blaming on isDebugEnabled().
>
> Replacing it with just this assignment (to a public var so it's not
> optimized away):
>
>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>
> Gives me the fast throughput case. Wow. So apache commons logging is
> blameless, what determines whether you get the fast or the
> slow message
> throughput case is actually _slowing down the thread calling the
> ackMessage() method_. Slowing down this frequently-called method with
> useless String.format() work results in more than double the
> throughput.
>
> I have no idea why (it's counterintuitive), I suspect I need
> to get more
> visibility into the interactions between threads here.
>
> Thanks to everyone that read this thread, looks like the problem's
> somewhere else in my code. (Though if you DO have any suggestions that
> could explain what I'm seeing, let me know, even if this is
> no longer a
> logging issue :) )

Can you stip it down to detect the real call that is responsible for the speed-up?

1/ _dbg_str = ack_r.toString();

 or

2/ _dbg_str = String.valueOf(id);

 or even

3/ ack_r.toString();

alone?

What's implemented with ack_r.toString() ? What kind of implementation is ack_r ? A proxy ?

- Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

simon.kitching@chello.at
Jörg Schaible schrieb:

> Jean-Philippe Daigle wrote:
> [snip]
>  
>> Ah, so there's one more thing I hadn't tried, which I just did, and
>> explained a bit of the situation: completely removing the
>> logging block
>> (don't check isDebugEnabled(), don't call .debug(), etc.
>>
>> Removing the whole logging block results in getting the same slow
>> throughput case that I was blaming on isDebugEnabled().
>>
>> Replacing it with just this assignment (to a public var so it's not
>> optimized away):
>>
>>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>>
>> Gives me the fast throughput case. Wow. So apache commons logging is
>> blameless, what determines whether you get the fast or the
>> slow message
>> throughput case is actually _slowing down the thread calling the
>> ackMessage() method_. Slowing down this frequently-called method with
>> useless String.format() work results in more than double the
>> throughput.
>>
>> I have no idea why (it's counterintuitive), I suspect I need
>> to get more
>> visibility into the interactions between threads here.
>>
>> Thanks to everyone that read this thread, looks like the problem's
>> somewhere else in my code. (Though if you DO have any suggestions that
>> could explain what I'm seeing, let me know, even if this is
>> no longer a
>> logging issue :) )
>>    
>
> Can you stip it down to detect the real call that is responsible for the speed-up?
>
> 1/ _dbg_str = ack_r.toString();
>
>  or
>
> 2/ _dbg_str = String.valueOf(id);
>
>  or even
>
> 3/ ack_r.toString();
>
> alone?
>
> What's implemented with ack_r.toString() ? What kind of implementation is ack_r ? A proxy ?
>  

Yes, good point. I missed that ack_r.toString will be triggered.

I wonder what happens if you replace the String.format call with just this:
  synchronized(this) {
    ++dummy;
  }
where dummy is a public int member, just so the synchronized block isn't
optimised away. The synchronization operation will have a number of
side-effects, including entering the OS kernel, flushing CPU caches and
allowing the OS scheduler to run.

Regards,
Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
[hidden email] wrote:

> Jörg Schaible schrieb:
>> Jean-Philippe Daigle wrote:
>> [snip]
>>  
>>> Ah, so there's one more thing I hadn't tried, which I just did, and
>>> explained a bit of the situation: completely removing the
>>> logging block
>>> (don't check isDebugEnabled(), don't call .debug(), etc.
>>>
>>> Removing the whole logging block results in getting the same slow
>>> throughput case that I was blaming on isDebugEnabled().
>>>
>>> Replacing it with just this assignment (to a public var so it's not
>>> optimized away):
>>>    _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
>>>
>>> Gives me the fast throughput case. Wow. So apache commons logging is
>>> blameless, what determines whether you get the fast or the
>>> slow message
>>> throughput case is actually _slowing down the thread calling the
>>> ackMessage() method_. Slowing down this frequently-called method with
>>> useless String.format() work results in more than double the
>>> throughput.
>>> I have no idea why (it's counterintuitive), I suspect I need
>>> to get more
>>> visibility into the interactions between threads here.
>>>
>>> Thanks to everyone that read this thread, looks like the problem's
>>> somewhere else in my code. (Though if you DO have any suggestions that
>>> could explain what I'm seeing, let me know, even if this is
>>> no longer a
>>> logging issue :) )
>>>    
>>
>> Can you stip it down to detect the real call that is responsible for
>> the speed-up?
>>
>> 1/ _dbg_str = ack_r.toString();
>>
>>  or
>> 2/ _dbg_str = String.valueOf(id);
>>
>>  or even
>>
>> 3/ ack_r.toString();
>>
>> alone?
>>
>> What's implemented with ack_r.toString() ? What kind of
>> implementation is ack_r ? A proxy ?
>>  
>
> Yes, good point. I missed that ack_r.toString will be triggered.
>
> I wonder what happens if you replace the String.format call with just
> this:
>  synchronized(this) {
>    ++dummy;
>  }
> where dummy is a public int member, just so the synchronized block
> isn't optimised away. The synchronization operation will have a number
> of side-effects, including entering the OS kernel, flushing CPU caches
> and allowing the OS scheduler to run.
>
> Regards,
> Simon
>
Thanks, good idea, I'll give that a shot today and see if it gives me
more info. I'll be interested in seeing how forcing anything to happen
in a synchronized block will affect the timing, but as some have
mentioned, this is probably going to end up an OS/VM specific issue so
it's going to be very hard to tune for.

To clarify, since ack_r is an instance of a simple java enum, toString()
just returns the enum's name, which definitely doesn't need any
synchronization and is trivially cheap.

To clarify another point that I've seen made,
> simon>> I wasn't too surprised to hear that your custom Log
> implementation was causing a difference when it was doing IO (writing
> the message out). Any IO operation requires a call into the operating
> system kernel which can then trigger a context switch.
>
> But the fact that you see a difference just when calling String.format
> is rather surprising. [...]
The point of the custom logger was to have no cost of its own, so it
doesn't do any I/O of course (except when first instantiated, to prove
it's really in use), its logging methods are empty. The speed
differences were traced entirely to the String.format().


Jean-Philippe Daigle


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Christopher Schultz-2
In reply to this post by Jean-Philippe Daigle-3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jean-Philippe,

Jean-Philippe Daigle wrote:
| Yes, it *should* be equivalent, but I've also tried defining a Dummy
| class, getting a static instance of it, and calling a dummy method (that
| won't get optimized away) and it's fast (~80000 WU/s). Going back to my
| custom isDebugEnabled(), boom, down to ~30000 WU/s. Unless you tell me
| that calling a method on an interface is HUGELY more expensive than a
| direct call (I don't think it is, and it wouldn't explain my
| micro-timings anyways), I have no idea what's going on.

Interface calls are a /bit/ longer than regular ones, but I wouldn't
have expected them to be twice as long. What JDK, OS, etc. are you using?

Remember that on many OSs, currentTimeNanos can lie to you and not give
you as fine-grained resolution as it should.

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhjlRAACgkQ9CaO5/Lv0PCdCACeJt8ypRrH/fuhPFjENYq1S8KF
wPQAniwoo1PD7ygkGJz3Ab61gKCYkNXn
=4NlD
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

sebb-2-2
In reply to this post by simon.kitching@chello.at
On 26/06/2008, [hidden email] <[hidden email]> wrote:

> Jörg Schaible schrieb:
>
>
> > Jean-Philippe Daigle wrote:
> > [snip]
> >
> >
> > > Ah, so there's one more thing I hadn't tried, which I just did, and
> > > explained a bit of the situation: completely removing the
> > > logging block
> > > (don't check isDebugEnabled(), don't call .debug(), etc.
> > >
> > > Removing the whole logging block results in getting the same slow
> > > throughput case that I was blaming on isDebugEnabled().
> > >
> > > Replacing it with just this assignment (to a public var so it's not
> > > optimized away):
> > >   _dbg_str = String.format("appack>>> id=%s res=%s", id, ack_r);
> > >
> > > Gives me the fast throughput case. Wow. So apache commons logging is
> > > blameless, what determines whether you get the fast or the
> > > slow message
> > > throughput case is actually _slowing down the thread calling the
> > > ackMessage() method_. Slowing down this frequently-called method with
> > > useless String.format() work results in more than double the
> > > throughput.
> > > I have no idea why (it's counterintuitive), I suspect I need
> > > to get more
> > > visibility into the interactions between threads here.
> > >
> > > Thanks to everyone that read this thread, looks like the problem's
> > > somewhere else in my code. (Though if you DO have any suggestions that
> > > could explain what I'm seeing, let me know, even if this is
> > > no longer a
> > > logging issue :) )
> > >
> > >
> >
> > Can you stip it down to detect the real call that is responsible for the
> speed-up?
> >
> > 1/ _dbg_str = ack_r.toString();
> >
> >  or
> > 2/ _dbg_str = String.valueOf(id);
> >
> >  or even
> >
> > 3/ ack_r.toString();
> >
> > alone?
> >
> > What's implemented with ack_r.toString() ? What kind of implementation is
> ack_r ? A proxy ?
> >
> >
>
>  Yes, good point. I missed that ack_r.toString will be triggered.
>
>  I wonder what happens if you replace the String.format call with just this:
>   synchronized(this) {
>    ++dummy;
>   }
>  where dummy is a public int member, just so the synchronized block isn't
> optimised away. The synchronization operation will have a number of
> side-effects, including entering the OS kernel, flushing CPU caches and
> allowing the OS scheduler to run.

Or perhaps:

public volatile long dummy;

...


dummy++;

If you are running Java 1.5+ this will cause memory flushing; not sure
about the effects on the OS scheduler.

>  Regards,
>  Simon
>
>
>
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail:
> [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Mario Ivankovits
In reply to this post by Jean-Philippe Daigle-3
Hi!

>> But the fact that you see a difference just when calling
>> String.format is rather surprising. [...]
> The point of the custom logger was to have no cost of its own, so it
> doesn't do any I/O of course (except when first instantiated, to prove
> it's really in use), its logging methods are empty. The speed
> differences were traced entirely to the String.format().
String.format() at least creates new objects (Formatter) internally,
probably some sort of synchronization happens internally the JVM there
too. Don't konw.

Anther guess: Try a simple "Thread.yield()" (instead of your message
format) to see if it is a threading isse. Maybe some other thread is
blocked due to high cpu usage of your thread!?


Ciao,
Mario


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Strange performance issues seen with Log.isDebugEnabled()

Ralph Goers
In reply to this post by Jean-Philippe Daigle-3


Jean-Philippe Daigle wrote:
> Hi, I'm seeing some very strange performance issues around the use of
> commons logging, and I'm pretty much out of ideas at this point. (Long)
> explanation follows...
>
> <snip>
>  

I suggest getting a copy of YourKit and profiling the application.
Unfortunately, with the behavior you are seeing this might slow the
application just enough to cause the problem to go away.  However, from
your description of the problem it sounds like you have some sort of
contention going on that is avoiding simply by making the processing
take a bit more time. If you take out the debugging code and stress it
hopefully YourKit will show you where the code contention actually is.

A brute force technique I've used in the past (which is fairly tedious)
is just to take a few thread dumps and then look for threads that are in
the same place across the dumps or within the same dump. In a system
without bottlenecks you wouldn't expect to see very many threads at the
same place.

Ralph

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Strange performance issues seen with Log.isDebugEnabled()

Jean-Philippe Daigle-3
> I suggest getting a copy of YourKit and profiling the application.
> Unfortunately, with the behavior you are seeing this might slow the
> application just enough to cause the problem to go away.  However,
from
> your description of the problem it sounds like you have some sort of
> contention going on that is avoiding simply by making the processing
> take a bit more time. If you take out the debugging code and stress it
> hopefully YourKit will show you where the code contention actually is.
>
> A brute force technique I've used in the past (which is fairly
tedious)
> is just to take a few thread dumps and then look for threads that are
in
> the same place across the dumps or within the same dump. In a system
> without bottlenecks you wouldn't expect to see very many threads at
the
> same place.
>
> Ralph
>

Solved!

Running it through the profiler (I have ej-technologies' JProfiler,
which I happen to like better than YourKit) finally allowed me to see
what was going on. I was looking at the whole thing too closely, from a
"what's slowing this particular thread" point of view.

Turns out (I should've suspected from the huge amount of context
switches I noted last week) that two threads were simply blocking each
other, waiting alternately on the same lock, 30000 times a second, due
to one of them doing some lengthy work in a critical section while only
a tiny portion of that work needed to be in the critical section.

The speed boost I was getting by adding the String.format() call was due
to the fact that the useless string work ever-so-slightly changed the
balance of timings in which this "greedy" thread was running, and made
the lock contention happen much less frequently. I fixed the underlying
problem and now I'm always running fast.

Thanks to everyone for the help offered, it was very useful.

Jean-Philippe Daigle
 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]