[Math] How fast is fast enough?


Gilles Sadowski
Hi.

Here is a micro-benchmark report (performed with "PerfTestUtils"):
-----
nextInt() (calls per timed block: 2000000, timed blocks: 100, time unit: ms)
                         name  time/call  std dev  total time  ratio    cv  difference
 o.a.c.m.r.JDKRandomGenerator  1.088e-05  2.8e-06  2.1761e+03  1.000  0.26  0.0000e+00
    o.a.c.m.r.MersenneTwister  1.024e-05  1.5e-06  2.0471e+03  0.941  0.15 -1.2900e+02
           o.a.c.m.r.Well512a  1.193e-05  4.4e-07  2.3864e+03  1.097  0.04  2.1032e+02
          o.a.c.m.r.Well1024a  1.348e-05  1.9e-06  2.6955e+03  1.239  0.14  5.1945e+02
         o.a.c.m.r.Well19937a  1.495e-05  2.1e-06  2.9906e+03  1.374  0.14  8.1451e+02
         o.a.c.m.r.Well19937c  1.577e-05  8.8e-07  3.1542e+03  1.450  0.06  9.7816e+02
         o.a.c.m.r.Well44497a  1.918e-05  1.4e-06  3.8363e+03  1.763  0.08  1.6602e+03
         o.a.c.m.r.Well44497b  1.953e-05  2.8e-06  3.9062e+03  1.795  0.14  1.7301e+03
        o.a.c.m.r.ISAACRandom  1.169e-05  1.9e-06  2.3375e+03  1.074  0.16  1.6139e+02
-----
where "cv" is the ratio of the 3rd to the 2nd column.

Questions are:
* How meaningful are micro-benchmarks when the timed operation has a
  very small duration (wrt e.g. the duration of other machine
  instructions that are required to perform them)?
* In a given environment (HW, OS, JVM), is there a lower limit
  (absolute duration) below which anything will be deemed good enough?
* Can a library like CM admit a trade-off between ultimate performance
  and good design?
  IOW, is there an acceptable overhead in exchange for other qualities
  (clarity, non-redundancy, extensibility, etc.)?
* Does ultimate performance for the base functionality (generation of a
  random number) trump any consideration of use-cases that would need an
  extension (of the base functionality, such as computation to match
  another distribution) that will unavoidably degrade the performance
  (hence the micro-benchmark will be completely misleading for those
  users)?
* What are usages of the CM RNGs?
  Do those use-cases strictly forbid "losing" a dozen milliseconds per
  million calls?
  IOW, would those users for whom such a difference matters use CM at
  all?


Thanks,
Gilles




Re: [Math] How fast is fast enough?

Phil Steitz
On 2/4/16 3:59 PM, Gilles wrote:

> Hi.
>
> Here is a micro-benchmark report (performed with "PerfTestUtils"):
> [benchmark table snipped]
>
> Questions are:
> * How meaningful are micro-benchmarks when the timed operation has
> a very
>   small duration (wrt e.g. the duration of other machine
> instructions that
>   are required to perform them)?

It is harder to get good benchmarks for short-duration activities,
but not impossible.  One thing that would be good to do is to
compare these results with JMH [1].
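
For illustration, a minimal JMH harness for one of the generators
could look like this (a sketch; it assumes jmh-core and its annotation
processor, plus commons-math3, on the classpath):
-----
import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.random.MersenneTwister;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@Fork(1)
@State(Scope.Thread)
public class NextIntBenchmark {
    private final MersenneTwister rng = new MersenneTwister(42);

    @Benchmark
    public int nextInt() {
        // Returning the value lets JMH defeat dead-code elimination.
        return rng.nextInt();
    }
}
-----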

> * In a given environment (HW, OS, JVM), is there a lower limit
> (absolute
>   duration) below which anything will be deemed good enough?

That depends completely on the application.

> * Can a library like CM admit a trade-off between ultimate
> performance and
>   good design?   IOW, is there an acceptable overhead in exchange
> for other qualities
>   (clarity, non-redundancy, extensibility, etc.)?

That is too general a question to be meaningful.   We need to look
at specific cases.  What exactly are you proposing?

> * Does ultimate performance for the base functionality (generation
> of a
>   random number) trump any consideration of use-cases that would
> need an
>   extension (of the base functionality, such as computation to
> match another
> distribution) that will unavoidably degrade the performance
> (hence the
>   micro-benchmark will be completely misleading for those users)?

Again, this is vague and the answer depends on what exactly you are
talking about. Significantly damaging performance of PRNG
implementations is a bad idea, unless there are actual practical use
cases you can point to that whatever changes you are proposing enable.
 
> * What are usages of the CM RNGs?
>   Do those use-cases strictly forbid "losing" a dozen
> milliseconds per
>   million calls?

There are many different use cases.  My own applications use them in
simulations to generate random deviates, to generate random hex
strings as identifiers and in stochastic algorithms like some of our
internal uses.  The last case is definitely sensitive to PRNG
performance.

Phil

[1] http://openjdk.java.net/projects/code-tools/jmh/


Re: [Math] How fast is fast enough?

Gilles Sadowski
On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:

> On 2/4/16 3:59 PM, Gilles wrote:
>> Hi.
>>
>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
>> [benchmark table snipped]
>>
>> Questions are:
>> * How meaningful are micro-benchmarks when the timed operation has
>> a very
>>   small duration (wrt e.g. the duration of other machine
>> instructions that
>>   are required to perform them)?
>
> It is harder to get good benchmarks for short-duration activities,
> but not impossible.  One thing that would be good to do is to
> compare these results with JMH [1].

I was expecting insights based on the benchmark which I did run.

We have a tool in CM; if it's wrong, we should remove it.
How its results compare with JMH is an interesting question, I
agree, but I don't have time to make an analysis of benchmarking
tools (on top of what I've been doing since December because
totally innocuous changes in the RNG classes were frowned upon
out of baseless fear).

>> * In a given environment (HW, OS, JVM), is there a lower limit
>> (absolute
>>   duration) below which anything will be deemed good enough?
>
> That depends completely on the application.

Sorry, I thought that it was obvious: I don't speak of applications
that don't care about performance. :-)

For those that do, I do not agree with the statement: the question
relates to finding a point below which it is the environment that
overwhelms the other conditions.
A point where there will be _unavoidable_ overhead (transferring data
from/to memory, JVM book-keeping, ...) and perturbations (context
switches, ...) such that their duration adds a constant time (on
average) that may render most enhancements to an already efficient
algorithm barely noticeable in practice.
Similarly, but in the opposite direction, some language constructs
or design choices might slow down things a bit, but without
endangering any user.

A problem arises when any enhancement to the design is deemed
harmful because it degrades a micro-benchmark, even though that
benchmark may not reflect any real use-cases.
Then, the real harm is against development.

>> * Can a library like CM admit a trade-off between ultimate
>> performance and
>>   good design?   IOW, is there an acceptable overhead in exchange
>> for other qualities
>>   (clarity, non-redundancy, extensibility, etc.)?
>
> That is too general a question to be meaningful.   We need to look
> at specific cases.  What exactly are you proposing?

<rant>
It is quite meaningful even if it refers to general principles.
Those could (should, IMO) be taken into account when managing a
project like CM, on a par with "performance" (whose intrinsic value
is never questioned).
</rant>

Two specific cases are:
* inheritance vs delegation (a.k.a. composition)
* generics (that could require runtime casts)

>> * Does ultimate performance for the base functionality (generation
>> of a
>>   random number) trump any consideration of use-cases that would
>> need an
>>   extension (of the base functionality, such as computation to
>> match another
>>   distribution) that will unavoidably degrade the performance
>> (hence the
>>   micro-benchmark will be completely misleading for those users)?
>
> Again, this is vague and the answer depends on what exactly you are
> talking about. Significantly damaging performance of PRNG
> implementations is a bad idea,

Now, *this* is vague: what do you mean by "significantly"?
That was actually my question in the first place.  Referring to the
benchmark above, people who'd know why they require ultimate
performance
should be able to tell what range of numbers they'd find acceptable in
that table.

<rant>
Actually my questions are very precise, but the answers would require
some decent analysis, rather than the usual "bad idea" dismissal.
</rant>

In the Javadoc of the "random" package, there is information about
performance but no reference as to the benchmarking procedure.

I can consistently observe a totally different behaviour (using
"PerfTestUtils"):
  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
  2. moreover, the ratio *grows* with each of the longer-period
     members of the WELL family (see the above table).

This makes me wonder how someone who purports to need "ultimate"
performance can have any objective basis to determine what is good
or bad for his own applications.

> unless there are actual practical use
> cases you can point to that whatever changes you are proposing
> enable.

As I've explained in great detail in another thread, I've reviewed
(from a design POV) the RNG code in "random" and IMHO, there is room
for improvement (cf. above for what I mean by that term).
<rant>
I have some code ready for review but I had to resort to what I
considered sub-optimal design (preemptively renouncing the idea of
proposing a "delegation"-based design) solely because of the
destructive community process that takes place here.[1]
</rant>

The practical use-case is anything that needs further processing of
the numbers produced according to a uniform distribution: I agree that
there would be little sense to code that latter part in a "pure" OO
way[2].  And Luc made it indeed quite efficient, I think, in the
various concrete classes.
What I want to reconsider is how those concrete low-level algorithms
can be plugged into a higher-level function that just requires a
"source of randomness", as I'd call a provider of "int" (or "long")
values, where the high-level functionality does not care at all about
the provider's inner workings (a.o. how it's seeded!).
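
To fix ideas, here is a minimal sketch of what I mean (hypothetical
names, not a proposed API): the sampler is composed with, rather than
derived from, the source of bits.
-----
/** Hypothetical "source of randomness": all the sampler ever sees. */
interface IntSource {
    int nextInt();
}

/** Gaussian sampling by delegation: no inheritance from any generator. */
final class GaussianSampler {
    private final IntSource source;
    private double cached;
    private boolean hasCached;

    GaussianSampler(IntSource source) {
        this.source = source;
    }

    /** Box-Muller transform on top of uniform doubles built from the source. */
    double next() {
        if (hasCached) {
            hasCached = false;
            return cached;
        }
        final double u1 = uniform();
        final double u2 = uniform();
        final double r = Math.sqrt(-2 * Math.log(u1));
        cached = r * Math.sin(2 * Math.PI * u2);
        hasCached = true;
        return r * Math.cos(2 * Math.PI * u2);
    }

    /** 53 random bits -> double in (0, 1]; never 0, so log(u1) is safe. */
    private double uniform() {
        final long bits = ((long) (source.nextInt() >>> 6) << 27)
                        | (source.nextInt() >>> 5);
        return (bits + 1) * 0x1.0p-53;
    }
}
-----
Any generator then plugs in without the sampler knowing how it is
seeded, e.g. "new GaussianSampler(new java.util.Random(42)::nextInt)".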

A case in point is the sampling of other distributions (namely the
Normal distribution).
Here is the benchmark report:
-----
nextGaussian() (calls per timed block: 2000000, timed blocks: 100, time unit: ms)
                         name  time/call  std dev  total time  ratio    cv  difference
 o.a.c.m.r.JDKRandomGenerator  1.200e-05  1.7e-06  2.4001e+03  1.000  0.14  0.0000e+00
 o.a.c.m.r.JDKRandomGenerator  7.646e-05  5.1e-06  1.5292e+04  6.371  0.07  1.2892e+04
    o.a.c.m.r.MersenneTwister  6.396e-05  3.6e-06  1.2793e+04  5.330  0.06  1.0393e+04
           o.a.c.m.r.Well512a  6.880e-05  5.0e-06  1.3760e+04  5.733  0.07  1.1360e+04
          o.a.c.m.r.Well1024a  6.956e-05  3.0e-06  1.3913e+04  5.797  0.04  1.1513e+04
         o.a.c.m.r.Well19937a  7.262e-05  2.0e-06  1.4525e+04  6.052  0.03  1.2125e+04
         o.a.c.m.r.Well19937c  7.164e-05  4.3e-06  1.4329e+04  5.970  0.06  1.1928e+04
         o.a.c.m.r.Well44497a  8.166e-05  3.2e-06  1.6332e+04  6.804  0.04  1.3931e+04
         o.a.c.m.r.Well44497b  8.259e-05  4.6e-06  1.6518e+04  6.882  0.06  1.4118e+04
        o.a.c.m.r.ISAACRandom  6.724e-05  5.4e-06  1.3449e+04  5.603  0.08  1.1049e+04
-----
where the first line is JDK's "nextInt()" and the remaining are
"nextGaussian()".

The per-generator cost of "nextGaussian()" is thus about 4- to 6-fold
that of "nextInt()".  Thus, degrading the performance of "nextInt()" by
10% would degrade the performance of "nextGaussian()" by at most half
that.
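
Spelled out (back-of-envelope): with $t_{gauss} \approx (4..6)\,t_{int}$
and $k \approx 2$ calls to "nextInt()" per Gaussian deviate (one 53-bit
uniform double each), a fractional slowdown $f_{int}$ dilutes to
$$ f_{gauss} \approx f_{int} \cdot \frac{k\,t_{int}}{t_{gauss}}
   \approx 0.10 \times \frac{2}{4..6} \approx 3\%..5\%. $$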

For a performance discussion to be meaningful, I think that we'd need
to know how that fact would affect, even modestly, any moderately
complex post-processing of the generated values.

Another case, for modularity, would be to consider that other
algorithms could be implemented to provide the required
distribution.[3]
In the current design (inheritance-based), that can only be done by
creating a subclass, even though the core functionality
("nextDouble()") is not overridden.
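
A sketch of that plug-in point (hypothetical names; the ziggurat of [3]
would simply be one more implementation):
-----
import java.util.function.DoubleSupplier;

/** Hypothetical seam: a deviate algorithm over any uniform source. */
interface GaussianAlgorithm {
    double sample(DoubleSupplier nextUniform);
}

/** Marsaglia's polar method: an alternative needing no subclassing. */
final class PolarGaussian implements GaussianAlgorithm {
    @Override
    public double sample(DoubleSupplier u) {
        double x;
        double y;
        double s;
        do { // rejection step: keep (x, y) inside the unit disk
            x = 2 * u.getAsDouble() - 1;
            y = 2 * u.getAsDouble() - 1;
            s = x * x + y * y;
        } while (s >= 1 || s == 0);
        return x * Math.sqrt(-2 * Math.log(s) / s);
    }
}
-----
Usage would be e.g. "new PolarGaussian().sample(rng::nextDouble)" with
"rng" any generator; "nextDouble()" itself is never overridden.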

>> * What are usages of the CM RNGs?
>>   Do those use-cases strictly forbid "losing" a dozen
>> milliseconds per
>>   million calls?
>
> There are many different use cases.  My own applications use them in
> simulations to generate random deviates, to generate random hex
> strings as identifiers and in stochastic algorithms like some of our
> internal uses.  The last case is definitely sensitive to PRNG
> performance.

Thanks for giving examples, but since we are talking about performance,
I was hoping for some concrete numbers, like the relative cost of the
number generation (e.g. the total duration of calls to the
"RandomGenerator" instances wrt the total duration of the application).

I don't know if by "last case", you are referring to code that is
inside CM.  I didn't spot anything that makes "heavy" usage of a
RNG (in the sense that generation would count as a sizable part of
the whole processing).

As I pointed out many times: if an application is severely dependent
on the performance of RNG, the user probably will turn to specific
tools (e.g. GPUs? [4]) rather than use CM.

Conversely, using Java might be preferred for its flexibility, which
is destroyed by a search for ultimate performance (which nobody seems
able to define reasonably).
Performance is not a goal in itself; it should not be a trophy which
sits uselessly on a shelf.

My goal is not to deliberately slow things down; it is to allow some
leeway so that designs which are deemed better (on all counts except,
perhaps, performance) are given a chance to show their strengths, in
particular in areas where performance in absolute terms is "good
enough" for all use-cases which CM should care about (hence the need
of actual data points[5]).


Gilles

[1] "Is it faster?"
     "No."
     "Then, no."
[2] Although that is in some sense what you indirectly defend by
     wanting to stick with a meaningless "next(int bits)" method
     (see the sketch after these notes).
[3] http://www.doornik.com/research/ziggurat.pdf
[4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
[5] Hence the need to agree on a methodology/policy for benchmarking.
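
To be concrete about [2], the derivation layer anchored by
"next(int bits)" is, in outline, the following (patterned after
"java.util.Random"; CM's "BitsStreamGenerator" is analogous):
-----
/** Outline of a bits-stream generator: subclasses supply next(int bits) only. */
abstract class BitsSource {
    /** @return the low {@code bits} random bits, the rest zero (1 <= bits <= 32). */
    protected abstract int next(int bits);

    public int nextInt() {
        return next(32);
    }

    public double nextDouble() {
        // 26 + 27 = 53 significant bits, as in java.util.Random.
        return (((long) next(26) << 27) + next(27)) * 0x1.0p-53;
    }
}
-----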



Re: [Math] How fast is fast enough?

Phil Steitz
On 2/5/16 12:59 PM, Gilles wrote:

> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>> On 2/4/16 3:59 PM, Gilles wrote:
>>> Hi.
>>>
>>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
>>> [benchmark table snipped]
>>>
>>> Questions are:
>>> * How meaningful are micro-benchmarks when the timed operation has
>>> a very
>>>   small duration (wrt e.g. the duration of other machine
>>> instructions that
>>>   are required to perform them)?
>>
>> It is harder to get good benchmarks for short-duration activities,
>> but not impossible.  One thing that would be good to do is to
>> compare these results with JMH [1].
>
> I was expecting insights based on the benchmark which I did run.

You asked whether or not benchmarks are meaningful when the task
being benchmarked is short duration.  I answered that question.
>
> We have a tool in CM; if it's wrong, we should remove it.
> How its results compare with JMH is an interesting question,

I will look into this.

> I
> agree, but I don't have time to make an analysis of benchmarking
> tools (on top of what I've been doing since December because
> totally innocuous changes in the RNG classes were frowned upon
> out of baseless fear).

Please cut the hyperbole.

>
>>> * In a given environment (HW, OS, JVM), is there a lower limit
>>> (absolute
>>>   duration) below which anything will be deemed good enough?
>>
>> That depends completely on the application.
>
> Sorry, I thought that it was obvious: I don't speak of applications
> that don't care about performance. :-)
>
> For those that do, I do not agree with the statement: the question
> relates to finding a point below which it is the environment that
> overwhelms the other conditions.
> A point where there will be _unavoidable_ overhead (transferring data
> from/to memory, JVM book-keeping, ...) and perturbations (context
> switches, ...) such that their duration adds a constant time (on
> average) that may render most enhancements to an already efficient
> algorithm barely noticeable in practice.
> Similarly, but in the opposite direction, some language constructs
> or design choices might slow down things a bit, but without
> endangering any user.
>
> A problem arises when any enhancement to the design is deemed
> harmful because it degrades a micro-benchmark, even though that
> benchmark may not reflect any real use-cases.
> Then, the real harm is against development.
>
>>> * Can a library like CM admit a trade-off between ultimate
>>> performance and
>>>   good design?   IOW, is there an acceptable overhead in exchange
>>> for other qualities
>>>   (clarity, non-redundancy, extensibility, etc.)?
>>
>> That is too general a question to be meaningful.   We need to look
>> at specific cases.  What exactly are you proposing?
>
> <rant>
> It is quite meaningful even if it refers to general principles.
> Those could (should, IMO) be taken into account when managing a
> project like CM, on a par with "performance" (whose intrinsic value
> is never questioned).
> </rant>

Rant all you want.  Vague generalities and hyperbole have no value.
>
> Two specific cases are:
> * inheritance vs delegation (a.k.a. composition)
> * generics (that could require runtime casts)

This is getting closer to meaningful.  Where exactly in the code are
you wanting to use something and seeing benchmark damage?

>
>>> * Does ultimate performance for the base functionality (generation
>>> of a
>>>   random number) trump any consideration of use-cases that would
>>> need an
>>>   extension (of the base functionality, such as computation to
>>> match another
>>>   distribution) that will unavoidably degrade the performance
>>> (hence the
>>>   micro-benchmark will be completely misleading for those users)?
>>
>> Again, this is vague and the answer depends on what exactly you are
>> talking about. Significantly damaging performance of PRNG
>> implementations is a bad idea,
>
> Now, *this* is vague: what do you mean by "significantly"?
> That was actually my question in the first place.

If you are talking about PRNG performance, I would say a 1% hit is
significant.

> Referring to the
> benchmark above, people who'd know why they require ultimate
> performance
> should be able to tell what range of numbers they'd find
> acceptable in
> that table.
>
> <rant>
> Actually my questions are very precise, but the answers would require
> some decent analysis, rather than the usual "bad idea" dismissal.
> </rant>
>
> In the Javadoc of the "random" package, there is information about
> performance but no reference as to the benchmarking procedure.

It would be great to repeat these using JMH, which is emerging as a
de facto standard for Java benchmarking.  I will look into this.

>
> I can consistently observe a totally different behaviour (using
> "PerfTestUtils"):
>  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
>  2. moreover, the ratio *grows* with each of the longer-period
>     members of the WELL family (see the above table).
>
> This makes me wonder how someone who purports to need "ultimate"
> performance can have any objective basis to determine what is good
> or bad for his own applications.
>
>> unless there are actual practical use
>> cases you can point to that whatever changes you are proposing
>> enable.
>
> As I've explained in great detail in another thread, I've
> reviewed (from a design POV) the RNG code in "random" and IMHO, there
> is room for improvement (cf. above for what I mean by that term).
> <rant>
> I have some code ready for review but I had to resort to what I
> considered sub-optimal design (preemptively renouncing the idea of proposing a
> "delegation"-based design) solely because of the destructive
> community
> process that takes place here.[1]
> </rant>

More vague hyperbole that serves no purpose.  Please focus on actual
code or design issues.
>
> The practical use-case is anything that needs further processing of
> the numbers produced according to a uniform distribution:

Isn't that what the samplers in the distributions package do?  What
we need from the PRNG implementations is just blocks of bits.  Since
we wanted a pluggable replacement for j.u.Random, we added uniform
ints, longs, floats and doubles, plus Gaussian deviates.  The samplers
just need uniform doubles.  The practical use case we need is
well-supported in the code we have.  What is missing, exactly?

> I agree that
> there would be little sense to code that latter part in a "pure" OO
> way[2].  And Luc made it indeed quite efficient, I think, in the
> various
> concrete classes.
> What I want to reconsider is how those concrete low-level
> algorithms can
> be plugged in a higher-level function that just requires a "source of
> randomness", as I'd call a provider of "int" (or "long") values,
> where
> the high level functionality does not care at all about the
> provider's
> inner working (a.o. how it's seeded!).

This is why many higher-level samplers and other things that require
random data inside [math] have a pluggable RandomGenerator.
>
> A case in point is the sampling of other distributions (namely the
> Normal distribution).

Or any of the others.  We have a default, inversion-based method
that the abstract distribution classes provide and some pretty good
specialized implementations within individual distributions.  Most
of these just require uniform random doubles as source.
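
In outline, the inversion default amounts to feeding a uniform deviate
through the quantile function (a sketch; assumes commons-math3 on the
classpath):
-----
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class InversionSampling {
    public static void main(String[] args) {
        final RandomGenerator rng = new Well19937c(42);
        final NormalDistribution normal = new NormalDistribution(rng, 0, 1);
        // Generic inversion scheme: u ~ U(0,1), then x = F^-1(u).
        final double u = rng.nextDouble();
        final double x = normal.inverseCumulativeProbability(u);
        // sample() does essentially the same thing internally.
        System.out.println(x + " / " + normal.sample());
    }
}
-----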

>
> Here is the benchmark report:
> [benchmark table snipped]
> where the first line is JDK's "nextInt()" and the remaining are
> "nextGaussian()".
>
> The per-generator cost of "nextGaussian()" is thus about 4- to
> 6-fold that of "nextInt()".  Thus, degrading the performance of
> "nextInt()" by 10% would degrade the performance of "nextGaussian()"
> by at most half that.
>
> For a performance discussion to be meaningful, I think that we'd need
> to know how that fact would affect, even modestly, any moderately
> complex
> post-processing of the generated values.
>
> Another case, for modularity, would be to consider that other
> algorithms could
> be implemented to provide the required distribution.[3]
> In the current design (inheritance-based), that can only be done
> by creating
> a subclass, even though the core functionality ("nextDouble()") is
> not
> overridden.
>
>>> * What are usages of the CM RNGs?
>>>   Do those use-cases strictly forbid "losing" a dozen
>>> milliseconds per
>>>   million calls?
>>
>> There are many different use cases.  My own applications use them in
>> simulations to generate random deviates, to generate random hex
>> strings as identifiers and in stochastic algorithms like some of our
>> internal uses.  The last case is definitely sensitive to PRNG
>> performance.
>
> Thanks for giving examples, but since we are talking about
> performance, I was hoping for some concrete numbers, like the
> relative cost of the number generation (e.g. the total duration of
> calls to the "RandomGenerator" instances wrt the total duration of
> the application).
>
> I don't know if by "last case", you are referring to code that is
> inside CM.  I didn't spot anything that makes "heavy" usage of a
> RNG (in the sense that generation would count as a sizable part of
> the whole processing).

monteCarloP in KolmogorovSmirnovTest is one to check.
>
> As I pointed out many times: if an application is severely dependent
> on the performance of RNG, the user probably will turn to specific
> tools (e.g. GPUs? [4]) rather than use CM.

That is a bogus argument.  We should make our PRNGs simple and fast
so their use can extend to performance-sensitive applications.
>
> Conversely, using Java might be preferred for its flexibility, which
> is destroyed by a search for ultimate performance (which nobody seems
> able to define reasonably).
> Performance is not a goal in itself; it should not be a trophy which
> sits uselessly on a shelf.

Nor should "beautiful design" in the eyes of one person.
>
> My goal is not to deliberately slow things down; it is to allow some
> leeway so that designs which are deemed better (on all counts except,
> perhaps, performance) are given a chance to show their strengths, in
> particular in areas where performance in absolute terms is "good
> enough" for all use-cases which CM should care about (hence the need
> of actual data points[5]).

I see no reason that we can't have it both ways - good design and
good performance.  What we have now, modulo maybe some small changes
to reduce code duplication, works fine.  If you want to play with
64-bit generators and can find reference implementations and verify
that they do in fact perform better, great.  If not, I don't see the
point.  You can rant and complain all you want, but I am not going
to let us trash the performance or correctness of the code in the
"random" package or anywhere else just because you think it is
somehow "better designed", unless you can show specific, practical
use cases demonstrating the value of the changes.

Phil



Re: [Math] How fast is fast enough?

Niall Pemberton
Are you not concerned about forming a TLP of 7 around Math when one of the
seven is clearly not a happy camper?

Niall


Re: [Math] How fast is fast enough?

James Carman
Passion is a good thing. It means he gives a damn. Gilles obviously cares
quite a bit about the subject matter. I think it's great that he's willing
to crack open some code that a lot of folks wouldn't touch and propose some
interesting changes. Perhaps adding the new implementations alongside the
existing ones would make sense? Maybe in a "beta" package? That way we can
get this stuff in front of folks and let them take it for a spin. Maybe we
create another artifact that we release with less stringent rules around
backward compatibility? Just some thoughts.

On Fri, Feb 5, 2016 at 8:54 PM Niall Pemberton <[hidden email]>
wrote:

> Are you not concerned about forming a TLP of 7 around Math when one of the
> seven is clearly not a happy camper?
>
> Niall
>
> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <[hidden email]>
> wrote:
>
> > On 2/5/16 12:59 PM, Gilles wrote:
> > > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
> > >> On 2/4/16 3:59 PM, Gilles wrote:
> > >>> Hi.
> > >>>
> > >>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
> > >>> -----
> > >>> nextInt() (calls per timed block: 2000000, timed blocks: 100, time
> > >>> unit: ms)
> > >>>                         name time/call std dev total time ratio
> > >>> cv difference
> > >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
> > >>> 0.26 0.0000e+00
> > >>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
> > >>> 0.15 -1.2900e+02
> > >>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
> > >>> 0.04 2.1032e+02
> > >>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
> > >>> 0.14 5.1945e+02
> > >>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
> > >>> 0.14 8.1451e+02
> > >>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
> > >>> 0.06 9.7816e+02
> > >>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
> > >>> 0.08 1.6602e+03
> > >>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
> > >>> 0.14 1.7301e+03
> > >>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
> > >>> 0.16 1.6139e+02
> > >>> -----
> > >>> where "cv" is the ratio of the 3rd to the 2nd column.
> > >>>
> > >>> Questions are:
> > >>> * How meaningful are micro-benchmarks when the timed operation has
> > >>> a very
> > >>>   small duration (wrt e.g. the duration of other machine
> > >>> instructions that
> > >>>   are required to perform them)?
> > >>
> > >> It is harder to get good benchmarks for shorter duration activities,
> > >> but not impossible.  One thing that it would be good to do is to
> > >> compare these results with JMH [1].
> > >
> > > I was expecting insights based on the benchmark which I did run.
> >
> > You asked whether or not benchmarks are meaningful when the task
> > being benchmarked is short duration.  I answered that question.
> > >
> > > We have a tool in CM; if it's wrong, we should remove it.
> > > How its results compare with JMH is an interesting question,
> >
> > I will look into this.
> > > I
> > > agree, but I don't have time to make an analysis of benchmarking
> > > tools (on top of what I've been doing since December because
> > > totally innocuous changes in the RNG classes were frowned upon
> > > out of baseless fear).
> >
> > Please cut the hypberbole.
> > >
> > >>> * In a given environment (HW, OS, JVM), is there a lower limit
> > >>> (absolute
> > >>>   duration) below which anything will be deemed good enough?
> > >>
> > >> That depends completely on the application.
> > >
> > > Sorry, I thought that it was obvious: I don't speak of applications
> > > that don't care about performance. :-)
> > >
> > > For those that do, I do not agree with the statement: the question
> > > relates to finding a point below which it is the environment that
> > > overwhelms the other conditions.
> > > A point where there will be _unavoidable_ overhead (transferring data
> > > from/to memory, JVM book-keeping, ...) and perturbations (context
> > > switches, ...) such that their duration adds a constant time (on
> > > average) that may render most enhancements to an already efficient
> > > algorithm barely noticeable in practice.
> > > Similarly, but in the opposite direction, some language constructs
> > > or design choices might slow down things a bit, but without
> > > endangering any user.
> > >
> > > A problem arises when any enhancement to the design is deemed
> > > harmful because it degrades a micro-benchmark, even though that
> > > benchmark may not reflect any real use-cases.
> > > Then, the real harm is against development.
> > >
> > >>> * Can a library like CM admit a trade-off between ultimate
> > >>> performance and
> > >>>   good design?   IOW, is there an acceptable overhead in exchange
> > >>> for other qualities
> > >>>   (clarity, non-redundancy, extensibility, etc.)?
> > >>
> > >> That is too general a question to be meaningful.   We need to look
> > >> at specific cases.  What exactly are you proposing?
> > >
> > > <rant>
> > > It is quite meaningful even if it refers to general principles.
> > > Those could (should, IMO) be taken into account when managing a
> > > project like CM, on a par with "performance" (whose intrinsic value
> > > is never questioned).
> > > </rant>
> >
> > Rant all you want.  Vague generalities and hyperbole have no value.
> > >
> > > Two specific cases are:
> > > * inheritance vs delegation (a.k.a. composition)
> > > * generics (that could require runtime casts)
> >
> > This is getting closer to meaningful.  Where exactly in the code are
> > you wanting to use something and seeing benchmark damage?
> > >
> > >>> * Does ultimate performance for the base functionality (generation
> > >>> of a
> > >>>   random number) trump any consideration of use-cases that would
> > >>> need an
> > >>>   extension (of the base functionality, such as computation to
> > >>> match another
> > >>>   distribution) that will unavoidably degrades the performance
> > >>> (hence the
> > >>>   micro-benchmark will be completely misleading for those users)?
> > >>
> > >> Again, this is vague and the answer depends on what exactly you are
> > >> talking about. Significantly damaging performance of PRNG
> > >> implementations is a bad idea,
> > >
> > > Now, *this* is vague: what do you mean by "significantly"?
> > > That was actually my question in the first place.
> > If you are talking about PRNG performance, I would say a 1% hit is
> > significant.
> > > Referring to the
> > > benchmark above, people who'd know why they require ultimate
> > > performance
> > > should be able to tell what range of numbers they'd find
> > > acceptable in
> > > that table.
> > >
> > > <rant>
> > > Actually my questions are very precise, but the answers would require
> > > some decent analysis, rather than the usual "bad idea" dismissal.
> > > </rant>
> > >
> > > In the Javadoc of the "random" package, there is information about
> > > performance but no reference as to the benchmarking procedure.
> >
> > It would be great to repeat these using JMH, which is emerging as a
> > de facto standard for java benchmarking.  I will look into this.
> > >
> > > I can consistently observe a totally different behaviour (using
> > > "PerfTestUtils"):
> > >  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
> > >  2. moreover, the ratio *grows* with each of the longer periods
> > >     members of the WELL family (see the above table).
> > >
> > > This makes me wonder how someone who purports to need "ultimate"
> > > performance can have any objective basis to determine what is good
> > > or bad for his own applications.
> > >
> > >> unless there are actual practical use
> > >> cases you can point to that whatever changes you are proposing
> > >> enable.
> > >
> > > As I've explained in very much details in another thread, I've
> > > reviewed (from a design POV) the RNG code in "random" and IMHO, there
> > > is room for improvement (cf. above for what I mean by that term).
> > > <rant>
> > > I have some code ready for review but I had to resort to what I
> > > considered sub-optimal design (preemptively renouncing to propose a
> > > "delegation"-based design) solely because of the destructive
> > > community
> > > process that takes place here.[1]
> > > </rant>
> >
> > More vague hyperbole that serves no purpose.  Please focus on actual
> > code or design issues.
> > >
> > > The practical use-cases is anything that needs further processing of
> > > the numbers produced according to a uniform distribution:
> >
> > Isn't that what the samplers in the distributions package do?  What
> > we need from the PRNG implementations is just blocks of bits.  Since
> > we wanted a pluggable replacement for j.u.Random, we added uniform
> > ints, longs and floats and gaussian floats.  The samplers just need
> > uniform doubles.  The practical use case we need is well-supported
> > in the code we have.  What is missing, exactly?
> > > I agree that
> > > there would be little sense to code that latter part in a "pure" OO
> > > way[2].  And Luc made it indeed quite efficient, I think, in the
> > > various
> > > concrete classes.
> > > What I want to reconsider is how those concrete low-level
> > > algorithms can
> > > be plugged in a higher-level function that just requires a "source of
> > > randomness", as I'd call a provider of "int" (or "long") values,
> > > where
> > > the high level functionality does not care at all about the
> > > provider's
> > > inner working (a.o. how it's seeded!).
> >
> > This is why many higher-level samplers and other things that require
> > random data inside [math] have a pluggable RandomGenerator.
> > >
> > > A case in point is the sampling of other distributions (namely the
> > > Normal distribution).
> >
> > Or any of the others.  We have a default, inversion-based method
> > that the abstract distribution classes provide and some pretty good
> > specialized implementations within individual distributions.  Most
> > of these just require uniform random doubles as source.
> >
> > >
> > > Here is the benchmark report:
> > > -----
> > > nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
> > > time unit: ms)
> > >                         name time/call std dev total time ratio
> > > cv difference
> > > o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
> > > 0.14 0.0000e+00
> > > o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
> > > 0.07 1.2892e+04
> > >    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
> > > 0.06 1.0393e+04
> > >           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
> > > 0.07 1.1360e+04
> > >          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
> > > 0.04 1.1513e+04
> > >         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
> > > 0.03 1.2125e+04
> > >         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
> > > 0.06 1.1928e+04
> > >         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
> > > 0.04 1.3931e+04
> > >         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
> > > 0.06 1.4118e+04
> > >        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
> > > 0.08 1.1049e+04
> > > -----
> > > where the first line is JDK's "nextInt()" and the remaining lines
> > > are "nextGaussian()".
> > >
> > > The generation time is thus about six-fold that of "nextInt()" (cf.
> > > the "ratio" column).  Since each "nextGaussian()" call consumes
> > > several uniform deviates, degrading the performance of "nextInt()"
> > > by 10% would degrade that of "nextGaussian()" by a smaller fraction,
> > > of the order of half that.
> > >
> > > For a performance discussion to be meaningful, I think that we'd need
> > > to know how that fact would affect, even modestly, any moderately
> > > complex
> > > post-processing of the generated values.
> > >
> > > Another case, for modularity, would be to consider that other
> > > algorithms could be implemented to provide the required
> > > distribution.[3]
> > > In the current design (inheritance-based), that can only be done by
> > > creating a subclass, even though the core functionality
> > > ("nextDouble()") is not overridden.
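
For illustration, the composition-based alternative, sketched here with
a plain Box-Muller body (a ziggurat implementation [3] would slot into
the same shape; the class name is invented):
-----
import org.apache.commons.math3.random.RandomGenerator;

/** Hypothetical Gaussian sampler wrapping any RandomGenerator,
    instead of subclassing a particular one. */
public final class GaussianSampler {
    private final RandomGenerator rng;

    public GaussianSampler(RandomGenerator rng) {
        this.rng = rng;
    }

    /** One standard normal deviate (Box-Muller, for concreteness). */
    public double nextGaussian() {
        double u1;
        do {
            u1 = rng.nextDouble();
        } while (u1 == 0.0);            // avoid log(0)
        double u2 = rng.nextDouble();
        return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    }
}
-----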
> > >
> > >>> * What are usages of the CM RNGs?
> > >>>   Do those use-cases strictly forbid "losing" a dozen
> > >>> milliseconds per
> > >>>   million calls?
> > >>
> > >> There are many different use cases.  My own applications use them in
> > >> simulations to generate random deviates, to generate random hex
> > >> strings as identifiers and in stochastic algorithms like some of our
> > >> internal uses.  The last case is definitely sensitive to PRNG
> > >> performance.
> > >
> > > Thanks for giving examples, but since we are talking about
> > > performance, I was hoping for some real data, such as the relative
> > > duration of number generation (e.g. the total duration of calls to
> > > the "RandomGenerator" instances wrt the total duration of the
> > > application).
> > >
> > > I don't know if by "last case", you are referring to code that is
> > > inside CM.  I didn't spot anything that makes "heavy" use of an
> > > RNG (in the sense that generation would count as a sizable part of
> > > the whole processing).
> > monteCarloP in KolmogorovSmirnovTest is one to check.
> > >
> > > As I pointed out many times: if an application is severely dependent
> > > on the performance of the RNG, the user will probably turn to
> > > specific tools (e.g. GPUs? [4]) rather than use CM.
> >
> > That is a bogus argument.  We should make our PRNGs simple and fast
> > so their use can extend to performance-sensitive applications.
> > >
> > > Conversely, using Java might be preferred for its flexibility, which
> > > is destroyed by a search for ultimate performance (which nobody seems
> > > able to define reasonably).
> > > Performance is not a goal in itself; it should not be a trophy which
> > > sits uselessly on a shelf.
> >
> > Nor should "beautiful design" in the eyes of one person.
> > >
> > > My goal is not to deliberately slow things down; it is to allow some
> > > leeway so that designs which are deemed better (on all counts except,
> > > perhaps, performance) are given a chance to show their strengths, in
> > > particular in areas where performance in absolute terms is "good
> > > enough" for all use-cases which CM should care about (hence the need
> > > of actual data points[5]).
> >
> > I see no reason that we can't have it both ways - good design and
> > good performance. What we have now, modulo maybe some small changes
> > to reduce code duplication, works fine.  If you want to play with
> > 64-bit generators and can find reference implementations and verify
> > that they do in fact perform better, great.  If not, I don't see the
> > point.  You can rant and complain all you want; but I am not going
> > to let us trash performance or correctness of code in the random
> > class or anywhere else just because you think it is somehow "better
> > designed", unless you can show specific, practical use cases
> > demonstrating the value of the changes.
> >
> > Phil
> > >
> > >
> > > Gilles
> > >
> > > [1] "Is it faster?"
> > >     "No."
> > >     "Then, no."
> > > [2] Although that is in some sense what you indirectly defend by
> > >     wanting to stick with a meaningless "next(int bits)" method
> > >     (cf. the sketch of that contract after these notes).
> > > [3] http://www.doornik.com/research/ziggurat.pdf
> > > [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
> > > [5] Hence the need to agree on a methodology/policy for benchmarking.
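
For context, the "next(int bits)" contract referred to in [2] is the
one inherited from java.util.Random: advance a 48-bit linear
congruential state and return its top "bits" bits (JDK behaviour
paraphrased, not CM code):
-----
public class LcgSource {
    private long seed;   // 48-bit linear congruential state

    /** Every nextXxx() in java.util.Random is built on top of this. */
    protected int next(int bits) {
        seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        return (int) (seed >>> (48 - bits));
    }
}
-----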
> > >
> > >>
> > >> Phil
> > >>
> > >> [1] http://openjdk.java.net/projects/code-tools/jmh/
> > >>>   IOW, would those users for which such a difference matters use
> > >>> CM at all?
> > >>
> > >>>
> > >>>
> > >>> Thanks,
> > >>> Gilles
> > >
> >
>

Re: [Math] How fast is fast enough?

Gilles Sadowski
In reply to this post by Phil Steitz
Phil,

You talk again about me trying to push changes that serve no
purpose besides "trashing performance and correctness".
This is again baseless FUD, to which I've already responded
(with a detailed list of facts, which you chose to ignore).
You declare anything for which you don't have an answer a
"bogus argument".  Why is the reference to multi-threaded
implementations bogus?  You contradict yourself by claiming
that CM RNGs could be so good as to make people want to use
them, while refusing to consider whether another design might
be better suited to such high(er)-performance extensions.
This particular case is a long shot, but if any and all
discussions are stopped dead, how do you imagine we can
go anywhere?
As you could read from experts, micro-benchmarks can be deceiving;
yet you refuse to even consider alternative designs if there is
the slightest suspicion of degradation.
How can we ever set up a constructive discussion on making
everybody enjoy this project if the purported chair is so bent
on protecting existing code rather than nurturing a good
relationship with developers who may sometimes have other ideas?
I'm trying to improve the code (in a dimension which,
unfortunately, you don't seem to acknowledge) and respectfully
request data points from the users of said code, in order to
prove that no harm will be done.
But you seem to prefer not to disclose anything that would
get us closer to agreement (better design with similar
performance and room for improvement, discussed together
as a real development team -- not you requiring, as a bad
boss, that I bow to your standards for judging usefulness).
This 1% which you throw at me: where does it come from?
What does 1% mean when the benchmark shows relative standard
deviations (the "cv" column) that vary from 4 to 26% in the
"nextInt" case and from 3 to 8% in the "nextGaussian" case?
This 1% looks meaningless without context; context is what I'm
asking for, in order to try and establish objectively whether
another design would have a measurable impact on actual tasks.
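(For concreteness, assuming independent timed blocks: with 100 blocks
and a cv of 0.15, the standard error of the mean time/call is about
0.15/sqrt(100) = 1.5%, already larger than 1%; resolving a 1%
difference at two standard errors would need on the order of 900
blocks.)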
I'm not going to show any "damaged" benchmark, given how
unwelcome you make me feel every time I wish to talk about
other aspects of the code.
There is no development community here.  Only solitary
coders who share a repository.

Not sorry for the top-post,
Gilles


On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:

> [...]



Re: [Math] How fast is fast enough?

Gilles Sadowski
In reply to this post by Niall Pemberton
On Sat, 6 Feb 2016 01:53:54 +0000, Niall Pemberton wrote:
> Are you not concerned about forming a TLP of 7 around Math when one
> of the
> seven is clearly not a happy camper?

Of course I am.
Beyond no longer annoying non-Math Commons developers, I've asked
that we clarify the positive objectives of the move, as far as
development is concerned.
No one except Thomas seemed interested.

Gilles

>
> Niall
>
> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <[hidden email]>
> wrote:
>
> [...]




Re: [Math] How fast is fast enough?

garydgregory
On Fri, Feb 5, 2016 at 6:32 PM, Gilles <[hidden email]> wrote:

> On Sat, 6 Feb 2016 01:53:54 +0000, Niall Pemberton wrote:
>
>> Are you not concerned about forming a TLP of 7 around Math when one of the
>> seven is clearly not a happy camper?
>>
>
> Of course I am.
> Besides stopping to annoy non-Math Commons developers, I've asked
> that we clarify the positive objectives of the move, as far as
> development would be concerned.
> No one except Thomas seemed interested.
>
> Gilles
>
>
>> Niall
>>
>> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <[hidden email]>
>> wrote:
>>
>> On 2/5/16 12:59 PM, Gilles wrote:
>>> > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>>> >> On 2/4/16 3:59 PM, Gilles wrote:
>>> >>> Hi.
>>> >>>
>>> >>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
>>> >>> -----
>>> >>> nextInt() (calls per timed block: 2000000, timed blocks: 100, time
>>> >>> unit: ms)
>>> >>>                         name time/call std dev total time ratio
>>> >>> cv difference
>>> >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>>> >>> 0.26 0.0000e+00
>>> >>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>>> >>> 0.15 -1.2900e+02
>>> >>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>>> >>> 0.04 2.1032e+02
>>> >>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>>> >>> 0.14 5.1945e+02
>>> >>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>>> >>> 0.14 8.1451e+02
>>> >>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>>> >>> 0.06 9.7816e+02
>>> >>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>>> >>> 0.08 1.6602e+03
>>> >>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>>> >>> 0.14 1.7301e+03
>>> >>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>>> >>> 0.16 1.6139e+02
>>> >>> -----
>>> >>> where "cv" is the ratio of the 3rd to the 2nd column.
>>> >>>
>>> >>> Questions are:
>>> >>> * How meaningful are micro-benchmarks when the timed operation has
>>> >>> a very
>>> >>>   small duration (wrt e.g. the duration of other machine
>>> >>> instructions that
>>> >>>   are required to perform them)?
>>> >>
>>> >> It is harder to get good benchmarks for shorter duration activities,
>>> >> but not impossible.  One thing that it would be good to do is to
>>> >> compare these results with JMH [1].
>>>
>>
We use JMH over at Log4j and it does a fine job. See the log4j-perf module.


Gary
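
For what it's worth, a minimal JMH benchmark of one CM generator could
look like the sketch below (class name invented; annotations as
documented by JMH; CM 3.x package assumed):
-----
import org.apache.commons.math3.random.MersenneTwister;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class NextIntBenchmark {
    private final MersenneTwister rng = new MersenneTwister(42L);

    @Benchmark
    public int nextInt() {
        // Returning the value lets JMH consume it and prevents
        // dead-code elimination.
        return rng.nextInt();
    }
}
-----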

> [...]


--
E-Mail: [hidden email] | [hidden email]
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [Math] How fast is fast enough?

Gilles Sadowski
In reply to this post by James Carman
On Sat, 06 Feb 2016 02:07:05 +0000, James Carman wrote:
> Passion is a good thing. It means he gives a damn. Gilles obviously
> cares
> quite a bit about the subject matter. I think it's great that he's
> willing
> to crack open some code that a lot of folks wouldn't touch and
> propose some
> interesting changes.

This way of viewing it makes all the difference!
Thanks.  I want to stay here... ;-)

[Actually, it would not even have occurred to me to go and look
under the hood, until I had the truly bad idea of trying to quickly
get a user request into the code base...]

> Perhaps adding the new implementations alongside the
> existing ones would make sense? Maybe in a "beta" package? That way
> we can
> get this stuff in front of folks and let them take it for a spin.
> Maybe we
> create another artifact that we release with less strenuous rules
> around
> backward compatibility? Just some thoughts.

I proposed as much in this message:
   http://markmail.org/message/nx2uuwlrvcun4cyq
as a consequence of abundant (destructive) criticism.


Gilles

>
> On Fri, Feb 5, 2016 at 8:54 PM Niall Pemberton
> <[hidden email]>
> wrote:
>
>> Are you not concerned about forming a TLP of 7 around Math when one
>> of the
>> seven is clearly not a happy camper?
>>
>> Niall
>>
>> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <[hidden email]>
>> wrote:
>>
>> > On 2/5/16 12:59 PM, Gilles wrote:
>> > > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>> > >> On 2/4/16 3:59 PM, Gilles wrote:
>> > >>> Hi.
>> > >>>
>> > >>> Here is a micro-benchmark report (performed with
>> "PerfTestUtils"):
>> > >>> -----
>> > >>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
>> time
>> > >>> unit: ms)
>> > >>>                         name time/call std dev total time
>> ratio
>> > >>> cv difference
>> > >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03
>> 1.000
>> > >>> 0.26 0.0000e+00
>> > >>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03
>> 0.941
>> > >>> 0.15 -1.2900e+02
>> > >>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03
>> 1.097
>> > >>> 0.04 2.1032e+02
>> > >>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03
>> 1.239
>> > >>> 0.14 5.1945e+02
>> > >>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03
>> 1.374
>> > >>> 0.14 8.1451e+02
>> > >>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03
>> 1.450
>> > >>> 0.06 9.7816e+02
>> > >>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03
>> 1.763
>> > >>> 0.08 1.6602e+03
>> > >>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03
>> 1.795
>> > >>> 0.14 1.7301e+03
>> > >>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03
>> 1.074
>> > >>> 0.16 1.6139e+02
>> > >>> -----
>> > >>> where "cv" is the ratio of the 3rd to the 2nd column.
>> > >>>
>> > >>> Questions are:
>> > >>> * How meaningful are micro-benchmarks when the timed operation
>> has
>> > >>> a very
>> > >>>   small duration (wrt e.g. the duration of other machine
>> > >>> instructions that
>> > >>>   are required to perform them)?
>> > >>
>> > >> It is harder to get good benchmarks for shorter duration
>> activities,
>> > >> but not impossible.  One thing that it would be good to do is
>> to
>> > >> compare these results with JMH [1].
>> > >
>> > > I was expecting insights based on the benchmark which I did run.
>> >
>> > You asked whether or not benchmarks are meaningful when the task
>> > being benchmarked is short duration.  I answered that question.
>> > >
>> > > We have a tool in CM; if it's wrong, we should remove it.
>> > > How its results compare with JMH is an interesting question,
>> >
>> > I will look into this.
>> > > I
>> > > agree, but I don't have time to make an analysis of benchmarking
>> > > tools (on top of what I've been doing since December because
>> > > totally innocuous changes in the RNG classes were frowned upon
>> > > out of baseless fear).
>> >
>> > Please cut the hyperbole.
>> > >
>> > >>> * In a given environment (HW, OS, JVM), is there a lower limit
>> > >>> (absolute
>> > >>>   duration) below which anything will be deemed good enough?
>> > >>
>> > >> That depends completely on the application.
>> > >
>> > > Sorry, I thought that it was obvious: I don't speak of
>> applications
>> > > that don't care about performance. :-)
>> > >
>> > > For those that do, I do not agree with the statement: the
>> question
>> > > relates to finding a point below which it is the environment
>> that
>> > > overwhelms the other conditions.
>> > > A point where there will be _unavoidable_ overhead (transferring
>> data
>> > > from/to memory, JVM book-keeping, ...) and perturbations
>> (context
>> > > switches, ...) such that their duration adds a constant time (on
>> > > average) that may render most enhancements to an already
>> efficient
>> > > algorithm barely noticeable in practice.
>> > > Similarly, but in the opposite direction, some language
>> constructs
>> > > or design choices might slow down things a bit, but without
>> > > endangering any user.
>> > >
>> > > A problem arises when any enhancement to the design is deemed
>> > > harmful because it degrades a micro-benchmark, even though that
>> > > benchmark may not reflect any real use-cases.
>> > > Then, the real harm is against development.
>> > >
>> > >>> * Can a library like CM admit a trade-off between ultimate
>> > >>> performance and
>> > >>>   good design?   IOW, is there an acceptable overhead in
>> exchange
>> > >>> for other qualities
>> > >>>   (clarity, non-redundancy, extensibility, etc.)?
>> > >>
>> > >> That is too general a question to be meaningful.   We need to
>> look
>> > >> at specific cases.  What exactly are you proposing?
>> > >
>> > > <rant>
>> > > It is quite meaningful even if it refers to general principles.
>> > > Those could (should, IMO) be taken into account when managing a
>> > > project like CM, on a par with "performance" (whose intrinsic
>> value
>> > > is never questioned).
>> > > </rant>
>> >
>> > Rant all you want.  Vague generalities and hyperbole have no
>> value.
>> > >
>> > > Two specific cases are:
>> > > * inheritance vs delegation (a.k.a. composition)
>> > > * generics (that could require runtime casts)
>> >
>> > This is getting closer to meaningful.  Where exactly in the code
>> are
>> > you wanting to use something and seeing benchmark damage?
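
As an aside on the second case above (generics), here is a minimal
sketch of the cost being worried about; the names are made up for
illustration and are not a proposed CM API:
-----
import java.util.Random;

public class GenericsCostSketch {
    // A generic sampler must traffic in boxed values for primitive results.
    interface Sampler<T> {
        T sample();
    }

    static final class GaussianSampler implements Sampler<Double> {
        private final Random rng = new Random(42);

        @Override
        public Double sample() {
            return rng.nextGaussian(); // auto-boxed to Double
        }
    }

    public static void main(String[] args) {
        Sampler<Double> s = new GaussianSampler();
        // Erasure makes the call site unbox the result, and a checkcast
        // is inserted when going through the generic interface.
        double x = s.sample();
        System.out.println(x);
    }
}
-----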
>> > >
>> > >>> * Does ultimate performance for the base functionality
>> (generation
>> > >>> of a
>> > >>>   random number) trump any consideration of use-cases that
>> would
>> > >>> need an
>> > >>>   extension (of the base functionality, such as computation to
>> > >>> match another
>> > >>>   distribution) that will unavoidably degrade the performance
>> > >>> (hence the
>> > >>>   micro-benchmark will be completely misleading for those
>> users)?
>> > >>
>> > >> Again, this is vague and the answer depends on what exactly you
>> are
>> > >> talking about. Significantly damaging performance of PRNG
>> > >> implementations is a bad idea,
>> > >
>> > > Now, *this* is vague: what do you mean by "significantly"?
>> > > That was actually my question in the first place.
>> > If you are talking about PRNG performance, I would say a 1% hit is
>> > significant.
>> > > Referring to the
>> > > benchmark above, people who'd know why they require ultimate
>> > > performance
>> > > should be able to tell what range of numbers they'd find
>> > > acceptable in
>> > > that table.
>> > >
>> > > <rant>
>> > > Actually my questions are very precise, but the answers would
>> require
>> > > some decent analysis, rather than the usual "bad idea"
>> dismissal.
>> > > </rant>
>> > >
>> > > In the Javadoc of the "random" package, there is information
>> about
>> > > performance but no reference as to the benchmarking procedure.
>> >
>> > It would be great to repeat these using JMH, which is emerging as
>> a
>> > de facto standard for java benchmarking.  I will look into this.
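
For the record, a minimal JMH harness for the "nextInt()" case could
look like the sketch below (assuming JMH is on the classpath; the class
name and seed are made up):
-----
import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.random.MersenneTwister;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class NextIntBenchmark {
    private final MersenneTwister rng = new MersenneTwister(42);

    @Benchmark
    public int nextInt() {
        // Returning the value hands it to JMH's blackhole,
        // preventing dead-code elimination of the call.
        return rng.nextInt();
    }
}
-----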
>> > >
>> > > I can consistently observe a totally different behaviour (using
>> > > "PerfTestUtils"):
>> > >  1. "MersenneTwister" is *always* faster than all of the WELL
>> RNGs;
>> > >  2. moreover, the ratio *grows* with each of the longer periods
>> > >     members of the WELL family (see the above table).
>> > >
>> > > This makes me wonder how someone who purports to need "ultimate"
>> > > performance can have any objective basis to determine what is
>> good
>> > > or bad for his own applications.
>> > >
>> > >> unless there are actual practical use
>> > >> cases you can point to that whatever changes you are proposing
>> > >> enable.
>> > >
>> > > As I've explained in great detail in another thread, I've
>> > > reviewed (from a design POV) the RNG code in "random" and IMHO,
>> there
>> > > is room for improvement (cf. above for what I mean by that
>> term).
>> > > <rant>
>> > > I have some code ready for review but I had to resort to what I
>> > > considered sub-optimal design (preemptively renouncing to
>> propose a
>> > > "delegation"-based design) solely because of the destructive
>> > > community
>> > > process that takes place here.[1]
>> > > </rant>
>> >
>> > More vague hyperbole that serves no purpose.  Please focus on
>> actual
>> > code or design issues.
>> > >
>> > > The practical use-case is anything that needs further
>> > > processing of the numbers produced according to a uniform
>> > > distribution:
>> >
>> > Isn't that what the samplers in the distributions package do?  
>> What
>> > we need from the PRNG implementations is just blocks of bits.  
>> Since
>> > we wanted a pluggable replacement for j.u.Random, we added uniform
>> > ints, longs and floats and gaussian floats.  The samplers just
>> need
>> > uniform doubles.  The practical use case we need is well-supported
>> > in the code we have.  What is missing, exactly?
>> > > I agree that
>> > > there would be little sense to code that latter part in a "pure"
>> OO
>> > > way[2].  And Luc made it indeed quite efficient, I think, in the
>> > > various
>> > > concrete classes.
>> > > What I want to reconsider is how those concrete low-level
>> > > algorithms can
>> > > be plugged in a higher-level function that just requires a
>> "source of
>> > > randomness", as I'd call a provider of "int" (or "long") values,
>> > > where
>> > > the high level functionality does not care at all about the
>> > > provider's
>> > > inner workings (among others, how it's seeded!).
>> >
>> > This is why many higher-level samplers and other things that
>> require
>> > random data inside [math] have a pluggable RandomGenerator.
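
To make the "source of randomness" idea concrete, here is a minimal
delegation-based sketch (all names hypothetical; this is not the
current CM API):
-----
public class DelegationSketch {
    // The only contract a consumer needs: a stream of random ints.
    interface IntSource {
        int nextInt();
    }

    // Composition: the sampler wraps any IntSource instead of
    // subclassing a particular generator, and never sees how the
    // underlying generator is seeded.
    static final class BoxMullerGaussianSampler {
        private final IntSource source;
        private double cached;
        private boolean hasCached;

        BoxMullerGaussianSampler(IntSource source) {
            this.source = source;
        }

        double sample() {
            if (hasCached) {
                hasCached = false;
                return cached;
            }
            // Two uniforms in (0, 1): 24 high bits scaled to the unit interval.
            double u1 = ((source.nextInt() >>> 8) + 0.5) * 0x1.0p-24;
            double u2 = ((source.nextInt() >>> 8) + 0.5) * 0x1.0p-24;
            double r = Math.sqrt(-2 * Math.log(u1));
            cached = r * Math.sin(2 * Math.PI * u2);
            hasCached = true;
            return r * Math.cos(2 * Math.PI * u2);
        }
    }

    public static void main(String[] args) {
        // Any generator plugs in without being subclassed.
        java.util.Random jdk = new java.util.Random(42);
        BoxMullerGaussianSampler sampler = new BoxMullerGaussianSampler(jdk::nextInt);
        System.out.println(sampler.sample());
    }
}
-----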
>> > >
>> > > A case in point is the sampling of other distributions (namely
>> the
>> > > Normal distribution).
>> >
>> > Or any of the others.  We have a default, inversion-based method
>> > that the abstract distribution classes provide and some pretty
>> good
>> > specialized implementations within individual distributions.  Most
>> > of these just require uniform random doubles as source.
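
For instance, with the CM 3.x classes the inversion path boils down to
the sketch below (a robust version would guard against a uniform draw
of exactly 0):
-----
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.MersenneTwister;
import org.apache.commons.math3.random.RandomGenerator;

public class InversionSamplingDemo {
    public static void main(String[] args) {
        RandomGenerator rng = new MersenneTwister(12345L);
        NormalDistribution normal = new NormalDistribution(0, 1);
        // Inversion: push a uniform double through the inverse CDF.
        double deviate = normal.inverseCumulativeProbability(rng.nextDouble());
        System.out.println(deviate);
    }
}
-----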
>> >
>> > >
>> > > Here is the benchmark report:
>> > > -----
>> > > nextGaussian() (calls per timed block: 2000000, timed blocks:
>> 100,
>> > > time unit: ms)
>> > >                         name time/call std dev total time ratio
>> > > cv difference
>> > > o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
>> > > 0.14 0.0000e+00
>> > > o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
>> > > 0.07 1.2892e+04
>> > >    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
>> > > 0.06 1.0393e+04
>> > >           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
>> > > 0.07 1.1360e+04
>> > >          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
>> > > 0.04 1.1513e+04
>> > >         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
>> > > 0.03 1.2125e+04
>> > >         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
>> > > 0.06 1.1928e+04
>> > >         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
>> > > 0.04 1.3931e+04
>> > >         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
>> > > 0.06 1.4118e+04
>> > >        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
>> > > 0.08 1.1049e+04
>> > > -----
>> > > where the first line is JDK's "nextInt()" and the remaining are
>> > > "nextGaussian()".
>> > >
>> > > The generation time is thus about 4-fold that of "nextInt()".
>> > > Hence, degrading the performance of "nextInt()" by 10% would
>> > > degrade the
>> > > performance of "nextGaussian()" by half that.
>> > >
>> > > For a performance discussion to be meaningful, I think that we'd
>> need
>> > > to know how that fact would affect, even modestly, any
>> moderately
>> > > complex
>> > > post-processing of the generated values.
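
For what it's worth, the arithmetic behind that estimate, assuming each
Gaussian deviate consumes two uniform draws (as in the basic Box-Muller
transform), with $T_u$ the cost of one draw and $T_g$ that of one deviate:
-----
$T_g = c + 2T_u \approx 4T_u$, so the fixed overhead is $c \approx 2T_u$.
If $T_u \to 1.1\,T_u$, then $T_g \to c + 2.2\,T_u = T_g + 0.2\,T_u \approx 1.05\,T_g$,
i.e. a 10% hit on the draw becomes a ~5% hit on the deviate.
-----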
>> > >
>> > > Another case, for modularity, would be to consider that other
>> > > algorithms could
>> > > be implemented to provide the required distribution.[3]
>> > > In the current design (inheritance-based), that can only be done
>> > > by creating
>> > > a subclass, even though the core functionality ("nextDouble()")
>> is
>> > > not
>> > > overridden.
>> > >
>> > >>> * What are usages of the CM RNGs?
>> > >>>   Do those use-cases strictly forbid "losing" a dozen
>> > >>> milliseconds per
>> > >>>   million calls?
>> > >>
>> > >> There are many different use cases.  My own applications use
>> them in
>> > >> simulations to generate random deviates, to generate random hex
>> > >> strings as identifiers and in stochastic algorithms like some
>> of our
>> > >> internal uses.  The last case is definitely sensitive to PRNG
>> > >> performance.
>> > >
>> > > Thanks for giving examples, but since we talk about performance,
>> I
>> > > was hoping for some real flesh, like the relative duration of
>> numbers
>> > > generation (e.g. the total duration of calls to the
>> "RandomGenerator"
>> > > instances wrt the total duration of the application).
>> > >
>> > > I don't know if by "last case", you are referring to code that
>> is
>> > > inside CM.  I didn't spot anything that makes "heavy" usage of a
>> > > RNG (in the sense that generation would count as a sizable part
>> of
>> > > the whole processing).
>> > monteCarloP in KolmogorovSmirnovTest is one to check.
>> > >
>> > > As I pointed out many times: if an application is severely
>> dependent
>> > > on the performance of RNG, the user probably will turn to
>> specific
>> > > tools (e.g. GPUs? [4]) rather than use CM.
>> >
>> > That is a bogus argument.  We should make our PRNGs simple and
>> fast
>> > so their use can extend to performance-sensitive applications.
>> > >
>> > > Conversely, using Java might be preferred for its flexibility,
>> which
>> > > is destroyed by a search for ultimate performance (which nobody
>> seems
>> > > able to define reasonably).
>> > > Performance is not a goal in itself; it should not be a trophy
>> which
>> > > sits uselessly on a shelf.
>> >
>> > Nor should "beautiful design" in the eyes of one person.
>> > >
>> > > My goal is not to deliberately slow things down; it is to allow
>> some
>> > > leeway so that designs which are deemed better (on all counts
>> except,
>> > > perhaps, performance) are given a chance to show their
>> strengths, in
>> > > particular in areas where performance in absolute terms is "good
>> > > enough" for all use-cases which CM should care about (hence the
>> need
>> > > of actual data points[5]).
>> >
>> > I see no reason that we can't have it both ways - good design and
>> > good performance. What we have now, modulo maybe some small
>> changes
>> > to reduce code duplication, works fine.  If you want to play with
>> > 64-bit generators and can find reference implementations and
>> verify
>> > that they do in fact perform better, great.  If not, I don't see
>> the
>> > point.  You can rant and complain all you want; but I am not going
>> > to let us trash performance or correctness of code in the random
>> > class or anywhere else just because you think it is somehow
>> "better
>> > designed"  unless you can show specific, practical use cases
>> > demonstrating the value of the changes.
>> >
>> > Phil
>> > >
>> > >
>> > > Gilles
>> > >
>> > > [1] "Is it faster?"
>> > >     "No."
>> > >     "Then, no."
>> > > [2] Although that is in some sense what you indirectly defend by
>> > > wanting
>> > >     to stick with a meaningless "next(int bits)" method.
>> > > [3] http://www.doornik.com/research/ziggurat.pdf
>> > > [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
>> > > [5] Hence the need to agree on a methodology/policy for
>> benchmarking.
>> > >
>> > >>
>> > >> Phil
>> > >>
>> > >> [1] http://openjdk.java.net/projects/code-tools/jmh/
>> > >>>   IOW, would those users for which such a difference matters
>> use
>> > >>> CM at all?
>> > >>
>> > >>>
>> > >>>
>> > >>> Thanks,
>> > >>> Gilles


Re: [Math] How fast is fast enough?

Phil Steitz
In reply to this post by Gilles Sadowski
OK, I give up.  I am withdrawing as volunteer chair or member of the
new TLP.

Phil

On 2/5/16 7:23 PM, Gilles wrote:

> Phil,
>
> You talk again about me trying to push forward changes that
> serve no purpose besides "trash performance and correctness".
> This is again baseless FUD to which I've already answered
> (with a detailed list of facts which you chose to ignore).
> You declare anything for which you don't have an answer as
> "bogus argument". Why is the reference to multi-threaded
> implementations bogus?  You contradict yourself in pretending
> that CM RNGs could be so good as to make people want to use
> them while refusing to consider whether another design might
> be better suited to such high(er)-performance extensions.
> This particular case is a long shot but if any and all
> discussions are stopped dead, how do you imagine that we can
> go anywhere?
> As you could read from experts, micro-benchmarks are deceiving;
> but you refuse to even consider alternative designs if there
> might be a slight suspicion of degradation.
> How can we ever set up a constructive discussion on how to
> make everybody enjoy this project if the purported chair is
> so bent on protecting existing code rather than nurturing a good
> relationship with developers who may sometimes have other ideas?
> I'm trying to improve the code (in a dimension which you can't
> seem to understand unfortunately) but respectfully request
> data points from those users of said code, in order to be
> able to prove that no harm will be done.
> But you seem to prefer to not disclose anything that would
> get us closer to agreement (better design with similar
> performance and room for improvement, to be discussed
> together as a real development team -- Not you requiring,
> as a bad boss, that I bow to your standards for judging
> usefulness).
> This 1% which you throw at me, where does it come from?
> What does 1% mean when the benchmark shows standard deviations
> that vary from 4 to 26% in the "nextInt" case and from 3 to
> 7% in the "nextGaussian" case?
> This 1% looks meaningless without context; context is what I'm
> asking for in order to try to establish objectively whether
> another design will have a measurable impact on actual tasks.
> I'm not going to show any "damaged" benchmark because of how
> unwelcome you make me feel every time I wish to talk about
> other aspects of the code.
> There is no development community here.  Only solitary
> coders who share a repository.
>
> Not sorry for the top-post,
> Gilles


Re: [Math] How fast is fast enough?

William Barker
In reply to this post by Gilles Sadowski
As a veteran of the Great Tomcat Flame Wars let me show you how this should
be done.
<flame>
Gilles is too ignorant of even basic statistics (as shown by his post on
geometric distributions) to be working on PRNGs. As a user of C-M I
would never use any class authored by Gilles relating to random numbers
without spending weeks testing it against a trusted toolbox.
I haven't yet moved this to C-M, but my typical use case would be about
1000 simulations on a 100-dimensional space. So a hypothetical 1% decrease
in performance would make C-M a piece of crap.
</flame>


Re: [Math] How fast is fast enough?

James Carman
In reply to this post by Phil Steitz
Okay, folks, this is definitely getting out of hand. Let's put a moratorium
on this thread for the weekend or something, come back together next week,
and try to move forward. I would urge folks to watch this while we wait:

https://m.youtube.com/watch?v=rOWmrlft2FI

p.s. Phil, I do hope you'll reconsider.

On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <[hidden email]> wrote:

> OK, I give up.  I am withdrawing as volunteer chair or member of the
> new TLP.
>
> Phil
>
> On 2/5/16 7:23 PM, Gilles wrote:
> > Phil,
> >
> > You talk again about me trying to push forward changes that
> > serve no purpose besides "trash performance and correctness".
> > This is again baseless FUD to which I've already answered
> > (with detailed list of facts which you chose to ignore).
> > You declare anything for which you don't have an answer as
> > "bogus argument". Why is the reference to multi-threaded
> > implementations bogus?  You contradict yourself in pretending
> > that CM RNGs could be so good as to make people want to use
> > them while refusing to consider whether another design might
> > be better suited to such high(er)-performance extensions.
> > This particular case is a long shot but if any and all
> > discussions are stopped dead, how do you imagine that we can
> > go anywhere?
> > As you could read from experts, micro-benchmarks are deceiving;
> > but you refuse to even consider alternative designs if there
> > might be a slight suspicion of degradation.
> > How can we ever set up a constructive discussion on how to
> > make everybody enjoy this project if the purported chair is
> > so bent to protecting existing code rather than nurture a good
> > relationship with developers who may sometimes have other ideas?
> > I'm trying to improve the code (in a dimension which you can't
> > seem to understand unfortunately) but respectfully request
> > data points from those users of said code, in order to be
> > able to prove that no harm will be done.
> > But you seem to prefer to not disclose anything that would
> > get us closer to agreement (better design with similar
> > performance and room for improvement, to be discussed
> > together as a real development team -- Not you requiring,
> > as a bad boss, that I bow to your standards for judging
> > usefulness).
> > This 1% which you throw at me, where does it come from?
> > What does 1% mean when the benchmark shows standard deviations
> > that vary from 4 to 26% in the "nextInt" case and from 3 to
> > 7% in the "nextGaussian" case?
> > This 1% looks meaningless without context; context is what I'm
> > asking in order to try and establish objectively whether
> > another design will have a measurable impact on actual tasks.
> > I'm not going to show any "damaged" benchmark because of how
> > unwelcome you make me feel every time I wish to talk about
> > other aspects of the code.
> > There is no development community here.  Only solitary
> > coders who share a repository.
> >
> > Not sorry for the top-post,
> > Gilles
> >
> >
> > On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
> >> On 2/5/16 12:59 PM, Gilles wrote:
> >>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
> >>>> On 2/4/16 3:59 PM, Gilles wrote:
> >>>>> Hi.
> >>>>>
> >>>>> Here is a micro-benchmark report (performed with
> >>>>> "PerfTestUtils"):
> >>>>> -----
> >>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
> >>>>> time
> >>>>> unit: ms)
> >>>>>                         name time/call std dev total time ratio
> >>>>> cv difference
> >>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
> >>>>> 0.26 0.0000e+00
> >>>>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
> >>>>> 0.15 -1.2900e+02
> >>>>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
> >>>>> 0.04 2.1032e+02
> >>>>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
> >>>>> 0.14 5.1945e+02
> >>>>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
> >>>>> 0.14 8.1451e+02
> >>>>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
> >>>>> 0.06 9.7816e+02
> >>>>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
> >>>>> 0.08 1.6602e+03
> >>>>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
> >>>>> 0.14 1.7301e+03
> >>>>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
> >>>>> 0.16 1.6139e+02
> >>>>> -----
> >>>>> where "cv" is the ratio of the 3rd to the 2nd column.
> >>>>>
> >>>>> Questions are:
> >>>>> * How meaningful are micro-benchmarks when the timed operation
> >>>>> has
> >>>>> a very
> >>>>>   small duration (wrt e.g. the duration of other machine
> >>>>> instructions that
> >>>>>   are required to perform them)?
> >>>>
> >>>> It is harder to get good benchmarks for shorter duration
> >>>> activities,
> >>>> but not impossible.  One thing that it would be good to do is to
> >>>> compare these results with JMH [1].
> >>>
> >>> I was expecting insights based on the benchmark which I did run.
> >>
> >> You asked whether or not benchmarks are meaningful when the task
> >> being benchmarked is short duration.  I answered that question.
> >>>
> >>> We have a tool in CM; if it's wrong, we should remove it.
> >>> How its results compare with JMH is an interesting question,
> >>
> >> I will look into this.
> >>> I
> >>> agree, but I don't have time to make an analysis of benchmarking
> >>> tools (on top of what I've been doing since December because
> >>> totally innocuous changes in the RNG classes were frowned upon
> >>> out of baseless fear).
> >>
> >> Please cut the hypberbole.
> >>>
> >>>>> * In a given environment (HW, OS, JVM), is there a lower limit
> >>>>> (absolute
> >>>>>   duration) below which anything will be deemed good enough?
> >>>>
> >>>> That depends completely on the application.
> >>>
> >>> Sorry, I thought that it was obvious: I don't speak of applications
> >>> that don't care about performance. :-)
> >>>
> >>> For those that do, I do not agree with the statement: the question
> >>> relates to finding a point below which it is the environment that
> >>> overwhelms the other conditions.
> >>> A point where there will be _unavoidable_ overhead (transferring
> >>> data
> >>> from/to memory, JVM book-keeping, ...) and perturbations (context
> >>> switches, ...) such that their duration adds a constant time (on
> >>> average) that may render most enhancements to an already efficient
> >>> algorithm barely noticeable in practice.
> >>> Similarly, but in the opposite direction, some language constructs
> >>> or design choices might slow down things a bit, but without
> >>> endangering any user.
> >>>
> >>> A problem arises when any enhancement to the design is deemed
> >>> harmful because it degrades a micro-benchmark, even though that
> >>> benchmark may not reflect any real use-cases.
> >>> Then, the real harm is against development.
> >>>
> >>>>> * Can a library like CM admit a trade-off between ultimate
> >>>>> performance and
> >>>>>   good design?   IOW, is there an acceptable overhead in exchange
> >>>>> for other qualities
> >>>>>   (clarity, non-redundancy, extensibility, etc.)?
> >>>>
> >>>> That is too general a question to be meaningful.   We need to look
> >>>> at specific cases.  What exactly are you proposing?
> >>>
> >>> <rant>
> >>> It is quite meaningful even if it refers to general principles.
> >>> Those could (should, IMO) be taken into account when managing a
> >>> project like CM, on a par with "performance" (whose intrinsic value
> >>> is never questioned).
> >>> </rant>
> >>
> >> Rant all you want.  Vague generalities and hyperbole have no value.
> >>>
> >>> Two specific cases are:
> >>> * inheritance vs delegation (a.k.a. composition)
> >>> * generics (that could require runtime casts)
> >>
> >> This is getting closer to meaningful.  Where exactly in the code are
> >> you wanting to use something and seeing benchmark damage?
> >>>
> >>>>> * Does ultimate performance for the base functionality
> >>>>> (generation
> >>>>> of a
> >>>>>   random number) trump any consideration of use-cases that would
> >>>>> need an
> >>>>>   extension (of the base functionality, such as computation to
> >>>>> match another
> >>>>> distribution) that will unavoidably degrade the performance
> >>>>> (hence the
> >>>>>   micro-benchmark will be completely misleading for those users)?
> >>>>
> >>>> Again, this is vague and the answer depends on what exactly you
> >>>> are
> >>>> talking about. Significantly damaging performance of PRNG
> >>>> implementations is a bad idea,
> >>>
> >>> Now, *this* is vague: what do you mean by "significantly"?
> >>> That was actually my question in the first place.
> >> If you are talking about PRNG performance, I would say a 1% hit is
> >> significant.
> >>> Referring to the
> >>> benchmark above, people who'd know why they require ultimate
> >>> performance
> >>> should be able to tell what range of numbers they'd find
> >>> acceptable in
> >>> that table.
> >>>
> >>> <rant>
> >>> Actually my questions are very precise, but the answers would
> >>> require
> >>> some decent analysis, rather than the usual "bad idea" dismissal.
> >>> </rant>
> >>>
> >>> In the Javadoc of the "random" package, there is information about
> >>> performance but no reference as to the benchmarking procedure.
> >>
> >> It would be great to repeat these using JMH, which is emerging as a
> >> de facto standard for java benchmarking.  I will look into this.
> >>>
> >>> I can consistently observe a totally different behaviour (using
> >>> "PerfTestUtils"):
> >>>  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
> >>>  2. moreover, the ratio *grows* with each of the longer-period
> >>>     members of the WELL family (see the above table).
> >>>
> >>> This makes me wonder how someone who purports to need "ultimate"
> >>> performance can have any objective basis to determine what is good
> >>> or bad for his own applications.
> >>>
> >>>> unless there are actual practical use
> >>>> cases you can point to that whatever changes you are proposing
> >>>> enable.
> >>>
> >>> As I've explained in great detail in another thread, I've
> >>> reviewed (from a design POV) the RNG code in "random" and IMHO,
> >>> there
> >>> is room for improvement (cf. above for what I mean by that term).
> >>> <rant>
> >>> I have some code ready for review but I had to resort to what I
> >>> considered a sub-optimal design (preemptively giving up on proposing a
> >>> "delegation"-based design) solely because of the destructive
> >>> community
> >>> process that takes place here.[1]
> >>> </rant>
> >>
> >> More vague hyperbole that serves no purpose.  Please focus on actual
> >> code or design issues.
> >>>
> >>> The practical use-case is anything that needs further
> >>> processing of
> >>> the numbers produced according to a uniform distribution:
> >>
> >> Isn't that what the samplers in the distributions package do?  What
> >> we need from the PRNG implementations is just blocks of bits.  Since
> >> we wanted a pluggable replacement for j.u.Random, we added uniform
> >> ints, longs and floats and gaussian floats.  The samplers just need
> >> uniform doubles.  The practical use case we need is well-supported
> >> in the code we have.  What is missing, exactly?
> >>> I agree that
> >>> there would be little sense to code that latter part in a "pure" OO
> >>> way[2].  And Luc made it indeed quite efficient, I think, in the
> >>> various
> >>> concrete classes.
> >>> What I want to reconsider is how those concrete low-level
> >>> algorithms can
> >>> be plugged in a higher-level function that just requires a
> >>> "source of
> >>> randomness", as I'd call a provider of "int" (or "long") values,
> >>> where
> >>> the high level functionality does not care at all about the
> >>> provider's
> >>> inner working (a.o. how it's seeded!).
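
For illustration only, a delegation-based arrangement along those lines
(interface and class names are hypothetical, not CM API) could be:
-----
/** Hypothetical "source of randomness": a bare stream of ints. */
public interface IntSource {
    int nextInt();
}

/** The higher level composes a source instead of extending a generator. */
public class UniformDoubleSampler {
    private final IntSource source;

    public UniformDoubleSampler(IntSource source) {
        this.source = source;
    }

    /** A double in [0, 1) assembled from two 26-bit blocks. */
    public double sample() {
        final long hi = source.nextInt() >>> 6; // high 26 bits
        final long lo = source.nextInt() >>> 6; // low 26 bits
        return (hi * 0x1.0p26 + lo) * 0x1.0p-52;
    }
}
-----
Any generator exposing "nextInt()" can then be plugged in, seeded however
it likes, without the sampler inheriting (or knowing) anything else.
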
> >>
> >> This is why many higher-level samplers and other things that require
> >> random data inside [math] have a pluggable RandomGenerator.
> >>>
> >>> A case in point is the sampling of other distributions (namely the
> >>> Normal distribution).
> >>
> >> Or any of the others.  We have a default, inversion-based method
> >> that the abstract distribution classes provide and some pretty good
> >> specialized implementations within individual distributions.  Most
> >> of these just require uniform random doubles as source.
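
For reference, the generic inversion fallback amounts to pushing one
uniform deviate through the quantile function; a sketch against the
CM 3.x API (class name and seed are arbitrary):
-----
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class InversionSketch {
    public static void main(String[] args) {
        RandomGenerator rng = new Well19937c(42);
        NormalDistribution dist = new NormalDistribution(0, 1);
        // Inversion: one N(0, 1) deviate from one uniform double.
        // Generic, but only as fast as the quantile function itself.
        System.out.println(dist.inverseCumulativeProbability(rng.nextDouble()));
    }
}
-----
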
> >>
> >>>
> >>> Here is the benchmark report:
> >>> -----
> >>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
> >>> time unit: ms)
> >>>                         name time/call std dev total time ratio
> >>> cv difference
> >>> o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
> >>> 0.14 0.0000e+00
> >>> o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
> >>> 0.07 1.2892e+04
> >>>    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
> >>> 0.06 1.0393e+04
> >>>           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
> >>> 0.07 1.1360e+04
> >>>          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
> >>> 0.04 1.1513e+04
> >>>         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
> >>> 0.03 1.2125e+04
> >>>         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
> >>> 0.06 1.1928e+04
> >>>         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
> >>> 0.04 1.3931e+04
> >>>         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
> >>> 0.06 1.4118e+04
> >>>        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
> >>> 0.08 1.1049e+04
> >>> -----
> >>> where the first line is JDK's "nextInt()" and the remaining are
> >>> "nextGaussian()".
> >>>
> >>> The generation time is thus about 4-fold that of "nextInt()".
> >>> Thus, degrading the performance of "nextInt()" by 10% would
> >>> degrade the
> >>> performance of "nextGaussian()" by half that.
> >>>
> >>> For a performance discussion to be meaningful, I think that we'd
> >>> need
> >>> to know how that fact would affect, even modestly, any moderately
> >>> complex
> >>> post-processing of the generated values.
> >>>
> >>> Another case, for modularity, would be to consider that other
> >>> algorithms could
> >>> be implemented to provide the required distribution.[3]
> >>> In the current design (inheritance-based), that can only be done
> >>> by creating
> >>> a subclass, even though the core functionality ("nextDouble()") is
> >>> not
> >>> overridden.
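
Under a composition-based design, swapping in such an algorithm (e.g.
the ziggurat of [3]) would mean passing a different sampler object, not
subclassing the generator. A hypothetical sketch, reusing the
"UniformDoubleSampler" from above:
-----
public interface GaussianSampler {
    double sample();
}

/**
 * Box-Muller over any uniform source; a ziggurat-based implementation
 * of the same interface could replace it without touching the generator.
 */
public class BoxMullerSampler implements GaussianSampler {
    private final UniformDoubleSampler uniform;
    private double next = Double.NaN; // second deviate of the pair

    public BoxMullerSampler(UniformDoubleSampler uniform) {
        this.uniform = uniform;
    }

    @Override
    public double sample() {
        if (!Double.isNaN(next)) {
            final double n = next;
            next = Double.NaN;
            return n;
        }
        final double u1 = uniform.sample();
        final double u2 = uniform.sample();
        final double r = Math.sqrt(-2 * Math.log(1 - u1)); // 1 - u1 > 0
        next = r * Math.sin(2 * Math.PI * u2);
        return r * Math.cos(2 * Math.PI * u2);
    }
}
-----
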
> >>>
> >>>>> * What are usages of the CM RNGs?
> >>>>>   Do those use-cases strictly forbid "losing" a dozen
> >>>>> milliseconds per
> >>>>>   million calls?
> >>>>
> >>>> There are many different use cases.  My own applications use
> >>>> them in
> >>>> simulations to generate random deviates, to generate random hex
> >>>> strings as identifiers and in stochastic algorithms like some
> >>>> of our
> >>>> internal uses.  The last case is definitely sensitive to PRNG
> >>>> performance.
> >>>
> >>> Thanks for giving examples, but since we talk about performance, I
> >>> was hoping for some real flesh, like the relative duration of
> >>> numbers
> >>> generation (e.g. the total duration of calls to the
> >>> "RandomGenerator"
> >>> instances wrt the total duration of the application).
> >>>
> >>> I don't know if by "last case", you are referring to code that is
> >>> inside CM.  I didn't spot anything that makes "heavy" usage of a
> >>> RNG (in the sense that generation would count as a sizable part of
> >>> the whole processing).
> >> monteCarloP in KolmogorovSmirnovTest is one to check.
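
For context, that method estimates a two-sample p-value by simulation,
so the generator sits in the innermost loop. Very roughly, with
"statisticOfRandomSplit" a hypothetical stand-in for the real
resampling logic:
-----
import org.apache.commons.math3.random.RandomGenerator;

public final class MonteCarloSketch {
    /** Fraction of random n/m splits whose D is at least the observed D. */
    public static double monteCarloP(double observedD, int n, int m,
                                     int iterations, RandomGenerator rng) {
        int atLeastAsExtreme = 0;
        for (int i = 0; i < iterations; i++) {
            // Each iteration shuffles n + m pooled observations, i.e.
            // O(n + m) calls into the generator -- this is the kind of
            // use case where PRNG speed genuinely shows.
            if (statisticOfRandomSplit(n, m, rng) >= observedD) {
                atLeastAsExtreme++;
            }
        }
        return (double) atLeastAsExtreme / iterations;
    }

    // Hypothetical helper: KS statistic of one random split of the pool.
    private static double statisticOfRandomSplit(int n, int m,
                                                 RandomGenerator rng) {
        throw new UnsupportedOperationException("illustrative only");
    }
}
-----
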
> >>>
> >>> As I pointed out many times: if an application is severely
> >>> dependent
> >>> on the performance of the RNG, the user will probably turn to specific
> >>> tools (e.g. GPUs? [4]) rather than use CM.
> >>
> >> That is a bogus argument.  We should make our PRNGs simple and fast
> >> so their use can extend to performance-sensitive applications.
> >>>
> >>> Conversely, using Java might be preferred for its flexibility,
> >>> which
> >>> is destroyed by a search for ultimate performance (which nobody
> >>> seems
> >>> able to define reasonably).
> >>> Performance is not a goal in itself; it should not be a trophy
> >>> which
> >>> sits uselessly on a shelf.
> >>
> >> Nor should "beautiful design" in the eyes of one person.
> >>>
> >>> My goal is not to deliberately slow things down; it is to allow
> >>> some
> >>> leeway so that designs which are deemed better (on all counts
> >>> except,
> >>> perhaps, performance) are given a chance to show their
> >>> strengths, in
> >>> particular in areas where performance in absolute terms is "good
> >>> enough" for all use-cases which CM should care about (hence the
> >>> need
> >>> of actual data points[5]).
> >>
> >> I see no reason that we can't have it both ways - good design and
> >> good performance. What we have now, modulo maybe some small changes
> >> to reduce code duplication, works fine.  If you want to play with
> >> 64-bit generators and can find reference implementations and verify
> >> that they do in fact perform better, great.  If not, I don't see the
> >> point.  You can rant and complain all you want; but I am not going
> >> to let us trash performance or correctness of code in the random
> >> class or anywhere else just because you think it is somehow "better
> >> designed"  unless you can show specific, practical use cases
> >> demonstrating the value of the changes.
> >>
> >> Phil
> >>>
> >>>
> >>> Gilles
> >>>
> >>> [1] "Is it faster?"
> >>>     "No."
> >>>     "Then, no."
> >>> [2] Although that is in some sense what you indirectly defend by
> >>> wanting
> >>>     to stick with a meaningless "next(int bits)" method.
> >>> [3] http://www.doornik.com/research/ziggurat.pdf
> >>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
> >>> [5] Hence the need to agree on a methodology/policy for
> >>> benchmarking.
> >>>
> >>>>
> >>>> Phil
> >>>>
> >>>> [1] http://openjdk.java.net/projects/code-tools/jmh/
> >>>>>   IOW, would those users for which such a difference matters use
> >>>>> CM at all?
> >>>>
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Mark Thomas
On 06/02/2016 11:31, James Carman wrote:
> Okay, folks, this is definitely getting out of hand. Let's put a moratorium
> on this thread for the weekend or something and try to come back together
> next week and try to move forward. I would urge folks to watch this while
> we wait:
>
> https://m.youtube.com/watch?v=rOWmrlft2FI

Revolution vs. evolution is not a new discussion topic at the ASF. In
addition to the above, I also recommend this:

http://incubator.apache.org/learn/rules-for-revolutionaries.html

Clearly, the rest of the community should have stepped in sooner to try
and cool things down. That we didn't is a lesson for all of us.

> p.s. Phil, I do hope you'll reconsider

+1. I very much value the help and advice you have provided with Pool
and DBCP and I would be very sorry to lose that.


>
> On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <[hidden email]> wrote:
>
>> OK, I give up.  I am withdrawing as volunteer chair or member of the
>> new TLP.
>>
>> Phil
>>
>> On 2/5/16 7:23 PM, Gilles wrote:
>>> Phil,
>>>
>>> You talk again about me trying to push forward changes that
>>> serve no purpose besides "trash performance and correctness".
>>> This is again baseless FUD which I've already answered
>>> (with a detailed list of facts which you chose to ignore).
>>> You declare anything for which you don't have an answer as
>>> "bogus argument". Why is the reference to multi-threaded
>>> implementations bogus?  You contradict yourself in pretending
>>> that CM RNGs could be so good as to make people want to use
>>> them while refusing to consider whether another design might
>>> be better suited to such high(er)-performance extensions.
>>> This particular case is a long shot but if any and all
>>> discussions are stopped dead, how do you imagine that we can
>>> go anywhere?
>>> As you could read from experts, micro-benchmarks are deceiving;
>>> but you refuse to even consider alternative designs if there
>>> might be a slight suspicion of degradation.
>>> How can we ever set up a constructive discussion on how to
>>> make everybody enjoy this project if the purported chair is
>>> so bent on protecting existing code rather than nurturing a good
>>> relationship with developers who may sometimes have other ideas?
>>> I'm trying to improve the code (in a dimension which you can't
>>> seem to understand unfortunately) but respectfully request
>>> data points from those users of said code, in order to be
>>> able to prove that no harm will be done.
>>> But you seem to prefer to not disclose anything that would
>>> get us closer to agreement (better design with similar
>>> performance and room for improvement, to be discussed
>>> together as a real development team -- Not you requiring,
>>> as a bad boss, that I bow to your standards for judging
>>> usefulness).
>>> This 1% which you throw at me, where does it come from?
>>> What does 1% mean when the benchmark shows standard deviations
>>> that vary from 4 to 26% in the "nextInt" case and from 3 to
>>> 7% in the "nextGaussian" case?
>>> This 1% looks meaningless without context; context is what I'm
>>> asking in order to try and establish objectively whether
>>> another design will have a measurable impact on actual tasks.
>>> I'm not going to show any "damaged" benchmark because of how
>>> unwelcome you make me feel every time I wish to talk about
>>> other aspects of the code.
>>> There is no development community here.  Only solitary
>>> coders who share a repository.
>>>
>>> Not sorry for the top-post,
>>> Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

dbrosIus
In reply to this post by Gilles Sadowski
The problem with the revolutionary vs. evolutionary post is that you say to the revolutionaries "go off and play around, invest all kinds of effort and time", but give them no sense of whether anything they produce will be acted on: "WE", the status quo, will decide.
To me that's just anti-meritocratic.
Apache tends to be a very conservative organization, with the upshot that do-less is favored over do-more.
I have no skin in this math game, and don't claim to know the right and wrong of it, but let's favor movement.
Dave



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Oliver Heger-3
In reply to this post by James Carman


On 06.02.2016 at 12:31, James Carman wrote:
> Okay, folks, this is definitely getting out of hand. Let's put a moratorium
> on this thread for the weekend or something and try to come back together
> next week and try to move forward. I would urge folks to watch this while
> we wait:
>
> https://m.youtube.com/watch?v=rOWmrlft2FI
>
> p.s. Phil, I do hope you'll reconsider.

Big +1! Please take some time to rethink. This would be a huge loss for
this community.

Oliver

>
> On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <[hidden email]> wrote:
>
>> OK, I give up.  I am withdrawing as volunteer chair or member of the
>> new TLP.
>>
>> Phil
>>
>> On 2/5/16 7:23 PM, Gilles wrote:
>>> Phil,
>>>
>>> You talk again about me trying to push forward changes that
>>> serve no purpose besides "trash performance and correctness".
>>> This is again baseless FUD to which I've already answered
>>> (with detailed list of facts which you chose to ignore).
>>> You declare anything for which you don't have an answer as
>>> "bogus argument". Why is the reference to multi-threaded
>>> implementations bogus?  You contradict yourself in pretending
>>> that CM RNGs could be so good as to make people want to use
>>> them while refusing to consider whether another design might
>>> be better suited to such high(er)-performance extensions.
>>> This particular case is a long shot but if any and all
>>> discussions are stopped dead, how do you imagine that we can
>>> go anywhere?
>>> As you could read from experts, micro-benchmarks are deceiving;
>>> but you refuse to even consider alternative designs if there
>>> might be a slight suspicion of degradation.
>>> How can we ever set up a constructive discussion on how to
>>> make everybody enjoy this project if the purported chair is
>>> so bent to protecting existing code rather than nurture a good
>>> relationship with developers who may sometimes have other ideas?
>>> I'm trying to improve the code (in a dimension which you can't
>>> seem to understand unfortunately) but respectfully request
>>> data points from those users of said code, in order to be
>>> able to prove that no harm will be done.
>>> But you seem to prefer to not disclose anything that would
>>> get us closer to agreement (better design with similar
>>> performance and room for improvement, to be discussed
>>> together as a real development team -- Not you requiring,
>>> as a bad boss, that I bow to your standards for judging
>>> usefulness).
>>> This 1% which you throw at me, where does it come from?
>>> What does 1% mean when the benchmark shows standard deviations
>>> that vary from 4 to 26% in the "nextInt" case and from 3 to
>>> 7% in the "nextGaussian" case?
>>> This 1% looks meaningless without context; context is what I'm
>>> asking in order to try and establish objectively whether
>>> another design will have a measurable impact on actual tasks.
>>> I'm not going to show any "damaged" benchmark because of how
>>> unwelcome you make me feel every time I wish to talk about
>>> other aspects of the code.
>>> There is no development community here.  Only solitary
>>> coders who share a repository.
>>>
>>> Not sorry for the top-post,
>>> Gilles
>>>
>>>
>>> On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
>>>> On 2/5/16 12:59 PM, Gilles wrote:
>>>>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>>>>>> On 2/4/16 3:59 PM, Gilles wrote:
>>>>>>> Hi.
>>>>>>>
>>>>>>> Here is a micro-benchmark report (performed with
>>>>>>> "PerfTestUtils"):
>>>>>>> -----
>>>>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
>>>>>>> time
>>>>>>> unit: ms)
>>>>>>>                         name time/call std dev total time ratio
>>>>>>> cv difference
>>>>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>>>>>>> 0.26 0.0000e+00
>>>>>>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>>>>>>> 0.15 -1.2900e+02
>>>>>>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>>>>>>> 0.04 2.1032e+02
>>>>>>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>>>>>>> 0.14 5.1945e+02
>>>>>>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>>>>>>> 0.14 8.1451e+02
>>>>>>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>>>>>>> 0.06 9.7816e+02
>>>>>>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>>>>>>> 0.08 1.6602e+03
>>>>>>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>>>>>>> 0.14 1.7301e+03
>>>>>>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>>>>>>> 0.16 1.6139e+02
>>>>>>> -----
>>>>>>> where "cv" is the ratio of the 3rd to the 2nd column.
>>>>>>>
>>>>>>> Questions are:
>>>>>>> * How meaningful are micro-benchmarks when the timed operation
>>>>>>> has
>>>>>>> a very
>>>>>>>   small duration (wrt e.g. the duration of other machine
>>>>>>> instructions that
>>>>>>>   are required to perform them)?
>>>>>>
>>>>>> It is harder to get good benchmarks for shorter duration
>>>>>> activities,
>>>>>> but not impossible.  One thing that it would be good to do is to
>>>>>> compare these results with JMH [1].
>>>>>
>>>>> I was expecting insights based on the benchmark which I did run.
>>>>
>>>> You asked whether or not benchmarks are meaningful when the task
>>>> being benchmarked is short duration.  I answered that question.
>>>>>
>>>>> We have a tool in CM; if it's wrong, we should remove it.
>>>>> How its results compare with JMH is an interesting question,
>>>>
>>>> I will look into this.
>>>>> I
>>>>> agree, but I don't have time to make an analysis of benchmarking
>>>>> tools (on top of what I've been doing since December because
>>>>> totally innocuous changes in the RNG classes were frowned upon
>>>>> out of baseless fear).
>>>>
>>>> Please cut the hypberbole.
>>>>>
>>>>>>> * In a given environment (HW, OS, JVM), is there a lower limit
>>>>>>> (absolute
>>>>>>>   duration) below which anything will be deemed good enough?
>>>>>>
>>>>>> That depends completely on the application.
>>>>>
>>>>> Sorry, I thought that it was obvious: I don't speak of applications
>>>>> that don't care about performance. :-)
>>>>>
>>>>> For those that do, I do not agree with the statement: the question
>>>>> relates to finding a point below which it is the environment that
>>>>> overwhelms the other conditions.
>>>>> A point where there will be _unavoidable_ overhead (transferring
>>>>> data
>>>>> from/to memory, JVM book-keeping, ...) and perturbations (context
>>>>> switches, ...) such that their duration adds a constant time (on
>>>>> average) that may render most enhancements to an already efficient
>>>>> algorithm barely noticeable in practice.
>>>>> Similarly, but in the opposite direction, some language constructs
>>>>> or design choices might slow down things a bit, but without
>>>>> endangering any user.
>>>>>
>>>>> A problem arises when any enhancement to the design is deemed
>>>>> harmful because it degrades a micro-benchmark, even though that
>>>>> benchmark may not reflect any real use-cases.
>>>>> Then, the real harm is against development.
>>>>>
>>>>>>> * Can a library like CM admit a trade-off between ultimate
>>>>>>> performance and
>>>>>>>   good design?   IOW, is there an acceptable overhead in exchange
>>>>>>> for other qualities
>>>>>>>   (clarity, non-redundancy, extensibility, etc.)?
>>>>>>
>>>>>> That is too general a question to be meaningful.   We need to look
>>>>>> at specific cases.  What exactly are you proposing?
>>>>>
>>>>> <rant>
>>>>> It is quite meaningful even if it refers to general principles.
>>>>> Those could (should, IMO) be taken into account when managing a
>>>>> project like CM, on a par with "performance" (whose intrinsic value
>>>>> is never questioned).
>>>>> </rant>
>>>>
>>>> Rant all you want.  Vague generalities and hyperbole have no value.
>>>>>
>>>>> Two specific cases are:
>>>>> * inheritance vs delegation (a.k.a. composition)
>>>>> * generics (that could require runtime casts)
>>>>
>>>> This is getting closer to meaningful.  Where exactly in the code are
>>>> you wanting to use something and seeing benchmark damage?
>>>>>
>>>>>>> * Does ultimate performance for the base functionality
>>>>>>> (generation
>>>>>>> of a
>>>>>>>   random number) trump any consideration of use-cases that would
>>>>>>> need an
>>>>>>>   extension (of the base functionality, such as computation to
>>>>>>> match another
>>>>>>>   distribution) that would unavoidably degrade the performance
>>>>>>> (hence the
>>>>>>>   micro-benchmark will be completely misleading for those users)?
>>>>>>
>>>>>> Again, this is vague and the answer depends on what exactly you
>>>>>> are
>>>>>> talking about. Significantly damaging performance of PRNG
>>>>>> implementations is a bad idea,
>>>>>
>>>>> Now, *this* is vague: what do you mean by "significantly"?
>>>>> That was actually my question in the first place.
>>>> If you are talking about PRNG performance, I would say a 1% hit is
>>>> significant.
>>>>> Referring to the
>>>>> benchmark above, people who'd know why they require ultimate
>>>>> performance
>>>>> should be able to tell what range of numbers they'd find
>>>>> acceptable in
>>>>> that table.
>>>>>
>>>>> <rant>
>>>>> Actually my questions are very precise, but the answers would
>>>>> require
>>>>> some decent analysis, rather than the usual "bad idea" dismissal.
>>>>> </rant>
>>>>>
>>>>> In the Javadoc of the "random" package, there is information about
>>>>> performance but no reference as to the benchmarking procedure.
>>>>
>>>> It would be great to repeat these using JMH, which is emerging as a
>>>> de facto standard for Java benchmarking.  I will look into this.
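
[For reference, a minimal JMH harness for this comparison could look like
the following sketch (illustrative only, not code from the thread; generator
class names are per Commons Math 3.x, annotations per JMH's documented API):]
-----
import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.random.MersenneTwister;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class NextIntBench {
    private RandomGenerator mt;
    private RandomGenerator well;

    @Setup
    public void setup() {
        mt = new MersenneTwister(42);
        well = new Well19937c(42);
    }

    // Returning the generated value prevents the JIT from eliminating the
    // call as dead code -- one of the classic micro-benchmark pitfalls.
    @Benchmark
    public int mersenneTwister() {
        return mt.nextInt();
    }

    @Benchmark
    public int well19937c() {
        return well.nextInt();
    }
}
-----
[The averaged per-call times from the JMH runner could then be compared
directly with the "PerfTestUtils" table above.]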
>>>>>
>>>>> I can consistently observe a totally different behaviour (using
>>>>> "PerfTestUtils"):
>>>>>  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
>>>>>  2. moreover, the ratio *grows* with each of the longer-period
>>>>>     members of the WELL family (see the above table).
>>>>>
>>>>> This makes me wonder how someone who purports to need "ultimate"
>>>>> performance can have any objective basis to determine what is good
>>>>> or bad for his own applications.
>>>>>
>>>>>> unless there are actual practical use
>>>>>> cases you can point to that whatever changes you are proposing
>>>>>> enable.
>>>>>
>>>>> As I've explained in much detail in another thread, I've
>>>>> reviewed (from a design POV) the RNG code in "random" and IMHO,
>>>>> there
>>>>> is room for improvement (cf. above for what I mean by that term).
>>>>> <rant>
>>>>> I have some code ready for review but I had to resort to what I
>>>>> considered sub-optimal design (preemptively giving up on proposing a
>>>>> "delegation"-based design) solely because of the destructive
>>>>> community
>>>>> process that takes place here.[1]
>>>>> </rant>
>>>>
>>>> More vague hyperbole that serves no purpose.  Please focus on actual
>>>> code or design issues.
>>>>>
>>>>> The practical use case is anything that needs further
>>>>> processing of
>>>>> the numbers produced according to a uniform distribution:
>>>>
>>>> Isn't that what the samplers in the distributions package do?  What
>>>> we need from the PRNG implementations is just blocks of bits.  Since
>>>> we wanted a pluggable replacement for j.u.Random, we added uniform
>>>> ints, longs and floats and gaussian floats.  The samplers just need
>>>> uniform doubles.  The practical use case we need is well-supported
>>>> in the code we have.  What is missing, exactly?
>>>>> I agree that
>>>>> there would be little sense to code that latter part in a "pure" OO
>>>>> way[2].  And Luc indeed made it quite efficient, I think, in the
>>>>> various
>>>>> concrete classes.
>>>>> What I want to reconsider is how those concrete low-level
>>>>> algorithms can
>>>>> be plugged in a higher-level function that just requires a
>>>>> "source of
>>>>> randomness", as I'd call a provider of "int" (or "long") values,
>>>>> where
>>>>> the high level functionality does not care at all about the
>>>>> provider's
>>>>> inner working (a.o. how it's seeded!).
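
[For concreteness, a minimal sketch of the delegation-based arrangement
described just above. The "IntSource" interface, class and method names are
hypothetical illustrations, not CM API; only Well19937c is an actual CM
3.x class:]
-----
import org.apache.commons.math3.random.Well19937c;

/** Hypothetical "source of randomness": just a provider of int values. */
interface IntSource {
    int nextInt();
}

/** High-level code composed with, not inheriting from, the source. */
class UniformDoubleSampler {
    private final IntSource source;

    UniformDoubleSampler(IntSource source) {
        this.source = source;
    }

    /** Assembles a double in [0, 1) from two 26-bit blocks; the sampler
     *  neither knows nor cares how the provider works or is seeded. */
    double sample() {
        final long high = ((long) (source.nextInt() >>> 6)) << 26; // top 26 bits
        final int low = source.nextInt() >>> 6;                    // low 26 bits
        return (high | low) * 0x1.0p-52d;                          // 52-bit mantissa
    }
}

class Demo {
    public static void main(String[] args) {
        final Well19937c well = new Well19937c(42);
        // Any concrete CM generator is adapted by delegation
        // (Java 8 method-reference syntax):
        UniformDoubleSampler sampler = new UniformDoubleSampler(well::nextInt);
        System.out.println(sampler.sample());
    }
}
-----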
>>>>
>>>> This is why many higher-level samplers and other things that require
>>>> random data inside [math] have a pluggable RandomGenerator.
>>>>>
>>>>> A case in point is the sampling of other distributions (namely the
>>>>> Normal distribution).
>>>>
>>>> Or any of the others.  We have a default, inversion-based method
>>>> that the abstract distribution classes provide and some pretty good
>>>> specialized implementations within individual distributions.  Most
>>>> of these just require uniform random doubles as source.
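
[Schematically, the inversion default Phil refers to amounts to one line:
draw a uniform deviate and push it through the inverse CDF. A minimal sketch
against the CM 3.x distribution API:]
-----
import org.apache.commons.math3.distribution.NormalDistribution;
import org.apache.commons.math3.distribution.RealDistribution;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class InversionSampling {
    public static void main(String[] args) {
        RandomGenerator rng = new Well19937c(42);
        RealDistribution normal = new NormalDistribution(0, 1);

        // Inversion: a uniform deviate u is mapped through the inverse
        // CDF, F^{-1}(u), yielding a deviate from the target distribution.
        // (u == 0 maps to -infinity for the normal; production code
        // guards that edge case.)
        double u = rng.nextDouble();
        double x = normal.inverseCumulativeProbability(u);
        System.out.println(x);
    }
}
-----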
>>>>
>>>>>
>>>>> Here is the benchmark report:
>>>>> -----
>>>>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
>>>>> time unit: ms)
>>>>>                         name  time/call  std dev  total time  ratio    cv  difference
>>>>> o.a.c.m.r.JDKRandomGenerator  1.200e-05  1.7e-06  2.4001e+03  1.000  0.14  0.0000e+00
>>>>> o.a.c.m.r.JDKRandomGenerator  7.646e-05  5.1e-06  1.5292e+04  6.371  0.07  1.2892e+04
>>>>>    o.a.c.m.r.MersenneTwister  6.396e-05  3.6e-06  1.2793e+04  5.330  0.06  1.0393e+04
>>>>>           o.a.c.m.r.Well512a  6.880e-05  5.0e-06  1.3760e+04  5.733  0.07  1.1360e+04
>>>>>          o.a.c.m.r.Well1024a  6.956e-05  3.0e-06  1.3913e+04  5.797  0.04  1.1513e+04
>>>>>         o.a.c.m.r.Well19937a  7.262e-05  2.0e-06  1.4525e+04  6.052  0.03  1.2125e+04
>>>>>         o.a.c.m.r.Well19937c  7.164e-05  4.3e-06  1.4329e+04  5.970  0.06  1.1928e+04
>>>>>         o.a.c.m.r.Well44497a  8.166e-05  3.2e-06  1.6332e+04  6.804  0.04  1.3931e+04
>>>>>         o.a.c.m.r.Well44497b  8.259e-05  4.6e-06  1.6518e+04  6.882  0.06  1.4118e+04
>>>>>        o.a.c.m.r.ISAACRandom  6.724e-05  5.4e-06  1.3449e+04  5.603  0.08  1.1049e+04
>>>>> -----
>>>>> where the first line is JDK's "nextInt()" and the remaining are
>>>>> "nextGaussian()".
>>>>>
>>>>> The generation time is thus about 4-fold that of "nextInt()".
>>>>> Thus, degrading the performance of "nextInt()" by 10% would
>>>>> degrade the
>>>>> performance of "nextGaussian()" by half that.
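
[To spell out the arithmetic implied here (an illustrative decomposition,
not a measurement from the thread): let f be the fraction of a
"nextGaussian()" call spent inside the uniform source, and C_g its cost.
Slowing the source by 10% gives

    C_g' = (1-f) C_g + 1.1 f C_g = (1 + 0.1f) C_g

i.e. a relative slowdown of 0.1f, so "half that" (5%) corresponds to
f ~ 1/2.]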
>>>>>
>>>>> For a performance discussion to be meaningful, I think that we'd
>>>>> need
>>>>> to know how that fact would affect, even modestly, any moderately
>>>>> complex
>>>>> post-processing of the generated values.
>>>>>
>>>>> Another case, for modularity, would be to consider that other
>>>>> algorithms could
>>>>> be implemented to provide the required distribution.[3]
>>>>> In the current design (inheritance-based), that can only be done
>>>>> by creating
>>>>> a subclass, even though the core functionality ("nextDouble()") is
>>>>> not
>>>>> overridden.
>>>>>
>>>>>>> * What are usages of the CM RNGs?
>>>>>>>   Do those use-cases strictly forbid "loosing" a dozen
>>>>>>> milliseconds per
>>>>>>>   million calls?
>>>>>>
>>>>>> There are many different use cases.  My own applications use
>>>>>> them in
>>>>>> simulations to generate random deviates, to generate random hex
>>>>>> strings as identifiers and in stochastic algorithms like some
>>>>>> of our
>>>>>> internal uses.  The last case is definitely sensitive to PRNG
>>>>>> performance.
>>>>>
>>>>> Thanks for giving examples, but since we talk about performance, I
>>>>> was hoping for some real flesh, like the relative duration of
>>>>> numbers
>>>>> generation (e.g. the total duration of calls to the
>>>>> "RandomGenerator"
>>>>> instances wrt the total duration of the application).
>>>>>
>>>>> I don't know if by "last case", you are referring to code that is
>>>>> inside CM.  I didn't spot anything that makes "heavy" usage of a
>>>>> RNG (in the sense that generation would count as a sizable part of
>>>>> the whole processing).
>>>> monteCarloP in KolmogorovSmirnovTest is one to check.
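
[That kind of Monte Carlo p-value estimation is a tight loop over the
generator, which is why it is RNG-bound. A simplified sketch; the statistic
below is a stand-in, not the actual K-S distance computed by CM:]
-----
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class MonteCarloSketch {
    /**
     * Estimates P(statistic >= observed) by simulation: every iteration
     * draws n uniform deviates, so generator speed dominates the run time.
     */
    static double monteCarloP(double observed, int n, int iterations,
                              RandomGenerator rng) {
        int extreme = 0;
        for (int i = 0; i < iterations; i++) {
            double stat = 0;
            for (int j = 0; j < n; j++) {
                stat = Math.max(stat, rng.nextDouble()); // RNG-bound inner loop
            }
            if (stat >= observed) {
                extreme++;
            }
        }
        return (double) extreme / iterations;
    }

    public static void main(String[] args) {
        System.out.println(monteCarloP(0.9999, 100, 10000, new Well19937c(1)));
    }
}
-----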
>>>>>
>>>>> As I pointed out many times: if an application is severely
>>>>> dependent
>>>>> on the performance of RNG, the user probably will turn to specific
>>>>> tools (e.g. GPUs? [4]) rather than use CM.
>>>>
>>>> That is a bogus argument.  We should make our PRNGs simple and fast
>>>> so their use can extend to performance-sensitive applications.
>>>>>
>>>>> Conversely, using Java might be preferred for its flexibility,
>>>>> which
>>>>> is destroyed by a search for ultimate performance (which nobody
>>>>> seems
>>>>> able to define reasonably).
>>>>> Performance is not a goal in itself; it should not be a trophy
>>>>> which
>>>>> sits uselessly on a shelf.
>>>>
>>>> Nor should "beautiful design" in the eyes of one person.
>>>>>
>>>>> My goal is not to deliberately slow things down; it is to allow
>>>>> some
>>>>> leeway so that designs which are deemed better (on all counts
>>>>> except,
>>>>> perhaps, performance) are given a chance to show their
>>>>> strengths, in
>>>>> particular in areas where performance in absolute terms is "good
>>>>> enough" for all use-cases which CM should care about (hence the
>>>>> need
>>>>> of actual data points[5]).
>>>>
>>>> I see no reason that we can't have it both ways - good design and
>>>> good performance. What we have now, modulo maybe some small changes
>>>> to reduce code duplication, works fine.  If you want to play with
>>>> 64-bit generators and can find reference implementations and verify
>>>> that they do in fact perform better, great.  If not, I don't see the
>>>> point.  You can rant and complain all you want; but I am not going
>>>> to let us trash performance or correctness of code in the random
>>>> class or anywhere else just because you think it is somehow "better
>>>> designed"  unless you can show specific, practical use cases
>>>> demonstrating the value of the changes.
>>>>
>>>> Phil
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>> [1] "Is it faster?"
>>>>>     "No."
>>>>>     "Then, no."
>>>>> [2] Although that is in some sense what you indirectly defend by
>>>>> wanting
>>>>>     to stick with a meaningless "next(int bits)" method.
>>>>> [3] http://www.doornik.com/research/ziggurat.pdf
>>>>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
>>>>> [5] Hence the need to agree on a methodology/policy for
>>>>> benchmarking.
>>>>>
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>> [1] http://openjdk.java.net/projects/code-tools/jmh/
>>>>>>>   IOW, would those users for which such a difference matters use
>>>>>>> CM at all?
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Phil Steitz
In reply to this post by dbrosIus
On 2/6/16 11:57 AM, dbrosIus wrote:
> The problem with the revolutionary vs evolutionary post is that you say to the revolutionaries "go off and play around. Invest all kinds of effort and time" but we are not going to give you any sense of whether we will do anything with what you produce. "WE", the status quo, will decide.
> To me that's just anti-meritocratic.
> Apache tends to be a very conservative organization, with the upshot that do-less becomes favored over do-more.
> I have no skin in this math game, and don't begin to say I know the right and wrong, but let's favor movement.

That is precisely the kind of idiotic argument that made me decide
to leave this community.  "Let's favor movement."  Did you ever consider
that maybe, just maybe, the "movement" is stupid?  Lots of talk
about talking about code by people neither willing nor interested enough to
actually penetrate the code or do the research to make intelligent
comments or decisions.  Yes, it is technical and specialized.  That
means if you want to "crack it open" you need to be prepared to
really learn the math.  Most of my objections to the latest
"revolution" come down to forcing that issue.  People can call me
"conservative" or any other names, but that is not what is going on
here.  I have a long track record of supporting positive change here
and elsewhere.  Meritocracy used to mean that the best ideas won.  In
this case, vague arguments, personal attacks and hand-waving fuel
the flames of half-baked "revolution" and the technically competent
contributors leave out of exhaustion.  Sorry it has come to this,
but I am just fed up with talking about talking about code, personal
attacks and diminishing quality of code that I contribute to.

Phil

> Dave
>
> -------- Original message --------
> From: Mark Thomas <[hidden email]>
> Date: 02/06/2016  1:17 PM  (GMT-05:00)
> To: Commons Developers List <[hidden email]>
> Subject: Re: [Math] How fast is fast enough?
>
> On 06/02/2016 11:31, James Carman wrote:
>> Okay, folks, this is definitely getting out of hand. Let's put a moratorium
>> on this thread for the weekend or something, come back together
>> next week, and try to move forward. I would urge folks to watch this while
>> we wait:
>>
>> https://m.youtube.com/watch?v=rOWmrlft2FI
> Revolution vs. evolution is not a new discussion topic at the ASF. In
> addition to the above, I also recommend this:
>
> http://incubator.apache.org/learn/rules-for-revolutionaries.html
>
> Clearly, the rest of the community should have stepped in sooner to try
> and cool things down. That we didn't is a lesson for all of us.
>
>> p.s. Phil, I do hope you'll reconsider
> +1. I very much value the help and advice you have provided with Pool
> and DBCP and I would be very sorry to lose that.
>
>
>> On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <[hidden email]> wrote:
>>
>>> OK, I give up.  I am withdrawing as volunteer chair or member of the
>>> new TLP.
>>>
>>> Phil
>>>
>>> On 2/5/16 7:23 PM, Gilles wrote:
>>>> Phil,
>>>>
>>>> You talk again about me trying to push forward changes that
>>>> serve no purpose besides "trash performance and correctness".
>>>> This is again baseless FUD to which I've already answered
>>>> (with a detailed list of facts which you chose to ignore).
>>>> You declare anything for which you don't have an answer as
>>>> "bogus argument". Why is the reference to multi-threaded
>>>> implementations bogus?  You contradict yourself in pretending
>>>> that CM RNGs could be so good as to make people want to use
>>>> them while refusing to consider whether another design might
>>>> be better suited to such high(er)-performance extensions.
>>>> This particular case is a long shot but if any and all
>>>> discussions are stopped dead, how do you imagine that we can
>>>> go anywhere?
>>>> As you could read from experts, micro-benchmarks are deceiving;
>>>> but you refuse to even consider alternative designs if there
>>>> might be a slight suspicion of degradation.
>>>> How can we ever set up a constructive discussion on how to
>>>> make everybody enjoy this project if the purported chair is
>>>> so bent on protecting existing code rather than nurturing a good
>>>> relationship with developers who may sometimes have other ideas?
>>>> I'm trying to improve the code (in a dimension which you can't
>>>> seem to understand unfortunately) but respectfully request
>>>> data points from those users of said code, in order to be
>>>> able to prove that no harm will be done.
>>>> But you seem to prefer to not disclose anything that would
>>>> get us closer to agreement (better design with similar
>>>> performance and room for improvement, to be discussed
>>>> together as a real development team -- Not you requiring,
>>>> as a bad boss, that I bow to your standards for judging
>>>> usefulness).
>>>> This 1% which you throw at me, where does it come from?
>>>> What does 1% mean when the benchmark shows standard deviations
>>>> that vary from 4 to 26% in the "nextInt" case and from 3 to
>>>> 7% in the "nextGaussian" case?
>>>> This 1% looks meaningless without context; context is what I'm
>>>> asking in order to try and establish objectively whether
>>>> another design will have a measurable impact on actual tasks.
>>>> I'm not going to show any "damaged" benchmark because of how
>>>> unwelcome you make me feel every time I wish to talk about
>>>> other aspects of the code.
>>>> There is no development community here.  Only solitary
>>>> coders who share a repository.
>>>>
>>>> Not sorry for the top-post,
>>>> Gilles
>>>>
>>>>
>>>> On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
>>>>> [...]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Matt Benson-2
I don't know that Dave intended his comments precisely as an indictment in
this specific situation, but he is certainly neither the first nor, almost
certainly, the last person to express similar feelings. I appreciated your
support during my recent release vote, Phil, and would be sorry to see you
leave the Commons community (even though I myself sometimes consider
leaving as well). I agree that yours is not a voice I typically see as
regressive/repressive.

You mentioned digging in and learning the math. I have a very average
mathematical background and therefore have always stayed far away from
discussions having to do with that component (incidentally, I liked the
name Epsilon but, again, didn't cast a vote). In this situation, however,
it strikes me that, as far as I noticed, the main issue was whether
performance was negatively impacted by the suggested change, and I can't
help wondering whether there might be some way to identify the reason for the
degradation and address it, perhaps even with bytecode modification (not
that I suffer from golden-hammer syndrome or anything).

Br,
Matt
On Feb 6, 2016 3:37 PM, "Phil Steitz" <[hidden email]> wrote:

> On 2/6/16 11:57 AM, dbrosIus wrote:
> > The problem with the revolutionary vs evolutionary post is that you say
> to the revolutionaries "go off and play around. Invest all kinds of effort
> and time" but we are going to not give you any kind of sense as to whether
> we will do anything with what you produced. "WE" the status quo will decide.
> > To me that's just anti-meritocritous.
> > Apache tends to be a very conservative organizatiob, with the upshot
> being do-less becomes favored over do-more.
> > I have no skin in this math game, and don't begin to say I know the
> right and wrong, but let's favor movement.
>
> That is precisely the kind of idiotic argument that decided me to
> leave this community.  "Lets favor movement."  Did you ever consider
> that maybe, just maybe, the "movement" is stupid?  Lots of talk
> about talking about code by people not willing or interested to
> actually penetrate the code or do the research to make intelligent
> comments or decisions.  Yes, it is technical and specialized.  That
> means if you want to "crack it open" you need to be prepared to
> really learn the math.  Most of my objections to the latest
> "revolution" come down to forcing that issue.  People can call me
> "conservative" or any other names, but that is not what is going on
> here.  I have a long track record of supporting positive change here
> and elsewhere.  Meritocracy used to mean the the best ideas won.  In
> this case, vague arguments, personal attacks and hand-waving fuel
> the flames of half-baked "revolution" and the technically competent
> contributors leave out of exhaustion.  Sorry it has come to this,
> but I am just fed up with talking about talking about code, personal
> attacks and diminishing quality of code that I contribute to.
>
> Phil
> > Dave
> >
> > -------- Original message --------
> > From: Mark Thomas <[hidden email]>
> > Date: 02/06/2016  1:17 PM  (GMT-05:00)
> > To: Commons Developers List <[hidden email]>
> > Subject: Re: [Math] How fast is fast enough?
> >
> > On 06/02/2016 11:31, James Carman wrote:
> >> Okay, folks, this is definitely getting out of hand. Let's put a
> moratorium
> >> on this thread for the weekend or something and try to come back
> together
> >> next week and try to move forward. I would urge folks to watch this
> while
> >> we wait:
> >>
> >> https://m.youtube.com/watch?v=rOWmrlft2FI
> > Revolution vs. evolution is not a new discussion topic at the ASF. In
> > addition to the above, I also recommend this:
> >
> > http://incubator.apache.org/learn/rules-for-revolutionaries.html
> >
> > Clearly, the rest of the community should have stepped in sooner to try
> > and cool things down. That we didn't is a lesson for all of us.
> >
> >> p.s. Phil, I do hope you'll reconsider
> > +1. I very much value the help and advice you have provided with Pool
> > and DBCP and I would be very sorry to lose that.
> >
> >
> >> On Fri, Feb 5, 2016 at 10:47 PM Phil Steitz <[hidden email]>
> wrote:
> >>
> >>> OK, I give up.  I am withdrawing as volunteer chair or member of the
> >>> new TLP.
> >>>
> >>> Phil
> >>>
> >>> On 2/5/16 7:23 PM, Gilles wrote:
> >>>> Phil,
> >>>>
> >>>> You talk again about me trying to push forward changes that
> >>>> serve no purpose besides "trash performance and correctness".
> >>>> This is again baseless FUD to which I've already answered
> >>>> (with detailed list of facts which you chose to ignore).
> >>>> You declare anything for which you don't have an answer as
> >>>> "bogus argument". Why is the reference to multi-threaded
> >>>> implementations bogus?  You contradict yourself in pretending
> >>>> that CM RNGs could be so good as to make people want to use
> >>>> them while refusing to consider whether another design might
> >>>> be better suited to such high(er)-performance extensions.
> >>>> This particular case is a long shot but if any and all
> >>>> discussions are stopped dead, how do you imagine that we can
> >>>> go anywhere?
> >>>> As you could read from experts, micro-benchmarks are deceiving;
> >>>> but you refuse to even consider alternative designs if there
> >>>> might be a slight suspicion of degradation.
> >>>> How can we ever set up a constructive discussion on how to
> >>>> make everybody enjoy this project if the purported chair is
> >>>> so bent to protecting existing code rather than nurture a good
> >>>> relationship with developers who may sometimes have other ideas?
> >>>> I'm trying to improve the code (in a dimension which you can't
> >>>> seem to understand unfortunately) but respectfully request
> >>>> data points from those users of said code, in order to be
> >>>> able to prove that no harm will be done.
> >>>> But you seem to prefer to not disclose anything that would
> >>>> get us closer to agreement (better design with similar
> >>>> performance and room for improvement, to be discussed
> >>>> together as a real development team -- Not you requiring,
> >>>> as a bad boss, that I bow to your standards for judging
> >>>> usefulness).
> >>>> This 1% which you throw at me, where does it come from?
> >>>> What does 1% mean when the benchmark shows standard deviations
> >>>> that vary from 4 to 26% in the "nextInt" case and from 3 to
> >>>> 7% in the "nextGaussian" case?
> >>>> This 1% looks meaningless without context; context is what I'm
> >>>> asking in order to try and establish objectively whether
> >>>> another design will have a measurable impact on actual tasks.
> >>>> I'm not going to show any "damaged" benchmark because of how
> >>>> unwelcome you make me feel every time I wish to talk about
> >>>> other aspects of the code.
> >>>> There is no development community here.  Only solitary
> >>>> coders who share a repository.
> >>>>
> >>>> Not sorry for the top-post,
> >>>> Gilles
> >>>>
> >>>>
> >>>> On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
> >>>>> On 2/5/16 12:59 PM, Gilles wrote:
> >>>>>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
> >>>>>>> On 2/4/16 3:59 PM, Gilles wrote:
> >>>>>>>> Hi.
> >>>>>>>>
> >>>>>>>> Here is a micro-benchmark report (performed with
> >>>>>>>> "PerfTestUtils"):
> >>>>>>>> -----
> >>>>>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
> >>>>>>>> time
> >>>>>>>> unit: ms)
> >>>>>>>>                          name time/call std dev total time ratio
> >>>>>>>> cv difference
> >>>>>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
> >>>>>>>> 0.26 0.0000e+00
> >>>>>>>>     o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
> >>>>>>>> 0.15 -1.2900e+02
> >>>>>>>>            o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
> >>>>>>>> 0.04 2.1032e+02
> >>>>>>>>           o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
> >>>>>>>> 0.14 5.1945e+02
> >>>>>>>>          o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
> >>>>>>>> 0.14 8.1451e+02
> >>>>>>>>          o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
> >>>>>>>> 0.06 9.7816e+02
> >>>>>>>>          o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
> >>>>>>>> 0.08 1.6602e+03
> >>>>>>>>          o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
> >>>>>>>> 0.14 1.7301e+03
> >>>>>>>>         o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
> >>>>>>>> 0.16 1.6139e+02
> >>>>>>>> -----
> >>>>>>>> where "cv" is the ratio of the 3rd to the 2nd column.
> >>>>>>>>
> >>>>>>>> Questions are:
> >>>>>>>> * How meaningful are micro-benchmarks when the timed operation
> >>>>>>>> has
> >>>>>>>> a very
> >>>>>>>>    small duration (wrt e.g. the duration of other machine
> >>>>>>>> instructions that
> >>>>>>>>    are required to perform them)?
> >>>>>>> It is harder to get good benchmarks for shorter duration
> >>>>>>> activities,
> >>>>>>> but not impossible.  One thing that it would be good to do is to
> >>>>>>> compare these results with JMH [1].
> >>>>>> I was expecting insights based on the benchmark which I did run.
> >>>>> You asked whether or not benchmarks are meaningful when the task
> >>>>> being benchmarked is short duration.  I answered that question.
> >>>>>> We have a tool in CM; if it's wrong, we should remove it.
> >>>>>> How its results compare with JMH is an interesting question,
> >>>>> I will look into this.
> >>>>>> I
> >>>>>> agree, but I don't have time to make an analysis of benchmarking
> >>>>>> tools (on top of what I've been doing since December because
> >>>>>> totally innocuous changes in the RNG classes were frowned upon
> >>>>>> out of baseless fear).
> >>>>> Please cut the hypberbole.
> >>>>>>>> * In a given environment (HW, OS, JVM), is there a lower limit
> >>>>>>>> (absolute
> >>>>>>>>    duration) below which anything will be deemed good enough?
> >>>>>>> That depends completely on the application.
> >>>>>> Sorry, I thought that it was obvious: I don't speak of applications
> >>>>>> that don't care about performance. :-)
> >>>>>>
> >>>>>> For those that do, I do not agree with the statement: the question
> >>>>>> relates to finding a point below which it is the environment that
> >>>>>> overwhelms the other conditions.
> >>>>>> A point where there will be _unavoidable_ overhead (transferring
> >>>>>> data
> >>>>>> from/to memory, JVM book-keeping, ...) and perturbations (context
> >>>>>> switches, ...) such that their duration adds a constant time (on
> >>>>>> average) that may render most enhancements to an already efficient
> >>>>>> algorithm barely noticeable in practice.
> >>>>>> Similarly, but in the opposite direction, some language constructs
> >>>>>> or design choices might slow down things a bit, but without
> >>>>>> endangering any user.
> >>>>>>
> >>>>>> A problem arises when any enhancement to the design is deemed
> >>>>>> harmful because it degrades a micro-benchmark, even though that
> >>>>>> benchmark may not reflect any real use-cases.
> >>>>>> Then, the real harm is against development.
> >>>>>>
> >>>>>>>> * Can a library like CM admit a trade-off between ultimate
> >>>>>>>> performance and
> >>>>>>>>    good design?   IOW, is there an acceptable overhead in exchange
> >>>>>>>> for other qualities
> >>>>>>>>    (clarity, non-redundancy, extensibility, etc.)?
> >>>>>>> That is too general a question to be meaningful.   We need to look
> >>>>>>> at specific cases.  What exactly are you proposing?
> >>>>>> <rant>
> >>>>>> It is quite meaningful even if it refers to general principles.
> >>>>>> Those could (should, IMO) be taken into account when managing a
> >>>>>> project like CM, on a par with "performance" (whose intrinsic value
> >>>>>> is never questioned).
> >>>>>> </rant>
> >>>>> Rant all you want.  Vague generalities and hyperbole have no value.
> >>>>>> Two specific cases are:
> >>>>>> * inheritance vs delegation (a.k.a. composition)
> >>>>>> * generics (that could require runtime casts)
> >>>>> This is getting closer to meaningful.  Where exactly in the code are
> >>>>> you wanting to use something and seeing benchmark damage?
> >>>>>>>> * Does ultimate performance for the base functionality
> >>>>>>>> (generation
> >>>>>>>> of a
> >>>>>>>>    random number) trump any consideration of use-cases that would
> >>>>>>>> need an
> >>>>>>>>    extension (of the base functionality, such as computation to
> >>>>>>>> match another
> >>>>>>>>    distribution) that will unavoidably degrades the performance
> >>>>>>>> (hence the
> >>>>>>>>    micro-benchmark will be completely misleading for those users)?
> >>>>>>> Again, this is vague and the answer depends on what exactly you
> >>>>>>> are
> >>>>>>> talking about. Significantly damaging performance of PRNG
> >>>>>>> implementations is a bad idea,
> >>>>>> Now, *this* is vague: what do you mean by "significantly"?
> >>>>>> That was actually my question in the first place.
> >>>>> If you are talking about PRNG performance, I would say a 1% hit is
> >>>>> significant.
> >>>>>> Referring to the
> >>>>>> benchmark above, people who'd know why they require ultimate
> >>>>>> performance
> >>>>>> should be able to tell what range of numbers they'd find
> >>>>>> acceptable in
> >>>>>> that table.
> >>>>>>
> >>>>>> <rant>
> >>>>>> Actually my questions are very precise, but the answers would
> >>>>>> require
> >>>>>> some decent analysis, rather than the usual "bad idea" dismissal.
> >>>>>> </rant>
> >>>>>>
> >>>>>> In the Javadoc of the "random" package, there is information about
> >>>>>> performance but no reference as to the benchmarking procedure.
> >>>>> It would be great to repeat these using JMH, which is emerging as a
> >>>>> de facto standard for java benchmarking.  I will look into this.
> >>>>>> I can consistently observe a totally different behaviour (using
> >>>>>> "PerfTestUtils"):
> >>>>>>   1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
> >>>>>>   2. moreover, the ratio *grows* with each of the longer periods
> >>>>>>      members of the WELL family (see the above table).
> >>>>>>
> >>>>>> This makes me wonder how someone who purports to need "ultimate"
> >>>>>> performance can have any objective basis to determine what is good
> >>>>>> or bad for his own applications.
> >>>>>>
> >>>>>>> unless there are actual practical use
> >>>>>>> cases you can point to that whatever changes you are proposing
> >>>>>>> enable.
> >>>>>> As I've explained in very much details in another thread, I've
> >>>>>> reviewed (from a design POV) the RNG code in "random" and IMHO,
> >>>>>> there
> >>>>>> is room for improvement (cf. above for what I mean by that term).
> >>>>>> <rant>
> >>>>>> I have some code ready for review but I had to resort to what I
> >>>>>> considered sub-optimal design (preemptively renouncing to propose a
> >>>>>> "delegation"-based design) solely because of the destructive
> >>>>>> community
> >>>>>> process that takes place here.[1]
> >>>>>> </rant>
> >>>>> More vague hyperbole that serves no purpose.  Please focus on actual
> >>>>> code or design issues.
> >>>>>> The practical use-cases is anything that needs further
> >>>>>> processing of
> >>>>>> the numbers produced according to a uniform distribution:
> >>>>> Isn't that what the samplers in the distributions package do?  What
> >>>>> we need from the PRNG implementations is just blocks of bits.  Since
> >>>>> we wanted a pluggable replacement for j.u.Random, we added uniform
> >>>>> ints, longs and floats and gaussian floats.  The samplers just need
> >>>>> uniform doubles.  The practical use case we need is well-supported
> >>>>> in the code we have.  What is missing, exactly?
> >>>>>> I agree that
> >>>>>> there would be little sense to code that latter part in a "pure" OO
> >>>>>> way[2].  And Luc made it indeed quite efficient, I think, in the
> >>>>>> various
> >>>>>> concrete classes.
> >>>>>> What I want to reconsider is how those concrete low-level
> >>>>>> algorithms can
> >>>>>> be plugged in a higher-level function that just requires a
> >>>>>> "source of
> >>>>>> randomness", as I'd call a provider of "int" (or "long") values,
> >>>>>> where
> >>>>>> the high level functionality does not care at all about the
> >>>>>> provider's
> >>>>>> inner working (a.o. how it's seeded!).
> >>>>> This is why many higher-level samplers and other things that require
> >>>>> random data inside [math] have a pluggable RandomGenerator.
> >>>>>> A case in point is the sampling of other distributions (namely the
> >>>>>> Normal distribution).
> >>>>> Or any of the others.  We have a default, inversion-based method
> >>>>> that the abstract distribution classes provide and some pretty good
> >>>>> specialized implementations within individual distributions.  Most
> >>>>> of these just require uniform random doubles as source.
> >>>>>
> >>>>>> Here is the benchmark report:
> >>>>>> -----
> >>>>>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
> >>>>>> time unit: ms)
> >>>>>>                          name time/call std dev total time ratio
> >>>>>> cv difference
> >>>>>> o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
> >>>>>> 0.14 0.0000e+00
> >>>>>> o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
> >>>>>> 0.07 1.2892e+04
> >>>>>>     o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
> >>>>>> 0.06 1.0393e+04
> >>>>>>            o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
> >>>>>> 0.07 1.1360e+04
> >>>>>>           o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
> >>>>>> 0.04 1.1513e+04
> >>>>>>          o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
> >>>>>> 0.03 1.2125e+04
> >>>>>>          o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
> >>>>>> 0.06 1.1928e+04
> >>>>>>          o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
> >>>>>> 0.04 1.3931e+04
> >>>>>>          o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
> >>>>>> 0.06 1.4118e+04
> >>>>>>         o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
> >>>>>> 0.08 1.1049e+04
> >>>>>> -----
> >>>>>> where the first line is JDK's "nextInt()" and the remaining are
> >>>>>> "nextGaussian()".
> >>>>>>
> >>>>>> The generation time is thus about 4-fold that of "nextInt()".
> >>>>>> Thus, degrading the performance of "nextInt()" by 10% would
> >>>>>> degrade the
> >>>>>> performance of "nextGaussian()" by half that.
> >>>>>>
> >>>>>> For a performance discussion to be meaningful, I think that we'd
> >>>>>> need
> >>>>>> to know how that fact would affect, even modestly, any moderately
> >>>>>> complex
> >>>>>> post-processing of the generated values.
> >>>>>>
> >>>>>> Another case, for modularity, would be to consider that other
> >>>>>> algorithms could
> >>>>>> be implemented to provide the required distribution.[3]
> >>>>>> In the current design (inheritance-based), that can only be done
> >>>>>> by creating
> >>>>>> a subclass, even though the core functionality ("nextDouble()") is
> >>>>>> not
> >>>>>> overridden.
> >>>>>>
> >>>>>>>> * What are usages of the CM RNGs?
> >>>>>>>>    Do those use-cases strictly forbid "loosing" a dozen
> >>>>>>>> milliseconds per
> >>>>>>>>    million calls?
> >>>>>>> There are many different use cases.  My own applications use
> >>>>>>> them in
> >>>>>>> simulations to generate random deviates, to generate random hex
> >>>>>>> strings as identifiers and in stochastic algorithms like some
> >>>>>>> of our
> >>>>>>> internal uses.  The last case is definitely sensitive to PRNG
> >>>>>>> performance.
> >>>>>> Thanks for giving examples, but since we talk about performance, I
> >>>>>> was hoping for some real flesh, like the relative duration of
> >>>>>> numbers
> >>>>>> generation (e.g. the total duration of calls to the
> >>>>>> "RandomGenerator"
> >>>>>> instances wrt to the total duration of the application).
> >>>>>>
> >>>>>> I don't know if by "last case", you are referring to code that is
> >>>>>> inside CM.  I didn't spot anything that makes "heavy" usage of a
> >>>>>> RNG (in the sense that generation would count as a sizable part of
> >>>>>> the whole processing).
> >>>>> monteCarloP in KolmogorovSmirnovTest is one to check.
> >>>>>> As I pointed out many times: if an application is severely
> >>>>>> dependent
> >>>>>> on the performance of RNG, the user probably will turn to specific
> >>>>>> tools (e.g. GPUs? [4]) rather than use CM.
> >>>>> That is a bogus argument.  We should make our PRNGs simple and fast
> >>>>> so their use can extend to performance-sensitive applications.
> >>>>>> Conversely, using Java might be preferred for its flexibility,
> >>>>>> which
> >>>>>> is destroyed by a search for ultimate performance (which nobody
> >>>>>> seems
> >>>>>> able to define reasonably).
> >>>>>> Performance is not a goal in itself; it should not be a trophy
> >>>>>> which
> >>>>>> sits uselessly on a shelf.
> >>>>> Nor should "beautiful design" in the eyes of one person.
> >>>>>> My goal is not to deliberately slow things down; it is to allow
> >>>>>> some
> >>>>>> leeway so that designs which are deemed better (on all counts
> >>>>>> except,
> >>>>>> perhaps, performance) are given a chance to show their
> >>>>>> strengths, in
> >>>>>> particular in areas where performance in absolute terms is "good
> >>>>>> enough" for all use-cases which CM should care about (hence the
> >>>>>> need
> >>>>>> of actual data points[5]).
> >>>>> I see no reason that we can't have it both ways - good design and
> >>>>> good performance. What we have now, modulo maybe some small changes
> >>>>> to reduce code duplication, works fine.  If you want to play with
> >>>>> 64-bit generators and can find reference implementations and verify
> >>>>> that they do in fact perform better, great.  If not, I don't see the
> >>>>> point.  You can rant and complain all you want; but I am not going
> >>>>> to let us trash performance or correctness of code in the random
> >>>>> class or anywhere else just because you think it is somehow "better
> >>>>> designed"  unless you can show specific, practical use cases
> >>>>> demonstrating the value of the changes.
> >>>>>
> >>>>> Phil
> >>>>>>
> >>>>>> Gilles
> >>>>>>
> >>>>>> [1] "Is it faster?"
> >>>>>>      "No."
> >>>>>>      "Then, no."
> >>>>>> [2] Although that is in some sense what you indirectly defend by
> >>>>>> wanting
> >>>>>>      to stick with a meaningless "next(int bits)" method.
> >>>>>> [3] http://www.doornik.com/research/ziggurat.pdf
> >>>>>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
> >>>>>> [5] Hence the need to agree on a methodology/policy for
> >>>>>> benchmarking.
> >>>>>>
> >>>>>>> Phil
> >>>>>>>
> >>>>>>> [1] http://openjdk.java.net/projects/code-tools/jmh/
> >>>>>>>>    IOW, would those users for which such a difference matters use
> >>>>>>>> CM at all?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Gilles
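
A note on the monteCarloP remark above: that method estimates a
p-value by recomputing the test statistic over many randomly shuffled
data sets, so the generator is called on the order of n + m times per
iteration and generation cost dominates the loop.  The following is a
schematic of that kind of loop, coded against the "RandomGenerator"
interface; it is a sketch, not the actual CM implementation (the
D-statistic computation, the seed and the generator choice are
illustrative):
-----
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class MonteCarloPSketch {

    // Estimate P(D >= observed) by shuffling the pooled ranks
    // "iterations" times; each shuffle makes n + m - 1 calls to
    // rng.nextInt(int).
    static double monteCarloP(double observed, int n, int m,
                              int iterations, RandomGenerator rng) {
        final int[] ranks = new int[n + m];
        int extreme = 0;
        for (int it = 0; it < iterations; it++) {
            for (int i = 0; i < ranks.length; i++) {
                ranks[i] = i;
            }
            // Fisher-Yates shuffle of the pooled ranks.
            for (int i = ranks.length - 1; i > 0; i--) {
                final int j = rng.nextInt(i + 1);
                final int tmp = ranks[i];
                ranks[i] = ranks[j];
                ranks[j] = tmp;
            }
            if (dStatistic(ranks, n) >= observed) {
                extreme++;
            }
        }
        return (double) extreme / iterations;
    }

    // Two-sample Kolmogorov-Smirnov D statistic of one random
    // partition: the first n shuffled ranks form sample 1, the
    // remaining m form sample 2.
    static double dStatistic(int[] ranks, int n) {
        final int m = ranks.length - n;
        final boolean[] inFirst = new boolean[ranks.length];
        for (int i = 0; i < n; i++) {
            inFirst[ranks[i]] = true;
        }
        double d = 0;
        int c1 = 0;
        int c2 = 0;
        for (int pos = 0; pos < ranks.length; pos++) {
            if (inFirst[pos]) {
                c1++;
            } else {
                c2++;
            }
            d = Math.max(d, Math.abs((double) c1 / n - (double) c2 / m));
        }
        return d;
    }

    public static void main(String[] args) {
        // 100000 iterations with n = m = 10 means about 1.9 million
        // calls to nextInt(int): the loop really is RNG-bound.
        System.out.println(monteCarloP(0.5, 10, 10, 100000,
                                       new Well19937c(42L)));
    }
}
-----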

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Benson Margulies
1. I don't understand the source of urgency here. If someone has a new
algorithm they want to release to the general public, they can put it
on github. It does not matter very much if a method is sitting in
Apache (commons) Math on any particular schedule. It's not like adding
a feature to some platform that has to be integrated to be useful. Of
course, the 'Apache (commons) Math mark of quality' won't mean
anything if there is a sudden shift to 'movement' as a guiding
principle, so ironically the victory would be Pyrrhic.

2. JMH is the current gold standard of microbenchmark measurement. It
gives meaningful results. If someone wants to claim that they have new
code that has some particular performance on a micro scale, they
should be eager to contribute a JMH benchmark; it's not much work (a
sketch follows this list).

3. Apache Math can only come into existence if the board is convinced
that there is a community prepared to operate according to ASF
principles. That means finding some way to collegially resolve
disputes. There's always a trip to the incubator available.
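
To show how little work it is: a complete JMH benchmark for the
nextInt() case discussed above fits in one class.  This is a sketch;
the generator, seed, mode and time unit are arbitrary choices:
-----
import java.util.concurrent.TimeUnit;

import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class NextIntBenchmark {

    private RandomGenerator rng;

    @Setup
    public void setup() {
        rng = new Well19937c(42L);
    }

    @Benchmark
    public int nextInt() {
        // Returning the value lets JMH consume it, which prevents
        // dead-code elimination of the call being measured.
        return rng.nextInt();
    }
}
-----
Built from the JMH Maven archetype, this runs with "java -jar
target/benchmarks.jar", and JMH's forking and warm-up iterations take
care of the JIT effects that hand-rolled timing loops tend to get
wrong.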

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] How fast is fast enough?

Phil Steitz
On 2/6/16 4:34 PM, Benson Margulies wrote:
> 1. I don't understand the source of urgency here. If someone has a new
> algorithm they want to release to the general public, they can put it
> on github. It does not matter very much if a method is sitting in
> Apache (commons) Math on any particular schedule. It's not like adding
> a feature to some platform that has to be integrated to be useful.

It would be great if what we were arguing about was actually
*adding* a *new algorithm* - I would be a big +1 to more PRNGs.
Unfortunately, "the movement" here is about refactoring the
framework that provides the implementations of the algorithms we
already have.  No practical use case or new, better algorithm has
been presented to justify this "movement."  The sad thing is we have
a nice, pluggable framework for PRNGs everywhere in [math], which is
simple and performant.  What really would be an advance is
implementation of new algorithms, which could sit nicely beside the
others that already exist and users could choose to use or not use
them.  Something demonstrably better than the Well generators that
we plug in as defaults in various places could be really useful.  We
could make that change transparently and lots of users would
benefit.  Unfortunately, we are not talking about that.  This is why
I am leaving.  We started [math] 13 years ago aiming to provide
high-quality, well-documented, well-tested implementations of
commonly used standard algorithms.  I am proud of what we have
done.  Now, however, the loudest, most persistent voices are more
interested in fiddling with the APIs rather than researching and
implementing new algorithms or improving the ones we have.  That's
fine; it's just not interesting to me, nor, in my opinion, in the
best interest of the users.  What they end up with is incompatible change
without material improvement in functionality.  It is natural, I
guess, for components and communities to evolve and that is fine.  I
am sure others will step up to maintain and develop this code.
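
For the record, the pluggability in question is just coding against
the "RandomGenerator" interface, so implementations can be swapped
without touching the calling code.  A minimal sketch against the 3.x
API (the seed and the choice of Well19937c are arbitrary):
-----
import org.apache.commons.math3.random.RandomDataGenerator;
import org.apache.commons.math3.random.RandomGenerator;
import org.apache.commons.math3.random.Well19937c;

public class PluggableRngExample {
    public static void main(String[] args) {
        // Any implementation can be plugged in here:
        // JDKRandomGenerator, MersenneTwister, Well512a, ISAACRandom...
        final RandomGenerator rng = new Well19937c(42L);

        // Higher-level utilities accept the interface, not a
        // concrete class.
        final RandomDataGenerator data = new RandomDataGenerator(rng);

        System.out.println(rng.nextInt());
        System.out.println(data.nextGaussian(0, 1)); // mean 0, sd 1
    }
}
-----
Adding a new algorithm amounts to one more implementation of that
interface, sitting beside the existing ones.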
>  Of
> course, the 'Apache (commons) Math mark of quality' won't mean
> anything if there is a sudden shift to 'movement' as a guiding
> principle, so ironically the victory would be Pyrrhic.
>
> 2. JMH is the current gold standard of microbenchmark measurement. It
> gives meaningful results. If someone wants to claim that they have new
> code that has some particular performance on a micro scale, they
> should be eager to contribute a JMH benchmark; it's not much work.

Big +1 there.  And JMH uses [math]'s stat package.
>
> 3. Apache Math can only come into existence if the board is convinced
> that there is a community prepared to operate according to ASF
> principles. That means finding some way to collegially resolve
> disputes. There's always a trip to the incubator available.

Oh wouldn't that be lovely!

Phil



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
