[Math] JDK "Random" and "BitsStreamGenerator"


[Math] JDK "Random" and "BitsStreamGenerator"

Gilles Sadowski
Hi.

In CM, a base class "BitsStreamGenerator" contains methods for
generating various primitive types from a bit source (to be
implemented by subclasses by overriding the "next(int bits)"
method).
For example, to get a random "long" value, the following code
is called:
---CUT---
     public long nextLong() {
         final long high  = ((long) next(32)) << 32;
         final long  low  = (next(32)) & 0xffffffffL;
         return high | low;
     }
---CUT---

If the bit source were provided by the JDK's "Random" class,
is it assumed that the "long" value generated by the above
code will always be equal to the "long" value generated by
the call to JDK's "Random" class?

In other words, how "standard" are the default implementations
of the methods defined in "BitsStreamGenerator"?

And what are the compatibility requirements with respect to
reproducibility of the sequences (between releases of the
library)?


Regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [Math] JDK "Random" and "BitsStreamGenerator"

Luc Maisonobe-2
On 26/01/2016 01:13, Gilles wrote:
> Hi.

Hi Gilles,

>
> In CM, a base class "BitsStreamGenerator" contains methods for
> generating various primitive types from a bit source (to be
> implemented by subclasses by overriding the "next(int bits)"
> method).
> For example, to get a random "long" value, the following code
> is called:
> ---CUT---
>     public long nextLong() {
>         final long high  = ((long) next(32)) << 32;
>         final long  low  = (next(32)) & 0xffffffffL;
>         return high | low;
>     }
> ---CUT---
>
> If the bit source were provided by the JDK's "Random" class,
> is it assumed that the "long" value generated by the above
> code will always be equal to the "long" value generated by
> the call to JDK's "Random" class?

No. There is nothing said anywhere in the javadoc that would imply
one implementation of nextLong is the same as another implementation.
The only thing that users can expect is that when they choose one
implementation, it is consistent, i.e. it is "random enough" and
"not biased". Two different implementations can fulfill these
requirements without producing the same sequence.

>
> In other words, how "standard" are the default implementations
> of the methods defined in "BitsStreamGenerator"?

I don't think there are any standards to abide by.

>
> And what are the compatibility requirements with respect to
> reproducibility of the sequences (between releases of the
> library)?

It's a grey zone, IMHO. It is nice to preserve reproducibility
as changing it changes the reference test cases with fixed seeds.
On the other hand, too much compatibility really is a pain, we
feel it constantly, so I would say that as soon as we need to
change the sequence, we can just do it, and check all the tests
we know of (i.e. those of our own library) still pass and we
go forward.

best regards,
Luc

>
>
> Regards,
> Gilles
>
>




Re: [Math] JDK "Random" and "BitsStreamGenerator"

Phil Steitz
On 1/26/16 1:47 AM, Luc Maisonobe wrote:

> On 26/01/2016 01:13, Gilles wrote:
>> Hi.
> Hi Gilles,
>
>> In CM, a base class "BitsStreamGenerator" contains methods for
>> generating various primitive types from a bit source (to be
>> implemented by subclasses by overriding the "next(int bits)"
>> method).
>> For example, to get a random "long" value, the following code
>> is called:
>> ---CUT---
>>     public long nextLong() {
>>         final long high  = ((long) next(32)) << 32;
>>         final long  low  = (next(32)) & 0xffffffffL;
>>         return high | low;
>>     }
>> ---CUT---
>>
>> If the bit source were provided by the JDK's "Random" class,
>> is it assumed that the "long" value generated by the above
>> code will always be equal to the "long" value generated by
>> the call to JDK's "Random" class?
> No. There is nothing said anywhere in the javadoc that would imply
> one implementation of nextLong is the same as another implementation.
> The only thing that users can expect is that when they choose one
> implementation, it is consistent, i.e. it is "random enough" and
> "not biased". Two different implementations can fulfill these
> requirements without producing the same sequence.
>
>> In other words, how "standard" are the default implementations
>> of the methods defined in "BitsStreamGenerator"?
> I don't think there are any standards to abide by.
>
>> And what are the compatibility requirements with respect to
>> reproducibility of the sequences (between releases of the
>> library)?
> It's a grey zone, IMHO. It is nice to preserve reproducibility
> as changing it changes the reference test cases with fixed seeds.

Did the original test cases for the generators that came in with
Mantissa come from external references?  If so, it would be good to
keep them for at least the bit streams themselves so we can be sure
that what we have implemented is the standard implementation of the
algorithm.  If we leave the standard algorithms themselves behind,
it's then on us to produce tests to verify lots of properties beyond
just lack of bias and uniformity.

Phil

> On the other hand, too much compatibility really is a pain, we
> feel it constantly, so I would say that as soon as we need to
> change the sequence, we can just do it, and check all the tests
> we know of (i.e. those of our own library) still pass and we
> go forward.
>
> best regards,
> Luc
>
>>
>> Regards,
>> Gilles
>>
>>




Re: [Math] JDK "Random" and "BitsStreamGenerator"

Phil Steitz
On 1/26/16 6:10 AM, Phil Steitz wrote:

> On 1/26/16 1:47 AM, Luc Maisonobe wrote:
>> On 26/01/2016 01:13, Gilles wrote:
>>> Hi.
>> Hi Gilles,
>>
>>> In CM, a base class "BitsStreamGenerator" contains methods for
>>> generating various primitive types from a bit source (to be
>>> implemented by subclasses by overriding the "next(int bits)"
>>> method).
>>> For example, to get a random "long" value, the following code
>>> is called:
>>> ---CUT---
>>>     public long nextLong() {
>>>         final long high  = ((long) next(32)) << 32;
>>>         final long  low  = (next(32)) & 0xffffffffL;
>>>         return high | low;
>>>     }
>>> ---CUT---
>>>
>>> If the bit source were provided by the JDK's "Random" class,
>>> is it assumed that the "long" value generated by the above
>>> code will always be equal to the "long" value generated by
>>> the call to JDK's "Random" class?
>> No. There is nothing said anywhere in the javadoc that would imply
>> one implementation of nextLong is the same as another implementation.
>> The only thing that users can expect is that when they choose one
>> implementation, it is consistent, i.e. it is "random enough" and
>> "not biased". Two different implementations can fulfill these
>> requirements without producing the same sequence.
>>
>>> In other words, how "standard" are the default implementations
>>> of the methods defined in "BitsStreamGenerator"?
>> I don't think there are any standards to abide by.
>>
>>> And what are the compatibility requirements with respect to
>>> reproducibility of the sequences (between releases of the
>>> library)?
>> It's a grey zone, IMHO. It is nice to preserve reproducibility
>> as changing it changes the reference test cases with fixed seeds.
> Did the original test cases for the generators that came in with
> Mantissa come from external references?  If so, it would be good to
> keep them for at least the bit streams themselves so we can be sure
> that what we have implemented is the standard implementation of the
> algorithm.  If we leave the standard algorithms themselves behind,
> it's then on us to produce tests to verify lots of properties beyond
> just lack of bias and uniformity.
>
> Phil
>> On the other hand, too much compatibility really is a pain, we
>> feel it constantly, so I would say that as soon as we need to
>> change the sequence, we can just do it, and check all the tests
>> we know of (i.e. those of our own library) still pass and we
>> go forward.

Sorry, I meant to add that while (maybe a bit strangely) the JDK
commits to a specific long-from-bits recipe (see the javadoc for
j.u.Random#nextLong) we have never done that and I am OK with
changing that if we get better performance as long as uniformity
tests pass.
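
For concreteness, here is a small sketch (a hypothetical helper class, not CM code) contrasting the two recipes. The j.u.Random javadoc documents `((long) next(32) << 32) + next(32)`, i.e. signed addition of the low word, while the CM code quoted in this thread masks the low word and ORs it in; the two differ whenever the low 32-bit word is negative as an int:

```java
// Hypothetical helper contrasting the two "long from two ints" recipes.
public class LongRecipes {

    // JDK javadoc recipe: the low word is added with its sign.
    static long combineJdk(int hi, int lo) {
        return ((long) hi << 32) + lo;
    }

    // CM-style recipe: the low word is masked to [0, 2^32) and ORed in.
    static long combineMasked(int hi, int lo) {
        final long high = ((long) hi) << 32;
        final long low  = lo & 0xffffffffL;
        return high | low;
    }

    public static void main(String[] args) {
        // Same bits, different longs whenever lo < 0 as an int:
        System.out.println(combineJdk(1, -1));     // 4294967295
        System.out.println(combineMasked(1, -1));  // 8589934591
        // They agree when lo >= 0:
        System.out.println(combineJdk(1, 7) == combineMasked(1, 7));  // true
    }
}
```

When the low word is negative, the masked-OR result exceeds the JDK result by exactly 2^32, so the two bit-compatible sources still produce different long sequences about half the time.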

Phil

>>
>> best regards,
>> Luc
>>
>>> Regards,
>>> Gilles
>>>
>>>




Re: [Math] JDK "Random" and "BitsStreamGenerator"

Gilles Sadowski
In reply to this post by Luc Maisonobe-2
Hello Luc.

On Tue, 26 Jan 2016 09:47:20 +0100, Luc Maisonobe wrote:

> On 26/01/2016 01:13, Gilles wrote:
>> Hi.
>
> Hi Gilles,
>
>>
>> In CM, a base class "BitsStreamGenerator" contains methods for
>> generating various primitive types from a bit source (to be
>> implemented by subclasses by overriding the "next(int bits)"
>> method).
>> For example, to get a random "long" value, the following code
>> is called:
>> ---CUT---
>>     public long nextLong() {
>>         final long high  = ((long) next(32)) << 32;
>>         final long  low  = (next(32)) & 0xffffffffL;
>>         return high | low;
>>     }
>> ---CUT---
>>
>> If the bit source were provided by the JDK's "Random" class,
>> is it assumed that the "long" value generated by the above
>> code will always be equal to the "long" value generated by
>> the call to JDK's "Random" class?
>
> No. There is nothing said anywhere in the javadoc that would imply
> one implementation of nextLong is the same as another implementation.
> The only thing that users can expect is that when they choose one
> implementation, it is consistent, i.e. it is "random enough" and
> "not biased". Two different implementations can fulfill these
> requirements without producing the same sequence.

I note that your statement contradicts Phil's[1]:
   "[..] you [...] changed the internal implementation of all
    of the Bitstream generators."
   "[must] verify that the same sequences are generated by the
    3_X code and the refactored code [...]"

Minor point: I think that there is a possible ambiguity with
"JDKRandomGenerator" when the doc indicates
   "[an] adapter that delegates the random number generation to
    the standard [Random] class".

In fact, there are two possible cases:
  (a) delegating all methods to "Random", or
  (b) only use the source of randomness ("next(int bits)") from
      "Random" and create the "long" with the algo provided in CM.

CM implicitly assumes case (a).
A user could infer it from the HTML doc of "JDKRandomGenerator",
as it does not inherit from "BitsStreamGenerator".
But in "RandomGeneratorFactory", it's more confusing.
[By the way, those two are duplicate implementations of the same
functionality.]

In the new "rng" package, I'll actually provide case (b) which
is more consistent with how the other "sources of randomness"
are used by "BitsStreamGenerator" to provide all the "nextXxx"
methods.
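
As a sketch of case (b), assuming a hypothetical wrapper class (this is not the actual rng-package code), java.util.Random can be reduced to its bit source by exposing the protected "next(int)" method and assembling the "long" with the CM recipe:

```java
import java.util.Random;

// Hypothetical sketch of case (b): java.util.Random serves only as the
// source of randomness; the "long" is assembled with the CM recipe
// quoted earlier in the thread.
public class RandomBitSource extends Random {

    public RandomBitSource(long seed) {
        super(seed);
    }

    // Expose the protected bit source next(int bits).
    public int nextBits(int bits) {
        return next(bits);
    }

    // CM-style nextLong built on the JDK bit source.  This does NOT always
    // match Random#nextLong(), which adds the signed low word instead of
    // masking and ORing it.
    public long nextLongMasked() {
        final long high = ((long) nextBits(32)) << 32;
        final long low  = nextBits(32) & 0xffffffffL;
        return high | low;
    }
}
```

For a given seed the two agree exactly when the second next(32) word is non-negative; otherwise they differ by 2^32. So cases (a) and (b) are observably different contracts.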

>> In other words, how "standard" are the default implementations
>> of the methods defined in "BitsStreamGenerator"?
>
> I don't think there are any standards to abide by.

OK, but how do we detect a suboptimal/buggy algorithm?

>> And what are the compatibility requirements with respect to
>> reproducibility of the sequences (between releases of the
>> library)?
>
> It's a grey zone, IMHO. It is nice to preserve reproducibility
> as changing it changes the reference test cases with fixed seeds.

AFAIK, having one seed pass the tests does not imply that the RNG
is good.

> On the other hand, too much compatibility really is a pain, we
> feel it constantly, so I would say that as soon as we need to
> change the sequence, we can just do it, and check all the tests
> we know of (i.e. those of our own library) still pass and we
> go forward.

Are those tests with a sample size of 1000 strong enough to detect
that an implementation does not uphold the properties of a RNG?

I mean, isn't it actually enough that
1. a CM unit test verifies that N numbers (i.e. the "int" values
    returned by "next(32)") are the same as a set of "reference"
    values,
2. a CM unit test verifies that the conversion from "int" to each of
    the other types is correct (hence the need to have a reference
    for those routines as well)
?

If tests (1) and (2) pass, then the CM uniformity tests seem to
try to show that the reference algorithm itself is wrong.  Which
it'll have a hard time achieving with a small sample.  Even more so
with a single, fixed, seed.

In the absence of (2), the uniformity tests are actually testing
the conversion routines, with each of the different sources of
randomness.  Testing with a single source should be enough since if
there is a bug in the conversion routines, the same tests will fail
for any source of randomness.
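
As an illustration of test (1), here is a sketch with hypothetical names; the "reference" is the LCG published in the j.u.Random javadoc, re-implemented independently, since for that generator an official description of the algorithm exists:

```java
import java.util.Random;

// Hypothetical sketch of test (1): compare the first N next(32) values of a
// generator against independently computed reference values.  For
// java.util.Random, nextInt() returns next(32) and the underlying LCG is
// specified in the javadoc, so the reference can be recomputed directly.
public class ReferenceSequenceCheck {

    // The linear congruential recurrence documented for java.util.Random.
    static int[] referenceInts(long seed, int n) {
        final long mask = (1L << 48) - 1;
        long s = (seed ^ 0x5DEECE66DL) & mask;  // initial seed scrambling
        final int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            s = (s * 0x5DEECE66DL + 0xBL) & mask;
            out[i] = (int) (s >>> 16);  // next(32) keeps the top 32 of 48 bits
        }
        return out;
    }

    // True if the generator reproduces the reference sequence.
    public static boolean matches(long seed, int n) {
        final Random rng = new Random(seed);
        for (final int expected : referenceInts(seed, n)) {
            if (rng.nextInt() != expected) {
                return false;
            }
        }
        return true;
    }
}
```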


Best regards,
Gilles

[1] http://markmail.org/message/ewat3qekz7ec5q6y


> best regards,
> Luc
>
>>
>>
>> Regards,
>> Gilles




Re: [Math] JDK "Random" and "BitsStreamGenerator"

Luc Maisonobe-2
Hi Gilles,

On 26/01/2016 20:07, Gilles wrote:

> Hello Luc.
>
> On Tue, 26 Jan 2016 09:47:20 +0100, Luc Maisonobe wrote:
>> On 26/01/2016 01:13, Gilles wrote:
>>> Hi.
>>
>> Hi Gilles,
>>
>>>
>>> In CM, a base class "BitsStreamGenerator" contains methods for
>>> generating various primitive types from a bit source (to be
>>> implemented by subclasses by overriding the "next(int bits)"
>>> method).
>>> For example, to get a random "long" value, the following code
>>> is called:
>>> ---CUT---
>>>     public long nextLong() {
>>>         final long high  = ((long) next(32)) << 32;
>>>         final long  low  = (next(32)) & 0xffffffffL;
>>>         return high | low;
>>>     }
>>> ---CUT---
>>>
>>> If the bit source were provided by the JDK's "Random" class,
>>> is it assumed that the "long" value generated by the above
>>> code will always be equal to the "long" value generated by
>>> the call to JDK's "Random" class?
>>
>> No. There is nothing said anywhere in the javadoc that would imply
>> one implementation of nextLong is the same as another implementation.
>> The only thing that users can expect is that when they choose one
>> implementation, it is consistent, i.e. it is "random enough" and
>> "not biased". Two different implementations can fulfill these
>> requirements without producing the same sequence.
>
> I note that your statement contradicts Phil's[1]:
>   "[..] you [...] changed the internal implementation of all
>    of the Bitstream generators."
>   "[must] verify that the same sequences are generated by the
>    3_X code and the refactored code [...]"
>
> Minor point: I think that there is a possible ambiguity with
> "JDKRandomGenerator" when the doc indicates
>   "[an] adapter that delegates the random number generation to
>    the standard [Random] class".
>
> In fact, there are two possible cases:
>  (a) delegating all methods to "Random", or
>  (b) only use the source of randomness ("next(int bits)") from
>      "Random" and create the "long" with the algo provided in CM.
>
> CM implicitly assumes case (a).

This is because JDKRandomGenerator is an adapter, i.e. it changes
one API to match another, while trying not to change the implementation.

The other generators we have are independent generators.

> A user could infer it from the HTML doc of "JDKRandomGenerator",
> as it does not inherit from "BitsStreamGenerator".
> But in "RandomGeneratorFactory", it's more confusing.
> [By the way, those two are duplicate implementations of the same
> functionality.]
>
> In the new "rng" package, I'll actually provide case (b) which
> is more consistent with how the other "sources of randomness"
> are used by "BitsStreamGenerator" to provide all the "nextXxx"
> methods.
>
>>> In other words, how "standard" are the default implementations
>>> of the methods defined in "BitsStreamGenerator"?
>>
>> I don't think there are any standards to abide by.
>
> OK, but how do we detect a suboptimal/buggy algorithm?

We don't. We don't have the skills for it. We are not random
generator experts with years of research behind us. We try to
do our best but sometimes fail, and I'm pretty sure we have a
bunch of undetected suboptimal/buggy algorithms.

>
>>> And what are the compatibility requirements with respect to
>>> reproducibility of the sequences (between releases of the
>>> library)?
>>
>> It's a grey zone, IMHO. It is nice to preserve reproducibility
>> as changing it changes the reference test cases with fixed seeds.
>
> AFAIK, having one seed pass the tests does not imply that the RNG
> is good.

You misunderstood what I meant.

The test cases I wrote about are the test cases for other classes
that use a random generator for different purposes. In many cases,
it is simply to get a lot of different test points (a few
thousand) in order to explore a domain; in some cases we even used
low-discrepancy Sobol sequences instead of random sequences. So
when the unit tests for algorithm A (say a geometry algorithm) use
a sequence of points and we change the implementation of the
random generator, algorithm A simply gets a different set of test
points. It's not a worse or better set, it is a different set. So
we may need to adjust the test tolerances or the like. It is not a
big deal, but it has to be done when we touch the random
generators.

>
>> On the other hand, too much compatibility really is a pain, we
>> feel it constantly, so I would say that as soon as we need to
>> change the sequence, we can just do it, and check all the tests
>> we know of (i.e. those of our own library) still pass and we
>> go forward.
>
> Are those tests with a sample size of 1000 strong enough to detect
> that an implementation does not uphold the properties of a RNG?

No, but they are strong enough to detect large errors or biases.
If we follow an expert-published algorithm and pass these coarse-grained
tests, we can consider we did not make too many errors. Unit tests
do not validate a theory, they "try" to rule out programming errors.
They are also very important for non-regression. They point you
at differences when suddenly they fail after a change, and then it
is a human brain that has to concentrate on this and decide "oh, it
is simply a side effect of this change, we are still in the valid
range of this algorithm, I will update the test and commit", or
"I made a big mistake here changing this; fortunately the test failed
and pointed me at the problem, I'm glad I have kept this small
rough test around".

>
> I mean, isn't it actually enough that
> 1. a CM unit test verifies that N numbers (i.e. the "int" values
>    returned by "next(32)") are the same as a set of "reference"
>    values,
> 2. a CM unit test verifies that the conversion from "int" to each of
>    the other types is correct (hence the need to have a reference
>    for those routines as well)
> ?

It would be nice if we had these references, of course.

>
> If tests (1) and (2) pass, then the CM uniformity tests seem to
> try to show that the reference algorithm itself is wrong.  Which
> it'll have a hard time achieving with a small sample.  Even more so
> with a single, fixed, seed.

No, they are not intended to test the algorithm, just to detect
implementation mistakes. They should guard us against something
like <https://www.xkcd.com/221/> (well, I hope).

best regards,
Luc

>
> In the absence of (2), the uniformity tests are actually testing
> the conversion routines, with each of the different sources of
> randomness.  Testing with a single source should be enough since if
> there is a bug in the conversion routines, the same tests will fail
> for any source of randomness.
>
>
> Best regards,
> Gilles
>
> [1] http://markmail.org/message/ewat3qekz7ec5q6y
>
>
>> best regards,
>> Luc
>>
>>>
>>>
>>> Regards,
>>> Gilles
>
>




Re: [Math] JDK "Random" and "BitsStreamGenerator"

Gilles Sadowski
Hi.

On Tue, 26 Jan 2016 22:11:42 +0100, Luc Maisonobe wrote:

> Hi Gilles,
>
> On 26/01/2016 20:07, Gilles wrote:
>> Hello Luc.
>>
>> On Tue, 26 Jan 2016 09:47:20 +0100, Luc Maisonobe wrote:
>>> On 26/01/2016 01:13, Gilles wrote:
>>>> Hi.
>>>
>>> Hi Gilles,
>>>
>>>>
>>>> In CM, a base class "BitsStreamGenerator" contains methods for
>>>> generating various primitive types from a bit source (to be
>>>> implemented by subclasses by overriding the "next(int bits)"
>>>> method).
>>>> For example, to get a random "long" value, the following code
>>>> is called:
>>>> ---CUT---
>>>>     public long nextLong() {
>>>>         final long high  = ((long) next(32)) << 32;
>>>>         final long  low  = (next(32)) & 0xffffffffL;
>>>>         return high | low;
>>>>     }
>>>> ---CUT---
>>>>
>>>> If the bit source were provided by the JDK's "Random" class,
>>>> is it assumed that the "long" value generated by the above
>>>> code will always be equal to the "long" value generated by
>>>> the call to JDK's "Random" class?
>>>
>>> No. There is nothing said anywhere in the javadoc that would imply
>>> one implementation of nextLong is the same as another
>>> implementation.
>>> The only thing that users can expect is that when they choose one
>>> implementation, it is consistent, i.e. it is "random enough" and
>>> "not biased". Two different implementations can fulfill these
>>> requirements without producing the same sequence.
>>
>> I note that your statement contradicts Phil's[1]:
>>   "[..] you [...] changed the internal implementation of all
>>    of the Bitstream generators."
>>   "[must] verify that the same sequences are generated by the
>>    3_X code and the refactored code [...]"
>>
>> Minor point: I think that there is a possible ambiguity with
>> "JDKRandomGenerator" when the doc indicates
>>   "[an] adapter that delegates the random number generation to
>>    the standard [Random] class".
>>
>> In fact, there are two possible cases:
>>  (a) delegating all methods to "Random", or
>>  (b) only use the source of randomness ("next(int bits)") from
>>      "Random" and create the "long" with the algo provided in CM.
>>
>> CM implicitly assumes case (a).
>
> This is because JDKRandomGenerator is an adapter, i.e. it changes
> one API to match another, while trying not to change the
> implementation.
>
> The other generators we have are independent generators.

Hmm, I know; the confusion arises exactly because of that: one is an
adapter and all the others are not.
That difference is not highlighted in the documentation.

IMO, the adapter should not be "at the same level" as the independent
generators.

>> A user could infer it from the HTML doc of "JDKRandomGenerator",
>> as it does not inherit from "BitsStreamGenerator".
>> But in "RandomGeneratorFactory", it's more confusing.
>> [By the way, those two are duplicate implementations of the same
>> functionality.]
>>
>> In the new "rng" package, I'll actually provide case (b) which
>> is more consistent with how the other "sources of randomness"
>> are used by "BitsStreamGenerator" to provide all the "nextXxx"
>> methods.
>>
>>>> In other words, how "standard" are the default implementations
>>>> of the methods defined in "BitsStreamGenerator"?
>>>
>>> I don't think there are any standards to abide by.
>>
>> OK, but how do we detect a suboptimal/buggy algorithm?
>
> We don't. We don't have the skills for it. We are not random
> generator experts with years of research behind us. We try to
> do our best but sometimes fail, and I'm pretty sure we have a
> bunch of undetected suboptimal/buggy algorithms.

I don't think it's that bad. ;-)

What I meant is that there is an overrated confidence in what the
unit tests do achieve.

>>
>>>> And what are the compatibility requirements with respect to
>>>> reproducibility of the sequences (between releases of the
>>>> library)?
>>>
>>> It's a grey zone, IMHO. It is nice to preserve reproducibility
>>> as changing it changes the reference test cases with fixed seeds.
>>
>> AFAIK, having one seed pass the tests does not imply that the RNG
>> is good.
>
> You misunderstood what I meant.
>
> The test cases I wrote about are the test cases for other classes
> that use a random generator for different purposes. In many cases,
> it is simply to get a lot of different test points (a few
> thousand) in order to explore a domain; in some cases we even used
> low-discrepancy Sobol sequences instead of random sequences. So
> when the unit tests for algorithm A (say a geometry algorithm) use
> a sequence of points and we change the implementation of the
> random generator, algorithm A simply gets a different set of test
> points. It's not a worse or better set, it is a different set. So
> we may need to adjust the test tolerances or the like. It is not a
> big deal, but it has to be done when we touch the random
> generators.

OK.
But I thought that we were talking about the RNG test suite...

>>
>>> On the other hand, too much compatibility really is a pain, we
>>> feel it constantly, so I would say that as soon as we need to
>>> change the sequence, we can just do it, and check all the tests
>>> we know of (i.e. those of our own library) still pass and we
>>> go forward.
>>
>> Are those tests with a sample size of 1000 strong enough to detect
>> that an implementation does not uphold the properties of a RNG?
>
> No, but they are strong enough to detect large errors or biases.
> If we follow an expert-published algorithm and pass these coarse-grained
> tests, we can consider we did not make too many errors. Unit tests
> do not validate a theory, they "try" to rule out programming errors.
> They are also very important for non-regression. They point you
> at differences when suddenly they fail after a change, and then it
> is a human brain that has to concentrate on this and decide "oh, it
> is simply a side effect of this change, we are still in the valid
> range of this algorithm, I will update the test and commit", or
> "I made a big mistake here changing this; fortunately the test failed
> and pointed me at the problem, I'm glad I have kept this small
> rough test around".

Well, I was not trying to convince you that unit tests are not good
in general.
Just that the RNG test suite is obscure and full of redundancies.

>>
>> I mean, isn't it actually enough that
>> 1. a CM unit test verifies that N numbers (i.e. the "int" values
>>    returned by "next(32)") are the same as a set of "reference"
>>    values,
>> 2. a CM unit test verifies that the conversion from "int" to each of
>>    the other types is correct (hence the need to have a reference
>>    for those routines as well)
>> ?
>
> It would be nice if we had these references, of course.
>
>>
>> If tests (1) and (2) pass, then the CM uniformity tests seem to
>> try to show that the reference algorithm itself is wrong.  Which
>> it'll have a hard time achieving with a small sample.  Even more so
>> with a single, fixed, seed.
>
> No, they are not intended to test the algorithm, just to detect
> implementation mistakes. They should guard us against something
> like <https://www.xkcd.com/221/> (well, I hope).

I know that they protect against this, but in an overly redundant way
(see the last paragraph of my previous message, below): there are 63 test
methods (that are executed for each RNG).
And, last time I looked, the end result was that one was deemed
useless. Others do similar checks...

[But it's certainly better that way than the other way around.
And when no reference values are available from an "official" source,
it is certainly necessary to test the uniformity, even if just for
a small sample size.]

Best,
Gilles

> best regards,
> Luc
>
>>
>> In the absence of (2), the uniformity tests are actually testing
>> the conversion routines, with each of the different sources of
>> randomness.  Testing with a single source should be enough since if
>> there is a bug in the conversion routines, the same tests will fail
>> for any source of randomness.
>>
>>
>> Best regards,
>> Gilles
>>
>> [1] http://markmail.org/message/ewat3qekz7ec5q6y
>>
>>
>>> best regards,
>>> Luc
>>>
>>>>
>>>>
>>>> Regards,
>>>> Gilles

