Fwd: Re: [rng] Split and Jump functions

classic Classic list List threaded Threaded
53 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Fwd: Re: [rng] Split and Jump functions

Alex Herbert
Resending to list.


-------- Forwarded Message --------
Subject: Re: [rng] Split and Jump functions
Date: Wed, 10 Apr 2019 18:00:54 +0100
From: Alex Herbert <[hidden email]>
To: Gilles Sadowski <[hidden email]>



On 10/04/2019 18:58, Gilles Sadowski wrote:

> Hi.
>
>> [... long quote skipped where I think we largely agree on the
>> conclusions ...]
>> So do we have a working idea to:
>>
>> - Add interface 'JumpableUniformRandomProvider'
> Do we need to add "createJumpable" factory methods in "RandomSource"
> methods or is there a way to avoid the duplication?
>
> As mentioned in an earlier post, it would be cleaner/nicer (?) to add
> methods
> UniformRandomProvider jump();
> boolean isJumpable();
> to "UniformRandomProvider".
> This would require dropping support of Java 6 and 7 and perhaps a good
> reason to do so (?) ...

And move to V2.0 with Java 8 giving the opportunity to clean up other
deprecated stuff.

Would the the default implementation be to throw an
UnsupportedOperationException?

I'm +0 on this.

I'm not against it but do think the UniformRandomProvider interface
could become quite cluttered with these extra methods that would be in
the minority (jump, longJump, isJumpable, isLongJumpable are not
generally available). It would also allow methods/classes that currently
use simple methods from UniformRandomProvider to have access to call
jump on the generator and spawn lots of sub generators. I think this is
bad. These methods should be written to explicitly require a Jumpable
instance.

My approach would have been to leave RandomSource as is and then state
that the returned generator can be tested to see if it is Jumpable using
instanceof. Someone who is writing code to use a Jumpable RNG should be
fine with that since they would have to know a priory what RandomSource
to create to get a Jumpable.

I would add a method to mimic RandomSource.unrestorable as
RandomSource.unjumpable. Or since they both would be doing the same
thing a new method RandomSource.restrict to 'restrict' functionality to
just the data generation methods in UniformRandomProvider. This restrict
method can be called by RandomSource.unrestorable and make that deprecated.

>
>> - Add interface 'LongJumpable... extends JumpableUniformRandomProvider'
> Same question...
>
>> - Test if a SplitMix variant with a self generated Weyl sequence can
>> pass tests of uniformity. This would just require a seed of long[2], one
>> for the state and one to use to derive the Weyl sequence increment.
> Is the new seed length a temporary workaround for the test,
> to be replaced with a new "SPLIT_MIX_64_K" provider, as
> mentioned in your previous message, if the test passes?
>
> Gilles

I was hoping to avoid creating a duplicate class. But actually that
would be fine and easier for testing. The implementation is trivial anyway.

I've just finished an initial implementation of the MSWS RNG that uses a
self-generated Weyl sequence. It works. So the idea for this can be
applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
both. Perhaps moving the code to create the Weyl sequence increment to
NumberFactory.

>
>> Alex
>>
>>
>>> Regards,
>>> Gilles
>>>
>>>> Alex
>>>>
>>>>
>>>> [1] https://en.wikipedia.org/wiki/Weyl_sequence
>>>>
>>>> [2] The Jira ticket RNG-85 had a note about the seeding algorithm for
>>>> the generator being GPL. There are alternatives based on the
>>>> SplittableRandom seeding method that could be used instead to
>>>> create the
>>>> increment for the Weyl sequence. I've speed tested the provided
>>>> algorithm and it is about 85x slower than the one used in
>>>> SplittableRandom. Since that algorithm has an issue with the unsigned
>>>> shift not being modelled by the Binomial distribution an updated
>>>> algorithm could be used that would be novel so avoid the Oracle or GPL
>>>> licences.
>>>>
>>>>> Best,
>>>>> Gilles
>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>> [1] https://github.com/aappleby/smhasher
>>>>>>>>
>>>>>>>> [2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
>>>>>>>>
>>>>>>>> [3] The SplitMix64 is a simple linear series and thus can be
>>>>>>>> jumped in
>>>>>>>> any power of 2 up to the maximum for a long (which causes sequence
>>>>>>>> wrapping). It can actually be jumped any number of iterations using
>>>>>>>> BigInteger arithmetic but jumping in powers of 2 can be implemented
>>>>>>>> using long arithmetic where the rollover bits beyond 64 are
>>>>>>>> naturally
>>>>>>>> discarded:
>>>>>>>>
>>>>>>>> long jumps = 1234567;
>>>>>>>>
>>>>>>>> long increment = 0x9e3779b97f4a7c15L;
>>>>>>>>
>>>>>>>> state = BigInteger.valueOf(state)
>>>>>>>>
>>>>>>>> .add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
>>>>>>>>
>>>>>>>> .longValue(); // narrowing primitive conversion
>>>>>>>>
>>>>>>>> int jumpPower = 32;
>>>>>>>>
>>>>>>>> state = BigInteger.valueOf(state)
>>>>>>>>
>>>>>>>> .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
>>>>>>>>
>>>>>>>> .longValue(); // narrowing primitive conversion
>>>>>>>>
>>>>>>>> // Same as above by discarding overflow bits
>>>>>>>>
>>>>>>>> state = state + (increment << jumpPower);
>>>>>>>>
>>>>>>>> This is based on my understanding of BigInteger and signed/unsigned
>>>>>>>> arithmetic and should be verified in tests.
>>>>>>>>
>>>>>>>> [4] Given the period of the SplitMix is 2^64, and the period
>>>>>>>> may be less
>>>>>>>> than this in practice it may be better to only support jumps of
>>>>>>>> e.g.
>>>>>>>> 2^32 in a manner similar to those provided for the XorShiRo
>>>>>>>> generators
>>>>>>>> where the state can be advanced a factor of the period, or just not
>>>>>>>> supports jumps. I can see the utility in jumping more than
>>>>>>>> Integer.MAX_VALUE (guaranteed unique outputs for the maximum
>>>>>>>> array size)
>>>>>>>> but less than 2^32 is tending towards not very many random
>>>>>>>> numbers from
>>>>>>>> the original instance before sequence overlap with the jumped
>>>>>>>> instance.
>>>>>>>>
>>>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [hidden email]
>>>>> For additional commands, e-mail: [hidden email]
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
Reply | Threaded
Open this post in threaded view
|

Re: Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hi.

>
> On 10/04/2019 18:58, Gilles Sadowski wrote:
> > Hi.
> >
> >> [... long quote skipped where I think we largely agree on the
> >> conclusions ...]
> >> So do we have a working idea to:
> >>
> >> - Add interface 'JumpableUniformRandomProvider'
> > Do we need to add "createJumpable" factory methods in "RandomSource"
> > methods or is there a way to avoid the duplication?
> >
> > As mentioned in an earlier post, it would be cleaner/nicer (?) to add
> > methods
> > UniformRandomProvider jump();
> > boolean isJumpable();
> > to "UniformRandomProvider".
> > This would require dropping support of Java 6 and 7 and perhaps a good
> > reason to do so (?) ...
>
> And move to V2.0 with Java 8 giving the opportunity to clean up other
> deprecated stuff.

No!
Changing major version version would entail package names change
(thus forcing user codes updates)
We don't need this since there isn't any breaking change if we add
default methods.

>
> Would the the default implementation be to throw an
> UnsupportedOperationException?

Yes.

>
> I'm +0 on this.
>
> I'm not against it but do think the UniformRandomProvider interface
> could become quite cluttered with these extra methods that would be in
> the minority (jump, longJump, isJumpable, isLongJumpable are not
> generally available). It would also allow methods/classes that currently
> use simple methods from UniformRandomProvider to have access to call
> jump on the generator and spawn lots of sub generators. I think this is
> bad. These methods should be written to explicitly require a Jumpable
> instance.

From this POV, I certainly agree.

> My approach would have been to leave RandomSource as is and then state
> that the returned generator can be tested to see if it is Jumpable using
> instanceof. Someone who is writing code to use a Jumpable RNG should be
> fine with that since they would have to know a priory what RandomSource
> to create to get a Jumpable.

If it makes more sense, I'm fine letting the user write the "ugly" code. ;-)

> I would add a method to mimic RandomSource.unrestorable as
> RandomSource.unjumpable. Or since they both would be doing the same
> thing a new method RandomSource.restrict to 'restrict' functionality to
> just the data generation methods in UniformRandomProvider. This restrict
> method can be called by RandomSource.unrestorable and make that deprecated.

Looks neat.

> >
> >> - Add interface 'LongJumpable... extends JumpableUniformRandomProvider'
> > Same question...
> >
> >> - Test if a SplitMix variant with a self generated Weyl sequence can
> >> pass tests of uniformity. This would just require a seed of long[2], one
> >> for the state and one to use to derive the Weyl sequence increment.
> > Is the new seed length a temporary workaround for the test,
> > to be replaced with a new "SPLIT_MIX_64_K" provider, as
> > mentioned in your previous message, if the test passes?
> >
> > Gilles
>
> I was hoping to avoid creating a duplicate class. But actually that
> would be fine and easier for testing. The implementation is trivial anyway.

Why duplicate?
IIUC, shouldn't the existing "core" class define an additional constructor
that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
would use the default increment.
[Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]

> I've just finished an initial implementation of the MSWS RNG that uses a
> self-generated Weyl sequence. It works. So the idea for this can be
> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> both. Perhaps moving the code to create the Weyl sequence increment to
> NumberFactory.

+1

Regards,
Gilles

> >
> >> Alex
> >>
> >>
> >>> Regards,
> >>> Gilles
> >>>
> >>>> Alex
> >>>>
> >>>>
> >>>> [1] https://en.wikipedia.org/wiki/Weyl_sequence
> >>>>
> >>>> [2] The Jira ticket RNG-85 had a note about the seeding algorithm for
> >>>> the generator being GPL. There are alternatives based on the
> >>>> SplittableRandom seeding method that could be used instead to
> >>>> create the
> >>>> increment for the Weyl sequence. I've speed tested the provided
> >>>> algorithm and it is about 85x slower than the one used in
> >>>> SplittableRandom. Since that algorithm has an issue with the unsigned
> >>>> shift not being modelled by the Binomial distribution an updated
> >>>> algorithm could be used that would be novel so avoid the Oracle or GPL
> >>>> licences.
> >>>>
> >>>>> Best,
> >>>>> Gilles
> >>>>>
> >>>>>>>> Alex
> >>>>>>>>
> >>>>>>>> [1] https://github.com/aappleby/smhasher
> >>>>>>>>
> >>>>>>>> [2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
> >>>>>>>>
> >>>>>>>> [3] The SplitMix64 is a simple linear series and thus can be
> >>>>>>>> jumped in
> >>>>>>>> any power of 2 up to the maximum for a long (which causes sequence
> >>>>>>>> wrapping). It can actually be jumped any number of iterations using
> >>>>>>>> BigInteger arithmetic but jumping in powers of 2 can be implemented
> >>>>>>>> using long arithmetic where the rollover bits beyond 64 are
> >>>>>>>> naturally
> >>>>>>>> discarded:
> >>>>>>>>
> >>>>>>>> long jumps = 1234567;
> >>>>>>>>
> >>>>>>>> long increment = 0x9e3779b97f4a7c15L;
> >>>>>>>>
> >>>>>>>> state = BigInteger.valueOf(state)
> >>>>>>>>
> >>>>>>>> .add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
> >>>>>>>>
> >>>>>>>> .longValue(); // narrowing primitive conversion
> >>>>>>>>
> >>>>>>>> int jumpPower = 32;
> >>>>>>>>
> >>>>>>>> state = BigInteger.valueOf(state)
> >>>>>>>>
> >>>>>>>> .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
> >>>>>>>>
> >>>>>>>> .longValue(); // narrowing primitive conversion
> >>>>>>>>
> >>>>>>>> // Same as above by discarding overflow bits
> >>>>>>>>
> >>>>>>>> state = state + (increment << jumpPower);
> >>>>>>>>
> >>>>>>>> This is based on my understanding of BigInteger and signed/unsigned
> >>>>>>>> arithmetic and should be verified in tests.
> >>>>>>>>
> >>>>>>>> [4] Given the period of the SplitMix is 2^64, and the period
> >>>>>>>> may be less
> >>>>>>>> than this in practice it may be better to only support jumps of
> >>>>>>>> e.g.
> >>>>>>>> 2^32 in a manner similar to those provided for the XorShiRo
> >>>>>>>> generators
> >>>>>>>> where the state can be advanced a factor of the period, or just not
> >>>>>>>> supports jumps. I can see the utility in jumping more than
> >>>>>>>> Integer.MAX_VALUE (guaranteed unique outputs for the maximum
> >>>>>>>> array size)
> >>>>>>>> but less than 2^32 is tending towards not very many random
> >>>>>>>> numbers from
> >>>>>>>> the original instance before sequence overlap with the jumped
> >>>>>>>> instance.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 10 Apr 2019, at 18:52, Gilles Sadowski <[hidden email]> wrote:
>
> Hi.
>
>>
>> On 10/04/2019 18:58, Gilles Sadowski wrote:
>>> Hi.
>>>
>>>> [... long quote skipped where I think we largely agree on the
>>>> conclusions ...]
>>>> So do we have a working idea to:
>>>>
>>>> - Add interface 'JumpableUniformRandomProvider'
>>> Do we need to add "createJumpable" factory methods in "RandomSource"
>>> methods or is there a way to avoid the duplication?
>>>
>>> As mentioned in an earlier post, it would be cleaner/nicer (?) to add
>>> methods
>>> UniformRandomProvider jump();
>>> boolean isJumpable();
>>> to "UniformRandomProvider".
>>> This would require dropping support of Java 6 and 7 and perhaps a good
>>> reason to do so (?) ...
>>
>> And move to V2.0 with Java 8 giving the opportunity to clean up other
>> deprecated stuff.
>
> No!
> Changing major version version would entail package names change
> (thus forcing user codes updates)
> We don't need this since there isn't any breaking change if we add
> default methods.

My bad. It is obvious when justified like that.

>
>>
>> Would the the default implementation be to throw an
>> UnsupportedOperationException?
>
> Yes.
>>
>> I'm +0 on this.
>>
>> I'm not against it but do think the UniformRandomProvider interface
>> could become quite cluttered with these extra methods that would be in
>> the minority (jump, longJump, isJumpable, isLongJumpable are not
>> generally available). It would also allow methods/classes that currently
>> use simple methods from UniformRandomProvider to have access to call
>> jump on the generator and spawn lots of sub generators. I think this is
>> bad. These methods should be written to explicitly require a Jumpable
>> instance.
>
> From this POV, I certainly agree.
>
>> My approach would have been to leave RandomSource as is and then state
>> that the returned generator can be tested to see if it is Jumpable using
>> instanceof. Someone who is writing code to use a Jumpable RNG should be
>> fine with that since they would have to know a priory what RandomSource
>> to create to get a Jumpable.
>
> If it makes more sense, I'm fine letting the user write the "ugly" code. ;-)

Not adding a dedicated method would mean everyone has to do this:

JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)

But adding a mirror methods:

JumpableUniformRandomProvider RandomSource::createJumpable(…)
LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)

Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.

Is there scope for needing the Jumpable to be detected through the API? E.g. add:

boolean RandomSource::isJumpable(RandomSource)

We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.

>
>> I would add a method to mimic RandomSource.unrestorable as
>> RandomSource.unjumpable. Or since they both would be doing the same
>> thing a new method RandomSource.restrict to 'restrict' functionality to
>> just the data generation methods in UniformRandomProvider. This restrict
>> method can be called by RandomSource.unrestorable and make that deprecated.
>
> Looks neat.
>
>>>
>>>> - Add interface 'LongJumpable... extends JumpableUniformRandomProvider'
>>> Same question...
>>>
>>>> - Test if a SplitMix variant with a self generated Weyl sequence can
>>>> pass tests of uniformity. This would just require a seed of long[2], one
>>>> for the state and one to use to derive the Weyl sequence increment.
>>> Is the new seed length a temporary workaround for the test,
>>> to be replaced with a new "SPLIT_MIX_64_K" provider, as
>>> mentioned in your previous message, if the test passes?
>>>
>>> Gilles
>>
>> I was hoping to avoid creating a duplicate class. But actually that
>> would be fine and easier for testing. The implementation is trivial anyway.
>
> Why duplicate?
> IIUC, shouldn't the existing "core" class define an additional constructor
> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> would use the default increment.
> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]

OK. That was where I should have gone to start with. I’ll do that.

>
>> I've just finished an initial implementation of the MSWS RNG that uses a
>> self-generated Weyl sequence. It works. So the idea for this can be
>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
>> both. Perhaps moving the code to create the Weyl sequence increment to
>> NumberFactory.
>
> +1

OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.

If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.


>
> Regards,
> Gilles
>
>>>
>>>> Alex
>>>>
>>>>
>>>>> Regards,
>>>>> Gilles
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>> [1] https://en.wikipedia.org/wiki/Weyl_sequence
>>>>>>
>>>>>> [2] The Jira ticket RNG-85 had a note about the seeding algorithm for
>>>>>> the generator being GPL. There are alternatives based on the
>>>>>> SplittableRandom seeding method that could be used instead to
>>>>>> create the
>>>>>> increment for the Weyl sequence. I've speed tested the provided
>>>>>> algorithm and it is about 85x slower than the one used in
>>>>>> SplittableRandom. Since that algorithm has an issue with the unsigned
>>>>>> shift not being modelled by the Binomial distribution an updated
>>>>>> algorithm could be used that would be novel so avoid the Oracle or GPL
>>>>>> licences.
>>>>>>
>>>>>>> Best,
>>>>>>> Gilles
>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/aappleby/smhasher
>>>>>>>>>>
>>>>>>>>>> [2] Using Long.bitCount(long ^ (long >>> 1)) to count transitions
>>>>>>>>>>
>>>>>>>>>> [3] The SplitMix64 is a simple linear series and thus can be
>>>>>>>>>> jumped in
>>>>>>>>>> any power of 2 up to the maximum for a long (which causes sequence
>>>>>>>>>> wrapping). It can actually be jumped any number of iterations using
>>>>>>>>>> BigInteger arithmetic but jumping in powers of 2 can be implemented
>>>>>>>>>> using long arithmetic where the rollover bits beyond 64 are
>>>>>>>>>> naturally
>>>>>>>>>> discarded:
>>>>>>>>>>
>>>>>>>>>> long jumps = 1234567;
>>>>>>>>>>
>>>>>>>>>> long increment = 0x9e3779b97f4a7c15L;
>>>>>>>>>>
>>>>>>>>>> state = BigInteger.valueOf(state)
>>>>>>>>>>
>>>>>>>>>> .add(BigInteger.valueOf(increment).multiply(BigInteger.valueOf(jumps)))
>>>>>>>>>>
>>>>>>>>>> .longValue(); // narrowing primitive conversion
>>>>>>>>>>
>>>>>>>>>> int jumpPower = 32;
>>>>>>>>>>
>>>>>>>>>> state = BigInteger.valueOf(state)
>>>>>>>>>>
>>>>>>>>>> .add(BigInteger.valueOf(increment).shiftLeft(jumpPower))
>>>>>>>>>>
>>>>>>>>>> .longValue(); // narrowing primitive conversion
>>>>>>>>>>
>>>>>>>>>> // Same as above by discarding overflow bits
>>>>>>>>>>
>>>>>>>>>> state = state + (increment << jumpPower);
>>>>>>>>>>
>>>>>>>>>> This is based on my understanding of BigInteger and signed/unsigned
>>>>>>>>>> arithmetic and should be verified in tests.
>>>>>>>>>>
>>>>>>>>>> [4] Given the period of the SplitMix is 2^64, and the period
>>>>>>>>>> may be less
>>>>>>>>>> than this in practice it may be better to only support jumps of
>>>>>>>>>> e.g.
>>>>>>>>>> 2^32 in a manner similar to those provided for the XorShiRo
>>>>>>>>>> generators
>>>>>>>>>> where the state can be advanced a factor of the period, or just not
>>>>>>>>>> supports jumps. I can see the utility in jumping more than
>>>>>>>>>> Integer.MAX_VALUE (guaranteed unique outputs for the maximum
>>>>>>>>>> array size)
>>>>>>>>>> but less than 2^32 is tending towards not very many random
>>>>>>>>>> numbers from
>>>>>>>>>> the original instance before sequence overlap with the jumped
>>>>>>>>>> instance.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
> > [...]
>
> Not adding a dedicated method would mean everyone has to do this:
>
> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
>
> But adding a mirror methods:
>
> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
>
> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.

+1 (for leaving out).

> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
>
> boolean RandomSource::isJumpable(RandomSource)
>
> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.

+1 (for leaving out for now).

>
> > [...]
> >>
> >> I was hoping to avoid creating a duplicate class. But actually that
> >> would be fine and easier for testing. The implementation is trivial anyway.
> >
> > Why duplicate?
> > IIUC, shouldn't the existing "core" class define an additional constructor
> > that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> > would use the default increment.
> > [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
>
> OK. That was where I should have gone to start with. I’ll do that.
>
> >
> >> I've just finished an initial implementation of the MSWS RNG that uses a
> >> self-generated Weyl sequence. It works. So the idea for this can be
> >> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >> both. Perhaps moving the code to create the Weyl sequence increment to
> >> NumberFactory.
> >
> > +1
>
> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
>
> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.

I'm a bit lost here.  I thought that "SplitMix64" did not provide
"jump", hence the
"Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)

Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
cannot ensure that the added flexibility does not come with unsuspected
drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
choosing the (hard-coded) values that do produce good sequences.]

Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert

On 11/04/2019 13:22, Gilles Sadowski wrote:

>>> [...]
>> Not adding a dedicated method would mean everyone has to do this:
>>
>> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
>>
>> But adding a mirror methods:
>>
>> JumpableUniformRandomProvider RandomSource::createJumpable(…)
>> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
>>
>> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.
> +1 (for leaving out).
>
>> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
>>
>> boolean RandomSource::isJumpable(RandomSource)
>>
>> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.
> +1 (for leaving out for now).
>
>>> [...]
>>>> I was hoping to avoid creating a duplicate class. But actually that
>>>> would be fine and easier for testing. The implementation is trivial anyway.
>>> Why duplicate?
>>> IIUC, shouldn't the existing "core" class define an additional constructor
>>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
>>> would use the default increment.
>>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
>> OK. That was where I should have gone to start with. I’ll do that.
>>
>>>> I've just finished an initial implementation of the MSWS RNG that uses a
>>>> self-generated Weyl sequence. It works. So the idea for this can be
>>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
>>>> both. Perhaps moving the code to create the Weyl sequence increment to
>>>> NumberFactory.
>>> +1
>> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
>>
>> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.
> I'm a bit lost here.  I thought that "SplitMix64" did not provide
> "jump",

The state it is a linear sum. So can be jumped very easily.

y = mx + c

c is the current Weyl state, m the number of transitions and x the Weyl
increment. So we can make SplitMix Jumpable. The maximum jump length is
2^64 which will wrap the state.

> hence the
> "Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
I used the term "weak assurance" as opposed to "concrete assurance". The
wording used by the JDK is "with very high probability, the set of
values collectively generated by the two objects has the same
statistical properties as if the same quantity of values were generated
by a single object". It is a bit of semantics.

>
> Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
> cannot ensure that the added flexibility does not come with unsuspected
> drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
> choosing the (hard-coded) values that do produce good sequences.]

The likelihood of overlap is 1/[the number of different Weyl sequences].
This is around 2^-63. So it is small. However the base implementation
for SplitMix uses a specific Weyl sequence increment based on the golden
ratio. I do not know the original of this. For example is it better that
any other irrational number converted to an integer, e.g. pi, Euler's
number (e), sqrt(2)? The irrational number comes from the origin of the
Weyl sequence which is based on irrationals. But there may be no reason
that the golden ratio was chosen rather than any other source irrational.


On the subject of splitting verses jumping:

Jumping seems to be a case where the limit of the multi-threaded
scenario is known beforehand, e.g. a simulation with 1024 parallel
tasks, each task with a known limit on the count of random numbers
required. So if your jump length is above the number of random numbers
required for each task you can jump to get a single generator for each
task and ensure no overlap.

Split is used when the multi-threaded scenario is not known. This is a
case for SplittableRandom which is used in parallel algorithms which use
forking. The point at which a fork will occur and the task size of the
fork are not known beforehand. The generator is able to create new
generators for each fork that will very unlikely overlap with any other
generator currently in use. For SplittableRandom the probability of the
same generator is about 2^-63. The probability of overlap is reduced
further as the start point of the sequence is variable too and the
length of the sequence that is consumed is not known. So if the number
of forks is much less than 2^63 (which it should be) then split will
suffice in this situation.

I have an algorithm that uses a set number of tasks to do random
splitting of a dataset. The dataset is shuffled before the task so even
if the same generator was used the output would work. Currently I create
a new generator to allow parallel computation but I could split a master
one or use a jumpable. Splitting would be faster than jumping for this
scenario and perfectly acceptable.

This leans me back towards adding a Splittable interface. The conditions
for a split would be that the split creates a new instance based on the
current state of the generator and updates the state. This allows split
to be called repeatedly on the same generator to get new distinct
instances that will not coincide with very high probability. It would be
akin to non-synchronized self-construction. This would then be faster
than using RandomSource.create because that uses synchronized seed
generation. This opens up more generators for use in recursive forking
algorithms that may not care if the two forks have the same generator
but require it to be created fast and thread local. So perhaps do not
support it for JDK (because it is a bad generator) and rule out those
with long initialisation routines TwoCmres, MT, etc. and perhaps also
ruling out those with large states (e.g. Well44497b).


On the subject of jumping we have three options:

1. jump advances the state and returns a copy with the advanced state.

2. jump advances the state and returns a copy with the old state.

3. jump advances the state. Add a copy method.

I think we ruled out number 3 as having a copy command is deemed to be
prone to misuse if using the same copy in multiple places.

Option 1 I think has an issue in that if you do 1 jump and believe you
have two non-overlapping generators you will be wrong. Take this case:

JumpableUniformRandomProvider rng1 = ...;

UniformRandomProvider rng2 = rng1.jump();

In option 1 rng1 and rng2 are actually the same.

In option 2 rng1 and rng2 are in different states where rng1 is far
ahead of rng2. So the returned state (rng2) can be documented to not
overlap the existing generator (rng1) until the jump length has consumed
by calls to generate numbers.


Note that neither option provides a loophole for a poor man's copy.
Option 1 would create a copy but it would be of a state far ahead
because it is after a jump. Option 2 creates a copy of the current state
but then the current state is changed by the jump so you don't have a copy.

To get a copy of the current state would require saving the state using
the RestoreableUniformRandomProvider, doing either of the jump options
to get an instance of the same class, then restoring the state to both.


To rule out doing this via the RestoreableUniformRandomProvider
functionality would require a method in RandomSource to restrict the
implementation to just the defined interface like the unrestorable method:

UniformRandomProvider unrestorable(UniformRandomProvider delegate)

Becomes:

UniformRandomProvider restrict(UniformRandomProvider delegate)
RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider
delegate)
JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider
delegate)
LongJumpableUniformRandomProvider
restrict(LongJumpableUniformRandomProvider delegate)

The correct method would have to be selected by casting:

LongJumpableUniformRandomProvider delegate = ...
JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider)
delegate)

Or do not use overloaded methods and (with A, B, C, D appropriately named):

UniformRandomProvider restrictA(UniformRandomProvider delegate)
RestorableUniformRandomProvider
restrictB(RestorableUniformRandomProvider delegate)
JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider
delegate)
LongJumpableUniformRandomProvider
restrictD(LongJumpableUniformRandomProvider delegate)

A move to Java 8 would allow the restrict method for the interface to be
added as a static method to the interface itself which would be neater.


> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hello.

> On 11/04/2019 13:22, Gilles Sadowski wrote:
> >>> [...]
> >> Not adding a dedicated method would mean everyone has to do this:
> >>
> >> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
> >>
> >> But adding a mirror methods:
> >>
> >> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> >> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
> >>
> >> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.
> > +1 (for leaving out).
> >
> >> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
> >>
> >> boolean RandomSource::isJumpable(RandomSource)
> >>
> >> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.
> > +1 (for leaving out for now).
> >
> >>> [...]
> >>>> I was hoping to avoid creating a duplicate class. But actually that
> >>>> would be fine and easier for testing. The implementation is trivial anyway.
> >>> Why duplicate?
> >>> IIUC, shouldn't the existing "core" class define an additional constructor
> >>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> >>> would use the default increment.
> >>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
> >> OK. That was where I should have gone to start with. I’ll do that.
> >>
> >>>> I've just finished an initial implementation of the MSWS RNG that uses a
> >>>> self-generated Weyl sequence. It works. So the idea for this can be
> >>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >>>> both. Perhaps moving the code to create the Weyl sequence increment to
> >>>> NumberFactory.
> >>> +1
> >> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
> >>
> >> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.
> > I'm a bit lost here.  I thought that "SplitMix64" did not provide
> > "jump",
>
> The state it is a linear sum. So can be jumped very easily.
>
> y = mx + c
>
> c is the current Weyl state, m the number of transitions and x the Weyl
> increment. So we can make SplitMix Jumpable. The maximum jump length is
> 2^64 which will wrap the state.
>
> > hence the
> > "Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
> I used the term "weak assurance" as opposed to "concrete assurance". The
> wording used by the JDK is "with very high probability, the set of
> values collectively generated by the two objects has the same
> statistical properties as if the same quantity of values were generated
> by a single object". It is a bit of semantics.
>
> >
> > Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
> > cannot ensure that the added flexibility does not come with unsuspected
> > drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
> > choosing the (hard-coded) values that do produce good sequences.]
>
> The likelihood of overlap is 1/[the number of different Weyl sequences].
> This is around 2^-63. So it is small. However the base implementation
> for SplitMix uses a specific Weyl sequence increment based on the golden
> ratio. I do not know the original of this. For example is it better that
> any other irrational number converted to an integer, e.g. pi, Euler's
> number (e), sqrt(2)? The irrational number comes from the origin of the
> Weyl sequence which is based on irrationals. But there may be no reason
> that the golden ratio was chosen rather than any other source irrational.

Does the non-overlap weak assurance take into account the fact that
there are no irrational numbers in a truncated representation (like the
one stored in a computer)?
Isn't it why those "internal" numbers are carefully selected (like in
"TwoCmres"), because they generate better sequences?

>
> On the subject of splitting verses jumping:
>
> Jumping seems to be a case where the limit of the multi-threaded
> scenario is known beforehand, e.g. a simulation with 1024 parallel
> tasks, each task with a known limit on the count of random numbers
> required. So if your jump length is above the number of random numbers
> required for each task you can jump to get a single generator for each
> task and ensure no overlap.
>
> Split is used when the multi-threaded scenario is not known. This is a
> case for SplittableRandom which is used in parallel algorithms which use
> forking. The point at which a fork will occur and the task size of the
> fork are not known beforehand. The generator is able to create new
> generators for each fork that will very unlikely overlap with any other
> generator currently in use. For SplittableRandom the probability of the
> same generator is about 2^-63. The probability of overlap is reduced
> further as the start point of the sequence is variable too and the
> length of the sequence that is consumed is not known. So if the number
> of forks is much less than 2^63 (which it should be) then split will
> suffice in this situation.

I'm not convinced that we should add complexity to the API without
a use-case that would require
* fork-join,
* a RNG  that cannot be "SplittableRandom".

> I have an algorithm that uses a set number of tasks to do random
> splitting of a dataset. The dataset is shuffled before the task so even
> if the same generator was used the output would work. Currently I create
> a new generator to allow parallel computation but I could split a master
> one or use a jumpable. Splitting would be faster than jumping for this
> scenario and perfectly acceptable.

Do you mean that most of the total computing time is taken
by instantiating the generator?

> This leans me back towards adding a Splittable interface. The conditions
> for a split would be that the split creates a new instance based on the
> current state of the generator and updates the state. This allows split
> to be called repeatedly on the same generator to get new distinct
> instances that will not coincide with very high probability. It would be
> akin to non-synchronized self-construction. This would then be faster
> than using RandomSource.create because that uses synchronized seed
> generation.

If the seed auto-generation is the issue, why not bypass it (by providing
one's own seed)?
Using a generator's state as a seed for another instance seems but a
particular choice from the seed space.

> This opens up more generators for use in recursive forking
> algorithms that may not care if the two forks have the same generator
> but require it to be created fast and thread local. So perhaps do not
> support it for JDK (because it is a bad generator) and rule out those
> with long initialisation routines TwoCmres, MT, etc. and perhaps also
> ruling out those with large states (e.g. Well44497b).

Then we have to explain why "split" is not provided despite it could
be and could be requested by users with a different sense of what
is fast enough for some purpose.

>
> On the subject of jumping we have three options:
>
> 1. jump advances the state and returns a copy with the advanced state.
>
> 2. jump advances the state and returns a copy with the old state.
>
> 3. jump advances the state. Add a copy method.
>
> I think we ruled out number 3 as having a copy command is deemed to be
> prone to misuse if using the same copy in multiple places.
>
> Option 1 I think has an issue in that if you do 1 jump and believe you
> have two non-overlapping generators you will be wrong. Take this case:
>
> JumpableUniformRandomProvider rng1 = ...;
>
> UniformRandomProvider rng2 = rng1.jump();
>
> In option 1 rng1 and rng2 are actually the same.
>
> In option 2 rng1 and rng2 are in different states where rng1 is far
> ahead of rng2. So the returned state (rng2) can be documented to not
> overlap the existing generator (rng1) until the jump length has consumed
> by calls to generate numbers.
>
>
> Note that neither option provides a loophole for a poor man's copy.
> Option 1 would create a copy but it would be of a state far ahead
> because it is after a jump. Option 2 creates a copy of the current state
> but then the current state is changed by the jump so you don't have a copy.

I'm not sure my preferred option was mentioned.  After this statement:
    rng2 = rng1.jump();
the state of "rng1" is the same as before the call; the newly-created
"rng2" instance has its state far away.

I guess this is different from how it's usually provided in C implementations.
But it is more in line with the intended usage: create an instance that will
not overlap with its "parent".  The semantics is "createAndJump()".

> To get a copy of the current state would require saving the state using
> the RestoreableUniformRandomProvider, doing either of the jump options
> to get an instance of the same class, then restoring the state to both.

With my proposed option, nothing happens to the "parent"; hence the
above is never necessary IIUC.

> To rule out doing this via the RestoreableUniformRandomProvider
> functionality would require a method in RandomSource to restrict the
> implementation to just the defined interface like the unrestorable method:
>
> UniformRandomProvider unrestorable(UniformRandomProvider delegate)
>
> Becomes:
>
> UniformRandomProvider restrict(UniformRandomProvider delegate)
> RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider
> delegate)
> JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider
> delegate)
> LongJumpableUniformRandomProvider
> restrict(LongJumpableUniformRandomProvider delegate)
>
> The correct method would have to be selected by casting:
>
> LongJumpableUniformRandomProvider delegate = ...
> JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider)
> delegate)
>
> Or do not use overloaded methods and (with A, B, C, D appropriately named):
>
> UniformRandomProvider restrictA(UniformRandomProvider delegate)
> RestorableUniformRandomProvider
> restrictB(RestorableUniformRandomProvider delegate)
> JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider
> delegate)
> LongJumpableUniformRandomProvider
> restrictD(LongJumpableUniformRandomProvider delegate)
>
> A move to Java 8 would allow the restrict method for the interface to be
> added as a static method to the interface itself which would be neater.
>

I think that we could discuss "restrict" options separately.

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 14 Apr 2019, at 01:31, Gilles Sadowski <[hidden email]> wrote:
>
> Hello.
>
>> On 11/04/2019 13:22, Gilles Sadowski wrote:
>>>>> [...]
>>>> Not adding a dedicated method would mean everyone has to do this:
>>>>
>>>> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
>>>>
>>>> But adding a mirror methods:
>>>>
>>>> JumpableUniformRandomProvider RandomSource::createJumpable(…)
>>>> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
>>>>
>>>> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.
>>> +1 (for leaving out).
>>>
>>>> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
>>>>
>>>> boolean RandomSource::isJumpable(RandomSource)
>>>>
>>>> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.
>>> +1 (for leaving out for now).
>>>
>>>>> [...]
>>>>>> I was hoping to avoid creating a duplicate class. But actually that
>>>>>> would be fine and easier for testing. The implementation is trivial anyway.
>>>>> Why duplicate?
>>>>> IIUC, shouldn't the existing "core" class define an additional constructor
>>>>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
>>>>> would use the default increment.
>>>>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
>>>> OK. That was where I should have gone to start with. I’ll do that.
>>>>
>>>>>> I've just finished an initial implementation of the MSWS RNG that uses a
>>>>>> self-generated Weyl sequence. It works. So the idea for this can be
>>>>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
>>>>>> both. Perhaps moving the code to create the Weyl sequence increment to
>>>>>> NumberFactory.
>>>>> +1
>>>> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
>>>>
>>>> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.
>>> I'm a bit lost here.  I thought that "SplitMix64" did not provide
>>> "jump",
>>
>> The state it is a linear sum. So can be jumped very easily.
>>
>> y = mx + c
>>
>> c is the current Weyl state, m the number of transitions and x the Weyl
>> increment. So we can make SplitMix Jumpable. The maximum jump length is
>> 2^64 which will wrap the state.
>>
>>> hence the
>>> "Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
>> I used the term "weak assurance" as opposed to "concrete assurance". The
>> wording used by the JDK is "with very high probability, the set of
>> values collectively generated by the two objects has the same
>> statistical properties as if the same quantity of values were generated
>> by a single object". It is a bit of semantics.
>>
>>>
>>> Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
>>> cannot ensure that the added flexibility does not come with unsuspected
>>> drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
>>> choosing the (hard-coded) values that do produce good sequences.]
>>
>> The likelihood of overlap is 1/[the number of different Weyl sequences].
>> This is around 2^-63. So it is small. However the base implementation
>> for SplitMix uses a specific Weyl sequence increment based on the golden
>> ratio. I do not know the original of this. For example is it better that
>> any other irrational number converted to an integer, e.g. pi, Euler's
>> number (e), sqrt(2)? The irrational number comes from the origin of the
>> Weyl sequence which is based on irrationals. But there may be no reason
>> that the golden ratio was chosen rather than any other source irrational.
>
> Does the non-overlap weak assurance take into account the fact that
> there are no irrational numbers in a truncated representation (like the
> one stored in a computer)?

I would say that the 'weak assurance'/'very unlikely' statements allude to the fact that no-one has actually tested all the 2^63 different Weyl sequences after they pass through the hash mixing function to see if they output the same sequence as any other, or are weaker as random generators. It would be quite a task. So you cannot say they will not overlap, just that it probably will not. And you cannot say much about the quality of the generator beyond the results of those you have tested. (But this is the same for any large state space generator.)

> Isn't it why those "internal" numbers are carefully selected (like in
> "TwoCmres"), because they generate better sequences?

That uses a multiply, rotate and subtraction on a number. So the multiply and rotate have been tested to maximise the period of the sequence, and/or the quality of the output. With a Weyl sequence it is proven to have a period of 2^64 (for a long) if the increment is odd. However that sequence then has to be mixed to a decent uniform output. I’ve not read the details on the mixing functions and how they are derived or tested to so would not want to comment too much. My guess would be that the mixing of bits in the long is more random if the bits change a lot each step. So using the Golden ratio, that has a high number of state changes across the 64-bit, to create the sequence works well. Using 1, which would not change many bits at each step in the sequence, would be bad.

>
>>
>> On the subject of splitting verses jumping:
>>
>> Jumping seems to be a case where the limit of the multi-threaded
>> scenario is known beforehand, e.g. a simulation with 1024 parallel
>> tasks, each task with a known limit on the count of random numbers
>> required. So if your jump length is above the number of random numbers
>> required for each task you can jump to get a single generator for each
>> task and ensure no overlap.
>>
>> Split is used when the multi-threaded scenario is not known. This is a
>> case for SplittableRandom which is used in parallel algorithms which use
>> forking. The point at which a fork will occur and the task size of the
>> fork are not known beforehand. The generator is able to create new
>> generators for each fork that will very unlikely overlap with any other
>> generator currently in use. For SplittableRandom the probability of the
>> same generator is about 2^-63. The probability of overlap is reduced
>> further as the start point of the sequence is variable too and the
>> length of the sequence that is consumed is not known. So if the number
>> of forks is much less than 2^63 (which it should be) then split will
>> suffice in this situation.
>
> I'm not convinced that we should add complexity to the API without
> a use-case that would require
> * fork-join,
> * a RNG  that cannot be "SplittableRandom”.

I think that Splittable as we are discussing it is essentially a mix of functionality:

UniformRandomProvider rng
Supplier<UniformRandomProvider> rngSupplier

The Splittable interface just nicely allows you to create an algorithm that declares that it wants a random generator but may actually want more than one because it may create multiple threads. The number of threads is unknown so it needs to be able to create generators as it goes.

For now I am fine with not putting it into the library. I do not see it as essential. It is a nice to have. But as you said it does raise the question of when (and justifying why) we support if for each of the generators.

Given that SplittableRandom is designed to support the Java framework, is not extensible (is a final class) and so seems to be a closed solution to fork-join in Java leaving it out of the API until a concrete use case for it occurs seems sensible.

>
>> I have an algorithm that uses a set number of tasks to do random
>> splitting of a dataset. The dataset is shuffled before the task so even
>> if the same generator was used the output would work. Currently I create
>> a new generator to allow parallel computation but I could split a master
>> one or use a jumpable. Splitting would be faster than jumping for this
>> scenario and perfectly acceptable.
>
> Do you mean that most of the total computing time is taken
> by instantiating the generator?

No. So speed of splitting verses instantiating the generator is mute. My point was that I just need a new thread safe instance. A copy would be fine. A split would be fine. A new instance would be fine. That algorithm was pre Java 8. I could just update it to use SplittableRandom. It is not using any of the sampling functionality that hands a UniformRandomProvider to another class. The RNG is used for shuffling and subset sampling so just needs nextInt(int).

I’ve got a bit of homework to find a case where SplittableRandom comes into its own beyond just offering Supplier<UniformRandomProvider> as part of it’s API.

>
>> This leans me back towards adding a Splittable interface. The conditions
>> for a split would be that the split creates a new instance based on the
>> current state of the generator and updates the state. This allows split
>> to be called repeatedly on the same generator to get new distinct
>> instances that will not coincide with very high probability. It would be
>> akin to non-synchronized self-construction. This would then be faster
>> than using RandomSource.create because that uses synchronized seed
>> generation.
>
> If the seed auto-generation is the issue, why not bypass it (by providing
> one's own seed)?
> Using a generator's state as a seed for another instance seems but a
> particular choice from the seed space.

I think the reason to use ones own mixed up state as the seed is to ensure that the state of the new generator is different. The implementations I have seen do not actually check this is true so it just a probability gamble anyway. No different from creating a new seed and instance. So it is just for convenience.

>
>> This opens up more generators for use in recursive forking
>> algorithms that may not care if the two forks have the same generator
>> but require it to be created fast and thread local. So perhaps do not
>> support it for JDK (because it is a bad generator) and rule out those
>> with long initialisation routines TwoCmres, MT, etc. and perhaps also
>> ruling out those with large states (e.g. Well44497b).
>
> Then we have to explain why "split" is not provided despite it could
> be and could be requested by users with a different sense of what
> is fast enough for some purpose.
>
>>
>> On the subject of jumping we have three options:
>>
>> 1. jump advances the state and returns a copy with the advanced state.
>>
>> 2. jump advances the state and returns a copy with the old state.
>>
>> 3. jump advances the state. Add a copy method.
>>
>> I think we ruled out number 3 as having a copy command is deemed to be
>> prone to misuse if using the same copy in multiple places.
>>
>> Option 1 I think has an issue in that if you do 1 jump and believe you
>> have two non-overlapping generators you will be wrong. Take this case:
>>
>> JumpableUniformRandomProvider rng1 = ...;
>>
>> UniformRandomProvider rng2 = rng1.jump();
>>
>> In option 1 rng1 and rng2 are actually the same.
>>
>> In option 2 rng1 and rng2 are in different states where rng1 is far
>> ahead of rng2. So the returned state (rng2) can be documented to not
>> overlap the existing generator (rng1) until the jump length has consumed
>> by calls to generate numbers.
>>
>>
>> Note that neither option provides a loophole for a poor man's copy.
>> Option 1 would create a copy but it would be of a state far ahead
>> because it is after a jump. Option 2 creates a copy of the current state
>> but then the current state is changed by the jump so you don't have a copy.
>
> I'm not sure my preferred option was mentioned.  After this statement:
>    rng2 = rng1.jump();
> the state of "rng1" is the same as before the call; the newly-created
> "rng2" instance has its state far away.

This was mentioned. I’d ruled it out as jumping the same generator would not move it anywhere and always return the same child. The user would have to:

JumpableURP rng1 = …
JumpableURP rng2 = rng1.jump();
JumpableURP rng3 = rng2.jump();

Rng1 is where it started.

So the options are (in all cases returning the copy):

1. createAndJumpCopy
2. copyAndJumpParent
3. jumpParentAndCopy
4. jump and copy separately

1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately. It is not actually possible to jump forward a single instance. Only children are advanced.
2. My preferred option*. A copy of the state is made. The current state is advanced and the old state that was left behind is returned. This can be called repeatedly to create new generators that are periodically spaced behind the current generator. Or you can ignore the return value and just use it to jump forward.
3. The current state is advanced and a copy returned. After a single call to this method you have two generators that are the same.
4. My other preferred option*. The most flexible API. Jump just advances the state. It is left to the user to decide if they copy before or after the jump. Since the jump will be a void method the user will have to think about what copy they want to use. But it does expose copy functionality.

*A bit of cheating to have two preferences.

>
> I guess this is different from how it's usually provided in C implementations.
> But it is more in line with the intended usage: create an instance that will
> not overlap with its "parent".  The semantics is "createAndJump()”.

In the c implementation you would have to decide when to copy the state, before or after the jump. The jump just advances the state.

So perhaps adding a copy() to the JumpableUniformRandomProvider interface is the least prone to having design challenges later from a user who wants createAndJump rather than jumpAndCopy type functionality.

>
>> To get a copy of the current state would require saving the state using
>> the RestoreableUniformRandomProvider, doing either of the jump options
>> to get an instance of the same class, then restoring the state to both.
>
> With my proposed option, nothing happens to the "parent"; hence the
> above is never necessary IIUC.

I feel that createAndJump makes writing in loops or forking less elegant since the controller of the jumps is the one who should have the generator that will be jumped repeatedly:

JumpableURP rng1 = …
for ( ;; ) {
    JumpableURP advancedCopy = rng1.createAndJumpCopy();
    forkSomething(rng1);
    rng1 = advancedCopy; // To move control forward
}

JumpableURP rng1 = …
for ( ;; ) {
    forkSomething(rng1.copyAndJumpParent());
}

JumpableURP rng1 = …
for ( ;; ) {
    forkSomething(rng1.copy());
    rng1.jump();
}

// This one liner is not the same effect as the first fork will have an advanced (jumped) state.
JumpableURP rng1 = …
for ( ;; ) {
    forkSomething(rng1.jump().copy());
}

Thoughts?

>
>> To rule out doing this via the RestoreableUniformRandomProvider
>> functionality would require a method in RandomSource to restrict the
>> implementation to just the defined interface like the unrestorable method:
>>
>> UniformRandomProvider unrestorable(UniformRandomProvider delegate)
>>
>> Becomes:
>>
>> UniformRandomProvider restrict(UniformRandomProvider delegate)
>> RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider
>> delegate)
>> JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider
>> delegate)
>> LongJumpableUniformRandomProvider
>> restrict(LongJumpableUniformRandomProvider delegate)
>>
>> The correct method would have to be selected by casting:
>>
>> LongJumpableUniformRandomProvider delegate = ...
>> JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider)
>> delegate)
>>
>> Or do not use overloaded methods and (with A, B, C, D appropriately named):
>>
>> UniformRandomProvider restrictA(UniformRandomProvider delegate)
>> RestorableUniformRandomProvider
>> restrictB(RestorableUniformRandomProvider delegate)
>> JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider
>> delegate)
>> LongJumpableUniformRandomProvider
>> restrictD(LongJumpableUniformRandomProvider delegate)
>>
>> A move to Java 8 would allow the restrict method for the interface to be
>> added as a static method to the interface itself which would be neater.
>>
>
> I think that we could discuss "restrict" options separately.

OK. Let’s sort out exactly what the Jump interface will be. Adding a restrict option is easy in practice but the choice not so obvious. A look through the Java API for similar things would help.

For example I have found methods:

Executors.unconfigurableXYZ  (Java 1.8)
Collections.synchronizedX
Collections.unmodifiableX

So ‘unconfigurable’ seems to match. It stops any configuration of the executor implementation and just exposes the pure Executor interface. The Collections methods either add some function (synchronized) or take away functionality (unmodifiable).

A further search through the Java API is for another day.


>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email] <mailto:[hidden email]>
> For additional commands, e-mail: [hidden email] <mailto:[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hello.

Le lun. 15 avr. 2019 à 01:03, Alex Herbert <[hidden email]> a écrit :

>
>
>
> > On 14 Apr 2019, at 01:31, Gilles Sadowski <[hidden email]> wrote:
> >
> > Hello.
> >
> >> On 11/04/2019 13:22, Gilles Sadowski wrote:
> >>>>> [...]
> >>>> Not adding a dedicated method would mean everyone has to do this:
> >>>>
> >>>> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
> >>>>
> >>>> But adding a mirror methods:
> >>>>
> >>>> JumpableUniformRandomProvider RandomSource::createJumpable(…)
> >>>> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
> >>>>
> >>>> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.
> >>> +1 (for leaving out).
> >>>
> >>>> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
> >>>>
> >>>> boolean RandomSource::isJumpable(RandomSource)
> >>>>
> >>>> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.
> >>> +1 (for leaving out for now).
> >>>
> >>>>> [...]
> >>>>>> I was hoping to avoid creating a duplicate class. But actually that
> >>>>>> would be fine and easier for testing. The implementation is trivial anyway.
> >>>>> Why duplicate?
> >>>>> IIUC, shouldn't the existing "core" class define an additional constructor
> >>>>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
> >>>>> would use the default increment.
> >>>>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
> >>>> OK. That was where I should have gone to start with. I’ll do that.
> >>>>
> >>>>>> I've just finished an initial implementation of the MSWS RNG that uses a
> >>>>>> self-generated Weyl sequence. It works. So the idea for this can be
> >>>>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
> >>>>>> both. Perhaps moving the code to create the Weyl sequence increment to
> >>>>>> NumberFactory.
> >>>>> +1
> >>>> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
> >>>>
> >>>> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.
> >>> I'm a bit lost here.  I thought that "SplitMix64" did not provide
> >>> "jump",
> >>
> >> The state it is a linear sum. So can be jumped very easily.
> >>
> >> y = mx + c
> >>
> >> c is the current Weyl state, m the number of transitions and x the Weyl
> >> increment. So we can make SplitMix Jumpable. The maximum jump length is
> >> 2^64 which will wrap the state.
> >>
> >>> hence the
> >>> "Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
> >> I used the term "weak assurance" as opposed to "concrete assurance". The
> >> wording used by the JDK is "with very high probability, the set of
> >> values collectively generated by the two objects has the same
> >> statistical properties as if the same quantity of values were generated
> >> by a single object". It is a bit of semantics.
> >>
> >>>
> >>> Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
> >>> cannot ensure that the added flexibility does not come with unsuspected
> >>> drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
> >>> choosing the (hard-coded) values that do produce good sequences.]
> >>
> >> The likelihood of overlap is 1/[the number of different Weyl sequences].
> >> This is around 2^-63. So it is small. However the base implementation
> >> for SplitMix uses a specific Weyl sequence increment based on the golden
> >> ratio. I do not know the original of this. For example is it better that
> >> any other irrational number converted to an integer, e.g. pi, Euler's
> >> number (e), sqrt(2)? The irrational number comes from the origin of the
> >> Weyl sequence which is based on irrationals. But there may be no reason
> >> that the golden ratio was chosen rather than any other source irrational.
> >
> > Does the non-overlap weak assurance take into account the fact that
> > there are no irrational numbers in a truncated representation (like the
> > one stored in a computer)?
>
> I would say that the 'weak assurance'/'very unlikely' statements allude to the fact that no-one has actually tested all the 2^63 different Weyl sequences after they pass through the hash mixing function to see if they output the same sequence as any other, or are weaker as random generators. It would be quite a task. So you cannot say they will not overlap, just that it probably will not. And you cannot say much about the quality of the generator beyond the results of those you have tested. (But this is the same for any large state space generator.)
>
> > Isn't it why those "internal" numbers are carefully selected (like in
> > "TwoCmres"), because they generate better sequences?
>
> That uses a multiply, rotate and subtraction on a number. So the multiply and rotate have been tested to maximise the period of the sequence, and/or the quality of the output. With a Weyl sequence it is proven to have a period of 2^64 (for a long) if the increment is odd. However that sequence then has to be mixed to a decent uniform output. I’ve not read the details on the mixing functions and how they are derived or tested to so would not want to comment too much. My guess would be that the mixing of bits in the long is more random if the bits change a lot each step. So using the Golden ratio, that has a high number of state changes across the 64-bit, to create the sequence works well. Using 1, which would not change many bits at each step in the sequence, would be bad.

I don't feel confident that "Commons RNG" should decide how to generate
the numbers to be used in place of the one hard-coded in the reference
algorithm (Golden ratio).

As discussed, we could provide
1. "customizable" implementations for which the "Weyl" number
can be passed at construction (cf. "SPLIT_MIX_63_K"), and
2. an *example* code of how good "K" numbers could be generated
(e.g. taking any number and boosting the number of transitions).

> >
> >>
> >> On the subject of splitting verses jumping:
> >>
> >> Jumping seems to be a case where the limit of the multi-threaded
> >> scenario is known beforehand, e.g. a simulation with 1024 parallel
> >> tasks, each task with a known limit on the count of random numbers
> >> required. So if your jump length is above the number of random numbers
> >> required for each task you can jump to get a single generator for each
> >> task and ensure no overlap.
> >>
> >> Split is used when the multi-threaded scenario is not known. This is a
> >> case for SplittableRandom which is used in parallel algorithms which use
> >> forking. The point at which a fork will occur and the task size of the
> >> fork are not known beforehand. The generator is able to create new
> >> generators for each fork that will very unlikely overlap with any other
> >> generator currently in use. For SplittableRandom the probability of the
> >> same generator is about 2^-63. The probability of overlap is reduced
> >> further as the start point of the sequence is variable too and the
> >> length of the sequence that is consumed is not known. So if the number
> >> of forks is much less than 2^63 (which it should be) then split will
> >> suffice in this situation.
> >
> > I'm not convinced that we should add complexity to the API without
> > a use-case that would require
> > * fork-join,
> > * a RNG  that cannot be "SplittableRandom”.
>
> I think that Splittable as we are discussing it is essentially a mix of functionality:
>
> UniformRandomProvider rng
> Supplier<UniformRandomProvider> rngSupplier
>
> The Splittable interface just nicely allows you to create an algorithm that declares that it wants a random generator but may actually want more than one because it may create multiple threads. The number of threads is unknown so it needs to be able to create generators as it goes.
>
> For now I am fine with not putting it into the library. I do not see it as essential.

Thanks.

> It is a nice to have.

As per the above, given the "SPLIT_MIX_64_K" version of an algorithm, I'd rather
let the user implement what he needs:

for (int i = 1; i < numThreads; i += 2) {
    final long k = boostTransitions(i); // User-defined.
    final UniformRandomProvider rng =
RandomSource.create(SPLIT_MIX_64_K, k); // Commons RNG.
    startProcessingThread(rng); // User-defined.
}

> But as you said it does raise the question of when (and justifying why) we support if for each of the generators.
>
> Given that SplittableRandom is designed to support the Java framework, is not extensible (is a final class) and so seems to be a closed solution to fork-join in Java leaving it out of the API until a concrete use case for it occurs seems sensible.
>
> >
> >> I have an algorithm that uses a set number of tasks to do random
> >> splitting of a dataset. The dataset is shuffled before the task so even
> >> if the same generator was used the output would work. Currently I create
> >> a new generator to allow parallel computation but I could split a master
> >> one or use a jumpable. Splitting would be faster than jumping for this
> >> scenario and perfectly acceptable.
> >
> > Do you mean that most of the total computing time is taken
> > by instantiating the generator?
>
> No. So speed of splitting verses instantiating the generator is mute. My point was that I just need a new thread safe instance. A copy would be fine. A split would be fine. A new instance would be fine. That algorithm was pre Java 8. I could just update it to use SplittableRandom. It is not using any of the sampling functionality that hands a UniformRandomProvider to another class. The RNG is used for shuffling and subset sampling so just needs nextInt(int).
>
> I’ve got a bit of homework to find a case where SplittableRandom comes into its own beyond just offering Supplier<UniformRandomProvider> as part of it’s API.
>
> >
> >> This leans me back towards adding a Splittable interface. The conditions
> >> for a split would be that the split creates a new instance based on the
> >> current state of the generator and updates the state. This allows split
> >> to be called repeatedly on the same generator to get new distinct
> >> instances that will not coincide with very high probability. It would be
> >> akin to non-synchronized self-construction. This would then be faster
> >> than using RandomSource.create because that uses synchronized seed
> >> generation.
> >
> > If the seed auto-generation is the issue, why not bypass it (by providing
> > one's own seed)?
> > Using a generator's state as a seed for another instance seems but a
> > particular choice from the seed space.
>
> I think the reason to use ones own mixed up state as the seed is to ensure that the state of the new generator is different. The implementations I have seen do not actually check this is true so it just a probability gamble anyway. No different from creating a new seed and instance. So it is just for convenience.

IMO, less risk by leaving it out too.

> >
> >> This opens up more generators for use in recursive forking
> >> algorithms that may not care if the two forks have the same generator
> >> but require it to be created fast and thread local. So perhaps do not
> >> support it for JDK (because it is a bad generator) and rule out those
> >> with long initialisation routines TwoCmres, MT, etc. and perhaps also
> >> ruling out those with large states (e.g. Well44497b).
> >
> > Then we have to explain why "split" is not provided despite it could
> > be and could be requested by users with a different sense of what
> > is fast enough for some purpose.
> >
> >>
> >> On the subject of jumping we have three options:
> >>
> >> 1. jump advances the state and returns a copy with the advanced state.
> >>
> >> 2. jump advances the state and returns a copy with the old state.
> >>
> >> 3. jump advances the state. Add a copy method.
> >>
> >> I think we ruled out number 3 as having a copy command is deemed to be
> >> prone to misuse if using the same copy in multiple places.
> >>
> >> Option 1 I think has an issue in that if you do 1 jump and believe you
> >> have two non-overlapping generators you will be wrong. Take this case:
> >>
> >> JumpableUniformRandomProvider rng1 = ...;
> >>
> >> UniformRandomProvider rng2 = rng1.jump();
> >>
> >> In option 1 rng1 and rng2 are actually the same.
> >>
> >> In option 2 rng1 and rng2 are in different states where rng1 is far
> >> ahead of rng2. So the returned state (rng2) can be documented to not
> >> overlap the existing generator (rng1) until the jump length has consumed
> >> by calls to generate numbers.
> >>
> >>
> >> Note that neither option provides a loophole for a poor man's copy.
> >> Option 1 would create a copy but it would be of a state far ahead
> >> because it is after a jump. Option 2 creates a copy of the current state
> >> but then the current state is changed by the jump so you don't have a copy.
> >
> > I'm not sure my preferred option was mentioned.  After this statement:
> >    rng2 = rng1.jump();
> > the state of "rng1" is the same as before the call; the newly-created
> > "rng2" instance has its state far away.
>
> This was mentioned. I’d ruled it out as jumping the same generator would not move it anywhere and always return the same child. The user would have to:
>
> JumpableURP rng1 = …
> JumpableURP rng2 = rng1.jump();
> JumpableURP rng3 = rng2.jump();
>
> Rng1 is where it started.

Yes; my point was that "jump" could be considered a factory method.

>
> So the options are (in all cases returning the copy):
>
> 1. createAndJumpCopy
> 2. copyAndJumpParent
> 3. jumpParentAndCopy
> 4. jump and copy separately
>
> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.

We could provide a convenience method in  "RandomSource":

public UniformRandomProvider[] jump(int n,
JumpableUniformRandomProvider parent) {
    final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
    UniformRandomProvider tmp = parent;
    for (int i = 0; i < n; i++) {
        rngs[i] = restrict(tmp);
        tmp = tmp.jump();
    }
    return rngs;
}

> It is not actually possible to jump forward a single instance. Only children are advanced.

A feature: There is only one way to alter the state of an instance
(i.e. a call to "next()").

> 2. My preferred option*. A copy of the state is made. The current state is advanced and the old state that was left behind is returned. This can be called repeatedly to create new generators that are periodically spaced behind the current generator. Or you can ignore the return value and just use it to jump forward.

Allows shooting oneself in the foot (e.g. when the return value is not ignored,
and then used to make another jump, whose sequence will overlap with the
original parent's).

> 3. The current state is advanced and a copy returned. After a single call to this method you have two generators that are the same.

Challenge: Find a use-case that is not broken. ;-)

> 4. My other preferred option*. The most flexible API. Jump just advances the state. It is left to the user to decide if they copy before or after the jump. Since the jump will be a void method the user will have to think about what copy they want to use. But it does expose copy functionality.

I still fail to see the need to be able to do that, if the purpose is to
allow instantiating RNGs that do not overlap for a definite number
of calls to "next()".

>
> *A bit of cheating to have two preferences.
>
> >
> > I guess this is different from how it's usually provided in C implementations.
> > But it is more in line with the intended usage: create an instance that will
> > not overlap with its "parent".  The semantics is "createAndJump()”.
>
> In the c implementation you would have to decide when to copy the state, before or after the jump. The jump just advances the state.

Yes, because it would be a waste of a costly system call to do a copy
if only a "jump" is needed.  But in Java the copy is much cheaper: Doing

rng.jump(); // "jump" altered the state.
// Use "rng".

vs.

rng = rng.jump(); // "jump" did not alter the state.
// Use "rng".

Would not make any difference performance-wise in sensible use-cases (i.e.
where the sequence is long enough that "jump" was needed in the first place).

> So perhaps adding a copy() to the JumpableUniformRandomProvider interface is the least prone to having design challenges later from a user who wants createAndJump rather than jumpAndCopy type functionality.

Am I missing something that actually requires "copy"?

> >
> >> To get a copy of the current state would require saving the state using
> >> the RestoreableUniformRandomProvider, doing either of the jump options
> >> to get an instance of the same class, then restoring the state to both.
> >
> > With my proposed option, nothing happens to the "parent"; hence the
> > above is never necessary IIUC.
>
> I feel that createAndJump makes writing in loops or forking less elegant since the controller of the jumps is the one who should have the generator that will be jumped repeatedly:
>
> JumpableURP rng1 = …
> for ( ;; ) {
>     JumpableURP advancedCopy = rng1.createAndJumpCopy();
>     forkSomething(rng1);
>     rng1 = advancedCopy; // To move control forward
> }
>
> JumpableURP rng1 = …
> for ( ;; ) {
>     forkSomething(rng1.copyAndJumpParent());
> }

With the convenience method proposed above, the difference becomes moot:

UPR[] rngs = RandomSource.jump(12, rng1);
for (i = 0 ; i < rngs.length; i++) {
    forkSomething(rngs[i]);
}

>
> JumpableURP rng1 = …
> for ( ;; ) {
>     forkSomething(rng1.copy());
>     rng1.jump();
> }
>
> // This one liner is not the same effect as the first fork will have an advanced (jumped) state.
> JumpableURP rng1 = …
> for ( ;; ) {
>     forkSomething(rng1.jump().copy());
> }
>
> Thoughts?
>
>
> >
> >> To rule out doing this via the RestoreableUniformRandomProvider
> >> functionality would require a method in RandomSource to restrict the
> >> implementation to just the defined interface like the unrestorable method:
> >>
> >> UniformRandomProvider unrestorable(UniformRandomProvider delegate)
> >>
> >> Becomes:
> >>
> >> UniformRandomProvider restrict(UniformRandomProvider delegate)
> >> RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider
> >> delegate)
> >> JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider
> >> delegate)
> >> LongJumpableUniformRandomProvider
> >> restrict(LongJumpableUniformRandomProvider delegate)
> >>
> >> The correct method would have to be selected by casting:
> >>
> >> LongJumpableUniformRandomProvider delegate = ...
> >> JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider)
> >> delegate)
> >>
> >> Or do not use overloaded methods and (with A, B, C, D appropriately named):
> >>
> >> UniformRandomProvider restrictA(UniformRandomProvider delegate)
> >> RestorableUniformRandomProvider
> >> restrictB(RestorableUniformRandomProvider delegate)
> >> JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider
> >> delegate)
> >> LongJumpableUniformRandomProvider
> >> restrictD(LongJumpableUniformRandomProvider delegate)
> >>
> >> A move to Java 8 would allow the restrict method for the interface to be
> >> added as a static method to the interface itself which would be neater.
> >>
> >
> > I think that we could discuss "restrict" options separately.
>
> OK. Let’s sort out exactly what the Jump interface will be. Adding a restrict option is easy in practice but the choice not so obvious. A look through the Java API for similar things would help.
>
> For example I have found methods:
>
> Executors.unconfigurableXYZ  (Java 1.8)
> Collections.synchronizedX
> Collections.unmodifiableX
>
> So ‘unconfigurable’ seems to match.

Not so sure.  At this point, I prefer "restrict" (to the URP interface).

> It stops any configuration of the executor implementation and just exposes the pure Executor interface. The Collections methods either add some function (synchronized) or take away functionality (unmodifiable).
>
> A further search through the Java API is for another day.
>

Best,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert

On 17/04/2019 16:43, Gilles Sadowski wrote:

> Hello.
>
> Le lun. 15 avr. 2019 à 01:03, Alex Herbert <[hidden email]> a écrit :
>>
>>
>>> On 14 Apr 2019, at 01:31, Gilles Sadowski <[hidden email]> wrote:
>>>
>>> Hello.
>>>
>>>> On 11/04/2019 13:22, Gilles Sadowski wrote:
>>>>>>> [...]
>>>>>> Not adding a dedicated method would mean everyone has to do this:
>>>>>>
>>>>>> JumpableUniformRandomProvider rng = (JumpableUniformRandomProvider) RandomSource.create(…)
>>>>>>
>>>>>> But adding a mirror methods:
>>>>>>
>>>>>> JumpableUniformRandomProvider RandomSource::createJumpable(…)
>>>>>> LongJumpableUniformRandomProvider RandomSource::createLongJumpable(…)
>>>>>>
>>>>>> Does seem to add clutter to RandomSource. If we leave it out for now it can be added in the future.
>>>>> +1 (for leaving out).
>>>>>
>>>>>> Is there scope for needing the Jumpable to be detected through the API? E.g. add:
>>>>>>
>>>>>> boolean RandomSource::isJumpable(RandomSource)
>>>>>>
>>>>>> We would just need to maintain an EnumSet for Jumpable and likewise for LongJumpable. Again more clutter to the RandomSource interface but perhaps less so than a mirror create method that would throw IllegalArgumentException for any RandomSource specified that is not Jumpable.
>>>>> +1 (for leaving out for now).
>>>>>
>>>>>>> [...]
>>>>>>>> I was hoping to avoid creating a duplicate class. But actually that
>>>>>>>> would be fine and easier for testing. The implementation is trivial anyway.
>>>>>>> Why duplicate?
>>>>>>> IIUC, shouldn't the existing "core" class define an additional constructor
>>>>>>> that accepts the "K" argument.  Then the current "SPLIT_MIX_64"
>>>>>>> would use the default increment.
>>>>>>> [Same as with "TWO_CMRES" and "TWO_CMRES_SELECT", no?.]
>>>>>> OK. That was where I should have gone to start with. I’ll do that.
>>>>>>
>>>>>>>> I've just finished an initial implementation of the MSWS RNG that uses a
>>>>>>>> self-generated Weyl sequence. It works. So the idea for this can be
>>>>>>>> applied to SPLIT_MIX_64_K by using the same Weyl sequence generation in
>>>>>>>> both. Perhaps moving the code to create the Weyl sequence increment to
>>>>>>>> NumberFactory.
>>>>>>> +1
>>>>>> OK. So I created a Jira ticket for the SPLIT_MIX_64_K. I’m not sure when I’ll get around to doing this though. It seems less important than other tasks. It may even be redundant if the only reason is to increase the state space for the generator. Each generator would still only have a period of 2^64. You can just create a lot of them with a weak assurance they will not overlap and that the uniformity is statistically the same. Since there are of the order of 2^63 variants we are not going to be able to test this and would have to rely on the theory behind it.
>>>>>>
>>>>>> If we add Jumpable to SplitMix with jumps of 2^32 and 2^48 then you can create either 2^32 rngs with no overlap for the first 2^32 output numbers or 2^16 rngs that can each be jumped 2^16 times with no overlap for the first 2^32 output numbers. That is a lot for a small parallel situation and does have the assurance of no overlap. Any parallel usage where longer sequences are expected to be used can use one of the XorShiRo generators.
>>>>> I'm a bit lost here.  I thought that "SplitMix64" did not provide
>>>>> "jump",
>>>> The state it is a linear sum. So can be jumped very easily.
>>>>
>>>> y = mx + c
>>>>
>>>> c is the current Weyl state, m the number of transitions and x the Weyl
>>>> increment. So we can make SplitMix Jumpable. The maximum jump length is
>>>> 2^64 which will wrap the state.
>>>>
>>>>> hence the
>>>>> "Weyl way" to ensure no-overlap, with high probability (why "weak assurance"?)
>>>> I used the term "weak assurance" as opposed to "concrete assurance". The
>>>> wording used by the JDK is "with very high probability, the set of
>>>> values collectively generated by the two objects has the same
>>>> statistical properties as if the same quantity of values were generated
>>>> by a single object". It is a bit of semantics.
>>>>
>>>>> Anyways, I agree that SPLIT_MIX_64_K is low priority, but especially if we
>>>>> cannot ensure that the added flexibility does not come with unsuspected
>>>>> drawbacks (?).  [IIRC, the article about "TwoCmres" reported experiments for
>>>>> choosing the (hard-coded) values that do produce good sequences.]
>>>> The likelihood of overlap is 1/[the number of different Weyl sequences].
>>>> This is around 2^-63. So it is small. However the base implementation
>>>> for SplitMix uses a specific Weyl sequence increment based on the golden
>>>> ratio. I do not know the original of this. For example is it better that
>>>> any other irrational number converted to an integer, e.g. pi, Euler's
>>>> number (e), sqrt(2)? The irrational number comes from the origin of the
>>>> Weyl sequence which is based on irrationals. But there may be no reason
>>>> that the golden ratio was chosen rather than any other source irrational.
>>> Does the non-overlap weak assurance take into account the fact that
>>> there are no irrational numbers in a truncated representation (like the
>>> one stored in a computer)?
>> I would say that the 'weak assurance'/'very unlikely' statements allude to the fact that no-one has actually tested all the 2^63 different Weyl sequences after they pass through the hash mixing function to see if they output the same sequence as any other, or are weaker as random generators. It would be quite a task. So you cannot say they will not overlap, just that it probably will not. And you cannot say much about the quality of the generator beyond the results of those you have tested. (But this is the same for any large state space generator.)
>>
>>> Isn't it why those "internal" numbers are carefully selected (like in
>>> "TwoCmres"), because they generate better sequences?
>> That uses a multiply, rotate and subtraction on a number. So the multiply and rotate have been tested to maximise the period of the sequence, and/or the quality of the output. With a Weyl sequence it is proven to have a period of 2^64 (for a long) if the increment is odd. However that sequence then has to be mixed to a decent uniform output. I’ve not read the details on the mixing functions and how they are derived or tested to so would not want to comment too much. My guess would be that the mixing of bits in the long is more random if the bits change a lot each step. So using the Golden ratio, that has a high number of state changes across the 64-bit, to create the sequence works well. Using 1, which would not change many bits at each step in the sequence, would be bad.
> I don't feel confident that "Commons RNG" should decide how to generate
> the numbers to be used in place of the one hard-coded in the reference
> algorithm (Golden ratio).
>
> As discussed, we could provide
> 1. "customizable" implementations for which the "Weyl" number
> can be passed at construction (cf. "SPLIT_MIX_63_K"), and
> 2. an *example* code of how good "K" numbers could be generated
> (e.g. taking any number and boosting the number of transitions).
>
>>>> On the subject of splitting verses jumping:
>>>>
>>>> Jumping seems to be a case where the limit of the multi-threaded
>>>> scenario is known beforehand, e.g. a simulation with 1024 parallel
>>>> tasks, each task with a known limit on the count of random numbers
>>>> required. So if your jump length is above the number of random numbers
>>>> required for each task you can jump to get a single generator for each
>>>> task and ensure no overlap.
>>>>
>>>> Split is used when the multi-threaded scenario is not known. This is a
>>>> case for SplittableRandom which is used in parallel algorithms which use
>>>> forking. The point at which a fork will occur and the task size of the
>>>> fork are not known beforehand. The generator is able to create new
>>>> generators for each fork that will very unlikely overlap with any other
>>>> generator currently in use. For SplittableRandom the probability of the
>>>> same generator is about 2^-63. The probability of overlap is reduced
>>>> further as the start point of the sequence is variable too and the
>>>> length of the sequence that is consumed is not known. So if the number
>>>> of forks is much less than 2^63 (which it should be) then split will
>>>> suffice in this situation.
>>> I'm not convinced that we should add complexity to the API without
>>> a use-case that would require
>>> * fork-join,
>>> * a RNG  that cannot be "SplittableRandom”.
>> I think that Splittable as we are discussing it is essentially a mix of functionality:
>>
>> UniformRandomProvider rng
>> Supplier<UniformRandomProvider> rngSupplier
>>
>> The Splittable interface just nicely allows you to create an algorithm that declares that it wants a random generator but may actually want more than one because it may create multiple threads. The number of threads is unknown so it needs to be able to create generators as it goes.
>>
>> For now I am fine with not putting it into the library. I do not see it as essential.
> Thanks.
>
>> It is a nice to have.
> As per the above, given the "SPLIT_MIX_64_K" version of an algorithm, I'd rather
> let the user implement what he needs:
>
> for (int i = 1; i < numThreads; i += 2) {
>      final long k = boostTransitions(i); // User-defined.
>      final UniformRandomProvider rng =
> RandomSource.create(SPLIT_MIX_64_K, k); // Commons RNG.
>      startProcessingThread(rng); // User-defined.
> }
>
>> But as you said it does raise the question of when (and justifying why) we support if for each of the generators.
>>
>> Given that SplittableRandom is designed to support the Java framework, is not extensible (is a final class) and so seems to be a closed solution to fork-join in Java leaving it out of the API until a concrete use case for it occurs seems sensible.
>>
>>>> I have an algorithm that uses a set number of tasks to do random
>>>> splitting of a dataset. The dataset is shuffled before the task so even
>>>> if the same generator was used the output would work. Currently I create
>>>> a new generator to allow parallel computation but I could split a master
>>>> one or use a jumpable. Splitting would be faster than jumping for this
>>>> scenario and perfectly acceptable.
>>> Do you mean that most of the total computing time is taken
>>> by instantiating the generator?
>> No. So speed of splitting verses instantiating the generator is mute. My point was that I just need a new thread safe instance. A copy would be fine. A split would be fine. A new instance would be fine. That algorithm was pre Java 8. I could just update it to use SplittableRandom. It is not using any of the sampling functionality that hands a UniformRandomProvider to another class. The RNG is used for shuffling and subset sampling so just needs nextInt(int).
>>
>> I’ve got a bit of homework to find a case where SplittableRandom comes into its own beyond just offering Supplier<UniformRandomProvider> as part of it’s API.
>>
>>>> This leans me back towards adding a Splittable interface. The conditions
>>>> for a split would be that the split creates a new instance based on the
>>>> current state of the generator and updates the state. This allows split
>>>> to be called repeatedly on the same generator to get new distinct
>>>> instances that will not coincide with very high probability. It would be
>>>> akin to non-synchronized self-construction. This would then be faster
>>>> than using RandomSource.create because that uses synchronized seed
>>>> generation.
>>> If the seed auto-generation is the issue, why not bypass it (by providing
>>> one's own seed)?
>>> Using a generator's state as a seed for another instance seems but a
>>> particular choice from the seed space.
>> I think the reason to use ones own mixed up state as the seed is to ensure that the state of the new generator is different. The implementations I have seen do not actually check this is true so it just a probability gamble anyway. No different from creating a new seed and instance. So it is just for convenience.
> IMO, less risk by leaving it out too.
>
>>>> This opens up more generators for use in recursive forking
>>>> algorithms that may not care if the two forks have the same generator
>>>> but require it to be created fast and thread local. So perhaps do not
>>>> support it for JDK (because it is a bad generator) and rule out those
>>>> with long initialisation routines TwoCmres, MT, etc. and perhaps also
>>>> ruling out those with large states (e.g. Well44497b).
>>> Then we have to explain why "split" is not provided despite it could
>>> be and could be requested by users with a different sense of what
>>> is fast enough for some purpose.
>>>
>>>> On the subject of jumping we have three options:
>>>>
>>>> 1. jump advances the state and returns a copy with the advanced state.
>>>>
>>>> 2. jump advances the state and returns a copy with the old state.
>>>>
>>>> 3. jump advances the state. Add a copy method.
>>>>
>>>> I think we ruled out number 3 as having a copy command is deemed to be
>>>> prone to misuse if using the same copy in multiple places.
>>>>
>>>> Option 1 I think has an issue in that if you do 1 jump and believe you
>>>> have two non-overlapping generators you will be wrong. Take this case:
>>>>
>>>> JumpableUniformRandomProvider rng1 = ...;
>>>>
>>>> UniformRandomProvider rng2 = rng1.jump();
>>>>
>>>> In option 1 rng1 and rng2 are actually the same.
>>>>
>>>> In option 2 rng1 and rng2 are in different states where rng1 is far
>>>> ahead of rng2. So the returned state (rng2) can be documented to not
>>>> overlap the existing generator (rng1) until the jump length has consumed
>>>> by calls to generate numbers.
>>>>
>>>>
>>>> Note that neither option provides a loophole for a poor man's copy.
>>>> Option 1 would create a copy but it would be of a state far ahead
>>>> because it is after a jump. Option 2 creates a copy of the current state
>>>> but then the current state is changed by the jump so you don't have a copy.
>>> I'm not sure my preferred option was mentioned.  After this statement:
>>>     rng2 = rng1.jump();
>>> the state of "rng1" is the same as before the call; the newly-created
>>> "rng2" instance has its state far away.
>> This was mentioned. I’d ruled it out as jumping the same generator would not move it anywhere and always return the same child. The user would have to:
>>
>> JumpableURP rng1 = …
>> JumpableURP rng2 = rng1.jump();
>> JumpableURP rng3 = rng2.jump();
>>
>> Rng1 is where it started.
> Yes; my point was that "jump" could be considered a factory method.

OK so this results in:

/**
  * Some summary.
  */
public interface JumpableUniformRandomProvider extends
UniformRandomProvider {
     /**
      * Creates a copy of the UniformRandomProvider and advances the
state of the copy.
      * The state of the current instance is not altered. The state of
the copy will be
      * advanced an equivalent of {@code n} sequential calls to a method
that updates the
      * state of the provider.
      *
      * @return the copy with an advanced state
      */
     JumpableUniformRandomProvider jump();
}


This can be documented in an implementation as:

public class MyJumpingRNG implements JumpableUniformRandomProvider {
     /**
      * {@inheritDoc}
      *
      * <p>The jump size {@code n} is the equivalent of {@code 2^64}
calls to
      * {@link UniformRandomProvider#nextLong() nextLong()}.
      */
     @Override
     public JumpableUniformRandomProvider jump() {
         // TODO Auto-generated method stub
         return null;
     }
}

Do we add a second interface for LongJumpableUniformRandomProvider?


>> So the options are (in all cases returning the copy):
>>
>> 1. createAndJumpCopy
>> 2. copyAndJumpParent
>> 3. jumpParentAndCopy
>> 4. jump and copy separately
>>
>> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.
> We could provide a convenience method in  "RandomSource":
>
> public UniformRandomProvider[] jump(int n,
> JumpableUniformRandomProvider parent) {
>      final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
>      UniformRandomProvider tmp = parent;
>      for (int i = 0; i < n; i++) {
>          rngs[i] = restrict(tmp);
>          tmp = tmp.jump();
>      }
>      return rngs;
> }

+1. Remove the need for the user to repeat boiler plate code.

Same sort of idea of longJump() too.

>> It is not actually possible to jump forward a single instance. Only children are advanced.
> A feature: There is only one way to alter the state of an instance
> (i.e. a call to "next()").
OK.

>
>> 2. My preferred option*. A copy of the state is made. The current state is advanced and the old state that was left behind is returned. This can be called repeatedly to create new generators that are periodically spaced behind the current generator. Or you can ignore the return value and just use it to jump forward.
> Allows shooting oneself in the foot (e.g. when the return value is not ignored,
> and then used to make another jump, whose sequence will overlap with the
> original parent's).
>
>> 3. The current state is advanced and a copy returned. After a single call to this method you have two generators that are the same.
> Challenge: Find a use-case that is not broken. ;-)
>
>> 4. My other preferred option*. The most flexible API. Jump just advances the state. It is left to the user to decide if they copy before or after the jump. Since the jump will be a void method the user will have to think about what copy they want to use. But it does expose copy functionality.
> I still fail to see the need to be able to do that, if the purpose is to
> allow instantiating RNGs that do not overlap for a definite number
> of calls to "next()".
OK. With the factory method for jumping in RandomSource then it will be
easy to a priory set-up your RNGs for a parallel simulation.

>
>> *A bit of cheating to have two preferences.
>>
>>> I guess this is different from how it's usually provided in C implementations.
>>> But it is more in line with the intended usage: create an instance that will
>>> not overlap with its "parent".  The semantics is "createAndJump()”.
>> In the c implementation you would have to decide when to copy the state, before or after the jump. The jump just advances the state.
> Yes, because it would be a waste of a costly system call to do a copy
> if only a "jump" is needed.  But in Java the copy is much cheaper: Doing
>
> rng.jump(); // "jump" altered the state.
> // Use "rng".
>
> vs.
>
> rng = rng.jump(); // "jump" did not alter the state.
> // Use "rng".
>
> Would not make any difference performance-wise in sensible use-cases (i.e.
> where the sequence is long enough that "jump" was needed in the first place).
>
>> So perhaps adding a copy() to the JumpableUniformRandomProvider interface is the least prone to having design challenges later from a user who wants createAndJump rather than jumpAndCopy type functionality.
> Am I missing something that actually requires "copy"?
No. It was just easy to split the functionality and allow the user to
choose. The above copy and jump will work fine.

>
>>>> To get a copy of the current state would require saving the state using
>>>> the RestoreableUniformRandomProvider, doing either of the jump options
>>>> to get an instance of the same class, then restoring the state to both.
>>> With my proposed option, nothing happens to the "parent"; hence the
>>> above is never necessary IIUC.
>> I feel that createAndJump makes writing in loops or forking less elegant since the controller of the jumps is the one who should have the generator that will be jumped repeatedly:
>>
>> JumpableURP rng1 = …
>> for ( ;; ) {
>>      JumpableURP advancedCopy = rng1.createAndJumpCopy();
>>      forkSomething(rng1);
>>      rng1 = advancedCopy; // To move control forward
>> }
>>
>> JumpableURP rng1 = …
>> for ( ;; ) {
>>      forkSomething(rng1.copyAndJumpParent());
>> }
> With the convenience method proposed above, the difference becomes moot:
>
> UPR[] rngs = RandomSource.jump(12, rng1);
> for (i = 0 ; i < rngs.length; i++) {
>      forkSomething(rngs[i]);
> }
>
>> JumpableURP rng1 = …
>> for ( ;; ) {
>>      forkSomething(rng1.copy());
>>      rng1.jump();
>> }
>>
>> // This one liner is not the same effect as the first fork will have an advanced (jumped) state.
>> JumpableURP rng1 = …
>> for ( ;; ) {
>>      forkSomething(rng1.jump().copy());
>> }
>>
>> Thoughts?
>>
>>
>>>> To rule out doing this via the RestoreableUniformRandomProvider
>>>> functionality would require a method in RandomSource to restrict the
>>>> implementation to just the defined interface like the unrestorable method:
>>>>
>>>> UniformRandomProvider unrestorable(UniformRandomProvider delegate)
>>>>
>>>> Becomes:
>>>>
>>>> UniformRandomProvider restrict(UniformRandomProvider delegate)
>>>> RestorableUniformRandomProvider restrict(RestorableUniformRandomProvider
>>>> delegate)
>>>> JumpableUniformRandomProvider restrict(JumpableUniformRandomProvider
>>>> delegate)
>>>> LongJumpableUniformRandomProvider
>>>> restrict(LongJumpableUniformRandomProvider delegate)
>>>>
>>>> The correct method would have to be selected by casting:
>>>>
>>>> LongJumpableUniformRandomProvider delegate = ...
>>>> JumpableUniformRandomProvider restrict((JumpableUniformRandomProvider)
>>>> delegate)
>>>>
>>>> Or do not use overloaded methods and (with A, B, C, D appropriately named):
>>>>
>>>> UniformRandomProvider restrictA(UniformRandomProvider delegate)
>>>> RestorableUniformRandomProvider
>>>> restrictB(RestorableUniformRandomProvider delegate)
>>>> JumpableUniformRandomProvider restrictC(JumpableUniformRandomProvider
>>>> delegate)
>>>> LongJumpableUniformRandomProvider
>>>> restrictD(LongJumpableUniformRandomProvider delegate)
>>>>
>>>> A move to Java 8 would allow the restrict method for the interface to be
>>>> added as a static method to the interface itself which would be neater.
>>>>
>>> I think that we could discuss "restrict" options separately.
>> OK. Let’s sort out exactly what the Jump interface will be. Adding a restrict option is easy in practice but the choice not so obvious. A look through the Java API for similar things would help.
>>
>> For example I have found methods:
>>
>> Executors.unconfigurableXYZ  (Java 1.8)
>> Collections.synchronizedX
>> Collections.unmodifiableX
>>
>> So ‘unconfigurable’ seems to match.
> Not so sure.  At this point, I prefer "restrict" (to the URP interface).
OK. As long as it is clear from the method name what happens.

>
>> It stops any configuration of the executor implementation and just exposes the pure Executor interface. The Collections methods either add some function (synchronized) or take away functionality (unmodifiable).
>>
>> A further search through the Java API is for another day.
>>
> Best,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hello Alex.

>>> [...]
>
> OK so this results in:
>
> /**
>   * Some summary.
>   */
> public interface JumpableUniformRandomProvider extends
> UniformRandomProvider {
>      /**
>       * Creates a copy of the UniformRandomProvider and advances the
> state of the copy.
>       * The state of the current instance is not altered. The state of
> the copy will be
>       * advanced an equivalent of {@code n} sequential calls to a method
> that updates the
>       * state of the provider.
>       *
>       * @return the copy with an advanced state
>       */
>      JumpableUniformRandomProvider jump();
> }
>

+1
[Clean and lean: and no side-effects to explain...]

>
> This can be documented in an implementation as:
>
> public class MyJumpingRNG implements JumpableUniformRandomProvider {
>      /**
>       * {@inheritDoc}
>       *
>       * <p>The jump size {@code n} is the equivalent of {@code 2^64}
> calls to
>       * {@link UniformRandomProvider#nextLong() nextLong()}.
>       */
>      @Override
>      public JumpableUniformRandomProvider jump() {
>          // TODO Auto-generated method stub
>          return null;
>      }
> }

+1

>
> Do we add a second interface for LongJumpableUniformRandomProvider?

Sure, if the functionality is provided by some of the algorithms implemented
in [RNG].
But let's have the two functionalities in separate commits.

>
> >> So the options are (in all cases returning the copy):
> >>
> >> 1. createAndJumpCopy
> >> 2. copyAndJumpParent
> >> 3. jumpParentAndCopy
> >> 4. jump and copy separately
> >>
> >> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.
> > We could provide a convenience method in  "RandomSource":
> >
> > public UniformRandomProvider[] jump(int n,
> > JumpableUniformRandomProvider parent) {
> >      final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
> >      UniformRandomProvider tmp = parent;
> >      for (int i = 0; i < n; i++) {
> >          rngs[i] = restrict(tmp);
> >          tmp = tmp.jump();
> >      }
> >      return rngs;
> > }
>
> +1. Remove the need for the user to repeat boiler plate code.
>
> Same sort of idea of longJump() too.

+1

> >> It is not actually possible to jump forward a single instance. Only children are advanced.
> > A feature: There is only one way to alter the state of an instance
> > (i.e. a call to "next()").
> OK.

Great. :-)

Gilles

> >
>>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 18 Apr 2019, at 14:12, Gilles Sadowski <[hidden email]> wrote:
>
> Hello Alex.
>
>>>> [...]
>>
>> OK so this results in:
>>
>> /**
>>  * Some summary.
>>  */
>> public interface JumpableUniformRandomProvider extends
>> UniformRandomProvider {
>>     /**
>>      * Creates a copy of the UniformRandomProvider and advances the
>> state of the copy.
>>      * The state of the current instance is not altered. The state of
>> the copy will be
>>      * advanced an equivalent of {@code n} sequential calls to a method
>> that updates the
>>      * state of the provider.
>>      *
>>      * @return the copy with an advanced state
>>      */
>>     JumpableUniformRandomProvider jump();
>> }
>>
>
> +1
> [Clean and lean: and no side-effects to explain...]
>
>>
>> This can be documented in an implementation as:
>>
>> public class MyJumpingRNG implements JumpableUniformRandomProvider {
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * <p>The jump size {@code n} is the equivalent of {@code 2^64}
>> calls to
>>      * {@link UniformRandomProvider#nextLong() nextLong()}.
>>      */
>>     @Override
>>     public JumpableUniformRandomProvider jump() {
>>         // TODO Auto-generated method stub
>>         return null;
>>     }
>> }
>
> +1
>
>>
>> Do we add a second interface for LongJumpableUniformRandomProvider?
>
> Sure, if the functionality is provided by some of the algorithms implemented
> in [RNG].
> But let's have the two functionalities in separate commits.
>
>>
>>>> So the options are (in all cases returning the copy):
>>>>
>>>> 1. createAndJumpCopy
>>>> 2. copyAndJumpParent
>>>> 3. jumpParentAndCopy
>>>> 4. jump and copy separately
>>>>
>>>> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.
>>> We could provide a convenience method in  "RandomSource":
>>>
>>> public UniformRandomProvider[] jump(int n,
>>> JumpableUniformRandomProvider parent) {
>>>     final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
>>>     UniformRandomProvider tmp = parent;
>>>     for (int i = 0; i < n; i++) {
>>>         rngs[i] = restrict(tmp);
>>>         tmp = tmp.jump();
>>>     }
>>>     return rngs;
>>> }
>>
>> +1. Remove the need for the user to repeat boiler plate code.
>>
>> Same sort of idea of longJump() too.
>
> +1
>
>>>> It is not actually possible to jump forward a single instance. Only children are advanced.
>>> A feature: There is only one way to alter the state of an instance
>>> (i.e. a call to "next()").
>> OK.
>
> Great. :-)
>
> Gilles

This sounds like a resolution. I will put the ideas into a Jira ticket for Jumpable.

I am a bit busy at the moment with other mini-projects (updates to nextInt(int) being the main one, Poisson samplers (again) being another leading to a family of log normal based distributions that may be supported using cumulative probability look-up tables) but will hope to get this done soon. The actual implementation should be quite easy.

Here’s one for you to think about on the subject of Jumpable. What about support for the generators that can be advanced by a user specified increment? For example the SplitMix algorithm is based on a sequence and so can be advanced from 1 to 2^64-1 steps. It does seem strange to support this (if we add jumpable to SplitMix) using only one specific jump distance. A Skippable can do this:

SkippableURP {
    public SkippableURP skip(long steps);

    // or

    public SkippableURP skipPower2(long power);
}

Too much?

>
>>>
>>>> [...]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Le jeu. 18 avr. 2019 à 21:53, Alex Herbert <[hidden email]> a écrit :

>
>
>
> > On 18 Apr 2019, at 14:12, Gilles Sadowski <[hidden email]> wrote:
> >
> > Hello Alex.
> >
> >>>> [...]
> >>
> >> OK so this results in:
> >>
> >> /**
> >>  * Some summary.
> >>  */
> >> public interface JumpableUniformRandomProvider extends
> >> UniformRandomProvider {
> >>     /**
> >>      * Creates a copy of the UniformRandomProvider and advances the
> >> state of the copy.
> >>      * The state of the current instance is not altered. The state of
> >> the copy will be
> >>      * advanced an equivalent of {@code n} sequential calls to a method
> >> that updates the
> >>      * state of the provider.
> >>      *
> >>      * @return the copy with an advanced state
> >>      */
> >>     JumpableUniformRandomProvider jump();
> >> }
> >>
> >
> > +1
> > [Clean and lean: and no side-effects to explain...]
> >
> >>
> >> This can be documented in an implementation as:
> >>
> >> public class MyJumpingRNG implements JumpableUniformRandomProvider {
> >>     /**
> >>      * {@inheritDoc}
> >>      *
> >>      * <p>The jump size {@code n} is the equivalent of {@code 2^64}
> >> calls to
> >>      * {@link UniformRandomProvider#nextLong() nextLong()}.
> >>      */
> >>     @Override
> >>     public JumpableUniformRandomProvider jump() {
> >>         // TODO Auto-generated method stub
> >>         return null;
> >>     }
> >> }
> >
> > +1
> >
> >>
> >> Do we add a second interface for LongJumpableUniformRandomProvider?
> >
> > Sure, if the functionality is provided by some of the algorithms implemented
> > in [RNG].
> > But let's have the two functionalities in separate commits.
> >
> >>
> >>>> So the options are (in all cases returning the copy):
> >>>>
> >>>> 1. createAndJumpCopy
> >>>> 2. copyAndJumpParent
> >>>> 3. jumpParentAndCopy
> >>>> 4. jump and copy separately
> >>>>
> >>>> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.
> >>> We could provide a convenience method in  "RandomSource":
> >>>
> >>> public UniformRandomProvider[] jump(int n,
> >>> JumpableUniformRandomProvider parent) {
> >>>     final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
> >>>     UniformRandomProvider tmp = parent;
> >>>     for (int i = 0; i < n; i++) {
> >>>         rngs[i] = restrict(tmp);
> >>>         tmp = tmp.jump();
> >>>     }
> >>>     return rngs;
> >>> }
> >>
> >> +1. Remove the need for the user to repeat boiler plate code.
> >>
> >> Same sort of idea of longJump() too.
> >
> > +1
> >
> >>>> It is not actually possible to jump forward a single instance. Only children are advanced.
> >>> A feature: There is only one way to alter the state of an instance
> >>> (i.e. a call to "next()").
> >> OK.
> >
> > Great. :-)
> >
> > Gilles
>
> This sounds like a resolution. I will put the ideas into a Jira ticket for Jumpable.

Thanks.

>
> I am a bit busy at the moment with other mini-projects (updates to nextInt(int) being the main one, Poisson samplers (again) being another leading to a family of log normal based distributions that may be supported using cumulative probability look-up tables) but will hope to get this done soon. The actual implementation should be quite easy.
>
> Here’s one for you to think about on the subject of Jumpable. What about support for the generators that can be advanced by a user specified increment? For example the SplitMix algorithm is based on a sequence and so can be advanced from 1 to 2^64-1 steps. It does seem strange to support this (if we add jumpable to SplitMix) using only one specific jump distance. A Skippable can do this:
>
> SkippableURP {
>     public SkippableURP skip(long steps);
>
>     // or
>
>     public SkippableURP skipPower2(long power);
> }
>
> Too much?

You read my mind. ;-)
What would be the uses of "short" jumps (i.e. having small
non-overlapping sequences
from many instances, rather than a longer one from a single instance)?
IIUC, the hard-coded jump sizes in existing implementations seem a compromise,
based on the number of potentially concurrent threads, or independent
simulations.
Increasing that number does not seem necessary for the mid-term.

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 19 Apr 2019, at 15:30, Gilles Sadowski <[hidden email]> wrote:
>
> Le jeu. 18 avr. 2019 à 21:53, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
>>
>>
>>
>>> On 18 Apr 2019, at 14:12, Gilles Sadowski <[hidden email]> wrote:
>>>
>>> Hello Alex.
>>>
>>>>>> [...]
>>>>
>>>> OK so this results in:
>>>>
>>>> /**
>>>> * Some summary.
>>>> */
>>>> public interface JumpableUniformRandomProvider extends
>>>> UniformRandomProvider {
>>>>    /**
>>>>     * Creates a copy of the UniformRandomProvider and advances the
>>>> state of the copy.
>>>>     * The state of the current instance is not altered. The state of
>>>> the copy will be
>>>>     * advanced an equivalent of {@code n} sequential calls to a method
>>>> that updates the
>>>>     * state of the provider.
>>>>     *
>>>>     * @return the copy with an advanced state
>>>>     */
>>>>    JumpableUniformRandomProvider jump();
>>>> }
>>>>
>>>
>>> +1
>>> [Clean and lean: and no side-effects to explain...]
>>>
>>>>
>>>> This can be documented in an implementation as:
>>>>
>>>> public class MyJumpingRNG implements JumpableUniformRandomProvider {
>>>>    /**
>>>>     * {@inheritDoc}
>>>>     *
>>>>     * <p>The jump size {@code n} is the equivalent of {@code 2^64}
>>>> calls to
>>>>     * {@link UniformRandomProvider#nextLong() nextLong()}.
>>>>     */
>>>>    @Override
>>>>    public JumpableUniformRandomProvider jump() {
>>>>        // TODO Auto-generated method stub
>>>>        return null;
>>>>    }
>>>> }
>>>
>>> +1
>>>
>>>>
>>>> Do we add a second interface for LongJumpableUniformRandomProvider?
>>>
>>> Sure, if the functionality is provided by some of the algorithms implemented
>>> in [RNG].
>>> But let's have the two functionalities in separate commits.
>>>
>>>>
>>>>>> So the options are (in all cases returning the copy):
>>>>>>
>>>>>> 1. createAndJumpCopy
>>>>>> 2. copyAndJumpParent
>>>>>> 3. jumpParentAndCopy
>>>>>> 4. jump and copy separately
>>>>>>
>>>>>> 1. Your preferred option. A copy of the state is made. The state is advanced in the copy and returned. But when called repeatedly it will get the same generator and code must be organised appropriately.
>>>>> We could provide a convenience method in  "RandomSource":
>>>>>
>>>>> public UniformRandomProvider[] jump(int n,
>>>>> JumpableUniformRandomProvider parent) {
>>>>>    final UniformRandomProvider[] rngs = new UniformRandomProvider[n];
>>>>>    UniformRandomProvider tmp = parent;
>>>>>    for (int i = 0; i < n; i++) {
>>>>>        rngs[i] = restrict(tmp);
>>>>>        tmp = tmp.jump();
>>>>>    }
>>>>>    return rngs;
>>>>> }
>>>>
>>>> +1. Remove the need for the user to repeat boiler plate code.
>>>>
>>>> Same sort of idea of longJump() too.
>>>
>>> +1
>>>
>>>>>> It is not actually possible to jump forward a single instance. Only children are advanced.
>>>>> A feature: There is only one way to alter the state of an instance
>>>>> (i.e. a call to "next()").
>>>> OK.
>>>
>>> Great. :-)
>>>
>>> Gilles
>>
>> This sounds like a resolution. I will put the ideas into a Jira ticket for Jumpable.
>
> Thanks.
>
>>
>> I am a bit busy at the moment with other mini-projects (updates to nextInt(int) being the main one, Poisson samplers (again) being another leading to a family of log normal based distributions that may be supported using cumulative probability look-up tables) but will hope to get this done soon. The actual implementation should be quite easy.
>>
>> Here’s one for you to think about on the subject of Jumpable. What about support for the generators that can be advanced by a user specified increment? For example the SplitMix algorithm is based on a sequence and so can be advanced from 1 to 2^64-1 steps. It does seem strange to support this (if we add jumpable to SplitMix) using only one specific jump distance. A Skippable can do this:
>>
>> SkippableURP {
>>    public SkippableURP skip(long steps);
>>
>>    // or
>>
>>    public SkippableURP skipPower2(long power);
>> }
>>
>> Too much?
>
> You read my mind. ;-)
> What would be the uses of "short" jumps (i.e. having small
> non-overlapping sequences

I could not think of one. Just wanted to raise the suggestion. It would not be supported for many generators anyway (currently only SplitMix, maybe more in the future).

> from many instances, rather than a longer one from a single instance)?
> IIUC, the hard-coded jump sizes in existing implementations seem a compromise,
> based on the number of potentially concurrent threads, or independent
> simulations.
> Increasing that number does not seem necessary for the mid-term.
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email] <mailto:[hidden email]>
> For additional commands, e-mail: [hidden email] <mailto:[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert
In reply to this post by Gilles Sadowski-2
I have created RNG-97 and RNG-98 for Jump and LongJump.

Please take a look and comment.

The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.

The use case is thus:

rng1 = …;
rng2 = rng1.jump();
rng3 = rng2.jump();
rng4 = rng3.jump();

As opposed to:

rng1 = …;
rng2 = rng1.jump();
rng3 = rng1.jump();
rng4 = rng1.jump();

Where rng1 will be advanced each time leaving behind a copy generator.

In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.

Alex




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hi.

Le sam. 27 avr. 2019 à 15:05, Alex Herbert <[hidden email]> a écrit :

>
> I have created RNG-97 and RNG-98 for Jump and LongJump.
>
> Please take a look and comment.
>
> The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.
>
> The use case is thus:
>
> rng1 = …;
> rng2 = rng1.jump();
> rng3 = rng2.jump();
> rng4 = rng3.jump();
>
> As opposed to:
>
> rng1 = …;
> rng2 = rng1.jump();
> rng3 = rng1.jump();
> rng4 = rng1.jump();
>
> Where rng1 will be advanced each time leaving behind a copy generator.
>
> In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.

+1

I think that the default should be to prevent a "jump" on the returned
instances.
An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
leave it out until it is requested based on an actual use-case.

Best,
Gilles

>
> Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 27 Apr 2019, at 14:49, Gilles Sadowski <[hidden email]> wrote:
>
> Hi.
>
> Le sam. 27 avr. 2019 à 15:05, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
>>
>> I have created RNG-97 and RNG-98 for Jump and LongJump.
>>
>> Please take a look and comment.
>>
>> The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.
>>
>> The use case is thus:
>>
>> rng1 = …;
>> rng2 = rng1.jump();
>> rng3 = rng2.jump();
>> rng4 = rng3.jump();
>>
>> As opposed to:
>>
>> rng1 = …;
>> rng2 = rng1.jump();
>> rng3 = rng1.jump();
>> rng4 = rng1.jump();
>>
>> Where rng1 will be advanced each time leaving behind a copy generator.
>>
>> In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.
>
> +1
>
> I think that the default should be to prevent a "jump" on the returned
> instances.
> An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
> leave it out until it is requested based on an actual use-case.

I presume you are talking about the helper method in RandomSource.

However it does open the possibility instead of this:

JumpableUniformRandomProvider {
    UniformRandomProvider jump();
}

This only works if the state is modified for the current instance to allow chaining jumps.

Having typed all this up into a summary for the two tickets I feel that they implement the idea in the wrong way. I think the jump should advance the state of the current generator. This is the master generator created and used in the high level code that controls the number of jumps that are required. The returned copy should be a copy of where the generator was. The copy should not be used for further jumps. In this way the interface for jump could be made to return a UniformRandomProvider.

When done like that the jumpable RNG is the only thing you need to hold a reference to. And you can later decide (perhaps dynamically) if you need to do some more jumps to get another series. Each call to jump moves the master along and leaves behind a RNG that can be used for a set number of cycles (the jump length). So you can do:

JumpableUniformRandomProvider rng = …;

UniformRandomProvider[] series1 = RandomSource.createJumpSeries(rng);
// Do work with series1 and then maybe
UniformRandomProvider[] series2 = RandomSource.createJumpSeries(rng);
// Do work with series2, etc
UniformRandomProvider[] series3 = RandomSource.createJumpSeries(rng);

Or

JumpableUniformRandomProvider masterRng = …;

ExecutorService executor = Executors.newCachedThreadPool();
ArrayList<Future<Result>> futures = new ArrayList<>();
for (Input input : inputs) {
    final UniformRandomProvider rng = masterRng.jump();
    futures.add(executor.submit(new Callable<Result>() {
        // Do something random with rng, then
        return new Result(...);
    }));
}

The later example uses ‘inputs’ as something where perhaps the size is not known such as an Iterable or likewise in Java 8 it could be written to consume a Stream.

Similarly the LongJumpableUniformRandomProvider interface can return a JumpableUniformRandomProvider so preventing the result from being used for another long jump but it can be used for (short) jumps.

Have a think on use cases but my feeling is that the interface is more powerful if you do advance the state and leave copies behind, rather than creating future copies which must be chained together to create a series.

Alex



>
> Best,
> Gilles
>
>>
>> Alex
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email] <mailto:[hidden email]>
> For additional commands, e-mail: [hidden email] <mailto:[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
Hello.

>
>
> > On 27 Apr 2019, at 14:49, Gilles Sadowski <[hidden email]> wrote:
> >
> > Hi.
> >
> > Le sam. 27 avr. 2019 à 15:05, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
> >>
> >> I have created RNG-97 and RNG-98 for Jump and LongJump.
> >>
> >> Please take a look and comment.
> >>
> >> The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.
> >>
> >> The use case is thus:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng2.jump();
> >> rng4 = rng3.jump();
> >>
> >> As opposed to:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng1.jump();
> >> rng4 = rng1.jump();
> >>
> >> Where rng1 will be advanced each time leaving behind a copy generator.
> >>
> >> In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.
> >
> > +1
> >
> > I think that the default should be to prevent a "jump" on the returned
> > instances.
> > An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
> > leave it out until it is requested based on an actual use-case.
>
> I presume you are talking about the helper method in RandomSource.
>
> However it does open the possibility instead of this:
>
> JumpableUniformRandomProvider {
>     UniformRandomProvider jump();
> }
>
> This only works if the state is modified for the current instance to allow chaining jumps.
>
> Having typed all this up into a summary for the two tickets I feel that they implement the idea in the wrong way. I think the jump should advance the state of the current generator. This is the master generator created and used in the high level code that controls the number of jumps that are required. The returned copy should be a copy of where the generator was. The copy should not be used for further jumps. In this way the interface for jump could be made to return a UniformRandomProvider.
>
> When done like that the jumpable RNG is the only thing you need to hold a reference to. And you can later decide (perhaps dynamically) if you need to do some more jumps to get another series. Each call to jump moves the master along and leaves behind a RNG that can be used for a set number of cycles (the jump length). So you can do:
>
> JumpableUniformRandomProvider rng = …;
>
> UniformRandomProvider[] series1 = RandomSource.createJumpSeries(rng);
> // Do work with series1 and then maybe
> UniformRandomProvider[] series2 = RandomSource.createJumpSeries(rng);
> // Do work with series2, etc
> UniformRandomProvider[] series3 = RandomSource.createJumpSeries(rng);
>
> Or
>
> JumpableUniformRandomProvider masterRng = …;
>
> ExecutorService executor = Executors.newCachedThreadPool();
> ArrayList<Future<Result>> futures = new ArrayList<>();
> for (Input input : inputs) {
>     final UniformRandomProvider rng = masterRng.jump();
>     futures.add(executor.submit(new Callable<Result>() {
>         // Do something random with rng, then
>         return new Result(...);
>     }));
> }
>
> The later example uses ‘inputs’ as something where perhaps the size is not known such as an Iterable or likewise in Java 8 it could be written to consume a Stream.

That's a convincing example!

> Similarly the LongJumpableUniformRandomProvider interface can return a JumpableUniformRandomProvider so preventing the result from being used for another long jump but it can be used for (short) jumps.
>
> Have a think on use cases but my feeling is that the interface is more powerful if you do advance the state and leave copies behind, rather than creating future copies which must be chained together to create a series.

OK to change the perspective. ;-)

Gilles

>
> Alex
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Bernd Eckenfels
Hello,

Just a question, I am unclear on the terminology, is „jump“ (did I miss the discussion leading toot?) something invented here? It sounds to me like this is a generator where the state can be cloned and it is „seekable“. It probably makes sense to have those two dimensions separated anyway.

Gruss
Bernd


--
http://bernd.eckenfels.net

________________________________
Von: Gilles Sadowski <[hidden email]>
Gesendet: Sonntag, April 28, 2019 12:34 AM
An: Commons Developers List
Betreff: Re: [rng] Split and Jump functions

Hello.

>
>
> > On 27 Apr 2019, at 14:49, Gilles Sadowski <[hidden email]> wrote:
> >
> > Hi.
> >
> > Le sam. 27 avr. 2019 à 15:05, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
> >>
> >> I have created RNG-97 and RNG-98 for Jump and LongJump.
> >>
> >> Please take a look and comment.
> >>
> >> The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.
> >>
> >> The use case is thus:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng2.jump();
> >> rng4 = rng3.jump();
> >>
> >> As opposed to:
> >>
> >> rng1 = …;
> >> rng2 = rng1.jump();
> >> rng3 = rng1.jump();
> >> rng4 = rng1.jump();
> >>
> >> Where rng1 will be advanced each time leaving behind a copy generator.
> >>
> >> In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.
> >
> > +1
> >
> > I think that the default should be to prevent a "jump" on the returned
> > instances.
> > An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
> > leave it out until it is requested based on an actual use-case.
>
> I presume you are talking about the helper method in RandomSource.
>
> However it does open the possibility instead of this:
>
> JumpableUniformRandomProvider {
> UniformRandomProvider jump();
> }
>
> This only works if the state is modified for the current instance to allow chaining jumps.
>
> Having typed all this up into a summary for the two tickets I feel that they implement the idea in the wrong way. I think the jump should advance the state of the current generator. This is the master generator created and used in the high level code that controls the number of jumps that are required. The returned copy should be a copy of where the generator was. The copy should not be used for further jumps. In this way the interface for jump could be made to return a UniformRandomProvider.
>
> When done like that the jumpable RNG is the only thing you need to hold a reference to. And you can later decide (perhaps dynamically) if you need to do some more jumps to get another series. Each call to jump moves the master along and leaves behind a RNG that can be used for a set number of cycles (the jump length). So you can do:
>
> JumpableUniformRandomProvider rng = …;
>
> UniformRandomProvider[] series1 = RandomSource.createJumpSeries(rng);
> // Do work with series1 and then maybe
> UniformRandomProvider[] series2 = RandomSource.createJumpSeries(rng);
> // Do work with series2, etc
> UniformRandomProvider[] series3 = RandomSource.createJumpSeries(rng);
>
> Or
>
> JumpableUniformRandomProvider masterRng = …;
>
> ExecutorService executor = Executors.newCachedThreadPool();
> ArrayList<Future<Result>> futures = new ArrayList<>();
> for (Input input : inputs) {
> final UniformRandomProvider rng = masterRng.jump();
> futures.add(executor.submit(new Callable<Result>() {
> // Do something random with rng, then
> return new Result(...);
> }));
> }
>
> The later example uses ‘inputs’ as something where perhaps the size is not known such as an Iterable or likewise in Java 8 it could be written to consume a Stream.

That's a convincing example!

> Similarly the LongJumpableUniformRandomProvider interface can return a JumpableUniformRandomProvider so preventing the result from being used for another long jump but it can be used for (short) jumps.
>
> Have a think on use cases but my feeling is that the interface is more powerful if you do advance the state and leave copies behind, rather than creating future copies which must be chained together to create a series.

OK to change the perspective. ;-)

Gilles

>
> Alex
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Alex Herbert


> On 28 Apr 2019, at 00:59, Bernd Eckenfels <[hidden email]> wrote:
>
> Hello,
>
> Just a question, I am unclear on the terminology, is „jump“ (did I miss the discussion leading toot?) something invented here? It sounds to me like this is a generator where the state can be cloned and it is „seekable“. It probably makes sense to have those two dimensions separated anyway.

Hi Bernd, thanks for the input.

This thread started with the definition:
Jump:

To create a new instance of the generator that is deterministically based on the state of the current instance but advanced a set number of iterations.


However it is not required to create a new instance at the same time as jumping. You are correct in that this is two functionalities:

1. Jump forward in the sequence
2. Copy

However the two are coupled. Having jump on its own is useless (why move forward in the sequence without using it?). So a copy needs to be created somewhere before/after the jump.

The idea of a jump is to create a series of the generator at different points in the state. The generators can be used for parallel computations and will be ensured to not overlap in their output sequence for number of outputs skipped by the jump length.

FYI. The generators that support this have jump sizes of 2^64, 96, 128, 192, 256 and 512. So this is a lot of output sequence to jump.

Copy on its own works but for what purpose? If you want a second generator at the moment you just create a new one (with a different seed). Duplicate copies of generators is prone to potential pitfalls where simulations are not as random as you intend. For a special use case where you wish to run multiple simulations with the same generator you can use the Restorable interface to save the state of one and re-create it in other instances.

The current thread came to the choice of:

>>> So the options are (in all cases returning the copy):
>>>
>>> 1. createAndJumpCopy
>>> 2. copyAndJumpParent
>>> 3. jumpParentAndCopy
>>> 4. jump and copy separately

Jump and copy separately was ruled out to discourage misuse of copy.

The current suggestion is 1. Create a copy and jump that ahead. The current instance is not affected.

I now consider this to be weaker for a variety of use cases than 2. This copies the current state for use and then jumps the parent ahead. So this alters the state of the parent generator.

Note that all other methods of a generator alter its state. So having jump alter its state is reasonable.

The most flexible API is to separate jump and copy into two methods. We can still support helper functions that take in a Jumpable generator and create a jump series of generators for parallel work. Separating jump and copy allows the functionality to be used in a larger number of ways than any other interface that attempts to combine jump and copy.

I am fine with having separate jump and copy. If so the copy method, being part of the Jumpable interface, will be functionally coupled with the jump method and should be described in Javadoc with the intended purpose to use it to copy the parent state either before or after a jump into a child generator.

As a precursor this API is very flexible:

JumpableUniformRandomProvider extends UniformRandomProvider {
    /** Jump and return same instance. */
    JumpableUniformRandomProvider jump();
    /** Copy the instance. */
    JumpableUniformRandomProvider copy();
}

Returning the same instance in jump() allows method chaining such as either:

rng.jump().copy();
rng.copy().jump();

This potential pitfall is that a user may do this:

UniformRandomProvider rng1 = rng.copy().jump();
UniformRandomProvider rng2 = rng.copy().jump();

Where rng1 and 2 will be the same, 1 jump ahead of the parent state. Or:

UniformRandomProvider rng1 = rng.jump();
UniformRandomProvider rng2 = rng.jump();

Where rng, rng1 and rng2 are the same instance all 2 jumps ahead of the start point.

I think our purpose is to provide an API for the generators that can jump and that is not too restrictive given the use cases we have so far thought up. There may be other ideas of use cases that cannot be done with the coupled functionality of:

JumpableUniformRandomProvider extends UniformRandomProvider {
    /** Copy the instance, then jump ahead. Return the copy of the previous state. */
    JumpableUniformRandomProvider jump();
}

JumpableUniformRandomProvider extends UniformRandomProvider {
    /** Copy the instance, then jump the copy ahead. Return the copy. The current instance is not affected. */
    JumpableUniformRandomProvider jump();
}

So the split functions without allowing method chaining:

JumpableUniformRandomProvider extends UniformRandomProvider {
    /** Jump the current instance ahead. */
    void jump();
    /** Copy the instance. This is intended to be used either before or after a call to jump()
     * to create a series of generators. */
    JumpableUniformRandomProvider copy();
}

WDYT?

Alex


>
> Gruss
> Bernd
>
>
> --
> http://bernd.eckenfels.net
>
> ________________________________
> Von: Gilles Sadowski <[hidden email]>
> Gesendet: Sonntag, April 28, 2019 12:34 AM
> An: Commons Developers List
> Betreff: Re: [rng] Split and Jump functions
>
> Hello.
>
>>
>>
>>> On 27 Apr 2019, at 14:49, Gilles Sadowski <[hidden email]> wrote:
>>>
>>> Hi.
>>>
>>> Le sam. 27 avr. 2019 à 15:05, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
>>>>
>>>> I have created RNG-97 and RNG-98 for Jump and LongJump.
>>>>
>>>> Please take a look and comment.
>>>>
>>>> The documentation highlights the implementation detail that a jump or long jump creates a copy that is far ahead. The original generator is not effected.
>>>>
>>>> The use case is thus:
>>>>
>>>> rng1 = …;
>>>> rng2 = rng1.jump();
>>>> rng3 = rng2.jump();
>>>> rng4 = rng3.jump();
>>>>
>>>> As opposed to:
>>>>
>>>> rng1 = …;
>>>> rng2 = rng1.jump();
>>>> rng3 = rng1.jump();
>>>> rng4 = rng1.jump();
>>>>
>>>> Where rng1 will be advanced each time leaving behind a copy generator.
>>>>
>>>> In either case it will be an overlap problem if any of the children are then used for jumping. So as long as the documentation is clear then this is OK. The helper method to create a jump series (or long jump series) in RandomSource seems the best way to avoid incorrect usage.
>>>
>>> +1
>>>
>>> I think that the default should be to prevent a "jump" on the returned
>>> instances.
>>> An overload could be defined with a parameter (e.g. "allowFurtherJump") but I'd
>>> leave it out until it is requested based on an actual use-case.
>>
>> I presume you are talking about the helper method in RandomSource.
>>
>> However it does open the possibility instead of this:
>>
>> JumpableUniformRandomProvider {
>> UniformRandomProvider jump();
>> }
>>
>> This only works if the state is modified for the current instance to allow chaining jumps.
>>
>> Having typed all this up into a summary for the two tickets I feel that they implement the idea in the wrong way. I think the jump should advance the state of the current generator. This is the master generator created and used in the high level code that controls the number of jumps that are required. The returned copy should be a copy of where the generator was. The copy should not be used for further jumps. In this way the interface for jump could be made to return a UniformRandomProvider.
>>
>> When done like that the jumpable RNG is the only thing you need to hold a reference to. And you can later decide (perhaps dynamically) if you need to do some more jumps to get another series. Each call to jump moves the master along and leaves behind a RNG that can be used for a set number of cycles (the jump length). So you can do:
>>
>> JumpableUniformRandomProvider rng = …;
>>
>> UniformRandomProvider[] series1 = RandomSource.createJumpSeries(rng);
>> // Do work with series1 and then maybe
>> UniformRandomProvider[] series2 = RandomSource.createJumpSeries(rng);
>> // Do work with series2, etc
>> UniformRandomProvider[] series3 = RandomSource.createJumpSeries(rng);
>>
>> Or
>>
>> JumpableUniformRandomProvider masterRng = …;
>>
>> ExecutorService executor = Executors.newCachedThreadPool();
>> ArrayList<Future<Result>> futures = new ArrayList<>();
>> for (Input input : inputs) {
>> final UniformRandomProvider rng = masterRng.jump();
>> futures.add(executor.submit(new Callable<Result>() {
>> // Do something random with rng, then
>> return new Result(...);
>> }));
>> }
>>
>> The later example uses ‘inputs’ as something where perhaps the size is not known such as an Iterable or likewise in Java 8 it could be written to consume a Stream.
>
> That's a convincing example!
>
>> Similarly the LongJumpableUniformRandomProvider interface can return a JumpableUniformRandomProvider so preventing the result from being used for another long jump but it can be used for (short) jumps.
>>
>> Have a think on use cases but my feeling is that the interface is more powerful if you do advance the state and leave copies behind, rather than creating future copies which must be chained together to create a series.
>
> OK to change the perspective. ;-)
>
> Gilles
>
>>
>> Alex
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

Reply | Threaded
Open this post in threaded view
|

Re: [rng] Split and Jump functions

Gilles Sadowski-2
In reply to this post by Bernd Eckenfels
Hi.

>
> Just a question, I am unclear on the terminology, is „jump“ (did I miss the discussion leading toot?) something invented here?

Not invented here: It's a functionality that exist for some RNG algorithms.

> It sounds to me like this is a generator where the state can be cloned and it is „seekable“. It probably makes sense to have those two dimensions separated anyway.
>
> Gruss
> Bernd
>
>> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

123