[lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

garydgregory
Now that RNG is up and going, it seems odd to still have:

org.apache.commons.lang3.ArrayUtils.shuffle(double[], Random)

Should we deprecate these APIs in favor of Commons RNG and if so which RNG
APIs?

Gary
Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Alex Herbert

On 13/06/2019 15:59, Gary Gregory wrote:
> Now that RNG is up and going, it seems odd to still have:
>
> org.apache.commons.lang3.ArrayUtils.shuffle(double[], Random)
>
> Should we deprecate these APIs in favor of Commons RNG and if so which RNG
> APIs?
>
> Gary
Shuffling is in the commons-rng-sampling component.

Shuffling for int[] primitive arrays is in the PermutationSampler [1].

Shuffling for a List<T> is in the ListSampler [2].

Currently there are methods to shuffle the entire length, or part of the
length starting from a position and shuffling to the start or end of the
array / list.

There is no method for:

- Shuffling a subsequence within the array / list defined by either
start and length or start and end.

- Shuffling an array T[]

- Shuffling primitive arrays other than int[] (This requires a bit of
work [3])


[Lang] ArrayUtils only shuffles the full array but supports all
primitive array types and Object[]. It does not support a shuffle of T[].

So there is only partial overlap between the libraries. There is not a
like for like swap to deprecate ArrayUtils methods in favour of Commons
RNG methods.


I think it is quite a common use case to want to shuffle full length
arrays. I'm not sure about sub-sequence shuffles but one is a super-set
of the other.

To deprecate ArrayUtils shuffle methods would require a new class in RNG
to support this for all primitives and a generic type T following this
prototype:

ArrayUtils.shuffle(UniformRandomProvider rng, double[] data)
ArrayUtils.shuffle(UniformRandomProvider rng, double[] data, int start,
int length)


I'm not opposed to adding the methods to Commons RNG. Either in the
current sampling module or perhaps a new commons-rng-utils module
instead since a shuffle is not really a sample unless you use a
subsequence from the shuffled array.


Alex


[1]
http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/PermutationSampler.html 


[2]
http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/ListSampler.html

[3] Supporting all the primitive avoids the currently usage to shuffle a
non-integer array which requires:

- Clone the input array
- Shuffle a newly allocated int[] of ascending indices
- Use the shuffled indices to copy data from the clone back to the input
array

This has unnecessary allocation overhead.


>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

sebb-2-2
On Thu, 13 Jun 2019 at 16:35, Alex Herbert <[hidden email]> wrote:

>
>
> On 13/06/2019 15:59, Gary Gregory wrote:
> > Now that RNG is up and going, it seems odd to still have:
> >
> > org.apache.commons.lang3.ArrayUtils.shuffle(double[], Random)
> >
> > Should we deprecate these APIs in favor of Commons RNG and if so which RNG
> > APIs?
> >
> > Gary
> Shuffling is in the commons-rng-sampling component.
>
> Shuffling for int[] primitive arrays is in the PermutationSampler [1].
>
> Shuffling for a List<T> is in the ListSampler [2].
>
> Currently there are methods to shuffle the entire length, or part of the
> length starting from a position and shuffling to the start or end of the
> array / list.
>
> There is no method for:
>
> - Shuffling a subsequence within the array / list defined by either
> start and length or start and end.
>
> - Shuffling an array T[]
>
> - Shuffling primitive arrays other than int[] (This requires a bit of
> work [3])
>
>
> [Lang] ArrayUtils only shuffles the full array but supports all
> primitive array types and Object[]. It does not support a shuffle of T[].
>
> So there is only partial overlap between the libraries. There is not a
> like for like swap to deprecate ArrayUtils methods in favour of Commons
> RNG methods.
>
>
> I think it is quite a common use case to want to shuffle full length
> arrays. I'm not sure about sub-sequence shuffles but one is a super-set
> of the other.
>
> To deprecate ArrayUtils shuffle methods would require a new class in RNG
> to support this for all primitives and a generic type T following this
> prototype:
>
> ArrayUtils.shuffle(UniformRandomProvider rng, double[] data)
> ArrayUtils.shuffle(UniformRandomProvider rng, double[] data, int start,
> int length)
>
>
> I'm not opposed to adding the methods to Commons RNG. Either in the
> current sampling module or perhaps a new commons-rng-utils module
> instead since a shuffle is not really a sample unless you use a
> subsequence from the shuffled array.

Rather than shuffle etc in place, how about various
iterators/selectors to return entries in randomised order?
[Or does that already exist?]

>
> Alex
>
>
> [1]
> http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/PermutationSampler.html
>
>
> [2]
> http://commons.apache.org/proper/commons-rng/commons-rng-sampling/javadocs/api-1.2/org/apache/commons/rng/sampling/ListSampler.html
>
> [3] Supporting all the primitive avoids the currently usage to shuffle a
> non-integer array which requires:
>
> - Clone the input array
> - Shuffle a newly allocated int[] of ascending indices
> - Use the shuffled indices to copy data from the clone back to the input
> array
>
> This has unnecessary allocation overhead.
>
>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Eric Barnhill
On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:

>
>
> Rather than shuffle etc in place, how about various
> iterators/selectors to return entries in randomised order?
> [Or does that already exist?]
>

I am pretty sure random draws, and shuffling, are implemented with
different algorithms. Though sampling without replacement the full length
of the set would yield a shuffled set, I think there are more efficient
ways to shuffle a set.
Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Alex Herbert

On 13/06/2019 17:56, Eric Barnhill wrote:

> On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
>
>>
>> Rather than shuffle etc in place, how about various
>> iterators/selectors to return entries in randomised order?
>> [Or does that already exist?]
>>
> I am pretty sure random draws, and shuffling, are implemented with
> different algorithms. Though sampling without replacement the full length
> of the set would yield a shuffled set, I think there are more efficient
> ways to shuffle a set.

Iterators to return a random draw *without* replacement over the full
length of the array? The iterator would dynamically shuffle the array on
each call to next() so could be stopped early or can be called
infinitely as if a continuous stream. Is that your idea?

UniformRandomProvider rng = ...;
int[] big = new int[1000000];
//
// Fill big with lots of data
//
IntIterator iter = ShuffleIterators.create(rng, big);
int x = iter.next();
int y = iter.next();
int z = iter.next();

This doesn't exist but it is easy to do. Memory requirements would
require a copy of the data, or it could be marked as destructive to the
input array order and shuffle in place.

If you want a random draw *with* replacement then you can just call
nextInt(int) with the size of the array to pick something.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Eric Barnhill
An iterator that dynamically shuffles as you go along. That's really nice,
I had never even thought of that. Thanks.

On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <[hidden email]>
wrote:

>
> On 13/06/2019 17:56, Eric Barnhill wrote:
> > On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
> >
> >>
> >> Rather than shuffle etc in place, how about various
> >> iterators/selectors to return entries in randomised order?
> >> [Or does that already exist?]
> >>
> > I am pretty sure random draws, and shuffling, are implemented with
> > different algorithms. Though sampling without replacement the full length
> > of the set would yield a shuffled set, I think there are more efficient
> > ways to shuffle a set.
>
> Iterators to return a random draw *without* replacement over the full
> length of the array? The iterator would dynamically shuffle the array on
> each call to next() so could be stopped early or can be called
> infinitely as if a continuous stream. Is that your idea?
>
> UniformRandomProvider rng = ...;
> int[] big = new int[1000000];
> //
> // Fill big with lots of data
> //
> IntIterator iter = ShuffleIterators.create(rng, big);
> int x = iter.next();
> int y = iter.next();
> int z = iter.next();
>
> This doesn't exist but it is easy to do. Memory requirements would
> require a copy of the data, or it could be marked as destructive to the
> input array order and shuffle in place.
>
> If you want a random draw *with* replacement then you can just call
> nextInt(int) with the size of the array to pick something.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

sebb-2-2
I meant that the iterator would use the shuffled and/or selected
indices to return the appropriate entry from the original array.

No need to modify the original array.


On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <[hidden email]> wrote:

>
> An iterator that dynamically shuffles as you go along. That's really nice,
> I had never even thought of that. Thanks.
>
> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <[hidden email]>
> wrote:
>
> >
> > On 13/06/2019 17:56, Eric Barnhill wrote:
> > > On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
> > >
> > >>
> > >> Rather than shuffle etc in place, how about various
> > >> iterators/selectors to return entries in randomised order?
> > >> [Or does that already exist?]
> > >>
> > > I am pretty sure random draws, and shuffling, are implemented with
> > > different algorithms. Though sampling without replacement the full length
> > > of the set would yield a shuffled set, I think there are more efficient
> > > ways to shuffle a set.
> >
> > Iterators to return a random draw *without* replacement over the full
> > length of the array? The iterator would dynamically shuffle the array on
> > each call to next() so could be stopped early or can be called
> > infinitely as if a continuous stream. Is that your idea?
> >
> > UniformRandomProvider rng = ...;
> > int[] big = new int[1000000];
> > //
> > // Fill big with lots of data
> > //
> > IntIterator iter = ShuffleIterators.create(rng, big);
> > int x = iter.next();
> > int y = iter.next();
> > int z = iter.next();
> >
> > This doesn't exist but it is easy to do. Memory requirements would
> > require a copy of the data, or it could be marked as destructive to the
> > input array order and shuffle in place.
> >
> > If you want a random draw *with* replacement then you can just call
> > nextInt(int) with the size of the array to pick something.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Alex Herbert

On 14/06/2019 12:01, sebb wrote:
> I meant that the iterator would use the shuffled and/or selected
> indices to return the appropriate entry from the original array.
>
> No need to modify the original array.

To shuffle an array requires storage as elements are swapped. Either
store an index or store the data. So for primitive types with 4-bytes or
less storage it is more efficient (memory wise) to copy the array and
shuffle in place. For those with size of 8-bytes then you can create an
int array of indices and shuffle that. The index can be used to take the
sample from the original array.

In terms of the RNG library the functionality to shuffle an entire array
once (as per ArrayUtils) would be the same as the functions from [lang]
ArrayUtils. So a decision should be made on whether to copy that
functionality from [lang] and mark it as deprecated there.

The idea of unlimited shuffling would be done as a sampler so this would
go into the rng-sampling module. A single shuffle creates a permutation
of the sequence. The sample would do this dynamically and repeatedly
pass over the array.

The sort of code would be:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
DoubleContinuousPermutationSampler sampler = new
DoubleContinuousPermutationSampler(rng, array);
for (int i = 0; i < 100; i++) {
     double sample = sampler.next();
}

The output would be the n values of the array in a random order without
replacement. Once the entire set n is produced then the output would
repeat in a new random order, and so on:

2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ...

Currently the library does not specialise for primitives so a more
generic sampler would only output a stream of int indices:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
ContinuousPermutationSampler sampler = new
ContinuousPermutationSampler(rng, array.length);
for (int i = 0; i < 100; i++) {
     double sample = array[sampler.next()];
}

If you are only interested in a complete single pass then creating a
shuffled int[] of indices can be done using the existing
PermutationSampler. This returns an int[] permutation from its sample
method. The array can be used in place of an iterator:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng = ...;
for (int index : new PermutationSampler(rng, array.length,
array.length).sample()) {
     double sample = array[index];
}

I'm not opposed to the continuous permutation sampler idea. I just can't
think of a use case not already satisfied by the existing
PermutationSampler, i.e. when you want to sample permutations using each
index one at a time and not in chunks of indices. This type of idea:

double[] array = { 1.1, 2.2, 3.3 };
UniformRandomProvider rng;
// The PermutationSampler will validate the length is >0
final PermutationSampler sampler = new PermutationSampler(rng,
array.length, array.length);
IntSupplier supplier = new IntSupplier() {
     int i;
     int[] sample;
     @Override
     public int getAsInt() {
         if (i == 0) {
             sample = sampler.sample();
             i = sample.length;
         }
         return sample[--i];
     }
};
for (;;) {
     double sample = array[supplier.getAsInt()];
}


>
> On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <[hidden email]> wrote:
>> An iterator that dynamically shuffles as you go along. That's really nice,
>> I had never even thought of that. Thanks.
>>
>> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <[hidden email]>
>> wrote:
>>
>>> On 13/06/2019 17:56, Eric Barnhill wrote:
>>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
>>>>
>>>>> Rather than shuffle etc in place, how about various
>>>>> iterators/selectors to return entries in randomised order?
>>>>> [Or does that already exist?]
>>>>>
>>>> I am pretty sure random draws, and shuffling, are implemented with
>>>> different algorithms. Though sampling without replacement the full length
>>>> of the set would yield a shuffled set, I think there are more efficient
>>>> ways to shuffle a set.
>>> Iterators to return a random draw *without* replacement over the full
>>> length of the array? The iterator would dynamically shuffle the array on
>>> each call to next() so could be stopped early or can be called
>>> infinitely as if a continuous stream. Is that your idea?
>>>
>>> UniformRandomProvider rng = ...;
>>> int[] big = new int[1000000];
>>> //
>>> // Fill big with lots of data
>>> //
>>> IntIterator iter = ShuffleIterators.create(rng, big);
>>> int x = iter.next();
>>> int y = iter.next();
>>> int z = iter.next();
>>>
>>> This doesn't exist but it is easy to do. Memory requirements would
>>> require a copy of the data, or it could be marked as destructive to the
>>> input array order and shuffle in place.
>>>
>>> If you want a random draw *with* replacement then you can just call
>>> nextInt(int) with the size of the array to pick something.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

sebb-2-2
On Fri, 14 Jun 2019 at 12:57, Alex Herbert <[hidden email]> wrote:

>
>
> On 14/06/2019 12:01, sebb wrote:
> > I meant that the iterator would use the shuffled and/or selected
> > indices to return the appropriate entry from the original array.
> >
> > No need to modify the original array.
>
> To shuffle an array requires storage as elements are swapped. Either
> store an index or store the data. So for primitive types with 4-bytes or
> less storage it is more efficient (memory wise) to copy the array and
> shuffle in place. For those with size of 8-bytes then you can create an
> int array of indices and shuffle that. The index can be used to take the
> sample from the original array.
>
> In terms of the RNG library the functionality to shuffle an entire array
> once (as per ArrayUtils) would be the same as the functions from [lang]
> ArrayUtils. So a decision should be made on whether to copy that
> functionality from [lang] and mark it as deprecated there.
>
> The idea of unlimited shuffling would be done as a sampler so this would
> go into the rng-sampling module. A single shuffle creates a permutation
> of the sequence. The sample would do this dynamically and repeatedly
> pass over the array.
>
> The sort of code would be:
>
> double[] array = { 1.1, 2.2, 3.3 };
> UniformRandomProvider rng = ...;
> DoubleContinuousPermutationSampler sampler = new
> DoubleContinuousPermutationSampler(rng, array);
> for (int i = 0; i < 100; i++) {
>      double sample = sampler.next();
> }
>
> The output would be the n values of the array in a random order without
> replacement. Once the entire set n is produced then the output would
> repeat in a new random order, and so on:
>
> 2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ...
>
> Currently the library does not specialise for primitives so a more
> generic sampler would only output a stream of int indices:
>
> double[] array = { 1.1, 2.2, 3.3 };
> UniformRandomProvider rng = ...;
> ContinuousPermutationSampler sampler = new
> ContinuousPermutationSampler(rng, array.length);
> for (int i = 0; i < 100; i++) {
>      double sample = array[sampler.next()];
> }
>
> If you are only interested in a complete single pass then creating a
> shuffled int[] of indices can be done using the existing
> PermutationSampler. This returns an int[] permutation from its sample
> method. The array can be used in place of an iterator:
>
> double[] array = { 1.1, 2.2, 3.3 };
> UniformRandomProvider rng = ...;
> for (int index : new PermutationSampler(rng, array.length,
> array.length).sample()) {
>      double sample = array[index];
> }

My proposal was basically to implement the above in wrapper methods
using an interface such as List or Iterator.

> I'm not opposed to the continuous permutation sampler idea. I just can't
> think of a use case not already satisfied by the existing
> PermutationSampler, i.e. when you want to sample permutations using each
> index one at a time and not in chunks of indices. This type of idea:
>
> double[] array = { 1.1, 2.2, 3.3 };
> UniformRandomProvider rng;
> // The PermutationSampler will validate the length is >0
> final PermutationSampler sampler = new PermutationSampler(rng,
> array.length, array.length);
> IntSupplier supplier = new IntSupplier() {
>      int i;
>      int[] sample;
>      @Override
>      public int getAsInt() {
>          if (i == 0) {
>              sample = sampler.sample();
>              i = sample.length;
>          }
>          return sample[--i];
>      }
> };
> for (;;) {
>      double sample = array[supplier.getAsInt()];
> }
>
>
> >
> > On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <[hidden email]> wrote:
> >> An iterator that dynamically shuffles as you go along. That's really nice,
> >> I had never even thought of that. Thanks.
> >>
> >> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <[hidden email]>
> >> wrote:
> >>
> >>> On 13/06/2019 17:56, Eric Barnhill wrote:
> >>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
> >>>>
> >>>>> Rather than shuffle etc in place, how about various
> >>>>> iterators/selectors to return entries in randomised order?
> >>>>> [Or does that already exist?]
> >>>>>
> >>>> I am pretty sure random draws, and shuffling, are implemented with
> >>>> different algorithms. Though sampling without replacement the full length
> >>>> of the set would yield a shuffled set, I think there are more efficient
> >>>> ways to shuffle a set.
> >>> Iterators to return a random draw *without* replacement over the full
> >>> length of the array? The iterator would dynamically shuffle the array on
> >>> each call to next() so could be stopped early or can be called
> >>> infinitely as if a continuous stream. Is that your idea?
> >>>
> >>> UniformRandomProvider rng = ...;
> >>> int[] big = new int[1000000];
> >>> //
> >>> // Fill big with lots of data
> >>> //
> >>> IntIterator iter = ShuffleIterators.create(rng, big);
> >>> int x = iter.next();
> >>> int y = iter.next();
> >>> int z = iter.next();
> >>>
> >>> This doesn't exist but it is easy to do. Memory requirements would
> >>> require a copy of the data, or it could be marked as destructive to the
> >>> input array order and shuffle in place.
> >>>
> >>> If you want a random draw *with* replacement then you can just call
> >>> nextInt(int) with the size of the array to pick something.
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [hidden email]
> >>> For additional commands, e-mail: [hidden email]
> >>>
> >>>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

Alex Herbert

On 14/06/2019 13:29, sebb wrote:

> On Fri, 14 Jun 2019 at 12:57, Alex Herbert <[hidden email]> wrote:
>>
>> On 14/06/2019 12:01, sebb wrote:
>>> I meant that the iterator would use the shuffled and/or selected
>>> indices to return the appropriate entry from the original array.
>>>
>>> No need to modify the original array.
>> To shuffle an array requires storage as elements are swapped. Either
>> store an index or store the data. So for primitive types with 4-bytes or
>> less storage it is more efficient (memory wise) to copy the array and
>> shuffle in place. For those with size of 8-bytes then you can create an
>> int array of indices and shuffle that. The index can be used to take the
>> sample from the original array.
>>
>> In terms of the RNG library the functionality to shuffle an entire array
>> once (as per ArrayUtils) would be the same as the functions from [lang]
>> ArrayUtils. So a decision should be made on whether to copy that
>> functionality from [lang] and mark it as deprecated there.
>>
>> The idea of unlimited shuffling would be done as a sampler so this would
>> go into the rng-sampling module. A single shuffle creates a permutation
>> of the sequence. The sample would do this dynamically and repeatedly
>> pass over the array.
>>
>> The sort of code would be:
>>
>> double[] array = { 1.1, 2.2, 3.3 };
>> UniformRandomProvider rng = ...;
>> DoubleContinuousPermutationSampler sampler = new
>> DoubleContinuousPermutationSampler(rng, array);
>> for (int i = 0; i < 100; i++) {
>>       double sample = sampler.next();
>> }
>>
>> The output would be the n values of the array in a random order without
>> replacement. Once the entire set n is produced then the output would
>> repeat in a new random order, and so on:
>>
>> 2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ...
>>
>> Currently the library does not specialise for primitives so a more
>> generic sampler would only output a stream of int indices:
>>
>> double[] array = { 1.1, 2.2, 3.3 };
>> UniformRandomProvider rng = ...;
>> ContinuousPermutationSampler sampler = new
>> ContinuousPermutationSampler(rng, array.length);
>> for (int i = 0; i < 100; i++) {
>>       double sample = array[sampler.next()];
>> }
>>
>> If you are only interested in a complete single pass then creating a
>> shuffled int[] of indices can be done using the existing
>> PermutationSampler. This returns an int[] permutation from its sample
>> method. The array can be used in place of an iterator:
>>
>> double[] array = { 1.1, 2.2, 3.3 };
>> UniformRandomProvider rng = ...;
>> for (int index : new PermutationSampler(rng, array.length,
>> array.length).sample()) {
>>       double sample = array[index];
>> }
> My proposal was basically to implement the above in wrapper methods
> using an interface such as List or Iterator.
You then have boxing of primitives. Commons RNG is not on JDK 8 so we
cannot use the primitive specialisations of iterator for int, long and
double.

The basic Iterator does not support more functionality than a for loop
over a primitive array. The extra from iterator is remove() which is not
functionality we are discussing.

This would be nice:

IntStream PermutationSampler.stream(UniformRandomProvider rng, int n)

The stream would be shuffled indices. Without JDK 8 you use the 1 liner:

IntStream.of(new PermutationSampler(rng, array.length, array.length).sample());

But with the lack of support for Stream materialisation when a
terminating operation is called (i.e. the shuffle will be done even when
the stream is not consumed). Ensuring lazy initialisation would require
some atomic state in a PrimitiveIterator.OfInt implementation used to
construct the stream. Any speed advantage of parallel streams would not
help the shuffle as the full shuffle must be done before sub sections of
the shuffled array can be used.

I'd put all this on the maybe pile.

>
>> I'm not opposed to the continuous permutation sampler idea. I just can't
>> think of a use case not already satisfied by the existing
>> PermutationSampler, i.e. when you want to sample permutations using each
>> index one at a time and not in chunks of indices. This type of idea:
>>
>> double[] array = { 1.1, 2.2, 3.3 };
>> UniformRandomProvider rng;
>> // The PermutationSampler will validate the length is >0
>> final PermutationSampler sampler = new PermutationSampler(rng,
>> array.length, array.length);
>> IntSupplier supplier = new IntSupplier() {
>>       int i;
>>       int[] sample;
>>       @Override
>>       public int getAsInt() {
>>           if (i == 0) {
>>               sample = sampler.sample();
>>               i = sample.length;
>>           }
>>           return sample[--i];
>>       }
>> };
>> for (;;) {
>>       double sample = array[supplier.getAsInt()];
>> }
>>
>>
>>> On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <[hidden email]> wrote:
>>>> An iterator that dynamically shuffles as you go along. That's really nice,
>>>> I had never even thought of that. Thanks.
>>>>
>>>> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <[hidden email]>
>>>> wrote:
>>>>
>>>>> On 13/06/2019 17:56, Eric Barnhill wrote:
>>>>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
>>>>>>
>>>>>>> Rather than shuffle etc in place, how about various
>>>>>>> iterators/selectors to return entries in randomised order?
>>>>>>> [Or does that already exist?]
>>>>>>>
>>>>>> I am pretty sure random draws, and shuffling, are implemented with
>>>>>> different algorithms. Though sampling without replacement the full length
>>>>>> of the set would yield a shuffled set, I think there are more efficient
>>>>>> ways to shuffle a set.
>>>>> Iterators to return a random draw *without* replacement over the full
>>>>> length of the array? The iterator would dynamically shuffle the array on
>>>>> each call to next() so could be stopped early or can be called
>>>>> infinitely as if a continuous stream. Is that your idea?
>>>>>
>>>>> UniformRandomProvider rng = ...;
>>>>> int[] big = new int[1000000];
>>>>> //
>>>>> // Fill big with lots of data
>>>>> //
>>>>> IntIterator iter = ShuffleIterators.create(rng, big);
>>>>> int x = iter.next();
>>>>> int y = iter.next();
>>>>> int z = iter.next();
>>>>>
>>>>> This doesn't exist but it is easy to do. Memory requirements would
>>>>> require a copy of the data, or it could be marked as destructive to the
>>>>> input array order and shuffle in place.
>>>>>
>>>>> If you want a random draw *with* replacement then you can just call
>>>>> nextInt(int) with the size of the array to pick something.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [hidden email]
>>>>> For additional commands, e-mail: [hidden email]
>>>>>
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
Reply | Threaded
Open this post in threaded view
|

Re: [lang][rng] org.apache.commons.lang3.ArrayUtils.shuffle()

garydgregory
+1 to updating RNG to Java 8

Gary

On Fri, Jun 14, 2019 at 10:45 AM Alex Herbert <[hidden email]>
wrote:

>
> On 14/06/2019 13:29, sebb wrote:
> > On Fri, 14 Jun 2019 at 12:57, Alex Herbert <[hidden email]>
> wrote:
> >>
> >> On 14/06/2019 12:01, sebb wrote:
> >>> I meant that the iterator would use the shuffled and/or selected
> >>> indices to return the appropriate entry from the original array.
> >>>
> >>> No need to modify the original array.
> >> To shuffle an array requires storage as elements are swapped. Either
> >> store an index or store the data. So for primitive types with 4-bytes or
> >> less storage it is more efficient (memory wise) to copy the array and
> >> shuffle in place. For those with size of 8-bytes then you can create an
> >> int array of indices and shuffle that. The index can be used to take the
> >> sample from the original array.
> >>
> >> In terms of the RNG library the functionality to shuffle an entire array
> >> once (as per ArrayUtils) would be the same as the functions from [lang]
> >> ArrayUtils. So a decision should be made on whether to copy that
> >> functionality from [lang] and mark it as deprecated there.
> >>
> >> The idea of unlimited shuffling would be done as a sampler so this would
> >> go into the rng-sampling module. A single shuffle creates a permutation
> >> of the sequence. The sample would do this dynamically and repeatedly
> >> pass over the array.
> >>
> >> The sort of code would be:
> >>
> >> double[] array = { 1.1, 2.2, 3.3 };
> >> UniformRandomProvider rng = ...;
> >> DoubleContinuousPermutationSampler sampler = new
> >> DoubleContinuousPermutationSampler(rng, array);
> >> for (int i = 0; i < 100; i++) {
> >>       double sample = sampler.next();
> >> }
> >>
> >> The output would be the n values of the array in a random order without
> >> replacement. Once the entire set n is produced then the output would
> >> repeat in a new random order, and so on:
> >>
> >> 2.2, 1.1, 3.3, 3.3, 2.2, 1.1, 2.2, 3.3, 1.1, ...
> >>
> >> Currently the library does not specialise for primitives so a more
> >> generic sampler would only output a stream of int indices:
> >>
> >> double[] array = { 1.1, 2.2, 3.3 };
> >> UniformRandomProvider rng = ...;
> >> ContinuousPermutationSampler sampler = new
> >> ContinuousPermutationSampler(rng, array.length);
> >> for (int i = 0; i < 100; i++) {
> >>       double sample = array[sampler.next()];
> >> }
> >>
> >> If you are only interested in a complete single pass then creating a
> >> shuffled int[] of indices can be done using the existing
> >> PermutationSampler. This returns an int[] permutation from its sample
> >> method. The array can be used in place of an iterator:
> >>
> >> double[] array = { 1.1, 2.2, 3.3 };
> >> UniformRandomProvider rng = ...;
> >> for (int index : new PermutationSampler(rng, array.length,
> >> array.length).sample()) {
> >>       double sample = array[index];
> >> }
> > My proposal was basically to implement the above in wrapper methods
> > using an interface such as List or Iterator.
> You then have boxing of primitives. Commons RNG is not on JDK 8 so we
> cannot use the primitive specialisations of iterator for int, long and
> double.
>
> The basic Iterator does not support more functionality than a for loop
> over a primitive array. The extra from iterator is remove() which is not
> functionality we are discussing.
>
> This would be nice:
>
> IntStream PermutationSampler.stream(UniformRandomProvider rng, int n)
>
> The stream would be shuffled indices. Without JDK 8 you use the 1 liner:
>
> IntStream.of(new PermutationSampler(rng, array.length,
> array.length).sample());
>
> But with the lack of support for Stream materialisation when a
> terminating operation is called (i.e. the shuffle will be done even when
> the stream is not consumed). Ensuring lazy initialisation would require
> some atomic state in a PrimitiveIterator.OfInt implementation used to
> construct the stream. Any speed advantage of parallel streams would not
> help the shuffle as the full shuffle must be done before sub sections of
> the shuffled array can be used.
>
> I'd put all this on the maybe pile.
>
> >
> >> I'm not opposed to the continuous permutation sampler idea. I just can't
> >> think of a use case not already satisfied by the existing
> >> PermutationSampler, i.e. when you want to sample permutations using each
> >> index one at a time and not in chunks of indices. This type of idea:
> >>
> >> double[] array = { 1.1, 2.2, 3.3 };
> >> UniformRandomProvider rng;
> >> // The PermutationSampler will validate the length is >0
> >> final PermutationSampler sampler = new PermutationSampler(rng,
> >> array.length, array.length);
> >> IntSupplier supplier = new IntSupplier() {
> >>       int i;
> >>       int[] sample;
> >>       @Override
> >>       public int getAsInt() {
> >>           if (i == 0) {
> >>               sample = sampler.sample();
> >>               i = sample.length;
> >>           }
> >>           return sample[--i];
> >>       }
> >> };
> >> for (;;) {
> >>       double sample = array[supplier.getAsInt()];
> >> }
> >>
> >>
> >>> On Thu, 13 Jun 2019 at 18:15, Eric Barnhill <[hidden email]>
> wrote:
> >>>> An iterator that dynamically shuffles as you go along. That's really
> nice,
> >>>> I had never even thought of that. Thanks.
> >>>>
> >>>> On Thu, Jun 13, 2019 at 10:11 AM Alex Herbert <
> [hidden email]>
> >>>> wrote:
> >>>>
> >>>>> On 13/06/2019 17:56, Eric Barnhill wrote:
> >>>>>> On Thu, Jun 13, 2019 at 9:36 AM sebb <[hidden email]> wrote:
> >>>>>>
> >>>>>>> Rather than shuffle etc in place, how about various
> >>>>>>> iterators/selectors to return entries in randomised order?
> >>>>>>> [Or does that already exist?]
> >>>>>>>
> >>>>>> I am pretty sure random draws, and shuffling, are implemented with
> >>>>>> different algorithms. Though sampling without replacement the full
> length
> >>>>>> of the set would yield a shuffled set, I think there are more
> efficient
> >>>>>> ways to shuffle a set.
> >>>>> Iterators to return a random draw *without* replacement over the full
> >>>>> length of the array? The iterator would dynamically shuffle the
> array on
> >>>>> each call to next() so could be stopped early or can be called
> >>>>> infinitely as if a continuous stream. Is that your idea?
> >>>>>
> >>>>> UniformRandomProvider rng = ...;
> >>>>> int[] big = new int[1000000];
> >>>>> //
> >>>>> // Fill big with lots of data
> >>>>> //
> >>>>> IntIterator iter = ShuffleIterators.create(rng, big);
> >>>>> int x = iter.next();
> >>>>> int y = iter.next();
> >>>>> int z = iter.next();
> >>>>>
> >>>>> This doesn't exist but it is easy to do. Memory requirements would
> >>>>> require a copy of the data, or it could be marked as destructive to
> the
> >>>>> input array order and shuffle in place.
> >>>>>
> >>>>> If you want a random draw *with* replacement then you can just call
> >>>>> nextInt(int) with the size of the array to pick something.
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: [hidden email]
> >>>>> For additional commands, e-mail: [hidden email]
> >>>>>
> >>>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [hidden email]
> >>> For additional commands, e-mail: [hidden email]
> >>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>