Hello.
The class "ComplexUtils" deal with multi-dimensional arrays that hold instances of the "Complex" class. I've recently encountered a use-case where it was pointed out that storing many "Complex" instances (as seems the purpose of these utilities) is quite inefficient memory-wise as each instance will take 32 bytes[1] while the real and imaginary parts would only take 16 bytes as 2 primitive "double"s. This is compounded by multi-dimensional array where each sub-dimensional element is an array object (and thus takes another additional 16 bytes). For example, a double[10][5][4] would take 16 * (1 + 10 * 5) + 10 * 5 * 4 * 8 = 2416 bytes. Assuming that in the above array, the last dimension holds 2 complex numbers, the same data can be represented as Complex[10][5][2] that would take 16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8 = 4016 bytes. In both cases, the payload (10 * 5 * 2 complex numbers) is 10 * 5 * 2 * 2 * 8 = 1600 bytes. If stored in a one-dimensional array, the size in memory would be 1616 bytes. Thus in the case of a data cube holding 100 complex numbers, a 3D array takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D array. If this is correct, I wonder whether we should advertise such a "ComplexUtils" class. It would perhaps be preferable to propose a data cube abstraction.[2] WDYT? Regards, Gilles [1] https://www.baeldung.com/java-size-of-object [2] Based on http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html --------------------------------------------------------------------- To unsubscribe, e-mail: [hidden email] For additional commands, e-mail: [hidden email] |
That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space-efficient, and the
status of a number as real or imaginary is implicit in its location being
odd or even.

The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind(). It allows for 1D vectorizing of
certain functions, which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.

On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <[hidden email]> wrote:
> [original message quoted in full; snipped]
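As a concrete illustration of the interleaved layout described above, here is a sketch of Matlab-style sub2ind indexing into a row-major 2D grid of complex numbers; the names sub2ind, maxX and the grid sizes are illustrative only:

```java
public class InterleavedDemo {
    // Matlab-style sub2ind: linear index of element (x, y) in a row-major grid
    static int sub2ind(int x, int y, int maxX) {
        return y * maxX + x;
    }

    public static void main(String[] args) {
        int maxX = 4;
        int maxY = 3;
        // Interleaved storage: [re0, im0, re1, im1, ...]
        double[] data = new double[2 * maxX * maxY];
        int i = sub2ind(2, 1, maxX);   // element (x=2, y=1)
        data[2 * i] = 3.0;             // real part at the even offset
        data[2 * i + 1] = -1.0;        // imaginary part at the odd offset
        System.out.println(data[2 * i] + " " + data[2 * i + 1]);
        // prints: 3.0 -1.0
    }
}
```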
On 05/11/2019 00:09, Eric Barnhill wrote:
> That's interesting. The JTransforms library performs Fourier transforms
> that can take complex input, output, or both. They do this with interleaved
> double[] arrays, which I suppose is much more space efficient, and the
> status of a number as real or imaginary is implicit in its location being
> odd or even.

The packing of multi-dimensional structures into a single array is common
in image processing, where 2D data is routinely packed as a single array.
It is computationally efficient when operating on the entire set of
numbers. In this case even random access to the (x,y) index as:

2D: [x][y]
1D: [y * maxx + x]

is comparable in speed, as the latter avoids the second array dereference.

A single array has advantages for a copy operation as it requires a
single .clone() method call. The disadvantage is that the ultimate size
of the multi-dimensional structure is limited by the maximum size of an
array.

In the case of complex numbers the use of alternating indices to store
real and imaginary allows the array to be easily resized and extended.
Operations that process all positions only need to cache in memory a
sub-section of the array.

[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block in
either 1 array or 2 arrays:

[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or the
imaginary components on their own. It is a bit more work to extend the
data as it requires two copies of the existing data and packing of the
new values at the appropriate positions.

For small sizes the use of 2 arrays would be less efficient memory-wise.
It would require some type of pointer to hold the pair. However it
could store double the amount of complex numbers.

> The MultidimensionalCounter functionality you mention is for example known
> in Matlab as ind2sub() and sub2ind(). It allows for 1d vectorizing of
> certain functions which can improve performance.
> Some people swear by it. I
> always found it an additional mental exercise that I didn't want to occupy
> myself with unless I absolutely had to. So, maybe it makes sense to
> "advertise" this approach like you say, but some users may just want the 3D
> arrays for more rapid prototyping-ish applications.
>
> I wonder if there isn't some streaming solution for this -- the numbers are
> stored as an interleaved 1D array, but are streamed through a Complex
> constructor before any needed operations are performed.

My initial thought is: not as double values. A DoubleStream processes
single values. To collect the real and imaginary components would
require that the stream can be processed two at a time.

This structure would work:

class ComplexArray {
    // Must be even in length. Packed as (real, imaginary)
    double[] data;
    int size;

    Stream<Complex> stream() {
        return IntStream.range(0, size / 2)
            .mapToObj(i -> Complex.ofCartesian(data[i*2], data[i*2+1]));
    }

    void add(Complex c) {
        checkCapacity(2);
        data[size++] = c.getReal();
        data[size++] = c.getImaginary();
    }

    void combine(ComplexArray c) {
        checkCapacity(c.size);
        System.arraycopy(c.data, 0, data, size, c.size);
        size += c.size;
    }

    private void checkCapacity(int length) {
        final int minCapacity = size + length;
        final int oldCapacity = data.length;
        if (minCapacity > oldCapacity) {
            int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            data = Arrays.copyOf(data, newCapacity);
        }
    }
}

> And I guess that leads to my last question -- suppose someone wants to call
> a trig function on a series of Complex numbers. Now let's imagine the
> primitives have been stored in some maximally efficient way. It seems like,
> to use any of the functionality in Complex, these numbers would have to be
> unpacked, cast as Complex, operated on, then cast back to how they are
> being stored. I wonder if that would prove to be more efficient in the end.
Probably not more efficient to create all the Complex instances. Ideally
you would operate on the primitive data directly.

Looking at what Complex is capable of doing, it seems that there are a
lot of operations that would have to be duplicated to allow operations
to work on primitives. However you can create the API and implement it
lazily using the above object, e.g.

public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> mapper) {
    return array.stream()
        .map(mapper)
        .collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine);
}

public static ComplexArray log(ComplexArray array) {
    return apply(array, Complex::log);
}

This does create a new Complex object for each element for every function
applied to the data.

The other methods that accept another Complex as an argument would have
to be handled differently, as you cannot iterate two streams (one for each
ComplexArray) in parallel. These are add, divide, multiply, subtract. So
this would require the computation be moved inside ComplexArray:

ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> function) {
    // TODO: argument checks on size
    return IntStream.range(0, size / 2)
        .mapToObj(i -> {
            Complex c1 = Complex.ofCartesian(data[i*2], data[i*2+1]);
            Complex c2 = Complex.ofCartesian(other.data[i*2], other.data[i*2+1]);
            return function.apply(c1, c2);
        }).collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine);
}

ComplexArray a1 = ...;
ComplexArray a2 = ...;
ComplexArray a3 = a1.apply(a2, Complex::add);

Note: To avoid creating new objects for all computations would require a
rewrite of Complex so that the computations are done by static methods
that accept a supplier of real and imaginary components, compute the
calculation and send the answer to a function that receives the new
components, for example:

interface ComplexData {
    double getReal();
    double getImaginary();
}

interface ToComplexData {
    ComplexData create(double real, double imaginary);
}

// For example
public static ComplexData conjugate(ComplexData in, ToComplexData out) {
    return out.create(in.getReal(), -in.getImaginary());
}

// This would make the existing API method
public Complex conjugate() {
    return (Complex) conjugate(this, Complex::ofCartesian);
}

This would allow the ComplexArray to provide an object that implements
ComplexData to read the current datapoint, and an object that implements
ToComplexData to write back to the equivalent datapoint in a result,
adding something like this to the ComplexArray object using inner
classes to read and write to the data array:

class ComplexDataReader implements ComplexData {
    int pos;
    public double getReal() { return data[pos*2]; }
    public double getImaginary() { return data[pos*2+1]; }
}

class ComplexDataAppender implements ToComplexData {
    public ComplexData create(double real, double imaginary) {
        data[size++] = real;
        data[size++] = imaginary;
        return null;
    }
}

ComplexDataReader getDataReader() { return new ComplexDataReader(); }
ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }

ComplexArray apply(BiFunction<ComplexData, ToComplexData, ComplexData> function) {
    ComplexDataReader in = getDataReader();
    // Initialise the capacity
    ComplexArray result = new ComplexArray(size);
    ComplexDataAppender out = result.getDataAppender();
    while (in.pos < size / 2) {
        function.apply(in, out);
        in.pos++;
    }
    return result;
}

// Apply the function from above:
// public static ComplexData conjugate(ComplexData in, ToComplexData out)

ComplexArray a1 = ...;
ComplexArray a2 = a1.apply(Complex::conjugate);

This allows a user to pass a function that can do anything to the
current data to create new real and imaginary components, which are
stored in a new array object. I do not know how this would affect
performance without a full comparison of the API implemented with both
concepts.

If the storage of the array of complex data were abstracted it may lead
to simplification of ComplexUtils.
Each of the methods to convert from multi-dimensional arrays of Complex
can be replaced by a single method to extract the required data from the
ComplexArray.

I can see at least the simple abstraction as useful. Performance
improvements using in-place reading of the real and imaginary components
for computations can be added later. It would be easier to do this if
the ComplexArray is in the same package, allowing package-private methods
to be added to Complex to do computations.

Alex

> On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <[hidden email]> wrote:
>> [original message quoted in full; snipped]
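The stream-and-collect idea sketched above can be exercised end to end. In this self-contained version, MyComplex and PackedComplexArray are toy stand-ins for the Complex class and the ComplexArray sketch (the real code would use org.apache.commons.numbers.complex.Complex):

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Toy stand-in for Complex so the sketch compiles on its own.
final class MyComplex {
    final double re;
    final double im;
    private MyComplex(double re, double im) { this.re = re; this.im = im; }
    static MyComplex ofCartesian(double re, double im) { return new MyComplex(re, im); }
    MyComplex conjugate() { return ofCartesian(re, -im); }
}

// Minimal interleaved-backed array following the sketch in the email.
class PackedComplexArray {
    // Interleaved (real, imaginary) pairs; size is always even.
    double[] data = new double[8];
    int size;

    Stream<MyComplex> stream() {
        return IntStream.range(0, size / 2)
            .mapToObj(i -> MyComplex.ofCartesian(data[2 * i], data[2 * i + 1]));
    }

    void add(MyComplex c) {
        if (size + 2 > data.length) {
            data = Arrays.copyOf(data, Math.max(size + 2, (data.length * 3) / 2 + 1));
        }
        data[size++] = c.re;
        data[size++] = c.im;
    }

    void combine(PackedComplexArray other) {
        other.stream().forEach(this::add);
    }

    // Lazily implement an element-wise operation via the stream.
    static PackedComplexArray map(PackedComplexArray in, UnaryOperator<MyComplex> op) {
        return in.stream().map(op).collect(PackedComplexArray::new,
            PackedComplexArray::add, PackedComplexArray::combine);
    }

    double real(int i) { return data[2 * i]; }
    double imag(int i) { return data[2 * i + 1]; }

    public static void main(String[] args) {
        PackedComplexArray a = new PackedComplexArray();
        a.add(MyComplex.ofCartesian(1, 2));
        a.add(MyComplex.ofCartesian(3, -4));
        PackedComplexArray b = map(a, MyComplex::conjugate);
        System.out.println(b.real(0) + " " + b.imag(0)); // 1.0 -2.0
        System.out.println(b.real(1) + " " + b.imag(1)); // 3.0 4.0
    }
}
```

Note that, as discussed in the thread, each mapped element still allocates a transient MyComplex; only the backing storage avoids per-element objects.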
Hello.
Le mar. 5 nov. 2019 à 18:38, Alex Herbert <[hidden email]> a écrit :
>
> On 05/11/2019 00:09, Eric Barnhill wrote:
> > That's interesting. The JTransforms library performs Fourier transforms
> > that can take complex input, output, or both. They do this with interleaved
> > double[] arrays, which I suppose is much more space efficient, and the
> > status of a number as real or imaginary is implicit in its location being
> > odd or even.
>
> The packing of multi-dimensional structures into a single array is
> common in image processing, where 2D data is routinely packed as a
> single array. It is computationally efficient when operating on the
> entire set of numbers. In this case even random access to the (x,y)
> index as:
>
> 2D: [x][y]
> 1D: [y * maxx + x]
>
> is comparable in speed, as the latter avoids the second array dereference.

I suspected that it might be the case.
If you have benchmark data that confirm it, then is there ever any
advantage to a multidimensional array?

> A single array has advantages for a copy operation as it requires a
> single .clone() method call. The disadvantage is that the ultimate size
> of the multi-dimensional structure is limited by the maximum size of an
> array.

That's ~17 GB, and more than 1 billion complex numbers.
And if there are use-cases that require very large data cubes, then the
abstraction would allow adding further indirections.

> In the case of complex numbers the use of alternating indices to store
> real and imaginary allows the array to be easily resized and extended.
> Operations that process all positions only need to cache in memory a
> sub-section of the array.
>
> [re1, im1, re2, im2, re3, im3]
>
> The alternative is to store real and imaginary in a large block in
> either 1 array or 2 arrays:
>
> [re1, re2, re3, im1, im2, im3]
>
> [re1, re2, re3]
> [im1, im2, im3]
>
> This has advantages for operations that require either the real or the
> imaginary components on their own.
> It is a bit more work to extend the
> data as it requires two copies of the existing data and packing of the
> new values at the appropriate positions.
>
> For small sizes the use of 2 arrays would be less efficient memory-wise.
> It would require some type of pointer to hold the pair. However it
> could store double the amount of complex numbers.
>
> > The MultidimensionalCounter functionality you mention is for example known
> > in Matlab as ind2sub() and sub2ind(). It allows for 1d vectorizing of
> > certain functions which can improve performance. Some people swear by it. I
> > always found it an additional mental exercise that I didn't want to occupy
> > myself with unless I absolutely had to. So, maybe it makes sense to
> > "advertise" this approach like you say, but some users may just want the 3D
> > arrays for more rapid prototyping-ish applications.

As Alex has shown, the 1D representation is abstracted away: The data is
accessed as a multi-dimensional array and the user never needs to remember
that it is interleaved.

> > I wonder if there isn't some streaming solution for this -- the numbers are
> > stored as an interleaved 1D array, but are streamed through a Complex
> > constructor before any needed operations are performed.
>
> My initial thought is: not as double values. A DoubleStream processes
> single values. To collect the real and imaginary components would
> require that the stream can be processed two at a time.
>
> This structure would work:
>
> class ComplexArray {
>     // Must be even in length. Packed as (real, imaginary)
>     double[] data;
>     int size;
>
>     Stream<Complex> stream() {
>         return IntStream.range(0, size / 2)
>             .mapToObj(i -> Complex.ofCartesian(data[i*2], data[i*2+1]));
>     }
>
>     void add(Complex c) {
>         checkCapacity(2);
>         data[size++] = c.getReal();
>         data[size++] = c.getImaginary();
>     }

Perhaps "store" would be less confusing (?).
>
>     void combine(ComplexArray c) {
>         checkCapacity(c.size);
>         System.arraycopy(c.data, 0, data, size, c.size);
>         size += c.size;
>     }
>
>     private void checkCapacity(int length) {
>         final int minCapacity = size + length;
>         final int oldCapacity = data.length;
>         if (minCapacity > oldCapacity) {
>             int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
>             if (newCapacity < minCapacity) {
>                 newCapacity = minCapacity;
>             }
>             data = Arrays.copyOf(data, newCapacity);
>         }
>     }
> }

To allow easy usage as a multidimensional array, the API should contain
the "MultidimensionalCounter"[1] or equivalent:

public class ComplexArray {
    private final MultidimensionalCounter counter;
    // ...

    public ComplexArray(int... dimensions) {
        counter = new MultidimensionalCounter(dimensions);
        data = new double[2 * counter.getSize()];
    }

    /**
     * @param indices Storage location.
     * @return the complex number stored at the given location.
     */
    public Complex get(int... indices) {
        final int i = counter.getCount(indices);
        return Complex.ofCartesian(data[2 * i], data[2 * i + 1]);
    }
}

Or perhaps the data cube should be an additional abstraction on top of
"ComplexArray" (?).

By the way, I'd suggest to find a more descriptive name for
"ComplexArray" (since the 1D array is an "implementation detail").

> > And I guess that leads to my last question -- suppose someone wants to call
> > a trig function on a series of Complex numbers. Now let's imagine the
> > primitives have been stored in some maximally efficient way. It seems like,
> > to use any of the functionality in Complex, these numbers would have to be
> > unpacked, cast as Complex, operated on, then cast back to how they are
> > being stored. I wonder if that would prove to be more efficient in the end.

Anything will be more efficient than swapping when getting near to the
max memory limit.

> Probably not more efficient to create all the Complex instances. Ideally
> you would operate on the primitive data directly.

This could certainly go to the wish-list (with the use-case being the
"ComplexArray").
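To make the index mapping concrete, here is a minimal row-major stand-in for the MultidimensionalCounter discussed above (illustrative only; the real Commons Math class also offers argument checking, iteration, and the inverse index-to-indices mapping):

```java
// Maps multidimensional indices to a single row-major index,
// i.e. the last dimension varies fastest.
class RowMajorCounter {
    private final int[] dims;

    RowMajorCounter(int... dims) {
        this.dims = dims.clone();
    }

    // Total number of elements in the data cube.
    int getSize() {
        int size = 1;
        for (int d : dims) {
            size *= d;
        }
        return size;
    }

    // Unidimensional index of the given multidimensional indices.
    int getCount(int... indices) {
        int index = 0;
        for (int i = 0; i < dims.length; i++) {
            index = index * dims[i] + indices[i];
        }
        return index;
    }

    public static void main(String[] args) {
        RowMajorCounter c = new RowMajorCounter(10, 5, 2);
        System.out.println(c.getSize());          // 100
        System.out.println(c.getCount(0, 0, 1));  // 1
        System.out.println(c.getCount(1, 0, 0));  // 10
        System.out.println(c.getCount(9, 4, 1));  // 99
    }
}
```

A get(int... indices) method can then read the interleaved backing array at positions 2*index and 2*index + 1, as in the sketch above.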
> Looking at what Complex is capable of doing, it seems that there are a
> lot of operations that would have to be duplicated to allow operations
> to work on primitives. However you can create the API and implement it
> lazily using the above object, e.g.
>
> public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> mapper) {
>     return array.stream()
>         .map(mapper)
>         .collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine);
> }
>
> public static ComplexArray log(ComplexArray array) {
>     return apply(array, Complex::log);
> }
>
> This does create a new Complex object for each element for every function
> applied to the data.
>
> The other methods that accept another Complex as an argument would have
> to be handled differently, as you cannot iterate two streams (one for each
> ComplexArray) in parallel. These are add, divide, multiply, subtract. So
> this would require the computation be moved inside ComplexArray:
>
> ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> function) {
>     // TODO: argument checks on size
>     return IntStream.range(0, size / 2)
>         .mapToObj(i -> {
>             Complex c1 = Complex.ofCartesian(data[i*2], data[i*2+1]);
>             Complex c2 = Complex.ofCartesian(other.data[i*2], other.data[i*2+1]);
>             return function.apply(c1, c2);
>         }).collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine);
> }
>
> ComplexArray a1 = ...;
> ComplexArray a2 = ...;
> ComplexArray a3 = a1.apply(a2, Complex::add);
>
> Note: To avoid creating new objects for all computations would require a
> rewrite of Complex so that the computations are done by static methods
> that accept a supplier of real and imaginary components, compute the
> calculation and send the answer to a function that receives the new
> components, for example:
>
> interface ComplexData {
>     double getReal();
>     double getImaginary();
> }
>
> interface ToComplexData {
>     ComplexData create(double real, double imaginary);
> }
>
> // For example
> public static ComplexData conjugate(ComplexData in, ToComplexData out) {
>     return out.create(in.getReal(), -in.getImaginary());
> }
>
> // This would make the existing API method
> public Complex conjugate() {
>     return (Complex) conjugate(this, Complex::ofCartesian);
> }
>
> This would allow the ComplexArray to provide an object that implements
> ComplexData to read the current datapoint, and an object that implements
> ToComplexData to write back to the equivalent datapoint in a result,
> adding something like this to the ComplexArray object using inner
> classes to read and write to the data array:
>
> class ComplexDataReader implements ComplexData {
>     int pos;
>     public double getReal() { return data[pos*2]; }
>     public double getImaginary() { return data[pos*2+1]; }
> }
>
> class ComplexDataAppender implements ToComplexData {
>     public ComplexData create(double real, double imaginary) {
>         data[size++] = real;
>         data[size++] = imaginary;
>         return null;
>     }
> }
>
> ComplexDataReader getDataReader() { return new ComplexDataReader(); }
> ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); }
>
> ComplexArray apply(BiFunction<ComplexData, ToComplexData, ComplexData> function) {
>     ComplexDataReader in = getDataReader();
>     // Initialise the capacity
>     ComplexArray result = new ComplexArray(size);
>     ComplexDataAppender out = result.getDataAppender();
>     while (in.pos < size / 2) {
>         function.apply(in, out);
>         in.pos++;
>     }
>     return result;
> }
>
> // Apply the function from above:
> // public static ComplexData conjugate(ComplexData in, ToComplexData out)
>
> ComplexArray a1 = ...;
> ComplexArray a2 = a1.apply(Complex::conjugate);
>
> This allows a user to pass a function that can do anything to the
> current data to create new real and imaginary components, which are
> stored in a new array object. I do not know how this would affect
> performance without a full comparison of the API implemented with both
> concepts.
>
> If the storage of the array of complex data were abstracted it may lead
> to simplification of ComplexUtils. Each of the methods to convert from
> multi-dimensional arrays of Complex can be replaced by a single method
> to extract the required data from the ComplexArray.
>
> I can see at least the simple abstraction as useful.

+1

> Performance
> improvements using in-place reading of the real and imaginary components
> for computations can be added later.

+1

> It would be easier to do this if
> the ComplexArray is in the same package, allowing package-private methods
> to be added to Complex to do computations.

+1

> Alex

Regards,
Gilles

[1] http://commons.apache.org/proper/commons-math/apidocs/index.html

> > On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <[hidden email]> wrote:
> >> [original message quoted in full; snipped]
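The capacity rule in the checkCapacity sketch quoted above (grow by roughly 1.5x, but never below the requested minimum) can be isolated and checked on its own; the class name is illustrative only:

```java
import java.util.Arrays;

// The growth rule from the checkCapacity sketch, in isolation.
class GrowableDoubles {
    double[] data = new double[4];
    int size;

    // Ensure room for 'extra' more values, growing by ~1.5x when needed.
    void ensureCapacity(int extra) {
        final int minCapacity = size + extra;
        final int oldCapacity = data.length;
        if (minCapacity > oldCapacity) {
            int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            data = Arrays.copyOf(data, newCapacity);
        }
    }

    public static void main(String[] args) {
        GrowableDoubles g = new GrowableDoubles();
        g.size = 4;                 // the array is full
        g.ensureCapacity(2);        // 4 * 3 / 2 + 1 = 7
        System.out.println(g.data.length); // 7
        g.size = 7;
        g.ensureCapacity(100);      // 1.5x is not enough; jump to the minimum
        System.out.println(g.data.length); // 107
    }
}
```

Amortised growth keeps repeated add() calls cheap, at the cost of the array sometimes being up to ~50% larger than the stored data.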
On 06/11/2019 12:41, Gilles Sadowski wrote:
> Hello.
>
> Le mar. 5 nov. 2019 à 18:38, Alex Herbert <[hidden email]> a écrit :
>> On 05/11/2019 00:09, Eric Barnhill wrote:
>>> That's interesting. The JTransforms library performs Fourier transforms
>>> that can take complex input, output, or both. They do this with interleaved
>>> double[] arrays, which I suppose is much more space efficient, and the
>>> status of a number as real or imaginary is implicit in its location being
>>> odd or even.
>> The packing of multi-dimensional structures into a single array is
>> common in image processing, where 2D data is routinely packed as a
>> single array. It is computationally efficient when operating on the
>> entire set of numbers. In this case even random access to the (x,y)
>> index as:
>>
>> 2D: [x][y]
>> 1D: [y * maxx + x]
>>
>> is comparable in speed, as the latter avoids the second array dereference.
> I suspected that it might be the case.
> If you have benchmark data that confirm it, then is there ever any
> advantage to a multidimensional array?
>
>> A single array has advantages for a copy operation as it requires a
>> single .clone() method call. The disadvantage is that the ultimate size
>> of the multi-dimensional structure is limited by the maximum size of an
>> array.
> That's ~17 GB, and more than 1 billion complex numbers.
> And if there are use-cases that require very large data cubes, then the
> abstraction would allow adding further indirections.
>
>> In the case of complex numbers the use of alternating indices to store
>> real and imaginary allows the array to be easily resized and extended.
>> Operations that process all positions only need to cache in memory a
>> sub-section of the array.
>>
>> [re1, im1, re2, im2, re3, im3]
>>
>> The alternative is to store real and imaginary in a large block in
>> either 1 array or 2 arrays:
>>
>> [re1, re2, re3, im1, im2, im3]
>>
>> [re1, re2, re3]
>> [im1, im2, im3]
>>
>> This has advantages for operations that require either the real or the
>> imaginary components on their own. It is a bit more work to extend the
>> data as it requires two copies of the existing data and packing of the
>> new values at the appropriate positions.
>>
>> For small sizes the use of 2 arrays would be less efficient memory-wise.
>> It would require some type of pointer to hold the pair. However it
>> could store double the amount of complex numbers.
>>> The MultidimensionalCounter functionality you mention is for example known
>>> in Matlab as ind2sub() and sub2ind(). It allows for 1d vectorizing of
>>> certain functions which can improve performance. Some people swear by it. I
>>> always found it an additional mental exercise that I didn't want to occupy
>>> myself with unless I absolutely had to. So, maybe it makes sense to
>>> "advertise" this approach like you say, but some users may just want the 3D
>>> arrays for more rapid prototyping-ish applications.
> As Alex has shown, the 1D representation is abstracted away: The data is
> accessed as a multi-dimensional array and the user never needs to remember
> that it is interleaved.
>
>>> I wonder if there isn't some streaming solution for this -- the numbers are
>>> stored as an interleaved 1D array, but are streamed through a Complex
>>> constructor before any needed operations are performed.
>> My initial thought is: not as double values. A DoubleStream processes
>> single values. To collect the real and imaginary components would
>> require that the stream can be processed two at a time.
>>
>> This structure would work:
>>
>> class ComplexArray {
>>     // Must be even in length. Packed as (real, imaginary)
>>     double[] data;
>>     int size;
>>
>>     Stream<Complex> stream() {
>>         return IntStream.range(0, size / 2)
>>             .mapToObj(i -> Complex.ofCartesian(data[i*2], data[i*2+1]));
>>     }
>>
>>     void add(Complex c) {
>>         checkCapacity(2);
>>         data[size++] = c.getReal();
>>         data[size++] = c.getImaginary();
>>     }
> Perhaps "store" would be less confusing (?).

It was a quick prototype for the functionality.

>
>>     void combine(ComplexArray c) {
>>         checkCapacity(c.size);
>>         System.arraycopy(c.data, 0, data, size, c.size);
>>         size += c.size;
>>     }
>>
>>     private void checkCapacity(int length) {
>>         final int minCapacity = size + length;
>>         final int oldCapacity = data.length;
>>         if (minCapacity > oldCapacity) {
>>             int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
>>             if (newCapacity < minCapacity) {
>>                 newCapacity = minCapacity;
>>             }
>>             data = Arrays.copyOf(data, newCapacity);
>>         }
>>     }
>> }
> To allow easy usage as a multidimensional array, the API should contain
> the "MultidimensionalCounter"[1] or equivalent:
>
> public class ComplexArray {
>     private final MultidimensionalCounter counter;
>     // ...
>
>     public ComplexArray(int... dimensions) {
>         counter = new MultidimensionalCounter(dimensions);
>         data = new double[2 * counter.getSize()];
>     }
>
>     /**
>      * @param indices Storage location.
>      * @return the complex number stored at the given location.
>      */
>     public Complex get(int... indices) {
>         final int i = counter.getCount(indices);
>         return Complex.ofCartesian(data[2 * i], data[2 * i + 1]);
>     }
> }
>
> Or perhaps the data cube should be an additional abstraction on top of
> "ComplexArray" (?).

Array can be renamed to Collection/List? Thus the count is the only thing
that matters, and optionally a constant-time access get(int index) method.
This can later be reshaped to a Matrix representation using the
MultidimensionalCounter pattern.

We have two use cases:

1. You know the final count of Complex objects.
2. You do not know the count of Complex objects.

Use case 2 is used in streams, where the ComplexList must be expandable
for use as a collector.
This can be satisfied with a constructor with an initial capacity. The ComplexList would thus offer a specialisation of ArrayList<Complex> by storing the Complex objects using primitive array(s). Use case 2 rules out the possibility of immutability. This is the type of functionality under discussion and could be served using an interface to allow different implementations: public interface ComplexList { /** * The size of complex numbers contained in the list. * * @return the size */ int size(); /** * Gets the complex number at the specified index. * * @param index the index * @return the complex * @throws IndexOutOfBoundsException if the index is out of range */ Complex get(int index); /** * Gets the real component of the complex number at the specified index. * * @param index the index * @return the real component * @throws IndexOutOfBoundsException if the index is out of range */ double getReal(int index); /** * Gets the imaginary component of the complex number at the specified index. * * @param index the index * @return the imaginary component * @throws IndexOutOfBoundsException if the index is out of range */ double getImaginary(int index); /** * Sets the complex number at the specified index. * * @param index the index * @param complex the complex * @throws IndexOutOfBoundsException if the index is out of range */ void set(int index, Complex complex); /** * Sets the complex number at the specified index. * * @param index the index * @param real the real component * @param imaginary the imaginary component * @throws IndexOutOfBoundsException if the index is out of range */ void set(int index, double real, double imaginary); /** * Adds the complex number to the list. * * @param complex the complex */ void add(Complex complex); /** * Adds the complex number to the list. * * @param real the real component * @param imaginary the imaginary component */ void add(double real, double imaginary); /** * Stream the complex numbers. 
* * @return the stream */ Stream<Complex> stream(); /** * Apply the function to the complex numbers in the list to create a new list. * * @param function the function * @return the result */ ComplexList apply(UnaryOperator<Complex> function); /** * Apply the function to the paired complex numbers in both lists to create a new list. * * @param other the other list (must be the same size) * @param function the function * @return the result */ ComplexList apply(ComplexList other, BiFunction<Complex, Complex, Complex> function); } This could be separated to put the set and add methods in a sub-interface allowing the top level interface to be an immutable object. However the apply functions currently are specified using Complex. Efficiency would be gained using primitive specialisations for operating on 1 or 2 complex numbers and outputting the results to a provided consumer: public interface ComplexProvider { /** * Gets the real component of the complex number. * * @return the real component */ double getReal(); /** * Gets the imaginary component of the complex number. * * @return the imaginary component */ double getImaginary(); } public interface ComplexConsumer { /** * Accept the components of the complex number. * Optionally return a new ComplexProvider with the components. * * @param real the real * @param imaginary the imaginary * @return the complex provider (or null) */ ComplexProvider accept(double real, double imaginary); } public interface ComplexFunction { /** * Apply a function to the components of the complex number. The results are sent * to the consumer and the output from the consumer is returned. * * @param complex the complex number * @param consumer the consumer * @return the result produced by the complex consumer */ ComplexProvider apply(ComplexProvider complex, ComplexConsumer consumer); /** * Returns a composed function that first applies this function to * its input, and then applies the {@code after} function to the result. 
* The same consumer is used for both function invocations thus it must return * a ComplexProvider. * * @param after the function to apply after this function is applied * @return a composed function that first applies this function and then * applies the {@code after} function */ default ComplexFunction andThen(ComplexFunction after) { return (t, u) -> after.apply(apply(t, u), u); } } public interface BiComplexFunction { /** * Apply a function to the components of two complex numbers. The results are sent * to the consumer and the output from the consumer is returned. * * @param complex1 the first complex number * @param complex2 the second complex number * @param consumer the consumer * @return the result produced by the complex consumer */ ComplexProvider apply(ComplexProvider complex1, ComplexProvider complex2, ComplexConsumer consumer); } /** * Copy the list. * * @return the complex list */ ComplexList copy(); /** * Apply the function to the complex numbers in the list to create a new list. * * @param function the function * @return the result */ default ComplexList apply(ComplexFunction function) { return copy().map(function); } /** * Apply the function to the complex numbers in the list in-place. * * @param function the function * @return the list */ ComplexList map(ComplexFunction function); /** * Apply the function to the paired complex numbers in both lists to create a new list. * * @param other the other list (must be the same size) * @param function the function * @return the result */ default ComplexList apply(ComplexList other, BiComplexFunction function) { return copy().map(other, function); } /** * Apply the function to the paired complex numbers in both lists to modify this list in-place. 
* * @param other the other list (must be the same size) * @param function the function * @return the list */ ComplexList map(ComplexList other, BiComplexFunction function); If Complex implemented ComplexProvider it would allow all of Complex to be rewritten to have static ComplexFunction implementations which are called from the instance: public static ComplexProvider conjugate(ComplexProvider in, ComplexConsumer out) { return out.accept(in.getReal(), -in.getImaginary()); } public Complex conjugate() { return (Complex) conjugate(this, Complex::ofCartesian); } You can manipulate the list using: ComplexList list = ...; list.map(Complex::conjugate); ComplexFunction f = Complex::conjugate; f = f.andThen(Complex::negate); ComplexList list2 = list.apply(f); This is only an idea. Digging through the code for Complex it seems that some functions are non-trivial and would not work well with this design. Provision of a Stream<Complex> would require that the primitives are used to create objects. This could be changed to Stream<ComplexProvider> using a spliterator implementation to prevent object creation for each Complex; only the spliterator would require the current index in the underlying storage: /** * Creates a {@link Spliterator} over the elements in this list. * * @return the spliterator */ Spliterator<ComplexProvider> spliterator(); /** * Stream the complex numbers. * * @return the stream */ default Stream<ComplexProvider> stream() { return StreamSupport.stream(spliterator(), false); } Unfortunately processing the stream as encountered may result in out of order elements if parallelisation occurs, so the results may not be collected back in the same order. Here there are solutions to different problems. So which are we trying to solve: 1. Efficient storage of large amounts of complex numbers. 2. Efficient computation on large amounts of complex numbers. 3. Custom computation on complex numbers. 4. Computation on large amounts of complex numbers in-place.
For 1 we can use the abstraction to a collection (e.g. ComplexList). For 2 we can duplicate all methods in Complex to work directly on the underlying data. For 2/3 we can rewrite computation of Complex to work on input providers and output consumers and allow functions to be applied to the collection. For 4 we require different mutability of the collection. I will continue to think about this as the solution to all issues is not straightforward. Opinions are welcomed. > > By the way, I'd suggest to find a more descriptive name for "ComplexArray" > (since the 1D array is an "implementation detail"). > >>> And I guess that leads to my last question -- suppose someone wants to call >>> a trig function on a series of Complex numbers. Now let's imagine the >>> primitives have been stored in some maximally efficient way. It seems like, >>> to use any of the functionality in Complex, these numbers would have to be >>> unpacked, cast as Complex, operated on, then cast back to how they are >>> being stored. I wonder if that would prove to be more efficient in the end. > Anything will be more efficient than swapping when getting near to the max > memory limit. > >> Probably not more efficient to create all the Complex instances. Ideally >> you would operate on the primitive data directly. > This could certainly go to the wish-list (with the use-case being the > "ComplexArray"). > >> Looking at what Complex is capable of doing it seems that there are a >> lot of operations that would have to be duplicated to allow operations >> to work on primitives. However you can create the API and implement it >> lazily using the above object, e.g. 
>> >> public static ComplexArray apply(ComplexArray array, UnaryOperator<Complex> mapper) { >> return array.stream() >> .map(mapper) >> .collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine); >> } >> >> public static ComplexArray log(ComplexArray array) { >> return apply(Complex::log); >> } >> >> This does create a new Complex object for each operation function >> applied to the data. >> >> The other methods that accept another Complex as an argument would have >> to be handled differently as you cannot start two streams (for each >> ComplexArray) in parallel. These are add, divide, multiply, subtract. So >> this would require the computation be moved inside ComplexArray: >> >> ComplexArray apply(ComplexArray other, BiFunction<Complex, Complex, Complex> function) { >> // TODO: argument checks on size >> return IntStream.range(0, size / 2) >> .mapToObj(i -> { >> Complex c1 = Complex.ofCartesian(data[i*2], data[i*2+1]); >> Complex c2 = Complex.ofCartesian(other.data[i*2], other.data[i*2+1]); >> return function.apply(c1, c2); >> }).collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine); >> } >> >> ComplexArray a1 = ...; >> ComplexArray a2 = ...; >> ComplexArray a3 = a1.apply(a2, Complex::add); >> >> >> Note: To avoid creating new objects for all computations would require a >> rewrite of Complex so that the computations are done by static methods >> that accept a supplier of real and imaginary components, compute the >> calculation and send the answer to a function that receives the new >> components, for example: >> >> interface ComplexData { >> double getReal(); >> double getImaginary(); >> } >> >> interface ToComplexData { >> ComplexData create(double real, double imaginary); >> } >> >> // For example >> public static ComplexData conjugate(ComplexData in, ToComplexData out) { >> return out.create(in.getReal(), -in.getImaginary()); >> } >> >> // This would make the existing API method >> public Complex conjugate() { >> return 
conjugate(this, Complex::ofCartesian); >> } >> >> This would allow the ComplexArray to provide an object that implements >> ComplexData to read the current datapoint and an object that implements >> ToComplexData to write back to the equivalent datapoint in a result >> adding something like this to the ComplexArray object using inner >> classes to read and write to the data array: >> >> class ComplexDataReader implements ComplexData { >> int pos; >> double getReal() { return data[pos*2]; } >> double getImaginary() { return data[pos*2+1]; } >> } >> >> class ComplexDataAppender implements ToComplexData { >> ComplexData create(double real, double imaginary) { >> data[size++] = real; >> data[size++] = imaginary; >> return null; >> } >> } >> >> ComplexDataReader getDataReader() { return new ComplexDataReader(); } >> ComplexDataAppender getDataAppender() { return new ComplexDataAppender(); } >> >> ComplexArray apply(BiFunction<ComplexData, Complex, Complex> function) { >> ComplexDataReader in = getDataReader(); >> // Initialise the capacity >> ComplexArray result = new ComplexArray(size); >> ComplexDataAppender out = result.getDataAppender(); >> while (in.pos < size) { >> function.apply(in, out); >> in.pos += 2; >> } >> return result; >> } >> >> >> // Apply the function from above: >> // public static ComplexData conjugate(ComplexData in, ToComplexData out) >> >> ComplexArray a1 = ...; >> ComplexArray a2 = a1.apply(Complex::conjugate); >> >> This allows a user to pass a function that can do anything to the >> current data to create new real and imaginary components which are >> stored in a new array object. I do not know how this would effect >> performance without a full comparison of the API implemented with both >> concepts. >> >> If the storage of the array of complex data were abstracted it may lead >> to simplification of ComplexUtils. 
to a simplification of "ComplexUtils": each of the methods to convert from >> multi-dimensional arrays of Complex can be replaced by a single method >> to extract the required data from the ComplexArray. >> >> I can see at least the simple abstraction as useful. > +1 > >> Performance >> improvements using in-place reading of the real and imaginary components >> for computations can be added later. > +1 > >> It would be easier to do this if >> the ComplexArray is in the same package allowing package private methods >> to be added to Complex to do computations. > +1 > >> Alex > Regards, > Gilles > > [1] http://commons.apache.org/proper/commons-math/apidocs/index.html > >>> On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski <[hidden email]> wrote: >>> >>>> Hello. >>>> >>>> The class "ComplexUtils" deal with multi-dimensional arrays that hold >>>> instances of the "Complex" class. >>>> I've recently encountered a use-case where it was pointed out that storing >>>> many "Complex" instances (as seems the purpose of these utilities) is quite >>>> inefficient memory-wise as each instance will take 32 bytes[1] while the >>>> real and imaginary parts would only take 16 bytes as 2 primitive "double"s. >>>> This is compounded by multi-dimensional array where each sub-dimensional >>>> element is an array object (and thus takes another additional 16 bytes). >>>> For example, a >>>> double[10][5][4] >>>> would take >>>> 16 * (1 + 10 * 5) + 10 * 5 * 4 * 8 >>>> = 2416 bytes. >>>> Assuming that in the above array, the last dimension holds 2 complex >>>> numbers, >>>> the same data can be represented as >>>> Complex[10][5][2] >>>> that would take >>>> 16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8 >>>> = 4016 bytes. >>>> In both cases, the payload (10 * 5 * 2 complex numbers) is >>>> 10 * 5 * 2 * 2 * 8 >>>> = 1600 bytes. >>>> If stored in a one-dimensional array, the size in memory would be 1616 >>>> bytes.
>>>> Thus in the case of a data cube holding 100 complex numbers, a 3D array >>>> takes 1.5 (primitives) or 2.5 ("Complex" objects) more memory than a 1D >>>> array. >>>> If this is correct, I wonder whether we should advertise such a >>>> "ComplexUtils" >>>> class. It would perhaps be preferable to propose a data cube >>>> abstraction.[2] >>>> >>>> WDYT? >>>> >>>> Regards, >>>> Gilles >>>> >>>> [1] https://www.baeldung.com/java-size-of-object >>>> [2] Based on >>>> http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html >>>>
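[Editor's note: the byte counts quoted above can be reproduced with a small sketch. The 16-byte object/array header figure is the assumption taken from the linked article; actual sizes are JVM-dependent (compressed oops, alignment). The class and method names below are purely illustrative.]

```java
/**
 * Back-of-the-envelope memory estimates for the layouts discussed above.
 * Assumes a 16-byte object/array header and 8-byte doubles, matching the
 * quoted figures; real JVM sizes vary.
 */
public class MemoryEstimate {
    static final int HEADER = 16;
    static final int DOUBLE = 8;

    /** double[d1][d2][d3]: one outer array, d1 * d2 leaf arrays of doubles. */
    static long nested3d(int d1, int d2, int d3) {
        return HEADER * (1L + d1 * d2) + (long) d1 * d2 * d3 * DOUBLE;
    }

    /** Complex[d1][d2][d3]: array objects plus one 2-double object per element. */
    static long nestedComplex3d(int d1, int d2, int d3) {
        return HEADER * (1L + d1 * d2 + (long) d1 * d2 * d3)
                + (long) d1 * d2 * d3 * 2 * DOUBLE;
    }

    /** Interleaved double[]: a single array holding n complex numbers. */
    static long interleaved1d(int n) {
        return HEADER + (long) n * 2 * DOUBLE;
    }
}
```

These reproduce the 2416 / 4016 / 1616 byte figures from the message above.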
Hi.
>>> [...] > > > > public class ComplexArray { > > private final MultidimensionalCounter counter; > > // ... > > > > public ComplexArray(int... dimensions) { > > counter = new MultidimensionalCounter(dimensions); > > data = new data[counter.getSize()]; > > } > > > > /** > > * @param indices Storage location. > > * @return the complex number stored at the given location. > > */ > > public Complex get(int... indices) { > > return data[counter.getCount(indices)]; > > } > > } > > > > Or perhaps the data cube should be an additional abstraction on top of > > "ComplexArray" (?). > I think adding the abstraction on top is easier. > > Array can be renamed to Collection/List? Thus the count is the only > thing that matters, and optionally a constant time access get(int index) > method. +1 > > This can later be reshaped to a Matrix representation using the > MultidimensionalCounter pattern. > > We have two use cases: > > 1. You know the final count of Complex objects > 2. You do not know the count of Complex objects > > Use case 2 is used in streams where the ComplexList must be expandable > for use as a collector. This can be satisfied with a constructor with an > initial capacity. The ComplexList would thus offer a specialisation of > ArrayList<Complex> by storing the Complex objects using primitive array(s). > > Use case 2 rules out the possibility of immutability. This is the type > of functionality under discussion and could be served using an interface > to allow different implementations: > > public interface ComplexList { > [...] > } +1 > This could be separated to put the set and add methods in a > sub-interface allowing the top level interface to be an immutable object. Perhaps better to follow the JDK convention and provide a public static ComplexList unmodifiableList(ComplexList delegate) method. > However the apply functions currently are specified using Complex. 
> Efficiency would be gained using primitive specialisations for operating > on 1 or 2 complex numbers and outputting the results to a provided consumer: > > public interface ComplexProvider { > /** > * Gets the real component of the complex number. > * > * @return the real component > */ > double getReal(); > > /** > * Gets the imaginary component of the complex number. > * > * @return the imaginary component > */ > double getImaginary(); > } > > public interface ComplexConsumer { > /** > * Accept the components of the complex number. > * Optionally return a new ComplexProvider with the components. > * > * @param real the real > * @param imaginary the imaginary > * @return the complex provider (or null) > */ > ComplexProvider accept(double real, double imaginary); > } What is gained wrt using "Complex" instances directly? > [...] > > Unfortunately processing the stream as encountered may result in out of > order elements if parallelisation occurs so the results may not be > collected back in the same order. IIUC, this would be a non-starter for the data cube abstraction. > Here there are solutions to different problems. So which are we trying > to solve: > > 1. Efficient storage of large amounts of complex numbers. > > 2. Efficient computation on large amounts of complex numbers. Those two IMO. > > 3. Custom computation on complex numbers. Not sure what you mean. > > 4. Computation on large amounts of complex numbers in-place. This would assume very, very, large amounts of data. For applications that would manage with data that fill at most half the available memory, this could be emulated with ComplexList data = ... data = function.apply(data); And, with a "data cube" abstraction with several underlying blocks of data, the amount of needed "spare" RAM could be reduced at will: Instead of storing one block of 1000 x 1000, one could store 4 of 500 x 500; process them sequentially would require four times less spare RAM. 
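[Editor's note: the block-wise idea above (trading one large spare buffer for several small ones) might look like the following; "ComplexBlockCube" and its methods are hypothetical names for illustration, not an agreed API.]

```java
import java.util.function.UnaryOperator;

/**
 * Sketch of a "data cube" split into independent interleaved blocks.
 * Applying a function block-by-block only ever needs spare RAM for one
 * block's result, not for a copy of the whole cube.
 */
public class ComplexBlockCube {
    private final double[][] blocks; // each block: interleaved (re, im) pairs

    public ComplexBlockCube(int numBlocks, int complexPerBlock) {
        blocks = new double[numBlocks][complexPerBlock * 2];
    }

    public double getReal(int block, int i)      { return blocks[block][2 * i]; }
    public double getImaginary(int block, int i) { return blocks[block][2 * i + 1]; }

    public void set(int block, int i, double re, double im) {
        blocks[block][2 * i] = re;
        blocks[block][2 * i + 1] = im;
    }

    /** Applies the function to each block in turn, replacing it with the result. */
    public void transform(UnaryOperator<double[]> blockFunction) {
        for (int b = 0; b < blocks.length; b++) {
            blocks[b] = blockFunction.apply(blocks[b]); // one spare block at a time
        }
    }
}
```

For example, a conjugate is a block function that copies the block and negates every odd index.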
> For 1 we can use the abstraction to a collection (e.g. ComplexList). > > For 2 we can duplicate all methods in Complex to work directly on the > underlying data. > > For 2/3 we can rewrite computation of Complex to work on input providers > and output consumers and allow functions to be applied to the collection. > > For 4 we require different mutability of the collection. I don't think that immutability is a very useful requirement when handling very large amounts of data. E.g. the data cube should probably allow this kind of usage: ComplexCube c = ... c.transformInPlace(function); > I will continue to think about this as the solution to all issues is not > straightforward. I didn't want to push this too far. The sole use-case I was aware of is an "OutOfMemoryError" due to storing "Complex" instances in an array of arrays of arrays of... I think that an interesting example application would be how to compute the FFT of the stored data (e.g. using "JTransforms"). Another nice example would be fractal computations. Best, Gilles > > Opinions are welcomed. > > > [...]
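[Editor's note: as an illustration of the fractal use-case just mentioned, an escape-time iteration z -> z^2 + c works directly on (real, imaginary) components without ever allocating a Complex. A sketch only, not project code.]

```java
/** Mandelbrot escape-time iteration on primitive components only. */
public class Mandelbrot {
    /** Returns the iteration at which |z|^2 exceeds 4, or maxIter if it never does. */
    static int escapeIteration(double cre, double cim, int maxIter) {
        double zre = 0, zim = 0;
        for (int i = 0; i < maxIter; i++) {
            if (zre * zre + zim * zim > 4) {
                return i;
            }
            double t = zre * zre - zim * zim + cre; // real part of z^2 + c
            zim = 2 * zre * zim + cim;              // imaginary part of z^2 + c
            zre = t;
        }
        return maxIter;
    }
}
```

A whole image is then a loop over pixels writing the iteration counts into a flat int[], with no boxing anywhere.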
> On 6 Nov 2019, at 18:08, Gilles Sadowski <[hidden email]> wrote: > > Hi. > >>>> [...] >>> >>> public class ComplexArray { >>> private final MultidimensionalCounter counter; >>> // ... >>> >>> public ComplexArray(int... dimensions) { >>> counter = new MultidimensionalCounter(dimensions); >>> data = new data[counter.getSize()]; >>> } >>> >>> /** >>> * @param indices Storage location. >>> * @return the complex number stored at the given location. >>> */ >>> public Complex get(int... indices) { >>> return data[counter.getCount(indices)]; >>> } >>> } >>> >>> Or perhaps the data cube should be an additional abstraction on top of >>> "ComplexArray" (?). >> I think adding the abstraction on top is easier. >> >> Array can be renamed to Collection/List? Thus the count is the only >> thing that matters, and optionally a constant time access get(int index) >> method. > > +1 > >> >> This can later be reshaped to a Matrix representation using the >> MultidimensionalCounter pattern. >> >> We have two use cases: >> >> 1. You know the final count of Complex objects >> 2. You do not know the count of Complex objects >> >> Use case 2 is used in streams where the ComplexList must be expandable >> for use as a collector. This can be satisfied with a constructor with an >> initial capacity. The ComplexList would thus offer a specialisation of >> ArrayList<Complex> by storing the Complex objects using primitive array(s). >> >> Use case 2 rules out the possibility of immutability. This is the type >> of functionality under discussion and could be served using an interface >> to allow different implementations: >> >> public interface ComplexList { >> [...] >> } > > +1 > >> This could be separated to put the set and add methods in a >> sub-interface allowing the top level interface to be an immutable object. > > Perhaps better to follow the JDK convention and provide a > public static ComplexList unmodifiableList(ComplexList delegate) > method. 
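[Editor's note: the JDK-style unmodifiable wrapper suggested above might look like the following; "ComplexSeq" is a deliberately reduced, hypothetical subset of the ComplexList interface sketched earlier in the thread.]

```java
/** Minimal subset of the primitive-backed list discussed in this thread. */
interface ComplexSeq {
    int size();
    double getReal(int index);
    double getImaginary(int index);
    void set(int index, double real, double imaginary);
}

/** Simple interleaved-array implementation, for illustration. */
final class ArrayComplexSeq implements ComplexSeq {
    private final double[] data; // interleaved (re, im)

    ArrayComplexSeq(int size) { data = new double[size * 2]; }

    public int size() { return data.length / 2; }
    public double getReal(int i) { return data[2 * i]; }
    public double getImaginary(int i) { return data[2 * i + 1]; }
    public void set(int i, double re, double im) {
        data[2 * i] = re;
        data[2 * i + 1] = im;
    }
}

final class ComplexSeqs {
    private ComplexSeqs() {}

    /** Read-through view that rejects mutation, like Collections.unmodifiableList. */
    static ComplexSeq unmodifiable(ComplexSeq delegate) {
        return new ComplexSeq() {
            public int size() { return delegate.size(); }
            public double getReal(int i) { return delegate.getReal(i); }
            public double getImaginary(int i) { return delegate.getImaginary(i); }
            public void set(int i, double re, double im) {
                throw new UnsupportedOperationException("unmodifiable view");
            }
        };
    }
}
```

This keeps a single mutable interface while still letting an API hand out read-only views.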
> >> However the apply functions currently are specified using Complex. >> Efficiency would be gained using primitive specialisations for operating >> on 1 or 2 complex numbers and outputting the results to a provided consumer: >> >> public interface ComplexProvider { >> /** >> * Gets the real component of the complex number. >> * >> * @return the real component >> */ >> double getReal(); >> >> /** >> * Gets the imaginary component of the complex number. >> * >> * @return the imaginary component >> */ >> double getImaginary(); >> } >> >> public interfaceComplexProvider { >> /** >> * Accept the components of the complex number. >> * Optionally return a new ComplexProvider with the components. >> * >> * @param real the real >> * @param imaginary the imaginary >> * @return the complex provider (or null) >> */ >> ComplexProvider accept(double real, double imaginary); >> } > > What is gained wrt using "Complex" instances directly? The above is a typo. The accept method should be in a ComplexConsumer interface. If the data is in a primitive array you would like to be able to access it, and then set it back to the same array or a different array without going through creation of a Complex for each (real, imaginary) pair. So the ComplexProvider and ComplexConsumer provide an input and output for any function that changes the complex data. I’ve made it a bit more complicated with the ComplexConsumer returning a ComplexProvider just to allow all the methods in Complex to be rewritten to provide static methods that can use these interfaces. A simpler approach is for the ComplexList to allow modification in-place by providing a list cursor that can traverse the list (like an iterator) and also skip to positions. It would allow get and set of the data at the current position. This cursor could be used by another class to apply functions in place to the data. The functions could be a user specified generic function using the ComplexFunction interface I suggested. 
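[Editor's note: the cursor idea described in words above could look roughly like this; "ComplexCursor" is an illustrative name, not an agreed API.]

```java
/**
 * Sketch of a positionable cursor over an interleaved (re, im) array,
 * allowing in-place get/set without creating a Complex per element.
 */
public class ComplexCursor {
    private final double[] data; // interleaved storage
    private int index;           // current complex-number position

    public ComplexCursor(double[] interleavedData) {
        this.data = interleavedData;
    }

    public boolean hasNext()     { return index < data.length / 2; }
    public void moveTo(int i)    { index = i; } // skip to a position
    public void advance()        { index++; }
    public double getReal()      { return data[2 * index]; }
    public double getImaginary() { return data[2 * index + 1]; }

    public void set(double re, double im) {
        data[2 * index] = re;
        data[2 * index + 1] = im;
    }
}
```

An in-place conjugate is then a plain loop: while (cursor.hasNext()) { cursor.set(cursor.getReal(), -cursor.getImaginary()); cursor.advance(); }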
I think that performance would have to be assessed for various prototypes. Some of the methods in Complex have a lot of work. So the creation of a Complex for each position may not be a big overhead allowing a stream approach to build a custom process pipeline which is then collected into a new list at the end. For operations that are fast such as negate, add, subtract, conjugate these would be easy to code into the ComplexList to operate directly on the data array. > >> [...] >> >> Unfortunately processing the stream as encountered may result in out of >> order elements if parallelisation occurs so the results may not be >> collected back in the same order. > > IIUC, this would be a non-starter for the data cube abstraction. > >> Here there are solutions to different problems. So which are we trying >> to solve: >> >> 1. Efficient storage of large amounts of complex numbers. >> >> 2. Efficient computation on large amounts of complex numbers. > > Those two IMO. > >> >> 3. Custom computation on complex numbers. > > Not sure what you mean. Passing a function that will accept the real and imaginary components, do a lot of work and send it to an output consumer. > >> >> 4. Computation on large amounts of complex numbers in-place. > > This would assume very, very, large amounts of data. > For applications that would manage with data that fill at most half the > available memory, this could be emulated with > ComplexList data = ... > data = function.apply(data); > > And, with a "data cube" abstraction with several underlying blocks > of data, the amount of needed "spare" RAM could be reduced at will: > Instead of storing one block of 1000 x 1000, one could store 4 of > 500 x 500; process them sequentially would require four times less > spare RAM. > >> For 1 we can use the abstraction to a collection (e.g. ComplexList). >> >> For 2 we can duplicate all methods in Complex to work directly on the >> underlying data. 
>> >> For 2/3 we can rewrite computation of Complex to work on input providers >> and output consumers and allow functions to be applied to the collection. >> >> For 4 we require different mutability of the collection. > > I don't think that immutability is a very useful requirement when handling > very large amounts of data. E.g. the data cube should probably allow this > kinf of usage: > ComplexCube c = ... > c.transformInPlace(function); > >> I will continue to think about this as the solution to all issues is not >> straightforward. > > I didn't want to push this too far. The sole use-case I was aware of is an > "OutOfMemoryError" due to storing "Complex" instances in an array of > arrays of arrays of... > > I think that an interesting example application would be how to compute > the FFT of the stored data (e.g. using "JTransforms”). JTransforms just creates FFT objects and then works on provided arrays for forward and reverse transforms. Here’s some code for a float forward transform of some image pixels. Here it is square (more efficient) but it doesn’t have to be. // In reality pixels is actually an image float[] pixels = new float[size * size]; // FFT data is twice the size final float[] data = new float[size * size * 2]; final FloatFFT_2D fft = new FloatFFT_2D(size, size); System.arraycopy(pixels, 0, data, 0, pixels.length); fft.realForwardFull(data); So to use JTransforms the data has to be packed in the correct shape. It uses alternating real and imaginary components of successive numbers. It would mean that the ComplexList should have constructor that can wrap an existing complex array. You can then pass it the data output from JTransforms. > > Another nice example would be fractal computations. Not familiar with this. My use case is two complex arrays created from FFT of real data which are used to compute a conjugate multiplication and the absolute magnitude of each array. This is used to compute a cross correlation. 
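[Editor's note: the conjugate-multiplication step of that cross-correlation use-case, working directly on two interleaved arrays, might be sketched as follows. In the real use-case the inputs would be FFT output (e.g. from JTransforms); the class name is illustrative.]

```java
/**
 * Element-wise a * conj(b) on interleaved (re, im) arrays, as used in
 * frequency-domain cross-correlation.
 */
public class ConjugateMultiply {
    static double[] apply(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i += 2) {
            double are = a[i], aim = a[i + 1];
            double bre = b[i], bim = b[i + 1];
            // (are + i*aim) * (bre - i*bim)
            out[i]     = are * bre + aim * bim;
            out[i + 1] = aim * bre - are * bim;
        }
        return out;
    }
}
```

No Complex instance is allocated per element; only the output array is created.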
> > Best, > Gilles > >> >> Opinions are welcomed. >> >>> [...]
> >>>> [...]
> >>> > >>> public class ComplexArray { > >>> private final MultidimensionalCounter counter; > >>> // ... > >>> > >>> public ComplexArray(int... dimensions) { > >>> counter = new MultidimensionalCounter(dimensions); > >>> data = new data[counter.getSize()]; > >>> } > >>> > >>> /** > >>> * @param indices Storage location. > >>> * @return the complex number stored at the given location. > >>> */ > >>> public Complex get(int... indices) { > >>> return data[counter.getCount(indices)]; > >>> } > >>> } > >>> > >>> Or perhaps the data cube should be an additional abstraction on top of > >>> "ComplexArray" (?). > >> I think adding the abstraction on top is easier. > >> > >> Array can be renamed to Collection/List? Thus the count is the only > >> thing that matters, and optionally a constant time access get(int index) > >> method. > > > > +1 > > > >> > >> This can later be reshaped to a Matrix representation using the > >> MultidimensionalCounter pattern. > >> > >> We have two use cases: > >> > >> 1. You know the final count of Complex objects > >> 2. You do not know the count of Complex objects > >> > >> Use case 2 is used in streams where the ComplexList must be expandable > >> for use as a collector. This can be satisfied with a constructor with an > >> initial capacity. The ComplexList would thus offer a specialisation of > >> ArrayList<Complex> by storing the Complex objects using primitive array(s). > >> > >> Use case 2 rules out the possibility of immutability. This is the type > >> of functionality under discussion and could be served using an interface > >> to allow different implementations: > >> > >> public interface ComplexList { > >> [...] > >> } > > > > +1 > > > >> This could be separated to put the set and add methods in a > >> sub-interface allowing the top level interface to be an immutable object. > > > > Perhaps better to follow the JDK convention and provide a > > public static ComplexList unmodifiableList(ComplexList delegate) > > method. 
> > > >> However the apply functions currently are specified using Complex. > >> Efficiency would be gained using primitive specialisations for operating > >> on 1 or 2 complex numbers and outputting the results to a provided consumer: > >> > >> public interface ComplexProvider { > >> /** > >> * Gets the real component of the complex number. > >> * > >> * @return the real component > >> */ > >> double getReal(); > >> > >> /** > >> * Gets the imaginary component of the complex number. > >> * > >> * @return the imaginary component > >> */ > >> double getImaginary(); > >> } > >> > >> public interfaceComplexProvider { > >> /** > >> * Accept the components of the complex number. > >> * Optionally return a new ComplexProvider with the components. > >> * > >> * @param real the real > >> * @param imaginary the imaginary > >> * @return the complex provider (or null) > >> */ > >> ComplexProvider accept(double real, double imaginary); > >> } > > > > What is gained wrt using "Complex" instances directly? > > The above is a typo. The accept method should be in a ComplexConsumer interface. > > If the data is in a primitive array you would like to be able to access it, and then set it back to the same array or a different array without going through creation of a Complex for each (real, imaginary) pair. So the ComplexProvider and ComplexConsumer provide an input and output for any function that changes the complex data. I’ve made it a bit more complicated with the ComplexConsumer returning a ComplexProvider just to allow all the methods in Complex to be rewritten to provide static methods that can use these interfaces. I can see that one pass through the primitive array could yield the "real" and "imaginary" arguments to "accept"; but it must return an instance of a class (that implements "ComplexProvider"); hence calling a constructor. In a chain of operations, only the first call to the constructor is avoided. 
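[Editor's note: the point about the chained constructors can be made concrete. With the provider/consumer interfaces from the earlier message, each step of a two-operation chain still materialises one provider object, so only the initial wrapping is saved. The counting consumer below is purely illustrative.]

```java
/** Sketch demonstrating that each chained operation still allocates one provider. */
public class ChainAllocationDemo {
    interface ComplexProvider {
        double getReal();
        double getImaginary();
    }
    interface ComplexConsumer {
        ComplexProvider accept(double real, double imaginary);
    }

    static int constructions; // counts provider objects created by the chain

    /** Immutable (re, im) holder whose construction is counted. */
    static class Pair implements ComplexProvider {
        final double re, im;
        Pair(double re, double im) { this.re = re; this.im = im; constructions++; }
        public double getReal() { return re; }
        public double getImaginary() { return im; }
    }

    static ComplexProvider conjugate(ComplexProvider in, ComplexConsumer out) {
        return out.accept(in.getReal(), -in.getImaginary());
    }
    static ComplexProvider negate(ComplexProvider in, ComplexConsumer out) {
        return out.accept(-in.getReal(), -in.getImaginary());
    }

    /** conj then negate: two operations, hence two intermediate Pair objects. */
    static ComplexProvider conjThenNegate(double re, double im) {
        ComplexProvider start = new ComplexProvider() { // reads source, no Pair
            public double getReal() { return re; }
            public double getImaginary() { return im; }
        };
        ComplexConsumer out = Pair::new;
        return negate(conjugate(start, out), out);
    }
}
```

Only the wrapping of the input avoids a construction; every subsequent accept still creates an object.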
> > A simpler approach is for the ComplexList to allow modification in-place by providing a list cursor that can traverse the list (like an iterator) and also skip to positions. It would allow get and set of the data at the current position. This cursor could be used by another class to apply functions in place to the data. The functions could be a user-specified generic function using the ComplexFunction interface I suggested. Still, I don't see what the gain is wrt creating a result "ComplexList" and letting the caller decide whether to replace the input data with the result: ComplexList in = ... ComplexList out = in.apply(function); in = out; > I think that performance would have to be assessed for various prototypes. Some of the methods in Complex involve a lot of work. So the creation of a Complex for each position may not be a big overhead. I'd be inclined to think so. > allowing a stream approach to build a custom process pipeline which is then collected into a new list at the end. For operations that are fast such as negate, add, subtract, conjugate these would be easy to code into the ComplexList to operate directly on the data array. At the cost of code duplication, and once allowed for some operations, there is no stopping requests for others... > > > >> [...] > >> > >> Unfortunately processing the stream as encountered may result in out of > >> order elements if parallelisation occurs so the results may not be > >> collected back in the same order. > > > > IIUC, this would be a non-starter for the data cube abstraction. > > > >> Here there are solutions to different problems. So which are we trying > >> to solve: > >> > >> 1. Efficient storage of large amounts of complex numbers. > >> > >> 2. Efficient computation on large amounts of complex numbers. > > > > Those two IMO. > > > >> > >> 3. Custom computation on complex numbers. > > > > Not sure what you mean.
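[Editor's note] The cursor idea above can be sketched as a mutable view over an interleaved double[] that traverses positions and gets/sets components in place. All names are illustrative, not a proposed API.

```java
// Hedged sketch of a list cursor over interleaved (re, im) data: it can
// skip to a position, read the current components, and set them in place.
public class ComplexCursor {
    private final double[] data;  // interleaved (re, im)
    private int index;            // current complex position

    public ComplexCursor(double[] interleavedData) {
        this.data = interleavedData;
    }

    public void moveTo(int position) { index = position; }
    public boolean hasNext() { return 2 * index < data.length; }
    public void next() { index++; }

    public double getReal() { return data[2 * index]; }
    public double getImaginary() { return data[2 * index + 1]; }

    /** Sets the components at the current position (in-place update). */
    public void set(double real, double imaginary) {
        data[2 * index] = real;
        data[2 * index + 1] = imaginary;
    }
}
```

For example, an in-place conjugate is a single pass calling `set(getReal(), -getImaginary())` at each position, with no intermediate list or Complex allocation.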
> > Passing a function that will accept the real and imaginary components, do a lot of work and send it to an output consumer. So this falls in the category where one additional intermediate creation of an instance would not matter too much. > > > >> > >> 4. Computation on large amounts of complex numbers in-place. > > > > This would assume very, very large amounts of data. > > For applications that would manage with data that fill at most half the > > available memory, this could be emulated with > > ComplexList data = ... > > data = function.apply(data); > > > > And, with a "data cube" abstraction with several underlying blocks > > of data, the amount of needed "spare" RAM could be reduced at will: > > Instead of storing one block of 1000 x 1000, one could store 4 of > > 500 x 500; processing them sequentially would require four times less > > spare RAM. > > > >> For 1 we can use the abstraction to a collection (e.g. ComplexList). > >> > >> For 2 we can duplicate all methods in Complex to work directly on the > >> underlying data. > >> > >> For 2/3 we can rewrite computation of Complex to work on input providers > >> and output consumers and allow functions to be applied to the collection. > >> > >> For 4 we require different mutability of the collection. > > > > I don't think that immutability is a very useful requirement when handling > > very large amounts of data. E.g. the data cube should probably allow this > > kind of usage: > > ComplexCube c = ... > > c.transformInPlace(function); > > > >> I will continue to think about this as the solution to all issues is not > >> straightforward. > > > > I didn't want to push this too far. The sole use-case I was aware of is an > > "OutOfMemoryError" due to storing "Complex" instances in an array of > > arrays of arrays of... > > > > I think that an interesting example application would be how to compute > > the FFT of the stored data (e.g. using "JTransforms").
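[Editor's note] The block-wise idea above (store 4 blocks of 500 x 500 instead of one of 1000 x 1000 and process them sequentially) can be sketched as below. Names are illustrative; the point is that only one block's worth of "spare" output memory is live at a time.

```java
import java.util.function.UnaryOperator;

// Hedged sketch of a data cube split into blocks of interleaved (re, im)
// data, transformed one block at a time to limit peak memory use.
public class BlockedCube {
    private final double[][] blocks;  // each block holds interleaved pairs

    public BlockedCube(int numBlocks, int complexPerBlock) {
        blocks = new double[numBlocks][2 * complexPerBlock];
    }

    /** Applies f block by block, replacing each block with its result. */
    public void transform(UnaryOperator<double[]> f) {
        for (int i = 0; i < blocks.length; i++) {
            // The old block becomes garbage as soon as it is replaced,
            // so at most one extra block of output is alive at once.
            blocks[i] = f.apply(blocks[i]);
        }
    }

    public double[] block(int i) { return blocks[i]; }
}
```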
> > JTransforms just creates FFT objects and then works on provided arrays for forward and reverse transforms. Here’s some code for a float forward transform of some image pixels. Here it is square (more efficient) but it doesn’t have to be. > > // In reality pixels is actually an image > float[] pixels = new float[size * size]; > > // FFT data is twice the size > final float[] data = new float[size * size * 2]; > final FloatFFT_2D fft = new FloatFFT_2D(size, size); > > System.arraycopy(pixels, 0, data, 0, pixels.length); > fft.realForwardFull(data); > > So to use JTransforms the data has to be packed in the correct shape. It uses alternating real and imaginary components of successive numbers. It would mean that the ComplexList should have a constructor that can wrap an existing complex array. You can then pass it the data output from JTransforms. So IIUC, there is nothing that the ComplexList abstraction can bring to simplify that use-case, e.g. like ComplexList in = ... // Hypothetical function object. ComplexFunction fft = new JTransformsComplexFunction(in.size()); ComplexList out = in.apply(fft); > > Another nice example would be fractal computations. > > Not familiar with this. > > My use case is two complex arrays created from FFT of real data which are used to compute a conjugate multiplication and the absolute magnitude of each array. This is used to compute a cross correlation. This is also what started this thread: The user called the Commons Math's FFT utilities using arrays of "Complex" objects and got the "OutOfMemory" error. Hence the question of whether storing many "Complex" objects is ever useful, as compared to the "ComplexList", backed with an array of primitives, and instantiating transient "Complex" instances on-demand. Regards, Gilles > >>> [...]
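[Editor's note] The cross-correlation use case mentioned above (conjugate multiplication of two FFT outputs) works directly on the interleaved layout that JTransforms produces, with no Complex objects. A hedged sketch, with illustrative names:

```java
// Element-wise conjugate multiplication of two interleaved complex arrays:
// out[k] = f1[k] * conj(f2[k]), the core step of an FFT cross-correlation.
public class CrossCorrelate {
    public static double[] conjugateMultiply(double[] f1, double[] f2) {
        double[] out = new double[f1.length];
        for (int i = 0; i < f1.length; i += 2) {
            double a = f1[i], b = f1[i + 1];    // a + ib
            double c = f2[i], d = -f2[i + 1];   // conjugate: c - id
            out[i] = a * c - b * d;
            out[i + 1] = a * d + b * c;
        }
        return out;
    }
}
```

The same loop shape applies to the magnitude computation; both consume the JTransforms output arrays as-is.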
On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski <[hidden email]> wrote:
> > This is also what started this thread: The user called the Commons Math's > FFT utilities using arrays of "Complex" objects and got the "OutOfMemory" > error. Hence the question of whether storing many "Complex" objects is > ever useful, as compared to the "ComplexList", backed with an array of > primitives, and instantiating transient "Complex" instances on-demand. > I'm glad it has provoked such improvements. As you implicitly reference in your reply, probably we should just be shading JTransforms in the first place. I started using JTransforms because I had trouble using the commons-math FFT as well.
I should also add on this note, my use case for developing ComplexUtils in
the first place was compatibility with JTransforms and JOCL. In both cases I wanted to convert Complex[] arrays into interleaved double[] arrays to feed into algorithms using those libraries. On Thu, Nov 7, 2019 at 9:34 AM Eric Barnhill <[hidden email]> wrote: > > > On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski <[hidden email]> > wrote: > >> >> This is also what started this thread: The user called the Commons Math's >> FFT utilities using arrays of "Complex" objects and got the "OutOfMemory" >> error. Hence the question of whether storing many "Complex" objects is >> ever useful, as compared to the "ComplexList", backed with an array of >> primitives, and instantiating transient "Complex" instances on-demand. >> > > I'm glad it has provoked such improvements. As you implicitly reference in > your reply, probably we should just be shading JTransforms in the first > place. I started using JTransforms because I had trouble using the > commons-math FFT as well. >
Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <[hidden email]> a écrit :
> > I should also add on this note, my use case for developing ComplexUtils in > the first place was compatibility with JTransforms and JOCL. In both cases > I wanted to convert Complex[] arrays into interleaved double[] arrays to > feed into algorithms using those libraries. Implicit in my remark below is the question: Where does the "Complex[]" come from? If it is never a good idea to create such an array, why provide code to convert from it? Do we agree that we should rather create the "ComplexList" abstraction, including accessors that shape the data for use with e.g. JTransforms? Regards, Gilles > On Thu, Nov 7, 2019 at 9:34 AM Eric Barnhill <[hidden email]> wrote: > > > > > > On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski <[hidden email]> > > wrote: > > > >> > >> This is also what started this thread: The user called the Commons Math's > >> FFT utilities using arrays of "Complex" objects and got the "OutOfMemory" > >> error. Hence the question of whether storing many "Complex" objects is > >> ever useful, as compared to the "ComplexList", backed with an array of > >> primitives, and instantiating transient "Complex" instances on-demand. > >> > > > > I'm glad it has provoked such improvements. As you implicitly reference in > > your reply, probably we should just be shading JTransforms in the first > > place. I started using JTransforms because I had trouble using the > > commons-math FFT as well.
On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski <[hidden email]> wrote:
> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <[hidden email]> a > écrit : > > > > I should also add on this note, my use case for developing ComplexUtils > in > > the first place was compatibility with JTransforms and JOCL. In both > cases > > I wanted to convert Complex[] arrays into interleaved double[] arrays to > > feed into algorithms using those libraries. > > Implicit in my remark below is the question: Where does the "Complex[]" > come from? If it is never a good idea to create such an array, why provide > code to convert from it? Do we agree that we should rather create the > "ComplexList" abstraction, including accessors that shape the data for > use with e.g. JTransforms? > > I completely agree this is a superior solution and look forward to its development.
> On 7 Nov 2019, at 23:29, Eric Barnhill <[hidden email]> wrote: > > On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski <[hidden email]> wrote: > >> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <[hidden email]> a >> écrit : >>> >>> I should also add on this note, my use case for developing ComplexUtils >> in >>> the first place was compatibility with JTransforms and JOCL. In both >> cases >>> I wanted to convert Complex[] arrays into interleaved double[] arrays to >>> feed into algorithms using those libraries. >> >> Implicit in my remark below is the question: Where does the "Complex[]" >> come from? If it is never a good idea to create such an array, why provide >> code to convert from it? Do we agree that we should rather create the >> "ComplexList" abstraction, including accessors that shape the data for >> use with e.g. JTransforms? >> >> > I completely agree this is a superior solution and look forward to its > development. I think at least the minimum implementation of the abstraction will be fairly easy. It can then be performance checked with a few variants. There currently is not a JMH module for numbers. Should I create one under a module included with an optional profile? I have spent a fair bit of time trying to fix coverage of Complex. It is now at 100% with 95% branch coverage. It currently tests all the edge cases for the C standard. However it doesn’t really test actual computations. I am going to continue on this to add more tests that the computations output the same values as a C reference implementation, and hit all the branches. When working on the class it raised a few questions: 1. HashCode This is current the same for any number with a NaN component. However the C standard states that NaN components are allowed in infinites. So perhaps the hashCode should be updated to at least distinguish infinites from NaN. 2. NaN definition What is not clear is if there exists a definition of NaN for a complex number. 
The C library has no way to test it with a single function call. You have to test both the real and imaginary components. On top of this you can do computation with NaNs. The GNU g++ compiler computes: (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf Thus this is allowing some computations with NaN to produce results. I don’t know if this is a bug in GNU g++ or intentional. The C standard states: “If both operands of the * operator are complex or if the second operand of the / operator is complex, the operator raises floating-point exceptions if appropriate for the calculation of the parts of the result, and may raise spurious floating-point exceptions.” That is pretty vague. It says that you can do what is “appropriate”. So it seems that perhaps we should only call a complex a NaN if both fields are NaN. If only one is NaN then the complex is semi-NaN and results of computations may or may not make sense. 3. Multiplication I applied a change to the algorithm provided in the C standard when recovering (NaN,NaN) results of multiplying infinities. If you multiply infinity by zero this should be NaN. “if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;" Note the “non-zero” condition. But the algorithm provided does not do this and you can get an infinite result back. This is what GNU g++ does:
multiply(0 + i 0, 0 + i inf) = -nan + i -nan
multiply(-0 + i 0, 0 + i inf) = -nan + i -nan
multiply(0 + i 0, -0 + i inf) = -nan + i -nan
multiply(-0 + i 0, -0 + i inf) = -nan + i -nan
multiply(0 + i 0, 1 + i -inf) = -nan + i -nan
multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan
So they have implemented the multiply differently because here multiplying infinity by zero is NaN. Note how printf distinguishes the sign of infinite values. I don’t think Java does this. It collates NaN into one representation.
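[Editor's note] The zero-times-infinity behaviour above can be reproduced with plain double arithmetic: without the C99 recovery steps, the naive component formula for (0 + i0) * (0 + i inf) yields NaN in both parts, because 0 * inf is NaN. A minimal illustration (not the Commons Numbers implementation):

```java
// Naive complex multiply on raw components, with no edge-case recovery:
// (a + ib)(c + id) = (ac - bd) + i(ad + bc).
public class NaiveMultiply {
    public static double[] multiply(double a, double b, double c, double d) {
        return new double[] {a * c - b * d, a * d + b * c};
    }
}
```

For a = b = c = 0 and d = infinity, both components come out NaN, matching the g++ results quoted above; whether that should be "recovered" to an infinity is exactly the C99 Annex G question under discussion.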
I will have to convert to raw long bits to get the sign for a cross-reference to a C version. 4. Divide I have applied a change to this algorithm to check for overflow in the division. The reference text states: “if the first operand is a finite number and the second operand is an infinity, then the result of the / operator is a zero;” But for example GNU g++ does this: divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 This is what the reference algorithm does if it is not changed. Here overflow causes problems in the result (which should be zero). So I think my patch is correct and the result should be (0 + i 0). The current unit tests are enforcing this requirement. 5. Edge cases Some of the methods have a large number of if statements to test edge cases. These are very verbose and hard to follow. They also leave the common path of a non-infinite, non-NaN, non-zero value at the end. So a rearrangement may allow better branch prediction for the common use case. I’ll carry on working on this as I add more tests. 6. tanh I have put a TODO: note in the code about this. It tests edge cases for infinites. Then doubles the real and imaginary components. These could then overflow to infinity and so trigger the edge case that has just been skipped. I am going to compare the result to the GNU C version to decide how to handle overflow. Numbers overall I’ve now updated the configs for checkstyle, PMD and spotbugs. Checkstyle:check and spotbugs:check are enabled. Pmd:check is not. There are some loose ends to fix for that. Unfortunately the syntax for exclusions in the PMD config file is poor. I’ve added some but I’ve not looked at everything yet. A few decisions might need to be taken on whether to drop rules rather than fix them to at least get pmd:check active. It spots a lot of things so is useful for PRs. I’ve also not looked at SonarCloud [1]. There are a few things on there to fix too.
[1] https://sonarcloud.io/dashboard?id=commons-numbers
Hi.
Le sam. 9 nov. 2019 à 01:48, Alex Herbert <[hidden email]> a écrit : > > > > > On 7 Nov 2019, at 23:29, Eric Barnhill <[hidden email]> wrote: > > > > On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski <[hidden email]> wrote: > > > >> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <[hidden email]> a > >> écrit : > >>> > >>> I should also add on this note, my use case for developing ComplexUtils > >> in > >>> the first place was compatibility with JTransforms and JOCL. In both > >> cases > >>> I wanted to convert Complex[] arrays into interleaved double[] arrays to > >>> feed into algorithms using those libraries. > >> > >> Implicit in my remark below is the question: Where does the "Complex[]" > >> come from? If it is never a good idea to create such an array, why provide > >> code to convert from it? Do we agree that we should rather create the > >> "ComplexList" abstraction, including accessors that shape the data for > >> use with e.g. JTransforms? > >> > >> > > I completely agree this is a superior solution and look forward to its > > development. > > I think at least the minimum implementation of the abstraction will be fairly easy. It can then be performance checked with a few variants. > > There currently is not a JMH module for numbers. Should I create one under a module included with an optional profile? What is going to be benchmarked? > > I have spent a fair bit of time trying to fix coverage of Complex. It is now at 100% with 95% branch coverage. It currently tests all the edge cases for the C standard. However it doesn’t really test actual computations. Actual computations? Coveralls seems OK: https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java > I am going to continue on this to add more tests that the computations output the same values as a C reference implementation, and hit all the branches. > > When working on the class it raised a few questions: > > 1. 
HashCode > > This is current the same for any number with a NaN component. However the C standard states that NaN components are allowed in infinites. So perhaps the hashCode should be updated to at least distinguish infinites from NaN. What would be the purpose? > > 2. NaN definition > > What is not clear is if there exists a definition of NaN for a complex number. The C library has no way to test it with a single function call. You have to test both the real and imaginary components. On top of this you can do computation with NaNs. The GNU g++ compiler computes: > > (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf > > Thus this is allowing some computations with NaN to produce results. I don’t know if this is a bug in GNU g++ or intentional. The C standard states: > > “If both operands of the * operator are complex or if the second operand of the / operator > is complex, the operator raises floating-point exceptions if appropriate for the calculation > of the parts of the result, and may raise spurious floating-point exceptions.” > > That is pretty vague. It says that you can do what is “appropriate”. > > So it seems that perhaps we should only call a complex a NaN is both fields are NaN. If only one is NaN then the complex is semi-NaN and results of computations may or may not make sense. Do you have an example of a NaN that makes sense? What is semi-NaN? > > 3. Multiplication > > I applied a change to the algorithm provided in the C standard when recovering (NaN,NaN) results of multiplying infinities. If you multiply infinity by zero this should be NaN. > > “if one operand is an infinity and the other operand is a nonzero finite number or an > infinity, then the result of the * operator is an infinity;" > > Note the “non-zero” condition. > > But the algorithm provided does not do this and you can get an infinite result back. 
> > This is what GNU g++ does: > > multiply(0 + i 0, 0 + i inf) = -nan + i -nan > multiply(-0 + i 0, 0 + i inf) = -nan + i -nan > multiply(0 + i 0, -0 + i inf) = -nan + i -nan > multiply(-0 + i 0, -0 + i inf) = -nan + i -nan > multiply(0 + i 0, 1 + i -inf) = -nan + i -nan > multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan > > So they have implemented the multiply differently because here multiply infinity by zero is NaN. Note how printf distinguishes the sign of infinite values. I don’t think Java does this. It collates NaN into one representation. I will have to convert to raw long bits to get the sign for a cross reference to a C version. I don't follow. Why would it be useful to distinguish -NaN from NaN or i NaN? > > 4. Divide > > I have applied a change to this algorithm to check for overflow in the division. The reference text states: > > “if the first operand is a finite number and the second operand is an infinity, then the > result of the / operator is a zero;” > > But for example GNU g++ does this: > > divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 > > This is what the reference algorithm does if it is not changed. Here overflow cause problems in the result (which should be zero). So I think my patch is correct and the result should be (0 + i 0). Not sure; I'd think that division by an imaginary infinity should yield NaN. > The current unit tests are enforcing this requirement. > > 5. Edge cases > > Some of the methods have a large amount of if statements to test edge cases. These are very verbose and hard to follow. They also leave the common path of a non infinite, non nan, non zero value at the end. So a rearrangement may allow better branch prediction for the common use case. I’ll carry on working on this as I add more tests. > > 6. tanh > > I have put a TODO: note in the code about this. It tests edge case for infinites. Then doubles the real and imaginary components. 
These could then overflow to infinity and so trigger the edge case that has just been skipped. I am going to compare the result to the GNU C version to decide how to handle overflow. Thanks for the review! > > Numbers overall > > I’ve now updated the configs for checkstyle, PMD and spotbugs. Checkstyle:check How to disable "failOnViolation" from the command line? > and spotbugs:check are enabled. Pmd:check is not. There are some loose ends to fix for that. Unfortunately the syntax for exclusions in the PMD config file is poor. I’ve added some but I’ve not looked at everything yet. A few decisions might need to be taken on whether to drop rules rather than fix them to at least get pmd:check active. It spots a lot of things so is useful for PRs. Maybe. But how does one know what to fix when the build fails because of a CheckStyle or PMD violation? > I’ve also not looked at SonarCloud [1]. There are a few things on there to fix too. Most (all?) are false positives. > [1] https://sonarcloud.io/dashboard?id=commons-numbers Best, Gilles
> On 9 Nov 2019, at 02:42, Gilles Sadowski <[hidden email]> wrote: > > Hi. > > Le sam. 9 nov. 2019 à 01:48, Alex Herbert <[hidden email]> a écrit : >> >> >> >>> On 7 Nov 2019, at 23:29, Eric Barnhill <[hidden email]> wrote: >>> >>> On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski <[hidden email]> wrote: >>> >>>> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill <[hidden email]> a >>>> écrit : >>>>> >>>>> I should also add on this note, my use case for developing ComplexUtils >>>> in >>>>> the first place was compatibility with JTransforms and JOCL. In both >>>> cases >>>>> I wanted to convert Complex[] arrays into interleaved double[] arrays to >>>>> feed into algorithms using those libraries. >>>> >>>> Implicit in my remark below is the question: Where does the "Complex[]" >>>> come from? If it is never a good idea to create such an array, why provide >>>> code to convert from it? Do we agree that we should rather create the >>>> "ComplexList" abstraction, including accessors that shape the data for >>>> use with e.g. JTransforms? >>>> >>>> >>> I completely agree this is a superior solution and look forward to its >>> development. >> >> I think at least the minimum implementation of the abstraction will be fairly easy. It can then be performance checked with a few variants. >> >> There currently is not a JMH module for numbers. Should I create one under a module included with an optional profile? > > What is going to be benchmarked? The operations in Complex applied via a ComplexList or a Complex[], or a specialisation that works directly on the numbers and does not use Complex. If one of the purposes of the ComplexList is efficient computation on large amounts of data then it would be a start to show the efficiency. I can do this on a branch and see if it is useful. > >> >> I have spent a fair bit of time trying to fix coverage of Complex. It is now at 100% with 95% branch coverage. It currently tests all the edge cases for the C standard.
However it doesn’t really test actual computations. > > Actual computations? > Coveralls seems OK: > https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java The point is that you can get 100% coverage by calling all the methods. You don’t have to do assertions on the results. So for example sin() has no test on data that is not an edge case, for example: complex(3, 4).sin() = ??? The only test is: assertComplex(z.sin(), negI.multiply(iz.sinh())); And the tests for sinh() test the edge cases. This is why the following PR [1] exists that swaps the sign for the imaginary component in the sin() computation. There are no tests to show the functions working on “normal” computations. Essentially the most basic tests that compute the functions on data that should go down the common path are all missing; only the computation edge cases are asserted. So I’ll create some using GNU g++ to create expected results. [1] https://github.com/apache/commons-numbers/pull/31/files > >> I am going to continue on this to add more tests that the computations output the same values as a C reference implementation, and hit all the branches. >> >> When working on the class it raised a few questions: >> >> 1. HashCode >> >> This is currently the same for any number with a NaN component. However the C standard states that NaN components are allowed in infinites. So perhaps the hashCode should be updated to at least distinguish infinites from NaN. > > What would be the purpose? For storing in a HashSet. I noted this when first working on the class but on reflection I admit that it won’t matter.
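[Editor's note] Reference values for the "normal" sin() path discussed above can be generated from the standard identity sin(x + iy) = sin(x)cosh(y) + i cos(x)sinh(y), independently of the implementation under test. A hedged sketch (illustrative class name, not the Commons Numbers API):

```java
// Reference computation of complex sine from real-valued functions,
// usable to assert non-edge-case results such as sin(3 + 4i).
public class ComplexSinReference {
    /** Returns {re, im} of sin(x + iy) via the angle-sum identity. */
    public static double[] sin(double x, double y) {
        return new double[] {Math.sin(x) * Math.cosh(y),
                             Math.cos(x) * Math.sinh(y)};
    }
}
```

For z = 3 + 4i this gives approximately 3.853738 - 27.016813i, the kind of concrete expected value the missing common-path tests could assert against a C reference.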
The current equals method will distinguish any binary difference so these are not equal: Complex(inf, NaN) Complex(-inf, NaN) Complex(NaN, inf) Complex(NaN, -inf) They are all infinite but are different. They will be stored under the same hash along with any other Complex with NaN. It may be of use for searching to have any infinite stored under the same key but one that is different from a NaN. > >> >> 2. NaN definition >> >> What is not clear is if there exists a definition of NaN for a complex number. The C library has no way to test it with a single function call. You have to test both the real and imaginary components. On top of this you can do computation with NaNs. The GNU g++ compiler computes: >> >> (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf >> >> Thus this is allowing some computations with NaN to produce results. I don’t know if this is a bug in GNU g++ or intentional. The C standard states: >> >> “If both operands of the * operator are complex or if the second operand of the / operator >> is complex, the operator raises floating-point exceptions if appropriate for the calculation >> of the parts of the result, and may raise spurious floating-point exceptions.” >> >> That is pretty vague. It says that you can do what is “appropriate”. >> >> So it seems that perhaps we should only call a complex a NaN if both fields are NaN. If only one is NaN then the complex is semi-NaN and results of computations may or may not make sense. > > Do you have an example of a NaN that makes sense? > What is semi-NaN? semiNaN = when only one of the two components is NaN. The example above is a case which does not make sense to me: (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf I wrote tests that asserted this should be a NaN (anything multiplied by a complex with a NaN component should have a NaN in the results). But my tests failed which is why I found this.
So is this a bug in the reference implementation and in GNU g++, or a deliberate vagueness in working with NaN in complex numbers? I suppose you may do a computation where the imaginary component ends up as NaN. But then you go on to only need the real component. This may not be NaN and you continue without knowing the NaN exists in the imaginary. I don’t like to state that every operation with a NaN in one of the components is going to be set to (NaN,NaN). So I will continue to see how GNU g++ works and go along with that. It’s been around for a while and so I assume it would have been fixed if this is not the way to do it. > >> >> 3. Multiplication >> >> I applied a change to the algorithm provided in the C standard when recovering (NaN,NaN) results of multiplying infinities. If you multiply infinity by zero this should be NaN. >> >> “if one operand is an infinity and the other operand is a nonzero finite number or an >> infinity, then the result of the * operator is an infinity;" >> >> Note the “non-zero” condition. >> >> But the algorithm provided does not do this and you can get an infinite result back. >> >> This is what GNU g++ does: >> >> multiply(0 + i 0, 0 + i inf) = -nan + i -nan >> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan >> multiply(0 + i 0, -0 + i inf) = -nan + i -nan >> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan >> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan >> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan >> >> So they have implemented the multiply differently because here multiply infinity by zero is NaN. Note how printf distinguishes the sign of infinite values. I don’t think Java does this. It collates NaN into one representation. I will have to convert to raw long bits to get the sign for a cross reference to a C version. > > I don't follow. Why would it be useful to distinguish -NaN from NaN or i NaN? It was an observation. IEEE 754 allows multiple values for a NaN. IIUC, any non-zero mantissa with an exponent of 1024 is a NaN.
If the mantissa is zero it is infinite. C will distinguish the sign of NaN. Java will not. I suppose the authors of Java decided to go away from this as it is not useful. My motivation was to build a printf equivalent that does output the sign of the NaN. It would help when comparing the results of the Java code to the C code. I should also try and track down the C code for GNU g++ for multiply and divide. Maybe it has been changed from the reference provided in the C99 document. I’m not familiar with the source for g++ and it might be non-trivial to do this. > >> >> 4. Divide >> >> I have applied a change to this algorithm to check for overflow in the division. The reference text states: >> >> “if the first operand is a finite number and the second operand is an infinity, then the >> result of the / operator is a zero;” >> >> But for example GNU g++ does this: >> >> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 >> >> This is what the reference algorithm does if it is not changed. Here overflow causes problems in the result (which should be zero). So I think my patch is correct and the result should be (0 + i 0). > > Not sure; I'd think that division by an imaginary infinity should yield NaN. I read from the standard that divide by infinity is zero. This: divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 should be: divide(1e+308 + i 1e+308, inf + i inf) = 0 + i 0 The bug in the reference C code is overflow in the 'finite / infinite' recovery logic that does this: else if (isinf(logbw) && isfinite(a) && isfinite(b)) { c = copysign(isinf(c) ? 1.0 : 0.0, c); d = copysign(isinf(d) ? 1.0 : 0.0, d); x = 0.0 * ( a * c + b * d ); y = 0.0 * ( b * c - a * d ); } Here c and d are converted to 1. Then: a * c + b * d = 1e+308 * 1 + 1e+308 * 1 = infinity (overflow!). So: x = 0 * inf == NaN. x should be zero. > >> The current unit tests are enforcing this requirement. >> >> 5. Edge cases >> >> Some of the methods have a large number of if statements to test edge cases.
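[Editor's note] The overflow described above can be demonstrated in Java, together with one possible form of the fix (the thread does not show the actual patch, so the patched variant here is illustrative): scaling each term by 0.0 before summing keeps a*c + b*d from overflowing to infinity.

```java
// Java translation of the C99 Annex G finite/infinite divide recovery branch,
// restricted to the real component x, showing the overflow and a fix.
public class DivideRecovery {
    /** Reference form: x = 0.0 * (a*c + b*d); the sum can overflow. */
    public static double referenceX(double a, double b, double c, double d) {
        c = Math.copySign(Double.isInfinite(c) ? 1.0 : 0.0, c);
        d = Math.copySign(Double.isInfinite(d) ? 1.0 : 0.0, d);
        return 0.0 * (a * c + b * d);
    }

    /** Illustrative patched form: scale each term by 0.0 before summing. */
    public static double patchedX(double a, double b, double c, double d) {
        c = Math.copySign(Double.isInfinite(c) ? 1.0 : 0.0, c);
        d = Math.copySign(Double.isInfinite(d) ? 1.0 : 0.0, d);
        return 0.0 * a * c + 0.0 * b * d;
    }
}
```

With a = b = 1e308 and c = d = infinity, the reference form gives 0.0 * (1e308 + 1e308) = 0.0 * inf = NaN, while the patched form gives zero, matching the (0 + i 0) result the standard's text calls for.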
These are very verbose and hard to follow. They also leave the common path of a non infinite, non nan, non zero value at the end. So a rearrangement may allow better branch prediction for the common use case. I’ll carry on working on this as I add more tests. >> >> 6. tanh >> >> I have put a TODO: note in the code about this. It tests edge case for infinites. Then doubles the real and imaginary components. These could then overflow to infinite and so trigger the edge case that has just been skipped. I am going to compare the result to the GNU C version to decide how to handle overflow. > > Thanks for the review! > >> >> Numbers overall >> >> I’ve now updated the configs for checkstyle, PMD and spotbugs. Checkstyle:check > > How to disable "failOnViolation" from the command line? https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html <https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html> <failOnViolation> User property is: checkstyle.failOnViolation mvn checkstyle:check -Dcheckstyle.failOnViolation=false I would just run the report instead: mvn checkstyle:checkstyle (I use this in my .bashprofile: alias ch='mvn checkstyle:checkstyle’) We don’t have fail on violation set for the report so you can always generate it. It goes to ’target/site/checkstyle.html' > >> and spotbugs:check are enabled. Pmd:check is not. There are some loose ends to fix for that. Unfortunately the syntax for exclusions in the PMD config file are poor. I’ve added some but I’ve not looked at everything yet. A few decisions might need to be taken on whether to drops rules rather than fix them to at least get pmd:check active. It spots a lot of things so is useful for PRs. > > Maybe. But how does one know what to fix when the build fails because > of a CheckStyle > or PMD violation? Generate and read the reports. You don’t have to build the entire site: mvn pmd:pmd checkstyle:checkstyle Currently these are set not to log failures to the console. 
It could be changed. I just used the settings from RNG. > >> I’ve also not looked at SonarCloud [1]. There are a few things on there to fix too. > > Most (all?) are false positives. Maybe. If so I can configure them. I asked for Sonar admin access to RNG but I seem to have the admin access for Numbers too. > >> [1] https://sonarcloud.io/dashboard?id=commons-numbers <https://sonarcloud.io/dashboard?id=commons-numbers> <https://sonarcloud.io/dashboard?id=commons-numbers <https://sonarcloud.io/dashboard?id=commons-numbers>> > > > Best, > Gilles > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [hidden email] <mailto:[hidden email]> > For additional commands, e-mail: [hidden email] <mailto:[hidden email]> |
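A side note on the raw-bits point above: the sign bit of a NaN is recoverable in Java even though `Double.toString` collapses every NaN to "NaN". A minimal sketch of a printf-style formatter for cross-referencing with the C output (class and method names here are hypothetical):

```java
public class NanSign {

    /** Format a double the way C's printf does, keeping the sign of NaN. */
    static String cStyle(double d) {
        if (Double.isNaN(d)) {
            // doubleToRawLongBits preserves the NaN payload and sign bit;
            // doubleToLongBits would canonicalize to 0x7ff8000000000000L.
            long bits = Double.doubleToRawLongBits(d);
            return bits < 0 ? "-nan" : "nan";
        }
        return Double.toString(d);
    }

    public static void main(String[] args) {
        // A NaN with the sign bit set: exponent all ones, non-zero mantissa.
        double negNan = Double.longBitsToDouble(0xFFF8000000000000L);
        System.out.println(cStyle(Double.NaN)); // nan
        System.out.println(cStyle(negNan));     // -nan
    }
}
```

Testing the raw bits is the only way to see the sign, since `Double.NaN` itself is the canonical positive NaN (`0x7ff8000000000000L`).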
Hello Alex.
>>> [...] >>> I think at least the minimum implementation of the abstraction will be >>> fairly easy. It can then be performance checked with a few variants. >>> >>> There currently is not a JMH module for numbers. Should I create one >>> under a module included with an optional profile? >> >> What is going to be benchmarked? > > The operations in Complex applied via a ComplexList or a Complex[], or a > specialisation that works direct on the numbers and does not use Complex. If > one of the purposes of the ComplexList is efficient computation on large > amounts of data then it would be a start to show the efficiency. I can do > this on a branch and see if it is useful. No need for a branch, as all expressed opinions agree on the usefulness of "ComplexList", at least to save RAM. Indeed, a JMH module would then hopefully show that no performance is lost by the one additional creation of a "Complex" instance when the data the abstraction is backed by "double[]" (wrt "Complex[]"). >>> >>> I have spent a fair bit of time trying to fix coverage of Complex. It is >>> now at 100% with 95% branch coverage. It currently tests all the edge >>> cases for the C standard. However it doesn’t really test actual >>> computations. >> >> Actual computations? >> Coveralls seems OK: >> >> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java >> <https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java> > > The point is that you can get 100% coverage by calling all the methods. You > don’t have to do assertions on the results. So for example sin() has no test > on data that is not an edge case, for example: > > complex(3, 4).sin() = ??? > > The only test is: > > assertComplex(z.sin(), negI.multiply(iz.sinh())); > > And the tests for sinh() tests the edge cases. 
> > This is why the following PR [1] exists that swaps the sign for the > imaginary component in the sin() computation. There are no tests to show the > functions working on “normal” computations. Essentially the most basic tests > that compute the functions on data that should go down the common path are > all missing, Only the computation edge cases are asserted. So I’ll create > some using GNU g++ to create expected results. > > [1] https://github.com/apache/commons-numbers/pull/31/files > <https://github.com/apache/commons-numbers/pull/31/files> Got it. Thanks. >> >>> I am going to continue on this to add more tests that the computations >>> output the same values as a C reference implementation, and hit all the >>> branches. >>> >>> When working on the class it raised a few questions: >>> >>> 1. HashCode >>> >>> This is current the same for any number with a NaN component. However the >>> C standard states that NaN components are allowed in infinites. So >>> perhaps the hashCode should be updated to at least distinguish infinites >>> from NaN. >> >> What would be the purpose? > > For storing in a HashSet. I noted this when first working on the class but > on reflection I admit that it won’t matter. The current equals method will > distinguish any binary difference so these are not equal: > > Complex(inf, NaN) > Complex(-inf, NaN) > Complex(NaN, inf) > Complex(NaN, -inf) > > They are all infinite but are different. They will be stored under the same > hash along with any other Complex with NaN. It may be of use for searching > to have any infinite stored under the same key but one that is different > from a NaN. Finally, I think that the safer option is to follow the JDK convention and ensure that "hashCode is consistent with equals". How about using "java.util.Arrays" to define both? >>> >>> 2. NaN definition >>> >>> What is not clear is if there exists a definition of NaN for a complex >>> number. 
The C library has no way to test it with a single function call. >>> You have to test both the real and imaginary components. On top of this >>> you can do computation with NaNs. The GNU g++ compiler computes: >>> >>> (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf >>> >>> Thus this is allowing some computations with NaN to produce results. I >>> don’t know if this is a bug in GNU g++ or intentional. The C standard >>> states: >>> >>> “If both operands of the * operator are complex or if the second operand >>> of the / operator >>> is complex, the operator raises floating-point exceptions if appropriate >>> for the calculation >>> of the parts of the result, and may raise spurious floating-point >>> exceptions.” >>> >>> That is pretty vague. It says that you can do what is “appropriate”. >>> >>> So it seems that perhaps we should only call a complex a NaN is both >>> fields are NaN. If only one is NaN then the complex is semi-NaN and >>> results of computations may or may not make sense. >> >> Do you have an example of a NaN that makes sense? >> What is semi-NaN? > > semiNaN = when only one of the two component is NaN. > > The example above is a case which does not make sense to me: (1e300 + i > 1e300) * (1e30 + i NAN) = inf + I inf > > I wrote tests that asserted this should be a NaN (anything multiplied by a > complex with a NaN component should have a NaN in the results). But my tests > failed which is why I found this. So is this a bug in the reference > implementation and in GNU g++ or a deliberate vagueness on working with NaN > in complex numbers. > > I suppose you may do a computation where the Imaginary component ends up as > NaN. But then you go on to only need the real component. This may not be NaN > and you continue without knowing the NaN exists in the imaginary. Yes. I seem to remember that some computations should be allowed to continue even if intermediate results could be NaN... 
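The g++ result quoted above follows from the recovery step in the C99 Annex G reference multiplication: when both parts come out NaN but one operand is infinite, or an intermediate product overflowed, the NaN components are zeroed and the infinity is recovered. A rough Java port of that reference algorithm (an illustration of Annex G, not the Commons Numbers code) reproduces the computation:

```java
public class AnnexGMultiply {

    /** C99 Annex G reference multiply of (a + ib) * (c + id). */
    static double[] multiply(double a, double b, double c, double d) {
        double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
        double x = ac - bd;
        double y = ad + bc;
        if (Double.isNaN(x) && Double.isNaN(y)) {
            boolean recalc = false;
            if (Double.isInfinite(a) || Double.isInfinite(b)) {
                // First operand is infinite: "box" it to unit components
                // and change NaNs in the other operand to zero.
                a = Math.copySign(Double.isInfinite(a) ? 1.0 : 0.0, a);
                b = Math.copySign(Double.isInfinite(b) ? 1.0 : 0.0, b);
                if (Double.isNaN(c)) c = Math.copySign(0.0, c);
                if (Double.isNaN(d)) d = Math.copySign(0.0, d);
                recalc = true;
            }
            if (Double.isInfinite(c) || Double.isInfinite(d)) {
                c = Math.copySign(Double.isInfinite(c) ? 1.0 : 0.0, c);
                d = Math.copySign(Double.isInfinite(d) ? 1.0 : 0.0, d);
                if (Double.isNaN(a)) a = Math.copySign(0.0, a);
                if (Double.isNaN(b)) b = Math.copySign(0.0, b);
                recalc = true;
            }
            if (!recalc && (Double.isInfinite(ac) || Double.isInfinite(bd)
                    || Double.isInfinite(ad) || Double.isInfinite(bc))) {
                // Recover infinities from overflowed products: zero the NaNs.
                if (Double.isNaN(a)) a = Math.copySign(0.0, a);
                if (Double.isNaN(b)) b = Math.copySign(0.0, b);
                if (Double.isNaN(c)) c = Math.copySign(0.0, c);
                if (Double.isNaN(d)) d = Math.copySign(0.0, d);
                recalc = true;
            }
            if (recalc) {
                x = Double.POSITIVE_INFINITY * (a * c - b * d);
                y = Double.POSITIVE_INFINITY * (a * d + b * c);
            }
        }
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        // (1e300 + i 1e300) * (1e30 + i NaN): a*c overflows to infinity,
        // so the overflow branch zeroes the NaN and recovers inf + i inf.
        double[] r = multiply(1e300, 1e300, 1e30, Double.NaN);
        System.out.println(r[0] + " + i " + r[1]);
    }
}
```

Under these rules a NaN component does not always poison the result, which is presumably the "appropriate for the calculation" latitude the standard grants.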
> I don’t like to state that every operation with a NaN in one of the > components is going to be set to (NaN,NaN). So I will continue to see how > GNU g++ works and go along with that. It’s been around for a while and so I > assume it would have been fixed if this is not the way to do it. +1 Then, should we decide to align on the g++ results for all computations? [But not for "equals" if it would contradict the JDK.] >>> >>> 3. Multiplication >>> >>> I applied a change to the algorithm provided in the C standard when >>> recovering (NaN,NaN) results of multiplying infinities. If you multiply >>> infinity by zero this should be NaN. >>> >>> “if one operand is an infinity and the other operand is a nonzero finite >>> number or an >>> infinity, then the result of the * operator is an infinity;" >>> >>> Note the “non-zero” condition. >>> >>> But the algorithm provided does not do this and you can get an infinite >>> result back. >>> >>> This is what GNU g++ does: >>> >>> multiply(0 + i 0, 0 + i inf) = -nan + i -nan >>> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan >>> multiply(0 + i 0, -0 + i inf) = -nan + i -nan >>> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan >>> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan >>> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan >>> >>> So they have implemented the multiply differently because here multiply >>> infinity by zero is NaN. Note how printf distinguishes the sign of >>> infinite values. I don’t think Java does this. It collates NaN into one >>> representation. I will have to convert to raw long bits to get the sign >>> for a cross reference to a C version. >> >> I don't follow. Why would it be useful to distinguish -NaN from NaN or i >> NaN? > > It was an observation. IEEE 754 allows multiple values for a NaN. IIUC Any > non-zero mantissa with an exponent of 1024 is nan. If the mantissa is zero > it is infinite. C will distinguish the sign of NaN. Java will not. 
I suppose > the authors of Java decided to go away from this as it is not useful. > > My motivation was to build a printf equivalent that does output the sign of > the nan. It would help when comparing the results of the java code to the c > code. > > I should also try and track down the c code for GNU g++ for multiply and > divide. Maybe it has been changed from the reference provided in the C99 > document. I’m not familiar with the source for g++ and it might be non > trivial to do this. If the issues only arise in edge cases that involve indeterminate forms, I wouldn't worry too much, apart from stating the potential pitfalls in the Javadoc. If something there can be improved (e.g. after looking at math references), it could be done in a subsequent minor release. >> >>> >>> 4. Divide >>> >>> I have applied a change to this algorithm to check for overflow in the >>> division. The reference text states: >>> >>> “if the first operand is a finite number and the second operand is an >>> infinity, then the >>> result of the / operator is a zero;” >>> >>> But for example GNU g++ does this: >>> >>> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 >>> >>> This is what the reference algorithm does if it is not changed. Here >>> overflow cause problems in the result (which should be zero). So I think >>> my patch is correct and the result should be (0 + i 0). >> >> Not sure; I'd think that division by an imaginary infinity should yield >> NaN. > > I read from the standard that divide by infinity is zero. > > This > > divide(1e+308 + i 1e+308, inf + i inf) = -nan + I 0 > > Should be > > divide(1e+308 + i 1e+308, inf + i inf) = 0 + i 0 > > The bug in the reference c code is overflow in the ‘finite / infinite' > recovery logic that does this: > > else if (isinf(logbw) && > isfinite(a) && isfinite(b)) { > c = copysign(isinf(c) ? 1.0 : 0.0, c); > d = copysign(isinf(d) ? 
1.0 : 0.0, d); > x = 0.0 * ( a * c + b * d ); > y = 0.0 * ( b * c - a * d ); > } > > Here > > C and D are converted to 1. > Then: > a * c + b * d > 1e+308 * 1 + 1e+308 * 1 = Infinite (overflow!) > > So: > > x = 0 * inf == NaN. > > x should be zero. > Perhaps that under some conditions, it would be simpler to perform the computation in polar form (and hopefully get the expected result). >>> [...] >>> >>> Numbers overall >>> >>> I’ve now updated the configs for checkstyle, PMD and spotbugs. >>> Checkstyle:check >> >> How to disable "failOnViolation" from the command line? > > https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html > <https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html> > > <failOnViolation> > > User property is: checkstyle.failOnViolation > > > mvn checkstyle:check -Dcheckstyle.failOnViolation=false I think that I tried $ mvn -Dcheckstyle.failOnViolation=false test And still it wouldn't run the test (because I had introduced the "System.out" forbidden construct). > I would just run the report instead: > > mvn checkstyle:checkstyle > > (I use this in my .bashprofile: alias ch='mvn checkstyle:checkstyle’) > > We don’t have fail on violation set for the report so you can always > generate it. It goes to ’target/site/checkstyle.html' >> >>> and spotbugs:check are enabled. Pmd:check is not. There are some loose >>> ends to fix for that. Unfortunately the syntax for exclusions in the PMD >>> config file are poor. I’ve added some but I’ve not looked at everything >>> yet. A few decisions might need to be taken on whether to drops rules >>> rather than fix them to at least get pmd:check active. It spots a lot of >>> things so is useful for PRs. >> >> Maybe. But how does one know what to fix when the build fails because >> of a CheckStyle >> or PMD violation? > > Generate and read the reports. Well, of course but with yesterday's config, the build would just fail, without having a report to read. 
> You don’t have to build the entire site: When the focus is on the report, I'd find it easier to generate all of them and read them on the web site... When the focus is on debugging/coverage, I prefer that checks are not part of the build. [But I agree that it's better to intercept problematic PRs...] > > mvn pmd:pmd checkstyle:checkstyle > > Currently these are set not to log failures to the console. It could be > changed. I just used the settings from RNG. Sure. But, there, we are already much more confident in the overall shape of the test suite. Anyway, thanks to your involvement, we are on that path for [Numbers] too. :-) >> >>> I’ve also not looked at SonarCloud [1]. There are a few things on there >>> to fix too. >> >> Most (all?) are false positives. > > Maybe. The likes of "method too long", "commented out code to be removed", "complexity too high", ... Regards, Gilles > If so I can configure them. I asked for Sonar admin access to RNG but > I seem to have the admin access for Numbers too. > >> >>> [1] https://sonarcloud.io/dashboard?id=commons-numbers >>> <https://sonarcloud.io/dashboard?id=commons-numbers> >>> <https://sonarcloud.io/dashboard?id=commons-numbers >>> <https://sonarcloud.io/dashboard?id=commons-numbers>> --------------------------------------------------------------------- To unsubscribe, e-mail: [hidden email] For additional commands, e-mail: [hidden email] |
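On the divide overflow discussed above: scaling each product by 0.0 before summing is one way to keep the 'finite / infinite' recovery from manufacturing 0 * inf = NaN. A hedged Java sketch of that rearrangement (an illustration of the idea, not the actual patch applied to Commons Numbers):

```java
public class DivideRecovery {

    /**
     * The 'finite / infinite' recovery branch of the C99 Annex G divide,
     * for finite (a + ib) divided by infinite (c + id), rearranged so the
     * zero factor is applied per term. The reference code computes
     * 0.0 * (a*c + b*d), which is NaN when a*c + b*d overflows to infinity.
     */
    static double[] finiteDividedByInfinite(double a, double b, double c, double d) {
        double cc = Math.copySign(Double.isInfinite(c) ? 1.0 : 0.0, c);
        double dd = Math.copySign(Double.isInfinite(d) ? 1.0 : 0.0, d);
        // 0.0 * (a * cc) is at most 0.0 * Double.MAX_VALUE == 0.0,
        // so the sum of the scaled terms can no longer overflow.
        double x = 0.0 * (a * cc) + 0.0 * (b * dd);
        double y = 0.0 * (b * cc) - 0.0 * (a * dd);
        return new double[] {x, y};
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        // The unpatched reference yields NaN + i 0 here because
        // 1e308 + 1e308 overflows; the per-term form gives 0 + i 0.
        double[] r = finiteDividedByInfinite(1e308, 1e308, inf, inf);
        System.out.println(r[0] + " + i " + r[1]);
    }
}
```

One caveat with the per-term scaling is that the signs of the resulting zeros can differ from the whole-sum form, so the signed-zero edge cases would need checking against the C output as well.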
> On 9 Nov 2019, at 12:23, Gilles Sadowski <[hidden email]> wrote: > > Hello Alex. > >>>> [...] >>>> I think at least the minimum implementation of the abstraction will be >>>> fairly easy. It can then be performance checked with a few variants. >>>> >>>> There currently is not a JMH module for numbers. Should I create one >>>> under a module included with an optional profile? >>> >>> What is going to be benchmarked? >> >> The operations in Complex applied via a ComplexList or a Complex[], or a >> specialisation that works direct on the numbers and does not use Complex. If >> one of the purposes of the ComplexList is efficient computation on large >> amounts of data then it would be a start to show the efficiency. I can do >> this on a branch and see if it is useful. > > No need for a branch, as all expressed opinions agree on the usefulness > of "ComplexList", at least to save RAM. > Indeed, a JMH module would then hopefully show that no performance is > lost by the one additional creation of a "Complex" instance when the data > the abstraction is backed by "double[]" (wrt "Complex[]”). Since this is the crux of my use case it will be interesting to see if using numbers can be as performant as what I am currently doing. > >>>> >>>> I have spent a fair bit of time trying to fix coverage of Complex. It is >>>> now at 100% with 95% branch coverage. It currently tests all the edge >>>> cases for the C standard. However it doesn’t really test actual >>>> computations. >>> >>> Actual computations? >>> Coveralls seems OK: >>> >>> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java >>> <https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java> >> >> The point is that you can get 100% coverage by calling all the methods. You >> don’t have to do assertions on the results. 
So for example sin() has no test >> on data that is not an edge case, for example: >> >> complex(3, 4).sin() = ??? >> >> The only test is: >> >> assertComplex(z.sin(), negI.multiply(iz.sinh())); >> >> And the tests for sinh() tests the edge cases. >> >> This is why the following PR [1] exists that swaps the sign for the >> imaginary component in the sin() computation. There are no tests to show the >> functions working on “normal” computations. Essentially the most basic tests >> that compute the functions on data that should go down the common path are >> all missing, Only the computation edge cases are asserted. So I’ll create >> some using GNU g++ to create expected results. >> >> [1] https://github.com/apache/commons-numbers/pull/31/files >> <https://github.com/apache/commons-numbers/pull/31/files> > > Got it. Thanks. > >>> >>>> I am going to continue on this to add more tests that the computations >>>> output the same values as a C reference implementation, and hit all the >>>> branches. >>>> >>>> When working on the class it raised a few questions: >>>> >>>> 1. HashCode >>>> >>>> This is current the same for any number with a NaN component. However the >>>> C standard states that NaN components are allowed in infinites. So >>>> perhaps the hashCode should be updated to at least distinguish infinites >>>> from NaN. >>> >>> What would be the purpose? >> >> For storing in a HashSet. I noted this when first working on the class but >> on reflection I admit that it won’t matter. The current equals method will >> distinguish any binary difference so these are not equal: >> >> Complex(inf, NaN) >> Complex(-inf, NaN) >> Complex(NaN, inf) >> Complex(NaN, -inf) >> >> They are all infinite but are different. They will be stored under the same >> hash along with any other Complex with NaN. It may be of use for searching >> to have any infinite stored under the same key but one that is different >> from a NaN. 
> > Finally, I think that the safer option is to follow the JDK convention > and ensure that "hashCode is consistent with equals". > > How about using "java.util.Arrays" to define both? Arrays: public static int hashCode(double a[]) { if (a == null) return 0; int result = 1; for (double element : a) { long bits = Double.doubleToLongBits(element); result = 31 * result + (int)(bits ^ (bits >>> 32)); } return result; } Double public static int hashCode(double value) { long bits = doubleToLongBits(value); return (int)(bits ^ (bits >>> 32)); } So the same result as Arrays.hashCode should be this: return 31 * (31 + Double.hashCode(real)) + Double.hashCode(imaginary); It can be checked it is the same as Arrays.hashCode(new double[]{real, imaginary}). The current hash is: return 37 * (17 * Double.hashCode(imaginary) + Double.hashCode(real)); So not much difference except the factors, order of arguments and that the inner operation on imaginary is changed from * to +. Basically we drop the condition that any complex with a NaN has the same hash code and state the hash code will match Arrays.hashCode({real, imaginary}). > >>>> >>>> 2. NaN definition >>>> >>>> What is not clear is if there exists a definition of NaN for a complex >>>> number. The C library has no way to test it with a single function call. >>>> You have to test both the real and imaginary components. On top of this >>>> you can do computation with NaNs. The GNU g++ compiler computes: >>>> >>>> (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf >>>> >>>> Thus this is allowing some computations with NaN to produce results. I >>>> don’t know if this is a bug in GNU g++ or intentional. 
The C standard >>>> states: >>>> >>>> “If both operands of the * operator are complex or if the second operand >>>> of the / operator >>>> is complex, the operator raises floating-point exceptions if appropriate >>>> for the calculation >>>> of the parts of the result, and may raise spurious floating-point >>>> exceptions.” >>>> >>>> That is pretty vague. It says that you can do what is “appropriate”. >>>> >>>> So it seems that perhaps we should only call a complex a NaN is both >>>> fields are NaN. If only one is NaN then the complex is semi-NaN and >>>> results of computations may or may not make sense. >>> >>> Do you have an example of a NaN that makes sense? >>> What is semi-NaN? >> >> semiNaN = when only one of the two component is NaN. >> >> The example above is a case which does not make sense to me: (1e300 + i >> 1e300) * (1e30 + i NAN) = inf + I inf >> >> I wrote tests that asserted this should be a NaN (anything multiplied by a >> complex with a NaN component should have a NaN in the results). But my tests >> failed which is why I found this. So is this a bug in the reference >> implementation and in GNU g++ or a deliberate vagueness on working with NaN >> in complex numbers. >> >> I suppose you may do a computation where the Imaginary component ends up as >> NaN. But then you go on to only need the real component. This may not be NaN >> and you continue without knowing the NaN exists in the imaginary. > > Yes. I seem to remember that some computations should be allowed > to continue even if intermediate results could be NaN... > >> I don’t like to state that every operation with a NaN in one of the >> components is going to be set to (NaN,NaN). So I will continue to see how >> GNU g++ works and go along with that. It’s been around for a while and so I >> assume it would have been fixed if this is not the way to do it. > > +1 > > Then, should we decide to align on the g++ results for all computations? 
> [But not for "equals" if it would contradict the JDK.] I’ll write a c++ class to output example results. Then test it with g++ and clang. Hopefully they will agree. Then we can align with them both. If they do not agree then we have some evidence that there is flexibility, perhaps more for the undocumented edge cases than anything else. > >>>> >>>> 3. Multiplication >>>> >>>> I applied a change to the algorithm provided in the C standard when >>>> recovering (NaN,NaN) results of multiplying infinities. If you multiply >>>> infinity by zero this should be NaN. >>>> >>>> “if one operand is an infinity and the other operand is a nonzero finite >>>> number or an >>>> infinity, then the result of the * operator is an infinity;" >>>> >>>> Note the “non-zero” condition. >>>> >>>> But the algorithm provided does not do this and you can get an infinite >>>> result back. >>>> >>>> This is what GNU g++ does: >>>> >>>> multiply(0 + i 0, 0 + i inf) = -nan + i -nan >>>> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan >>>> multiply(0 + i 0, -0 + i inf) = -nan + i -nan >>>> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan >>>> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan >>>> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan >>>> >>>> So they have implemented the multiply differently because here multiply >>>> infinity by zero is NaN. Note how printf distinguishes the sign of >>>> infinite values. I don’t think Java does this. It collates NaN into one >>>> representation. I will have to convert to raw long bits to get the sign >>>> for a cross reference to a C version. >>> >>> I don't follow. Why would it be useful to distinguish -NaN from NaN or i >>> NaN? >> >> It was an observation. IEEE 754 allows multiple values for a NaN. IIUC Any >> non-zero mantissa with an exponent of 1024 is nan. If the mantissa is zero >> it is infinite. C will distinguish the sign of NaN. Java will not. I suppose >> the authors of Java decided to go away from this as it is not useful. 
>> >> My motivation was to build a printf equivalent that does output the sign of >> the nan. It would help when comparing the results of the java code to the c >> code. >> >> I should also try and track down the c code for GNU g++ for multiply and >> divide. Maybe it has been changed from the reference provided in the C99 >> document. I’m not familiar with the source for g++ and it might be non >> trivial to do this. > > If the issues only arise in edge cases that involve indeterminate forms, > I wouldn't worry too much, apart from stating the potential pitfalls in > the Javadoc. If something there can be improved (e.g. after looking at > math references), it could be done in a subsequent minor release. > >>> >>>> >>>> 4. Divide >>>> >>>> I have applied a change to this algorithm to check for overflow in the >>>> division. The reference text states: >>>> >>>> “if the first operand is a finite number and the second operand is an >>>> infinity, then the >>>> result of the / operator is a zero;” >>>> >>>> But for example GNU g++ does this: >>>> >>>> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0 >>>> >>>> This is what the reference algorithm does if it is not changed. Here >>>> overflow cause problems in the result (which should be zero). So I think >>>> my patch is correct and the result should be (0 + i 0). >>> >>> Not sure; I'd think that division by an imaginary infinity should yield >>> NaN. >> >> I read from the standard that divide by infinity is zero. >> >> This >> >> divide(1e+308 + i 1e+308, inf + i inf) = -nan + I 0 >> >> Should be >> >> divide(1e+308 + i 1e+308, inf + i inf) = 0 + i 0 >> >> The bug in the reference c code is overflow in the ‘finite / infinite' >> recovery logic that does this: >> >> else if (isinf(logbw) && >> isfinite(a) && isfinite(b)) { >> c = copysign(isinf(c) ? 1.0 : 0.0, c); >> d = copysign(isinf(d) ? 1.0 : 0.0, d); >> x = 0.0 * ( a * c + b * d ); >> y = 0.0 * ( b * c - a * d ); >> } >> >> Here >> >> C and D are converted to 1. 
>> Then: >> a * c + b * d >> 1e+308 * 1 + 1e+308 * 1 = Infinite (overflow!) >> >> So: >> >> x = 0 * inf == NaN. >> >> x should be zero. >> > > Perhaps that under some conditions, it would be simpler to > perform the computation in polar form (and hopefully get > the expected result). > >>>> [...] >>>> >>>> Numbers overall >>>> >>>> I’ve now updated the configs for checkstyle, PMD and spotbugs. >>>> Checkstyle:check >>> >>> How to disable "failOnViolation" from the command line? >> >> https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html >> <https://maven.apache.org/plugins/maven-checkstyle-plugin/check-mojo.html> >> >> <failOnViolation> >> >> User property is: checkstyle.failOnViolation >> >> >> mvn checkstyle:check -Dcheckstyle.failOnViolation=false > > I think that I tried > $ mvn -Dcheckstyle.failOnViolation=false test > > And still it wouldn't run the test (because I had introduced the > "System.out" forbidden construct). It may not have been your usage. I was playing the checkstyle this morning and the checkstyle runtime properties were not working. So I just updated the pom. The runtime properties for PMD did work. So perhaps checkstyle has changed the properties or there is another one to set that I did not find. > >> I would just run the report instead: >> >> mvn checkstyle:checkstyle >> >> (I use this in my .bashprofile: alias ch='mvn checkstyle:checkstyle’) >> >> We don’t have fail on violation set for the report so you can always >> generate it. It goes to ’target/site/checkstyle.html' >>> >>>> and spotbugs:check are enabled. Pmd:check is not. There are some loose >>>> ends to fix for that. Unfortunately the syntax for exclusions in the PMD >>>> config file are poor. I’ve added some but I’ve not looked at everything >>>> yet. A few decisions might need to be taken on whether to drops rules >>>> rather than fix them to at least get pmd:check active. It spots a lot of >>>> things so is useful for PRs. >>> >>> Maybe. 
But how does one know what to fix when the build fails because >>> of a CheckStyle >>> or PMD violation? >> >> Generate and read the reports. > > Well, of course but with yesterday's config, the build would just > fail, without having a report to read. > >> You don’t have to build the entire site: > > When the focus is on the report, I'd find it easier to generate all of > them and read them on the web site... > > When the focus is on debugging/coverage, I prefer that checks are > not part of the build. > [But I agree that it's better to intercept problematic PRs...] > >> >> mvn pmd:pmd checkstyle:checkstyle >> >> Currently these are set not to log failures to the console. It could be >> changed. I just used the settings from RNG. > > Sure. > But, there, we are already much more confident in the overall > shape of the test suite. > Anyway, thanks to your involvement, we are on that path for > [Numbers] too. :-) > >>> >>>> I’ve also not looked at SonarCloud [1]. There are a few things on there >>>> to fix too. >>> >>> Most (all?) are false positives. >> >> Maybe. > > The likes of "method too long", "commented out code to be removed", > "complexity too high", ... > > Regards, > Gilles > >> If so I can configure them. I asked for Sonar admin access to RNG but >> I seem to have the admin access for Numbers too. >> >>> >>>> [1] https://sonarcloud.io/dashboard?id=commons-numbers >>>> <https://sonarcloud.io/dashboard?id=commons-numbers> >>>> <https://sonarcloud.io/dashboard?id=commons-numbers >>>> <https://sonarcloud.io/dashboard?id=commons-numbers>> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [hidden email] > For additional commands, e-mail: [hidden email] > |
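The claim above that 31 * (31 + Double.hashCode(real)) + Double.hashCode(imaginary) matches Arrays.hashCode(new double[] {real, imaginary}) is easy to verify, including for the NaN and infinite combinations listed earlier (class name hypothetical):

```java
import java.util.Arrays;

public class ComplexHashCheck {

    /** Unrolled Arrays.hashCode for a two-element double array. */
    static int hash(double real, double imaginary) {
        return 31 * (31 + Double.hashCode(real)) + Double.hashCode(imaginary);
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        double nan = Double.NaN;
        double[][] samples = {
            {3, 4}, {-0.0, 0.0},
            {inf, nan}, {-inf, nan}, {nan, inf}, {nan, -inf},
        };
        for (double[] s : samples) {
            int expected = Arrays.hashCode(new double[] {s[0], s[1]});
            if (hash(s[0], s[1]) != expected) {
                throw new AssertionError(Arrays.toString(s));
            }
        }
        System.out.println("all match");
    }
}
```

Note that this scheme also separates the infinite values from the canonical NaN hash (Double.hashCode distinguishes inf, -inf and NaN), which addresses the earlier HashSet point while keeping hashCode consistent with equals.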
> On 9 Nov 2019, at 12:23, Gilles Sadowski <[hidden email]> wrote: > >> Snip > > I think that I tried > $ mvn -Dcheckstyle.failOnViolation=false test > > And still it wouldn't run the test (because I had introduced the > "System.out" forbidden construct). Checkstyle is configured to run in the validate phase (right at the start) [1]. The default is normally verify. This was copied from commons RNG and predates my joining the project. So this is why you cannot run ‘mvn test’ as it runs checkstyle:check and fails you before doing the test. Now that fail on violation is ‘true’, perhaps we should move the check back to verify. Then you still could not do a 'mvn install' with any checkstyle errors, but 'mvn test' would run. You can also run mvn checkstyle:check to run the goal manually. [1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html <http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html> |
> On 9 Nov 2019, at 18:10, Alex Herbert <[hidden email]> wrote: > > > >> On 9 Nov 2019, at 12:23, Gilles Sadowski <[hidden email] <mailto:[hidden email]>> wrote: >> >>> Snip >> >> I think that I tried >> $ mvn -Dcheckstyle.failOnViolation=false test >> >> And still it wouldn't run the test (because I had introduced the >> "System.out" forbidden construct). > > Checkstyle is configured to run in the validate phase (right at the start) [1]. The default is normally verify. This was copied from commons RNG and predates my joining the project. > > So this is why you cannot run ‘mvn test’ as it runs checkstyle:check and fails you before doing the test. > > Now fail on violation is ‘true’ perhaps we should move the check back to validate. So you cannot do a 'mvn install' with any checkstyle errors. You can also run mvn checkstyle:check to run the goal manually. > > [1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html <http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html> *** Checkstyle ignores command-line user properties in favour of the POM configuration. *** (Examples are shown below) If you want to run tests and use System.out.print to debug stuff (which is a very reasonable dev action) then: 1. We move checkstyle to run after the test phase (i.e. in verify) 2. You have to put '// CHECKSTYLE: stop all’ or e.g. ‘// CHECKSTYLE: stop RegexpCheck’ (for System.out violations) in the file you are working on 3. You have to use <failOnViolation>false</failOnViolation> in the POM (temporarily) 4. You can use an *unset* property from the POM: -Dcheckstyle.maxAllowedViolations=10 I favour 1 or 4. Option 1 would have less typing and less to remember. — Examples: Hypothesis: Checkstyle is not using the properties set using -D. User set properties should override the POM. This is how PMD works. If you make a violation for PMD and do this: mvn pmd:check -Dpmd.failOnViolation=true -X You can see: [DEBUG] (f) failOnViolation = true And it fails. 
mvn pmd:check -Dpmd.failOnViolation=false -X

You can see:

[DEBUG] (f) failOnViolation = false

And it is allowed.

Trying the same with checkstyle:

mvn checkstyle:check -Dcheckstyle.failOnViolation=true/false -Dcheckstyle.failsOnError=true/false -X

All variants of true/false show the config from the POM:

[DEBUG] (f) failOnViolation = true
[DEBUG] (f) failsOnError = false

Other properties work as documented:

mvn checkstyle:check -Dcheckstyle.maxAllowedViolations=10

This shows as:

[DEBUG] (f) maxAllowedViolations = 10

And the build passes.

If I remove:

<failOnViolation>true</failOnViolation>

from the checkstyle build plugin configuration in the POM, then the command-line user properties work, i.e.

mvn checkstyle:check -Dcheckstyle.failOnViolation=false
[DEBUG] (f) failOnViolation = false
-> PASS

mvn checkstyle:check -Dcheckstyle.failOnViolation=true
[DEBUG] (f) failOnViolation = true
-> FAIL

mvn checkstyle:check
[DEBUG] (f) failOnViolation = true
-> FAIL

The last case uses the default value of true. It is still logged to the console. The 'effective POM' does not have the <failOnViolation> property anywhere, so checkstyle is logging this even though it has not been configured.

I tried checkstyle plugin 3.0.0 and 3.1.0 (the latest). Same result.

As a control I put <maxAllowedViolations>10</maxAllowedViolations> in the checkstyle POM configuration. Checkstyle then ignores the same command-line switch that worked before.

*** So it seems checkstyle ignores command-line user properties in favour of the POM configuration. ***
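[Editor's note: option 2 in the email above relies on checkstyle's comment-based suppression filters. A minimal sketch of what that could look like in a source file follows; the class and method names are hypothetical, and the exact 'stop'/'resume' comment format only works if the project's checkstyle.xml configures a matching suppression filter.]

```java
// Hypothetical file used while debugging. The CHECKSTYLE comments ask a
// configured suppression filter to skip the RegexpCheck that forbids
// System.out, but only for the enclosed region.
public class DebugExample {

    // Stand-in for whatever value is being inspected during debugging.
    static int compute() {
        return 6 * 7;
    }

    public static void main(String[] args) {
        // CHECKSTYLE: stop RegexpCheck
        System.out.println("debug value: " + compute()); // temporary debug output
        // CHECKSTYLE: resume RegexpCheck
    }
}
```

The comments are stripped of effect once the filter is removed, so they are easy to forget in committed code, which is one reason the email favours options 1 or 4 instead.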
On Sun, 10 Nov 2019 at 00:33, Alex Herbert <[hidden email]> wrote:
> > On 9 Nov 2019, at 18:10, Alex Herbert <[hidden email]> wrote:
> >
> >> On 9 Nov 2019, at 12:23, Gilles Sadowski <[hidden email]> wrote:
> >>
> >>> Snip
> >>
> >> I think that I tried
> >> $ mvn -Dcheckstyle.failOnViolation=false test
> >>
> >> And still it wouldn't run the test (because I had introduced the
> >> "System.out" forbidden construct).
> >
> > Checkstyle is configured to run in the validate phase (right at the start) [1]. The default is normally verify. This was copied from Commons RNG and predates my joining the project.
> >
> > So this is why you cannot run 'mvn test': it runs checkstyle:check and fails you before doing the tests.
> >
> > Now that failOnViolation is 'true', perhaps we should move the check back to verify. That way you still cannot do a 'mvn install' with any checkstyle errors. You can also run mvn checkstyle:check to run the goal manually.
> >
> > [1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html
>
> Short answer:
>
> *** Checkstyle ignores command-line user properties in favour of the POM configuration. ***
>
> (Examples are shown below.)
>
> If you want to run tests and use System.out.print to debug stuff (which is a very reasonable dev action) then:
>
> 1. We move checkstyle to run after the test phase (i.e. in verify)
> 2. You have to put '// CHECKSTYLE: stop all' or e.g. '// CHECKSTYLE: stop RegexpCheck' (for System.out violations) in the file you are working on
> 3. You have to use <failOnViolation>false</failOnViolation> in the POM (temporarily)
> 4. You can use a property that is *unset* in the POM: -Dcheckstyle.maxAllowedViolations=10
>
> I favour 1 or 4.
>
> Option 1 would have less typing and less to remember.

+1 :-)

Gilles

> —
>
> Examples:
>
> Hypothesis: Checkstyle is not using the properties set using -D.
>
> User-set properties should override the POM. This is how PMD works.
> If you make a violation for PMD and do this:
>
> mvn pmd:check -Dpmd.failOnViolation=true -X
>
> You can see:
>
> [DEBUG] (f) failOnViolation = true
>
> And it fails.
>
> mvn pmd:check -Dpmd.failOnViolation=false -X
>
> You can see:
>
> [DEBUG] (f) failOnViolation = false
>
> And it is allowed.
>
> Trying the same with checkstyle:
>
> mvn checkstyle:check -Dcheckstyle.failOnViolation=true/false -Dcheckstyle.failsOnError=true/false -X
>
> All variants of true/false show the config from the POM:
>
> [DEBUG] (f) failOnViolation = true
> [DEBUG] (f) failsOnError = false
>
> Other properties work as documented:
>
> mvn checkstyle:check -Dcheckstyle.maxAllowedViolations=10
>
> This shows as:
>
> [DEBUG] (f) maxAllowedViolations = 10
>
> And the build passes.
>
> If I remove:
>
> <failOnViolation>true</failOnViolation>
>
> from the checkstyle build plugin configuration in the POM, then the command-line user properties work, i.e.
>
> mvn checkstyle:check -Dcheckstyle.failOnViolation=false
> [DEBUG] (f) failOnViolation = false
> -> PASS
>
> mvn checkstyle:check -Dcheckstyle.failOnViolation=true
> [DEBUG] (f) failOnViolation = true
> -> FAIL
>
> mvn checkstyle:check
> [DEBUG] (f) failOnViolation = true
> -> FAIL
>
> The last case uses the default value of true. It is still logged to the console. The 'effective POM' does not have the <failOnViolation> property anywhere, so checkstyle is logging this even though it has not been configured.
>
> I tried checkstyle plugin 3.0.0 and 3.1.0 (the latest). Same result.
>
> As a control I put <maxAllowedViolations>10</maxAllowedViolations> in the checkstyle POM configuration. Checkstyle then ignores the same command-line switch that worked before.
>
> *** So it seems checkstyle ignores command-line user properties in favour of the POM configuration. ***