[math] Major speed improvement to 1D FFT (MATH-732)


[math] Major speed improvement to 1D FFT (MATH-732)

Sébastien Brisard
Hi,
Kurt has recently proposed a patch for 1D-FFT which is much faster
than our current impl, while still passing the tests (of course).
Am I likely to hurt anyone's feelings if I replace the old impl by the new impl?

Also, I think that one of the reasons why our previous implementation
was so slow is because it used internally Complex objects. Since these
objects are immutable, we end up creating *a lot* of small objects.
Apparently, the GC and JIT are not so good at that ;) (not being an
expert, I'm not sure that the culprit is indeed the JIT or the GC...
but surely someone is to be blamed).

I remember we recently had this conversation on the ML: one of Gilles
or Luc argued that for very low-level algorithms, it didn't really
matter if the parameters were passed as "very crude" objects, since it
was most likely that the user would have to write an adapter to its
own data format. So I would suggest we give up Complex[] altogether in
the interface of FFT, and replace it with double[] arrays, with the
following layout:
data[2 * i] = real part
data[2 * i + 1] = imaginary part.
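To make the proposed layout concrete, here is a minimal sketch of helpers that pack and read back interleaved data (the class and method names are purely illustrative, not part of the proposal):

```java
// Purely illustrative helpers for the proposed interleaved layout
// (data[2 * i] = real part, data[2 * i + 1] = imaginary part).
public class InterleavedLayout {

    /** Packs separate real/imaginary arrays into one interleaved array. */
    public static double[] pack(double[] real, double[] imag) {
        final double[] data = new double[2 * real.length];
        for (int i = 0; i < real.length; i++) {
            data[2 * i] = real[i];      // real part
            data[2 * i + 1] = imag[i];  // imaginary part
        }
        return data;
    }

    /** Reads back the real part of the i-th sample. */
    public static double real(double[] data, int i) {
        return data[2 * i];
    }

    /** Reads back the imaginary part of the i-th sample. */
    public static double imag(double[] data, int i) {
        return data[2 * i + 1];
    }

    public static void main(String[] args) {
        double[] data = pack(new double[] {1, 3}, new double[] {2, 4});
        System.out.println(java.util.Arrays.toString(data)); // [1.0, 2.0, 3.0, 4.0]
    }
}
```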

What do you think?
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [math] Major speed improvement to 1D FFT (MATH-732)

Luc Maisonobe
On 07/02/2012 07:50, Sébastien Brisard wrote:
> Hi,

Hello Sébastien,

> Kurt has recently proposed a patch for 1D-FFT which is much faster
> than our current impl, while still passing the tests (of course).
> Am I likely to hurt anyone's feelings if I replace the old impl by the new impl?

Please, go ahead, a faster implementation is always good, as long as it
can be maintained.

>
> Also, I think that one of the reasons why our previous implementation
> was so slow is because it used internally Complex objects. Since these
> objects are immutable, we end up creating *a lot* of small objects.
> Apparently, the GC and JIT are not so good at that ;) (not being an
> expert, I'm not sure that the culprit is indeed the JIT or the GC...
> but surely someone is to be blamed).

This seems strange to me. Lots of improvements have been added to Java 5
and Java 6 about JVM optimization, and small objects are not really
costly now. I do trust JVM implementors here and really like using small
immutable objects.

>
> I remember we recently had this conversation on the ML: one of Gilles
> or Luc argued that for very low-level algorithms, it didn't really
> matter if the parameters were passed as "very crude" objects, since it
> was most likely that the user would have to write an adapter to its
> own data format. So I would suggest we give up Complex[] altogether in
> the interface of FFT, and replace it with double[] arrays, with the
> following layout :
> data[2 * i] = real part
> data[2 * i + 1] = imaginary part.
>
> What do you think?

I agree with this view (it may be me who expressed this). A double array
as an interface for our library seems good to me.

Luc

> Sébastien
>




Re: [math] Major speed improvement to 1D FFT (MATH-732)

Sébastien Brisard
Hello Luc,

>
> Hello Sébastien,
>
>> Kurt has recently proposed a patch for 1D-FFT which is much faster
>> than our current impl, while still passing the tests (of course).
>> Am I likely to hurt anyone's feelings if I replace the old impl by the new impl?
>
> Please, go ahead, a faster implementation is always good, as long as it
> can be maintained.
>
>>
>> Also, I think that one of the reasons why our previous implementation
>> was so slow is because it used internally Complex objects. Since these
>> objects are immutable, we end up creating *a lot* of small objects.
>> Apparently, the GC and JIT are not so good at that ;) (not being an
>> expert, I'm not sure that the culprit is indeed the JIT or the GC...
>> but surely someone is to be blamed).
>
> This seems strange to me. Lots of improvements have been added to Java 5
> and Java 6 about JVM optimization, and small objects are not really
> costly now. I do trust JVM implementors here and really like using small
> immutable objects.
>

Actually, the more I work on commons-math, the more I like small,
immutable objects! What I wrote was just a "feeling", though. I
haven't done any proper monitoring. I plan to do that though, as this
could provide us with some guidelines on what we should and should not
do.

>>
>> I remember we recently had this conversation on the ML: one of Gilles
>> or Luc argued that for very low-level algorithms, it didn't really
>> matter if the parameters were passed as "very crude" objects, since it
>> was most likely that the user would have to write an adapter to its
>> own data format. So I would suggest we give up Complex[] altogether in
>> the interface of FFT, and replace it with double[] arrays, with the
>> following layout :
>> data[2 * i] = real part
>> data[2 * i + 1] = imaginary part.
>>
>> What do you think?
>
> I agree with this view (it may be me who expressed this). A double array
> as an interface for our library seems good to me.
>
> Luc
>
Sébastien




Re: [math] Major speed improvement to 1D FFT (MATH-732)

Gilles Sadowski
In reply to this post by Luc Maisonobe
Hello.

>
> Hello Sébastien,
>
> > Kurt has recently proposed a patch for 1D-FFT which is much faster
> > than our current impl, while still passing the tests (of course).
> > Am I likely to hurt anyone's feelings if I replace the old impl by the new impl?
>
> Please, go ahead, a faster implementation is always good, as long as it
> can be maintained.
>
> >
> > Also, I think that one of the reasons why our previous implementation
> > was so slow is because it used internally Complex objects. Since these
> > objects are immutable, we end up creating *a lot* of small objects.
> > Apparently, the GC and JIT are not so good at that ;) (not being an
> > expert, I'm not sure that the culprit is indeed the JIT or the GC...
> > but surely someone is to be blamed).
>
> This seems strange to me. Lots of improvements have been added to Java 5
> and Java 6 about JVM optimization, and small objects are not really
> costly now. I do trust JVM implementors here and really like using small
> immutable objects.

The difference might be that there is a slight overhead in creating objects
(however small) with respect to using arrays of primitives in this scenario.
Trying to borrow from your explanation about cache-friendly implementations:
the cache will contain a larger chunk of an array of "double" than an array
of "Complex" (?).
Also, when the array of Complex objects is split into its real and imaginary
constituent arrays, the computations are performed with "double", thus
bypassing method calls ("add", "subtract") and their possible precondition
checks.

> >
> > I remember we recently had this conversation on the ML: one of Gilles
> > or Luc argued that for very low-level algorithms, it didn't really
> > matter if the parameters were passed as "very crude" objects, since it
> > was most likely that the user would have to write an adapter to its
> > own data format. So I would suggest we give up Complex[] altogether in
> > the interface of FFT, and replace it with double[] arrays, with the
> > following layout :
> > data[2 * i] = real part
> > data[2 * i + 1] = imaginary part.
> >
> > What do you think?
>
> I agree with this view (it may be me who expressed this). A double array
> as an interface for our library seems good to me.

I'm wary of this sort of "optimization" (as it decreases the code clarity).

-1 until it has been proven that it brings a _significant_ performance
   improvement.
At the moment, I would of course keep the internal switch to arrays of
primitives but also the API that takes and returns "Complex[]" objects.

A change[1] that might bring an added performance improvement is to
combine the "getRealArray()" and "getImaginaryArray()" methods into a
single one, in order to loop only once over the array of "Complex" objects:
---CUT---
public static double[][] getRealAndImaginaryArrays(Complex[] dataC) {
    final double[][] dataRI = new double[2][dataC.length];
    final double[] dataR = dataRI[0];
    final double[] dataI = dataRI[1];
    for (int i = 0; i < dataC.length; i++) {
        final Complex c = dataC[i];
        dataR[i] = c.getReal();
        dataI[i] = c.getImaginary();
    }
    return dataRI;
}
---CUT---

Then, for "transform" we would have:
---CUT---
public Complex[] transform(Complex[] f) {
    final double[][] dataRI = getRealAndImaginaryArrays(f);
    transformInPlace(dataRI[0], dataRI[1], false);

    // etc.
}
---CUT---
and similarly for the other methods.


Two other points, unrelated to the above:
1. I think that the "scaleArray" method (the version that takes a "double"
   argument) should be moved to the "util.MathArrays" class (and renamed
   "scale").
2. Why are some methods "protected" (e.g. "fft")? IMHO, they should be
   "private".
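As an illustration of point 1, a minimal sketch of what such a "scale" method in "util.MathArrays" could look like (the signature is an assumption, not the actual commons-math API):

```java
import java.util.Arrays;

public class MathArraysSketch {

    /**
     * Hypothetical MathArrays.scale: returns a new array whose entries
     * are the entries of arr multiplied by val (arr is left untouched).
     */
    public static double[] scale(double val, double[] arr) {
        final double[] scaled = new double[arr.length];
        for (int i = 0; i < arr.length; i++) {
            scaled[i] = val * arr[i];
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(scale(2.0, new double[] {1.0, 2.5}))); // [2.0, 5.0]
    }
}
```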


Best,
Gilles

[1] This is untested.



Re: [math] Major speed improvement to 1D FFT (MATH-732)

Sébastien Brisard
Hi again,

>> [...]
>
> The difference might be that there is a slight overhead in creating objects
> (however small) with respect to using arrays of primitives in this scenario.
> Trying to borrow from your explanation about cache-friendly implementations:
> the cache will contain a larger chunk of an array of "double" than an array
> of "Complex" (?).
>

This is open for speculation for the time being, but I think it is
worth digging into that (maybe once 3.0 has been released!).

>
> Also, when the array of Complex objects is split into its real and imaginary
> constituent arrays, the computations are performed with "double", thus
> bypassing method calls ("add", "subtract") and their possible precondition
> checks.
>
Ah! I didn't think about precondition checks.

>> >
>> > I remember we recently had this conversation on the ML: one of Gilles
>> > or Luc argued that for very low-level algorithms, it didn't really
>> > matter if the parameters were passed as "very crude" objects, since it
>> > was most likely that the user would have to write an adapter to its
>> > own data format. So I would suggest we give up Complex[] altogether in
>> > the interface of FFT, and replace it with double[] arrays, with the
>> > following layout :
>> > data[2 * i] = real part
>> > data[2 * i + 1] = imaginary part.
>> >
>> > What do you think?
>>
>> I agree with this view (it may be me who expressed this). A double array
>> as an interface for our library seems good to me.
>
> I'm wary of this sort of "optimization" (as it decreases the code clarity).
>
> -1 until it has been proven that it brings a _significant_ performance
>   improvement.
> At the moment, I would of course keep the internal switch to arrays of
> primitives but also the API that takes and returns "Complex[]" objects.
>
Well, in fact this change was not really thought of as an
optimization. It seemed to me that this data layout was more
natural... But it's probably because I'm used to it. I should mention
that having to create an array of Complex is the very reason why I do
not use the transform package for my own calcs. Indeed, my data comes
in double[] arrays. So I would have to turn this data into Complex[],
which would internally be converted back to double[] arrays...

My preferred data layout would be: one array for the real part, one
array for the imaginary part. That's what Kurt's patch does internally
at the moment. Actually, I like your idea of a two-dimensional array.

I suggested this morning one array for both real and imaginary parts,
but thinking about it now, this option would come only second.

But the Complex array would definitely come third, as (in my opinion)
the worst practical option.

>
> A change[1] that might bring an added performance improvement is to
> combine the "getRealArray()" and "getImaginaryArray()" methods into a
> single one, in order to loop only once over the array of "Complex" objects:
> ---CUT---
> public static double[][] getRealAndImaginaryArrays(Complex[] dataC) {
>    final double[][] dataRI = new double[2][dataC.length];
>    final double[] dataR = dataRI[0];
>    final double[] dataI = dataRI[1];
>    for (int i = 0; i < dataC.length; i++) {
>        final Complex c = dataC[i];
>        dataR[i] = c.getReal();
>        dataI[i] = c.getImaginary();
>    }
>    return dataRI;
> }
> ---CUT---
>
> Then, for "transform" we would have:
> ---CUT---
> public Complex[] transform(Complex[] f) {
>    final double[][] dataRI = getRealAndImaginaryArrays(f);
>    transformInPlace(dataRI[0], dataRI[1], false);
>
>    // etc.
> }
> ---CUT---
> and similarly for the other methods.
>
>
> Two other points, unrelated to the above:
> 1. I think that the "scaleArray" method (the version that takes a "double"
>   argument) should be moved to the "util.MathArrays" class (and renamed
>   "scale").
> 2. Why are some methods "protected" (e.g. "fft")? IMHO, they should be
>   "private".
>
1. OK
2. Because I didn't change it ;)
Sébastien




Re: [math] Major speed improvement to 1D FFT (MATH-732)

Bruce A Johnson-2
In reply to this post by Gilles Sadowski

On Feb 7, 2012, at 6:34 AM, Gilles Sadowski wrote:

> Hello.
> [...]
> I'm wary of this sort of "optimization" (as it decreases the code clarity).
>
> -1 until it has been proven that it brings a _significant_ performance
>    improvement.
> At the moment, I would of course keep the internal switch to arrays of
> primitives but also the API that takes and returns "Complex[]" objects.
> [...]

I've been testing the new faster code in a real-world application and I'm a big fan. But I too would be hesitant to change the API. There are several ways it could be changed:
transform(double[] real_imag)

or

transform(double[] real, double[] imag)

or

transform(ComplexVector cvec)

(where ComplexVector might internally be implemented as either separate real and imaginary double arrays, or the alternating real/imag array, or an array of Complex).

Choosing among these and other alternatives (including the current) might best be left to a post-3.0 discussion of optimal handling of Complex arrays in general.
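For concreteness, here is a rough sketch of what such a ComplexVector wrapper could look like, backed here by separate real and imaginary arrays (the name and API are hypothetical, one of the storage options mentioned above):

```java
// Hypothetical ComplexVector wrapper, backed by separate real and
// imaginary double arrays; the other storage schemes mentioned above
// could be swapped in behind the same accessors.
public class ComplexVector {
    private final double[] real;
    private final double[] imag;

    public ComplexVector(double[] real, double[] imag) {
        if (real.length != imag.length) {
            throw new IllegalArgumentException("real/imag length mismatch");
        }
        this.real = real;
        this.imag = imag;
    }

    public int getDimension() {
        return real.length;
    }

    public double getReal(int i) {
        return real[i];
    }

    public double getImaginary(int i) {
        return imag[i];
    }

    public static void main(String[] args) {
        ComplexVector v = new ComplexVector(new double[] {1, 2}, new double[] {3, 4});
        System.out.println(v.getDimension() + " " + v.getReal(0) + " " + v.getImaginary(1));
    }
}
```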

Bruce






Re: [math] Major speed improvement to 1D FFT (MATH-732)

Sébastien Brisard
>
> I've been testing the new faster code in a real world application and I'm a big fan.  But, I too would be hesitant to change the API.  There are several ways it could be changed:
> transform (double[] real_imag)
>
> or
>
> transform(double[] real, double[] imag)
>
> or
>
> transform(ComplexVector cvec)
>
> (where ComplexVector might internally be implemented as either a real and imaginary double array, or the alternating real/imag array, or an array of Complex).
>
> Choosing among these and other alternatives (including the current) might best be left to a post-3.0 discussion of optimal handling of Complex arrays in general.
>
> Bruce
>
OK, I realize this was a bad suggestion. I'll commit Kurt's patch
(including cosmetic changes), and that's all. I would be tempted to
expose the lower-level fft method, though (the one which takes two
double[] arrays as arguments). Would you allow me that?

Sébastien




Re: [math] Major speed improvement to 1D FFT (MATH-732)

Gilles Sadowski
In reply to this post by Sébastien Brisard
Hi.

> [...]
> >> >
> >> > I remember we recently had this conversation on the ML: one of Gilles
> >> > or Luc argued that for very low-level algorithms, it didn't really
> >> > matter if the parameters were passed as "very crude" objects, since it
> >> > was most likely that the user would have to write an adapter to its
> >> > own data format. So I would suggest we give up Complex[] altogether in
> >> > the interface of FFT, and replace it with double[] arrays, with the
> >> > following layout :
> >> > data[2 * i] = real part
> >> > data[2 * i + 1] = imaginary part.
> >> >
> >> > What do you think?
> >>
> >> I agree with this view (it may be me who expressed this). A double array
> >> as an interface for our library seems good to me.
> >
> > I'm wary of this sort of "optimization" (as it decreases the code clarity).
> >
> > -1 until it has been proven that it brings a _significant_ performance
> >   improvement.
> > At the moment, I would of course keep the internal switch to arrays of
> > primitives but also the API that takes and returns "Complex[]" objects.
> >
> Well, in fact this change was not really thought of as an
> optimization. It seemed to me that this data layout was more
> natural... But it's probably because I'm used to it. I should mention
> that having to create an array of Complex is the very reason why I do
> not use the transform package for my own calcs. Indeed, my data comes
> in double[] arrays. So I would have to turn this data into Complex[],
> which would internally be converted back to double[] arrays...

Wherever it makes sense from a user perspective[1], nobody prevents you from
adding new methods to the API, e.g.:
---CUT---
public double[][] transform(double[][] dataRI) { /* ... */ }
---CUT---
And you can pass your data to the CM code with just two array reference
assignments.
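As a sketch of that calling convention (the transform body is stubbed out here; the signature is the one suggested above, and only the no-copy hand-off is the point):

```java
public class DataRIExample {

    // Stub standing in for the suggested double[][]-based transform;
    // a real implementation would operate on dataRI[0] (real parts)
    // and dataRI[1] (imaginary parts).
    public static double[][] transform(double[][] dataRI) {
        return dataRI;
    }

    public static void main(String[] args) {
        double[] myReal = {1.0, 2.0};
        double[] myImag = {0.0, 0.5};
        // Two array reference assignments: no element-wise copying,
        // no intermediate Complex[] allocation.
        double[][] dataRI = new double[2][];
        dataRI[0] = myReal;
        dataRI[1] = myImag;
        transform(dataRI);
    }
}
```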

> [...]


Best,
Gilles

[1] As long as the code remains self-documenting: I'd prefer to avoid, as
    much as possible, "out-of-band" conventions, like striding over a
    one-dimensional array to get different kinds of data (cf. real vs
    imaginary part: Another "natural" convention could be that the first
    half of the array contains all the real parts and the second half all
    the imaginary parts).
    Of course the "double[][]" also poses the question of the data ordering,
    but I have the impression that it is more flexible (and it could even be
    faster ;-).



Re: [math] Major speed improvement to 1D FFT (MATH-732)

Sébastien Brisard
> ---CUT---
> public double[][] transform(double[][] dataRI) { /* ... */ }
> ---CUT---
> And you can pass your data to the CM code in a matter of two array reference
> assignments.
>
OK, I like this idea (this is more or less what I meant by "exposing"
the low-level method). I'll propose something along these lines, then.
Thanks everyone!
Sébastien




Re: [math] Major speed improvement to 1D FFT (MATH-732)

Bruce A Johnson-2

On Feb 7, 2012, at 8:05 AM, Sébastien Brisard wrote:

>> ---CUT---
>> public double[][] transform(double[][] dataRI) { /* ... */ }
>> ---CUT---
>> And you can pass your data to the CM code in a matter of two array reference
>> assignments.
>>
> OK, I like this idea (this is more or less what I meant by "exposing"
> the low-level method). I'll propose something along these lines, then.
> Thanks everyone!
> Sébastien
>

Sounds good to me.

Bruce


