[math] noob; performance metrics?

Andrew E. Davidson
Sorry if this has been asked many times before (maybe this can be added to the FAQ?).

Has anyone done any benchmarking?

The idea of having a math package implemented in pure Java is very attractive. My experience with machine learning is that Java is very slow; to go fast, you need to take advantage of assembler, or of libraries written in Fortran or C. For example: http://jblas.org/


Kind Regards

Andy



Re: [math] noob; performance metrics?

mike shugar
To amplify and extend the question: I would also like to know the same
information where BigDecimal is involved.

Thanks.



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [math] noob; performance metrics?

Luc Maisonobe-2
In reply to this post by Andrew E. Davidson
Hi Andrew,

On 23/06/2015 19:08, Andrew E. Davidson wrote:
> Sorry if this has been asked many times before (maybe this can be
> added to the FAQ?)
>
> Has anyone done any benchmarking?

Yes.

>
> The idea of having a math package implemented in pure Java is
> very attractive. My experience with machine learning is that Java is
> very slow. To go fast you need to take advantage of assembler or
> libraries written in Fortran or C. For example http://jblas.org/

It is not that simple, and in some cases it can be slower...

I did not find the benchmark I presented at several symposia in
2010, but here are some rough results.

The tests were done on the QR decomposition plus solving of an
A.X = B linear problem, with dense matrices. I did it for dimensions
up to 4000x4000 if I remember well. The benchmark used the
same underlying algorithm (but obviously different implementations).

The results were, in increasing order of performance:

  - Numerical Recipes in Fortran, non-optimized
  - Numerical Recipes in Fortran, optimized
  - LAPACK with ATLAS as the BLAS implementation
    (almost no difference between non-optimized and optimized)
  - Apache Commons Math!

Well, we were only about 2% faster than LAPACK, and it was on only
one algorithm type, on my machine. I was happy and in fact surprised;
I did not expect we could reach LAPACK performance. A more realistic
assessment would also look at other algorithms.
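For reference, the Apache Commons Math side of such a benchmark reduces to a QR solve like the sketch below, using the commons-math3 linear algebra API. The tiny 2x2 system, the class name `QrBench`, and the timing code are purely illustrative; this is not the original benchmark.

```java
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.QRDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class QrBench {
    public static void main(String[] args) {
        // Dense A.X = B system; the 2010 benchmark used much larger sizes.
        RealMatrix a = new Array2DRowRealMatrix(new double[][] {
            {2.0, 1.0},
            {1.0, 3.0}
        });
        RealVector b = new ArrayRealVector(new double[] {5.0, 10.0});

        long start = System.nanoTime();
        // QR decomposition plus solve: the operation that was benchmarked.
        RealVector x = new QRDecomposition(a).getSolver().solve(b);
        long elapsedNanos = System.nanoTime() - start;

        // The exact solution of this system is x0 = 1, x1 = 3.
        System.out.println("x0=" + Math.round(x.getEntry(0))
                         + " x1=" + Math.round(x.getEntry(1)));
        System.out.println("elapsed ns: " + elapsedNanos);
    }
}
```

The same call scales to the large dense matrices of the benchmark; only the construction of `a` and `b` changes.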

Answering your question for the general case is, however, more
difficult, and here I don't have real benchmarks, only some general
impressions. I would say that across different domains, the speed
differences that can be observed are typically a factor of 1.5 or 2
(Java being slower), which is a significant difference but clearly not
as important as most people think. In fact, there are many factors other
than language that also fall in this range of 1.5 or 2.

The lesson I learnt here is *not* that we are faster (for most
operations, I am sure we are slower), but rather that language is
only one factor in speed. Change the algorithm and you change the
speed. Change the compiler and you change the speed. Change the
optimizer and you change the speed. Change the human developer and
you change the speed. Change your computer for one that is only a
few months more recent and you change the speed...

Attempting to use a Java-Fortran native interface to get speed is
almost always a bad idea. The reason is that the layer between
the languages is difficult to go through and really slow. You
will spend much of the time in this layer rather than in real
processing code. This is especially true for matrices: because
a double[][] is not packed as a sequence of double numbers in some
specified order after an initial pointer, you often have to
copy between Java arrays (which are objects) and C or Fortran
arrays, and you lose a lot of time doing copies.
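As an illustration of that copy cost, here is what the flattening step alone looks like: a Java double[][] is an array of separate row objects, so before handing it to a Fortran-style BLAS it must be copied element by element into one contiguous column-major array. This is a plain-Java sketch, independent of any particular JNI binding.

```java
public class FlattenCost {
    /** Copy a Java matrix (an array of separate row objects) into one
     *  contiguous column-major array, the layout Fortran BLAS/LAPACK expect. */
    static double[] toColumnMajor(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[] flat = new double[rows * cols];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                flat[j * rows + i] = m[i][j];  // one copy per element
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        double[][] a = {
            {1.0, 2.0},
            {3.0, 4.0}
        };
        // Column-major order: first column (1, 3), then second column (2, 4).
        double[] flat = toColumnMajor(a);
        StringBuilder sb = new StringBuilder();
        for (double v : flat) sb.append((long) v).append(' ');
        System.out.println(sb.toString().trim());  // prints "1 3 2 4"
    }
}
```

For an n x n matrix this is n² element copies per call, before the native routine has done any useful work, and results usually have to be copied back the same way.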

A Java-Fortran native interface is useful, but not for speed. From
my experience, it is more useful for interfacing with libraries
that have only one implementation and that you cannot afford to
port (because they are huge, because they are highly domain-specific
and nobody else uses them, because they have been validated and you
cannot take the risk of introducing a bug by porting them, because
you don't have the time, or because you don't have the money).

In my domain (space systems), we use Apache Commons Math and some
higher-level Java libraries extensively, and over the past few years
we have replaced many older Fortran and C libraries. In all cases,
we are either as fast or much faster. This is mainly because, when
developing these replacement libraries, we chose different
architectures, used newer algorithms, and used different trade-offs
between memory and processing than what was available to engineers
20 or 30 years ago. For sure, if they were to develop their libraries
again in Fortran now, they would also improve their results. So what
matters is what you can achieve at the present time, using present
algorithms and present languages.

If your work is really focused on linear algebra, there are
other Java libraries that are faster than Apache Commons Math
in this specific domain (some use a native interface, some don't).
Linear algebra is one of our weak points: Apache Commons Math
is a library with broad coverage, not one specialized in linear
algebra.

So, as a summary: yes, there have been some benchmarks. Yes,
Java can be fast (and it can also be slow, depending on how well
the code is written, just as in all other languages).

best regards,
Luc


Re: [math] noob; performance metrics?

Luc Maisonobe-2
In reply to this post by mike shugar
Hi Mike,

On 23/06/2015 21:17, mike shugar wrote:
> To amplify and extend the question: I would also like to know the same
> information where BigDecimal is involved.

I really don't know. We don't really use BigDecimal in Apache Commons
Math. We rather use Dfp when we need high accuracy. Dfp provides
all floating-point operations, including trigonometric, logarithmic
and hyperbolic functions, whereas BigDecimal only provides the classical
arithmetic operations. Dfp is devoted to really high accuracy (say you
want to compute a hyperbolic cosine to 200 digits), and is expected
to be slow.
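To make the BigDecimal limitation concrete: java.math.BigDecimal only offers add, subtract, multiply and divide, so a hyperbolic cosine has to be built by hand, for instance from the Taylor series cosh(x) = sum of x^(2n)/(2n)!. This is a plain-JDK sketch for illustration; Dfp's internal implementation is a separate matter.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class BigDecimalCosh {
    /** cosh(x) via its Taylor series; BigDecimal itself has no
     *  transcendental functions, only the four arithmetic operations. */
    static BigDecimal cosh(BigDecimal x, MathContext mc) {
        BigDecimal sum = BigDecimal.ZERO;
        BigDecimal term = BigDecimal.ONE;          // x^0 / 0!
        BigDecimal x2 = x.multiply(x, mc);
        for (int n = 1; n <= 60; n++) {
            sum = sum.add(term, mc);
            // Next term: multiply by x^2 / ((2n-1) * 2n).
            term = term.multiply(x2, mc)
                       .divide(BigDecimal.valueOf((2L * n - 1) * 2L * n), mc);
        }
        return sum;
    }

    public static void main(String[] args) {
        // cosh(1) to 30 significant digits; starts 1.54308063481524377...
        MathContext mc = new MathContext(30);
        System.out.println(cosh(BigDecimal.ONE, mc));
    }
}
```

Dfp, by contrast, ships with such functions directly, which is one reason Commons Math uses it instead of BigDecimal for high-accuracy work.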

I am not aware of any benchmarks of Dfp (of course it would
depend on the number of digits you use) with respect to
primitive double numbers. If you want to do some benchmarks, we would
be happy to see the results.

best regards,
Luc


Re: [math] noob; performance metrics?

Luc Maisonobe-2
In reply to this post by Luc Maisonobe-2

I have found the plot from 2010 again and put it here:
   <https://people.apache.org/~luc/performances-QR.png>

Contrary to what I wrote in my previous message, the plot goes only
up to 1000x1000, not 4000x4000; sorry for the confusion.

Luc


Re: [math] noob; performance metrics?

Benedikt Ritter-4
In reply to this post by Luc Maisonobe-2
Hello Luc,

Thank you for this and your other answers on this thread. As someone
who works mainly on business applications (which usually don't involve
this kind of mathematics), reading through your posts has been very
informative for me.

Benedikt



--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter