commons-math, matrix-toolkits-java and consolidation


Samuel Halliday
Dear all,

I am a maintainer of the matrix-toolkits-java

  http://code.google.com/p/matrix-toolkits-java/

which is a comprehensive collection of matrix data structures, linear solvers, least squares methods, eigenvalue and singular value decompositions.

This note is in regard to the commons-math library. It is clear that our projects dovetail, especially when I look at "linear" in version 2.0 of the API. It would be good if we could either complement or consolidate efforts, rather than reproduce.

It would be excellent if all the functionality of matrix-toolkits-java were available as part of commons-math. There is already too much diversity and un-maintained maths code out there for Java!

As a start, I'd like to discourage the use of a solid implementation for SparseReal{Vector, Matrix}... please prefer an interface approach, allowing implementations based on the Templates project:-

  http://www.netlib.org/templates

The reason is that the storage implementation should be related to the type of data being stored. For example, there are many well-known kinds of sparse matrix that are well suited to particular kinds of calculations... consider multiplying sparse matrices that you know to be diagonal!
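A minimal sketch of the interface approach being advocated (class and method names here are illustrative only, not from commons-math or MTJ): when the storage type is known, the multiply can pick a structure-aware algorithm, e.g. diagonal-times-diagonal in O(n) instead of the general O(n^3).

```java
// Illustrative only: an interface lets each storage scheme supply the
// algorithm best suited to its structure.
interface Matrix {
    double get(int i, int j);
    int size();
    Matrix multiply(Matrix other);
}

final class DiagonalMatrix implements Matrix {
    private final double[] diag;   // only the n diagonal entries are stored

    DiagonalMatrix(double[] diag) { this.diag = diag.clone(); }

    public double get(int i, int j) { return i == j ? diag[i] : 0.0; }
    public int size() { return diag.length; }

    public Matrix multiply(Matrix other) {
        if (other instanceof DiagonalMatrix) {  // structure-aware fast path
            double[] d = ((DiagonalMatrix) other).diag;
            double[] out = new double[diag.length];
            for (int i = 0; i < diag.length; i++) out[i] = diag[i] * d[i];
            return new DiagonalMatrix(out);     // O(n), result stays diagonal
        }
        throw new UnsupportedOperationException("dense fallback omitted");
    }
}
```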

In general, the netlib.org folk (BLAS/LAPACK) have spent a *lot* of time thinking about linear algebra and have set up unrivalled standard APIs which have been implemented right down to the architecture level. It would be a major mistake if commons-math didn't build on their good work.

I believe commons-math should move to a netlib-java backend (allowing the use of machine optimised BLAS/LAPACK).

  http://code.google.com/p/netlib-java/

The largest problems facing MTJ are support for the Sparse BLAS/LAPACK and scalability to parallel architectures which use Parallel BLAS/LAPACK. The former should be possible with some work within the current API, but I fear major API changes would be needed for the latter. I do not want the commons-math API to walk into this trap without having first considered future architectures! MTJ has a distributed package, but I am not sure it is completely future-proof either.

What say ye?

--
Sam

[math] Re: commons-math, matrix-toolkits-java and consolidation

Ted Dunning
On Thu, May 14, 2009 at 3:18 AM, Sam Halliday <[hidden email]>wrote:

>
> I am a maintainer of the matrix-toolkits-java


Which is an impressive piece of work, especially the transparent but
non-binding interface to the Atlas and Blas native packages.  My compliments
to Bjørn-Ove and all who have followed up on his original work.

This note is in regard to the commons-math library. It is clear that our
> projects dovetail, especially when I look at "linear" in version 2.0 of the
> API. It would be good if we could either complement or consolidate efforts,
> rather than reproduce.


That sounds good to me.

As a start, I'd like to discourage the use of a solid implementation for
> SparseReal{Vector, Matrix}... please prefer an interface approach, allowing
> implementations based on the Templates project:-


Can you say more about what aspects of the Templates project you feel are
important?  You mention one case of storage layout.


> I believe commons-math should move to a netlib-java backend (allowing the
> use of machine optimised BLAS/LAPACK).


This is an interesting suggestion.  Obviously adopting MTJ wholesale would
accomplish that.

Can you say something about the licensing issues if we were to explore, for
discussion sake, MTJ being folded into commons-math?  MTJ is LGPL while
commons has to stay Apache licensed.  This licensing issue has been the
biggest sticking point in the past.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
Ted Dunning wrote:

> On Thu, May 14, 2009 at 3:18 AM, Sam Halliday <[hidden email]>wrote:
>
>> I am a maintainer of the matrix-toolkits-java
>
>
> Which is an impressive piece of work, especially the transparent but
> non-binding interface to the Atlas and Blas native packages.  My compliments
> to Bjørn-Ove and all who have followed up on his original work.
>
> This note is in regard to the commons-math library. It is clear that our
>> projects dovetail, especially when I look at "linear" in version 2.0 of the
>> API. It would be good if we could either complement or consolidate efforts,
>> rather than reproduce.
>
>
> That sounds good to me.

You are right.

>
> As a start, I'd like to discourage the use of a solid implementation for
>> SparseReal{Vector, Matrix}... please prefer an interface approach, allowing
>> implementations based on the Templates project:-

This is exactly the purpose of the RealMatrix/RealVector and FieldMatrix
/FieldVector interfaces on one side and of the CholeskyDecomposition,
EigenDecomposition, LUDecomposition, QRDecomposition,
SingularValueDecomposition and DecompositionSolver interfaces on the
other side.

RealMatrix (resp. FieldMatrix) is the top-level interface that does not
mandate any specific storage. It has several implementations
(RealMatrixImpl with a simple double[][] array, DenseRealMatrix with a
block implementation, SparseRealMatrix). We also have in mind (not
implemented yet) things like DiagonalRealMatrix or BandRealMatrix or
Lower/UpperTriangularRealMatrix. All implementations can be mixed
together, and specific cases are automatically detected and handled so
that the general nested loops are avoided and smarter algorithms are
used when possible. There was also an attempt using recursive layouts
(with a Gray-Morton space filling curve).

Maybe SparseRealMatrix was a bad name and should have been
SimpleSparseRealMatrix to avoid confusion with other sparse storage and
dedicated algorithms.
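One of the dedicated sparse storage schemes alluded to here is compressed sparse row (CSR), one of the formats the Templates project catalogues. An illustrative, self-contained sketch (not commons-math code): only the nonzeros are stored, and matrix-vector multiply touches each nonzero exactly once.

```java
// Illustrative CSR storage: rowPtr[i]..rowPtr[i+1] delimits row i's
// nonzeros within colIdx/values.
final class CsrMatrix {
    final int rows;
    final int[] rowPtr;    // length rows + 1
    final int[] colIdx;    // column of each nonzero
    final double[] values; // the nonzeros themselves

    CsrMatrix(int rows, int[] rowPtr, int[] colIdx, double[] values) {
        this.rows = rows; this.rowPtr = rowPtr;
        this.colIdx = colIdx; this.values = values;
    }

    /** y = A * x, in time proportional to the number of nonzeros. */
    double[] multiply(double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int k = rowPtr[i]; k < rowPtr[i + 1]; k++) {
                sum += values[k] * x[colIdx[k]];
            }
            y[i] = sum;
        }
        return y;
    }
}
```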

>
>
> Can you say more about what aspects of the Templates project you feel are
> important?  You mention one case of storage layout.
>
>
>> I believe commons-math should move to a netlib-java backend (allowing the
>> use of machine optimised BLAS/LAPACK).
>
>
> This is an interesting suggestion.  Obviously adopting MTJ wholesale would
> accomplish that.
>
> Can you say something about the licensing issues if we were to explore, for
> discussion sake, MTJ being folded into commons-math?  MTJ is LGPL while
> commons has to stay Apache licensed.  This licensing issue has been the
> biggest sticking point in the past.

This is really an issue. Apache projects cannot use LGPL (or GPL) code.
See http://www.apache.org/legal/resolved.html for the policy.

[math] also has currently zero dependencies. We had two dependencies on
other commons components up to version 1.2 and removed them when we
started work on version 2.0. Adding new dependencies, and especially
dependencies that involve native libraries is a difficult decision that
needs lots of discussion. We are currently trying to have 2.0 published
very soon now, such a decision would delay the publication several months.

Some benchmarks I did a few weeks ago showed the new [math] linear
package implementation was quite fast and compared very well with native
Fortran libraries for QR decomposition with similar non-blocked
algorithms. In fact, it was 7 times faster than unoptimized Numerical
Recipes, about 2 or 3 times faster than optimized Numerical Recipes, and
very slightly (a few percent) faster than optimized LAPACK with ATLAS
as the BLAS implementation. Faster QR decomposition required changing
the algorithm, so the blocked LAPACK implementation was the only native
implementation faster than [math]. Of course, I now do want to also
implement a blocked QR decomposition in [math] ...
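For reference, the non-blocked Householder algorithm in this class of comparisons looks roughly like the following. This is a self-contained textbook sketch, not the actual [math] implementation: it returns Q and R for a square matrix A such that Q*R == A with Q orthogonal.

```java
// Textbook, non-blocked Householder QR decomposition (illustrative sketch).
final class HouseholderQr {
    /** Returns {Q, R} with Q orthogonal, R upper triangular, Q*R == A. */
    static double[][][] decompose(double[][] a) {
        int n = a.length;
        double[][] r = new double[n][];
        for (int i = 0; i < n; i++) r[i] = a[i].clone();
        double[][] q = new double[n][n];
        for (int i = 0; i < n; i++) q[i][i] = 1.0;

        for (int k = 0; k < n - 1; k++) {
            // build the Householder vector v for column k
            double norm = 0.0;
            for (int i = k; i < n; i++) norm += r[i][k] * r[i][k];
            norm = Math.sqrt(norm);
            if (norm == 0.0) continue;
            if (r[k][k] > 0) norm = -norm;       // sign choice for stability
            double[] v = new double[n];
            for (int i = k; i < n; i++) v[i] = r[i][k];
            v[k] -= norm;
            double vtv = 0.0;
            for (int i = k; i < n; i++) vtv += v[i] * v[i];
            if (vtv == 0.0) continue;
            // apply H = I - 2*v*v^T/(v^T v): R <- H*R, Q <- Q*H
            for (int j = 0; j < n; j++) {
                double dot = 0.0;
                for (int i = k; i < n; i++) dot += v[i] * r[i][j];
                double f = 2.0 * dot / vtv;
                for (int i = k; i < n; i++) r[i][j] -= f * v[i];
            }
            for (int i = 0; i < n; i++) {
                double dot = 0.0;
                for (int j = k; j < n; j++) dot += q[i][j] * v[j];
                double f = 2.0 * dot / vtv;
                for (int j = k; j < n; j++) q[i][j] -= f * v[j];
            }
        }
        return new double[][][] { q, r };
    }
}
```

The blocked LAPACK variant mentioned above applies the same reflectors in aggregated groups so the work runs as matrix-matrix products, which is why it wins at larger sizes.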

I am aware that we still lack lots of very efficient linear algebra
algorithms. Joining efforts with you would be a real gain if we can
solve the licensing issues and avoid new dependencies if possible.
[math] has already adopted a lot of external code, even complete
libraries. I came in by donating the whole mantissa library, merging it
in, and now contributing to the maintenance of the component with the
other developers.

Luc

>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Ted Dunning
Dang.

That is fast.

What size matrices was this for?

On Thu, May 14, 2009 at 12:09 PM, Luc Maisonobe <[hidden email]>wrote:

> Some benchmarks I did a few weeks ago showed the new [math] linear
> package implementation was quite fast and compared very well with native
> Fortran libraries for QR decomposition with similar non-blocked
> algorithms. In fact, it was 7 times faster than unoptimized Numerical
> Recipes, about 2 or 3 times faster than optimized Numerical Recipes, and
> very slightly (a few percent) faster than optimized LAPACK with ATLAS
> as the BLAS implementation. Faster QR decomposition required changing
> the algorithm, so the blocked LAPACK implementation was the only native
> implementation faster than [math].
>

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
Ted Dunning wrote:

> Dang.
>
> That is fast.
>
> What size matrices was this for?
>
> On Thu, May 14, 2009 at 12:09 PM, Luc Maisonobe <[hidden email]>wrote:
>
>> Some benchmarks I did a few weeks ago showed the new [math] linear
>> package implementation was quite fast and compared very well with native
>> Fortran libraries for QR decomposition with similar non-blocked
>> algorithms. In fact, it was 7 times faster than unoptimized Numerical
>> Recipes, about 2 or 3 times faster than optimized Numerical Recipes, and
>> very slightly (a few percent) faster than optimized LAPACK with ATLAS
>> as the BLAS implementation. Faster QR decomposition required changing
>> the algorithm, so the blocked LAPACK implementation was the only native
>> implementation faster than [math].
>>
>

Medium size, up to 600 if I remember correctly (I don't have the curve
available here). The O(n^3) pattern was clearly visible, so the
result can be extrapolated for medium sizes, I think. I did not check
larger matrices like 3000 or more; results may be different.

Beware: that was one algorithm only (QR decomposition) and one host
only (AMD64 Phenom quad core, Linux, Sun Java 6 on one side, GNU Fortran
on the other side). GNU Fortran is not a very fast compiler, so the
result would obviously be very different with a better compiler. The
purpose of this test was not to say that Java is faster; it was to show
that the performance difference between Java and Fortran was smaller
than the difference you get by changing algorithms (in this case blocked
vs. non-blocked). I was surprised by the results.

For other kinds of algorithms, mixing linear algebra, trigonometric
computation, ODE integration, root searching ... I often see performance
differences of about a factor of 2, which in my opinion is similar
to what other changes can bring (algorithms, CPU, BLAS, compiler,
parallelism, memory ...). Of course, when comparing a highly optimized
algorithm with the perfect cache size and the best compiler against a
standard setting, we can get very different factors. So do not take these
numbers for granted.

Trying to get the fastest computation is not the purpose of [math]. It
should be reasonably efficient with respect to other libraries,
including native ones, but is not dedicated to speed. It should remain a
general-purpose library.

Luc


Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Ted Dunning
Replies inline:

Ted Dunning wrote
> As a start, I'd like to discourage the use of a solid implementation for
> SparseReal{Vector, Matrix}... please prefer an interface approach, allowing
> implementations based on the Templates project:-

> Can you say more about what aspects of the Templates project you feel are important?  You mention one case of storage layout.
It's difficult to say which algorithms from Templates are the most important, but in most cases reference implementations already exist (usually in Fortran) and should be preferred (e.g. by using f2j with a wrapper layer): the theory can be quite involved. MTJ only touches the surface! However, an important step is recognising that there are not just "dense" and "sparse" matrices... but whole classes of structured sparse matrices.
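One concrete example of such a structured class: a tridiagonal system can be solved by the Thomas algorithm in O(n), versus O(n^3) for general Gaussian elimination. An illustrative sketch (not taken from MTJ or commons-math); it assumes the system is well-conditioned enough that no pivoting is needed:

```java
// Thomas algorithm for a tridiagonal system:
//   a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]   (a[0], c[n-1] unused)
final class Tridiagonal {
    static double[] solve(double[] a, double[] b, double[] c, double[] d) {
        int n = b.length;
        double[] cp = new double[n], dp = new double[n];
        // forward sweep: eliminate the sub-diagonal
        cp[0] = c[0] / b[0];
        dp[0] = d[0] / b[0];
        for (int i = 1; i < n; i++) {
            double m = b[i] - a[i] * cp[i - 1];
            cp[i] = (i < n - 1) ? c[i] / m : 0.0;
            dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
        }
        // back substitution
        double[] x = new double[n];
        x[n - 1] = dp[n - 1];
        for (int i = n - 2; i >= 0; i--) x[i] = dp[i] - cp[i] * x[i + 1];
        return x;
    }
}
```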

Ted Dunning wrote
> I believe commons-math should move to a netlib-java backend (allowing the
> use of machine optimised BLAS/LAPACK).

> This is an interesting suggestion.  Obviously adopting MTJ wholesale would
> accomplish that.
>
> Can you say something about the licensing issues if we were to explore, for
> discussion sake, MTJ being folded into commons-math?  MTJ is LGPL while
> commons has to stay Apache licensed.  This licensing issue has been the
> biggest sticking point in the past.
I personally have no problems with my MTJ contributions being released Apache. Bjorn-Ove is the person to talk to about the bulk of MTJ. I'll ask him!

MTJ depends on netlib-java, which is technically a translation of the original netlib libraries. They are BSD licensed. I seriously doubt you'll get them to give you the right to redistribute under the Apache license, so you'll have to decide if that's a blocker.

What would "adopting wholesale" mean? It would be a good opportunity to review/revise parts of the API and find duplication with the rest of the commons-math project.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Ted Dunning
On Thu, May 14, 2009 at 1:54 PM, Sam Halliday <[hidden email]>wrote:

> I personally have no problems with my MTJ contributions being released
> Apache. Bjorn-Ove is the person to talk to about the bulk of MTJ. I'll ask
> him!
>

Great.

MTJ depends on netlib-java, which is technically a translation of the
> original netlib libraries. They are BSD license. I seriously doubt you'll
> get them to give you the right to redistribute as, so you'll have to decide
> if that's a blocker.
>

If they really are BSD, then there should be no problem.  BSD allows
redistribution with attribution, preservation of the copyright notice and no
implication of endorsement.


> What would "adopting wholesale" mean? It would be a good opportunity to
> review/revise parts of the API and find duplication with the rest of the
> commons-math project.
>

There is a big issue with dependencies, but a much smaller issue with major
source code contributions.  Essentially, what I mean by "adopting wholesale"
would be for commons math to ingest MTJ.  In the best world, the contributor
communities would merge as well.  There would still be plenty of issues, such
as the conditional dependency on native libraries.  I am not sure how that
should play out.

--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Phil Steitz
In reply to this post by Luc Maisonobe
Luc Maisonobe wrote:

> [...]
>
> [math] also has currently zero dependencies. We had two dependencies on
> other commons components up to version 1.2 and removed them when we
> started work on version 2.0. Adding new dependencies, and especially
> dependencies that involve native libraries is a difficult decision that
> needs lots of discussion. We are currently trying to have 2.0 published
> very soon now, such a decision would delay the publication several months.
-1 for adding dependencies, especially on native code.  Commons math
needs to remain

1) ASL licensed
2) self-contained
3) fully documented, full open source

Phil




Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Ted Dunning
Phil, I think we have much of the same desires and motivations, but we seem
to come to somewhat, but not entirely, different conclusions.

Assuming that (1) can be dealt with and assuming that (3) is already dealt
with, do you still mind the inclusion of *optional*, automatically generated
native code?

This page has some useful speed comparisons.  For matrix-matrix multiply up
to size 50, java is competitive.  If you get up to roughly n = 500 or 1000,
then heavily optimized native code can be up to 3x faster.  Note the line in
the third graph for colt (a reasonably well written pure java
implementation) and MTJ (which is running in pure java mode here).  In my
case, I will generally opt for portability, but I like to have a portable
option for speed.  It is also important to remember that numerical codes
more often need blinding speed than most other applications.

http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html

Is that optional dependency really all that bad?

On Fri, May 15, 2009 at 6:23 PM, Phil Steitz <[hidden email]> wrote:

> [math] also has currently zero dependencies. We had two dependencies on
>> other commons components up to version 1.2 and removed them when we
>> started work on version 2.0. Adding new dependencies, and especially
>> dependencies that involve native libraries is a difficult decision that
>> needs lots of discussion. We are currently trying to have 2.0 published
>> very soon now, such a decision would delay the publication several months.
>>
>>
> -1 for adding dependencies, especially on native code.  Commons math needs
> to remain
>
> 1) ASL licensed
> 2) self-contained
> 3) fully documented, full open source




--
Ted Dunning, CTO
DeepDyve

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Phil Steitz
Ted Dunning wrote:
> Phil, I think we have much of the same desires and motivations, but we seem
> to come to somewhat, but not entirely different conclusions.
>
> Assuming that (1) can be dealt with and assuming that (3) is already dealt
> with, do you still mind the inclusion of *optional*, automatically generated
> native code?
>  
Part of 3) is having full code available with the package for
inspection.  That is part of the reason that we have avoided external
dependencies.  I would be open to making our fully self-contained, fully
documented, fully open source library extensible to use other libraries,
including native libraries, but I would not want to distribute anything
associated with external libraries.  The reason for this is the
commitment made early on that all numerics and algorithms would be
immediately visible to the user - no chasing down external, possibly
incomplete or ambiguous docs to figure out what our code is doing.  
> This page has some useful speed comparisons.  For matrix-matrix multiply up
> to size 50, java is competitive.  If you get up to roughly n = 500 or 1000,
> then heavily optimized native code can be up to 3x faster.  Note the line in
> the third graph for colt (a reasonably well written pure java
> implementation) and MTJ (which is running in pure java mode here).  In my
> case, I will generally opt for portability, but I like to have a portable
> option for speed.  It is also important to remember that numerical codes
> more often need blinding speed than most other applications.
>  
As Luc said, commons-math aims to be a general-purpose applied math
package implementing good, well-documented, unencumbered numerical
algorithms.  I think this can be done in Java and we are doing it.  We
are never going to compete with optimized native code in speed, but
strong numerics, JRE improvements and Moore's law are rapidly shrinking
the class of real-world applications where the 3x difference above is
material.

Phil


> http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html
>
> Is that optional dependency really all that bad?
>
> On Fri, May 15, 2009 at 6:23 PM, Phil Steitz <[hidden email]> wrote:

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
Phil Steitz wrote:

> Ted Dunning wrote:
>> Phil, I think we have much of the same desires and motivations, but we
>> seem
>> to come to somewhat, but not entirely different conclusions.
>>
>> Assuming that (1) can be dealt with and assuming that (3) is already
>> dealt
>> with, do you still mind the inclusion of *optional*, automatically
>> generated
>> native code?
>>  
> Part of 3) is having full code available with the package for
> inspection.  That is part of the reason that we have avoided external
> dependencies.  I would be open to making our fully self-contained, fully
> documented, fully open source library extensible to use other libraries,
> including native libraries, but I would not want to distribute anything
> associated with external libraries.  The reason for this is the
> commitment made early on that all numerics and algorithms would be
> immediately visible to the user - no chasing down external, possibly
> incomplete or ambiguous docs to figure out what our code is doing.

I have an additional reason for avoiding native libraries. Pure Java can
be processed by external tools for either inspection (think findbugs,
cobertura, traceability, auditing) or modification (think Nabla!). The
Nabla case is especially important to me, but I am aware this is a
corner-case.

>> This page has some useful speed comparisons.  For matrix-matrix
>> multiply up
>> to size 50, java is competitive.  If you get up to roughly n = 500 or
>> 1000,
>> then heavily optimized native code can be up to 3x faster.  Note the
>> line in
>> the third graph for colt (a reasonably well written pure java
>> implementation) and MTJ (which is running in pure java mode here).  In my
>> case, I will generally opt for portability, but I like to have a portable
>> option for speed.  It is also important to remember that numerical codes
>> more often need blinding speed than most other applications.
>>  
> As Luc said, commons-math aims to be a general-purpose applied math
> package implementing good, well-documented, unencumbered numerical
> algorithms.  I think this can be done in Java and we are doing it.  We
> are never going to compete with optimized native code in speed, but
> strong numerics, JRE improvements and Moore's law are rapidly shrinking
> the class of real-world applications where the 3x difference above is
> material.

Perhaps we should have some benchmarks including our new linear package.
Something more serious than my little experiment with QR decomposition.
Unfortunately, I clearly have no time for it now. My current priority is
to publish 2.0 as soon as possible, and I am already late on my own schedule.

Luc
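The kind of benchmark discussed above has one well-known pitfall on the JVM: timing before the JIT has compiled the hot code. A minimal harness sketch (helper names are hypothetical; the multiply is just a stand-in workload) warms up first, then keeps the best of several timed runs:

```java
// Minimal micro-benchmark skeleton: warm up the JIT, then time repetitions
// and keep the best observed run.
final class Bench {
    /** Naive dense multiply used here only as a stand-in workload. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) c[i][j] += aik * b[k][j];
            }
        return c;
    }

    /** Best wall-clock time over reps runs, after warm-up iterations. */
    static long bestNanos(Runnable task, int warmups, int reps) {
        for (int i = 0; i < warmups; i++) task.run();  // let the JIT compile
        long best = Long.MAX_VALUE;
        for (int i = 0; i < reps; i++) {
            long t0 = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }
}
```

Plotting `bestNanos` against matrix size would make the O(n^3) pattern mentioned earlier directly visible.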


Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Ted Dunning
I've asked Bjorn about an Apache license for MTJ and his reply was

  "Yes, I don't see why not. The more users/developers, the better."

Ted Dunning wrote
On Thu, May 14, 2009 at 1:54 PM, Sam Halliday wrote:
> I personally have no problems with my MTJ contributions being released
> Apache. Bjorn-Ove is the person to talk to about the bulk of MTJ. I'll ask
> him!
>

Great.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe

Luc Maisonobe wrote
Ted Dunning wrote:
> As a start, I'd like to discourage the use of a solid implementation for
>> SparseReal{Vector, Matrix}... please prefer an interface approach, allowing
>> implementations based on the Templates project:-

> Maybe SparseRealMatrix was a bad name and should have been
> SimpleSparseRealMatrix to avoid confusion with other sparse storage and
> dedicated algorithms.
I give a +1 for renaming SparseReal{Matrix, Vector}! These names should be reserved for interfaces (which might be method-less) indicating that the implementation storage needs to be sparse.
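A sketch of what such a method-less marker interface could look like (all names here are hypothetical, not the existing commons-math classes): the marker promises sparse storage without mandating a format, and callers can branch on it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names, for illustration only.
interface RealMatrixSketch {
    double getEntry(int row, int col);
}

/** Method-less marker: implementations promise sparse storage. */
interface SparseRealMatrixSketch extends RealMatrixSketch { }

/** One possible sparse format behind the marker: hash-based storage. */
final class HashSparseMatrix implements SparseRealMatrixSketch {
    private final Map<Long, Double> entries = new HashMap<>();

    void set(int i, int j, double v) {
        entries.put(((long) i << 32) | j, v);
    }

    public double getEntry(int i, int j) {
        return entries.getOrDefault(((long) i << 32) | j, 0.0);
    }
}
```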

Luc Maisonobe wrote
> Can you say something about the licensing issues if we were to explore, for
> discussion sake, MTJ being folded into commons-math?  MTJ is LGPL while
> commons has to stay Apache licensed.  This licensing issue has been the
> biggest sticking point in the past.

> This is really an issue. Apache projects cannot use LGPL (or GPL) code.
> See http://www.apache.org/legal/resolved.html for the policy.
Solved! See my other message. Both myself and (more importantly, because he wrote MTJ) Bjorn are willing to use the Apache license.

Luc Maisonobe wrote
> Adding new dependencies, and especially dependencies that involve native libraries is a difficult decision that needs lots of discussion.
MTJ depends only on netlib-java, *which does not depend on any native libs*. The option is there to add native optimised libs if the end user wants to.

Luc Maisonobe wrote
Some benchmarks I did a few weeks ago showed the new [math] linear
package implementation was quite fast and compared very well with native
fortran libraries
I'm going to call "foul" here :-)

The Java implementation of netlib-java is just as fast as machine-optimised BLAS/LAPACK... but only for matrices smaller than roughly 1000 x 1000 elements AND ONLY ON NORMAL DESKTOP MACHINES! The important distinction here is that hardware exists with crazy optimisations for the BLAS/LAPACK API, and having the option to use that architecture from within Java is a great bonus. Consider, for example, a dedicated GPU (or FPGA) card which comes with a BLAS/LAPACK binary.

Additionally, the BLAS/LAPACK API is universally accepted. It would be a mistake to attempt to reproduce all the brain power and agreement that has worked toward it.

Luc Maisonobe wrote
I am aware that we still lack lots of very efficient linear algebra
algorithms. Joining efforts with you would be a real gain if we can
solve the licensing issues and avoid new dependencies if possible.
I am very keen to consolidate efforts! I think the next step is perhaps for you to have a look through the MTJ API and create a wish-list of everything you think would make sense to appear in commons-math. Even if adopted "wholesale", I would still strongly recommend a review of the API: e.g. some interfaces extend Serializable (a mistake); I'm not entirely sure how relevant the distributed package is nowadays; the Matrix Market IO is difficult to understand/use; and there should perhaps be a "factory pattern" for instantiating matrices/vectors.

In the meantime, I recommend holding off a 2.0 API release with any new linear classes. That way we can stabilise the "new" merged API... releasing that as part of 2.1.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Ted Dunning
Ted, thanks for pointing this out... I'd never seen it before. Glad MTJ did so well and I note that this isn't even with the optional native BLAS/LAPACK :-)


Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
I've somehow missed much of this discussion, which has got a little confused. I'll repeat some key facts here:-

- MTJ depends on netlib-java
- I'm the maintainer of netlib-java
- netlib-java depends on PURE JAVA code, generated by F2J from netlib.org BLAS/LAPACK (and ARPACK). Keith Seymour (author of f2j) deserves all the praise for that magnificent task! The necessary jar is distributed with netlib-java.
- BLAS/LAPACK are industry standard APIs.
- netlib-java is technically a "translation" of netlib.org's BLAS/LAPACK/ARPACK API, so is therefore BSD licensed
- netlib-java can be *optionally* configured at runtime to use a native library instead of the Java implementation.
- the java implementation is pretty damn fast and will be more than adequate for most users. However, it will *never* be as fast as native code running on specialist hardware (no matter how much the JVM improves).

Being the maintainer of netlib-java, I'd be more than happy to re-license all the bits that aren't technically "translations" of netlib.org, for inclusion in commons-math (in fact, it makes sense to do so). But you'd still need to depend on the f2j-translated implementation, which is BSD licensed.

Hell, it makes a *lot* of sense for commons-math to provide the BLAS/LAPACK API... they are industry standards after all, and all reference implementations for linear algebra algorithms make use of them.
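For readers unfamiliar with the convention: a BLAS routine like dgemm computes C = alpha*A*B + beta*C over flat, column-major arrays. The sketch below shows those semantics in plain Java; it is illustrative only, and the real f2j/netlib-java signatures additionally carry transpose flags and leading-dimension/offset parameters:

```java
// Illustrative dgemm: C = alpha*A*B + beta*C, with m x k matrix A,
// k x n matrix B and m x n matrix C stored as flat column-major arrays.
// NOT the real BLAS signature -- transpose flags and lda/ldb/ldc omitted.
final class BlasSketch {
    static void dgemm(int m, int n, int k, double alpha,
                      double[] a, double[] b, double beta, double[] c) {
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double s = 0.0;
                for (int p = 0; p < k; p++) {
                    s += a[i + p * m] * b[p + j * k]; // column-major indexing
                }
                c[i + j * m] = alpha * s + beta * c[i + j * m];
            }
        }
    }
}
```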

Luc Maisonobe wrote
I have an additional reason for avoiding native libraries. Pure Java can
be processed by external tools for either inspection (think findbugs,
cobertura, traceability, auditing) or modification (think Nabla!). The
Nabla case is especially important to me, but I am aware this is a
corner-case.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Samuel Halliday
Just to let you know, I've contacted the author of this blog post... who has recently written a library called jblas. I've asked him if he wants to be involved with the initiative here, to consolidate efforts for Java Linear Algebra packages.

Incidentally... this blog post references a very pervasive, yet abandoned, project named Colt. Colt was a brilliant library in its day (now numerically challenged), although riddled with license issues (depending on non-commercial and ill-defined not-for-military-use middleware). Colt is a reminder of what can happen when a great library is written but not maintained. There might be lessons to learn from their API... I know some projects that use it.

It might be worthwhile contacting other Java Linear Algebra package authors, such as JAMA. JAMA is a very small library in comparison (no additional functionality over MTJ or commons-math)... but they might have a different take on APIs than we would have.


Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
In reply to this post by Samuel Halliday
Sam Halliday wrote:

> I've somehow missed much of this discussion, which has got a little confused.
> I'll repeat some key facts here:-
>
> - MTJ depends on netlib-java
> - I'm the maintainer of netlib-java
> - netlib-java depends on PURE JAVA code, generated by F2J from netlib.org
> BLAS/LAPACK (and ARPACK). Keith Seymour (author of f2j) deserves all the
> praise for that magnificent task! The necessary jar is distributed with
> netlib-java.
> - BLAS/LAPACK are industry standard APIs.
> - netlib-java is technically a "translation" of netlib.org's
> BLAS/LAPACK/ARPACK API, so is therefore BSD licensed
> - netlib-java can be *optionally* configured at runtime to use a native
> library instead of the Java implementation.
> - the java implementation is pretty damn fast and will be more than adequate
> for most users. However, it will *never* be as fast as native code running
> on specialist hardware (no matter how much the JVM improves).
>
> Being the maintainer of netlib-java, I'd be more than happy to re-license
> all the bits that aren't technically "translations" of netlib.org, for
> inclusion in commons-math (in fact, it makes sense to do so). But you'd
> still need to depend on the f2j translated implementation. They are BSD
> license.

This is becoming more and more interesting. However, do you think it
would be possible to "include" the source (either manually written or
automatically translated) into [math] ? This would allow a
self-contained package.

We already provide some code which technically comes from translated
netlib routines, for example part of the Levenberg-Marquardt or almost
everything in the singular value decomposition. The Netlib license
allows that and we have set up the appropriate notices (see the javadoc
and the NOTICE.txt file).

>
> Hell, it makes a *lot* of sense for commons-math to provide the BLAS/LAPACK
> API... they are industry standards after all, and all reference
> implementations for linear algebra algorithms make use of them.

I strongly approve that for BLAS. I dream of the BLAS API being
mandatory in JVM implementations, but this will probably never happen.
Considering LAPACK, I am less convinced because the API is strongly
fortran-oriented, not using some of the object-oriented features that
are well suited for mathematical concepts. The algorithms and their
implementations are very good, and we already use them inside, but with
a different API.
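The contrast here can be sketched as follows (hypothetical names, deliberately trivialised to a vector norm): the Fortran style passes primitive arrays and writes results into caller-supplied workspace, while the object-oriented style returns a value from a method on a type:

```java
// Fortran-style entry point: result written into a caller-supplied array.
final class FortranStyle {
    static void norm2(int n, double[] x, double[] result) {
        double s = 0.0;
        for (int i = 0; i < n; i++) s += x[i] * x[i];
        result[0] = Math.sqrt(s);
    }
}

// Object-oriented style: the same computation as a method on a vector type.
final class OoVector {
    private final double[] x;
    OoVector(double[] x) { this.x = x.clone(); }
    double norm() {
        double s = 0.0;
        for (double v : x) s += v * v;
        return Math.sqrt(s);
    }
}
```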

Luc





Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Phil Steitz
In reply to this post by Luc Maisonobe
Luc Maisonobe wrote:

> Phil Steitz wrote:
>  
>> Ted Dunning wrote:
>>    
>>> Phil, I think we have much of the same desires and motivations, but we
>>> seem
>>> to come to somewhat, but not entirely different conclusions.
>>>
>>> Assuming that (1) can be dealt with and assuming that (3) is already
>>> dealt
>>> with, do you still mind the inclusion of *optional*, automatically
>>> generated
>>> native code?
>>>  
>>>      
>> Part of 3) is having full code available with the package for
>> inspection.  That is part of the reason that we have avoided external
>> dependencies.  I would be open to making our fully self-contained, fully
>> documented, fully open source library extensible to use other libraries,
>> including native libraries, but I would not want to distribute anything
>> associated with external libraries.  The reason for this is the
>> commitment made early on that all numerics and algorithms would be
>> immediately visible to the user - no chasing down external, possibly
>> incomplete or ambiguous docs to figure out what our code is doing.
>>    
>
> I have an additional reason for avoiding native libraries. Pure Java can
> be processed by external tools for either inspection (think findbugs,
> cobertura, traceability, auditing) or modification (think Nabla!). The
> Nabla case is especially important to me, but I am aware this is a
> corner-case.
>
>  
>>> This page has some useful speed comparisons.  For matrix-matrix
>>> multiply up
>>> to size 50, java is competitive.  If you get up to roughly n = 500 or
>>> 1000,
>>> then heavily optimized native code can be up to 3x faster.  Note the
>>> line in
>>> the third graph for colt (a reasonably well written pure java
>>> implementation) and MTJ (which is running in pure java mode here).  In my
>>> case, I will generally opt for portability, but I like to have a portable
>>> option for speed.  It is also important to remember that numerical codes
>>> more often need blinding speed than most other applications.
>>>  
>>>      
>> As Luc said, commons-math aims to be a general-purpose applied math
>> package implementing good, well-documented, unencumbered numerical
>> algorithms.  I think this can be done in Java and we are doing it.  We
>> are never going to compete with optimized native code in speed, but
>> strong numerics, JRE improvements and Moore's law are rapidly shrinking
>> the class of real-world applications where the 3x difference above is
>> material.
>>    
>
> Perhaps we should have some benchmarks including our new linear package.
> Something more serious than my little experiment with QR decomposition.
> Unfortunately, I clearly have no time for it now. My current priority is
> to publish 2.0 as soon as possible and I am already late on my own schedule.
>  
+1 for getting 2.0 released ASAP.   This is long overdue and we need to
stay focussed on getting it out.

Phil

> Luc
>
>  
>> Phil
>>
>>
>>    
>>> http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html
>>>
>>> Is that optional dependency really all that bad?
>>>
>>> On Fri, May 15, 2009 at 6:23 PM, Phil Steitz <[hidden email]>
>>> wrote:
>>>
>>>  
>>>      
>>>> [math] also has currently zero dependencies. We had two dependencies on
>>>>    
>>>>        
>>>>> other commons components up to version 1.2 and removed them when we
>>>>> started work on version 2.0. Adding new dependencies, and especially
>>>>> dependencies that involve native libraries is a difficult decision that
>>>>> needs lots of discussion. We are currently trying to have 2.0 published
>>>>> very soon now, such a decision would delay the publication several
>>>>> months.
>>>>>
>>>>>
>>>>>      
>>>>>          
>>>> -1 for adding dependencies, especially on native code.  Commons math
>>>> needs
>>>> to remain
>>>>
>>>> 1) ASL licensed
>>>> 2) self-contained
>>>> 3) fully documented, full open source




Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
Luc, if the Apache team are happy to include source generated by f2j (which is therefore BSD license) then there is no reason at all to have a dependency!

The generator code from netlib-java need not be distributed as part of the final commons-math binary; it is only needed to generate the .c files which allow for a native library at runtime. I would foresee the .c files being distributed as part of the commons-math binary download, with instructions on how to build the optional native library. The mechanism for doing this is entirely up for debate and review. The important thing is that there be a standardised BLAS/LAPACK API available.

Luc Maisonobe wrote
This is becoming more and more interesting. However, do you think it
would be possible to "include" the source (either manually written or
automatically translated) into [math] ? This would allow a
self-contained package.

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
I think netlib-java might actually be using the CLAPACK version of LAPACK... the biggest problem with mixing C/Fortran and Java is that the array layout differs: Fortran expects a flat column-major array, whereas Java's double[][] is an array of rows. CLAPACK addresses this.
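A concrete illustration of the mismatch: Java's double[][] is an array of rows, whereas a Fortran-style BLAS/LAPACK routine expects one flat array in column-major order, so something has to re-index at the boundary. A quick sketch (hypothetical helper):

```java
// Flattening a Java double[][] (array of rows) into the column-major
// layout expected by Fortran-style BLAS/LAPACK routines.
final class ColumnMajor {
    static double[] flatten(double[][] a) {
        int m = a.length, n = a[0].length;
        double[] flat = new double[m * n];
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                flat[i + j * m] = a[i][j]; // element (i,j) lands at i + j*m
            }
        }
        return flat;
    }
}
```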

LAPACK is still heavily used in reference implementations of standard algorithms, although admittedly not as *core* as BLAS. The ARPACK API is also worthwhile considering for inclusion (it's part of netlib-java and f2j's translations).

Luc Maisonobe wrote
I strongly approve that for BLAS. I dream of the BLAS API being
mandatory in JVM implementations, but this will probably never happen.
Considering LAPACK, I am less convinced because the API is strongly
fortran-oriented, not using some of the object-oriented features that
are well suited for mathematical concepts. The algorithms and their
implementations are very good, and we already use them inside, but with
a different API.