commons-math, matrix-toolkits-java and consolidation

classic Classic list List threaded Threaded
93 messages Options
12345
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Phil Steitz
Luc Maisonobe wrote:

> Phil Steitz a écrit :
>  
>> Luc Maisonobe wrote:
>>    
>>> Sam Halliday a écrit :
>>>  
>>>      
>>>> I've had a quick look at the 2.0 API and the only changes I'd like to
>>>> request
>>>> are:-
>>>>
>>>> - create interfaces RealSparse{Matrix, Vector} to indicate a sparse
>>>> storage
>>>> implementation. Can be empty (for now).
>>>>    
>>>>        
>>> I suppose these interfaces would extend Real{Matrix, Vector} ?
>>>
>>>  
>>>      
>>>> - rename SparseReal{Matrix, Vector} to OpenMapRealSparse{Matrix,
>>>> Vector}.
>>>> Implement above interfaces.
>>>>    
>>>>        
>>> I have no opinion on that. What do others think ?
>>>  
>>>      
>> +1 for "dethroning" ;)
>>
>> With the new interfaces, we could also drop the "Sparse" from the name -
>> i.e, "OpenMapRealMatrix"
>>
>> I am wondering now whether similar marking and "dethroning" should
>> happen on the dense side - i.e DenseReal{Matrix, Vector},
>> BlockRealMatrix (currently DenseRealMatrix), ArrayRealMatrix (currently
>> RealMatrixImpl)?
>>    
>
> +1. This is simple for DenseRealMatrix since it has been created after
> 1.2, but RealMatrixImpl may induce more problems to users. Should we
> copy an deprecate ?
>  
Yes, for RealMatrixImpl, we should copy and deprecate the original.

Phil

>  
>>>  
>>>      
>>>> - implementations should subclass the return type of some methods in the
>>>> Real{Matrix, Vector} API. e.g. RealVectorImpl.copy() returns a
>>>> RealVector,
>>>> but could declare that it returns a RealVectorImpl. This will avoid
>>>> needless
>>>> user casts. In future APIs, this could be a powerful way to define the
>>>> return type of matrix-matrix computation when the storage classes are
>>>> known... e.g. declaring that you return a DiagonalMatrix.    
>>>>        
>>> I didn't know changing the return type to a subtype was allowed when
>>> implementing an interface! This is for sure a good thing to do and could
>>> probably be used in many other places too.
>>>  
>>>      
>> +1 in general, but need to realize that when implemented, this changes
>> locks us in.  In the specific cases mentioned above, this is a
>> no-brainer, but there are others (mostly in the decomp classes) where it
>> is not so obvious.
>>
>>    
>>>  
>>>      
>>>> - is it too late to define a Norm enum and take that as a parameter,
>>>> rather
>>>> than explicit methods for each Norm type?
>>>>    
>>>>        
>>> There are many places where we use the same pattern outside of linear
>>> algebra. I'm reluctant to change this because if we extend the enum
>>> later, we may forget to add a new case in all implementations (somewhere
>>> in a switch statement), so we should add an exception for defensive
>>> programming. A method name enforced in an interface prevent such errors,
>>> you cannot compile your implementation if you forget a method.
>>>  
>>>      
>> Agree with Luc on this, but more from a Java style than maintenance
>> standpoint.  We have tried in general to stay away from "behavior flags"
>> in method signatures up to now.
>>
>> Phil
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>    
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Bill Barker
Excellent :-)

I think the consensus is to keep the sparse interfaces empty for now... don't want to lock ourselves into anything without detailed discussion, which there is no time for before 2.0.

Just drop a note into the Javadocs to the effect that methods may be added in the next release. I can't see users extending these interfaces in 2.0, because they are not even going to be used internally.

Bill Barker wrote
I'm +1 on these grounds (and, yes, that means I'm volunteering to do the
work).  I could add the getSparcity method as well (the only one that is
useful for 2.0).  Could do the getShape which would presumably return an
enum (which would only have one element in 2.0) to ease the pain of
upgrading to 2.1 when there are more.
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Bill Barker

----- Original Message -----
From: "Sam Halliday" <[hidden email]>
To: <[hidden email]>
Sent: Monday, May 18, 2009 4:29 AM
Subject: Re: [math] Re: commons-math, matrix-toolkits-java and consolidation


>
> Excellent :-)
>
> I think the consensus is to keep the sparse interfaces empty for now...
> don't want to lock ourselves into anything without detailed discussion,
> which there is no time for before 2.0.
>
> Just drop a note into the Javadocs to the effect that methods may be added
> in the next release. I can't see users extending these interfaces in 2.0,
> because they are not even going to be used internally.
>

What I actually went for is to add getSparcity to the SparseRealVector
interface (should be easy to calculate in one dimension, and mostly useful
for DEBUG level logging, so stubbing it shouldn't be a problem if not), and
getShape for the SparseRealMatrix interface that returns an enum that
currently only has the value of 'Any'.

I really don't like the idea of changing the API between minor releases, so
put in place-holders for what seems to be the consensus for going forward.

>
> Bill Barker wrote:
>>
>> I'm +1 on these grounds (and, yes, that means I'm volunteering to do the
>> work).  I could add the getSparcity method as well (the only one that is
>> useful for 2.0).  Could do the getShape which would presumably return an
>> enum (which would only have one element in 2.0) to ease the pain of
>> upgrading to 2.1 when there are more.
>>
>
> --
> View this message in context:
> http://www.nabble.com/commons-math%2C-matrix-toolkits-java-and-consolidation-tp23537813p23595640.html
> Sent from the Commons - Dev mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
Bill, I strongly discourage adding these methods at this time. We will regret it.

If you don't want to change (i.e. add new methods) to an interface, then the sensible thing is to omit these interfaces for 2.0 and introduce them with 2.1.

Bill Barker wrote
What I actually went for is to add getSparcity to the SparseRealVector
interface (should be easy to calculate in one dimension, and mostly useful
for DEBUG level logging, so stubbing it shouldn't be a problem if not), and
getShape for the SparseRealMatrix interface that returns an enum that
currently only has the value of 'Any'.

I really don't like the idea of changing the API between minor releases, so
put in place-holders for what seems to be the consensus for going forward.
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Jin Mingjian
Can I ask a question? how about MTJ vs pure F2J? (If I don't use the native
part of MTJ.)
Can I get more must-have things? is it necessary to introduce a wrapper
layer?
Best regards,
Jin
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Ted Dunning
The output of f2j is almost unusable for normal mortals.  There will
definitely be a wrapper layer.

Also, there is more than a wrapper in MTJ.  There are also higher order
algorithms that use the lower level linear algebra provided by f2j.

On Tue, May 19, 2009 at 2:59 AM, Jin Mingjian <[hidden email]> wrote:

> Can I ask a question? how about MTJ vs pure F2J? (If I don't use the native
> part of MTJ.)
> Can I get more must-have things? is it necessary to introduce a wrapper
> layer?
> Best regards,
> Jin
>



--
Ted Dunning, CTO
DeepDyve
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Phil Steitz
In reply to this post by Samuel Halliday
Sam Halliday wrote:
> Bill, I strongly discourage adding these methods at this time. We will regret
> it.
>
> If you don't want to change (i.e. add new methods) to an interface, then the
> sensible thing is to omit these interfaces for 2.0 and introduce them with
> 2.1.
>  
+1.  Unless we are either a) agreed on contents of non-marker interfaces
or b) willing to leave (until 3.0) markers empty, we should omit them.  
We cannot introduce incompatible changes in point releases.

Phil

>
> Bill Barker wrote:
>  
>> What I actually went for is to add getSparcity to the SparseRealVector
>> interface (should be easy to calculate in one dimension, and mostly useful
>> for DEBUG level logging, so stubbing it shouldn't be a problem if not),
>> and
>> getShape for the SparseRealMatrix interface that returns an enum that
>> currently only has the value of 'Any'.
>>
>> I really don't like the idea of changing the API between minor releases,
>> so
>> put in place-holders for what seems to be the consensus for going forward.
>>
>>    
>
>  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Bill Barker

----- Original Message -----
From: "Phil Steitz" <[hidden email]>
To: "Commons Developers List" <[hidden email]>
Sent: Wednesday, May 20, 2009 11:05 AM
Subject: Re: [math] Re: commons-math, matrix-toolkits-java and consolidation


> Sam Halliday wrote:
>> Bill, I strongly discourage adding these methods at this time. We will
>> regret
>> it.
>>
>> If you don't want to change (i.e. add new methods) to an interface, then
>> the
>> sensible thing is to omit these interfaces for 2.0 and introduce them
>> with
>> 2.1.
>>
> +1.  Unless we are either a) agreed on contents of non-marker interfaces
> or b) willing to leave (until 3.0) markers empty, we should omit them.  We
> cannot introduce incompatible changes in point releases.
>

I have a slight preference for leaving the markers empty until 3.0, but I
can remove them as well.  But I can wait to see what the community consensus
is before making changes.

> Phil
>>
>> Bill Barker wrote:
>>
>>> What I actually went for is to add getSparcity to the SparseRealVector
>>> interface (should be easy to calculate in one dimension, and mostly
>>> useful for DEBUG level logging, so stubbing it shouldn't be a problem if
>>> not),
>>> and getShape for the SparseRealMatrix interface that returns an enum
>>> that currently only has the value of 'Any'.
>>>
>>> I really don't like the idea of changing the API between minor releases,
>>> so put in place-holders for what seems to be the consensus for going
>>> forward.
>>>
>>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
Bill, I've had a look at some of the recent changes... some comments, including some new thoughts:-

- changing the return type to be actual classes was only supposed to be for copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there is no guarantee that is the best thing to do and it is likely this will change in the future. This stands for all implementations. Great thought needs to go into each method, not just by reading the current implementation.

- I *strongly* urge you to remove Serializable from everything! Please, we did this in MTJ and it turned out to be a major pain. A more appropriate approach is to define a class for reading/writing Matrix Market files. This can be a new feature in 2.1. If you're going to leave it, at least document that the Serializable form is not guaranteed to remain compatible across versions.

- I discourage the use of the classes named *Impl. They will get very confusing when other implementations are added later! Instead, I recommend the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array backed implementation using row ordering). This allows a column-based or 1D implementation in the future without names getting very confusing. These implementations are hidden from users who just use the MatrixUtils help

- -1 for inclusion of methods in the sparse interfaces and shape type enum. This needs more thought and discussion before inclusion. I am happy enough for marker interfaces to be included (as long as it is documented that these interfaces are subject to update in the next release), but strongly against including methods in them.

- could you please describe the hashing algorithm that OpenMap is using? I have previously written a GNU Trove based sparse matrix implementation which was a map from the primitive "long" to "double". It looks analagous. I broke the row/column index into a long for the key using binary operations, but the hashcode was tricky as it had to be an integer (not long). A naive implementation can lead to collisions for typical or symmetric matrices. It looks like the current implementation is using an int -> double backing map! Wouldn't it make more sense to use a long?

The following code might be useful to you (don't forget the L marker and casts or the compiler will silently cast to an int), there are two options for a hashcode calculator... I've not stress tested the second one but it might be more appropriate as it gives more weight to lower bits. Remember that FF is 256, or a full byte... so a full 64 bit long is 8 bytes or 16 hex characters in Java bitwise.

       int hash1(long val) {
               int first = Integer.reverse(key2row(val));
               int second = key2col(val);
               return first + second;
       }

       int hash2(long val) {
               int first = key2row(val) & 0x0000FFFF;
               int second = key2col(val) << 16;
               return first + second;
       }

       long key(int row, int column) {
               return (((long) row) << 32) + (long) column;
       }

       int key2col(long key) {
               return (int) (key & 0x00000000FFFFFFFFL);
       }

       int key2row(long key) {
               return (int) (key >>> 32);
       }

Bill Barker wrote
I have a slight preference for leaving the markers empty until 3.0, but I
can remove them as well.  But I can wait to see what the community consensus
is before making changes.
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
Sam Halliday a écrit :
> Bill, I've had a look at some of the recent changes... some comments,
> including some new thoughts:-
>
> - changing the return type to be actual classes was only supposed to be for
> copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there
> is no guarantee that is the best thing to do and it is likely this will
> change in the future. This stands for all implementations. Great thought
> needs to go into each method, not just by reading the current
> implementation.

It may help the compiler using directly the optimized versions in
statements like:

  m1.add(m2.multiply(m3.subtract(m4))

>
> - I *strongly* urge you to remove Serializable from everything! Please, we
> did this in MTJ and it turned out to be a major pain. A more appropriate
> approach is to define a class for reading/writing Matrix Market files. This
> can be a new feature in 2.1. If you're going to leave it, at least document
> that the Serializable form is not guaranteed to remain compatible across
> versions.

I disagree with and I think its me who started adding this interface
everywhere. The rationale is explained in the following use case:

As a user, I want to implement an interface for some xyz library where
the interface already extends Serializable. In my implementation, I want
to have an instance field that is some RealMatrix. If RealMatrix does
not declare it extends Serializable, I keep getting warnings about
having a non-serializable field in a serializable class. Looking the
other way round having serializable and not using it never caused me any
pain because it was really a generalized pattern. Mixing serializable
and non serializable is difficult, putting it everywhere and sticking to
this policy is simple.

What kind of pain did Serializable cause in MTJ ?

BTW, I'm really +1 to Matrix Market files, could you open a Jira issue
asking for this enhancement for 2.1 ?

>
> - I discourage the use of the classes named *Impl. They will get very
> confusing when other implementations are added later! Instead, I recommend
> the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
> backed implementation using row ordering). This allows a column-based or 1D
> implementation in the future without names getting very confusing. These
> implementations are hidden from users who just use the MatrixUtils help

Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
DenseRealMatrix to BlockRealMatrix). This sounds good to me.

>
> - -1 for inclusion of methods in the sparse interfaces and shape type enum.
> This needs more thought and discussion before inclusion. I am happy enough
> for marker interfaces to be included (as long as it is documented that these
> interfaces are subject to update in the next release), but strongly against
> including methods in them.

+1 to further thoughts and discussion and postponing declaring methods
in the interfaces after 2.0. Lets start with empty marker interfaces.

Luc

>
> - could you please describe the hashing algorithm that OpenMap is using? I
> have previously written a GNU Trove based sparse matrix implementation which
> was a map from the primitive "long" to "double". It looks analagous. I broke
> the row/column index into a long for the key using binary operations, but
> the hashcode was tricky as it had to be an integer (not long). A naive
> implementation can lead to collisions for typical or symmetric matrices. It
> looks like the current implementation is using an int -> double backing map!
> Wouldn't it make more sense to use a long?
>
> The following code might be useful to you (don't forget the L marker and
> casts or the compiler will silently cast to an int), there are two options
> for a hashcode calculator... I've not stress tested the second one but it
> might be more appropriate as it gives more weight to lower bits. Remember
> that FF is 256, or a full byte... so a full 64 bit long is 8 bytes or 16 hex
> characters in Java bitwise.
>
>        int hash1(long val) {
>                int first = Integer.reverse(key2row(val));
>                int second = key2col(val);
>                return first + second;
>        }
>
>        int hash2(long val) {
>                int first = key2row(val) & 0x0000FFFF;
>                int second = key2col(val) << 16;
>                return first + second;
>        }
>
>        long key(int row, int column) {
>                return (((long) row) << 32) + (long) column;
>        }
>
>        int key2col(long key) {
>                return (int) (key & 0x00000000FFFFFFFFL);
>        }
>
>        int key2row(long key) {
>                return (int) (key >>> 32);
>        }
>
>
> Bill Barker wrote:
>> I have a slight preference for leaving the markers empty until 3.0, but I
>> can remove them as well.  But I can wait to see what the community
>> consensus
>> is before making changes.
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

sebb-2-2
On 21/05/2009, Luc Maisonobe <[hidden email]> wrote:

> Sam Halliday a écrit :
>
> > Bill, I've had a look at some of the recent changes... some comments,
>  > including some new thoughts:-
>  >
>  > - changing the return type to be actual classes was only supposed to be for
>  > copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there
>  > is no guarantee that is the best thing to do and it is likely this will
>  > change in the future. This stands for all implementations. Great thought
>  > needs to go into each method, not just by reading the current
>  > implementation.
>
>
> It may help the compiler using directly the optimized versions in
>  statements like:
>
>   m1.add(m2.multiply(m3.subtract(m4))
>
>
>  >
>  > - I *strongly* urge you to remove Serializable from everything! Please, we
>  > did this in MTJ and it turned out to be a major pain. A more appropriate
>  > approach is to define a class for reading/writing Matrix Market files. This
>  > can be a new feature in 2.1. If you're going to leave it, at least document
>  > that the Serializable form is not guaranteed to remain compatible across
>  > versions.
>
>
> I disagree with and I think its me who started adding this interface
>  everywhere. The rationale is explained in the following use case:
>
>  As a user, I want to implement an interface for some xyz library where
>  the interface already extends Serializable. In my implementation, I want
>  to have an instance field that is some RealMatrix. If RealMatrix does
>  not declare it extends Serializable, I keep getting warnings about
>  having a non-serializable field in a serializable class. Looking the
>  other way round having serializable and not using it never caused me any
>  pain because it was really a generalized pattern. Mixing serializable
>  and non serializable is difficult, putting it everywhere and sticking to
>  this policy is simple.

One can use the "transient" marker for non-serializable fields.

Merely implementing Serializable will prevent the warnings, however in
general it's not trivial to ensure that serialization works correctly.

>  What kind of pain did Serializable cause in MTJ ?
>
>  BTW, I'm really +1 to Matrix Market files, could you open a Jira issue
>  asking for this enhancement for 2.1 ?
>
>
>  >
>  > - I discourage the use of the classes named *Impl. They will get very
>  > confusing when other implementations are added later! Instead, I recommend
>  > the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
>  > backed implementation using row ordering). This allows a column-based or 1D
>  > implementation in the future without names getting very confusing. These
>  > implementations are hidden from users who just use the MatrixUtils help
>
>
> Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
>  DenseRealMatrix to BlockRealMatrix). This sounds good to me.
>
>
>  >
>  > - -1 for inclusion of methods in the sparse interfaces and shape type enum.
>  > This needs more thought and discussion before inclusion. I am happy enough
>  > for marker interfaces to be included (as long as it is documented that these
>  > interfaces are subject to update in the next release), but strongly against
>  > including methods in them.
>
>
> +1 to further thoughts and discussion and postponing declaring methods
>  in the interfaces after 2.0. Lets start with empty marker interfaces.
>
>
>  Luc
>
>
>  >
>  > - could you please describe the hashing algorithm that OpenMap is using? I
>  > have previously written a GNU Trove based sparse matrix implementation which
>  > was a map from the primitive "long" to "double". It looks analagous. I broke
>  > the row/column index into a long for the key using binary operations, but
>  > the hashcode was tricky as it had to be an integer (not long). A naive
>  > implementation can lead to collisions for typical or symmetric matrices. It
>  > looks like the current implementation is using an int -> double backing map!
>  > Wouldn't it make more sense to use a long?
>  >
>  > The following code might be useful to you (don't forget the L marker and
>  > casts or the compiler will silently cast to an int), there are two options
>  > for a hashcode calculator... I've not stress tested the second one but it
>  > might be more appropriate as it gives more weight to lower bits. Remember
>  > that FF is 256, or a full byte... so a full 64 bit long is 8 bytes or 16 hex
>  > characters in Java bitwise.
>  >
>  >        int hash1(long val) {
>  >                int first = Integer.reverse(key2row(val));
>  >                int second = key2col(val);
>  >                return first + second;
>  >        }
>  >
>  >        int hash2(long val) {
>  >                int first = key2row(val) & 0x0000FFFF;
>  >                int second = key2col(val) << 16;
>  >                return first + second;
>  >        }
>  >
>  >        long key(int row, int column) {
>  >                return (((long) row) << 32) + (long) column;
>  >        }
>  >
>  >        int key2col(long key) {
>  >                return (int) (key & 0x00000000FFFFFFFFL);
>  >        }
>  >
>  >        int key2row(long key) {
>  >                return (int) (key >>> 32);
>  >        }
>  >
>  >
>  > Bill Barker wrote:
>  >> I have a slight preference for leaving the markers empty until 3.0, but I
>  >> can remove them as well.  But I can wait to see what the community
>  >> consensus
>  >> is before making changes.
>  >>
>  >
>
>
>
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
sebb a écrit :

> On 21/05/2009, Luc Maisonobe <[hidden email]> wrote:
>> Sam Halliday a écrit :
>>
>>> Bill, I've had a look at some of the recent changes... some comments,
>>  > including some new thoughts:-
>>  >
>>  > - changing the return type to be actual classes was only supposed to be for
>>  > copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there
>>  > is no guarantee that is the best thing to do and it is likely this will
>>  > change in the future. This stands for all implementations. Great thought
>>  > needs to go into each method, not just by reading the current
>>  > implementation.
>>
>>
>> It may help the compiler using directly the optimized versions in
>>  statements like:
>>
>>   m1.add(m2.multiply(m3.subtract(m4))
>>
>>
>>  >
>>  > - I *strongly* urge you to remove Serializable from everything! Please, we
>>  > did this in MTJ and it turned out to be a major pain. A more appropriate
>>  > approach is to define a class for reading/writing Matrix Market files. This
>>  > can be a new feature in 2.1. If you're going to leave it, at least document
>>  > that the Serializable form is not guaranteed to remain compatible across
>>  > versions.
>>
>>
>> I disagree with and I think its me who started adding this interface
>>  everywhere. The rationale is explained in the following use case:
>>
>>  As a user, I want to implement an interface for some xyz library where
>>  the interface already extends Serializable. In my implementation, I want
>>  to have an instance field that is some RealMatrix. If RealMatrix does
>>  not declare it extends Serializable, I keep getting warnings about
>>  having a non-serializable field in a serializable class. Looking the
>>  other way round having serializable and not using it never caused me any
>>  pain because it was really a generalized pattern. Mixing serializable
>>  and non serializable is difficult, putting it everywhere and sticking to
>>  this policy is simple.
>
> One can use the "transient" marker for non-serializable fields.

Transient is OK when the field state can be reconstructed on
deserialization from other fields information. This is quite rare and is
surely not the case with matrices which may be huge and therefore not
duplicated in several different fields.

>
> Merely implementing Serializable will prevent the warnings, however in
> general it's not trivial to ensure that serialization works correctly.

This is probably simpler in math objects (except graph theory perhaps)
than in other business domains. Our objects typically hold a primitive
and arrays and some other lower level objects with similar simple
structures.

Luc

>
>>  What kind of pain did Serializable cause in MTJ ?
>>
>>  BTW, I'm really +1 to Matrix Market files, could you open a Jira issue
>>  asking for this enhancement for 2.1 ?
>>
>>
>>  >
>>  > - I discourage the use of the classes named *Impl. They will get very
>>  > confusing when other implementations are added later! Instead, I recommend
>>  > the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
>>  > backed implementation using row ordering). This allows a column-based or 1D
>>  > implementation in the future without names getting very confusing. These
>>  > implementations are hidden from users who just use the MatrixUtils help
>>
>>
>> Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
>>  DenseRealMatrix to BlockRealMatrix). This sounds good to me.
>>
>>
>>  >
>>  > - -1 for inclusion of methods in the sparse interfaces and shape type enum.
>>  > This needs more thought and discussion before inclusion. I am happy enough
>>  > for marker interfaces to be included (as long as it is documented that these
>>  > interfaces are subject to update in the next release), but strongly against
>>  > including methods in them.
>>
>>
>> +1 to further thoughts and discussion and postponing declaring methods
>>  in the interfaces after 2.0. Lets start with empty marker interfaces.
>>
>>
>>  Luc
>>
>>
>>  >
>>  > - could you please describe the hashing algorithm that OpenMap is using? I
>>  > have previously written a GNU Trove based sparse matrix implementation which
>>  > was a map from the primitive "long" to "double". It looks analagous. I broke
>>  > the row/column index into a long for the key using binary operations, but
>>  > the hashcode was tricky as it had to be an integer (not long). A naive
>>  > implementation can lead to collisions for typical or symmetric matrices. It
>>  > looks like the current implementation is using an int -> double backing map!
>>  > Wouldn't it make more sense to use a long?
>>  >
>>  > The following code might be useful to you (don't forget the L marker and
>>  > casts or the compiler will silently cast to an int), there are two options
>>  > for a hashcode calculator... I've not stress tested the second one but it
>>  > might be more appropriate as it gives more weight to lower bits. Remember
>>  > that FF is 256, or a full byte... so a full 64 bit long is 8 bytes or 16 hex
>>  > characters in Java bitwise.
>>  >
>>  >        int hash1(long val) {
>>  >                int first = Integer.reverse(key2row(val));
>>  >                int second = key2col(val);
>>  >                return first + second;
>>  >        }
>>  >
>>  >        int hash2(long val) {
>>  >                int first = key2row(val) & 0x0000FFFF;
>>  >                int second = key2col(val) << 16;
>>  >                return first + second;
>>  >        }
>>  >
>>  >        long key(int row, int column) {
>>  >                return (((long) row) << 32) + (long) column;
>>  >        }
>>  >
>>  >        int key2col(long key) {
>>  >                return (int) (key & 0x00000000FFFFFFFFL);
>>  >        }
>>  >
>>  >        int key2row(long key) {
>>  >                return (int) (key >>> 32);
>>  >        }
>>  >
>>  >
>>  > Bill Barker wrote:
>>  >> I have a slight preference for leaving the markers empty until 3.0, but I
>>  >> can remove them as well.  But I can wait to see what the community
>>  >> consensus
>>  >> is before making changes.
>>  >>
>>  >
>>
>>
>>
>> ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: [hidden email]
>>  For additional commands, e-mail: [hidden email]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

sebb-2-2
On 21/05/2009, Luc Maisonobe <[hidden email]> wrote:

> sebb a écrit :
>
> > On 21/05/2009, Luc Maisonobe <[hidden email]> wrote:
>  >> Sam Halliday a écrit :
>  >>
>  >>> Bill, I've had a look at some of the recent changes... some comments,
>  >>  > including some new thoughts:-
>  >>  >
>  >>  > - changing the return type to be actual classes was only supposed to be for
>  >>  > copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there
>  >>  > is no guarantee that is the best thing to do and it is likely this will
>  >>  > change in the future. This stands for all implementations. Great thought
>  >>  > needs to go into each method, not just by reading the current
>  >>  > implementation.
>  >>
>  >>
>  >> It may help the compiler using directly the optimized versions in
>  >>  statements like:
>  >>
>  >>   m1.add(m2.multiply(m3.subtract(m4))
>  >>
>  >>
>  >>  >
>  >>  > - I *strongly* urge you to remove Serializable from everything! Please, we
>  >>  > did this in MTJ and it turned out to be a major pain. A more appropriate
>  >>  > approach is to define a class for reading/writing Matrix Market files. This
>  >>  > can be a new feature in 2.1. If you're going to leave it, at least document
>  >>  > that the Serializable form is not guaranteed to remain compatible across
>  >>  > versions.
>  >>
>  >>
>  >> I disagree with and I think its me who started adding this interface
>  >>  everywhere. The rationale is explained in the following use case:
>  >>
>  >>  As a user, I want to implement an interface for some xyz library where
>  >>  the interface already extends Serializable. In my implementation, I want
>  >>  to have an instance field that is some RealMatrix. If RealMatrix does
>  >>  not declare it extends Serializable, I keep getting warnings about
>  >>  having a non-serializable field in a serializable class. Looking the
>  >>  other way round having serializable and not using it never caused me any
>  >>  pain because it was really a generalized pattern. Mixing serializable
>  >>  and non serializable is difficult, putting it everywhere and sticking to
>  >>  this policy is simple.
>  >
>  > One can use the "transient" marker for non-serializable fields.
>
>
> Transient is OK when the field state can be reconstructed on
>  deserialization from other fields information. This is quite rare and is
>  surely not the case with matrices which may be huge and therefore not
>  duplicated in several different fields.
>
>
>  >
>  > Merely implementing Serializable will prevent the warnings, however in
>  > general it's not trivial to ensure that serialization works correctly.
>
>
> This is probably simpler in math objects (except graph theory perhaps)
>  than in other business domains. Our objects typically hold a primitive
>  and arrays and some other lower level objects with similar simple
>  structures.
>

Nevertheless, adding serialization needs to be considered carefully,
as pointed out here:

http://www.javapractices.com/topic/TopicAction.do?Id=45

Furthermore, readObject() cannot be used with final fields. Bloch
(Effective Java) suggests using readResolve() instead, but even this
has limitations.

As the book points out, merely making a class Serializable is
equivalent to adding a public constructor which sets all the
non-transient fields without perfoming any validation.

If there are any constraints on the fields, these have to be addressed
in readObject() and/or readResolve() methods.

>  Luc
>
>
>  >
>  >>  What kind of pain did Serializable cause in MTJ ?
>  >>
>  >>  BTW, I'm really +1 to Matrix Market files, could you open a Jira issue
>  >>  asking for this enhancement for 2.1 ?
>  >>
>  >>
>  >>  >
>  >>  > - I discourage the use of the classes named *Impl. They will get very
>  >>  > confusing when other implementations are added later! Instead, I recommend
>  >>  > the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
>  >>  > backed implementation using row ordering). This allows a column-based or 1D
>  >>  > implementation in the future without names getting very confusing. These
>  >>  > implementations are hidden from users who just use the MatrixUtils help
>  >>
>  >>
>  >> Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
>  >>  DenseRealMatrix to BlockRealMatrix). This sounds good to me.
>  >>
>  >>
>  >>  >
>  >>  > - -1 for inclusion of methods in the sparse interfaces and shape type enum.
>  >>  > This needs more thought and discussion before inclusion. I am happy enough
>  >>  > for marker interfaces to be included (as long as it is documented that these
>  >>  > interfaces are subject to update in the next release), but strongly against
>  >>  > including methods in them.
>  >>
>  >>
>  >> +1 to further thoughts and discussion and postponing declaring methods
>  >>  in the interfaces after 2.0. Lets start with empty marker interfaces.
>  >>
>  >>
>  >>  Luc
>  >>
>  >>
>  >>  >
>  >>  > - could you please describe the hashing algorithm that OpenMap is using? I
>  >>  > have previously written a GNU Trove based sparse matrix implementation which
>  >>  > was a map from the primitive "long" to "double". It looks analagous. I broke
>  >>  > the row/column index into a long for the key using binary operations, but
>  >>  > the hashcode was tricky as it had to be an integer (not long). A naive
>  >>  > implementation can lead to collisions for typical or symmetric matrices. It
>  >>  > looks like the current implementation is using an int -> double backing map!
>  >>  > Wouldn't it make more sense to use a long?
>  >>  >
>  >>  > The following code might be useful to you (don't forget the L marker and
>  >>  > casts or the compiler will silently cast to an int), there are two options
>  >>  > for a hashcode calculator... I've not stress tested the second one but it
>  >>  > might be more appropriate as it gives more weight to lower bits. Remember
>  >>  > that FF is 256, or a full byte... so a full 64 bit long is 8 bytes or 16 hex
>  >>  > characters in Java bitwise.
>  >>  >
>  >>  >        int hash1(long val) {
>  >>  >                int first = Integer.reverse(key2row(val));
>  >>  >                int second = key2col(val);
>  >>  >                return first + second;
>  >>  >        }
>  >>  >
>  >>  >        int hash2(long val) {
>  >>  >                int first = key2row(val) & 0x0000FFFF;
>  >>  >                int second = key2col(val) << 16;
>  >>  >                return first + second;
>  >>  >        }
>  >>  >
>  >>  >        long key(int row, int column) {
>  >>  >                return (((long) row) << 32) + (long) column;
>  >>  >        }
>  >>  >
>  >>  >        int key2col(long key) {
>  >>  >                return (int) (key & 0x00000000FFFFFFFFL);
>  >>  >        }
>  >>  >
>  >>  >        int key2row(long key) {
>  >>  >                return (int) (key >>> 32);
>  >>  >        }
>  >>  >
>  >>  >
>  >>  > Bill Barker wrote:
>  >>  >> I have a slight preference for leaving the markers empty until 3.0, but I
>  >>  >> can remove them as well.  But I can wait to see what the community
>  >>  >> consensus
>  >>  >> is before making changes.
>  >>  >>
>  >>  >
>  >>
>  >>
>  >>
>  >> ---------------------------------------------------------------------
>  >>  To unsubscribe, e-mail: [hidden email]
>  >>  For additional commands, e-mail: [hidden email]
>  >>
>  >>
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: [hidden email]
>  > For additional commands, e-mail: [hidden email]
>  >
>  >
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
Luc... couldn't agree more regarding Serializable. Adding the Serializable interface instantly means you not only have to be API compatible with future releases but also binary Serializable compatible. This is what stung MTJ... it means you can't swap internal details of fields.

I strongly recommend everybody read the Bloch chapters on Serialisation before ever implementing that interface.

sebb-2-2 wrote
On 21/05/2009, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
Nevertheless, adding serialization needs to be considered carefully,
as pointed out here:

http://www.javapractices.com/topic/TopicAction.do?Id=45

Furthermore, readObject() cannot be used with final fields. Bloch
(Effective Java) suggests using readResolve() instead, but even this
has limitations.

As the book points out, merely making a class Serializable is
equivalent to adding a public constructor which sets all the
non-transient fields without perfoming any validation.

If there are any constraints on the fields, these have to be addressed
in readObject() and/or readResolve() methods.
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
Luc, it absolutely would help the compiler... but the point is that the return types might not remain as they currently are in future releases. For example, if you multiply a dense matrix and a diagonal matrix... does it make sense to return a dense matrix? No. :-)

Incidentally, in MTJ we have a mechanism to allow the client to decide on the storage type for return types. It's a bit clunky, and I'd prefer a factory approach. I'd like to open up the debate on a mechanism in commons-math for doing this post-2.0.

Luc Maisonobe wrote
> - changing the return type to be actual classes was only supposed to be for
> copy() for 2.0. Doing this on multiply, add, etc is a big mistake... there
> is no guarantee that is the best thing to do and it is likely this will
> change in the future. This stands for all implementations. Great thought
> needs to go into each method, not just by reading the current
> implementation.

It may help the compiler using directly the optimized versions in
statements like:

  m1.add(m2.multiply(m3.subtract(m4))
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

James Carman
In reply to this post by Samuel Halliday
On Thu, May 21, 2009 at 11:31 AM, Sam Halliday <[hidden email]> wrote:
>
> Luc... couldn't agree more regarding Serializable. Adding the Serializable
> interface instantly means you not only have to be API compatible with future
> releases but also binary Serializable compatible. This is what stung MTJ...
> it means you can't swap internal details of fields.
>
> I strongly recommend everybody read the Bloch chapters on Serialisation
> before ever implementing that interface.

I've always found that if you're going to do it, it's better to do
Externalizable (if you anticipate the stuff getting saved in long-term
storage and not just in remoting situations) and make sure you include
a version number in the output so you can handle different
serialization formats.  I have no idea what Bloch said, that's just my
$0.02 (based on today's exchange rates of course).

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
Regarding the name of ArrayRealMatrix. Please don't forget to include the "2DRow" part to the name (indicating a 2D array which is Row ordered) to indicate the implementation type. Post 2.0 I'll convince you that a 1D Array approach is best as it will lead to more efficient use of BLAS and therefore reproducibility of reference algorithms from, e.g. the Templates project on netlib ;-)

I've never seen a storage type like BlockRealMatrix before... I'm interested in seeing where it performs better than, say, a full 1D representation of a dense matrix. Breaking up the storage like this makes it very difficult to use BLAS/LAPACK internally.

Luc Maisonobe wrote
> - I discourage the use of the classes named *Impl. They will get very
> confusing when other implementations are added later! Instead, I recommend
> the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
> backed implementation using row ordering). This allows a column-based or 1D
> implementation in the future without names getting very confusing. These
> implementations are hidden from users who just use the MatrixUtils help

Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
DenseRealMatrix to BlockRealMatrix). This sounds good to me.
Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
In reply to this post by Samuel Halliday
Sam Halliday a écrit :
> Luc... couldn't agree more regarding Serializable. Adding the Serializable
> interface instantly means you not only have to be API compatible with future
> releases but also binary Serializable compatible. This is what stung MTJ...
> it means you can't swap internal details of fields.
>
> I strongly recommend everybody read the Bloch chapters on Serialisation
> before ever implementing that interface.

OK. What do you suggest now ? Completely remove Serializable (which
would really bother me as I do use it) or remove it from interfaces and
add it appropriately to some implementation classes ?

Luc

>
>
> sebb-2-2 wrote:
>> On 21/05/2009, Luc Maisonobe <[hidden email]> wrote:
>> Nevertheless, adding serialization needs to be considered carefully,
>> as pointed out here:
>>
>> http://www.javapractices.com/topic/TopicAction.do?Id=45
>>
>> Furthermore, readObject() cannot be used with final fields. Bloch
>> (Effective Java) suggests using readResolve() instead, but even this
>> has limitations.
>>
>> As the book points out, merely making a class Serializable is
>> equivalent to adding a public constructor which sets all the
>> non-transient fields without perfoming any validation.
>>
>> If there are any constraints on the fields, these have to be addressed
>> in readObject() and/or readResolve() methods.
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Luc Maisonobe
In reply to this post by Samuel Halliday
Sam Halliday a écrit :

> Regarding the name of ArrayRealMatrix. Please don't forget to include the
> "2DRow" part to the name (indicating a 2D array which is Row ordered) to
> indicate the implementation type. Post 2.0 I'll convince you that a 1D Array
> approach is best as it will lead to more efficient use of BLAS and therefore
> reproducibility of reference algorithms from, e.g. the Templates project on
> netlib ;-)
>
> I've never seen a storage type like BlockRealMatrix before... I'm interested
> in seeing where it performs better than, say, a full 1D representation of a
> dense matrix. Breaking up the storage like this makes it very difficult to
> use BLAS/LAPACK internally.

This is a simple approach I tried based on cache considerations. It is a
compromise that works well when some algorithms are row oriented, others
are column oriented and still others are mixed (like multiplication). It
does improve multiplication time with respect to RealMatrixImpl for
example on matrices larger than 100x100, but it may need adapting block
size depending on target.

A more complex example is given in a paper by Siddhartha Chatterjee,
Alvin R. Lebeck, Praveen K. Patnala and Mithuna Thottethodi: Recursive
Array Layouts and Fast Matrix Multiplication
<http://www.cs.duke.edu/~alvy/papers/matrix-tpds.pdf>. I have tried this
approach too (using the Gray-Morton space-filling curve) but the gain
was not worth the pain. This attempt was not completed for this reason,
but it is still available in the experimental directory in the
repository, it is the RecursiveLayoutRealMatrix class. Maybe it can be
resurrected later, once completed and optimized.

>
>
> Luc Maisonobe wrote:
>>> - I discourage the use of the classes named *Impl. They will get very
>>> confusing when other implementations are added later! Instead, I
>>> recommend
>>> the names ArrayRealVector, Array2DRowRealMatrix (to indicate a 2D array
>>> backed implementation using row ordering). This allows a column-based or
>>> 1D
>>> implementation in the future without names getting very confusing. These
>>> implementations are hidden from users who just use the MatrixUtils help
>> Phil suggested to change RealMatrixImpl to ArrayRealMatrix (and
>> DenseRealMatrix to BlockRealMatrix). This sounds good to me.
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [math] Re: commons-math, matrix-toolkits-java and consolidation

Samuel Halliday
In reply to this post by Luc Maisonobe
Absolutely remove from the interfaces... but leave on implementations :-) That way all future implementations aren't forced to be compatible to their first release.

Hopefully we'll address transport of instances in 2.1 with Matrix Market IO, which should also encourage more efficient compression.

Luc Maisonobe wrote
Sam Halliday a écrit :
> Luc... couldn't agree more regarding Serializable. Adding the Serializable
> interface instantly means you not only have to be API compatible with future
> releases but also binary Serializable compatible. This is what stung MTJ...
> it means you can't swap internal details of fields.
>
> I strongly recommend everybody read the Bloch chapters on Serialisation
> before ever implementing that interface.

OK. What do you suggest now ? Completely remove Serializable (which
would really bother me as I do use it) or remove it from interfaces and
add it appropriately to some implementation classes ?

Luc

>
>
> sebb-2-2 wrote:
>> On 21/05/2009, Luc Maisonobe <Luc.Maisonobe@free.fr> wrote:
>> Nevertheless, adding serialization needs to be considered carefully,
>> as pointed out here:
>>
>> http://www.javapractices.com/topic/TopicAction.do?Id=45
>>
>> Furthermore, readObject() cannot be used with final fields. Bloch
>> (Effective Java) suggests using readResolve() instead, but even this
>> has limitations.
>>
>> As the book points out, merely making a class Serializable is
>> equivalent to adding a public constructor which sets all the
>> non-transient fields without perfoming any validation.
>>
>> If there are any constraints on the fields, these have to be addressed
>> in readObject() and/or readResolve() methods.
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org
12345