[GSoC][Commons][STATISTICS][Regression][Matrix] Flexibility in Matrix Libraries in Regression Component?

Ben Nguyen
Hello,

If I recall/assume correctly, there is currently no concrete plan to create a new matrix library any time soon, which led me to use EJML as suggested by Mr. Eric Barnhill. This seems to be temporary until the CM matrix component is upgraded, or is it perhaps permanent? I don’t believe that is the plan, or that the use of EJML should be permanent.
So should I design the regression component (which heavily uses matrices and vectors) to use matrix interfaces, to allow for simpler refactoring later, or even as a means of letting users plug in any matrix library they want for computing regressions?
This also aligns with what Mr. Alex Herbert told me about decoupling with regard to the input data and builder/loader functionality: https://issues.apache.org/jira/browse/STATISTICS-11

Thank you for your response,
Cheers,
-Ben

Re: [GSoC][Commons][STATISTICS][Regression][Matrix] Flexibility in Matrix Libraries in Regression Component?

Gilles Sadowski-2
Hi.

On Mon, 17 Jun 2019 at 16:13, Ben Nguyen <[hidden email]> wrote:
>
> Hello,
>
> If I recall/assume correctly, there is currently no concrete plan to create a new matrix library any time soon;

An interesting question is whether there is a need.
Or, IOW, do we still cling to the "no dependency" policy?

> leading me to use the EJML as suggested by Mr. Eric Barnhill. This seems to be temporary until the CM matrix component is upgraded,

In line with the progressive replacement of CM packages by more focused
components, without circular dependencies, if there is ever an "upgrade",
it will become "Commons Linear Algebra" (or some such name).
Again, the question is: Can we depend on an external project?

> or perhaps it’s permanent? I don’t believe the plan is or that the use of EJML should be permanent….

Why not?

> So should I design the regression component (which heavily uses matrices and vectors) to use matrix interfaces to allow for simpler reformatting later…. or even as a means of allowing the user to use any matrix library they want for computing regressions?
> This also aligns with what Mr. Alex Herbert told me about decoupling in regards to the input data and builder/loader functionality: https://issues.apache.org/jira/browse/STATISTICS-11

Wherever possible, there should be a clear boundary when some functionality
depends on an external API.
Where this occurs, the distinction should be made whether the use is internal
or in a public API.

In the former case, we can (more or less easily) replace one library with another
in a minor release.  And we can even "hide" our usage of external code by
"shading" it.
In the latter case, we'd impose an external API onto users of "Statistics", and
replacing that library (effectively changing our API) could only occur in a
major release.  And the "shade" plugin cannot be used.

Hence the design questions which you must answer are:
* What linear algebra operations are needed?
Based on the answer, you could check which library is better suited.[1]
* Which of these operations would be part of the public API?
Then we'd define minimal interfaces (to avoid coupling with the external
API), whose purpose is to bridge with the library used internally.
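
To illustrate the "minimal interfaces" idea above, here is a rough sketch, not a proposed design. All names (LeastSquaresSolver, NormalEquationsSolver, LinearRegressionSketch) are hypothetical, and it assumes that least-squares solving is the only operation the public API needs. The backend is written in plain Java as a stand-in; an adapter around whichever library is chosen would implement the same interface, so no library type appears in a public signature.

```java
import java.util.Arrays;

/** Hypothetical minimal interface: the only linear algebra operation exposed publicly. */
interface LeastSquaresSolver {
    /** Returns beta minimizing ||X beta - y||^2; x holds one row per observation. */
    double[] solve(double[][] x, double[] y);
}

/**
 * Stand-in backend in plain Java (normal equations, Gaussian elimination with
 * partial pivoting). An adapter around an external library would implement the
 * same interface instead; this class is an assumption made for illustration.
 */
final class NormalEquationsSolver implements LeastSquaresSolver {
    @Override
    public double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        // Build the augmented system [X^T X | X^T y].
        double[][] a = new double[p][p + 1];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                double s = 0;
                for (int k = 0; k < n; k++) s += x[k][i] * x[k][j];
                a[i][j] = s;
            }
            double s = 0;
            for (int k = 0; k < n; k++) s += x[k][i] * y[k];
            a[i][p] = s;
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("Singular normal matrix");
            }
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = a[i][p];
            for (int j = i + 1; j < p; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
        return beta;
    }
}

/** Hypothetical regression class: depends only on the interface, never on a library type. */
final class LinearRegressionSketch {
    private final LeastSquaresSolver solver;

    LinearRegressionSketch(LeastSquaresSolver solver) {
        this.solver = solver;
    }

    double[] fit(double[][] x, double[] y) {
        return solver.solve(x, y);
    }
}

class BoundarySketch {
    public static void main(String[] args) {
        double[][] x = {{1, 0}, {1, 1}, {1, 2}}; // intercept column plus one regressor
        double[] y = {1, 3, 5};
        double[] beta = new LinearRegressionSketch(new NormalEquationsSolver()).fit(x, y);
        System.out.println(Arrays.toString(beta));
    }
}
```

With this shape, swapping the backend only means providing another LeastSquaresSolver implementation; callers of the regression class would not need to change.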


Regards,
Gilles

[1] https://issues.apache.org/jira/projects/STATISTICS/issues/STATISTICS-10




>
> Thank you for your response,
> Cheers,
> -Ben
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [GSoC][Commons][STATISTICS][Regression][Matrix] Flexibility in Matrix Libraries in Regression Component?

Eric Barnhill
In reply to this post by Ben Nguyen
On Mon, Jun 17, 2019 at 7:13 AM Ben Nguyen <[hidden email]> wrote:

> I don’t believe the plan is or that the use of EJML should be permanent….
>


There's no reason it couldn't be permanent. Obviously we want to give
credit where it is due in all the appropriate ways. But the code is
licensed so that others may incorporate it. It is hard to see any downside
for the EJML team in gaining greater exposure and use by being shaded by
Apache. That is probably what they want.

Efficient matrix implementations are serious business. If you ask me,
commons would be well within its mission by making EJML easy to find, use,
and combine with other libraries of useful code. We would not necessarily
be in the commons mission by developing our own sparse matrix factorization
libraries.

I feel exactly the same way about the JTransforms library, on the day that
we get to that.

Re: [GSoC][Commons][STATISTICS][Regression][Matrix] Flexibility in Matrix Libraries in Regression Component?

Gilles Sadowski-2
Hi.

On Wed, 19 Jun 2019 at 18:39, Eric Barnhill <[hidden email]> wrote:

>
> On Mon, Jun 17, 2019 at 7:13 AM Ben Nguyen <[hidden email]> wrote:
>
> > I don’t believe the plan is or that the use of EJML should be permanent….
> >
>
>
> There's no reason it couldn't be permanent. Obviously we want to give
> credit where it is due in all the appropriate ways. But the code is
> licensed so that others may incorporate it. It is hard to see any downside
> for the EJML team in gaining greater exposure and use by being shaded by
> Apache. That is probably what they want.

There are two issues which we must settle:
1. The choice of EJML, even though it was not the best contender in the
benchmark referred to previously.
2. The problem of supporting an external API.

AFAIK, the latter was never accepted in Commons.
If you want to challenge that, please post to [All].

> Efficient matrix implementations are serious business. If you ask me,
> commons would be well within its mission by making EJML easy to find, use,
> and combine with other libraries of useful code. We would not necessarily
> be in the commons mission by developing our own sparse matrix factorization
> libraries.

This is not the point (I agree that we have neither the time nor
the expertise to reinvent a library for linear algebra).

The point is that by shading a library, we can switch to another if/when
there is a need (e.g. in case that project disappears).

Regards,
Gilles

>
> I feel exactly the same way about the JTransforms library, on the day that
> we get to that.
