[jira] [Created] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069298#comment-13069298 ]

Phil Steitz commented on MATH-607:
----------------------------------

Don't worry about the exceptions.  The only thing remaining there is to fit into the hierarchy.  I will fix that.  Feel free to weigh in on the ML thread, though.

I am still not sold on the globalstats array.  It is just not the Java way to use arrays with static constants into them to represent properties.  I agree strongly that we are going to want to add more fields to RegressionResults. In the public API, we are going to want them to be properties, though.  Why would a user ever want to use getGlobalStats[THING_I_WANT] instead of getThingIWant?  It is actually more code to maintain the enum plus the array rather than just fields.  So I guess I disagree with a) and d) above.  I get b), but don't see it as a big deal or enough to change API.  I also get c), but it frankly looks a little scary.  I have been burned so many times over the years by indicies into blocks of storage, variable content hashmaps of attributes, etc. that I try to avoid these things in my code.  And think about the change in c) in any case - from globalStats[OLD_INDEX] to globalStats[NEW_INDEX] somehow through the API, when the same change is really s/oldProperty/newProperty likely at the same entry point.

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply | Threaded
Open this post in threaded view
|

[jira] [Issue Comment Edited] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069298#comment-13069298 ]

Phil Steitz edited comment on MATH-607 at 7/21/11 11:56 PM:
------------------------------------------------------------

Don't worry about the exceptions.  The only thing remaining there is to fit into the hierarchy.  I will fix that.  Feel free to weigh in on the ML thread, though.

I am still not sold on the globalstats array.  It is just not the Java way to use arrays with static constants into them to represent properties.  I agree strongly that we are going to want to add more fields to RegressionResults. In the public API, we are going to want them to be properties, though.  Why would a user ever want to use getGlobalStats[THING_I_WANT] instead of getThingIWant?  It is actually more code to maintain the enum plus the array rather than just fields.  So I guess I disagree with a) and d) above.  I get b), but don't see it as a big deal or enough to change API.  I also get c), but it frankly looks a little scary.  I have been burned so many times over the years by indicies into blocks of storage, variable content hashmaps of attributes, etc. that I try to avoid these things in my code.  And think about the change in c) in any case - from globalStats[OLD_INDEX] to globalStats[NEW_INDEX] somehow through the API, when the same change is really s/oldProperty/newProperty likely at the same entry point.

If we think that the globalFit stuff needs to be encapsulated within RegressionResults, I would be fine with defining another class to hold global model fit statistics.  This class would then have the fit properties as fields.

      was (Author: psteitz):
    Don't worry about the exceptions.  The only thing remaining there is to fit into the hierarchy.  I will fix that.  Feel free to weigh in on the ML thread, though.

I am still not sold on the globalstats array.  It is just not the Java way to use arrays with static constants into them to represent properties.  I agree strongly that we are going to want to add more fields to RegressionResults. In the public API, we are going to want them to be properties, though.  Why would a user ever want to use getGlobalStats[THING_I_WANT] instead of getThingIWant?  It is actually more code to maintain the enum plus the array rather than just fields.  So I guess I disagree with a) and d) above.  I get b), but don't see it as a big deal or enough to change API.  I also get c), but it frankly looks a little scary.  I have been burned so many times over the years by indicies into blocks of storage, variable content hashmaps of attributes, etc. that I try to avoid these things in my code.  And think about the change in c) in any case - from globalStats[OLD_INDEX] to globalStats[NEW_INDEX] somehow through the API, when the same change is really s/oldProperty/newProperty likely at the same entry point.
 

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069322#comment-13069322 ]

greg sterijevski commented on MATH-607:
---------------------------------------

How do you propose to allow for the growth of global fit statistics? Keep
the getter pattern?
If we decide to keep the getter pattern, then for sure eliminate the array,
eliminate the static int indices.

What is the ML thread?

-Greg




> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069326#comment-13069326 ]

Phil Steitz commented on MATH-607:
----------------------------------

To add global fit statistics, yes, we add properties and getters.  No easier to add constants and expand the array.  The public API needs to expose the new stats in either case.  As I said, if it turns out that just the fit statistics are useful by themselves, it may make sense to encapsulate them in a separate class and have RegressionResults have one of these as a member.

The ML thread I was referring to is the thread on commons-dev with subject "RegressionModelSpecificationException"

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069330#comment-13069330 ]

greg sterijevski commented on MATH-607:
---------------------------------------

Thanks on both counts. I can understand your reticence on the array
approach.




> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113120#comment-13113120 ]

greg sterijevski commented on MATH-607:
---------------------------------------

I am pushing some changes to SimpleRegression which will allow it to support the UpdatingMultipleRegression interface. There are a couple of additions to Localizable.

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113175#comment-13113175 ]

Phil Steitz commented on MATH-607:
----------------------------------

Lets s/UpdatingMultipleLinearRegression/UpdatingLinearRegression
It is a little, um, funny to have simple regression implement multiple regression ;)

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113431#comment-13113431 ]

greg sterijevski commented on MATH-607:
---------------------------------------

Yes, I concur. Besides, having such long interface name is not a good idea
either..




> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.0
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles updated MATH-607:
------------------------

    Fix Version/s:     (was: 3.0)
                   3.1
   

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.1
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420949#comment-13420949 ]

Thomas Neidhart commented on MATH-607:
--------------------------------------

The patch has already been applied for 3.0. Is there still something to do?
               

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.1
>
>         Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Resolved] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles resolved MATH-607.
-------------------------

    Resolution: Implemented

Nobody seems to be able to answer this question.
If someone wants to make further updates, a new ticket can be created...

               

> Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-607
>                 URL: https://issues.apache.org/jira/browse/MATH-607
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: Java
>            Reporter: greg sterijevski
>              Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma
>             Fix For: 3.1
>
>         Attachments: millerreg, millerreg_take2, millerregtest, regres_change1, RegressResults2, updating_reg_cut2, updating_reg_ifaces
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> The current multiple regression class does a QR decomposition on the complete data set. This necessitates the loading incore of the complete dataset. For large datasets, or large datasets and a requirement to do datamining or stepwise regression this is not practical. There are techniques which form the normal equations on the fly, as well as ones which form the QR decomposition on an update basis. I am proposing, first, the specification of an "UpdatingLinearRegression" interface which defines basic functionality all such techniques must fulfill.
> Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface "RegressionResults".  
> Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place.
> Thanks,
> -Greg

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12