[jira] [Created] (MATH-857) I would like to include a VIF and TOLERANCE check for a 2 dimensional double array, very commonly used by major packages to determine variables that cause multi-colinearity issues and should be excluded from the models


ASF GitHub Bot (Jira)
Marios Michaelidis created MATH-857:
---------------------------------------

             Summary: I would like to include a VIF and TOLERANCE check for a 2 dimensional double array, very commonly used by major packages to determine variables that cause multi-colinearity issues and should be excluded from the models
                 Key: MATH-857
                 URL: https://issues.apache.org/jira/browse/MATH-857
             Project: Commons Math
          Issue Type: New Feature
    Affects Versions: 3.0, 3.1
         Environment: can apply to all operating systems
            Reporter: Marios Michaelidis
             Fix For: 3.1, 3.0


Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. Tolerance and VIF (variance inflation factor) are checks that help avoid optimization failures due to "inability to converge". Most of the time, the major packages (SAS, SPSS, etc.) run such a check before fitting the model and exclude variables that might cause this kind of problem. It would be quite a useful tool to have in Commons Math.
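
For readers unfamiliar with the two quantities, a minimal sketch follows. It is not the attached class and not an existing Commons Math API; the class name VifSketch and its method names are made up for illustration. It assumes rows are observations and columns are predictors, and relies on the standard identity that VIF_j is the j-th diagonal entry of the inverse of the predictors' correlation matrix, with tolerance_j = 1 / VIF_j = 1 - R_j^2.

```java
import java.util.Arrays;

/** Hypothetical sketch; not the attached class and not a Commons Math API. */
public class VifSketch {

    /** Pearson correlation matrix of the columns of data (rows = observations). */
    public static double[][] correlationMatrix(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j] / n;
            }
        }
        // Centered cross-products; the 1/(n-1) factor cancels in the ratio below.
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]);
                }
            }
        }
        // A constant column has zero variance and yields NaN here (assumption: caller filters those).
        double[][] corr = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                corr[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return corr;
    }

    /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting. */
    public static double[][] invert(double[][] m) {
        int p = m.length;
        double[][] a = new double[p][2 * p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(m[i], 0, a[i], 0, p);
            a[i][p + i] = 1.0; // augment with the identity
        }
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("perfectly collinear columns");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            double d = a[col][col];
            for (int k = 0; k < 2 * p; k++) {
                a[col][k] /= d;
            }
            for (int r = 0; r < p; r++) {
                if (r == col) {
                    continue;
                }
                double f = a[r][col];
                for (int k = 0; k < 2 * p; k++) {
                    a[r][k] -= f * a[col][k];
                }
            }
        }
        double[][] inv = new double[p][p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(a[i], p, inv[i], 0, p);
        }
        return inv;
    }

    /** VIF_j is the j-th diagonal entry of the inverse correlation matrix. */
    public static double[] vif(double[][] data) {
        double[][] inv = invert(correlationMatrix(data));
        double[] v = new double[inv.length];
        for (int j = 0; j < v.length; j++) {
            v[j] = inv[j][j];
        }
        return v;
    }

    /** Tolerance_j = 1 / VIF_j, which equals 1 - R_j^2. */
    public static double[] tolerance(double[][] data) {
        double[] v = vif(data);
        double[] t = new double[v.length];
        for (int j = 0; j < v.length; j++) {
            t[j] = 1.0 / v[j];
        }
        return t;
    }

    public static void main(String[] args) {
        // Made-up data: the second column is roughly twice the first.
        double[][] x = { {1, 2, 1}, {2, 4.1, 0}, {3, 5.9, 2}, {4, 8.2, 1}, {5, 9.8, 3} };
        System.out.println("VIF       = " + Arrays.toString(vif(x)));
        System.out.println("Tolerance = " + Arrays.toString(tolerance(x)));
    }
}
```

A pre-fit check would then flag or drop any column whose VIF exceeds a chosen cut-off. The cut-off itself is a modelling decision and is not specified in this issue.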

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles updated MATH-857:
------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 3.1)
        Fix Version/s:     (was: 3.0)
              Summary: Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models  (was: I would like to include a VIF and TOLERANCE check for a 2 dimensional double array, very commonly used by major packages to determine variables that cause multi-colinearity issues and should be excluded from the models)
   

> Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-857
>                 URL: https://issues.apache.org/jira/browse/MATH-857
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: can apply to all operating systems
>            Reporter: Marios Michaelidis
>            Priority: Minor
>              Labels: build, test
>             Fix For: 3.1
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. Tolerance and VIF (variance inflation factor) are checks that help avoid optimization failures due to "inability to converge". Most of the time, the major packages (SAS, SPSS, etc.) run such a check before fitting the model and exclude variables that might cause this kind of problem. It would be quite a useful tool to have in Commons Math.


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment: VIF_Tolerance.txt

Here is the VIF and Tolerance class in a txt file. It has been tested with CM and compared with other major statistical packages. I propose placing it in the same package as the correlations, as this is what it checks: multicollinearity. Tell me if it is required in a different format.

Regards
               


[jira] [Commented] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451355#comment-13451355 ]

Phil Steitz commented on MATH-857:
----------------------------------

This is a great start.  I would say rename the class Multicollinearity and put it in regression. Per Gilles' comments on the mailing list, we also need some tests.  Validation of test cases against R or some other package is also desirable.  There is an R test framework in /src/test/R that can be used to validate test cases against R.  Ask on the ML or via private email if you need help getting set up to generate checkstyle reports, etc.
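
As one possible hand-checkable reference case for such tests (a sketch only; the class and method names below are hypothetical and not part of the attachment): with exactly two predictors, regressing one on the other gives R^2 = r^2, where r is their Pearson correlation, so both predictors share VIF = 1 / (1 - r^2). For x = {1, 2, 3} and y = {1, 2, 4}, working by hand gives r^2 = 27/28 and hence VIF = 28, which a unit test can compare against the class under test within a small tolerance.

```java
/** Hypothetical reference-value helper for tests; not part of Commons Math. */
public class TwoPredictorReference {

    /** Pearson correlation of two equal-length samples. */
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0.0;
        double my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i] / n;
            my += y[i] / n;
        }
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** With exactly two predictors, both share VIF = 1 / (1 - r^2). */
    public static double vifTwoPredictors(double[] x, double[] y) {
        double r = pearson(x, y);
        return 1.0 / (1.0 - r * r);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {1, 2, 4};
        // Compare against the hand-computed value 28 with a tolerance, not with ==.
        System.out.println("VIF = " + vifTwoPredictors(x, y));
    }
}
```

Cases with three or more predictors have no such closed form and would still need reference values from R or another package, as suggested above.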
               


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment: FOR TOLERANCE.rar

I am uploading this .rar file, which includes:
1) The R code to generate the VIF and tolerance, labelled R CODE.txt.
2) The code and the results from R together (the output I had when I ran them) in R OUTPUT.txt.
3) Three screenshots with the exact same data matrix from SPSS. Follow this sequence: Matrix in SPSS.JPG --> prepare for regerssion.JPG --> results of vif and Tolerance.JPG.
4) A test class, labelled test class for java.txt, that uses the Multicolinearity.java class (renamed from VIF_Tolerance).
5) The Multicolinearity class, included in the regression package.
If there is anything else that I need to do, please let me know.

Regards
               


[jira] [Commented] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451391#comment-13451391 ]

Marios Michaelidis commented on MATH-857:
-----------------------------------------

All three methods, of course, give the same results.
               


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment: Multicolinearity.java

I accidentally copied the previous version of Multicolinearity into the .rar file (before I renamed it). This is the updated one.

Regards

Marios
               


[jira] [Commented] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451425#comment-13451425 ]

Phil Steitz commented on MATH-857:
----------------------------------

In general, we don't like to add @author tags or to embed individual copyright notices in the code.  Are you OK removing the copyright notice from the source file?  If we end up committing something based on this, we will include you in our contributors file.

It would also make it a little easier on us in reviewing and applying your contributions if you could create them using "svn diff" against a checkout of the current source code.  Also, as noted on the mailing list, running the checkstyle checks and fixing the formatting problems will speed things up.

You can also delete obsolete attachments from this ticket so we don't get confused about what the most recent patch is.

Thanks!
               


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment:     (was: Multicolinearity.java)
   


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment:     (was: VIF_Tolerance.txt)
   


[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marios Michaelidis updated MATH-857:
------------------------------------

    Attachment: Multicolinearity.java

This is the latest Multicolinearity class file with the copyright note removed.

Generally, my background is more in risk and statistics, and therefore I lack the specific vocabulary (and generally the processes for software deployment) being used here! I do apologise for that. I googled this "SVM diff" and it gives me various links. Do you have a specific link for that? As for the style and formatting, I thought Eclipse was helping me with that... If not, what should I do to improve it in an easy way?

Regards
               


[jira] [Commented] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451614#comment-13451614 ]

Gilles commented on MATH-857:
-----------------------------

bq. I googled this "SVM diff" [...]

That would be (if you work in a Unix-like environment):
{noformat}
  svn diff > math857.patch
{noformat}

in order to create a "patch" file that collects the "difference" of your working copy of the source code
with respect to the version (a.k.a. "revision") of the code on the shared repository.

The Commons Math project uses the "subversion" software (abbreviated as "svn") for keeping track of the history of the modifications.
Prior to submitting code for inclusion, you should thus merge it in that "svn" environment.

You must install the "subversion" software, then "check-out" the Commons Math code:
{noformat}
  svn co https://svn.apache.org/repos/asf/commons/proper/math/trunk
{noformat}

Then you must also install the "maven" software (abbreviated as "mvn"), as it is used to perform various tasks such as compiling, running the tests, generating reports (e.g. "CheckStyle").
Once you've checked out the code, the first thing you'd do is go to the "trunk" directory and run
{noformat}
  mvn test
{noformat}

               


[jira] [Commented] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451615#comment-13451615 ]

Phil Steitz commented on MATH-857:
----------------------------------

Have a look at the "Contributing" section of the developers guide here
http://commons.apache.org/math/developers.html

That should help you get started.  Basically, we currently use Subversion as our source control system and we collaboratively develop the code by looking at "diffs" against the source code in the repository.  Diff files are text files that show what is different between what a contributor has in his or her local checkout of the code and what has been committed to the repository.  After reviewing the link above, you should

0) check out the "trunk" (https://svn.apache.org/repos/asf/commons/proper/math/trunk)
1) add your new classes to the source tree locally
2) run "svn add" with the names of the new files to get them added to your local checkout
3) run "mvn clean test" from the root of the checkout (where "trunk/" is).  Verify no failures.
4) run "mvn -DskipTests site" again from trunk root.  After downloading a large number of dependent jars this will eventually produce a version of the commons math web site in /target/site (relative to the root of the checkout).  Navigate to the checkstyle reports and see what it may be complaining about.  Similarly for findbugs.
5) Once tests in 3) and static checks in 4) look good, run "svn diff > multicollinearity.patch" from the root of the checkout.  That will produce a diff file with the name on the right.  Upload that file here.

To get maximum help from Eclipse in formatting, you can use:
http://people.apache.org/~luc/Apache-commons.xml

Don't hesitate to ask if you have more questions.  One day you will get to answer these questions for someone else :)

               

> Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-857
>                 URL: https://issues.apache.org/jira/browse/MATH-857
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: can apply to all operating systems
>            Reporter: Marios Michaelidis
>            Priority: Minor
>              Labels: build, test
>             Fix For: 3.1
>
>         Attachments: FOR TOLERANCE.rar, Multicolinearity.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. Tolerance and VIF are checks that help avoid optimization failures due to "inability to converge". Most of the time, the major packages (SAS, SPSS, etc.) run such a check before fitting the model and exclude variables that might cause these kinds of problems. It would be quite a useful tool to have in Commons Math.


[jira] [Comment Edited] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451615#comment-13451615 ]

Phil Steitz edited comment on MATH-857 at 9/10/12 2:50 AM:
-----------------------------------------------------------

Have a look at the "Contributing" section of the developers guide here
http://commons.apache.org/math/developers.html

That should help you get started.  Basically, we currently use Subversion as our source control system and we collaboratively develop the code by looking at "diffs" against the source code in the repository.  Diff files are text files that show what is different between what a contributor has in his or her local checkout of the code and what has been committed to the repository.  After reviewing the link above, you should:

0) check out the "trunk" (https://svn.apache.org/repos/asf/commons/proper/math/trunk)
1) add your new classes to the source tree locally
2) run "svn add" with the names of the new files to get them added to your local checkout
3) run "mvn clean test" from the root of the checkout (where "trunk/" is).  Verify no failures.
4) run "mvn -DskipTests=true site", again from the trunk root.  After downloading a large number of dependent jars, this will eventually produce a version of the Commons Math web site in target/site (relative to the root of the checkout).  Navigate to the checkstyle report and see what it complains about, then do the same for the findbugs report.
5) Once tests in 3) and static checks in 4) look good, run "svn diff > multicollinearity.patch" from the root of the checkout.  That will produce a diff file with the name on the right.  Upload that file here.
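
As a rough sketch only, steps 0) to 5) could be strung together in a small shell function like the one below.  The function name make_patch, the local directory name commons-math-trunk and the destination path in the usage comment are placeholders for illustration, not project conventions.  Step 1) is reduced to copying a single file, and the checkstyle and findbugs reports from step 4) still have to be read by hand.

```shell
#!/bin/sh
# Sketch of steps 0) to 5); run it from any scratch directory.
# make_patch, WORK_DIR and the destination path are illustrative assumptions.

TRUNK_URL="https://svn.apache.org/repos/asf/commons/proper/math/trunk"
WORK_DIR="commons-math-trunk"          # hypothetical local checkout directory
PATCH_FILE="multicollinearity.patch"

# Usage: make_patch <new-source-file> <destination-path-relative-to-checkout-root>
make_patch() (
    set -e                                   # stop at the first failing step
    src="$1"
    dest="$2"

    svn checkout "$TRUNK_URL" "$WORK_DIR"    # 0) check out trunk
    cp "$src" "$WORK_DIR/$dest"              # 1) add the new class locally
    cd "$WORK_DIR"
    svn add "$dest"                          # 2) schedule the file for addition
    mvn clean test                           # 3) must finish with no failures
    mvn -DskipTests=true site                # 4) reports end up under target/site
    svn diff > "$PATCH_FILE"                 # 5) the file to upload
)

# Example call (the package path is a guess, adjust to where the class belongs):
# make_patch ./Multicollinearity.java \
#     src/main/java/org/apache/commons/math3/stat/regression/Multicollinearity.java
```

The function body runs in a subshell, so the "cd" does not change the directory of the calling shell, and "set -e" stops the sequence before a patch is written if the tests or the site build fail.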

To get maximum help from Eclipse in formatting, you can use:
http://people.apache.org/~luc/Apache-commons.xml

Don't hesitate to ask if you have more questions.  One day you will get to answer these questions for someone else :)

               



[jira] [Updated] (MATH-857) Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles updated MATH-857:
------------------------

    Fix Version/s:     (was: 3.1)
                   3.2
   

> Include a VIF and TOLERANCE check for a 2 dimensional double array, to determine variables that cause multi-colinearity issues and should be excluded from the models
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-857
>                 URL: https://issues.apache.org/jira/browse/MATH-857
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 3.0
>         Environment: can apply to all operating systems
>            Reporter: Marios Michaelidis
>            Priority: Minor
>              Labels: build, test
>             Fix For: 3.2
>
>         Attachments: FOR TOLERANCE.rar, Multicolinearity.java
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. Tolerance and VIF are checks that help avoid optimization failures due to "inability to converge". Most of the time, the major packages (SAS, SPSS, etc.) run such a check before fitting the model and exclude variables that might cause these kinds of problems. It would be quite a useful tool to have in Commons Math.
