[jira] [Created] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)
for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: MATH-644
                 URL: https://issues.apache.org/jira/browse/MATH-644
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 2.2
            Reporter: marzieh
            Priority: Minor
             Fix For: 2.2


In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.

HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
System.out.println(u.upperCumulativeProbability(0))

Thanks

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159987#comment-13159987 ]

Thomas Neidhart commented on MATH-644:
--------------------------------------

I digged a bit into the problem. The HypergeometricDistribution calculates the probability for a given x using the following formula:

{code:java}
private double probability(int n, int m, int k, int x) {
  return FastMath.exp(ArithmeticUtils.binomialCoefficientLog(m, x) +
         ArithmeticUtils.binomialCoefficientLog(n - m, k - x) -
         ArithmeticUtils.binomialCoefficientLog(n, k));
}
{code}

Thus it transforms the binomial coefficients to a logarithmic scale in order to cope with the possibly large results (and to be able to compute the bincoeff at all). But, imho, the reverse transformation is broken, as it does not take any scaling into account. As the coefficients get larger (e.g. due to a large n), the differences between the terms will become smaller in log scale, and thus incorrectly transformed back to linear scale. Taking scaling into account, the exp function will most likely fail for large n.

I have created a simple test to illustrate the problem, the t{x} correspond to the binomial coeff terms from the formula, diff is the input the the exp function. This loop simulates an increasing n, and the expectation is that the result should get smaller with increasing n:

{code}
t1=0.0, t2=4547.288942497606, t3=4770.9627189150215, diff=-223.67377641741587, result=7.23957639711833E-98
t1=0.0, t2=12183.221706275828, t3=12186.968419291079, diff=-3.7467130152508616, result=0.023595175513309037
t1=0.0, t2=13444.672093808651, t3=13446.561352727935, diff=-1.8892589192837477, result=0.15118380673528464
t1=0.0, t2=14186.229425843805, t3=14187.492505971342, diff=-1.2630801275372505, result=0.2827816800804864
t1=0.0, t2=14713.395226772875, t3=14714.343882929534, diff=-0.9486561566591263, result=0.3872610921706871
t1=0.0, t2=15122.726785860956, t3=15123.486358374357, diff=-0.7595725134015083, result=0.46786639087791215
t1=0.0, t2=15457.396298892796, t3=15458.029636271298, diff=-0.6333373785018921, result=0.5308173033122051
t1=0.0, t2=15740.484181590378, t3=15741.027263000607, diff=-0.5430814102292061, result=0.5809553297318205
t1=0.0, t2=15985.787659011781, t3=15986.263000234962, diff=-0.47534122318029404, result=0.6216728910682101
t1=0.0, t2=16202.21559868753, t3=16202.638224512339, diff=-0.4226258248090744, result=0.6553237931479635
t1=0.0, t2=16395.855738580227, t3=16396.236174091697, diff=-0.380435511469841, result=0.6835636445695273
{code}
               

> for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-644
>                 URL: https://issues.apache.org/jira/browse/MATH-644
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>            Reporter: marzieh
>            Priority: Minor
>              Labels: hypergeometric, probability
>             Fix For: 2.2
>
>
> In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.
> HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
> System.out.println(u.upperCumulativeProbability(0))
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Issue Comment Edited] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/MATH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159987#comment-13159987 ]

Thomas Neidhart edited comment on MATH-644 at 12/15/11 8:53 AM:
----------------------------------------------------------------

I digged a bit into the problem. The HypergeometricDistribution calculates the probability for a given x using the following formula:

{code:java}
private double probability(int n, int m, int k, int x) {
  return FastMath.exp(ArithmeticUtils.binomialCoefficientLog(m, x) +
         ArithmeticUtils.binomialCoefficientLog(n - m, k - x) -
         ArithmeticUtils.binomialCoefficientLog(n, k));
}
{code}

Thus it transforms the binomial coefficients to a logarithmic scale in order to cope with the possibly large results (and to be able to compute the bincoeff at all). But, imho, the reverse transformation is broken, as it does not take any scaling into account. As the coefficients get larger (e.g. due to a large n), the differences between the terms will become smaller in log scale, and thus incorrectly transformed back to linear scale. Taking scaling into account, the exp function will most likely fail for large n.

I have created a simple test to illustrate the problem, the t{x} correspond to the binomial coeff terms from the formula, diff is the input the the exp function. This loop simulates an increasing n, and the expectation is that the result should get smaller with increasing n:

{code}
t1=0.0, t2=4547.288942497606, t3=4770.9627189150215, diff=-223.67377641741587, result=7.23957639711833E-98
t1=0.0, t2=12183.221706275828, t3=12186.968419291079, diff=-3.7467130152508616, result=0.023595175513309037
t1=0.0, t2=13444.672093808651, t3=13446.561352727935, diff=-1.8892589192837477, result=0.15118380673528464
t1=0.0, t2=14186.229425843805, t3=14187.492505971342, diff=-1.2630801275372505, result=0.2827816800804864
t1=0.0, t2=14713.395226772875, t3=14714.343882929534, diff=-0.9486561566591263, result=0.3872610921706871
t1=0.0, t2=15122.726785860956, t3=15123.486358374357, diff=-0.7595725134015083, result=0.46786639087791215
t1=0.0, t2=15457.396298892796, t3=15458.029636271298, diff=-0.6333373785018921, result=0.5308173033122051
t1=0.0, t2=15740.484181590378, t3=15741.027263000607, diff=-0.5430814102292061, result=0.5809553297318205
t1=0.0, t2=15985.787659011781, t3=15986.263000234962, diff=-0.47534122318029404, result=0.6216728910682101
t1=0.0, t2=16202.21559868753, t3=16202.638224512339, diff=-0.4226258248090744, result=0.6553237931479635
t1=0.0, t2=16395.855738580227, t3=16396.236174091697, diff=-0.380435511469841, result=0.6835636445695273
{code}

hmm, after some second thoughts, I am not sure if the analysis is correct, and the problem is hidden somewhere else.
               
      was (Author: tn):
    I digged a bit into the problem. The HypergeometricDistribution calculates the probability for a given x using the following formula:

{code:java}
private double probability(int n, int m, int k, int x) {
  return FastMath.exp(ArithmeticUtils.binomialCoefficientLog(m, x) +
         ArithmeticUtils.binomialCoefficientLog(n - m, k - x) -
         ArithmeticUtils.binomialCoefficientLog(n, k));
}
{code}

Thus it transforms the binomial coefficients to a logarithmic scale in order to cope with the possibly large results (and to be able to compute the bincoeff at all). But, imho, the reverse transformation is broken, as it does not take any scaling into account. As the coefficients get larger (e.g. due to a large n), the differences between the terms will become smaller in log scale, and thus incorrectly transformed back to linear scale. Taking scaling into account, the exp function will most likely fail for large n.

I have created a simple test to illustrate the problem, the t{x} correspond to the binomial coeff terms from the formula, diff is the input the the exp function. This loop simulates an increasing n, and the expectation is that the result should get smaller with increasing n:

{code}
t1=0.0, t2=4547.288942497606, t3=4770.9627189150215, diff=-223.67377641741587, result=7.23957639711833E-98
t1=0.0, t2=12183.221706275828, t3=12186.968419291079, diff=-3.7467130152508616, result=0.023595175513309037
t1=0.0, t2=13444.672093808651, t3=13446.561352727935, diff=-1.8892589192837477, result=0.15118380673528464
t1=0.0, t2=14186.229425843805, t3=14187.492505971342, diff=-1.2630801275372505, result=0.2827816800804864
t1=0.0, t2=14713.395226772875, t3=14714.343882929534, diff=-0.9486561566591263, result=0.3872610921706871
t1=0.0, t2=15122.726785860956, t3=15123.486358374357, diff=-0.7595725134015083, result=0.46786639087791215
t1=0.0, t2=15457.396298892796, t3=15458.029636271298, diff=-0.6333373785018921, result=0.5308173033122051
t1=0.0, t2=15740.484181590378, t3=15741.027263000607, diff=-0.5430814102292061, result=0.5809553297318205
t1=0.0, t2=15985.787659011781, t3=15986.263000234962, diff=-0.47534122318029404, result=0.6216728910682101
t1=0.0, t2=16202.21559868753, t3=16202.638224512339, diff=-0.4226258248090744, result=0.6553237931479635
t1=0.0, t2=16395.855738580227, t3=16396.236174091697, diff=-0.380435511469841, result=0.6835636445695273
{code}
                 

> for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-644
>                 URL: https://issues.apache.org/jira/browse/MATH-644
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>            Reporter: marzieh
>            Priority: Minor
>              Labels: hypergeometric, probability
>             Fix For: 2.2
>
>
> In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.
> HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
> System.out.println(u.upperCumulativeProbability(0))
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Updated] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated MATH-644:
---------------------------------

    Comment: was deleted

(was: I digged a bit into the problem. The HypergeometricDistribution calculates the probability for a given x using the following formula:

{code:java}
private double probability(int n, int m, int k, int x) {
  return FastMath.exp(ArithmeticUtils.binomialCoefficientLog(m, x) +
         ArithmeticUtils.binomialCoefficientLog(n - m, k - x) -
         ArithmeticUtils.binomialCoefficientLog(n, k));
}
{code}

Thus it transforms the binomial coefficients to a logarithmic scale in order to cope with the possibly large results (and to be able to compute the bincoeff at all). But, imho, the reverse transformation is broken, as it does not take any scaling into account. As the coefficients get larger (e.g. due to a large n), the differences between the terms will become smaller in log scale, and thus incorrectly transformed back to linear scale. Taking scaling into account, the exp function will most likely fail for large n.

I have created a simple test to illustrate the problem, the t{x} correspond to the binomial coeff terms from the formula, diff is the input the the exp function. This loop simulates an increasing n, and the expectation is that the result should get smaller with increasing n:

{code}
t1=0.0, t2=4547.288942497606, t3=4770.9627189150215, diff=-223.67377641741587, result=7.23957639711833E-98
t1=0.0, t2=12183.221706275828, t3=12186.968419291079, diff=-3.7467130152508616, result=0.023595175513309037
t1=0.0, t2=13444.672093808651, t3=13446.561352727935, diff=-1.8892589192837477, result=0.15118380673528464
t1=0.0, t2=14186.229425843805, t3=14187.492505971342, diff=-1.2630801275372505, result=0.2827816800804864
t1=0.0, t2=14713.395226772875, t3=14714.343882929534, diff=-0.9486561566591263, result=0.3872610921706871
t1=0.0, t2=15122.726785860956, t3=15123.486358374357, diff=-0.7595725134015083, result=0.46786639087791215
t1=0.0, t2=15457.396298892796, t3=15458.029636271298, diff=-0.6333373785018921, result=0.5308173033122051
t1=0.0, t2=15740.484181590378, t3=15741.027263000607, diff=-0.5430814102292061, result=0.5809553297318205
t1=0.0, t2=15985.787659011781, t3=15986.263000234962, diff=-0.47534122318029404, result=0.6216728910682101
t1=0.0, t2=16202.21559868753, t3=16202.638224512339, diff=-0.4226258248090744, result=0.6553237931479635
t1=0.0, t2=16395.855738580227, t3=16396.236174091697, diff=-0.380435511469841, result=0.6835636445695273
{code}

hmm, after some second thoughts, I am not sure if the analysis is correct, and the problem is hidden somewhere else.)
   

> for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-644
>                 URL: https://issues.apache.org/jira/browse/MATH-644
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>            Reporter: marzieh
>            Priority: Minor
>              Labels: hypergeometric, probability
>             Fix For: 3.1
>
>
> In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.
> HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
> System.out.println(u.upperCumulativeProbability(0))
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Updated] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated MATH-644:
---------------------------------

    Fix Version/s:     (was: 2.2)
                   3.1

Updated fix version as there will be no 2.2.1
               

> for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-644
>                 URL: https://issues.apache.org/jira/browse/MATH-644
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>            Reporter: marzieh
>            Priority: Minor
>              Labels: hypergeometric, probability
>             Fix For: 3.1
>
>
> In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.
> HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
> System.out.println(u.upperCumulativeProbability(0))
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
Reply | Threaded
Open this post in threaded view
|

[jira] [Resolved] (MATH-644) for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/MATH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved MATH-644.
----------------------------------

    Resolution: Fixed

Finally, I found the problem for the described behavior:

In the upperCumulativeProbability function there was a sanity check like this:

{noformat}
        if (x < domain[0]) {
            ret = 1.0;
        } else if (x > domain[1]) {
            ret = 0.0;
{noformat}

In fact to be correct, it has to be like this

{noformat}
        if (x <= domain[0]) {
            ret = 1.0;
        } else if (x > domain[1]) {
            ret = 0.0;
{noformat}

which is also symmetric to the case of the cumulativeProbability function.
It means that for values of x that are at the lower bound, the probability must be 1.0 as the upperCumulativeProbability is defined as P(X >= x).

Additionally, the duplicate probability mass functions have been cleaned up. After looking through the version history it became clear that initially there existed two methods, whereas the public one called only the private one (which contained the actual computation). Later on the public one got improved, whereas the private one was still called by the cumulativeProbability methods.

This has been fixed in the sense that only the public method (which also behaves better for large values of N, m, k) is used in the class.
               

> for the class of hyper-geometric distribution, for some number the method "upperCumulativeProbability" return a probability greater than 1 which is impossible.  
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MATH-644
>                 URL: https://issues.apache.org/jira/browse/MATH-644
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 2.2
>            Reporter: marzieh
>            Priority: Minor
>              Labels: hypergeometric, probability
>             Fix For: 3.1
>
>
> In windows 7, I used common.Math library. I used class "HypergeometricDistributionImpl" and method "upperCumulativeProbability" of zero for distribution and the return value is larget than 1. the following code is working error.
> HypergeometricDistributionImpl u = new HypergeometricDistributionImpl(14761461, 1035 ,1841 );
> System.out.println(u.upperCumulativeProbability(0))
> Thanks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira