Why not BigDecimal?

Why not BigDecimal?

Something Something
Apache Commons-Math seems like an excellent library, but what I don't
understand is why we are using 'double' everywhere instead of BigDecimal.

I wrote a simple program to run a Multiple Regression Analysis followed by
Rank, and compared my results to those from the R stats package; because of
a lack of precision, the 'ranks' are way off.  I am assuming that if we had
used BigDecimal, the ranks would have matched the ones from R.

Is there something I am missing?

Re: Why not BigDecimal?

Ted Dunning
Doesn't R use doubles under the covers?  Note this quote from the manual:

*R has no single precision data type. All real numbers are stored in double
precision format*.

(from http://stat.ethz.ch/R-manual/R-patched/library/base/html/double.html)

Any difference in the results that you saw is likely due to different
algorithms.  If you mean rank as in the rank of a matrix, then the exact
value is very much a matter of judgment since it involves an implicit
comparison of a numerical value to zero.  Using BigDecimal is very unlikely
to have significantly affected your results.
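
For a concrete picture of that judgment call, consider a toy sketch (the
singular values below are made up for illustration): the numerical rank is
the number of singular values above a tolerance, so different tolerances
give different ranks.

// Hypothetical singular values of some matrix; the third is "zero"
// only up to rounding error.
double[] sv = {12.5, 0.32, 8.0e-14};
for (double tol : new double[] {1e-10, 1e-16}) {
    int rank = 0;
    for (double s : sv) {
        if (s > tol) rank++;
    }
    System.out.println("tol=" + tol + " -> rank " + rank);
}

With tol = 1e-10 the rank is 2; with tol = 1e-16 it is 3.  No amount of
BigDecimal arithmetic removes the need to pick that threshold.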

On Thu, Feb 11, 2010 at 9:29 PM, Something Something <[hidden email]> wrote:

> I wrote a simple program to run a Multiple Regression Analysis followed by
> Rank, and compared my results to those from the R stats package; because of
> a lack of precision, the 'ranks' are way off.  I am assuming that if we had
> used BigDecimal, the ranks would have matched the ones from R.
>
> Is there something I am missing?
>



--
Ted Dunning, CTO
DeepDyve

RE: Why not BigDecimal?

Andy Turner
Interesting that this is a precision issue. I'm not surprised: depending on what you are doing, double precision may not be enough. It depends a lot on how the calculations are broken into smaller parts. BigDecimal is fantastically useful...

Andy
http://www.geog.leeds.ac.uk/people/a.turner/
 
-----Original Message-----
From: Ted Dunning [mailto:[hidden email]]
Sent: 12 February 2010 05:59
To: Commons Users List
Subject: Re: Why not BigDecimal?

Doesn't R use doubles under the covers?  Note this quote from the manual:

*R has no single precision data type. All real numbers are stored in double
precision format*.

(from http://stat.ethz.ch/R-manual/R-patched/library/base/html/double.html)

Any difference in the results that you saw is likely due to different
algorithms.  If you mean rank as in the rank of a matrix, then the exact
value is very much a matter of judgment since it involves an implicit
comparison of a numerical value to zero.  Using BigDecimal is very unlikely
to have significantly affected your results.

On Thu, Feb 11, 2010 at 9:29 PM, Something Something <[hidden email]> wrote:

> I wrote a simple program to run a Multiple Regression Analysis followed by
> Rank, and compared my results to those from the R stats package; because of
> a lack of precision, the 'ranks' are way off.  I am assuming that if we had
> used BigDecimal, the ranks would have matched the ones from R.
>
> Is there something I am missing?
>



--
Ted Dunning, CTO
DeepDyve

Re: Why not BigDecimal?

Luc Maisonobe
In reply to this post by Something Something
Something Something wrote:
> Apache Commons-Math seems like an excellent library, but what I don't
> understand is why we are using 'double' everywhere instead of BigDecimal.

Commons-math is a low-level library intended to be used by many
different types of applications. Using primitive double as the standard
type is a fair bet for integrating smoothly with a priori unknown
applications. Using BigDecimal would greatly restrict the audience.

Also note that BigDecimal lacks many functions (sin, cos, sqrt, cbrt,
exp, log ...).
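
To make that concrete, here is a minimal sketch of what one has to write by
hand just to get a square root (a Newton iteration; illustrative only, and
it assumes a positive input):

import java.math.BigDecimal;
import java.math.MathContext;

static BigDecimal sqrt(BigDecimal a, MathContext mc) {
    // Seed with the double approximation, then refine.
    BigDecimal x = new BigDecimal(Math.sqrt(a.doubleValue()), mc);
    BigDecimal two = BigDecimal.valueOf(2);
    for (int i = 0; i < 100; i++) {
        // Newton step: x <- (x + a/x) / 2
        BigDecimal next = x.add(a.divide(x, mc)).divide(two, mc);
        if (next.compareTo(x) == 0) {
            break; // converged at this precision
        }
        x = next;
    }
    return x;
}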

I also doubt performance would be on par with primitive doubles with
respect to speed, but this is only a personal guess that would need to be
verified.

>
> I wrote a simple program to run a Multiple Regression Analysis followed by
> Rank, and compared my results to those from the R stats package; because of
> a lack of precision, the 'ranks' are way off.  I am assuming that if we had
> used BigDecimal, the ranks would have matched the ones from R.

There are many other things that could explain the differences. Even with
BigDecimal, it is difficult to set the proper scale, so your assumption
needs to be verified.
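
For example, even a trivial division forces that choice on you:

// Non-terminating decimal expansion: throws ArithmeticException
BigDecimal bad = BigDecimal.ONE.divide(new BigDecimal(3));

// You must pick a scale and rounding mode yourself
BigDecimal ok = BigDecimal.ONE.divide(new BigDecimal(3), 10, RoundingMode.HALF_UP);
// ok = 0.3333333333

(RoundingMode is java.math.RoundingMode.)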

Luc

>
> Is there something I am missing?
>



Re: Why not BigDecimal?

Ted Dunning
In reply to this post by Andy Turner
It is not a precision issue.  R and commons-math use different algorithms
with the same underlying numerical implementation.

It is even an open question which result is better.  R has lots of
credibility, but I have found cases where it lacked precision (and I coded
up a patch that was accepted).

Unbounded-precision integers and rationals are very useful, but not usually
for large-scale numerical programming.  Except in a very few cases, if you
need more than 17 digits of precision, you have other very serious problems
that extra precision won't fix.
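
For context, a two-line illustration of that granularity (doubles carry
roughly 15-17 significant decimal digits):

System.out.println(Math.ulp(1.0));   // ~2.22e-16: spacing between doubles near 1.0
System.out.println(0.1 + 0.2);       // 0.30000000000000004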

On Fri, Feb 12, 2010 at 1:40 AM, Andy Turner <[hidden email]> wrote:

> Interesting that this is a precision issue. I'm not surprised: depending on
> what you are doing, double precision may not be enough. It depends a lot on
> how the calculations are broken into smaller parts. BigDecimal is
> fantastically useful...
>



--
Ted Dunning, CTO
DeepDyve

Re: Why not BigDecimal?

Something Something
Okay... Let's not worry about R, BigDecimal, and precision for the time
being.  I might have been looking at the wrong values.  So let's hold that
thought.

Let's take a simple example for getting Y-Hat values using Multiple
Regression given in this PDF:
http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf

I created a small CSV called students.csv that contains the following data:

s1 14 4 1
s2 23 4 2
s3 30 7 2
s4 50 7 4
s5 39 10 3
s6 67 10 6

Col headers:  Student id, Memory span (Y), Age (X1), Speech rate (X2)

Now the expected results are:

yHat[0]:15.166666666666668
yHat[1]:24.666666666666668
yHat[2]:27.666666666666664
yHat[3]:46.666666666666664
yHat[4]:40.166666666666664
yHat[5]:68.66666666666667

This is based on the following equation (given in the PDF):  Y = 1.67 + X1 +
9.50 X2
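
(As a quick sanity check: for s1, 1.67 + 1*4 + 9.50*1 = 15.17, which matches
yHat[0] = 15.1667 up to the rounding of the printed coefficients.)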

I wrote the following small, quick-and-dirty code to
use OLSMultipleLinearRegression.  The 'calculateHat()' method returns a
RealMatrix, but I can't see the above results in there.  Am I using this
class correctly?  Please let me know.  Thanks.



// Imports needed: java.io.BufferedReader, java.io.File, java.io.FileReader,
// java.io.IOException, java.util.Scanner,
// org.apache.commons.math.linear.RealMatrix,
// org.apache.commons.math.stat.regression.OLSMultipleLinearRegression
private static void regression1() {
    double[][] X = new double[6][2];
    double[] Y = new double[6];
    try {
        File file = new File("C:\\students.csv");
        FileReader reader = new FileReader(file);
        BufferedReader in = new BufferedReader(reader);
        String line;
        int count = 0;
        while ((line = in.readLine()) != null) {
            Scanner scanner = new Scanner(line);
            scanner.useDelimiter(" ");
            String[] cols = new String[4];
            int col = 0;
            while (scanner.hasNext()) {
                cols[col++] = scanner.next();
            }
            // cols: student id, memory span (Y), age (X1), speech rate (X2)
            Y[count] = Double.valueOf(cols[1]);
            X[count][0] = Double.valueOf(cols[2]);
            X[count][1] = Double.valueOf(cols[3]);
            count++;
        }
        in.close();
        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
    regression.newSampleData(Y, X);
    RealMatrix matrix = regression.calculateHat();
    System.out.println("matrix:" + matrix.getColumnDimension());
}


On Fri, Feb 12, 2010 at 12:08 PM, Ted Dunning <[hidden email]> wrote:

> It is not a precision issue.  R and commons-math use different algorithms
> with the same underlying numerical implementation.
>
> It is even an open question which result is better.  R has lots of
> credibility, but I have found cases where it lacked precision (and I coded
> up a patch that was accepted).
>
> Unbounded-precision integers and rationals are very useful, but not usually
> for large-scale numerical programming.  Except in a very few cases, if you
> need more than 17 digits of precision, you have other very serious problems
> that extra precision won't fix.
>
> On Fri, Feb 12, 2010 at 1:40 AM, Andy Turner <[hidden email]> wrote:
>
> > Interesting that this is a precision issue. I'm not surprised: depending
> > on what you are doing, double precision may not be enough. It depends a
> > lot on how the calculations are broken into smaller parts. BigDecimal is
> > fantastically useful...
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Why not BigDecimal?

Phil Steitz
Something Something wrote:

> Okay... Let's not worry about R, BigDecimal, and precision for the time
> being.  I might have been looking at the wrong values.  So let's hold that
> thought.
>
> Let's take a simple example for getting Y-Hat values using Multiple
> Regression given in this PDF:
> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
>
> I created a small CSV called students.csv that contains the following data:
>
> s1 14 4 1
> s2 23 4 2
> s3 30 7 2
> s4 50 7 4
> s5 39 10 3
> s6 67 10 6
>
> Col headers:  Student id, Memory span (Y), Age (X1), Speech rate (X2)
>
> Now the expected results are:
>
> yHat[0]:15.166666666666668
> yHat[1]:24.666666666666668
> yHat[2]:27.666666666666664
> yHat[3]:46.666666666666664
> yHat[4]:40.166666666666664
> yHat[5]:68.66666666666667
>
> This is based on the following equation (given in the PDF):  Y = 1.67 + X1 +
> 9.50 X2
>
> I wrote the following small, quick-and-dirty code to
> use OLSMultipleLinearRegression.  The 'calculateHat()' method returns a
> RealMatrix, but I can't see the above results in there.  Am I using this
> class correctly?  Please let me know.  Thanks.

The "hat matrix," as defined in the javadoc for calculateHat, is not
the same as the vector of yHat values.  See the javadoc and the
references that it contains for the definition of the hat matrix.

To compute predicted values, you need to post-multiply the design
matrix, X, by the estimated coefficients. Using the variable
definitions below, this is, in outline,

double[] b = regression.estimateRegressionParameters();
double[] yHat = new Array2DRowRealMatrix(X).operate(b);

(estimateRegressionParameters is the public accessor for the
coefficients; calculateBeta is protected. If the fitted model includes
an intercept term, the design matrix here must also carry the
corresponding leading column of ones so that the dimensions match.)
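
For instance, a minimal end-to-end sketch, reusing the X, Y, and
regression variables from the quoted code below, and assuming the fitted
model has an intercept so that estimateRegressionParameters() returns
{b0, b1, b2}:

double[] b = regression.estimateRegressionParameters();
for (int i = 0; i < Y.length; i++) {
    // predicted value = intercept + weighted regressors
    double yHat = b[0] + b[1] * X[i][0] + b[2] * X[i][1];
    System.out.println("yHat[" + i + "]: " + yHat);
}

As a cross-check, the hat matrix you already computed maps the observed
values onto the predicted ones (yHat = H * Y), so matrix.operate(Y)
should reproduce the same numbers.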

Side note: the residuals, Y - yHat, are available directly via
estimateResiduals; but to get predicted values directly you need to
compute them from the coefficients and design matrix as above.  A
computePredictedValues method added to
AbstractMultipleLinearRegression might be a good enhancement, as would
a predict(RealVector) method similar to what SimpleRegression
has. Patches welcome!


Phil

>
>
>
> // Imports needed: java.io.BufferedReader, java.io.File, java.io.FileReader,
> // java.io.IOException, java.util.Scanner,
> // org.apache.commons.math.linear.RealMatrix,
> // org.apache.commons.math.stat.regression.OLSMultipleLinearRegression
> private static void regression1() {
>     double[][] X = new double[6][2];
>     double[] Y = new double[6];
>     try {
>         File file = new File("C:\\students.csv");
>         FileReader reader = new FileReader(file);
>         BufferedReader in = new BufferedReader(reader);
>         String line;
>         int count = 0;
>         while ((line = in.readLine()) != null) {
>             Scanner scanner = new Scanner(line);
>             scanner.useDelimiter(" ");
>             String[] cols = new String[4];
>             int col = 0;
>             while (scanner.hasNext()) {
>                 cols[col++] = scanner.next();
>             }
>             // cols: student id, memory span (Y), age (X1), speech rate (X2)
>             Y[count] = Double.valueOf(cols[1]);
>             X[count][0] = Double.valueOf(cols[2]);
>             X[count][1] = Double.valueOf(cols[3]);
>             count++;
>         }
>         in.close();
>         reader.close();
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
>     OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
>     regression.newSampleData(Y, X);
>     RealMatrix matrix = regression.calculateHat();
>     System.out.println("matrix:" + matrix.getColumnDimension());
> }
>
>
> On Fri, Feb 12, 2010 at 12:08 PM, Ted Dunning <[hidden email]> wrote:
>
>> It is not a precision issue.  R and commons-math use different algorithms
>> with the same underlying numerical implementation.
>>
>> It is even an open question which result is better.  R has lots of
>> credibility, but I have found cases where it lacked precision (and I coded
>> up a patch that was accepted).
>>
>> Unbounded-precision integers and rationals are very useful, but not usually
>> for large-scale numerical programming.  Except in a very few cases, if you
>> need more than 17 digits of precision, you have other very serious problems
>> that extra precision won't fix.
>>
>> On Fri, Feb 12, 2010 at 1:40 AM, Andy Turner <[hidden email]> wrote:
>>
>>> Interesting that this is a precision issue. I'm not surprised: depending
>>> on what you are doing, double precision may not be enough. It depends a
>>> lot on how the calculations are broken into smaller parts. BigDecimal is
>>> fantastically useful...
>>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>

