[Math] MATH-894

19 messages

[Math] MATH-894

Gilles Sadowski
Hi.

What do you think about deprecating "getInternalValues"?
Its only use is in "DescriptiveStatistics". I guess that the current usage
could save some time (by avoiding the creation of an array and copying the
elements), so that access to the internal representation should be retained.
If so, I think that the name "getInternalValues" should be changed in order
to make it explicit that we are exposing an instance's field (whose
modification will be reflected in the object's state). The suffix
"...Values" is misleading; I suggest "getInternalArray" since the current API
(and current usage) forbids using another data structure anyway.
[The alternative would be to enhance encapsulation by hiding the internal
representation altogether, thus removing the methods "getInternalValues()"
and "start()".]
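To make the encapsulation point concrete, here is a minimal sketch using a hypothetical stand-in class, `SimpleResizableArray` (not the real `ResizeableDoubleArray` source). Its `getInternalArray()` returns the backing field itself, as the proposed name suggests, while `getElements()` returns a copy of the live elements.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for ResizeableDoubleArray (not the Commons Math source).
 * It keeps a backing array plus a start offset and an element count.
 */
class SimpleResizableArray {
    private double[] internalArray = new double[16];
    private int startIndex = 0;  // offset of the first live element
    private int numElements = 0; // number of live elements

    void addElement(double value) {
        if (startIndex + numElements == internalArray.length) {
            // Grow by doubling; existing elements keep their positions.
            internalArray = Arrays.copyOf(internalArray, internalArray.length * 2);
        }
        internalArray[startIndex + numElements] = value;
        numElements++;
    }

    /** Returns the backing field itself: writes through it change this object's state. */
    double[] getInternalArray() {
        return internalArray;
    }

    int start() {
        return startIndex;
    }

    int getNumElements() {
        return numElements;
    }

    /** Returns a copy of the live elements only; writes to it do not affect this object. */
    double[] getElements() {
        return Arrays.copyOfRange(internalArray, startIndex, startIndex + numElements);
    }
}
```

With this sketch, `a.getInternalArray()[a.start()] = 42.0` changes what `a.getElements()[0]` returns afterwards, whereas writing into an array obtained from `getElements()` leaves `a` unchanged.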

I also notice that the "clear()" method reallocates the internal array.
IMO, it is unnecessarily inefficient. If one wanted to get this behaviour,
one could just create a new object. However, when reusing the same object,
users could legitimately expect that no allocation occurs and that only the
_contents_ are discarded.
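The difference can be sketched with a hypothetical buffer class (`ClearableDoubleBuffer`, invented for illustration): `clearByReallocating()` mimics the behaviour described above, while `clear()` only resets the element count and keeps the already-grown storage.

```java
import java.util.Arrays;

/** Hypothetical growable buffer contrasting two ways to implement clear(). */
class ClearableDoubleBuffer {
    private final int initialCapacity;
    private double[] internalArray;
    private int numElements = 0;

    ClearableDoubleBuffer(int initialCapacity) {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        this.initialCapacity = initialCapacity;
        this.internalArray = new double[initialCapacity];
    }

    void addElement(double value) {
        if (numElements == internalArray.length) {
            internalArray = Arrays.copyOf(internalArray, internalArray.length * 2);
        }
        internalArray[numElements++] = value;
    }

    /** Behaviour described in the message: discards the contents and allocates a new array. */
    void clearByReallocating() {
        internalArray = new double[initialCapacity];
        numElements = 0;
    }

    /** Alternative: discards only the contents; the grown storage is kept for reuse. */
    void clear() {
        numElements = 0; // old values past numElements are never read again
    }

    int getNumElements() {
        return numElements;
    }

    int capacity() {
        return internalArray.length;
    }
}
```

One trade-off of the second variant: a buffer that once grew large keeps that memory until the object itself is discarded.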


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [Math] MATH-894

Phil Steitz
On 11/12/12 6:05 AM, Gilles Sadowski wrote:
> Hi.
>
> What do you think about deprecating "getInternalValues"?
> Its only use is in "DescriptiveStatistics". I guess that the current usage
> could save some time (by avoiding the creation of an array and copying the
> elements), so that access to the internal representation should be retained.

That is why it is there.  ResizeableDoubleArray was created to be
the backing store for DescriptiveStatistics.  We don't actually use
it anywhere else, so I am wondering if it might in fact be better to
deprecate the entire class and aim to make it a private inner class
of DescriptiveStatistics.  That way, the broken encapsulation that
you point out will be less of an issue.  The idea behind the class
is to support a "rolling window" of data from a stream that
statistics can be applied to.  If we want to keep it separate (or
even not, actually), it might be better for it to expose an apply
method that takes a UnivariateStatistic as an argument and applies
the statistic to the currently defined window.  So the following
method from DescriptiveStatistics
    public double apply(UnivariateStatistic stat) {
        return stat.evaluate(eDA.getInternalValues(), eDA.start(),
                eDA.getNumElements());
    }
would be implemented in ResizeableDoubleArray (just removing the
eDA. everywhere).  Then in DescriptiveStatistics you would have

    public double apply(UnivariateStatistic stat) {
        return eDA.apply(stat);
    }
possibly renaming the version in RDA.  But given all of the fuss
over this class that really is just there to serve
DescriptiveStatistics, I think it may be best to just make it a
private inner class of DescriptiveStatistics.
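The proposal can be sketched as follows. `UnivariateStatistic` here is a minimal local stand-in reduced to the `evaluate(double[], int, int)` method used in the quoted code, and `WindowedDoubleArray`, `SimpleDescriptiveStatistics` and `Mean` are hypothetical names; the point is that the internal array and start offset never leave the array class.

```java
import java.util.Arrays;

/** Minimal stand-in for UnivariateStatistic, reduced to the method used in the thread. */
interface UnivariateStatistic {
    double evaluate(double[] values, int begin, int length);
}

/** Example statistic: arithmetic mean of values[begin], ..., values[begin + length - 1]. */
class Mean implements UnivariateStatistic {
    @Override
    public double evaluate(double[] values, int begin, int length) {
        double sum = 0.0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum / length;
    }
}

/** Hypothetical RDA-like store whose window is only reachable through apply(). */
class WindowedDoubleArray {
    private double[] internalArray = new double[8];
    private int startIndex = 0;
    private int numElements = 0;

    void addElement(double value) {
        if (startIndex + numElements == internalArray.length) {
            // Simplification: a real implementation would also compact discarded slots.
            internalArray = Arrays.copyOf(internalArray, internalArray.length * 2);
        }
        internalArray[startIndex + numElements] = value;
        numElements++;
    }

    /** Drops the oldest element, as a rolling window does. */
    void discardFrontElement() {
        if (numElements == 0) {
            throw new IllegalStateException("window is empty");
        }
        startIndex++;
        numElements--;
    }

    /** The proposed method: apply the statistic to the current window without copying. */
    double apply(UnivariateStatistic stat) {
        return stat.evaluate(internalArray, startIndex, numElements);
    }
}

/** DescriptiveStatistics-style wrapper that simply delegates. */
class SimpleDescriptiveStatistics {
    private final WindowedDoubleArray eDA = new WindowedDoubleArray();

    void addValue(double value) {
        eDA.addElement(value);
    }

    double apply(UnivariateStatistic stat) {
        return eDA.apply(stat);
    }
}
```

Passing the statistic in, rather than handing the array out, is what would let `getInternalValues()` and `start()` stop being public.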

Phil


RE: [Math] MATH-894

Patrick Meyer
Please keep ResizeableDoubleArray as its own class. I find it very useful
for more than descriptive statistics. I do like the idea of adding an apply
method to it.

Patrick

-----Original Message-----
From: Phil Steitz [mailto:[hidden email]]
Sent: Monday, November 12, 2012 1:23 PM
To: Commons Developers List
Subject: Re: [Math] MATH-894

[...]

Re: [Math] MATH-894

Phil Steitz
On 11/12/12 11:38 AM, Patrick Meyer wrote:
> Please keep ResizeableDoubleArray as its own class. I find it very useful
> for more than descriptive statistics. I do like the idea of adding an apply
> method to it.

Thanks for speaking up :)

I think Gilles is right about the broken encapsulation, though, so maybe
it's best to add the apply or "applyStatistic" method as described in my
previous message.

Phil


Re: [Math] MATH-894

Gilles Sadowski
Hi Patrick.

On Mon, Nov 12, 2012 at 02:38:07PM -0500, Patrick Meyer wrote:
> Please keep ResizeableDoubleArray as its own class. I find it very useful
> for more than descriptive statistics. I do like the idea of adding an apply
> method to it.

Could you please have a look at the JIRA page:
  https://issues.apache.org/jira/browse/MATH-894
and tell us whether the proposed changes would affect your usage of the
class?


Thanks for the feedback,
Gilles

> [...]

RE: [Math] MATH-894

Patrick Meyer
Hi Gilles,

These changes look fine to me and the addition of the compute method is
really nice. Looking more closely at my code, I am using the getElements
method. As long as that remains available, it makes sense to deprecate the
getInternalValues method.

My use of ResizeableDoubleArray is related to an earlier discussion about
missing values. My data are stored in a database that may contain missing
values. I know how many cases are in the database, but I don't know the
amount of missing data. I read the non-missing database values into a
ResizeableDoubleArray, call getElements() and use the non-missing data array
in my calculations. It may be a bit clunky, but it's one of the ways I
handle missing data without looping over the database twice. I don't have a
solution for comprehensive treatment of missing data yet, but I appreciate
the conversation we are having.
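A minimal sketch of the single-pass pattern described above, under two assumptions: `GrowableDoubleArray` is a hypothetical stand-in for `ResizeableDoubleArray` exposing only `addElement` and `getElements`, and the database rows are modelled as a `Double[]` in which `null` marks a missing value.

```java
import java.util.Arrays;

/** Hypothetical minimal stand-in for ResizeableDoubleArray. */
class GrowableDoubleArray {
    private double[] data = new double[4];
    private int size = 0;

    void addElement(double value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow as values arrive
        }
        data[size++] = value;
    }

    /** Copy holding exactly the values added so far, like getElements(). */
    double[] getElements() {
        return Arrays.copyOf(data, size);
    }
}

class MissingDataExample {
    /**
     * One pass over the rows; null marks a missing value (an assumption:
     * the real code would read from a database cursor instead of an array).
     */
    static double[] nonMissing(Double[] rows) {
        GrowableDoubleArray values = new GrowableDoubleArray();
        for (Double row : rows) {
            if (row != null) {
                values.addElement(row); // auto-unboxed to double
            }
        }
        return values.getElements();
    }
}
```

The number of non-missing values need not be known before the loop: the array grows as values arrive, and `getElements()` trims the result to the right length.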

Patrick


-----Original Message-----
From: Gilles Sadowski [mailto:[hidden email]]
Sent: Thursday, November 15, 2012 6:32 AM
To: [hidden email]
Subject: Re: [Math] MATH-894

[...]

Re: [Math] MATH-894

Gilles Sadowski
Hi.

>
> These changes look fine to me and the addition of the compute method is
> really nice.

Oh, I should have asked whether you can also look at the source code in the
trunk: Currently the "compute(UnivariateStatistic)" method is only
implemented internally (in a private inner class of "DescriptiveStatistics").
If you want this feature to be in the public API, could you create a new issue
on JIRA? [Then we can see in which package this class should go.]

> Looking more closely at my code, I am using the getElements
> method. As long as that remains available, it makes sense to deprecate the
> getInternalValues method.
>
> My use of ResizeableDoubleArray is related to an earlier discussion about
> missing values. My data are stored in a database that may contain missing
> values. I know how many cases are in the database, but I don't know the
> amount of missing data. I read the non-missing database values into a
> ResizeableDoubleArray, call getElements() and use the non-missing data array
> in my calculations. It may be a bit clunky, but it's one of the ways I
> handle missing data without looping over the database twice. I don't have a
> solution for comprehensive treatment of missing data yet, but I appreciate
> the conversation we are having.

I'm afraid I don't follow you; I don't see the connection between missing
values and the resizeable array. Maybe a small code example would help me.


Regards,
Gilles

Re: [Math] MATH-894

Phil Steitz
On 11/15/12 5:40 AM, Gilles Sadowski wrote:
> Hi.
>
>> These changes look fine to me and the addition of the compute method is
>> really nice.
> Oh, I should have asked whether you can also look at the source code in the
> trunk: Currently the "compute(UnivariateStatistic)" method is only
> implemented internally (in a private inner class of "DescriptiveStatistics").
> If you want this feature to be in the public API, could you create a new issue
> on JIRA? [Then we can see in which package this class should go.]

How to do this with good separation of concerns is an interesting
problem.  I agree with Gilles that making compute public in RDA is
not nice.  The basic problem is that we can't just pass around array
pointers as we would in C and we need a way to support applying a
method that can take an array and offset as argument without
exposing either the internal array or the offset.  A possibly better
approach might be to just create an interface - possibly housed in
.util - for methods that take double arrays and offsets and compute
something from them.  For example

public interface ArrayFunction {
    double evaluate(double[] values, int begin, int length);
}

Then in RDA, we add compute(ArrayFunction) as a public method.  Then
if we make UnivariateStatistic extend this new interface,
DescriptiveStatistics can get what it needs from this.

Just a thought.  Would love to get better ideas on this.  What is
in trunk now works; but having to subclass for internal use makes me
wonder if we have solved the problem.
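The idea can be sketched as below. `ArrayFunction` is taken from the message; `UnivariateStatistic` is a local stand-in showing the proposed `extends ArrayFunction`, and `Max` and `RDASketch` are hypothetical names used only for illustration.

```java
/** The interface proposed in the message (possibly housed in .util). */
interface ArrayFunction {
    double evaluate(double[] values, int begin, int length);
}

/** Stand-in showing the proposal: every statistic would also be an ArrayFunction. */
interface UnivariateStatistic extends ArrayFunction {
}

/** Example statistic: maximum of values[begin], ..., values[begin + length - 1]. */
class Max implements UnivariateStatistic {
    @Override
    public double evaluate(double[] values, int begin, int length) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = begin; i < begin + length; i++) {
            max = Math.max(max, values[i]);
        }
        return max;
    }
}

/** Hypothetical RDA with a public compute(ArrayFunction) and no internal-array getter. */
class RDASketch {
    private final double[] internalArray;
    private final int startIndex;
    private final int numElements;

    /** Copies the values; the first 'discarded' slots are treated as outside the window. */
    RDASketch(double[] values, int discarded) {
        if (discarded < 0 || discarded > values.length) {
            throw new IllegalArgumentException("invalid number of discarded elements");
        }
        this.internalArray = values.clone();
        this.startIndex = discarded;
        this.numElements = values.length - discarded;
    }

    public double compute(ArrayFunction function) {
        return function.evaluate(internalArray, startIndex, numElements);
    }
}
```

Since the function still receives the raw array, a careless `ArrayFunction` could write into it; the interface narrows the exposure to code the caller passes in rather than removing it.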

Phil


Re: [Math] MATH-894

Ted Dunning


The typical answer to this when adding a functional method like compute is to also add a view object. The rationale is that a small number of view methods can be composed with a small number of compute/aggregate methods to get the expressive power of what would otherwise require a vast array of methods.  

On Nov 15, 2012, at 7:03 AM, Phil Steitz <[hidden email]> wrote:

>
> Then in RDA, we add compute(ArrayFunction) as a public method.  Then
> if we make UnivariateStatistic extend this new interface,
> DescriptiveStatistics can get what it needs from this.
>
> Just a thought.  Would love to get better ideas on this.  What is
> in trunk now works; but having to subclass for internal use makes me
> wonder if we have solved the problem.

Re: [Math] MATH-894

Phil Steitz
On 11/15/12 8:01 AM, Ted Dunning wrote:
>
> The typical answer to this when adding a functional method like compute is to also add a view object. The rationale is that a small number of view methods can be composed with a small number of compute/aggregate methods to get the expressive power of what would otherwise require a vast array of methods.  

If I understand correctly, we already have a view object exposed -
getElements.  The challenge is that this method returns a copy and
what we would like is a way to get a function computed directly on
the data encapsulated in the RDA.  Without function pointers or real
array references, I don't see a straightforward way to do this.

Phil

Re: [Math] MATH-894

Ted Dunning
On Thu, Nov 15, 2012 at 8:42 AM, Phil Steitz <[hidden email]> wrote:

> On 11/15/12 8:01 AM, Ted Dunning wrote:
> >
> > The typical answer to this when adding a functional method like compute
> > is to also add a view object. The rationale is that a small number of view
> > methods can be composed with a small number of compute/aggregate methods to
> > get the expressive power of what would otherwise require a vast array of
> > methods.
>
> If I understand correctly, we already have a view object exposed -
> getElements.  The challenge is that this method returns a copy and
> what we would like is a way to get a function computed directly on
> the data encapsulated in the RDA.  Without function pointers or real
> array references, I don't see a straightforward way to do this.
>
>
When I say view, I mean something that is a reference and is not a copy.
The getElements method is a copy, not a view under this terminology.

The Colt/Mahout approach is to define a view object which opaquely
remembers a reference to the original, an offset and a length.  Functions
and other arguments can be passed to this view object which operates on a
subset of the original contents by calling the function.  Performance is
actually quite good.  The JIT seems to in-line the view object access to
the underlying object and also in-lines evaluation of the function so that
the actual code that is executed is pretty much what you would write in C,
but you don't have to worry as much since the pattern of access is more
controlled.
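A minimal sketch of the view idea described above: `DoubleArrayView` is a hypothetical class (not Colt or Mahout code) that remembers a reference to the original array, an offset and a length, and it uses the JDK's `java.util.function.DoubleBinaryOperator` in place of those libraries' own function interfaces.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical view: a reference to the original array plus an offset and a length. */
final class DoubleArrayView {
    private final double[] backing; // the original storage, not a copy
    private final int offset;
    private final int length;

    DoubleArrayView(double[] backing, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > backing.length) {
            throw new IndexOutOfBoundsException("view does not fit in the backing array");
        }
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    int size() {
        return length;
    }

    double get(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return backing[offset + index];
    }

    /** A narrower view over the same backing array; nothing is copied. */
    DoubleArrayView subView(int from, int len) {
        if (from < 0 || len < 0 || from + len > length) {
            throw new IndexOutOfBoundsException("sub-view does not fit in this view");
        }
        return new DoubleArrayView(backing, offset + from, len);
    }

    /** Folds the function over the viewed elements, starting from 'initial'. */
    double aggregate(double initial, DoubleBinaryOperator combiner) {
        double result = initial;
        for (int i = offset; i < offset + length; i++) {
            result = combiner.applyAsDouble(result, backing[i]);
        }
        return result;
    }
}
```

For example, `new DoubleArrayView(data, 1, 3).aggregate(0.0, Double::sum)` sums three elements of `data` without copying them, and later writes to `data` are visible through the view.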

For completeness, this is essentially what java.nio does with the *Buffer
classes as well.  You can wrap an array and then you can ask for slices out
of that array while retaining the reference semantics.
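The `java.nio` comparison can be checked directly against the JDK: `DoubleBuffer.wrap(array, offset, length).slice()` yields a buffer that shares storage with the array, whereas `Arrays.copyOfRange` has copy semantics, like `getElements()`. `CopyVersusSlice` is a hypothetical helper class for the illustration.

```java
import java.nio.DoubleBuffer;
import java.util.Arrays;

/** Hypothetical helper contrasting copy semantics with reference (view) semantics. */
class CopyVersusSlice {
    /** Copy semantics, like getElements(): later changes to data are not seen. */
    static double[] copyOfWindow(double[] data, int offset, int length) {
        return Arrays.copyOfRange(data, offset, offset + length);
    }

    /** Reference semantics: the returned buffer shares storage with data. */
    static DoubleBuffer sliceOfWindow(double[] data, int offset, int length) {
        // wrap() sets position = offset and limit = offset + length;
        // slice() then makes index 0 of the new buffer correspond to data[offset].
        return DoubleBuffer.wrap(data, offset, length).slice();
    }

    /** Sums indices 0 .. limit() - 1 using absolute gets, leaving the position unchanged. */
    static double sum(DoubleBuffer buffer) {
        double total = 0.0;
        for (int i = 0; i < buffer.limit(); i++) {
            total += buffer.get(i);
        }
        return total;
    }
}
```

After `data[2] = 10.0`, the copy still holds the old value while the slice sees the new one, and an absolute `put` on the slice writes straight through to `data`.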

Re: [Math] MATH-894

Phil Steitz
On 11/15/12 9:22 AM, Ted Dunning wrote:

> On Thu, Nov 15, 2012 at 8:42 AM, Phil Steitz <[hidden email]> wrote:
>
>> On 11/15/12 8:01 AM, Ted Dunning wrote:
>>> The typical answer to this when adding a functional method like compute
>>> is to also add a view object. The rationale is that a small number of view
>>> methods can be composed with a small number of compute/aggregate methods to
>>> get the expressive power of what would otherwise require a vast array of
>>> methods.
>>
>> If I understand correctly, we already have a view object exposed -
>> getElements.  The challenge is that this method returns a copy and
>> what we would like is a way to get a function computed directly on
>> the data encapsulated in the RDA.  Without function pointers or real
>> array references, I don't see a straightforward way to do this.
>>
>>
> When I say view, I mean something that is a reference and is not a copy.
> The getElements method is a copy, not a view under this terminology.

Do you know how to do that with a primitive array?  Can you provide
some sample code?

Thanks for your help on this.

Phil


Re: [Math] MATH-894

Ted Dunning
On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz <[hidden email]> wrote:

> Do you know how to do that with a primitive array?  Can you provide
> some sample code?
>

You don't.  See my next paragraph.

See the assign method in this class:

https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java

>
> Thanks for your help on this.
>
> Phil
> >
> > The Colt/Mahout approach is to define a view object which opaquely
> > remembers a reference to the original, an offset and a length.  Functions
> > and other arguments can be passed to this view object which operates on a
> > subset of the original contents by calling the function.  Performance is
> > actually quite good.  The JIT seems to in-line the view object access to
> > the underlying object and also in-lines evaluation of the function so that
> > the actual code that is executed is pretty much what you would write in C,
> > but you don't have to worry as much since the pattern of access is more
> > controlled.
> >
> > For completeness, this is essentially what java.nio does with the *Buffer
> > classes as well.  You can wrap an array and then you can ask for slices out
> > of that array while retaining the reference semantics.
> >
>

Re: [Math] MATH-894

Phil Steitz
On 11/15/12 10:29 AM, Ted Dunning wrote:

> On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz <[hidden email]> wrote:
>
>> Do you know how to do that with a primitive array?  Can you provide
>> some sample code?
>>
> You don't.  See my next paragraph.
>
> See the assign method in this class:
>
> https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java

Interesting.  I see no assign method, but I can see what this thing
does.  It is not clear to me though how this idea could be
meaningfully applied to solve the problem we have with applying
statistics to an RDA without doing any array copying.   Most likely
I am missing the point.

Phil


RE: [Math] MATH-894

Patrick Meyer
I misunderstood an earlier email and I thought the compute method was public. It seems that adding a public compute method may involve more complication and effort than it is worth. My apologies for the misunderstanding. The compute method would only be of value if it could be added easily.

-----Original Message-----
From: Phil Steitz [mailto:[hidden email]]
Sent: Thursday, November 15, 2012 1:42 PM
To: Commons Developers List
Subject: Re: [Math] MATH-894

[...]

Re: [Math] MATH-894

Ted Dunning
On Thu, Nov 15, 2012 at 10:42 AM, Phil Steitz <[hidden email]> wrote:

> On 11/15/12 10:29 AM, Ted Dunning wrote:
> > On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz <[hidden email]> wrote:
> >
> >> Do you know how to do that with a primitive array?  Can you provide
> >> some sample code?
> >>
> > You don't.  See my next paragraph.
> >
> > See the assign method in this class:
> >
> > https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java
>
> Interesting.  I see no assign method, but I can see what this thing
> does.  It is not clear to me though how this idea could be
> meaningfully applied to solve the problem we have with applying
> statistics to an RDA without doing any array copying.   Most likely
> I am missing the point.


The assign methods are inherited.  The signatures are like
assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so
on.

My thought was that if you need to operate on part of an RDA, then an
RDA_View class might do the job.
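A minimal sketch of what such an RDA view might look like, assuming hypothetical names throughout (`RdaView`, `UnaryFn`, `BinaryFn` are not Commons Math or Mahout types). It mirrors the two `assign` shapes mentioned above: one applies a function to each element of the window, the other combines the window element-wise with a second view of the same size. Both write into the shared internal array without copying it.

```java
import java.util.Arrays;

/** Hypothetical unary function, playing the role of DoubleFunction. */
interface UnaryFn {
    double apply(double x);
}

/** Hypothetical binary function, playing the role of DoubleDoubleFunction. */
interface BinaryFn {
    double apply(double x, double y);
}

/** Hypothetical window over an RDA's internal array; no data is copied. */
final class RdaView {
    private final double[] data;
    private final int start;
    private final int length;

    RdaView(double[] data, int start, int length) {
        if (start < 0 || length < 0 || start + length > data.length) {
            throw new IndexOutOfBoundsException("view exceeds backing array");
        }
        this.data = data;
        this.start = start;
        this.length = length;
    }

    int size() {
        return length;
    }

    double get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return data[start + i];
    }

    /** this[i] = f(this[i]) for every element of the view. */
    RdaView assign(UnaryFn f) {
        for (int i = 0; i < length; i++) {
            data[start + i] = f.apply(data[start + i]);
        }
        return this;
    }

    /**
     * this[i] = f(this[i], other[i]). If the two views overlap, later
     * elements may see values already written by earlier iterations.
     */
    RdaView assign(BinaryFn f, RdaView other) {
        if (other.size() != length) {
            throw new IllegalArgumentException("size mismatch");
        }
        for (int i = 0; i < length; i++) {
            data[start + i] = f.apply(data[start + i], other.get(i));
        }
        return this;
    }
}

class RdaViewDemo {
    public static void main(String[] args) {
        double[] internal = {1, 4, 9, 16, 25};
        RdaView head = new RdaView(internal, 0, 2);  // 1, 4
        RdaView tail = new RdaView(internal, 3, 2);  // 16, 25

        head.assign((x, y) -> y - x, tail);          // head becomes 15, 21
        tail.assign(x -> x / 2);                     // tail becomes 8, 12.5
        System.out.println(Arrays.toString(internal)); // [15.0, 21.0, 9.0, 8.0, 12.5]
    }
}
```

The design choice here is that the view owns no storage at all, so creating one is cheap and several views can address disjoint parts of the same RDA.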

Re: [Math] MATH-894

Phil Steitz
On 11/15/12 10:56 AM, Ted Dunning wrote:

> On Thu, Nov 15, 2012 at 10:42 AM, Phil Steitz <[hidden email]> wrote:
>
>> On 11/15/12 10:29 AM, Ted Dunning wrote:
>>> On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz <[hidden email]>
>> wrote:
>>>> Do you know how to do that with a primitive array?  Can you provide
>>>> some sample code?
>>>>
>>> You don't.  See my next paragraph.
>>>
>>> See the assign method in this class:
>>>
>>>
>> https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java
>>
>> Interesting.  I see no assign method, but I can see what this thing
>> does.  It is not clear to me though how this idea could be
>> meaningfully applied to solve the problem we have with applying
>> statistics to an RDA without doing any array copying.   Most likely
>> I am missing the point.
>
> The assign methods are inherited.  The signatures are like
> assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so
> on.

OK, assign looks like what I was calling "evaluate" and
DoubleFunction looks like what I was calling "ArrayFunction".
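To illustrate the "evaluate" side of that correspondence, here is a sketch assuming a hypothetical `ArrayFunction` whose `evaluate(double[] values, int begin, int length)` shape follows the overload that Commons Math statistics already expose. Where a Mahout-style `assign` hands the function one element at a time, this `evaluate` hands it the backing array plus bounds, so a statistic can reduce the window to a single value without any copy. `SliceView`, `ArrayFunction` and `SliceDemo.MEAN` are illustrative names, not existing API.

```java
/** Hypothetical reduction over part of an array, shaped like evaluate(values, begin, length). */
interface ArrayFunction {
    double evaluate(double[] values, int begin, int length);
}

/** Hypothetical read-only window that forwards its bounds instead of copying. */
final class SliceView {
    private final double[] data;
    private final int begin;
    private final int length;

    SliceView(double[] data, int begin, int length) {
        if (begin < 0 || length < 0 || begin + length > data.length) {
            throw new IndexOutOfBoundsException("view exceeds backing array");
        }
        this.data = data;
        this.begin = begin;
        this.length = length;
    }

    /** Passes the shared array and this window's bounds straight to f. */
    double evaluate(ArrayFunction f) {
        return f.evaluate(data, begin, length);
    }
}

class SliceDemo {
    /** Example reduction: arithmetic mean of values[begin .. begin + length - 1]. */
    static final ArrayFunction MEAN = (values, begin, length) -> {
        double sum = 0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum / length;  // NaN when length == 0
    };

    public static void main(String[] args) {
        double[] data = {100, 2, 4, 6, 100};
        System.out.println(new SliceView(data, 1, 3).evaluate(MEAN)); // 4.0
    }
}
```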

Phil
>
> My thought was that if you need to operate on part of an RDA, then an
> RDA_View class might do the job.
>




Re: [Math] MATH-894

Ted Dunning
Yes.  Sounds similar.

On Thu, Nov 15, 2012 at 11:02 AM, Phil Steitz <[hidden email]> wrote:

> > The assign methods are inherited.  The signatures are like
> > assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so
> > on.
>
> OK, assign looks like what I was calling "evaluate" and
> DoubleFunction looks like what I was calling "ArrayFunction"
>

Re: [Math] MATH-894

Phil Steitz
In reply to this post by Patrick Meyer
On 11/15/12 10:52 AM, Patrick Meyer wrote:
> I misunderstood an earlier email and I thought the compute method was public. It seems that adding a public compute method may be more complication and effort than it is worth. My apologies for the misunderstanding. The compute method would only be of value if it can be an easy addition.

It is easy to add and that was the original proposal.  The problem
was the type of the actual parameter - UnivariateStatistic - seeming
out of place.  That is what led me to suggest a more general type -
ArrayFunction.  Unless there are better ideas, I think that is the
best way to handle this.
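A rough sketch of the `compute` idea, assuming a hypothetical and heavily simplified stand-in for ResizableDoubleArray (`RollingArray`) and the same kind of hypothetical `ArrayFunction` shape (array, start index, number of elements). Once old values are discarded, the live elements begin at an internal start index, so `compute` forwards that index instead of copying. Any object with a matching `evaluate` method can then be passed as a method reference, which is one way the parameter type could avoid being `UnivariateStatistic` itself. `SumStat` is likewise hypothetical.

```java
import java.util.Arrays;

/** Hypothetical general function type for compute(); not a Commons Math interface. */
interface ArrayFunction {
    double evaluate(double[] array, int startIndex, int numElements);
}

/** Hypothetical, much simplified stand-in for a rolling resizable array. */
final class RollingArray {
    private double[] internal = new double[4];
    private int startIndex = 0;   // index of the first live element
    private int numElements = 0;  // number of live elements

    /** Appends a value, growing and compacting the internal array when full. */
    void addElement(double value) {
        if (startIndex + numElements == internal.length) {
            double[] bigger = new double[Math.max(4, 2 * numElements + 1)];
            System.arraycopy(internal, startIndex, bigger, 0, numElements);
            internal = bigger;
            startIndex = 0;
        }
        internal[startIndex + numElements] = value;
        numElements++;
    }

    /** Drops the oldest element by advancing the start index, then appends value. */
    void addElementRolling(double value) {
        if (numElements == 0) {
            throw new IllegalStateException("nothing to roll");
        }
        startIndex++;
        numElements--;
        addElement(value);
    }

    /** Runs f over the live elements only, passing the internal array without copying. */
    double compute(ArrayFunction f) {
        return f.evaluate(internal, startIndex, numElements);
    }

    /** Returns a copy of the live elements (for inspection only). */
    double[] getElements() {
        return Arrays.copyOfRange(internal, startIndex, startIndex + numElements);
    }
}

/** Hypothetical statistic exposing an evaluate(values, begin, length) method. */
final class SumStat {
    double evaluate(double[] values, int begin, int length) {
        double sum = 0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum;
    }
}

class RollingDemo {
    public static void main(String[] args) {
        RollingArray window = new RollingArray();
        for (double v : new double[] {1, 2, 3}) {
            window.addElement(v);
        }
        window.addElementRolling(10);  // live elements are now 2, 3, 10

        SumStat sum = new SumStat();
        System.out.println(window.compute(sum::evaluate));  // 15.0
        System.out.println(window.compute((a, s, n) -> n)); // 3.0
    }
}
```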

Phil

>
> -----Original Message-----
> From: Phil Steitz [mailto:[hidden email]]
> Sent: Thursday, November 15, 2012 1:42 PM
> To: Commons Developers List
> Subject: Re: [Math] MATH-894
>
> On 11/15/12 10:29 AM, Ted Dunning wrote:
>> On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz <[hidden email]> wrote:
>>
>>> Do you know how to do that with a primitive array?  Can you provide
>>> some sample code?
>>>
>> You don't.  See my next paragraph.
>>
>> See the assign method in this class:
>>
>> https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java
> Interesting.  I see no assign method, but I can see what this thing does.  It is not clear to me though how this idea could be meaningfully applied to solve the problem we have with applying
> statistics to an RDA without doing any array copying.   Most likely
> I am missing the point.
>
> Phil
>>
>>> Thanks for your help on this.
>>>
>>> Phil
>>>> The Colt/Mahout approach is to define a view object which opaquely
>>>> remembers a reference to the original, an offset and a length.  
>>>> Functions and other arguments can be passed to this view object
>>>> which operates on a subset of the original contents by calling the
>>>> function.  Performance is actually quite good.  The JIT seems to
>>>> in-line the view object access to the underlying object and also
>>>> in-lines evaluation of the function so that the actual code that is
>>>> executed is pretty much what you would write in C, but you don't have
>>>> to worry as much since the pattern of access is more controlled.
>>>>
>>>> For completeness, this is essentially what java.nio does with the
>>>> *Buffer classes as well.  You can wrap an array and then you can ask
>>>> for slices out of that array while retaining the reference semantics.
>>>>
>

