[Math] rolling calculations with lag

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[Math] rolling calculations with lag

Bradford Cross
Greetings!

Recently I stumbled into the Commons math project; nice design, good abstractions, "smart updates" and even unit tests! :-) 

the Smart updates are a key feature for event stream processing / time series simulation.  The only piece that is missing from a time series analysis and simulation perspective is the ability to supply a lag that defines a fixed sample size and perform rolling calculations.

I was very happy to see this as an item on the wish list.

A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a java time series simulation engine and we are considering the commons math as the base of our numerical libraries.  In order to do this we need to complete the rolling calculations, so here is our first spike (spike means prototype that can be thrown away / not a real patch.)  We thought we would start with an easy case; mean, which uses sum.

We have already combined the rolling calculations with the smart update algorithms before in the numerical libraries for our previous time series simulation engine.  As you have mentioned in the wish list notes, our past experience is that some of the algorithms can not avoid using queues for rolling updates case.  Obviously it is something pretty fundamental to the design and requires a bit of work across a lot of places to do this for all the statistics (at least starting with summary statistics.)

Please give feedback on the design, any issues with performance (better data structure than the queue we used), etc!

If the community is OK with this initial spike, then we can start submitting patches. :-) 
 

/brad

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

statistics.tar.gz (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Phil Steitz
On 9/22/07, Bradford Cross <[hidden email]> wrote:
> Greetings!
>
> Recently I stumbled into the Commons math project; nice design, good
> abstractions, "smart updates" and even unit tests! :-)
>
Thanks!

> the Smart updates are a key feature for event stream processing / time
> series simulation.  The only piece that is missing from a time series
> analysis and simulation perspective is the ability to supply a lag that
> defines a fixed sample size and perform rolling calculations.
>

That functionality actually already exists in the
DescriptiveStatistics class.  You can set a "window size" for rolling
computations of univariate statistics using the concrete
implementation of this class,
o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl.  See
http://commons.apache.org/math/userguide/stat.html

> I was very happy to see this as an item on the wish list.

The wishlist item is not as clear as it could be.  Sorry about that.
In addition to the computations in DescriptiveStatistics that require
that you maintain all of the values in the current window in memory,
we also support "storeless" computation of statistics than can be
computed in one pass through the data. This allows very large data
streams to be handled with fixed storage overhead.  I think that what
the wishlist item refers to is something in between - ways to support
the window concept without storing all of the data.  Strictly
speaking, this is impossible, but doing things like sampling from the
streams, periodically resetting or maintaining arrays of storeless
stats with different offsets would in theory be possible.

>
> A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a java time
> series simulation engine and we are considering the commons math as the base
> of our numerical libraries.  In order to do this we need to complete the
> rolling calculations, so here is our first spike (spike means prototype that
> can be thrown away / not a real patch.)  We thought we would start with an
> easy case; mean, which uses sum.
>
> We have already combined the rolling calculations with the smart update
> algorithms before in the numerical libraries for our previous time series
> simulation engine.  As you have mentioned in the wish list notes, our past
> experience is that some of the algorithms can not avoid using queues for
> rolling updates case.  Obviously it is something pretty fundamental to the
> design and requires a bit of work across a lot of places to do this for all
> the statistics (at least starting with summary statistics.)
>
> Please give feedback on the design, any issues with performance (better data
> structure than the queue we used), etc!
>
> If the community is OK with this initial spike, then we can start submitting
> patches. :-)
>

Thanks for the contribution! There are a few problems with
incorporating the code as is, though.  First it uses generics and the
concurrent package, which requires JDK 1.5 and our current minimum JDK
level is 1.3.  That could probably be eliminated fairly easily,
though.  The second is really whether or not the queue implementation
is going to improve performance over the ResizeableDoubleArray store
that DescriptiveStatisticsImpl uses now.  If you think so and can
demonstrate with benchmarks, we can talk about swapping out that
implementation.  Otherwise, its probably better to use
ResizeableDoubleArray.

I am +1 on adding a RollingStatistic abstract base class (would prefer
that name to "Statistic" since it is specialized) like you have
defined and rolling versions of the individual statistics.  This would
be a convenience over the current setup and provide a more intuitive
way to access rolling stats than to use DescriptiveStatisticsImpl as a
container.  Currently this is only the only way to do it.  So if you
can refactor to either use ResizableDoubleArray as the backing store
(look at DescriptiveStatisticsImpl.apply - the convenience classes
could just use that pattern) or otherwise eliminate the JDK 1.5
dependency, I would support adding the rolling stats.  If I understand
correctly the idea of what you mean by Sum, and Mean (using
constructor arguments to determine whether or not statistic is
rolling), I would prefer to leave the existing statistics in
commons-math as is and introduce Rolling versions as separate classes.

One more thing.  It is very important that any contributions that you
make can be made in accordance with the Apache Contributor's License
Agreement.  Have a look here:
http://www.apache.org/licenses/#clas
and make sure you can agree to those terms.  Then you can start
submitting patches with attachements to Jira tickets.

Thanks!

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
On 9/29/07, Phil Steitz <[hidden email]> wrote:

>
> On 9/22/07, Bradford Cross <[hidden email]>
>
> > the Smart updates are a key feature for event stream processing / time
> > series simulation.  The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
> >
>
> That functionality actually already exists in the
> DescriptiveStatistics class.  You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl.  See
> http://commons.apache.org/math/userguide/stat.html



cool - i did not see this yet.



> >
> > If the community is OK with this initial spike, then we can start
> submitting
> > patches. :-)
> >
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though.  First it uses generics and the
> concurrent package, which requires JDK 1.5 and our current minimum JDK
> level is 1.3.  That could probably be eliminated fairly easily,
> though.  The second is really whether or not the queue implementation
> is going to improve performance over the ResizeableDoubleArray store
> that DescriptiveStatisticsImpl uses now.  If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation.  Otherwise, its probably better to use
> ResizeableDoubleArray.


Yes, this is just a "spike" - a proof of concept. :-)  Today I setup a
benchmark test and swapped in lots of different collections.  The fastest
java queue I found is the ArrayDeque from java-6.  Interestingly, the
calculations are about twice as fast using this queue compared with some of
the other queue implementations in the java collections for a run of about
10K calls to calculate().  Nevertheless, the ResizableDoubleArray seems to
be a bit faster.  I will formalize my benchmarking with a bit more rigor and
publish the results on this thread.

I am +1 on adding a RollingStatistic abstract base class (would prefer

> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics.  This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container.  Currently this is only the only way to do it.  So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply - the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats.  If I understand
> correctly the idea of what you mean by Sum, and Mean (using
> constructor arguments to determine whether or not statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.



Sounds good - I will start working on the RollingStatistics.  As for the
convience pattern I used in Mean/Sum (using
constructor arguments to determine whether or not statistic is rolling) - it
is easy to do the refactorings later after the rolling statistics are
added.  We can just leave the current statistics as is and wait to see if we
find some valuable reason to do it.

One more thing.  It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement.  Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms.


Yep, no problems. :-)


Then you can start
> submitting patches with attachements to Jira tickets.



Sounds good.
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
In reply to this post by Phil Steitz
OK, I have created a patch...I tried to follow the instructions to file a
bug on bugzilla but i can't seem to find the right place to file a new bug
to either commons or commons math.

I wonder if someone could help me out.

/b

On 9/29/07, Phil Steitz <[hidden email]> wrote:

>
> On 9/22/07, Bradford Cross <[hidden email]> wrote:
> > Greetings!
> >
> > Recently I stumbled into the Commons math project; nice design, good
> > abstractions, "smart updates" and even unit tests! :-)
> >
> Thanks!
>
> > the Smart updates are a key feature for event stream processing / time
> > series simulation.  The only piece that is missing from a time series
> > analysis and simulation perspective is the ability to supply a lag that
> > defines a fixed sample size and perform rolling calculations.
> >
>
> That functionality actually already exists in the
> DescriptiveStatistics class.  You can set a "window size" for rolling
> computations of univariate statistics using the concrete
> implementation of this class,
> o.a.c.math.stat.descriptive.DescriptiveStatisticsImpl.  See
> http://commons.apache.org/math/userguide/stat.html
>
> > I was very happy to see this as an item on the wish list.
>
> The wishlist item is not as clear as it could be.  Sorry about that.
> In addition to the computations in DescriptiveStatistics that require
> that you maintain all of the values in the current window in memory,
> we also support "storeless" computation of statistics than can be
> computed in one pass through the data. This allows very large data
> streams to be handled with fixed storage overhead.  I think that what
> the wishlist item refers to is something in between - ways to support
> the window concept without storing all of the data.  Strictly
> speaking, this is impossible, but doing things like sampling from the
> streams, periodically resetting or maintaining arrays of storeless
> stats with different offsets would in theory be possible.
> >
> > A ThoughtWorks colleague (Yaxin Wang) and I are prototyping a java time
> > series simulation engine and we are considering the commons math as the
> base
> > of our numerical libraries.  In order to do this we need to complete the
> > rolling calculations, so here is our first spike (spike means prototype
> that
> > can be thrown away / not a real patch.)  We thought we would start with
> an
> > easy case; mean, which uses sum.
> >
> > We have already combined the rolling calculations with the smart update
> > algorithms before in the numerical libraries for our previous time
> series
> > simulation engine.  As you have mentioned in the wish list notes, our
> past
> > experience is that some of the algorithms can not avoid using queues for
> > rolling updates case.  Obviously it is something pretty fundamental to
> the
> > design and requires a bit of work across a lot of places to do this for
> all
> > the statistics (at least starting with summary statistics.)
> >
> > Please give feedback on the design, any issues with performance (better
> data
> > structure than the queue we used), etc!
> >
> > If the community is OK with this initial spike, then we can start
> submitting
> > patches. :-)
> >
>
> Thanks for the contribution! There are a few problems with
> incorporating the code as is, though.  First it uses generics and the
> concurrent package, which requires JDK 1.5 and our current minimum JDK
> level is 1.3.  That could probably be eliminated fairly easily,
> though.  The second is really whether or not the queue implementation
> is going to improve performance over the ResizeableDoubleArray store
> that DescriptiveStatisticsImpl uses now.  If you think so and can
> demonstrate with benchmarks, we can talk about swapping out that
> implementation.  Otherwise, its probably better to use
> ResizeableDoubleArray.
>
> I am +1 on adding a RollingStatistic abstract base class (would prefer
> that name to "Statistic" since it is specialized) like you have
> defined and rolling versions of the individual statistics.  This would
> be a convenience over the current setup and provide a more intuitive
> way to access rolling stats than to use DescriptiveStatisticsImpl as a
> container.  Currently this is only the only way to do it.  So if you
> can refactor to either use ResizableDoubleArray as the backing store
> (look at DescriptiveStatisticsImpl.apply - the convenience classes
> could just use that pattern) or otherwise eliminate the JDK 1.5
> dependency, I would support adding the rolling stats.  If I understand
> correctly the idea of what you mean by Sum, and Mean (using
> constructor arguments to determine whether or not statistic is
> rolling), I would prefer to leave the existing statistics in
> commons-math as is and introduce Rolling versions as separate classes.
>
> One more thing.  It is very important that any contributions that you
> make can be made in accordance with the Apache Contributor's License
> Agreement.  Have a look here:
> http://www.apache.org/licenses/#clas
> and make sure you can agree to those terms.  Then you can start
> submitting patches with attachements to Jira tickets.
>
> Thanks!
>
> Phil
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Wendy Smoak
On 10/3/07, Bradford Cross <[hidden email]> wrote:

> OK, I have created a patch...I tried to follow the instructions to file a
> bug on bugzilla but i can't seem to find the right place to file a new bug
> to either commons or commons math.
>
> I wonder if someone could help me out.

Start here for [math]: http://commons.apache.org/math/issue-tracking.html

--
Wendy

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Phil Steitz
In reply to this post by Bradford Cross
On 10/3/07, Bradford Cross <[hidden email]> wrote:
> OK, I have created a patch...I tried to follow the instructions to file a
> bug on bugzilla but i can't seem to find the right place to file a new bug
> to either commons or commons math.
>
> I wonder if someone could help me out.
>

Sorry for the response latency and sorry if you were led to Bugzilla
by the incorrect link that I just noticed on
http://commons.apache.org/math/developers.html.  I will fix that.

We now use Jira for issue tracking / patch submission.  Here is a link
to the commons math issues page:

http://commons.apache.org/math/issue-tracking.html

Phil
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
Cool - first patch finally submitted. :-)

On 10/6/07, Phil Steitz <[hidden email]> wrote:

>
> On 10/3/07, Bradford Cross <[hidden email]> wrote:
> > OK, I have created a patch...I tried to follow the instructions to file
> a
> > bug on bugzilla but i can't seem to find the right place to file a new
> bug
> > to either commons or commons math.
> >
> > I wonder if someone could help me out.
> >
>
> Sorry for the response latency and sorry if you were led to Bugzilla
> by the incorrect link that I just noticed on
> http://commons.apache.org/math/developers.html.  I will fix that.
>
> We now use Jira for issue tracking / patch submission.  Here is a link
> to the commons math issues page:
>
> http://commons.apache.org/math/issue-tracking.html
>
> Phil
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Phil Steitz
Thanks!

Had a quick look and have a couple of comments.

First, have a look at the developers guide (linked on the [math]
website) for style and other guidlines and if you are not set up yet
with maven, let me know if you  need help getting set up so you can
get checkstyle and pmd reports.

We should probably talk a little about the API - i.e., what the method
names actually mean - and this all needs to be specified in the method
javadoc.  Commons math is as much about the API as the
implementations, so we try to be pretty careful about getting the
names and semantics right.  Like other commons components, we have an
implicit commitment to maintain backward compatibility, so that makes
it even more important to get the APIs right.  So can you describe
what exactly the public methods mean and why they are named as they
are?

One more patch-generation point.  I notice that the patch includes
some variable name changes and other stylistic changes to existing
code.  While there is nothing wrong with suggesting this kind of
change, we try to separate the style / formatting changes from the
changes that introduce new features or fix bugs.  That makes the diffs
and change logs easier to read.  So it would be good to remove those
changes from the patch.

Checkstyle will flag this, but two other little things I noticed are
the presence of tabs (we use spaces in place of tabs) and if - then -
else with no braces (we like braces).

Thanks again for your interest and contributions!

Phil

On 10/11/07, Bradford Cross <[hidden email]> wrote:

> Cool - first patch finally submitted. :-)
>
> On 10/6/07, Phil Steitz <[hidden email]> wrote:
> >
> > On 10/3/07, Bradford Cross <[hidden email]> wrote:
> > > OK, I have created a patch...I tried to follow the instructions to file
> > a
> > > bug on bugzilla but i can't seem to find the right place to file a new
> > bug
> > > to either commons or commons math.
> > >
> > > I wonder if someone could help me out.
> > >
> >
> > Sorry for the response latency and sorry if you were led to Bugzilla
> > by the incorrect link that I just noticed on
> > http://commons.apache.org/math/developers.html.  I will fix that.
> >
> > We now use Jira for issue tracking / patch submission.  Here is a link
> > to the commons math issues page:
> >
> > http://commons.apache.org/math/issue-tracking.html
> >
> > Phil
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
On 10/12/07, Phil Steitz <[hidden email]> wrote:
>
> Thanks!
>
> Had a quick look and have a couple of comments.
>
> First, have a look at the developers guide (linked on the [math]
> website) for style and other guidlines and if you are not set up yet
> with maven, let me know if you  need help getting set up so you can
> get checkstyle and pmd reports.


>>>Sure.

We should probably talk a little about the API - i.e., what the method
> names actually mean - and this all needs to be specified in the method
> javadoc.  Commons math is as much about the API as the
> implementations, so we try to be pretty careful about getting the
> names and semantics right.  Like other commons components, we have an
> implicit commitment to maintain backward compatibility, so that makes
> it even more important to get the APIs right.  So can you describe
> what exactly the public methods mean and why they are named as they
> are?


>>>I agree.  calculate() can be renamed to increment() for consistency with
the rest of the code base.  I also suggest that all increment() methods
return the double value of the incremental calculation rather than void,
which forces the client code to do statistic.increment(), and
statistic.getResult() - the change doesn't violate backward compatibility
and makes client code use of the API simpler and more intuitive.  Also note
that evaluate() methods return double, so this change would make calculation
methods consistent.  On this note, I also think it is better to decompose
each method of calculation into a different object.  There is static
calculation on a collection via evaluate, incremental accumulated calculate,
and incremental rolling calculate.  This way we can simplify the main
statistic classes by making them decorators, keep backwards compatibility in
the API for all the statistics, and delegate the three different calculation
types to specialized objects rather than cluttering things up.  Maybe there
are also better names ... like staticEvaluate,  accumulatedEvaluate,
rollingEvaluate..or likewise for ___Calculate ...bBut that would violate
backward compatibility.

One more patch-generation point.  I notice that the patch includes

> some variable name changes and other stylistic changes to existing
> code.  While there is nothing wrong with suggesting this kind of
> change, we try to separate the style / formatting changes from the
> changes that introduce new features or fix bugs.  That makes the diffs
> and change logs easier to read.  So it would be good to remove those
> changes from the patch.
>
> Checkstyle will flag this, but two other little things I noticed are
> the presence of tabs (we use spaces in place of tabs) and if - then -
> else with no braces (we like braces).



>>>No problem, I will make these changes, add javadoc and resubmit the
patch.

Thanks again for your interest and contributions!

>
> Phil
>
> On 10/11/07, Bradford Cross <[hidden email]> wrote:
> > Cool - first patch finally submitted. :-)
> >
> > On 10/6/07, Phil Steitz <[hidden email]> wrote:
> > >
> > > On 10/3/07, Bradford Cross <[hidden email]> wrote:
> > > > OK, I have created a patch...I tried to follow the instructions to
> file
> > > a
> > > > bug on bugzilla but i can't seem to find the right place to file a
> new
> > > bug
> > > > to either commons or commons math.
> > > >
> > > > I wonder if someone could help me out.
> > > >
> > >
> > > Sorry for the response latency and sorry if you were led to Bugzilla
> > > by the incorrect link that I just noticed on
> > > http://commons.apache.org/math/developers.html.  I will fix that.
> > >
> > > We now use Jira for issue tracking / patch submission.  Here is a link
> > > to the commons math issues page:
> > >
> > > http://commons.apache.org/math/issue-tracking.html
> > >
> > > Phil
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
I am fixing these issues tonight and finishing up the ranking statistics.

I still need to optimize the sorted deque structure that I created for
RollingPercentile that uses a linked list and hte ResizeableDoubleArray.  I
will create a patch thhat includes the first version while I work an a more
performant and less hackish implimentaiton of htis data structure - when I
submit the patch please have a look and give me any suggestions.

I notice that the Moment statistics, FirstMoment, SecondMoment and their
wrappers Mean, Variance, etc. all extend from
AbstractStatelessUnivariateStatistic and generally use the increment()
calcualtion to produce the value returned by the evaluate() function.  This
is right in line with the abstractions I am talking about in my last email -
we have a few different kinds of calculations; static (pass in an array, get
back a value), incremental-accumulating (keep passing in values, keep
updating calcualtion), incremental-rolling (keep passing in values, removing
data outside the lag window, and updating calculation.)  I propose to
continue the abstractions in this direction, maintain backward
compatability, and change increment() from void to double return type as I
mention in the previous email.

In the case of ranking statistics, there are currently no impliemntations
for incrent() - I propose to add those.  Also, while I was working on the
rank statistics tonight I realized that we have two different cases for
incremental rank statistics. 1) You want to use Percentile and specify a
quantile in the constructor, you are interested in repeated calls to
increment() for the same quantile (f.ex. min, max, median.)  2) You want to
use Perceltile with incremental updates but without specifying a quantile in
the constructor, by executing repeated calls on increment() and calling
getValue with a quantile argument; this is useful if you need, f.ex. both
min and max of the same data, there is no need to create multiple instances
of Percentile.  This si implimented in the current Percentile with
setQuantile() ... again we can keep taht for backward compatability and just
have a convienience method for getValue(double quantile).

Thoughts?



On 10/12/07, Bradford Cross <[hidden email]> wrote:

>
>
>
> On 10/12/07, Phil Steitz <[hidden email]> wrote:
> >
> > Thanks!
> >
> > Had a quick look and have a couple of comments.
> >
> > First, have a look at the developers guide (linked on the [math]
> > website) for style and other guidlines and if you are not set up yet
> > with maven, let me know if you  need help getting set up so you can
> > get checkstyle and pmd reports.
>
>
> >>>Sure.
>
> We should probably talk a little about the API - i.e., what the method
> > names actually mean - and this all needs to be specified in the method
> > javadoc.  Commons math is as much about the API as the
> > implementations, so we try to be pretty careful about getting the
> > names and semantics right.  Like other commons components, we have an
> > implicit commitment to maintain backward compatibility, so that makes
> > it even more important to get the APIs right.  So can you describe
> > what exactly the public methods mean and why they are named as they
> > are?
>
>
> >>>I agree.  calculate() can be renamed to increment() for consistency
> with the rest of the code base.  I also suggest that all increment() methods
> return the double value of the incremental calculation rather than void,
> which forces the client code to do statistic.increment(), and
> statistic.getResult() - the change doesn't violate backward compatibility
> and makes client code use of the API simpler and more intuitive.  Also note
> that evaluate() methods return double, so this change would make calculation
> methods consistent.  On this note, I also think it is better to decompose
> each method of calculation into a different object.  There is static
> calculation on a collection via evaluate, incremental accumulated calculate,
> and incremental rolling calculate.  This way we can simplify the main
> statistic classes by making them decorators, keep backwards compatibility in
> the API for all the statistics, and delegate the three different calculation
> types to specialized objects rather than cluttering things up.  Maybe there
> are also better names ... like staticEvaluate,  accumulatedEvaluate,
> rollingEvaluate..or likewise for ___Calculate ...bBut that would violate
> backward compatibility.
>
> One more patch-generation point.  I notice that the patch includes
> > some variable name changes and other stylistic changes to existing
> > code.  While there is nothing wrong with suggesting this kind of
> > change, we try to separate the style / formatting changes from the
> > changes that introduce new features or fix bugs.  That makes the diffs
> > and change logs easier to read.  So it would be good to remove those
> > changes from the patch.
> >
> > Checkstyle will flag this, but two other little things I noticed are
> > the presence of tabs (we use spaces in place of tabs) and if - then -
> > else with no braces (we like braces).
>
>
>
> >>>No problem, I will make these changes, add javadoc and resubmit the
> patch.
>
> Thanks again for your interest and contributions!
> >
> > Phil
> >
> > On 10/11/07, Bradford Cross <[hidden email]> wrote:
> > > Cool - first patch finally submitted. :-)
> > >
> > > On 10/6/07, Phil Steitz <[hidden email]> wrote:
> > > >
> > > > On 10/3/07, Bradford Cross < [hidden email]> wrote:
> > > > > OK, I have created a patch...I tried to follow the instructions to
> > file
> > > > a
> > > > > bug on bugzilla but i can't seem to find the right place to file a
> > new
> > > > bug
> > > > > to either commons or commons math.
> > > > >
> > > > > I wonder if someone could help me out.
> > > > >
> > > >
> > > > Sorry for the response latency and sorry if you were led to Bugzilla
> >
> > > > by the incorrect link that I just noticed on
> > > > http://commons.apache.org/math/developers.html.  I will fix that.
> > > >
> > > > We now use Jira for issue tracking / patch submission.  Here is a
> > link
> > > > to the commons math issues page:
> > > >
> > > > http://commons.apache.org/math/issue-tracking.html
> > > >
> > > > Phil
> > > > >
> > > >
> > > >
> > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [hidden email]
> > > > For additional commands, e-mail: [hidden email]
> > > >
> > > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [Math] rolling calculations with lag

Bradford Cross
New patch for rolling stats is attached to ticket.  Notes are included.
Please review and get back to me.

On 10/13/07, Bradford Cross <[hidden email]> wrote:

>
> I am fixing these issues tonight and finishing up the ranking statistics.
>
>
> I still need to optimize the sorted deque structure that I created for
> RollingPercentile that uses a linked list and hte ResizeableDoubleArray.  I
> will create a patch thhat includes the first version while I work an a more
> performant and less hackish implimentaiton of htis data structure - when I
> submit the patch please have a look and give me any suggestions.
>
> I notice that the Moment statistics, FirstMoment, SecondMoment and their
> wrappers Mean, Variance, etc. all extend from
> AbstractStatelessUnivariateStatistic and generally use the increment()
> calcualtion to produce the value returned by the evaluate() function.  This
> is right in line with the abstractions I am talking about in my last email -
> we have a few different kinds of calculations; static (pass in an array, get
> back a value), incremental-accumulating (keep passing in values, keep
> updating calcualtion), incremental-rolling (keep passing in values, removing
> data outside the lag window, and updating calculation.)  I propose to
> continue the abstractions in this direction, maintain backward
> compatability, and change increment() from void to double return type as I
> mention in the previous email.
>
> In the case of ranking statistics, there are currently no impliemntations
> for incrent() - I propose to add those.  Also, while I was working on the
> rank statistics tonight I realized that we have two different cases for
> incremental rank statistics. 1) You want to use Percentile and specify a
> quantile in the constructor, you are interested in repeated calls to
> increment() for the same quantile ( f.ex. min, max, median.)  2) You want
> to use Perceltile with incremental updates but without specifying a quantile
> in the constructor, by executing repeated calls on increment() and calling
> getValue with a quantile argument; this is useful if you need, f.ex. both
> min and max of the same data, there is no need to create multiple instances
> of Percentile.  This si implimented in the current Percentile with
> setQuantile() ... again we can keep taht for backward compatability and just
> have a convienience method for getValue(double quantile).
>
> Thoughts?
>
>
>
> On 10/12/07, Bradford Cross <[hidden email]> wrote:
> >
> >
> >
> > On 10/12/07, Phil Steitz <[hidden email]> wrote:
> > >
> > > Thanks!
> > >
> > > Had a quick look and have a couple of comments.
> > >
> > > First, have a look at the developers guide (linked on the [math]
> > > website) for style and other guidlines and if you are not set up yet
> > > with maven, let me know if you  need help getting set up so you can
> > > get checkstyle and pmd reports.
> >
> >
> > >>>Sure.
> >
> > We should probably talk a little about the API - i.e., what the method
> > > names actually mean - and this all needs to be specified in the method
> > > javadoc.  Commons math is as much about the API as the
> > > implementations, so we try to be pretty careful about getting the
> > > names and semantics right.  Like other commons components, we have an
> > > implicit commitment to maintain backward compatibility, so that makes
> > > it even more important to get the APIs right.  So can you describe
> > > what exactly the public methods mean and why they are named as they
> > > are?
> >
> >
> > >>>I agree.  calculate() can be renamed to increment() for consistency
> > with the rest of the code base.  I also suggest that all increment() methods
> > return the double value of the incremental calculation rather than void,
> > which forces the client code to do statistic.increment(), and
> > statistic.getResult() - the change doesn't violate backward
> > compatibility and makes client code use of the API simpler and more
> > intuitive.  Also note that evaluate() methods return double, so this change
> > would make calculation methods consistent.  On this note, I also think it is
> > better to decompose each method of calculation into a different object.
> > There is static calculation on a collection via evaluate, incremental
> > accumulated calculate, and incremental rolling calculate.  This way we can
> > simplify the main statistic classes by making them decorators, keep
> > backwards compatibility in the API for all the statistics, and delegate the
> > three different calculation types to specialized objects rather than
> > cluttering things up.  Maybe there are also better names ... like
> > staticEvaluate,  accumulatedEvaluate,  rollingEvaluate..or likewise for
> > ___Calculate ...bBut that would violate backward compatibility.
> >
> > One more patch-generation point.  I notice that the patch includes
> > > some variable name changes and other stylistic changes to existing
> > > code.  While there is nothing wrong with suggesting this kind of
> > > change, we try to separate the style / formatting changes from the
> > > changes that introduce new features or fix bugs.  That makes the diffs
> > > and change logs easier to read.  So it would be good to remove those
> > > changes from the patch.
> > >
> > > Checkstyle will flag this, but two other little things I noticed are
> > > the presence of tabs (we use spaces in place of tabs) and if - then -
> > > else with no braces (we like braces).
> >
> >
> >
> > >>>No problem, I will make these changes, add javadoc and resubmit the
> > patch.
> >
> > Thanks again for your interest and contributions!
> > >
> > > Phil
> > >
> > > On 10/11/07, Bradford Cross <[hidden email] > wrote:
> > > > Cool - first patch finally submitted. :-)
> > > >
> > > > On 10/6/07, Phil Steitz <[hidden email]> wrote:
> > > > >
> > > > > On 10/3/07, Bradford Cross < [hidden email]> wrote:
> > > > > > OK, I have created a patch...I tried to follow the instructions
> > > to file
> > > > > a
> > > > > > bug on bugzilla but i can't seem to find the right place to file
> > > a new
> > > > > bug
> > > > > > to either commons or commons math.
> > > > > >
> > > > > > I wonder if someone could help me out.
> > > > > >
> > > > >
> > > > > Sorry for the response latency and sorry if you were led to
> > > Bugzilla
> > > > > by the incorrect link that I just noticed on
> > > > > http://commons.apache.org/math/developers.html .  I will fix that.
> > > > >
> > > > > We now use Jira for issue tracking / patch submission.  Here is a
> > > link
> > > > > to the commons math issues page:
> > > > >
> > > > > http://commons.apache.org/math/issue-tracking.html
> > > > >
> > > > > Phil
> > > > > >
> > > > >
> > > > >
> > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [hidden email]
> > > > > For additional commands, e-mail: [hidden email]
> > > > >
> > > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> > >
> >
>