# [statistics][descriptive] Classes or static methods for common descriptive statistics?

The previous commons-math interface for descriptive statistics used a paradigm of constructing classes for various statistical functions and calling evaluate(). Example:

    Mean mean = new Mean();
    double mn = mean.evaluate(double[])

I wrote this type of code all through grad school and always found it unnecessarily bulky. To me these summary statistics are classic use cases for static methods:

    double mean = Mean.evaluate(double[])

I don't have any particular problem with the evaluate() syntax.

I looked over the old Math 4 API to see if there were any benefits to the previous class-oriented approach that we might not want to lose. But I don't think there were; the functionality outside of evaluate() is minimal.

Finally, we should consider whether we really need a separate class for each statistic at all. Do we want to call Mean.evaluate(), SummaryStats.mean(), or maybe Stats.mean()? The last is nice and compact.

Let's make a decision so our esteemed mentee Virendra knows in what direction to take his work this summer. :)
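To make the two call styles concrete, here is a minimal side-by-side sketch. `InstanceMean` and `StaticMean` are hypothetical stand-ins written for this illustration, not the commons-math classes; they only show the difference at the call site.

```java
import java.util.Arrays;

// Demo entry point; InstanceMean and StaticMean are hypothetical sketches, not the commons API.
class StaticVsInstanceDemo {
    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 4.0};

        // Class-oriented style: construct an object, then call evaluate().
        InstanceMean mean = new InstanceMean();
        double mn = mean.evaluate(values);

        // Static style: a single call, no object to construct.
        double mn2 = StaticMean.evaluate(values);

        System.out.println(mn + " " + mn2); // prints "2.5 2.5"
    }
}

// Hypothetical class-oriented statistic, mirroring "new Mean().evaluate(values)".
class InstanceMean {
    double evaluate(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length; // NaN for an empty array (0.0 / 0)
    }
}

// Hypothetical static-method statistic, mirroring "Mean.evaluate(values)".
final class StaticMean {
    private StaticMean() {} // no instances: the class only groups static functions

    static double evaluate(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }
}
```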
## Re: [statistics][descriptive] Classes or static methods for common descriptive statistics?

 > On 28 May 2019, at 18:09, Eric Barnhill <[hidden email]> wrote:
>
> The previous commons-math interface for descriptive statistics used a
> paradigm of constructing classes for various statistical functions and
> calling evaluate(). Example
>
> Mean mean = new Mean();
> double mn = mean.evaluate(double[])
>
> I wrote this type of code all through grad school and always found it
> unnecessarily bulky.  To me these summary statistics are classic use cases
> for static methods:
>
> double mean = Mean.evaluate(double[])
>
> I don't have any particular problem with the evaluate() syntax.
>
> I looked over the old Math 4 API to see if there were any benefits to the
> previous class-oriented approach that we might not want to lose. But I
> don't think there were, the functionality outside of evaluate() is minimal.

A quick check shows that evaluate() comes from the UnivariateStatistic interface. It declares a few more methods, but they add little to an instance view of the computation:

    double evaluate(double[] values) throws MathIllegalArgumentException;
    double evaluate(double[] values, int begin, int length) throws MathIllegalArgumentException;
    UnivariateStatistic copy();
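As a rough illustration of why these methods add little: the range overload carries no state, and the whole-array form is just the range form over the full array. The sketch below uses a hypothetical, simplified `SimpleStatistic` interface (no `copy()`, and it throws `IndexOutOfBoundsException` rather than `MathIllegalArgumentException`); it is not the commons-math interface.

```java
import java.util.Objects;

class RangeEvaluateDemo {
    public static void main(String[] args) {
        double[] values = {10.0, 1.0, 2.0, 3.0, 10.0};
        SimpleStatistic mean = new RangeMean();
        System.out.println(mean.evaluate(values));       // whole array: 26 / 5 = 5.2
        System.out.println(mean.evaluate(values, 1, 3)); // values[1..3]: 6 / 3 = 2.0
    }
}

// Hypothetical, simplified analogue of UnivariateStatistic.
interface SimpleStatistic {
    double evaluate(double[] values, int begin, int length);

    // The whole-array form is just the range form over [0, values.length).
    default double evaluate(double[] values) {
        return evaluate(values, 0, values.length);
    }
}

class RangeMean implements SimpleStatistic {
    @Override
    public double evaluate(double[] values, int begin, int length) {
        // Rejects a bad range with IndexOutOfBoundsException
        // (the interface quoted above declares MathIllegalArgumentException instead).
        Objects.checkFromIndexSize(begin, length, values.length);
        double sum = 0.0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum / length;
    }
}
```

Nothing here depends on instance fields, so both methods could just as well be static.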

However, it is extended by StorelessUnivariateStatistic, which adds methods to update the statistic:

    void increment(double d);
    void incrementAll(double[] values) throws MathIllegalArgumentException;
    void incrementAll(double[] values, int start, int length) throws MathIllegalArgumentException;
    double getResult();
    long getN();
    void clear();
    StorelessUnivariateStatistic copy();
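A minimal sketch of what these update methods buy you, assuming a hypothetical `StorelessMean` class modelled on the method list above (not the commons-math implementation). The state is just a count and a running mean, so new values can be folded in without keeping or rescanning the earlier data.

```java
class StorelessMeanDemo {
    public static void main(String[] args) {
        StorelessMean mean = new StorelessMean();
        mean.incrementAll(new double[] {1.0, 2.0, 3.0});
        System.out.println(mean.getResult() + " n=" + mean.getN()); // 2.0 n=3

        // A later value arrives: O(1) update, the earlier values are not needed.
        mean.increment(6.0);
        System.out.println(mean.getResult() + " n=" + mean.getN()); // 3.0 n=4
    }
}

// Hypothetical sketch modelled on the StorelessUnivariateStatistic methods quoted above.
class StorelessMean {
    private long n;
    private double mean; // running mean, only meaningful when n > 0

    void increment(double d) {
        n++;
        // Incremental update: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        mean += (d - mean) / n;
    }

    void incrementAll(double[] values) {
        incrementAll(values, 0, values.length);
    }

    void incrementAll(double[] values, int start, int length) {
        for (int i = start; i < start + length; i++) {
            increment(values[i]);
        }
    }

    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    long getN() {
        return n;
    }

    void clear() {
        n = 0;
        mean = 0.0;
    }

    StorelessMean copy() {
        StorelessMean c = new StorelessMean();
        c.n = n;
        c.mean = mean;
        return c;
    }
}
```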

This type of functionality would be lost with static methods. If you move to a functional-interface pattern for each statistic, you will lose the other functionality that instance state makes possible, namely updating with more values or combining instances.
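Combining instances is also possible for statistics less trivial than the mean. The sketch below assumes a hypothetical `RunningVariance` class that updates with Welford's method and merges two partial results with the standard pairwise formula (counts add, means are count-weighted, and the sums of squared deviations add plus a correction term `delta^2 * nA * nB / n`). It illustrates the idea only; it is not the commons-math `Variance` implementation.

```java
class CombineVarianceDemo {
    public static void main(String[] args) {
        // Two partial results, e.g. from separate threads or data partitions.
        RunningVariance a = new RunningVariance();
        for (double v : new double[] {1.0, 2.0, 3.0, 4.0}) a.increment(v);
        RunningVariance b = new RunningVariance();
        for (double v : new double[] {5.0, 6.0, 7.0, 8.0}) b.increment(v);

        a.combine(b);
        // Same as a single pass over 1..8: mean 4.5, sample variance 6.0
        System.out.println(a.getMean() + " " + a.getVariance()); // prints "4.5 6.0"
    }
}

// Hypothetical instance-based statistic that can absorb another instance.
class RunningVariance {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the mean

    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // Welford's update
    }

    void combine(RunningVariance other) {
        if (other.n == 0) return;
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        n = total;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1); // sample (bias-corrected) variance
    }
}
```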

So the question is whether a statistic needs to be updated after the first computation. Will the library offer an alternative for a map-reduce style operation, using instances that can be combined with DoubleStream.collect:

    <R> R collect(Supplier<R> supplier,
                  ObjDoubleConsumer<R> accumulator,
                  BiConsumer<R, R> combiner);

For Mean this would be:

    double mean = Arrays.stream(new double[1000]).collect(Mean::new, Mean::add, Mean::add).getMean();

with:

    void add(double);
    void add(Mean);
    double getMean();

(Untested code)
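A compilable version of the collect() idea, assuming a hypothetical `Mean` class with exactly the three methods listed (it is not the commons-math `Mean`). The overloaded `Mean::add` resolves to `add(double)` for the accumulator and `add(Mean)` for the combiner, the same pattern as the `StringBuilder::append` example in the `Stream.collect` javadoc. Running the stream in parallel exercises the combiner.

```java
import java.util.Arrays;

class CollectMeanDemo {
    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i; // 0, 1, ..., 999
        }

        // Supplier, accumulator, combiner; the combiner merges partial
        // results, e.g. from the chunks of a parallel stream.
        double mean = Arrays.stream(data)
                .parallel()
                .collect(Mean::new, Mean::add, Mean::add)
                .getMean();
        System.out.println(mean); // 499.5
    }
}

// Hypothetical mergeable statistic: a count and a running sum.
class Mean {
    private long n;
    private double sum;

    void add(double x) {
        n++;
        sum += x;
    }

    void add(Mean other) {
        n += other.n;
        sum += other.sum;
    }

    double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

The design keeps per-instance state minimal (count plus sum), which is what makes the combiner a two-line method.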

>
> Finally we should consider whether we really need a separate class for each
> statistic at all. Do we want to call:
>
> Mean.evaluate()
>
> or
>
> SummaryStats.mean()
>
> or maybe
>
> Stats.mean() ?
>
> The last being nice and compact.
>
> Let's make a decision so our esteemed mentee Virendra knows in what
> direction to take his work this summer. :)
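For comparison, the compact Stats.mean() option could be a single final class of static functions. This is a hypothetical sketch (`Stats`, `mean`, and `variance` are illustrative names); such a facade could still delegate to per-statistic classes internally if those are kept for their update and combine behaviour.

```java
import java.util.Arrays;

class StatsFacadeDemo {
    public static void main(String[] args) {
        double[] values = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println(Stats.mean(values));     // 5.0
        System.out.println(Stats.variance(values)); // sample variance, 32 / 7
    }
}

// Hypothetical compact entry point: one class, many static statistics.
final class Stats {
    private Stats() {} // utility class, never instantiated

    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Two-pass sample (bias-corrected) variance.
    static double variance(double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double m = mean(values);
        double ss = 0.0;
        for (double v : values) {
            ss += (v - m) * (v - m);
        }
        return ss / (values.length - 1);
    }
}
```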
## Re: [statistics][descriptive] Classes or static methods for common descriptive statistics?

