[GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Virendra singh Rajpurohit
Hi all,
I hope you are all doing well. I had a discussion on Slack with my GSoC
mentors about how variables such as the mean should be initialized, and I'm
posting it on the mailing list (ML) for more opinions.

*Should variables like the mean be initialized with NaN or 0?*
The definitional formula of the mean is
    mean = (sum of values) / n
so for n = 0 it is 0/0, which is NaN.
On the other hand, Java's summary statistics classes (DoubleSummaryStatistics,
LongSummaryStatistics and IntSummaryStatistics) return an average of 0 for n = 0.
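
The JDK behaviour described above can be checked directly. This minimal demo (the class name `EmptyAverageDemo` is only for illustration) reads the average of each summary statistics class before any values are recorded; for contrast, the primitive stream `average()` methods return an empty `OptionalDouble` instead of a number:

```java
import java.util.DoubleSummaryStatistics;
import java.util.IntSummaryStatistics;
import java.util.LongSummaryStatistics;
import java.util.OptionalDouble;
import java.util.stream.DoubleStream;

class EmptyAverageDemo {
    // Average reported by each JDK summary statistics class with no values recorded
    static double[] emptyAverages() {
        return new double[] {
            new DoubleSummaryStatistics().getAverage(),
            new LongSummaryStatistics().getAverage(),
            new IntSummaryStatistics().getAverage()
        };
    }

    // Stream averages signal "no values" with an empty OptionalDouble instead
    static OptionalDouble emptyStreamAverage() {
        return DoubleStream.empty().average();
    }

    public static void main(String[] args) {
        for (double avg : emptyAverages()) {
            System.out.println(avg); // 0.0 for all three classes
        }
        System.out.println(emptyStreamAverage().isPresent()); // false
    }
}
```
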
As discussed on Slack, "The initialization should not set the initial value
to NaN. This is a convenience to make getMean() faster. This is likely to
cause fewer problems than NaN when used in downstream computations".
Initializing to 0 will make things faster, because the if condition that
checks the value of n can be removed from the calculation, whereas
initializing to NaN would be more correct.
*Alex Herbert* suggested that getMean() could return NaN by using an if
condition that checks the value of n; that way we don't have to check the
condition every time a value is added.
What are your opinions about it?

--
*Virendra Singh Rajpurohit*

Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Eric Barnhill
Hi Virendra,

I think that's right in terms of initialization. If it is initialized to
NaN, then accumulation will require an additional step to get rid of the
NaN. Just initialize to zero.
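
To illustrate why NaN initialization costs an extra step, here is a sketch of two hypothetical accumulators (`ZeroInitMean` and `NaNInitMean` are made-up names, not Commons Statistics classes). It assumes the mean is accumulated with the incremental update m = m + (x - m) / n, which the thread does not specify. Starting from zero, the first update yields m = x with no special case; starting from NaN, (x - NaN) / n is NaN, so accept() has to branch on every call:

```java
// Hypothetical accumulators for illustration; not the Commons Statistics API.
// Both use the incremental update m = m + (x - m) / n (an assumption).
class ZeroInitMean {
    private long n;
    private double m = 0.0; // running mean starts at zero

    void accept(double x) {
        n++;
        // No check needed: for the first value, m = 0 + (x - 0) / 1 = x
        m += (x - m) / n;
    }

    double runningMean() {
        return m; // 0.0 if no values were added
    }
}

class NaNInitMean {
    private long n;
    private double m = Double.NaN; // running mean starts at NaN

    void accept(double x) {
        n++;
        if (n == 1) {
            // Extra step on every call: replace the NaN with the first value,
            // otherwise (x - NaN) / n would leave m as NaN forever
            m = x;
        } else {
            m += (x - m) / n;
        }
    }

    double runningMean() {
        return m; // NaN if no values were added
    }
}

class InitComparison {
    public static void main(String[] args) {
        ZeroInitMean zero = new ZeroInitMean();
        NaNInitMean nan = new NaNInitMean();
        for (double x : new double[] {2, 4, 6}) {
            zero.accept(x);
            nan.accept(x);
        }
        System.out.println(zero.runningMean()); // 4.0
        System.out.println(nan.runningMean());  // 4.0
    }
}
```

Both give the same result once values are present; the difference is only the branch inside accept().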

I just looked around, and it's pretty clear that it is best practice to
return NaN in the edge case of an average of no values. That is what
happens in Python when calling numpy.mean([]) and in R when calling
mean(numeric(0)), and it is also mathematically right.

So a check for zero values, returning NaN in that case, should probably be
implemented somehow, although I think this step could be saved until after
the milestone. Under the hood, though, initialize to zero.




On Thu, Jul 18, 2019 at 7:26 PM Virendra singh Rajpurohit <
[hidden email]> wrote:


Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Alex Herbert


> On 19 Jul 2019, at 20:57, Eric Barnhill <[hidden email]> wrote:
>
> Hi Virendra,
>
> I think that's right in terms of initialization. If it is initialized to
> NaN, then accumulation will require an additional step to get rid of the
> NaN. Just initialize to zero.

+1

Initialisation with zero will allow the accumulating function to be free of checks.


>
> I just looked around, and it's pretty clear that it is best practice to
> return NaN in the edge case of an average of no values. That is what
> happens in Python when calling numpy.mean([]) and in R when calling
> mean(numeric(0)), and it is also mathematically right.

+1

This is in line with other libraries. It is also in line with Java, which throws an ArithmeticException for the integer division 0 / 0 and returns NaN for the floating-point division 0.0 / 0.0.
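
Both behaviours of Java division are easy to confirm. In this small sketch (the class and method names are illustrative only), a floating-point sum divided by a zero count gives NaN, while the integer division 0 / 0 throws ArithmeticException:

```java
class DivisionByZeroDemo {
    // Mean of no values computed in floating point: 0.0 / 0 gives NaN
    static double floatingMeanOfNothing() {
        double sum = 0.0;
        long n = 0;
        return sum / n; // n is promoted to double, so this is 0.0 / 0.0
    }

    // Integer 0 / 0 throws instead of producing a value
    static boolean integerDivisionThrows() {
        int sum = 0;
        int n = 0;
        try {
            int unused = sum / n; // throws ArithmeticException ("/ by zero")
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(floatingMeanOfNothing()); // NaN
        System.out.println(integerDivisionThrows()); // true
    }
}
```
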

>
> So a check for zero values, returning NaN in that case, should probably be
> implemented somehow, although I think this step could be saved until after
> the milestone. Under the hood, though, initialize to zero.

The code just needs to move the logic that checks whether there are any values (count > 0) into the getMean() method and return NaN when there are none. This should be added to the contract of Mean by documenting it in the Javadoc and adding a test to ensure it works.
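
A minimal sketch of that suggestion, assuming a hypothetical `SimpleMean` class rather than the actual Commons Statistics `Mean`: the accumulator starts at zero so `accept()` has no checks, the `count > 0` test lives only in `getMean()`, the NaN result is stated in the Javadoc as part of the contract, and `main` performs the kind of check a unit test would make:

```java
/**
 * Hypothetical running mean used to illustrate this thread;
 * not the Commons Statistics API.
 */
class SimpleMean {
    /** Number of values added. */
    private long count;
    /** Running mean, initialised to zero so that accept() needs no checks. */
    private double m;

    /**
     * Adds a value to the mean.
     *
     * @param x the value
     */
    void accept(double x) {
        count++;
        m += (x - m) / count;
    }

    /**
     * Gets the arithmetic mean of the values added so far.
     *
     * <p>If no values have been added the result is {@code NaN}.
     *
     * @return the mean, or {@code NaN} if no values have been added
     */
    double getMean() {
        // The emptiness check runs once per query, not once per added value
        return count > 0 ? m : Double.NaN;
    }

    public static void main(String[] args) {
        SimpleMean mean = new SimpleMean();
        // Contract: the mean of no values is NaN
        if (!Double.isNaN(mean.getMean())) {
            throw new AssertionError("expected NaN for no values");
        }
        mean.accept(1);
        mean.accept(2);
        mean.accept(3);
        System.out.println(mean.getMean()); // 2.0
    }
}
```

In a real project the check in `main` would live in a unit test so that the documented behaviour is enforced on every build.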



Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Virendra singh Rajpurohit
>
> The code just needs to move the logic that checks whether there are any
> values (count > 0) into the getMean() method and return NaN when there are
> none. This should be added to the contract of Mean by documenting it in the
> Javadoc and adding a test to ensure it works.
>

Hi Alex,
I'm sorry, but I don't understand what you mean by "This should be added to
the *contract* of Mean" here.
Could you please elaborate?
Thanks


Warm Regards
--
*Virendra Singh Rajpurohit*

Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Alex Herbert
On Sat, 20 Jul 2019, 02:00 Virendra singh Rajpurohit, <
[hidden email]> wrote:

> Hi Alex,
> I'm sorry, but I don't understand what you mean by "This should be added to
> the *contract* of Mean" here.
> Could you please elaborate?
> Thanks
>

The 'contract' is the specification of the behaviour of the code. Someone
should not have to read the code to know what it does. The Javadoc should
describe all the behaviour that the user can expect.

