# [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

5 messages

## [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Hi all,

I hope you are all doing well. I had a discussion on Slack with my GSoC mentors about how this variable should be initialized, and I'm posting it to the ML for more opinions.

*Should variables like the mean be initialized with NaN or 0?*

The definitional formula of the mean is

    mean = (sum of values)/n

so for n=0 it is 0/0, which is NaN. However, Java's SummaryStatistics classes (Double, Long & Int) return average=0 for n=0.

As discussed on Slack: "The initialization should not set the initial value to NaN. This is a convenience to make getMean() faster. This is likely to cause fewer problems than NaN when used in downstream computations".

Assigning '0' will make things faster, because the if condition that checks the value of n can be removed from the calculation, while assigning 'NaN' would be more correct. *Alex Herbert* suggested that NaN can be returned from the getMean() method, with an if condition there that checks 'n'; that way we don't check the condition every time a value is added.

What are your opinions about it?

--
*Virendra Singh Rajpurohit*
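To make the two behaviours in the question concrete, here is a minimal sketch that computes the mean of an empty input both ways. The class and method names (`EmptyMeanDemo`, `definitionalMean`, `jdkAverage`) are made up for illustration; only `java.util.DoubleSummaryStatistics` is a real JDK class.

```java
import java.util.DoubleSummaryStatistics;

// Hypothetical demo class, not part of Commons Statistics.
class EmptyMeanDemo {

    // Definitional formula: (sum of values) / n, with no special case for n = 0.
    static double definitionalMean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        // For an empty array this is 0.0 / 0, a floating-point division that yields NaN.
        return sum / values.length;
    }

    // The JDK's own summary statistics, which document 0 as the average of no values.
    static double jdkAverage(double[] values) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (double v : values) {
            stats.accept(v);
        }
        return stats.getAverage();
    }

    public static void main(String[] args) {
        double[] empty = new double[0];
        System.out.println(definitionalMean(empty)); // prints NaN
        System.out.println(jdkAverage(empty));       // prints 0.0
    }
}
```

The question in this thread is which of these two results a Mean statistic should report when no values have been added.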

## Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

Hi Virendra,

I think that's right in terms of initialization. If it is initialized to NaN, then accumulation will require an additional step to get rid of the NaN. Just initialize to zero.

I just looked around and it's pretty clear that it is best practice to return NaN in the edge case of an average of no values. That is what happens in Python when calling numpy.mean([]) and in R when calling mean(c()), and that is also mathematically right.

So, and I think this is a step that could be saved until after the milestone, a check for zero values that returns NaN in that case should probably be implemented somehow. But under the hood, initialize to zero.

On Thu, Jul 18, 2019 at 7:26 PM Virendra singh Rajpurohit <[hidden email]> wrote:

> Hi all,
> Hope you all are doing well, I had a discussion on Slack with my GSoC
> mentors regarding this variable initiation. I'm posting it on ML for more
> opinions.
>
> *Should the variables like mean be initiated with NaN or 0?*
> Because, definitional formula of mean is,
>     mean = (sum of values)/n
>     Hence for n=0 it is 0/0 which is NaN
> But also Java's SummaryStatistics classes(Double, Long & Int) return
> average=0 for n=0.
> As discussed on slack, "The initialization should not set the initial value
> to NaN. This is a convenience to make getMean() faster. This is likely to
> cause fewer problems than NaN when used in downstream computations".
> Assigning '0' will make things faster because if condition to check n value
> will be removed in calculation and assigning 'NaN' will be more correct.
> *Alex Herbert* suggested NaN can be used in getMean() method with if
> condition to check 'n' value, that way we don't check condition everytime a
> value is added.
> What are your opinions about it?
>
> --
> *Virendra Singh Rajpurohit*
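The point about NaN needing "an additional step" during accumulation can be illustrated with a small sketch. It assumes an incremental update of the form `mean += (x - mean) / n`; the thread does not say which update rule the actual code uses, and the names `NanInitDemo` and `runningMean` are hypothetical.

```java
// Hypothetical demo class, not part of Commons Statistics.
class NanInitDemo {

    // Incremental mean with a caller-supplied initial value and no checks in the loop.
    static double runningMean(double initial, double[] values) {
        double mean = initial;
        long n = 0;
        for (double x : values) {
            n++;
            // Any arithmetic involving NaN yields NaN, so a NaN start never recovers.
            mean += (x - mean) / n;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(runningMean(0.0, data));        // prints 2.0
        System.out.println(runningMean(Double.NaN, data)); // prints NaN
    }
}
```

With a zero start the first update overwrites the initial value, so the loop needs no branch. With a NaN start, the accumulating code would need an explicit first-value check to produce a usable result.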

## Re: [GSoC][Commons][Statistics][Descriptive] Mean should be initiated with 0 or NaN ?

> On 19 Jul 2019, at 20:57, Eric Barnhill <[hidden email]> wrote:
>
> Hi Virendra,
>
> I think that's right in terms of initialization. If it is initialized to
> NaN then accumulation will require an additional step getting rid of the
> NaN. Just initialize to zero.

+1

Initialisation with zero will allow the accumulating function to be free of checks.

> I just looked around and it's pretty clear that it is best practice to
> return NaN in the edge case of an average of no values. That is what
> happens in Python when calling numpy.mean([]) and in R when calling
> mean(c()), and that is also mathematically right.

+1

This is in line with other libraries. It is also in line with Java, which will throw an ArithmeticException for 0 / 0 and return NaN for 0.0 / 0.0.

> So, and I think this is a step that could be saved until after the
> milestone, a check for zero values and returning NaN in that case should
> probably be somehow implemented. But in terms of under the hood initialize
> to zero.

The code just needs to move the logic for checking if there are any values (count > 0) into the getMean() method and return appropriately. This should be added to the contract of Mean by putting it into the Javadoc and adding a test to ensure that it works.
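The approach the thread converges on could look roughly like the sketch below: zero initialization, no checks while accumulating, and the `count > 0` check moved into `getMean()` with the NaN case stated in the Javadoc. The class `RunningMean`, its method names, and the incremental update rule are assumptions for illustration; this is not the Commons Statistics implementation.

```java
// Hypothetical accumulator, not the Commons Statistics API.
class RunningMean {

    /** Number of values added so far. */
    private long n;

    /** Running mean; starts at 0.0 so that add() needs no special case. */
    private double mean;

    /** Adds a value. The update has no check for the empty state. */
    void add(double value) {
        n++;
        mean += (value - mean) / n;
    }

    /**
     * Returns the mean of the values added so far.
     *
     * @return the mean, or {@code Double.NaN} if no values have been added
     */
    double getMean() {
        // The only check for the empty state lives here, not in add().
        return n > 0 ? mean : Double.NaN;
    }

    public static void main(String[] args) {
        RunningMean m = new RunningMean();
        System.out.println(m.getMean()); // prints NaN
        m.add(1.0);
        m.add(2.0);
        m.add(3.0);
        System.out.println(m.getMean()); // prints 2.0
    }
}
```

A unit test for the documented contract would assert `Double.isNaN(new RunningMean().getMean())` before any value is added, which is the test the last message asks for.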