[Statistics] Port codes from Commons Math


Re: Github usage

Gilles Sadowski

Thanks for all the suggestions.
I see that there are things to try before bothering INFRA. ;-)

Regards,
Gilles

On Thu, 15 Mar 2018 14:03:35 -0400, Otto Fowler wrote:

>
> https://github.com/apache/metron/tree/master/dev-utilities/committer-utils
> if you just want the bash
>
>
> On March 15, 2018 at 13:51:00, ajs6f ([hidden email]) wrote:
>
> One gotcha that has bitten me before: if the PR isn't rebased over the
> current master (assuming you are merging into master), it may still be
> mergeable because there may not be any conflicts. (E.g. maybe no one has
> worked on that section of the codebase since the PR's branch was
> branched.)
>
> But if you merge without rebasing, Apache's mirroring won't realize that
> the PR should be closed (as I understand it, because the commits will have
> different hashes since they are diffs between different places on the
> tree). So it is best to rebase if needed, but if you forget and this
> happens to you, you can still rebase and force-push the PR branch, and
> then Apache's mirroring will catch up and close the PR "posthumously". Or
> of course you can always close it manually on Github.
>
> I make this mistake about once a month or so. :(
>
> ajs6f
>
>> On Mar 14, 2018, at 12:27 PM, Matt Sicker <[hidden email]> wrote:
>>
>> When you have a GitHub origin, you can fetch the ref pull/42/head to
>> check out PR#42. You can pull/merge from that branch as well to merge the
>> PR (by committing and pushing that merge, GitHub will notice and mark the
>> PR as merged). You can also use the "hub" command line tool that GitHub
>> publishes, which adds a bunch of convenience commands to do the same thing.
>>
>> On 14 March 2018 at 10:19, Gilles <[hidden email]>
>> wrote:
>>
>>> Hi.
>>>
>>> On Wed, 14 Mar 2018 14:16:42 +0000, Otto Fowler wrote:
>>>
>>>> I should be more specific, this is for looking at github pr’s.
>>>> So if your submitters are forking, submitting prs on github.
>>>>
>>>> We also have scripts for committing, but we are doing git -> github
>>>> mirror
>>>>
>>>
>>> My knowledge of "git" is small; my knowledge of GitHub smaller
>>> (and zero for functionalities that require being logged in). :-}
>>>
>>> Assuming a "git" repository (where "origin" is on an Apache server)
>>> with a local "clone" (i.e. on my machine), is it possible to create
>>> a branch, say "gimo_work", such that
>>>
>>> $ git checkout gimo_work
>>> $ git ... ? ... (equivalent to "pull" wrt "origin")
>>>
>>> will retrieve the latest Gimo's commits on the fork made
>>> from the Apache repository?
>>>
>>> Gilles
>>>
>>> On March 14, 2018 at 10:15:04, Otto Fowler
>>> ([hidden email])
>>>> wrote:
>>>>
>>>> We have a script to help reviewers check out PRs in git, either in
>>>> their own repo or just in ~/tmp or something, into a new repo.
>>>>
>>>> So, I would run:
>>>>
>>>> checkout-pr 999
>>>>
>>>> in the tmp directory, and end up with a local version that I can then
>>>> build and do whatever with.
>>>> Would that help?
>>>>
>>>>
>>>> On March 14, 2018 at 10:08:47, Gilles
>>>> ([hidden email])
>>>> wrote:
>>>>
>>>> On Tue, 13 Mar 2018 11:43:17 -0400, ajs6f wrote:
>>>>
>>>>> On Mar 13, 2018, at 11:20 AM, Gilles
>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> I didn't find it very easy to cooperate with developers who fork
>>>>>> on
>>>>>> GitHub and submit PRs. I've now found the "git" command that
>>>>>> creates
> a
>>>>>> branch from a PR, but it would be so much more comfortable to
>>>>>> just
>>>>>> switch directory and do "git pull".
>>>>>>
>>>>>>
>>>>> Just as a point of information, it is possible to reverse the
>>>>> Github
>>>>> <- Apache mirroring most projects use to be Github -> Apache.
>>>>>
>>>>
>>>> It seems that a good-enough-for-me solution would be to "clone"
>>>> (on my local system) the repository forked by the GSoC
>>>> participant.
>>>>
>>>> Does it make sense?
>>>>
>>>> Thanks,
>>>> Gilles
>>>>
>>>>> What that means is that merging PRs from Github becomes one click in
>>>>> the Github UI.
>>>>>
>>>>> There are other consequences, of course, especially related to
>>>>> other
>>>>> integrations Commons may be using (e.g. integration between
>>>>> Github
>>>>> and
>>>>> JIRA).
>>>>>
>>>>> Of course, INFRA are the folks to talk to if this sounds
>>>>> interesting.
>>>>> At Apache Jena, we looked into it but have taken no action
>>>>> because we
>>>>> still have some open questions about when some of our workflow
>>>>> integrations will become possible with "reversed mirroring".
>>>>>
>>>>> Adam Soroka ; [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
In reply to this post by Gilles Sadowski
Hi devs,

Sorry for the delayed reply due to my academics.


> If you want to start playing with the code, we could just begin
> by having discussions here (on design) and on JIRA (for processing
> minor issues) based on the current state of your repository.
> [What's the link to look it up?]
>

Should I create my own repo and start coding there? [Not in the forked repo]

Actually, it would be more helpful to me if someone [@Gilles or @Eric] could
guide me more, for example by giving me some minor issues in the current
implementation to solve, or a new feature to implement. We could then
gradually go deeper, and eventually I could go further on my own. That way I
can gradually become familiar with the code, and I think it is the most
efficient way to learn the design architecture. [I spent hours trying to
understand the current code base and felt that was not as efficient as I had
thought.]

Also, is there a proposal format specific to the ASF? If not, what should I
basically mention in the proposal?

Best Regards,




On 14 March 2018 at 19:07, Gilles <[hidden email]> wrote:

> Hi.
>
> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>
>> Hello Devs,
>>
>> Thanks Gilles and Eric for guidance.
>>
>> I have cloned the Commons repos and forked the Common's Stat repo. Is it
>> possible to make pull requests to that repo to be reviewed?
>>
>
> That's certainly possible, but I'm afraid that it will become
> quite unwieldy from my side if I have to delete/create branches
> for every PR.
>
> If you want to start playing with the code, we could just begin
> by having discussions here (on design) and on JIRA (for processing
> minor issues) based on the current state of your repository.
> [What's the link to look it up?]
>
> Or should I
>> follow a specific method?
>>
>
> I'll inquire about a more efficient method (than the above)...
>
> By referring the API docs I got some idea of the separation of modules.
>>
>> In the current Commons Statistics repo there are some classes under the
>> package "distribution". I think those can be refactored using the Java 8
>> built-in statistics functionality. Please correct me if I am wrong.
>>
>
> An example perhaps?
>
> As Eric said separation of function and streaming implementations is good
>> idea as designing. (In my point of view, it means method overloading ->
>> Again correct me if I didn't understand your fact correctly)
>>
>
> ?
>
> And I will share my draft proposal here for your review soon.
>>
>
> OK.
>
> Thanks again for your interest,
> Gilles
>
>
>
>> Best Regards.
>>
>> On 13 March 2018 at 20:50, Gilles <[hidden email]> wrote:
>>
>> Hello.
>>>
>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>
>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles <[hidden email]>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>> Where can we find the old code before port into new Commons components?
>>>>>>
>>>>>>
>>>>>> The code bases are managed by the "git" software; the whole history is
>>>>> available:
>>>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>>>
>>>>> [I'd advise to "clone" the repositories on your local computer, and
>>>>> use the command line tools.]
>>>>>
>>>>>
>>>>
>>>> I believe you will want to clone the commons-math repositories, but then
>>>> develop your own "fork" of the commons-statistics repository. Gilles can
>>>> correct me if that is wrong.
>>>>
>>>>
>>> Actually, I know only my workflow:
>>>  $ git clone ...
>>>  $ git branch ...
>>>  $ git commit ...
>>>  $ git push
>>>
>>> :-}
>>>
>>> I didn't find it very easy to cooperate with developers who
>>> fork on GitHub and submit PRs.
>>> I've now found the "git" command that creates a branch from
>>> a PR, but it would be so much more comfortable to just switch
>>> directory and do "git pull".
>>>
>>> In the context of GSoC, would it be possible to grant some
>>> privilege to non-committers so that they can update a selected
>>> "git" repository?
>>> If not, what is the next easiest way to share a "common space"
>>> (aka "sandbox") from which it would be easy to copy reviewed
>>> bits over to the official source repository?
>>>
>>>
>>> As
>>>>>
>>>>> you mentioned it will be a good approach to redesign process.
>>>>>>
>>>>>>
>>>>>> You don't necessarily need to analyze how the code was before
>>>>> the port/refactoring; looking at how it is now is sufficient,
>>>>> unless you suspect that something is wrong now and might have
>>>>> been better before. ;-)
>>>>>
>>>>>
>>>>> In particular, the statistics library was designed before Java 8. Java
>>>> 8
>>>> however has provided both efficient programming strategies for these
>>>> statistical methods (in the form of lambdas and streams) as well as some
>>>> built-in methods providing summary statistics functions (see discussion
>>>> at
>>>> http://markmail.org/message/7t2mjaprsuvb3waj).
>>>>
>>>>
>>> Very good point, indeed.
>>> IMO, the new component should target Java 8.
>>> Even Java 9 (enforcing modularity with JPMS): if by the time we think
>>> of releasing the code, we still want to avoid "multi-release" JARs it
>>> will be easy to just remove the "module-info" files (I don't think much
>>> else Java 9 specific would be used by "Commons Statistics").
>>>
>>> In fact, given the very slow pace at which new components are being
>>> brought to releasable state, I'd like to ask whether it would be OK
>>> to make "incremental" releases?  That would mean: focus on (maven)
>>> modules that seem close to feature-complete and bug-free, fix the
>>> remaining issues and perform a release with that module added.
>>>
>>> It seems that the expectations were set too high (content-wise given
>>> the amount of human resources), so that neither CM can be released
>>> (too many non-fixed issues) nor its "Commons Numbers" spin-off that
>>> contains many modules, some of which are blocked by lack of consensus
>>> or dangling discussions.
>>>
>>> It probably makes sense, as a design strategy, to separate the function
>>>
>>>> implementation from the streaming implementation. For example, a 2D
>>>> integer
>>>> array will probably require a different streaming implementation than a
>>>> 1D
>>>> double array, but they can  probably both be passed the same function
>>>> handle to collect, say, the mean or max value.
>>>>
>>>> The role of commons might then be to provide a convenient interface, so
>>>> that the user can simply call a static method like SummaryStats.mean()
>>>> and
>>>> not have to worry about the implementation.
>>>>
>>>> The other difficulty I see, is that quantile and median statistics will
>>>> not
>>>> be as easy to stream as statistics with a closed-form solution like mean
>>>> or
>>>> variance. There may however be great algorithms out there for pulling
>>>> the
>>>> median or the 95% quantile out of a stream -- if so they should be used.
>>>>
>>>> Eric
>>>>
>>>>
>>> Eric,
>>>
>>> Would you be the official "mentor" for the GSoC participants that
>>> are interested in helping with the porting of "o.a.c.math4.stat"?
>>>
>>> Thank you,
>>> Gilles
>>>
>>
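For reference, the built-in Java 8 summary statistics mentioned in the quoted
discussion can be tried with a few lines. This is only a minimal sketch: the
class name BuiltInSummaryDemo and the sample values are made up for
illustration and are not part of Commons Statistics.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical demo class; only the java.util classes are real JDK API.
final class BuiltInSummaryDemo {
    private BuiltInSummaryDemo() {}

    /** Collects count, sum, min, max and average in a single pass. */
    static DoubleSummaryStatistics summarize(double[] values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] sample = {2.0, 4.0, 4.0, 6.0}; // made-up data
        DoubleSummaryStatistics stats = summarize(sample);
        System.out.println("n    = " + stats.getCount());
        System.out.println("mean = " + stats.getAverage());
        System.out.println("min  = " + stats.getMin());
        System.out.println("max  = " + stats.getMax());
    }
}
```

Note that java.util.DoubleSummaryStatistics only covers count, sum, min, max
and average; it offers no variance or quantiles, so a statistics component
would still have to provide those.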

Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
Hi.

On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:

> Hi devs,
>
> Sorry for the delayed reply due to my academics.
>
>
>> If you want to start playing with the code, we could just begin
>> by having discussions here (on design) and on JIRA (for processing
>> minor issues) based on the current state of your repository.
>> [What's the link to look it up?]
>>
>
> Should I create my own repo and start code in there?[Not in the
> forked repo]

What's the difference?  IOW, someone else should answer. :-}

> Actually it will be more helpful to me if someone [ @Gilles or @Eric ]
> can guide me more. Like, to give me some minor issues in the current
> implementation to solve or as a new feature implementation and
> gradually we can go for deeper

IMO, the top priority would be to release "Commons Numbers":
   http://commons.apache.org/proper/commons-numbers/

There are some blocking issues on JIRA:
   https://issues.apache.org/jira/projects/NUMBERS

> and eventually I can go further on my own.  Then I
> can gradually become familiar with the code, and I think it is the most
> efficient way to learn the design architecture. [I spent hours trying to
> understand the current code base and felt that was not as efficient as
> I had thought.]

Refactoring the package "stat" is not straightforward...
However, to get to that, it would be useful to record your thoughts
as you browse through the code(s): what seems easy to port, what should
be changed/fixed, what you don't understand, and so on.

>
> Also, is there a proposal format specific to the ASF?

I don't think so.  This ML is the forum where project directions
are discussed.

> If not what should I
> mention in the proposal basically?

This can be a work in progress, I think (see above suggestions).

Best regards,
Gilles

>
> Best Regards,
>
>
>
>
> On 14 March 2018 at 19:07, Gilles <[hidden email]>
> wrote:
>
>> Hi.
>>
>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>
>>> Hello Devs,
>>>
>>> Thanks Gilles and Eric for guidance.
>>>
>>> I have cloned the Commons repos and forked the Common's Stat repo.
>>> Is it
>>> possible to make pull requests to that repo to be reviewed?
>>>
>>
>> That's certainly possible, but I'm afraid that it will become
>> quite unwieldy from my side if I have to delete/create branches
>> for every PR.
>>
>> If you want to start playing with the code, we could just begin
>> by having discussions here (on design) and on JIRA (for processing
>> minor issues) based on the current state of your repository.
>> [What's the link to look it up?]
>>
>> Or should I
>>> follow a specific method?
>>>
>>
>> I'll inquire about a more efficient method (than the above)...
>>
>> By referring the API docs I got some idea of the separation of
>> modules.
>>>
>>> In the current Commons's stat repo there are some classes under the
>>> package  distribution. I think those can be refactored using java 8
>>> in
>>> build statistics functionalities. Please correct me if I wrong.
>>>
>>
>> An example perhaps?
>>
>> As Eric said separation of function and streaming implementations is
>> good
>>> idea as designing. (In my point of view, it means method
>>> overloading ->
>>> Again correct me if I didn't understand your fact correctly)
>>>
>>
>> ?
>>
>> And I will share my draft proposal here for your review soon.
>>>
>>
>> OK.
>>
>> Thanks again for your interest,
>> Gilles
>>
>>
>>
>>> Best Regards.
>>>
>>> On 13 March 2018 at 20:50, Gilles <[hidden email]>
>>> wrote:
>>>
>>> Hello.
>>>>
>>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>>
>>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles
>>>> <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Where can we find the old code before port into new Commons
>>>>>> components?
>>>>>>>
>>>>>>>
>>>>>>> The code bases are managed by the "git" software; the whole
>>>>>>> history is
>>>>>> available:
>>>>>>  
>>>>>> https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>>>>
>>>>>> [I'd advise to "clone" the repositories on your local computer,
>>>>>> and
>>>>>> use the command line tools.]
>>>>>>
>>>>>>
>>>>>
>>>>> I believe you will want to clone the commons-math repositories,
>>>>> but then
>>>>> develop your own "fork" of the commons-statistics repository.
>>>>> Gilles can
>>>>> correct me if that is wrong.
>>>>>
>>>>>
>>>> Actually, I know only my workflow:
>>>>  $ git clone ...
>>>>  $ git branch ...
>>>>  $ git commit ...
>>>>  $ git push
>>>>
>>>> :-}
>>>>
>>>> I didn't find it very easy to cooperate with developers who
>>>> fork on GitHub and submit PRs.
>>>> I've now found the "git" command that creates a branch from
>>>> a PR, but it would be so much more comfortable to just switch
>>>> directory and do "git pull".
>>>>
>>>> In the context of GSoC, would it be possible to grant some
>>>> privilege to non-committers so that they can update a selected
>>>> "git" repository?
>>>> If not, what is the next easiest way to share a "common space"
>>>> (aka "sandbox") from which it would be easy to copy reviewed
>>>> bits over to the official source repository?
>>>>
>>>>
>>>> As
>>>>>>
>>>>>> you mentioned it will be a good approach to redesign process.
>>>>>>>
>>>>>>>
>>>>>>> You don't necessarily need to analyze how the code was before
>>>>>> the port/refactoring; looking at how it is now is sufficient,
>>>>>> unless you suspect that something is wrong now and might have
>>>>>> been better before. ;-)
>>>>>>
>>>>>>
>>>>>> In particular, the statistics library was designed before Java
>>>>>> 8. Java
>>>>> 8
>>>>> however has provided both efficient programming strategies for
>>>>> these
>>>>> statistical methods (in the form of lambdas and streams) as well
>>>>> as some
>>>>> built-in methods providing summary statistics functions (see
>>>>> discussion
>>>>> at
>>>>> http://markmail.org/message/7t2mjaprsuvb3waj).
>>>>>
>>>>>
>>>> Very good point, indeed.
>>>> IMO, the new component should target Java 8.
>>>> Even Java 9 (enforcing modularity with JPMS): if by the time we
>>>> think
>>>> of releasing the code, we still want to avoid "multi-release" JARs
>>>> it
>>>> will be easy to just remove the "module-info" files (I don't think
>>>> much
>>>> else Java 9 specific would be used by "Commons Statistics").
>>>>
>>>> In fact, given the very slow pace at which new components are
>>>> being
>>>> brought to releasable state, I'd like to ask whether it would be
>>>> OK
>>>> to make "incremental" releases?  That would mean: focus on (maven)
>>>> modules that seem close to feature-complete and bug-free, fix the
>>>> remaining issues and perform a release with that module added.
>>>>
>>>> It seems that the expectations were set too high (content-wise
>>>> given
>>>> the amount of human resources), so that neither CM can be released
>>>> (too many non-fixed issues) nor its "Commons Numbers" spin-off
>>>> that
>>>> contains many modules, some of which are blocked by lack of
>>>> consensus
>>>> or dangling discussions.
>>>>
>>>> It probably makes sense, as a design strategy, to separate the
>>>> function
>>>>
>>>>> implementation from the streaming implementation. For example, a
>>>>> 2D
>>>>> integer
>>>>> array will probably require a different streaming implementation
>>>>> than a
>>>>> 1D
>>>>> double array, but they can  probably both be passed the same
>>>>> function
>>>>> handle to collect, say, the mean or max value.
>>>>>
>>>>> The role of commons might then be to provide a convenient
>>>>> interface, so
>>>>> that the user can simply call a static method like
>>>>> SummaryStats.mean()
>>>>> and
>>>>> not have to worry about the implementation.
>>>>>
>>>>> The other difficulty I see, is that quantile and median
>>>>> statistics will
>>>>> not
>>>>> be as easy to stream as statistics with a closed-form solution
>>>>> like mean
>>>>> or
>>>>> variance. There may however be great algorithms out there for
>>>>> pulling
>>>>> the
>>>>> median or the 95% quantile out of a stream -- if so they should
>>>>> be used.
>>>>>
>>>>> Eric
>>>>>
>>>>>
>>>> Eric,
>>>>
>>>> Would you be the official "mentor" for the GSoC participants that
>>>> are interested in helping with the porting of "o.a.c.math4.stat"?
>>>>
>>>> Thank you,
>>>> Gilles
>>>>
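Eric's idea of separating the function implementation from the streaming
implementation, quoted above, could look roughly like the sketch below.
Everything here is hypothetical (SummaryStatsSketch, MEAN, MAX and the
stream(...) adapters are invented names, not an existing Commons API): the
statistic is defined once on a DoubleStream, each data layout gets its own
adapter, and a static facade hides both.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

// Hypothetical facade, loosely modelled on the "SummaryStats.mean()" idea.
final class SummaryStatsSketch {
    private SummaryStatsSketch() {}

    // "Function" side: each statistic is written once, against a DoubleStream.
    static final ToDoubleFunction<DoubleStream> MEAN =
        s -> s.average().orElse(Double.NaN);
    static final ToDoubleFunction<DoubleStream> MAX =
        s -> s.max().orElse(Double.NaN);

    // "Streaming" side: one adapter per data layout.
    static DoubleStream stream(double[] values) {
        return DoubleStream.of(values);
    }

    static DoubleStream stream(int[][] values) {
        // Flatten the rows, then widen each int to double.
        return Arrays.stream(values)
                     .flatMapToInt(Arrays::stream)
                     .asDoubleStream();
    }

    // Facade: callers never see the stream or the function handle.
    static double mean(double[] values) {
        return MEAN.applyAsDouble(stream(values));
    }

    static double mean(int[][] values) {
        return MEAN.applyAsDouble(stream(values));
    }

    static double max(int[][] values) {
        return MAX.applyAsDouble(stream(values));
    }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {1, 2, 3, 4}));
        System.out.println(mean(new int[][] {{1, 2}, {3, 4, 5}}));
        System.out.println(max(new int[][] {{1, 2}, {3, 4, 5}}));
    }
}
```

The same MEAN handle serves both the 1D double array and the 2D integer
array; only the adapter differs. Empty input yields NaN here, which is an
arbitrary choice made for the sketch.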



Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hi,

I have just shared my draft proposal for GSoC, "Port Codes from Commons Math":
<https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit>
Devs, would you please review it? I always welcome your valuable
suggestions to improve it.

Best Regards,
Gimhana

On 17 March 2018 at 05:06, Gilles <[hidden email]> wrote:

> Hi.
>
> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>
>> Hi devs,
>>
>> Sorry for the delayed reply due to my academics.
>>
>>
>> If you want to start playing with the code, we could just begin
>>> by having discussions here (on design) and on JIRA (for processing
>>> minor issues) based on the current state of your repository.
>>> [What's the link to look it up?]
>>>
>>>
>> Should I create my own repo and start code in there?[Not in the forked
>> repo]
>>
>
> What's the difference?  IOW, someone else should answer. :-}
>
> Actually it will be more helpful to me if someone [ @Gilles or @Eric ] can
>> guide me more. Like, to give me some minor issues in the current
>> implementation to solve or as a new feature implementation and gradually
>> we
>> can go for deeper
>>
>
> IMO, the top priority would be to release "Commons Numbers":
>   http://commons.apache.org/proper/commons-numbers/
>
> There are some blocking issues on JIRA:
>   https://issues.apache.org/jira/projects/NUMBERS
>
>> and eventually I can go further on my own.  Then I
>> can gradually become familiar with the code, and I think it is the most
>> efficient way to learn the design architecture. [I spent hours trying to
>> understand the current code base and felt that was not as efficient as
>> I had thought.]
>>
>
> Refactoring the package "stat" is not straightforward...
> However, to get to that, it would be useful to record your thoughts
> as you browse through the code(s): what seems easy to port, what should
> be changed/fixed, what you don't understand, and so on.
>
>
>> And if there is a format of Proposal regarding ASF ?
>>
>
> I don't think so.  This ML is the forum where project directions
> are discussed.
>
> If not what should I
>> mention in the proposal basically?
>>
>
> This can be a work in progress, I think (see above suggestions).
>
> Best regards,
> Gilles
>
>
>
>> Best Regards,
>>
>>
>>
>>
>> On 14 March 2018 at 19:07, Gilles <[hidden email]> wrote:
>>
>> Hi.
>>>
>>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>>
>>> Hello Devs,
>>>>
>>>> Thanks Gilles and Eric for guidance.
>>>>
>>>> I have cloned the Commons repos and forked the Common's Stat repo. Is it
>>>> possible to make pull requests to that repo to be reviewed?
>>>>
>>>>
>>> That's certainly possible, but I'm afraid that it will become
>>> quite unwieldy from my side if I have to delete/create branches
>>> for every PR.
>>>
>>> If you want to start playing with the code, we could just begin
>>> by having discussions here (on design) and on JIRA (for processing
>>> minor issues) based on the current state of your repository.
>>> [What's the link to look it up?]
>>>
>>> Or should I
>>>
>>>> follow a specific method?
>>>>
>>>>
>>> I'll inquire about a more efficient method (than the above)...
>>>
>>> By referring the API docs I got some idea of the separation of modules.
>>>
>>>>
>>>> In the current Commons's stat repo there are some classes under the
>>>> package  distribution. I think those can be refactored using java 8 in
>>>> build statistics functionalities. Please correct me if I wrong.
>>>>
>>>>
>>> An example perhaps?
>>>
>>> As Eric said separation of function and streaming implementations is good
>>>
>>>> idea as designing. (In my point of view, it means method overloading ->
>>>> Again correct me if I didn't understand your fact correctly)
>>>>
>>>>
>>> ?
>>>
>>> And I will share my draft proposal here for your review soon.
>>>
>>>>
>>>>
>>> OK.
>>>
>>> Thanks again for your interest,
>>> Gilles
>>>
>>>
>>>
>>> Best Regards.
>>>>
>>>> On 13 March 2018 at 20:50, Gilles <[hidden email]> wrote:
>>>>
>>>> Hello.
>>>>
>>>>>
>>>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>>>
>>>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles <[hidden email]
>>>>> >
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Where can we find the old code before port into new Commons
>>>>>>> components?
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The code bases are managed by the "git" software; the whole history
>>>>>>>> is
>>>>>>>>
>>>>>>> available:
>>>>>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>>>>>
>>>>>>> [I'd advise to "clone" the repositories on your local computer, and
>>>>>>> use the command line tools.]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I believe you will want to clone the commons-math repositories, but
>>>>>> then
>>>>>> develop your own "fork" of the commons-statistics repository. Gilles
>>>>>> can
>>>>>> correct me if that is wrong.
>>>>>>
>>>>>>
>>>>>> Actually, I know only my workflow:
>>>>>  $ git clone ...
>>>>>  $ git branch ...
>>>>>  $ git commit ...
>>>>>  $ git push
>>>>>
>>>>> :-}
>>>>>
>>>>> I didn't find it very easy to cooperate with developers who
>>>>> fork on GitHub and submit PRs.
>>>>> I've now found the "git" command that creates a branch from
>>>>> a PR, but it would be so much more comfortable to just switch
>>>>> directory and do "git pull".
>>>>>
>>>>> In the context of GSoC, would it be possible to grant some
>>>>> privilege to non-committers so that they can update a selected
>>>>> "git" repository?
>>>>> If not, what is the next easiest way to share a "common space"
>>>>> (aka "sandbox") from which it would be easy to copy reviewed
>>>>> bits over to the official source repository?
>>>>>
>>>>>
>>>>> As
>>>>>
>>>>>>
>>>>>>> you mentioned it will be a good approach to redesign process.
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> You don't necessarily need to analyze how the code was before
>>>>>>>>
>>>>>>> the port/refactoring; looking at how it is now is sufficient,
>>>>>>> unless you suspect that something is wrong now and might have
>>>>>>> been better before. ;-)
>>>>>>>
>>>>>>>
>>>>>>> In particular, the statistics library was designed before Java 8.
>>>>>>> Java
>>>>>>>
>>>>>> 8
>>>>>> however has provided both efficient programming strategies for these
>>>>>> statistical methods (in the form of lambdas and streams) as well as
>>>>>> some
>>>>>> built-in methods providing summary statistics functions (see
>>>>>> discussion
>>>>>> at
>>>>>> http://markmail.org/message/7t2mjaprsuvb3waj).
>>>>>>
>>>>>>
>>>>>> Very good point, indeed.
>>>>> IMO, the new component should target Java 8.
>>>>> Even Java 9 (enforcing modularity with JPMS): if by the time we think
>>>>> of releasing the code, we still want to avoid "multi-release" JARs it
>>>>> will be easy to just remove the "module-info" files (I don't think much
>>>>> else Java 9 specific would be used by "Commons Statistics").
>>>>>
>>>>> In fact, given the very slow pace at which new components are being
>>>>> brought to releasable state, I'd like to ask whether it would be OK
>>>>> to make "incremental" releases?  That would mean: focus on (maven)
>>>>> modules that seem close to feature-complete and bug-free, fix the
>>>>> remaining issues and perform a release with that module added.
>>>>>
>>>>> It seems that the expectations were set too high (content-wise given
>>>>> the amount of human resources), so that neither CM can be released
>>>>> (too many non-fixed issues) nor its "Commons Numbers" spin-off that
>>>>> contains many modules, some of which are blocked by lack of consensus
>>>>> or dangling discussions.
>>>>>
>>>>> It probably makes sense, as a design strategy, to separate the function
>>>>>
>>>>> implementation from the streaming implementation. For example, a 2D
>>>>>> integer
>>>>>> array will probably require a different streaming implementation than
>>>>>> a
>>>>>> 1D
>>>>>> double array, but they can  probably both be passed the same function
>>>>>> handle to collect, say, the mean or max value.
>>>>>>
>>>>>> The role of commons might then be to provide a convenient interface,
>>>>>> so
>>>>>> that the user can simply call a static method like SummaryStats.mean()
>>>>>> and
>>>>>> not have to worry about the implementation.
>>>>>>
>>>>>> The other difficulty I see, is that quantile and median statistics
>>>>>> will
>>>>>> not
>>>>>> be as easy to stream as statistics with a closed-form solution like
>>>>>> mean
>>>>>> or
>>>>>> variance. There may however be great algorithms out there for pulling
>>>>>> the
>>>>>> median or the 95% quantile out of a stream -- if so they should be
>>>>>> used.
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>>
>>>>>> Eric,
>>>>>
>>>>> Would you be the official "mentor" for the GSoC participants that
>>>>> are interested in helping with the porting of "o.a.c.math4.stat"?
>>>>>
>>>>> Thank you,
>>>>> Gilles
>>>>>
>>>>>
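The quoted remark that mean and variance are easier to stream than the median
can be illustrated as follows. This is a sketch under assumptions, not Commons
code: StreamingMoments is a made-up name, the running update is Welford's
well-known single-pass method, and the median is computed the plain way, by
sorting a copy of all the values.

```java
import java.util.Arrays;

// Hypothetical class: running mean/variance next to an exact median.
final class StreamingMoments {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    /** Welford's update: folds one value into the running state. */
    void accept(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /** Running mean; NaN if no value has been seen. */
    double mean() {
        return n > 0 ? mean : Double.NaN;
    }

    /** Unbiased sample variance; NaN for fewer than two values. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : Double.NaN;
    }

    /** Exact median: unlike the moments above, it needs all values at once. */
    static double median(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up data
        StreamingMoments moments = new StreamingMoments();
        for (double x : data) {
            moments.accept(x);
        }
        System.out.println("mean     = " + moments.mean());
        System.out.println("variance = " + moments.variance());
        System.out.println("median   = " + median(data));
    }
}
```

The accumulator keeps three numbers no matter how many values it sees, whereas
the exact median needs the whole data set in memory; a streaming quantile
would generally have to be an approximation.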

Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hi,

I have not decided on the timeline yet. I plan to decide it after the design
architecture is confirmed.


Best Regards,
Gimhana.

On 18 March 2018 at 19:17, Gimhana Nadeeshan <
[hidden email]> wrote:

> Hii,
>
> I have just shared my draft proposal for GSoC. Port Codes from Commons
> Math.
> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit>
> Devs, would you please review it and I always welcome your precious
> suggestions to improve it.
>
> Best Regards,
> Gimhana
>
> On 17 March 2018 at 05:06, Gilles <[hidden email]> wrote:
>
>> Hi.
>>
>> On Fri, 16 Mar 2018 23:12:38 +0530, Gimhana Nadeeshan wrote:
>>
>>> Hi devs,
>>>
>>> Sorry for the delayed reply due to my academics.
>>>
>>>
>>> If you want to start playing with the code, we could just begin
>>>> by having discussions here (on design) and on JIRA (for processing
>>>> minor issues) based on the current state of your repository.
>>>> [What's the link to look it up?]
>>>>
>>>>
>>> Should I create my own repo and start code in there?[Not in the forked
>>> repo]
>>>
>>
>> What's the difference?  IOW, someone else should answer. :-}
>>
>> Actually it will be more helpful to me if someone [ @Gilles or @Eric ] can
>>> guide me more. Like, to give me some minor issues in the current
>>> implementation to solve or as a new feature implementation and gradually
>>> we
>>> can go for deeper
>>>
>>
>> IMO, the top priority would be to release "Commons Numbers":
>>   http://commons.apache.org/proper/commons-numbers/
>>
>> There are some blocking issues on JIRA:
>>   https://issues.apache.org/jira/projects/NUMBERS
>>
>>> and eventually I can go further on my own.  Then I
>>> can gradually become familiar with the code, and I think it is the most
>>> efficient way to learn the design architecture. [I spent hours trying to
>>> understand the current code base and felt that was not as efficient as
>>> I had thought.]
>>>
>>
>> Refactoring the package "stat" is not straightforward...
>> However, to get to that, it would be useful to record your thoughts
>> as you browse through the code(s): what seems easy to port, what should
>> be changed/fixed, what you don't understand, and so on.
>>
>>
>>> Also, is there a required proposal format for the ASF?
>>>
>>
>> I don't think so.  This ML is the forum where project directions
>> are discussed.
>>
>>> If not, what should I basically mention in the proposal?
>>>
>>
>> This can be a work in progress, I think (see above suggestions).
>>
>> Best regards,
>> Gilles
>>
>>
>>
>>> Best Regards,
>>>
>>>
>>>
>>>
>>> On 14 March 2018 at 19:07, Gilles <[hidden email]> wrote:
>>>
>>> Hi.
>>>>
>>>> On Tue, 13 Mar 2018 23:37:24 +0530, Gimhana Nadeeshan wrote:
>>>>
>>>> Hello Devs,
>>>>>
>>>>> Thanks Gilles and Eric for guidance.
>>>>>
>>>>> I have cloned the Commons repos and forked the Commons Statistics repo.
>>>>> Is it possible to make pull requests to that repo to be reviewed?
>>>>>
>>>>>
>>>> That's certainly possible, but I'm afraid that it will become
>>>> quite unwieldy from my side if I have to delete/create branches
>>>> for every PR.
>>>>
>>>> If you want to start playing with the code, we could just begin
>>>> by having discussions here (on design) and on JIRA (for processing
>>>> minor issues) based on the current state of your repository.
>>>> [What's the link to look it up?]
>>>>
>>>>> Or should I follow a specific method?
>>>>>
>>>>>
>>>> I'll inquire about a more efficient method (than the above)...
>>>>
>>>>> By referring to the API docs I got some idea of the separation of modules.
>>>>>
>>>>> In the current Commons Statistics repo there are some classes under the
>>>>> package "distribution". I think those can be refactored using the Java 8
>>>>> built-in statistics functionality. Please correct me if I am wrong.
>>>>>
>>>>>
>>>> An example perhaps?
>>>>
>>>>> As Eric said, separating the function and streaming implementations is a
>>>>> good design idea. (As I understand it, this means method overloading;
>>>>> again, correct me if I have misunderstood your point.)
>>>>>
>>>>>
>>>> ?
>>>>
>>>> And I will share my draft proposal here for your review soon.
>>>>
>>>>>
>>>>>
>>>> OK.
>>>>
>>>> Thanks again for your interest,
>>>> Gilles
>>>>
>>>>
>>>>
>>>> Best Regards.
>>>>>
>>>>> On 13 March 2018 at 20:50, Gilles <[hidden email]>
>>>>> wrote:
>>>>>
>>>>> Hello.
>>>>>
>>>>>>
>>>>>> On Tue, 13 Mar 2018 09:25:19 +0100, Eric Barnhill wrote:
>>>>>>
>>>>>> On Tue, Mar 13, 2018 at 12:47 AM, Gilles <
>>>>>> [hidden email]>
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> Where can we find the old code, from before the port into the new
>>>>>>>>> Commons components?
>>>>>>>>
>>>>>>>> The code bases are managed by the "git" software; the whole
>>>>>>>> history is available:
>>>>>>>>   https://git1-us-west.apache.org/repos/asf?p=commons-math.git;a=log
>>>>>>>>
>>>>>>>> [I'd advise to "clone" the repositories on your local computer, and
>>>>>>>> use the command line tools.]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I believe you will want to clone the commons-math repositories, but
>>>>>>> then
>>>>>>> develop your own "fork" of the commons-statistics repository. Gilles
>>>>>>> can
>>>>>>> correct me if that is wrong.
>>>>>>>
>>>>>>>
>>>>>> Actually, I know only my workflow:
>>>>>>  $ git clone ...
>>>>>>  $ git branch ...
>>>>>>  $ git commit ...
>>>>>>  $ git push
>>>>>>
>>>>>> :-}
>>>>>>
>>>>>> I didn't find it very easy to cooperate with developers who
>>>>>> fork on GitHub and submit PRs.
>>>>>> I've now found the "git" command that creates a branch from
>>>>>> a PR, but it would be so much more comfortable to just switch
>>>>>> directory and do "git pull".
>>>>>>
>>>>>> In the context of GSoC, would it be possible to grant some
>>>>>> privilege to non-committers so that they can update a selected
>>>>>> "git" repository?
>>>>>> If not, what is the next easiest way to share a "common space"
>>>>>> (aka "sandbox") from which it would be easy to copy reviewed
>>>>>> bits over to the official source repository?
>>>>>>
>>>>>>
>>>>>>>>> As you mentioned, it will be a good approach to the redesign process.
>>>>>>>>
>>>>>>>> You don't necessarily need to analyze how the code was before
>>>>>>>> the port/refactoring; looking at how it is now is sufficient,
>>>>>>>> unless you suspect that something is wrong now and might have
>>>>>>>> been better before. ;-)
>>>>>>>>
>>>>>>>>
>>>>>>> In particular, the statistics library was designed before Java 8.
>>>>>>> Java 8, however, has provided both efficient programming strategies
>>>>>>> for these statistical methods (in the form of lambdas and streams)
>>>>>>> as well as some built-in methods providing summary statistics
>>>>>>> functions (see discussion at
>>>>>>> http://markmail.org/message/7t2mjaprsuvb3waj).
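To make the "built-in methods" remark concrete, here is a minimal sketch of what the JDK 8 stream API offers out of the box. The class name BuiltInSummaryDemo and the sample data are purely illustrative assumptions, not part of Commons Math or Commons Statistics.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

// Hypothetical demo class; only the java.util types are real JDK 8 API.
final class BuiltInSummaryDemo {
    /** Collects count, sum, min, max and average in a single pass. */
    static DoubleSummaryStatistics summarize(double[] values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = summarize(new double[] {1.0, 2.0, 3.0, 4.0});
        System.out.println("n=" + s.getCount()
            + " mean=" + s.getAverage()
            + " min=" + s.getMin()
            + " max=" + s.getMax());
    }
}
```

Note that DoubleSummaryStatistics stops at count, sum, min, max and average; variance and quantiles are not covered, which is where a statistics component would still have work to do.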
>>>>>>>
>>>>>>>
>>>>>> Very good point, indeed.
>>>>>> IMO, the new component should target Java 8.
>>>>>> Even Java 9 (enforcing modularity with JPMS): if by the time we think
>>>>>> of releasing the code, we still want to avoid "multi-release" JARs, it
>>>>>> will be easy to just remove the "module-info" files (I don't think
>>>>>> much else that is Java 9 specific would be used by "Commons Statistics").
>>>>>>
>>>>>> In fact, given the very slow pace at which new components are being
>>>>>> brought to releasable state, I'd like to ask whether it would be OK
>>>>>> to make "incremental" releases?  That would mean: focus on (maven)
>>>>>> modules that seem close to feature-complete and bug-free, fix the
>>>>>> remaining issues and perform a release with that module added.
>>>>>>
>>>>>> It seems that the expectations were set too high (content-wise given
>>>>>> the amount of human resources), so that neither CM can be released
>>>>>> (too many non-fixed issues) nor its "Commons Numbers" spin-off that
>>>>>> contains many modules, some of which are blocked by lack of consensus
>>>>>> or dangling discussions.
>>>>>>
>>>>>>> It probably makes sense, as a design strategy, to separate the
>>>>>>> function implementation from the streaming implementation. For
>>>>>>> example, a 2D integer array will probably require a different
>>>>>>> streaming implementation than a 1D double array, but they can
>>>>>>> probably both be passed the same function handle to collect, say,
>>>>>>> the mean or max value.
>>>>>>>
>>>>>>> The role of commons might then be to provide a convenient interface,
>>>>>>> so that the user can simply call a static method like
>>>>>>> SummaryStats.mean() and
>>>>>>> not have to worry about the implementation.
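One possible reading of this separation, as a rough sketch only: the statistic is written once as a function on a DoubleStream, and each container type gets its own small adapter that produces the stream. The class SummaryStatsSketch and all of its members are hypothetical names chosen for illustration; nothing here is an existing Commons API.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;
import java.util.stream.DoubleStream;

// Hypothetical facade, loosely modelled on the "SummaryStats.mean()" idea.
final class SummaryStatsSketch {
    // The "function" half: each statistic is defined once, on a stream.
    static final ToDoubleFunction<DoubleStream> MEAN =
        s -> s.average().orElse(Double.NaN);
    static final ToDoubleFunction<DoubleStream> MAX =
        s -> s.max().orElse(Double.NaN);

    // The "streaming" half: one adapter per container type.
    static DoubleStream stream(double[] values) {
        return DoubleStream.of(values);
    }

    static DoubleStream stream(int[][] values) {
        // Flatten the rows, then widen int to double.
        return Arrays.stream(values).flatMapToInt(Arrays::stream).asDoubleStream();
    }

    // Convenience entry points: the caller never sees the stream.
    static double mean(double[] values) { return MEAN.applyAsDouble(stream(values)); }
    static double mean(int[][] values)  { return MEAN.applyAsDouble(stream(values)); }
    static double max(double[] values)  { return MAX.applyAsDouble(stream(values)); }
    static double max(int[][] values)   { return MAX.applyAsDouble(stream(values)); }

    public static void main(String[] args) {
        System.out.println(mean(new double[] {1.0, 2.0, 3.0}));
        System.out.println(mean(new int[][] {{1, 2}, {3, 4}}));
        System.out.println(max(new int[][] {{1, 2}, {3, 4}}));
    }
}
```

In this reading the overloads are only thin wrappers; supporting a new container type means adding one stream adapter, while the statistics themselves stay untouched.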
>>>>>>>
>>>>>>> The other difficulty I see is that quantile and median statistics
>>>>>>> will not be as easy to stream as statistics with a closed-form
>>>>>>> solution like mean or variance. There may however be great
>>>>>>> algorithms out there for pulling the median or the 95% quantile
>>>>>>> out of a stream -- if so they should be
>>>>>>> used.
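As an illustration of why the median is harder to stream than the mean, here is a sketch of the classic two-heap running median. It is exact, but it has to keep every value seen so far, unlike a running mean that only needs a count and a sum. The class RunningMedian is a hypothetical name, not an existing Commons class; approximate constant-memory estimators (for example the P-squared algorithm) are a different technique and are not shown here.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical class: exact running median using two heaps.
final class RunningMedian {
    // Max-heap holding the smaller half of the values.
    private final PriorityQueue<Double> lower =
        new PriorityQueue<Double>(Comparator.reverseOrder());
    // Min-heap holding the larger half of the values.
    private final PriorityQueue<Double> upper = new PriorityQueue<Double>();

    /** Adds one value from the stream in O(log n) time. */
    void accept(double x) {
        if (lower.isEmpty() || x <= lower.peek()) {
            lower.add(x);
        } else {
            upper.add(x);
        }
        // Rebalance: lower holds as many values as upper, or exactly one more.
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    /** Returns the median of the values seen so far, or NaN if there are none. */
    double median() {
        if (lower.isEmpty()) {
            return Double.NaN;
        }
        if (lower.size() > upper.size()) {
            return lower.peek();
        }
        return (lower.peek() + upper.peek()) / 2.0;
    }

    public static void main(String[] args) {
        RunningMedian m = new RunningMedian();
        for (double x : new double[] {5.0, 1.0, 3.0, 2.0}) {
            m.accept(x);
            System.out.println("after " + x + ": median = " + m.median());
        }
    }
}
```

The memory cost grows linearly with the stream, so for large or unbounded streams an approximate quantile estimator would be the more realistic design choice.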
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>>
>>>>>> Eric,
>>>>>>
>>>>>> Would you be the official "mentor" for the GSoC participants that
>>>>>> are interested in helping with the porting of "o.a.c.math4.stat"?
>>>>>>
>>>>>> Thank you,
>>>>>> Gilles
>>>>>>
>>>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>

Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
In reply to this post by Gimhana Nadeeshan
Hi Gimhana.

On Sun, 18 Mar 2018 19:17:44 +0530, Gimhana Nadeeshan wrote:
> Hii,
>
> I have just shared my draft proposal for GSoC. Port Codes from
> Commons Math.
>
> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit>

Wow; probably the first time that such a structured document
appears on this list. ;-)

> Devs, would you please review it and I always welcome your precious
> suggestions to improve it.

OK.  I'll try to provide some clarifications and words of
caution.

== "Background" section ==
Useful to cite:
(for Commons in general)
  * number of stable/active/dormant components
  * number of listed/active contributors
  * overview of topics covered
  * histogram of component's sizes (lines of code)
(for Commons Math)
  * how it fits within the above data

And draw some conclusions out of the comparison.
You stress "before JDK 1.8"; worth noting that some code
dates back to before JDK 1.5!
Code age is not necessarily a problem per se, but the mix
(of designs linked to outdated JDK) is, IMHO, a development
nightmare.
Modularization can alleviate the unwanted consequences (such
as release stalled due to the lack of support).

== "Deliverables" section ==

Clarify what is meant by
  * "less dependencies" (an example?)
  * "Advanced mathematical functionalities": other than what
    exists now?  Or do you mean new interfaces (e.g. in
    accordance with the APIs provided by JDK8)?
  * "implemented module" (singular). I would assume that
    "Commons Statistics" will provide many modules.
  * "Guide for refactoring [..] Commons packages": That is
    unlikely. ;-)
    Did you more modestly mean "Commons Math packages"?
    You should perhaps note (in the "Background" section)
    that the task was started two years ago (cf.
    "Commons RNG" and "Commons Numbers").

Another quite useful task is: set up the web site.

== "Implementation" section ==

  * "Design issues": list *actual* issues (see JIRA).
    Working with streams would be better described as an
    enhancement.
  * Describe "too many dependencies" (examples).
  * "Design goals": give concrete examples.

The class diagram is nice but I see a big issue with
the "matrix" functionality. [This was one of the reasons
I wrote a few months ago (cf. ML archive) that the
refactoring of the "o.a.c.math4.stat" was not among the
low-hanging fruits of the refactoring.]
If at all possible, better to start with functionality that
doesn't need the CM matrix code.

== "Results" section ==

Hope to get comments from the PMC...
[Wish list, design requirements, mentor(s), etc.]

== "Future Development" section ==

AFAICT, porting "o.a.c.math4.geometry" will be much
easier and likely to be finished before "Commons
Statistics". :-}


Thanks for your interest,
Gilles


Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hi,

Thanks a lot, Gilles, for your valuable suggestions and for reviewing so
quickly. I'll apply those corrections and ask here for any clarifications.
By the way, since I'm new to the Apache community, I'm not yet familiar with
some of the abbreviations used on the list [such as ML archive, PMC].

> AFAICT, porting "o.a.c.math4.geometry" will be much
> easier and likely to be finished before "Commons
> Statistics". :-}

Since the design structure is the same, this would be interesting and
easier. But is it allowed in GSoC? [It is not labeled as a GSoC idea on
JIRA!]

Best Regards,
Gimhana.


Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
Hello.

On Sun, 18 Mar 2018 23:29:44 +0530, Gimhana Nadeeshan wrote:
> Hi,
>
> Thanks a lot, Gilles, for your valuable suggestions and for giving the
> reviews so quickly. I'll apply those corrections and ask for any
> clarifications here.
> By the way, since I'm new to the Apache community, I'm not yet familiar
> with some abbreviations used on the list [such as ML archive, PMC].

Sorry!
ML = Mailing List
PMC = Project Management Committee

>
>> AFAICT, porting "o.a.c.math4.geometry" will be much
>> easier and likely to be finished before "Commons
>> Statistics". :-}
>
> Since the design structure is the same, this would be interesting and
> easier. But is it allowed in GSoC? [Since it is not labeled as a GSoC
> idea in JIRA!]

If it's just a matter of creating a GSoC task, not a big problem. ;-)
For would-be "Commons Geometry", I'm waiting for the green light
from our expert contributor, Matt Juntunen.
In the meantime, you could also review the open issues for "Commons
Numbers":
   https://issues.apache.org/jira/projects/NUMBERS/

This is quite important as almost all other "Commons Math"
spin-offs will have some dependency on this new component;
hence a release of "Commons Numbers" must precede a release
of either "Commons Statistics" or "Commons Geometry".

Best,
Gilles

> Best Regards,
> Gimhana.
>
>>> [...]



Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello devs,

I have updated my draft proposal with @Gilles's suggestions [Draft Proposal
V1.1
<https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>]
and I think I need some more clarification on the suggestions below.


> == "Background" section ==
>
> number of listed/active contributors
> histogram of component's sizes (lines of code)


How do I recognize "active contributors"? In the ML or on GitHub? What do
you mean by "histogram of component's sizes (lines of code)"?


> == "Deliverables" section ==
>
>  * "less dependencies" (an example?)
>  * "Advanced mathematical functionalities": other than what
>    exists now?  Or do you mean new interfaces (e.g. in
>    accordance with the APIs provided by JDK8)?
>

Most of the classes in "math4.stat" use "math4.exception" classes, and
some classes in the "correlation" module depend on the "RealMatrix"
interface. Can those be considered dependencies? Can't the exceptions be
replaced with built-in Java exceptions? @Gilles, would you please explain
this matrix issue? I didn't quite get it.


> == "Implementation" section ==
>
> * "Design goals": give concrete examples.


I noted some examples of design goals in my proposal, but I'm not sure
that I wrote them correctly (and I don't know whether those are the
examples you expect me to mention). Please clarify those too.

> == "Results" section ==
>
> Hope to get comment from PMC...
> [Wish list, design requirements, mentor(s), etc.]
>

Mentor(s)? @Gilles, @Eric, won't you be the mentors of this project?
I'm asking because I'm new to the ASF and GSoC, and I would appreciate
knowing how this works!

> In the meantime, you could also review the open issues for "Commons
> Numbers":
>   https://issues.apache.org/jira/projects/NUMBERS/
>
> This is quite important as almost all other "Commons Math"
> spin-offs will have some dependency on this new component;
> hence a release of "Commons Numbers" must precede a release
> of either "Commons Statistics" or "Commons Geometry".
>

Yep, sure... I'm on my way to "Commons Numbers" right now!


Nadeeshan Gimhana

Batch Representative (15' batch)

Department of Computer Science & Engineering

University of Moratuwa

Mobile: +94775744613

Website: https://ngimhana94.wixsite.com/gimhanadesilva/

LinkedIn: http://www.linkedin.com/in/nadeeshangimhana/




Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
Hi Gimhana.

On Tue, 20 Mar 2018 14:36:10 +0530, Gimhana Nadeeshan wrote:
> Hello devs,
>
> I have updated my draft proposal with @Gilles's suggestions [Draft
> Proposal V1.1
> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>]
> and I think I need some more clarification on the suggestions below.

I haven't read the new version yet; I'll try to answer
some of the questions below.

>> == "Background" section ==
>>
>> number of listed/active contributors
>> histogram of component's sizes (lines of code)
>
> How do I recognize "active contributors"?

Good question!
Perhaps organize a survey? :-)

> In the ML

A possible source, but probably not very efficient.  I certainly
do not suggest performing a "manual" count of who talks about
what. ;-)
Actually, I don't know how to make automated queries (if that is
possible at all).

You might want to have a look at "kibble":
   http://kibble.apache.org/
I asked that "Commons" projects be added to their "live demo",
but I never got much time to explore the information it extracts
from various data sources.
I'd guess that it would be quite interesting to get more acquainted
with that tool.  Don't hesitate to subscribe to their ML (I'm not
subscribed); you might then report here what you found useful and
what a lot of us may not be aware of.

> or GitHub?

AFAICT, it would be completely biased.  Indeed if one looks at
that page, for example:
   https://github.com/apache/commons-rng/graphs/contributors
there is absolutely no trace that someone not mentioned there
performed 88% of all commits. [And this is much lower than the
actual number of deletions/additions...]

> What do you
> mean by "histogram of component's sizes (lines of code)"?

There is a (command-line) tool called "cloc".
You could "clone" the repositories (of a selection of the
active or popular components) and run it on the "src/main"
directory to get some indication of the size of the projects.

>> == "Deliverables" section ==
>>
>>  * "less dependencies" (an example?)
>>  * "Advanced mathematical functionalities": other than what
>>    exists now?  Or do you mean new interfaces (e.g. in
>>    accordance with the APIs provided by JDK8)?
>>
>
> Most of the classes in "math4.stat" use "math4.exception"
> classes, and some classes in the "correlation" module depend on
> the "RealMatrix" interface.
> Can those be considered dependencies?

Yes.  And we don't want any of the new components to depend
on "Commons Math" code. [With perhaps an exception for the "test"
scope.]

> Can't the exceptions be replaced
> with built-in Java exceptions?

Certainly.  You should take a look at what is done in "Commons
Numbers".

> @Gilles, would you please explain this matrix
> issue? I didn't quite get it.

Design issues were identified a long time ago.
You should be able to find them by doing a "Search issues"
in JIRA.
The bottom line is that we don't want to depend on an API that must
be changed (at some point).
Although not ideal, a workaround is to copy the necessary
functionality over to the new component but ensure that it
is *not* part of the API.
Better would be to tackle the issues themselves but it has
proven difficult and was postponed several times...

>> == "Implementation" section ==
>>
>> * "Design goals": give concrete examples.
>
> I noted some examples of design goals in my proposal, but I'm not
> sure that I wrote them correctly (and I don't know whether those are
> the examples you expect me to mention). Please clarify those too.

IIRC, you mention streams. So for example, you could show how
the contribution would enhance usage (comparing "before"/"after").
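
[Editor's illustration] A minimal sketch of what such a "before"/"after"
comparison could look like, using only JDK 8 classes. The class and method
names (BeforeAfterSketch, meanWithLoop, meanWithStream) are hypothetical;
nothing here is taken from the proposal or from "Commons Math".

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

/** Hypothetical illustration only; not code from the proposal or from Commons. */
final class BeforeAfterSketch {

    /** "Before": the statistic is computed with an explicit loop over an array. */
    static double meanWithLoop(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** "After": the same statistic is obtained from a JDK 8 stream. */
    static double meanWithStream(double[] values) {
        DoubleSummaryStatistics stats = DoubleStream.of(values).summaryStatistics();
        return stats.getAverage();
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4};
        System.out.println("loop:   " + meanWithLoop(data));
        System.out.println("stream: " + meanWithStream(data));
    }
}
```

The point of such a comparison would be that the stream version can be fed
from any source (array, collection, file) without changing the statistic
itself. Note that the two variants differ on empty input: the loop yields
NaN, while getAverage() returns 0.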

>
>> == "Results" section ==
>>
>> Hope to get comment from PMC...
>> [Wish list, design requirements, mentor(s), etc.]
>
> Mentor(s)? @Gilles, @Eric, won't you be the mentors of this
> project?
> I'm asking because I'm new to the ASF and GSoC, and I would
> appreciate knowing how this works!

I don't know whether there is an official "mentor" role for GSoC,
and if so, what that implies.  This is also new for me; so I hope
that people can give advice about the "administrative" side.

For the contents, Eric indeed proposed to participate, but I
don't know how available he is (and how this will fit with
the GSoC timetable, which I also don't know).
It seems that Eric's contributions are currently extremely
asynchronous... :-}

>> In the meantime, you could also review the open issues for "Commons
>> Numbers":
>>   https://issues.apache.org/jira/projects/NUMBERS/
>>
>> This is quite important as almost all other "Commons Math"
>> spin-offs will have some dependency on this new component;
>> hence a release of "Commons Numbers" must precede a release
>> of either "Commons Statistics" or "Commons Geometry".
>>
>
> Yep, sure... I'm on my way to "Commons Numbers" right now!

Whenever you find something you can tackle, please
submit a PR.

Thanks,
Gilles

>>>> [...]



Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello devs,

Having gone through @Gilles's suggestions, I found some very interesting
facts about Commons projects.

Feel free to check the Kibble reports
<https://demo.kibble.apache.org/dashboard.html?page=repos&subfilter=commons&author=true&from=1458585000&to=1521743399>
regarding these projects. They give a clear picture of the progress
of the projects. On the Commons side, there seems to be visible growth in
contributors and releases.

I also created a simple doc using the data collected with the CLOC tool to
get an idea of the Commons projects. I think this kind of document will help
new volunteers get a rough idea of the scope and current status of the
projects before going deeper: Histogram of Commons Projects.
<https://docs.google.com/document/d/1qPWWnA9hWgKytLWI3A3rXu47V8LSglgsV5hBxVnLiCI/edit?usp=sharing>

Best Regards,
Gimhana.

Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
Hi Gimhana.

On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:

> Hello devs,
>
> Having gone through @Gilles's suggestions, I found some very interesting
> facts about Commons projects.
>
> Feel free to check the Kibble reports
>
> <https://demo.kibble.apache.org/dashboard.html?page=repos&subfilter=commons&author=true&from=1458585000&to=1521743399>
> regarding these projects. They give a clear picture of the progress
> of the projects. On the Commons side, there seems to be visible growth in
> contributors and releases.

Note that some of the repositories included in that screen do
not belong to "Commons":
  * sling-*
  * webservices-*
  * xml-*

There should be a way to filter them out.

> I also created a simple doc using the data collected with the CLOC tool
> to get an idea of the Commons projects. I think this kind of document
> will help new volunteers get a rough idea of the scope and current
> status of the projects before going deeper: Histogram of Commons Projects.
>
> <https://docs.google.com/document/d/1qPWWnA9hWgKytLWI3A3rXu47V8LSglgsV5hBxVnLiCI/edit?usp=sharing>

Botched alignments...
"cloc" has several output formats from which you could produce
nicer tables.

Regards,
Gilles



Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello devs,


> Note that some of the repositories included in that screen do
> not belong to "Commons":
>  * sling-*
>  * webservices-*
>  * xml-*


I'm working on it. (Still researching Kibble :-) )

> Botched alignments...
> "cloc" has several output formats from which you could produce
> nicer tables.


I'm extremely sorry. I'll fix it asap.

Best Regards,
Gimhana


Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello devs,

I have updated my draft proposal (Port codes from Commons Math
<https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>),
with a timeline added, before submitting the final version on the Google
site. Feel free to comment and give feedback to improve it.

Best Regards,
Gimhana.


Re: [Statistics] Port codes from Commons Math

Eric Barnhill
Hi Gimhana,

Sorry for the delay in response, but you posted this right before our
two-week Easter holiday, for which I was completely absent; then I needed
a few days back at work to clean up all the mess. :)

Your overall goals look good to me. You have gone right to the heart of the
matter and propose to reinvent the statistics tools to make good use of the
Java 8 API. I think that's great and you should get started. Your goal of
eliminating dependencies on Commons-Math is also right.

I noticed this in the proposal:

> Covariance stats =
>     IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);


Can you explain a bit more what is happening with the method references
"accept" and "combine"?

Also this

> Week 2: Begin porting the code according to the dependency hierarchy
> identified.
>

Sorry but I cannot see where you identify the dependency hierarchy. Are you
referring to your diagram?

Eric



Re: [Statistics] Port codes from Commons Math

Eric Barnhill
A further comment: L1-type statistics such as median and quantiles can also
be included in the API by using the stream.sorted() method to sort the
stream first.

While it is true that medians can, in the aggregate, be sped up by
partitioning algorithms, I think making use of built-in methods like
sorted() is still likely to produce the best and most consistent
performance with the JVM.
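
[Editor's illustration] A minimal sketch of the sorted() approach described
above, assuming the data fit in memory; MedianSketch and its median() method
are hypothetical names, not an existing Commons API.

```java
import java.util.stream.DoubleStream;

/** Hypothetical illustration of a median computed via DoubleStream.sorted(). */
final class MedianSketch {

    /** Sorts the values with the built-in stream sort, then picks the middle. */
    static double median(double... values) {
        // sorted() is a stateful operation: all elements are buffered
        // before the first one is emitted downstream.
        double[] sorted = DoubleStream.of(values).sorted().toArray();
        if (sorted.length == 0) {
            throw new IllegalArgumentException("median of an empty sample");
        }
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
            ? sorted[mid]
            : 0.5 * (sorted[mid - 1] + sorted[mid]);
    }

    public static void main(String[] args) {
        System.out.println(median(5, 1, 3));
        System.out.println(median(4, 1, 3, 2));
    }
}
```

Because sorted() has to see every element before producing output, this is
not a constant-memory streaming statistic in the way a running mean is; it
trades memory for simplicity.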


Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello devs,

>> Covariance stats =
>>     IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);
>
> Can you explain a bit more what is happening with the method references
> "accept" and "combine"?
>

The mutable reduction operation collect() accumulates input elements into
a mutable result container, such as a Collection. It requires three
functions: a *supplier function* to construct a new instance of the result
container, an *accumulator function* to incorporate an input element into a
result container, and a *combining function* to merge the contents of one
result container into another.

So the accept() method records a new value into the result container
(here, the Covariance object); it accepts the values of the stream into the
Covariance object. It is the method of the functional interface I'm going
to implement in order to make use of the lambda expressions of Java 8.

The combine() method combines the state of another Covariance object into
this one, i.e. it merges the results of one result container into another.
Rather than generating a new object, the existing one is updated in place.

As a whole, the idea of this implementation is like generating a single
string object by concatenating the strings in an array list. In this
implementation, each statistical functionality is provided as a state
object.
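
[Editor's illustration] The supplier/accumulator/combiner pattern described
above can be sketched as follows. Covariance needs paired samples, so a
simpler running mean is used here as a stand-in; MeanAccumulator and its
methods are hypothetical names, and only the DoubleStream.collect() call is
actual JDK 8 API.

```java
import java.util.stream.DoubleStream;

/** Hypothetical state object usable with DoubleStream.collect(). */
final class MeanAccumulator {
    private long count;
    private double sum;

    /** Accumulator: records one value into this container. */
    void accept(double value) {
        count++;
        sum += value;
    }

    /** Combiner: merges the state of another container into this one. */
    void combine(MeanAccumulator other) {
        count += other.count;
        sum += other.sum;
    }

    double getMean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Convenience wrapper showing the three-argument collect() call. */
    static double meanOf(double... values) {
        MeanAccumulator stats = DoubleStream.of(values)
            .parallel() // the combiner is only invoked for parallel streams
            .collect(MeanAccumulator::new,      // supplier
                     MeanAccumulator::accept,   // accumulator
                     MeanAccumulator::combine); // combiner
        return stats.getMean();
    }

    public static void main(String[] args) {
        System.out.println(meanOf(1, 2, 3));
    }
}
```

The same three method references would work unchanged on a sequential
stream; there the combiner is simply not called.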

>> Week 2: Begin porting the code according to the dependency hierarchy
>> identified.
>
> Sorry but I cannot see where you identify the dependency hierarchy. Are you
> referring to your diagram?


The dependency hierarchy is not described separately in the proposal, but I
have created the timeline of the proposed project according to it: the less
dependent modules are ported first, gradually moving on to the more coupled
ones. From that point of view, I am going to port the Ranking module first
and then gradually port the Interval, Regression, Descriptive, Correlation,
Inference modules and so on.

> A further comment: L1-type statistics such as median and quantiles can also
> be included in the API by using the stream.sorted() method to sort the
> stream first.
>
> While it is true medians can be in the aggregate sped up by partitioning
> algorithms, I think making use of built-in methods like sorted() is still
> likely to produce the best and most consistent performance with the JVM.


Definitely. Using the built-in methods provided will improve the package's
performance and ease of use, and using built-in methods wherever possible
is one of the main goals of the proposed project.

Best Regards,
Gimhana.


Nadeeshan Gimhana

Batch Representative (15' batch)

Department of Computer Science & Engineering

University of Moratuwa

Mobile: +94775744613

Website: https://ngimhana94.wixsite.com/gimhanadesilva/

LinkedIn: http://www.linkedin.com/in/nadeeshangimhana/



On 13 April 2018 at 12:26, Eric Barnhill <[hidden email]> wrote:

> A further comment: L1-type statistics such as median and quantiles can also
> be included in the API by using the stream.sorted() method to sort the
> stream first.
>
> While it is true medians can be in the aggregate sped up by partitioning
> algorithms, I think making use of built-in methods like sorted() is still
> likely to produce the best and most consistent performance with the JVM.
>
> On Thu, Apr 12, 2018 at 2:03 PM, Eric Barnhill <[hidden email]>
> wrote:
>
> > HI Gimhana,
> >
> > Sorry for the delay in response, but you posted this right before our
> > two-week Easter holiday, for which I was completely absent ; then I
> needed
> > a few days back at work to clean up all the mess. :)
> >
> > Your overall goals look good to me. You have gone right to the heart of
> > the matter and propose to reinvent the statistics tools to make good use
> of
> > the Java 8 API. I think that's great and you should get started. Your
> goal
> > of eliminating dependencies on Commons-Math is also right.
> >
> > I noticed this in the proposal:
> >
> > *Covariance stats=
> >> IntStream.of(1,2,3).collect(Covariance::new,Covariance::
> accept,Covariance::combine);*
> >
> >
> > Can you explain a bit more what is happening with the method references
> > "accept" and "combine"?
> >
> > Also this
> >
> > *Week 2: Begin porting the code according to the dependency hierarchy
> >> identified. *
> >>
> >
> > Sorry but I cannot see where you identify the dependency hierarchy. Are
> > you referring to your diagram?
> >
> > Eric
> >
> >
> > On Mon, Mar 26, 2018 at 8:07 AM, Gimhana Nadeeshan <
> > [hidden email]> wrote:
> >
> >> Hello devs,
> >>
> >> I have updated my draft proposal (Port codes from Commons Math
> >> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqC
> >> OBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>)
> >> -Timeline added; before submitting the final at the Google site. Feel
> free
> >> to comment and give feedback to improve it.
> >>
> >> Best Regards,
> >> Gimhana.
> >>
> >> On 24 March 2018 at 17:35, Gimhana Nadeeshan <
> >> [hidden email]> wrote:
> >>
> >> > Hello devs,
> >> >
> >> >
> >> >> Note that some of the repositories included in that screen do
> >> >> not belong to "Commons":
> >> >>  * sling-*
> >> >>  * webservices-*
> >> >>  * xml-*
> >> >
> >> >
> >> > I'm working on it.(Still research on Kibble :-) )
> >> >
> >> > Botched alignments...
> >> >> "cloc" has several output formats from which you could produce
> >> >> nicer tables.
> >> >
> >> >
> >> > I'm extremely sorry. I'll fix it asap.
> >> >
> >> > Best Regards,
> >> > Gimhana
> >> >
> >> > On 23 March 2018 at 17:43, Gilles <[hidden email]>
> wrote:
> >> >
> >> >> Hi Gimhana.
> >> >>
> >> >> On Thu, 22 Mar 2018 22:11:31 +0530, Gimhana Nadeeshan wrote:
> >> >>
> >> >>> Hello devs,
> >> >>>
> >> >>> By gone through @Gilles suggestions I found very interesting facts
> >> about
> >> >>> Commons projects.
> >> >>>
> >> >>> Feel free to check Kibble reports
> >> >>>
> >> >>> <https://demo.kibble.apache.org/dashboard.html?page=repos&su
> >> >>> bfilter=commons&author=true&from=1458585000&to=1521743399>
> >> >>> regarding these projects. It will be given a clear picture on the
> >> >>> progress
> >> >>> of projects.In the Commons Projects side it seems visible growth of
> >> >>> contributors and releases.
> >> >>>
> >> >>
> >> >> Note that some of the repositories included in that screen do
> >> >> not belong to "Commons":
> >> >>  * sling-*
> >> >>  * webservices-*
> >> >>  * xml-*
> >> >>
> >> >> There should be a way to filter them out.
> >> >>
> >> >> And I created a simple doc using the data collected from CLOC tool to
> >> get
> >> >>> an idea of commons projects. I think This kind of document will help
> >> new
> >> >>> volunteers to get a rough idea of the scope and the current status
> of
> >> >>> projects before go deeper.Histogram of Commons Projects.
> >> >>>
> >> >>> <https://docs.google.com/document/d/1qPWWnA9hWgKytLWI3A3rXu4
> >> >>> 7V8LSglgsV5hBxVnLiCI/edit?usp=sharing>
> >> >>>
> >> >>
> >> >> Botched alignments...
> >> >> "cloc" has several output formats from which you could produce
> >> >> nicer tables.
> >> >>
> >> >> Regards,
> >> >> Gilles
> >> >>
> >> >>
> >> >>
> >> >> ------------------------------------------------------------
> ---------
> >> >> To unsubscribe, e-mail: [hidden email]
> >> >> For additional commands, e-mail: [hidden email]
> >> >>
> >> >>
> >> >
> >>
> >
> >
>

Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hello all,

As I proposed earlier, I would like to begin porting code from Commons-math
<https://github.com/apache/commons-math> to Commons-statistics
<https://github.com/apache/commons-statistics>.
(For further details, refer to my GSoC proposal
<https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>,
though I was not selected this year.)

This is my proposed architecture in brief:

   1. Commons-Statistics-Core => Frequency and StatUtils classes (Can add
   more common classes while implementing)
   2. Commons-Statistics-Correlation
   3. Commons-Statistics-Descriptive
   4. Commons-Statistics-Inference
   5. Commons-Statistics-Interval
   6. Commons-Statistics-Ranking
   7. Commons-Statistics-Regression

While referring to the Commons-Geometry ported code to get a head start, I
found that each module inside it contains a pom.xml file. Are they implemented
as separate projects and then grouped in the same package? I'm asking because
I'm new to code porting :-).

If so, should I create all 7 projects here and then group them in the same
project? I plan to start by porting the Ranking module, as it has fewer
dependencies compared to the others.

Would someone help me get a head start?

Best Regards,
Gimhana.


On 14 April 2018 at 14:24, Gimhana Nadeeshan <
[hidden email]> wrote:

> Hello devs,
>
>> *Covariance stats =
>> IntStream.of(1,2,3).collect(Covariance::new,Covariance::accept,Covariance::combine);*
>>
>>
>> Can you explain a bit more what is happening with the method references
>> "accept" and "combine"?
>>
>
> The mutable reduction operation collect() accumulates input elements
> into a mutable result container, such as a Collection. It requires three
> functions: a *supplier function* that constructs a new instance of the
> result container, an *accumulator function* that incorporates an input
> element into a result container, and a *combining function* that merges
> the contents of one result container into another.
>
> So the accept() method records a new value into the result container
> (here, the Covariance object); that is, it accepts the values of the
> stream into the Covariance object. It is the method of the functional
> interface that I am going to implement in order to make use of the
> lambda expressions of Java 8.
>
> The combine() method combines the state of another Covariance object
> into this one: it merges the contents of one result container into
> another. Rather than generating a new object, it updates the state of
> the existing one.
>
> As a whole, the implementation works like generating a single string
> by concatenating the strings in an array list. All the statistical
> functionalities are served as state objects in this implementation.
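
To make the three roles concrete, here is a minimal sketch of the same
collect() pattern. It assumes a hypothetical RunningMean container (not a
class from Commons Math or Commons Statistics) and uses DoubleStream rather
than the Covariance example from the proposal, since a covariance would need
paired values. The method names accept and combine are chosen only to mirror
the proposal.

```java
import java.util.stream.DoubleStream;

/** Hypothetical mutable result container; for illustration only. */
final class RunningMean {
    private long n;
    private double sum;

    /** Accumulator: records one value into this container. */
    void accept(double value) {
        n++;
        sum += value;
    }

    /** Combiner: merges the state of another container into this one. */
    void combine(RunningMean other) {
        n += other.n;
        sum += other.sum;
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}

final class CollectSketch {
    /** Supplier, accumulator and combiner are passed as method references. */
    static RunningMean summarize(double... values) {
        return DoubleStream.of(values)
                .collect(RunningMean::new, RunningMean::accept, RunningMean::combine);
    }

    public static void main(String[] args) {
        RunningMean stats = summarize(1, 2, 3);
        System.out.println(stats.getN() + " values, mean = " + stats.getMean());
    }
}
```

The combiner is only invoked when the stream is processed in parallel, but
collect() requires it in all cases, so the container has to be able to merge
two partial states.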
>
>> *Week 2: Begin porting the code according to the dependency hierarchy
>> identified.*
>>
>>
>> Sorry but I cannot see where you identify the dependency hierarchy. Are
>> you
>> referring to your diagram?
>
>
> The dependency hierarchy is not mentioned separately in the proposal, but
> I have created the timeline of the proposed project according to it. The
> less dependent modules are ported at the beginning, gradually moving on to
> the more coupled ones. From that point of view, I am going to port the
> Ranking module first and then gradually port the Interval, Regression,
> Descriptive, Correlation and Inference modules, and so on.
>
> A further comment: L1-type statistics such as median and quantiles can also
>> be included in the API by using the stream.sorted() method to sort the
>> stream first.
>>
>> While it is true medians can be in the aggregate sped up by partitioning
>> algorithms, I think making use of built-in methods like sorted() is still
>> likely to produce the best and most consistent performance with the JVM.
>
>
> Definitely. Using the built-in methods provided will improve both the
> performance and the ease of use of the package, and using built-in methods
> wherever possible is one of the main goals of the proposed project.
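
As an illustration of the sorted()-based approach discussed above, here is a
minimal sketch of a median computed by sorting the stream first. The
MedianSketch class and its median method are hypothetical names, not an
existing API, and returning NaN for empty input is an assumption made for
this example.

```java
import java.util.stream.DoubleStream;

final class MedianSketch {
    /** Median of the given values, obtained by sorting the stream first. */
    static double median(double... values) {
        double[] sorted = DoubleStream.of(values).sorted().toArray();
        int n = sorted.length;
        if (n == 0) {
            return Double.NaN; // assumption: no median for empty input
        }
        int mid = n / 2;
        // Odd count: middle element; even count: mean of the two middle ones.
        return (n % 2 == 1) ? sorted[mid] : 0.5 * (sorted[mid - 1] + sorted[mid]);
    }

    public static void main(String[] args) {
        System.out.println(median(3, 1, 2));
        System.out.println(median(4, 1, 3, 2));
    }
}
```

Sorting costs O(n log n), whereas a partitioning (selection) algorithm can
find the median in O(n) on average; that is the trade-off mentioned in the
quoted comment.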
>
> Best Regards,
> Gimhana.
>
>
> Nadeeshan Gimhana
>
> Batch Representative (15' batch)
>
> Department of Computer Science & Engineering
>
> University of Moratuwa
>
> *Mobile :+94775744613*
>
>
> *Website : https://ngimhana94.wixsite.com/gimhanadesilva/
> <https://ngimhana94.wixsite.com/gimhanadesilva/>*
>
> *LinkedIn: www.linkedin.com/in/nadeeshangimhana/
> <http://www.linkedin.com/in/nadeeshangimhana/>*
>

Re: [Statistics] Port codes from Commons Math

Gilles Sadowski
Hi Gimhana.

On Sat, 5 May 2018 15:50:43 +0530, Gimhana Nadeeshan wrote:

> Hello all,
>
> As I proposed earlier, I would like to begin porting code from Commons-math
> <https://github.com/apache/commons-math> to Commons-statistics
> <https://github.com/apache/commons-statistics>.
> (For further details, refer to my GSoC proposal
> <https://docs.google.com/document/d/1sqSa0hrYc2AD75RZyJRkeqCOBOqTOeMnPaBsE9U5YhU/edit?usp=sharing>,
> though I was not selected this year.)
>
> This is my proposed architecture in brief:
>
>    1. Commons-Statistics-Core => Frequency and StatUtils classes (Can add
>    more common classes while implementing)
>    2. Commons-Statistics-Correlation
>    3. Commons-Statistics-Descriptive
>    4. Commons-Statistics-Inference
>    5. Commons-Statistics-Interval
>    6. Commons-Statistics-Ranking
>    7. Commons-Statistics-Regression

Nit-pick: module names have no capital in them (just a convention).
So: "commons-statistics-core" rather than "Commons-Statistics-Core",
etc.

> While referring to the Commons-Geometry

No need to refer to that project since "Commons Statistics" has been
set up:
   http://commons.apache.org/proper/commons-statistics/

The code repository is here:
   https://git1-us-west.apache.org/repos/asf?p=commons-statistics.git;a=tree
It already contains a "commons-statistics-distribution" module whose
layout can be duplicated in the modules which you are proposing above
(with appropriate changes of course).

> ported code to get a head start, I
> found that each module inside it contains a pom.xml file. Are they
> implemented as separate projects and then grouped in the same package?
> I'm asking because I'm new to code porting :-).

A requirement is that no package should be shared between different
modules; by convention, the top-level package of module
   commons-statistics-descriptive
would be
   org.apache.commons.statistics.descriptive

[And so on for the other modules. But I'd suggest you start with one.]
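
The naming convention above can be sketched as a small helper. The class
ModuleNamingSketch and its method are hypothetical and exist only to
illustrate the mapping from a module name to its top-level package; they are
not part of the code base.

```java
final class ModuleNamingSketch {
    /**
     * Maps a module name such as "commons-statistics-descriptive" to the
     * conventional top-level package
     * "org.apache.commons.statistics.descriptive".
     */
    static String topLevelPackage(String moduleName) {
        return "org.apache." + moduleName.replace('-', '.');
    }

    public static void main(String[] args) {
        System.out.println(topLevelPackage("commons-statistics-descriptive"));
        System.out.println(topLevelPackage("commons-statistics-ranking"));
    }
}
```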

> If so, should I create all 7 projects here and then group them in the
> same project?

No, the project is "Commons Statistics" and it would contain several
_maven_ modules, each of which should ultimately map to a _JPMS_ (JDK9)
module.

> I plan to start by porting the Ranking module, as it has fewer
> dependencies compared to the others.

Fine. But don't forget to browse through the JIRA issues of Commons
Math (CM) for things that would need fixing.  Whenever it's the case,
please open a report in the new JIRA project (linking to the CM
report), and post here your proposed solution (or questions).

We might want to create a public branch for that work in order to
merge PRs more quickly without risk of breaking "master".
What do you think?  Eric?

> Would someone help me get a head start?

What else do you need?

Best regards,
Gilles

> Best Regards,
> Gimhana.
>
>
>>> [...]




Re: [Statistics] Port codes from Commons Math

Gimhana Nadeeshan
Hi all,

> We might want to create a public branch for that work in order to
> merge PRs more quickly without risk of breaking "master".
> What do you think?  Eric?
>

I have ported the Statistics Interval module and would like to get your
review. How should I make the pull request?

Best Regards,
Gimhana

