[jira] [Created] (SANDBOX-341) [functor] New components: summarize and aggregate

[jira] [Created] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
[functor] New components: summarize and aggregate
-------------------------------------------------

                 Key: SANDBOX-341
                 URL: https://issues.apache.org/jira/browse/SANDBOX-341
             Project: Commons Sandbox
          Issue Type: Improvement
          Components: Functor
         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
            Reporter: Liviu Tudor
            Priority: Minor
         Attachments: commons-functor-aggregate+summarizer.zip

This is the next step from https://issues.apache.org/jira/browse/SANDBOX-340 -- as instructed, I'm hoping to finally get the code into the right place, and hopefully this is something that the functor component could do with.
Whereas initially I started with just the summarizer component, I have now added a second one, the "aggregator", as the two are related. If this code proves useful to functor in any way, it would be good to get some feedback on these 2 to see whether the class hierarchy can be changed to share some common functionality, as I feel (probably due to the similar needs that led to writing/using these components) that they should share a common base.
In brief, the 2 components:
* aggregator: this allows data to be aggregated in a user-defined way (e.g. stored in a list for the purpose of computing the arithmetic mean, median etc). The classes provided offer an implementation for storing data in a list and computing the above-mentioned values or summing up everything.
* timed summarizer: this is another variation of the aggregator; however, it adds the idea of regular "flushes", so based on a timer it will reset the value and start summing/aggregating the data again. Rather than storing the whole data series as an aggregator would (possibly for applying more complex formulas), this component computes the formula on the fly on each request and stores only the result. (Which does mean that things like the arithmetic mean, median etc would be difficult to compute without knowing upfront how many calls will be received -- i.e. how many elements we will be required to summarize/aggregate.) So the memory footprint of running this is much smaller -- even though, as I said, it achieves similar results. I have only provided a summarizer which operates on integers, but others for float, double etc can be created if we go ahead with this design.
Hopefully the above makes sense; this code resulted from finding myself writing similar components a few times, and because it has always been either one type (e.g. aggregator) or the other (summarizer), I quite possibly haven't given enough thought to a class design that joins these 2. Also, unfortunately, it was nearly 3 months ago that I sat down to make these components a bit more general and submitted issue 340, so I'm trying to remember all the ideas I had at the time; please bear with me if these are still a bit fuzzy :) However, if you can make use of these, I'm quite happy to elaborate on areas that are unclear and put some effort into getting these components up to the standard required for a release.
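
To make the trade-off between the two components described above concrete, here is a minimal sketch. It is not the code from the attached zip: the class names {{ListAggregatorSketch}} and {{RunningSumSketch}} and all of their methods are hypothetical, and the timer that would call {{reset()}} periodically is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical list-backed aggregator: keeps every value, so mean and median stay computable. */
class ListAggregatorSketch {
    private final List<Integer> values = new ArrayList<Integer>();

    synchronized void add(int value) {
        values.add(value);
    }

    /** Arithmetic mean of the stored values, or 0.0 when nothing has been added. */
    synchronized double mean() {
        if (values.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.size();
    }

    /** Median of the stored values; needs the whole series, hence the list. */
    synchronized double median() {
        if (values.isEmpty()) {
            return 0.0;
        }
        List<Integer> sorted = new ArrayList<Integer>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        if (sorted.size() % 2 == 1) {
            return sorted.get(mid);
        }
        return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    /** What a periodic flush would call. */
    synchronized void reset() {
        values.clear();
    }
}

/** Hypothetical no-store summarizer: folds each value into one running result. */
class RunningSumSketch {
    private int sum;

    synchronized void add(int value) {
        sum += value;
    }

    synchronized int evaluate() {
        return sum;
    }

    /** What a periodic flush would call. */
    synchronized void reset() {
        sum = 0;
    }
}
```

The first class grows with the number of values it receives; the second holds a single int regardless of how many values arrive, which is the smaller footprint mentioned above, at the price of not being able to compute a median.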


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liviu Tudor updated SANDBOX-341:
--------------------------------

    Attachment: commons-functor-aggregate+summarizer.zip

Diff taken against trunk on 29/Aug/2011 (about midnight in UK :)

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: SANDBOX-341
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-341
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: Functor
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip

       

[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094610#comment-13094610 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

Hi Liviu,
thanks for your contribution! Some suggestions before the patch can be applied:

 * Instead of adding new interfaces, I suggest reusing the existing functor APIs, i.e. if I were you, I would modify the {{org.apache.commons.functor.aggregate.Aggregator<T>}} interface to inherit from {{org.apache.commons.functor.UnaryProcedure<A>}}; take a look at the existing codebase, don't just add new stuff!
 * The Apache header is missing from every class you contributed; it must be included before the patch can be applied;
 * Please remove the @author tags; committers/contributors are mentioned in the POM (ignore existing tags) - add yourself to the contributors list in the POM;
 * {{org.apache.commons.functor.summarizer.TimedSummarizer}} could inherit from {{org.apache.commons.functor.BinaryFunction}};
 * I didn't understand why {{org.apache.commons.functor.summarizer.TimedSummarizer#MAIN_TIMER}} is static; IIUC it could cause some leaks.
 * Test cases must be implemented as valid JUnit tests executed by the surefire plugin; AFAIK using static main() methods as tests is discouraged.
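
As an illustration of the first suggestion above, here is a rough sketch of an aggregator that takes its data through a procedure-style contract. To stay self-contained it declares a local stand-in, {{UnaryProcedureSketch}}, instead of importing the real {{UnaryProcedure}}; the {{run(A)}} signature is an assumption to be checked against the functor codebase, and every other name here is hypothetical.

```java
/**
 * Local stand-in for org.apache.commons.functor.UnaryProcedure.
 * The run(A) signature is an assumption, not copied from the library.
 */
interface UnaryProcedureSketch<A> {
    void run(A obj);
}

/** Hypothetical aggregator that accepts data via the inherited run(T) method. */
interface ProcedureAggregatorSketch<T> extends UnaryProcedureSketch<T> {
    /** Returns the current aggregated value. */
    T evaluate();

    /** Discards everything aggregated so far. */
    void reset();
}

/** Hypothetical implementation that sums integers. */
class IntegerSumAggregatorSketch implements ProcedureAggregatorSketch<Integer> {
    private int sum;

    public synchronized void run(Integer value) {
        sum += value;
    }

    public synchronized Integer evaluate() {
        return sum;
    }

    public synchronized void reset() {
        sum = 0;
    }
}
```

The point of the inheritance is that any code which already accepts a unary procedure (for example, something that walks a collection and calls {{run}} on each element) could feed an aggregator without knowing about the aggregate package.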

Many thanks in advance for your effort, looking forward to the next patch! ;)


       

[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094689#comment-13094689 ]

Liviu Tudor commented on SANDBOX-341:
-------------------------------------

Hi Simone,

Thanks for the feedback -- this is my first commit, so I expected a lot of things to go wrong.
Since my code was initially committed to {{commons.lang}}, I didn't have access to the classes you mentioned, so I will have a look and use them.
Secondly, regarding the Apache header, am I correct in assuming that simply copying and pasting it from an existing class in the repository is enough? Point taken about the {{@author}} tag; that was generated automatically by Eclipse (note to self: adjust my _Eclipse_ templates to remove that and a few other things!).
I didn't spend too much time on unit testing purely because I still wasn't sure whether this was the right place to commit this component; now that I am on the right track, I will spend some time implementing the tests properly.
Last but not least, {{org.apache.commons.functor.summarizer.TimedSummarizer#MAIN_TIMER}} is static in order to avoid creating a new one for each instance -- but you are right, it can create a memory leak. I will spend some time on it and probably provide a factory for creating instances of the {{TimedSummarizer}} class, which will allow an instance to use either its own {{Timer}} or a shared one; this way "power users" can use the shared one and minimize the memory and threading footprint, while Joe Average can still avoid the memory leak by using a per-instance {{Timer}}. Do you think that would be a good idea?
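
One possible shape for the factory idea described above, as a sketch under stated assumptions rather than the planned implementation: the class {{TimedSumSketch}}, its factory methods {{withOwnTimer}} and {{withSharedTimer}}, and {{stop()}} are all hypothetical names. Only {{java.util.Timer}} and {{java.util.TimerTask}} are real APIs.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical timed summarizer whose instances use either a private or a shared Timer. */
final class TimedSumSketch {
    /** Lazily created timer shared by all instances built with withSharedTimer(). */
    private static Timer sharedTimer;

    private final Timer timer;
    private final boolean ownsTimer;
    private TimerTask flushTask;
    private int sum;

    private TimedSumSketch(Timer timer, boolean ownsTimer) {
        this.timer = timer;
        this.ownsTimer = ownsTimer;
    }

    /** Per-instance timer: one extra thread per instance, released completely by stop(). */
    static TimedSumSketch withOwnTimer(long intervalMillis) {
        TimedSumSketch s = new TimedSumSketch(new Timer(true), true);
        s.start(intervalMillis);
        return s;
    }

    /** Shared timer: a single daemon thread serves every instance created this way. */
    static TimedSumSketch withSharedTimer(long intervalMillis) {
        TimedSumSketch s = new TimedSumSketch(getSharedTimer(), false);
        s.start(intervalMillis);
        return s;
    }

    private static synchronized Timer getSharedTimer() {
        if (sharedTimer == null) {
            sharedTimer = new Timer("shared-flush-timer", true);
        }
        return sharedTimer;
    }

    /** Schedules the periodic flush once the instance is fully constructed. */
    private synchronized void start(long intervalMillis) {
        flushTask = new TimerTask() {
            @Override
            public void run() {
                reset();
            }
        };
        timer.schedule(flushTask, intervalMillis, intervalMillis);
    }

    synchronized void add(int value) {
        sum += value;
    }

    synchronized int evaluate() {
        return sum;
    }

    synchronized void reset() {
        sum = 0;
    }

    /** Cancels this instance's flush task; cancels the timer too, but only if it is private. */
    synchronized void stop() {
        flushTask.cancel();
        if (ownsTimer) {
            timer.cancel();
        }
    }
}
```

In this sketch the leak risk of the shared variant comes from the timer's task queue: each scheduled task holds a reference to its summarizer, so an instance that is never stopped stays reachable for as long as the static timer lives. That is why the sketch gives callers an explicit {{stop()}}.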


       

[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096001#comment-13096001 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

Hi Liviu!
about the Apache header, just open existing functor classes and copy it from them, that is more than enough :P

It would be really appreciated if you could improve the unit tests (also take a look at the generated code coverage report); otherwise, as you can understand, people would be a little reluctant to add untested code...

I tend to agree that passing in an existing {{Timer}} would let more advanced users manage their resources better; go for it and see how things go :)

Looking forward to the next patch!


       

[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099643#comment-13099643 ]

Liviu Tudor commented on SANDBOX-341:
-------------------------------------

OK, so I reckon I'm 90% there, and I'm submitting this patch to see what I'm doing right and what I'm doing wrong.
I've reworked the whole package structure to use {{UnaryFunction}} and {{BinaryFunction}}, and as a result I ended up with just one package (aggregator), where the _summarizer_ becomes just an implementation of that ({{AbstractTimedAggregator}}). I'm still not 100% sure about having {{AbstractListBackedAggregator}} and {{AbstractNoStoreAggregator}} extend {{AbstractTimedAggregator}} -- and I wanted to run this past you guys to see what would be a more elegant solution.
I need to provide unit tests for {{AbstractTimedAggregator}} and add a couple more tests to the existing ones to cover timer support as well, but otherwise I think it's done.
Would love some feedback and suggestions.


       

[jira] [Updated] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

     [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liviu Tudor updated SANDBOX-341:
--------------------------------

    Attachment: commons-functor.patch.bz2

Updated code -- 90% done (hopefully!)


       

[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102461#comment-13102461 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

I had a look at the new patch; a few more minor comments:

 * since {{Aggregator<T>}} now extends {{Function<T>}}, there's no need to redeclare the {{T evaluate()}} method;
 * in abstract implementations, class members have to be accessed via getters/setters instead of making the fields protected (checkstyle would complain about it).
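
Both comments above can be illustrated with a small sketch. It uses a local stand-in, {{FunctionSketch}}, for the library's {{Function<T>}} (the no-argument {{evaluate()}} signature is taken from the comment above); the aggregator interface, the abstract class and the concrete subclass are hypothetical and are not taken from the patch.

```java
/** Local stand-in for org.apache.commons.functor.Function with its T evaluate() method. */
interface FunctionSketch<T> {
    T evaluate();
}

/**
 * Hypothetical aggregator interface: evaluate() is inherited from FunctionSketch,
 * so only the operations that are new to aggregators are declared here.
 */
interface FunctionAggregatorSketch<T> extends FunctionSketch<T> {
    void add(T data);

    void reset();
}

/** Hypothetical abstract base: state is private and reached through accessors. */
abstract class AbstractSumAggregatorSketch implements FunctionAggregatorSketch<Integer> {
    private int total; // private rather than protected

    protected final int getTotal() {
        return total;
    }

    protected final void setTotal(int total) {
        this.total = total;
    }

    public Integer evaluate() {
        return getTotal();
    }

    public void reset() {
        setTotal(0);
    }
}

/** Hypothetical subclass: touches the total only via the accessors. */
class PositiveOnlySumSketch extends AbstractSumAggregatorSketch {
    public void add(Integer data) {
        if (data > 0) {
            setTotal(getTotal() + data);
        }
    }
}
```

Keeping the field private means the base class can later change how the total is stored (or add locking around it) without breaking subclasses, which is the usual argument behind this kind of style rule.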

Moreover, a favor: can you please create a documentation page under {{src/site/xdoc/}}? That would be useful for making users aware of this feature (and yes, the rest of the documentation is still a TODO :P)

Many thanks in advance for your hard work!

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: SANDBOX-341
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-341
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: Functor
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2
>
>
> This is the next step from https://issues.apache.org/jira/browse/SANDBOX-340 -- as instructed I'm finally hoping to get the code in the right place and hopefully this is something that the functor component could do with.
> Whereas initially I just started with the summarizer component, I have added now the second one, the "aggregator" as they are somehow related. If this code proves to be useful to functor in any way, it would actually be good to get some feedback on these 2 to see if the class hierarchy can in fact be changed to share some common functionality as I feel (probably due to the similar needs that lead to writing/using these components) that somehow they should share a common base.
> In brief, the 2 components:
> * aggregator: this just allows for data to be aggregated in a user defined way (e.g. stored in a list for the purpose of averaging, computing the arithmetic median etc). The classes provided actually offer the implementation for storing data in a list and computing the above-mentioned values or summing up everything.
> * timed summarizer: this is another variation of the aggreator, however, it adds the idea of regular "flushes", so based on a timer it will reset the value and start summing/aggregating the data again. Rather than using an aggregator which would store the whole data series (possibly for applying more complex formulas), this component just computes on the fly on each request the formula and stores the result of it. (Which does mean things like computing arithmetic mean, median etc would be difficult to compute without knowing upfront how many calls will be received -- i.e. how many elements we will be required to summarize/aggregate.) So the memory footprint of running this is much smaller -- even though, as I said, it achieves similar results. I have only provided a summarizer which operates on integers, but obviously others for float, double etc can be created if we go ahead with this design.
> Hopefully the above make sense; this code has resulted from finding myself writing similar components to these a few times and because it's always been either one type (e.g. aggregator) or another (summarizer) I haven't given quite possibly enough thought to the class design to join these 2. Also, unfortunately, the time I sat down to make these components a bit more general and submitted issue 340 was nearly 3 months ago so I'm trying to remember myself all the ideas I had at a time so bear with me please if these are still  a bit fuzzy :) However, if you can make use of these I'm quite happy to elaborate on areas that are unclear and obviously put some effort into getting these components to the standards required to put these into a release.


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102471#comment-13102471 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

I forgot to ask one more favor: please also have a look at the generated findbugs/checkstyle reports and fix any errors/violations in the classes of the package you added.
TIA!


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102724#comment-13102724 ]

Liviu Tudor commented on SANDBOX-341:
-------------------------------------

Hi Simone,

Agreed on not needing the {{evaluate()}} function -- it was only included there so I could have a JavaDoc section for it, but I've moved that into the class header.
Will make the changes to use getters/setters for abstract classes; meanwhile though, to help me address all the checkstyle/findbugs issues, can you point me in the right direction as to how to run findbugs/checkstyle against the code and where I would find the results? (Sorry, I'm a bit new to the whole maven setup -- and luckily in most environments I've used it in so far, I relied on the automated hudson/integration to produce these nightly reports :)
With regards to the document under {{src/site/xdoc}} -- what should it include? A high-level overview of the components, possibly with sample code and usage patterns? Or what else should I touch on?
One last thing: going back through your comments, I've only just noticed a small discrepancy -- you asked me to add myself to the list of developers in the {{pom.xml}}, but I opted to put myself on the contributors list, as I'm guessing _developers_ would be the Apache peeps actually maintaining/managing this project, right? Not a big thing really, just trying to iron out all the small details at the moment, so thought I'd ask about this as well.
Sorry to be a pain!


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102956#comment-13102956 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

Hi Liviu!
no pain at all - smart and volunteering people like you are the lifeblood of a community's health, so I am more than happy to put contributors in a position to work in the best way ;)

So, to produce those reports, it is enough to run {{mvn clean site && open target/site/checkstyle.html}}; it will produce a page like the one currently online[1].
Then, if you {{open target/site/cobertura/index.html}} you can have a look at the generated cobertura report[2].
Just take care of the classes you put in the summarize and aggregate package, no need to take care of the rest - I'm doing that in my spare time; feel free to file a new issue with a new patch anyway ;)

Documentation: take a look at the existing pages; the xdoc format[3] is ~html, so not hard to write. And yes, just an overview of the added classes, samples, patterns... from the PoVs of both users (how to write client code) and developers (extending the APIs, if allowed, etc.)

Pom: you did right, sorry for the mistake; {{<contributors>}} is actually the right place to add your name - hopefully you will become a committer :)

Last: the right place to share thoughts and discuss is the dev@ ML; in Apache there's the mantra "if it didn't happen on the list, it didn't happen", and JIRA is more like a "wall stories" reminder ;)

Looking forward to the next patch, have a nice day!

[1] http://commons.apache.org/sandbox/functor/checkstyle.html
[2] http://commons.apache.org/sandbox/functor/cobertura/index.html
[3] http://maven.apache.org/doxia/references/xdoc-format.html


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105946#comment-13105946 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

Hi Liviu,
please update your local copy of the code: I recently ported the tests to JUnit 4 style.


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107751#comment-13107751 ]

Liviu Tudor commented on SANDBOX-341:
-------------------------------------

Hi Simone, thanks for the heads-up on this. Sorry, I was away last week and have only just got back. I will go through the code and your comments and update the code.


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126523#comment-13126523 ]

Simone Tripodi commented on SANDBOX-341:
----------------------------------------

Hi Liviu,
how are things going? Do you need any help from my side to finalize it?
TIA,
Simo
               


[jira] [Commented] (SANDBOX-341) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/SANDBOX-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129700#comment-13129700 ]

Liviu Tudor commented on SANDBOX-341:
-------------------------------------

Hi Simone,

I've already replied to you in private; this is just so the other maintainers are aware of it: I'm currently in the process of relocating to Palo Alto, CA (anyone living in the Bay area up for an informal meet at some point? ;) so things are a bit hectic, but I'm still planning to finish this. ETA is about 2 weeks -- so end of October.
Hope this is ok,

Liv
               

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: SANDBOX-341
>                 URL: https://issues.apache.org/jira/browse/SANDBOX-341
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: Functor
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2

[jira] [Moved] (FUNCTOR-1) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

     [ https://issues.apache.org/jira/browse/FUNCTOR-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simone Tripodi moved SANDBOX-341 to FUNCTOR-1:
----------------------------------------------

    Component/s:     (was: Functor)
     Issue Type: New Feature  (was: Improvement)
            Key: FUNCTOR-1  (was: SANDBOX-341)
        Project: Commons Functor  (was: Commons Sandbox)
   

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: FUNCTOR-1
>                 URL: https://issues.apache.org/jira/browse/FUNCTOR-1
>             Project: Commons Functor
>          Issue Type: New Feature
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2

[jira] [Commented] (FUNCTOR-1) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/FUNCTOR-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212470#comment-13212470 ]

Liviu Tudor commented on FUNCTOR-1:
-----------------------------------

Hi Simone,

I finally got around to doing the coding for this. Please have a look at the attached patch -- I ran the findbugs/checkstyle checks as instructed and, as far as I can tell, it looks OK.
Looking forward to your comments on it!
               

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: FUNCTOR-1
>                 URL: https://issues.apache.org/jira/browse/FUNCTOR-1
>             Project: Commons Functor
>          Issue Type: New Feature
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Assignee: Simone Tripodi
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2

[jira] [Updated] (FUNCTOR-1) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

     [ https://issues.apache.org/jira/browse/FUNCTOR-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liviu Tudor updated FUNCTOR-1:
------------------------------

    Attachment: functor.patch.bz2

Patch as per last comment.
               

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: FUNCTOR-1
>                 URL: https://issues.apache.org/jira/browse/FUNCTOR-1
>             Project: Commons Functor
>          Issue Type: New Feature
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Assignee: Simone Tripodi
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2, functor.patch.bz2

[jira] [Commented] (FUNCTOR-1) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/FUNCTOR-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213013#comment-13213013 ]

Liviu Tudor commented on FUNCTOR-1:
-----------------------------------

Question -- looking at FUNCTOR-4, I need to add some examples etc. for these components to make them easier for users. Where would these examples go?
               

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: FUNCTOR-1
>                 URL: https://issues.apache.org/jira/browse/FUNCTOR-1
>             Project: Commons Functor
>          Issue Type: New Feature
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Assignee: Simone Tripodi
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2, functor.patch.bz2

[jira] [Commented] (FUNCTOR-1) [functor] New components: summarize and aggregate

Michael Osipov (Jira)
In reply to this post by Michael Osipov (Jira)

    [ https://issues.apache.org/jira/browse/FUNCTOR-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216357#comment-13216357 ]

Simone Tripodi commented on FUNCTOR-1:
--------------------------------------

Hi Liviu,
thanks for the contribution; the patch looks good. I have a few observations before applying it, though:

 * the name {{*AggregatorFunction_2_}} doesn't help, since it doesn't express any semantics. Can you change these to more self-descriptive names?

 * In the {{org.apache.commons.functor.aggregate.functions}} package I see only the implementations for {{Double}} and one {{Integer}}-related class - maybe you forgot to {{svn add}} the other implementations?

 * Could you also provide implementations for the other primitive wrappers?
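
As an illustration of the naming and per-wrapper points above, here is a minimal sketch. Every class name in it is hypothetical and not taken from the patch, and the {{BinaryFunction}} interface is declared locally (shaped like a two-argument functor) only so the snippet compiles on its own. The idea is simply that each name states the wrapper type and the operation it performs:

```java
import java.util.Arrays;
import java.util.List;

/** Local stand-in for a two-argument functor; an assumption made to keep the sketch self-contained. */
interface BinaryFunction<L, R, T> {
    T evaluate(L left, R right);
}

/** Hypothetical name: says which wrapper it handles (Integer) and what it does (sum). */
final class IntegerSumAggregatorBinaryFunction implements BinaryFunction<Integer, Integer, Integer> {
    public Integer evaluate(Integer left, Integer right) {
        return left + right;
    }
}

/** Hypothetical name: the same pattern applied to another primitive wrapper (Long, max). */
final class LongMaxAggregatorBinaryFunction implements BinaryFunction<Long, Long, Long> {
    public Long evaluate(Long left, Long right) {
        return Math.max(left, right);
    }
}

final class AggregatorNamingSketch {
    /** Folds the values into a single result by applying the function pairwise, left to right. */
    static <T> T aggregate(T initial, List<T> values, BinaryFunction<T, T, T> function) {
        T result = initial;
        for (T value : values) {
            result = function.evaluate(result, value);
        }
        return result;
    }

    public static void main(String[] args) {
        Integer sum = aggregate(0, Arrays.asList(1, 2, 3), new IntegerSumAggregatorBinaryFunction());
        Long max = aggregate(Long.MIN_VALUE, Arrays.asList(4L, 9L, 2L), new LongMaxAggregatorBinaryFunction());
        System.out.println(sum + " " + max); // prints "6 9"
    }
}
```

With names of this shape, a reader can tell from the class name alone which wrapper type and which operation are involved, and a missing implementation for a given wrapper is easy to spot in the package listing.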

Documentation under {{src/site/xdoc}} is more than fine and satisfies FUNCTOR-4 - feel free to enhance the docs by adding more samples!

Thanks for the hard work, looking forward to the next patch version!
-Simo
               

> [functor] New components: summarize and aggregate
> -------------------------------------------------
>
>                 Key: FUNCTOR-1
>                 URL: https://issues.apache.org/jira/browse/FUNCTOR-1
>             Project: Commons Functor
>          Issue Type: New Feature
>         Environment: JDK 1.6.0_25 but should work with any JDK 5+ (possibly 1.4 though I haven't tested).
>            Reporter: Liviu Tudor
>            Assignee: Simone Tripodi
>            Priority: Minor
>              Labels: features
>         Attachments: commons-functor-aggregate+summarizer.zip, commons-functor.patch.bz2, functor.patch.bz2