New Sub-project Proposal.

New Sub-project Proposal.

Claude Warren
Having spoken with several people at ApacheCon, I would like to see a
Bloom filter sub-project.  I have code that is already under the Apache
License that I am willing to contribute as the basis.  The goal of the
sub-project would be to produce a reference implementation that could be
used by other projects that want to use Bloom filters and Bloom filter based
collections.

Is there any objection to doing this?  Other than asking here, what is the
proper path to get a sub-project created?  What does the Commons PMC
require?

Any assistance and comments would be appreciated.
Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: New Sub-project Proposal.

jochen-2
Hi, Claude,

Having read what a Bloom filter is, a sub-project sounds unnecessary
to me. I'd recommend that you contribute your code to Commons
Collections, which seems to me to be a logical target.

Jochen

On Tue, Sep 10, 2019 at 8:45 PM Claude Warren <[hidden email]> wrote:

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: New Sub-project Proposal.

Bruno P. Kinoshita-2
 +1 Collections sounds like a good place for a bloom filter.

Bruno

On Wednesday, 11 September 2019, 8:00:45 am NZST, Jochen Wiedmann <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

garydgregory
I would like to know more. I am curious since looking up whether an element
is in a set is done via a hash code. How do you do better than that?

Gary

On Tue, Sep 10, 2019, 16:51 Bruno P. Kinoshita <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

sebb-2-2
On Wed, 11 Sep 2019 at 12:36, Gary Gregory <[hidden email]> wrote:
>
> I would like to know more. I am curious since looking up whether an element
> is in a set is done via a hash code. How do you do better than that?

Wikipedia has a good explanation:

https://en.wikipedia.org/wiki/Bloom_filter

Basically instead of a hash you create a bit mask and set/test that.

This can give false positives, but not false negatives.


Re: New Sub-project Proposal.

Claude Warren
In reply to this post by garydgregory
First, it is important to remember that Bloom filters tell you where things
are NOT.  Second, it is important to understand that Bloom filters can give
false positives but never false negatives.  That seems kind of pointless, I
know, but consider the case where you have 10K buckets that may contain the
item you are looking for.  If you can reduce the number of buckets you are
searching, you can significantly speed up the search.  In a case like this a
Bloom filter could be used "in front" of each bucket as a gatekeeper.
Whenever an object goes into the bucket, the object's Bloom filter is added
to the bucket's Bloom filter.  If you want to search the 10K buckets for an
item, you build the Bloom filter for the item you are looking for and check
it against the Bloom filter on each bucket.  If the filter says that the
item is not in the bucket, you can skip that bucket; if the filter says it
is in the bucket, you search the bucket to verify that it is not a false
positive.

A common use for Bloom filters is to determine whether an expensive call
should be made.  For example, many browsers have a Bloom filter that
comprises all the known bad URLs (ones that serve malware, etc.).  When a
URL is entered in the browser it is checked against the Bloom filter.  If
it is not there, the request goes through as normal.  If it is there, the
browser makes the expensive lookup call to a server to determine whether the
URL really is in the database of bad URLs.
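
The gatekeeper arrangement described above could be sketched in Java roughly
as follows.  This is only an illustration under stated assumptions, not the
code being proposed: the class name GatedBucket, the 1024-bit filter length,
the three seeded String.hashCode() hashes and the in-memory set standing in
for an expensive store are all made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: a bucket guarded by a Bloom filter "gatekeeper". */
final class GatedBucket {
    static final int BITS = 1024; // filter length, chosen arbitrarily for the sketch
    static final int HASHES = 3;  // number of hash functions, also arbitrary

    private final BitSet filter = new BitSet(BITS);
    private final Set<String> items = new HashSet<>(); // stands in for an expensive store

    /** Builds the Bloom filter for one item (illustrative hashing, not production quality). */
    static BitSet filterFor(String item) {
        BitSet bits = new BitSet(BITS);
        for (int seed = 0; seed < HASHES; seed++) {
            int hash = (item + "#" + seed).hashCode();
            bits.set(Math.floorMod(hash, BITS));
        }
        return bits;
    }

    /** Stores the item and merges its filter into the bucket's filter. */
    void add(String item) {
        filter.or(filterFor(item));
        items.add(item);
    }

    /** False means the item is definitely absent; true means it may be present. */
    boolean mightContain(BitSet target) {
        BitSet overlap = (BitSet) target.clone();
        overlap.and(filter);
        return overlap.equals(target);
    }

    /** The "expensive" verification that rules out false positives. */
    boolean contains(String item) {
        return items.contains(item);
    }

    /** Searches only the buckets whose gatekeeper cannot rule the item out. */
    static List<GatedBucket> search(List<GatedBucket> buckets, String item) {
        BitSet target = filterFor(item);
        List<GatedBucket> hits = new ArrayList<>();
        for (GatedBucket bucket : buckets) {
            if (bucket.mightContain(target) && bucket.contains(item)) {
                hits.add(bucket);
            }
        }
        return hits;
    }
}
```

The target filter is built once per search and reused for every bucket, so
the per-bucket cost is a single bitwise comparison.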

So a Bloom filter is generally used to front a collection to determine
whether the collection should be searched.  And as has been pointed out, it
doesn't make much sense to use it in front of an in-memory hash table.
However, applications like Cassandra and Hadoop use Bloom filters for various
reasons.  I have recently been made aware of an Apache Incubator project
that wants to implement Bloom filters as part of their project.

Other uses for Bloom filters include sharding data.  There is a measure of
difference between filters called the Hamming distance.  This is the number
of bits that have to be "flipped" to turn one filter into another, and is
very similar to the Hamming measures found in string and other similar
comparisons.  By using the Hamming distance it is possible to distribute
data among a set of buckets by simply putting each value in the bucket that
it is "closest" to in terms of Hamming distance.  Searching takes place as
noted above.  This has some interesting properties.  For example, you can
add new buckets at any time simply by adding an empty bucket and Bloom
filter to the collection of buckets, and the system will start filling the
bucket as appropriate.  In addition, if a bucket/shard becomes "full", where
"full" is an implementation-dependent decision (e.g. the index on a DB table
reaches the inflection point where performance degradation begins), you can
pull the bucket out of consideration for inserts but still search it without
significant stress or change to the system.
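
A minimal sketch of the Hamming-distance placement described above, assuming
the filters are held as java.util.BitSet instances.  The class and method
names are hypothetical, and breaking ties in favour of the lowest index is
an arbitrary choice made for the example.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: choosing a shard by Hamming distance between filters. */
final class HammingSharding {

    /** Number of bit positions in which the two filters differ. */
    static int hammingDistance(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b);
        return diff.cardinality();
    }

    /** Index of the bucket filter closest to the item filter; ties go to the lowest index. */
    static int closestBucket(List<BitSet> bucketFilters, BitSet itemFilter) {
        if (bucketFilters.isEmpty()) {
            throw new IllegalArgumentException("at least one bucket is required");
        }
        int best = 0;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < bucketFilters.size(); i++) {
            int distance = hammingDistance(bucketFilters.get(i), itemFilter);
            if (distance < bestDistance) {
                best = i;
                bestDistance = distance;
            }
        }
        return best;
    }

    /** Merges the item filter into the closest bucket filter and returns that bucket's index. */
    static int insert(List<BitSet> bucketFilters, BitSet itemFilter) {
        int index = closestBucket(bucketFilters, itemFilter);
        bucketFilters.get(index).or(itemFilter);
        return index;
    }
}
```

Under this rule a newly added empty bucket sits at a distance equal to the
number of bits set in the item's filter, so it starts receiving items
whenever every existing bucket is further away than that.  Taking a "full"
bucket out of rotation amounts to leaving its filter out of the list passed
to insert while still consulting it during searches.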

Internally, Bloom filters are bit vectors.  The length of the vector is
determined by the number of items that are to be placed in the bucket and
the acceptable false positive (hash collision) rate.  There is a function
that will calculate the length of the vector and the number of hash
functions to use to turn on the bits.[1]  In general you build a Bloom
filter by creating a hash and using the modulus of that, with respect to the
vector length, to determine which bit in the vector to turn on.  You then
run a second hash, usually the same hash function with a different seed, to
determine the next bit, and so on until the number of functions has been
executed.  Importantly, there are comments in the Cassandra code that
describe a new and much faster way of doing this using 128-bit hashes and
splitting them into a pair of longs.

To check membership in a Bloom filter you build the Bloom filter for the
target (T - the thing we are looking for), get the filter for the candidate
(C - the bucket), and evaluate T&C = T.  If it evaluates as true there is a
match (possibly a false positive); if it does not, then T is guaranteed not
to be in the bucket.
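
The sizing, construction and membership test described above might look
something like the following sketch.  Several things here are assumptions
rather than anything taken from the contributed code or from Cassandra: the
sizing uses the standard formulas m = ceil(-n ln p / (ln 2)^2) and
k = round((m / n) ln 2); MD5 stands in for "a 128-bit hash" simply because
it ships with the JDK; and the two longs are combined as h1 + i * h2, which
is one common way to derive k bit positions from a pair of hashes.  The
class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

/** Hypothetical sketch: sizing, building and testing a Bloom filter. */
final class SimpleBloomSketch {

    /** Vector length m for n items and false positive rate p. */
    static int numberOfBits(int n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Number of hash functions k for n items in a vector of m bits (at least 1). */
    static int numberOfHashFunctions(int n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Builds the filter for one item from a single 128-bit hash split into two longs. */
    static BitSet filterFor(String item, int m, int k) {
        byte[] digest;
        try {
            // MD5 is an assumption for the sketch; any 128-bit hash would do.
            digest = MessageDigest.getInstance("MD5")
                    .digest(item.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
        ByteBuffer buffer = ByteBuffer.wrap(digest);
        long h1 = buffer.getLong();
        long h2 = buffer.getLong();
        BitSet bits = new BitSet(m);
        for (int i = 0; i < k; i++) {
            // floorMod keeps the index non-negative even when the sum overflows.
            bits.set((int) Math.floorMod(h1 + i * h2, (long) m));
        }
        return bits;
    }

    /** Evaluates T & C == T: false means the target is definitely not in the candidate. */
    static boolean mightContain(BitSet candidate, BitSet target) {
        BitSet overlap = (BitSet) target.clone();
        overlap.and(candidate);
        return overlap.equals(target);
    }
}
```

For 1000 items and a 1% false positive rate these formulas give 9586 bits
and 7 hash functions.  Adding an item to a bucket is then just
bucket.or(filterFor(item, m, k)).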

There are several of us at Apache that work on Bloom filters and we have
been unable to locate an open source library that is under the Apache or a
similar license.  I have done work on a concept called a proto-bloom filter
that does the hashing early and then makes it faster to generate concrete
Bloom filters of various sizes, thus enabling a more efficient layering of
filters.

Several of us have done research into ways to index filters so that if you
have a collection of filters you can quickly locate the candidates.  This
is not as simple as it sounds due to the way in which filters are checked
and the issues with overfilled filters yielding high false positive
counts.  In addition, the check is so fast that the overhead for most
indexing eats up any increase in speed.  My research has shown that for
filter collections of fewer than 1000 entries it is always faster to do a
linear search through an array than any other means.  Above 1000 entries
there are techniques that can yield faster evaluation in some cases.

The long and short of this is that there is no good unencumbered open
source library available at the current time.  Several others and I,
in conversation here at ApacheCon, have expressed interest in creating such
a library.  We have fairly mature code that we are willing to contribute
along with code that embodies new thinking in the Bloom filter arena (like
proto-bloom filters).  We just need a space within the Apache family to
host it.  The code base seems too small to be a separate project and so we
come to Apache Commons seeking a home.

Claude

[1] https://hur.st/bloomfilter/

On Wed, Sep 11, 2019 at 12:36 PM Gary Gregory <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

Claude Warren
In reply to this post by sebb-2-2
As another note, we have had discussions here at ApacheCon about developing
a method to exchange bloom filter hashing algorithms to make it easier for
systems to publish interfaces where bloom filters are passed as the search
parameters.

Also, Bloom filters are good for looking for "and"ed values.  So if I
create Bloom filters for cars based on model, make and color, and use those
in the bucket filter, then I can determine whether there are any red cars,
or red cars made by Ford, in the bucket.
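
A rough sketch of the "and"ed lookup, under the assumption that each
attribute value (e.g. "color=red") is hashed separately and the resulting
bits are OR-ed together.  The class name, the 256-bit length and the seeded
String.hashCode() hashing are illustrative only.  Note that because a bucket
filter merges many cars, a match on "red" and "Ford" only says the bucket
may hold a red Ford; it could also be matching a red car and a Ford that are
different cars, so the bucket still has to be searched to confirm.

```java
import java.util.BitSet;

/** Hypothetical sketch: querying a bucket filter for "and"ed attribute values. */
final class AttributeFilterSketch {
    static final int BITS = 256; // filter length, arbitrary for the sketch
    static final int HASHES = 3; // hash functions per attribute, also arbitrary

    /** Builds one filter covering all the given attribute values. */
    static BitSet filterFor(String... attributes) {
        BitSet bits = new BitSet(BITS);
        for (String attribute : attributes) {
            for (int seed = 0; seed < HASHES; seed++) {
                bits.set(Math.floorMod((attribute + "#" + seed).hashCode(), BITS));
            }
        }
        return bits;
    }

    /** True if every bit of the query is set in the bucket filter. */
    static boolean mightContain(BitSet bucket, BitSet query) {
        BitSet overlap = (BitSet) query.clone();
        overlap.and(bucket);
        return overlap.equals(query);
    }
}
```

A bucket holding a red Ford Focus would be built with
bucket.or(filterFor("color=red", "make=Ford", "model=Focus")) and could then
be queried with filterFor("color=red") or
filterFor("color=red", "make=Ford").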

Claude

On Wed, Sep 11, 2019 at 3:49 PM sebb <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

Gilles Sadowski-2
In reply to this post by Claude Warren
Hi.

Le mer. 11 sept. 2019 à 17:06, Claude Warren <[hidden email]> a écrit :

>
> [...]
>
> The long and short of this is that there is no good unencumbered open
> source library available at the current time.  Myself and several others,
> in conversation here at ApacheCon, have expressed interest in creating such
> a library.  We have fairly mature code that we are willing to contribute
> along with code that embodies new thinking in the bloom filter arena (like
> proto-bloom filters).  We just need a space within the Apache family to
> host it.  The code base seems to small to be a separate project and so we
> come to Apache Commons seeking a home.

IMO, a pretty compelling rationale for hosting it at "Commons".
If people think that [Collections] would be the best home, I'd suggest
making that component modular; hence unnecessary dependencies
would be a non-issue.

Regards,
Gilles


Re: New Sub-project Proposal.

Stian Soiland-Reyes
On Wed, 11 Sep 2019 18:12:24 +0200, Gilles Sadowski <[hidden email]> wrote:

> > The long and short of this is that there is no good unencumbered open
> > source library available at the current time.  Myself and several others,
> > in conversation here at ApacheCon, have expressed interest in creating such
> > a library.  We have fairly mature code that we are willing to contribute
> > along with code that embodies new thinking in the bloom filter arena (like
> > proto-bloom filters).  We just need a space within the Apache family to
> > host it.  The code base seems to small to be a separate project and so we
> > come to Apache Commons seeking a home.
>
> IMO, a pretty compelling rationale for hosting it at "Commons".
> If people think that [Collections] would be the best home, I'd suggest
> making that component modular; hence unnecessary dependencies
> would be a non-issue.

Thanks Claude for that brilliant explanation about Bloom filters!
(please blog it!)

At the moment Commons Collections has no runtime dependencies, and only
3 test dependencies.
<https://github.com/apache/commons-collections/blob/master/pom.xml#L443>

So unless the Bloom filter code comes with any new dependencies seen to
"bloat" the rest of Commons, it could fit well in there.

It would be a new package "bloom", as it's something to use for building
collections rather than directly being a collection - but Collections
already has similar packages for balanced trees, etc.


Looking at the code, which I suspect is
<https://github.com/Claudenw/BloomFilter/tree/master/src/main/java/org/xenei/bloomfilter>
it looks pretty clean, independent and straightforward to read and understand.

From Claude's email I see it is its *use* that needs explanation!

The only dependencies in that code seem to be within the
org.xenei.bloomfilter.collections package, which currently includes use of
Jena's extended iterator classes.

This could probably be refactored, if this package is also to be
included? (those classes would also fit naturally into Collections)


--
Stian Soiland-Reyes
The University of Manchester 🐝
https://www.esciencelab.org.uk/
https://orcid.org/0000-0001-9842-9718



Re: New Sub-project Proposal.

Claude Warren
@stain. You have correctly identified the code in my repository.  The code
could be refactored to use streams, or we could bring the Jena iterator
extensions into Commons.  I had suggested that at one time but there were
concerns about conflicts with existing code.  Duplication of
functionality was the main concern, as I recall.

Claude

On Wed, Sep 11, 2019, 09:43 Stian Soiland-Reyes <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

Stian Soiland-Reyes
I certainly got thinking about streams for those methods using the fancy
iterators, yes. Commons Collections is already on JDK8, so if that is
sufficient, go for it!

We would need to do IP clearance to bring the code formally into the ASF. It
should be easy if it is just you who made it under the Apache license.

On Wed, 11 Sep 2019, 18:44 Claude Warren, <[hidden email]> wrote:

> [...]

Re: New Sub-project Proposal.

garydgregory
In reply to this post by Claude Warren
On Wed, Sep 11, 2019 at 11:06 AM Claude Warren <[hidden email]> wrote:

> First it is important to remember that Bloom filters tell you where things
> are NOT.  Second it is important to understand that Bloom filters can give
> false positives but never false negatives.  Seems kind of pointless I know
> but consider the case where you have 10K buckets that may contain the item
> you are looking for.  If you can reduce the number of buckets you are
> searching you can significantly speed up the search.  In a case like this a
> bloom filter could be used "in front" of each bucket as a gatekeeper.  When
> ever an object goes in the bucket the objects bloom filter is added to the
> bucket bloom filter.  If you want to search the 10K buckets for an item
> then you build the  bloom filter for the item you are looking for and check
> the bloom filter on each bucket.  If the filter says that the item is not
> in the bucket then you can skip that bucket, if the filter says it is in
> the bucket you search the bucket to verify that it is not a false
> positive.  A common use for bloom filters is to determine if an expensive
> call should be made.  For example many browsers have a bloom filter that
> comprises all the known bad URLs (ones that serve malware, etc).  When the
> URL is entered in the browser it is checked against the bloom filter.  If
> it is not there the request goes through as normal.  If it is there then
> the browser makes the expensive lookup call to a server to determine if the
> URL really is in the database of bad URLs.
>
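
As a rough illustration of the gatekeeper pattern described above, here is a minimal Java sketch. It assumes a toy filter built on java.util.BitSet with a fixed length and three bits per item; the names GatedBucket and GatekeeperDemo, the sizes, and the hashing scheme are all made up for this example and are not taken from the code being proposed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical bucket fronted by a Bloom filter; all names are illustrative. */
final class GatedBucket {
    private static final int BITS = 1 << 12; // filter length, chosen arbitrarily
    private static final int HASHES = 3;     // bits turned on per item

    private final BitSet gate = new BitSet(BITS);
    private final Set<String> items = new HashSet<>(); // stands in for slow storage
    int searches = 0; // how often the expensive search actually ran

    /** Filter for one item: a toy combination of two values derived from hashCode(). */
    static BitSet filterFor(String item) {
        BitSet bits = new BitSet(BITS);
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive steps differ
        for (int i = 0; i < HASHES; i++) {
            bits.set(Math.floorMod(h1 + i * h2, BITS));
        }
        return bits;
    }

    void add(String item) {
        gate.or(filterFor(item)); // merge the item's filter into the bucket's filter
        items.add(item);
    }

    /** False means the item is definitely absent; true means "maybe". */
    boolean mightContain(BitSet target) {
        BitSet common = (BitSet) target.clone();
        common.and(gate);
        return common.equals(target); // T & C == T
    }

    /** The expensive lookup that the gate is meant to avoid. */
    boolean search(String item) {
        searches++;
        return items.contains(item);
    }
}

final class GatekeeperDemo {
    /** Returns the buckets that really hold the item, skipping ruled-out buckets. */
    static List<GatedBucket> find(List<GatedBucket> buckets, String item) {
        BitSet target = GatedBucket.filterFor(item);
        List<GatedBucket> hits = new ArrayList<>();
        for (GatedBucket bucket : buckets) {
            if (bucket.mightContain(target) && bucket.search(item)) {
                hits.add(bucket);
            }
        }
        return hits;
    }
}
```

The searches counter is only there to make it visible how many buckets were actually opened; a "maybe" from the gate still has to be confirmed by the real search.
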
> So a bloom filter is generally used to front a collection to determine if
> the collection should be searched.  And as has been pointed out it doesn't
> make much sense to use it in front of an in-memory hash table.  However,
> applications like Cassandra and Hadoop use bloom filters for various
> reasons.  I have recently been made aware of an Apache Incubator project
> that wants to implement bloom filters as part of their project.  Other uses
> for bloom filters include sharding data.  There is a measure of difference
> between filters called the Hamming distance.  This is the number of bits
> that have to be "flipped" to turn one filter into another, and is very
> similar to Hamming measures found in string and other similar comparisons.
> By using the Hamming distance it is possible to distribute data among a set
> of buckets by simply putting the value in the bucket that it is "closest" to
> in terms of Hamming distance.  Searching takes place as noted above.
> However this has some interesting properties.  For example you can add new
> buckets at any time simply by adding an empty bucket and bloom filter to
> the collection of buckets and the system will start filling the bucket as
> appropriate.  In addition if a bucket/shard becomes "full", where "full" is
> an implementation dependent decision (e.g. the index on a DB table reaches
> the inflection point where performance degradation begins), you can pull a
> bucket out of consideration for inserts but still search it without
> significant stress or change to the system.
>
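
To make the sharding idea above concrete, here is a small sketch in Java. It assumes filters are held as java.util.BitSet instances; the class HammingSharding and the tie-breaking rule (lowest index wins) are assumptions made for this example only.

```java
import java.util.BitSet;
import java.util.List;

/** Illustrative helper for Hamming-distance based bucket selection. */
final class HammingSharding {

    /** Number of bit positions in which the two filters differ. */
    static int hammingDistance(BitSet a, BitSet b) {
        BitSet diff = (BitSet) a.clone();
        diff.xor(b); // bits set in exactly one of the two filters
        return diff.cardinality();
    }

    /**
     * Index of the bucket whose filter is closest to the item's filter.
     * Ties go to the lowest index (an arbitrary choice for this sketch).
     */
    static int closestBucket(List<BitSet> bucketFilters, BitSet itemFilter) {
        if (bucketFilters.isEmpty()) {
            throw new IllegalArgumentException("no buckets");
        }
        int best = 0;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < bucketFilters.size(); i++) {
            int d = hammingDistance(bucketFilters.get(i), itemFilter);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

A newly added bucket has an empty filter, so its distance to any item is just the number of bits set in the item's filter. Items that are far from every populated bucket therefore drift to the new bucket, which matches the "add an empty bucket at any time" behaviour described above.
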
> Internally Bloom filters are bit vectors.  The length of the vector is
> determined by the number of items that are to be placed in the bucket and
> the acceptable false positive rate.  There is a function that will
> calculate the length of the vector and the number of hash functions to use
> to turn on the bits.[1]  In general you build a bloom filter by creating a
> hash and using the modulus of that to determine which bit in the vector to
> turn on.  You then run a second hash, usually the same hash function with
> a different seed, to determine the next bit and so on until the number of
> functions has been executed.  Importantly, there are comments in the
> Cassandra code that describe a new and much faster way of doing this using
> 128-bit hashes and splitting them into a pair of longs.  To check
> membership in a bloom filter you build the bloom filter for the target (T -
> the thing we are looking for) and get the filter for the candidate (C - the
> bucket) and evaluate T&C = T.  If it evaluates as true there may be a
> match; if it does not then T is guaranteed not to be in the bucket.
>
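
The sizing, construction and membership steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sizing uses the standard formulas m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2), and the hash is a seeded FNV-1a style toy chosen to keep the example short. It is not the hash used by Cassandra or by the code being proposed, and the class name SimpleBloom is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative Bloom filter helpers; not the proposed library's API. */
final class SimpleBloom {

    /** Vector length m for n items at false positive rate p. */
    static int numberOfBits(int n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Number of hash functions k for n items in m bits (at least 1). */
    static int numberOfHashes(int n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Seeded FNV-1a style 64-bit hash: a toy stand-in for a real hash function. */
    static long hash(byte[] data, long seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Builds the filter for one item: k seeded hashes, each reduced modulo m. */
    static BitSet build(String item, int m, int k) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        BitSet bits = new BitSet(m);
        for (int seed = 0; seed < k; seed++) {
            bits.set((int) Math.floorMod(hash(data, seed), (long) m));
        }
        return bits;
    }

    /** T & C == T: false means the target is definitely not in the candidate. */
    static boolean matches(BitSet target, BitSet candidate) {
        BitSet and = (BitSet) target.clone();
        and.and(candidate);
        return and.equals(target);
    }
}
```

For example, numberOfBits(1000, 0.01) gives 9586 bits and numberOfHashes(1000, 9586) gives 7. A bucket filter is then the bitwise OR of the filters of everything placed in the bucket, and matches(target, bucket) is the check to run before searching it.
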
> There are several of us at Apache that work on bloom filters and we have
> been unable to locate an open source library that is under the Apache or
> similar license.  I have done work on a concept called a proto-bloom filter
> that does the hashing early and then makes it faster to generate concrete
> bloom filters of various sizes, thus enabling a more efficient layering of
> filters.
>
> Several of us have done research into ways to index filters so that if you
> have a collection of filters you can quickly locate the candidates.  This
> is not as simple as it sounds due to the way in which filters are checked
> and the issues with overfilled filters yielding high false positive
> counts.  In addition, the check is so fast that the overhead for most
> indexing eats up any increase in speed. My research has shown that for
> filter collections of less than 1000 it is always faster to do a linear
> search through an array than any other means.  Above 1000 entries there are
> techniques that can yield faster evaluation in some cases.
>
> The long and short of this is that there is no good unencumbered open
> source library available at the current time.  Myself and several others,
> in conversation here at ApacheCon, have expressed interest in creating such
> a library.  We have fairly mature code that we are willing to contribute
> along with code that embodies new thinking in the bloom filter arena (like
> proto-bloom filters).  We just need a space within the Apache family to
> host it.  The code base seems too small to be a separate project and so we
> come to Apache Commons seeking a home.
>
> Claude
>

Hi Claude,

Thank you for the explainer :-) quite helpful.

Gary


>
>
>
>
> [1] https://hur.st/bloomfilter/
>
> On Wed, Sep 11, 2019 at 12:36 PM Gary Gregory <[hidden email]>
> wrote:
>
> > I would like to know more. I am curious since looking up whether an
> > element is in a set is done via a hash code. How do you do better than
> > that?
> >
> > Gary
> >
> > On Tue, Sep 10, 2019, 16:51 Bruno P. Kinoshita <[hidden email]> wrote:
> >
> > >  +1 Collections sounds like a good place for a bloom filter.
> > >
> > > Bruno

Re: New Sub-project Proposal.

garydgregory
In reply to this post by Stian Soiland-Reyes
So is the idea to provide wrappers on Sets or a Set implementation?

Gary

On Wed, Sep 11, 2019 at 3:54 PM Stian Soiland-Reyes <[hidden email]>
wrote:

> I certainly got thinking about streams for those methods using the fancy
> iterators, yes. Commons Collections is already on JDK8, so if that is
> sufficient, go for it!
>
> We would need to do IP clearance to bring in the code formally to ASF. It
> should be easy if it is just you who made it under Apache license.
>
> On Wed, 11 Sep 2019, 18:44 Claude Warren, <[hidden email]> wrote:
>
> > @Stian. You have correctly identified the code in my repository.  The
> > code could be refactored to use streams or we could bring the Jena
> > iterator extensions into Commons.  I had suggested that at one time but
> > there were concerns about conflicts with existing code.  Duplication of
> > functionality was the main concern as I recall.
> >
> > Claude
> >
> > On Wed, Sep 11, 2019, 09:43 Stian Soiland-Reyes <[hidden email]>
> > wrote:
> >
> > > On Wed, 11 Sep 2019 18:12:24 +0200, Gilles Sadowski <[hidden email]>
> > > wrote:
> > > > > The long and short of this is that there is no good unencumbered
> > > > > open source library available at the current time.  Myself and
> > > > > several others, in conversation here at ApacheCon, have expressed
> > > > > interest in creating such a library.  We have fairly mature code
> > > > > that we are willing to contribute along with code that embodies new
> > > > > thinking in the bloom filter arena (like proto-bloom filters).  We
> > > > > just need a space within the Apache family to host it.  The code
> > > > > base seems too small to be a separate project and so we come to
> > > > > Apache Commons seeking a home.
> > > >
> > > > IMO, a pretty compelling rationale for hosting it at "Commons".
> > > > If people think that [Collections] would be the best home, I'd
> > > > suggest making that component modular; hence unnecessary dependencies
> > > > would be a non-issue.
> > >
> > > Thanks Claude for that brilliant explanation about bloom filter!
> > > (please blog it!)
> > >
> > > At the moment Commons Collections has no runtime dependencies, and
> > > only 3 test-dependencies.
> > > <https://github.com/apache/commons-collections/blob/master/pom.xml#L443>
> > >
> > > So unless the Bloom filter code comes with any new dependencies seen
> > > to "bloat" the rest of Commons, it could fit well in there.
> > >
> > > It would be a new package "bloom" as it's something to use for building
> > > collections rather than directly being a collection - but Collections
> > > already has similar packages for balanced trees, etc.
> > >
> > >
> > > Looking at the code which I suspect is
> > > <https://github.com/Claudenw/BloomFilter/tree/master/src/main/java/org/xenei/bloomfilter>
> > > it looks pretty clean, independent and straightforward to read and
> > > understand.
> > >
> > > From Claude's email I see it is its *use* that needs explanation!
> > >
> > > The only dependencies in that code seem to be within the
> > > org.xenei.bloomfilter.collections package which currently includes use
> > > of Jena's extended iterator classes.
> > >
> > > This could probably be refactored, if this package is also to be
> > > included? (those classes would also fit naturally into Collections)
> > >
> > >
> > > --
> > > Stian Soiland-Reyes
> > > The University of Manchester 🐝
> > > https://www.esciencelab.org.uk/
> > > https://orcid.org/0000-0001-9842-9718
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> > >
> >
>

Re: New Sub-project Proposal.

Claude Warren
Actually the code I was thinking of is the multi-filter branch.  It cleans
up some names and simplifies a few things.  The collections and storage
packages might be best added as examples rather than as mainline code.

In this case we just provide the bloom filter implementation.  If we want
to provide the container implementation then I think it should probably be
modified to accept any SortedSet or NavigableSet in the constructor.

When I return home, next week, I'll take a swipe at moving the packages
over to org.apache.commons.collections4.bloomfilter package (unless there
is a better package name).  We can then look at the entire code donation
and decide what changes need to be made before it is accepted.

Does this sound like a reasonable approach?

Claude

On Thu, Sep 12, 2019 at 2:34 AM Gary Gregory <[hidden email]> wrote:

> So is the idea to provide wrappers on Sets or a Set implementation?
>
> Gary


--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: New Sub-project Proposal.

Stian Soiland-Reyes
On Thu, 12 Sep 2019 08:06:59 +0100, Claude Warren <[hidden email]> wrote:

> Actually the code I was thinking of is the multi-filter branch.  It cleans
> up some names and simplifies a few things.  The collections and storage
> packages might be best added as examples rather than as mainline code.
>
> In this case we just provide the bloom filter implementation.  If we want
> to provide the container implementation then I think it should probably be
> modified to accept any SortedSet or NavigableSet in the constructor.
>
> When I return home, next week, I'll take a swipe at moving the packages
> over to org.apache.commons.collections4.bloomfilter package (unless there
> is a better package name).  We can then look at the entire code donation
> and decide what changes need to be made before it is accepted.
>
> Does this sound like a reasonable approach?

Sounds reasonable to me - then it's easy to see what the code donation
will be.  The collections and storage packages could be examples at first
that we can link to from the documentation; several Commons components
have such example code.


Perhaps use the package name "commons.collections4.bloomfilter" without
org.apache before it's been IP-cleared? Alternatively add it on a fork
of https://github.com/apache/commons-collections/ so we don't confuse
anyone.


I see on your branch you are using some new dependencies like
org.xenei.blockstorage and org.xenei.spanbuffer.SpanBuffer - would these
be needed as well if we include the container implementation or are they
more for disk-based collections?

--
Stian Soiland-Reyes
https://orcid.org/0000-0001-9842-9718




Re: New Sub-project Proposal.

Gilles Sadowski-2
Le jeu. 12 sept. 2019 à 10:28, Stian Soiland-Reyes <[hidden email]> a écrit :

>
> On Thu, 12 Sep 2019 08:06:59 +0100, Claude Warren <[hidden email]> wrote:
> > Actually the code I was thinking of is the multi-filter branch.  It cleans
> > up some names and simplifies a few things.  The collections and storage
> > packages might be best added as examples rather than as mainline code.
> >
> > In this case we just provide the bloom filter implementation.  If we want
> > to provide the container implementation then I think it should probably be
> > modified to accept any SortedSet or NavigableSet in the constructor.
> >
> > When I return home, next week, I'll take a swipe at moving the packages
> > over to org.apache.commons.collections4.bloomfilter package (unless there
> > is a better package name).  We can then look at the entire code donation
> > and decide what changes need to be made before it is accepted.
> >
> > Does this sound like a reasonable approach?

Any comment about my suggestion to make [Collections] modular,
starting with that code ([Collections] is nearing 30k LOC...)?

Gilles

> [...]



Re: New Sub-project Proposal.

Claude Warren
In reply to this post by Stian Soiland-Reyes
I have no issues with contributing Span and SpanBuffer.  Span is similar to
commons-lang Range and it might be reasonable to migrate to Range for the
Span part.  The SpanBuffer (possibly renamed to RangeBuffer) is
conceptually a byte buffer with a long offset and length so that it can
be shifted.  A span buffer can be composed of multiple buffers
and you can extract span buffers from other span buffers.  Like the
standard Java ByteBuffer they do not reallocate the space, just point to
the offsets in the existing allocated space.  It is the composition that
makes them powerful.

So I can create a SpanBuffer from "Hello World", and another from "Goodbye
cruel".  Then I can compose a new one by starting with the "Goodbye
cruel" buffer and following it with "Hello World".cut( 5 ) [cut the first 5
bytes off] and have "Goodbye cruel World" as the result.  This structure is
very handy when serializing large objects across Kafka and similar systems
where the buffer size may be smaller than the object you want to serialize,
as it can be chopped into smaller chunks and reassembled as a composition
of the smaller chunks without the calling classes knowing.  I also used it
when I built a binary diff routine that handles very large files, as it
means that I only need to keep the one file and the differences to the other.
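
A rough sketch of the composition idea, for readers who have not seen the code: the class below is hypothetical and is not the org.xenei.spanbuffer API. It assumes a span is a list of (array, offset, length) slices, so that cut and concat only rearrange references and never copy bytes; the names SpanSketch, wrap, cut, concat and asString are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view over shared byte arrays; not the real SpanBuffer. */
final class SpanSketch {

    /** One slice of an existing array; the bytes themselves are shared. */
    private static final class Segment {
        final byte[] data;
        final int offset;
        final int length;

        Segment(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }
    }

    private final List<Segment> segments;

    private SpanSketch(List<Segment> segments) {
        this.segments = segments;
    }

    static SpanSketch wrap(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<Segment> list = new ArrayList<>();
        list.add(new Segment(bytes, 0, bytes.length));
        return new SpanSketch(list);
    }

    long length() {
        long total = 0;
        for (Segment s : segments) {
            total += s.length;
        }
        return total;
    }

    /** Returns a view without the first n bytes; nothing is copied. */
    SpanSketch cut(long n) {
        if (n < 0 || n > length()) {
            throw new IndexOutOfBoundsException("cut position " + n);
        }
        List<Segment> out = new ArrayList<>();
        long skip = n;
        for (Segment s : segments) {
            if (skip >= s.length) {
                skip -= s.length; // this slice is dropped entirely
                continue;
            }
            out.add(new Segment(s.data, s.offset + (int) skip, s.length - (int) skip));
            skip = 0;
        }
        return new SpanSketch(out);
    }

    /** Returns a view of this span followed by the other span. */
    SpanSketch concat(SpanSketch next) {
        List<Segment> out = new ArrayList<>(segments);
        out.addAll(next.segments);
        return new SpanSketch(out);
    }

    /** Bytes are copied only here, when the caller asks for the whole text. */
    String asString() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (Segment s : segments) {
            buffer.write(s.data, s.offset, s.length);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With this sketch, SpanSketch.wrap("Goodbye cruel").concat(SpanSketch.wrap("Hello World").cut(5)).asString() yields "Goodbye cruel World", and both original spans stay usable because no bytes were moved.
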

But Span and SpanBuffer are both used in the serialization classes that
could form the basis of example code, so accepting them is optional.

Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: New Sub-project Proposal.

Claude Warren
In reply to this post by Gilles Sadowski-2
@Gilles

Missed your suggestion about modularity.  Can you point me to the original
message or paraphrase it here?

Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: New Sub-project Proposal.

garydgregory
Let's talk about modules after the PR comes; I only see that as needed to
avoid bringing in dependencies for all users. IOW, I would only see breaking
up Collections into Maven modules if either the PR is giant or it depends
on other artifacts.

Gary


Re: New Sub-project Proposal.

Claude Warren
In reply to this post by Claude Warren
The base code depended on commons-lang3 for building hashes.  Is this
acceptable, or should the hash generation code from lang3 be cut and pasted
into the classes?  Not sure what the standard is in this project.

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren