[lang] adding an implementation of the Myers difference algorithm

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[lang] adding an implementation of the Myers difference algorithm

Luc Maisonobe-2
Hello,

Some times ago, Thomas proposed an implementation of a Longest Commons
Substring algorithm. At
that time I said I had another algorithm in the same spirit for the
Myers difference algorithm.

I got the green light to provide this code base to the Apache Software
Foundation. I will send
the Software Grant to secretary in a few minutes. Once the grant is
registered, I will create a
Jira issue and attach the original code to it, then I will port it for
inclusion into Commons.

The public API of this implementation takes two sequences of Object and
provides as output an EditScript
which implements the visitor design pattern. By visiting the script, we
can retrieve the differences
between the two sequences (objects inserted, object deleted) or we can
retrieve the similarities
(sub-sequences that are in both initial sequences). We only use the
"equals" method in the initial objects.

So my questions are:

  - in which component do we include this, we talked about [lang], is it
right ?
  - the classes are in a "comparator" package, where should we put this
package ?

best regards,
Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang] adding an implementation of the Myers difference algorithm

garydgregory
On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:

> Hello,
>
> Some times ago, Thomas proposed an implementation of a Longest Commons
> Substring algorithm. At
> that time I said I had another algorithm in the same spirit for the Myers
> difference algorithm.
>
> I got the green light to provide this code base to the Apache Software
> Foundation. I will send
> the Software Grant to secretary in a few minutes. Once the grant is
> registered, I will create a
> Jira issue and attach the original code to it, then I will port it for
> inclusion into Commons.
>
> The public API of this implementation takes two sequences of Object and
> provides as output an EditScript
> which implements the visitor design pattern. By visiting the script, we
> can retrieve the differences
> between the two sequences (objects inserted, object deleted) or we can
> retrieve the similarities
> (sub-sequences that are in both initial sequences). We only use the
> "equals" method in the initial objects.
>
> So my questions are:
>
>  - in which component do we include this, we talked about [lang], is it
> right ?
>  - the classes are in a "comparator" package, where should we put this
> package ?
>

.text?

Gary

>
> best regards,
> Luc
>
> ------------------------------**------------------------------**---------
> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
> For additional commands, e-mail: [hidden email]
>
>


--
E-Mail: [hidden email] | [hidden email]
JUnit in Action, 2nd Ed: <http://goog_1249600977>http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Reply | Threaded
Open this post in threaded view
|

Re: [lang] adding an implementation of the Myers difference algorithm

Luc Maisonobe
Hi Gary,

Le 06/04/2012 21:50, Gary Gregory a écrit :

> On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:
>
>> Hello,
>>
>> Some times ago, Thomas proposed an implementation of a Longest Commons
>> Substring algorithm. At
>> that time I said I had another algorithm in the same spirit for the Myers
>> difference algorithm.
>>
>> I got the green light to provide this code base to the Apache Software
>> Foundation. I will send
>> the Software Grant to secretary in a few minutes. Once the grant is
>> registered, I will create a
>> Jira issue and attach the original code to it, then I will port it for
>> inclusion into Commons.
>>
>> The public API of this implementation takes two sequences of Object and
>> provides as output an EditScript
>> which implements the visitor design pattern. By visiting the script, we
>> can retrieve the differences
>> between the two sequences (objects inserted, object deleted) or we can
>> retrieve the similarities
>> (sub-sequences that are in both initial sequences). We only use the
>> "equals" method in the initial objects.
>>
>> So my questions are:
>>
>>  - in which component do we include this, we talked about [lang], is it
>> right ?
>>  - the classes are in a "comparator" package, where should we put this
>> package ?
>>
>
> .text?

Perhaps, but it is much more general than that. It can be used on any
object as it only relies on equal. You could use it on text, on numbers,
on genetic sequences, on binary streams, you name it.

Luc

>
> Gary
>
>>
>> best regards,
>> Luc
>>
>> ------------------------------**------------------------------**---------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
>> For additional commands, e-mail: [hidden email]
>>
>>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [lang] adding an implementation of the Myers difference algorithm

garydgregory
In reply to this post by garydgregory
On Apr 6, 2012, at 16:05, Luc Maisonobe <[hidden email]> wrote:

> Hi Gary,
>
> Le 06/04/2012 21:50, Gary Gregory a écrit :
>> On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:
>>
>>> Hello,
>>>
>>> Some times ago, Thomas proposed an implementation of a Longest Commons
>>> Substring algorithm. At
>>> that time I said I had another algorithm in the same spirit for the Myers
>>> difference algorithm.
>>>
>>> I got the green light to provide this code base to the Apache Software
>>> Foundation. I will send
>>> the Software Grant to secretary in a few minutes. Once the grant is
>>> registered, I will create a
>>> Jira issue and attach the original code to it, then I will port it for
>>> inclusion into Commons.
>>>
>>> The public API of this implementation takes two sequences of Object and
>>> provides as output an EditScript
>>> which implements the visitor design pattern. By visiting the script, we
>>> can retrieve the differences
>>> between the two sequences (objects inserted, object deleted) or we can
>>> retrieve the similarities
>>> (sub-sequences that are in both initial sequences). We only use the
>>> "equals" method in the initial objects.
>>>
>>> So my questions are:
>>>
>>> - in which component do we include this, we talked about [lang], is it
>>> right ?
>>> - the classes are in a "comparator" package, where should we put this
>>> package ?
>>>
>>
>> .text?
>
> Perhaps, but it is much more general than that. It can be used on any
> object as it only relies on equal. You could use it on text, on numbers,
> on genetic sequences, on binary streams, you name it.

Not to be downer here but this is feeling out of scope for Lang. I
think about the java.lang extension mission and this does not fit IMO.
So the next question is where in Commons would this fit? Codec? A new
component? There are 20 plus components in Commons, let's think of the
best fit.

Gary

>
> Luc
>
>>
>> Gary
>>
>>>
>>> best regards,
>>> Luc
>>>
>>> ------------------------------**------------------------------**---------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[codec][collections][math]Re: [lang] adding an implementation of the Myers difference algorithm

Luc Maisonobe
Hi Gary,

Le 07/04/2012 04:30, Gary Gregory a écrit :

> On Apr 6, 2012, at 16:05, Luc Maisonobe <[hidden email]> wrote:
>
>> Hi Gary,
>>
>> Le 06/04/2012 21:50, Gary Gregory a écrit :
>>> On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Some times ago, Thomas proposed an implementation of a Longest Commons
>>>> Substring algorithm. At
>>>> that time I said I had another algorithm in the same spirit for the Myers
>>>> difference algorithm.
>>>>
>>>> I got the green light to provide this code base to the Apache Software
>>>> Foundation. I will send
>>>> the Software Grant to secretary in a few minutes. Once the grant is
>>>> registered, I will create a
>>>> Jira issue and attach the original code to it, then I will port it for
>>>> inclusion into Commons.
>>>>
>>>> The public API of this implementation takes two sequences of Object and
>>>> provides as output an EditScript
>>>> which implements the visitor design pattern. By visiting the script, we
>>>> can retrieve the differences
>>>> between the two sequences (objects inserted, object deleted) or we can
>>>> retrieve the similarities
>>>> (sub-sequences that are in both initial sequences). We only use the
>>>> "equals" method in the initial objects.
>>>>
>>>> So my questions are:
>>>>
>>>> - in which component do we include this, we talked about [lang], is it
>>>> right ?
>>>> - the classes are in a "comparator" package, where should we put this
>>>> package ?
>>>>
>>>
>>> .text?
>>
>> Perhaps, but it is much more general than that. It can be used on any
>> object as it only relies on equal. You could use it on text, on numbers,
>> on genetic sequences, on binary streams, you name it.
>
> Not to be downer here but this is feeling out of scope for Lang. I

You are right.

> think about the java.lang extension mission and this does not fit IMO.
> So the next question is where in Commons would this fit? Codec? A new
> component? There are 20 plus components in Commons, let's think of the
> best fit.

Looking at the list, I would put [collections] as first choice and
[codec] as second choice ... and [math] as third choice (but this is
already stretching too far).

Any other idea or advice ?

Luc

>
> Gary
>
>>
>> Luc
>>
>>>
>>> Gary
>>>
>>>>
>>>> best regards,
>>>> Luc
>>>>
>>>> ------------------------------**------------------------------**---------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [codec][collections][math]Re: [lang] adding an implementation of the Myers difference algorithm

garydgregory
In reply to this post by garydgregory
On Apr 7, 2012, at 5:41, Luc Maisonobe <[hidden email]> wrote:

> Hi Gary,
>
> Le 07/04/2012 04:30, Gary Gregory a écrit :
>> On Apr 6, 2012, at 16:05, Luc Maisonobe <[hidden email]> wrote:
>>
>>> Hi Gary,
>>>
>>> Le 06/04/2012 21:50, Gary Gregory a écrit :
>>>> On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Some times ago, Thomas proposed an implementation of a Longest Commons
>>>>> Substring algorithm. At
>>>>> that time I said I had another algorithm in the same spirit for the Myers
>>>>> difference algorithm.
>>>>>
>>>>> I got the green light to provide this code base to the Apache Software
>>>>> Foundation. I will send
>>>>> the Software Grant to secretary in a few minutes. Once the grant is
>>>>> registered, I will create a
>>>>> Jira issue and attach the original code to it, then I will port it for
>>>>> inclusion into Commons.
>>>>>
>>>>> The public API of this implementation takes two sequences of Object and
>>>>> provides as output an EditScript
>>>>> which implements the visitor design pattern. By visiting the script, we
>>>>> can retrieve the differences
>>>>> between the two sequences (objects inserted, object deleted) or we can
>>>>> retrieve the similarities
>>>>> (sub-sequences that are in both initial sequences). We only use the
>>>>> "equals" method in the initial objects.
>>>>>
>>>>> So my questions are:
>>>>>
>>>>> - in which component do we include this, we talked about [lang], is it
>>>>> right ?
>>>>> - the classes are in a "comparator" package, where should we put this
>>>>> package ?
>>>>>
>>>>
>>>> .text?
>>>
>>> Perhaps, but it is much more general than that. It can be used on any
>>> object as it only relies on equal. You could use it on text, on numbers,
>>> on genetic sequences, on binary streams, you name it.
>>
>> Not to be downer here but this is feeling out of scope for Lang. I
>
> You are right.
>
>> think about the java.lang extension mission and this does not fit IMO.
>> So the next question is where in Commons would this fit? Codec? A new
>> component? There are 20 plus components in Commons, let's think of the
>> best fit.
>
> Looking at the list, I would put [collections] as first choice and
> [codec] as second choice ... and [math] as third choice (but this is
> already stretching too far).
>
> Any other idea or advice ?

A positive aspect of picking collections would be to release
collections 4.0 which has been requested many times for Generics.

Gary

>
> Luc
>
>>
>> Gary
>>
>>>
>>> Luc
>>>
>>>>
>>>> Gary
>>>>
>>>>>
>>>>> best regards,
>>>>> Luc
>>>>>
>>>>> ------------------------------**------------------------------**---------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
>>>>> For additional commands, e-mail: [hidden email]
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [codec][collections][math]Re: [lang] adding an implementation of the Myers difference algorithm

Oliver Heger-3
Am 07.04.2012 12:37, schrieb Gary Gregory:

> On Apr 7, 2012, at 5:41, Luc Maisonobe<[hidden email]>  wrote:
>
>> Hi Gary,
>>
>> Le 07/04/2012 04:30, Gary Gregory a écrit :
>>> On Apr 6, 2012, at 16:05, Luc Maisonobe<[hidden email]>  wrote:
>>>
>>>> Hi Gary,
>>>>
>>>> Le 06/04/2012 21:50, Gary Gregory a écrit :
>>>>> On Fri, Apr 6, 2012 at 11:14 AM, luc<[hidden email]>  wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Some times ago, Thomas proposed an implementation of a Longest Commons
>>>>>> Substring algorithm. At
>>>>>> that time I said I had another algorithm in the same spirit for the Myers
>>>>>> difference algorithm.
>>>>>>
>>>>>> I got the green light to provide this code base to the Apache Software
>>>>>> Foundation. I will send
>>>>>> the Software Grant to secretary in a few minutes. Once the grant is
>>>>>> registered, I will create a
>>>>>> Jira issue and attach the original code to it, then I will port it for
>>>>>> inclusion into Commons.
>>>>>>
>>>>>> The public API of this implementation takes two sequences of Object and
>>>>>> provides as output an EditScript
>>>>>> which implements the visitor design pattern. By visiting the script, we
>>>>>> can retrieve the differences
>>>>>> between the two sequences (objects inserted, object deleted) or we can
>>>>>> retrieve the similarities
>>>>>> (sub-sequences that are in both initial sequences). We only use the
>>>>>> "equals" method in the initial objects.
>>>>>>
>>>>>> So my questions are:
>>>>>>
>>>>>> - in which component do we include this, we talked about [lang], is it
>>>>>> right ?
>>>>>> - the classes are in a "comparator" package, where should we put this
>>>>>> package ?
>>>>>>
>>>>>
>>>>> .text?
>>>>
>>>> Perhaps, but it is much more general than that. It can be used on any
>>>> object as it only relies on equal. You could use it on text, on numbers,
>>>> on genetic sequences, on binary streams, you name it.
>>>
>>> Not to be downer here but this is feeling out of scope for Lang. I
>>
>> You are right.
>>
>>> think about the java.lang extension mission and this does not fit IMO.
>>> So the next question is where in Commons would this fit? Codec? A new
>>> component? There are 20 plus components in Commons, let's think of the
>>> best fit.
>>
>> Looking at the list, I would put [collections] as first choice and
>> [codec] as second choice ... and [math] as third choice (but this is
>> already stretching too far).
>>
>> Any other idea or advice ?
>
> A positive aspect of picking collections would be to release
> collections 4.0 which has been requested many times for Generics.

This code seems to be something new because in Commons there is no
component providing implementations for standard algorithms. (And I
doubt that it makes sense to create one as it would be really hard to
define its scope.)

Given the fact that the algorithm operates on collections, [collections]
indeed seems to be the right place. Or maybe [functor]?

Oliver

>
> Gary
>
>>
>> Luc
>>
>>>
>>> Gary
>>>
>>>>
>>>> Luc
>>>>
>>>>>
>>>>> Gary
>>>>>
>>>>>>
>>>>>> best regards,
>>>>>> Luc
>>>>>>
>>>>>> ------------------------------**------------------------------**---------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<[hidden email]>
>>>>>> For additional commands, e-mail: [hidden email]
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [codec][collections][math]Re: [lang] adding an implementation of the Myers difference algorithm

James Carman-3
In reply to this post by Luc Maisonobe
I can see a simpler string-based version as being applicable to StringUtils
(char sequence)
On Apr 7, 2012 5:41 AM, "Luc Maisonobe" <[hidden email]> wrote:

> Hi Gary,
>
> Le 07/04/2012 04:30, Gary Gregory a écrit :
> > On Apr 6, 2012, at 16:05, Luc Maisonobe <[hidden email]> wrote:
> >
> >> Hi Gary,
> >>
> >> Le 06/04/2012 21:50, Gary Gregory a écrit :
> >>> On Fri, Apr 6, 2012 at 11:14 AM, luc <[hidden email]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> Some times ago, Thomas proposed an implementation of a Longest Commons
> >>>> Substring algorithm. At
> >>>> that time I said I had another algorithm in the same spirit for the
> Myers
> >>>> difference algorithm.
> >>>>
> >>>> I got the green light to provide this code base to the Apache Software
> >>>> Foundation. I will send
> >>>> the Software Grant to secretary in a few minutes. Once the grant is
> >>>> registered, I will create a
> >>>> Jira issue and attach the original code to it, then I will port it for
> >>>> inclusion into Commons.
> >>>>
> >>>> The public API of this implementation takes two sequences of Object
> and
> >>>> provides as output an EditScript
> >>>> which implements the visitor design pattern. By visiting the script,
> we
> >>>> can retrieve the differences
> >>>> between the two sequences (objects inserted, object deleted) or we can
> >>>> retrieve the similarities
> >>>> (sub-sequences that are in both initial sequences). We only use the
> >>>> "equals" method in the initial objects.
> >>>>
> >>>> So my questions are:
> >>>>
> >>>> - in which component do we include this, we talked about [lang], is it
> >>>> right ?
> >>>> - the classes are in a "comparator" package, where should we put this
> >>>> package ?
> >>>>
> >>>
> >>> .text?
> >>
> >> Perhaps, but it is much more general than that. It can be used on any
> >> object as it only relies on equal. You could use it on text, on numbers,
> >> on genetic sequences, on binary streams, you name it.
> >
> > Not to be downer here but this is feeling out of scope for Lang. I
>
> You are right.
>
> > think about the java.lang extension mission and this does not fit IMO.
> > So the next question is where in Commons would this fit? Codec? A new
> > component? There are 20 plus components in Commons, let's think of the
> > best fit.
>
> Looking at the list, I would put [collections] as first choice and
> [codec] as second choice ... and [math] as third choice (but this is
> already stretching too far).
>
> Any other idea or advice ?
>
> Luc
>
> >
> > Gary
> >
> >>
> >> Luc
> >>
> >>>
> >>> Gary
> >>>
> >>>>
> >>>> best regards,
> >>>> Luc
> >>>>
> >>>>
> ------------------------------**------------------------------**---------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.**apache.org<
> [hidden email]>
> >>>> For additional commands, e-mail: [hidden email]
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>