RE : Re: [text] Longest common subsequence wrong result?

Sébastien Piller
Oh OK, my bad, I did not read closely enough...
Maybe just put that note at the class level? If you only read the class description, it is not clear that there is a difference between the two concepts.
Thanks for the clarification!
Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Rob Tompkins <[hidden email]>
Date: 31.03.17 13:34 (GMT+01:00)
To: Commons Users List <[hidden email]>
Subject: Re: [text] Longest common subsequence wrong result?
Hello Sébastien,

From what I can tell, this is expected behaviour. I think it hinges on the definition of “subsequence” differing from the definition of “substring.” A subsequence is an ordered list of elements derived by deleting some (possibly zero) elements from the original list, without changing the order of the remaining elements. A substring, by contrast, is a subsequence whose characters were also adjacent in the original string.

So, in your example, “Gandalf” and “Sauron” share the subsequence {a, n}, which has length 2. But if we were to restrict to substrings, then the longest common substring would simply be {a}, with length 1.
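
To make the distinction concrete, here is a minimal sketch in plain Java that computes both lengths with the textbook dynamic-programming recurrences. It is only an illustration and not the Commons Text implementation; the class name SubsequenceVsSubstring and both method names are hypothetical.

```java
// Illustrative sketch only: hypothetical names, not part of Commons Text.
final class SubsequenceVsSubstring {

    // Length of the longest common subsequence: matching characters must
    // keep their relative order but do not have to be adjacent.
    static int lcsLength(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[left.length()][right.length()];
    }

    // Length of the longest common substring: matching characters must
    // also be adjacent, so a mismatch resets the current run to zero.
    static int longestCommonSubstringLength(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        int best = 0;
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("Gandalf", "Sauron"));                    // 2 ("an")
        System.out.println(longestCommonSubstringLength("Gandalf", "Sauron")); // 1 ("a")
        System.out.println(lcsLength("xxx", "yyy"));                           // 0
    }
}
```

The only difference between the two methods is what happens on a mismatch: the subsequence version carries forward the best result so far, while the substring version starts a new run.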

I’ve tried to spell this out in the javadoc here (http://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LongestCommonSubsequence.html#logestCommonSubsequence-java.lang.CharSequence-java.lang.CharSequence-), but I suppose I should have been clearer in the documentation.

Do let me know if you think there’s a way to better present these details.

Many thanks and all the best,
-Rob

> On Mar 31, 2017, at 7:16 AM, Sébastien Piller <[hidden email]> wrote:
>
> Hi all,
> If I call
> new LongestCommonSubsequence().apply("xxx", "yyy")
> I get 0 (correct).
> If I call
> new LongestCommonSubsequence().apply("Gandalf", "Sauron")
> I get 2, which looks incorrect to me (I should have got 1, since there is no sequence of 2 chars in both strings). Is it a bug or expected behavior?
> Thanks
>
> Sent from my Samsung Galaxy smartphone.