[text] Longest common subsequence wrong result?

Sébastien Piller
Hi all,
If I call
new LongestCommonSubsequence().apply("xxx", "yyy")
I get 0 (correct).
If I call
new LongestCommonSubsequence().apply("Gandalf", "Sauron")
I get 2, which looks incorrect to me (I should have got 1, since no sequence of 2 chars appears in both strings). Is it a bug or expected behavior?
Thanks

Sent from my Samsung Galaxy smartphone.

Re: [text] Longest common subsequence wrong result?

paul womack
Sébastien Piller wrote:
> Hi all,
> If I call
> new LongestCommonSubsequence().apply("xxx", "yyy")
> I get 0 (correct).
> If I call
> new LongestCommonSubsequence().apply("Gandalf", "Sauron")
> I get 2, which looks incorrect to me (I should have got 1, since no sequence of 2 chars appears in both strings). Is it a bug or expected behavior?

What is the return type of the method?

  BugBear

Re: [text] Longest common subsequence wrong result?

Rob Tompkins-2
In reply to this post by Sébastien Piller
Hello Sébastien,

From what I can tell, this is expected behaviour. I think it hinges on the definition of “subsequence” differing from the definition of “substring.” By this I mean that a subsequence is an enumerated list of elements derived by deleting some (possibly zero) elements from the original enumerated list, without changing the order of the remaining elements. A substring is derived the same way, but with the additional requirement that the remaining characters were adjacent in the original list.
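
To make the two definitions concrete, here is a minimal Java sketch that checks whether a candidate string is a subsequence or a substring of a given text. It is not Commons Text code; the class SubsequenceCheck and its methods are hypothetical names used only for illustration.

```java
// Hypothetical illustration of the two definitions (not Commons Text code).
final class SubsequenceCheck {

    // True if every character of `candidate` appears in `text` in the same
    // order, with any number of characters skipped in between (a subsequence).
    static boolean isSubsequence(String candidate, String text) {
        int i = 0; // index of the next candidate character to match
        for (int j = 0; j < text.length() && i < candidate.length(); j++) {
            if (candidate.charAt(i) == text.charAt(j)) {
                i++;
            }
        }
        return i == candidate.length();
    }

    // True if `candidate` appears in `text` as one contiguous run (a substring).
    static boolean isSubstring(String candidate, String text) {
        return text.contains(candidate);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Gandalf", "Sauron"}) {
            System.out.println(word
                    + ": subsequence=" + isSubsequence("an", word)
                    + ", substring=" + isSubstring("an", word));
        }
        // Gandalf: subsequence=true, substring=true
        // Sauron: subsequence=true, substring=false
    }
}
```

With the strings from the question, “an” is a subsequence of both words but a substring only of “Gandalf”, which is exactly the distinction at play.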

So, in your example, “Gandalf” and “Sauron” share the subsequence {a, n}, which gives the length 2. But if we were to restrict to substrings, the longest common substring would simply be {a}, because “a” and “n” are adjacent in “Gandalf” but not in “Sauron.”
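
Since apply reports a length (an Integer) rather than the characters themselves, the two numbers can be compared using the textbook dynamic-programming recurrences. The following is a minimal sketch under that assumption; LcsLengths and its methods are hypothetical names, and this is not the Commons Text implementation.

```java
// Hypothetical sketch: O(m*n) dynamic programming for both measures.
final class LcsLengths {

    // Length of the longest common subsequence (gaps allowed).
    static int longestCommonSubsequence(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1; // extend a common subsequence
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // skip a character
                }
            }
        }
        return dp[left.length()][right.length()];
    }

    // Length of the longest common substring (characters must be adjacent).
    static int longestCommonSubstring(CharSequence left, CharSequence right) {
        int[][] dp = new int[left.length() + 1][right.length() + 1];
        int best = 0;
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    // length of the common run ending at left[i-1] and right[j-1]
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
                // on a mismatch the run is broken, so dp[i][j] stays 0
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubsequence("Gandalf", "Sauron")); // 2
        System.out.println(longestCommonSubstring("Gandalf", "Sauron"));   // 1
        System.out.println(longestCommonSubsequence("xxx", "yyy"));        // 0
    }
}
```

The only difference between the two recurrences is what happens on a mismatch: the subsequence version carries the best result forward, while the substring version resets the run to zero.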

I’ve tried to spell this out in the javadoc here (http://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/similarity/LongestCommonSubsequence.html#logestCommonSubsequence-java.lang.CharSequence-java.lang.CharSequence-), but I suppose I should have been clearer in the documentation.
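
Judging by the link, the javadoc'd method returns the subsequence itself rather than its length. A hypothetical sketch of how such a result can be recovered, by backtracking through the same dynamic-programming table, might look like this. The class LcsRecover is an illustrative name and this is not the library's code.

```java
// Hypothetical sketch: recover one longest common subsequence by backtracking.
final class LcsRecover {

    static String longestCommonSubsequence(String left, String right) {
        int m = left.length();
        int n = right.length();
        int[][] dp = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                dp[i][j] = left.charAt(i - 1) == right.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        // Walk back from the bottom-right cell, collecting matched characters.
        StringBuilder sb = new StringBuilder();
        int i = m;
        int j = n;
        while (i > 0 && j > 0) {
            if (left.charAt(i - 1) == right.charAt(j - 1)) {
                sb.append(left.charAt(i - 1)); // part of the subsequence
                i--;
                j--;
            } else if (dp[i - 1][j] >= dp[i][j - 1]) {
                i--; // dropping a character of `left` keeps the best length
            } else {
                j--; // dropping a character of `right` keeps the best length
            }
        }
        return sb.reverse().toString(); // characters were collected back to front
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubsequence("Gandalf", "Sauron")); // an
    }
}
```

When several longest subsequences exist, the tie-breaking rule in the backtracking decides which one is returned, so another implementation may legitimately return a different one of the same length.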

Do let me know if you think there’s a way to better present these details.

Many thanks and all the best,
-Rob

> On Mar 31, 2017, at 7:16 AM, Sébastien Piller <[hidden email]> wrote:
>
> Hi all,
> If I call
> new LongestCommonSubsequence().apply("xxx", "yyy")
> I get 0 (correct).
> If I call
> new LongestCommonSubsequence().apply("Gandalf", "Sauron")
> I get 2, which looks incorrect to me (I should have got 1, since no sequence of 2 chars appears in both strings). Is it a bug or expected behavior?
> Thanks
>
> Sent from my Samsung Galaxy smartphone.