[TEXT] Distance vs. Metric vs. Similarity

[TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4
Hi,

Currently the wording in Commons Text is a bit confusing. We have three
terms:

- distance
- similarity
- metric

Distance and similarity seem to be just opposites of the same thing. A
great distance indicates a small similarity between two character
sequences. Metric feels like it's something more general, but I'm not sure.
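
A minimal sketch of the point above, not taken from Commons Text: the class name `DistanceVsSimilarity` and the normalisation `1 - distance / longest length` are assumptions chosen for illustration. It scores the same pair of strings once as a distance (0 means identical) and once as a similarity (1.0 means identical).

```java
final class DistanceVsSimilarity {

    /** Dynamic-programming Levenshtein distance: 0 means the inputs are identical. */
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building a prefix of b from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical normalisation: 1.0 means identical, 0.0 means nothing in common. */
    static double similarity(CharSequence a, CharSequence b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // distance grows as strings diverge
        System.out.println(similarity("kitten", "sitting"));  // similarity shrinks as strings diverge
    }
}
```

Both values carry the same information here; only the direction differs, which is why a consistent name matters to callers.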

I think we should consider renaming everything to distance, since the
implemented algorithms all end in *Distance. So we would change the package
name from o.a.c.text.similarity to o.a.c.text.distance and the interface
from StringMetric to StringDistance.

WDYT?

Benedikt

--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4

Looking at the code again, it seems like the algorithms all really return a
similarity score and not a distance. For example, the FuzzyDistance JavaDoc
states: "A higher score indicates a higher similarity". If this is the case,
maybe it makes more sense to rename everything to Similarity?



--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
Hello Benedikt!

> Metric feels like it's something more general, but I'm not sure.

You're right. Metric was supposed to be a general interface, representing the string metric from the Wikipedia article.

> and the interface from StringMetric to StringDistance.

I'm reading the Myers paper, and already have a local branch with the Myers algorithm from [collections] ported to [text].

Perhaps we could move the StringMetric interface to the o.a.c.text package, and create a StringDistance or EditDistance interface in o.a.c.text.distance.

This way we can have string metrics as in Wikipedia, that is, a way of giving a value for comparing two strings. We would have the edit distances in the distance package and the diff algorithms in another diff package, all of them being string metrics.

What do you think?
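
A rough sketch of how that layout could look. Everything here is hypothetical: the `compare` method name, the generic result type and the `HammingDistance` example are assumptions for illustration, and the proposed packages are shown only as comments so that the snippet compiles as a single file.

```java
// Proposed home: o.a.c.text (general contract, any kind of comparison value)
interface StringMetric<R> {
    R compare(CharSequence left, CharSequence right);
}

// Proposed home: o.a.c.text.distance (adds no methods, only narrows the meaning)
interface EditDistance<R> extends StringMetric<R> {
}

// Example implementation that would sit next to EditDistance
final class HammingDistance implements EditDistance<Integer> {
    @Override
    public Integer compare(CharSequence left, CharSequence right) {
        if (left.length() != right.length()) {
            throw new IllegalArgumentException("Hamming distance needs inputs of equal length");
        }
        int distance = 0;
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                distance++; // one substitution per differing position
            }
        }
        return distance;
    }
}

final class MetricHierarchyDemo {
    public static void main(String[] args) {
        // Callers that only need "some value" can depend on the general interface.
        StringMetric<Integer> metric = new HammingDistance();
        System.out.println(metric.compare("karolin", "kathrin"));
    }
}
```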

> > I think we should consider renaming everything to distance, since the
> > implemented algorithms all end in *Distance. So we would change the package
> > name from o.a.c.text.similarity to o.a.c.text.distance and the interface
> > from StringMetric to StringDistance.
>
> Looking at the code again, it seems like the algorithms all really return a
> similarity score and not a distance. For example, the FuzzyDistance JavaDoc
> states: "A higher score indicates a higher similarity". If this is the case,
> maybe it makes more sense to rename everything to Similarity?

I'm in favor of dropping score and similarity, and adopting distance in the package, classes and javadocs, as it is used in other tools (e.g. Solr, Talend, Informatica IIR, etc).

All the best,
Bruno

 

Re: [TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4
Hi Bruno,



2014-12-14 21:37 GMT+01:00 Bruno P. Kinoshita <[hidden email]>:

>
> Hello Benedikt!
> > Metric feels like it's something more general, but I'm not sure.
> You're right. Metric was supposed to be a general interface,
> representing the String Metric from the Wikipedia article.
> >  and the interface from StringMetric to StringDistance.
> I'm reading the Myers paper, and already have a local branch with the
> Myers algorithm from [collections] ported to [text].
> Perhaps we could move the StringMetric interface to o.a.c.text package,
> and create StringDistance or EditDistance interface in o.a.c.text.distance.
> This way we can have String Metrics as in Wikipedia, as being a way of
> giving a value for comparing two strings. We would have the edit distances
> in the distance package, and the diff algorithms in another diff package.
> All of them being String Metrics.
> What do you think?
>

Sounds good, although I'm not sure I understand where you are going with
the marker interface. What is its purpose?


> > > I think we should consider renaming everything to distance, since the
> > > implemented algorithms all end in *Distance. So we would change the
> > > package name from o.a.c.text.similarity to o.a.c.text.distance and the
> > > interface from StringMetric to StringDistance.
> >
> > Looking at the code again, it seems like the algorithms all really
> > return a similarity score and not a distance. For example, the
> > FuzzyDistance JavaDoc states: "A higher score indicates a higher
> > similarity". If this is the case, maybe it makes more sense to rename
> > everything to Similarity?
>
> I'm in favor of dropping score and similarity, and adopting distance in
> the package, classes and javadocs, as it is used in other tools (e.g. Solr,
> Talend, Informatica IIR, etc).
>

Okay, but we need to make sure all algorithms really return a distance
then. As I said, FuzzyDistance currently really returns a similarity score.
An algorithm returning a distance should return a higher number for higher
distances.
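
To illustrate why the direction matters to callers, here is a small sketch. The class `RankingDirection` and its position-counting measures are stand-ins invented for this example, not the FuzzyDistance algorithm. Picking the closest candidate needs a minimum with a distance and a maximum with a similarity score, so mixing the two under one name invites bugs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class RankingDirection {

    /** Distance-style value: number of differing positions, 0 for identical strings. */
    static int mismatches(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("inputs must have equal length");
        }
        int count = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                count++;
            }
        }
        return count;
    }

    /** Similarity-style value: number of equal positions, larger means closer. */
    static int matches(String a, String b) {
        return a.length() - mismatches(a, b);
    }

    /** With a distance, the best candidate is the one with the SMALLEST value. */
    static String closestByDistance(String query, List<String> candidates) {
        return Collections.min(candidates, Comparator.comparingInt((String c) -> mismatches(query, c)));
    }

    /** With a similarity, the best candidate is the one with the LARGEST value. */
    static String closestBySimilarity(String query, List<String> candidates) {
        return Collections.max(candidates, Comparator.comparingInt((String c) -> matches(query, c)));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("kathrin", "kerstin", "karolim");
        System.out.println(closestByDistance("karolin", names));
        System.out.println(closestBySimilarity("karolin", names));
    }
}
```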

Benedikt



--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
> Sounds good, although I'm not sure I understand where you are going with
> the marker interface. What is its purpose?

Let's then keep the StringMetric interface and update its Javadoc. Thinking again, that other marker interface seems to be unnecessary.

> Okay, but we need to make sure all algorithms really return a distance
> then. As I said, FuzzyDistance currently really returns a similarity score.
> An algorithm returning a distance should return a higher number for higher
> distances.

I had a look at the code, and I think I understand what you are saying now. In FuzzyDistance, the higher the score, the closer the strings are. That is different from what the other algorithms return.

I believe I found why I named that package similarity. Probably it was because I saw that in the stringmetric library [1]. There, Levenshtein, Jaccard and other algorithms are suffixed with "Metric".

How about we keep the package as similarity and simply rename the classes to [Algo]Metric too? This way we will be able to accommodate other metrics such as the Sorensen-Dice coefficient, where the higher the coefficient, the more similar two strings are.

WDYT?

Cheers
Bruno
[1] https://github.com/rockymadden/stringmetric
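
For readers unfamiliar with the Sorensen-Dice coefficient mentioned above, a minimal sketch follows. It assumes character bigrams as the compared units and a made-up class name `DiceCoefficientSketch`; it is not code from [text] or from the stringmetric library. The result is 1.0 for identical strings and 0.0 when no bigram is shared, so it behaves as a similarity rather than a distance.

```java
import java.util.HashMap;
import java.util.Map;

final class DiceCoefficientSketch {

    /** Sorensen-Dice over character bigrams: 2 * shared / (bigrams in a + bigrams in b). */
    static double dice(String a, String b) {
        if (a.length() < 2 || b.length() < 2) {
            // No bigrams to compare; fall back to plain equality (an assumption of this sketch).
            return a.equals(b) ? 1.0 : 0.0;
        }
        // Count each bigram of a so that repeated bigrams are matched at most once each.
        Map<String, Integer> remaining = new HashMap<>();
        for (int i = 0; i < a.length() - 1; i++) {
            remaining.merge(a.substring(i, i + 2), 1, Integer::sum);
        }
        int shared = 0;
        for (int i = 0; i < b.length() - 1; i++) {
            String bigram = b.substring(i, i + 2);
            Integer left = remaining.get(bigram);
            if (left != null && left > 0) {
                shared++;
                remaining.put(bigram, left - 1);
            }
        }
        int total = (a.length() - 1) + (b.length() - 1);
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        System.out.println(dice("night", "nacht")); // only the bigram "ht" is shared
        System.out.println(dice("night", "night")); // identical strings
    }
}
```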
 



Re: [TEXT] Distance vs. Metric vs. Similarity

Emmanuel Bourg-3
In reply to this post by Benedikt Ritter-4
On 14/12/2014 21:08, Benedikt Ritter wrote:

> Distance and similarity seem to be just opposites of the same thing. A
> great distance indicates a small similarity between two character
> sequences. Metric feels like it's something more general, but I'm not sure.
>
> WDYT?

Return the inverse (1/similarity) and you get a distance ;) (not sure it
gets all the expected properties of a norm though [1])
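
A quick sketch of that caveat, under the assumption that the similarity is normalised to the range 0 to 1 (the class `InverseVersusComplement` and its `matchRatio` measure are invented for this example). A distance in the strict sense should be 0 for identical inputs, which the inverse does not give; the complement `1 - similarity` does.

```java
final class InverseVersusComplement {

    /** Hypothetical similarity in [0, 1]: fraction of positions holding the same character. */
    static double matchRatio(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("inputs must have equal length");
        }
        if (a.isEmpty()) {
            return 1.0; // two empty strings are identical
        }
        int same = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) == b.charAt(i)) {
                same++;
            }
        }
        return (double) same / a.length();
    }

    /** 1 / similarity: identical strings end up at 1, and similarity 0 gives infinity. */
    static double inverse(double similarity) {
        return 1.0 / similarity;
    }

    /** 1 - similarity: identical strings end up at 0, the value stays within [0, 1]. */
    static double complement(double similarity) {
        return 1.0 - similarity;
    }

    public static void main(String[] args) {
        double identical = matchRatio("abc", "abc");
        double disjoint = matchRatio("abc", "xyz");
        System.out.println(inverse(identical));    // not 0, although the strings are equal
        System.out.println(inverse(disjoint));     // unbounded
        System.out.println(complement(identical)); // 0 for equal strings
        System.out.println(complement(disjoint));
    }
}
```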

Emmanuel Bourg

[1] http://en.wikipedia.org/wiki/Norm_%28mathematics%29#Definition


Re: [TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4
In reply to this post by Bruno P. Kinoshita
2014-12-14 23:10 GMT+01:00 Bruno P. Kinoshita <[hidden email]>:

>
> > Sounds good, although I'm not sure I understand where you are going
> > with the marker interface. What is its purpose?
>
> Let's then keep the StringMetric interface and update its Javadoc.
> Thinking again, that other marker interface seems to be unnecessary.
>
> > Okay, but we need to make sure all algorithms really return a distance
> > then. As I said, FuzzyDistance currently really returns a similarity
> > score. An algorithm returning a distance should return a higher number
> > for higher distances.
>
> I had a look at the code, and I think I understand what you are saying
> now. In FuzzyDistance, the higher the score, the closer the strings are.
> That is different from what the other algorithms return.
>
> I believe I found why I named that package similarity. Probably it was
> because I saw that in the stringmetric library [1]. There, Levenshtein,
> Jaccard and other algorithms are suffixed with "Metric".
>
> How about we keep the package as similarity and simply rename the classes
> to [Algo]Metric too? This way we will be able to accommodate other metrics
> such as the Sorensen-Dice coefficient, where the higher the coefficient,
> the more similar two strings are.
>
> WDYT?
>


Hey Bruno,

Yes, we can do it that way. What I want to avoid is that users have to
check the JavaDoc every time they use an algorithm. To me it would make
sense to have a number of distance algorithms and they all return a
distance. Or we have Similarity algorithms and they all return a
similarity. That way users can swap out the underlying algorithms without
changing their code.
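
A sketch of what that swap could look like. The interface name `StringDistance` comes from the proposal earlier in this thread, while the `apply` method, the two implementations and the `withinThreshold` caller are assumptions made up for this example. Because every implementation promises "0 means identical, larger means further apart", the caller does not change when the algorithm does.

```java
interface StringDistance {
    /** Contract assumed here: 0 for identical inputs, larger values for less similar inputs. */
    int apply(CharSequence left, CharSequence right);
}

final class Hamming implements StringDistance {
    @Override
    public int apply(CharSequence left, CharSequence right) {
        if (left.length() != right.length()) {
            throw new IllegalArgumentException("Hamming distance needs inputs of equal length");
        }
        int distance = 0;
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }
}

final class Levenshtein implements StringDistance {
    @Override
    public int apply(CharSequence left, CharSequence right) {
        int[][] d = new int[left.length() + 1][right.length() + 1];
        for (int i = 0; i <= left.length(); i++) {
            d[i][0] = i; // delete i characters
        }
        for (int j = 0; j <= right.length(); j++) {
            d[0][j] = j; // insert j characters
        }
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[left.length()][right.length()];
    }
}

final class SwappableDistanceDemo {
    /** Caller code that stays the same whichever algorithm is plugged in. */
    static boolean withinThreshold(StringDistance algorithm, String a, String b, int maxDistance) {
        return algorithm.apply(a, b) <= maxDistance;
    }

    public static void main(String[] args) {
        System.out.println(withinThreshold(new Hamming(), "karolin", "kathrin", 3));
        System.out.println(withinThreshold(new Levenshtein(), "kitten", "sitting", 2));
    }
}
```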

Benedikt



--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
Hi Benedikt,

After playing more with [text] and some edit distances, I think we can resume this conversation and hopefully fix SANDBOX-488 [1].

I've created a branch SANDBOX-488 in git [2] with the following modifications:

* The StringMetric interface has been renamed to EditDistance
* We have the following edit distances available: Levenshtein, JaroWinkler, Hamming ([lang]) and Cosine. Others might be added in the future, such as Jaccard and QGram
* When an edit distance returns 0, it means both strings are identical or at least very similar. Conversely, a return value of 1 or higher means that the strings are further apart
* There are other classes that can be used for text similarity, such as FuzzyScore ([lang]) and CosineSimilarity (used by the Cosine edit distance). Others might be added later, such as the Jaccard Index. The behaviour of each of these classes varies

I think it is simpler, and users will quickly understand the API. Once one understands what is an edit distance, s/he can guess the behaviour of any of its implementations.
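
As an illustration of the split between a similarity class and the edit distance built on it, here is a simplified sketch. It is not the code on the branch: the class `CosineSketch`, the whitespace tokenisation and the `1 - similarity` conversion are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSketch {

    /** Splits on whitespace and counts how often each lower-cased word occurs. */
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    static long sumOfSquares(Map<String, Integer> counts) {
        long sum = 0;
        for (int value : counts.values()) {
            sum += (long) value * value;
        }
        return sum;
    }

    /** Similarity side: 1.0 for identical word vectors, 0.0 when no word is shared. */
    static double similarity(String a, String b) {
        Map<String, Integer> left = wordCounts(a);
        Map<String, Integer> right = wordCounts(b);
        long dot = 0;
        for (Map.Entry<String, Integer> entry : left.entrySet()) {
            dot += (long) entry.getValue() * right.getOrDefault(entry.getKey(), 0);
        }
        double norm = Math.sqrt((double) sumOfSquares(left) * sumOfSquares(right));
        return norm == 0.0 ? 0.0 : dot / norm;
    }

    /** Distance side: follows the "0 means identical" contract of an edit distance. */
    static double distance(String a, String b) {
        return 1.0 - similarity(a, b);
    }

    public static void main(String[] args) {
        System.out.println(distance("the quick fox", "the quick fox"));
        System.out.println(distance("the quick fox", "the lazy dog"));
        System.out.println(distance("apple pie", "banana split"));
    }
}
```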
What do you think? If you agree I'd like to merge the branch and fix the issue.
TL;DR: the similarity package contains code to work on text similarity, such as edit distances, but also scores / indexes and other algorithms. The StringMetric interface has been renamed to EditDistance, and only edit distances implement it

TIA
Bruno

[1] https://issues.apache.org/jira/browse/SANDBOX-488
[2] https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=tree;f=src/main/java/org/apache/commons/text/similarity;h=a2de9f0196b543f50c6d2c28376feb311f46eeda;hb=refs/heads/SANDBOX-488
 


Re: [TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4
Hi Bruno

2015-04-15 12:14 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:

> Hi Benedikt,
>
> After playing more with [text] and some edit distances, I think we can
> retake this conversation and hopefully fix SANDBOX-488 [1].
>
> I've created a branch SANDBOX-488 in git [2] with the following
> modifications:
>
> * The StringMetric interface has been renamed to EditDistance
> * We have the following edit distances available: Levenshtein,
> JaroWinkler, Hamming ([lang]) and Cosine. Others might be added in the
> future, such as Jaccard and QGram
> * When an edit distance returns 0, it means both strings are identical or
> at least very similar. The opposite is true, returning 1, or higher values,
> means that the strings are less close to each other
> * There are other classes that can be used for text similarity, such as
> the FuzzyScore ([lang]), and the CosineSimilarity (used by the Cosine edit
> distance). Others might be added later, such as the Jaccard Index. The
> behaviour of each of these classes varies
>
> I think it is simpler, and users will quickly understand the API. Once one
> understands what is an edit distance, s/he can guess the behaviour of any
> of its implementations.
>
> What do you think? If you agree I'd like to merge the branch and fix the
> issue.
>

Very nice! Maybe we can even come up with a generic class that calculates a
distance based on a similarity score.
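
One possible shape for such a class, as a sketch only. The name `SimilarityBackedDistance`, the `1 - similarity` conversion and the demo `prefixSimilarity` measure are all hypothetical. The conversion assumes the wrapped score is normalised to the range 0 to 1; an unbounded score would need its own normalisation first.

```java
import java.util.function.ToDoubleBiFunction;

final class SimilarityBackedDistance {

    private final ToDoubleBiFunction<CharSequence, CharSequence> similarity;

    SimilarityBackedDistance(ToDoubleBiFunction<CharSequence, CharSequence> similarity) {
        this.similarity = similarity;
    }

    /** Turns a similarity in [0, 1] into a distance in [0, 1], with 0 meaning identical. */
    double distance(CharSequence left, CharSequence right) {
        double score = similarity.applyAsDouble(left, right);
        if (Double.isNaN(score) || score < 0.0 || score > 1.0) {
            throw new IllegalStateException("similarity must be within [0, 1], got " + score);
        }
        return 1.0 - score;
    }

    /** Demo-only similarity: length of the shared prefix divided by the longer length. */
    static double prefixSimilarity(CharSequence left, CharSequence right) {
        int longest = Math.max(left.length(), right.length());
        if (longest == 0) {
            return 1.0; // two empty inputs are identical
        }
        int shortest = Math.min(left.length(), right.length());
        int shared = 0;
        while (shared < shortest && left.charAt(shared) == right.charAt(shared)) {
            shared++;
        }
        return (double) shared / longest;
    }

    public static void main(String[] args) {
        SimilarityBackedDistance adapter =
                new SimilarityBackedDistance(SimilarityBackedDistance::prefixSimilarity);
        System.out.println(adapter.distance("abcd", "abcd"));
        System.out.println(adapter.distance("abcd", "abxy"));
    }
}
```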

Benedikt


>
> TL;DR: the similarity package contains code to work on text similarity,
> such as edit distances, but also scores / indexes and other algorithms. The
> StringMetric interface has been renamed to EditDistance, and only edit
> distances implement it
>
> TIA
> Bruno
>
> [1] https://issues.apache.org/jira/browse/SANDBOX-488
> [2]
> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=tree;f=src/main/java/org/apache/commons/text/similarity;h=a2de9f0196b543f50c6d2c28376feb311f46eeda;hb=refs/heads/SANDBOX-488
>
>   ------------------------------
>  *From:* Benedikt Ritter <[hidden email]>
> *To:* Commons Developers List <[hidden email]>; Bruno P.
> Kinoshita <[hidden email]>
> *Sent:* Friday, December 19, 2014 2:35 AM
>
> *Subject:* Re: [TEXT] Distance vs. Metric vs. Similarity
>
>
>
> 2014-12-14 23:10 GMT+01:00 Bruno P. Kinoshita <[hidden email]>:
>
> > Sounds good, although I'm not sure I understand where you are going with
> > the marker interface. What is its purpose?
>
> Let's then keep the StringMetric interface and update its Javadoc.
> Thinking again, that other marker interface seems to be unnecessary.
>
> > Okay, but we need to make sure all algorithms really return a distance
> > then. As I said, FuzzyDistance currently really returns a similarity score.
> > An algorithm returning a distance should return a higher number for higher
> > distances.
>
> I had a look at the code, and I think I understand what you are saying now.
> In FuzzyDistance, the higher the score, the closer the strings are, which is
> different from what the other algorithms return.
>
> I believe I found why I named that package similarity. Probably it was
> because I saw that in the stringmetric library [1]. There, Levenshtein,
> Jaccard and other algorithms are suffixed with "Metric".
>
> How about we keep the package as similarity and simply rename the classes
> to [Algo]Metric too? This way we will be able to accommodate other metrics
> such as the Sorensen-Dice coefficient, where the higher the coefficient,
> the more similar two strings are.
>
> WDYT?
>
>
>
> Hey Bruno,
>
> yes, we can do it that way. What I want to avoid is that users have to
> check the JavaDoc every time they use an algorithm. To me it would make
> sense to have a number of distance algorithms that all return a
> distance, or a number of similarity algorithms that all return a
> similarity. That way users can swap out the underlying algorithm without
> changing their code.
>
> Benedikt
>
>
> Cheers
> Bruno
> [1] https://github.com/rockymadden/stringmetric
>
>
>
>       From: Benedikt Ritter <[hidden email]>
>  To: Commons Developers List <[hidden email]>; Bruno P. Kinoshita
> <[hidden email]>
>  Sent: Sunday, December 14, 2014 6:45 PM
>  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
>
> Hi Bruno,
>
>
>
> 2014-12-14 21:37 GMT+01:00 Bruno P. Kinoshita <[hidden email]>:
> >
> > Hello Benedikt!
> >
> > > Metric feels like it's something more general, but I'm not sure.
> >
> > You're right. Metric was supposed to be a general interface,
> > representing the String Metric from the Wikipedia article.
> >
> > > and the interface from StringMetric to StringDistance.
> >
> > I'm reading the Myers paper, and already have a local branch with the
> > Myers algorithm from [collections] ported to [text].
> >
> > Perhaps we could move the StringMetric interface to the o.a.c.text package,
> > and create a StringDistance or EditDistance interface in o.a.c.text.distance.
> > This way we can have String Metrics as in Wikipedia, as a way of
> > giving a value for comparing two strings. We would have the edit distances
> > in the distance package, and the diff algorithms in another diff package.
> > All of them being String Metrics.
> >
> > What do you think?
> >
>
> Sounds good, although I'm not sure I understand where you are going with
> the marker interface. What is its purpose?
>
>
> > > > I think we should consider renaming everything to distance, since the
> > > > implemented algorithms all end on *Distance. So we would change the package
> > > > name from o.a.c.text.similarity to o.a.c.text.distance and the interface
> > > > from StringMetric to StringDistance.
> > >
> > > Looking at the code again, it seems like the algorithms all really return a
> > > similarity score and not a distance. For example FuzzyDistance JavaDoc
> > > states: "A higher score indicates a higher similarity". If this is the case,
> > > maybe it makes more sense to rename everything to Similarity?
> >
> > I'm in favor of dropping score and similarity, and adopting distance in
> > the package, classes and javadocs, as it is used in other tools (e.g. Solr,
> > Talend, Informatica IIR, etc).
> >
>
> Okay, but we need to make sure all algorithms really return a distance
> then. As I said, FuzzyDistance currently really returns a similarity score.
> An algorithm returning a distance should return a higher number for higher
> distances.
>
> Benedikt
>
>
> > All the best,
> > Bruno
> >
> >
> >      From: Benedikt Ritter <[hidden email]>
> >  To: Commons Developers List <[hidden email]>
> >  Sent: Sunday, December 14, 2014 6:20 PM
> >  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
> >
> > 2014-12-14 21:08 GMT+01:00 Benedikt Ritter <[hidden email]>:
> > >
> > > Hi,
> > >
> > > currently the wording in commons text is a bit confusing. We have the
> > > three terms:
> > >
> > > - distance
> > > - similarity
> > > - metric
> > >
> > > Distance and similarity seem to be just opposites of the same thing. A
> > > great distance indicates a small similarity between two character
> > > sequences. Metric feels like it's something more general, but I'm not sure.
> > >
> > > I think we should consider renaming everything to distance, since the
> > > implemented algorithms all end on *Distance. So we would change the package
> > > name from o.a.c.text.similarity to o.a.c.text.distance and the interface
> > > from StringMetric to StringDistance.
> > >
> >
> > Looking at the code again, it seems like the algorithms all really return a
> > similarity score and not a distance. For example FuzzyDistance JavaDoc
> > states: "A higher score indicates a higher similarity". If this is the case,
> > maybe it makes more sense to rename everything to Similarity?
> >
> >
> > >
> > > WDYT?
> > >
> > > Benedikt
> > >
> > > --
> > > http://people.apache.org/~britter/
> > > http://www.systemoutprintln.de/
> > > http://twitter.com/BenediktRitter
> > > http://github.com/britter


--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
Hi Benedikt

> Very nice! Maybe we can even come up with a generic class that calculates a
> distance based on a similarity score.

Hmmm, that's a good idea. We probably want to keep that idea in an issue for later :-) [1] I'll use my next development cycle on [text] to review the code and reports, and to write the user guide with what we already have in the project.

Do you think we would need anything else before trying a 1.0 release? There are two TODO marks in the tests, but I plan to get rid of them in the next few days too. They don't seem like a blocker right now anyway.

Thanks
Bruno

[1] https://issues.apache.org/jira/browse/SANDBOX-495

 
      From: Benedikt Ritter <[hidden email]>
 To: Commons Developers List <[hidden email]>
 Sent: Wednesday, April 15, 2015 11:03 PM
 Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
   
Hi Bruno

2015-04-15 12:14 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:

> Hi Benedikt,
>
> After playing more with [text] and some edit distances, I think we can
> retake this conversation and hopefully fix SANDBOX-488 [1].
>
> I've created a branch SANDBOX-488 in git [2] with the following
> modifications:
>
> * The StringMetric interface has been renamed to EditDistance
> * We have the following edit distances available: Levenshtein,
> JaroWinkler, Hamming ([lang]) and Cosine. Others might be added in the
> future, such as Jaccard and QGram
> * When an edit distance returns 0, it means both strings are identical or
> at least very similar. Conversely, a return value of 1 or higher
> means that the strings are less close to each other
> * There are other classes that can be used for text similarity, such as
> the FuzzyScore ([lang]), and the CosineSimilarity (used by the Cosine edit
> distance). Others might be added later, such as the Jaccard Index. The
> behaviour of each of these classes varies
>
> I think it is simpler, and users will quickly understand the API. Once one
> understands what is an edit distance, s/he can guess the behaviour of any
> of its implementations.
>
> What do you think? If you agree I'd like to merge the branch and fix the
> issue.
>
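
The contract described above (0 for identical strings, larger values for strings that are further apart) could be sketched roughly as follows. This is an illustration only: the EditDistance interface here is a hypothetical stand-in whose shape is assumed, not the one on the SANDBOX-488 branch, and the Hamming and Levenshtein implementations are plain textbook versions written for this example.

```java
final class EditDistanceSketch {

    /** Hypothetical interface: 0 means identical, larger means further apart. */
    interface EditDistance {
        int apply(CharSequence left, CharSequence right);
    }

    /** Textbook Hamming distance: positions at which two equal-length inputs differ. */
    static final EditDistance HAMMING = (left, right) -> {
        if (left.length() != right.length()) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                distance++;
            }
        }
        return distance;
    };

    /** Textbook Levenshtein distance computed with two rolling rows. */
    static final EditDistance LEVENSHTEIN = (left, right) -> {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // building the first j chars of right from "" takes j insertions
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // reducing the first i chars of left to "" takes i deletions
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    };

    /** Caller code depends only on the contract, so the algorithm can be swapped. */
    static boolean isClose(EditDistance algorithm, CharSequence a, CharSequence b, int threshold) {
        return algorithm.apply(a, b) <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(isClose(HAMMING, "karolin", "kathrin", 3));
        System.out.println(isClose(LEVENSHTEIN, "kitten", "sitting", 3));
    }
}
```

Because isClose only relies on "0 is identical, higher is further apart", either implementation can be passed in without touching the calling code, which is the property argued for earlier in the thread.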

Very nice! Maybe we can even come up with a generic class that calculates a
distance based on a similarity score.

Benedikt



Re: [TEXT] Distance vs. Metric vs. Similarity

Benedikt Ritter-4
2015-04-16 13:38 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:

> Hi Benedikt
>
> > Very nice! Maybe we can even come up with a generic class that calculates a
> > distance based on a similarity score.
>
> Hmmm, that's a good idea. We probably want to keep that idea in an issue
> for later :-) [1] I'll use my next development cycle on [text] to review
> the code and reports, and to write the user guide with what we already have
> in the project.
>
> Do you think we would need anything else before trying a 1.0 release?
> There are two TODO marks in the tests, but I plan to get rid of them in the
> next few days too. They don't seem like a blocker right now anyway.
>

Release early, release often. Better come up with a small feature set in
1.0 and add stuff in the next releases than try to push everything into 1.0.
I'd like to do a little review cycle of the code myself. I hope to find the
time this weekend. After polishing up, we can go for 1.0

keep up the good work!
Benedikt


>
> Thanks
> Bruno
>
> [1] https://issues.apache.org/jira/browse/SANDBOX-495
>


--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
> Release early, release often. Better come up with a small feature set in
> 1.0 and add stuff in the next releases than try to push everything into 1.0.

+1 :-)

I will try writing some part of the user guide tomorrow. That way you can review some part of the documentation too, at least the index and overall structure.

Thanks
Bruno

 
      From: Benedikt Ritter <[hidden email]>
 To: Commons Developers List <[hidden email]>; Bruno P. Kinoshita <[hidden email]>
 Sent: Thursday, April 16, 2015 11:52 PM
 Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
   
2015-04-16 13:38 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:

> Hi Benedikt
>
> > Very nice! Maybe we can even come up with a generic class that calculates a
> > distance based on a similarity score.
>
> Hmmm, that's a good idea. We probably want to keep that idea in an issue
> for later :-) [1] I'll use my next development cycle on [text] to review
> the code and reports, and to write the user guide with what we already have
> in the project.
>
> Do you think we would need anything else before trying a 1.0 release?
> There are two TODO marks in the tests, but I plan to get rid of them in the
> next few days too. They don't seem like a blocker right now anyway.
>

Release early, release often. Better come up with a small feature set in
1.0 and add stuff in the next releases than try to push everything into 1.0.
I'd like to do a little review cycle of the code myself. I hope to find the
time this weekend. After polishing up, we can go for 1.0

keep up the good work!
Benedikt


>
> ThanksBruno
>
> [1] https://issues.apache.org/jira/browse/SANDBOX-495
>
>
>      From: Benedikt Ritter <[hidden email]>
>  To: Commons Developers List <[hidden email]>
>  Sent: Wednesday, April 15, 2015 11:03 PM
>  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
>
> Hi Bruno
>
> 2015-04-15 12:14 GMT+02:00 Bruno P. Kinoshita <[hidden email]
> >:
>
> > Hi Benedikt,
> >
> > After playing more with [text] and some edit distances, I think we can
> > retake this conversation and hopefully fix SANDBOX-488 [1].
> >
> > I've created a branch SANDBOX-488 in git [2] with the following
> > modifications:
> >
> > * The StringMetric interface has been renamed to EditDistance
> > * We have the following edit distances available: Levenshtein,
> > JaroWrinkler, Hamming ([lang]) and Cosine. Others might be added in the
> > future, such as Jaccard and QGram
> > * When an edit distance returns 0, it means both strings are identical or
> > at least very similar. The opposite is true, returning 1, or higher
> values,
> > means that the strings are less close to each other
> > * There are other classes that can be used for text similarity, such as
> > the FuzzyScore ([lang]), and the CosineSimilarity (used by the Cosine
> edit
> > distance). Others might be added later, such as the Jaccard Index. The
> > behaviour of each of these classes varies
> >
> > I think it is simpler, and users will quickly understand the API. Once
> one
> > understands what is an edit distance, s/he can guess the behaviour of any
> > of its implementations.
> >
> > What do you think? If you agree I'd like to merge the branch and fix the
> > issue.
> >
>
> Very nice! Maybe we can even come up with a generic class that calculates a
> distance based on a similarity score.
>
> Benedikt
>
>
> >
> > TL;DR: the similarity package contains code to work on text similarity,
> > such as edit distances, but also scores / indexes and other algorithms.
> The
> > StringMetric interface has been renamed to EditDistance, and only edit
> > distances implement it
> >
> > TIA
> > Bruno
> >
> > [1] https://issues.apache.org/jira/browse/SANDBOX-488
> > [2]
> >
> https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=tree;f=src/main/java/org/apache/commons/text/similarity;h=a2de9f0196b543f50c6d2c28376feb311f46eeda;hb=refs/heads/SANDBOX-488
> >
> >  ------------------------------
> >  *From:* Benedikt Ritter <[hidden email]>
> > *To:* Commons Developers List <[hidden email]>; Bruno P.
> > Kinoshita <[hidden email]>
> > *Sent:* Friday, December 19, 2014 2:35 AM
> >
> > *Subject:* Re: [TEXT] Distance vs. Metric vs. Similarity
> >
> >
> >
> > 2014-12-14 23:10 GMT+01:00 Bruno P. Kinoshita <
> [hidden email]>
> > :
> >
> > > Sounds good, although I'm not sure I understand where you are going
> > with> the marker interface. What is it's purpose?
> > Let's then keep the StringMetric interface and update its Javadoc.
> > Thinking again, that other marker interface seems to be unnecessary.  >
> > Okay, but we need to make sure all algorithms really return a
> > distance> then. As I said, FuzzyDistance currently really returns a
> > similarity score.> An algorithm returning a distance should return a
> higher
> > number for higher> distances. I had a look at the code, and I think I
> > understand what you are saying now. In FuzzyDistance, the higher the
> score,
> > the closer strings are. Different than what the other algorithms return.
> > I believe I found why I named that package similarity. Probably it was
> > because I saw that in the stringmetric library [1]. There, Levenshtein,
> > Jaccard and other algorithms are suffixed with "Metric".
> > How about we keep the package as similarity and simply rename the classes
> > to [Algo]Metric too? This way we will be able to accommodate other
> metrics
> > such as the Sorensen-Dice coefficient, where the higher the coefficient,
> > more similar two strings are.
> > WDYT?
> >
> >
> >
> > Hey Bruno,
> >
> > yes we can do it that way. What I want to avoid is, that the users have
> to
> > check the JavaDoc every time they use an algorithms. To me it would make
> > sense to have a number of distance algorithms and they all return a
> > distance. Or we have Similarity algorithms and they all return a
> > similarity. That way users can swap out the underlying algorithms without
> > changing their code.
> >
> > Benedikt
> >
> >
> > CheersBruno
> > [1] https://github.com/rockymadden/stringmetric
> >
> >
> >
> >      From: Benedikt Ritter <[hidden email]>
> >  To: Commons Developers List <[hidden email]>; Bruno P.
> Kinoshita
> > <[hidden email]>
> >  Sent: Sunday, December 14, 2014 6:45 PM
> >  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
> >
> > Hi Bruna,
> >
> >
> >
> > 2014-12-14 21:37 GMT+01:00 Bruno P. Kinoshita <
> [hidden email]
> > >:
> > >
> > > Hello Benedikt!
> > > > Metric feels like it's something more general, but I'm not sure.
> > > You're right. Metric was supposed to be a general interface,
> > > representing the String Metric from the Wikipedia article.
> > > >  and the interface from StringMetric to StringDistance.
> > > I'm reading the Myers paper, and already have a local branch with the
> > > Myers algorithm from [collections] ported to [text].
> > > Perhaps we could move the StringMetric interface to the o.a.c.text
> > > package, and create a StringDistance or EditDistance interface in
> > > o.a.c.text.distance.
> > > This way we can have String Metrics as in Wikipedia, as a way of
> > > giving a value for comparing two strings. We would have the edit
> > > distances in the distance package, and the diff algorithms in another
> > > diff package. All of them being String Metrics.
> > > What do you think?
> > >
> >
> > Sounds good, although I'm not sure I understand where you are going with
> > the marker interface. What is its purpose?
> >
> >
> > > > > I think we should consider renaming everything to distance, since
> > > > > the implemented algorithms all end on *Distance. So we would change
> > > > > the package name from o.a.c.text.similarity to o.a.c.text.distance
> > > > > and the interface from StringMetric to StringDistance.
> > > >
> > > > Looking at the code again, it seems like the algorithms all really
> > > > return a similarity score and not a distance. For example, the
> > > > FuzzyDistance JavaDoc states: "A higher score indicates a higher
> > > > similarity". If this is the case, maybe it makes more sense to rename
> > > > everything to Similarity?
> > > I'm in favor of dropping score and similarity, and adopting distance in
> > > the package, classes and javadocs, as it is used in other tools (e.g.
> > > Solr, Talend, Informatica IIR, etc).
> > >
> >
> > Okay, but we need to make sure all algorithms really return a distance
> > then. As I said, FuzzyDistance currently really returns a similarity
> > score. An algorithm returning a distance should return a higher number
> > for higher distances.
> >
> > Benedikt
> >
> >
> > > All the best,
> > > Bruno
> > >
> > >
> > >      From: Benedikt Ritter <[hidden email]>
> > >  To: Commons Developers List <[hidden email]>
> > >  Sent: Sunday, December 14, 2014 6:20 PM
> > >  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
> > >
> > > 2014-12-14 21:08 GMT+01:00 Benedikt Ritter <[hidden email]>:
> > > >
> > > > Hi,
> > > >
> > > > currently the wording in commons text is a bit confusing. We have the
> > > > three terms:
> > > >
> > > > - distance
> > > > - similarity
> > > > - metric
> > > >
> > > > Distance and similarity seem to be just opposites of the same thing. A
> > > > great distance indicates a small similarity between two character
> > > > sequences. Metric feels like it's something more general, but I'm not
> > > > sure.
> > > >
> > > > I think we should consider renaming everything to distance, since the
> > > > implemented algorithms all end on *Distance. So we would change the
> > > > package name from o.a.c.text.similarity to o.a.c.text.distance and the
> > > > interface from StringMetric to StringDistance.
> > > >
> > >
> > > Looking at the code again, it seems like the algorithms all really
> > > return a similarity score and not a distance. For example, the
> > > FuzzyDistance JavaDoc states: "A higher score indicates a higher
> > > similarity". If this is the case, maybe it makes more sense to rename
> > > everything to Similarity?
> > >
> > >
> > > >
> > > > WDYT?
> > > >
> > > > Benedikt
> > > >
> > > > --
> > > > http://people.apache.org/~britter/
> > > > http://www.systemoutprintln.de/
> > > > http://twitter.com/BenediktRitter
> > > > http://github.com/britter





   

Re: [TEXT] Distance vs. Metric vs. Similarity

Bruno P. Kinoshita
In reply to this post by Benedikt Ritter-4
Finished reviewing the code and reports. Website updated to the latest version too.
http://commons.apache.org/sandbox/commons-text/index.html
All the best,
Bruno
 

      From: Benedikt Ritter <[hidden email]>
 To: Commons Developers List <[hidden email]>; Bruno P. Kinoshita <[hidden email]>
 Sent: Thursday, April 16, 2015 11:52 PM
 Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
   
2015-04-16 13:38 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:

> Hi Benedikt
>
> > Very nice! Maybe we can even come up with a generic class that calculates
> > a distance based on a similarity score.
> Hmmm, that's a good idea. We probably want to keep that idea in an issue
> for later :-) [1] I'll use my next development cycle on [text] to review
> the code and reports, and to write the user guide with what we have already
> in the project.
> Do you think we would need anything else before trying a 1.0 release?
> There are two TODO marks in the test, but I plan to get rid of them in the
> next few days too. But they don't seem like a blocker right now anyway.
>

Release early, release often. Better come up with a small feature set in
1.0 and add stuff in the next releases than try to push everything into 1.0.
I'd like to do a little review cycle of the code myself. I hope to find the
time this weekend. After polishing up, we can go for 1.0.

Keep up the good work!
Benedikt


>
> Thanks
> Bruno
>
> [1] https://issues.apache.org/jira/browse/SANDBOX-495
>
>
>      From: Benedikt Ritter <[hidden email]>
>  To: Commons Developers List <[hidden email]>
>  Sent: Wednesday, April 15, 2015 11:03 PM
>  Subject: Re: [TEXT] Distance vs. Metric vs. Similarity
>
> Hi Bruno
>
> 2015-04-15 12:14 GMT+02:00 Bruno P. Kinoshita <[hidden email]>:
>
> > Hi Benedikt,
> >
> > After playing more with [text] and some edit distances, I think we can
> > retake this conversation and hopefully fix SANDBOX-488 [1].
> >
> > I've created a branch SANDBOX-488 in git [2] with the following
> > modifications:
> >
> > * The StringMetric interface has been renamed to EditDistance
> > * We have the following edit distances available: Levenshtein,
> > JaroWinkler, Hamming ([lang]) and Cosine. Others might be added in the
> > future, such as Jaccard and QGram
> > * When an edit distance returns 0, it means both strings are identical or
> > at least very similar. Conversely, a result of 1 or higher means that
> > the strings are less close to each other
> > * There are other classes that can be used for text similarity, such as
> > the FuzzyScore ([lang]), and the CosineSimilarity (used by the Cosine
> > edit distance). Others might be added later, such as the Jaccard Index.
> > The behaviour of each of these classes varies
> >
> > I think it is simpler, and users will quickly understand the API. Once
> > one understands what an edit distance is, s/he can guess the behaviour of
> > any of its implementations.
> >
> > What do you think? If you agree I'd like to merge the branch and fix the
> > issue.
> >
>
> Very nice! Maybe we can even come up with a generic class that calculates a
> distance based on a similarity score.
>
> Benedikt
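
One way the generic class suggested above might look is sketched below. Everything here is an assumption made for illustration: the SimilarityScore interface, the requirement that scores lie in [0, 1], and the mapping distance = 1 - similarity are not taken from [text]. The character-set Jaccard index is only a small example similarity to drive the adapter.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical similarity contract for this sketch: a score in [0, 1],
// where 1 means the inputs are considered identical.
interface SimilarityScore {
    double apply(CharSequence left, CharSequence right);
}

/** Jaccard index over the sets of characters in each input. */
final class CharJaccardSimilarity implements SimilarityScore {
    @Override
    public double apply(CharSequence left, CharSequence right) {
        Set<Character> a = toSet(left);
        Set<Character> b = toSet(right);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty inputs are treated as identical
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    private static Set<Character> toSet(CharSequence text) {
        Set<Character> chars = new HashSet<>();
        for (int i = 0; i < text.length(); i++) {
            chars.add(text.charAt(i));
        }
        return chars;
    }
}

/** Adapter that turns a normalized similarity into a distance (assumed mapping: 1 - score). */
final class SimilarityBackedDistance {
    private final SimilarityScore similarity;

    SimilarityBackedDistance(SimilarityScore similarity) {
        this.similarity = similarity;
    }

    double apply(CharSequence left, CharSequence right) {
        double score = similarity.apply(left, right);
        // The negated check also rejects NaN, which fails every comparison.
        if (!(score >= 0.0 && score <= 1.0)) {
            throw new IllegalStateException("Similarity must be in [0, 1] but was " + score);
        }
        return 1.0 - score;
    }
}
```

The adapter only makes sense for normalized similarities. An unbounded score such as FuzzyScore would need its own normalization before a mapping like this could apply.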
>
>
> >
> > TL;DR: the similarity package contains code to work on text similarity,
> > such as edit distances, but also scores / indexes and other algorithms.
> > The StringMetric interface has been renamed to EditDistance, and only edit
> > distances implement it
> >
> > TIA
> > Bruno
> >
> > [1] https://issues.apache.org/jira/browse/SANDBOX-488
> > [2] https://git1-us-west.apache.org/repos/asf?p=commons-text.git;a=tree;f=src/main/java/org/apache/commons/text/similarity;h=a2de9f0196b543f50c6d2c28376feb311f46eeda;hb=refs/heads/SANDBOX-488
> >


