[LANG] Add alphabet conversion API

13 messages

[LANG] Add alphabet conversion API

Eyal Allweil
Hi guys,
Would you be interested in adding a utility class that creates alphabet converters, perhaps via a helper method in StringUtils? It doesn't have to stay the way it is now, but the current API of the class, AlphabetConverter, is:
/**
 * The input is integers representing code points, but we can make it accept chars as well.
 *
 * doNotEncode represents chars we want to leave in the original state (not to encode them using the chars in encoding).
 */
public AlphabetConverter(Set<Integer> original, Set<Integer> encoding, Set<Integer> doNotEncode);

public String encode (String original);

public String decode (String encoded);
In StringUtils, we could add

public AlphabetConverter getAlphabetConverter (Set<Integer> original, Set<Integer> encoding, Set<Integer> doNotEncode);
I used it to convert from Unicode to Latin letters, without using any chars I wanted to reserve as delimiters, and preserving the English alphabet as-is for readability. If you'd like to add it, I'll clean up the code and prepare a pull request so you can review it.
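To make the proposal concrete, here is a minimal, char-based sketch of how such a converter might work. It is not the code from the pull request: the class name `SimpleAlphabetConverter`, the `chars` helper, and the fixed-length codeword scheme are all assumptions for illustration. Each original char outside `doNotEncode` gets a codeword of equal length built from the `encoding` chars that are not in `doNotEncode`, which keeps decoding unambiguous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the code from the pull request.
class SimpleAlphabetConverter {
    private final Set<Character> doNotEncode;
    private final Map<Character, String> encodeMap = new HashMap<>();
    private final Map<String, Character> decodeMap = new HashMap<>();
    private final int codeLength;

    SimpleAlphabetConverter(Set<Character> original, Set<Character> encoding, Set<Character> doNotEncode) {
        this.doNotEncode = doNotEncode;
        // Codeword symbols exclude pass-through chars so decoding stays unambiguous.
        List<Character> symbols = new ArrayList<>();
        for (char c : encoding) {
            if (!doNotEncode.contains(c)) symbols.add(c);
        }
        List<Character> toEncode = new ArrayList<>();
        for (char c : original) {
            if (!doNotEncode.contains(c)) toEncode.add(c);
        }
        if (!toEncode.isEmpty() && (symbols.isEmpty() || (symbols.size() == 1 && toEncode.size() > 1))) {
            throw new IllegalArgumentException("Encoding alphabet too small");
        }
        // Smallest length L with symbols.size()^L >= toEncode.size().
        int length = 1;
        long capacity = symbols.size();
        while (capacity < toEncode.size()) {
            capacity *= symbols.size();
            length++;
        }
        this.codeLength = length;
        for (int i = 0; i < toEncode.size(); i++) {
            String code = codeword(i, symbols);
            encodeMap.put(toEncode.get(i), code);
            decodeMap.put(code, toEncode.get(i));
        }
    }

    // Writes index in base symbols.size(), padded to codeLength digits.
    private String codeword(int index, List<Character> symbols) {
        char[] out = new char[codeLength];
        for (int pos = codeLength - 1; pos >= 0; pos--) {
            out[pos] = symbols.get(index % symbols.size());
            index /= symbols.size();
        }
        return new String(out);
    }

    String encode(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (doNotEncode.contains(c)) {
                sb.append(c); // pass-through char
            } else {
                String code = encodeMap.get(c);
                if (code == null) throw new IllegalArgumentException("Unmapped char: " + c);
                sb.append(code);
            }
        }
        return sb.toString();
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.contains(c)) {
                sb.append(c); // codewords never contain pass-through chars
                i++;
            } else {
                if (i + codeLength > encoded.length()) throw new IllegalArgumentException("Truncated codeword");
                Character orig = decodeMap.get(encoded.substring(i, i + codeLength));
                if (orig == null) throw new IllegalArgumentException("Unknown codeword at index " + i);
                sb.append(orig.charValue());
                i += codeLength;
            }
        }
        return sb.toString();
    }

    // Convenience: the chars of s as an insertion-ordered set.
    static Set<Character> chars(String s) {
        Set<Character> set = new LinkedHashSet<>();
        for (char c : s.toCharArray()) set.add(c);
        return set;
    }
}
```

With the example given later in this thread (original `abcd`, encoding `aefg`, doNotEncode `a`), `encode("abc")` gives `"aef"`. Using insertion-ordered sets keeps the assignment of codewords deterministic.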

It would also make sense to me to add a method that returns the HashMaps used internally for the mappings, so they can be serialized (and deserialized) to preserve the mapping.
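For the serialization idea, one straightforward option is standard Java serialization of a `HashMap`. This sketch assumes the internal mapping is a `HashMap<Integer, String>` from code point to codeword; `MappingSerialization`, `toBytes`, and `fromBytes` are hypothetical names. Java serialization should only be used with trusted data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical helper for persisting a code point -> codeword mapping.
class MappingSerialization {
    static byte[] toBytes(HashMap<Integer, String> mapping) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(mapping);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static HashMap<Integer, String> fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<Integer, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A round trip through `toBytes` and `fromBytes` returns an equal map. A plain-text format would be an alternative if the mapping needs to be human-readable or shared with non-Java code.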
Regards,
Eyal Allweil (PayPal)




Re: [LANG] Add alphabet conversion API

Simone Tripodi-2
Hi,
I personally think it would be a very "nice to have" feature. I had to face
similar issues in the past, and if that feature had been available it would
have saved me development time.

I just have a small request/suggestion: since int and char can be cast to
each other, I would use BitSets rather than Sets.
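One possible reading of the BitSet suggestion (an interpretation, not necessarily what was meant): treat each code point as a bit index, so membership in an alphabet becomes a single `BitSet.get`. `CodePointSets` and its methods are hypothetical helpers; `BitSet`, `String.codePoints()`, and `StringBuilder.appendCodePoint` are standard JDK APIs.

```java
import java.util.BitSet;

// Hypothetical helper: an alphabet of code points represented as a BitSet.
class CodePointSets {
    // Sets bit cp for every code point cp in s (supplementary characters included).
    static BitSet of(String s) {
        BitSet bits = new BitSet();
        s.codePoints().forEach(bits::set);
        return bits;
    }

    // Rebuilds a string from the set bits, in ascending code point order.
    static String toCodePointString(BitSet bits) {
        StringBuilder sb = new StringBuilder();
        for (int cp = bits.nextSetBit(0); cp >= 0; cp = bits.nextSetBit(cp + 1)) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }
}
```

A `BitSet` always iterates in ascending code point order, so it carries no ordering of its own; a converter built on it would have to derive any original-to-encoding pairing from that order.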

Good luck!
-Simo


http://people.apache.org/~simonetripodi/
http://twitter.com/simonetripodi

On Thu, Sep 1, 2016 at 10:53 AM, Eyal Allweil <
[hidden email]> wrote:

> [...]

Re: [LANG] Add alphabet conversion API

Benedikt Ritter-4
Hi Simo,

nice seeing a message from you again. Hope you're doing fine!

I currently don't understand the proposed API. Can you give some more
examples?

Thank you!
Benedikt

Simone Tripodi <[hidden email]> wrote on Thu, 1 Sep 2016 at 11:26:

> [...]

Re: [LANG] Add alphabet conversion API

Eyal Allweil
In reply to this post by Simone Tripodi-2
Hi Simo,
I'm not sure I understand how BitSets would be used in this case. For example, with chars it might look like this:
AlphabetConverter ac = new AlphabetConverter(['a','b','c','d'], ['a','e','f','g'],['a']) // 'a' is not encoded

and the mapping would become a -> a, b -> e, c -> f, d -> g
so encode("abc") would return "aef".
Ints can be used instead of chars to support Unicode code points that don't fit in a single char (which was our case; if that seems like overkill, a char-based implementation would be much more direct).
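The char-versus-code-point distinction can be seen directly in Java: a character above U+FFFF occupies two `char`s (a surrogate pair) but is a single code point. The class `CodePoints` is a hypothetical helper; `String.codePoints()` and `Character.isBmpCodePoint` are standard JDK methods.

```java
// Hypothetical helper class built on standard JDK methods.
class CodePoints {
    // One int per code point, even where a code point takes two chars.
    static int[] of(String s) {
        return s.codePoints().toArray();
    }

    // True if every code point in s fits in a single char (Basic Multilingual Plane).
    static boolean fitsInChars(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }
}
```

For example, `"a\uD83D\uDE00"` ('a' followed by U+1F600) has `length()` 3 but only two code points, so `fitsInChars` returns false for it; a char-based converter would split that character in two.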
How did you mean the BitSet to be used?
Regards,
Eyal

 

On Thursday, September 1, 2016 12:26 PM, Simone Tripodi <[hidden email]> wrote:
> [...]

Re: [LANG] Add alphabet conversion API

Eyal Allweil
I've created a JIRA issue, https://issues.apache.org/jira/browse/LANG-1266, and a pull request for this: https://github.com/apache/commons-lang/pull/188
Regards,
Eyal


 

On Wednesday, September 7, 2016 5:27 PM, Eyal Allweil <[hidden email]> wrote:
> [...]

Re: [LANG] Add alphabet conversion API

Rob Tompkins

> On Sep 13, 2016, at 4:39 AM, Eyal Allweil <[hidden email]> wrote:
> [...]
> AlphabetConverter ac = new AlphabetConverter(['a','b','c','d'], ['a','e','f','g'],['a']) // 'a' is not encoded

Hello Eyal,

The first thing that springs to mind here is: are we naming this class appropriately? I’ll preface my naming argument by noting that I’m coming from a mathematical background (combinatorics on words). Traditionally in the literature, such a “mapping”

        f: {Kleene Closure A} -> {Kleene Closure B}

with the property f(StringConcatenate(x,y)) = StringConcatenate(f(x),f(y)) for all strings x,y in {Kleene Closure A} is called a “morphism” [1, p. 8][2]. That name is admittedly terse for someone coming from an application-development mindset, so I’m not sure that going with the theoretical name is appropriate here. Still, I wanted at least to bring it up so that we can have an open discussion about naming.
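Writing A^* for the Kleene closure of A, the defining property and two immediate consequences can be worked out as follows. The concrete values reuse the earlier example in this thread (a stays a, b to e, c to f, d to g); letters are set in typewriter type to keep them apart from the map f.

```latex
f : A^{*} \to B^{*}, \qquad f(xy) = f(x)\,f(y) \quad \text{for all } x, y \in A^{*}

% By induction on length, f is determined by its values on single letters:
f(a_1 a_2 \cdots a_n) = f(a_1)\,f(a_2) \cdots f(a_n)

% The empty word must map to the empty word:
f(\varepsilon) = f(\varepsilon\varepsilon) = f(\varepsilon)\,f(\varepsilon)
  \;\Rightarrow\; |f(\varepsilon)| = 2\,|f(\varepsilon)|
  \;\Rightarrow\; f(\varepsilon) = \varepsilon

% Example with f(\mathtt{a}) = \mathtt{a}, f(\mathtt{b}) = \mathtt{e}, f(\mathtt{c}) = \mathtt{f}:
f(\mathtt{abc}) = f(\mathtt{a})\,f(\mathtt{b})\,f(\mathtt{c}) = \mathtt{aef}
```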

After looking at the code a bit, the following thoughts came to mind (note: I’m not tied to any of these ideas; I’m just stating thoughts that ran through my head):
- There are some stylistic differences that stand out (e.g. “methodName (signature)” as opposed to “methodName(signature)”).
- More javadoc?
- Do we need the “doNotEncodeMap”?
- The “.equals” method could use a null check.
- Do we want to accommodate non-invertible or non-decodable encodings (e.g. new AlphabetConverter(['a','b','c','d'],['a','e','f','e'],['a']))?
- Do we want to accommodate alphabets over concatenated chars (e.g. new AlphabetConverter(['ab','c','d','e'],['a','k','hi','z'],[]))?
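The last two questions can be made concrete with two small checks, sketched here under hypothetical names. The first tests whether a per-symbol mapping is injective: the first example maps both 'b' and 'd' to 'e', so "e" cannot be decoded. The second splits input over an alphabet of multi-char symbols such as ['ab','c','d','e']; greedy longest match is an assumed strategy, and it can fail where another split exists (with {a, ab, bc}, "abc" greedily takes "ab" and then cannot match "c", although a|bc works).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical helpers illustrating the two questions; not part of any existing API.
class AlphabetChecks {
    // Necessary for decoding (though not sufficient when codewords differ in length):
    // no two source symbols may share a codeword.
    static boolean isInjective(String... codewords) {
        return new HashSet<>(Arrays.asList(codewords)).size() == codewords.length;
    }

    // Splits input into symbols of a multi-char alphabet by greedy longest match.
    static List<String> tokenize(String input, String... alphabet) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            String best = null;
            for (String symbol : alphabet) {
                if (!symbol.isEmpty() && input.startsWith(symbol, i)
                        && (best == null || symbol.length() > best.length())) {
                    best = symbol;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No symbol matches at index " + i);
            }
            tokens.add(best);
            i += best.length();
        }
        return tokens;
    }
}
```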

Personally, I like the idea of generalizing the input/output alphabets, but that would seem to require a superclass holding the general implementation, with an invertible AlphabetConverter as an extension of it.
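The superclass/extension split could look roughly like this. The names `StringMorphism` and `InvertibleStringMorphism` are hypothetical, and the sketch assumes single-char source symbols. The base class only encodes; the subclass accepts only non-empty, prefix-free images, which is one sufficient (not necessary) condition for unambiguous left-to-right decoding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: maps each source char to an image string; encode only.
class StringMorphism {
    protected final Map<Character, String> images;

    StringMorphism(Map<Character, String> images) {
        this.images = new HashMap<>(images);
    }

    // f(x1...xn) = h(x1)...h(xn), so f(xy) = f(x)f(y) by construction.
    String encode(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            String image = images.get(c);
            if (image == null) {
                throw new IllegalArgumentException("No image for '" + c + "'");
            }
            sb.append(image);
        }
        return sb.toString();
    }
}

// Hypothetical extension: only accepts images this simple decoder can invert.
class InvertibleStringMorphism extends StringMorphism {
    InvertibleStringMorphism(Map<Character, String> images) {
        super(images);
        for (Map.Entry<Character, String> a : this.images.entrySet()) {
            if (a.getValue().isEmpty()) {
                throw new IllegalArgumentException("Empty image for '" + a.getKey() + "'");
            }
            for (Map.Entry<Character, String> b : this.images.entrySet()) {
                // Reject duplicates and prefixes, which left-to-right matching cannot handle.
                if (!a.getKey().equals(b.getKey()) && b.getValue().startsWith(a.getValue())) {
                    throw new IllegalArgumentException("Images are not prefix-free");
                }
            }
        }
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        outer:
        while (i < encoded.length()) {
            // Prefix-freeness guarantees at most one image matches at position i.
            for (Map.Entry<Character, String> e : images.entrySet()) {
                if (encoded.startsWith(e.getValue(), i)) {
                    sb.append(e.getKey().charValue());
                    i += e.getValue().length();
                    continue outer;
                }
            }
            throw new IllegalArgumentException("Cannot decode at index " + i);
        }
        return sb.toString();
    }
}
```

A non-invertible mapping such as b to e, d to e can still be used with the base class for encoding only, while the extension rejects it up front.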

All that said, I’m not particularly tied to any of the ideas, and aside from the stylistic changes and the .equals bit, the changes seem quite reasonable. I would love to hear other folks’ thoughts on the proposed functionality.
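For the `.equals` remark, a common idiom is sketched below on a hypothetical `ConverterState` class whose fields stand in for the converter's internal state. `instanceof` evaluates to false for `null`, so it doubles as the null check; `hashCode` is overridden alongside `equals` to keep the two consistent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a converter's internal state; field names are illustrative.
final class ConverterState {
    private final Map<Integer, String> encodeMap;
    private final int encodedLetterLength;

    ConverterState(Map<Integer, String> encodeMap, int encodedLetterLength) {
        this.encodeMap = new HashMap<>(encodeMap); // defensive copy
        this.encodedLetterLength = encodedLetterLength;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        // instanceof is false for null, so this is also the null check.
        if (!(obj instanceof ConverterState)) {
            return false;
        }
        ConverterState other = (ConverterState) obj;
        return encodedLetterLength == other.encodedLetterLength
                && Objects.equals(encodeMap, other.encodeMap);
    }

    @Override
    public int hashCode() {
        return Objects.hash(encodeMap, encodedLetterLength);
    }
}
```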

Cheers,
-Rob

Biblio.
[1] Jean-Paul Allouche and Jeffrey Shallit. Automatic sequences: theory, applications, and generalizations. Cambridge University Press, Cambridge, 2003.

[2] https://en.wikipedia.org/wiki/Free_monoid#Morphisms


Re: [LANG] Add alphabet conversion API

Benedikt Ritter-4
Does this really belong in [LANG]? We also have Commons Text [1] in the
sandbox, which seems like a better home for this functionality.

Benedikt

[1] http://commons.apache.org/sandbox/commons-text/

Rob Tompkins <[hidden email]> wrote on Tue, 13 Sep 2016 at 15:48:

> [...]

Re: [LANG] Add alphabet conversion API

Bruno P. Kinoshita-3
+1
Bruno

 
From: Benedikt Ritter <[hidden email]>
To: Commons Developers List <[hidden email]>
Sent: Wednesday, 14 September 2016 2:06 AM
Subject: Re: [LANG] Add alphabet conversion API
> [...]

Re: [LANG] Add alphabet conversion API

Eyal Allweil
In reply to this post by Rob Tompkins
Hi Benedikt, Bruno,
I can see that commons-text might be a better home for this functionality than commons-lang. But is the project still active? There haven't been any commits in the past year, and there seem to be no Maven binaries available (http://stackoverflow.com/questions/36374510/maven-dependency-to-apache-commons-text). (That said, I don't mind helping to revive it.)

Rob, thank you for your comments! A quick look already has me agreeing with most of them. I'll play around with them on my fork as soon as I have time and push my changes so they're visible.
Thanks,
Eyal



Re: [LANG] Add alphabet conversion API

Benedikt Ritter-4
Hello Eyal,

Commons Text started as an experiment, but we never pushed out a release. If
there is renewed interest, we can get things started again.

Regards,
Benedikt

Eyal Allweil <[hidden email]> wrote on Thu, 15 Sep 2016 at 12:34:

> [...]

Re: [LANG] Add alphabet conversion API

garydgregory
There is already some creep of text-related code into [lang],
like org.apache.commons.lang3.text.WordUtils.

I'd rather see a [text] component come alive.

Gary

On Thu, Sep 15, 2016 at 4:22 AM, Benedikt Ritter <[hidden email]> wrote:

> Hello Eyal,
>
> Commons Text started as an experiment but we never pushed out a release. If
> there is interest again, we can get things started again.
>
> Regards,
> Benedikt
>
> Eyal Allweil <[hidden email]> schrieb am Do., 15. Sep.
> 2016
> um 12:34 Uhr:
>
> > Hi Benedict, Bruno -
> > I can see that commons-text might be a better home for this functionality
> > than commons-lang. But is the project still active? There haven't been
> any
> > commits in the past year, and there seem to be no maven binaries
> available (
> > http://stackoverflow.com/questions/36374510/maven-
> dependency-to-apache-commons-text).
> > (that said, I don't mind helping to push it back into activity)
> >
> > Rob - thank you for your comments! A quick look already has me in
> > agreement with most - I'll play around with it on my fork as soon as I
> have
> > time and push my changes so they're visible.
> > Thanks,Eyal
> >
> >
> >
>



--
E-Mail: [hidden email] | [hidden email]
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: [LANG] Add alphabet conversion API

Eyal Allweil
OK, I'm convinced. Should I cancel my existing pull request, close the existing JIRA issue (LANG-1266), and open a new JIRA issue? Do we need a new thread on the mailing list for this, or is working with comments in JIRA enough?
Regards,
Eyal
 

On Thursday, September 15, 2016 8:37 PM, Gary Gregory <[hidden email]> wrote:
> [...]

Re: [LANG] Add alphabet conversion API

Benedikt Ritter-4
Hello Eyal,

Yes, you should close the PR against Commons Lang and create a new one for
Commons Text [1]. I think I can move the issue to the Commons Text project,
so you don't need to create a new one. General design discussions are
usually held on the mailing list, but details around a specific issue can
also be documented in JIRA alone.

Regards,
Benedikt

[1] https://github.com/apache/commons-text

Eyal Allweil <[hidden email]> wrote on Sun, 18 Sep 2016 at 13:27:

> [...]