[jira] Created: (LANG-510) Convert StringUtils API to take CharSequence

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
Convert StringUtils API to take CharSequence
--------------------------------------------

                 Key: LANG-510
                 URL: https://issues.apache.org/jira/browse/LANG-510
             Project: Commons Lang
          Issue Type: Improvement
            Reporter: Henri Yandell
             Fix For: 3.0


Wherever possible, use CharSequence and not String in the StringUtils API.

substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752237#action_12752237 ]

Henri Yandell commented on LANG-510:
------------------------------------

Said 'merely' walking method needs to match JVM speed though before we'd want the rest of the codebase to be dependent on it. Which I doubt we'd want to do.

Here is the Harmony String.java implementation:

http://svn.apache.org/repos/asf/harmony/enhanced/classlib/trunk/modules/luni/src/main/java/java/lang/String.java

It uses both direct private variable access on the subString and it works at the array level without having to incur charAt method calls.

Various naive implementations of unoptimized that I just hacked together only reach the 3x speed (and no idea if they're actually accurate).  So I think that's it for the idea of moving the parts of StringUtils that rely on indexOf over to CharSequence.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752238#action_12752238 ]

Henri Yandell commented on LANG-510:
------------------------------------

Though...

Could have an indexOf method for CharSequence that IF both string and substring are String, then it casts to String. That has a pretty minimal cost. Equally could add optimized code for StrBuilder and StringBuffer too.

That's tempting.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752241#action_12752241 ]

Henri Yandell commented on LANG-510:
------------------------------------

Other notes:

1) Underlying String/StrBuilder/StringBuffer indexOf methods all take String as the delimiter. So it would be very tempting to generate a String internally on the delimiter.
2) Boyer Moore implementation might be a useful feature in text.* or in Codec.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830664#action_12830664 ]

Henri Yandell commented on LANG-510:
------------------------------------

Alternatively - for now - we could merely call toString on the CharSequence class if it's not a String.

Basically making the return str.indexOf(...) into:

{code:java}
        if (str instanceof String) {
            return ((String) str).indexOf(searchChar);
        } else {
            return str.toString().indexOf(searchChar);
        }
{code}

Cost to existing String users is an instanceof check and a cast.

We could then later implement a String indexOf based on charAt and subSequence. Or possibly just charAt if we feel subSequence is expensive.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-510:
-------------------------------

    Comment: was deleted

(was: Alternatively - for now - we could merely call toString on the CharSequence class if it's not a String.

Basically making the return str.indexOf(...) into:

{code:java}
        if (str instanceof String) {
            return ((String) str).indexOf(searchChar);
        } else {
            return str.toString().indexOf(searchChar);
        }
{code}

Cost to existing String users is an instanceof check and a cast.

We could then later implement a String indexOf based on charAt and subSequence. Or possibly just charAt if we feel subSequence is expensive.)

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-510:
-------------------------------

    Attachment: LANG-510-indexOf.patch

Very tempting to just add the CharSequence API, and have it call toString() under the hood if not a String.

Such a patch is attached.

The value in moving indexOf over to using CharSequence is less in its own value, than in it will allow us to move many methods over. We get StringBufferUtils for the cost of moving various methods over to using StringUtils.indexOf with pairs of instanceof checks and casts.

I considered having the code invoke toString() instead of the instanceof/cast for the moment. They're of general equal cost - the former slightly beats the latter in my performance tests but the latter is better placed to have us implement our own CharSequence indexOf (which we need to do at some point).

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830673#action_12830673 ]

Henri Yandell commented on LANG-510:
------------------------------------

Methods that could move to CharSequence if we had an indexOf:

stripStart
stripEnd
ordinalIndexOf; though this requires a lastIndexOf(CharSequence also being added)
contains
indexOfAny(String, String[])
lastIndexOfAny(String, String[]); with said method above being added
substringBefore/After etc
possible all the split code

Can be moved over already:

substring - with a slight slow down for the single int method as subSequence will force it to calculate the length (not actually a calculation Harmony directly repeats)
many of the indexOfAny methods
left, right, mid

.... I got tired of listing things :) Still a lot that can move over without the indexOf change, and even more that can with that change.

[In fact I'm cursing the lack of indexOf on CharSequence]

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830676#action_12830676 ]

Henri Yandell commented on LANG-510:
------------------------------------

Note for future changes - don't change javadoc to say 'CharSequence'. I think that makes things much harder to read. Nicer to keep it with String, yet have the magic of StringBuffer, StrBuilder and StringBuilder working as well.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830677#action_12830677 ]

Henri Yandell commented on LANG-510:
------------------------------------

Played with changing left/right/mid. Issue here, same substring will have, is that it changes the method from String left(String) to CharSequence left(CharSequence) due to the reliance on subSequence.

That means using code is going to have to start casting. Ugh. That will be a general issue for any String manipulation as opposed to query API. For these methods we are going to need to do overloads:

{code:java}
    String doFoo(String)
   
    CharSequence doFoo(CharSequence)
{code}

Smells likely to be copy and paste code. Ugh again.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830677#action_12830677 ]

Henri Yandell edited comment on LANG-510 at 2/7/10 7:25 AM:
------------------------------------------------------------

Played with changing left/right/mid. Issue here, same substring will have, is that it changes the method from String left(String) to CharSequence left(CharSequence) due to the reliance on subSequence.

That means using code is going to have to start casting. Ugh. That will be a general issue for any String manipulation as opposed to query API. For these methods we are going to need to do overloads:

{code:java}
    String doFoo(String)
   
    CharSequence doFoo(CharSequence)
{code}

Smells likely to be copy and paste code. Ugh again.

UPDATE: Must be late... this is generics territory :)

      was (Author: bayard):
    Played with changing left/right/mid. Issue here, same substring will have, is that it changes the method from String left(String) to CharSequence left(CharSequence) due to the reliance on subSequence.

That means using code is going to have to start casting. Ugh. That will be a general issue for any String manipulation as opposed to query API. For these methods we are going to need to do overloads:

{code:java}
    String doFoo(String)
   
    CharSequence doFoo(CharSequence)
{code}

Smells likely to be copy and paste code. Ugh again.
 

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Issue Comment Edited: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830677#action_12830677 ]

Henri Yandell edited comment on LANG-510 at 2/7/10 7:31 AM:
------------------------------------------------------------

Played with changing left/right/mid. Issue here, same substring will have, is that it changes the method from String left(String) to CharSequence left(CharSequence) due to the reliance on subSequence.

That means using code is going to have to start casting. Ugh. That will be a general issue for any String manipulation as opposed to query API. For these methods we are going to need to do overloads:

{code:java}
    String doFoo(String)
   
    CharSequence doFoo(CharSequence)
{code}

Smells likely to be copy and paste code. Ugh again.

UPDATE: Must be late... this is generics territory :)
UPDATE2: Too late for brain today apparently. Generics get hurt by EMPTY being a String and subSequence returning a CharSequence. On the plus side - I did identify that copy and paste is not needed in the overloads.

      was (Author: bayard):
    Played with changing left/right/mid. Issue here, same substring will have, is that it changes the method from String left(String) to CharSequence left(CharSequence) due to the reliance on subSequence.

That means using code is going to have to start casting. Ugh. That will be a general issue for any String manipulation as opposed to query API. For these methods we are going to need to do overloads:

{code:java}
    String doFoo(String)
   
    CharSequence doFoo(CharSequence)
{code}

Smells likely to be copy and paste code. Ugh again.

UPDATE: Must be late... this is generics territory :)
 

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842220#action_12842220 ]

Henri Yandell commented on LANG-510:
------------------------------------

lastIndexOf method can be CharSequence'd. Same way as indexOf for now - use toString on non-Strings to get a String and use its API. Later rewrite with a charAt algorithm.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Closed: (LANG-510) Convert StringUtils API to take CharSequence

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LANG-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-510.
------------------------------

    Resolution: Fixed

Closing this out.

I'd love to move the indexOf based methods over, but it would mean a notable performance drop.

> Convert StringUtils API to take CharSequence
> --------------------------------------------
>
>                 Key: LANG-510
>                 URL: https://issues.apache.org/jira/browse/LANG-510
>             Project: Commons Lang
>          Issue Type: Improvement
>          Components: lang.*
>            Reporter: Henri Yandell
>             Fix For: 3.0
>
>         Attachments: LANG-510-indexOf.patch
>
>
> Wherever possible, use CharSequence and not String in the StringUtils API.
> substring can go to subSequence inside code. The general substring method can be replaced with a subSequence method and substring deprecated(?). One question is whether to implement a CharSequence.indexOf type method. Given that it's merely a walking method(?!?), this might be quite handy and would allow a bunch of StringUtils methods to move over.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.