CharSequence vs. String (was Re: [GitHub] commons-text pull request #46: TEXT-85:Added CaseUtils class with camel case...)

Simon Spero
On Jun 12, 2017 10:47 AM, "arunvinudss" <[hidden email]> wrote:

Github user arunvinudss commented on a diff in the pull request:

    I am a bit biased towards using String instead of CharSequence. Yes,
CharSequence allows us to pass StringBuffers, StringBuilders and other types
as input, potentially increasing the scope of the function, but considering
the nature of the work we do in this particular method it may not necessarily
be a good idea. My basic contention is that the minute we call toString()
on a CharSequence to do any sort of manipulation, it becomes a costly
operation and we may lose performance.


True only if the particular CharSequence is not in fact an instance of
String. String::toString returns this, so for a String the call makes no
copy.
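
A minimal sketch of that point, in case it helps. The class and method names
(ToStringIdentityDemo, toStringIsFree) are made up for illustration and are
not part of commons-text or of the pull request:

```java
final class ToStringIdentityDemo {
    /** True when toString() hands back the very same object, i.e. no copy was made. */
    static boolean toStringIsFree(CharSequence cs) {
        return cs.toString() == cs;
    }

    public static void main(String[] args) {
        String s = "camelCase";
        StringBuilder sb = new StringBuilder("camelCase");
        System.out.println(toStringIsFree(s));  // true: String.toString() returns this
        System.out.println(toStringIsFree(sb)); // false: StringBuilder.toString() builds a new String
    }
}
```

So the cost being worried about only shows up for callers who pass something
other than a String.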

The bigger problem is that too many methods use String as a parameter or
return type when CharSequence would serve just as well. That is what forces
callers holding some other CharSequence to invoke Object::toString.

For methods that use String as the return type, changing the result to
CharSequence is source and binary incompatible, and properly so (since at
some point the user may actually need a String).

A generic method whose type parameter is bounded by CharSequence (T extends
CharSequence) can sometimes be useful, and can be added alongside the
methods taking String arguments, but can't replace them.
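
Something like the following is what I have in mind. It is only a sketch:
CaseSketch and firstNonEmpty are hypothetical names, not anything in
commons-text. The String method stays so that already-compiled callers keep
linking, and the generic overload is added next to it:

```java
final class CaseSketch {
    /** Existing String-based entry point, kept so already-compiled callers still link. */
    static String firstNonEmpty(String a, String b) {
        return a.isEmpty() ? b : a;
    }

    /** Added overload: takes any CharSequence and returns the caller's own type, with no toString(). */
    static <T extends CharSequence> T firstNonEmpty(T a, T b) {
        return a.length() == 0 ? b : a;
    }

    public static void main(String[] args) {
        // String arguments resolve to the more specific String overload.
        String s = firstNonEmpty("", "text");
        // Other CharSequence types bind T and come back as the same type.
        StringBuilder sb = firstNonEmpty(new StringBuilder(), new StringBuilder("text"));
        System.out.println(s + " " + sb);
    }
}
```

The two methods have different erasures, (String, String) and (CharSequence,
CharSequence), so they can coexist in one class.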

There are some places in javac that have special treatment for String, for
example the + operator, but JDK 9 reduces that particular win by indyfying
concat.

If a method doesn't intrinsically require a String, then I prefer
CharSequence. It's probable that sooner or later something is going to
demand a String, but that's not a good reason to be "that guy" :-)

Note:
Strings can be an incredible waste of memory: roughly 40 + 8·⌈length/4⌉
bytes on a 64-bit JVM with compressed oops (reduced to a mere
40 + 8·⌈length/8⌉ bytes in JDK 9 when compact strings can be used).
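
A rough sketch of the arithmetic behind those figures. The layout constants
are my assumptions for a 64-bit HotSpot JVM with compressed oops and 8-byte
object alignment (so each ceiling term counts 8-byte words of character
data); no specification guarantees them, and StringFootprint is a made-up
name:

```java
final class StringFootprint {
    // Assumed layout: 64-bit HotSpot, compressed oops, 8-byte alignment.
    static final int STRING_OBJECT_BYTES = 24; // object header + array reference + cached hash, padded
    static final int ARRAY_HEADER_BYTES = 16;  // object header + array length field

    /** Rounds n up to the next multiple of 8. */
    static long align8(long n) {
        return (n + 7) / 8 * 8;
    }

    /** Before JDK 9: backing char[] with two bytes per char. */
    static long utf16Bytes(int length) {
        return STRING_OBJECT_BYTES + align8(ARRAY_HEADER_BYTES + 2L * length);
    }

    /** JDK 9 compact string holding only Latin-1 content: backing byte[] with one byte per char. */
    static long compactBytes(int length) {
        return STRING_OBJECT_BYTES + align8(ARRAY_HEADER_BYTES + (long) length);
    }

    public static void main(String[] args) {
        System.out.println(utf16Bytes(5));   // 56
        System.out.println(compactBytes(5)); // 48
    }
}
```

Under these assumptions even an empty string costs 40 bytes, which is why
huge numbers of tiny strings hurt so much.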

This is incredibly painful if you have a vast number of small "strings",
which may not all need to be materialized simultaneously. See e.g. [1]
(~50MiB of UTF-8 chars becomes ~250MiB of Strings. And since there's no
individual humongous object, they all get to make the journey from TLAB to
Old Space the hard way. Note this predates JDK 9, but illustrates some of
the win from compact strings.)

Storing the character data in a shared byte array is a huge win. Someone
should tell the jdk implementors to look at applications that do this.
Like, um, javac :-)

Materializing these strings as possibly transient CharSequences is
really convenient... until some method just has to have a String.
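
To make that concrete, here is a minimal sketch of a CharSequence that is
just a window onto a shared byte array. SharedBytesSequence is a
hypothetical class, and to keep charAt trivial I assume the bytes are
ISO-8859-1 (one byte per char), not UTF-8 as in [1]:

```java
final class SharedBytesSequence implements CharSequence {
    private final byte[] pool; // shared by many sequences, never copied here
    private final int offset;
    private final int length;

    SharedBytesSequence(byte[] pool, int offset, int length) {
        if (offset < 0 || length < 0 || offset > pool.length - length) {
            throw new IndexOutOfBoundsException();
        }
        this.pool = pool;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
        return (char) (pool[offset + index] & 0xFF); // assumed Latin-1: one byte per char
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        return new SharedBytesSequence(pool, offset + start, end - start); // still no copy
    }

    /** The only place a real String (and its own backing array) gets allocated. */
    @Override
    public String toString() {
        return new String(pool, offset, length, java.nio.charset.StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] pool = "camelCasesnake_case".getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
        CharSequence first = new SharedBytesSequence(pool, 0, 9);
        CharSequence second = new SharedBytesSequence(pool, 9, 10);
        System.out.println(first.length() + " " + second.charAt(0));
        System.out.println(second); // println(Object) ends up calling toString()
    }
}
```

Each view costs one small object; the character data is paid for only once,
until a String-only method forces the toString() call.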

Also, wouldn't some sort of low-space-overhead string storage be a good
fit for commons-text?

Simon
[1] Spero, S. (2015). Time And Relative Dimensions In Semantics: Is OWL
Bigger On The Inside? OWLED 2015. Available at
http://cgi.csc.liv.ac.uk/~valli/OWLED2015/OWLED_2015_paper_12.pdf