[lang3] StringUtils does not handle supplementary characters correctly

Jason Pickens
Hi,

I was wondering whether StringUtils is meant to handle Unicode
supplementary characters correctly.

For example, org.apache.commons.lang3.StringUtils#isAlphanumeric returns
false for a string containing code point 65536 (U+10000), which is actually
a letter. This is because it uses java.lang.CharSequence#charAt rather
than java.lang.CharSequence#codePoints. For a supplementary code point,
charAt returns the individual UTF-16 surrogate code units (the high
surrogate, then the low surrogate) rather than the code point itself, and
a surrogate on its own is neither a letter nor a digit.
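
To illustrate the difference, here is a minimal standalone sketch. It does
not use Commons Lang; the class SupplementaryDemo and both helper methods
are hypothetical names of my own, and the charAt-based helper only mimics
the per-code-unit behaviour described above rather than reproducing the
library's source.

```java
public class SupplementaryDemo {

    // Hypothetical helper: tests each UTF-16 code unit via charAt,
    // which is the approach described in this message.
    public static boolean isAlphanumericByChar(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            // Character.isLetterOrDigit(char) sees a lone surrogate here
            if (!Character.isLetterOrDigit(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: tests whole code points via codePoints(),
    // so a surrogate pair is classified as one character.
    public static boolean isAlphanumericByCodePoint(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        return cs.codePoints().allMatch(cp -> Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        // Code point 65536 (U+10000) is encoded as two UTF-16 code units.
        String s = new String(Character.toChars(65536));
        System.out.println("length in code units: " + s.length());
        System.out.println("by charAt:     " + isAlphanumericByChar(s));
        System.out.println("by codePoints: " + isAlphanumericByCodePoint(s));
    }
}
```

With this sketch I would expect the charAt variant to reject the string and
the codePoints variant to accept it, which matches what I see from
StringUtils#isAlphanumeric.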


Cheers,

Jason