[jira] [Created] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
StringEscapeUtils.escapeXml(str) does not support supplemental characters.
--------------------------------------------------------------------------

                 Key: LANG-728
                 URL: https://issues.apache.org/jira/browse/LANG-728
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.6
            Reporter: Taro Yabuki
            Priority: Minor


Hello.

StringEscapeUtils.escapeXml(str) escapes each Unicode character greater than 0x7f to its numerical ("\\u") equivalent, so a surrogate pair comes out as two separate escapes:

String str = StringEscapeUtils.escapeXml("\uD84C\uDFB4");
System.out.println(str);
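// prints &#55372;&#57268; (one numeric entity per surrogate)

But the output should be &#144308; (a single numeric entity for U+233B4, 𣎴).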

According to the W3C document "Using character escapes in markup and CSS," we must use the single code point value for a supplementary character.
http://www.w3.org/International/questions/qa-escapes

In fact, &#55372;&#57268; is not rendered correctly in some web browsers, e.g., Firefox 5.0 and Chrome 12.0.
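
To make the difference concrete, here is a minimal stdlib-only sketch of the two behaviours described in this report. It is not the Commons Lang implementation: the class and method names are hypothetical, and it only handles characters above 0x7f (it skips the named XML entities such as &amp;). The first method escapes one UTF-16 char at a time, which splits a surrogate pair; the second walks the string by code point, which yields the single escape that the W3C document asks for.

```java
final class SupplementaryEscapeDemo {

    // Hypothetical helper: escapes each UTF-16 char above 0x7f on its own,
    // so a surrogate pair becomes two numeric entities (the reported behaviour).
    static String escapePerChar(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > 0x7f) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: walks the string by code point, so a surrogate
    // pair becomes one numeric entity (the expected behaviour).
    static String escapePerCodePoint(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            if (cp > 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 2 chars for a supplementary code point, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "\uD84C\uDFB4";
        System.out.println(escapePerChar(s));      // &#55372;&#57268;
        System.out.println(escapePerCodePoint(s)); // &#144308;
    }
}
```

The only real difference is String.codePointAt plus Character.charCount in place of charAt and a fixed step of one.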


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taro Yabuki updated LANG-728:
-----------------------------

    Attachment: lang_2_6_escapexml_20110716.diff

Test code and patch for org/apache/commons/lang/Entities.java.

> StringEscapeUtils.escapeXml(str) does not support supplemental characters.
> --------------------------------------------------------------------------
>
>                 Key: LANG-728
>                 URL: https://issues.apache.org/jira/browse/LANG-728
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.6
>            Reporter: Taro Yabuki
>            Priority: Minor
>              Labels: patch
>         Attachments: lang_2_6_escapexml_20110716.diff

[jira] [Updated] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-728:
-------------------------------

    Fix Version/s: 3.0.1


[jira] [Commented] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

    [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067492#comment-13067492 ]

Henri Yandell commented on LANG-728:
------------------------------------

The API has changed in Lang 3.0; however the issue remains. A failing test (with @Ignore) has been added to StringEscapeUtilsTest. Need to resolve this in 3.0.1.


[jira] [Commented] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

    [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067494#comment-13067494 ]

Henri Yandell commented on LANG-728:
------------------------------------

Happiness. This is fixed in 3.0 already! :)

It wasn't clear because escapeXml no longer escapes characters above 0x7f; you have to do a bit more work to get that. Here is the code you would need:

{code:java}
        CharSequenceTranslator escapeXml =
            StringEscapeUtils.ESCAPE_XML.with( UnicodeEscaper.between(0x7f, Integer.MAX_VALUE) );

        assertEquals("Supplementary character must be represented using a single escape", "\u233B4",
                escapeXml.translate("\uD84C\uDFB4"));
{code}

Also note the need to use a unicode escape and not a numeric entity in Java.

This has been added to the unit tests run each time. Marking this as Fixed in 3.0.


[jira] [Closed] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-728.
------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.0.1)
                   3.0


[jira] [Reopened] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell reopened LANG-728:
--------------------------------


Reopening as I explained things badly.


[jira] [Updated] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-728:
-------------------------------

    Fix Version/s:     (was: 3.0)
                   3.0.1


[jira] [Commented] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

    [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067501#comment-13067501 ]

Henri Yandell commented on LANG-728:
------------------------------------

I used the wrong translator :) Code should be:

{code:java}
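        // Needs org.apache.commons.lang3.StringEscapeUtils plus CharSequenceTranslator and
        // NumericEntityEscaper from org.apache.commons.lang3.text.translate.
        // Combine the standard XML escapes with numeric entities for everything from 0x7f up.
        CharSequenceTranslator escapeXml =
            StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );

        // The surrogate pair must come out as one entity for code point 144308 (0x233B4).
        assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
                escapeXml.translate("\uD84C\uDFB4"));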
{code}

i.e., ignore the 'Also note the need to use a unicode escape' remark in my earlier comment.
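
For readers without Commons Lang on the classpath, the following is a rough stdlib-only approximation of what the combined translator in the snippet above should produce. It is a sketch under assumptions, not the library's code: the class and method names are hypothetical, and it assumes that at each position the named XML entities are tried first, with the numeric escape (for code points from 0x7f up, matching the between(0x7f, Integer.MAX_VALUE) range) as the fallback.

```java
final class XmlEscapeSketch {

    // Hypothetical helper: the named entity for the five XML special
    // characters, or null when the code point is not one of them.
    static String namedEntity(int cp) {
        switch (cp) {
            case '&':  return "&amp;";
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            case '"':  return "&quot;";
            case '\'': return "&apos;";
            default:   return null;
        }
    }

    // At each position: named entity first, then a numeric entity for code
    // points from 0x7f up, otherwise the character is copied unchanged.
    static String escape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            String named = namedEntity(cp);
            if (named != null) {
                out.append(named);
            } else if (cp >= 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<\uD84C\uDFB4&b")); // a&lt;&#144308;&amp;b
    }
}
```

Each input position is handled exactly once, so the ampersand that starts a generated numeric entity is never escaped a second time.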


[jira] [Closed] (LANG-728) StringEscapeUtils.escapeXml(str) does not support supplemental characters.

Sebb (Jira)
In reply to this post by Sebb (Jira)

     [ https://issues.apache.org/jira/browse/LANG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-728.
------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.0.1)
                   3.0

Closing again as 'fixed in 3.0'.
