[jira] [Created] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)
StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")
------------------------------------------------------------------

                 Key: LANG-710
                 URL: https://issues.apache.org/jira/browse/LANG-710
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.0
         Environment: java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

            Reporter: Benjamin Valentin
            Priority: Minor


When calling unescapeHtml4() on the String "&#03" (or on any String that contains these characters), an exception is thrown:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 4
        at java.lang.String.charAt(String.java:686)
        at org.apache.commons.lang3.text.translate.NumericEntityUnescaper.translate(NumericEntityUnescaper.java:49)
        at org.apache.commons.lang3.text.translate.AggregateTranslator.translate(AggregateTranslator.java:53)
        at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:88)
        at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:60)
        at org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(StringEscapeUtils.java:351)
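A minimal sketch of how an unbounded scan for the closing ';' can produce a trace like this one. This is a hypothetical, simplified, decimal-only parser written for illustration, not the actual NumericEntityUnescaper source, and the class name is made up. For "&#03" the digit scan starts at index 2, passes the last valid index (3), and charAt(4) throws, which matches "String index out of range: 4".

```java
// Hypothetical, simplified reproduction of the LANG-710 failure mode.
// This is NOT the Commons Lang source; it only illustrates an unbounded scan.
class NaiveNumericEntity {

    /** Unescapes decimal entities like "&#65;" but never checks the string length. */
    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) == '&' && input.charAt(i + 1) == '#') {
                int start = i + 2;
                int end = start;
                // BUG: assumes a ';' always follows, so "&#03" runs off the end.
                while (input.charAt(end) != ';') {
                    end++;
                }
                out.appendCodePoint(Integer.parseInt(input.substring(start, end)));
                i = end + 1; // skip past the ';'
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("&#65;")); // a terminated entity decodes normally
        try {
            unescape("&#03"); // unterminated: the scan reads past the end
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```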

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
[jira] [Commented] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

    [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058800#comment-13058800 ]

Benjamin Valentin commented on LANG-710:
----------------------------------------

Any "&" followed by an invalid escape sequence, or just a solitary "&", will have the same effect.
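A lone "&" at the end of the input can fail for a related reason: a test such as input.charAt(index + 1) == '#' reads one character past the end. The sketch below assumes, hypothetically, that the unescaper peeks at the next character this way; the class and method names are illustrative only. The safe version checks the length before reading.

```java
// Hypothetical helper illustrating a bounds-checked lookahead for "&#".
class EntityLookahead {

    /** Unsafe: throws when '&' is the last character of the input. */
    static boolean startsEntityUnchecked(CharSequence input, int index) {
        return input.charAt(index) == '&' && input.charAt(index + 1) == '#';
    }

    /** Safe: confirms a character exists after '&' before reading it. */
    static boolean startsEntity(CharSequence input, int index) {
        return index + 1 < input.length()
                && input.charAt(index) == '&'
                && input.charAt(index + 1) == '#';
    }

    public static void main(String[] args) {
        System.out.println(startsEntity("&", 0));      // lone '&': no exception
        System.out.println(startsEntity("a&#65;", 1)); // real entity start
        try {
            startsEntityUnchecked("&", 0);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unchecked lookahead failed: " + e);
        }
    }
}
```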

> StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")
> ------------------------------------------------------------------
>
>                 Key: LANG-710
>                 URL: https://issues.apache.org/jira/browse/LANG-710
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 3.0
>         Environment: java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>            Reporter: Benjamin Valentin
>            Priority: Minor
>              Labels: StringEscapeUtils, StringUtils
>
> When calling unescapeHtml4() on the String "&#03" (or any String that contains these characters) an Exception is thrown:
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 4
> at java.lang.String.charAt(String.java:686)
> at org.apache.commons.lang3.text.translate.NumericEntityUnescaper.translate(NumericEntityUnescaper.java:49)
> at org.apache.commons.lang3.text.translate.AggregateTranslator.translate(AggregateTranslator.java:53)
> at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:88)
> at org.apache.commons.lang3.text.translate.CharSequenceTranslator.translate(CharSequenceTranslator.java:60)
> at org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(StringEscapeUtils.java:351)

[jira] [Updated] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

     [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-710:
-------------------------------

    Fix Version/s: 3.x

[jira] [Closed] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

     [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-710.
------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 3.x)
                   3.0

Thanks Benjamin.

I've fixed this in trunk; I'd appreciate it if you could confirm that it works for you.

svn ci -m "Adding tests and resolving LANG-710, reported by Benjamin Valentin. Note that this changed such that the code will now escape an unfinished entity (i.e. &#030). This matches browser behaviour. "
Sending        src/main/java/org/apache/commons/lang3/text/translate/NumericEntityUnescaper.java
Sending        src/test/java/org/apache/commons/lang3/text/translate/NumericEntityUnescaperTest.java
Transmitting file data ..
Committed revision 1142389.
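This first fix made the unescaper accept an unfinished entity such as "&#030", treating the end of the input as the terminator, the way browsers do. A rough sketch of that lenient behaviour, assuming decimal entities only (hex forms such as "&#x41;" are omitted); the class is hypothetical and the actual patch may differ.

```java
// Hypothetical sketch of lenient, browser-like numeric entity unescaping.
// Decimal entities only; a missing ';' is tolerated.
class LenientNumericEntity {

    static String unescape(String input) {
        StringBuilder out = new StringBuilder();
        int len = input.length();
        int i = 0;
        while (i < len) {
            char c = input.charAt(i);
            if (c == '&' && i + 1 < len && input.charAt(i + 1) == '#') {
                int start = i + 2;
                int end = start;
                // Consume ASCII digits, never reading past the end of the input.
                while (end < len && input.charAt(end) >= '0' && input.charAt(end) <= '9') {
                    end++;
                }
                int codePoint = toCodePoint(input.substring(start, end));
                if (codePoint < 0) {
                    // No digits or an out-of-range value: copy the '&' literally.
                    out.append(c);
                    i++;
                    continue;
                }
                out.appendCodePoint(codePoint);
                // Consume the ';' only if it is actually present.
                i = (end < len && input.charAt(end) == ';') ? end + 1 : end;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Returns the parsed code point, or -1 if the digits are empty or invalid. */
    private static int toCodePoint(String digits) {
        try {
            int value = Integer.parseInt(digits);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1; // empty string or too many digits for an int
        }
    }

    public static void main(String[] args) {
        System.out.println(unescape("&#65"));      // unterminated, still decoded
        System.out.println(unescape("x &#65; y")); // terminated entity
    }
}
```

The trade-off is the one debated in the following comments: nothing malformed ever reaches the caller as an error.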



[jira] [Commented] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

    [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059340#comment-13059340 ]

Joerg Schaible commented on LANG-710:
-------------------------------------

I am not sure it is really a good idea to accept this input silently. In the end, it is an HTML syntax error. While an SIOOBE is not really helpful, a java.text.ParseException seems more appropriate to me. WDYT?
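A minimal sketch of the strict alternative: an unterminated entity is reported as a java.text.ParseException whose error offset points at the offending "&". Because ParseException is checked, the method must declare it, which is the API cost raised later in this thread. The class is hypothetical and handles decimal entities only.

```java
import java.text.ParseException;

// Hypothetical strict unescaper: a numeric entity without a closing ';'
// is reported as a checked ParseException. Decimal entities only.
class StrictNumericEntity {

    static String unescape(String input) throws ParseException {
        StringBuilder out = new StringBuilder();
        int len = input.length();
        int i = 0;
        while (i < len) {
            char c = input.charAt(i);
            if (c != '&' || i + 1 >= len || input.charAt(i + 1) != '#') {
                out.append(c);
                i++;
                continue;
            }
            int end = i + 2;
            while (end < len && input.charAt(end) >= '0' && input.charAt(end) <= '9') {
                end++;
            }
            if (end == i + 2 || end == len || input.charAt(end) != ';') {
                // The error offset points at the '&' that opened the bad entity.
                throw new ParseException("Malformed numeric entity", i);
            }
            int value;
            try {
                value = Integer.parseInt(input.substring(i + 2, end));
            } catch (NumberFormatException e) {
                value = -1; // more digits than fit in an int
            }
            if (!Character.isValidCodePoint(value)) {
                throw new ParseException("Numeric entity out of range", i);
            }
            out.appendCodePoint(value);
            i = end + 1; // skip past the ';'
        }
        return out.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println(unescape("&#65;"));
            unescape("&#03");
        } catch (ParseException e) {
            System.out.println(e.getMessage() + " at offset " + e.getErrorOffset());
        }
    }
}
```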

[jira] [Commented] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

    [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059451#comment-13059451 ]

Gary D. Gregory commented on LANG-710:
--------------------------------------

Ignoring garbage input seems like trouble to me too; it can only make things worse. Just imagine: why would one method accept garbage input and another not? IMO, garbage in means you should blow up.

[jira] [Reopened] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

     [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell reopened LANG-710:
--------------------------------

      Assignee: Henri Yandell

[jira] [Commented] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

    [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060285#comment-13060285 ]

Henri Yandell commented on LANG-710:
------------------------------------

Agreed. I was thinking of browsers accepting it, but that really only applies to the escape method, and only because browsers support loosely defined human input. An unescape method should run on already-escaped code, and that code should have been escaped properly.

I'll look into throwing a ParseException.

[jira] [Commented] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

    [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061027#comment-13061027 ]

Henri Yandell commented on LANG-710:
------------------------------------

So the basic issue, IMO, is that ParseException is a typed (checked) exception; we'd have to introduce it into the StringEscapeUtils API.

I'm uncomfortable throwing a random IllegalArgumentException (or similar) when the bad data is passed in. That may be the typed-exception fan in me speaking. I don't like discovering at 4am that someone found a piece of data that caused a heretofore unknown runtime exception to occur.

So we have three options:

1: Leave the data unescaped because it is poorly typed.
2: Claim that we're dealing with XHTML and throw an exception.
3: Escape the data.

All the options seem useful, but none of them seem perfect. So I've implemented all three.

svn ci -m "Making unescapeHtml _NOT_ escape unfinished numeric entities by default (it ignores them); however adding options that will fire an exception or unescape the numeric entity. LANG-710"
Sending        src/main/java/org/apache/commons/lang3/text/translate/NumericEntityUnescaper.java
Sending        src/test/java/org/apache/commons/lang3/text/translate/NumericEntityUnescaperTest.java
Transmitting file data ..
Committed revision 1143641.
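The three options can be expressed as a policy the caller picks for "&#" plus digits that is not followed by ";". The sketch below is hypothetical: the enum and class names are made up, and here the error policy throws IllegalArgumentException, which keeps the signature unchecked at the cost described above. Later Commons Lang 3 releases appear to expose a similar choice through NumericEntityUnescaper.OPTION (semiColonRequired, semiColonOptional, errorIfNoSemiColon); check the Javadoc before relying on those names or on which exception the error option throws.

```java
// Hypothetical policy-driven unescaper mirroring the three options in LANG-710.
// Decimal entities only.
class PolicyNumericEntity {

    /** What to do with "&#" + digits that is not followed by ';'. */
    enum Unterminated {
        IGNORE,   // option 1: leave the text as-is (the default per the commit message)
        ERROR,    // option 2: reject the input
        UNESCAPE  // option 3: decode it anyway, like a browser
    }

    static String unescape(String input, Unterminated policy) {
        StringBuilder out = new StringBuilder();
        int len = input.length();
        int i = 0;
        while (i < len) {
            char c = input.charAt(i);
            if (c != '&' || i + 1 >= len || input.charAt(i + 1) != '#') {
                out.append(c);
                i++;
                continue;
            }
            int end = i + 2;
            while (end < len && input.charAt(end) >= '0' && input.charAt(end) <= '9') {
                end++;
            }
            boolean terminated = end < len && input.charAt(end) == ';';
            boolean copyLiterally = end == i + 2; // "&#" with no digits is not an entity
            if (!copyLiterally && !terminated) {
                if (policy == Unterminated.ERROR) {
                    throw new IllegalArgumentException("Unterminated numeric entity at index " + i);
                }
                copyLiterally = policy == Unterminated.IGNORE;
            }
            int codePoint = copyLiterally ? -1 : toCodePoint(input.substring(i + 2, end));
            if (codePoint < 0) {
                // Copy the '&' and resume scanning right after it.
                out.append(c);
                i++;
            } else {
                out.appendCodePoint(codePoint);
                i = terminated ? end + 1 : end;
            }
        }
        return out.toString();
    }

    /** Returns the parsed code point, or -1 if it is invalid or overflows. */
    private static int toCodePoint(String digits) {
        try {
            int value = Integer.parseInt(digits);
            return Character.isValidCodePoint(value) ? value : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

Keeping IGNORE as the default means existing callers see no exception, while callers who want strictness or browser-like decoding opt in explicitly.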


[jira] [Closed] (LANG-710) StringIndexOutOfBoundsException when calling unescapeHtml4("&#03")

Gary D. Gregory (Jira)

     [ https://issues.apache.org/jira/browse/LANG-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-710.
------------------------------

    Resolution: Fixed

Resolving.
