[commons-text] Help debugging a very strange error


Christopher Schultz-2
All,

In the past week, I've received reports that our servers have started
escaping XML strings incorrectly, with consumers seeing errors like this:

org.xml.sax.SAXParseException: The entity "rsquo" was referenced, but
not declared.

When looking at the raw text being generated, it's clear that, indeed,
the text is being escaped as if it were HTML (where the &rsquo; entity
is defined) instead of XML.
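
For reference, the consumer-side failure is easy to reproduce in isolation. The sketch below uses only the JDK's built-in XML parser; the class name RsquoParseSketch and the helper parseError are made up for illustration and are not part of our application:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RsquoParseSketch {

    /** Returns null if the document parses, otherwise the parser's error message. */
    public static String parseError(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML named entity: not declared in plain XML, so parsing fails
        System.out.println(parseError("<t>it&rsquo;s</t>"));
        // Raw U+2019 character: parses fine
        System.out.println(parseError("<t>it\u2019s</t>"));
        // Numeric character reference: also parses fine
        System.out.println(parseError("<t>it&#8217;s</t>"));
    }
}
```

Only the first document is rejected: XML predefines just five entities (amp, lt, gt, quot, apos), while the raw character and the numeric reference are both legal.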

The code path is a little convoluted, and I'm going to try to get the
smallest reproducible test case I can, but I thought I'd reach out
early to see if anyone has any "aha" guidance for me before I tear out
a whole lot of hair following this down the rabbit hole.

This is commons-text-1.1. I've looked at the release notes between 1.1
and 1.8 and I don't see anything immediately that looks like a bugfix.

The data is coming from a database, and the string is clearly correct,
and it includes a "typographic right apostrophe", which is accurately
&rsquo; in HTML.

The output is being generated by Apache Velocity, through a macro
which escapes XML for us. The code in the template looks like this:

#xmlEscape($foo)

Where $foo is a string value containing this character: ’

The xmlEscape macro is defined in our global macros file which gets
evaluated on startup:

#macro (xmlEscape $text)#if($text)$!modernEscape.escapeXml10($text.toString())#end#end

$modernEscape is an instance of
org.apache.commons.text.StringEscapeUtils in the global-scope; it's
like "application" scope for webapps, but it's in Velocity.

When we first start our web application, all seems well. After some
time, this process breaks and we start emitting "&rsquo;" instead of
"’".

I can find no evidence of any of the following:

1. multiple versions of the commons-text library
2. multiple versions of org.apache.commons.text.StringEscapeUtils in any
library
3. any component replacing the value of $modernEscape
4. any component replacing the definition of the #xmlEscape macro

When the first report came in, we tried replicating the reporter's
experience and we could see it on one server node but not others. We
restarted that web application on that node and it started working
properly again.

Does StringEscapeUtils.escape* keep any state associated with what
it's doing? We aren't doing anything weird: just calling
StringEscapeUtils.escapeXml10 ... a lot of times, probably from many
threads.
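
One way to test the shared-state theory directly is to hammer the escaper from several threads and compare every result against a single-threaded baseline. A minimal sketch follows; escapeXmlStandIn is only a placeholder so the example runs without any dependency (in our app one would pass StringEscapeUtils::escapeXml10 instead), and the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

public class ConcurrentEscapeCheck {

    // Placeholder escaper (assumption): handles only the five predefined XML entities.
    public static String escapeXmlStandIn(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Counts results that differ from the baseline computed on the calling thread. */
    public static long countMismatches(UnaryOperator<String> escaper, String input,
                                       int threads, int callsPerThread) {
        String expected = escaper.apply(input);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> {
                long bad = 0;
                for (int i = 0; i < callsPerThread; i++) {
                    if (!expected.equals(escaper.apply(input))) {
                        bad++;
                    }
                }
                return bad;
            };
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        long bad = countMismatches(ConcurrentEscapeCheck::escapeXmlStandIn,
                "it\u2019s <b>bold</b> & more", 8, 100_000);
        System.out.println("mismatches: " + bad);
    }
}
```

A count of 0 does not prove thread safety, but a non-zero count would point at the escaper rather than at the templates.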

Any ideas?

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [commons-text] Help debugging a very strange error

Rob Tompkins
That’s super weird. Is there a defined amount of time before the method implementation seemingly switches?

-Rob

> On Jan 16, 2020, at 1:40 PM, Christopher Schultz <[hidden email]> wrote:
> [...]


Re: [commons-text] Help debugging a very strange error

Christopher Schultz-2
All,

As is common when weird things are happening, I wasn't looking in the
right place. A macro IS being re-defined. It just isn't #xmlEscape...
it's the one *calling* #xmlEscape.
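
To illustrate why all the usual suspects checked out, here is a toy model in plain Java (not Velocity's actual macro machinery). The macro name renderField, the two stand-in escapers, and the assumption that the replacement definition used HTML-style escaping are all hypothetical; the only fact from our case is that the calling macro was redefined:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class MacroRedefinitionSketch {

    // Stand-in for XML escaping: only the five predefined entities.
    public static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Stand-in for HTML escaping: additionally emits a named entity for U+2019.
    public static String htmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("\u2019", "&rsquo;");
    }

    // Toy macro table: a later definition silently replaces an earlier one.
    private final Map<String, UnaryOperator<String>> macros = new HashMap<>();

    public void define(String name, UnaryOperator<String> body) {
        macros.put(name, body);
    }

    public String call(String name, String arg) {
        return macros.get(name).apply(arg);
    }

    public static void main(String[] args) {
        MacroRedefinitionSketch engine = new MacroRedefinitionSketch();
        engine.define("xmlEscape", MacroRedefinitionSketch::xmlEscape);
        // Hypothetical caller macro that delegates to xmlEscape.
        engine.define("renderField", s -> engine.call("xmlEscape", s));

        String value = "it\u2019s";
        // At startup the raw U+2019 character passes through untouched.
        System.out.println(engine.call("renderField", value));

        // Later, some other template redefines the *caller*, not xmlEscape.
        engine.define("renderField", MacroRedefinitionSketch::htmlEscape);
        System.out.println(engine.call("renderField", value)); // it&rsquo;s

        // xmlEscape itself still behaves exactly as before.
        System.out.println(engine.call("xmlEscape", value));
    }
}
```

Checking xmlEscape and the escaper object in isolation finds nothing wrong in this model, which matches what I saw: the definition that changed sits one level up the call chain.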

:(

Thanks!
-chris

Re: [commons-text] Help debugging a very strange error

Bruno P. Kinoshita-2
I had a quick look at the code and couldn't find anything that looked suspicious, Christopher. There is some state, but it is private, created in the class's static initializers, and not changed anywhere that I could find.
I'd be interested to learn what's causing this issue in your environment. Keep us posted.
Cheers
Bruno

