[jira] [Commented] (VALIDATOR-429) UrlValidator - path is invalid due to using java.net.URI for validation (regression)

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/VALIDATOR-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201587#comment-16201587 ]

Bruno P. Kinoshita commented on VALIDATOR-429:

It might be easier to review the suggested changes if there were a pull request on GitHub or a patch attached (the former is preferable, IMO). I might have time to try it out and review it in the next few days.

> UrlValidator - path is invalid due to using java.net.URI for validation (regression)
> ------------------------------------------------------------------------------------
>                 Key: VALIDATOR-429
>                 URL: https://issues.apache.org/jira/browse/VALIDATOR-429
>             Project: Commons Validator
>          Issue Type: Bug
>          Components: Routines
>    Affects Versions: 1.6
>            Reporter: limpygnome
>              Labels: easyfix
> h1. Summary
> We've been hit by a bug in a real-world application after upgrading from 1.4.1 to 1.6: previously valid URLs are no longer valid, which looks to be due to using java.net.URI to validate the path of a URL.
> h1. Steps to Reproduce
> Our application went to validate URLs similar to the following:
> * http://example.com//_test
> This is no longer valid in 1.6, but the following cases are:
> * http://example.com//test
> * http://example.com/_test
> h1. Impact
> It seems that a path starting with // is parsed and validated as a hostname by java.net.URI, so UrlValidator rejects URLs that were previously accepted.
> h1. Technical
> It looks like this may have been introduced by the following change:
> https://github.com/apache/commons-validator/commit/03bf0d33143ebd13e4f389cd4ecac8aec17c2057
> Specifically, the change now uses java.net.URI to validate the path. The usage in org.apache.commons.validator.routines.UrlValidator is as follows:
> {code}
> URI uri = new URI(null,null,path,null);
> {code}
> It looks like URI tries to parse the path as a hostname when the scheme and hostname are not specified.
> Example to reproduce:
> {code}
> new URI(null, null, "//_test", null);   // throws URISyntaxException
> {code}
> The same example with a hostname supplied no longer throws an exception:
> {code}
> new URI(null, "test", "//_test", null);
> {code}
> Even though the java.net.URI documentation states that string components can be null, the URI string that is built and validated internally differs depending on whether a hostname is given. When a hostname is specified, URI internally constructs:
> * //test//_test
> With no hostname, which is how UrlValidator calls it, the following is constructed and validated internally:
> * //_test
> Therefore it looks like there's either a bug in java.net.URI, or its usage is not correctly documented.
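The cases described above can be checked with java.net.URI alone, without Commons Validator on the classpath. This is a minimal sketch; the class name UriPathRepro and the helper accepts are made up for illustration and are not part of either library.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriPathRepro {

    /** Hypothetical helper: true if new URI(null, host, path, null) does not throw. */
    static boolean accepts(String host, String path) {
        try {
            new URI(null, host, path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No host: the string "//_test" is parsed, so "_test" is read as a hostname.
        System.out.println(accepts(null, "//_test"));   // false
        // No host: "test" is a legal hostname, so this is accepted.
        System.out.println(accepts(null, "//test"));    // true
        // Single leading slash: treated as a plain path.
        System.out.println(accepts(null, "/_test"));    // true
        // Host supplied: "//test//_test" is parsed, so "//_test" is the path.
        System.out.println(accepts("test", "//_test")); // true
    }
}
```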
> h1. Fix
> A potential fix is to change org.apache.commons.validator.routines.UrlValidator to pass an empty string as the hostname. Internally, in java.net.URI, this produces:
> * ////_test
> Thus the hostname is empty, which is considered valid, and the correct path is validated.
> Would this fix be appropriate, or considered too fragile?
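The empty-hostname idea can be tried in isolation as follows. This is only a sketch of the suggestion in the report, not the change that was or will be made in UrlValidator; the class EmptyHostPathCheck and the method isPathAccepted are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

class EmptyHostPathCheck {

    /**
     * Hypothetical helper: passes an empty (non-null) host so that a leading
     * "//" in the path is not taken as the start of an authority component.
     */
    static boolean isPathAccepted(String path) {
        try {
            new URI(null, "", path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPathAccepted("//_test"));
        System.out.println(isPathAccepted("/_test"));
    }
}
```

Whether this is too fragile is the open question in the report: it relies on java.net.URI tolerating an empty authority before a non-empty path.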
> Alternatively, the fix could be to extract logic similar to java.net.URI's for validating the path, which appears to just check that each character is within the permitted set. This logic can be seen in java.net.URI.parseHierarchical, which calls checkChars.
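The alternative of checking path characters directly might look roughly like the sketch below. It is loosely modelled on the RFC 2396 path grammar and is not a copy of java.net.URI's checkChars: the class PathCharCheck, its methods, and the exact character set are assumptions made for illustration, and it is deliberately stricter than java.net.URI (ASCII only, no automatic quoting of illegal characters).

```java
class PathCharCheck {

    // Assumed set of punctuation allowed in a path, besides ASCII letters and digits.
    private static final String ALLOWED_PUNCTUATION = "-_.!~*'():@&=+$,;/";

    static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Hypothetical helper: true if every character is allowed or part of a %XX escape. */
    static boolean isValidPath(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '%') {
                // A percent sign must be followed by exactly two hex digits.
                if (i + 2 >= path.length()
                        || !isHexDigit(path.charAt(i + 1))
                        || !isHexDigit(path.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits
            } else if (!isAsciiAlphanumeric(c) && ALLOWED_PUNCTUATION.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidPath("//_test")); // true
        System.out.println(isValidPath("/a%20b"));  // true
        System.out.println(isValidPath("/a b"));    // false
    }
}
```

Because no URI string is assembled and re-parsed, a leading // cannot be mistaken for an authority here; the trade-off is that the allowed character set has to be maintained by hand.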

This message was sent by Atlassian JIRA