[Digester] How to turn off parse's attempt to retrieve dtd?

[Digester] How to turn off parse's attempt to retrieve dtd?

Jon Steelman-4
When I parse after setValidating(false), why does Digester still try to
retrieve the DTD file? What else do I need to set in Digester to make it
ignore a declaration like <!DOCTYPE cardActionVendor SYSTEM
"cardActionVendor.dtd">? If I don't have the DTD at hand, I cannot parse
at all: I always get java.io.FileNotFoundException (The system cannot
find the file specified). I'm using Digester 1.7.

Thanks,
Jon


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [Digester] How to turn off parse's attempt to retrieve dtd?

Simon Kitching
Hi Jon,

On Wed, 2005-06-15 at 03:36 -0400, Jon Steelman wrote:
> When I parse after setValidating(false), why does Digester still try to
> retrieve the dtd file? What else do I need to set in Digester to make it
> ignore xml like <!DOCTYPE cardActionVendor SYSTEM
> "cardActionVendor.dtd"> ? Otherwise, I cannot parse and if I don't have
> the dtd at the moment and always get the java.io.FileNotFoundException
> (The system cannot find the file specified). Using Digester 1.7.

This is standard XML parser behaviour. Even when validation is disabled,
an XML parser still needs the DTD for things such as entity definitions.
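A minimal sketch, assuming only the JDK's built-in SAX parser (no Digester), of why a non-validating parser cannot simply ignore the DTD: an entity declared in the DTD is still expanded even though validation is off. To stay self-contained it uses an internal DTD subset; the entity name `vendor` and the class `NonValidatingDtdDemo` are hypothetical. An external DTD such as cardActionVendor.dtd is consulted for the same reason, which is why the parser tries to open it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class NonValidatingDtdDemo {

    /** Parses xml with validation off and returns all character data it reports. */
    static String rootText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation off, as with Digester.setValidating(false)
        SAXParser parser = factory.newSAXParser();
        StringBuilder text = new StringBuilder();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times; concatenate
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical entity declared in the DTD; the parser must read the DTD to expand it.
        String xml = "<!DOCTYPE note [ <!ENTITY vendor \"Acme Cards\"> ]>"
                + "<note>Issued by &vendor;</note>";
        System.out.println(rootText(xml)); // prints: Issued by Acme Cards
    }
}
```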

Digester provides basic support for using local copies of resources
referenced from an xml file (such as DTDs): if your DOCTYPE declaration
was properly formed (i.e. had a PUBLIC identifier instead of just a
SYSTEM identifier) then you could have used the Digester.register
method to register a local copy of the DTD for the PUBLIC id. Note that
omitting the PUBLIC id from a DOCTYPE declaration is WRONG in any
application where xml is expected to be used on a host other than the
one it is created on.
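For illustration, here is a plain-JDK sketch of the kind of PUBLIC-id-to-local-copy mapping that Digester.register(publicId, entityURL) sets up. It is not Digester's implementation: the class `PublicIdResolver`, the public id `-//Example//DTD Vendor 1.0//EN` and the `example.invalid` system id are all hypothetical. The local DTD declares an entity so the test can observe that the local copy, not the remote URL, was read.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical resolver mapping PUBLIC ids to local DTD copies. */
final class PublicIdResolver implements EntityResolver {
    private final Map<String, String> localUrls = new HashMap<>();

    void register(String publicId, String localUrl) {
        localUrls.put(publicId, localUrl);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : localUrls.get(publicId);
        // null means "not mine": the parser then falls back to fetching systemId itself.
        return (local == null) ? null : new InputSource(local);
    }

    /** Demo helper: parses xml with the given resolver and returns all character data. */
    static String text(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        StringBuilder out = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Local copy of a hypothetical DTD; a real one would ship with the application.
        Path dtd = Files.createTempFile("vendor", ".dtd");
        dtd.toFile().deleteOnExit();
        Files.write(dtd, "<!ENTITY vendor \"Acme Cards\">".getBytes(StandardCharsets.UTF_8));

        PublicIdResolver resolver = new PublicIdResolver();
        resolver.register("-//Example//DTD Vendor 1.0//EN", dtd.toUri().toString());

        // The system id points at an unreachable host; the PUBLIC id lets us avoid it.
        String xml = "<!DOCTYPE note PUBLIC \"-//Example//DTD Vendor 1.0//EN\""
                + " \"http://example.invalid/vendor.dtd\"><note>&vendor;</note>";
        System.out.println(text(xml, resolver)); // prints: Acme Cards
    }
}
```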

But as you've got a broken DOCTYPE you have two options:
(1) use some parser-specific feature setting to tell the parser to
ignore external DTDs. If you're using Sun's Java 5 or later then the
underlying parser is actually Xerces (Java 1.4 shipped with Crimson), so see
  http://xml.apache.org/xerces2-j/features.html
  for feature external-parameter-entities
and see also method Digester.setFeature. Of course this isn't terribly
portable, nor guaranteed to work in future JVMs (though I can't see Sun
ditching Xerces for anything else in the near future).
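A sketch of option (1) with plain JAXP, under one loudly flagged assumption: instead of the external-parameter-entities feature named above, it uses the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd, which specifically controls whether a non-validating parser loads the external DTD. Like any Xerces-only name it is not portable, and SAXParserFactory.setFeature throws if the underlying parser doesn't recognise it. The class `SkipExternalDtd` is hypothetical. With Digester the same flag would presumably go through its setFeature method, called before the parser is first created.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SkipExternalDtd {
    // Xerces-specific feature name; not part of the JAXP or SAX standards.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without fetching the external DTD and returns the number of elements seen. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false); // throws if the parser doesn't support it
        XMLReader reader = factory.newSAXParser().getXMLReader();
        int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // The DTD file named here does not need to exist.
        String xml = "<!DOCTYPE cardActionVendor SYSTEM \"cardActionVendor.dtd\">"
                + "<cardActionVendor><request/></cardActionVendor>";
        System.out.println(countElements(xml)); // prints: 2
    }
}
```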

(2) write a custom EntityResolver subclass that returns an empty stream
when asked for the DTD, and register that with Digester using
Digester.setEntityResolver.
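A minimal sketch of option (2), using only the JDK's SAX API; the class `EmptyEntityResolver` and its `rootName` demo helper are hypothetical. The important detail is returning an empty InputSource rather than null, because null tells the parser to go and fetch the system id itself. Note that this version blanks every external entity, not just the DTD; a stricter one could inspect systemId first.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Answers every external entity request (including the DTD) with an empty document. */
final class EmptyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Empty InputSource, not null: null would make the parser open systemId itself.
        return new InputSource(new StringReader(""));
    }

    /** Demo helper: parses xml with this resolver installed and returns the root element's name. */
    static String rootName(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(new EmptyEntityResolver());
        String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```

With Digester, the resolver would be installed with digester.setEntityResolver(new EmptyEntityResolver()) before calling parse.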

Regards,

Simon


Re: [Digester] How to turn off parse's attempt to retrieve dtd?

Thomas Dudziak
> But as you've got a broken DOCTYPE you have two options:
> (1) use some parser-specific property setting to tell the parser to
> ignore external DTDs. If you're using Sun's java 1.4 or later then the
> underlying parser is actually Xerces, so see
>   http://xml.apache.org/xerces2-j/features.html
>   for feature external-parameter-entities
> and see also method Digester.setProperty. Of course this isn't terribly
> portable, nor guaranteed to work in future JVMs (though I can't see Sun
> ditching Xerces for anything else in the near future).
>
> (2) write a custom EntityResolver subclass that returns an empty stream
> when asked for the DTD, and register that with Digester using
> Digester.setEntityResolver.

Forrest and others use the xml-resolver component from
http://xml.apache.org/commons/components/resolver/index.html, which
provides an entity resolver of this kind that can map system
identifiers to (different) URLs.
Have a look at the "XML Entity and URI Resolvers" article
(http://xml.apache.org/commons/components/resolver/resolver-article.html)
for a more in-depth explanation.

Tom

Re: [Digester] How to turn off parse's attempt to retrieve dtd?

Simon Kitching
On Wed, 2005-06-15 at 11:12 +0200, Thomas Dudziak wrote:

> > (2) write a custom EntityResolver subclass that returns an empty stream
> > when asked for the DTD, and register that with Digester using
> > Digester.setEntityResolver.
>
> Forrest and others use the xml-resolver component from
> http://xml.apache.org/commons/components/resolver/index.html, which
> provides such an entity resolver that is able to map system
> identifiers to (different) urls.
> Have a look at the "XML Entity and URI Resolvers" article
> (http://xml.apache.org/commons/components/resolver/resolver-article.html)
> for a more in-depth explanation.

Thanks for the links Tom - I knew that there were open-source
EntityResolver implementations that supported OASIS catalogs but didn't
know one was at Apache!

It should be noted, though, that writing a trivial EntityResolver to
simply ignore external entities takes only about 10 lines, which might be
better than introducing an external dependency. Of course, for more
flexibility the xml.apache.org classes look to be the way to go.

However, ignoring the DTD can potentially change the meaning of a
document, and so should only be done when the DTD is known to have
nothing but validation rules in it.

Cheers,

Simon


Re: [Digester] How to turn off parse's attempt to retrieve dtd?

Thomas Dudziak
> Thanks for the links Tom - I knew that there were open-source
> EntityResolver implementations that supported OASIS catalogs but didn't
> know one was at Apache!
>
> It should be noted, though, that writing a trivial EntityResolver to
> simply ignore external entities is only about 10 lines, which might be
> better than introducing an external dependency. Of course for more
> flexibility the xml.apache.org classes look to be the way to go..

And then there's the matter of reinventing the wheel ;-)

> However ignoring the DTD can potentially change the meaning of a
> document, and so should only be done when the DTD is known to have
> nothing but validation rules in it.

The main reason for the xml-resolver component is, AFAIK, to allow
bundling the DTD while keeping its full URL in the document, i.e. for
offline use. At least that's what Forrest uses it for.

Tom

RE: [Digester] How to turn off parse's attempt to retrieve dtd?

Jon Steelman-4
Simon,

How does one know if the DTD has nothing but validation rules?

Thanks,
Jon

Sample DTD content:

<!-- ================= -->
<!-- Top Level Element -->
<!-- ================= -->
<!ELEMENT cardActionVendor (request|response)>
<!-- ================================= -->
<!-- Complex Elements request/response -->
<!-- ================================= -->
<!ELEMENT request  (user,password,cardid,cardAction,merchantID,
locationID,trackingNumber)>
<!ELEMENT response (user,password,cardid,cardAction,merchantID,
locationID,trackingNumber,
responseCode,responseText,responseLog)>
<!-- =============== -->
<!-- Simple Elements -->
<!-- =============== -->
<!ELEMENT user (#PCDATA)>
<!ELEMENT password (#PCDATA)>
<!ELEMENT cardid (#PCDATA)>
<!ELEMENT cardAction EMPTY>
<!ATTLIST cardAction action (activate|deactivate|status) "activate">
<!ELEMENT merchantID (#PCDATA)>
<!ELEMENT locationID (#PCDATA)>
<!ELEMENT trackingNumber (#PCDATA)>
<!ELEMENT responseCode (#PCDATA)>
<!ELEMENT responseText (#PCDATA)>
<!ELEMENT responseLog (#PCDATA)>


-----Original Message-----
From: Simon Kitching [mailto:[hidden email]]
Sent: Wednesday, June 15, 2005 5:27 AM
To: Jakarta Commons Users List
Subject: Re: [Digester] How to turn off parse's attempt to retrieve dtd?

However, ignoring the DTD can potentially change the meaning of a
document, and so should only be done when the DTD is known to have
nothing but validation rules in it.

Cheers,
Simon


RE: [Digester] How to turn off parse's attempt to retrieve dtd?

Simon Kitching
On Thu, 2005-06-16 at 01:07 -0400, Jon Steelman wrote:
> Simon,
>
> How does one know if the DTD has nothing but validation rules?

Read the DTD and watch out for things like:

Default values for xml attributes:
  <!ATTLIST payment type CDATA "check">
which implicitly sets the type attribute of a payment element to "check"
if the input doesn't specify a value.

Entity definitions:
  <!ENTITY .....>

There might be a few other things that a DTD can contain which affect
the *meaning* of even a valid document. I haven't actually worked out
what the complete list is.

ELEMENT declarations, and ATTLIST declarations without default values,
should be OK, in that a valid document is unchanged whether the DTD is
used or not.
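As a rough aid for the manual review described above, here is a heuristic sketch (hypothetical class `DtdMeaningCheck`, JDK regex only, not a real DTD parser). It strips comments, then lists every ENTITY declaration and every ATTLIST declaration containing a quoted literal; in ATTLIST syntax a quoted literal can only be a default or #FIXED value. It only looks at the text it is given, so content pulled in from other files is not checked; treat an empty result as a hint, not proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DtdMeaningCheck {
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // An ENTITY or ATTLIST declaration; quoted strings may contain '>' so they are matched whole.
    private static final Pattern DECL =
            Pattern.compile("<!(ENTITY|ATTLIST)\\b(?:[^>\"']|\"[^\"]*\"|'[^']*')*>");

    /** Returns declarations that may change a document's content if the DTD is skipped. */
    static List<String> suspiciousDeclarations(String dtd) {
        String text = COMMENT.matcher(dtd).replaceAll(" ");
        List<String> found = new ArrayList<>();
        Matcher m = DECL.matcher(text);
        while (m.find()) {
            String decl = m.group().replaceAll("\\s+", " ");
            boolean isEntity = m.group(1).equals("ENTITY");
            // In an ATTLIST, a quoted literal can only be a default (or #FIXED) value.
            boolean hasDefault = decl.indexOf('"') >= 0 || decl.indexOf('\'') >= 0;
            if (isEntity || hasDefault) {
                found.add(decl);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT cardAction EMPTY>\n"
                + "<!ATTLIST cardAction action (activate|deactivate|status) \"activate\">\n";
        suspiciousDeclarations(dtd).forEach(System.out::println);
        // prints: <!ATTLIST cardAction action (activate|deactivate|status) "activate">
    }
}
```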

But:
> <!ATTLIST cardAction action (activate|deactivate|status) "activate">
there's a default value. So you can't skip the DTD completely; you'll
have to provide an EntityResolver that returns a local copy of the DTD.
Otherwise, documents that don't specify the action attribute won't get
the default value.
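To make the consequence concrete, this sketch (JDK SAX only; the class `DefaultAttributeDemo` and its helper are hypothetical) parses the same cardAction element twice: once with an EntityResolver that returns an abbreviated local copy of Jon's DTD, and once with one that returns an empty DTD. Only the first run sees the default action value.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class DefaultAttributeDemo {
    // Abbreviated local copy of the DTD from the thread (only the declarations needed here).
    static final String LOCAL_DTD =
            "<!ELEMENT cardAction EMPTY>\n"
            + "<!ATTLIST cardAction action (activate|deactivate|status) \"activate\">\n";

    /** Returns the action attribute the application sees when the resolver supplies dtdText. */
    static String actionSeen(String dtdText) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Every external entity request (here only the DTD) is answered with dtdText.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader(dtdText)));
        String[] action = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("cardAction")) {
                    action[0] = atts.getValue("action");
                }
            }
        });
        String xml = "<!DOCTYPE cardActionVendor SYSTEM \"cardActionVendor.dtd\">"
                + "<cardActionVendor><request><cardAction/></request></cardActionVendor>";
        reader.parse(new InputSource(new StringReader(xml)));
        return action[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(actionSeen(LOCAL_DTD)); // prints: activate
        System.out.println(actionSeen(""));        // prints: null
    }
}
```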

Regards,

Simon

