Exception when parsing RSS


Exception when parsing RSS

Adriano Bonat
Hello,

I'm using Digester to parse a Google News RSS feed, but sometimes I'm
getting the following exception:

org.xml.sax.SAXParseException: Open quote is expected for attribute
"face" associated with an  element type  "font".
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.commons.digester.Digester.parse(Digester.java:1685)
...

Does anybody know what I can do to fix this exception and get the contents
normally?

Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Exception when parsing RSS

Simon Kitching
On Tue, 2008-09-09 at 17:14 -0300, Adriano Bonat wrote:

> Hello,
>
> I'm using Digester to parse a Google News RSS feed, but sometimes I'm
> getting the following exception:
>
> org.xml.sax.SAXParseException: Open quote is expected for attribute
> "face" associated with an  element type  "font".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.apache.commons.digester.Digester.parse(Digester.java:1685)
> ...
>
> anybody knows what can I do to solve this exception and be able to
> normally get the contents?

Can you post the relevant part of the rss input text?

From this error message, it certainly looks like the input is invalid XML.
And if that is the case, then there is no way to parse it with any XML
parser.

If the error is intermittent, then perhaps the input data stream is
sometimes being truncated.
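A minimal sketch of the truncation hypothesis, using only the JDK's built-in SAX parser rather than Digester: a well-formed document that is cut off partway through is rejected with a SAXParseException. The class and method names are hypothetical, and the exact error message depends on where the cut falls.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TruncationDemo {
    // Returns true if the JDK SAX parser rejects the text as malformed XML.
    static boolean isRejected(String xml) throws Exception {
        try {
            // DefaultHandler ignores all events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            System.out.println("line " + e.getLineNumber() + ": " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String complete = "<rss><channel><title>test</title></channel></rss>";
        // Simulate a stream that stopped halfway through the document.
        String truncated = complete.substring(0, complete.length() / 2);
        System.out.println(isRejected(complete));  // prints false
        System.out.println(isRejected(truncated)); // prints true (after the error line)
    }
}
```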

Regards,
Simon


Re: Exception when parsing RSS

Adriano Bonat
On Tue, Sep 9, 2008 at 5:39 PM, Simon Kitching <[hidden email]> wrote:
>
> Can you post the relevant part of the rss input text?

For example:
http://news.google.com/?output=rss&ned=en&num=50&q=test&ie=UTF-8

> From this error message, it sure looks like the input is invalid xml.
> And if that is the case, then there is no way to parse it with any xml
> parser.

The <description> content in Google's RSS is escaped ("<" becomes &lt;,
">" becomes &gt;, and so on), so I don't understand why I'm getting that error.
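A sketch of why the escaping matters, using the JDK SAX parser directly instead of Digester; the class name, handler, and sample markup are hypothetical. Escaped markup inside <description> reaches the handler as plain text, while the same markup left unescaped is parsed as a real <font> element, and its unquoted face attribute is rejected with a SAXParseException of the kind reported above. So if the text actually reaching the parser contained raw, unquoted markup, that would be one way to get this error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EscapedMarkupDemo {
    // Collects the character data found inside <description> elements.
    static String descriptionText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inDescription;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("description")) inDescription = true;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("description")) inDescription = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Entity references such as &lt; arrive here already decoded.
                if (inDescription) text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Escaped markup: the parser treats it as ordinary text.
        String escaped = "<description>&lt;font face=Arial&gt;hi&lt;/font&gt;</description>";
        System.out.println(descriptionText(escaped)); // prints <font face=Arial>hi</font>

        // Unescaped markup: face=Arial lacks quotes, so parsing fails.
        String raw = "<description><font face=Arial>hi</font></description>";
        try {
            descriptionText(raw);
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```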

> If it is intermittent, then maybe you are getting intermittent
> truncation of the input data stream.

Hmm... it is implemented like this:

InputStreamReader isr = new InputStreamReader(urlConnection.getInputStream(), "UTF-8");
BufferedReader br = new BufferedReader(isr);

Channel channel = (Channel) this.rssParser.parse(br);

urlConnection.disconnect();

... so, given that I'm using a BufferedReader, is this "intermittent" problem still possible?

Thanks.

Re: Exception when parsing RSS

Simon Kitching
Adriano Bonat wrote:
> On Tue, Sep 9, 2008 at 5:39 PM, Simon Kitching <[hidden email]> wrote:
>  
>> Can you post the relevant part of the rss input text?
>>    
>
> For example:
> http://news.google.com/?output=rss&ned=en&num=50&q=test&ie=UTF-8
That isn't what I meant; I'm quite sure that google.com is generating
well-formed XML. But what is actually being passed to Digester?
>  
>> From this error message, it sure looks like the input is invalid xml.
>> And if that is the case, then there is no way to parse it with any xml
>> parser.
>>    
>
> The <description> content from the Google's RSS is escaped, so "<" is
> &lt;, ">" is &gt;... so I don't understand why I'm getting that error.
>  
By the way, how are you viewing the raw XML from that URL?

>  
>> If it is intermittent, then maybe you are getting intermittent
>> truncation of the input data stream.
>>    
>
> Hmm.. it is implemented like this:
>
> InputStreamReader isr = new
> InputStreamReader(urlConnection.getInputStream(), "UTF-8");
> BufferedReader br = new BufferedReader(isr);
>
> Channel channel = (Channel) this.rssParser.parse(br);
>
> urlConnection.disconnect();
>
> ... so using a BufferedReader is this "intermittent" problem possible?
>  

It would seem so.

I would recommend reading the contents of the input stream into a String
first, then passing that to Digester. That way you can see exactly what
data is being parsed.
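A minimal sketch of that suggestion, assuming the feed is UTF-8 (as in the original code) and using the JDK SAX parser as a stand-in for the Digester call; the helper names and sample input are hypothetical. The whole response is buffered first, so when a SAXParseException occurs the offending line (located via getLineNumber()) can be printed, together with a rough truncation check on the closing </rss> tag. With Digester itself, the buffered text would be passed as a StringReader, because Digester's parse(String) overload treats its argument as a URI, not as document text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ReadThenParse {
    // Buffers the whole stream as bytes, then decodes once (assumes UTF-8).
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns the given 1-based line of text, or "" if it is out of range.
    static String lineAt(String text, int lineNumber) {
        String[] lines = text.split("\r?\n", -1);
        return (lineNumber >= 1 && lineNumber <= lines.length) ? lines[lineNumber - 1] : "";
    }

    // Parses the buffered text; on failure, reports where and what the bad input was.
    static boolean parseAndReport(String xml) throws Exception {
        try {
            // Stand-in for: digester.parse(new StringReader(xml))
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            System.err.println("Parse failed at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            System.err.println("Offending line: " + lineAt(xml, e.getLineNumber()));
            if (!xml.trim().endsWith("</rss>")) {
                System.err.println("Input does not end with </rss>; it may be truncated.");
            }
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample; the real code would use urlConnection.getInputStream().
        byte[] sample = "<rss>\n<channel><font face=Arial></channel>\n</rss>"
                .getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            parseAndReport(readFully(in));
        }
    }
}
```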

By the way, Digester does not parse the input itself. Digester is simply
a SAX event handler. Its parse methods are just convenience wrappers that
create an instance of whatever XML parser the JVM provides (through JAXP),
configure the Digester instance to listen to events from that parser, and
then pass the input to the XML parser. So what you are seeing is an error
reported by the underlying XML parser (Xerces, according to the stack
trace); it really has nothing to do with Digester.
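A rough sketch of what such a convenience wrapper does, using only JAXP; ElementCounter is a hypothetical stand-in for Digester, and the real Digester parse methods apply their own parser configuration on top of this. The handler only receives events, so a malformed document is rejected inside reader.parse(), and the handler never receives a startElement event for the bad element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class HandlerWiring {
    // Stand-in for Digester: it reacts to events and never reads raw text itself.
    static class ElementCounter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Roughly the shape of a parse(Reader) convenience method.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // JAXP selects the implementation
        SAXParser parser = factory.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        ElementCounter handler = new ElementCounter();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        reader.parse(new InputSource(new StringReader(xml))); // syntax errors originate here
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Shows which JAXP implementation was picked up on this classpath.
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
        System.out.println(countElements("<rss><channel><item/><item/></channel></rss>")); // prints 4
    }
}
```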

Regards,
Simon

