NSXMLDocument unable to parse valid HTML with scripts in the body
FROM : Marcus S. Zarra
DATE : Wed Feb 06 20:35:44 2008

Greetings List,

I have been trying to solve this issue for a while.  Nothing similar to 
it is coming up on the lists, Google, CocoaDev, etc.

The code to reproduce this behavior is simple:

NSError *error = nil;
NSURL *url = [NSURL URLWithString:@"http://web.mac.com/mzarra/Test/Original.html"];
NSXMLDocument *document = [[NSXMLDocument alloc] initWithContentsOfURL:url
                                                              options:NSXMLDocumentTidyHTML
                                                                error:&error];
NSAssert(error == nil, ([NSString stringWithFormat:@"Error reading file: %@", error]));

If you run that code (the HTML page is safe for work), NSXMLDocument 
reports the following error:

Exception raised during posting of notification.  Ignored.  exception: 
'Error reading file: Error Domain=NSXMLParserErrorDomain Code=23 
UserInfo=0x1f1550 "Line 140: EntityRef: expecting ';'

And will return a nil NSXMLDocument.  The line that it is complaining 
about is:

<div class="CounterDivClass"><script type="text/javascript" src="http://web.mac.com/i/chp/NGHitCounter.js"></script>

Which, as far as I can tell, is perfectly valid HTML.

I have tried every input option available for loading the document but 
none of them change the error.  Even more interesting, if I just 
initialize an NSXMLParser prior to loading the document then the 
document will load but it will mutilate the tree and actually make the 
document invalid!

To duplicate this add the line:

[[[NSXMLParser alloc] initWithData:data] autorelease];

Just before the NSXMLDocument *document... line above and rerun the 
test.  The document will pass but a large chunk (starting after line 
140) will no longer be in the document.

So the questions that I am hoping to get resolved are:

1.  Why is this throwing an error?
2.  How can I get past it to properly load the XML document 
(preferably without having to build my own tree)?

FYI, this HTML code is generated by iWeb...

Thanks for any and all help, suggestions, etc.

Marcus
