Re: NSXMLDocument unable to parse valid HTML with scripts in the body
FROM : Diez B. Roggisch
DATE : Wed Feb 06 20:43:08 2008

Marcus S. Zarra wrote:
> Greetings List,
>
> I have been trying to solve this issue for a while.  Nothing is coming
> up on the lists, google, cocoadev, etc. that is similar to the issue
> that I am having.
>
> The code to reproduce this behavior is simple:
>
> NSError *error = nil;
> NSURL *url = [NSURL
> URLWithString:@"http://web.mac.com/mzarra/Test/Original.html"];
> NSXMLDocument *document = [[NSXMLDocument alloc]
> initWithContentsOfURL:url options:NSXMLDocumentTidyHTML error:&error];
> NSAssert(error == nil, ([NSString stringWithFormat:@"Error reading file:
> %@", error]));
>
> If you run that code (the html page is safe for work), NSXMLDocument
> will give an error back of:
>
> Exception raised during posting of notification.  Ignored.  exception:
> 'Error reading file: Error Domain=NSXMLParserErrorDomain Code=23
> UserInfo=0x1f1550 "Line 140: EntityRef: expecting ';'
>
> And will return a nil NSXMLDocument.  The line that it is complaining
> about is:
>
> <div class="CounterDivClass"><script type="text/javascript"
> src=""></script>
>
> Which, as far as I can tell, is perfectly valid html.
>
> I have tried every input option available for loading the document but
> none of them change the error.  Even more interesting, if I just
> initialize an NSXMLParser prior to loading the document then the
> document will load but it will mutilate the tree and actually make the
> document invalid!
>
> To duplicate this add the line:
>
> [[[NSXMLParser alloc] initWithData:data] autorelease];
>
> Just before the NSXMLDocument *document... line above and rerun the
> test.  The document will pass but a large chunk (starting after line
> 140) will no longer be in the document.
>
> So the questions that I am hoping to get resolved are:
>
> 1.  Why is this throwing an error?
> 2.  How can I get past it to properly load the XML Document (preferably
> without having to build my own tree).
>
> FYI, This html code is generated by iWeb...
>
> Thanks for any and all help, suggestions, etc.
>
> Marcus


For me, the line in question looks like this:

            <div class="CounterDivClass"><script type="text/javascript"
src="
"></script><script
type="text/javascript"
src="
"></script><div
id="CounterDiv"><img src="
http://web.mac.com/i/chp/1/spacer.gif" alt=""
/></div></div><a href="http://www.mac.com"
title=""><img src="Welcome_files/mwmac.png" alt="Made
on a Mac" style="border: none; height: 50px; left: 24px; opacity: 0.55;
position: absolute; top: 29px; width: 139px; z-index: 1; " id="id2" />


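Look at the &counter in the src URL of the second script tag. A bare &
starts an entity reference, which is why the parser reports "EntityRef:
expecting ';'". You should use &amp; there, I guess.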
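
To illustrate the point about the bare ampersand, here is a minimal sketch. It uses Python's standard-library XML parser rather than NSXMLDocument, purely so it can be run anywhere. The URL in BAD is made up (the real src values are not shown in this thread), and escape_bare_ampersands and parses are hypothetical helper names, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: the URL is invented for illustration only.
BAD = ('<div class="CounterDivClass"><script type="text/javascript" '
       'src="http://example.invalid/c.js?id=1&counter=2"></script></div>')

# An ampersand that does NOT already begin a named entity (&name;) or a
# numeric character reference (&#38; or &#x26;).
BARE_AMP = re.compile(r'&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)')


def escape_bare_ampersands(markup: str) -> str:
    """Replace each bare '&' with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub('&amp;', markup)


def parses(markup: str) -> bool:
    """Return True if the markup is accepted by a strict XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("original parses:", parses(BAD))
    fixed = escape_bare_ampersands(BAD)
    print("escaped parses: ", parses(fixed))
    # The parser decodes &amp; back to '&', so the URL itself is unchanged.
    print("src seen by parser:", ET.fromstring(fixed).find("script").get("src"))
```

The same idea should carry over to the Cocoa case: either fix the source so the URL contains &amp;, or pre-process the downloaded data before handing it to the parser. Note that this only addresses bare ampersands; named entities that XML does not define (such as &nbsp;) are a separate problem.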

Diez
