Odd NSData initWithContentsOfURL: issue
-
Hey all,
I'm trying to grab XML files from various URLs and save the XML file to
disk as a text file. Here is the little snippet of code I'm using:
NSURL *xmlURL = [[NSURL alloc] initWithString:feedURL];
NSData *theXMLFile = [[NSData alloc] initWithContentsOfURL:xmlURL];
[theXMLFile writeToFile:@"/tmp/blaa.xml" atomically:TRUE];
Ok, so all it should do, given the URL as the NSString 'feedURL', is
grab the data using that URL, then write it out to /tmp/blaa.xml.
It works for most URLs I try. But there are some that result in binary
garbage data in the blaa.xml file! I'm wondering if anyone has any
clue where I can look to start figuring this out.
For example, this XML file works fine:
http://www.coverville.com/index.xml
But this one doesn't:
http://vinylpodcast.com/wp-rss2.php
If I load 'em up in Firefox or Safari and view source, they both look
like valid XML text files. The first is saved as a valid XML text
file. The second is saved as binary junk. The .php extension
shouldn't matter, since that is a server-side scripting that just sends
back a valid XML file.
Any ideas at all???
Thanks a lot in advance!
-Mike
--
Sent from my .mac account -
Yes, I know exactly what's going on. The server is returning the feed
gzip-compressed. initWithContentsOfURL: is evidently telling the server
it can handle gzip, but it's not decompressing the response for you.
This is surprising: either it shouldn't say it can handle gzip, or it
should decompress the data for you.
I suggest passing any data you get back through a gzip decompressor
(you could use libz directly, or you could see if anybody's written a
Cocoa wrapper). If it's not gzip data, libz's gz* file functions
(gzopen()/gzread()) return it unmodified, and if it is, they return the
decompressed data.
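A quick way to tell whether the downloaded bytes are gzip-compressed at all is to look at the first few bytes. This is only a sketch in plain C (which compiles unchanged in an Objective-C file); looks_like_gzip is a made-up helper name, not part of Foundation or libz, and the check relies on the gzip header layout from RFC 1952.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Foundation or zlib): returns true when
 * the buffer starts with the gzip magic number 0x1f 0x8b followed by
 * compression method 8 (deflate), as laid out in RFC 1952. */
static bool looks_like_gzip(const unsigned char *bytes, size_t length)
{
    assert(bytes != NULL || length == 0);
    if (length < 3) {
        return false;   /* too short to hold the magic number and method */
    }
    return bytes[0] == 0x1f && bytes[1] == 0x8b && bytes[2] == 0x08;
}
```

With the NSData from the original snippet, this could be called as looks_like_gzip([theXMLFile bytes], [theXMLFile length]) before deciding whether to decompress.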
--
Kevin Ballard
<kevin...>
http://www.tildesoft.com
http://kevin.sb.org -
On 14 Mar 2005, at 3:27, Michael J. Sherman wrote:
> I'm trying to grab XML files from various URLs and save the XML file
> to disk as a text file.
> ...
> NSData *theXMLFile = [[NSData alloc] initWithContentsOfURL:xmlURL];
> ...
> It works for most URLs I try. But there are some that result in
> binary garbage data in the blaa.xml file! I'm wondering if anyone has
> any clue where I can look to start figuring this out.
I ran into the same problem the other day but with the
initWithContentsOfURL: message in NSXMLParser instead of NSData.
Another site that fails is the C|Net RSS feed at:
http://news.com.com/2547-1_3-0-5.xml
At the time I didn't have enough time to fully test things, and then I
went off on holiday, but seeing your mail made me come back and look at
this again.
It seems that there is a real bug in the initWithContentsOfURL: code
for NSData and NSString and by extension every class that uses them. A
quick check of the headers sent with the request shows that they
include:
Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
The problem is that when the server does send back gzipped data, it's
not getting unpacked. I just tried your code to download from the site
you mentioned and from CNet, and the resulting files unpack just fine on
the command line using gunzip.
I will file a bug report with Apple. Given that 10.3.9 is in seed
already I guess it won't get fixed before Tiger :-(
Nicko -
On Mar 13, 2005, at 10:01 PM, Nicko van Someren wrote:
> I will file a bug report with Apple. Given that 10.3.9 is in seed
> already I guess it won't get fixed before Tiger :-(
That would be a feature request, rather than a bug report.
-initWithContentsOfURL: is supposed to give you whatever's at that
URL, without any modifications.
-jcr
John C. Randolph <jcr...> (408) 974-8819
Sr. Cocoa Software Engineer,
Apple Worldwide Developer Relations
http://developer.apple.com/cocoa/index.html -
On Mar 15, 2005, at 11:46 AM, John C. Randolph wrote:
> That would be a feature request, rather than a bug report.
> -initWithContentsOfURL: is supposed to give you whatever's at that
> URL, without any modifications.
I think most users would argue that the compression of the data is an
artifact of the transport mechanism, and does not represent the actual
data at that URL. And you should not be explicitly advertising gzip
support to the server on the client's behalf if you aren't going to
take responsibility for the actual gzip decompression. -
On Mar 15, 2005, at 2:46 PM, John C. Randolph wrote:
> That would be a feature request, rather than a bug report.
> -initWithContentsOfURL: is supposed to give you whatever's at that
> URL, without any modifications.
>
But the library is advertising to the remote server the fact that it
can handle gzip, but it never bothers to unzip it when it gets it.
What am I supposed to do with the gzipped return data? Do you have a
solution? -
On Mar 15, 2005, at 7:06 PM, Michael J. Sherman wrote:
> But the library is advertising to the remote server the fact that
> it can handle gzip, but it never bothers to unzip it when it gets
> it. What am I supposed to do with the gzipped return data? Do you
> have a solution?
>
I would suggest getting zlib, (http://www.gzip.org/zlib/) and using
the inflate() function, or using an NSTask to invoke gunzip.
-jcr
> I would suggest getting zlib, (http://www.gzip.org/zlib/) and using
> the inflate() function, or using an NSTask to invoke gunzip.
Zlib should be installed by default. On my system, without any
modifications, both zlib and bzlib are installed. The zlib library is
pretty easy to learn, so I would think that invoking gunzip (using
NSTask) would be unnecessary. The documentation for zlib is pretty
limited (and bzlib's is even worse), but a little experimentation tends
to reveal all that's necessary.
I hope this helps,
Will -
On Mar 16, 2005, at 12:38 AM, Will Mason wrote:
>> I would suggest getting zlib, (http://www.gzip.org/zlib/) and using
>> the inflate() function, or using an NSTask to invoke gunzip.
>
> Zlib should be installed by default. On my system without any
> modifications, both zlib and bzlib are installed.
Now that you mention it, I seem to recall that gunzip uses zlib, so I
should have known it was there.
-jcr
On 15 Mar 2005, at 20:01, John Stiles wrote:
> On Mar 15, 2005, at 11:46 AM, John C. Randolph wrote:
>> On Mar 13, 2005, at 10:01 PM, Nicko van Someren wrote:
>>> It seems that there is a real bug in the initWithContentsOfURL: code
>>> for NSData and NSString and by extension every class that uses them.
>>> A quick check of the headers sent with the request shows that they
>>> include:
>>> Accept-Encoding: gzip, deflate;q=1.0, identity;q=0.5, *;q=0
>>> ...
>>> I will file a bug report with Apple. Given that 10.3.9 is in seed
>>> already I guess it won't get fixed before Tiger :-(
>>
>> That would be a feature request, rather than a bug report.
>> -initWithContentsOfURL: is supposed to give you whatever's at that
>> URL, without any modifications.
The problem is it does not do that. There is no compressed data at
that URL.
> I think most users would argue that the compression of the data is an
> artifact of the transport mechanism, and does not represent the actual
> data at that URL. And you should not be explicitly advertising gzip
> support to the server on the client's behalf if you aren't going to
> take responsibility for the actual gzip decompression.
Absolutely. The data is not compressed on the server and it is only
being compressed in transit at the (erroneous) request of the client.
If you access the same URL using, say, wget, you get the raw data. If
you access the URL with Safari and display the source, you get the raw
data. If you access the URL with initWithContentsOfURL: you get the
compressed transport stream and not the data at the URL.
The compression of the data is an unrequested artefact of the manner
in which initWithContentsOfURL: asks for the data. There is no way for
the user to know in advance if the server supports compression and
therefore will be affected by the Accept-Encoding: header line that
gets sent. Do you honestly think that this is a feature, let alone a
useful one, and not an honest-to-goodness bug?
Nicko


