Newbie: problem with NSURL?
-
Hello,
I have written a small Cocoa program that is supposed to read a log file
from a Linux server over the Web and parse it.
Unfortunately, it only reads garbage, although the file itself contains
valid data (written by a PHP script running on Linux), such as:
2004-03-04 126.66.66.66 Netscape
2004-03-04 666.66.66.66 Safari
etc.
Here is the code:
---------------------------
#import "MyController.h"

@implementation MyController

-(void)awakeFromNib {
    NSURL *url = [NSURL URLWithString:@"http://www.myserver.org/log/logfile.txt"];
    NSString *content = [NSString stringWithContentsOfURL:url];
    NSLog(@"content = %@", content);
}

@end
---------------------------
Whenever I run the debugger, the 'content' variable contains the
following value:
\x1fã\b
and NSLog prints: ã
Whenever I replace the above log file with a text file created on my Mac,
'content' contains valid data (e.g., "hello world...").
Whenever I have my program read an HTML file located on the Web server,
'content' also contains valid data (e.g., "<html>...").
I downloaded my log file and checked it for control characters in
BBEdit, but found none.
So what is wrong?
Any help with this issue would be much appreciated.
Best regards,
PR
_______________________________________________
MacOSX-dev mailing list
<MacOSX-dev...>
http://www.omnigroup.com/mailman/listinfo/macosx-dev -
At 8:10 PM +0200 4/3/04, Philippe de Rochambeau wrote:
> Hello,
>
> I have written a small Cocoa program, which is supposed to read via
> the Web a log file located on a Linux server and parse it.
>
> Unfortunately, it only reads garbage although the file itself
> contains valid data (written by a php script running on Linux) such
> as:
The problem is the character encoding: NSString is assuming the wrong
one. From the documentation:
"If the contents begin with a byte-order mark (U+FEFF or U+FFFE),
interprets the contents as Unicode characters. If the contents begin
with a UTF-8 byte-order mark (EFBBBF), interprets the contents as
UTF-8. Otherwise interprets the contents as characters in the default
C string encoding. Since the default C string encoding will vary with
the user's configuration, do not depend on this method unless you are
using Unicode or UTF-8 or you can verify the default C string
encoding."
Use this code instead (untested off the top of my head):
-(void)awakeFromNib
{
    NSURL *url = [NSURL URLWithString:@"http://www.myserver.org/log/logfile.txt"];
    // Fetch the raw bytes, then decode them with an explicit encoding
    NSData *data = [NSData dataWithContentsOfURL:url];
    NSString *content = [[[NSString alloc] initWithData:data
                                               encoding:NSISOLatin1StringEncoding] autorelease];
    NSLog(@"content = %@", content);
}
The NSISOLatin1StringEncoding is just a guess (though probably
correct). Depending on your web server config, you may have to try a
different one.
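For reference, the decision rule in the documentation quoted above boils down to a check of the first few bytes. Here is a sketch of that rule in plain C (which compiles unchanged in an Objective-C project). The enum and function names are hypothetical, and this only mirrors the documented behaviour; it is not NSString's actual implementation.

```c
#include <stddef.h>

/* Hypothetical labels for the three outcomes the documentation lists. */
typedef enum {
    GUESS_UNICODE,          /* starts with a UTF-16 byte-order mark     */
    GUESS_UTF8,             /* starts with the UTF-8 BOM EF BB BF        */
    GUESS_DEFAULT_CSTRING   /* anything else: user-dependent default     */
} EncodingGuess;

/* Mirrors the documented rule for stringWithContentsOfURL:.
   Sketch only; not how NSString is actually implemented. */
EncodingGuess guess_encoding(const unsigned char *bytes, size_t len)
{
    /* U+FEFF written big-endian (FE FF) or little-endian (FF FE) */
    if (len >= 2 && ((bytes[0] == 0xFE && bytes[1] == 0xFF) ||
                     (bytes[0] == 0xFF && bytes[1] == 0xFE)))
        return GUESS_UNICODE;

    /* UTF-8 byte-order mark */
    if (len >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return GUESS_UTF8;

    /* No BOM: the bytes are read in the default C string encoding */
    return GUESS_DEFAULT_CSTRING;
}
```

Data that starts with bytes such as 1f 8b 08 carries no byte-order mark, so under this rule it falls through to the default C string encoding.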
- Jon
--
________________________________________________________________________
Jon Gotow <gotow...>
St. Clair Software http://www.stclairsoft.com/
Fax (540)552-5898 ftp://ftp.stclairsoft.com/
-
Hello,
Unfortunately, I get the same result when I go through NSData.
I now know what the problem is: NSURL cannot handle text containing
more than 1001 characters when it has trouble figuring out the Web
server's character encoding, or something of the sort.
Here is how I figured this out. I created a dummy log file containing
1002 x's and ran my program. As a result, 'content' contained 32 bytes
of non-displayable characters. When I removed one x, the program
displayed the 1001 x's correctly. I then moved the 1002-character log
file to my Mac's Web server, changed 'url' to
'http://localhost/~myname/prov4.txt', and ran the program. The 1002
characters were displayed correctly.
When my program displays the remote Web server's index.html file
(url: 'http://www.myserver.com/index.html'), 'content' contains rubbish.
When I set 'url' to 'http://www.myserver.com', 'content' contains
'<html>...'.
Any suggestions please?
Cheers,
PR
On 4 Apr 04, at 05:12, Jon Gotow wrote:
> <snip>
-
On 3 Apr 2004, at 19:10, Philippe de Rochambeau wrote:
> <snip>
> ---------------------------
> Whenever I run the debugger, the 'content' variable contains the
> following value:
>
> \x1fã\b
Philippe,
I believe you are seeing this issue:
http://www.livejournal.com/community/macosxdev/52490.html
It was mentioned (and confirmed by someone at Apple) on this list a
couple of weeks ago, but I can't find that post now.
Here are the first few bytes of a random gzipped file I had handy:
Satans-Little-Helper:~/Work/mon howie$ hexdump -C mon-0.99.2.tar.gz | head
00000000  1f 8b 08 00 86 35 9a 3b  00 03 ec 3d 69 73 db 46  |.....5.;...=is.F|
00000010  b2 fe 4a fc 8a 36 a5 b5  49 85 a7 a8 23 26 2d 57  |..J..6..I...#&-W|
The first three bytes (1f 8b 08) match your 'garbage' data: 0x1f shows
up as \x1f, 0x8b is 'ã' in the Mac Roman encoding, and 0x08 is the
backspace character, shown as \b. To get around it, you'd need either to
uncompress the data using libz (quite straightforward) or to arrange for
the data not to be gzipped in the first place, either by changing the
server config or by using one of the methods mentioned in the link above.
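A quick way to test for this in code is to look for those same three bytes at the start of the downloaded data: 1f 8b are the gzip ID bytes and 08 is the "deflate" compression method, as defined in RFC 1952. The sketch below is plain C and the function name is hypothetical; from Cocoa you would pass it (const unsigned char *)[data bytes] and [data length] from the NSData you fetched.

```c
#include <stddef.h>

/* Returns 1 if the buffer begins with the gzip member header bytes
   from RFC 1952: ID1 = 0x1f, ID2 = 0x8b, CM = 0x08 (deflate).
   Returns 0 otherwise, including for buffers shorter than 3 bytes. */
int looks_gzipped(const unsigned char *bytes, size_t len)
{
    return len >= 3 &&
           bytes[0] == 0x1f &&
           bytes[1] == 0x8b &&
           bytes[2] == 0x08;
}
```

A match only says the data looks like gzip; decompressing it is still a job for libz or for the server configuration.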
Regards,
Howard
-
Howard,
that's interesting. It may be a coincidence, though, because when my log
file contains fewer than 1002 characters, my program prints the data
correctly. Otherwise, it prints that weird character.
What is even stranger is that telnet has no problem reading the log
file, whatever its length.
PR
On 4 Apr 04, at 12:30, Howard Jones wrote:
> <snip>
-
PR,
This makes sense to me. I would guess your web server doesn't bother
compressing output until it reaches a certain length. Even then, it
should only send gzipped data when the client requests it with an
Accept-Encoding request header, which I imagine you are not sending when
you use telnet. You could try adding an Accept-Encoding header to your
telnet request and see if you get compressed data back.
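For that telnet experiment, the request would look roughly like what the sketch below formats: a minimal HTTP/1.1 GET with an optional "Accept-Encoding: gzip" line. It is written as a plain-C helper with a hypothetical name so the exact bytes are explicit; the host and path are the placeholder ones from this thread. Typed by hand into telnet on port 80, the same lines (ending with an empty line) should let you compare the response with and without the header.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats a minimal HTTP/1.1 GET request into out. When ask_for_gzip is
   nonzero, an "Accept-Encoding: gzip" header is included, which is what
   allows the server to reply with "Content-Encoding: gzip".
   Returns snprintf's result: the length the full request needs
   (a value >= out_size means it was truncated), or negative on error. */
int build_get_request(char *out, size_t out_size, const char *host,
                      const char *path, int ask_for_gzip)
{
    return snprintf(out, out_size,
                    "GET %s HTTP/1.1\r\n"
                    "Host: %s\r\n"
                    "%s"                         /* optional header line */
                    "Connection: close\r\n"
                    "\r\n",                      /* blank line ends headers */
                    path, host,
                    ask_for_gzip ? "Accept-Encoding: gzip\r\n" : "");
}
```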
Mark Meyer
On Apr 4, 2004, at 6:25 AM, Philippe de Rochambeau wrote:
> <snip>
-
I can assure you that NSURL can handle text longer than 1001 characters.
The HTTP server may optimize by not compressing a file before sending it
if the file is shorter than a certain size (why compress something that
doesn't really need compression?). Telnet, of course, accesses the file
differently, and on its own it doesn't ask for the compression that HTTP
servers can support.
-Shawn
On Apr 4, 2004, at 6:25 AM, Philippe de Rochambeau wrote:
> <snip>
-
As others have commented, the server probably isn't bothering to gzip
small quantities of data. You can confirm this absolutely, though, by
taking a packet dump. There are products like EtherPeek available that
let you look at the packets in a more readable form, but I've always
found the simplest way is to execute the following as root from a
shell:
/usr/sbin/tcpdump -i en0 -s 0 -X host <host you are contacting>
This will dump all packets exchanged between your host and the remote
server in hex, with the ASCII equivalents shown in a column to the
right. You'll be able to examine the outgoing HTTP headers, as well as
the HTTP response sent back by the server. You will see that NSURL is
sending an Accept-Encoding: gzip header (this is the known NSURL bug;
it's sending this Accept-Encoding header without being prepared to
process returned gzip'ed content), and I think you will see that for
content larger than 1001 characters, the server is sending back a
Content-Encoding: gzip header.
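If you copy the server's response out of the packet dump as text, the check for that header can be sketched in plain C as below. The function names are hypothetical, and the code assumes the whole response is available as one NUL-terminated string, which a real client would not rely on. It scans only the header lines before the first empty line for a Content-Encoding header that mentions gzip, comparing case-insensitively as HTTP requires.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive "does s start with prefix?" check. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;                 /* mismatch, or s ended early */
    }
    return 1;
}

/* Returns 1 if a Content-Encoding header in the header block of a raw
   HTTP response (the lines before the first empty line) mentions gzip,
   0 otherwise. The body after the empty line is never examined. */
int response_is_gzipped(const char *response)
{
    static const char name[] = "Content-Encoding:";
    const char *headers_end = strstr(response, "\r\n\r\n");
    const char *line = strstr(response, "\r\n");  /* end of status line */

    while (line != NULL && (headers_end == NULL || line < headers_end)) {
        line += 2;                                 /* start of next header */
        if (starts_with_nocase(line, name)) {
            const char *value = line + (sizeof name - 1);
            const char *eol = strstr(value, "\r\n");
            size_t value_len = eol ? (size_t)(eol - value) : strlen(value);
            for (size_t i = 0; i + 4 <= value_len; i++) {
                if (starts_with_nocase(value + i, "gzip"))
                    return 1;                      /* also matches x-gzip */
            }
            return 0;
        }
        line = strstr(line, "\r\n");               /* advance to next line */
    }
    return 0;
}
```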
The best workaround is to use NSURLRequest and NSURLConnection instead.
CFNetwork is nearly as good unless you need cookies, authentication, or
caching support; these are all add-on features that NSURLConnection
provides above and beyond CFNetwork. If you can rely on the server
always gzipping, you can also choose to decompress the data yourself on
the receiving end.
Hope that helps,
REW



