Newbie: problem with NSURL?

  • Hello,

    I have written a small Cocoa program that is supposed to read a log
    file located on a Linux server via the Web and parse it.

    Unfortunately, it only reads garbage, although the file itself contains
    valid data (written by a PHP script running on Linux) such as:

    2004-03-04 126.66.66.66 Netscape
    2004-03-04 666.66.66.66 Safari
    etc.

    Here is the code:

    ---------------------------
    #import "MyController.h"

    @implementation MyController

    -(void)awakeFromNib {
        NSURL *url = [NSURL
            URLWithString:@"http://www.myserver.org/log/logfile.txt"];
        NSString *content = [NSString stringWithContentsOfURL: url];

        NSLog(@"content = %@", content);
    }

    @end
    ---------------------------
    Whenever I run the debugger, the 'content' variable contains the
    following value:

    \x1fã\b

    and NSLog prints the following value: ã

    Whenever I replace the above log file by a text file created on my Mac,
    'content' contains valid data (e.g., "hello world...").

    Whenever I have my program read an HTML file located on the Web server,
    'content' also contains valid data (e.g., "<html>...").

    I have downloaded my log file and examined it with BBEdit for control
    characters, but none could be found.

    So what is wrong?

    Any help with this issue would be much appreciated.

    Best regards,

    PR
    _______________________________________________
    MacOSX-dev mailing list
    <MacOSX-dev...>
    http://www.omnigroup.com/mailman/listinfo/macosx-dev
  • At 8:10 PM +0200 4/3/04, Philippe de Rochambeau wrote:
    > Hello,
    >
    > I have written a  small Cocoa program, which is supposed to read via
    > the Web a log file located on a Linux server and parse it.
    >
    > Unfortunately, it only reads garbage although the file itself
    > contains valid data (written by a php script running on Linux) such
    > as:

    The problem is the character encoding: NSString is assuming the wrong
    one. From the documentation:

    "If the contents begin with a byte-order mark (U+FEFF or U+FFFE),
    interprets the contents as Unicode characters. If the contents begin
    with a UTF-8 byte-order mark (EFBBBF), interprets the contents as
    UTF-8. Otherwise interprets the contents as characters in the default
    C string encoding. Since the default C string encoding will vary with
    the user's configuration, do not depend on this method unless you are
    using Unicode or UTF-8 or you can verify the default C string
    encoding."
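    The rule quoted above can be sketched in plain C (which also compiles
    as Objective-C). This only illustrates the documented behavior; it is
    not Apple's implementation, and the names guess_encoding and
    EncodingGuess are made up for the example. Any bytes that begin with
    neither kind of byte-order mark fall through to the default C string
    encoding, whatever those bytes actually are.

```c
#include <stddef.h>

/* Which interpretation the documented rule would pick for a byte buffer.
   Illustration of the documentation only, not Apple's code. */
typedef enum {
    GUESS_UTF16,          /* starts with a UTF-16 BOM: FE FF or FF FE */
    GUESS_UTF8,           /* starts with the UTF-8 BOM: EF BB BF */
    GUESS_DEFAULT_CSTRING /* anything else: default C string encoding */
} EncodingGuess;

EncodingGuess guess_encoding(const unsigned char *buf, size_t len)
{
    /* U+FEFF in either byte order means "Unicode" (UTF-16). */
    if (len >= 2 &&
        ((buf[0] == 0xFE && buf[1] == 0xFF) ||
         (buf[0] == 0xFF && buf[1] == 0xFE)))
        return GUESS_UTF16;

    /* The UTF-8 encoding of U+FEFF. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return GUESS_UTF8;

    /* No BOM: the result depends on the user's configuration. */
    return GUESS_DEFAULT_CSTRING;
}
```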

    Use this code instead (untested off the top of my head):

    -(void)awakeFromNib
    {
        NSURL *url = [NSURL
            URLWithString:@"http://www.myserver.org/log/logfile.txt"];
        // Fetch the raw bytes first...
        NSData *data = [NSData dataWithContentsOfURL: url];
        // ...then decode them with an explicit encoding instead of
        // relying on the default C string encoding.
        NSString *content = [[[NSString alloc] initWithData:data
            encoding:NSISOLatin1StringEncoding] autorelease];

        NSLog(@"content = %@", content);
    }

    NSISOLatin1StringEncoding is just a guess (though probably correct).
    Depending on your web server configuration, you may have to try a
    different one.
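    To make the idea concrete: in ISO Latin-1, every byte value 0x00-0xFF
    is the Unicode code point with the same number. The hypothetical helper
    below (plain C, which also compiles as Objective-C) decodes Latin-1
    into UTF-8 by hand. Because every byte is a valid Latin-1 character,
    this decoding never fails, so it cannot tell you whether the bytes
    were really text in the first place.

```c
#include <stddef.h>

/* Decode ISO Latin-1 bytes into UTF-8. Each input byte is the code point
   U+0000..U+00FF with the same value, so any byte sequence is accepted.
   out must have room for 2 * len bytes; returns the number of bytes
   written. Hypothetical helper for illustration. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                   /* ASCII: 1 byte */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    return n;
}
```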

      - Jon

    --
    ________________________________________________________________________
            Jon Gotow                    <gotow...>
        St. Clair Software              http://www.stclairsoft.com/
        Fax (540)552-5898              ftp://ftp.stclairsoft.com/
    _______________________________________________
    MacOSX-dev mailing list
    <MacOSX-dev...>
    http://www.omnigroup.com/mailman/listinfo/macosx-dev
  • Hello,

    Unfortunately, the result is the same with NSData.

    I now know what the problem is: NSURL cannot handle text containing
    more than 1001 characters when it has trouble figuring out the Web
    server's character encoding, or something of the sort.

    Here is how I figured this out. I created a dummy log file containing
    1002 x's and ran my program. As a result, 'content' contained 32 bytes
    of non-displayable characters. When I removed one x, the program
    displayed the 1001 x's correctly. I then moved the 1002-character log
    file to my Mac's Web server, changed the value of 'url' to
    'http://localhost/~myname/prov4.txt', and ran the program. The 1002
    characters were displayed correctly.

    Whenever I have my program display the data in the remote Web server's
    index.html file (url: 'http://www.myserver.com/index.html'), 'content'
    contains rubbish. Whenever I set 'url' to 'http://www.myserver.com',
    'content' contains '<html>...'

    Any suggestions please?

    Cheers,

    PR

    On 4 Apr. 04, at 05:12, Jon Gotow wrote:

    > <snip>

  • On 3 Apr 2004, at 19:10, Philippe de Rochambeau wrote:
    > <snip>
    > ---------------------------
    > Whenever I run the debugger, the 'content' variable contains the
    > following value:
    >
    > \x1fã\b

    Philippe,

    I believe you are seeing this issue:
    http://www.livejournal.com/community/macosxdev/52490.html
    It was mentioned (and confirmed by someone at Apple) on this list a
    couple of weeks ago, but I can't find that post now.

    Here's the first few bytes of a random gzipped file I had handy:

    Satans-Little-Helper:~/Work/mon howie$ hexdump -C mon-0.99.2.tar.gz | head
    00000000  1f 8b 08 00 86 35 9a 3b  00 03 ec 3d 69 73 db 46  |.....5.;...=is.F|
    00000010  b2 fe 4a fc 8a 36 a5 b5  49 85 a7 a8 23 26 2d 57  |..J..6..I...#&-W|

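    The first three bytes (1f 8b 08) match your 'garbage' data: 0x1f is the
    '\x1f' you saw, 0x8b displays as 'ã' in Mac OS Roman (presumably your
    default C string encoding), and 0x08 is the backspace character '\b'.
    To get around it, you'd need to either uncompress the data using libz
    (quite straightforward) or arrange for the data not to be gzipped in
    the first place, either by changing the server configuration or by
    using one of the methods mentioned in the link above.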
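    A quick way to catch this before treating downloaded bytes as text is
    to check for the gzip signature. Per RFC 1952, a gzip member begins
    with the bytes 0x1f 0x8b followed by a compression-method byte, which
    is 8 for deflate. The function name is made up for the example; in
    Cocoa you could pass it [data bytes] and [data length] from an NSData.

```c
#include <stddef.h>

/* Returns 1 if buf starts like a gzip member (RFC 1952):
   ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate). Hypothetical helper. */
int looks_gzipped(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0x1f && buf[1] == 0x8b && buf[2] == 0x08;
}
```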

    Regards,

    Howard

  • Howard,

    That's interesting. It may be a coincidence, though, because when my
    log file contains fewer than 1002 characters, my program prints out the
    data correctly. Otherwise, it prints out that weird character.

    What is even stranger is the fact that telnet has no problem reading
    the log file, whatever its length.

    PR

    On 4 Apr. 04, at 12:30, Howard Jones wrote:

    > <snip>

  • PR,

    This makes sense to me. I would guess your web server doesn't bother
    compressing output until it reaches a certain length. Even when it
    does, it should only send gzip'd data when the client requests it with
    an Accept-Encoding request header, which I imagine you are not sending
    when you use telnet. You could try adding the Accept-Encoding header to
    your telnet request and see whether you get compressed data back.
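    For reference, the sketch below produces a minimal HTTP/1.1 request of
    the kind you could type into a telnet session (for example after
    'telnet www.myserver.org 80'). The function name and the use of
    'Connection: close' are assumptions for the example; a non-zero
    ask_gzip adds the Accept-Encoding header described above.

```c
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.1 GET request, ending with the empty line that
   terminates the headers. If ask_gzip is non-zero, an
   "Accept-Encoding: gzip" header tells the server the client will accept
   gzip'd data. Returns the full request length (as snprintf does), or -1
   if path does not start with '/' or either argument contains CR or LF.
   Hypothetical helper for illustration. */
int build_get_request(char *out, size_t outlen, const char *host,
                      const char *path, int ask_gzip)
{
    if (path[0] != '/' || strpbrk(path, "\r\n") || strpbrk(host, "\r\n"))
        return -1;

    return snprintf(out, outlen,
                    "GET %s HTTP/1.1\r\n"
                    "Host: %s\r\n"
                    "%s"                    /* optional Accept-Encoding */
                    "Connection: close\r\n"
                    "\r\n",                 /* blank line ends headers */
                    path, host,
                    ask_gzip ? "Accept-Encoding: gzip\r\n" : "");
}
```

    Sending the request once with and once without the header, for a log
    file above the size where the problem appears, should show whether the
    server only compresses when asked.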

    Mark Meyer

    On Apr 4, 2004, at 6:25 AM, Philippe de Rochambeau wrote:

    > <snip>

  • I can assure you that NSURL can handle text longer than 1001 characters.

    The HTTP server may skip compression when the file is smaller than a
    certain size (why compress something that doesn't really need
    compression?). Telnet, of course, is a different way of accessing the
    file, and one that doesn't request the compression that HTTP servers
    can support.
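    The reasoning above can be written down as a hypothetical server
    policy. It does not reflect any particular server's configuration: the
    min_len threshold is an assumption, and 1002 is used in the test only
    because it matches PR's observation. Under this policy a short file is
    never compressed, a long file is compressed only when the client asks
    for gzip, and a telnet session that sends no Accept-Encoding header
    always gets plain text.

```c
#include <stddef.h>

/* Hypothetical policy, for illustration only: compress a response when
   the client offered gzip in its Accept-Encoding header AND the body is
   at least min_len bytes long (small bodies are not worth compressing). */
int should_gzip(size_t body_len, int client_accepts_gzip, size_t min_len)
{
    return client_accepts_gzip && body_len >= min_len;
}
```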

    -Shawn

    On Apr 4, 2004, at 6:25 AM, Philippe de Rochambeau wrote:

    > <snip>

  • As others have commented, the server probably isn't bothering to gzip
    small quantities of data.  You can confirm this absolutely, though, by
    taking a packet dump.  There are products like EtherPeek available that
    let you look at the packets in a more readable form, but I've always
    found the simplest way is to execute the following as root from a
    shell:

    /usr/sbin/tcpdump -i en0 -s 0 -X host <host you are contacting>

    This will dump all packets exchanged between your host and the remote
    server in hex, with the ASCII equivalents shown in a column to the
    right.  You'll be able to examine the outgoing HTTP headers, as well as
    the HTTP response sent back by the server.  You will see that NSURL is
    sending an Accept-Encoding: gzip header (this is the known NSURL bug;
    it's sending this Accept-Encoding header without being prepared to
    process returned gzip'ed content), and I think you will see that for
    content larger than 1001 characters, the server is sending back a
    Content-Encoding: gzip header.
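    Once you have the raw response text (from a capture like this, or from
    logging the headers some other way), checking for the Content-Encoding
    header is easy to automate. This is a minimal sketch with a made-up
    function name; it assumes the response uses CRLF line endings, as HTTP
    requires, and compares header names case-insensitively because HTTP
    header names are case-insensitive.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Look up header `name` in a raw HTTP response (status line, headers,
   blank line, body). Copies the value, NUL-terminated and truncated to
   fit, into out and returns 1 if found; returns 0 otherwise.
   Hypothetical helper for illustration. */
int find_header(const char *response, const char *name,
                char *out, size_t outlen)
{
    size_t name_len = strlen(name);
    const char *line = strstr(response, "\r\n");   /* skip the status line */

    while (line != NULL) {
        line += 2;                                 /* start of next line */
        if (line[0] == '\r' && line[1] == '\n')    /* blank line: headers end */
            return 0;
        if (strncasecmp(line, name, name_len) == 0 && line[name_len] == ':') {
            const char *v = line + name_len + 1;
            while (*v == ' ' || *v == '\t')        /* skip optional whitespace */
                v++;
            size_t n = strcspn(v, "\r\n");         /* value runs to end of line */
            if (outlen == 0)
                return 1;
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        line = strstr(line, "\r\n");
    }
    return 0;
}
```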

    The best workaround is to use NSURLRequest and NSURLConnection instead.
    CFNetwork is nearly as good unless you need cookies, authentication, or
    caching support; these are all add-on features that NSURLConnection
    provides above and beyond CFNetwork. If you can rely on the server
    always gzip'ing, you can also choose to unzip the data yourself on the
    receiving end.

    Hope that helps,
    REW
