Size limit of NSData or NSFileHandle?

  • Hi,

    Is anyone aware of a size limit in NSData (I am aware of the supposed
    2GB size limit of NSData according to the docs) or NSFileHandle?

    I am using the -[NSFileHandle availableData] method to get all the
    data of a file in one NSData object.  But it seems that the maximum
    amount of data I'll get back is 268,435,456 bytes -- it happens to be
    precisely 2^28, by the way.  I tried both

        (A) NSData *fileData = [inFile availableData];
        (B) NSData *fileData = [inFile readDataOfLength:fileSize];

    but no luck either way.

    I use this way of reading the entire file contents at once for many
    files, and for the first time today noticed that it broke on a very
    large file of about 310 MB in size.

    I see no exceptions being raised.  Is this a bug, or is someone aware
    of this behavior?  Couldn't find any information on this...

    Thanks,

    -- ivan
  • At 16:33 09.04.2006 -0400, Ivan Kourtev wrote:

    > Is anyone aware of a size limit in NSData (I am aware of the supposed
    > 2GB size limit of NSData according to the docs) or NSFileHandle?

    I wouldn't be surprised if you run into limits with mmap,
    per-process memory space or something similar. My idea would
    be to try mapping that file using mmap() yourself, and see
    if it works. At least you'll know whether Darwin or Cocoa
    is the part to blame :-)
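The experiment Christian suggests can be sketched in plain C (which is also valid Objective-C). The helper below maps the whole file read-only with mmap() and touches its last byte; if this succeeds on the 310 MB file, the 2^28-byte ceiling presumably comes from the Cocoa layer rather than from Darwin. The name probe_mmap is hypothetical, and the SIZE_MAX check only matters in a 32-bit process.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map an entire file read-only to see whether the
   kernel will hand out that much address space in one piece.
   Returns the file size on success, or -1 on failure. */
static long long probe_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if ((unsigned long long)st.st_size > SIZE_MAX) {
        /* Larger than this process can even express as a length. */
        fprintf(stderr, "file too large for this address space\n");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return -1;
    }
    close(fd);                        /* the mapping outlives the descriptor */

    /* Touch the last byte so the end of the mapping is actually paged in. */
    volatile char last = ((const char *)p)[len - 1];
    (void)last;

    munmap(p, len);
    return (long long)st.st_size;
}
```

Calling probe_mmap("BigFile.txt") from a small test program and comparing the result with the size reported by ls -l should be enough to tell the two layers apart.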

    Also, keep in mind that a lot of classes have that 2 GB (or 4 GB, if
    you feel lucky) limit, since sizes, lengths, etc. are typed "unsigned
    int", never off_t. I have always considered the limit to be 2 GB:
    unsigned int would allow for 4 GB, but never trust software you
    haven't written yourself, because one "int" in the wrong place and
    things go badly wrong. I never actually tried it, though; I just use
    my own classes when dealing with potentially "large" files anyway.

    Christian
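Christian's point about 32-bit length types is easy to make concrete. The C sketch below shows where a signed and an unsigned 32-bit length run out, and what a larger size turns into when it is forced into an unsigned 32-bit field. The helper names are made up for illustration, and the sketch says nothing about how NSData or NSFileHandle actually store lengths internally.

```c
#include <stdint.h>

/* Largest size a signed 32-bit ("int") length can hold: 2^31 - 1 bytes. */
static int fits_in_int32(uint64_t size)
{
    return size <= (uint64_t)INT32_MAX;
}

/* Largest size an unsigned 32-bit length can hold: 2^32 - 1 bytes. */
static int fits_in_uint32(uint64_t size)
{
    return size <= (uint64_t)UINT32_MAX;
}

/* Converting to an unsigned 32-bit type is well defined in C: the value is
   reduced modulo 2^32, so anything past 4 GB silently wraps around. */
static uint32_t truncate_to_uint32(uint64_t size)
{
    return (uint32_t)size;
}
```

For instance, truncate_to_uint32(5368709120), a 5 GB size, yields 1073741824 (1 GB) with no error of any kind, and a 3 GB size already does not fit in a signed 32-bit length. That kind of silent truncation is why a length type in the wrong place is so dangerous.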
    The maximum amount of virtual address space available to a 32-bit OS X
    app is 4 GB. The maximum size of a file on an HFS+ volume is roughly two
    billion times larger than that: the precise limit is 2^63 bytes, and
    2^63 / 2^32 = 2^31, or about 2.1 billion.

    Therefore it is reasonable to expect that some files may be too large for
    their entire contents to be resident in an OS X app's memory at one time. So
    when reading a file, a program should not assume that the first read
    operation has read the file's entire contents - instead the program should
    keep on reading the file's contents until no more data is left to be read.

    Here is some code to demonstrate this technique:

        NSFileHandle *fh = [NSFileHandle fileHandleForReadingAtPath:
                              @"BigFile.txt"];

        assert(fh != NULL);

        unsigned len = 0;

        do
        {
            NSData * data = [fh availableData];
            len = [data length];

            printf("data len: %u offset: %llu\n", len, [fh offsetInFile]);

            // release the data before reading more
            // or the consequences will be quite unpleasant

            [data release];

        } while (len != 0);

    Greg
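For comparison, here is the same keep-reading-until-nothing-is-left technique at the POSIX level, in C (also valid Objective-C). Because it reuses one fixed buffer, there is nothing to release between iterations, which sidesteps the NSData memory-management question entirely. The function name read_in_chunks and the 1 MB chunk size are arbitrary choices for this sketch, not anything Cocoa uses.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK_SIZE (1 << 20)   /* 1 MB per read(); an arbitrary choice */

/* Hypothetical helper: read the file in fixed-size chunks until read()
   reports end of file, so only one chunk is resident at a time.
   Returns the total number of bytes read, or -1 on error. */
static long long read_in_chunks(const char *path)
{
    static char buf[CHUNK_SIZE];     /* reused for every chunk */
    long long total = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)                  /* end of file */
            break;
        if (n < 0) {
            if (errno == EINTR)      /* interrupted by a signal: retry */
                continue;
            close(fd);
            return -1;
        }
        /* ... process buf[0 .. n-1] here ... */
        total += n;
    }

    close(fd);
    return total;
}
```

Unlike the availableData loop, the memory in use stays at one buffer no matter how large the file is.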

  • Greg,

    you are right, of course, except for one small glitch:

    On 10.4.2006, at 16:21, Greg Herlihy wrote:

    > NSData * data = [fh availableData];
    > len = [data length];
    > printf("data len: %d offset: %lld\n", len, [fh offsetInFile]);
    > // release the data before reading more
    > // or the consequences will be quite unpleasant
    > [data release];

    availableData happens to return an autoreleased object (as does more
    or less anything except alloc/init, new, copy, and mutableCopy). Thus,
    the proper pattern is something like this:

        for (;;) {
            NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
            NSData *data=[fh availableData];
            if (![data length]) break;
            ...
            [pool release];
        }

    Depending on the task it might be better to release the pool not each
    time, but every N times, yadda yadda yadda :)
    ---
    Ondra Čada
    OCSoftware:    <ocs...>              http://www.ocs.cz
    private        <ondra...>            http://www.ocs.cz/oc
  • On Apr 10, 2006, at 9:33 AM, Ondra Cada wrote:

    > availableData happens to return an autorelease object (just like
    > more or less anything but for alloc/init, new, and mutable/copy).
    > Thus, the proper pattern is (something like)
    >
    > for (;;) {
    > NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
    > NSData *data=[fh availableData];
    > if (![data length]) break;
    > ...
    > [pool release];
    > }
    >
    > Depending on the task it might be better to release the pool not
    > each time, but every N times, yadda yadda yadda :)

    This will result in a leak as well, since when the break is hit,
    neither the pool, nor anything in it, will be released.  Instead:

    NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
    for (unsigned i=0;;i++) {
        NSData *data=[fh availableData];
        if (![data length]) break;
        ...
        if (i % 10 == 9) { // release every 10 times through
            [pool release];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool release];

    (You can, of course, release every time through, or something other
    than 10)

    Glenn Andreas                      <gandreas...>
      <http://www.gandreas.com/> wicked fun!
    quadrium | build, mutate, evolve | images, textures, backgrounds, art
  • Glenn,

    On 10.4.2006, at 16:47, glenn andreas wrote:

    >> for (;;) {
    >> NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
    >> NSData *data=[fh availableData];
    >> if (![data length]) break;
    >> ...
    >> [pool release];
    >> }
    >>
    >> Depending on the task it might be better to release the pool not
    >> each time, but every N times, yadda yadda yadda :)
    >
    > This will result in a leak as well, since when the break is hit,
    > neither the pool, nor anything in it, will be released.

    Nope, it won't. Autorelease pools are smart enough: the last one
    gets automatically taken care of by the parent pool.
    ---
    Ondra Čada
    OCSoftware:    <ocs...>              http://www.ocs.cz
    private        <ondra...>            http://www.ocs.cz/oc
  • The autorelease pool holding the NSData should still be released upon every
    iteration of the loop. The program will not be able to complete ten
    iterations without having released the pool, because there is almost
    certainly not enough contiguous virtual address space available to
    accommodate ten 268 MB allocations all at the same time.
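The address-space argument above is simple arithmetic. A hedged C sketch of the peak-memory estimate when n chunks all stay alive until the pool is finally released; both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: bytes held at the peak if n chunks of chunk_size
   bytes are all still alive (e.g. nothing is freed until the end). */
static uint64_t peak_bytes(unsigned n, uint64_t chunk_size)
{
    return (uint64_t)n * chunk_size;
}

/* Hypothetical helper: does that peak fit in an address-space budget?
   Passing this check is necessary but not sufficient: the real usable
   space is smaller, and each chunk also needs one contiguous range. */
static int fits_in_budget(unsigned n, uint64_t chunk_size, uint64_t budget)
{
    return peak_bytes(n, chunk_size) <= budget;
}
```

With 2^28-byte chunks, peak_bytes(10, 268435456) is 2,684,354,560 bytes, about 2.5 GB: more than half of a 32-bit process's 4 GB address space before the program, its frameworks, and fragmentation take their share. The requirement that each chunk be one contiguous block is what makes ten of them unlikely to fit in practice.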

    And while releasing the pool more frequently than every tenth iteration may
    avoid complete memory exhaustion, the mere accumulation of such large blocks
    of memory can easily cause the hard disk to thrash and the system to slow to
    a crawl as these oversized blocks are continuously paged to and from the
    disk.

    For these reasons, I would also reinstate the warning about the consequences
    of not releasing the NSData objects in a timely manner. The warning should
    also be rewritten, since the one I wrote did not turn out to be very
    effective :).

    Greg

  • Greg:

    I tried your approach -- consecutive calls to -[NSFileHandle availableData]
    -- and it indeed worked, so I got the file in two chunks.  I also found
    another workaround:

        fileData = [[NSData alloc] initWithContentsOfFile:filename
                       options:NSUncachedRead | NSMappedRead error:&err];

    which initialized the data with the entire 310 MB file.

    This clearly indicates that either the documentation for
    -[NSFileHandle availableData] or its implementation is wrong. Quoting
    Apple's docs:

      "Discussion: If the receiver is a file, returns the data obtained by
      reading the file from the file pointer to the end of the file. If the
      receiver is a communications channel, reads up to a buffer of data
      and returns it; if no data is available, the method blocks. Returns
      an empty data object if the end of file is reached. Raises
      NSFileHandleOperationException if attempts to determine file-handle
      type fail or if attempts to read from the file or channel fail."
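Part of the confusion may be that the underlying POSIX read() is allowed to return fewer bytes than requested, so code that wants "everything up to EOF" has to loop. A minimal C sketch of reading a whole file into one buffer that way; read_whole_file and the 1 GB cap on each request are assumptions of this sketch (some systems reject very large read counts), and it makes no claim about how -[NSFileHandle availableData] is implemented.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_REQUEST ((size_t)1 << 30)  /* cap each read() at 1 GB */

/* Hypothetical helper: read a whole regular file into one malloc'd buffer.
   Returns the buffer (caller must free it) and stores the number of bytes
   read in *out_len, or returns NULL on error. */
static char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || (unsigned long long)st.st_size > SIZE_MAX) {
        close(fd);
        return NULL;
    }

    size_t want = (size_t)st.st_size;
    char *buf = malloc(want ? want : 1);  /* malloc(0) may return NULL */
    if (buf == NULL) {
        close(fd);
        return NULL;
    }

    size_t got = 0;
    while (got < want) {
        size_t req = want - got;
        if (req > MAX_REQUEST)
            req = MAX_REQUEST;
        ssize_t n = read(fd, buf + got, req);
        if (n == 0)                       /* EOF earlier than expected */
            break;
        if (n < 0) {
            if (errno == EINTR)           /* interrupted: just retry */
                continue;
            free(buf);
            close(fd);
            return NULL;
        }
        got += (size_t)n;                 /* short read: loop for the rest */
    }

    close(fd);
    *out_len = got;
    return buf;
}
```

The NSMappedRead workaround goes the other way and maps the file instead of copying it, much like the mmap() experiment Christian proposed.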

    --
    ivan

  • On Apr 10, 2006, at 9:54 AM, Ondra Cada wrote:

    > Glenn,
    >
    > On 10.4.2006, at 16:47, glenn andreas wrote:
    >
    >>> for (;;) {
    >>> NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
    >>> NSData *data=[fh availableData];
    >>> if (![data length]) break;
    >>> ...
    >>> [pool release];
    >>> }
    >>>
    >>> Depending on the task it might be better to release the pool not
    >>> each time, but every N times, yadda yadda yadda :)
    >>
    >> This will result in a leak as well, since when the break is hit,
    >> neither the pool, nor anything in it, will be released.
    >
    > Nope it won't. Autorelease pools are smart enough -- the last one
    > gets automatically taken care of by the parent pool.

    Well, what do you know:

    > If you neglect to send release to an autorelease pool when you are
    > finished with it (something not recommended), it is released when
    > one of the autorelease pools in which it nests is released.

    And here I've just been paranoid for all these years (though it seems
    that this paranoia was at least recommended).
  • >> If you neglect to send release to an autorelease pool when you are
    >> finished with it (something not recommended), it is released when
    >> one of the autorelease pools in which it nests is released.
    >
    > And here I've just been paranoid for all these years (though it seems
    > that this paranoia at least was recommended).

    Well, OK, but do we know for sure that the run loop creates and releases a
    pool for each event? Or could it use some internal, possibly undocumented,
    optimization whereby it releases the referenced objects and reuses the pool?

    --
    Scott Ribe
    <scott_ribe...>
    http://www.killerbytes.com/
    (303) 722-0567 voice