Size limit of NSData or NSFileHandle?
-
Hi,
Is anyone aware of a size limit in NSData (I am aware of the supposed
2GB size limit of NSData according to the docs) or NSFileHandle?
I am using the - [NSFileHandle availableData] method to get all the
data of a file in one NSData object. But it seems that the maximum
amount of data I'll get back is 268,435,456 bytes -- it happens to be
precisely 2^28, by the way. I tried both (A) NSData *fileData =
[inFile availableData]; and (B) NSData *fileData = [inFile
readDataOfLength:fileSize]; but no luck either way.
I use this way of reading the entire file contents at once for many
files, and for the first time today noticed that it broke on a very
large file of about 310 MB in size.
I see no exceptions being raised. Is this a bug, or is someone aware
of this behavior? Couldn't find any information on this...
Thanks,
-- ivan -
At 16:33 09.04.2006 -0400, Ivan Kourtev wrote:
> Is anyone aware of a size limit in NSData (I am aware of the supposed
> 2GB size limit of NSData according to the docs) or NSFileHandle?
I wouldn't be surprised if you run into limits with mmap,
per-process memory space or something similar. My idea would
be to try mapping that file using mmap() yourself, and see
if it works. At least you'll know whether Darwin or Cocoa
is the part to blame :-)
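A minimal sketch of that experiment in plain C, assuming a POSIX system. The helper name try_map_file is made up for illustration; it simply reports whether the kernel will map the whole file read-only in one piece:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try to map an entire file read-only.
 * Returns the number of bytes mapped, 0 for an empty file, -1 on failure. */
long long try_map_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }
    if ((off_t)(size_t)st.st_size != st.st_size) {
        close(fd);                      /* size does not fit in size_t */
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping survives close() */
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    long long mapped = (long long)st.st_size;
    munmap(p, (size_t)st.st_size);
    return mapped;
}
```

If this succeeds on the 310 MB file while the Cocoa call still stops at 2^28 bytes, the limit is more likely in the framework than in the kernel.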
Also, keep in mind that a lot of classes have that 2 (or 4, if you
feel lucky) GB limit, since sizes, lengths etc. are always typed
"unsigned int", never off_t. At least I always considered the
limit to be 2GB (unsigned int would allow for 4GB, but never
trust software you haven't written yourself --- an "int" in the
wrong place and things get messed up bad); never actually tried it
--- I'm just using my own classes when dealing with potentially
"large" files anyway.
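To illustrate the unsigned int versus off_t point, here is a small C sketch. The function name is made up, and it assumes the usual 32-bit unsigned int; it checks whether a file size would survive being stored in an unsigned int:

```c
#include <limits.h>
#include <sys/types.h>

/* Hypothetical check: returns 1 if a file size held in an off_t can be
 * stored in an unsigned int without truncation, 0 otherwise. */
int fits_in_unsigned_int(off_t size)
{
    if (size < 0)
        return 0;
    return (unsigned long long)size <= (unsigned long long)UINT_MAX;
}
```

Any API that takes or returns a length as unsigned int is bound by this check, whatever the file system itself supports.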
Christian -
The maximum amount of virtual address space available to a 32-bit OS X app
is 4 GB (2^32 bytes). The maximum size of a file on an HFS+ volume is about
2 billion times larger than that (the limit is 2^63 bytes, and
2^63 / 2^32 = 2^31).
Therefore it is reasonable to expect that some files may be too large for
their entire contents to be resident in an OS X app's memory at one time. So
when reading a file, a program should not assume that the first read
operation has read the file's entire contents - instead the program should
keep on reading the file's contents until no more data is left to be read.
Here is some code to demonstrate this technique:
    NSFileHandle *fh = [NSFileHandle fileHandleForReadingAtPath:@"BigFile.txt"];
    assert(fh != NULL);
    unsigned len = 0;
    do
    {
        NSData * data = [fh availableData];
        len = [data length];
        printf("data len: %d offset: %lld\n", len, [fh offsetInFile]);
        // release the data before reading more
        // or the consequences will be quite unpleasant
        [data release];
    } while (len != 0);
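The same read-until-nothing-is-left idea can be sketched one level down, in plain C with POSIX read(). The function name and the 64 KB chunk size are arbitrary choices for illustration, not anything NSFileHandle is known to use:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read a file to the end in fixed-size chunks.
 * Returns the total number of bytes read, or -1 on error. */
long long read_all_in_chunks(const char *path)
{
    char buf[64 * 1024];                /* illustrative chunk size */
    long long total = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;                 /* process buf[0..n) here */
        } else if (n == 0) {
            break;                      /* end of file */
        } else if (errno != EINTR) {
            close(fd);                  /* real error, give up */
            return -1;
        }
        /* n < 0 with EINTR: interrupted by a signal, just retry */
    }
    close(fd);
    return total;
}
```

Only one chunk is in memory at a time, so the peak footprint stays at the buffer size no matter how large the file is.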
Greg
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Cocoa-dev mailing list (<Cocoa-dev...>)
> Help/Unsubscribe/Update your Subscription:
> http://lists.apple.com/mailman/options/cocoa-dev/<greghe...>
>
> This email sent to <greghe...>
-
Greg,
you are right of course, but for one small glitch:
On 10.4.2006, at 16:21, Greg Herlihy wrote:
> NSData * data = [fh availableData];
> len = [data length];
> printf("data len: %d offset: %lld\n", len, [fh offsetInFile]);
> // release the data before reading more
> // or the consequences will be quite unpleasant
> [data release];
availableData happens to return an autoreleased object (as does more or
less anything not obtained through alloc/init, new, or copy/mutableCopy).
Thus, the proper pattern is something like
    for (;;) {
        NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
        NSData *data=[fh availableData];
        if (![data length]) break;
        ...
        [pool release];
    }
Depending on the task it might be better to release the pool not each
time, but every N times, yadda yadda yadda :)
---
Ondra Čada
OCSoftware: <ocs...> http://www.ocs.cz
private <ondra...> http://www.ocs.cz/oc -
On Apr 10, 2006, at 9:33 AM, Ondra Cada wrote:
> availableData happens to return an autorelease object (just like
> more or less anything but for alloc/init, new, and mutable/copy).
> Thus, the proper pattern is (something like)
>
> for (;;) {
> NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
> NSData *data=[fh availableData];
> if (![data length]) break;
> ...
> [pool release];
> }
>
> Depending on the task it might be better to release the pool not
> each time, but every N times, yadda yadda yadda :)
This will result in a leak as well, since when the break is hit,
neither the pool, nor anything in it, will be released. Instead:
    NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
    for (unsigned i=0;;i++) {
        NSData *data=[fh availableData];
        if (![data length]) break;
        ...
        if (i % 10 == 9) { // release every 10 times through
            [pool release];
            pool = [[NSAutoreleasePool alloc] init];
        }
    }
    [pool release];
(You can, of course, release every time through, or something other
than 10)
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> wicked fun!
quadrium | build, mutate, evolve | images, textures, backgrounds, art -
Glenn,
On 10.4.2006, at 16:47, glenn andreas wrote:
>> for (;;) {
>> NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
>> NSData *data=[fh availableData];
>> if (![data length]) break;
>> ...
>> [pool release];
>> }
>>
>> Depending on the task it might be better to release the pool not
>> each time, but every N times, yadda yadda yadda :)
>
> This will result in a leak as well, since when the break is hit,
> neither the pool, nor anything in it, will be released.
Nope it won't. Autorelease pools are smart enough -- the last one
gets automatically taken care of by the parent pool.
---
Ondra Čada
OCSoftware: <ocs...> http://www.ocs.cz
private <ondra...> http://www.ocs.cz/oc -
The autorelease pool holding the NSData should still be released upon every
iteration of the loop. The program will not be able to complete ten
iterations without having released the pool, because there is almost
certainly not enough contiguous virtual address space available to
accommodate ten 268 MB allocations all at the same time.
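The arithmetic behind that estimate, as a small C sketch (the function name is made up; the chunk size is the 2^28 bytes observed earlier in the thread):

```c
#include <stdint.h>

/* Bytes held by `count` undrained chunks of 2^28 bytes each. */
uint64_t live_bytes(unsigned count)
{
    const uint64_t chunk = UINT64_C(1) << 28;   /* 268,435,456 bytes */
    return chunk * count;
}
```

Ten live chunks come to 2,684,354,560 bytes, which is 62.5% of the 4,294,967,296 bytes a 32-bit process can address at all, before counting the app's own code, libraries, stacks and heap fragmentation.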
And while releasing the pool more frequently than every tenth iteration may
avoid complete memory exhaustion, the mere accumulation of such large blocks
of memory can easily cause the hard disk to thrash and the system to come to
a complete crawl as these oversized blocks are continuously being paged to
and from the disk.
For these reasons, I would also reinstate the warning about the consequences
of not releasing the NSData objects in a timely manner. The warning should
also be rewritten, since the one I wrote did not turn out to be very
effective :).
Greg
-
Greg:
I tried your approach -- consecutive calls to - [NSFileHandle
availableData] -- and it indeed worked: I got the file in two
chunks. I also found another workaround:
fileData = [[NSData alloc] initWithContentsOfFile:filename
options:NSUncachedRead | NSMappedRead error:&err];
which initialized the data with the entire 310 MB of the file.
This clearly indicates that either the documentation for -
[NSFileHandle availableData] or its implementation is wrong. To quote
Apple's docs: "Discussion: If the receiver
is a file, returns the data obtained by reading the file from the
file pointer to the end of the file. If the receiver is a
communications channel, reads up to a buffer of data and returns it;
if no data is available, the method blocks. Returns an empty data
object if the end of file is reached. Raises
NSFileHandleOperationException if attempts to determine file-handle
type fail or if attempts to read from the file or channel fail."
--
ivan
-
On Apr 10, 2006, at 9:54 AM, Ondra Cada wrote:
> Glenn,
>
> On 10.4.2006, at 16:47, glenn andreas wrote:
>
>>> for (;;) {
>>> NSAutoreleasePool *pool=[[NSAutoreleasePool alloc] init];
>>> NSData *data=[fh availableData];
>>> if (![data length]) break;
>>> ...
>>> [pool release];
>>> }
>>>
>>> Depending on the task it might be better to release the pool not
>>> each time, but every N times, yadda yadda yadda :)
>>
>> This will result in a leak as well, since when the break is hit,
>> neither the pool, nor anything in it, will be released.
>
> Nope it won't. Autorelease pools are smart enough -- the last one
> gets automatically taken care of by the parent pool.
Well, what do you know:
> If you neglect to send release to an autorelease pool when you are
> finished with it (something not recommended), it is released when
> one of the autorelease pools in which it nests is released.
And here I've just been paranoid for all these years (though it seems
that this paranoia at least was recommended). -
>> If you neglect to send release to an autorelease pool when you are
>> finished with it (something not recommended), it is released when
>> one of the autorelease pools in which it nests is released.
>
> And here I've just been paranoid for all these years (though it seems
> that this paranoia at least was recommended).
Well, OK, but do we know for sure that the run loop creates and releases a
pool for each event? Or could it use some internal, possibly undocumented,
optimization whereby it releases the referenced objects and reuses the pool?
--
Scott Ribe
<scott_ribe...>
http://www.killerbytes.com/
(303) 722-0567 voice



