Memory limits under Garbage Collection, NSMutableData vs. malloc()

  • I have been experimenting with garbage collection under 10.5, and have
    had a few difficulties I don't quite understand.

    I have a method that needs to allocate a very large chunk of memory
    (about 700MB in my test case) to do some processing (I am on a 32 bit
    Intel Core Duo with 2GB RAM).  Under Tiger, this always worked
    reasonably well and the performance was good.  However under Leopard,
    the same code has started failing under GC.  What's strange is that
    allocating the same amount of memory behaves quite differently
    depending on how it's done.  Specifically:

    - if I use malloc() to allocate a traditional C buffer, it is no
    problem.  The allocation is also very fast (no perceptible latency)

    - if I use [NSMutableData dataWithLength:theLength]; to allocate the
    same amount of memory, *without* garbage collection, it is very slow
    (10 seconds) but it is successful.

    - if I use [NSMutableData dataWithLength:theLength]; to allocate the
    same amount of memory, *with* garbage collection (i.e. required), the
    allocation fails with the following message:

    2007-11-13 17:11:06.686 MyApp[10413:10b] Here1
    MyApp(10413,0x37cf60) malloc: *** mmap(size=788488192) failed (error
    code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    2007-11-13 17:11:06.699 MyApp[10413:10b] Here2

    (Here1 and Here2 are from NSLog's that bracket the constructor
    statement)

    I'm not surprised there's some performance difference between malloc()
    and NSMutableData's convenience constructor, but I'm a little
    surprised at how much.  Also I'm trying to figure out why GC leads to
    a failure, when the same code run non-GC does not.  I guess it's quite
    possible that the collector may not know in advance that it's about to
    be hit with a huge memory demand, so perhaps it's let a lot of garbage
    accumulate at the time of my constructor...

    Obviously I can break the memory chunk up into smaller bits, or just
    use malloc().  I'd just like to understand what's going on under the
    hood a bit better to make appropriate choices in future code.
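
    For reference, the "smaller bits" option could look roughly like the
    sketch below.  It is plain C, and the names (chunked_buf,
    chunked_alloc, chunked_at, chunked_free) are hypothetical, not from
    any library.  It assumes the data can be addressed by byte offset and
    that no single item needs to straddle two chunks.

```c
#include <stdlib.h>

/* A large logical buffer stored as several smaller malloc'd pieces. */
typedef struct {
    unsigned char **chunks;   /* table of n_chunks pointers */
    size_t n_chunks;
    size_t chunk_size;        /* bytes per chunk */
} chunked_buf;

/* Free every chunk and the pointer table; safe on a partially built buffer. */
void chunked_free(chunked_buf *b)
{
    if (b->chunks != NULL) {
        for (size_t i = 0; i < b->n_chunks; i++)
            free(b->chunks[i]);
    }
    free(b->chunks);
    b->chunks = NULL;
    b->n_chunks = 0;
}

/* Allocate at least `total` zeroed bytes in pieces of `chunk_size`.
 * Returns 0 on success, -1 on failure (nothing is left allocated). */
int chunked_alloc(chunked_buf *b, size_t total, size_t chunk_size)
{
    b->chunks = NULL;
    b->n_chunks = 0;
    b->chunk_size = chunk_size;
    if (total == 0 || chunk_size == 0)
        return -1;

    /* Round up so that a partial final chunk still gets a full allocation. */
    size_t n = total / chunk_size + (total % chunk_size != 0);
    b->chunks = calloc(n, sizeof *b->chunks);
    if (b->chunks == NULL)
        return -1;
    b->n_chunks = n;

    for (size_t i = 0; i < n; i++) {
        b->chunks[i] = calloc(1, chunk_size);
        if (b->chunks[i] == NULL) {
            chunked_free(b);
            return -1;
        }
    }
    return 0;
}

/* Address of the byte at logical `offset`; the caller keeps it in range. */
unsigned char *chunked_at(const chunked_buf *b, size_t offset)
{
    return &b->chunks[offset / b->chunk_size][offset % b->chunk_size];
}
```

    No single request then needs one contiguous 700MB range of address
    space, at the cost of an extra divide per access and of rounding the
    last chunk up to full size.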

    Thanks,

    Rick
  • On Nov 13, 2007, at 3:30 PM, Rick Hoge wrote:

    > 2007-11-13 17:11:06.686 MyApp[10413:10b] Here1
    > MyApp(10413,0x37cf60) malloc: *** mmap(size=788488192) failed (error
    > code=12)
    > *** error: can't allocate region
    > *** set a breakpoint in malloc_error_break to debug
    > 2007-11-13 17:11:06.699 MyApp[10413:10b] Here2

    This is what happens when memory allocation fails due to running out
    of space. Remember, the maximum amount of RAM an application can
    allocate in a 32-bit space is 4 GB.

    > I'm not surprised there's some performance difference between
    > malloc() and NSMutableData's convenience constructor, but I'm a
    > little surprised at how much.  Also I'm trying to figure out why GC
    > leads to a failure, when the same code run non-GC does not.  I guess
    > it's quite possible that the collector may not know in advance that
    > it's about to be hit with a huge memory demand, so perhaps it's let
    > a lot of garbage accumulate at the time of my constructor...

    When it's being run with GC, then something else is sticking around in
    memory and is not being freed when it was previously expected to be
    freed.

    The experts will probably have something better to say here, but if
    your application needs to allocate huge amounts of memory, then maybe
    it'll be better off only using GC in 64-bit builds...

    Nick Zitzmann
    <http://www.chronosnet.com/>
  • On Nov 13, 2007, at 2:30 PM, Rick Hoge wrote:
    > I'm not surprised there's some performance difference between
    > malloc() and NSMutableData's convenience constructor, but I'm a
    > little surprised at how much.  Also I'm trying to figure out why GC
    > leads to a failure, when the same code run non-GC does not.  I guess
    > it's quite possible that the collector may not know in advance that
    > it's about to be hit with a huge memory demand, so perhaps it's let
    > a lot of garbage accumulate at the time of my constructor...
    >
    > Obviously I can break the memory chunk up into smaller bits, or just
    > use malloc().  I'd just like to understand what's going on under the
    > hood a bit better to make appropriate choices in future code.

    Unless you are going to stuff the 700MB block of memory full of
    pointers to objects controlled & scanned by the collector, just use
    malloc().  In a GC process, you have scanned and unscanned memory.
    Quite literally, the collector scans every location eligible to
    contain a pointer value looking for object references (with some
    significant optimizations under the hood to avoid *actually* scanning
    every address).

    Thus, unless you are going to shove pointers to objects into that
    700MB, you are far far better off using malloc(), which will leave the
    memory as unscanned.

    b.bum
  • On 11/13/07, Nick Zitzmann <nick...> wrote:
    >
    > On Nov 13, 2007, at 3:30 PM, Rick Hoge wrote:
    >
    >> 2007-11-13 17:11:06.686 MyApp[10413:10b] Here1
    >> MyApp(10413,0x37cf60) malloc: *** mmap(size=788488192) failed (error
    >> code=12)
    >> *** error: can't allocate region
    >> *** set a breakpoint in malloc_error_break to debug
    >> 2007-11-13 17:11:06.699 MyApp[10413:10b] Here2
    >
    > This is what happens when memory allocation fails due to running out
    > of space. Remember, the maximum amount of RAM an application can
    > allocate in a 32-bit space is 4 GB.

    In this case he likely still has that much available in his process's
    address space, just not as a contiguous block. In situations like
    this it may be best to make this type of allocation early in the
    application's life cycle, to avoid the virtual address space
    fragmentation that can cause it to fail later on, after the
    application has been running for a while.

    Note that it shouldn't be wasteful to do this, since the allocation
    should only reserve address space and not physical pages (those get
    zero-filled on page fault, i.e. when first accessed).

    -Shawn
  • Thanks for the reply -

    On 13-Nov-07, at 7:10 PM, Nick Zitzmann wrote:

    >
    > On Nov 13, 2007, at 3:30 PM, Rick Hoge wrote:
    >
    >> 2007-11-13 17:11:06.686 MyApp[10413:10b] Here1
    >> MyApp(10413,0x37cf60) malloc: *** mmap(size=788488192) failed
    >> (error code=12)
    >> *** error: can't allocate region
    >> *** set a breakpoint in malloc_error_break to debug
    >> 2007-11-13 17:11:06.699 MyApp[10413:10b] Here2
    >
    > This is what happens when memory allocation fails due to running out
    > of space. Remember, the maximum amount of RAM an application can
    > allocate in a 32-bit space is 4 GB.

    Yes - the weird thing is that a malloc() of the exact same amount of
    memory, at exactly the same place, has no problem.

    >> I'm not surprised there's some performance difference between
    >> malloc() and NSMutableData's convenience constructor, but I'm a
    >> little surprised at how much.  Also I'm trying to figure out why GC
    >> leads to a failure, when the same code run non-GC does not.  I
    >> guess it's quite possible that the collector may not know in
    >> advance that it's about to be hit with a huge memory demand, so
    >> perhaps it's let a lot of garbage accumulate at the time of my
    >> constructor...
    >
    > When it's being run with GC, then something else is sticking around
    > in memory and is not being freed when it was previously expected to
    > be freed.

    This sounds like the problem -

    > The experts will probably have something better to say here, but if
    > your application needs to allocate huge amounts of memory, then
    > maybe it'll be better off only using GC in 64-bit builds...

    There is a certain irony in that many of our users don't put more than
    2GB of RAM on their systems (due to cost I expect) whether it's a 32
    bit or 64 bit CPU.  I ran the above tests on a Mac Pro (64 bit Xeon)
    with 16GB RAM and a 64 bit build and got the same failure.

    >
    >
    > Nick Zitzmann
    > <http://www.chronosnet.com/>
    >
  • On 13-Nov-07, at 7:12 PM, Bill Bumgarner wrote:

    > On Nov 13, 2007, at 2:30 PM, Rick Hoge wrote:
    >> I'm not surprised there's some performance difference between
    >> malloc() and NSMutableData's convenience constructor, but I'm a
    >> little surprised at how much.  Also I'm trying to figure out why GC
    >> leads to a failure, when the same code run non-GC does not.  I
    >> guess it's quite possible that the collector may not know in
    >> advance that it's about to be hit with a huge memory demand, so
    >> perhaps it's let a lot of garbage accumulate at the time of my
    >> constructor...
    >>
    >> Obviously I can break the memory chunk up into smaller bits, or
    >> just use malloc().  I'd just like to understand what's going on
    >> under the hood a bit better to make appropriate choices in future
    >> code.
    >
    > Unless you are going to stuff the 700MB block of memory full of
    > pointers to objects controlled & scanned by the collector, just use
    > malloc().  In a GC process, you have scanned and unscanned memory.
    > Quite literally, the collector scans every location eligible to
    > contain a pointer value looking for object references (with some
    > significant optimizations under the hood to avoid *actually*
    > scanning every address).

    Is it always necessary to use scanned memory if I want to use GC?  If
    I know that the memory is just going to hold data values (not pointers
    - e.g. pixel intensity values or something), can I use the following?

    // The zero means not to scan
    NSAllocateCollectible(lotsOfPixels*3*sizeof(float), 0);

    I get the impression that the scan will allow the collector to be
    aware of object references contained in the allocated memory - it's
    not needed to collect the memory itself once it's no longer "reachable".

    > Thus, unless you are going to shove pointers to objects into that
    > 700MB, you are far far better off using malloc(), which will leave
    > the memory as unscanned.

    I don't mind using malloc/free pairs, but the prospect of
    NSAllocateCollectible is interesting.  My code has a number of
    termination pathways (if some condition, return with a not-successful
    code) and currently it's kind of messy as you need to provide free()
    statements for any preceding malloc's (as good programming practice
    has always dictated).  If I could omit these free's the code would be
    a lot cleaner (although the idea takes some getting used to).
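
    For the malloc/free route, one common way to keep several early
    returns tidy is a single cleanup label, sketched below.  The function
    process_pixels and its two buffers are hypothetical stand-ins for the
    real processing code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical processing routine: returns 0 on success, -1 on failure. */
int process_pixels(size_t n_pixels)
{
    int status = -1;          /* assume failure until the work completes */
    float *rgb = NULL;
    float *scratch = NULL;

    /* Reject empty input and sizes whose byte count would overflow. */
    if (n_pixels == 0 || n_pixels > SIZE_MAX / (3 * sizeof(float)))
        goto cleanup;

    rgb = malloc(n_pixels * 3 * sizeof *rgb);
    if (rgb == NULL)
        goto cleanup;

    scratch = malloc(n_pixels * sizeof *scratch);
    if (scratch == NULL)
        goto cleanup;

    /* Placeholder for the real work. */
    for (size_t i = 0; i < n_pixels; i++)
        scratch[i] = 0.0f;

    status = 0;

cleanup:
    /* free(NULL) is a no-op, so every exit path can share these lines. */
    free(scratch);
    free(rgb);
    return status;
}
```

    Each failure path jumps to the same label, so adding another buffer
    means adding one free() rather than revisiting every return.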

    Rick

    > b.bum
    >
    >
    >