Locating managed objects within ObjectAlloc (was Re: Garbage collection, core data, and tight loops)

  • On Nov 2, 2007, at 3:01 PM, John R. Timmer wrote:

    > Using garbage collection resulted in a significant memory gain, but
    > nowhere near bad enough to crash the program.  Oddly, the memory use
    > did not subside after the loop had finished.

    This sounds like caching behavior that I think I've observed.  My app
    creates millions of managed objects during an import process, and I've
    seen large memory usage that doesn't go down when the objects are
    dealloced.  Looking at backtraces in ObjectAlloc, I see a lot of
    mallocs in caching functions as various objects are dealloc'ed.  If I
    close the NSPersistentDocument, I get some memory reclaimed, but not
    what I expect.  But, if I repeatedly run the import (without quitting),
    the peak memory doesn't go up, so the memory is apparently reused.

    This leads me to a question about using ObjectAlloc (in Instruments on
    Leopard) effectively, especially with respect to Core Data.  When
    looking at allocations, there are a large number of GeneralBlocks of
    various sizes, but no NSManagedObjects.  I have some custom
    subclasses, too, and those aren't identified either in the Categories
    column.  I want to observe the usage patterns of the various entities
    in my model, but I'm not sure how to do that.  Is there an easy way to
    filter managed objects?

    Thanks,

    ----
    Aaron Burghardt
    <aburgh...>
  • > This sounds like caching behavior that I think I've observed.  My app
    > creates millions of managed objects during an import process, and I've
    > seen large memory usage that doesn't go down when the objects are
    > dealloced.

    You mean RSIZE as reported by 'top', not malloc's free heap space as
    reported by 'heap'.

    > Looking at backtraces in ObjectAlloc, I see a lot of
    > mallocs in caching functions as various objects are dealloc'ed.  If I
    > close the NSPersistentDocument, I get some memory reclaimed, but not
    > what I expect.  But, if I repeatedly run the import (without quitting),
    > the peak memory doesn't go up, so the memory is apparently reused.

    Correct.  The memory is freed (heap space), but not returned to the
    kernel (VM mapping).  This is the behavior of malloc on OSX.  You will
    see a high watermark effect.

    > This leads me to a question about using ObjectAlloc (in Instruments on
    > Leopard) effectively, especially with respect to Core Data.  When
    > looking at allocations, there are a large number of GeneralBlocks of
    > various sizes, but no NSManagedObjects.  I have some custom
    > subclasses, too, and those aren't identified either in the Categories
    > column.  I want to observe the usage patterns of the various entities
    > in my model, but I'm not sure how to do that.  Is there an easy way to
    > filter managed objects?

    No, allocation events for managed objects are not currently tracked by
    ObjectAlloc except as general blocks.  The 'heap' and 'leaks' tools
    both perceive managed objects.  You'll probably find the 'heap' tool
    useful for experimenting with these questions.  Also, 'malloc_history'
    is much improved on Leopard and often overlooked.

    - Ben
  • On Nov 3, 2007, at 7:08 PM, Ben Trumbull wrote:
    >>
    > You mean RSIZE as reported by 'top', not malloc's free heap space as
    > reported by 'heap'.

    Actually, I was observing VSIZE (which I thought should track with
    RSIZE as long as the system isn't swapping my app) and also the graph
    in ObjectAlloc, which appeared to correlate with VSIZE.  In other
    words, even the graph of all allocations in ObjectAlloc shows only a
    small reduction when the document is closed.  'heap' is the tool I
    needed, though.  Thanks!

    >> the peak memory doesn't go up, so the memory is apparently reused.
    >
    > Correct.  The memory is freed (heap space), but not returned to the
    > kernel (VM mapping).  This is the behavior of malloc on OSX.  You
    > will see a high watermark effect.
    >
    I've read that, but forgot to consider it in this case.  However, I'm
    confused by the test case below.  When the memory is malloc'ed, VSIZE
    goes up.  When the memory is used, RSIZE goes up.  When the memory is
    freed, both VSIZE and RSIZE go back down.  Does this contradict what
    you are saying?

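    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>    /* sleep() */

    int main(void)
    {
        sleep(5);
        char *ptr;
        while (1) {
            /* Allocate one large 10 MB block: VSIZE goes up. */
            ptr = malloc(10 * 1024 * 1024);
            if (ptr == NULL) {
                perror("malloc");
                return 1;
            }
            sleep(10);

            /* Touch every byte so the pages become resident: RSIZE goes up. */
            int i;
            for (i = 0; i < (10 * 1024 * 1024); i++)
                *(ptr + i) = 'x';
            sleep(10);

            /* Free the block: both VSIZE and RSIZE go back down. */
            free(ptr);
            sleep(10);
        }
    }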

    >
    > No, allocation events for managed objects are not currently tracked
    > by ObjectAlloc except as general blocks.  The 'heap' and 'leaks'
    > tools both perceive managed objects.  You'll probably find the
    > 'heap' tool useful for experimenting with these questions.  Also,
    > 'malloc_history' is much improved on Leopard and often overlooked.

    I was using 'leaks' and had a few leaks, but nothing that explained
    the large amount of memory I was using, so I suspected I had a retain
    cycle with some managed objects.  Ultimately, I found I forgot to add
    an autorelease pool to one of my loops.

    You are right, 'heap' would have been very useful for this.  I've used
    malloc_history for an over-release/EXC_BAD_ACCESS bug, and it was great for
    that.

    Thanks, Ben.

    Aaron
  • On Nov 4, 2007, at 3:27 AM, <ajb.lists...> wrote:
    >> Correct.  The memory is freed (heap space), but not returned to the
    >> kernel (VM mapping).  This is the behavior of malloc on OSX.  You
    >> will see a high watermark effect.
    >>
    > I've read that, but forgot to consider it in this case.  However,
    > I'm confused by the test case below.  When the memory is malloc'ed,
    > VSIZE goes up.  When the memory is used, RSIZE goes up.  When the
    > memory is freed, both VSIZE and RSIZE go back down.  Does this
    > contradict what you are saying?

    No.  The relationship is more complicated.  For large blocks of
    memory, malloc will return the allocation back to the kernel.  For
    small blocks, it will not, but instead coalesce the free space for
    reuse later.  The objects used in Core Data are almost exclusively
    small.  Amit Singh's book describes the malloc algorithms in
    considerable detail.

    <http://www.amazon.com/Mac-OS-Internals-Systems-Approach/dp/0321278542/ref=sr_1_1/103-1158819-7883057?ie=UTF8&s=books&qid=1194215432&sr=1-1>

    > ptr = malloc(10 * 1024 * 1024);

    Try again with something closer to the size of your managed objects.
    Like 96-256 bytes.

    Generally, I find RSIZE to reflect the high watermark of the process,
    and use 'heap' to examine the currently free space within it.  Your
    mileage may vary.  There's some pretty vociferous disagreement over
    the "ideal" way to measure memory utilization (available vs. vm
    deallocated vs. internal fragmentation, etc.).

    - Ben
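
    The reuse of freed small blocks described above can also be observed
    from inside a process rather than inferred from top.  What follows is
    a minimal sketch in C, not something from the thread:
    count_reused_addresses and its two helpers are hypothetical names, and
    the result depends entirely on the allocator and OS version, so it is
    an experiment to run rather than a guaranteed behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Free the first n blocks in the array. */
static void free_blocks(void **blocks, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        free(blocks[i]);
}

/* Allocate n blocks of block_size bytes; returns 0 on success, -1 on failure. */
static int allocate_blocks(void **blocks, size_t n, size_t block_size)
{
    size_t i;
    for (i = 0; i < n; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL) {
            free_blocks(blocks, i);   /* undo the partial allocation */
            return -1;
        }
    }
    return 0;
}

/*
 * Allocate `count` blocks, free them all, allocate `count` blocks again, and
 * return how many blocks in the second round landed on an address that was
 * used in the first round. Returns -1 if any allocation fails.
 */
long count_reused_addresses(size_t count, size_t block_size)
{
    void **blocks = malloc(count * sizeof *blocks);
    uintptr_t *old_addrs = malloc(count * sizeof *old_addrs);
    long reused = -1;
    size_t i, j;

    if (blocks == NULL || old_addrs == NULL)
        goto done;

    /* Round 1: allocate, record each address as an integer, then free. */
    if (allocate_blocks(blocks, count, block_size) != 0)
        goto done;
    for (i = 0; i < count; i++)
        old_addrs[i] = (uintptr_t)blocks[i];
    free_blocks(blocks, count);

    /* Round 2: allocate again and compare against the recorded addresses. */
    if (allocate_blocks(blocks, count, block_size) != 0)
        goto done;
    reused = 0;
    for (i = 0; i < count; i++) {
        for (j = 0; j < count; j++) {
            if ((uintptr_t)blocks[i] == old_addrs[j]) {
                reused++;
                break;
            }
        }
    }
    free_blocks(blocks, count);

done:
    free(old_addrs);
    free(blocks);
    return reused;
}
```

    Calling count_reused_addresses(1000, 128) uses a block size within
    the 96-256 byte range suggested above.  A result close to 1000 would
    be consistent with the freed space being kept in the heap and handed
    out again, though the number will vary between allocators.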
  • On Nov 4, 2007, at 5:36 PM, Ben Trumbull wrote:
    >
    > On Nov 4, 2007, at 3:27 AM, <ajb.lists...> wrote:
    >>

    > No.  The relationship is more complicated.  For large blocks of
    > memory, malloc will return the allocation back to the kernel.  For
    > small blocks, it will not, but instead coalesce the free space for
    > reuse later.  The objects used in Core Data are almost exclusively
    > small.  Amit

    >> ptr = malloc(10 * 1024 * 1024);
    >
    > Try again with something closer to the size of your managed
    > objects.  Like 96-256 bytes.
    >
    > Generally, I find RSIZE to reflect the high watermark of the
    > process, and use 'heap' to examine the currently free space within
    > it.  Your mileage may vary.  There's some pretty vociferous
    > disagreement over the "ideal" way to measure memory utilization
    > (available vs. vm deallocated vs. internal fragmentation, etc.).
    >
    Yup, even 10 KB mallocs were small enough.  For anyone that may be
    following this thread, the following test looks different in Activity
    Monitor (and presumably ObjectAlloc).  Once the VSIZE and RSIZE go
    up, they stay at that level until the test quits:

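    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>    /* sleep() */

    int main(void)
    {
        sleep(5);
        char *ptr[1000];
        int i;
        while (1) {
            /* Allocate 1000 small 10 KB blocks (about 10 MB in total). */
            for (i = 0; i < 1000; i++) {
                ptr[i] = malloc(10 * 1024);
                if (ptr[i] == NULL) {
                    perror("malloc");
                    return 1;
                }
            }
            sleep(10);

            /* Touch every byte so the pages become resident. */
            int j;
            for (i = 0; i < 1000; i++) {
                for (j = 0; j < (10 * 1024); j++)
                    *(ptr[i] + j) = 'x';
            }
            sleep(10);

            /* Free every block; VSIZE and RSIZE stay at their peak. */
            for (i = 0; i < 1000; i++)
                free(ptr[i]);
            sleep(10);
        }
    }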

    Thanks again,

    Aaron
  • On Nov 4, 2007, at 7:01 PM, Aaron Burghardt wrote:

    > Yup, even 10 KB mallocs were small enough.  For anyone that may be
    > following this thread, the following test looks different in
    > Activity Monitor (and presumably ObjectAlloc).  Once the VSIZE
    > and RSIZE go up, they stay at that level until the test quits:

    And do not represent "the memory used by the test," but rather "the
    address space used by the test."  These are very different things.
    Address space can be in use with no physical memory backing it, and as
    Ben said, can represent a high-water mark.

    In top(1) terms, the best measure of actual memory use for a process
    is RPRVT, not RSIZE.  As seen on the top(1) man page:

      RPRVT  -  Resident private memory size.
      RSIZE  -  Total resident memory size, including shared pages.
      VSIZE  -  Total address space allocated, including shared pages.

    "Shared" pages are memory pages that are mapped into multiple address
    spaces, typically read-only.  These include the system frameworks and
    window back-buffers handed out by the window server.

    If I look at, say, a fresh launch of TextEdit on my system right now,
    it has the following:

        PID   COMMAND    RPRVT    RSIZE    VSIZE
       6596   TextEdit   1680K    6324K     358M

    Thus even though it has 358M of its address space assigned, and an
    RSIZE of 6324K, only 1680K of that is really "unique" to TextEdit.
    And doing a "heap TextEdit" corroborates this; the number of bytes
    allocated is very close to the RPRVT value.  (I think the RPRVT value
    is calculated in terms of allocated pages, rather than allocated
    bytes, whereas I think heap looks at the malloc statistics which are
    in terms of bytes.)

      -- Chris
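
    The guess above, that RPRVT is counted in whole pages while heap
    counts bytes, comes down to simple rounding.  A minimal sketch in C,
    assuming plain round-up accounting: round_up_to_pages and
    system_page_size are hypothetical helper names, and this is only an
    illustration of the arithmetic, not how top(1) actually computes
    RPRVT.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* sysconf() */

/* Round a byte count up to a whole number of pages of page_size bytes. */
size_t round_up_to_pages(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return ((bytes + page_size - 1) / page_size) * page_size;
}

/* Page size of the running system, or 4096 if it cannot be queried. */
size_t system_page_size(void)
{
    long size = sysconf(_SC_PAGESIZE);
    return size > 0 ? (size_t)size : 4096;
}
```

    With 4096-byte pages, 1,700,000 bytes of live allocations would
    occupy 416 pages, which is 1,703,936 bytes (1664K).  A page-based
    figure can therefore sit slightly above a byte-based one even when
    both describe the same memory.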