NSAllocateCollectable() questions

  • Hi all,

    This is a re-post since I didn't receive any response on Sunday.

    I've been writing a library that uses NSAllocateCollectable() quite a
    bit and I have a few questions about proper usage.

    - Copying data
    if I am copying to a malloc'd block, I can use memmove() regardless of
    whether the source is GC'd or not, right?
    if I am copying to a GC block allocated with NSScannedOption, I need
    to use objc_memmove_collectable(), right?
    if I am copying to a GC block allocated with nonscanned memory, I can
    use memmove(), right?

    - Zero'ing data
    There does not seem to be a GC-compatible bzero().  If I loop through
    and zero out each pointer in a scanned block, that would generate a
    write barrier for each pointer which is expensive.  However, if I use
    bzero(), then libauto won't know that I've messed with the block.
    Will it eventually figure it out when it does an exhaustive scan?  Or
    will it never notice that I've zero'd out the block?

    Thanks,
    Brendan
  • On 15 Apr 2008, at 06:42, Brendan Younger wrote:

    > Hi all,
    >
    > This is a re-post since I didn't receive any response on Sunday.
    >
    > I've been writing a library that uses NSAllocateCollectable() quite
    > a bit and I have a few questions about proper usage.

    The problem is that your questions are quite involved and so the only
    people who can give a definitive answer at this point are the people
    inside Apple who deal with the GC.  Some of them do read the list, but
    since the mailing lists are volunteer-based, and since there's a lot
    of traffic and they're busy people, you might not always get their
    attention.

    > - Copying data
    > if I am copying to a malloc'd block, I can use memmove() regardless
    > of whether the source is GC'd or not, right?
    > if I am copying to a GC block allocated with NSScannedOption, I need
    > to use objc_memmove_collectable(), right?
    > if I am copying to a GC block allocated with nonscanned memory, I
    > can use memmove(), right?

    I *think* all three of these are correct.  Obviously in the first and
    last cases you don't want to have any pointers to GCable objects in
    the memory concerned, as they won't be traced by the collector.

    > - Zero'ing data
    > There does not seem to be a GC-compatible bzero().  If I loop
    > through and zero out each pointer in a scanned block, that would
    > generate a write barrier for each pointer which is expensive.
    > However, if I use bzero(), then libauto won't know that I've messed
    > with the block.  Will it eventually figure it out when it does an
    > exhaustive scan?  Or will it never notice that I've zero'd out the
    > block?

    I might be wrong, but I don't *think* libauto will care if you zero
    something out; the problems arise when you write a pointer somewhere
    without a write barrier, expecting the pointed-to object to remain
    live.  There are two problems that doing that can cause:

    1. You write into an object that's part of an older generation, so
    the collector might not know that it needs to use this pointer as a
    root when collecting younger generations.

    2. You write into an object that has already been scanned during
    this GC cycle, so the collector won't know about the new pointer.

    In either case, the GC may as a result throw away a live object,
    resulting in a later crash.  Now I don't have access to the GC source
    code, so there could be other potential problems also, but those are
    the two common ones from the literature.
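
    To make the first problem concrete, here is a minimal toy model in C.
    It is a sketch of the textbook "remembered set" idea only, not
    libauto's implementation, and every name in it (Obj,
    store_with_barrier, store_raw, collect_young) is hypothetical. It
    also assumes, for simplicity, that old objects are always live.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT libauto: each object has a single pointer slot. */
typedef struct Obj {
    struct Obj *slot;  /* the only pointer field */
    bool old;          /* true once promoted to the old generation */
    bool marked;       /* set when a young collection finds it reachable */
    bool remembered;   /* set by the write barrier on old objects */
} Obj;

/* Barriered store: note that an old object now needs re-scanning. */
void store_with_barrier(Obj *dst, Obj *value) {
    dst->slot = value;
    if (dst->old) dst->remembered = true;
}

/* Plain store: what a raw assignment or memmove() amounts to. */
void store_raw(Obj *dst, Obj *value) {
    dst->slot = value;
}

/* Mark a chain of young objects starting at o. */
static void mark_young_from(Obj *o) {
    while (o != NULL && !o->old && !o->marked) {
        o->marked = true;
        o = o->slot;
    }
}

/* Young-generation collection. The "cheat": old objects are scanned
   only if the barrier remembered them. Returns how many young objects
   would be freed. */
size_t collect_young(Obj **heap, size_t n) {
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) heap[i]->marked = false;
    for (size_t i = 0; i < n; i++)
        if (heap[i]->old && heap[i]->remembered)
            mark_young_from(heap[i]->slot);
    for (size_t i = 0; i < n; i++)
        if (!heap[i]->old && !heap[i]->marked) freed++;
    return freed;
}
```

    In this model, storing a young object into an old one with
    store_raw() leaves the young object unmarked, so collect_young()
    counts it as garbage even though it is still referenced. The same
    store through store_with_barrier() keeps it alive.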

    But like I say, you'll need someone from Apple to give you a
    definitive answer on this.

    FWIW there are a few other dangers with NSAllocateCollectable().  See
    e.g.

      <http://lists.apple.com/archives/cocoa-dev/2008/Feb/msg01530.html>

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • Brendan Younger wrote:
    > I've been writing a library that uses NSAllocateCollectable() quite
    > a bit and I have a few questions about proper usage.

    For those of you playing along at home, let me summarize the reasons
    for the garbage collector's special functions.

    Some heap blocks are "GC-managed". The garbage collector will destroy
    a GC-managed block if no pointers to it are found during the GC's
    scan. NSAllocateCollectable() and [NSObject alloc] return GC-managed
    blocks. malloc() does not.

    Some memory areas are "GC-scanned". The garbage collector will look in
    these regions for interesting pointer values.
    NSAllocateCollectable(NSScannedOption) returns a block of scanned
    memory. NSAllocateCollectable(no NSScannedOption) and malloc() do not;
    their contents are invisible to the garbage collector.

    The Rule for Write Barriers:

        If you write a pointer to a GC-managed block into GC-scanned heap
        memory, you must use an appropriate write barrier.

    Why? Speed. Write barriers allow the garbage collector to cheat, like
    not scanning large swaths of memory, or allowing other threads to
    continue running (and modifying memory) while a scan is in progress.
    Without write barriers, the collector might miss a pointer because
    it's cheating.

    Usually, the compiler does this for you. If `variable` is of an
    Objective-C object type or a __strong pointer type, and you write
    `variable = value`, then the compiler will emit an appropriate write
    barrier for you.

    Note that __weak memory, and memory outside the heap like globals and
    thread stacks, are all handled differently.

    > - Copying data
    > if I am copying to a malloc'd block, I can use memmove() regardless
    > of whether the source is GC'd or not, right?
    > if I am copying to a GC block allocated with nonscanned memory, I
    > can use memmove(), right?

    Correct. If you're writing to GC-unscanned heap memory, then you don't
    need a write barrier. Of course, the contents of the malloc block and
    the managed-but-unscanned block are invisible to the garbage
    collector, so you need to be wary of writing GC-managed pointers into
    them.

    > if I am copying to a GC block allocated with NSScannedOption, I need
    > to use objc_memmove_collectable(), right?

    Correct. The only exception: if the memory being copied contains no
    GC-managed pointer values, then you may use memmove(). If you're not
    sure, use objc_memmove_collectable().
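
    The three copying answers can be summarized as one small decision
    function. This is only a sketch of the rule as stated in this thread.
    The enum names and choose_copy_routine() are hypothetical and are not
    part of any Apple API; the function merely reports which real routine
    the rule points to.

```c
#include <stdbool.h>

/* Hypothetical labels for where the destination block came from. */
typedef enum {
    DEST_MALLOC,        /* malloc(): neither GC-managed nor GC-scanned */
    DEST_GC_UNSCANNED,  /* NSAllocateCollectable() without NSScannedOption */
    DEST_GC_SCANNED     /* NSAllocateCollectable() with NSScannedOption */
} DestKind;

/* Which copy routine the rule selects. */
typedef enum {
    USE_MEMMOVE,
    USE_OBJC_MEMMOVE_COLLECTABLE
} CopyRoutine;

/* A barrier is needed only when GC-managed pointers may land in
   GC-scanned memory. Pass true for may_hold_gc_pointers when unsure. */
CopyRoutine choose_copy_routine(DestKind dest, bool may_hold_gc_pointers) {
    if (dest == DEST_GC_SCANNED && may_hold_gc_pointers)
        return USE_OBJC_MEMMOVE_COLLECTABLE;
    return USE_MEMMOVE;
}
```

    Note that choosing memmove() for an unscanned destination says nothing
    about whether it is wise to store GC-managed pointers there; the
    collector will not see them.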

    > - Zero'ing data
    > There does not seem to be a GC-compatible bzero(). If I loop through
    > and zero out each pointer in a scanned block, that would generate a
    > write barrier for each pointer which is expensive. However, if I use
    > bzero(), then libauto won't know that I've messed with the block.
    > Will it eventually figure it out when it does an exhaustive scan? Or
    > will it never notice that I've zero'd out the block?

    You don't need a write barrier when erasing GC-scanned memory. The
    write barrier helps the collector see pointers that it might otherwise
    miss because it's cheating. It does not help the collector "forget" a
    value that it saw previously. (In particular, the old pointer value
    might be gone from the zeroed location, but without re-scanning
    everything there's no way to know that it doesn't still exist
    somewhere else.)
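
    A toy mark-sweep model in C may help show why erasing without a
    barrier is harmless. It is an illustration under simplifying
    assumptions (one root, a full stop-the-world scan, no generations)
    and says nothing about how libauto is actually built. Block,
    full_collect, and erase_slots are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 4

/* Toy "scanned" block: a few pointer slots plus collector state. */
typedef struct Block {
    struct Block *slots[SLOTS];
    bool marked;
    bool freed;
} Block;

/* Mark everything reachable from b. */
static void mark(Block *b) {
    if (b == NULL || b->marked) return;
    b->marked = true;
    for (size_t i = 0; i < SLOTS; i++) mark(b->slots[i]);
}

/* One full mark-sweep cycle from a single root.
   Returns the number of blocks freed in this cycle. */
size_t full_collect(Block *root, Block **heap, size_t n) {
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) heap[i]->marked = false;
    mark(root);
    for (size_t i = 0; i < n; i++) {
        if (!heap[i]->freed && !heap[i]->marked) {
            heap[i]->freed = true;
            freed++;
        }
    }
    return freed;
}

/* Erase all slots the way bzero() would: no barrier, no notification.
   Assumes all-bits-zero is a null pointer, as on common platforms. */
void erase_slots(Block *b) {
    memset(b->slots, 0, sizeof b->slots);
}
```

    In this model the collector is never told about the erase. The next
    full scan simply no longer finds the old pointers, so anything that
    was reachable only through the erased slots is swept.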

    --
    Greg Parker    <gparker...>    Runtime Wrangler
  • On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...> wrote:
    > You don't need a write barrier when erasing GC-scanned memory. The write
    > barrier helps the collector see pointers that it might otherwise miss
    > because it's cheating. It does not help the collector "forget" a value that
    > it saw previously. (In particular, the old pointer value might be gone from
    > the zeroed location, but without re-scanning everything there's no way to
    > know that it doesn't still exist somewhere else.)

    If this is the case then how does the collector know that you have
    cleared the memory? It seems to me that without a write barrier, the
    collector will not see the change and will think that you continue to
    hold the old pointer in this memory. This will result in a
    memory leak of sorts, although it can't really grow without bound in
    most scenarios. But still, the collector needs to know when you nil
    out a variable so that it can know that a particular link no longer
    exists, just like it needs to know when you store a non-nil value so
    that it can know that a new link now exists.

    In other words, not using a write barrier for nil isn't a disaster but
    it can cause garbage to fail to be recognized as such.

    What did I miss?

    Mike
  • On 16 Apr 2008, at 03:29, Michael Ash wrote:

    > On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...> wrote:
    >
    >> You don't need a write barrier when erasing GC-scanned memory. The
    >> write barrier helps the collector see pointers that it might
    >> otherwise miss because it's cheating. It does not help the collector
    >> "forget" a value that it saw previously. (In particular, the old
    >> pointer value might be gone from the zeroed location, but without
    >> re-scanning everything there's no way to know that it doesn't still
    >> exist somewhere else.)
    >
    > If this is the case then how does the collector know that you have
    > cleared the memory? It seems to me that without a write barrier, the
    > collector will not see the change and will think that you continue to
    > hold the old pointer in this memory.

    No.  The garbage collector does not use reference counting, and so
    your statement is not true.  If you overwrite the last pointer to an
    object with a nil, the pointed-to object *may* survive the current
    garbage collection cycle, but it will not survive the next one.

    If you want to understand why, there are a number of books and papers
    on the subject, as well as numerous resources on the Internet that
    explain how garbage collectors are implemented.  One of the best is
    Paul R. Wilson's Uniprocessor Garbage Collection Techniques, which you
    can find here:

      ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps

    Cocoa GC is actually a concurrent collector, but if you read through
    the parts describing mark-sweep, incremental and generational
    collection you will have a good idea what the write barrier does and
    doesn't do and why it is needed.

    Richard Jones' Garbage Collection Page is also quite good, and he has
    written a book on the topic as well:

      http://www.cs.kent.ac.uk/people/staff/rej/gc.html

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Tue, Apr 15, 2008 at 7:29 PM, Michael Ash <michael.ash...> wrote:
    > On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...> wrote:
    >> You don't need a write barrier when erasing GC-scanned memory. The write
    >> barrier helps the collector see pointers that it might otherwise miss
    >> because it's cheating. It does not help the collector "forget" a value that
    >> it saw previously. (In particular, the old pointer value might be gone from
    >> the zeroed location, but without re-scanning everything there's no way to
    >> know that it doesn't still exist somewhere else.)
    >
    > If this is the case then how does the collector know that you have
    > cleared the memory? It seems to me that without a write barrier, the
    > collector will not see the change and will think that you continue to
    > hold the old pointer in this memory.

    The purpose of the write barrier is to tell the collector to keep
    something alive; it has nothing to do with when to collect it.

    > This will result in a memory leak of sorts,

    It will lead to an object being kept around for the current GC cycle.
    However, the next time the collector scans, it will find that there
    are no references left to that particular object, and it will then
    collect it. This is the essential argument between supporters of
    manual memory management and garbage collection:

    Manual Memory Management:
    I can know exactly when my object is destroyed, and I can count on
    that fact (i.e. when I call free(), I know that that object is "gone"
    immediately)
    But it is complicated, and much of the housekeeping is easy to get wrong.

    Garbage Collection:
    The housekeeping is taken care of, I don't have to worry about it.
    However, I cannot know exactly when my object will be destroyed (it
    might be right away, it might be in a couple of seconds).

    > although it can't really grow without bound in
    > most scenarios. But still, the collector needs to know when you nil
    > out a variable so that it can know that a particular link no longer
    > exists, just like it needs to know when you store a non-nil value so
    > that it can know that a new link now exists.
    > In other words, not using a write barrier for nil isn't a disaster but
    > it can cause garbage to fail to be recognized as such.
    >
    > What did I miss?

    --
    Clark S. Cox III
    <clarkcox3...>
  • On Wed, Apr 16, 2008 at 5:28 AM, Alastair Houghton
    <alastair...> wrote:
    > On 16 Apr 2008, at 03:29, Michael Ash wrote:
    >> If this is the case then how does the collector know that you have
    >> cleared the memory? It seems to me that without a write barrier, the
    >> collector will not see the change and will think that you continue
    >> to hold the old pointer in this memory.
    >
    > No.  The garbage collector does not use reference counting, and so your
    > statement is not true.  If you overwrite the last pointer to an object with
    > a nil, the pointed-to object *may* survive the current garbage collection
    > cycle, but it will not survive the next one.

    It has nothing to do with reference counting. I thought that the
    collector was using write barriers as a shortcut to knowing when a
    block of memory was modified so that it could avoid constantly
    re-scanning unchanging blocks. But now I see that it's only a
    mechanism to avoid stopping all threads while scanning. This is kind
    of disappointing because it means that an application which uses GC
    must necessarily have its working set equal to the sum total of all
    scanned memory plus the working set in any unscanned memory, but what
    can you do.

    Thanks for the links.

    Mike
  • On Wed, Apr 16, 2008 at 12:04 PM, Clark Cox <clarkcox3...> wrote:
    > The purpose of the write barrier is to tell the collector to keep
    > something alive; it has nothing to do with when to collect it.

    Right, I get that now.

    [snip]
    > Garbage Collection:
    > The housekeeping is taken care of, I don't have to worry about it.
    > However, I cannot know exactly when my object will be destroyed (it
    > might be right away, it might be in a couple of seconds).

    I don't see the relevance of this. The Cocoa GC explicitly only works
    properly if you play by the rules it sets out. For example, if you
    store a pointer in unscanned memory then you're playing with fire and
    the object may well have been destroyed by the next time you try to
    use it. Skipping write barriers likewise. I had thought that using
    write barriers for clearing memory was part of the required rules, but
    now it appears that it is not. But regardless, Cocoa GC only takes
    care of its housekeeping when you take care of yours.

    Mike
  • On Wed, Apr 16, 2008 at 9:26 AM, Michael Ash <michael.ash...> wrote:
    > On Wed, Apr 16, 2008 at 12:04 PM, Clark Cox <clarkcox3...> wrote:
    >> The purpose of the write barrier is to tell the collector to keep
    >> something alive; it has nothing to do with when to collect it.
    >
    > Right, I get that now.
    >
    > [snip]
    >
    >> Garbage Collection:
    >> The housekeeping is taken care of, I don't have to worry about it.
    >> However, I cannot know exactly when my object will be destroyed (it
    >> might be right away, it might be in a couple of seconds).
    >
    > I don't see the relevance of this. The Cocoa GC explicitly only works
    > properly if you play by the rules it sets out. For example, if you
    > store a pointer in unscanned memory then you're playing with fire and
    > the object may well have been destroyed by the next time you try to
    > use it.

    But, in normal Cocoa patterns, after doing some relatively trivial
    replacements (i.e. use NSAllocateCollectable instead of malloc, etc.),
    it usually takes effort to store pointers in unscanned memory.

    > Skipping write barriers likewise.

    Again, under most circumstances, it usually takes effort to skip the
    write barriers, as they are added automatically by the compiler.

    > I had thought that using
    > write barriers for clearing memory was part of the required rules, but
    > now it appears that it is not. But regardless, Cocoa GC only takes
    > care of its housekeeping when you take care of yours.

    Indeed. GC doesn't allow the developer to abdicate all memory
    management responsibility; however, the responsibilities that one
    still has are much smaller in scope and severity (i.e. "Don't store a
    GC-managed pointer into non-scanned memory"), and the exceptions to
    the rules are few and far between. In the past year, I've had to make
    a special allowance for the garbage collector only once that I can
    recall.

    Generally, the only difference between my non-GC and my GC code is
    that the GC code lacks retain, release and autorelease calls as well
    as dealloc implementations. At this point, if I could do away with
    writing pre-GC Cocoa code, I would do so in a heartbeat.

    --
    Clark S. Cox III
    <clarkcox3...>
  • On 16 Apr 2008, at 17:22, Michael Ash wrote:

    > On Wed, Apr 16, 2008 at 5:28 AM, Alastair Houghton
    > <alastair...> wrote:
    >
    >> On 16 Apr 2008, at 03:29, Michael Ash wrote:
    >>> If this is the case then how does the collector know that you have
    >>> cleared the memory? It seems to me that without a write barrier, the
    >>> collector will not see the change and will think that you continue
    >>> to hold the old pointer in this memory.
    >>
    >> No.  The garbage collector does not use reference counting, and so
    >> your statement is not true.  If you overwrite the last pointer to an
    >> object with a nil, the pointed-to object *may* survive the current
    >> garbage collection cycle, but it will not survive the next one.
    >
    > It has nothing to do with reference counting. I thought that the
    > collector was using write barriers as a shortcut to knowing when a
    > block of memory was modified so that it could avoid constantly
    > re-scanning unchanging blocks.

    Ah, I see.  Yes, you're right that write barriers have been used for
    that kind of thing.

    We've just had a number of posts recently where the assumption has
    been that the write barrier was just a way of doing automatic
    reference counting (or similar) behind the scenes.

    > But now I see that it's only a
    > mechanism to avoid stopping all threads while scanning. This is kind
    > of disappointing because it means that an application which uses GC
    > must necessarily have its working set equal to the sum total of all
    > scanned memory plus the working set in any unscanned memory, but what
    > can you do.

    Not necessarily; there are other mechanisms that could be used to
    implement that optimisation, for instance page table dirty bits.
    Also, it might be possible in some cases to use the mincore() function
    to avoid touching pages that have been swapped out.  Another approach
    might be to use the write barrier as a hint that objects need to be
    re-scanned, and then periodically do a full scan to catch garbage that
    has been created by non-write-barriered zeroing.  Whether
    libauto/Cocoa GC does any of those things I have no idea.

    Also, the fact that the GC is generational, coupled with the fact that
    it seems unlikely that you will create and retain many rarely-used
    objects (if you were doing that, why not just release them?), means
    that, while theoretically a problem, I don't believe the working set
    size will be much of a problem in practice.  It might be in particular
    types of programs, I suppose, but I think they would be atypical
    applications for Cocoa.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net