NSAllocateCollectable() questions
-
Hi all,
This is a re-post since I didn't receive any response on Sunday.
I've been writing a library that uses NSAllocateCollectable() quite a
bit and I have a few questions about proper usage.
- Copying data
if I am copying to a malloc'd block, I can use memmove() regardless of
whether the source is GC'd or not, right?
if I am copying to a GC block allocated with NSScannedOption, I need
to use objc_memmove_collectable(), right?
if I am copying to a GC block allocated with nonscanned memory, I can
use memmove(), right?
- Zero'ing data
There does not seem to be a GC-compatible bzero(). If I loop through
and zero out each pointer in a scanned block, that would generate a
write barrier for each pointer which is expensive. However, if I use
bzero(), then libauto won't know that I've messed with the block.
Will it eventually figure it out when it does an exhaustive scan? Or
will it never notice that I've zero'd out the block?
Thanks,
Brendan -
On 15 Apr 2008, at 06:42, Brendan Younger wrote:
> Hi all,
>
> This is a re-post since I didn't receive any response on Sunday.
>
> I've been writing a library that uses NSAllocateCollectable() quite
> a bit and I have a few questions about proper usage.
The problem is that your questions are quite involved and so the only
people who can give a definitive answer at this point are the people
inside Apple who deal with the GC. Some of them do read the list, but
since the mailing lists are volunteer-based, and since there's a lot
of traffic and they're busy people, you might not always get their
attention.
> - Copying data
> if I am copying to a malloc'd block, I can use memmove() regardless
> of whether the source is GC'd or not, right?
> if I am copying to a GC block allocated with NSScannedOption, I need
> to use objc_memmove_collectable(), right?
> if I am copying to a GC block allocated with nonscanned memory, I
> can use memmove(), right?
I *think* all three of these are correct. Obviously in the first and
last cases you don't want to have any pointers to GCable objects in
the memory concerned, as they won't be traced by the collector.
> - Zero'ing data
> There does not seem to be a GC-compatible bzero(). If I loop
> through and zero out each pointer in a scanned block, that would
> generate a write barrier for each pointer which is expensive.
> However, if I use bzero(), then libauto won't know that I've messed
> with the block. Will it eventually figure it out when it does an
> exhaustive scan? Or will it never notice that I've zero'd out the
> block?
I might be wrong, but I don't *think* libauto will care if you zero
something out; the problems arise when you write a pointer somewhere
without a write barrier, expecting the pointed-to object to remain
live. Doing that can go wrong in two ways:
1. You write into an object that's part of an older generation, in
which case the collector might not know that it needs to use this
pointer as a root when collecting younger generations.
2. You write into an object that has already been scanned during
this GC cycle, in which case the collector won't know about the new
pointer.
In either case, the GC may as a result throw away a live object,
resulting in a later crash. Now I don't have access to the GC source
code, so there could be other potential problems also, but those are
the two common ones from the literature.
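The second problem above can be shown with a toy model in plain C. This is only a sketch of the general idea from the GC literature, not of libauto; the names `obj`, `scan`, `store_plain`, `store_barrier` and `young_survives` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: one pointer slot and a per-cycle mark bit. */
typedef struct obj {
    struct obj *field;
    bool marked;
} obj;

/* Collector step: mark o and everything reachable through its slot. */
static void scan(obj *o)
{
    while (o != NULL && !o->marked) {
        o->marked = true;
        o = o->field;
    }
}

/* Plain store: the collector is not told about the new pointer. */
static void store_plain(obj *holder, obj *value)
{
    holder->field = value;
}

/* Barriered store: if the holder was already scanned in this cycle,
 * mark the newly stored target so that it is not missed. */
static void store_barrier(obj *holder, obj *value)
{
    holder->field = value;
    if (holder->marked) scan(value);
}

/* Scenario: the collector finishes with `root`, then the program moves
 * the only pointer to `young` out of `temp` and into `root` before the
 * collector reaches `temp`. Returns whether `young` ends up marked. */
static bool young_survives(bool use_barrier)
{
    obj young = { NULL, false };
    obj root  = { NULL, false };
    obj temp  = { &young, false };

    scan(&root);                      /* root is done for this cycle */
    if (use_barrier) store_barrier(&root, &young);
    else             store_plain(&root, &young);
    temp.field = NULL;                /* old reference disappears */
    scan(&temp);                      /* collector carries on */
    return young.marked;
}
```

In this model `young_survives(false)` reports that the live object was never marked, which is the "thrown away while still live" case; with the barriered store it is marked.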
But like I say, you'll need someone from Apple to give you a
definitive answer on this.
FWIW there are a few other dangers with NSAllocateCollectable(). See
e.g.
<http://lists.apple.com/archives/cocoa-dev/2008/Feb/msg01530.html>
Kind regards,
Alastair.
--
http://alastairs-place.net -
Brendan Younger wrote:
> I've been writing a library that uses NSAllocateCollectable() quite
> a bit and I have a few questions about proper usage.
For those of you playing along at home, let me summarize the reasons
for the garbage collector's special functions.
Some heap blocks are "GC-managed". The garbage collector will destroy
a GC-managed block if no pointers to it are found during the GC's
scan. NSAllocateCollectable() and [NSObject alloc] return GC-managed
blocks. malloc() does not.
Some memory areas are "GC-scanned". The garbage collector will look in
these regions for interesting pointer values.
NSAllocateCollectable(NSScannedOption) returns a block of scanned
memory. NSAllocateCollectable(no NSScannedOption) and malloc() do not;
their contents are invisible to the garbage collector.
The Rule for Write Barriers:
If you write a pointer to a GC-managed block into GC-scanned heap
memory, you must use an appropriate write barrier.
Why? Speed. Write barriers allow the garbage collector to cheat, like
not scanning large swaths of memory, or allowing other threads to
continue running (and modifying memory) while a scan is in progress.
Without write barriers, the collector might miss a pointer because
it's cheating.
Usually, the compiler does this for you. If `variable` is of an
Objective-C object type or a __strong pointer type, and you write
`variable = value`, then the compiler will emit an appropriate write
barrier for you.
Note that __weak memory, and memory outside the heap like globals and
thread stacks, are all handled differently.
> - Copying data
> if I am copying to a malloc'd block, I can use memmove() regardless
> of whether the source is GC'd or not, right?
> if I am copying to a GC block allocated with nonscanned memory, I
> can use memmove(), right?
Correct. If you're writing to GC-unscanned heap memory, then you don't
need a write barrier. Of course, the contents of the malloc block and
managed-but-unscanned block are invisible to the garbage collector, so
you need to be wary of writing GC-managed pointers into it.
> if I am copying to a GC block allocated with NSScannedOption, I need
> to use objc_memmove_collectable(), right?
Correct. The only exception: if the memory being copied contains no
GC-managed pointer values, you may use memmove(). If you're not
sure, use objc_memmove_collectable().
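The copying rules in this message boil down to one decision. A minimal sketch in plain C, assuming hypothetical names (`dest_kind`, `needs_collectable_copy`) that are not part of any Apple API; it only encodes the rule as stated here and does not call the runtime:

```c
#include <stdbool.h>

/* Hypothetical labels for where the destination block came from. */
typedef enum {
    DEST_MALLOC,        /* malloc(): not GC-managed, not scanned */
    DEST_GC_UNSCANNED,  /* NSAllocateCollectable() without NSScannedOption */
    DEST_GC_SCANNED     /* NSAllocateCollectable() with NSScannedOption */
} dest_kind;

/* Returns true when the copy should go through
 * objc_memmove_collectable() rather than plain memmove().
 * Pass src_may_hold_gc_pointers = true whenever you are not sure
 * what the source bytes contain. */
static bool needs_collectable_copy(dest_kind dest,
                                   bool src_may_hold_gc_pointers)
{
    return dest == DEST_GC_SCANNED && src_may_hold_gc_pointers;
}
```

Note that a `false` result for the malloc and unscanned cases only means no barrier is required; pointers copied there are still invisible to the collector.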
> - Zero'ing data
> There does not seem to be a GC-compatible bzero(). If I loop through
> and zero out each pointer in a scanned block, that would generate a
> write barrier for each pointer which is expensive. However, if I use
> bzero(), then libauto won't know that I've messed with the block.
> Will it eventually figure it out when it does an exhaustive scan? Or
> will it never notice that I've zero'd out the block?
You don't need a write barrier when erasing GC-scanned memory. The
write barrier helps the collector see pointers that it might otherwise
miss because it's cheating. It does not help the collector "forget" a
value that it saw previously. (In particular, the old pointer value
might be gone from the zeroed location, but without re-scanning
everything there's no way to know that it doesn't still exist
somewhere else.)
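To see why erasing needs no barrier, here is a toy full mark phase in plain C. It is a sketch of a generic tracing collector, not of libauto, and every name in it (`node`, `mark_from`, `survives_full_cycle`, `zeroed_slot_becomes_garbage`) is hypothetical. It also assumes, as bzero() does, that an all-zero bit pattern is a null pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SLOT_COUNT 4

/* Toy stand-in for a scanned block: pointer slots plus a mark bit. */
typedef struct node {
    struct node *slots[SLOT_COUNT];
    bool marked;
} node;

/* Recursively mark everything reachable from n. */
static void mark_from(node *n)
{
    if (n == NULL || n->marked) return;
    n->marked = true;
    for (size_t i = 0; i < SLOT_COUNT; i++) mark_from(n->slots[i]);
}

/* One full cycle: clear all marks, mark from the root, and report
 * whether target was reached. */
static bool survives_full_cycle(node *root, node **all, size_t count,
                                node *target)
{
    for (size_t i = 0; i < count; i++) all[i]->marked = false;
    mark_from(root);
    return target->marked;
}

/* Stores a pointer, runs a cycle, zeroes the slots with memset()
 * (no barrier, no notification), then runs another cycle. */
static bool zeroed_slot_becomes_garbage(void)
{
    node child = { {NULL}, false };
    node root  = { {NULL}, false };
    node *all[] = { &root, &child };

    root.slots[2] = &child;
    bool before = survives_full_cycle(&root, all, 2, &child);

    memset(root.slots, 0, sizeof root.slots);
    bool after = survives_full_cycle(&root, all, 2, &child);

    return before && !after;
}
```

The collector in this model never learns that the slot was cleared; the next mark phase simply fails to reach `child`, so it becomes garbage.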
--
Greg Parker <gparker...> Runtime Wrangler -
On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...> wrote:
> You don't need a write barrier when erasing GC-scanned memory. The write
> barrier helps the collector see pointers that it might otherwise miss
> because it's cheating. It does not help the collector "forget" a value that
> it saw previously. (In particular, the old pointer value might be gone from
> the zeroed location, but without re-scanning everything there's no way to
> know that it doesn't still exist somewhere else.)
If this is the case, then how does the collector know that you have
cleared the memory? It seems to me that without a write barrier, the
collector will not see the change and will think that you
continue to hold the old pointer in this memory. This will result in a
memory leak of sorts, although it can't really grow without bound in
most scenarios. But still, the collector needs to know when you nil
out a variable so that it can know that a particular link no longer
exists, just like it needs to know when you store a non-nil value so
that it can know that a new link now exists.
In other words, not using a write barrier for nil isn't a disaster but
it can cause garbage to fail to be recognized as such.
What did I miss?
Mike -
On 16 Apr 2008, at 03:29, Michael Ash wrote:
> On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...>
> wrote:
>
>> You don't need a write barrier when erasing GC-scanned memory. The
>> write barrier helps the collector see pointers that it might
>> otherwise miss because it's cheating. It does not help the collector
>> "forget" a value that it saw previously. (In particular, the old
>> pointer value might be gone from the zeroed location, but without
>> re-scanning everything there's no way to know that it doesn't still
>> exist somewhere else.)
>
> If this is the case, then how does the collector know that you have
> cleared the memory? It seems to me that without a write barrier, the
> collector will not see the change and will think that you
> continue to hold the old pointer in this memory.
No. The garbage collector does not use reference counting, and so
your statement is not true. If you overwrite the last pointer to an
object with a nil, the pointed-to object *may* survive the current
garbage collection cycle, but it will not survive the next one.
If you want to understand why, there are a number of books and papers
on the subject, as well as numerous resources on the Internet that
explain how garbage collectors are implemented. One of the best is
Paul R. Wilson's Uniprocessor Garbage Collection Techniques, which you
can find here:
ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps
Cocoa GC is actually a concurrent collector, but if you read through
the parts describing mark-sweep, incremental and generational
collection you will have a good idea what the write barrier does and
doesn't do and why it is needed.
Richard Jones' Garbage Collection Page is also quite good, and he has
written a book on the topic as well:
http://www.cs.kent.ac.uk/people/staff/rej/gc.html
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Tue, Apr 15, 2008 at 7:29 PM, Michael Ash <michael.ash...> wrote:
> On Tue, Apr 15, 2008 at 4:53 PM, Greg Parker <gparker...> wrote:
>> You don't need a write barrier when erasing GC-scanned memory. The write
>> barrier helps the collector see pointers that it might otherwise miss
>> because it's cheating. It does not help the collector "forget" a value that
>> it saw previously. (In particular, the old pointer value might be gone from
>> the zeroed location, but without re-scanning everything there's no way to
>> know that it doesn't still exist somewhere else.)
>
> If this is the case, then how does the collector know that you have
> cleared the memory? It seems to me that without a write barrier, the
> collector will not see the change and will think that you
> continue to hold the old pointer in this memory.
The purpose of the write barrier is to tell the collector to keep
something alive; it has nothing to do with when to collect it.
> This will result in a memory leak of sorts,
It will lead to an object being kept around for the current GC cycle.
However, the next time the collector scans, it will find that there
are no references left to that particular object, and it will then
collect it. This is the essential argument between supporters of
manual memory management and garbage collection:
Manual Memory Management:
I can know exactly when my object is destroyed, and I can count on
that fact (i.e. when I call free(), I know that that object is "gone"
immediately)
But it is complicated, and much of the housekeeping is easy to get wrong.
Garbage Collection:
The housekeeping is taken care of, I don't have to worry about it.
However, I cannot know exactly when my object will be destroyed (it
might be right away, it might be in a couple of seconds).
> although it can't really grow without bound in
> most scenarios. But still, the collector needs to know when you nil
> out a variable so that it can know that a particular link no longer
> exists, just like it needs to know when you store a non-nil value so
> that it can know that a new link now exists.
> In other words, not using a write barrier for nil isn't a disaster but
> it can cause garbage to fail to be recognized as such.
>
> What did I miss?
--
Clark S. Cox III
<clarkcox3...> -
On Wed, Apr 16, 2008 at 5:28 AM, Alastair Houghton
<alastair...> wrote:
> On 16 Apr 2008, at 03:29, Michael Ash wrote:
>> If this is the case, then how does the collector know that you have
>> cleared the memory? It seems to me that without a write barrier, the
>> collector will not see the change and will think that you
>> continue to hold the old pointer in this memory.
>
> No. The garbage collector does not use reference counting, and so your
> statement is not true. If you overwrite the last pointer to an object with
> a nil, the pointed-to object *may* survive the current garbage collection
> cycle, but it will not survive the next one.
It has nothing to do with reference counting. I thought that the
collector was using write barriers as a shortcut to knowing when a
block of memory was modified so that it could avoid constantly
re-scanning unchanging blocks. But now I see that it's only a
mechanism to avoid stopping all threads while scanning. This is kind
of disappointing because it means that an application which uses GC
must necessarily have its working set equal to the sum total of all
scanned memory plus the working set in any unscanned memory, but what
can you do.
Thanks for the links.
Mike -
On Wed, Apr 16, 2008 at 12:04 PM, Clark Cox <clarkcox3...> wrote:
> The purpose of the write barrier is to tell the collector to keep
> something alive; it has nothing to do with when to collect it.
Right, I get that now.
[snip]
> Garbage Collection:
> The housekeeping is taken care of, I don't have to worry about it.
> However, I cannot know exactly when my object will be destroyed (it
> might be right away, it might be in a couple of seconds).
I don't see the relevance of this. The Cocoa GC explicitly only works
properly if you play by the rules it sets out. For example, if you
store a pointer in unscanned memory then you're playing with fire and
the object may well have been destroyed by the next time you try to
use it. Skipping write barriers likewise. I had thought that using
write barriers for clearing memory was part of the required rules, but
now it appears that it is not. But regardless, Cocoa GC only takes
care of its housekeeping when you take care of yours.
Mike -
On Wed, Apr 16, 2008 at 9:26 AM, Michael Ash <michael.ash...> wrote:
> On Wed, Apr 16, 2008 at 12:04 PM, Clark Cox <clarkcox3...> wrote:
>> The purpose of the write barrier is to tell the collector to keep
>> something alive; it has nothing to do with when to collect it.
>
> Right, I get that now.
>
> [snip]
>
>> Garbage Collection:
>> The housekeeping is taken care of, I don't have to worry about it.
>> However, I cannot know exactly when my object will be destroyed (it
>> might be right away, it might be in a couple of seconds).
>
> I don't see the relevance of this. The Cocoa GC explicitly only works
> properly if you play by the rules it sets out. For example, if you
> store a pointer in unscanned memory then you're playing with fire and
> the object may well have been destroyed by the next time you try to
> use it.
But, in normal Cocoa patterns, after doing some relatively trivial
replacements (i.e. use NSAllocateCollectable instead of malloc, etc.),
it usually takes effort to store pointers in unscanned memory.
> Skipping write barriers likewise.
Again, under most circumstances, it usually takes effort to skip the
write barriers, as they are added automatically by the compiler.
> I had thought that using
> write barriers for clearing memory was part of the required rules, but
> now it appears that it is not. But regardless, Cocoa GC only takes
> care of its housekeeping when you take care of yours.
Indeed. GC doesn't allow the developer to abdicate all memory
management responsibility; however, the responsibilities that one
still has are much smaller in scope and severity (e.g. "Don't store a
GC-managed pointer into non-scanned memory"), and the exceptions to
the rules are few and far between. In the past year, I've had to make
a special allowance for the garbage collector only once that I can
recall.
Generally, the only difference between my non-GC and my GC code is
that the GC code lacks retain, release and autorelease calls as well
as dealloc implementations. At this point, if I could do away with
writing pre-GC Cocoa code, I would do so in a heartbeat.
--
Clark S. Cox III
<clarkcox3...> -
On 16 Apr 2008, at 17:22, Michael Ash wrote:
> On Wed, Apr 16, 2008 at 5:28 AM, Alastair Houghton
> <alastair...> wrote:
>
>> On 16 Apr 2008, at 03:29, Michael Ash wrote:
>>> If this is the case, then how does the collector know that you have
>>> cleared the memory? It seems to me that without a write barrier, the
>>> collector will not see the change and will think that you
>>> continue to hold the old pointer in this memory.
>>
>> No. The garbage collector does not use reference counting, and so
>> your statement is not true. If you overwrite the last pointer to an
>> object with a nil, the pointed-to object *may* survive the current
>> garbage collection cycle, but it will not survive the next one.
>
> It has nothing to do with reference counting. I thought that the
> collector was using write barriers as a shortcut to knowing when a
> block of memory was modified so that it could avoid constantly
> re-scanning unchanging blocks.
Ah, I see. Yes, you're right that write barriers have been used for
that kind of thing.
We've just had a number of posts recently where the assumption has
been that the write barrier was just a way of doing automatic
reference counting (or similar) behind the scenes.
> But now I see that it's only a
> mechanism to avoid stopping all threads while scanning. This is kind
> of disappointing because it means that an application which uses GC
> must necessarily have its working set equal to the sum total of all
> scanned memory plus the working set in any unscanned memory, but what
> can you do.
Not necessarily; there are other mechanisms that could be used to
implement that optimisation, for instance page table dirty bits.
Also, it might be possible in some cases to use the mincore() function
to avoid touching pages that have been swapped out. Another approach
might be to use the write barrier as a hint that objects need to be
re-scanned, and then periodically do a full scan to catch garbage that
has been created by non-write-barriered zeroing. Whether libauto/Cocoa
GC does any of those things I have no idea.
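The "write barrier as a hint" idea is commonly implemented as a card table. The following is a minimal sketch in plain C under that assumption; the sizes and the names `toy_heap`, `dirty_cards`, `barriered_store` and `rescan_dirty` are invented for illustration, and nothing here claims to describe what libauto does.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 64                 /* toy heap of pointer-sized slots */
#define CARD_WORDS 8                  /* slots covered by one card */
#define NUM_CARDS  (HEAP_WORDS / CARD_WORDS)

static void   *toy_heap[HEAP_WORDS];
static uint8_t dirty_cards[NUM_CARDS];

/* Barriered store: write the slot and flag its card as modified. */
static void barriered_store(size_t index, void *value)
{
    toy_heap[index] = value;
    dirty_cards[index / CARD_WORDS] = 1;
}

/* Partial scan: visit only cards flagged since the last scan, clear
 * the flags, and return how many slots had to be looked at. A real
 * collector would examine each slot here instead of just counting. */
static size_t rescan_dirty(void)
{
    size_t scanned = 0;
    for (size_t c = 0; c < NUM_CARDS; c++) {
        if (dirty_cards[c]) {
            scanned += CARD_WORDS;
            dirty_cards[c] = 0;
        }
    }
    return scanned;
}
```

Stores that bypass `barriered_store`, such as a bzero() over the block, leave the cards clean, which is why this scheme would still need the occasional full scan mentioned above.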
Also, because the GC is generational, and because it seems unlikely
that you will create and retain many rarely-used objects (if you were
doing that, why not just release them?), I don't believe the working
set size will be much of a problem in practice, even if it is one in
theory. It might be in particular types of programs, I suppose, but I
think they would be atypical applications for Cocoa.
Kind regards,
Alastair.
--
http://alastairs-place.net


