Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful
FROM : Alastair Houghton
DATE : Mon Feb 04 14:11:59 2008

On 4 Feb 2008, at 01:57, John Engelhart wrote:

> I've had several reservations about Leopard's GC system since I 
> started working with it.  There is very little documentation on 
> Leopard's GC system, so the following has been pieced together by 
> inference and observations of how the garbage collection system 
> seems to work.  My first concern was with the use of "compiler 
> assisted write barriers".  The current public documentation is 
> extremely vague as to what a 'write barrier' is,


[snip]

> From what I can tell, the term 'write barrier' as it is used by the 
> GC documentation has absolutely nothing to do with this traditional 
> meaning of the term.


The meaning of "write barrier" in this context is the traditional one 
in the world of garbage collection, where it has been in use a lot 
longer than the other meaning; see, e.g.,

  <ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps>

or the excellent book "Garbage Collection: Algorithms for Automatic 
Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN 
0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).

The GC docs actually explain what the write barrier is used for here:

<http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>

> Anyone who's used garbage collection with C is probably familiar 
> with the Boehm Garbage Collector.  I believe that the Boehm GC 
> library embodies what most people would expect of a garbage 
> collection system:  The programmer is freed from having to worry 
> about memory allocations


[snip]

> It makes no particular demands of the programmer or compiler, in 
> fact it can be used as a drop-in replacement for malloc() and 
> free(), requiring no changes.
>
> From what I've pieced together, Leopard's GC system is nothing like 
> this.  While the Boehm GC system detects liveness passively by 
> scanning memory and looking for and tracing pointers, Leopard's GC 
> system does no scanning and requires /active/ notification of 
> changes to the heap.  This, I believe, is what a 'write-barrier' 
> actually is: it is a function call to the GC system so that it can 
> update its internal state as to what memory is live.  It relies, I 
> suspect exclusively, on these function calls to track memory 
> allocations.


The Boehm GC and the Leopard Cocoa GC have very different design 
goals.  In the case of Boehm's collector, it's a requirement that the 
collector work without any assistance from the compiler; as a result, 
it has to use "conservative" techniques, which may in general result 
in leaks of arbitrary amounts of memory simply because of a stray 
value that *looks like* a pointer to something.  The lack of compiler 
assistance means that it's almost impossible to write a collector that 
will run in the background (the Boehm collector has to stop *all* the 
other threads in your program every so often if you run it in the 
background), and it's difficult to implement generational behaviour 
without relying on platform-specific features such as access to dirty 
bits from the system page table...  Even in that case, use of dirty 
bits is woefully inefficient compared to compiler co-operation, since 
a single dirty bit means you must re-scan an entire page of memory. 
The Boehm GC is very clever, certainly, but it has to cope with these 
limitations (and more besides).

Cocoa GC, on the other hand, is able to co-operate with the compiler, 
and that's what the write barriers are.  You have misinterpreted 
their function; they exist to track inter-generational pointers, not 
to enable some sort of behind-the-scenes reference counting as I think 
you imply.  They may also be used to help the collector to obtain a 
consistent view of the mutator's objects in spite of running in the 
background...  I don't know whether the Leopard GC does that or not.

(Incidentally, there is also a read barrier, which is used to help 
implement zeroing weak references; the compiler only generates that 
for variables marked __weak.)

I think, perhaps, that it would be worth your while reading through 
the literature on garbage collection, as you might then understand the 
various trade-offs involved better.

> In order for Leopard's GC system to function properly, the compiler 
> must be aware of all pointers that have been allocated by the GC 
> system so that it can wrap all uses of the pointer with the 
> appropriate GC notification functions (objc_assign*).


Yep.

[snip]

> Realistically, to properly add __strong to a pointer, you need to 
> know if that allocation came from the garbage collector.  This 
> information is essentially impossible to know a priori, so the only 
> practical course of action is to defensively qualify all pointers as 
> __strong.


No.  Cocoa GC mostly deals with objects (which may include Core 
Foundation objects).  That's why the default assumption, which is that 
object pointers are strong, is enough for most situations.

That only changes if you have pointers of non-object types that happen 
to point to things that were allocated with the GC, *and only then* if 
they are stored in locations that are not scanned by default.  This is 
an unusual situation, since few methods return things that are 
allocated by GC and that are not objects.  -UTF8String is probably the 
most common example, but since you tend not to store the result of 
that method, there would rarely, if ever, be a problem.

> The consequence of using a pointer that is not properly qualified as 
> __strong is that the GC system may determine that the allocation is 
> no longer live and reclaim it, even if there is still a valid 
> pointer out there.


Only if there is no copy of the pointer in any of the locations that 
are scanned by default (e.g. on the stack, in registers, or in global 
variables).

> It is also trivial to get wrong, and the only indication that 
> there's a problem is an occasional random error or crash.


In most cases, because GC'd things are objects, it's trivial to get 
*right*.

It's only in special cases, where you're using C pointer types to 
point to GC'd memory, that you need worry about this kind of thing.

> I believe I have a succinct example that illustrates these issues:


[snip]

> I strongly suspect the pointer that UTF8String returns is a pointer 
> to an allocation from the garbage collector.  In fact, changing 
> the 'title' ivar to include __strong 'solves' the problem.


Yes, that's your bug.  It doesn't just 'solve' the problem; the lack 
of __strong here *is* the problem, but only because this is an ivar 
and not, e.g., a function argument or a stack-based variable.

> But this points to a much bigger problem:  anyone who has used 
> UTF8String and not qualified it as __strong has a race condition 
> just waiting to happen.


No, because stack variables and registers are included in the set of 
GC roots.

> This is but one example.  I don't think I need to point out that 
> there are others.  A lot of others.  And most of them are non-
> obvious.  A consequence of all of this is that you must not pass 
> pointers that may have been allocated by the garbage collector to 
> any C function in a library.  For example,
>
> printf("String: %s\n", [@"Hello, world!" UTF8String]);


That code is fine.  The reference is on the stack (or, before that, in 
the register that holds the return value of -UTF8String).  Since the 
collector scans both, it will find and follow that reference, so the 
memory won't be released until the printf() function has finished 
with it.

> passes a GC allocated pointer to a C library function, which almost 
> assuredly does not have the proper write barrier logic in place to 
> properly guard the pointer.


The write barrier is nothing to do with it.  The write barrier is for 
inter-generational pointers, and possibly also to help the collector 
to scan in the background safely.

Kind regards,

Alastair.

--
http://alastairs-place.net
