FROM : Greg Parker
DATE : Mon Feb 04 22:13:30 2008
John Engelhart wrote:
> My first concern was with the use of "compiler assisted write
> barriers". The current public documentation is extremely vague as
> to what a 'write barrier' is, and I'm sure that the majority of you,
> like me, assumed the term referred to an "atomic write barrier /
> fence" used to ensure that all CPU's past the write barrier would
> see the same data at a given location. See `man 3 barrier` for a
> description of the OSMemoryBarrier() function that performs this
> operation. This would make some sense for a GC system, it would
> ensure that the use of a pointer is visible to the collector no
> matter what thread or CPU is using the pointer. From what I can
> tell, the term 'write barrier' as it is used by the GC documentation
> has absolutely nothing to do with this traditional meaning of the
> term.
In the GC literature, a "write barrier" is simply the code used to
write a pointer value. It is unrelated to memory barriers, though
sometimes the write barrier code includes a memory barrier.

Most garbage collectors use a write barrier that does more than just a
store instruction. Why? Performance. Without a write barrier (or,
rarely, a read barrier), a garbage collector has no choice but to stop
all threads and scan all memory. This is too slow for programs with
large heaps or threads with responsiveness constraints (e.g. audio
playback). With a write barrier, more sophisticated algorithms can be
used to reduce the amount of scanning or limit thread-stopped time.

The Boehm collector does not use a write barrier, in order to be
compatible with arbitrary C compilers.
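
To make "a write barrier that does more than just a store instruction"
concrete, here is a minimal sketch in plain C of one well-known style of
barrier, card marking. It is only an illustration and not how Leopard's
collector is implemented; the toy heap, the card size, and the names
write_barrier() and take_dirty_cards() are all assumptions made up for
this example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap divided into fixed-size "cards". All names and sizes here are
   hypothetical; they do not come from any real collector. */
#define HEAP_WORDS 64
#define CARD_WORDS 8
#define NUM_CARDS  (HEAP_WORDS / CARD_WORDS)

static void *heap[HEAP_WORDS];          /* pointer-sized heap slots */
static unsigned char dirty[NUM_CARDS];  /* 1 = card written since last scan */

/* The write barrier: the pointer store itself, plus a note for the
   collector recording which part of the heap changed. */
static void write_barrier(size_t index, void *value)
{
    assert(index < HEAP_WORDS);
    heap[index] = value;
    dirty[index / CARD_WORDS] = 1;
}

/* Collector side: report how many cards need rescanning and reset them.
   A real collector would rescan the slots in each dirty card here. */
static int take_dirty_cards(void)
{
    int count = 0;
    for (int i = 0; i < NUM_CARDS; i++) {
        if (dirty[i]) {
            count++;
            dirty[i] = 0;
        }
    }
    return count;
}
```

With stores to slots 3, 5, and 42, only two of the eight cards are
marked dirty, so a collector built this way could rescan those two
cards instead of the whole heap.
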
> Anyone who's used garbage collection with C is probably familiar
> with the Boehm Garbage Collector. I believe that the Boehm GC
> library embodies what most people would expect of a garbage
> collection system: The programmer is freed from having to worry
> about memory allocations and when or where pointers to allocated
> memory are, it's the collectors job to find those pointers and piece
> together what memory is still pointed to by active pointers and
> reclaim the memory which has no live pointers referencing it. Very
> roughly, it does this by starting with a collection of root
> allocations. It scans these allocations looking for pointers, and
> then following those pointers and scanning those blocks of memory.
> It builds a graph of references from these root objects, and when
> all the memory has been scanned, memory allocations that are not
> part of this 'liveness' graph can be reclaimed. It makes no
> particular demands of the programmer or compiler, in fact it can be
> used as a drop in replacement for malloc() and free(), requiring no
> changes.
>
> From what I've pieced together, Leopards GC system is nothing like
> this. While the Boehm GC system detects liveness passively by
> scanning memory and looking for and tracing pointers, Leopards GC
> system does no scanning and requires /active/ notification of
> changes to the heap. This, I believe, is what a 'write-barrier'
> actually is: it is a function call to the GC system so that it can
> update it's internal state as to what memory is live. It relies, I
> suspect exclusively, on these function calls to track memory
> allocations.
Both Leopard's GC and the Boehm-Demers-Weiser GC are conservative
scanning collectors.

Like the Boehm collector, Leopard's GC traces from a set of roots.
Unlike the Boehm collector, the root set is a more limited set of
memory; for example, non-strong global variables are not part of the
root set.

Unlike the Boehm collector, Leopard's GC uses a write barrier; both
"scanning" and "active notification" are required for performance.

Unlike the Boehm collector, Leopard's GC does not scan memory
allocated with malloc(), nor does it manage blocks allocated with
malloc().

Unlike the Boehm collector, Leopard's GC never stops all threads at
once, and stops each thread for only a short period of time (just
enough to scan the thread registers and stack).

The Leopard GC is designed and optimized for Objective-C; it can be
used for ordinary C code, but it's not as easy to use for C code as
the Boehm collector.
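
As a rough illustration of what "conservative scanning" from a root set
means, the sketch below treats every word in a root range as a possible
pointer and marks any known block whose address range contains that
value. This is a toy under stated assumptions, not the algorithm of
either collector: struct block and conservative_scan() are hypothetical
names, and details such as whether interior pointers count differ
between real collectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One block handed out by a hypothetical toy collector. */
struct block {
    char  *start;   /* first byte of the block */
    size_t size;    /* size in bytes */
    int    marked;  /* set once something that looks like a pointer hits it */
};

/* Scan `count` words of root memory (for example a stack range). Any word
   whose value falls inside a block is assumed to be a pointer to it, even
   if it is really just an integer that happens to look like an address.
   Returns the number of blocks newly marked. */
static int conservative_scan(const uintptr_t *words, size_t count,
                             struct block *blocks, size_t nblocks)
{
    int newly_marked = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < nblocks; j++) {
            uintptr_t lo = (uintptr_t)blocks[j].start;
            uintptr_t hi = lo + blocks[j].size;
            if (words[i] >= lo && words[i] < hi && !blocks[j].marked) {
                blocks[j].marked = 1;
                newly_marked++;
            }
        }
    }
    return newly_marked;
}
```

A full collector would go on to scan the contents of each marked block
in the same way and reclaim whatever is still unmarked at the end.
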
> A consequence of all of this is that you must not pass pointers that
> may have been allocated by the garbage collector to any C function
> in a library.
>
> printf("String: %s\n", [@"Hello, world!" UTF8String]);
>
> passes a GC allocated pointer to a C library function, which almost
> assuredly does not have the proper write barrier logic in place to
> properly guard the pointer.
Not true. This example is perfectly safe, assuming printf() does not
store that pointer for use after the printf() call returns.

Leopard's GC includes every local variable and function parameter in
the root set. No write barriers are required for stack memory.
Ordinary C code can use GC pointers on the stack and as parameters.
> /ANY/ pointer that holds a pointer to memory that MAY be allocated
> from the garbage collector must be marked __strong. The compiler
> attempts to 'automagically' add __strong to certain types of pointer
> references, specifically 'id' and derivatives of 'id', namely class
> pointers (NSString *).
>
> Realistically, to properly add __strong to a pointer, you need to
> know if that allocation came from the garbage collector. This
> information is essentially impossible to know apriori, so the only
> practical course of action is to defensively qualify all pointers as
> __strong.
Luckily, it's easier than you describe for most code.

If your code uses only NSObjects and NSArrays, you don't have to do
anything.

If your code uses C pointers that you allocate and free yourself, you
don't have to do anything.

If your code uses C pointers, and you don't know where they came from,
but you only store those pointers on the stack, you don't have to do
anything.

If your code stores Objective-C pointers or other pointers of unknown
provenance into malloc blocks or global variables or Objective-C
ivars, then you do need to do extra work. Usually "extra work" means
"mark the variable as __strong".
--
Greg Parker <email_removed> Runtime Wrangler