Use of Mac OS X 10.5 / Leopard's Garbage Collection Considered Harmful
-
This is bound to be an inflammatory subject. That is not my intent,
and I mean no disrespect to the programmers who worked on the GC
system. I'm quite sure that adding GC to Cocoa is a non-trivial, near
impossible task, filled with trade-offs between "really bad" and "even
worse." Also understand that a bit of 'authoritative' documentation
or instruction can outright negate some of the points I make below, as
I have had only the publicly available documentation and my
(relatively brief, 2-3 month) experience with the 10.5 GC system to
form these opinions.
I've had several reservations about Leopard's GC system since I
started working with it. There is very little documentation on
Leopard's GC system, so the following has been pieced together by
inference and observations of how the garbage collection system seems
to work. My first concern was with the use of "compiler assisted
write barriers". The current public documentation is extremely vague
as to what a 'write barrier' is, and I'm sure that the majority of
you, like me, assumed the term referred to an "atomic write barrier /
fence" used to ensure that all CPUs past the write barrier would see
the same data at a given location. See `man 3 barrier` for a
description of the OSMemoryBarrier() function that performs this
operation. This would make some sense for a GC system: it would ensure
that the use of a pointer is visible to the collector no matter what
thread or CPU is using the pointer. From what I can tell, the term
'write barrier' as it is used by the GC documentation has absolutely
nothing to do with this traditional meaning of the term.
Anyone who's used garbage collection with C is probably familiar with
the Boehm Garbage Collector. I believe that the Boehm GC library
embodies what most people would expect of a garbage collection
system: the programmer is freed from having to worry about memory
allocations and about when or where the pointers to allocated memory
are. It's the collector's job to find those pointers, piece together
which memory is still pointed to by live pointers, and reclaim the
memory that no live pointer references. Very roughly, it does this
by starting with a set of root allocations. It scans these
allocations looking for pointers, then follows those pointers and
scans those blocks of memory. It builds a graph of references from
these root objects, and when all the memory has been scanned, memory
allocations that are not part of this 'liveness' graph can be
reclaimed. It makes no particular demands of the programmer or
compiler; in fact, it can be used as a drop-in replacement for
malloc() and free(), requiring no changes.
From what I've pieced together, Leopard's GC system is nothing like
this. While the Boehm GC system detects liveness passively by
scanning memory and looking for and tracing pointers, Leopard's GC
system does no scanning and requires /active/ notification of changes
to the heap. This, I believe, is what a 'write barrier' actually is:
it is a function call to the GC system so that it can update its
internal state as to what memory is live. It relies, I suspect
exclusively, on these function calls to track memory allocations.
If this is indeed the case, it's my opinion that the 10.5 GC system is
fundamentally and fatally flawed. In fact, its use should be actively
discouraged. I'll now outline the reasoning behind this, including an
example that highlights the magnitude of the problem.
This would explain the need for 'dual mode' frameworks, and why an
application that uses GC must be linked only to frameworks that are
GC capable: a non-GC framework would not actively inform the GC
system of its use of pointers, leading to random crashes and whatnot
as the GC system reclaimed memory that was actively in use.
In order for Leopard's GC system to function properly, the compiler
must be aware of all pointers that have been allocated by the GC
system so that it can wrap all uses of the pointer with the
appropriate GC notification functions (objc_assign*). Note that this
is subtly different from the definitions and examples used in 'Garbage
Collection Programming Guide'. From 'Garbage Collection Programming
Guide', 'Language Support':
__strong
Specifies a reference that is visible to (followed by) the garbage
collector (see “How the Garbage Collector Works”).
__strong modifies an instance variable or struct field declaration to
inform the compiler to unconditionally issue a write-barrier to write
to memory. __strong is implicitly part of any declaration of an
Objective-C object reference type. You must use it explicitly if you
need to use Core Foundation types, void *, or other non-object
references (__strong modifies pointer assignments, not scalar
assignments).
----
This is a deceptive description. /ANY/ pointer that holds a pointer
to memory that MAY be allocated from the garbage collector must be
marked __strong. The compiler attempts to 'automagically' add
__strong to certain types of pointer references, specifically 'id' and
derivatives of 'id', namely class pointers (NSString *).
Realistically, to properly add __strong to a pointer, you need to know
if that allocation came from the garbage collector. This information
is essentially impossible to know a priori, so the only practical
course of action is to defensively qualify all pointers as __strong.
The consequence of using a pointer that is not properly qualified as
__strong is that the GC system may determine that the allocation is no
longer live and reclaim it, even if there is still a valid pointer out
there. Therefore, all pointer references which have the possibility
of referencing an allocation from the garbage collection system must
treat that pointer as __strong. If any piece of code, at any level,
at any point in time fails to satisfy this condition, you are in for a
world of hurt. The fact of the matter is that, for all practical
purposes, it is impossible to guarantee this. It is also trivial to
get wrong, and the only indication that there's a problem is an
occasional random error or crash. Most of the time things will work,
but every once in a while... and these 'bugs' are virtually impossible
to track down. (In fact, this message is the result of having to
track down Yet Another GC Problem where something, somewhere, did
something wrong... maybe).
I believe I have a succinct example that illustrates these issues:
----
#import <Foundation/Foundation.h>

@interface GCTest : NSObject {
    const char *title;
};
- (void)setTitle:(const char *)newTitle;
- (const char *)title;
@end

@implementation GCTest
- (void)setTitle:(const char *)newTitle
{
    printf("Setting title. Old title: %p, new title %p = '%s'\n",
           title, newTitle, newTitle);
    title = newTitle;
}

- (const char *)title
{
    return title;
}
@end

int main(int argc, char *argv[]) {
    GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;

    gcConstTitle = [[GCTest alloc] init];
    gcUTF8Title = [[GCTest alloc] init];

    [gcConstTitle setTitle:"Hello, world!"];
    [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];

    [[NSGarbageCollector defaultCollector] collectExhaustively];
    NSLog(@"GC test");

    printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
           [gcConstTitle title]);
    printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
           [gcUTF8Title title]);

    return(0);
}
----
[<johne...>] GC% gcc -framework Foundation -fobjc-gc-only gc.m -o gc
[<johne...>] GC% ./gc
Setting title. Old title: 0x0, new title 0x1ed4 = 'Hello, world!'
Setting title. Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
2008-02-03 19:07:58.911 gc[6191:807] GC test
gcConstTitle title: 0x1ed4 = 'Hello, world!'
gcUTF8Title title: 0x1011860 = '??0?" '
[<johne...>] GC%
The problem is with the pointer returned by UTF8String. From
NSString.h:
- (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
I strongly suspect the pointer that UTF8String returns is a pointer to
an allocation from the garbage collector. In fact, changing the
'title' ivar to include __strong 'solves' the problem.
And herein lies the reason why I believe Leopard's GC system is
fundamentally and fatally flawed, and should in fact not be used at
all. There are several possible 'solutions' to this, but you'd better
get it right or you're going to be stuck with race conditions of the
most insidious nature imaginable. Adding fuel to the fire, it's not
clear what the 'right' solution is, or if there even is one.
One might argue that, per the __strong documentation, the ivar
requires the __strong type qualifier. This is, at best, non-obvious,
and considering that the documentation makes references to 'objects'
almost exclusively, one can also argue that this pointer does not
qualify. But this points to a much bigger problem: anyone who has
used UTF8String and not qualified it as __strong has a race condition
just waiting to happen. This is also not a problem that can be fixed
with a patch to Foundation in the next Mac OS X version: every program
that has not qualified its use of UTF8String with __strong must be
recompiled and re-released as there is nothing a shared library fix
can do about this. Add to this the fact that the published
documentation is essentially silent on the topic and offers no
guidance. In fact, it's possible that adding __strong to the 'title'
ivar is just an observable side effect of something else that seems to
fix the problem. I'm not sure what you'd do in that case because at
that point just calling methods that return a pointer that you need
becomes an exercise in luck and race conditions.
This is but one example. I don't think I need to point out that there
are others. A lot of others. And most of them are non-obvious. A
consequence of all of this is that you must not pass pointers that may
have been allocated by the garbage collector to any C function in a
library. For example,
printf("String: %s\n", [@"Hello, world!" UTF8String]);
passes a GC allocated pointer to a C library function, which almost
assuredly does not have the proper write barrier logic in place to
properly guard the pointer. This example is innocent enough, and
likely to work due to its short-lived nature, but it's easy to think
of examples where the pointer passed to a C function, say an SQLite3
call, can cause no end of problems if that pointer happens to be
reclaimed in the middle of the function call.
This is the basis for my opinion that the 10.5 GC should not be used.
In order to properly use the GC system, one must guarantee that all
uses of GC allocated pointers have compiler assisted write barrier
logic. This is beyond non-trivial in practice as the passing of
pointers is part of most function calls. Those functions call other
functions, and at some point that pointer is likely to pass through a
C library function.
Since Leopard's GC system places the burden of keeping the state of the
GC system up to date onto the compiler, and in turn onto every line of
code that uses a pointer, this increases the possible locations for GC
bugs to every single line of code that uses a pointer. There's a
considerable amount of code that's been added to GCC to facilitate all
of this, and code being what it is, there are bound to be bugs in
there. Code compiled with those bugs is frozen; the only way to fix
it is to recompile. This means that anyone, /anyone/, who created
GC-enabled code needs to recompile their code in order to receive
the bug fix. This is an unalterable consequence of the decision to
move the GC logic into the compiler. -
On Feb 3, 2008, at 17:57, John Engelhart wrote:
> In order for leopards GC system to function properly, the compiler
> must be aware of all pointers that have been allocated by the GC
> system so that it can wrap all uses of the pointer with the
> appropriate GC notification functions (objc_assign*). Note that
> this is subtly different that the definitions and examples used in
> 'Garbage Collection Programming Guide'. From 'Garbage Collection
> Programming Guide', 'Language Support':
>
> __strong
> Specifies a reference that is visible to (followed by) the garbage
> collector (see “How the Garbage Collector Works”).
>
> __strong modifies an instance variable or struct field declaration
> to inform the compiler to unconditionally issue a write-barrier to
> write to memory. __strong is implicitly part of any declaration of
> an Objective-C object reference type. You must use it explicitly if
> you need to use Core Foundation types, void *, or other non-object
> references (__strong modifies pointer assignments, not scalar
> assignments).
>
> ----
>
> This is a deceptive description. /ANY/ pointer that holds a pointer
> to memory that MAY be allocated from the garbage collector must be
> marked __strong. The compiler attempts to 'automagically' add
> __strong to certain types of pointer references, specifically 'id'
> and derivatives of 'id', namely class pointers (NSString *).
Interesting post. A couple of comments (that may just show I didn't
absorb all of your argument):
-- The extent of the deception seems to be that __strong is an
attribute of the declaration, not of the pointer, but the
documentation confuses the two: the compiler must be aware of all
*variables used for* pointers that have been allocated by the GC
system, and *a single variable cannot be used at different times for
pointers to memory in different allocation systems*. If there were a
fix to the documentation, would you still say GC is broken?
-- It doesn't exactly surprise me that your sample code failed,
because my reading of the documentation (the section you quoted) is
that it told you the rules and you didn't follow them -- by not
putting __strong on the char* ivar in the version that failed. The
only issue is whether -[NSString UTF8String] returns memory allocated
from a GC-controlled pool or not. The documentation for the method says:
> Discussion
>
> The returned C string is automatically freed just as a returned
> object would be released; you should copy the C string if it needs
> to store it outside of the autorelease context in which the C string
> is created.
This sounds like it hasn't been updated for Leopard, but I'd sure read
it as telling me the return string comes from the same place objects
come from -- GC memory. And therefore any stored pointer to it would
need a __strong or a __weak on its variable. Or, as stipulated, the
result could be copied into malloc memory before being used. (The
picture in the GC documentation suggests that malloc memory isn't GC-
controlled, although I didn't find any text to state this absolutely.
Maybe it's too obvious to say.)
Or did I miss your point?
-- I too puzzled over the meaning of the stuff in the GC document
about write barriers, which I agree raises more questions than it
answers. In the end, I came to the conclusion that "write barriers" in
this case were nothing to do with protecting the integrity or lifetime
of any pointer, but rather a pragmatic hint to *this* GC
implementation about how hard it might work at collection on any given
occasion.
If the documentation were changed to use the phrase "collection
performance hints" instead of "write barriers", would you still say GC
is broken?
-- So I wonder if the problem is that GC is broken, or just annoyingly
fussy and poorly documented as regards non-object memory.
I hope you'll post more analysis of the problem. I (with a sigh of
relief) jumped ship from the SS Retainer, so it matters to me if I'm
now sailing towards that world of hurt you foreshadow. :) -
j o a r Re: Use of Mac OS X 10.5 / Leopards Garbage Collection Considered Harmful Feb 04 2008, 08:18
Hello John,
On Feb 3, 2008, at 5:57 PM, John Engelhart wrote:
> This is bound to be an inflammatory subject. That is not my intent,
> and I mean no disrespect to the programmers who worked on the GC
> system. I'm quite sure that adding GC to Cocoa is a non-trivial,
> near impossible task, filled with trade-offs between "really bad"
> and "even worse." Also understand that a bit of 'authoritative'
> documentation or instruction can out right negate some of the points
> I make below as I have had only the publicly available documentation
> and my (relatively brief 2-3 months) experiences with the 10.5 GC
> system to form these opinions....
Given that GC in Leopard is a 1.0 release I think it's to be expected
that there will be bugs and room for improvement in both the
implementation and the documentation. The best way to get the
improvements that matter most to you is to file targeted bug
reports and enhancement requests.
> Anyone who's used garbage collection with C is probably familiar
> with the Boehm Garbage Collector. [...]
> From what I've pieced together, Leopards GC system is nothing like
> this.
I think that the current "Architecture" section of the documentation
gives a fairly good introduction and overview to what most developers
need to know about the GC in Leopard. That said, I'm sure that you can
think of things that you would like to see improved, and I encourage
you to file enhancement requests wherever you do. The documentation
department is very responsive and typically releases multiple updates
to the documentation per year.
As an example: the documentation currently deals basically only with
Cocoa and Core Foundation. It seems to me that you think it's not
clear enough on how to deal with, for example, a (char *), and I would
agree. This would be a great enhancement request.
> This would explain the need for 'dual mode' frameworks, and that an
> application that uses GC must be linked to frameworks that are all
> GC capable. This is because a non-GC framework would not actively
> inform the GC system of its use of pointers, leading to random
> crashes and what not as the GC system reclaimed memory that was
> actively in use.
Manual memory management and automatic memory management are
sufficiently different that you need to change your coding patterns to
adapt to either mode. I don't think that you could point to any
environment where you can run non-trivial code in either manual or
automatic memory management without changes.
Finalizers have a different purpose in life than dealloc methods. You
can't automatically turn dealloc methods into finalizers, and you
can't just skip over them either. You need to have different code
paths depending on the mode you choose for your code.
> And herein lies the reason why I believe Leopards GC system is
> fundamentally and fatally flawed, and should in fact not be used at
> all. There are several possible 'solutions' to this, but you'd
> better get it right or you're going to be stuck with race conditions
> of the most insidious nature imaginable. Adding fuel to the fire,
> it's not clear what the 'right' solution is, or if there even is one.
Not to say that the engineers at Apple never make any mistakes, but do
you really think that Apple would release something like this if what
you say was true? :-)
> One might argue that, per the __strong documentation, the ivar
> requires the __strong type qualifier. This is, at best, non-
> obvious, and considering that the documentation makes references to
> 'objects' almost exclusively, one can also argue that this pointer
> does not qualify. But this points to a much bigger problem: anyone
> who has used UTF8String and not qualified it as __strong has a race
> condition just waiting to happen. This is also not a problem that
> can be fixed with a patch to Foundation in the next Mac OS X
> version- every program that has not qualified their use of
> UTF8String with __strong must be recompiled and re-released as there
> is nothing a shared library fix can do about this.
As a generalization I think it's fair to say that Apple will only fix
bugs in *their* code by updates to Mac OS X, any bugs in *your* code
must be fixed by you. If you have made a GC bug in one of your
shipping applications - because you lacked sufficient documentation to
get it right, or for any other reason - it's quite likely that you
will have to issue an update to fix that bug.
> [...] A consequence of all of this is that you must not pass
> pointers that may have been allocated by the garbage collector to
> any C function in a library. For example,
>
> printf("String: %s\n", [@"Hello, world!" UTF8String]);
>
> passes a GC allocated pointer to a C library function, which almost
> assuredly does not have the proper write barrier logic in place to
> properly guard the pointer. This example is innocent enough, and
> likely to work due to its short lived nature, but it's easy to think
> of examples where the pointer passed to a C function, say an SQLite3
> call, can cause no end of problems if that pointer happens to be
> reclaimed in the middle of the function call.
I think that you forget, and this might be at the heart of your
worries, that any pointer found on the *stack* is treated as a root.
Being a root it will not be collected, and neither will anything that
it in turn references.
Cheers,
j o a r -
[and to the list]
You'll want to read this:
<http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Introduction.html>
On the first page, it says:
"The initial root set of objects is comprised of global variables,
stack variables, and objects with external references. These objects
are never considered as garbage."
The key point here is that "stack variables" include everything that
thread is referencing on any of its stack frames or registers.
> This information is essentially impossible to know apriori,
No, it's not a quality of the dynamic allocation; it's a quality of
the class's ivar. Should title be scanned? Yes? No?
Practically speaking, Objective-C objects are allocated from scanned
memory, and the system frameworks Do The Right Thing. If you call
malloc() yourself, well, that's clearly not from GC's scanned memory.
> so the only practical course of action is to defensively qualify all
> pointers as __strong.
That's probably wise for people learning to use GC.
> I believe I have a succinct example that illustrates these issues:
... you did it wrong and it doesn't work. Check.
> but it's easy to think of examples where the pointer passed to a C
> function, say an SQLite3
> call, can cause no end of problems if that pointer happens to be
> reclaimed in the middle of the function call.
As it just so happens, I write a dual mode framework that does exactly
that.
> This is beyond non-trivial in practice as the passing of pointers is
> part of most functions calls.
Arguments and return values are always live as they are on the stack
(or a register). If one of those functions wants to store the pointer
in its own memory, then you need to keep that pointer live for however
long you expect the C library to reference it. Which isn't any
different than life before GC.
- Ben -
On 4 Feb 2008, at 01:57, John Engelhart wrote:
> I've had several reservations about Leopard's GC system since I
> started working with it. There is very little documentation on
> Leopards GC system, so the following has been pieced together by
> inference and observations of how the garbage collection system
> seems to work. My first concern was with the use of "compiler
> assisted write barriers". The current public documentation is
> extremely vague as to what a 'write barrier' is,
[snip]
> From what I can tell, the term 'write barrier' as it is used by the
> GC documentation has absolutely nothing to do with this traditional
> meaning of the term.
The meaning of "write barrier" in this context is the traditional one
in the world of garbage collection, which has been around a lot longer
than the other meaning. It's certainly traditional, though; see, e.g.,
<ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps>
or the excellent book "Garbage Collection: Algorithms for Automatic
Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN
0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).
The GC docs actually explain what the write barrier is used for here:
<http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>
> Anyone who's used garbage collection with C is probably familiar
> with the Boehm Garbage Collector. I believe that the Boehm GC
> library embodies what most people would expect of a garbage
> collection system: The programmer is freed from having to worry
> about memory allocations
[snip]
> It makes no particular demands of the programmer or compiler, in
> fact it can be used as a drop in replacement for malloc() and
> free(), requiring no changes.
>
> From what I've pieced together, Leopards GC system is nothing like
> this. While the Boehm GC system detects liveness passively by
> scanning memory and looking for and tracing pointers, Leopards GC
> system does no scanning and requires /active/ notification of
> changes to the heap. This, I believe, is what a 'write-barrier'
> actually is: it is a function call to the GC system so that it can
> update it's internal state as to what memory is live. It relies, I
> suspect exclusively, on these function calls to track memory
> allocations.
The Boehm GC and the Leopard Cocoa GC have very different design
goals. In the case of Boehm's collector, it's a requirement that the
collector work without any assistance from the compiler; as a result,
it has to use "conservative" techniques, which may in general result
in leaks of arbitrary amounts of memory simply because of a stray
value that *looks like* a pointer to something. The lack of compiler
assistance means that it's almost impossible to write a collector that
will run in the background (the Boehm collector has to stop *all* the
other threads in your program every so often if you run it in the
background), and it's difficult to implement generational behaviour
without relying on platform-specific features such as access to dirty
bits from the system page table... Even in that case, use of dirty
bits is woefully inefficient compared to compiler co-operation, since
a single dirty bit means you must re-scan an entire page of memory.
The Boehm GC is very clever, certainly, but it has to cope with these
limitations (and more besides).
Cocoa GC, on the other hand, is able to co-operate with the compiler,
and that's what the write barriers are. You have misinterpreted
their function; they exist to track inter-generational pointers, not
to enable some sort of behind-the-scenes reference counting as I think
you imply. They may also be used to help the collector to obtain a
consistent view of the mutator's objects in spite of running in the
background... I don't know whether the Leopard GC does that or not.
(Incidentally, there is also a read barrier, which is used to help
implement zeroing weak references; the compiler only generates that
for variables marked __weak.)
I think, perhaps, that it would be worth your while reading through
the literature on garbage collection, as you might then understand the
various trade-offs involved better.
> In order for leopards GC system to function properly, the compiler
> must be aware of all pointers that have been allocated by the GC
> system so that it can wrap all uses of the pointer with the
> appropriate GC notification functions (objc_assign*).
Yep.
[snip]
> Realistically, to properly add __strong to a pointer, you need to
> know if that allocation came from the garbage collector. This
> information is essentially impossible to know apriori, so the only
> practical course of action is to defensively qualify all pointers as
> __strong.
No. Cocoa GC mostly deals with objects (which may include Core
Foundation objects). That's why the default assumption, which is that
object pointers are strong, is enough for most situations.
That only changes if you have pointers of non-object types that happen
to point to things that were allocated with the GC, *and only then* if
they are stored in locations that are not scanned by default. This is
an unusual situation, since few methods return things that are
allocated by GC and that are not objects. -UTF8String is probably the
most common example, but since you tend not to store the result of
that method, there would rarely---if ever---be a problem.
> The consequence of using a pointer that is not properly qualified as
> __strong is that the GC system may determine that the allocation is
> no longer live and reclaim it, even if there is still a valid
> pointer out there.
Only if there is no copy of the pointer in any of the locations that
are scanned by default (e.g. the stack, in registers, in global
variables).
> It is also trivial to get wrong, and the only indication that
> there's a problem is an occasional random error or crash.
In most cases, because GC'd things are objects, it's trivial to get
*right*.
It's only in special cases, where you're using C pointer types to
point to GC'd memory, that you need worry about this kind of thing.
> I believe I have a succinct example that illustrates these issues:
[snip]
> I strongly suspect the pointer that UTF8String returns is a pointer
> to an allocation from the garbage collector. In fact, by changing
> the 'title' ivar to include __strong 'solves' the problem.
Yes, that's your bug. It doesn't just 'solve' the problem, the lack
of __strong here *is* the problem, but only because this is an ivar
and not e.g. a function argument or a stack-based variable.
> But this points to a much bigger problem: anyone who has used
> UTF8String and not qualified it as __strong has a race condition
> just waiting to happen.
No, because stack variables and registers are included in the set of
GC roots.
> This is but one example. I don't think I need to point out that
> there are others. A lot of others. And most of them are non-
> obvious. A consequence of all of this is that you must not pass
> pointers that may have been allocated by the garbage collector to
> any C function in a library. For example,
>
> printf("String: %s\n", [@"Hello, world!" UTF8String]);
That code is fine. The reference is on the stack (or, before that, in
the register that holds the return value of -UTF8String). It will be
followed, so the memory won't be released until the printf() function
has finished with it.
> passes a GC allocated pointer to a C library function, which almost
> assuredly does not have the proper write barrier logic in place to
> properly guard the pointer.
The write barrier is nothing to do with it. The write barrier is for
inter-generational pointers, and possibly also to help the collector
to scan in the background safely.
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Feb 3, 2008, at 5:57 PM, John Engelhart wrote:
> int main(int argc, char *argv[]) {
> GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
>
> gcConstTitle = [[GCTest alloc] init];
> gcUTF8Title = [[GCTest alloc] init];
>
> [gcConstTitle setTitle:"Hello, world!"];
> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
>
> [[NSGarbageCollector defaultCollector] collectExhaustively];
> NSLog(@"GC test");
>
> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
> [gcConstTitle title]);
> printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
> [gcUTF8Title title]);
>
> return(0);
> }
> The problem is with the pointer returned by UTF8String. From
> NSString.h:
>
> - (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
>
> I strongly suspect the pointer that UTF8String returns is a pointer
> to an allocation from the garbage collector. In fact, by changing
> the 'title' ivar to include __strong 'solves' the problem.
I'd just like to comment quickly that in Tiger and earlier OS X
releases without GC, your code here would be just as broken. The
-UTF8String method has always returned "autoreleased memory", that is,
a pointer to a UTF8 string that is only being held by an autoreleased
object. So once the containing autorelease pool was deallocated, so
was the UTF8 string representation, and you'd have a bad pointer to
unallocated memory in your object.
In fact, unlike in the GC world, there was no way to actually keep
that memory around longer. There was no way to tell it that you wanted
a __strong reference. If you wanted to keep a pointer to a UTF8
representation you had to do your own malloc() and make your own copy.
And that is why this isn't a problem. The new GC implementation
doesn't make anything a bug that was legal before - it was just as
much of a crasher before GC as it is after GC.
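A rough plain-C model of what Greg describes may help: a hypothetical pool owns objects, each object owns its UTF-8 buffer, and draining the pool frees both, so a char * kept past the drain dangles unless it was copied first. The names (ToyPool, ToyString, toy_string_autoreleased, toy_pool_drain) are invented for illustration; this is not how Foundation implements autorelease pools.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a pre-GC autorelease pool (hypothetical, not Foundation):
   the pool owns objects until it is drained, and the inner buffer an
   object hands out (like -UTF8String) lives only as long as the object. */
#define POOL_CAPACITY 8

typedef struct {
    char *utf8;   /* owned by the object, freed together with it */
} ToyString;

typedef struct {
    ToyString *objects[POOL_CAPACITY];
    size_t count;
} ToyPool;

static ToyString *toy_string_autoreleased(ToyPool *pool, const char *text)
{
    assert(pool->count < POOL_CAPACITY);
    ToyString *s = malloc(sizeof *s);
    assert(s != NULL);                      /* toy code: no real error handling */
    s->utf8 = malloc(strlen(text) + 1);
    assert(s->utf8 != NULL);
    strcpy(s->utf8, text);
    pool->objects[pool->count++] = s;       /* the pool now owns the object */
    return s;
}

/* Draining frees every object and, with it, every inner buffer;
   any char * obtained from them is dangling afterwards. */
static void toy_pool_drain(ToyPool *pool)
{
    for (size_t i = 0; i < pool->count; i++) {
        free(pool->objects[i]->utf8);
        free(pool->objects[i]);
    }
    pool->count = 0;
}
```

The only safe way to keep the text past the drain in this model is the one Greg mentions: malloc() a private copy before the pool goes away, and free() it yourself later.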
Hope this helps,
- Greg -
On 2/3/08 8:57 PM, John Engelhart said:
> This is the basis for my opinion that the 10.5 GC should not be used.
I've been working on a GC-only app for several months now. In practice,
I haven't run into major problems.
But I would say that the dev tools and docs are not quite 'GC ready'.
Lots of docs talk about autoreleasing and make no mention of GC. The
(great) Debugging Magic technote has not a single hint for GC
debugging. MallocDebug does not work with GC apps (you can still have
leaks in GC apps if your app also uses C/C++ code). Rosetta does not
support GC code (could be nice for running unit tests). OpenGL
Profiler.app does not work with GC apps. Interface Builder does not
support GC plugins. Hopefully the tools will catch up soon.
--
____________________________________________________________
Sean McBride, B. Eng <sean...>
Rogue Research www.rogue-research.com
Mac Software Developer Montréal, Québec, Canada -
On Feb 4, 2008, at 8:11 AM, Alastair Houghton wrote:
> or the excellent book "Garbage Collection: Algorithms for Automatic
> Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN
> 0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).
You must have a later edition than mine, as the inside cover of my
copy says '96.
>> >
> The GC docs actually explain what the write barrier is used for here:
>
> <http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>
>
>> It makes no particular demands of the programmer or compiler, in
>> fact it can be used as a drop in replacement for malloc() and
>> free(), requiring no changes.
>>
>> From what I've pieced together, Leopard's GC system is nothing like
>> this. While the Boehm GC system detects liveness passively by
>> scanning memory and looking for and tracing pointers, Leopard's GC
>> system does no scanning and requires /active/ notification of
>> changes to the heap. This, I believe, is what a 'write-barrier'
>> actually is: it is a function call to the GC system so that it can
>> update its internal state as to what memory is live. It relies, I
>> suspect exclusively, on these function calls to track memory
>> allocations.
>
> The Boehm GC and the Leopard Cocoa GC have very different design
> goals. In the case of Boehm's collector, it's a requirement that
> the collector work without any assistance from the compiler; as a
> result, it has to use "conservative" techniques, which may in
> general result in leaks of arbitrary amounts of memory simply
> because of a stray value that *looks like* a pointer to something.
> The lack of compiler assistance means that it's almost impossible to
> write a collector that will run in the background (the Boehm
> collector has to stop *all* the other threads in your program every
> so often if you run it in the background), and it's difficult to
> implement generational behaviour without relying on platform-
> specific features such as access to dirty bits from the system page
> table... Even in that case, use of dirty bits is woefully
> inefficient compared to compiler co-operation, since a single dirty
> bit means you must re-scan an entire page of memory. The Boehm GC
> is very clever, certainly, but it has to cope with these limitations
> (and more besides).
I've always enjoyed using the Boehm garbage collector. I've never had
a problem with its speed, and off the top of my head I can't think of
any issue where I had to work around the collector. It always just
'works', and not only works but has caught several pointer misuse or
off-by-one errors as well. It's a joy to use; you literally just
allocate and forget. I had such high hopes for Cocoa's GC system
because once you're spoiled by GC, it's hard to go back.
Unfortunately, the 4-5 months of time I've put in on Leopard's GC
system has not been nearly as pleasant. It has been outright
frustrating, and it's reached the point where I consider the system
untenable.
>
> Cocoa GC, on the other hand, is able to co-operate with the
> compiler, and that's what the write barriers are. You have mis-
> interpreted their function; they exist to track inter-generational
> pointers, not to enable some sort of behind-the-scenes reference
> counting as I think you imply. They may also be used to help the
> collector to obtain a consistent view of the mutator's objects in
> spite of running in the background... I don't know whether the
> Leopard GC does that or not.
As I've stated, my opinions are formed from the publicly available
documentation and my (hair pulling) experiences over the last few
months. The long and the short of it is that Leopard's GC system
behaves unlike any other GC system I've used.
>
> (Incidentally, there is also a read barrier, which is used to help
> implement zeroing weak references; the compiler only generates that
> for variables marked __weak.)
>
> I think, perhaps, that it would be worth your while reading through
> the literature on garbage collection, as you might then understand
> the various trade-offs involved better.
>
>> In order for Leopard's GC system to function properly, the compiler
>> must be aware of all pointers that have been allocated by the GC
>> system so that it can wrap all uses of the pointer with the
>> appropriate GC notification functions (objc_assign*).
>
> Yep.
>
> [snip]
>
>> Realistically, to properly add __strong to a pointer, you need to
>> know if that allocation came from the garbage collector. This
>> information is essentially impossible to know a priori, so the only
>> practical course of action is to defensively qualify all pointers
>> as __strong.
>
> No. Cocoa GC mostly deals with objects (which may include Core
> Foundation objects). That's why the default assumption, which is
> that object pointers are strong, is enough for most situations.
There is nothing special about objects. I believe that this doesn't
quite hold true for the ObjC 2.0 64-bit API, but it still holds true
for the 32-bit API: Objects are nothing but pointers to structs.
Your typical
@interface MYObject : NSObject {
    void *ptr;
}
@end
essentially becomes:
typedef struct {
    Class isa;    /* the instance variables inherited from NSObject */
    void *ptr;
} MYObject;
The following is a working example of the key points of Objective-C,
and for all practical purposes, this is what your Objective-C object
gets turned into. It's literally possible to hack up a small Perl
script that gets you 60-70% of the way to a full-blown "Objective-C
Compiler":
#include <stdio.h>
#include <stdlib.h>

typedef struct { const char *title; } MYObject;

/* Each "method" is a plain C function whose first two arguments are the
   receiver (self) and the selector name (_cmd). */
void *alloc(MYObject *self, const char *_cmd) { return(calloc(1, sizeof(MYObject))); }
void *init(MYObject *self, const char *_cmd) { return(self); }
void setTitle(MYObject *self, const char *_cmd, const char *newTitle) { self->title = newTitle; }
const char *title(MYObject *self, const char *_cmd) { return(self->title); }

int main(int argc, char *argv[]) {
    MYObject *testObject = NULL;
    testObject = init(alloc(NULL, "alloc"), "init");
    setTitle(testObject, "setTitle:", "Object Title");
    const char *theTitle = title(testObject, "title");
    /* %p expects a pointer to void, hence the casts */
    printf("Object: %p title: %p, '%s'\n", (void *)testObject,
           (const void *)theTitle, theTitle);
    free(testObject);
    return(0);
}
[<johne...>] /tmp% gcc -o obj obj.c
[<johne...>] /tmp% ./obj
Object: 0x100120 title: 0x1fcc, 'Object Title'
You can even have the compiler create your object's ivars with the
@defs directive. In fact, it's possible and perfectly legal to create
a C function inside your @implementation with a prototype like
myCFunction(MYObject *self, SEL _cmd) and access your object's ivars
with "self->ivar" inside the function.
>
> That only changes if you have pointers of non-object types that
> happen to point to things that were allocated with the GC, *and only
> then* if they are stored in locations that are not scanned by
> default. This is an unusual situation, since few methods return
> things that are allocated by GC and that are not objects.
> -UTF8String is probably the most common example, but since you tend
> not to store the result of that method, there would rarely---if
> ever---be a problem.
As the above example illustrates, the entire issue of "object" vs.
"non-object" is a red herring. There is nothing special about
objects, nor anything special about ivars. The fact that the GCC
compiler attempts to 'automagically' detect which pointers are
__strong behind your back only obscures the issues at hand. Because
of this automatic promotion, it's easy to fall into a trap where
objects are somehow magical. Nothing could be further from the truth.
The fact of the matter is that the DEFAULT behavior for pointers in
Leopard's GC system is that they are ignored and do not point to live
data. I challenge anyone to find another GC system in which the
default behavior for a pointer is to be ignored, and what it points to
to NOT be considered part of the live set. However, through some
unspecified logic, SOME pointers are elevated to 'Points to live GC
data'.
I mean, seriously, can anyone conjure up a compelling reason why the
default behavior of a pointer is that it does not point to live data?
I can think of some infrequent special cases when I would want to turn
it off, but off by default unless you qualify it with __strong?
>
>> The consequence of using a pointer that is not properly qualified
>> as __strong is that the GC system may determine that the allocation
>> is no longer live and reclaim it, even if there is still a valid
>> pointer out there.
>
> Only if there is no copy of the pointer in any of the locations that
> are scanned by default (e.g. the stack, in registers, in global
> variables).
I'd put instantiated objects on that list.
>
>> It is also trivial to get wrong, and the only indication that
>> there's a problem is an occasional random error or crash.
>
> In most cases, because GC'd things are objects, it's trivial to get
> *right*.
>
> It's only in special cases, where you're using C pointer types to
> point to GC'd memory, that you need worry about this kind of thing.
Your choice of words makes me suspect that you consider an object
pointer, such as NSObject *, and a C pointer, à la void * or char *, to
be two distinctly different things. I believe I have shown that this
is not the case, and one can, in fact, consider them all to be
'void *' pointers for the purposes of reasoning.
When considered from the 'void *' perspective, I believe your argument
highlights my point: Pointers are pointers, and knowing which ones to
treat as 'special' is non-trivial and easy to get wrong.
>
>> I believe I have a succinct example that illustrates these issues:
>
> [snip]
>
>> I strongly suspect the pointer that UTF8String returns is a pointer
>> to an allocation from the garbage collector. In fact, by changing
>> the 'title' ivar to include __strong 'solves' the problem.
>
> Yes, that's your bug. It doesn't just 'solve' the problem, the lack
> of __strong here *is* the problem, but only because this is an ivar
> and not e.g. a function argument or a stack-based variable.
I don't disagree with you; 'technically' it's my bug, but this is my
point. Take a step back for a second and consider what you're saying:
The garbage collector has reclaimed the allocation that contains the
text for the string. From an object that the GC system considers to
be live. That contains a pointer that isn't 'hidden' by XOR or
whatnot from the GC system; it's a normal pointer to the allocation.
That the garbage collector just recycled because there are no
references to keep it live.
Can you name another garbage collector in which this is a
/programmer's/ error, and not a bug in the GC system? Reading over
this I almost have to chuckle at the absurdity of it. Yet when you
get right down to it, this is what is being advocated.
Now, trying to find 'bugs' such as this in running code is every
programmer's worst nightmare. The bug manifests itself only when the
GC system reclaims it, which is essentially at some completely random,
non-deterministic point in time in the future. There is essentially
nothing you can do to reproduce the bug.
Then take a look again at the method prototype:
- (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
Can you clearly and concisely articulate why the pointer returned from
this particular method requires a __strong qualifier? Remember, if
you get it wrong you sign yourself up for many long nights of trying
to track down some random, non-repeatable bug.
Then consider the following:
- (const char *)hexString;
- (const char *)hexString
{
char *hexPtr = NULL;
asprintf(&hexPtr, "0x%8.8x", myIvar);
return(hexPtr);
}
Now what? And whatever you do, DON'T cross the streams, or you risk
total protonic reversal.
>
>> But this points to a much bigger problem: anyone who has used
>> UTF8String and not qualified it as __strong has a race condition
>> just waiting to happen.
>
> No, because stack variables and registers are included in the set of
> GC roots.
Oddly, I don't find this reassuring. In fact, I think it might make
things worse. Why? This implies that the collector considers
anything on the stack that looks like a pointer to be a pointer that
should be considered live and followed. If this is the case, this has
the effect of automatically promoting all pointers on the stack to
__strong, and that's a problem. This masks pointer declaration
errors, and pointers that are missing __strong will work as a side
effect of this behavior instead of causing crashes.
Besides which, not everything lives on the stack.
>
>> This is but one example. I don't think I need to point out that
>> there are others. A lot of others. And most of them are non-
>> obvious. A consequence of all of this is that you must not pass
>> pointers that may have been allocated by the garbage collector to
>> any C function in a library. For example,
>>
>> printf("String: %s\n", [@"Hello, world!" UTF8String]);
>
> That code is fine. The reference is on the stack (or, before that,
> in the register that holds the return value of -UTF8String). It
> will be followed, so the memory won't be released until the printf()
> function has finished with it.
Again, you are correct in the sense that this is how the 10.5 GC
system works.
I still contend that this is a design flaw. If the pointer were
passed any other way, let's say via some mutex-guarded inter-thread
queue, it would require a __strong qualification. This works due to a
side effect of the GC system promoting ALL pointers it sees on the
stack to __strong. It's another "exception to the rule" to keep track
of.
This is why I believe Leopard's GC system is fundamentally flawed.
There are a lot of these little rules and one-offs you need to keep
track of, and hope that whoever wrote the code you're calling got it
all right too. The UTF8String example highlights just how easy it is
to forget to add a __strong qualifier in front of a pointer, and that
most of the time things will work just fine. Until you hit that
oddball corner case, and then at some point long after the initial
event that caused the problem has passed, things crash.
Those of you out there that think that these are non-issues, or "might
happen rarely"... please, knock yourself out and have fun. You too
can add "Reflexively knows the default address for the stack of the
first four threads and can unwind said stack frames by hand!" to your
resume. I'm just saying that I've been down this road and I'll gladly
take tracking down multithreaded deadlocks and hunting that last
missing release over this any day.
>
>> passes a GC allocated pointer to a C library function, which almost
>> assuredly does not have the proper write barrier logic in place to
>> properly guard the pointer.
>
> The write barrier is nothing to do with it. The write barrier is
> for inter-generational pointers, and possibly also to help the
> collector to scan in the background safely.
Well, you see... You'd think that, wouldn't you? But take a careful
look at the example I posted. Now, this is just a simple executable
built in the shell for example purposes, so none of the AppKit stuff
is fired up. The GC docs also say that the GC system is demand-driven
under these conditions and that AppKit kicks the GC system to spawn a
background collector thread (objc_startCollectorThread())... so, I
think it's safe to say that things are 'quiet' and nothing fancy is
going on in the background... and there are only a few lines, so we can
be reasonably sure there's no background, hidden mutation events that
a write barrier would normally catch... but follow the steps closely
(from the original example posted)
[gcConstTitle setTitle:"Hello, world!"];
[gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
[[NSGarbageCollector defaultCollector] collectExhaustively];
NSLog(@"GC test");
printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
[gcConstTitle title]);
printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
[gcUTF8Title title]);
return(0);
}
[<johne...>] GC% gcc -framework Foundation -fobjc-gc-only gc.m -o gc
[<johne...>] GC% ./gc
Setting title. Old title: 0x0, new title 0x1ed4 = 'Hello, world!'
Setting title. Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
2008-02-03 19:07:58.911 gc[6191:807] GC test
gcConstTitle title: 0x1ed4 = 'Hello, world!'
gcUTF8Title title: 0x1011860 = '??0?" '
[<johne...>] GC%
We can be reasonably sure that the pointer to the UTF8String is
'visible' before we call the collector, and there are no race conditions
happening. There are no mutations that a write barrier needs to
intercept going on. The gcUTF8Title object is clearly still 'live'
according to the GC system. The object clearly has the same pointer
in its ivar before and after the collection, yet the GC system
reclaimed the allocation that contained the string.
This has got to be the only GC system in which an object is live and
traceable from the roots, contains a pointer (that's not hidden or
anything else fancy) to an allocation that contains the text of the
string, and the GC system considers the string buffer to be dead, and
it's the fault of the programmer. Because the default behavior for
pointers is not that they point to live data, but that they are
ignored and not considered when tracing the heap. Pointers don't
point to things that are needed that often, do they? -
On Feb 4, 2008, at 4:14 PM, John Engelhart wrote:
> However, through some unspecified logic, SOME pointers are elevated
> to 'Points to live GC data'.
The logic isn't unspecified.
If a variable is of an object type or is of a pointer type with the
qualifier "__strong", it refers to something allocated -- and cleaned
up -- by the collector. Otherwise, it refers to something not
allocated by the collector.
> I mean, seriously, can anyone conjure up a compelling reason why the
> default behavior of a pointer is that it does not point to live data?
Yes. Distinguishing between pointers to collector-allocated objects
and non-collector-allocated objects ensures that the collector has far
less work to do and can do the work it has more efficiently, because
it can have more exact information about what portions of memory it
needs to check for strong references to objects.
It also goes a long way towards preventing false roots, which can help
keep down the working set of an application that uses garbage
collection.
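As a sketch of the false-root point, assuming a toy heap rather than anything in libauto: a conservative scan treats any root word whose value equals a block's address as a reference, while a precise scan only follows words that a cooperating compiler has marked as pointers. The names (RootWord, mark_from_roots, heap) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (not libauto): a "heap" of NBLOCKS blocks and a root area of
   machine words, each optionally tagged as a real pointer. */
#define NBLOCKS 4

static char heap[NBLOCKS][16];

typedef struct {
    uintptr_t value;
    int is_pointer;   /* type information a cooperating compiler could supply */
} RootWord;

/* Marks live blocks in live[] and returns how many were found.
   conservative != 0: every root word is a candidate pointer.
   conservative == 0: only words tagged as pointers are followed. */
static int mark_from_roots(const RootWord *roots, size_t nroots,
                           int conservative, int live[NBLOCKS])
{
    assert(roots != NULL || nroots == 0);
    int count = 0;
    for (int b = 0; b < NBLOCKS; b++)
        live[b] = 0;
    for (size_t i = 0; i < nroots; i++) {
        if (!conservative && !roots[i].is_pointer)
            continue;                       /* precise: skip non-pointers */
        for (int b = 0; b < NBLOCKS; b++) {
            if (roots[i].value == (uintptr_t)heap[b] && !live[b]) {
                live[b] = 1;
                count++;
            }
        }
    }
    return count;
}
```

An integer that merely happens to equal heap[2]'s address keeps that block alive under the conservative scan (a false root) but not under the precise one, which is the trade-off Chris is describing.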
> Your choice of words makes me suspect that you consider an object
> pointer, such as NSObject *, and a C pointer, ala void * or char *,
> to be two distinctly different things.
They can be; when running under GC, an object is assumed to be
allocated by the collector. An arbitrary buffer is not. If you want
to use an arbitrary "non-object pointer" type variable to refer to
something that is allocated by the collector (e.g. a buffer returned
from NSAllocateCollectable) you need to mark that variable with the
__strong type qualifier so it gets the same treatment as an object
type variable.
> Pointers are pointers, and knowing which ones to treat as 'special'
> is non-trivial and easy to get wrong.
In Objective-C it is relatively straightforward to tell which pointer
type variables to treat as objects. You have pointers to objects
which you can send arbitrary messages to, and pointers to things that
aren't objects which you can't send arbitrary messages to. The
compiler itself has to be able to tell the difference between them to
generate correct code, warnings, and errors. The syntax therefore
really isn't that ambiguous: If there has been an @interface or
@class declaration for it, it's an instance of a class; otherwise
it's something else.
In practice for many developers this isn't a significant issue. I
don't recall having seen any Cocoa code which used "char *" to store
an object pointer, for example. There are few places in idiomatic
Cocoa where you might commonly use a "void *" to store an object
pointer, and in those situations it's straightforward (and correct
under non-GC as well) to introduce a CFRetain of the object before
storing into the "void *" and a CFRelease of the object after the
"void *" is no longer relevant. (Under GC, CFRetain effectively adds
an extra root while CFRelease removes one.)
One of the places I do this is in code that presents a sheet, because
it returns control to the main run loop. If I have to pass an object
as the "(void *)context" parameter to the sheet invocation, I CFRetain
it first. Then in the sheet's did-dismiss selector (if it has one) or
did-end selector (if there's no did-dismiss), I CFRelease the object.
This ensures that "the sheet" acts as a root for the object in case
it's transient and just being used to pass information around.
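A toy model of "CFRetain effectively adds an extra root", with hypothetical names and no relation to the real CoreFoundation or libauto code: a block survives a collection if it is reachable or its external retain count is non-zero, which is what keeps a sheet's void * context alive between the retain and the release.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy block: survives a collection if it is referenced from
   ordinary roots OR has a non-zero external retain count. */
typedef struct {
    int referenced;     /* reachable from ordinary roots this cycle */
    int retain_count;   /* external retains, e.g. while held only as void * */
    int alive;
} ToyBlock;

static void toy_retain(ToyBlock *b)  { assert(b->alive); b->retain_count++; }
static void toy_release(ToyBlock *b) { assert(b->retain_count > 0); b->retain_count--; }

/* Sweep: reclaim anything that is neither referenced nor retained. */
static void toy_collect(ToyBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (blocks[i].alive && !blocks[i].referenced && blocks[i].retain_count == 0)
            blocks[i].alive = 0;
}
```

In the sheet pattern above, toy_retain corresponds to the CFRetain before starting the sheet and toy_release to the CFRelease in the did-end or did-dismiss handler.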
> [gcConstTitle setTitle:"Hello, world!"];
> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
>
> [[NSGarbageCollector defaultCollector] collectExhaustively];
> NSLog(@"GC test");
>
> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
> [gcConstTitle title]);
> printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
> [gcUTF8Title title]);
If you build this example non-GC, and you replace
[[NSGarbageCollector defaultCollector] collectExhaustively];
with
[pool drain];
it would be just as incorrect. The object backing that UTF-8 string
is no longer live, therefore you can't trust that the UTF-8 string
itself is valid.
I appreciate that the design of the Objective-C garbage collector in
Mac OS X 10.5 is not what you might be used to -- especially in that
it *does* take advantage of the fact that you can treat variables
typed as "object" differently from variables typed as "non-object
pointer" in Objective-C -- but it really doesn't have the fundamental
or design flaws that you assert it does.
-- Chris -
Hi John,
I think I have an idea of how your expectations differ from the design
goals of "auto". You're expecting the GC to take over memory
management for the entire process, whereas libauto is designed to
handle the memory management of only the Objective-C side of things
(that is, objects). That is, libauto is _not_ a replacement for
malloc()/free(), but only for -retain/-release (simplistically). I
presume Apple chose this so that existing C libraries using malloc()/
free() would work identically.
For GC to work with ObjC objects whilst keeping vanilla malloc
behavior unchanged, the system has to make some assumptions. This
brings up issues where objects and void* meet. As the documentation
says (and as you've discovered), the rules are:
• For GC-allocated memory, void* references stored in globals and on
the stack are considered strong.
• For GC-allocated memory, void* references stored in objects are
ignored by default.
• For malloc-allocated memory, void* references stored anywhere are
unchanged.
(By void* I mean any non-id pointer, including const char *.)
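The rules listed above can be modelled with a tiny mark phase, assuming (for illustration only) that roots such as the stack and globals are always followed, while a collected object's slots are followed only when its layout marks them strong. ToyNode, strong_mask, toy_mark and toy_survives are hypothetical; the usage mirrors the gcUTF8Title case from this thread, where the object is reachable but its unqualified pointer to the buffer is not traced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4

/* Hypothetical collected block with a per-slot "strong" layout mask. */
typedef struct ToyNode {
    struct ToyNode *slots[MAX_SLOTS];
    unsigned strong_mask;   /* bit i set: slots[i] is traced, like a __strong ivar */
    int marked;
} ToyNode;

/* Mark n and everything reachable through its strong slots. */
static void toy_mark(ToyNode *n)
{
    if (n == NULL || n->marked)
        return;
    n->marked = 1;
    for (int i = 0; i < MAX_SLOTS; i++)
        if (n->strong_mask & (1u << i))
            toy_mark(n->slots[i]);   /* slots without the bit are ignored */
}

/* Clear all marks, mark from every root, and report whether target survived. */
static int toy_survives(ToyNode **nodes, size_t nnodes,
                        ToyNode **roots, size_t nroots, const ToyNode *target)
{
    assert(target != NULL);
    for (size_t i = 0; i < nnodes; i++)
        nodes[i]->marked = 0;
    for (size_t i = 0; i < nroots; i++)
        toy_mark(roots[i]);          /* roots are always followed */
    return target->marked;
}
```

With strong_mask left at zero the buffer is reclaimed even though the live object still points at it; setting bit 0 (the analogue of declaring the ivar __strong) keeps it alive.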
I don't know why the second rule was chosen (having the GC ignore
undecorated void* ivars), and I haven't had enough experience with it
to know if it's a good or bad thing. But you certainly need to be
aware of it if you're used to a purely-GC environment, as auto
provides a mixed malloc/GC environment.
So there are new conventions for Leopard's GC. I'll attempt to start a
list here, but don't take it as fact until there has been some peer
review.
• If you get a non-object pointer from somewhere (e.g. returned by
-[NSString UTF8String]), you need to know if it has been allocated in
the GC zone or the malloc zone. (The documentation "should" tell you
which.)
- If it's in the GC zone, you need to store the pointer in a strong
instance variable if you want it to stick around for more than the
current stack frame.
- If it's in the malloc zone, store it anywhere, but it's your
responsibility to free() it.
(Anything else?)
Jonathon Mah
<me...> -
On 5 Feb 2008, at 00:14, John Engelhart wrote:
> I had such high hopes for Cocoa's GC system because once you're
> spoiled by GC, it's hard to go back.
>
> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
> system has not been nearly as pleasant. It has been outright
> frustrating, and it's reached the point where I consider the system
> untenable.
Honestly, this point has now been answered over and over.
I think it comes down to the fact that you have failed to appreciate
that Cocoa GC is designed for easy use with *OBJECTS*. If you're
using it with objects, it "just works".
The only problem you've been able to demonstrate is in a situation
where you are doing something that would never have worked under the
autorelease model. And you've been told that you need to add __strong
in that case, *because* the compiler won't automatically pick that
pointer out as being a pointer to GC'd memory.
All of the rest of your post is worry about nothing. You can't create
any of the problems that you claimed to be concerned about (e.g.
collections destroying temporary objects), and you had misinterpreted
the write barriers as something that they weren't (if you have the
Garbage Collection book, I don't know why you did that; it's explained
quite clearly in there).
As far as objects "not being special", they *are* special, in that the
compiler generates layout information and method signatures for them.
AFAIK the layout information (albeit in a slightly different format)
is used by the garbage collector when scanning objects, which is
another reason that you need to use __strong on instance variables if
they point to non-object garbage collected memory. FYI, the Boehm
collector can also take advantage of layout information, so you could
create the same issue with that collector too; the only reason that
you don't often see it is that programmers are generally too lazy to
specify the pointer layout of their memory blocks and just let the
collector conservatively scan everything, which, of course, is slower
and more error prone (i.e. greater likelihood of leaks).
As for your example:
> - (const char *)hexString
> {
> char *hexPtr = NULL;
> asprintf(&hexPtr, "0x%8.8x", myIvar);
> return(hexPtr);
> }
That method is just badly designed. You shouldn't be returning
malloc()'d memory from a method like that; at the very least you
should name the method to indicate that you're doing something funny
with memory ownership, e.g.
- (void)getHexString:(char **)mallocedResult
{
    /* asprintf() takes a char **; the caller must free() the result */
    asprintf(mallocedResult, "0x%8.8x", myIvar);
}
but better yet, why not do
- (NSString *)hexString
{
return [NSString stringWithFormat:@"0x%8.8x", myIvar];
}
or if you absolutely must return a const char * pointer,
- (const char *)hexString
{
return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
}
which has the benefit of working exactly like -UTF8String. If you
*really* wanted, you could mess about with NSAllocateCollectable().
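For comparison, the same "caller owns the result" convention as -getHexString: in plain C might look like the following. It is a sketch rather than code from the thread: asprintf() is a BSD/GNU extension, so this version measures with snprintf() and allocates with malloc(), and get_hex_string is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: formats value as "0x%8.8x" into a freshly
   malloc()'d buffer. Returns 0 on success; the caller must free(*out). */
static int get_hex_string(unsigned int value, char **out)
{
    assert(out != NULL);
    int len = snprintf(NULL, 0, "0x%8.8x", value);   /* measure first */
    if (len < 0)
        return -1;
    *out = malloc((size_t)len + 1);
    if (*out == NULL)
        return -1;
    snprintf(*out, (size_t)len + 1, "0x%8.8x", value);
    return 0;
}
```

The out-parameter and the explicit free() make the ownership transfer visible at the call site, which is the point of renaming the method.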
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
> On 5 Feb 2008, at 00:14, John Engelhart wrote:
>
>> I had such high hopes for Cocoa's GC system because once you're
>> spoiled by GC, it's hard to go back.
>>
>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
>> system has not been nearly as pleasant. It has been outright
>> frustrating, and it's reached the point where I consider the system
>> untenable.
>
> Honestly, this point has now been answered over and over.
>
> I think it comes down to the fact that you have failed to appreciate
> that Cocoa GC is designed for easy use with *OBJECTS*. If you're
> using it with objects, it "just works".
You misunderstand what Objective-C is, and how it works. "Objects" is
synonymous with "Structs".
> As far as objects "not being special", they *are* special, in that
> the compiler generates layout information and method signatures for
> them.
Read the above: "object" is synonymous with "struct". The "layout" of
an object is identical to the "layout" of a struct. This point is so
basic and fundamental to Objective-C and how things work at a low
level that it seriously calls into question the accuracy of the rest
of your conclusions. If you do not understand fundamentals such
as this, I do not see how you can possibly predict the effects and
implications of pointers in Leopard's GC system.
> AFAIK the layout information (albeit in a slightly different format)
> is used by the garbage collector when scanning objects, which is
> another reason that you need to use __strong on instance variables
> if they point to non-object garbage collected memory.
Again, this clearly indicates that you do not understand the
fundamentals at hand. Your reasoning is faulty (in fact, it's
outright wrong). You have, in essence, made my point: How these
things work, and their subtle interactions, are CRITICAL to the
correct operation of the GC system. If you do not understand them,
you /can not/ possibly use the GC system correctly. You have,
LITERALLY, just signed yourself up to tracking down a GC related bug
in your code.
You should review the relevant files from the GCC compiler,
specifically gcc-5465/gcc/objc/objc-act.c from the 'gcc-5465.tar.gz'
distribution.
Thus spoke the documentation (documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcAPI.html):
__strong essentially modifies all levels of indirection of a pointer
to use write-barriers, except when the final indirection produces a
non-pointer l-value.
For example:
@interface GCTest : NSObject {
    __strong void *ptr;
}
@end

@implementation GCTest
- (void)setPtr:(void *)newPtr
{
    ptr = newPtr;
}
@end
__strong does not modify any layout information. At compile time,
when the compiler is working with a pointer that is qualified as
__strong, and the location that contains the pointer is written to /
updated / assigned (i.e., ptr = newPtr), the compiler rewrites the
assignment to:
- (void)setPtr:(void *)newPtr
{
    objc_assign_ivar(newPtr, self, offsetof(ptr)); /* offsetof(ptr): shorthand for the byte offset of ptr within the object */
}
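For reference, a generic generational write barrier in the textbook sense (the kind Alastair describes as tracking inter-generational pointers) can be sketched as below. This is an assumption-laden toy, not Apple's objc_assign_* code: every pointer store goes through a function that performs the store and records slots in old objects that now point at young ones, so a minor collection can treat those slots as extra roots without rescanning the old generation. ToyObj, toy_assign and the fixed-size remembered set are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define REMEMBERED_MAX 16

/* Hypothetical object with a generation flag and one pointer slot. */
typedef struct ToyObj {
    int is_old;                 /* survived an earlier collection */
    struct ToyObj *field;
} ToyObj;

/* Remembered set: slots in old objects that point at young objects. */
static ToyObj **remembered[REMEMBERED_MAX];
static size_t remembered_count;

/* The barrier: do the store, then record any old -> young pointer. */
static void toy_assign(ToyObj *owner, ToyObj **slot, ToyObj *value)
{
    *slot = value;
    if (owner->is_old && value != NULL && !value->is_old) {
        assert(remembered_count < REMEMBERED_MAX);
        remembered[remembered_count++] = slot;
    }
}
```

Note that this kind of barrier says nothing about whether a slot is traced at all; that is decided by which stores the compiler routes through the barrier and by the object's layout.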
>
> As for your example:
>
>> - (const char *)hexString
>> {
>> char *hexPtr = NULL;
>> asprintf(&hexPtr, "0x%8.8x", myIvar);
>> return(hexPtr);
>> }
>
[snip]
> or if you absolutely must return a const char * pointer,
>
> - (const char *)hexString
> {
> return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
> }
>
> which has the benefit of working exactly like -UTF8String. If you
> *really* wanted, you could mess about with NSAllocateCollectable().
Oh... my...
I could not possibly have asked for a better example of just how easy
it is to get this wrong.
Let those of you who are considering using Leopard's GC system use this
as a warning of how dangerous its use is in practice.
First, understand that my original example, the one in which the GC
system snatched away the live allocation, did use
NSAllocateCollectable. Your example, using UTF8String, uses
NSAllocateCollectable as well. We can infer this by the behavior
exhibited by the GC system when qualifying pointers that store the
results from these methods as __strong, which prevents the collector
from reclaiming the allocation. Thus, by inference, the pointer to
the buffer that contains the string's text must ultimately come from
NSAllocateCollectable.
Let's start with NSAllocateCollectable(). The prototype for
NSAllocateCollectable, from NSZone.h, is as follows:
FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger size, NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;
NOTE WELL: the __strong qualifier for the pointer.
Now, there is no formal grammar of Objective-C 2.0 published, but it
is a reasonable assumption that "__strong" is in the same group of
type qualifiers as "const" and "volatile". This makes __strong
subject to the same ANSI-C rules governing the use of type qualifiers,
including promotion and assignment rules. Returning to the method
definition of UTF8String, we find the following in NSString.h:
- (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
Since the pointer that UTF8String returns is provably from
NSAllocateCollectable, this prototype has /DISCARDED/ the type
qualifier of __strong.
In (brief, simplified) summary, ANSI-C says that pointer assignments
from a "lesser qualified" type can be made to a "more qualified" type,
but not the other way around. For example:
char *cp;
const char *ccp;
It's perfectly legal to do the following:
ccp = cp;
The reverse, however, is not necessarily true:
cp = ccp;
This will result in a warning issued by the compiler that the type
qualification has been discarded. Now, modifying the example slightly
(for brevity, I'm going to gloss over the details of why moving const
past the pointer changes things):
char *cp;
char * const cpc;
The following results in an error:
cpc = cp;
Specifically: "error: assignment of read-only variable 'cpc'", and the
following is legal:
cp = cpc;
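The direction of the rule can be checked with a short C sketch. It uses only const, since __strong was an Apple GCC extension that standard C compilers do not accept; count_chars and count_mutable_buffer are hypothetical helpers for illustration.

```c
#include <string.h>

/* Takes a more-qualified pointer. Callers may pass a plain char *,
   because adding a qualifier to the pointed-to type is an implicit
   conversion that needs no cast and draws no diagnostic. */
static size_t count_chars(const char *ccp)
{
    return strlen(ccp);
}

/* Demonstrates the legal direction: char * -> const char *. */
static size_t count_mutable_buffer(char *cp)
{
    const char *ccp = cp;   /* legal: qualifier added (ccp = cp) */
    /* cp = ccp;  would violate a constraint and require a diagnostic,
       because it discards the const qualifier */
    return count_chars(ccp);
}
```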
While there are certain qualifier-specific semantics, I think we can
all reasonably agree that dropping the "__strong" qualifier should be
an error, as the consequence of discarding it is that the garbage
collector will lose its ability to determine the liveness of that
pointer, resulting in difficult-to-find bugs.
QED, assigning a __strong-qualified pointer to a non-__strong-qualified
pointer is an error, per ANSI-C standard type qualifier promotion rules.
Naturally, one can override this behavior by typecasting the pointer
assignment, but by doing so, you, the programmer, have explicitly told
the compiler that the type qualification "does not apply in this
case". This kind of typecast should only be done if you
understand all the consequences of the resulting cast, and never
to just silence the compiler.
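As a rough C illustration of that explicit override, with const standing in for __strong: the cast below compiles silently under default settings, while GCC's -Wcast-qual flag reports it. strip_const is a hypothetical name for illustration.

```c
/* Explicitly casting away a qualifier. The compiler accepts this without
   complaint unless -Wcast-qual is enabled. Writing through the result is
   only defined behavior if the object originally pointed to was not
   declared const, which is exactly the kind of consequence the programmer
   takes responsibility for by writing the cast. */
static char *strip_const(const char *ccp)
{
    return (char *)ccp;  /* qualifier removed on purpose */
}
```

Here the caller passes a pointer to a non-const array, so writing through the returned pointer is well defined; passing a string literal instead would compile just as quietly and be undefined behavior.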
Referring back to my original example in my original post, in which I
store the pointer from a call to UTF8String to a "const char *title"
ivar pointer that the garbage collector later considers dead and
recycles, it is provable that the "__strong" qualifier most certainly
does apply to the pointer returned.
Therefore, by ANSI-C language rules, my assignment of the pointer
returned by UTF8String is legal, and the declaration of UTF8String
implicitly states that "the GC problem will be taken care of because
the programmer of this function has EXPLICITLY typecasted the __strong
qualifier away."
QED, my use of the UTF8String pointer is bug free and legal by ANSI-C
rules. That this later causes the GC system to reclaim the memory
pointed to by this pointer is due to a bug in the prototype of UTF8String.
By type promotion rules, the prototype for UTF8String should be:
- (const char * __strong)UTF8String; // Convenience to return null-terminated UTF8 representation
And, following ANSI-C rules, my assignment to "const char *title;"
would result in a compiler error by the "no stronger qualified to
lesser qualified" doctrine. This is exactly as it should be, because
as I have demonstrated, dropping that qualifier results in the GC
system reclaiming live memory.
Because of the design of Leopard's GC system, it is the PROGRAMMER'S
responsibility to INFER when a pointer should be __strong qualified.
Correctly inferring, a priori, which pointers require __strong
qualification is an intractable problem, and certainly should not be
left up to the programmer to "guess" correctly. Getting this CRITICAL
qualifier wrong results in "race condition"-like problems. Programs
will appear to operate correctly under light load, but as they are
pushed harder and harder, the conditions that expose these race
conditions are guaranteed to occur, resulting in nearly impossible to
find bugs and crashes.
In practice, as I have found, this results in programs that operate
problem free during development, but fail in unexplainable ways "in
the real world", with all the symptoms of race condition induced bugs.
And now I will show how Leopard's GC system is, in fact, FUNDAMENTALLY
and fatally flawed.
Since the design of Leopard's GC system has hoisted critical aspects of
its functioning into the compiler, and in turn into the code emitted by
the compiler, it has the unfortunate effect of expanding the places
where GC bugs can pop up. Contrast this with a dynamic shared
library: when a bug is fixed in a dynamic shared library, programs
using that shared library do not have to be recompiled to take
advantage of the fix. Leopard's GC system is akin to using
static libraries: if a bug is found in the library, every single
program that links to it must be recompiled. The use of
static libraries is considered such poor practice that Sun no longer
supports their use, for obvious reasons.
As shown, proper application of ANSI-C type qualifier propagation
rules would have eliminated the storing of a __strong qualified
pointer to a non-__strong qualified pointer by refusing to compile the
program due to errors. Instead, the compiler has allowed the code to
compile, both warning and error free, even though the code will result
in obvious problems later on.
However, the problem does not lie just in the definition of the
UTF8String method, the problem is with the compiler itself. According
to ANSI-C standard, updating the definition of UTF8String to include
__strong should result in an error being generated when its result is
assigned to a "const char *" pointer declaration.
No such error is generated.
In fact, the compiler appears to be incapable of correctly following
ANSI-C type qualifier rules regarding __strong. For example:
---
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
char *cp;
const char *ccp;
char * const cpc;
__strong char *scp;
char * __strong cps;
ccp = cp;
cp = ccp; // Line 12
cpc = cp; // Line 14
cp = cpc;
scp = cp;
cp = scp;
cps = cp;
cp = cps;
*cp = 'X';
*ccp = 'X'; // Line 24
*scp = 'X';
}
[<johne...>] /tmp% gcc -fobjc-gc-only -c test.m
test.m: In function 'main':
test.m:12: warning: assignment discards qualifiers from pointer target type
test.m:14: error: assignment of read-only variable 'cpc'
test.m:24: error: assignment of read-only location
---
Type qualifier rules are correctly followed for the 'const' qualifier,
but not a single warning or error is given for what are clearly type
qualification errors according to ANSI-C rules with regard to __strong
qualified pointers. It therefore follows that the compiler, by not
throwing an error, is in fact generating buggy code since such
assignments are not legal.
As demonstrated, and as should be obvious, improperly discarding
the __strong qualifier WILL result in the GC system reclaiming live
memory. In practice, this results in code that appears to run fine
during development, but due to the "race condition"-like nature of
improper reclamation of live memory by the GC system, "real world"
code tends to be buggy and unstable, and crashes for mysterious reasons.
Since the compiler gives no warning, nor error, for discarding the
__strong qualifier when it should, the compiler is emitting code that
will cause GC errors. It's impossible to gauge how frequent this is
in practice, but I don't think anyone will disagree that the sheer
volume of code involved in Cocoa effectively guarantees that these
errors are present.
Once one accepts that these errors are present, one must implicitly
accept that when using Leopard's GC system, it is just a matter of time
until the right conditions are present to cause accidental reclamation
and its associated problems. Therefore, by using Leopard's GC system,
you are guaranteeing that you will have random bugs that are difficult,
if not outright impossible, to find.
QED, Leopard's GC system is fundamentally and fatally flawed, and its
use should be actively discouraged. Due to the nature and severity of
improperly dropping the __strong qualifier, all code generated by the
current compiler (in GC mode) must be considered suspect and,
pragmatically, discarded.
While I'm sure you thought your retort was clever, you have in fact
underlined my point that Leopard's GC system is deceptively
difficult, hard to master, and trivially easy to get wrong.
Unfortunately, I was not very clear in my original message regarding
these problems. To those of you who have pointed out that, according
to the docs, it is an error on my part that I did not qualify my
pointer with __strong, please consider for a moment that I am writing
this after several months of using Leopard's GC system. You are all
right. In theory, by following the docs, this all works.
This is not the point I was trying to make.
In theory, this all works. In practice, it does not, and it's easy to
get wrong. That is my point.
As I believe I have shown in this message, the design of Leopard's GC
system is such that it essentially guarantees you're going to get it
wrong at some point. It gleefully allows you to compile code which
will create problems within your application, and does so without
warnings or errors. The fact that the compiler violates ANSI-C
rules by discarding the __strong qualifier guarantees that some code,
somewhere, is going to get it wrong and compile anyway.
If one were to speculate as to what the behavior of the end result of
all of this is, one would figure that "things will mostly work, except
on the rare occasion when they don't". I'm here to tell you that
after four months of this, the "rare occasion" is much more frequent
than you think.
The consequence of such tight integration with the compiler for GC
support dooms code to the same problems that haunt static library
linked code. There will be no bug fixes for your compiled code. No
later version of anything will correct the problems forever frozen in
your binary. As stated, some vendors no longer support linking to
static libraries because of problems like this, for obvious reasons.
I'm a huge fan of GC. Those of you who have used the Boehm GC system
know how easy it is to get spoiled by GC in C and say "Never again!"
to manual memory management. Those of you reading this will have to
draw your own conclusions as to the validity of my claims. My
observations are not the result of theoretical speculation from
glancing through the GC docs; they are rooted in attempting to use it
in the real world, on real, non-toy problems, and I'm sharing with you
the pain I've experienced.
Those of you who've read the examples and think "There's no way I will
ever slip up and miss a __strong qualification," then you're good to
go. Anyone else who thinks "Well, I might, but rarely.." should
understand the full ramifications of what happens when you slip up.
This class of bugs is orders of magnitude more difficult to find and
fix than any retain/release bug you've ever had to deal with. In
fact, I'll go so far as to say that multi-threaded locking heisenbugs
are easier; at least then you've got a pretty good idea of where the
bug originates and can concentrate on finding the rare corner case
that triggers it.
As non-trivial, real world examples, consider the following:
(From "Garbage Collection Programming Guide", Core Foundation section:)
o NULL, kCFAllocatorDefault, and kCFAllocatorSystemDefault specify
allocation from the garbage collection zone.
By default, all Core Foundation objects are allocated in the garbage
collection zone.
- (NSString *)someMethod
{
NSUInteger finalStringLength = 1024; // Example only
NSString *copyString = NULL;
char * __strong restrict copyBuffer = NULL;
copyBuffer = NSAllocateCollectable(finalStringLength, 0);
/* Since this is just an example, the part that fills the contents of
copyBuffer with text is omitted */
copyString =
NSMakeCollectable
((id)CFStringCreateWithCStringNoCopy(kCFAllocatorDefault, copyBuffer,
kCFStringEncodingUTF8, kCFAllocatorNull));
/* kCFAllocatorNull = This allocator is useful as the
bytesDeallocator in CFData or contentsDeallocator in CFString where
the memory should not be freed. So.. Don't call free() on our
NSAllocateCollectable buffer, which is an error. */
return(copyString);
}
You see where the bug is, right? (For those wondering 'Why CFString...?':
it's much faster, with no message dispatch overhead. You get the same
effect with initWithBytesNoCopy:length:encoding:freeWhenDone:.)
How about this:
- (id *)copyOfObjectArray:(id *)originalObjects length:(NSUInteger)length
{
id *newObjectArray = NULL;
newObjectArray = NSAllocateCollectable(sizeof(id) * length,
NSScannedOption);
memcpy(newObjectArray, originalObjects, sizeof(id) * length);
return(newObjectArray);
}
Does this contain a bug? And if so, where in "Garbage Collection
Programming Guide" or "NSGarbageCollector Class Reference" does it
indicate that this is a bug?
The "Garbage Collection Programming Guide" and "NSGarbageCollector
Class Reference" documentation say this is "no problem", or at least
don't say that you shouldn't do this? Wonder why it's crashing
randomly then. The allocation has the NSScannedOption, so the garbage
collector is obviously scanning the memory.. and it's dealing with
objects... which are 'special'.. so?
Hint: This is buggy as hell, which should be intuitively obvious to
anyone who's read the GC documentation listed above.
Actually, this is a pretty good test, I think. If, after reading the
code snippet and the two GC docs above, you can't spot the
bug, you probably shouldn't be using Leopard's GC system, because I
guarantee you this is just one of many land mines just waiting for you
to discover.
If you really want to know the answer: nm -mg /usr/lib/libobjc.dylib |
grep mem (The 'obvious' in the above hint is satirical, obviously) -
On 6 Feb 2008, at 09:39, John Engelhart wrote:
> On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
>
>> I think it comes down to the fact that you have failed to
>> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
>> If you're using it with objects, it "just works".
>
> You misunderstand what Objective C is, and how it works.
No, I don't. I know *exactly* how Objective-C works and what it is.
Your repeated assertion to the contrary, like much of the rest of your
posts on this topic, couldn't be more wrong.
I don't think arguing further with you on-list will be productive, and
more to the point it will annoy other list members.
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Feb 4, 2008, at 8:21 PM, Chris Hanson wrote:
> On Feb 4, 2008, at 4:14 PM, John Engelhart wrote:
>
>> However, through some unspecified logic, SOME pointers are elevated
>> to 'Points to live GC data'.
>
> The logic isn't unspecified.
>
> If a variable is of an object type or is of a pointer type with the
> qualifier "__strong", it refers to something allocated -- and
> cleaned up -- by the collector. Otherwise, it refers to something
> not allocated by the collector.
Exactly. So, in my example, since the buffer is allocated by the
collector (as demonstrated by the collector not reclaiming the
allocation when it is qualified as __strong), and since we can be
reasonably sure that allocation came from NSAllocateCollectable, which
by its prototype returns void * __strong, my example is 100% bug free.
Unless someone has explicitly typecast the __strong qualifier away
somehow, but that would be a bug on their part because, as you've
said, __strong refers to something that's allocated by the collector,
which the pointer from UTF8String clearly is.
Thankfully these rules are super simple and virtually impossible to
get wrong.
So, as you've said, __strong refers to allocations from the collector,
pointers that can be traced back to the collector have a __strong
qualifier, which no one will casually discard because it will lead to
the collector losing visibility and creating difficult, hard to find
bugs, my const char * pointer assignment is promoted to strong, or by
ANSI-C type qualification rules causes a compiler error for improperly
discarding the __strong qualifier on assignment, because the
NSAllocateCollectable collector allocation that is returned by
UTF8String is marked __strong, as its method prototype clearly shows:
- (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
... Oh... Well, at least the bug isn't mine, the bug is in the
prototype for UTF8String, which has erroneously dropped the __strong
qualifier from an NSAllocateCollectable collector allocation.
Do I file a bug at this point? Do I also file a bug with everyone
who's used the compiler in GC mode and called this method and let them
know that due to a buggy declaration in a header, they may experience
issues with the pointer returned by UTF8String that may lead to data
corruption because the compiler will not emit the write barriers
required for proper operation of allocations from the
collector?
I'm really glad this is just a hypothetical problem and doesn't happen
in practice, like this UTF8String example demonstrates.
>
>> I mean, seriously, can anyone conjure up a compelling reason why
>> the default behavior of a pointer is that it does not point to live
>> data?
>
> Yes. Distinguishing between pointers to collector-allocated objects
> and non-collector-allocated objects ensures that the collector has
> far less work to do and can do the work it has more efficiently,
> because it can have more exact information about what portions of
> memory it needs to check for strong references to objects.
While you are undeniably right, your justification is specious.
Like security, the proper way to analyze the problem is not from the
perspective of "if everything goes right", but "what are the
consequences on failure."
From this perspective, it becomes a question of "What are the
consequences when someone forgets to add __strong? How often do
programmers make mistakes? How likely is someone to make this
mistake? Are there robust, automated checks in place to make sure
this doesn't happen?"
The consequences of inadvertently forgetting to add a __strong
qualifier when you should are likely to result in random data loss and
mysterious crashes, all of which are nearly impossible to trace back
to the root cause due to the fact that the problem occurs at some
later point in time, far from the source, due to the nature of how the
collector works.
Do you see how ridiculous your justification is when stated from this
perspective? Can you think of a group of programmers who will take
"Fast, but buggy and unstable, and impossible to debug" over "Slow,
but rock solid"?
Personally, I don't care how f'ing fast the thing is if it essentially
guarantees instability. I've got better things to do with my time
than spend days trying to find the cause of some random, non-repeatable
crash due to some allocation problem. Ironically, that is the very
thing GC is supposed to prevent.
> It also goes a long way towards preventing false roots, which can
> help keep down the working set of an application that uses garbage
> collection.
>
>> Your choice of words makes me suspect that you consider an object
>> pointer, such as NSObject *, and a C pointer, ala void * or char *,
>> to be two distinctly different things.
>
> They can be; when running under GC, an object is assumed to be
> allocated by the collector. An arbitrary buffer is not. If you
> want to use an arbitrary "non-object pointer" type variable to refer
> to something that is allocated by the collector (e.g. a buffer
> returned from NSAllocateCollectable) you need to mark that variable
> with the __strong type qualifier so it gets the same treatment as an
> object type variable.
By ANSI-C type qualification rules, anything that's returning a
pointer from NSAllocateCollectable must be returning that pointer with
the __strong qualification. Considering the consequences of dropping
that qualification is likely to result in buggy, unstable code,
overriding that qualifier by typecasting it out should not be done
lightly. There's a reason why '-Wcast-qual' exists.
>
>> Pointers are pointers, and knowing which ones to treat as 'special'
>> is non-trivial and easy to get wrong.
>
> In Objective-C it is relatively straightforward to tell which
> pointer type variables to treat as objects. You have pointers to
> objects which you can send arbitrary messages to, and pointers to
> things that aren't objects which you can't send arbitrary messages
> to. The compiler itself has to be able to tell the difference
> between them to generate correct code, warnings, and errors.
This is flat out wrong.
[<johne...>] /tmp% cat gc_str.m
#import <Foundation/Foundation.h>
int main(int argc, char *argv[]) {
NSString *aString = NULL;
void *ptr = NULL;
aString = [NSString stringWithString:@"Hello, world"];
ptr = aString;
NSLog(@"ptr '%@', description '%@'", ptr, [ptr description]);
}
[<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -o gc_str -g gc_str.m
gc_str.m: In function 'main':
gc_str.m:10: warning: invalid receiver type 'void *'
[<johne...>] /tmp% ./gc_str
2008-02-06 07:43:06.732 gc_str[17181:807] ptr 'Hello, world', description 'Hello, world'
[<johne...>] /tmp%
Having the class type allows some extra compile-time diagnostics to
take place, such as warnings about sending messages that aren't
declared in the class's @interface, but this is nothing but sugar for
you and me. Awfully useful sugar to you and me, no doubt about it, but
sugar nonetheless.
As the code above shows, your assertion that "The compiler itself has
to be able to tell the difference between them (pointers) to generate
correct code" is obviously wrong.
> The syntax therefore really isn't that ambiguous: If there has
> been an @interface or @class declaration for it, it's an instances
> of a class, otherwise it's something else.
No! This is utterly wrong.
Look at the code above. By your logic, it is incapable of working
because the compiler doesn't know it's an object.
This is not some minor point to be glossed over. It forms a core part
of this entire argument.
If you do not understand the fact that "NSString *string" is equal to
"void *string", you can not understand the complete ramifications of
what the type qualifier __strong is doing (or not doing).
Once you understand that "NSString *" and "void *" are the same thing,
just a pointer, you are forced to ask "Where is the __strong qualifier
magically coming from?"
And before you can answer that, you are forced to ask "Wait wait wait,
why did the compiler allow an unqualified void * pointer to be
assigned a more qualified void * pointer, against ANSI-C type
qualifier rules?"
This will be quickly followed by ".. and by silently dropping the
__strong qualifier, that means pointer assignments which are critical
to the proper operation of the collector are getting silently
discarded left and right."
You'll know it when you get it because this will be followed by a
quick mental estimation of just how often this error is occurring in
the entire code base, along with the sensation of the floor dropping
out from underneath you while simultaneously feeling what can only be
described as a baseball bat being cracked over the back of your head.
In that slow motion, car crash time dilation effect, you'll notice
yourself slowly uttering the words:
"Oh... Shit..."
>
> In practice for many developers this isn't a significant issue. I
> don't recall having seen any Cocoa code which used "char *" to store
> an object pointer, for example. There are few places in idiomatic
> Cocoa where you might commonly use a "void *" to store an object
> pointer, and in those situations it's straightforward (and correct
> under non-GC as well) to introduce a CFRetain of the object before
> storing into the "void *" and a CFRelease of the object after the
> "void *" is no longer relevant. (Under GC, CFRetain effectively
> adds an extra root while CFRelease removes one.)
>
> One of the places I do this is in code that presents a sheet,
> because it returns control to the main run loop. If I have to pass
> an object as the "(void *)context" parameter to the sheet
> invocation, I CFRetain it first. Then in the sheet's did-dismiss
> selector (if it has one) or did-end selector (if there's no did-
> dismiss), I CFRelease the object. This ensures that "the sheet"
> acts as a root for the object in case it's transient and just being
> used to pass information around.
No, what you're obviously doing is compensating for a buggy and flawed
GC system which is randomly reclaiming live data, and you're hacking
around the root problem. one. pointer. at. a. time. From my
experience, this only comes after hours, usually days, of debugging,
trying to find out why every once in a while displaying a sheet causes
a crash.
Your example pretty much epitomizes my experience with Cocoa's GC
system. I spend far, FAR more time debugging screwy problems like the
one described, only to have to come up with some god awful hacky
kludge to get around the problem. The very problem GC is supposed to
be fixing and freeing me from dealing with so I can spend my time on
real problems.
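The pattern Chris describes (take an extra reference before an object travels through an untyped void *context, drop it in the completion callback) can be sketched in plain C with a hypothetical reference-counted type. obj_create, obj_retain, obj_release, begin_sheet, sheet_did_end, and run_pending are illustrative names, not Cocoa or Core Foundation APIs. Per the quoted message, the GC-era equivalents were CFRetain and CFRelease, which add and remove a root.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct object {
    int refcount;
    int payload;
};

static struct object *obj_create(int payload)
{
    struct object *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->payload = payload;
    }
    return o;
}

static void obj_retain(struct object *o) { o->refcount++; }

/* Returns 1 if this release freed the object. */
static int obj_release(struct object *o)
{
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* Hypothetical async API: it stores an untyped context and invokes the
   callback later. Nothing tracks what the void * points to, so the
   caller must keep the object alive explicitly. */
typedef void (*did_end_fn)(void *context);
static void *pending_context;
static did_end_fn pending_callback;

static void begin_sheet(struct object *o, did_end_fn callback)
{
    obj_retain(o);              /* extra reference while the sheet is up */
    pending_context = o;
    pending_callback = callback;
}

static int last_payload_seen;

static void sheet_did_end(void *context)
{
    struct object *o = context;
    last_payload_seen = o->payload;
    obj_release(o);             /* balances the retain in begin_sheet */
}

/* Simulates the run loop delivering the callback later. */
static void run_pending(void)
{
    if (pending_callback != NULL) {
        did_end_fn cb = pending_callback;
        void *ctx = pending_context;
        pending_callback = NULL;
        pending_context = NULL;
        cb(ctx);
    }
}
```

In use, the creator can drop its own reference right after begin_sheet; the extra reference is what keeps the object valid until sheet_did_end runs.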
>
>> [gcConstTitle setTitle:"Hello, world!"];
>> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world
>> \xC2\xA1"] UTF8String]];
>>
>> [[NSGarbageCollector defaultCollector] collectExhaustively];
>> NSLog(@"GC test");
>>
>> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
>> [gcConstTitle title]);
>> printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
>> [gcUTF8Title title]);
>
> If you build this example non-GC, and you replace
>
> [[NSGarbageCollector defaultCollector] collectExhaustively];
>
> with
>
> [pool drain];
>
> it would be just as incorrect. The object backing that UTF-8 string
> is no longer live, therefore you can't trust that the UTF-8 string
> itself is valid.
Well, scratching the itch of curiosity, in the non-GC example, does
the pointer for UTF8String come from NSAllocateCollectable? That has
a prototype of void * __strong which indicates that it's from the
collector and therefore requires write barriers? No?
"I'll take 'Not relevant' for $200 and 'Misunderstands the
fundamentals' for the win, Alex."
Your example is flawed on the face of it. retain/release allocation
documentation makes it pretty clear that such pointers are temporary
and are valid only up until the autorelease pool pops. You popped the
pool, therefore your use after that point in time is clearly an error.
A garbage collection system's sine qua non is freeing the programmer
from having to deal with the issues of memory allocation. What good is a
garbage collection system that requires me to hand hold it every step
of the way, that causes me to spend MUCH more time having to deal with
memory allocation problems than if I'd never used it in the first
place? In the GC example, there is a live pointer to an allocation
that the GC system has reclaimed. That allocation comes from a
function that returns a __strong qualified pointer, and UTF8String has
silently discarded it, and as a consequence, caused a perfectly
legitimate and live pointer to become invisible to the collector. -
On Feb 6, 2008, at 4:39 AM, John Engelhart wrote:
> Read the above, "object" is synonymous for "struct". The "layout"
> of an object is identical to the "layout" of a struct.
That is true but irrelevant. What matters for garbage collection is
whether the variables are typed as objects at compile time, because
that's what determines what code the compiler emits for assignments.
> - (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
>
> Since the pointer that UTF8String returns is provably from
> NSAllocateCollectable, this prototype has /DISCARDED/ the type
> qualifier of __strong.
The API does not promise that -UTF8String returns a collectable
pointer. The current implementation may do so, but the documentation
says that you should copy the C string if you want to store it.
--Michael -
On Feb 6, 2008, at 3:39 AM, John Engelhart wrote:
>
> On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
>
>> On 5 Feb 2008, at 00:14, John Engelhart wrote:
>>
>>> I had such high hopes for Cocoa's GC system because once you're
>>> spoiled by GC, it's hard to go back.
>>>
>>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
>>> system has not been nearly as pleasant. It has been outright
>>> frustrating, and it's reached the point where I consider the
>>> system untenable.
>>
>> Honestly, this point has now been answered over and over.
>>
>> I think it comes down to the fact that you have failed to
>> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
>> If you're using it with objects, it "just works".
>
> You misunderstand what Objective C is, and how it works. "Objects"
> is synonymous for "Structs".
If that were true, you'd be able to declare objects as local variables
(as opposed to as pointers to structures):
NSPoint aPoint; // <-- NSPoint = struct, legal
NSString aString; // <-- NSString = object, Illegal
(yes, at one time there was an attempt to add support for that, but it
didn't survive).
Structures don't have "magic invisible members":
@interface Foo {
}
@end
Foo *aFoo;
NSLog(@"Foo is a %@", aFoo->isa);
Notice how there is an "isa" member that is automatically put there,
not unlike the way that a C++ object might have a vtable (or other
internal plumbing for multiple inheritance).
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> wicked fun!
quadrium | flame : flame fractals & strange attractors : build,
mutate, evolve, animate -
On 6 Feb 2008, at 09:39, John Engelhart wrote:
> - (NSString *)someMethod
> {
> NSUInteger finalStringLength = 1024; // Example only
> NSString *copyString = NULL;
> char * __strong restrict copyBuffer = NULL;
>
> copyBuffer = NSAllocateCollectable(finalStringLength, 0);
> /* Since this is just an example, the part that fills contents of
> copyBuffer with text are omitted */
> copyString =
> NSMakeCollectable
> ((id)CFStringCreateWithCStringNoCopy(kCFAllocatorDefault,
> copyBuffer, kCFStringEncodingUTF8, kCFAllocatorNull));
> /* kCFAllocatorNull = This allocator is useful as the
> bytesDeallocator in CFData or contentsDeallocator in CFString where
> the memory should not be freed. So.. Don't call free() on our
> NSAllocateCollectable buffer, which is an error. */
> return(copyString);
> }
>
> You see where the bug is, right?
I'll just add, publicly, that I think this probably is a bug in
CFString that John has found here. That is, I don't see why
CFString's pointer shouldn't be traced by the collector in this case
(it doesn't appear to be; certainly when I try it the backing buffer
is released). The problem also occurs with NSString's
-initWithBytesNoCopy:length:encoding:freeWhenDone: et al.
I've asked John if he's filed a bug report (I just filed one,
<rdar://5727379>, with a working code snippet).
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Feb 6, 2008, at 10:55 AM, Alastair Houghton wrote:
> I'll just add, publicly, that I think this probably is a bug in
> CFString that John has found here. That is, I don't see why
> CFString's pointer shouldn't be traced by the collector in this
> case (it doesn't appear to be; certainly when I try it the backing
> buffer is released). The problem also occurs with NSString's
> -initWithBytesNoCopy:length:encoding:freeWhenDone: et al.
I don't think this is a bug. The NSString and CFString APIs do not
indicate that they treat the bytes as scanned memory. In fact, when
you pass in kCFAllocatorNull you are telling CFString that you
"assume responsibility for deallocating the buffer." At the end of
-someMethod, you haven't saved a __strong reference to the buffer, so
the collector is allowed to free it.
--Michael -
On Feb 6, 2008 1:39 AM, John Engelhart <john.engelhart...> wrote:
>
> On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
>
>> On 5 Feb 2008, at 00:14, John Engelhart wrote:
>>
>>> I had such high hopes for Cocoa's GC system because once you're
>>> spoiled by GC, it's hard to go back.
>>>
>>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
>>> system has not been nearly as pleasant. It has been outright
>>> frustrating, and it's reached the point where I consider the system
>>> untenable.
>>
>> Honestly, this point has now been answered over and over.
>>
>> I think it comes down to the fact that you have failed to appreciate
>> that Cocoa GC is designed for easy use with *OBJECTS*. If you're
>> using it with objects, it "just works".
>
> You misunderstand what Objective C is, and how it works. "Objects" is
> synonymous for "Structs".
No, he understands perfectly well. "Structs" are an implementation
detail of "Objects". That is, conceptually, each Objective-C object
*has a* C struct.
>> As far as objects "not being special", they *are* special, in that
>> the compiler generates layout information and method signatures for
>> them.
>
> Read the above, "object" is synonymous for "struct". The "layout" of
> an object is identical to the "layout" of a struct.
Again, that is an implementation detail of the old (32-bit) runtime;
it is not true going forward, and you would be well served to sever
that association in your mental model of Objective-C. Believe me,
Alastair has demonstrated many times over that he understands what he
is saying.
>> or if you absolutely must return a const char * pointer,
>>
>> - (const char *)hexString
>> {
>> return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
>> }
>>
>> which has the benefit of working exactly like -UTF8String. If you
>> *really* wanted, you could mess about with NSAllocateCollectable().
>
> Oh... my...
>
> I could not possibly have asked for a better example of just how easy
> it is to get this wrong.
There is nothing wrong with this example. Since the return value is on
the stack, it is rooted. If you get the result of this function and
expect it to stay around for any length of time, it is your
responsibility to copy it. This is the case without GC and it is still
the case with GC.
> Let those of you who are considering using Leopard's GC system use this
> as a warning of how dangerous its use is in practice.
>
> First, understand that my original example, the one in which the GC
> system snatched away the live allocation, did use
> NSAllocateCollectable. Your example, using UTF8String, uses
> NSAllocateCollectable as well. We can infer this by the behavior
> exhibited by the GC system when qualifying pointers that store the
> results from these methods as __strong, which prevents the collector
> from reclaiming the allocation. Thus, by induction, the pointer to
> the buffer that contains the string's text must ultimately come from
> NSAllocateCollectable.
>
> Let's start with NSAllocateCollectable(). The prototype for
> NSAllocateCollectable, from NSZone.h, is as follows:
>
> FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger
> size, NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;
>
> NOTE WELL: the __strong qualifier for the pointer.
__strong as a qualifier on a return value is effectively meaningless;
all values on the stack or in a register are implicitly strong. It is
simply there to add information for the human reading it.
> Now, there is no formal grammar of Objective-C 2.0 published, but it
> is a reasonable assumption that "__strong" is in the same group of
> type qualifiers as "const" and "volatile". This makes __strong
> subject to the same ANSI-C rules governing the use of type qualifiers,
> including promotion and assignment rules. Returning to the method
> definition of UTF8String, we find the following in NSString.h:
>
> - (const char *)UTF8String; // Convenience to return null-terminated UTF8 representation
>
> Since the pointer that UTF8String returns is provably from
> NSAllocateCollectable,
It may do this now, but that is an implementation detail. Using the
result of UTF8String works just fine as long as you don't keep a
pointer to it for longer than the current context (an autorelease
pool before GC, or the length of time the value is on the stack under
GC). If you need it for longer than that, you must copy it.
>
> QED, my use of the UTF8String pointer is bug free and legal by ANSI-C
> rules. That this later causes the GC system to reclaim the memory
> pointed to by this pointer is due to a bug in the prototype of UTF8String.
> By type promotion rules, the prototype for UTF8String should be:
Your use of UTF8String is counter to the documentation of that method.
"The returned C string is automatically freed just as a returned
object would be released; you should copy the C string if it needs to
store it outside of the autorelease context in which the C string is
created."
This is true, regardless of how Apple decides to implement this
method. For instance, it could be a pointer to inline storage within
the NSString instance itself, it could be a pointer into an
autoreleased NSData object, it could be a pointer to an
NSAllocateCollectable-allocated block of memory, etc. Regardless of
how it is implemented, it still behaves according to the contract
UTF8String gives you; you don't need to know any more or any less to
use it properly.
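The same contract can be sketched in plain C, with no GC involved. This is only an analogy, assuming a hypothetical `transient_hex_string()` whose static buffer is overwritten by the next call (standing in for the end of the autorelease context); it is not how -UTF8String is actually implemented. Code that uses the result immediately can simply borrow it; code that needs it later takes its own copy.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an API such as -UTF8String: it returns a
 * pointer into storage the caller does not own. Here that storage is a
 * static buffer that the next call overwrites, playing the role of the
 * autorelease context ending. */
static const char *transient_hex_string(unsigned value)
{
    static char buffer[16];
    snprintf(buffer, sizeof buffer, "0x%8.8x", value);
    return buffer;
}

/* A caller that needs the text later makes its own copy.
 * The result is owned by the caller and must be released with free(). */
static char *copy_hex_string(unsigned value)
{
    const char *borrowed = transient_hex_string(value);
    size_t length = strlen(borrowed) + 1;   /* include the terminating NUL */
    char *owned = malloc(length);
    if (owned != NULL)
        memcpy(owned, borrowed, length);
    return owned;
}
```

A pointer kept from `transient_hex_string(1)` shows the second value once `transient_hex_string(2)` is called, while the buffer returned by `copy_hex_string(1)` keeps "0x00000001" until the caller frees it.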
--
Clark S. Cox III
<clarkcox3...> -
>>
>
> Structures don't have "magic invisible members":
>
> @interface Foo {
> }
> @end
>
> Foo *aFoo;
> NSLog(@"Foo is a %@", aFoo->isa);
>
> Notice how there is an "isa" member that is automatically put there,
> not unlike the way that a C++ object might have a vtable (or other
> internal plumbing for multiple inheritance).
>
Wrong. This will not work. Foo will not have a magic isa ivar, and is
not a valid objc class.
You have to either:
1) inherit from a valid root class (NSObject).
2) add a "Class isa" ivar to your declaration.
See the NSObject declaration:
@interface NSObject <NSObject> {
Class isa;
}
...
@end -
On 6 Feb 2008, at 16:46, Michael Tsai wrote:
> On Feb 6, 2008, at 10:55 AM, Alastair Houghton wrote:
>
>> I'll just add, publicly, that I think this probably is a bug in
>> CFString that John has found here. That is, I don't see why
>> CFString's pointer shouldn't be traced by the collector in this
>> case (it doesn't appear to be; certainly when I try it the backing
>> buffer is released). The problem also occurs with NSString's
>> -initWithBytesNoCopy:length:encoding:freeWhenDone: et al.
>
> I don't think this is a bug. The NSString and CFString APIs do not
> indicate that they treat the bytes as scanned memory.
That's true, but it doesn't matter whether they treat the bytes as
scanned memory or not; that would only change whether putting pointer
data in the bytes was safe. The problem is whether the pointer itself
is being traced, which isn't happening right now; the docs *do* say
(in the Garbage Collection Programming Guide) that NULL,
kCFAllocatorDefault and kCFAllocatorSystemDefault cause objects to be
allocated in the GC zone, so I don't think it's unreasonable to expect
that the pointer will be traced.
> In fact, when you pass in kCFAllocatorNull you are telling CFString
> that you "assume responsibility for deallocating the buffer." At the
> end of -someMethod, you haven't saved a __strong reference to the
> buffer, so the collector is allowed to free it.
It's *an* argument, certainly.
I just think that there's no harm in making the pointer visible to the
collector; it doesn't hurt if the pointer isn't pointing into the GC
pool. And it would mean that you could pass a chunk of memory
allocated using NSAllocateCollectable(), which seems not
unreasonable. I don't think it's hugely important, since you can
always use malloc() and let it call free() (which will happen
automatically), but if it's an easy fix then it's probably worth
doing. Not that many people will do this or indeed should be doing
this kind of thing.
Anyway, at the very least it's worth drawing to the attention of
whoever's responsible for CFString at Apple. If they want to fix it,
great. If not, the docs could be updated to say that you shouldn't
pass GC'd memory into those APIs.
If we could see the sources for CFString, we could probably make a
better determination as to whether this was worth fixing. But
currently the CF project's sources aren't visible (for Leopard) :-(
Kind regards,
Alastair.
--
http://alastairs-place.net -
Please remember that Apple Engineers spend their own time answering
questions on this list for the benefit of developers and the developer
community. It isn't part of their job to do this, it's their
commitment to the community.
It is simply unfair to take this type of attitude with their response
(or in fact a response from anyone).
I don't want any Apple engineer, or any other people who contribute to
the list, to have to endure this type of acidic response. Posters that
do so risk moderation.
Remember, we're all people and we're all trying to work together to
get 'stuff' done.
Be kind, rewind.
Scott Anguish
Apple
[if there are issues with this email, please send them to <cocoa-dev-admins...>, NOT to this list]
On Feb 6, 2008, at 6:59 AM, John Engelhart wrote:
> <snip acidic rant>
-
On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...> wrote:
> "I'll take 'Not relevant' for $200 and 'Misunderstands the
> fundamentals' for the win, Alex."
Speaking of "not relevant" and "misunderstands the fundamentals":
1) UTF8String returns a non-__strong pointer.
2) You fail to copy the data at that pointer.
3) You cry foul when that data disappears.
> A garbage collection system's sine qua non is to free the programmer
> from having to deal with the issues of memory allocation.
Exactly. So stop getting your knickers in a twist about whether or not
UTF8String actually returns memory from NSAllocateCollectable(), and
simply copy the result as required by the documentation for
UTF8String.
Hamish -
On Feb 6, 2008, at 12:46 PM, Alastair Houghton wrote:
>> I don't think this is a bug. The NSString and CFString APIs do not
>> indicate that they treat the bytes as scanned memory.
>
> That's true, but it doesn't matter whether they treat the bytes as
> scanned memory or not; that would only change whether putting
> pointer data in the bytes was safe. The problem is whether the
> pointer itself is being traced, which isn't happening right now
Sorry, that's what I meant.
> the docs *do* say (in the Garbage Collection Programming Guide)
> that NULL, kCFAllocatorDefault and kCFAllocatorSystemDefault cause
> objects to be allocated in the GC zone, so I don't think it's
> unreasonable to expect that the pointer will be traced.
The string was allocated using kCFAllocatorDefault, but the
deallocator was specified as kCFAllocatorNull. The docs say:
"If the buffer does not need to be deallocated, or if you want to
assume responsibility for deallocating the buffer (and not have the
CFString object deallocate it), pass kCFAllocatorNull."
If CFString is not going to be responsible for deallocating the
buffer, then it would not make sense to rely on CFString keeping the
buffer alive for you.
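A minimal C model of the ownership rule Michael describes, using hypothetical names (`nocopy_string`, `nocopy_string_create()`, `nocopy_string_destroy()`); it is not CFString's implementation. The wrapper borrows the caller's bytes, and a NULL deallocator plays the role of kCFAllocatorNull: the wrapper never frees the bytes, so keeping them alive remains the caller's job.

```c
#include <stdlib.h>

typedef void (*bytes_deallocator)(void *bytes);

typedef struct {
    const char *bytes;               /* borrowed, never copied */
    bytes_deallocator deallocator;   /* NULL: the caller keeps ownership */
} nocopy_string;

static nocopy_string *nocopy_string_create(const char *bytes,
                                           bytes_deallocator deallocator)
{
    nocopy_string *s = malloc(sizeof *s);
    if (s != NULL) {
        s->bytes = bytes;
        s->deallocator = deallocator;
    }
    return s;
}

static void nocopy_string_destroy(nocopy_string *s)
{
    if (s == NULL)
        return;
    if (s->deallocator != NULL)
        s->deallocator((void *)s->bytes);  /* the wrapper owned the bytes */
    /* With a NULL deallocator nothing happens to the bytes, and nothing
     * here kept them alive either: their lifetime is up to the caller. */
    free(s);
}
```

Passing `free` as the deallocator corresponds to handing ownership to the string object instead; in that case the bytes must not be used after the wrapper is destroyed.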
> I just think that there's no harm in making the pointer visible to
> the collector; it doesn't hurt if the pointer isn't pointing into
> the GC pool.
I'm not saying there would be harm; I'm just saying that the API
doesn't say that it traces the pointer, so it's not a bug if it
doesn't. In the absence of a general guideline or an explicit
promise, passing a buffer to a CF call is just like passing it to,
say, a sqlite3 call.
--Michael -
On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...> wrote:
> "Wait wait wait,
> why did the compiler allow an unqualified void * pointer to be
> assigned a more qualified void * pointer, against ANSI-C type
> qualifier rules?"
Good point, and you might want to file a bug report with Apple that
gcc should generate a warning in cases like this. Luckily, this is not
the same as "Leopard GC is inexorably broken". Why are you so keen to
throw out the baby with the bathwater?
Hamish -
On 6 Feb 2008, at 17:52, Hamish Allan wrote:
> On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...>
> wrote:
>
>> "I'll take 'Not relevant' for $200 and 'Misunderstands the
>> fundamentals' for the win, Alex."
>
> Speaking of "not relevant" and "misunderstands the fundamentals":
>
> 1) UTF8String returns a non-__strong pointer.
__strong isn't a type qualifier, it's an attribute (in the sense of
the __attribute__ keyword). The distinction is perhaps a bit subtle,
especially as attributes can be attached to a typedef'd type, but it's
the reason that you can put __strong anywhere in a variable
declaration and it still has the same effect. It *isn't* like const
or volatile, and the ANSI C rules regarding type qualifiers absolutely
*do not* apply.
Furthermore I *think* (and this is from memory, based on some work I
did on GCC several years ago, so I might be wrong) that if you write
something like
void * __strong MyFunction(void);
you'll find that the __strong attribute is attached to the *function*
rather than to the type. In any case it's going to be ignored because
__strong only really affects variables, not types or functions.
> 2) You fail to copy the data at that pointer.
> 3) You cry foul when that data disappears.
Well the bug is *either* ignoring the bit in the -UTF8String docs
where it says you should copy the string (though that does read like
it was only intended to talk about the non-GC case---I just filed
<rdar://5727581> asking for a clarification), or not using __strong on
the variable you're storing the result in.
>> A garbage collection system's sine qua non is to free the programmer
>> from having to deal with the issues of memory allocation.
>
> Exactly. So stop getting your knickers in a twist about whether or not
> UTF8String actually returns memory from NSAllocateCollectable(), and
> simply copy the result as required by the documentation for
> UTF8String.
Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
since I think it's only muddied the waters further.
Kind regards,
Alastair.
--
http://alastairs-place.net -
On Feb 6, 2008, at 11:05 AM, Jean-Daniel Dupas wrote:
>>>
>>
>> Structures don't have "magic invisible members":
>>
>> @interface Foo {
>> }
>> @end
>>
>> Foo *aFoo;
>> NSLog(@"Foo is a %@", aFoo->isa);
>>
>> Notice how there is an "isa" member that is automatically put
>> there, not unlike the way that a C++ object might have a vtable (or
>> other internal plumbing for multiple inheritance).
>>
>
>
> Wrong. This will not work. Foo will not have a magic isa ivar, and is
> not a valid objc class.
>
> You have to either:
> 1) inherit from a valid root class (NSObject).
> 2) add a "Class isa" ivar to your declaration.
>
> See the NSObject declaration:
>
> @interface NSObject <NSObject> {
> Class isa;
> }
> ...
> @end
Mea culpa - someday I'll learn to not post until the _second_ cup of
coffee....
Perhaps a more accurate statement is that it has an "explicit magic
member" - it must have an isa pointer as the very first field
(contrary to the caffeine-deprived ruminations, the compiler doesn't
put it there automatically).
Regardless, all objects must start with that isa pointer - the runtime
requires it. Objects _are_ special, and pretending that this is just
another arbitrary struct is incorrect:
1) They can't live on the stack
2) They have a special isa pointer
3) They have implicit requirements to be allocated/copied/released
using special routines (NSAllocateObject, NSCopyObject,
NSDeallocateObject) - i.e., it's unclear whether a structure of the
same type, allocated with malloc or new and with the isa pointer
filled in manually, would work on current or future Objective-C
runtimes (it would almost certainly fail under 64-bit Objective-C 2.0)
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> wicked fun!
quadrium | prime : build, mutate, evolve, animate : the next
generation of fractal art -
On 2/6/08 7:12 PM, Alastair Houghton said:
> Furthermore I *think* (and this is from memory, based on some work I
> did on GCC several years ago, so I might be wrong) that if you write
> something like
>
> void * __strong MyFunction(void);
>
> you'll find that the __strong attribute is attached to the *function*
> rather than to the type. In any case it's going to be ignored because
> __strong only really affects variables, not types or functions.
Well, NSAllocateCollectable is declared like so in NSZone.h:
FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger size,
NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;
And the comment just above says "the pointer type of the stored location
must be marked with the __strong attribute in order for the write-barrier
assignment primitive to be generated".
--
____________________________________________________________
Sean McBride, B. Eng <sean...>
Rogue Research www.rogue-research.com
Mac Software Developer Montréal, Québec, Canada -
On Feb 6, 2008 3:23 PM, glenn andreas <gandreas...> wrote:
> Regardless, all objects must start with that isa pointer - the runtime
> requires it. Objects _are_ special, and pretending that this is just
> another arbitrary struct is incorrect:
> 1) They can't live on the stack
This is a bit nitpicky, but they can:
struct { Class isa; id ivar; } fakeObj;
fakeObj.isa = [SomeClass class];
id fakeObjPtr = (id)&fakeObj;
[fakeObjPtr myCustomInitializer];
Of course you have to ensure that the layout is correct, and you can't use
this with any Cocoa classes since you completely break Cocoa's ideas of
initialization and memory management, but it works fine with classes which are
100% custom (i.e. do not have Cocoa anywhere within their inheritance chain)
as long as they understand stack semantics. You can fix the layout problem
in 32-bit by using @defs, but in 64-bit land you'd have to fall back to
trickery with alloca or C variable-length arrays, neither of which is going
to be much fun.
The Stepstone compiler explicitly supported stack objects by doing the
obvious thing and leaving out the * when declaring a local object variable.
Obviously gcc doesn't support this though.
> 2) They have a special isa pointer
> 3) They have implicit requirements to be allocated/copied/released
> using special routines (NSAllocateObject, NSCopyObject,
> NSDeallocateObject) - i.e., it's unclear whether a structure of the
> same type, allocated with malloc or new and with the isa pointer
> filled in manually, would work on current or future Objective-C
> runtimes (it would almost certainly fail under 64-bit Objective-C 2.0)
As far as I know, the only tricky thing about the 64-bit runtime is that you
can't calculate the amount of storage required at compile time.
Once you have an object, the only thing that really cares about it is the
objc_msgSend family of functions and ivar access. objc_msgSend only cares
about the isa pointer which will still be there as long as you set it up
properly, and ivar access, even in 64-bit land, is just accessing memory at
an offset to the self pointer. Cocoa classes have certain memory management
requirements but ObjC classes do not. You can write your own class hierarchy
which uses malloc/free, new/delete, mmap, or any other memory allocation
technique and it will work fine. You'll lose out on garbage collection, but
that's to be expected when you start doing custom memory allocation.
I would not recommend actually *doing* any of the above, but there's a
fairly wide space between what's reasonable to do and what's possible to do.
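To make the point about objc_msgSend concrete, here is a toy model in plain C. It is hypothetical and greatly simplified (real dispatch goes through selectors and method caches, and the 64-bit runtime does not fix instance layout at compile time): an "object" is any memory whose first field points at its class, and dispatch reads only that field, so the same call works whether the instance lives on the stack or in static storage.

```c
typedef struct toy_object toy_object;
typedef int (*toy_method)(toy_object *self);

/* A "class" here is just a name plus a one-entry method table. */
typedef struct {
    const char *name;
    toy_method describe;
} toy_class;

/* Every toy object starts with a pointer to its class. */
struct toy_object {
    const toy_class *isa;
};

/* A "subclass" instance: the isa-carrying header comes first, followed by
 * an "ivar" stored at a fixed offset from the start of the object. */
typedef struct {
    toy_object base;
    int value;
} toy_counter;

static int counter_describe(toy_object *self)
{
    /* Valid C: a pointer to a struct's first member converts back to a
     * pointer to the enclosing struct. */
    return ((toy_counter *)self)->value;
}

static const toy_class counter_class = { "ToyCounter", counter_describe };

/* Stand-in for message dispatch: the only thing consulted is isa. */
static int toy_send_describe(toy_object *obj)
{
    return obj->isa->describe(obj);
}
```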
Digressing slightly, it would be nice if NSObject would offer separate
deinitialization and deallocation methods to match the separate
initialization and allocation methods so that it would be possible to
implement a custom memory allocation scheme in an NSObject subclass. As it
stands, you either have to implement your own root class, or you have to
subclass directly from NSObject and assume that its dealloc method does
nothing other than free the object's memory.
Mike -
On Feb 6, 2008 7:12 PM, Alastair Houghton <alastair...> wrote:
> Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
> since I think it's only muddied the waters further.
I'm starting to wonder whether much of this debate could have been
avoided if NSAllocateCollectable() hadn't been declared __strong. If
such an attribute is meaningless for a function, it serves only to
mislead and confuse.
Hamish -
On Feb 6, 2008, at 2:12 PM, Alastair Houghton wrote:
> On 6 Feb 2008, at 17:52, Hamish Allan wrote:
>
>> On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...>
>> wrote:
>>
>>> "I'll take 'Not relevant' for $200 and 'Misunderstands the
>>> fundamentals' for the win, Alex."
>>
>> Speaking of "not relevant" and "misunderstands the fundamentals":
>>
>> 1) UTF8String returns a non-__strong pointer.
>
> __strong isn't a type qualifier, it's an attribute (in the sense of
> the __attribute__ keyword). The distinction is perhaps a bit
> subtle, especially as attributes can be attached to a typedef'd
> type, but it's the reason that you can put __strong anywhere in a
> variable declaration and it still has the same effect. It *isn't*
> like const or volatile, and the ANSI C rules regarding type
> qualifiers absolutely *do not* apply.
As I mentioned previously, having no formal grammar of ObjC 2.0 makes
this point debatable. Your interpretation that __strong is an
__attribute__ extension could certainly be true, and is a valid way of
looking at things. Without a grammar to guide us, I think both
interpretations are equally valid.
However, consider for a moment if it were a type qualifier, and
followed type qualifier rules, and how this would affect the examples
cited. Off the top of my head, I think treating it as a type
qualifier would have prevented every single error I've pointed out.
Hypothetically, consider if UTF8String propagated the __strong type,
and the assignment of its pointer to the ivar 'const char *'.
The compiler would fail to compile the code, and generate an error.
Again, I'm pretty sure that every example posted would be caught if
__strong were treated as a type qualifier.
Since we have two valid ways to interpret the meaning of __strong, I
believe that the usefulness of catching these errors at compile time
argues strongly in favor of considering it a type qualifier.
>
> Furthermore I *think* (and this is from memory, based on some work I
> did on GCC several years ago, so I might be wrong) that if you write
> something like
>
> void * __strong MyFunction(void);
>
> you'll find that the __strong attribute is attached to the
> *function* rather than to the type. In any case it's going to be
> ignored because __strong only really affects variables, not types or
> functions.
I hate this part of the language. Each qualifier has its quirks, and
their application is non-obvious, including exactly what they apply
to. The distinction between
const char * ptr; and
char * const ptr;
isn't terribly clear by just looking at it. Add to this
const char * const ptr;
Three totally different things, and using const twice in this way
just seems like it would be a bug at first glance.
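For reference, the three declarations are easiest to read right to left. The sketch below (the function name `qualifier_demo` is hypothetical) shows what each one permits; lines a conforming C compiler must diagnose are left commented out. One of them is the qualifier-discarding assignment that the ANSI C rules forbid, which is the kind of diagnostic the type-qualifier argument about __strong is asking for.

```c
/* The three declarations, read right to left:
 *   const char *p         pointer to const char: the chars are read-only,
 *                         but p itself can be reassigned
 *   char * const p        const pointer to char: the chars are writable,
 *                         but p cannot be reassigned
 *   const char * const p  const pointer to const char: neither can change
 * Lines marked "diagnosed" are commented out because a conforming
 * compiler must issue a diagnostic for them. */
static void qualifier_demo(char *buffer, const char *other)
{
    const char *ptr_to_const = buffer;   /* adding const is always allowed */
    ptr_to_const = other;                /* OK: the pointer is not const */
    /* ptr_to_const[0] = 'x'; */         /* diagnosed: the chars are const */
    /* char *plain = ptr_to_const; */    /* diagnosed: discards 'const' */

    char * const const_ptr = buffer;
    const_ptr[0] = 'J';                  /* OK: the chars are writable */
    /* const_ptr = buffer; */            /* diagnosed: the pointer is const */

    const char * const both = buffer;
    /* both[0] = 'x'; */                 /* diagnosed: the chars are const */
    /* both = other; */                  /* diagnosed: the pointer is const */

    (void)ptr_to_const;                  /* silence unused-variable warnings */
    (void)both;
}
```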
>
>> 2) You fail to copy the data at that pointer.
>> 3) You cry foul when that data disappears.
>
> Well the bug is *either* ignoring the bit in the -UTF8String docs
> where it says you should copy the string (though that does read like
> it was only intended to talk about the non-GC case---I just filed
> <rdar://5727581> asking for a clarification), or not using __strong on
> the variable you're storing the result in.
Actually, I've thought of another example which addresses the use of
(or lack of) __strong unambiguously and still demonstrates the problem:
#import <Foundation/Foundation.h>
@interface GCTest : NSObject {
const char *title;
};
- (void)setTitle:(const char *)newTitle;
- (const char *)title;
@end
@implementation GCTest
- (void)setTitle:(const char *)newTitle
{
printf("Setting title. Old title: %p, new title %p = '%s'\n",
title, newTitle, newTitle);
title = newTitle;
}
- (const char *)title
{
return title;
}
@end
int main(int argc, char *argv[]) {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
void *ptr;
gcConstTitle = [[GCTest alloc] init];
gcUTF8Title = [[GCTest alloc] init];
[gcConstTitle setTitle:"Hello, world!"];
[gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
NSLog(@"Test: %@", @"hello");
[[NSGarbageCollector defaultCollector] collectExhaustively];
NSLog(@"GC test");
printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
[gcConstTitle title]);
printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title],
[gcUTF8Title title]);
[gcConstTitle setTitle:NULL]; // Must clear the pointer before popping pool.
[gcUTF8Title setTitle:NULL];
[pool release];
return(0);
}
[<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -o gc -g gc.m
[<johne...>] /tmp% ./gc
Setting title. Old title: 0x0, new title 0x1ea4 = 'Hello, world!'
Setting title. Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
2008-02-06 18:32:35.712 gc[18108:807] Test: hello
2008-02-06 18:32:35.798 gc[18108:807] GC test
gcConstTitle title: 0x1ea4 = 'Hello, world!'
gcUTF8Title title: 0x1011860 = 'Hello, world'
Setting title. Old title: 0x1ea4, new title 0x0 = '(null)'
Setting title. Old title: 0x1011860, new title 0x0 = '(null)'
Oddly, I had to add a second NSLog() in order to get some kind of
lossage, but I think it's fair to chalk this up to the semi-random
nature of allocations.
The above example is now perfectly legal by everyone's definition of
how things were under retain/release, and I correctly clear the
pointer before it goes out of scope, and demonstrates that the GC
system can, and does, reclaim live data out from under you.
>
>>> A garbage collection system's sine qua non is to free the programmer
>>> from having to deal with the issues of memory allocation.
>>
>> Exactly. So stop getting your knickers in a twist about whether or
>> not
>> UTF8String actually returns memory from NSAllocateCollectable(), and
>> simply copy the result as required by the documentation for
>> UTF8String.
>
> Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
> since I think it's only muddied the waters further.
I don't. Again, consider the hypothetical case where __strong is a type
qualifier. During development of UTF8String, this would cause a
warning or error (implementation dependent, but I'd argue for error).
This would highlight the fact that the prototype is discarding a
critical qualifier. Assuming that the definition of UTF8String was
altered to include __strong, my attempt at assigning a __strong
qualified pointer to a non-strong pointer would instantly be flagged
and reported by the compiler.
Consider the case of handing a __strong pointer off to a function,
such as CFStringCreateWithCStringNoCopy(). If the prototype does not
have __strong for the buffer argument, my example of handing it an
NSAllocateCollectable pointer would again, instantly trigger a
compiler warning or error (I vote error considering the consequences).
It's hard to argue that this is not "The Right Thing" to be doing as
it would have mooted every single point I have raised, and caught all
of these errors at compile time before they could have become
problems. This would have also caught the problem in the example I
pasted above, at compile time, and alerted me that I just created a
problem.
Finally, consider the effects of the current behavior of silently
discarding __strong. As the example in this message shows, it's
surprisingly easy to create conditions which violate the requirements
for proper GC behavior.
After four months of practical, hands-on usage of Leopard's GC system,
my experience is that this is happening far, far more frequently than
you might think. I have had to deal with endless problems under GC
which have all the tell tale signs of race conditions. Because what
I'm developing is dual mode, flipping over to retain/release works
flawlessly, even after intensive multithreaded concurrent extreme
stress testing. The retain/release side of things never crashes, but
the GC side of things mostly doesn't crash.
I think the evidence I've put forth is pretty strong. I think the
example I give in this message addresses everyone's concerns regarding
the use of __strong. It's pretty clear that I don't need to use
__strong for the pointer, and that under retain/release methodology,
it is incapable of freeing the UTF8String buffer while it's still in
use. It's certainly open to debate how frequently this happens in
practice. I have shown that it is possible. In my opinion, after
four months of use, this is happening much more frequently than you
might think at first approximation. And just like race conditions, it
gets worse the harder you push, which is to be expected. It's pretty
much a ticking time bomb, and everything seems to work fine when
you're doing development, but then starts failing mysteriously out in
the field. -
On Feb 7, 2008 1:06 AM, John Engelhart <john.engelhart...> wrote:
> It's pretty clear that I don't need to use __strong for the pointer
I don't think this is at all clear. You have only a single weak
reference to the data returned by UTF8String, so it's not at all
surprising when it gets lost. This is pretty much akin to what happens
if you fail to send a retain to an autoreleased object, but you're not
complaining that pre-Leopard memory management is broken.
Hamish -
On 07/02/2008, at 12:06 PM, John Engelhart wrote:
> However, consider for a moment if it were a type qualifier, and
> followed type qualifier rules, and how this would affect the
> examples cited. Off the top of my head, I think treating it as a
> type qualifier would have prevented every single error I've pointed
> out. Hypothetically, consider if UTF8String propagated the __strong
> type, and the assignment of its pointer to the ivar 'const char *'.
>
> The compiler would fail to compile the code, and generate an error.
I don't think it should be a type qualifier. It would mean that you
wouldn't be able to do things like:
puts([myString UTF8String]);
without getting a compiler warning.
> Oddly, I had to add a second NSLog() in order to get some kind of
> lossage, but I think it's fair to chalk this up to the semi-random
> nature of allocations.
I think that would have been because the pointer returned by
UTF8String was still on the stack or in a register.
> The above example is now perfectly legal by everyone's definition of
> how things were under retain/release, and I correctly clear the
> pointer before it goes out of scope, and demonstrates that the GC
> system can, and does, reclaim live data out from under you.
I think your example is contrived. Whilst it's legal in the retain/
release world you wouldn't ever write anything like that.
The solution to all of this, as has already been stated, is to
understand the contract that UTF8String promises and to make your own
arrangements if you want to hang on to the value.
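In plain C terms, "making your own arrangements" means the setter keeps its own copy of the bytes instead of a borrowed pointer. This is a minimal sketch that assumes ordinary malloc/free ownership rather than Cocoa or the collector; TitleHolder, holder_set_title and holder_title are hypothetical names standing in for the GCTest class from the thread. In Cocoa the usual equivalent would be to keep a copy of the NSString itself and call UTF8String only when the bytes are needed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder standing in for GCTest: it owns its title bytes. */
typedef struct {
    char *title;
} TitleHolder;

/* Copy the incoming string so the holder no longer depends on the
   lifetime of whatever buffer the caller lent it. Returns 0 on success,
   -1 if the copy could not be allocated (the old title is kept). */
static int holder_set_title(TitleHolder *h, const char *new_title)
{
    char *copy = NULL;

    if (new_title != NULL) {
        size_t len = strlen(new_title) + 1;   /* include the NUL */
        copy = malloc(len);
        if (copy == NULL)
            return -1;
        memcpy(copy, new_title, len);
    }
    free(h->title);   /* release the previous copy; free(NULL) is a no-op */
    h->title = copy;
    return 0;
}

static const char *holder_title(const TitleHolder *h)
{
    return h->title;
}
```

Once the copy is taken, the caller is free to destroy its buffer, and passing NULL releases the copy, much like the setTitle:NULL calls at the end of the example in this thread.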
- Chris -
On Feb 6, 2008, at 10:45 AM, glenn andreas wrote:
>
> On Feb 6, 2008, at 3:39 AM, John Engelhart wrote:
>
>>
>> On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
>>
>>> On 5 Feb 2008, at 00:14, John Engelhart wrote:
>>>
>>>> I had such high hopes for Cocoa's GC system because once you're
>>>> spoiled by GC, it's hard to go back.
>>>>
>>>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
>>>> system has not been nearly as pleasant. It has been outright
>>>> frustrating, and it's reached the point where I consider the
>>>> system untenable.
>>>
>>> Honestly, this point has now been answered over and over.
>>>
>>> I think it comes down to the fact that you have failed to
>>> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
>>> If you're using it with objects, it "just works".
>>
>> You misunderstand what Objective C is, and how it works. "Objects"
>> is synonymous for "Structs".
>
>
>
> If that were true, you'd be able to declare objects as local
> variables (as opposed to as pointers to structures):
>
> NSPoint aPoint; // <-- NSPoint = struct, legal
> NSString aString; // <-- NSString = object, Illegal
Surprisingly, you can actually do this. It requires some contortions
and manual initialization, but the end result is identical to
what "NSString aString" would have been:
[<johne...>] /tmp% cat tst.m
#import <Foundation/Foundation.h>
int main(int argc, char **argv) {
    NSObject *stackObject = NULL;
    stackObject = alloca(sizeof(stackObject));
    memset(stackObject, 0, sizeof(NSObject));
    stackObject->isa = [NSObject class];
    NSLog(@"stackObject: %@", stackObject);
}
[<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
tst.m: In function 'main':
tst.m:7: warning: instance variable 'isa' is @protected; this will be a hard error in the future
[<johne...>] /tmp% ./tst
2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject: 0xbffff380>
As the address clearly shows, this is an object on the stack.
Although I have had to manually initialize the object, it is exactly,
or very close to, what "NSObject stackObject" would have created.
This is generally a bad idea in practice, as the object is
"deallocated" as soon as the frame pops. Because it lives on the
stack, there is no way to ensure that something along the lines of
"release / dealloc" runs to release any resources the object may have
created or acquired, which is why it is disallowed in practice.
As the code clearly demonstrates, my original assertion stands.
>
> (yes, at one time there was an attempt to add support for that, but
> it didn't survive).
Yes, the machinery to arrange for stack objects to have a chance to
"dealloc" when the stack frame pops is non-trivial. But, as shown
above, you can do it; it's just not a good idea.
>
> Structures don't have "magic invisible members":
>
> @interface Foo {
> }
> @end
>
> Foo *aFoo;
> NSLog(@"Foo is a %@", aFoo->isa);
>
> Notice how there is an "isa" member that is automatically put there,
> not unlike the way that a C++ object might have a vtable (or other
> internal plumbing for multiple inheritance).
struct FooDef { @defs(Foo) } *aFoo;
aFoo->isa;
You're right, structs don't have magic invisible members. The @defs()
directive allows you to "copy" the ivars from the @interface
declaration.
It's instructive to look at objc/objc.h, where we find the following:
typedef struct objc_class *Class;
typedef struct objc_object {
Class isa;
} *id;
typedef struct objc_selector *SEL;
typedef id (*IMP)(id, SEL, ...);
As you can see, it's very clear where "isa" comes from. Subclassing
an object has the effect of "pasting" your ivar declarations at the
end of the class you're inheriting from, and forms the cause of
"fragile classes" since a struct effectively becomes pointer + offset,
and changing a struct requires recompiling code to update those offsets.
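The "pasting" of ivars and the fixed offsets it produces can be shown in plain C. This is a sketch of the fragile 32-bit layout being described, not the runtime's actual definitions; BaseV1/BaseV2 and SubV1/SubV2 are hypothetical stand-ins for a superclass before and after it gains an ivar, and for a subclass compiled against each version.

```c
#include <stddef.h>

/* Stand-in for struct objc_object plus one ivar: the class pointer
   comes first, then the instance variables. */
typedef struct {
    void *isa;
    int count;
} BaseV1;

/* A "subclass" compiled against BaseV1: its own ivar is pasted after
   the superclass's ivars, at an offset fixed at compile time. */
typedef struct {
    BaseV1 super;
    int extra;
} SubV1;

/* The same superclass after a later release adds an ivar. */
typedef struct {
    void *isa;
    int count;
    double added;
} BaseV2;

/* The subclass recompiled against the new superclass. */
typedef struct {
    BaseV2 super;
    int extra;
} SubV2;

/* Offsets that code compiled against each layout would bake in. */
static size_t extra_offset_v1(void) { return offsetof(SubV1, extra); }
static size_t extra_offset_v2(void) { return offsetof(SubV2, extra); }
```

Code built against SubV1 keeps using the old offset of extra even after the superclass grows, so it ends up touching storage that now belongs to the superclass. Roughly speaking, the 64-bit Objective-C 2.0 ABI avoids this by keeping ivar offsets in variables the runtime can adjust when classes are loaded.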
So, no, there are no magic invisible members. Furthermore:
typedef id (*IMP)(id, SEL, ...);
Is a C function pointer typedef. You're probably familiar with it;
naming its parameters makes it a bit clearer:
typedef id (*IMP)(id self, SEL _cmd, ...);
This is where self and _cmd come from. Nothing hidden. It's pure
ANSI-C, with a bit of syntactic sugar that automates common tasks like
object inheritance, automatic scoping of variables from self, and a
clever key (selector) / value (pointer to a function) dynamic run time
system known as "message dispatching." -
On Feb 6, 2008 5:06 PM, John Engelhart <john.engelhart...> wrote:
>
> Actually, I've thought of another example which addresses the use of
> (or lack of) __strong unambiguously and still demonstrates the problem:
No, you're just rehashing and reformulating the same argument over and
over again. You're not listening.
>
> #import <Foundation/Foundation.h>
>
> @interface GCTest : NSObject {
> const char *title;
> };
>
> - (void)setTitle:(const char *)newTitle;
> - (const char *)title;
>
> @end
>
> @implementation GCTest
>
> - (void)setTitle:(const char *)newTitle
> {
> printf("Setting title. Old title: %p, new title %p = '%s'\n",
> title, newTitle, newTitle);
> title = newTitle;
This is a bad thing to do in pre-GC Cocoa, it is still a bad thing to
do in post-GC Cocoa. Don't do it. If you want to store the result of
calling UTF8String, copy it. Period.
> }
>
> - (const char *)title
> {
> return title;
> }
>
> @end
>
> int main(int argc, char *argv[]) {
> NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
> GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
> void *ptr;
>
> gcConstTitle = [[GCTest alloc] init];
> gcUTF8Title = [[GCTest alloc] init];
>
> [gcConstTitle setTitle:"Hello, world!"];
> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world \xC2\xA1"] UTF8String]];
At this point, the temporary NSString object has exactly zero roots.
Why are you surprised that it goes away?
> NSLog(@"Test: %@", @"hello");
> [[NSGarbageCollector defaultCollector] collectExhaustively];
> NSLog(@"GC test");
>
> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title], [gcConstTitle title]);
> printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title], [gcUTF8Title title]);
>
> [gcConstTitle setTitle:NULL]; // Must clear the pointer before popping pool.
> [gcUTF8Title setTitle:NULL];
>
> [pool release];
> return(0);
> }
> The above example is now perfectly legal by everyone's definition of
> how things were under retain/release, and I correctly clear the
> pointer before it goes out of scope, and demonstrates that the GC
> system can, and does, reclaim live data out from under you.
No it is not. Holding on to the result of UTF8String for longer than
the lifetime of the NSString from which it came is not legal. How hard
is that to understand?
--
Clark S. Cox III
<clarkcox3...> -
On Feb 6, 2008, at 7:48 PM, John Engelhart wrote:
> int main(int argc, char **argv) {
> NSObject *stackObject = NULL;
> stackObject = alloca(sizeof(stackObject));
This, of course, just alloced something the size of a pointer, which
may not be the real size of the class (if it were something other than
NSObject)
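Glenn's point is the classic sizeof-of-a-pointer slip. A small C sketch, with BigObject and the two helpers as hypothetical names: sizeof applied to the pointer gives the pointer's size, while sizeof applied to the pointee gives the object's size. For NSObject, whose only instance variable is isa, the two happen to be equal, which is why the example still ran.

```c
#include <stddef.h>

/* Hypothetical object with more state than a bare isa pointer. */
typedef struct {
    void *isa;
    double values[4];
} BigObject;

/* Size of the pointer variable itself: what alloca(sizeof(stackObject)) requested. */
static size_t size_of_pointer(const BigObject *p)
{
    return sizeof p;
}

/* Size of the object the pointer refers to: what the allocation needs.
   sizeof does not evaluate its operand, so p may even be NULL here. */
static size_t size_of_pointee(const BigObject *p)
{
    return sizeof *p;
}
```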
>
> memset(stackObject, 0, sizeof(NSObject));
> stackObject->isa = [NSObject class];
>
> NSLog(@"stackObject: %@", stackObject);
> }
> [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
> tst.m: In function 'main':
> tst.m:7: warning: instance variable 'isa' is @protected; this will be a hard error in the future
> [<johne...>] /tmp% ./tst
> 2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject: 0xbffff380>
>
> As the address clearly shows, this is an object on the stack.
> Although I have had to manually initialize the object, it is
> exactly, or very close to, what "NSObject stackObject" would have
> created.
>
>
The biggest problem is that the above example shows something that
isn't usable - it's sterile. You can't pass it to other routines (due
to memory management requirements), you may not even be able to call
all the methods of the object (since they may pass themselves as
parameters to other objects which invoke memory management
requirements). Only if you wrote your own entire hierarchies could
you use such a construction (or if you implemented full closures,
which is even more difficult in a C-based language). There's a whole
raft of semantics associated with Cocoa objects (above and beyond
anything that whatever version of the objective-c runtime may require).
If you can't use the object, have you actually created it?
>
> As you can see, and the code clearly demonstrates, my original
> assertion stands.
>
One could also manually construct a C++ vtable and set up all the
magic "behind the scenes" plumbing that a C++ object has, but that
doesn't mean that C++ objects are 'synonymous for "Structs"' either.
The point is that they are not synonymous for structs - if they were,
you could replace one with the other, and both Objective-C and C++
objects have additional semantic requirements.
>
> As you can see, it's very clear where "isa" comes from. Subclassing
> an object has the effect of "pasting" your ivar declarations at the
> end of the class you're inheriting from, and forms the cause of
> "fragile classes" since a struct effectively becomes pointer +
> offset, and changing a struct requires recompiling code to update
> those offsets.
Except, of course, for 64 bit Objective-C 2.0 which doesn't have these
problems (which makes the exercise of trying to allocate it on the
stack even more problematic).
>
>
> So, no, there are no magic invisible members.
But there are semantically required members that aren't in "plain"
structs.
Glenn Andreas <gandreas...>
<http://www.gandreas.com/> -
On Feb 6, 2008, at 8:06 PM, John Engelhart wrote:
> --snip----snip--
> Actually, I've thought of another example which addresses the use of
> (or lack of) __strong unambiguously and still demonstrates the
> problem:
>
> #import <Foundation/Foundation.h>
>
> @interface GCTest : NSObject {
> const char *title;
> };
>
> - (void)setTitle:(const char *)newTitle;
> - (const char *)title;
>
> @end
>
> @implementation GCTest
>
> - (void)setTitle:(const char *)newTitle
> {
> printf("Setting title. Old title: %p, new title %p = '%s'\n",
> title, newTitle, newTitle);
> title = newTitle;
> }
>
> - (const char *)title
> {
> return title;
> }
>
> @end
>
> int main(int argc, char *argv[]) {
> NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
> GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
> void *ptr;
>
> gcConstTitle = [[GCTest alloc] init];
> gcUTF8Title = [[GCTest alloc] init];
>
> [gcConstTitle setTitle:"Hello, world!"];
> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world \xC2\xA1"] UTF8String]];
>
> NSLog(@"Test: %@", @"hello");
> [[NSGarbageCollector defaultCollector] collectExhaustively];
> NSLog(@"GC test");
>
> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title], [gcConstTitle title]);
> printf("gcUTF8Title title: %p = '%s'\n", [gcUTF8Title title], [gcUTF8Title title]);
>
> [gcConstTitle setTitle:NULL]; // Must clear the pointer before popping pool.
> [gcUTF8Title setTitle:NULL];
>
> [pool release];
> return(0);
> }
> [<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -o gc -g gc.m
> [<johne...>] /tmp% ./gc
> Setting title. Old title: 0x0, new title 0x1ea4 = 'Hello, world!'
> Setting title. Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
> 2008-02-06 18:32:35.712 gc[18108:807] Test: hello
> 2008-02-06 18:32:35.798 gc[18108:807] GC test
> gcConstTitle title: 0x1ea4 = 'Hello, world!'
> gcUTF8Title title: 0x1011860 = 'Hello, world'
> Setting title. Old title: 0x1ea4, new title 0x0 = '(null)'
> Setting title. Old title: 0x1011860, new title 0x0 = '(null)'
>
> Oddly, I had to add a second NSLog() in order to get some kind of
> lossage, but I think it's fair to chalk this up to the semi-random
> nature of allocations.
>
> The above example is now perfectly legal by everyone's definition of
> how things were under retain/release, and I correctly clear the
> pointer before it goes out of scope, and demonstrates that the GC
> system can, and does, reclaim live data out from under you.
>
"
UTF8String
Returns a null-terminated UTF8 representation of the receiver.
- (const char *)UTF8String
"
Direct from Apple's docs. You seriously need to go back to basics if
you don't understand how screwed up your logic here is! -
On Feb 6, 2008, at 7:01 PM, glenn andreas wrote:
>
> One could also manually construct a C++ vtable and set up all the
> magic "behind the scenes" plumbing that a C++ object has, but that
> doesn't mean that C++ objects are 'synonymous for "Structs"' either.
Well, in C++ you can say that class is a synonym for struct (aside
from the fairly minor issue of default visibility). Though, I think
the point stands that in Objective-C they are not. Just because you
can pound a square peg into a round hole doesn't mean you have a round
peg.
--Brady -
On Feb 6, 2008, at 10:01 PM, glenn andreas wrote:
>
> On Feb 6, 2008, at 7:48 PM, John Engelhart wrote:
>> int main(int argc, char **argv) {
>> NSObject *stackObject = NULL;
>> stackObject = alloca(sizeof(stackObject));
>
> This, of course, just alloced something the size of a pointer, which
> may not be the real size of the class (if it were something other
> than NSObject)
Oops, you're right. I clearly meant NSObject, as the memset line
below shows. You'll note that changing the alloca argument from
sizeof(stackObject) to sizeof(NSObject) works just fine.
>
>>
>> memset(stackObject, 0, sizeof(NSObject));
>> stackObject->isa = [NSObject class];
>>
>> NSLog(@"stackObject: %@", stackObject);
>> }
>> [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
>> tst.m: In function 'main':
>> tst.m:7: warning: instance variable 'isa' is @protected; this will be a hard error in the future
>> [<johne...>] /tmp% ./tst
>> 2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject: 0xbffff380>
>>
>> As the address clearly shows, this is an object on the stack.
>> Although I have had to manually initialize the object, it is
>> exactly, or very close to, what "NSObject stackObject" would have
>> created.
>>
>>
>
> The biggest problem is that the above example shows something that
> isn't usable - it's sterile. You can't pass it to other routines
> (due to memory management requirements), you may not even be able to
> call all the methods of the object (since they may pass
> themselves as parameters to other objects which invoke memory
> management requirements). Only if you wrote your own entire
> hierarchies could you use such a construction (or if you implemented
> full closures, which is even more difficult in a C-based
> language). There's a whole raft of semantics associated with Cocoa
> objects (above and beyond anything that whatever version of the
> objective-c runtime may require).
I'm pretty sure I was clear that this is "not really a good idea in
reality." It is, contrary to what you stated, possible to do.
>
> If you can't use the object, have you actually created it?
You can use the object. It remains live until the stack frame pops.
It will work exactly like any other object. The "deallocation" of the
object is tricky, but I'm sure you could still pull it off if you
really wanted.
>>
>> As you can see, and the code clearly demonstrates, my original
>> assertion stands.
>>
>
> One could also manually construct a C++ vtable and set up all the
> magic "behind the scenes" plumbing that a C++ object has, but that
> doesn't mean that C++ objects are 'synonymous for "Structs"' either.
>
> The point is that they are not synonymous for structs - if they
> were, you could replace one with the other, and both Objective-C and
> C++ objects have additional semantic requirements.
You can replace one with the other; see above. The GCC compiler stops
you from doing this because it's really not a good idea in practice.
There are other Objective-C implementations that allow stack-based
objects exactly as you describe. I can't remember which one it was
off the top of my head; it might have been poc.
This is a literal "NSObject stackObject;", if it helps any:
#import <Foundation/Foundation.h>
int main(int argc, char **argv) {
    struct { @defs(NSObject) } stackObject;
    memset(&stackObject, 0, sizeof(NSObject));
    stackObject.isa = [NSObject class];
    NSLog(@"stackObject: %@", &stackObject);
}
[<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
[<johne...>] /tmp% ./tst
2008-02-07 00:04:20.477 tst[18686:807] stackObject: <NSObject: 0xbffff398>
[<johne...>] /tmp%
I suppose I could go through all the trouble of putting together
initialization and deallocation methods as part of a subclass, or
category override, that specifically dealt with stack "allocation" and
"deallocation". I could even get crafty and have dealloc check to see
if self is a pointer that's on the stack and call the stack dealloc
code, and the standard dealloc code otherwise.
Honestly though, I'm not sure how much more plain I can make it. You
said:
>
>>
>> You misunderstand what Objective C is, and how it works. "Objects"
>> is synonymous for "Structs".
>
>
>
> If that were true, you'd be able to declare objects as local
> variables (as opposed to as pointers to structures):
>
> NSPoint aPoint; // <-- NSPoint = struct, legal
> NSString aString; // <-- NSString = object, Illegal
Yet, as the code above plainly shows, "struct { @defs(NSObject) }
stackObject;" has declared an object as a local variable, and even
passed it to NSLog() to print its description, proving that it's a
bona fide, usable object, while the keyword "struct" makes it obvious
that an "object" is just a "struct".
How about: "Objects" are synonymous with "struct objc_object"s? And
since id is a typedef for "struct objc_object *", that makes it pretty
literal.
Does this help at all?
#import <Foundation/Foundation.h>
typedef struct { @defs(NSObject) } NSObject_;
int main(int argc, char **argv) {
    NSObject_ stackObject;
    memset(&stackObject, 0, sizeof(NSObject));
    stackObject.isa = [NSObject class];
    NSLog(@"stackObject: %@", &stackObject);
}
[<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
[<johne...>] /tmp% ./tst
2008-02-07 00:28:47.784 tst[18734:807] stackObject: <NSObject: 0xbffff398>
I've had to add an underscore to prevent namespace collision, but....
I'm not really sure how else I can explain it.
>
>>
>> As you can see, it's very clear where "isa" comes from.
>> Subclassing an object has the effect of "pasting" your ivar
>> declarations at the end of the class you're inheriting from, and
>> forms the cause of "fragile classes" since a struct effectively
>> becomes pointer + offset, and changing a struct requires
>> recompiling code to update those offsets.
>
> Except, of course, for 64 bit Objective-C 2.0 which doesn't have
> these problems (which makes the exercise of trying to allocate it on
> the stack even more problematic).
Yes, as I have noted, the ObjC 2.0 64-bit ABI/API is different. This
is dealing with the 32-bit version, which has been around since the
'80s.
>> So, no, there are no magic invisible members.
>
> But there are semantically required members that aren't in "plain"
> structs.
Honestly, I don't follow. What is a "plain" struct?
struct objc_object {
void *isa;
};
Is that not a "plain" struct? And what do you mean by "semantically
required members"?
#import <Foundation/Foundation.h>
typedef struct { char letters[sizeof(void *)]; } NSObject_; /* just enough bytes for a class pointer */
int main(int argc, char **argv) {
    NSObject_ stackObject;
    memset(&stackObject, 0, sizeof(NSObject_));
    void *classPtr = [NSObject class];
    memcpy(stackObject.letters, &classPtr, sizeof(void *));
    NSLog(@"stackObject: %@", &stackObject);
}
[<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
[<johne...>] /tmp% ./tst
2008-02-07 00:37:57.632 tst[18799:807] stackObject: <NSObject: 0xbffff398>
Sure, it's a bit of square-peg-into-a-round-hole abuse to get the class
pointer copied over... but there's no isa here, yet it works.
I suppose pedantically one could argue that in order for something to
be an "object", it would have to have the layout of a particular kind
of struct... but I think that's pushing things a little far, and in
the end, whatever the layout, that layout is declared as a struct. -
On Feb 6, 2008, at 8:48 PM, Chris Suter wrote:
>
> On 07/02/2008, at 12:06 PM, John Engelhart wrote:
>
>> However, consider for a moment if it was a type attribute, and
>> followed type attribute rules, and how this would affect the
>> examples cited. Off the top of my head, I think treating it as a
>> type attribute would have prevented every single error I've pointed
>> out. Hypothetically, consider if UTF8String propagated the
>> __strong type, and the assignment of its pointer to the ivar 'const
>> char *'.
>>
>> The compiler would fail to compile the code, and generate an error.
>
> I don't think it should be a type qualifier. It would mean that you
> wouldn't be able to do things like:
>
> puts ([myString UTF8String])
>
> without getting a compiler warning.
This is a pretty debatable point, with pros and cons on each side. My
opinion is that you should not be handing pointers which require write
barriers for proper operation of the GC system to code that is not
compiled with the proper support. One could argue that the decision
to require that "all frameworks must be GC compiled/capable" makes
this policy a requirement.
Since the GC system considers everything on the stack to be a live
pointer, this has the effect of catching 99.99% of uses of pointers in
this fashion. While I do not have a specific example to show here, I
think you'll agree that there are occasionally times when calling a C
library function will violate these principles. Realistically, when
you call a function, that pointer vanishes down a call stack that is
surprisingly complex, and that pointer has a tendency of visiting
places you would not think applies in a particular case. If you've
used Shark.app, I'm sure you've seen some functions which end up
creating some surprisingly deep call-stacks that in turn are calling
all sorts of seemingly unrelated functions.
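"Everything on the stack is a live pointer" describes conservative scanning. The sketch below illustrates the idea under simplifying assumptions and is not how Leopard's collector is implemented: HeapBlock and scan_conservatively are hypothetical, a real collector also scans registers and uses a much faster block lookup, and this sketch treats interior addresses as live, as many conservative collectors do. Any word that happens to hold an address inside a block keeps that block alive, and once that word is overwritten the block becomes collectable. This fits Chris's explanation that the UTF8String buffer survived while its pointer was still on the stack or in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap block: an address range plus a mark bit. */
typedef struct {
    uintptr_t start;
    size_t size;
    int marked;
} HeapBlock;

/* Conservatively scan a range of words (a stand-in for the stack):
   any word whose value falls inside a block, including interior
   addresses, marks that block as live. No type information is used,
   so an integer that looks like an address also keeps a block alive. */
static void scan_conservatively(const uintptr_t *words, size_t nwords,
                                HeapBlock *blocks, size_t nblocks)
{
    for (size_t b = 0; b < nblocks; b++)
        blocks[b].marked = 0;

    for (size_t w = 0; w < nwords; w++) {
        for (size_t b = 0; b < nblocks; b++) {
            if (words[w] >= blocks[b].start &&
                words[w] < blocks[b].start + blocks[b].size)
                blocks[b].marked = 1;
        }
    }
}
```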
I don't argue that it covers 99.99% of the cases. It's that last tiny
bit that I'm writing about. I'm sure you'll agree that tracking down
errors of this nature is, politely, frustrating.
There's an example someone once used to underscore how optimistic we
can be and spectacularly misjudge the likelihood of these rare errors
occurring. It comes from the canonical example of multi-threaded/
multi-cpu programming:
x++;
I'll skip over the specifics, but the question is asked "How likely do
you think the condition is for two threads to get into a race
condition and incorrectly update 'x'? A million to one?"
While odds of a million to one seem like a lot, at two gigahertz (and
assuming, as a rough upper bound, one increment per clock cycle) that's
roughly 2000 times per second, or a mean time to failure of roughly
500 microseconds.
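The arithmetic behind the x++ example, spelled out. It assumes, as an upper bound, one increment per clock cycle at 2 GHz, and the helper names are hypothetical. The comment at the top restates why x++ can race at all.

```c
/* x++ on a shared variable is a read-modify-write: load x, add 1, store x.
   Unless an atomic operation is used, two CPUs can both load the old value,
   and one of the two increments is lost. */

/* Expected races per second when each of ops_per_second increments
   has a 1-in-odds chance of being interleaved badly. */
static double races_per_second(double ops_per_second, double odds)
{
    return ops_per_second / odds;
}

/* Mean time between races, in microseconds. */
static double mttf_microseconds(double ops_per_second, double odds)
{
    return 1e6 / races_per_second(ops_per_second, odds);
}
```

With the figures from the example, races_per_second(2e9, 1e6) is 2000 and mttf_microseconds(2e9, 1e6) is 500.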
>
>> Oddly, I had to add a second NSLog() in order to get some kind of
>> lossage, but I think it's fair to chalk this up to the semi-random
>> nature of allocations.
>
> I think that would have been because the pointer returned by
> UTF8String was still on the stack or in a register.
Hard to say. I think it does illustrate another point: the seemingly
random nature in which you will get bitten by these kinds of bugs.
>
>> The above example is now perfectly legal by everyone's definition of
>> how things were under retain/release, and I correctly clear the
>> pointer before it goes out of scope, and demonstrates that the GC
>> system can, and does, reclaim live data out from under you.
>
> I think your example is contrived. Whilst it's legal in the retain/
> release world you wouldn't ever write anything like that.
Granted, my example is contrived, no two ways about it. But is that
not the point? To create a compact example which replicates the
problem? I believe I have done all that is required: demonstrate that
it is possible. Once I've done that, the sheer volume of code under
consideration essentially guarantees that this is taking place. My
practical, hands on experience suggests this (and by this, I don't
mean this particular example per se, but the ease with which it's
possible to get some of these subtle points wrong) is happening far
more frequently than you would think.
>
> The solution to all of this, as has already been stated, is to
> understand the contract that UTF8String promises and to make your
> own arrangements if you want to hang on to the value.
You're absolutely right. But my point is that, in practice, this is
not quite as clear cut as it seems.
Allow me to back way, way, way up. For the purposes of this argument,
let us not consider all the technical points that have been discussed
so far, as it's easy to get lost arguing pedantic, nuanced details.
Let's consider the GC system from a purely pragmatic point of view.
Now, the precise specifics notwithstanding, you will at some time get
some small detail wrong. You will have created a bug with regards to
some GC detail. The effect of this bug, which I think everyone will
reasonably agree on, is likely to result in the collector reclaiming
memory that you have in use when you clearly didn't want it to.
The hows and whys of your bug aren't really important, but you have
done something wrong when you shouldn't have. These things happen.
My experience with these bugs has been that they consume an
EXTRAORDINARY amount of time to track down. Because of the semi-random
way these bugs manifest themselves, I have found that it's virtually
impossible to find a solid set of conditions to trigger them. I have
found that unit tests are worthless in trying to track down these
problems. The complex interactions required to tickle these bugs are
essentially impossible to create with unit tests. All unit tests will
pass, flawlessly, and in fact it sometimes may not be possible to
recreate the right conditions to trigger the bug because essentially
all your variables are sitting on the stack in the scope of the unit
test.
The pragmatic effect of using Leopard's GC system has been a MASSIVE
increase in the amount of time I have spent debugging problems that
have all the symptoms of "race condition" bugs, and a correspondingly
huge uptick in the effort required to find and eliminate these bugs.
This is my warning to you (you being all of the list, or archive
reader). Setting aside all the technical points discussed, you will
eventually create a bug that in retrospect, you clearly shouldn't
have. The nature of these bugs means that it will take tremendous
effort and time to track down and correct. My experiences with the GC
system have seen the time I spend debugging explode, and in fact
dominate the time I spend developing. What's worse is that these bugs
rarely manifest themselves during development and are nearly
impossible to catch with unit tests. This means that the reliability
of "shipped code" goes right through the floor, and replicating the
bug often requires considerable interaction with an outside party and
all the difficulties that that entails. -
On Feb 6, 2008 10:52 PM, Ben Trumbull <trumbull...> wrote:
>
> You cannot message a void* without type casting. The compiler won't
> let you. The compiler treats objects differently. Further, there is
> no way for you to accept an arbitrary void* parameter and prove at
> runtime it is in fact capable of dispatching messages (i.e. has an
> isa at offset 0). Yet I always know I can message an id.
More nitpickery here....
You can certainly message a void * without type casting. The compiler emits
a warning, but warnings are not errors. I would not be surprised if this
were not the case in ObjC++; I only tried plain ObjC.
And you certainly cannot know that you can message an id. Aside from obvious
pathological cases such as (id)42, consider the extremely common problems we
see on this list all the time where people mess up their memory management
and end up with id variables which crash when messaged.
It's often convenient to ignore these facts and act as though ObjC is a safe
OO language, but the fact is that it is C and it allows all the dumb (and
powerful) stuff that C allows.
> For any arbitrary id, I can invoke -class without crashing.
This is untrue. You can only invoke -class safely if the target implements
it. Anything which inherits from NSObject will, but it is trivial to create
a root class which doesn't respond to -class. And there's no reasonable way
to find out if it responds to -class or not, because -respondsToSelector: is
*also* not guaranteed to exist! The ObjC runtime functions can try to tell
you, but they can give a false negative in the case of classes which handle
these messages via forwarding. Once again, it's convenient to act as though
all ObjC objects conform to the NSObject protocol, but it is not actually
enforced.
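Mike's point, that sending a message is a lookup that can come up empty, is easy to see in a toy model of the selector-to-function dispatch described earlier in the thread. This is a deliberately simplified sketch in plain C, not the Objective-C runtime: every toy_ name is hypothetical, selectors are compared as strings rather than as unique pointers, and there is no superclass chain, method cache or forwarding.

```c
#include <stddef.h>
#include <string.h>

typedef struct toy_object *toy_id;                     /* like id: pointer to a struct */
typedef const char *toy_SEL;                           /* selector, here just a name */
typedef toy_id (*toy_IMP)(toy_id self, toy_SEL _cmd);  /* like IMP, minus varargs */

typedef struct {
    toy_SEL name;   /* key: the selector */
    toy_IMP imp;    /* value: the function that implements it */
} toy_method;

typedef struct {
    const char *class_name;
    const toy_method *methods;
    size_t method_count;
} toy_class;

struct toy_object {
    const toy_class *isa;   /* first member, as in struct objc_object */
};

/* Find the function for a selector in the receiver's class, or NULL. */
static toy_IMP toy_lookup(toy_id self, toy_SEL sel)
{
    for (size_t i = 0; i < self->isa->method_count; i++)
        if (strcmp(self->isa->methods[i].name, sel) == 0)
            return self->isa->methods[i].imp;
    return NULL;
}

/* "Send" a message: look it up, then call it with self and _cmd as
   ordinary leading arguments. Unknown selectors just return NULL here. */
static toy_id toy_send(toy_id self, toy_SEL sel)
{
    toy_IMP imp = toy_lookup(self, sel);
    return imp != NULL ? imp(self, sel) : NULL;
}

/* An example method that returns its receiver, like -self. */
static toy_id toy_self_method(toy_id self, toy_SEL _cmd)
{
    (void)_cmd;
    return self;
}
```

A class whose method table has no "class" entry simply has nothing to call, which is the situation of a root class that doesn't implement -class; the real runtime would go through its forwarding machinery at that point rather than quietly returning NULL.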
Mike -
On Feb 6, 2008, at 1:29 PM, Michael Tsai wrote:
> On Feb 6, 2008, at 12:46 PM, Alastair Houghton wrote:
>
>>> I don't think this is a bug. The NSString and CFString APIs do not
>>> indicate that they treat the bytes as scanned memory.
>>
>> That's true, but it doesn't matter whether they treat the bytes as
>> scanned memory or not; that would only change whether putting
>> pointer data in the bytes was safe. The problem is whether the
>> pointer itself is being traced, which isn't happening right now
>
> Sorry, that's what I meant.
>
>> the docs *do* say (in the Garbage Collection Programming Guide)
>> that NULL, kCFAllocatorDefault and kCFAllocatorSystemDefault cause
>> objects to be allocated in the GC zone, so I don't think it's
>> unreasonable to expect that the pointer will be traced.
>
> The string was allocated using kCFAllocatorDefault, but the
> deallocator was specified as kCFAllocatorNull. The docs say:
>
> "If the buffer does not need to be deallocated, or if you want to
> assume responsibility for deallocating the buffer (and not have the
> CFString object deallocate it), pass kCFAllocatorNull."
>
> If CFString is not going to be responsible for deallocating the
> buffer, then it would not make sense to rely on CFString keeping the
> buffer alive for you.
Just a quick note... my recollection is that using anything but
kCFAllocatorNull results in a double free()... but I might be
misremembering; it was a while ago, but I'm pretty sure that this was
the bit of code that did it. A quick peek at the source code base
shows this is only one of a handful of places where
NSAllocateCollectable is used, so chances are this is it. I think you
also need to crank up the malloc environment debugging to catch it. -
On Feb 6, 2008, at 10:12 AM, Michael Tsai wrote:
> On Feb 6, 2008, at 4:39 AM, John Engelhart wrote:
>
>> Read the above, "object" is synonymous for "struct". The "layout"
>> of an object is identical to the "layout" of a struct.
>
>
> That is true but irrelevant. What matters for garbage collection is
> whether the variables are typed as objects at compile time, because
> that's what determines what code the compiler emits for assignments.
A nitpick, but it is more correct to say "whether the variables are
typed as __strong at compile time". Subtle, but important, especially
when you consider the following from objc/objc.h:
typedef struct objc_object {
Class isa;
} *id;
Despite any prose definition of what an object is, this is the
definition of an object to the compiler. I will note that the
compiler's definition of an object is a) a struct, and b) not
__strong. Therefore, in an overly pedantically strict sense, your
statement is wrong, but I'm fairly sure you would have picked __strong
in place of objects as that is the effect you were trying to
communicate.
Point b) is especially interesting, and fully appreciating its
consequences requires an ugly detour into the compiler's intermediate
GIMPLE code. When you consider the points I've raised from this
perspective, and what's really going on under the hood, many of my
arguments will snap into focus. We rarely examine the assembly code
emitted by the compiler, and in practice a write barrier assignment
and a non-write barrier assignment have identical effects during
execution, so you would have no reason to suspect anything is wrong.
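In the GC sense, a "write barrier" is an assignment the compiler routes through the collector, not a memory fence. The following is a minimal C sketch, assuming a hypothetical collector that keeps a remembered set of written slots. The names `gc_assign`, `plain_assign` and `g_remembered` are illustrative and are not Leopard's runtime functions. Both stores leave identical bits in memory, which is why a missing barrier goes unnoticed at run time. Only the barrier store tells the collector about the slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical remembered set: the slots the collector has been told about. */
#define RS_CAPACITY 64

typedef struct {
    void **slots[RS_CAPACITY];
    size_t count;
} remembered_set;

static remembered_set g_remembered;

/* Barrier store: roughly what a compiler might emit for an assignment to a
   __strong location. It performs the write AND records the slot. */
void gc_assign(void **slot, void *value)
{
    *slot = value;
    assert(g_remembered.count < RS_CAPACITY);
    g_remembered.slots[g_remembered.count++] = slot;
}

/* Plain store: what gets emitted when the pointer is not (or is no longer)
   treated as __strong. Same memory effect, nothing recorded. */
void plain_assign(void **slot, void *value)
{
    *slot = value;
}

/* Returns 1 if the collector was told about this slot, else 0. */
int gc_slot_is_remembered(void **slot)
{
    for (size_t i = 0; i < g_remembered.count; i++) {
        if (g_remembered.slots[i] == slot)
            return 1;
    }
    return 0;
}
```

In these terms, a "down promotion" is an assignment that compiles to something like `plain_assign` where `gc_assign` was needed. Nothing at run time flags it.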
One of my original claims is that, despite the prose definition of how
the GC system works and people's beliefs, the compiler is magically
transforming some pointers into __strong behind your back by some
unspecified logic, as the definition of id shows. 'id' is, in fact, the
codified definition of an object, and the fact that __strong is not
part of its definition means that the prose definition "objects are
__strong" is misleading at best and, strictly speaking, wrong. And
since one pointer looks pretty much like any other pointer in the guts
of the compiler, the likelihood that this magical promotion is being
applied correctly to the appropriate pointers is pretty slim. Since
__strong is not being treated as a type qualifier per ANSI C rules,
there is nothing in place to catch inadvertent "down promotions" when
they happen. The true performance impact of the GC system has probably
also been grossly understated, since wrapping a write to memory in a
function call essentially obliterates any hope of hoisting values into
registers during optimization, and brings the attendant
performance-robbing spills.
Another claim I have made is that "treating everything on the stack as
__strong is not the right thing, it is only going to mask the
problem." One could argue that the choice of treating everything on
the stack as __strong is proof of how often the compiler is getting
this automagic promotion wrong, or otherwise dropping the __strong
qualification during some code transformation. My hypothesis is that
if write barriers were being generated correctly, there would be no
need to treat the stack as special, other than deciding "how much" of
the stack should be considered live relative to the top frame.
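Treating the stack as __strong amounts to scanning it conservatively. Every word in the live frames that happens to point into a collectable block is treated as a root, whether or not it was written through a barrier. The following is a minimal sketch. A simulated array of words stands in for a real stack, and a hypothetical fixed table of heap blocks stands in for the heap. None of these names come from the Leopard runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap: a fixed table of block start addresses and sizes. */
typedef struct {
    uintptr_t start;
    size_t    size;
    int       marked;
} heap_block;

/* Return the block containing addr, or NULL if addr is not a heap pointer. */
static heap_block *find_block(heap_block *blocks, size_t nblocks, uintptr_t addr)
{
    for (size_t i = 0; i < nblocks; i++) {
        if (addr >= blocks[i].start && addr < blocks[i].start + blocks[i].size)
            return &blocks[i];
    }
    return NULL;
}

/* Conservatively scan [lo, hi): any word that looks like a pointer into a
   block marks that block as reachable. No type information is consulted,
   so pointers stored without a barrier are still found. Returns the number
   of blocks newly marked. */
size_t scan_conservatively(const uintptr_t *lo, const uintptr_t *hi,
                           heap_block *blocks, size_t nblocks)
{
    size_t roots = 0;
    assert(lo <= hi);
    for (const uintptr_t *p = lo; p < hi; p++) {
        heap_block *b = find_block(blocks, nblocks, *p);
        if (b != NULL && !b->marked) {
            b->marked = 1;
            roots++;
        }
    }
    return roots;
}
```

The "how much" of the stack question corresponds to choosing the `[lo, hi)` bounds passed to the scan.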
There has also been some discussion regarding proper use of memory
allocated by NSAllocateCollectable, such as its use with CFString
creation methods. This highlights another point of mine. Despite the
(sometimes foaming at the mouth) assertions that the GC rules are
"easy", what is actually happening is that reasonable people disagree
about how GC-allocated memory should be used, and their
interpretations are perfectly valid and totally reasonable. This
shows that in practice, these "simple rules" are in fact deceptively
complex and trivially easy to get wrong. It doesn't really matter who
is right in the argument: the fact that there is reasonable debate
about something that should be completely unambiguous and crystal
clear demonstrates that, in practice, there's probably one correct
way and many wrong ways, and chances are you are going to get it
wrong. From experience, being told that "you obviously should have
used __strong, DUH" after four days of intense debugging is going to
ring hollow. You should be cautious of objects in your vicinity
spontaneously levitating and taking flight. Or, as a courtesy, at
least hang a sign on the door. -
Ben Trumbull wrote:
> John Engelhart wrote:
>> - (id *)copyOfObjectArray:(id *)originalObjects length:(NSUInteger)length
>> {
>>     id *newObjectArray = NULL;
>>     newObjectArray = NSAllocateCollectable(sizeof(id) * length,
>>                                            NSScannedOption);
>>     memcpy(newObjectArray, originalObjects, sizeof(id) * length);
>>     return(newObjectArray);
>> }
> This does not work. Pushing GC'd objects through memcpy, a system
> call that can't know anything about Objective-C Garbage Collection,
> seems unwise.
Correct. Don't manipulate memory that might have GC pointers in it
without using a GC-aware function.
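The difference between memcpy() and a GC-aware copy can be sketched in C. The GC-aware version moves the bytes and then tells the collector about every pointer-sized slot it wrote. This is a hypothetical illustration of the idea behind objc_memmove_collectable(), not its actual implementation. `gc_note_store`, `gc_memmove_sketch` and the slot log are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical log of destination slots the collector was notified about. */
#define LOG_CAPACITY 32
static void **g_noted[LOG_CAPACITY];
static size_t g_noted_count;

static void gc_note_store(void **slot)
{
    assert(g_noted_count < LOG_CAPACITY);
    g_noted[g_noted_count++] = slot;
}

/* GC-aware copy: the same bytes as memmove(), plus one notification per
   pointer-sized destination slot. Assumes size is a multiple of
   sizeof(void *) and dst is pointer-aligned. */
void *gc_memmove_sketch(void *dst, const void *src, size_t size)
{
    memmove(dst, src, size);
    void **slots = dst;
    for (size_t i = 0; i < size / sizeof(void *); i++)
        gc_note_store(&slots[i]);
    return dst;
}

/* Number of slots the collector has been told about so far. */
size_t gc_noted_count_now(void)
{
    return g_noted_count;
}
```

A plain memcpy() into the same kind of buffer copies identical bytes but leaves the log untouched. That gap is exactly what the copyOfObjectArray: example falls into.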
> Nonetheless, that also should be better documented, and a bug report
> for a public GC compatible memory copy API would be good.
The GC-aware memcpy() is in <objc/objc-auto.h>:
void *objc_memmove_collectable(void *dst, const void *src, size_t size);
There are also GC-aware versions of OSAtomicCompareAndSwapPtr():
BOOL objc_atomicCompareAndSwapGlobal(id predicate, id replacement, volatile id *objectLocation);
BOOL objc_atomicCompareAndSwapGlobalBarrier(id predicate, id replacement, volatile id *objectLocation);
// atomic update of an instance variable
BOOL objc_atomicCompareAndSwapInstanceVariable(id predicate, id replacement, volatile id *objectLocation);
BOOL objc_atomicCompareAndSwapInstanceVariableBarrier(id predicate, id replacement, volatile id *objectLocation);
These are typed as id, but they work equally well with any pointer.
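A GC-aware compare-and-swap has to combine two things: the atomic update itself, and the barrier notification when the swap actually succeeds. The following is a minimal C11 sketch under those assumptions. The names are hypothetical, and the real objc_atomicCompareAndSwap* functions may be implemented quite differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical barrier bookkeeping: count of successful barrier stores. */
static atomic_size_t g_barrier_stores;

static void gc_note_atomic_store(_Atomic(void *) *slot)
{
    (void)slot; /* a real collector would record the slot itself */
    atomic_fetch_add(&g_barrier_stores, 1);
}

/* Atomically replace *slot with replacement if it still holds predicate.
   The collector is notified only when the store really happened. */
bool gc_compare_and_swap(void *predicate, void *replacement,
                         _Atomic(void *) *slot)
{
    assert(slot != NULL);
    void *expected = predicate;
    bool swapped = atomic_compare_exchange_strong(slot, &expected, replacement);
    if (swapped)
        gc_note_atomic_store(slot);
    return swapped;
}

/* Number of barrier stores recorded so far. */
size_t gc_barrier_store_count(void)
{
    return atomic_load(&g_barrier_stores);
}
```

Calling a plain atomic compare-and-swap on such a slot would perform the same update but skip the notification. That is why the GC-aware versions exist.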
>> Anyone who's used garbage collection with C is probably familiar
>> with the Boehm Garbage Collector. [...] It makes no particular
>> demands of the programmer or compiler, in fact it can be used as a
>> drop in replacement for malloc() and free(), requiring no changes.
Of course, that only works if malloc() is replaced everywhere in the
system, which is impractical in a dynamic shared library environment.
Storing a Boehm-managed pointer in a block allocated from non-Boehm
malloc() or a non-default malloc zone would cause just as much grief
as storing a Leopard-GC-managed pointer in a block allocated with
malloc().
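The grief of storing a collected pointer in a block the collector never looks inside can be sketched with a toy mark phase. This sketch makes three assumptions. The heap is a hypothetical set of blocks. A per-block `scanned` flag stands in for memory the collector inspects, versus memory it ignores, such as malloc() blocks or collectable memory allocated without NSScannedOption. Tracing starts from a single root. None of this is real runtime code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap block that may hold pointers to other blocks. */
#define MAX_REFS 4

typedef struct toy_block {
    bool scanned;                     /* collector looks inside this block */
    bool marked;                      /* reached during the mark phase */
    struct toy_block *refs[MAX_REFS];
    size_t nrefs;
} toy_block;

/* Mark everything reachable from b, but only look inside scanned blocks. */
void toy_mark(toy_block *b)
{
    if (b == NULL || b->marked)
        return;
    b->marked = true;
    if (!b->scanned)
        return;                       /* contents invisible to the collector */
    assert(b->nrefs <= MAX_REFS);
    for (size_t i = 0; i < b->nrefs; i++)
        toy_mark(b->refs[i]);
}
```

An object referenced only from an unscanned block stays unmarked and would be reclaimed while still in use. This happens whichever collector is involved, Boehm or Leopard.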
The designer of a GC system always has to draw a line and say "if you
cross this line, you have to start thinking about memory management
again".
Java: memory management is easy, until you start working with non-Java
code via JNI. Benefits: the JVM can use sophisticated GC techniques
because of its tight control. Drawbacks: working with non-Java code is
very hard.
Boehm: memory management is easy, until you call mmap() or start
working with code in shared libraries. Benefits: most ordinary C code
works. Drawbacks: GC algorithms invented after 1970 or so are
impractical.
Objective-C: memory management is easy, until you want to use it with
blocks that aren't Objective-C objects. Benefits: most ordinary
Objective-C code works. Drawbacks: C code is harder than with Boehm; GC
flexibility is less than Java's.
--
Greg Parker <gparker...> Runtime Wrangler -
On Feb 7, 2008, at 11:23 PM, John Engelhart wrote:
> Another claim I have made is that "treating everything on the stack
> as __strong is not the right thing, it is only going to mask the
> problem." One could argue that the choice of treating everything
> on the stack as __strong is proof of how often the compiler is
> getting this automagic promotion wrong, or otherwise dropping the
> __strong qualification during some code transformation. My
> hypothesis is that if write barriers were being generated
> correctly, there would be no need to treat the stack as special,
> other than the fact of "how much" of the stack to be considered
> live relative to the top frame.
First, treating everything on the stack as __strong makes things
simpler for the programmer: you don't have to worry about what the
compiler is or isn't checking for you. Second, as you noted above,
it's relatively expensive to turn assignments into function calls, and
it would be a waste of time to do this for variables within
short-lived stack frames.
> Despite the (sometimes foaming at the mouth) assertions that the GC
> rules are "easy", what is transpiring is reasonable people are
> having differences of opinion regarding the use of GC allocated
> memory.
Perhaps we can agree that the rules for objects, the most common
case, are easy.
--Michael -
On Feb 8, 2008 4:23 AM, John Engelhart <john.engelhart...> wrote:
> 'id' is the codified
> definition of an object in fact
Here we go again.
Perhaps your confusion between "an address in memory" and "the
contents of memory from that address" is at the heart of the problem?
The following code "proves" that objects and character arrays are "equivalent":
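#import <Foundation/Foundation.h>
int main(int argc, char **argv) {
    char string[] = "test";
    id x = [NSObject class];
    // Overwrite the array's first 4 bytes with the class pointer
    // (4 == sizeof(id) on the 32-bit build used here).
    memcpy(string, &x, 4);
    // Pass the array's address where an object is expected.
    NSLog(@"object: %@", string);
    return 0;
}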
$ gcc -framework Foundation -o test test.m
$ ./test
2008-02-08 17:33:21.514 test[1380:10b] object: <NSObject: 0xbffffa8b>
$
To put it a different way: memory is memory is memory. Data types give
us structured access to that memory. Objective-C objects are not
__strong, but ids (a special type of pointer to those objects) are
__strong. It's that simple. Take it or leave it, but don't run around
shouting that the sky is falling.
> Since
> __strong is not being treated as a type qualifier per ANSI-C rules,
> there is nothing in place to catch inadvertent "down promotions" when
> they happen.
You want to have the compiler warn you whenever you create a weak
reference. Fine: file an enhancement request. The sky still isn't
falling!
Hamish -
This discussion has gotten a bit far from involving Cocoa directly.
Please move it to the Objective-C mailing list (<objc-language...>).
On Feb 7, 2008, at 8:23 PM, John Engelhart wrote:
> [...]