Use of Mac OS X 10.5 / Leopard's Garbage Collection Considered Harmful

  • This is bound to be an inflammatory subject.  That is not my intent,
    and I mean no disrespect to the programmers who worked on the GC
    system.  I'm quite sure that adding GC to Cocoa is a non-trivial, near
    impossible task, filled with trade-offs between "really bad" and "even
    worse."  Also understand that a bit of 'authoritative' documentation
    or instruction can outright negate some of the points I make below as
    I have had only the publicly available documentation and my
    (relatively brief 2-3 months) experiences with the 10.5 GC system to
    form these opinions....

    I've had several reservations about Leopard's GC system since I
    started working with it.  There is very little documentation on
    Leopard's GC system, so the following has been pieced together by
    inference and observations of how the garbage collection system seems
    to work.  My first concern was with the use of "compiler assisted
    write barriers".  The current public documentation is extremely vague
    as to what a 'write barrier' is, and I'm sure that the majority of
    you, like me, assumed the term referred to an "atomic write barrier /
    fence" used to ensure that all CPUs past the write barrier would see
    the same data at a given location.  See `man 3 barrier` for a
    description of the OSMemoryBarrier() function that performs this
    operation.  This would make some sense for a GC system: it would ensure
    that the use of a pointer is visible to the collector no matter what
    thread or CPU is using the pointer.  From what I can tell, the term
    'write barrier' as it is used by the GC documentation has absolutely
    nothing to do with this traditional meaning of the term.

    Anyone who's used garbage collection with C is probably familiar with
    the Boehm Garbage Collector.  I believe that the Boehm GC library
    embodies what most people would expect of a garbage collection
    system:  The programmer is freed from having to worry about memory
    allocations and about when or where pointers to allocated memory
    exist.  It's the collector's job to find those pointers, piece
    together what memory is still referenced by active pointers, and
    reclaim the memory which has no live pointers referencing it.  Very
    roughly, it does this by starting with a collection of root
    allocations.  It scans these allocations looking for pointers, then
    follows those pointers and scans the blocks of memory they point to.
    It builds a graph of references
    from these root objects, and when all the memory has been scanned,
    memory allocations that are not part of this 'liveness' graph can be
    reclaimed.  It makes no particular demands of the programmer or
    compiler, in fact it can be used as a drop in replacement for malloc()
    and free(), requiring no changes.
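
    A minimal sketch of that mark-from-roots idea, in plain C (which also
    compiles as Objective-C).  It is a toy model, not how the Boehm
    collector is implemented: the names Block, mark_from and count_garbage
    are made up for illustration, and a real conservative collector scans
    raw memory words rather than a tidy array of pointer slots.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_REFS 4

    /* Hypothetical heap block: a mark bit plus a few outgoing pointers. */
    typedef struct Block {
        bool marked;                    /* set when reachable from a root */
        struct Block *refs[MAX_REFS];   /* outgoing pointers; NULL = unused */
    } Block;

    /* Mark 'b' and everything reachable from it. */
    void mark_from(Block *b)
    {
        if (b == NULL || b->marked)
            return;                     /* nothing there, or already visited */
        b->marked = true;
        for (size_t i = 0; i < MAX_REFS; i++)
            mark_from(b->refs[i]);
    }

    /* Clear all marks, mark from every root, then count the blocks left
       unmarked.  Those are the ones a sweep phase could reclaim. */
    size_t count_garbage(Block **roots, size_t nroots,
                         Block *heap, size_t nheap)
    {
        assert(heap != NULL || nheap == 0);
        assert(roots != NULL || nroots == 0);

        for (size_t i = 0; i < nheap; i++)
            heap[i].marked = false;
        for (size_t i = 0; i < nroots; i++)
            mark_from(roots[i]);

        size_t garbage = 0;
        for (size_t i = 0; i < nheap; i++)
            if (!heap[i].marked)
                garbage++;
        return garbage;
    }
    ```

    With three blocks, where heap[0] is the only root and points at
    heap[1], count_garbage() reports one reclaimable block (heap[2]).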

    From what I've pieced together, Leopard's GC system is nothing like
    this.  While the Boehm GC system detects liveness passively by
    scanning memory and looking for and tracing pointers, Leopard's GC
    system does no scanning and requires /active/ notification of changes
    to the heap.  This, I believe, is what a 'write-barrier' actually is:
    it is a function call to the GC system so that it can update its
    internal state as to what memory is live.  It relies, I suspect
    exclusively, on these function calls to track memory allocations.

    If this is indeed the case, it's my opinion that the 10.5 GC system is
    fundamentally and fatally flawed.  In fact, its use should be actively
    discouraged.  I'll now outline the reasoning behind this, including an
    example that highlights the magnitude of the problem.

    This would explain the need for 'dual mode' frameworks, and that an
    application that uses GC must be linked to frameworks that are all GC
    capable.  This is because a non-GC framework would not actively inform
    the GC system of its use of pointers, leading to random crashes and
    what not as the GC system reclaimed memory that was actively in use.

    In order for Leopard's GC system to function properly, the compiler
    must be aware of all pointers that have been allocated by the GC
    system so that it can wrap all uses of the pointer with the
    appropriate GC notification functions (objc_assign*).  Note that this
    is subtly different from the definitions and examples used in 'Garbage
    Collection Programming Guide'.  From 'Garbage Collection Programming
    Guide', 'Language Support':

    __strong
    Specifies a reference that is visible to (followed by) the garbage
    collector (see “How the Garbage Collector Works”).

    __strong modifies an instance variable or struct field declaration to
    inform the compiler to unconditionally issue a write-barrier to write
    to memory. __strong is implicitly part of any declaration of an
    Objective-C object reference type. You must use it explicitly if you
    need to use Core Foundation types, void *, or other non-object
    references (__strong modifies pointer assignments, not scalar
    assignments).

    ----

    This is a deceptive description.  /ANY/ pointer that holds a pointer
    to memory that MAY be allocated from the garbage collector must be
    marked __strong.  The compiler attempts to 'automagically' add
    __strong to certain types of pointer references, specifically 'id' and
    derivatives of 'id', namely class pointers (NSString *).

    Realistically, to properly add __strong to a pointer, you need to know
    if that allocation came from the garbage collector.  This information
    is essentially impossible to know a priori, so the only practical
    course of action is to defensively qualify all pointers as __strong.

    The consequence of using a pointer that is not properly qualified as
    __strong is that the GC system may determine that the allocation is no
    longer live and reclaim it, even if there is still a valid pointer out
    there.  Therefore, all pointer references which have the possibility
    of referencing an allocation from the garbage collection system must
    treat that pointer as __strong.  If any piece of code, at any level,
    at any point in time fails to satisfy this condition, you are in for a
    world of hurt.  The fact of the matter is that, for all practical
    purposes, it is impossible to guarantee this.  It is also trivial to
    get wrong, and the only indication that there's a problem is an
    occasional random error or crash.  Most of the time things will work,
    but every once in a while...  and these 'bugs' are virtually impossible
    to track down.  (In fact, this message is the result of having to
    track down Yet Another GC Problem where something, somewhere, did
    something wrong... maybe).

    I believe I have a succinct example that illustrates these issues:

    ----
    #import <Foundation/Foundation.h>

    @interface GCTest : NSObject {
      const char *title;
    }

    - (void)setTitle:(const char *)newTitle;
    - (const char *)title;

    @end

    @implementation GCTest

    - (void)setTitle:(const char *)newTitle
    {
      printf("Setting title.  Old title: %p, new title %p = '%s'\n",
    title, newTitle, newTitle);
      title = newTitle;
    }

    - (const char *)title
    {
      return title;
    }

    @end

    int main(int argc, char *argv[]) {
      GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;

      gcConstTitle = [[GCTest alloc] init];
      gcUTF8Title = [[GCTest alloc] init];

      [gcConstTitle setTitle:"Hello, world!"];
      [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];

      [[NSGarbageCollector defaultCollector] collectExhaustively];
      NSLog(@"GC test");

      printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    [gcConstTitle title]);
      printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    [gcUTF8Title title]);

      return(0);
    }

    ----

    [<johne...>] GC% gcc -framework Foundation -fobjc-gc-only gc.m -o gc
    [<johne...>] GC% ./gc
    Setting title.  Old title: 0x0, new title 0x1ed4 = 'Hello, world!'
    Setting title.  Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
    2008-02-03 19:07:58.911 gc[6191:807] GC test
    gcConstTitle title: 0x1ed4 = 'Hello, world!'
    gcUTF8Title  title: 0x1011860 = '??0?" '
    [<johne...>] GC%

    The problem is with the pointer returned by UTF8String.  From
    NSString.h:

    - (const char *)UTF8String;    // Convenience to return null-terminated UTF8 representation

    I strongly suspect the pointer that UTF8String returns is a pointer to
    an allocation from the garbage collector.  In fact, changing the
    'title' ivar to include __strong 'solves' the problem.

    And herein lies the reason why I believe Leopard's GC system is
    fundamentally and fatally flawed, and should in fact not be used at
    all.  There are several possible 'solutions' to this, but you'd better
    get it right or you're going to be stuck with race conditions of the
    most insidious nature imaginable.  Adding fuel to the fire, it's not
    clear what the 'right' solution is, or if there even is one.

    One might argue that, per the __strong documentation, the ivar
    requires the __strong type qualifier.  This is, at best, non-obvious,
    and considering that the documentation makes references to 'objects'
    almost exclusively, one can also argue that this pointer does not
    qualify.  But this points to a much bigger problem:  anyone who has
    used UTF8String and not qualified it as __strong has a race condition
    just waiting to happen.  This is also not a problem that can be fixed
    with a patch to Foundation in the next Mac OS X version: every program
    that has not qualified their use of UTF8String with __strong must be
    recompiled and re-released as there is nothing a shared library fix
    can do about this.  Add to this the fact that the published
    documentation is essentially silent on the topic and offers no
    guidance.  In fact, it's possible that adding __strong to the 'title'
    ivar is just an observable side effect of something else that seems to
    fix the problem.  I'm not sure what you'd do in that case because at
    that point just calling methods that return a pointer that you need
    becomes an exercise in luck and race conditions.

    This is but one example.  I don't think I need to point out that there
    are others.  A lot of others.  And most of them are non-obvious.  A
    consequence of all of this is that you must not pass pointers that may
    have been allocated by the garbage collector to any C function in a
    library.  For example,

    printf("String: %s\n", [@"Hello, world!" UTF8String]);

    passes a GC allocated pointer to a C library function, which almost
    assuredly does not have the proper write barrier logic in place to
    properly guard the pointer.  This example is innocent enough, and
    likely to work due to its short lived nature, but it's easy to think
    of examples where the pointer passed to a C function, say an SQLite3
    call, can cause no end of problems if that pointer happens to be
    reclaimed in the middle of the function call.

    This is the basis for my opinion that the 10.5 GC should not be used.
    In order to properly use the GC system, one must guarantee that all
    uses of GC allocated pointers have compiler assisted write barrier
    logic.  This is beyond non-trivial in practice as the passing of
    pointers is part of most function calls.  Those functions call other
    functions, and at some point that pointer is likely to pass through a
    C library function.

    Since Leopard's GC system places the burden of keeping the state of the
    GC system up to date on to the compiler, and in turn to every line of
    code that uses a pointer, this increases the possible locations for GC
    bugs to every single pointer using line of code.  There's a
    considerable amount of code that's been added to GCC to facilitate all
    of this, and bugs and code being what they are, there are bound to be
    bugs in there.  Code compiled with those bugs is frozen; the only way
    to fix it is to recompile. This means that anyone, /anyone/, who
    created GC enabled code needs to recompile their code in order to
    receive the bug fix.  This is an unalterable consequence of the
    decision to move the GC logic in to the compiler.
  • On Feb 3, 2008, at 17:57, John Engelhart wrote:

    > In order for Leopard's GC system to function properly, the compiler
    > must be aware of all pointers that have been allocated by the GC
    > system so that it can wrap all uses of the pointer with the
    > appropriate GC notification functions (objc_assign*).  Note that
    > this is subtly different from the definitions and examples used in
    > 'Garbage Collection Programming Guide'.  From 'Garbage Collection
    > Programming Guide', 'Language Support':
    >
    > __strong
    > Specifies a reference that is visible to (followed by) the garbage
    > collector (see “How the Garbage Collector Works”).
    >
    > __strong modifies an instance variable or struct field declaration
    > to inform the compiler to unconditionally issue a write-barrier to
    > write to memory. __strong is implicitly part of any declaration of
    > an Objective-C object reference type. You must use it explicitly if
    > you need to use Core Foundation types, void *, or other non-object
    > references (__strong modifies pointer assignments, not scalar
    > assignments).
    >
    > ----
    >
    > This is a deceptive description.  /ANY/ pointer that holds a pointer
    > to memory that MAY be allocated from the garbage collector must be
    > marked __strong.  The compiler attempts to 'automagically' add
    > __strong to certain types of pointer references, specifically 'id'
    > and derivatives of 'id', namely class pointers (NSString *).

    Interesting post. A couple of comments (that may just show I didn't
    absorb all of your argument):

    -- The extent of the deception seems to be that __strong is an
    attribute of the declaration, not of the pointer, but the
    documentation confuses the two: the compiler must be aware of all
    *variables used for* pointers that have been allocated by the GC
    system, and *a single variable cannot be used at different times for
    pointers to memory in different allocation systems*. If there was a
    fix to the documentation, would you still say GC is broken?

    -- It doesn't exactly surprise me that your sample code failed,
    because my reading of the documentation (the section you quoted) is
    that it told you the rules and you didn't follow them -- by not
    putting __strong on the char* ivar in the version that failed. The
    only issue is whether -[NSString UTF8String] returns memory allocated
    from a GC-controlled pool or not. The documentation for the method says:

    > Discussion
    >
    > The returned C string is automatically freed just as a returned
    > object would be released; you should copy the C string if it needs
    > to store it outside of the autorelease context in which the C string
    > is created.

    This sounds like it hasn't been updated for Leopard, but I'd sure read
    it as telling me the return string comes from the same place objects
    come from -- GC memory. And therefore any stored pointer to it would
    need a __strong or a __weak on its variable. Or, as stipulated, the
    result could be copied into malloc memory before being used. (The
    picture in the GC documentation suggests that malloc memory isn't GC-
    controlled, although I didn't find any text to state this absolutely.
    Maybe it's too obvious to say.)

    Or did I miss your point?
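
    The copy-into-malloc-memory route could look roughly like the
    following plain C sketch.  TitleHolder and its two functions are
    hypothetical stand-ins for the GCTest class in the example, and the
    sketch assumes, as the picture in the GC documentation suggests, that
    malloc memory is left alone by the collector.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical plain-C stand-in for the GCTest object. */
    typedef struct {
        char *title;    /* privately owned copy in malloc memory, or NULL */
    } TitleHolder;

    /* Store a private copy of new_title so the holder no longer depends
       on the lifetime of the caller's buffer.  Returns 0 on success, or
       -1 if allocation fails (the old title is kept in that case). */
    int title_holder_set(TitleHolder *h, const char *new_title)
    {
        assert(h != NULL);

        char *copy = NULL;
        if (new_title != NULL) {
            size_t len = strlen(new_title) + 1;   /* include the NUL */
            copy = malloc(len);
            if (copy == NULL)
                return -1;
            memcpy(copy, new_title, len);
        }
        free(h->title);                           /* free(NULL) is a no-op */
        h->title = copy;
        return 0;
    }

    /* Release the stored copy. */
    void title_holder_clear(TitleHolder *h)
    {
        assert(h != NULL);
        free(h->title);
        h->title = NULL;
    }
    ```

    In a garbage-collected Objective-C class the matching free() would
    presumably belong in -finalize, since nothing else releases malloc
    memory.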

    -- I too puzzled over the meaning of the stuff in the GC document
    about write barriers, which I agree raises more questions than it
    answers. In the end, I came to the conclusion that "write barriers" in
    this case were nothing to do with protecting the integrity or lifetime
    of any pointer, but rather a pragmatic hint to *this* GC
    implementation about how hard it might work at collection on any given
    occasion.

    If the documentation were changed to use the phrase "collection
    performance hints" instead of "write barriers", would you still say GC
    is broken?

    -- So I wonder if the problem is that GC is broken, or just annoyingly
    fussy and poorly documented with regard to non-object memory.

    I hope you'll post more analysis of the problem. I (with a sigh of
    relief) jumped ship from the SS Retainer, so it matters to me if I'm
    now sailing towards that world of hurt you foreshadow. :)
  • Hello John,

    On Feb 3, 2008, at 5:57 PM, John Engelhart wrote:

    > This is bound to be an inflammatory subject.  That is not my intent,
    > and I mean no disrespect to the programmers who worked on the GC
    > system.  I'm quite sure that adding GC to Cocoa is a non-trivial,
    > near impossible task, filled with trade-offs between "really bad"
    > and "even worse."  Also understand that a bit of 'authoritative'
    > documentation or instruction can outright negate some of the points
    > I make below as I have had only the publicly available documentation
    > and my (relatively brief 2-3 months) experiences with the 10.5 GC
    > system to form these opinions....

    Given that GC in Leopard is a 1.0 release I think it's to be expected
    that there will be bugs and room for improvement in both the
    implementation and the documentation. The best way to get the
    improvements that matter the most to you is to file targeted bug
    reports and enhancement requests.

    > Anyone who's used garbage collection with C is probably familiar
    > with the Boehm Garbage Collector.  [...]
    > From what I've pieced together, Leopard's GC system is nothing like
    > this.

    I think that the current "Architecture" section of the documentation
    gives a fairly good introduction and overview to what most developers
    need to know about the GC in Leopard. That said, I'm sure that you can
    think of things that you would like to see improved, and I encourage
    you to file enhancement requests wherever you do. The documentation
    department is very responsive and typically releases multiple updates
    to the documentation per year.

    As an example: The documentation currently basically only deals with
    Cocoa and CoreFoundation. It seems to me that you think that it's not
    clear enough on how to deal with, for example, a (char *), and I would
    agree. This would be a great enhancement request.

    > This would explain the need for 'dual mode' frameworks, and that an
    > application that uses GC must be linked to frameworks that are all
    > GC capable.  This is because a non-GC framework would not actively
    > inform the GC system of its use of pointers, leading to random
    > crashes and what not as the GC system reclaimed memory that was
    > actively in use.

    Manual memory management and automatic memory management are
    sufficiently different that you need to change your coding patterns to
    adapt to either mode. I don't think that you could point to any
    environment where you can run non-trivial code in either manual or
    automatic memory management without changes.

    Finalizers have a different purpose in life than dealloc methods. You
    can't automatically turn dealloc methods into finalizers, and you
    can't just skip over them either. You need to have different code
    paths depending on the mode you choose for your code.

    > And herein lies the reason why I believe Leopard's GC system is
    > fundamentally and fatally flawed, and should in fact not be used at
    > all.  There are several possible 'solutions' to this, but you'd
    > better get it right or you're going to be stuck with race conditions
    > of the most insidious nature imaginable.  Adding fuel to the fire,
    > it's not clear what the 'right' solution is, or if there even is one.

    Not to say that the engineers at Apple never make any mistakes, but do
    you really think that Apple would release something like this if what
    you say was true?  :-)

    > One might argue that, per the __strong documentation, the ivar
    > requires the __strong type qualifier.  This is, at best, non-
    > obvious, and considering that the documentation makes references to
    > 'objects' almost exclusively, one can also argue that this pointer
    > does not qualify.  But this points to a much bigger problem:  anyone
    > who has used UTF8String and not qualified it as __strong has a race
    > condition just waiting to happen.  This is also not a problem that
    > can be fixed with a patch to Foundation in the next Mac OS X
    > version: every program that has not qualified their use of
    > UTF8String with __strong must be recompiled and re-released as there
    > is nothing a shared library fix can do about this.

    As a generalization I think it's fair to say that Apple will only fix
    bugs in *their* code by updates to Mac OS X, any bugs in *your* code
    must be fixed by you. If you have made a GC bug in one of your
    shipping applications - because you lacked sufficient documentation to
    get it right, or for any other reason - it's quite likely that you
    will have to issue an update to fix that bug.

    > [...] A consequence of all of this is that you must not pass
    > pointers that may have been allocated by the garbage collector to
    > any C function in a library.  For example,
    >
    > printf("String: %s\n", [@"Hello, world!" UTF8String]);
    >
    > passes a GC allocated pointer to a C library function, which almost
    > assuredly does not have the proper write barrier logic in place to
    > properly guard the pointer.  This example is innocent enough, and
    > likely to work due to its short lived nature, but it's easy to think
    > of examples where the pointer passed to a C function, say an SQLite3
    > call, can cause no end of problems if that pointer happens to be
    > reclaimed in the middle of the function call.

    I think that you forget, and this might be at the heart of your
    worries, that any pointer found on the *stack* is treated as a root.
    Being a root it will not be collected, and neither will anything that
    it in turn references.

    Cheers,

    j o a r
  • [and to the list]

    You'll want to read this:

    <http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Introduction.html>

    On the first page, it says:

    "The initial root set of objects is comprised of global variables,
    stack variables, and objects with external references. These objects
    are never considered as garbage."

    The key point here is that "stack variables" include everything a
    thread is referencing in any of its stack frames or registers.

    > This information is essentially impossible to know a priori,

    No, it's not a quality of the dynamic allocation, it's a quality of
    the class's ivar.  Should title be scanned?  Yes?  No?

    Practically speaking, Objective-C objects are allocated from scanned
    memory, and the system frameworks Do The Right Thing.  If you call
    malloc() yourself, well, that's clearly not from GC's scanned memory.

    > so the only practical course of action is to defensively qualify all
    > pointers as __strong.

    That's probably wise for people learning to use GC.

    > I believe I have a succinct example that illustrates these issues:

    ... you did it wrong and it doesn't work.  Check.

    > but it's easy to think of examples where the pointer passed to a C
    > function, say an SQLite3 call, can cause no end of problems if that
    > pointer happens to be reclaimed in the middle of the function call.

    As it just so happens, I write a dual mode framework that does exactly
    that.

    > This is beyond non-trivial in practice as the passing of pointers is
    > part of most function calls.

    Arguments and return values are always live as they are on the stack
    (or a register).  If one of those functions wants to store the pointer
    in its own memory, then you need to keep that pointer live for however
    long you expect the C library to reference it.  Which isn't any
    different than life before GC.

    - Ben
  • On 4 Feb 2008, at 01:57, John Engelhart wrote:

    > I've had several reservations about Leopard's GC system since I
    > started working with it.  There is very little documentation on
    > Leopard's GC system, so the following has been pieced together by
    > inference and observations of how the garbage collection system
    > seems to work.  My first concern was with the use of "compiler
    > assisted write barriers".  The current public documentation is
    > extremely vague as to what a 'write barrier' is,

    [snip]

    > From what I can tell, the term 'write barrier' as it is used by the
    > GC documentation has absolutely nothing to do with this traditional
    > meaning of the term.

    The meaning of "write barrier" in this context is the traditional one
    in the world of garbage collection, which has been around a lot longer
    than the other meaning.  It's certainly traditional; e.g. see

      <ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps>

    or the excellent book "Garbage Collection: Algorithms for Automatic
    Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN
    0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).

    The GC docs actually explain what the write barrier is used for here:

    <http://developer.apple.com/documentation/Cocoa/Conceptual/GarbageCollection/Articles/gcArchitecture.html#//apple_ref/doc/uid/TP40002451-SW4>

    > Anyone who's used garbage collection with C is probably familiar
    > with the Boehm Garbage Collector.  I believe that the Boehm GC
    > library embodies what most people would expect of a garbage
    > collection system:  The programmer is freed from having to worry
    > about memory allocations

    [snip]

    > It makes no particular demands of the programmer or compiler, in
    > fact it can be used as a drop in replacement for malloc() and
    > free(), requiring no changes.
    >
    > From what I've pieced together, Leopard's GC system is nothing like
    > this.  While the Boehm GC system detects liveness passively by
    > scanning memory and looking for and tracing pointers, Leopard's GC
    > system does no scanning and requires /active/ notification of
    > changes to the heap.  This, I believe, is what a 'write-barrier'
    > actually is: it is a function call to the GC system so that it can
    > update its internal state as to what memory is live.  It relies, I
    > suspect exclusively, on these function calls to track memory
    > allocations.

    The Boehm GC and the Leopard Cocoa GC have very different design
    goals.  In the case of Boehm's collector, it's a requirement that the
    collector work without any assistance from the compiler; as a result,
    it has to use "conservative" techniques, which may in general result
    in leaks of arbitrary amounts of memory simply because of a stray
    value that *looks like* a pointer to something.  The lack of compiler
    assistance means that it's almost impossible to write a collector that
    will run in the background (the Boehm collector has to stop *all* the
    other threads in your program every so often if you run it in the
    background), and it's difficult to implement generational behaviour
    without relying on platform-specific features such as access to dirty
    bits from the system page table...  Even in that case, use of dirty
    bits is woefully inefficient compared to compiler co-operation, since
    a single dirty bit means you must re-scan an entire page of memory.
    The Boehm GC is very clever, certainly, but it has to cope with these
    limitations (and more besides).
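
    To make the "looks like a pointer" problem concrete, here is a small
    plain C sketch of the kind of test a conservative scanner has to rely
    on.  The function names are invented for illustration and this is not
    code from the Boehm collector; it only assumes that the heap is one
    contiguous range.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Conservative test: any word whose value falls inside the heap range
       is treated as if it were a pointer, whatever its real type. */
    bool looks_like_heap_pointer(uintptr_t word,
                                 const void *heap_start, size_t heap_size)
    {
        uintptr_t lo = (uintptr_t)heap_start;
        return word >= lo && word - lo < heap_size;
    }

    /* Count the words in a scanned region that would keep heap memory
       alive under the conservative test above. */
    size_t count_candidate_pointers(const uintptr_t *words, size_t nwords,
                                    const void *heap_start, size_t heap_size)
    {
        assert(words != NULL || nwords == 0);

        size_t hits = 0;
        for (size_t i = 0; i < nwords; i++)
            if (looks_like_heap_pointer(words[i], heap_start, heap_size))
                hits++;
        return hits;
    }
    ```

    The scanner cannot tell a genuine pointer from an integer that merely
    happens to hold the same bit pattern, so both count as references and
    both keep the block they "point" to alive.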

    Cocoa GC, on the other hand, is able to co-operate with the compiler,
    and that's what the write barriers are.  You have mis-interpreted
    their function; they exist to track inter-generational pointers, not
    to enable some sort of behind-the-scenes reference counting as I think
    you imply.  They may also be used to help the collector to obtain a
    consistent view of the mutator's objects in spite of running in the
    background...  I don't know whether the Leopard GC does that or not.

    (Incidentally, there is also a read barrier, which is used to help
    implement zeroing weak references; the compiler only generates that
    for variables marked __weak.)
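
    A textbook-style generational write barrier can be sketched in a few
    lines of plain C.  This illustrates the general technique only: Obj,
    its fields and write_barrier_store are hypothetical, and nothing here
    is meant to describe what the objc_assign* functions do internally.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical object header for a two-generation collector. */
    typedef struct Obj {
        bool old;            /* true once promoted to the old generation */
        bool remembered;     /* true if it may point at a young object */
        struct Obj *field;   /* a single pointer slot, for simplicity */
    } Obj;

    /* Every pointer store into an object goes through this function
       instead of a plain assignment.  The store itself always happens;
       the barrier only adds bookkeeping for old-to-young pointers. */
    void write_barrier_store(Obj *holder, Obj *value)
    {
        assert(holder != NULL);

        holder->field = value;
        if (holder->old && value != NULL && !value->old)
            holder->remembered = true;   /* record inter-generational pointer */
    }
    ```

    A young-generation collection can then treat every remembered old
    object as an extra root, instead of rescanning the whole old
    generation.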

    I think, perhaps, that it would be worth your while reading through
    the literature on garbage collection, as you might then understand the
    various trade-offs involved better.

    > In order for Leopard's GC system to function properly, the compiler
    > must be aware of all pointers that have been allocated by the GC
    > system so that it can wrap all uses of the pointer with the
    > appropriate GC notification functions (objc_assign*).

    Yep.

    [snip]

    > Realistically, to properly add __strong to a pointer, you need to
    > know if that allocation came from the garbage collector.  This
    > information is essentially impossible to know a priori, so the only
    > practical course of action is to defensively qualify all pointers as
    > __strong.

    No.  Cocoa GC mostly deals with objects (which may include Core
    Foundation objects).  That's why the default assumption, which is that
    object pointers are strong, is enough for most situations.

    That only changes if you have pointers of non-object types that happen
    to point to things that were allocated with the GC, *and only then* if
    they are stored in locations that are not scanned by default.  This is
    an unusual situation, since few methods return things that are
    allocated by GC and that are not objects.  -UTF8String is probably the
    most common example, but since you tend not to store the result of
    that method, there would rarely---if ever---be a problem.

    > The consequence of using a pointer that is not properly qualified as
    > __strong is that the GC system may determine that the allocation is
    > no longer live and reclaim it, even if there is still a valid
    > pointer out there.

    Only if there is no copy of the pointer in any of the locations that
    are scanned by default (e.g. the stack, in registers, in global
    variables).

    > It is also trivial to get wrong, and the only indication that
    > there's a problem is an occasional random error or crash.

    In most cases, because GC'd things are objects, it's trivial to get
    *right*.

    It's only in special cases, where you're using C pointer types to
    point to GC'd memory, that you need worry about this kind of thing.

    > I believe I have a succinct example that illustrates these issues:

    [snip]

    > I strongly suspect the pointer that UTF8String returns is a pointer
    > to an allocation from the garbage collector.  In fact, changing
    > the 'title' ivar to include __strong 'solves' the problem.

    Yes, that's your bug.  It doesn't just 'solve' the problem, the lack
    of __strong here *is* the problem, but only because this is an ivar
    and not e.g. a function argument or a stack-based variable.

    > But this points to a much bigger problem:  anyone who has used
    > UTF8String and not qualified it as __strong has a race condition
    > just waiting to happen.

    No, because stack variables and registers are included in the set of
    GC roots.

    > This is but one example.  I don't think I need to point out that
    > there are others.  A lot of others.  And most of them are non-
    > obvious.  A consequence of all of this is that you must not pass
    > pointers that may have been allocated by the garbage collector to
    > any C function in a library.  For example,
    >
    > printf("String: %s\n", [@"Hello, world!" UTF8String]);

    That code is fine.  The reference is on the stack (or, before that, in
    the register that holds the return value of -UTF8String).  It will be
    followed, so the memory won't be released until the printf() function
    has finished with it.

    > passes a GC allocated pointer to a C library function, which almost
    > assuredly does not have the proper write barrier logic in place to
    > properly guard the pointer.

    The write barrier is nothing to do with it.  The write barrier is for
    inter-generational pointers, and possibly also to help the collector
    to scan in the background safely.
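
    As a rough illustration of what a generational write barrier does, here
    is a toy sketch in plain C.  It is not libauto's implementation; every
    name in it (toy_obj, toy_store, toy_remembered) is hypothetical.  The
    barrier is simply extra bookkeeping attached to a pointer store, so that
    a later collection of the young generation can find pointers held by
    old objects without rescanning the whole old generation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: every object belongs to a generation and has one pointer slot. */
enum toy_gen { TOY_YOUNG, TOY_OLD };

struct toy_obj {
    enum toy_gen    gen;
    struct toy_obj *slot;
};

/* Remembered set: old objects that have been given a pointer to a young one. */
#define TOY_REMEMBERED_MAX 16
struct toy_obj *toy_remembered[TOY_REMEMBERED_MAX];
size_t          toy_remembered_count;

/* Returns 1 if obj is already in the remembered set, 0 otherwise. */
int toy_is_remembered(const struct toy_obj *obj) {
    for (size_t i = 0; i < toy_remembered_count; i++)
        if (toy_remembered[i] == obj) return 1;
    return 0;
}

/* The "write barrier": perform the store, then record the owner if the
 * store created an old-to-young (inter-generational) pointer. */
void toy_store(struct toy_obj *owner, struct toy_obj *value) {
    assert(owner != NULL);
    owner->slot = value;
    if (owner->gen == TOY_OLD && value != NULL && value->gen == TOY_YOUNG
        && !toy_is_remembered(owner)) {
        assert(toy_remembered_count < TOY_REMEMBERED_MAX);
        toy_remembered[toy_remembered_count++] = owner;
    }
}
```

    Note that the barrier only records where to look later; it plays no
    part in deciding whether a pointer passed to printf() is live.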

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Feb 3, 2008, at 5:57 PM, John Engelhart wrote:
    > int main(int argc, char *argv[]) {
    > GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
    >
    > gcConstTitle = [[GCTest alloc] init];
    > gcUTF8Title = [[GCTest alloc] init];
    >
    > [gcConstTitle setTitle:"Hello, world!"];
    > [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
    >
    > [[NSGarbageCollector defaultCollector] collectExhaustively];
    > NSLog(@"GC test");
    >
    > printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    > [gcConstTitle title]);
    > printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    > [gcUTF8Title title]);
    >
    > return(0);
    > }

    > The problem is with the pointer returned by UTF8String.  From
    > NSString.h:
    >
    > - (const char *)UTF8String;    // Convenience to return null-terminated UTF8 representation
    >
    > I strongly suspect the pointer that UTF8String returns is a pointer
    > to an allocation from the garbage collector.  In fact, by changing
    > the 'title' ivar to include __strong 'solves' the problem.

    I'd just like to comment quickly that in Tiger and earlier OS X
    releases without GC, your code here would be just as broken. The
    -UTF8String method has always returned "autoreleased memory", that is,
    a pointer to a UTF8 string that is only being held by an autoreleased
    object. So once the containing autorelease pool was deallocated, the
    UTF8 string representation would be too, and you'd have a bad pointer
    to unallocated memory in your object.

    In fact, unlike in the GC world, there was no way to actually keep
    that memory around longer. There was no way to tell it that you wanted
    a __strong reference. If you wanted to keep a pointer to a UTF8
    representation you had to do your own malloc() and make your own copy.
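
    A minimal sketch of that "make your own copy" approach in plain C,
    assuming a hypothetical struct toy_titled standing in for the object
    and a hypothetical setter toy_set_title (neither comes from the
    original example):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an object that owns its title string. */
struct toy_titled { char *title; };

/* Replace the title with a private malloc()'d copy of new_title.
 * Returns 0 on success, or -1 if allocation fails (old title is kept). */
int toy_set_title(struct toy_titled *obj, const char *new_title) {
    assert(obj != NULL);
    char *copy = NULL;
    if (new_title != NULL) {
        size_t len = strlen(new_title) + 1;   /* include the terminator */
        copy = malloc(len);
        if (copy == NULL) return -1;
        memcpy(copy, new_title, len);
    }
    free(obj->title);                         /* free(NULL) is a no-op */
    obj->title = copy;
    return 0;
}
```

    Because the object owns the copy, the string stays valid however
    short-lived the buffer passed in was; the cost is that the owner must
    free() it when the object goes away.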

    And that is why this isn't a problem. The new GC implementation
    doesn't make anything a bug that was legal before - it was just as
    much of a crasher before GC as it is after GC.

    Hope this helps,
    - Greg
  • On 2/3/08 8:57 PM, John Engelhart said:

    > This is the basis for my opinion that the 10.5 GC should not be used.

    I've been working on a GC-only app for several months now.  In practice,
    I haven't run into major problems.

    But I would say that the dev tools and docs are not quite 'GC ready'.
    Lots of docs talk about autoreleasing and make no mention of GC.  The
    (great) Debugging Magic technote has not a single hint for GC
    debugging.  MallocDebug does not work with GC apps (you can still have
    leaks in GC apps if your app also uses C/C++ code).  Rosetta does not
    support GC code (could be nice for running unit tests).  OpenGL
    Profiler.app does not work with GC apps.  Interface Builder does not
    support GC plugins.  Hopefully the tools will catch up soon.

    --
    ____________________________________________________________
    Sean McBride, B. Eng                <sean...>
    Rogue Research                        www.rogue-research.com
    Mac Software Developer              Montréal, Québec, Canada
  • On Feb 4, 2008, at 8:11 AM, Alastair Houghton wrote:

    > or the excellent book "Garbage Collection: Algorithms for Automatic
    > Dynamic Memory Management" by Jones and Lins (Wiley 1997, ISBN
    > 0-471-94148-4 <http://www.amazon.co.uk/dp/0471941484>).

    You must have a later edition than mine, as the inside cover of my
    copy says '96.

    > >
    >
    >> It makes no particular demands of the programmer or compiler, in
    >> fact it can be used as a drop in replacement for malloc() and
    >> free(), requiring no changes.
    >>
    >> From what I've pieced together, Leopard's GC system is nothing like
    >> this.  While the Boehm GC system detects liveness passively by
    >> scanning memory and looking for and tracing pointers, Leopard's GC
    >> system does no scanning and requires /active/ notification of
    >> changes to the heap.  This, I believe, is what a 'write-barrier'
    >> actually is: it is a function call to the GC system so that it can
    >> update its internal state as to what memory is live.  It relies, I
    >> suspect exclusively, on these function calls to track memory
    >> allocations.
    >
    > The Boehm GC and the Leopard Cocoa GC have very different design
    > goals.  In the case of Boehm's collector, it's a requirement that
    > the collector work without any assistance from the compiler; as a
    > result, it has to use "conservative" techniques, which may in
    > general result in leaks of arbitrary amounts of memory simply
    > because of a stray value that *looks like* a pointer to something.
    > The lack of compiler assistance means that it's almost impossible to
    > write a collector that will run in the background (the Boehm
    > collector has to stop *all* the other threads in your program every
    > so often if you run it in the background), and it's difficult to
    > implement generational behaviour without relying on platform-
    > specific features such as access to dirty bits from the system page
    > table...  Even in that case, use of dirty bits is woefully
    > inefficient compared to compiler co-operation, since a single dirty
    > bit means you must re-scan an entire page of memory.  The Boehm GC
    > is very clever, certainly, but it has to cope with these limitations
    > (and more besides).
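
    To make the "stray value that looks like a pointer" point concrete,
    here is a toy sketch of a conservative liveness test in plain C.  It
    is not the Boehm collector's code; toy_block and both function names
    are hypothetical.  The scan has no type information, so any word
    whose value falls inside a heap block counts as a reference to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A heap block as the toy collector sees it: a half-open address range. */
struct toy_block {
    uintptr_t start;
    size_t    size;
};

/* Conservative test: does word fall anywhere inside the block? */
int toy_looks_like_pointer(uintptr_t word, const struct toy_block *b) {
    return word >= b->start && word - b->start < b->size;
}

/* Scan count words of memory.  Any word that merely looks like a pointer
 * into the block keeps it alive, whether it is a pointer or an integer. */
int toy_block_is_reachable(const uintptr_t *words, size_t count,
                           const struct toy_block *b) {
    assert(words != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (toy_looks_like_pointer(words[i], b)) return 1;
    return 0;
}
```

    An integer that happens to hold a value such as 0x1010 would keep a
    block at 0x1000 alive in this model, which is the leak described in
    the quoted paragraph.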

    I've always enjoyed using the Boehm garbage collector.  I've never had
    a problem with its speed, and off the top of my head I can't think of
    any issue where I had to work around the collector.  It always just
    'works', and not only works but has caught several pointer misuse or
    off-by-one errors as well.  It's a joy to use, you literally just
    allocate and forget.  I had such high hopes for Cocoa's GC system
    because once you're spoiled by GC, it's hard to go back.

    Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    system has not been nearly as pleasant.  It has been outright
    frustrating, and it's reached the point where I consider the system
    untenable.

    >
    > Cocoa GC, on the other hand, is able to co-operate with the
    > compiler, and that's what the write barriers are.  You have mis-
    > interpreted their function; they exist to track inter-generational
    > pointers, not to enable some sort of behind-the-scenes reference
    > counting as I think you imply.  They may also be used to help the
    > collector to obtain a consistent view of the mutator's objects in
    > spite of running in the background...  I don't know whether the
    > Leopard GC does that or not.

    As I've stated, my opinions are formed from the publicly available
    documentation and my (hair pulling) experiences over the last few
    months.  The long and the short of it is that Leopard's GC system behaves
    unlike any other GC system I've used.

    >
    > (Incidentally, there is also a read barrier, which is used to help
    > implement zeroing weak references; the compiler only generates that
    > for variables marked __weak.)
    >
    > I think, perhaps, that it would be worth your while reading through
    > the literature on garbage collection, as you might then understand
    > the various trade-offs involved better.
    >
    >> In order for Leopard's GC system to function properly, the compiler
    >> must be aware of all pointers that have been allocated by the GC
    >> system so that it can wrap all uses of the pointer with the
    >> appropriate GC notification functions (objc_assign*).
    >
    > Yep.
    >
    > [snip]
    >
    >> Realistically, to properly add __strong to a pointer, you need to
    >> know if that allocation came from the garbage collector.  This
    >> information is essentially impossible to know a priori, so the only
    >> practical course of action is to defensively qualify all pointers
    >> as __strong.
    >
    > No.  Cocoa GC mostly deals with objects (which may include Core
    > Foundation objects).  That's why the default assumption, which is
    > that object pointers are strong, is enough for most situations.

    There is nothing special about objects.  I believe that this doesn't
    quite hold true for the ObjC 2.0 64 bit API, but it still holds true
    for the 32 bit API:  Objects are nothing but pointers to structs.
    Your typical
    @interface MYObject : NSObject {
      void *ptr;
    }

    essentially becomes:

    typedef struct {
    #include "NSObject_struct_bits";
      void *ptr;
    } MYObject;

    The following is a working example of the key points of Objective-C,
    and for all practical purposes, this is what your Objective-C object
    gets turned into.  It's literally possible to hack up a small perl
    script that gets you 60-70% of the way to a full blown "Objective-C
    Compiler":

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { const char *title; } MYObject;
    void       *alloc(MYObject *self, const char *_cmd) { return(calloc(1, sizeof(MYObject))); }
    void       *init(MYObject *self, const char *_cmd) { return(self); }
    void        setTitle(MYObject *self, const char *_cmd, const char *newTitle) { self->title = newTitle; }
    const char *title(MYObject *self, const char *_cmd) { return(self->title); }

    int main(int argc, char *argv[]) {
      MYObject *testObject = NULL;

      testObject = init(alloc(NULL, "alloc"), "init");
      setTitle(testObject, "setTitle:", "Object Title");
      const char *theTitle = title(testObject, "title");
      printf("Object: %p title: %p, '%s'\n", testObject, theTitle,
    theTitle);

      return(0);
    }

    [<johne...>] /tmp% gcc -o obj obj.c
    [<johne...>] /tmp% ./obj
    Object: 0x100120 title: 0x1fcc, 'Object Title'

    You can even have the compiler create your object's ivars with the
    @defs directive.  In fact, it's possible and perfectly legal to create
    a C function inside your @implementation with a prototype like
    myCFunction(MYObject *self, SEL _cmd) and access your object's ivars
    with "self->ivar" inside the function.

    >
    > That only changes if you have pointers of non-object types that
    > happen to point to things that were allocated with the GC, *and only
    > then* if they are stored in locations that are not scanned by
    > default.  This is an unusual situation, since few methods return
    > things that are allocated by GC and that are not objects.  -
    > UTF8String is probably the most common example, but since you tend
    > not to store the result of that method, there would rarely---if
    > ever---be a problem.

    As the above example illustrates, the entire issue of "object" vs.
    "non-object" is a red herring.  There is nothing special about
    objects, nor anything special about ivars.  The fact that the GCC
    compiler attempts to 'automagically' detect which pointers are
    __strong behind your back only obscures the issues at hand.  Because
    of this automatic promotion, it's easy to fall into the trap of
    thinking that objects are somehow magical.  Nothing could be further
    from the truth.

    The fact of the matter is that the DEFAULT behavior for pointers in
    Leopard's GC system is that they are ignored and do not point to live
    data.  I challenge anyone to find another GC system in which the
    default behavior for a pointer is to be ignored, and what it points to
    to NOT be considered part of the live set.  However, through some
    unspecified logic, SOME pointers are elevated to 'Points to live GC
    data'.

    I mean, seriously, can anyone conjure up a compelling reason why the
    default behavior of a pointer is that it does not point to live data?
    I can think of some infrequent special cases when I would want to turn
    it off, but off by default unless you qualify it with __strong?

    >
    >> The consequence of using a pointer that is not properly qualified
    >> as __strong is that the GC system may determine that the allocation
    >> is no longer live and reclaim it, even if there is still a valid
    >> pointer out there.
    >
    > Only if there is no copy of the pointer in any of the locations that
    > are scanned by default (e.g. the stack, in registers, in global
    > variables).

    I'd put instantiated objects on that list.

    >
    >> It is also trivial to get wrong, and the only indication that
    >> there's a problem is an occasional random error or crash.
    >
    > In most cases, because GC'd things are objects, it's trivial to get
    > *right*.
    >
    > It's only in special cases, where you're using C pointer types to
    > point to GC'd memory, that you need worry about this kind of thing.

    Your choice of words makes me suspect that you consider an object
    pointer, such as NSObject *, and a C pointer, ala void * or char *, to
    be two distinctly different things.  I believe I have shown that this
    is not the case, and one can, in fact, consider them all to be 'void
    *' pointers for the purposes of reasoning.

    When considered from the 'void *' perspective, I believe your argument
    highlights my point:  Pointers are pointers, and knowing which ones to
    treat as 'special' is non-trivial and easy to get wrong.

    >
    >> I believe I have a succinct example that illustrates these issues:
    >
    > [snip]
    >
    >> I strongly suspect the pointer that UTF8String returns is a pointer
    >> to an allocation from the garbage collector.  In fact, by changing
    >> the 'title' ivar to include __strong 'solves' the problem.
    >
    > Yes, that's your bug.  It doesn't just 'solve' the problem, the lack
    > of __strong here *is* the problem, but only because this is an ivar
    > and not e.g. a function argument or a stack-based variable.

    I don't disagree with you, 'technically' it's my bug, but this is my
    point.  Take a step back for a second and consider what you're saying:

    The garbage collector has reclaimed the allocation that contains the
    text for the string.  From an object that the GC system considers to
    be live.  That contains a pointer, that isn't 'hidden' by xor or what
    not from the GC system, it's a normal pointer to the allocation.  That
    the garbage collector just recycled because there are no references to
    keep it live.

    Can you name another garbage collector in which this is a
    /programmer's/ error, and not a bug in the GC system?  Reading over this
    I almost have to chuckle at the absurdity of it.  Yet when you get
    right down to it, this is what is being advocated.

    Now, trying to find 'bugs' such as this in running code is every
    programmer's worst nightmare.  The bug manifests itself only when the
    GC system reclaims it, which is essentially at some completely random,
    non-deterministic point in time in the future.  There is essentially
    nothing you can do to reproduce the bug.

    Then take a look again at the method prototype:

    - (const char *)UTF8String;    // Convenience to return null-terminated UTF8 representation

    Can you clearly and concisely articulate why the pointer returned from
    this particular method requires a __strong qualifier?  Remember, if
    you get it wrong you sign yourself up for many long nights of trying
    to track down some random, non-repeatable bug.

    Then consider the following:

    - (const char *)hexString;

    - (const char *)hexString
    {
      char *hexPtr = NULL;
      asprintf(&hexPtr, "0x%8.8x", myIvar);
      return(hexPtr);
    }

    Now what?  And whatever you do, DON'T cross the streams, or you risk
    total protonic reversal.

    >
    >> But this points to a much bigger problem:  anyone who has used
    >> UTF8String and not qualified it as __strong has a race condition
    >> just waiting to happen.
    >
    > No, because stack variables and registers are included in the set of
    > GC roots.

    Oddly, I don't find this reassuring.  In fact, I think it might make
    things worse.  Why?  This implies that the collector treats anything
    on the stack that looks like a pointer as a pointer, to be considered
    live and followed.  If this is the case, it has the effect of
    automatically promoting all pointers on the stack to __strong, and
    that's a problem.  It masks pointer declaration errors: pointers that
    are missing __strong will work as a side effect of this behavior
    instead of causing crashes.

    Besides which, not everything lives on the stack.

    >
    >> This is but one example.  I don't think I need to point out that
    >> there are others.  A lot of others.  And most of them are non-
    >> obvious.  A consequence of all of this is that you must not pass
    >> pointers that may have been allocated by the garbage collector to
    >> any C function in a library.  For example,
    >>
    >> printf("String: %s\n", [@"Hello, world!" UTF8String]);
    >
    > That code is fine.  The reference is on the stack (or, before that,
    > in the register that holds the return value of -UTF8String).  It
    > will be followed, so the memory won't be released until the printf()
    > function has finished with it.

    Again, you are correct in the sense that this is how the 10.5 GC
    system works.

    I still contend that this is a design flaw.  If the pointer were
    passed any other way, let's say via some mutex guarded inter-thread
    queue, it would require a __strong qualification.  This works due to a
    side effect of the GC system promoting ALL pointers it sees on the
    stack to __strong.  It's another "exception to the rule" to keep track
    of.

    This is why I believe Leopard's GC system is fundamentally flawed.
    There are a lot of these little rules and one-offs you need to keep
    track of, and hope that whoever wrote the code you're calling got it
    all right too.  The UTF8String example highlights just how easy it is
    to forget to add a __strong qualifier in front of a pointer, and that
    most of the time things will work just fine.  Until you hit that
    oddball corner case, and then at some point long after the initial
    event that caused the problem has passed, things crash.

    Those of you out there that think that these are non-issues, or "might
    happen rarely"... please, knock yourself out and have fun.  You too
    can add "Reflexively knows the default address for the stack of the
    first four threads and can unwind said stack frames by hand!" to your
    resume.  I'm just saying that I've been down this road and I'll gladly
    take tracking down multithreaded deadlocks and hunting that last
    missing release over this any day.

    >
    >> passes a GC allocated pointer to a C library function, which almost
    >> assuredly does not have the proper write barrier logic in place to
    >> properly guard the pointer.
    >
    > The write barrier is nothing to do with it.  The write barrier is
    > for inter-generational pointers, and possibly also to help the
    > collector to scan in the background safely.

    Well, you see...  You'd think that, wouldn't you?  But take a careful
    look at the example I posted.  Now, this is just a simple executable
    built in the shell for example purposes, so none of the AppKit stuff
    is fired up.  The GC docs also say that the GC system is demand driven
    under these conditions and that AppKit kicks the GC system to spawn a
    background collector thread (objc_startCollectorThread())... so, I
    think it's safe to say that things are 'quiet' and nothing fancy is
    going on in the background.... and there are only a few lines, so we can
    be reasonably sure there are no hidden background mutation events that
    a write barrier would normally catch... but follow the steps closely
    (from the original example posted):

      [gcConstTitle setTitle:"Hello, world!"];
      [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];

      [[NSGarbageCollector defaultCollector] collectExhaustively];
      NSLog(@"GC test");

      printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    [gcConstTitle title]);
      printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    [gcUTF8Title title]);

      return(0);
    }
    [<johne...>] GC% gcc -framework Foundation -fobjc-gc-only gc.m -o gc
    [<johne...>] GC% ./gc
    Setting title.  Old title: 0x0, new title 0x1ed4 = 'Hello, world!'
    Setting title.  Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
    2008-02-03 19:07:58.911 gc[6191:807] GC test
    gcConstTitle title: 0x1ed4 = 'Hello, world!'
    gcUTF8Title  title: 0x1011860 = '??0?" '
    [<johne...>] GC%

    We can be reasonably sure that the pointer to the UTF8String is
    'visible' before we call the collector, and there are no race
    conditions happening. There are no mutations going on that a write
    barrier needs to intercept. The gcUTF8Title object is clearly still 'live'
    according to the GC system.  The object clearly has the same pointer
    in its ivar before and after the collection, yet the GC system
    reclaimed the allocation that contained the string.

    This has got to be the only GC system in which an object is live and
    traceable from the roots, contains a pointer (that's not hidden or
    anything else fancy) to an allocation that contains the text of the
    string, and the GC system considers the string buffer to be dead, and
    it's the fault of the programmer.  Because the default behavior for
    pointers is not that they point to live data, but that they are
    ignored and not considered when tracing the heap.  Pointers don't
    point to things that are needed that often, do they?
  • On Feb 4, 2008, at 4:14 PM, John Engelhart wrote:

    > However, through some unspecified logic, SOME pointers are elevated
    > to 'Points to live GC data'.

    The logic isn't unspecified.

    If a variable is of an object type or is of a pointer type with the
    qualifier "__strong", it refers to something allocated -- and cleaned
    up -- by the collector.  Otherwise, it refers to something not
    allocated by the collector.

    > I mean, seriously, can anyone conjure up a compelling reason why the
    > default behavior of a pointer is that it does not point to live data?

    Yes.  Distinguishing between pointers to collector-allocated objects
    and non-collector-allocated objects ensures that the collector has far
    less work to do and can do the work it has more efficiently, because
    it can have more exact information about what portions of memory it
    needs to check for strong references to objects.

    It also goes a long way towards preventing false roots, which can help
    keep down the working set of an application that uses garbage
    collection.

    > Your choice of words makes me suspect that you consider an object
    > pointer, such as NSObject *, and a C pointer, ala void * or char *,
    > to be two distinctly different things.

    They can be; when running under GC, an object is assumed to be
    allocated by the collector.  An arbitrary buffer is not.  If you want
    to use an arbitrary "non-object pointer" type variable to refer to
    something that is allocated by the collector (e.g. a buffer returned
    from NSAllocateCollectable) you need to mark that variable with the
    __strong type qualifier so it gets the same treatment as an object
    type variable.

    > Pointers are pointers, and knowing which ones to treat as 'special'
    > is non-trivial and easy to get wrong.

    In Objective-C it is relatively straightforward to tell which pointer
    type variables to treat as objects.  You have pointers to objects
    which you can send arbitrary messages to, and pointers to things that
    aren't objects which you can't send arbitrary messages to.  The
    compiler itself has to be able to tell the difference between them to
    generate correct code, warnings, and errors.  The syntax therefore
    really isn't that ambiguous:  If there has been an @interface or
    @class declaration for it, it's an instance of a class, otherwise
    it's something else.

    In practice for many developers this isn't a significant issue.  I
    don't recall having seen any Cocoa code which used "char *" to store
    an object pointer, for example.  There are few places in idiomatic
    Cocoa where you might commonly use a "void *" to store an object
    pointer, and in those situations it's straightforward (and correct
    under non-GC as well) to introduce a CFRetain of the object before
    storing into the "void *" and a CFRelease of the object after the
    "void *" is no longer relevant.  (Under GC, CFRetain effectively adds
    an extra root while CFRelease removes one.)
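
    One way to picture the "extra root" remark is a counted table of
    externally rooted pointers, sketched below in plain C.  This is only
    a mental model, not how CFRetain/CFRelease are implemented; the
    table and the toy_retain, toy_release and toy_is_rooted names are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ROOTS_MAX 8

/* One entry per externally rooted pointer, with a retain count. */
struct toy_root { const void *ptr; unsigned count; };
struct toy_root toy_roots[TOY_ROOTS_MAX];

/* Add one external root for ptr (or bump its count if already rooted). */
void toy_retain(const void *ptr) {
    assert(ptr != NULL);
    struct toy_root *free_slot = NULL;
    for (size_t i = 0; i < TOY_ROOTS_MAX; i++) {
        if (toy_roots[i].count > 0 && toy_roots[i].ptr == ptr) {
            toy_roots[i].count++;
            return;
        }
        if (toy_roots[i].count == 0 && free_slot == NULL)
            free_slot = &toy_roots[i];
    }
    assert(free_slot != NULL);          /* table full: toy model gives up */
    if (free_slot == NULL) return;
    free_slot->ptr = ptr;
    free_slot->count = 1;
}

/* Remove one external root for ptr; unmatched releases are ignored. */
void toy_release(const void *ptr) {
    for (size_t i = 0; i < TOY_ROOTS_MAX; i++) {
        if (toy_roots[i].count > 0 && toy_roots[i].ptr == ptr) {
            toy_roots[i].count--;
            return;
        }
    }
}

/* A collector in this model would treat ptr as live while this is true. */
int toy_is_rooted(const void *ptr) {
    for (size_t i = 0; i < TOY_ROOTS_MAX; i++)
        if (toy_roots[i].count > 0 && toy_roots[i].ptr == ptr) return 1;
    return 0;
}
```

    In this model the pointer stays rooted between the first retain and
    the matching final release, regardless of where else it is stored.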

    One of the places I do this is in code that presents a sheet, because
    it returns control to the main run loop.  If I have to pass an object
    as the "(void *)context" parameter to the sheet invocation, I CFRetain
    it first.  Then in the sheet's did-dismiss selector (if it has one) or
    did-end selector (if there's no did-dismiss), I CFRelease the object.
    This ensures that "the sheet" acts as a root for the object in case
    it's transient and just being used to pass information around.

    > [gcConstTitle setTitle:"Hello, world!"];
    > [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world\xC2\xA1"] UTF8String]];
    >
    > [[NSGarbageCollector defaultCollector] collectExhaustively];
    > NSLog(@"GC test");
    >
    > printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    > [gcConstTitle title]);
    > printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    > [gcUTF8Title title]);

    If you build this example non-GC, and you replace

      [[NSGarbageCollector defaultCollector] collectExhaustively];

    with

      [pool drain];

    it would be just as incorrect.  The object backing that UTF-8 string
    is no longer live, therefore you can't trust that the UTF-8 string
    itself is valid.

    I appreciate that the design of the Objective-C garbage collector in
    Mac OS X 10.5 is not what you might be used to -- especially in that
    it *does* take advantage of the fact that you can treat variables
    typed as "object" differently from variables typed as "non-object
    pointer" in Objective-C -- but it really doesn't have the fundamental
    or design flaws that you assert it does.

      -- Chris
  • Hi John,

    I think I have an idea of how your expectations differ from the design
    goals of "auto". You're expecting the GC to take over memory
    management for the entire process, whereas libauto is designed to
    handle the memory management of only the Objective-C side of things
    (that is, objects). That is, libauto is _not_ a replacement for
    malloc()/free(), but only for -retain/-release (simplistically). I
    presume Apple chose this so that existing C libraries using
    malloc()/free() would work identically.

    For GC to work with ObjC objects whilst keeping vanilla malloc
    behavior unchanged, the system has to make some assumptions. This
    brings up issues where objects and void* meet. As the documentation
    says (and as you've discovered), the rules are:

    • For GC-allocated memory, void* references stored in globals and on
    the stack are considered strong.
    • For GC-allocated memory, void* references stored in objects are
    ignored by default.
    • For malloc-allocated memory, void* references stored anywhere are
    unchanged.

    (By void* I mean any non-id pointer, including const char *.)

    I don't know why the second rule was chosen (having the GC ignore
    undecorated void* ivars), and I haven't had enough experience with it
    to know if it's a good or bad thing. But you certainly need to be
    aware of it if you're used to a purely-GC environment, as auto
    provides a mixed malloc/GC environment.

    So there are new conventions for Leopard's GC. I'll attempt to start a
    list here, but don't take it as fact until there has been some peer
    review.

    • If you get a non-object pointer from somewhere (e.g. returned by
    -[NSString UTF8String]), you need to know if it has been allocated in
    the GC zone or the malloc zone. (The documentation "should" tell you
    which.)
    - If it's in the GC zone, you need to store the pointer in a strong
    instance variable if you want it to stick around for more than the
    current stack frame.
    - If it's in the malloc zone, store it anywhere, but it's your
    responsibility to free() it.

    (Anything else?)

    Jonathon Mah
    <me...>
  • On 5 Feb 2008, at 00:14, John Engelhart wrote:

    > I had such high hopes for Cocoa's GC system because once you're
    > spoiled by GC, it's hard to go back.
    >
    > Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    > system has not been nearly as pleasant.  It has been outright
    > frustrating, and it's reached the point where I consider the system
    > untenable.

    Honestly, this point has now been answered over and over.

    I think it comes down to the fact that you have failed to appreciate
    that Cocoa GC is designed for easy use with *OBJECTS*.  If you're
    using it with objects, it "just works".

    The only problem you've been able to demonstrate is in a situation
    where you are doing something that would never have worked under the
    autorelease model.  And you've been told that you need to add __strong
    in that case, *because* the compiler won't automatically pick that
    pointer out as being a pointer to GC'd memory.

    All of the rest of your post is worry about nothing.  You can't create
    any of the problems that you claimed to be concerned about (e.g.
    collections destroying temporary objects), and you had misinterpreted
    the write barriers as something that they weren't (if you have the
    Garbage Collection book, I don't know why you did that; it's explained
    quite clearly in there).

    As far as objects "not being special", they *are* special, in that the
    compiler generates layout information and method signatures for them.
    AFAIK the layout information (albeit in a slightly different format)
    is used by the garbage collector when scanning objects, which is
    another reason that you need to use __strong on instance variables if
    they point to non-object garbage collected memory.  FYI, the Boehm
    collector can also take advantage of layout information, so you could
    create the same issue with that collector too; the only reason that
    you don't often see it is that programmers are generally too lazy to
    specify the pointer layout of their memory blocks and just let the
    collector conservatively scan everything, which, of course, is slower
    and more error prone (i.e. greater likelihood of leaks).

    As for your example:

    > - (const char *)hexString
    > {
    > char *hexPtr = NULL;
    > asprintf(&hexPtr, "0x%8.8x", myIvar);
    > return(hexPtr);
    > }

    That method is just badly designed.  You shouldn't be returning
    malloc()'d memory from a method like that; at the very least you
    should name the method to indicate that you're doing something funny
    with memory ownership, e.g.

      - (void)getHexString:(const char **)mallocedResult
      {
        asprintf(mallocedResult, "0x%8.8x", myIvar);
      }

    but better yet, why not do

      - (NSString *)hexString
      {
        return [NSString stringWithFormat:@"0x%8.8x", myIvar];
      }

    or if you absolutely must return a const char * pointer,

      - (const char *)hexString
      {
        return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
      }

    which has the benefit of working exactly like -UTF8String.  If you
    *really* wanted, you could mess about with NSAllocateCollectable().
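
    As a rough illustration of the naming point above, here is a minimal
    plain C sketch of a function whose name signals that the caller owns
    the result.  The name copy_hex_string and the "copy" prefix are
    assumptions made for this example, not an established API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: the "copy" prefix signals that the caller owns
   the returned buffer and must free() it.  Returns NULL on failure. */
char *copy_hex_string(unsigned int value)
{
    /* "0x" + 8 hex digits + terminating NUL = 11 bytes. */
    size_t size = 11;
    char *buffer = malloc(size);

    if (buffer == NULL) {
        return NULL;
    }
    /* snprintf never writes more than `size` bytes, NUL included. */
    snprintf(buffer, size, "0x%8.8x", value);
    return buffer;
}
```

    A caller would pair every copy_hex_string() with exactly one free().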

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:

    > On 5 Feb 2008, at 00:14, John Engelhart wrote:
    >
    >> I had such high hopes for Cocoa's GC system because once you're
    >> spoiled by GC, it's hard to go back.
    >>
    >> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    >> system has not been nearly as pleasant.  It has been outright
    >> frustrating, and it's reached the point where I consider the system
    >> untenable.
    >
    > Honestly, this point has now been answered over and over.
    >
    > I think it comes down to the fact that you have failed to appreciate
    > that Cocoa GC is designed for easy use with *OBJECTS*.  If you're
    > using it with objects, it "just works".

    You misunderstand what Objective-C is, and how it works.  "Objects" is
    synonymous with "Structs".

    > As far as objects "not being special", they *are* special, in that
    > the compiler generates layout information and method signatures for
    > them.

    Read the above, "object" is synonymous with "struct".  The "layout" of
    an object is identical to the "layout" of a struct.  This point is so
    basic and fundamental to Objective-C and how things work at a low
    level that it seriously calls into question the accuracy of the rest
    of your conclusions.  If you do not understand fundamentals such
    as this, I do not see how you can possibly predict the effects and
    implications of pointers in Leopard's GC system.

    > AFAIK the layout information (albeit in a slightly different format)
    > is used by the garbage collector when scanning objects, which is
    > another reason that you need to use __strong on instance variables
    > if they point to non-object garbage collected memory.

    Again, this clearly indicates that you do not understand the
    fundamentals at hand.  Your reasoning is faulty (in fact, it's
    outright wrong).  You have, in essence, made my point: How these
    things work, and their subtle interactions, are CRITICAL to the
    correct operation of the GC system.  If you do not understand them,
    you /can not/ possibly use the GC system correctly.  You have,
    LITERALLY, just signed yourself up to tracking down a GC related bug
    in your code.

    You should review the relevant files from the GCC compiler,
    specifically gcc-5465/gcc/objc/objc-act.c from the 'gcc-5465.tar.gz'
    distribution.

    Thus spoke the documentation (documentation/Cocoa/Conceptual/
    GarbageCollection/Articles/gcAPI.html):

    __strong essentially modifies all levels of indirection of a pointer
    to use write-barriers, except when the final indirection produces a
    non-pointer l-value.

    For example:

    @interface GCTest : NSObject {
      __strong void *ptr;
    }
    - (void)setPtr:(void *)newPtr;
    @end

    @implementation GCTest

    - (void)setPtr:(void *)newPtr
    {
      ptr = newPtr;
    }
    @end

    __strong does not modify any layout information.  At compile time,
    when the compiler is working with a pointer that is qualified as
    __strong, and the location that contains the pointer is written to /
    updated / assigned (i.e., ptr = newPtr), the compiler rewrites the
    assignment to something like:

    - (void)setPtr:(void *)newPtr
    {
      objc_assignIvar(newPtr, self, offsetof(ptr));
    }
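
    The rewrite described above can be simulated in plain C.  This is
    only a sketch: struct gc_test, sketch_assign_ivar and the counter are
    hypothetical stand-ins invented for illustration, and the "barrier"
    here merely counts stores where a real collector would record the
    destination for later scanning.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object with one pointer ivar. */
struct gc_test {
    void *isa;   /* placeholder for the class pointer */
    void *ptr;   /* the field being assigned */
};

/* Counts barrier invocations so the effect is observable. */
static unsigned long barrier_store_count = 0;

/* Hypothetical write barrier: performs the store and records that it
   happened.  (Assumption: a real collector would note `dest` here.) */
void *sketch_assign_ivar(void *value, void *dest, ptrdiff_t offset)
{
    void **slot = (void **)((char *)dest + offset);

    *slot = value;
    barrier_store_count++;
    return value;
}

/* What `object->ptr = new_ptr;` would become after the rewrite. */
void gc_test_set_ptr(struct gc_test *object, void *new_ptr)
{
    sketch_assign_ivar(new_ptr, object, offsetof(struct gc_test, ptr));
}

/* Accessor so callers can observe how many barriered stores occurred. */
unsigned long sketch_barrier_count(void)
{
    return barrier_store_count;
}
```

    A store made without going through the helper would leave the
    counter untouched, which is the situation the rest of this message
    worries about.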

    >
    > As for your example:
    >
    >> - (const char *)hexString
    >> {
    >> char *hexPtr = NULL;
    >> asprintf(&hexPtr, "0x%8.8x", myIvar);
    >> return(hexPtr);
    >> }
    >

    [snip]

    > or if you absolutely must return a const char * pointer,
    >
    > - (const char *)hexString
    > {
    > return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
    > }
    >
    > which has the benefit of working exactly like -UTF8String.  If you
    > *really* wanted, you could mess about with NSAllocateCollectable().

    Oh... my...

    I could not possibly have asked for a better example of just how easy
    it is to get this wrong.

    Let those of you who are considering using Leopard's GC system use this
    as a warning of how dangerous its use is in practice.

    First, understand that my original example, the one in which the GC
    system snatched away the live allocation, did use
    NSAllocateCollectable.  Your example, using UTF8String, uses
    NSAllocateCollectable as well.  We can infer this by the behavior
    exhibited by the GC system when qualifying pointers that store the
    results from these methods as __strong, which prevents the collector
    from reclaiming the allocation.  Thus, by induction, the pointer to
    the buffer that contains the string's text must ultimately come from
    NSAllocateCollectable.

    Let's start with NSAllocateCollectable().  The prototype for
    NSAllocateCollectable, from NSZone.h, is as follows:

    FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger
    size, NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;

    NOTE WELL: the __strong qualifier for the pointer.

    Now, there is no formal grammar of Objective-C 2.0 published, but it
    is a reasonable assumption that "__strong" is in the same group of
    type qualifiers as "const" and "volatile".  This makes __strong
    subject to the same ANSI-C rules governing the use of type qualifiers,
    including promotion and assignment rules.  Returning to the method
    definition of UTF8String, we find the following in NSString.h:

    - (const char *)UTF8String;    // Convenience to return null-terminated
    UTF8 representation

    Since the pointer that UTF8String returns is provably from
    NSAllocateCollectable, this prototype has /DISCARDED/ the type
    qualifier of __strong.

    In (brief, simplified) summary, ANSI-C says that pointer assignments
    from a "lesser qualified" type can be made to a "more qualified" type,
    but not the other way around.  For example:

    char *cp;
    const char *ccp;

    It's perfectly legal to do the following:

    ccp = cp;

    The reverse, however, is not necessarily true:

    cp = ccp;

    This will result in a warning issued by the compiler that the type
    qualification has been discarded.  Now, modifying the example slightly
    (for brevity, I'm going to gloss over the details of why moving const
    past the pointer changes things):

    char *cp;
    char * const cpc;

    The following results in an error:

    cpc = cp;

    Specifically: "error: assignment of read-only variable 'cpc'", and the
    following is legal:

    cp = cpc;
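
    For reference, the const half of these rules can be sketched in
    plain C as follows.  The function names are made up for this
    example.  Adding a qualifier to the pointed-to type needs no cast,
    while removing one takes an explicit cast.

```c
#include <stddef.h>
#include <string.h>

/* Adding a qualifier to the pointed-to type is always allowed. */
size_t length_through_const_view(char *text)
{
    const char *read_only_view = text;   /* char * -> const char * */

    return strlen(read_only_view);
}

/* Dropping a qualifier requires an explicit cast.  Writing through the
   result is only safe if the underlying object is not actually const. */
char *discard_const(const char *text)
{
    return (char *)text;
}
```

    Compiling with -Wcast-qual should flag the cast in discard_const(),
    which is the sort of diagnostic one might expect for a discarded
    qualifier.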

    While there are certain qualifier specific semantics, I think we can
    all reasonably agree that dropping the "__strong" qualifier should be
    an error, as the consequence of discarding it is that the garbage
    collector will lose its ability to determine the liveness of that
    pointer, resulting in difficult to find bugs.

    QED, assigning a __strong qualified pointer to a non-__strong qualified
    pointer is an error, per ANSI-C standard type qualifier promotion rules.

    Naturally, one can override this behavior by typecasting the pointer
    assignment, but by doing so, you, the programmer, have explicitly told
    the compiler that the type qualification "does not apply in this
    case."  This kind of type casting should only be done if you
    understand all the consequences of the resulting type cast, and never
    just to silence the compiler.

    Referring back to my original example in my original post, in which I
    store the pointer from a call to UTF8String to a "const char *title"
    ivar pointer that the garbage collector later considers dead and
    recycles, it is provable that the "__strong" qualifier most certainly
    does apply to the pointer returned.

    Therefore, by ANSI-C language rules, my assignment of the pointer
    returned by UTF8String is legal, and the declaration of UTF8String
    implicitly states that "the GC problem will be taken care of because
    the programmer of this function has EXPLICITLY typecasted the __strong
    qualifier away."

    QED, my use of the UTF8String pointer is bug free and legal by ANSI-C
    rules.  That this later causes the GC system to reclaim the memory
    pointed to by this pointer is due to a bug in the prototype of
    UTF8String.  By type promotion rules, the prototype for UTF8String
    should be:

    - (const char * __strong)UTF8String;    // Convenience to return null-
    terminated UTF8 representation

    And, following ANSI-C rules, my assignment to "const char *title;"
    would result in a compiler error by the "no stronger qualified to
    lesser qualified" doctrine.  This is exactly as it should be, because
    as I have demonstrated, dropping that qualifier results in the GC
    system reclaiming live memory.

    Because of the design of Leopard's GC system, it is the PROGRAMMER'S
    responsibility to INFER when a pointer should be __strong qualified.
    Correctly inferring, a priori, which pointers require __strong
    qualification is an intractable problem, and certainly should not be
    left up to the programmer to "guess" correctly.  Getting this CRITICAL
    qualifier wrong will result in "race condition" like problems.
    Programs will appear to operate correctly under light load, but as
    they are pushed harder and harder, the conditions that expose these
    race conditions are guaranteed to occur, resulting in nearly
    impossible to find bugs and crashes.

    In practice, as I have found, this results in programs that operate
    problem free during development, but fail in unexplainable ways "in
    the real world", with all the symptoms of race condition induced bugs.

    And now I will show how Leopard's GC system is, in fact, FUNDAMENTALLY
    and fatally flawed.

    Since the design of Leopard's GC system has hoisted critical aspects of
    its functioning into the compiler, and in turn the code emitted by
    the compiler, this has the unfortunate effect of expanding the points
    at which GC bugs can pop up.  Contrast this with a dynamic shared
    library: When a bug is fixed in a dynamic shared library, programs
    using that shared library do not have to be recompiled to take
    advantage of those bug fixes.  Leopard's GC system is akin to using
    static libraries: if a bug is found in that library, every single
    program that links to that library must be recompiled.  The use of
    static libraries is considered such poor practice that Sun no longer
    supports their use, for obvious reasons.

    As shown, proper application of ANSI-C type qualifier propagation
    rules would have eliminated the storing of a __strong qualified
    pointer to a non-__strong qualified pointer by refusing to compile the
    program due to errors.  Instead, the compiler has allowed the code to
    compile, both warning and error free, even though the code will result
    in obvious problems later on.

    However, the problem does not lie just in the definition of the
    UTF8String method; the problem is with the compiler itself.  According
    to the ANSI-C standard, updating the definition of UTF8String to
    include __strong should result in an error being generated when its
    result is assigned to a "const char *" pointer declaration.

    No such error is generated.

    In fact, the compiler appears to be incapable of correctly following
    ANSI-C type qualifier rules regarding __strong.  For example:

    ---
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
      char *cp;
      const char *ccp;
      char * const cpc;
      __strong char *scp;
      char * __strong cps;

      ccp = cp;
      cp  = ccp;  // Line 12

      cpc = cp; // Line 14
      cp = cpc;

      scp = cp;
      cp  = scp;

      cps = cp;
      cp = cps;

      *cp  = 'X';
      *ccp = 'X'; // Line 24
      *scp = 'X';
    }

    [<johne...>] /tmp% gcc -fobjc-gc-only -c test.m
    test.m: In function 'main':
    test.m:12: warning: assignment discards qualifiers from pointer target
    type
    test.m:14: error: assignment of read-only variable 'cpc'
    test.m:24: error: assignment of read-only location
    ---

    Type qualifier rules are correctly followed for the 'const' qualifier,
    but not a single warning or error is given for what are clearly type
    qualification errors according to ANSI-C rules with regard to __strong
    qualified pointers.  It therefore follows that the compiler, by not
    throwing an error, is in fact generating buggy code since such
    assignments are not legal.

    As demonstrated, and as should be obvious, improperly discarding
    the __strong qualifier WILL result in the GC system reclaiming live
    memory.  In practice, this results in code that appears to run fine
    during development, but due to the "race condition" like nature of
    improper reclamation of live memory by the GC system, "real world"
    code tends to be buggy and unstable, and crashes for mysterious reasons.

    Since the compiler gives no warning, nor error, for discarding the
    __strong qualifier when it should, the compiler is emitting code that
    will cause GC errors.  It's impossible to gauge how frequent this is
    in practice, but I don't think anyone will disagree that the sheer
    volume of code involved in Cocoa effectively guarantees that these
    errors are present.

    Once one accepts that these errors are present, one must implicitly
    accept that when using Leopard's GC system, it is just a matter of time
    until the right conditions are present to cause accidental reclamation
    and its associated problems.  Therefore, by using Leopard's GC system,
    you are guaranteeing that you will have random and difficult to find,
    if not outright impossible, bugs.

    QED, Leopard's GC system is fundamentally and fatally flawed, and its
    use should be actively discouraged.  Due to the nature and severity of
    improperly dropping the __strong qualifier, all code generated by the
    current compiler (in GC mode) must be considered suspect, and
    pragmatically, discarded.

    While I'm sure you thought your retort was clever, you have in fact
    underlined my point that Leopard's GC system is, in fact, deceptively
    difficult, hard to master, and trivially easy to get wrong.

    Unfortunately, I was not very clear in my original message regarding
    these problems.  To those of you who have pointed out that, according
    to the docs, it is an error on my part that I did not qualify my
    pointer with __strong, please consider for a moment that I am writing
    this after several months of using Leopard's GC system.  You are all
    right.  In theory, by following the docs, this all works.

    This is not the point I was trying to make.

    In theory, this all works.  In practice, it does not, and it's easy to
    get wrong.  That is my point.

    As I believe I have shown in this message, the design of Leopard's GC
    system is such that it essentially guarantees you're going to get it
    wrong at some point.  It gleefully allows you to compile code which
    will create problems within your application, and does so without
    warnings or errors.  The fact that the compiler is violating ANSI-C
    rules in discarding the __strong qualifier guarantees that some code,
    somewhere, is going to get it wrong, and compile anyway.

    If one were to speculate as to what the behavior of the end result of
    all of this is, one would figure that "things will mostly work, except
    on the rare occasion when they don't".  I'm here to tell you that
    after four months of this, the "rare occasion" is much more frequent
    than you think.

    The consequence of such tight integration with the compiler for GC
    support dooms code to the same problems that haunt static library
    linked code.  There will be no bug fixes for your compiled code.  No
    later version of anything will correct the problems forever frozen in
    your binary.  As stated, some vendors no longer support linking to
    static libraries because of problems like this, for obvious reasons.

    I'm a huge fan of GC.  Those of you who have used the Boehm GC system
    know how easy it is to get spoiled by GC in C and say "Never again!"
    to manual memory management.  Those of you reading this will have to
    draw your own conclusions as to the validity of my claims.  My
    observations are not the result of some theoretical speculation from
    glancing through the GC docs; they are rooted in attempting to use it
    in the real world, on non-toy real problems, and I'm sharing with you
    the pain I've experienced.

    Those of you who've read the examples and think "There's no way I will
    ever slip up and miss a __strong qualification," then you're good to
    go.  Anyone else who thinks "Well, I might, but rarely.." should
    understand the full ramifications of what happens when you slip up.
    This class of bugs is orders of magnitude more difficult to find and
    fix than any retain/release bug you've ever had to deal with.  In
    fact, I'll go so far as to say that multi-threaded locking heisenbugs
    are easier; at least then you've got a pretty good idea of where the
    bug originates and can concentrate on finding the rare corner case
    that triggers it.

    As non-trivial, real world examples, consider the following:

    (From "Garbage Collection Programming Guide", Core Foundation section:)

    o NULL, kCFAllocatorDefault, and kCFAllocatorSystemDefault specify
    allocation from the garbage collection zone.
    By default, all Core Foundation objects are allocated in the garbage
    collection zone.

    - (NSString *)someMethod
    {
      NSUInteger finalStringLength = 1024; // Example only
      NSString *copyString = NULL;
      char * __strong restrict copyBuffer = NULL;

      copyBuffer = NSAllocateCollectable(finalStringLength, 0);
      /* Since this is just an example, the part that fills the contents
         of copyBuffer with text is omitted. */
      copyString = NSMakeCollectable(
          (id)CFStringCreateWithCStringNoCopy(kCFAllocatorDefault,
              copyBuffer, kCFStringEncodingUTF8, kCFAllocatorNull));
      /* kCFAllocatorNull = This allocator is useful as the
         bytesDeallocator in CFData or contentsDeallocator in CFString
         where the memory should not be freed.  So.. Don't call free() on
         our NSAllocateCollectable buffer, which is an error. */
      return(copyString);
    }

    You see where the bug is, right? (Those wondering 'Why CFString.. ?',
    it's much faster, and no dispatch overhead.  You get the same effect
    with initWithBytesNoCopy:length:encoding:freeWhenDone:)

    How about this:

    - (id *)copyOfObjectArray:(id *)originalObjects
                       length:(NSUInteger)length
    {
      id *newObjectArray = NULL;
      newObjectArray = NSAllocateCollectable(sizeof(id) * length,
                                             NSScannedOption);
      memcpy(newObjectArray, originalObjects, sizeof(id) * length);
      return(newObjectArray);
    }

    Does this contain a bug?  And if so, where in "Garbage Collection
    Programming Guide" or "NSGarbageCollector Class Reference" does it
    indicate that this is a bug?

    The "Garbage Collection Programming Guide" and "NSGarbageCollector
    Class Reference" documentation say this is "no problem", or at least
    don't say that you shouldn't do this?  Wonder why it's crashing
    randomly then.  The allocation has the NSScannedOption, so the garbage
    collector is obviously scanning the memory..  and it's dealing with
    objects... which are 'special'.. so?

    Hint: This is buggy as hell, which should be intuitively obvious to
    anyone who's read the GC documentation listed above.

    Actually, this is a pretty good test I think.  If, after reading the
    code snippet and two docs above on the GC system, you can't spot the
    bug, you probably shouldn't be using Leopard's GC system, because I
    guarantee you this is just one of many land mines just waiting for you
    to discover.

    If you really want to know the answer: nm -mg /usr/lib/libobjc.dylib |
    grep mem (The 'obvious' in the above hint is satirical, obviously)
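
    Assuming the hint above refers to bulk copies bypassing the
    per-store write barrier (an interpretation, not something the
    documentation quoted here states), the difference can be sketched in
    plain C.  Every name below is hypothetical, and the "barrier" only
    counts stores so that the effect is visible.

```c
#include <stddef.h>
#include <string.h>

/* Number of pointer slots written through the (simulated) barrier. */
static unsigned long barriered_slots = 0;

/* Hypothetical per-slot barrier: store the value, then record it. */
static void store_with_barrier(void **slot, void *value)
{
    *slot = value;
    barriered_slots++;
}

/* Plain bulk copy: the pointers land in dst, but no barrier runs, so
   a collector relying on barriers would never hear about them. */
void plain_pointer_copy(void **dst, void *const *src, size_t count)
{
    memcpy(dst, src, count * sizeof(void *));
}

/* Barrier-aware bulk copy: every slot goes through the barrier. */
void barrier_aware_pointer_copy(void **dst, void *const *src, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        store_with_barrier(&dst[i], src[i]);
    }
}

/* Accessor so callers can observe how many slots were barriered. */
unsigned long barriered_slot_count(void)
{
    return barriered_slots;
}
```

    Both functions leave identical bytes in dst.  Only the second one
    would be visible to a collector that depends on write barriers.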
  • On 6 Feb 2008, at 09:39, John Engelhart wrote:

    > On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
    >
    >> I think it comes down to the fact that you have failed to
    >> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
    >> If you're using it with objects, it "just works".
    >
    > You misunderstand what Objective C is, and how it works.

    No, I don't.  I know *exactly* how Objective-C works and what it is.
    Your repeated assertion to the contrary, like much of the rest of your
    posts on this topic, couldn't be more wrong.

    I don't think arguing further with you on-list will be productive, and
    more to the point it will annoy other list members.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Feb 4, 2008, at 8:21 PM, Chris Hanson wrote:

    > On Feb 4, 2008, at 4:14 PM, John Engelhart wrote:
    >
    >> However, through some unspecified logic, SOME pointers are elevated
    >> to 'Points to live GC data'.
    >
    > The logic isn't unspecified.
    >
    > If a variable is of an object type or is of a pointer type with the
    > qualifier "__strong", it refers to something allocated -- and
    > cleaned up -- by the collector.  Otherwise, it refers to something
    > not allocated by the collector.

    Exactly.  So, in my example, since the buffer is allocated by the
    collector (as demonstrated by the collector not reclaiming the
    allocation when it is qualified as __strong), and since we can be
    reasonably sure that allocation came from NSAllocateCollectable, which
    by its prototype returns void * __strong, my example is 100% bug free.

    Unless someone has explicitly typecast the __strong qualifier away
    somehow, but that would be a bug on their part because, as you've
    said, __strong refers to something that's allocated by the collector,
    which the pointer from UTF8String clearly is.

    Thankfully these rules are super simple and virtually impossible to
    get wrong.

    So, as you've said, __strong refers to allocations from the collector,
    pointers that can be traced back to the collector have a __strong
    qualifier, which no one will casually discard because it will lead to
    the collector losing visibility and creating difficult, hard to find
    bugs, my const char * pointer assignment is promoted to strong, or by
    ANSI-C type qualification rules causes a compiler error for improperly
    discarding the __strong qualifier on assignment, because the
    NSAllocateCollectable collector allocation that is returned by
    UTF8String is marked __strong, as its method prototype clearly shows:

    - (const char *)UTF8String;    // Convenience to return null-terminated
    UTF8 representation

    ... Oh... Well, at least the bug isn't mine, the bug is in the
    prototype for UTF8String, which has erroneously dropped the __strong
    qualifier to a NSAllocateCollectable collector allocation.

    Do I file a bug at this point?  Do I also file a bug with everyone
    who's used the compiler in GC mode and called this method and let them
    know that due to a buggy declaration in a header, they may experience
    issues with the pointer returned by UTF8String that may lead to data
    corruption because the compiler will not emit the write barriers
    required for proper operation of allocations from the collector?

    I'm really glad this is just a hypothetical problem and doesn't happen
    in practice, like this UTF8String example demonstrates.

    >
    >> I mean, seriously, can anyone conjure up a compelling reason why
    >> the default behavior of a pointer is that it does not point to live
    >> data?
    >
    > Yes.  Distinguishing between pointers to collector-allocated objects
    > and non-collector-allocated objects ensures that the collector has
    > far less work to do and can do the work it has more efficiently,
    > because it can have more exact information about what portions of
    > memory it needs to check for strong references to objects.

    While you are undeniably right, your justification is specious.

    Like security, the proper way to analyze the problem is not from the
    perspective of "if everything goes right", but "what are the
    consequences on failure."

    From this perspective, it becomes a question of "What are the
    consequences when someone forgets to add __strong?  How often do
    programmers make mistakes?  How likely is someone to make this
    mistake?  Are there robust, automated checks in place to make sure
    this doesn't happen?"

    The consequences of inadvertently forgetting to add a __strong
    qualifier when you should are likely to result in random data loss and
    mysterious crashes, all of which are nearly impossible to trace back
    to the root cause due to the fact that the problem occurs at some
    later point in time, far from the source, due to the nature of how the
    collector works.

    Do you see how ridiculous your justification is when stated from this
    perspective?  Can you think of a group of programmers who will take
    "Fast, but buggy and unstable, and impossible to debug" over "Slow,
    but rock solid"?

    Personally, I don't care how f'ing fast the thing is if it essentially
    guarantees instability.  I've got better things to do with my time
    than spend days trying to find the cause of some random,
    non-repeatable crash due to some allocation problem.  Ironically, the
    very thing GC is supposed to prevent.

    > It also goes a long way towards preventing false roots, which can
    > help keep down the working set of an application that uses garbage
    > collection.
    >
    >> Your choice of words makes me suspect that you consider an object
    >> pointer, such as NSObject *, and a C pointer, ala void * or char *,
    >> to be two distinctly different things.
    >
    > They can be; when running under GC, an object is assumed to be
    > allocated by the collector.  An arbitrary buffer is not.  If you
    > want to use an arbitrary "non-object pointer" type variable to refer
    > to something that is allocated by the collector (e.g. a buffer
    > returned from NSAllocateCollectable) you need to mark that variable
    > with the __strong type qualifier so it gets the same treatment as an
    > object type variable.

    By ANSI-C type qualification rules, anything that's returning a
    pointer from NSAllocateCollectable must be returning that pointer with
    the __strong qualification.  Considering that dropping that
    qualification is likely to result in buggy, unstable code,
    overriding that qualifier by typecasting it out should not be done
    lightly.  There's a reason why '-Wcast-qual' exists.

    >
    >> Pointers are pointers, and knowing which ones to treat as 'special'
    >> is non-trivial and easy to get wrong.
    >
    > In Objective-C it is relatively straightforward to tell which
    > pointer type variables to treat as objects.  You have pointers to
    > objects which you can send arbitrary messages to, and pointers to
    > things that aren't objects which you can't send arbitrary messages
    > to.  The compiler itself has to be able to tell the difference
    > between them to generate correct code, warnings, and errors.

    This is flat out wrong.

    [<johne...>] /tmp% cat gc_str.m
    #import <Foundation/Foundation.h>

    int main(int argc, char *argv[]) {
      NSString *aString = NULL;
      void *ptr = NULL;

      aString = [NSString stringWithString:@"Hello, world"];
      ptr = aString;

      NSLog(@"ptr '%@', description '%@'", ptr, [ptr description]);
    }
    [<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -o
    gc_str -g gc_str.m
    gc_str.m: In function 'main':
    gc_str.m:10: warning: invalid receiver type 'void *'
    [<johne...>] /tmp% ./gc_str
    2008-02-06 07:43:06.732 gc_str[17181:807] ptr 'Hello, world',
    description 'Hello, world'
    [<johne...>] /tmp%

    Having the class type allows some extra compile time diagnostics to
    take place, such as warning about messages sent to a class that does
    not declare them, but this is nothing but sugar for you and me.
    Awfully useful sugar to you and me, no doubt about it, but sugar
    nonetheless.

    As the code above shows, your assertion that "The compiler itself has
    to be able to tell the difference between them (pointers) to generate
    correct code" is obviously wrong.
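
    The same point can be made without the Objective-C runtime.  The
    plain C sketch below (all names hypothetical, and far simpler than
    real message dispatch) looks up a "method" through the first field
    of whatever the pointer refers to, so the static type of the pointer
    plays no part in finding the code to run.

```c
/* Hypothetical "class": a table with a single method. */
struct sketch_class {
    const char *(*describe)(const void *self);
};

/* Hypothetical "object": the class pointer comes first, like isa. */
struct sketch_string {
    const struct sketch_class *isa;
    const char *text;
};

/* The method implementation for sketch_string instances. */
static const char *sketch_string_describe(const void *self)
{
    return ((const struct sketch_string *)self)->text;
}

static const struct sketch_class sketch_string_class = {
    sketch_string_describe
};

void sketch_string_init(struct sketch_string *string, const char *text)
{
    string->isa = &sketch_string_class;
    string->text = text;
}

/* "Message send" through an untyped pointer: only the data behind the
   pointer is consulted, never the pointer's declared type. */
const char *sketch_send_describe(const void *receiver)
{
    const struct sketch_class *cls =
        *(const struct sketch_class *const *)receiver;

    return cls->describe(receiver);
}
```

    Passing a void * that refers to an initialized sketch_string to
    sketch_send_describe() returns the same text as a typed pointer
    would.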

    > The syntax therefore really isn't that ambiguous:  If there has
    > been an @interface or @class declaration for it, it's an instances
    > of a class, otherwise it's something else.

    No! This is utterly wrong.

    Look at the code above.  By your logic, it is incapable of working
    because the compiler doesn't know it's an object.

    This is not some minor point to be glossed over.  It forms a core part
    of this entire argument.

    If you do not understand the fact that "NSString *string" is equal to
    "void *string", you can not understand the complete ramifications of
    what the type qualifier __strong is doing (or not doing).

    Once you understand that "NSString *" and "void *" are the same thing,
    just a pointer, you are forced to ask "Where is the __strong qualifier
    magically coming from?"

    And before you can answer that, you are forced to ask "Wait wait wait,
    why did the compiler allow an unqualified void * pointer to be
    assigned a more qualified void * pointer, against ANSI-C type
    qualifier rules?"

    This will be quickly followed by ".. and by silently dropping the
    __strong qualifier, that means pointer assignments which are critical
    to the proper operation of the collector are getting silently
    discarded left and right."

    You'll know it when you get it because this will be followed by a
    quick mental estimation of just how often this error is occurring in
    the entire code base, along with the sensation of the floor dropping
    out from underneath you while simultaneously feeling what can only be
    described as a baseball bat being cracked over the back of your head.
    In that slow motion, car crash time dilation effect, you'll notice
    yourself slowly uttering the words:

    "Oh... Shit..."

    >
    > In practice for many developers this isn't a significant issue.  I
    > don't recall having seen any Cocoa code which used "char *" to store
    > an object pointer, for example.  There are few places in idiomatic
    > Cocoa where you might commonly use a "void *" to store an object
    > pointer, and in those situations it's straightforward (and correct
    > under non-GC as well) to introduce a CFRetain of the object before
    > storing into the "void *" and a CFRelease of the object after the
    > "void *" is no longer relevant.  (Under GC, CFRetain effectively
    > adds an extra root while CFRelease removes one.)
    >
    > One of the places I do this is in code that presents a sheet,
    > because it returns control to the main run loop.  If I have to pass
    > an object as the "(void *)context" parameter to the sheet
    > invocation, I CFRetain it first.  Then in the sheet's did-dismiss
    > selector (if it has one) or did-end selector (if there's no
    > did-dismiss), I CFRelease the object.  This ensures that "the sheet"
    > acts as a root for the object in case it's transient and just being
    > used to pass information around.

    No, what you're obviously doing is compensating for a buggy and flawed
    GC system which is randomly reclaiming live data, and you're hacking
    around the root problem.  one. pointer. at. a. time.  From my
    experience, this comes only after hours, usually days, of debugging,
    trying to find out why every once in a while displaying a sheet causes
    a crash.

    Your example pretty much epitomizes my experience with Cocoa's GC
    system.  I spend far, FAR more time debugging screwy problems like the
    one described, only to have to come up with some god-awful hacky
    kludge to get around the problem.  This is the very problem GC is
    supposed to fix, freeing me to spend my time on real problems.

    >
    >> [gcConstTitle setTitle:"Hello, world!"];
    >> [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world
    >> \xC2\xA1"] UTF8String]];
    >>
    >> [[NSGarbageCollector defaultCollector] collectExhaustively];
    >> NSLog(@"GC test");
    >>
    >> printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    >> [gcConstTitle title]);
    >> printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    >> [gcUTF8Title title]);
    >
    > If you build this example non-GC, and you replace
    >
    > [[NSGarbageCollector defaultCollector] collectExhaustively];
    >
    > with
    >
    > [pool drain];
    >
    > it would be just as incorrect.  The object backing that UTF-8 string
    > is no longer live, therefore you can't trust that the UTF-8 string
    > itself is valid.

    Well, scratching the itch of curiosity, in the non-GC example, does
    the pointer for UTF8String come from NSAllocateCollectable?  That has
    a prototype of void * __strong which indicates that it's from the
    collector and therefore requires write barriers?  No?

    "I'll take 'Not relevant' for $200 and 'Misunderstands the
    fundamentals' for the win, Alex."

    Your example is flawed on the face of it.  retain/release allocation
    documentation makes it pretty clear that such pointers are temporary
    and are valid only up until the autorelease pool pops.  You popped the
    pool, therefore your use after that point in time is clearly an error.

    A garbage collection system's sine qua non is to free the programmer
    from having to deal with the issues of memory allocation.  What good is a
    garbage collection system that requires me to hand hold it every step
    of the way, that causes me to spend MUCH more time having to deal with
    memory allocation problems than if I'd never used it in the first
    place?  In the GC example, there is a live pointer to an allocation
    that the GC system has reclaimed.  That allocation comes from a
    function that returns a __strong qualified pointer, and UTF8String has
    silently discarded it, and as a consequence, caused a perfectly
    legitimate and live pointer to become invisible to the collector.
  • On Feb 6, 2008, at 4:39 AM, John Engelhart wrote:

    > Read the above, "object" is synonymous for "struct".  The "layout"
    > of an object is identical to the "layout" of a struct.

    That is true but irrelevant. What matters for garbage collection is
    whether the variables are typed as objects at compile time, because
    that's what determines what code the compiler emits for assignments.

    > - (const char *)UTF8String;    // Convenience to return null-terminated UTF8 representation
    >
    > Since the pointer that UTF8String returns is provably from
    > NSAllocateCollectable, this prototype has /DISCARDED/ the type
    > qualifier of __strong.

    The API does not promise that -UTF8String returns a collectable
    pointer. The current implementation may do so, but the documentation
    says that you should copy the C string if you want to store it.

    --Michael
  • On Feb 6, 2008, at 3:39 AM, John Engelhart wrote:

    >
    > On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
    >
    >> On 5 Feb 2008, at 00:14, John Engelhart wrote:
    >>
    >>> I had such high hopes for Cocoa's GC system because once you're
    >>> spoiled by GC, it's hard to go back.
    >>>
    >>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    >>> system has not been nearly as pleasant.  It has been outright
    >>> frustrating, and it's reached the point where I consider the
    >>> system untenable.
    >>
    >> Honestly, this point has now been answered over and over.
    >>
    >> I think it comes down to the fact that you have failed to
    >> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
    >> If you're using it with objects, it "just works".
    >
    > You misunderstand what Objective C is, and how it works.  "Objects"
    > is synonymous for "Structs".

    If that were true, you'd be able to declare objects as local variables
    (as opposed to as pointers to structures):

    NSPoint aPoint; // <-- NSPoint = struct, legal
    NSString aString; // <-- NSString = object, Illegal

    (yes, at one time there was an attempt to add support for that, but it
    didn't survive).

    Structures don't have "magic invisible members":

    @interface Foo {
    }
    @end

    Foo *aFoo;
    NSLog(@"Foo is a %@", aFoo->isa);

    Notice how there is an "isa" member that is automatically put there,
    not unlike the way that a C++ object might have a vtable (or other
    internal plumbing for multiple inheritance).

    Glenn Andreas                      <gandreas...>
      <http://www.gandreas.com/> wicked fun!
    quadrium | flame : flame fractals & strange attractors : build,
    mutate, evolve, animate
  • On 6 Feb 2008, at 09:39, John Engelhart wrote:

    > - (NSString *)someMethod
    > {
    > NSUInteger finalStringLength = 1024; // Example only
    > NSString *copyString = NULL;
    > char * __strong restrict copyBuffer = NULL;
    >
    > copyBuffer = NSAllocateCollectable(finalStringLength, 0);
    > /* Since this is just an example, the part that fills contents of
    > copyBuffer with text are omitted */
    > copyString =
    > NSMakeCollectable
    > ((id)CFStringCreateWithCStringNoCopy(kCFAllocatorDefault,
    > copyBuffer, kCFStringEncodingUTF8, kCFAllocatorNull));
    > /* kCFAllocatorNull = This allocator is useful as the
    > bytesDeallocator in CFData or contentsDeallocator in CFString where
    > the memory should not be freed.  So.. Don't call free() on our
    > NSAllocateCollectable buffer, which is an error. */
    > return(copyString);
    > }
    >
    > You see where the bug is, right?

    I'll just add, publicly, that I think this probably is a bug in
    CFString that John has found here.  That is, I don't see why
    CFString's pointer shouldn't be traced by the collector in this case
    (it doesn't appear to be; certainly when I try it the backing buffer
    is released).  The problem also occurs with NSString's
    -initWithBytesNoCopy:length:encoding:freeWhenDone: et al.

    I've asked John if he's filed a bug report (I just filed one,
    <rdar://5727379>, with a working code snippet).

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Feb 6, 2008, at 10:55 AM, Alastair Houghton wrote:

    > I'll just add, publicly, that I think this probably is a bug in
    > CFString that John has found here.  That is, I don't see why
    > CFString's pointer shouldn't be traced by the collector in this
    > case (it doesn't appear to be; certainly when I try it the backing
    > buffer is released).  The problem also occurs with NSString's
    > -initWithBytesNoCopy:length:encoding:freeWhenDone: et al.

    I don't think this is a bug. The NSString and CFString APIs do not
    indicate that they treat the bytes as scanned memory. In fact, when
    you pass in kCFAllocatorNull you are telling CFString that you
    "assume responsibility for deallocating the buffer." At the end of
    -someMethod, you haven't saved a __strong reference to the buffer, so
    the collector is allowed to free it.

    --Michael
  • On Feb 6, 2008 1:39 AM, John Engelhart <john.engelhart...> wrote:
    >
    > On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
    >
    >> On 5 Feb 2008, at 00:14, John Engelhart wrote:
    >>
    >>> I had such high hopes for Cocoa's GC system because once you're
    >>> spoiled by GC, it's hard to go back.
    >>>
    >>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    >>> system has not been nearly as pleasant.  It has been outright
    >>> frustrating, and it's reached the point where I consider the system
    >>> untenable.
    >>
    >> Honestly, this point has now been answered over and over.
    >>
    >> I think it comes down to the fact that you have failed to appreciate
    >> that Cocoa GC is designed for easy use with *OBJECTS*.  If you're
    >> using it with objects, it "just works".
    >
    > You misunderstand what Objective C is, and how it works.  "Objects" is
    > synonymous for "Structs".

    No, he understands perfectly well. "Structs" are an implementation
    detail of "Objects". That is, conceptually, each Objective-C object
    *has a* C struct.

    >> As far as objects "not being special", they *are* special, in that
    >> the compiler generates layout information and method signatures for
    >> them.
    >
    > Read the above, "object" is synonymous for "struct".  The "layout" of
    > an object is identical to the "layout" of a struct.

    Again, that is an implementation detail of the old (32-bit) runtime,
    this is not true going forward, and you would be well served to sever
    that association in your mental model of Objective-C. Believe me,
    Alastair has demonstrated many times over that he understands what he
    is saying.

    >> or if you absolutely must return a const char * pointer,
    >>
    >> - (const char *)hexString
    >> {
    >> return [[NSString stringWithFormat:@"0x%8.8x", myIvar] UTF8String];
    >> }
    >>
    >> which has the benefit of working exactly like -UTF8String.  If you
    >> *really* wanted, you could mess about with NSAllocateCollectable().
    >
    > Oh... my...
    >
    > I could not possibly have asked for a better example of just how easy
    > it is to get this wrong.

    There is nothing wrong with this example. Since the return value is on
    the stack, it is rooted. If you get the result of this function and
    expect it to stay around for any length of time, it is your
    responsibility to copy it. This is the case without GC and it is still
    the case with GC.

    > Let those of you who are considering using Leopards GC system use this
    > as a warning of how dangerous its use is in practice.
    >
    > First, understand that my original example, the one in which the GC
    > system snatched away the live allocation, did use
    > NSAllocateCollectable.  Your example, using UTF8String, uses
    > NSAllocateCollectable as well.  We can infer this by the behavior
    > exhibited by the GC system when qualifying pointers that store the
    > results from these methods as __strong, which prevents the collector
    > from reclaiming the allocation.  Thus, by induction, the pointer to
    > the buffer that contains the strings text must ultimately come from
    > NSAllocateCollectable.
    >
    > Let's start with NSAllocateCollectable().  The prototype for
    > NSAllocateCollectable, from NSZone.h, is as follows:
    >
    > FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger
    > size, NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;
    >
    > NOTE WELL: the __strong qualifier for the pointer.

    __strong as a qualifier on a return value is effectively meaningless;
    all values on the stack or in a register are implicitly strong. It is
    simply there to add information for the human reading it.
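
    As a rough analogy in plain C (using const, since __strong itself
    needs Apple's Objective-C compiler), a qualifier written on a
    function's return type does not constrain the variable that receives
    the result.  The names greeting() and qualifier_is_dropped() below are
    made up for illustration, and whether __strong follows the same rule
    is an assumption, not something this sketch proves.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper. The top-level `const` applies to the returned
   pointer value itself, not to the characters it points at. Some
   compilers warn that such a qualifier is ignored. */
char *const greeting(void)
{
    static char text[] = "Hello, world!";
    return text;
}

/* Returns 1 if the result could be stored in, and written through,
   a plain unqualified `char *`. */
int qualifier_is_dropped(void)
{
    char *plain = greeting();          /* no cast needed */
    assert(strcmp(plain, "Hello, world!") == 0);

    plain[0] = 'J';                    /* legal: the characters are not const */
    int changed = (plain[0] == 'J');
    plain[0] = 'H';                    /* restore the text for later callers */
    return changed;
}
```

    The point of the sketch is only that a function call yields a plain
    value: the receiving variable's own declaration decides how that
    value is treated afterwards.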

    > Now, there is no formal grammar of Objective-C 2.0 published, but it
    > is a reasonable assumption that "__strong" is in the same group of
    > type qualifiers as "const" and "volatile".  This makes __strong
    > subject to the same ANSI-C rules governing the use of type qualifiers,
    > including promotion and assignment rules. Returning to the method
    > definition of UTF8String, we find the following in NSString.h:
    >
    > - (const char *)UTF8String;    // Convenience to return null-terminated
    > UTF8 representation
    >
    > Since the pointer that UTF8String returns is provably from
    > NSAllocateCollectable,

    It may do this now, but that is an implementation detail. Using the
    result of UTF8String will work just fine as long as you don't store a
    pointer to it for longer than the current context (whether that
    context is an autorelease pool in pre-GC or the length of time for
    which the value is on the stack in GC). If you need it for longer
    than that, you must copy it.

    >
    > QED, my use of the UTF8String pointer is bug free and legal by ANSI-C
    > rules.  That this later causes the GC system to reclaim the memory
    > pointed to by this pointer is due to a bug in prototype of UTF8String.
    > By type promotion rules, the prototype for UTF8String should be:

    Your use of UTF8String is counter to the documentation of that method.

    "The returned C string is automatically freed just as a returned
    object would be released; you should copy the C string if it needs to
    store it outside of the autorelease context in which the C string is
    created."

    This is true, regardless of how Apple decides to implement this
    method. For instance, it could be a pointer to inline storage within
    the NSString instance itself, it could be a pointer into an
    autoreleased NSData object, it could be a pointer to an
    NSAllocateCollectable-allocated block of memory, etc. Regardless of
    how it is implemented, it still behaves as per the contract given you
    by UTF8String, you don't need to know any more or any less to use it
    properly.
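
    In plain C terms, that contract might be sketched as follows: a
    function hands back a pointer that is only good for a while, and a
    caller who wants to keep the text copies it.  transient_name() is a
    hypothetical stand-in for a method like -UTF8String (it simply reuses
    a static buffer); it is not a claim about how NSString is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a "temporary pointer" API: the returned
   pointer is only meaningful until the next call reuses the buffer. */
const char *transient_name(int id)
{
    static char buffer[32];
    strcpy(buffer, id == 1 ? "first" : "second");
    return buffer;
}

/* Take a private copy that the caller owns and must free(). */
char *copy_c_string(const char *source)
{
    size_t size = strlen(source) + 1;  /* include the terminator */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, source, size);
    }
    return copy;
}

/* Returns 1 if the copy keeps its contents after the source changes. */
int copy_outlives_source(void)
{
    const char *borrowed = transient_name(1);
    char *owned = copy_c_string(borrowed);
    assert(owned != NULL);

    transient_name(2);                 /* the borrowed text is now replaced */

    int ok = strcmp(owned, "first") == 0 &&
             strcmp(borrowed, "second") == 0;
    free(owned);
    return ok;
}
```

    The caller never needs to know where the temporary buffer came from,
    only that it has to copy anything it intends to keep.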

    --
    Clark S. Cox III
    <clarkcox3...>
  • >>
    >
    > Structures don't have "magic invisible members":
    >
    > @interface Foo {
    > }
    > @end
    >
    > Foo *aFoo;
    > NSLog(@"Foo is a %@", aFoo->isa);
    >
    > Notice how there is an "isa" member that is automatically put there,
    > not unlike the way that a C++ object might have a vtable (or other
    > internal plumbing for multiple inheritance).
    >

    Wrong. This will not work. Foo will not have a magic isa ivar, and is
    not a valid objc class.

    You have to either:
    1) inherit from a valid root class (NSObject).
    2) add a "Class isa" ivar to your declaration.

    See the NSObject declaration:

    @interface NSObject <NSObject> {
        Class    isa;
    }
    ...
    @end
  • On 6 Feb 2008, at 16:46, Michael Tsai wrote:

    > On Feb 6, 2008, at 10:55 AM, Alastair Houghton wrote:
    >
    >> I'll just add, publicly, that I think this probably is a bug in
    >> CFString that John has found here.  That is, I don't see why
    >> CFString's pointer shouldn't be traced by the collector in this
    >> case (it doesn't appear to be; certainly when I try it the backing
    >> buffer is released).  The problem also occurs with NSString's
    >> -initWithBytesNoCopy:length:encoding:freeWhenDone: et al.
    >
    > I don't think this is a bug. The NSString and CFString APIs do not
    > indicate that they treat the bytes as scanned memory.

    That's true, but it doesn't matter whether they treat the bytes as
    scanned memory or not; that would only change whether putting pointer
    data in the bytes was safe.  The problem is whether the pointer itself
    is being traced, which isn't happening right now; the docs *do* say
    (in the Garbage Collection Programming Guide) that NULL,
    kCFAllocatorDefault and kCFAllocatorSystemDefault cause objects to be
    allocated in the GC zone, so I don't think it's unreasonable to expect
    that the pointer will be traced.

    > In fact, when you pass in kCFAllocatorNull you are telling CFString
    > that you "assume responsibility for deallocating the buffer." At the
    > end of -someMethod, you haven't saved a __strong reference to the
    > buffer, so the collector is allowed to free it.

    It's *an* argument, certainly.

    I just think that there's no harm in making the pointer visible to the
    collector; it doesn't hurt if the pointer isn't pointing into the GC
    pool.  And it would mean that you could pass a chunk of memory
    allocated using NSAllocateCollectable(), which seems not
    unreasonable.  I don't think it's hugely important, since you can
    always use malloc() and let it call free() (which will happen
    automatically), but if it's an easy fix then it's probably worth
    doing.  Not that many people will do this or indeed should be doing
    this kind of thing.

    Anyway, at the very least it's worth drawing to the attention of
    whoever's responsible for CFString at Apple.  If they want to fix it,
    great.  If not, the docs could be updated to say that you shouldn't
    pass GC'd memory into those APIs.

    If we could see the sources for CFString, we could probably make a
    better determination as to whether this was worth fixing.  But
    currently the CF project's sources aren't visible (for Leopard) :-(

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • Please remember that Apple Engineers spend their own time answering
    questions on this list for the benefit of developers and the developer
    community. It isn't part of their job to do this, it's their
    commitment to the community.

    It is simply unfair to take this type of attitude with their response
    (or in fact a response from anyone).

    I don't want any Apple engineer, or any other people who contribute to
    the list, to have to endure this type of acidic response. Posters that
    do so risk moderation.

    Remember, we're all people and we're all trying to work together to
    get 'stuff' done.

    Be kind, rewind.

    Scott Anguish
    Apple

    [if there are issues with this email, please send them to
    <cocoa-dev-admins...>, NOT to this list]

    On Feb 6, 2008, at 6:59 AM, John Engelhart wrote:

    > <snip acidic rant>
  • On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...> wrote:

    > "I'll take 'Not relevant' for $200 and 'Misunderstands the
    > fundamentals' for the win, Alex."

    Speaking of "not relevant" and "misunderstands the fundamentals":

    1) UTF8String returns a non-__strong pointer.
    2) You fail to copy the data at that pointer.
    3) You cry foul when that data disappears.

    > A garbage collection system's sine qua non is to free the programmer
    > from having to deal with the issues of memory allocation.

    Exactly. So stop getting your knickers in a twist about whether or not
    UTF8String actually returns memory from NSAllocateCollectable(), and
    simply copy the result as required by the documentation for
    UTF8String.

    Hamish
  • On Feb 6, 2008, at 12:46 PM, Alastair Houghton wrote:

    >> I don't think this is a bug. The NSString and CFString APIs do not
    >> indicate that they treat the bytes as scanned memory.
    >
    > That's true, but it doesn't matter whether they treat the bytes as
    > scanned memory or not; that would only change whether putting
    > pointer data in the bytes was safe.  The problem is whether the
    > pointer itself is being traced, which isn't happening right now

    Sorry, that's what I meant.

    > the docs *do* say (in the Garbage Collection Programming Guide)
    > that NULL, kCFAllocatorDefault and kCFAllocatorSystemDefault cause
    > objects to be allocated in the GC zone, so I don't think it's
    > unreasonable to expect that the pointer will be traced.

    The string was allocated using kCFAllocatorDefault, but the
    deallocator was specified as kCFAllocatorNull. The docs say:

    "If the buffer does not need to be deallocated, or if you want to
    assume responsibility for deallocating the buffer (and not have the
    CFString object deallocate it), pass kCFAllocatorNull."

    If CFString is not going to be responsible for deallocating the
    buffer, then it would not make sense to rely on CFString keeping the
    buffer alive for you.
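
    This ownership reading can be sketched in plain C.  borrowed_string
    and its functions are hypothetical names, not CoreFoundation API; a
    NULL deallocator plays the role of kCFAllocatorNull here, and the
    sketch models ownership only, not what the collector traces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*buffer_deallocator)(void *);

/* Hypothetical no-copy string wrapper: it stores the caller's pointer
   plus an optional deallocator for that buffer. */
typedef struct {
    const char *bytes;
    buffer_deallocator deallocator;    /* NULL: the caller keeps ownership */
} borrowed_string;

borrowed_string borrowed_string_make(const char *bytes,
                                     buffer_deallocator deallocator)
{
    borrowed_string s = { bytes, deallocator };
    return s;
}

void borrowed_string_destroy(borrowed_string *s)
{
    if (s->deallocator != NULL) {
        s->deallocator((void *)s->bytes);
    }
    s->bytes = NULL;
}

/* Returns 1 if the buffer is still intact after the wrapper is gone. */
int caller_still_owns_buffer(void)
{
    char *buffer = malloc(6);
    assert(buffer != NULL);
    memcpy(buffer, "hello", 6);

    borrowed_string s = borrowed_string_make(buffer, NULL);
    borrowed_string_destroy(&s);       /* no deallocator, so nothing is freed */

    int ok = strcmp(buffer, "hello") == 0;
    free(buffer);                      /* the caller must release it */
    return ok;
}
```

    With a NULL deallocator the wrapper neither frees the buffer nor
    extends its lifetime; both remain the caller's job.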

    > I just think that there's no harm in making the pointer visible to
    > the collector; it doesn't hurt if the pointer isn't pointing into
    > the GC pool.

    I'm not saying there would be harm; I'm just saying that the API
    doesn't say that it traces the pointer, so it's not a bug if it
    doesn't. In the absence of a general guideline or an explicit
    promise, passing a buffer to a CF call is just like passing it to,
    say, a sqlite3 call.

    --Michael
  • On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...> wrote:

    > "Wait wait wait,
    > why did the compiler allow an unqualified void * pointer to be
    > assigned a more qualified void * pointer, against ANSI-C type
    > qualifier rules?"

    Good point, and you might want to file a bug report with Apple that
    gcc should generate a warning in cases like this. Luckily, this is not
    the same as "Leopard GC is inexorably broken". Why are you so keen to
    throw out the baby with the bathwater?

    Hamish
  • On 6 Feb 2008, at 17:52, Hamish Allan wrote:

    > On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...>
    > wrote:
    >
    >> "I'll take 'Not relevant' for $200 and 'Misunderstands the
    >> fundamentals' for the win, Alex."
    >
    > Speaking of "not relevant" and "misunderstands the fundamentals":
    >
    > 1) UTF8String returns a non-__strong pointer.

    __strong isn't a type qualifier, it's an attribute (in the sense of
    the __attribute__ keyword).  The distinction is perhaps a bit subtle,
    especially as attributes can be attached to a typedef'd type, but it's
    the reason that you can put __strong anywhere in a variable
    declaration and it still has the same effect.  It *isn't* like const
    or volatile, and the ANSI C rules regarding type qualifiers absolutely
    *do not* apply.

    Furthermore I *think* (and this is from memory, based on some work I
    did on GCC several years ago, so I might be wrong) that if you write
    something like

      void * __strong MyFunction(void);

    you'll find that the __strong attribute is attached to the *function*
    rather than to the type.  In any case it's going to be ignored because
    __strong only really affects variables, not types or functions.

    > 2) You fail to copy the data at that pointer.
    > 3) You cry foul when that data disappears.

    Well the bug is *either* ignoring the bit in the -UTF8String docs
    where it says you should copy the string (though that does read like
    it was only intended to talk about the non-GC case---I just filed
    <rdar://5727581> asking for a clarification), or not using __strong
    on the variable you're storing the result in.

    >> A garbage collection system's sine qua non is to free the programmer
    >> from having to deal with the issues of memory allocation.
    >
    > Exactly. So stop getting your knickers in a twist about whether or not
    > UTF8String actually returns memory from NSAllocateCollectable(), and
    > simply copy the result as required by the documentation for
    > UTF8String.

    Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
    since I think it's only muddied the waters further.

    Kind regards,

    Alastair.

    --
    http://alastairs-place.net
  • On Feb 6, 2008, at 11:05 AM, Jean-Daniel Dupas wrote:

    >>>
    >>
    >> Structures don't have "magic invisible members":
    >>
    >> @interface Foo {
    >> }
    >> @end
    >>
    >> Foo *aFoo;
    >> NSLog(@"Foo is a %@", aFoo->isa);
    >>
    >> Notice how there is an "isa" member that is automatically put
    >> there, not unlike the way that a C++ object might have a vtable (or
    >> other internal plumbing for multiple inheritance).
    >>
    >
    >
    > Wrong. This will not work. Foo will not have a magic isa ivar, and is
    > not a valid objc class.
    >
    > You have to either:
    > 1) inherit from a valid root class (NSObject).
    > 2) add a "Class isa" ivar to your declaration.
    >
    > See the NSObject declaration:
    >
    > @interface NSObject <NSObject> {
    > Class    isa;
    > }
    > ...
    > @end

    Mea culpa - someday I'll learn to not post until the _second_ cup of
    coffee....

    Perhaps a more accurate statement is that it has an "explicit magic
    member" - it must have an isa pointer as the very first field
    (contrary to the caffeine deprived ruminations, the compiler doesn't
    put it there automatically).

    Regardless, all objects must start with that isa pointer - the runtime
    requires it.  Objects _are_ special, and pretending that this is just
    another arbitrary struct is incorrect:
    1) They can't live on the stack
    2) They have a special isa pointer
    3) They have implicit requirements to be allocated/copied/released
    using special routines (NSAllocateObject, NSCopyObject,
    NSDeallocateObject) - i.e., it's unclear whether a structure of the
    same type that you malloc'd or new'd yourself, with the isa pointer
    filled in manually, would necessarily work on current or future
    Objective-C runtimes (it would almost certainly fail under 64-bit
    Objective-C 2.0)

    Glenn Andreas                      <gandreas...>
      <http://www.gandreas.com/> wicked fun!
    quadrium | prime : build, mutate, evolve, animate : the next
    generation of fractal art
  • On 2/6/08 7:12 PM, Alastair Houghton said:

    > Furthermore I *think* (and this is from memory, based on some work I
    > did on GCC several years ago, so I might be wrong) that if you write
    > something like
    >
    > void * __strong MyFunction(void);
    >
    > you'll find that the __strong attribute is attached to the *function*
    > rather than to the type.  In any case it's going to be ignored because
    > __strong only really affects variables, not types or functions.

    Well, NSAllocateCollectable is declared like so in NSZone.h:

    FOUNDATION_EXPORT void *__strong NSAllocateCollectable(NSUInteger size,
    NSUInteger options) AVAILABLE_MAC_OS_X_VERSION_10_4_AND_LATER;

    And the comment just above says "the pointer type of the stored location
    must be marked with the __strong attribute in order for the write-
    barrier assignment primitive to be generated".

    --
    ____________________________________________________________
    Sean McBride, B. Eng                <sean...>
    Rogue Research                        www.rogue-research.com
    Mac Software Developer              Montréal, Québec, Canada
  • On Feb 6, 2008 3:23 PM, glenn andreas <gandreas...> wrote:

    > Regardless, all objects must start with that isa pointer - the runtime
    > requires it.  Objects _are_ special, and pretending that this is just
    > another arbitrary struct is incorrect:
    > 1) They can't live on the stack

    This is a bit nitpicky, but they can:

    struct { Class isa; id ivar; } fakeObj;
    fakeObj.isa = [SomeClass class];
    id fakeObjPtr = (id)&fakeObj;
    [fakeObjPtr myCustomInitializer];

    Of course you have to ensure that the layout is correct, and you can't use
    this with any Cocoa classes since you completely break Cocoa's idea of
    initialization and memory management, but it works fine with classes which are
    100% custom (i.e. do not have Cocoa anywhere within their inheritance chain)
    as long as they understand stack semantics. You can fix the layout problem
    in 32-bit by using @defs, but in 64-bit land you'd have to fall back to
    trickery with alloca or C variable-length arrays, neither of which is going
    to be much fun.
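
    The same idea can be modelled in plain C without the Objective-C
    runtime.  This is only a toy: toy_class, toy_counter and
    toy_send_value() are invented for illustration, and real dispatch
    through objc_msgSend is considerably more involved.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature "class": a name plus a single method. */
typedef struct toy_class {
    const char *name;
    int (*value)(void *self);
} toy_class;

/* A toy "object": the isa pointer must be the first member. */
typedef struct {
    const toy_class *isa;
    int count;
} toy_counter;

static int counter_value(void *self)
{
    return ((toy_counter *)self)->count;
}

static const toy_class counter_class = { "Counter", counter_value };

/* Toy "message send". A pointer to a struct also points at its first
   member, so the isa can be read without knowing the concrete type. */
int toy_send_value(void *receiver)
{
    const toy_class *isa = *(const toy_class *const *)receiver;
    return isa->value(receiver);
}

int stack_object_demo(void)
{
    toy_counter onStack;               /* lives on the stack, no allocator */
    onStack.isa = &counter_class;
    onStack.count = 42;

    assert(strcmp(onStack.isa->name, "Counter") == 0);
    return toy_send_value(&onStack);
}
```

    Nothing in the dispatch step cares where the storage came from, as
    long as the first word is a valid isa.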

    The Stepstone compiler explicitly supported stack objects by doing the
    obvious thing and leaving out the * when declaring a local object variable.
    Obviously gcc doesn't support this though.

    > 2) They have a special isa pointer
    > 3) They have implicit requirements to be allocated/copied/released
    > using special routines (NSAllocateObject, NSCopyObject,
    > NSDeallocateObject) - i.e., it's unclear whether a structure of the
    > same type that you malloc'd or new'd yourself, with the isa pointer
    > filled in manually, would necessarily work on current or future
    > Objective-C runtimes (it would almost certainly fail under 64-bit
    > Objective-C 2.0)

    As far as I know, the only tricky thing about the 64-bit runtime is that you
    can't calculate the amount of storage required at compile time.

    Once you have an object, the only thing that really cares about it is the
    objc_msgSend family of functions and ivar access. objc_msgSend only cares
    about the isa pointer which will still be there as long as you set it up
    properly, and ivar access, even in 64-bit land, is just accessing memory at
    an offset to the self pointer. Cocoa classes have certain memory management
    requirements but ObjC classes do not. You can write your own class hierarchy
    which uses malloc/free, new/delete, mmap, or any other memory allocation
    technique and it will work fine. You'll lose out on garbage collection, but
    that's to be expected when you start doing custom memory allocation.
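
    As a rough illustration in plain C (not the actual runtime), ivar
    access by offset can be sketched like this.  toy_rect and
    read_int_ivar() are hypothetical, and keeping the offset in a variable
    rather than a compile-time constant is an assumption about how a
    runtime could defer layout decisions until load time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object layout: isa first, then two int ivars. */
typedef struct {
    void *isa;
    int width;
    int height;
} toy_rect;

/* The offset lives in a variable, as if it were supplied when the
   class was loaded instead of being baked into every access. */
size_t toy_rect_height_offset = offsetof(toy_rect, height);

/* Read an int ivar located `offset` bytes past the self pointer. */
int read_int_ivar(const void *self, size_t offset)
{
    int value;
    memcpy(&value, (const char *)self + offset, sizeof value);
    return value;
}

int ivar_offset_demo(void)
{
    toy_rect r = { NULL, 3, 4 };
    assert(read_int_ivar(&r, offsetof(toy_rect, width)) == 3);
    return read_int_ivar(&r, toy_rect_height_offset);
}
```

    Either way the access is base pointer plus offset; only the moment at
    which the offset becomes known differs.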

    I would not recommend actually *doing* any of the above, but there's a
    fairly wide space between what's reasonable to do and what's possible to do.

    Digressing slightly, it would be nice if NSObject would offer separate
    deinitialization and deallocation methods to match the separate
    initialization and allocation methods so that it would be possible to
    implement a custom memory allocation scheme in an NSObject subclass. As it
    stands, you either have to implement your own root class, or you have to
    subclass directly from NSObject and assume that its dealloc method does
    nothing other than free the object's memory.

    Mike
  • On Feb 6, 2008 7:12 PM, Alastair Houghton <alastair...> wrote:

    > Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
    > since I think it's only muddied the waters further.

    I'm starting to wonder whether much of this debate could have been
    avoided if NSAllocateCollectable() hadn't been declared __strong. If
    such an attribute is meaningless for a function, it serves only to
    mislead and confuse.

    Hamish
  • On Feb 6, 2008, at 2:12 PM, Alastair Houghton wrote:

    > On 6 Feb 2008, at 17:52, Hamish Allan wrote:
    >
    >> On Feb 6, 2008 2:59 PM, John Engelhart <john.engelhart...>
    >> wrote:
    >>
    >>> "I'll take 'Not relevant' for $200 and 'Misunderstands the
    >>> fundamentals' for the win, Alex."
    >>
    >> Speaking of "not relevant" and "misunderstands the fundamentals":
    >>
    >> 1) UTF8String returns a non-__strong pointer.
    >
    > __strong isn't a type qualifier, it's an attribute (in the sense of
    > the __attribute__ keyword).  The distinction is perhaps a bit
    > subtle, especially as attributes can be attached to a typedef'd
    > type, but it's the reason that you can put __strong anywhere in a
    > variable declaration and it still has the same effect.  It *isn't*
    > like const or volatile, and the ANSI C rules regarding type
    > qualifiers absolutely *do not* apply.

    As I mentioned previously, having no formal grammar of ObjC 2.0 makes
    this point debatable.  Your interpretation that __strong is an
    __attribute__ extension could certainly be true, and is a valid way of
    looking at things.  Without a grammar to guide us, I think both
    interpretations are equally valid.

    However, consider for a moment if it were a type attribute that
    followed type attribute rules, and how this would affect the examples
    cited.  Off the top of my head, I think treating it as a type
    attribute would have prevented every single error I've pointed out.
    Hypothetically, consider if UTF8String propagated the __strong type,
    and consider the assignment of its pointer to the 'const char *' ivar.

    The compiler would fail to compile the code, and generate an error.

    Again, I'm pretty sure that every example posted would be caught if
    __strong were treated as a type attribute.

    Since we have two valid ways to interpret the meaning of __strong, I
    believe that the usefulness of catching these errors at compile time
    argues strongly in favor of considering it a type attribute.

    >
    > Furthermore I *think* (and this is from memory, based on some work I
    > did on GCC several years ago, so I might be wrong) that if you write
    > something like
    >
    > void * __strong MyFunction(void);
    >
    > you'll find that the __strong attribute is attached to the
    > *function* rather than to the type.  In any case it's going to be
    > ignored because __strong only really affects variables, not types or
    > functions.

    I hate this part of the language.  Each qualifier has its quirks, and
    their application is non-obvious, including exactly what they apply
    to.  The distinction between

    const char * ptr; and
    char * const ptr;

    isn't terribly clear by just looking at it.  Add to this

    const char * const ptr;

    Three, totally different things, and using const twice in this way
    just seems like it would be a bug at first glance.
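
    For anyone who wants the three declarations spelled out, here is a
    small C sketch.  The function name const_demo and the buffers are
    invented for illustration; the statements a compiler would reject are
    left commented out so the rest still compiles.

```c
#include <assert.h>

/* Applies the legal operations for each kind of pointer and reports the
 * first character seen through each one. */
static void const_demo(char out[3]) {
    char a[] = "abc";
    char b[] = "xyz";

    const char *p1 = a;        /* pointer to const char: data is read-only via p1 */
    p1 = b;                    /* OK: the pointer itself may be re-aimed */
    /* p1[0] = 'X'; */         /* error: the pointed-to data is const */

    char *const p2 = a;        /* const pointer to char: the pointer is fixed */
    p2[0] = 'A';               /* OK: the data may be modified */
    /* p2 = b; */              /* error: the pointer is const */

    const char *const p3 = b;  /* neither the pointer nor the data may change */
    /* p3 = a; */              /* error */
    /* p3[0] = 'X'; */         /* error */

    out[0] = p1[0];
    out[1] = p2[0];
    out[2] = p3[0];
}
```

    Reading right to left helps: "p2 is a const pointer to char", "p3 is a
    const pointer to const char".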
    >
    >> 2) You fail to copy the data at that pointer.
    >> 3) You cry foul when that data disappears.
    >
    > Well the bug is *either* ignoring the bit in the -UTF8String docs
    > where it says you should copy the string (though that does read like
    > it was only intended to talk about the non-GC case---I just filed
    > <rdar://5727581> asking for a clarification), or not using __strong
    > on the variable you're storing the result in.

    Actually, I've thought of another example which addresses the use of
    (or lack of) __strong unambiguously and still demonstrates the problem:

    #import <Foundation/Foundation.h>

    @interface GCTest : NSObject {
      const char *title;
    };

    - (void)setTitle:(const char *)newTitle;
    - (const char *)title;

    @end

    @implementation GCTest

    - (void)setTitle:(const char *)newTitle
    {
      printf("Setting title.  Old title: %p, new title %p = '%s'\n",
             title, newTitle, newTitle);
      title = newTitle;
    }

    - (const char *)title
    {
      return title;
    }

    @end

    int main(int argc, char *argv[]) {
      NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
      GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
      void *ptr;

      gcConstTitle = [[GCTest alloc] init];
      gcUTF8Title = [[GCTest alloc] init];

      [gcConstTitle setTitle:"Hello, world!"];
      [gcUTF8Title setTitle:[[NSString stringWithUTF8String:
          "Hello, world \xC2\xA1"] UTF8String]];

      NSLog(@"Test: %@", @"hello");
      [[NSGarbageCollector defaultCollector] collectExhaustively];
      NSLog(@"GC test");

      printf("gcConstTitle title: %p = '%s'\n",
             [gcConstTitle title], [gcConstTitle title]);
      printf("gcUTF8Title  title: %p = '%s'\n",
             [gcUTF8Title title], [gcUTF8Title title]);

      // Must clear the pointer before popping pool.
      [gcConstTitle setTitle:NULL];
      [gcUTF8Title setTitle:NULL];

      [pool release];
      return(0);
    }
    [<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -o gc -g gc.m
    [<johne...>] /tmp% ./gc
    Setting title.  Old title: 0x0, new title 0x1ea4 = 'Hello, world!'
    Setting title.  Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
    2008-02-06 18:32:35.712 gc[18108:807] Test: hello
    2008-02-06 18:32:35.798 gc[18108:807] GC test
    gcConstTitle title: 0x1ea4 = 'Hello, world!'
    gcUTF8Title  title: 0x1011860 = 'Hello, world'
    Setting title.  Old title: 0x1ea4, new title 0x0 = '(null)'
    Setting title.  Old title: 0x1011860, new title 0x0 = '(null)'

    Oddly, I had to add a second NSLog() in order to get some kind of
    lossage, but I think it's fair to chalk this up to the semi-random
    nature of allocations.

    The above example is now perfectly legal by everyone's definition of
    how things were under retain/release.  I correctly clear the pointer
    before it goes out of scope, and the example demonstrates that the GC
    system can, and does, reclaim live data out from under you.

    >
    >>> A garbage collection system's sine qua non is to free the programmer
    >>> from having to deal with the issues of memory allocation.
    >>
    >> Exactly. So stop getting your knickers in a twist about whether or
    >> not UTF8String actually returns memory from NSAllocateCollectable(),
    >> and simply copy the result as required by the documentation for
    >> UTF8String.
    >
    > Indeed, I rather wish I hadn't mentioned NSAllocateCollectable(),
    > since I think it's only muddied the waters further.

    I don't.  Again, consider the hypothetical case where __strong is a
    type qualifier.  During development of UTF8String, this would cause a
    warning or error (implementation dependent, but I'd argue for error).
    This would highlight the fact that the prototype is discarding a
    critical qualifier.  Assuming that the definition of UTF8String was
    altered to include __strong, my attempt at assigning a __strong
    qualified pointer to a non-strong pointer would instantly be flagged
    and reported by the compiler.

    Consider the case of handing a __strong pointer off to a function,
    such as CFStringCreateWithCStringNoCopy().  If the prototype does not
    have __strong for the buffer argument, my example of handing it an
    NSAllocateCollectable pointer would again, instantly trigger a
    compiler warning or error (I vote error considering the consequences).

    It's hard to argue that this is not "The Right Thing" to be doing as
    it would have mooted every single point I have raised, and caught all
    of these errors at compile time before they could have become
    problems.  This would have also caught the problem in the example I
    pasted above, at compile time, and alerted me that I just created a
    problem.

    Finally, consider the effects of the current behavior of silently
    discarding __strong.  As the example in this message shows, it's
    surprisingly easy to create conditions which violate the conditions
    required for proper GC behavior.

    After four months of practical, hands-on usage of Leopard's GC system,
    my experience is that this is happening far, far more frequently than
    you might think.  I have had to deal with endless problems under GC
    which have all the telltale signs of race conditions.  Because what
    I'm developing is dual mode, I can flip over to retain/release, which
    works flawlessly, even after extreme, intensive, multithreaded
    stress testing.  The retain/release side of things never crashes, but
    the GC side of things mostly doesn't crash.

    I think the evidence I've put forth is pretty strong.  I think the
    example I give in this message addresses everyone's concerns regarding
    the use of __strong.  It's pretty clear that I don't need to use
    __strong for the pointer, and that retain/release methodology is
    incapable of freeing the UTF8String buffer while it's still in
    use.  It's certainly open to debate how frequently this happens in
    practice.  I have shown that it is possible.  In my opinion, after
    four months of use, this is happening much more frequently than you
    might think at first approximation.  And just like race conditions, it
    gets worse the harder you push, which is to be expected.  It's pretty
    much a ticking time bomb, and everything seems to work fine when
    you're doing development, but then starts failing mysteriously out in
    the field.
  • On Feb 7, 2008 1:06 AM, John Engelhart <john.engelhart...> wrote:

    > It's pretty clear that I don't need to use __strong for the pointer

    I don't think this is at all clear. You have only a single weak
    reference to the data returned by UTF8String, so it's not at all
    surprising when it gets lost. This is pretty much akin to what happens
    if you fail to send a retain to an autoreleased object, but you're not
    complaining that pre-Leopard memory management is broken.

    Hamish
  • On 07/02/2008, at 12:06 PM, John Engelhart wrote:

    > However, consider for a moment if it was a type attribute, and
    > followed type attribute rules, and how this would effect the
    > examples cited.  Off the top of my head, I think treating it as a
    > type attribute would have prevent every single error I've pointed
    > out.  Hypothetically, consider if UTF8String propagated the __strong
    > type, and the assignment of its pointer to the ivar 'const char *'.
    >
    > The compiler would fail to compile the code, and generate an error.

    I don't think it should be a type qualifier. It would mean that you
    wouldn't be able to do things like:

    puts ([myString UTF8String])

    without getting a compiler warning.

    > Oddly, I had to add a second NSLog() in order to get some kind of
    > lossage, but I think it's fair to chalk this up to the semi-random
    > nature of allocations.

    I think that would have been because the pointer returned by
    UTF8String was still on the stack or in a register.

    > The above example is now perfectly legal by everyones definition of
    > how things were under retain/release, and I correctly clear the
    > pointer before it goes out of scope, and demonstrates that the GC
    > system can, and does, reclaim live data out from under you.

    I think your example is contrived. Whilst it's legal in the
    retain/release world, you wouldn't ever write anything like that.

    The solution to all of this, as has already been stated, is to
    understand the contract that UTF8String promises and to make your own
    arrangements if you want to hang on to the value.
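
    One way to "make your own arrangements" is for the setter to copy the
    bytes into storage that the holder owns.  The sketch below is plain C
    with malloc/free, not Cocoa; title_holder and holder_set_title are
    hypothetical names.  A GC build would presumably allocate the copy from
    the collector and keep it in a __strong pointer instead, which is an
    assumption and is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder that owns its own copy of the title bytes. */
struct title_holder { char *title; };

/* Copy new_title into owned storage; passing NULL clears the title.
 * Returns 0 on success, -1 if allocation fails (the old title is kept). */
static int holder_set_title(struct title_holder *h, const char *new_title) {
    char *copy = NULL;
    if (new_title != NULL) {
        size_t len = strlen(new_title) + 1;   /* include the terminator */
        copy = malloc(len);
        if (copy == NULL) return -1;
        memcpy(copy, new_title, len);
    }
    free(h->title);          /* free(NULL) is a no-op */
    h->title = copy;
    return 0;
}
```

    Because the copy is made before the old buffer is freed, the holder no
    longer depends on the lifetime of whatever produced the original
    pointer.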

    - Chris
  • On Feb 6, 2008, at 10:45 AM, glenn andreas wrote:

    >
    > On Feb 6, 2008, at 3:39 AM, John Engelhart wrote:
    >
    >>
    >> On Feb 5, 2008, at 7:40 AM, Alastair Houghton wrote:
    >>
    >>> On 5 Feb 2008, at 00:14, John Engelhart wrote:
    >>>
    >>>> I had such high hopes for Cocoa's GC system because once you're
    >>>> spoiled by GC, it's hard to go back.
    >>>>
    >>>> Unfortunately, the 4-5 months of time I've put in on Leopard's GC
    >>>> system has not been nearly as pleasant.  It has been outright
    >>>> frustrating, and it's reached the point where I consider the
    >>>> system untenable.
    >>>
    >>> Honestly, this point has now been answered over and over.
    >>>
    >>> I think it comes down to the fact that you have failed to
    >>> appreciate that Cocoa GC is designed for easy use with *OBJECTS*.
    >>> If you're using it with objects, it "just works".
    >>
    >> You misunderstand what Objective C is, and how it works.  "Objects"
    >> is synonymous for "Structs".
    >
    >
    >
    > If that were true, you'd be able to declare objects as local
    > variables (as opposed to as pointers to structures):
    >
    > NSPoint aPoint; // <-- NSPoint = struct, legal
    > NSString aString; // <-- NSString = object, Illegal

    Surprisingly, you can actually do this.  It requires some contortions
    and manual initialization, but the end result ends up being identical
    to what "NSString aString" would have been:

    [<johne...>] /tmp% cat tst.m
    #import <Foundation/Foundation.h>

    int main(int argc, char **argv) {
      NSObject *stackObject = NULL;
      stackObject = alloca(sizeof(stackObject));
      memset(stackObject, 0, sizeof(NSObject));
      stackObject->isa = [NSObject class];

      NSLog(@"stackObject: %@", stackObject);
    }
    [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    tst.m: In function 'main':
    tst.m:7: warning: instance variable 'isa' is @protected; this will be a hard error in the future
    [<johne...>] /tmp% ./tst
    2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject: 0xbffff380>

    As the address clearly shows, this is an object on the stack.
    Although I have had to manually initialize the object, it is exactly,
    or very close to, what "NSObject stackObject" would have created.

    This is generally a bad idea in practice, as the object is
    "deallocated" as soon as the frame pops.  Because it's on the stack,
    there is no way to ensure that something along the lines of
    "release / dealloc" happens, so any resources the object may have
    created or acquired might never be released.  That is why it is
    disallowed in practice.

    As you can see, and the code clearly demonstrates, my original
    assertion stands.

    >
    > (yes, at one time there was an attempt to add support for that, but
    > it didn't survive).

    Yes, the machinery to arrange for stack objects to have a chance to
    "dealloc" when the stack frame pops is non-trivial.  As shown above,
    you can do it, but it's really not a good idea.

    >
    > Structures don't have "magic invisible members":
    >
    > @interface Foo {
    > }
    > @end
    >
    > Foo *aFoo;
    > NSLog(@"Foo is a %@", aFoo->isa);
    >
    > Notice how there is an "isa" member that is automatically put there,
    > not unlike the way that a C++ object might have a vtable (or other
    > internal plumbing for multiple inheritance).

    struct FooDef { @defs(Foo) } *aFoo;
    aFoo->isa;

    You're right, structs don't have magic invisible members.  The @defs()
    directive allows you to "copy" the ivars from the @interface
    declaration.

    It's instructive to look at objc/objc.h, where we find the following:

    typedef struct objc_class *Class;
    typedef struct objc_object {
        Class isa;
    } *id;

    typedef struct objc_selector     *SEL;
    typedef id             (*IMP)(id, SEL, ...);

    As you can see, it's very clear where "isa" comes from.  Subclassing
    a class has the effect of "pasting" your ivar declarations at the end
    of those of the class you're inheriting from.  This is the cause of
    "fragile classes": ivar access effectively becomes pointer + offset,
    and changing the layout requires recompiling code to update those offsets.
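
    A rough way to see the "pointer + offset" problem is with ordinary C
    structs and offsetof.  The struct names and the v1/v2 split below are
    hypothetical, standing in for two releases of a base class and a
    subclass compiled against it.

```c
#include <assert.h>
#include <stddef.h>

/* Version 1 of a hypothetical base "class" and a subclass built on it. */
struct base_v1    { void *isa; int a; };
struct derived_v1 { void *isa; int a; int x; };        /* base ivars, then x */

/* Version 2 of the base adds an ivar, which pushes the subclass's x back. */
struct base_v2    { void *isa; int a; int b; };
struct derived_v2 { void *isa; int a; int b; int x; };

/* Code compiled against v1 has this offset baked in for x. */
static size_t offset_of_x_v1(void) { return offsetof(struct derived_v1, x); }

/* After the base grows, x actually lives at this offset. */
static size_t offset_of_x_v2(void) { return offsetof(struct derived_v2, x); }
```

    Subclass code still using the v1 offset would land on the slot that the
    new base now uses for b, which is the essence of the fragile base class
    problem described above.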

    So, no, there are no magic invisible members.  Furthermore:

    typedef id             (*IMP)(id, SEL, ...);

    is a typedef for a function pointer type.  You're probably familiar
    with it, as the following makes a bit more clear:

    typedef id (*IMP)(id self, SEL _cmd, ...);

    This is where self and _cmd come from.  Nothing hidden.  It's pure
    ANSI-C, with a bit of syntactic sugar that automates common tasks like
    object inheritance, automatic scoping of variables from self, and a
    clever key (selector) / value (pointer to a function) dynamic run time
    system known as "message dispatching."
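
    As a toy model of that key/value idea, here is a sketch in plain C.
    It is deliberately simplified and is not how objc_msgSend is actually
    implemented: selectors are plain strings, the IMP type takes no extra
    arguments and returns int, lookup is a linear scan with no caching or
    superclass search, and every toy_ name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_object *toy_id;
typedef const char *toy_sel;                        /* selector = name here */
typedef int (*toy_imp)(toy_id self, toy_sel _cmd);  /* simplified IMP */

struct toy_method { toy_sel name; toy_imp imp; };
struct toy_class  { const struct toy_method *methods; size_t count; };
struct toy_object { const struct toy_class *isa; int value; };

static int imp_value(toy_id self, toy_sel _cmd)  { (void)_cmd; return self->value; }
static int imp_double(toy_id self, toy_sel _cmd) { (void)_cmd; return self->value * 2; }

static const struct toy_method toy_methods[] = {
    { "value",  imp_value  },
    { "double", imp_double },
};
static const struct toy_class ToyClass = { toy_methods, 2 };

/* Look the selector up in the receiver's class, then call the function
 * with self and _cmd as its first two arguments.  Returns -1 when the
 * selector is not found (a toy convention, not runtime behaviour). */
static int toy_msg_send(toy_id self, toy_sel sel) {
    const struct toy_class *cls = self->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, sel) == 0)
            return cls->methods[i].imp(self, sel);
    }
    return -1;
}
```

    The only thing toy_msg_send needs from the receiver is its isa pointer;
    self and _cmd are simply the first two arguments of an ordinary C
    function.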
  • On Feb 6, 2008 5:06 PM, John Engelhart <john.engelhart...> wrote:
    >
    > Actually, I've thought of another example which addresses the use of
    > (or lack of) __strong unambiguously and still demonstrates the problem:

    No, you're just rehashing and reformulating the same argument over and
    over again. You're not listening.

    >
    > #import <Foundation/Foundation.h>
    >
    > @interface GCTest : NSObject {
    > const char *title;
    > };
    >
    > - (void)setTitle:(const char *)newTitle;
    > - (const char *)title;
    >
    > @end
    >
    > @implementation GCTest
    >
    > - (void)setTitle:(const char *)newTitle
    > {
    > printf("Setting title.  Old title: %p, new title %p = '%s'\n",
    > title, newTitle, newTitle);
    > title = newTitle;

    This is a bad thing to do in pre-GC Cocoa, it is still a bad thing to
    do in post-GC Cocoa. Don't do it. If you want to store the result of
    calling UTF8String, copy it. Period.

    > }
    >
    > - (const char *)title
    > {
    > return title;
    > }
    >
    > @end
    >
    > int main(int argc, char *argv[]) {
    > NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    > GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
    > void *ptr;
    >
    > gcConstTitle = [[GCTest alloc] init];
    > gcUTF8Title = [[GCTest alloc] init];
    >
    > [gcConstTitle setTitle:"Hello, world!"];
    > [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world \xC2\xA1"] UTF8String]];

    At this point, the temporary NSString object has exactly zero roots.
    Why are you surprised that it goes away?

    > NSLog(@"Test: %@", @"hello");
    > [[NSGarbageCollector defaultCollector] collectExhaustively];
    > NSLog(@"GC test");
    >
    > printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title], [gcConstTitle title]);
    > printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title], [gcUTF8Title title]);
    >
    > [gcConstTitle setTitle:NULL];  // Must clear the pointer before popping pool.
    > [gcUTF8Title setTitle:NULL];
    >
    > [pool release];
    > return(0);
    > }

    > The above example is now perfectly legal by everyones definition of
    > how things were under retain/release, and I correctly clear the
    > pointer before it goes out of scope, and demonstrates that the GC
    > system can, and does, reclaim live data out from under you.

    No it is not. Holding on to the result of UTF8String for longer than
    the lifetime of the NSString from which it came is not legal. How hard
    is that to understand?

    --
    Clark S. Cox III
    <clarkcox3...>
  • On Feb 6, 2008, at 7:48 PM, John Engelhart wrote:
    > int main(int argc, char **argv) {
    > NSObject *stackObject = NULL;
    > stackObject = alloca(sizeof(stackObject));

    This, of course, just alloced something the size of a pointer, which
    may not be the real size of the class (if it were something other than
    NSObject)
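
    The pointer-versus-pointee distinction can be checked in plain C.  The
    type big_object and the two helper functions are hypothetical, chosen
    so that the object is clearly larger than a pointer; with NSObject,
    which holds only isa, the two sizes happen to coincide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type that is clearly larger than a pointer. */
struct big_object { void *isa; char payload[64]; };

/* What alloca(sizeof(ptr)) would reserve: the size of the pointer itself. */
static size_t bytes_if_sizeof_pointer(const struct big_object *obj) {
    return sizeof(obj);
}

/* What was intended: the size of the thing pointed to.
 * sizeof does not evaluate its operand, so a NULL argument is harmless. */
static size_t bytes_if_sizeof_pointee(const struct big_object *obj) {
    return sizeof(*obj);
}
```

    Writing the allocation as sizeof(*ptr), or naming the type directly,
    avoids the mistake.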
    >
    > memset(stackObject, 0, sizeof(NSObject));
    > stackObject->isa = [NSObject class];
    >
    > NSLog(@"stackObject: %@", stackObject);
    > }
    > [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    > tst.m: In function 'main':
    > tst.m:7: warning: instance variable 'isa' is @protected; this will
    > be a hard error in the future
    > [<johne...>] /tmp% ./tst
    > 2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject:
    > 0xbffff380>
    >
    > As the address clearly shows, this is an object on the stack.
    > Although I have had to manually initialize the object, it is
    > exactly, or very close to, what "NSObject stackObject" would have
    > created.
    >
    >

    The biggest problem is that the above example shows something that
    isn't usable - it's sterile.  You can't pass it to other routines (due
    to memory management requirements), and you may not even be able to
    call all the methods of the object (since they may pass themselves as
    parameters to other objects, which invokes memory management
    requirements).  Only if you wrote your own entire hierarchies could
    you use such a construction (or if you implemented full closures,
    which is even more difficult in a C-based language).  There's a whole
    raft of semantics associated with Cocoa objects (above and beyond
    anything that whatever version of the Objective-C runtime may require).

    If you can't use the object, have you actually created it?

    >
    > As you can see, and the code clearly demonstrates, my original
    > assertion stands.
    >

    One could also manually construct a C++ vtable and set up all the
    magic "behind the scenes" plumbing that a C++ object has, but that
    doesn't mean that C++ objects are ' synonymous for "Structs" ' either.

    The point is that they are not synonymous for structs - if they were,
    you could replace one with the other, and both Objective-C and C++
    objects have additional semantic requirements.

    >
    > As you can see, it's very clear where "isa" comes from.  Subclassing
    > an object has the effect of "pasting" your ivar declarations at the
    > end of the class you're inheriting from, and forms the cause of
    > "fragile classes" since a struct effectively becomes pointer +
    > offset, and changing a struct requires recompiling code to update
    > those offsets.

    Except, of course, for 64 bit Objective-C 2.0 which doesn't have these
    problems (which makes the exercise of trying to allocate it on the
    stack even more problematic).

    >
    >
    > So, no, there are no magic invisible members.

    But there are semantically required members that aren't in "plain"
    structs.

    Glenn Andreas                      <gandreas...>
      <http://www.gandreas.com/> wicked fun!
    quadrium | flame : flame fractals & strange attractors : build,
    mutate, evolve, animate
  • On Feb 6, 2008, at 8:06 PM, John Engelhart wrote:
    > --snip--
    > Actually, I've thought of another example which addresses the use of
    > (or lack of) __strong unambiguously and still demonstrates the
    > problem:
    >
    > #import <Foundation/Foundation.h>
    >
    > @interface GCTest : NSObject {
    > const char *title;
    > };
    >
    > - (void)setTitle:(const char *)newTitle;
    > - (const char *)title;
    >
    > @end
    >
    > @implementation GCTest
    >
    > - (void)setTitle:(const char *)newTitle
    > {
    > printf("Setting title.  Old title: %p, new title %p = '%s'\n",
    > title, newTitle, newTitle);
    > title = newTitle;
    > }
    >
    > - (const char *)title
    > {
    > return title;
    > }
    >
    > @end
    >
    > int main(int argc, char *argv[]) {
    > NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    > GCTest *gcConstTitle = NULL, *gcUTF8Title = NULL;
    > void *ptr;
    >
    > gcConstTitle = [[GCTest alloc] init];
    > gcUTF8Title = [[GCTest alloc] init];
    >
    > [gcConstTitle setTitle:"Hello, world!"];
    > [gcUTF8Title setTitle:[[NSString stringWithUTF8String:"Hello, world
    > \xC2\xA1"] UTF8String]];
    >
    > NSLog(@"Test: %@", @"hello");
    > [[NSGarbageCollector defaultCollector] collectExhaustively];
    > NSLog(@"GC test");
    >
    > printf("gcConstTitle title: %p = '%s'\n", [gcConstTitle title],
    > [gcConstTitle title]);
    > printf("gcUTF8Title  title: %p = '%s'\n", [gcUTF8Title title],
    > [gcUTF8Title title]);
    >
    > [gcConstTitle setTitle:NULL];  // Must clear the pointer before
    > popping pool.
    > [gcUTF8Title setTitle:NULL];
    >
    > [pool release];
    > return(0);
    > }
    > [<johne...>] /tmp% gcc -framework Foundation -fobjc-gc-only -
    > o gc -g gc.m
    > [<johne...>] /tmp% ./gc
    > Setting title.  Old title: 0x0, new title 0x1ea4 = 'Hello, world!'
    > Setting title.  Old title: 0x0, new title 0x1011860 = 'Hello, world¡'
    > 2008-02-06 18:32:35.712 gc[18108:807] Test: hello
    > 2008-02-06 18:32:35.798 gc[18108:807] GC test
    > gcConstTitle title: 0x1ea4 = 'Hello, world!'
    > gcUTF8Title  title: 0x1011860 = 'Hello, world'
    > Setting title.  Old title: 0x1ea4, new title 0x0 = '(null)'
    > Setting title.  Old title: 0x1011860, new title 0x0 = '(null)'
    >
    > Oddly, I had to add a second NSLog() in order to get some kind of
    > lossage, but I think it's fair to chalk this up to the semi-random
    > nature of allocations.
    >
    > The above example is now perfectly legal by everyones definition of
    > how things were under retain/release, and I correctly clear the
    > pointer before it goes out of scope, and demonstrates that the GC
    > system can, and does, reclaim live data out from under you.
    >
    --snip--

    "
    UTF8String
    Returns a null-terminated UTF8 representation of the receiver.

    - (const char *)UTF8String

    "

    Direct from Apple's docs.  You seriously need to go back to basics if
    you don't understand how screwed up your logic here is!
  • On Feb 6, 2008, at 7:01 PM, glenn andreas wrote:
    >
    > One could also manually construct a C++ vtable and set up all the
    > magic "behind the scenes" plumbing that a C++ object has, but that
    > doesn't mean that C++ object are ' synonymous for "Structs" ' either.

    Well, in C++ you can say that class is a synonym for struct (aside
    from the fairly minor issue of default visibility). Though, I think
    the point stands that in Objective-C they are not. Just because you
    can pound a square peg into a round hole doesn't mean you have a round
    peg.

    --Brady
  • On Feb 6, 2008, at 10:01 PM, glenn andreas wrote:

    >
    > On Feb 6, 2008, at 7:48 PM, John Engelhart wrote:
    >> int main(int argc, char **argv) {
    >> NSObject *stackObject = NULL;
    >> stackObject = alloca(sizeof(stackObject));
    >
    > This, of course, just alloced something the size of a pointer, which
    > may not be the real size of the class (if it were something other
    > than NSObject)

    Oops, you're right.  I clearly meant NSObject, as the memset line
    below shows.  You'll note that changing the alloca argument from
    sizeof(stackObject) to sizeof(NSObject) works just fine.

    >
    >>
    >> memset(stackObject, 0, sizeof(NSObject));
    >> stackObject->isa = [NSObject class];
    >>
    >> NSLog(@"stackObject: %@", stackObject);
    >> }
    >> [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    >> tst.m: In function 'main':
    >> tst.m:7: warning: instance variable 'isa' is @protected; this will
    >> be a hard error in the future
    >> [<johne...>] /tmp% ./tst
    >> 2008-02-06 20:17:16.306 tst[18320:807] stackObject: <NSObject:
    >> 0xbffff380>
    >>
    >> As the address clearly shows, this is an object on the stack.
    >> Although I have had to manually initialize the object, it is
    >> exactly, or very close to, what "NSObject stackObject" would have
    >> created.
    >>
    >>
    >
    > The biggest problem is that the above example shows something that
    > isn't usable - it's sterile.  You can't pass it to other routines
    > (due to memory management requirements), you may not even be able to
    > call all the methods of the object (since they may pass the
    > themselves as parameters to other objects  which invoke memory
    > management requirements).  Only if you wrote your own entire
    > hierarchies could you use such a construction (or if you implemented
    > full closures, which is even more difficult in a C based
    > language) .  There's a whole raft of semantics associated with Cocoa
    > objects (above and beyond anything that whatever version of the
    > objective-c runtime may require).

    I'm pretty sure I was clear that this is "not really a good idea in
    reality."  It is, contrary to what you stated, possible to do.

    >
    > If you can't use the object, have you actually created it?

    You can use the object.  It remains live until the stack frame pops.
    It will work exactly like any other object.  The "deallocation" of the
    object is tricky, but I'm sure you could still pull it off if you
    really wanted.

    >>
    >> As you can see, and the code clearly demonstrates, my original
    >> assertion stands.
    >>
    >
    > One could also manually construct a C++ vtable and set up all the
    > magic "behind the scenes" plumbing that a C++ object has, but that
    > doesn't mean that C++ object are ' synonymous for "Structs" ' either.
    >
    > The point is that they are not synonymous for structs - if they
    > were, you could replace one with the other, and both Objective-C and
    > C++ objects have additional semantic requirements.

    You can substitute one for the other, see above.  The GCC compiler stops
    you from doing this because it's really not a good idea in practice.
    There are other Objective-C implementations that allow stack based
    objects exactly as you describe.  I can't remember which one it was
    off the top of my head; it might have been poc.

    This is a literal "NSObject stackObject;", if it helps any:

    #import <Foundation/Foundation.h>

    int main(int argc, char **argv) {
      struct { @defs(NSObject) } stackObject;
      memset(&stackObject, 0, sizeof(NSObject));
      stackObject.isa = [NSObject class];

      NSLog(@"stackObject: %@", &stackObject);
    }
    [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    [<johne...>] /tmp% ./tst
    2008-02-07 00:04:20.477 tst[18686:807] stackObject: <NSObject:
    0xbffff398>
    [<johne...>] /tmp%

    I suppose I could go through all the trouble of putting together
    initialization and deallocation methods as part of a subclass, or
    category override, that specifically dealt with stack "allocation" and
    "deallocation".  I could even get crafty and have dealloc check to see
    if self is a pointer that's on the stack and call the stack dealloc
    code, and the standard dealloc code otherwise.

    Honestly though, I'm not sure how much more plain I can make it.  You
    said:
    >
    >>
    >> You misunderstand what Objective C is, and how it works.  "Objects"
    >> is synonymous for "Structs".
    >
    >
    >
    > If that were true, you'd be able to declare objects as local
    > variables (as opposed to as pointers to structures):
    >
    > NSPoint aPoint; // <-- NSPoint = struct, legal
    > NSString aString; // <-- NSString = object, Illegal

    Yet, as the code above plainly shows, "struct { @defs(NSObject) }
    stackObject;" declares an object as a local variable.  It is even
    passed to NSLog() to print its description, proving that it's a
    bona fide, usable object, and the keyword "struct" makes it obvious
    that an "object" is just a "struct".

    How about "Objects" is synonymous for "struct objc_object"s?  And
    since a pointer to struct objc_object is typedef'd to "id", that makes
    it pretty literal.

    Does this help at all?

    #import <Foundation/Foundation.h>

    typedef struct { @defs(NSObject) } NSObject_;

    int main(int argc, char **argv) {
      NSObject_ stackObject;
      memset(&stackObject, 0, sizeof(NSObject));
      stackObject.isa = [NSObject class];

      NSLog(@"stackObject: %@", &stackObject);
    }
    [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    [<johne...>] /tmp% ./tst
    2008-02-07 00:28:47.784 tst[18734:807] stackObject: <NSObject:
    0xbffff398>

    I've had to add an underscore to prevent namespace collision, but....
    I'm not really sure how else I can explain it.

    >
    >>
    >> As you can see, it's very clear where "isa" comes from.
    >> Subclassing an object has the effect of "pasting" your ivar
    >> declarations at the end of the class you're inheriting from, and
    >> forms the cause of "fragile classes" since a struct effectively
    >> becomes pointer + offset, and changing a struct requires
    >> recompiling code to update those offsets.
    >
    > Except, of course, for 64 bit Objective-C 2.0 which doesn't have
    > these problems (which makes the exercise of trying to allocate it on
    > the stack even more problematic).

    Yes, as I have noted, the ObjC 2.0 64 bit ABI/API is different.  This
    is dealing with the 32 bit version, which has been around since the
    80's.

    >> So, no, there are no magic invisible members.
    >
    > But there are semantically required members that aren't in "plain"
    > structs.

    Honestly, I don't follow.  What is a "plain" struct?

    struct objc_object {
      void *isa;
    };

    Is that not a "plain" struct?  And what do you mean by "semantically
    required members"?

    #import <Foundation/Foundation.h>

    typedef struct { char letters[4]; } NSObject_;

    int main(int argc, char **argv) {
      NSObject_ stackObject;
      memset(&stackObject, 0, sizeof(NSObject_));
      void *classPtr = [NSObject class];
      memcpy(stackObject.letters, &classPtr, sizeof(void *));

      NSLog(@"stackObject: %@", &stackObject);
    }
    [<johne...>] /tmp% gcc -framework Foundation -o tst tst.m
    [<johne...>] /tmp% ./tst
    2008-02-07 00:37:57.632 tst[18799:807] stackObject: <NSObject:
    0xbffff398>

    Sure, a bit of square peg into a round hole abuse to get the class
    ptr copied over.... but.. No isa here, yet it works.

    I suppose pedantically one could argue that in order for something to
    be an "object", it would have to have the layout of a particular kind
    of struct... but I think that's pushing things a little far, and in
    the end, whatever the layout, that layout is declared as a struct.
  • On Feb 6, 2008, at 8:48 PM, Chris Suter wrote:

    >
    > On 07/02/2008, at 12:06 PM, John Engelhart wrote:
    >
    >> However, consider for a moment if it was a type attribute, and
    >> followed type attribute rules, and how this would affect the
    >> examples cited.  Off the top of my head, I think treating it as a
    >> type attribute would have prevented every single error I've pointed
    >> out.  Hypothetically, consider if UTF8String propagated the
    >> __strong type, and the assignment of its pointer to the ivar 'const
    >> char *'.
    >>
    >> The compiler would fail to compile the code, and generate an error.
    >
    > I don't think it should be a type qualifier. It would mean that you
    > wouldn't be able to do things like:
    >
    > puts ([myString UTF8String])
    >
    > without getting a compiler warning.

    This is a pretty debatable point, with pros and cons on each side.  My
    opinion is that you should not be handing pointers which require write
    barriers for proper operation of the GC system to code that is not
    compiled with the proper support.  One could argue that the decision
    to require that "all frameworks must be GC compiled/capable" makes
    this policy a requirement.

    Since the GC system considers everything on the stack to be a live
    pointer, this has the effect of catching 99.99% of the uses of
    pointers in this fashion.  While I do not have a specific example to
    show here, I think you'll agree that there are occasionally times
    when calling a C library function will violate these principles.
    Realistically, when you call a function, that pointer vanishes down a
    call stack that is surpassingly complex, and that pointer has a
    tendency to visit places you would not think apply in a particular
    case.  If you've
    used Shark.app, I'm sure you've seen some functions which end up
    creating some surprisingly deep call-stacks that in turn are calling
    all sorts of seemingly unrelated functions.

    I don't argue that it covers 99.99% of the cases.  It's that last tiny
    bit that I'm writing about.  I'm sure you'll agree that tracking down
    errors of this nature is, politely, frustrating.

    There's an example someone once used to underscore how optimistic we
    can be and spectacularly misjudge the likelihood of these rare errors
    occurring.  It comes from the canonical example of multi-threaded/
    multi-cpu programming:

    x++;

    I'll skip over the specifics, but the question is asked "How likely do
    you think the condition is for two threads to get into a race
    condition and incorrectly update 'x'?  A million to one?"

    While million-to-one odds sound remote, at two gigahertz that's
    roughly 2000 times per second.  That's a mean time to failure of
    roughly 500 microseconds.
    >
    >> Oddly, I had to add a second NSLog() in order to get some kind of
    >> lossage, but I think it's fair to chalk this up to the semi-random
    >> nature of allocations.
    >
    > I think that would have been because the pointer returned by
    > UTF8String was still on the stack or in a register.

    Hard to say.  I think it does illustrate another point: the seemingly
    random nature in which you will get bitten by these kinds of bugs.

    >
    >> The above example is now perfectly legal by everyone's definition of
    >> how things were under retain/release, and I correctly clear the
    >> pointer before it goes out of scope, and demonstrates that the GC
    >> system can, and does, reclaim live data out from under you.
    >
    > I think your example is contrived. Whilst it's legal in the retain/
    > release world you wouldn't ever write anything like that.

    Granted, my example is contrived, no two ways about it.  But is that
    not the point?  To create a compact example which replicates the
    problem?  I believe I have done all that is required: demonstrate that
    it is possible.  Once I've done that, the sheer volume of code under
    consideration essentially guarantees that this is taking place.  My
    practical, hands-on experience suggests this (and by this, I don't
    mean this particular example per se, but the ease with which it's
    possible to get some of these subtle points wrong) is happening far
    more frequently than you would think.

    >
    > The solution to all of this, as has already been stated, is to
    > understand the contract that UTF8String promises and to make your
    > own arrangements if you want to hang on to the value.

    You're absolutely right.  But my point is that, in practice, this is
    not quite as clear cut as it seems.

    Allow me to back way, way, way up.  For the purposes of this argument,
    let us not consider all the technical points that have been discussed
    so far, as it's easy to get lost arguing pedantic, nuanced details.
    Let's consider the GC system from a purely pragmatic point of view.

    Now, the precise specifics notwithstanding, you will at some time get
    some small detail wrong.  You will have created a bug with regards to
    some GC detail.  The effect of this bug, which I think everyone will
    reasonably agree on, is likely to result in the collector reclaiming
    memory that you have in use when you clearly didn't want it to.

    The hows and whys of your bug aren't really important, but you have
    done something wrong when you shouldn't have.  These things happen.

    My experience with these bugs has been that they consume an
    EXTRAORDINARY amount of time to track down.  Because of the semi-
    random nature that these bugs manifest themselves in, I have found
    that it's virtually impossible to find a solid set of conditions to
    tickle the condition.  I have found that unit tests are worthless in
    trying to track down these problems.  The complex interactions
    required to tickle these bugs are essentially impossible to create
    with unit tests.  All unit tests will pass, flawlessly, and in fact it
    sometimes may not be possible to recreate the right conditions to
    trigger the bug because essentially all your variables are sitting on
    the stack in the scope of the unit test.

    The pragmatic effect of using Leopard's GC system has been a MASSIVE
    increase in the amount of time I have spent debugging problems that
    have all the symptoms of "race condition" bugs, and consequently the
    huge uptick in effort required to find and eliminate these bugs.

    This is my warning to you (you being all of the list, or archive
    reader).  Setting aside all the technical points discussed, you will
    eventually create a bug that in retrospect, you clearly shouldn't
    have.  The nature of these bugs means that it will take tremendous
    effort and time to track down and correct.  My experiences with the GC
    system have seen the time I spend debugging explode, and in fact
    dominate the time I spend developing.  What's worse is that these bugs
    rarely manifest themselves during development and are nearly
    impossible to catch with unit tests.  This means that the reliability
    of "shipped code" goes right through the floor, and replicating the
    bug often requires considerable interaction with an outside party and
    all the difficulties that that entails.
  • On Feb 6, 2008 10:52 PM, Ben Trumbull <trumbull...> wrote:
    >
    > You cannot message a void* without type casting.  The compiler won't
    > let you.  The compiler treats objects differently.  Further, there is
    > no way for you to accept an arbitrary void* parameter and prove at
    > runtime it is in fact capable of dispatching messages (i.e. has an
    > isa at offset 0).  Yet I always know I can message an id.

    More nitpickery here....

    You can certainly message a void * without type casting. The compiler emits
    a warning, but warnings are not errors. I would not be surprised if this
    were not the case in ObjC++; I only tried plain ObjC.

    And you certainly cannot know that you can message an id. Aside from obvious
    pathological cases such as (id)42, consider the extremely common problems we
    see on this list all the time where people mess up their memory management
    and end up with id variables which crash when messaged.

    It's often convenient to ignore these facts and act as though ObjC is a safe
    OO language, but the fact is that it is C and it allows all the dumb (and
    powerful) stuff that C allows.

    > For any arbitrary id, I can invoke -class without crashing.

    This is untrue. You can only invoke -class safely if the target implements
    it. Anything which inherits from NSObject will, but it is trivial to create
    a root class which doesn't respond to -class. And there's no reasonable way
    to find out if it responds to -class or not, because -respondsToSelector: is
    *also* not guaranteed to exist! The ObjC runtime functions can try to tell
    you, but they can give a false negative in the case of classes which handle
    these messages via forwarding. Once again, it's convenient to act as though
    all ObjC objects conform to the NSObject protocol, but it is not actually
    enforced.

    Mike
  • On Feb 6, 2008, at 1:29 PM, Michael Tsai wrote:

    > On Feb 6, 2008, at 12:46 PM, Alastair Houghton wrote:
    >
    >>> I don't think this is a bug. The NSString and CFString APIs do not
    >>> indicate that they treat the bytes as scanned memory.
    >>
    >> That's true, but it doesn't matter whether they treat the bytes as
    >> scanned memory or not; that would only change whether putting
    >> pointer data in the bytes was safe.  The problem is whether the
    >> pointer itself is being traced, which isn't happening right now
    >
    > Sorry, that's what I meant.
    >
    >> the docs *do* say (in the Garbage Collection Programming Guide)
    >> that NULL, kCFAllocatorDefault and kCFAllocatorSystemDefault cause
    >> objects to be allocated in the GC zone, so I don't think it's
    >> unreasonable to expect that the pointer will be traced.
    >
    > The string was allocated using kCFAllocatorDefault, but the
    > deallocator was specified as kCFAllocatorNull. The docs say:
    >
    > "If the buffer does not need to be deallocated, or if you want to
    > assume responsibility for deallocating the buffer (and not have the
    > CFString object deallocate it), pass kCFAllocatorNull."
    >
    > If CFString is not going to be responsible for deallocating the
    > buffer, then it would not make sense to rely on CFString keeping the
    > buffer alive for you.

    Just a quick note... my recollection is that using anything but
    kCFAllocatorNull results in a double free().  I might be
    misremembering, as it was a while ago, but I'm pretty sure that this
    was the bit of code that did it.  A quick peek at the source code base
    shows this is only one of a handful of places where
    NSAllocateCollectable is used, so chances are this is it.  I think you
    also need to crank up the malloc environment debugging to catch it.
  • On Feb 6, 2008, at 10:12 AM, Michael Tsai wrote:

    > On Feb 6, 2008, at 4:39 AM, John Engelhart wrote:
    >
    >> Read the above, "object" is synonymous for "struct".  The "layout"
    >> of an object is identical to the "layout" of a struct.
    >
    >
    > That is true but irrelevant. What matters for garbage collection is
    > whether the variables are typed as objects at compile time, because
    > that's what determines what code the compiler emits for assignments.

    A nitpick, but it is more correct to say "whether the variables are
    typed as __strong at compile time".  Subtle, but important, especially
    when you consider the following from objc/objc.h:

    typedef struct objc_object {
        Class isa;
    } *id;

    Despite any prose definition of what an object is, this is the
    definition of an object to the compiler.  I will note that the
    compiler's definition of an object is a) a struct, and b) not
    __strong.  Therefore, in an overly pedantically strict sense, your
    statement is wrong, but I'm fairly sure you would have picked __strong
    in place of objects as that is the effect you were trying to
    communicate.

    Point B is especially interesting, and to fully appreciate its
    consequences this is going to turn into an ugly discussion of the
    intermediate GIMPLE code.  When you consider the points I've raised
    from this perspective, and what's really going on under the hood, many
    of my arguments will snap into focus.  Needless to say, we rarely
    examine the assembly code emitted by the compiler, and because in
    practice the effect of a write barrier assignment and a non-write
    barrier assignment during execution is identical, you would have no
    reason to suspect anything is wrong.

    One of my original claims is that, despite the prose definition of how
    the GC system works and people's beliefs, the compiler is magically
    transforming some pointers into __strong behind your back by some
    unspecified logic, as the definition of id shows. 'id' is the codified
    definition of an object in fact, and the fact that __strong is not
    part of its definition means that the prose definition of "objects are
    __strong" is misleading at best, but wrong in a strict sense. And
    since one pointer looks pretty much like any other pointer in the guts
    of the compiler, the likelihood that this magical promotion is being
    applied correctly to the appropriate pointers is pretty slim.  Since
    __strong is not being treated as a type qualifier per ANSI-C rules,
    there is nothing in place to catch inadvertent "down promotions" when
    they happen.  The true performance impact of the GC system has
    probably also been grossly understated, as wrapping a write to memory
    in a function call essentially obliterates any hope of hoisting values
    into registers during optimization, along with the attendant
    performance-robbing spills.

    Another claim I have made is that "treating everything on the stack as
    __strong is not the right thing, it is only going to mask the
    problem."  One could argue that the choice of treating everything on
    the stack as __strong is proof of how often the compiler is getting
    this automagic promotion wrong, or otherwise dropping the __strong
    qualification during some code transformation.  My hypothesis is that
    if write barriers were being generated correctly, there would be no
    need to treat the stack as special, other than the fact of "how much"
    of the stack to be considered live relative to the top frame.

    There has also been some discussion regarding proper use of memory
    allocated by NSAllocateCollectable, such as its use with CFString
    creation methods.  This highlights another point of mine.  Despite the
    (sometimes foaming at the mouth) assertions that the GC rules are
    "easy", what is transpiring is reasonable people are having
    differences of opinion regarding the use of GC allocated memory.  They
    are even perfectly valid, totally reasonable interpretations.  This
    highlights that in practice, these "simple rules" are in fact
    deceptively complex and trivially easy to get wrong.  Understand that
    it doesn't really matter who is right in the argument, the fact that
    there is reasonable debate about something which should be completely
    unambiguous and crystal clear demonstrates that in practice, there's
    probably one correct way, and many wrong ways, and chances are you are
    going to get it wrong.  From experience, being told that "you
    obviously should have used __strong, DUH" after four days of intense
    debugging is going to ring hollow.  You should be cautious of objects
    near your vicinity spontaneously levitating and taking flight.  Or, as
    a courtesy, at least hang a sign on the door.
  • Ben Trumbull wrote:
    > John Engelhart wrote:
    >> - (id *)copyOfObjectArray:(id *)originalObjects length:
    >> (NSUInteger)length
    >> {
    >> id *newObjectArray = NULL;
    >> newObjectArray = NSAllocateCollectable(sizeof(id) * length,
    >> NSScannedOption);
    >> memcpy(newObjectArray, originalObjects, sizeof(id) * length);
    >> return(newObjectArray);
    >> }
    > This does not work. Pushing GC'd objects through memcpy, a system
    > call that can't know anything about Objective-C Garbage Collection,
    > seems unwise.

    Correct. Don't manipulate memory that might have GC pointers in it
    without using a GC-aware function.

    > Nonetheless, that also should be better documented, and a bug report
    > for a public GC compatible memory copy API would be good.

    The GC-aware memcpy() is in <objc/objc-auto.h>

        void *objc_memmove_collectable(void *dst, const void *src,
                                       size_t size);

    There are also GC-aware versions of OSAtomicCompareAndSwapPtr():

        BOOL objc_atomicCompareAndSwapGlobal(id predicate, id replacement,
            volatile id *objectLocation);
        BOOL objc_atomicCompareAndSwapGlobalBarrier(id predicate,
            id replacement, volatile id *objectLocation);
        // atomic update of an instance variable
        BOOL objc_atomicCompareAndSwapInstanceVariable(id predicate,
            id replacement, volatile id *objectLocation);
        BOOL objc_atomicCompareAndSwapInstanceVariableBarrier(id predicate,
            id replacement, volatile id *objectLocation);

    These are typed as id, but they work equally well with any pointer.

    >> Anyone who's used garbage collection with C is probably familiar
    >> with the Boehm Garbage Collector. [...] It makes no particular
    >> demands of the programmer or compiler, in fact it can be used as a
    >> drop in replacement for malloc() and free(), requiring no changes.

    Of course, that only works if malloc() is replaced everywhere in the
    system, which is impractical in a dynamic shared library environment.
    Storing a Boehm-managed pointer in a block allocated from non-Boehm
    malloc() or a non-default malloc zone would cause just as much grief
    as storing a Leopard-GC-managed pointer in a block allocated with
    malloc().

    The designer of a GC system always has to draw a line and say "if you
    cross this line, you have to start thinking about memory management
    again".

    Java: memory management is easy, until you start working with non-Java
    code via JNI. Benefits: the JVM can use sophisticated GC techniques
    because of its tight control. Drawbacks: working with non-Java code is
    very hard.

    Boehm: memory management is easy, until you call mmap() or start
    working with code in shared libraries. Benefits: most ordinary C code
    works. Drawbacks: GC algorithms invented after 1970 or so are
    impractical.

    Objective-C: memory management is easy, until you want to use it with
    blocks that aren't Objective-C objects. Benefits: most ordinary
    Objective-C code works. Drawbacks: C code is harder than Boehm; GC
    flexibility is less than Java.

    --
    Greg Parker    <gparker...>    Runtime Wrangler
  • On Feb 7, 2008, at 11:23 PM, John Engelhart wrote:

    > Another claim I have made is that "treating everything on the stack
    > as __strong is not the right thing, it is only going to mask the
    > problem."  One could argue that the choice of treating everything
    > on the stack as __strong is proof of how often the compiler is
    > getting this automagic promotion wrong, or otherwise dropping the
    > __strong qualification during some code transformation.  My
    > hypothesis is that if write barriers were being generated
    > correctly, there would be no need to treat the stack as special,
    > other than the fact of "how much" of the stack to be considered
    > live relative to the top frame.

    Treating everything on the stack as __strong makes things simpler for
    the programmer; you don't have to worry about what the compiler is or
    isn't checking for you. Secondly, as you noted above, it's relatively
    expensive to turn assignments into function calls. It would be a
    waste of time to do this for variables within short-lived stack frames.

    > Despite the (sometimes foaming at the mouth) assertions that the GC
    > rules are "easy", what is transpiring is reasonable people are
    > having differences of opinion regarding the use of GC allocated
    > memory.

    Perhaps we can agree that the rules for objects, the most common
    case, are easy.

    --Michael
  • On Feb 8, 2008 4:23 AM, John Engelhart <john.engelhart...> wrote:

    > 'id' is the codified
    > definition of an object in fact

    Here we go again.

    Perhaps your confusion between "an address in memory" and "the
    contents of memory from that address" is at the heart of the problem?

    The following code "proves" that objects and character arrays are "equivalent":

    #import <Foundation/Foundation.h>
    int main(int argc, char **argv) {
        char string[] = "test";
        id x = [NSObject class];
        memcpy(string, &x, 4);
        NSLog(@"object: %@", string);
    }

    $ gcc -framework Foundation -o test test.m
    $ ./test
    2008-02-08 17:33:21.514 test[1380:10b] object: <NSObject: 0xbffffa8b>
    $

    To put it a different way: memory is memory is memory. Data types give
    us structured access to that memory. Objective-C objects are not
    __strong, but ids (a special type of pointer to those objects) are
    __strong. It's that simple. Take it or leave it, but don't run around
    shouting that the sky is falling.

    > Since
    > __strong is not being treated as a type qualifier per ANSI-C rules,
    > there is nothing in place to catch inadvertent "down promotions" when
    > they happen.

    You want to have the compiler warn you whenever you create a weak
    reference. Fine: file an enhancement request. The sky still isn't
    falling!

    Hamish
  • This discussion has gotten a bit far from involving Cocoa directly.

    Please move it to the Objective-C mailing list (<objc-language...>).
