How to debug a corrupted stack

  • I have a document-based app which works perfectly with -O0 or -O1 but
    crashes with -O2 or higher.

    When the crash occurs the debugger comes up and says: "Previous frame
    identical to this frame (corrupt stack?)"

    When I try to step through the function (which is kind of difficult,
    as the optimization has shuffled the lines a lot), at some point the
    top frame of the stack gets duplicated.

    The faulty method starts with:
    NSString *path = @"/Users/gerriet/Desktop/some alias"; // error with -O2

    If it starts with:
    NSString *path = @"/Users/gerriet/Desktop/some file"; // ok with -O2
    then everything works perfectly.

    When I comment out the place where the error seems to occur, it just
    occurs at some earlier place.

    So it is kind of difficult to see where and why the stack gets
    corrupted.

    Any help would be most welcome. I have completely run out of ideas and
    have already spent hours on this bug.

    Tiger 10.4.11, C Language Dialect C99 or GNU99, powerpc-apple-darwin8-gcc-4.0.1

    Kind regards,

    Gerriet.
  • On Aug 5, 2008, at 9:51 PM, Gerriet M. Denkmann wrote:

    > I have a document-based app which works perfectly with -O0 or -O1
    > but crashes with -O2 or higher.
    >
    > When the crash occurs the debugger comes up and says: "Previous
    > frame identical to this frame (corrupt stack?)"
    >
    > When I try to step through the function (which is kind of difficult,
    > as the optimization has shuffled the lines a lot), at some point the
    > top frame of the stack gets duplicated.
    >
    > The faulty method starts with:
    > NSString *path = @"/Users/gerriet/Desktop/some alias";    //    error with -O2
    >
    > If it starts with:
    > NSString *path = @"/Users/gerriet/Desktop/some file";    //    ok with -O2
    > then everything works perfectly.
    >
    > When I comment out the place where the error seems to occur, it just
    > occurs at some earlier place.
    >
    > So it is kind of difficult to see where and why the stack gets
    > corrupted.
    >
    > Any help would be most welcome. I have completely run out of ideas
    > and have already spent hours on this bug.

    You don't say what kind of crash it is. EXC_BAD_ACCESS?

    One thing to try is running the program with MallocDebug.  When the
    crash happens, use malloc_history to learn the history of the address
    involved in the crash.

    Another thing to try is enabling zombies (set the environment variable
    NSZombieEnabled to YES).

    You might also try turning on additional compilation warnings.  When
    optimization is turned on, the compiler can check more things because
    it's doing data-flow analysis.  It might find your error.
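    To make the warnings suggestion concrete, here is a minimal C sketch
    (equally valid Objective-C) of the kind of bug that data-flow analysis
    can catch: a variable assigned on only some paths. The function names
    are hypothetical, and the exact warning text depends on the gcc
    version; older gcc releases only issue -Wuninitialized warnings when
    optimization is enabled.

```c
#include <stddef.h>

/* Hypothetical buggy variant, kept in a comment so this file compiles
   cleanly. Built with something like "gcc -O2 -Wall", the compiler's
   data-flow analysis can warn that `count` may be used uninitialized,
   because nothing assigns it when `path` is NULL:

       static int count_slashes_buggy(const char *path)
       {
           int count;
           if (path != NULL)
               for (count = 0; *path != '\0'; path++)
                   if (*path == '/') count++;
           return count;
       }
*/

/* Fixed version: `count` is initialized on every path. */
static int count_slashes(const char *path)
{
    int count = 0;
    if (path != NULL)
        for (; *path != '\0'; path++)
            if (*path == '/')
                count++;
    return count;
}
```

    The buggy variant returns whatever happens to be in a register or stack
    slot when path is NULL, which is exactly the kind of value that changes
    when the optimization level or surrounding code changes.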

    Finally, the last refuge of all programmers is to liberally sprinkle
    printfs/NSLogs throughout the suspect code to see what's going on.
    Similarly, you can put a breakpoint at some point you're sure is
    before the problem starts and then step slowly through the code.

    Good luck,
    Ken
  • On 6 Aug 2008, at 11:14, Ken Thomases wrote:

    > On Aug 5, 2008, at 9:51 PM, Gerriet M. Denkmann wrote:
    >
    >> I have a document-based app which works perfectly with -O0 or -O1
    >> but crashes with -O2 or higher.
    >>
    >> When the crash occurs the debugger comes up and says: "Previous
    >> frame identical to this frame (corrupt stack?)"
    >>
    >> When I try to step through the function (which is kind of
    >> difficult, as the optimization has shuffled the lines a lot), at
    >> some point the top frame of the stack gets duplicated.
    >>
    >>
    >> Any help would be most welcome. I have completely run out of ideas
    >> and have already spent hours on this bug.
    >
    > You don't say what kind of crash it is. EXC_BAD_ACCESS?

    I didn't say because nothing told me, least of all gdb. But I ran the
    program outside of Xcode and got a crash report with:

    Exception:  EXC_BAD_ACCESS (0x0001)
    Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000000

    The crash has nothing to do with aliases (they just caused different
    paths through the code).
    The program crashes when both -O2 (or higher) and
    Generate Position-Dependent Code are set.

    Here r20 = 0x90ec (good):
    0x00002b04  <+0464>  lwz    r4,0(r21)
    0x00002b08  <+0468>  mr     r5,r29
    0x00002b0c  <+0472>  mr     r3,r24
    0x00002b10  <+0476>  bla    0xfffeff00 <objc_msgSend_rtp>   = [ a addChild: b ]
    Now r20 = 0 (bad):
    ...
    0x00002b4c  <+0536>  lwz    r4,0(r20)    <---- crash here, because r20 = 0
    0x00002b50  <+0540>  mr     r3,r29
    0x00002b54  <+0544>  bla    0xfffeff00 <objc_msgSend_rtp>   = [ b release ]

    If someone wants to check whether it really is a compiler bug (and
    not just some stupidity on my side), I can send the whole project.
    This is on 10.4.11; not tested on 10.5.

    This was difficult to debug, because -O2 keeps most variables in
    registers, so whenever I did "po a" I got the answer:
    No symbol "a" in current context.
    The two identical stack frames may be a gdb bug (the crash report
    had a normal stack trace).

    Anyway, I switched off Generate Position-Dependent Code ("Faster
    function calls for applications") and all is fine again (after a day
    lost fighting with the compiler).

    Kind regards,

    Gerriet.
  • On Tue, Aug 5, 2008 at 7:51 PM, Gerriet M. Denkmann
    <gerriet...> wrote:
    > I have a document-based app which works perfectly with -O0 or -O1 but
    > crashes with -O2 or higher.
    >
    > When the crash occurs the debugger comes up and says: "Previous frame
    > identical to this frame (corrupt stack?)"
    >
    > When I try to step through the function (which is kind of difficult, as the
    > optimization has shuffled the lines a lot), at some point the top frame of the
    > stack gets duplicated.
    >
    > The faulty method starts with:
    > NSString *path = @"/Users/gerriet/Desktop/some alias";  //      error with -O2
    >
    > If it starts with:
    > NSString *path = @"/Users/gerriet/Desktop/some file";  //      ok with -O2
    > then everything works perfectly.
    >
    > When I comment out the place where the error seems to occur, it just
    > occurs at some earlier place.

    You are moving things around in memory by changing the length of your
    string above and by commenting things out. Changing the optimizer
    settings also shuffles things around.

    I have a feeling you are hitting an issue caused by an uninitialized
    value or something similar in your code that is hidden in some
    situations by the code the compiler emits.

    Also, are you sure no runtime exceptions are being logged?

    -Shawn
  • On 8/6/08 9:51 AM, Gerriet M. Denkmann said:

    > So it is kind of difficult to see where and why the stack gets
    > corrupted.

    Have you tried 'stack canaries'?
    <http://lists.apple.com/archives/xcode-users/2007/Dec/msg00055.html>
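    Compiler stack canaries (GCC's -fstack-protector) place a known guard
    value between a function's locals and its saved state and check it
    before the function returns. The same idea can be applied by hand
    around one suspect buffer. The sketch below is a hypothetical plain-C
    illustration, not what the compiler generates: the Guarded type, the
    144-byte payload (chosen to match a 144-byte FSCatalogInfo) and the
    helper names are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY 0xDEADBEEFu   /* arbitrary, easy-to-spot guard value */

/* Hypothetical guarded buffer: a guard word on each side of the payload. */
typedef struct {
    uint32_t front;
    unsigned char payload[144];   /* room for a 144-byte struct */
    uint32_t back;
} Guarded;

static void guarded_init(Guarded *g)
{
    g->front = CANARY;
    memset(g->payload, 0, sizeof g->payload);
    g->back = CANARY;
}

/* Returns 1 if both guard words are intact, 0 if something overwrote one. */
static int guarded_intact(const Guarded *g)
{
    return g->front == CANARY && g->back == CANARY;
}

/* Simulates a callee that writes n bytes starting at the payload. If n is
   larger than the payload, it runs into the trailing guard word. The write
   is clamped to the end of the struct, so it never leaves the object. */
static void write_into_payload(Guarded *g, unsigned char fill, size_t n)
{
    unsigned char *base = (unsigned char *)g + offsetof(Guarded, payload);
    size_t room = sizeof *g - offsetof(Guarded, payload);
    memset(base, fill, n < room ? n : room);
}
```

    Writing exactly 144 bytes leaves both guards intact; writing 148 bytes
    (one word too many) trips the back guard. In real code you would call
    guarded_intact() right after the suspect call and log or trap if it
    returns 0.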

    On 8/6/08 7:59 PM, Gerriet M. Denkmann said:

    > If someone wants to check whether it really is a compiler bug (and
    > not just some stupidity on my side) I can send the whole project.
    > 10.4.11 - not tested on 10.5

    Xcode 3.1 comes with three compilers: gcc 4.0, gcc 4.2 and llvm-gcc 4.2.
    You could try all three if you suspect a compiler bug.

    --
    ____________________________________________________________
    Sean McBride, B. Eng                <sean...>
    Rogue Research                        www.rogue-research.com
    Mac Software Developer              Montréal, Québec, Canada
  • On 7 Aug 2008, at 01:16, Sean McBride wrote:

    > On 8/6/08 9:51 AM, Gerriet M. Denkmann said:
    >
    >> So it is kind of difficult to see where and why the stack gets
    >> corrupted.
    >
    > Have you tried 'stack canaries'?
    > <http://lists.apple.com/archives/xcode-users/2007/Dec/msg00055.html>

    I have not. It seems this is an Xcode 3.0 thing, and I have Xcode 2.4.

    > On 8/6/08 7:59 PM, Gerriet M. Denkmann said:
    >
    >> If someone wants to check whether it really is a compiler bug (and
    >> not just some stupidity on my side) I can send the whole project.
    >> 10.4.11 - not tested on 10.5
    >
    > Xcode 3.1 comes with three compilers: gcc 4.0, gcc 4.2 and llvm-gcc
    > 4.2. You could try all three if you suspect a compiler bug.

    I am under the impression that Xcode 3.x is Leopard-only (please
    correct me if I am wrong), and due to hardware constraints I can run
    only Tiger.

    Kind regards,

    Gerriet.
  • On 6 Aug 2008, at 21:56, Shawn Erickson wrote:

    > On Tue, Aug 5, 2008 at 7:51 PM, Gerriet M. Denkmann
    > <gerriet...> wrote:
    >> I have a document-based app which works perfectly with -O0 or -O1 but
    >> crashes with -O2 or higher.
    >>
    >> When the crash occurs the debugger comes up and says: "Previous frame
    >> identical to this frame (corrupt stack?)"
    >>
    >> When I try to step through the function (which is kind of
    >> difficult, as the optimization has shuffled the lines a lot), at
    >> some point the top frame of the stack gets duplicated.
    >>
    >> The faulty method starts with:
    >> NSString *path = @"/Users/gerriet/Desktop/some alias";  //      error with -O2
    >>
    >> If it starts with:
    >> NSString *path = @"/Users/gerriet/Desktop/some file";  //      ok with -O2
    >> then everything works perfectly.
    >>
    >> When I comment out the place where the error seems to occur, it
    >> just occurs at some earlier place.
    >
    > You are moving things around in memory by changing the length of your
    > string above and by commenting things out. Changing the optimizer
    > settings also shuffles things around.

    I admit that the alias thing was a red herring. It was not a question
    of files, aliases or strings, but of different paths through my code.

    > I have a feeling you are hitting an issue caused by an uninitialized
    > value or something similar in your code that is hidden in some
    > situations by the code the compiler emits.

    Your feeling is absolutely right. No compiler bug. I have to
    apologize to the compiler - I spoke rashly.

    The problem is NSValue. Here is a small code sample:

    unsigned int ss = sizeof(FSCatalogInfo); // 144 bytes
    ss /= 4; // 36 words

    FSCatalogInfo catalogInfo1;
    memset( &catalogInfo1, 0xab, sizeof(FSCatalogInfo) );

    FSCatalogInfo catalogInfo2;
    memset( &catalogInfo2, 0xcd, sizeof(FSCatalogInfo) );

    // Dump 2 * 36 words starting at catalogInfo1. This deliberately reads
    // past catalogInfo1 and assumes the compiler placed catalogInfo2
    // directly after it on the stack.
    unsigned int *p = (unsigned int *)&catalogInfo1;
    for (unsigned int i = 0; i < 2 * ss; i += 12)
    {
      for (unsigned int j = i; j < i + 12; j++) fprintf(stderr, "%#010x ", p[j]);
      fprintf(stderr, "\n");
    }
    fprintf(stderr, "\n");

    // reads 36 + 5 words -- very naughty
    NSValue *data = [ NSValue value: &catalogInfo1 withObjCType: @encode(FSCatalogInfo) ];
    const char *objCType = [ data objCType ]; // correct
    fprintf(stderr, "objCType = %s\n", objCType);

    memset( &catalogInfo2, 0, sizeof(FSCatalogInfo) );

    // writes 36 + 5 words and destroys parts of catalogInfo2
    [ data getValue: &catalogInfo1 ];

    // same dump as above, after getValue:
    for (unsigned int i = 0; i < 2 * ss; i += 12)
    {
      for (unsigned int j = i; j < i + 12; j++) fprintf(stderr, "%#010x ", p[j]);
      fprintf(stderr, "\n");
    }

    When I run this, I see that NSValue reads (and writes) not 36 words
    (as it should, as instructed by @encode) but 36 + 5 words, and so
    overwrites 5 words past the end of the destination. In my case these
    were 2 unimportant words plus 3 saved registers (r20 to r22), which
    then led to the crash on return.

    How can something as fundamental as NSValue not work correctly, at
    least on Tiger 10.4.11?

    Maybe someone would want to check this on Leopard.

    > Also, are you sure no runtime exceptions are being logged?

    I believe that all uncaught exceptions are logged in the Xcode Run
    Log, and there were none.

    Kind regards,

    Gerriet.
  • On Thu, Aug 7, 2008 at 11:28 AM, Gerriet M. Denkmann
    <gerriet...> wrote:
    > Maybe someone would want to check this on Leopard.

    File a bug (http://bugreport.apple.com). That's the quickest way to
    make sure that 1) if it's a problem, it gets fixed, or 2) if it's not a
    problem, someone from Apple will tell you.

    --Kyle Sluder
  • The problem here is that UTCDateTime is defined with #pragma pack(2) in
    effect. That means the compiler packs with an alignment of 2, so the
    whole structure is 8 bytes. With the natural 4-byte alignment of its
    32-bit member, it would be 12 bytes. Since nothing in the @encode'd
    information records the non-standard alignment, NSGetSizeAndAlignment
    (which NSValue probably uses internally) returns 12 bytes as the
    struct's size.

    As an example,

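    #include <Foundation/Foundation.h>
    #include <CoreServices/CoreServices.h>   // declares UTCDateTime
    #include <stdio.h>

    // Objective-C source (uses @encode), e.g. a .m file
    int main(void)
    {
        NSUInteger size, align;
        NSGetSizeAndAlignment(@encode(UTCDateTime), &size, &align);
        printf("%lu, %lu, %s\n", (unsigned long)sizeof(UTCDateTime),
               (unsigned long)size, @encode(UTCDateTime));
        return 0;
    }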

    prints "8, 12, {UTCDateTime=SIS}".
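    The size mismatch can be reproduced in portable C without Foundation.
    This sketch assumes a platform where uint32_t is 4-byte aligned (true on
    common targets) and a compiler that honors #pragma pack(push, 2), such
    as gcc or clang. PackedDateTime and NaturalDateTime are hypothetical
    mirrors of UTCDateTime's members, and natural_size_from_encoding is a
    simplified stand-in for what any decoder of an @encode string can
    compute; it is not Apple's NSGetSizeAndAlignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of UTCDateTime's members under 2-byte packing. */
#pragma pack(push, 2)
typedef struct {
    uint16_t highSeconds;
    uint32_t lowSeconds;   /* offset 2: only 2-byte aligned */
    uint16_t fraction;
} PackedDateTime;
#pragma pack(pop)

/* The same members with the compiler's default (natural) alignment. */
typedef struct {
    uint16_t highSeconds;
    uint32_t lowSeconds;   /* offset 4, after 2 bytes of padding */
    uint16_t fraction;     /* followed by 2 bytes of tail padding */
} NaturalDateTime;

/* Round offset up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Size of a struct described by a flat type string such as "SIS"
   ('S' = unsigned short, 'I' = unsigned int, as in @encode), assuming
   natural alignment. The string carries no packing information, so this
   is all a decoder can do. Returns 0 for unsupported type codes. */
static size_t natural_size_from_encoding(const char *types)
{
    size_t offset = 0, max_align = 1;
    for (const char *t = types; *t != '\0'; t++) {
        size_t sz = (*t == 'S') ? 2 : (*t == 'I') ? 4 : 0;
        if (sz == 0)
            return 0;
        offset = align_up(offset, sz) + sz;
        if (sz > max_align)
            max_align = sz;
    }
    return align_up(offset, max_align);
}
```

    Here sizeof(PackedDateTime) is 8 while natural_size_from_encoding("SIS")
    is 12: the same 4-byte gap per UTCDateTime described above.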

    HTH,
    Johannes Fortmann
  • On 8 Aug 2008, at 01:59, Johannes Fortmann
    <johannes.fortmann...> wrote:
    >
    > The problem here is that UTCDateTime is defined with #pragma pack 2 in
    > effect. That means the compiler packs with an alignment of 2, so the
    > whole structure has 8 bytes. The proper alignment (4) results in 12
    > bytes. Since there's nothing in the @encode'd information specifying
    > the non-standard alignment, NSGetSizeAndAlignment (which NSValue
    > probably uses internally) will return 12 bytes as the struct's size.
    >
    > As an example,
    >
    > #include <Foundation/Foundation.h>
    >
    > void main()
    > {
    > NSUInteger size, align;
    > NSGetSizeAndAlignment (@encode(UTCDateTime),
    > &size,
    > &align);
    > printf("%i, %i, %s\n", sizeof(UTCDateTime), size,
    > @encode(UTCDateTime));
    > }
    >
    > prints "8, 12, {UTCDateTime=SIS}".

    Thank you. This explains the problems I had.

    So:
    some_type a;
    NSValue *data = [ NSValue value: &a withObjCType: @encode(some_type) ];
    followed by:
    some_type b;
    [ data getValue: &b ];
    is unsafe, dangerous and strictly to be avoided, especially if the
    definition of "some_type" is buried in some framework.

    Instead one must use:
    some_type *bPointer = [ data bytes ];
    The only problem: NSValue has no method "bytes".

    So we have to do:
    const char *objCType = [ data objCType ];
    unsigned int size;
    NSGetSizeAndAlignment (objCType, &size, NULL);
    // "The alloca() function is machine and compiler dependent; its use
    // is discouraged." Why?
    some_type *bPointer = alloca(size);
    [ data getValue: bPointer ];
    Or does anyone have a better idea?
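    The scratch-buffer workaround above can be sketched in plain C.
    BoxedValue, boxed_get_value and boxed_get_value_safely are hypothetical
    stand-ins for an NSValue whose stored size (computed from the type
    encoding) may exceed sizeof the caller's type. The sketch uses malloc
    instead of alloca so that an allocation failure can be reported.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an NSValue: the stored size comes from the
   type encoding and may be larger than sizeof the caller's type. */
typedef struct {
    const unsigned char *bytes;
    size_t stored_size;
} BoxedValue;

/* Mimics getValue:: copies all stored_size bytes into dst, with no idea
   how large dst really is. */
static void boxed_get_value(const BoxedValue *v, void *dst)
{
    memcpy(dst, v->bytes, v->stored_size);
}

/* Copies into a scratch buffer of the stored size, then copies only
   dst_size bytes into the caller's variable. Returns 0 on success,
   -1 if the box is smaller than dst or the allocation fails. */
static int boxed_get_value_safely(const BoxedValue *v, void *dst, size_t dst_size)
{
    if (v->stored_size < dst_size)
        return -1;
    unsigned char *scratch = malloc(v->stored_size > 0 ? v->stored_size : 1);
    if (scratch == NULL)
        return -1;
    boxed_get_value(v, scratch);
    memcpy(dst, scratch, dst_size);
    free(scratch);
    return 0;
}
```

    With a real NSValue, the scratch size would come from
    NSGetSizeAndAlignment on [ data objCType ], as in the snippet above,
    and dst_size would be sizeof(some_type).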

    Another question:
    sizeof(UTCDateTime) is smaller than the size NSValue computes for it.
    This raises the question: can there be some bloated_type with
    sizeof(bloated_type) larger than the size NSValue computes?
    That is: is there some alignment pragma that makes a type take more
    bytes than the alignment NSValue assumes?

    Final question: may the dangerous and (at least on Tiger)
    undocumented behaviour of getValue: be called a severe design error?

    Kind regards,

    Gerriet.
  • Gerriet M. Denkmann (<gerriet...>) on 2008-8-8 9:49 PM said:

    > some_type a;
    > NSValue *data = [ NSValue value: &a withObjCType: @encode(some_type) ];
    > followed by:
    > some_type b;
    > [ data getValue: &b ];
    > is unsafe, dangerous and strictly to be avoided, especially if the
    > definition of "some_type" is buried in some framework.
    >
    > Instead one must use:
    > some_type *bPointer = [ data bytes ];
    > The only problem: NSValue has no method "bytes".

    Note that the docs say that value:withObjCType: and objCType "may be
    deprecated in a future release". Also, I suspect objCType would be
    problematic in GC apps (see the archive discussion of NSData's bytes
    method).

    What is your ultimate goal?  Could you use NSData instead of NSValue?

    Sean
  • On Aug 7, 2008, at 9:49 PM, Gerriet M. Denkmann wrote:

    > Or does anyone have a better idea?
    >

    Define your own struct or Objective-C class that has the same members
    as UTCDateTime. Copy the values from a UTCDateTime to your struct or
    class, and encode/decode your struct or class with NSValue. With a few
    utility routines to handle this for you, this should be simple and safe.
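    The first suggestion can be sketched in C as follows. PackedDateTime is
    a hypothetical mirror of UTCDateTime's 2-byte-packed layout (so the
    block compiles without CoreServices), and MyDateTime is the naturally
    aligned struct you would actually box; since it uses default alignment,
    its encoded size and its sizeof should agree. The conversion functions
    copy member by member, so nothing depends on the packed layout's size.

```c
#include <stdint.h>

/* Hypothetical mirror of UTCDateTime's 2-byte-packed layout. */
#pragma pack(push, 2)
typedef struct {
    uint16_t highSeconds;
    uint32_t lowSeconds;
    uint16_t fraction;
} PackedDateTime;
#pragma pack(pop)

/* Your own, naturally aligned struct with the same members. */
typedef struct {
    uint16_t highSeconds;
    uint32_t lowSeconds;
    uint16_t fraction;
} MyDateTime;

/* Member-by-member copies: neither direction depends on the packed size. */
static MyDateTime my_from_packed(PackedDateTime p)
{
    MyDateTime m;
    m.highSeconds = p.highSeconds;
    m.lowSeconds = p.lowSeconds;
    m.fraction = p.fraction;
    return m;
}

static PackedDateTime packed_from_my(MyDateTime m)
{
    PackedDateTime p;
    p.highSeconds = m.highSeconds;
    p.lowSeconds = m.lowSeconds;
    p.fraction = m.fraction;
    return p;
}
```

    In the Cocoa code, the converted MyDateTime would then be boxed and
    unboxed with @encode(MyDateTime) instead of @encode(UTCDateTime).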

    Or:

    Use NSData: add a category to NSData containing dataWithUTCDateTime:,
    plus the appropriate accessor for UTCDateTime and/or for the members of
    UTCDateTime.

    --
    Brian Stern
    <brians99...>
  • On 8 Aug 2008, at 09:04, Sean McBride wrote:

    > Gerriet M. Denkmann (<gerriet...>) on 2008-8-8 9:49 PM said:
    >
    >> some_type a;
    >> NSValue *data = [ NSValue value: &a withObjCType: @encode(some_type) ];
    >> followed by:
    >> some_type b;
    >> [ data getValue: &b ];
    >> is unsafe, dangerous and strictly to be avoided, especially if the
    >> definition of "some_type" is buried in some framework.
    >>
    >> Instead one must use:
    >> some_type *bPointer = [ data bytes ];
    >> The only problem: NSValue has no method "bytes".
    >
    > Note that the docs say that value:withObjCType: and objCType "may be
    > deprecated in a future release". Also, I suspect objCType would be
    > problematic in GC apps (see the archive discussion of NSData's bytes
    > method).

    On the other hand, NSData will definitely have problems if it is sent
    via Distributed Objects to another application with a different
    endianness. But this is not something I will do with this app, and I
    am also not sure whether NSValue would do the right thing with DO.
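    If the data ever did travel between machines of different endianness,
    one option is to write each field into a fixed, explicitly big-endian
    byte layout before wrapping the bytes in an NSData. The sketch below does
    this in plain C for a hypothetical DateTimeFields struct mirroring
    UTCDateTime's three members; the 8-byte wire format is an assumption of
    this example, not an Apple format.

```c
#include <stdint.h>

/* Hypothetical struct with UTCDateTime's members. */
typedef struct {
    uint16_t highSeconds;
    uint32_t lowSeconds;
    uint16_t fraction;
} DateTimeFields;

enum { kDateTimeWireSize = 8 };   /* 2 + 4 + 2 bytes, no padding */

/* Writes the fields most-significant byte first, independent of the
   host's byte order and of the struct's in-memory layout. */
static void encode_datetime(const DateTimeFields *d, unsigned char out[kDateTimeWireSize])
{
    out[0] = (unsigned char)(d->highSeconds >> 8);
    out[1] = (unsigned char)(d->highSeconds);
    out[2] = (unsigned char)(d->lowSeconds >> 24);
    out[3] = (unsigned char)(d->lowSeconds >> 16);
    out[4] = (unsigned char)(d->lowSeconds >> 8);
    out[5] = (unsigned char)(d->lowSeconds);
    out[6] = (unsigned char)(d->fraction >> 8);
    out[7] = (unsigned char)(d->fraction);
}

/* Inverse of encode_datetime. */
static DateTimeFields decode_datetime(const unsigned char in[kDateTimeWireSize])
{
    DateTimeFields d;
    d.highSeconds = (uint16_t)((in[0] << 8) | in[1]);
    d.lowSeconds = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
                   ((uint32_t)in[4] << 8) | (uint32_t)in[5];
    d.fraction = (uint16_t)((in[6] << 8) | in[7]);
    return d;
}
```

    The resulting 8 bytes could then go into an NSData (for example with
    dataWithBytes:length:) and be decoded the same way on the other side.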

    > What is your ultimate goal?  Could you use NSData instead of NSValue?

    Just to transport data: in the case which caused me so much trouble,
    the 144 bytes of an FSCatalogInfo.
    But now that you mention it, I could use NSData instead,
    and probably will, as I am tired of fighting with NSValue.
    Thanks for this suggestion. I was kind of hooked on NSValue for, as
    I now see, no good reason.

    Kind regards,

    Gerriet.