Inline Assembly Getting Optimized

  • Hello, I have several asm() statements in sequence.  I read somewhere
    that the compiler can possibly rearrange the order of these
    instructions.  Well, it seems to be doing just that.  When I compile
    with -O0 (no optimizations), everything works fine.  When I compile with
    -O3, -funroll-loops and -ffast-math, it doesn't.  Is there a way to protect
    these asm() statements from optimization?

    Thanks,
    Vincent
  • Hi Vincent,
    how about 'volatile'?

    Stefan

    > _______________________________________________
    > MacOSX-dev mailing list
    > <MacOSX-dev...>
    > http://www.omnigroup.com/mailman/listinfo/macosx-dev
    >
  • If you are referring to asm volatile(), yes, I am doing that.

    Vincent

    On Wednesday, October 10, 2001, at 05:29 AM, Stefan Jung wrote:

    > Hi Vincent,
    > how about 'volatile'?
    >
    > Stefan
  • Is there a pragma in GNU C that will shut off optimization for a single
    function?  I know the optimization is causing my code to crash, but when
    I disassemble with the -S option, the inlined assembly instructions look
    identical in the debug and the optimized build, so there must be
    something else going wrong outside of my assembly instructions.

    Vincent

  • On Wednesday, October 10, 2001, at 08:27  AM, Vincent Predoehl wrote:

    > Is there a pragma in GNU C that will shut off optimization for a single
    > function?  I know the optimization is causing my code to crash, but
    > when I disassemble with the -S option, the inlined assembly
    > instructions look identical in the debug and the optimized build, so
    > there must be something else going wrong outside of my assembly
    > instructions.

    You can put multiple instructions inside a single "asm()", like this:

    asm (
      "foo\n"
      "bar r3,r6\n"
      "baz\n"
    : ...);

    which may help keep your instructions together and wholesome, perhaps
    with the help of 'volatile'.

    To turn off optimization of a function, put

    #pragma CC_OPT_OFF

    before the function and

    #pragma CC_OPT_RESTORE

    after the function.

    The best bet is to put the "inline" assembly in a separate function,
    particularly if the function it is in currently is a large one.  This
    simplifies the analysis of what may be going on.  If you code the entire
    new function in assembly (which can be done inside a single asm(), as
    above), you can leave out the preamble and postamble setup if it's a
    leaf.

    Chris Kane
    Cocoa Frameworks, Apple
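
    Note: a minimal sketch of the single-asm() approach described above.
    GCC's documentation says that consecutive asm statements are not
    guaranteed to stay adjacent, even with volatile, so instructions that
    must stay together belong in one statement.  The name add_then_double is
    hypothetical, and the PowerPC branch is an assumption that has not been
    run here; other targets fall back to plain C so the example still builds
    with GCC or Clang.

```c
/* Computes (a + b) * 2. On PowerPC both instructions live in ONE asm
 * statement, so the compiler cannot schedule anything between them. */
static int add_then_double(int a, int b)
{
    int result;
#if defined(__ppc__) || defined(__powerpc__)
    __asm__ volatile(
        "add  %0, %1, %2\n\t"   /* result = a + b   */
        "slwi %0, %0, 1\n\t"    /* result <<= 1     */
        : "=r"(result)          /* outputs          */
        : "r"(a), "r"(b));      /* inputs           */
#else
    /* Fallback for non-PowerPC targets (assumption: illustration only). */
    result = (a + b) * 2;
#endif
    return result;
}
```

    Both inputs are consumed by the first instruction, so it is safe for the
    compiler to give the output the same register as one of them.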
  • I am pretty sure I am somehow clobbering a register with my asm
    volatile() statement, even though I am listing all registers after the
    third colon to tell the compiler they are being clobbered.

    How can I efficiently save/restore all 32 integer and floating point
    registers?

    Vincent
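
    Note: a register that the asm both reads and writes has to be declared
    as a read-write operand ("+" modifier) in the output section; listing it
    only as an input tells the compiler its value is unchanged.  The sketch
    below shows the three sections (outputs, inputs, clobbers).  The name
    increment_in_register is hypothetical, the PowerPC branch has not been
    run here, and GCC or Clang is assumed.

```c
/* Returns x + 1, with x declared as a read-write asm operand. */
static int increment_in_register(int x)
{
#if defined(__ppc__) || defined(__powerpc__)
    /* "b" = base register (any GPR except r0), required because addi
     * treats r0 in the source slot as the literal value 0. */
    __asm__ volatile("addi %0, %0, 1"
                     : "+b"(x)      /* outputs: x is read AND written */
                     :              /* inputs: none                   */
                     : "memory");   /* clobbers: shown for placement  */
#else
    /* Empty template: emits no instructions, but still tells the
     * compiler that x may have been modified here. */
    __asm__ volatile("" : "+r"(x) : : "memory");
    x += 1;
#endif
    return x;
}
```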

  • Chris Kane wrote:
    >
    > To turn off optimization of a function, put
    >
    > #pragma CC_OPT_OFF
    >
    > before the function and
    >
    > #pragma CC_OPT_RESTORE
    >
    > after the function.

    FYI, this will no longer do what you want in GCC 3.x, because of
    changes to how optimization works.  I've made our version of GCC 3
    parse these pragmas (which are Apple-isms), but all you'll get is
    a warning that they're deprecated.

    Stan
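
    Note: later GCC releases (4.4 onward, to my knowledge) provide a
    per-function optimize attribute and #pragma GCC optimize, and Clang
    provides an optnone attribute.  The sketch below assumes one of those
    compilers; NO_OPT and sum_bytes are hypothetical names.  GCC's manual
    describes the attribute as a debugging aid rather than something for
    production code.

```c
#include <stddef.h>

/* Pick whichever "do not optimize this function" spelling the compiler
 * understands; expand to nothing elsewhere. */
#if defined(__clang__)
#  define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#  define NO_OPT __attribute__((optimize("O0")))
#else
#  define NO_OPT
#endif

/* Compiled without optimization even when the rest of the file uses -O3. */
NO_OPT static int sum_bytes(const unsigned char *p, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```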
  • On Wednesday, October 10, 2001, at 5:29, Vincent Predoehl wrote:

    > I am pretty sure I am somehow clobbering a register with my asm
    > volatile() statement, even though I am listing all registers after the
    > third colon to tell the compiler they are being clobbered.
    >
    > How can I efficiently save/restore all 32 integer and floating point
    > registers?
    The calling convention requires a function to save and restore R13 to
    R31, FPR14 to FPR31, and CR2 to CR4, but only the ones it actually
    modifies.  R1 is the stack pointer and R2 is the RTOC.
    Can you give us an example?  What does your code look like?

    Stefan
  • I think I got it working now by putting it in its own function.  Here's
    the code anyway.  Basically, I pass everything to the function as a
    parameter and copy everything to temporary registers before using them.
    Comments and suggestions are welcome.

    Vincent

    #pragma CC_OPT_OFF
    void Analyze(char *in00, char *in01, char *in02, char *in03,
                    int bands, int prots_per_band,
                    RLABEL_TYPE *atp_,
                    RLABEL_TYPE *tmp0_, RLABEL_TYPE *tmp1_,
                    RLABEL_TYPE *tmp2_, RLABEL_TYPE *tmp3_)
    {
            register int bands_temp __asm__("r3") = bands;
            register int temp1 __asm__("r4"), temp2 __asm__("r5");
            register int prots_per_band_temp __asm__("r6") = prots_per_band;
            register char *in00_temp __asm__("r7") = in00-1;
            register char *in01_temp __asm__("r8") = in01-1;
            register char *in02_temp __asm__("r9") = in02-1;
            register char *in03_temp __asm__("r10") = in03-1;
            register int addr __asm__("r11") = 0;
            register RLABEL_TYPE *atp_temp __asm__("r12") = atp_;
            register RLABEL_TYPE tmp0_temp __asm__("f0") = *tmp0_;
            register RLABEL_TYPE tmp1_temp __asm__("f1") = *tmp1_;
            register RLABEL_TYPE tmp2_temp __asm__("f2") = *tmp2_;
            register RLABEL_TYPE tmp3_temp __asm__("f3") = *tmp3_;
            register RLABEL_TYPE v0 __asm__("f4");
            register RLABEL_TYPE v1 __asm__("f5");

    //        asm volatile(
      //          "\n"
            asm volatile("mtctr %0\n" : : "r" (bands_temp));
            asm volatile(
                "LP1:\n"
                "lbzu %2, 1(%3)\n"    // temp1 = in00[k]
                "lbzu %6, 1(%7)\n"    // temp2 = in01[k]
                "add %2, %0, %2\n"    // temp1 = addr + in00[k]
                "add %6, %0, %6\n"    // temp2 = addr + in01[k]
                "slwi %2, %2, 2\n"        // temp1 *= 4
                "slwi %6, %6, 2\n"        // temp2 *= 4
                "add %2, %1, %2\n"    // temp1 = atp + addr + in00[k]
                "add %6, %1, %6\n"    // temp2 = atp + addr + in01[k]
                "lfs %5, 0(%2)\n"    // v0 = atp [ addr + in00[k] ]
                "lfs %9, 0(%6)\n"    // v1 = atp [ addr + in01[k] ]
                "fadds %4, %4, %5\n"    // tmp0_temp += v0
                "fadds %8, %8, %9\n"    // tmp1_temp += v1
            :
            :
                "r" (addr), "r" (atp_temp),
                "r" (temp1), "r" (in00_temp), "f" (tmp0_temp), "f" (v0),
                "r" (temp2), "r" (in01_temp), "f" (tmp1_temp), "f" (v1)
            );
            asm volatile(
                "lbzu %2, 1(%3)\n"    // temp1 = in00[k]
                "lbzu %6, 1(%7)\n"    // temp2 = in01[k]
                "add %2, %0, %2\n"    // temp1 = addr + in00[k]
                "add %6, %0, %6\n"    // temp2 = addr + in01[k]
                "slwi %2, %2, 2\n"        // temp1 *= 4
                "slwi %6, %6, 2\n"        // temp2 *= 4
                "add %2, %1, %2\n"    // temp1 = atp + addr + in00[k]
                "add %6, %1, %6\n"    // temp2 = atp + addr + in01[k]
                "lfs %5, 0(%2)\n"    // v0 = atp [ addr + in00[k] ]
                "lfs %9, 0(%6)\n"    // v1 = atp [ addr + in01[k] ]
                "fadds %4, %4, %5\n"    // tmp0_temp += v0
                "fadds %8, %8, %9\n"    // tmp1_temp += v1
            :
            :
                "r" (addr), "r" (atp_temp),
                "r" (temp1), "r" (in02_temp), "f" (tmp2_temp), "f" (v0),
                "r" (temp2), "r" (in03_temp), "f" (tmp3_temp), "f" (v1)
            );
            asm volatile(
                "add %0, %1, %2\n"
                "bdnz LP1\n"
                :
                    "=r" (addr)
                :
                    "0" (addr), "r" (prots_per_band_temp)
            );
            *tmp0_ = tmp0_temp;
            *tmp1_ = tmp1_temp;
            *tmp2_ = tmp2_temp;
            *tmp3_ = tmp3_temp;
    }
    #pragma CC_OPT_RESTORE

    On Wednesday, October 10, 2001, at 06:21 PM, Stefan Jung wrote:

    > Calling convention is to save R13 to R31, FPR14 to FPR31 and CR2 to
    > CR4. Of course only if you change the register contents. R1 is the
    > stack pointer and R2 the RTOC.
    > Can you give us an example? What does your code look like?
    >
    > Stefan
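
    Note: for checking the assembly against a known-good result, here is a
    plain C sketch of what the instruction comments in Analyze() describe.
    It assumes RLABEL_TYPE is float (suggested by lfs and fadds), treats the
    input bytes as unsigned (lbzu zero-extends), and assumes bands is at
    least 1, since the bdnz loop always runs its body once.  The name
    analyze_reference is hypothetical.

```c
typedef float RLABEL_TYPE;   /* assumption: single-precision float */

/* For each band k, add one table entry per input stream to its running
 * sum, then advance the table offset by prots_per_band. */
static void analyze_reference(const unsigned char *in00,
                              const unsigned char *in01,
                              const unsigned char *in02,
                              const unsigned char *in03,
                              int bands, int prots_per_band,
                              const RLABEL_TYPE *atp,
                              RLABEL_TYPE *tmp0, RLABEL_TYPE *tmp1,
                              RLABEL_TYPE *tmp2, RLABEL_TYPE *tmp3)
{
    int addr = 0;
    for (int k = 0; k < bands; k++) {
        *tmp0 += atp[addr + in00[k]];
        *tmp1 += atp[addr + in01[k]];
        *tmp2 += atp[addr + in02[k]];
        *tmp3 += atp[addr + in03[k]];
        addr += prots_per_band;   /* matches "add %0, %1, %2" before bdnz */
    }
}
```

    Running both versions on the same inputs in the debug and the optimized
    build should show whether the optimizer is disturbing the asm version.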