Inline Assembly Getting Optimized
-
Hello, I have several asm() statements in sequence. I read somewhere
that the compiler may rearrange the order of these instructions, and it
seems to be doing just that. When I compile with -O0 (no optimizations),
everything works fine. When I compile with -O3, -funroll-loops and
-ffast-math, it doesn't. Is there a way to protect these asm() statements
from optimization?
Thanks,
Vincent -
Hi Vincent,
how about 'volatile'?
Stefan
> _______________________________________________
> MacOSX-dev mailing list
> <MacOSX-dev...>
> http://www.omnigroup.com/mailman/listinfo/macosx-dev
> -
If you are referring to asm volatile(), yes, I am doing that.
Vincent
On Wednesday, October 10, 2001, at 05:29 AM, Stefan Jung wrote:
> Hi Vincent,
> how about 'volatile'?
>
> Stefan
> -
Is there a pragma in GNU C that will shut off optimization for a single
function? I know the optimization is causing my code to crash, but when
I disassemble with the -S option, the inlined assembly instructions look
identical in the debug and the optimized build, so there must be
something else going wrong outside of my assembly instructions.
Vincent
-
On Wednesday, October 10, 2001, at 08:27 AM, Vincent Predoehl wrote:
> Is there a pragma in GNU C that will shut off optimization for a single
> function? I know the optimization is causing my code to crash, but
> when I disassemble with the -S option, the inlined assembly
> instructions look identical in the debug and the optimized build, so
> there must be something else going wrong outside of my assembly
> instructions.
You can put multiple instructions inside a single "asm()", like this:
asm (
"foo\n"
"bar r3,r6\n"
"baz\n"
: ...);
which may help keep your instructions together and wholesome, perhaps
with the help of 'volatile'.
To turn off optimization of a function, put
#pragma CC_OPT_OFF
before the function and
#pragma CC_OPT_RESTORE
after the function.
The best bet is to put the "inline" assembly in a separate function,
particularly if the function it is in currently is a large one. This
simplifies the analysis of what may be going on. If you code the entire
new function in assembly (which can be done inside a single asm(), as
above), you can leave out the preamble and postamble setup if it's a
leaf.
Chris Kane
Cocoa Frameworks, Apple -
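A footnote on the ordering question, for what it's worth. With GCC-style
extended asm, 'volatile' keeps an asm statement from being deleted, but
the compiler may still move ordinary loads and stores across it unless
the statement also lists "memory" as a clobber. Below is a minimal,
architecture-neutral sketch of that idea. The empty template emits no
instruction at all, and the names COMPILER_BARRIER, publish, payload and
ready are made up for illustration.

```c
/* Hypothetical helper: an empty asm statement with a "memory" clobber.
   It emits no machine instruction, but tells the compiler that memory
   may have changed, so loads and stores are not moved across it. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

static int payload;
static int ready;

/* Store the payload first, then set the flag. The barrier keeps the
   compiler (not the CPU) from swapping the two stores. */
static void publish(int value)
{
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}
```

Calling publish(42) leaves payload at 42 and ready at 1. The same
"memory" clobber can be added to a real asm statement that reads or
writes memory through pointers the compiler cannot see in its operands.
-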
I am pretty sure I am somehow clobbering a register with my asm
volatile() statement, even though I am listing all registers after the
third colon to tell the compiler they are being clobbered.
How can I efficiently save/restore all 32 integer and floating point
registers?
Vincent
-
Chris Kane wrote:
>
> To turn off optimization of a function, put
>
> #pragma CC_OPT_OFF
>
> before the function and
>
> #pragma CC_OPT_RESTORE
>
> after the function.
FYI, this will no longer do what you want in GCC 3.x, because of
changes to how optimization works. I've made our version of GCC 3
parse these pragmas (which are Apple-isms), but all you'll get is
a warning that they're deprecated.
Stan -
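For readers whose compiler no longer honors these pragmas: later GCC
releases (4.4 and up, as far as I know) accept a per-function optimize
attribute, and Clang has optnone. Here is a minimal sketch, assuming one
of those compilers. The macro NO_OPTIMIZE and the function sum_bytes are
illustrative names, not anything from this thread, and the GCC manual
itself recommends the attribute for debugging rather than production use.

```c
#include <stddef.h>

/* Hypothetical macro: pick the per-function "do not optimize" spelling
   for the compiler at hand. It expands to nothing on other compilers. */
#if defined(__clang__)
#  define NO_OPTIMIZE __attribute__((optnone, noinline))
#elif defined(__GNUC__)
#  define NO_OPTIMIZE __attribute__((optimize("O0"), noinline))
#else
#  define NO_OPTIMIZE
#endif

/* Built without optimization even when the rest of the file uses -O3. */
NO_OPTIMIZE
static unsigned sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

Only the marked function is affected, so this can help narrow down
whether the optimizer or the asm constraints are at fault.
-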
On Wednesday, October 10, 2001, at 5:29, Vincent Predoehl wrote:
> I am pretty sure I am somehow clobbering a register with my asm
> volatile() statement, even though I am listing all registers after the
> third colon to tell the compiler they are being clobbered.
>
> How can I efficiently save/restore all 32 integer and floating point
> registers?
Calling convention is to save R13 to R31, FPR14 to FPR31 and CR2 to CR4.
Of course only if you change the register contents. R1 is the stack
pointer and R2 the RTOC.
Can you give us an example? What does your code look like?
Stefan
> -
I think I got it working now by putting it in its own function. Here's
the code anyway. Basically, I pass everything to the function as a
parameter and copy everything to temporary registers before using them.
Comments and suggestions are welcome.
Vincent
#pragma CC_OPT_OFF
void Analyze(char *in00, char *in01, char *in02, char *in03,
             int bands, int prots_per_band,
             RLABEL_TYPE *atp_,
             RLABEL_TYPE *tmp0_, RLABEL_TYPE *tmp1_,
             RLABEL_TYPE *tmp2_, RLABEL_TYPE *tmp3_)
{
    register int bands_temp __asm__("r3") = bands;
    register int temp1 __asm__("r4"), temp2 __asm__("r5");
    register int prots_per_band_temp __asm__("r6") = prots_per_band;
    register char *in00_temp __asm__("r7") = in00-1;
    register char *in01_temp __asm__("r8") = in01-1;
    register char *in02_temp __asm__("r9") = in02-1;
    register char *in03_temp __asm__("r10") = in03-1;
    register int addr __asm__("r11") = 0;
    register RLABEL_TYPE *atp_temp __asm__("r12") = atp_;
    register RLABEL_TYPE tmp0_temp __asm__("f0") = *tmp0_;
    register RLABEL_TYPE tmp1_temp __asm__("f1") = *tmp1_;
    register RLABEL_TYPE tmp2_temp __asm__("f2") = *tmp2_;
    register RLABEL_TYPE tmp3_temp __asm__("f3") = *tmp3_;
    register RLABEL_TYPE v0 __asm__("f4");
    register RLABEL_TYPE v1 __asm__("f5");
    // asm volatile(
    // "\n"
    asm volatile("mtctr %0\n" : : "r" (bands_temp));
    asm volatile(
        "LP1:\n"
        "lbzu %2, 1(%3)\n"      // temp1 = in00[k]
        "lbzu %6, 1(%7)\n"      // temp2 = in01[k]
        "add %2, %0, %2\n"      // temp1 = addr + in00[k]
        "add %6, %0, %6\n"      // temp2 = addr + in01[k]
        "slwi %2, %2, 2\n"      // temp1 *= 4
        "slwi %6, %6, 2\n"      // temp2 *= 4
        "add %2, %1, %2\n"      // temp1 = atp + addr + in00[k]
        "add %6, %1, %6\n"      // temp2 = atp + addr + in01[k]
        "lfs %5, 0(%2)\n"       // v0 = atp [ addr + in00[k] ]
        "lfs %9, 0(%6)\n"       // v1 = atp [ addr + in01[k] ]
        "fadds %4, %4, %5\n"    // tmp0_temp += v0
        "fadds %8, %8, %9\n"    // tmp1_temp += v1
        :
        :
        "r" (addr), "r" (atp_temp),
        "r" (temp1), "r" (in00_temp), "f" (tmp0_temp), "f" (v0),
        "r" (temp2), "r" (in01_temp), "f" (tmp1_temp), "f" (v1)
    );
    asm volatile(
        "lbzu %2, 1(%3)\n"      // temp1 = in02[k]
        "lbzu %6, 1(%7)\n"      // temp2 = in03[k]
        "add %2, %0, %2\n"      // temp1 = addr + in02[k]
        "add %6, %0, %6\n"      // temp2 = addr + in03[k]
        "slwi %2, %2, 2\n"      // temp1 *= 4
        "slwi %6, %6, 2\n"      // temp2 *= 4
        "add %2, %1, %2\n"      // temp1 = atp + addr + in02[k]
        "add %6, %1, %6\n"      // temp2 = atp + addr + in03[k]
        "lfs %5, 0(%2)\n"       // v0 = atp [ addr + in02[k] ]
        "lfs %9, 0(%6)\n"       // v1 = atp [ addr + in03[k] ]
        "fadds %4, %4, %5\n"    // tmp2_temp += v0
        "fadds %8, %8, %9\n"    // tmp3_temp += v1
        :
        :
        "r" (addr), "r" (atp_temp),
        "r" (temp1), "r" (in02_temp), "f" (tmp2_temp), "f" (v0),
        "r" (temp2), "r" (in03_temp), "f" (tmp3_temp), "f" (v1)
    );
    asm volatile(
        "add %0, %1, %2\n"
        "bdnz LP1\n"
        :
        "=r" (addr)
        :
        "0" (addr), "r" (prots_per_band_temp)
    );
    *tmp0_ = tmp0_temp;
    *tmp1_ = tmp1_temp;
    *tmp2_ = tmp2_temp;
    *tmp3_ = tmp3_temp;
}
#pragma CC_OPT_RESTORE
On Wednesday, October 10, 2001, at 06:21 PM, Stefan Jung wrote:
> Calling convention is to save R13 to R31, FPR14 to FPR31 and CR2 to
> CR4. Of course only if you change the register contents. R1 is the
> stack pointer and R2 the RTOC.
> Can you give us an example? What does your code look like?
>
> Stefan
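-
One possible rework of the Analyze loop, offered as a sketch rather than
a drop-in replacement. It assumes RLABEL_TYPE is float (the code uses lfs
and fadds), handles a single input stream instead of four, and uses named
operands, which need GCC 3.1 or later. The whole loop sits in one asm
statement so that no label or CTR value has to survive between
statements. Every register the assembly writes is declared as an output
("+" or "=&") rather than an input, and "ctr" is listed as clobbered. The
function name sum_band and its parameters are made up for illustration.
On non-PowerPC targets the same computation is done in plain C so the
file still builds. The PowerPC branch is untested on the setup from this
thread.

```c
/* Hypothetical helper: returns the sum over k of table[k*stride + in[k]]
   for k = 0 .. bands-1. */
static float sum_band(const unsigned char *in, const float *table,
                      int bands, int stride)
{
    float acc = 0.0f;
    if (bands <= 0)
        return acc;   /* a zero count in CTR would wrap around in bdnz */

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    const unsigned char *p = in;  /* current input byte */
    long addr = 0;                /* running index offset, k*stride */
    long idx;                     /* scratch: index, then byte offset */
    float v;                      /* scratch: loaded table entry */

    __asm__ volatile(
        "mtctr %[n]\n"
        "1:\n\t"                               /* numeric local label */
        "lbz   %[idx], 0(%[p])\n\t"            /* idx = *p            */
        "addi  %[p], %[p], 1\n\t"              /* p++                 */
        "add   %[idx], %[addr], %[idx]\n\t"    /* idx += addr         */
        "slwi  %[idx], %[idx], 2\n\t"          /* idx *= 4 (bytes)    */
        "lfsx  %[v], %[tab], %[idx]\n\t"       /* v = *(tab + idx)    */
        "fadds %[acc], %[acc], %[v]\n\t"       /* acc += v            */
        "add   %[addr], %[addr], %[stride]\n\t"
        "bdnz  1b\n"
        : [p] "+b"(p), [addr] "+r"(addr), [acc] "+f"(acc),
          [idx] "=&r"(idx), [v] "=&f"(v)
        : [tab] "b"(table), [stride] "r"((long)stride),
          [n] "r"((long)bands)
        : "ctr", "memory");
#else
    /* Portable fallback with the same result. */
    long addr = 0;
    for (int k = 0; k < bands; k++) {
        acc += table[addr + in[k]];
        addr += stride;
    }
#endif
    return acc;
}
```

Two things in the posted code are worth a second look with this in mind.
The first two asm statements list temp1, temp2, v0, v1 and the pointer
and accumulator registers as inputs, yet the instructions write to them,
which the compiler is entitled to assume never happens. The named label
LP1 will also be emitted twice if the compiler ever inlines or
duplicates the function, which is why the sketch uses "1:" and "1b".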


