Is clang generating incorrect code for inline assembly?

Question

I have some C code:

#include <stdio.h>

typedef struct num {
    unsigned long long x;
} num;

int main(int argc, char **argv) {
    struct num anum;
    anum.x = 0;
    __asm__("movq %%rax, %0\n" : "=m" (anum.x) : "rax"(2));
    printf("%llu\n", anum.x);
}

which I’m compiling and running on my Intel Mac laptop.
The output of the program differs depending on whether I compile it with (GNU) gcc or clang.
I compile with gnucc -o gnu-test test.c for gcc (gnucc is GCC, which I built from source on my Mac after downloading the source from https://gcc.gnu.org/install/download.html)
and with clang -o clang-test test.c for clang (the built-in macOS clang).

On my Mac, the gcc build prints 2 (which is what I expect). The clang build prints 140701838959608.

The clang result seems wrong to me, but I’m also wondering if, perhaps, my inline assembly isn’t quite correct and gcc just happens to not expose my error.

I tried out the same code on godbolt.org, and the output there also differs between gcc (x86-64 gcc 13.2 gives 2) and clang (x86-64 clang 16.0.0 gives 140726522786920).

I tried disassembling the clang binary with objdump -d:

clang-test: file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000100003f60 <_main>:
100003f60: 55                           pushq   %rbp
100003f61: 48 89 e5                     movq    %rsp, %rbp
100003f64: 48 83 ec 20                  subq    $32, %rsp
100003f68: 89 7d fc                     movl    %edi, -4(%rbp)
100003f6b: 48 89 75 f0                  movq    %rsi, -16(%rbp)
100003f6f: 48 c7 45 e8 00 00 00 00      movq    $0, -24(%rbp)
100003f77: 48 8d 45 e8                  leaq    -24(%rbp), %rax
100003f7b: b9 02 00 00 00               movl    $2, %ecx
100003f80: 48 89 00                     movq    %rax, (%rax)
100003f83: 48 8b 75 e8                  movq    -24(%rbp), %rsi
100003f87: 48 8d 3d 16 00 00 00         leaq    22(%rip), %rdi          ## 0x100003fa4 <_printf+0x100003fa4>
100003f8e: b0 00                        movb    $0, %al
100003f90: e8 09 00 00 00               callq   0x100003f9e <_printf+0x100003f9e>
100003f95: 31 c0                        xorl    %eax, %eax
100003f97: 48 83 c4 20                  addq    $32, %rsp
100003f9b: 5d                           popq    %rbp
100003f9c: c3                           retq

Disassembly of section __TEXT,__stubs:

0000000100003f9e <__stubs>:
100003f9e: ff 25 5c 00 00 00            jmpq    *92(%rip)               ## 0x100004000 <_printf+0x100004000>

The instruction at 100003f80 (48 89 00  movq %rax, (%rax)) seems to be the issue. clang has the correct value in ecx and the correct address to write to in rax, so why does it emit movq %rax, (%rax) instead of movq %rcx, (%rax)?

It does it correctly (godbolt.org/z/44oTGMxcY), but only if you enable optimizations; your assumptions are not valid without optimizations.
I hate GAS syntax, but I can't find the expected "movq $0, %rax" instruction in the assembly output, so I think something's wrong here.
@Joshua: Where were you expecting to see a mov-immediate of 0 to a register? In a debug build (which this obviously is), anum.x = 0; compiles to movq $0, -24(%rbp). (The leaq -24(%rbp), %rax is preparing a register for (%rax) to be the addressing mode for "=m".) movb $0, %al is zeroing AL to tell the variadic printf there are zero XMM register args. xorl %eax, %eax implements the implicit return 0 at the bottom of main. There's never any reason for a compiler to emit movq $0 to a register; at most it'd use movl $0, %eax if not xor-zeroing.
@PeterCordes: I'm expecting to see the actual instruction from the asm block somewhere in the compiler output.
Ah, I see. The asm template is "movq %%rax, %0", also using AT&T syntax, so it's moving RAX to whatever the %0 placeholder expands to (a lot like a printf format string, hence the %% to get a literal %). movq %rax, (%rax) is the expansion of the asm template. (The compiler picked RAX for the address, and a different register the template didn't use for the "rax"(2) input constraint.) This would be clearer if the OP looked at the compiler's asm output (clang -S) instead of disassembly, since then they could put a comment inside the asm template and make it super easy to find it.
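The marker-comment idea from the last comment can be sketched as follows. This is a minimal illustration, not code from the thread: the helper name store_with_marker and the ASM-MARKER text are hypothetical, and the sketch assumes an x86-64 target, where # starts an assembler comment (other targets take the plain C fallback).

```c
#include <stdint.h>

/* Hypothetical helper: copies `value` into a memory operand, with marker
   comments inside the asm template so the expanded template is easy to
   find in the compiler's assembly output (clang -S / gcc -S). */
uint64_t store_with_marker(uint64_t value)
{
    uint64_t out = 0;
#if defined(__x86_64__)
    /* '#' starts a comment in x86 AT&T-syntax assembly, so the marker lines
       generate no instructions; they only show up in the -S output. */
    __asm__("# ASM-MARKER begin\n\t"
            "movq %1, %0\n\t"   /* %1: register chosen for value, %0: memory */
            "# ASM-MARKER end"
            : "=m"(out)
            : "r"(value));
#else
    out = value; /* plain C fallback so the sketch builds on other targets */
#endif
    return out;
}
```

Compiling with something like clang -S -o - test.c (or gcc -S) and searching the output for ASM-MARKER should show the expanded movq line, with whatever register and memory operand the compiler actually substituted for %1 and %0.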

score 15 · Accepted Answer · 2023-08-14 00:42:20Z


Clang is generating correct code, but you specified the incorrect constraint on the input operand.

The constraint ("rax") is not interpreted as a register name. Instead, each letter in the constraint specifies an allowed operand type. The first letter here, r, allows using any general register, which makes the choice of rcx valid.

To constrain the operand to the rax register, you need to use the "a" constraint. See the x86 section of the GCC "Machine Constraints" documentation page.

__asm__("movq %%rax, %0\n" : "=m" (anum.x) : "a"(2));
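Two ways of making the template agree with the constraint can be sketched side by side. This is an illustrative sketch rather than the answer's exact code: the function names store_via_a and store_via_r are hypothetical, the input is a 64-bit variable instead of the int literal 2 (so the whole of RAX is defined, not just EAX), and the #if guard assumes the asm only makes sense on x86-64. Variant A is the answer's fix; variant B is the common alternative of letting the compiler choose the register and naming it through %1.

```c
#include <stdint.h>

typedef struct num {
    unsigned long long x;
} num;

/* Variant A (the answer's fix): "a" pins the input to RAX/EAX/AX, so the
   hard-coded %%rax in the template really does hold the input value. */
unsigned long long store_via_a(unsigned long long value)
{
    num anum;
    anum.x = 0;
#if defined(__x86_64__)
    __asm__("movq %%rax, %0\n" : "=m"(anum.x) : "a"(value));
#else
    anum.x = value; /* plain C fallback for other targets */
#endif
    return anum.x;
}

/* Variant B: allow any general register ("r") and refer to it only via the
   %1 placeholder, so the template is correct whichever register is chosen. */
unsigned long long store_via_r(unsigned long long value)
{
    num anum;
    anum.x = 0;
#if defined(__x86_64__)
    __asm__("movq %1, %0\n" : "=m"(anum.x) : "r"(value));
#else
    anum.x = value; /* plain C fallback for other targets */
#endif
    return anum.x;
}
```

Variant B is usually the more flexible choice because it does not force the value into one particular register; either way, the register the template reads is the one the constraint actually placed the value in.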

interjay


@mtraceur "rax" is not treated as a register name, but as a combination of 3 different constraints: r, a, and x. The correct way to specify a constraint on rax, eax, or ax (depending on the operand size) is to use the "a" constraint instead of "rax".

– interjay

@interjay, so the constraints form a union rather than an intersection, I guess. In other words, r (any register) combined with a (the ax-family register) would be permissively r rather than restrictively a?

– paxdiablo

@paxdiablo Yes. The GCC documentation says: The simplest kind of constraint is a string full of letters, each of which describes one kind of operand that is permitted.

– interjay
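To illustrate that a multi-letter constraint lists alternatives, here is a small sketch (the helper add_via_rm is hypothetical; x86-64 only, with a plain C fallback). It uses "rm", meaning "any general register or a memory operand", and stays correct whichever alternative the compiler picks because the template refers to the operand only through %1. The question's "rax" went wrong for the opposite reason: the constraint permitted r, a, or x, but the template assumed the a choice.

```c
#include <stdint.h>

/* Hypothetical helper. "rm" is a union of two alternatives: the compiler
   may put `addend` in any general register (r) OR leave it in memory (m).
   addq accepts either form as its source operand. */
uint64_t add_via_rm(uint64_t acc, uint64_t addend)
{
#if defined(__x86_64__)
    /* "+r": acc is read and written in a register; "cc": addq changes flags. */
    __asm__("addq %1, %0" : "+r"(acc) : "rm"(addend) : "cc");
#else
    acc += addend; /* plain C fallback for other targets */
#endif
    return acc;
}
```

Calling add_via_rm(40, 2) should return 42 regardless of whether the compiler chose a register or a stack slot for the second operand.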

Just a side note: there is an experimental GCC port for ia16 that breaks with this tradition of one letter = one constraint. stackoverflow.com/questions/62686259/…

– Michael Petch

Great, thanks for clarifying. Here's a +1, and I recommend adding ""rax" is not treated as a register name, but as a combination of 3 different constraints r, a, and x" and "when combining r (any register) and a (ax-type register), r wins" to the answer itself.

– mtraceur