Linux memcpy restrict keyword syntax

Question

I know that the restrict qualifier in C specifies that the memory regions pointed to by two pointers must not overlap. It was my understanding that the Linux (not SUS) prototype for memcpy looks like –

void* memcpy(void *restrict dest, const void *restrict src, size_t count);

However, when I looked at man7.org/memcpy it seems that the declaration is –

void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);

My questions are –

When did this syntax get introduced? C99 or later or is this some GNU extension?
What does the . before n signify? I am familiar with the variable length array declaration. Is the . for the variable appearing after the array specification? Is this part of the standard?

I don't think that's valid syntax even in gcc. It's probably just a new way of documenting. Can't say I like it. — 11 hours ago
The restrict qualifier was introduced in the C99 standard. — 11 hours ago
I think the important thing is what those extra things mean in the Linux documentation. It is not like OP is not sure about the C standard itself. — 11 hours ago
The thing with pointer to VLA is that n needs to be known in advance in order to be used. But in memcpy it is the right-most parameter, so that isn't possible. Then in the Linux world, there's a lot of people who love to complicate things as much as possible just for the heck of it… — 9 hours ago

score 14 · Accepted Answer · 2023-09-04 07:43:01Z

TL;DR: It’s an ad hoc syntax that came out of a discussion on the Linux kernel mailing lists. It expresses the size of a VLA parameter before the size variable is declared: the . in .n means that n refers to a parameter of the current function declaration, even if n is declared after the parameter that uses it. The man-pages have also extended the usual int a[restrict n] parameter declaration to the void type. I have not found this syntax described in any official documentation, but the mailing list has all the details.

The change to the memcpy syntax in the Linux kernel man-pages was introduced by this commit. The commit message is copied here verbatim for reference.

Various pages: SYNOPSIS: Use VLA syntax in ‘void *’ function parameters

Use VLA syntax also for void *, even if it’s a bit more weird.

Admittedly, it is weird enough from the C language perspective, because while void f(int n, int[restrict n]) is valid VLA syntax, void f(int n, void[restrict n]) is not because we are not allowed to have arrays of void.
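
To make that distinction concrete, here is a minimal sketch in standard C99. The function name copy_ints is hypothetical and only serves as an illustration; it is not taken from the man-pages or from glibc. It shows that an array parameter written with restrict inside the brackets is adjusted to a restrict-qualified pointer, so both spellings declare the same function, while the void variant has to stay commented out.

```c
#include <assert.h>
#include <stddef.h>

/* VLA-style spelling: each array parameter is adjusted to a
   restrict-qualified pointer to its element type. */
void copy_ints(size_t n, int dst[restrict n], const int src[restrict n]);

/* Pointer spelling of the same function. The two declarations are
   compatible, so the compiler accepts both in one translation unit. */
void copy_ints(size_t n, int *restrict dst, const int *restrict src);

void copy_ints(size_t n, int dst[restrict n], const int src[restrict n])
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The void counterpart is not valid C, because there are no arrays of void:
 *
 *     void copy_bytes(size_t n, void dst[restrict n]);
 *
 * which is why real headers keep the `void *restrict` form. */
```

Note that the size n has to come first here; the man-pages notation exists precisely because memcpy takes its size last.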

For the . before n, if we dig deeper we can find this thread on the Linux kernel man-pages mailing list.

Let’s take an example:
    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char *restrict host, socklen_t hostlen,
                    char *restrict serv, socklen_t servlen,
                    int flags);
and some transformations:
    int getnameinfo(const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);


    int getnameinfo(socklen_t hostlen;
                    socklen_t servlen;
                    const struct sockaddr *restrict addr,
                    socklen_t addrlen,
                    char host[restrict hostlen], socklen_t hostlen,
                    char serv[restrict servlen], socklen_t servlen,
                    int flags);
(I’m not sure if I used correct GNU syntax, since I never used that
extension myself.)

The first transformation above is non-ambiguous, as concise as possible,
and its only issue is that it might complicate the implementation a bit
too much. I don’t think forward-using a parameter’s size would be too
much of a parsing problem for human readers.
I personally find the second form not terrible. Being able to read
code left-to-right, top-down is helpful in more complicated examples.
The second one is unnecessarily long and verbose, and semicolons are not
very distinguishable from commas, for human readers, which may be very
confusing.
    int foo(int a; int b[a], int a);
    int foo(int a, int b[a], int o);
Those two are very different to the compiler, and yet very similar to
the human eye. I don’t like it. The fact that it allows for simpler
compilers isn’t enough to overcome the readability issues.
This is true, I would probably use it with a comma and/or syntax
highlighting.

I think I’d prefer having the forward-using syntax as a non-standard
extension –or a standard but optional language feature– to avoid
forcing small compilers to implement it, rather than having the GNU
extension standardized in all compilers.

The problems with the second form are:

it is not 100% backwards compatible (which maybe ok though) as the semantics of the following code changes:

int n; int foo(int a[n], int n); // refers to different n!

Code written for new compilers could then be misunderstood by old
compilers when a variable with ‘n’ is in scope.

it would generally be fundamentally new to C to have backwards references and parsers might need to be changed to allow this

a compiler or tool then has to deal also with ugly corner cases such as mutual references:

int foo(int (*a)[sizeof(*b)], int (*b)[sizeof(*a)]);

We could consider new syntax such as

int foo(char buf[.n], int n);

Personally, I would prefer the conceptual simplicity of forward
declarations and the fact that these exist already in GCC over any
alternative. I would also not mind new syntax, but then one has to
define the rules more precisely to avoid the aforementioned problems.

As I understand it, the . marks an array size that names a parameter declared later in the same parameter list, i.e. it is used before its declaration. Restricting the size to a plain parameter name is also meant to make corner cases such as mutual references less problematic.
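
The backward-compatibility concern raised in the thread can be observed with compilers as they behave today. The following is a minimal sketch, assuming a compiler with VLA support (GCC and Clang in C mode); the name row_bytes is hypothetical. A pointer to a VLA is used instead of a plain array parameter so that sizeof can reveal which n the bound actually refers to.

```c
#include <assert.h>
#include <stddef.h>

int n = 3;  /* file-scope variable that shares the parameter's name */

/* In `int (*row)[n]` the parameter `n` is not yet in scope, so the bound
   is the file-scope `n`, evaluated when the function is entered. */
size_t row_bytes(int (*row)[n], int n)
{
    assert(row != NULL);
    (void)n;             /* the parameter plays no part in the type of `row` */
    return sizeof *row;  /* size of an array of (file-scope n) ints */
}
```

Calling row_bytes(&arr, 100) on an int arr[3] yields 3 * sizeof(int), regardless of the second argument. If plain n were redefined to mean the later parameter, the same source would silently change meaning, which is the motivation given for a distinct spelling such as .n.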

There is a follow-up thread that states,

I am ok with the syntax, but I am not sure how this would work. If the
type is determined only later you would still have to change parsers
(some C compilers do type checking and folding during parsing, so
need the types to be known during parsing) and you also still have the
problem with the mutual dependencies.

We thought about using this syntax

int foo(char buf[.n], int n);

because it is new syntax which means we can restrict the size to be
the name of a parameter instead of allowing arbitrary expressions,
which then makes forward references less problematic. It is also
consistent with designators in initializers and could also be extended
to annotate flexible array members or for storing pointers to arrays
in structures:

struct { int n; char buf[.n]; };

struct { int n; char (*buf)[.n]; };

Of course, there was also an objection, which I think many people in the SO community would agree with:

the only point i strongly care about is this one:

Manual pages should not use

non-standard syntax

non-portable syntax

ambiguous syntax (i.e. syntax that might have different meanings with different compilers or in different contexts)

syntax that might be invalid or dangerous with some widely used compiler collections like GCC or LLVM

score 2 · Accepted Answer · 2023-09-04 08:42:52Z

For both questions, the VLA notation appears to follow from a C23 design principle whereby "APIs should be self-documenting when possible". See Programming Language C – C23 Charter.

The dot notation does not appear in the April 2023 C23 draft, and I speculate it is a wish-list item for a future revision of the standard. The author of the dot notation openly admits that it’s not valid syntax, and gives his reasons for choosing it, in commit 1eed67e.

The notation seems to originate in the Linux development community, and its use in the published man-pages documentation appears to be somewhat speculative. It was introduced with commits 1eed67e (whose commit message is a better answer to this question than I can manage) and c64cd13, along with the wording "Use VLA syntax also for void *, even if it’s a bit more weird.".

The language "even if it’s a bit more weird" tells me that the author hopes the syntax might eventually be considered for inclusion in the C standard, since he doesn’t cite any authoritative source like a draft or a compiler implementation.

As for the variable length array feature, GCC has supported it as an extension in C90 mode and as a standard feature since C99: see the GCC Variable Length documentation. The dot notation used in man-pages is not yet implemented in any GCC version, AFAIK.
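
For comparison, this is roughly what the existing, standard VLA parameter notation looks like when the sizes are declared before the array that uses them. It is a minimal sketch with a hypothetical function name (sum_matrix), assuming a compiler with VLA support.

```c
#include <assert.h>
#include <stddef.h>

/* Standard C99 VLA parameter: `rows` and `cols` are declared before use.
   The parameter `m` is adjusted to `int (*m)[cols]`, so `cols` really does
   drive the indexing arithmetic; `rows` is documentation plus a loop bound. */
long sum_matrix(size_t rows, size_t cols, int m[rows][cols])
{
    assert(m != NULL);
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r][c];
    return total;
}
```

This only works because the sizes come first in the parameter list. Functions such as memcpy, where the size is the last parameter, cannot be written this way in standard C, which is the gap the .n notation in the man-pages tries to paper over.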

glibc uses the void * notation in the header files at the time of this writing (Sep 3, 2023).

This syntax will not be in C23 so it seems like nonsense to me. And you can't have arrays of void anyway so I don't know what they were even thinking… — 9 hours ago
I think the goal of "self-documenting APIs" is fantastic. void issue aside, the readability of the language would benefit from this VLA notation. The idea needs work, but I think it has merit. — 9 hours ago
"the notation seems to be in the proposal stage" is a bit misleading. This normally means that a proposal was written and submitted to the C standard committee, which doesn't seem to be the case here. Or do you mean proposed for use in the man pages? — 9 hours ago
@HolyBlackCat I took the text of the introductory commit to literally be a proposal to WG14 members: 1eed67e – it is written, public, and aimed at the standard committee. With the text of this commit message in mind, how could I word my phrase better? — 9 hours ago
A proposal is normally a paper submitted to the committee as described here. "how could I word my phrase better" I would get rid of the word "proposal", and of the "might eventually/soon become part of a draft" part. — 8 hours ago
