Why don’t free functions in implementation files have internal linkage by default?

15

When it comes to functions (non-member functions in C++), marking them as static gives them internal linkage, meaning they are not visible outside their translation unit. Why isn’t this the default? I don’t have solid statistics, but from what I’ve seen, most functions in implementation files should be marked static.

I believe the consensus is to split functionality into smaller units. So it makes sense that, in general, the number of "utility"-like functions in an implementation file that should not be visible to other translation units is greater than the number of functions that simply implement the public interface.

What is the non-opinionated reason as to why they went with "export everything" by default in this context? I mean they wouldn’t make decisions willy-nilly, would they?
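
To illustrate what I mean, a minimal sketch (the file and function names are made up):

// helpers.cpp
static int clamp01(int x) { return x < 0 ? 0 : (x > 1 ? 1 : x); }  // internal linkage: only this translation unit sees it

int public_api(int x) { return clamp01(x); }  // external linkage by default: any other .cpp can declare and call it

// main.cpp
int public_api(int x);   // fine: the definition above is visible to the linker
// int clamp01(int x);   // declaring and calling this would fail to link: the static definition is not exported

int main() { return public_api(2); }

Most clamp01-style helpers never need to leave their .cpp file, yet I have to remember to write static for each of them.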


  • C and C++ are two different languages. C++ has destructors and the rule of five. C code uses some conventions (e.g. like in GTK…). C or C++ code is sometimes generated.

    – Basile Starynkevitch

    yesterday


  • @BasileStarynkevitch This question is valid for both C and C++. The double tag is understandable.

    – YSC

    yesterday

  • As for the reason why, you probably have to look way back into the history of C, long before it was standardized. I doubt anyone alive could give a definitive answer for the "why".

    – Some programmer dude

    yesterday


  • Actually, neither C nor C++ has anything called "free" functions. C has only "functions". C++ has "member functions" (which might be named something else) or "namespace scope functions" (non-member functions). The term "free" is ambiguous when it comes to functions.

    – Some programmer dude

    yesterday


  • @HolyBlackCat The free function comes to mind.

    – Lundin

    yesterday

6 Answers


13

In the C/C++ compilation model, the preprocessor runs before everything else, and replaces #includes with their contents.

Hence, there’s no difference between a function that’s defined in a .cpp file and a function defined in a header it includes.

Your suggestion would make functions defined in headers static by default (which would remove the "multiple definition" linking error), which would be very bad: it would cause silent code duplication in the resulting binary if you forget inline (in C++) or if you don’t know you’re not supposed to define functions in headers (in C).
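
A sketch of that scenario (the file and function names are invented):

// util.h -- note: neither 'inline' nor 'static'
int twice(int x) { return 2 * x; }

// a.cpp
#include "util.h"
int a() { return twice(1); }

// b.cpp
#include "util.h"
int b() { return twice(2); }

Today, linking a.o and b.o fails with a "multiple definition of twice" error, which points you at the missing inline. If functions were implicitly static, each object file would instead silently carry its own private copy of twice, and the mistake would go unnoticed.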


  • This answer doesn't make sense in the context of the question. The question speaks of implementation files, which more likely means the actual source file rather than a header file.

    – user694733

    yesterday

  • @user694733 I addressed this in the second paragraph. It's impossible to distinguish between a function defined in a .cpp file and one defined in a header it includes.

    – HolyBlackCat

    yesterday


  • @chqrlie I don't understand. In the scenario I'm describing, there's no prototype at all. Imagine I add the following to a header: void foo() {std::cout << "Hello!\n";}. Under the current rules, this is a linking error. If you make it implicitly static (there's no prototype), it is no longer an error, but rather silent code duplication in the binary.

    – HolyBlackCat

    yesterday

  • But is this the actual reason the designers of C chose external linkage as the default? If not, then I don't think this answers the question "Why?".

    – user694733

    yesterday


  • Early programming languages, including early versions of C, did not have different types of linkage. Every identifier outside a function was linked together. That is now called external linkage. Internal linkage was invented later, and the default was already for identifiers declared outside functions to have external linkage.

    – Eric Postpischil

    yesterday


9

  1. C part

    C is now a very old language (from the 1970s…) and is highly conservative. Include files are just meant to be included at the source level. Draft n1570 for C11 explicitly says:

    A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit.

    That means that a conformant C compiler does not distinguish between what comes from an include file and what comes from a source file, since the inclusion occurred before the compilation phase.

    This is enough for functions to receive external linkage by default (when not declared static).

  2. C++ part

    Despite being a totally different language, C++ still embraces its inheritance from C. Specifically, the C standard library is still officially a part of the C++ standard library.

    This is probably enough for non-member functions to receive the same default treatment as they receive in C. This is of course far less important than in C, because C functions are actually declared extern "C". But on the other hand, non-member functions are also called namespace scope functions for a reason, and in C++, scoping is the correct way to handle namespace pollution.

My opinion is that best practice should be to scope everything: use a named namespace to get external linkage and an anonymous one to limit visibility to the local translation unit, as sketched below. That is enough that changing the C default for non-member functions is not required.
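
A minimal sketch of that convention (the namespace and function names are invented):

// widget.cpp
namespace mylib {                     // named namespace: external linkage, callable from other translation units
    int make_widget() { return 42; }
}

namespace {                           // unnamed namespace: only usable inside this translation unit
    int helper() { return 7; }
}

int combined() { return mylib::make_widget() + helper(); }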


  • @BenVoigt: Not true since C++11; namespace {...} grants the same linkage as static (but can apply it to types as well as functions/variables).

    – Davis Herring

    16 hours ago

  • It has never been common for functions to be defined in C header files, so why do you think that's the original reason for this default?

    – Barmar

    8 hours ago

  • @Barmar I was thinking of function declarations which are common in include files. And a function definition is just a declaration which happens to contain the function body. Probably simpler at the compiler level to have same defaults in a pure declaration and in the declaration part of a definition.

    – Serge Ballesta

    6 hours ago


7

You would be hard pressed to find out why the default is "export everything". The language and its compilers have both evolved dramatically since its inception in the 1970s, for which there are no release notes or "working group" discussions available on the internet. "Structured programming" and goto statements were the style of the time; very few people were thinking about using encapsulation to minimise the shared-state complexity problem. Fortran also made functions publicly visible.

I would surmise that as the language grew in popularity, ever larger systems emerged, which may have broken early editions of the linker, so some means of circumventing this needed to be introduced. For some crazy reason they chose to use static to hide functions from the linker and reduce its load (to me this is a bigger mystery than why linkage is public by default).


Practically, when declaring functions static, aside from denying other modules access to "internals", it’s worth hiding symbols from the linker in very large programs in order to speed up the build and reduce memory consumption. This can become unwieldy very quickly. Instead of sprinkling the codebase with static to conceal functions, it actually makes more sense to use a compiler option that sets hidden visibility as the default, and then decorate the functions you do want to be visible to other modules.

In Linux you can direct the compiler to make hidden visibility the default (-fvisibility=hidden):
see https://stackoverflow.com/a/52742992/1607937
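
A rough sketch of that workflow, assuming GCC or Clang on an ELF platform (file names, symbol names, and the build command are illustrative):

// api.cpp -- built with: g++ -fvisibility=hidden -fPIC -shared api.cpp -o libapi.so
__attribute__((visibility("default")))
int api_entry(int x) { return x + 1; }        // explicitly exported from the shared library

int internal_helper(int x) { return x * 2; }  // hidden by the compiler flag; not in the dynamic symbol table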

In truth it’s a little bit more complicated than that; there are other options that provide finer tuning of visibility. From https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html:

visibility ("visibility_type")
The visibility attribute on ELF targets causes the declaration to be emitted with default, hidden, protected or internal visibility.

          void __attribute__ ((visibility ("protected")))
          f () { /* Do something. */; }
          int i __attribute__ ((visibility ("hidden")));

See the ELF gABI for complete details, but the short story is:

default

Default visibility is the normal case for ELF.
This value is available for the visibility attribute to override
other options that may change the assumed visibility of symbols.

hidden

Hidden visibility indicates that the symbol will not be placed into
the dynamic symbol table, so no other module (executable or shared
library) can reference it directly.

internal

Internal visibility is like hidden visibility, but with additional
processor specific semantics. Unless otherwise specified by the
psABI, GCC defines internal visibility to mean that the function
is never called from another module. Note that hidden symbols,
while they cannot be referenced directly by other modules,
can be referenced indirectly via function pointers.

By indicating that a symbol cannot be called from outside
the module, GCC may for instance omit the load of a PIC
register since it is known that the calling function loaded
the correct value.

protected

Protected visibility indicates that the symbol will be placed in the
dynamic symbol table, but that references within the defining module
will bind to the local symbol. That is, the symbol cannot be
overridden by another module.

Not all ELF targets support this attribute.

(also see Peter Cordes’ comment in the thread)


Also note that functions can be overridden by "bolt-on" implementations that are linked in, which is useful for mocking functions in unit tests. It’s worth using the "weak linkage" attribute on the default implementation if you intend to do this.
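
A sketch of that technique, assuming GCC or Clang on an ELF target (the names are invented):

// sensor.cpp -- production code; the weak attribute lets another definition replace it at link time
__attribute__((weak)) int read_sensor() { return 0; }   // default/real implementation

// sensor_mock.cpp -- compiled into the unit-test binary only
int read_sensor() { return 42; }   // strong definition: when both objects are linked, this one wins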


It’s worth mentioning that in C++ it is preferred to use anonymous namespaces instead of static to declare symbols as being "private":

namespace {
    <module-private code>
} // anonymous namespace

See core guidelines SF.22 – https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rs-unnamed2

In my experience, many companies embrace this in their coding standards.

Note that it is not exactly equivalent to static:

static and anonymous namespace are not the same thing. A function defined in an anonymous namespace will have external linkage. But it is guaranteed to live in a uniquely named scope. Indeed we can’t refer to it outside of the translation unit it is defined in because it is unnamed.

… so for very large C++ programs it’s still worth using -fvisibility=hidden and decorating the functions you do want to be visible to the linker, even when using anonymous namespaces.


  • This appears to be a comment rather than an answer to the question.

    – Cubic

    yesterday


  • This doesn't answer the question.

    – Lundin

    yesterday

  • "in C++ it is preferred" Preferred by who?

    – HolyBlackCat

    yesterday

  • @HolyBlackCat core guidelines; I've added the link.

    – Den-Jason

    yesterday

  • static and anonymous namespace are not the same thing. A function defined in an anonymous namespace will have external linkage. But it is guaranteed to live in a uniquely named scope. Indeed we can't refer to it outside of the translation unit it is defined in because it is unnamed.

    – Fareanor

    yesterday


3

The compiler doesn’t see "definition in source file, no declarations in header". All it sees is "definition in translation unit". Under your scheme you’d need to give external linkage to every function you intend to use in multiple translation units.

The default makes lots of sense for C, where there are only free functions, and that was kept in C++ for backwards compatibility.
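
For contrast, a small sketch of how the current default works across translation units (the names are made up):

// math_utils.cpp
int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }   // external linkage by default

// caller.cpp
int gcd(int a, int b);   // a declaration (normally placed in a shared header) is all that's needed
int lcm(int a, int b) { return a / gcd(a, b) * b; }

Under the scheme proposed in the question, gcd would additionally need some explicit "export" marker before caller.cpp could link against it.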


  • It doesn't really make sense in C either. External linkage has the downside that it pollutes the global namespace and makes functions harder to inline. I don't know if it was a conscious design decision or not, but with the power of hindsight, it's definitely a mistake.

    – user694733

    yesterday

  • @user694733 see molbdnilo's comment on your question: historically, a large proportion of functions are intended to be used from outside the translation unit they are defined in.

    – Caleth

    yesterday


3

Functions declared with the static keyword at global namespace scope have internal linkage. This means that:

a) if they are declared in a .cpp file, they cannot be accessed from any other compilation unit (other .cpp file).

b) if they are declared in a header, there would be a copy of each function in every compilation unit which included that header file.

c) if they are declared in a module, they cannot be accessed from anywhere else.

Why is the language designed this way? It was the original decision, both in C and C++. In C, header files were a secondary, optional item; you can link a program with zero header files in it. In C++ you need prototypes of functions to be declared in source code before use; in C you didn’t even need that.

C++ uses the same strategy. You could say it follows the principle of least surprise: it would be unexpected for those functions to have internal linkage by default and to require an "extern" keyword (or "export", or some other extended abomination). In C++, anonymous namespaces act as the closest analog to "internal linkage by default".


  • Agreed except that anonymous namespaces have external linkage 🙂

    – Fareanor

    yesterday

  • @Fareanor that's why I wrote "act as closest analog to". Static const variables or functions may even be optimized away early if no address taken, that's not happening with anon namespace.

    – Swift – Friday Pie

    yesterday

  • Yes, I admit I was nitpicking 🙂

    – Fareanor

    yesterday


2

If it is desirable to let programmers declare two slightly different forms of a construct, using a syntactic marker to distinguish an "alternate" form from a primary form, there are at least two sensible ways one could decide which form should be the primary form:

  1. If one form will be used more than the other, make that the primary form.

  2. If the language would be useless without one form, but would be at least somewhat usable without the other, make the first one the primary form.

If one is trying to minimize the amount of effort required to "bootstrap" a compiler onto a new platform, one should seek to omit things that aren’t absolutely necessary to get a minimal compiler up and running. If a compiler will be generating code that needs to interact with any other code that’s already up and running, support for external linkage will be absolutely required. Support for internal linkage may be nice, but far less necessary.


  • The OP asserts that internal linkage should be the norm. I'm not saying I agree with them, but in light of these criteria, perhaps that's a key point to address?

    – John Bollinger

    1 hour ago

  • A compiler which didn't support any kind of external linkage would be rather useless, but I guess I forgot to mention that.

    – supercat

    1 hour ago

  • I agree, @supercat, and that was clear to me. My point is that that leaves your two criteria at odds with each other according to the OP.

    – John Bollinger

    1 hour ago

  • @JohnBollinger: The fact that the criteria would call for different actions means that one's choice of action depends upon which criterion one views as more important. Presumably, the second was more important in the design of C.

    – supercat

    1 hour ago

  • Ok, fair enough.

    – John Bollinger

    1 hour ago


