Benefit of endless-loops without side-effects in C++ being UB compared to C?

In C++, loops such as

for(;;) {}

are UB, whereas they are not in C.

In https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2809r0.html it is stated that there are good reasons for doing so. Are there any simple examples that make this clear?


  • I suppose with "good reasons" you refer to the rationale but are looking for a concrete example, right?

    – 463035818_is_not_an_ai

    17 hours ago

  • Yes, exactly! I'm looking for a simple example of an optimization enabled by that possibility.

    – wimalopaan

    17 hours ago

  • The main benefit is to add yet another case of UB to the vast tome of endless UB sometimes referred to as C++…

    – Lundin

    16 hours ago

2 Answers

The reason is purely optimization.

If the compiler can assume that all loops without side-effects terminate, it does not have to prove that.

If non-terminating loops without side effects were allowed, the compiler could perform certain optimizations only if it could prove termination. That is impossible in general, so it would turn into a pattern-recognition game. And for what benefit?

The underlying issue is that non-termination is a kind of side effect in itself. Any observable effect that is certain to happen after the loop terminates is observed if and only if the loop terminates, even though the loop itself has no effects.

Of course, exactly the same reasoning applies to if(expr) return;: the compiler is not allowed to move code that follows the if above it unless it can prove expr is false. But if is a fundamental control-flow mechanism, while non-terminating loops are not (IMHO).
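A minimal sketch of that analogy (observable_effect is a hypothetical function assumed to have some observable behavior):

void observable_effect();   // hypothetical: a call with observable behavior

void f(bool expr) {
    if (expr) return;
    // This call is observed if and only if expr is false, so the compiler may
    // hoist it above the `if` only when it can prove expr is false -- the same
    // reasoning applies to code placed after a loop that might not terminate.
    observable_effect();
}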

Take the following code.

int collatz_conjecture(int i){
    while(i!=1){
        if ((i%2)==0)
            i/=2;
        else
            i=i*3+1;
    }
    return i;
}

int main(){
    collatz_conjecture(10);
    
    return 5;
}

With -O3, gcc compiles it to:

collatz_conjecture(int):
        mov     eax, 1
        ret
main:
        mov     eax, 5
        ret

So, did the compiler prove the Collatz conjecture in order to determine that it should return 1 for all numbers? Of course not; that is just one of the optimizations allowed by the termination assumption (and where UB can happen). The only way the loop can terminate is with i==1, therefore the compiler can assume i==1 after the loop and use that for further optimization: the function always returns 1 and can thus be reduced to just that.

A more useful example is interleaving of loops. If you have

loop A
loop B

the compiler is allowed to interleave them even without knowing that A terminates. Many vectorization optimizations rely on this assumption.
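For instance, here is a hedged sketch (illustrative names; assume a and b do not alias). Loop A's trip count depends on data the compiler cannot see, so termination cannot be proven; because C++ lets the compiler assume it anyway, the compiler may interleave or vectorize loop B's stores with loop A instead of waiting for loop A to finish, even though in the abstract machine those stores never happen if loop A runs forever.

int interleave_example(const int* a, int* b, int n) {
    int i = 0;
    while (a[i] != 0)            // loop A: scans for a sentinel that may never appear
        ++i;
    for (int j = 0; j < n; ++j)  // loop B: independent of loop A
        b[j] += 1;
    return i;                    // keep loop A's result live
}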

Similarly, reordering independent after-loop operations to before the loop assumes the loop will terminate.
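A hedged sketch of that case (illustrative names): the store to *done does not depend on anything the loop computes, so under the termination assumption the compiler may move it before the loop; since x is not even used afterwards, it may drop the loop entirely and keep only the store.

void wait_then_flag(unsigned x, int* done) {
    while (x != 1)                        // no side effects; termination is data-dependent
        x = (x % 2) ? 3 * x + 1 : x / 2;
    *done = 1;                            // independent of the loop, may be hoisted above it
}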

EDIT: I meant Collatz conjecture, not Goldbach…


  • "did the compiler prove the Goldbach conjecture" Just restricted to int range, (before overflow of int, so another UB) 😉

    – Jarod42

    17 hours ago

  • Does that mean that in C this kind of optimization is not possible?

    – wimalopaan

    17 hours ago


  • Proving termination is not an impossible feat in general. What is currently impossible is doing it within a constrained/bounded time – the problem is mathematically described as NP-hard. Developers demand fast optimising compilers, and vendors compete on that basis. Compiler optimisations that require solving an NP-hard problem in tractable (non-polynomial) time are impossible within current state of scientific/mathematical knowledge (anyone who actually manages to solve any NP-hard problem in less than polynomial time will, almost certainly, get a Nobel prize)

    – Peter

    16 hours ago

  • @Peter what? NP-hard problems are solvable in a constrained amount of time. That's pretty much how "NP-hard" is defined. Loop termination is undecidable, not NP-hard or any other kind of hard. It's possible to prove for a given loop whether it will terminate or not, but it is not possible to devise an algorithm that can determine this for any loop. This was proven nearly a century ago in one of the foundational papers of CS; there might be a prize for proving that we had it wrong for the past century-ish, but I wouldn't put my money on this actually happening.

    – Cubic

    16 hours ago


  • Also worth pointing out that "NP-hard" doesn't mean "impractical". This is a common myth. Many NP-hard problems are pretty amenable to stochastic approaches, heuristics and approximations for practical problem inputs. It's not uncommon for NP-hard or even undecidable problems to pop up in compilers, which are usually solved by heuristics and "try our best and report an error if it doesn't work out" respectively.

    – Cubic

    15 hours ago




The primary benefit is simplicity of specification. The as-if rule cannot really accommodate the notion that a program can have behavior which is defined and yet may be observably inconsistent with sequential program execution. Further, the authors of the C and C++ Standards use the phrase "Undefined Behavior" as a catch-all for situations where they didn’t think it necessary to exercise jurisdiction, in many cases because they expected that compiler writers would be better placed than the Committee to understand their customers’ needs.

Most of the useful optimizations could be facilitated by specifying that if no individual action within a loop would be sequenced relative to some later piece of code, execution of the loop as a whole need not be treated as sequenced relative to it either. This is a bit "hand-wavy" with regard to what code comes "after" an endless loop, but it makes clear what such a rule would allow a compiler to do. Among other things, if no individual action within a loop would be sequenced before program termination, then execution of the loop as a whole may be omitted entirely.

An important principle that such a rule would embody, which is missing from the present rule, is that a transform that would introduce a dependency between code within a loop and code outside the loop would also introduce a sequencing relationship. If a loop would exit if a certain condition is true, and code after the loop checks that condition, a compiler may use the result from the earlier check to avoid having to repeat the test, but that would mean that code after the loop was relying upon a value computed within the loop.
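A hedged sketch of that situation (illustrative names; consume is a hypothetical function with an observable effect):

void consume(int);   // hypothetical: something with an observable effect

void find_and_consume(const int* p, int x) {
    while (*p != 0)  // exits only once *p == 0
        ++p;
    if (*p == 0)     // the compiler may reuse the loop's exit test and treat this as taken...
        consume(x);  // ...but then this call relies on a value computed within the loop
}

Under the rule sketched above, reusing the loop's exit test would itself create a sequencing relationship, so consume(x) could not be hoisted ahead of the loop.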

Note that giving compilers the freedom to resequence code while respecting data dependencies would not preclude the possibility that a correct application might get stuck in an endless loop for some inputs, provided that terminating the application by outside means would be acceptable in those cases. Allowing them to resequence code without regard for the resulting data dependencies, however, primarily has the ironic effect of speeding up "erroneous" programs that cannot be relied upon to satisfy application requirements, while offering no benefit to most programs that add extra checking (not otherwise needed to satisfy application requirements) to guard against endless loops.
