19

In many discussions about undefined behavior (UB), the point of view has been put forward that the mere presence in a program of any construct that has UB permits a conforming implementation to do just anything (including nothing at all). My question is whether this should be taken in that sense even in those cases where the UB is associated with the execution of code, while the behaviour (otherwise) specified in the standard stipulates that the code in question should not be executed (possibly only for specific input to the program; it might not be decidable at compile time).

Phrased more informally, does the smell of UB permit a conforming implementation to decide that the whole program stinks, and to refuse to execute correctly even the parts of the program for which the behaviour is perfectly well defined? An example program would be

#include <iostream>

int main()
{
    int n = 0;
    if (false)
      n=n++;   // Undefined behaviour if it gets executed, which it doesn't
    std::cout << "Hi there.\n";
}

For clarity, I am assuming the program is well-formed (so in particular the UB is not associated with preprocessing). In fact I am willing to restrict to UB associated with "evaluations", which clearly are not compile-time entities. The definitions pertinent to the example given are, I think (emphasis is mine):

Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread (1.10), which induces a partial order among those evaluations

The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either ... or a value computation using the value of the same scalar object, the behavior is undefined.

It is implicitly clear that the subjects in the final sentence, "side effect" and "value computation", are instances of "evaluation", since that is what the relation "sequenced before" is defined for.

I posit that in the above program, the standard stipulates that no evaluations occur for which the condition in the final sentence is satisfied (unsequenced relative to each other and of the described kind), and that therefore the program does not have UB; it is not erroneous.

In other words I am convinced that the answer to the question of my title is negative. However I would appreciate the (motivated) opinions of other people on this matter.

Maybe an additional question for those who advocate an affirmative answer: would that mean that the proverbial reformatting of your hard drive might occur when an erroneous program is compiled?

Some related pointers on this site:

Marc van Leeuwen
  • In your case the programme will run OK because the point of UB is never reached, but it lacks portability and sustainability, because another compiler, now or in the future, might treat this UB as an error. – 101010 Jun 12 '14 at 14:20
  • Since this question refers to standard; please choose which one c or c++. – this Jun 12 '14 at 14:20
  • Thanks for pointing to the duplicate question (although if I understand correctly it is not valid for C++), and for retagging (indeed C was not my concern) so swiftly. It is another tribute to the lousy search features on this site, given the time I spent in vain to locate an earlier question on this subject. – Marc van Leeuwen Jun 12 '14 at 14:26
  • Maybe somebody wants to comment whether the quasi-unanimous "not UB" found at the "duplicate" question is portable from C to C++? – Marc van Leeuwen Jun 12 '14 at 14:36
  • @Yann I did not ask about re-usability of code; nobody is saying this is good practice. Nor about compiler warnings (I appreciate those). Just whether there is any ground to brandish UB here. – Marc van Leeuwen Jun 12 '14 at 14:38
  • Code paths not taken cannot invoke undefined behavior; to think otherwise leads to madness... Every null pointer check followed by dereference would be a problem, for example. See also http://stackoverflow.com/questions/7961067/ – Nemo Jun 12 '14 at 14:43
  • A similar question about C: [Can code that will never be executed invoke undefined behavior?](http://stackoverflow.com/q/18385020/827263) – Keith Thompson Jun 12 '14 at 15:07
  • [this answer and comments](http://stackoverflow.com/a/13121406/476681) are saying that the whole program is broken, regardless of whether the part that causes UB executes or not. – BЈовић Jun 12 '14 at 15:09
  • @BЈовић That answer is simply wrong. – n. 'pronouns' m. Jun 12 '14 at 15:14
  • Note that `clang` and `gcc` treat undefined behavior in a [constexpr as an error](http://stackoverflow.com/questions/21319413/why-do-constant-expressions-have-an-exclusion-for-undefined-behavior) even in a constexpr function that is not invoked. – Shafik Yaghmour Jun 12 '14 at 15:14
  • @Nemo: This is C++ we are talking about. Are you sure a never-taken code path couldn't perhaps result in e.g. a template instantiation that, elsewhere, results in UB...? Not a loaded question -- I am genuinely not sure if this couldn't be the case. I just wouldn't trust C++ with *not* coming up with something convoluted like this. ;-) – DevSolar Jun 12 '14 at 15:15
  • @ShafikYaghmour Potential UB in an unevaluated constexpr function does not by itself lead to UB in the program. It might make the program ill-formed. – n. 'pronouns' m. Jun 12 '14 at 15:27
  • @n.m. the question says `erroneous` not undefined. – Shafik Yaghmour Jun 12 '14 at 15:29
  • @n.m. "[Pete Becker](http://en.wikipedia.org/wiki/Pete_Becker) is simply wrong"? You say that without a shred of supporting evidence? I bet he knows more about C++ than everyone else who's posted on this thread combined and doubled. REFERENCES AND EVIDENCE, PEOPLE. – Tony Delroy Jun 12 '14 at 16:03
  • @TonyD Argumentum ab auctoritate? Excellent. I have presented my evidence in my answer. Where's Pete Becker's evidence? – n. 'pronouns' m. Jun 12 '14 at 16:22
  • @TonyD Clarification: Pete Becker's answer is wrong as far as the question about never-executed code is concerned (if it pertains to that question at all). If invalid code is ever executed, it of course needs to be fixed, but that's not what we are talking about. – n. 'pronouns' m. Jun 12 '14 at 16:35
  • The answer by Pete Becker is in itself right, but the comment above by @BЈовић is wrong, because Pete is not saying "regardless whether the part that causes UB executes or not". In the linked-to question the UB is inevitably executed, and the implementation has the right to not do what the standard prescribes even before "executing" the UB. But that is not the same thing as behaving badly _without being sure that UB will be executed_ (or even being certain it won't). – Marc van Leeuwen Jun 12 '14 at 20:19
  • ...Concretely an optimizer might not compile the conditional branch (even if it had more stuff before the UB) at all, since executing it would involve UB; of course in the example, the same decision might much more simply result from recognising the branch as dead code. But the result would be a program that does precisely what it is supposed to, no UB. – Marc van Leeuwen Jun 12 '14 at 20:23

9 Answers

11

If a side effect on a scalar object is unsequenced relative to etc

Side effects are changes in the state of the execution environment (1.9/12). A change is a change, not an expression that, if evaluated, would potentially produce a change. If there is no change, there is no side effect. If there is no side effect, then no side effect is unsequenced relative to anything else.

This does not mean that any code which is never executed is UB-free (though I'm pretty sure most of it is). Each occurrence of UB in the standard needs to be examined separately. (The stricken-out text is probably overly cautious; see below).

The standard also says that

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

(emphasis mine)

This, as far as I can tell, is the only normative reference that says what the phrase "undefined behavior" means: an undefined operation in a program execution. No execution, no UB.
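
As a minimal sketch of this reading (the helper name `twice_or_saturate` is hypothetical, not something from the standard or the question): the multiplication below would overflow for some arguments, which is UB for signed integers, but an execution only contains that undefined operation if it actually evaluates `n * 2` with such an argument. Since, as far as I can tell, compilers must reject UB encountered while evaluating a constant expression, the `static_assert` lines double as a compile-time check that the guarded multiplication is not evaluated for the extreme inputs.

```cpp
#include <climits>

// Hypothetical helper: doubles n, saturating instead of overflowing.
// `n * 2` is evaluated only when the result is representable as an int.
constexpr int twice_or_saturate(int n)
{
    return n > INT_MAX / 2 ? INT_MAX    // multiplication not evaluated
         : n < INT_MIN / 2 ? INT_MIN    // multiplication not evaluated
         : n * 2;                       // reached only when it cannot overflow
}

static_assert(twice_or_saturate(21) == 42, "ordinary input");
static_assert(twice_or_saturate(INT_MAX) == INT_MAX, "guarded against overflow");
static_assert(twice_or_saturate(INT_MIN) == INT_MIN, "guarded against overflow");
```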

n. 'pronouns' m.
  • This is by far the best answer. Still worth noting that undefined behavior, if invoked, can "travel backwards in time"... For example, if a conforming compiler can prove your program *always* invokes UB, it is free to compile the entire program into a no-op. But the UB still has to occur on an actual taken code path. (I also feel compelled to note that `constexpr` and UB have funny interactions in C++11/14/xxx that I am not sure I will ever fully understand.) – Nemo Jun 13 '14 at 01:05
  • "A conforming implementation executing a **well-formed** program" may indicate that for "not-so-well" formed programs all bets are off. – BЈовић Jun 13 '14 at 07:00
  • @BЈовић The standard says which programs are ill-formed. For some of them a diagnostic is required, for others not. With this latter category all bets are indeed off, the standard says so explicitly. – n. 'pronouns' m. Jun 13 '14 at 09:01
  • @Nemo: Concerning `constexpr` (more precisely _constant expressions_ which may or may not involve the `constexpr` keyword), those expressions _must_ be computed at compile time as they occur in contexts (bounds of an array type occurring in a struct, for instance) where the compiler must know their value (for knowing the offset of subsequent fields, in the example). I would expect a program where the evaluation of such an expression either fails to produce a compile-time constant, or invokes (say) division by 0, to be called ill-formed. But I don't have chapter and verse for that ready. – Marc van Leeuwen Jun 19 '14 at 08:57
  • @Nemo: If a program reads a variable of type `volatile char` and a compiler determines that it will engage in UB regardless of the value read, would it be able to draw any inferences backward before the read, or would it have to allow for the possibility that a read of a volatile variable might trigger a signal that prevents execution from proceeding far enough to reach the UB? – supercat Apr 09 '15 at 20:03
6

No. Example:

struct T {
    void f() { }
};
int main() {
    T *t = nullptr;
    if (t) {
        t->f(); // UB if t == nullptr, but the enclosing check means this call is never executed in that case
    }
}
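
The same pattern as a small self-checking sketch (the names `value_or` and `answer` are illustrative and not part of the example above): the dereference is well defined in every execution that reaches it, because the null case takes the other branch.

```cpp
// Hypothetical helper: reads through p only when p is non-null.
constexpr int value_or(const int* p, int fallback)
{
    return p ? *p          // evaluated only for a non-null pointer
             : fallback;   // the null case never reaches the dereference
}

constexpr int answer = 42;

static_assert(value_or(&answer, 0) == 42, "non-null pointer is dereferenced");
static_assert(value_or(nullptr, 7) == 7, "null pointer is never dereferenced");
```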
j12x
  • That's a good example, but the answer would be improved by an explanation, references, etc. – Manu343726 Jun 12 '14 at 20:29
  • You answered "No." to what? Also, what is your example supposed to show? Weird how many upvotes such a bad answer got. – BЈовић Jun 13 '14 at 07:02
  • @BЈовић The "no" must be taken as an answer to the title question: this does not make a program erroneous. The example is a clear case of why it must be so, +1. – Marc van Leeuwen Jun 19 '14 at 08:35
6

Deciding whether a program will perform an integer division by 0 (which is UB) is in general equivalent to the halting problem. There is no way a compiler can determine that, in general. And so the mere presence of possible UB cannot logically affect the rest of the program: a requirement to that effect in the standard would require each compiler vendor to provide a halting problem solver in the compiler.

Even simpler, the following program has UB only if the user inputs 0:

#include <iostream>
using namespace std;

auto main() -> int
{
    int x;
    if( cin >> x ) cout << 100/x << endl;
}

It would be absurd to maintain that this program in itself has UB.

Once the undefined behavior occurs, however, then anything can happen: the further execution of code in the program is then compromised (e.g. the stack might have been fouled up).
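
A hedged sketch of how such a program can keep every execution free of the undefined division (the helper names `can_divide` and `quotient_or` are made up for illustration): the check runs before the division, so inputs that would make `/` undefined never reach it.

```cpp
#include <climits>

// Hypothetical predicate: true when numerator / denominator is well defined for int.
constexpr bool can_divide(int numerator, int denominator)
{
    return denominator != 0
        && !(numerator == INT_MIN && denominator == -1);  // quotient would overflow
}

// Hypothetical helper: divides when that is defined, otherwise yields fallback.
constexpr int quotient_or(int numerator, int denominator, int fallback)
{
    return can_divide(numerator, denominator) ? numerator / denominator
                                              : fallback;
}

static_assert(quotient_or(100, 4, -1) == 25, "ordinary division");
static_assert(quotient_or(100, 0, -1) == -1, "division by zero is never evaluated");
static_assert(quotient_or(INT_MIN, -1, 0) == 0, "overflowing quotient is never evaluated");
```

In the program above, this would amount to printing `quotient_or(100, x, 0)`, or reporting an error when `can_divide(100, x)` is false.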

Manu343726
Cheers and hth. - Alf
  • I know that not all UB can be (easily) detected even at runtime, let alone be decided by mere inspection of the program; this is why behaving as if nothing special has happened is one of the legal options for UB. But the question is whether an implementation that does detect that an expression would produce UB may brandish that fact to defend aberrant behaviour even if execution of that code does not (or can not be proven to) take place. The answer is NO, I believe, but even if it were YES it would not _oblige_ anybody to solve the halting problem. – Marc van Leeuwen Jun 12 '14 at 19:22
  • @MarcvanLeeuwen: you're right. the first part of my answer here is ... dumb. not a valid argument. argh. thanks, fixing. – Cheers and hth. - Alf Jun 12 '14 at 22:10
  • @Manu343726: rather, what's the point of arbitrarily using two different function declaration syntaxes. we now have one syntax that covers everything. no reason not to use it. – Cheers and hth. - Alf Jun 13 '14 at 08:23
  • Don't use the term "once Undefined Behavior occurs", but rather "Once conditions have been established which would make Undefined Behavior inevitable". Language in the C Standard which may have been intended to make Undefined Behavior be unsequenced relative to other code has instead been interpreted by some compiler writers to imply that it should be bound by laws of neither time nor causality. – supercat Apr 19 '15 at 22:55
3

In the general case the best we can say here is that it depends.

One case where the answer is no arises when dealing with indeterminate values. The latest draft clearly makes it undefined behavior to produce an indeterminate value during an evaluation, with some exceptions, but the code sample shows how subtle it can be:

[ Example:

int f(bool b) {
  unsigned char c;
  unsigned char d = c; // OK, d has an indeterminate value
  int e = d;           // undefined behavior
  return b ? d : 0;    // undefined behavior if b is true
}

end example ]

so this line of code:

return b ? d : 0;

is only undefined if b is true. This seems to be the intuitive approach and seems to be how John Regehr sees it as well, if we read It’s Time to Get Serious About Exploiting Undefined Behavior.

In this case the answer is yes, the code is erroneous even though we are not calling the code invoking undefined behavior:

constexpr const char *str = "Hello World" ;      

constexpr char access()
{
    return str[100] ;
}

int main()
{
}

clang chooses to make access erroneous even though it is never invoked (see it live).
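
For contrast, a sketch with a hypothetical variant of `access` that takes the index as a parameter (`char_at` and `greeting` are names introduced here, not from the code above). As far as I understand the constexpr rules, this variant is acceptable because some arguments do yield a constant expression; only an out-of-range use in a context that requires a constant expression, like the commented-out line, has to be rejected.

```cpp
constexpr const char* greeting = "Hello World";

// Hypothetical variant of access(): the index is a parameter, so whether the
// read is in range depends on the call, not on the function definition alone.
constexpr char char_at(int index)
{
    return greeting[index];
}

static_assert(char_at(0) == 'H', "in-range read is a constant expression");
static_assert(char_at(11) == '\0', "the terminating null is still in range");

// constexpr char bad = char_at(100);  // out of range: not a constant expression
```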

Shafik Yaghmour
  • if that code sample is from a standard draft then it should be corrected. it's never UB to deal with byte values, regardless of how indeterminate they are. since bytes have no invalid bitpatterns. related, probably *all* examples using iostream output in C++98 and c++03 were incorrect. also the examples about UB in expressions were incorrect. happily examples are **non-normative**. they're just examples. – Cheers and hth. - Alf Jun 12 '14 at 18:49
  • @Cheersandhth.-Alf you should read [Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++1y?](http://stackoverflow.com/questions/23415661/has-c-standard-changed-with-respect-to-the-use-of-indeterminate-values-and-und) the latest draft has changed the language here a lot. – Shafik Yaghmour Jun 12 '14 at 18:55
  • yeah that's what I commented on. i only observed the modded square wheels. i have no desire to examine the design of the car's now modded electrical systems etc., thank you. – Cheers and hth. - Alf Jun 12 '14 at 18:57
  • @Cheersandhth.-Alf the examples are consistent with the normative text as far as I can tell and the text is pretty explicit now and does not leave much room for interpretation. – Shafik Yaghmour Jun 12 '14 at 18:58
  • ouch, i was afraid of that. it's so easy to destroy compared to creating. :( – Cheers and hth. - Alf Jun 12 '14 at 19:00
3

There's a clear divide between inherent undefined behaviour, such as n=n++, and code that can have defined or undefined behaviour depending on the program state at runtime, such as x/y for ints. In the latter case the program is required to work unless y is 0, but in the first case the compiler's asked to generate code that's totally illegitimate - it's within its rights to refuse to compile, it may just not be "bullet proofed" against such code and consequently its optimiser state (register allocations, records of which values may have been modified since read etc) gets corrupted resulting in bogus machine code for that and surrounding source code. It may be that early analysis recognised an "a=b++" situation and generated code for the preceding if to jump over a two byte instruction, but when n=n++ is encountered no instruction was output, such that the if statement jumps somewhere into the following opcodes. Anyway, it's simply game over. Putting an "if" in front, or even wrapping it in a different function, isn't documented as "containing" the undefined behaviour... bits of code aren't tainted with undefined behaviour - the Standard consistently says "the program has undefined behaviour".

user433534
  • When you say "it is within [the compiler's] rights to refuse to compile", that is true in the sense that it does not have to emit any code for the UB statement (or even for any path that inevitably will run into a UB statement). It may not refuse to compile your program (if it is well formed), and on any input where the UB statement would never be encountered, the program should behave as specified. For instance, if you have just divided by `n` and then say `if (n==0) something;`, then the compiler may suppress that conditional, since it deals only with a UB situation. It will still work for `n!=0`. – Marc van Leeuwen Jun 19 '14 at 08:30
  • In conclusion, I disagree with "it's simply game over", in particular for the example in the question where the UB statement is _never_ executed. – Marc van Leeuwen Jun 19 '14 at 08:33
  • Yeah - can see from the question you only want to tell the world your own view... knock yourself out mate. – user433534 Jun 20 '14 at 02:52
  • Not really. Yes, I had my doubts whether people who interpret the freedom given about UB in so extreme a way as to apply it even to purely hypothetical UB (as you do) could be right, and I was hoping for arguments to the contrary, but I did have a real question at the time. The answers I received (for instance the accepted one) did provide the convincing argument I hoped for, so now I don't have that doubt any more. – Marc van Leeuwen Jun 20 '14 at 03:29
  • @MarcvanLeeuwen: Provided that there exists at least one (possibly contrived and useless) source text which exercises the translation limits given in the Standard, and which an implementation would process as described by the Standard, an implementation's ability to behave meaningfully when given any other source text is purely a Quality of Implementation issue; nothing an implementation would do given any other source text would render it non-conforming. – supercat May 06 '20 at 20:42
3

It should be, if not "shall".

Behavior, by definition from ISO C (no corresponding definition found in ISO C++ but it should be still somehow applicable), is:

3.4

1 behavior

external appearance or action

And UB:

WG21/N4527

1.3.25 [defns.undefined]

undefined behavior

behavior for which this International Standard imposes no requirements [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]

Despite "to behaving during translation" above, the word "behavior" used by ISO C++ is mainly about the execution of programs.

WG21/N4527

1.9 Program execution [intro.execution]

1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.⁵

2 Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects.⁶ Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).

3 Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, evaluation of expressions in a new-initializer if the allocation function fails to allocate memory (5.3.4)). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine can thus have more than one possible execution for a given program and a given input.

4 Certain other operations are described in this International Standard as undefined (for example, the effect of attempting to modify a const object). [ Note: This International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]

5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

6) This documentation also includes conditionally-supported constructs and locale-specific behavior. See 1.4.

It is clear that undefined behavior is caused by a specific language construct being used wrongly or in a non-portable way (that is, in a way not conforming to the standard). However, the standard mentions nothing about which specific portion of code in a program would cause it. In other words, "having undefined behavior" is a property (concerning conformance) of the whole program being executed, not of any smaller part of it.

The standard could give a stronger guarantee, making the behavior well-defined whenever some specific code is not executed, only if there were a way to map the C++ code to the corresponding behavior precisely. This is hard (if not impossible) without a detailed semantic model of execution. In short, the operational semantics given by the abstract machine model above is not enough to achieve the stronger guarantee. But anyway, ISO C++ will never be JVMS or ECMA-335, and I don't expect there will ever be a complete set of formal semantics describing the language.

A key problem here is the meaning of "execution". Some people think "executing a program" means having the program run. This is not quite true. Note that the representation of the program executed in the abstract machine is not specified. (Also note "this International Standard places no requirement on the structure of conforming implementations".) The code being executed here can literally be C++ code (not necessarily machine code or some other form of intermediate code, which is not specified by the standard at all). This effectively allows the core language to be implemented as an interpreter, an online partial evaluator or some other monster translating C++ code on the fly. As a result, there is actually no way to separate the phases of translation (defined by ISO C++ [lex.phases]) completely from the process of execution without knowledge about specific implementations. Thus, it is necessary to allow UB to occur during translation when it is too difficult to specify portable well-defined behavior.

Besides the problems above, perhaps for most ordinary users one (non-technical) reason is enough: it is simply unnecessary to provide the stronger guarantee, which would permit bad code and defeat one of the (probably most important) useful aspects of UB itself: to encourage quickly throwing away (unnecessarily) nonportable, smelly code without any effort to "fix" it, which would eventually be in vain.

Additional notes:

Some words are copied and adapted from one of my replies to this comment.

FrankHB
0

A C compiler is allowed to do anything it likes as soon as a program enters a state from which there is no defined sequence of events that would allow the program to avoid invoking Undefined Behavior at some point in the future (note that any loop which does not have any side effects, and which does not have an exit condition which a compiler would be required to recognize, invokes Undefined Behavior in and of itself). The compiler's behavior in such cases is bound by the laws of neither time nor causality. In situations where Undefined Behavior occurs in an expression whose result is never used, some compilers won't generate any code for the expression (so it will never "execute"), but that won't prevent compilers from using the Undefined Behavior to make other inferences about program behavior.

For example:

void maybe_launch_missiles(void)
{      
  if (should_launch_missiles())
  {
    arm_missiles();
    if (should_launch_missiles())
      launch_missiles();
  }
  disarm_missiles();
}
int foo(int x)
{
  maybe_launch_missiles();
  return x<<1;
}

Under the current C standard, if the compiler could determine that disarm_missiles() would always return without terminating but the three other external functions called above might terminate, the most efficient standard-compliant replacement for the statement foo(-1); (return value ignored) would be should_launch_missiles(); arm_missiles(); should_launch_missiles(); launch_missiles();.

Program behavior will only be defined if either call to should_launch_missiles() terminates without returning, if the first call returns non-zero and arm_missiles() terminates without returning, or if both calls return non-zero and launch_missiles() terminates without returning. A program which works correctly in those cases will abide by the standard regardless of what it does in any other situation. If returning from maybe_launch_missiles() would cause Undefined Behavior, a compiler would not be required to recognize the possibility that either call to should_launch_missiles() could return zero.

As a consequence, with some modern compilers, the effect of left-shifting a negative number may be worse than anything that could be caused by any kind of Undefined Behavior on a typical C99 compiler on platforms that separate code and data spaces and trap stack overflow. Even if code engaged in Undefined Behavior which could cause random control transfers, there would be no means by which it could cause arm_missiles() and launch_missiles() to be called consecutively without an intervening call to disarm_missiles() unless at least one call to should_launch_missiles() returned a non-zero value. A hyper-modern compiler, however, may negate such protections.

supercat
0

In the dialect processed by gcc with full optimizations enabled, if a program contains two constructs which would behave identically in cases where both are defined, reliable program operation requires that any code that would switch among them only be executed in cases where both are defined. For example, when optimizations are enabled, both ARM gcc 9.2.1 and x86-64 gcc 10.1 will process the following source:

#include <limits.h>

#if LONG_MAX == 0x7FFFFFFF
typedef int longish;
#else
typedef long long longish;
#endif

long test(long *x, long *y)
{
    if (*x)
    {
        if (x==y)
            *y = 1;
        else
            *(longish*)y = 1;
    }
    return *x;
}

into machine code that will test if x and y are equal, set *x to 1 if they are and *y to 1 if they aren't, but return the previous value of *x in either case. For the purpose of determining whether anything might affect *x, gcc decides that both branches of the if are equivalent, and thus only evaluates the "false" branch. Since that can't affect *x, it concludes that the if as a whole can't either. That determination is unswayed by its observation that on the true branch, the write to *y can be replaced with a write to *x.

supercat
-2

In the context of a safety-critical embedded system, the posted code would be considered defective:

  1. The code should not pass code review and/or standards compliance (MISRA, etc)
  2. Static analysis (lint, cppcheck, etc) should flag this as a defect
  3. Some compilers can flag this as a warning (implying a defect, as well.)
Throwback1986