8

I know that the following is undefined because I am trying to read and write the value of a variable in the same expression:

int a=5;
a=a++;

But if that is so, why is the following code snippet not undefined?

int a=5;
a=a+1;

Here, too, I am trying to modify the value of a and write to it at the same time.

Also, please explain why the standard does not fix or remove this undefined behavior, even though the committee knows that it is undefined.

haccks
OldSchool
  • 5
    You're *not* modifying the value of `a` on the right-side of the second expression. There is no sequence violation there, so your "also" is erroneous. – WhozCraig Mar 24 '14 at 07:57
  • 1
    As to the second question: the answer is probably "because it's so convenient", and the philosophy of C sometimes appears to be "if the user absolutely wants to shoot himself in the foot, we provide him with all the means to conveniently do so" ;). Btw., if I compile your first two lines with `gcc -Wall` I get: `warning: multiple unsequenced modifications to 'a' [-Wunsequenced]` for `a = a++;`. – mfro Mar 24 '14 at 08:03

5 Answers

6

why the following code snippet is not undefined

int a=5;
a=a+1;  

The Standard states that

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.

In the case of a = a + 1;, a is modified only once, and the prior value of a is accessed only to determine the value to be stored in a.
In the case of a = a++;, a is modified more than once: by the ++ operator in the sub-expression a++ and by the = operator when the result is assigned to the left-hand a. The standard does not define which modification, the one by ++ or the one by =, takes place first.

Almost all modern compilers, when run with the -Wall flag, raise a warning when compiling the a = a++; snippet, such as:

[Warning] operation on 'a' may be undefined [-Wsequence-point]
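
To make the quoted rule concrete, here is a minimal sketch of two well-defined ways to express what a = a++; might have been intended to do. The function names increment_once and take_and_advance are hypothetical, chosen only for this illustration; the undefined statement itself appears only in a comment.

```c
/* Intent 1: add one to the value. The object is modified exactly once,
   and its prior value is read only to compute the value to be stored. */
int increment_once(int a)
{
    a = a + 1;
    return a;
}

/* Intent 2: obtain the old value and advance the variable.
   The read and the single write are separate statements. */
int take_and_advance(int *a)
{
    int old = *a;   /* read the prior value */
    *a = old + 1;   /* the only modification of *a */
    return old;
}

/* int a = 5; a = a++;   <- undefined: a is modified twice
                            without an intervening sequence point */
```

Both helpers assume the value is smaller than INT_MAX, since signed overflow is itself undefined.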

Further reading: How can I understand complex expressions like the ones in this section, and avoid writing undefined ones?

haccks
  • please give me some detail – OldSchool Mar 24 '14 at 07:57
  • @haccks Won't the value of `a` after the statement `a = a++;` is executed always be `1` more than it was before the statement was executed? It doesn't matter whether the increment or the assignment takes place first. Is it undefined simply because the standard says so? – ajay Mar 24 '14 at 08:42
  • 1
    @ajay; Partially, yes. At a beginner level, you can say that it invokes UB because the standard says so. The answer to why it invokes UB lies in the [execution of statements by CPU](http://stackoverflow.com/a/21671069/2455888) (CPU architecture). – haccks Mar 24 '14 at 09:01
6

The reason why it is undefined is not that you read and write, it is that you write twice.

a++ means read a and increment it after reading, but we don't know if the ++ will happen before the assignment with = (in which case the = will overwrite with the old value of a) or after, in which case a will be incremented.

Just use a++; :)

a = a + 1 does not have the problem as a is only written once.
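
The two orderings described above can be spelled out as separate, well-defined statements. This is only a sketch: the helper names are hypothetical, and because a = a++; is undefined, a real compiler is not obliged to produce either of these results.

```c
/* Ordering A: the increment is stored first, then the assignment
   overwrites it with the old value (the value of the expression a++). */
int increment_then_assign(int a)
{
    int old = a;    /* value of the expression a++ */
    a = old + 1;    /* side effect of ++ */
    a = old;        /* = stores the old value last */
    return a;
}

/* Ordering B: the assignment stores the old value first,
   then the increment is stored. */
int assign_then_increment(int a)
{
    int old = a;    /* value of the expression a++ */
    a = old;        /* = stores the old value */
    a = old + 1;    /* side effect of ++ lands last */
    return a;
}
```

Starting from 5, ordering A leaves 5 and ordering B leaves 6, which is why writing a++; on its own is the unambiguous choice.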

CMoi
  • 1
    +1 Ironically, this specific example is a tough nut to crack. It violates the sequence-point mandate of the standard, but *how* is not necessarily intuitive. `a++` does not mean read, then increment. In summary, it means (1) read-to-temp, (2) increment, and (3) return the temp from (1) as the expression value. Of all of the examples the OP could have chosen, this is one of the more warped. Per 5.2.6 [expr.post.incr]: "The value computation of the ++ expression is sequenced **before** the modification of the operand object." Yeah, like that made it clear. – WhozCraig Mar 24 '14 at 08:16
  • @WhozCraig So the value of `a` after the statement `a = a++;` gets executed is either the old value or old value plus one depending on whether increment takes place first or assignment. Shouldn't it be unspecified behaviour then, instead of undefined behaviour? – ajay Mar 24 '14 at 09:35
  • @WhozCraig You must be confused. Section 5 is “Environment” in all C standards since 1999 (in C89, there was no section 5). In addition, whatever you quoted is not the C definition of post-increment. – Pascal Cuoq Mar 24 '14 at 23:50
  • @PascalCuoq You're right about my confusion in one regard. I was quoting the C++11 standard, not C (I really need to clean up my desktop, all these docs!). My piqued interest in said description stands, but you're quite correct that it is not from the C standard, and for that, rampant apologies. – WhozCraig Mar 25 '14 at 00:32
  • @ajay: If `a` is larger than a machine word (on an 8-bit machine, even `int` would be two words), a statement like `*p=b++;` could be handled many different ways. If `p` aliases `b` and `b` holds 0x04FF, plausible results would not only be 0x500 and 0x4FF, but also 0x5FF and 0x400. – supercat Jul 10 '15 at 21:12
3

The ++ operator adds one to a, meaning the variable a becomes a+1. In effect, the following two statements are equivalent:

a++;
a = a + 1;

The expression a + 1 on its own does not increase a; it only produces a result whose value is a + 1. If you want a to become a+1, you have to assign the result of a + 1 to a with

a = a + 1;

The reason your first statement doesn't work is that you are effectively writing something like

a = (a = a + 1);
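
A small sketch of the distinction drawn in this answer: as standalone statements, a++;, a = a + 1; and a += 1; leave the same value behind, while the value of the expression a++ is the old value. The function names are hypothetical, and both functions assume start is smaller than INT_MAX.

```c
/* Returns 1 if the three statement forms leave the same final value. */
int same_final_value(int start)
{
    int x = start, y = start, z = start;
    x++;            /* post-increment used as a statement */
    y = y + 1;      /* explicit read, add, store */
    z += 1;         /* compound assignment */
    return x == y && y == z;
}

/* Returns the value of the expression a++, which is the old value of a. */
int value_of_post_increment(int start)
{
    int a = start;
    int result = a++;   /* well-defined: result and a are different objects */
    return result;
}
```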
EagleV_Attnam
3

Long story short, you can find every defined behavior in the standard. Everything that is not described there as defined is undefined.

Intuitive explanation to your example:

a=a++;

You want to modify the variable a two times in a single statement.

1) a= //first time
2) a++ //second time

If you look here:

a=a+1;

You modify variable a only once:

a= // (a+1) - doesn't change the value of a

Why doesn't the standard define the behavior of a=a++?

One possible reason is that it leaves the compiler free to optimize. The more cases a standard defines, the less freedom a compiler has to optimize your code. Different architectures can implement their increment instructions differently, so a compiler could not use every processor instruction if some of them would break the behavior the standard prescribes. Likewise, a compiler can sometimes change the evaluation order, but a defined order for modifying something twice would force it to disable such optimizations.

JustAnotherCurious
  • Unspecified behaviors are good for optimizations in cases where any allowable behavior would satisfy program requirements. Undefined Behavior is often far less useful for optimization, since it compels programmers to specify things they don't care about in order to meet those requirements they do care about. – supercat Jul 10 '15 at 21:07
2

Others have already talked about the details of your specific example, so I'll add some general information and tools that help to catch undefined behaviour.

There is no ultimate tool or method for catching undefined behaviour, so even if you use all of these tools, there is no guarantee that nothing in your code is undefined. But IME these will catch quite a lot of the common issues. I'm not listing the standard good practices of software development, such as unit testing, which you should be using anyway.

  • clang(-analyze) has several options that can help with catching undefined behaviour, both at compile-time and at runtime. It has -ftrapv, newly acquired support for canary values, its address sanitizer, -fcatch-undefined-behavior, et cetera.

  • gcc also has several options to catch undefined behaviour, such as mudflap, its address sanitizer, and the stack protector.

  • valgrind is a fantastic tool for finding memory-related undefined behaviour at runtime.

  • frama-c is a static analysis tool that can find and visualize undefined behaviour. Its ability to find dead code (undefined behaviour can oftentimes cause other portions of code to become dead) is pretty useful for tracking down potential security concerns. frama-c has many more advanced features, but can arguably be more difficult to use than...

  • Other commercial static analysis tools that can catch undefined behaviour exist, such as PVS-studio, klocwork, et cetera. These usually cost a lot, though.

  • Compile with different compilers and for strange architectures. If you can, why not compile and run your code on an 8-bit AVR chip? A Raspberry Pi (32-bit ARM)? Compile it to JavaScript using emscripten and run it in V8? Doing this tends to be a practical way of catching undefined behaviour that would cause crashes down the line (but does little or nothing to catch lurking UB that may, for example, cause security issues).
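
As a hedged illustration of how some of the tools listed above are typically invoked, the comment block below shows example build commands. The file name demo.c and the -fsanitize=undefined flag are not taken from this answer; they are assumptions about a reasonably recent GCC or Clang setup. The helper checked_get is hypothetical and shows the kind of guard that prevents an out-of-bounds read, one of the errors such tools report.

```c
#include <stddef.h>
#include <stdbool.h>

/*
 * Example build commands (assumptions: the whole program, including its
 * main function, lives in demo.c and the compiler is installed as `cc`):
 *
 *   cc -Wall -Wextra demo.c               compile-time diagnostics
 *   cc -g -fsanitize=undefined demo.c     runtime checks for many kinds of UB
 *   cc -g -fsanitize=address demo.c       runtime checks for bad memory accesses
 *   valgrind ./a.out                      run an ordinary, non-sanitized build
 */

/* Reads values[index] only when the index is in range, so the
   out-of-bounds access can never happen. Returns false otherwise
   and leaves *out untouched. */
bool checked_get(const int *values, size_t length, size_t index, int *out)
{
    if (values == NULL || out == NULL || index >= length) {
        return false;
    }
    *out = values[index];
    return true;
}
```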

Now, as to the ontological reasons why undefined behaviour exists: it is basically for performance and ease-of-implementation reasons. Many things that are UB in C allow the compiler to make optimizations that compilers for other languages cannot. If you compare, for example, how Java, Python and C handle overflow of signed integer types, you can see that at one extreme, Python fully defines it in a way that is convenient for the programmer: ints can grow arbitrarily large. C, at the other end of the spectrum, leaves it undefined: it is your responsibility never to overflow your signed integers. Java is somewhere in between.

But on the other hand, that means that in Python there is no knowing what work the "int + int" operation will actually perform when executed. It may execute many hundreds of instructions, take a round trip through the operating system to allocate some memory, et cetera. This is pretty bad if you care a lot about performance or, more specifically, consistent performance. C, at the other end of the spectrum, allows the compiler to map "+" to the CPU's native instruction that adds integers (if one exists). Sure, different CPUs may handle overflows differently, but since C leaves that undefined, that's fine: you as the programmer have to take care not to overflow your ints. This means that C gives the compiler the option to compile your "int + int" operations to a single machine instruction on pretty much all CPUs, something compilers can and do take advantage of.
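
Since C leaves the overflow check to the programmer, one common pattern is to test the operands before adding. The sketch below is only an illustration; checked_add is a hypothetical helper, not something from the standard library.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true only when the sum is
   representable as an int. On failure, *result is left untouched.
   The comparisons are arranged so that they cannot overflow themselves. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would overflow */
    }
    *result = a + b;    /* safe: the sum fits in an int */
    return true;
}
```

The extra comparisons cost a little compared with a bare +, which is exactly the work that C lets you skip when you already know the operands are in range.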

Note that C makes no guarantee that + actually maps directly to a native CPU instruction; it just leaves that possibility open to the compiler, and obviously this is something any compiler writer would be eager to take advantage of. Java's way of defining signed integer overflow is more predictable (in terms of performance) than Python's, but may not lead to + being turned into a single CPU instruction on many CPU types where C would allow it.

So essentially, C embraces undefined behaviour and opts for (consistent) speed and ease of implementation where other languages opt for safety or predictable behaviour (from the programmer's perspective). That isn't necessarily a good decision with respect to, for example, safety and security, but that's where C stands. It boils down to "know the appropriate tool for the job at hand", and there are definitely many cases where the performance predictability C gives you is absolutely essential.