
Is the third line in the following code well-defined?

char* result = new char[0];
printf("%d\n", strlen(result));
printf("%s\n", result);
delete[] result;

When I run the code, I get the expected output (a length of 0 followed by two newlines). However, I'm not confident whether this is well-defined behavior or whether I just got lucky.

Is the call on the third line well-defined?

merlin2011
  • "The argument must be a pointer to the initial element of an array of characters." from https://en.cppreference.com/w/cpp/io/c/fprintf – alter igel Jun 26 '18 at 19:50
  • I can't verify this to be sure at the moment, but my bet is that this is undefined behaviour -- for two reasons. 1) The "%s" printf argument takes a null-terminated string, which requires (by definition) a string with at least 1 character -- the null character. 2) Calling 'new' with an array size of 0 returns a pointer to allocated memory that cannot legally be dereferenced -- which means it cannot even be read to be compared to the null-terminator. – Human-Compiler Jun 26 '18 at 19:50
  • I'm pretty sure that `strlen(result)` is already problematic, since that requires a null-terminated `char` array as well. – UnholySheep Jun 26 '18 at 19:51
  • @alterigel The semantics of `new[]` is vastly different from the semantics of `malloc`. – Some programmer dude Jun 26 '18 at 19:51
  • `printf("%d\n", strlen(result));` is problematic no matter what `result` is because `%d` takes an `int` but `strlen` returns a `size_t`. – melpomene Jun 26 '18 at 19:52
  • I think I'd be more concerned with whether the _first_ line is well-formed. Dereferencing a pointer created by calling `new[0]` results in undefined behavior according to http://www.cplusplus.com/reference/new/operator%20new[]/ – tdk001 Jun 26 '18 at 19:53
  • @tdk001, the *first* line doesn't dereference the pointer; only the later lines do that. – merlin2011 Jun 26 '18 at 19:53
  • Unless `new` fails, there is no way to implement `printf` (or `strlen`, for that matter) for this to *not* be UB. A zero-length char array is not a null-terminated char array, no? – Ankur S Jun 26 '18 at 19:55
  • Yes, there is a problem called undefined behavior. You are creating a pointer to *somewhere*. The `printf` function will print all the characters at *somewhere* until it finds a nul character. Your operating system or hardware may trigger an exception if you access illegal memory *or not*. Although the minimum allocatable unit is a single character, your compiler may initialize it to nul (0), so it *may* be safe. – Thomas Matthews Jun 26 '18 at 19:55
  • It seems I have my answer; the only question is whether all the activity on this question means it will be useful to others in the future, or it was a silly question and should be deleted. – merlin2011 Jun 26 '18 at 19:57
  • [This `operator new` reference](https://en.cppreference.com/w/cpp/memory/new/operator_new) says that the allocation function must return a non-null pointer. The size doesn't seem to matter. However, as many comments note, dereferencing this pointer should lead to UB, as it's the same as `result[0]` and any index would be out of bounds. – Some programmer dude Jun 26 '18 at 19:58
  • If you want an authoritative answer with quotes from the standard, then I suggest you add the `language-lawyer` tag. – Some programmer dude Jun 26 '18 at 19:58
  • Curious... where did you need a zero-length array? – Ankur S Jun 26 '18 at 19:59
  • @AnkurS, I was trying to use [this answer](https://stackoverflow.com/a/2912602/391161) in a library function and wanted to find out whether I needed to do manual checks against empty files; it seems that I do. – merlin2011 Jun 26 '18 at 20:01
  • @ThomasMatthews: Actually, `new[0]` might not create a pointer to _somewhere_. And there is no actual requirement that the minimum allocatable unit is a single byte. The problem is that the standard defines the result in such a way that it's immediately Undefined Behavior if you try to even access your hypothetical one byte. – MSalters Jun 26 '18 at 22:46
  • @MSalters: The `new[0]` either returns a value that is assigned into the pointer or it fails. The value inside the pointer may not be valid; and printing C-Style strings from an invalid address is definitely U.B. – Thomas Matthews Jun 27 '18 at 00:16

2 Answers


Short answer: It is Undefined Behavior.

Long answer: In C++, allocating an array of size 0 will produce a valid pointer to an array with no elements. From the standard (taken from this answer):

From 5.3.4/7

When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.

From 3.7.3.1/2

The effect of dereferencing a pointer returned as a request for zero size is undefined.

(Emphasis mine)

This means that there is no valid way to read from (or write to) the storage designated by the pointer returned from a `new T[0]` request.

Both `strlen` and `printf` with the `"%s"` format are defined to work on strings of characters terminated by a special NUL character. To operate, they must read the sequence of characters starting at the supplied pointer until they find this NUL character, which results in UB here because it requires dereferencing the pointer. These behaviors are defined in the C standard, since the C++ standard delegates the definitions of most C library types and functions back to the C standard.

For `%s`, the C standard defines the behavior of `printf` as follows:

From C11 Standard §7.21.6.1/8

If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type.

Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.

This requires reading from the array, which is UB here because the pointer is not valid to dereference.

Bonus

Your sample code actually already introduces UB on the second line, through the call to `strlen`, for reasons similar to those above.

strlen is defined to do the following:

From C11 Standard §7.24.6.3/3: The strlen function

Returns

The strlen function returns the number of characters that precede the terminating null character.

Which is UB for the same reason as using printf.

Human-Compiler

Sorry for answering your "original" question (before your edit):

How about C?

In C you don't have new.

However:

`strlen` counts the characters in an array until a NUL character is found.

`printf("%s")` prints the characters of an array up to the first NUL character.

If you use a native compiler and the array does not contain a NUL character, these two functions will continue searching for a NUL character past the end of the array.

Example:

char a[6]="Hello ";   /* exactly 6 characters: no room for the terminating NUL (allowed in C, an error in C++) */
char b[100]="world!";
char c[100]="John!";
printf("%s\n",a);

If the compiler places the array b in memory directly after the array a, this example will print "Hello world!".

However, if the compiler decides to place c after a, the program will print "Hello John!".

If you use a compiler that can detect accesses outside an array (e.g. a C++ compiler for .NET), you'll either get an error when the end of the array is reached without a NUL character being found, or the end of the array may even be treated the same way as a NUL character.

All in all: depending on the compiler, you will see different behavior when you pass an array that does not contain a NUL character to `printf("%s")`.

This is what I would call undefined behavior...

I don't know exactly how `new char[0]` behaves in C++; however, I think there is no difference from C...

Martin Rosenau
  • More generally, printf("%s",...) and strlen() should certainly be expected to read bytes starting at the supplied address, and neither is specified as having any ability to read bytes of storage that could not be read via other means. Note that in C, an attempt to declare an array with a compile-time-constant length of zero is a constraint violation, and declaring an array with a run-time-computed length that happens to be zero invokes Undefined Behavior regardless of whether code attempts to use the array in any fashion whatsoever. – supercat Jun 27 '18 at 21:24