-1

In C++, when I read an integer from a string, it does not seem to matter whether I use u or d as the conversion specifier, since both accept even negative integers.

#include <cstdio>
using namespace std;

int main() {
    int u, d;
    sscanf("-2", "%u", &u);
    sscanf("-2", "%d", &d);
    puts(u == d ? "u == d" : "u != d");
    printf("u: %u %d\n", u, u);
    printf("d: %u %d\n", d, d);
    return 0;
}

I dug deeper to find if there is any difference. I found that

int u, d;
sscanf("-2", "%u", &u);
sscanf("-2", "%d", &d);

is equivalent to

int u, d;
u = strtoul("-2", NULL, 10);
d = strtol("-2", NULL, 10);

according to cppreference.com.
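
A minimal sketch of what the two library calls above return for the same input. The helper names `parse_signed` and `parse_unsigned` are made up for illustration and are not part of the original snippet.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper: base-10 signed parse, the strtol call from above.
long parse_signed(const char* text) {
    return std::strtol(text, nullptr, 10);
}

// Hypothetical helper: base-10 unsigned parse, the strtoul call from above.
// strtoul accepts a leading '-'; the magnitude is then negated in
// unsigned long arithmetic, so the result wraps around.
unsigned long parse_unsigned(const char* text) {
    return std::strtoul(text, nullptr, 10);
}
```

Here `parse_signed("-2")` returns -2, while `parse_unsigned("-2")` returns `ULONG_MAX - 1`. Assigning that wrapped value to an `int`, as in `u = strtoul(...)`, is an out-of-range conversion whose result is implementation-defined before C++20.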

Is there any difference at all between u and d when these conversion specifiers are used for parsing, i.e. in the format string passed to scanf-type functions? If so, what is it?

The answer is the same for C and C++, right? If not, I am interested in both.

Palec
  • `%u` is for unsigned and `%i` is signed. – user3813674 Feb 09 '16 at 15:44
  • I don't quite get the question I guess. Isn't the answer right in the documentation you linked? – Baum mit Augen Feb 09 '16 at 15:47
  • `scanf` is C. If you use C++, use the appropriate mechanism of C++! They are different languages. That said: RTFM, enable compiler warnings, and pay heed to them. – too honest for this site Feb 09 '16 at 15:48
  • @Olaf: `scanf` is C-style input, but still available in C++. The main motivation for this question is that a senior colleague of mine rejected part of a change from %u to %i I suggested when we started accepting negative integers. The codebase is legacy and uses C-style I/O all over the place. – Palec Feb 09 '16 at 16:04
  • If you want to write C code, use C. Coding C-style in C++ is just bad style. Even with identical syntax, semantics can differ. – too honest for this site Feb 09 '16 at 17:54
  • That is why I included the last line into my question, @Olaf. For my personal projects, I use C. At work, I have to maintain some old C++ code, written as “C with objects”. I want to know the answer for both the languages. Code should be used, not rewritten. Moreover, the C++ codebase is too large to change the style. In such a scenario, preserving consistency in style is more important than using the best style, IMO. Mixing styles causes confusion, which causes bugs. – Palec Feb 10 '16 at 11:17
  • *Even though `scanf()` is also available in C, you are asking about C++ code. C++ is not C.* I know the difference, @FUZxxl. I want to know the answer for both the languages. See my previous comment for detailed explanation, please. – Palec Feb 10 '16 at 11:19
  • @Palec “I want the answer for both languages.” That's not how this site works. Please ask a question about one language at a time, otherwise you would require people who answer your question to be deeply familiar with both, which is unreasonable. – fuz Feb 10 '16 at 11:28
  • "preserving consistency in style is more important than using the best style" - Not true as an absolute statement. There is a point where a complete rewrite using language features is better than patching old code. This is true from a technical and economical view. But as MBAs have no idea about programming (euphemism!) and new development is more problematic to book than maintenance, people try riding dead horses. Another reason is programmers not learning new things. Ask _good_ programmers in banking who have to maintain old Cobol code which was never meant to be used that long. – too honest for this site Feb 10 '16 at 11:38
  • OK, @FUZxxl. That makes sense. Now I agree that this question should be only about C++. – Palec Feb 10 '16 at 13:58
  • Complete rewrite is OK, as it preserves consistency, @Olaf. But a rewrite of half of a single module among tens of other modules written using the same style is useless waste of time, producing a harder-to-maintain codebase. But we are getting OT. – Palec Feb 10 '16 at 14:03

3 Answers

9

%d: Scan an integer as a decimal signed int. A similar conversion specifier, %i, interprets the number as hexadecimal when preceded by 0x and as octal when preceded by 0. Otherwise, it is identical.

%u: Scan an integer as a decimal unsigned int.
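
The difference between %d and %i described above can be checked with a small sketch. The wrapper names `scan_d` and `scan_i` are hypothetical and exist only for this illustration.

```cpp
#include <cstdio>

// Hypothetical wrapper: %d always parses decimal, so on "0x10" it
// consumes only the leading "0" and stops at the 'x'.
bool scan_d(const char* text, int& out) {
    return std::sscanf(text, "%d", &out) == 1;
}

// Hypothetical wrapper: %i detects the base from the prefix,
// 0x for hexadecimal and a leading 0 for octal, decimal otherwise.
bool scan_i(const char* text, int& out) {
    return std::sscanf(text, "%i", &out) == 1;
}
```

With these, "0x10" is read as 0 by `scan_d` and as 16 by `scan_i`, and "010" is read as 10 by `scan_d` and as 8 by `scan_i`. For plain decimal input such as "-2", both give -2.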

Palec
Bart
1

Technically, you are invoking undefined behavior when you read a negative number into an int using the %u format specifier. You make sscanf treat a pointer to a signed integer as a pointer to an unsigned integer, and those types are not compatible. It only works because unsigned and signed ints have similar bit representations and signed integers use two's complement representation.

TL;DR: You are not guaranteed to get -2 from sscanf("-2", "%u", &u);

Palec
Revolver_Ocelot
  • This is a thing where the behaviour might differ between C and C++ (not sure). In C, it's actually legal given that `signed int` and `unsigned int` have equal alignment guarantees, cf. ISO 9899:2011 §6.3.2.3 ¶7. – fuz Feb 10 '16 at 11:31
  • In the context of `scanf`-type functions, using unsigned conversion specifier for signed variable can still lead to [undefined behavior in C11](http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10), @FUZxxl. [C11 §6.3.2.3 ¶7](http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7) defines only requirements for casting pointers back and forth. Dereferencing has further constraints ([C11 §6.5 ¶7](http://port70.net/~nsz/c/c11/n1570.html#6.5p7)), but signed/unsigned aliasing is OK. Thus the `sscanf` example in my question is not exactly equivalent to the `strtoul`/`strtol` one. – Palec Feb 10 '16 at 18:40
1

Each conversion specifier has a corresponding type of the result argument defined in the C spec. The %u and %d conversion directives really accept the same inputs, as you observed, but the argument corresponding to %u shall be of type unsigned int*, not int*. I.e. your example should be corrected as:

unsigned int u;
int d;
sscanf("-2", "%u", &u);
sscanf("-2", "%d", &d);
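
As a sketch of how the corrected snippet might be wrapped so that the pointer type always matches the specifier and the return value of sscanf is checked, consider the following. The function names are hypothetical and not taken from any codebase.

```cpp
#include <cstdio>

// Hypothetical wrapper: %u is paired with unsigned int*, as the spec requires.
// Returns true only when exactly one item was converted.
bool scan_unsigned(const char* text, unsigned int& out) {
    return std::sscanf(text, "%u", &out) == 1;
}

// Hypothetical wrapper: %d is paired with int*.
bool scan_signed(const char* text, int& out) {
    return std::sscanf(text, "%d", &out) == 1;
}
```

`scan_signed("-2", d)` stores -2. `scan_unsigned("-2", u)` also reports success, since %u accepts an optional sign, but the stored value is not -2; common implementations store the wrapped value `UINT_MAX - 1`. Printing u with %u and d with %d keeps the printf calls type-correct as well.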

Had you enabled warnings, you’d get one when compiling the original example. And rightfully so:

Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

Emphasis mine.

So, you were invoking undefined behavior (see the part "What is Undefined Behavior"). Once you invoke undefined behavior, you are on your own and nasty things may happen.


Conversion specifiers are defined in C99 (latest public draft, N1256; official PDF). The definition is the same as in C11 (latest public draft, N1570; official PDF). The most recent C++ draft (as of 2016-02-10, N4567), linked from the list of C++ standard documents under another question on Stack Overflow, takes the definition of the cstdio header from C99 and does not modify it (apart from placing the functions into the std namespace and the minor modifications mentioned in § 27.9.2).

Community
Palec
  • Thanks to [Olaf](http://stackoverflow.com/users/4774918/olaf) for encouraging me to use the spec instead of cppreference.com and for the link to its HTML version. – Palec Feb 10 '16 at 19:54