
After Cppcheck complained that "%u" is the wrong format specifier for scanning into an int variable, I changed the format to "%d". But when I took a second look before committing the change, I thought the intention might have been to prevent negative inputs. I wrote two small programs to see the difference:

Specifier %d

#include <iostream>
#include <cstdio>  // declares sscanf
using namespace std;

int main() {
    const char* s = "-4";
    int value = -1;
    int res = sscanf(s, "%d", &value);
    cout << "value:" << value << endl;
    cout << "res:" << res << endl;
    return 0;
}

see also https://ideone.com/OR3IKN

Specifier %u

#include <iostream>
#include <cstdio>  // declares sscanf
using namespace std;

int main() {
    const char* s = "-4";
    int value = -1;
    int res = sscanf(s, "%u", &value);
    cout << "value:" << value << endl;
    cout << "res:" << res << endl;
    return 0;
}

see also https://ideone.com/WPWdqi

Result(s)

Surprisingly, both conversion specifiers accept the sign:

value:-4
res:1

I had a look at the documentation on cppreference.com. For C (scanf, fscanf, sscanf, scanf_s, fscanf_s, sscanf_s) as well as C++ (std::scanf, std::fscanf, std::sscanf), the description of the "%u" conversion specifier is the same (emphasis mine):

matches an unsigned decimal integer.
The format of the number is the same as expected by strtoul() with the value 10 for the base argument.

Is the observed behaviour standard compliant? Where can I find this documented?

[Update] Undefined Behaviour, really, why?

I read that it is simply UB. Well, to add to the confusion, here is a version declaring value as unsigned (https://ideone.com/nNBkqN). I think the assignment of -1 still behaves as expected, but "%u" obviously still matches the sign:

#include <iostream>
#include <cstdio>  // declares sscanf

using namespace std;

int main() {
    const char* s = "-4";
    unsigned value = -1;
    cout << "value before:" << value << endl;
    int res = sscanf(s, "%u", &value);
    cout << "value after:" << value << endl;
    cout << "res:" << res << endl;
    return 0;
}

Result:

value before:4294967295
value after:4294967292
res:1
orbitcowboy
Wolf
  • Is there a reason you use `sscanf` to "parse" the string? Why not simply put it into an [`std::istringstream`](http://en.cppreference.com/w/cpp/io/basic_istringstream) and use normal stream extraction with the `>>` operator? Or if you're certain that the string can only contain a valid number then perhaps use [`std::stoi`](http://en.cppreference.com/w/cpp/string/basic_string/stol)? – Some programmer dude Sep 13 '17 at 11:15
  • If you use `%u` with an `unsigned int`, you will get the expected output: https://ideone.com/XUaZmV – mch Sep 13 '17 at 11:17
  • Regarding documentation, the reference you link to (or its [C++ counterpart](http://en.cppreference.com/w/cpp/io/c/fscanf)) is very good. Otherwise you can also go to the [official homepage of the C++ standards committee](https://isocpp.org/) and read [the latest draft](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4687.pdf). – Some programmer dude Sep 13 '17 at 11:18
  • @mch I think `sscanf` also has the opportunity to reject the sign. In my opinion, the result is not the expected one. – Wolf Sep 13 '17 at 11:25
  • @Someprogrammerdude thanks for the ref. I'm using `sscanf` because **it** is the function in question. I found it easier to check it within a C++ environment. (the original sources mix parts from both worlds) – Wolf Sep 13 '17 at 11:29
  • http://en.cppreference.com/w/cpp/io/c/fscanf says that `%u` expects a decimal number like `strtoul`. http://en.cppreference.com/w/cpp/string/byte/strtoul says "(optional) plus or minus sign" and "If the minus sign was part of the input sequence, the numeric value calculated from the sequence of digits is negated as if by unary minus in the result type, which applies unsigned integer wraparound rules." – mch Sep 13 '17 at 11:31
  • @mch Thanks for helping me take a closer look. I recently learned that the *scanf functions support a (limited!) flavour of regex (things like `[^1-5]`), so I'm surprised to find them treating possible sign characters differently. Maybe you are interested in writing one more answer? – Wolf Sep 13 '17 at 11:41
  • @Someprogrammerdude (concerning c/c++) sorry, I mixed up the references (corrected). Do you have a more specific standard (draft) reference, maybe a page number? – Wolf Sep 13 '17 at 11:56

2 Answers


There are two separate issues.

  1. %u expects an unsigned int* argument; passing an int* is UB.
  2. Does %u match -4? Yes. The expected format is that of strtoul with base 10, and if you read the documentation it's quite clear that a leading minus sign is allowed.
Wolf
T.C.
  • I see, I should have followed this link. I stuck to the headline "matches an unsigned decimal integer." (with explicit *unsigned* here) – Wolf Sep 13 '17 at 11:32

No, it's not standard compliant. In fact the behaviour of your program is undefined: the format specifier for sscanf must match the types of the arguments.

Bathsheba