I came across some code and was wondering if it was just a fluke that it worked as expected or was just bad practice. Consider the following MCVE (ideone):

#include <cstdio>

struct dummyStruct
{
    unsigned short min[4];
    unsigned short max[4];
    int            dummyBuffer; // This just happens to be here as a real variable in the original code, not just as a buffer.
};


int main()
{
    dummyStruct db;
    // Note that the size of the short is assumed to be half of that of the %d specifier
    sscanf("  123,   456,  789,   112", "%d, %d, %d, %d", db.min+0, db.min+1, db.min+2, db.min+3);
    sscanf("29491, 29491, 29491, 29491", "%d, %d, %d, %d", db.max+0, db.max+1, db.max+2, db.max+3);
    db.dummyBuffer = 1234;
    printf("%hd, %hd, %hd, %hd\n", db.min[0], db.min[1], db.min[2], db.min[3]);
    printf("%hd, %hd, %hd, %hd\n", db.max[0], db.max[1], db.max[2], db.max[3]);
    printf("%d\n", db.dummyBuffer);

    return 0;
}

Are the contents of the struct guaranteed by the standard, or is this undefined behavior? I saw no mention of this in N4810. Alternatively, if we reversed the order of the variables, e.g.

printf("%hd, %hd, %hd, %hd\n", db.min[0], db.min[2], db.min[1], db.min[3]);

are the contents of db.min guaranteed? Is the order of the parameters (from left to right) the order of assignment? Also note that I'm not asking why this is bad practice, even if defined. Nor do I need comments telling me not to use scanf. I'm not.

Avi Ginsburg

1 Answer

You saw no mention in N4810 because when it comes to the C standard library the specification is mostly deferred to "ISO/IEC 9899:2011, Programming languages — C". If we take a look in N1570 (C11 draft), it says this about the scanf family of functions:

7.21.6.2 The fscanf function (emphasis mine)

10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

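The `%d` specifier requires a pointer to `int`, but your code passes pointers to `unsigned short`, so the objects do not have an appropriate type. Your sample working is therefore indeed a fluke born out of undefined behavior.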
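
For comparison, here is a minimal sketch of how the type mismatch could be avoided, assuming the goal is simply to read four values into an `unsigned short` array. The helper name `parseFour` is hypothetical and not part of the original code; it pairs each `unsigned short*` argument with the matching `%hu` specifier.

```cpp
#include <cstdio>

// Hypothetical helper (not from the original post): reads four comma-separated
// values into an unsigned short array using the matching %hu specifier.
// Returns true only if all four conversions succeeded.
bool parseFour(const char* text, unsigned short (&out)[4])
{
    // %hu expects a pointer to unsigned short, which is exactly what is passed,
    // so the "appropriate type" requirement of 7.21.6.2p10 is met.
    const int converted = std::sscanf(text, "%hu, %hu, %hu, %hu",
                                      &out[0], &out[1], &out[2], &out[3]);
    return converted == 4;
}
```

Calling `parseFour("  123,   456,  789,   112", db.min)` would then fill `db.min` without touching neighboring members. Note that this only addresses the type condition: an input value that does not fit in an `unsigned short` would still run into the second condition quoted above, so the caller has to ensure the input is in range.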

StoryTeller - Unslander Monica
  • So the UB isn't due to the mismatched widths, but rather just because the `short`s aren't `int`s? So even on a system where `sizeof(short) == sizeof(int)` this would still be UB, correct? – Avi Ginsburg May 01 '19 at 07:24
  • Why doesn't the "or if the result of the conversion cannot be represented in the object" part make it defined (in the case where `sizeof(short) == sizeof(int)`)? I guess that would be "and" instead of "or" but then the second condition doesn't make sense, as if the object has an appropriate type, then it should be representable by the object. – Avi Ginsburg May 01 '19 at 07:29
  • @AviGinsburg - Those are two separate conditions, and either one on its own makes the behavior undefined. Put another way, it's well defined only if the type fits exactly AND the character sequence represents a value in the range of the type. Edit: Yes to your own ninja edit :) – StoryTeller - Unslander Monica May 01 '19 at 07:32
  • Got it. Thanks. – Avi Ginsburg May 01 '19 at 07:33