7

When checking

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char    c[20];
    size_t  l;

    l = fread(c, sizeof c, 1, stdin);
    if (l != 1)
        return 1;

    return c[0] == 42;
}

with clang, I get

$ clang  --analyze -Xclang -analyzer-checker=alpha x.c
x.c:13:14: warning: The left operand of '==' is a garbage value
        return c[0] == 42;
               ~~~~ ^

$ clang -v
clang version 7.0.1 (Fedora 7.0.1-4.fc29)

Is there really a chance that c contains garbage at this point? If not, how can I avoid the warning (without the obvious initialization of c)?

Update

Because it seems to be common consensus, that this is a false positive, I want to focus on the way how to avoid the warning.

It is true that fread() is a standard function and analyzers should know their semantics as they are doing e.g. for memset() already. But I am interested in a more generic way which can be used e.g. on library functions.

I would call some special function (let call it assert_defined()) in a way like

    l = fread(c, sizeof c, 1, stdin);
    assert_defined(c, l * sizeof c);

which is

  • a noop
  • but lets the compiler/analyzer think that l * sizeof c bytes at c are initialized

Does clang know annotations like

inline static void assert_defined(void const *p, size_t cnt) 
   __attribute__((__dear_compiler_this_memory_is_not_garbage__(1,2)))
{
}

or are there tricks like the related

int i = i;

which prevents gcc to emit "uninitialized warnings"?

ensc
  • 5,922
  • 12
  • 17
  • @Eugene read last statement? – machine_1 Mar 07 '19 at 17:39
  • @machine_1 read and removed. Weird that the tool is complaining. Anyway, a workaround could be to initialize `c`. – Eugene Sh. Mar 07 '19 at 17:39
  • But looks like your usage of `fread` is incorrect. The second argument should be the size of *one* element. – Eugene Sh. Mar 07 '19 at 17:42
  • `c[0]` can contain garbage if `fread()` failed to read any data. But you already test for that. It might not be smart enough to realize this. – Shawn Mar 07 '19 at 17:44
  • 1
    @EugeneSh. `fread(c, sizeof c, 1, stdin);` and `fread(c, 1, sizeof c, stdin);` both attempt to read the same amount of data, but the first will report `0` if it can't read it all. – Weather Vane Mar 07 '19 at 17:45
  • @EugeneSh.in this example one element has the size of 20 bytes; in real world, `c` might be some structured data – ensc Mar 07 '19 at 17:45
  • @ensc For that case it should take `&c` as the first argument. Yes, for arrays it might not matter, but it is demonstrating the intention. Perhaps it could matter for the analyzer? – Eugene Sh. Mar 07 '19 at 17:46
  • @EugeneSh.: Logically it should take `c`, and it would make a difference if `fread` took `char *` rather than `void *`. – R.. GitHub STOP HELPING ICE Mar 07 '19 at 17:48
  • @R.. Logically it should take the pointer to the type of the element whose size is passed as the second argument. – Eugene Sh. Mar 07 '19 at 17:49
  • @EugeneSh.: OK that's a reasonable position. – R.. GitHub STOP HELPING ICE Mar 07 '19 at 17:51
  • 2
    It is probably a good idea to avoid using a single `l` as a symbol name; in many fonts it is not clearly distinguishable from `1`. – Clifford Mar 07 '19 at 18:29
  • Your update is a new question. Since this one already had answers that do not address the second question, it would have been better to post a new question. – Clifford Mar 08 '19 at 09:32

2 Answers2

1

Yes it could contain garbage - if fread() fails.

For the analyser to understand that the check guarantees c[0] is not read if fread fails would require the analyser to understand the semantics of the fread() function. That's computationally expensive task for any non-trivial code, and would require either sight of the library source, or encoding of the standard library semantics - which is possible but would only spot a small subset of issues involving "known functions".

Initialising the array will avoid this specific issue:

char c[20] = {0} ;
Clifford
  • 76,825
  • 12
  • 79
  • 145
  • 2
    The case `fread` is failing is covered by the code. The question is why the analyzer can't catch it. – Eugene Sh. Mar 07 '19 at 17:47
  • The question was: "If not, how can I avoid the warning (without the obvious initialization of c)?" – Weather Vane Mar 07 '19 at 17:49
  • when `fread()` fails, `l` will be `0` and `c[0]` will never be accessed. I would like to avoid the explicit initialization to catch cases where `c` is really undefined. – ensc Mar 07 '19 at 17:50
  • 1
    @EugeneSh. It assumes that the analyser understands the semantics of `fread()`. Possible but unlikely perhaps for standard library functions but not generally true of any arbitrary function. – Clifford Mar 07 '19 at 17:50
  • @R.. But does it? And if it did would it be worth much if it could only do that for a small set of "known" functions - any third-party library would be unknown. – Clifford Mar 07 '19 at 17:57
  • @ensc : Initialising all variables is generally considered good practice and is required in many coding standards, I also have an opinion about multiple returns in a single function. You should not perhaps use a static analysis tool as a crutch to support poor coding practices. – Clifford Mar 07 '19 at 18:02
  • 2
    @Clifford dump initialization of variables is a bad practice which hides many programming errors which would be detected by compilers else. – ensc Mar 07 '19 at 18:03
  • @EugeneSh. I think the analyzer cannot understand that the `if` after the `fread` instruction controls errors. The `fread` function is a standard library function, but who says the code uses the standard library. Instead of `fread` the function could be `gread` ... what's the expected analyzer behaviour in this case? – Sir Jo Black Mar 07 '19 at 18:05
  • 1
    @SirJoBlack Analyzers can understand pretty fancy stuff these days. I am pretty sure Coverity would understand it very well (but these are different weight classes, of course). Anyway, this one is a clear false positive. – Eugene Sh. Mar 07 '19 at 18:07
  • @EugeneSh. Coverity is paid for. Perhaps you get what you pay for with static analysis? – Clifford Mar 07 '19 at 18:09
  • @Clifford That's what I am saying. False positive that's it, but not totally impossible to get rid of. – Eugene Sh. Mar 07 '19 at 18:11
  • @Clifford: Any third-party library can be known because it's (in theory) portable source that the analysis tool can inspect. The standard functions on the other hand may not be written in [portable] C, and from a standpoint of judging correctness, what it's *specified* to do is more important than what your particular implementation does (e.g. even if a function always had some special property on your particular implementation, if it's not part of the specified behavior, assuming it would be a bug in the program). – R.. GitHub STOP HELPING ICE Mar 07 '19 at 18:18
  • @R.. I am not sure of your point or whether you are agreeing, disagreeing or just raising a new point. Not all libraries are distributed as source code, which was my point. I don't think we disagree. It is academic in context of this question and this tool where the evidence is that it does not know the semantics. The comments were fairly initiated in response to my original incomplete answer, which I have edited. As it happens the fix is just _better code_ so in that sense a "clever" tool would have passed some pretty hideous code (IMO). – Clifford Mar 07 '19 at 18:25
  • @Clifford: If you have third-party libraries that aren't distributed as source then you really can't analyze the correctness of the program beyond a limited degree without binary analysis. Perhaps it would be nice to have frameworks for expressing their contracts and analyzing with the assumption that they actually honor their contracts. Anyway what I was saying is that it *does make sense* to treat the standard library (which is part of the language) as special and analyze based on its specified behavior. – R.. GitHub STOP HELPING ICE Mar 07 '19 at 19:07
  • Sight of the library source is not required; the analyzer could assume the function behaves as specified by the C Standard – M.M Mar 07 '19 at 21:21
  • @M.M Yes, I said that. I think we are all in violent agreement about the possibilities. I am not justifying the behaviour of clang-analyze, just explaining it - answering the question. – Clifford Mar 08 '19 at 09:29
0

I would say that the compiler is not supposed to know the working of the function fread(). From the perspective of the compiler, fread() may or may not modify 'c'.

So if you do not explicitly initialize your variable in your specific case, the compiler has no option but to emit a warning.

machine_1
  • 3,863
  • 2
  • 16
  • 39
  • A static analyzer is allowed to know all about myriad APIs, especially the standard library, as many as useful to improve accuracy. Likewise, a compiler may know for improved optimisation, though is less likely to, as the benefits are generally marginal. – Deduplicator Mar 07 '19 at 18:19
  • `fread` is part of the C language, and most certainly does modify `c[0]` if it doesn't return 0. – R.. GitHub STOP HELPING ICE Mar 07 '19 at 18:19
  • clang-analyze is not a compiler; it is a static analysis tool intended to catch _semantic_ errors that a compiler _syntax_ checker does not. As such saying what the compiler should or should not do is not the issue or in dispute. – Clifford Mar 07 '19 at 18:35
  • @Clifford thanks for bringing that to my attention. In this case, the cause could be that the source of fread() is not available for the analyser and as such has no way of telling whether it shall modify the first parameter unless it has the necessary information regarding standard library functions. – machine_1 Mar 07 '19 at 18:45
  • This answer is not correct. Since `fread` is part of the C library, a program [cannot redefine it](https://stackoverflow.com/questions/50741722/redefining-function-from-standard-library). Both a compiler and an analyzer can and should have built-in knowledge of its behavior, just like an analyzer must know what `malloc` and `free` do in order to report resource leaks. – Scott McPeak Sep 08 '19 at 22:21