Why can't dead code detection be fully solved by a compiler?

Question

The compilers I've been using for C and Java have dead code detection (a warning when a line will never be executed). My professor says that this problem can never be fully solved by compilers, though, and I was wondering why that is. I am not too familiar with how compilers are actually implemented, as this is a theory-based class. What do they check (such as possible input strings vs. acceptable inputs, etc.), and why is that insufficient?

make a loop, put code after it, then apply https://en.wikipedia.org/wiki/Halting_problem — zapl, Oct 21 '15 at 18:19
`if (isPrime(1234234234332232323423)){callSomething();}` Will this code ever call something or not? There are many other examples where deciding whether a function is ever called is far more expensive than just including it in the program. — 463035818_is_not_a_number, Oct 21 '15 at 18:22
`public static void main(String[] args) {int counterexample = findCollatzConjectureCounterexample(); System.out.println(counterexample);}` — user253751, Oct 21 '15 at 23:17
@tobi303 not a great example, it's really easy to factor prime numbers... just not to factor them relatively efficiently. The halting problem is not in NP, it's unsolvable. — en_Knight, Oct 22 '15 at 04:11
You can also have dead-code analyzers with false positives. Imagine you used a tool that analyzed your Java code, and found that a private method somewhere was never used. (Fairly reasonable to assume that it's therefore dead code, right?) So you remove it and run your program. Now it crashes. Huh, that's weird. Why did it crash? Because somewhere else in the program, that private method was called using reflection! (A bit of a contrived example, perhaps, but still.) — Alex, Oct 22 '15 at 08:12
@Alex Also, code that is defined externally (for example, on embedded systems) — MKII, Oct 22 '15 at 08:40
@en_Knight From Wikipedia: "The halting problem is theoretically decidable for linear bounded automata (LBAs) or deterministic machines with finite memory." As I understand it, for most programs it is decidable whether they halt or not, but in practice this would still be too expensive. I might be wrong, but I think in reality it is always a trade-off between letting the compiler perform intensive calculations and saving some kB on the executable. imho any example is a bad one. Real programs use finite memory and thus a finite number of states, but this number is too huge to let the compiler check all of them — 463035818_is_not_a_number, Oct 22 '15 at 09:29
@tobi303 `do { print ("So sad!"); res = doGoogleSearch("coolest guy ever"); } while (res.getFirstResult() != myHomepage) ` Will it halt? — Josef says Reinstate Monica, Oct 22 '15 at 13:48
@tobi303 For practical purposes you cannot treat computers as simple automatons. There's nothing really preventing me from adding a few TB of hard disk when more memory is required; so the whole argument "there are a finite number of states" doesn't really hold, because even if they are finite you do not know an upper bound on their number and thus you have to treat the machine as a Turing machine. A finite automaton never modifies its memory. Btw `1234234234332232323423` is trivially composite: it's divisible by `3` (sum of the digits is `60`, or `python3 -c 'print(1234234234332232323423%3)'`). — Bakuriu, Oct 22 '15 at 13:52
@tobi303 it is a do..while loop. they are always executed. That's just how they roll! — Josef says Reinstate Monica, Oct 22 '15 at 13:58
@Bakuriu I get your point and I agree. Btw I was sure that someone would jump on it and tell me that the number is trivially non-prime ;). What I actually wanted to say is: `if (someLenghtyCalculations()){doSomething();}` in such a case I usually don't want lengthy calculations at compile time but at runtime. — 463035818_is_not_a_number, Oct 22 '15 at 14:00
@Josef yes, I misread it. And actually Google isn't yet aware of me being the coolest guy, so it won't halt :( — 463035818_is_not_a_number, Oct 22 '15 at 14:03
@Bakuriu I guess the assumption is that the cost of adding several TB of HDD and a bunch more CPUs and the time cost of running them exceeds the cost of distributing unreachable code in the final program. — Damian Yerrick, Oct 22 '15 at 16:16
@alephzero and en_Knight - You are both wrong. isPrime is a great example. You made an assumption that the function is checking for a Prime Number. Maybe that number was a serial number and it does a database lookup to see if the user is an Amazon Prime member? The reason it is a great example is because the only way to know if the condition is constant or not is to actually execute the isPrime function. So now that would require the Compiler to also be an interpreter. But that still wouldn't solve those cases where the data is volatile. — Dunk, Oct 22 '15 at 17:27
As I understand it, *proving* whether each line of code in an arbitrary program is unreachable would require solving the halting problem, which is established to be impossible. — bwDraco, Oct 24 '15 at 07:18
@Dunk: no, `isPrime` (the Amazon kind) is not a good example because this is _definitely not dead code_ – since at some point, anybody could yet become a Prime member (even if it requires, I don't know, zombie resurrection and major changes of international law... disproving this is not something you could even attempt in a mathematical sense; from a mathematical perspective, [anything physical you can think about is possible](https://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker's_Guide_to_the_Galaxy#Infinite_Improbability_Drive)). — leftaroundabout, Oct 24 '15 at 17:47
What is a dead code? Is that commented one? like this '// This is a dead code.' — Emmanuel Angelo.R, Oct 26 '15 at 06:24
@EmmanuelAngelo.R No, a comment is a comment. Dead code is just code that will never run. For example, anything inside an `if` whose condition is always `false`: in `if ( x > 2 && x < 2 ) { foo(); }`, the statement `foo();` is dead code. — RealSkeptic, Oct 26 '15 at 14:10

RealSkeptic · Accepted Answer · 2015-10-22T10:37:30.557

277

The dead code problem is related to the Halting problem.

Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.

How does this relate to dead code?

The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.

How do you transfer an algorithm for dead code into an algorithm for the Halting problem?

Simple: you add a line of code after the end of the program you want to check for halting. If your dead-code detector reports that this line is dead, then you know that the program does not halt. If it reports that the line is not dead, then you know that your program halts (it gets to its last line, and then to your added line of code).

Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).

These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.
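To make "specific cases" concrete, here is a rough sketch of the kind of purely syntactic check a compiler or linter can do, written against Python's standard `ast` module. The function name `find_obviously_dead_lines` and the sample program are made up for illustration; real compilers work on their own intermediate representations and cover more patterns.

```python
import ast

# Statements after any of these, in the same block, can never run.
TERMINATORS = (ast.Return, ast.Raise, ast.Break, ast.Continue)


def find_obviously_dead_lines(source):
    """Return line numbers of statements that are trivially unreachable."""
    dead = []
    for node in ast.walk(ast.parse(source)):
        # Case 1: an `if` whose condition is a literal falsy constant.
        if (isinstance(node, ast.If)
                and isinstance(node.test, ast.Constant)
                and not node.test.value):
            dead.extend(stmt.lineno for stmt in node.body)
        # Case 2: statements that follow a terminator within one block.
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue
            for i, stmt in enumerate(block):
                if isinstance(stmt, TERMINATORS):
                    dead.extend(s.lineno for s in block[i + 1:])
                    break
    return sorted(set(dead))


SAMPLE = "\n".join([
    "def f(x):",
    "    return x",
    "    print('never runs')",   # line 3: after a return
    "if False:",
    "    print('never runs')",   # line 5: constant-false condition
    "if x > 2 and x < 2:",
    "    foo()",                 # line 7: dead, but not detected
])

print(find_obviously_dead_lines(SAMPLE))  # [3, 5]
```

Line 7 is just as dead as the other two, but spotting it needs reasoning about the condition rather than pattern matching, and that kind of reasoning is what cannot be made complete.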

edited Oct 22 '15 at 10:37

answered Oct 21 '15 at 18:35

RealSkeptic


8

I would argue that the halting problem is not applicable here, as every platform which is a compile target of every compiler in the real world has a maximum amount of data which it may access; it will therefore have a maximum number of states, meaning it is in fact a finite state machine, not a Turing machine. The halting problem is not insoluble for FSMs, so any compiler in the real world can perform dead code detection. – Vality Oct 21 '15 at 23:32
50

@Vality 64-bit processors can address 2^64 bytes. Have fun searching all 256^(2^64) states! – Daniel Wagner Oct 22 '15 at 00:20
82

@DanielWagner This shouldn't be a problem. Searching `256^(2^64)` states is `O(1)`, so dead code detection can be done in polynomial time. – aebabis Oct 22 '15 at 03:12
3

@acbabis Even assuming that it's correct, for such a large number of states, even very small constant factors are enough to make calculation time so large as to render it impractical. – Leliel Oct 22 '15 at 03:56
13

@Leliel, that was sarcasm. – Paul Draper Oct 22 '15 at 04:37
44

@Vality: Most modern computers have disks, input devices, network communications, etc. Any complete analysis would have to consider all such devices - including, literally, the internet and everything hooked up to it. This isn't a tractable problem. – Nat Oct 22 '15 at 04:38
6

I think it's more important (and accurate) to say that the halting problem is reducible to dead code analysis (is the arbitrary code executed after the program would normally halt ever executed). As in the comments above any problem can be reduced to halting by a simple `if(problem)while(true);`. – ratchet freak Oct 22 '15 at 10:35
@ratchetfreak Yes, I mixed the order of the problems there. – RealSkeptic Oct 22 '15 at 10:40
1

Sometimes it's plain silly to apply basic theories to complex cases like this, because modern systems are a lot more sophisticated than a theoretical FSM model. The app (in this context) can depend on inputs from web services, DLLs and other kinds of "black boxes". Also, the example given by RubberDuck shows how reflection can fool the compiler, forcing the state to change in a non-deterministic way – jean Oct 22 '15 at 12:04
2

@Vality I believe an LBA is a more correct description of real-world machines than an FSM (I'd prefer DFA, as a TM is also an FSM, it's just the tape that's infinite). Now, LBA halting is decidable, but I believe that not by an LBA. That is, whatever the length of the Decider LBA tape, there will be arbitrary LBAs with longer tapes it would not be able to decide. So basically - practical machines can't be decided by practical machines. – RealSkeptic Oct 22 '15 at 12:24
@RealSkeptic An LBA of length 2n + C can decide whether an LBA of length n halts by running tortoise-hare cycle detection: emulate two cycles of hare and one cycle of tortoise then compare them. – Damian Yerrick Oct 22 '15 at 16:21
@tepples Yes, but you can't use a *specific* LBA to decide arbitrary LBA. That is, you don't get to pick your decider LBA based on the input. – RealSkeptic Oct 22 '15 at 16:32
That's what the + C is for: to encode the emulated LBAs' action table in the same way that a UTM (universal Turing machine) would. – Damian Yerrick Oct 22 '15 at 16:41
@tepples - that's not the issue. The issue is that once you select your decider LBA, you can't pick an arbitrary length for its tape. It has what it has. If you decide to give it another memory chip, then well, there are still the LBAs with sizes bigger than that that it won't be able to decide. – RealSkeptic Oct 22 '15 at 20:15
What I'm saying is that for each LBA, you know how big of an LBA you need to decide it, and this bound is linear. Or to put it in pop-CS terms, a typical developer machine has at least twice as much RAM as an end user machine. – Damian Yerrick Oct 22 '15 at 20:53
Your proof that an algorithm that can find dead code is reducible to the halting problem has much hand waving and little substance. Care to provide a link to something that actually goes through the details? – Eric Oct 23 '15 at 00:42
2

@Eric: It's a simple problem, and ratchet freak's comment already provides an answer. Basically, if you want to do full dead-code analysis in the general case, you have to be able to determine `if (SomeHaltingProblem()) { ShouldThisBeEliminated(); }` removes `ShouldThisBeEliminated()`. This determination requires solving the halting problem in `SomeHaltingProblem()`. More generally, dead code is dead code because we can prove that it'll never be run; this proving can be as difficult as a halting problem since the flow of code can be determined by a method containing a halting problem. – Nat Oct 23 '15 at 02:47
@Vality: Halting problem is soluble for FSM? Citation? Considering Turing machines are FSMs processing a tape I don't think that statement is true universally. Or do you mean a subset of FSMs? Like DFA? – slebetman Oct 23 '15 at 02:58
3

@slebetman A Turing machine is not an FSM - it has infinite state because it has an infinite tape. Proving that the halting problem for finite state machines is decidable is trivial. You only have a finite number of states, so in order not to halt the FSM needs to go into a loop. Loop detection in general finite graphs is decidable, thus halting of FSMs is decidable. – Taemyr Oct 23 '15 at 07:26
@Taemyr: The tape is the input. The read head, the machine, is the FSM. All CPU's are FSM reading in instruction stream. It's the instruction stream that's potentially infinite (well, not really, it's bounded by the number of electrons in the universe, but so are all physical Turing machines). An FSM has finite state but can potentially have infinite inputs – slebetman Oct 23 '15 at 07:44
@Taemyr: Maybe we're having some misunderstanding here but I'm looking at this from the perspective of a CPU designer rather than a language designer. If I were to build a Turing machine, I'd use an FSM (just like any other CPU) – slebetman Oct 23 '15 at 07:46
@slebetman https://en.wikipedia.org/wiki/Finite-state_machine . A Turing machine is a mathematical abstraction; you can't build one, because it has an infinite tape. A Turing machine with a finite tape is an LBA, and halting for LBAs is decidable, although you need a bigger LBA to actually determine halting. (That is, no LBA can decide the halting problem for LBAs of the same size) – Taemyr Oct 23 '15 at 07:56
@slebetman Sorry, I missed a point re. the tape as input and head as FSM. The tape is not input: you can write to it and you can navigate to different parts of it. So your head needs, as part of its state, the position it's at, meaning there is an infinite number of states. – Taemyr Oct 23 '15 at 07:58
The OP is correct, take any open problem in mathematics, e.g. are there any odd perfect numbers.. And then write a loop to check each number and print out an example when it finds it. If you could solve the "dead code", you could resolve the truthiness of any mathematical statement... – Stephen Oct 23 '15 at 15:24
If you write a program in Maude, you can know if it is going to end or not. It is possible to verify it. – Ricardo Oct 23 '15 at 16:35
4

@Ricardo there are a few possibilities: (1) The Maude system is not as powerful as a Turing machine. (2) The Maude system operates under space constraints that are lower than that of the verifying machine (LBA verified by larger LBA). (3) Verification only works for some programs, not all (4) Your statement is wrong. – RealSkeptic Oct 23 '15 at 18:56
@Taemyr: Given a FSM with states {A,B} and inputs {0,1}, where B is the terminal state and initial state A. Transition A->B on 1, and A->A on 0. You cannot guarantee that the input will not be all 0's. Now is state B reachable/does the machine halt? – ratiotile Oct 23 '15 at 19:30
@ratiotile It halts for some inputs. This is perfectly decidable, and thus different from Turing machines, where this can be undecidable. – Taemyr Oct 24 '15 at 11:28
@richardo Halting of rewrite systems like Maude is undecidable. It might be possible to verify this, but no algorithm will be able to verify all systems. – Taemyr Oct 24 '15 at 11:30
@acbabis I just notice that number, 256^(2^64), do you know that this is much much bigger than the number of atoms in universe, and by the way the search problem is not O(1) – albanx Oct 27 '15 at 23:22
@albanx Start with a graph with `256^(2^64)` nodes and 0 edges, implemented with a hashmap. For each node, add a directed edge to the next state. This takes `256^(2^64)` operations. Then, for each node, determine whether the terminal state is the last node in its chain. This takes no more than `256^(2^64)` operations per node, for a total of `256^(2^64) + (256^(2^64))^2` operations. This takes `O(1)` time and `O(1)` memory. – aebabis Oct 27 '15 at 23:38
hey @acbabis do you have any idea what you are saying? There is no method that solves the search problem in O(1) time and memory. In your case you have n = 256^(2^64) with time n+n^2, so O(n^2)... and moreover the halting problem cannot be solved: you may have finite states but infinite inputs... – albanx Oct 28 '15 at 00:03
@albanx where did those ns come from? They dont vary with the size of the input they are constant for any input running on a 64 bit machine. Its 0n + 256^(2^64) which is just a constant and so O(1) – tobyodavies Oct 28 '15 at 03:29
ok maybe I am wrong, I do not remember university lectures from 8 years ago... but in any case the fact that the number is finite does not make it reasonable to call this O(1). The search problem is not simple: any compiler would take a huge (very huge) amount of time to search all states, so real-world compilers just test specific cases – albanx Oct 28 '15 at 20:46
@albanx It's a huge time, but it's constant time. That's the problem with O notation: everything is up to a constant. O(f(n)) means that the algorithm can be achieved in c*f(n) time. But c can be arbitrarily large - as long as it does not depend on the size of the input. This is why radix sort is considered O(n) for fixed-width comparables (e.g. `long`). But if you look closely, you'll realize that with a constant like 64*n (64 being the width of the comparable), it's not actually going to take less than n log(n) when log(n) <= 64. The constants suddenly need to be considered. – RealSkeptic Oct 28 '15 at 20:54
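The finite-state argument from this comment thread (a deterministic machine with finitely many states either halts or revisits a state, and tortoise-and-hare cycle detection tells the two apart) can be sketched as below. The interface is an assumption made for the sketch: `step` maps a state to its successor and returns `None` when the machine halts, and states are comparable with `==`.

```python
def halts(step, start):
    """Decide halting for a deterministic system with finitely many states.

    step(state) returns the next state, or None once the machine halts.
    Uses Floyd's tortoise-and-hare: the hare advances two steps for each
    step of the tortoise; if they ever meet, the run is stuck in a cycle.
    """
    tortoise = hare = start
    while True:
        for _ in range(2):
            hare = step(hare)
            if hare is None:
                return True          # reached the halting state
        tortoise = step(tortoise)    # never None: the hare is always ahead
        if tortoise == hare:
            return False             # a state repeated, so it loops forever


# Counts down to 0 and then stops.
print(halts(lambda s: None if s == 0 else s - 1, 5))   # True
# Cycles through 0..6 forever.
print(halts(lambda s: (s + 1) % 7, 0))                 # False
```

This only needs room for two machine states plus a constant, which matches the "2n + C" remark, but the running time is bounded by the number of states, which is why the `256^(2^64)` figure makes it useless in practice.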

score 77 · Answer 2 · edited Oct 23 '15 at 14:26

Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!

C# program

using System;
using YourVendor.Compiler;

class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;

class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}

If `YourVendor.Compiler.HasDeadCode(quine_text)` returns `false`, then the line `System.Console.WriteLine("Dead code!");` will never be executed, so this program actually does have dead code, and the detector was wrong.

But if it returns `true`, then the line `System.Console.WriteLine("Dead code!");` will be executed, and since there is no more code in the program, there is no dead code at all, so again, the detector was wrong.

So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.
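The same trap can be shown in Python without the quine, by handing the detector the function object instead of its source text. This is a sketch, not the C# program above: `make_contrarian` and `detector_is_wrong_on` are names invented here, the detector is assumed to be deterministic, and the only line considered a candidate for being dead is the guarded `append`.

```python
def make_contrarian(has_dead_code):
    """Build a program that does the opposite of what the detector predicts."""
    def contrarian():
        executed = []
        if has_dead_code(contrarian):
            executed.append("guarded line")   # the only candidate dead line
        return executed
    return contrarian


def detector_is_wrong_on(has_dead_code):
    """Check the detector's verdict on its contrarian against what happens."""
    contrarian = make_contrarian(has_dead_code)
    verdict = has_dead_code(contrarian)
    guarded_line_ran = bool(contrarian())
    actually_has_dead_code = not guarded_line_ran
    return verdict != actually_has_dead_code


# Both possible answers are wrong for the contrarian program.
print(detector_is_wrong_on(lambda program: True))    # True
print(detector_is_wrong_on(lambda program: False))   # True
```

Any detector that always answers has to answer either `True` or `False` on its own contrarian, so the two constant detectors already cover every possible outcome.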

If I've understood your argument correctly, then technically another option would be that it is not possible to write a quine which is a dead code detector, but it is possible to write a dead code detector in the general case. :-) — abligh, Oct 22 '15 at 13:51
@abligh Ugh, that was a bad choice of words. I am not actually feeding the dead-code detector's source code to itself, but the source code of the program that uses it. Surely, at some point it probably would have to look at its own code, but it's its business. — Joker_vD, Oct 22 '15 at 17:19

abligh · Answer 3 · 2015-10-25T10:13:21.940

65

If the halting problem is too obscure, think of it this way.

Take a mathematical problem that is believed to be true for all positive integers n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any even integer greater than two can be represented as the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):

 for (BigInt n = 4; ; n+=2) {
     if (!isGoldbachsConjectureTrueFor(n)) {
         print("Conjecture is false for at least one value of n\n");
         exit(0);
     }
 }

Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader, but for this purpose it could be a simple iteration over all primes less than n.
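For what it's worth, one naive way to do that exercise in Python (whose integers are already arbitrary precision) is sketched below. The names `is_prime`, `is_goldbach_true_for` and `first_counterexample_below` are made up here, and the search is deliberately bounded by `limit` so that it terminates; the unbounded loop above is the one whose termination nobody can currently predict.

```python
def is_prime(k):
    """Trial division; slow but obviously correct."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def is_goldbach_true_for(n):
    """True if the even number n is a sum of two primes."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


def first_counterexample_below(limit):
    """Bounded version of the loop above: check every even n in [4, limit)."""
    for n in range(4, limit, 2):
        if not is_goldbach_true_for(n):
            return n
    return None


print(first_counterexample_below(1000))  # None
```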

Now, logically the above must either be the equivalent of:

 for (; ;) {
 }

(i.e. an infinite loop) or

print("Conjecture is false for at least one value of n\n");

as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here in either case. However, in doing so, at the very least your compiler would need to solve arbitrarily hard problems. We could give it problems that are believed to be hard (e.g. NP-complete problems), which it would have to solve to determine which bit of code to eliminate. For instance, if we take this program:

 String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
 for (BigInt n = 0; n < 2**2048; n++) {
     String s = n.toString();
     if (sha256(s).equals(target)) {
         print("Found SHA value\n");
         exit(0);
     }
 }
 print("Not found SHA value\n");

we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that would take of the order of 2^2048 iterations. It would in fact be a great optimisation as I predict the above program would (or might) run until the heat death of the universe rather than printing anything without optimisation.
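A scaled-down version of that search, using Python's standard `hashlib`, shows why the compiler has no shortcut: as far as anyone knows, the only way to learn which `print` is dead is to run the loop. The function name `search_finds_target` and the small `bits` parameter (standing in for 2048) are assumptions made for this sketch.

```python
import hashlib


def search_finds_target(target_hex, bits):
    """Brute-force every n below 2**bits, like the loop above.

    Returns True if some decimal string str(n) hashes to target_hex,
    i.e. if the "Found SHA value" branch is the live one.
    """
    for n in range(2 ** bits):
        if hashlib.sha256(str(n).encode()).hexdigest() == target_hex:
            return True
    return False


# Build a target whose preimage we know, so the outcome is predictable.
target = hashlib.sha256(b"12345").hexdigest()

print(search_finds_target(target, 16))  # True: 12345 < 2**16
print(search_finds_target(target, 8))   # False: 12345 >= 2**8
```

Each extra bit doubles the work, so the 2048-bit version is out of reach for a compiler and for the program itself.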

edited Oct 25 '15 at 10:13

answered Oct 22 '15 at 12:42

abligh


4

It's the best answer by far +1 – jean Oct 22 '15 at 15:57
2

What makes things particularly interesting is the ambiguity about what the C Standard allows or doesn't allow when it comes to assuming that loops will terminate. There is value in allowing a compiler to defer slow calculations whose results may or may not be used until the point where their results would actually be needed; this optimization could in some cases be useful even if the compiler can't prove the calculations terminate. – supercat Oct 22 '15 at 21:19
2

2^2048 iterations? Even [Deep Thought](https://en.wikipedia.org/wiki/List_of_minor_The_Hitchhiker's_Guide_to_the_Galaxy_characters#Deep_Thought) would give up. – Peter Mortensen Oct 22 '15 at 22:57
It will print "Found SHA value" with very high probability, even if that target was a random string of 64 hex digits. Unless `sha256` returns a byte array and byte arrays don't compare equal to strings in your language. – user253751 Oct 23 '15 at 10:46
How would you prove the dead code optimizer is incorrect if it optimizes the code to `print("Found SHA value\n");` ? As immibis points out, this is _almost certainly_ correct, and the code fails to print `n`. – MSalters Oct 23 '15 at 13:28
5

`Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader` This made me chuckle. – biziclop Oct 23 '15 at 14:41
@immibis - well, I could increase the hash length so the probability of the SHA matching (with no prior knowledge of where it might have come from) would be roughly one in two. The argument "if the compiler omitted code A rather than code B as dead, you can't prove it is wrong" is (when either A or B must be dead) isomorphic to "you can't prove which can be omitted safely" which means neither can the compiler. "Probably dead code elimination" is another question entirely. – abligh Oct 23 '15 at 15:59

RubberDuck · Answer 4 · 2015-10-25T14:15:25.713

34

I don't know if C++ or Java have an Eval type function, but many languages do allow you to call methods by name. Consider the following (contrived) VBA example.

Dim methodName As String

If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If

Application.Run(methodName)

The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.

Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying

Application.Run("Bar")

is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check to see if that method exists until runtime. If the method isn't called elsewhere, through more normal means, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.
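The same false positive is easy to reproduce in Python. The sketch below is an illustration only: `naive_unused_functions` is a deliberately simplistic analyzer written for this example (it is not how any particular tool works), and `bar`, `qux` and `run` are placeholder names mirroring the VBA snippet.

```python
import ast

SOURCE = '''
def bar():
    return "bar ran"

def qux():
    return "qux ran"

def run(method_name):
    # Call by name: the target is only known at runtime.
    return globals()[method_name]()
'''


def naive_unused_functions(source):
    """Report functions that are never called directly by name."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return sorted(defined - called)


# The analyzer sees no direct call to bar, qux or run.
print(naive_unused_functions(SOURCE))   # ['bar', 'qux', 'run']

# Yet bar is perfectly reachable once the name arrives as a string.
namespace = {}
exec(SOURCE, namespace)
print(namespace["run"]("bar"))          # bar ran
```

Deleting `bar` on the strength of that report would break the program, which is the false positive described above. `run` itself also shows up as unused here simply because nothing inside `SOURCE` calls it.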

edited Oct 25 '15 at 14:15

answered Oct 21 '15 at 23:28

RubberDuck


2

In Java (or C#), this could be done with reflection. C++ you could probably pull off some nastiness using macros to do it. Wouldn't be pretty, but C++ rarely is. – Darrel Hoffman Oct 22 '15 at 13:36
8

@DarrelHoffman - Macros are expanded before the code is given to the compiler, so macros definitely aren't how you would do this. Pointers to functions is how you would do this. I haven't used C++ in years so excuse me if my exact type names are wrong, but you can just store a map of strings to function pointers. Then have something that accepts a string from user input, looks up that string in the map, and then executes the function that is pointed at. – ArtOfWarfare Oct 22 '15 at 14:10
1

@ArtOfWarfare we're not talking about how it could be done. Obviously, semantic analysis of the code can be done to find this situation, the point was that the compiler *doesn't*. It could, possibly, maybe, but it doesn't. – RubberDuck Oct 22 '15 at 15:59
4

@ArtOfWarfare: If you want to nitpick, sure. I consider the preprocessor to be part of the compiler, though I know it technically isn't. Anyhow, function pointers might break the rule that the functions are not directly referenced anywhere - they are, just as a pointer instead of a direct call, much like a delegate in C#. C++ is in general much more difficult for a compiler to predict since it has so many ways of doing things indirectly. Even tasks as simple as "find all references" aren't trivial, as they can hide in typedefs, macros, etc. No surprise it can't find dead code easily. – Darrel Hoffman Oct 22 '15 at 17:20
AFAIK the C++ standard doesn't specify a way to call a function by name, so it's implementation dependent; but if you couldn't call a function by name, then you couldn't use a function from a shared library. I.e. C on a typical UNIX-like system could call a function like this: `#include <dlfcn.h>` `#include <stdio.h>` `void sayhello(void) { puts("hello world"); } int main () { ((void(*)())dlsym(dlopen(0,0), "sayhello"))(); }` – Steve Sanbeg Oct 22 '15 at 20:35
@DarrelHoffman that's a good point. I considered mentioning events (which are really very much like .Net's delegates in how they function), but didn't quite have enough time. They also make the problem much harder. These are the two big reasons why we want (need) to implement an `@ignore warning` annotation for our code analysis plug-in. – RubberDuck Oct 22 '15 at 23:42
1

You don't even need dynamic method calls to face this problem. Any public method can be called by a not-yet-written function that will depend on the already compiled class in Java or C# or any other compiled language with some mechanism for dynamic linking. If compilers eliminated these as "dead code," then we wouldn't be able to package precompiled libraries for distribution (NuGet, jars, Python wheels with binary component). – jpmc26 Oct 23 '15 at 06:22
The standard OO construct here is `Object O = (foo()) ? new A : new B; O.bar()`. Depending on the value returned by `foo()`, this executes either `A.bar` or `B.bar`. In turn, this means that a dead code analyzer must prove that `foo` can return both `true` and `false`. Also, neither method executes if `foo` doesn't return. – MSalters Oct 23 '15 at 13:23
1

Java can (with classloader). C++ cannot in itself. But shared libraries, in effect, allow for the same functionality. So it's a matter what your C++ compiles into. If it compiles into something which has a runtime that allows loading of shared libraries (as both \*nix and Win do), then the calling convention can be by name. In fact, you can even have your program *generate* code, compile it and then load it as a shared library within the same runtime of a program. – Dmitry Rubanovich Oct 24 '15 at 00:55

Alex Lop. · Answer 5 · 2015-10-22T20:32:13.397

12

A simple example:

int readValueFromPort(const unsigned int portNum);

int x = readValueFromPort(0x100); // just an example, nothing meaningful
if (x < 2)
{
    std::cout << "Hey! X < 2" << std::endl;
}
else
{
    std::cout << "X is too big!" << std::endl;
}

Now assume that the port 0x100 is designed to return only 0 or 1. In that case the compiler cannot figure out that the else block will never be executed.

However in this basic example:

bool boolVal = /*anything boolean*/;

if (boolVal)
{
  // Do A
}
else if (!boolVal)
{
  // Do B
}
else
{
  // Do C
}

Here the compiler can work out that the final else block is dead code. So the compiler can warn about dead code only if it has enough data to identify it, and it must also know how to apply that data to decide whether a given block is dead.

EDIT

Sometimes the data is just not available at the compilation time:

// File a.cpp
bool boolMethod();

bool boolVal = boolMethod();

if (boolVal)
{
  // Do A
}
else
{
  // Do B
}

//............
// File b.cpp
bool boolMethod()
{
    return true;
}

While compiling a.cpp the compiler cannot know that boolMethod always returns true.

edited Oct 22 '15 at 20:32

answered Oct 21 '15 at 18:29

Alex Lop.


1

While strictly true that the *compiler* doesn't know, I think it is in the spirit of the question to also ask whether the *linker* can know. – Casey Kuball Oct 22 '15 at 16:38
1

@Darthfett It is not the *linker*'s responsibility. The linker doesn't analyze the content of the compiled code. The linker (generally speaking) just links the methods and the global data; it doesn't care about the content. However, some compilers do have the option to concatenate the source files (like ICC) and then perform the optimization. In that case the example under **EDIT** is covered, but this option will affect the compilation time, especially when the project is large. – Alex Lop. Oct 22 '15 at 16:48
This answer seems misleading to me; you're giving two examples where it isn't possible because not all information is available, but shouldn't you say that it's impossible even if the information is there? – Anton Golov Oct 22 '15 at 18:39
@AntonGolov It is not always true. In many cases when the information is there, the compilers can detect the dead code and optimize it out. – Alex Lop. Oct 22 '15 at 18:52
@abforce just a block of code. It could have been anything else. :) – Alex Lop. Oct 22 '15 at 20:22
@AlexLop. I've usually seen `count << "Hey";` not `count >> "Hey"`. Is it correct at all? – frogatto Oct 22 '15 at 20:29
@abfotce Nice catch! I will edit it. But I believe you meant `cout` and not `count`. – Alex Lop. Oct 22 '15 at 20:31
@AlexLop. Yeah, was a typo. – frogatto Oct 22 '15 at 20:32
@AlexLop. But the point is that even when all information is there, not all cases can be solved. Of course some cases can be covered, but it can't be done in general. – Anton Golov Oct 23 '15 at 07:20
@AntonGolov You are right. Even when all the data is given, it is *not always* possible to detect the dead code. My point was that even when the case seems to be solvable, the way the code is designed/written may prevent the compiler from detecting it. – Alex Lop. Oct 23 '15 at 07:59

score 12 · Answer 6 · answered Oct 21 '15 at 18:54

Unconditional dead code can be detected and removed by advanced compilers.

But there is also conditional dead code. That is code whose deadness cannot be known at compile time and can only be detected at runtime. For example, software may be configurable to include or exclude certain features depending on user preference, making certain sections of code seemingly dead in particular scenarios. That is not real dead code.

There are specific tools that can do testing, resolve dependencies, remove conditional dead code and recombine the useful code at runtime for efficiency. This is called dynamic dead code elimination. But as you can see it is beyond the scope of compilers.
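A minimal Java sketch of the "conditional" case, assuming a hypothetical feature switch read from a system property named `app.reports` (both the property name and the `FeatureToggle` class are made up for illustration). Whether `exportReport()` ever runs is decided when the program starts, not when it is compiled:

```java
// Hypothetical example: whether exportReport() is "dead" depends on a
// setting that only exists at runtime, so the compiler cannot decide it.
class FeatureToggle {
    // Reads the (assumed) system property "app.reports"; defaults to off.
    static boolean reportsEnabled() {
        return Boolean.parseBoolean(System.getProperty("app.reports", "false"));
    }

    static String exportReport() {
        return "report exported";
    }

    static String run() {
        if (reportsEnabled()) {
            // Live only when the JVM is started with -Dapp.reports=true.
            return exportReport();
        }
        return "reports disabled";
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

For one particular deployment the branch may never execute, but removing it at compile time would break every installation that turns the switch on.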

answered Oct 21 '15 at 18:54

dspfnder

"Unconditional dead code can be detected and removed by advanced compilers." This does not seem likely. Code deadness can depend on the outcome of a given function, and that given function can solve arbitrary problems. So your statement asserts that advanced compilers can solve arbitrary problems. – Taemyr Oct 23 '15 at 11:24
@Taemyr Then it wouldn't be known to be unconditionally dead, now would it? – JAB Oct 23 '15 at 14:41
@Taemyr You seem to misunderstand the word "unconditional." If the code deadness depends on the outcome of a function, then it is conditional dead code. The "condition" being the outcome of the function. To be "unconditional" it would have to *not* depend on any outcome. – Kyeotic Oct 23 '15 at 21:40

score 4 · Answer 7 · answered Oct 21 '15 at 18:37

The compiler will always lack some context information. For example, you might know that a double value never exceeds 2, because that is a property of the mathematical function you use from a library. The compiler does not even see the code in the library, and it can never know all properties of all mathematical functions or detect all the weird and complicated ways to implement them.
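As an illustration, here is a small Java sketch that uses `Math.sin` as the library function (the `BoundedValue` class and its method are hypothetical). Mathematically `1 + sin(x)` lies in the range 0 to 2, so the guarded branch should never run, yet nothing in the source tells the compiler so:

```java
// Hypothetical example: the programmer knows a bound that the compiler
// has no way of deriving from the call to Math.sin.
class BoundedValue {
    static boolean exceedsTwo(double x) {
        double y = 1.0 + Math.sin(x);
        if (y > 2.0) {
            // Dead in practice, because sin(x) never exceeds 1,
            // but the compiler only sees an opaque library call.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int hits = 0;
        for (int i = -100000; i <= 100000; i++) {
            if (exceedsTwo(i * 0.001)) {
                hits++;
            }
        }
        System.out.println("branch taken " + hits + " times");
    }
}
```

A compiler could be taught this one fact about `sin`, but not every such fact about every function in every library.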

score 4 · Answer 8 · edited Oct 22 '15 at 23:01

The compiler doesn't necessarily see the whole program. I could have a program that calls a shared library, which calls back into a function in my program which isn't called directly.

So a function which is dead with respect to the library it's compiled against could become alive if that library was changed at runtime.
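A rough Java sketch of that situation, with the shared library simulated by an interface (all names here, `CallbackDemo`, `Library`, `onEvent`, are hypothetical). The program never calls `onEvent()` itself; whether it runs depends on which library implementation is present at runtime:

```java
import java.util.function.Supplier;

// Hypothetical example: onEvent() is only ever reached through a callback
// that an external library may or may not invoke.
class CallbackDemo {
    // Stand-in for the entry point of a shared library.
    interface Library {
        String process(Supplier<String> callback);
    }

    // One library version ignores the callback, another one uses it.
    static final Library IGNORES_CALLBACK = callback -> "done without callback";
    static final Library USES_CALLBACK = callback -> "library got: " + callback.get();

    // Never called directly anywhere in this program.
    static String onEvent() {
        return "onEvent ran";
    }

    static String run(Library lib) {
        return lib.process(CallbackDemo::onEvent);
    }

    public static void main(String[] args) {
        System.out.println(run(IGNORES_CALLBACK));
        System.out.println(run(USES_CALLBACK));
    }
}
```

Compiled against the first library version alone, `onEvent()` looks dead; swapping in the second version makes it live without recompiling the program.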

edited Oct 22 '15 at 23:01

Peter Mortensen

answered Oct 22 '15 at 20:43

Steve Sanbeg

biziclop · Answer 9 · 2015-10-23T15:07:09.403

If a compiler could eliminate all dead code accurately, it would be called an interpreter.

Consider this simple scenario:

if (my_func()) {
  am_i_dead();
}

my_func() can contain arbitrary code and in order for the compiler to determine whether it returns true or false, it will either have to run the code or do something that is functionally equivalent to running the code.

The idea of a compiler is that it only performs a partial analysis of the code, thus simplifying the job of a separate running environment. If you perform a full analysis, that isn't a compiler any more.

If you consider the compiler as a function c(), where c(source)=compiled code, and the running environment as r(), where r(compiled code)=program output, then to determine the output for any source code you have to compute the value of r(c(source code)). If calculating c() requires the knowledge of the value of r(c()) for any input, there is no need for a separate r() and c(): you can just derive a function i() from c() such that i(source)=program output.
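To make the "functionally equivalent to running the code" point concrete, here is a hedged Java sketch (the `DeadnessOracle` class and its methods are hypothetical). The only general way this "analysis" can decide whether the guarded call is dead is to execute the condition, which is interpretation and may never return:

```java
import java.util.function.BooleanSupplier;

// Hypothetical example: a "perfect" check for `if (myFunc()) amIDead();`
// that decides deadness by simply running myFunc.
class DeadnessOracle {
    // Returns true if the guarded call would never happen.
    // Note: this does not return at all if myFunc does not terminate.
    static boolean guardedCallIsDead(BooleanSupplier myFunc) {
        return !myFunc.getAsBoolean();
    }

    // Sample condition: follows the Collatz sequence from n until it hits 1.
    // It can only ever return true; otherwise it loops forever.
    static boolean collatzReachesOne(long n) {
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean dead = guardedCallIsDead(() -> collatzReachesOne(27));
        System.out.println("guarded call is dead: " + dead);
    }
}
```

For a starting value such as 27 the answer arrives quickly, but nothing bounds how long an arbitrary `myFunc` takes, so a compiler that insisted on this check could hang during compilation.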

score 2 · Answer 10 · edited Oct 23 '15 at 12:44

Take a function

void DoSomeAction(int actnumber) 
{
    switch(actnumber) 
    {
        case 1: Action1(); break;
        case 2: Action2(); break;
        case 3: Action3(); break;
    }
}

Can you prove that actnumber will never be 2 so that Action2() is never called...?
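A minimal Java sketch of why that proof is usually out of reach, assuming a hypothetical caller `handle` that takes the action number from outside the program (class and method names are made up, and the actions are reduced to strings):

```java
// Hypothetical example: the value that selects the case comes from input,
// so no case of the switch can be proven dead at compile time.
class ActionDispatch {
    static String doSomeAction(int actNumber) {
        switch (actNumber) {
            case 1: return "Action1";
            case 2: return "Action2";
            case 3: return "Action3";
            default: return "none";
        }
    }

    // The action number arrives as text, e.g. from a user or a file.
    static String handle(String userInput) {
        return doSomeAction(Integer.parseInt(userInput.trim()));
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "2";
        System.out.println(handle(input));
    }
}
```

Only if every caller were visible and passed constants other than 2 could the compiler drop the `case 2` branch.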

edited Oct 23 '15 at 12:44

Willi Mentzel

answered Oct 22 '15 at 11:03

CiaPan

If you can analyse the callers of the function, then you may be able to, yes. – abligh Oct 22 '15 at 13:42
@abligh But the compiler usually can't analyse all the calling code. Anyway, even if it could, the full analysis might require a simulation of all possible control flows, which is almost always impossible due to the resources and time needed. So even if theoretically there *exists* a proof that '`Action2()` will never be called', it is impossible to prove the claim in practice: it **can't be fully solved by a compiler**. The difference is like 'there exists a number X' vs. 'we can write the number X in decimal'. For some X's the latter will never happen although the former is true. – CiaPan Oct 23 '15 at 12:58
This is a poor answer. The other answers **prove** that it's impossible to know whether `actnumber==2`. This answer merely claims it's hard without even stating a complexity. – MSalters Oct 26 '15 at 09:45

score 2 · Answer 11 · answered Oct 22 '15 at 12:40

Others have commented on the halting problem and so forth. These generally apply to portions of functions. However it can be hard/impossible to know whether even an entire type (class/etc) is used or not.

In .NET/Java/JavaScript and other runtime driven environments there's nothing stopping types being loaded via reflection. This is popular with dependency injection frameworks, and is even harder to reason about in the face of deserialisation or dynamic module loading.

The compiler cannot know whether such types would be loaded. Their names could come from external config files at runtime.
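A short Java sketch of the idea, assuming a hypothetical system property `exporter.class` that names the type to load (the `PluginLoader` class is also made up; `Class.forName` and `getDeclaredConstructor().newInstance()` are standard Java reflection calls):

```java
// Hypothetical example: the type to instantiate is chosen by a string that
// only exists at runtime, so no static tool can tell which types are used.
class PluginLoader {
    // Loads a class by name and creates an instance via its no-arg constructor.
    static Object load(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The name could equally come from a config file or the network.
        String name = System.getProperty("exporter.class", "java.util.ArrayList");
        System.out.println(load(name).getClass().getName());
    }
}
```

Any class on the classpath with a public no-argument constructor is a candidate here, so removing a type because no source line mentions it is not safe in general.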

You might like to search around for tree shaking which is a common term for tools that attempt to safely remove unused subgraphs of code.

I don't know about Java and JavaScript, but .NET actually has a ReSharper plugin for that kind of DI detection (called Agent Mulder). Of course, it won't be able to detect configuration files, but it is able to detect config in code (which is much more popular). — Ties, Oct 28 '15 at 21:07

score 1 · Answer 12 · edited Oct 23 '15 at 12:46

I disagree about the halting problem. I wouldn't call such code dead even though in reality it will never be reached.

Instead, let's consider:

for (int N = 3;;N++)
  for (int A = 2; A < int.MaxValue; A++)
    for (int B = 2; B < int.MaxValue; B++)
    {
      int Square = Math.Pow(A, N) + Math.Pow(B, N);
      float Test = Math.Sqrt(Square);
      if (Test == Math.Trunc(Test))
        FermatWasWrong();
    }

private void FermatWasWrong()
{
  Press.Announce("Fermat was wrong!");
  Nobel.Claim();
}

(Ignore the type and overflow errors) Dead code?
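For readers who want to run something, here is a hedged Java sketch of the same idea with exact arithmetic. It is a bounded search (the bounds, the `FermatSearch` class and the method name are arbitrary choices for illustration, not part of the code above), and it checks `a^n + b^n = c^n` directly instead of taking a square root:

```java
import java.math.BigInteger;

// Hypothetical example: a bounded, exact search for a counterexample to
// Fermat's Last Theorem. The `return true` line is the analogue of
// calling FermatWasWrong().
class FermatSearch {
    static boolean counterexampleExists(int maxBase, int maxExp) {
        for (int n = 3; n <= maxExp; n++) {
            for (int a = 1; a <= maxBase; a++) {
                for (int b = a; b <= maxBase; b++) {
                    BigInteger sum = BigInteger.valueOf(a).pow(n)
                            .add(BigInteger.valueOf(b).pow(n));
                    // Any solution c must satisfy b < c < a + b.
                    for (int c = b + 1; c < a + b; c++) {
                        if (BigInteger.valueOf(c).pow(n).equals(sum)) {
                            return true; // dead only because the theorem holds
                        }
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(counterexampleExists(50, 6));
    }
}
```

Knowing that the `return true` line is dead amounts to knowing the theorem, which is not something a compiler can be expected to derive from the loop structure.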

edited Oct 23 '15 at 12:46

Willi Mentzel

answered Oct 22 '15 at 23:34

Loren Pechtel

Fermat's last theorem was proven in 1994. So a correct implementation of your method would never run FermatWasWrong. I suspect your implementation will run FermatWasWrong, because you can hit the limit of precision of floats. – Taemyr Oct 23 '15 at 07:31
@Taemyr Aha! This program does not correctly test Fermat's Last Theorem; a counterexample for what it does test is N=3, A=65536, B=65536 (which yields Test=0) – user253751 Oct 23 '15 at 10:51
@immibis Yes, I missed that it will overflow int before precision on the floats becoming an issue. – Taemyr Oct 23 '15 at 11:11
@immibis Note the bottom of my post: Ignore the type and overflow errors. I was just taking what I thought was an unsolved problem as the basis of a decision--I know the code isn't perfect. It's a problem that can't be brute-forced anyway. – Loren Pechtel Oct 23 '15 at 17:15

score -1 · Answer 13 · edited Oct 23 '15 at 18:00

Look at this example:

public boolean isEven(int i){

    if(i % 2 == 0)
        return true;
    if(i % 2 == 1)
        return false;
    return false;
}

The compiler can't know that an int can only be even or odd. To know that, it would have to understand the semantics of your code. How should that be implemented? The compiler can't ensure that the last return will never be executed, and therefore it can't flag it as dead code.
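Whether that last return is really unreachable depends on the exact semantics of `%`. In Java the remainder takes the sign of the dividend, so `-3 % 2` is `-1`. A small sketch (the `ParityDemo` class is hypothetical) that reports which of the three return statements fires:

```java
// Hypothetical example: same structure as isEven, but returning the index
// of the return statement that was reached.
class ParityDemo {
    static int whichReturn(int i) {
        if (i % 2 == 0)
            return 1; // even
        if (i % 2 == 1)
            return 2; // positive odd
        return 3;     // reached when i % 2 == -1, i.e. negative odd
    }

    public static void main(String[] args) {
        System.out.println(whichReturn(4));
        System.out.println(whichReturn(7));
        System.out.println(whichReturn(-3));
    }
}
```

So a tool that marked the final return as dead would have to reason about the sign rules of the remainder operator, which is exactly the kind of semantic knowledge in question.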

edited Oct 23 '15 at 18:00

answered Oct 21 '15 at 18:20

user

Umm, really? If I write that in C# + ReSharper I get a couple of hints. Following them finally gives me the code `return i%2==0;`. – Thomas Weller Oct 21 '15 at 21:37
Your example is too simple to be convincing. The specific case of `i % 2 == 0` and `i % 2 != 0` doesn't even require reasoning about the value of an integer modulo a constant (which is still easy to do), it only requires common subexpression elimination and the general principle (canonicalization, even) that `if (cond) foo; if (!cond) bar;` can be simplified to `if (cond) foo; else bar;`. Of course "understanding semantics" is a very hard problem, but this post neither shows that it is, nor shows that solving this hard problem is necessary for dead code detection. – Oct 21 '15 at 21:39
In your example, an optimizing compiler will spot the common subexpression `i % 2` and pull it out into a temporary variable. It will then recognize that the two `if` statements are mutually exclusive and can be written as `if(a==0)...else...`, and then spot that all possible execution paths go through the first two `return` statements and therefore the third `return` statement is dead code. (A *good* optimizing compiler is even more aggressive: GCC turned my test code into a pair of bit-manipulation operations). – Mark Oct 21 '15 at 23:16
This example is good for me. It represents the case when a compiler does not know about some factual circumstances. The same goes for `if (availableMemory()<0) then {dead code}`. – Little Santi Oct 22 '15 at 12:25
@LittleSanti the problem with the example is that the compiler doesn't need to know the factual circumstance. It needs to know that (a != b) is the same as !(a==b), which, since we are dealing with ints, is a part of the language syntax. To accurately represent the requirement that relevant facts be known, (i%2 != 0) should be changed to (i%2 == 1). It's still easy though, because %2 is a bit shift operation. Having three cases and doing modulo 3 would be a bit trickier. – Taemyr Oct 23 '15 at 11:16
@LittleSanti: It's fairly obvious that you cannot decide whether a system contains dead code by looking at only part of the system, if the execution flow in the observed part depends on the unobserved part. Heck, a compiler can't even prove that `availableMemory()` even returns. (A trivial case of execution flow being influenced) – MSalters Oct 23 '15 at 13:35
@Taemyr I assumed this example wanted to demonstrate what factual circumstances are in a simple manner. OK, let it calculate possible modulus of 2: `if (x % 2 ==2) { dead code }`. Or some simple arithmetic: `int x=...; if ( (2*x) % 2 ==1) {dead code}`. In the end, you'll find that a compiler must get a math degree to identify dead code. – Little Santi Oct 23 '15 at 13:58
@MSalters `availableMemory` was an example of something we programmers know better than the program itself. Let it be: `int i=1;while (2*i>i) {i++;} {dead code}`. I doubt a common compiler would spot that. – Little Santi Oct 23 '15 at 14:02
@LittleSanti: Actually, GCC will detect that _everything_ you wrote there is dead code ! It's not just the `{dead code}` part. GCC discovers this by proving there's an unavoidable signed integer overflow. All code on that arc in the execution graph is therefore dead code. GCC can even remove the conditional branch that leads to that arc. – MSalters Oct 23 '15 at 14:10
@LittleSanti I think you are correct in that the example wanted to demonstrate these factual circumstances. It's just that by using the not-equal operator he fails to give such an example. As I already stated, his point would be better made if he had said ==1 instead of != 0. – Taemyr Oct 23 '15 at 14:24
The idea is good but the example is too simple. Use something like this instead: `if (n has an odd number of divisors) return true; if (sqrt(n) is not an integer) return false;` This makes code following it dead for all positive integers, but I doubt there's a compiler on Earth that could reason that out. And this still doesn't prove theoretical impossibility, only practical impossibility: such a compiler would have to be so complex that it would be impractical for the language. – biziclop Oct 23 '15 at 14:45
@biziclop: Hey, who the hell are you? :) – biziclop Oct 25 '15 at 09:12
