77

After discovering the Boost preprocessor's capabilities I found myself wondering: Is the C99 preprocessor Turing complete?

If not, what does it lack to not qualify?

Brian Tompsett - 汤莱恩
  • 5,195
  • 62
  • 50
  • 120
Anycorn
  • 46,748
  • 41
  • 153
  • 250
  • 8
    The missing thing in CPP for turing completeness is essentially recursion, since it can't loop without it (and really has a fairly limited conditional since you can't conditionally expand portions of a macro) – Spudd86 Jun 29 '10 at 16:18

4 Answers4

147

Well macros don't directly expand recursively, but there are ways we can work around this.

The easiest way of doing recursion in the preprocessor is to use a deferred expression. A deferred expression is an expression that requires more scans to fully expand:

#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__

#define A() 123
A() // Expands to 123
DEFER(A)() // Expands to A () because it requires one more scan to fully expand
EXPAND(DEFER(A)()) // Expands to 123, because the EXPAND macro forces another scan

Why is this important? Well when a macro is scanned and expanding, it creates a disabling context. This disabling context will cause a token, that refers to the currently expanding macro, to be painted blue. Thus, once its painted blue, the macro will no longer expand. This is why macros don't expand recursively. However, a disabling context only exists during one scan, so by deferring an expansion we can prevent our macros from becoming painted blue. We will just need to apply more scans to the expression. We can do that using this EVAL macro:

#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) EVAL4(EVAL4(EVAL4(__VA_ARGS__)))
#define EVAL4(...) EVAL5(EVAL5(EVAL5(__VA_ARGS__)))
#define EVAL5(...) __VA_ARGS__

Now if we want to implement a REPEAT macro using recursion, first we need some increment and decrement operators to handle state:

#define CAT(a, ...) PRIMITIVE_CAT(a, __VA_ARGS__)
#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

#define INC(x) PRIMITIVE_CAT(INC_, x)
#define INC_0 1
#define INC_1 2
#define INC_2 3
#define INC_3 4
#define INC_4 5
#define INC_5 6
#define INC_6 7
#define INC_7 8
#define INC_8 9
#define INC_9 9

#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7
#define DEC_9 8

Next we need a few more macros to do logic:

#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)

#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 ~, 1,

#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0

#define BOOL(x) COMPL(NOT(x))

#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t

#define IF(c) IIF(BOOL(c))

#define EAT(...)
#define EXPAND(...) __VA_ARGS__
#define WHEN(c) IF(c)(EXPAND, EAT)

Now with all these macros we can write a recursive REPEAT macro. We use a REPEAT_INDIRECT macro to refer back to itself recursively. This prevents the macro from being painted blue, since it will expand on a different scan(and using a different disabling context). We use OBSTRUCT here, which will defer the expansion twice. This is necessary because the conditional WHEN applies one scan already.

#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

//An example of using this macro
#define M(i, _) i
EVAL(REPEAT(8, M, ~)) // 0 1 2 3 4 5 6 7

Now this example is limited to 10 repeats, because of limitations of the counter. Just like a repeat counter in a computer would be limited by the finite memory. Multiple repeat counters could be combined together to workaround this limitation, just like in a computer. Furthermore, we could define a FOREVER macro:

#define FOREVER() \
    ? \
    DEFER(FOREVER_INDIRECT) () ()
#define FOREVER_INDIRECT() FOREVER
// Outputs question marks forever
EVAL(FOREVER())

This will try to output ? forever, but will eventually stop because there are no more scans being applied. Now the question is, if we gave it an infinite number of scans would this algorithm complete? This is known as the halting problem, and Turing completeness is necessary to prove the undecidability of the halting problem. So as you can see, the preprocessor can act as a Turing complete language, but instead of being limited to the finite memory of a computer it is instead limited by the finite number of scans applied.

stgatilov
  • 4,973
  • 26
  • 44
Paul Fultz II
  • 16,112
  • 11
  • 51
  • 58
  • 9
    ...Wow. Very impressive! Here I thought that the C99 preprocessor was definetely not turing complete.. +1 for thinking out of the box – Earlz Aug 11 '12 at 02:36
  • 1
    +1 Quite a creative way to show that preprocessor *can* scan symbols on a tape ;-) (Thanks to the mod for accepting the flagging to remove the wiki!). – P.P Feb 12 '14 at 21:55
  • 1
    I like how this uses O(log(N)) macros to recurse N times. This is better than Boost Preprocessor which uses O(N) macros. – qbt937 Jan 03 '16 at 00:16
  • 1
    @qbt937 The `EVAL` macro applies over 250 scans to the algorithm even it is not needed. Whereas Boost.PP only uses the number of scans that is needed by the algorithm, so Boost.PP is much more efficient. – Paul Fultz II Jan 14 '16 at 00:10
  • @Paul I guess that it is a sort of a macro source size vs speed trade off. This does make me wonder if there is a way to get both log(N) macro size and Boost.PP's reduced number of scans. Maybe it would be possible to quit at the first power of two number of scans after the algorithm finishes, so it would only have up to N extra scans. – qbt937 Jan 14 '16 at 06:33
  • Paul, without checking the details of how you did the ``for(;;) {}`` loop, I can say that you cannot do an infinite loop in cpp, simply because the maximum number of iterations depend on the number of identifiers inside the input file, whatever you do. You should create new symbols inside the eval(), but it is not possible to create an infinite number of symbols... – alinsoar May 27 '16 at 18:27
  • I am confused as to why this qualifies as Turing completeness. It seems we are guaranteed that the program terminates because there are only a finite number of scans. If you could force infinite scans, then it would be Turing complete, but you can't force infinite scans. – Justin Raymond Jul 31 '16 at 13:59
  • The algorithm is not guaranteed to terminate, however, the program is guaranteed to always terminate just like recursion is always guaranteed to terminate in C due to stack overflow. The difference comes from the "machinery", which is why the PP **acts** as a Turing complete language whereas C **is** a Turing complete language. – Paul Fultz II Aug 01 '16 at 11:08
  • 1
    This does *not* establish Turing completeness. The essential limitation is the need to fix the number of scans being done. – reinierpost Aug 20 '17 at 10:46
  • 6
    The limited number of scans is analogous is *not* analogous to a computer having limited memory. The essence of Turing completeness is that the specification of the computations itself carries *no* limitation, even when it is actually made to run on a machine that does have limited capacity. – reinierpost Aug 20 '17 at 10:48
  • I tested it. The FOREVER() is not infinite, for me it displays only 365 `?`. This function is not mu-recusive operator but sigma-recursive. The only way to try and make it turing-complete is by defining an infinite number of macros, as I explained in my answer. C preprocessor does not allow such a feature. – alinsoar May 24 '19 at 04:08
  • @alinsoar The [Ackerman](https://en.wikipedia.org/wiki/Ackermann_function) function is considered mu-recursive and not sigma-recursive and can be implemented in the preprocessor: https://gist.github.com/pfultz2/80391e8b18abf3225da2242dcc570cec – Paul Fultz II Jun 04 '19 at 01:42
  • 1
    And interestingly enough, the Ackerman function in the PP will usually hit a stack overflow before running out of scans. – Paul Fultz II Jun 04 '19 at 01:47
  • 1
    @Paul Fultz II It is because you know a priori about this function that it is going to finish, it was proved. So you need limited time to finish. The mu recursive functions , like searching for a solution of an arbitrary equation, cannot be computed by cpp. I said that cpp cannot compute the class of mu recursive, not some particular example of function about you know already lots of things. – alinsoar Jun 04 '19 at 02:07
37

Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.

From the description of the afore-linked project:

the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)

The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:

  • 63 nesting levels of parenthesized expressions within a full expression
  • 63 significant initial characters in an internal identifier or a macro name
  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit
  • 4095 characters in a logical source line

The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.

This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.

bta
  • 39,855
  • 5
  • 67
  • 92
  • 12
    This answer is incorrect, as the 77 point answer below it shows extensively. Please unaccept it and accept the more useful answer, thank you. – nightpool Oct 12 '15 at 14:25
  • If you mean the 115 point answer by Paul Fultz II below: it confirms this answer. – reinierpost Aug 20 '17 at 10:56
  • 2
    The limit is in the language itself, but not because of the spec, but because we must write the scans to evaluate the algorithm in the language itself, which there is no mechanism to apply an infinite number of scans. – Paul Fultz II Oct 18 '17 at 23:26
  • To add to the lack of recursiveness, that seems to be (under)specified by the standard. [§6.10.3.5](http://port70.net/~nsz/c/c11/n1570.html#6.10.3.4p4) paragraph 4 shows an example of two macros calling each other and states: *"There are cases where it is not clear whether a replacement is nested or not [...]. Strictly conforming programs are not permitted to depend on such unspecified behavior. "* – Luiz Martins Mar 13 '21 at 15:51
9

To be Turing complete, one needs to define recursion that may never finish -- one calls them mu-recursive operator.

To define such an operator one needs an infinite space of defined identifiers (in case that each identifier is evaluated a finite number of times), as one cannot know a priori an upper limit of time in which the result is found. With a finite number of operators inside the code one needs to be able to check an unlimited number of possibilities.

So this class of functions cannot be computed by the C preprocessor because in C preprocessor there is a limited number of defined macros and each one is expanded only once.

The C preprocessor uses the Dave Prosser's algorithm (written by Dave Prosser for the WG14 team in 1984). In this algorithm a macro is painted blue in the moment of the first expansion; a recursive call (or mutual recursive call) does not expand it, as it has already been painted blue in the moment when the first expansion starts. So with a finite number of preprocessing lines it is impossible to make infinite calls of functions(macros), which characterizes the mu-recursive operators.

The C preprocessor can compute only sigma-recursive operators .

For details see the course of computation of Marvin L. Minsky (1967) -- Computation: Finite and Infinite Machines, Prentice-Hall, Inc. Englewood Cliffs, N.J. etc.

alinsoar
  • 13,197
  • 4
  • 45
  • 63
  • The Ackerman function, which is mu-recursive only, can be implemented in the PP, so the C preprocessor is not limited to sigma-recursive operators only: https://gist.github.com/pfultz2/80391e8b18abf3225da2242dcc570cec – Paul Fultz II Jun 04 '19 at 01:44
  • 2
    As I said in the other comment, it's a difference between what you did, to check a large (but limited) number of inputs and to search an infinte number. The Ackerman function is known to finish, this is why cpp can find its value. It is impossible to compute _any_ mu operator and make cpp turing complete. – alinsoar Jun 04 '19 at 06:47
  • @PaulFultzII You need to modify your code in order to check for a larger space of solutions, while a mu-operator is fixed and will check for an infinite space (as large as the hardware resources allow). – alinsoar Jun 04 '19 at 06:53
  • The algorithm(ie the `A` macro) does not need to be modified, only the evaluation needs to be updated to add more scans. – Paul Fultz II Jun 04 '19 at 17:32
5

It's Turing complete within limits (as are all computers since they don't have infinite RAM). Check out the kinds of things you can do with Boost Preprocessor.

Edit in response to question edits:

The main limitation on Boost is the maximum macro expansion depth which is compiler-specific. Also, the macros that implement recursion (FOR..., ENUM..., etc.) aren't truly recursive, they just appear that way thanks to a bunch of near-identical macros. In the big picture, this limitation is no different than having a maximum stack size in an actually recursive language.

The only two things that are really necessary for limited Turing-completeness (Turing-compatibility?) are iteration/recursion (equivalent constructs) and conditional branching.

Cogwheel
  • 20,820
  • 4
  • 42
  • 67
  • hi. That was actually what prompted my question, I have been using preprocessor for a while. – Anycorn Jun 28 '10 at 22:54
  • digging around in BOOST_PP's source code is the best way to figure out how it's done. – Cogwheel Jun 28 '10 at 22:55
  • 14
    I *believe* that macros can't do recursion. Boost just seems to simulate them by having hardcoded macros named like `macro0`, `macro1` .. `macro255`. I'm not sure whether that counts as "turing complete". The preprocessor has an explicit rule that forbids going from `macro255` back to `macro0` :( It seems like trying to build a verifyer for fully parenthesized expressions using a finite state automaton. It can work for a limited number of parentheses, but that's not a general verifyer anymore. I've no clue about boost.pp inner workings though, so i could likely be wrong on this. – Johannes Schaub - litb Jun 28 '10 at 23:23
  • @Johannes Schaub: yes you're right. I had confused that with vararg when i wrote this initially. I updated the answer. – Cogwheel Jun 28 '10 at 23:30
  • 5
    @Johannes: The chaos preprocessor doesn't have any macros like that. Check it out here: http://sourceforge.net/projects/chaos-pp/ – Joe D Aug 24 '10 at 21:01
  • Or you can take a look into [M4 preprocessor](http://en.wikipedia.org/wiki/M4_(computer_language)) which supports recursion :-) – Agnius Vasiliauskas Sep 23 '11 at 12:36
  • 1
    To the reader: be aware that *Turing complete within limits* means *not Turing complete*. – reinierpost Aug 20 '17 at 10:49
  • @reinierpost by that logic, nothing is turing complete, because there are always finite resources – Retr0id Apr 09 '21 at 17:20
  • @Retr0id No. Turing completeness is a property of computational systems. Such systems describe an abstraction of reality. Programming languages do that. When I write `while :; do echo 1; done` in my Linux shell, I've specified an *unbounded* loop. It is *actually* unbounded, as far as that programming language is concerned. The fact that all such programs must run in an environment that is finite and will at some point stop any such loop, is irrelevant. We're not discussing that environment, we're only discussing the language itself. – reinierpost Apr 09 '21 at 18:38