3

I have a basic function that parses a Lisp expression. It uses a while loop, but as an exercise I'd like to convert it into a recursive function, which I'm finding tricky. Here is what I have so far:

def build_ast(self, tokens=None):
    # The next two lines are example input, to make this self-contained
    LEFT_PAREN, RIGHT_PAREN = '(', ')'
    tokens = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
    while RIGHT_PAREN in tokens:
        right_idx = tokens.index(RIGHT_PAREN)
        left_idx = right_idx - tokens[:right_idx][::-1].index(LEFT_PAREN)-1
        extraction = [tokens[left_idx+1:right_idx],]
        tokens = tokens[:left_idx] + extraction + tokens[right_idx+1:]
    ast = tokens
    return ast

And so it would parse something like this:

(+ 2 (* 3 4))

Into this:

[['+', '2', ['*', '3', '4']]]

What would be an example of how I could make the above function recursive? So far I've started with something like:

def build_ast(self, ast=None):
    if ast is None: ast=self.lexed_tokens
    if RIGHT_PAREN not in ast:
        return ast
    else:
        right_idx = ast.index(RIGHT_PAREN)
        left_idx = right_idx - ast[:right_idx][::-1].index(LEFT_PAREN)-1
        ast = ast[:left_idx] + [ast[left_idx+1:right_idx],] + ast[right_idx+1:]
        return self.build_ast(ast)

But it just comes across as a bit strange (as if the recursion isn't helpful here). What would be a better way to construct this? Or perhaps a better/more elegant algorithm to build this simple ast?

David542
  • 1
    See my Stack Overflow answer on how to build recursive descent parsers: https://stackoverflow.com/a/2336769/120163 – Ira Baxter Mar 31 '21 at 03:38

3 Answers

3

You can use a recursive generator function:

def _build_ast(tokens):
    LEFT_PAREN, RIGHT_PAREN = '(', ')'
    # Consume the iterator until it is empty or a right paren occurs
    while (n := next(tokens, None)) is not None and n != RIGHT_PAREN:
        # Recursively call _build_ast if we encounter a left paren
        yield n if n != LEFT_PAREN else list(_build_ast(tokens))


def build_ast(tokens):
    # Pass tokens as an iterator to _build_ast
    return list(_build_ast(iter(tokens)))

tokens = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
print(build_ast(tokens))

Output:

[['+', '2', ['*', '3', '4']]]
Ajax1234
  • wow, that is so concise and cool. Could you provide a link so I can read up on that? Is `n:=` new in Python? I've never seen it before. Also, could you add a few comments to the code to show what it's doing? – David542 Mar 28 '21 at 04:59
  • 2
    @David542 Please see my recent edit, as I added several comments to clarify the process. `n:=` is an assignment expression, introduced in Python 3.8. See more here: https://stackoverflow.com/questions/50297704/syntax-and-assignment-expressions-what-and-why – Ajax1234 Mar 28 '21 at 15:29
  • thanks for the update! One question: I posted an answer of my own at the bottom, what's the difference between (1) using the `next` instead of `pop`; and then (2) using `yield` instead of just `return` in this case? – David542 Mar 31 '21 at 01:24
  • @David542 The recursive paradigm is the same, and both methods rely on sharing one reference to the same token sequence, so that each recursive call consumes tokens from where its caller left off. `yield`ing, however, is in my view a cleaner way to handle cases where the goal is to collect individual results, produced in an iterative process, into a container. – Ajax1234 Mar 31 '21 at 01:49
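To make the `yield` versus `return` comparison above concrete, here is a minimal sketch of the same recursion written both ways. The names `parse_with_return` and `parse_with_yield` are hypothetical, chosen only for illustration; both assume they are handed an iterator over the tokens.

```python
LEFT_PAREN, RIGHT_PAREN = '(', ')'

def parse_with_return(it):
    # Hypothetical helper: collect results in an explicit list, then return it
    result = []
    while (tok := next(it, None)) is not None and tok != RIGHT_PAREN:
        result.append(parse_with_return(it) if tok == LEFT_PAREN else tok)
    return result

def parse_with_yield(it):
    # Hypothetical helper: yield each result; the caller picks the container
    while (tok := next(it, None)) is not None and tok != RIGHT_PAREN:
        yield list(parse_with_yield(it)) if tok == LEFT_PAREN else tok

tokens = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
via_return = parse_with_return(iter(tokens))
via_yield = list(parse_with_yield(iter(tokens)))
print(via_return == via_yield)  # True
```

Both versions build the same nested list; the generator version simply leaves the `list(...)` call to the caller instead of managing `result` itself.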
2

Similar to the other answer, I would pass to the recursive function the token that will end the current expression. This usually is the closing parenthesis, but for the very first call, it will be the end-of-input (None).

def build_ast(tokens):
    LEFT_PAREN, RIGHT_PAREN = '(', ')'
    it = iter(tokens)  # Iterator over the input
    
    # Recursive (generator) function that processes tokens until the close
    #   of the expression, i.e. until the given token is encountered
    def recur(until=RIGHT_PAREN):
        # Keep processing tokens until closing token is encountered
        while (token := next(it, None)) != until:
            # If parenthesis opens, recur and convert to list
            #    otherwise just yield the token as-is
            yield list(recur()) if token == LEFT_PAREN else token

    # Main recursive call: process until end of input (i.e. until None)
    return list(recur(None))

Call as:

ast = build_ast(['(', '+', '2', '(', '*', '3', '4', ')', ')'])
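One caveat with the "pass the terminating token" idea: as written, the function assumes balanced input. If the input ends while a parenthesis is still open, the nested call keeps receiving `None` from `next(it, None)`, which never equals `')'`, so it does not terminate. Below is a rough sketch of a stricter variant under that same design. The name `build_ast_strict` and the choice of `SyntaxError` are my own assumptions, and it returns lists directly rather than using a generator to keep the checks easy to follow.

```python
def build_ast_strict(tokens):
    LEFT_PAREN, RIGHT_PAREN = '(', ')'
    it = iter(tokens)

    def recur(until):
        result = []
        while (token := next(it, None)) != until:
            if token is None:
                # Input ended while an expression was still open
                raise SyntaxError("missing ')'")
            if token == RIGHT_PAREN:
                # Only reachable at top level: nothing is open to close
                raise SyntaxError("unexpected ')'")
            result.append(recur(RIGHT_PAREN) if token == LEFT_PAREN else token)
        return result

    # Top level ends at end of input (None), nested levels end at ')'
    return recur(None)

print(build_ast_strict(['(', '+', '2', '(', '*', '3', '4', ')', ')']))
# [['+', '2', ['*', '3', '4']]]
```

With this variant, `build_ast_strict(['(', '+', '2'])` raises instead of looping, and a stray `')'` at the top level is reported rather than kept as a token.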
trincot
  • is `next(it, None)` a way to do `tokens.pop()` without the IndexError when it hits the end? – David542 Mar 31 '21 at 01:08
  • 2
    Not really. `pop` (without argument) removes the last element from the end of the list, while `next(it)` reads the next value from an iterator (in this case created with `iter(tokens)`), from left to right. It is more like `i += 1; tokens[i]`. The `None` argument will indeed avoid an error when there is no more next value. – trincot Mar 31 '21 at 05:49
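A small sketch of the difference described in the comment above, using a made-up token list. Nothing here is specific to the parser; it only contrasts `list.pop()` with `next()` on an iterator.

```python
tokens = ['(', '+', '2', ')']

# list.pop() with no argument removes and returns the LAST element
stack = list(tokens)      # work on a copy
last = stack.pop()        # ')'

# next() on an iterator reads from the FRONT and leaves the list unchanged
it = iter(tokens)
first = next(it)          # '('
second = next(it)         # '+'

# With a default, next() returns it instead of raising StopIteration
sentinel = next(iter([]), None)   # None

print(last, first, second, sentinel, len(tokens))
```

The iterator keeps its own position, which is what lets each recursive call pick up where its caller stopped.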
0

The other two approaches are great; here's one more:

# Helper function: pop from left or return default
pops = lambda l, d=None: l.pop(0) if l else d

def read_from_tokens(tokens):
    L = []
    while (token := pops(tokens, ')')) != ')':
        L.append(token if token!='(' else read_from_tokens(tokens))
    return L
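Two things to be aware of with this version: `l.pop(0)` shifts every remaining element, so it is O(n) per token, and it empties the list the caller passed in. A possible variation, sketched here with `collections.deque` and a hypothetical function name `read_from_deque`, avoids both by working on a deque copy:

```python
from collections import deque

def read_from_deque(tokens):
    # `tokens` is assumed to be a deque; popleft() is O(1), unlike list.pop(0)
    result = []
    while tokens and (token := tokens.popleft()) != ')':
        result.append(read_from_deque(tokens) if token == '(' else token)
    return result

original = ['(', '+', '2', '(', '*', '3', '4', ')', ')']
ast = read_from_deque(deque(original))  # deque(...) copies, so `original` survives
print(ast)  # [['+', '2', ['*', '3', '4']]]
```

For token lists this small the difference is negligible; it only starts to matter for long inputs.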
David542