
There are a few things about the relationship between the AST and the symbol table that I don't understand.

I currently have an AST implemented in C# which has nodes for variable declarations (these contain information about the name, type, source position, a possible constant value as an expression node, etc.).

Now I want to fill a symbol table (using the visitor pattern on my AST), but my question is: are the "symbols" new classes, for example VariableSymbol, or does the symbol table directly store the VariableDeclarationNode from the AST?

If the symbols are new classes, then what stores the evaluated expression value for constant variables: the VariableDeclarationNode, the VariableSymbol, or something else?

(I have seen some interpreter examples that store all variable values, including constants, in an additional hash table, but I'm working on a source-to-source compiler and not an interpreter, so I'm not sure where to store the evaluated constants in this case. Sorry, I know this is really several questions.)

R1PFake
  • If you are working on a source-to-source compiler, why would you even want to store the evaluated value of constant expressions? It's about translating to the syntax of the target language, not interpreting or maintaining any runtime state. Could you give an example of what your code does, or is supposed to do, once you have the declaration/initialization node of a variable? Storing a reference to the node that declares a symbol in the object you use as a basis for target code emission is not right or wrong per se; it depends on the architecture you have in mind. – Cee McSharpface Jun 03 '18 at 17:03
  • My current architecture is a scanner and a parser that generate the AST. My goal is to convert the AST to C code, but only if the code is "valid", for example with no two variables sharing the same name. I saw some blogs/tutorials that describe building a scope/symbol table out of the AST to do the semantic checks. The examples I found both use the visitor pattern to visit all variable declarations and add them to the scope/symbol table, but some store the AST node (VariableDeclarationNode in this example) and others create a new class, for example VariableSymbol. – R1PFake Jun 03 '18 at 18:48
  • That's why I'm not sure which solution is better: adding the AST nodes directly to the scope/table, or creating new classes for the symbols. You are right about the constant expressions. Sadly, I need to evaluate them for a somewhat "stupid" legacy reason: multiple variables with the same name are not allowed, but there is a special rule that if both variables are constants and have the same value, there should be no error message (and only one variable is used). That's why I have to evaluate and store the value of constant variables for this special legacy case. – R1PFake Jun 03 '18 at 18:52
  • I wanted to add that this is currently for a source-to-source compiler, but the AST might be used for other things in the future because we may add an interpreter later. That's why I want to make sure that I'm using a good architecture, because this is the first time I'm doing anything with an AST or a compiler. – R1PFake Jun 03 '18 at 18:56
  • I see. Still, this is probably a) a matter of opinion or b) driven by design considerations not sufficiently clear from your post: if the information in the AST node is *sufficient for the task*, then you're good with just storing references in the scope tree/table. If you interpret from the syntax tree instead of just emitting code, then you need more sophisticated data structures, where the existence of a reference to the original AST nodes is a secondary concern (continued) – Cee McSharpface Jun 03 '18 at 19:02
  • I did both and both work. Not keeping references to "primitive" AST nodes at stages beyond lexing and parsing is a cleaner approach. It may also depend on which library you use. Please show code, otherwise the question will get closed as off-topic sooner or later. – Cee McSharpface Jun 03 '18 at 19:03
  • We created the scanner/parser without a library (following blog and book tutorials). By the way, thanks for your comments so far! I think you answered my main question. Could you post your previous comment (where you said that both are possible and that it depends on the task) as an answer so I can accept it? I have one last off-topic question: after your comments I think I will separate them and create additional symbol classes. Do you think it would be dirty to store the evaluated constant values (for the special case) in the symbol classes, or should I create an additional table for these? – R1PFake Jun 03 '18 at 19:19

1 Answer


Are the "symbols" new classes, for example VariableSymbol, or does the symbol table directly store the VariableDeclarationNode from the AST?

If the information in the AST node is sufficient for the task, then you're good with just storing references to the nodes in the scope tree/table. If you interpret from the syntax tree instead of just emitting code, then you need more sophisticated data structures, where the existence of a reference to the original AST nodes is a secondary concern. We've seen and done both, and both work. Not keeping references to "primitive" AST nodes at stages beyond lexing and parsing is the cleaner approach.
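To make the "separate symbol classes" option concrete, here is a minimal sketch. The class shapes are assumptions for illustration only: `VariableDeclarationNode` is reduced to a name and a type name, and `Scope`, `TryDeclare` and `Lookup` are hypothetical names, not taken from the question's code base.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, stripped-down AST node; a real one would also carry
// the source position and the initializer expression.
public sealed class VariableDeclarationNode
{
    public string Name { get; }
    public string TypeName { get; }

    public VariableDeclarationNode(string name, string typeName)
    {
        Name = name;
        TypeName = typeName;
    }
}

// The symbol is its own class and only points back to its declaration.
public sealed class VariableSymbol
{
    public string Name { get; }
    public string TypeName { get; }
    public VariableDeclarationNode Declaration { get; }

    public VariableSymbol(VariableDeclarationNode declaration)
    {
        Name = declaration.Name;
        TypeName = declaration.TypeName;
        Declaration = declaration;
    }
}

public sealed class Scope
{
    private readonly Dictionary<string, VariableSymbol> _symbols =
        new Dictionary<string, VariableSymbol>();

    public Scope Parent { get; }

    public Scope(Scope parent = null)
    {
        Parent = parent;
    }

    // Returns false when the name is already declared in this scope.
    public bool TryDeclare(VariableSymbol symbol)
    {
        if (_symbols.ContainsKey(symbol.Name)) return false;
        _symbols.Add(symbol.Name, symbol);
        return true;
    }

    // Searches this scope, then the enclosing ones; null when not found.
    public VariableSymbol Lookup(string name)
    {
        for (Scope s = this; s != null; s = s.Parent)
        {
            if (s._symbols.TryGetValue(name, out VariableSymbol found)) return found;
        }
        return null;
    }
}
```

A visitor over the AST would call `TryDeclare(new VariableSymbol(node))` for each declaration node it visits and report a duplicate-name error when the call returns false. Later stages can then work with `VariableSymbol` and only follow `Declaration` when they need something like the source position.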

[Would it] be dirty to store the evaluated constant values (for the special case) in the symbol classes, or should I create an additional table for these?

That really depends, too... If you envision the constant value as an inherent property of the declaration, store it in your symbol descriptor class:

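class Symbol : ISymbol {
    // AST node that declares this symbol.
    public ASTNode DeclaringNode { get; set; }
    public SymbolType RuntimeType { get; set; }
    // True when the symbol is declared as a constant.
    public bool InitializeAsConstant { get; set; }
    // Evaluated initializer; only meaningful when InitializeAsConstant is true.
    public RuntimeValue ConstantValue { get; set; }

    ...
}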
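
With the constant value stored on the symbol, the legacy rule from the comments (a duplicate name is tolerated only when both declarations are constants with the same value) becomes a check at declaration time. The following is a rough sketch under assumptions: constant values are modelled as `long`, and `SymbolEntry`, `DeclareResult` and `LegacySymbolTable` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class SymbolEntry
{
    public string Name { get; }
    public bool IsConstant { get; }
    // Evaluated value; only meaningful when IsConstant is true.
    // Assumption: integer constants only.
    public long ConstantValue { get; }

    public SymbolEntry(string name, bool isConstant, long constantValue = 0)
    {
        Name = name;
        IsConstant = isConstant;
        ConstantValue = constantValue;
    }
}

public enum DeclareResult
{
    Added,
    MergedWithExistingConstant,
    DuplicateError
}

public sealed class LegacySymbolTable
{
    private readonly Dictionary<string, SymbolEntry> _entries =
        new Dictionary<string, SymbolEntry>();

    public DeclareResult Declare(SymbolEntry candidate)
    {
        if (!_entries.TryGetValue(candidate.Name, out SymbolEntry existing))
        {
            _entries.Add(candidate.Name, candidate);
            return DeclareResult.Added;
        }

        // Legacy rule: same name is acceptable only for two constants
        // with equal values; the first declaration is the one kept.
        bool sameConstant = existing.IsConstant
            && candidate.IsConstant
            && existing.ConstantValue == candidate.ConstantValue;

        return sameConstant
            ? DeclareResult.MergedWithExistingConstant
            : DeclareResult.DuplicateError;
    }
}
```

Keeping the value on the symbol means the duplicate check needs only one lookup. A separate value table would work too, but it would have to be kept in sync with the symbol table.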

If you instead keep the initializer expression (the rvalue) unevaluated, so that you can replicate the declaration verbatim in the target language, then treat the constant like any other variable until the end of the process:

/* fantasy source language */
Constant $$IAMCONSTANT :=> /03\ MUL /02\ KTHXBYE

/* target language */
const int IAMCONSTANT = 3 * 2;

/* as opposed to precomputed in compilation stage 1 */
const int IAMCONSTANT = 6;

The first form is easier in the source-to-source case because you may get away without computing the values of expressions in the compiler at all.
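
The two output forms can be sketched side by side. This is an illustration under assumptions, not a full expression model: only integer literals and multiplication are covered (so no parenthesization is needed), and `Expr`, `IntLiteral`, `Multiply` and `ConstantEmitter` are hypothetical names.

```csharp
using System;
using System.Globalization;

public abstract class Expr
{
    // Re-emits the expression in the target language's syntax.
    public abstract string EmitC();

    // Computes the value at compile time.
    public abstract long Evaluate();
}

public sealed class IntLiteral : Expr
{
    private readonly long _value;

    public IntLiteral(long value) { _value = value; }

    public override string EmitC() => _value.ToString(CultureInfo.InvariantCulture);

    public override long Evaluate() => _value;
}

public sealed class Multiply : Expr
{
    private readonly Expr _left;
    private readonly Expr _right;

    public Multiply(Expr left, Expr right)
    {
        _left = left;
        _right = right;
    }

    public override string EmitC() => _left.EmitC() + " * " + _right.EmitC();

    // checked: an overflow throws instead of silently wrapping.
    public override long Evaluate() => checked(_left.Evaluate() * _right.Evaluate());
}

public static class ConstantEmitter
{
    // Keeps the initializer as written; the compiler never evaluates it.
    public static string EmitVerbatim(string name, Expr initializer) =>
        "const int " + name + " = " + initializer.EmitC() + ";";

    // Emits the precomputed value instead of the expression.
    public static string EmitFolded(string name, Expr initializer) =>
        "const int " + name + " = "
        + initializer.Evaluate().ToString(CultureInfo.InvariantCulture) + ";";
}
```

Here `EmitVerbatim` only needs `EmitC`, which the code generator requires anyway, whereas `EmitFolded` depends on an `Evaluate` implementation for every expression kind. If constant values must be compared during semantic checks, `Evaluate` is needed regardless, and the emitter can still use the verbatim form.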

Cee McSharpface