Should I be concerned about the difference between identifiers and keywords when building the automaton?

I could build a general automaton and, when it accepts a string, look that string up in a table of keywords to decide whether it is an identifier or a keyword, but I am not sure if this is the right or the best way to do it.

-Update-

I am supposed to write a lexical analyzer for a language that has the following lexemes: integers, identifiers, keywords (‘if’, ‘then’, ‘else’, ‘while’), predicates (‘==’, ‘<’, ‘<=’), "=", "+", "(", ")", ";", "}".

I don't need a solution. I am just not sure whether I have understood the concept correctly. I probably haven't; I tried researching a bit and I still don't understand.

First, finite automata only accept or reject, right? So when using one, how can I know which kind of lexeme a string was accepted as? Was it accepted as a keyword, an identifier, and so on?

If I were to solve this logically, I would give my automaton a separate accepting state for each kind of lexeme, and then in the implementation I would check which accepting state it ended in. Is this how it is supposed to be done, or is an entirely different approach usually used?

If the above assumption is right, here comes the core of my question: how would I distinguish between identifiers and keywords? Do I do it in the automaton or in the implementation?

If I am not being clear, just ignore my question; I think I need to research more because I am not making sense.

Cloules
  • This might be a bit too general for SO - try asking this on [Programmers](http://programmers.stackexchange.com/help/on-topic) instead. – GoBusto Nov 11 '14 at 12:27
  • @GoBusto if this will end up as off-topic then it might better fit on Computer Science (beta) Stack Exchange - http://cs.stackexchange.com/help/on-topic – xmojmr Nov 11 '14 at 12:36
  • Your question is very abstract and unclear. What language do you mean? Will you create the automata by hand or use a parser generator? BTW: for a few conventional languages where keywords could not be used as identifiers (at the same time) I have seen this distinction being resolved using a keyword lookup table inside the [tokenizer](http://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) – xmojmr Nov 11 '14 at 12:39
  • @xmojmr It is just a simple language. I will create the automaton by hand, as it is an assignment. In my case keywords cannot be used as identifiers, so I think I will implement the table lookup. Thank you. – Cloules Nov 11 '14 at 12:50
  • You can take a look at the kind of tokenizer I was thinking about in https://github.com/Microsoft/TypeScript/blob/v1.1.0.1/src/services/syntax/syntaxKind.ts and https://github.com/Microsoft/TypeScript/blob/v1.1.0.1/src/compiler/scanner.ts#L33. But for your assignment you are probably supposed to solve almost everything using an automaton, so you are probably supposed to recognize distinct keywords directly without any non-automaton stuff and maybe even without using any parser generators. But your question is still very unclear. Can you edit the question and publish the grammar you were given? – xmojmr Nov 11 '14 at 13:27
  • @xmojmr Sorry for not being clear. It is just that I am not sure if I even got the concepts right. The language is really simple: it has just four keywords, "if", "then", "else", and "while". But I still can't see how I can create an automaton that distinguishes between identifiers and these keywords without it looking like spaghetti. I will edit the question now. – Cloules Nov 12 '14 at 19:54

2 Answers

Your approach sounds fine: once the automaton accepts a word, check it against the table of keywords. If it matches one of them, it's a keyword; if not, it is an identifier.
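A minimal sketch of that lookup, in Python. The function name `scan_word`, the token kind strings, and the rule that an identifier is a letter followed by letters or digits are assumptions made for illustration; only the four keywords come from the question.

```python
# Keywords listed in the question.
KEYWORDS = {"if", "then", "else", "while"}

def scan_word(text, start):
    """Run a generic identifier automaton from `start`.

    Returns (kind, lexeme, end) or None if no word starts here.
    Assumption: an identifier is a letter followed by letters or digits.
    """
    if start >= len(text) or not text[start].isalpha():
        return None
    end = start + 1
    # Stay in the single accepting state while letters or digits keep coming.
    while end < len(text) and text[end].isalnum():
        end += 1
    lexeme = text[start:end]
    # The automaton only says "this is a word"; the table decides which kind.
    kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
    return kind, lexeme, end
```

Because the scan takes the longest run of letters and digits before consulting the table, a word such as `whilex` is looked up as a whole and comes back as an identifier rather than as the keyword `while` followed by `x`.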

Agis

You can do it either way: either have a rule per keyword and a catch-all rule for identifiers, or have just a rule for identifiers and a lookup table of keywords that you consult when the identifier rule fires. The first approach is quicker but requires larger DFA/NFA tables.
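The first approach can be sketched as a DFA whose states remember which keyword prefix has been read so far, again in Python. The state names, `step`, `accepting_kind`, and `classify` are hypothetical, and the sketch assumes identifiers are a letter followed by letters or digits. The transition function is computed here instead of being written out as a table; a tabulated version would need one row per prefix state, which is where the larger tables come from.

```python
KEYWORDS = ("if", "then", "else", "while")

# One DFA state per keyword prefix ("i", "if", "t", "th", ...), plus
# "START" and a catch-all "ID" state for ordinary identifiers.
PREFIXES = {kw[:i] for kw in KEYWORDS for i in range(1, len(kw) + 1)}

def step(state, ch):
    """Transition function; returns None when there is no transition."""
    if not ch.isalnum():
        return None
    if state == "START":
        if not ch.isalpha():
            return None          # a word must start with a letter
        return ch if ch in PREFIXES else "ID"
    if state != "ID" and state + ch in PREFIXES:
        return state + ch        # still following some keyword
    return "ID"                  # fell off every keyword path

def accepting_kind(state):
    """Each accepting state is tagged with the token kind it stands for."""
    if state == "START":
        return None              # nothing read yet: not accepting
    return "KEYWORD" if state in KEYWORDS else "IDENTIFIER"

def classify(word):
    state = "START"
    for ch in word:
        state = step(state, ch)
        if state is None:
            return None
    return accepting_kind(state)
```

With the four keywords from the question this gives 15 prefix states in addition to `START` and `ID`, compared with a single accepting state for the lookup-table variant. Checking which accepting state the automaton stopped in is what tells a keyword such as `if` apart from identifiers such as `i` or `iff`.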

user207421