id, :=, (, ), the comma, num, and binop are tokens, pulled
out of the source text during lexical analysis.
When we write a parser, we create code and data
structures that determine whether the program is in the
correct syntactic form. (Essentially, one data structure
for each production.)
An Abstract Syntax Tree (AST) drops the tokens in the
parse tree that are no longer needed once the structure is
known (such as :=, the comma, and the parentheses).
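To make the "one data structure per production" idea
concrete, here is a minimal sketch of an AST for the
statement a := (5 + 3). The class names (Id, Num, BinOp,
Assign) and the evaluate helper are assumptions made for
illustration; they are not the data structures handed out
for Project 1.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node classes, one per kind of production.
@dataclass
class Id:
    name: str


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    left: "Exp"
    op: str
    right: "Exp"


Exp = Union[Id, Num, BinOp]


@dataclass
class Assign:
    target: str
    value: Exp


# Source text:  a := (5 + 3)
# The tokens := ( ) guided the parser, but the tree does not store them.
tree = Assign("a", BinOp(Num(5), "+", Num(3)))


def evaluate(e: Exp, env: dict) -> int:
    """Walk an expression tree and compute its value (assumed ops: + - *)."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Id):
        return env[e.name]
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    return ops[e.op](evaluate(e.left, env), evaluate(e.right, env))
```

Calling evaluate(tree.value, {}) walks the three nodes of
the expression; the parentheses and := never appear because
the shape of the tree already records what they meant.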
Project 1 starts at the point where the program is already
represented as an AST. The data structures you download
represent these productions in abstract form. This is the
point at which we can begin semantic analysis of the
code. In Project 1 we are really just practicing working
with the tree form and moving information around.
Lexical Analysis
What it is: breaking the program into a stream of
tokens, much as we pick out the distinct words when we
hear or read language.
This lets the parser deal with the grammar without
worrying about whitespace, comments, or anything else
that plays no role in the syntax.
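As a rough illustration of turning text into a token
stream, the sketch below uses Python's re module. The token
names, the identifier and number patterns, and the
'#'-to-end-of-line comment syntax are all assumptions chosen
for the example, not part of any particular language
definition.

```python
import re

# Assumed token set, based on the tokens id := ( ) , num binop.
# The '#' comment syntax is an assumption for illustration only.
TOKEN_SPEC = [
    ("SKIP",   r"[ \t\r\n]+|#[^\n]*"),      # whitespace and comments
    ("ASSIGN", r":="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA",  r","),
    ("NUM",    r"[0-9]+"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("BINOP",  r"[+\-*/]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping SKIP matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For the input "a := (5 + 3)  # add", tokenize yields seven
tokens (ID, ASSIGN, LPAREN, NUM, BINOP, NUM, RPAREN). The
spaces and the comment are consumed but never reach the
parser.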
What are tokens? Make a list...
Task: Give regular expressions or finite automata (FAs) for
identifiers
integers
floats
if
else
while
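One possible set of answers, written as Python regular
expressions. The exact definitions are assumptions:
identifiers here start with a letter or underscore, and
floats require digits on both sides of the point with no
exponent. Other languages draw these lines differently.

```python
import re

# Assumed definitions; each pattern must match the whole lexeme.
PATTERNS = {
    "identifier": r"[A-Za-z_][A-Za-z0-9_]*",
    "integer":    r"[0-9]+",
    "float":      r"[0-9]+\.[0-9]+",
    "if":         r"if",
    "else":       r"else",
    "while":      r"while",
}


def kinds(lexeme):
    """Return the names of every pattern that matches the whole lexeme."""
    return [name for name, pat in PATTERNS.items() if re.fullmatch(pat, lexeme)]
```

Note that kinds("while") reports both "identifier" and
"while": every keyword also fits the identifier pattern, so
the patterns alone do not say which token a lexeme is.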
How do the FAs get implemented?
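One common way to implement an FA is a transition table
plus a loop that follows it one character at a time. The
sketch below does this for an assumed identifier automaton
(a letter or underscore, then letters, underscores, or
digits). The state numbers and the helper names char_class
and accepts are made up for the example.

```python
def char_class(ch):
    """Collapse characters into the few classes the table cares about."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# States: 0 = start, 1 = inside an identifier (accepting).
# A missing entry means the automaton has no move and rejects.
TRANSITIONS = {
    (0, "letter"): 1,
    (1, "letter"): 1,
    (1, "digit"): 1,
}
ACCEPTING = {1}


def accepts(text):
    """Run the automaton over text; True if it ends in an accepting state."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING
```

The loop itself never changes; only the table does. That is
why scanner generators can emit tables and reuse a single
driver.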
How can we combine them to make a single scanner?
Is "ifta" one token or two?
Can we teach an FA to decide which one? Do we need to
add info (and what info)?
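One widely used answer (the convention in lex-style scanner
generators, though not necessarily the one intended here) is
to add two pieces of information: take the longest match at
each position, and break ties by rule order. The sketch
below assumes the rule list and token names shown; they are
illustrative only.

```python
import re

# Order matters: when two rules match the same length, the earlier one wins.
RULES = [
    ("IF",    re.compile(r"if")),
    ("ELSE",  re.compile(r"else")),
    ("WHILE", re.compile(r"while")),
    ("FLOAT", re.compile(r"[0-9]+\.[0-9]+")),
    ("INT",   re.compile(r"[0-9]+")),
    ("ID",    re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]
WHITESPACE = re.compile(r"\s+")


def scan(text):
    """Longest-match scanner: returns a list of (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_kind, best_end = None, pos
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule is kept.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

Under these two rules, scan("ifta") gives a single ID
token, because the identifier match (4 characters) is longer
than the keyword match (2). scan("if ta") gives IF followed
by ID, because both rules match "if" at the same length and
IF is listed first.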