CSCI 371

Project 2

Now that you have experience with finite state machines, we will work on a  more challenging project. For this, you will accept tokens in the language described here.

The first step is to produce a finite state automata for the language's tokens.  Identify the states with integers and we can use a table (2-d array) for the finite state machine.  The row index would identify the current state and the column index would be the input.  You could use the ascii character code for the column index, but that would create many (256) columns that you would need to deal with.  Since many of the columns are going to be the same (all digits treated equally, many codes not used, etc), it would be best to have a function convert the ascii character code into an internal character category code (some categories have many ascii codes, some may only have one).  The best way to do this is to have a one dimensional array where the subscript is the ascii code and the value in the array is the category code.  This way your program will be very efficient in looking up the category code (just use the ascii code as the subscript).

Also, you will need another 2-d table for actions.  In some situations, you will want to do different things.  For example, when you have identified a token, you will want to print it out.  Sometimes when you look at the next character, you want to use it verses not using it (so that when you go to the finite state table again, you use the character again rather than the next character).  This is somewhat like epsilon transitions.  For example, with the statement y=sum+4; you don't know you are at the end of the variable name y until you get to the =.  You don't want the = to be part of the token.  Similarly with sum.  You don't know you are at the end of the variable sum until you hit the +.  You don't want the + to be part of the token.

Another action is add to token.  While parsing sum you need to build up the string 'sum' to eventually print out.