Automatic Generation of Efficient Lexical
Processors Using Finite State Techniques

The practical application of the theory of
finite-state automata to automatically generate 
lexical processors is dealt with in this tutorial article
by the use of the AED RWORD system, developed 
at M. as part of the AED-1 system.  This system
accepts as input description of the multicharacter 
items or of words allowable in a language given in terms
of a subset of regular expressions. The output 
of the system is a lexical processor which reads a string
of characters and combines them into the items 
as defined by the regular expressions.  Each output
item is identified by a code number together with 
a pointer to a block of storage containing the characters
and character count in the item.  The processors 
produced by the system are based on finite-state machines.
 Each state of a "machine" corresponds to 
a unique condition in the lexical processing of a character
string.  At each state a character is read, 
and the machine changes to a new state.  At each transition
appropriate actions are taken based on the 
particular character read.  The system has been in operation
since 1966, and processors generated have 
compared favorably in speed to carefully hand-coded programs
to accomplish the same task.  Lexical processors 
for AED-O and MAD are among the many which have been
produced.  The techniques employed are independent 
of the nature of the items being evaluated.  If the
word "events" is substituted for character string, 
these processors may be described as generalized decision-making
mechanisms based upon an ordered sequence 
of events.  This allows the system to be used in a
range of applications outside the area of lexical 
processing.  However convenient these advantages may
be, speed is the most important consideration.  
In designing a system for automatic generation of a
lexical processor, the goal was a processor which 
completely eliminated backup or rereading, which was nearly
as fast as hand-coded processors, which would 
analyze the language and detect errors, and
which would be convenient and easy to use.

CACM December, 1968

Johnson, W. L.
Porter, J. H.
Ackley, S. I.
Ross, D. T.

character string, compiler, finite-state automata, finite-state
machine, lexical processor, nondeterministic 
machine, parsing, plex structure, regular expressions,sequential
machine, syntactic analysis

3.63 3.75 4.12 5.22 5.24 5.31

CA681201 JB February 21, 1978  2:19 PM

1051	4	1665
1139	4	1665
1265	4	1665
1323	4	1665
1358	4	1665
1380	4	1665
1552	4	1665
1665	4	1665
1665	4	1665
1665	4	1665
1665	4	1665
1768	4	1665
1781	4	1665
1787	4	1665
1787	4	1665
1824	4	1665
1825	4	1665
1836	4	1665
1860	4	1665
1861	4	1665
1989	4	1665
2015	4	1665
2110	4	1665
2112	4	1665
2127	4	1665
2155	4	1665
2187	4	1665
2317	4	1665
2534	4	1665
2541	4	1665
2545	4	1665
2698	4	1665
2698	4	1665
2733	4	1665
2733	4	1665
2820	4	1665
3073	4	1665
3155	4	1665
763	4	1665
1665	5	1665
1665	5	1665
1665	5	1665
1781	5	1665
378	5	1665
2746	5	1665
631	5	1665
799	5	1665
1665	6	1665
1739	6	1665
2139	6	1665
2545	6	1665
2786	6	1665