A Stochastic Approach to the Grammatical Coding of English

A computer program is described which will
assign each word in an English text to its form 
class or part of speech.  The program operates at relatively
high speed in only a limited storage space. 
About half of the word-events in a corpus are identified
through the use of a small dictionary of function 
words and frequently occurring lexical words.  Some suffix
tests and logical-decision rules are employed 
to code additional words.  Finally, the remaining words
are assigned to one class or another on the basis 
of the most probable form classes to occur within the already
identified contexts.  The conditional probabilities 
used as a basis for this coding were empirically derived
from a separate hand-coded corpus.  In preliminary
trials, the accuracy of the coder was 91% to 93%, with
obvious ways of improving the algorithm being 
suggested by an analysis of the results.
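
The three coding stages described above (dictionary lookup, suffix tests,
then assignment by contextual probability) can be illustrated with a minimal
Python sketch.  It is not the authors' program: the names DICTIONARY,
SUFFIX_RULES, COND_PROB, DEFAULT_TAG and code_text are hypothetical, every
table entry is an invented toy value, and the context is simplified to the
form class of the preceding word only, which may be narrower than the
contexts the original coder used.

```python
# Toy tables; every entry is an illustrative assumption, not data from the paper.
DICTIONARY = {"the": "DET", "a": "DET", "of": "PREP", "in": "PREP", "is": "VERB"}

# (suffix, form class) pairs, tried in order.
SUFFIX_RULES = [("ly", "ADV"), ("tion", "NOUN"), ("ous", "ADJ")]

# P(form class | form class of preceding word), assumed to have been
# estimated from a separate hand-coded corpus.
COND_PROB = {
    "START": {"NOUN": 0.6, "DET": 0.4},
    "DET": {"NOUN": 0.7, "ADJ": 0.3},
    "VERB": {"VERB": 0.5, "ADV": 0.3, "NOUN": 0.2},
    "NOUN": {"VERB": 0.6, "NOUN": 0.4},
}
DEFAULT_TAG = "NOUN"  # fallback when the preceding class has no table entry


def code_text(words):
    """Assign a form class to each word in three passes."""
    # Pass 1: small dictionary of function words and frequent lexical words.
    tags = [DICTIONARY.get(w.lower()) for w in words]

    # Pass 2: suffix tests on words the dictionary did not cover.
    for i, w in enumerate(words):
        if tags[i] is None:
            lw = w.lower()
            for suffix, tag in SUFFIX_RULES:
                if len(lw) > len(suffix) and lw.endswith(suffix):
                    tags[i] = tag
                    break

    # Pass 3: remaining words get the most probable class given the
    # (already coded) class of the preceding word.
    for i in range(len(words)):
        if tags[i] is None:
            prev = tags[i - 1] if i > 0 else "START"
            dist = COND_PROB.get(prev)
            tags[i] = max(dist, key=dist.get) if dist else DEFAULT_TAG
    return tags


print(code_text("The zebra is running quickly".split()))
```

With these toy tables, "The" and "is" are coded by the dictionary, "quickly"
by the suffix test, and "zebra" and "running" by the contextual pass.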

CACM June, 1965

Stolz, W. S.
Tannenbaum, H.
Carstensen, F. V.

CA650620 JB March 6, 1978  9:35 PM

1235	5	1235