Preliminary Investigation of Techniques for Automated Reading of Unformatted Text

Methods for converting unstructured printed material into computer code are experimentally investigated. An operator-controlled mode, depending on human demarcation of the various regions of the page for guiding the scanner, is implemented by means of a joystick and a CRT display. This mode, for which some performance figures are obtained, is thought to be suitable for processing very complicated material, such as technical journals. For simpler material, for instance the "claims" sections of patents, and in applications where the utmost accuracy is not necessary, an unsupervised mode is advocated. Here, the textual portions of the page are located during a rapid prescan by a rudimentary form of frequency analysis. These areas are then rescanned at a higher resolution suitable for character recognition. Error rates of the order of 0.1 percent are obtained in a simple problem involving photographs of telephone company meter boards. Other matters related to the design of a general purpose page reader, such as the segmentation of printed text, the possibility of time-sharing the scanner, interactive man-machine operation, and the facsimile reproduction of illustrations, are discussed.

CACM July, 1968
Nagy, G.
pattern recognition, character recognition, text reading, information retrieval, unformatted text, operator-controlled reader, online reader, text-image discrimination, reading machine
3.63 3.79 3.89 6.29 6.35
CA680703 JB February 22, 1978 2:52 PM
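
The unsupervised mode described in the abstract (a coarse prescan that locates text by a rudimentary frequency analysis, followed by a finer rescan of those areas) can be illustrated with a minimal sketch. The abstract does not say how the frequency analysis is done, so everything here is an assumption: the sketch counts black/white transitions along each scan line as a crude frequency measure, and the names `transition_count`, `find_text_bands`, `rescan_window`, and the threshold `min_transitions` are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical low-resolution prescan: 1 = black, 0 = white.
Bitmap = List[List[int]]


def transition_count(row: List[int]) -> int:
    """Count black/white changes along one scan line (a crude frequency measure)."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def find_text_bands(page: Bitmap, min_transitions: int = 4) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) spans whose scan lines look like text.

    A scan line is treated as text-like when it has at least
    `min_transitions` black/white changes. The threshold is an assumed
    tuning value, not one reported in the paper.
    """
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, row in enumerate(page):
        is_text = transition_count(row) >= min_transitions
        if is_text and start is None:
            start = y  # a text-like band begins here
        elif not is_text and start is not None:
            bands.append((start, y - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(page) - 1))  # band runs to the bottom of the page
    return bands


def rescan_window(band: Tuple[int, int], scale: int) -> Tuple[int, int]:
    """Map a prescan band to row coordinates of a scan `scale` times finer."""
    first, last = band
    return first * scale, (last + 1) * scale - 1
```

In this sketch, blank rows and solid black rows (standing in for margins and illustrations) both produce few transitions and are skipped, while rows with many alternations are grouped into bands. Each band would then be passed to `rescan_window` to obtain the row range for the higher-resolution scan used for character recognition.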