OCR in Chronicling America
The searchable text in Chronicling America is generated through optical character recognition (OCR) and is presented without correction.
What is OCR?
Optical character recognition (OCR) is a fully automated process that converts the visual image of numbers and letters into computer-readable numbers and letters. Computer software can then search the OCR-generated text for words, phrases, numbers, or other characters. However, OCR is not 100 percent accurate. Errors are especially likely when the original item has extraneous markings on the page, unusual text styles, or very small fonts, and the resulting searchable text will contain errors that cannot be corrected by automated means.
Although errors in the process are unavoidable, OCR is still a powerful tool for making text-based items accessible to searching. For example, important concept words often appear more than once within an article. Therefore, if OCR misreads one instance of a key word in a passage, but correctly reads the second instance, the passage will still be found in a full-text search.
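The effect of repeated key words can be illustrated with a minimal Python sketch. It is not how Chronicling America's search is implemented. The sample passages, the misreading "rai1road", and the helper names tokenize and passage_matches are all hypothetical, chosen only to show why a second, correctly read instance keeps a passage findable.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def passage_matches(ocr_text, query_word):
    """Return True if the query word appears at least once as a whole token."""
    return query_word.lower() in tokenize(ocr_text)


# Hypothetical OCR output: the first "railroad" was misread as "rai1road",
# but the second instance was read correctly.
ocr_passage = (
    "The new rai1road will reach the city by spring. "
    "Officials said the railroad will carry both freight and passengers."
)

# Hypothetical OCR output in which the only instance was misread.
ocr_passage_single = "The new rai1road will reach the city by spring."

found_with_repeat = passage_matches(ocr_passage, "railroad")
found_without_repeat = passage_matches(ocr_passage_single, "railroad")

print(found_with_repeat)     # the correctly read second instance matches
print(found_without_repeat)  # no correctly read instance to match
```

In this sketch the first passage is found because one of its two instances survived OCR intact, while the second passage is missed because its only instance was misread.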

