The Library of Congress

Chronicling America: Historic American Newspapers

Search America's historic newspaper pages from 1836 to 1922, or use the U.S. Newspaper Directory to find information about American newspapers published between 1690 and the present. Chronicling America is sponsored jointly by the National Endowment for the Humanities and the Library of Congress.

Pages Available: 4,645,220

What is OCR?

Optical character recognition (OCR) is a fully automated process that converts the visual image of numbers and letters into computer-readable numbers and letters. Computer software can then search the OCR-generated text for words, phrases, numbers, or other characters. However, OCR is not 100 percent accurate. Errors are especially likely when the original item has extraneous markings on the page, unusual text styles, or very small fonts, and the searchable text OCR generates will contain errors that cannot be corrected by automated means.

Although errors in the process are unavoidable, OCR is still a powerful tool for making text-based items searchable. For example, important concept words often appear more than once within an article. Therefore, if OCR misreads one instance of a key word in a passage but correctly reads a second instance, the passage will still be found in a full-text search.
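
The effect of repeated key words can be illustrated with a small sketch. This is not the search software used by Chronicling America; the function name `find_passages` and the sample OCR text are hypothetical, and the search is a plain whole-word match chosen only to show the idea.

```python
import re


def find_passages(passages, term):
    """Return the passages whose OCR text contains `term` as a whole word.

    Hypothetical helper for illustration: a case-insensitive whole-word match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [p for p in passages if pattern.search(p)]


# Made-up OCR output: in the first passage, the first "railroad" was
# misread as "rai1road" (digit 1 in place of the letter l), but the
# second instance was read correctly.
ocr_passages = [
    "The new rai1road will reach the county seat by spring. "
    "Directors of the railroad met on Tuesday.",
    "Wheat prices rose sharply at the close of the market.",
]

matches = find_passages(ocr_passages, "railroad")
for passage in matches:
    print(passage)
```

Here the first passage is still returned because its second instance of "railroad" was read correctly. A passage whose only instance of the word was misread would not be found by this kind of exact search.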