IMPACT was a European project set up to make historical texts more accessible online. It produced several lexicons intended for use in OCR, OCR post-correction and improved search in texts.

IMPACT lexicons

During the IMPACT project, which ran from 2008 to 2012, various lexicons were compiled. These lexicons were intended for OCR, OCR post-correction and better search facilities in texts. The computational lexicon of common nouns, compiled to make searching easier, has been included in GiGaNT (a computational lexicon in the making, covering the Dutch language from the 6th century until now).

INT Historical Word List

The INT Historical Word List consists of two lists, each containing around 500,000 historical word forms, to be used for OCR and OCR post-correction, roughly for the period 1550-1970. One list contains regular words, the other contains names.

For a demonstration of the use of the lexicon in OCR, see this paper.
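As a rough illustration of how a word list can support OCR post-correction, the sketch below replaces tokens that are not in the list with their closest listed form. It is not the method used in the IMPACT project or in the paper mentioned above. The sample words, the function names and the similarity cutoff of 0.75 are assumptions made for this example, and the matching relies on Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical sample; the real lists hold around 500,000 word forms each.
WORD_LIST = ["ende", "heeft", "mensch", "vrouwe", "zyn"]


def correct_token(token, lexicon, cutoff=0.75):
    """Return the token itself if it is listed, otherwise the closest
    listed word form, otherwise the token unchanged."""
    lowered = token.lower()
    if lowered in lexicon:
        return token
    # cutoff is an assumed similarity threshold, not a project setting.
    matches = get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_line(line, lexicon):
    """Apply correct_token to every whitespace-separated token in a line."""
    return " ".join(correct_token(tok, lexicon) for tok in line.split())


if __name__ == "__main__":
    print(correct_line("de menscb heeft", WORD_LIST))
```

In this sketch a token with no sufficiently similar list entry is left untouched, so short function words that are missing from the sample list are not forced onto an unrelated entry.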


The INT IMPACT NE Lexicon is a computational lexicon of proper nouns based on sources from the period 1750-1945.

The lexicon contains names of persons, places and organisations. Place names and organisations are linked to a contemporary Dutch lemma and, if relevant, to an alternative name.

Personal names are handled differently: each has a lemma form equal to its form in the original material, and variants of the same personal name are linked automatically so that they can be grouped.
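The entry structure described above can be pictured with a small sketch. The field names, the `NameEntry` class and the sample entries are hypothetical and do not reflect the actual file format of the lexicon. The automatic linking of personal-name variants is not modelled here, because the text does not describe how it is done.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameEntry:
    word_form: str                          # form as found in the source
    name_type: str                          # "person", "place" or "organisation"
    lemma: str                              # lemma the form is linked to
    alternative_name: Optional[str] = None  # only where relevant


# Invented sample entries, for illustration only.
SAMPLE = [
    NameEntry("Amsteldam", "place", "Amsterdam"),
    NameEntry("Amsterdam", "place", "Amsterdam"),
    NameEntry("'s Hage", "place", "Den Haag", "'s-Gravenhage"),
    # For personal names the lemma equals the form in the source.
    NameEntry("Jan Jansz", "person", "Jan Jansz"),
]


def group_by_lemma(entries):
    """Collect the word forms that share a name type and a lemma."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.name_type, entry.lemma)].append(entry.word_form)
    return dict(groups)


if __name__ == "__main__":
    for key, forms in group_by_lemma(SAMPLE).items():
        print(key, forms)
```

Grouping on the pair of name type and lemma keeps a place and a person with the same spelling apart, which matters for a lexicon that mixes persons, places and organisations.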


