Vincent Christlein¹, Dominique Stutzmann², Anguelos Nicolaou¹, Mathias Seuret¹, and Andreas Maier¹
¹ Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
² Institut de Recherche et d’Histoire des Textes (Centre National de la Recherche Scientifique - UPR841), France
¹ {firstname.name}@fau.de; ² {firstname.name}@irht.cnrs.fr
Abstract
As part of the ICDAR2019 Historical Document Reading Challenges, this competition investigates the performance of large-scale retrieval of historical document images based on writer recognition. Similar to the popular ImageNet, we rely on a large image dataset provided by several institutions. In contrast to ImageNet, we focus on the task of image retrieval, since ground truth is easier to acquire for this task.
1 Task
This competition is in line with previous ICDAR and ICFHR competitions on writer identification. The last competition on historical writer identification [3] consisted of 720 writers contributing five samples each, which resulted in 3 600 samples. In this competition, we want to increase the number of samples significantly. Therefore, we employ a semi-automatic procedure that allows us to gather 20 000 document images from 10 000 writers: 2 500 writers with five sample pages each, and 7 500 anonymous writers with a single page each.
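The dataset composition stated above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are ours, and the number of relevant documents per query is inferred from the five-pages-per-writer figure rather than stated explicitly in the text.

```python
# Dataset composition as stated in the text (variable names are illustrative).
writers_with_five_pages = 2_500
pages_per_known_writer = 5
single_page_writers = 7_500  # anonymous writers, one page each

total_writers = writers_with_five_pages + single_page_writers
total_pages = (writers_with_five_pages * pages_per_known_writer
               + single_page_writers)

# Inferred, not stated in the text: a query taken from a five-page writer
# leaves four other pages by the same writer to be retrieved.
relevant_per_query = pages_per_known_writer - 1

print(total_writers, total_pages, relevant_per_query)
```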
The task consists of finding all documents written by a specific writer, using one document from this writer as the query. The 7 500 single-page documents are used as distractor images. Additionally, they can be used for the task of detecting whether the retrieved documents match the target writer at all. More details are given in Section 4.
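The retrieval setting can be sketched as follows. This is a minimal illustration under our own assumptions, not the competition's evaluation code: each document is assumed to be represented by a fixed-length, non-zero descriptor vector, cosine distance is an arbitrary choice of dissimilarity, and the names `cosine_distance` and `rank_documents` as well as the toy data are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_documents(query_index, descriptors):
    """Return the indices of all other documents, most similar first."""
    query = descriptors[query_index]
    others = [i for i in range(len(descriptors)) if i != query_index]
    return sorted(others, key=lambda i: cosine_distance(query, descriptors[i]))


# Toy data (made up): two pages by writer "A" and two single-page distractors.
descriptors = [
    [1.0, 0.0],  # document 0, writer A (used as query)
    [0.9, 0.1],  # document 1, writer A
    [0.0, 1.0],  # document 2, distractor
    [0.7, 0.7],  # document 3, distractor
]
writer_ids = ["A", "A", "B", "C"]

ranking = rank_documents(0, descriptors)
# True where the retrieved document was written by the query's writer.
hits = [writer_ids[i] == writer_ids[0] for i in ranking]
print(ranking, hits)
```

In this sketch the query document itself is excluded from the ranking, so a perfect system would place all remaining pages of the query's writer ahead of every distractor.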
Image retrieval, and in particular writer identification, is a relevant task for the ICDAR community. In word spotting, the challenge is to conduct efficient image retrieval on small image patches. In this competition, the amount of text is not an issue; instead, the challenge is to find relevant images despite the extreme heterogeneity of the material.
This competition is relevant not only to the ICDAR community but also to the fields of history, literary studies, and, particularly, paleography. Indeed, writer identification contributes to cultural studies by making it possible to trace the activity of individuals and the organization of groups within society. Thanks to writer identification, one can ascribe new writings to a particular author by comparison with their known autographs and trace their methods of work (annotations, drafts, preliminary versions, etc.) in order to better understand their philosophy and aims. By applying writer identification to the production of a group, such as a chancery, historians can also gain an idea of the inner organization of this group and the relationships between its members. Therefore, the outcome may have a direct impact on our knowledge of the past. Especially in the age of mass digitization, successful retrieval can assist humanists in their daily work.