ICDAR2019 Competition on Image Retrieval for Historical Handwritten Documents

Vincent Christlein1, Dominique Stutzmann2, Anguelos Nicolaou1, Mathias Seuret1, and Andreas Maier1
1 Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
2 Institut de Recherche et d’Histoire des Textes (Centre National de la Recherche Scientifique – UPR841), France
[1] {firstname.name}@fau.de; [2] {firstname.name}@irht.cnrs.fr


As part of the ICDAR2019 Historical Document Reading Challenges, this competition investigates the performance of large-scale retrieval of historical document images based on writer recognition. Similar to the popular ImageNet, we rely on a large image dataset provided by several institutions. In contrast to ImageNet, we focus on the task of image retrieval, since ground truth is easier to acquire.

1 Task

This competition is in line with previous ICDAR and ICFHR competitions on writer identification. The last competition on historical writer identification [3] consisted of 720 writers contributing five samples each, which resulted in 3,600 samples. In this competition, we want to increase the number of samples significantly. Therefore, we employ a semi-automatic procedure allowing us to gather 20,000 document images from 10,000 writers: 2,500 writers with five sample pages each, and 7,500 pages from anonymous writers, one page per writer.
The task consists of finding all documents corresponding to a specific writer using a document from this writer as query. The 7,500 pages from single-page writers serve as distractor images. Additionally, they can be used for the task of detecting whether the retrieved documents match the target writer at all. More details are given in Section 4.
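The dataset composition stated above can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are chosen here for illustration only.

```python
# Figures taken from the task description; names are illustrative.
WRITERS_WITH_FIVE_PAGES = 2_500
PAGES_PER_KNOWN_WRITER = 5
SINGLE_PAGE_WRITERS = 7_500  # distractors, one page each


def dataset_composition():
    """Return (total_pages, total_writers, matchable_pages)."""
    matchable_pages = WRITERS_WITH_FIVE_PAGES * PAGES_PER_KNOWN_WRITER
    total_pages = matchable_pages + SINGLE_PAGE_WRITERS
    total_writers = WRITERS_WITH_FIVE_PAGES + SINGLE_PAGE_WRITERS
    return total_pages, total_writers, matchable_pages


total_pages, total_writers, matchable_pages = dataset_composition()
print(total_pages, total_writers, matchable_pages)  # 20000 10000 12500
```

Under these figures, a query from one of the 2,500 five-page writers has four relevant documents among the 19,999 remaining images, while a query from a single-page writer has none.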
Image retrieval, and in particular writer identification, is a relevant task for the ICDAR community. In word spotting, the challenge is to conduct efficient image retrieval on small image patches. While the amount of text is not an issue in this competition, the challenge lies in finding relevant images despite the extreme heterogeneity of the data material.
This competition is not only relevant for the ICDAR community but also for the fields of history, literary studies, and, particularly, paleography. Indeed, writer identification contributes to cultural studies by making it possible to trace the activity of individuals and the organization of groups within society. Thanks to writer identification, one can ascribe new writings to a particular author in connection with their known autographs and trace their methods of work (annotations, drafts, preliminary versions, etc.) in order to gain a better understanding of their philosophy and aims. By applying writer identification to the production of a group, such as a chancery, historians can also gain an idea of the inner organization and the relationships between individuals in this group. Therefore, the outcome may have a direct impact on our knowledge of the past. Especially in the age of mass digitization, successful retrieval can assist humanists in their daily work.

2 Participants and Timeline

Six teams participated in the last ICDAR2017 historical writer identification competition [3] and four in the ICDAR2017 Competition on the Classification of Medieval Handwriting in Latin Script [2]. We expect the same number of participants, but hope that the more general direction towards visual image retrieval attracts more people than the pure writer identification community.


Jan 15 Homepage running, registration possible, link to ICDAR17 HistoricalWI Test data set for training (https://zenodo.org/record/854353)
Feb 28 Providing additional validation samples
Apr 15 Providing evaluation test set; it is not disclosed earlier so that a manual image search is made more difficult. Additionally, we plan to provide a baseline system based on the work of Nicolaou et al. [4]. A stronger baseline based on the work of Christlein et al. [1] will be provided in the final competition paper to allow a comparison against a current state-of-the-art system.
Apr 25 Deadline for submitting the results (data and system)

3 Data

For the training data, we suggest using the “ICDAR17 Historical-WI” test dataset. We will not forbid training on additional data, such as the data of the ICFHR2016 or ICDAR2017 “Competition on the Classification of Medieval Handwriting in Latin Script”.
The test data set contains 20,000 images, that is: images from 2,500 writers represented by 5 images each, and 7,500 images chosen at random. The main focus of this new corpus is the writers of books in the European Middle Ages, especially from the 9th to the 15th century CE. The larger part of the corpus is anonymous. Indeed, few of the writers of this period signed their products and fewer are known by their names. For this part, given that paleographers’ attributions across books may be disputed, the organizers posit that consecutive pages in a homogeneous part of a book represent one particular writer. A smaller part of the corpus is composed of script samples from books that are believed or demonstrably known to have been written by the same individual, such as literary autographs.
Concerning this subset and for the sake of homogeneity in the competition, the corpus also comprises five consecutive pages of each of the selected autograph books. In the test data set, most images are taken from IIIF-compliant repositories allowing the use of images for scientific and teaching purposes. The organizers crop each selected image to a randomly chosen text region of varying size, so that participants cannot base the image retrieval on page layout or digitization protocols (color scale, ruler, etc.).
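The cropping step could look roughly like the sketch below. It is not the organizers' procedure: how text regions are located is not stated, so the region bounding box, the minimum crop size, and the function name `random_text_crop` are assumptions made for illustration.

```python
import random


def random_text_crop(region, min_size, rng):
    """Pick a random crop box inside a text region.

    region:   (left, top, right, bottom) of a text area in pixels
              (hypothetical input, e.g. from a layout analysis step).
    min_size: (min_width, min_height) of the crop in pixels (assumed).
    rng:      a random.Random instance, so the choice is reproducible.
    Returns (left, top, right, bottom) of the crop.
    """
    left, top, right, bottom = region
    min_w, min_h = min_size
    region_w, region_h = right - left, bottom - top
    if region_w < min_w or region_h < min_h:
        raise ValueError("text region is smaller than the minimum crop size")
    # Random crop size between the minimum and the full region.
    crop_w = rng.randint(min_w, region_w)
    crop_h = rng.randint(min_h, region_h)
    # Random position such that the crop stays inside the region.
    crop_left = rng.randint(left, right - crop_w)
    crop_top = rng.randint(top, bottom - crop_h)
    return crop_left, crop_top, crop_left + crop_w, crop_top + crop_h


rng = random.Random(2019)
print(random_text_crop((100, 200, 1100, 1700), (400, 300), rng))
```

Because both the size and the position vary, crops from the same page do not share a fixed layout, which is the stated purpose of the step.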
In both the training and the test data sets, the images are in different resolutions and formats (jpg, tif, grey-level, color, etc.). After the competition, the images will be published on Zenodo and linked from the CLAMM competition website http://clamm.irht.cnrs.fr, and sent to the IAPR-TC11.

4 Evaluation

The evaluation will be done using a leave-one-image-out cross-validation approach. This means that every image of the test set will be used as a query for which the other test images will have to be ranked. Additionally, the participants have to indicate whether the writer of the query also appears in the remaining test set.
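The leave-one-image-out protocol can be sketched as follows, assuming that each image has already been mapped to a feature vector and that candidates are ranked by Euclidean distance. Both assumptions, as well as the function names, are illustrative; participants are free to rank in any way.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leave_one_out_rankings(embeddings):
    """Rank all other images for every query image.

    embeddings: dict mapping image id -> feature vector (list of floats).
    Returns a dict mapping query id -> list of the other image ids,
    sorted by increasing distance to the query.
    """
    rankings = {}
    for query_id, query_vec in embeddings.items():
        candidates = [
            (euclidean(query_vec, vec), other_id)
            for other_id, vec in embeddings.items()
            if other_id != query_id  # the query itself is left out
        ]
        candidates.sort()  # by distance, ties broken by image id
        rankings[query_id] = [other_id for _, other_id in candidates]
    return rankings


embeddings = {
    "a1": [0.0, 0.0],
    "a2": [0.1, 0.0],
    "b1": [1.0, 1.0],
    "c1": [5.0, 5.0],
}
rankings = leave_one_out_rankings(embeddings)
print(rankings["a1"])  # ['a2', 'b1', 'c1']
```

This brute-force version compares every pair of images; for 20,000 images that amounts to roughly 400 million distance computations, so a vectorized implementation would be preferable in practice.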
The competition will contain two tracks: one that aims to ease participation for all researchers engaged in writer identification, and a second one that aims at an in-depth analysis of the behavior of writer identification systems.

4.1 Data evaluation track

In this track, participants will be provided with a test set. They will have to perform a traditional leave-one-out evaluation and will be asked to report a ranking of all the test set samples for each query document. The participants should also report for each query an estimate of how many documents in the database match the query. The results will be evaluated using the metrics described in Section 4.3. The organizers of the competition will make the toolkit for performance evaluation publicly available on the 28th of February 2019, along with the release of the validation dataset.
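One simple way to produce the per-query estimate of matching documents is to count the candidates whose distance to the query falls below a threshold. This is only an illustrative sketch: the thresholding strategy, the threshold value, and the function names are assumptions, and the threshold would have to be tuned, for example on the validation samples.

```python
def estimate_match_count(distances, threshold):
    """Estimate how many candidates were written by the query's writer.

    distances: distances from the query to all other test images.
    threshold: maximum distance still counted as a match (assumed value,
               to be tuned on validation data).
    """
    return sum(1 for d in distances if d <= threshold)


def is_matchable(distances, threshold):
    """A query is predicted matchable if at least one candidate matches."""
    return estimate_match_count(distances, threshold) > 0


print(estimate_match_count([0.1, 0.2, 0.9, 1.5], threshold=0.5))  # 2
print(is_matchable([0.9, 1.5], threshold=0.5))  # False
```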

4.2 System evaluation track

This track will work by having the evaluation system interface directly with writer identification systems deployed on the machines of participants. In this track, participants will have to provide their method in the form of a deterministic image-to-vector embedding method. We will provide a program that turns the embedding method into a service, which is then queried by our evaluation system.
Participants who want to take part in this track but cannot use their own system in the described manner will be given access to an account on our server to deploy their system, although only limited computational resources will be available.
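The contract of a deterministic image-to-vector embedding can be illustrated with a toy example. The grey-value histogram below is a placeholder and not a writer identification method, and the service protocol itself is not specified in this proposal, so the sketch only covers the embedding function and a determinism check. All names are hypothetical.

```python
import random


def toy_embedding(pixels, bins=8):
    """Map an image to a fixed-length vector (placeholder method).

    pixels: flat list of grey values in the range 0..255.
    Returns an L1-normalized grey-value histogram with `bins` entries.
    """
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = sum(hist) or 1  # avoid division by zero for empty images
    return [h / total for h in hist]


def is_deterministic(embed, image, repeats=3):
    """Check that repeated calls on the same image give the same vector."""
    first = embed(image)
    return all(embed(image) == first for _ in range(repeats - 1))


image = [0, 0, 255, 255]
print(toy_embedding(image, bins=2))            # [0.5, 0.5]
print(is_deterministic(toy_embedding, image))  # True
# A method that injects fresh random noise on every call fails the check.
print(is_deterministic(lambda img: [random.random()], image))  # False
```

Determinism matters here because the evaluation system may query the same image several times, for instance in original and distorted form, and needs comparable vectors across calls.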
This track will allow a thorough testing of the systems and their sensitivities to various distortions that reflect realistic problems. This methodology has several benefits:

  • This approach will minimize the effort of participants to deploy their system since they will be deploying on their own machine or a machine which they control remotely.
  • People who cannot share their systems for legal or other reasons can participate.
  • People who need special hardware such as GPUs can participate with their own hardware.
  • Even though hardware differs, we can obtain realistic estimates of the time needed by each system.

4.3 Error Metrics

Our dataset consists of matchable queries, i.e. queries that have items in the retrieval list written by the same scribe, and unmatchable queries (“not-in” queries), i.e. queries of a single scribe where no other related image can be found in the retrieval list. The evaluation will focus on two tasks: retrieval and relevance. The retrieval task will assess the performance on finding relevant documents for matchable queries. That means we compute the mean average precision (mAP) for those queries whose associated writer contributed more than one page. The relevance measure is computed by means of accuracy. For this purpose, the participants will hand in a list stating for each item whether it is a matchable item, i.e. whether or not there are other samples in the retrieval list from the same writer. The winner of the competition will be determined according to the sum of the ranks in both tasks.
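The two measures can be sketched as follows, using the common definition of average precision (the mean of the precision values at the ranks of the relevant items). The data layout and function names are assumptions made for illustration; the evaluation toolkit of the organizers may differ in detail.

```python
from collections import Counter


def average_precision(ranking, relevant):
    """Average precision of one ranking given a non-empty relevant set."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, writer_of):
    """mAP over matchable queries only (writers with more than one page)."""
    ap_values = []
    for query_id, ranking in rankings.items():
        relevant = {i for i in writer_of
                    if i != query_id and writer_of[i] == writer_of[query_id]}
        if relevant:  # unmatchable queries are skipped
            ap_values.append(average_precision(ranking, relevant))
    return sum(ap_values) / len(ap_values) if ap_values else 0.0


def relevance_accuracy(predicted, writer_of):
    """Fraction of items whose matchable / not-matchable flag is correct."""
    pages_per_writer = Counter(writer_of.values())
    correct = sum(predicted[i] == (pages_per_writer[writer_of[i]] > 1)
                  for i in writer_of)
    return correct / len(writer_of)


writer_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
rankings = {
    "a1": ["a2", "b1", "c1"],  # relevant item at rank 1 -> AP = 1.0
    "a2": ["b1", "a1", "c1"],  # relevant item at rank 2 -> AP = 0.5
    "b1": ["a2", "a1", "c1"],  # unmatchable, ignored for mAP
    "c1": ["b1", "a2", "a1"],  # unmatchable, ignored for mAP
}
predicted = {"a1": True, "a2": True, "b1": False, "c1": True}
print(mean_average_precision(rankings, writer_of))  # 0.75
print(relevance_accuracy(predicted, writer_of))     # 0.75
```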
Several other metrics will be provided for reference, but they will have no effect on the ranking for selecting the winning method.
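Selecting the winner by the sum of ranks over the retrieval and relevance tasks can be sketched as below. The team names and scores are invented for illustration, and the tie handling (tied teams share the better rank, remaining ties in the final standing are ordered by name) is an assumption, since the proposal does not specify it.

```python
def rank_teams(scores):
    """Rank teams by score, higher is better; rank 1 is the best.

    Tied teams share the better rank (assumed tie handling).
    """
    ordered = sorted(scores.values(), reverse=True)
    return {team: ordered.index(score) + 1 for team, score in scores.items()}


def final_standing(map_scores, accuracy_scores):
    """Order teams by the sum of their ranks in both tasks (lower is better)."""
    map_rank = rank_teams(map_scores)
    acc_rank = rank_teams(accuracy_scores)
    rank_sum = {team: map_rank[team] + acc_rank[team] for team in map_scores}
    return sorted(rank_sum.items(), key=lambda kv: (kv[1], kv[0]))


# Invented example scores for three hypothetical teams.
map_scores = {"team_a": 0.90, "team_b": 0.85, "team_c": 0.70}
accuracy_scores = {"team_a": 0.80, "team_b": 0.95, "team_c": 0.90}
print(final_standing(map_scores, accuracy_scores))
# [('team_b', 3), ('team_a', 4), ('team_c', 5)]
```

In this example, the team with the best mAP does not win, because its relevance accuracy is the lowest of the three.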

5 Organizers

The following persons will organize this competition:

Vincent Christlein

Contact: vincent.christlein@fau.de, www5.cs.fau.de/~christlein
Short-bio: He received his Diploma degree in computer science in July 2012 from the Friedrich-Alexander University of Erlangen-Nürnberg, Germany. During his studies, he worked on the detection of copy-move forgeries in the field of image forensics. Currently, he is pursuing his PhD studies in the analysis of handwriting with a focus on writer identification and writer retrieval. His research interests lie in the field of computer vision and pattern recognition, particularly in handwriting analysis and historical document analysis.
Experience: He participated in various competitions on script type classification and writer recognition between 2014 and 2018. Last year, he co-organized the “ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)”.

Dominique Stutzmann

Contact: dominique.stutzmann@irht.cnrs.fr, www.irht.cnrs.fr/annuaire/stutzmann-dominique
Short-bio: After degrees in Classics, History and German studies at the Sorbonne, he studied at the Ecole Nationale des Chartes (2002), received an MLIS and worked at the Staatsbibliothek zu Berlin and the Bibliothèque nationale de France. He received his PhD in history in 2009 from the Université Paris-1 Panthéon Sorbonne, France, with a thesis on scribal practices in medieval communities. He is a senior researcher at the Institut de Recherche et d’Histoire des Textes (CNRS) and Principal Investigator of several research projects in the field of digital humanities and palaeography.
Experience: He co-organized two sessions of the “Competition on the Classification of Medieval Handwritings in Latin Script”, including script type classification and dating of handwriting samples at ICFHR2016 and ICDAR2017.

Anguelos Nicolaou

Contact: anguelos.nikolaou@fau.de, www5.cs.fau.de/~nicolaou
Short-bio: Anguelos Nicolaou obtained his master's degree in computer science in 2014 from the University of Bern, Switzerland. He is a PhD student in the Computer Vision Center at the Autonomous University of Barcelona, studying robust reading systems. He is currently working on a project involving information retrieval from historical document images from the Czech Republic.
Experience: He co-organized the “ICDAR2015 Robust Reading Competition”.

Mathias Seuret

Contact: mathias.seuret@fau.de, www5.cs.fau.de/~seuret
Short-bio: He obtained his master's degree in computer science at the University of Fribourg, Switzerland, in 2013. He is currently a PhD student in the Document, Image and Voice Analysis (Diva) research group of the same university, with a main focus on convolutional neural networks applied to layout analysis of historical document images. He recently started working, in parallel to his studies, on font analysis, classification and clustering of printed historical books at the Friedrich-Alexander University of Erlangen-Nürnberg, Germany.
Experience: He co-organized the “ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts”.

Andreas Maier

Contact: andreas.maier@fau.de, www5.cs.fau.de/~maier
Short-bio: Andreas Maier studied Computer Science, graduated in 2005, and received his PhD in 2009. From 2005 to 2009, he worked at the Pattern Recognition Lab at the Computer Science Department of the University of Erlangen-Nuremberg. His major research subject was medical signal processing in speech data. In this period, he developed the first online speech intelligibility assessment tool – PEAKS – that has been used to analyze over 4,000 patients and control subjects so far. From 2009 to 2010, he worked on flat-panel C-arm CT as a post-doctoral fellow at the Radiological Sciences Laboratory in the Department of Radiology at Stanford University. From 2011 to 2012, he was with Siemens Healthcare as innovation project manager and was responsible for reconstruction topics in the Angiography and X-ray business unit. In 2012, he returned to the University of Erlangen-Nuremberg as head of the Medical Reconstruction Group at the Pattern Recognition Lab. In 2015, he became professor and head of the Pattern Recognition Lab.
Experience: He has supervised and instructed PhD students in historical document processing. Highlights include Vincent Christlein’s writer identification and Daniel Stromer’s scanning of historical books without opening them.


References

1. Christlein, V., Gropp, M., Fiel, S., Maier, A.: Unsupervised Feature Learning for Writer Identification and Writer Retrieval. In: ICDAR (2017)
2. Cloppet, F., Eglin, V., Helias-Baron, M., Kieu, C., Vincent, N., Stutzmann, D.: ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). pp. 1371–1376. Kyoto (Nov 2017)
3. Fiel, S., Kleber, F., Diem, M., Christlein, V., Louloudis, G., Stamatopoulos, N., Gatos, B.: ICDAR 2017 Competition on Historical Document Writer Identification (Historical-WI). In: ICDAR (2017)
4. Nicolaou, A., Bagdanov, A.D., Liwicki, M., Karatzas, D.: Sparsely Sampled Binary Patterns for Writer Identification. In: ICDAR. Nancy, France (2015)