Creating, Managing and Archiving Textual Corpora in Underresourced Languages

28 to 30 August 2024


Organised by
Research Data Management and Multilingual DH DARIAH Working Groups

This workshop is organised by the Research Data Management and the Multilingual DH DARIAH Working Groups.

If you are interested in participating remotely, please send an email to […]. No on-site participation is possible.


Day 1 (corpus building) – 28.08.2024, afternoon

Francesco Gelati and colleague (Universität Hamburg): Welcome Greetings

Alíz Horváth (Eötvös Loránd University Budapest): Opening Speech. What is Corpus/Data/Workflow in a multilingual context? Why a workflow for non-English sources? Rationale behind the workshop

Metadata, lexicography

Péter Király (GWDG, Göttingen): Multilingual metadata standards and metrics through Europeana

Francesco Gelati: Metadata for archiving Textual Corpora

Agnes Kim (ACDH-CH, Vienna): Multilingualism in corpus building

Corpus building

Till Grallert (Freie Universität Berlin): Corpus building from open, collaborative and scholarly digital editions

Merve Tekgürler (Stanford University): Approaching existing materials as data through 18th century court histories

Day 2 (corpus management) – 29.08.2024, whole day

Nanette Rißler-Pipka (DARIAH-DE National Coordinator): Institutional Greetings

Corpus management – textual and non-textual

Cristina Vertan (Herder Institute for Historical Research on East Central Europe, Marburg): Corpus-Processing with Clarin Tools

Alessia Spadi and Emiliano Degl’Innocenti (Consiglio Nazionale delle Ricerche, Florence): Tools for Corpus Processing

Aleksandr Riaposov and Alexandre Arkhipov (Universität Hamburg): Workflows for managing digital corpora of minority languages

Georgios Vardakis (University of Padua): Annotating low-resourced languages with the Interlinear Text Annotator: A case study in Corfioto

Putting it all together

Jonas Müller-Laackman (State and University Library, Hamburg): Closing the Gap in non-Latin script data

Shih-Pei Chen and Calvin Yeh (Max Planck Institute for the History of Science, Berlin): LoGaRT – Local Gazetteers Research Tools for classical Chinese

Day 3 (Synthesis, publication, writing sprint) – 30.08.2024, morning until lunch

Alex König (CLARIN) [and Laure Barbot (DARIAH) remote]: SSH Open Marketplace

Françoise Gouzi (DARIAH): Transformations – DARIAH Overlay Journal

