Towards a broad-coverage graphemic analysis of large historical corpora

This paper presents a method which we are developing to explore graphemic variation in large historical corpora of German. Historical corpora provide an amount of data at the level of graphemics which cannot be handled exhaustively using common methods of manual evaluation. To deal with this challenge, we apply methods from computational linguistics to pave the way for a broad-coverage graph(em)ic analysis of large historical corpora. In this paper, we show how our approach can be applied to the Reference Corpus of Middle High German. Illustrating our method and linguistic analysis, we present findings from our investigations into diatopic and/or diachronic variation as documented in 13th and 14th century charters (Urkunden) from the corpus.

Metadata
Author:	Sandra Waldenberger, Stefanie Dipper, Ilka Lemke
URN:	urn:nbn:de:hbz:294-89052
DOI:	https://doi.org/10.1515/zfs-2021-2037
Parent Title (English):	Zeitschrift für Sprachwissenschaft
Publisher:	de Gruyter
Place of publication:	Berlin
Document Type:	Article
Language:	English
Date of Publication (online):	2022/05/05
Date of first Publication:	2022/01/07
Publishing Institution:	Ruhr-Universität Bochum, Universitätsbibliothek
Tag:	Middle High German; corpus-based analysis; graphemic variation; quantitative analysis
Volume:	40
Issue:	3
First Page:	401
Last Page:	420
Institutes/Facilities:	Sprachwissenschaftliches Institut
	Germanistisches Institut
open_access (DINI-Set):	open_access
faculties:	Fakultät für Philologie
Licence (English):	Creative Commons - CC BY 4.0 - Attribution 4.0 International
