Database of Latvian Morphemes and Derivational Models

About the project

The “Database of Latvian Morphemes and Derivational Models” (DLMDM) is a project run by the Department of Latvian and Baltic Studies of the Faculty of Humanities of the University of Latvia as part of the Latvian Council of Science’s Fundamental and Applied Research programme.

Project leader: Dr. philol. Andra Kalnača, Professor and Senior Researcher at the Department of Latvian and Baltic Studies, Faculty of Humanities, University of Latvia (andra.kalnaca@lu.lv)

Project No: lzp-2022/1-0013

Implementation period: 1 April 2023 – 31 March 2026

Project funding: 300 000 EUR

Funded by: Latvian Council of Science

Overview

The objective of the project is to create a comprehensive digital resource (database) on Latvian derivational morphology, providing structured, machine-readable data on the internal composition of Latvian words and on the observable regularities and patterns underlying the system of Latvian word formation. The theoretical research required to reach this goal comprises a data-based, bottom-up inventory, classification and analysis of morphemes, morpheme functions and the principles governing morpheme combinability and ordering, all viewed within the broader context of the language system as a whole.

The database is designed to comprise two linked (co-indexed) basic parts:

1) an annotated list of Latvian word constituents (roots, endings, suffixes, prefixes, etc.);

2) an alphabetically ordered list of sets of common-root or common-stem words (derivational families), with the morphemic and derivational structure marked for each word.
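To make the two-part design more concrete, the following is a minimal sketch of how such co-indexed data might be represented. It is not the project's actual schema: all class names, field names and identifiers (Morpheme, Word, DerivationalFamily, "r:rakst", etc.) are hypothetical, and the segmentation of the three sample words is an illustrative assumption rather than data taken from the DLMDM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Morpheme:
    """Part 1: one entry in the annotated list of word constituents."""
    morpheme_id: str   # hypothetical identifier used for co-indexing
    form: str
    kind: str          # "root", "prefix", "suffix" or "ending"


@dataclass
class Word:
    """A word whose morphemic structure points into the morpheme list."""
    lemma: str
    morpheme_ids: List[str]
    derived_from: Optional[str] = None  # lemma of the base word, if recorded


@dataclass
class DerivationalFamily:
    """Part 2: a set of words sharing a common root."""
    root_id: str
    words: List[Word] = field(default_factory=list)


def segment(word: Word, inventory: Dict[str, Morpheme]) -> str:
    """Render a word as hyphen-separated morphemes via the shared inventory."""
    return "-".join(inventory[m_id].form for m_id in word.morpheme_ids)


def words_with_morpheme(family: DerivationalFamily, morpheme_id: str) -> List[str]:
    """List the lemmas in a family that contain the given morpheme."""
    return [w.lemma for w in family.words if morpheme_id in w.morpheme_ids]


# Illustrative data only (assumed segmentation, not project data).
inventory = {
    m.morpheme_id: m
    for m in [
        Morpheme("r:rakst", "rakst", "root"),
        Morpheme("s:ī", "ī", "suffix"),
        Morpheme("s:tāj", "tāj", "suffix"),
        Morpheme("e:s", "s", "ending"),
        Morpheme("e:t", "t", "ending"),
    ]
}

family = DerivationalFamily(
    root_id="r:rakst",
    words=[
        Word("raksts", ["r:rakst", "e:s"]),
        Word("rakstīt", ["r:rakst", "s:ī", "e:t"]),
        Word("rakstītājs", ["r:rakst", "s:ī", "s:tāj", "e:s"], derived_from="rakstīt"),
    ],
)

for w in family.words:
    print(w.lemma, "=", segment(w, inventory))
```

Because each word stores only morpheme identifiers, the morpheme list and the derivational families stay linked: a lookup such as words_with_morpheme(family, "s:tāj") finds every family member containing that suffix without duplicating the morpheme annotation.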

The initial core of the database (i.e. the range of derivational families/zero-nodes) is based on the 165,090 lemmas extracted from the “Balanced Corpus of Modern Latvian” (LVK2018), as of April 2023.

Expected results:

1) a free-access database published in a public internet repository;

2) scientific articles (Scopus/WoS) and a dedicated open-access volume (Scopus).

Apart from technical, scientific and dissemination activities (such as scientific workshops, conferences and publications), project activities will also include three public lectures on Latvian morphemics, morpheme combinability and ordering, word formation, and technical aspects of the DLMDM.

We hope that the results of the project “Database of Latvian Morphemes and Derivational Models” will be useful not only to the scientific community but also to a wider audience, especially translators, IT specialists, lexicographers, teachers, and learners of Latvian in Latvia and abroad. The database will also provide a dependable basis for further data-based research on Latvian word formation and help fill the remaining gap in digital resources dedicated to the Latvian language. In this way, we hope, it will contribute to the development of new language learning materials, language-use manuals, etc.

Word formation and morphemics are interlinked with other subsystems of a language (grammar, the lexicon, pragmatics, semantics) in a complex web of manifold relations. A proper understanding of how language, as a system, functions and is organized is impossible without a solid, detailed understanding of derivational morphology and the system of morphemes. It is our hope, therefore, that the DLMDM will, overall, make an important contribution to the future of Latvian language research and the humanities in Latvia.