Fundamental concepts

Published

July 5, 2024

This section explains what makes Tabulae different from other approaches to Latin morphological parsing.

Key points

Tabulae datasets are composed of tables of inflectional rules, and tables of vocabulary stems.
Each dataset follows a specified orthography.
Forms and lexemes are identified by URNs that are independent of orthography.
Datasets are compatible between the Scala and Julia implementations of Tabulae.
Data tables use abbreviated URNs; adding a URN registry to the dataset allows expanding abbreviated URNs to full Cite2Urns.
Tabulae.jl is a pure Julia system: it has no external dependencies
Tabulae.jl implements the model of citable morphological parsing defined the CitableParserBuilders package (documentation)

datasets: data can be maintained in directories of delimited-text files for stem and rules tables
rule types and stem types
reading/writing datasets uses fromcex and cex functions of the CitableBase package