Fundamental concepts

Published

July 5, 2024

This section explains what makes Tabulae different from other approaches to Latin morphological parsing.

Key points

  • Tabulae datasets are composed of tables of inflectional rules, and tables of vocabulary stems.
  • Each dataset follows a specified orthography.
  • Forms and lexemes are identified by URNs that are independent of orthography.
  • Datasets are compatible between the Scala and Julia implementations of Tabulae.
  • Data tables use abbreviated URNs; adding a URN registry to the dataset allows expanding abbreviated URNs to full Cite2Urns.
  • Tabulae.jl is a pure Julia system: it has no external dependencies
  • Tabulae.jl implements the model of citable morphological parsing defined the CitableParserBuilders package (documentation)

Architecture

Datasets

  • datasets: data can be maintained in directories of delimited-text files for stem and rules tables
  • rule types and stem types
  • reading/writing datasets uses fromcex and cex functions of the CitableBase package

Building parser

  • Cartesian product of rules * stems, filtered on inflectional class