Fundamental concepts
This section explains what makes Tabulae different from other approaches to Latin morphological parsing.
Key points
- Tabulae datasets are composed of tables of inflectional rules, and tables of vocabulary stems.
- Each dataset follows a specified orthography.
- Forms and lexemes are identified by URNs that are independent of orthography.
- Datasets are compatible between the Scala and Julia implementations of Tabulae.
- Data tables use abbreviated URNs; adding a URN registry to the dataset allows expanding abbreviated URNs to full
Cite2Urns. Tabulae.jlis a pure Julia system: it has no external dependenciesTabulae.jlimplements the model of citable morphological parsing defined theCitableParserBuilderspackage (documentation)
Architecture
Datasets
- datasets: data can be maintained in directories of delimited-text files for stem and rules tables
- rule types and stem types
- reading/writing datasets uses
fromcexandcexfunctions of theCitableBasepackage
Building parser
- Cartesian product of rules
*stems, filtered on inflectional class