Build a parser and parse Latin strings
To replicate all the steps in this tutorial:
- install Julia if you haven’t already done so
- download or clone the Tabulae.jl repository
- start a Julia REPL
- assign to the variable repo the path to the cloned repository
Building a parser from local files
You can build a parser from one or more sets of delimited-text files organized in directories following Tabulae’s conventions. In this tutorial, we’ll use the files in the core-infl-shared and core-infl-lat25 directories in the datasets directory of the Tabulae GitHub repository.
If you have a variable named repo with the root directory of the Tabulae repository, then the full paths to the directories will be:
shareddir = joinpath(repo, "datasets", "core-infl-shared")
lat25dir = joinpath(repo, "datasets", "core-infl-lat25")
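Before building a data set, it can help to confirm that both directories actually exist under the repository root. The following is a minimal sketch using only Julia's standard library; the helpers datasetpaths and missingdirs are hypothetical names introduced here for illustration and are not part of Tabulae, and the use of pwd() as the repository root is an assumption you should replace with your own repo value.

```julia
# Hypothetical helper, not part of Tabulae: build the two dataset paths
# used in this tutorial under a given repository root.
function datasetpaths(root::AbstractString)
    names = ["core-infl-shared", "core-infl-lat25"]
    [joinpath(root, "datasets", name) for name in names]
end

# Hypothetical helper: keep only the paths that are not existing directories.
missingdirs(paths) = filter(!isdir, paths)

# Assumption: the current working directory is the repository root.
# Substitute your own `repo` value for `pwd()`.
candidates = datasetpaths(pwd())
for dir in missingdirs(candidates)
    println("Dataset directory not found: ", dir)
end
```

If nothing is printed, both directories were found and the paths should be safe to pass on to the next step.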
Instantiate a data set
You can create a Tabulae.DataSet from a list of one or more directories.
using Tabulae
ds = dataset([shareddir, lat25dir])
Compile a parser
You can then build a parser from a data set.
p = tabulaeStringParser(ds)
Interactive parsing
Use the parsetoken function (from the CitableParserBuilder package) to parse a string with a parser.
using CitableParserBuilder
s = "agricolae"
parses = parsetoken(s, p)
4-element Vector{Analysis}:
Analysis("agricolae", ls.n1626, forms.2010001200, latcommon.noun1626, latcommoninfl.a_ae14, "agricolae")
Analysis("agricolae", ls.n1626, forms.2010001300, latcommon.noun1626, latcommoninfl.a_ae15, "agricolae")
Analysis("agricolae", ls.n1626, forms.2020001100, latcommon.noun1626, latcommoninfl.a_ae18, "agricolae")
Analysis("agricolae", ls.n1626, forms.2020001600, latcommon.noun1626, latcommoninfl.a_ae24, "agricolae")
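Since parsetoken works on one string at a time, a longer passage has to be split into individual tokens first. The sketch below shows one simple way to do that in plain Julia; simpletokens is a hypothetical helper, not a Tabulae or CitableParserBuilder function, and it assumes that splitting on whitespace and trimming surrounding punctuation is adequate for your text.

```julia
# Hypothetical helper, not part of Tabulae or CitableParserBuilder:
# split a passage on whitespace and trim punctuation from each word.
function simpletokens(text::AbstractString)
    stripped = [strip(ispunct, word) for word in split(text)]
    # Drop anything that was punctuation only, and return plain Strings.
    String.(filter(!isempty, stripped))
end

tokens = simpletokens("agricolae, agricolae!")

# With a parser `p` built as described earlier, each token could then be
# parsed in turn:
# results = [parsetoken(t, p) for t in tokens]
```

Each entry of results would then be a Vector of analyses for the corresponding token, like the one shown above.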
Morphological analyses
The result is a Vector of analyses. Each Analysis includes identifiers for a morphological form object and a lexeme (or vocabulary item). You can use the lexemeurn function (from the CitableParserBuilder package) to extract the lexeme’s identifier from an Analysis.
using CitableParserBuilder
lexemeurn(parses[1])
ls.n1626
Tabulae’s latinForm function extracts the form identifier from an analysis, and creates a LatinMorphologicalForm from it.
forms = latinForm.(parses)
4-element Vector{LMFNoun}:
masculine genitive singular
masculine dative singular
masculine nominative plural
masculine vocative plural
Note that morphological forms are not string values. If you want a string label for a form, use the aptly named label function.
label.(forms)
4-element Vector{String}:
"masculine genitive singular"
"masculine dative singular"
"masculine nominative plural"
"masculine vocative plural"
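Once you have string labels, ordinary string functions apply to them. As a small illustration, the sketch below splits the four labels shown above into their parts and selects the singular forms. The helper labelparts is a hypothetical name, and the layout it relies on (three space-separated words in the order gender, case, number) is an assumption read off these four noun labels rather than a documented guarantee.

```julia
# Labels copied from the output above.
labels = [
    "masculine genitive singular",
    "masculine dative singular",
    "masculine nominative plural",
    "masculine vocative plural",
]

# Hypothetical helper. Assumption: a noun label consists of three
# space-separated words in the order gender, case, number.
function labelparts(label::AbstractString)
    gender, grammcase, number = String.(split(label))
    (gender = gender, case = grammcase, number = number)
end

parts = labelparts.(labels)

# Keep only the singular forms and list their cases.
singulars = [part for part in parts if part.number == "singular"]
println([part.case for part in singulars])
```

For anything beyond display or quick filtering, working with the form objects themselves is likely to be more robust than parsing their labels.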
See the following tutorial on working with morphological forms.