Package version: 0.12.0
July 5, 2024

To replicate all the steps in this tutorial, you need:
repo: the path to the cloned repository

Start by repeating the steps from the introductory tutorial to compile a parser, and assign it to a variable named parser:

Compile a parser
```julia
using Tabulae
# `repo` must already hold the path to the cloned repository
shareddir = joinpath(repo, "datasets", "core-infl-shared")
lat25dir = joinpath(repo, "datasets", "core-infl-lat25")
parser = dataset([shareddir, lat25dir]) |> tabulaeStringParser
```
When we parse a token, the result is a Vector of analyses. Each analysis associates the token with four identifiers: its lexeme, form, stem and inflectional rule (as you can see in the parser output). If the form is unambiguous, as with the token amabatur, the Vector will have only one element:
1-element Vector{Analysis}:
Analysis("amabatur", ls.n2280, forms.3312120000, latcommon.verbn2280, latcommon.are_conj1impft9, "amabatur")
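The single analysis above could come from a call like the one below. This is a sketch that makes several assumptions: the parser compiled at the start of this tutorial is bound to `parser`; the parsing function is `parsetoken(token, parser)` from CitableParserBuilder.jl and is in scope after `using Tabulae` (if it is not, add `using CitableParserBuilder`); and the `Analysis` fields are named `lexeme`, `form`, `stem` and `rule`.

```julia
using Tabulae

# Assumes `parser` was compiled as shown at the start of this tutorial
# and that `parsetoken(token, parser)` (CitableParserBuilder.jl) is in scope.
amabatur = parsetoken("amabatur", parser)

# An unambiguous token yields a one-element Vector{Analysis}.
a = amabatur[1]

# The four identifiers (field names are assumed):
a.lexeme   # lexeme, shown above as ls.n2280
a.form     # morphological form, shown above as forms.3312120000
a.stem     # stem, shown above as latcommon.verbn2280
a.rule     # inflectional rule, shown above as latcommon.are_conj1impft9
```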
If the token is morphologically ambiguous, the results will include one analysis for each possibility. The token agricolae, for example, has four analyses:
4-element Vector{Analysis}:
Analysis("agricolae", ls.n1626, forms.2010001200, latcommon.noun1626, latcommoninfl.a_ae14, "agricolae")
Analysis("agricolae", ls.n1626, forms.2010001300, latcommon.noun1626, latcommoninfl.a_ae15, "agricolae")
Analysis("agricolae", ls.n1626, forms.2020001100, latcommon.noun1626, latcommoninfl.a_ae18, "agricolae")
Analysis("agricolae", ls.n1626, forms.2020001600, latcommon.noun1626, latcommoninfl.a_ae24, "agricolae")
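The output above shows that all four analyses of agricolae share a single lexeme (ls.n1626) and differ only in form and rule. The sketch below checks this directly. It assumes `parser` from the start of the tutorial, that `parsetoken` is in scope, and that `Analysis` has fields named `lexeme` and `form`.

```julia
using Tabulae

# Assumes `parser` from the start of the tutorial and `parsetoken` in scope.
agricolae = parsetoken("agricolae", parser)

# One analysis per possible form.
nanalyses = length(agricolae)

# The ambiguity lies in the form, not the lexeme (field names are assumed).
lexemes = unique([a.lexeme for a in agricolae])
forms = unique([a.form for a in agricolae])
```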
Use the latinForm function to construct a Latin morphological form from the identifier in a morphological analysis. Morphological forms belong to subtypes of the abstract LatinMorphologicalForm type. For example, the analyses of amabatur and agricolae above yield an LMFFiniteVerb form and an LMFNoun form, respectively.
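Here is a minimal sketch of that conversion. It assumes `parser` and `parsetoken` are available as in the setup step, and that `latinForm` accepts an `Analysis` directly rather than only its form identifier.

```julia
using Tabulae

# Assumes `parser` from the start of the tutorial, `parsetoken` in scope,
# and that `latinForm` accepts an `Analysis` directly.
amabatur = parsetoken("amabatur", parser)
agricolae = parsetoken("agricolae", parser)

verbform = latinForm(amabatur[1])   # expected to be an LMFFiniteVerb
nounform = latinForm(agricolae[1])  # expected to be an LMFNoun

# Both concrete types are subtypes of LatinMorphologicalForm.
verbform isa LMFFiniteVerb, nounform isa LMFNoun
```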
These different types of form have different properties, as the default display suggests. Noun forms have properties for gender, case and number, while finite verb forms have properties for tense, mood, voice, person and number.
We can retrieve any property of a Latin form with a function whose name begins with lower-case lmp followed by the property name. For example, the lmpCase function gets the morphological property of case, and lmpTense gets the tense property.
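The sketch below applies these accessors to the noun and verb forms from amabatur and agricolae. lmpCase, lmpTense and lmpPerson are named in this tutorial. The names lmpGender, lmpNumber, lmpMood and lmpVoice are assumed here from the lmp-plus-property-name convention. As before, `parser`, `parsetoken` and `latinForm(::Analysis)` are assumptions.

```julia
using Tabulae

# Assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis.
verbform = latinForm(parsetoken("amabatur", parser)[1])
nounform = latinForm(parsetoken("agricolae", parser)[1])

# Noun forms: gender, case, number
# (lmpGender and lmpNumber are assumed from the naming convention).
nounprops = (lmpGender(nounform), lmpCase(nounform), lmpNumber(nounform))

# Finite verb forms: tense, mood, voice, person, number
# (lmpMood and lmpVoice are likewise assumed names).
verbprops = (lmpTense(verbform), lmpMood(verbform), lmpVoice(verbform),
             lmpPerson(verbform), lmpNumber(verbform))
```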
The same functions that retrieve a property from a form can also be used to construct a property from a string value. For example, you can use lmpTense to construct a property for tense.
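Here is a small sketch of building a property from a string and comparing it with a property taken from a parsed form. It assumes that "present" is an accepted label for lmpTense, which matches the labels in the form displays later in this tutorial, and that property values can be compared with `==`. Because amabatur is an imperfect form, its tense should not equal the constructed present tense.

```julia
using Tabulae

# Build a tense property directly from a string value
# ("present" is assumed to be an accepted label).
present = lmpTense("present")

# Compare it with the tense of a parsed form
# (assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis).
verbform = latinForm(parsetoken("amabatur", parser)[1])
samepresent = lmpTense(verbform) == present   # amabatur is imperfect
```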
We can take advantage of this in ordinary Julia operations on collections of analyses. For instance, we can select the analyses of the token agricolae that have plural forms, and extract the form object from each of them.
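One way to write this selection is with `filter` and a broadcast of latinForm. In this sketch, "plural" is an assumed label for lmpNumber, the name lmpNumber itself is inferred from the naming convention, and `parser`, `parsetoken` and `latinForm(::Analysis)` are assumed as in the setup. Since agricolae is plural only as a nominative or vocative, two of the four forms should remain.

```julia
using Tabulae

# Assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis.
agricolae = parsetoken("agricolae", parser)

# Property value to match, constructed from a string
# ("plural" is an assumed label; lmpNumber follows the lmp naming pattern).
plural = lmpNumber("plural")

# Keep analyses whose form is plural, then turn each into a form object.
pluralforms = filter(a -> lmpNumber(latinForm(a)) == plural, agricolae) .|> latinForm
```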
Note that the various types of LatinMorphologicalForm are not equivalent to a traditional “part of speech.” Rather, they are analytical types defined by their unique set of properties. “Verbs” as a category for part of speech include multiple types of morphological forms: finite verbs like the example above, but also infinitives, participles, and other forms.
Consider the ambiguity of the token amare, for example. Converting each of its analyses to a form gives three possibilities:
3-element Vector{LatinMorphologicalForm}:
present indicative passive second singular
present imperative passive second singular
present active infinitive
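The vector of forms above can be produced by broadcasting latinForm over the parser's analyses. This sketch assumes `parser`, `parsetoken`, and a `latinForm` method that accepts an `Analysis`.

```julia
using Tabulae

# Parse amare and convert every analysis into a form object
# (assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis).
amareforms = parsetoken("amare", parser) .|> latinForm
```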
One of the forms is an infinitive, with only two morphological properties, for tense and voice. Compare the types of the forms.
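A sketch for comparing the concrete types, under the same assumptions about `parser`, `parsetoken` and `latinForm`. It relies on the forms appearing in the order shown in the output above, with the infinitive third. It does not assume which concrete type Tabulae uses for the imperative.

```julia
using Tabulae

# Assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis.
amareforms = parsetoken("amare", parser) .|> latinForm

# Concrete type of each form, in the order displayed above.
formtypes = typeof.(amareforms)
```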
We can meaningfully look at tense properties for all of the forms.
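Because every form of amare has a tense, lmpTense can be broadcast over the whole vector. This sketch also assumes that "present" is an accepted label for lmpTense, in addition to the usual assumptions about `parser`, `parsetoken` and `latinForm`.

```julia
using Tabulae

# Assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis.
amareforms = parsetoken("amare", parser) .|> latinForm

# Tense is defined for finite verbs and infinitives alike.
tenses = lmpTense.(amareforms)

# All three forms are present tense ("present" is an assumed label).
allpresent = all(t -> t == lmpTense("present"), tenses)
```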
But if we apply the lmpPerson function to an infinitive, we get a warning and a resulting value of nothing.
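The sketch below shows that behavior and uses `isnothing` to detect the missing property. As in the earlier examples, it relies on the displayed order of the forms (infinitive third) and on the usual assumptions about `parser`, `parsetoken` and `latinForm`.

```julia
using Tabulae

# Assumes `parser`, `parsetoken`, and `latinForm` accepting an Analysis.
amareforms = parsetoken("amare", parser) .|> latinForm

infinitive = amareforms[3]        # third in the output shown above
person = lmpPerson(infinitive)    # logs a warning; an infinitive has no person

# Test for the missing value before using it.
hasperson = !isnothing(person)
```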