Generating forms

Published

June 9, 2024

You can generate a complete analysis, including a string token, for a givein lexeme and morphological form. Tabulae can generate forms using either a parser or a dataset.

The following cell creates a Tabulae dataset from directories in the github repository’s datasets directory, and builds a parser from it.

using Tabulae, CitableParserBuilder

shareddir = joinpath(repo, "datasets", "core-infl-shared") 
lat25dir = joinpath(repo, "datasets", "core-infl-lat25") 
ds = dataset([shareddir, lat25dir])
parser =  tabulaeStringParser(ds)

Generating forms using a parser

To illustrate generating forms with a parser, we’ll first analyze a token.

parses = parsetoken("animo", parser)
2-element Vector{Analysis}:
 Analysis("animo", ls.n2636, forms.2010001300, latcommon.nounn2636, nouninfl.us_i3)
 Analysis("animo", ls.n2636, forms.2010001500, latcommon.nounn2636, nouninfl.us_i5)

animo is morphologically ambiguous: it has two possible analyses. We’ll use the latinForm function to see the morphological form of the first analysis.

form1 = latinForm(parses[1])
label(form1)
"masculine dative singular"

We’ll now try to generate the same form using identifying URNs for the form and the lexeme.

lex = lexemeurn(parses[1])
mform = formurn(parses[1])

generated = generate(lex, mform, parser)
1-element Vector{Analysis}:
 Analysis("animo", ls.n2636, forms.2010001300, latcommon.nounn2636, nouninfl.us_i3)

The result is a vector of Analysis objects – the same structure that parsetoken returns, but we only have one possible result for this specific form of this lexeme. The token function extracts the string value of the token from an analysis

token(generated[1])
"animo"

Generating forms using a Tabulae dataset

Using a dataset instead of a parser, we get the same results:

dsgenerated = generate(lex, mform, ds)
dsgenerated .|> token
1-element Vector{String}:
 "animo"

With a dataset, you also have the option of generating analyses from the combination of a lexeme identifier plus a full Latin morphological form, such as the noun form we saved from a previous analysis:

generate(lex, form1, ds)
1-element Vector{Analysis}:
 Analysis("animo", ls.n2636, forms.2010001300, latcommon.nounn2636, nouninfl.us_i3)