Parsing a string value

Published June 8, 2024

First, create a GettysburgParser and assign it to a variable named parser.

# Instantiate a parser
using CitableParserBuilder
parser = CitableParserBuilder.gettysburgParser()

When we parse a string token, the result is a Vector of Analysis objects. Our parser produces only one analysis for the token “score”.

Out of date

The parsetoken signature that takes an optional data parameter is out of date and will be removed in the next release.

scoreparses = parsetoken("score", parser)
length(scoreparses)
1
typeof(scoreparses[1])
Analysis

The Analysis object associates the token with a URN value, in abbreviated format, for each of the four properties of an analysis. For example, here are the token and its form:

scoreparses[1].token
"score"
scoreparses[1].form
pennpos.NN

NN is the Penn Treebank code for a noun, singular or mass.
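Because parsetoken always returns a Vector, code that consumes its result should not assume there is exactly one analysis, even though "score" happens to have only one here. The sketch below loops over every analysis and prints its token and form. It relies only on the calls and fields shown above (gettysburgParser, parsetoken, .token, .form); the exact printed form of the URN value is not guaranteed.

```julia
using CitableParserBuilder

# Build the sample parser and parse a single token, as above
parser = CitableParserBuilder.gettysburgParser()
scoreparses = parsetoken("score", parser)

# parsetoken returns a Vector, so visit every analysis instead of
# assuming there is exactly one; print each token with its form
for a in scoreparses
    println(a.token, " => ", a.form)
end
```

The same loop works unchanged for ambiguous tokens that produce several analyses.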

We can also parse a list of words. Here, parsing four words produces a Vector containing four Vectors of Analysis objects.

wordsparsed = parselist(split("Four score and seven"), parser)
length(wordsparsed)
4
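Since parselist returns one Vector of analyses per input word, a common next step is to pair each word with its results or to flatten everything into a single list. The sketch below does both. It assumes that the results come back in the same order as the input words, which this page does not state explicitly.

```julia
using CitableParserBuilder

parser = CitableParserBuilder.gettysburgParser()
words = split("Four score and seven")
wordsparsed = parselist(words, parser)

# Assumes results are in input order: pair each word with its analyses
for (word, analyses) in zip(words, wordsparsed)
    println(word, ": ", length(analyses), " analyses")
end

# Concatenate the per-word vectors into one flat Vector of analyses
allanalyses = reduce(vcat, wordsparsed)
```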

Tip for parsing a citable corpus

You can use an OrthographicSystem to generate a list of unique lexical tokens for an entire citable corpus. For details, see the tutorial in the documentation for the Orthography.jl package.
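The point of building a list of unique tokens is that each distinct word needs to be parsed only once, no matter how often it occurs in the corpus. The sketch below illustrates that idea. It deliberately does not use Orthography.jl: as a hypothetical stand-in for an OrthographicSystem it splits a punctuation-free sample string on whitespace, so it shows only the deduplicate-then-parse pattern and not how a real orthography tokenizes text.

```julia
using CitableParserBuilder

parser = CitableParserBuilder.gettysburgParser()

# Hypothetical stand-in for an OrthographicSystem: plain whitespace splitting,
# which only works here because the sample text has no punctuation
text = "of the people by the people for the people"
uniquewords = unique(split(text))   # 5 distinct words, in first-occurrence order

# Each distinct word is parsed once, however often it appears in the text
parsed = parselist(uniquewords, parser)
```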