Package version: 0.30.1
June 14, 2024
CitablePassage and Orthography packages
The CitablePassage type from the Julia CitableCorpus package represents a passage of citable text with a URN identifier and a string value.
using CitableText, CitableCorpus
s = "Four score and seven years ago..."
psgurn = CtsUrn("urn:cts:parserdocs:example.docs.v1:1")
cpsg = CitablePassage(psgurn, s)
<urn:cts:parserdocs:example.docs.v1:1> Four score and seven years ago...
The tokenize function in the Julia Orthography package includes a method for tokenizing CitablePassages. This creates a series of CitableTokens.
See this tutorial for a hands-on introduction to tokenizing citable texts with the Orthography package.
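A call along the following lines would tokenize the passage defined earlier. This is a minimal sketch, not necessarily the exact call used to produce the listing that follows: it assumes the `simpleAscii()` orthography from the Orthography package (the text does not say which orthography was used) and the argument order of passage first, orthography second. The variable name `tkns` is chosen here for illustration.

```julia
using CitableText, CitableCorpus, Orthography

# Rebuild the citable passage from the earlier example.
s = "Four score and seven years ago..."
psgurn = CtsUrn("urn:cts:parserdocs:example.docs.v1:1")
cpsg = CitablePassage(psgurn, s)

# Assumption: simpleAscii() is the orthography; the source does not name one.
# The result is expected to be a vector of CitableToken values.
tkns = tokenize(cpsg, simpleAscii())
```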
7-element Vector{CitableToken}:
<urn:cts:parserdocs:example.docs.v1_tokens:1.1> Four (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.2> score (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.3> and (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.4> seven (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.5> years (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.6> ago (LexicalToken)
<urn:cts:parserdocs:example.docs.v1_tokens:1.6a> ... (PunctuationToken)
Each citable token defines a new citable passage whose text value is a single token.
7-element Vector{CitablePassage}:
<urn:cts:parserdocs:example.docs.v1_tokens:1.1> Four
<urn:cts:parserdocs:example.docs.v1_tokens:1.2> score
<urn:cts:parserdocs:example.docs.v1_tokens:1.3> and
<urn:cts:parserdocs:example.docs.v1_tokens:1.4> seven
<urn:cts:parserdocs:example.docs.v1_tokens:1.5> years
<urn:cts:parserdocs:example.docs.v1_tokens:1.6> ago
<urn:cts:parserdocs:example.docs.v1_tokens:1.6a> ...
The tokenizer has also extended the canonical citation of passages of text to refer to individual tokens. The entire passage had a passage component with a single level of citation (1); the tokens are cited at two levels (1.1, etc.).
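The citation pattern visible in the token listings can be reproduced with plain string handling. The sketch below is an illustration only: `token_urns` is a hypothetical helper, not a function of CitableText, CitableCorpus, or Orthography, and the rules it applies (appending `_tokens` to the version identifier, numbering lexical tokens in sequence, and giving punctuation the preceding number plus `a`) are inferred from the output shown above rather than taken from the package source.

```julia
# Hypothetical helper: mimics the URN pattern seen in the tokenizer output,
# working on plain strings rather than CtsUrn values.
function token_urns(passage_urn::AbstractString, tokens)
    # "urn:cts:namespace:work.hierarchy:passage" splits into five parts
    parts = split(passage_urn, ":")
    work = parts[4] * "_tokens"   # mark the version as a token-level edition
    passage = parts[5]
    urns = String[]
    n = 0
    for (_, islexical) in tokens
        if islexical
            n += 1
            ref = string(passage, ".", n)
        else
            # punctuation reuses the preceding lexical token's number plus "a"
            ref = string(passage, ".", n, "a")
        end
        push!(urns, join([parts[1:3]..., work, ref], ":"))
    end
    urns
end

# Each token is paired with a flag: true for lexical, false for punctuation.
tokens = [("Four", true), ("score", true), ("and", true), ("seven", true),
          ("years", true), ("ago", true), ("...", false)]
urns = token_urns("urn:cts:parserdocs:example.docs.v1:1", tokens)
foreach(println, urns)
```

The one-level passage reference `1` becomes the first level of each token reference, and the token's position supplies the second level. The sketch does not handle a passage that begins with punctuation.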
Each of these citable passages is assigned a tokencategory.