Package version: 0.30.1
June 14, 2024
CitablePassage and Orthography packages

The CitablePassage type from the Julia CitableCorpus package represents a passage of citable text with a URN identifier and a string value.
using CitableText, CitableCorpus
s = "Four score and seven years ago..."
psgurn = CtsUrn("urn:cts:parserdocs:example.docs.v1:1")
cpsg = CitablePassage(psgurn, s)

<urn:cts:parserdocs:example.docs.v1:1> Four score and seven years ago...

The tokenize function in the Julia Orthography package includes a method for tokenizing a CitablePassage. It returns a vector of CitableTokens.
See this tutorial for a hands-on introduction to tokenizing citable texts with the Orthography package.
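The call that produces the token listing is not shown on this page. A minimal sketch of what it might look like follows, assuming the `simpleAscii()` orthography from the Orthography package and assuming that `tokenize` takes the passage first and the orthography second (the argument order has differed between releases, so check the version you have installed). The variable name `tkns` is illustrative only.

```julia
using CitableText, CitableCorpus, Orthography

psgurn = CtsUrn("urn:cts:parserdocs:example.docs.v1:1")
cpsg = CitablePassage(psgurn, "Four score and seven years ago...")

# simpleAscii() builds a basic ASCII orthography.
# Assumption: the text comes first and the orthography second.
tkns = tokenize(cpsg, simpleAscii())
```

Under those assumptions, `tkns` should hold one CitableToken per token of the passage.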
7-element Vector{CitableToken}:
 <urn:cts:parserdocs:example.docs.v1_tokens:1.1> Four (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.2> score (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.3> and (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.4> seven (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.5> years (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.6> ago (LexicalToken)
 <urn:cts:parserdocs:example.docs.v1_tokens:1.6a> ... (PunctuationToken)

Each citable token defines a new citable passage whose text value is a single token.
7-element Vector{CitablePassage}:
 <urn:cts:parserdocs:example.docs.v1_tokens:1.1> Four
 <urn:cts:parserdocs:example.docs.v1_tokens:1.2> score
 <urn:cts:parserdocs:example.docs.v1_tokens:1.3> and
 <urn:cts:parserdocs:example.docs.v1_tokens:1.4> seven
 <urn:cts:parserdocs:example.docs.v1_tokens:1.5> years
 <urn:cts:parserdocs:example.docs.v1_tokens:1.6> ago
 <urn:cts:parserdocs:example.docs.v1_tokens:1.6a> ...

The tokenizer has also extended the canonical citation of the text so that it refers to individual tokens. The original passage has a passage component with a single level of citation (1); the tokens are cited at two levels (1.1, etc.).
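To make the citation scheme concrete, here is a small self-contained sketch that reproduces the URN strings in the listing from a passage URN and a list of pre-classified tokens. It uses plain strings rather than the CtsUrn type. The helper `tokenurns` and the `:lexical` / `:punctuation` symbols are hypothetical, not part of Orthography.jl, and the numbering rule (a punctuation token reuses the index of the preceding lexical token plus "a") is inferred from the output above rather than taken from the library's source.

```julia
# Hypothetical helper (not part of Orthography.jl): derive token-level URN
# strings from a passage URN and a list of (text, category) pairs.
function tokenurns(passageurn::AbstractString, tokens)
    parts = split(passageurn, ":")
    # parts holds: "urn", "cts", namespace, work identifier, passage reference
    base = join(parts[1:3], ":") * ":" * parts[4] * "_tokens"
    passage = parts[5]
    urns = String[]
    n = 0  # index of the most recent lexical token
    for (_, category) in tokens
        if category == :lexical
            n += 1
            push!(urns, "$(base):$(passage).$(n)")
        else
            # Assumption: a non-lexical token reuses the preceding index plus "a".
            push!(urns, "$(base):$(passage).$(n)a")
        end
    end
    return urns
end

tokens = [("Four", :lexical), ("score", :lexical), ("and", :lexical),
          ("seven", :lexical), ("years", :lexical), ("ago", :lexical),
          ("...", :punctuation)]

urns = tokenurns("urn:cts:parserdocs:example.docs.v1:1", tokens)
foreach(println, urns)
```

The sketch adds a second citation level to the passage reference and appends `_tokens` to the version identifier, which are the two changes visible when comparing the original URN with the token URNs.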
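Each of these citable passages is also assigned a token category (tokencategory), such as LexicalToken or PunctuationToken in the listing of CitableTokens above.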