Citable corpora

Published

June 12, 2024

A CitableTextCorpus wraps a Vector of CitablePassages, and you construct one directly from such a Vector.

using CitableCorpus
using CitableText

bancroft1 = CtsUrn("urn:cts:citedemo:gburg.bancroft.v2:1")
everett1 = CtsUrn("urn:cts:citedemo:gburg.everett.v2:1")
psgs = [
    CitablePassage(bancroft1, "Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."),
    CitablePassage(everett1, "Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.")
]
corpus = CitableTextCorpus(psgs)
Corpus with 2 citable passages in 2 documents.

The CitableTextCorpus implements the CitableCollectionTrait, and therefore supports URN comparison and CEX serialization.

using CitableBase
citablecollection(corpus)
true
cexserializable(corpus)
true
urncomparable(corpus)
true
urntype(corpus)
CtsUrn

In addition, it implements Julia’s iteration interface, so you can loop over a corpus or collect its passages.

URN comparison and filtering

urnequals(everett1, corpus)
Corpus with 1 citable passages in 1 documents.
allgburg = CtsUrn("urn:cts:citedemo:gburg:")
urncontains(allgburg, corpus)
Corpus with 2 citable passages in 2 documents.
urnsimilar(allgburg, corpus)
Corpus with 2 citable passages in 2 documents.

CEX serialization

corpuscex = cex(corpus)
"#!ctsdata\nurn:cts:citedemo:gburg.bancroft.v2:1|Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.\nurn:cts:citedemo:gburg.everett.v2:1|Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."
fromcex(corpuscex, CitableTextCorpus) == corpus
true
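
The serialized string above has a simple shape: a `#!ctsdata` header line, then one `urn|text` line per passage. The following is a minimal sketch of reading that shape by hand, using only Julia's Base library. It is a toy for illustration, not how CitableCorpus parses CEX: `parse_ctsdata` is a hypothetical helper, the sample string is a shortened version of the output above, and the `|` delimiter is assumed from that output.

```julia
# Toy illustration only: split a `#!ctsdata` block into (urn, text) tuples.
# `parse_ctsdata` is a hypothetical helper, not part of CitableCorpus.
function parse_ctsdata(cexstring::AbstractString; delimiter = "|")
    records = Tuple{String, String}[]
    for line in split(cexstring, "\n")
        # Skip blank lines and the block header.
        (isempty(line) || startswith(line, "#!")) && continue
        # Split on the first delimiter only: column 1 is the URN,
        # everything after it is treated as the passage text.
        cols = split(line, delimiter; limit = 2)
        length(cols) == 2 || continue
        push!(records, (String(cols[1]), String(cols[2])))
    end
    records
end

# Shortened sample in the same shape as the `cex(corpus)` output.
sample = "#!ctsdata\nurn:cts:citedemo:gburg.bancroft.v2:1|Four score and seven years ago\nurn:cts:citedemo:gburg.everett.v2:1|Four score and seven years ago"
records = parse_ctsdata(sample)
```

In practice, `fromcex(corpuscex, CitableTextCorpus)` as shown above is the way to rebuild a corpus; the sketch only makes the line-oriented layout of the format visible.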

Iteration

length(corpus)
2
eltype(corpus)
CitablePassage
collect(corpus)
2-element Vector{CitablePassage}:
 <urn:cts:citedemo:gburg.bancroft.v2:1> Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
 <urn:cts:citedemo:gburg.everett.v2:1> Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
for psg in corpus
    println(psg)
end
<urn:cts:citedemo:gburg.bancroft.v2:1> Four score and seven years ago our fathers brought forth, on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
<urn:cts:citedemo:gburg.everett.v2:1> Four score and seven years ago our fathers brought forth, upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
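
To show what supporting iteration involves for a wrapper type, here is a minimal sketch of a struct that wraps a Vector and forwards Julia's iteration interface to it. `ToyCorpus` is a hypothetical stand-in defined only for this example; it holds plain strings and is not the CitableTextCorpus implementation.

```julia
# Hypothetical stand-in for a corpus: a struct wrapping a Vector.
struct ToyCorpus
    passages::Vector{String}
end

# Forward the iteration interface to the wrapped Vector.
Base.iterate(c::ToyCorpus) = iterate(c.passages)
Base.iterate(c::ToyCorpus, state) = iterate(c.passages, state)
Base.length(c::ToyCorpus) = length(c.passages)
Base.eltype(::Type{ToyCorpus}) = String

toy = ToyCorpus(["Four score", "and seven years ago"])

# With those methods defined, generic Base functions work on the wrapper.
collected = collect(toy)                        # Vector{String}
wordcounts = [length(split(p)) for p in toy]    # words per passage
```

Because `length`, `eltype`, `collect`, and `for` loops all rely on these few methods, a type that defines them can be used with comprehensions and other generic functions without further work.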