using CitableBase, CitableText
engraving_text = CtsUrn("urn:cts:citeusnatarch:foundingdocuments.declindep.engraving:")
original_text = CtsUrn("urn:cts:citeusnatarch:foundingdocuments.declindep.original:")
A walkthrough: the American Declaration of Independence
This page was compiled with the following versions of the packages illustrated here:
- CitableBase 10.4.0
- CitableCorpus 0.13.5
- CitableImage 0.7.2
- CitableObject 0.16.1
- CitablePhysicalText 0.11.0
- CitableTeiReaders 0.10.3
- CitableText 0.16.2
- EditionBuilders 0.8.5
- OrderedCollections 1.7.0
- Orthography 0.22.0
- StatsBase 0.34.4
- VectorAlignments 0.3.0
Background and sources
The historian and classicist Danielle Allen offers a new interpretation of the American Declaration of Independence that is literally based on a new reading: she punctuates the text differently from most printed editions. See her book (D. S. Allen 2014) and the subsequent manuscript available from the Institute for Advanced Study (D. Allen 2015).
This page briefly illustrates how we could apply the Julia packages for citation and for working with digital texts to the problem of reading the text of the Declaration.
Sources
- The U.S. National Archives has published freely available, high-resolution images of both the original signed parchment and a widely used engraved reproduction. (See the presentation of the Declaration of Independence from the National Archives, with links to downloadable images.) These images have been added to the Homer Multitext project’s citable image service.
- Two transcriptions of the text in TEI-compliant XML are included in this site’s GitHub repository:
  - transcription 1: a transcription of the text of the engraving, taken from the National Archives web site
  - transcription 2: a transcription of the text from the image of the original parchment
- The relations of text, pages and documentary image are documented in a delimited-text file in this site’s GitHub repository.
Citation with URNs
Citing texts
Cite the text of the original and the engraving using CtsUrns with different version identifiers.
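The two URNs for the Declaration differ only in the final, version-level part of the work hierarchy. A minimal sketch using only Base Julia string functions, rather than the accessor functions of the CitableText package, makes that visible. The helper `cts_parts` is hypothetical and written only for this illustration.

```julia
# Hypothetical helper (not part of CitableText): split a CTS URN string
# into its colon-delimited fields and its dot-delimited work hierarchy.
function cts_parts(urn::AbstractString)
    parts = split(urn, ":")
    # A CTS URN has five colon-delimited fields:
    # "urn", "cts", namespace, work hierarchy, passage (possibly empty).
    length(parts) == 5 || throw(ArgumentError("expected 5 colon-delimited fields"))
    (namespace = parts[3], workparts = split(parts[4], "."), passage = parts[5])
end

engraving = cts_parts("urn:cts:citeusnatarch:foundingdocuments.declindep.engraving:")
original = cts_parts("urn:cts:citeusnatarch:foundingdocuments.declindep.original:")

# Text group and work are shared; only the version identifier differs.
samework = engraving.workparts[1:2] == original.workparts[1:2]
versions = (engraving.workparts[3], original.workparts[3])
```

Both URNs end with an empty passage component, so each refers to a whole version of the text rather than to a single passage.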
Citing objects
Identify a collection of pages (front and back) for the original and for the engraving using Cite2Urns.
using CitableObject
engraving_pages = Cite2Urn("urn:cite2:citeusnatarch:declindep.engraving:")
original_pages = Cite2Urn("urn:cite2:citeusnatarch:declindep.original:")
Citing images
Identify documentary images for the front of the parchment and the first page of the engraving using Cite2Urns.
original_img1 = Cite2Urn("urn:cite2:citeusnatarch:declindep.v1:Declaration_Pg1of1_AC")
engraving_img1 = Cite2Urn("urn:cite2:citeusnatarch:declindep.v1:Declaration_Engrav_Pg1of1_AC")
Digital texts
Orthography
Define an orthographic system for the texts we will read.
using Orthography
ortho = simpleAscii()
Create citable texts
Read the XML source from a URL, and make the marked-up content accessible through canonical citation with the CitableTeiReaders package.
using CitableTeiReaders
engraving_url = "https://raw.githubusercontent.com/neelsmith/quart-home/main/walkthrough/adoi/declaration_engraving.xml"
engraving_corpus = readcitable(engraving_url, engraving_text, TEIAnonblock, UrlReader)

original_url = "https://raw.githubusercontent.com/neelsmith/quart-home/main/walkthrough/adoi/declaration_original.xml"
# Cite the second corpus with the URN for the original, not the engraving.
original_corpus = readcitable(original_url, original_text, TEIAnonblock, UrlReader)
Compose a univocal diplomatic edition from the multivalent XML document with the EditionBuilders package.
using EditionBuilders
builder = MidDiplomaticBuilder("Diplomatic edition builder", "dipl")
engraving_dipl = edited(builder, engraving_corpus; edition = "engraving_dip")
original_dipl = edited(builder, original_corpus; edition = "original_dipl")
A full digital scholarly edition
Collect relations of text, image and artifact
Read records relating text, image and artifact using the CitablePhysicalText package.
using CitablePhysicalText
delimited_url = "https://raw.githubusercontent.com/neelsmith/quart-home/main/walkthrough/adoi/declaration.cex"
triplesets = fromcex(delimited_url, DSECollection, UrlReader)
Services for images
Configure the Homer Multitext project’s image service using the CitableImage package.
using CitableImage
imgbaseurl = "http://www.homermultitext.org/iipsrv"
imgroot = "/project/homer/pyramidal/deepzoom"
imgservice = IIIFservice(imgbaseurl, imgroot)
Application: explore differences in the texts visually
First, we’ll find passages where the two editions differ. One way to focus our attention on differences in content rather than formatting is to compare tokenizations of each text. The following function compares two citable text passages by tokenizing each and comparing the result.
using CitableCorpus
"""True if text content of tokens in two citable passages matches."""
function tokensmatch(psg1::CitableCorpus.CitablePassage, psg2::CitableCorpus.CitablePassage, orthography::OrthographicSystem)
    # Tokenize with the orthography passed as an argument, not a global.
    tkns1 = tokenize(psg1, orthography)
    tkns2 = tokenize(psg2, orthography)
    # Compare only the text of each token, ignoring its URN.
    text1 = map(t -> t.passage.text, tkns1)
    text2 = map(t -> t.passage.text, tkns2)
    text1 == text2
end
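To see why comparing token lists sets aside formatting while still catching changes in punctuation, here is a sketch that assumes a much cruder tokenizer than the Orthography package provides. The functions `crude_tokens` and `crude_match` are hypothetical stand-ins, and the sample strings are illustrative, not transcriptions.

```julia
# Hypothetical stand-in for an orthography-aware tokenizer: runs of word
# characters are tokens, and each punctuation character is its own token.
crude_tokens(s::AbstractString) = [m.match for m in eachmatch(r"\w+|[^\w\s]", s)]

# Two strings match when their token sequences are identical.
crude_match(a::AbstractString, b::AbstractString) = crude_tokens(a) == crude_tokens(b)

# Differences in spacing or line breaks disappear after tokenizing...
same_content = crude_match("Now is   the time.", "Now is the\ntime.")
# ...but a difference in punctuation survives as a differing token.
different_punct = crude_match("Now is the time.", "Now is the time,")
```

Because punctuation is kept as tokens, a comparison of this kind stays sensitive to exactly the sort of variation at issue in readings of the Declaration.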
Next we use the function to find parallel passages where the texts differ.
# Both editions must have the same number of passages to be compared pairwise.
@assert length(original_dipl.passages) == length(engraving_dipl.passages)
difflist = []
for (orig, engr) in zip(original_dipl.passages, engraving_dipl.passages)
    if ! tokensmatch(orig, engr, ortho)
        push!(difflist, (orig, engr))
    end
end
Look in the appropriate set of triples:
triplesets[1] |> label
"Collection of DSE records for the stone engraving of the American Declaration of Independence"
triples_engraving = triplesets[1]
triples_original = triplesets[2]
Now find images
(psg1_orig, psg1_engraving) = difflist[1]
imgs_orig = imagesfortext(psg1_orig.urn, triples_original; keepversion = false)
imgs_engraving = imagesfortext(psg1_engraving.urn, triples_engraving; keepversion = false)
[ Info: Txt urn:cts:citeusnatarch:foundingdocuments.declindep.original_dipl:2
[ Info: Look for txturn urn:cts:citeusnatarch:foundingdocuments.declindep:2
[ Info: Txt urn:cts:citeusnatarch:foundingdocuments.declindep.engraving_dip:2
[ Info: Look for txturn urn:cts:citeusnatarch:foundingdocuments.declindep:2
1-element Vector{Cite2Urn}:
urn:cite2:citeusnatarch:declindep.v1:Declaration_Engrav_Pg1of1_AC@0.03594,0.1768,0.9243,0.1439
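The image URN in this output carries a region of interest after the `@` sign. The following sketch reads those four numbers with Base Julia only. It assumes they are x, y, width and height expressed as fractions of the image's dimensions; `roi` is a hypothetical helper, not a function of the CitableImage or CitableObject packages.

```julia
# Hypothetical helper: extract the region of interest from an image URN string.
# Assumption: the suffix after "@" is "x,y,w,h" given as fractions of the image.
function roi(urn::AbstractString)
    pieces = split(urn, "@")
    # No "@" suffix means the URN refers to the whole image.
    length(pieces) == 2 || return nothing
    x, y, w, h = parse.(Float64, split(pieces[2], ","))
    (x = x, y = y, w = w, h = h)
end

region = roi("urn:cite2:citeusnatarch:declindep.v1:Declaration_Engrav_Pg1of1_AC@0.03594,0.1768,0.9243,0.1439")
whole = roi("urn:cite2:citeusnatarch:declindep.v1:Declaration_Engrav_Pg1of1_AC")
```

Under that assumption, the region found for this passage spans most of the width of the page image (`region.w` is about 0.92) and a band of roughly 14% of its height.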
Display the result on this web page!
using Markdown
ict = "http://www.homermultitext.org/ict2/?"
img_md_original = linkedMarkdownImage(ict, imgs_orig[1], imgservice; w=600, caption="Passage in original")

# Use the image found for the engraving, not the original.
img_md_engraving = linkedMarkdownImage(ict, imgs_engraving[1], imgservice; w=600, caption="Passage in engraving")

"""
$(img_md_original)
$(img_md_engraving)
""" |> Markdown.parse
We can also manually use the Homer Multitext project’s Image Citation Tool to zoom in to a more specific region of interest and cite it with a URN. Here we save two more detailed references, to the regions of the two images illustrating the word “Happiness.”
happiness_engraving = Cite2Urn("urn:cite2:citeusnatarch:declindep.v1:Declaration_Engrav_Pg1of1_AC@0.4672,0.1935,0.07188,0.01804")
happiness_original = Cite2Urn("urn:cite2:citeusnatarch:declindep.v1:Declaration_Pg1of1_AC@0.4765,0.1892,0.07596,0.01857")

detail_md_original = linkedMarkdownImage(ict, happiness_original, imgservice; w=600, caption="Passage in original")

detail_md_engraving = linkedMarkdownImage(ict, happiness_engraving, imgservice; w=600, caption="Passage in engraving")

"""
$(detail_md_original)
$(detail_md_engraving)
""" |> Markdown.parse
Application: organize differences in the texts as a sequence alignment
using VectorAlignments
tkns1 = split("Now is the time")
tkns2 = split("Now is a time")
featurematrix(tkns1, tkns2)
5×2 Matrix{Any}:
"Now" "Now"
"is" "is"
"the" nothing
nothing "a"
"time" "time"
Application: basic vocabulary frequency
It is straightforward to find the vocabulary of a citable corpus with a known orthography.
tkns = tokenize(original_dipl, ortho)
lex = filter(tkn -> tokencategory(tkn) isa LexicalToken, tkns)
vocab = map(tkn -> tokentext(tkn), lex)
1328-element Vector{SubString{String}}:
"In"
"Congress"
"July"
"The"
"unanimous"
"Declaration"
"of"
"the"
"thirteen"
"united"
⋮
"other"
"our"
"Lives"
"our"
"Fortunes"
"and"
"our"
"sacred"
"Honor"
We can use the StatsBase package to count frequencies of items in a list, and can convert that dictionary to a sortable ordered dictionary with the OrderedCollections package.
using StatsBase, OrderedCollections
counts = countmap(vocab) |> OrderedDict
sort!(counts; byvalue = true, rev = true)
OrderedDict{SubString{String}, Int64} with 576 entries:
"of" => 78
"the" => 76
"to" => 64
"and" => 55
"our" => 25
"for" => 20
"their" => 20
"has" => 20
"He" => 18
"in" => 18
"a" => 15
"them" => 15
"these" => 13
"by" => 13
"us" => 11
"have" => 11
"that" => 10
"all" => 10
"which" => 10
⋮ => ⋮
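The counts above are case-sensitive: the vocabulary list includes both "The" and "the" as separate items. As a sketch of the same counting idea using only Base Julia, the following tallies a handful of words taken from the start of the vocabulary list after folding them to lower case. The variable names are illustrative, and this is not how StatsBase implements `countmap`.

```julia
# A few words from the beginning of the vocabulary list shown earlier.
words = ["The", "unanimous", "Declaration", "of", "the", "thirteen", "united"]

# Fold case so that "The" and "the" are counted as the same word.
folded = lowercase.(words)

# Tally occurrences in a plain dictionary.
freq = Dict{String,Int}()
for w in folded
    freq[w] = get(freq, w, 0) + 1
end

# Rank word => count pairs from most to least frequent.
ranked = sort(collect(freq); by = last, rev = true)
```

Whether to fold case depends on the question being asked: for a diplomatic edition, capitalization may itself be evidence worth counting separately.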