The task: We want to create a type for working with a collection of the citable books we developed on the previous page.
We should be able to filter the collection by applying URN logic to the identifiers for our books. We should be able to write our collection to plain-text format and re-instantiate it from the plain-text representation. And we should be able to apply any Julia functions for working with iterable content to our book list.
The implementation:
define a new type for a collection of citable books, the ReadingList type
identify it as a citable collection (the CitableCollectionTrait)
implement filtering the collection using URN logic (the UrnComparisonTrait)
implement round-trip serialization (the CexTrait)
make the collection available to all Julia functions working with iterable content (Iterators)
Defining the ReadingList
Our model for a reading list is simple: it’s just a Vector of citable publications. We’ll annotate our vector as containing subtypes of the abstract CitablePublication we previously defined, even though in this example we’ll only use our one concrete implementation, the CitableBook. As with our other custom types, we’ll override Base.show.
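The definition described above might look like the following minimal sketch. The CitablePublication and CitableBook definitions here are simplified stand-ins for the types from the previous page (the URN is a plain String, which is an assumption made only so the snippet runs on its own); the variable name rl matches the examples on this page.

```julia
# Simplified stand-ins for the types from the previous page (assumption:
# the real CitableBook stores an Isbn10Urn, not a String).
abstract type CitablePublication end

struct CitableBook <: CitablePublication
    urn::String
    title::String
    authors::String
end

# Display a book as: authors, *title* (urn)
Base.show(io::IO, book::CitableBook) =
    print(io, book.authors, ", *", book.title, "* (", book.urn, ")")

# The collection: a wrapper around a Vector of any CitablePublication subtype.
struct ReadingList
    publications::Vector{<:CitablePublication}
end

# One-line summary for the collection itself.
Base.show(io::IO, readinglist::ReadingList) =
    print(io, "ReadingList with ", length(readinglist.publications), " items")

books = [
    CitableBook("urn:isbn10:022661283X", "Distant Horizons: Digital Evidence and Literary Change", "Ted Underwood"),
    CitableBook("urn:isbn10:022656875X", "Enumerations: Data and Literary Study", "Andrew Piper"),
    CitableBook("urn:isbn10:1108922036", "Can We Be Wrong? The Problem of Textual Evidence in a Time of Data", "Andrew Piper"),
    CitableBook("urn:isbn10:3030234133", "Quantitative Intertextuality: Analyzing the Markers of Information Reuse", "Christopher W. Forstall and Walter J. Scheirer"),
]
rl = ReadingList(books)
```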
The publications field is just a normal Julia Vector.
rl.publications[4]
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
What will make it different from other Vectors is that it will support a series of CITE traits.
Implementing the CitableCollectionTrait
We first want to identify our new type as fulfilling the requirements of a citable collection with the CitableCollectionTrait. We’ll repeat the pattern:
define a singleton type for the trait value.
override the function identifying the trait value for our new type. Here the function is named citablecollectiontrait, and we’ll define it to return the concrete value CitableReadingList for the type ReadingList.
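The two steps can be sketched as follows. So that the snippet runs without installing anything, it declares local stand-ins for CitableCollectionTrait, citablecollectiontrait and citablecollection; in a real project you would import these from CitableBase and add only the method for your own type. NotCitableCollection is used here as an assumed name for the fallback trait value, and the ReadingList definition is a simplified copy.

```julia
# Local stand-ins for what CitableBase provides (assumptions for illustration).
abstract type CitableCollectionTrait end
struct NotCitableCollection <: CitableCollectionTrait end   # assumed fallback value
citablecollectiontrait(::Type) = NotCitableCollection()     # default: not a citable collection
citablecollection(x) = !(citablecollectiontrait(typeof(x)) isa NotCitableCollection)

# Simplified copy of the collection type.
struct ReadingList
    publications::Vector
end

# Step 1: a singleton type for the trait value.
struct CitableReadingList <: CitableCollectionTrait end

# Step 2: dispatch on the type to return that trait value.
citablecollectiontrait(::Type{ReadingList}) = CitableReadingList()
```

With these definitions, citablecollection returns true for any ReadingList and false for types that have no method of their own, such as an ordinary Vector.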
The promise we now need to fulfill is that our collection will implement three further traits for URN comparison, serialization and iteration.
Implementing the UrnComparisonTrait
We have previously implemented the UrnComparisonTrait for an identifier type (the Isbn10Urn) and for a citable object type (the CitableBook). In both of those cases, we compared two objects of the same type, and returned a boolean result of comparing them on URN logic.
For our citable collection, we will implement the same suite of functions, but with a different signature and result type. This time, our first parameter will be a URN which we will use to filter the collection given in the second parameter. The result will be a (possibly empty) list of content in our citable collection – in this example, a list of CitableBooks.
We mark our ReadingList type as urn-comparable exactly as we did for Isbn10Urns and CitableBooks.
urncomparisontrait (generic function with 4 methods)
urncomparable(rl)
true
Implementing the required functions urnequals, urncontains and urnsimilar
To implement the required functions, we’ll just lean on the work we’ve already done: we’ll use the boolean version of those functions to filter our collections.
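A minimal sketch of that filtering approach, under some loud assumptions: URNs are plain strings rather than Isbn10Urns, and the boolean comparisons below (including the helpers areacode and english, which are hypothetical names) only approximate the ISBN logic from the previous page. The point is the shape of the collection methods: URN first, collection second, a filtered list as the result.

```julia
# Simplified stand-ins (assumption: URNs are plain strings here).
struct CitableBook
    urn::String
    title::String
end
struct ReadingList
    publications::Vector{CitableBook}
end

# Boolean comparisons on two identifiers, standing in for the Isbn10Urn methods.
areacode(urn::String) = split(urn, ":")[3][1]          # first character of the ISBN
english(urn::String) = areacode(urn) in ('0', '1')     # English-language zone
urnequals(u1::String, u2::String) = u1 == u2
urncontains(u1::String, u2::String) = areacode(u1) == areacode(u2)
urnsimilar(u1::String, u2::String) = urncontains(u1, u2) || (english(u1) && english(u2))

# Collection versions: reuse the boolean functions as filter predicates.
urnequals(urn::String, rl::ReadingList) = filter(bk -> urnequals(urn, bk.urn), rl.publications)
urncontains(urn::String, rl::ReadingList) = filter(bk -> urncontains(urn, bk.urn), rl.publications)
urnsimilar(urn::String, rl::ReadingList) = filter(bk -> urnsimilar(urn, bk.urn), rl.publications)

rl = ReadingList([
    CitableBook("urn:isbn10:022661283X", "Distant Horizons: Digital Evidence and Literary Change"),
    CitableBook("urn:isbn10:022656875X", "Enumerations: Data and Literary Study"),
    CitableBook("urn:isbn10:1108922036", "Can We Be Wrong? The Problem of Textual Evidence in a Time of Data"),
    CitableBook("urn:isbn10:3030234133", "Quantitative Intertextuality: Analyzing the Markers of Information Reuse"),
])
distanthorizons = "urn:isbn10:022661283X"
matches = urnequals(distanthorizons, rl)
```

Because each collection method simply delegates to its boolean counterpart, any change to the comparison logic for identifiers is picked up by the collection automatically.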
If your collection does not allow duplicate identifiers, urnequals should return a list of 0 or 1 item.
urnequals(distanthorizons, rl)
1-element Vector{CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Three of the books in our list are published in the English-language zone, and therefore will satisfy urnsimilar when compared to Distant Horizons.
urnsimilar(distanthorizons, rl)
3-element Vector{CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
But only two are published in the same ISBN area code as Distant Horizons:
urncontains(distanthorizons, rl)
2-element Vector{CitableBook}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Implementing the CexTrait
As we did with citable objects, we want to ensure that we can round-trip an entire collection to and from delimited-text format. We’ll make our new ReadingList type implement CexTrait in the same way as CitableBook.
Implementing the required functions cex and fromcex
We will serialize our collection with a header line identifying it as a citecollection block, followed by one line for each book in our list. We can format the books’ data by mapping each book to an invocation of the cex function that we previously wrote for CitableBooks.
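As a rough sketch of that serialization, assuming simplified stand-in types with String fields and a stand-in cex method for single books (the real one was written on the previous page):

```julia
# Simplified stand-ins for the types from the previous page.
struct CitableBook
    urn::String
    title::String
    authors::String
end
struct ReadingList
    publications::Vector{CitableBook}
end

# Stand-in for the single-book serializer: urn, title, authors in one delimited line.
cex(book::CitableBook; delimiter = "|") =
    join([book.urn, book.title, book.authors], delimiter)

# Collection serializer: header line, then one delimited line per book.
function cex(reading::ReadingList; delimiter = "|")
    header = "#!citecollection\n"
    lines = map(book -> cex(book; delimiter = delimiter), reading.publications)
    header * join(lines, "\n")
end

rl = ReadingList([
    CitableBook("urn:isbn10:022661283X", "Distant Horizons: Digital Evidence and Literary Change", "Ted Underwood"),
    CitableBook("urn:isbn10:022656875X", "Enumerations: Data and Literary Study", "Andrew Piper"),
    CitableBook("urn:isbn10:1108922036", "Can We Be Wrong? The Problem of Textual Evidence in a Time of Data", "Andrew Piper"),
    CitableBook("urn:isbn10:3030234133", "Quantitative Intertextuality: Analyzing the Markers of Information Reuse", "Christopher W. Forstall and Walter J. Scheirer"),
])
cexoutput = cex(rl)
print(cexoutput)
```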
#!citecollection
urn:isbn10:022661283X|Distant Horizons: Digital Evidence and Literary Change|Ted Underwood
urn:isbn10:022656875X|Enumerations: Data and Literary Study|Andrew Piper
urn:isbn10:1108922036|Can We Be Wrong? The Problem of Textual Evidence in a Time of Data|Andrew Piper
urn:isbn10:3030234133|Quantitative Intertextuality: Analyzing the Markers of Information Reuse|Christopher W. Forstall and Walter J. Scheirer
Recall from our experience implementing CEX serialization for CitableBooks that we will need to expose three mandatory parameters for fromcex: the trait value, the CEX data and the Julia type we want to instantiate.
To keep this example brief and avoid introducing other packages, our implementation of fromcex naively assumes cexsrc will contain a single CEX block introduced by the #!citecollection heading. This would break on real-world CEX data sources: in a real application, we would instead use the CiteEXchange package to parse and extract appropriate blocks. See the documentation of CiteEXchange, or look at how a package like CitableCorpus uses CiteEXchange in its implementation of fromcex for different data types.
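A naive parser along those lines could be sketched as below. Everything outside the parsing loop is a stand-in: ReadingListCex is declared locally instead of subtyping CitableBase's CexTrait, bookfromcex is a hypothetical helper replacing the single-book fromcex from the previous page, and the two-argument fromcex at the end only imitates the dispatch that CitableBase performs for you.

```julia
# Simplified stand-ins for the types from the previous page.
struct CitableBook
    urn::String
    title::String
    authors::String
end
struct ReadingList
    publications::Vector{CitableBook}
end
struct ReadingListCex end   # stand-in trait value (would subtype CexTrait)

# Hypothetical helper: parse one delimited line into a CitableBook.
function bookfromcex(line::AbstractString; delimiter = "|")
    fields = split(line, delimiter)
    CitableBook(fields[1], fields[2], fields[3])
end

# Three mandatory parameters: trait value, CEX data, and the type to create.
# Naively assumes a single block introduced by "#!citecollection".
function fromcex(trait::ReadingListCex, cexsrc::AbstractString, T; delimiter = "|")
    books = CitableBook[]
    inblock = false
    for ln in split(cexsrc, "\n")
        if ln == "#!citecollection"
            inblock = true
        elseif inblock && !isempty(ln)
            push!(books, bookfromcex(ln; delimiter = delimiter))
        end
    end
    T(books)
end

# Imitation of the trait lookup that CitableBase does on your behalf.
fromcex(cexsrc::AbstractString, T) = fromcex(ReadingListCex(), cexsrc, T)

src = """#!citecollection
urn:isbn10:022661283X|Distant Horizons: Digital Evidence and Literary Change|Ted Underwood
urn:isbn10:022656875X|Enumerations: Data and Literary Study|Andrew Piper"""
restored = fromcex(src, ReadingList)
```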
Once again, we can now invoke fromcex with just the parameters for the CEX data and desired Julia type to create, and CitableBase will find our implementation.
fromcex(cexoutput, ReadingList)
ReadingList with 4 items
Free bonus!
CitableBase optionally allows you to include a third parameter to the fromcex function naming the type of reader to apply to the first string parameter. Valid values are StringReader, FileReader or UrlReader. The previous example relied on the default value of StringReader. The following examples use the file RL/test/data/dataset.cex in this repository; its contents are the output of cex(rl) above.
Implementing required and optional functions from Base.Iterators
The Iterators module in Julia Base was one of the first traits or interfaces in Julia. It allows you to apply the same functions to many types of iterable collections. We need to import the Base.iterate function, and implement two versions of it for our new type: one with a single parameter for the collection, and one with a second parameter maintaining some kind of state information. Both of them have the same return type: either nothing, or a Tuple pairing one item in the collection with state information.
Since our reading list is keeping books in a Vector internally, we can use the state parameter to pass along an index into the Vector. In the single-parameter version of iterate, we’ll return the first item in the list, and set the “state” value to 2. In the two-parameter version, we’ll return the item indexed by the state count, and bump the count up one.
import Base: iterate

function iterate(rlist::ReadingList)
    isempty(rlist.publications) ? nothing : (rlist.publications[1], 2)
end

function iterate(rlist::ReadingList, state)
    state > length(rlist.publications) ? nothing : (rlist.publications[state], state + 1)
end
iterate (generic function with 292 methods)
It is also useful (and trivial) to implement the optional methods for the length and base type of the collection.
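A self-contained sketch of the optional methods, followed by a for loop over the collection. The type definitions are simplified stand-ins (String URNs), the iterate methods repeat the ones described above so the snippet runs on its own, and eltype is defined on the type, which is one common way to write it rather than necessarily the form used in this package.

```julia
# Simplified stand-ins for the types from the previous page.
abstract type CitablePublication end
struct CitableBook <: CitablePublication
    urn::String
    title::String
    authors::String
end
Base.show(io::IO, book::CitableBook) =
    print(io, book.authors, ", *", book.title, "* (", book.urn, ")")

struct ReadingList
    publications::Vector{<:CitablePublication}
end

# Required methods: the state is an index into the internal Vector.
Base.iterate(rlist::ReadingList) =
    isempty(rlist.publications) ? nothing : (rlist.publications[1], 2)
Base.iterate(rlist::ReadingList, state) =
    state > length(rlist.publications) ? nothing : (rlist.publications[state], state + 1)

# Optional methods: number of items, and the element type of the collection.
Base.length(rlist::ReadingList) = length(rlist.publications)
Base.eltype(::Type{ReadingList}) = CitablePublication

rl = ReadingList([
    CitableBook("urn:isbn10:022661283X", "Distant Horizons: Digital Evidence and Literary Change", "Ted Underwood"),
    CitableBook("urn:isbn10:022656875X", "Enumerations: Data and Literary Study", "Andrew Piper"),
    CitableBook("urn:isbn10:1108922036", "Can We Be Wrong? The Problem of Textual Evidence in a Time of Data", "Andrew Piper"),
    CitableBook("urn:isbn10:3030234133", "Quantitative Intertextuality: Analyzing the Markers of Information Reuse", "Christopher W. Forstall and Walter J. Scheirer"),
])

# A for loop works once iterate is defined.
for item in rl
    println(item)
end
```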
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
checking for presence of an item
distantbook in rl
true
collect contents without having to know anything about the internal structure of the type
collect(rl)
4-element Vector{CitablePublication}:
Ted Underwood, *Distant Horizons: Digital Evidence and Literary Change* (urn:isbn10:022661283X)
Andrew Piper, *Enumerations: Data and Literary Study* (urn:isbn10:022656875X)
Andrew Piper, *Can We Be Wrong? The Problem of Textual Evidence in a Time of Data* (urn:isbn10:1108922036)
Christopher W. Forstall and Walter J. Scheirer, *Quantitative Intertextuality: Analyzing the Markers of Information Reuse* (urn:isbn10:3030234133)
Recap: citable collections
On this page, we wrapped a citable collection type, the ReadingList, around a Vector of CitableBooks. We made the type identifiable as a citable collection. We implemented filtering of the collection on URN logic with the UrnComparisonTrait, and serialization with the CexTrait. You can test for these traits with boolean functions.
citablecollection(rl)
true
urncomparable(rl)
true
cexserializable(rl)
true
In addition, we made the ReadingList implement Julia’s Iterators behavior.