Canonically citable texts

Published

January 6, 2025

The four main texts

We have drawn on existing openly licensed versions of the Hebrew Bible, the Latin Vulgate, the Greek Septuagint and the Aramaic Targum Onkelos to create a single delimited-text file (CEX format) containing all four texts, with each passage identified by a CTS URN. The source files and the resulting composite are available from the GitHub repository neelsmith/compnov. You can download the file with the complete set of citable texts directly from https://github.com/neelsmith/compnov/raw/refs/heads/main/corpus/compnov.cex
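Since every passage is identified by a CTS URN, it may help to see how such an identifier breaks down. The following is a minimal sketch in plain Julia, assuming the usual colon-delimited CTS layout (urn:cts:namespace:work hierarchy:passage). The function name splitctsurn and the example URN are hypothetical and chosen only for illustration; the actual URN values in compnov.cex may differ.

```julia
# Split a CTS URN string into its colon-delimited components.
# Assumed layout: urn:cts:<namespace>:<work hierarchy>:<passage>
# `splitctsurn` is a hypothetical helper, not part of any package.
function splitctsurn(urnstring::AbstractString)
    parts = split(urnstring, ":")
    if length(parts) < 4 || parts[1] != "urn" || parts[2] != "cts"
        throw(ArgumentError("not a CTS URN: $urnstring"))
    end
    return (
        namespace = String(parts[3]),
        # The work hierarchy is period-delimited (text group, work, version, ...).
        work = String.(split(parts[4], ".")),
        # The passage component is optional.
        passage = length(parts) >= 5 ? String(parts[5]) : "",
    )
end

# Hypothetical URN for illustration only.
example = splitctsurn("urn:cts:compnov:bible.genesis.vulgate:1.1")
println(example.namespace)
println(join(example.work, " / "))
println(example.passage)
```

For real work, the CitableText package provides a dedicated CtsUrn type; the sketch above only illustrates the shape of the identifier.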

Example usage

In Julia, you can use the CitableCorpus package to load a corpus from a CEX file like this:

using CitableBase, CitableCorpus
corpusurl = "https://github.com/neelsmith/compnov/raw/refs/heads/main/corpus/compnov.cex"
corpus = fromcex(corpusurl, CitableTextCorpus, UrlReader)

Corpus with 85514 citable passages in 139 documents.
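To see roughly what a loader has to read, here is a minimal sketch in plain Julia that collects URN and text pairs from a CEX string without any packages. It assumes that citable passages sit in blocks introduced by a `#!ctsdata` line, that each data line has the form URN|text, and that lines beginning with `//` are comments. The function name ctsdatapairs and the sample data are hypothetical, and this is not a description of how CitableCorpus itself is implemented.

```julia
# Collect (urn, text) pairs from every `#!ctsdata` block in a CEX string.
# `ctsdatapairs` is a hypothetical helper written for this illustration.
function ctsdatapairs(cex::AbstractString; delimiter = "|")
    found = Tuple{String, String}[]
    inblock = false
    for line in split(cex, "\n")
        stripped = strip(line)
        if startswith(stripped, "#!")
            # A new block label starts; track whether it is a ctsdata block.
            inblock = stripped == "#!ctsdata"
        elseif inblock && !isempty(stripped) && !startswith(stripped, "//")
            # Split on the first delimiter only, so the text may contain more.
            cols = split(stripped, delimiter; limit = 2)
            if length(cols) == 2
                push!(found, (String(cols[1]), String(cols[2])))
            end
        end
    end
    return found
end

# Hypothetical sample data; URNs and wording are for illustration only.
sample = """
#!ctsdata
// Two illustrative passages
urn:cts:compnov:bible.genesis.vulgate:1.1|In principio creavit Deus caelum et terram.
urn:cts:compnov:bible.genesis.vulgate:1.2|Terra autem erat inanis et vacua
"""

found = ctsdatapairs(sample)
for (urn, text) in found
    println(urn, " => ", text)
end
```

The same function could be applied to the downloaded file's contents, for example after reading it into a string with the Downloads standard library; for anything beyond inspection, the CitableCorpus loader shown above is the better choice.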

Sources for the texts

Tanach, Vulgate, Septuagint

The texts of the Hebrew Bible, Jerome's Latin translation (the Vulgate) and the Greek translation (the Septuagint) are taken from https://ebible.org:

כתבי הקודש: the Hebrew Masoretic text (public domain)
The Clementine Vulgate of 1598 (public domain)
The Greek Septuagint with Apocrypha, compiled by Sir Lancelot C. L. Brenton (public domain)

Plain-text versions of the ebible.org files are replicated in the src directory of the GitHub repository. The Julia script scripts/citify.jl converts the ebible.org source texts (in the src directory) to a single citable corpus.

The Targum Onkelos

The digital text of the Targum Onkelos is taken from the “merged” texts in this directory of the Sefaria project’s data: https://github.com/Sefaria/Sefaria-Export/tree/master/txt/Tanakh/Targum/Onkelos/Torah

The GitHub repository’s src/onkelos directory has one “merged” file for each book of the Torah. The file src/onkelos/source-credits.txt has the Sefaria project’s detailed metadata about each file.

The Julia script src/onkelos/parse-onkelos.jl creates a single file in CEX format from the five source files.

Latin glosses on the Greek and Aramaic translations

Repository for glosses

We are working on a digital encoding of the textual and non-textual content of the Complutensian Bible in this GitHub repository. The repository includes TEI-conformant XML editions of the Latin glosses on the Septuagint and on the Targum Onkelos.

Status

Initial draft XML editions of the Latin glosses on the Septuagint and the Targum Onkelos are complete through Genesis chapter 40.
The contents of the Latin glosses have not yet been fully machine-validated.