OpenScripturesHebrew.jl

Published

November 9, 2024

OpenScripturesHebrew.jl is a Julia package for working with the morphological data of the Open Scriptures Hebrew Bible project (OSHB). It parses the OSHB data into easily manipulated Julia named tuples and provides a Julia data model for the morphological analyses.

Quickest start

Download data

Get OSH annotations on all words in the Hebrew Bible:

using OpenScripturesHebrew
words = tanakh()
length(words)
432307

That’s a lot of words!

What’s in a word

Each word is represented by a named tuple.

words[1]
(urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HR", mtoken = "בְּ", otoken = "בְּ/רֵאשִׁ֖ית", otoken_num = 1, lemma = "b")
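
Because each record is an ordinary named tuple, its fields can be inspected with plain Julia. The sketch below copies the record shown above, so it runs without the package. It assumes, judging only from this output, that the otoken field holds the full written token with "/" separating its morphological segments; that reading is an inference from the data, not a documented guarantee.

```julia
# A record copied from the output above; no package is needed to run this.
w = (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HR", mtoken = "בְּ",
     otoken = "בְּ/רֵאשִׁ֖ית", otoken_num = 1, lemma = "b")

# Field names available on the record.
println(keys(w))

# Assumption: "/" separates the morphological segments inside otoken.
segments = split(w.otoken, "/")
println(length(segments))   # number of segments in this written token
```

Under that assumption, otoken_num identifies the written token within the verse, and several records can share the same otoken_num when a written token is analyzed into more than one segment.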

In the tuple, the morphologically analyzed token is named mtoken:

words[1].mtoken
"בְּ"

Morphological analysis

The morphological analysis for the token is represented by a code string, but you can use the parseword function to create an OSHMorphologicalForm.

words[1].code
"HR"
parseword(words[1])
preposition
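
To give a feel for how such a code string is laid out, here is a minimal sketch of reading its first two characters. It is not the package's parser: describe_code and both lookup tables are hypothetical names introduced for illustration. The part-of-speech letters are limited to the five that appear in the examples on this page, and the language prefixes (H for Hebrew, A for Aramaic) are an assumption taken from the OSHB coding scheme.

```julia
# Hypothetical lookup tables, not part of OpenScripturesHebrew.jl.
# Assumption: the first character of a code names the language.
languages = Dict('H' => "Hebrew", 'A' => "Aramaic")

# Only the part-of-speech letters that occur in the examples on this page.
parts_of_speech = Dict(
    'R' => "preposition",
    'N' => "noun",
    'V' => "verb",
    'T' => "particle",
    'C' => "conjunction",
)

# Hypothetical helper: interpret only the first two characters of a code.
function describe_code(code::AbstractString)
    chars = collect(code)
    length(chars) >= 2 || throw(ArgumentError("code too short: $code"))
    (language = get(languages, chars[1], "unknown"),
     part_of_speech = get(parts_of_speech, chars[2], "unknown"))
end

println(describe_code("HR"))
println(describe_code("HVqp3ms"))
```

The remaining characters of a code (for example the qp3ms in HVqp3ms) carry the detailed analysis, which is what parseword interprets for you.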

Select by canonical reference

Select the words of a passage, identified by the OSH book name and passage reference:

versewords = oshverse("Gen", "1.1", words)
11-element Vector{Any}:
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HR", mtoken = "בְּ", otoken = "בְּ/רֵאשִׁ֖ית", otoken_num = 1, lemma = "b")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HNcfsa", mtoken = "רֵאשִׁ֖ית", otoken = "בְּ/רֵאשִׁ֖ית", otoken_num = 1, lemma = "7225")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HVqp3ms", mtoken = "בָּרָ֣א", otoken = "בָּרָ֣א", otoken_num = 2, lemma = "1254 a")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HNcmpa", mtoken = "אֱלֹהִ֑ים", otoken = "אֱלֹהִ֑ים", otoken_num = 3, lemma = "430")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HTo", mtoken = "אֵ֥ת", otoken = "אֵ֥ת", otoken_num = 4, lemma = "particle")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HTd", mtoken = "הַ", otoken = "הַ/שָּׁמַ֖יִם", otoken_num = 5, lemma = "particle")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HNcmpa", mtoken = "שָּׁמַ֖יִם", otoken = "הַ/שָּׁמַ֖יִם", otoken_num = 5, lemma = "8064")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HC", mtoken = "וְ", otoken = "וְ/אֵ֥ת", otoken_num = 6, lemma = "c")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HTo", mtoken = "אֵ֥ת", otoken = "וְ/אֵ֥ת", otoken_num = 6, lemma = "particle")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HTd", mtoken = "הָ", otoken = "הָ/אָֽרֶץ", otoken_num = 7, lemma = "particle")
 (urn = "urn:cts:compnov:bible.genesis.osh:1.1", code = "HNcbsa", mtoken = "אָֽרֶץ", otoken = "הָ/אָֽרֶץ", otoken_num = 7, lemma = "776")
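
Every record carries the same kind of URN, so the passage can also be read back out of a record with plain string operations. The following is a sketch only: urnparts is a hypothetical helper, not a function of the package, and it assumes the usual CTS URN layout of five colon-separated fields ending in the passage reference.

```julia
# URN copied from the records above; no package is needed to run this.
urn = "urn:cts:compnov:bible.genesis.osh:1.1"

# Hypothetical helper (not part of OpenScripturesHebrew.jl): split a CTS URN
# into its namespace, work, and passage components.
function urnparts(u::AbstractString)
    fields = split(u, ":")
    length(fields) == 5 || throw(ArgumentError("expected 5 colon-separated fields in $u"))
    (namespace = fields[3], work = fields[4], passage = fields[5])
end

parts = urnparts(urn)

# The work component is dot-separated; its second field names the book here.
book = split(parts.work, ".")[2]

# The passage component is chapter.verse.
chapter, verse = split(parts.passage, ".")

println(book, " ", chapter, ":", verse)
```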

Apply regular Julia mapping to a selection of words

Extract the token and the morphological analysis from the tuple:

map(versewords) do w
    string(w.mtoken, " = ", parseword(w))
end
11-element Vector{String}:
 "בְּ = preposition"
 "רֵאשִׁ֖ית = noun (common noun): feminine singular absolute state"
 "בָּרָ֣א = finite verb: qal perfect third singular masculine"
 "אֱלֹהִ֑ים = noun (common noun): masculine plural absolute state"
 "אֵ֥ת = particle"
 "הַ = particle"
 "שָּׁמַ֖יִם = noun (common noun): masculine plural absolute state"
 "וְ = conjunction"
 "אֵ֥ת = particle"
 "הָ = particle"
 "אָֽרֶץ = noun (common noun): common gender singular absolute state"

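The same approach works for simple tallies. As a sketch, the block below counts the records of Genesis 1.1 by the second character of their code, which in the examples on this page marks the part of speech. The codes are copied from the verse output above so that the block runs without the package; with the package loaded, the same vector could be built with map over versewords.

```julia
# Codes copied from the eleven records of Genesis 1.1 shown above.
codes = ["HR", "HNcfsa", "HVqp3ms", "HNcmpa", "HTo", "HTd",
         "HNcmpa", "HC", "HTo", "HTd", "HNcbsa"]

# Count records by the second character of the code.
counts = Dict{Char,Int}()
for c in codes
    pos = c[2]                       # codes are ASCII, so byte indexing is safe
    counts[pos] = get(counts, pos, 0) + 1
end

# Print the tally in a stable order.
for pos in sort(collect(keys(counts)))
    println(pos, " => ", counts[pos])
end
```

In this verse the nouns (N) and particles (T) each account for four of the eleven records.
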
More information

Status

The package is not yet registered with the central Julia registry. You can add it directly to your Julia environment from its GitHub repository, https://github.com/neelsmith/OpenScripturesHebrew.jl. For example:

using Pkg
Pkg.add(url = "https://github.com/neelsmith/OpenScripturesHebrew.jl")