using Unicode
desert = "מִדְבָּר"
graphemev = graphemes(desert) |> collect
4-element Vector{SubString{String}}:
"מִ"
"דְ"
"בָּ"
"ר"
Built with package version 0.4.0
July 9, 2024
The BiblicalHebrew package includes a few shortcuts for viewing different representations of codepoints. Although generic, they can be useful when working with fully pointed Hebrew text.
The Unicode package’s graphemes function iterates through the graphemes in a string. If we apply it to the string “מִדְבָּר” and collect the results into a vector, the vector will have four elements, one for each consonant together with any associated points such as vowel points or dagesh.
4-element Vector{SubString{String}}:
"מִ"
"דְ"
"בָּ"
"ר"
We can use collect on string values to gather a vector of Char values.
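As a minimal sketch of this step, the snippet below collects the first grapheme of the example word into a vector of characters. It assumes charv (the name used later on this page) is built from that first grapheme, and it writes the string with Unicode escapes so that the codepoint order is unambiguous.

```julia
using Unicode

# First grapheme of the example word: mem (U+05DE) followed by hiriq (U+05B4).
# Assumption: this matches the first element of the grapheme vector above.
grapheme = "\u05de\u05b4"

# collect on a string yields a Vector{Char}, one entry per codepoint.
charv = collect(grapheme)

# One grapheme, but two characters.
ngraphemes = length(graphemes(grapheme))
nchars = length(charv)
```

The consonant and its vowel point form a single grapheme, while collect separates them into two Char values.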
BiblicalHebrew.codepoint gives the integer value of a character.
2-element Vector{UInt32}:
0x000005de
0x000005b4
These are unsigned integers. If you want signed integers, you can construct them directly from these values:
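A minimal sketch of that conversion, using Base Julia's own codepoint function, which also returns UInt32 values. Whether BiblicalHebrew.codepoint behaves identically is an assumption here, suggested by the output shown above but not verified.

```julia
# The two characters of the first grapheme: mem and hiriq.
charv = ['\u05de', '\u05b4']

# Base.codepoint returns a UInt32 for each character.
unsigned_values = codepoint.(charv)

# Construct signed integers directly from the unsigned ones.
signed_values = Int64.(unsigned_values)   # [1502, 1460]

# Either integer type converts back to the same characters.
same_chars = Char.(signed_values) == Char.(unsigned_values) == charv
```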
So these are tautologies:
(BiblicalHebrew.codepoint.(charv) .|> Char) ==
(BiblicalHebrew.codepoint.(charv) .|> Int64 .|> Char) == charv
true
Julia’s string function displays integers in decimal notation.
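For instance, with Base Julia only (a small sketch, not taken from the package), the codepoint of mem prints in hexadecimal at the REPL, while string renders it in decimal:

```julia
# Codepoint of mem (U+05DE) as a UInt32, via Base.codepoint.
n = codepoint('\u05de')

# repr shows unsigned integers in hexadecimal, as the REPL does.
as_repr = repr(n)       # "0x000005de"

# string shows the same value in decimal.
as_decimal = string(n)  # "1502"
```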
BiblicalHebrew.hex gets a hex string for codepoints, integers or characters:
2-element Vector{String}:
"5de"
"5b4"
And BiblicalHebrew.int converts a hex string into an integer value.
2-element Vector{UInt32}:
0x000005de
0x000005b4
So this is also a tautology:
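A comparable round trip can be sketched with Base Julia alone. This is an analogue that uses string with a base keyword and parse. It does not call the package's hex and int functions, whose exact signatures are not shown here.

```julia
# Codepoints of mem and hiriq.
codepoints = UInt32[0x05de, 0x05b4]

# Integer -> hex string, using Base.string with base = 16.
hexstrings = string.(codepoints, base = 16)          # ["5de", "5b4"]

# Hex string -> integer, using Base.parse with base = 16.
roundtrip = parse.(UInt32, hexstrings, base = 16)

# Converting to hex and back returns the original values.
is_tautology = roundtrip == codepoints
```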
BiblicalHebrew.codept_split works like Julia’s split function, but by default keeps the separating character value.
You can override that behavior:
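The idea of a split that keeps or drops the separator can be illustrated with a small stand-alone sketch. The helper split_on and its keep keyword are hypothetical names invented for this illustration. They are not the signature of BiblicalHebrew.codept_split, and attaching the separator to the end of the preceding chunk is an assumption.

```julia
# Hypothetical helper, NOT part of BiblicalHebrew: split a string on a
# separator character, optionally keeping the separator at the end of
# the chunk that precedes it.
function split_on(s::AbstractString, sep::Char; keep::Bool = true)
    chunks = String[]
    buf = IOBuffer()
    for c in s
        if c == sep
            keep && print(buf, c)          # retain the separator if asked
            push!(chunks, String(take!(buf)))
        else
            print(buf, c)
        end
    end
    rest = String(take!(buf))
    isempty(rest) || push!(chunks, rest)   # trailing chunk, if any
    return chunks
end

maqaf = '\u05be'
# Two words joined by maqaf: ayin, lamed, maqaf, pe, nun, yod.
phrase = "\u05e2\u05dc\u05be\u05e4\u05e0\u05d9"

kept = split_on(phrase, maqaf)                   # first chunk ends with maqaf
dropped = split_on(phrase, maqaf; keep = false)  # separator discarded
```

With keep = false the result matches what Base's split(phrase, maqaf) returns for this input.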