using Unicode
desert = "מִדְבָּר"
graphemev = graphemes(desert) |> collect
4-element Vector{SubString{String}}:
"מִ"
"דְ"
"בָּ"
"ר"
Built with package version 0.4.0
July 9, 2024
The BiblicalHebrew package includes a few shortcuts for viewing different representations of codepoints that, although generic, can be useful when working with fully pointed Hebrew text.
The Unicode package’s graphemes function iterates through the graphemes in a string. If we apply it to the string “מִדְבָּר” and collect the results into a vector, the vector will have four elements, one for each consonant together with any associated points such as vowel points or dagesh.
4-element Vector{SubString{String}}:
"מִ"
"דְ"
"בָּ"
"ר"
We can use collect on string values to gather a vector of Chars.
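As a minimal sketch, assuming that `charv` (the name used in the examples that follow) holds the characters of the first grapheme, the vector could be built like this. The choice of the first grapheme is an assumption inferred from the two codepoints shown further down.

```julia
using Unicode

desert = "מִדְבָּר"
graphemev = collect(graphemes(desert))

# Assumption: charv is the Vector{Char} of the first grapheme,
# i.e. the consonant mem followed by the vowel point hiriq.
charv = collect(graphemev[1])
```

Each grapheme is a `SubString{String}`, so `collect` on it yields one `Char` per codepoint rather than one per visible letter.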
BiblicalHebrew.codepoint gives the integer value of a character.
2-element Vector{UInt32}:
0x000005de
0x000005b4
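A comparable result can be reproduced with Julia's built-in `codepoint`, used here only as a stand-in. That `BiblicalHebrew.codepoint` behaves the same way on a `Char` is an assumption based on the output above.

```julia
# mem (U+05DE) followed by hiriq (U+05B4), written with escapes
charv = ['\u05de', '\u05b4']

# Base.codepoint (a stand-in, not the package function) returns a UInt32 per Char.
codepointv = codepoint.(charv)
```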
These are unsigned integers. If you want signed integers, you can construct them directly from these values:
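A minimal sketch of that conversion, with Base's `codepoint` standing in for the package function (an assumption) and hypothetical variable names:

```julia
charv = ['\u05de', '\u05b4']

# Unsigned codepoint values, as produced by Base.codepoint (stand-in).
uintv = codepoint.(charv)

# Construct signed 64-bit integers directly from the unsigned values.
intv = Int64.(uintv)
```

The numeric values are unchanged by the conversion; only the element type differs.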
So these are tautologies:
(BiblicalHebrew.codepoint.(charv) .|> Char) ==
(BiblicalHebrew.codepoint.(charv) .|> Int64 .|> Char) == charv
true
Julia’s string function displays integers in decimal notation.
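For instance, the codepoint of mem prints as a decimal string when passed to `string`, even though the REPL displays the same unsigned value in hexadecimal. The variable names here are illustrative only.

```julia
n = codepoint('\u05de')   # UInt32 value for mem

decimal = string(n)       # decimal digits
shown = repr(n)           # the hexadecimal form the REPL displays for unsigned integers
```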
BiblicalHebrew.hex gets a hex string for codepoints, integers or characters:
2-element Vector{String}:
"5de"
"5b4"
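The same strings can be produced in plain Julia with the `base` keyword of `string`. This is a stand-in for illustration and is not claimed to be how `BiblicalHebrew.hex` is implemented.

```julia
charv = ['\u05de', '\u05b4']

# string(n, base = 16) gives lowercase hex digits without a 0x prefix.
hexv = string.(codepoint.(charv), base = 16)
```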
And BiblicalHebrew.int converts a hex string into an integer value.
2-element Vector{UInt32}:
0x000005de
0x000005b4
So this is also a tautology:
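As a minimal sketch of such a round trip, Base's `parse` and `string` stand in below for `BiblicalHebrew.int` and `BiblicalHebrew.hex`. Treating them as equivalent is an assumption based on the outputs shown above.

```julia
hexv = ["5de", "5b4"]

# Hex string -> unsigned integer (stand-in for BiblicalHebrew.int).
intv = parse.(UInt32, hexv, base = 16)

# Integer -> hex string (stand-in for BiblicalHebrew.hex); this should undo the step above.
roundtrip = string.(intv, base = 16) == hexv
```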
BiblicalHebrew.codept_split works like Julia’s split function, but by default it keeps the separating character value.
You can override that behavior:
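The idea of a split that keeps its separator can be illustrated with a small hypothetical helper. `split_keeping` and its `keep` keyword are invented for this sketch and are not part of BiblicalHebrew; where the package attaches the separator, and how its override is spelled, are not stated here, so the choices below are assumptions.

```julia
# Hypothetical helper, not the package's implementation: split `s` at `sep`,
# optionally keeping the separator attached to the end of the preceding piece.
function split_keeping(s::AbstractString, sep::Char; keep::Bool = true)
    pieces = String[]
    buf = IOBuffer()
    for c in s
        if c == sep
            keep && print(buf, c)          # retain the separator if requested
            push!(pieces, String(take!(buf)))
        else
            print(buf, c)
        end
    end
    # Flush whatever follows the last separator.
    rest = String(take!(buf))
    isempty(rest) || push!(pieces, rest)
    return pieces
end

kept = split_keeping("a-b-c", '-')
dropped = split_keeping("a-b-c", '-'; keep = false)
```

With `keep = false` the result matches what Julia's own `split` returns for this input.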