Build your own orthographic system
Warning
Just a dump of notes here: contents TBA.
Orthography.jl defines an abstract type, OrthographicSystem.
Concrete subtypes must implement three functions:
codepoints(orthography): returns a complete list of codepoints allowed in this orthography.
tokentypes(orthography): returns a complete list of the types of tokens that can be recognized in this orthography. These are subtypes of TokenCategory.
tokenize(s, orthography): use orthography to tokenize s. This function is the basis for the higher-order functions presented in the following pages.
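As a rough illustration of the three required functions, here is a minimal, self-contained sketch of a toy orthography. To keep it runnable without the package, it declares local stand-ins for OrthographicSystem and TokenCategory; with the real package you would presumably load Orthography and extend its own functions instead. The names ToyAscii, WordToken, and PunctuationToken are hypothetical, and the return types chosen here (a String of codepoints, a vector of string/category pairs) are assumptions for the example, not the package's documented return types.

```julia
# Local stand-ins for the abstract types named in the text, so that this
# sketch runs on its own. With the real package these would come from
# Orthography.jl (assumption: you would extend its functions, not redefine them).
abstract type OrthographicSystem end
abstract type TokenCategory end

# Hypothetical token categories for the toy orthography.
struct WordToken <: TokenCategory end
struct PunctuationToken <: TokenCategory end

# Hypothetical concrete orthography: lowercase ASCII letters plus a little punctuation.
struct ToyAscii <: OrthographicSystem
    letters::String
    punctuation::String
end
ToyAscii() = ToyAscii("abcdefghijklmnopqrstuvwxyz", ".,;?")

# 1. All codepoints allowed in this orthography (returned here as a String).
codepoints(o::ToyAscii) = o.letters * o.punctuation

# 2. All token types this orthography can recognize.
tokentypes(o::ToyAscii) = [WordToken, PunctuationToken]

# 3. Split `s` into (text, category) pairs.
#    Whitespace separates tokens; each punctuation character is its own token.
#    For brevity, characters outside `codepoints(o)` are not validated and are
#    simply treated as part of a word.
function tokenize(s::AbstractString, o::ToyAscii)
    tokens = Tuple{String,TokenCategory}[]
    word = Char[]

    # Emit the pending word, if any, as a WordToken.
    function flush!()
        if !isempty(word)
            push!(tokens, (String(word), WordToken()))
            empty!(word)
        end
    end

    for c in s
        if c in o.punctuation
            flush!()
            push!(tokens, (string(c), PunctuationToken()))
        elseif isspace(c)
            flush!()
        else
            push!(word, c)
        end
    end
    flush!()
    return tokens
end

ortho = ToyAscii()
tokens = tokenize("hello, world.", ortho)
```

Here tokens holds four pairs: two words and two punctuation marks, each tagged with an instance of one of the categories listed by tokentypes(ortho). Dispatching every function on the concrete type is what lets generic code written against OrthographicSystem work with any orthography that supplies these three methods.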