String, Char & Unicode

cilia::String with basic/standard Unicode support.
- Based on UTF-8, as that IMHO is (among all the Unicode formats)
  - the most widespread nowadays,
  - the most compatible (as it is ASCII based),
  - the most efficient, at least for “western” use (and you are free to use UTF16String or UTF32String otherwise).
- Iteration over a String or StringView by:
  - Graphemes/Grapheme Clusters
    - represented by StringView.
    - This is the default form of iteration over a String or StringView
    - A single grapheme will often consist of multiple code units
      and may even consist of multiple code points (then it is called a grapheme cluster).
    - for grapheme in "abc 🥸👮🏻"
      - “a”, “b”, “c”, “ “, “🥸”, “👮🏻”
      - “\x61”, “\x62”, “\x63”, “\x20”, “\xf0\x9f\xa5\xb8”, “\xf0\x9f\x91\xae\xf0\x9f\x8f\xbb”
    - Somewhat slow, as it has to find grapheme cluster boundaries.
    - It is recommended to mostly use the standard functions for string manipulation anyway. But if you need to iterate manually over a Unicode string, then grapheme-cluster-based iteration is the safe/right way.
    - Additional/alternative names?
      - for graphemeCluster in text.asGraphemeClusters()?
  - Code Points
    - represented by UInt32,
      - independent of the encoding (i.e. the same for UTF-8, UTF-16, and UTF-32 strings).
        
        This is called “auto-decoding” in D.
      - for codePoint in "abc 🥸👮🏻".asCodePoints()
      - 0x00000061, 0x00000062, 0x00000063, 0x00000020, 0x0001F978, 0x0001F46E, 0x0001F3FB
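    - Note: Not even with UTF-32 does every grapheme cluster fit into a single code point.
      Examples that do not:
      - Emoji ZWJ sequences (emoji joined by the Zero Width Joiner, U+200D),
        
        emoji with modifier characters like skin tone modifiers or variation selectors,
      - characters with diacritics (äöü…, depending on the chosen normalization form: one precomposed code point in NFC, base letter plus combining mark in NFD),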
      - surely some more …
    - Faster than iteration over grapheme clusters, but still somewhat slow, as it has to decode the variable-length sequences of UTF-8/UTF-16 strings.
    - Fast with UTF-32 strings, but UTF-32 strings in general are often slower than UTF-8, simply due to their size (cache, memory bandwidth).
  - Code Units
    - represented by
      - Char for String
        
        here Char==Char8 (which behaves like UInt8) and String==UTF8String
      - Char16 for UTF16String
      - Char32 for UTF32String
    - for aChar8 in "abc 🥸👮🏻".asArray()
      - 0x61, 0x62, 0x63, 0x20, 0xf0, 0x9f, 0xa5, 0xb8, 0xf0, 0x9f, 0x91, 0xae, 0xf0, 0x9f, 0x8f, 0xbb
      - same for
        
        for aChar8 in u8"abc 🥸👮🏻".asArray()
        
        for aChar8 in UTF8String("abc 🥸👮🏻").asArray()
    - for aChar16 in u"abc 🥸👮🏻".asArray()
      - 0x0061, 0x0062, 0x0063, 0x0020, 0xD83E, 0xDD78, 0xD83D, 0xDC6E, 0xD83C, 0xDFFB
      - same for aChar16 in UTF16String("abc 🥸👮🏻").asArray()
    - for aChar32 in U"abc 🥸👮🏻".asArray()
      - 0x00000061, 0x00000062, 0x00000063, 0x00000020, 0x0001F978, 0x0001F46E, 0x0001F3FB
      - same for aChar32 in UTF32String("abc 🥸👮🏻").asArray()
- string.toUpper(), string.toLower()
  - toUpper(String) -> String, toLower(String) -> String
- stringArray.sort()
  - sort(Container<String>) -> Container<String>
- compare(stringA, stringB) -> Int
ByteString to represent strings with a single-byte encoding (i.e. the classical strings consisting of one-byte characters),
- like
  - ASCII
  - Latin-1
  - ANSI (mostly identical to Latin-1)
  - almost all other single-byte “code pages”
- The encoding is not defined.
  - The user has to take care of this,
  - or a subclass with known encoding has to be used (ASCIIString, Latin1String).
- ASCIIString, a string containing only ASCII characters.
  - Iteration over an ASCIIString or ASCIIStringView by Char==Char8
    - for aChar in a"abc"
      - 0x61, 0x62, 0x63
      - ‘a’, ‘b’, ‘c’
      - Compilation error if the string literal contains non-ASCII characters.
      - same for aChar in ASCIIString("abc"),
        
        but an exception is thrown if the string contains non-ASCII characters.
  - Implicitly convertible to String==UTF8String.
    - Very fast conversion, as all ASCII characters have the same binary representation in UTF-8, so the bytes can be copied unchanged.
- Latin1String, a string containing only Latin-1 (ISO 8859-1) characters.
  - Iteration over a Latin1String or Latin1StringView by Char==Char8
    - for aChar in l"äßç"
      - 0xe4, 0xdf, 0xe7
      - ‘ä’, ‘ß’, ‘ç’
      - Compilation error if the string literal contains non-Latin-1 characters.
      - same for aChar in Latin1String("äßç"),
        
        but an exception is thrown if the string contains non-Latin-1 characters.
  - Explicitly convertible to String==UTF8String.
    - Slower than the ASCIIString-to-String conversion, because every character from 0x80 upward has to be translated into two UTF-8 code units.
Char8, Char16, Char32
- are considered distinct types for overload resolution,
- but otherwise behave like UInt8, UInt16, UInt32.
ICU (“International Components for Unicode”) for advanced Unicode support.
- The ICU libraries provide support for:
  - The latest version of the Unicode standard
  - Character set conversions with support for over 220 code pages
  - Locale data for more than 300 locales
  - Language sensitive text collation (sorting) and searching based on the Unicode Collation Algorithm (aligned with ISO 14651)
  - Regular expression matching and Unicode sets
  - Transformations for normalization, upper/lowercase, script transliterations (50+ pairs)
  - Resource bundles for storing and accessing localized information
  - Date/Number/Message formatting and parsing of culture specific input/output formats
  - Calendar specific date and time manipulation
  - Text boundary analysis for finding character, word, and sentence boundaries
- import icu adds extension methods for cilia::String
  - Allows iteration over:
    - words (important/difficult for Chinese, Japanese, Thai or Khmer, needs a dictionary of words)
      - for word in text.asWords()
    - lines
      - for line in text.asLines()
    - sentences (needs a list of abbreviations, like “e.g.”, “i.e.”, “o.ä.”)
      - for sentence in text.asSentences()
  - Depending on locale
    - string.toUpper(locale), string.toLower(locale)
      - toUpper(String, locale) -> String, toLower(String, locale) -> String
    - stringArray.sort(locale)
      - sort(Container<String>, locale) -> Container<String>
    - compare(stringA, stringB, locale) -> Int