String, Char & Unicode

String

cilia::String (also known as UTF8String) provides basic, standard Unicode support.
Based on UTF-8, as that IMHO is (among all the Unicode formats)

Iteration over a String or StringView by:
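As one illustration of what such an iteration involves, here is a minimal C++ sketch that walks a UTF-8 byte sequence code point by code point. The function name `decodeCodePoints` is hypothetical and not part of cilia. The sketch replaces malformed or truncated sequences with U+FFFD, and it deliberately does not reject overlong encodings or surrogate values, which a production decoder should do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a cilia API): decodes UTF-8 bytes into code points.
// Malformed or truncated sequences produce U+FFFD and skip a single byte.
std::vector<char32_t> decodeCodePoints(const std::string& utf8) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t length = 0;  // stays 0 for an invalid lead byte
        char32_t cp = 0;
        if (lead < 0x80)                { length = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { length = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; }

        bool ok = length != 0 && i + length <= utf8.size();
        for (std::size_t k = 1; ok && k < length; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                ok = false;                    // not a continuation byte
            } else {
                cp = (cp << 6) | (c & 0x3F);   // append 6 payload bits
            }
        }
        if (ok) {
            out.push_back(cp);
            i += length;
        } else {
            out.push_back(0xFFFD);             // replacement character
            i += 1;
        }
    }
    return out;
}

// Usage: "a" followed by U+00E4 (two bytes) yields two code points.
void demoDecode() {
    const std::vector<char32_t> cps = decodeCodePoints("a\xC3\xA4");
    assert(cps.size() == 2);
    assert(cps[1] == 0xE4);
}
```

Iterating by code point is only one granularity. Iterating by user-perceived character (grapheme cluster) needs Unicode segmentation data, which is beyond a sketch like this.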

Convert Upper/Lower Case

Sorting

ByteString

ByteString represents strings with a single-byte encoding (i.e. the classical strings consisting of one-byte characters), like:

The encoding is not defined; the user has to take care of it.
Alternatively, a subclass with a known encoding has to be used:
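To illustrate why a known encoding matters, the following C++ sketch converts single-byte text to UTF-8 under the assumption that the bytes are ISO 8859-1 (Latin-1). Latin-1 is chosen here only as an example of a known encoding, and `latin1ToUtf8` is a hypothetical helper, not a cilia API. Without knowing the encoding, no such conversion can be done reliably.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: interprets every input byte as a Latin-1 character
// (code points U+0000..U+00FF) and re-encodes it as UTF-8.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (const char ch : latin1) {
        const unsigned char byte = static_cast<unsigned char>(ch);
        if (byte < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}

// Usage: Latin-1 byte 0xE4 (a with diaeresis) becomes the UTF-8 bytes C3 A4.
void demoLatin1() {
    assert(latin1ToUtf8("\xE4") == "\xC3\xA4");
}
```

The conversion is this simple only because Latin-1 maps each byte directly to the Unicode code point of the same value. Other single-byte encodings would need a lookup table.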

Char

Char8, Char16, Char32 are treated as distinct types for overload resolution, but otherwise behave like UInt8, UInt16, UInt32.
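C++ has a comparable arrangement, which may help to picture the intent: `char16_t` and `char32_t` are distinct types from `std::uint16_t` and `std::uint32_t` for overloading, yet have the same size and unsigned representation. The sketch below uses these C++ types as stand-ins; the `describe` overloads are hypothetical, and the mapping to Cilia's Char16/Char32 is an assumption. (C++20 adds `char8_t` in the same spirit; it is left out so that the example compiles as C++11.)

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

// The character types are not aliases of the integer types...
static_assert(!std::is_same<char16_t, std::uint16_t>::value, "distinct types");
static_assert(!std::is_same<char32_t, std::uint32_t>::value, "distinct types");
// ...but they occupy the same amount of storage.
static_assert(sizeof(char16_t) == sizeof(std::uint16_t), "same size");
static_assert(sizeof(char32_t) == sizeof(std::uint32_t), "same size");

// Hypothetical overload set: the compiler selects by static type.
std::string describe(char16_t)      { return "Char16"; }
std::string describe(std::uint16_t) { return "UInt16"; }
std::string describe(char32_t)      { return "Char32"; }
std::string describe(std::uint32_t) { return "UInt32"; }

// Usage: the same numeric value resolves to different overloads.
void demoOverloads() {
    assert(describe(U'A') == "Char32");
    assert(describe(std::uint32_t{65}) == "UInt32");
    // Arithmetic works as on the underlying unsigned integer.
    const char32_t next = U'A' + 1;
    assert(next == U'B');
}
```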

ICU

The International Components for Unicode (“ICU”) library is used for advanced Unicode support.

The ICU libraries provide support for:

  • The latest version of the Unicode standard
  • Character set conversions with support for over 220 codepages
  • Locale data for more than 300 locales
  • Language-sensitive text collation (sorting) and searching based on the Unicode Collation Algorithm (synchronized with ISO/IEC 14651)
  • Regular expression matching and Unicode sets
  • Transformations for normalization, upper/lowercase, script transliterations (50+ pairs)
  • Resource bundles for storing and accessing localized information
  • Date/Number/Message formatting and parsing of culture-specific input/output formats
  • Calendar specific date and time manipulation
  • Text boundary analysis for finding character, word, and sentence boundaries

import icu adds extension methods to cilia::String.