| xapian-core 1.4.23 |
Functions associated with handling Unicode characters.
| Enumerations | |
| enum | category { UNASSIGNED , UPPERCASE_LETTER , LOWERCASE_LETTER , TITLECASE_LETTER , MODIFIER_LETTER , OTHER_LETTER , NON_SPACING_MARK , ENCLOSING_MARK , COMBINING_SPACING_MARK , DECIMAL_DIGIT_NUMBER , LETTER_NUMBER , OTHER_NUMBER , SPACE_SEPARATOR , LINE_SEPARATOR , PARAGRAPH_SEPARATOR , CONTROL , FORMAT , PRIVATE_USE , SURROGATE , CONNECTOR_PUNCTUATION , DASH_PUNCTUATION , OPEN_PUNCTUATION , CLOSE_PUNCTUATION , INITIAL_QUOTE_PUNCTUATION , FINAL_QUOTE_PUNCTUATION , OTHER_PUNCTUATION , MATH_SYMBOL , CURRENCY_SYMBOL , MODIFIER_SYMBOL , OTHER_SYMBOL } | 
| Each Unicode character is in exactly one of these categories. | |
| Functions | |
| unsigned | nonascii_to_utf8 (unsigned ch, char *buf) | 
| Convert a single non-ASCII Unicode character to UTF-8. | |
| unsigned | to_utf8 (unsigned ch, char *buf) | 
| Convert a single Unicode character to UTF-8. | |
| void | append_utf8 (std::string &s, unsigned ch) | 
| Append the UTF-8 representation of a single Unicode character to a std::string. | |
| category | get_category (unsigned ch) | 
| Return the category which a given Unicode character falls into. | |
| bool | is_wordchar (unsigned ch) | 
| Test if a given Unicode character is a "word character". | |
| bool | is_whitespace (unsigned ch) | 
| Test if a given Unicode character is a whitespace character. | |
| bool | is_currency (unsigned ch) | 
| Test if a given Unicode character is a currency symbol. | |
| unsigned | tolower (unsigned ch) | 
| Convert a Unicode character to lowercase. | |
| unsigned | toupper (unsigned ch) | 
| Convert a Unicode character to uppercase. | |
| std::string | tolower (const std::string &term) | 
| Convert a UTF-8 std::string to lowercase. | |
| std::string | toupper (const std::string &term) | 
| Convert a UTF-8 std::string to uppercase. | |
Each Unicode character is in exactly one of these categories.
The Unicode standard calls this the "General Category" and uses a "Major, minor" convention to derive a two-letter code; for example, UPPERCASE_LETTER is "Letter, uppercase", abbreviated Lu.
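To make the "Major, minor" convention concrete, here is a minimal sketch that maps each category to its standard Unicode General Category abbreviation. The `sketch` namespace, the local `category` enum, and `two_letter_code()` are hypothetical names introduced for illustration and are not part of Xapian; the enum only mirrors the enumerator order listed above, and its numeric values are an assumption.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Local mirror of the enumerator order shown in this page (illustrative
// only; the real numeric values in Xapian are an assumption here).
enum category {
    UNASSIGNED, UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
    MODIFIER_LETTER, OTHER_LETTER, NON_SPACING_MARK, ENCLOSING_MARK,
    COMBINING_SPACING_MARK, DECIMAL_DIGIT_NUMBER, LETTER_NUMBER,
    OTHER_NUMBER, SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR,
    CONTROL, FORMAT, PRIVATE_USE, SURROGATE, CONNECTOR_PUNCTUATION,
    DASH_PUNCTUATION, OPEN_PUNCTUATION, CLOSE_PUNCTUATION,
    INITIAL_QUOTE_PUNCTUATION, FINAL_QUOTE_PUNCTUATION, OTHER_PUNCTUATION,
    MATH_SYMBOL, CURRENCY_SYMBOL, MODIFIER_SYMBOL, OTHER_SYMBOL
};

// Hypothetical helper: return the two-letter General Category code.
// First letter is the major class, second letter the minor class.
inline std::string two_letter_code(category c) {
    static const char* const codes[] = {
        "Cn", "Lu", "Ll", "Lt", "Lm", "Lo",        // unassigned, letters
        "Mn", "Me", "Mc",                          // marks
        "Nd", "Nl", "No",                          // numbers
        "Zs", "Zl", "Zp",                          // separators
        "Cc", "Cf", "Co", "Cs",                    // other (control etc.)
        "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",  // punctuation
        "Sm", "Sc", "Sk", "So"                     // symbols
    };
    const unsigned n = sizeof(codes) / sizeof(codes[0]);
    assert(static_cast<unsigned>(c) < n);
    return codes[c];
}

}  // namespace sketch
```

With this table, `sketch::two_letter_code(sketch::CURRENCY_SYMBOL)` yields `Sc` (Symbol, currency).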
| unsigned Xapian::Unicode::nonascii_to_utf8 (unsigned ch, char *buf) |
Convert a single non-ASCII Unicode character to UTF-8.
This is intended mainly as a helper method for to_utf8().
| ch | The character (which must be non-ASCII, i.e. >= 128) to write to buf. | 
| buf | The buffer to write the character to - it must have space for (at least) 4 bytes. | 
Referenced by to_utf8().
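The 4-byte buffer requirement follows from how UTF-8 lays out code points: a non-ASCII character takes two, three, or four bytes depending on its value. The following is a minimal sketch of that standard encoding, not Xapian's implementation; `encode_nonascii_sketch()` is a hypothetical name, and it assumes the input is a valid code point in the range 0x80 to 0x10FFFF.

```cpp
#include <cassert>

// Hypothetical stand-in illustrating standard UTF-8 encoding of a
// non-ASCII code point. Writes 2 to 4 bytes to buf and returns the count.
// Assumption: ch is in [0x80, 0x10FFFF]; buf has room for 4 bytes.
inline unsigned encode_nonascii_sketch(unsigned ch, char* buf) {
    assert(ch >= 0x80 && ch <= 0x10FFFF);
    if (ch < 0x800) {
        // 110xxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xC0 | (ch >> 6));
        buf[1] = static_cast<char>(0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch < 0x10000) {
        // 1110xxxx 10xxxxxx 10xxxxxx
        buf[0] = static_cast<char>(0xE0 | (ch >> 12));
        buf[1] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
        buf[2] = static_cast<char>(0x80 | (ch & 0x3F));
        return 3;
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    buf[0] = static_cast<char>(0xF0 | (ch >> 18));
    buf[1] = static_cast<char>(0x80 | ((ch >> 12) & 0x3F));
    buf[2] = static_cast<char>(0x80 | ((ch >> 6) & 0x3F));
    buf[3] = static_cast<char>(0x80 | (ch & 0x3F));
    return 4;
}
```

For instance, U+20AC (the euro sign) falls in the three-byte range and encodes as the bytes E2 82 AC.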
| unsigned Xapian::Unicode::to_utf8 (unsigned ch, char *buf) | inline |
Convert a single Unicode character to UTF-8.
| ch | The character to write to buf. | 
| buf | The buffer to write the character to - it must have space for (at least) 4 bytes. | 
References nonascii_to_utf8().
Referenced by append_utf8().
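The reference chain above (append_utf8() calls to_utf8(), which calls nonascii_to_utf8()) suggests a common layering: an inline fast path for ASCII, an out-of-line helper for everything else, and a string wrapper that owns the 4-byte scratch buffer. The sketch below illustrates that layering under stated assumptions; all three `*_sketch` functions are hypothetical names and are not Xapian's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for code points >= 0x80 (standard UTF-8 layout).
inline unsigned multibyte_sketch(unsigned ch, char* buf) {
    assert(ch >= 0x80 && ch <= 0x10FFFF);
    static const unsigned char lead[] = {0, 0, 0xC0, 0xE0, 0xF0};
    const unsigned len = ch < 0x800 ? 2 : (ch < 0x10000 ? 3 : 4);
    // Fill continuation bytes from the end, six bits at a time.
    for (unsigned i = len - 1; i > 0; --i) {
        buf[i] = static_cast<char>(0x80 | (ch & 0x3F));
        ch >>= 6;
    }
    buf[0] = static_cast<char>(lead[len] | ch);
    return len;
}

// Hypothetical dispatcher: ASCII is written directly as one byte,
// anything else is delegated to the multi-byte helper.
inline unsigned to_utf8_sketch(unsigned ch, char* buf) {
    if (ch < 0x80) {
        *buf = static_cast<char>(ch);
        return 1;
    }
    return multibyte_sketch(ch, buf);
}

// Hypothetical string wrapper: a 4-byte stack buffer is always enough.
inline void append_utf8_sketch(std::string& s, unsigned ch) {
    char buf[4];
    s.append(buf, to_utf8_sketch(ch, buf));
}
```

Keeping the ASCII branch inline lets the common case avoid a function call, while the caller never needs more than a fixed 4-byte buffer.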