Unicode
Functions for Unicode strings.
List of functions
- Unicode::IsUtf(String) -> Bool
  Checks whether a string is a valid UTF-8 sequence. For example, the string "\xF0" is not a valid UTF-8 sequence, while the string "\xF0\x9F\x90\xB1" is the valid UTF-8 encoding of the cat emoji.
- Unicode::GetLength(Utf8{Flags:AutoMap}) -> Uint64
  Returns the length of the UTF-8 string in characters (Unicode code points). Surrogate pairs are counted as one character.
  SELECT Unicode::GetLength("жніўня"); -- 6
- Unicode::Find(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
- Unicode::RFind(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
  Searches for the first (for RFind, the last) occurrence of `subString` in `string`, starting from position `pos`. Returns the position of the first character of the found substring, or NULL if nothing is found.
  SELECT Unicode::Find("aaa", "bb"); -- Null
- Unicode::Substring(string:Utf8{Flags:AutoMap}, from:Uint64?, len:Uint64?) -> Utf8
  Returns the substring of `string` that starts at character `from` and is `len` characters long. If `len` is omitted, the substring extends to the end of the source string. If `from` is greater than the length of the source string, an empty string "" is returned.
  SELECT Unicode::Substring("0123456789abcdefghij", 10); -- "abcdefghij"
- The Unicode::Normalize... functions convert the passed UTF-8 string to a normalization form:
  - Unicode::Normalize(Utf8{Flags:AutoMap}) -> Utf8 -- NFC
  - Unicode::NormalizeNFD(Utf8{Flags:AutoMap}) -> Utf8
  - Unicode::NormalizeNFC(Utf8{Flags:AutoMap}) -> Utf8
  - Unicode::NormalizeNFKD(Utf8{Flags:AutoMap}) -> Utf8
  - Unicode::NormalizeNFKC(Utf8{Flags:AutoMap}) -> Utf8
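The effect of the normalization forms listed above can be illustrated outside YQL. The following is a minimal Python sketch based on the standard `unicodedata` module. It is not the implementation behind the `Unicode::Normalize...` functions, and the helper name `normalize` is hypothetical. It also shows why normalization matters for a code-point count such as the one `Unicode::GetLength` is described as returning.

```python
import unicodedata

# "é" written as one precomposed code point (U+00E9).
composed = "\u00e9"
# The same visible character written as "e" plus a combining acute accent (U+0301).
decomposed = "e\u0301"


def normalize(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: convert text to the NFC, NFD, NFKC, or NFKD form."""
    return unicodedata.normalize(form, text)


nfc = normalize(decomposed, "NFC")   # composes the pair into one code point
nfd = normalize(composed, "NFD")     # splits the precomposed code point in two
# The compatibility forms additionally replace characters such as the "fi" ligature (U+FB01).
nfkc = normalize("\ufb01", "NFKC")

print(nfc == composed, len(nfc))     # True 1
print(nfd == decomposed, len(nfd))   # True 2
print(nfkc)                          # fi
```

Two strings that look identical can therefore differ in length and fail an equality check until both are brought to the same form.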
- Unicode::Translit(string:Utf8{Flags:AutoMap}, [lang:String?]) -> Utf8
  Transliterates into Latin letters the words of the passed string that consist entirely of characters from the alphabet of the language given in the second argument. If no language is specified, the words are transliterated from Russian. Available languages: "kaz", "rus", "tur", and "ukr".
  SELECT Unicode::Translit("Тот уголок земли, где я провел"); -- "Tot ugolok zemli, gde ya provel"
- Unicode::LevensteinDistance(stringA:Utf8{Flags:AutoMap}, stringB:Utf8{Flags:AutoMap}) -> Uint64
  Calculates the Levenshtein distance between the passed strings.
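For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Below is a minimal Python sketch of the textbook dynamic-programming algorithm. The function name `levenshtein_distance` is hypothetical, and the sketch assumes the distance is counted in code points rather than bytes, which the text above does not state explicitly.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance between a and b, counted in code points (an assumption)."""
    # previous[j] is the distance between the already processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
print(levenshtein_distance("кот", "кит"))         # 1
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.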
- Unicode::Fold(Utf8{Flags:AutoMap}, [ Language:String?, DoLowerCase:Bool?, DoRenyxa:Bool?, DoSimpleCyr:Bool?, FillOffset:Bool? ]) -> Utf8
  Performs case folding on the passed string.
  Parameters:
  - `Language` is set according to the same rules as in Unicode::Translit().
  - `DoLowerCase` converts the string to lowercase letters, `true` by default.
  - `DoRenyxa` converts diacritical characters to similar Latin characters, `true` by default.
  - `DoSimpleCyr` converts diacritical Cyrillic characters to similar Latin characters, `true` by default.
  - `FillOffset` is not used.
  SELECT Unicode::Fold("Kongreßstraße", false AS DoSimpleCyr, false AS DoRenyxa); -- "kongressstrasse"
  SELECT Unicode::Fold("ҫурт"); -- "сурт"
  SELECT Unicode::Fold("Eylül", "Turkish" AS Language); -- "eylul"
- Unicode::ReplaceAll(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
- Unicode::ReplaceFirst(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
- Unicode::ReplaceLast(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
  Replaces all occurrences, the first occurrence, or the last occurrence of the `find` string in `input` with `replacement`.
- Unicode::RemoveAll(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
- Unicode::RemoveFirst(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
- Unicode::RemoveLast(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
  Removes from `input` all occurrences, the first occurrence, or the last occurrence of the characters in the `symbols` set. The second argument is interpreted as an unordered set of characters to remove.
  SELECT Unicode::ReplaceLast("absence", "enc", ""); -- "abse"
  SELECT Unicode::RemoveAll("abandon", "an"); -- "bdo"
- Unicode::ToCodePointList(Utf8{Flags:AutoMap}) -> List<Uint32>
  Splits the string into a sequence of Unicode code points.
- Unicode::FromCodePointList(List<Uint32>{Flags:AutoMap}) -> Utf8
  Builds a Unicode string from code points.
  SELECT Unicode::ToCodePointList("Щавель"); -- [1065, 1072, 1074, 1077, 1083, 1100]
  SELECT Unicode::FromCodePointList(AsList(99,111,100,101,32,112,111,105,110,116,115,32,99,111,110,118,101,114,116,101,114)); -- "code points converter"
- Unicode::Reverse(Utf8{Flags:AutoMap}) -> Utf8
  Reverses the string.
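The relationship between a string and its code points, as used by the code point functions above, can be sketched in a few lines of Python. The helper names are hypothetical and merely mirror the YQL function names. Treating reversal as a reversal of code points (rather than of bytes or of grapheme clusters) is an assumption that the description above does not confirm.

```python
def to_code_point_list(text: str) -> list[int]:
    """Split a string into the numeric values of its Unicode code points."""
    return [ord(ch) for ch in text]


def from_code_point_list(code_points: list[int]) -> str:
    """Build a string from a list of Unicode code point values."""
    return "".join(chr(cp) for cp in code_points)


def reverse(text: str) -> str:
    """Reverse a string code point by code point (assumed semantics)."""
    return text[::-1]


print(to_code_point_list("Щавель"))               # [1065, 1072, 1074, 1077, 1083, 1100]
print(from_code_point_list([99, 111, 100, 101]))  # code
print(reverse("Щавель"))                          # ьлеваЩ
```

Converting to code points and back is lossless, so `from_code_point_list(to_code_point_list(s))` returns `s` unchanged.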
- Unicode::ToLower(Utf8{Flags:AutoMap}) -> Utf8
- Unicode::ToUpper(Utf8{Flags:AutoMap}) -> Utf8
- Unicode::ToTitle(Utf8{Flags:AutoMap}) -> Utf8
  Converts the characters of the string to lower, upper, or title case, respectively.
- Unicode::SplitToList( string:Utf8?, separator:Utf8, [ DelimeterString:Bool?, SkipEmpty:Bool?, Limit:Uint64? ]) -> List<Utf8>
  Splits the string into substrings by a separator.
  `string`: source string.
  `separator`: separator.
  Parameters:
  - DelimeterString:Bool?: treats the separator as a whole string (`true`, the default) or as an "any of" set of characters (`false`).
  - SkipEmpty:Bool?: whether to skip empty strings in the result, `false` by default.
  - Limit:Uint64?: limits the number of fetched components (unlimited by default); if the limit is exceeded, the raw suffix of the source string is returned in the last item.
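How the three optional parameters interact is easier to see in code. The Python sketch below is a hypothetical model of the described behavior, not the implementation of `Unicode::SplitToList`. The function and argument names are made up for the illustration, and applying `skip_empty` after the limit has been enforced is an assumption, since the text above does not say how the two options combine.

```python
def split_to_list(text, separator, delimiter_string=True, skip_empty=False, limit=None):
    """Hypothetical model of the splitting rules described above."""
    if not separator:
        return [text]  # nothing to split on
    parts = []
    start = 0  # start of the component currently being collected
    i = 0
    # Stop looking for separators once `limit` components have been cut off.
    while i < len(text) and (limit is None or len(parts) < limit):
        if delimiter_string:
            hit = text.startswith(separator, i)  # the whole separator must match
            step = len(separator)
        else:
            hit = text[i] in separator           # any single character of the set
            step = 1
        if hit:
            parts.append(text[start:i])
            i += step
            start = i
        else:
            i += 1
    parts.append(text[start:])  # the raw remainder becomes the last item
    if skip_empty:
        parts = [part for part in parts if part]
    return parts


print(split_to_list("One, two, three, four, five", ", ", limit=2))
# ['One', 'two', 'three, four, five']
print(split_to_list("a,b;c", ",;", delimiter_string=False))  # ['a', 'b', 'c']
print(split_to_list("a,,b", ",", skip_empty=True))           # ['a', 'b']
```

With a limit of 2 the result has three items: two components that were split off plus the unsplit remainder.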
- Unicode::JoinFromList(List<Utf8>{Flags:AutoMap}, separator:Utf8) -> Utf8
  Concatenates a list of strings into a single string using `separator`.
  SELECT Unicode::SplitToList("One, two, three, four, five", ", ", 2 AS Limit); -- ["One", "two", "three, four, five"]
  SELECT Unicode::JoinFromList(["One", "two", "three", "four", "five"], ";"); -- "One;two;three;four;five"
- Unicode::ToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64
  Converts the string to a number. The optional second argument sets the numeral system (base). The default is 0, which means automatic detection by prefix.
  Supported prefixes: 0x (0X) for base-16 and 0 for base-8. Without a prefix, base-10 is used.
  A `-` sign before the number is interpreted as in C unsigned arithmetic. For example, -0x1 -> UI64_MAX.
  If the string contains invalid characters or the number is outside the ui64 range, the function fails with an error.
- Unicode::TryToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64?
  Similar to Unicode::ToUint64(), but returns NULL instead of an error.
  SELECT Unicode::ToUint64("77741"); -- 77741
  SELECT Unicode::ToUint64("-77741"); -- 18446744073709473875
  SELECT Unicode::TryToUint64("asdh831"); -- Null
- Unicode::Strip(string:Utf8{Flags:AutoMap}) -> Utf8
  Removes the leading and trailing characters of the Unicode Space category from the string.
  SELECT Unicode::Strip("\u2002ыкль\u2002"u); -- "ыкль"
- Unicode::IsAscii(string:Utf8{Flags:AutoMap}) -> Bool
  Checks whether the UTF-8 string consists exclusively of ASCII characters.
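The prefix detection and the C-style handling of the minus sign described for `Unicode::ToUint64` and `Unicode::TryToUint64` above can be modelled in a short Python sketch. The helper names are hypothetical, and how the real function treats a prefix when an explicit base is passed is not covered by the description, so the sketch simply does not accept one in that case.

```python
UI64_MODULUS = 2 ** 64  # unsigned 64-bit arithmetic wraps around at this value
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_uint64(text: str, base: int = 0) -> int:
    """Hypothetical model: parse text as an unsigned 64-bit number or raise ValueError."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if base == 0:  # automatic detection by prefix
        if digits[:2] in ("0x", "0X"):
            base, digits = 16, digits[2:]
        elif len(digits) > 1 and digits.startswith("0"):
            base, digits = 8, digits[1:]
        else:
            base = 10
    allowed = DIGITS[:base]
    if not digits or any(ch.lower() not in allowed for ch in digits):
        raise ValueError(f"invalid characters in {text!r}")
    value = int(digits, base)
    if value >= UI64_MODULUS:
        raise ValueError(f"{text!r} does not fit into 64 bits")
    # A leading minus negates the value modulo 2**64, as C unsigned arithmetic does.
    return (-value) % UI64_MODULUS if negative else value


def try_to_uint64(text: str, base: int = 0):
    """Same as to_uint64, but returns None instead of raising."""
    try:
        return to_uint64(text, base)
    except ValueError:
        return None


print(to_uint64("77741"))        # 77741
print(to_uint64("-77741"))       # 18446744073709473875
print(to_uint64("-0x1"))         # 18446744073709551615
print(try_to_uint64("asdh831"))  # None
```

The second result is simply 2^64 - 77741, which is why a negative input yields a very large positive number instead of an error.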
- Unicode::IsSpace(string:Utf8{Flags:AutoMap}) -> Bool
- Unicode::IsUpper(string:Utf8{Flags:AutoMap}) -> Bool
- Unicode::IsLower(string:Utf8{Flags:AutoMap}) -> Bool
- Unicode::IsAlpha(string:Utf8{Flags:AutoMap}) -> Bool
- Unicode::IsAlnum(string:Utf8{Flags:AutoMap}) -> Bool
- Unicode::IsHex(string:Utf8{Flags:AutoMap}) -> Bool
  Checks whether the UTF-8 string satisfies the condition indicated by the function name.
- Unicode::IsUnicodeSet(string:Utf8{Flags:AutoMap}, unicode_set:Utf8) -> Bool
  Checks whether the UTF-8 `string` consists exclusively of the characters listed in `unicode_set`. The characters in `unicode_set` must be enclosed in square brackets.
  SELECT Unicode::IsUnicodeSet("ваоао"u, "[вао]"u); -- true
  SELECT Unicode::IsUnicodeSet("ваоао"u, "[ваб]"u); -- false
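The membership check performed by `Unicode::IsUnicodeSet` can be approximated in Python as follows. This is a deliberately narrow sketch under a stated assumption: it handles only a plain list of literal characters in square brackets, as in the examples above. Any richer set syntax the real function may accept is not modelled. The name `is_unicode_set` is hypothetical.

```python
def is_unicode_set(text: str, unicode_set: str) -> bool:
    """Hypothetical model: True if every character of text is listed in unicode_set."""
    if not (unicode_set.startswith("[") and unicode_set.endswith("]")):
        raise ValueError("unicode_set must be enclosed in square brackets")
    # Assumption: everything between the brackets is a literal character.
    allowed = set(unicode_set[1:-1])
    return all(ch in allowed for ch in text)


print(is_unicode_set("ваоао", "[вао]"))  # True
print(is_unicode_set("ваоао", "[ваб]"))  # False
```

The second call returns False because "о" occurs in the string but not in the set.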