Data types
ClickHouse has a wide variety of data types available. YTsaurus uses its own data type system. Although the ClickHouse data type system largely overlaps that of YTsaurus, some data types from YTsaurus may not exist in ClickHouse, and vice versa. For a complete list of the types supported in the YTsaurus system, see YTsaurus data types.
Primitive data types
The table below shows the correspondence between the primitive (not composite) ClickHouse and YTsaurus types.
YTsaurus type | ClickHouse type | Comment |
---|---|---|
intN, uintN |
intN, uintN |
Integer numeric columns from the ClickHouse point look like (U)Int of the appropriate digit capacity. |
boolean |
YtBoolean |
There are no Booleans in regular ClickHouse, they are represented as single-byte unsigned numbers (UInt8 ) with values 0 and 1. CHYT supports the additional YtBoolean type that is similar to the UInt8 type in ClickHouse computations, but is interpreted as a boolean when written to a YTsaurus table. |
string, utf8 |
String |
Strings are available as string |
double |
Float64 |
|
date |
Date |
|
datetime |
DateTime |
|
any |
String |
Any columns are read by the ClickHouse engine as string columns containing data in YSON. See also: chyt.composite.default_yson_format . |
T with required = %false |
Nullable(T) |
If a column of a T type is not required , it is seen as a Nullable column. |
Composite data types
Useful links:
In YTsaurus, the following composite data types are available: lists, dicts, optional types, structures, and tuples. By default, composite values are seen by the CH engine in the same way as the Any
type, i.e. as a YSON string.
If you configure the chyt.composite.enable_conversion
setting to 1, CHYT starts converting composite types (specified in the schema, that is, having the type_v3
type) into the corresponding ClickHouse composite types.
Query example:
# A query with no composite type conversion enabled. Composite types
# are seen as strings in binary YSON.
yt --proxy <cluster_name> clickhouse execute 'select * from `//home/user/sample_table_composite`' --format Vertical
Row 1:
──────
list: [foo;bar;]
dict: [(binary unreadable characters)]
struct: [(binary unreadable characters)]
Row 2:
──────
list: []
dict: []
struct: [(binary unreadable characters)]
# A query with enabled conversion.
yt --proxy <cluster_name> clickhouse execute 'select * from `//home/user/sample_table_composite`' --format Vertical --setting chyt.composite.enable_conversion=1 --format Vertical
Row 1:
──────
list: ['foo','bar']
dict: [('key1',42),('key2',-17)]
struct: (3.14,'pi')
Row 2:
──────
list: []
dict: []
struct: (0,'')
Unfortunately, the ClickHouse and YTsaurus type systems are quite different as far as composite data types are concerned. Below are the main differences.
Optional types
In the YTsaurus type system, any type T
can be made optional by surrounding it with optional<T>
, for example, optional<optional<int64>>
, optional<list<string>>
, and optional<tuple<int64, string>>
data types are possible.
In ClickHouse, placing values into Nullable
is very limited. In particular, you cannot place into Nullable
:
- The type that is already
Nullable
. Array
.Tuple
.
Thus, none of the three examples of types above have their own representation in the ClickHouse type system.
To process such types in CHYT, the following conversion is used.
If type T
corresponds to type T'
of the ClickHouse type system and type Nullable(T')
does not exist in CH, optional<T>
is mapped to T'
replacing null with the default value for T'
. Examples:
-
optional<optional<int64>>
is mapped toNullable(Int64)
, with two types of different initialnull
merged into one. -
optional<list<string>>
is mapped toArray(String)
, with initialnull
turning into empty array[]
. -
optional<tuple<int64, string>>
is mapped toTuple(Int64, String)
, with initialnull
turning into a tuple of two default values for Int64 and String, i.e. tuple(0, '')
.
Dicts
In the Yandex type system, for any two types K
and V
you can make type dict<K, V>
. ClickHouse has nothing like dicts, so dict<K, V>
is mapped to Array(Tuple(K', V'))
.
See a query example above: the first value in the dict
column contains the {'key1' -> 42, 'key2' -> -17}
dict.
Tuples
Yandex type system tuples are mapped to ClickHouse tuples in a trivial way.
Structures
Structures from the Yandex type system could be mapped to named CH tuples (Named Tuple
) CH. Unfortunately, ClickHouse has problems with supporting named tuples. They are now supported only on the external level of the complex type description (see the relevant issue). As a result, you cannot place Named Tuple
inside, for example, a list. To simplify the current implementation, the decision was made to map the structures to regular tuples.
Thus, you can work with structures as with tuples, i.e. you can get elements from them by index through .1
, .2
, and so on or using the tupleElement
function:
yt --proxy <cluster_name> clickhouse execute 'select toTypeName(struct), struct.1, struct.2 from `//home/user/sample_table_composite`' --format Vertical --setting chyt.composite.enable_conversion=1
Row 1:
──────
toTypeName(struct): Tuple(Float64, String)
tupleElement(struct, 1): 3.14
tupleElement(struct, 2): pi
Row 2:
──────
toTypeName(struct): Tuple(Float64, String)
tupleElement(struct, 1): 0
tupleElement(struct, 2):
Tagged types
Tagged
types from the general Yandex type system are mapped to the corresponding types without a tag. The tag information is lost.
Variants, Named variants, Any
Variants and named variants are not supported. Their display logic is the same as if there is a value of the Any
type somewhere inside a composite type. Any value of the Variant
, Named Variant
, or Any
type simply turns into a YSON string with a value representation.
Tips for working with insufficiently schematized data
Below are a few methods to get around some data type problems.
Many tables contain dates and times in text or numeric representation (for example, in ISO or Unix ts format). To convert them to a time data type, you can use the standard ClickHouse functions for working with date and time, reinterpret, and CAST.
YTsaurus has historically used the Any
data type for working with composite data (arrays, dicts, structures). Despite the appearance of composite type support in the YTsaurus type system, a lot of old data still remains represented in Any
: it will be available as a binary YSON to the ClickHouse engine.
However, Any
representation as a binary YSON means almost nothing for the ClickHouse engine, since YSON support is not built into ClickHouse. ClickHouse interprets such data simply as a binary string with a custom format. Therefore, the ClickHouse and YTsaurus compatibility layer provides a number of auxiliary functions that enable you to address over a YSON structure using the YPath language. And also interpret its nodes as primitive YTsaurus types (Int64
, UInt64
, Double
, or Boolean
), strings (String
), or lists of primitive YTsaurus types (Array(Int64)
, Array(UInt64)
, Array(Double)
, or Array(Boolean)
) with the conversion into the corresponding built-in ClickHouse type. For more information about these functions, see Functions for working with YSON.
Attention
Conversion from YTsaurus representation to ClickHouse representation will inevitably result in longer query execution time.