Data types
This section describes the data types supported by YTsaurus and the way they are described in the schema and represented in the formats.
Overview
YTsaurus supports a number of primitive types:
- string;
- integer;
- boolean;
- float;
- double;
- date;
- datetime;
- timestamp;
- interval.
As well as the following composite (complex) types:
- optional;
- list;
- struct;
- tuple;
- variant;
- dict;
- tagged.
You can use one of the following two methods to specify a type in a table schema:
- Using the type and (optionally) the required keys: historically the first method, but it can only define primitive or optional primitive types.
- Using the type_v3 key.
The type key always expects a string.
The type_v3 key expects either a string for primitive types or a YSON dictionary.
A YSON dict always has the type_name key that stores the type name.
The remaining keys depend on the specific type and are described below.
Describing types in a schema
Primitive types
You can define primitive types in a schema both using the type and the type_v3 keys.
If primitive type T is defined using type, YTsaurus will additionally check the required key in the schema:
- required=%true: the column will have a strictly defined type. Null or empty values are illegal.
- required=%false: the column will be of type optional<T>. Any value of the primitive type and the Null value will be legal.
The table lists the supported types and their representation in the type/type_v3 keys.
| Description | Representation in type | Representation in type_v3 | 
|---|---|---|
| an integer belonging to the range [-2^63, 2^63-1] | int64 | int64 | 
| an integer belonging to the range [-2^31, 2^31-1] | int32 | int32 | 
| an integer belonging to the range [-2^15, 2^15-1] | int16 | int16 | 
| an integer belonging to the range [-2^7, 2^7-1] | int8 | int8 | 
| an integer belonging to the range [0, 2^64-1] | uint64 | uint64 | 
| an integer belonging to the range [0, 2^32-1] | uint32 | uint32 | 
| an integer belonging to the range [0, 2^16-1] | uint16 | uint16 | 
| an integer belonging to the range [0, 2^8-1] | uint8 | uint8 | 
| an 8-byte floating-point number according to IEEE 754 | double | double | 
| a 4-byte floating-point number according to IEEE 754 | float | float | 
| a standard true/false boolean | boolean | bool (different from type) | 
| a random sequence of bytes | string | string | 
| a proper UTF8 sequence | utf8 | utf8 | 
| a string that contains a valid JSON | json | json | 
| UUID, a random 16-byte sequence (stored in binary representation) | uuid | uuid | 
| an integer in the range [-53375809, 53375808 - 1], represents the number of days from the Unix epoch; the representable time range is about 145,000 years into the past and into the future; see the section about temporal types | date32 | date32 | 
| an integer in the range [-53375809 * 86400, 53375808 * 86400 - 1], represents the number of seconds from the Unix epoch; the representable time range is about 145,000 years into the past and into the future; see the section about temporal types | datetime64 | datetime64 | 
| an integer in the range [-53375809 * 86400 * 10^6, 53375808 * 86400 * 10^6 - 1], represents the number of microseconds from the Unix epoch; the representable time range is about 145,000 years into the past and into the future; see the section about temporal types | timestamp64 | timestamp64 | 
| an integer in the range [-9223339708800000000, 9223339708800000000], represents the number of microseconds between two timestamp64 values; see the section about temporal types | interval64 | interval64 | 
| an integer in the range [0, 49673 - 1], represents the number of days from the Unix epoch; representable date range: [1970-01-01, 2105-12-31]; see the section about temporal types | date | date | 
| an integer in the range [0, 49673 * 86400 - 1], represents the number of seconds from the Unix epoch; representable time range: [1970-01-01T00:00:00Z, 2105-12-31T23:59:59Z]; see the section about temporal types | datetime | datetime | 
| an integer in the range [0, 49673 * 86400 * 10^6 - 1], represents the number of microseconds from the Unix epoch; representable time range: [1970-01-01T00:00:00Z, 2105-12-31T23:59:59.999999Z]; see the section about temporal types | timestamp | timestamp | 
| an integer in the range [-49673 * 86400 * 10^6 + 1, 49673 * 86400 * 10^6 - 1], represents the number of microseconds between two timestamp values; see the section about temporal types | interval | interval | 
| (added in version 25.3) type date with time zone information; see the section about time zones | tz_date | tz_date | 
| (added in version 25.3) type datetime with time zone information; see the section about time zones | tz_datetime | tz_datetime | 
| (added in version 25.3) type timestamp with time zone information; see the section about time zones | tz_timestamp | tz_timestamp | 
| (added in version 25.3) type date32 with time zone information; see the section about time zones | tz_date32 | tz_date32 | 
| (added in version 25.3) type datetime64 with time zone information; see the section about time zones | tz_datetime64 | tz_datetime64 | 
| (added in version 25.3) type timestamp64 with time zone information; see the section about time zones | tz_timestamp64 | tz_timestamp64 | 
| an arbitrary YSON structure that is physically represented as a byte sequence; cannot have the required=%true attribute | any | yson (different from type) | 
| a system singular type that can only contain null (creating a separate column with this type makes no sense; we don't expect to see this type in user tables, but it's useful for YQL integration) | null | null | 
| a singular type that can only contain null; this type is different from null (creating a separate column with this type makes no sense; we don't expect to see this type in user tables, but it's useful for YQL integration) | void | void | 
Schema example:
type_v3=utf8
type_v3=bool
type_v3=yson
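The relationship between the two ways of specifying a primitive type can be sketched in Python. This is only an illustration of the rules and the table above: the function name `legacy_to_type_v3` and the dictionary layout are hypothetical and are not part of any YTsaurus client API. The singular types null and void are rejected because their handling is not described here.

```python
# Hypothetical helper, not a YTsaurus API: maps a legacy type/required pair
# to the equivalent type_v3 value, following the table above.

# The only two names that differ between `type` and `type_v3` in the table.
LEGACY_TO_V3 = {"boolean": "bool", "any": "yson"}


def legacy_to_type_v3(type_name: str, required: bool):
    """Return a type_v3 value (a string or a dict) for a legacy column type."""
    if type_name in ("null", "void"):
        # Assumption: singular types are left out of this sketch.
        raise ValueError("singular types are not covered by this sketch")
    if type_name == "any" and required:
        # Stated in the table: `any` cannot have required=%true.
        raise ValueError("type=any cannot be combined with required=%true")
    v3_name = LEGACY_TO_V3.get(type_name, type_name)
    if required:
        return v3_name
    # required=%false corresponds to optional<T>.
    return {"type_name": "optional", "item": v3_name}


strict_column = legacy_to_type_v3("int64", required=True)
nullable_flag = legacy_to_type_v3("boolean", required=False)
```

Here `strict_column` is the plain string `int64`, while `nullable_flag` is a dictionary describing optional<bool>.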
Temporal types
Temporal types in YTsaurus are categorized into two groups. Historically, the first ones to appear in the system were date, datetime, timestamp, and interval. They are used to represent times from the beginning of 1970 to the end of 2105. They were followed by date32, datetime64, timestamp64, and interval64. These types can be used to represent times over a wider range, about 145,000 years into the past and into the future. We recommend using the latter types, because they have a wider range of values.
The Gregorian calendar should be used for all of the temporal types. When dealing with values in the distant past, note that YTsaurus in no way accounts for the transition to the Gregorian calendar, which occurred in different countries at different times: YT assumes that the Gregorian calendar has always been in use.
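As a rough illustration, the value range of the date type can be checked with Python's standard `datetime.date`, which also uses the proleptic Gregorian calendar. The helper names `to_yt_date` and `from_yt_date` are made up for this sketch. The wider date32 range cannot be shown this way because `datetime.date` only covers years 1 to 9999.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
DATE_UPPER_BOUND = 49673  # exclusive upper bound in days, taken from the type table


def to_yt_date(d: date) -> int:
    """Convert a calendar date to the number of days since the Unix epoch."""
    days = (d - EPOCH).days
    if not 0 <= days < DATE_UPPER_BOUND:
        raise ValueError(f"{d} is outside the range of the date type")
    return days


def from_yt_date(days: int) -> date:
    """Convert a day count back to a calendar date."""
    if not 0 <= days < DATE_UPPER_BOUND:
        raise ValueError(f"{days} is outside the range of the date type")
    return EPOCH + timedelta(days=days)


last_day = from_yt_date(DATE_UPPER_BOUND - 1)  # date(2105, 12, 31)
```

The largest representable value, 49673 - 1, maps to 2105-12-31, which matches the range given in the table.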
Time zones
Types tz_timestamp64, tz_datetime64, tz_date32, tz_timestamp, tz_datetime, and tz_date store time information incorporating time zone details. Logically, these types store the pair:
- A timestamp, an integer from the corresponding "no time zone" type, representing a point in time in UTC.
- The name of the time zone from the IANA time zone database.
The internal representation of values for these types is described below. Certain higher-level tools offer a convenient way to work with these types.
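| tz type | corresponding "no time zone" type | underlying integer type | integer type value range | unit | 
|---|---|---|---|---|
| tz_date | date | Uint16 | [0, 49673 - 1] | days | 
| tz_datetime | datetime | Uint32 | [0, 49673 * 86400 - 1] | seconds | 
| tz_timestamp | timestamp | Uint64 | [0, 49673 * 86400 * 10^6 - 1] | microseconds | 
| tz_date32 | date32 | Int32 | [-53375809, 53375808 - 1] | days | 
| tz_datetime64 | datetime64 | Int64 | [-53375809 * 86400, 53375808 * 86400 - 1] | seconds | 
| tz_timestamp64 | timestamp64 | Int64 | [-53375809 * 86400 * 10^6, 53375808 * 86400 * 10^6 - 1] | microseconds | 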
This pair is serialized into a string as follows:
- The integer is written in presorted representation (see below).
- The time zone name is written in full, for example, Europe/Moscow.
The presorted representation of the integer is obtained as follows:
- Write the number in big-endian format.
- If the underlying integer type is signed (types tz_date32, tz_datetime64, and tz_timestamp64), invert the most significant (sign) bit. Skip this step for unsigned underlying types.
Example: you want to save the time point 2025-01-01T00:00:00 in Moscow time zone using type tz_datetime64. To do this, follow the steps below:
1. Convert the time point to UTC: 2024-12-31T21:00:00Z.
2. Convert UTC to Unix timestamp: 1735678800.
3. Write the timestamp in big-endian format: "\x00\x00\x00\x00\x67\x74\x5b\x50".
4. Since tz_datetime64 is based on the signed Int64 type, invert the most significant bit: "\x80\x00\x00\x00\x67\x74\x5b\x50".
5. Append time zone information: "\x80\x00\x00\x00\x67\x74\x5b\x50Europe/Moscow".
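The steps above can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch covers only tz_datetime64 (a signed 64-bit underlying integer). It takes the UTC timestamp as input, so it does not depend on a local time zone database.

```python
import struct
from datetime import datetime, timezone


def encode_tz_datetime64(utc_seconds: int, tz_name: str) -> bytes:
    """Serialize a (timestamp, time zone name) pair as described above."""
    raw = struct.pack(">q", utc_seconds)          # big-endian signed 64-bit
    presorted = bytes([raw[0] ^ 0x80]) + raw[1:]  # invert the sign bit
    return presorted + tz_name.encode("ascii")    # append the zone name in full


def decode_tz_datetime64(data: bytes):
    """Inverse of encode_tz_datetime64: returns (utc_seconds, tz_name)."""
    head, tz_name = data[:8], data[8:]
    raw = bytes([head[0] ^ 0x80]) + head[1:]
    return struct.unpack(">q", raw)[0], tz_name.decode("ascii")


# Step 2 of the example: 2024-12-31T21:00:00Z as a Unix timestamp.
utc_seconds = int(datetime(2024, 12, 31, 21, 0, 0, tzinfo=timezone.utc).timestamp())
encoded = encode_tz_datetime64(utc_seconds, "Europe/Moscow")
```

Inverting the sign bit makes a bytewise comparison of the first eight bytes agree with the numeric order of the timestamps, including negative ones.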
Decimal
The values of type decimal(p, s) are real numbers with the specified precision.
To define this type in a schema, specify the following keys:
- type_name: value of decimal.
- precision: total number of decimal digits in a numeric value; precision must be in the range [1, 35].
- scale: number of digits to the right of the decimal point in a numeric value; scale must be in the range [0, precision].
Schema example:
type_v3={
    type_name=decimal;
    precision=10;
    scale=2;
}
The values 3.14, -2.71, 9.99 may be of type decimal(3, 2) (precision=3, scale=2).
The type supports a number of special values, such as nan, +inf, -inf.
Description of binary representation
Decimal numbers have a unique binary representation
that is the default in many data formats, including yson.
For the purposes of this representation, the values of decimal(p, s) types are maintained as binary strings. Binary string length
depends on precision.
| Precision | Number of bits in the representation | Number of bytes in the representation | 
|---|---|---|
| 1-9 | 32 | 4 | 
| 10-18 | 64 | 8 | 
| 19-38 | 128 | 16 | 
| 39-76 | 256 | 32 | 
You need to perform the following steps to obtain a binary representation of a decimal number. These steps will be illustrated with the values 3.1415, -2.7182 of type decimal(5, 4).
- Take the integer made up of the value's digits. Its number of bits is determined by precision according to the table above. In this example, these are the 32-bit numbers 31415 and -27182.
- Write the number as a big-endian sequence. In this example, the strings are \x00\x00\x7A\xB7 and \xFF\xFF\x95\xD2.
- Invert the most significant bit. In this example, the strings are \x80\x00\x7A\xB7 and \x7F\xFF\x95\xD2.
The integer representations of the special values of nan, +inf, -inf for the first step are shown in the table below:
| Special value | Integer representation | 
|---|---|
| nan | INT_MAX | 
| +inf | INT_MAX - 1 | 
| -inf | -INT_MAX + 1 | 
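The encoding procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual implementation: the function name `encode_decimal` is made up, values are passed as strings, the tokens "nan", "+inf", and "-inf" are an input convention of this sketch, and INT_MAX is assumed to be the largest signed integer of the chosen bit width.

```python
from decimal import Decimal


def _bit_width(precision: int) -> int:
    """Pick the representation width from the precision table above."""
    if precision < 1:
        raise ValueError("precision must be positive")
    for upper, bits in ((9, 32), (18, 64), (38, 128), (76, 256)):
        if precision <= upper:
            return bits
    raise ValueError("precision is too large")


def encode_decimal(value: str, precision: int, scale: int) -> bytes:
    """Return the binary representation of a decimal(precision, scale) value."""
    bits = _bit_width(precision)
    int_max = (1 << (bits - 1)) - 1  # assumption: INT_MAX of the chosen width
    special = {"nan": int_max, "+inf": int_max - 1, "-inf": -int_max + 1}
    if value in special:
        unscaled = special[value]
    else:
        sign, digits, exponent = Decimal(value).as_tuple()
        shift = exponent + scale
        if shift < 0:
            raise ValueError("value has more fractional digits than scale")
        # Step 1: the integer made up of the value's digits.
        unscaled = int("".join(map(str, digits))) * 10 ** shift
        if len(str(unscaled)) > precision:
            raise ValueError("value does not fit into the given precision")
        if sign:
            unscaled = -unscaled
    # Step 2: big-endian two's complement; step 3: invert the top bit.
    raw = unscaled.to_bytes(bits // 8, "big", signed=True)
    return bytes([raw[0] ^ 0x80]) + raw[1:]


pi_bytes = encode_decimal("3.1415", 5, 4)   # b"\x80\x00\x7a\xb7"
e_bytes = encode_decimal("-2.7182", 5, 4)   # b"\x7f\xff\x95\xd2"
```

With this encoding, comparing the resulting byte strings lexicographically gives the same order as comparing the numbers, with -inf sorting first and nan last.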
Please note
Only decimals with a precision of no more than 35 are currently supported when working with YQL.
Optional type
The optional<T> type means that a value may be of type T or be empty.
Please note
Each use of optional for a type adds new values.
For instance, optional<optional<bool>> may take on the following values:
- The outer optional is empty.
- The outer optional is non-empty, and the inner one is empty.
- All the optionals are non-empty, and the value is true or false.
Legacy columns containing the type=T;required=false attributes correspond to type optional<T> defined using type_v3.
To define type optional, specify the keys below:
- type_name: value of optional.
- item: element type description.
Schema example:
type_v3={
  type_name=optional;
  item=string;
}
type_v3={
  type_name=optional;
  item={
    type_name=optional;
    item=bool;
  }
}
List
Values of type list<T> are lists of elements of type T.
To define the type in the schema, specify the keys below:
- type_name: value of list.
- item: element type description.
Schema example:
type_v3={
  type_name=list;
  item=string;
}
type_v3={
  type_name=list;
  item={
    type_name=list;
    item=double;
  }
}
Struct
A collection of named fields with specified value types.
To define this type in a schema, specify the following keys:
- type_name: value of struct.
- members: list of dictionaries with keys:
  - name: field name, must be a non-empty utf8 string.
  - type: field type.
Schema example:
type_v3={
  type_name=struct;
  members=[
    {
      name=foo;
      type=int32;
    };
    {
      name=bar;
      type={
        type_name=optional;
        item=string;
      }
    };
  ]
}
Tuple
A collection of unnamed fields of certain predefined types.
To define this type, you need to specify the following keys in the schema:
- type_name: value of tuple.
- elements: list of dictionaries with keys:
  - type: element type.
Schema example:
type_v3={
  type_name=tuple;
  elements=[
     {
       type=double;
     };
     {
       type=double;
     };
  ]
}
Variant
A variant holds exactly one value, whose type is one of a defined collection of types.
There are two kinds of variant:
- Variant over struct. Each alternative has a name (as in a struct), and each variant value is labeled with the name of the alternative it holds.
- Variant over tuple. In this case, all the alternatives are unnamed, and each value is labeled with an index.
To define this type, specify the following keys in a schema:
- type_name: value of variant.
- elements or members (not both): these keys have the same structure as the corresponding keys in tuple / struct:
  - elements: for a variant with unnamed elements; contains a list of dictionaries with keys:
    - type: element type.
  - members: for a variant with named elements; contains a list of dictionaries with keys:
    - name: element name, must be a non-empty utf8 string.
    - type: description of element type.
Schema example:
type_v3={
  type_name=variant;
  members=[
     {
       name=int_field;
       type=int64;
     };
     {
       name=string_field;
       type=string;
     };
  ]
}
type_v3={
  type_name=variant;
  elements=[
     {
       type=int32;
     };
     {
       type=string;
     };
     {
       type=double;
     };
  ]
}
Dict
A dict is a sequence of key/value pairs.
YTsaurus does not check the keys for uniqueness or order.
However, most clients load the data into an actual dictionary while processing it, so values for non-unique keys will be lost.
To define this type, specify the following keys in a schema:
- type_name: value of dict.
- key: description of key type.
- value: value type description.
Schema example:
type_v3={
  type_name=dict;
  key=int64;
  value={
    type_name=optional;
    item=string;
  };
}
Tagged
The tagged type annotates another type with a string. Any value of type T can serve as a value of type tagged<TAG_NAME,T>; however, the two types are considered different wherever YTsaurus compares schemas, for instance, when checking whether two tables can be merged into one.
To define this type, specify the following keys in a schema:
- type_name: value of tagged.
- tag: tag name, must be a non-empty utf8 string.
- item: element type description.
Schema example:
type_v3={
  type_name=tagged;
  tag="image/svg";
  item="string";
}
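To show how the type_v3 descriptions nest, here is a small Python sketch that renders a type_v3 value, written as a Python string or dictionary, in the short notation used throughout this section (for example, optional<T> or struct<Foo:int64;Bar:optional<utf8>>). The function `describe` is hypothetical and only follows the keys listed in the subsections above.

```python
def describe(type_v3) -> str:
    """Render a type_v3 description (str or dict) in short notation."""
    if isinstance(type_v3, str):
        return type_v3  # primitive types are given as plain strings
    name = type_v3["type_name"]
    if name == "decimal":
        return f"decimal({type_v3['precision']}, {type_v3['scale']})"
    if name in ("optional", "list"):
        return f"{name}<{describe(type_v3['item'])}>"
    if name == "tagged":
        return f"tagged<{type_v3['tag']},{describe(type_v3['item'])}>"
    if name == "dict":
        return f"dict<{describe(type_v3['key'])};{describe(type_v3['value'])}>"
    if name in ("struct", "tuple", "variant"):
        if "members" in type_v3:  # named fields or alternatives
            parts = [f"{m['name']}:{describe(m['type'])}" for m in type_v3["members"]]
        else:                     # unnamed elements
            parts = [describe(e["type"]) for e in type_v3["elements"]]
        return f"{name}<{';'.join(parts)}>"
    raise ValueError(f"unknown type_name: {name}")


struct_schema = {
    "type_name": "struct",
    "members": [
        {"name": "foo", "type": "int32"},
        {"name": "bar", "type": {"type_name": "optional", "item": "string"}},
    ],
}
struct_text = describe(struct_schema)  # "struct<foo:int32;bar:optional<string>>"
```

The same function handles the other schema examples above, since every composite type either wraps an item, lists members or elements, or pairs a key with a value.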
Representing compound types in formats
Formats are used to read and write tables.
Some formats do not support composite data. Some of them, such as dsv / schemaful_dsv, return an error on an attempt to read a composite value, for instance: Values of type "any" are not supported by the chosen format.
YSON
There are two YSON representations of composite types. The representations of the struct and variant types differ between them: the default representation is more convenient to use, while the alternative representation yields better storage and processing performance.
You can switch between the representations by using the complex_type_mode flag. Legal values: named / positional.
Type representation descriptions are provided below. Unless otherwise specified, a type representation does not depend on the complex_type_mode setting.
There are two representations for the dict type with string keys: by default, the positional representation in the form of a list is used (this is how the data is stored in YTsaurus). For convenience, you can enable the named representation using the string_keyed_dict_mode flag.
Primitive types
Primitive types have a linear representation as a single YSON value.
The table shows a mapping between the primitive and the YSON types.
| type/type_v3 | YSON representation | 
|---|---|
| int64 | signed number | 
| int32 | signed number | 
| int16 | signed number | 
| int8 | signed number | 
| uint64 | unsigned number | 
| uint32 | unsigned number | 
| uint16 | unsigned number | 
| uint8 | unsigned number | 
| double | floating point number | 
| boolean/bool | boolean value | 
| string | string | 
| utf8 | string | 
| date | unsigned number, see below | 
| datetime | unsigned number, see below | 
| timestamp | unsigned number, see below | 
| interval | signed number | 
| uuid | string with binary data, see below | 
| any/yson | value-dependent | 
| null | # | 
| void | # | 
Time types
The YSON format has a time_mode option that determines how the date, datetime, and timestamp types are displayed. Its possible values are:
- binary (default) represents the value as an unsigned integer indicating the number of days/seconds/microseconds that have passed since the beginning of the Unix epoch.
- text represents values as strings; examples of date, datetime, and timestamp representations are 2022-01-02, 2022-01-02T03:04:05Z, and 2022-01-02T03:04:05.123456Z, respectively.
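The relationship between the two modes can be illustrated with Python's standard library. The function names are made up for this sketch. It only mirrors the string shapes shown in the examples above and is not the YTsaurus formatter itself.

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_text(days: int) -> str:
    """Text form of a date value given as days since the Unix epoch."""
    return (_EPOCH + timedelta(days=days)).strftime("%Y-%m-%d")


def datetime_text(seconds: int) -> str:
    """Text form of a datetime value given as seconds since the Unix epoch."""
    return (_EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%SZ")


def timestamp_text(microseconds: int) -> str:
    """Text form of a timestamp value given as microseconds since the Unix epoch."""
    moment = _EPOCH + timedelta(microseconds=microseconds)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


as_date = date_text(18994)                        # "2022-01-02"
as_datetime = datetime_text(1641092645)           # "2022-01-02T03:04:05Z"
as_timestamp = timestamp_text(1641092645123456)   # "2022-01-02T03:04:05.123456Z"
```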
UUID
The YSON format has a uuid_mode option that determines how the uuid type is displayed. Its possible values are:
- binary (default) uses a 16-byte binary representation.
- text_yt uses a string representation in the YTsaurus format with 4 groups. For example: 61626364-65666768-696a6b6c-6d6e6f70.
- text_yql uses a string representation in the YQL format with 5 groups, which resembles the RFC representation. For example: 64636261-6665-6867-696a-6b6c6d6e6f70.
Decimal
The YSON format has a decimal_mode option that determines how the decimal type is displayed. Its possible values are:
- binary (default) encodes the value as a binary string containing the binary representation of the decimal number.
- text uses a text representation of the decimal number.
Optional
The representation of type optional depends on its inner type.
This is required for backward compatibility with the columns with required=%false.
If T is an arbitrary type that is not optional, optional<T> is represented as follows:
- The Null (empty) value of the optional is represented by #.
- Otherwise, the value of type T uses its conventional representation.
If T is an arbitrary type that is optional, optional<T> is represented as follows:
- The Null value of the outer optional is represented as #.
- Otherwise, the [ v ] representation is used (a YSON list of length 1), where v is the YSON representation of the value of type T.
Example values, type optional<int64>:
#
-42
Example values, type optional<optional<int64>>:
#
[ # ]
[ -42 ]
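The two rules can be expressed as a small recursive Python sketch for optionals nested around int64. Everything here is illustrative: the `Some` wrapper, the use of `None` for an empty optional, and the `depth` parameter (the number of nested optional levels) are modelling choices of this sketch, not part of YTsaurus.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Some:
    """A non-empty optional holding `value` (hypothetical wrapper)."""
    value: Any


def encode_optional(value, depth: int) -> str:
    """YSON text for a value of int64 wrapped in `depth` levels of optional."""
    if depth == 0:
        return str(value)  # plain int64
    if value is None:
        return "#"         # empty optional at this level
    inner = encode_optional(value.value, depth - 1)
    # If the inner type is itself optional, wrap it in a one-element list.
    return f"[ {inner} ]" if depth > 1 else inner


single = [encode_optional(v, 1) for v in (None, Some(-42))]
double = [encode_optional(v, 2) for v in (None, Some(None), Some(Some(-42)))]
```

For these inputs, `single` is `["#", "-42"]` and `double` is `["#", "[ # ]", "[ -42 ]"]`, matching the example values above.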
List
Type list<T> is encoded as a YSON list whose elements are encoded representations of elements of type T.
Example value, type list<int64>:
[]
[42; -1;]
Struct
A struct representation depends on the values of the complex_type_mode flag.
Named representation (default)
This representation applies when the complex_type_mode option of the YSON format is not set or is set to complex_type_mode=named.
The struct is represented by a YSON dictionary where field names serve as keys and the contents of these fields are the values.
Example values, type struct<Foo:int64;Bar:optional<utf8>>:
{Foo=42;Bar=#;}
{Foo=-5;Bar="minus five";}
Positional representation
Where the complex_type_mode=positional option is set for the YSON format, a different representation is used.
A struct is encoded as a YSON list with the i-th position containing a YSON representation of the struct's i-th field.
The list may contain fewer elements than the struct has fields. In that case, the missing trailing fields must be of type optional<T>, and they are considered to hold an empty optional value.
Example values, type struct<Foo:int64;Bar:optional<utf8>>:
[42; #;]
[42]
[-5;"minus five";]
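A Python sketch of how a positional value could be mapped back to field names, including the rule about omitted trailing fields, is shown below. The function and its parameters are hypothetical. YSON lists are modelled as Python lists and # as `None`.

```python
def positional_to_named(field_names, optional_fields, values):
    """Map a positional struct value (a list) to a dict keyed by field name.

    field_names: field names in schema order.
    optional_fields: set of field names whose type is optional<T>.
    values: positional representation, possibly shorter than field_names.
    """
    if len(values) > len(field_names):
        raise ValueError("too many elements for this struct")
    result = {}
    for index, name in enumerate(field_names):
        if index < len(values):
            result[name] = values[index]
        elif name in optional_fields:
            result[name] = None  # omitted trailing field: empty optional
        else:
            raise ValueError(f"missing value for non-optional field {name!r}")
    return result


fields = ["Foo", "Bar"]  # struct<Foo:int64;Bar:optional<utf8>>
short_form = positional_to_named(fields, {"Bar"}, [42])
full_form = positional_to_named(fields, {"Bar"}, [-5, "minus five"])
```

Both `[42; #;]` and `[42]` therefore decode to the same named value, with Bar empty.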
Tuple
Type tuple is encoded as a fixed-length YSON list. The i-th position contains the i-th field's encoded value.
Example values, type tuple<int64;optional<utf8>>:
[42; #;]
[-5;"minus five";]
Variant
Unnamed variant
An unnamed variant is represented by a YSON list of length 2 that includes the following elements:
- Index of the alternative (zero-based).
- An encoded value for the relevant alternative.
Example values, type variant<int64;optional<utf8>>:
[0; 42]
[1; #]
[1; "foo bar";]
Named variant
Named representation (default)
This representation applies when the YSON option complex_type_mode isn't specified or is set to complex_type_mode=named.
A named variant is represented by a YSON list of length 2 that includes the following elements:
- Alternative name.
- An encoded value for the relevant alternative.
Example values, type variant<Foo:int64;Bar:optional<utf8>>:
[Foo; 42]
[Bar; #]
[Bar; "foo bar";]
Positional representation
This representation applies when the complex_type_mode=positional option is set for the YSON format.
A named variant is then represented by a YSON list of length 2 that includes the following elements:
- Index of alternative.
- An encoded value for the relevant alternative.
Example values, type variant<Foo:int64;Bar:optional<utf8>>:
[0; 42]
[1; #]
[1; "foo bar";]
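The difference between the two representations of a named variant, as described by the element lists above, can be sketched in Python. The helper names are hypothetical, YSON lists are modelled as Python lists, and the sketch assumes that alternative indexes are zero-based and follow the order in which the alternatives are declared in the schema.

```python
def variant_named_to_positional(alternatives, value):
    """Replace the alternative name with its index: [name, v] -> [index, v]."""
    name, payload = value
    return [alternatives.index(name), payload]


def variant_positional_to_named(alternatives, value):
    """Replace the alternative index with its name: [index, v] -> [name, v]."""
    index, payload = value
    return [alternatives[index], payload]


alternatives = ["Foo", "Bar"]  # variant<Foo:int64;Bar:optional<utf8>>
as_positional = variant_named_to_positional(alternatives, ["Bar", "foo bar"])
as_named = variant_positional_to_named(alternatives, [0, 42])
```

Only the first element of the pair changes between the modes. The encoded value of the alternative stays the same.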
Dict
The dict type is represented by default as a YSON list, where each element is a YSON list of two elements: key and value.
Example values of type dict<int32;string>:
[[1;"one"];[4;"four"]]
[]
A dict may be represented as a YSON dictionary; however, a YSON dictionary only supports strings as keys whereas dict also supports other keys.
Dict with string keys
The representation of a dictionary with string keys depends on the value of the string_keyed_dict_mode flag.
Positional representation (default)
This representation applies when the YSON option string_keyed_dict_mode isn't specified or is set to string_keyed_dict_mode=positional.
The dict is represented as a YSON list of key/value pairs, as described above.
Examples of the dict<string;int32> values:
[["one";1];["four";4]]
Named representation
Where the string_keyed_dict_mode=named option is set for the YSON format, a different representation is used.
dict is encoded as a YSON dictionary.
Examples of the dict<string;int32> values:
{one=1; four=4}
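The two representations of a string-keyed dict can be related with a short Python sketch. The helper names are hypothetical. YSON lists and dictionaries are modelled as Python lists and dicts. The sketch also shows why duplicate keys do not survive in the named form: in this implementation the last value for a key wins.

```python
def dict_positional_to_named(pairs):
    """Turn [[key, value], ...] into a mapping; only string keys are allowed."""
    result = {}
    for key, value in pairs:
        if not isinstance(key, str):
            raise TypeError("the named representation requires string keys")
        result[key] = value  # a repeated key overwrites the earlier value
    return result


def dict_named_to_positional(mapping):
    """Turn a mapping back into a list of [key, value] pairs."""
    return [[key, value] for key, value in mapping.items()]


named = dict_positional_to_named([["one", 1], ["four", 4]])
positional = dict_named_to_positional(named)
collapsed = dict_positional_to_named([["a", 1], ["a", 2]])
```

Here `collapsed` holds a single entry for the key `a`, whereas the positional form kept both pairs.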
Tagged
Type tagged does not change its elements' YSON representations.