Data types
This section describes the data types supported by YTsaurus and the way they are described in the schema and represented in the formats.
Overview
YTsaurus supports a number of primitive types:
string
;integer
;boolean
;float
;double
;date
;datetime
;timestamp
;interval
.
As well as the following composite (complex) types:
optional
;list
;struct
;tuple
;variant
;tagged
.
You can use one of the following two methods to specify a type in a table schema:
- Using the
type
and (optionally) therequired
keys: historically, the first method but it is only good for defining primitive or optional primitive types. - Using the
type_v3
key.
The type
key always expects a string.
The type_v3
key expects either a string for primitive types or a YSON dictionary.
A YSON dict always has the type_name
key that stores the type name.
The remaining keys depend on the specific type and are described below.
Describing types in a schema
Primitive types
You can define primitive types in a schema both using the type
and the type_v3
keys.
If primitive type T
is defined using type
, YTsaurus will additionally check the required
key in the schema:
required=%true
: the column will have a strictly defined type.Null
or empty values are illegal.required=%false
: the column will be of typeoptional<T>
. Any values of a primitive type and theNull
value will be legal.
The table lists the supported types and their representation in the type
/type_v3
keys.
Description | Representation in type |
Representation in type_v3 |
---|---|---|
an integer belonging to the range [-2^63, 2^63-1] |
int64 |
int64 |
an integer belonging to the range [-2^31, 2^31-1] |
int32 |
int32 |
an integer belonging to the range [-2^15, 2^15-1] |
int16 |
int16 |
an integer belonging to the range [-2^7, 2^7-1] |
int8 |
int8 |
an integer belonging to the range [0, 2^64-1] |
uint64 |
uint64 |
an integer belonging to the range [0, 2^32-1] |
uint32 |
uint32 |
an integer belonging to the range [0, 2^16-1] |
uint16 |
uint16 |
an integer belonging to the range [0, 2^8-1] |
uint8 |
uint8 |
an 8-byte real number | double |
double |
a 4-byte real number | float |
float |
Standard true/false boolean |
boolean |
bool (different from type ) |
a random sequence of bytes | string |
string |
a proper UTF8 sequence | utf8 |
utf8 |
a string that contains a valid JSON | json |
json |
UUID, a random 16-byte sequence (stored in binary representation) | uuid |
uuid |
an integer in the range [0, 49673 - 1] represents the number of days from the Unix epoch; is the represented data range: [1970-01-01, 2105-12-31] |
date |
date |
an integer in the range [0, 49673 * 86400 - 1] , represents the number of seconds since the Unix epoch; is the represented time interval: [1970-01-01T00:00:00Z, 2105-12-31T23:59:59Z] |
datetime |
datetime |
an integer in the range [0, 49673 * 86400 * 10^6 - 1] , represents the number of microseconds since the Unix epoch; is the represented time interval: [1970-01-01T00:00:00Z, 2105-12-31T23:59:59.999999Z] |
timestamp |
timestamp |
an integer in the range [- 49673 * 86400 * 10^6 + 1, 49673 * 86400 * 10^6 - 1] , represents the number of microseconds between two timestamps |
interval |
interval |
a random YSON structure, is physically represented by a byte sequence, cannot have required=%true |
any |
yson (different from type ) |
a system singular type that can only contain null (creating a separate column with this type makes no sense; we don't expect to see this type in user tables, but it's useful for YQL integration) |
null |
null |
a singular type that can only contain null ; this type is different from null (creating a separate column with this type makes no sense; we don't expect to see this type in user tables, but it's useful for YQL integration) |
void |
void |
Schema example:
type_v3=utf8
type_v3=bool
type_v3=yson
Decimal
The values of type decimal(p, s)
are real numbers with the specified precision.
To define this type in a schema, specify the following keys:
type_name
: value ofdecimal
.precision
: total number of decimal digits in a numeric value,precision
must be in the range[1, 35]
.scale
: number of digits to the right of the decimal point in a numeric value,scale
must be in the range[0, precision]
.
Schema example:
type_v3={
type_name=`decimal`
precision=10;
scale=2;
}
The values 3.14
, -2.71
, 9.99
may be of type decimal(3, 2)
(precision=3
, scale=2
).
The type supports a number of special values, such as nan
, +inf
, -inf
.
Description of binary representation
Decimal
numbers have a unique binary representation
that is the default in many data formats, including yson
.
For the purposes of this representation, the values of decimal(p, s)
types are maintained as binary strings. Binary string length
depends on precision
.
Precision | Number of bits in the representation | Number of bytes in the representation |
---|---|---|
1-9 | 32 | 4 |
10-18 | 64 | 8 |
19-38 | 128 | 16 |
39-76 | 256 | 32 |
You need to perform the following steps to obtain a binary representation of a decimal
number. These steps will be illustrated with the values 3.1415
, -2.7182
of type decimal(5, 4)
.
- Take an integer made up of the value's digits. The number of bits is taken from
precision
in the table above. In this example, 32-bit numbers31415
,-27182
. - Write the number as a big-endian sequence. In this example, the strings are
\x00\x00\x7A\xB7
,\xFF\xFF\x95\xD2
. - Invert the most significant bit. In this example, the strings are
\x80\x00\7A\xB7
,\x7F\xFF\x95\xD2
.
The integer representations of the special values of nan
, +inf
, -inf
for the first step are shown in the table below:
Special value | Integer representation |
---|---|
nan |
INT_MAX |
+inf |
INT_MAX - 1 |
-inf |
- INT_MAX + 1 |
Please note
Only decimals with precision up to 35 are currently supported when working with YQL.
Optional type
The optional<T>
type means that a value may be of type T
or be empty.
Please note
Each use of optional
for a type adds new values.
For instance, optional<optional<bool>>
may take on the following values:
- The external
optional
is empty. - The external
optional
is non-empty, and the internal one is empty. - All the
optionals
are non-empty, the values aretrue
orfalse
.
Legacy columns containing the type=T;required=false
attributes correspond to type optional<T>
defined using type_v3
.
To define type optional
, specify the keys below:
type_name
: value ofoptional
.item
: element type description.
Schema example:
type_v3={
type_name=optional;
item=string;
}
type_v3={
type_name=optional;
item={
type_name=optional;
item=bool;
}
}
List
Values of type list<T>
are lists of elements of type T
.
To define the type in the schema, specify the keys below:
type_name
: value oflist
.item
: element type description.
Schema example:
type_v3={
type_name=list;
item=string;
}
type_v3={
type_name=list;
item={
type_name=list;
item=double;
}
}
Struct
A collection of named fields with specified value types.
To define this type in a schema, specify the following keys:
type_name
: value ofstruct
.members
: list of dictionaries with keys:name
: field name, must be a non-empty utf8 string.type
: field type.
Schema example:
type_v3={
type_name=struct;
members=[
{
name=foo;
type=int32;
};
{
name=bar;
type={
type_name=optional;
item=string;
}
};
]
}
Tuple
A collection of unnamed fields of certain predefined types.
To define this type, you need to specify the following keys in the schema:
type_name
: value oftuple
.elements
: list of dictionaries with keys:type
: element type.
Schema example:
type_v3={
type_name=tuple;
elements=[
{
type=double;
};
{
type=double;
};
]
}
Variant
Variant is strictly a single value from a defined collection of types.
A variant may be of one of two types:
- Variant over struct. Each type has a name (as in a struct), and each variant value is labeled with the name of the relevant variant element value.
- Variant over tuple. In this case, all the elements are unnamed, and each value is labeled with an index.
To define this type, specify the following keys in a schema:
type_name
: value ofvariant
.elements
ormembers
(not both): the keys have the same structure similar as these keys intuple
/struct
:elements
: for the option with unnamed elements with the key itself containing a list of dictionaries with keys:type
: element type.
members
: for the option with named elements with the key itself containing a list of dictionaries with keys:name
: element name, must be a non-empty utf8 string.type
: description of element type.
Schema example:
type_v3={
type_name=variant;
members=[
{
name=int_field;
type=int64;
};
{
name=string_field;
type=string;
};
]
}
type_v3={
type_name=variant;
elements=[
{
type=int32;
};
{
type=string;
};
{
type=double;
};
]
}
Dict
A dict is a sequence of key/value pairs.
YTsaurus does not check the keys for uniqueness or order.
However, most clients will upload data to an actual dictionary while processing, and the value for non-unique keys will be lost.
To define this type, specify the following keys in a schema:
type_name
: value ofdict
.key
: description of key type.value
: value type description.
Schema example:
type_v3={
type_name=dict;
key=int64;
value={
type_name=optional;
item=string;
};
}
Tagged
The tagged
type helps annotate other types with a string. Any value of type T
can serve as a value for type tagged<TAG_NAME,T>
,
however, the types themselves will be considered different wherever YTsaurus compares schemas. For instance, when the possibility of merging two tables into one is being checked.
To define this type, specify the following keys in a schema:
type_name
: value oftagged
.tag
: tag name, must be a non-empty utf8 string.item
: element type description.
—
Schema example:
type_v3={
type_name=tagged;
tag="image/svg";
item="string";
}
Representing compound types in formats
Formats are used to read and write tables.
Some formats do not support composite data, and some, such as dsv / schemaful_dsv, will return an error in response to an attempt to read a composite value. For instance, Values of type "any" are not supported by the chosen format
.
YSON
There are two YSON representations of composite types. The representation of types struct
and variant
are different:
the default representation is more convenient to use,
while the alternative representation yields better storage and processing performance.
You can switch between the representations by using the complex_type_mode
flag. Legal values: named
/ positional
.
Type representation descriptions are provided below. Unless otherwise specified, a type representation does not depend on the complex_type_mode
setting.
There are two representations for dict
type with string keys: by default, the positional
representation in the form of a list is used (since the data is stored in YTsaurus). For convenience, you can enable the named
mode using the string_keyed_dict_mode
flag.
Primitive types
Primitive types have a linear representation as a single YSON value.
The table shows a mapping between the primitive and the YSON types.
type / type_v3 |
YSON representation |
---|---|
int64 |
signed number |
int32 |
signed number |
int16 |
signed number |
int8 |
signed number |
uint64 |
unsigned number |
uint32 |
unsigned number |
uint16 |
unsigned number |
uint8 |
unsigned number |
double |
floating point number |
boolean / bool |
boolean value |
string |
string |
utf8 |
string |
date |
unsigned number, see below |
datetime |
unsigned number, see below |
timestamp |
unsigned number, see below |
interval |
signed number |
uuid |
string with binary data, see below |
any / yson |
value-dependent |
null |
# |
void |
# |
Time types
The YSON format has a time_mode
option that determines how the date
, datetime
, and timestamp
types are displayed. Its possible values are:
binary
(default) represents the value as an unsigned integer indicating the number of days/seconds/milliseconds that have passed since the beginning of the Unix epoch.text
represents values as strings; examples ofdate
,datetime
, andtimestamp
representations are as follows:
2022-01-02
,2022-01-02T03:04:05Z
, and2022-01-02T03:04:05.123456Z
.
UUID
The YSON format has a uuid_mode
option that determines how the uuid
type is displayed. Its possible values are:
binary
(default) uses a 16-byte binary representation.text_yt
uses a string representation in the YTsaurus format with 4 groups. For example:61626364-65666768-696a6b6c-6d6e6f70
.text_yql
uses a string representation in the YQL format with 5 groups, which resembles the RFC representation. For example:64636261-6665-6867-696a-6b6c6d6e6f70
.
Decimal
The YSON format has a decimal_mode
option that determines how the decimal
type is displayed. Its possible values are:
binary
(default) is encoded as a binary string with a binary representation of thedecimal
number.text
uses a text representation of thedecimal
number.
Optional
The representation of type optional
depends on its inner type.
This is required for backward compatibility with the columns with required=%false
.
If T
is an arbitrary type that is not optional
, optional<T>
is represented as follows:
- The
Null
value that isoptional
is represented by#
. - Otherwise, the
T
type value uses the conventional representation.
If T
is an arbitrary type that is optional
, optional<T>
is represented as follows:
- The
Null
value of the outeroptional
is represented as#
. - Otherwise, the
[ v ]
representation is used (YSON list of length 1), wherev
is the YSON representation of typeT
.
Example values, type optional<int64>
:
#
-42
Example values, type optional<optional<int64>>
:
#
[ # ]
[ -42 ]
List
Type list<T>
is encoded as a YSON list whose elements are encoded representations of elements of type T
.
Example value, type list<int64>
:
[]
[42; -1;]
Struct
A struct representation depends on the values of the complex_type_mode
flag.
Named representation (default)
The representation being described applies to the situation
when the YSON option of format complex_type_mode
is not set or is set to complex_type_mode=named
.
The struct is represented by a YSON dictionary where field names serve as keys and the contents of these fields are the values.
Example values, type struct<Foo:int64;Bar:optional<utf8>>
:
{Foo=42;Bar=#;}
{Foo=-5;Bar="minus five";}
Positional representation
Where the complex_type_mode=positional
option is set for the YSON format, a different representation is used.
A struct is encoded as a YSON list with the i-th position containing a YSON representation of the struct's i-th field.
The list may contain fewer elements than the number of fields in your struct, which means that the remaining types must be optional<T>
,
and the fields are considered to have an empty optional
value.
Example values, type struct<Foo:int64;Bar:optional<utf8>>
:
[42; #;]
[42]
[-5;"minus five";]
Tuple
Type tuple
is encoded as a fixed-length YSON list. The i-th position contains the i-th field's encoded value.
Example values, type tuple<int64;optional<utf8>>
:
[42; #;]
[-5;"minus five";]
Variant
Unnamed variant
The unnamed option is represented by a YSON list of length 2 that includes the following elements:
- Alternative number (indexed at 0).
- An encoded value for the relevant alternative.
Example values, type variant<int64;optional<utf8>>
:
[0; 42]
[1; #]
[1; "foo bar";]
Named variant option
Named representation (default)
This representation applies when the YSON option complex_type_mode
isn't specified or is set to complex_type_mode=named
.
The named option is represented by a YSON list of length 2 that includes the following elements:
- Alternative name.
- An encoded value for the relevant alternative.
Example values, type `variant<Foo:int64;Bar:optional
[Foo; 42]
[Bar; #]
[Bar; "foo bar";]
Positional representation
If the complex_type_mode=positional
option is set for a format.
The named option is represented by a YSON list of length 2 that includes the following elements:
- Index of alternative.
- An encoded value for the relevant alternative.
Example values, type variant<Foo:int64;Bar:optional<utf8>>
:
[Foo; 42]
[Bar; #]
[Bar; "foo bar";]
Dict
The dict
type is represented by default as a YSON list, where each element is a YSON list of two elements: key and value.
Example values of type dict<int32;string>
:
[[1;"one"];[4;"four"]]
[]
A dict
may be represented as a YSON dictionary; however, a YSON dictionary only supports strings as keys whereas dict
also supports other keys.
Dict with string keys
The representation of a dictionary with string keys depends on the value of the string_keyed_dict_mode
flag.
Positional representation (default)
This representation applies when the YSON option string_keyed_dict_mode
isn't specified or is set to string_keyed_dict_mode=positional
.
See above
Examples of the dict<string;int32>
values:
[["one";1];["four";4]]
Named representation
Where the string_keyed_dict_mode=named
option is set for the YSON format, a different representation is used.
dict
is encoded as a YSON dictionary.
Examples of the dict<string;int32>
values:
{one=1; four=4}
Tagged
Type tagged
does not change its elements' YSON representations.