Working with YTsaurus tables

The main task solved by CHYT is processing tables that are already in YTsaurus without having to copy them to any third-party storages.

Thus, the main difference of CHYT from ClickHouse is that it does not have the variety of engines from regular ClickHouse. Tables are stored according to the logic of table storage in YTsaurus. Creating tables using standard ClickHouse engines is prohibited. Instead, you can read any of the existing tables in the corresponding YTsaurus cluster. You can also save results as YTsaurus tables.

"Under the hood", working with YTsaurus tables is arranged similarly to the Distributed engine from ClickHouse. Exception: you do not have to think about manual or automatic distribution of data into shards, because as a storage YTsaurus solves this task in a transparent, seamless, and reliable way, distributing data to nodes.

CHYT works only with schematized tables. The table must have the filled in /@schema attribute, or the Schema tab in the YTsaurus web interface must have a schema (i.e. at least one column is mentioned), which is the same thing.

YTsaurus tables are denoted by their full paths in Cypress wrapped in backticks or in double quotes: `//path/to/table` or "//path/to/table":

SELECT * FROM `//tmp/sample_table`

CHYT can read and write static and sorted dynamic tables, but note that due to the data storage peculiarities, reading from dynamic tables may take by several times more time than reading from static tables.

Reading multiple tables

Concatenation of several YTsaurus tables: concatYtTables(table1, [table2, [...]]).

Example:
```
SELECT * FROM concatYtTables("//tmp/sample_table", "//tmp/sample_table2")
```

Concatenation of all tables along cypressPath (without a slash at the end): concatYtTablesRange(cypressPath, [from, [to]]).

Example:

SELECT * FROM concatYtTablesRange("//tmp/sample_tables")
SELECT * FROM concatYtTablesRange('//tmp/sample_tables','2019-01-01')
SELECT * FROM concatYtTablesRange('//tmp/sample_tables', '2019-08-13T11:00:00')

For more information about these functions, see Reading a set of tables.

Saving results

CHYT can create static tables, insert data into them, and save query results in YTsaurus. Below is a list of supported queries:

INSERT INTO … VALUES;
INSERT INTO … SELECT;
CREATE TABLE;
CREATE TABLE AS SELECT.

Examples:

-- Inserting data into a table
INSERT INTO `//tmp/sample_table`(a) VALUES (10), (11);

-- Inserting data from another table
INSERT INTO `//tmp/sample_table`
SELECT a+100 FROM `//tmp/sample_table` ORDER BY a;

-- Creating a table
CREATE TABLE IF NOT EXISTS `//tmp/sample_table`
(
   a String,
   b Int32
) ENGINE = YtTable();

-- Creating a table with sorting by key b
CREATE TABLE IF NOT EXISTS `//tmp/sample_table_with_pk`
(
   a String,
   b Int32
) ENGINE = YtTable() order by b;

-- Creating a table based on a select query
CREATE TABLE `//tmp/sample_table` ENGINE = YtTable()
AS SELECT a FROM `//tmp/sample_table` ORDER BY a;

-- Creating a table based on a select query with sorting by column b
CREATE TABLE `//tmp/sample_table_with_pk` ENGINE = YtTable() order by b
AS SELECT b,a FROM `//tmp/sample_table` order by b;

To overwrite the existing data, just add the <append=%false> option before the path:

Example:

INSERT INTO `<append=%false>//tmp/sample_table`
SELECT a+100 FROM `//tmp/sample_table` ORDER BY a.

When creating a table, you can specify additional attributes. To do this, just pass them to Engine in the form of YSON.

Example:

CREATE TABLE `//tmp/sample_table`(i Int64) engine = YtTable('{compression_codec=snappy}');

CREATE TABLE `//tmp/sample_table`(i Int64) engine = YtTable('{optimize_for=lookup}');

CREATE TABLE `//tmp/sample_table`(i Int64) engine = YtTable('{primary_medium=ssd_blobs}').

There are restrictions when saving results: default values, expressions for TTL, and compression formats for columns are ignored.

Table manipulation queries

CHYT supports the following operations on tables:

TRUNCATE TABLE [IF EXISTS].
RENAME TABLE … TO …;
EXCHANGE TABLES … AND …;
DROP TABLE [IF EXISTS].

Examples:

-- Deleting all data from a table
TRUNCATE TABLE `//tmp/sample_table`;
TRUNCATE TABLE IF EXISTS `//tmp/sample_table`;

-- Renaming a table
RENAME TABLE `//tmp/sample_table` TO `//tmp/sample_table_renamed`;

-- Exchanging the names of two tables
EXCHANGE TABLES `//tmp/sample_table` AND `//tmp/other_sample_table`;

-- Deleting a table
DROP TABLE `//tmp/sample_table`;
DROP TABLE IF EXISTS `//tmp/sample_table`;

Virtual columns

Each YTsaurus table has virtual columns. These columns are not visible in the DESCRIBE TABLE and SELECT * FROM queries and you can only get them by explicitly accessing them.

3 virtual columns are currently available for each table:

$table_index Int64: The table number in the order of enumeration in concatYtTables (or in the sort order in concatYtTablesRange). When reading a single table, the index is always zero.
$table_path String: The full path to the table in YTsaurus.
$table_name String: The table name, i.e. the last literal of the table path.

Input tables can be filtered quite effectively by described virtual columns, since their values are known before reading any data from the tables. This property can be used to read almost any subset of tables from the directory.

However, take into account that although the tables themselves will not be read, the table metainformation (for example, the schema) will still be loaded. Besides that, even if the table does not fall under the condition and will not be read, the user's access permissions to all tables specified in concatYtTables/concatYtTablesRange will still be checked. Therefore, we recommend specifying restrictions using the arguments of the concatYtTablesRange function for directories with a large number of tables.

Query examples:

SELECT $table_name FROM "//home/dev/username/t1";
-- Result
-- #	$table_name
-- 1	"t1"

SELECT * FROM "//home/dev/username/t1";
--Result
--#	a	b	c
--1	0	1	1

SELECT *, $table_name, $table_path, $table_index
FROM concatYtTables("//home/dev/username/t0", "//home/dev/username/t1");
-- Result
--#	a	b	c	$table_name	$table_path	$table_index
--1	0	0	0	"t0"	"//home/dev/username/t0"	0
--2	0	1	1	"t0"	"//home/dev/username/t0"	0
--3	0	1	1	"t1"	"//home/dev/username/t1"	1

Working with dynamic tables

You can read dynamic tables, including dynamic stores (fresh data in memory), from CHYT.

Attention

To make data in dynamic stores visible when reading from CHYT and Map-Reduce, the table must be remounted and the enable_dynamic_store_read = %true attribute must be set.

By default, if you try to read a dynamic table without the enable_dynamic_store_read = %true attribute, the query will end with an error. If you really need data only from chunk stores (with a delay of tens of minutes), then you can set chyt.dynamic_table.enable_dynamic_store_read to 0, after which CHYT will forcibly start reading only chunk stores (see the query settings page).

JOIN in CHYT

How to start a private clique