Instance configuration
The article contains information about the configuration of CHYT instances.
The CHYT configuration is a YSON document consisting of two sections: the clickhouse
section and the yt
section. The first section contains settings that affect the logic of the ClickHouse computation engine: they are almost identical to the original ClickHouse engine settings. The second section contains the settings of the YTsaurus Runtime built into the CHYT instances: these settings strongly resemble many of the settings from, for example, the operation specification.
An important difference between CHYT and original ClickHouse is that the settings are specified not in XML format, but in YSON format that is native for YTsaurus. The principle by which the settings are interpreted in this format is described below.
How the configuration is formed
The configuration is formed sequentially by overlaying the following documents:
- The basic config from Cypress. By default, this is a document from
//sys/clickhouse/config
. You can specify a different document using thecypress-base-config-path
option when you start the clique. - The computed part of the config generated by the startup procedure. Contains various memory and CPU limits, as well as other metainformation.
- The config passed in the
clickhouse-config
option when you start the clique. - The native cluster connection configuration taken from the contents of
//sys/@cluster_connection
.
Attention!
All clique instances always work with the same configuration and it is impossible to change the configuration without restarting the clique.
YTsaurus part configuration
The YTsaurus part of the configuration is in the yt
top-level section.
-
settings
: CHYT-specific default query settings. Changing the setting in this section changes the default value for this setting for all queries in the clique. -
table_writer
: Table Writer configuration. -
table_attribute_cache
: Table attribute cache configuration. This cache significantly improves the responsiveness of CHYT, but at the moment it can potentially lead to non-consistent reads (data has already appeared in the table, but CHYT does not see it yet). To disable this cache, use the following configuration:{read_from=follower;expire_after_successful_update_time=0;expire_after_failed_update_time=0;refresh_time=0;expire_after_access_time=0}
-
create_table_default_attributes
[{optimize_for = scan}
]: The default attributes with which tables will be created duringCREATE
queries in CHYT. -
health_checker
: The Health Checker configuration, see also the corresponding section on the CHYT dashboard. Consists of 3 fields:—
queries
[["select * from `//sys/clickhouse/sample_table`"]
]: A list of test queries whose performance will be checked regularly.Attention!
Queries from Health Checker are currently executed on behalf of user
yt-clickhouse
who must have theread
permissions to access the table.—
period
[60000]:The test triggering period in milliseconds.—
timeout
[0.9 * period / len(queries)
]: The triggering timeout for each of the configured queries. If the query does not fit into the timeout, it is consideredfailed
. -
subquery
: The configuration of the main system part which coordinates the execution of ClickHouse queries on top of YTsaurus tables. Before configuring this part, read the article about query execution within a clique.—
min_data_weight_per_thread
[64 MiB]: (in bytes) When splitting the input into subqueries, the coordinator will try to give no less than the given amount to each core of each instance.—
max_data_weight_per_subquery
[50 GiB]: (in bytes) The maximum allowable amount of data to be processed on one core of one instance. This restriction is protective and protects clique users from accidentally running a huge query that processes petabytes. The constant of 50 GiB is selected, because such amount is processed on a single core in about dozens of minutes. -
show_tables
: TheSHOW TABLES
query behavior configuration. Enables you to configure the list of directories in whichSHOW TABLES
will show a list of tables.—
roots
: The list of YPath paths of directories in Cypress from which the tables will be collected forSHOW TABLES
.
Attention!
The max_data_weight_per_subquery
limit uses column-by-column statistics to account for column sampling from processed tables. Column-by-column statistics may be missing for old tables created before the statistics became available. For such tables, columns in the limit are not taken into account. This means that when processing a very narrow slice of columns which forms 1% of the total table volume and is 1 GiB per core, CHYT will calculate that 100 GiB per core are processed. CHYT will not run such a query with the default settings.
If this scenario is required or you need to process large amounts of data, configure this setting to a random large value.
Configuring a ClickHouse part
The ClickHouse part of the configuration lives in the clickhouse
top-level section.
Converting a ClickHouse XML configuration into a CHYT YSON configuration
This section describes how to convert a regular clickhouse server
XML configuration into an equivalent CHYT configuration.
The rules for converting a ClickHouse XML configuration into a CHYT YSON configuration can be described as follows:
-
Any non-multiple configuration node in the ClickHouse configuration is a dict (
map
). -
A multiple node is represented by a list (
list
).
Below is an example of converting an artificial XML configuration into a YSON configuration.
<foo>42</foo>
<bar>qwe</bar>
<baz>
<quux>3.14</quux>
</baz>
<baz></baz>
<baz>hi!</baz>
{
foo = 42;
bar = "qwe";
baz = [
{quux = 3.14};
{};
"hi!";
];
}
clickhouse
section
The The clickhouse
section is passed to the ClickHouse configuration. Below are two parts of this section that may be convenient in CHYT:
-
settings
: Contains the query settings that will be applied to a default user profile, i.e. the profile from which the settings for all queries will be taken. For example, you can write the{extremes = 1}
dict to this section to enable the extremes ClickHouse option. -
dictionaries
([]
): The configuration of external dicts. The value must be a list of dict configurations. Each dict is configured bymap
with the following fields that retain the meaning of the original ClickHouse configuration:name
: The name of the external dict.source
: The data source for the external dict.layout
: Representation of the external dict in the instance memory.structure
: The schema of data stored in the dict.lifetime
: The dict lifetime.
External dicts
CHYT supports all layout
, structure
, and lifetime
settings of regular ClickHouse.
Note
You can specify sources from original ClickHouse as the source
, but they are not guaranteed to work. In particular, sources that require networking should technically work, but may face a lack of network access in practice.
There is also an additional data source type in which you can use static tables in YTsaurus. This type is called yt
and has one parameter:
path
: The path to the static table on the cluster which will serve as a data source for the dict.
As an example of connecting a YTsaurus table as an external ClickHouse dict, you can look at:
- The CHYT configuration example that connects this table as an external ClickHouse dict.
- Example of using an external dict in queries:
select dictGet('OS', 'OS', toUInt64(38)) as os_name,
dictGetHierarchy('OS', toUInt64(38)) as hierarchy,
dictGetChildren('OS', toUInt64(101)) as children
Configuration example
Below is an example of the complete configuration that can be specified with a set-speclet
command.
{
instance_count = 1;
query_settings = {
extremes = 1;
};
yt_config = {
subquery = {
max_data_weight_per_subquery = 1000000000000;
};
};
clickhouse_config = {
dictionaries = [
{
name = OS;
layout = {flat = {}};
structure = {
id = {name = Id};
attribute = [
{
name = "OS";
type = "Nullable(String)";
null_value = "NULL";
};
{
name = "ParentId";
type = "Nullable(UInt64)";
null_value = "NULL";
hierarchical = %true;
};
{
name = "RootId";
type = "Nullable(UInt64)";
null_value = "NULL";
};
];
};
lifetime = 0;
source = {yt = {path = "//sys/clickhouse/dictionaries/OS"}};
};
];
};
}