This section contains information about CHYT-specific settings. ClickHouse has a large number of different settings which can be specified in different ways: via the GET parameter of a query, directly in the query via the SETTINGS clause, when using YQL via pragma, or in other ways. For a list of the original ClickHouse settings, see the documentation.
The settings related to CHYT start with the chyt. prefix. Some settings are placed into separate subsections: in this case, the name of the setting consists of several tokens separated by dots.
As usual in ClickHouse, logical settings use 0 as False and 1 as True.
As usual, many of the settings include certain optimizations that are enabled by default and are listed only to emphasize the possibility of disabling them in case of incorrect optimization.
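The naming rules above can be sketched as a small helper that composes a dotted setting name and appends a SETTINGS clause to a query. This is a minimal illustration, not part of CHYT: the helper names (chyt_setting_name, format_value, with_settings) are hypothetical, and the example name is built from the execution section and the join_node_limit setting described later in this section.

```python
def chyt_setting_name(*tokens: str) -> str:
    """Join section and setting tokens with dots under the chyt. prefix."""
    return ".".join(("chyt",) + tokens)


def format_value(value) -> str:
    """Render a setting value; logical values become 1 or 0."""
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, str):
        return "'" + value.replace("'", "\\'") + "'"
    return str(value)


def with_settings(query: str, settings: dict) -> str:
    """Append a SETTINGS clause to a query (hypothetical helper)."""
    if not settings:
        return query
    clause = ", ".join(
        f"{name} = {format_value(value)}" for name, value in settings.items()
    )
    return f"{query} SETTINGS {clause}"


name = chyt_setting_name("execution", "join_node_limit")
query = with_settings("SELECT 1", {name: 0})
print(query)
```

The same name and value can be passed through any of the other mechanisms listed above; only the SETTINGS clause form is sketched here.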
[%true]: Enables the use of the fast column-by-column reading interface for tables in scan format.
[%true]: Enables inference of values for computed key columns from the predicate in WHERE. For example, if the key_hash key column is specified in the schema as the result of the farm_hash(key) expression and the query condition contains the key = 'xyz' expression, the key_hash = 16518849956333482075 consequence will be automatically added to the predicate. Only conditions of the following types are supported: column = constant expression, column tuple = tuple of constant expressions, column IN tuple of constant expressions, and column tuple IN tuple of constant expression tuples.
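The inference described above can be illustrated with a short sketch. It is a simplified model, not the CHYT implementation: the predicate representation, the deduce function, and the COMPUTED table are hypothetical, and stand_in_hash is an arbitrary 64-bit hash used in place of farm_hash, so it does not reproduce the number from the example. Only the single-column = and IN forms are modeled.

```python
import hashlib


def stand_in_hash(value: str) -> int:
    """Hypothetical 64-bit hash; NOT farm_hash, used only for illustration."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Computed column -> (source column, expression from the schema).
COMPUTED = {"key_hash": ("key", stand_in_hash)}


def deduce(predicate):
    """Extend a predicate with consequences for computed columns.

    predicate is a list of (column, op, operand) with op in {"=", "IN"};
    for "IN" the operand is a tuple of constants.
    """
    extra = []
    for column, op, operand in predicate:
        for computed, (source, expression) in COMPUTED.items():
            if column != source:
                continue
            if op == "=":
                extra.append((computed, "=", expression(operand)))
            elif op == "IN":
                values = tuple(expression(v) for v in operand)
                extra.append((computed, "IN", values))
    return predicate + extra


result = deduce([("key", "=", "xyz")])
print(result)
```

The added condition on key_hash is what allows the read to be narrowed by the computed key column even though the query only mentions key.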
[%true]: Enables an additional range inference step that uses tablet pivot keys when querying a dynamic table.
composite: A section with settings related to composite
[binary]: The default format for representing YSON strings. Possible values are binary, text, and pretty. Note that options other than binary are less efficient, because they require an explicit conversion from binary format. Functions for working with YSON work with any of the possible YSON formats.
dynamic_table: A section with settings related to working with dynamic tables.
[%true]: Work with dynamic tables by reading data from dynamic stores. If you enable this option, you cannot read a table that was not mounted with the enable_dynamic_store_read = %true attribute. Attempting to read will cause an error. If this option is disabled, any query will forcibly read only data from chunk stores, i.e. not including the most recent data from memory store (regardless of whether the table is mounted with enable_dynamic_store_read = %true or not).
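The two cases above reduce to a small decision table. The sketch below is only a model of the described behavior; the function name and its parameters are hypothetical and do not correspond to a CHYT API.

```python
def plan_dynamic_table_read(read_dynamic_stores: bool, mounted_with_attribute: bool) -> list:
    """Return which stores a query reads, following the description above.

    read_dynamic_stores: value of the option described above.
    mounted_with_attribute: whether the table was mounted with
    enable_dynamic_store_read = %true.
    """
    if not read_dynamic_stores:
        # Option disabled: only chunk stores are read, whatever the mount attribute is.
        return ["chunk stores"]
    if not mounted_with_attribute:
        # Option enabled, but the table does not allow dynamic store reads.
        raise RuntimeError(
            "table is not mounted with enable_dynamic_store_read = %true"
        )
    return ["chunk stores", "dynamic stores"]


print(plan_dynamic_table_read(True, True))
print(plan_dynamic_table_read(False, True))
```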
execution: A section with settings that affect the process of scheduling and executing a query:
query_depth_limit: The limit on the maximum depth of a distributed query. When reaching the specified depth, the query ends with an error. The 0 value means there is no limit.
[distribute_initial]: JOIN query mode on top of a YTsaurus table. Possible options:
local: JOIN is executed locally on the query coordinator. The left- and right-hand side tables (or subquery) are read according to
JOIN in the initial query is executed in a distributed manner, while JOINs in the secondary queries are executed locally.
JOIN in all the queries (initial and secondary) is executed in a distributed manner. We do not recommend using this mode, because it can lead to an exponential number of secondary queries in case of multiple JOINs.
SELECT query is read and processed completely only on one instance (coordinator).
SELECT is executed in a distributed manner on clique instances in the initial query. In secondary queries, the execution is local.
SELECT in all the queries (initial and secondary) is executed in a distributed manner. We do not recommend using this mode, because it can lead to an exponential number of secondary queries.
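To see why distributing every query can produce an exponential number of secondary queries, consider a rough counting model. It assumes that each distributed query fans out to every clique node and that queries are nested to a fixed depth; both are simplifying assumptions, and the mode labels initial_only and everywhere are hypothetical names for the second and third options above, not CHYT values.

```python
def count_secondary_queries(mode: str, nodes: int, nesting: int) -> int:
    """Count secondary queries under a simplified fan-out model.

    nodes: number of clique nodes each distributed query fans out to.
    nesting: how many levels of nested queries there are.
    """
    if mode == "local":
        # Everything runs on the coordinator.
        return 0
    if mode == "initial_only":
        # Only the initial query is distributed; nested ones run locally.
        return nodes
    if mode == "everywhere":
        # Every level fans out again: nodes + nodes^2 + ... + nodes^nesting.
        return sum(nodes ** level for level in range(1, nesting + 1))
    raise ValueError(f"unknown mode: {mode}")


for mode in ("local", "initial_only", "everywhere"):
    print(mode, count_secondary_queries(mode, nodes=4, nesting=3))
```

Under this model the growth in the last mode is what the query_depth_limit setting bounds, since the query fails once the specified depth is reached.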
join_node_limit: The maximum number of clique nodes on which the distributed JOIN query is allowed. The 0 value means there is no limit (default).
select_node_limit: The maximum number of clique nodes on which the distributed SELECT query is allowed. The 0 value means there is no limit (default).
distribution_seed: The seed used for deterministic distribution of the query to the clique nodes.
input_streams_per_secondary_query: The limit on the number of concurrent table read threads on each clique node during query execution. In case of the 0 value (default), the value of the max_threads ClickHouse setting is used as the limit.
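The "0 means not set" convention of the limits above, together with seed-based node selection, can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the use of a seeded pseudo-random sample is only one way to obtain a deterministic distribution; the source does not describe the actual selection algorithm.

```python
import random


def effective_limit(limit: int, fallback: int) -> int:
    """Treat 0 as 'not set' and use the fallback instead."""
    return fallback if limit == 0 else limit


def pick_nodes(nodes: list, node_limit: int, seed: int) -> list:
    """Choose the nodes for a distributed query, deterministically per seed.

    node_limit plays the role of join_node_limit or select_node_limit;
    seed plays the role of distribution_seed (assumed semantics).
    """
    count = min(len(nodes), effective_limit(node_limit, len(nodes)))
    return random.Random(seed).sample(nodes, count)


clique = ["node-0", "node-1", "node-2", "node-3"]
print(pick_nodes(clique, node_limit=2, seed=42))
print(pick_nodes(clique, node_limit=0, seed=42))

# input_streams_per_secondary_query = 0 falls back to max_threads.
print(effective_limit(0, fallback=16))
```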
[%true]: Enables an optimization that pre-filters the right-hand side JOIN table (or subquery) by the sort key of the left-hand side table.
[%true]: Adjusts the behavior of FULL JOIN when the right-hand side table (subquery) has Null values in the key columns. If %true is set (default), the query is executed as usual, and the query result contains all rows with the Null value in the key columns. If %false is set, all or some rows from the right-hand side table with the Null value in the key columns may be missing, and the query can be executed more efficiently. If there are no Null values in the key columns of the right-hand side table (subquery), the query result does not depend on the value of this option.
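The effect of this option can be shown on a toy FULL JOIN. The sketch is an in-memory model with a hypothetical keep_null_keys parameter standing in for the option. For simplicity it drops all right-hand side rows with a Null key when the option is off, whereas the description above says that all or only some of them may be missing.

```python
def full_join(left, right, keep_null_keys=True):
    """Toy FULL JOIN of (key, value) rows; None models Null.

    Returns rows of (key, left_value, right_value).
    """
    if not keep_null_keys:
        # Simplification: drop every right-hand row whose key is Null.
        right = [(key, value) for key, value in right if key is not None]
    result = []
    matched_right = set()
    for left_key, left_value in left:
        found = False
        if left_key is not None:  # a Null key never matches anything
            for index, (right_key, right_value) in enumerate(right):
                if right_key == left_key:
                    result.append((left_key, left_value, right_value))
                    matched_right.add(index)
                    found = True
        if not found:
            result.append((left_key, left_value, None))
    for index, (right_key, right_value) in enumerate(right):
        if index not in matched_right:
            result.append((right_key, None, right_value))
    return result


left_rows = [(1, "a"), (2, "b")]
right_rows = [(1, "x"), (None, "y")]
print(full_join(left_rows, right_rows, keep_null_keys=True))
print(full_join(left_rows, right_rows, keep_null_keys=False))
```

With no Null keys on the right-hand side, both calls return the same rows, which matches the last sentence of the description.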
[with_mergeable_state (may be changed in the future)]: The minimum query execution stage after which distributed writing to the table is allowed when the parallel_distributed_insert_select ClickHouse setting is enabled. Possible values:
none: Never use distributed writing.
with_mergeable_state (default, may be changed in the future): Use distributed writing if the query can be executed in a distributed manner at least to the with_mergeable_state stage. Data aggregation (including LIMIT BY) and the ORDER BY and LIMIT clauses can be executed independently on each clique node, so in case of distributed writing the query execution result may contain several rows with the same aggregation key, rows may not be sorted in the ORDER BY order, and the number of rows may exceed the specified LIMIT (but the total number of rows will not exceed LIMIT * number of instances in the clique).
after_aggregation: Use distributed writing if the query can be executed in a distributed manner at least to the after_aggregation stage. Data aggregation (including LIMIT BY) is guaranteed to be executed to the end, while the ORDER BY and LIMIT clauses can be executed independently on each clique node (which can break the sort order and exceed the specified row limit, as described above). If the query cannot be executed in a distributed manner to the after_aggregation stage, data will be written locally from the query coordinator.
complete: Use distributed writing only if the query can be executed in a completely distributed manner. The result of execution of the query with distributed writing in this case is indistinguishable from an ordinary query: all the clauses (aggregation, ORDER BY, and LIMIT) will be executed to the end. If the query cannot be executed in a distributed manner to the complete stage, data will be written locally from the query coordinator.
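The stage rules above can be modeled in a few lines. The sketch assumes the stages are ordered with_mergeable_state, then after_aggregation, then complete, as the descriptions suggest; the function names and the reachable_stage argument (the furthest stage to which a query can run in a distributed manner) are hypothetical. The second function shows why a LIMIT applied independently on each node can exceed the requested limit while staying within LIMIT * number of instances.

```python
STAGES = ["with_mergeable_state", "after_aggregation", "complete"]


def use_distributed_write(setting: str, reachable_stage) -> bool:
    """Decide between distributed writing and writing from the coordinator."""
    if setting == "none" or reachable_stage is None:
        return False
    return STAGES.index(reachable_stage) >= STAGES.index(setting)


def write_with_per_node_limit(rows_per_node, limit):
    """Each node applies LIMIT on its own and writes its share."""
    written = []
    for rows in rows_per_node:
        written.extend(rows[:limit])
    return written


print(use_distributed_write("after_aggregation", "with_mergeable_state"))
print(use_distributed_write("with_mergeable_state", "complete"))

rows_per_node = [[1, 2, 3], [4, 5, 6], [7]]
written = write_with_per_node_limit(rows_per_node, limit=2)
print(len(written), "rows written, bound is", 2 * len(rows_per_node))
```

Here five rows are written for LIMIT 2 on three nodes, which is more than the limit but within the bound of six.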
caching: A section with settings related to the different caches used in CHYT:
[sync]: The data invalidation mechanism mode in the table attribute cache when the table is changed (the CREATE TABLE/INSERT INTO/DROP TABLE queries). Possible values:
none: Data invalidation in the cache does not occur. Reading tables immediately after a change may return the old data or cause an error.
local: Data invalidation occurs only in the local instance cache, without any rpc queries. Reading tables on other instances immediately after a change may return the old data or cause an error.
async: Data invalidation occurs synchronously in the local cache and asynchronously on all clique instances. The query completes without waiting for data invalidation on the other clique instances. Errors during data invalidation in the cache do not cause a query to fail. Reading tables on other instances immediately after a change may still return the old data or cause an error, but this time interval after a data change is much shorter than in the none and local cases. We recommend using an invalidation mode not lower than this one.
sync (default): Data invalidation occurs synchronously on all clique instances. The query execution time may be slightly longer, because it is necessary to wait for confirmation of cache invalidation on all instances before completing. Errors during cache invalidation will cause a query to fail. After a successful modifying query, you can immediately read the modified tables.
invalidate_request_timeout: The timeout in milliseconds (ms) for rpc queries to clique instances at the cache invalidation stage. If this timeout is exceeded, an error will be shown.
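The difference between the four invalidation modes can be seen in a toy model of per-instance caches. Everything here is hypothetical (the CliqueCaches class, its methods, and the explicit deliver_pending step that stands in for asynchronous requests arriving later); timeouts and invalidation errors are not modeled.

```python
MODES = ("none", "local", "async", "sync")


class CliqueCaches:
    """Toy model: one table attribute cache per clique instance."""

    def __init__(self, instance_count):
        self.caches = [{} for _ in range(instance_count)]
        self.pending = []  # invalidations not yet delivered (async mode)

    def warm(self, table, attributes):
        for cache in self.caches:
            cache[table] = attributes

    def modify(self, coordinator, table, mode):
        """Model a modifying query executed on the coordinator instance."""
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        if mode == "none":
            return
        self.caches[coordinator].pop(table, None)  # local invalidation
        others = [i for i in range(len(self.caches)) if i != coordinator]
        if mode == "async":
            self.pending.extend((i, table) for i in others)
        elif mode == "sync":
            for i in others:
                self.caches[i].pop(table, None)

    def deliver_pending(self):
        for instance, table in self.pending:
            self.caches[instance].pop(table, None)
        self.pending.clear()

    def stale_instances(self, table):
        return [i for i, cache in enumerate(self.caches) if table in cache]


def stale_after(mode):
    clique = CliqueCaches(3)
    clique.warm("t", {"row_count": 1})
    clique.modify(0, "t", mode)
    return clique.stale_instances("t")


for mode in MODES:
    print(mode, stale_after(mode))
```

In this model, async leaves the other instances stale only until the pending invalidations are delivered, while sync leaves no stale instance once the query returns.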
concat_tables: A section with settings related to the functions for merging tables of the
[read_as_null]: The mode for inferring the general column schema when a column is missing in one of the tables. Possible values:
throw: The query will end with an error.
drop: The column will be missing in the general schema. You cannot read such a column.
read_as_null (default): The column type will be set to Optional(T). In the rows from tables where this column is missing, the value in the column will be Null.
[throw]: The mode for inferring the general column schema when column types in different tables differ and cannot be reduced to a general form. Possible values:
throw (default): The query will end with an error.
drop: The column will be missing in the general schema. You cannot read such a column.
read_as_any: The column type will be set to Any. All values will be represented as YSON strings which can be handled using the functions of the
[%false]: Allows combining multiple tables if they have no common columns.
max_tables: The maximum number of tables that can be combined for reading. If this limit is exceeded, the query will end with an error.
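The schema rules of this section can be sketched as a merge over per-table schemas. This is a simplified model: the function and its parameter names (missing_mode, mismatch_mode, allow_no_common_columns) are hypothetical, schemas are plain dictionaries from column name to type name, and any two different type names are treated as irreducible to a general form.

```python
def merge_schemas(schemas, missing_mode="read_as_null", mismatch_mode="throw",
                  allow_no_common_columns=False, max_tables=None):
    """Infer a general schema for several tables (toy model)."""
    if missing_mode not in ("throw", "drop", "read_as_null"):
        raise ValueError(f"unknown missing_mode: {missing_mode}")
    if mismatch_mode not in ("throw", "drop", "read_as_any"):
        raise ValueError(f"unknown mismatch_mode: {mismatch_mode}")
    if max_tables is not None and len(schemas) > max_tables:
        raise ValueError("too many tables")
    common = set(schemas[0]).intersection(*schemas[1:]) if schemas else set()
    if len(schemas) > 1 and not common and not allow_no_common_columns:
        raise ValueError("tables have no common columns")

    # Keep columns in order of first appearance.
    columns = []
    for schema in schemas:
        for column in schema:
            if column not in columns:
                columns.append(column)

    merged = {}
    for column in columns:
        types = {schema[column] for schema in schemas if column in schema}
        missing = any(column not in schema for schema in schemas)
        if len(types) > 1:  # assumption: different names cannot be reconciled
            if mismatch_mode == "throw":
                raise TypeError(f"type mismatch in column {column}")
            if mismatch_mode == "read_as_any":
                merged[column] = "Any"
            continue
        (column_type,) = types
        if missing:
            if missing_mode == "throw":
                raise ValueError(f"column {column} is missing in some tables")
            if missing_mode == "drop":
                continue
            column_type = f"Optional({column_type})"
        merged[column] = column_type
    return merged


tables = [{"a": "Int64", "b": "String"}, {"a": "Int64"}]
print(merge_schemas(tables))
print(merge_schemas(tables, missing_mode="drop"))
```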