Sharding
This section describes ways of sharding dynamic tables. A description of the automatic sharding algorithm is given.
Dynamic tables are divided into tablets and a tablet is a unit of concurrency. . To distribute the load evenly, use the standard method: add an auxiliary key column to the front of the table that contains the hash of the part of the key to perform sharding for (for example, the hash from the first column). The result is a table whose keys are evenly distributed in the [0, 2^64-1]
range.
To split such a table into k tablets, it is sufficient to split the [0, 2^64-1]
range into k parts.
The value of the key column can be calculated on the client side and transmitted when writing to the table, but computed columns can be used as an alternative.
Computed columns
The YTsaurus system supports the feature that automatically calculates the value of a key column using a formula. You must specify this formula in the schema of this column in the expression
field. The key column can only depend on non-calculated key columns. When writing a row or searching for a row by key, the computed columns must be skipped.
For even distribution, better specify "expression" = "farm_hash(key)"
where key
is the prefix of the source key (farm_hash
is a built-in function that calculates FarmHash from arguments).
When using automatically computed columns, consider that the select_rows
operation infers the range of affected keys out of the predicate to optimize performance. If some of the values of the computed columns are not specified explicitly in the predicate, YTsaurus will try to supplement the condition with the value of the computed columns. In the current implementation, the result will be successful if those columns on which the calculated one depends are defined in the predicate by an equality or using the IN
operator. Calculation of the explicitly specified values of the computed columns in the range output does not take place.
Example of using compucolumns
Let's assume there is a table with hash, key, value
columns, hash
and key
are key columns, and the expression = "farm_hash(key)"
formula is specified in the schema for hash
. Then for insert, delete, and read operations by key, you only need to specify key
and value
. In order for the select_rows
operation to work efficiently, key
must be exactly specified in the predicate so that the YTsaurus system can calculate which hash
values to consider. For example, you can specify WHERE key = key_value
or WHERE key IN (key_value1, key_value2)
in the query.
If you specify WHERE key > key_lower_bound and key < key_upper_bound
, the range for hash
cannot be inferred. In some cases, enumeration of the values of the computed columns is possible. Enumeration occurs in the following cases:
- The expression of the computed column is
expression = "f(key / divisor)"
wherekey
anddivisor
are integers. In this case, all suchkey
values are enumerated and they generate differentexpression
values. This behavior is generalized in case there are multiple computed columns and multiplekey
occurrences with different divisors. - The expression is
expression = "f(key) % mod"
. In this case, theexpression
values within themod
value are enumerated, and thenull
value is also included in the enumeration.
If both enumeration methods can be applied, the one that generates the least number of values is chosen. The total number of keys generated by the enumeration is limited by the range_expansion_limit
parameter.
Automatic sharding
Sharding is needed to distribute the load evenly across the cluster. Automatic sharding includes:
- Table sharding.
- Redistributing tablets between tablet cells.
Sharding is needed for the table tablets to become approximately of the same size. Redistributing between tablet cells is needed for tablet cells to have an approximately equal amount of data. This is especially important for in-memory tables (with @in_memory_mode
other than none
), because the cluster memory is a very limited resource and some cluster nodes may become overloaded if the distribution fails.
You can configure balancing both on a per-table basis and for each tablet cell bundle.
A list of available bundle settings is available at //sys/tablet_cell_bundles/<bundle_name>/@tablet_balancer_config
and represented in the table.
Name | Type | Default value | Description |
---|---|---|---|
min_tablet_size | int | 128 MB | Minimum tablet size |
desired_tablet_size | int | 10 GB | Desired tablet size |
max_tablet_size | int | 20 GB | Maximum tablet size |
min_in_memory_tablet_size | int | 512 MB | Minimum in-memory table tablet size |
desired_in_memory_tablet_size | int | 1 GB | Desired in-memory table tablet size |
max_in_memory_tablet_size | int | 2 GB | Maximum in-memory table tablet size |
enable_tablet_size_balancer | boolean | %true | Enabling/disabling sharding |
enable_in_memory_cell_balancer | boolean | %true | Enabling/disabling moving in-memory tablets between cells |
enable_cell_balancer | boolean | %false | Enabling/disabling moving not-in-memory tablets between cells |
A list of available table settings (//path/to/table/@tablet_balancer_config
) is given in the table:
Name | Type | Default value | Description |
---|---|---|---|
enable_auto_reshard | boolean | %true | Enabling/disabling sharding |
enable_auto_tablet_move | boolean | %true | Enabling/disabling moving table tablets between cells |
min_tablet_size | int | - | Minimum tablet size |
desired_tablet_size | int | - | Desired tablet size |
max_tablet_size | int | - | Maximum tablet size |
desired_tablet_count | int | - | Desired number of tablets |
min_tablet_count | int | - | Minimum number of tablets (see explanation in the text) |
Previously, instead of //path/to/table/@tablet_balancer_config
, the balancer settings were set directly on the table. The following attributes could be found in the table:
- enable_tablet_balancer;
- disable_tablet_balancer;
- min_tablet_size;
- desired_tablet_size;
- max_tablet_size;
- desired_tablet_count.
These attributes are declared deprecated. They are linked to the corresponding values from //path/to/table/@tablet_balancer_config
, but their use is not recommended.
Previously, the enable_tablet_balancer
could either not exist or could take one of the true/false values. It now unambiguously corresponds to the enable_auto_reshard
option and either contains false
(balancing disabled) or does not exist (balancing enabled, default value).
If the desired_tablet_count
parameter is specified in the table settings, the balancer will attempt to shard the table by the specified number of tablets. Otherwise, if all three min_tablet_size
, desired_tablet_size
, and max_tablet_size
parameters are specified in the table settings and their values are allowable (i.e. min_tablet_size < desired_tablet_size < max_tablet_size
is true), the values of the specified parameters will be used instead of the default settings. Otherwise, the tablet cell bundle settings will be used.
The automatic sharding algorithm is as follows: the background process monitors the mounted tablets and as soon as it detects a tablet smaller than min_tablet_size
or larger than max_tablet_size
, it tries to bring it to desired_tablet_size
, possibly affecting adjacent tablets. If the desired_tablet_count
parameter is specified, the custom table size settings will be ignored and values will be calculated based on table size and desired_tablet_count
.
If the min_tablet_count
parameter is set, the balancer will not combine tablets if the resulting number of tablets is less than the limit. However, this option does not guarantee that the balancer will increase the number tablets if there are fewer now: if you use it, you need to pre-shard the table by the desired number of tablets yourself.
Disabling automatic sharding:
On the table:
yt set //path/to/table/@tablet_balancer_config/enable_auto_reshard %false
: Disable sharding.yt set //path/to/table/@tablet_balancer_config/enable_auto_tablet_move %false
: Disable moving tablets between cells.
On the tablet cell bundle:
yt set //sys/tablet_cell_bundles//@tablet_balancer_config/enable_tablet_size_balancer %false
: Disable sharding.yt set //sys/tablet_cell_bundles//@tablet_balancer_config/enable_{in_memory_,}cell_balancer %false
: Disable moving between in_memory/not-in_memory tablet cells.
Automatic sharding schedule
Sharding will inevitably unmount some of the tablets. To make the process more predictable, you can set up the balancer schedule. The setting can be per-cluster and per-bundle. The bundle schedule is in the //sys/tablet_cell_bundles//@tablet_balancer_config/tablet_balancer_schedule
attribute. Any arithmetic formula from the hours
and minutes
integer variables can be specified as a format. The bundle table balancing will only occur when the formula value is true (i.e. different from zero).
The background balancing process runs once every few minutes, so you should expect that the tablets can be in the unmounted state for 10 minutes after the formula becomes true.
Examples:
minutes % 20 == 0
: Balancing at the 0th, 20th, and 40th minute of each hour.hours % 2 == 0 && minutes == 30
: Balancing at 00:30, 02:30, ...
If no attribute value is specified for the bundle, the default value for the cluster is used. Example of a cluster-level balancing setup is shown in Table 3.
Cluster | Schedule |
---|---|
first-cluster | (hours * 60 + minutes) % 40 == 0 |
second-cluster | (hours * 60 + minutes) % 40 == 10 |
third-cluster | (hours * 60 + minutes) % 40 == 20 |