Concepts
Tablets
For each table, the key space is divided by a set of boundary keys into non-overlapping ranges called tablets. The tablets of a table are listed in its `tablets` attribute. Each tablet is described by the `TabletInfo` structure:
Name | Type | Description | Mandatory |
---|---|---|---|
tablet_id | Guid | Tablet ID | Yes |
statistics | TabletStatistics | Tablet statistics | Yes |
state | String | State (mounted/unmounted/frozen/mounting/unmounting/freezing/unfreezing/frozen_mounting) | Yes |
pivot_key | List | Boundary key where the tablet starts (for sorted tables only) | Yes |
cell_id | Guid | ID of the cell to which the tablet is mounted | No |
Example of tablet statistics
{
"chunk_count" = 1;
"compressed_data_size" = 324;
"disk_space" = 2502;
"disk_space_per_medium" = {
"default" = 2502;
};
"dynamic_memory_pool_size" = 0;
"hunk_compressed_data_size" = 0;
"hunk_uncompressed_data_size" = 0;
"memory_size" = 0;
"overlapping_store_count" = 1;
"partition_count" = 1;
"preload_completed_store_count" = 0;
"preload_failed_store_count" = 0;
"preload_pending_store_count" = 0;
"store_count" = 1;
"tablet_count" = 1;
"tablet_count_per_memory_mode" = {
"none" = 1;
"compressed" = 0;
"uncompressed" = 0;
};
"uncompressed_data_size" = 538;
"unmerged_row_count" = 2;
}
Each tablet logically consists of a set of sorted data chunks and a special memory area called the dynamic store. Chunks are stored in replicated form in a blob repository ^[storage of large amounts of unstructured data] and may overlap in key ranges. Written data first goes into the dynamic store. When the amount of data there reaches the limit set in the configuration, it is written to disk as a chunk. When the number of chunks within a tablet becomes large enough, some of them are merged in a process called compaction. During compaction, old and obsolete data may be deleted. For more information, see Compaction.
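The write path described above can be illustrated with a toy model. This is only a sketch of the general idea, not the actual tablet implementation: the class `ToyTablet`, its thresholds, and its methods are hypothetical names introduced for illustration, and real flush and compaction triggers are governed by the cluster configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTablet:
    # Hypothetical limits; the real ones come from the configuration.
    flush_threshold: int = 3       # rows kept in memory before a flush
    compaction_threshold: int = 3  # chunk count that triggers compaction
    dynamic_store: dict = field(default_factory=dict)
    chunks: list = field(default_factory=list)  # sorted (key, value) lists, oldest first

    def write(self, key, value):
        # New data always lands in the in-memory dynamic store first.
        self.dynamic_store[key] = value
        if len(self.dynamic_store) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the dynamic store as one sorted chunk and start a fresh store.
        self.chunks.append(sorted(self.dynamic_store.items()))
        self.dynamic_store = {}
        if len(self.chunks) >= self.compaction_threshold:
            self.compact()

    def compact(self):
        # Merge all chunks; newer chunks overwrite older values for the same key.
        merged = {}
        for chunk in self.chunks:
            merged.update(chunk)
        self.chunks = [sorted(merged.items())]

    def lookup(self, key):
        # Check the dynamic store first, then chunks from newest to oldest.
        if key in self.dynamic_store:
            return self.dynamic_store[key]
        for chunk in reversed(self.chunks):
            for k, v in chunk:
                if k == key:
                    return v
        return None
```

In this model, chunks produced by separate flushes can cover overlapping key ranges, which is why a lookup has to consult several of them until compaction merges them.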
A list of all tablets of all dynamic tables in the system is available at `//sys/tablets`.
Boundary keys
To change the set of tablets of a table, use the `reshard_table` command. When changing the number of tablets of a sorted dynamic table, boundary keys must be specified. Tablet boundary keys are arbitrary sequences of values and need not have the same length as the key columns.
One option is to use a hash of some of the subsequent key components as the leading key component, so that data is distributed evenly across the key space. The boundary keys can then be sequences of length 1 consisting of numbers that evenly divide the range of hash function values.
The boundary key of the first tablet of a table is always the empty sequence, which effectively acts as minus infinity, and cannot be changed.
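As a rough illustration of evenly spaced boundary keys, the sketch below builds length-1 pivot keys over a hash range and finds the tablet that a given key falls into. The function names and the 64-bit hash range are assumptions made for this example. It also assumes that keys compare lexicographically with a shorter prefix sorting first, which is how Python compares lists.

```python
from bisect import bisect_right

HASH_SPACE = 2 ** 64  # assumption: the hash component is an unsigned 64-bit value


def even_pivot_keys(tablet_count, hash_space=HASH_SPACE):
    """Return pivot keys that split the hash range into equal parts."""
    # The first tablet always starts at the empty key (minus infinity).
    pivots = [[]]
    for i in range(1, tablet_count):
        pivots.append([i * hash_space // tablet_count])
    return pivots


def tablet_index(pivots, key):
    """Return the index of the tablet whose range contains the key."""
    # The owning tablet is the last one whose pivot key is <= the row key.
    return bisect_right(pivots, key) - 1


pivots = even_pivot_keys(4, hash_space=100)
print(pivots)                              # [[], [25], [50], [75]]
print(tablet_index(pivots, [30, "user"]))  # 1
```

Note that the pivot keys here have length 1 while the row key has two components, matching the statement that boundary keys need not be as long as the key columns.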
Partitions
Tablets in dynamic tables are divided into partitions so that queries can be processed in parallel within a cluster node. A typical partition size is several hundred megabytes. If chunks grow larger than the partition size, a background process splits them into smaller chunks.
Tablet cells
Tablets are served by entities called tablet cells. Each cell is a complex automaton whose state is replicated by the Hydra subsystem. Each cluster node has so-called tablet slots: virtual positions to which cells can be assigned.
Cells in the cluster are created by the administrator during initial setup. The total number of cells should be close to the total number of tablet slots on the cluster nodes, but with some slots left spare so that cells can be restarted elsewhere if some cluster nodes fail.
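One way to reason about this margin is to ask how many cells still fit if the largest nodes go down. The helper below is a hypothetical back-of-the-envelope calculation, not a cluster tool, and it assumes the per-node slot counts are known.

```python
def max_cells_tolerating_failures(slots_per_node, failed_nodes):
    """Slots left in the worst case, when the nodes with the most slots fail."""
    survivors = max(len(slots_per_node) - failed_nodes, 0)
    # Sorting ascending and keeping the first `survivors` entries
    # drops the nodes with the most slots.
    return sum(sorted(slots_per_node)[:survivors])


# Four nodes with 10 slots each: 40 slots in total,
# but only 30 cells can be served after losing any one node.
print(max_cells_tolerating_failures([10, 10, 10, 10], failed_nodes=1))  # 30
```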
Each tablet cell belongs to a certain bundle. The name of the bundle in which the cell is created is specified in the `tablet_cell_bundle` attribute of a specific table or one of the parent directories. By default, tablet cells are created in the `default` bundle.
A list of all cells is available at `//sys/tablet_cells`. Each cell maintains changelogs (mutation logs) and sometimes records state snapshots. The logs go into the `//sys/tablet_cells/<cell_id>/changelogs` directory and the snapshots go into the `//sys/tablet_cells/<cell_id>/snapshots` directory as files.
The master server automatically selects a specific cluster node and an available tablet slot on that node to serve the tablet cell. If the cluster node serving the cell fails, after some time the master server will select a free slot on another node and start the cell on that node.
Tablet cells have the attributes indicated in the table:
Name | Type | Description |
---|---|---|
id | Guid | Cell ID. |
peers | Dict | Dict containing the name of the cluster node serving the cell and the status of the cell. |
tablet_cell_bundle | String | Name of the tablet cell bundle to which the cell belongs. |
status | Dict | Cell state, decommission flag. |
total_statistics | Dict | Various tablet cell statistics. |
Cell state example
{
"health" = "good";
"decommissioned" = %false;
}
Tablet cell bundles
Cells are usually created in groups with the same settings. Bundles are needed to combine cells on the basis of common settings. In the current implementation, bundles are the only reliable way to isolate the load, provided that the bundles are properly distributed across the cluster nodes.
Bundles are identified by string names. The list of bundles in the system is available at `//sys/tablet_cell_bundles`. The system always has a built-in, non-removable bundle named `default`. Bundle settings are specified in the `options` attribute when the bundle is created and can also be changed later. Changing the settings in the `options` attribute requires unmounting all tables in the bundle and recreating all its cells.
Each table, including a static one, has the `tablet_cell_bundle` attribute, which stores the name of the bundle whose cells the table will be mounted to. For more information about mounting, see Table mounting.
Each bundle has the `node_tag_filter` attribute, which is used to select the cluster nodes for tablet cells in the given bundle. The filter is a logical expression built from the operators `&` (logical "and"), `|` (logical "or"), `!` (logical "not"), and parentheses. For example, `foo | (bar & !baz)`.
The variables in the expression are the possible tag values. If the cluster node has the specified tag, the corresponding variable takes the value `true`; otherwise it takes the value `false`. Tablet cells are placed only on cluster nodes for which the entire expression evaluates to `true`.
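To make the evaluation rule concrete, here is a minimal sketch of how such a filter could be checked against the tags of a node. It is not the system's own parser. The function names, the set of characters allowed in tag names, and the operator precedence (`!` binds tightest, then `&`, then `|`) are assumptions made for this example.

```python
import re

# Assumption: tag names consist of letters, digits, underscores, and hyphens.
TOKEN = re.compile(r"\s*([A-Za-z0-9_-]+|[&|!()])")


def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def matches(expr, node_tags):
    """Return True if a node with the given tags satisfies the filter."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        if peek() == "(":
            take()
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        token = take()
        if token is None or token in {"&", "|", ")"}:
            raise ValueError("expected a tag name")
        # A variable is true exactly when the node carries that tag.
        return token in node_tags

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return result


print(matches("foo | (bar & !baz)", {"bar"}))         # True
print(matches("foo | (bar & !baz)", {"bar", "baz"}))  # False
```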
Tablet cell bundles have the attributes indicated in the table:
Name | Type | Description |
---|---|---|
name | String | Bundle name. |
options | TabletCellOptions | Settings of all tablets of this bundle. |
tablet_cell_count | Int | Number of cells in a bundle. |
tablet_cell_ids* | Guid | IDs of cells in a bundle. |
The state of the bundle, the state of the tablet cells, and the list of cluster nodes included in the bundle are displayed on the Tablet cells tab.
* The parameter appears in the response several times.