Tablet cell bundle management
YTsaurus provides flexible management of dynamic table bundles. For each bundle you can set the number of tablet nodes, the distribution of threads across thread pools, and memory by categories. If a node fails, automation will assign a new node from the spare pool to the bundle.
Overview
The system that manages bundles is called the Bundle controller. It manages instances of tablet nodes. Each node is bound to a certain bundle or is in the spare pool. Each bundle and instance belongs to a specific zone—a set of instances that share the same spare pool. Currently, only one zone is supported: zone_default
.
The Bundle controller sets the node_tag_filter
attribute on each bundle in the form zone_default/<bundle_name>
. This value is also written to the user_tags
attribute of all nodes assigned to the bundle. Each node can belong to no more than one bundle. In case of node failure, the Bundle controller automatically assigns a new node to the bundle.
For each bundle, you can specify its configuration: the number of tablet nodes, the number of tablet cells per node, the distribution of memory by categories, and the number of threads in certain thread pools. You can manage these settings through the user interface on the bundle's page.
The Bundle controller also manages the accounts where the bundle's tablet cell changelogs and snapshots are stored.
Initial setup
To simplify the setup of a typical scenario, a script has been prepared: bundle_controller_tools.py. It is recommended to run it with the --init-all
flag. When launching, you must specify the --cpu
and --memory
flags corresponding to the parameters of the tablet node instances. If there are instances of different sizes in the cluster, it is recommended to set the parameters for the smallest instance.
The following sections describe the setup steps performed by the script.
System directories
When running bundle_controller_tools.py
, system directories are created automatically. To skip this step, you can use the --no-init-system-directories
flag.
To start using the Bundle controller, you need to create the following directories:
//sys/bundle_controller/coordinator
//sys/bundle_controller/controller/zones/zone_default
//sys/bundle_controller/controller/bundles_state
You must also create an account named bundle_system_quotas
.
Zone zone_default
To automatically set up the zone, run bundle_controller_tools.py
with the --init-default-zone
flag.
Instances of each type are bound to a specific bundle or are in the spare pool. Each bundle and instance belongs to a specific zone. In the zone config, you must specify the size of the instances installed in the cluster (CPU and RAM), as well as the default settings for node thread pools and memory categories.
Warning
Currently, the Bundle controller can only work with clusters in which all instances are of the same size. This means that the resource_guarantee
field in the zone config, in all bundle configs, and in all tablet node annotations must have the same value.
If there are instances of different sizes in the cluster, it is recommended to set resource_guarantee
to the smallest instance.
The zone config is located at //sys/bundle_controller/controller/zones/zone_default
. Each field of the config is an attribute of the specified directory.
Tablet nodes
The configuration of tablet nodes is specified in the tablet_node_sizes
attribute. This is a map consisting of a single field regular
(the instance type). It looks like this:
{
"default_config" = {
"cpu_limits" = {
"lookup_thread_pool_size" = 8;
"write_thread_pool_size" = 10;
"query_thread_pool_size" = 8;
};
"memory_limits" = {
"tablet_dynamic" = 1000000;
"versioned_chunk_meta" = 1000000;
"uncompressed_block_cache" = 200000;
"tablet_static" = 0;
"compressed_block_cache" = 1000000;
"reserved" = 5000000;
"lookup_row_cache" = 0;
};
};
"resource_guarantee" = {
"net_bytes" = 0;
"vcpu" = 6000;
"memory" = 1073741824;
};
}
Here, default_config
is the default distribution of thread pools and memory categories for all bundles, and resource_guarantee
is the size of a tablet node instance. The Memory field corresponds to the RAM size in bytes; the vcpu
field corresponds to the number of CPU cores of the container multiplied by 1000. The net_bytes
field is currently not used.
More about the distribution of thread pools and memory categories is written in the Creating a bundle section.
RPC proxy
Note
In the current version, RPC proxy management is not fully supported.
The RPC proxy configuration is specified in the rpc_proxy_sizes
attribute. This is a map consisting of a single field regular
(the instance type). It looks like this:
{
"resource_guarantee" = {
"net_bytes" = 0;
"vcpu" = 6000;
"memory" = 1073741824;
};
}
The meanings of the fields are the same as for tablet nodes.
Instance annotations
To automatically configure instances, run bundle_controller_tools.py
with the --init-tablet-nodes
flag.
All tablet nodes managed by the Bundle controller must be labeled with certain attributes. The attributes are described in detail in the Adding nodes to the cluster section.
Bundle setup
To automatically configure bundles, run bundle_controller_tools.py
with the --init-bundle <bundle-name>
, --init-all-bundles
, or --init-all
flag.
For the Bundle controller to start managing existing bundles, you need to set the attributes specified in the Creating a bundle section. If the node_tag_filter
attribute was set on existing bundles, you must first set it to ""
(an empty string).
The bundle_controller_tools.py
script calculates the required number of nodes for each existing bundle based on the current number of tablet cells. If you need to assign more or fewer nodes, use the --bundle-node-count
argument, which takes a YSON map in the format {<bundle_name>=<node_count>}
.
Accounts for changelogs and snapshots
To automatically configure changelog and snapshot accounts, run bundle_controller_tools.py
with the --init-bundle-system-quotas
or --init-all
flag.
The Bundle controller can manage the accounts that store tablet cell changelogs and snapshots. The accounts are controlled by the changelog_account
and snapshot_account
fields in the bundle options in the attribute //sys/tablet_cell_bundles/<bundle_name>/@options
. Accounts managed by the Bundle controller are named <bundle_name>_bundle_system_quotas
and are children of the root account bundle_system_quotas
. The Bundle controller automatically adjusts the quotas of the parent and child accounts when the number of tablet cells changes.
In the static config of the Bundle controller, in the bundle_controller
section, there are fields that regulate the amount of resources needed for a single tablet cell:
node_count_per_cell
: number of Cypress nodeschunk_count_per_cell
: number of chunksjournal_disk_space_per_cell
: disk space for changelogssnapshot_disk_space_per_cell
: disk space for snapshotsmin_node_count
: minimum number of Cypress nodes allocated to a bundlemin_chunk_count
: minimum number of chunks allocated to a bundle
Creating a bundle
To create a bundle, you can use the bundle_controller_tools.py
script with the --create-bundle <bundle_name>
flag.
For a bundle to be managed by the Bundle controller, it must have the attributes enable_bundle_controller = %true
and zone = "zone_default"
. You must also set the bundle_controller_target_config
attribute with the following structure:
{
"cpu_limits" = {
"lookup_thread_pool_size" = 2;
"query_thread_pool_size" = 2;
"write_thread_pool_size" = 2;
};
"memory_limits" = {
"tablet_dynamic" = 1000000;
"tablet_static" = 0;
"compressed_block_cache" = 1000000;
"uncompressed_block_cache" = 200000;
"versioned_chunk_meta" = 1000000;
"lookup_row_cache" = 0;
"reserved" = 5000000;
};
"rpc_proxy_count" = 0;
"rpc_proxy_resource_guarantee" = {
"memory" = 1073741824;
"net_bytes" = 0;
"vcpu" = 6000;
};
"tablet_node_count" = 1;
"tablet_node_resource_guarantee" = {
"memory" = 1073741824;
"net_bytes" = 0;
"vcpu" = 6000;
};
}
The fields have the following meanings:
tablet_node_count
: the number of tablet nodes assigned to the bundle.tablet_node_resource_guarantee
: the size of the tablet node instances assigned to the bundle. Must match the value in the zone config.rpc_proxy_count
: the number of RPC proxies assigned to the bundle. Currently not supported.rpc_proxy_resource_guarantee
: the size of the RPC proxy instances assigned to the bundle. Must match the value in the zone config. Currently not supported.memory_limits
: the distribution of the bundle’s tablet node memory by categories. The total value must not exceedmemory
inresource_guarantee
.tablet_dynamic
: memory buffer for dynamic stores.tablet_static
: memory for in-memory tables (within_memory_mode
other thannone
).compressed_block_cache
,uncompressed_block_cache
: caches for data blocks used when reading.versioned_chunk_meta
: cache of chunk metadata.lookup_row_cache
: memory for the row-level cache.reserved
: memory reserved for system needs and unaccounted categories.
cpu_limits
: the distribution of the bundle’s tablet node threads across thread pools. Unlike memory, overcommit is allowed here, and the number of threads can exceedvcpu/1000
inresource_guarantee
.lookup_thread_pool_size
: number of threads for lookup requests.query_thread_pool_size
: number of threads for select requests.write_thread_pool_size
: number of tablet cells per node.
You must also set the cpu
and memory
fields in the bundle’s resource_limits
attribute. The total vcpu
and memory
of the bundle’s instances cannot exceed cpu*1000
and memory
from resource_limits
. More details are in the Resource quotas section.
Adding nodes to the cluster
After adding instances to the cluster, you need to mark them with the bundle_controller_annotations
attribute. To do this, you can use the bundle_controller_tools.py
script with the --init-tablet-nodes
flag.
The bundle_controller_annotations
attribute looks like this:
{
"allocated" = %true;
"allocated_for_bundle" = "spare";
"resources" = {
"vcpu" = 6000;
"memory" = 1073741824;
"type" = "regular";
};
}
Here, the allocated
field means that the instance is managed by the Bundle controller, allocated_for_bundle = "spare"
means that the instance is in the spare pool and does not belong to any bundle, and the resources
field contains the instance resources and must match the resource_guarantee
field in the corresponding zone settings section.
Decommissioning nodes
To decommission a node, you need to set the attribute //sys/cluster_nodes/<node_address>/@cms_maintenance_requests
to {maintenance={}}
. After that, the Bundle controller will gracefully move all tablet cells from this node to another node pulled from the spare pool. Once there are no tablet cells left on the node being decommissioned, it can be safely turned off. The presence of tablet cells is shown in the tablet_slots
attribute.
In the event of an unexpected node failure, the Bundle controller will assign a new node from the spare pool to the bundle, after which the tablet cells will be restored on it.
Resource quotas
Bundle resources are located in the attribute //sys/tablet_cell_bundles/<bundle_name>/@resource_limits
. The attribute contains the following fields:
tablet_count
: limit on the number of tablets of the bundle's tables.tablet_static_memory
: limit on the total size of the bundle’s in-memory tables. Set automatically.cpu
: limit on the totalcpu
of the bundle's nodes.memory
: limit on the totalmemory
of the bundle's nodes.
The tablet_count
and tablet_static_memory
fields affect the tables created in the bundle. The cpu
and memory
fields affect the number of tablet nodes that the Bundle controller can assign to the bundle.
The tablet_static_memory
field is automatically set by the Bundle controller. Its value is calculated as node_count * memory
, where node_count
is the number of nodes assigned to the bundle, and memory
is the amount of memory on a tablet node instance defined in the zone config.
To manage resources, you need the write
permission on the bundle. Resource management should be performed by the cluster administrator. You should grant node quotas so that the total quota of all bundles does not exceed the number of nodes in the cluster, taking into account the spare pool for node failures.
To simplify the setup of cpu
and memory
limits, the bundle_controller_tools.py
script has a set-bundle-resource-limits
command that takes a limit on the number of nodes and sets the limits according to the zone config.
Managing a bundle
Use the Edit bundle
button on the bundle page in the user interface to manage the bundle. The interface allows you to specify the number of nodes, the memory split by categories, and the number of threads in the thread pools for lookup and select requests.
To manage a bundle, you need the manage
permission on the bundle.
Disabling the Bundle controller
To switch the Bundle controller to read-only mode, set the attribute //sys/@disable_bundle_controller
to %true
. The Bundle controller will stop performing any actions on the cluster; however, the state of bundles can still be observed in the user interface.
Troubleshooting
The Bundle controller is not working
- Check the
//sys/@disable_bundle_controller
attribute. - Check for locks at
//sys/bundle_controller/coordinator/lock/@locks
. The absence of locks means that the Bundle controller cannot register with the cluster. The reasons are often in the error log in the container.
Failed
state
The bundle UI shows the The bundle UI displays two statuses: Health
—the state of tablet cells, and State
—the state of the Bundle controller. If State
is in the Failed
state, clicking on it shows the error that occurred. In some cases, this state can be caused by violations of the Bundle controller invariants (for example, if resource_guarantee
in the zone settings, nodes, and bundles do not match). The following steps are recommended:
- disable the Bundle controller by setting the
//sys/@disable_bundle_controller
attribute to%true
; - run the command
bundle_controller_tools.py drop-allocations <bundle_name>
. This command will remove incorrect node allocation requests; - make sure the configuration is correct (for example, by running
bundle_controller_tools.py init --init-all --cpu <cpu_guarantee> --memory <memory_guarantee>
).