Integral guarantees

The basic mechanism for guaranteeing computing resources in the YTsaurus system allocates a constant amount of resources (a strong guarantee), most often expressed in CPU cores. This type of guarantee suits tasks that use resources continuously. In practice, however, some scenarios require more flexibility.

For example, a production process may need to finish its computations within a relatively short period of time and require a lot of computational quota at its peak, while consuming much less for the rest of the time. With the basic mechanism, you have to allocate resources to such a process at its peak consumption and therefore order more resources than it actually needs.

Other processes do not need computational resources immediately, but must be completed within a certain extended period of time. For them, what matters is the average amount of computational quota over a long period. Research and analytical tasks, such as experiments, are typical examples.

The mechanism of integral guarantees enables you to find a balance between the described scenarios.

Integral pools

In the YTsaurus system, you can enable resource accumulation for a pool. Such a pool receives a virtual resource at a constant rate, and the amount it has accumulated is called accumulated_resource_ratio_volume. This amount is measured as a cluster share multiplied by seconds (fair_share*sec), but for simplicity you can think of it in core-seconds (cpu*sec). The pool spends the virtual resource to run operations and can accumulate it up to a certain limit.

Example: suppose the pool has accumulated_resource_ratio_volume = 60 fair_share*sec, the cluster has 1000 cores, and CPU is the dominant resource of the user process. This volume converts to 60 * 1000 = 60,000 cpu*sec. An operation that consumes 100 CPU cores at a time uses it up in 10 minutes, and one that consumes 50 CPU cores uses it up in 20 minutes. This example ignores the fact that the virtual resource in the pool is replenished over time.
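The conversion and depletion times in this example can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, CPU is assumed to be the dominant resource, and replenishment is ignored, as in the text.

```python
# Hypothetical helpers reproducing the worked example; CPU is assumed
# to be the dominant resource of the operation.

def ratio_volume_to_cpu_sec(ratio_volume: float, cluster_cpu: float) -> float:
    """Convert a volume in fair_share*sec into cpu*sec."""
    return ratio_volume * cluster_cpu


def depletion_seconds(volume_cpu_sec: float, consumption_cpu: float) -> float:
    """Seconds until the volume runs out, ignoring replenishment."""
    return volume_cpu_sec / consumption_cpu


volume = ratio_volume_to_cpu_sec(60, 1000)  # 60 fair_share*sec on a 1000-core cluster
for cores in (100, 50):
    print(f"{cores} cores: {depletion_seconds(volume, cores) / 60:.0f} min")
# 100 cores: 10 min
# 50 cores: 20 min
```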

The resource accumulation rate is configured in the integral-guarantees/resource_flow pool attribute. The attribute value must specify the resource type and its rate:

```
yt set //sys/pools/root-pool/burst/@integral-guarantees/resource_flow "{cpu=100}"
```

resource_flow is the rate at which accumulated_resource_ratio_volume grows, and that volume is an amount of resource multiplied by time. Therefore, resource_flow/cpu is measured in cores: cpu*sec/sec = cpu. For example, resource_flow/cpu = 100 lets an operation that consumes 100 CPU cores at a time run indefinitely, because the pool receives exactly as much as the operation spends.
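The unit argument above amounts to tracking the net change of the volume per second. The sketch below is a hypothetical helper, not a YTsaurus API, and it ignores both the accumulation limit and any consumption cap:

```python
def net_volume_rate(resource_flow_cpu: float, consumption_cpu: float) -> float:
    """Change of the accumulated volume per second, in cpu*sec/sec = cpu.

    Positive: the pool accumulates; negative: it drains; zero: steady state.
    Hypothetical helper; the accumulation limit and consumption cap are ignored.
    """
    return resource_flow_cpu - consumption_cpu


print(net_volume_rate(100, 100))  # 0: a 100-core operation can run indefinitely
print(net_volume_rate(100, 150))  # -50: the volume shrinks by 50 cpu*sec each second
print(net_volume_rate(100, 0))    # 100: an idle pool gains 100 cpu*sec each second
```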

accumulated_resource_ratio_volume grows only up to a limit equal to k*resource_flow_ratio, where k is a scheduler parameter shared by all pools and resource_flow_ratio is resource_flow expressed as a share of total cluster resources. The value of k is set by the cluster administrator.
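A sketch of the limit computation, assuming CPU is the dominant resource. The value K = 86400 (one day) is purely hypothetical, since the real k is chosen by the cluster administrator. Because the volume is measured in fair_share*sec and resource_flow_ratio in fair_share, k is effectively a number of seconds: an idle pool starting from zero reaches the limit after k seconds of inflow.

```python
def resource_flow_ratio(resource_flow_cpu: float, cluster_cpu: float) -> float:
    """resource_flow as a share of the cluster (CPU assumed dominant)."""
    return resource_flow_cpu / cluster_cpu


def integral_pool_capacity(k_seconds: float, flow_ratio: float) -> float:
    """Limit on accumulated_resource_ratio_volume, in fair_share*sec."""
    return k_seconds * flow_ratio


K = 86_400  # hypothetical; the real k is set by the cluster administrator
CLUSTER_CPU = 1000

ratio = resource_flow_ratio(100, CLUSTER_CPU)  # resource_flow={cpu=100} -> 0.1
capacity = integral_pool_capacity(K, ratio)    # fair_share*sec
print(f"limit: {capacity:.0f} fair_share*sec = {capacity * CLUSTER_CPU:.0f} cpu*sec")
print(f"an idle pool reaches the limit after {capacity / ratio:.0f} s")
```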

This scheme of accumulating and spending resources closely resembles the token bucket algorithm (not to be confused with the leaky bucket algorithm).
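To make the token bucket analogy concrete, here is a simplified per-second simulation in cpu*sec. The IntegralBucket class is hypothetical and only models the accounting of the accumulated volume; the real scheduler does considerably more than this.

```python
from dataclasses import dataclass


@dataclass
class IntegralBucket:
    """Hypothetical token-bucket model of an integral pool, in cpu*sec."""

    flow: float          # resource_flow/cpu: cpu*sec added per second
    capacity: float      # accumulation limit, in cpu*sec
    volume: float = 0.0  # current accumulated volume, in cpu*sec

    def step(self, demand: float, dt: float = 1.0) -> float:
        """Advance dt seconds and return the cores covered by the volume."""
        # Refill first, never above the limit (the "bucket" overflows).
        self.volume = min(self.capacity, self.volume + self.flow * dt)
        # Spend as much as the demand needs and the volume allows.
        granted = min(demand, self.volume / dt)
        self.volume -= granted * dt
        return granted


bucket = IntegralBucket(flow=100, capacity=600)
for _ in range(10):          # idle for 10 s: the volume stops growing at 600
    bucket.step(demand=0)
print(bucket.volume)
print([bucket.step(demand=300) for _ in range(5)])
```

Once the bucket is full, the pool can briefly consume more than resource_flow, but over a long period its consumption from the accumulated volume settles at resource_flow.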

Guarantee types

There are three types of guarantees for computing resources in the YTsaurus system:

strong_guarantee: A strong guarantee of a share of cluster resources. It is set as an absolute value (in cores) and automatically converted into a cluster share. It lets you get the guaranteed resources at any time and use them indefinitely.
burst_integral: An immediate ("peak") guarantee in cores that lets you get a fixed amount of resources for a limited period of time. For example, 2000 cores for two hours a day.
relaxed_integral: An integral guarantee (in cores) that lets you get a fixed average amount of resources per day.

Types of integral pools

To support the two scenarios described at the beginning of this section, two types of integral pools are implemented in the YTsaurus system:

Burst pool: A pool that has priority in spending its accumulated_resource_ratio_volume. Such pools must have both resource_flow and burst_guarantee_resources specified. If the pool has operations that demand resources, the scheduler must give the pool at least burst_guarantee_resources, provided the pool has enough accumulated_resource_ratio_volume. This pool type provides a burst_integral guarantee.

A burst pool configuration example:
```
yt set //sys/pools/root-pool/burst/@integral-guarantees "
{
  guarantee_type=burst;
  resource_flow={cpu=1000};
  burst_guarantee_resources={cpu=2000}
}"
```
In this example, the pool can count on 1000 cores at all times, or on 2000 cores for "half" of the time (for example, 12 hours a day) if nothing runs in the pool during the other half.
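The "half of the time" figure follows from simple arithmetic. The sketch below assumes that the accumulation limit (k*resource_flow_ratio) can hold 12 hours of inflow, which for this pool means k of at least 43200 seconds; burst_duration_seconds is a hypothetical helper.

```python
HOUR = 3600


def burst_duration_seconds(volume_cpu_sec: float, flow_cpu: float, burst_cpu: float) -> float:
    """How long the pool can consume burst_cpu while still receiving flow_cpu.

    Hypothetical helper: the volume drains at (burst_cpu - flow_cpu) cpu*sec per second.
    """
    if burst_cpu <= flow_cpu:
        return float("inf")  # inflow covers consumption, the volume never drains
    return volume_cpu_sec / (burst_cpu - flow_cpu)


flow, burst = 1000, 2000     # resource_flow and burst_guarantee_resources from the example
volume = flow * 12 * HOUR    # accumulated during 12 idle hours (limit assumed not reached)
print(burst_duration_seconds(volume, flow, burst) / HOUR)  # 12.0 hours at 2000 cores
```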

A burst guarantee also limits the rate at which the accumulated resource is consumed. If burst_guarantee_resources/cpu equals 2000, by default the scheduler will not let the pool consume more than 2000 cores at a time. This protects against accidentally using up the accumulated integral resource too quickly.

Relaxed pool: A pool that is not guaranteed to receive all the resources covered by its accumulated_resource_ratio_volume at any given moment, but is guaranteed to receive them eventually (within a day). Only resource_flow can be set for a relaxed pool. Such pools are intended for operations that do not need the resources they are entitled to immediately. This pool type provides a relaxed_integral guarantee.

A relaxed pool configuration example:

Listing 3
```
yt set //sys/pools/root-pool/relaxed/@integral-guarantees "
{
  guarantee_type=relaxed;
  resource_flow={cpu=1000}
}"
```
Both relaxed and burst pools limit how fast the accumulated resource can be consumed. For a relaxed pool, the limit is three times resource_flow: with resource_flow/cpu = 1000, the scheduler limits the pool's consumption to 3000 cores at a time by default.
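The default consumption limits for the two pool types can be summarized in a small function. This is an illustrative sketch of the rules stated above; the function name is hypothetical and only CPU is considered.

```python
from typing import Optional


def default_consumption_cap(guarantee_type: str,
                            resource_flow_cpu: float,
                            burst_guarantee_cpu: Optional[float] = None) -> float:
    """Default limit (in cores) on how fast a pool may consume its accumulated volume."""
    if guarantee_type == "burst":
        if burst_guarantee_cpu is None:
            raise ValueError("burst pools require burst_guarantee_resources")
        return burst_guarantee_cpu          # the burst guarantee doubles as the cap
    if guarantee_type == "relaxed":
        return 3 * resource_flow_cpu        # three times resource_flow by default
    raise ValueError(f"unknown guarantee_type: {guarantee_type!r}")


print(default_consumption_cap("burst", 1000, 2000))  # 2000
print(default_consumption_cap("relaxed", 1000))      # 3000
```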

The two pool types complement each other: the same resources can be shared by high-priority processes that need significant guarantees for limited periods of time and by less critical processes that do not need resources immediately.

Example

Suppose a production process requires 2000 cores 12 hours a day, and a research process needs only 1000 cores on average. With integral guarantees, you can configure a burst pool for the first process and a relaxed pool for the second, and 2000 cores are enough to provide both guarantees: the production process averages 2000 * 12 / 24 = 1000 cores over the day, and the research process averages 1000 cores. The two pools must be in the same pool tree; there are no other restrictions. With the basic mechanism (strong guarantees), you would have to order 3000 cores to provide the same guarantees.
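The capacity comparison can be checked by counting daily core-hours. This is a back-of-the-envelope sketch with illustrative variable names; it ignores the accumulation limit and the order in which the scheduler serves the pools.

```python
HOURS_PER_DAY = 24

# Production process: 2000 cores for 12 hours a day (burst pool).
prod_peak_cpu, prod_hours = 2000, 12
# Research process: 1000 cores on average (relaxed pool).
research_avg_cpu = 1000

prod_avg_cpu = prod_peak_cpu * prod_hours / HOURS_PER_DAY   # average over the day
integral_quota = prod_avg_cpu + research_avg_cpu            # sum of resource_flow values
strong_quota = prod_peak_cpu + research_avg_cpu             # strong guarantees must cover the peak

# Daily core-hours: what both processes need vs. what the integral quota supplies.
needed = prod_peak_cpu * prod_hours + research_avg_cpu * HOURS_PER_DAY
supplied = integral_quota * HOURS_PER_DAY
print(integral_quota, strong_quota, needed <= supplied)
```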

The figure shows a graph of CPU usage and demand for a burst pool with a burst_integral guarantee of 500 CPU cores and a resource_flow of 100 CPU cores.

The figure below shows a graph of the volume of accumulated resource for the same pool. You can see how the volume of virtual resource decreases when operations are running in the pool, and then the resource is again accumulated to the specified limit.

Combination with strong guarantees

Both types of integral pools are fully compatible with strong guarantees. When allocating resources, the scheduler first allocates them under the strong guarantee. If that is enough to meet the pool's current demand, accumulated_resource_ratio_volume is not spent. If demand exceeds the strong guarantee, accumulated_resource_ratio_volume is spent to cover the excess.
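The order described above (strong guarantee first, then the accumulated volume) can be sketched as a split of the pool's demand for one time step. The function is hypothetical and simplified: it ignores the burst consumption cap and treats whatever the volume cannot cover as not guaranteed.

```python
def split_demand(demand_cpu: float, strong_cpu: float,
                 volume_cpu_sec: float, dt: float = 1.0):
    """Split a pool's demand for one step of dt seconds (simplified sketch).

    Returns (from_strong, from_integral, unguaranteed), all in cores.
    """
    from_strong = min(demand_cpu, strong_cpu)       # strong guarantee is used first
    excess = demand_cpu - from_strong               # only this part spends the volume
    from_integral = min(excess, volume_cpu_sec / dt)
    unguaranteed = excess - from_integral
    return from_strong, from_integral, unguaranteed


print(split_demand(300, 500, 1000))  # demand fits the strong guarantee: volume untouched
print(split_demand(800, 500, 1000))  # 300 cores come from the accumulated volume
print(split_demand(800, 500, 100))   # volume covers only 100 cores of the excess
```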

Integral pools, like their sibling pools, also receive resources in excess of their guarantees from pools higher in the hierarchy, in proportion to the configured weights.
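Distribution in proportion to weights can be pictured with the sketch below. It is a simplified weighted water-filling written for illustration (the function name and the residual_demand input are hypothetical), not the scheduler's actual fair-share computation: each sibling gets a weight-proportional part of the excess, capped by what it still needs, and the leftovers are redistributed among the rest.

```python
from typing import Dict

EPS = 1e-9


def share_by_weight(excess_cpu: float,
                    weights: Dict[str, float],
                    residual_demand: Dict[str, float]) -> Dict[str, float]:
    """Split excess_cpu among sibling pools in proportion to their weights.

    A pool never gets more than its residual demand; whatever it cannot use
    is redistributed among the remaining pools in the next round.
    """
    alloc = {name: 0.0 for name in weights}
    active = {name for name in weights if residual_demand[name] > EPS}
    remaining = excess_cpu
    while remaining > EPS and active:
        total_weight = sum(weights[name] for name in active)
        spent = 0.0
        saturated = set()
        for name in active:
            offer = remaining * weights[name] / total_weight
            give = min(offer, residual_demand[name] - alloc[name])
            alloc[name] += give
            spent += give
            if residual_demand[name] - alloc[name] <= EPS:
                saturated.add(name)
        remaining -= spent
        active -= saturated
        if not saturated:
            break  # every offer was accepted in full, nothing left to redistribute
    return alloc


print(share_by_weight(800, {"burst": 1, "relaxed": 1, "other": 2},
                      {"burst": 100, "relaxed": 1000, "other": 1000}))
```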

Additional attributes of integral pools

You can query the scheduler for the values of service attributes. These attributes are available in Orchid, a special Cypress subtree.

accumulated_resource_ratio_volume: Accumulated integral resources in terms of the cluster share * second.
accumulated_resource_volume: accumulated_resource_ratio_volume expressed as a dict with all resources. For example, accumulated_resource_volume/cpu is the volume of the accumulated integral resource in core * seconds.
integral_pool_capacity: The upper limit on accumulated_resource_ratio_volume.
specified_burst_ratio: burst_guarantee_resources converted into the cluster share.
specified_resource_flow_ratio: resource_flow converted into the cluster share.
total_burst_ratio: The sum of specified_burst_ratio for all descendants (including the current pool).
total_resource_flow_ratio: The sum of specified_resource_flow_ratio for all descendants (including the current pool).
estimated_burst_usage_duration_seconds: The estimated period of time that the accumulated resource will last (taking into account the continuing inflow of resources at the rate of resource_flow) when consuming burst_guarantee_resources (only available for burst pools).
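How the total_* attributes aggregate over a subtree can be illustrated with a small in-memory pool tree. The Pool class and the cluster size are hypothetical, and each specified ratio is computed as a CPU share, assuming CPU is the dominant resource.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pool:
    """Hypothetical in-memory pool; ratios are shares of the whole cluster."""

    name: str
    specified_burst_ratio: float = 0.0
    specified_resource_flow_ratio: float = 0.0
    children: List["Pool"] = field(default_factory=list)


def total_burst_ratio(pool: Pool) -> float:
    """specified_burst_ratio summed over the pool and all its descendants."""
    return pool.specified_burst_ratio + sum(total_burst_ratio(c) for c in pool.children)


def total_resource_flow_ratio(pool: Pool) -> float:
    """specified_resource_flow_ratio summed over the pool and all its descendants."""
    return pool.specified_resource_flow_ratio + sum(
        total_resource_flow_ratio(c) for c in pool.children)


CLUSTER_CPU = 10_000  # hypothetical cluster size; CPU assumed dominant
burst = Pool("burst", specified_burst_ratio=2000 / CLUSTER_CPU,
             specified_resource_flow_ratio=1000 / CLUSTER_CPU)
relaxed = Pool("relaxed", specified_resource_flow_ratio=1000 / CLUSTER_CPU)
root = Pool("root-pool", children=[burst, relaxed])
print(total_burst_ratio(root), total_resource_flow_ratio(root))
```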

An example of requesting an attribute is shown in Listing 4.
Listing 4

```
yt get //sys/scheduler/orchid/scheduler/scheduling_info_per_pool_tree/default/fair_share_info/pools/pool_name/integral_pool_capacity
```

Attention

The scheduler Orchid is not part of the stable API and can be changed without announcement.
