Pool characteristics
This section describes the characteristics and settings of pool trees, pools, and operations.
Pool tree characteristics
A pool tree has the following attributes:
nodes_filter
: A filter for selecting cluster nodes that belong to the tree. For more information, see Tagging cluster nodes.max_operation_count
: The default value for the maximum number of concurrently started (running + pending) operations in the root pool.max_operation_count_per_pool
: The default value for the maximum number of concurrently started operations in the pool. Can be overwritten by a setting on the pool.max_running_operation_count
: The default value for the maximum number of concurrently running operations in the root pool.max_running_operation_count_per_pool
: The default value for the maximum number of concurrently running operations in the pool. Can be overwritten by a setting on the pool.default_parent_pool
: The default pool, in which operations that do not have a pool indicated in the specification will be started.enable_pool_starvation
: Allow pool starvation. For more information, see Preemption.forbid_immediate_operations_in_root
: Allow operations to run in the root pool of the tree.max_ephemeral_pool_per_user
: The maximum number of ephemeral pools for each user. An ephemeral pool is a pool that was indicated in the specification, but for which there is no explicit node in Cypress.fair_share_preemption_timeout
: How long the operation can remain below its fair_share before the system initiates preemption to run its jobs.fair_share_starvation_tolerance
: The tolerance used when comparing usage-ratio with fair_share: an operation is considered starving ifusage-ratio < fair_share * tolerance
.non_preemptible_resource_usage_threshold
: The operation will not be subject to preemption if its resource usage is less than this value.
The fair_share_preemption_timeout
and fair_share_starvation_tolerance
attributes can be redefined in the operation specification.
The tree is the scheduler_pool_tree
object at the first level of the //sys/pool_trees
directory in Cypress. The attributes listed above are the attributes of this object. The //sys/pool_trees
node has the default_tree
attribute in which the default tree can be specified. Will be used for operations that do not have the specified pool_trees
option.
Pool characteristics
Each pool has the following characteristics. The default values are specified in brackets.
weight
(1): A real non-negative number, which is responsible for the proportion in which the subtree should be provided with the resources of the parent pool. When a pool has two child pools with weights 2 and 1, the parent's resources will be divided between them in the 2:1 proportion.strong_guarantee_resources
: A dict which lists the guaranteed resources of the given pool (user_slots
,cpu
, andmemory
).
Note
Note that guarantees will be fulfilled only if the dominant resource of the operation coincides with the guaranteed resource.
resource_limits
: A dict which describes the limits on different resources of the given pool (user_slots
,cpu
, andmemory
).mode
(fair_share): The scheduling mode that can take thefair_share
orfifo
values. Withfifo
mode, jobs are issued to child operations in lexicographic sorting order according to the values of the parameters specified in thefifo_sort_parameters
attribute for that pool. For example, if the attribute value is[start_time]
, jobs will first be issued to the operations with the least start time.max_running_operation_count
(8): The limit on the number of concurrently running operations in the pool. Operations above this limit will be queued up and pending.max_operation_count
(50): The limit on the number of concurrently started (running + pending) operations in the pool. If the specified limit is reached, starting new operations in the pool will end with an error.fifo_sort_parameters
(start_time): The order of starting operations in the FIFO pool. By default, operations are sorted first by weight, and then by start time. This parameter enables you to change the order of operations in the queue. Supported values:start_time
,weight
, andpending_job_count
. Theweight
parameter is applied in reverse order: operations with more weight go first and have a higher priority. Thepending_job_count
value enables you to prioritize small operations (with few jobs).forbid_immediate_operations
: Prohibits the start of operations directly in the given pool; does not apply to starting operations in subpools.create_ephemeral_subpools
(false): Activates the mode in which an ephemeral subpool is created for operations in the given pool with thepoolname$username
name wherepoolname
is the name of the current pool. The operation is started in the created ephemeral subpool.ephemeral_subpool_config
: A nested configuration of ephemeral subpools created in the current pool. Makes sense only if thecreate_ephemeral_subpools
option is specified. You can specify themode
,max_running_operation_count
,max_operation_count
, andresource_limits
in the configuration.offloading_settings
: A setting responsible for offloading some of the pool jobs to another tree along with the tree where the pool resides. Job offloading works for operations started after the setting was specified. Example:yt set //sys/pool_trees/<pool_tree>/<pool>/@offloading_settings '{<pool_tree_X>={pool=<pool_name_Y>}}'
.
Each pool has its unique (within a single tree) name. The name of the pool selected to start the operation is shown in the operation settings.
Attention
The max_running_operation_count
and max_operation_count
parameters are set (explicitly or implicitly) at all levels of the pool hierarchy and are also checked at all levels of the hierarchy. To be specific, starting a new operation in the P pool is possible if and only if both P and all of its parents (up to the root) have fewer started operations than a given limit. The same applies for the max_running_operation_count
limit: an operation in the P pool will remain in the pending state as long as the number of running operations in that pool or in any of its parents is larger than or equal to the max_running_operation_count
limit. Oversubscribing these limits is possible at each level of the hierarchy. This feature must be taken into account when ordering and changing user pool limits. In order to guarantee a certain number of operations to the pool, you need to make sure that there is no oversubscription at all levels up to the root.
Operations
Some of the settings inherent to the pool are also available for operations and are specified in the operation specification root at start-up. These settings include: weight
and resource_limits
.
For example, you can specify resource_limits={user_slots=200}
in the specification. In this case, the operation will not run more than 200 jobs concurrently in each tree.
The resource_limits
and other settings can be specified independently for different pool trees.
Dynamic characteristics of operations and pools
Fair share ratio
is a share of cluster resources that is guaranteed for the given operation (pool) at the moment. The fair share sum of all operations does not exceed one.Guaranteed ratio
is a share of cluster resources that is guaranteed for the operation (pool) at the maximum cluster load.Usage ratio
is a share of cluster resources currently consumed by the operation or pool: all the running operation or pool jobs. The usage ratio sum of all operations can be larger than one, since operations can consume different resources, while the share is calculated by the dominant resource.Starving
is a flag indicating whether the operation is starving. If there is a starving operation, the scheduler will attempt to abort the jobs of non-starving operations by performing preemption to start the jobs of that operation.Demand ratio
is a characteristic that shows the share of all cluster resources that an operation or pool needs to complete computation.Dominant resource
is the dominant resource of an operation or pool - the resource whose share in the resources of the entire cluster is the largest.
Dynamic characteristics of operations are calculated for each tree separately. Consequently, an operation can have several usage_ratios
and so on.
The amount of resources that an operation needs to be executed is calculated by summing up the needs of all the operation jobs by each resource type.