Selecting memory for jobs
Memory is the RAM consumed by operation jobs.
For all operations, there is a default mechanism for selecting the actual memory reserve for the jobs to be run.
The operation controller collects memory consumption information for all successfully completed jobs. This information is saved as a special digest, based on which the actual memory reserve for the newly run jobs is selected.
The main aspects of the digest:
- Collects actual values (typically in the range from 0.0 to 1.0) describing the share of memory_limit consumed by the job.
- Only successfully completed jobs and jobs that were aborted because they exceeded the memory reserve are counted.
- For jobs that were aborted because they exceeded the reserve, a point with the memory consumption share of that job multiplied by a certain constant slightly above one (the default value is 1.1) is added to the digest.
- The memory_limit * percentile(digest, P) value (by default, P is 0.95) is taken as a reserve for the newly run job.
- The digest does not store all points internally, but uses a data structure that allows an approximate calculation of percentiles.
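The digest rules above can be illustrated with a minimal Python sketch. It is not the controller's implementation: the class and function names are hypothetical, the sketch stores every point and computes an exact nearest-rank percentile (the real digest is approximate), and falling back to a default share of 0.5 for an empty digest is an assumption based on the initial value described under "Digest settings".

```python
import math

OVERDRAFT_FACTOR = 1.1  # multiplier for jobs aborted for exceeding the reserve
PERCENTILE = 0.95       # default P
DEFAULT_SHARE = 0.5     # assumed fallback while the digest is still empty


class MemoryDigest:
    """Toy digest: keeps all points and computes exact percentiles."""

    def __init__(self):
        self.points = []

    def add_completed(self, used_bytes, memory_limit):
        # Share of memory_limit consumed by a successfully completed job.
        self.points.append(used_bytes / memory_limit)

    def add_overdraft_abort(self, used_bytes, memory_limit):
        # Aborted jobs contribute their share scaled slightly above one.
        self.points.append(OVERDRAFT_FACTOR * used_bytes / memory_limit)

    def percentile(self, p):
        if not self.points:
            return DEFAULT_SHARE
        ordered = sorted(self.points)
        rank = max(1, math.ceil(p * len(ordered)))  # nearest-rank method
        return ordered[rank - 1]


def select_reserve(digest, memory_limit, p=PERCENTILE):
    """Reserve for a newly run job: memory_limit * percentile(digest, P)."""
    return memory_limit * digest.percentile(p)
```

With only a handful of points, the 0.95 percentile is effectively the largest observed share, so a single job aborted for overdraft raises the reserve for the jobs that follow.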
The scheduler treats the memory reserve allocated to a job as a lower bound on the job's memory consumption. If the job exceeds its reserve and there is no free memory on the cluster node, the job may be aborted with the resource_overdraft reason. If there is extra free memory on the cluster node, the job may exceed its allocated reserve and still complete successfully. If the job exceeds the memory limit specified by the user in the operation specification, it is aborted with the Memory limit exceeded error.
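The three cases in the paragraph above can be summarized in a short Python sketch. The function and its parameters are hypothetical, and the sketch simplifies "may be aborted" to "is aborted"; the real decision is made by the scheduler and the node and depends on factors not modeled here.

```python
def job_outcome(used_bytes, reserve_bytes, memory_limit_bytes, node_free_bytes):
    """Classify a job by its peak memory usage (simplified model)."""
    if used_bytes > memory_limit_bytes:
        # The user-specified limit is a hard cap.
        return "Memory limit exceeded"
    if used_bytes <= reserve_bytes:
        # The job stayed within its reserve.
        return "completed"
    overdraft = used_bytes - reserve_bytes
    if overdraft <= node_free_bytes:
        # The node had spare memory to cover the excess.
        return "completed"
    # Reserve exceeded and no free memory left on the node.
    return "resource_overdraft"
```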
Note that in addition to the user process, there is also a job_proxy process, which is a layer between the user process and YTsaurus. Depending on the circumstances, job_proxy can consume a significant amount of RAM for compression, the columnar read/write mechanism, erasure coding, and other needs. The operation controller maintains separate digests for the user process (user_job) and the job_proxy process. For the job_proxy process, the controller makes an assumption about the expected memory usage based on the number of input/output tables and their settings. Based on this assumption, the digest selects a reserve (ranging from 0.5 to 2.0 by default).
In addition, the memory reserve of the user process has a lower limit. It is set in the operation specification and defaults to 0.05. However, if a job requests tmpfs, the memory reserve cannot be less than tmpfs_size / memory_limit, i.e. the system tends not to allow overcommit on tmpfs, because it can cause scheduler locking.
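As a rough illustration of the lower limit, the following Python sketch clamps a share proposed by the digest. The function name and parameters are hypothetical; only the 0.05 default and the tmpfs_size / memory_limit bound come from the text.

```python
def effective_reserve_share(digest_share, memory_limit, tmpfs_size=0, lower_bound=0.05):
    """Apply the lower limit to the reserve share chosen by the digest."""
    # With tmpfs, the reserve must cover at least the tmpfs share of memory_limit.
    floor = max(lower_bound, tmpfs_size / memory_limit)
    return max(digest_share, floor)
```

For example, a job with a 1000-byte memory_limit and a 600-byte tmpfs never receives a reserve share below 0.6, whatever the digest suggests.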
Statistics
There are a number of values in the operation statistics that enable you to analyze the reserved memory:
- user_job/memory_reserve and job_proxy/memory_reserve (bytes): The memory reserve allocated to the corresponding job process.
- user_job/max_memory and job_proxy/max_memory (bytes): The maximum memory consumed by the corresponding job process while a particular job was running.
- job_proxy/estimated_memory (bytes): The scheduler's estimate of the job_proxy process memory for a given job.
- user_job/memory_limit (bytes): The job memory limit specified by the user.
- user_job/cumulative_memory_reserve, job_proxy/cumulative_memory_reserve, user_job/cumulative_max_memory, job_proxy/cumulative_max_memory, job_proxy/cumulative_estimated_memory (bytes * seconds): The corresponding cumulative metrics, which take the job duration into account.
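One possible way to use these values is to compare cumulative consumption with the cumulative reserve. The Python sketch below assumes the statistics have already been flattened into a plain dictionary that maps statistic names to numbers; that layout and the helper name are hypothetical and do not reflect the actual statistics output format.

```python
def reserve_utilization(stats, process="user_job"):
    """Ratio of cumulative peak memory to cumulative reserve for one process."""
    reserve = stats[f"{process}/cumulative_memory_reserve"]
    used = stats[f"{process}/cumulative_max_memory"]
    if reserve == 0:
        return None  # no reserve recorded, ratio is undefined
    return used / reserve


# Hypothetical values in bytes * seconds.
example_stats = {
    "user_job/cumulative_memory_reserve": 800,
    "user_job/cumulative_max_memory": 400,
}
ratio = reserve_utilization(example_stats)
```

A ratio well below 1 suggests that the reserve is larger than the jobs need, while a ratio above 1 means that jobs regularly exceed their reserve.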
Digest settings
The following options are available to the user in the operation specification:
- user_job_memory_digest_default_value: Initial assumption for selecting the memory reserve (the default value is 0.5).
- user_job_memory_digest_lower_bound: The limit below which the reserve must not fall (the default value is 0.05). We do not recommend changing the default value.
- memory_reserve_factor: An alias that sets both the user_job_memory_digest_lower_bound and user_job_memory_digest_default_value options at once. Using this option is not recommended.
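The relationship between the three options can be sketched in Python as follows. The helper is hypothetical and works on a plain dictionary of options; where exactly the options sit inside a real operation specification is not modeled, and letting memory_reserve_factor override explicitly set values is an assumption of this sketch.

```python
DEFAULT_VALUE = 0.5   # default for user_job_memory_digest_default_value
LOWER_BOUND = 0.05    # default for user_job_memory_digest_lower_bound


def resolve_digest_options(options):
    """Return the effective digest settings for a dictionary of options."""
    factor = options.get("memory_reserve_factor")
    if factor is not None:
        # The alias sets both options to the same value (assumed to take priority).
        return {
            "user_job_memory_digest_default_value": factor,
            "user_job_memory_digest_lower_bound": factor,
        }
    return {
        "user_job_memory_digest_default_value": options.get(
            "user_job_memory_digest_default_value", DEFAULT_VALUE
        ),
        "user_job_memory_digest_lower_bound": options.get(
            "user_job_memory_digest_lower_bound", LOWER_BOUND
        ),
    }
```

Because the alias also raises the lower bound, setting it to a large value pins the reserve to that share and prevents the digest from lowering it, which is one plausible reason the option is discouraged.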