Selecting memory for jobs
Memory is the RAM consumed by operation jobs.
For all operations, there is a default mechanism for selecting the actual memory reserve for the jobs to be run.
The operation controller collects memory consumption information for all successfully completed jobs. This information is saved as a special digest, based on which the actual memory reserve for the newly run jobs is selected.
The main aspects of the digest:
- Collects actual values (typically in the range from 0.0 to 1.0) describing the share of memory_limit consumed by the job.
- Only successfully completed jobs and jobs that were aborted because they exceeded the memory reserve are counted.
- For jobs that were aborted because they exceeded the reserve, a point with the memory consumption share of that job multiplied by a certain constant slightly above one (the default value is 1.1) is added to the digest.
- The value memory_limit * percentile(digest, P) (by default, P is 0.95) is taken as the reserve for each newly run job.
- The digest does not store all points internally, but uses a data structure that allows an approximate calculation of percentiles.
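The digest behavior described above can be sketched as follows. This is a simplified illustration, not the controller's implementation: the class MemoryDigest and the function select_reserve are hypothetical names, the sketch stores every point and computes an exact nearest-rank percentile instead of using an approximate structure, and it assumes that an empty digest falls back to the initial value of 0.5.

```python
import math


class MemoryDigest:
    """Toy stand-in for the controller's digest (hypothetical name).

    Unlike the real digest, it keeps every point and computes exact percentiles.
    """

    def __init__(self, default_value=0.5, overdraft_multiplier=1.1):
        self.default_value = default_value  # assumed fallback while no points exist
        self.overdraft_multiplier = overdraft_multiplier
        self.points = []

    def add_completed(self, used_bytes, memory_limit):
        # Successfully completed job: record the share of memory_limit it consumed.
        self.points.append(used_bytes / memory_limit)

    def add_overdraft_aborted(self, used_bytes, memory_limit):
        # Job aborted for exceeding its reserve: record an inflated share.
        self.points.append(used_bytes / memory_limit * self.overdraft_multiplier)

    def percentile(self, p):
        if not self.points:
            return self.default_value
        ordered = sorted(self.points)
        rank = max(1, math.ceil(p * len(ordered)))  # nearest-rank percentile
        return ordered[rank - 1]


def select_reserve(digest, memory_limit, p=0.95):
    """Reserve for a newly run job: memory_limit * percentile(digest, P)."""
    return memory_limit * digest.percentile(p)
```

With a memory_limit of 1000 bytes and completed jobs that used 200, 300, and 400 bytes, this sketch selects a reserve of about 400 bytes; adding a job aborted at 500 bytes raises it to about 550 bytes.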
The scheduler treats the memory reserve allocated to a job as the lower bound of its consumption. If the job exceeds the reserve and there is no free memory on the cluster node, the job may be aborted with the resource_overdraft reason. If there is free memory on the cluster node, the job may exceed its allocated reserve and still complete successfully. If the job exceeds the memory limit specified by the user in the operation specification, it is aborted with the Memory limit exceeded error.
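The three outcomes can be summarized in a small decision function. This is a minimal sketch under simplifying assumptions: the function job_outcome and its parameters are hypothetical, "no free memory on the node" is modeled as the overshoot above the reserve exceeding the node's free bytes, and an abort that the text describes as possible is treated here as certain.

```python
def job_outcome(used, reserve, memory_limit, node_free):
    """Classify a job by its peak memory usage (all values in bytes).

    Hypothetical helper: the real scheduler's decision is not deterministic
    in the overdraft case.
    """
    if used > memory_limit:
        # The user-specified limit is a hard cap.
        return "memory_limit_exceeded"
    if used > reserve and used - reserve > node_free:
        # Reserve exceeded and the node cannot cover the overshoot.
        return "resource_overdraft"
    # Either the job stayed within its reserve or the node had spare memory.
    return "completed"


print(job_outcome(used=600, reserve=500, memory_limit=1000, node_free=0))
print(job_outcome(used=600, reserve=500, memory_limit=1000, node_free=200))
print(job_outcome(used=1100, reserve=500, memory_limit=1000, node_free=200))
```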
Note that in addition to the user process, there is also a job_proxy process, which is a layer between the user process and YTsaurus. Depending on the circumstances, job_proxy can consume a significant amount of RAM for compression, the columnar read/write mechanism, erasure coding, and other needs. The operation controller maintains separate digests for the user process (user_job) and the job_proxy process. For the job_proxy process, the controller estimates the expected memory usage based on the number of input/output tables and their settings. Based on this estimate, the digest selects a reserve (ranging from 0.5 to 2.0 by default).
In addition, the memory reserve for the user process has a lower bound. It is set in the specification and defaults to 0.05. However, if a job requests tmpfs, the memory reserve cannot be less than tmpfs_size / memory_limit: the system tends not to allow overcommit on tmpfs, because it can cause scheduler locking.
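The lower bound for the user process reserve can be illustrated with a short calculation. This is a sketch, not the controller's code: the function user_job_reserve_share is a hypothetical name, and it assumes the two bounds are combined by taking the larger one.

```python
def user_job_reserve_share(digest_share, memory_limit, tmpfs_size=0, lower_bound=0.05):
    """Share of memory_limit reserved for the user process.

    digest_share is the percentile taken from the digest; sizes are in bytes.
    """
    # The reserve may fall below neither the configured lower bound nor the tmpfs share.
    floor = max(lower_bound, tmpfs_size / memory_limit)
    return max(digest_share, floor)


GIB = 1024 ** 3
print(user_job_reserve_share(0.02, memory_limit=GIB))                       # bounded by lower_bound
print(user_job_reserve_share(0.02, memory_limit=GIB, tmpfs_size=GIB // 2))  # bounded by tmpfs
print(user_job_reserve_share(0.7, memory_limit=GIB, tmpfs_size=GIB // 2))   # digest value wins
```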
There are a number of values in the operation statistics that enable you to analyze the reserved memory:
job_proxy/memory_reserve (bytes): The memory reserve allocated to the corresponding job process.
job_proxy/max_memory (bytes): The peak memory consumed by the corresponding job process over the lifetime of a particular job.
job_proxy/estimated_memory (bytes): The scheduler's estimate of the job_proxy process memory for a given job.
user_job/memory_limit (bytes): The job memory limit specified by the user.
job_proxy/cumulative_estimated_memory (bytes * seconds): The corresponding cumulative metric, which takes the job duration into account.
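One way to use these values is to compare peak consumption with the reserve. The sketch below assumes the statistics have already been extracted into a flat dictionary keyed by the names listed above; the function reserve_utilization, the dictionary layout, and the sample numbers are illustrative, not a real API or real measurements.

```python
def reserve_utilization(stats, process="job_proxy"):
    """Ratio of peak memory to reserved memory for one job process.

    A value well below 1.0 suggests the reserve is generous;
    a value above 1.0 means the job ran over its reserve.
    """
    reserve = stats[f"{process}/memory_reserve"]
    peak = stats[f"{process}/max_memory"]
    return peak / reserve


# Illustrative values in bytes.
sample = {
    "job_proxy/memory_reserve": 400 * 1024 ** 2,
    "job_proxy/max_memory": 300 * 1024 ** 2,
}
print(reserve_utilization(sample))
```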
The following options are available to the user in the operation specification:
user_job_memory_digest_default_value: Initial assumption for selecting the memory reserve (the default value is 0.5).
user_job_memory_digest_lower_bound: The limit below which the reserve must not fall (the default value is 0.05). We do not recommend changing the default value.
memory_reserve_factor: An alias that sets the user_job_memory_digest_lower_bound and user_job_memory_digest_default_value options concurrently. Using this option is not recommended.
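As an illustration, a specification fragment that raises the initial assumption might look like the sketch below. The placement of the options inside a mapper section and the helper initial_reserve are assumptions made for this example; check the operation options reference for the exact location in your operation type.

```python
# Hypothetical spec fragment: the "mapper" placement is an assumption.
spec = {
    "mapper": {
        "command": "cat",
        "memory_limit": 2 * 1024 ** 3,  # 2 GiB hard limit for the user process
        "user_job_memory_digest_default_value": 0.8,  # start from 80% instead of 50%
    }
}


def initial_reserve(user_job_spec):
    """Reserve used before any statistics exist: memory_limit * default value."""
    share = user_job_spec.get("user_job_memory_digest_default_value", 0.5)
    return user_job_spec["memory_limit"] * share


print(initial_reserve(spec["mapper"]))
```

Raising the initial value can help when the first jobs of an operation are known to use most of their limit, since it reduces the chance of early aborts while the digest is still empty.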