Setting up locations
For the YTsaurus cluster to work, its components need disk space to store persistent, temporary, and debugging data. Paths to the file system directories intended for YTsaurus are set in the static configs of components generated by the K8s operator. In the YTsaurus specification, paths are defined in the Locations section. Location types correspond to the relevant components. Volumes are allocated to locations using standard Kubernetes concepts: volumeMounts, volumes (for non-persistent volumes), and volumeClaimTemplates (for persistent volumes).
Location types
MasterChangelogs, MasterSnapshots
Used to store master data, which includes all of the cluster's metadata. These locations must be placed on persistent volumes in installations of any size (except, perhaps, the smallest test ones). The required volume size depends on the amount of metadata and the cluster load. Overloaded locations can make the cluster unavailable, so we recommend allocating some extra space and keeping an eye on usage. In production installations, these locations typically occupy hundreds of gigabytes.
To ensure performance in production installations, we recommend placing MasterChangelogs on separate, fast (for instance, NVMe) volumes, since changelog write latency directly impacts the latency of mutating requests submitted to the master. Each master instance can and should have exactly one MasterChangelogs location and one MasterSnapshots location.
ChunkCache, ImageCache, Slots
Used by exec nodes when running jobs that contain user code (Map, Reduce, Vanilla, including CHYT and SPYT). ChunkCache locations manage and cache binary artifacts such as executable files or auxiliary dictionaries. ImageCache locations cache container images, depending on the configuration of the job environment. Slots locations provide temporary workspaces (sandboxes, scratch space) for running user processes. Each exec node must have at least one ChunkCache location and at least one Slots location. When a single node has multiple ChunkCache or Slots locations, the exec node tries to balance the load between them.
Non-persistent volumes can be used for ChunkCache, ImageCache, and Slots locations without compromising data reliability. Typical location sizes are 10–50 GB for ChunkCache and 5–200 GB for Slots and ImageCache.
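As an illustration, here is a minimal sketch of an exec node location setup, written in the same abbreviated style as the specification examples later in this section. It assumes the exec node group is declared under execNodes in the YTsaurus specification; the volume names, size limits, and mount paths are illustrative, and emptyDir volumes are used because non-persistent storage is acceptable for these location types.
execNodes:
  # Other exec node parameters.
  volumes:
    # Non-persistent volumes: losing a cache or a sandbox does not lose data.
    - name: chunk-cache
      emptyDir:
        sizeLimit: 20Gi
    - name: slots
      emptyDir:
        sizeLimit: 100Gi
  volumeMounts:
    - name: chunk-cache
      mountPath: /yt/node-chunk-cache
    - name: slots
      mountPath: /yt/node-slots
  locations:
    - locationType: ChunkCache
      path: /yt/node-chunk-cache
    - locationType: Slots
      path: /yt/node-slots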
ChunkStore
Used by data nodes to store chunks. These locations must be placed on persistent volumes in installations of any size (except, perhaps, the smallest test ones). The size of the locations determines the total storage capacity of the cluster. When building multi-tiered storage (with different disk types, such as HDD and SSD), specify the medium parameter in the location description. By default, a location is assigned to the default medium.
The minimum location size is 10 GB for a test installation and 100 GB for production. Each data node must have at least one ChunkStore location.
Logs
Used by all components to store logs. Setting these locations is optional; if you don't set any, logs are saved to /var/log inside the container. Non-persistent volumes can be used here without compromising data reliability, although this can complicate debugging when pods are migrated or re-created. Typical location sizes for production installations are 50–200 GB.
Each instance can have at most one Logs location.
Specification examples
Example setup for volumes and master locations
primaryMasters:
  # Other master parameters.
  volumeClaimTemplates:
    # Persistent volume for master changelogs; uses the dynamic volume provisioner in YC, non-replicated SSD storage class.
    - metadata:
        name: master-changelogs
      spec:
        storageClassName: yc-network-ssd-nonreplicated
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi
    # Persistent volume for master snapshots; uses the dynamic volume provisioner in YC, HDD storage class.
    - metadata:
        name: master-snapshots
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi
  volumes:
    # Non-persistent volume for debug logs.
    - name: master-logs
      emptyDir:
        sizeLimit: 100Gi
  volumeMounts:
    - name: master-changelogs
      mountPath: /yt/master-changelogs
    - name: master-snapshots
      mountPath: /yt/master-snapshots
    - name: master-logs
      mountPath: /yt/master-logs
  locations:
    - locationType: MasterChangelogs
      path: /yt/master-changelogs
    - locationType: MasterSnapshots
      path: /yt/master-snapshots
    - locationType: Logs
      path: /yt/master-logs
Sample configuration for volumes and data node locations
dataNodes:
  volumeClaimTemplates:
    # Persistent volume for chunk stores on HDD.
    - metadata:
        name: chunk-store-hdd
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    # Persistent volume for chunk stores on SSD.
    - metadata:
        name: chunk-store-ssd
      spec:
        storageClassName: yc-network-ssd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    # Persistent volume for debug logs.
    - metadata:
        name: node-logs
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
  volumeMounts:
    - name: chunk-store-hdd
      mountPath: /yt/node-chunk-store-hdd
    - name: chunk-store-ssd
      mountPath: /yt/node-chunk-store-ssd
    - name: node-logs
      mountPath: /yt/node-logs
  locations:
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd
      medium: ssd_blobs
    - locationType: Logs
      path: /yt/node-logs
Configuring volumes for K8s clusters without dynamic volume provisioning
There are two ways to mark up disks and create persistent volumes on clusters without dynamic volume provisioning:
- Using hostPath-type volumes
- Manually creating persistent volumes with claimRef (a sketch is given at the end of this section)
Using hostPath volumes
Sample configuration for a homogeneous set of hosts with disks mounted in the /yt/chunk-store-hdd-1, /yt/chunk-store-hdd-2, and /yt/chunk-store-ssd-1 directories.
dataNodes:
  volumes:
    - name: chunk-store-hdd-1
      hostPath:
        path: /yt/chunk-store-hdd-1
    - name: chunk-store-hdd-2
      hostPath:
        path: /yt/chunk-store-hdd-2
    - name: chunk-store-ssd-1
      hostPath:
        path: /yt/chunk-store-ssd-1
  volumeMounts:
    - name: chunk-store-hdd-1
      mountPath: /yt/node-chunk-store-hdd-1
    - name: chunk-store-hdd-2
      mountPath: /yt/node-chunk-store-hdd-2
    - name: chunk-store-ssd-1
      mountPath: /yt/node-chunk-store-ssd-1
  locations:
    - locationType: ChunkStore
      # A location path may be nested inside a volume mount.
      path: /yt/node-chunk-store-hdd-1/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd-2/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd-1/chunk_store
      medium: ssd_blobs
    # Place logs on the first HDD disk, alongside the chunk store; different locations may share the same volume.
    - locationType: Logs
      path: /yt/node-chunk-store-hdd-1/logs
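Manually creating persistent volumes with claimRef
On clusters without dynamic provisioning, persistent volumes can also be created by hand and pre-bound to the claims generated from volumeClaimTemplates. The example below is an illustrative sketch, not a verbatim recipe: the claim name and namespace are placeholders (actual PVC names are derived from the claim template name and the StatefulSet the operator creates, and can be checked with kubectl get pvc), as are the host name and disk path.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: chunk-store-hdd-node-1
spec:
  capacity:
    storage: 300Gi
  accessModes: [ "ReadWriteOnce" ]
  persistentVolumeReclaimPolicy: Retain
  # Must match the storageClassName used in the corresponding volumeClaimTemplate.
  storageClassName: ""
  # Pre-bind this volume to a specific claim. The name below is a placeholder;
  # real PVC names depend on the StatefulSet created by the operator.
  claimRef:
    namespace: ytsaurus
    name: chunk-store-hdd-dnd-0
  # A locally attached disk on a specific host.
  local:
    path: /yt/chunk-store-hdd-1
  # local volumes must be pinned to the host that owns the disk.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [ "node-1" ]
One PersistentVolume of this kind is needed per claim, that is, for each volume claim template of each pod.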