Setting up locations
For the YTsaurus cluster to work, its components need disk space to store persistent, temporary, and debugging data. Paths to the file system directories intended for YTsaurus are set in the static configs of components generated by the K8s operator. In the YTsaurus specification, paths are defined in the Locations section. Location types correspond to the relevant components. Volumes are allocated to locations using standard Kubernetes concepts: volumeMounts, volumes (for non-persistent volumes), and volumeClaimTemplates (for persistent volumes).
When using K8s in the cloud, it is usually necessary to specify a cloud-specific storageClassName in the spec section of the volumeClaimTemplates. For example, in the case of AWS, you should use auto-ebs-sc as the storageClassName.
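For instance, a claim template for master changelogs on AWS might look like the following sketch (the claim name and volume size are illustrative; see the full specification examples below):
apiVersion: cluster.ytsaurus.tech
primaryMasters:
  volumeClaimTemplates:
    - metadata:
        name: master-changelogs
      spec:
        # Cloud-specific storage class; auto-ebs-sc on AWS.
        storageClassName: auto-ebs-sc
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi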
Location types
MasterChangelogs, MasterSnapshots
Used to store master data, which includes all of the cluster's metadata. These locations must be placed on persistent volumes in installations of any size except very small ones. The required volume size depends on the amount of metadata and the cluster load. Overloaded locations can make the cluster unavailable, which is why we recommend allocating some extra space and keeping an eye on usage. In production installations, these locations are typically hundreds of gigabytes in size.
To ensure performance in production installations, we recommend placing MasterChangelogs on separate, fast (for instance, NVMe) volumes, since changelog write latency directly impacts the latency of mutating requests submitted to the master.
Each master instance can and should have exactly one MasterChangelogs location and one MasterSnapshots location.
ChunkCache, ImageCache, Slots
Used by exec nodes when running jobs that contain user code (Map, Reduce, Vanilla, including CHYT and SPYT). ChunkCache locations are used for managing and caching binary artifacts, such as executable files or auxiliary dictionaries. ImageCache locations are used for caching container images, depending on the configuration of the job environment. Slots locations provide temporary workspaces (sandbox, scratch space) for running user processes. Each exec node must have at least one ChunkCache location and at least one Slots location. When a single node has multiple ChunkCache or Slots locations, the exec node tries to balance the load between them.
Non-persistent volumes can be used for ChunkCache, ImageCache, and Slots locations without compromising data reliability. Typical location sizes are 10–50 GB for ChunkCache and 5–200 GB for Slots and ImageCache.
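For illustration, here is a minimal sketch of exec node volumes and locations backed by non-persistent emptyDir volumes. The execNodes section name mirrors the primaryMasters and dataNodes sections in the examples below; the volume names and sizes are illustrative assumptions:
execNodes:
  volumes:
    - name: chunk-cache
      emptyDir:
        sizeLimit: 20Gi
    - name: image-cache
      emptyDir:
        sizeLimit: 50Gi
    - name: slots
      emptyDir:
        sizeLimit: 100Gi
  volumeMounts:
    - name: chunk-cache
      mountPath: /yt/node-chunk-cache
    - name: image-cache
      mountPath: /yt/node-image-cache
    - name: slots
      mountPath: /yt/node-slots
  locations:
    - locationType: ChunkCache
      path: /yt/node-chunk-cache
    - locationType: ImageCache
      path: /yt/node-image-cache
    - locationType: Slots
      path: /yt/node-slots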
ChunkStore
Used by data nodes to store chunks. These locations must be placed on persistent volumes in installations of any size except very small ones. The size of the locations determines the total capacity of the cluster. When building multi-tiered storage (with different disk types, such as HDD and SSD), specify the medium parameter in the location description. By default, a location is assigned to the default medium.
The minimum location size is 10 GB for a test installation and 100 GB for production. Each data node must have at least one ChunkStore location.
Logs
Used by all components to store logs. Setting these locations is optional. If you don't set any, logs are saved to /var/log inside the container. You can use non-persistent volumes: this won't compromise data reliability, but it can complicate debugging when pods are migrated or re-created. Typical location sizes for production installations are 50–200 GB.
Each instance can have no more than one Logs location.
Specification examples
Example setup for volumes and master locations
primaryMasters:
  # Other master parameters.
  volumeClaimTemplates:
    - metadata:
        name: master-changelogs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi
    - metadata:
        name: master-snapshots
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi
  volumes:
    - name: master-logs
      emptyDir:
        sizeLimit: 100Gi
  volumeMounts:
    - name: master-changelogs
      mountPath: /yt/master-changelogs
    - name: master-snapshots
      mountPath: /yt/master-snapshots
    - name: master-logs
      mountPath: /yt/master-logs
  locations:
    - locationType: MasterChangelogs
      path: /yt/master-changelogs
    - locationType: MasterSnapshots
      path: /yt/master-snapshots
    - locationType: Logs
      path: /yt/master-logs
Sample configuration for volumes and data node locations
dataNodes:
  volumeClaimTemplates:
    - metadata:
        name: chunk-store-hdd
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    - metadata:
        name: chunk-store-ssd
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    - metadata:
        name: node-logs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
  volumeMounts:
    - name: chunk-store-hdd
      mountPath: /yt/node-chunk-store-hdd
    - name: chunk-store-ssd
      mountPath: /yt/node-chunk-store-ssd
    - name: node-logs
      mountPath: /yt/node-logs
  locations:
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd
      medium: ssd_blobs
    - locationType: Logs
      path: /yt/node-logs
Configuring volumes for K8s clusters without dynamic volume provisioning
There are two ways to mark up disks and create persistent volumes on clusters without dynamic volume provisioning:
- Using hostPath-type volumes
- Manually creating persistent volumes with claimRef (a sketch is shown at the end of this section)
Using hostPath volumes
Sample configuration for a homogeneous set of hosts with disks mounted in the /yt/chunk-store-hdd-1, /yt/chunk-store-hdd-2, and /yt/chunk-store-ssd-1 directories.
dataNodes:
  volumes:
    - name: chunk-store-hdd-1
      hostPath:
        path: /yt/chunk-store-hdd-1
    - name: chunk-store-hdd-2
      hostPath:
        path: /yt/chunk-store-hdd-2
    - name: chunk-store-ssd-1
      hostPath:
        path: /yt/chunk-store-ssd-1
  volumeMounts:
    - name: chunk-store-hdd-1
      mountPath: /yt/node-chunk-store-hdd-1
    - name: chunk-store-hdd-2
      mountPath: /yt/node-chunk-store-hdd-2
    - name: chunk-store-ssd-1
      mountPath: /yt/node-chunk-store-ssd-1
  locations:
    - locationType: ChunkStore
      # The location path may be a nested path within a volume mount.
      path: /yt/node-chunk-store-hdd-1/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd-2/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd-1/chunk_store
      medium: ssd_blobs
    # Place logs on the first HDD disk, alongside the chunk store; different locations may share the same volume.
    - locationType: Logs
      path: /yt/node-chunk-store-hdd-1/logs
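Using manually created persistent volumes with claimRef
On clusters without dynamic provisioning, you can create persistent volumes by hand and pre-bind each one to the claim that a volumeClaimTemplates entry will generate. Below is a minimal sketch for one data node disk. The namespace, node name, disk path, and the claim name chunk-store-dnd-0 are illustrative assumptions: Kubernetes derives the actual claim name from the template name and the pod name, so check it with kubectl get pvc.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: chunk-store-node-1
spec:
  capacity:
    storage: 300Gi
  accessModes: [ "ReadWriteOnce" ]
  persistentVolumeReclaimPolicy: Retain
  # An empty storageClassName keeps dynamic provisioners away from this volume.
  storageClassName: ""
  # claimRef pre-binds the volume to one specific claim.
  claimRef:
    namespace: ytsaurus          # hypothetical namespace
    name: chunk-store-dnd-0      # hypothetical claim name; check with kubectl get pvc
  local:
    path: /yt/chunk-store-hdd-1  # disk prepared on the host
  # local volumes must be pinned to the host that owns the disk
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [ "worker-1" ]   # hypothetical node name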