Setting up locations

For a YTsaurus cluster to work, its components need disk space to store persistent, temporary, and debugging data. Paths to the file system directories intended for YTsaurus are set in the static configs of components, which are generated by the K8s operator. In the YTsaurus specification, these paths are defined in the locations section of each component. Each location type is used by a specific component. Volumes are allocated to locations using standard Kubernetes concepts: volumeMounts, volumes (for non-persistent volumes), and volumeClaimTemplates (for persistent volumes).

Location types

MasterChangelogs, MasterSnapshots

Used to store master data, which includes all the cluster's metadata. These locations must be placed on persistent volumes in every installation except very small ones. The required volume size depends on the amount of metadata and the cluster load. Running out of space in these locations can make the cluster unavailable, which is why we recommend allocating some extra space and monitoring its usage. In production installations, these locations are typically hundreds of gigabytes in size.

For good performance in production installations, we recommend placing MasterChangelogs on separate, fast volumes (NVMe, for instance), since changelog write latency directly affects the latency of mutating requests submitted to the master.

Each master instance must have exactly one MasterChangelogs location and exactly one MasterSnapshots location.

ChunkCache, ImageCache, Slots

Used by exec nodes when starting jobs that contain user code (Map, Reduce, Vanilla, including CHYT and SPYT). ChunkCache locations are used for managing and caching binary artifacts like executable files or auxiliary dictionaries. ImageCache locations are used for caching container images; whether they are needed depends on the job environment configuration. Slots locations are needed to allocate a temporary workspace (sandbox, scratch space) when running user processes. Each exec node must have at least one ChunkCache location and at least one Slots location. If a single node has multiple ChunkCache or Slots locations, the exec node tries to balance the load between them.

Non-persistent volumes can be used for ChunkCache, ImageCache and Slots locations without compromising data reliability. Typical location sizes are 10–50 GB for ChunkCache and 5–200 GB for Slots and ImageCache.
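
As an illustration, exec node locations could be set up on a single non-persistent volume along the following lines. This is a minimal sketch rather than a reference configuration: the execNodes key is assumed by analogy with the dataNodes examples further down, and the volume name, mount path, and size are hypothetical. An ImageCache location, if the job environment needs one, would be declared in the same way.

```yaml
execNodes:
  # Other exec node parameters.

  volumes:
    # Non-persistent volume shared by the chunk cache and slots locations (hypothetical name and size).
    - name: node-data
      emptyDir:
        sizeLimit: 100Gi

  volumeMounts:
    - name: node-data
      mountPath: /yt/node-data

  locations:
    # Both locations are subdirectories of the same volume mount.
    - locationType: ChunkCache
      path: /yt/node-data/chunk-cache
    - locationType: Slots
      path: /yt/node-data/slots
```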

ChunkStore

Used by data nodes to store chunks. These locations must be placed on persistent volumes in every installation except very small ones. The combined size of the locations determines the total capacity of the cluster. When building multi-tiered storage (containing different disk types, like HDD and SSD), specify the medium parameter in the location description. If the parameter is omitted, the location is assigned to the default medium.

The minimum location size is 10 GB for a test installation and 100 GB for production. Each data node must have at least one ChunkStore location.

Logs

Used by all components to store logs. Setting these locations is optional. If you don't set any, logs are saved to /var/log within the container. You can use non-persistent volumes — this won't compromise data reliability but can complicate debugging when migrating or re-creating pods. Typical location sizes for production installations are 50–200 GB.

Each instance can have no more than one Logs location.

Specification examples

Example setup for volumes and master locations

primaryMasters:
  # Other master parameters.

  volumeClaimTemplates:
    # Persistent volume for master changelogs: dynamically provisioned in Yandex Cloud, non-replicated SSD storage class.
    - metadata:
        name: master-changelogs
      spec:
        storageClassName: yc-network-ssd-nonreplicated
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi
    # Persistent volume for master snapshots: dynamically provisioned in Yandex Cloud, HDD storage class.
    - metadata:
        name: master-snapshots
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 200Gi

  volumes:
    # Non-persistent volume for debug logs.
    - name: master-logs
      emptyDir:
        sizeLimit: 100Gi

  volumeMounts:
    - name: master-changelogs
      mountPath: /yt/master-changelogs
    - name: master-snapshots
      mountPath: /yt/master-snapshots
    - name: master-logs
      mountPath: /yt/master-logs

  locations:
    - locationType: MasterChangelogs
      path: /yt/master-changelogs
    - locationType: MasterSnapshots
      path: /yt/master-snapshots
    - locationType: Logs
      path: /yt/master-logs

Sample configuration for volumes and data node locations

dataNodes:
  volumeClaimTemplates:
    # Persistent volume for chunk stores on HDD.
    - metadata:
        name: chunk-store-hdd
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    # Persistent volume for chunk stores on SSD.
    - metadata:
        name: chunk-store-ssd
      spec:
        storageClassName: yc-network-ssd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi
    # Persistent volume for debug logs.
    - metadata:
        name: node-logs
      spec:
        storageClassName: yc-network-hdd
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 300Gi

  volumeMounts:
    - name: chunk-store-hdd
      mountPath: /yt/node-chunk-store-hdd
    - name: chunk-store-ssd
      mountPath: /yt/node-chunk-store-ssd
    - name: node-logs
      mountPath: /yt/node-logs

  locations:
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd
      medium: ssd_blobs
    - locationType: Logs
      path: /yt/node-logs

Configuring volumes for K8s clusters without dynamic volume provisioning

On clusters without dynamic volume provisioning, there are two ways to make host disks available as volumes:

  • Using hostPath-type volumes
  • Manually creating persistent volumes with claimRef

Using hostPath volumes

Sample configuration for a homogeneous set of hosts with disks mounted in the /yt/chunk-store-hdd-1, /yt/chunk-store-hdd-2, and /yt/chunk-store-ssd-1 directories.

dataNodes:
  volumes:
    - name: chunk-store-hdd-1
      hostPath:
        path: /yt/chunk-store-hdd-1
    - name: chunk-store-hdd-2
      hostPath:
        path: /yt/chunk-store-hdd-2
    - name: chunk-store-ssd-1
      hostPath:
        path: /yt/chunk-store-ssd-1

  volumeMounts:
    - name: chunk-store-hdd-1
      mountPath: /yt/node-chunk-store-hdd-1
    - name: chunk-store-hdd-2
      mountPath: /yt/node-chunk-store-hdd-2
    - name: chunk-store-ssd-1
      mountPath: /yt/node-chunk-store-ssd-1

  locations:
    - locationType: ChunkStore
      # A location path can be a subdirectory of a volume mount.
      path: /yt/node-chunk-store-hdd-1/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-hdd-2/chunk_store
      medium: default
    - locationType: ChunkStore
      path: /yt/node-chunk-store-ssd-1/chunk_store
      medium: ssd_blobs
    # Place logs on the first HDD, alongside the chunk store. Different locations can share the same volume.
    - locationType: Logs
      path: /yt/node-chunk-store-hdd-1/logs
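
For the second approach, a persistent volume can be created by hand and pre-bound to a specific claim through claimRef. The manifest below is a minimal sketch that uses only standard Kubernetes fields. The volume name, host name, namespace, storage class, and claim name are all hypothetical. In particular, the claim name assumes the usual StatefulSet naming scheme (claim template name, StatefulSet name, pod ordinal) and should be checked against the claims that actually exist in the cluster.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: chunk-store-hdd-host-1        # hypothetical volume name
spec:
  capacity:
    storage: 300Gi
  accessModes: [ "ReadWriteOnce" ]
  # Keep the data on disk if the claim is deleted.
  persistentVolumeReclaimPolicy: Retain
  # Hypothetical class name; the claim template must request the same class.
  storageClassName: local-storage
  local:
    # Directory on the host where the disk is mounted.
    path: /yt/chunk-store-hdd-1
  # Local volumes must be pinned to the host that owns the disk.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: [ "host-1" ]    # hypothetical host name
  # Reserve this volume for one specific claim.
  claimRef:
    namespace: default                # hypothetical namespace
    name: chunk-store-hdd-dnd-0       # hypothetical claim name
```

For the binding to succeed, the matching entry in volumeClaimTemplates needs to request the same storage class, a compatible access mode, and no more storage than the volume's capacity.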