Node administration

This section describes some of the features available for administering cluster nodes.

Pending restart

The primary use case for this flag is to optimize the master's chunk replication behavior during maintenance of cluster data nodes, where the nodes are expected to be down only for a short time. Before initiating such maintenance, we recommend running the following command:

$ yt add-maintenance --component="cluster_node" --type="pending_restart" --address="my-node.yandex.net" --comment="my comment"

To make sure that the pending restart flag is set for the node, use the following command:

$ yt get //sys/cluster_nodes/my-node.yandex.net/@pending_restart
> true
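
If maintenance covers several nodes at once, the same check can be scripted with the YTsaurus Python client (yt.wrapper). This is a minimal sketch, assuming the client is already configured for your cluster; the node addresses are placeholders:

import yt.wrapper as yt

# Placeholder addresses of the nodes scheduled for maintenance.
nodes = ["my-node-1.yandex.net", "my-node-2.yandex.net"]

for node in nodes:
    # The @pending_restart attribute is the same one checked by the CLI command above.
    flag = yt.get("//sys/cluster_nodes/{}/@pending_restart".format(node))
    print(node, flag)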

How it works

Setting the pending restart flag on a node object significantly extends its lease transaction timeout. This means that the node is still considered online for an extended period even after it stops making periodic requests to the master server to confirm its availability. You can adjust the duration of this interval in the dynamic config of the master server at //sys/@config/node_tracker/pending_restart_lease_timeout (the default value is 10 minutes).
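
As an illustration, this knob can be read and adjusted via the Python client. The sketch below assumes the duration is stored as an integer number of milliseconds, which is the usual convention for dynamic config timeouts but may differ on your cluster:

import yt.wrapper as yt

PATH = "//sys/@config/node_tracker/pending_restart_lease_timeout"

# Inspect the current lease timeout used for nodes with pending_restart set.
print(yt.get(PATH))

# Raise it to 20 minutes, assuming the value is interpreted as milliseconds.
yt.set(PATH, 20 * 60 * 1000)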

Optimization of the master chunk replicator is achieved through special handling of replicas located on nodes with this flag. We'll refer to such replicas as temporarily unavailable replicas; replicas located on regular nodes will be called available replicas.

For instance, a regular (non-erasure) chunk is considered underreplicated if at least one of the following conditions is met (see the sketch after this list):

  • The total number of available + temporarily unavailable replicas is less than the replication factor.
  • The number of available replicas is less than 1 + max replicas per rack.
  • There is more than one temporarily unavailable replica.
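
To make the combination of conditions easier to follow, here is an illustrative sketch of the check. It is not the actual replicator code; the function and parameter names are invented for readability:

def is_underreplicated(available, temporarily_unavailable,
                       replication_factor, max_replicas_per_rack):
    # A regular chunk is underreplicated if at least one of the listed conditions holds.
    return (
        available + temporarily_unavailable < replication_factor
        or available < 1 + max_replicas_per_rack
        or temporarily_unavailable > 1
    )

# Typical settings: replication factor 3, max replicas per rack 1.
# One node under pending restart is tolerated...
print(is_underreplicated(2, 1, 3, 1))  # False
# ...but two temporarily unavailable replicas already trigger replication.
print(is_underreplicated(1, 2, 3, 1))  # True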

In turn, an erasure chunk is considered parity missing or data missing if at least one of the following conditions is met:

  • At least one replica is unavailable and isn't merely temporarily unavailable (that is, it's genuinely missing rather than down for a pending restart).
  • The difference between the minimum number of replicas required for the chunk to be repairable and the number of temporarily unavailable replicas is less than max erasure replicas per rack.

In a typical cluster configuration with replication factor = 3 and max replicas per rack = 1, a practical application of pending restart assumes that simultaneous maintenance will be limited to nodes within a single rack.
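
A simple pre-check of this constraint can be scripted against the nodes' @rack attribute. This is a sketch with placeholder addresses, assuming racks have already been assigned to the nodes:

import yt.wrapper as yt

# Placeholder addresses of nodes scheduled for simultaneous maintenance.
nodes = ["my-node-1.yandex.net", "my-node-2.yandex.net"]

# Collect the racks of all nodes; each node reports its rack via @rack.
racks = {yt.get("//sys/cluster_nodes/{}/@rack".format(node)) for node in nodes}

if len(racks) > 1:
    raise RuntimeError("maintenance spans several racks: {}".format(sorted(racks)))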

Removing the flag

The pending restart flag is removed after the node restarts, specifically when it re-registers with the master. You can also remove the flag explicitly:

$ yt remove-maintenance --component="cluster_node" --address="my-node.yandex.net" --id="<maintenance-id>"
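
The id to pass here is returned by add-maintenance when the flag is set. If it wasn't saved, it can typically be recovered from the node's maintenance requests. The sketch below assumes they are exposed via a @maintenance_requests attribute with a map from id to request details; the attribute name and layout may vary between cluster versions:

import yt.wrapper as yt

node = "my-node.yandex.net"

# Assumption: a map of maintenance id -> request details (type, comment, ...).
requests = yt.get("//sys/cluster_nodes/{}/@maintenance_requests".format(node))

for maintenance_id, request in requests.items():
    if request.get("type") == "pending_restart":
        print(maintenance_id, request.get("comment"))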

Prolonged use of this flag can be dangerous, so its lifetime is intentionally limited to a configurable interval, the previously mentioned pending_restart_lease_timeout, after which the flag is removed automatically. The deadline is calculated from the last time the flag was set.