Metadata tree

This section describes Cypress, the metainformation tree. Cypress stores various system information as well as pointers to where user data is stored. The section has three subsections: a general view of the tree, Cypress node attributes, and TTL for Cypress nodes.

General tree view

To a user, Cypress looks like a Linux file system tree but with a number of significant differences. First, every tree node has an associated collection of attributes, including some user-defined ones. Second, the tree is transactional. Third, files and directories, as well as other objects, can serve as Cypress nodes. Like a file system, Cypress supports an access control system.

Cypress is rooted at / which has map_node type (that is, it's a directory). Cypress nodes are addressed using YPath.

Example paths: //tmp is the temporary directory, //tmp/@ is a pointer to the directory attributes, //tmp/table/@type is the path to the type attribute of the //tmp/table node.
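In the example paths above, everything before /@ names a node and everything after it names an attribute. The following Python sketch illustrates that split. It is for illustration only: split_attribute_path is a hypothetical helper, not part of any YT client, and it ignores escaping and the rest of YPath syntax.

```python
def split_attribute_path(path):
    """Split a path such as //tmp/table/@type into (node_path, attribute).

    Hypothetical helper for illustration: it only handles the simple forms
    shown above. attribute is None for a plain node path and "" for the
    bare /@ form that addresses all attributes of the node.
    """
    node_path, marker, attribute = path.partition("/@")
    if not marker:
        return path, None
    return node_path, attribute


print(split_attribute_path("//tmp"))              # ('//tmp', None)
print(split_attribute_path("//tmp/@"))            # ('//tmp', '')
print(split_attribute_path("//tmp/table/@type"))  # ('//tmp/table', 'type')
```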

Using YPath, you can represent Cypress as follows:

/
  /home
    /user1
      /table
        /@id
        /@chunk_ids
        /@type
        ...
      ...
    /user2
    ...
  /tmp
  /sys
    /chunks
      ...
    ...
  ...

You can manipulate Cypress via a CLI.

Cypress node attributes

In addition to the attributes common to all objects, Cypress nodes have the attributes listed in the table:

Attribute                 Type              Value
parent_id                 string            Parent node ID (none for the root)
locks                     array<Lock>       List of locks taken on the node
lock_mode                 LockMode          Current lock mode of the node (transaction-dependent)
path                      string            Absolute path of the node
key                       string            Key of the node in its parent directory (if the node is nested in one)
creation_time             DateTime          Node creation time
modification_time         DateTime          Time of the most recent node modification
access_time               DateTime          Time of the most recent node access
expiration_time           DateTime          Time at which the node is deleted automatically. Optional attribute
expiration_timeout        DateTime          Period without access after which the node is deleted automatically. Optional attribute
access_counter            integer           Number of times the node has been accessed since its creation
revision                  integer           Node revision
resource_usage            ClusterResources  Cluster resources used by the node
recursive_resource_usage  ClusterResources  Cluster resources used by the node and its entire subtree
account                   string            Account used to track the resources consumed by the node
annotation                string            Human-readable description of the object

Each node also has its own access control settings, exposed through the inherit_acl, acl, and owner attributes. For more information, see Access control system.

Time attributes

The creation_time attribute stores the node creation time. The modification_time attribute stores the time of the most recent update of the node or its attributes. modification_time does not track child node updates: modification_time of a map_node does not change when something changes deeper in its subtree.

When a node is created, and every time it is modified, the system updates its revision attribute, which stores a non-negative integer. The revision number is guaranteed to increase strictly monotonically over time. You can use revisions to verify that a node has not been updated. revision is updated together with modification_time.
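The revision rules above can be pictured with a small in-memory model. This is a toy sketch in Python, not a client for a real cluster: ToyNode and its methods are hypothetical names, and the model assumes that adding a child counts as a modification of the directory itself.

```python
import itertools


class ToyNode:
    """In-memory stand-in for a Cypress node; not a real client object."""

    _revisions = itertools.count(1)  # shared, strictly increasing counter

    def __init__(self):
        self.children = {}
        self.value = None
        self.revision = next(ToyNode._revisions)  # set on creation

    def set_value(self, value):
        self.value = value
        self.revision = next(ToyNode._revisions)  # bumped on every modification

    def add_child(self, key, child):
        # Assumption of this toy model: adding a child modifies the directory.
        self.children[key] = child
        self.revision = next(ToyNode._revisions)


directory = ToyNode()
table = ToyNode()
directory.add_child("table", table)

# Remember the revisions, then change only the nested node.
seen_directory, seen_table = directory.revision, table.revision
table.set_value("new rows")

print(table.revision > seen_table)           # True: the table was modified
print(directory.revision == seen_directory)  # True: the parent was not
```

Comparing a remembered revision with the current one is enough to tell whether the node itself changed in between; it says nothing about changes deeper in the subtree.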

The access_time attribute stores the time of the most recent access to the node. Access to attributes does not count. In addition, to improve performance, the system does not update this attribute on every access; instead, it accumulates accesses and updates access_time approximately once per second.

Attention

In rare cases, a node may have been accessed without its access_time being updated because of a master server fault.

Most read and write commands support the suppress_access_tracking and suppress_modification_tracking options, which disable updates of access_time and of modification_time and revision, respectively. In particular, the web interface uses suppress_access_tracking, so viewing contents via the web UI doesn't trigger access_time updates.

Note

If a node is created or modified within a transaction, the above attributes are set at the time of the update inside that transaction. Thus, a node may become visible in parent transactions much later than its creation_time, namely only after the relevant transaction is committed.

Cypress node TTL

Cypress can delete nodes automatically at a specified moment in time or after they have not been accessed for a certain length of time. This feature is controlled by the expiration_time and expiration_timeout attributes. By default, these attributes are absent, so the system does not delete nodes automatically. To enable TTL, set at least one of them:

  • set expiration_time to the moment in time when the node is to be deleted. If it is a composite node, its entire subtree is deleted as well.
  • set expiration_timeout to the time interval during which there must be no attempts to access the node (and its entire subtree if it is a composite node) for it to be deleted.

Attention

You cannot restore data deleted using this mechanism. Use it with caution.

The moment in time must be either an ISO-format string or an integer denoting the number of milliseconds since the epoch. These two methods are equivalent:

yt set //home/project/path/table/@expiration_time '"2020-05-16 15:12:34.591+03:00"'
yt set //home/project/path/table/@expiration_time '1589631154591'
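The equivalence of the two values above can be checked with a few lines of standard-library Python. This is only a sketch of the arithmetic; to_epoch_milliseconds is a hypothetical helper name and nothing here talks to a cluster.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_milliseconds(iso_string):
    """Convert an ISO-format timestamp to integer milliseconds since the epoch.

    The string must carry a UTC offset, as in the example above.
    """
    moment = datetime.fromisoformat(iso_string)
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_milliseconds("2020-05-16 15:12:34.591+03:00"))  # 1589631154591
```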

A time interval is specified in milliseconds:

# Delete a node if "left alone" for a week.
yt set //home/project/path/table/@expiration_timeout 604800000
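Millisecond values such as the one above are easy to mistype, so it can help to derive them from a readable interval. A minimal Python sketch, with to_milliseconds as a hypothetical helper name:

```python
from datetime import timedelta


def to_milliseconds(interval):
    """Return the whole number of milliseconds in a timedelta."""
    return interval // timedelta(milliseconds=1)


print(to_milliseconds(timedelta(weeks=1)))  # 604800000
```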

Attention

Setting expiration_timeout for directories requires extreme caution. The lifetime of a directory is only prolonged by directly accessing it but not its subtree. For instance, reading a table that resides in a directory with an expiration_timeout does not prolong the lifetime of said directory.

You can modify these attributes within transactions; however, only their committed values will take effect.

To set these attributes on a node, you need the write permission on the node itself, as for many other attributes. You also need the remove permission on the node and its entire subtree, because setting these attributes effectively requests a deletion, albeit a deferred one. The write permission alone is sufficient to remove these attributes.

The system does not guarantee that the deletion will occur exactly at the requested time. In practice, the deletion occurs within a few seconds of the specified moment in time.

A node is not automatically deleted if at the specified moment in time it is subject to locks other than snapshot. The system will delete the node when all locks are released. You can use this property to extend a node's time-to-live artificially.

When you copy or move a node, expiration_time and expiration_timeout are reset by default, so the copy will not be deleted automatically. The commands include the preserve-expiration-time and preserve-expiration-timeout options that enable you to change this behavior.

Attention

A number of API calls that create temporary tables set such tables' expiration_time/expiration_timeout to purge them automatically. You must keep that in mind and not store important data in such tables.

Deletion may occur earlier if the node is located in a subtree with a smaller expiration_time/expiration_timeout value at the root. To get the actual deletion time of a node, use the effective_expiration attribute:

$ yt get //home/project/path/table/@effective_expiration
{
  "time": {"value": 42, "path": "//testator/path"},
  "timeout": {"value": 42, "path": "//testator/path"}
}

If no node on the path from the root to the node has expiration_time set, the “time” field contains a YSON entity instead. The same applies to expiration_timeout and the “timeout” field.
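The lookup described above can be sketched as a walk over the node and its ancestors. This is a toy Python model, not how the system computes the attribute: it assumes that the effective value is simply the smallest one found on the path, uses a plain dict in place of Cypress, and lets None stand in for the YSON entity. The function names and sample values are hypothetical.

```python
def ancestors_and_self(path):
    """Yield //a, //a/b, //a/b/c for the path //a/b/c (simple paths only)."""
    parts = path[2:].split("/")
    for depth in range(1, len(parts) + 1):
        yield "//" + "/".join(parts[:depth])


def effective_expiration(path, attributes):
    """Return the smallest expiration_time and expiration_timeout on the path.

    attributes maps a node path to a dict of its attributes. None plays the
    role of the YSON entity when no node on the path sets the attribute.
    """
    result = {"time": None, "timeout": None}
    fields = (("time", "expiration_time"), ("timeout", "expiration_timeout"))
    for field, name in fields:
        for node_path in ancestors_and_self(path):
            value = attributes.get(node_path, {}).get(name)
            if value is None:
                continue
            if result[field] is None or value < result[field]["value"]:
                result[field] = {"value": value, "path": node_path}
    return result


# Hypothetical attribute values for illustration.
attributes = {
    "//home/project": {"expiration_time": 1589631154591},
    "//home/project/path/table": {
        "expiration_time": 1600000000000,
        "expiration_timeout": 604800000,
    },
}
print(effective_expiration("//home/project/path/table", attributes))
```

In this sample, the table's own expiration_time is later than the one set on //home/project, so the ancestor's value and path are reported in the "time" field.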