Transactions
This section describes transactions as applied to Cypress and static tables, as well as locks and versioning of Cypress objects.
The transactional model of dynamic tables is described in the Multiversioning and transaction processing of dynamic tables section.
General information
The YTsaurus system supports transactions, but differs in a number of ways from the classic transaction processing model:
- Transactions can last minutes or hours.
- Isolation is configurable.
- There are no foreign table keys.
- Transactions can affect other Cypress nodes in addition to tables.
Transactions are objects of the transaction type. A list of all system transactions can be found in Cypress at //sys/transactions.
A transaction can have another transaction as a parent. Transactions form a tree whose roots are transactions without parents, also called topmost transactions. A list of all topmost transactions is available at //sys/topmost_transactions.
YTsaurus ensures the following properties of the transaction processing system:
- Atomicity. YTsaurus guarantees that the transaction will not be partially committed in the system. Either all or none of its sub-operations will be executed. Changing the data of a single Cypress node within a single command (for example, the set command) is atomic.
- Consistency. Changes introduced by committing a transaction maintain data consistency in Cypress nodes and static tables.
- Isolation. Unlike traditional transaction processing systems, YTsaurus enables you to choose isolation on a per-transaction basis. YTsaurus behavior corresponds to the Read committed isolation level when interacting with Cypress and Serializable when interacting with static tables within transactions.
- Durability. YTsaurus guarantees the safety of changes after the transaction is committed. A single server hardware failure, power outage, or system shutdown cannot cause committed changes to be lost. Transactions in the system can survive a system shutdown and continue running after the system is restored.
When you perform most actions in the system, you can specify which transaction those actions should be performed under. Even if no transaction is explicitly specified, the actions are atomic: YTsaurus starts a special system transaction and completes it at the end of the command execution.
Transactions are divided into master transactions and tablet transactions. Master transactions enable you to perform operations on master metainformation.
Tablet transactions enable you only to write data to dynamic tables. For more information about tablet transactions, see Dynamic tables.
Master transactions
Transactional processing within master servers applies to versionable objects. Examples of such objects are files, tables, and folders.
Creating a transaction
To create a transaction, run the start_tx command.
You can specify a parent transaction in the parent_id attribute and a transaction lifetime in the timeout attribute.
The upper transaction lifetime limit in the system is one hour. If you specify a timeout of more than one hour, it is reduced to this limit.
The reference point for the transaction lifetime is the later of the start_tx moment and the last ping_tx moment.
Extending a transaction lifetime
To extend a transaction lifetime, run the ping_tx command.
Each execution extends the transaction lifetime by a time interval equal to timeout. If the time since transaction creation or the last execution of the ping_tx command exceeds timeout, the transaction will be aborted.
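The lifetime rules above can be pictured with a small model. This is only a sketch and not YTsaurus code: the TxLifetime class, its field names, and the integer millisecond clock are hypothetical and exist purely for illustration. The one-hour cap and the "later of start or last ping" reference point are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TIMEOUT_MS = 60 * 60 * 1000  # one-hour upper limit stated in the text


@dataclass
class TxLifetime:
    """Toy model of a transaction lifetime (hypothetical, not a YTsaurus API)."""

    requested_timeout_ms: int
    started_at_ms: int
    last_ping_ms: Optional[int] = None

    @property
    def timeout_ms(self) -> int:
        # A requested timeout above the limit is reduced to the limit.
        return min(self.requested_timeout_ms, MAX_TIMEOUT_MS)

    @property
    def reference_ms(self) -> int:
        # The later of the start moment and the last ping moment.
        if self.last_ping_ms is None:
            return self.started_at_ms
        return max(self.started_at_ms, self.last_ping_ms)

    def is_expired(self, now_ms: int) -> bool:
        # The transaction is aborted once more than `timeout` has passed
        # since the reference point.
        return now_ms - self.reference_ms > self.timeout_ms

    def ping(self, now_ms: int) -> None:
        if self.is_expired(now_ms):
            raise RuntimeError("transaction was already aborted by timeout")
        self.last_ping_ms = now_ms
```

In this model, a client that pings more often than once per timeout keeps the transaction alive indefinitely, while a single missed interval ends it.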
Completing a transaction
You can abort a transaction using the abort_tx command or successfully complete it using the commit_tx command.
If you abort a transaction, all nested transactions will also be aborted.
To successfully complete a transaction with nested transactions, run the commit_tx command on all nested transactions and then on the parent transaction.
In all other cases, completing a transaction cannot cause an error.
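The completion rules can be sketched as a toy transaction tree. The Tx class and the commit_tree helper are hypothetical names, not part of any YTsaurus client. The sketch also assumes that committing a parent while a nested transaction is still active fails with an error, which is what the text implies but does not state outright.

```python
class Tx:
    """Toy transaction node (hypothetical, not a YTsaurus API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.state = "active"
        if parent is not None:
            parent.children.append(self)

    def abort(self):
        # Aborting cascades to every nested transaction that is still active.
        for child in self.children:
            if child.state == "active":
                child.abort()
        self.state = "aborted"

    def commit(self):
        if self.state != "active":
            raise RuntimeError(f"{self.name} is already {self.state}")
        # Assumption: a commit with active nested transactions is an error.
        active = [c.name for c in self.children if c.state == "active"]
        if active:
            raise RuntimeError(f"{self.name} has active nested transactions: {active}")
        self.state = "committed"


def commit_tree(tx):
    """Commit all active nested transactions first, then the transaction itself."""
    for child in tx.children:
        if child.state == "active":
            commit_tree(child)
    tx.commit()
```

Committing bottom-up, as commit_tree does, is the order the text prescribes: nested transactions first, the parent last.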
YTsaurus uses pessimistic locks, so possible conflicts are detected as they occur — when locks are taken and objects are created within transactions rather than when a transaction is completed.
Transaction attributes
In addition to the attributes inherent to all objects, transactions have the following attributes:
Attribute | Type | Description | Mandatory |
---|---|---|---|
timeout | integer | Transaction timeout in ms. May be omitted for some system transactions. | No |
title | string | Text description string. This attribute is filled in automatically for all system transactions and for user transactions only if the user specifies it themselves when creating a transaction. | No |
last_ping_time | DateTime | Time of the last transaction lifetime extension. May be missing for some system transactions. | No |
parent_id | Guid | Parent transaction ID. | Yes |
start_time | DateTime | Transaction creation time. | Yes |
nested_transaction_ids | array<Guid> | A list of nested transaction IDs. | Yes |
staged_object_ids | array<Guid> | A list of IDs of objects that the transaction temporarily owns. | Yes |
branched_node_ids | array<Guid> | A list of branched Cypress node IDs. | Yes |
locked_node_ids | array<Guid> | A list of locked Cypress node IDs. | Yes |
lock_ids | array<Guid> | A list of IDs of locks created in the transaction. | Yes |
resource_usage | ResourceUsageMap | An attribute that shows the use of resources in a given transaction for each affected account. | Yes |
Note
Transactions created by the system always have the title attribute filled in. It contains a description of the operations associated with the transaction.
Users are also encouraged to use this attribute to describe the purpose of the transaction.
Locks
Versioning of Cypress nodes is related to the concept of locks. By taking a lock on the node, the transaction expresses its intention to work with the node according to the locking mode and, if successful, gets a guarantee that:
- It is allowed to work with this node in the specified manner.
- The node for this transaction is branched.
Note
The node may have been branched before, but if there is a lock, the branched version exists.
The opposite is not true: in certain scenarios, there may be branched but not locked nodes. An attempt to work with such a node will always result in taking a lock on it. Branching and taking a lock are related, but are different things.
A lock is a full-fledged object that has an ID. A list of all locks in the system is available at //sys/locks. We recommend using an address of the #lock-id form to access a specific lock.
A list of locks taken by a transaction is displayed in its lock_ids attribute.
Locking modes
The following locking modes are available: snapshot, exclusive, and shared.
A locking mode defines the list of allowed transaction actions, as well as the ability to take other locks:
- snapshot: The transaction can read but not modify the node. The lock is used to take a read-only copy of the Cypress node state in the context of a transaction and freeze its state.

  Note
  A snapshot lock is taken only on the node itself, but not on the path to it in Cypress. If you continue accessing the node using its path, you can get a new node placed on the same path.
  To guarantee that you can access the snapshot, use one of the following methods:
  - Take a lock on the node id. In this case, the node can be changed between computing the id and taking the lock, so additional attempts may be required.
  - Get the id of the snapshot node by lock_id. Use the lock method to get the lock_id, then read the snapshot node id from the #<lock_id>/@node_id path.
- exclusive: The transaction can modify the node state. Other transactions cannot change the node.
- shared: The transaction can modify a certain part of the node state. Other transactions can still change other parts of this node. There are three standard scenarios for using this lock:
  - Concurrently appending data to a table or file from multiple transactions. In this case, you can only append data, but not overwrite it.
  - Concurrently creating several differently named subfolders within the same folder from multiple transactions. For example, transaction T1 can be started and create (or delete) the //tmp/a node and transaction T2 can be started and create (or delete) the //tmp/b node. Each of them will take a separate shared lock on //tmp. To detect conflicts, each lock has the child_key attribute indicating which key (subfolder) is locked by it.
  - Concurrently creating several differently named attributes within the same node from multiple transactions. For example, transaction T1 can be started and set (change, delete) the //tmp/@a attribute and transaction T2 can be started and set (change, delete) the //tmp/@b attribute. To detect conflicts, each lock has the attribute_key attribute indicating which key (attribute name) is locked by it.
Note
Transactions can be nested. There can be more than one exclusive lock on the node due to nested transactions.
Implicit locks
A transaction can take locks either explicitly using the lock command or implicitly. Implicit taking of locks can occur in case of certain interactions with Cypress nodes, for example:
- Creating a node is accompanied by taking an exclusive lock.
- Writing to a table or file means taking a shared lock if writing to the end of the table and an exclusive lock if overwriting it.
- Creating a new entry in the folder, as well as changing or deleting an existing entry is accompanied by taking a shared lock with the corresponding child_key.
- Creating a new node attribute, as well as changing or deleting an existing attribute is accompanied by taking a shared lock with the corresponding attribute_key.
Lock compatibility
Some lock combinations can be taken concurrently. The formal rules are as follows:
- A snapshot lock can always be taken. If this transaction has already taken a snapshot lock, an attempt to take the lock again is completed without errors and has no effect.
- A shared or exclusive lock cannot be taken if the transaction or any of its ancestors has already taken a snapshot lock.
- A shared or exclusive lock cannot be taken if another transaction that is not an ancestor of the given one has already taken an exclusive lock.
- An exclusive lock cannot be taken if another transaction that is not an ancestor of the given one has already taken a shared lock.
- A shared lock with the specified child_key cannot be taken if another transaction that is not an ancestor of the given one has already taken a shared lock with the same child_key.
- A shared lock with the specified attribute_key cannot be taken if another transaction that is not an ancestor of the given one has already taken a shared lock with the same attribute_key.
- A shared lock without child_key and attribute_key can be taken despite any other shared locks.
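The compatibility rules above can be expressed as a single predicate. The following is a minimal sketch over an in-memory model, not the master server's implementation: the Lock dataclass, the parents mapping (transaction name to parent name, or None for a topmost transaction), and the can_take function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lock:
    tx: str
    mode: str  # "snapshot", "shared", or "exclusive"
    child_key: Optional[str] = None
    attribute_key: Optional[str] = None


def ancestors(tx, parents):
    """Return the set of all ancestors of tx, excluding tx itself."""
    result = set()
    while parents.get(tx) is not None:
        tx = parents[tx]
        result.add(tx)
    return result


def can_take(request, held, parents):
    """Check whether `request` is compatible with the locks already held on a node."""
    if request.mode == "snapshot":
        return True  # a snapshot lock can always be taken
    lineage = ancestors(request.tx, parents) | {request.tx}
    for lock in held:
        if lock.tx in lineage:
            # Locks of the transaction itself or its ancestors block only
            # when they are snapshot locks.
            if lock.mode == "snapshot":
                return False
            continue
        if lock.mode == "exclusive":
            return False  # a foreign exclusive lock blocks shared and exclusive
        if lock.mode == "shared":
            if request.mode == "exclusive":
                return False  # a foreign shared lock blocks exclusive
            if request.child_key is not None and request.child_key == lock.child_key:
                return False
            if request.attribute_key is not None and request.attribute_key == lock.attribute_key:
                return False
    return True
```

For example, with a shared lock on child_key "a" held by T1, an unrelated transaction T2 can take a shared lock on child_key "b" but not on "a", while a transaction nested in T1 is not blocked by its ancestor's lock.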
Lock operations
There are two commands to work with locks: lock and unlock.
The lock command enables you to take a lock on a Cypress node in a specified transaction.
The unlock command does the opposite: it removes all explicit locks from the node for a given transaction, both those already taken and those still in the lock queue.
The lock can only be removed if the locked branched version of the node contains no changes compared to the original. If it does contain changes, the command ends with an error. Consequently, an explicit snapshot lock can always be removed.
Locks are automatically removed at the end of the transaction, whether successful or unsuccessful. Therefore, there is usually no need to remove them manually.
We recommend using the unlock command only when you need to take and release locks without completing the transaction. Such transactions are usually designed to manage the synchronization of third-party services.
Lock queue
Each Cypress node may have a lock queue.
By default, the lock command tries to take a lock and returns an error if the node is already locked. If you set the waitable parameter to true in the lock command, the lock will be queued.
To find out whether a lock is in the queue, request its state attribute. If the lock is in the queue, the value is pending; otherwise, it is acquired.
Attention!
For a lock to be actually taken, for example exclusively, its state must become acquired.
To take a lock:
- Call the lock command.
- Poll the state attribute in a loop until it takes on the acquired value.
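The polling step can be sketched as a small helper. It does not call YTsaurus itself: get_state is a caller-supplied function (an assumption of this sketch) that returns the current value of the lock's state attribute, and wait_until_acquired, poll_interval_s, and max_attempts are hypothetical names.

```python
import time


def wait_until_acquired(get_state, poll_interval_s=1.0, max_attempts=60, sleep=time.sleep):
    """Poll a lock's state until it becomes "acquired".

    Returns True once the state is "acquired" and False if the lock is
    still "pending" after max_attempts polls.
    """
    for attempt in range(max_attempts):
        state = get_state()
        if state == "acquired":
            return True
        if state != "pending":
            raise ValueError(f"unexpected lock state: {state!r}")
        # Do not sleep after the final poll.
        if attempt + 1 < max_attempts:
            sleep(poll_interval_s)
    return False
```

In practice, get_state would read the state attribute of the lock object, for instance through a path of the #<lock_id>/@state form by analogy with the #<lock_id>/@node_id path mentioned earlier. Bounding the number of attempts keeps a client from waiting forever behind a long-lived lock.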
Lock attributes
In addition to the attributes inherent to all objects, locks have the following attributes:
Attribute | Type | Description |
---|---|---|
state | string | Lock state: pending or acquired. |
transaction_id | Guid | ID of the transaction that took the lock. |
mode | string | Locking mode: shared, exclusive, or snapshot. |
child_key | string | The key on which the lock is taken. For the shared type only. |
attribute_key | string | The name of the attribute on which the lock is taken. For the shared type only. |
Versioning
Changing the node state by a transaction is the following three-phase process:
- Transaction T takes the lock on node N. Version N:T appears for node N and it is formed as follows:
  - For a snapshot lock, it branches off from version N:T' where T' is the closest ancestor of T that branched off N.
  - For shared and exclusive locks, N:T'' branches off from N:T' where T'' is the child of T'; N:T''' branches off from N:T'' where T''' is the child of T'' and so on up to N:T. In other words, a chain of branched node versions is created for each transaction from T' to T.

  If there is no such T', the real version of N is used. It does not matter for this description whether the lock was taken explicitly or implicitly.
- Transaction T works with node N, with the actual changes accumulated in its version N:T.
- Transaction T completes successfully and the changes it made to N:T, if any, get merged into the version from which version N:T was branched. Thus, these changes become visible to the parent transaction or to all if transaction T was a topmost transaction.
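The branch-and-merge cycle can be illustrated with a toy model. It covers only the shared and exclusive case and makes several simplifying assumptions: node state is a plain dict, "merging" replaces the parent version with the child version, and when no ancestor has branched the node, the chain starts from the topmost ancestor. VersionedNode and the parents mapping are hypothetical names, not YTsaurus structures.

```python
class VersionedNode:
    """Toy model of a branched Cypress node (illustrative only)."""

    def __init__(self, value):
        self.trunk = dict(value)  # the "real" version of N
        self.branches = {}        # transaction name -> branched version N:T

    def lock(self, tx, parents):
        """Branch N for tx, creating versions for intermediate ancestors too."""
        chain = []
        cur = tx
        # Walk up until an ancestor T' that already branched N, or past the root.
        while cur is not None and cur not in self.branches:
            chain.append(cur)
            cur = parents.get(cur)
        source = self.branches[cur] if cur is not None else self.trunk
        # Create the chain of versions from T' (or the trunk) down to T.
        for t in reversed(chain):
            self.branches[t] = dict(source)
            source = self.branches[t]
        return self.branches[tx]

    def commit(self, tx, parents):
        """Merge N:T into the version it was branched from."""
        branch = self.branches.pop(tx)
        target = self.branches.get(parents.get(tx), self.trunk)
        # Simplification: the merge is a full replacement of the target state.
        target.clear()
        target.update(branch)
```

Changes made in a nested transaction stay invisible to the parent until the nested transaction commits, and invisible outside any transaction until the topmost one commits.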
Using transactions in operations
When starting operations, the scheduler creates a set of transactions to provide some atomicity of data processing in the operation. Learn more about the work schema in the Transaction processing of data section.