Commands
This section lists all the commands available in the YTsaurus system API and includes a full description of their options.
Structure
To put it simply, each command has the following structure:
- Command name (string).
- Description of the input and output formats (a YSON string with attributes).
- Parameters of executed action (a YSON structure).
- Input and output data streams.
Each command also defines the type of its input and output data. The possible types are:
- No data (null).
- A binary stream (binary). For example, file contents.
- Structured data (structured). For example, regular Cypress nodes.
- A tabular stream (tabular). For example, table contents.
In addition to that, each command can be:
- Mutating or not (whether it changes anything in the metastate or not).
- Light or heavy (light commands only transmit command parameters within a query, but heavy commands write or read the data stream).
Formats
If your command works with structured or tabular data, you need to specify a format for it. Use the input_format parameter for the input stream and the output_format parameter for the output stream.
For structured data, the YSON (default) and JSON formats are supported. For tabular data, many more formats are available.
Retries
A command retry is an option to repeat the query in the event of transient (intermittent) errors. It is expected that the response to a query retry would be indistinguishable from the response to the original query, as if there were no transient errors. It doesn't mean, however, that the response to a retry will be totally identical to the original query response (that is, only standard isolation is guaranteed).
The availability of retries and the retry mechanism depend on the command properties.
Light commands
For non-mutating light commands, you can repeat the original query.
Mutating light commands change the system state, so you need to tell the system that such a query has already been made. To do this, generate a mutation_id for the command before executing it. This is a standard GUID that consists of four 32-bit numbers in hexadecimal format separated by dashes (-).
Specify the generated mutation_id in the parameters of both the original query and the retries. In addition, add the retry parameter with the value false to the original query and with the value true to each retried query. Some light mutating commands do not support retries (for example, concatenate). If you need to retry such commands, you can use transactions.
Note
mutation_id values are usually kept for 5-10 minutes.
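As a rough illustration of the scheme above, the following Python sketch generates a mutation_id and attaches it to an original query and to its retry. The helper names (make_mutation_id, with_retry_params) are hypothetical and not part of any YTsaurus client; printing each 32-bit part without leading zeros is an assumption based on the IDs shown in this document's examples.

```python
import uuid


def make_mutation_id() -> str:
    """Return a random GUID as four 32-bit hex numbers separated by dashes."""
    value = uuid.uuid4().int  # 128-bit integer
    # Split the 128 bits into four 32-bit parts, most significant first.
    parts = [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]
    return "-".join(format(part, "x") for part in parts)


def with_retry_params(params: dict, mutation_id: str, is_retry: bool) -> dict:
    """Return a copy of params with mutation_id and retry added."""
    result = dict(params)
    result["mutation_id"] = mutation_id
    result["retry"] = is_retry
    return result


# The same mutation_id is reused for the original query and every retry.
mutation_id = make_mutation_id()
base = {"path": "//tmp/table", "type": "table"}
original = with_retry_params(base, mutation_id, is_retry=False)
retried = with_retry_params(base, mutation_id, is_retry=True)
```

Only the retry flag differs between the two parameter sets; the mutation_id must stay the same so that the system can recognize the repeated query.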
Heavy commands
You can't retry heavy commands. However, you can implement a retry mechanism by using transactions.
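The retry rules in this section can be summarized in a small decision function. This is only a sketch: the function name and the returned labels are made up for illustration, and supports_mutation_id stands for the fact that some light mutating commands (such as concatenate) do not support retries.

```python
def retry_strategy(mutating: bool, heavy: bool, supports_mutation_id: bool = True) -> str:
    """Pick a retry approach from the command properties."""
    if heavy:
        # Heavy commands can't be retried directly.
        return "transaction"
    if not mutating:
        # Light non-mutating commands: repeat the original query.
        return "repeat"
    # Light mutating commands: mutation_id plus the retry flag, when supported.
    return "mutation_id" if supports_mutation_id else "transaction"
```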
Transactions
Transactions are an integral part of Cypress. Many commands that interact with Cypress in one way or another are transactional. Each command or group of commands is marked as transactional or not. Transactional commands support the following parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | No | null-transaction-id | Current transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
Other query parameters
You can also specify prerequisites for your queries. Moreover, for light non-mutating queries, you can also specify where to read the data from. For more information, see Query parameters.
Working with transactions
For more information about transactions, see the Transactions article.
start_tx
Command properties: Mutating, Light.
Semantics:
- Begin a new transaction in the context of the current transaction.
- The new transaction is a nested (internal) transaction for the given one.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | No | null-transaction-id | ID of the current transaction (it will become a parent transaction for the transaction created by the command). |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions (to extend their TTL) while running the operation. |
timeout | No | 15000 | Transaction TTL since the last extension (in ms). |
deadline | No | missing | Transaction execution deadline (in UTC). |
attributes | No | missing | Enables you to set attributes for the created transaction. |
type | No | master | Enables you to set the transaction type: master or tablet. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: ID of the new transaction.
Example:
PARAMETERS { "transaction_id" = "0-54b-1-36223d20" }
OUTPUT "0-54c-1-bb49086d"
ping_tx
Command properties: Mutating, Light.
Semantics:
- Update the transaction.
Detailed description:
- ping_tx pings the transaction on the server (including all the parent transactions if ping_ancestor_transactions is specified). This way you can extend the TTL of the transaction.
- If the transaction started at time s with a timeout (TTL) of t, it expires at s + t by default.
- If you ping the transaction at time r (s < r < s + t), it is extended until r + t.
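The TTL arithmetic above can be checked with a short Python sketch. The function name expiry_time is hypothetical, and times are plain numbers in milliseconds; pings that arrive after the transaction has already expired are ignored.

```python
def expiry_time(start, timeout, pings=()):
    """Return the moment the transaction expires, given the ping times."""
    deadline = start + timeout  # s + t when nobody pings
    for ping in sorted(pings):
        if not (start < ping < deadline):
            # The ping falls outside the lifetime of the transaction.
            break
        deadline = ping + timeout  # extended until r + t
    return deadline


# Started at 0 with the default 15000 ms timeout, pinged at 10000 and 24000.
print(expiry_time(0, 15000, [10000, 24000]))
```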
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | Yes | | Parent transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS { "transaction_id" = "0-54b-1-36223d20" }
commit_tx
Command properties: Mutating, Light.
Semantics:
- Complete the transaction successfully.
- The outer transaction can't complete while it has incomplete internal transactions.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | Yes | | Transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS { "transaction_id" = "0-54b-1-36223d20" }
abort_tx
Command properties: Mutating, Light.
Semantics:
- Abort the transaction.
- All the active internal transactions are aborted as well.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | Yes | | Transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
force | No | false | Forcibly aborts the transaction even if this may result in consistency loss (in case of a tablet transaction and a two-phase commit). |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS { "transaction_id" = "0-54b-1-36223d20" }
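The Retries section suggests using transactions to retry commands that can't be retried directly. A minimal sketch of such a loop is shown below. FakeClient is an in-memory stand-in written for this example: its methods merely mirror the start_tx, commit_tx, and abort_tx command names and are not a real client API, and its write method stands for any heavy command.

```python
class FakeClient:
    """In-memory stand-in for a client; every method here is hypothetical."""

    def __init__(self, failures):
        self.failures = failures  # how many write attempts fail transiently
        self.committed = []       # data visible after a successful commit
        self.log = []             # sequence of transaction commands issued
        self._pending = {}
        self._counter = 0

    def start_tx(self):
        self._counter += 1
        tx = f"tx-{self._counter}"
        self._pending[tx] = []
        self.log.append(("start_tx", tx))
        return tx

    def write(self, tx, data):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient error")
        self._pending[tx].append(data)

    def commit_tx(self, tx):
        self.committed.extend(self._pending.pop(tx))
        self.log.append(("commit_tx", tx))

    def abort_tx(self, tx):
        self._pending.pop(tx, None)  # uncommitted changes are discarded
        self.log.append(("abort_tx", tx))


def write_with_retries(client, data, attempts=3):
    """Run the write inside a fresh transaction on every attempt."""
    for _ in range(attempts):
        tx = client.start_tx()
        try:
            client.write(tx, data)
            client.commit_tx(tx)
            return tx
        except ConnectionError:
            client.abort_tx(tx)
    raise RuntimeError("all attempts failed")


client = FakeClient(failures=2)
winning_tx = write_with_retries(client, "payload")
```

Because a failed attempt is aborted together with its transaction, partially written data never becomes visible, and the next attempt starts from a clean state.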
Working with Cypress
For more information about the metainformation tree, see Cypress.
Note
All the commands used to work with Cypress are transactional.
create
Command properties: Mutating, Light.
Semantics:
- Create a node of a specified type in Cypress.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. With default settings, the path must not exist. |
type | Yes | | Node type. |
recursive | No | false | Whether to create intermediate nodes recursively. |
attributes | No | {} | Enables you to set attributes for the created node. |
ignore_existing | No | false | If the node already exists, it is not recreated. In particular, the transmitted attributes are ignored. The existing node must have the same type as the requested one; otherwise, the query returns an error. |
lock_existing | No | false | Set an exclusive lock on the specified node even if it already exists. This parameter is only used together with ignore_existing. If the lock couldn't be set, the command fails. |
force | No | false | If the specified node already exists, it is deleted and replaced with a new one. In this situation, the existing node can be of any type. When the node is recreated, its ID changes. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: ID of the created node.
Examples:
PARAMETERS { "path" = "//tmp/table" ; "type" = "table" }
OUTPUT "0-4-191-6c07cd58"
PARAMETERS { "type" = "user" ; attributes = { name = "kulichek-robot" } }
OUTPUT "7727-417b1-1f5-d4f116ca"
PARAMETERS { "path" = "//tmp/document" ; "type" = "document" ; attributes = { value = {} } }
OUTPUT "7727-417b1-1f5-d4f116ca"
remove
Command properties: Mutating, Light.
Semantics:
- Delete the Cypress node.
- The node is deleted successfully even if it has a non-empty subtree.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. The path must exist. |
recursive | No | false | Enables you to delete the entire subtree if the deleted node is of a composite type. |
force | No | false | Enables you to treat the command as successfully executed if the deleted node is missing. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//tmp/table" }
set
Command properties: Mutating, Light.
Semantics:
- Write new content into the Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. If recursive == false, the path must exist (except, possibly, the last token). |
recursive | No | false | Create all non-existent intermediate nodes on the path. |
force | No | false | Enables you to modify any Cypress node instead of only attributes and documents. |
Input data:
- Type: structured.
- Value: Node content.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//tmp/node" }
INPUT {
"my_integer" = 4 ;
"my_double" = 2.718281828 ;
"map" = { "a" = 1 ; "b" = 2 } ;
"list" = [ 1, 2, 3 ]
}
multiset_attributes
Command properties: Mutating, Light.
Semantics:
- Set several attributes at the specified path (if the attributes exist already, they are overwritten).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to attributes of a node in Cypress. The path must exist. |
Input data:
- Type: structured.
- Value: A map node that contains new attribute values.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//tmp/node/@" }
INPUT {
"attribute1" = 4 ;
"attribute2" = "attribute2 value";
}
get
Command properties: Non-mutating, Light.
Semantics:
- Get the content of the Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. The path must exist. |
attributes | No | [] | A list of attributes that need to be obtained along with each node. |
max_size | No | missing | Sets a limit on the number of children returned for virtual composite nodes (for regular map nodes, this option doesn't make sense). |
ignore_opaque | No | false | Ignore the opaque attribute when executing a query (never use this option without explicit advice from YTsaurus developers). |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: Node content.
Example:
PARAMETERS { "path" = "//tmp/node" }
OUTPUT {
"my_integer" = 4 ;
"my_double" = 2.718281828 ;
"map" = { "a" = 1 ; "b" = 2 } ;
"list" = [ 1, 2, 3 ]
}
list
Command properties: Non-mutating, Light.
Semantics:
- Get a list of descendants for the Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. The path must exist. |
attributes | No | [] | A list of attributes that need to be obtained along with each node. |
max_size | No | missing | Limits the number of descendants in the output. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: A list of the node's descendants.
Example:
PARAMETERS { "path" = "//tmp/node" }
OUTPUT [
"home" ;
"sys" ;
"statbox" ;
"tmp"
]
lock
Command properties: Mutating, Light.
Semantics:
- Set a lock on the Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. The path must exist. |
mode | No | exclusive | Lock type (snapshot, shared, exclusive). |
waitable | No | false | In case of a conflict, the lock request doesn't fail but is enqueued. You can find out whether the lock was actually set using the state attribute of the lock object. See Transactions. |
child_key | No | | The key in the dictionary on which the lock is taken (for the shared type only). |
attribute_key | No | | The name of the attribute on which the lock is taken (for the shared type only). |
Note
Be sure to specify the transaction ID for this command.
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: ID of the created lock and ID of the branched node.
Example:
PARAMETERS { "path" = "//tmp/node" }
OUTPUT {
"lock_id" = "0-1-3fe00c8-353e9ba4";
"node_id" = "0-1-3fe012f-9ad48d90";
}
unlock
Attention
Locks are automatically removed at the end of the transaction. Do not use the unlock command unless absolutely needed. See Transactions.
Command properties: Mutating, Light.
Semantics:
- Release all the locks set for the selected transaction on the Cypress node.
- You can only release explicit locks.
- This command releases both already taken and enqueued locks.
- If the node is not locked, the command is considered successful.
- If the locked node version includes any changes compared to the original version, the unlock fails.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. The path must exist. |
Note
Be sure to specify the transaction ID for this command.
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//tmp/node" }
copy
Command properties: Mutating, Light.
Semantics:
- Copy the Cypress node to the new address.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
source_path | Yes | | Path to the source node in Cypress. The path must exist. |
destination_path | Yes | | Path where the copy will be created. The path must not exist. |
recursive | No | false | Whether to create the missing intermediate levels (map nodes) at the destination path. |
ignore_existing | No | false | If the node already exists at destination_path, do nothing (you can't use this parameter together with force = %true). |
lock_existing | No | false | Set an exclusive lock on the node at destination_path, even if it already exists. This parameter is only used together with ignore_existing. If the command fails to set a lock, it returns an error. |
force | No | false | Allows you to specify an existing node as the destination path; the node is replaced. |
preserve_account | No | false | Whether to keep the accounts of the source nodes or use the account at the destination path. |
preserve_expiration_time | No | false | Whether to copy the expiration_time attribute or leave it empty. |
preserve_expiration_timeout | No | false | Whether to copy the expiration_timeout attribute or leave it empty. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: ID of the created node.
Example:
PARAMETERS {
"source_path" = "//tmp/from" ;
"destination_path" = "//tmp/to" ;
}
OUTPUT "0-4-191-6c07cd58"
move
Command properties: Mutating, Light.
Semantics:
- Move the Cypress node to a new path.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
source_path | Yes | | Path to the source node in Cypress. The path must exist. |
destination_path | Yes | | A new path in Cypress. The path must not exist. |
recursive | No | false | Whether to create the missing intermediate levels (map nodes) at the destination path. |
force | No | false | Allows you to specify an existing node as the destination path; the node is replaced. |
preserve_account | No | false | Whether to keep the accounts of the source nodes or use the account at the destination path. |
preserve_expiration_timeout | No | false | Whether to copy the expiration_timeout attribute or leave it empty. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS {
"source_path" = "//tmp/from" ;
"destination_path" = "//tmp/to" ;
}
link
Command properties: Mutating, Light.
Semantics:
- Create a symbolic link to the object at the new path.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
link_path | Yes | | Path where the link will be created. The path must not exist. |
target_path | Yes | | Path to the source node in Cypress. The path must exist. |
attributes | No | missing | Attributes of the node created as a result of the command. |
recursive | No | false | Whether to create intermediate nodes recursively. |
ignore_existing | No | false | Enables you to avoid an operation failure if the node already exists and is a link. The call returns an error if the node exists but is not a link. |
lock_existing | No | false | Set an exclusive lock on the specified node even if it already exists. It's only used together with the ignore_existing parameter. |
force | No | false | Recreate the link if the path where the link is created already exists. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: ID of the created node.
Example:
PARAMETERS {
"target_path" = "//tmp/from" ;
"link_path" = "//tmp/to" ;
}
OUTPUT "0-4-191-6c07cd58"
exists
Command properties: Non-mutating, Light.
Semantics:
- Check whether the node exists.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the source node in Cypress. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: A true or false string.
Example:
PARAMETERS {
"path" = "//tmp/my_table/@_format" ;
}
concatenate
Semantics:
- Merge the set of files or tables (in the order in which their paths are listed).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | No | null-transaction-id | Current transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
source_paths | Yes | | The list of paths to input files or tables in Cypress. The paths must exist, and all items at the paths must be either files or tables. |
destination_path | Yes | | Path to a file or table in Cypress. The path must exist. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS {
"source_paths" = ["//tmp/file1"; "//tmp/file2"];
"destination_path" = "//tmp/file";
}
Access control
For more information about access permissions, see Access control.
Note
This set of commands is not transactional.
add_member
Command properties: Mutating, Light.
Semantics:
- Add a user or another group to a group.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
member | Yes | | The name of the user or group that needs to be added to the group. |
group | Yes | | The name of the group into which you are adding a user or another group. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS {
"group" = "admins";
"member" = "devs";
}
remove_member
Command properties: Mutating, Light.
Semantics:
- Remove a user or another group from a group.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
member | Yes | | The name of the user or group that needs to be deleted from the group. |
group | Yes | | The name of the group that you are deleting a user or another group from. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS {
"group" = "admins";
"member" = "devs";
}
check_permission
Command properties: Non-mutating, Light.
Semantics:
- Check if a user has a certain permission to access a certain Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the node in Cypress. |
user | Yes | | The name of the user that you are checking the permission for. |
permission | Yes | | The name of the permission checked. |
columns | No | | The list of columns to which access should be checked. |
Input data:
- Type: null.
Output data:
- Type: null.
Example:
PARAMETERS {
"path" = "//sys/accounts/statbox";
"permission" = "use";
"user" = "bob";
}
Working with files
For more information about files, see the Files article.
Note
All the commands used to work with files are transactional.
write_file
Command properties: Mutating, Heavy.
Synonyms (supported, but not recommended).
- upload.
Semantics:
- Upload the content to the file.
- The file must exist.
- If the file path includes the append=%true attribute, the data is appended to the file; otherwise, the file is overwritten.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | No | null-transaction-id | Current transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
path | Yes | | Path to a file in Cypress. The path must exist. |
compute_md5 | No | false | Whether to calculate the MD5 sum for the file written. If both compute_md5=%true and append=%true are set, but the file does not have the md5 attribute (that is, it was previously written without compute_md5=%true), an error occurs. |
Input data:
- Type: binary.
- Value: File content.
Output data:
- Type: null.
Example:
PARAMETERS {"path" = "//tmp/file"}
INPUT this is sample file content
read_file
Command properties: Non-mutating, Heavy.
Synonyms (supported, but not recommended).
- download.
Semantics:
- Get the content of the Cypress node.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
transaction_id | No | null-transaction-id | Current transaction ID. |
ping_ancestor_transactions | No | false | Whether to ping all the parent transactions while running the operation. |
path | Yes | | Path to a file in Cypress. The path must exist. |
offset | No | 0 | The position starting from which the data should be read. |
length | No | | The length of the data to be read; by default, the file is read to the end. |
Input data:
- Type: null.
Output data:
- Type: binary.
- Value: File content.
Example:
PARAMETERS { "path" = "//tmp/file" }
OUTPUT this is sample file content
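The effect of the offset and length parameters can be pictured as a slice of the file content. The sketch below models this locally in Python; read_range is a hypothetical helper, and treating offset and length as byte counts is an assumption.

```python
def read_range(content: bytes, offset: int = 0, length=None) -> bytes:
    """Return the part of content selected by offset and length."""
    if length is None:
        # By default, the file is read to the end.
        return content[offset:]
    return content[offset:offset + length]


content = b"this is sample file content"
tail = read_range(content, offset=15)
word = read_range(content, offset=5, length=2)
```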
Working with file cache
To learn more about the file cache, see the File cache section.
put_file_to_cache
Command properties: Mutating, Light.
Semantics:
- Add the file to the cache.
- Checks that, at the provided path, there's a file with the MD5 hash given in the command options.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the file in Cypress. |
md5 | Yes | | Expected MD5 hash for the file. |
cache_path | Yes | | Path to the file cache. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: Path to the cached file.
Example:
PARAMETERS {
"path" = "//tmp/file";
"md5" = "a3dcb4d229de6fde0db5686dee47145d";
"cache_path" = "//tmp/yt_wrapper/file_storage/new_cache";
}
OUTPUT "//tmp/yt_wrapper/file_storage/new_cache/5d/a3dcb4d229de6fde0db5686dee47145d"
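The md5 parameter expects the hash of the file content. A minimal sketch of computing it with Python's standard hashlib module follows, assuming the value is the lowercase hexadecimal digest as in the example above; the content and the paths are placeholders.

```python
import hashlib


def md5_hex(content: bytes) -> str:
    """Return the MD5 digest of content as a lowercase hex string."""
    return hashlib.md5(content).hexdigest()


content = b"this is sample file content"
# Parameters for put_file_to_cache, with the hash computed locally.
parameters = {
    "path": "//tmp/file",
    "md5": md5_hex(content),
    "cache_path": "//tmp/yt_wrapper/file_storage/new_cache",
}
```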
get_file_from_cache
Command properties: Non-mutating, Light.
Semantics:
- Get a path to the cached file.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
md5 | Yes | | MD5 hash for the file. |
cache_path | Yes | | Path to the file cache. |
Input data:
- Type: null.
Output data:
- Type: structured.
- Value: Path to the cached file.
Example:
PARAMETERS {
"md5" = "a3dcb4d229de6fde0db5686dee47145d";
"cache_path" = "//tmp/yt_wrapper/file_storage/new_cache";
}
OUTPUT "//tmp/yt_wrapper/file_storage/new_cache/5d/a3dcb4d229de6fde0db5686dee47145d"
Working with tables
To learn more about static tables, see the Static tables section.
To learn more about dynamic tables, see the Dynamic tables section.
write_table
Command properties: Mutating, Heavy.
Scope: Static tables.
Synonyms (supported, but not recommended).
- write.
Semantics:
- Add new entries to a static table.
- The table must exist.
- If the table path includes the append=%true attribute, the entries are appended to the table; otherwise, the table is overwritten.
- If the table path includes the sorted_by attribute, the system checks that the data is sorted by the specified set of keys and the resulting table is labeled as sorted. You can write sorted data (when the sorted_by attribute is specified) only if every new key is greater than or equal to every old key.
- You can write unsorted data (when the sorted_by attribute is not specified) to any table, but the table's sort flag will be removed.
- The command can be nested in a transaction.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to a table in Cypress. The path must exist. |
table_writer | No | From the driver configuration | Table write options. |
Input data:
- Type: tabular.
- Value: Table content.
Output data:
- Type: null.
Example:
PARAMETERS {
"path" = "//tmp/node" ;
"table_writer" = { "codec_id" = "gzip" };
}
INPUT { "id" = 1; "value" = 1.125; };
INPUT { "id" = 2; "value" = 2.000; };
INPUT { "id" = 3; "value" = 3.850; };
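The condition for appending sorted data (sorted_by together with append=%true) can be expressed as a simple check. The Python sketch below is a local model, not what the system actually runs; can_append_sorted is a hypothetical name, and keys are represented as tuples of key column values.

```python
def can_append_sorted(existing_keys, new_keys):
    """Check whether new_keys may be appended to a table sorted by the same keys."""
    new_keys = list(new_keys)
    # The new data itself must be sorted by the key.
    if any(a > b for a, b in zip(new_keys, new_keys[1:])):
        return False
    if not existing_keys or not new_keys:
        return True
    # Every new key must be greater than or equal to every old key.
    return new_keys[0] >= max(existing_keys)


print(can_append_sorted([(1,), (2,)], [(2,), (3,)]))
print(can_append_sorted([(1,), (5,)], [(3,), (6,)]))
```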
read_table
Command properties: Non-mutating, Heavy.
Scope: Static and dynamic tables.
Synonyms (supported, but not recommended).
- read.
Semantics:
- Retrieve entries from a Cypress table.
- The command can be nested in a transaction.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to a table in Cypress. The path must exist. |
table_reader | No | From the driver configuration | Table read options. |
control_attributes | No | From the driver configuration | Configuration of the control read attributes. |
unordered | No | false | Whether to read data in parallel, regardless of the entry order. |
Input data:
- Type: null.
Output data:
- Type: tabular.
- Value: Table content.
Example:
PARAMETERS { "path" = "//tmp/node" }
OUTPUT { "id" = 1; "value" = 1.125; };
OUTPUT { "id" = 2; "value" = 2.000; };
OUTPUT { "id" = 3; "value" = 3.850; };
read_blob_table
Command properties: Non-mutating, Heavy.
Scope: Static tables with binary data.
Semantics:
- Get a binary data stream from a specific column in a Cypress table.
- The command can be nested in a transaction.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to a table in Cypress. The path must exist. |
table_reader | No | From the driver config file | Table read options. |
part_index_column_name | No | part_index | Name of the column that stores BLOB indexes. |
data_column_name | No | data | Name of the column that stores data. |
Input data:
- Type: null.
Output data:
- Type: binary.
- Value: Contents of a data column in the table.
Example:
PARAMETERS { "path" = "//tmp/node" }
OUTPUT "Hello world"
select_rows
Command properties: Non-mutating, Heavy.
Scope: Dynamic tables.
Semantics:
- Execute an SQL-like query for a dynamic table in accordance with the supported features.
- The query can be executed against a data snapshot with a specified timestamp.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
query | Yes | | Query string. |
timestamp | No | sync last committed | Which timestamp the query should run against. |
Input data:
- Type: null.
Output data:
- Type: tabular.
- Value: Set of rows with the result.
Example:
PARAMETERS { "query" = "key, value from [//tmp/sometable]" }
OUTPUT { "key" = 1; "value" = "hello"; };
OUTPUT { "key" = 2; "value" = "world"; };
insert_rows
Command properties: Mutating, Heavy.
Scope: Dynamic tables.
Semantics:
- Write rows to a dynamic table.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the dynamic table. |
update | No | false | If set to false, the columns missing in the input data are written with the Null value (overwriting the current value in the table). If set to true, such columns preserve their previous value in the table. |
aggregate | No | false | If set to false, the aggregating columns are overwritten by the new value. If set to true, the delta from the source data is applied to such columns. |
atomicity | No | full | Supported values: none and full. If set to none, writing occurs on each tablet independently of the others. If set to full, either all of the passed rows are written or none. Read more. |
require_sync_replica | No | true | The option makes sense only in case of replication. If set to true, inserting occurs only if the table has a synchronous replica. If set to false, a synchronous replica is not required. |
Input data:
- Type: tabular.
- Value: Table content.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//home/user/table" }
INPUT { "id" = 1; "value" = 1.125; };
INPUT { "id" = 2; "value" = 2.000; };
INPUT { "id" = 3; "value" = 3.850; };
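The difference between update = false and update = true is easiest to see on a single row. The Python sketch below models a table as a dictionary keyed by the key columns; apply_insert and the column names are hypothetical, and the model ignores aggregation, atomicity, and replication.

```python
def apply_insert(table, row, key_columns, value_columns, update=False):
    """Write one row into an in-memory model of a dynamic table."""
    key = tuple(row[column] for column in key_columns)
    if update and key in table:
        # update = true: missing columns keep their previous values.
        new_row = dict(table[key])
    else:
        # update = false: missing columns are written as null (None here).
        new_row = {column: None for column in value_columns}
    new_row.update({c: row[c] for c in value_columns if c in row})
    table[key] = new_row


table = {}
apply_insert(table, {"id": 1, "a": 10, "b": 20}, ["id"], ["a", "b"])
apply_insert(table, {"id": 1, "a": 11}, ["id"], ["a", "b"], update=True)
after_update = dict(table[(1,)])
apply_insert(table, {"id": 1, "a": 12}, ["id"], ["a", "b"], update=False)
after_overwrite = dict(table[(1,)])
```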
delete_rows
Command properties: Mutating, Heavy.
Scope: Dynamic tables.
Semantics:
- Delete all rows with the specified keys from the dynamic table.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the dynamic table. |
Input data:
- Type: tabular.
- Value: Set of rows with the keys.
Output data:
- Type: null.
Example:
PARAMETERS { "path" = "//tmp/sometable" }
INPUT { "id" = 1; };
INPUT { "id" = 2; };
INPUT { "id" = 3; };
lock_rows
Command properties: Mutating, Heavy.
Scope: Dynamic tables.
Semantics:
- Lock writing to the rows in the dynamic table while the current transaction is running.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the dynamic table. |
locks | Yes | | List of the involved lock groups from the table schema. |
lock_type | No | shared strong | Lock types: shared weak, shared strong, exclusive. |
A shared lock can be taken by multiple transactions at the same time. An exclusive lock can be taken by a single transaction, usually the one that updates the row.
Because one row can be read-locked by multiple transactions (in contrast to write locks), a continuous stream of transactions that take a shared lock on a certain row keeps the row locked permanently, and you won't be able to update it. This effect is referred to as write starvation. It never happens with write-write conflicts, because an exclusive lock is released every time a write transaction completes.
To alleviate write starvation, you can decrease the isolation of shared locks by choosing between the weak and strong lock modes. In the weak mode, the timestamps until which the rows were held by shared locks aren't saved when the transaction completes. In practice, a write isn't blocked if it belongs to a transaction that overlaps in time with the current transaction but completes later. However, if the write transaction completes earlier, the transaction with the shared lock isn't applied. In the weak mode, the shared lock becomes asymmetric.
Input data:
- Type: tabular.
- Value: Set of rows with the keys.
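The basic compatibility rule between shared and exclusive locks can be written down as a small predicate. This is a deliberately simplified Python model: can_acquire is a hypothetical name, the weak and strong shared modes are treated alike, and timestamps are not modeled at all.

```python
SHARED = {"shared weak", "shared strong"}


def can_acquire(held_by_others, requested):
    """Check whether a lock can be taken given locks held by other transactions."""
    if not held_by_others:
        return True
    if requested == "exclusive":
        # An exclusive lock can be taken by a single transaction only.
        return False
    # A shared lock coexists with other shared locks only.
    return all(lock in SHARED for lock in held_by_others)
```

Under this model, a row that always has at least one shared lock held can never be locked exclusively, which is the write starvation effect described above.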
lookup_rows
Command properties: Non-mutating, Heavy.
Scope: Dynamic tables.
Semantics:
- Select the rows with the specified keys from the table.
- The command can be executed against a data snapshot with a specified timestamp.
- It is guaranteed that the relative order of rows retrieved in the response will be the same as the order of keys in the query.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path | Yes | | Path to the dynamic table. The path should be simple (without columns, ranges, etc.). |
column_names | No | | Which columns to include in the response. |
keep_missing_rows | No | false | Whether to include the rows corresponding to the keys that were not found. If set to true, the corresponding positions will include # (entity). |
timestamp | No | sync last committed | Which timestamp the query should run against. |
Input data:
- Type: tabular.
- Value: Set of rows with the keys.
Output data:
- Type: tabular.
- Value: A set of rows with the specified keys and queried columns.
Example:
PARAMETERS { "path" = "//tmp/sometable" }
INPUT { "id" = 1; };
INPUT { "id" = 2; };
INPUT { "id" = 3; };
OUTPUT { "id" = 1; "value" = 1.125; };
OUTPUT { "id" = 2; "value" = 2.000; };
OUTPUT { "id" = 3; "value" = 3.850; };
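The lookup semantics described above (response order follows key order, missing keys are optionally kept as entities) can be modeled in a few lines of Python. This is a minimal in-memory sketch, not the YTsaurus client API: the function name lookup_rows_sketch and the dict-based table are assumptions made for illustration, and None stands in for the YSON entity #.

```python
from typing import Any, Optional

Row = dict[str, Any]


def lookup_rows_sketch(
    table: dict[Any, Row],
    keys: list[Any],
    keep_missing_rows: bool = False,
    column_names: Optional[list[str]] = None,
) -> list[Optional[Row]]:
    """Model of lookup_rows over an in-memory table indexed by key (hypothetical helper)."""
    result: list[Optional[Row]] = []
    for key in keys:  # iterate in query order, so the response order matches it
        row = table.get(key)
        if row is None:
            if keep_missing_rows:
                result.append(None)  # stands in for the YSON entity `#`
            continue
        if column_names is not None:
            # Keep only the requested columns.
            row = {name: row[name] for name in column_names if name in row}
        result.append(row)
    return result


table = {
    1: {"id": 1, "value": 1.125},
    2: {"id": 2, "value": 2.0},
    3: {"id": 3, "value": 3.85},
}
rows = lookup_rows_sketch(table, [3, 42, 1], keep_missing_rows=True)
```

With keep_missing_rows enabled, the position of the missing key 42 is preserved in the result, which lets the caller match responses to keys by index.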
trim_rows
Command properties: Mutating, Heavy.
Scope: Ordered dynamic tables.
Semantics:
- Remove rows from the beginning of a tablet of an ordered dynamic table. After that, the deleted data can no longer be read by the select_rows command. The numbering of the remaining rows is unchanged.
- The command is executed outside of transactions.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
tablet_index |
Yes | Index of the truncated tablet. | |
trimmed_row_count |
Yes | Number of deleted rows. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table"; tablet_index = 10; trimmed_row_count = 43 }
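The statement that row numbering is unchanged after trimming can be illustrated with a small in-memory model. This is only a sketch: the class OrderedTabletSketch is hypothetical, and it assumes that trimmed_row_count is the total number of rows removed from the head of the tablet so far (so repeating a trim with the same value changes nothing). Check that assumption against your cluster before relying on it.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedTabletSketch:
    """In-memory stand-in for one tablet of an ordered dynamic table."""

    rows: list = field(default_factory=list)  # rows that are still stored
    trimmed_row_count: int = 0  # rows already removed from the head

    def append(self, row) -> int:
        """Store a row and return its row index within the tablet."""
        self.rows.append(row)
        return self.trimmed_row_count + len(self.rows) - 1

    def trim(self, trimmed_row_count: int) -> None:
        # Assumption: the argument is the total number of trimmed rows,
        # so a value not larger than the current one is a no-op.
        to_drop = min(max(trimmed_row_count - self.trimmed_row_count, 0), len(self.rows))
        del self.rows[:to_drop]
        self.trimmed_row_count += to_drop

    def read(self, row_index: int):
        """Read a row by its original index; trimmed rows are gone."""
        offset = row_index - self.trimmed_row_count
        if offset < 0 or offset >= len(self.rows):
            raise KeyError(f"row {row_index} is trimmed or does not exist")
        return self.rows[offset]


tablet = OrderedTabletSketch()
for value in "abcde":
    tablet.append(value)
tablet.trim(2)
third_row = tablet.read(2)  # still addressed by its original index
```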
mount_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Mounts the tablets of the dynamic table.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
first_tablet_index |
No | 0 |
Index of the first mounted tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last mounted tablet. |
cell_id |
No | If specified, the tablets are mounted to the specified cell. Otherwise, the system selects suitable cells (in most cases, you should delegate this choice to the system). | |
freeze |
No | false |
If set to true , the tablets are mounted to a frozen state. See also the description of the freeze_table command. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table" }
unmount_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Unmounts the tablets of the dynamic table.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
first_tablet_index |
No | 0 |
Index of the first unmounted tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last unmounted tablet. |
force |
No | false |
Forcibly unmount the tablets. This flag poses a risk of data loss, so using it requires YTsaurus admin rights. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table"; "first_tablet_index" = 10; "last_tablet_index" = 20; }
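The first_tablet_index and last_tablet_index parameters follow the same defaulting rule in mount_table, unmount_table, and the similar commands below. The following Python sketch shows one way to resolve the affected range on the client side. The helper name resolve_tablet_range is hypothetical, and treating both bounds as inclusive is an assumption based on the default value tablet_count - 1.

```python
from typing import Optional


def resolve_tablet_range(
    tablet_count: int,
    first_tablet_index: Optional[int] = None,
    last_tablet_index: Optional[int] = None,
) -> range:
    """Return the tablet indexes a mount-like command would touch (hypothetical helper)."""
    first = 0 if first_tablet_index is None else first_tablet_index
    last = tablet_count - 1 if last_tablet_index is None else last_tablet_index
    if not 0 <= first <= last < tablet_count:
        raise ValueError(f"invalid tablet range [{first}, {last}] for {tablet_count} tablets")
    return range(first, last + 1)  # both bounds are treated as inclusive


whole_table = resolve_tablet_range(4)
partial = resolve_tablet_range(32, first_tablet_index=10, last_tablet_index=20)
```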
remount_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Enables you to update certain settings of dynamic tables without unmounting them.
- When remounted like that, the table remains accessible for reading and writing.
- The settings are taken from the table attributes and eventually reach the node where the tablets are processed.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
first_tablet_index |
No | 0 |
Index of the first remounted tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last remounted tablet. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table" }
freeze_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Changes the tablets of the dynamic table to a frozen state, flushing all the data to the disk. In this state, you can read data, but you can't write new entries. All the previously written data is stored in chunks and is available, for example, to map-reduce operations (even without the enable_dynamic_store_read option).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
first_tablet_index |
No | 0 |
Index of the first frozen tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last frozen tablet. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table" }
unfreeze_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Unfreezes a table that was frozen (for example, by the freeze_table command), making the table write-accessible.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Path to the dynamic table. | |
first_tablet_index |
No | 0 |
Index of the first unfrozen tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last unfrozen tablet. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table" }
reshard_table
Command properties: Mutating, Light.
Scope: Dynamic tables.
Semantics:
- Reshards the dynamic table (that is, changes the set of its tablets).
- The resharded tablets must be unmounted.
- Be sure to specify tablet_count for an ordered table. For a sorted table, you can specify either tablet_count or pivot_keys. The resharded tablets are replaced by a set of new tablets.
- In the case of a sorted table:
  - When passing pivot_keys, the first key in pivot_keys must match the first key of the first resharded tablet. The number of pivot_keys is equal to the number of new tablets that the resharded tablets are split into.
  - When passing tablet_count, the system selects pivot keys based on the data available in the table as evenly as possible. If the table isn't large enough, you might get fewer tablets than requested. At default settings, the resulting tablets can't be smaller than about 200 MB each. For finer slicing, use the enable_slicing option.
  - If the first key column of the table has an integer type, then along with tablet_count, you can use uniform=True. In this case, uniform values from the range of the appropriate type are selected as pivot keys: 0, 2^64/n, 2^64*2/n, ... for an unsigned 64-bit type and -2^63, -2^63 + 2^64/n, -2^63 + 2^64*2/n, ... for a signed 64-bit type.
- For an ordered table, tablet_count specifies the number of new tablets that the resharded tablets are split into. If there are more new tablets than old ones, new empty tablets are created. If there are fewer, the corresponding number of trailing source tablets are merged into a single tablet in their natural order.
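The uniform pivot key sequences given above can be reproduced with integer arithmetic. The sketch below is illustrative only: the function name uniform_pivot_keys is hypothetical, and rounding down with integer division is an assumption, since the text does not say how the system rounds when 2^64 is not divisible by n.

```python
def uniform_pivot_keys(tablet_count: int, signed: bool = False) -> list[int]:
    """Evenly spaced first-column pivot values for a 64-bit integer key (sketch)."""
    if tablet_count < 1:
        raise ValueError("tablet_count must be positive")
    span = 2 ** 64  # size of the 64-bit value range
    start = -(2 ** 63) if signed else 0  # lowest value of the type
    # Integer division keeps the values exact; Python integers do not overflow.
    return [start + span * i // tablet_count for i in range(tablet_count)]


unsigned_keys = uniform_pivot_keys(4)
signed_keys = uniform_pivot_keys(4, signed=True)
```

For n = 4, the unsigned sequence starts at 0 and the signed one at -2^63, each with a step of 2^62.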
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Table path. | |
first_tablet_index |
No | 0 |
Index of the first resharded tablet. |
last_tablet_index |
No | tablet_count - 1 |
Index of the last resharded tablet. |
pivot_keys |
No | Pivot keys for the new tablets (for a sorted table). | |
tablet_count |
No | Number of new tablets. | |
uniform |
No | false |
Uniformly reshard tablets based on an integer key column. |
enable_slicing |
No | false |
Use sampling to increase granularity (for a more precise splitting into tablets) when pivot keys are selected automatically. This might help if you have many entries on one key and few entries on another key. |
slicing_accuracy |
No | 0.05 |
Acceptable tolerance for uniform resharding into a given number of tablets. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table"; "tablet_count" = 10; }
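The rule for ordered tables from the Semantics list (extra tablets are created empty, and surplus trailing tablets are merged into one) can be modeled as follows. This is a sketch of one reading of that description, not the server implementation: the function reshard_ordered_sketch and the list-of-lists representation of tablets are assumptions.

```python
def reshard_ordered_sketch(tablets: list[list], tablet_count: int) -> list[list]:
    """Model of resharding an ordered table into tablet_count tablets."""
    if tablet_count < 1:
        raise ValueError("tablet_count must be positive")
    if tablet_count >= len(tablets):
        # More tablets requested than exist: keep the old ones, add empty ones.
        extra = [[] for _ in range(tablet_count - len(tablets))]
        return [list(tablet) for tablet in tablets] + extra
    # Fewer tablets requested: the trailing tablets collapse into a single one,
    # preserving their natural order.
    head = [list(tablet) for tablet in tablets[: tablet_count - 1]]
    tail = [row for tablet in tablets[tablet_count - 1:] for row in tablet]
    return head + [tail]


merged = reshard_ordered_sketch([["a"], ["b"], ["c"], ["d"]], 2)
grown = reshard_ordered_sketch([["a"], ["b"]], 4)
```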
reshard_table_automatic
Command properties: Mutating, Light.
Scope: Sorted dynamic tables.
Semantics:
- Forces an iteration of the background tablet balancer, that is, reshards the tablets whose size falls outside the min_tablet_size–max_tablet_size interval.
- In contrast to reshard_table, it can work with mounted tables. While running, it can unmount tablets.
- It's started asynchronously by default, but if you specify keep_actions, it returns a list of IDs that you can use to track the progress. Most APIs do it automatically when passed the sync=True flag.
- The command does not support replicated tables (you can use it independently on different replicas, however).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Table path. | |
keep_actions |
No | false |
If set to true , respond with a list of IDs to track the progress. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: A list of tablet_action_id values if keep_actions=True. Otherwise, it returns an empty list.
Example:
PARAMETERS { "path" = "//home/user/table"; "keep_actions" = %true; }
OUTPUT [
"11-22-33-44";
]
alter_table
Command properties: Mutating, Light.
Scope: Static and dynamic tables.
Semantics:
- Changes the schema and other table settings (both static and dynamic).
- As a rule, alter_table is used to change the settings that require complex joint validation. Settings that only affect new data are usually changed through attributes instead.
- Changing the schema involves various validation checks because the system has to make sure that the existing data matches the schema.
- You can change a static table to a dynamic table, but not vice versa.
- You can change the schema of a dynamic table, as well as
upstream_replica_id
, only if the table is unmounted.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
path |
Yes | Table path. | |
schema |
No | If specified, it sets a new schema for the table. | |
dynamic |
No | If specified, it changes a static table to a dynamic table. This setting can only be changed outside a transaction. | |
upstream_replica_id |
No | If specified, it changes the ID of the replica object on the metacluster. For more information, see Replicated dynamic tables. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "path" = "//home/user/table"; "schema" = [{"name" = "key"; type = "int64"; "sort_order" = "ascending"}; {"name" = "value"; type = "string"}]; }
alter_table_replica
Command properties: Mutating, Light.
Scope: Replicated dynamic tables.
Semantics:
- Changes the replica's properties: enables/disables it or changes its mode.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
replica_id |
Yes | Replica ID. | |
enabled |
No | Doesn't change the enabled/disabled status of the replica | If set to true , enables the replica. If set to false , disables the replica. |
mode |
No | Doesn't change the replica's mode | Changes the replica mode sync /async . |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "replica_id" = "730e-8611b-3ff02c5-f647333f"; "enabled" = %true; }
get_table_columnar_statistics
Command properties: Non-mutating, Light.
Scope: Static and dynamic tables.
Semantics:
- Get statistics on the set of columns in the given set of tables (taken completely or partially, by ranges).
- The statistics include:
  - The total data_weight for each of the requested columns.
  - The total data_weight of all the old chunks, for which per-column metainformation wasn't saved because they were generated before column-by-column statistics were supported.
  - The total weight of all the timestamps of rows in a dynamic table.
- The paths should always include column selectors.
- The command can be executed within a transaction.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
paths |
Yes | A list of paths to tables in Cypress. The tables must exist. The paths must include column selectors . |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Values: Column-by-column statistics for the requested columns.
Example:
PARAMETERS {
"paths" = ["//tmp/table1{a,b,c}"; "//tmp/table2{a,b,c}"];
"transaction_id" = "1234-abcd-abcd-7890"
}
OUTPUT {
"column_data_weights" = {
"a" = 8124;
"b" = 124241241;
"c" = 3121414;
};
"legacy_chunks_data_weight" = 100242;
"timestamp_total_weight" = 50056;
}
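One possible use of this output is a rough upper-bound estimate of how much data a job would read for the selected columns. The Python sketch below simply adds the three components of the response. Treating their sum as a read estimate is an assumption made for illustration, and estimate_read_weight is a hypothetical helper, not part of any YTsaurus client.

```python
def estimate_read_weight(stats: dict) -> int:
    """Upper-bound estimate of the data weight behind a column selection (sketch)."""
    return (
        sum(stats.get("column_data_weights", {}).values())  # requested columns
        + stats.get("legacy_chunks_data_weight", 0)  # old chunks without per-column data
        + stats.get("timestamp_total_weight", 0)  # row timestamps of dynamic tables
    )


stats = {
    "column_data_weights": {"a": 8124, "b": 124241241, "c": 3121414},
    "legacy_chunks_data_weight": 100242,
    "timestamp_total_weight": 50056,
}
total = estimate_read_weight(stats)
```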
Running operations
For more information about running data processing operations, see Data processing.
All the operations are run asynchronously: the commands below only launch them. To find out whether the operation is complete, request the operation status using the get_operation command.
All the commands that run operations are also transactional. It means that everything the operation does with tables is executed within the transaction specified when you run the operation. The node responsible for the operation (//sys/operations/<OP_ID>) is updated by the scheduler outside of any transactions.
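Because the commands only launch operations, a client typically polls get_operation until the operation reaches a final state. The following Python sketch shows such a loop under stated assumptions: get_operation is passed in as a callable that returns a dictionary with a state field (a stand-in for whatever client you use), and the set of final states is taken from the states that appear in the list_operations example in this section.

```python
import time
from typing import Callable

# Assumed final states; adjust to the states your cluster reports.
FINAL_STATES = {"completed", "failed", "aborted"}


def wait_for_operation(
    get_operation: Callable[[str], dict],
    operation_id: str,
    poll_interval: float = 1.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the operation state until it is final and return that state."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_operation(operation_id)["state"]
        if state in FINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} is still {state}")
        sleep(poll_interval)  # back off between requests to keep the load low
```

Injecting the get_operation and sleep callables keeps the loop independent of a particular client library and easy to test with a fake.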
start_operation
Command properties: Mutating, Light.
Semantics:
- Run the operation of the specified type.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_type |
Yes | Operation type (can be one of these: map , reduce , map_reduce , remote-copy , erase , sort , merge , vanilla ). |
|
spec |
Yes | Operation specification. For more information, see Setting up operations. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS { "operation_type" = "map_reduce";
"spec" = {
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_paths" = [ "//tmp/table_out" ]
}
}
OUTPUT "37878b-ba919c15-cdc97f3a-8a983ece"
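The PARAMETERS blocks in these examples are written in YSON text, where map and list items are separated by semicolons and booleans are written as %true and %false. The sketch below converts basic Python values into that notation. It is a deliberately small illustration, not a YSON library: the function to_yson_text is hypothetical, it covers only plain ASCII strings, signed integers, finite floats, booleans, None, lists, and dicts, and it ignores attributes and unsigned integers.

```python
import json


def to_yson_text(value) -> str:
    """Serialize basic Python values to YSON text (illustrative subset only)."""
    if value is None:
        return "#"  # YSON entity
    if isinstance(value, bool):  # must precede the int check: bool is an int subclass
        return "%true" if value else "%false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # adequate for plain ASCII strings
    if isinstance(value, (list, tuple)):
        return "[" + ";".join(to_yson_text(item) for item in value) + "]"
    if isinstance(value, dict):
        items = (f"{json.dumps(str(key))}={to_yson_text(item)}" for key, item in value.items())
        return "{" + ";".join(items) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


parameters = {
    "operation_type": "merge",
    "spec": {
        "input_table_paths": ["//tmp/a", "//tmp/b"],
        "output_table_path": "//tmp/out",
    },
}
text = to_yson_text(parameters)
```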
merge
Command properties: Mutating, Light.
Semantics:
- Start merging the source tables.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification. | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_path] | Yes | Output table. | |
spec[mode] | No | unordered |
Merging mode ( unordered , ordered , sorted ). |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_path" = "//tmp/table_out"
}
}
OUTPUT "37878b-ba919c15-cdc97f3a-8a983ece"
erase
Command properties: Mutating, Light.
Semantics:
- Start erasing data from the source table.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification. | |
spec[table_path] | Yes | An input table with the specified row selector. The same table will be used for output. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"table_path" = "//tmp/table[#0:#500]" ;
}
}
OUTPUT "3f9e62-ce8d2965-6350842b-3e4628d2"
map
Command properties: Mutating, Light.
Semantics:
- Run the map operation on the source tables, writing the output to the output tables.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification (see the relevant fields below). | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_paths] | Yes | List of output tables. | |
spec[mapper][command] | Yes | The command that runs the mapper. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"mapper" = {
"command" = "cat"
} ;
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_paths" = [ "//tmp/table_out" ]
}
}
OUTPUT "33ab3f-bf1df917-b35fe9ed-c70a4bf4"
reduce
Command properties: Mutating, Light.
Semantics:
- Run the reduce operation on the source tables, writing the output to the output tables.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification (see the relevant fields below). | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_paths] | Yes | List of output tables. | |
spec[reduce_by] | Yes | Columns that reduce runs against. |
|
spec[reducer][command] | Yes | Command that runs the reducer. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"reducer" = {
"command" = "cat" ;
} ;
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_paths" = [ "//tmp/table_out" ] ;
"reduce_by" = [ "my_key" ] ;
}
}
OUTPUT "33ab3f-bf1df917-b35fe9ed-c70a4bf4"
sort
Command properties: Mutating, Light.
Semantics:
- Start sorting the source tables.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification (see the relevant fields below). | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_path] | Yes | Output table. | |
spec[sort_by] | Yes | A non-empty set of column names that make up the sort key. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_path" = "//tmp/table_out" ;
"sort_by" = [ "mykey" ];
}
}
OUTPUT "37878b-ba919c15-cdc97f3a-8a983ece"
map_reduce
Command properties: Mutating, Light.
Semantics:
- Run Map-Reduce on the source tables.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification (see the relevant fields below). | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_paths] | Yes | List of output tables. | |
spec[mapper][command] | No | The command that runs the mapper. | |
spec[sort_by] | No | A non-empty set of column names by which the data passed to the reducer is sorted. | |
spec[reduce_by] | Yes | Columns that reduce runs against. | |
spec[reducer][command] | Yes | Command that runs the reducer. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"mapper" = {
"command" = "cat"
} ;
"reducer" = {
"command" = "cat"
} ;
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_paths" = [ "//tmp/table_out" ] ;
"reduce_by" = [ "my_key" ] ;
}
}
OUTPUT "37878b-ba919c15-cdc97f3a-8a983ece"
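Since several spec fields are marked as required in the table above, it can be convenient to check a spec on the client before sending it. The sketch below does this for map_reduce. The helper missing_spec_fields and the REQUIRED_MAP_REDUCE_FIELDS constant are hypothetical, and the list of required fields simply mirrors the rows marked Yes in the table.

```python
# Paths of required fields inside the spec, mirroring the parameter table.
REQUIRED_MAP_REDUCE_FIELDS = [
    ("input_table_paths",),
    ("output_table_paths",),
    ("reduce_by",),
    ("reducer", "command"),
]


def missing_spec_fields(spec: dict, required=REQUIRED_MAP_REDUCE_FIELDS) -> list[str]:
    """Return the required fields that are absent from the spec (sketch)."""
    missing = []
    for path in required:
        node = spec
        for key in path:
            if not isinstance(node, dict) or key not in node:
                # Report the field in the same notation as the table.
                missing.append("spec" + "".join(f"[{part}]" for part in path))
                break
            node = node[key]
    return missing


spec = {
    "mapper": {"command": "cat"},
    "reducer": {"command": "cat"},
    "input_table_paths": ["//tmp/table_in"],
    "output_table_paths": ["//tmp/table_out"],
    "reduce_by": ["my_key"],
}
problems = missing_spec_fields(spec)  # empty list for a complete spec
```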
remote-copy
Command properties: Mutating, Light.
Semantics:
- Copy the source tables from the specified cluster.
- Detailed description of all specification parameters.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
spec |
Yes | Operation specification (see the relevant fields below). | |
spec[input_table_paths] | Yes | List of input tables. | |
spec[output_table_path] | Yes | Output table. | |
spec[cluster_name] | Yes | Cluster with the source tables. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Value: ID of the running operation.
Example:
PARAMETERS {
"spec" = {
"input_table_paths" = [ "//tmp/table_in"; "//tmp/table_in" ] ;
"output_table_path" = "//tmp/table_out" ;
"cluster_name" = "<cluster-name>";
}
}
OUTPUT "37878b-ba919c15-cdc97f3a-8a983ece"
Working with operations
Note
All the commands used to work with the operations are non-transactional.
list_operations
Attention
This command can create significant load on the cluster. Do not use it in your workflow without prior approval from the administrator.
Command properties: Non-mutating, Light.
Semantics:
- Get a list of operations that match the filters.
Parameters:
Parameter | Type | Required | Default value | Description |
---|---|---|---|---|
attributes |
list<string> |
No | Null |
List of operation attributes that need to be returned in the response. |
from_time |
ISO 8601 string |
No | Null |
Bottom limit for the time interval for operation selection (by the time when the operation began). |
to_time |
ISO 8601 string |
No | Null |
Top limit for the time interval for operation selection (by the time when the operation began). |
cursor_time |
ISO 8601 string |
No | Null |
Time beginning from which a list of operations should be returned. |
cursor_direction |
{past, future} |
No | past |
Time direction used to list the operations. |
user |
string |
No | Null |
Username to filter for. |
state |
string |
No | Null |
Operation state used to filter data. |
type |
string |
No | Null |
Operation type used to filter data. |
filter |
string |
No | Null |
Substring that the operation's filter_factors should include. |
pool |
string |
No | Null |
Pool used for filtering. |
with_failed_jobs |
bool |
No | Null |
Return only the operations that have jobs with the failed status. |
access |
map |
No | Null |
Dictionary with the mandatory fields subject (a string) and permissions (a list of strings) that set a filter by access rights. If specified, only the operations for which the subject has every permission in the permissions list are returned. |
include_archive |
bool |
No | false |
Whether to request operations from the archive. |
include_counters |
bool |
No | true |
Whether to return statistics for the requested operations. |
limit |
int |
No | 100 |
Maximum number of operations to return in the response. |
enable_ui_mode |
bool |
No | false |
Whether to return the response in the old UI-compatible format. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Returns a dictionary with the following fields:
  - operations: List with explicit descriptions of operations. Each operation is described by a dictionary that includes the selected operation attributes: id, type, state, authenticated_user, brief_progress, brief_spec, start_time, suspended, weight. The weight, brief_progress, and brief_spec attributes are optional.
  - incomplete: Whether the list of operations is incomplete (that is, whether some operations in the from_time–to_time range are missing from the response).
  - pool_counts: Statistics on pools.
  - user_counts: Statistics on users.
  - state_counts: Statistics on operation states.
  - type_counts: Statistics on operation types.
  - failed_jobs_count: Count of failed jobs for the operations.
Example:
PARAMETERS { }
OUTPUT {
"operations" = [
{
"id" = "7001208d-fef089b3-3fe03e8-453d99a1";
"type" = "remote-copy";
"state" = "initializing";
"authenticated_user" = "user-name";
"brief_progress" = {};
"brief_spec" = {
...
};
"start_time" = "2018-02-06T11:06:34.200591Z";
"suspended" = %false;
"weight" = 1.;
};
];
"incomplete" = %true;
"pool_counts" = {
"pool-counts-example" = 2;
"user-name-1" = 2;
...
};
"user_counts" = {
"yql" = 52;
"user-name-1" = 2;
};
"state_counts" = {
"materializing" = 10;
"pending" = 763;
"running" = 1848;
"completed" = 6654;
"aborted" = 37;
"failed" = 98;
};
"type_counts" = {
"map" = 4294;
"merge" = 1337;
"erase" = 97;
"sort" = 1126;
"reduce" = 886;
"map_reduce" = 1609;
"remote-copy" = 24;
};
"failed_jobs_count" = 109;
}
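When the response has incomplete set to true, a client can continue from where the previous page ended by moving cursor_time. The Python sketch below pages backwards in time. It rests on several assumptions that are not stated in this section: that the cursor is exclusive, that start_time strings share one ISO 8601 layout and therefore compare correctly as text, and that list_operations is a callable you supply. A hard cap on the number of pages is included because the command is expensive.

```python
from typing import Callable, Iterator, Optional


def iterate_operations(
    list_operations: Callable[..., dict],
    limit: int = 100,
    max_pages: int = 10,
    **filters,
) -> Iterator[dict]:
    """Yield operations page by page, moving cursor_time into the past (sketch)."""
    cursor_time: Optional[str] = None
    for _ in range(max_pages):  # hard cap, since each request loads the cluster
        params = dict(filters, limit=limit, cursor_direction="past")
        if cursor_time is not None:
            params["cursor_time"] = cursor_time
        response = list_operations(**params)
        operations = response["operations"]
        yield from operations
        if not operations or not response.get("incomplete"):
            return
        # Assumption: the cursor is exclusive, so the oldest start_time seen
        # so far is a safe starting point for the next page.
        cursor_time = min(op["start_time"] for op in operations)
```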
get_operation
Command properties: Non-mutating, Light.
Semantics:
- Get information about an operation.
Parameters:
Parameter | Type | Required | Default value | Description |
---|---|---|---|---|
operation_id |
GUID |
Yes | Operation ID. | |
attributes |
list |
No | [] |
Operation attributes. |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. - Returns a dictionary with the requested operation attributes.
Field | Type | Description |
---|---|---|
id |
string |
String representation of the operation ID (for example, d840bb39-3d893e5b-3fe03e8-f009b1fb ). |
type |
string |
Operation type (for example, map , reduce , map_reduce ). |
state |
string |
Current operation state (for example, completed , running ,failed ). |
authenticated_user |
string |
User that ran the operation. |
start_time |
ISO 8601 string |
Start time. |
finish_time |
ISO 8601 string |
End time (if the operation completed). |
brief_progress |
map |
Summarized statistics on jobs. |
progress |
map |
Full statistics on jobs. |
brief_spec |
map |
Part of the specification with light fields. |
spec |
map |
Specification given by the user at the beginning of the operation. |
full_spec |
map |
Specification where all the fields omitted by the user are populated by defaults. |
unrecognized_spec |
map |
Specification fields that were entered by the user but not recognized by the scheduler. |
controller_agent_address |
string |
Address of the controller agent (host:port) responsible for the operation. |
events |
list<map> |
List of events (state changes) that occurred with the operation. |
alerts |
map |
Alerts (in the format of a dictionary <alert_name> : <map_with_attributes> ) set up for the operation at the moment. |
result |
map |
A map with an error field that can include an error if the operation failed. |
committed |
bool |
Whether the operation results were committed. |
suspended |
bool |
Whether the operation is currently suspended. |
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4"; attributes = [ "state" ] }
OUTPUT {
"state" = "running";
}
abort_operation
Command properties: Mutating, Light.
Semantics:
- Abort the operation.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4" }
complete_operation
Command properties: Mutating, Light.
Semantics:
- Instantly complete the operation, saving the currently calculated result.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4" }
suspend_operation
Command properties: Mutating, Light.
Semantics:
- Suspend the operation.
- This command is executed instantly. After that, no more jobs are started for the operation, and the running jobs are optionally aborted.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID. | |
abort_running_jobs |
No | false |
Whether to abort the running operation jobs. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4" }
resume_operation
Command properties: Mutating, Light.
Semantics:
- Resume the suspended operation.
- This command is executed instantly.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4" }
update_operation_parameters
Command properties: Mutating, Light.
Semantics:
- Update the parameters of a running operation.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID. | |
parameters |
Yes | Dictionary with the operation parameters. | |
parameters[owners] | No | (deprecated, will be removed) List of new owners of the operation. | |
parameters[acl] | No | New ACL of the operation (it overlaps the base ACL). | |
parameters[pool] | No | Name of the pool to which the operation has to be switched over in all its trees. | |
parameters[weight] | No | New weight of the operation in all the trees. | |
parameters[scheduling_options_per_pool_tree] | No | Dictionary {tree name: scheduler settings for this tree} . The settings are described below. To learn more about the scheduler, see Scheduler and pools. |
|
parameters[options_per_job_shell] | No | Dictionary {Job Shell name: settings for this Job Shell} . The settings are described below. |
Scheduler settings for the tree:
Parameter | Required | Default value | Description |
---|---|---|---|
weight |
No | Weight of the operation in the tree. | |
pool |
No | Pool of the operation in the tree. | |
resource_limits |
No | Dictionary { resource name: limit} . Resource limits in the tree. |
Job Shell settings:
Parameter | Required | Default value | Description |
---|---|---|---|
owners |
No | Subjects (users or groups) who will have access to the Job Shell. |
Input data:
- Type:
null
.
Output data:
- Type:
null
.
Example:
PARAMETERS {"operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4"; "parameters" = {"pool" = "username"; "scheduling_options_per_pool_tree" = {"default" = {"weight" = 2; "resource_limits" = { "user_slots" = 1; "cpu" = 0.5; "network" = 10; "memory" = 1000000000}}}}}
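The nested parameters dictionary is easier to assemble programmatically than to write by hand. The following Python sketch builds it and rejects per-tree options that are not listed in the scheduler settings table above. The function build_update_parameters and the ALLOWED_TREE_OPTIONS constant are hypothetical names introduced for illustration.

```python
# Per-tree options taken from the "Scheduler settings for the tree" table.
ALLOWED_TREE_OPTIONS = {"weight", "pool", "resource_limits"}


def build_update_parameters(pool=None, weight=None, per_tree=None) -> dict:
    """Assemble the parameters dictionary for update_operation_parameters (sketch)."""
    parameters = {}
    if pool is not None:
        parameters["pool"] = pool
    if weight is not None:
        parameters["weight"] = weight
    if per_tree:
        for tree, options in per_tree.items():
            unknown = set(options) - ALLOWED_TREE_OPTIONS
            if unknown:
                raise ValueError(f"unknown options for tree {tree!r}: {sorted(unknown)}")
        # Copy the per-tree settings so the caller's dictionaries stay untouched.
        parameters["scheduling_options_per_pool_tree"] = {
            tree: dict(options) for tree, options in per_tree.items()
        }
    return parameters


parameters = build_update_parameters(
    pool="username",
    per_tree={"default": {"weight": 2, "resource_limits": {"user_slots": 1, "cpu": 0.5}}},
)
```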
Working with jobs
Note
All the commands used to work with jobs are non-transactional.
get_job
Command properties: Non-mutating, Light.
Semantics:
- Get information about a job. The command works for both running and completed jobs (if information about the jobs was saved).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
operation_id |
Yes | Operation ID | |
job_id |
Yes | Job ID |
Input data:
- Type:
null
.
Output data:
- Type:
structured
. Map with the job's attributes.
Field | Type | Description |
---|---|---|
job_id |
string |
String representation of the job ID (for example, d840bb39-3d893e5b-3fe03e8-f009b1fb ). |
operation_id |
string |
String representation of the operation ID (for example, d840bb39-3d893e5b-3fe03e8-f009b1fb ). |
type |
string |
Job type (for example, vanilla , map , partition_reduce ). |
state |
string |
Current job state (for example, running , failed , completed ). |
address |
string |
Address of the node (host:port) that ran the job. |
task_name |
string |
Name of the task that the job corresponds to. |
start_time |
ISO 8601 string |
Start time. |
finish_time |
ISO 8601 string |
End time. |
pool |
string |
Name of the pool where the job was run. |
pool_tree |
string |
Name of the pool tree where the job was run. |
progress |
float in [0,1] |
Estimate of the fraction of work that the job has completed so far. |
stderr_size |
integer |
Size of the saved stderr of the job (you can get stderr using the get_job_stderr command). |
error |
map |
Dictionary with an error description (for a failed job). |
statistics |
map |
Dictionary with the job's statistics. |
brief_statistics |
map |
Dictionary with brief statistics. |
input_paths |
list<YPath> |
List of paths to tables (with row ranges) processed by the job. |
core_infos |
list<map> |
List of dictionaries describing the core dumps saved by the job. |
events |
list<map> |
List of dictionaries describing events (changes in state or phase) that occurred to the job. |
is_stale |
bool |
Whether the information about the job is outdated (if %true , some fields might be out of date). Information about the job is considered outdated if it hasn't been updated for a long time. The information in the job archive is updated by the node running the job and by the operation controller. The update process is asynchronous. If the node and the controller restart at the same time for some reason (for example, as a result of an update), the information about the final job state (completed , failed , or aborted ) might not reach the archive, so the job is always returned as stale. Despite the running status, such jobs have most likely stopped running long ago and should be ignored. |
Example:
PARAMETERS { "operation_id" = "e13c5406-e5dd6f5d-3fe03e8-fe05f0d3"; "job_id" = "f11ae559-a0375703-3fe0384-8f1"}
OUTPUT {
"operation_id" = "e13c5406-e5dd6f5d-3fe03e8-fe05f0d3";
"job_id" = "f11ae559-a0375703-3fe0384-8f1";
"state" = "completed";
"start_time" = "2018-02-06T09:37:02.858492Z";
"finish_time" = "2018-02-06T09:42:19.185525Z";
"address" = "hostname.net:9012";
"statistics" = {
"data" = {
...
};
};
"events" = [
...
];
}
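Two small helpers show how a client might interpret this response: one computes the job duration from start_time and finish_time, and the other applies the advice about stale jobs from the is_stale description. Both function names are hypothetical, and the timestamp layout is assumed to match the example above (microsecond precision with a trailing Z).

```python
from datetime import datetime
from typing import Optional

# Assumed timestamp layout, taken from the example output.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def job_duration_seconds(job: dict) -> Optional[float]:
    """Return the job duration in seconds, or None if a timestamp is missing."""
    if "start_time" not in job or "finish_time" not in job:
        return None
    start = datetime.strptime(job["start_time"], TIME_FORMAT)
    finish = datetime.strptime(job["finish_time"], TIME_FORMAT)
    return (finish - start).total_seconds()


def is_probably_dead(job: dict) -> bool:
    """True for a job reported as running whose record is marked stale."""
    return bool(job.get("is_stale")) and job.get("state") == "running"


job = {
    "state": "completed",
    "start_time": "2018-02-06T09:37:02.858492Z",
    "finish_time": "2018-02-06T09:42:19.185525Z",
}
duration = job_duration_seconds(job)
```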
list_jobs
Attention
This command can create significant load on the cluster. Do not use it in your workflow without prior approval from the administrator.
Command properties: Non-mutating, Light.
Semantics:
- Get all the jobs that match the filters for a given operation.
Parameters:
Parameter | Type | Required | Default value | Description |
---|---|---|---|---|
`operation_id` | GUID | Yes | | Operation ID. |
`type` (`job_type`) | EJobType | No | Null | If specified, the response only includes jobs with the specified `job_type`. |
`state` (`job_state`) | EJobState | No | Null | If specified, the response only includes jobs with the specified `job_state`. |
`address` | string | No | Null | If specified, the response only includes jobs whose address starts with `address`. |
`with_stderr` | bool | No | Null | If Null, all jobs are returned. If True, only jobs with a non-null stderr are returned. If False, only jobs with a null stderr are returned. |
`with_fail_context` | bool | No | Null | If Null, all jobs are returned. If True, only jobs that have a saved `fail_context` are returned. If False, only jobs that do not have a `fail_context` are returned. |
`with_spec` | bool | No | Null | If Null, all jobs are returned. If True, only jobs that have a saved specification are returned. If False, only jobs that do not have a specification are returned. |
`with_competitors` | bool | No | Null | If Null, all jobs are returned. If True, only jobs for which speculative copies were run are returned, together with those copies. If False, only jobs that do not have speculative copies are returned. |
`job_competition_id` | GUID | No | Null | If specified, the response includes the job with this `job_competition_id` and all of its speculative copies (if any). |
`with_monitoring_descriptor` | bool | No | Null | If Null, all jobs are returned. If True, only jobs that have a `monitoring_descriptor` are returned. If False, only jobs that do not have a `monitoring_descriptor` are returned. |
`task_name` | string | No | Null | If specified, the response only includes jobs with the specified `task_name`. |
`sort_field` | {none,type,state,start_time,finish_time,address,duration,progress,id} | No | none | Field to sort by. |
`sort_order` | {ascending,descending} | No | ascending | Sorting order. |
`limit` | int | No | 1000 | Limit on the number of returned jobs. |
`offset` | int | No | 0 | Offset by the given number of jobs. |
`data_source` | EDataSource | No | auto | Data source. Acceptable values: `runtime`, `archive`, and `auto`. |
The `job_type`, `job_state`, `address`, `with_stderr`, `with_fail_context`, `with_competitors`, `with_spec`, and `with_monitoring_descriptor` parameters define the job filter. The response will only include the jobs that meet the filtering criteria.
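To make the filter semantics concrete, here is a small client-side emulation of a few of these filters over plain dictionaries. It is only a sketch: the `matches` helper is hypothetical, the real filtering happens on the cluster, and "non-null stderr" is assumed here to mean a positive `stderr_size`.

```python
from typing import Optional


def matches(job: dict, *, job_type: Optional[str] = None,
            job_state: Optional[str] = None, address: Optional[str] = None,
            with_stderr: Optional[bool] = None) -> bool:
    """Emulate a subset of the list_jobs filters for a single job description."""
    if job_type is not None and job.get("type") != job_type:
        return False
    if job_state is not None and job.get("state") != job_state:
        return False
    # The address filter is a prefix match, not an exact match.
    if address is not None and not job.get("address", "").startswith(address):
        return False
    # Tri-state flag: None keeps every job, True/False keep jobs with/without stderr.
    if with_stderr is not None:
        has_stderr = job.get("stderr_size", 0) > 0  # assumption, see lead-in
        if has_stderr != with_stderr:
            return False
    return True


# Made-up sample jobs for illustration.
jobs = [
    {"id": "a", "type": "map", "state": "failed",
     "address": "host1.net:9012", "stderr_size": 120},
    {"id": "b", "type": "map", "state": "running",
     "address": "host2.net:9012"},
]

failed_with_stderr = [j["id"] for j in jobs
                      if matches(j, job_state="failed", with_stderr=True)]
on_host2 = [j["id"] for j in jobs if matches(j, address="host2")]
```

All conditions are combined with a logical AND, so a job is kept only if it passes every filter that was specified.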
The `sort_field` and `sort_order` parameters define the order of jobs in the response. The `limit` and `offset` parameters then define the slice (subset) of jobs in the response: the first `offset` jobs are skipped, and then the next `limit` jobs are selected.
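The ordering and slicing rules can be sketched as a local function over a list of job dictionaries. This is an illustration of the described semantics only; `select_page` is a hypothetical helper, and the sample jobs are made up.

```python
def select_page(jobs, sort_field="none", sort_order="ascending",
                limit=1000, offset=0):
    """Order the jobs, skip the first `offset` of them, then take up to `limit`."""
    ordered = list(jobs)
    if sort_field != "none":
        ordered.sort(key=lambda job: job[sort_field],
                     reverse=(sort_order == "descending"))
    return ordered[offset:offset + limit]


# Made-up jobs; ISO 8601 timestamps in the same zone sort correctly as strings.
jobs = [
    {"id": "j3", "start_time": "2018-05-07T13:04:03Z"},
    {"id": "j1", "start_time": "2018-05-05T00:41:27Z"},
    {"id": "j2", "start_time": "2018-05-06T08:00:00Z"},
]

second_oldest = select_page(jobs, sort_field="start_time", limit=1, offset=1)
newest_two = select_page(jobs, sort_field="start_time",
                         sort_order="descending", limit=2)
```

Walking through a long job list page by page then amounts to repeating the call with `offset` increased by `limit` each time, keeping the sort parameters fixed.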
The `data_source` parameter controls the source from which the jobs are taken:
- In the `runtime` mode, jobs are retrieved from the controller agent and Cypress and then merged.
- In the `archive` mode, jobs are retrieved from the archive and the controller agent and then merged.
- The `auto` mode automatically determines the source of jobs based on whether the operation is available in the controller agent.
Input data:
- Type: `null`.
Output data:
- Type: `structured`.
- Structure with the fields `jobs`, `cypress_job_count`, `controller_agent_job_count`, and `archive_job_count`.
- `*_count` fields: Counters of the jobs found for the given operation in the corresponding data sources, without filtering. If all three numbers are zero, there's no information about the operation's jobs. If the numbers are non-zero but the `jobs` list is empty, all the jobs were filtered out. If you get `null` instead of a number, the corresponding data source wasn't polled.
- `jobs` field: List of structures that describe each job. Each job can have the following fields:
  - `id` (`guid`), `type` (`string`), `state` (`string`), `address` (`string`): Required fields.
  - `start_time` (`instant`), `finish_time` (`instant`), `progress` (`double`), `stderr_size` (`int`): Optional fields.
  - `error`, `brief_statistics`, `input_paths`, `core_infos`: Optional fields.
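The rules for reading the counters can be sketched as a small helper over a decoded response. This is an illustration under assumptions: `describe_result` is a hypothetical name, `None` stands for a `null` counter, and the handling of mixed cases (some sources not polled, others zero) is one possible interpretation rather than documented behavior.

```python
COUNTERS = ("cypress_job_count", "controller_agent_job_count", "archive_job_count")


def describe_result(response: dict) -> str:
    """Explain a list_jobs response using its per-source job counters."""
    # Keep only the counters of the data sources that were actually polled.
    polled = [response[name] for name in COUNTERS
              if response.get(name) is not None]
    if not polled:
        return "no data source was polled"
    if all(count == 0 for count in polled):
        return "no information about the operation's jobs"
    if not response.get("jobs"):
        return "all jobs were filtered out"
    return "jobs returned"


nothing_known = describe_result({
    "jobs": [], "cypress_job_count": 0,
    "controller_agent_job_count": 0, "archive_job_count": 0,
})
filtered_out = describe_result({
    "jobs": [], "cypress_job_count": 200,
    "controller_agent_job_count": 208, "archive_job_count": None,
})
```

Distinguishing these cases is useful when an empty `jobs` list is returned: it tells you whether to relax the filters or to look at a different data source.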
Example:
PARAMETERS { "operation_id" = "4505e8eb-28fa88e2-3fe03e8-c6fcd8fa"; }
OUTPUT {
"jobs" = [
{
"id" = "55aff293-7ef14284-3fe0384-3e07";
"type" = "map";
"state" = "failed";
"address" = "hostname.net:9012";
"start_time" = "2018-05-05T00:41:27.433832Z";
"finish_time" = "2018-05-05T00:49:04.288196Z";
"fail_context_size" = 973230u;
"error" = {
"code" = 1205;
"message" = "User job failed";
...
};
...
};
...
{
"id" = "69ae20a7-887b25ab-3fe0384-3cff";
"type" = "map";
"state" = "running";
"address" = "hostname.net:9012";
"start_time" = "2018-05-07T13:04:03.339873Z";
"progress" = 0.;
"brief_statistics" = <
"timestamp" = "2018-05-07T13:04:08.431740Z";
> {
"processed_input_compressed_data_size" = 0;
"processed_input_data_weight" = 0;
"processed_output_uncompressed_data_size" = 0;
"processed_output_compressed_data_size" = 0;
"processed_input_uncompressed_data_size" = 0;
"processed_input_row_count" = 0;
};
};
];
"cypress_job_count" = 200;
"scheduler_job_count" = 208;
"archive_job_count" = #;
"type_counts" = {
"map" = 408;
};
"state_counts" = {
"running" = 208;
"failed" = 200;
};
}
abandon_job
Command properties: Mutating, Light.
Semantics:
- Abort the job and consider that its input data was processed successfully.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`job_id` | Yes | | Job ID. |
Input data:
- Type: `null`.
Output data:
- Type: `null`.
Example:
PARAMETERS { "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff" }
abort_job
Command properties: Mutating, Light.
Semantics:
- Abort the job and let the scheduler restart it later (like any other aborted job of an operation).
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`job_id` | Yes | | Job ID. |
`interrupt_timeout` | No | | A timeout for successful completion of the job after it stops receiving input data. |
Input data:
- Type: `null`.
Output data:
- Type: `null`.
Example:
PARAMETERS { "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff" }
dump_job_context
Command properties: Non-mutating, Light.
Semantics:
- Get the input context of the job.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`job_id` | Yes | | Job ID. |
`path` | Yes | | A path where to save the set of input parameters received by the job. The path's components should exist (except for the file itself). |
Input data:
- Type: `null`.
Output data:
- Type: `null`.
Example:
PARAMETERS { "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff"; "path" = "//tmp/input_context" }
get_job_input
Command properties: Non-mutating, Heavy.
Semantics:
- Get the full input for the job. The command works for both running and failed jobs, provided that the job specification has been saved and all the input data is available.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`job_id` | Yes | | Job ID. |
Input data:
- Type: `null`.
Output data:
- Type: `binary`.
- Value: Job input.
Example:
PARAMETERS { "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff"}
get_job_fail_context
Command properties: Non-mutating, Heavy.
Semantics:
- Get the fail context of the job. The command is used with failed jobs.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`operation_id` | Yes | | Operation ID. |
`job_id` | Yes | | Job ID. |
Input data:
- Type: `null`.
Output data:
- Type: `binary`.
- Value: Fail context of the job.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4"; "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff"}
get_job_stderr
Command properties: Non-mutating, Heavy.
Semantics:
- Get the stderr of the job. The command is used with failed jobs.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`operation_id` | Yes | | Operation ID. |
`job_id` | Yes | | Job ID. |
`offset` | No | | Offset from the beginning in bytes. |
`limit` | No | | Maximum size in bytes. |
Input data:
- Type: `null`.
Output data:
- Type: `binary`.
- Value: The stderr of the job.
Example:
PARAMETERS { "operation_id" = "33ab3f-bf1df917-b35fe9ed-c70a4bf4"; "job_id" = "1225d-1f2fb8c4-f1075d39-5fb7cdff"; "offset" = 500; "limit" = 100 }
OUTPUT {
"total_size" = 1000;
"end_offset" = 600;
}
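The `offset` and `limit` parameters make it possible to read a large stderr in chunks. The sketch below shows one way to do that. It assumes, based only on the example above, that `total_size` is the full stderr size and `end_offset` is the position right after the returned chunk. The `fetch` callable is a hypothetical wrapper around `get_job_stderr`, and `fake_fetch` is an in-memory stand-in used to keep the sketch runnable.

```python
from typing import Callable, Tuple

# fetch(offset, limit) -> (chunk, total_size, end_offset)
Fetch = Callable[[int, int], Tuple[bytes, int, int]]


def read_all_stderr(fetch: Fetch, chunk_size: int = 100) -> bytes:
    """Read the whole stderr by advancing the offset until the end is reached."""
    chunks = []
    offset = 0
    while True:
        data, total_size, end_offset = fetch(offset, chunk_size)
        chunks.append(data)
        # Stop at the end of the stream, or if no progress was made.
        if end_offset >= total_size or end_offset <= offset:
            break
        offset = end_offset
    return b"".join(chunks)


# In-memory stand-in for the real command (illustration only).
STDERR = bytes(range(256)) * 4  # 1024 bytes of sample data


def fake_fetch(offset: int, limit: int) -> Tuple[bytes, int, int]:
    data = STDERR[offset:offset + limit]
    return data, len(STDERR), offset + len(data)


full = read_all_stderr(fake_fetch)
```

The extra "no progress" check guards against an endless loop if a call returns an empty chunk before the reported end of the stream.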
Other
parse_ypath
Command properties: Non-mutating, Light.
Semantics:
- Parse the passed YPath, putting all IDs of the complex YPath into attributes.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`path` | Yes | | The path that needs to be parsed. |
Input data:
- Type: `null`.
Output data:
- Type: `structured`.
Example:
PARAMETERS { "path" = "//tmp/table[#1:#2]" }
OUTPUT { "path" = "<ranges=[{lower_limit={row_index=1};upper_limit={row_index=2}}]>//tmp/table" }
execute_batch
Command properties: Mutating if the set includes mutating commands, Light.
Semantics:
- Use a single query to execute the set of commands passed in the parameters.
- The commands can (and will) be executed in parallel. This means that if a set includes both a write to and a read from the same node, the read can return either the old value or the updated one.
- The set can only include light commands.
- The set can only include commands with the input type of `null` or `structured`.
- The set can only include commands with the output type of `null` or `structured`.
Parameters:
Parameter | Required | Default value | Description |
---|---|---|---|
`requests` | Yes | | Description of executed queries. |
`concurrency` | No | 50 | A numeric parameter that sets the maximum number of commands running on the cluster in parallel. Use this parameter to avoid exhausting your request rate limit. |
Input data:
- Type: `structured`.
- The queries to execute are listed in the `requests` parameter, which is a list.
- Each element in this list is a dictionary with the following fields:
Parameter | Required | Default value | Description |
---|---|---|---|
`command` | Yes | | Command name. |
`parameters` | Yes | | Dictionary with the command parameters. |
`input` | No | | Input for the query (for the commands with the `structured` input type, for example, `set`). |
Output data:
- Type: `structured`.
- The output is a list of the same length as the input `requests` list.
- Each list item describes the result of a single query execution. It is a dictionary with the following fields:
Parameter | Required | Description |
---|---|---|
`error` | No | Error that arose during the query execution (if any). |
`output` | No | Output of the query (for successful commands with the `structured` output type, for example, `get`). |
Example:
PARAMETERS {
"requests" = [
{
"command" = "set";
"parameters" = {"path" = "//tmp/a"};
"input" = "value_a";
};
{
"command" = "get";
"parameters" = {"path" = "//tmp/b"};
};
{
"command" = "get";
"parameters" = {"path" = "//nonexisting"};
};
];
}
OUTPUT [
{ };
{ output = "value_b"; };
{ error = {...} };
]
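Because the output list has the same length and order as `requests`, results can be matched to requests by position. The sketch below builds a request list shaped like the example above and pairs it with a response. Both helpers are hypothetical, no client library is called, and the error body in the sample response is a made-up placeholder, since the example above elides it.

```python
def make_request(command, parameters, input_value=None):
    """Build one element of the `requests` list."""
    request = {"command": command, "parameters": parameters}
    if input_value is not None:
        request["input"] = input_value
    return request


def pair_results(requests, results):
    """Match each request with its result by position."""
    if len(requests) != len(results):
        raise ValueError("execute_batch returns one result per request")
    paired = []
    for request, result in zip(requests, results):
        if "error" in result:
            paired.append((request["command"], "error", result["error"]))
        else:
            # `output` is absent for commands that return no data, such as set.
            paired.append((request["command"], "ok", result.get("output")))
    return paired


requests = [
    make_request("set", {"path": "//tmp/a"}, "value_a"),
    make_request("get", {"path": "//tmp/b"}),
    make_request("get", {"path": "//nonexisting"}),
]

# Sample response; the error content is a placeholder, not real server output.
results = [{}, {"output": "value_b"}, {"error": {"message": "placeholder"}}]

summary = pair_results(requests, results)
```

Since the commands in a batch may run in parallel, each result should be checked individually: one failed query does not imply that the others failed.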
get_supported_features
Command properties: Non-mutating, Light.
Semantics:
- Returns a dictionary with the `features` field describing elementary data types, compression codecs, erasure codecs, and other features supported by the cluster.
Parameters:
- None.
Input data:
- Type: `null`.
Output data:
- Type: `structured`. The `features` key in the response includes a dictionary with the following fields.
Key | Value type | Description |
---|---|---|
primitive_types | list of strings | Primitive types. |
erasure_codecs | list of strings | Erasure codecs. |
compression_codecs | list of strings | Compression codecs. |
Example:
PARAMETERS { }
OUTPUT {
"features" = {
"primitive_types" = ["int8"; "int16"; ... ];
"erasure_codecs" = ["lrc_12_2_2"; "reed_solomon_6_3"; ... ];
"compression_codecs" = ["none"; "snappy"; "brotli_1"; ... ];
};
}
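A typical use of this response is to check whether a codec or type is available before relying on it. The following is a minimal sketch over a decoded response. The `supports` helper is hypothetical, and the sample dictionary contains only the values visible in the example above, not a complete list.

```python
def supports(response: dict, kind: str, name: str) -> bool:
    """Check whether `name` is listed under features[kind] in the response."""
    return name in response.get("features", {}).get(kind, [])


# Only the values shown in the example above; real lists are longer.
response = {
    "features": {
        "primitive_types": ["int8", "int16"],
        "erasure_codecs": ["lrc_12_2_2", "reed_solomon_6_3"],
        "compression_codecs": ["none", "snappy", "brotli_1"],
    }
}

has_snappy = supports(response, "compression_codecs", "snappy")
has_unknown = supports(response, "compression_codecs", "made_up_codec")
```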
generate_timestamp
Command properties: Non-mutating, Light.
Semantics:
- Generates a monotonic timestamp.
Parameters:
- None.
Input data:
- Type: `null`.
Output data:
- Type: `structured`. The `timestamp` key in the response stores a `uint64` value.
Example:
PARAMETERS { }
OUTPUT {
"timestamp" = 1723665447133469427u;
}