FAQ
Q: When working with dynamic tables via the Python or C++ API, I get the "Sticky transaction 1935-3cb03-4040001-e8723e5a is not found" error. What should I do?
A: The answer depends on how transactions are used. If the query is issued under a master transaction, that is pointless, and you should run the query outside the transaction. To do this, either create a separate client or explicitly specify that the query must run under a null transaction (`with client.Transaction(transaction_id="0-0-0-0"): ...`).
Full use of tablet transactions in the Python API is only possible via the RPC proxy (`yt.config['backend'] = 'rpc'`). Tablet transactions over HTTP are currently not supported.
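Below is a minimal sketch of both options, assuming a `yt.wrapper` client with the RPC backend; the cluster address and table path are placeholders.

```python
import yt.wrapper as yt

# Tablet transactions require the RPC backend (see above); the proxy name is a placeholder.
client = yt.YtClient(proxy="<cluster-name>", config={"backend": "rpc"})

# Option 1: use a dedicated client so the query never runs inside a master transaction.
rows = list(client.select_rows("* FROM [//path/to/table] LIMIT 10"))

# Option 2: explicitly run the query under the null transaction.
with client.Transaction(transaction_id="0-0-0-0"):
    client.insert_rows("//path/to/table", [{"key": 1, "value": "a"}])
```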
Q: When writing to a dynamic table, the "Node is out of tablet memory; all writes disabled" or "Active store is overflown, all writes disabled" error occurs. What does it mean and how should I deal with it?
A: The error occurs when the cluster node runs out of memory for data not yet written to disk: the incoming data stream is too large and the cluster node cannot compress and write the data to disk fast enough. Queries that fail with such errors must be retried, preferably with an increasing delay. If the error recurs, it may mean (barring off-nominal situations) that individual tablets are overloaded with writes or that the cluster's capacity is insufficient for the load. Increasing the number of tablets can also help (the `reshard-table` command).
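A sketch of the retry-with-increasing-delay approach mentioned above, assuming a `yt.wrapper` client; the exception class, table path, and retry parameters are illustrative assumptions.

```python
import time

from yt.wrapper.errors import YtResponseError


def insert_with_backoff(client, path, rows, attempts=5, initial_delay=1.0):
    """Retry a dynamic-table write with an increasing delay between attempts."""
    delay = initial_delay
    for attempt in range(attempts):
        try:
            client.insert_rows(path, rows)
            return
        except YtResponseError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2  # back off before retrying
```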
Q: What does "Too many overlapping stores" mean? What should I do?
A: This error indicates that the dynamic stores covering the key range served by the tablet overlap too densely. Dense coverage with stores degrades read performance, so a protection mechanism is activated that prevents new data from being written. The background compaction and partitioning processes should gradually normalize the tablet structure. If this does not happen, the cluster may be failing to cope with the load.
Q: When querying a dynamic table, I get the "Maximum block size limit violated" error
A: The query involves a dynamic table that was once converted from a static table without specifying the `block_size` parameter. If you receive an error like this, make sure you follow all the instructions from the section about converting a static table into a dynamic table. If the block size is large, increase `max_unversioned_block_size` to 32 MB and remount the table. This can happen if the table's cells store large binary values that are stored in a single block in their entirety.
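A sketch of the attribute change via the CLI; the table path is a placeholder, and 32 MB is given in bytes as recommended above.

```bash
yt set //path/to/table/@max_unversioned_block_size 33554432  # 32 MB
yt remount-table //path/to/table
```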
Q: When querying a dynamic table, I get the "Too many overlapping stores in tablet" error
A: Most likely, the tablet can't cope with the write flow and new chunks don't get compacted in time. Check that the table is sharded into a sufficient number of tablets. When writing data to an empty table, disable auto-sharding, because otherwise small tablets will be combined into one.
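A sketch of disabling auto-sharding and resharding via the CLI, assuming auto-sharding is controlled by the per-table `@tablet_balancer_config` attribute; the table path and tablet count are placeholders.

```bash
# Disable auto-sharding for this table.
yt set //path/to/table/@tablet_balancer_config/enable_auto_reshard %false

# Reshard into more tablets (the table must be unmounted for resharding).
yt unmount-table //path/to/table --sync
yt reshard-table //path/to/table --tablet-count 50
yt mount-table //path/to/table --sync
```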
Q: When querying a dynamic table, I get the "Active store is overflown, all writes disabled" error
A: The tablet can't cope with the write flow: it doesn't have time to flush the data to disk, or it can't do so for some reason. Check the `@tablet_errors` table attribute for errors, and if there are none, check the sharding as described above.
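For example, the attribute can be inspected via the CLI (the table path is a placeholder):

```bash
yt get //path/to/table/@tablet_errors
```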
Q: When querying a dynamic table, I get the "Too many stores in tablet, all writes disabled" error
A: The tablet is too large. Reshard the table into a larger number of tablets. Note that auto-sharding limits the number of tablets to the number of cells multiplied by the value of the `tablet_balancer_config/tablet_to_cell_ratio` parameter.
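A sketch of inspecting the two factors of that limit via the CLI; the bundle name, table path, and the `@tablet_cell_count` attribute name are assumptions to verify on your cluster.

```bash
# Number of tablet cells available to the bundle (attribute name is an assumption).
yt get //sys/tablet_cell_bundles/<bundle>/@tablet_cell_count

# Per-table ratio used by auto-sharding.
yt get //path/to/table/@tablet_balancer_config/tablet_to_cell_ratio
```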
Q: When querying a dynamic table, I get the error "Tablet ... is not known"
A: The client sent a query to a cluster node that no longer serves the tablet. This usually happens as a result of automatic tablet balancing or cluster node restarts. Resend the query: the error will disappear once the cache is updated. Alternatively, disable balancing.
Q: When querying a dynamic table, I get the "Service is not known" error
A: The client sent a query to a cluster node that no longer hosts the tablet cell. This usually happens when cells are rebalanced. Resend the query: the error will disappear once the cache is updated.
Q: When querying a dynamic table, I get the "Chunk data is not preloaded yet" error
A: The message is specific to tables with the `in_memory_mode` parameter set to a value other than `none`. Such a table is kept entirely in memory while mounted, and reads are only possible once all of its data has been loaded into memory. If the table was recently mounted, a tablet was moved to a different cell, or the YTsaurus process restarted, the data is not in memory yet, which produces this error. Wait for the background process to load the data into memory.
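A sketch of tracking the preload progress via the CLI; the attribute names are assumptions to verify for your cluster version, and the table path is a placeholder.

```bash
yt get //path/to/table/@preload_state       # expected to become "complete"
yt get //path/to/table/@tablet_statistics   # check the preload store counters
```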
Q: In tablet_errors, I see the "Too many write timestamps in a versioned row" or "Too many delete timestamps in a versioned row" error
A: Sorted dynamic tables store many versions of the same value at the same time. In the lookup format, each key can have no more than 2^16 versions. A simple solution is to use the column format (`@optimize_for = scan`). In practice, such a large number of versions is rarely needed; it usually results from misconfiguration or a programming error. For example, with `atomicity=none` you can update the same table key with great frequency (in this mode there is no row locking, and transactions with overlapping time ranges can update the same key). This is not recommended. If writing a large number of versions stems from a genuine product need, such as frequent delta writes to aggregation columns, set the `@merge_rows_on_flush=%true` table attribute and correctly configure TTL deletion so that only a small number of actually needed versions are written to the chunk on flush.
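A sketch of that configuration via the CLI; the table path and TTL values are illustrative only and should be tuned for your case.

```bash
yt set //path/to/table/@merge_rows_on_flush %true
# Keep only the versions that are actually needed.
yt set //path/to/table/@min_data_versions 1
yt set //path/to/table/@max_data_versions 1
yt set //path/to/table/@min_data_ttl 0
yt set //path/to/table/@max_data_ttl 86400000  # one day, in milliseconds
yt remount-table //path/to/table
```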
Q: When running a SELECT query, I get the "Maximum expression depth exceeded" error
A: This error occurs when the expression tree is too deep. It usually happens when writing expressions like `FROM [...] WHERE (id1="a" AND id2="b") OR (id1="c" AND id2="d") OR ... <a few thousand conditions>`.
Instead, write them in the form `FROM [...] WHERE (id1, id2) IN (("a", "b"), ("c", "d"), ...)`.
The first variant has several problems. Queries are compiled into machine code, and the compiled code is cached with the query structure (excluding constants) as the key:
- In the first case, the query structure changes as the number of conditions grows, so code is generated anew for each query.
- The first query generates a very large amount of code.
- Compilation is very slow.
- The first case is executed by simply checking all the conditions one by one; no transformation into a more efficient algorithm is performed.
- Moreover, if the columns are key columns, read ranges are inferred from them so that only the relevant data is read. The read range inference algorithm works better with the IN variant.
- With IN, the condition check is a hash table lookup.
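A sketch of the recommended IN form from Python, assuming a `yt.wrapper` client; the cluster address, table path, and key columns `id1`/`id2` are placeholders (in real code, make sure the values are properly quoted or validated).

```python
import yt.wrapper as yt

client = yt.YtClient(proxy="<cluster-name>", config={"backend": "rpc"})

keys = [("a", "b"), ("c", "d")]
in_list = ", ".join('("{}", "{}")'.format(id1, id2) for id1, id2 in keys)
query = "* FROM [//path/to/table] WHERE (id1, id2) IN ({})".format(in_list)
rows = list(client.select_rows(query))
```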
Q: When working with a dynamic table, I get the "Value is too long" error
A: There are quite strict limits on the size of values in dynamic tables. A single value (table cell) must not exceed 16 MB, and an entire row must not exceed 128 MB, or 512 MB taking all versions into account. A row can contain at most 1024 values, counting all versions. There are also limits on the number of rows per query: by default, 100,000 rows per transaction for inserts, one million rows for selects, and 5 million for lookups. Note that operation may become unstable near these thresholds. If you exceed the limits, change the way data is stored in the dynamic table so that such long rows are not needed.
Q: How do I clear a replicated table using a CLI?
A: You can't clear a replicated table atomically, but you can clear individual replicas using the Erase operation. To do this, you need to enable bulk insert for the entire cluster by setting:
- `//sys/@config/tablet_manager/enable_bulk_insert` to `%true`
- `//sys/controller_agents/config/enable_bulk_insert_for_everyone` to `%true` (note the missing `@` symbol in this path)
If the last specified node doesn't exist, create it using the following command:
$ yt create document //sys/controller_agents/config --attributes '{value={}}'
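Once bulk insert is enabled, a single replica can be cleared with the Erase operation; a sketch via the CLI, where the replica table path is a placeholder:

```bash
yt erase //path/to/replica-table
```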
Q: When reading from a dynamic table, I get the "Timed out waiting on blocked row" error
A: This error occurs due to read-write conflicts: you are trying to read a key that is being written by a concurrent transaction. The system must wait for the outcome of that transaction to ensure the required level of isolation between transactions.
If you don't require strict consistency, you can weaken the isolation level for the reading transaction.
Q: When writing to a dynamic table, I get the "Row lock conflict due to concurrent ... write" error
A: This error occurs due to write conflicts – you try to write to a row that is locked by a concurrent transaction (this can be a write operation or an explicit lock). In some cases, the system may indicate the conflicting transaction. For example, for write-write conflicts, the error attributes include the transaction ID.
In general, to eliminate conflicts caused by writing to the same key from multiple locations, you need to revise your logic for writing to dynamic tables.
In some scenarios, weakening the guarantees may help: for example, you can use the `atomicity=none` mode or the shared write lock mode. Before weakening the guarantees, be sure to read the documentation to find out whether the side effects can impact you.
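A sketch of switching a table to `atomicity=none` via the CLI, assuming the attribute can only be changed while the table is unmounted; the table path is a placeholder, and the consequences described in the documentation should be weighed first.

```bash
yt unmount-table //path/to/table --sync
yt set //path/to/table/@atomicity none
yt mount-table //path/to/table --sync
```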