Python API

Note

Before you start, install the Python client from PyPI using pip:

pip install ytsaurus-client

What becomes available after installing the package:

  • The Python yt library.
  • The CLI binary yt.

Installation

YSON libraries

To work with tables in YSON format, you need the C++ bindings, which are distributed as a separate package. To install the YSON bindings:

pip install ytsaurus-yson

Attention!

It is currently impossible to install YSON bindings on Windows.

For Apple M1 platform users

YSON bindings are currently not built for the Apple M1 platform. As a temporary workaround, you can use Rosetta 2 and install an x86_64 build of Python.

Learn more here.

To learn more about YSON, see Formats.
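Before switching tables to YSON, it can help to check whether the bindings are importable in the current interpreter and which architecture it runs on (for example, to confirm that an x86_64 Python under Rosetta 2 is active). This is a minimal diagnostic sketch, not part of the library; it assumes the bindings install a top-level module named yt_yson_bindings, the name used in the FAQ below, and the helper name is hypothetical.

```python
import importlib.util
import platform


def yson_bindings_status():
    """Report the interpreter platform and whether the YSON bindings can be imported.

    Assumption: the bindings package provides a top-level module named
    yt_yson_bindings; adjust the name if your installation differs.
    """
    return {
        "system": platform.system(),    # e.g. "Linux", "Darwin", "Windows"
        "machine": platform.machine(),  # "x86_64" under Rosetta 2, "arm64" for native M1 Python
        "bindings_importable": importlib.util.find_spec("yt_yson_bindings") is not None,
    }


if __name__ == "__main__":
    print(yson_bindings_status())
```

If bindings_importable is False on a supported platform, reinstall the ytsaurus-yson package into the same interpreter.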

To find out the version of the installed Python wrapper, print the yt.VERSION variable or run the yt --version command.
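yt.VERSION reports the version of the wrapper that was actually imported. To compare it with what pip has installed (useful when diagnosing problems caused by mixed installation methods, described below), you can query package metadata with the standard importlib.metadata module. A sketch; the distribution names are taken from the pip commands above, and the helper name is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions=("ytsaurus-client", "ytsaurus-yson")):
    """Return {distribution name: installed version, or None if it is not installed}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```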

If you encounter a problem, check the FAQ section. If the problem persists, write to the chat.

Library source code.

Attention!

We do not recommend mixing installation methods for the library and its dependent packages. This can lead to problems that are difficult to diagnose.

User documentation

Help

The most up-to-date help on specific functions and their parameters is in the code.

To view a description of functions and classes in the interpreter, proceed as follows:

python
>>> import yt.wrapper as yt
>>> help(yt.run_sort)

Examples

FAQ

This section contains answers to a number of frequently asked questions about the Python API. Answers to other frequently asked questions are in the FAQ section.

Q: I installed the package from PyPI, but I get the yt: command not found error.
A: Try reinstalling the package with the command

pip install ytsaurus-client --force-reinstall

The output will most likely contain a warning like The script yt is installed in '...' which isn't on your PATH. To solve the problem, add the specified directory to the PATH environment variable by running:

echo 'export PATH="$PATH:<specified path>"' >> ~/.bashrc
source ~/.bashrc

Depending on your shell, the configuration file may have a different name. On Mac, it is most often ~/.zshrc.
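To check the fix from Python, you can test whether a directory is on PATH and where yt resolves. A sketch using only the standard library; the helper names are hypothetical. sysconfig.get_path("scripts") is only the interpreter's default scripts directory (pip install --user places scripts elsewhere), so prefer the directory quoted in pip's warning.

```python
import os
import shutil
import sysconfig


def _normalize(directory):
    # Compare paths in a canonical form: absolute, ~ expanded, case-normalized on Windows.
    return os.path.normcase(os.path.abspath(os.path.expanduser(directory)))


def dir_on_path(directory, path=None):
    """Return True if `directory` is one of the PATH entries (or of `path`, if given)."""
    path = os.environ.get("PATH", "") if path is None else path
    entries = {_normalize(p) for p in path.split(os.pathsep) if p}
    return _normalize(directory) in entries


if __name__ == "__main__":
    scripts = sysconfig.get_path("scripts")  # fallback only; see the note above
    print("scripts dir:", scripts, "on PATH:", dir_on_path(scripts))
    print("yt resolves to:", shutil.which("yt"))  # None means the shell will not find yt
```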

Q: Reading with retries fails with a timeout error.
A: Most likely the table consists of too many chunks, and they need to be merged into larger ones. Use yt merge --src table --dst table --spec "{combine_chunks=true}"
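The same merge can be started from Python. A minimal sketch, assuming yt.wrapper's run_merge(source_table, destination_table, spec=...) call; the helper name merge_small_chunks is hypothetical. Note that the YSON spec {combine_chunks=true} becomes the Python dict {"combine_chunks": True}.

```python
def merge_small_chunks(table, client=None):
    """Rewrite `table` in place with combine_chunks enabled (hypothetical helper).

    `client` may be a yt.wrapper.YtClient; if omitted, the module-level
    yt.wrapper functions (configured via yt.config) are used.
    """
    if client is None:
        import yt.wrapper as client  # imported lazily, only when no client is passed
    # Python equivalent of: yt merge --src table --dst table --spec "{combine_chunks=true}"
    client.run_merge(table, table, spec={"combine_chunks": True})
```

Passing the client explicitly keeps the helper usable with several clusters in one process.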

Q: The operation fails with a YSON error (for example, YsonError: Premature end of stream), and the web interface shows a YSON parsing error.
A: The operation most likely writes to stdout. Unless the operation is marked as raw_io, you must not write to stdout explicitly from Python (via print or sys.stdout.write()). Output can also get there indirectly, for example from a third-party program such as an archiver.
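A sketch of the safe pattern: diagnostics go to stderr, and the output of an external program is captured instead of being inherited by the job's stdout. The mapper and run_tool below are hypothetical examples, not library functions.

```python
import subprocess
import sys


def mapper(row):
    # stdout carries the job's output rows, so diagnostics must go to stderr.
    sys.stderr.write(f"processing row with keys {sorted(row)}\n")
    yield row


def run_tool(args):
    """Run an external program without letting it write to the job's stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    sys.stderr.write(result.stderr)  # forward the tool's own diagnostics to stderr
    return result.stdout  # the caller decides what to do with the captured output
```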

Q: The Python library writes too much to stderr, how do I increase the level of logging?
A: You can increase the level by setting the YT_LOG_LEVEL="ERROR" environment variable or by setting up the YTsaurus logger: logging.getLogger("Yt").setLevel(logging.ERROR).
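Both options from the answer, in Python form. A minimal sketch; it assumes the library reads YT_LOG_LEVEL when it sets up its logger, so the variable is set before yt.wrapper is imported (or exported in the shell that starts the script).

```python
import logging
import os

# Option 1: set the environment variable before yt.wrapper is imported
# (assumption: the library reads it while configuring its logger).
os.environ["YT_LOG_LEVEL"] = "ERROR"

# Option 2: raise the level of the "Yt" logger directly. Doing this after
# importing yt.wrapper avoids the library's own setup overriding it.
logging.getLogger("Yt").setLevel(logging.ERROR)
```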

Q: I start an operation on Mac OS X, but jobs fail with errors like ImportError: ./tmpfs/modules/_ctypes.so: invalid ELF header.
A: The Python wrapper takes all dependencies of a Python operation with it to the cluster, so binary .so and .pyc files from your machine end up there too and cannot be loaded. Use a porto layer with your local environment and enable filtering of these files so that they are not sent to the cluster. For more information, see the section.
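A sketch of such a filter, assuming the config["pickling"]["module_filter"] option (mentioned in the last question of this FAQ) accepts a callable that receives a module object and returns True if the module should be shipped. The helper names are hypothetical. A blanket rule like this also drops the YSON bindings, which the last question warns about.

```python
def skip_native_modules(module):
    """Return False for modules loaded from .so or .pyc files (hypothetical filter)."""
    path = getattr(module, "__file__", None) or ""  # built-in modules have no __file__
    return not path.endswith((".so", ".pyc"))


def install_module_filter(config):
    # Assumption: `config` is yt.wrapper.config or a mapping with the same layout.
    config["pickling"]["module_filter"] = skip_native_modules
```

With the real library, you would call install_module_filter(yt.config) after import yt.wrapper as yt.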

Q: Jobs fail with the Invalid table index N: expected integer in range [A,B] error.
A: The message means that your job writes records with a table index for which there is no corresponding output table. Most often, this happens when an operation has several input tables and one output table: by default, input records contain the @table_index field, so records passed through unchanged are sent to output tables with the same indexes. To disable these fields, you can change the format: yt.config["tabular_data_format"] = yt.YsonFormat(process_table_index=None). To learn more about the format, see the section. Alternatively, disable them explicitly in the specification (example for a map operation): {"mapper": {"enable_input_table_index": False}}.
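The specification-based option might look like this in code. A minimal sketch, assuming yt.wrapper's run_map(binary, source_table, destination_table, spec=...) call; the helper name is hypothetical.

```python
def run_map_without_table_index(mapper, source_tables, destination_table, client=None):
    """Run a map operation with input table indexes disabled (hypothetical helper)."""
    if client is None:
        import yt.wrapper as client  # imported lazily, only when no client is passed
    # Input records will no longer carry @table_index, so every output row
    # goes to the single destination table.
    spec = {"mapper": {"enable_input_table_index": False}}
    return client.run_map(mapper, source_tables, destination_table, spec=spec)
```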

Q: The (ReadTimeout, HTTPConnectionPool(....): Read timed out.) error appears after the operation completes.
A: The message means that the operation stderr could not be downloaded because of network problems, even after repeated requests. In this case, use the ignore_stderr_if_download_failed option, which makes the client ignore stderr if it cannot be downloaded. We recommend enabling this option in production processes.
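A sketch of enabling the option, assuming it lives in the operation_tracker section of the client configuration; check the default configuration of your yt.wrapper version before relying on this location. The helper name is hypothetical.

```python
def tolerate_stderr_download_failures(config):
    """Enable ignore_stderr_if_download_failed (assumed to live under "operation_tracker")."""
    config["operation_tracker"]["ignore_stderr_if_download_failed"] = True
    return config
```

With the real library, you would call tolerate_stderr_download_failures(yt.config) once at startup, before launching operations.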

Q: I get the Yson bindings required error.
A: This means that YSON was selected as the input or output format, but the bindings could not be imported in the job. To learn more about YSON and the bindings, see the section. Install the bindings package and make sure that the YSON bindings are not filtered out by module_filter: they are a dynamic library, yson_lib.so, which is easy to drop by accident when filtering out all .so files. In addition, to prevent the yt_yson_bindings shipped with the modules from being deleted, set config["pickling"]["ignore_yson_bindings_for_incompatible_platforms"] = False in the configuration file.
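One way to combine an existing module filter with this advice is to wrap it so the bindings are always kept. A minimal sketch, assuming module_filter receives module objects and that the bindings load as modules named yt_yson_bindings or yt_yson_bindings.*; both helper names are hypothetical.

```python
def keep_yson_bindings(base_filter):
    """Wrap a module filter so the YSON bindings are always shipped (hypothetical helper)."""
    def module_filter(module):
        name = getattr(module, "__name__", "")
        if name == "yt_yson_bindings" or name.startswith("yt_yson_bindings."):
            return True  # never drop the bindings, even though they are .so files
        return base_filter(module)
    return module_filter


def configure_pickling(config, base_filter):
    # Assumption: `config` is yt.wrapper.config or a mapping with the same layout.
    config["pickling"]["module_filter"] = keep_yson_bindings(base_filter)
    # Keep the yt_yson_bindings shipped with the modules instead of deleting them.
    config["pickling"]["ignore_yson_bindings_for_incompatible_platforms"] = False
```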