Python API
Note
Before you start, install the Python client from PyPI using the command:
pip install ytsaurus-client
What becomes available after installing the package:
- The Python yt library.
- The CLI binary yt.
Installation
YSON libraries
To use the YSON format to work with tables, you need C++ bindings installed as a separate package. Installing YSON bindings:
pip install ytsaurus-yson
Attention!
It is currently impossible to install YSON bindings on Windows.
For Apple M1 platform users
To learn more about YSON, see Formats.
To find out the version of the installed Python wrapper, print the yt.VERSION variable or run the yt --version command.
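The version check can also be done from a script. The following is a minimal sketch, assuming that VERSION is exposed on yt.wrapper imported as yt (the same import that the Help section uses); the function name installed_client_version is hypothetical.

```python
def installed_client_version():
    """Return the client version as a string, or None if it cannot be determined."""
    try:
        # Assumption: the library is imported the same way as in the Help section.
        import yt.wrapper as yt
    except ImportError:
        # The ytsaurus-client package is not installed in this environment.
        return None
    # Assumption: VERSION lives on this module; getattr keeps the sketch safe if it does not.
    version = getattr(yt, "VERSION", None)
    return None if version is None else str(version)


print(installed_client_version())
```

If the function returns None, the package is most likely missing from the interpreter that runs the script.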
If you encounter a problem, check the FAQ section. If the problem persists, write to the chat.
Attention!
We do not recommend installing the library and its dependent packages in different ways at the same time. This can lead to problems that are difficult to diagnose.
User documentation
- General
- Commands
- Python objects as operations
- Untyped Python operations
- Other
- Deprecated
Help
The most up-to-date help on specific functions and their parameters is in the code.
To view a description of functions and classes in the interpreter, proceed as follows:
python
>>> import yt.wrapper as yt
>>> help(yt.run_sort)
Examples
- Basic level
- Reading and writing tables
- Table schemas
- Simple map
- Sorting a table and a simple reduce operation
- Reduce with multiple input tables
- Reduce with multiple input and output tables
- MapReduce
- MapReduce with multiple intermediate tables
- Decorators for job classes
- Working with files on the client and in operations
- Grep
- Advanced level
- Miscellaneous
- Untyped API
FAQ
This section contains answers to a number of frequently asked questions about the Python API. Answers to other frequently asked questions are in the FAQ section.
Q: I installed the package from PyPI, but I get the yt: command not found error.
A: Try running the pip install ytsaurus-client --force-reinstall command. The log will most likely display a warning like The script yt is installed in '...' which isn't on your PATH. To solve the problem, add the specified path to the PATH environment variable by running the following commands:
echo 'export PATH="$PATH:<specified path>"' >> ~/.bashrc
source ~/.bashrc
Depending on the shell, the file may have a different name. The most common name on Mac is ~/.zshrc.
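To check whether the problem described above applies to your environment, you can inspect PATH from Python. This is a minimal sketch that uses only the standard library; the helper names are hypothetical, and sysconfig reports the script directory of the default install scheme only, so the path printed in the pip warning remains the authoritative one.

```python
import os
import shutil
import sysconfig


def scripts_dir_on_path(scripts_dir=None, path=None):
    """Check whether the directory with installed scripts is listed in PATH."""
    # Default to the script directory of the current interpreter's install scheme.
    scripts_dir = scripts_dir or sysconfig.get_path("scripts")
    path = os.environ.get("PATH", "") if path is None else path
    entries = [os.path.normpath(p) for p in path.split(os.pathsep) if p]
    return os.path.normpath(scripts_dir) in entries


def find_yt_binary():
    """Return the full path of the yt executable found via PATH, or None."""
    return shutil.which("yt")


print("scripts directory on PATH:", scripts_dir_on_path())
print("yt binary:", find_yt_binary())
```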
Q: Reading with retries fails with a timeout error.
A: Most likely the table consists of too many chunks, and they need to be combined into larger ones. Use yt merge --src table --dst table --spec "{combine_chunks=true}"
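If you run the merge from a script, the same command can be assembled as an argument list. This is a minimal sketch that wraps the CLI command quoted above; the function names and the table path in the usage line are hypothetical.

```python
import subprocess


def build_combine_chunks_command(table):
    """Build the argument list for the in-place merge command from the answer above."""
    return [
        "yt", "merge",
        "--src", table,
        "--dst", table,
        "--spec", "{combine_chunks=true}",
    ]


def combine_chunks(table):
    """Run the merge; requires the yt binary on PATH and access to a cluster."""
    subprocess.run(build_combine_chunks_command(table), check=True)


# Hypothetical table path, shown only to illustrate the resulting command line.
print(" ".join(build_combine_chunks_command("//home/user/table")))
```

Passing the arguments as a list avoids shell quoting issues around the braces in the spec.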
Q: The operation ends with a YSON error (for example: YsonError: Premature end of stream) and the web interface displays a YSON parsing error.
A: The operation most likely writes to stdout. Writing to stdout explicitly from Python via print or sys.stdout.write() is prohibited unless the operation is marked as raw_io, but a third-party program started by the job, such as an archiver, can still do it.
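One way to keep stdout clean is to send diagnostics to stderr and to capture the stdout of any third-party program the job starts. This is a minimal sketch, not the library's own mechanism; log, run_tool_quietly, and mapper are hypothetical names, and the mapper is a plain generator over dict rows.

```python
import subprocess
import sys


def log(message):
    """Write diagnostics to stderr so that they never mix with the job's output."""
    print(message, file=sys.stderr)


def run_tool_quietly(argv):
    """Run an external program and forward whatever it prints to stderr."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True
    )
    sys.stderr.write(result.stdout.decode(errors="replace"))
    sys.stderr.write(result.stderr.decode(errors="replace"))
    return result.returncode


def mapper(row):
    """Pass the row through unchanged; diagnostics go to stderr only."""
    log("processing a row with %d fields" % len(row))
    yield row
```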
Q: The Python library writes too much to stderr. How do I raise the logging level?
A: You can raise the level by setting the YT_LOG_LEVEL="ERROR" environment variable or by configuring the YTsaurus logger: logging.getLogger("Yt").setLevel(logging.ERROR).
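Both options can be applied from a script. This is a minimal sketch; the function name quiet_yt_logging is hypothetical, and the environment variable is assumed to take effect only if it is set before the library is imported.

```python
import logging
import os


def quiet_yt_logging():
    """Limit YTsaurus client logging to errors, using both documented options."""
    # Option 1: environment variable (assumed to be read when the library is imported).
    os.environ["YT_LOG_LEVEL"] = "ERROR"
    # Option 2: configure the logger directly; this works at any point.
    logging.getLogger("Yt").setLevel(logging.ERROR)


quiet_yt_logging()
```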
Q: I start an operation on Mac OS X, but jobs end with errors like ImportError: ./tmpfs/modules/_ctypes.so: invalid ELF header.
A: Since the Python wrapper takes all Python operation dependencies with it to the cluster, binary .so and .pyc files arrive there too, which then cannot be loaded. Use a porto layer with your local environment and enable filtering of these files so that they do not end up on the cluster. For more information, see the section.
Q: Jobs end with the Invalid table index N: expected integer in range [A,B] error.
A: The message means that you output a table index in the records and there is no corresponding table. This most often means that you have several input tables and one output table. The @table_index fields appear in the input records by default. To disable them, you can change the format: yt.config["tabular_data_format"] = yt.YsonFormat(process_table_index=None). To learn more about the format, see the section. As an alternative, explicitly indicate in the specification (example for a map operation): {"mapper": {"enable_input_table_index": False}}.
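The specification from the answer above can be passed when the operation is started. This is a minimal sketch, assuming that yt_client is the yt.wrapper module (or a client object) whose run_map accepts a spec keyword argument; the helper name and the constant are hypothetical.

```python
# The spec fragment quoted in the answer above.
NO_TABLE_INDEX_SPEC = {"mapper": {"enable_input_table_index": False}}


def run_map_without_table_index(yt_client, mapper, input_tables, output_table):
    """Start a map operation with input table indexes disabled in the spec."""
    # Assumption: run_map takes the mapper, the input tables, the output table,
    # and an optional spec keyword argument.
    return yt_client.run_map(
        mapper, input_tables, output_table, spec=NO_TABLE_INDEX_SPEC
    )
```

Taking the client as a parameter keeps the helper importable on machines where the library is not installed.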
Q: The (ReadTimeout, HTTPConnectionPool(....): Read timed out.) error appears after the operation is completed.
A: The message means that the operation stderr could not be downloaded due to network problems, and even repeated requests didn't help. In that case, use the ignore_stderr_if_download_failed option, which lets you ignore stderr if it can't be downloaded. We recommend using this option when writing production processes.
Q: I get the Yson bindings required error.
A: This means that YSON was selected as the input (output) format and the bindings could not be imported in the job. To learn more about YSON and bindings, see the section. You need to install the bindings package and check that the YSON bindings are not filtered out by module_filter. The bindings are a dynamic yson_lib.so library that is easy to filter out accidentally when filtering out all .so files. In addition, to keep the yt_yson_bindings shipped with the modules from being deleted, write config["pickling"]["ignore_yson_bindings_for_incompatible_platforms"] = False in the configuration file.
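Before starting an operation, you can check locally whether the bindings are importable. This is a minimal sketch that uses only the standard library and assumes that yt_yson_bindings, the name mentioned above, is the importable module name; the function name is hypothetical.

```python
import importlib.util


def yson_bindings_available(module_name="yt_yson_bindings"):
    """Return True if the given top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


print("YSON bindings importable:", yson_bindings_available())
```

The check covers only the local environment. Whether the bindings reach the job still depends on the module filter settings described above.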