If you’re missing any functionality, please reach out in the community chat or create an issue/PR in the repository.

Shuffle service support for SPYT in YTsaurus
YTsaurus 25.2 introduced a built-in Shuffle service that stores intermediate data between computation stages and makes computations more resilient. Unlike the standard Spark shuffle service, which relies on in-memory storage and temporary disk directories, the YTsaurus Shuffle service persists data in chunks. This significantly improves fault tolerance in distributed Spark computations: even if executors fail (a common by-design scenario), data that has already been processed remains intact, so it does not have to be recomputed.
Starting from version 2.7.3, SPYT supports integration with the YTsaurus Shuffle service, which can be enabled with a dedicated option.
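As a rough illustration of enabling such an option at submit time, here is a minimal sketch that renders a Spark configuration entry as `--conf key=value` arguments. The option name `spark.ytsaurus.shuffle.enabled` is an assumption and is not stated in this note, so check the SPYT documentation for the exact key. The helpers `shuffle_conf` and `to_submit_args` are hypothetical names introduced only for this example.

```python
# Hypothetical helpers: build submit-time arguments that switch on the
# YTsaurus Shuffle service. The option name is an ASSUMPTION, not taken
# from this note; verify it against the SPYT documentation.
SHUFFLE_OPTION = "spark.ytsaurus.shuffle.enabled"


def shuffle_conf(enabled: bool = True) -> dict[str, str]:
    """Return the Spark configuration entry for the shuffle toggle."""
    # Spark configuration values are strings; booleans are written lowercase.
    return {SHUFFLE_OPTION: "true" if enabled else "false"}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render a configuration dict as repeated `--conf key=value` arguments."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Prints the arguments to append to a spark-submit style command line.
    print(" ".join(to_submit_args(shuffle_conf())))
```

The same key and value could equally be set in code through `SparkSession.builder.config(key, value)`. Either way, only the configuration entry matters, not the helper functions.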
Internal tests have shown that using the external shuffle service adds practically no overhead to execution time, and when executors fail and restart, execution time actually decreases. The more restarts occur within a single task, the more noticeable the effect.
For details and recommendations, refer to the documentation.