YTsaurus: Yandex's main data storage and processing system is now open source

Maxim Babenko, Head of Distributed Computing at Yandex, shares how YTsaurus evolved over a decade from an internal project into an open-source platform

March 20, 2023

Hello! My name is Maxim Babenko, and I lead the Distributed Computing Technology Department at Yandex. Today, we have open-sourced the YTsaurus platform, one of the main big data infrastructure systems developed at Yandex.

YTsaurus is the result of almost a decade of work, and we would like to share it with the world. In this article, we tell the story of YT’s inception, explain why YTsaurus is needed, describe the system’s key features, and outline its application areas.

In the GitHub repository, you can find the YTsaurus server code, the infrastructure for deploying it on Kubernetes (k8s), the system’s web interface, and client SDKs for common programming languages: C++, Java, Go, and Python. All of this is released under the Apache 2.0 license, which allows anyone to download it onto their own servers and modify it for their needs.

Read more on Medium.