YTsaurus

An open source big data platform for distributed storage and processing.


Multitenant ecosystem

A set of interrelated subsystems: MapReduce, a SQL query engine, a job scheduler, and a key-value store for OLTP workloads.
Support for large numbers of users, eliminating the need for multiple installations and improving hardware utilization.

Reliability and stability

No single point of failure.
Automated replication between servers.
Updates with no loss of progress.

Scalability

Up to 1 million CPU cores and thousands of GPUs.
Exabytes of data on different media: HDD, SSD, NVMe, RAM.
10,000+ nodes.
Automated server scaling up and down.

Rich functionality

An extended MapReduce model.
Distributed ACID transactions.
A variety of SDKs and APIs.
Secure isolation for compute resources and storage.
A user-friendly UI.

CHYT powered by ClickHouse®

A well-known SQL dialect and familiar functionality.
Fast analytic queries.
Integration with popular BI solutions via JDBC and ODBC.

SPYT powered by Apache Spark

A set of popular tools for writing ETL processes.
Support for multiple isolated clusters of various sizes.
Easy migration for existing solutions.

Use cases

Batch processing

MapReduce and SPYT for processing structured and unstructured data, such as logs and financial transactions.
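
To illustrate the MapReduce model on log data, here is a minimal sketch in plain Python. It runs in a single process and does not use the YTsaurus API. The log format ("<path> <status>") and the function names map_log_line, reduce_counts, and count_statuses are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_log_line(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (status, 1) for each well-formed '<path> <status>' line."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_counts(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all values that share the same key."""
    return key, sum(values)


def count_statuses(lines: Iterable[str]) -> dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["/index 200", "/missing 404", "/index 200", "malformed"]
    print(count_statuses(sample))  # {'200': 2, '404': 1}
```

On a real cluster the map and reduce steps would run as distributed jobs over tables rather than over an in-memory list. The sketch only shows how the two steps divide the work.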

Ad hoc analytics

CHYT runs fast analytical queries without exporting data to an external OLAP system. External dashboards and BI tools can access the data via JDBC and ODBC.

OLTP

A low-latency transactional key-value store for building interactive pipelines and services.

Machine learning

Managing GPU clusters to train models with billions of parameters.

Metadata storage

Transactional metadata storage and a reliable distributed coordination service.

ETL pipelines

Build data processing pipelines using familiar tools: Apache Spark, SQL, MapReduce.

Success stories

Ad displays

An internet search engine stores user information as profiles in the key-value store. YTsaurus makes real-time updates of user information easy, with latency as low as 10 ms. The key-value store also serves as a backend for interactive user applications.