An open-source big data platform for distributed storage and processing.

What sets our platform apart


Multitenant ecosystem

  • A set of interrelated subsystems: MapReduce, a SQL query engine, a job scheduler, and a key-value store for OLTP workloads.
  • Support for large numbers of users, eliminating the need for multiple installations and improving hardware utilization.

Reliability and stability

  • No single point of failure.
  • Automated replication between servers.
  • Updates with no loss of progress.


Scalability

  • Up to 1 million CPU cores and thousands of GPUs.
  • Exabytes of data on different media: HDD, SSD, NVMe, RAM.
  • 10,000+ nodes.
  • Automated server scaling up and down.

Rich functionality

  • An expansive MapReduce model.
  • Distributed ACID transactions.
  • A variety of SDKs and APIs.
  • Secure isolation for compute resources and storage.
  • A user-friendly UI.

CHYT powered by ClickHouse®

  • A well-known SQL dialect and familiar functionality.
  • Fast analytic queries.
  • Integration with popular BI solutions via JDBC and ODBC.

SPYT powered by Apache Spark

  • A set of popular tools for writing ETL processes.
  • Support for multiple isolated clusters of various sizes.
  • Easy migration for existing solutions.

Use cases

Batch processing

MapReduce and SPYT for processing structured and unstructured data, such as logs and financial transactions.
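
To illustrate the kind of batch job described above, here is a minimal in-memory sketch of the MapReduce model applied to log records. It is plain Python and does not use the platform's SDK; the names `map_log_record`, `reduce_counts`, and `run_map_reduce`, and the sample log fields, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical record shape for this sketch: one dict per log line.
Record = dict[str, str]


def map_log_record(record: Record) -> Iterator[tuple[str, int]]:
    """Map step: emit (status, 1) for each log record."""
    yield record["status"], 1


def reduce_counts(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(
    records: Iterable[Record],
    mapper: Callable[[Record], Iterator[tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], tuple[str, int]],
) -> dict[str, int]:
    """Run map, group by key (shuffle), then reduce, all in one process."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort keys so the result order is deterministic.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Sample input, invented for illustration only.
logs = [
    {"path": "/index", "status": "200"},
    {"path": "/cart", "status": "500"},
    {"path": "/index", "status": "200"},
]

status_counts = run_map_reduce(logs, map_log_record, reduce_counts)
print(status_counts)
```

On a real cluster the map and reduce steps run in parallel across many nodes and the grouping happens in a distributed sort or shuffle stage; the single-process loop here only shows the data flow.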

Ad hoc analytics

CHYT runs fast analytical queries without exporting data to an external OLAP system. External dashboards and BI tools can access data over JDBC or ODBC.


A low-latency transactional key-value store lets you build interactive pipelines and services.

Machine learning

Managing GPU clusters to train models with billions of parameters.

Metadata storage

Transactional metadata storage and a reliable distributed coordination service.

ETL pipelines

Build data processing pipelines using familiar tools: Apache Spark, SQL, MapReduce.

Success stories


Ad displays

An internet search engine stores user information as profiles in the key-value store. YTsaurus makes real-time updates of user information easy, with latency as low as 10 ms. The key-value store also serves as a backend for interactive user applications.