How it works
How it works
The YTsaurus architecture has three layers.
- Storage Layer: Includes queues, static and dynamic tables, and is responsible for storing data and metadata.
- Compute Layer: Responsible for data processing, distributed computing using various processing engines like MapReduce, YQL, CHYT, SPYT.
- Access Layer (the data access layer): a user-friendly UI, CLI, and SDK in Java, Python, GoLang, and C++.
We have a distributed file system and metainformation tree that stores files, documents, static and dynamic tables, and metadata. It supports transactions and manages data chunks.
In addition to storage Cypress can serve as a coordination service. Fault tolerance is driven by our own consensus algorithm (RSM) similar to RAFT.
Dynamic tables
Dynamic tables
Dynamic tables are built on top of the file system as a key-value store. They support: distributed ACID transactions within or across tables, automatic data caching in memory for low read latency, TTL data retention rules, isolation between projects on different cluster nodes, replication between clusters.
Queues in YTsaurus
A distributed, replicated message log based on dynamic tables, with support for sharding and cross-cluster replication, and compatible with the Apache Kafka protocol.
Flexible table-based storage
Export messages to a static table for long-term storage, and delete them via TTL after processing
An all-in-one transaction space
With a single transaction space with dynamic tables, you can process events exactly-once without complex integrations and third-party tools
Scalability and fault tolerance
Sharding and distributed storage allow you to scale the system horizontally without a single point of failure
Data processing: Scheduler
Data processing: Scheduler
YTsaurus scheduler manages resources of the cluster: CPU cores, RAM, GPUs. It allocates resources to jobs following the Dominant Resource Fairness approach. It supports of hierarchy of compute pools, and different types of guarantees. Its point-in-time and integral guarantees achieve fairness on different time scales. Operations managed by the Scheduler can be written in MapReduce, or in a dialect of SQL provided by YQL. Tables in YTsaurus are schematized, allowing simple and concise queries.

Operations on the cluster can be started with YQL, a dialect of SQL with UDF, window functions and more. You can use it to build complex data processing pipelines that store subqueries in variables and create chains of dependent queries.
CHYT allows you to launch ClickHouse® clusters to work with data in YTsaurus. It can operate as a data source for visualization and BI tools, making it excellent for ad hoc queries.
SPYT clusters run Apache Spark inside YTsaurus Vanilla operations. They are great for building ETL pipelines.
What is YTsaurus?
What is YTsaurus?
An introduction into the platform.
Try YTsaurus
