
SPYT Streaming
We are pleased to announce the release of SPYT 2.6.5, which introduces support for Spark Structured Streaming on top of dynamic tables and queues in YTsaurus.
With this update, you can run near real-time data processing workflows using Spark’s micro-batching mechanism. Instead of being processed row by row, records are grouped into batches according to a specified window size, which lets you balance throughput against latency.
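To make the micro-batching idea concrete, here is a minimal, library-free sketch in Python. It is an illustration only, not SPYT or Spark code: the `Record` type, the `micro_batches` helper, and the use of record timestamps to decide batch boundaries are all assumptions made for the example. Spark itself forms each batch from whatever data arrived since the previous trigger.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    timestamp: float  # arrival time in seconds (hypothetical field)
    payload: str


def micro_batches(records: Iterable[Record], window_seconds: float) -> List[List[Record]]:
    """Group records into consecutive fixed-size time windows."""
    buckets: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        # Records whose timestamps fall into the same window share a bucket.
        buckets[int(record.timestamp // window_seconds)].append(record)
    # Return batches in window order; each batch is then processed as a unit.
    return [buckets[key] for key in sorted(buckets)]


events = [Record(0.2, "a"), Record(0.9, "b"), Record(1.1, "c"), Record(2.5, "d")]
batches = micro_batches(events, window_seconds=1.0)
```

A larger window produces fewer, bigger batches (higher throughput, higher latency), while a smaller window does the opposite.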
YTsaurus queues are now supported as data sources. Built-in mechanisms track offsets individually for each queue, ensuring exactly-once processing and eliminating data duplication. This enables you to process hundreds of tables or data streams continuously and in parallel, with minimal delay.
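The per-queue offset tracking described above can be pictured with a small in-memory sketch. Everything here is hypothetical (the `OffsetTracker` class, its methods, and the `//queues/logs` path) and does not reflect how SPYT stores offsets. It only shows why remembering a committed position per queue prevents rows from being handled twice.

```python
from typing import Callable, Dict, List


class OffsetTracker:
    """Remembers, per queue, the index of the next row to process."""

    def __init__(self) -> None:
        self._committed: Dict[str, int] = {}

    def committed(self, queue: str) -> int:
        return self._committed.get(queue, 0)

    def process(self, queue: str, rows: List[str],
                handle: Callable[[List[str]], None]) -> int:
        """Handle only the rows past the committed offset, then advance it."""
        start = self.committed(queue)
        new_rows = rows[start:]
        if new_rows:
            handle(new_rows)
            # Commit only after the handler succeeds, so a failed batch is
            # retried from the same offset instead of being skipped.
            self._committed[queue] = start + len(new_rows)
        return len(new_rows)


seen: List[str] = []
tracker = OffsetTracker()
queue_rows = ["r0", "r1", "r2"]
tracker.process("//queues/logs", queue_rows, seen.extend)
tracker.process("//queues/logs", queue_rows, seen.extend)  # replay: nothing new
queue_rows.append("r3")
tracker.process("//queues/logs", queue_rows, seen.extend)
```

In a real system, the offset commit and the output write have to be made atomic (or the output idempotent) to get exactly-once results. This sketch keeps both in process memory and so leaves that part out.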
What can you build with Structured Streaming?
Example Use Cases:
- Application Log Collection & Aggregation: Streaming logs from multiple services can be recorded into YTsaurus queues, with Spark Structured Streaming aggregating errors, building metrics, and triggering alerts in near real time based on detected patterns.
- IoT Data Filtering & Enrichment: Telemetry streams from sensors or smart devices can be filtered (e.g., “pass only anomalies,” “drop duplicates”) and enriched with additional metadata before pushing them into analytical datamarts.
- User Activity Monitoring: User actions (such as clicks or payments) are ingested into queues instantly. Spark Structured Streaming enables near real-time fraud detection, activity heatmaps, and dynamic segmentation for instant recommendations.
- Transaction Stream Processing: Incoming transactions can be aggregated, validated, and stored into static tables for further offline processing and analysis.
Processed streams can be analyzed on the fly and also written back to YTsaurus queues. This is especially useful for building multi-stage pipelines, for example cleansing the data, then aggregating it, then exporting it further downstream.
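As a rough picture of such a cleanse, aggregate, export chain, the following plain-Python sketch passes one batch through three stages. The function names and the list standing in for a downstream queue are assumptions made for illustration. In SPYT, each stage would be a streaming job reading from one queue and writing to the next.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def cleanse(rows: Iterable[Optional[str]]) -> List[str]:
    """Drop missing or blank rows and normalize whitespace and case."""
    return [row.strip().lower() for row in rows if row and row.strip()]


def aggregate(rows: Iterable[str]) -> Dict[str, int]:
    """Count occurrences of each distinct value in the batch."""
    return dict(Counter(rows))


def export(counts: Dict[str, int], sink: List[dict]) -> None:
    """Append one output row per key; the list stands in for a downstream queue."""
    for key in sorted(counts):
        sink.append({"key": key, "count": counts[key]})


downstream: List[dict] = []
export(aggregate(cleanse([" ERROR ", None, "warn", "error", ""])), downstream)
```

Keeping each stage as a separate step with a queue in between means a slow or failed stage can be restarted without re-running the ones before it.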
Additionally, data accumulated in queues can be periodically exported by the queue agent to static tables, making it available for offline analytics, model training, and reporting.
Don’t miss out on these powerful new capabilities: upgrade to SPYT 2.6.5 today and unlock near real-time data processing for your business!