Odin — Active Monitoring for YTsaurus

Introducing the new monitoring component in the YTsaurus ecosystem

A new name has appeared in the YTsaurus ecosystem — Odin.

It’s a component that provides comprehensive monitoring for the platform and is integrated into the UI. Installation instructions are available in the documentation.

Until now, YTsaurus users have mostly relied on collecting metrics through Prometheus and building their own dashboards. That approach is useful for analyzing resource usage, but it doesn’t always tell you whether the system is actually working right now.

Odin solves exactly that problem. This tool has been actively used by the team for many years to monitor the liveness of production clusters. It runs regular checks and gives a simple answer: everything is fine, something needs attention, or there’s a problem.

Here’s what it looks like:

[Screenshot: Odin in the YTsaurus UI]

How Odin works

  • Checks are executed every minute and verify basic system scenarios.

  • Each check returns one of three statuses: OK, WARNING, or CRITICAL.

  • The results are stored in a table available for analysis or visualization.

  • In the UI, you can see the result of each run and its log.
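
The cycle in the list above can be modeled in a few lines of Python. This is an illustrative sketch only, not Odin's actual code or API: `Status`, `CheckResult`, `make_latency_check`, and `run_check` are hypothetical names, the latency thresholds are made up, a plain list stands in for the results table, and treating an exception as CRITICAL is an assumption. A scheduler (not shown) would call `run_check` once a minute.

```python
import enum
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class Status(enum.Enum):
    """The three statuses a check can return."""
    OK = "OK"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


@dataclass
class CheckResult:
    """One row of the results table: a single run of a single check."""
    check_name: str
    timestamp: float
    status: Status
    log: List[str] = field(default_factory=list)


# A check receives a list to append log lines to and returns a Status.
Check = Callable[[List[str]], Status]


def make_latency_check(measure_ms: Callable[[], float],
                       warn_ms: float = 500.0,
                       crit_ms: float = 1000.0) -> Check:
    """Build a hypothetical check that grades a measured latency."""
    def check(log: List[str]) -> Status:
        latency = measure_ms()
        log.append(f"measured latency: {latency:.0f} ms")
        if latency >= crit_ms:
            return Status.CRITICAL
        if latency >= warn_ms:
            return Status.WARNING
        return Status.OK
    return check


def run_check(name: str, check: Check,
              now: Optional[float] = None) -> CheckResult:
    """Run one check and package its status and log as a table row."""
    log: List[str] = []
    try:
        status = check(log)
    except Exception as exc:
        # Assumption: a check that crashes is reported as CRITICAL.
        log.append(f"check raised {exc!r}")
        status = Status.CRITICAL
    timestamp = time.time() if now is None else now
    return CheckResult(name, timestamp, status, log)


# Stand-in for the results table: one appended row per run.
results_table: List[CheckResult] = []
results_table.append(
    run_check("hypothetical_latency", make_latency_check(lambda: 120.0))
)
```

Keeping the log next to the status in each row mirrors what the UI shows: the result of every run together with its log.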

You can also view the status of selected checks for the past 30 minutes to get a quick overview of the cluster’s overall health.
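
Because the results are stored in a table, a similar 30-minute overview can be built outside the UI as well. The sketch below assumes each row is a `(check_name, timestamp, status)` tuple. That row layout, the function names, and the "worst status wins" summary are all assumptions made for illustration, not Odin's schema or the UI's actual aggregation rule.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical row layout: (check_name, unix_timestamp, status).
Row = Tuple[str, float, str]

# Higher number means more severe.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def recent_statuses(rows: Iterable[Row], now: float,
                    window_seconds: float = 30 * 60) -> Dict[str, List[str]]:
    """Group statuses from the last `window_seconds` by check, oldest first."""
    cutoff = now - window_seconds
    timeline: Dict[str, List[str]] = {}
    for name, timestamp, status in sorted(rows, key=lambda row: row[1]):
        if cutoff <= timestamp <= now:
            timeline.setdefault(name, []).append(status)
    return timeline


def worst_status(statuses: List[str]) -> Optional[str]:
    """Return the most severe status, or None if there were no runs."""
    return max(statuses, key=SEVERITY.__getitem__, default=None)


# Example with made-up check names and timestamps.
rows = [
    ("check_a", 0.0, "CRITICAL"),   # older than 30 minutes, ignored
    ("check_a", 1000.0, "OK"),
    ("check_a", 1060.0, "WARNING"),
    ("check_b", 1060.0, "OK"),
]
timeline = recent_statuses(rows, now=2000.0)
summary = {name: worst_status(runs) for name, runs in timeline.items()}
```

Returning `None` for a check with no runs in the window keeps "no data" distinct from "OK", which matters when a check silently stops running.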


Available checks

Odin comes with a set of ready-to-use checks that cover different levels of the cluster’s operation — from basic job execution to system component health.

The full list of available checks can be found in the documentation, and their implementation is available on GitHub.

If you’re missing any functionality, please reach out in the community chat or create an issue/PR in the repository.
