Release notes
- YTsaurus Server
- Query Tracker
- Strawberry
- CHYT
- SPYT
- Kubernetes operator
- YTsaurus Python YSON
- SDK
Thanks to multiple outside contributors for the active participation in YTsaurus development. 🖤
YTsaurus Server
All main components are released as a docker image.
Releases:
24.1.0
Release date: 2024-11-07
To install YTsaurus Server 24.1.0, update the k8s-operator to version 0.17.0.
Scheduler and GPU
Features and changes:
- Support prioritization of pools during strong guarantee adjustment due to insufficient total cluster resources.
- Support prioritization of operations during module assignment stage of the GPU scheduling algorithm.
- Support job resource demand restrictions per pool tree.
- Add custom TTL for jobs in operation archive.
- Add user job trace collection in Trace Event Format.
- Refactor exec node config and Orchid.
- Logically separate jobs and allocations.
- Add configurable input data buffer size in jobs for more efficient interrupts.
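The user job traces mentioned above use the Trace Event Format, the JSON layout read by tools such as chrome://tracing and Perfetto. A minimal sketch of summarizing such a trace follows. The sample events and the `total_duration_by_name` helper are made up for illustration; which events YTsaurus jobs actually emit is not stated in these notes.

```python
import json
from collections import defaultdict

# Hypothetical sample: complete ("X") events with microsecond timestamps.
SAMPLE_TRACE = json.dumps({
    "traceEvents": [
        {"name": "read_input", "ph": "X", "ts": 0, "dur": 1500, "pid": 1, "tid": 1},
        {"name": "user_code", "ph": "X", "ts": 1500, "dur": 4000, "pid": 1, "tid": 1},
        {"name": "read_input", "ph": "X", "ts": 5500, "dur": 500, "pid": 1, "tid": 1},
    ]
})


def total_duration_by_name(trace_json: str) -> dict:
    """Sum the duration (in microseconds) of complete events per event name."""
    data = json.loads(trace_json)
    # The format allows either a bare event array or an object with "traceEvents".
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(int)
    for event in events:
        if event.get("ph") == "X":  # only complete events carry "dur"
            totals[event["name"]] += event.get("dur", 0)
    return dict(totals)


print(total_duration_by_name(SAMPLE_TRACE))
```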
Fixes and optimizations:
- Fix exec node heartbeat tracing throughout scheduler and controller agents.
- Optimize general allocation scheduling algorithm and fair share computation.
- Optimize scheduler <-> CA and exec node <-> CA heartbeats processing.
Queue Agent
Features:
- Treat static queue export the same way as a vital consumer during queue trimming, so that rows that have not been exported are not trimmed.
- Add functionality for banning queue agent instances via cypress attribute.
- Take cumulative data weight and timestamp from consumer meta for consumer metrics.
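The trimming rule above can be pictured as taking a minimum over everything that still needs the rows. The sketch below is illustrative only: `safe_trim_row_index` and its arguments are hypothetical names, and the real queue agent logic is likely more involved.

```python
from typing import Iterable, Optional


def safe_trim_row_index(
    vital_consumer_offsets: Iterable[int],
    exported_row_count: Optional[int] = None,
) -> int:
    """Return the number of leading rows that may be trimmed from a partition.

    vital_consumer_offsets: next unread row index of each vital consumer.
    exported_row_count: rows already written by a static export, or None
        when no export is configured for the queue.
    """
    bounds = list(vital_consumer_offsets)
    if exported_row_count is not None:
        # The export is treated like one more vital consumer.
        bounds.append(exported_row_count)
    # With nothing holding rows back, this sketch conservatively trims nothing.
    return min(bounds) if bounds else 0


print(safe_trim_row_index([120, 95], exported_row_count=80))
```

In this toy model a lagging export holds back trimming exactly as a lagging vital consumer would.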
Fixes:
- Fix bug in handling of queues/consumers with invalid attributes (e.g. `auto_trim_config`).
- Fix alerts visibility from `@queue_status` attribute.
- Do not ignore consumers higher than queue size.
- Rename `write_registration_table_mapping` -> `write_replicated_table_mapping` in dynamic config.
- Take shared lock instead of exclusive lock on static export destination directories.
Proxy
Features:
- Implement queue producer handlers for exactly once pushing in queues (`PushQueueProducer`, `CreateQueueProducerSession`).
- Add `queue_consumer` and `queue_producer` object type handlers, so they can be created without explicitly specifying a schema. Example: `yt create queue_consumer <path>`.
- Support retries of cross cell copying.
- Add float and date types in Arrow format.
- Add memory tracking for `read_table` requests.
- Drop heavy requests if there is no more memory.
- Send `bytes_out` and `bytes_in` metrics during request execution.
- Store `cumulative_data_weight` and `timestamp` in consumer meta.
- Rename `PullConsumer` -> `PullQueueConsumer` and `AdvanceConsumer` -> `AdvanceQueueConsumer`. Old handlers continue to exist for now for backward compatibility reasons.
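Exactly-once pushing is commonly built on per-session sequence numbers: the queue side remembers the last sequence number accepted for each producer session and ignores anything at or below it, so a retried batch is not written twice. The toy model below illustrates that idea only. The class and method names are hypothetical and do not mirror the actual `PushQueueProducer` or `CreateQueueProducerSession` signatures.

```python
class ToyQueue:
    """In-memory stand-in for a queue with producer sessions (illustration only)."""

    def __init__(self):
        self.rows = []
        self._last_seq = {}  # session id -> last accepted sequence number

    def create_session(self, session_id: str) -> int:
        # Returns the last accepted sequence number (-1 for a new session).
        return self._last_seq.setdefault(session_id, -1)

    def push(self, session_id: str, first_seq: int, batch: list) -> int:
        """Append rows whose sequence number is new; skip duplicates."""
        last = self._last_seq[session_id]
        for seq, row in enumerate(batch, start=first_seq):
            if seq > last:
                self.rows.append(row)
                last = seq
        self._last_seq[session_id] = last
        return last


queue = ToyQueue()
queue.create_session("producer-1")
queue.push("producer-1", 0, ["a", "b"])
queue.push("producer-1", 0, ["a", "b"])  # retry of the same batch
queue.push("producer-1", 1, ["b", "c"])  # partially overlapping batch
print(queue.rows)
```

Each row ends up in the queue once, even though two of the three pushes repeat rows that were already accepted.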
CHYT:
- Add authorization via X-ClickHouse-Key HTTP-header.
- Add sticky query distribution based on session id/sticky cookie.
- Add a new "/chyt" http handler for chyt queries ("/query" handler is deprecated but still works for backward compatibility).
- Add ability to allocate a separate port for the new http handler to support queries without a custom URL path.
- The clique alias may be specified via "chyt.clique_alias" or "user" parameters (only for new handlers).
- Make HTTP GET requests read-only for compatibility with ClickHouse (only for new handlers).
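As an illustration of the new handler, the sketch below builds (but does not send) an HTTP request that carries the token in the `X-ClickHouse-Key` header and selects the clique via `chyt.clique_alias`. The proxy host, clique alias, token and query are placeholders, and sending the query text in the POST body follows the usual ClickHouse HTTP convention, which is assumed here rather than confirmed for CHYT.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_chyt_request(proxy: str, clique_alias: str, token: str, query: str) -> Request:
    """Build a POST request for the "/chyt" handler without sending it."""
    params = urlencode({"chyt.clique_alias": clique_alias})
    return Request(
        url=f"http://{proxy}/chyt?{params}",
        data=query.encode("utf-8"),  # query text travels in the request body
        headers={"X-ClickHouse-Key": token},
        method="POST",
    )


# All values below are placeholders.
request = build_chyt_request(
    proxy="proxy.example.com",
    clique_alias="my_clique",
    token="<token>",
    query="SELECT 1",
)
print(request.get_method(), request.full_url)
```

The request can then be passed to `urllib.request.urlopen` against a real proxy address.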
Fixes:
- Fill dictionary encoding index type in Arrow format.
- Fix null, void and optional composite columns in Arrow format.
- Fix `yt.memory.heap_usage` metrics.
Dynamic Tables
Features:
- Secondary Indexes: basic, partial, list, and unique indexes.
- Optimize queries which group and order by same keys.
- Balance tablets using load factor (requires standalone tablet balancer).
- Shared write lock - write to same row from different transactions without blocking.
- Rpc proxy client balancer based on power of two choices algorithm.
- Compression dictionary for Hunks and Hash index.
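The power of two choices algorithm named above is a general load-balancing technique: sample two candidates at random and send the request to the less loaded one. A minimal sketch follows. The proxy names and the in-flight counter are hypothetical, and the actual RPC proxy client balancer may measure load differently.

```python
import random


def pick_power_of_two(loads: dict, rng: random.Random) -> str:
    """Sample two distinct candidates and return the less loaded one."""
    first, second = rng.sample(sorted(loads), 2)
    return first if loads[first] <= loads[second] else second


rng = random.Random(0)
in_flight = {"proxy-a": 0, "proxy-b": 0, "proxy-c": 0, "proxy-d": 0}
for _ in range(1000):
    # Requests never finish in this toy run, so counters only grow.
    in_flight[pick_power_of_two(in_flight, rng)] += 1
print(in_flight)
```

Comparing only two random candidates keeps each decision cheap while still steering requests away from the busiest instances.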
MapReduce
Features:
- Support input tables from remote clusters in operations.
- Improve control over how data is split into jobs for ML training applications.
- Support read by latest timestamp in MapReduce operations over dynamic tables.
- Disclose less configuration information to a potential attacker.
Fixes:
- Fix teleportation of a single chunk in an unordered pool.
- Fix agent disconnect on removal of an account.
- Fix the inference of intermediate schemas for inputs with column filters.
- Fix controller agent crash on incompatible user statistic paths.
Optimizations:
- Add JobInputCache: in-memory cache on exec nodes, storing data read by multiple jobs running on the same node.
Master Server
Features:
- Tablet cells Hydra persistence data is now primarily stored at the new location `//sys/hydra_persistence` by default. The duality with the previous location (`//sys/tablet_cells`) will be resolved in the future releases.
- Support inheritance of `@chunk_merger_mode` after copy into directory with set `@chunk_merger_mode`.
- Add backoff rescheduling for nodes merged by chunk merger in case of a transient failure to merge them.
- Add an option to use the two random choices algorithm when allocating write targets.
- Add the add-maintenance command to CLI.
- Support intra-cell cross-shard link nodes.
- Propagate transaction user to transaction replicas for the sake of proper accounting of the cpu time spent committing or aborting them.
- Propagate knowledge of new master cells dynamically to other cluster components and shorten downtime when adding new master cells.
Optimizations:
- Reduce master server memory footprint by reducing the size of table nodes.
- Speed up removal jobs on data nodes.
- Move exec node tracker service away from automaton thread.
- Non-data nodes are now disposed immediately (instead of per location disposal) and independently from data-nodes.
- Offload invoking transaction replication requests from automaton thread.
Fixes:
- Fix nullptr dereferencing in resolution of queue agent and yql agent attributes.
- Respect medium override in IO engine on node restart.
- Fix rebalancing mode in table's chunk tree after merging branched tables.
- Fix sanitizing hostnames in errors for cellar nodes.
- Fix losing trace context for some callbacks and rpc calls.
- Fix persistence of `@last_seen_time` attribute for users.
- Fix handling unknown chunk meta extensions by meta aggregating writer.
- Fix nodes crashing on heartbeat retries when masters are down for a long time.
- Fix table statistics being inconsistent between native and external cells after copying the table mid statistics update.
- Fix logical request weight being accidentally dropped in proxying chunk service.
- Fix a crash that occasionally occurred when exporting a chunk.
- Fix tablet cell lease transactions getting stuck sometimes.
- Native client retries are now more reliable.
- Fix primary cell chunk hosting for multicell.
- Fix crash related to starting incumbency epoch until recovery is complete.
- Restart elections if changelog store for a voting peer is locked in read-only (Hydra fix for tablet nodes).
- Fix crashing on missing schema when importing a chunk.
- Fix an epoch restart-related crash in expiration tracker.
- In master cell directory, alert on an unknown cell role instead of crashing.
Misc
Features:
- Add ability to redirect stdout to stderr in user jobs (`redirect_stdout_to_stderr` option in operation spec).
- Add dynamic table log writer.
23.2.1
Release date: 2024-07-31
Scheduler and GPU
Features:
- Disable writing `//sys/scheduler/event_log` by default.
- Add lightweight running operations.
Fixes:
- Various optimizations in scheduler
- Improve total resource usage and limits profiling.
- Do not account job preparation time in GPU statistics.
Queue Agent
Fixes:
- Normalize cluster name in queue consumer registration.
Proxy
Features:
- RPC proxy API for Query Tracker.
- Changed format and added metadata for issued user tokens.
- Support rotating TLS certificates for HTTP proxies.
- Compatibility with recent Query Tracker release.
Fixes:
- Do not retry on Read-Only response error.
- Fix standalone authentication token revocation.
- Fix per-user memory tracking (propagate allocation tags to child context).
- Fix arrow format for optional types.
Dynamic Tables
Features:
- Shared write locks.
- Increased maximum number of key columns to 128.
- Implemented array join in YT QL.
Fixes:
- Cap replica lag time for tables that are rarely written to.
- Fix possible journal record loss during journal session abort.
- Fix in backup manager.
- Fix some bugs in chaos dynamic table replication.
MapReduce
Features:
- Combined per-location throttlers limiting total in+out bandwidth.
- Options in operation spec to force memory limits on user job containers.
- Use codegen comparator in SimpleSort & PartitionSort if possible.
Fixes:
- Better profiling tags for job proxy metrics.
- Fixes for remote copy with erasure repair.
- Fix any_to_composite converter when multiple schemas have similarly named composite columns.
- Fixes for partition_table API method.
- Fixes in new live preview.
- Do not fail jobs with supervisor communication failures.
- Multiple retries added in CRI executor/docker image integration.
- Cleaned up job memory statistics collection, renamed some statistics.
Master Server
Features:
- Parallelize and offload virtual map reads.
- Emergency flag to disable attribute-based access control.
- Improved performance of transaction commit/abort.
- Enable snapshot loading by default.
Fixes:
- Fixes and optimizations for Sequoia chunk replica management.
- Fix multiple possible master crashes.
- Fixes for master update with read-only availability.
- Fixes for jammed incremental heartbeats and lost replica update on disabled locations.
- Fix per-account sensors on new account creation.
Misc
Features:
- Config exposure via orchid became optional.
- Support some c-ares options in YT config.
- Support IP addresses in RPC TLS certificate verification.
Fixes:
- Fix connection counter leak in http server.
- Track and limit memory used by queued RPC requests.
- Better memory tracking for RPC connection buffers.
- Fix address resolver configuration.
23.2.0
Release date: 2024-02-29
Scheduler
Many internal changes driven by developing new scheduling mechanics that separate jobs from resource allocations at exec nodes. These changes include modification of the protocol of interaction between schedulers, controller agents and exec nodes, and adding tons of new logic for introducing allocations in exec nodes, controller agents and schedulers.
List of significant changes and fixes:
- Optimize performance of scheduler's Control and NodeShard threads.
- Optimize performance of the core scheduling algorithm by considering only a subset of operations in most node heartbeats.
- Optimize operation launch time overhead by not creating a debug transaction if neither stderr nor core table has been specified.
- Add priority scheduling for pools with resource guarantees.
- Consider disk usage in job preemption algorithm.
- Add operation module assignment preemption in GPU segments scheduling algorithm.
- Add fixes for GPU scheduling algorithms.
- Add node heartbeat throttling by scheduling complexity.
- Add concurrent schedule job exec duration throttling.
- Reuse job monitoring descriptors within a single operation.
- Support monitoring descriptors in map operations.
- Support filtering jobs with monitoring descriptors in `list_jobs` command.
- Fix displaying jobs which disappear due to a node failure as running and "stale" in UI.
- Improve ephemeral subpools configuration.
- Hide user tokens in scheduler and job proxy logs.
- Support configurable max capacity for pipes between job proxy and user job.
Queue Agent
Aside from small improvements, the most significant features include the ability to configure periodic exports of partitioned data from queues into static tables and the support for using replicated and chaos dynamic tables as queues and consumers.
Features:
- Support chaos replicated tables as queues and consumers.
- Support snapshot exports from queues into static tables.
- Support queues and consumers that are symbolic links for other queues and consumers.
- Support trimming of rows in queues by lifetime duration.
- Support registering and unregistering a consumer to a queue from a different cluster.
Fixes:
- Trim queues by their `object_id`, not by `path`.
- Fix metrics of read rows data weight via consumer.
- Fix handling frozen tablets in queue.
Proxy
Features:
- Add ability to call `pull_consumer` without specifying `offset`; it will be taken from the `consumer` table.
- Add `advance_consumer` handler for queues.
- Early implementation of `arrow` format to read/write static tables.
- Support type conversions for inner fields in complex types.
- Add new per user memory usage monitoring sensors in RPC proxies.
- Use ACO for RPC proxies permission management.
- Introduce TCP Proxies for SPYT.
- Support of OAuth authorization.
Fixes:
- Fix returning requested system columns in `web_json` format.
Dynamic Tables
Features:
- DynTables Query language improvements:
- New range inferrer.
- Add various SQL operators (<>, string length, ||, yson_length, argmin, argmax, coalesce).
- Add backups for tables with hunks.
- New fair share threadpool for select operator and network.
- Add partial key filtering for range selects.
- Add overload controller.
- Distribute load among rpc proxies more evenly.
- Add per-table size metrics.
- Store heavy chunk meta in blocks.
MapReduce
Features:
- RemoteCopy now supports cypress file objects, in addition to tables.
- Add support for per job experiments.
- Early implementation of CRI (container runtime interface) job environment & support for external docker images.
- New live preview for MapReduce output tables.
- Add support for arrow as an input format for MapReduce.
- Support GPU resource in exec-nodes and schedulers.
Enhancements:
- Improve memory tracking in data nodes (master jobs, blob write sessions, p2p tracking).
- Rework memory accounting in controller agents.
Master Server
Noticeable/Potentially Breaking Changes:
- Read requests are now processed in a multithreaded manner by default.
- Read-only mode now persists between restarts. `yt-admin master-exit-read-only` command should be used to leave it.
- `list_node` type has been deprecated. Users are advised to use `map_node`s or `document`s instead.
- `ChunkService::ExecuteBatch` RPC call has been deprecated and split into individual calls. Batching chunk service has been superseded by proxying chunk service.
- New transaction object types: `system_transaction`, `nested_system_transaction`. Support for transaction actions in regular Cypress transactions is now deprecated.
- Version 2 of the Hydra library is now enabled by default. Version 1 is officially deprecated.
Features:
- It is now possible to update master-servers with no read downtime via leaving non-voting peers to serve read requests while the main quorum is under maintenance.
- A data node can now be marked as pending restart, hinting the replicator to ignore its absence for a set amount of time to avoid needless replication bursts.
- The `add_maintenance` command now supports HTTP- and RPC-proxies.
- Attribute-based access control: a user may now be annotated with a set of tags, while an access-control entry (ACE) may be annotated with a tag filter.
Optimizations & Fixes:
- Response keeper is now persistent. No warm-up period is required before a peer may begin leading.
- Chunk metadata now includes schemas. This opens up a way to a number of significant optimizations.
- Data node heartbeat size has been reduced.
- Chunks and chunk lists are now loaded from snapshot in parallel.
- Fixed excessive memory consumption in multicell configurations.
- Accounting code has been improved to properly handle unlimited quotas and avoid negative master memory usage.
Additionally, advancements have been made in the Sequoia project dedicated to scaling master server by offloading certain parts of its state to dynamic tables. (This is far from being production-ready yet.)
Misc
Enhancements:
- Add rpc server config dynamization.
- Add support for peer alternative hostname for Bus TLS.
- Properly handle Content-Encoding in monitoring web-server.
- Bring back "host" attribute to errors.
- Add support for --version option in ytserver binaries.
- Add additional metainformation in yson/json server log format (fiberId, traceId, sourceFile).
Query Tracker
Is released as a docker image.
Releases:
0.0.8
Release date: 2024-08-26
- Optimized Query Tracker API performance by adding system tables indexes. Issue: #653
- Added support of SystemPython udfs in YQL queries. Issue: #265
- Fixed broken logs compression in YQL agent. Issue: #623
- Optimized simultaneous YQL queries performance
- Fixed memory leak in YQL Agent
- Important fix. Fixed YQL queries results corruption in DQ. Issue: #707
- Added DQ support in dual stack networks. Issue: #744
0.0.7
Release date: 2024-08-01
- Important fix. Fixed YQL queries results corruption. Issue: https://github.com/ytsaurus/ytsaurus/issues/707
- Fixed YQL DQ launching
- Fixed a bug that caused UTF-8 errors in yql-agent logs
- Fixed multiple deadlocks in yql-agent
- Added support for SPYT discovery groups
- Added support for SPYT queries parameters
- Added everyone-share ACO which can be used to share queries by link.
- Added support of multiple ACOs per query, feature will be available in fresh UI, SDK releases
- Changed interaction between Query Tracker and Proxies
NB! This release is only compatible with proxy version 23.2.1, operator version 0.10.0 and later
https://github.com/ytsaurus/ytsaurus/releases/tag/docker%2Fytsaurus%2F23.2.1
https://github.com/ytsaurus/ytsaurus-k8s-operator/releases/tag/release%2F0.10.0
0.0.6
Release date: 2024-04-11
- Fixed authorization in complex cluster-free YQL queries
- Fixed a bug that caused queries with large queries to never complete
- Fixed a bug that made SQL injection possible in query tracker
- Reduced the size of query_tracker docker images
Related issues:
If starting a query fails with the error
Access control object "nobody" does not exist
run the following commands as an administrator:
yt create access_control_object_namespace --attr '{name=queries}'
yt create access_control_object --attr '{namespace=queries;name=nobody}'
0.0.5
Release date: 2024-03-19
- Added access control to queries
- Added support for the in‑memory DQ engine that accelerates small YQL queries
- Added execution mode setting to query tracker. This allows running queries in validate and explain modes
- Fixed a bug that caused queries to be lost in query_tracker
- Fixed a bug related to yson parsing in YQL queries
- Reduced the load on the state dyntables by QT
- Improved authentication in YQL queries.
- Added authentication in SPYT queries
- Added reuse of spyt sessions. Speeds up the sequential launch of SPYT queries from a single user
- Changed the build type of QT images from cmake to ya make
NB:
- Compatible only with operator version 0.6.0 and later
- Compatible only with proxies version 23.2 and later
- Before updating, please read the documentation section containing information about the new query access control.
New related issues:
If starting a query fails with the error
Access control object "nobody" does not exist
run the following commands as an administrator:
yt create access_control_object_namespace --attr '{name=queries}'
yt create access_control_object --attr '{namespace=queries;name=nobody}'
0.0.4
Release date: 2023-12-03
- Applied YQL defaults from the documentation
- Fixed a bug in YQL queries that don't use YT tables
- Fixed a bug in YQL queries that use aggregate functions
- Supported common UDF functions in YQL
NB: This release is compatible only with the operator 0.5.0 and newer versions.
https://github.com/ytsaurus/yt-k8s-operator/releases/tag/release%2F0.5.0
0.0.3
Release date: 2023-11-14
- Fixed a bug that caused the user transaction to expire before the completion of the yql query on IPv4 only networks.
- System query_tracker tables have been moved to sys bundle
0.0.1
Release date: 2023-10-19
- Added authentication, now all requests are run on behalf of the user that initiated them.
- Added support for v3 types in YQL queries.
- Added the ability to set the default cluster to execute YQL queries on.
- Changed the format of presenting YQL query errors.
- Fixed a bug that caused errors during the execution of queries that did not return any result.
- Fixed a bug that caused errors during the execution of queries that extracted data from dynamic tables.
- Fixed a bug that caused memory usage errors. YqlAgent no longer crashes for no reason under the load.
Strawberry
Is released as a docker image.
Releases:
v0.0.12
Release date: 2024-06-21
CHYT:
- Make `enable_geodata` default value configurable and set to `false` by default (PR: #667). Thanks @thenno for the PR!
- Configure system log tables exporter during the clique start
Livy:
- Add SPYT Livy support to the controller
CHYT
Is released as a docker image.
Releases:
2.16.0
Release date: 2024-11-06
- Support ClickHouse query cache (may be configured via `clickhouse_config`)
- Read in order optimization (PR #757)
- New PREWHERE algorithm on a data conversion level, turned on by default
- Convert `bool` data type to `Bool` instead of `YtBoolean`. `YtBoolean` type is deprecated
- Convert `dict` data type to `Map` instead of `Array(Tuple(Key, Value))`
- Convert `timestamp` data type to `DateTime64` instead of `UInt64`
- Support reading and writing `date32`, `datetime64`, `timestamp64`, `interval64` data types
- Support reading `json` data type as `String`
- Support JSON_* functions from ClickHouse
- The ability to specify a cypress directory as a database
- Support exporting system log tables to cypress (query_log, metric_log, etc)
Note: `date32`, `datetime64`, `timestamp64` and `interval64` were introduced in YTsaurus 24.1. If the YTsaurus cluster version is older, trying to store these data types in a table will lead to a `not a valid type` error.
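The type changes in this release can be summarized as a small lookup table. The sketch below only restates the notes above in code form; `CHYT_2_16_TYPE_MAPPING`, `REQUIRES_YT_24_1` and `can_store` are hypothetical names, not part of CHYT.

```python
# YTsaurus type -> ClickHouse type, as listed in the 2.16.0 notes.
# Precision and timezone parameters of DateTime64 are omitted in this sketch.
CHYT_2_16_TYPE_MAPPING = {
    "bool": "Bool",             # previously YtBoolean (now deprecated)
    "dict": "Map",              # previously Array(Tuple(Key, Value))
    "timestamp": "DateTime64",  # previously UInt64
    "json": "String",           # the notes mention reading only
}

# Types that need a YTsaurus 24.1+ cluster when stored in a table.
REQUIRES_YT_24_1 = {"date32", "datetime64", "timestamp64", "interval64"}


def can_store(yt_type: str, cluster_version: tuple) -> bool:
    """Return False for the newly introduced types on clusters older than 24.1."""
    return yt_type not in REQUIRES_YT_24_1 or cluster_version >= (24, 1)


print(can_store("datetime64", (23, 2)))
print(can_store("datetime64", (24, 1)))
```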
2.14.0
Release date: 2024-02-15
- Support SQL UDFs
- Support reading dynamic and static tables via concat-functions
2.13.0
Release date: 2024-01-19
- Update ClickHouse code version to the latest LTS release (22.8 -> 23.8)
- Support for reading and writing ordered dynamic tables
- Move dumping query registry debug information to a separate thread
- Configure temporary data storage
SPYT
Is released as a docker image.
Releases:
2.5.0
Release date: 2024-12-25
Major release that enables support for Spark 3.4.x and 3.5.x.
- Compile-time Spark version is changed from 3.2.2 to 3.5.4;
- SPYT compile-time Spark version will be the latest available supported version since this release;
- Backward compatibility is still preserved down to Spark 3.2.2;
- Unit tests can be run over different Spark version than used at compile time via `-DtestSparkVersion=3.x.x` sbt flag
2.4.4
Release date: 2024-12-20
Maintenance release with bug fixes:
- Providing network project for Livy via command line argument
2.4.3
Release date: 2024-12-16
Maintenance release with bug fixes:
- Specifying network project for direct submit and setting it from Livy
- Fix read and write for structs with float value using Dataset API
2.4.2
Release date: 2024-12-06
Maintenance release with bug fixes:
- Autocast DatetimeType to TimestampType in spark udf
- Add parsing spark.executorEnv and spark.ytsaurus.driverEnv and set SPARK_LOCAL_DIRS
- Fix worker_disk_limit and worker_disk_account parameters for standalone cluster
- Using compatible SPYT versions instead of latest for direct submit
- Separate proxy role into client (spark.hadoop.yt.proxyRole) and cluster (spark.hadoop.yt.clusterProxyRole)
- Add flag spark.ytsaurus.driver.watch for watching driver operation
- Fix reading Livy logs
2.4.1
Release date: 2024-11-12
Maintenance release with bug fixes:
- Fix creating tables via Spark SQL without explicitly specifying ytTable schema
- Fix serializing and deserializing nested time types
- Fix casting NULL in nested data structures
2.4.0
Release date: 2024-10-31
- Support for running local files and their dependencies in direct submit mode by uploading it to YTsaurus cache
- Support for submitting compiled python binaries as spark applications via direct submit
- Dataframe write schema hints
- Bug fixes:
  - Writing to external S3 from YTsaurus
  - Reading float values from nested structures
  - Columnar format reading for Spark 3.3.x
  - Reading arbitrary files from Cypress when using Spark 3.3.x
2.3.0
Release date: 2024-09-11
The major feature of SPYT 2.3.0 is support for Spark 3.3.x. Other notable features are:
- Support for extended Datetime types such as Date32, Datetime32, Timestamp64, Interval64;
- Support for table properties in Spark SQL;
- Support for writing using Hive partitioning schema;
- Support for specifying random port for Shuffle service in inner standalone cluster;
- Fix for runtime statistics;
- Bug fixes for user-provided schema and for dataframes persisting.
2.2.0
Release date: 2024-08-14
- Support for reading from multiple YTsaurus clusters
- Supplying annotations for YTsaurus operations via conf parameters
- Support for specifying custom schema on read
- Support for --archives parameter in spark-submit
- Fix for int8 and int16 as nested fields
- Transactional read fix
- Other minor fixes
2.1.0
Release date: 2024-06-19
- Support for running applications using GPU
- Support for Spark versions 3.2.2-3.2.4
- History server support for direct submit scenarios
- Support for https and TCP proxy in direct submit scenarios
- Other minor fixes and improvements
2.0.0
Release date: 2024-05-29
SPYT 2.0.0 is the first release under the new release scheme and in the separate ytsaurus-spyt repository. The main feature of this release is that we have finally switched from the Apache Spark fork used in previous releases to the original Apache Spark distribution. The 2.0.0 SPYT release still uses Apache Spark 3.2.2, but we plan to support all Apache Spark 3.x.x releases in the near future!
Other notable changes are:
- Support for direct submit using Livy via Query Tracker;
- Split data-source module into data-source-base that uses standard Spark types for all YTsaurus types, and data-source-extended for our implementation of custom YTsaurus types that don't have direct matches in Spark type system;
- Support for direct submit from Jupyter notebooks;
- Custom UDT for YTsaurus datetime type.
Kubernetes operator
Is released as helm charts on Github Packages.
Releases:
0.18.1
Release date: 2024-12-13
Minor
- more validation by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/393
Bugfix
- Fix updates for named cluster components by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/401
Full Changelog: https://github.com/ytsaurus/ytsaurus-k8s-operator/compare/release/0.18.0...release/0.18.1
0.18.0
Release date: 2024-11-26
Warning
This release has a known bug that breaks updates of YTsaurus components with non-empty names (names can be set for data/tablet/exec nodes) and roles (can be set for proxies).
The bug was fixed in 0.18.1.
Features
- Implemented RemoteTabletNodes api by @qurname2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/372
Minor
- Update sample config for cluster with TLS by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/369
- Remove DataNodes from StatelesOnly update by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/371
- Added namespacedScope value to the helm chart by @qurname2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/376
- Upgrade crd-ref-docs by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/379
- Add observed generation for remote nodes by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/382
- Support different controller families in strawberry configuration by @dmi-feo in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/355
- kata-compat: mount TLS-related files to a separate directory by @kruftik in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/388
- Support OAuth login transformations by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/397
- Add diff for static config update case by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/398
Bugfix
- Fix observed generation by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/373
- Fix YQL agent dynamic config creation by @savnadya in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/377
- Fix logging in chyt_controller by @dmi-feo in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/370
- Fix strawberry container name by @dmi-feo in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/375
- Use expected instance count as default for minimal ready count by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/395
New Contributors
- @dmi-feo made their first contribution in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/370
Full Changelog: https://github.com/ytsaurus/ytsaurus-k8s-operator/compare/release/0.17.0...release/0.18.0
0.17.0
Release date: 2024-10-11
Minor
- Separate CHYT init options into makeDefault and createPublicClique by @achulkov2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/347
Bugfix
- Fix queue agent init script usage for 24.* by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/356
0.16.2
Release date: 2024-09-13
Bugfix
- Fix strawberry controller image for 2nd job by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/345
0.16.1
Release date: 2024-09-13
Warning
This release has a bug if Strawberry components are enabled.
Use 0.16.2 instead.
Bugfix
- Revert job image override for UI/strawberry by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/344 — the bug was introduced in 0.16.0
0.16.0
Release date: 2024-09-12
Warning
This release has a bug for a configuration where UI or Strawberry components are enabled and some of their images were overridden (k8s init jobs will fail for such components).
Use 0.16.2 instead.
Minor
- Add observedGeneration field to the YtsaurusStatus by @wilwell in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/333
- Set statistics for job low cpu usage alerts by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/335
- Add nodeSelector for UI and Strawberry by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/338
- Init job creates from InstanceSpec image if specified by @wilwell in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/336
- Add tolerations and nodeselectors to jobs by @l0kix2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/342
New Contributors
- @wilwell made their first contribution in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/333
0.15.0
Release date: 2024-09-04
Backward incompatible changes
- Component pod labels were refactored in #326 and changes are:
  - `app.kubernetes.io/instance` was removed
  - `app.kubernetes.io/name` was Ytsaurus before, now it contains component type
  - `app.kubernetes.io/managed-by` is `"ytsaurus-k8s-operator"` instead of `"Ytsaurus-k8s-operator"`
- Deprecated `chyt` field in the main YTsaurus spec was removed, use `strawberry` field with the same schema instead.
Minor
- Added tolerations for Strawberry by @qurname2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/328
- Refactor label names for components by @achulkov2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/326
Experimental
- RemoteDataNodes by @qurname2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/330
0.14.0
Release date: 2024-08-22
Backward incompatible changes
Before this release `StrawberryController` was unconditionally configured with `{address_resolver={enable_ipv4=%true;enable_ipv6=%true}}` in its static config. From now on it respects common `useIpv6` and `useIpv4` fields, which can be set in the YtsaurusSpec.
If for some reason it is required to have configuration different from
useIpv6: true
useIpv4: true
for the main Ytsaurus spec and at the same time `enable_ipv4=%true;enable_ipv6=%true` for the `StrawberryController`, it is possible to achieve that by using `configOverrides` ConfigMap with
data:
  strawberry-controller.yson: |
    {
      controllers = {
        chyt = {
          address_resolver = {
            enable_ipv4 = %true;
            enable_ipv6 = %true;
          };
        };
      };
    }
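For orientation, the override above could be wrapped into a complete ConfigMap manifest roughly as in the following sketch. Only the `strawberry-controller.yson` key and its YSON body come from the release note; the ConfigMap name, the namespace, and the helper `build_config_overrides` are hypothetical, and the Ytsaurus spec still has to point at this ConfigMap through `configOverrides`.

```python
import json

# YSON override body taken from the release note.
STRAWBERRY_OVERRIDE = """\
{
  controllers = {
    chyt = {
      address_resolver = {
        enable_ipv4 = %true;
        enable_ipv6 = %true;
      };
    };
  };
}
"""


def build_config_overrides(name: str, namespace: str) -> dict:
    """Return a Kubernetes ConfigMap manifest as a plain dict.

    `name` and `namespace` are illustrative placeholders (assumptions),
    not values prescribed by the operator.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"strawberry-controller.yson": STRAWBERRY_OVERRIDE},
    }


manifest = build_config_overrides("ytsaurus-config-overrides", "default")
# kubectl accepts JSON manifests as well as YAML ones.
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code keeps the YSON body in one place when several clusters need the same override.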
Minor
- Add no more than one ytsaurus spec per namespace validation by @qurname2 in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/305
- Add strategy, nodeSelector, affinity, tolerations by @sgburtsev in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/321
- Add forceTcp and keepSocket options by @leo-astorsky in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/324
Bugfixes
- Fix empty volumes array in sample config by @koct9i in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/318
New Contributors
- @leo-astorsky made their first contribution in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/324
0.13.1
Release date: 2024-07-30
Bugfixes
- Revert deprecation of useInsecureCookies in #310 by @sgburtsev in https://github.com/ytsaurus/ytsaurus-k8s-operator/pull/317
The field `useInsecureCookies` was deprecated in the previous release in a backward-incompatible way; this release fixes that. It is now possible to configure the secureness of UI cookies (via the `useInsecureCookies` field) and the secureness of UI and HTTP proxy interaction (via the `secure` field) independently.
0.13.0
Release date: 2024-07-23
Features
- Add per-component terminationGracePeriodSeconds by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/304
- Added externalProxy parameter for UI by @sgburtsev in https://github.com/ytsaurus/yt-k8s-operator/pull/308
- Size as Quantity in LogRotationPolicy by @sgburtsev in https://github.com/ytsaurus/yt-k8s-operator/pull/309
- Use `secure` instead of `useInsecureCookies`, pass caBundle to UI by @sgburtsev in https://github.com/ytsaurus/yt-k8s-operator/pull/310
Minor
- Add all YTsaurus CRD into category "ytsaurus-all" "yt-all" by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/311
Bugfixes
- Operator should detect configOverrides updates by @l0kix2 in https://github.com/ytsaurus/yt-k8s-operator/pull/314
0.12.0
Release date: 2024-06-28
Features
- More options for store locations. by @sgburtsev in https://github.com/ytsaurus/yt-k8s-operator/pull/294
  - data nodes upper limit for `low_watermark` increased from 5 to 25 GiB;
  - data nodes' `trash_cleanup_watermark` will be set equal to the `lowWatermark` value from spec;
  - `max_trash_ttl` can be configured in spec.
- Add support for directDownload to UI Spec by @kozubaeff in https://github.com/ytsaurus/yt-k8s-operator/pull/257
  `directDownload` for UI can be configured in the spec now. If omitted or set to `true`, UI will have current default behaviour (use proxies for download); if set to `false`, UI backend will be used for downloading.
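The three cases of `directDownload` can be summarised in a small sketch. This is not operator or UI code: the function `download_route` and the returned labels are hypothetical, and `None` stands for the field being omitted from the spec.

```python
from typing import Optional


def download_route(direct_download: Optional[bool] = None) -> str:
    """Describe which path the UI uses for downloads.

    None models the field being omitted from the spec (assumption of
    this sketch); the returned labels are illustrative only.
    """
    # Omitted or true: default behaviour, downloads go through proxies.
    if direct_download is None or direct_download:
        return "proxies"
    # Explicit false: the UI backend serves the download.
    return "ui-backend"


for value in (None, True, False):
    print(value, "->", download_route(value))
```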
New Contributors
- @sgburtsev made their first contribution in https://github.com/ytsaurus/yt-k8s-operator/pull/294
0.11.0
Release date: 2024-06-27
Features
- SetHostnameAsFQDN option is added to all components. Default is true by @qurname2 in https://github.com/ytsaurus/yt-k8s-operator/pull/302
- Add per-component option hostNetwork by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/287
Minor
- Add option for per location disk space quota by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/279
- Add into exec node pods environment variables for CRI tools by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/283
- Add per-instance-group podLabels and podAnnotations by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/289
- Sort status conditions for better readability by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/290
- Add init containers for exec node by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/288
- Add loglevel "warning" by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/292
- Remove mutating/defaulter webhooks by @koct9i in https://github.com/ytsaurus/yt-k8s-operator/pull/296
Bugfixes
- fix exec node resource calculation on non-isolated CRI-powered job environment by @kruftik in https://github.com/ytsaurus/yt-k8s-operator/pull/277
0.10.0
Release date: 2024-06-07
Features
Minor
- Add everyone-share QT ACO by @Krisha11 in #272
- Add channel in qt config by @Krisha11 in #273
- Add option for per location disk space quota #279
Bugfixes
- Fix exec node resource calculation on non-isolated CRI-powered job environment #277
0.9.1
Release date: 2024-05-30
Features
Minor
- Add 'physical_host' to cypress_annotations for CMS and UI compatibility #252
- Added WATCH_NAMESPACE env and LeaderElectionNamespace #168
- Add configuration for solomon exporter: specify host and some instance tags #258
- Add sidecars support to primary masters containers #259
- Add option for containerd registry config path #264
Bugfixes
- Fix CRI job environment for remote exec nodes #261
0.9.0
Release date: 2024-04-23
Features
- Add experimental (behaviour may change) UpdateSelector field #211 to be able to update components separately
Minor
- Enable TmpFS when possible #235
- Disable disk quota for slot locations #236
- Forward docker image environment variables to user job #248
Bugfixes
- Fix flag doNotSetUserId #243
0.8.0
Release date: 2024-04-12
Features
Minor
- Increased default value for MaxSnapshotCountToKeep and MaxChangelogCountToKeep
- Tune default bundle replication factor #210
- Set EnableServiceLinks=false for all pods #218
Bugfixes
- Fix authentication configuration for RPC Proxy #207
- Job script updated on restart #224
- Use secure random and base64 for tokens #202
- Fix running jobs with custom docker_image when default job image is not set #217
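The token fix above (#202) combines a cryptographically secure random source with base64 encoding. The sketch below is not the operator's code; it only illustrates that general technique with the Python standard library. The function name `generate_token` and the 32-byte length are assumptions.

```python
import base64
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return a URL-safe base64 token built from OS-level secure randomness."""
    # secrets draws from a CSPRNG, unlike the general-purpose `random` module.
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).decode("ascii")


token = generate_token()
print(len(token))
```

Base64 keeps the random bytes printable, so the token can be stored in a file or a Kubernetes secret without escaping.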
0.7.0
Release date: 2024-04-04
Features
- Add Remote exec nodes support #75
- Add MasterCaches support #122
- Enable TLS certificate auto-update for http proxies #167
- CRI containerd job environment #105
Minor
- Support RuntimeClassName in InstanceSpec
- Configurable monitoring port #146
- Not triggering full update for data nodes update
- Add ALLOW_PASSWORD_AUTH to UI #162
- Readiness checks for strawberry & UI
- Medium is called domestic medium now #88
- Tune tablet changelog/snapshot initial replication factor according to data node count #185
- Generate markdown API docs
- Rename operations archive #116
- Configure cluster to use jupyt #149
- Fix QT ACOs creation on cluster update #176
- Set ACLs for QT ACOs, add everyone-use ACO #181
- Enable rpc proxy in job proxy #197
- Add yqla token file in container #140
Bugfixes
- Replace YQL Agent default monitoring port 10029 -> 10019
0.6.0
Release date: 2024-02-26
Features
- Added support for updating masters of 23.2 versions
- Added the ability to bind masters to the set of nodes by node hostnames.
- Added the ability to configure the number of stored snapshots and changelogs in master spec
- Added the ability for users to create access control objects
- Added support for volume mount with mountPropagation = Bidirectional mode in execNodes
- Added access control object namespace "queries" and object "nobody". They are necessary for query_tracker versions 0.0.5 and higher.
- Added support for the new Cliques CHYT UI.
- Added the creation of a group for admins (admins).
- Added readiness probes to component statefulset specs
Fixes
- Improved ACLs on master schemas
- Master and scheduler init jobs do not overwrite existing dynamic configs anymore.
Tests
- Added flow to run tests on Github resources
- Added e2e to check that updating from 23.1 to 23.2 works
- Added config generator tests for all components
- Added respect KIND_CLUSTER_NAME env variable in e2e tests
- Supported local k8s port forwarding in e2e
Backward Incompatible Changes
`exec_agent` was renamed to `exec_node` in exec node config. If your specs have `configOverrides`, please rename fields accordingly.
0.5.0
Release date: 2023-11-29
Features
- Added `minReadyInstanceCount` into Ytsaurus components, which allows the operator not to wait until all pods are ready.
- Support queue agent.
- Added postprocessing of generated static configs.
- Introduced separate UseIPv4 option to allow dualstack configurations.
- Support masters in host network mode.
- Added spyt engine in query tracker by default.
- Enabled both ipv4 and ipv6 by default in chyt controllers.
- The default CHYT clique is created as tracked instead of untracked.
- Don't run full update check if full update is not enabled (`enable_full_update` flag in spec).
- Update cluster algorithm was improved. If a full update is needed for already running components and new components were added, the operator will run the new components first, and only then start the full update. Previously such reconfiguration was not supported.
- Added optional TLS support for native-rpc connections.
- Added possibility to configure job proxy loggers.
- Changed how node resource limits are calculated from `resourceLimits` and `resourceRequests`.
- Enabled debug logs of YTsaurus go client for controller pod.
- Supported dualstack clusters in YQL agent.
- Supported new config format of YQL agent.
- Supported `NodePort` specification for HTTP proxy (http, https), UI (http) and RPC proxy (rpc port). For TCP proxy NodePorts are used implicitly when NodePort service is chosen. Port range size and minPort are now customizable.
Fixes
- Fixed YQL agents on ipv6-only clusters.
- Fixed deadlock in case when UI deployment is manually deleted.
Tests
- e2e tests were fixed.
- Added e2e test for operator version compat.
0.4.1
Release date: 2023-10-03
Features
- Support per-instance-group config override
- Support TLS for RPC proxies
Bug fixes
- Fixed an error during creation of default `CHYT` clique (`ch_public`).
0.4.0
Release date: 2023-09-26
Features
- The operations archive will be updated when the scheduler image changes.
- Ability to specify different images for different components.
- Added support for cluster update without full downtime for stateless components.
- Added support for updating static component configs when necessary.
- Improved SPYT controller. Added initialization status (`ReleaseStatus`).
- Added CHYT controller and the ability to load several different versions on one YTsaurus cluster.
- Added the ability to specify the log format (`yson`, `json` or `plain_text`), as well as the ability to enable recording of structured logs.
- Added more diagnostics about deploying cluster components in the `Ytsaurus` status.
- Added the ability to disable the launch of a full update (`enableFullUpdate` field in `Ytsaurus` spec).
- The `chyt` spec field was renamed to `strawberry`. For backward compatibility, it remains in `crd`, but it is recommended to rename it.
- The size of `description` in `crd` is now limited to 80 characters, greatly reducing the size of `crd`.
- `Query Tracker` status tables are now automatically migrated when it is updated.
- Added the ability to set privileged mode for `exec node` containers.
- Added `TCP proxy`.
. - Added more spec validation: checks that the paths in the locations belong to one of the volumes, and also checks that for each specified component there are all the components necessary for its successful work.
- `strawberry controller` and `ui` can also be updated.
- Added the ability to deploy `http-proxy` with TLS.
with TLS. - Odin service address for the UI can be specified in spec.
- Added the ability to configure `tags` and `rack` for nodes.
- Supported OAuth service configuration in the spec.
- Added the ability to pass additional environment variables to the UI, as well as set the theme and environment (`testing`, `production`, etc.) for the UI.
- Data node location mediums are created automatically during the initial deployment of the cluster.
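One of the validations listed above checks that location paths belong to one of the volumes. The sketch below shows how such a containment check might look. It is not the operator's implementation: the function name, the flat list inputs, and the sample paths are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def locations_outside_volumes(
    location_paths: Iterable[str], volume_mount_paths: Iterable[str]
) -> List[str]:
    """Return the location paths that are not under any volume mount path."""
    mounts = [PurePosixPath(m) for m in volume_mount_paths]
    bad = []
    for raw in location_paths:
        path = PurePosixPath(raw)
        # A location is valid if it equals a mount path or lies below it.
        if not any(path == m or m in path.parents for m in mounts):
            bad.append(raw)
    return bad


print(locations_outside_volumes(
    ["/yt/master-data/snapshots", "/var/lib/other"],
    ["/yt/master-data"],
))
```

Comparing path components rather than string prefixes avoids treating `/yt/master-data2` as if it were inside `/yt/master-data`.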
0.3.1
Release date: 2023-08-14
Features
- Added the ability to configure automatic log rotation.
- `toleration` and `nodeSelector` can be specified in instance specs of components.
- Types of generated objects are specified in controller configuration, so the operator responds to modifications of generated objects by reconciling.
- Config maps store data in text form instead of binary, so that you can view the contents of configs through `kubectl describe configmap <configmap-name>`.
- Added calculation and setting of `disk_usage_watermark` and `disk_quota` for exec node.
- Added a SPYT controller and the ability to load everything necessary for SPYT into Cypress using a separate resource, which allows you to have several versions of SPYT on one cluster.
Bug fixes
- Fixed an error in the naming of the `medium_name` field in static configs.
YTsaurus Python YSON
Available as a package in PyPI.
Releases:
0.4.9
Release date: 2024-08-07
Features:
- Support ORC format
- Access thread local variables via noinline functions
- Support Python 3.13 (avoid using deprecated PyImport_ImportModuleNoBlock)
0.4.8
Release date: 2024-04-24
- Add table creation in upload parquet
- Reduce bindings .so size
0.4.7
Release date: 2024-03-09
- Add implementation of `upload_parquet`
- Fix invalid memory access in YsonStringProxy
SDK
Python
Available as a package in PyPI.
Releases:
0.13.21
Release date: 2024-12-26
Features:
- Introduce YAML format support
- Introduce the higher level primitives for tracking queries
- Add `network_project` option setter for `UserJobSpecBuilder`
- Add parallel mode for ORC format
- Support `omit_inaccessible_columns` for read commands
- Support `preserve_acl` option in copy/move commands
- Rework authentication commands in CLI over getpass
- Dirtable upload improvements
- Add queue producer commands
- Improve SpecBuilder: add use_columnar_statistics, ordered, data_size_per_reduce_job
Fixes:
- Fix retries for parquet/orc upload commands
Cosmetics:
- Remove legacy constant from operation_commands.py
- Beautify imports: drop Python 2 support
- Wrap doc links into constants for `--help` command
Many thanks to @zlobober for significant contribution!
0.13.19
Release date: 2024-10-15
Features:
- Add possibility to upload and dump tables in ORC format using CLI commands: `upload-orc` and `dump-orc`
- Support parallel mode for `dump-parquet` command
- Support nullable fields during parsing YT schema from parquet schema
- Support parallel mode for `read_table_structured` command
- Add cli params to docker respawn decorator (PR: #849). Thanks @thenno for the PR!
Fixes:
- Do not retry `LineTooLong` error
- Fix `read_query_result` always returning raw results (PR: #800). Thanks @zlobober for the PR!
- Fix cyclic references that were causing memory leaks
- Reduce default value of `write_parallel/concatenate_size` from 100 to 20
- Fix retries in `upload-parquet` command
0.13.18
Release date: 2024-07-26
Features:
- Use expanduser for `config["token_path"]`
- Support custom dill params
- Support Nullable patchable config element
- Add max_replication_factor in config
- Use strawberry ctl address from cypress client_config
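The `expanduser` change in the list above means a `~` in `config["token_path"]` is resolved to the user's home directory. The sketch below shows the underlying standard-library call only; it is not the wrapper's code, and the helper `resolve_token_path` and the sample path are hypothetical.

```python
import os


def resolve_token_path(config: dict) -> str:
    """Expand a leading '~' in config["token_path"] to the home directory."""
    return os.path.expanduser(config["token_path"])


# A path starting with '~' is expanded; an absolute path is left unchanged.
print(resolve_token_path({"token_path": "~/.yt/token"}))
print(resolve_token_path({"token_path": "/etc/yt/token"}))
```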
Fixes:
- Fixes of E721: do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`
- Fix bug in YT python wrapper: stop transaction pinger before exiting transaction
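The E721 rule mentioned above distinguishes exact type checks from instance checks. A minimal illustration in plain Python, unrelated to the SDK's own code (the function `describe` is hypothetical):

```python
def describe(value):
    """Classify a value using the comparisons recommended by E721."""
    # Exact type check: identity comparison, subclasses do not match.
    if type(value) is bool:
        return "exactly bool"
    # Instance check: subclasses match too (bool is a subclass of int).
    if isinstance(value, int):
        return "int or subclass"
    return "other"


print(describe(True))   # exactly bool
print(describe(7))      # int or subclass
print(describe("x"))    # other
```

Writing `type(value) == bool` would behave the same here, but linters flag it because `is` states the intent of an exact check more clearly.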
Thanks to multiple outside contributors for the active participation in Python SDK development.
0.13.17
Release date: 2024-06-26
Features:
- Support profiles in configuration file
- Add versioned select
- Add enum.StrEnum and enum.IntEnum support for yt_dataclasses
Fixes:
- Fix test_operation_stderr_output in py.test environment
Thanks to @thenno for considerable contribution!
0.13.16
Release date: 2024-06-19
Features:
- Allow to specify prerequisite transaction ids in client.Transaction context manager (PR: #638). Thanks @chegoryu for the PR!
- Add client and chunk_count parameters to dirtable_commands
- Add alter_query command for Query Tracker
- Add dump_job_proxy_log command (PR: #594). Thanks @tagirhamitov for the PR!
Fixes:
- Fix return result of lock command in case of batch client
- Fix jupyter notebooks for operations in separate cells (PR: #654). Thanks @dmi-feo for the PR!
0.13.14
Release date: 2024-03-09
Features:
- Added an option for skipping rows merge in select
- Support composite types in QL
- Add `preserve_account` option to table backup commands
- Expand the list of dynamic table retriable errors
- Enhance table creation with specified append attribute
- Various improvements of maintenance API
- Support `upload_parquet` command
Fixes:
- Support SortColumn serialization
- Fix file descriptors leak in config parsing
- Fix output stream validation for TypedJobs
0.13.12
Release date: 2023-12-14
Features:
- Support `double` and `float` type in `@yt_dataclass`.
- Added `get_query_result` command.
Fixes:
- Fixed setting config from environment variables.
- Clearer error message if node type is not equal to table in operation spec.
Java
Released as packages in Maven.
Releases:
1.2.7
Release date: 2024-11-25
- Add the `RequestMiddleware` interface to subscribe on request start.
- Support `ListQueueConsumerRegistrations`.
- Add monitoring callback interface for `MultiYTsaurusClient`.
- Refactor `MultiYTsaurusClient`.
- Support `YT_BASE_LAYER`.
- Fix resource leak in the `ClientPool`.
1.2.6
Release date: 2024-09-05
- `YsonJsonConverter` has been released.
- Support for `Date32`, `Datetime64`, `Timestamp64`, `Interval64` types.
- Fixed a bug that caused `writeTable` to fail if the table schema did not match the user-specified schema.
1.2.5
Release date: 2024-08-20
- Added MultiYTsaurusClient.
- Support for MultiLookupRows request.
- Fixed a bug that caused an infinite wait for proxy discovery when the connection failed.
- Fixed a bug that caused the operation output table to be created without a user-specified transaction.
1.2.4
Release date: 2024-06-18
- Support for JPA `@Embedded`/`@Embeddable` annotations.
- Support for URL schema to detect the usage of TLS.
- Implemented YT Query Tracker API methods.
1.2.3
Release date: 2024-05-27
- Introduced `DiscoveryClient`.
- The following types are supported in `@Entity` fields (use `@Column(columnDefinition="...")` to specify type):
  - enum -> utf8/string;
  - String -> string;
  - Instant -> int64;
  - YsonSerializable -> yson.
- Fixed a bug due to which `YTsaurusClient` did not terminate.
1.2.2
Release date: 2024-04-11
- Supported placeholder values in SelectRowsRequest.
- Supported specifying the proxy network name.
- Supported set(Input/Output)Format in CommandSpec.
- Fixed a bug that caused NoSuchElementException in SyncTableReader.
- Fixed a bug that caused the table to be recreated when writing without "append".
1.2.1
Release date: 2024-01-29
- Supported serializable mapper/reducer.
- Added completeOperation method.
- Implemented three YT Queues API methods: registerQueueConsumer, advanceConsumer, pullConsumer.
- Added AggregateStatistics to MultiTablePartition.
- Some minor bug fixes.
1.2.0
Release date: 2023-09-18
- Fixed a bug that prevented `SyncTableReaderImpl` internal threads from terminating.
- In the `WriteTable` request, the `needRetries` option is set to `true` by default.
- The `WriteTable` request has `builder(Class)` now; using it, you can omit the `SerializationContext` if the class is marked with the `@Entity` annotation, implements the `com.google.protobuf.Message` or `tech.ytsaurus.ysontree.YTreeMapNode` interface (serialization formats will be `skiff`, `protobuf` or `wire` respectively).
- The `setPath(String)` setters in the `WriteTable` and `ReadTable` builders are `@Deprecated`.
- The interfaces of the `GetNode` and `ListNode` request builders have been changed: `List<String>` is passed to the `setAttributes` method instead of `ColumnFilter`, the `null` argument represents `universal filter` (all attributes should be returned).
- Added the `useTLS` flag to `YTsaurusClientConfig`; if set to `true`, `https` will be used for `discover_proxies`.
1.1.1
Release date: 2023-07-26
- Fixed validation of `@Entity` schemas: reading a subset of table columns, a superset of columns (if the types of the extra columns are `nullable`), writing a subset of columns (if the types of the missing columns are `nullable`).
- The following types are supported in `@Entity` fields:
  - `utf8` -> `String`;
  - `string` -> `byte[]`;
  - `uuid` -> `tech.ytsaurus.core.GUID`;
  - `timestamp` -> `java.time.Instant`.
- If an operation started by `SyncYTsaurusClient` fails, an exception will be thrown.
- Added the `ignoreBalancers` flag to `YTsaurusClientConfig`, which allows ignoring balancer addresses and finding only rpc proxy addresses.