Versioning rules
SPYT includes two components:
- A cluster with Spark pre-installed.
- A client where the
ytsaurus-spyt
package is installed viapip
.
SPYT is built from the following artifacts:
- A
.tar
archive with patched Spark that is deployed to YTsaurus and read at cluster startup; ytsaurus-pyspark
pip package with patched Spark files as in the.tar
above;ytsaurus-spyt
pip package which in turn depends onytsaurus-pyspark
package. It contains the assembly of all SPYT modules and their dependencies that are not included in Spark distributive. The package is deployed to pypi repository and client hosts. It also contains CLI utilities for launching an inner Spark standalone cluster and for submitting applications to it.
When a new version of ytsaurus-spyt
is installed ytsaurus-pyspark
updates automatically.
In most cases, a ytsaurus-spyt
update is sufficient. However, sometimes, to make new functionality work, you will have to update your cluster. You can find out the update procedure and scope based on the specific version.
Update procedure
- Cluster and client version number consists of three parts. Update of the last (minor) part means that backward compatibility is preserved and the change affects only one component.
- When a middle (major) part of the version is updated backward compatibility with clusters of previous versions may not be preserved so it is recommended to use a client with the same major version as cluster.
- Cluster version can be chosen on launching via
spark-launch-yt
in--spyt-version
parameter. When the version is not specified the last release version will be used. - Client version in
client mode
(Jupyter notebooks) is the same asytsaurus-spyt
package version. - Client version in
cluster mode
can be specified using--spyt-version
parameter forspark-submit-yt
command. When the version is not specified the last compatible with the cluster version will be used.