Versioning rules
SPYT includes two components:
- A cluster with Spark pre-installed.
- A client where the `ytsaurus-spyt` package is installed via `pip`.
SPYT is built from the following artifacts:
- A `.tar` archive with patched Spark that is deployed to YTsaurus and read at cluster startup.
- The `ytsaurus-pyspark` pip package, which contains the same patched Spark files as the `.tar` archive above.
- The `ytsaurus-spyt` pip package, which in turn depends on `ytsaurus-pyspark`. It contains the assembly of all SPYT modules and their dependencies that are not included in the Spark distribution. The package is deployed to the PyPI repository and to client hosts. It also includes CLI utilities for launching an inner Spark standalone cluster and for submitting applications to it.
When a new version of `ytsaurus-spyt` is installed, `ytsaurus-pyspark` is updated automatically.
In most cases, updating `ytsaurus-spyt` is sufficient. However, some new functionality also requires a cluster update. The update procedure and its scope depend on the specific version.
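As a minimal sketch, a client-only update is a plain `pip` upgrade (the package name comes from this section; the exact invocation may vary with your environment):

```bash
# Upgrade the SPYT client; ytsaurus-pyspark is pulled in automatically as a dependency.
pip install --upgrade ytsaurus-spyt

# To stay compatible with a specific cluster, pin an explicit package version instead
# of upgrading to the latest one (choose the version matching your cluster).
```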
Update procedure
- The cluster and client version numbers consist of three parts. An update of the last (minor) part preserves backward compatibility, and the change affects only one component.
- When the middle (major) part of the version is updated, backward compatibility with clusters of previous versions may not be preserved, so it is recommended to use a client with the same major version as the cluster.
- The cluster version can be chosen at launch via the `--spyt-version` parameter of `spark-launch-yt` (see the command-line sketch after this list). If the version is not specified, the latest release version is used.
- The client version in client mode (Jupyter notebooks) is the same as the installed `ytsaurus-spyt` package version.
- The client version in cluster mode can be specified using the `--spyt-version` parameter of the `spark-submit-yt` command. If the version is not specified, the latest version compatible with the cluster is used.
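A sketch of how these versions might be pinned on the command line. Only `--spyt-version` is taken from this section; the other flags (`--proxy`, `--discovery-path`, `--deploy-mode`), the paths, and the version numbers are illustrative assumptions and may differ in your setup:

```bash
# Launch a standalone SPYT cluster with an explicit cluster version.
# --proxy, --discovery-path, and all values below are assumed for illustration.
spark-launch-yt \
  --proxy my_cluster \
  --discovery-path //home/me/spark-discovery \
  --spyt-version 2.4.0

# Submit an application in cluster mode, pinning the client version.
# The application path is illustrative.
spark-submit-yt \
  --proxy my_cluster \
  --discovery-path //home/me/spark-discovery \
  --deploy-mode cluster \
  --spyt-version 2.4.0 \
  yt:///home/me/my_app.py
```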