Quick start
Client install
Install the ytsaurus-spyt package:
pip install ytsaurus-spyt
Cluster start
- Select the account you will use to start your cluster. Any code that runs regularly on Spark must be uploaded to YTsaurus, and the account used to start the cluster must have sufficient privileges to read it.
- Create a directory for Spark housekeeping data, such as my_discovery_path. The account used to start the cluster must have write privileges to the directory. Users who run Spark jobs must have read access to the directory.
- Start your cluster:
spark-launch-yt \
  --proxy <cluster-name> \
  --pool my_pool \
  --discovery-path my_discovery_path \
  --worker-cores 16 \
  --worker-num 5 \
  --worker-memory 64G
Options:
  - spark-launch-yt: Start the Vanilla YTsaurus transaction from a client host.
  - --proxy: Cluster name.
  - --pool: YTsaurus computational pool.
  - --discovery-path: Spark housekeeping data directory.
  - --worker-cores: Number of cores per worker.
  - --worker-num: Number of workers.
  - --worker-memory: Amount of memory per worker.
  - --spark-cluster-version: Cluster version (optional).
- Start a test job on your cluster:
spark-submit-yt \
  --proxy <cluster-name> \
  --discovery-path my_discovery_path \
  --deploy-mode cluster \
  yt:///sys/spark/examples/smoke_test.py
Options:
  - spark-submit-yt: spark-submit wrapper that enables you to find out the Spark master address from the Vanilla transaction. The search uses proxy, id, and discovery-path as arguments.
  - --proxy: Cluster name.
  - --discovery-path: Spark housekeeping data directory.
  - --deploy-mode (cluster or client): Cluster startup mode.
  - --spyt-version: SPYT version (optional).
  - Address of the file with the code in YTsaurus.
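The worker options in the launch command multiply out to the total resources the workers request from the pool. The sketch below works through that arithmetic for the values used above (16 cores, 5 workers, 64G each). The helper names are hypothetical, the reading of the G suffix as a binary unit is an assumption, and any resources used by the Spark master, the History Server, or drivers are not included.

```python
import re

# Assumption: size suffixes are binary units (K, M, G, T).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_memory(value: str) -> int:
    """Convert a size such as '64G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT])", value.strip().upper())
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def total_worker_resources(worker_cores: int, worker_num: int, worker_memory: str):
    """Return (total cores, total memory in bytes) requested for all workers."""
    total_cores = worker_cores * worker_num
    total_memory = parse_memory(worker_memory) * worker_num
    return total_cores, total_memory


# Values from the spark-launch-yt example in this section.
cores, memory = total_worker_resources(16, 5, "64G")
print(f"{cores} cores, {memory // 1024**3} GiB")  # 80 cores, 320 GiB
```

A check like this can help confirm that the chosen pool has enough capacity before the cluster is started.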
Use
- spark-launch-yt
spark-launch-yt \
  --proxy <cluster-name> \
  --pool my_pool \
  --discovery-path my_discovery_path \
  --worker-cores 16 \
  --worker-num 5 \
  --worker-memory 64G \
  --spark-cluster-version 1.72.0
- spark-discovery-yt
Retrieve links to the master UI, the transaction, and the Spark History Server:
spark-discovery-yt \
  --proxy <cluster-name> \
  --discovery-path my_discovery_path
- spark-submit-yt
spark-submit-yt \
  --proxy <cluster-name> \
  --discovery-path my_discovery_path \
  --deploy-mode cluster \
  --spyt-version 1.72.0 \
  yt:///sys/spark/examples/smoke_test.py
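When submissions are scripted, it can help to assemble the spark-submit-yt argument list in one place. The following is a minimal sketch, not part of SPYT: the function name build_submit_command is hypothetical, and it uses only the flags shown in this section. It builds the command and prints it, leaving execution to the caller.

```python
import shlex

# Deploy modes listed in this section.
DEPLOY_MODES = ("cluster", "client")


def build_submit_command(proxy, discovery_path, script,
                         deploy_mode="cluster", spyt_version=None):
    """Return the spark-submit-yt argument list for the given settings."""
    if deploy_mode not in DEPLOY_MODES:
        raise ValueError(f"deploy_mode must be one of {DEPLOY_MODES}")
    cmd = [
        "spark-submit-yt",
        "--proxy", proxy,
        "--discovery-path", discovery_path,
        "--deploy-mode", deploy_mode,
    ]
    if spyt_version is not None:
        # --spyt-version is optional, so it is added only when requested.
        cmd += ["--spyt-version", spyt_version]
    cmd.append(script)  # the address of the code file goes last
    return cmd


cmd = build_submit_command(
    "<cluster-name>",
    "my_discovery_path",
    "yt:///sys/spark/examples/smoke_test.py",
    spyt_version="1.72.0",
)
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.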
Note
You can set environment variables instead of passing some of the command arguments, for example YT_PROXY instead of --proxy.
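As an illustration of the note above, here is a hedged sketch of a wrapper that omits --proxy when YT_PROXY is set. The helper names proxy_args and build_discovery_command are hypothetical. Passing the flag whenever a proxy is given explicitly is a choice made for this sketch and does not describe how the tools themselves resolve the two.

```python
import os


def proxy_args(proxy=None, env=None):
    """Return the --proxy arguments, or nothing if YT_PROXY covers it."""
    env = os.environ if env is None else env
    if proxy:
        return ["--proxy", proxy]
    if env.get("YT_PROXY"):
        # The variable is set, so the flag can be left out.
        return []
    raise ValueError("set YT_PROXY or pass a proxy explicitly")


def build_discovery_command(discovery_path, proxy=None, env=None):
    """Return the spark-discovery-yt argument list."""
    return [
        "spark-discovery-yt",
        *proxy_args(proxy, env),
        "--discovery-path", discovery_path,
    ]


# With YT_PROXY set, the command carries no --proxy flag.
print(build_discovery_command("my_discovery_path",
                              env={"YT_PROXY": "<cluster-name>"}))
# ['spark-discovery-yt', '--discovery-path', 'my_discovery_path']
```

The env parameter exists only to make the function easy to exercise without changing the real process environment.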
Additional parameters
For additional cluster startup parameters, see Starting a Spark cluster.