Spark History Server (SHS)
To use the History Server in this mode, it must be launched separately.
The main parameters of the shs-launch-yt command are a subset of the spark-launch-yt parameters that are relevant to launching the History Server. Command format:
$ shs-launch-yt --proxy <proxy address> --discovery-path <discovery path>
Example:
$ shs-launch-yt --proxy my.ytsaurus.cluster.net --discovery-path //home/user/spark/discovery
To save event logs for a task run, add the following parameters to the spark-submit command:
--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=ytEventLog:/<history server discovery path>/logs/event_log_table
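For example, a complete submission with event logging enabled might look as follows. This is only a sketch: the master URL and application path are placeholders, and the log directory is built from the discovery path used in the example above according to the template shown.

$ spark-submit \
    --master <master URL> \
    --conf spark.eventLog.enabled=true \
    --conf spark.eventLog.dir=ytEventLog:///home/user/spark/discovery/logs/event_log_table \
    <path to application>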
In this case, the event logs for the run are written to the specified table and become available in the History Server, which uses this table as its log storage.
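As a quick sanity check, you can verify that the event log table has appeared under the logs directory of the discovery path. The paths below follow the example above and are assumptions, not fixed names:

$ yt --proxy my.ytsaurus.cluster.net exists //home/user/spark/discovery/logs/event_log_table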
Note
When started, the History Server indexes all records from the event log source, and processing each record may take 10–15 seconds. New logs become available only after all existing data has been indexed.
On heavily loaded clusters, this can lead to significant delays when starting or restarting the History Server. Therefore, for production clusters with an intense flow of tasks, it is recommended to use a new (empty) event log table when launching a new Spark cluster. If necessary, the old table can be archived by moving it to another location.
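For example, the old table could be archived with a yt move before the new cluster is launched. The destination path here is purely illustrative and must already exist:

$ yt --proxy my.ytsaurus.cluster.net move //home/user/spark/discovery/logs/event_log_table //home/user/spark/archive/event_log_table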