SPYT in Scala

Requirements

SPYT works with Java 11 and Scala 2.12.

Dependencies

The SPYT cluster already provides all of these dependencies, so they are declared with the Provided scope. They are used for compilation only: they are not packaged into the .jar file, and the cluster supplies them at run time.

libraryDependencies ++= Seq(
    // Spark dependencies
    "org.apache.spark" %% "spark-core" % "3.0.1" % Provided,
    "org.apache.spark" %% "spark-sql" % "3.0.1" % Provided,
    // SPYT library
    "tech.ytsaurus" %% "spark-yt-data-source" % "1.50.0" % Provided
)

Build

The example code was built with the sbt assembly command and uploaded to YTsaurus at //home/spark/examples/scala-examples-assembly-0.1.jar.
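The assembly command is not built into sbt; it comes from the sbt-assembly plugin. As a minimal sketch, the plugin can be enabled with a single line in project/plugins.sbt. The version shown here is an assumption chosen for illustration, not one taken from the example project.

```scala
// project/plugins.sbt
// Assumption: 2.1.5 is only an example version; use whichever sbt-assembly release suits your sbt version.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")
```

With the plugin enabled, running sbt assembly writes a single .jar under target/scala-2.12/, named <project name>-assembly-<version>.jar by default. Dependencies marked Provided are left out of that file.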

Starting jobs

Before running the example jobs, start an SPYT cluster or find the discovery path of a cluster that is already running.

Jobs are started via the spark-submit-yt utility.

spark-submit-yt \
  --proxy ${YT_PROXY} \
  --discovery-path ${SPYT_DISCOVERY_PATH} \
  --deploy-mode cluster \
  --class tech.ytsaurus.spyt.examples.GroupingExample \
  yt:///home/spark/examples/scala-examples-assembly-0.1.jar
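
For orientation, here is a rough sketch of what a class passed to --class might look like. It is not the source of GroupingExample: the object name GroupingSketch, the column names count and id, and the use of program arguments for table paths are all hypothetical. It also assumes that SPYT exposes a Spark data source registered under the name "yt", and that the master URL and YTsaurus settings are supplied by spark-submit-yt.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.max

// Hypothetical job used for illustration only.
object GroupingSketch {

  // For each distinct value of `count`, keep the largest `id`.
  // Both column names are placeholders.
  def maxIdPerCount(df: DataFrame): DataFrame =
    df.groupBy("count").agg(max("id").as("max_id"))

  def main(args: Array[String]): Unit = {
    // Input and output table paths are taken from the program arguments
    // instead of being hard-coded.
    val Array(inputPath, outputPath) = args.take(2)

    // No master is set here; it is expected to come from the submit command.
    val spark = SparkSession.builder().appName("grouping-sketch").getOrCreate()
    try {
      // Assumption: the SPYT data source is available as format "yt".
      val input = spark.read.format("yt").load(inputPath)
      maxIdPerCount(input)
        .write
        .format("yt")
        .mode(SaveMode.Overwrite)
        .save(outputPath)
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the transformation in a separate function that takes and returns a DataFrame means it can be checked against a local SparkSession with in-memory data, without access to a cluster.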