SPYT in Scala
Requirements
SPYT works with Java 11 and Scala 2.12.
Dependencies
All of these dependencies are already present on the SPYT cluster, so they are declared with the Provided scope: they are used only at compile time and are not packaged into the assembled .jar or loaded at run time.
All possible values of spytVersion can be found here under the spyt/* or spyt-spark/* tags. Compatible Apache Spark versions are listed in this table.
val sparkVersion = "3.5.4"
val spytVersion = "2.7.3"
libraryDependencies ++= Seq(
// Spark dependencies
"org.apache.spark" %% "spark-core" % sparkVersion % Provided,
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
// SPYT library
"tech.ytsaurus" %% "spark-yt-data-source" % spytVersion % Provided
)
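With spark-yt-data-source on the compile classpath, YTsaurus tables can be read and written as ordinary Spark DataFrames. Below is a minimal sketch; the table paths are hypothetical, and the yt read/write syntax assumes the implicits provided by the tech.ytsaurus.spyt package.

import org.apache.spark.sql.SparkSession
import tech.ytsaurus.spyt._

object ReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.getOrCreate()
    try {
      // Read a YTsaurus table into a DataFrame (hypothetical path)
      val users = spark.read.yt("//home/user/users")
      // Keep only active users and write the result back to YTsaurus
      users.filter("active = true").write.mode("overwrite").yt("//home/user/active_users")
    } finally {
      spark.stop()
    }
  }
}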
Build
The example code was built with the sbt assembly command and uploaded to YTsaurus as //home/spark/examples/scala-examples-assembly-0.1.jar.
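The assembly command is provided by the sbt-assembly plugin; if it is not already enabled in your project, a typical project/plugins.sbt entry looks like the following (the plugin version shown is only an example).

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")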
Starting jobs
Before running the example jobs, start your own SPYT cluster or obtain the discovery-path of a cluster that is already running.
Jobs are started via the spark-submit-yt utility.
spark-submit-yt \
  --proxy ${YT_PROXY} \
  --discovery-path ${SPYT_DISCOVERY_PATH} \
  --deploy-mode cluster \
  --class tech.ytsaurus.spyt.examples.GroupingExample \
  yt:///home/spark/examples/scala-examples-assembly-0.1.jar
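The yt:/// scheme in the jar path means that the application archive is read directly from YTsaurus rather than from the local file system; the --proxy and --discovery-path options identify the YTsaurus cluster and the running SPYT cluster, respectively.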
Differences when submitting directly to YTsaurus (available since version 1.76.0)
When submitting Spark tasks directly to YTsaurus, the SparkSession object should be created in accordance with the standard Spark recommendations:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object MySparkApplication {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val spark = SparkSession.builder.config(conf).getOrCreate()

    try {
      // Application code
    } finally {
      spark.stop()
    }
  }
}
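The master URL and other submission settings do not need to be hard-coded in SparkConf: getOrCreate() picks up the configuration supplied by spark-submit, and calling spark.stop() in the finally block releases the session even if the application code fails.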
In this case, the standard spark-submit command is used:
$ spark-submit \
    --master ytsaurus://${YT_PROXY} \
    --deploy-mode cluster \
    --class tech.ytsaurus.spyt.examples.GroupingExample \
    yt:///home/spark/examples/scala-examples-assembly-0.1.jar