Streaming options

This section lists SPYT options for Structured Streaming, service columns, and a version compatibility matrix. Spark session parameters (including spark.yt.streaming.transactional) are available on the Configuration parameters page.

`yt` source and sink options

Passed via .option(...) when reading or writing a streaming DataFrame. For an input queue, the path parameter is usually passed via .load(...).

Option	Description	Required	Default value	Since version
`consumer_path`	Path to the consumer table for reading from the queue	Yes, when reading	—	1.77.0
`path`	Path to the input queue when reading or to the output table when writing	Yes	—	1.77.0
`include_service_columns`	Add service columns `__spyt_streaming_src_tablet_index` and `__spyt_streaming_src_row_index` to the DataFrame (correspond to `$tablet_index` and `$row_index` of a row in the original queue)	No	`false`	2.6.0
`max_rows_per_partition`	Maximum number of rows read from a single queue partition within one batch	No	`∞`	2.6.0
`parsing_type_v3`	Read composite types while preserving the type. If the option is not specified, `spark.yt.read.typeV3.enabled` is used	No	`spark.yt.read.typeV3.enabled`	2.6.0
`write_type_v3`	Write composite types while preserving the type. If the option is not specified, `spark.yt.write.typeV3.enabled` is used	No	`spark.yt.write.typeV3.enabled`	2.6.0

Spark Structured Streaming parameters

checkpointLocation is a Spark Structured Streaming parameter, not a specific yt source or sink option.

Parameter	Description	When to specify	Since version
`checkpointLocation`	Path to the directory with checkpoint files for a streaming query. Spark stores the intermediate state of stateful operations here: for example, groupBy, windowing, or join. To store in YTsaurus, specify a path like `yt:///...`	Always — this is a mandatory requirement of Spark Structured Streaming	1.77.0

Service columns

When include_service_columns = true, the following columns are added to the streaming DataFrame:

Column	Description
`__spyt_streaming_src_tablet_index`	Value of `$tablet_index` for a row in the original queue
`__spyt_streaming_src_row_index`	Value of `$row_index` for a row in the original queue

Compatibility matrix

Functionality	Minimum SPYT version
Checkpoint storage on YTsaurus	1.77.0
Structured Streaming over YTsaurus Queue API	1.77.0
Support for composite data types	2.6.0
`max_rows_per_partition` option	2.6.0
`include_service_columns` option	2.6.0
`spark.yt.write.dynBatchSize` parameter	Configurable for streaming since 2.6.5 (previously hard‑coded to 50 000)
Transactional mode (exactly‑once)	2.10

Streaming options

yt source and sink options

Spark Structured Streaming parameters

Service columns

Compatibility matrix

See also

`yt` source and sink options