Write options

sorted_by

Sort the table by a prefix of its columns (the key columns):

df.write.sorted_by("uuid").yt("//sys/spark/examples/test_data")
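In a YT sorted table, the key columns must form a prefix of the schema and carry a sort order. The plain-Python sketch below builds such a schema from a column-to-type mapping so the resulting layout is easy to see. The helper name `sorted_schema` is hypothetical, and the sketch assumes the key columns are moved to the front. It illustrates the YT schema convention only; it is not SPYT's internal code.

```python
from typing import Dict, List, Sequence


def sorted_schema(columns: Dict[str, str], key_columns: Sequence[str]) -> List[dict]:
    """Hypothetical helper: build a YT-style schema whose key columns form a prefix.

    Key columns come first and are marked "ascending"; the remaining
    columns follow in their original order with no sort_order.
    """
    missing = [name for name in key_columns if name not in columns]
    if missing:
        raise ValueError(f"unknown key columns: {missing}")
    # Key columns first, in the order given, each with an explicit sort order.
    schema = [
        {"name": name, "type": columns[name], "sort_order": "ascending"}
        for name in key_columns
    ]
    # Non-key columns after the key prefix.
    schema += [
        {"name": name, "type": col_type}
        for name, col_type in columns.items()
        if name not in key_columns
    ]
    return schema


schema = sorted_schema({"id": "int64", "uuid": "string"}, ["uuid"])
```

For `sorted_by("uuid")`, the sketch places `uuid` first as the only key column, followed by `id` without a sort order.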

unique_keys

Declare that the key columns set by sorted_by are unique in the table:

df.write.sorted_by("uuid").unique_keys.yt("//sys/spark/examples/test_data")
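A table that is sorted with unique keys has rows whose key tuples are strictly ascending: sorted, with no two rows sharing a key. The hypothetical helper below checks this property on plain Python rows. It is a sketch for reasoning about the data before writing it. It does not reproduce how SPYT or YT validates `unique_keys`.

```python
from typing import Iterable, Mapping, Sequence


def check_sorted_unique(rows: Iterable[Mapping], key_columns: Sequence[str]) -> bool:
    """Hypothetical helper: True if rows are strictly ascending by key_columns.

    Strictly ascending means the rows are sorted and no key repeats.
    """
    previous = None
    for row in rows:
        key = tuple(row[name] for name in key_columns)
        # "<=" catches both out-of-order rows and duplicate keys.
        if previous is not None and key <= previous:
            return False
        previous = key
    return True


ok = check_sorted_unique([{"uuid": "a"}, {"uuid": "b"}], ["uuid"])
duplicate = check_sorted_unique([{"uuid": "a"}, {"uuid": "a"}], ["uuid"])
unsorted = check_sorted_unique([{"uuid": "b"}, {"uuid": "a"}], ["uuid"])
```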

optimize_for

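A table can be stored in a row-oriented (lookup) or column-oriented (scan) format. Choose the format based on the workload: scan suits reading a subset of columns across many rows, and lookup suits reading entire rows: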

df.write.optimize_for("scan").yt("//sys/spark/examples/test_data")
df.write.optimize_for("lookup").yt("//sys/spark/examples/test_data")
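The difference between the two formats can be pictured in plain Python. Row-oriented storage keeps each record together, while columnar storage keeps one list per column, so an aggregate over a single column only touches that column's values. This is a conceptual sketch with a hypothetical `to_columns` helper. It does not describe how YT actually lays out chunks on disk.

```python
from typing import Dict, List


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Hypothetical helper: transpose records into one list per column.

    Assumes every row has the same set of columns.
    """
    columns: Dict[str, List[object]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows = [
    {"uuid": "a", "value": 1},
    {"uuid": "b", "value": 2},
]
columns = to_columns(rows)

# Scan-style access: summing one column reads only that column's values.
total = sum(columns["value"])
# Lookup-style access: fetching one whole record is a single row read.
record = rows[1]
```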

Schema v3

Write the table schema in type_v3 instead of type_v1. This can be enabled either in the Spark configuration or with a write option.

Python example:

df.write.option("write_type_v3", "true")
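To show what changes in the schema, the sketch below converts a single type_v1 column description into its type_v3 form for a few primitive types. In type_v1, a column uses `type` plus a `required` flag that defaults to false. In type_v3, a required column has a bare type name and an optional column is wrapped in `{"type_name": "optional", "item": ...}`. The helper name and the deliberately small type mapping are assumptions for illustration. Composite types are not covered.

```python
from typing import Dict

# Assumed mapping for a few primitive types; v1 "boolean" is named "bool" in type_v3.
V1_TO_V3 = {
    "int64": "int64",
    "uint64": "uint64",
    "double": "double",
    "string": "string",
    "utf8": "utf8",
    "boolean": "bool",
}


def column_to_type_v3(column: Dict) -> Dict:
    """Hypothetical helper: convert one type_v1 column to a type_v3 column."""
    item = V1_TO_V3[column["type"]]
    # In type_v1, "required" defaults to false, which means the column is optional.
    if column.get("required", False):
        type_v3 = item
    else:
        type_v3 = {"type_name": "optional", "item": item}
    return {"name": column["name"], "type_v3": type_v3}


optional_uuid = column_to_type_v3({"name": "uuid", "type": "string"})
required_flag = column_to_type_v3({"name": "flag", "type": "boolean", "required": True})
```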

Dynamic tables

When writing to dynamic tables, you must explicitly set the inconsistent_dynamic_write option to true. Setting it confirms that you accept that writes to dynamic tables are not transactional.

Python example:

df.write.option("inconsistent_dynamic_write", "true")
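The practical consequence of a non-transactional write is that a failure partway through can leave some rows already written. The plain-Python simulation below contrasts this with an all-or-nothing write that publishes its changes only after every row succeeds. Both function names and the in-memory dict standing in for a table are hypothetical. This is a conceptual model, not SPYT's write path.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]


def write_without_transaction(table: Dict[str, int], rows: List[Row],
                              fail_at: Optional[int] = None) -> None:
    """Apply rows one at a time; a failure leaves earlier rows in place."""
    for i, (key, value) in enumerate(rows):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at row {i}")
        table[key] = value


def write_with_transaction(table: Dict[str, int], rows: List[Row],
                           fail_at: Optional[int] = None) -> None:
    """Stage rows in a copy and publish them only if every row succeeds."""
    staged = dict(table)
    for i, (key, value) in enumerate(rows):
        if i == fail_at:
            raise RuntimeError(f"simulated failure at row {i}")
        staged[key] = value
    table.clear()
    table.update(staged)


rows = [("a", 1), ("b", 2)]

partial: Dict[str, int] = {}
try:
    write_without_transaction(partial, rows, fail_at=1)
except RuntimeError:
    pass  # row "a" has already been written

atomic: Dict[str, int] = {}
try:
    write_with_transaction(atomic, rows, fail_at=1)
except RuntimeError:
    pass  # nothing was published
```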

Reading and writing within a transaction

GPU usage