Data classes are supported only in Python 3.
- Convenience: the YTsaurus scalar type system is richer than Python's, so type annotations let you state the desired column types explicitly. The benefit is even clearer with composite types, such as structures.
- Strict typing: linters and IDEs can detect more errors in the code before they cause data corruption and other negative effects.
- Speed: due to the Skiff format and some other optimizations, CPU consumption is several times lower than for untyped data.
To work with the yt.wrapper.schema module, install the packages:
To work with tabular data, declare a class with the yt.wrapper.yt_dataclass decorator and annotate its fields with types. The field type comes after the colon. It can be:
- Python built-in types: str, and others.
- Custom classes with the
- Composite types from the typing module. Types including Optional are currently supported. Tuple and a number of other types are going to be supported in the future.
- Special types, such as OtherColumns and others (a more complete list is given below).
import yt.wrapper

@yt.wrapper.yt_dataclass
class Row:
    id: int
    name: str
    is_robot: bool = False
For a class described in this way, a constructor and other service methods are generated, in particular __repr__. You can specify a default value for some fields; it becomes part of the constructor signature, as in the standard dataclasses module. Objects are created in the usual way: row = Row(id=123, name="foo"). Every field without a default value must be passed to the constructor (a field such as is_robot: bool = False, which has a default, may be omitted); otherwise an exception is raised.
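Since the generated constructor is described as following the standard dataclasses module, the behavior can be sketched with the standard library alone. The example below is only an analogy: it uses dataclasses.dataclass as a stand-in for yt.wrapper.yt_dataclass so that it runs without YTsaurus installed, and the exception type shown (TypeError) is what the standard module raises, which is assumed rather than confirmed for yt.wrapper.

```python
from dataclasses import dataclass


# Stand-in for the documented Row class: the standard decorator is used here
# instead of yt.wrapper.yt_dataclass so the sketch runs without YTsaurus.
@dataclass
class Row:
    id: int
    name: str
    is_robot: bool = False  # has a default, so the argument is optional


# Fields with default values may be omitted from the constructor call.
row = Row(id=123, name="foo")
print(row)  # Row(id=123, name='foo', is_robot=False)

# Omitting a field without a default value raises an exception.
try:
    Row(id=123)
    missing_field_error = None
except TypeError as exc:
    missing_field_error = exc

print(type(missing_field_error).__name__)  # TypeError
```

The generated __repr__ prints every field, including the ones filled in from defaults, which is convenient when inspecting rows during debugging.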
Inheritance is supported for data classes.
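As a rough illustration of what inheritance means for such classes, here is a sketch that again uses the standard dataclasses module as a stand-in for yt.wrapper.yt_dataclass. The ScoredRow class and its score field are hypothetical names introduced for this example, and the field-ordering rule in the comment is the standard module's behavior, assumed rather than confirmed for yt.wrapper.

```python
from dataclasses import dataclass, fields


# Stand-in base class (standard decorator instead of yt.wrapper.yt_dataclass).
@dataclass
class Row:
    id: int
    name: str
    is_robot: bool = False


# Hypothetical subclass: it inherits the three fields above and adds one more.
# The new field has a default because it follows an inherited field that
# already has a default value.
@dataclass
class ScoredRow(Row):
    score: float = 0.0


scored = ScoredRow(id=1, name="bar", score=0.5)

# Inherited fields come first, followed by the fields of the subclass.
field_names = [f.name for f in fields(ScoredRow)]
print(field_names)  # ['id', 'name', 'is_robot', 'score']
```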
When you have a class like this, you can:
- Create a table with a relevant schema (you can just start writing to an empty or a non-existent table or use the
- Write and read tables, example.
- Run operations, example.
||Corresponds to several columns.|