Data classes
Attention
Data classes are only supported in Python 3.
The primary way to represent table rows is to declare classes whose fields are annotated with types (similar to dataclasses). See also the example.
Motivation
- Convenience: the YTsaurus scalar type system is richer than Python's, so type annotations let you state the desired column types explicitly. The benefit is even more noticeable with composite types, such as structures.
- Strong typing: type annotations let linters and IDEs detect errors in the code before they result in incorrect data and other negative effects.
- Speed: due to the Skiff format and some other optimizations, CPU consumption is several times lower than for untyped data.
Installation
To work with the yt.wrapper.schema module, install the following packages:
- typing_extensions
- ytsaurus-yson
- six
Introduction
To work with tabular data, you must declare a class with the yt.wrapper.yt_dataclass decorator and annotate its fields with types. The field type comes after the colon. It can be:
- A Python built-in type: int, float, str, and others.
- A custom class with the @yt_dataclass decorator.
- A composite type from the typing module. List, Dict, Optional, and Tuple are currently supported. Support for a number of other types is planned.
- A special type from yt.wrapper.schema: Int8, Uint16, OtherColumns, and others (a more complete list is given below).
Example:
import yt.wrapper

@yt.wrapper.yt_dataclass
class Row:
    id: int
    name: str
    is_robot: bool = False
For a class declared this way, a constructor and other service methods are generated, in particular __eq__ and __repr__. You can specify a default value for a field; it becomes part of the constructor signature. You can create an object of this class as follows: row = Row(id=123, name="foo"). Every field without a default value (here, id and name) must be passed to the constructor, while a field with a default value (such as is_robot: bool = False) may be omitted. If a required field is missing, the constructor throws an exception.
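The generated methods can be tried out without a cluster. The sketch below uses the standard dataclasses module as a stand-in, on the assumption that yt_dataclass generates the constructor, __eq__, and __repr__ in a comparable way. The exception type shown is the one raised by standard dataclasses and is not confirmed for yt_dataclass.

```python
from dataclasses import dataclass


# Stand-in for @yt.wrapper.yt_dataclass (assumption: comparable generated methods).
@dataclass
class Row:
    id: int
    name: str
    is_robot: bool = False


row = Row(id=123, name="foo")  # is_robot falls back to its default value
same = Row(id=123, name="foo", is_robot=False)

print(row)          # uses the generated __repr__
print(row == same)  # the generated __eq__ compares field by field

try:
    Row(id=1)  # name has no default value, so it is required
except TypeError as error:
    print("missing field:", error)
```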
The data classes support inheritance.
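A minimal sketch of inheritance, using the standard dataclasses module as a stand-in for yt_dataclass (an assumption; the class names BaseRow and RobotRow are hypothetical). With standard dataclasses, the fields of the base class come first in the generated constructor, followed by the fields of the derived class.

```python
from dataclasses import dataclass, fields


# Stand-in for @yt.wrapper.yt_dataclass (assumption).
@dataclass
class BaseRow:
    id: int
    name: str


@dataclass
class RobotRow(BaseRow):
    # Field added by the derived class; base-class fields are inherited.
    is_robot: bool = False


row = RobotRow(id=1, name="bender", is_robot=True)
field_names = [f.name for f in fields(RobotRow)]
print(field_names)  # base-class fields are listed before is_robot
```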
When you have a class like this, you can:
- Create a table with a matching schema (you can just start writing to an empty or non-existent table, or use the TableSchema.from_row_type() function).
- Write and read tables (see the example).
- Run operations (see the example).
Data classes can also be created from a table schema, either explicitly or automatically when reading structured data:
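import yt.wrapper as yt  # assumes client and table_path are defined elsewhere

# Explicitly, from the table schema:
yt_table_schema = client.get(f"{table_path}/@schema")
dataclass_type = yt.schema.make_dataclass_from_table_schema(yt.schema.TableSchema.from_yson_type(yt_table_schema))
# Or automatically, while reading structured data:
typed_table_data = list(client.read_table_structured(table=table_path, row_type=None))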
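To illustrate the idea of deriving a class from a schema, the sketch below builds a class from a simplified list of (column name, type name) pairs using dataclasses.make_dataclass from the standard library. The SCHEMA list, the TYPE_MAP dictionary, and the make_row_class function are hypothetical. This is not how make_dataclass_from_table_schema is implemented, only an analogy.

```python
from dataclasses import fields, make_dataclass

# Hypothetical, simplified mapping from column type names to Python types.
TYPE_MAP = {"int64": int, "double": float, "boolean": bool, "utf8": str}

# Hypothetical schema: (column name, column type name) pairs.
SCHEMA = [("id", "int64"), ("name", "utf8"), ("is_robot", "boolean")]


def make_row_class(class_name, schema):
    """Build a dataclass whose fields mirror the given columns."""
    field_specs = [(name, TYPE_MAP[type_name]) for name, type_name in schema]
    return make_dataclass(class_name, field_specs)


Row = make_row_class("Row", SCHEMA)
row = Row(id=123, name="foo", is_robot=False)
print([(f.name, f.type) for f in fields(Row)])
```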
Special types
yt.wrapper.schema | Python | Schema |
---|---|---|
Int8, Int16, Int32, Int64 | int | int8, int16, int32, int64 |
Uint8, Uint16, Uint32, Uint64 | int | uint8, uint16, uint32, uint64 |
YsonBytes | bytes | yson/any |
OtherColumns | OtherColumns | Corresponds to several columns. |
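Because a Python int is unbounded, the sized types matter when a value must fit a column. The helpers below are a hypothetical sketch (they are not part of yt.wrapper.schema) that compute the value range of each integer schema type from the table and check whether a value fits.

```python
def int_range(type_name):
    """Return (min, max) for a schema type such as 'int8' or 'uint16'."""
    signed = not type_name.startswith("uint")
    # Strip the 'int' or 'uint' prefix to get the bit width.
    bits = int(type_name[3:] if signed else type_name[4:])
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fits(type_name, value):
    """Check whether value is representable in the given integer type."""
    low, high = int_range(type_name)
    return low <= value <= high


print(int_range("int8"))    # (-128, 127)
print(int_range("uint16"))  # (0, 65535)
print(fits("int8", 200))    # False
print(fits("uint8", 200))   # True
```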