Examples of Cypress object processing

This section provides examples of processing Cypress objects with the yt command-line tool.

Static tables

Creating

Use the create command to create a table.

yt create table //home/dev/test_table
1282c-1ed72c-3fe0191-443cf2ee
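
The create command can also set attributes at creation time via the --attributes option. For example, you can attach a schema when creating the table (an illustrative sketch; the schema below is assumed to match the sample data used later in this section):

yt create table //home/dev/test_table --attributes '{schema=[{name=day;type=string};{name=time;type=int64}]}'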

Deleting

Use the remove command to delete a table.

yt remove //home/dev/test_table
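If the table may not exist, the remove command's --force option suppresses the resulting error (an illustrative sketch):

yt remove --force //home/dev/test_table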

Reading

To read an existing table named <table>, use the read-table command.

yt read-table [--format FORMAT]
                [--table-reader TABLE_READER]
                <table>

The --format option defines the output data format. The supported formats are json, yson, dsv, and schemaful_dsv. For more information, see Formats.

yt read-table --format dsv //home/dev/test_table
day=monday	time=10
day=wednesday	time=20
day=friday	time=30

The --table-reader option modifies the table read settings.

Sampling

The sampling_rate attribute sets the fraction of data, as a value between 0 and 1, to be read from an input table.

yt read-table --format dsv --table-reader '{"sampling_rate"=0.3}' //home/dev/test_table
day=monday	time=10
day=friday	time=30

In this example, read-table returns approximately 30% of all the rows in the input table.

The sampling_seed attribute controls the random number generator that selects the rows. It guarantees the same output for the same sampling_seed and the same collection of input chunks. If unspecified, the sampling_seed value is chosen at random.

yt read-table --format dsv --table-reader '{"sampling_seed"=42;"sampling_rate"=0.3}' //home/dev/test_table

To write the sampled data to a separate table, run a Map operation:

yt map cat --src //home/dev/input --dst //home/dev/output --spec '{job_io = {table_reader = {sampling_rate = 0.001}}}' --format yson

Note

Regardless of the specified sampling_rate value, sampling reads all of the table's data from disk.

Overwriting

The write-table command overwrites an existing table called <table> with the data supplied on standard input.

yt write-table [--format FORMAT]
                 [--table-writer TABLE_WRITER]
                 <table>

The --format option defines the input data format. The supported formats are json, yson, dsv, and schemaful_dsv. For more information, see Formats.

yt write-table --format dsv //home/dev/test_table
time=10	day=monday
time=20	day=wednesday
time=30	day=friday
^D

To append records to a table instead of overwriting it, specify the <append=true> attribute before the table path.

cat test_table.json
{"time":"10","day":"monday"}
{"time":"20","day":"wednesday"}
{"time":"30","day":"friday"}
cat test_table.json | yt write-table --format json "<append=true>"//home/dev/test_table

The --table-writer option modifies the table write settings:

Limit on table row size

When writing table data, the YTsaurus system checks the size of each row. The write will fail with an error if a row exceeds the maximum allowed size.
By default, the maximum row size is 16 MB. To change this limit, use the max_row_weight parameter of the --table-writer option and specify the value in bytes.

cat test_table.json | yt write-table --format json --table-writer '{"max_row_weight"=33554432}' //home/dev/test_table

Note

max_row_weight cannot be greater than 128 MB.
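
For reference, 128 MB is 134217728 bytes (128 × 1024 × 1024), so the largest accepted value looks like this (illustrative):

cat test_table.json | yt write-table --format json --table-writer '{"max_row_weight"=134217728}' //home/dev/test_table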

Chunk size

The desired_chunk_size parameter of the --table-writer option sets the target chunk size in bytes.

cat test_table.json | yt write-table --format json --table-writer '{"desired_chunk_size"=1024}' //home/dev/test_table

Replication factor

You can control the replication factor of new table chunks with the upload_replication_factor and min_upload_replication_factor attributes.
upload_replication_factor sets the number of synchronous replicas created when writing new data to a table.
min_upload_replication_factor sets the minimum number of replicas of each chunk that must be written successfully for the write to succeed. Both attributes have a default value of 2 and a maximum value of 10.

cat test_table.json | yt write-table --format json --table-writer '{"upload_replication_factor"=5;"min_upload_replication_factor"=3}' //home/dev/test_table

To increase the number of replicas of an existing table, use one of the methods below:

  • Start a Merge operation:
yt merge --mode auto --spec '{"force_transform"=true; "job_io"={"table_writer"={"upload_replication_factor"=5}}}' --src <source_table> --dst <destination_table>
  • Increase the table's replication_factor attribute. The conversion runs in the background at a later point in time: new records added to the table are written with the replication factor set by the upload_replication_factor attribute, and a background process then asynchronously replicates chunks until the table's replication_factor is reached.
yt set //home/dev/test_table/@replication_factor 5
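
In either case, you can confirm the resulting value by reading the attribute back:

yt get //home/dev/test_table/@replication_factor
5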

Medium

To move a table to a different medium, change the value of the primary_medium attribute.
New data written to the table will be delivered directly to the new medium, while old data will move in the background.
To force-move data to a new medium, start a Merge:

yt set //home/dev/test_table/@primary_medium ssd_blobs
yt merge --mode auto --spec '{"force_transform"=true;}' --src //home/dev/test_table --dst //home/dev/test_table
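
If you are unsure which media are configured on the cluster, you can list them first (this assumes the standard //sys/media Cypress node):

yt list //sys/media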

To check whether a table's medium has changed, run the command below:

yt get //home/dev/test_table/@chunk_media_statistics

{
    "ssd_blobs" = {
        "chunk_count" = 2126;
        "uncompressed_data_size" = 9667220402266;
        "compressed_data_size" = 4954465956017;
        "data_weight" = 10764306825793;
        "max_block_size" = 6584787;
    };
}

All sizes are shown in bytes.
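
You can also read the primary_medium attribute back directly to confirm the switch:

yt get //home/dev/test_table/@primary_medium
"ssd_blobs"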