HTTP proxy reference
Command execution structure
When developing a library to work with YTsaurus, you should understand the command execution structure.
Each executed command is a structure containing:
- Information about the user that allows authentication (over HTTP, this is most often a token).
- Information about the input and output data format (represented as a YSON string with attributes).
- Information about the command parameters (represented as a YSON dict).
- Input (byte) data stream.
- Output (byte) data stream.
The sections below give the formal encoding rules for each of these items in the HTTP proxy. They cover more ways of interpreting HTTP queries than a client library needs, partly for compatibility with HTTP clients such as web browsers. To work correctly with YTsaurus, you only need to support a smaller set of features:
- Specify the X-YT-Header-Format header, which defines the format of the X-YT-Input-Format, X-YT-Output-Format, and X-YT-Parameters headers (the header value is YSON).
- Specify the X-YT-Input-Format and X-YT-Output-Format headers, encoded according to X-YT-Header-Format, to define the input and output data stream formats.
- Pass all command parameters in the X-YT-Parameters header, encoded according to X-YT-Header-Format.
- Treat all 5xx HTTP responses as transport-layer errors. Distinguish code 503, which is a clear signal of temporary unavailability, from code 500, which signals an error on the YTsaurus side. In the first case, you can repeat the query.
- For all other responses (2xx, 4xx), judge whether the command succeeded by the X-YT-Error header or trailer (preferred), or by X-YT-Response-Code (the error code extracted from X-YT-Error) and X-YT-Response-Message (the error message extracted from X-YT-Error).
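The minimal header set listed above could be assembled roughly as follows. This is only a sketch: the helper name build_headers, the OAuth-style Authorization header, and the use of json as the X-YT-Header-Format value are assumptions made for illustration and are not stated in this reference.

```python
import json


def build_headers(token, parameters, input_format=None, output_format=None):
    """Build a minimal set of request headers (hypothetical helper)."""
    headers = {
        # Assumption: the token is passed in an OAuth-style Authorization header.
        "Authorization": f"OAuth {token}",
        # Assumption: "json" tells the proxy that the headers below are JSON-encoded.
        "X-YT-Header-Format": "json",
        # All command parameters travel in a single header.
        "X-YT-Parameters": json.dumps(parameters),
    }
    if input_format is not None:
        headers["X-YT-Input-Format"] = json.dumps(input_format)
    if output_format is not None:
        headers["X-YT-Output-Format"] = json.dumps(output_format)
    return headers


headers = build_headers("my-token", {"path": "//tmp/table"}, output_format="json")
```

json.dumps escapes non-ASCII characters by default, which keeps the header values safe to send over HTTP.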
Selecting an HTTP method for a command
To select an HTTP method for a command, just use the following algorithm:
- If the command has an input data stream, then PUT.
- If the command is mutating, then POST.
- Otherwise GET.
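The algorithm above translates directly into a small function. The function name and the two boolean flags are hypothetical; a client library would derive them from its own command descriptions.

```python
def select_http_method(has_input_stream: bool, is_mutating: bool) -> str:
    """Pick the HTTP method for a command using the three rules above."""
    if has_input_stream:
        return "PUT"
    if is_mutating:
        return "POST"
    return "GET"
```

Note that the input-stream check comes first, so a mutating command that also carries an input stream is sent with PUT.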
Data formats for operations
The interpretation of input and output byte streams (with a data type other than binary) is determined by the data format. A format description is a YSON string, possibly with additional attributes. The standard way to specify the formats is through the X-YT-Input-Format and X-YT-Output-Format headers. For better compatibility with HTTP libraries, the Accept and Content-Type headers are partially supported. Below is a detailed description of the format definition rules.
Specifying an input data format
An input data format is determined using the following rules. Each successive rule overrides the previous one.
- If the Content-Type header is specified and the specified MIME type is in the correspondence table, the format is selected from the table.
- If the X-YT-Input-Format header is specified, the header content is interpreted as a JSON-encoded YSON string with attributes and is used as the input format description.
- If neither of the rules above yields a format, YSON is used.
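To predict which input format the proxy will choose, the rules above can be modeled as below. This is a sketch of the documented behavior, not the proxy's code: the function name is hypothetical, the table holds only a few rows of the MIME correspondence table from this reference, formats are kept as plain strings, and header names are assumed to be spelled exactly as shown.

```python
import json

# A few rows from the MIME type correspondence table in this reference.
MIME_TO_FORMAT = {
    "application/json": "json",
    "application/x-yt-yson-binary": "<format=binary>yson",
    "text/tab-separated-values": "dsv",
    "text/x-tskv": "<line_prefix=tskv>dsv",
}


def resolve_input_format(headers):
    """Apply the input format rules; later rules override earlier ones."""
    fmt = "yson"  # fallback when no header yields a format
    # Ignore MIME parameters such as "; charset=utf-8".
    mime = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if mime in MIME_TO_FORMAT:
        fmt = MIME_TO_FORMAT[mime]
    if "X-YT-Input-Format" in headers:
        # The header value is JSON-encoded, so a plain format name arrives quoted.
        fmt = json.loads(headers["X-YT-Input-Format"])
    return fmt
```

For example, a query with Content-Type: application/json and X-YT-Input-Format: "dsv" resolves to dsv, because the explicit header wins.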
Specifying an output data format
An output data format is determined using the following rules. Each successive rule overrides the previous one.
- If the Accept header is specified, the best MIME type from the table matching the Accept header is selected, and the format is then taken from the table. Content-Type is set to the matching MIME type.
- If the X-YT-Output-Format header is specified, the header content is interpreted as a JSON-encoded YSON string with attributes and is used as the output format description. Content-Type is set to application/octet-stream.
- If neither of the rules above yields a format, Pretty YSON is used. Content-Type is set to text/plain.
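The output rules differ from the input rules in that they also fix the Content-Type of the response. The sketch below models this under simplifying assumptions: "best MIME type" is taken to mean the supported type with the highest q-value, wildcards such as */* are ignored, and only a few table rows are included. The function names are hypothetical.

```python
import json

# A few rows from the MIME type correspondence table in this reference.
MIME_TO_FORMAT = {
    "application/json": "json",
    "application/x-yt-yson-text": "<format=text>yson",
    "text/tab-separated-values": "dsv",
}


def best_mime(accept):
    """Return the supported MIME type with the highest q-value, or None."""
    best, best_q = None, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        mime, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if mime in MIME_TO_FORMAT and q > best_q:
            best, best_q = mime, q
    return best


def resolve_output_format(headers):
    """Return (format, content_type); later rules override earlier ones."""
    fmt, content_type = "<format=pretty>yson", "text/plain"  # fallback
    mime = best_mime(headers.get("Accept", ""))
    if mime is not None:
        fmt, content_type = MIME_TO_FORMAT[mime], mime
    if "X-YT-Output-Format" in headers:
        fmt = json.loads(headers["X-YT-Output-Format"])
        content_type = "application/octet-stream"
    return fmt, content_type
```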
HTTP return codes
If the web server detects an error before any data is sent to the client, the return code 4xx or 5xx indicates the error. If data has already started to be delivered to the client, the return code is 202 (Accepted), and the YTsaurus return code is passed in the X-YT-Response-Code trailer header. A non-zero X-YT-Response-Code indicates an error. In this case, an error message (as a JSON string) is specified in the X-YT-Response-Message trailer header.
Table 1 — Return codes
Return code | Description |
---|---|
200 | The command was successfully completed |
202 | The command is executed, the response body will be sent over an HTTP stream, and internal return codes are written in the trailer headers |
307 | Redirecting heavy queries from light to heavy proxies |
400 | The command was executed, but an error was returned (detailed JSON-encoded error in the body) |
401 | Unauthenticated query |
404 | Unknown command |
405 | Incorrect HTTP method |
406 | Incorrect format in Accept |
415 | Incorrect format in Accept-Encoding |
429 | The limit on the number of queries from a user was exceeded |
500 | Error on the proxy side |
503 | Service unavailable. The query must be repeated later |
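A client might combine the return codes in Table 1 with the X-YT-Error and X-YT-Response-Code headers along the following lines. This is a minimal sketch: the function name and the result labels are hypothetical, headers and trailers are assumed to be merged into one dict, and X-YT-Error is assumed to hold a JSON object with a numeric code field.

```python
import json


def interpret_response(status, headers):
    """Classify a response as 'ok', 'retry', 'redirect', or 'error'."""
    if status == 503:
        return "retry"  # temporary unavailability, repeat the query later
    if 500 <= status <= 599:
        return "error"  # other transport-layer errors
    if status == 307:
        return "redirect"  # heavy query redirected to a heavy proxy
    if "X-YT-Error" in headers:
        # Assumption: the value is a JSON object with a numeric "code" field.
        error = json.loads(headers["X-YT-Error"])
        return "ok" if error.get("code", 0) == 0 else "error"
    if int(headers.get("X-YT-Response-Code", "0")) != 0:
        return "error"
    return "error" if status >= 400 else "ok"
```

With status 202 the decisive values arrive in trailers, so the caller has to read the whole body before calling a function like this.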
Query debugging
Note
All queries support optional debug headers. Processing them is not mandatory, but recommended.
In the query:
- Generate a GUID and set it in the X-YT-Correlation-Id header. This header helps you find the query in the logs even if the response never reached the client.
In the response:
- X-YT-Request-Id: ID of the query generated by the proxy. You need it to find the query in the YTsaurus proxy log.
- X-YT-Proxy: Hostname of the proxy from which the response came. It is important when the query passes through a balancer.
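Handling the debug headers takes only a few lines. The sketch below assumes that any standard UUID string is an acceptable correlation ID; both helper names are hypothetical.

```python
import uuid


def with_correlation_id(headers):
    """Return a copy of `headers` with a fresh X-YT-Correlation-Id."""
    result = dict(headers)
    result["X-YT-Correlation-Id"] = str(uuid.uuid4())
    return result


def debug_info(response_headers):
    """Collect the values worth logging next to the correlation ID."""
    return {
        "request_id": response_headers.get("X-YT-Request-Id"),
        "proxy": response_headers.get("X-YT-Proxy"),
    }
```

Logging the correlation ID before the query is sent means it is still available when the response never arrives.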
Advanced features
Compression
You can use compression when transmitting data via an HTTP proxy. The proxy selects a codec for incoming data based on the Content-Encoding header and for outgoing data based on the Accept-Encoding header. Possible codecs are listed in the table.
Content-Encoding/Accept-Encoding | Codec |
---|---|
identity | None (no compression) |
gzip, deflate | Standard zlib |
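As an illustration of the gzip row, the sketch below compresses a request body and decompresses a response body with the Python standard library. The helper names are hypothetical, and header names are assumed to be spelled exactly as shown.

```python
import gzip


def compress_body(body, headers):
    """Gzip the request body and declare the codec in the headers."""
    new_headers = dict(headers)
    new_headers["Content-Encoding"] = "gzip"  # codec of the data we send
    new_headers["Accept-Encoding"] = "gzip"   # codec we accept in the response
    return gzip.compress(body), new_headers


def decompress_body(body, response_headers):
    """Undo gzip compression if the response declares it."""
    if response_headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(body)
    return body
```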
Parameters
Each command has a set of additional parameters represented as a YSON dict. The standard way to pass parameters is the X-YT-Parameters header, which is interpreted as a JSON-encoded YSON dict.
List of heavy proxies (/hosts)
All YTsaurus commands are classified into two groups: light and heavy. Heavy commands involve large amounts of I/O and, consequently, network load. To isolate this load, light (controlling) commands are separated from the heavy ones. When you try to execute a heavy command, light proxies return code 503. The balancing of heavy commands among heavy proxies is performed by the client.
Before you execute a heavy command, query /hosts to get a list of proxies ordered by load. The load is estimated from the current CPU and network load, as well as some planned future load on the proxies. The first proxy in the resulting list is the least loaded, and it is the one to use in simple cases (about 80% of cases).
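Client-side selection of a heavy proxy might look like the sketch below. It assumes that /hosts returns a JSON array of hostnames when queried with Accept: application/json; the function names and the plain http scheme are illustrative choices.

```python
import json
import urllib.request


def pick_heavy_proxy(hosts_body):
    """Return the least loaded proxy, which is the first one in the list."""
    hosts = json.loads(hosts_body)
    if not hosts:
        raise RuntimeError("no heavy proxies available")
    return hosts[0]


def fetch_heavy_proxy(light_proxy):
    """Query /hosts on a light proxy and pick a heavy proxy from the result."""
    request = urllib.request.Request(
        f"http://{light_proxy}/hosts",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return pick_heavy_proxy(response.read())
```

Always taking the first entry covers the simple case; a more elaborate client could, for example, fall back to the next entry when a connection fails.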
Available API versions (/api)
The HTTP API is versioned (as you can see from the /v3 prefix in the examples). The API version changes when there are backward-incompatible changes to the set of supported commands or to the semantics of any existing command. Adding new commands does not usually change the API version. The HTTP proxy supports the two latest versions of the API.
A list of supported API versions can be obtained from the URL /api.
Getting a list of available APIs:
$ curl -v -X GET "http://$YT_PROXY/api"
> GET /api HTTP/1.1
< HTTP/1.1 200 OK
["v3","v4"]
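A client that implements several API versions could pick the newest one that the proxy also supports. The sketch below assumes version names of the form v followed by an integer, as in the response above; the function name is hypothetical.

```python
import json


def choose_api_version(api_body, client_versions):
    """Pick the highest API version supported by both proxy and client."""
    supported = json.loads(api_body)
    common = [v for v in supported if v in client_versions]
    if not common:
        raise RuntimeError("no API version in common with the proxy")
    # Compare numerically so that "v10" would sort above "v9".
    return max(common, key=lambda v: int(v.lstrip("v")))
```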
Correspondence table of MIME types and YT formats
The table below maps MIME types to YT formats.
MIME type | YT format |
---|---|
application/json | json |
application/x-yt-yson-binary | <format=binary>yson |
application/x-yt-yson-text | <format=text>yson |
application/x-yt-yson-pretty | <format=pretty>yson |
application/x-yamr-delimited | <lenval=false;has_subkey=false>yamr |
application/x-yamr-lenval | <lenval=true;has_subkey=false>yamr |
application/x-yamr-subkey-delimited | <lenval=false;has_subkey=true>yamr |
application/x-yamr-subkey-lenval | <lenval=true;has_subkey=true>yamr |
text/tab-separated-values | dsv |
text/x-tskv | <line_prefix=tskv>dsv |
Framing
Some commands may use a special protocol on top of the usual HTTP.
If the client specifies the X-YT-Accept-Framing: 1 header, the proxy can respond with the X-YT-Framing: 1 header. In this case, the response body consists of entries of the form <tag> <header> <frame>.
Frame types:
Name | Tag | Header | Frame | Comment |
---|---|---|---|---|
Data | 0x01 | 4-byte little-endian number: frame size | Frame body | |
Keep-alive | 0x02 | None | None | "The data is being prepared, please wait" |
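A framed response body can be decoded with a small parser like the sketch below. It works on a fully buffered body for simplicity and assumes that the 4-byte frame size is unsigned; the function name and the result labels are hypothetical.

```python
import struct

DATA_TAG = 0x01
KEEP_ALIVE_TAG = 0x02


def iter_frames(stream):
    """Yield ('data', payload) or ('keep_alive', None) for each frame."""
    pos = 0
    while pos < len(stream):
        tag = stream[pos]
        pos += 1
        if tag == DATA_TAG:
            if pos + 4 > len(stream):
                raise ValueError("truncated frame header")
            # Assumption: the size is an unsigned 32-bit little-endian integer.
            (size,) = struct.unpack_from("<I", stream, pos)
            pos += 4
            if pos + size > len(stream):
                raise ValueError("truncated frame body")
            yield "data", stream[pos:pos + size]
            pos += size
        elif tag == KEEP_ALIVE_TAG:
            # Keep-alive frames carry neither a header nor a body.
            yield "keep_alive", None
        else:
            raise ValueError(f"unknown frame tag: {tag:#04x}")
```

Concatenating the payloads of all data frames yields the actual response body; keep-alive frames can be skipped or used to reset a read timeout.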