Configuring server component logging
The server components of the YTsaurus cluster generate detailed logs that can be used for auditing and for analyzing problems during operation. For production installations, we recommend allocating dedicated storage locations on persistent volumes for these logs. The absence of logs can significantly complicate support.
You can use Prometheus metrics with the yt_logging_* prefix to analyze the logging subsystem.
Debugging logs
Debugging logs are described in the loggers section of the YTsaurus component specification.
Table 1 — YTsaurus debug logger settings
Field | Possible values | Description |
---|---|---|
name | arbitrary string | The logger name (we recommend choosing short and clear names like debug or info). |
format | plain_text (default), yson, json | The format of the log string. |
minLogLevel | trace, debug, info, error | The minimum level for records that reach the log. |
categoriesFilter | | A filter that only lets you write logs from some subsystems (see below). |
writerType | file, stderr | Write logs to a file or to stderr. When writing to stderr, the rotation settings are ignored. |
compression | none (default), gzip, zstd | If a value other than none is set, the YTsaurus server writes compressed logs. |
useTimestampSuffix | true, false (default) | If true, a timestamp is added to the file name when it is opened or on rotation; the numbering mechanism doesn't apply to old segments in that case. This option is only relevant when writing to a file. |
rotationPolicy | | Log rotation settings (see below). This option is only relevant when writing to a file. |
The path to the directory for logs with writerType=file is set in the Logs type location description. If no Logs location is specified, logs are written to /var/log.
Log file names follow the format [component].[name].log(.[format])(.[compression])(.[timestamp_suffix]). Examples:
- controller-agent.error.log
- master.debug.log.gzip
- scheduler.info.log.json.zstd.2023-01-01T10:30:00
Debug log entries contain the following fields:
- instant — the time in the local time zone
- level — write level: T (trace), D (debug), I (info), W (warning), E (error)
- category — the name of the subsystem the record belongs to (for example, ChunkClient, ObjectServer, or RpcServer)
- message — message body
- thread_id — the ID or name of the thread that generated the entry (only written in plain_text format)
- fiber_id — the ID of the fiber that generated the record (only written in plain_text format)
- trace_id — the ID of the trace_context within which the record appeared (only written in plain_text format)
Sample entry
2023-09-15 00:00:17,215385 I ExecNode Artifacts prepared (JobId: d15d7d5f-164ff08a-3fe0384-128e0, OperationId: cd56ab80-d21ef5ab-3fe03e8-d05edd49, JobType: Map) Job fff6d4149ccdf656 2bd5c3c9-600a44f5-de721d58-fb905017
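For quick ad-hoc analysis, entries like this can be parsed with a small script. The sketch below assumes the plain_text fields are separated by tabs and uses a hypothetical file name; adjust both to match your installation.

```python
from collections import Counter

LEVELS = {"T": "trace", "D": "debug", "I": "info", "W": "warning", "E": "error"}

def parse_debug_log(path):
    """Yield one dict per entry of a plain_text debug log.

    Assumes the fields described above are tab-separated:
    instant, level, category, message, and optionally
    thread_id, fiber_id, trace_id.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 4:
                continue  # skip continuation lines of multi-line messages
            entry = {
                "instant": parts[0],
                "level": LEVELS.get(parts[1], parts[1]),
                "category": parts[2],
                "message": parts[3],
            }
            # Optional trailing fields: thread_id, fiber_id, trace_id.
            for key, value in zip(("thread_id", "fiber_id", "trace_id"), parts[4:]):
                entry[key] = value
            yield entry

# Example: count warning and error records per category.
counts = Counter(
    e["category"]
    for e in parse_debug_log("master.debug.log")  # hypothetical file name
    if e["level"] in ("warning", "error")
)
print(counts.most_common(10))
```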
Recommendations for configuring categories
There are two types of category filters (categoriesFilter):
- inclusive — records are only written for categories that were explicitly listed
- exclusive — records are written for any categories except those that are listed
In large installations, you often need to exclude the Bus and Concurrency categories.
Sample filters
categoriesFilter:
  type: exclude
  values: ["Bus", "Concurrency"]

categoriesFilter:
  type: include
  values: ["Scheduler", "Strategy"]
Structured logs
Some YTsaurus components can generate structured logs, which you can later use for auditing, analytics, and automatic processing. Structured logs are described in the structured_loggers section of the YTsaurus component specification.
Structured loggers are described using the same fields as debugging logs, except:
- writerType — not set (structured logs are always written to a file)
- categoriesFilter — the required category field is set instead and is equal to exactly one category

Structured logs should always be written in a structured format: JSON or YSON. Events in a structured log are usually recorded at the info level. The set of structured log fields varies depending on the specific log type.
The main types of structured logs:
- master_access_log — data access log (written on the master, Access category)
- master_security_log — log of security events, such as adding a user to a group or modifying an ACL (written on the master, SecurityServer category)
- structured_http_proxy_log — log of requests to the HTTP proxy, one line per request (written on the HTTP proxy, HttpStructuredProxy category)
- chyt_log — log of requests to CHYT, one line per request (written on the HTTP proxy, ClickHouseProxyStructured category)
- structured_rpc_proxy_log — log of requests to the RPC proxy, one line per request (written on the RPC proxy, RpcProxyStructuredMain category)
- scheduler_event_log — scheduler event log, written by the scheduler (SchedulerEventLog category)
- controller_event_log — log of controller agent events, written on the controller agent (ControllerEventLog category)
Table access log
Sometimes, you may need to know who is using a particular table, for example, to evaluate the consequences of moving or deleting it. This can be difficult if the table is accessed through links.
For this, there is a special log that records events involving Cypress nodes that might be of interest to users.
Logs are written by master servers. For technical reasons, several servers produce the same sequence of entries, so duplication is to be expected. In addition, some actions (for example, writing to a table) are represented as a sequence of several different events on different master servers (from different shards). This is covered in more detail below.
Each log entry (table row) contains one command applied to a certain Cypress node.
Only the commands applied to the following types of nodes are recorded:
- Table
- File
- Document
- Journal
It must be noted that directories are not included in this list.
The following commands are recorded:
- Basic (CRUD):
  - Create
  - Get
  - GetKey
  - Exists
  - List
  - Set
  - Remove
- Creating a symbolic link:
  - Link
- Locking:
  - Lock
  - Unlock
- Copying and moving:
  - Copy
  - Move
  - BeginCopy, EndCopy
- Reading and writing data:
  - GetBasicAttributes
  - Fetch
  - BeginUpload
  - EndUpload
- Changing the state of a dynamic table:
  - PrepareMount, CommitMount, AbortMount
  - PrepareUnmount, CommitUnmount, AbortUnmount
  - PrepareRemount, CommitRemount, AbortRemount
  - PrepareFreeze, CommitFreeze, AbortFreeze
  - PrepareUnfreeze, CommitUnfreeze, AbortUnfreeze
  - PrepareReshard, CommitReshard, AbortReshard
- Other:
  - CheckPermission
Below you can find some comments on the command semantics.
Each log entry has certain fields (table columns), which are described in Table 2.
Table 2 — Description of access log fields
Field | Description |
---|---|
instant | Event time in the format YYYY-MM-DD hh:mm:ss,sss |
cluster | Short cluster name |
method | Command (see the list above) |
path (see Note) | Path passed to the command as an argument |
original_path (see Note) | Path passed to the command as an argument |
destination_path (see Note) | Destination path for the Copy, Move, and Link commands (not applicable to other commands) |
original_destination_path (see Note) | Destination path for the Copy, Move, and Link commands (not applicable to other commands) |
user | User who gave the command |
type | Type of the node created with the Create command (not applicable to other commands) |
transaction_info | Information about the transaction where the command was executed (not present if the command was executed outside of a transaction) |
Note

The difference between original_path and path (and likewise between original_destination_path and destination_path) is as follows:

- If a link (symbolic link) was specified as the path, original_path will contain the path to the link, whereas path will contain the actual path to the node.
- If the path leads into a shard, original_path will contain the actual path, while path will contain the path relative to the root of the shard.

Overall, this means that grepping for the path of a symbolic link will return entries containing the actual path to the node, while grepping for the actual path also finds accesses made via symbolic links. The key takeaway is that you should search both by path and by original_path.
The structure of the transaction_info field is shown in Table 3.
Table 3 — Structure of the transaction_info field
Field | Description |
---|---|
transaction_id | Transaction ID |
transaction_title | Human-readable description of the transaction (specified by the client upon transaction start; the field is missing if no description was specified) |
operation_id | ID of the operation associated with the transaction |
operation_title | Human-readable description of the operation associated with the transaction |
parent | For a nested transaction, the description of its parent (for top-level transactions, the field is missing) |
Please note that the parent field is structured the same way as transaction_info. Thus, transaction_info contains the full recursive description of the ancestry of the transaction where the command was run.
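A short sketch of unwinding this recursion (field names as in Table 3; the sample values are illustrative):

```python
def transaction_chain(transaction_info):
    """Return transaction descriptions from the innermost transaction up to
    its top-level ancestor, following the recursive `parent` field.
    """
    chain = []
    node = transaction_info
    while node:
        chain.append({
            "transaction_id": node.get("transaction_id"),
            "transaction_title": node.get("transaction_title"),
            "operation_id": node.get("operation_id"),
            "operation_title": node.get("operation_title"),
        })
        node = node.get("parent")
    return chain

# Example with one level of nesting (values are illustrative, not real IDs).
info = {
    "transaction_id": "1234-abcd-5678-ef00",
    "operation_id": "cd56ab80-d21ef5ab-3fe03e8-d05edd49",
    "parent": {"transaction_id": "4321-dcba-8765-00fe", "transaction_title": "nightly ETL"},
}
for level in transaction_chain(info):
    print(level)
```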
Notes
- The reading and writing of metadata needs to be distinguished from the reading and writing of data (chunks) to tables and files.
- From a master's point of view, data reads and writes look like the following sequences of commands (a sketch for finding completed writes is given after this list):
  - Reading:
    - GetBasicAttributes: getting some service attributes necessary for reading.
    - Fetch: getting the list of chunks that make up the file or table.
  - Writing:
    - GetBasicAttributes: getting some service attributes necessary for writing.
    - BeginUpload: starting the upload transaction.
    - EndUpload: completing the upload transaction.
- When reading or writing data, the GetBasicAttributes command targets one cell, while Fetch, BeginUpload, and EndUpload target another; this is normal.
- In most cases, copying or moving a table appears as the Copy or Move command. The BeginCopy and EndCopy commands are used when copying or moving crosses Cypress sharding boundaries; in practice, such cases are rare.
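A small illustration of the write sequence described above: since every data write ends with EndUpload, that event is a convenient single marker of a completed write. The sketch assumes a JSON-format access log and uses hypothetical file and table names.

```python
import json

def completed_writes(log_path, target):
    """Yield EndUpload events for `target`, i.e. completed data writes.

    A data write appears in the access log as
    GetBasicAttributes + BeginUpload + EndUpload, so filtering on EndUpload
    finds writes that actually finished. Assumes one JSON event per line.
    """
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if event.get("method") != "EndUpload":
                continue
            if target in (event.get("path"), event.get("original_path")):
                yield event

# Hypothetical file and table names.
for event in completed_writes("master.access.log.json", "//home/project/important_table"):
    print(event["instant"], event["user"])
```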
HTTP Proxy request log
This log contains entries for all requests handled by the HTTP proxy.
Table 4 — Description of HTTP proxy request log fields
Field | Description |
---|---|
instant | Event time in the format YYYY-MM-DD hh:mm:ss,sss |
cluster | Short cluster name |
request_id | Identifier of the request |
correlation_id | Special request identifier generated by the client and unchanged across retries |
user | User making the request |
method | HTTP request method |
http_path | HTTP request path |
user_agent | Contents of the User-Agent header of the request |
command | Command |
parameters | Parameters of the command |
path | Value of the path parameter |
error | Structured description of the error if the request failed |
error_code | Error code if the request failed |
http_code | HTTP response code of the request |
start_time | Actual start time of request execution on the proxy |
cpu_time | Time spent by the proxy executing the request (excluding time spent in other cluster components) |
duration | Total duration of the request |
in_bytes | Size of the request data in bytes |
out_bytes | Size of the response data in bytes |
remote_address | Address from which the request originated |
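For example, failed requests can be summarized per user and command with a small script. It assumes the proxy log is written in JSON format (one request per line) with the field names from Table 4, and the file name is hypothetical.

```python
import json
from collections import Counter

def failed_requests_by_user(log_path):
    """Count failed requests per (user, command).

    A request is treated as failed if error_code is present in the event.
    """
    counts = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partially written lines
            if event.get("error_code"):
                counts[(event.get("user"), event.get("command"))] += 1
    return counts

# Hypothetical file name; use the actual structured_http_proxy_log file.
for (user, command), n in failed_requests_by_user("http-proxy.access.log.json").most_common(10):
    print(user, command, n)
```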
Configuring log rotation
For debug and structured logs written to a file, you can configure the built-in rotation mechanism (the rotationPolicy field). The rotation settings are detailed in Table 5. If the useTimestampSuffix option isn't enabled, an index number is appended to the file names of old segments on rotation.
Table 5 — Log rotation settings
Field | Description |
---|---|
rotationPeriodMilliseconds | Rotation period in milliseconds. Can be set together with maxSegmentSize. |
maxSegmentSize | Log segment size limit in bytes. Can be set together with rotationPeriodMilliseconds. |
maxTotalSizeToKeep | Total segment size limit in bytes. At the time of rotation, the oldest segments are deleted to meet the limit. |
maxSegmentCountToKeep | Limit on the number of stored log segments. The oldest segments over the limit are deleted. |
Dynamic configuration
Components that support dynamic configuration let you further refine the logging subsystem settings using the dynamic config in Cypress (the logging section).

Basic parameters:
- enable_anchor_profiling — enables Prometheus metrics for individual record prefixes
- min_logged_message_rate_to_profile — the minimum message frequency for inclusion in a separate metric
- suppressed_messaged — a list of debug log message prefixes to be excluded from logging
Configuration example:
{
    logging = {
        enable_anchor_profiling = %true;
        min_logged_message_rate_to_profile = 100;
        suppressed_messaged = [
            "Skipping out of turn block",
            "Request attempt started",
            "Request attempt acknowledged"
        ];
    }
}
Sample logging settings
primaryMasters:
  ...
  loggers:
    - name: debug
      compression: zstd
      minLogLevel: debug
      writerType: file
      rotationPolicy:
        maxTotalSizeToKeep: 50_000_000_000
        rotationPeriodMilliseconds: 900000
      categoriesFilter:
        type: exclude
        values: ["Bus", "Concurrency", "ReaderMemoryManager"]
    - name: info
      minLogLevel: info
      writerType: file
      rotationPolicy:
        maxTotalSizeToKeep: 10_000_000_000
        rotationPeriodMilliseconds: 900000
    - name: error
      minLogLevel: error
      writerType: stderr
  structuredLoggers:
    - name: access
      minLogLevel: info
      category: Access
      rotationPolicy:
        maxTotalSizeToKeep: 5_000_000_000
        rotationPeriodMilliseconds: 900000
  locations:
    - locationType: Logs
      path: /yt/logs
    - ...
  volumeMounts:
    - name: master-logs
      mountPath: /yt/logs
    - ...
  volumeClaimTemplates:
    - metadata:
        name: master-logs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 100Gi
    - ...