How to split traffic
By default, all proxies in a cluster have the default role. This creates a "noisy neighbor" problem: one user running a heavy read operation can overload a proxy and slow down work for everyone else, including web interface requests.
The solution is to split proxies into groups using roles. This lets you distribute load between light and heavy requests and isolate resources for different projects and teams.
This article shows how to:
- split HTTP proxies into "light" (Control) and "heavy" (Data) groups,
- expose those groups through Kubernetes Services and Ingress (including sticky sessions and TLS),
- isolate projects with dedicated RPC proxy groups.
Splitting light and heavy traffic (for HTTP)
In practice, HTTP proxies are usually divided into two functional groups:
- Control proxies (typically with the control role) serve "light" requests: browsing the web interface, directory listing, creating/deleting tables, working with metadata.
- Data proxies (typically with the default role) serve "heavy" requests: reading and writing tables or files.
For high availability, it's recommended to run several proxy replicas per role.
You can set roles in the Ytsaurus specification:
spec:
  httpProxies:
    # Heavy group (accepts Data traffic by default)
    - role: default
      instanceCount: 5
      serviceType: NodePort
    # Control group (UI and metadata)
    - role: control
      instanceCount: 2
      serviceType: LoadBalancer  # Convenient for Ingress
Note
The role name becomes part of the K8s pod name (for example, hp-control-0). Therefore, use only DNS-compatible names (lowercase letters, numbers, and hyphens). The underscore (_) cannot be used.
How it works:
- The operator creates two independent pod groups (StatefulSets) with the labels control and default.
- The operator automatically creates a Kubernetes Service for each group. The http-proxies-control service is configured with a selector that targets only pods with the control label.
- You configure Ingress/DNS to point to the control group's service. Clients automatically get Data proxy addresses through the Discovery mechanism and send heavy requests directly to them.
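As a quick check of the Discovery mechanism, you can ask the control group which Data proxies it advertises. The /hosts endpoint is the usual YTsaurus heavy-proxy discovery handler, and yt-control.example.com is a hypothetical address of the control group's Service or Ingress:

# Request the list of Data proxies that clients will use for heavy requests.
curl -s http://yt-control.example.com/hosts
# The response is a JSON list of proxy addresses; heavy reads and writes go
# to these hosts directly, bypassing the control group.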
Note
If you create a Service or Ingress manually (without the operator), make sure the service's Label Selector has the correct filter for the role label. You can check labels on running pods using the command kubectl get pods --show-labels. Without this filter, the load balancer will send light requests to all proxies in turn, including Data proxies.
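If you do create the Service yourself, a minimal sketch is shown below. The label key and value in the selector are placeholders (operator versions label pods differently), so copy the real role label from the kubectl get pods --show-labels output.

apiVersion: v1
kind: Service
metadata:
  name: http-proxies-control-manual
spec:
  selector:
    # Placeholder label: replace with the actual role label on your control pods,
    # taken from `kubectl get pods --show-labels`.
    yt_role: control
  ports:
    - name: http
      port: 80
      targetPort: 80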
HTTP access and Ingress specifics
For HTTP proxies, Ingress controllers are often used for SSL termination and routing. If Ingress is used only for Control proxies, and Data proxies are exposed via NodePort (as described above), the scheme works efficiently.
If you need to route all traffic through Ingress (including Data), you'll have to solve several problems:
- Double load balancing and reduced efficiency.
- Complex Ingress rules for each Data proxy pod individually (for per-pod routing).
- Configuring sticky sessions for correct transaction handling.
The sticky session problem
When using Ingress for Data proxies, it's important to consider how transactions work. Transactions in YTsaurus are created on a specific proxy, and all subsequent requests within a single transaction must go to that same proxy. Without stickiness, the Ingress controller may route consecutive requests of one transaction to different pods, producing intermittent errors like "Transaction not found" or "Invalid transaction state". To avoid this, configure sticky sessions in your Ingress controller.
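To make the failure mode concrete, here is a minimal sketch using the YTsaurus Python SDK (the proxy address and table path are hypothetical). Both commands inside the with block belong to one transaction, so each request must reach the proxy on which that transaction was created:

# Minimal sketch: a transactional write that must keep hitting the same proxy.
import yt.wrapper as yt

client = yt.YtClient(proxy="yt-data.example.com")  # hypothetical address

with client.Transaction():
    # Both commands carry the same transaction id and, per the text above,
    # must reach the proxy that created the transaction.
    client.create("table", "//tmp/sticky-demo", ignore_existing=True)
    client.write_table("//tmp/sticky-demo", [{"key": 1, "value": "a"}])
# Without sticky sessions, the write may land on another pod and fail with
# "Transaction not found".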
Configuring sticky sessions depends on the Ingress controller used:
- NGINX Ingress: use the annotation nginx.ingress.kubernetes.io/affinity: "cookie"
- Traefik: configure stickiness in the service configuration
- HAProxy Ingress: use haproxy.org/cookie-persistence
Example for NGINX Ingress with sticky sessions
This Ingress configures NGINX to mark the client with a special cookie and always send them to the same proxy pod.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: yt-data-proxies
  annotations:
    # Enable sticky sessions: NGINX will remember which pod to send the client to
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "yt-proxy-id"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "14400"
spec:
  rules:
    - host: yt-data.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: http-proxies-default
                port:
                  number: 80
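A quick way to verify that stickiness is in effect is to look for the affinity cookie in the first response (host name as in the example above):

# The first response from the NGINX Ingress controller should carry the affinity cookie.
curl -sI http://yt-data.example.com/ | grep -i set-cookie
# Expected output contains a cookie named yt-proxy-id; clients that send it back
# are routed to the same Data proxy pod for up to 14400 seconds.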
About TLS
Important
When exposing Data proxies directly (via NodePort), traffic does not pass through Ingress, meaning it's transmitted without TLS encryption.
For high-load installations with no strict encryption requirements, the optimal scheme remains the split one:
- Use Ingress only for Control traffic (for convenient access to the UI, light API requests, and HTTPS).
- Publish Data proxies directly via NodePort or LoadBalancer with the Discovery mechanism configured (traffic will be sent in clear text but with maximum performance).
If security requirements mandate encryption for external clients, there are two options:
- Route all traffic through Ingress (with sticky sessions configured), accepting the potential performance degradation.
- Configure a network load balancer (NLB) in front of the proxies, or enable TLS within the cluster itself (see the sketch below).
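If you choose in-cluster TLS, recent versions of the YTsaurus operator can attach a certificate to an HTTP proxy group. The field names below (transport, httpsSecret) are an assumption about the CRD, so verify them against your operator version; yt-https-cert is a hypothetical kubernetes.io/tls Secret.

spec:
  httpProxies:
    - role: default
      instanceCount: 5
      serviceType: NodePort
      # Assumed CRD fields; check `kubectl explain ytsaurus.spec.httpProxies`
      # for the exact names in your operator version.
      transport:
        httpsSecret:
          name: yt-https-cert  # hypothetical TLS Secret with the proxy certificate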
Project isolation (for RPC)
For RPC proxies, splitting by projects is more common. For example, you can create a dedicated proxy group with the role project-a and grant access to it only to a specific user.
spec:
  rpcProxies:
    # Pods for general needs
    - role: default
      instanceCount: 2
      serviceType: NodePort
    # Dedicated pods for Project A
    - role: project-a
      instanceCount: 2
      serviceType: NodePort
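To actually limit who can use the dedicated group, access to the role object is usually restricted via an ACL. The Cypress path //sys/rpc_proxy_roles/project-a, the use permission, and the project-a-users group below are assumptions; check the admin guide for your YTsaurus version before applying.

# Sketch: stop inheriting the parent ACL and allow only Project A's group to use the role.
yt set //sys/rpc_proxy_roles/project-a/@inherit_acl %false
yt set //sys/rpc_proxy_roles/project-a/@acl '[{action=allow; subjects=[project-a-users]; permissions=[use]}]'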
For a client to start working with a dedicated proxy group, you need to explicitly specify the proxy_role in the application code (or in the CLI settings). If you don't do this, the client will use the default role.
Example for Python
# Example of a Python client hard-bound to a group
import yt.wrapper as yt

client = yt.YtClient(proxy="yt.cluster", config={"proxy_role": "project-a"})
Example for CLI
# Can be passed via environment variable before running scripts
export YT_CONFIG_PATCHES='{proxy_role="project-a"}'
yt list //home
Sticky sessions: a load balancing mechanism where the Ingress controller binds a client to a specific server (pod) for the entire session duration. On the first request, the load balancer gives the client a special cookie tag, which ensures all subsequent requests go to the same pod.