Query Tracker
Introduction
Query Tracker is a YTsaurus component that allows working with the system using human-readable queries in SQL-like languages. The user sends queries to Query Tracker, they are executed by the engine, and the execution result is returned to the user.
With Query Tracker, you can:
- Send ad hoc queries for execution.
- Track query execution.
- Save query results.
- View query history.
- Share queries with others.
Example of working with Query Tracker.
Queries are defined by engine and text. The engine controls the execution of the query. The query text is engine-specific.
In this way, you can process data using different languages and runtime environments whilst honoring the guarantees and performance targets.
Engines
Currently supported execution engines include:
- YT QL
- A query language built in YT. Only supports dynamic tables.
- YQL
- Executes the query on YQL agents, which break the query into individual YT operations (map, reduce, ...), start them, then retrieve and return the result.
- CHYT
- Executes the query on a clique.
- SPYT
- Executes the query on the Spark cluster.
API
All operations accept the optional query_tracker_stage
parameter that can be used to select the Query Tracker installation where you want your queries to be run. The default value is production
.
start_query
Sends the query for execution. Returns the query ID.
Required parameters:
engine
: Query execution engine. Supported values areql
,yql
,chyt
, andspyt
.query
: Query text.
Optional parameters:
files
: List of files for the query in YSON format.settings
: Additional query parameters in YSON format.draft
: Used to mark draft queries. These queries are terminated automatically without execution.annotations
: Arbitrary annotations to the query. They can make it easier to search for queries. Specified in YSON format.access_control_objects
: List of objects at//sys/access_control_object_namespaces/queries/
that controls access to the query for other users.
Example: start_query(engine="yql", query="SELECT * FROM my_table", access_control_objects=["my_aco"])
abort_query
Aborts query execution. Doesn't return anything.
Required parameters:
query_id
: Query ID.
Optional parameters:
message
: Abort message.
Example: abort_query(query_id="my_query_id")
.
get_query_result
Returns meta information about query execution results.
Required parameters:
query_id
: Query ID.result_index
: Result ID.
Example: get_query_result(query_id="my_query_id", result_index=0)
.
read_query_result
Returns the query results.
Required parameters:
query_id
: Query ID.result_index
: Result ID.
Optional parameters:
columns
: List of columns to read.lower_row_index
: First row from which to start reading the results.upper_row_index
: Last row at which to stop reading the results.
Example: read_query_result(query_id="my_query_id", result_index=0)
.
get_query
Returns information about the query.
Required parameters:
query_id
: Query ID.
Optional parameters:
attributes
: Filters the returned attributes.timestamp
: Returned query information will be consistent with the system as at the specified point in time.
Example: get_query(query_id="my_query_id")
.
list_queries
Gets a list of queries that match the set filters.
Optional parameters:
from_time
: Lower threshold for the query start time.to_time
: Upper threshold for the query start time.cursor_direction
: Direction in which to sort queries by start time.cursor_time
: Cursor stop time. Only applicable ifcursor_direction
is specified.user_filter
: Filter by query creator.state_filter
: Filter by query state.engine_filter
: Filter by query engine.substr_filter
: Filter by query ID, annotations, and access control object.limit
: Number of items to return. The default value is 100.attributes
: Filters the returned attributes.
Example: list_queries()
.
alter_query
Modifies the query. Doesn't return anything.
Required parameters:
query_id
: Query ID.
Optional parameters:
annotations
: New annotations for the query.access_control_objects
: New access control object list for the query.
Example: alter_query(query_id="my_query_id", access_control_object=["my_new_aco"])
.
Access control
To manage access to queries and their results, the query includes a list of access_control_objects
strings that point to //sys/access_control_object_namespaces/queries/[access_control_object]
.
An Access Control Object (ACO) is an object with the @principal_acl
attribute. It sets access rules in the same manner as @acl
does for Cypress nodes. For more information, see Access control.
You can create an ACO through the user interface or by calling the create command:
yt create access_control_object --attr '{namespace=queries;name=my_aco}'
.
All APIs use ACOs to verify access:
Use
grants access to the query.Read
grants access to the query results.Administer
grants the permission to modify and stop the query.
Detailed information on each API method:
start_query
checks whether the passed ACO exists.alter_query
checks whether the query ACO grants theAdminister
permission to the user and whether the passed ACO exists.get_query
checks whether the query ACO grants theUse
permission to the user.list_queries
checks whether the query ACO grants theUse
permission to the user.get_query_result
checks whether the query ACO grants theRead
permission to the user.read_query_result
checks whether the query ACO grants theRead
permission to the user.abort_query
checks whether the query ACO grants theAdminister
permission to the user.
A few things to keep in mind:
- The query creator always has access to their queries.
- If a query has multiple ACOs, you only need to have access to one of them to get access to the query.
For convenience, "ytsaurus-k8s-operator" creates the following ACOs:
nobody
. Doesn't grant any permissions.everyone
. GrantsUse
andRead
permissions to all users.everyone-use
. GrantsUse
permissions to all users.everyone-share
. The same as "everyone", but is not returned inlist_queries
.
Example
A typical usage scenario for Query Tracker is as follows:
- Sending a query for execution.
start_query(engine="yql", query="SELECT * from //home/me/my_table", access_control_objects=["my_team_aco"])
.
- Getting a list of our available queries.
list_queries()
.
- Getting information about our query.
get_query(query_id="my_query_id")
.
- Waiting for the query to complete.
get_query(query_id="my_query_id").state == "completed"
.
- Getting meta information about the results (result size, table schema).
get_query_result(query_id="my_query_id", result_index=0)
.
- Reading the result.
read_query_result(query_id="my_query_id", result_index=0)
.
- Changing the ACO of the query to make it accessible to everyone.
alter_query(query_id="my_query_id", access_control_objects=["everyone"])
.