Query Tracker
Introduction
Query Tracker is a YTsaurus component that allows working with the system using human-readable queries in SQL-like languages. The user sends queries to Query Tracker, they are executed by the engine, and the execution result is returned to the user.
With Query Tracker, you can:
- Send ad hoc queries for execution.
- Track query execution.
- Save query results.
- View query history.
- Share queries with others.
See the example of working with Query Tracker.
Queries are defined by engine and text. The engine controls the execution of the query. The query text is engine-specific.
In this way, you can process data using different languages and runtime environments whilst honoring the guarantees and performance targets.
Engines
Currently supported execution engines include:
- YT QL
- A query language built in YT. Only supports dynamic tables.
- YQL
- Executes the query on YQL agents, which break the query into individual YT operations (map, reduce, ...), start them, then retrieve and return the result.
- CHYT
- Executes the query on a clique.
- SPYT
- Executes the query on the Spark cluster.
API
All operations accept the optional query_tracker_stage
parameter that can be used to select the Query Tracker installation where you want your queries to be run. The default value is production
.
start_query
Sends the query for execution. Returns the query ID.
Required parameters:
engine
: Query execution engine. Supported values areql
,yql
,chyt
, andspyt
.query
: Query text.
Optional parameters:
files
: List of files for the query in YSON format.settings
: Additional query parameters in YSON format.- In CHYT, you must set a clique alias using the
clique
parameter. Default value is the public clique. - In SPYT, you must set the housekeeping directory of an existing Spark cluster using the
discovery_path
parameter. - In YQL, you can run queries in different modes using the
execution_mode
parameter. Acceptable values arerun
,validate
, andoptimize
. The default value isrun
.
- In CHYT, you must set a clique alias using the
draft
: Used to mark draft queries. These queries are terminated automatically without execution.annotations
: Arbitrary annotations to the query. They can make it easier to search for queries. Specified in YSON format.access_control_objects
: List of objects at//sys/access_control_object_namespaces/queries/
that controls access to the query for other users.
Example: start_query(engine="yql", query="SELECT * FROM my_table", access_control_objects=["my_aco"])
abort_query
Aborts query execution. Doesn't return anything.
Required parameters:
query_id
: Query ID.
Optional parameters:
message
: Abort message.
Example: abort_query(query_id="my_query_id")
.
get_query_result
Returns meta information about query execution results.
Required parameters:
query_id
: Query ID.result_index
: Result ID.
Example: get_query_result(query_id="my_query_id", result_index=0)
.
read_query_result
Returns the query results.
Required parameters:
query_id
: Query ID.result_index
: Result ID.
Optional parameters:
columns
: List of columns to read.lower_row_index
: First row from which to start reading the results.upper_row_index
: Last row at which to stop reading the results.
Example: read_query_result(query_id="my_query_id", result_index=0)
.
get_query
Returns information about the query.
Required parameters:
query_id
: Query ID.
Optional parameters:
attributes
: Filters the returned attributes.timestamp
: Returned query information will be consistent with the system as at the specified point in time.
Example: get_query(query_id="my_query_id")
.
list_queries
Gets a list of queries that match the set filters.
Optional parameters:
from_time
: Lower threshold for the query start time.to_time
: Upper threshold for the query start time.cursor_direction
: Direction in which to sort queries by start time.cursor_time
: Cursor stop time. Only applicable ifcursor_direction
is specified.user_filter
: Filter by query creator.state_filter
: Filter by query state.engine_filter
: Filter by query engine.substr_filter
: Filter by query ID, annotations, and access control object.limit
: Number of items to return. The default value is 100.attributes
: Filters the returned attributes.
Example: list_queries()
.
alter_query
Modifies the query. Doesn't return anything.
Required parameters:
query_id
: Query ID.
Optional parameters:
annotations
: New annotations for the query.access_control_objects
: New access control object list for the query.
Example: alter_query(query_id="my_query_id", access_control_object=["my_new_aco"])
.
Access control
To manage access to queries and their results, the query includes a list of access_control_objects
strings that point to //sys/access_control_object_namespaces/queries/[access_control_object]
.
An Access Control Object (ACO) is an object with the @principal_acl
attribute. It sets access rules in the same manner as @acl
does for Cypress nodes. For more information, see Access control.
You can create an ACO through the user interface or by calling the create command:
yt create access_control_object --attr '{namespace=queries;name=my_aco}'
.
All APIs use ACOs to verify access:
Use
grants access to the query.Read
grants access to the query results.Administer
grants the permission to modify and stop the query.
Detailed information on each API method:
start_query
checks whether the passed ACO exists.alter_query
checks whether the query ACO grants theAdminister
permission to the user and whether the passed ACO exists.get_query
checks whether the query ACO grants theUse
permission to the user.list_queries
checks whether the query ACO grants theUse
permission to the user.get_query_result
checks whether the query ACO grants theRead
permission to the user.read_query_result
checks whether the query ACO grants theRead
permission to the user.abort_query
checks whether the query ACO grants theAdminister
permission to the user.
A few things to keep in mind:
- The query creator always has access to their queries.
- If a query has multiple ACOs, you only need to have access to one of them to get access to the query.
For convenience, "ytsaurus-k8s-operator" creates the following ACOs:
nobody
. Doesn't grant any permissions.everyone
. GrantsUse
andRead
permissions to all users.everyone-use
. GrantsUse
permissions to all users.everyone-share
. The same as "everyone", but is not returned inlist_queries
.
Example
A typical usage scenario for Query Tracker is as follows:
- Sending a query for execution.
start_query(engine="yql", query="SELECT * from //home/me/my_table", access_control_objects=["my_team_aco"])
.
- Getting a list of our available queries.
list_queries()
.
- Getting information about our query.
get_query(query_id="my_query_id")
.
- Waiting for the query to complete.
get_query(query_id="my_query_id").state == "completed"
.
- Getting meta information about the results (result size, table schema).
get_query_result(query_id="my_query_id", result_index=0)
.
- Reading the result.
read_query_result(query_id="my_query_id", result_index=0)
.
- Changing the ACO of the query to make it accessible to everyone.
alter_query(query_id="my_query_id", access_control_objects=["everyone"])
.