Introduction
YQL (Yandex Query Language) is a universal declarative query language for data storage and processing systems, together with the infrastructure that runs those queries. YQL benefits include:
- A powerful graph execution engine that can build MapReduce pipelines with hundreds of nodes and adapt during computation.
- The ability to build complex data processing pipelines in SQL by storing subqueries in variables and combining them into chains of dependent queries and transactions.
- Predictable parallel execution of queries of any complexity.
- Efficient implementation of joins, subqueries, and window functions with no restrictions on their topology or nesting.
- Extensive function library.
- Support for user-defined functions in C++ and Python.
- Automatic execution of small parts of queries on prepared compute instances, bypassing MapReduce operations to reduce latency.
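The pipeline and window-function points above can be illustrated with a minimal sketch. It assumes an in-memory input built with AsList and AsStruct instead of a real cluster table; the names $events, $totals, user_name, and amount are hypothetical and chosen only for illustration.

```sql
-- Hypothetical in-memory input: a list of structures standing in for a table.
$events = AsList(
    AsStruct("alice" AS user_name, 10 AS amount),
    AsStruct("alice" AS user_name, 25 AS amount),
    AsStruct("bob" AS user_name, 7 AS amount)
);

-- Step 1: a subquery stored in a variable (a named node).
$totals = (
    SELECT user_name, SUM(amount) AS total
    FROM AS_TABLE($events)
    GROUP BY user_name
);

-- Step 2: a dependent query that ranks the aggregated rows with a window function.
SELECT
    user_name,
    total,
    RANK() OVER w AS position
FROM $totals
WINDOW w AS (ORDER BY total DESC);
```

Each step reads from the variable defined before it, so a longer pipeline can be assembled the same way, one named subquery at a time.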
YQL provides a functional web interface where, among other things, you can:
- Write query code.
- Start and stop query execution.
- View query execution results.
- View query history.
How to try
To run your first YQL query:
- Open the web interface of the YTsaurus cluster and go to the Queries tab, which is available from the menu on the left.
- Enter the query and click the start button:
SELECT "Hello, World!";
Glossary
Term | Description |
Query | Program text in YQL. |
Operation | Query execution process. In YQL terminology, a query and an operation relate to each other much like a program and a process do in an operating system. |
Table | Logically, a table is a list of structures. List items form rows, while structure members serve as table cells. Vertically aligned cells constitute a table column. |
Expression | A computed value. Typically, it takes one or more table cells as input, with the output becoming a cell in another table. |
Statement | Query components separated by semicolons and starting with a verb. |
Subquery | A query component that, like a table, can be used as input for statements or other subqueries. Subqueries can optionally be parameterized. |
Named node | A mechanism for reusing expressions and subqueries within one query. |
Lambda | A parameterizable block consisting of one or more named nodes (specifically expressions, not statements) where the result of the last node becomes the result of the entire lambda call. The call is made by passing parameters in parentheses. |
UDF (User-Defined Function) | Functions that let you integrate business logic into a query using one of the supported popular programming languages. C++ UDFs are loaded in compiled form. Since the YQL optimizer can't peek inside the Python interpreter, we recommend using lambda functions and C++ UDFs for better performance. |
Action | A parameterizable block consisting of one or more statements that can then be invoked any number of times using special keywords. Unlike lambda functions, actions don't return any result. |
Library | A query component stored in a separate file for reuse or convenience. |
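Several of the glossary terms can be seen together in one short query. The following is a sketch rather than a reference: the names $users, $greet, $users_from, and $say_hello are hypothetical, and the input is an in-memory list of structures instead of a stored table.

```sql
-- Table: logically a list of structures; AS_TABLE exposes such a list as a table.
-- $users is a named node holding that list (hypothetical sample data).
$users = AsList(
    AsStruct(1 AS id, "alice" AS name),
    AsStruct(2 AS id, "bob" AS name)
);

-- Lambda: a parameterizable block of named nodes; the value after RETURN
-- becomes the result of the call.
$greet = ($name) -> {
    $text = "Hello, " || $name;
    RETURN $text || "!";
};

-- Statement with an expression: the lambda is applied to the name cell of each row.
SELECT id, $greet(name) AS greeting
FROM AS_TABLE($users);

-- Parameterized subquery: usable as input wherever a table is expected.
DEFINE SUBQUERY $users_from($rows, $min_id) AS
    SELECT id, name
    FROM AS_TABLE($rows)
    WHERE id >= $min_id;
END DEFINE;

SELECT * FROM $users_from($users, 2);

-- Action: a parameterizable block of statements that returns no value
-- and is invoked with DO.
DEFINE ACTION $say_hello($name) AS
    SELECT "Hello, " || $name || "!" AS greeting;
END DEFINE;

DO $say_hello("YQL");
```

Passing the row list into the subquery as a parameter keeps every block self-contained, which makes it easier to move such blocks into a library file later.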