FLATTEN
FLATTEN BY
Converts rows in the source table using vertical unpacking of containers of variable length (lists or dictionaries).
For example:
-
Source table:
[a, b, c] 1 [d] 2 [] 3 -
The table resulting from
FLATTEN BYon the left column:a 1 b 1 c 1 d 2
Example
$sample = AsList(
AsStruct(AsList('a','b','c') AS value, CAST(1 AS Uint32) AS id),
AsStruct(AsList('d') AS value, CAST(2 AS Uint32) AS id),
AsStruct(AsList() AS value, CAST(3 AS Uint32) AS id)
);
SELECT value, id FROM as_table($sample) FLATTEN BY (value);
This conversion can be convenient in the following cases:
-
When it is necessary to output statistics by cells from a container column (for example, via
GROUP BY). -
When the cells in a container column store IDs from another table that you want to join with
JOIN.
Syntax
FLATTEN BYis specified afterFROM, but beforeGROUP BY, ifGROUP BYis present in the query.- The type of the result column depends on the type of the source column:
| Container type | Result type | Comments |
|---|---|---|
List<X> |
X |
List cell type |
Dict<X,Y> |
Tuple<X,Y> |
Tuple of two elements containing key-value pairs |
Optional<X> |
X |
The result is almost equivalent to the clause WHERE foo IS NOT NULL, but the foo column type is changed to X |
- By default, the result column replaces the source column. Use
FLATTEN BY foo AS barto keep the source container. As a result, the source container is still available asfooand the output container is available asbar. - To build a Cartesian product of multiple container columns, use the clause
FLATTEN BY (a, b, c). Parentheses are mandatory to avoid grammar conflicts. - Inside
FLATTEN BY, you can only use column names from the input table. To applyFLATTEN BYto the calculation result, use a subquery. - In
FLATTEN BYyou can use both columns and arbitrary named expressions (unlike columns,ASis required in this case). To avoid grammatical ambiguities of the expression afterFLATTEN BY, make sure to use parentheses with the following:... FLATTEN BY (ListSkip(col, 1) AS col) ... - If the source column had nested containers, for example,
List<DictX,Y>,FLATTEN BYunpacks only the outer level. To completely unpack the nested containers, use a subquery.
Note
FLATTEN BY interprets optional data types as lists of length 0 or 1. The table rows with NULL are skipped, and the column type changes to a similar non-optional type.
FLATTEN BY makes only one conversion at a time, so use FLATTEN LIST BY or FLATTEN OPTIONAL BY on optional containers, for example, Optional<List<String>>.
Applying FLATTEN BY to YSON
To apply FLATTEN BY to YSON, you first need to transform it into one of the container types discussed above. For instance, you can do this using the UDF module of the same name from the list of presets.
Specifying the container type
To specify the type of container to convert to, you can use:
-
FLATTEN LIST BYFor
Optional<List<T>>,FLATTEN LIST BYwill unpack the list, treatingNULLas an empty list. -
FLATTEN DICT BYFor
Optional<Dict<T>>,FLATTEN DICT BYwill unpack the dictionary, interpretingNULLas an empty dictionary. -
FLATTEN OPTIONAL BYTo filter the
NULLvalues without serialization, specify the operation by usingFLATTEN OPTIONAL BY.
Examples
SELECT
t.item.0 AS key,
t.item.1 AS value,
t.dict_column AS original_dict,
t.other_column AS other
FROM my_table AS t
FLATTEN DICT BY dict_column AS item;
SELECT * FROM (
SELECT
AsList(1, 2, 3) AS a,
AsList("x", "y", "z") AS b
) FLATTEN LIST BY (a, b);
SELECT * FROM (
SELECT
"1;2;3" AS a,
AsList("x", "y", "z") AS b
) FLATTEN LIST BY (String::SplitToList(a, ";") as a, b);
Analogues of FLATTEN BY in other DBMS
- PostgreSQL:
unnest - Hive:
LATERAL VIEW - MongoDB:
unwind - Google BigQuery:
FLATTEN - ClickHouse:
ARRAY JOIN / arrayJoin
FLATTEN COLUMNS
Transforms each column of type Struct into individual columns, one for each field within the struct. The names of the new columns are the names of the fields from the original struct columns. Columns that are not structs remain unchanged.
- Only one level of the struct is flattened.
- The original struct columns are not included in the result; their names are not used anywhere.
- All column names in the resulting table (including names from struct fields in the original columns and names of non-struct columns) must be unique; name conflicts result in an error.
Example
SELECT x, y, z, not_struct
FROM (
SELECT
AsStruct(
1 AS x,
"foo" AS y),
AsStruct(
false AS z),
1 as not_struct,
) FLATTEN COLUMNS;