@hamelin.sh/documentation 0.4.12 → 0.4.14
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/main.js +4 -4
- package/package.json +1 -1
package/dist/main.js
CHANGED
@@ -9,7 +9,7 @@ var HAMELIN_DOCUMENTATION = {
"command-reference/let.md": "# `LET`\n\nAdd or modify columns without affecting other columns.\n\n## Syntax\n\n```\nLET assignment [, assignment]* [,]?\n```\n\n## Parameters\n\n- **assignment** - Column assignment in the form `identifier = expression`\n\n## Description\n\nThe `LET` command adds new columns or modifies existing ones while \npreserving all other columns in the dataset. Unlike `SELECT`, which replaces \nthe entire column set, `LET` makes incremental changes to the data structure.\n\nYou can specify multiple assignments in a single `LET` command, separated by \ncommas. Each assignment creates or updates the specified field with the \nresult of evaluating the expression. Expressions can reference any field \navailable at the point where the `LET` command appears in the pipeline.\n\nWhen the identifier already exists as a column, `LET` modifies that column's \nvalues. When the identifier does not exist, `LET` creates a new column with \nthat name.\n\n## Related Commands\n\n- **[SELECT](./select.md)** - Completely redefine output columns",
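For instance, a single `LET` can add several derived columns at once. The sketch below is illustrative only; the `events` dataset and its `end_time`, `start_time`, and `status_code` fields are hypothetical, not part of this package:

```hamelin
FROM events
| LET duration = end_time - start_time,
      is_error = status_code >= 500
```

All other columns of `events` pass through unchanged; `duration` and `is_error` are created, or overwritten if columns with those names already exist.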
"command-reference/limit.md": "# `LIMIT`\n\nRestrict the number of rows returned.\n\n## Syntax\n\n```\nLIMIT expression\n```\n\n## Parameters\n\n- **expression** - Non-negative integer specifying the maximum number of rows to return\n\n## Description\n\nThe `LIMIT` command restricts the output to a specified maximum number of \nrows. The expression must evaluate to a non-negative integer value. When the \ndataset contains fewer rows than the limit, all rows are returned.\n\nYou typically use `LIMIT` in combination with `SORT` to retrieve the top or \nbottom N results from an ordered dataset. When you use it without sorting, \n`LIMIT` returns an arbitrary subset of rows, which may vary between query \nexecutions.\n\nThe limit is applied after all other operations in the pipeline, making it \nuseful for controlling output size while preserving the full computation \ncontext for earlier commands.\n\n## Related Commands\n\n- **[SORT](./sort.md)** - Order rows by expressions (commonly used with LIMIT)\n\n",
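The typical `SORT` + `LIMIT` pairing looks like the sketch below; the dataset and field names are hypothetical, and the standalone `SORT` command is assumed to accept the same `BY` form documented for `MATCH`:

```hamelin
FROM events
| SORT BY duration DESC
| LIMIT 10
```

Without the `SORT`, the same `LIMIT 10` would return an arbitrary ten rows.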
"command-reference/lookup.md": "# `LOOKUP`\n\nCombine datasets using left outer join logic - all original rows are preserved.\n\n## Syntax\n\n```\nLOOKUP fromClause [ON expression]?\n```\n\n## Parameters\n\n- **fromClause** - Either a dataset identifier or an alias assignment (`alias = dataset`)\n- **expression** - Boolean condition defining how rows should match\n\n## Description\n\nThe `LOOKUP` command performs a left outer join operation, combining the \ncurrent dataset with another dataset based on matching conditions you specify \nin the `ON` clause. All rows from the original dataset are preserved in the \nresults, regardless of whether they have matches in the lookup dataset.\n\nFor rows without matches, the looked-up data struct is set to `null`. For \nrows with matches, the looked-up data is nested as a struct to prevent field \nname collisions. By default, the struct uses the name of the lookup dataset, \nbut you can override this using the assignment syntax in the `fromClause`.\n\nWhen you omit the `ON` clause, the lookup becomes a Cartesian product of all \nrows from both datasets. The lookup condition expression can reference fields \nin the current dataset directly and fields in the lookup dataset by name \nusing dot notation (e.g., `users.email`).\n\n## Related Commands\n\n- **[JOIN](./join.md)** - Inner join that only keeps rows with matches",
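A minimal sketch using the alias form of the `fromClause`; the `events` and `users` datasets and their fields are hypothetical:

```hamelin
FROM events
| LOOKUP u = users ON user_id = u.id
```

Matching rows get their user record nested under the `u` struct; rows with no match keep all original columns and get `u` set to `null`.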
-
"command-reference/match.md": '# `MATCH`\n\nFind ordered sequences of events using pattern matching with quantifiers.\n\n## Syntax\n\n```\nMATCH pattern
+
"command-reference/match.md": '# `MATCH`\n\nFind ordered sequences of events using pattern matching with quantifiers.\n\n## Syntax\n\n```\nMATCH pattern+\n [AGG aggExpression [, aggExpression]*] [,]?\n [BY groupClause [, groupClause]*] [,]?\n [SORT [BY]? sortExpression [, sortExpression]*] [,]?\n [WITHIN interval] [,]?\n```\n\n## Parameters\n\n- **pattern** - Named dataset reference with optional quantifier (`*`, `+`, `?`, `{n}`) specifying sequence requirements\n- **aggExpression** - Aggregation expression applied to matched rows (e.g., `total = sum(value)`)\n- **interval** - Time interval specifying the maximum duration for the entire pattern sequence (e.g., `5m`, `1h`)\n- **groupClause** - Field or expression to group pattern matching by\n- **sortExpression** - Field or expression to order results by, with optional `ASC` or `DESC` direction\n\n## Description\n\nThe `MATCH` command finds ordered sequences of events across multiple named \ndatasets using regular expression-style pattern matching. Unlike `WINDOW`, \nwhich performs unordered correlation, `MATCH` requires that events occur in a \nspecific temporal sequence.\n\nYou specify patterns using named datasets (defined with `WITH` clauses) \nfollowed by optional quantifiers. Quantifiers include `*` (zero or more), `+` \n(one or more), `?` (zero or one), `{n}` (exactly n), and `{n,m}` (between n \nand m occurrences).\n\n### Implicit Timestamp Ordering\n\n`MATCH` commands automatically sort by the `timestamp` column unless you explicitly specify a different `SORT` clause. This default ordering ensures that pattern matching operates on temporally ordered event sequences.\n\nIf you explicitly provide a `SORT` clause, the automatic timestamp ordering is disabled, and your custom ordering takes effect.\n\n### Time Constraints with WITHIN\n\nThe `WITHIN` clause constrains the total duration from the start of the first pattern to the end of the last pattern. 
For example, `WITHIN 5m` ensures that the entire pattern sequence completes within 5 minutes. The constraint must be a positive value.\n\nWhen using `WITHIN`:\n- You must specify exactly one `SORT` expression (or use the implicit `timestamp` ordering)\n- The `WITHIN` constraint is measured against the actual `SORT` column, not hardcoded to `timestamp`\n- The `SORT` column type must be compatible with the `WITHIN` type:\n - `TIMESTAMP` sort columns work with `INTERVAL` (e.g., `5m`) or `CALENDAR_INTERVAL` (e.g., `1y`, `3mon`)\n - Numeric sort columns require matching numeric `WITHIN` types (e.g., `INT` sort with `INT` within)\n- If you don\'t specify a `SORT` clause, the implicit `timestamp` ordering is used automatically\n\nThe `BY` clause partitions data for independent pattern matching\nwithin each group.\n\n### Aggregating matched rows with AGG\n\nThe `AGG` clause computes aggregations over the rows that\nparticipated in each matched sequence. This is distinct from\nregular `AGG` \u2014 match aggregation functions only see the rows\nthat were part of the match, not all rows in the time window.\n\nThe following functions are available in `MATCH AGG`:\n\n| Function | Description |\n|----------|-------------|\n| `count()` | Count all matched rows |\n| `count(x)` | Count non-null values of `x` across matched rows |\n| `sum(x)` | Sum numeric values across matched rows |\n| `avg(x)` | Average numeric values across matched rows |\n| `min(x)` | Minimum value across matched rows |\n| `max(x)` | Maximum value across matched rows |\n| `first(x)` | Value of `x` from the first matched row |\n| `last(x)` | Value of `x` from the last matched row |\n\nThese functions share names with the regular aggregation functions\nbut operate exclusively on matched rows. 
You can combine multiple\naggregations in a single `AGG` clause.\n\n## Examples\n\n### Basic Pattern Matching with WITHIN\n\n```hamelin\nWITH login AS (FROM events WHERE event_type = "login"),\n suspicious AS (FROM events WHERE event_type = "suspicious_activity")\n\nFROM events \n| MATCH login suspicious WITHIN 5m\n```\n\nFinds sequences where a login event is followed by suspicious activity within 5 minutes.\n\n### Multiple Patterns with Time Constraint\n\n```hamelin\nWITH file_access AS (FROM events WHERE event_type = "file_access"),\n data_exfil AS (FROM events WHERE event_type = "data_transfer")\n\nFROM events \n| MATCH file_access+ data_exfil BY user_id WITHIN 10m\n```\n\nFinds sequences where one or more file access events are followed by data transfer, all occurring within 10 minutes, grouped by user.\n\n### Aggregating over matched sequences\n\n```hamelin\nWITH failed AS (FROM events WHERE event_type = "login_failed"),\n success AS (FROM events WHERE event_type = "login_success")\n\nFROM events\n| MATCH failed+ success\n AGG attempt_count = count(),\n first_failure = first(timestamp),\n last_failure = last(timestamp)\n BY user_id\n WITHIN 10m\n```\n\nFinds brute force login sequences and computes how many events\nparticipated in each match, along with the timestamps of the first\nand last matched events. The `count()` here counts only the rows\nthat matched the `failed+ success` pattern, not every row in the\n10-minute window.\n\n### Numeric Ordering with WITHIN\n\n```hamelin\nWITH login AS (FROM events WHERE event_type = "login"),\n action AS (FROM events WHERE event_type = "action")\n\nFROM events \n| MATCH login action SORT BY row_number WITHIN 10\n```\n\nFinds sequences where a login is followed by an action within 10 row numbers. The `WITHIN` constraint measures on the `SORT` column (`row_number`), ensuring the distance from first to last event is at most 10.\n\n## Related Commands\n\n- **[WINDOW](./window.md)** - Unordered correlation and aggregation',
"command-reference/nest.md": "# `NEST`\n\nNest all currently defined fields into one sub-struct.\n\n## Syntax\n\n```\nNEST identifier\n```\n\n## Parameters\n\n- **identifier** - The field name to create for the nested structure\n\n## Description\n\nThe `NEST` command takes all currently defined fields and nests them into one \nsub-struct by creating a new field containing a struct. All original fields \nare preserved as properties of the nested struct. This operation is the \ninverse of UNNEST.\n\n## Related Commands\n\n- **[UNNEST](./unnest.md)** - Lifts struct fields into the parent or enclosing result set (inverse operation)",
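For example, assuming a dataset whose rows should be wrapped before further processing (the identifier `payload` is arbitrary):

```hamelin
FROM events
| NEST payload
```

Every field that existed before the command is now a property of the single `payload` struct.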
"command-reference/parse.md": '# `PARSE`\n\nExtract structured data from string fields using anchor parsing.\n\n## Syntax\n\n```\nPARSE [expression] pattern AS? identifier (, identifier)* [NODROP]\n```\n\n## Parameters\n\n- **expression** - Optional source field to parse (defaults to current row context)\n- **pattern** - Anchor pattern string using star (*) characters to mark extraction points\n- **identifier** - Output field names for extracted values\n- **NODROP** - Optional flag to preserve the original source field\n\n## Description\n\nThe `PARSE` command provides a lightweight pattern matching approach that is a\nsimple alternative to complex regular expressions. It extracts structured data\nfrom string fields using anchor parsing with star (*) characters. The pattern\nstring uses literal text as anchors with star characters marking extraction\npoints (e.g., "prefix-*-suffix" extracts the value between the anchors). You\nmust provide as many output identifiers as there are star (*) characters in\nthe pattern. The command creates new fields containing the extracted values.\nBy default, rows that don\'t match the pattern are filtered out. When you\nspecify NODROP, non-matching rows are preserved with all output fields set to\nnull.\n',
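A sketch of anchor parsing; the `logs` dataset, its `message` field, and the log format are hypothetical:

```hamelin
FROM logs
| PARSE message "user=* action=*" AS user, action
```

The pattern contains two stars, so exactly two identifiers (`user`, `action`) are required; rows whose `message` does not match the pattern are dropped unless `NODROP` is added.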
"command-reference/rows.md": "# `ROWS`\n\nInject rows into a pipeline.\n\n## Syntax\n\n```\nROWS expression\n```\n\n## Parameters\n\n- **expression** - An expression that evaluates to an array-of-struct which defines the rows to inject\n\n## Description\n\nThe `ROWS` command injects rows into a pipeline by taking an array-of-struct\nexpression and creating one row for each struct element. This is mostly used\nfor examples or playgrounds and is rarely useful in real queries over larger\ndatasets. The `ROWS` command is functionally equivalent to `UNNEST` of a literal\narray.\n\n## Related Commands\n\n- **[EXPLODE](./explode.md)** - Expand array fields into separate rows (similar row generation behavior)\n- **[UNNEST](./unnest.md)** - Lift struct or array of struct fields into the parent or enclosing result set (functionally equivalent for literal arrays)\n",
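A playground-style sketch; the struct-literal syntax shown (`{field: value}`) is an assumption modeled on the documented `key: value` map syntax, not confirmed by this package:

```hamelin
ROWS [{name: "a", n: 1}, {name: "b", n: 2}]
```

Each struct in the array literal becomes one row with columns `name` and `n`.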
@@ -484,18 +484,18 @@ scoring weights without affecting the overall detection logic.`,
"function-reference/array-functions.md": "# Array Functions\n\nScalar functions for array processing and manipulation that can be used in any expression context.\n\n## `array_distinct(x)`\n\nRemoves duplicate elements from an array.\n\n### Parameters\n\n- **x** - Array expression\n\n### Description\n\nThe `array_distinct()` function returns a new array containing only the unique\nelements from the input array. The order of elements in the result is not\nguaranteed. If the input array is null, the function returns null.\n\n## `any(x)`\n\nTests whether any element in a boolean array is true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `any()` function returns true if at least one element in the boolean array\nis true, false if all elements are false. If the array is empty, it returns\nfalse. If the array contains only null values, it returns null. This function\nperforms logical OR aggregation across array elements.\n\n## `all(x)`\n\nTests whether all elements in a boolean array are true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `all()` function returns true if all elements in the boolean array are true,\nfalse if at least one element is false. If the array is empty, it returns true.\nIf the array contains only null values, it returns null. This function performs\nlogical AND aggregation across array elements.\n\n## `max(x)`\n\nReturns the maximum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `max()` function finds and returns the largest element in the array. For\nnumeric arrays, it returns the numerically largest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically latest value. 
If the array is empty or contains only null\nvalues, it returns null.\n\n## `min(x)`\n\nReturns the minimum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `min()` function finds and returns the smallest element in the array. For\nnumeric arrays, it returns the numerically smallest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically earliest value. If the array is empty or contains only null\nvalues, it returns null.\n\n## `sum(x)`\n\nReturns the sum of all numeric elements in an array.\n\n### Parameters\n\n- **x** - Array of numeric expressions\n\n### Description\n\nThe `sum()` function calculates the sum of all numeric elements in the array.\nNull values are ignored in the calculation. If the array is empty or contains\nonly null values, it returns null. The result type matches the element type\nfor exact numeric types.\n\n## `avg(x)`\n\nReturns the average of all numeric elements in an array.\n\n### Parameters\n\n- **x** - Array of numeric expressions\n\n### Description\n\nThe `avg()` function calculates the arithmetic mean of all numeric elements in\nthe array. Null values are ignored in the calculation. If the array is empty or\ncontains only null values, it returns null. The result is always a floating-point\ntype regardless of the input element type. This function divides the sum of all\nnon-null elements by the count of non-null elements.\n\n## `len(x)`\n\nReturns the number of elements in an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `len()` function returns the number of elements in the array as an integer.\nThis includes null elements in the count. If the array itself is null, the\nfunction returns null. 
An empty array returns 0.\n\n## `filter_null(x)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null.\n\n## `slice(array, start, end)`\n\nExtracts a portion of an array between two indices.\n\n### Parameters\n\n- **array** - Array expression of any element type\n- **start** - Integer expression for the starting index (0-based, supports negative indices)\n- **end** - Integer expression for the ending index (exclusive, supports negative indices)\n\n### Description\n\nThe `slice()` function returns a new array containing elements from the start\nindex up to but not including the end index. Both indices are 0-based and support\nnegative values, where -1 refers to the last element, -2 to the second-last, and\nso on. If start is greater than or equal to end, an empty array is returned. The\nfunction handles out-of-bounds indices gracefully by clamping them to valid array\nboundaries.\n\n## `split(string, delimiter)`\n\nSplits a string into an array of substrings.\n\n### Parameters\n\n- **string** - String expression to split\n- **delimiter** - String expression used as the separator\n\n### Description\n\nThe `split()` function divides a string into an array of substrings based on the\nspecified delimiter. The delimiter itself is not included in the resulting array\nelements. If the delimiter is not found in the string, the function returns an\narray containing the original string as its only element. An empty delimiter\nresults in an error. 
Consecutive delimiters produce empty strings in the result.\n\n## `array_join(array, delimiter)` / `array_join(array, delimiter, null_replacement)`\n\nJoins array elements into a single string.\n\n### Parameters\n\n- **array** - Array of string expressions\n- **delimiter** - String expression to place between elements\n- **null_replacement** (optional) - String expression to use for null elements\n\n### Description\n\nThe `array_join()` function concatenates all elements of a string array into a\nsingle string, placing the delimiter between each element. By default, null\nelements are skipped. When you provide a null_replacement parameter, null\nelements are replaced with that string before joining. This is the inverse of\nthe `split()` function and is useful for creating delimited strings from arrays.\n\n## `flatten(x)`\n\nFlattens a nested array by one level.\n\n### Parameters\n\n- **x** - Array of arrays expression\n\n### Description\n\nThe `flatten()` function takes an array of arrays and returns a single array\ncontaining all elements from the nested arrays. It only flattens one level deep,\nso arrays nested more deeply remain as array elements. The order of elements is\npreserved, with elements from earlier arrays appearing before elements from later\narrays. If the input contains null arrays, they are skipped. This is useful for\ncombining multiple arrays or processing results from operations that return arrays.",
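Several of these functions compose naturally in a pipeline; the `logs` dataset and `path` field below are hypothetical:

```hamelin
FROM logs
| LET parts = split(path, "/")
| LET depth = len(parts),
      dotted = array_join(parts, ".")
```

`split` produces the array, and the second `LET` derives scalar and string views of it.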
"function-reference/conditional-functions.md": "# Conditional Functions\n\nScalar functions for conditional logic and branching that can be used in any expression context.\n\n## `if(condition, then)` / `if(condition, then, else)`\n\nReturns different values based on a boolean condition.\n\n### Parameters\n\n- **condition** - Boolean expression to evaluate\n- **then** - Expression to return when condition is true\n- **else** (optional) - Expression to return when condition is false\n\n### Description\n\nThe `if()` function evaluates the condition and returns the `then` expression\nif the condition is true. When used with two parameters, it returns null if\nthe condition is false. When used with three parameters, it returns the `else`\nexpression if the condition is false.\n\nBoth the `then` and `else` expressions must be of the same type when the\nthree-parameter form is used. The function provides a concise way to implement\nconditional logic within expressions.\n\n## `case(when: then, when: then, ...)`\n\nReturns values based on multiple conditions evaluated in order.\n\n### Parameters\n\n- **when: then** - Variable number of condition-value pairs\n\n### Description\n\nThe `case()` function evaluates multiple condition-value pairs in order and\nreturns the value associated with the first condition that evaluates to true.\nUnlike SQL's CASE WHEN syntax, Hamelin uses function syntax with colon-separated\npairs.\n\nEach condition must be a boolean expression, and all values must be of the same\ntype. If no condition matches, the function returns null. The conditions are\nevaluated in the order they appear, so earlier conditions take precedence.\n\n## `coalesce(...)`\n\nReturns the first non-null value from a list of expressions.\n\n### Parameters\n\n- **...** - Variable number of expressions of the same type\n\n### Description\n\nThe `coalesce()` function evaluates expressions from left to right and returns\nthe first expression that is not null. 
If all expressions are null, it returns\nnull. All expressions must be of the same type.\n\nThis function is commonly used for providing default values or handling null\nvalues in expressions. It's particularly useful when you want to fall back\nthrough a series of potentially null values to find the first valid one.",
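A sketch combining these functions; the dataset and field names are hypothetical:

```hamelin
FROM events
| LET severity = case(status_code >= 500: "error",
                      status_code >= 400: "warn")
| LET label = coalesce(severity, "ok")
```

`case` yields `null` for status codes below 400, and `coalesce` then falls back to `"ok"`.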
"function-reference/data-structure-functions.md": "# Data Structure Functions\n\nScalar functions for data structure operations and type information that can be used in any expression context.\n\n## `typeof(x)`\n\nReturns type information for any expression.\n\n### Parameters\n\n- **x** - Expression of any type\n\n### Description\n\nThe `typeof()` function returns a struct containing detailed type information\nabout the input expression. The result includes both the Hamelin type name\nand the corresponding SQL type name. This function is useful for debugging,\ntype introspection, and understanding how Hamelin maps types to the underlying\nSQL engine.\n\n## `map(keys, values)`\n\nCreates a map from separate key and value arrays.\n\n### Parameters\n\n- **keys** - Array expression containing map keys\n- **values** - Array expression containing map values\n\n### Description\n\nThe `map()` function creates a map by pairing elements from the keys array\nwith elements from the values array. Both arrays must have the same length.\nThe nth element from the keys array is paired with the nth element from the\nvalues array. If the arrays have different lengths, an error is raised.\n\n## `map(elements)`\n\nCreates a map from an array of key-value tuples.\n\n### Parameters\n\n- **elements** - Array of tuples where each tuple contains a key and value\n\n### Description\n\nThe `map()` function creates a map from an array of key-value pairs represented\nas tuples. Each tuple in the array must contain exactly two elements: the first\nelement becomes the key, and the second element becomes the value. This format\nis useful when you have structured key-value data.\n\n## `map()`\n\nCreates an empty map.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `map()` function creates an empty map with unknown key and value types.\nThis is useful for initializing map variables or as a starting point for\nmap operations. 
The specific key and value types are inferred from subsequent\nusage context.\n\n## `map(key: value, ...)`\n\nCreates a map from literal key-value pairs.\n\n### Parameters\n\n- **key: value** - Variable number of key-value pairs using colon syntax\n\n### Description\n\nThe `map()` function creates a map from explicitly specified key-value pairs\nusing Hamelin's colon syntax. Each key must be unique within the map. All keys\nmust be of the same type, and all values must be of the same type. This provides\na concise way to create maps with known literal values.\n\n## `map_keys(map)`\n\nExtracts all keys from a map as an array.\n\n### Parameters\n\n- **map** - Map expression\n\n### Description\n\nThe `map_keys()` function returns an array containing all keys from the input\nmap. The order of keys in the resulting array is not guaranteed. If the map\nis empty, it returns an empty array. If the map is null, the function returns null.\n\n## `map_values(map)`\n\nExtracts all values from a map as an array.\n\n### Parameters\n\n- **map** - Map expression\n\n### Description\n\nThe `map_values()` function returns an array containing all values from the input\nmap. The order of values in the resulting array corresponds to the order of keys\nreturned by `map_keys()`, though this order is not guaranteed across calls. If the\nmap is empty, it returns an empty array. If the map is null, the function returns\nnull. This is useful for extracting and processing all values from a map structure.\n\n## `parse_json(json)`\n\nParses a JSON string into a variant type.\n\n### Parameters\n\n- **json** - String expression containing valid JSON\n\n### Description\n\nThe `parse_json()` function parses a JSON string and returns the result as\na variant type that can represent any JSON structure including objects, arrays,\nstrings, numbers, booleans, and null values. If the input string is not valid\nJSON, an error is raised. 
The variant type preserves the original JSON structure\nand allows dynamic access to nested elements.\n\n## `parse_json(variant)`\n\nReturns a variant value unchanged (identity function for variants).\n\n### Parameters\n\n- **variant** - Variant expression\n\n### Description\n\nWhen `parse_json()` is called with a variant input, it simply returns the\nvariant unchanged. This overload allows `parse_json()` to be safely used\non values that might already be variants without causing errors or unnecessary\nconversions.\n\n## `to_json_string(json)`\n\nConverts a variant to its JSON string representation.\n\n### Parameters\n\n- **json** - Variant expression containing JSON data\n\n### Description\n\nThe `to_json_string()` function converts a variant type back into a JSON string.\nThis is the inverse of `parse_json()`, allowing you to serialize structured data\nback to JSON format. The resulting string is properly formatted JSON that can be\nstored, transmitted, or parsed by other systems. Complex nested structures are\npreserved, and the output follows standard JSON formatting rules.\n\n## `len(collection)`\n\nReturns the number of elements in a collection.\n\n### Parameters\n\n- **collection** - Array or map expression\n\n### Description\n\nThe `len()` function returns the number of elements in arrays or maps as an\ninteger. For arrays, it counts all elements including null values. For maps,\nit counts the number of key-value pairs. If the collection is null, the\nfunction returns null. Empty collections return 0.\n\n## `filter_null(array)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **array** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null. 
This function is essential for cleaning\ndata before further processing.",
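A round-trip sketch; the `raw_body` field is hypothetical:

```hamelin
FROM events
| LET payload = parse_json(raw_body)
| LET round_trip = to_json_string(payload)
```

`parse_json` produces a variant, and `to_json_string` serializes it back; calling `parse_json` on `payload` again would be a harmless identity operation per the variant overload above.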
-
"function-reference/match-group-functions.md": "# Match
+
"function-reference/match-group-functions.md": "# Match Functions\n\nFunctions that you can only use with the `MATCH` command. These\ninclude accessor functions for navigating matched events and\naggregate functions for computing values across matched sequences.\n\n## Accessor functions\n\nUse these functions to access specific events within a matched\nsequence.\n\n### `first(expression)` / `first(expression, offset)`\n\nReturns the value of an expression from the first event in a\nmatch group.\n\n#### Parameters\n\n- **expression** - Expression to evaluate from the first event\n- **offset** (optional) - Integer specifying which occurrence\n to access (default: 0)\n\n#### Description\n\nThe `first()` function retrieves the value of the specified\nexpression from the first event in the current match group. When\nused with the offset parameter, it returns the value from the\nfirst + offset event. This function is commonly used to access\ntimestamps, field values, or calculated expressions from the\nbeginning of a matched event sequence.\n\n### `last(expression)` / `last(expression, offset)`\n\nReturns the value of an expression from the last event in a\nmatch group.\n\n#### Parameters\n\n- **expression** - Expression to evaluate from the last event\n- **offset** (optional) - Integer specifying which occurrence\n to access (default: 0)\n\n#### Description\n\nThe `last()` function retrieves the value of the specified\nexpression from the last event in the current match group. When\nused with the offset parameter, it returns the value from the\nlast - offset event. 
This function is commonly used to measure\ndurations, access final states, or extract values from the end\nof a matched event sequence.\n\n### `prev(expression)`\n\nReturns the value of an expression from the previous event in\nthe sequence.\n\n#### Parameters\n\n- **expression** - Expression to evaluate from the previous event\n\n#### Description\n\nThe `prev()` function retrieves the value of the specified\nexpression from the event immediately preceding the current event\nin the match sequence. This function provides access to the\nprevious event's state, enabling comparisons and calculations\nthat depend on sequential relationships between events.\n\n### `next(expression)`\n\nReturns the value of an expression from the next event in the\nsequence.\n\n#### Parameters\n\n- **expression** - Expression to evaluate from the next event\n\n#### Description\n\nThe `next()` function retrieves the value of the specified\nexpression from the event immediately following the current event\nin the match sequence. This function enables forward-looking\nanalysis and calculations that depend on subsequent events in the\npattern.\n\n## Aggregate functions\n\nUse these functions in the `MATCH AGG` clause to aggregate values\nacross all rows that participated in a matched sequence. They only\nsee events that were part of the match, not all events in the time\nwindow.\n\n### `count()` / `count(x)`\n\nCounts events in a matched sequence.\n\n#### Parameters\n\n- **x** (optional) - Expression to count non-null values of\n\n#### Description\n\nWhen called with no arguments, `count()` returns the total number\nof events that participated in the matched sequence. 
When called\nwith an expression, it counts only the matched rows where that\nexpression is non-null.\n\n### `sum(x)`\n\nSums numeric values across a matched sequence.\n\n#### Parameters\n\n- **x** - Numeric expression (int, long, float, or double)\n\n#### Description\n\nThe `sum()` function computes the sum of a numeric expression\nacross all events that participated in the match. Null values are\nignored. The return type is always `double`.\n\n### `avg(x)`\n\nAverages numeric values across a matched sequence.\n\n#### Parameters\n\n- **x** - Numeric expression (int, long, float, or double)\n\n#### Description\n\nThe `avg()` function computes the arithmetic mean of a numeric\nexpression across all events that participated in the match. Null\nvalues are ignored. The return type is always `double`.\n\n### `min(x)`\n\nReturns the minimum value across a matched sequence.\n\n#### Parameters\n\n- **x** - Numeric, string, or timestamp expression\n\n#### Description\n\nThe `min()` function returns the smallest value of the given\nexpression across all events that participated in the match. For\nnumeric values it returns the numerically smallest, for strings\nit uses lexicographic ordering, and for timestamps it returns the\nearliest. The return type matches the input type.\n\n### `max(x)`\n\nReturns the maximum value across a matched sequence.\n\n#### Parameters\n\n- **x** - Numeric, string, or timestamp expression\n\n#### Description\n\nThe `max()` function returns the largest value of the given\nexpression across all events that participated in the match. For\nnumeric values it returns the numerically largest, for strings\nit uses lexicographic ordering, and for timestamps it returns the\nlatest. The return type matches the input type.\n",
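Building on the brute-force example from the `MATCH` reference, the aggregate functions can measure the span of a match; subtracting two timestamps to obtain a duration is an assumption here, not confirmed by this package:

```hamelin
WITH failed AS (FROM events WHERE event_type = "login_failed"),
     success AS (FROM events WHERE event_type = "login_success")

FROM events
| MATCH failed+ success
    AGG attempts = count(),
        span = last(timestamp) - first(timestamp)
    BY user_id
    WITHIN 10m
```

`first(timestamp)` and `last(timestamp)` see only the rows that matched `failed+ success`, so `span` measures the matched sequence, not the whole 10-minute window.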
|
|
488
488
|
"function-reference/mathematical-functions.md": "# Mathematical Functions\n\nScalar functions for mathematical operations and calculations that can be used in any expression context.\n\n## `abs(x)`\n\nReturns the absolute value of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `abs()` function returns the absolute value (magnitude) of the input number,\nremoving any negative sign. For positive numbers and zero, it returns the value\nunchanged. For negative numbers, it returns the positive equivalent.\n\n## `cbrt(x)`\n\nReturns the cube root of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `cbrt()` function calculates the cube root of the input value. The result\nis always returned as a double-precision floating-point number. Unlike square\nroot, cube root is defined for negative numbers.\n\n## `ceil(x)` / `ceiling(x)`\n\nRounds a number up to the nearest integer.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `ceil()` and `ceiling()` functions round the input value up to the nearest\ninteger. For positive numbers, this means rounding away from zero. For negative\nnumbers, this means rounding toward zero. Both function names are equivalent.\n\n## `degrees(x)`\n\nConverts radians to degrees.\n\n### Parameters\n\n- **x** - Numeric expression representing an angle in radians\n\n### Description\n\nThe `degrees()` function converts an angle from radians to degrees. The result\nis always returned as a double-precision floating-point number. The conversion\nuses the formula: degrees = radians \xD7 (180/\u03C0).\n\n## `e()`\n\nReturns Euler's number (mathematical constant e).\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `e()` function returns the mathematical constant e (approximately 2.71828),\nwhich is the base of natural logarithms. 
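For example:\n\n```hamelin\ne()\n# Returns: approximately 2.71828\n```\n\n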
The result is returned as a\ndouble-precision floating-point number.\n\n## `exp(x)`\n\nReturns e raised to the power of x.\n\n### Parameters\n\n- **x** - Numeric expression representing the exponent\n\n### Description\n\nThe `exp()` function calculates e^x, where e is Euler's number. This is the\nexponential function, which is the inverse of the natural logarithm. The result\nis always returned as a double-precision floating-point number.\n\n## `floor(x)`\n\nRounds a number down to the nearest integer.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `floor()` function rounds the input value down to the nearest integer. For\npositive numbers, this means rounding toward zero. For negative numbers, this\nmeans rounding away from zero.\n\n## `ln(x)`\n\nReturns the natural logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `ln()` function calculates the natural logarithm (base e) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. The result is always returned as a double-precision floating-point number.\n\n## `log(b, x)`\n\nReturns the logarithm of x with the specified base.\n\n### Parameters\n\n- **b** - Numeric expression representing the logarithm base\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log()` function calculates the logarithm of x using the specified base b.\nBoth the base and the value must be positive. The result is always returned as\na double-precision floating-point number.\n\n## `log10(x)`\n\nReturns the base-10 logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log10()` function calculates the common logarithm (base 10) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. 
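For example:\n\n```hamelin\nlog10(1000)\n# Returns: 3.0\n```\n\n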
The result is always returned as a double-precision floating-point number.\n\n## `log2(x)`\n\nReturns the base-2 logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log2()` function calculates the binary logarithm (base 2) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. The result is always returned as a double-precision floating-point number.\n\n## `pi()`\n\nReturns the mathematical constant \u03C0 (pi).\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `pi()` function returns the mathematical constant \u03C0 (approximately 3.14159),\nwhich represents the ratio of a circle's circumference to its diameter. The\nresult is returned as a double-precision floating-point number.\n\n## `pow(x, p)` / `power(x, p)`\n\nRaises a number to the specified power.\n\n### Parameters\n\n- **x** - Numeric expression representing the base\n- **p** - Numeric expression representing the exponent\n\n### Description\n\nThe `pow()` and `power()` functions calculate x raised to the power of p (x^p).\nBoth function names are equivalent. The result is always returned as a\ndouble-precision floating-point number.\n\n## `radians(x)`\n\nConverts degrees to radians.\n\n### Parameters\n\n- **x** - Numeric expression representing an angle in degrees\n\n### Description\n\nThe `radians()` function converts an angle from degrees to radians. The result\nis always returned as a double-precision floating-point number. 
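For example:\n\n```hamelin\nradians(180)\n# Returns: approximately 3.14159 (\u03C0)\n```\n\n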
The conversion\nuses the formula: radians = degrees \xD7 (\u03C0/180).\n\n## `round(x)` / `round(x, d)`\n\nRounds a number to the nearest integer or specified decimal places.\n\n### Parameters\n\n- **x** - Numeric expression to round\n- **d** (optional) - Integer specifying the number of decimal places\n\n### Description\n\nThe `round()` function rounds the input value to the nearest integer when used\nwith one parameter, or to the specified number of decimal places when used with\ntwo parameters. The rounding follows standard mathematical rules (0.5 rounds up).\n\n## `sign(x)`\n\nReturns the sign of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `sign()` function returns -1 for negative numbers, 0 for zero, and 1 for\npositive numbers. This function helps determine the sign of a value without\nregard to its magnitude.\n\n## `sqrt(x)`\n\nReturns the square root of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be non-negative)\n\n### Description\n\nThe `sqrt()` function calculates the square root of the input value. The input\nmust be non-negative; negative values will result in an error. The result is\nalways returned as a double-precision floating-point number.\n\n## `truncate(x)`\n\nRemoves the fractional part of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `truncate()` function removes the fractional part of a number, effectively\nrounding toward zero. 
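For example:\n\n```hamelin\ntruncate(3.7)\n# Truncates to 3\n\ntruncate(-3.7)\n# Truncates to -3 (toward zero)\n```\n\n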
For positive numbers, this is equivalent to `floor()`.\nFor negative numbers, this is equivalent to `ceil()`.\n\n## `width_bucket(x, bound1, bound2, n)`\n\nReturns the bucket number for a value in a histogram with equal-width buckets.\n\n### Parameters\n\n- **x** - Numeric expression representing the value to bucket\n- **bound1** - Numeric expression representing the lower bound\n- **bound2** - Numeric expression representing the upper bound \n- **n** - Integer expression representing the number of buckets\n\n### Description\n\nThe `width_bucket()` function determines which bucket a value falls into when\ndividing the range between bound1 and bound2 into n equal-width buckets. Values\noutside the bounds return 0 (below bound1) or n+1 (above bound2).\n\n## `width_bucket(x, bins)`\n\nReturns the bucket number for a value using explicitly defined bucket boundaries.\n\n### Parameters\n\n- **x** - Numeric expression representing the value to bucket\n- **bins** - Array of numeric values representing bucket boundaries\n\n### Description\n\nThe `width_bucket()` function determines which bucket a value falls into using\nan array of explicitly defined bucket boundaries. The function returns the\nindex of the bucket where the value belongs, with 0 for values below the\nlowest boundary and array length + 1 for values above the highest boundary.",
|
|
489
489
|
"function-reference/network-functions.md": "# Network functions\n\nFunctions for working with IP addresses, CIDR ranges, and network operations.\n\n## `cidr_contains(cidr, ip)`\n\nTests whether an IP address falls within a CIDR range.\n\n### Parameters\n\n- **cidr** - String expression representing a CIDR range (e.g., \"192.168.1.0/24\" or \"2001:db8::/32\")\n- **ip** - String expression representing an IP address to test\n\n### Returns\n\n- `true` - The IP address is within the CIDR range\n- `false` - The IP address is outside the CIDR range\n- `null` - Invalid CIDR format, invalid IP format, or either parameter is null\n\n### Description\n\nThe `cidr_contains()` function checks whether the specified IP address falls\nwithin the given CIDR range. It supports both IPv4 and IPv6 addresses and CIDR\nnotations. The function returns `null` for invalid inputs rather than raising\nan error, making it safe to use when processing untrusted network data.\n\n### Examples\n\nCheck if specific IP addresses fall within a CIDR range. The function works\nwith both IPv4 and IPv6 addresses, and returns `true` when the IP is within\nthe range, `false` when outside, or `null` for invalid inputs.\n\n```hamelin\ncidr_contains('192.168.1.0/24', '192.168.1.100')\n# Returns: true\n\ncidr_contains('2001:db8::/32', '2001:db8::1')\n# Returns: true\n\ncidr_contains('10.0.0.0/8', '192.168.1.1')\n# Returns: false\n\ncidr_contains('not_a_cidr', '192.168.1.1')\n# Returns: null\n```\n\nFilter security events to show only traffic originating from internal network\nranges. 
This pattern is common in security analytics when identifying which\nevents came from inside versus outside the network perimeter.\n\n```hamelin\nFROM security_events\n| WHERE cidr_contains('10.0.0.0/8', source_ip)\n| SELECT timestamp, source_ip, action\n```\n\n## `is_ipv4(ip)`\n\nTests whether a string is a valid IPv4 address.\n\n### Parameters\n\n- **ip** - String expression to test\n\n### Returns\n\n- `true` - The string is a valid IPv4 address\n- `false` - The string is a valid IPv6 address\n- `null` - The string is not a valid IP address, empty string, or null\n\n### Description\n\nThe `is_ipv4()` function validates whether the input string represents a valid\nIPv4 address. It returns `true` for valid IPv4 addresses, `false` for valid\nIPv6 addresses, and `null` for invalid or non-IP strings. This three-valued\nlogic distinguishes between \"definitely IPv4\", \"definitely not IPv4 but valid\nIPv6\", and \"not a valid IP at all\".\n\nNote that IPv4-mapped IPv6 addresses (like `::ffff:192.168.1.1`) are\nnormalized to their IPv4 representation and return `true`, since they\nrepresent IPv4 addresses.\n\n### Examples\n\nTest individual IP addresses to determine their type. The function returns\n`true` for IPv4, `false` for IPv6, and `null` for anything that isn't a valid\nIP address.\n\n```hamelin\nis_ipv4('192.168.1.100')\n# Returns: true\n\nis_ipv4('2001:db8::1')\n# Returns: false\n\nis_ipv4('not_an_ip')\n# Returns: null\n```\n\nFilter network logs to show only events with IPv4 source addresses. This is\nuseful when analyzing traffic patterns or building reports that need to\nseparate IPv4 and IPv6 connections.\n\n```hamelin\nFROM network_logs\n| WHERE is_ipv4(source_ip)\n| SELECT timestamp, source_ip, destination_ip\n```\n\nCount how many connections use IPv4 versus IPv6. 
The AGG command groups by\nthe result of `is_ipv4()`, creating separate counts for true (IPv4), false\n(IPv6), and null (invalid addresses).\n\n```hamelin\nFROM network_logs\n| AGG ipv4_count = count() BY is_ipv4(source_ip)\n```\n\n## `is_ipv6(ip)`\n\nTests whether a string is a valid IPv6 address.\n\n### Parameters\n\n- **ip** - String expression to test\n\n### Returns\n\n- `true` - The string is a valid IPv6 address\n- `false` - The string is a valid IPv4 address\n- `null` - The string is not a valid IP address, empty string, or null\n\n### Description\n\nThe `is_ipv6()` function validates whether the input string represents a valid\nIPv6 address. It returns `true` for valid IPv6 addresses, `false` for valid\nIPv4 addresses, and `null` for invalid or non-IP strings. This three-valued\nlogic distinguishes between \"definitely IPv6\", \"definitely not IPv6 but valid\nIPv4\", and \"not a valid IP at all\".\n\nNote that IPv4-mapped IPv6 addresses (like `::ffff:192.168.1.1`) are\nnormalized to their IPv4 representation and return `false`, since they\nultimately represent IPv4 addresses.\n\n### Examples\n\nTest individual IP addresses to determine their type. The function recognizes\nall IPv6 formats including compressed notation and the loopback address.\n\n```hamelin\nis_ipv6('2001:db8::1')\n# Returns: true\n\nis_ipv6('::1')\n# Returns: true\n\nis_ipv6('192.168.1.100')\n# Returns: false\n\nis_ipv6('not_an_ip')\n# Returns: null\n```\n\nFilter network logs to show only events with IPv6 source addresses. This lets\nyou analyze IPv6-specific traffic or build reports segmented by IP version.\n\n```hamelin\nFROM network_logs\n| WHERE is_ipv6(source_ip)\n| SELECT timestamp, source_ip, destination_ip\n```\n\nClassify IP addresses by version using a `case` expression. 
The LET command\ncreates a new field that labels each address as IPv4, IPv6, or Invalid based\non validation results, then aggregates the counts.\n\n```hamelin\nFROM network_logs\n| LET ip_version = case(\n is_ipv4(source_ip): \"IPv4\",\n is_ipv6(source_ip): \"IPv6\",\n \"Invalid\"\n )\n| AGG count() BY ip_version\n```\n\n## Working with network data\n\nDetect external connections originating from internal hosts by combining\nmultiple CIDR checks. The first WHERE filters for internal source addresses,\nwhile the second excludes internal destinations, leaving only outbound\nconnections to external networks.\n\n```hamelin\nFROM network_connections\n| WHERE cidr_contains('10.0.0.0/8', source_ip)\n| WHERE NOT cidr_contains('10.0.0.0/8', destination_ip)\n| SELECT\n timestamp,\n source_ip,\n destination_ip,\n is_ipv4(destination_ip) AS dest_is_ipv4\n```\n\nAnalyze DNS query patterns by IP version. The `case` expression classifies\nresolved IP addresses as A records (IPv4) or AAAA records (IPv6), then counts\nqueries by type to show adoption trends.\n\n```hamelin\nFROM dns_queries\n| LET query_type = case(\n is_ipv4(resolved_ip): \"A\",\n is_ipv6(resolved_ip): \"AAAA\",\n \"OTHER\"\n )\n| AGG query_count = count() BY query_type\n| SORT query_count DESC\n```\n",
|
|
490
490
|
"function-reference/regular-expression-functions.md": "# Regular Expression Functions\n\nScalar functions for pattern matching and advanced text processing using regular expressions.\n\n## `regexp_count(string, pattern)`\n\nCounts the number of times a regular expression pattern matches in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n\n### Description\n\nThe `regexp_count()` function returns the number of non-overlapping matches of\nthe specified regular expression pattern within the input string. If no matches\nare found, it returns 0. The pattern uses standard regular expression syntax.\n\n## `regexp_extract_all(string, pattern)` / `regexp_extract_all(string, pattern, group)`\n\nExtracts all matches of a regular expression pattern from a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **group** (optional) - Integer specifying which capture group to extract\n\n### Description\n\nThe `regexp_extract_all()` function returns an array containing all matches of\nthe specified pattern. When used with two parameters, it returns the entire\nmatch. When used with three parameters, it returns the specified capture group\nfrom each match.\n\nIf no matches are found, it returns an empty array. 
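For example, extracting the first capture group from every match:\n\n```hamelin\nregexp_extract_all('id=7 id=9', 'id=([0-9]+)', 1)\n# An array containing '7' and '9'\n```\n\n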
Capture groups are numbered\nstarting from 1, with 0 representing the entire match.\n\n## `regexp_extract(string, pattern)` / `regexp_extract(string, pattern, group)`\n\nExtracts the first match of a regular expression pattern from a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **group** (optional) - Integer specifying which capture group to extract\n\n### Description\n\nThe `regexp_extract()` function returns the first match of the specified pattern.\nWhen used with two parameters, it returns the entire match. When used with three\nparameters, it returns the specified capture group from the first match.\n\nIf no match is found, it returns null. Capture groups are numbered starting\nfrom 1, with 0 representing the entire match.\n\n## `regexp_like(string, pattern)`\n\nTests whether a string matches a regular expression pattern.\n\n### Parameters\n\n- **string** - String expression to test\n- **pattern** - String expression representing the regular expression pattern\n\n### Description\n\nThe `regexp_like()` function returns true if the input string contains a match\nfor the specified regular expression pattern, false otherwise. 
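For example:\n\n```hamelin\nregexp_like('build-2024', '[0-9]{4}')\n# Returns: true\n\nregexp_like('build', '[0-9]{4}')\n# Returns: false\n```\n\n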
This function\ntests for the presence of a match anywhere within the string, not just at the\nbeginning or end.\n\n## `regexp_position(string, pattern)` / `regexp_position(string, pattern, start)` / `regexp_position(string, pattern, start, occurrence)`\n\nReturns the position of a regular expression match within a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **start** (optional) - Integer specifying the starting position for the search\n- **occurrence** (optional) - Integer specifying which occurrence to find\n\n### Description\n\nThe `regexp_position()` function returns the 1-based position of the first\ncharacter of a pattern match within the string. When used with the `start`\nparameter, it begins searching from that position. When used with the\n`occurrence` parameter, it finds the nth occurrence of the pattern.\n\nIf no match is found, it returns 0. The start position is 1-based, and the\noccurrence count begins at 1 for the first match.\n\n## `regexp_replace(string, pattern)` / `regexp_replace(string, pattern, replacement)`\n\nReplaces matches of a regular expression pattern in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **replacement** (optional) - String expression to replace matches with\n\n### Description\n\nThe `regexp_replace()` function replaces all matches of the specified pattern.\nWhen used with two parameters, it removes all matches (replaces with empty\nstring). When used with three parameters, it replaces matches with the\nspecified replacement string.\n\nThe replacement string can include capture group references using standard\nregular expression syntax. 
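For example, replacing every digit run with a placeholder:\n\n```hamelin\nregexp_replace('a1b22c', '[0-9]+', '#')\n# Returns: 'a#b#c'\n```\n\n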
If no matches are found, the original string is\nreturned unchanged.\n\n## `regexp_split(string, pattern)`\n\nSplits a string using a regular expression pattern as the delimiter.\n\n### Parameters\n\n- **string** - String expression to split\n- **pattern** - String expression representing the regular expression pattern to use as delimiter\n\n### Description\n\nThe `regexp_split()` function splits the input string at each occurrence of\nthe specified pattern and returns an array of the resulting substrings. The\npattern matches are not included in the result array.\n\nIf the pattern is not found, the function returns an array containing the\noriginal string as a single element. If the pattern matches at the beginning\nor end of the string, empty strings may be included in the result array.",
|
|
491
|
-
"function-reference/string-functions.md":
|
|
491
|
+
"function-reference/string-functions.md": '# String Functions\n\nScalar functions for string processing and manipulation that can be used in any expression context.\n\n## `replace(string, pattern)`\n\nReplaces all occurrences of a pattern in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the text to replace\n\n### Description\n\nThe `replace()` function removes all occurrences of the specified pattern from\nthe input string. This function performs literal string replacement, not\npattern matching. If the pattern is not found, the original string is returned\nunchanged.\n\n## `starts_with(string, prefix)`\n\nTests whether a string starts with a specified prefix.\n\n### Parameters\n\n- **string** - String expression to test\n- **prefix** - String expression representing the prefix to check for\n\n### Description\n\nThe `starts_with()` function returns true if the input string begins with the\nspecified prefix, false otherwise. The comparison is case-sensitive. An empty\nprefix will always return true for any string.\n\n## `ends_with(string, suffix)`\n\nTests whether a string ends with a specified suffix.\n\n### Parameters\n\n- **string** - String expression to test\n- **suffix** - String expression representing the suffix to check for\n\n### Description\n\nThe `ends_with()` function returns true if the input string ends with the\nspecified suffix, false otherwise. The comparison is case-sensitive. An empty\nsuffix will always return true for any string.\n\n## `contains(string, substring)`\n\nTests whether a string contains a specified substring.\n\n### Parameters\n\n- **string** - String expression to search within\n- **substring** - String expression representing the text to search for\n\n### Description\n\nThe `contains()` function returns true if the input string contains the\nspecified substring anywhere within it, false otherwise. The comparison is\ncase-sensitive. 
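For example:\n\n```hamelin\ncontains("disk full", "disk")\n# Returns: true\n\ncontains("disk full", "DISK")\n# Returns: false\n```\n\n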
An empty substring will always return true for any string.\n\n## `lower(string)`\n\nConverts a string to lowercase.\n\n### Parameters\n\n- **string** - String expression to convert\n\n### Description\n\nThe `lower()` function converts all uppercase characters in the input string\nto their lowercase equivalents. Characters that are already lowercase or\nnon-alphabetic characters remain unchanged.\n\n## `upper(string)`\n\nConverts a string to uppercase.\n\n### Parameters\n\n- **string** - String expression to convert\n\n### Description\n\nThe `upper()` function converts all lowercase characters in the input string\nto their uppercase equivalents. Characters that are already uppercase or\nnon-alphabetic characters remain unchanged.\n\n## `substr(string, start, end)`\n\nReturns a substring from start index to end index (exclusive).\n\n### Parameters\n\n- **string** - String expression to extract from\n- **start** - Integer expression for the starting index (0-based, supports negative indices)\n- **end** - Integer expression for the ending index (exclusive, supports negative indices)\n\n### Description\n\nThe `substr()` function returns a new string containing characters from the start\nindex up to but not including the end index. Both indices are 0-based and support\nnegative values, where -1 refers to the last character, -2 to the second-last, and\nso on. If start is greater than or equal to end, an empty string is returned. The\nfunction handles out-of-bounds indices gracefully by clamping them to valid string\nboundaries.\n\n## `uuid()`\n\nGenerates a random UUID v4 string.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `uuid()` function returns a new random UUID (universally\nunique identifier) string each time you call it. The generated\nvalue follows the UUID v4 format (e.g.,\n`"550e8400-e29b-41d4-a716-446655440000"`). 
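For example, tagging each row with a fresh identifier (the `events` table name is illustrative):\n\n```hamelin\nFROM events\n| LET request_id = uuid()\n```\n\n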
Each invocation\nproduces a different value.\n\n## `uuid5(name)`\n\nGenerates a deterministic UUID v5 string (RFC 4122).\n\n### Parameters\n\n- **name** - String expression to derive the UUID from\n\n### Description\n\nThe `uuid5()` function returns a deterministic UUID (universally unique\nidentifier) string derived from the input name. The generated value follows the\nUUID v5 format (e.g., `xxxxxxxx-xxxx-5xxx-8xxx-xxxxxxxxxxxx`). The same input\nalways produces the same output.\n\n## `len(string)`\n\nReturns the length of a string in characters.\n\n### Parameters\n\n- **string** - String expression to measure\n\n### Description\n\nThe `len()` function returns the number of characters in the input string.\nThis counts Unicode characters, not bytes, so multi-byte characters are\ncounted as single characters. An empty string returns 0.\n',
|
|
492
492
|
"function-reference/time-date-functions.md": '# Time & Date Functions\n\nScalar functions for temporal data processing and manipulation that can be used in any expression context.\n\n## `now()`\n\nReturns the current timestamp.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `now()` function returns the current date and time as a timestamp. The\nexact timestamp represents the moment when the function is evaluated during\nquery execution. All calls to `now()` within the same query execution return\nthe same timestamp value.\n\n## `today()`\n\nReturns today\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `today()` function returns the current date with the time portion set to\nmidnight (00:00:00). This is equivalent to truncating `now()` to the day\nboundary. The result represents the start of the current day.\n\n## `yesterday()`\n\nReturns yesterday\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `yesterday()` function returns yesterday\'s date with the time portion set\nto midnight (00:00:00). This is equivalent to subtracting one day from `today()`.\nThe result represents the start of the previous day.\n\n## `tomorrow()`\n\nReturns tomorrow\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `tomorrow()` function returns tomorrow\'s date with the time portion set to\nmidnight (00:00:00). This is equivalent to adding one day to `today()`. The\nresult represents the start of the next day.\n\n## `ts(timestamp)`\n\nConverts a string to a timestamp.\n\n### Parameters\n\n- **timestamp** - String expression representing a timestamp\n\n### Description\n\nThe `ts()` function parses a string representation of a timestamp and converts\nit to a timestamp type. The function accepts various timestamp formats including\nISO 8601 format. 
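For example:\n\n```hamelin\nts("2023-07-15T14:30:00Z")\n# Parses an ISO 8601 string into a timestamp\n```\n\n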
If the string cannot be parsed as a valid timestamp, an error\nis raised.\n\n## `year(timestamp)`\n\nExtracts the year from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `year()` function extracts the year component from a timestamp and returns\nit as an integer. For example, a timestamp of "2023-07-15 14:30:00" would\nreturn 2023.\n\n## `month(timestamp)`\n\nExtracts the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `month()` function extracts the month component from a timestamp and returns\nit as an integer from 1 to 12, where 1 represents January and 12 represents\nDecember. For example, a timestamp of "2023-07-15 14:30:00" would return 7.\n\n## `day(timestamp)`\n\nExtracts the day of the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day()` function extracts the day component from a timestamp and returns\nit as an integer from 1 to 31, depending on the month. For example, a timestamp\nof "2023-07-15 14:30:00" would return 15.\n\n## `day_of_week(timestamp)`\n\nExtracts the day of the week from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day_of_week()` function extracts the ISO day of the week from a timestamp \nand returns it as an integer from 1 (Monday) to 7 (Sunday).\n\n## `hour(timestamp)`\n\nExtracts the hour from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `hour()` function extracts the hour component from a timestamp and returns\nit as an integer from 0 to 23, using 24-hour format. 
For example, a timestamp\nof "2023-07-15 14:30:00" would return 14.\n\n## `minute(timestamp)`\n\nExtracts the minute from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `minute()` function extracts the minute component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:00" would return 30.\n\n## `second(timestamp)`\n\nExtracts the second from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `second()` function extracts the second component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:45" would return 45.\n\n## `at_timezone(timestamp, timezone)`\n\nConverts a timestamp to a different timezone.\n\n### Parameters\n\n- **timestamp** - Timestamp expression to convert\n- **timezone** - String expression representing the target timezone\n\n### Description\n\nThe `at_timezone()` function converts a timestamp from its current timezone\nto the specified target timezone. The timezone parameter should be a valid\ntimezone identifier such as "UTC", "America/New_York", or "Europe/London".\nThe function returns a new timestamp representing the same moment in time\nbut expressed in the target timezone.\n\n## `to_millis(interval)`\n\nConverts an interval to milliseconds.\n\n### Parameters\n\n- **interval** - Interval expression to convert\n\n### Description\n\nThe `to_millis()` function converts an interval (duration) to its equivalent\nvalue in milliseconds as an integer. This is useful for calculations that\nrequire numeric representations of time durations. 
For example, an interval\nof "5 minutes" would return 300000 milliseconds.\n\n## `to_nanos(interval)`\n\nConverts an interval to nanoseconds.\n\n### Parameters\n\n- **interval** - Interval expression to convert\n\n### Description\n\nThe `to_nanos()` function converts an interval (duration) to its equivalent\nvalue in nanoseconds as an integer. This provides the highest precision for\ntime duration calculations. The function multiplies the millisecond value\nby 1,000,000 to get nanoseconds. For example, an interval of "1 second"\nwould return 1,000,000,000 nanoseconds.\n\n## `from_millis(millis)`\n\nCreates an interval from milliseconds.\n\n### Parameters\n\n- **millis** - Integer expression representing milliseconds\n\n### Description\n\nThe `from_millis()` function creates an interval from a millisecond value.\nThis is the inverse of `to_millis()`, allowing you to convert numeric\nmillisecond values back into interval types that can be used with timestamp\narithmetic. For example, `from_millis(5000)` creates an interval of 5 seconds.\n\n## `from_nanos(nanos)`\n\nCreates an interval from nanoseconds.\n\n### Parameters\n\n- **nanos** - Integer expression representing nanoseconds\n\n### Description\n\nThe `from_nanos()` function creates an interval from a nanosecond value.\nThis is the inverse of `to_nanos()`, converting numeric nanosecond values\ninto interval types. The function divides the nanosecond value by 1,000,000,000\nto convert to seconds. For example, `from_nanos(1500000000)` creates an\ninterval of 1.5 seconds.\n\n## `from_unixtime_seconds(seconds)`\n\nCreates a timestamp from Unix seconds.\n\n### Parameters\n\n- **seconds** - Integer expression representing seconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_seconds()` function converts a Unix timestamp (seconds\nsince January 1, 1970 UTC) into a timestamp type. This is commonly used\nwhen working with systems that store time as Unix timestamps. 
For example,\n`from_unixtime_seconds(1625097600)` returns the timestamp "2021-07-01 00:00:00".\n\n## `from_unixtime_millis(millis)`\n\nCreates a timestamp from Unix milliseconds.\n\n### Parameters\n\n- **millis** - Integer expression representing milliseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_millis()` function converts Unix time in milliseconds\nto a timestamp. Many systems and APIs return timestamps as milliseconds\nsince the Unix epoch. This function handles the conversion by multiplying\nthe input by 1,000,000 to convert to nanoseconds internally. For example,\n`from_unixtime_millis(1625097600000)` returns "2021-07-01 00:00:00".\n\n## `from_unixtime_micros(micros)`\n\nCreates a timestamp from Unix microseconds.\n\n### Parameters\n\n- **micros** - Integer expression representing microseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_micros()` function converts Unix time in microseconds\nto a timestamp. This provides microsecond precision for systems that require\nit. The function multiplies the input by 1,000 to convert to nanoseconds\ninternally. For example, `from_unixtime_micros(1625097600000000)` returns\n"2021-07-01 00:00:00".\n\n## `from_unixtime_nanos(nanos)`\n\nCreates a timestamp from Unix nanoseconds.\n\n### Parameters\n\n- **nanos** - Integer expression representing nanoseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_nanos()` function converts Unix time in nanoseconds\ndirectly to a timestamp. This provides the highest precision for timestamp\nconversion and is useful when working with high-frequency data or systems\nthat track time at nanosecond granularity. 
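\n\nAs a sketch, a hypothetical `event_ns` column holding nanoseconds since the epoch could be normalized into a timestamp before filtering:\n\n```hamelin\nFROM traces\n| LET event_time = from_unixtime_nanos(event_ns)\n| WHERE event_time > "2021-07-01"\n```\n\n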
For example,\n`from_unixtime_nanos(1625097600000000000)` returns "2021-07-01 00:00:00".\n\n## `to_unixtime(timestamp)`\n\nConverts a timestamp to Unix seconds.\n\n### Parameters\n\n- **timestamp** - Timestamp expression to convert\n\n### Description\n\nThe `to_unixtime()` function converts a timestamp to Unix time, returning\nthe number of seconds since January 1, 1970 UTC as a double-precision\nfloating-point number. The fractional part represents sub-second precision.\nThis is useful for interoperability with systems that expect Unix timestamps.\nFor example, the timestamp "2021-07-01 00:00:00" returns 1625097600.0.',
|
|
493
493
|
"function-reference/window-functions.md": "# Window Functions\n\nFunctions for analytical operations over data windows that must be used with the `WINDOW` command.\n\n## `row_number()`\n\nReturns a sequential row number for each row within a window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `row_number()` function assigns a unique sequential integer to each row\nwithin its window partition, starting from 1. The ordering is determined by\nthe `SORT` clause in the `WINDOW` command. Rows with identical sort values\nreceive different row numbers in an arbitrary but consistent order.\n\n## `rank()`\n\nReturns the rank of each row within a window partition with gaps.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `rank()` function assigns a rank to each row within its window partition\nbased on the `SORT` clause ordering. Rows with identical sort values receive\nthe same rank, and subsequent ranks are skipped. For example, if two rows tie\nfor rank 2, the next row receives rank 4 (not rank 3).\n\n## `dense_rank()`\n\nReturns the rank of each row within a window partition without gaps.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `dense_rank()` function assigns a rank to each row within its window\npartition based on the `SORT` clause ordering. Rows with identical sort values\nreceive the same rank, but subsequent ranks are not skipped. 
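\n\nThe three ranking functions can be compared side by side in a single `WINDOW` command (the `scores` dataset, `team` grouping, and `-1h` window are hypothetical; ordering follows the `SORT` clause, or timestamp order by default):\n\n```hamelin\nFROM scores\n| WINDOW n = row_number(),\n rnk = rank(),\n dense = dense_rank()\n BY team\n WITHIN -1h\n```\n\n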
For example, if\ntwo rows tie for rank 2, the next row receives rank 3.\n\n## `lag(expression, offset, ignore_nulls)`\n\nReturns the value of an expression from a previous row within the window.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the previous row\n- **offset** - Integer specifying how many rows back to look\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `lag()` function retrieves the value of the specified expression from a\nrow that is `offset` positions before the current row within the window\npartition. When `ignore_nulls` is true, null values are skipped when counting\nthe offset. If there is no row at the specified offset, the function returns null.\n\n## `lead(expression, offset, ignore_nulls)`\n\nReturns the value of an expression from a subsequent row within the window.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the subsequent row\n- **offset** - Integer specifying how many rows ahead to look\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `lead()` function retrieves the value of the specified expression from a\nrow that is `offset` positions after the current row within the window\npartition. When `ignore_nulls` is true, null values are skipped when counting\nthe offset. If there is no row at the specified offset, the function returns null.\n\n## `first_value(expression, ignore_nulls)`\n\nReturns the first value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `first_value()` function returns the value of the specified expression from\nthe first row in the current window frame. When `ignore_nulls` is true, it\nreturns the first non-null value. 
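\n\nFor instance (a sketch; the `sessions` dataset and its fields are hypothetical):\n\n```hamelin\nFROM sessions\n| WINDOW first_page = first_value(page_url)\n BY session_id\n WITHIN -30m\n```\n\n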
The window frame is determined by the\n`WITHIN` clause in the `WINDOW` command.\n\n## `last_value(expression, ignore_nulls)`\n\nReturns the last value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `last_value()` function returns the value of the specified expression from\nthe last row in the current window frame. When `ignore_nulls` is true, it\nreturns the last non-null value. The window frame is determined by the\n`WITHIN` clause in the `WINDOW` command.\n\n## `nth_value(expression, n, ignore_nulls)`\n\nReturns the nth value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **n** - Integer specifying which value to return (1-based)\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `nth_value()` function returns the value of the specified expression from\nthe nth row in the current window frame. When `ignore_nulls` is true, null\nvalues are not counted in the position. If there is no nth row, the function\nreturns null. The position is 1-based, where 1 represents the first row.\n\n## `cume_dist()`\n\nReturns the cumulative distribution of each row within the window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `cume_dist()` function calculates the cumulative distribution of each row\nwithin its window partition. The result is the number of rows with values less\nthan or equal to the current row's value, divided by the total number of rows\nin the partition. 
Values range from 0 to 1.\n\n## `percent_rank()`\n\nReturns the percentile rank of each row within the window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `percent_rank()` function calculates the percentile rank of each row within\nits window partition. The result is calculated as (rank - 1) / (total rows - 1),\nwhere rank is determined by the `SORT` clause ordering. Values range from 0 to 1,\nwith 0 representing the lowest value and 1 representing the highest.",
|
|
494
494
|
"introduction.md": "# Introducing Hamelin\n\nHamelin is a **pipe-based query language** for **event analytics** which targets\nthe specific challenges detection engineers face when analyzing security events.\nThe language makes event correlation straightforward, letting you define\npatterns, correlate them across time windows, and match ordered sequences of\nevents.\n\n## Key Features\n\n### \u{1F504} Pipe-Based\n\nYou write queries that read naturally from top to bottom. Each operation\nconnects to the next using the pipe operator `|`. Pipe-based languages let you\nbuild queries incrementally, making them easier to read, write, and test than\napproaches that rely heavily on nested subqueries.\n\n```hamelin\nFROM events\n| WHERE event.action == 'login'\n| WITHIN -1hr\n| SELECT user.email, timestamp\n```\n\n### \u{1F550} Event-Native\n\nHamelin offers shorthand for working with timestamped events. Time intervals are\nwritten as simple expressions that match how you think about time. You can\nreference relative timestamps and truncate them to specific boundaries.\n\n```hamelin\n// Reference relative time\n| WITHIN -15m // events within the last 15 minutes\n| WITHIN -1h // events within the last hour\n| WITHIN -7d // events within the last 7 days\n\n// Truncate timestamps to boundaries\n| SELECT timestamp@h // truncate to hour boundary\n| SELECT timestamp@d // truncate to day boundary\n```\n\n### \u{1FA9F} Sliding Windows\n\nSliding windows move continuously with each event, giving you insights without\ngaps or duplicates. You can aggregate data over these moving time windows to\ndetect patterns as they happen.\n\n```hamelin\nFROM events\n| WHERE event.action == 'login'\n| WINDOW count()\n BY user.id\n WITHIN -15m\n```\n\n### \u{1F3AF} Correlation of Named Subqueries\n\nNamed subqueries let you define specific event patterns and correlate them\nwithin sliding windows. You can drop these patterns into sliding windows and\nwrite correlations around them. 
Hamelin makes it straightforward to aggregate\nover specific patterns while also aggregating over the entire group of events.\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nFROM failed = failed_logins, success = successful_logins\n| WINDOW failures = count(failed),\n successes = count(success),\n total = count(),\n BY user.id\n WITHIN -5m\n| WHERE successes >= 1 && failures / total > 0.2\n```\n\nThis query demonstrates correlating failed and successful login events to detect\nbrute force attacks. Named subqueries define distinct event patterns:\n`failed_logins` filters to login failure events while `successful_logins`\nfilters to login success events. The sliding window aggregates these patterns by\nuser over 5-minute periods, counting failures, successes, and total events. The\nfinal filter identifies users who had at least one successful login where failed\nattempts represent more than 20% of their total login activity within that\nwindow.\n\n### \u{1F50D} Ordered Matching of Named Subqueries\n\nYou can ask Hamelin to match ordered patterns across events. Aggregations over sliding windows work well for many use cases, but others require that you search for specific events followed by other specific events. You can do that in Hamelin using regular expression quantifiers applied to named subqueries.\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nMATCH failed_logins{10,} successful_logins+ WITHIN 10m\n```\n\nThis searches for 10 failed logins followed by at least one successful login, \nwith the entire sequence completing within a 10 minute period. 
The sliding \nwindow approach might miss attack patterns where timing and sequence matter, \nbut ordered matching can detect the exact progression of a brute force attack.\n\n### \u{1F517} Event Type Expansion\n\nYou can query across different event types without worrying about schema\ndifferences. Hamelin automatically sets missing fields to `null` when they don't\nexist in a particular event type.\n\n```hamelin\nFROM login_events, logout_events, error_events\n// Filters by user.email when this field exists in a row.\n// Drops rows where this field does not exist\n// (because NULL does not equal any string).\n| WHERE user.email == 'john@example.com'\n```\n\n### \u{1F5C2}\uFE0F Structured Types\n\nHamelin supports structured types like structs, arrays, and maps to represent\ncomplex data. These types make data modeling more familiar, and reduce the need\nto rely too much on joins in analytic queries.\n\n```hamelin\n// Create struct literals with nested data\nLET login_metadata = {\n ip_address: '192.168.1.100',\n user_agent: 'Mozilla/5.0',\n location: 'San Francisco'\n}\n\n// Access nested fields using dot notation\n| WHERE login_metadata.ip_address != '192.168.1.100'\n\n// Use arrays to store multiple related values\n| LET failed_attempts = [\n {timestamp: '2024-01-15T14:25:00Z', reason: 'invalid_password'},\n {timestamp: '2024-01-15T14:27:00Z', reason: 'account_locked'}\n ]\n\n// Use maps when key data is high cardinality\n// Using structs for this use case creates too many columns.\n| LET host_metrics = map(\n 'web-server-01': {cpu: 85.2, memory: 72.1},\n 'web-server-02': {cpu: 91.7, memory: 68.9},\n 'db-primary-01': {cpu: 67.3, memory: 89.4}\n )\n\n// Look up map values using index notation\n| WHERE host_metrics['web-server-01'].cpu > 80\n```\n\n### \u{1F4E1} Array Broadcasting\n\nHamelin makes working with arrays simpler by offering broadcasting, which helps\nyou distribute operations over each member of an array. 
It does this when you\napply an operation to an array that makes more sense applied to each of its\nmembers. Broadcasting lets you work with arrays using simple, familiar\nsyntax without asking you to resort to functional programming or inefficient\nunnesting.\n\n```hamelin\n| WHERE any(failed_attempts.reason == 'invalid_password')\n```\n\nThis example demonstrates how the equality operator `==` broadcasts across the\n`reason` field of each element in the `failed_attempts` array. In fact, there\nare *two* broadcasts here:\n\n * first, the lookup of the `reason` field changes an array-of-struct into an\n array-of-string\n * second, applying equality to the resulting array applies it to each member\n\nHamelin can do this automatically because it is type-aware. It knows that\ncomparing equality between `array(string)` and `string` makes more sense to\nbroadcast: an array can never be equal to a string, but a member of an\n`array(string)` might be.\n\n### \u{1F500} Semi-Structured Types\n\nHamelin lets you parse JSON into instances of the `variant` type. This helps you\nhandle semi-structured data that doesn't fit nicely into fixed schemas. You can\nparse JSON strings, access their fields, and convert them to more structured\ntypes. This makes working with JSON feel fairly native.\n\n```hamelin\n// Parse JSON strings into the variant type\nFROM logs\n| LET event_data = parse_json(raw_json)\n\n// Access nested fields using dot notation\n| WHERE event_data.level AS string == 'ERROR'\n\n// Access JSON array elements with index notation\n| LET first_tag = event_data.tags[0]\n\n// Cast variant data to structured types when you need type safety.\n// Values that do not match will be null.\n| LET user_info = event_data.user AS {id: int, name: string}\n```\n\n### \u{1F6A8} Excellent Error Messages\n\nHamelin provides clear, helpful error messages. 
Error messages\npoint directly to the problematic Hamelin code and explain exactly what went\nwrong, rather than showing cryptic messages about generated SQL.\n\nThis matters especially when AI assistants write queries. AI tools need precise\ndescriptions of errors to fix queries and complete tasks. Clear error messages\nlet AI assistants debug queries effectively by giving the context needed to\ncorrect mistakes.\n\n```hamelin\nFROM simba.sysmon_events\n| AGG count() BY host.hostname\n| LET hostname = lower(host.hostname)\n```\n\ngenerates the error\n\n```\nError: problem doing translation\n \u256D\u2500[ :3:24 ]\n \u2502\n 3 \u2502 | LET hostname = lower(host.hostname)\n \u2502 \u2500\u2500\u252C\u2500\n \u2502 \u2570\u2500\u2500\u2500 error while translating\n \u2502\n \u2502 Note: unbound column reference: host\n \u2502\n \u2502 the following entries in the environment are close:\n \u2502 - `host.hostname` (you must actually wrap with ``)\n\u2500\u2500\u2500\u256F\n```\n\nHere, the user has forgotten to escape an identifier that contains a dot character.\n\n```hamelin\nFROM simba.sysmon_events\n| WINDOW count(),\n all(winlog.event_data.events)\n BY host.hostname\n```\n\ngenerates the error\n\n```\nError: problem doing translation\n \u256D\u2500[ :3:10 ]\n \u2502\n 3 \u2502 all(winlog.event_data.events)\n \u2502 \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252C\u2500\u252C\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n \u2502 \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 could not find a matching function definition\n \u2502 \u2502\n \u2502 \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 variant\n \u2502\n \u2502 Note: Attempted all(x=boolean)\n \u2502 - Type mismatch for x: expected boolean, got variant\n \u2502\n \u2502 Attempted all(x=array(boolean))\n \u2502 - Type mismatch for x: expected 
array(boolean), got variant\n \u2502\n\u2500\u2500\u2500\u256F\n```\n\nHere, the user has forgotten to cast variant to a primitive type so that it can\nbe matched against the function call. (A future version of Hamelin will probably\ncoerce this automatically!)\n",
|
|
495
495
|
"language-basics/aggregation.md": "# AGG: performing ordinary aggregation\n\nThe `AGG` command groups and aggregates datasets to create summary statistics\nand analytical insights. You can analyze user behavior patterns, system\nperformance metrics, or security events by grouping related records together and\napplying mathematical functions to each group.\n\n## AGG syntax\n\nThe `AGG` command follows a simple pattern that groups data and applies aggregation functions to each group:\n\n```hamelin\nAGG result = function(expression), ... BY grouping_expression, ...\n```\n\nWhen you omit the `BY` clause, Hamelin aggregates all records into a single group. This calculates overall dataset statistics and global metrics that span all records, counting all events across the entire dataset without any grouping or partitioning:\n\n```hamelin\nFROM events\n| AGG total_events = count()\n```\n\nWhen you omit explicit column names, Hamelin generates them automatically from\nthe expressions you provide. Learn more about this feature in [Automatic Field\nNames](../smart-features/automatic-field-names.md). This creates columns named\n`count()` and `avg(response_time)` that you can reference using backticks in\nsubsequent commands:\n\n```hamelin\nFROM requests\n| AGG count(), avg(response_time) BY service_name\n```\n\nWhen you omit aggregation functions entirely, you get distinct groups without any calculations. This returns the unique combinations of event_type and user_id without performing any mathematical operations:\n\n```hamelin\nFROM events\n| AGG BY event_type, user_id\n```\n\nYou can also rename columns in the BY clause and use any expression for grouping. 
This example groups by renamed event_type, truncated timestamp, and extracted email domain, creating clear column names for downstream analysis:\n\n```hamelin\nFROM events\n| AGG\n total_events = count(),\n avg_duration = avg(duration)\n BY event_category = event_type,\n hour_bucket = timestamp@hr,\n user_domain = split(email, '@')[1]\n```\n\n## Simple aggregation examples\n\n### Basic counting\n\nEvent counting groups events by their characteristics and calculates how many events fall into each category. Notice that Hamelin uses `count()` with no arguments, not `count(*)` like SQL. The empty parentheses count all rows in each group, providing a clean syntax for the most common aggregation operation:\n\n```hamelin\nFROM events\n| AGG event_count = count() BY event_type\n```\n\n### Multiple aggregations\n\nCalculating several metrics at once in a single `AGG` command ensures all metrics use consistent grouping logic:\n\n```hamelin\nFROM requests\n| AGG\n total_requests = count(),\n avg_response_time = avg(response_time),\n max_response_time = max(response_time),\n error_count = count_if(status_code >= 400)\n BY service_name\n```\n\n### Conditional aggregation\n\nConditional aggregation functions like `count_if()` let you count only rows that meet specific conditions without pre-filtering the dataset. Conditional aggregation maintains the full context of each group while applying different filters to different calculations:\n\n```hamelin\nFROM auth_logs\n| AGG\n failures = count_if(outcome == 'FAILURE'),\n successes = count_if(outcome == 'SUCCESS')\n BY user_name\n```\n\n## Time series aggregations\n\nTime series aggregations combine time truncation with grouping to create time-based buckets for temporal analysis. 
Time-based grouping creates time-bucketed summaries for monitoring system performance, tracking business metrics, and understanding user behavior patterns across different time scales.\n\n### Hourly summaries\n\nHourly aggregations provide detailed views of system activity and user behavior throughout the day:\n\n```hamelin\nFROM logs\n| AGG\n hourly_events = count(),\n avg_response = avg(response_time),\n error_rate = count_if(status >= 400) / count()\n BY timestamp@hr\n| SORT timestamp@hr\n```\n\n### Daily trends\n\nDaily aggregations reveal longer-term trends and enable comparison across different time periods:\n\n```hamelin\nFROM events\n| WITHIN -30d..now()\n| AGG\n daily_events = count(),\n unique_users = count_distinct(user_name),\n high_severity = count_if(severity == 'HIGH')\n BY timestamp@d\n| SORT timestamp@d DESC\n```\n",
|
|
496
496
|
"language-basics/five-core-commands.md": "# Five core commands\n\nHamelin uses five core commands to handle basic data operations: `FROM`,\n`WHERE`, `LET`, `SELECT`, and `DROP`. Each command operates narrowly and serves\na specific purpose. You combine these commands using the pipe character `|`.\nThese core commands serve the same purpose as SQL clauses, but you can use them\nin any order, with each command feeding its output to the next.\n\n## Command reference\n\n### `FROM` - Access the rows of a dataset\n\nThe `FROM` command pulls rows from one or more datasets. You start most queries with this command to specify the data sources.\n\nPull rows from the events dataset:\n\n```hamelin\nFROM events\n```\n\nPull rows from both the users and orders datasets:\n\n```hamelin\nFROM users, orders\n```\n\nThis operation translates to a SQL `UNION ALL`, not a join. It pulls all rows from both sources without performing any filtering or row alignment.\n\nWhen you query multiple sources, Hamelin automatically expands types to accommodate all fields from both datasets. Fields with the same name get aligned, while unique fields are added with `NULL` values for rows that don't contain them. Learn more about how this works in [Type Expansion](../smart-features/type-expansion.md).\n\nYou can organize results from multiple datasets by grouping fields into separate\nsections. This lets you keep track of which data comes from which source:\n\n```hamelin\nFROM\n allows = events.access_allowed,\n denys = events.access_denied\n```\n\nThe `allows` field becomes a struct containing all fields from\n`events.access_allowed` (set to null for rows from `events.access_denied`). The\n`denys` field works the same way for `events.access_denied`. All other\nfields are aligned normally. This makes it easy to both reference a field's\nspecific lineage and to ignore lineage when you don't need it.\n\n\n### `WHERE` - Filter rows\n\nThe `WHERE` command filters rows based on conditions you specify. 
You can stack\nmultiple `WHERE` commands to apply multiple filters.\n\nOnly query rows whose action is 'login':\n\n```hamelin\nFROM events\n| WHERE event.action == 'login'\n```\n\nStacking filters has the same effect as combining the conditions with `&&`:\n\n```hamelin\nFROM users\n| WHERE user.role == 'admin'\n| WHERE status == 'active'\n```\n\n### `LET` - Add or modify columns\n\nThe `LET` command adds new columns or modifies existing ones without affecting\nother columns. This lets you create calculated fields and enrich datasets as\nyou build queries.\n\nCreate a full name by concatenating first and last names:\n\n```hamelin\nFROM users\n| LET full_name = user.first_name + ' ' + user.last_name\n```\n\nCalculate how many days ago an event occurred:\n\n```hamelin\nFROM events\n| LET days_ago = (now() - timestamp) / 1d\n```\n\nAdd a severity level based on the event action:\n\n```hamelin\nFROM events\n| LET severity = if(event.action == 'login_failed', 'high', 'low')\n```\n\nYou can set nested fields within existing structs to modify specific parts of records:\n\n```hamelin\nFROM events\n| LET user.display_name = user.first_name + ' ' + user.last_name\n```\n\nYou can also create entirely new nested structs by setting multiple nested fields:\n\n```hamelin\nFROM events\n| LET metadata.processed_at = now()\n| LET metadata.version = '2.1'\n```\n\nWhen creating new structs, using a struct literal is often more readable:\n\n```hamelin\nFROM events\n| LET metadata = {\n processed_at: now(),\n version: '2.1'\n }\n```\n\n### `SELECT` - Define output columns\n\nThe `SELECT` command completely redefines what columns appear in the results.\nThis replaces all existing columns with only the ones you specify.\n\nSelect only the user's email, timestamp, and event action from the events dataset:\n\n```hamelin\nFROM events\n| SELECT user.email, timestamp, event.action\n```\n\nSelect existing columns and add new computed columns with custom names:\n\n```hamelin\nFROM metrics\n| SELECT count, 
avg_time, category = 'security'\n```\n\nCreate new columns using expressions and conditional logic:\n\n```hamelin\nFROM events\n| SELECT user_id, severity = if(error_count > 10, 'high', 'low')\n```\n\nTransform existing columns while selecting them:\n\n```hamelin\nFROM logs\n| SELECT timestamp, message, log_level = upper(level)\n```\n\nWhen you don't provide explicit field names in SELECT, Hamelin automatically\ngenerates sensible names based on your expressions. This reduces the need to\nthink of names for simple calculations. Learn more about this in [Automatic Field Names](../smart-features/automatic-field-names.md).\n\n### `DROP` - Remove columns\n\nThe `DROP` command removes specific columns from the results. This is useful\nwhen you want to exclude sensitive data or reduce the size of the output.\n\nRemove unwanted columns from the dataset:\n\n```hamelin\nFROM events\n| DROP internal_id, debug_info\n```\n\n## Basic example\n\nThis example shows how you can combine the core commands to build a common query.\n\n```hamelin\nFROM events\n| WHERE event.action != null\n| LET days_ago = (now() - timestamp) / 1d\n| DROP debug_data, internal_flags\n```\n\nThis example demonstrates how the core commands work together in a typical\nworkflow. You start by pulling rows from the events dataset, then filter out\nrecords with missing action data, add a calculated field for how many days ago\neach event occurred, and remove the unwanted debug columns.\n\n```hamelin\nFROM events\n| WHERE event.action != null\n| LET days_ago = (now() - timestamp) / 1d\n| SELECT event.action, timestamp, days_ago\n```\n\nThis example shows a similar pattern. Rather than dropping specific columns, you\nselect only the ones you want to keep in the final output.\n",
|
|
497
497
|
"language-basics/join-combining-datasets.md": '# `JOIN` and `LOOKUP`: Combining datasets\n\nIn event analytics, event records are often narrow and require enrichment with\nadditional context. You use the `JOIN` and `LOOKUP` commands to do this\nenrichment. Hamelin gives you most of the power of SQL joins - you get inner\njoin behavior (with `JOIN`) and left outer join behavior (with `LOOKUP`).\n\n## Basic syntax\n\nJoin your main dataset with additional data by specifying a matching condition. The `ON` clause defines how records from both datasets should be linked together:\n\n```hamelin\n| JOIN other_dataset ON field_name == other_dataset.field_name\n```\n\nUse `LOOKUP` when you want to keep all your original records, even if some don\'t have matches in the second dataset:\n\n```hamelin\n| LOOKUP other_dataset ON field_name == other_dataset.field_name\n```\n\n## Nested results\n\nWhen you join datasets, Hamelin nests the joined data as a struct to prevent field name collisions. This structure keeps your original fields separate from the joined fields, making it clear which data came from which dataset.\n\nCombine user records with order data:\n\n```hamelin\nFROM users\n| WHERE timestamp > "2024-01-01"\n| JOIN orders ON user_id == orders.user_id\n```\n\nThis creates records where each user has an `orders` struct containing all the matched order information. Your original user fields remain at the top level, while order details are nested inside the `orders` structure.\n\n## Custom naming\n\nYou can control the name of the nested struct using assignment syntax. 
This makes your queries more readable when the default dataset name isn\'t descriptive:\n\n```hamelin\nFROM users\n| WHERE timestamp > "2024-01-01"\n| JOIN purchase_history = orders ON user_id == orders.user_id\n```\n\nNow the joined data appears under the more descriptive name `purchase_history` instead of the generic `orders` name.\n\n## Accessing joined fields\n\nYou access fields from the joined struct using dot notation. This lets you filter, select, or manipulate the joined data just like any other nested structure:\n\n```hamelin\nFROM users\n| WHERE timestamp > "2024-01-01"\n| JOIN orders ON user_id == orders.user_id\n| WHERE orders.total > 100\n```\n\nThis query finds users who have at least one order with a total over 100, demonstrating how you can filter on joined data.\n\n## Multiple joins\n\nYou can chain multiple `JOIN` operations to combine data from several datasets. Each join creates its own nested struct, letting you pull related information from multiple sources:\n\n```hamelin\nFROM transactions\n| WHERE amount > 1000\n| JOIN user_details = users ON user_id == users.id\n| JOIN account_info = accounts ON account_id == accounts.id\n| WHERE user_details.risk_score > 0.8\n```\n\nThis creates records where each transaction has both `user_details` and `account_info` structs, giving you access to related data from multiple datasets in a single query.\n\n## `JOIN` vs `LOOKUP`: Required vs optional matches\n\nThe key difference between `JOIN` and `LOOKUP` is how they handle missing matches. This choice determines whether you get only complete records or keep all your original data with optional enrichment.\n\n### `JOIN` requires matches\n\nWhen you use `JOIN`, only rows that have a match in both datasets appear in your results. 
Rows without matches get filtered out completely, giving you a dataset that only contains records with complete information.\n\nGet users who have placed orders:\n\n```hamelin\nFROM users\n| JOIN orders ON user_id == orders.user_id\n```\n\nThis returns only users who have actually placed orders. Users without any orders are excluded from the results entirely.\n\n### `LOOKUP` keeps all rows\n\nWhen you use `LOOKUP`, all rows from your main dataset stay in the results, regardless of whether they have matches. For rows without matches, the nested struct gets set to `null`, preserving your complete dataset while adding optional enrichment.\n\nGet all users and include their order information when available:\n\n```hamelin\nFROM users\n| LOOKUP orders ON user_id == orders.user_id\n```\n\nThis returns every user from your dataset. Users with orders get an `orders` struct containing their order data, while users without orders still appear with their `orders` field set to null.\n\n### When to use each\n\nUse `JOIN` when you only want records that have complete information from both datasets. Use `LOOKUP` when you want to preserve your entire main dataset and optionally enrich it with additional data that might not exist for every record.\n',
|
|
498
|
-
"language-basics/match-ordered-pattern-matching.md": "# MATCH: ordered pattern matching\n\nThe `MATCH` command finds specific sequences of events in your data. Pattern matching detects sequences like \"10 failed logins followed by a successful login\" or \"error events followed by restart events within 5 minutes.\" Unlike windowed aggregations, pattern matching requires that event patterns happen in a specific sequence.\n\n## Basic pattern matching\n\nThe `MATCH` command searches for ordered sequences using named subqueries and regular expression-style quantifiers. You define what events you're looking for, then specify the pattern and timing constraints. Create named subqueries for the events you want to match, then specify the sequence pattern. This example finds sequences where 5 or more failed logins are followed by at least one successful login:\n\n```hamelin\nWITH failed_logins =\n FROM security_logs\n | WHERE action == 'login_failed'\n\nWITH successful_logins =\n FROM security_logs\n | WHERE action == 'login_success'\n\nMATCH failed_logins{5,} successful_logins+\n```\n\n\n\n## Pattern quantifiers\n\nQuantifiers control how many of each event type to match. These work like regular expression quantifiers but apply to your named event datasets.\n\n### Exact counts\n\nSpecify exact numbers of events to match. This example finds exactly 3 error events followed by exactly 1 restart event:\n\n```hamelin\nWITH errors =\n FROM system_logs\n | WHERE level == 'ERROR'\n\nWITH restarts =\n FROM system_logs\n | WHERE action == 'service_restart'\n\nMATCH errors{3} restarts{1}\n```\n\n\n\n### Range quantifiers\n\nUse ranges to specify minimum and maximum counts. 
This example finds between 2 and 5 high-severity alerts followed by 1 or 2 acknowledgment events:\n\n```hamelin\nWITH alerts =\n FROM monitoring\n | WHERE severity == 'HIGH'\n\nWITH acknowledgments =\n FROM monitoring\n | WHERE action == 'acknowledge'\n\nMATCH alerts{2,5} acknowledgments{1,2}\n```\n\n\n\n### Open-ended quantifiers\n\nUse `+` for \"one or more\" and `*` for \"zero or more\". This example finds one or more failed requests followed by one or more successful requests:\n\n```hamelin\nWITH failed_requests =\n FROM api_logs\n | WHERE status_code >= 500\n\nWITH success_requests =\n FROM api_logs\n | WHERE status_code < 400\n\nMATCH failed_requests+ success_requests+\n```\n\n\n\n## Time constraints with WITHIN\n\nThe `WITHIN` clause adds constraints to patterns, measuring the distance from the first event to the last event in the matched sequence. For timestamp-based ordering, this represents a time window. For numeric ordering, this represents a numeric distance.\n\n### Time window constraints\n\nRequire that patterns complete within a specific time period. This example finds 10 or more failed logins followed by successful logins, but only when the entire sequence happens within 10 minutes:\n\n```hamelin\nWITH failed_logins =\n FROM security_logs\n | WHERE action == 'login_failed'\n\nWITH successful_logins =\n FROM security_logs\n | WHERE action == 'login_success'\n\nMATCH failed_logins{10,} successful_logins+ WITHIN 10m\n```\n\nThe `WITHIN` clause measures on the actual `SORT` column. When you don't specify a `SORT` clause, timestamp ordering is applied automatically. 
When using `WITHIN`, you must have exactly one `SORT` expression, and the `SORT` column type must be compatible with the `WITHIN` type:\n- `TIMESTAMP` columns work with `INTERVAL` (e.g., `5m`) or `CALENDAR_INTERVAL` (e.g., `1y`, `3mon`)\n- Numeric columns require matching numeric types (e.g., `INT` sort with `INT` within)\n\n### Numeric ordering with WITHIN\n\nYou can use `WITHIN` with numeric columns to constrain sequences by numeric distance rather than time:\n\n```hamelin\nWITH step_a =\n FROM process_log\n | WHERE step == 'A'\n\nWITH step_b =\n FROM process_log\n | WHERE step == 'B'\n\nMATCH step_a step_b SORT BY sequence_number WITHIN 100\n```\n\nThis finds sequences where step A is followed by step B, and the sequence numbers differ by at most 100. The `WITHIN` constraint measures `last(sequence_number) - first(sequence_number) <= 100`.\n\n### Using first() and last() functions\n\nThe `first()` and `last()` functions access the earliest and latest events in each matched group. This example finds CPU spikes followed by memory alerts within 15 minutes:\n\n```hamelin\nWITH cpu_spikes =\n FROM metrics\n | WHERE cpu_usage > 90\n\nWITH memory_alerts =\n FROM metrics\n | WHERE memory_usage > 85\n\nMATCH cpu_spikes{3,} memory_alerts+ WITHIN 15m\n```\n\n\n\n## Complex pattern examples\n\n### Security incident detection\n\nLook for suspicious login patterns that might indicate a brute force attack. 
This example detects external brute force attempts followed by successful logins and optional privilege escalations, all within 30 minutes:\n\n```hamelin\nWITH failed_logins =\n FROM auth_logs\n | WHERE outcome == 'FAILURE'\n | WHERE source_ip NOT IN ('10.0.0.0/8', '192.168.0.0/16')\n\nWITH successful_logins =\n FROM auth_logs\n | WHERE outcome == 'SUCCESS'\n\nWITH privilege_escalations =\n FROM audit_logs\n | WHERE action == 'privilege_escalation'\n\nMATCH failed_logins{5,} successful_logins{1,3} privilege_escalations* WITHIN 30m\n```\n\n\n\n## When to use MATCH vs WINDOW\n\nThe key difference is that `WINDOW` performs unordered correlation while `MATCH` performs ordered correlation.\n\nWhen you pull multiple event patterns into a sliding window, you can aggregate each individual pattern or aggregate across all the patterns together. However, you cannot require that certain subpatterns happen before others\u2014the window treats all events within the time frame as unordered.\n\n`MATCH` specifies that certain events must happen before others in a specific sequence. Ordered correlation matters when the timing and sequence of events affects your analysis.\n\n**Use MATCH when order matters:**\n- Security attack sequences (failed logins \u2192 successful login \u2192 privilege escalation)\n- System failure cascades (errors \u2192 timeouts \u2192 circuit breaker trips)\n- User workflow analysis (page view \u2192 form submission \u2192 purchase)\n- Compliance violations (access \u2192 modification \u2192 deletion)\n",
|
|
498
|
+
"language-basics/match-ordered-pattern-matching.md": "# MATCH: ordered pattern matching\n\nThe `MATCH` command finds specific sequences of events in your data. Pattern matching detects sequences like \"10 failed logins followed by a successful login\" or \"error events followed by restart events within 5 minutes.\" Unlike windowed aggregations, pattern matching requires that event patterns happen in a specific sequence.\n\n## Basic pattern matching\n\nThe `MATCH` command searches for ordered sequences using named subqueries and regular expression-style quantifiers. You define what events you're looking for, then specify the pattern and timing constraints. Create named subqueries for the events you want to match, then specify the sequence pattern. This example finds sequences where 5 or more failed logins are followed by at least one successful login:\n\n```hamelin\nWITH failed_logins =\n FROM security_logs\n | WHERE action == 'login_failed'\n\nWITH successful_logins =\n FROM security_logs\n | WHERE action == 'login_success'\n\nMATCH failed_logins{5,} successful_logins+\n```\n\n\n\n## Pattern quantifiers\n\nQuantifiers control how many of each event type to match. These work like regular expression quantifiers but apply to your named event datasets.\n\n### Exact counts\n\nSpecify exact numbers of events to match. This example finds exactly 3 error events followed by exactly 1 restart event:\n\n```hamelin\nWITH errors =\n FROM system_logs\n | WHERE level == 'ERROR'\n\nWITH restarts =\n FROM system_logs\n | WHERE action == 'service_restart'\n\nMATCH errors{3} restarts{1}\n```\n\n\n\n### Range quantifiers\n\nUse ranges to specify minimum and maximum counts. 
This example finds between 2 and 5 high-severity alerts followed by 1 or 2 acknowledgment events:\n\n```hamelin\nWITH alerts =\n FROM monitoring\n | WHERE severity == 'HIGH'\n\nWITH acknowledgments =\n FROM monitoring\n | WHERE action == 'acknowledge'\n\nMATCH alerts{2,5} acknowledgments{1,2}\n```\n\n\n\n### Open-ended quantifiers\n\nUse `+` for \"one or more\" and `*` for \"zero or more\". This example finds one or more failed requests followed by one or more successful requests:\n\n```hamelin\nWITH failed_requests =\n FROM api_logs\n | WHERE status_code >= 500\n\nWITH success_requests =\n FROM api_logs\n | WHERE status_code < 400\n\nMATCH failed_requests+ success_requests+\n```\n\n\n\n## Time constraints with WITHIN\n\nThe `WITHIN` clause adds constraints to patterns, measuring the distance from the first event to the last event in the matched sequence. For timestamp-based ordering, this represents a time window. For numeric ordering, this represents a numeric distance.\n\n### Time window constraints\n\nRequire that patterns complete within a specific time period. This example finds 10 or more failed logins followed by successful logins, but only when the entire sequence happens within 10 minutes:\n\n```hamelin\nWITH failed_logins =\n FROM security_logs\n | WHERE action == 'login_failed'\n\nWITH successful_logins =\n FROM security_logs\n | WHERE action == 'login_success'\n\nMATCH failed_logins{10,} successful_logins+ WITHIN 10m\n```\n\nThe `WITHIN` clause measures on the actual `SORT` column. When you don't specify a `SORT` clause, timestamp ordering is applied automatically. 
When using `WITHIN`, you must have exactly one `SORT` expression, and the `SORT` column type must be compatible with the `WITHIN` type:\n- `TIMESTAMP` columns work with `INTERVAL` (e.g., `5m`) or `CALENDAR_INTERVAL` (e.g., `1y`, `3mon`)\n- Numeric columns require matching numeric types (e.g., `INT` sort with `INT` within)\n\n### Numeric ordering with WITHIN\n\nYou can use `WITHIN` with numeric columns to constrain sequences by numeric distance rather than time:\n\n```hamelin\nWITH step_a =\n FROM process_log\n | WHERE step == 'A'\n\nWITH step_b =\n FROM process_log\n | WHERE step == 'B'\n\nMATCH step_a step_b SORT BY sequence_number WITHIN 100\n```\n\nThis finds sequences where step A is followed by step B, and the sequence numbers differ by at most 100. The `WITHIN` constraint measures `last(sequence_number) - first(sequence_number) <= 100`.\n\n### Using first() and last() functions\n\nThe `first()` and `last()` functions access the earliest and latest events in each matched group. This example finds CPU spikes followed by memory alerts within 15 minutes:\n\n```hamelin\nWITH cpu_spikes =\n FROM metrics\n | WHERE cpu_usage > 90\n\nWITH memory_alerts =\n FROM metrics\n | WHERE memory_usage > 85\n\nMATCH cpu_spikes{3,} memory_alerts+ WITHIN 15m\n```\n\n\n\n## Aggregating over matched rows\n\nThe `AGG` clause lets you compute aggregations over the rows that\nparticipated in each matched sequence. Match aggregations only\noperate on the rows that were part of the match \u2014 not all rows\nwithin the time window. This distinction matters when the `WITHIN`\nwindow contains more events than the pattern actually matched.\n\nThe available match aggregation functions are `count()`,\n`count(x)`, `sum(x)`, `avg(x)`, `min(x)`, `max(x)`, `first(x)`,\nand `last(x)`.\n\n### Counting matched events\n\nUse `count()` to count how many events participated in each\nmatched sequence. 
When a pattern like `failed_logins{5,}\nsuccessful_logins+` matches, `count()` returns the total number\nof rows across all pattern groups \u2014 not just one group. You can\ncombine `count()` with `first()` and `last()` to capture both the\nsize and time span of each match. This example detects brute force\nlogin attempts and counts how many events were in each attack\nsequence:\n\n```hamelin\nWITH failed_logins =\n FROM security_logs\n | WHERE action == 'login_failed'\n\nWITH successful_logins =\n FROM security_logs\n | WHERE action == 'login_success'\n\nMATCH failed_logins{5,} successful_logins+\nAGG attempt_count = count(),\n first_attempt = first(timestamp),\n last_attempt = last(timestamp)\nBY user_id\nWITHIN 10m\n```\n\nThe `count()` returns the total number of matched events \u2014 all the\nfailed logins plus the successful login. The `first()` and\n`last()` functions return the timestamps of the earliest and latest\nevents in the match.\n\n### Aggregating field values\n\nYou can use `sum()`, `avg()`, `min()`, and `max()` to aggregate\nnumeric fields across matched events. These work exactly like\ntheir regular aggregation counterparts, but they only operate on\nthe events that participated in the match. This is important when\nthe `WITHIN` window contains events that didn't match the pattern\n\u2014 those events are excluded from the aggregation. This example\ndetects data exfiltration patterns and computes the total bytes\ntransferred during the matched sequence:\n\n```hamelin\nWITH access =\n FROM file_logs\n | WHERE action == 'read'\n\nWITH transfer =\n FROM file_logs\n | WHERE action == 'upload'\n\nMATCH access+ transfer\nAGG total_bytes = sum(bytes),\n avg_bytes = avg(bytes),\n file_count = count()\nBY user_id\nWITHIN 30m\n```\n\nThe `sum(bytes)` only totals the bytes from the events that\nparticipated in the matched pattern. 
If there were other file\nevents in the 30-minute window that didn't match the `access+\ntransfer` pattern, they are excluded from the aggregation.\n\n## Complex pattern examples\n\n### Security incident detection\n\nLook for suspicious login patterns that might indicate a brute force attack. This example detects external brute force attempts followed by successful logins and optional privilege escalations, all within 30 minutes:\n\n```hamelin\nWITH failed_logins =\n FROM auth_logs\n | WHERE outcome == 'FAILURE'\n | WHERE source_ip NOT IN ('10.0.0.0/8', '192.168.0.0/16')\n\nWITH successful_logins =\n FROM auth_logs\n | WHERE outcome == 'SUCCESS'\n\nWITH privilege_escalations =\n FROM audit_logs\n | WHERE action == 'privilege_escalation'\n\nMATCH failed_logins{5,} successful_logins{1,3} privilege_escalations* WITHIN 30m\n```\n\n\n\n## When to use MATCH vs WINDOW\n\nThe key difference is that `WINDOW` performs unordered correlation while `MATCH` performs ordered correlation.\n\nWhen you pull multiple event patterns into a sliding window, you can aggregate each individual pattern or aggregate across all the patterns together. However, you cannot require that certain subpatterns happen before others\u2014the window treats all events within the time frame as unordered.\n\n`MATCH` specifies that certain events must happen before others in a specific sequence. Ordered correlation matters when the timing and sequence of events affects your analysis.\n\n**Use MATCH when order matters:**\n- Security attack sequences (failed logins \u2192 successful login \u2192 privilege escalation)\n- System failure cascades (errors \u2192 timeouts \u2192 circuit breaker trips)\n- User workflow analysis (page view \u2192 form submission \u2192 purchase)\n- Compliance violations (access \u2192 modification \u2192 deletion)\n",
|
|
499
499
|
"language-basics/sort-limit-top-n.md": "# `SORT` and `LIMIT`: Doing top-n\n\nYou use the `SORT` command to order your data, and the `LIMIT` command to take only the first n rows from your results. Together, these commands let you find top performers, recent events, highest values, or any other ranking-based analysis.\n\nEach command is also useful on its own. `SORT` helps you understand data patterns by revealing ordering and outliers. You might sort transaction amounts to see the distribution of values, or sort timestamps to understand event sequences. `LIMIT` is valuable for exploring large datasets by giving you manageable samples. You can take the first 100 rows to understand data structure before writing more complex queries, or limit results to avoid overwhelming outputs during development.\n\n## Basic syntax\n\nSort your data by specifying the field you want to order by. Add `DESC` for descending order (highest to lowest) or leave it blank for ascending order (lowest to highest):\n\n```hamelin\n| SORT field_name DESC\n```\n\nLimit your results to a specific number of rows using the `LIMIT` command:\n\n```hamelin\n| LIMIT 10\n```\n\n## Simple sorting\n\nOrder your data by a single field to see patterns and outliers. This is useful for finding the most recent events, highest values, or alphabetical arrangements.\n\nSort login events by timestamp to see them in chronological order:\n\n```hamelin\nFROM security_logs\n| WHERE action == 'login'\n| SORT timestamp DESC\n```\n\nThis query gets login events and sorts them by timestamp in descending order (newest first), letting you see the full sequence of login activity.\n\n## Multiple sort fields\n\nYou can sort by multiple fields to create more sophisticated ordering. 
List the fields in order of priority, with the most important sort field first:\n\n```hamelin\nFROM transactions\n| SORT amount DESC, timestamp DESC\n```\n\nThis sorts transactions first by amount (highest first), then by timestamp (newest first) for transactions with the same amount. This ordering reveals value patterns across all transactions, with ties broken by recency.\n\n## Top-n analysis\n\nThe combination of `SORT` and `LIMIT` creates powerful top-n analysis patterns. This lets you answer questions like \"who are my top customers\" or \"what are the most common errors\" with simple, readable queries.\n\nFind the top 5 users by transaction volume:\n\n```hamelin\nFROM transactions\n| AGG total_amount = sum(amount) BY user_id\n| SORT total_amount DESC\n| LIMIT 5\n```\n\nThis aggregates transaction amounts by user, sorts by the total in descending order, and takes the top 5 results. The pattern works for any ranking scenario where you need to identify leaders or outliers.\n\nNote: This example uses the `AGG` command which we haven't covered yet. You can learn more about aggregation in [Aggregation](aggregation.md).\n\n## Sorting with expressions\n\nYou can sort by calculated values without adding them as permanent fields. This is useful when you want to order by a computation but don't need that computation in your final results:\n\n```hamelin\nFROM events\n| SORT (now() - timestamp) / 1hr\n| LIMIT 20\n```\n\nThis sorts events by how many hours ago they occurred, giving you the most recent events first. 
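\n\nThe same idea extends to any computed value, not just time arithmetic. As a sketch (the `fee` field on transactions is hypothetical here), you can rank transactions by their combined cost without adding it as a column:\n\n```hamelin\nFROM transactions\n| SORT amount + fee DESC\n| LIMIT 10\n```\n\n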
The calculation happens during sorting but doesn't create a new field in your results.\n\n## Complex sorting scenarios\n\nFor more advanced sorting, you can combine multiple fields, expressions, and directions to create exactly the ordering you need.\n\nFind the most problematic users by recent failed login attempts:\n\n```hamelin\nFROM security_logs\n| WHERE action == 'login_failed' AND timestamp > now() - 24hr\n| AGG failure_count = count(), latest_failure = max(timestamp) BY user_id\n| SORT failure_count DESC, latest_failure DESC\n| LIMIT 10\n```\n\nThis query identifies users with the most failed login attempts in the last 24 hours, sorted first by failure count (most failures first), then by recency of their latest failure. This creates a prioritized list for security investigation.\n\n## Performance considerations\n\nSorting large datasets can be expensive, especially when sorting by multiple fields or complex expressions. When possible, apply filters with `WHERE` before sorting to reduce the amount of data that needs to be ordered:\n\n```hamelin\nFROM events\n| WHERE timestamp > now() - 7d // Filter first\n| SORT severity DESC, timestamp DESC\n| LIMIT 50\n```\n\nThis pattern filters to recent events before sorting, which is more efficient than sorting all events and then filtering.\n",
|
|
500
500
|
"language-basics/time.md": "# Time\n\nTime is central to event analytics. In Hamelin, you write time the way you think\nabout it\u2014`1hr`, `30min`, or `yesterday`. The language supports several ways to work\nwith time: intervals for durations, absolute timestamps for specific moments,\ntime truncation for grouping, and ranges for time spans.\n\n## Time intervals\n\nYou use time intervals to express durations\u2014how long something takes or how far back to look in your data.\n\n### Basic interval syntax\n\nCreate time intervals by writing a number directly followed by a time unit, with no spaces. Use these anywhere you need to specify how long something takes or how far back to look:\n\n```hamelin\n# Time intervals - number + unit\n1sec # 1 second\n30sec # 30 seconds\n1min # 1 minute\n15min # 15 minutes\n1hr # 1 hour\n2hr # 2 hours\n1d # 1 day\n7d # 7 days\n1w # 1 week\n2w # 2 weeks\n1mon # 1 month\n6mon # 6 months\n1yr # 1 year\n```\n\n### Supported time units\n\n| Unit | Abbreviations | Examples |\n|------|---------------|----------|\n| **Seconds** | `s`, `sec`, `secs`, `second`, `seconds` | `30s`, `45sec` |\n| **Minutes** | `m`, `min`, `mins`, `minute`, `minutes` | `5m`, `15min` |\n| **Hours** | `h`, `hr`, `hrs`, `hour`, `hours` | `1h`, `2hr` |\n| **Days** | `d`, `day`, `days` | `1d`, `30days` |\n| **Weeks** | `w`, `week`, `weeks` | `1w`, `2weeks` |\n| **Months** | `mon`, `month`, `months` | `1mon`, `3months` |\n| **Years** | `y`, `yr`, `yrs`, `year`, `years` | `1y`, `2yrs` |\n\n### Using intervals in variables\n\nStore commonly used time intervals in variables to make your queries more readable and maintainable:\n\n```hamelin\n# Store intervals in variables for reuse\nWITH time_constants =\n LET short_window = 5min,\n daily_retention = 30d,\n investigation_period = 2hr,\n alert_threshold = 500ms\n```\n\n## Absolute timestamps\n\nYou can reference specific moments in time using absolute timestamps. 
This is useful when you know the exact time of an incident or need to analyze data from a specific date.\n\n### ISO 8601 format\n\nCreate absolute timestamps using the `ts()` function with ISO 8601 formatted strings. This format works with or without timezone information:\n\n```hamelin\n# Absolute timestamps using ISO 8601 format\nWITH timestamps =\n LET incident_start = ts('2024-01-15T14:30:00'),\n maintenance_window = ts('2024-01-15T02:00:00Z'),\n deployment_time = ts('2024-01-15T09:15:30.123Z')\n```\n\n### Current time\n\nGet the current timestamp using the `now()` function. This captures the exact moment when your query starts running:\n\n```hamelin\n# Get the current timestamp\nWITH current_times =\n LET right_now = now(),\n query_start_time = now()\n```\n\n## Time truncation with `@`\n\nThe `@` operator snaps timestamps to time boundaries. You can truncate any timestamp to the start of its hour, day, week, or other time period. This makes it straightforward to group events into time buckets for analysis.\n\n### Truncation syntax\n\nApply the `@` operator to any timestamp to round it down to the nearest time boundary:\n\n```hamelin\n# Truncate current time to various boundaries\nnow()@d # Today at midnight (00:00:00)\nnow()@hr # This hour at :00 minutes\nnow()@min # This minute at :00 seconds\nnow()@w # This week's Monday at midnight\nnow()@mon # First day of this month at midnight\n```\n\n### Available truncation units\n\n| Unit | Truncates To | Example Result |\n|------|--------------|----------------|\n| `@s` | Start of second | `2024-01-15T14:30:25.000` |\n| `@min` | Start of minute | `2024-01-15T14:30:00.000` |\n| `@hr` | Start of hour | `2024-01-15T14:00:00.000` |\n| `@d` | Start of day (midnight) | `2024-01-15T00:00:00.000` |\n| `@w` | Start of week (Monday) | `2024-01-15T00:00:00.000` |\n| `@mon` | Start of month | `2024-01-01T00:00:00.000` |\n| `@yr` | Start of year | `2024-01-01T00:00:00.000` |\n\n### Truncation with any timestamp\n\nYou can truncate 
any timestamp, not just `now()`. Create time buckets from your event data by truncating timestamp fields:\n\n```hamelin\n# Truncate any timestamp, not just now()\nWITH event_data =\n LET event_time = ts('2024-01-15T14:37:22')\n\nFROM event_data\n| LET hour_bucket = event_time@hr // 2024-01-15T14:00:00\n| LET day_bucket = event_time@d // 2024-01-15T00:00:00\n```\n\nYou can also truncate timestamp columns directly from your event datasets to group events by time periods:\n\n```hamelin\nFROM events\n| LET event_day = timestamp@d // Group events by day\n| LET event_hour = timestamp@hr // Group events by hour\n| SELECT user_id, event_day, event_hour, action\n```\n\n## Time ranges\n\nYou combine time values into ranges using the `..` operator. Time ranges let you express time spans like \"between 2 and 4 hours ago\" or \"from this morning onward.\" This makes it natural to filter events within specific time windows.\n\n### The range operator `..`\n\nThe `..` operator creates a span between two time points. You place time values on either side to define the start and end of your range.\n\nCreate a range between 2 hours ago and 1 hour ago:\n\n```hamelin\n-2hr..-1hr\n```\n\nCreate a range from a specific time until now:\n\n```hamelin\nts('2024-01-15T10:00:00')..now()\n```\n\nCreate a range from midnight today until midnight tomorrow:\n\n```hamelin\nnow()@d..(now()@d + 1d)\n```\n\n### Relative time ranges\n\nUse negative numbers to go back in time from \"now\". 
This pattern covers most security and operational analytics scenarios where you're investigating recent events.\n\nGet events from the last hour:\n\n```hamelin\n-1hr..now()\n```\n\nGet events between 2 and 4 hours ago:\n\n```hamelin\n-4hr..-2hr\n```\n\nGet events from this week so far:\n\n```hamelin\nnow()@w..now()\n```\n\nYou can combine truncation with ranges to create precise time windows aligned to calendar boundaries:\n\n```hamelin\n// From start of today until now\nnow()@d..now()\n\n// Yesterday (full day)\n(now()@d - 1d)..(now()@d)\n\n// Last full hour\n(now()@hr - 1hr)..(now()@hr)\n```\n\n### Unbounded ranges\n\nYou can leave either side of the range empty to create spans that extend infinitely in one direction. This is useful for ongoing monitoring or historical analysis without a specific end point.\n\nGet everything before 1 hour ago:\n\n```hamelin\n..-1hr\n```\n\nGet everything from a specific time onward:\n\n```hamelin\nts('2024-01-15T10:00:00')..\n```\n\nGet everything from 2 hours ago onward (includes future events):\n\n```hamelin\n-2hr..\n```\n\n### Bounded vs unbounded ranges\n\nThe choice between bounded and unbounded ranges determines how your queries behave, especially for ongoing monitoring versus historical analysis.\n\n```hamelin\n// Bounded: Only gets events that happened in the past hour\nFROM alerts | WHERE timestamp IN -1hr..now()\n\n// Unbounded: Gets past events AND future events as they arrive\nFROM alerts | WHERE timestamp IN -1hr..\n```\n\nUse **bounded ranges** when analyzing completed time periods. Use **unbounded ranges** when monitoring ongoing events as they happen.\n",
|
|
501
501
|
"language-basics/window-aggregating-over-sliding-windows.md": "# WINDOW: aggregating over sliding windows\n\nThe `WINDOW` command creates aggregations that slide across windows of data.\nThis lets you calculate running totals, moving averages, and time-based metrics\nwithout losing the detail of individual events. Each row gets its own\ncalculation based on a sliding window of related rows around it.\n\n## Window calculations\n\nThe `WINDOW` command supports two main types of calculations that operate on the\nsliding window of data. Aggregation functions like `count()`, `sum()`, `avg()`,\n`min()`, and `max()` calculate summary statistics across all rows in the current\nwindow frame.\n\n```hamelin\nFROM sales\n| WINDOW\n total_sales = sum(amount),\n avg_sale = avg(amount),\n sale_count = count()\n BY region\n WITHIN -7d\n```\n\nThis creates a 7-day rolling summary for each region, showing total sales, average sale amount, and number of sales within the sliding window.\n\nWindow-specific functions like `row_number()`, `rank()`, `dense_rank()`, and `lag()` analyze the position and relationships between rows within the window without aggregating the data.\n\n```hamelin\nFROM events\n| WINDOW\n event_number = row_number(),\n event_rank = rank(),\n previous_value = lag(score, 1)\n BY user_id\n SORT timestamp\n```\n\nThis assigns sequence numbers, ranks events by timestamp order, and shows the previous score value for each user's events.\n\nYou can combine multiple calculations in a single `WINDOW` command, and each calculation receives the same set of rows determined by the window frame, but produces different analytical results based on its specific function behavior:\n\n```hamelin\nFROM metrics\n| WINDOW\n recent_count = count(),\n running_total = sum(value),\n current_rank = row_number(),\n percentile_rank = percent_rank()\n BY service\n SORT timestamp\n WITHIN -1hr\n```\n\nThis example mixes aggregation functions (`count()`, `sum()`) with window-specific functions 
(`row_number()`, `percent_rank()`) to create comprehensive analytics for each service within a 1-hour sliding window.\n\n\nWhen explicit names aren't provided for window calculations, Hamelin automatically generates field names from expressions. Learn more about this in [Automatic Field Names](../smart-features/automatic-field-names.md).\n\n## WINDOW command parts\n\nThe `WINDOW` command has three optional clauses that control how the sliding\nwindow behaves. Each clause serves a specific purpose in defining which data\ngets included in each calculation.\n\n```hamelin\nWINDOW calculations\n BY grouping_fields // optional: partitions data\n SORT ordering_fields // optional: defines row order\n WITHIN frame_range // optional: defines window size\n```\n\n### BY clause: partitioning data\n\nThe `BY` clause divides data into separate groups, with each group getting its own independent sliding window. This lets you create per-user, per-host, or per-category calculations without mixing data across different entities.\n\n**With BY fields specified:** Hamelin creates separate windows for each unique combination of those fields. This partitioning ensures that calculations for different users, devices, or categories remain completely independent. Each partition maintains its own window state, preventing data from different entities from interfering with each other. Here's how to create separate counting windows for each user:\n\n```hamelin\nFROM events\n| WINDOW count()\n BY user_id\n```\n\n**Without BY fields:** Hamelin treats all data as one big group. This creates a single window that processes all events together, regardless of their source or category. The calculation accumulates across every row in the dataset, which proves useful for global metrics or overall trend analysis. 
This example creates one counting window that includes all events:\n\n```hamelin\nFROM events\n| WINDOW count()\n```\n\n### SORT clause: ordering rows\n\nThe `SORT` clause controls the order of rows within each window partition. This ordering determines which rows come \"before\" and \"after\" each row, affecting functions like `row_number()` and defining the direction of the sliding window.\n\n**With SORT specified:** Hamelin uses the explicit ordering. The sort order determines which rows come before and after each current row in the window calculation. When you want to analyze transactions by value rather than time, you can sort by amount to create value-based rankings and running totals:\n\n```hamelin\nFROM transactions\n| WINDOW\n running_total = sum(amount),\n transaction_rank = rank()\n BY account_id\n SORT amount DESC\n```\n\n**Without SORT specified:** Hamelin automatically orders by event timestamp. This chronological ordering makes sense for most time-series analysis where you want to track how metrics evolve over time. The automatic timestamp ordering eliminates the need to explicitly specify time-based sorting in typical analytical scenarios. This example creates a chronological sequence count for each user:\n\n```hamelin\nFROM events\n| WINDOW event_sequence = count()\n BY user_id\n```\n\n### WITHIN clause: defining the window frame\n\nThe `WITHIN` clause controls how much data gets included in the window around each row.\n\n**With WITHIN specified:** Hamelin uses the explicit frame size. This sliding frame moves with each row, always maintaining the specified time period or row count. 
When you need to count events within a specific time window, you can specify the exact duration:\n\n```hamelin\nFROM events\n| WINDOW count()\n BY user_id\n WITHIN -1hr\n```\n\nFor each event, this counts all events for that user in the hour leading up to that event's timestamp.\n\n**Without WITHIN specified:** Hamelin uses `..0r` (from the beginning of the partition up to the current row). This default behavior creates cumulative calculations that include all rows from the start of each partition up to the current row. The cumulative approach works well for running totals, progressive counts, and other metrics that should include all historical data. This example creates a running count for each user from their first event:\n\n```hamelin\nFROM events\n| WINDOW cumulative_count = count()\n BY user_id\n```\n\n## Window frames\n\nThe `WITHIN` clause accepts different types of frame specifications that control how much data gets included around each row. Frame specifications determine whether the window slides based on time intervals, specific row counts, or bounded ranges between two points. Understanding these frame types lets you create exactly the sliding behavior you need for different analytical scenarios.\n\n### Value-based frames\n\nIntervals like `-5min` or `-1hr` create sliding windows based on the values in the sorted column. Because the most common sort order is by timestamp, these frames typically create time-based windows that slide through data chronologically. The window maintains a consistent value range (usually time duration) regardless of how many events occur within that period. 
Value-based frames work particularly well for temporal metrics like monitoring system performance or analyzing user activity patterns over fixed time periods.\n\n```hamelin\nFROM metrics\n| WINDOW avg_cpu = avg(cpu_usage)\n  BY hostname\n  WITHIN -5min\n```\n\nFor each metric record, this calculates the average CPU usage for that host over the 5 minutes leading up to that metric's timestamp.\n\nYou can also use value-based frames with non-timestamp columns when the data is sorted by those values. This example calculates running statistics for orders based on order amounts, looking at orders whose amounts fall within $5 below each current order's amount:\n\n```hamelin\nFROM orders\n| WINDOW\n    nearby_orders = count(),\n    avg_nearby_amount = avg(amount)\n  BY customer_id\n  SORT amount\n  WITHIN -5\n```\n\n### Row-based frames\n\nRow counts create windows based on a specific number of surrounding rows rather than time periods. This approach proves valuable when you need consistent sample sizes for statistical calculations or when events occur at irregular intervals. Row-based frames ensure that each calculation includes exactly the specified number of data points, making comparisons more reliable across different time periods. This example counts events and assigns sequence numbers using a 4-row window (current row plus 3 preceding rows):\n\n```hamelin\nFROM events\n| WINDOW\n    recent_events = count(),\n    event_sequence = row_number()\n  BY user_id\n  SORT timestamp\n  WITHIN -3r\n```\n\n### Range frames\n\nRange frames like `-2hr..-1hr` create windows between two specific offsets from the current row. This capability lets you analyze data from specific time periods without including the current time period in the calculation. Range frames prove particularly useful for lag analysis, where you want to compare current metrics against historical periods, or when you need to exclude recent data that might be incomplete. 
This example counts events from the hour that ended one hour before each current event:\n\n```hamelin\nFROM events\n| WINDOW previous_hour_count = count()\n  BY user_id\n  WITHIN -2hr..-1hr\n```\n\nRange frames can be unbounded by omitting one end of the range. An unbounded range like `-2hr..` creates a window that extends infinitely in one direction from a starting point. This technique proves useful when you want all data from a specific threshold forward, creating totals that begin from a meaningful starting point rather than the very beginning of the dataset. This example counts, for each event, all of that user's events from 2 hours before it through the end of the partition, including events that occur after it:\n\n```hamelin\nFROM events\n| WINDOW cumulative_count = count()\n  BY user_id\n  WITHIN -2hr..\n```\n\n**Warning:** Interval frames like `-1hr` create sliding windows, while unbounded ranges like `-1hr..` include all future rows in the dataset. The unbounded version creates a massive window instead of the sliding window you typically want for temporal analysis.\n",
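The difference flagged in the warning is easy to miss because the two specifications differ by only two characters. The following sketch contrasts them directly; it reuses only syntax shown in this document, and assumes (as in the syntax outline earlier) that `//` comments are accepted inside queries:

```hamelin
FROM events
| WINDOW recent_count = count()
  BY user_id
  WITHIN -1hr // interval frame: sliding window over the past hour only
```

```hamelin
FROM events
| WINDOW inflated_count = count()
  BY user_id
  WITHIN -1hr.. // unbounded range: past hour PLUS every later row
```

For each event, the first query counts that user's events in the preceding hour, while the second also counts every subsequent event in the partition, so its results reflect most of the partition instead of a sliding hour.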
|