@hamelin.sh/documentation 0.2.4 → 0.2.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/dist/main.js +6 -6
  2. package/package.json +1 -1
package/dist/main.js CHANGED
@@ -480,15 +480,15 @@ score thresholds based on acceptable false positive rates.
 
  The modular structure makes it easy to test individual components and adjust
  scoring weights without affecting the overall detection logic.`,
- "function-reference/aggregation-functions.md": "# Aggregation Functions\n\nFunctions that operate on groups of rows to produce summary values when used with `AGG` or `WINDOW` commands.\n\n## `count([expression])`\n\nReturns the number of rows in each group during aggregation.\n\n### Parameters\n\n- **expression** (optional) - When provided, only counts rows where this expression evaluates to a non-null value\n\n### Description\n\nThe `count()` function returns the number of rows in each group when you use it\nwith `AGG` or `WINDOW` commands. Unlike SQL, Hamelin uses `count()` rather than\n`count(*)`. When no grouping is specified, it counts all rows in the result.\n\nWhen you provide an expression parameter, `count()` only counts rows where that\nexpression evaluates to a non-null value. This lets you count specific \nconditions without using separate filtering.\n\nWhen you omit an explicit column name, Hamelin automatically generates the \ncolumn name `count()` or `count(expression)` that you can reference using \nbackticks in subsequent commands.\n\n## `count_if(condition)`\n\nCounts the number of rows where a condition is true in each group.\n\n### Parameters\n\n- **condition** - Boolean expression that must evaluate to true for the row to be counted\n\n### Description\n\nThe `count_if()` function counts only rows where the specified condition \nevaluates to true. This provides a more concise alternative to using `WHERE` \nclauses or `case()` expressions for conditional counting.\n\nWhen you omit an explicit column name, Hamelin automatically generates the \ncolumn name `count_if(condition)` that you can reference using backticks in \nsubsequent commands.\n\n## `sum(x)`\n\nReturns the sum of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to sum\n\n### Description\n\nThe `sum()` function adds up all non-null values of the specified expression \nwithin each group. If all values are null, it returns null. 
The result type \nmatches the input type for exact numeric types.\n\n## `avg(x)`\n\nReturns the average (arithmetic mean) of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to average\n\n### Description\n\nThe `avg()` function calculates the arithmetic mean of all non-null values \nwithin each group. If all values are null, it returns null. The result is \nalways a floating-point type regardless of input type.\n\n## `stddev(x)`\n\nReturns the standard deviation of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to calculate standard deviation for\n\n### Description\n\nThe `stddev()` function calculates the sample standard deviation of all \nnon-null values within each group. If there are fewer than two non-null \nvalues, it returns null. The result is always a floating-point type.\n\n## `approx_percentile(x, percentile)`\n\nReturns an approximate percentile value for the specified expression.\n\n### Parameters\n\n- **x** - Numeric expression to calculate percentile for\n- **percentile** - Numeric value between 0.0 and 1.0 representing the desired percentile\n\n### Description\n\nThe `approx_percentile()` function calculates an approximate percentile using \nefficient algorithms suitable for large datasets. The percentile parameter \nshould be between 0.0 (minimum) and 1.0 (maximum). For example, 0.5 returns \nthe median, 0.95 returns the 95th percentile.\n\n## `min(x)`\n\nReturns the minimum value in each group.\n\n### Parameters\n\n- **x** - Expression of numeric, string, or timestamp type\n\n### Description\n\nThe `min()` function finds the smallest value within each group. It works with \nnumeric types (returning the numerically smallest), strings (lexicographic \nordering), and timestamps (chronologically earliest). 
If all values are null, \nit returns null.\n\n## `max(x)`\n\nReturns the maximum value in each group.\n\n### Parameters\n\n- **x** - Expression of numeric, string, or timestamp type\n\n### Description\n\nThe `max()` function finds the largest value within each group. It works with \nnumeric types (returning the numerically largest), strings (lexicographic \nordering), and timestamps (chronologically latest). If all values are null, \nit returns null.\n\n## `array_agg(x)`\n\nCollects all values in each group into an array.\n\n### Parameters\n\n- **x** - Expression of any type to collect into an array\n\n### Description\n\nThe `array_agg()` function creates an array containing all non-null values \nfrom the specified expression within each group. The order of elements in the \nresulting array follows the order specified by any `SORT` clause in the \naggregation command. If there are no non-null values, it returns an empty array.\n\n## `map_agg(key, value)`\n\nCollects key-value pairs in each group into a map.\n\n### Parameters\n\n- **key** - Expression of any type to use as map keys\n- **value** - Expression of any type to use as map values\n\n### Description\n\nThe `map_agg()` function creates a map from key-value pairs within each group. \nIf the same key appears multiple times, only the last value is retained. The \norder of processing follows any `SORT` clause in the aggregation command.\n\n## `multimap_agg(key, value)`\n\nCollects key-value pairs in each group into a map where each key maps to an array of values.\n\n### Parameters\n\n- **key** - Expression of any type to use as map keys\n- **value** - Expression of any type to collect into arrays\n\n### Description\n\nThe `multimap_agg()` function creates a map where each unique key maps to an \narray of all values associated with that key within each group. 
This preserves \nall values for duplicate keys, unlike `map_agg()` which keeps only the last value.\n\n## `any(x)`\n\nReturns true if any value in the group is true (logical OR aggregation).\n\n### Parameters\n\n- **x** - Boolean expression to test\n\n### Description\n\nThe `any()` function performs logical OR aggregation on boolean values within \neach group. It returns true if at least one value is true, false if all values \nare false, and null if all values are null.\n\n## `all(x)`\n\nReturns true if all values in the group are true (logical AND aggregation).\n\n### Parameters\n\n- **x** - Boolean expression to test\n\n### Description\n\nThe `all()` function performs logical AND aggregation on boolean values within \neach group. It returns true if all values are true, false if at least one \nvalue is false, and null if all values are null.",
- "function-reference/array-functions.md": "# Array Functions\n\nScalar functions for array processing and manipulation that can be used in any expression context.\n\n## `array_distinct(x)`\n\nRemoves duplicate elements from an array.\n\n### Parameters\n\n- **x** - Array expression\n\n### Description\n\nThe `array_distinct()` function returns a new array containing only the unique\nelements from the input array. The order of elements in the result is not\nguaranteed. If the input array is null, the function returns null.\n\n## `any(x)`\n\nTests whether any element in a boolean array is true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `any()` function returns true if at least one element in the boolean array\nis true, false if all elements are false. If the array is empty, it returns\nfalse. If the array contains only null values, it returns null. This function\nperforms logical OR aggregation across array elements.\n\n## `all(x)`\n\nTests whether all elements in a boolean array are true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `all()` function returns true if all elements in the boolean array are true,\nfalse if at least one element is false. If the array is empty, it returns true.\nIf the array contains only null values, it returns null. This function performs\nlogical AND aggregation across array elements.\n\n## `max(x)`\n\nReturns the maximum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `max()` function finds and returns the largest element in the array. For\nnumeric arrays, it returns the numerically largest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically latest value. 
If the array is empty or contains only null\nvalues, it returns null.\n\n## `min(x)`\n\nReturns the minimum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `min()` function finds and returns the smallest element in the array. For\nnumeric arrays, it returns the numerically smallest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically earliest value. If the array is empty or contains only null\nvalues, it returns null.\n\n## `sum(x)`\n\nReturns the sum of all numeric elements in an array.\n\n### Parameters\n\n- **x** - Array of numeric expressions\n\n### Description\n\nThe `sum()` function calculates the sum of all numeric elements in the array.\nNull values are ignored in the calculation. If the array is empty or contains\nonly null values, it returns null. The result type matches the element type\nfor exact numeric types.\n\n## `len(x)`\n\nReturns the number of elements in an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `len()` function returns the number of elements in the array as an integer.\nThis includes null elements in the count. If the array itself is null, the\nfunction returns null. An empty array returns 0.\n\n## `filter_null(x)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null.",
+ "function-reference/aggregation-functions.md": "# Aggregation Functions\n\nFunctions that operate on groups of rows to produce summary values when used with `AGG` or `WINDOW` commands.\n\n## `count([expression])`\n\nReturns the number of rows in each group during aggregation.\n\n### Parameters\n\n- **expression** (optional) - When provided, only counts rows where this expression evaluates to a non-null value\n\n### Description\n\nThe `count()` function returns the number of rows in each group when you use it\nwith `AGG` or `WINDOW` commands. Unlike SQL, Hamelin uses `count()` rather than\n`count(*)`. When no grouping is specified, it counts all rows in the result.\n\nWhen you provide an expression parameter, `count()` only counts rows where that\nexpression evaluates to a non-null value. This lets you count specific \nconditions without using separate filtering.\n\nWhen you omit an explicit column name, Hamelin automatically generates the \ncolumn name `count()` or `count(expression)` that you can reference using \nbackticks in subsequent commands.\n\n## `count_distinct(x)`\n\nCounts the number of distinct values in each group.\n\n### Parameters\n\n- **x** - Expression of any type to count distinct values for\n\n### Description\n\nThe `count_distinct()` function counts only unique values within each group,\nignoring duplicates. Null values are not counted. This provides the same\nfunctionality as SQL's `COUNT(DISTINCT x)` but with cleaner syntax. 
For\nexample, counting distinct user IDs shows how many unique users performed\nactions, regardless of how many actions each user performed.\n\n## `approx_distinct(x)`\n\nReturns an approximate count of distinct values in each group.\n\n### Parameters\n\n- **x** - Expression of any type to count distinct values for\n\n### Description\n\nThe `approx_distinct()` function provides an approximate count of distinct\nvalues using probabilistic algorithms that are much faster and use less\nmemory than exact counting, especially for high-cardinality data. The result\nis typically accurate within a few percent. This is particularly useful when\nyou need fast estimates for large datasets where exact precision isn't critical.\n\n## `count_if(condition)`\n\nCounts the number of rows where a condition is true in each group.\n\n### Parameters\n\n- **condition** - Boolean expression that must evaluate to true for the row to be counted\n\n### Description\n\nThe `count_if()` function counts only rows where the specified condition \nevaluates to true. This provides a more concise alternative to using `WHERE` \nclauses or `case()` expressions for conditional counting.\n\nWhen you omit an explicit column name, Hamelin automatically generates the \ncolumn name `count_if(condition)` that you can reference using backticks in \nsubsequent commands.\n\n## `sum(x)`\n\nReturns the sum of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to sum\n\n### Description\n\nThe `sum()` function adds up all non-null values of the specified expression \nwithin each group. If all values are null, it returns null. The result type \nmatches the input type for exact numeric types.\n\n## `avg(x)`\n\nReturns the average (arithmetic mean) of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to average\n\n### Description\n\nThe `avg()` function calculates the arithmetic mean of all non-null values \nwithin each group. If all values are null, it returns null. 
The result is \nalways a floating-point type regardless of input type.\n\n## `stddev(x)`\n\nReturns the standard deviation of all values in each group.\n\n### Parameters\n\n- **x** - Numeric expression to calculate standard deviation for\n\n### Description\n\nThe `stddev()` function calculates the sample standard deviation of all \nnon-null values within each group. If there are fewer than two non-null \nvalues, it returns null. The result is always a floating-point type.\n\n## `approx_percentile(x, percentile)`\n\nReturns an approximate percentile value for the specified expression.\n\n### Parameters\n\n- **x** - Numeric expression to calculate percentile for\n- **percentile** - Numeric value between 0.0 and 1.0 representing the desired percentile\n\n### Description\n\nThe `approx_percentile()` function calculates an approximate percentile using \nefficient algorithms suitable for large datasets. The percentile parameter \nshould be between 0.0 (minimum) and 1.0 (maximum). For example, 0.5 returns \nthe median, 0.95 returns the 95th percentile.\n\n## `min(x)`\n\nReturns the minimum value in each group.\n\n### Parameters\n\n- **x** - Expression of numeric, string, or timestamp type\n\n### Description\n\nThe `min()` function finds the smallest value within each group. It works with \nnumeric types (returning the numerically smallest), strings (lexicographic \nordering), and timestamps (chronologically earliest). If all values are null, \nit returns null.\n\n## `max(x)`\n\nReturns the maximum value in each group.\n\n### Parameters\n\n- **x** - Expression of numeric, string, or timestamp type\n\n### Description\n\nThe `max()` function finds the largest value within each group. It works with \nnumeric types (returning the numerically largest), strings (lexicographic \nordering), and timestamps (chronologically latest). 
If all values are null, \nit returns null.\n\n## `any_value(x)`\n\nReturns an arbitrary value from each group.\n\n### Parameters\n\n- **x** - Expression of any type to select a value from\n\n### Description\n\nThe `any_value()` function returns an arbitrary non-null value from each group.\nThis is useful when you're grouping data and need to include a column that's\nnot part of the grouping criteria, where you know all values in the group are\nthe same or where you don't care which specific value is selected. Unlike\n`min()` or `max()`, this function doesn't impose any ordering overhead, making\nit more efficient when you just need any representative value from the group.\n\n## `array_agg(x)`\n\nCollects all values in each group into an array.\n\n### Parameters\n\n- **x** - Expression of any type to collect into an array\n\n### Description\n\nThe `array_agg()` function creates an array containing all non-null values \nfrom the specified expression within each group. The order of elements in the \nresulting array follows the order specified by any `SORT` clause in the \naggregation command. If there are no non-null values, it returns an empty array.\n\n## `map_agg(key, value)`\n\nCollects key-value pairs in each group into a map.\n\n### Parameters\n\n- **key** - Expression of any type to use as map keys\n- **value** - Expression of any type to use as map values\n\n### Description\n\nThe `map_agg()` function creates a map from key-value pairs within each group. \nIf the same key appears multiple times, only the last value is retained. 
The \norder of processing follows any `SORT` clause in the aggregation command.\n\n## `multimap_agg(key, value)`\n\nCollects key-value pairs in each group into a map where each key maps to an array of values.\n\n### Parameters\n\n- **key** - Expression of any type to use as map keys\n- **value** - Expression of any type to collect into arrays\n\n### Description\n\nThe `multimap_agg()` function creates a map where each unique key maps to an \narray of all values associated with that key within each group. This preserves \nall values for duplicate keys, unlike `map_agg()` which keeps only the last value.\n\n## `any(x)`\n\nReturns true if any value in the group is true (logical OR aggregation).\n\n### Parameters\n\n- **x** - Boolean expression to test\n\n### Description\n\nThe `any()` function performs logical OR aggregation on boolean values within \neach group. It returns true if at least one value is true, false if all values \nare false, and null if all values are null.\n\n## `all(x)`\n\nReturns true if all values in the group are true (logical AND aggregation).\n\n### Parameters\n\n- **x** - Boolean expression to test\n\n### Description\n\nThe `all()` function performs logical AND aggregation on boolean values within \neach group. It returns true if all values are true, false if at least one \nvalue is false, and null if all values are null.",
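Two of the aggregations added in this version, `count_distinct(x)` and `any_value(x)`, can be read against a small Python model of the semantics the new doc text describes. This is only a sketch: the helper names are hypothetical, Python's `None` stands in for null, and nothing here reflects how Hamelin or its SQL backend actually evaluates these functions. `approx_distinct(x)` is left out because the doc text does not name the algorithm it uses.

```python
from typing import Any, Iterable, Optional


def count_distinct_model(values: Iterable[Any]) -> int:
    """Model of the documented count_distinct(x): unique non-null values."""
    return len({v for v in values if v is not None})


def any_value_model(values: Iterable[Any]) -> Optional[Any]:
    """Model of the documented any_value(x): some non-null value from a group.

    The docs call the choice arbitrary; picking the first non-null value is
    just one valid choice, not a guarantee about Hamelin's behaviour.
    """
    for v in values:
        if v is not None:
            return v
    # Assumption: a group with no non-null value yields null.
    return None


user_ids = ["u1", "u2", None, "u1", "u3", "u2"]
print(count_distinct_model(user_ids))                  # 3
print(any_value_model([None, "eu-west", "eu-west"]))  # eu-west
```

The model makes the contrast with `count(expression)` visible: six rows, five non-null values, but only three distinct ones.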
+ "function-reference/array-functions.md": "# Array Functions\n\nScalar functions for array processing and manipulation that can be used in any expression context.\n\n## `array_distinct(x)`\n\nRemoves duplicate elements from an array.\n\n### Parameters\n\n- **x** - Array expression\n\n### Description\n\nThe `array_distinct()` function returns a new array containing only the unique\nelements from the input array. The order of elements in the result is not\nguaranteed. If the input array is null, the function returns null.\n\n## `any(x)`\n\nTests whether any element in a boolean array is true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `any()` function returns true if at least one element in the boolean array\nis true, false if all elements are false. If the array is empty, it returns\nfalse. If the array contains only null values, it returns null. This function\nperforms logical OR aggregation across array elements.\n\n## `all(x)`\n\nTests whether all elements in a boolean array are true.\n\n### Parameters\n\n- **x** - Array of boolean expressions\n\n### Description\n\nThe `all()` function returns true if all elements in the boolean array are true,\nfalse if at least one element is false. If the array is empty, it returns true.\nIf the array contains only null values, it returns null. This function performs\nlogical AND aggregation across array elements.\n\n## `max(x)`\n\nReturns the maximum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `max()` function finds and returns the largest element in the array. For\nnumeric arrays, it returns the numerically largest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically latest value. 
If the array is empty or contains only null\nvalues, it returns null.\n\n## `min(x)`\n\nReturns the minimum element from an array.\n\n### Parameters\n\n- **x** - Array of numeric, string, or timestamp expressions\n\n### Description\n\nThe `min()` function finds and returns the smallest element in the array. For\nnumeric arrays, it returns the numerically smallest value. For string arrays,\nit uses lexicographic ordering. For timestamp arrays, it returns the\nchronologically earliest value. If the array is empty or contains only null\nvalues, it returns null.\n\n## `sum(x)`\n\nReturns the sum of all numeric elements in an array.\n\n### Parameters\n\n- **x** - Array of numeric expressions\n\n### Description\n\nThe `sum()` function calculates the sum of all numeric elements in the array.\nNull values are ignored in the calculation. If the array is empty or contains\nonly null values, it returns null. The result type matches the element type\nfor exact numeric types.\n\n## `avg(x)`\n\nReturns the average of all numeric elements in an array.\n\n### Parameters\n\n- **x** - Array of numeric expressions\n\n### Description\n\nThe `avg()` function calculates the arithmetic mean of all numeric elements in\nthe array. Null values are ignored in the calculation. If the array is empty or\ncontains only null values, it returns null. The result is always a floating-point\ntype regardless of the input element type. This function divides the sum of all\nnon-null elements by the count of non-null elements.\n\n## `len(x)`\n\nReturns the number of elements in an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `len()` function returns the number of elements in the array as an integer.\nThis includes null elements in the count. If the array itself is null, the\nfunction returns null. 
An empty array returns 0.\n\n## `filter_null(x)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **x** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null.\n\n## `slice(array, start, end)`\n\nExtracts a portion of an array between two indices.\n\n### Parameters\n\n- **array** - Array expression of any element type\n- **start** - Integer expression for the starting index (0-based, supports negative indices)\n- **end** - Integer expression for the ending index (exclusive, supports negative indices)\n\n### Description\n\nThe `slice()` function returns a new array containing elements from the start\nindex up to but not including the end index. Both indices are 0-based and support\nnegative values, where -1 refers to the last element, -2 to the second-last, and\nso on. If start is greater than or equal to end, an empty array is returned. The\nfunction handles out-of-bounds indices gracefully by clamping them to valid array\nboundaries.\n\n## `split(string, delimiter)`\n\nSplits a string into an array of substrings.\n\n### Parameters\n\n- **string** - String expression to split\n- **delimiter** - String expression used as the separator\n\n### Description\n\nThe `split()` function divides a string into an array of substrings based on the\nspecified delimiter. The delimiter itself is not included in the resulting array\nelements. If the delimiter is not found in the string, the function returns an\narray containing the original string as its only element. An empty delimiter\nresults in an error. 
Consecutive delimiters produce empty strings in the result.\n\n## `array_join(array, delimiter)` / `array_join(array, delimiter, null_replacement)`\n\nJoins array elements into a single string.\n\n### Parameters\n\n- **array** - Array of string expressions\n- **delimiter** - String expression to place between elements\n- **null_replacement** (optional) - String expression to use for null elements\n\n### Description\n\nThe `array_join()` function concatenates all elements of a string array into a\nsingle string, placing the delimiter between each element. By default, null\nelements are skipped. When you provide a null_replacement parameter, null\nelements are replaced with that string before joining. This is the inverse of\nthe `split()` function and is useful for creating delimited strings from arrays.\n\n## `flatten(x)`\n\nFlattens a nested array by one level.\n\n### Parameters\n\n- **x** - Array of arrays expression\n\n### Description\n\nThe `flatten()` function takes an array of arrays and returns a single array\ncontaining all elements from the nested arrays. It only flattens one level deep,\nso arrays nested more deeply remain as array elements. The order of elements is\npreserved, with elements from earlier arrays appearing before elements from later\narrays. If the input contains null arrays, they are skipped. This is useful for\ncombining multiple arrays or processing results from operations that return arrays.",
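The array functions added in this version (`avg`, `slice`, `split`, `array_join`, `flatten`) have enough edge cases that a small Python model of the documented behaviour may help when reviewing the new text. The function names below are hypothetical, `None` stands in for null, and two points are assumptions rather than documented facts: the `start >= end` check in `slice` is applied after negative indices are resolved and clamped, and the empty-delimiter error in `split` is modelled as a `ValueError`.

```python
from typing import Any, List, Optional


def slice_model(arr: List[Any], start: int, end: int) -> List[Any]:
    """0-based, end-exclusive slice with negative indices and clamping."""
    n = len(arr)

    def resolve(i: int) -> int:
        if i < 0:
            i += n  # -1 refers to the last element
        return max(0, min(i, n))  # clamp out-of-bounds indices

    lo, hi = resolve(start), resolve(end)
    return arr[lo:hi] if lo < hi else []


def split_model(text: str, delimiter: str) -> List[str]:
    """Split on a delimiter; consecutive delimiters give empty strings."""
    if delimiter == "":
        raise ValueError("empty delimiter")  # the docs only say "an error"
    return text.split(delimiter)


def array_join_model(arr: List[Optional[str]], delimiter: str,
                     null_replacement: Optional[str] = None) -> str:
    """Join strings; nulls are skipped unless a replacement is given."""
    if null_replacement is None:
        parts = [s for s in arr if s is not None]
    else:
        parts = [null_replacement if s is None else s for s in arr]
    return delimiter.join(parts)


def flatten_model(arrays: List[Optional[List[Any]]]) -> List[Any]:
    """Flatten exactly one level, skipping null inner arrays."""
    return [item for inner in arrays if inner is not None for item in inner]


def array_avg_model(arr: List[Optional[float]]) -> Optional[float]:
    """Mean of non-null elements; null for empty or all-null arrays."""
    present = [x for x in arr if x is not None]
    return sum(present) / len(present) if present else None


print(slice_model([10, 20, 30, 40, 50], -3, -1))    # [30, 40]
print(split_model("a,,b", ","))                     # ['a', '', 'b']
print(array_join_model(["a", None, "b"], "-", "?")) # a-?-b
print(flatten_model([[1, 2], None, [[3], 4]]))      # [1, 2, [3], 4]
print(array_avg_model([1, None, 2]))                # 1.5
```

One consequence worth noting: because `array_join` skips nulls by default, joining the output of `split` restores the original string, but splitting the output of `array_join` does not necessarily restore an array that contained nulls.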
  "function-reference/conditional-functions.md": "# Conditional Functions\n\nScalar functions for conditional logic and branching that can be used in any expression context.\n\n## `if(condition, then)` / `if(condition, then, else)`\n\nReturns different values based on a boolean condition.\n\n### Parameters\n\n- **condition** - Boolean expression to evaluate\n- **then** - Expression to return when condition is true\n- **else** (optional) - Expression to return when condition is false\n\n### Description\n\nThe `if()` function evaluates the condition and returns the `then` expression\nif the condition is true. When used with two parameters, it returns null if\nthe condition is false. When used with three parameters, it returns the `else`\nexpression if the condition is false.\n\nBoth the `then` and `else` expressions must be of the same type when the\nthree-parameter form is used. The function provides a concise way to implement\nconditional logic within expressions.\n\n## `case(when: then, when: then, ...)`\n\nReturns values based on multiple conditions evaluated in order.\n\n### Parameters\n\n- **when: then** - Variable number of condition-value pairs\n\n### Description\n\nThe `case()` function evaluates multiple condition-value pairs in order and\nreturns the value associated with the first condition that evaluates to true.\nUnlike SQL's CASE WHEN syntax, Hamelin uses function syntax with colon-separated\npairs.\n\nEach condition must be a boolean expression, and all values must be of the same\ntype. If no condition matches, the function returns null. The conditions are\nevaluated in the order they appear, so earlier conditions take precedence.\n\n## `coalesce(...)`\n\nReturns the first non-null value from a list of expressions.\n\n### Parameters\n\n- **...** - Variable number of expressions of the same type\n\n### Description\n\nThe `coalesce()` function evaluates expressions from left to right and returns\nthe first expression that is not null. 
If all expressions are null, it returns\nnull. All expressions must be of the same type.\n\nThis function is commonly used for providing default values or handling null\nvalues in expressions. It's particularly useful when you want to fall back\nthrough a series of potentially null values to find the first valid one.",
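The unchanged conditional-functions entry above describes three null-returning fallbacks: two-argument `if()`, `case()` with no matching condition, and `coalesce()` with all-null inputs. A minimal Python model of that description follows. The names are hypothetical, `None` stands in for null, a null condition is treated as not true (an assumption, since the text does not say), and Python evaluates every argument eagerly, which may differ from Hamelin.

```python
from typing import Any, Optional, Tuple


def if_model(condition: Optional[bool], then: Any, otherwise: Any = None) -> Any:
    """Two- or three-argument if(): missing else branch yields null."""
    return then if condition else otherwise


def case_model(*pairs: Tuple[Optional[bool], Any]) -> Any:
    """case(when: then, ...): first true condition wins, else null."""
    for condition, value in pairs:
        if condition:
            return value
    return None


def coalesce_model(*values: Any) -> Any:
    """coalesce(...): first non-null value, or null if there is none."""
    for v in values:
        if v is not None:
            return v
    return None


print(if_model(False, "high"))                                # None
print(case_model((False, "a"), (True, "b"), (True, "c")))     # b
print(coalesce_model(None, None, "fallback", "ignored"))      # fallback
```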
- "function-reference/data-structure-functions.md": "# Data Structure Functions\n\nScalar functions for data structure operations and type information that can be used in any expression context.\n\n## `typeof(x)`\n\nReturns type information for any expression.\n\n### Parameters\n\n- **x** - Expression of any type\n\n### Description\n\nThe `typeof()` function returns a struct containing detailed type information\nabout the input expression. The result includes both the Hamelin type name\nand the corresponding SQL type name. This function is useful for debugging,\ntype introspection, and understanding how Hamelin maps types to the underlying\nSQL engine.\n\n## `map(keys, values)`\n\nCreates a map from separate key and value arrays.\n\n### Parameters\n\n- **keys** - Array expression containing map keys\n- **values** - Array expression containing map values\n\n### Description\n\nThe `map()` function creates a map by pairing elements from the keys array\nwith elements from the values array. Both arrays must have the same length.\nThe nth element from the keys array is paired with the nth element from the\nvalues array. If the arrays have different lengths, an error is raised.\n\n## `map(elements)`\n\nCreates a map from an array of key-value tuples.\n\n### Parameters\n\n- **elements** - Array of tuples where each tuple contains a key and value\n\n### Description\n\nThe `map()` function creates a map from an array of key-value pairs represented\nas tuples. Each tuple in the array must contain exactly two elements: the first\nelement becomes the key, and the second element becomes the value. This format\nis useful when you have structured key-value data.\n\n## `map()`\n\nCreates an empty map.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `map()` function creates an empty map with unknown key and value types.\nThis is useful for initializing map variables or as a starting point for\nmap operations. 
The specific key and value types are inferred from subsequent\nusage context.\n\n## `map(key: value, ...)`\n\nCreates a map from literal key-value pairs.\n\n### Parameters\n\n- **key: value** - Variable number of key-value pairs using colon syntax\n\n### Description\n\nThe `map()` function creates a map from explicitly specified key-value pairs\nusing Hamelin's colon syntax. Each key must be unique within the map. All keys\nmust be of the same type, and all values must be of the same type. This provides\na concise way to create maps with known literal values.\n\n## `map_keys(map)`\n\nExtracts all keys from a map as an array.\n\n### Parameters\n\n- **map** - Map expression\n\n### Description\n\nThe `map_keys()` function returns an array containing all keys from the input\nmap. The order of keys in the resulting array is not guaranteed. If the map\nis empty, it returns an empty array. If the map is null, the function returns null.\n\n## `parse_json(json)`\n\nParses a JSON string into a variant type.\n\n### Parameters\n\n- **json** - String expression containing valid JSON\n\n### Description\n\nThe `parse_json()` function parses a JSON string and returns the result as\na variant type that can represent any JSON structure including objects, arrays,\nstrings, numbers, booleans, and null values. If the input string is not valid\nJSON, an error is raised. The variant type preserves the original JSON structure\nand allows dynamic access to nested elements.\n\n## `parse_json(variant)`\n\nReturns a variant value unchanged (identity function for variants).\n\n### Parameters\n\n- **variant** - Variant expression\n\n### Description\n\nWhen `parse_json()` is called with a variant input, it simply returns the\nvariant unchanged. 
This overload allows `parse_json()` to be safely used\non values that might already be variants without causing errors or unnecessary\nconversions.\n\n## `len(collection)`\n\nReturns the number of elements in a collection.\n\n### Parameters\n\n- **collection** - Array or map expression\n\n### Description\n\nThe `len()` function returns the number of elements in arrays or maps as an\ninteger. For arrays, it counts all elements including null values. For maps,\nit counts the number of key-value pairs. If the collection is null, the\nfunction returns null. Empty collections return 0.\n\n## `filter_null(array)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **array** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null. This function is essential for cleaning\ndata before further processing.",
+ "function-reference/data-structure-functions.md": "# Data Structure Functions\n\nScalar functions for data structure operations and type information that can be used in any expression context.\n\n## `typeof(x)`\n\nReturns type information for any expression.\n\n### Parameters\n\n- **x** - Expression of any type\n\n### Description\n\nThe `typeof()` function returns a struct containing detailed type information\nabout the input expression. The result includes both the Hamelin type name\nand the corresponding SQL type name. This function is useful for debugging,\ntype introspection, and understanding how Hamelin maps types to the underlying\nSQL engine.\n\n## `map(keys, values)`\n\nCreates a map from separate key and value arrays.\n\n### Parameters\n\n- **keys** - Array expression containing map keys\n- **values** - Array expression containing map values\n\n### Description\n\nThe `map()` function creates a map by pairing elements from the keys array\nwith elements from the values array. Both arrays must have the same length.\nThe nth element from the keys array is paired with the nth element from the\nvalues array. If the arrays have different lengths, an error is raised.\n\n## `map(elements)`\n\nCreates a map from an array of key-value tuples.\n\n### Parameters\n\n- **elements** - Array of tuples where each tuple contains a key and value\n\n### Description\n\nThe `map()` function creates a map from an array of key-value pairs represented\nas tuples. Each tuple in the array must contain exactly two elements: the first\nelement becomes the key, and the second element becomes the value. This format\nis useful when you have structured key-value data.\n\n## `map()`\n\nCreates an empty map.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `map()` function creates an empty map with unknown key and value types.\nThis is useful for initializing map variables or as a starting point for\nmap operations. 
The specific key and value types are inferred from subsequent\nusage context.\n\n## `map(key: value, ...)`\n\nCreates a map from literal key-value pairs.\n\n### Parameters\n\n- **key: value** - Variable number of key-value pairs using colon syntax\n\n### Description\n\nThe `map()` function creates a map from explicitly specified key-value pairs\nusing Hamelin's colon syntax. Each key must be unique within the map. All keys\nmust be of the same type, and all values must be of the same type. This provides\na concise way to create maps with known literal values.\n\n## `map_keys(map)`\n\nExtracts all keys from a map as an array.\n\n### Parameters\n\n- **map** - Map expression\n\n### Description\n\nThe `map_keys()` function returns an array containing all keys from the input\nmap. The order of keys in the resulting array is not guaranteed. If the map\nis empty, it returns an empty array. If the map is null, the function returns null.\n\n## `map_values(map)`\n\nExtracts all values from a map as an array.\n\n### Parameters\n\n- **map** - Map expression\n\n### Description\n\nThe `map_values()` function returns an array containing all values from the input\nmap. The order of values in the resulting array corresponds to the order of keys\nreturned by `map_keys()`, though this order is not guaranteed across calls. If the\nmap is empty, it returns an empty array. If the map is null, the function returns\nnull. This is useful for extracting and processing all values from a map structure.\n\n## `parse_json(json)`\n\nParses a JSON string into a variant type.\n\n### Parameters\n\n- **json** - String expression containing valid JSON\n\n### Description\n\nThe `parse_json()` function parses a JSON string and returns the result as\na variant type that can represent any JSON structure including objects, arrays,\nstrings, numbers, booleans, and null values. If the input string is not valid\nJSON, an error is raised. 
The variant type preserves the original JSON structure\nand allows dynamic access to nested elements.\n\n## `parse_json(variant)`\n\nReturns a variant value unchanged (identity function for variants).\n\n### Parameters\n\n- **variant** - Variant expression\n\n### Description\n\nWhen `parse_json()` is called with a variant input, it simply returns the\nvariant unchanged. This overload allows `parse_json()` to be safely used\non values that might already be variants without causing errors or unnecessary\nconversions.\n\n## `to_json_string(json)`\n\nConverts a variant to its JSON string representation.\n\n### Parameters\n\n- **json** - Variant expression containing JSON data\n\n### Description\n\nThe `to_json_string()` function converts a variant type back into a JSON string.\nThis is the inverse of `parse_json()`, allowing you to serialize structured data\nback to JSON format. The resulting string is properly formatted JSON that can be\nstored, transmitted, or parsed by other systems. Complex nested structures are\npreserved, and the output follows standard JSON formatting rules.\n\n## `len(collection)`\n\nReturns the number of elements in a collection.\n\n### Parameters\n\n- **collection** - Array or map expression\n\n### Description\n\nThe `len()` function returns the number of elements in arrays or maps as an\ninteger. For arrays, it counts all elements including null values. For maps,\nit counts the number of key-value pairs. If the collection is null, the\nfunction returns null. Empty collections return 0.\n\n## `filter_null(array)`\n\nRemoves null elements from an array.\n\n### Parameters\n\n- **array** - Array expression of any element type\n\n### Description\n\nThe `filter_null()` function returns a new array containing only the non-null\nelements from the input array. The order of remaining elements is preserved.\nIf all elements are null, it returns an empty array. If the input array is\nnull, the function returns null. 
This function is essential for cleaning\ndata before further processing.",
  "function-reference/match-group-functions.md": "# Match Group Functions\n\nFunctions for accessing events within pattern matching groups that must be used with the `MATCH` command.\n\n## `first(expression)` / `first(expression, offset)`\n\nReturns the value of an expression from the first event in a match group.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the first event\n- **offset** (optional) - Integer specifying which occurrence to access (default: 0)\n\n### Description\n\nThe `first()` function retrieves the value of the specified expression from\nthe first event in the current match group. When used with the offset parameter,\nit returns the value from the first + offset event. This function is commonly\nused to access timestamps, field values, or calculated expressions from the\nbeginning of a matched event sequence.\n\n## `last(expression)` / `last(expression, offset)`\n\nReturns the value of an expression from the last event in a match group.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the last event\n- **offset** (optional) - Integer specifying which occurrence to access (default: 0)\n\n### Description\n\nThe `last()` function retrieves the value of the specified expression from\nthe last event in the current match group. When used with the offset parameter,\nit returns the value from the last - offset event. 
This function is commonly\nused to measure durations, access final states, or extract values from the\nend of a matched event sequence.\n\n## `prev(expression)`\n\nReturns the value of an expression from the previous event in the sequence.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the previous event\n\n### Description\n\nThe `prev()` function retrieves the value of the specified expression from\nthe event immediately preceding the current event in the match sequence.\nThis function provides access to the previous event's state, enabling\ncomparisons and calculations that depend on sequential relationships\nbetween events.\n\n## `next(expression)`\n\nReturns the value of an expression from the next event in the sequence.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the next event\n\n### Description\n\nThe `next()` function retrieves the value of the specified expression from\nthe event immediately following the current event in the match sequence.\nThis function enables forward-looking analysis and calculations that depend\non subsequent events in the pattern.",
  "function-reference/mathematical-functions.md": "# Mathematical Functions\n\nScalar functions for mathematical operations and calculations that can be used in any expression context.\n\n## `abs(x)`\n\nReturns the absolute value of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `abs()` function returns the absolute value (magnitude) of the input number,\nremoving any negative sign. For positive numbers and zero, it returns the value\nunchanged. For negative numbers, it returns the positive equivalent.\n\n## `cbrt(x)`\n\nReturns the cube root of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `cbrt()` function calculates the cube root of the input value. The result\nis always returned as a double-precision floating-point number. Unlike square\nroot, cube root is defined for negative numbers.\n\n## `ceil(x)` / `ceiling(x)`\n\nRounds a number up to the nearest integer.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `ceil()` and `ceiling()` functions round the input value up to the nearest\ninteger. For positive numbers, this means rounding away from zero. For negative\nnumbers, this means rounding toward zero. Both function names are equivalent.\n\n## `degrees(x)`\n\nConverts radians to degrees.\n\n### Parameters\n\n- **x** - Numeric expression representing an angle in radians\n\n### Description\n\nThe `degrees()` function converts an angle from radians to degrees. The result\nis always returned as a double-precision floating-point number. The conversion\nuses the formula: degrees = radians \xD7 (180/\u03C0).\n\n## `e()`\n\nReturns Euler's number (mathematical constant e).\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `e()` function returns the mathematical constant e (approximately 2.71828),\nwhich is the base of natural logarithms. 
The result is returned as a\ndouble-precision floating-point number.\n\n## `exp(x)`\n\nReturns e raised to the power of x.\n\n### Parameters\n\n- **x** - Numeric expression representing the exponent\n\n### Description\n\nThe `exp()` function calculates e^x, where e is Euler's number. This is the\nexponential function, which is the inverse of the natural logarithm. The result\nis always returned as a double-precision floating-point number.\n\n## `floor(x)`\n\nRounds a number down to the nearest integer.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `floor()` function rounds the input value down to the nearest integer. For\npositive numbers, this means rounding toward zero. For negative numbers, this\nmeans rounding away from zero.\n\n## `ln(x)`\n\nReturns the natural logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `ln()` function calculates the natural logarithm (base e) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. The result is always returned as a double-precision floating-point number.\n\n## `log(b, x)`\n\nReturns the logarithm of x with the specified base.\n\n### Parameters\n\n- **b** - Numeric expression representing the logarithm base\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log()` function calculates the logarithm of x using the specified base b.\nBoth the base and the value must be positive. The result is always returned as\na double-precision floating-point number.\n\n## `log10(x)`\n\nReturns the base-10 logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log10()` function calculates the common logarithm (base 10) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. 
The result is always returned as a double-precision floating-point number.\n\n## `log2(x)`\n\nReturns the base-2 logarithm of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be positive)\n\n### Description\n\nThe `log2()` function calculates the binary logarithm (base 2) of the input\nvalue. The input must be positive; negative values or zero will result in an\nerror. The result is always returned as a double-precision floating-point number.\n\n## `pi()`\n\nReturns the mathematical constant \u03C0 (pi).\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `pi()` function returns the mathematical constant \u03C0 (approximately 3.14159),\nwhich represents the ratio of a circle's circumference to its diameter. The\nresult is returned as a double-precision floating-point number.\n\n## `pow(x, p)` / `power(x, p)`\n\nRaises a number to the specified power.\n\n### Parameters\n\n- **x** - Numeric expression representing the base\n- **p** - Numeric expression representing the exponent\n\n### Description\n\nThe `pow()` and `power()` functions calculate x raised to the power of p (x^p).\nBoth function names are equivalent. The result is always returned as a\ndouble-precision floating-point number.\n\n## `radians(x)`\n\nConverts degrees to radians.\n\n### Parameters\n\n- **x** - Numeric expression representing an angle in degrees\n\n### Description\n\nThe `radians()` function converts an angle from degrees to radians. The result\nis always returned as a double-precision floating-point number. 
The conversion\nuses the formula: radians = degrees \xD7 (\u03C0/180).\n\n## `round(x)` / `round(x, d)`\n\nRounds a number to the nearest integer or specified decimal places.\n\n### Parameters\n\n- **x** - Numeric expression to round\n- **d** (optional) - Integer specifying the number of decimal places\n\n### Description\n\nThe `round()` function rounds the input value to the nearest integer when used\nwith one parameter, or to the specified number of decimal places when used with\ntwo parameters. The rounding follows standard mathematical rules (0.5 rounds up).\n\n## `sign(x)`\n\nReturns the sign of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `sign()` function returns -1 for negative numbers, 0 for zero, and 1 for\npositive numbers. This function helps determine the sign of a value without\nregard to its magnitude.\n\n## `sqrt(x)`\n\nReturns the square root of a number.\n\n### Parameters\n\n- **x** - Numeric expression (must be non-negative)\n\n### Description\n\nThe `sqrt()` function calculates the square root of the input value. The input\nmust be non-negative; negative values will result in an error. The result is\nalways returned as a double-precision floating-point number.\n\n## `truncate(x)`\n\nRemoves the fractional part of a number.\n\n### Parameters\n\n- **x** - Numeric expression\n\n### Description\n\nThe `truncate()` function removes the fractional part of a number, effectively\nrounding toward zero. 
For positive numbers, this is equivalent to `floor()`.\nFor negative numbers, this is equivalent to `ceil()`.\n\n## `width_bucket(x, bound1, bound2, n)`\n\nReturns the bucket number for a value in a histogram with equal-width buckets.\n\n### Parameters\n\n- **x** - Numeric expression representing the value to bucket\n- **bound1** - Numeric expression representing the lower bound\n- **bound2** - Numeric expression representing the upper bound \n- **n** - Integer expression representing the number of buckets\n\n### Description\n\nThe `width_bucket()` function determines which bucket a value falls into when\ndividing the range between bound1 and bound2 into n equal-width buckets. Values\noutside the bounds return 0 (below bound1) or n+1 (above bound2).\n\n## `width_bucket(x, bins)`\n\nReturns the bucket number for a value using explicitly defined bucket boundaries.\n\n### Parameters\n\n- **x** - Numeric expression representing the value to bucket\n- **bins** - Array of numeric values representing bucket boundaries\n\n### Description\n\nThe `width_bucket()` function determines which bucket a value falls into using\nan array of explicitly defined bucket boundaries. The function returns the\nindex of the bucket where the value belongs, with 0 for values below the\nlowest boundary and array length + 1 for values above the highest boundary.",
  "function-reference/regular-expression-functions.md": "# Regular Expression Functions\n\nScalar functions for pattern matching and advanced text processing using regular expressions.\n\n## `regexp_count(string, pattern)`\n\nCounts the number of times a regular expression pattern matches in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n\n### Description\n\nThe `regexp_count()` function returns the number of non-overlapping matches of\nthe specified regular expression pattern within the input string. If no matches\nare found, it returns 0. The pattern uses standard regular expression syntax.\n\n## `regexp_extract_all(string, pattern)` / `regexp_extract_all(string, pattern, group)`\n\nExtracts all matches of a regular expression pattern from a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **group** (optional) - Integer specifying which capture group to extract\n\n### Description\n\nThe `regexp_extract_all()` function returns an array containing all matches of\nthe specified pattern. When used with two parameters, it returns the entire\nmatch. When used with three parameters, it returns the specified capture group\nfrom each match.\n\nIf no matches are found, it returns an empty array. 
Capture groups are numbered\nstarting from 1, with 0 representing the entire match.\n\n## `regexp_extract(string, pattern)` / `regexp_extract(string, pattern, group)`\n\nExtracts the first match of a regular expression pattern from a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **group** (optional) - Integer specifying which capture group to extract\n\n### Description\n\nThe `regexp_extract()` function returns the first match of the specified pattern.\nWhen used with two parameters, it returns the entire match. When used with three\nparameters, it returns the specified capture group from the first match.\n\nIf no match is found, it returns null. Capture groups are numbered starting\nfrom 1, with 0 representing the entire match.\n\n## `regexp_like(string, pattern)`\n\nTests whether a string matches a regular expression pattern.\n\n### Parameters\n\n- **string** - String expression to test\n- **pattern** - String expression representing the regular expression pattern\n\n### Description\n\nThe `regexp_like()` function returns true if the input string contains a match\nfor the specified regular expression pattern, false otherwise. 
This function\ntests for the presence of a match anywhere within the string, not just at the\nbeginning or end.\n\n## `regexp_position(string, pattern)` / `regexp_position(string, pattern, start)` / `regexp_position(string, pattern, start, occurrence)`\n\nReturns the position of a regular expression match within a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **start** (optional) - Integer specifying the starting position for the search\n- **occurrence** (optional) - Integer specifying which occurrence to find\n\n### Description\n\nThe `regexp_position()` function returns the 1-based position of the first\ncharacter of a pattern match within the string. When used with the `start`\nparameter, it begins searching from that position. When used with the\n`occurrence` parameter, it finds the nth occurrence of the pattern.\n\nIf no match is found, it returns 0. The start position is 1-based, and the\noccurrence count begins at 1 for the first match.\n\n## `regexp_replace(string, pattern)` / `regexp_replace(string, pattern, replacement)`\n\nReplaces matches of a regular expression pattern in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the regular expression pattern\n- **replacement** (optional) - String expression to replace matches with\n\n### Description\n\nThe `regexp_replace()` function replaces all matches of the specified pattern.\nWhen used with two parameters, it removes all matches (replaces with empty\nstring). When used with three parameters, it replaces matches with the\nspecified replacement string.\n\nThe replacement string can include capture group references using standard\nregular expression syntax. 
If no matches are found, the original string is\nreturned unchanged.\n\n## `regexp_split(string, pattern)`\n\nSplits a string using a regular expression pattern as the delimiter.\n\n### Parameters\n\n- **string** - String expression to split\n- **pattern** - String expression representing the regular expression pattern to use as delimiter\n\n### Description\n\nThe `regexp_split()` function splits the input string at each occurrence of\nthe specified pattern and returns an array of the resulting substrings. The\npattern matches are not included in the result array.\n\nIf the pattern is not found, the function returns an array containing the\noriginal string as a single element. If the pattern matches at the beginning\nor end of the string, empty strings may be included in the result array.",
  "function-reference/string-functions.md": "# String Functions\n\nScalar functions for string processing and manipulation that can be used in any expression context.\n\n## `replace(string, pattern)`\n\nReplaces all occurrences of a pattern in a string.\n\n### Parameters\n\n- **string** - String expression to search within\n- **pattern** - String expression representing the text to replace\n\n### Description\n\nThe `replace()` function removes all occurrences of the specified pattern from\nthe input string. This function performs literal string replacement, not\npattern matching. If the pattern is not found, the original string is returned\nunchanged.\n\n## `starts_with(string, prefix)`\n\nTests whether a string starts with a specified prefix.\n\n### Parameters\n\n- **string** - String expression to test\n- **prefix** - String expression representing the prefix to check for\n\n### Description\n\nThe `starts_with()` function returns true if the input string begins with the\nspecified prefix, false otherwise. The comparison is case-sensitive. An empty\nprefix will always return true for any string.\n\n## `ends_with(string, suffix)`\n\nTests whether a string ends with a specified suffix.\n\n### Parameters\n\n- **string** - String expression to test\n- **suffix** - String expression representing the suffix to check for\n\n### Description\n\nThe `ends_with()` function returns true if the input string ends with the\nspecified suffix, false otherwise. The comparison is case-sensitive. An empty\nsuffix will always return true for any string.\n\n## `contains(string, substring)`\n\nTests whether a string contains a specified substring.\n\n### Parameters\n\n- **string** - String expression to search within\n- **substring** - String expression representing the text to search for\n\n### Description\n\nThe `contains()` function returns true if the input string contains the\nspecified substring anywhere within it, false otherwise. The comparison is\ncase-sensitive. 
An empty substring will always return true for any string.\n\n## `lower(string)`\n\nConverts a string to lowercase.\n\n### Parameters\n\n- **string** - String expression to convert\n\n### Description\n\nThe `lower()` function converts all uppercase characters in the input string\nto their lowercase equivalents. Characters that are already lowercase or\nnon-alphabetic characters remain unchanged.\n\n## `upper(string)`\n\nConverts a string to uppercase.\n\n### Parameters\n\n- **string** - String expression to convert\n\n### Description\n\nThe `upper()` function converts all lowercase characters in the input string\nto their uppercase equivalents. Characters that are already uppercase or\nnon-alphabetic characters remain unchanged.\n\n## `len(string)`\n\nReturns the length of a string in characters.\n\n### Parameters\n\n- **string** - String expression to measure\n\n### Description\n\nThe `len()` function returns the number of characters in the input string.\nThis counts Unicode characters, not bytes, so multi-byte characters are\ncounted as single characters. An empty string returns 0.",
- "function-reference/time-date-functions.md": '# Time & Date Functions\n\nScalar functions for temporal data processing and manipulation that can be used in any expression context.\n\n## `now()`\n\nReturns the current timestamp.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `now()` function returns the current date and time as a timestamp. The\nexact timestamp represents the moment when the function is evaluated during\nquery execution. All calls to `now()` within the same query execution return\nthe same timestamp value.\n\n## `today()`\n\nReturns today\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `today()` function returns the current date with the time portion set to\nmidnight (00:00:00). This is equivalent to truncating `now()` to the day\nboundary. The result represents the start of the current day.\n\n## `yesterday()`\n\nReturns yesterday\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `yesterday()` function returns yesterday\'s date with the time portion set\nto midnight (00:00:00). This is equivalent to subtracting one day from `today()`.\nThe result represents the start of the previous day.\n\n## `tomorrow()`\n\nReturns tomorrow\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `tomorrow()` function returns tomorrow\'s date with the time portion set to\nmidnight (00:00:00). This is equivalent to adding one day to `today()`. The\nresult represents the start of the next day.\n\n## `ts(timestamp)`\n\nConverts a string to a timestamp.\n\n### Parameters\n\n- **timestamp** - String expression representing a timestamp\n\n### Description\n\nThe `ts()` function parses a string representation of a timestamp and converts\nit to a timestamp type. The function accepts various timestamp formats including\nISO 8601 format. 
If the string cannot be parsed as a valid timestamp, an error\nis raised.\n\n## `year(timestamp)`\n\nExtracts the year from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `year()` function extracts the year component from a timestamp and returns\nit as an integer. For example, a timestamp of "2023-07-15 14:30:00" would\nreturn 2023.\n\n## `month(timestamp)`\n\nExtracts the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `month()` function extracts the month component from a timestamp and returns\nit as an integer from 1 to 12, where 1 represents January and 12 represents\nDecember. For example, a timestamp of "2023-07-15 14:30:00" would return 7.\n\n## `day(timestamp)`\n\nExtracts the day of the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day()` function extracts the day component from a timestamp and returns\nit as an integer from 1 to 31, depending on the month. For example, a timestamp\nof "2023-07-15 14:30:00" would return 15.\n\n## `day_of_week(timestamp)`\n\nExtracts the day of the week from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day_of_week()` function extracts the ISO day of the week from a timestamp \nand returns it as an integer from 1 (Monday) to 7 (Sunday).\n\n## `hour(timestamp)`\n\nExtracts the hour from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `hour()` function extracts the hour component from a timestamp and returns\nit as an integer from 0 to 23, using 24-hour format. 
For example, a timestamp\nof "2023-07-15 14:30:00" would return 14.\n\n## `minute(timestamp)`\n\nExtracts the minute from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `minute()` function extracts the minute component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:00" would return 30.\n\n## `second(timestamp)`\n\nExtracts the second from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `second()` function extracts the second component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:45" would return 45.\n\n## `at_timezone(timestamp, timezone)`\n\nConverts a timestamp to a different timezone.\n\n### Parameters\n\n- **timestamp** - Timestamp expression to convert\n- **timezone** - String expression representing the target timezone\n\n### Description\n\nThe `at_timezone()` function converts a timestamp from its current timezone\nto the specified target timezone. The timezone parameter should be a valid\ntimezone identifier such as "UTC", "America/New_York", or "Europe/London".\nThe function returns a new timestamp representing the same moment in time\nbut expressed in the target timezone.\n\n## `to_millis(interval)`\n\nConverts an interval to milliseconds.\n\n### Parameters\n\n- **interval** - Interval expression to convert\n\n### Description\n\nThe `to_millis()` function converts an interval (duration) to its equivalent\nvalue in milliseconds as an integer. This is useful for calculations that\nrequire numeric representations of time durations. For example, an interval\nof "5 minutes" would return 300000 milliseconds.',
+ "function-reference/time-date-functions.md": '# Time & Date Functions\n\nScalar functions for temporal data processing and manipulation that can be used in any expression context.\n\n## `now()`\n\nReturns the current timestamp.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `now()` function returns the current date and time as a timestamp. The\nexact timestamp represents the moment when the function is evaluated during\nquery execution. All calls to `now()` within the same query execution return\nthe same timestamp value.\n\n## `today()`\n\nReturns today\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `today()` function returns the current date with the time portion set to\nmidnight (00:00:00). This is equivalent to truncating `now()` to the day\nboundary. The result represents the start of the current day.\n\n## `yesterday()`\n\nReturns yesterday\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `yesterday()` function returns yesterday\'s date with the time portion set\nto midnight (00:00:00). This is equivalent to subtracting one day from `today()`.\nThe result represents the start of the previous day.\n\n## `tomorrow()`\n\nReturns tomorrow\'s date at midnight.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `tomorrow()` function returns tomorrow\'s date with the time portion set to\nmidnight (00:00:00). This is equivalent to adding one day to `today()`. The\nresult represents the start of the next day.\n\n## `ts(timestamp)`\n\nConverts a string to a timestamp.\n\n### Parameters\n\n- **timestamp** - String expression representing a timestamp\n\n### Description\n\nThe `ts()` function parses a string representation of a timestamp and converts\nit to a timestamp type. The function accepts various timestamp formats including\nISO 8601 format. 
If the string cannot be parsed as a valid timestamp, an error\nis raised.\n\n## `year(timestamp)`\n\nExtracts the year from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `year()` function extracts the year component from a timestamp and returns\nit as an integer. For example, a timestamp of "2023-07-15 14:30:00" would\nreturn 2023.\n\n## `month(timestamp)`\n\nExtracts the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `month()` function extracts the month component from a timestamp and returns\nit as an integer from 1 to 12, where 1 represents January and 12 represents\nDecember. For example, a timestamp of "2023-07-15 14:30:00" would return 7.\n\n## `day(timestamp)`\n\nExtracts the day of the month from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day()` function extracts the day component from a timestamp and returns\nit as an integer from 1 to 31, depending on the month. For example, a timestamp\nof "2023-07-15 14:30:00" would return 15.\n\n## `day_of_week(timestamp)`\n\nExtracts the day of the week from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `day_of_week()` function extracts the ISO day of the week from a timestamp \nand returns it as an integer from 1 (Monday) to 7 (Sunday).\n\n## `hour(timestamp)`\n\nExtracts the hour from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `hour()` function extracts the hour component from a timestamp and returns\nit as an integer from 0 to 23, using 24-hour format. 
For example, a timestamp\nof "2023-07-15 14:30:00" would return 14.\n\n## `minute(timestamp)`\n\nExtracts the minute from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `minute()` function extracts the minute component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:00" would return 30.\n\n## `second(timestamp)`\n\nExtracts the second from a timestamp.\n\n### Parameters\n\n- **timestamp** - Timestamp expression\n\n### Description\n\nThe `second()` function extracts the second component from a timestamp and\nreturns it as an integer from 0 to 59. For example, a timestamp of\n"2023-07-15 14:30:45" would return 45.\n\n## `at_timezone(timestamp, timezone)`\n\nConverts a timestamp to a different timezone.\n\n### Parameters\n\n- **timestamp** - Timestamp expression to convert\n- **timezone** - String expression representing the target timezone\n\n### Description\n\nThe `at_timezone()` function converts a timestamp from its current timezone\nto the specified target timezone. The timezone parameter should be a valid\ntimezone identifier such as "UTC", "America/New_York", or "Europe/London".\nThe function returns a new timestamp representing the same moment in time\nbut expressed in the target timezone.\n\n## `to_millis(interval)`\n\nConverts an interval to milliseconds.\n\n### Parameters\n\n- **interval** - Interval expression to convert\n\n### Description\n\nThe `to_millis()` function converts an interval (duration) to its equivalent\nvalue in milliseconds as an integer. This is useful for calculations that\nrequire numeric representations of time durations. 
For example, an interval\nof "5 minutes" would return 300000 milliseconds.\n\n## `to_nanos(interval)`\n\nConverts an interval to nanoseconds.\n\n### Parameters\n\n- **interval** - Interval expression to convert\n\n### Description\n\nThe `to_nanos()` function converts an interval (duration) to its equivalent\nvalue in nanoseconds as an integer. This provides the highest precision for\ntime duration calculations. The function multiplies the millisecond value\nby 1,000,000 to get nanoseconds. For example, an interval of "1 second"\nwould return 1,000,000,000 nanoseconds.\n\n## `from_millis(millis)`\n\nCreates an interval from milliseconds.\n\n### Parameters\n\n- **millis** - Integer expression representing milliseconds\n\n### Description\n\nThe `from_millis()` function creates an interval from a millisecond value.\nThis is the inverse of `to_millis()`, allowing you to convert numeric\nmillisecond values back into interval types that can be used with timestamp\narithmetic. For example, `from_millis(5000)` creates an interval of 5 seconds.\n\n## `from_nanos(nanos)`\n\nCreates an interval from nanoseconds.\n\n### Parameters\n\n- **nanos** - Integer expression representing nanoseconds\n\n### Description\n\nThe `from_nanos()` function creates an interval from a nanosecond value.\nThis is the inverse of `to_nanos()`, converting numeric nanosecond values\ninto interval types. The function divides the nanosecond value by 1,000,000,000\nto convert to seconds. For example, `from_nanos(1500000000)` creates an\ninterval of 1.5 seconds.\n\n## `from_unixtime_seconds(seconds)`\n\nCreates a timestamp from Unix seconds.\n\n### Parameters\n\n- **seconds** - Integer expression representing seconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_seconds()` function converts a Unix timestamp (seconds\nsince January 1, 1970 UTC) into a timestamp type. This is commonly used\nwhen working with systems that store time as Unix timestamps. 
For example,\n`from_unixtime_seconds(1625097600)` returns the timestamp "2021-07-01 00:00:00".\n\n## `from_unixtime_millis(millis)`\n\nCreates a timestamp from Unix milliseconds.\n\n### Parameters\n\n- **millis** - Integer expression representing milliseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_millis()` function converts Unix time in milliseconds\nto a timestamp. Many systems and APIs return timestamps as milliseconds\nsince the Unix epoch. This function handles the conversion by multiplying\nthe input by 1,000,000 to convert to nanoseconds internally. For example,\n`from_unixtime_millis(1625097600000)` returns "2021-07-01 00:00:00".\n\n## `from_unixtime_micros(micros)`\n\nCreates a timestamp from Unix microseconds.\n\n### Parameters\n\n- **micros** - Integer expression representing microseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_micros()` function converts Unix time in microseconds\nto a timestamp. This provides microsecond precision for systems that require\nit. The function multiplies the input by 1,000 to convert to nanoseconds\ninternally. For example, `from_unixtime_micros(1625097600000000)` returns\n"2021-07-01 00:00:00".\n\n## `from_unixtime_nanos(nanos)`\n\nCreates a timestamp from Unix nanoseconds.\n\n### Parameters\n\n- **nanos** - Integer expression representing nanoseconds since Unix epoch\n\n### Description\n\nThe `from_unixtime_nanos()` function converts Unix time in nanoseconds\ndirectly to a timestamp. This provides the highest precision for timestamp\nconversion and is useful when working with high-frequency data or systems\nthat track time at nanosecond granularity. 
For example,\n`from_unixtime_nanos(1625097600000000000)` returns "2021-07-01 00:00:00".\n\n## `to_unixtime(timestamp)`\n\nConverts a timestamp to Unix seconds.\n\n### Parameters\n\n- **timestamp** - Timestamp expression to convert\n\n### Description\n\nThe `to_unixtime()` function converts a timestamp to Unix time, returning\nthe number of seconds since January 1, 1970 UTC as a double-precision\nfloating-point number. The fractional part represents sub-second precision.\nThis is useful for interoperability with systems that expect Unix timestamps.\nFor example, the timestamp "2021-07-01 00:00:00" returns 1625097600.0.',
492
492
  "function-reference/window-functions.md": "# Window Functions\n\nFunctions for analytical operations over data windows that must be used with the `WINDOW` command.\n\n## `row_number()`\n\nReturns a sequential row number for each row within a window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `row_number()` function assigns a unique sequential integer to each row\nwithin its window partition, starting from 1. The ordering is determined by\nthe `SORT` clause in the `WINDOW` command. Rows with identical sort values\nreceive different row numbers in an arbitrary but consistent order.\n\n## `rank()`\n\nReturns the rank of each row within a window partition with gaps.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `rank()` function assigns a rank to each row within its window partition\nbased on the `SORT` clause ordering. Rows with identical sort values receive\nthe same rank, and subsequent ranks are skipped. For example, if two rows tie\nfor rank 2, the next row receives rank 4 (not rank 3).\n\n## `dense_rank()`\n\nReturns the rank of each row within a window partition without gaps.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `dense_rank()` function assigns a rank to each row within its window\npartition based on the `SORT` clause ordering. Rows with identical sort values\nreceive the same rank, but subsequent ranks are not skipped. 
For example, if\ntwo rows tie for rank 2, the next row receives rank 3.\n\n## `lag(expression, offset, ignore_nulls)`\n\nReturns the value of an expression from a previous row within the window.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the previous row\n- **offset** - Integer specifying how many rows back to look\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `lag()` function retrieves the value of the specified expression from a\nrow that is `offset` positions before the current row within the window\npartition. When `ignore_nulls` is true, null values are skipped when counting\nthe offset. If there is no row at the specified offset, the function returns null.\n\n## `lead(expression, offset, ignore_nulls)`\n\nReturns the value of an expression from a subsequent row within the window.\n\n### Parameters\n\n- **expression** - Expression to evaluate from the subsequent row\n- **offset** - Integer specifying how many rows ahead to look\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `lead()` function retrieves the value of the specified expression from a\nrow that is `offset` positions after the current row within the window\npartition. When `ignore_nulls` is true, null values are skipped when counting\nthe offset. If there is no row at the specified offset, the function returns null.\n\n## `first_value(expression, ignore_nulls)`\n\nReturns the first value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `first_value()` function returns the value of the specified expression from\nthe first row in the current window frame. When `ignore_nulls` is true, it\nreturns the first non-null value. 
The window frame is determined by the\n`WITHIN` clause in the `WINDOW` command.\n\n## `last_value(expression, ignore_nulls)`\n\nReturns the last value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `last_value()` function returns the value of the specified expression from\nthe last row in the current window frame. When `ignore_nulls` is true, it\nreturns the last non-null value. The window frame is determined by the\n`WITHIN` clause in the `WINDOW` command.\n\n## `nth_value(expression, n, ignore_nulls)`\n\nReturns the nth value of an expression within the window frame.\n\n### Parameters\n\n- **expression** - Expression to evaluate\n- **n** - Integer specifying which value to return (1-based)\n- **ignore_nulls** - Boolean indicating whether to skip null values (default: true)\n\n### Description\n\nThe `nth_value()` function returns the value of the specified expression from\nthe nth row in the current window frame. When `ignore_nulls` is true, null\nvalues are not counted in the position. If there is no nth row, the function\nreturns null. The position is 1-based, where 1 represents the first row.\n\n## `cume_dist()`\n\nReturns the cumulative distribution of each row within the window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `cume_dist()` function calculates the cumulative distribution of each row\nwithin its window partition. The result is the number of rows with values less\nthan or equal to the current row's value, divided by the total number of rows\nin the partition. 
Values are greater than 0 and at most 1, because each row counts itself.\n\n## `percent_rank()`\n\nReturns the percentile rank of each row within the window partition.\n\n### Parameters\n\nThis function takes no parameters.\n\n### Description\n\nThe `percent_rank()` function calculates the percentile rank of each row within\nits window partition. The result is calculated as (rank - 1) / (total rows - 1),\nwhere rank is determined by the `SORT` clause ordering. Values range from 0 to 1,\nwith 0 representing the lowest value and 1 representing the highest.",
493
493
  "introduction.md": "# Introducing Hamelin\n\nHamelin is a **pipe-based query language** for **event analytics** which targets\nthe specific challenges detection engineers face when analyzing security events.\nThe language makes event correlation straightforward, letting you define\npatterns, correlate them across time windows, and match ordered sequences of\nevents.\n\n## Key Features\n\n### \u{1F504} Pipe-Based\n\nYou write queries that read naturally from top to bottom. Each operation\nconnects to the next using the pipe operator `|`. Pipe-based languages let you\nbuild queries incrementally, making them easier to read, write, and test than\napproaches that rely heavily on nested subqueries.\n\n```hamelin\nFROM events\n| WHERE event.action == 'login'\n| WITHIN -1hr\n| SELECT user.email, timestamp\n```\n\n### \u{1F550} Event-Native\n\nHamelin offers shorthand for working with timestamped events. Time intervals are\nwritten as simple expressions that match how you think about time. You can\nreference relative timestamps and truncate them to specific boundaries.\n\n```hamelin\n// Reference relative time\n| WITHIN -15m // events within the last 15 minutes\n| WITHIN -1h // events within the last hour\n| WITHIN -7d // events within the last 7 days\n\n// Truncate timestamps to boundaries\n| SELECT timestamp@h // truncate to hour boundary\n| SELECT timestamp@d // truncate to day boundary\n```\n\n### \u{1FA9F} Sliding Windows\n\nSliding windows move continuously with each event, giving you insights without\ngaps or duplicates. You can aggregate data over these moving time windows to\ndetect patterns as they happen.\n\n```hamelin\nFROM events\n| WHERE event.action == 'login'\n| WINDOW count()\n BY user.id\n WITHIN -15m\n```\n\n### \u{1F3AF} Correlation of Named Subqueries\n\nNamed subqueries let you define specific event patterns and correlate them\nwithin sliding windows. You can drop these patterns into sliding windows and\nwrite correlations around them. 
Hamelin makes it straightforward to aggregate\nover specific patterns while also aggregating over the entire group of events.\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nFROM failed = failed_logins, success = successful_logins\n| WINDOW failures = count(failed),\n successes = count(success),\n total = count()\n BY user.id\n WITHIN -5m\n| WHERE successes >= 1 && failures / total > 0.2\n```\n\nThis query demonstrates correlating failed and successful login events to detect\nbrute force attacks. Named subqueries define distinct event patterns:\n`failed_logins` filters to login failure events while `successful_logins`\nfilters to login success events. The sliding window aggregates these patterns by\nuser over 5-minute periods, counting failures, successes, and total events. The\nfinal filter identifies users who had at least one successful login where failed\nattempts represent more than 20% of their total login activity within that\nwindow.\n\n### \u{1F50D} Ordered Matching of Named Subqueries\n\nYou can ask Hamelin to match ordered patterns across events. Aggregations over sliding windows work well for many use cases, but others require that you search for specific events followed by other specific events. You can do that in Hamelin using regular expression quantifiers applied to named subqueries.\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nMATCH failed_logins{10,} successful_logins+\nWHEN last(successful_logins.timestamp) - first(successful_logins.timestamp) < 10m\n```\n\nThis searches for at least 10 failed logins followed by at least one successful login in\na ten-minute period. 
The sliding window approach might miss attack patterns\nwhere timing and sequence matter, but ordered matching can detect the exact\nprogression of a brute force attack.\n\n### \u{1F517} Event Type Expansion\n\nYou can query across different event types without worrying about schema\ndifferences. Hamelin automatically sets missing fields to `null` when they don't\nexist in a particular event type.\n\n```hamelin\nFROM login_events, logout_events, error_events\n// Filters by user.email when this field exists in a row.\n// Drops rows where this field does not exist\n// (because NULL does not equal any string).\n| WHERE user.email == 'john@example.com'\n```\n\n### \u{1F5C2}\uFE0F Structured Types\n\nHamelin supports structured types like structs, arrays, and maps to represent\ncomplex data. These types make data modeling more familiar, and reduce the need\nto rely too much on joins in analytic queries.\n\n```hamelin\n// Create struct literals with nested data\nLET login_metadata = {\n ip_address: '192.168.1.100',\n user_agent: 'Mozilla/5.0',\n location: 'San Francisco'\n}\n\n// Access nested fields using dot notation\n| WHERE login_metadata.ip_address != '192.168.1.100'\n\n// Use arrays to store multiple related values\n| LET failed_attempts = [\n {timestamp: '2024-01-15T14:25:00Z', reason: 'invalid_password'},\n {timestamp: '2024-01-15T14:27:00Z', reason: 'account_locked'}\n ]\n\n// Use maps when key data is high cardinality\n// Using structs for this use case creates too many columns.\n| LET host_metrics = map(\n 'web-server-01': {cpu: 85.2, memory: 72.1},\n 'web-server-02': {cpu: 91.7, memory: 68.9},\n 'db-primary-01': {cpu: 67.3, memory: 89.4}\n )\n\n// Look up map values using index notation\n| WHERE host_metrics['web-server-01'].cpu > 80\n```\n\n### \u{1F4E1} Array Broadcasting\n\nHamelin makes working with arrays simpler by offering broadcasting, which helps\nyou distribute operations over each member of an array. 
It does this when you\napply an operation to an array that makes more sense applied to each of\nits members. Broadcasting lets you work with arrays using simple, familiar\nsyntax without asking you to resort to functional programming or inefficient\nunnesting.\n\n```hamelin\n| WHERE any(failed_attempts.reason == 'invalid_password')\n```\n\nThis example demonstrates how the equality operator `==` broadcasts across the\n`reason` field of each element in the `failed_attempts` array. It actually\ninvolves *two* broadcasts:\n\n * first, the lookup of the `reason` field changes an array-of-struct into an\n array-of-string\n * second, applying equality to the resulting array applies it to each member\n\nHamelin can do this automatically because it is type-aware. It knows that\ncomparing equality between `array(string)` and `string` makes more sense to\nbroadcast: an array can never be equal to a string, but a member of an\n`array(string)` might be.\n\n### \u{1F500} Semi-Structured Types\n\nHamelin lets you parse JSON into instances of the `variant` type. This helps you\nhandle semi-structured data that doesn't fit nicely into fixed schemas. You can\nparse JSON strings, access their fields, and convert them to more structured\ntypes. This makes working with JSON feel fairly native.\n\n```hamelin\n// Parse JSON strings into the variant type\nFROM logs\n| LET event_data = parse_json(raw_json)\n\n// Access nested fields using dot notation\n| WHERE event_data.level AS string == 'ERROR'\n\n// Access JSON array elements with index notation\n| LET first_tag = event_data.tags[0]\n\n// Cast variant data to structured types when you need type safety.\n// Values that do not match will be null.\n| LET user_info = event_data.user AS {id: int, name: string}\n```\n\n### \u{1F6A8} Excellent Error Messages\n\nHamelin provides clear, helpful error messages. 
Error messages\npoint directly to the problematic Hamelin code and explain exactly what went\nwrong, rather than showing cryptic messages about generated SQL.\n\nThis matters especially when AI assistants write queries. AI tools need precise\ndescriptions of errors to fix queries and complete tasks. Clear error messages\nlet AI assistants debug queries effectively by giving the context needed to\ncorrect mistakes.\n\n```hamelin\nFROM simba.sysmon_events\n| AGG count() BY host.hostname\n| LET hostname = lower(host.hostname)\n```\n\ngenerates the error\n\n```\nError: problem doing translation\n \u256D\u2500[ :3:24 ]\n \u2502\n 3 \u2502 | LET hostname = lower(host.hostname)\n \u2502 \u2500\u2500\u252C\u2500\n \u2502 \u2570\u2500\u2500\u2500 error while translating\n \u2502\n \u2502 Note: unbound column reference: host\n \u2502\n \u2502 the following entries in the environment are close:\n \u2502 - `host.hostname` (you must actually wrap with ``)\n\u2500\u2500\u2500\u256F\n```\n\nHere, the user has forgotten to escape an identifier that contains a dot character.\n\n```hamelin\nFROM simba.sysmon_events\n| WINDOW count(),\n all(winlog.event_data.events)\n BY host.hostname\n```\n\ngenerates the error\n\n```\nError: problem doing translation\n \u256D\u2500[ :3:10 ]\n \u2502\n 3 \u2502 all(winlog.event_data.events)\n \u2502 \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252C\u2500\u252C\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\n \u2502 \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 could not find a matching function definition\n \u2502 \u2502\n \u2502 \u2570\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 variant\n \u2502\n \u2502 Note: Attempted all(x=boolean)\n \u2502 - Type mismatch for x: expected boolean, got variant\n \u2502\n \u2502 Attempted all(x=array(boolean))\n \u2502 - Type mismatch for x: expected 
array(boolean), got variant\n \u2502\n\u2500\u2500\u2500\u256F\n```\n\nHere, the user has forgotten to cast variant to a primitive type so that it can\nbe matched against the function call. (A future version of Hamelin will probably\ncoerce this automatically!)\n",
494
494
  "language-basics/aggregation.md": "# AGG: performing ordinary aggregation\n\nThe `AGG` command groups and aggregates datasets to create summary statistics\nand analytical insights. You can analyze user behavior patterns, system\nperformance metrics, or security events by grouping related records together and\napplying mathematical functions to each group.\n\n## AGG syntax\n\nThe `AGG` command follows a simple pattern that groups data and applies aggregation functions to each group:\n\n```hamelin\nAGG result = function(expression), ... BY grouping_expression, ...\n```\n\nWhen you omit the `BY` clause, Hamelin aggregates all records into a single group. This calculates overall dataset statistics and global metrics that span all records, counting all events across the entire dataset without any grouping or partitioning:\n\n```hamelin\nFROM events\n| AGG total_events = count()\n```\n\nWhen you omit explicit column names, Hamelin generates them automatically from\nthe expressions you provide. Learn more about this feature in [Automatic Field\nNames](../smart-features/automatic-field-names.md). This creates columns named\n`count()` and `avg(response_time)` that you can reference using backticks in\nsubsequent commands:\n\n```hamelin\nFROM requests\n| AGG count(), avg(response_time) BY service_name\n```\n\nWhen you omit aggregation functions entirely, you get distinct groups without any calculations. This returns the unique combinations of event_type and user_id without performing any mathematical operations:\n\n```hamelin\nFROM events\n| AGG BY event_type, user_id\n```\n\nYou can also rename columns in the BY clause and use any expression for grouping. 
This example groups by renamed event_type, truncated timestamp, and extracted email domain, creating clear column names for downstream analysis:\n\n```hamelin\nFROM events\n| AGG\n total_events = count(),\n avg_duration = avg(duration)\n BY event_category = event_type,\n hour_bucket = timestamp@hr,\n user_domain = split(email, '@')[1]\n```\n\n## Simple aggregation examples\n\n### Basic counting\n\nEvent counting groups events by their characteristics and calculates how many events fall into each category. Notice that Hamelin uses `count()` with no arguments, not `count(*)` like SQL. The empty parentheses count all rows in each group, providing a clean syntax for the most common aggregation operation:\n\n```hamelin\nFROM events\n| AGG event_count = count() BY event_type\n```\n\n### Multiple aggregations\n\nCalculating several metrics at once in a single `AGG` command ensures all metrics use consistent grouping logic:\n\n```hamelin\nFROM requests\n| AGG\n total_requests = count(),\n avg_response_time = avg(response_time),\n max_response_time = max(response_time),\n error_count = count_if(status_code >= 400)\n BY service_name\n```\n\n### Conditional aggregation\n\nConditional aggregation functions like `count_if()` let you count only rows that meet specific conditions without pre-filtering the dataset. Conditional aggregation maintains the full context of each group while applying different filters to different calculations:\n\n```hamelin\nFROM auth_logs\n| AGG\n failures = count_if(outcome == 'FAILURE'),\n successes = count_if(outcome == 'SUCCESS')\n BY user_name\n```\n\n## Time series aggregations\n\nTime series aggregations combine time truncation with grouping to create time-based buckets for temporal analysis. 
Time-based grouping creates time-bucketed summaries for monitoring system performance, tracking business metrics, and understanding user behavior patterns across different time scales.\n\n### Hourly summaries\n\nHourly aggregations provide detailed views of system activity and user behavior throughout the day:\n\n```hamelin\nFROM logs\n| AGG\n hourly_events = count(),\n avg_response = avg(response_time),\n error_rate = count_if(status >= 400) / count()\n BY timestamp@hr\n| SORT timestamp@hr\n```\n\n### Daily trends\n\nDaily aggregations reveal longer-term trends and enable comparison across different time periods:\n\n```hamelin\nFROM events\n| WITHIN -30d..now()\n| AGG\n daily_events = count(),\n unique_users = count_distinct(user_name),\n high_severity = count_if(severity == 'HIGH')\n BY timestamp@d\n| SORT timestamp@d DESC\n```\n",
@@ -505,10 +505,10 @@ scoring weights without affecting the overall detection logic.`,
505
505
  "smart-features/lineage-sub-struct.md": "# Lineage sub-struct\n\nHamelin tracks where data comes from when you assign results to struct fields\nin `FROM` or `MATCH` clauses. This lineage tracking lets you correlate events\nfrom different sources while maintaining visibility into which upstream source\ncontributed each piece of data. You can reference this lineage information to\nbuild complex pattern detection queries.\n\n## How lineage sub-struct works\n\nHamelin creates a composite record that preserves the source of each piece of\ndata when you assign query results to struct fields. This happens automatically\nwhen you use assignment syntax in `FROM` or `MATCH` clauses. As an example,\nconsider tracking both failed and successful login events:\n\n```hamelin\nFROM failed = failed_logins, success = successful_logins\n| WINDOW failures = count(failed),\n successes = count(success),\n total = count()\n BY user.id\n WITHIN -5m\n```\n\nThe `failed = failed_logins` assignment creates a struct field that gets\npopulated for events from the failed logins source, while `success =\nsuccessful_logins` creates another struct field that gets populated for events\nfrom the successful logins source. Events from `failed_logins` will have the\n`failed` field populated and `success` as NULL. Events from `successful_logins`\nwill have the `success` field populated and `failed` as NULL. Hamelin maintains\nthis lineage information throughout the query pipeline.\n\n## Accessing lineage data\n\nYou can reference the assigned struct fields directly in queries. The field\nnames become available for filtering, aggregation, and selection:\n\n```hamelin\nFROM failed = security_alerts, success = login_events\n| WHERE failed.severity > 'medium' OR success.user_id IS NOT NULL\n| SELECT failed.alert_type, success.login_time, failed.source_ip\n```\n\nEach event gets lineage tags that indicate which source it came from. 
Events\nfrom `security_alerts` will have the `failed` field populated with their data\nand `success` as NULL. Events from `login_events` will have the `success` field\npopulated with their data and `failed` as NULL. This lets you access any field\nfrom the original data while knowing exactly which source contributed each\nevent.\n\n## Pattern correlation with lineage\n\nLineage tracking enables sophisticated event correlation patterns. As an\nexample, consider detecting brute force attacks by correlating failed attempts\nwith eventual successes:\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nFROM failed = failed_logins, success = successful_logins\n| WINDOW failures = count(failed),\n successes = count(success),\n total = count()\n BY user.id\n WITHIN -5m\n| WHERE successes >= 1 && failures / total > 0.2\n| SELECT user.id,\n failed_count = failures,\n success_count = successes,\n failure_rate = failures / total\n```\n\nThis query correlates two distinct event patterns within sliding windows. The\nlineage tracking lets you distinguish events by source: events from\n`failed_logins` have the `failed` struct populated, while events from\n`successful_logins` have the `success` struct populated. You can then access\nsource-specific fields and aggregate based on event lineage.\n\n## MATCH clause lineage\n\nThe `MATCH` command also supports lineage tracking when you assign pattern\nresults to struct fields. 
As an example, consider detecting brute force\npatterns that span multiple login attempts:\n\n```hamelin\nWITH failed_logins = FROM events\n| WHERE event.action == 'login_failed'\n\nWITH successful_logins = FROM events\n| WHERE event.action == 'login_success'\n\nMATCH failed_logins = failed_logins{10,}, successful_logins = successful_logins+\nWHEN max(successful_logins.timestamp) - min(successful_logins.timestamp) < 10m\n| AGG failed_count = count(failed_logins),\n success_count = count(successful_logins),\n first_failed_ip = min(failed_logins.source_ip),\n success_duration = max(successful_logins.timestamp) - min(successful_logins.timestamp)\n BY user_id\n```\n\nThis pattern detects sequences where at least 10 failed login attempts are\nfollowed by one or more successful logins, all occurring within a 10-minute\nwindow. The assignments (`failed_logins =` and `successful_logins =`) create\nlineage tags that identify which pattern each event matched. Events matching the\nfailed login pattern have the `failed_logins` struct populated, while events\nmatching the successful login pattern have the `successful_logins` struct\npopulated. The `AGG` command then operates on these lineage-tagged events to\ncalculate metrics specific to each pattern type. The `count(failed_logins)`\naggregation counts only events that matched the failed login pattern, while\n`count(successful_logins)` counts only events that matched the successful login\npattern. Similarly, `min(failed_logins.source_ip)` accesses the `source_ip`\nfield specifically from events in the failed login pattern, and the timestamp\ncalculations work with the `timestamp` field from events in the successful login\npattern.\n\n## Benefits of lineage tracking\n\nLineage sub-struct provides several key advantages for complex data analysis.\nYou can correlate events from multiple sources while maintaining clear\nattribution of where each piece of data originated. 
This eliminates confusion\nin queries where data might come from multiple upstream sources with similar\nfield names.\n\nThe feature also enables pattern detection across different event types. You\ncan write queries that aggregate and filter across multiple event patterns\nwhile accessing specific fields from each pattern type. This supports use cases\nlike security monitoring, user behavior analysis, and system performance\ncorrelation.\n",
  "smart-features/type-expansion.md": "# Type expansion\n\nHamelin expands types when you query multiple datasets. If you write `FROM\nevents, logs`, Hamelin creates an *expanded type* that includes all fields from\nboth sources. This lets you write queries that work across datasets without\nworrying about schema differences.\n\n## How type expansion works\n\nHamelin constructs expanded types by combining field names from all source\ntypes. Fields with the same name get aligned into a single field in the\nexpanded type. This lets you write queries that work across datasets with\nconsistent field naming. As an example, consider searching across different\nauthentication logs with varying schemas:\n\n```hamelin\nFROM security_logs, audit_logs, access_logs\n| WHERE action = 'login' OR event_type = 'authentication'\n| SELECT timestamp, user_id, source_ip, action, event_type, session_id\n```\n\nEach logging system has its own schema:\n\nSecurity logs track authentication attempts with IP addresses:\n\n```hamelin\n{timestamp: timestamp, user_id: string, source_ip: string, action: string}\n```\n\nAudit logs capture detailed session information:\n\n```hamelin\n{timestamp: timestamp, user_id: string, event_type: string, session_id: string}\n```\n\nAccess logs record basic user activity:\n\n```hamelin\n{timestamp: timestamp, user_id: string, source_ip: string}\n```\n\nThe expanded result type becomes:\n\n```hamelin\n{\n timestamp: timestamp,\n user_id: string,\n source_ip: string,\n action: string,\n event_type: string,\n session_id: string\n}\n```\n\nRows from `security_logs` will have `NULL` for the `event_type` and\n`session_id` fields. Rows from `audit_logs` will have `NULL` for the\n`source_ip` and `action` fields. 
Rows from `access_logs` will have `NULL` for\nthe `action`, `event_type`, and `session_id` fields.\n\nThe result contains rows like:\n\n| timestamp | user_id | source_ip | action | event_type | session_id |\n|-----------|---------|-----------|---------|------------|------------|\n| 2024-01-15 10:30:00 | alice.smith | 192.168.1.100 | login | NULL | NULL |\n| 2024-01-15 10:31:00 | alice.smith | NULL | NULL | authentication | sess_abc123 |\n| 2024-01-15 10:32:00 | alice.smith | 192.168.1.100 | NULL | NULL | NULL |\n\nAll three datasets contribute to the same `timestamp` and `user_id` fields\nbecause they use identical field names. You can filter and select on shared\nfields without knowing which source contributed each row. This lets you write a\nsingle query to search for authentication events across all systems, even\nthough each system logs different fields. The expanded type accommodates all\npossible fields, and you can filter on any field that exists in any source.\n\n\n\n## Nested type expansion\n\nType expansion works with nested structures. Hamelin expands the type hierarchy\nto accommodate nested fields from different sources. As an example, consider\ncombining user data from different systems:\n\n```hamelin\nFROM user_profiles, account_settings\n| SELECT user.name, user.email, user.preferences\n```\n\nEach system has its own nested user structure:\n\nUser profiles contain basic identity information:\n\n```hamelin\n{user: {name: string, email: string}}\n```\n\nAccount settings store user preferences:\n\n```hamelin\n{user: {preferences: string}}\n```\n\nThe expanded result type becomes:\n\n```hamelin\n{user: {name: string, email: string, preferences: string}}\n```\n\nThis means you can access `user.name` from profile data and `user.preferences`\nfrom settings data in the same query, even though the original sources have\ndifferent nested structures.\n\n :::note\n\n Hamelin maintains stable field ordering when merging nested structures. 
Fields\n from the first source appear first, then fields from the second source are\n added in their original order. This consistent ordering means you can rely on\n the structure of expanded types being predictable across queries.\n\n :::\n\n## Array literal expansion\n\nType expansion also happens when you create array literals containing struct\ntypes with different schemas. Just like `FROM` clauses, Hamelin creates an\nexpanded type that accommodates all fields from every struct in the array. As\nan example, consider creating an array mixing user records with different\navailable fields:\n\n```hamelin\nLET mixed_users = [\n {name: 'Alice', age: 30, department: 'Engineering'},\n {name: 'Bob', email: 'bob@company.com', age: 25},\n {name: 'Carol', email: 'carol@company.com', department: 'Sales'}\n]\n```\n\nEach struct has its own schema:\n\nThe first user record has name, age, and department:\n\n```hamelin\n{name: string, age: number, department: string}\n```\n\nThe second user record has name, email, and age:\n\n```hamelin\n{name: string, email: string, age: number}\n```\n\nThe third user record has name, email, and department:\n\n```hamelin\n{name: string, email: string, department: string}\n```\n\nThe expanded array type becomes:\n\n```hamelin\n[{name: string, age: number, department: string, email: string}]\n```\n\nEach element gets `NULL` values for missing fields. The first element has\n`NULL` for `email`. The second element has `NULL` for `department`. The third\nelement has `NULL` for `age`. 
This lets you create arrays from structs with\ndifferent schemas while maintaining type consistency across all elements.\n\nYou can then query the expanded array just like any other dataset:\n\n```hamelin\nLET mixed_users = [\n {name: 'Alice', age: 30, department: 'Engineering'},\n {name: 'Bob', email: 'bob@company.com', age: 25},\n {name: 'Carol', email: 'carol@company.com', department: 'Sales'}\n]\n| UNNEST mixed_users\n```\n\nThe query works across all elements regardless of which fields were originally\npresent in each struct. Missing fields appear as `NULL` in the results, just\nlike with `FROM` clause expansion.\n\nThe results would be:\n\n| name | age | department | email |\n|------|-----|------------|-------|\n| Alice | 30 | Engineering | NULL |\n| Bob | 25 | NULL | bob@company.com |\n| Carol | NULL | Sales | carol@company.com |\n",
  "types/array.md": '# Array\n\nArrays let you work with collections of values. Hamelin arrays work much like SQL arrays, but they integrate seamlessly with structs to handle complex nested data.\n\n## Creating arrays\n\nYou create arrays using square brackets with elements separated by commas.\n\n```hamelin\nLET event_types = ["login", "logout", "purchase"]\nLET user_ids = [1001, 1002, 1003, 1004]\nLET timestamps = [ts(\'2024-01-15T10:00:00\'), ts(\'2024-01-15T10:15:00\'), ts(\'2024-01-15T10:30:00\')]\n```\n\nArrays can contain any type of value, including numbers, strings, timestamps, and even structs.\n\n## Arrays of structs\n\nArrays become especially useful when they contain structs, as Hamelin automatically handles differences between struct fields.\n\n```hamelin\nLET user_events = [\n {user_id: 1001, event: "login", timestamp: ts(\'2024-01-15T09:00:00\')},\n {user_id: 1001, event: "purchase", timestamp: ts(\'2024-01-15T09:15:00\'), amount: 49.99},\n {user_id: 1001, event: "logout", timestamp: ts(\'2024-01-15T09:30:00\')}\n]\n```\n\nNotice how the second struct has an `amount` field that the others don\'t have. 
Hamelin automatically creates a combined type that includes all fields, setting missing fields to `NULL` where needed.\n\n## Accessing array elements\n\nUse square brackets with zero-based indexing to access individual elements.\n\n```hamelin\nFROM events\n| SELECT \n first_tag = tags[0],\n second_tag = tags[1],\n last_tag = tags[-1]\n```\n\nNegative indices count from the end of the array, so `[-1]` gives you the last element.\n\n## Array operations with mixed structs\n\nWhen you combine arrays containing structs with different fields, Hamelin merges the struct types intelligently.\n\n```hamelin\nLET login_events = [\n {event_type: "login", user_id: 100, timestamp: ts(\'2024-01-15T09:00:00\')},\n {event_type: "login", user_id: 101, timestamp: ts(\'2024-01-15T09:05:00\')}\n]\n\nLET purchase_events = [\n {event_type: "purchase", user_id: 100, amount: 25.99, timestamp: ts(\'2024-01-15T09:10:00\')}\n]\n\n// Combining these creates an array with all fields: event_type, user_id, timestamp, amount\n```\n\nThe resulting combined array contains structs where each element has all the fields that appear in any struct, with `NULL` values where fields are missing.\n\n## Field ordering in combined structs\n\nWhen Hamelin combines structs with different fields, it maintains the field order from the first struct encountered, then appends any new fields in the order they first appear.\n\n```hamelin\nLET events = [\n {id: 1, type: "login", user_id: 100}, // Order: id, type, user_id\n {status: "success", id: 2, type: "logout"} // New field \'status\' gets appended\n]\n\n// Result order: id, type, user_id, status\n```\n\n## Type compatibility\n\nArrays can only contain elements that can be coerced to a common type. 
Hamelin will combine compatible types automatically, but incompatible types will cause an error.\n\n```hamelin\n// This works - numbers can be in the same array\nLET mixed_numbers = [1, 2.5, 3]\n\n// This works - structs with compatible fields\nLET compatible_structs = [\n {name: "Alice", age: 30},\n {name: "Bob", age: 25, city: "Seattle"}\n]\n\n// This would fail - structs with same field name but different types\n// {name: "Alice", count: 5} and {name: "Bob", count: "many"} \n```\n\n## Practical examples\n\nArrays work well for collecting related values and organizing repeated data.\n\n```hamelin\n// Collecting user actions over time\nFROM user_logs\n| SELECT \n user_id,\n user_session = {\n daily_events: [\n {action: "login", time: login_time},\n {action: "view_page", time: page_view_time, page: page_name},\n {action: "logout", time: logout_time}\n ],\n session_duration: logout_time - login_time\n }\n```\n\nThis creates structured output where each user\'s session contains an array of different event types, each with their own specific fields.\n\n## Working with nested arrays\n\nArrays can contain structs that themselves contain arrays, creating complex nested structures.\n\n```hamelin\nFROM dns_logs\n| SELECT dns_response = {\n query: query_name,\n answers: [\n {name: answer1_name, type: answer1_type, ttl: answer1_ttl},\n {name: answer2_name, type: answer2_type, ttl: answer2_ttl}\n ],\n response_time: query_duration\n }\n```\n\nThis organizes DNS response data where each query can have multiple answers, and each answer has its own set of fields.',
- "types/casting.md": "# Casting\n\nTo cast, use the infix operator `AS`. Hamelin uses the `AS` operator for explicit type casting. You write the value, then `AS`, then the type you want.\n\nYou basically don't ever have to cast, except from variant to explicit types, where casting is a very important part of interpreting JSON.\n\n## Basic syntax\n\nCast a value by putting `AS` between the value and the target type:\n\n```hamelin\n| LET x = 5 AS double\n```\n\nThis creates a double-precision value instead of an integer.\n\n## Why `AS` for casting?\n\nYou'll use explicit casting often, especially when declaring literals to influence type inference. We wanted something terse. Using `AS` for assignment confuses people (the order seems backwards). This frees up `AS` for casting, which reads cleanly: *treat this one thing as another type*.\n\n## How it works\n\nThe `AS` operator translates explicit cast expressions into the generated code. We often actually translate to `try_cast()` in order to make sure the query doesn't crash.\n\nHamelin delegates **implicit casting to the underlying engine** \u2014 if you assign a value to a typed column or pass it to a function that expects a different type, the engine decides whether and how to cast the value.\n\n## Common casting examples\n\n### String conversions\nConvert values to strings for display or storage:\n\n```hamelin\nFROM events\n| SELECT\n user_id_str = user_id AS string,\n timestamp_str = timestamp AS string,\n status_display = status_code AS string\n```\n\n### Numeric conversions\nConvert between different numeric types or from strings to numbers:\n\n```hamelin\nFROM logs\n| SELECT\n status_code = response_code AS integer,\n response_time = response_time_str AS double,\n user_count = total_users AS integer\n```\n\n### Boolean conversions\nConvert various values to boolean types:\n\n```hamelin\nFROM user_data\n| SELECT\n user_id,\n is_active = status_flag AS boolean,\n has_permissions = permission_level AS 
boolean\n```\n\n## Type inference with casting\n\nYou can influence type inference in variable declarations by casting literals:\n\n```hamelin\nFROM events\n| LET\n threshold = 100 AS double,\n max_retries = 5 AS integer,\n default_timeout = 30.0 AS double\n| WHERE response_time > threshold\n```\n\n## Complex type casting\n\n### Array casting\nCast arrays to specific element types:\n\n```hamelin\nFROM json_data\n| SELECT\n tags = tag_list AS array(string),\n scores = score_array AS array(double)\n```\n\n### Struct casting\nCast structured data to specific field types:\n\n```hamelin\nFROM structured_data\n| SELECT\n user_info = user_data AS {name: string, email: string},\n coordinates = location AS {x: double, y: double}\n```\n",
+ "types/casting.md": "# Casting\n\nTo cast, use the infix operator `AS`. Hamelin uses the `AS` operator for explicit type casting. You write the value, then `AS`, then the type you want.\n\nThe two most common reasons to cast are:\n\n- Casting variant to explicit types after parsing JSON\n- Casting types to string to concatenate them together\n\n## Basic syntax\n\nCast a value by putting `AS` between the value and the target type:\n\n```hamelin\n| LET x = 5 AS double\n```\n\nThis creates a double-precision value instead of an integer.\n\n## Why `AS` for casting?\n\nYou'll use explicit casting often, especially when declaring literals to influence type inference. We wanted something terse. Using `AS` for assignment confuses people (the order seems backwards). This frees up `AS` for casting, which reads cleanly: *treat this one thing as another type*.\n\n## How it works\n\nThe `AS` operator translates explicit cast expressions into the generated code. We often actually translate to `try_cast()` in order to make sure the query doesn't crash.\n\nHamelin delegates **implicit casting to the underlying engine** \u2014 if you assign a value to a typed column or pass it to a function that expects a different type, the engine decides whether and how to cast the value.\n\n## Common casting examples\n\n### String conversions\nConvert values to strings for display or storage:\n\n```hamelin\nFROM events\n| SELECT\n user_id_str = user_id AS string,\n timestamp_str = timestamp AS string,\n status_display = status_code AS string\n```\n\n### Numeric conversions\nConvert between different numeric types or from strings to numbers:\n\n```hamelin\nFROM logs\n| SELECT\n status_code = response_code AS integer,\n response_time = response_time_str AS double,\n user_count = total_users AS integer\n```\n\n### Boolean conversions\nConvert various values to boolean types:\n\n```hamelin\nFROM user_data\n| SELECT\n user_id,\n is_active = status_flag AS boolean,\n has_permissions = 
permission_level AS boolean\n```\n\n## Type inference with casting\n\nYou can influence type inference in variable declarations by casting literals:\n\n```hamelin\nFROM events\n| LET\n threshold = 100 AS double,\n max_retries = 5 AS integer,\n default_timeout = 30.0 AS double\n| WHERE response_time > threshold\n```\n\n## Complex type casting\n\n### Array casting\nCast arrays to specific element types:\n\n```hamelin\nFROM json_data\n| SELECT\n tags = tag_list AS array(string),\n scores = score_array AS array(double)\n```\n\n### Struct casting\nCast structured data to specific field types:\n\n```hamelin\nFROM structured_data\n| SELECT\n user_info = user_data AS {name: string, email: string},\n coordinates = location AS {x: double, y: double}\n```\n",
  "types/map.md": "# Map\n\nYou use maps to store key-value pairs where you have too many different keys to create separate columns. Hamelin's **map** type matches SQL's design - it's a homogeneous structure where all keys have the same type and all values have the same type.\n\n## When to use maps\n\nYou should use maps rarely. Before using a map, consider whether you could:\n- Factor subset key spaces into separate tables\n- Leave the data as a JSON string and parse only commonly needed values into columns\n\nYou use maps only when you have a high cardinality key space that can't be handled with these alternatives.\n\n## Creating maps\n\nYou construct maps using the `map()` function, which has two overloaded forms.\n\n### Map literals\n\nYou create a map by listing key-value pairs directly:\n\n```hamelin\nLET config = map(\n 'timeout': 30,\n 'retries': 3,\n 'debug': false\n)\n```\n\n### From key and value arrays\n\nYou build a map from separate arrays of keys and values:\n\n```hamelin\nLET field_names = ['user_id', 'email', 'created_at']\nLET field_values = [12345, 'user@example.com', '2024-01-15']\nLET user_data = map(field_names, field_values)\n```\n\n\n\n## Empty maps\n\nYou create an empty map using the function without arguments:\n\n```hamelin\nLET empty_config = map()\n```\n\n## Type homogeneity\n\nMaps must be homogeneous - all values must have the same type. This example will generate an error:\n\n```hamelin\n// ERROR: mixing integer and string values\nLET broken_map = map(\n 'count': 42,\n 'name': 'example'\n)\n```\n\n## Accessing map values\n\nYou use bracket notation to retrieve values by key:\n\n```hamelin\nFROM events\n| LET metadata = map('source': 'api', 'version': 2)\n| SELECT event_source = metadata['source']\n```\n\n## Map storage\n\nThe underlying engine stores each map as a pair of related columns - one for keys and one for values. 
Row positions in each column relate the keys to their values.\n\n## Performance considerations\n\nUnlike structs, which add overhead by creating a column per field, maps have minimal impact on table width (only two columns). However, you pay the cost of key values being actual data rather than column names, even though they're dictionary encoded.\n",
  "types/philosophy.md": "# Type Philosophy\n\nHamelin is a typed language, which prevents query mistakes, provides better\nerror messages, and simplifies translation definitions.\n\n## Design Philosophy\n\nYou mostly won't think about types when writing queries. You'll learn about\nthem when something doesn't type check and Hamelin gives you a clear error\nmessage. Hamelin catches type errors before they reach the SQL engine, so you\nget helpful feedback about your Hamelin code instead of confusing messages about\ngenerated SQL.\n\n### Umbrella Types\n\nHamelin groups related types under umbrella categories instead of exposing every\nSQL type variation. As an example, all integer types from `tinyint` to `bigint` become the\nsingle `integer` type for type checking. You can reason about your code more\neasily while still getting precise error messages.\n\n### Error Prevention\n\nTypes catch mistakes early and give you clear feedback. When something doesn't\ntype check, errors point to your Hamelin code, not to generated SQL that would\nconfuse you.\n\n### Transparent Mapping\n\nHamelin types map cleanly to SQL types without requiring you to think about\nstorage details. The system handles these details automatically while preserving\nthe semantic meaning of your data.\n\n### Function Overloading\n\nHamelin's type system allows it to use the same function name for operations on\ndifferent types. You write `sum()` whether you're aggregating a column or adding\nup an array - no need for separate `sum_numbers()` and `array_sum()` functions.\nThis makes it easier on the author. It also makes it possible for Hamelin's\nimplementation to define dialect-specific translations.\n\n## Type Inference\n\nHamelin figures out types from your expressions and data automatically. You\ndon't need to declare types. 
Hamelin determines them based on how you use\nvalues in operations and functions.\n\nFor example:\n- `42 + 3.14e0` results in a `double` (floating-point arithmetic)\n- `'hello' + 'world'` results in a `string` (string concatenation)\n- `timestamp > '2024-01-01'` results in a `boolean` (comparison operation)\n- `sum(revenue)` works as an aggregation function in `AGG` commands\n- `sum([1, 2, 3])` works on arrays and returns `6`\n\nType inference also powers Hamelin's function translation, ensuring that\noperations translate to the right SQL functions based on the inferred types of\ntheir arguments. This lets you focus on expressing your logic clearly while the\ntype system works behind the scenes to ensure correctness and give you helpful\nfeedback when things go wrong.\n",
- "types/primitive-types.md": "# Primitive Types\n\nHamelin provides several primitive types that serve as building blocks for more complex data structures. These types map cleanly to SQL types while providing a simplified interface for type checking and operations.\n\n## Boolean\n\nThe `boolean` type translates directly to SQL's `boolean` type.\n\nExamples: `true`, `false`\n\n## Integer\n\nThe `integer` type is an umbrella for all integer types, from `tinyint` (8 bits)\nto `bigint` (64 bits). All integers are treated the same for type checking\npurposes.\n\nExamples: `1`, `42`, `-17`, `1000000`\n\n## Double (Floating Point)\n\nThe `double` type represents floating point numbers with variable precision. Use\ndoubles for calculations where approximate values are acceptable. Double\nliterals can be written in scientific notation (e.g., `1.5e0`). The `e0` means\n\"times 10 to the power of 0\", so `1.5e0` equals `1.5` but forces the result to\nbe a double type.\n\nExamples: `3.14159e0`, `2.5e0`, `1.23e-4`, `-9.87e2`\n\n## Decimal (Fixed Point)\n\nThe `decimal` type represents exact numeric values with specified precision and\nscale, written as `decimal(precision, scale)`. Decimal literals in Hamelin\ndefault to fixed point because most business calculations require exact\narithmetic rather than floating point approximations.\n\nExamples: `100.50`, `0.075`, `999.99`\n\n## String\n\nThe `string` type covers `char`, `varchar` (of any length). String concatenation\nuses the `+` operator.\n\nExamples: `'hello world'`, `\"error message\"`, `'user@example.com'`\n\n## Binary\n\nThe `binary` type translates to `varbinary` in SQL for handling binary data.\n\nExamples: Binary data representations for file contents, encrypted data, or raw bytes.\n\n## Timestamp\n\nThe `timestamp` type is an umbrella for `date`, `timestamp`, and all their\nvariants (precision and timezone). 
Hamelin follows SQL's assumption that\ntimestamps with and without zones can be compared.\n\nExamples: `ts('2024-01-15')`, `ts('2024-01-15 14:30:00')`, `ts('2024-01-15T14:30:00Z')`\n\n## Interval\n\nThe `interval` type covers `interval day to second` for time duration\ncalculations with fixed durations. These intervals represent exact amounts of\ntime that can be directly compared and calculated.\n\nExamples: `2h`, `30min`, `5d`\n\n## Calendar Interval\n\nThe `calendar interval` type covers `interval year to month` for calendar-based\ndurations. Calendar intervals like years and months don't represent a fixed\nnumber of days because months have different lengths and years can be leap\nyears. These intervals cannot be directly compared to day-based intervals.\n\nExamples: `3mon`, `2y`, `1q`\n\n## Range\n\nThe `range` type represents spans between two values of any type. Ranges\nare created by the `..` operator. Ranges can be bounded (with both start and end\npoints) or unbounded (extending infinitely in one direction). You see them most\ncommonly in Hamelin as `range(timestamp)` or `range(interval)`, and every query\ngenerally operates under the constraints of a `range(timestamp)`.\n\nExamples: `-2hr..-1hr`, `ts('2024-01-15')..now()`, `-1hr..`, `..now()`, `1..10`, `'a'..'z'`\n\n## Rows\n\nThe `rows` type represents a number of rows and is only useful when declaring\nwindow frames as a certain number of rows rather than a time-based frame.\n\nExamples: `5r`, `10r`, `0r`\n",
+ "types/primitive-types.md": "# Primitive Types\n\nHamelin provides several primitive types that serve as building blocks for more complex data structures. These types map cleanly to SQL types while providing a simplified interface for type checking and operations.\n\n## Boolean\n\nThe `boolean` type translates directly to SQL's `boolean` type.\n\nExamples: `true`, `false`\n\n## Int\n\nThe `int` type is an umbrella for all integer types, from `tinyint` (8 bits)\nto `bigint` (64 bits). All integers are treated the same for type checking\npurposes.\n\nExamples: `1`, `42`, `-17`, `1000000`\n\n## Double (Floating Point)\n\nThe `double` type represents floating point numbers with variable precision. Use\ndoubles for calculations where approximate values are acceptable. Double\nliterals can be written in scientific notation (e.g., `1.5e0`). The `e0` means\n\"times 10 to the power of 0\", so `1.5e0` equals `1.5` but forces the result to\nbe a double type.\n\nExamples: `3.14159e0`, `2.5e0`, `1.23e-4`, `-9.87e2`\n\n## Decimal (Fixed Point)\n\nThe `decimal` type represents exact numeric values with specified precision and\nscale, written as `decimal(precision, scale)`. Decimal literals in Hamelin\ndefault to fixed point because most business calculations require exact\narithmetic rather than floating point approximations.\n\nExamples: `100.50`, `0.075`, `999.99`\n\n## String\n\nThe `string` type covers `char`, `varchar` (of any length). String concatenation\nuses the `+` operator.\n\nExamples: `'hello world'`, `\"error message\"`, `'user@example.com'`\n\n## Binary\n\nThe `binary` type translates to `varbinary` in SQL for handling binary data.\n\nExamples: Binary data representations for file contents, encrypted data, or raw bytes.\n\n## Timestamp\n\nThe `timestamp` type is an umbrella for `date`, `timestamp`, and all their\nvariants (precision and timezone). 
Hamelin follows SQL's assumption that\ntimestamps with and without zones can be compared.\n\nExamples: `ts('2024-01-15')`, `ts('2024-01-15 14:30:00')`, `ts('2024-01-15T14:30:00Z')`\n\n## Interval\n\nThe `interval` type covers `interval day to second` for time duration\ncalculations with fixed durations. These intervals represent exact amounts of\ntime that can be directly compared and calculated.\n\nExamples: `2h`, `30min`, `5d`\n\n## Calendar Interval\n\nThe `calendar interval` type covers `interval year to month` for calendar-based\ndurations. Calendar intervals like years and months don't represent a fixed\nnumber of days because months have different lengths and years can be leap\nyears. These intervals cannot be directly compared to day-based intervals.\n\nExamples: `3mon`, `2y`, `1q`\n\n## Range\n\nThe `range` type represents spans between two values of any type. Ranges\nare created by the `..` operator. Ranges can be bounded (with both start and end\npoints) or unbounded (extending infinitely in one direction). You see them most\ncommonly in Hamelin as `range(timestamp)` or `range(interval)`, and every query\ngenerally operates under the constraints of a `range(timestamp)`.\n\nExamples: `-2hr..-1hr`, `ts('2024-01-15')..now()`, `-1hr..`, `..now()`, `1..10`, `'a'..'z'`\n\n## Rows\n\nThe `rows` type represents a number of rows and is only useful when declaring\nwindow frames as a certain number of rows rather than a time-based frame.\n\nExamples: `5r`, `10r`, `0r`\n",
  "types/struct.md": '# Struct\n\nStructs let you group related fields together into a single data structure. Unlike SQL\'s `ROW` type, Hamelin structs use field names rather than position to identify each field, making them safer and easier to work with.\n\n## Creating structs\n\nYou create structs using curly braces with field names and values.\n\n```hamelin\nLET user_data = {\n user_id: 12345,\n name: "Alice Johnson",\n email: "alice@example.com"\n}\n```\n\nThis creates a struct with three fields: `user_id`, `name`, and `email`. Each field has a name and a value.\n\n## Accessing struct fields\n\nUse dot notation to access individual fields within a struct.\n\n```hamelin\nFROM user_events\n| WHERE user_info.user_id == 12345\n| SELECT\n user_name = user_info.name,\n user_email = user_info.email\n```\n\nYou can access any field by name, regardless of the order they were defined in the struct.\n\n## Field order and naming\n\nStructs maintain the order of fields as you define them, but field identity\nduring type expansion comes from the name, not the position. 
You can read more\nabout that in [type expansion](../smart-features/type-expansion.md).\n\n```hamelin\n// These two structs declare different types, but they can be aligned during expansion\nLET\n profile1 = {\n id: 1001,\n status: "active",\n created: ts(\'2024-01-15T00:00:00\')\n },\n profile2 = {\n status: "inactive",\n id: 1002,\n created: ts(\'2024-01-16T00:00:00\')\n }\n```\n\n## Nested structs\n\nStructs can contain other structs for organizing complex data.\n\n```hamelin\nFROM events\n| SELECT structured_event = {\n user: {\n id: user_id,\n profile: {\n name: user_name,\n email: email_address\n }\n },\n event: {\n type: event_type,\n timestamp: event_time,\n source: data_source\n }\n }\n```\n\nAccess nested fields by chaining field names with dots:\n\n```hamelin\n| SELECT\n user_name = structured_event.user.profile.name,\n event_time = structured_event.event.timestamp\n```\n\n## Practical examples\n\nStructs work well for organizing related information that belongs together:\n\n```hamelin\n// HTTP request logging\nFROM access_logs\n| SELECT request_data = {\n request: {\n method: http_method,\n path: url_path,\n status: response_code\n },\n timing: {\n start_time: request_start,\n end_time: request_end,\n duration_ms: response_time\n },\n client: {\n ip: client_ip,\n user_agent: user_agent\n }\n }\n```\n\nThis creates clean, organized output where related fields are grouped logically rather than scattered across many columns.\n',
  "types/variant.md": "# Variant\n\nYou use Hamelin's **variant** type to work with JSON and other semi-structured data. Hamelin adopts the Variant trend for representing the JSON object model, making JSON feel native and easy to work with.\n\n## Parsing JSON into variant\n\nYou parse a JSON string into a variant using the `parse_json()` function:\n\n```hamelin\nFROM api_logs\n| LET event_data = parse_json(json_payload)\n| SELECT event_data\n```\n\n## Navigating variant data\n\nYou navigate variant substructure safely and ergonomically using dots and square brackets, just like with structs and arrays:\n\n```hamelin\nFROM car_sales\n| LET json = parse_json(src)\n| SELECT \n sale_date = json.date,\n salesperson_name = json.salesperson.name,\n customer_name = json.customer[0].name\n```\n\n## Accessing nested fields\n\nYou can access deeply nested fields using the same dot and bracket notation:\n\n```hamelin\nFROM events\n| LET data = parse_json(event_json)\n| SELECT \n user_email = data.user.profile.email,\n first_item_price = data.transaction.items[0].price\n```\n\n## Safe type conversion\n\nVariants cast safely to other Hamelin types. Individual conversion failures become `NULL` instead of crashing your query:\n\n```hamelin\nFROM logs\n| LET parsed = parse_json(log_data)\n| SELECT \n log_time = parsed.timestamp AS timestamp,\n user_id = parsed.user_id AS string,\n event_count = parsed.count AS integer\n```\n\n## Casting to structured types\n\nYou can cast variants to maps, arrays, and structs. 
These casts are safe and null on failure:\n\n```hamelin\nFROM json_data\n| LET parsed = parse_json(raw_json)\n| SELECT \n user_info = parsed AS {name: string, age: integer},\n tag_list = parsed.tags AS [string]\n```\n\n## Creating variant objects\n\nYou create variant objects by casting structs to variant:\n\n```hamelin\nLET user_struct = {name: 'Alice', age: 30}\n| LET user_variant = user_struct AS variant\n| SELECT user_variant\n```\n\n## Creating variant arrays\n\nYou create variant arrays by casting arrays to variant:\n\n```hamelin\nLET numbers = [1, 2, 3, 4, 5]\n| LET variant_list = numbers AS variant\n| SELECT variant_list\n```\n\n## Mixed type handling\n\nVariant handles mixed types within the same structure:\n\n```hamelin\nFROM api_responses\n| LET response = parse_json(response_body)\n| SELECT \n record_id = response.data.id AS string,\n is_active = response.data.active AS boolean,\n user_score = response.data.score AS double\n```\n\n## Database system compatibility\n\nHamelin adapts to your database system's JSON capabilities:\n\n- **Full VARIANT support** (Snowflake, Databricks): Hamelin uses native variant storage and operations\n- **JSON support** (BigQuery, Postgres): Hamelin treats JSON as the variant format \n- **ANSI JSON only**: Hamelin provides parsing and access functions but no efficient storage\n\n## Working with arrays\n\nYou access array elements using zero-based indexing:\n\n```hamelin\nFROM events\n| LET data = parse_json(event_data)\n| SELECT \n first_item = data.items[0].name,\n last_item = data.items[-1].name\n```\n\n## Handling missing fields\n\nVariant navigation is safe - accessing missing fields returns `NULL`:\n\n```hamelin\nFROM logs\n| LET parsed = parse_json(log_entry)\n| SELECT \n always_present = parsed.required_field,\n might_be_null = parsed.optional_field\n```\n\n---\n\n*Variant types make JSON feel native in Hamelin, providing safe navigation and conversion without requiring upfront schema knowledge.*"
  };
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@hamelin.sh/documentation",
- "version": "0.2.4",
+ "version": "0.2.8",
  "sideEffects": false,
  "license": "UNLICENSED",
  "type": "module",