@chrismo/superkit 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -4,17 +4,48 @@ Documentation, tutorials, and recipes for [SuperDB](https://superdb.org/).
4
4
 
5
5
  **Website:** [chrismo.github.io/superkit](https://chrismo.github.io/superkit/)
6
6
 
7
+ ## Install
8
+
9
+ ```bash
10
+ npm install -g @chrismo/superkit
11
+ ```
12
+
13
+ ## CLI Tools
14
+
15
+ - `skdoc` — Browse documentation (expert guide, upgrade guide, tutorials)
16
+ - `skgrok` — Search grok patterns
17
+ - `skops` — Browse recipe functions and operators
18
+
19
+ Also available via `npx skdoc`, `npx skgrok`, `npx skops`.
20
+
7
21
  ## Content
8
22
 
9
23
  - **Expert Guide** — Comprehensive SuperSQL syntax reference
10
24
  - **Upgrade Guide** — Migration guide from zq to SuperDB
11
25
  - **Tutorials** — Step-by-step guides for common patterns
26
+ - **Recipes** — Reusable SuperSQL functions and operators
27
+ - **Grok Patterns** — All SuperDB grok patterns
28
+
29
+ ## Library
30
+
31
+ The [SuperDB MCP server](https://github.com/chrismo/superdb-mcp) depends on
32
+ this package for its documentation tools. The TypeScript API is available for
33
+ other integrations:
34
+
35
+ ```typescript
36
+ import { superHelp, superRecipes, superGrokPatterns } from '@chrismo/superkit';
37
+ ```
12
38
 
13
- ## How it works
39
+ ## Upgrading from pre-npm SuperKit
14
40
 
15
- Content is authored in [superdb-mcp](https://github.com/chrismo/superdb-mcp) and auto-synced here via GitHub Action. The site is built with Jekyll using the [Just the Docs](https://just-the-docs.com/) theme and deployed via GitHub Pages.
41
+ If you previously installed SuperKit via the old `install.sh` script, remove
42
+ the legacy files:
16
43
 
17
- `changelog.jsup` is kept for historical reference from the original SuperKit project.
44
+ ```bash
45
+ rm -f ~/.local/bin/sk ~/.local/bin/skdoc ~/.local/bin/skgrok \
46
+ ~/.local/bin/skgrok.jsup ~/.local/bin/skops \
47
+ ~/.local/bin/skops.jsup ~/.local/bin/skops.spq
48
+ ```
18
49
 
19
50
  ## License
20
51
 
@@ -64,3 +64,92 @@ op sk_array_flatten: (
64
64
  | parse_sup(f'[{this}]')
65
65
  )
66
66
  ```
67
+
68
+ ---
69
+
70
+ ## sk_array_append
71
+
72
+ Appends a value to the end of an array.
73
+
74
+ **Type:** function
75
+
76
+ | Argument | Description |
77
+ |----------|-------------|
78
+ | `arr` | The array to append to. |
79
+ | `val` | The value to append. |
80
+
81
+ ```supersql
82
+ sk_array_append([1,2,3], 4)
83
+ -- => [1,2,3,4]
84
+
85
+ sk_array_append([], "a")
86
+ -- => ["a"]
87
+ ```
88
+
89
+ **Implementation:**
90
+
91
+ ```supersql
92
+ fn sk_array_append(arr, val): ([...arr, val])
93
+ ```
94
+
95
+ ---
96
+
97
+ ## sk_array_remove
98
+
99
+ Removes all occurrences of a value from an array.
100
+
101
+ **Type:** operator
102
+
103
+ | Argument | Description |
104
+ |----------|-------------|
105
+ | `val` | The value to remove. |
106
+
107
+ ```supersql
108
+ [1,2,3,2,1] | sk_array_remove 2
109
+ -- => [1,3,1]
110
+
111
+ ["a","b","c"] | sk_array_remove "b"
112
+ -- => ["a","c"]
113
+ ```
114
+
115
+ **Implementation:**
116
+
117
+ ```supersql
118
+ op sk_array_remove val: (
119
+ [unnest this | where this != val]
120
+ )
121
+ ```
122
+
123
+ ---
124
+
125
+ ## sk_deep_flatten
126
+
127
+ Recursively flattens nested arrays into a single flat array.
128
+
129
+ Unlike `sk_array_flatten` which only flattens one level, `sk_deep_flatten` recursively processes all nested arrays regardless of depth.
130
+
131
+ **Type:** operator
132
+
133
+ ```supersql
134
+ [[1,[2,3]],[4,[5,[6]]]] | sk_deep_flatten
135
+ -- => [1,2,3,4,5,6]
136
+
137
+ [1,[2],[[3]]] | sk_deep_flatten
138
+ -- => [1,2,3]
139
+ ```
140
+
141
+ **Implementation:**
142
+
143
+ ```supersql
144
+ op sk_deep_flatten: (
145
+ fn _df(v): (
146
+ case kind(v)
147
+ when "array" then (
148
+ [unnest [unnest v | _df(this)] | unnest this]
149
+ )
150
+ else [v]
151
+ end
152
+ )
153
+ _df(this)
154
+ )
155
+ ```
@@ -29,3 +29,52 @@ op sk_array_flatten: (
29
29
  | replace(this, ']','')
30
30
  | parse_sup(f'[{this}]')
31
31
  )
32
+
33
+ fn skdoc_array_append(): (
34
+ cast(
35
+ {name:"sk_array_append",
36
+ type:"func",
37
+ desc:"Appends a value to the end of an array.",
38
+ args:[{name:"arr",desc:"The array to append to."}
39
+ {name:"val",desc:"The value to append."}],
40
+ examples:[{i:"sk_array_append([1,2,3], 4)",o:"[1,2,3,4]"}
41
+ {i:"sk_array_append([], \"a\")",o:"[\"a\"]"}]}, <skdoc>)
42
+ )
43
+
44
+ fn sk_array_append(arr, val): ([...arr, val])
45
+
46
+ fn skdoc_array_remove(): (
47
+ cast(
48
+ {name:"sk_array_remove",
49
+ type:"op",
50
+ desc:"Removes all occurrences of a value from an array.",
51
+ args:[{name:"val",desc:"The value to remove."}],
52
+ examples:[{i:"[1,2,3,2,1] | sk_array_remove 2",o:"[1,3,1]"}
53
+ {i:"[\"a\",\"b\",\"c\"] | sk_array_remove \"b\"",o:"[\"a\",\"c\"]"}]}, <skdoc>)
54
+ )
55
+
56
+ op sk_array_remove val: (
57
+ [unnest this | where this != val]
58
+ )
59
+
60
+ fn skdoc_array_deep_flatten(): (
61
+ cast(
62
+ {name:"sk_deep_flatten",
63
+ type:"op",
64
+ desc:"Recursively flattens nested arrays into a single flat array.",
65
+ args:[],
66
+ examples:[{i:"[[1,[2,3]],[4,[5,[6]]]] | sk_deep_flatten",o:"[1,2,3,4,5,6]"}
67
+ {i:"[1,[2],[[3]]] | sk_deep_flatten",o:"[1,2,3]"}]}, <skdoc>)
68
+ )
69
+
70
+ op sk_deep_flatten: (
71
+ fn _df(v): (
72
+ case kind(v)
73
+ when "array" then (
74
+ [unnest [unnest v | _df(this)] | unnest this]
75
+ )
76
+ else [v]
77
+ end
78
+ )
79
+ _df(this)
80
+ )
@@ -49,3 +49,43 @@ fn sk_format_bytes(value): (
49
49
  (value == 0) ? "0 B" : _sk_format_nonzero_bytes(value)
50
50
  )
51
51
  ```
52
+
53
+ ---
54
+
55
+ ## sk_format_epoch
56
+
57
+ Converts Unix epoch milliseconds to a time value with timezone offset applied.
58
+
59
+ **Type:** function
60
+
61
+ | Argument | Description |
62
+ |----------|-------------|
63
+ | `epoch_ms` | Milliseconds since 1970-01-01 00:00:00 UTC. |
64
+ | `tz_offset` | Timezone offset string like '-0500' or '+0530'. |
65
+
66
+ ```supersql
67
+ sk_format_epoch(0, '+0000')
68
+ -- => 1970-01-01T00:00:00Z
69
+
70
+ sk_format_epoch(1704067200000, '-0500')
71
+ -- => 2023-12-31T19:00:00Z
72
+ ```
73
+
74
+ **Note:** SuperDB has no timezone-aware time type. The returned time value
75
+ displays as UTC but represents the local time with the offset already applied.
76
+ For display purposes only — do not use the result in further time arithmetic
77
+ that assumes UTC.
78
+
79
+ **Implementation:**
80
+
81
+ ```supersql
82
+ fn sk_format_epoch(epoch_ms, tz_offset): (
83
+ {
84
+ sign: tz_offset[0:1],
85
+ hours: tz_offset[1:3]::int64,
86
+ mins: tz_offset[3:5]::int64,
87
+ base_time: (epoch_ms * 1000000)::time
88
+ }
89
+ | this.base_time + f'{this.sign == "-" ? "-" : ""}{this.hours}h{this.mins > 0 ? f"{this.mins}m" : ""}'::duration
90
+ )
91
+ ```
@@ -22,3 +22,24 @@ fn skdoc_format_bytes(): (
22
22
  fn sk_format_bytes(value): (
23
23
  (value == 0) ? "0 B" : _sk_format_nonzero_bytes(value)
24
24
  )
25
+
26
+ fn skdoc_format_epoch(): (
27
+ cast(
28
+ {name:"sk_format_epoch",
29
+ type:"func",
30
+ desc:"Converts Unix epoch milliseconds to a time value with timezone offset applied.",
31
+ args:[{name:"epoch_ms",desc:"Milliseconds since 1970-01-01 00:00:00 UTC."}
32
+ {name:"tz_offset",desc:"Timezone offset string like '-0500' or '+0530'."}],
33
+ examples:[{i:"sk_format_epoch(0, '+0000')",o:"1970-01-01T00:00:00Z"}
34
+ {i:"sk_format_epoch(1704067200000, '-0500')",o:"2023-12-31T19:00:00Z"}]}, <skdoc>)
35
+ )
36
+
37
+ fn sk_format_epoch(epoch_ms, tz_offset): (
38
+ {
39
+ sign: tz_offset[0:1],
40
+ hours: tz_offset[1:3]::int64,
41
+ mins: tz_offset[3:5]::int64,
42
+ base_time: (epoch_ms * 1000000)::time
43
+ }
44
+ | this.base_time + f'{this.sign == "-" ? "-" : ""}{this.hours}h{this.mins > 0 ? f"{this.mins}m" : ""}'::duration
45
+ )
@@ -99,3 +99,45 @@ fn sk_max(a, b): (
99
99
  a > b ? a : b
100
100
  )
101
101
  ```
102
+
103
+ ---
104
+
105
+ ## sk_last_day_of_month
106
+
107
+ Returns the last day number (28-31) of the given month and year. Correctly handles leap years.
108
+
109
+ **Type:** function
110
+
111
+ | Argument | Description |
112
+ |----------|-------------|
113
+ | `year` | The year (e.g. 2024). |
114
+ | `month` | The month number (1-12). |
115
+
116
+ ```supersql
117
+ sk_last_day_of_month(2024, 2)
118
+ -- => 29
119
+
120
+ sk_last_day_of_month(2023, 2)
121
+ -- => 28
122
+
123
+ sk_last_day_of_month(2024, 12)
124
+ -- => 31
125
+
126
+ sk_last_day_of_month(2024, 4)
127
+ -- => 30
128
+ ```
129
+
130
+ Works by constructing the first day of the next month as a time value, subtracting one day, then extracting the day number from the resulting date string.
131
+
132
+ **Implementation:**
133
+
134
+ ```supersql
135
+ fn sk_last_day_of_month(year, month): (
136
+ -- Returns the last day number of the given month
137
+ {
138
+ nm: month == 12 ? 1 : month + 1,
139
+ ny: month == 12 ? year + 1 : year
140
+ }
141
+ | ((f'{this.ny}-{this.nm > 9 ? "" : "0"}{this.nm}-01T00:00:00Z'::time - 1d)::string)[8:10]::uint8
142
+ )
143
+ ```
@@ -51,3 +51,25 @@ fn skdoc_max(): (
51
51
  fn sk_max(a, b): (
52
52
  a > b ? a : b
53
53
  )
54
+
55
+ fn skdoc_last_day_of_month(): (
56
+ cast(
57
+ {name:"sk_last_day_of_month",
58
+ type:"func",
59
+ desc:"Returns the last day number (28-31) of the given month and year.",
60
+ args:[{name:"year",desc:"The year (e.g. 2024)."}
61
+ {name:"month",desc:"The month number (1-12)."}],
62
+ examples:[{i:"sk_last_day_of_month(2024, 2)",o:"29"}
63
+ {i:"sk_last_day_of_month(2023, 2)",o:"28"}
64
+ {i:"sk_last_day_of_month(2024, 12)",o:"31"}
65
+ {i:"sk_last_day_of_month(2024, 4)",o:"30"}]}, <skdoc>)
66
+ )
67
+
68
+ fn sk_last_day_of_month(year, month): (
69
+ -- Returns the last day number of the given month
70
+ {
71
+ nm: month == 12 ? 1 : month + 1,
72
+ ny: month == 12 ? year + 1 : year
73
+ }
74
+ | ((f'{this.ny}-{this.nm > 9 ? "" : "0"}{this.nm}-01T00:00:00Z'::time - 1d)::string)[8:10]::uint8
75
+ )
@@ -142,6 +142,79 @@ fn sk_pad_right(s, pad_char, target_length): (
142
142
 
143
143
  ---
144
144
 
145
+ ## sk_left
146
+
147
+ Returns the first n characters of a string.
148
+
149
+ **Type:** function
150
+
151
+ | Argument | Description |
152
+ |----------|-------------|
153
+ | `s` | The string. |
154
+ | `n` | Number of characters from the left. |
155
+
156
+ ```supersql
157
+ sk_left('hello', 3)
158
+ -- => 'hel'
159
+ ```
160
+
161
+ **Implementation:**
162
+
163
+ ```supersql
164
+ fn sk_left(s, n): (sk_slice(s, 0, sk_clamp(n, 0, len(s))))
165
+ ```
166
+
167
+ ---
168
+
169
+ ## sk_right
170
+
171
+ Returns the last n characters of a string.
172
+
173
+ **Type:** function
174
+
175
+ | Argument | Description |
176
+ |----------|-------------|
177
+ | `s` | The string. |
178
+ | `n` | Number of characters from the right. |
179
+
180
+ ```supersql
181
+ sk_right('hello', 3)
182
+ -- => 'llo'
183
+ ```
184
+
185
+ **Implementation:**
186
+
187
+ ```supersql
188
+ fn sk_right(s, n): (sk_slice(s, len(s) - sk_clamp(n, 0, len(s)), len(s)))
189
+ ```
190
+
191
+ ---
192
+
193
+ ## sk_mid
194
+
195
+ Returns n characters from a string starting at a given position.
196
+
197
+ **Type:** function
198
+
199
+ | Argument | Description |
200
+ |----------|-------------|
201
+ | `s` | The string. |
202
+ | `start` | Starting index, zero-based. |
203
+ | `n` | Number of characters to return. |
204
+
205
+ ```supersql
206
+ sk_mid('hello world', 6, 5)
207
+ -- => 'world'
208
+ ```
209
+
210
+ **Implementation:**
211
+
212
+ ```supersql
213
+ fn sk_mid(s, start, n): (sk_slice(s, sk_clamp(start, 0, len(s)), sk_clamp(start, 0, len(s)) + sk_clamp(n, 0, len(s))))
214
+ ```
215
+
216
+ ---
217
+
145
218
  ## sk_urldecode
146
219
 
147
220
  URL decoder for SuperDB. Splits on `%`, decodes each hex-encoded segment, and joins back together.
@@ -77,6 +77,43 @@ fn sk_pad_right(s, pad_char, target_length): (
77
77
  len(s) < target_length ? sk_pad_right(f'{s}{pad_char}', pad_char, target_length) : s
78
78
  )
79
79
 
80
+ fn skdoc_left(): (
81
+ cast(
82
+ {name:"sk_left",
83
+ type:"func",
84
+ desc:"Returns the first n characters of a string.",
85
+ args:[{name:"s",desc:"The string."}
86
+ {name:"n",desc:"Number of characters from the left."}],
87
+ examples:[{i:"sk_left('hello', 3)",o:"'hel'"}]}, <skdoc>)
88
+ )
89
+
90
+ fn sk_left(s, n): (sk_slice(s, 0, sk_clamp(n, 0, len(s))))
91
+
92
+ fn skdoc_right(): (
93
+ cast(
94
+ {name:"sk_right",
95
+ type:"func",
96
+ desc:"Returns the last n characters of a string.",
97
+ args:[{name:"s",desc:"The string."}
98
+ {name:"n",desc:"Number of characters from the right."}],
99
+ examples:[{i:"sk_right('hello', 3)",o:"'llo'"}]}, <skdoc>)
100
+ )
101
+
102
+ fn sk_right(s, n): (sk_slice(s, len(s) - sk_clamp(n, 0, len(s)), len(s)))
103
+
104
+ fn skdoc_mid(): (
105
+ cast(
106
+ {name:"sk_mid",
107
+ type:"func",
108
+ desc:"Returns n characters from a string starting at a given position.",
109
+ args:[{name:"s",desc:"The string."}
110
+ {name:"start",desc:"Starting index, zero-based."}
111
+ {name:"n",desc:"Number of characters to return."}],
112
+ examples:[{i:"sk_mid('hello world', 6, 5)",o:"'world'"}]}, <skdoc>)
113
+ )
114
+
115
+ fn sk_mid(s, start, n): (sk_slice(s, sk_clamp(start, 0, len(s)), sk_clamp(start, 0, len(s)) + sk_clamp(n, 0, len(s))))
116
+
80
117
  -- TODO: skdoc_urldecode
81
118
 
82
119
  -- URL Decoder for SuperDB
@@ -383,7 +383,7 @@ echo '{id:1,person_id:1,exercise:"tango"}
383
383
  {id:4,person_id:2,exercise:"cooking"}' > exercises.sup
384
384
 
385
385
  # joins supported: left, right, inner, full outer, anti
386
- super -c "
386
+ super -s -c "
387
387
  select * from people.json people
388
388
  join exercises.sup exercises
389
389
  on people.id=exercises.person_id
@@ -391,7 +391,7 @@ super -c "
391
391
 
392
392
  # where ... is null not supported yet
393
393
  # unless coalesce used in the select clause
394
- super -c "
394
+ super -s -c "
395
395
  select * from people.json people
396
396
  left join exercises.sup exercises
397
397
  on people.id=exercises.person_id
@@ -423,6 +423,159 @@ _current_tasks "| where done==true" | super -s -c "count()" -
423
423
  _current_tasks | super -s -c "where done==true | count()" -
424
424
  ```
425
425
 
426
+ ## Advanced Patterns
427
+
428
+ ### Finding Syntax Errors in .sup Files
429
+
430
+ Read each line as a raw string and test it individually with `parse_sup()`.
431
+ The first error reported is the real problem:
432
+
433
+ ```
434
+ super -i line -j -c '
435
+ values {raw: this, parsed: parse_sup(this)}
436
+ | where is_error(parsed)
437
+ | cut raw
438
+ ' broken-file.sup
439
+ ```
440
+
441
+ ### Crosstab Pattern
442
+
443
+ SQL crosstab using CASE/WHEN to pivot rows into columns:
444
+
445
+ ```sql
446
+ SELECT
447
+ coalesce(category, 'Total') as _,
448
+ SUM(CASE WHEN win = true THEN count ELSE 0 END) AS win,
449
+ SUM(CASE WHEN win = false THEN count ELSE 0 END) AS loss
450
+ GROUP BY _
451
+ ```
452
+
453
+ ### Fork and Join for Inline Data
454
+
455
+ Use `fork` with inline structured data to split and rejoin streams:
456
+
457
+ ```
458
+ values {
459
+ data:[{id:1,s:'a'},{id:2,s:'b'},{id:3,s:'c'}],
460
+ match:[{id:2},{id:3}]
461
+ }
462
+ | fork
463
+ ( unnest data )
464
+ ( unnest match )
465
+ | inner join on left.id=right.id
466
+ ```
467
+
468
+ See the [Subqueries tutorial](tutorials/subqueries) for fork-and-join as a
469
+ streamable alternative to `collect`-based correlated subqueries, and
470
+ [Moar Subqueries](tutorials/moar_subqueries) for the collect-first "go up
471
+ before drilling down" pattern.
472
+
473
+ ### Aggregate Filters
474
+
475
+ Use `filter (expr)` on aggregate functions for conditional aggregation.
476
+ Non-matches produce a count of 0 instead of empty output:
477
+
478
+ ```
479
+ -- Count of 0 for non-matches:
480
+ values 1, 2 | count() filter (this == 3)
481
+ -- => 0
482
+
483
+ -- Conditional collection:
484
+ unnest [{dir:"out",v:"90"},{dir:"in",v:"561"},{dir:"in",v:"306"}]
485
+ | in_vals:=collect(v) filter (dir=="in"),
486
+ out_vals:=collect(v) filter (dir=="out")
487
+ ```
488
+
489
+ ### Record vs Map Types
490
+
491
+ Key distinction between Records and Maps:
492
+
493
+ - `{a:1}` is a Record (unquoted keys)
494
+ - `|{"a":1}|` is a Map (literal primitive keys, pipe delimiters)
495
+ - `put` and spread only work with Records, not Maps or Unions
496
+ - Map keys must be literal primitive types
497
+ - `collect_map` requires a Map Expression argument — use `|{key:val}|` syntax,
498
+ not a Record expression
499
+
500
+ ### Converting Map to Record
501
+
502
+ Maps and Records are separate types. To convert a `collect_map` result to a
503
+ Record for use with `put`/spread, strip the pipe delimiters and re-parse:
504
+
505
+ ```
506
+ -- collect_map produces a Map:
507
+ values {k:"a",v:1}, {k:"b",v:2} | collect_map(|{k:v}|)
508
+ -- => |{"a":1,"b":2}|
509
+
510
+ -- Convert Map to Record:
511
+ values {k:"a",v:1}, {k:"b",v:2}
512
+ | collect_map(|{k:v}|)
513
+ | this::string | this[1:-1] | parse_sup(this)
514
+ -- => {a:1,b:2}
515
+ ```
516
+
517
+ ### Which Builtins Need Explicit `this`
518
+
519
+ Most aggregate functions need `this` passed in explicitly:
520
+
521
+ - **Implicit** (no argument): `count()`
522
+ - **Explicit**: `and(this)`, `any(this)`, `avg(this)`, `collect(this)`,
523
+ `dcount(this)`, `fuse(this)`, `max(this)`, `min(this)`, `or(this)`,
524
+ `sum(this)`, `union(this)`
525
+ - **Oddball**: `collect_map(|{key:val}|)` — needs a Map Expression, not `this`
526
+
527
+ Functions that changed in 0.1.0 to require explicit `this`:
528
+ - `grep('pattern', this)` (was `grep(/pattern/)`)
529
+ - `is(this, <type>)` (was `is(<type>)`)
530
+ - `nest_dotted(this)` (was `nest_dotted()`)
531
+
532
+ ### Expressions Inside `put`
533
+
534
+ Pipelines work inside `put` expressions — no lateral subquery hack needed:
535
+
536
+ ```
537
+ values {arn:"arn:aws:kms:us-east-1:000000000000:key/abc123"}
538
+ | put region:=(split(this.arn, ':') | this[3])
539
+ -- => {arn:"arn:aws:kms:us-east-1:000000000000:key/abc123",region:"us-east-1"}
540
+ ```
541
+
542
+ Direct indexing also works: `put region:=split(this.arn, ':')[3]`
543
+
544
+ ### String Slicing
545
+
546
+ SuperDB uses exclusive end index (like Python):
547
+
548
+ ```
549
+ "aoeusnth"[0:-1]
550
+ -- => "aoeusnt" (last char excluded)
551
+ "aoeusnth"[0:]
552
+ -- => "aoeusnth" (full string)
553
+ ```
554
+
555
+ ### search vs where for Regex
556
+
557
+ - `search 'pattern'` — search all fields
558
+ - `where grep('pattern', this)` — filter with regex in where clause
559
+ - No `=~` operator exists in SuperDB
560
+
561
+ ### Deep Walk (Recursive Transformation)
562
+
563
+ A recursive function that walks nested structures, applying a
564
+ transformation at every leaf:
565
+
566
+ ```
567
+ fn walk(v):
568
+ case kind(v)
569
+ when "array" then
570
+ [unnest v | walk(this)]
571
+ when "record" then
572
+ unflatten([unnest flatten(v) | {key,value:walk(value)}])
573
+ else v+1
574
+ end
575
+ values walk([{x:[1,2]},{y:3}])
576
+ -- => [{x:[2,3]},{y:4}]
577
+ ```
578
+
426
579
  ## Advanced SuperDB Features
427
580
 
428
581
  ### Type System
@@ -548,8 +701,12 @@ super -s -c "{a:{c:1}, b:{d:'foo'}} | {...a, ...b}" # => {c:1, d:'foo'}
548
701
 
549
702
  - Check for a trailing `-` without stdin
550
703
  - Check for no trailing `-` with stdin (sometimes you get output anyway but this is usually wrong!)
704
+ - Watch for trailing `-` inside bash loops — `while IFS= read -r line` provides
705
+ stdin, so a `super -c "..." -` inside the loop will consume it instead of the
706
+ pipe. Drop the `-` if the command doesn't need stdin input.
551
707
  - Verify field names match exactly (case-sensitive)
552
708
  - Check type mismatches in comparisons
709
+ - `collect()` on empty stream returns `null` (not empty) — guard with `coalesce(result, [])`
553
710
 
554
711
  2. **Type Errors**
555
712
 
@@ -654,13 +811,13 @@ Converting numeric values (like milliseconds) to duration types uses f-string in
654
811
 
655
812
  ```bash
656
813
  # Convert milliseconds to duration
657
- super -c "values 993958 | values f'{this}ms'::duration"
814
+ super -s -c "values 993958 | values f'{this}ms'::duration"
658
815
 
659
816
  # Convert to seconds first, then duration
660
- super -c "values 993958 / 1000 | values f'{this}s'::duration"
817
+ super -s -c "values 993958 / 1000 | values f'{this}s'::duration"
661
818
 
662
819
  # Round duration to buckets (e.g., 15 minute chunks)
663
- super -c "values 993958 / 1000 | values f'{this}s'::duration | bucket(this, 15m)"
820
+ super -s -c "values 993958 / 1000 | values f'{this}s'::duration | bucket(this, 15m)"
664
821
  ```
665
822
 
666
823
  **Key points:**
@@ -680,16 +837,16 @@ SuperDB uses `::type` syntax for type conversions (not function calls):
680
837
 
681
838
  ```bash
682
839
  # Integer conversion (truncates decimals)
683
- super -c "values 1234.56::int64" # outputs: 1234
840
+ super -s -c "values 1234.56::int64" # outputs: 1234
684
841
 
685
842
  # String conversion
686
- super -c "values 42::string" # outputs: "42"
843
+ super -s -c "values 42::string" # outputs: "42"
687
844
 
688
845
  # Float conversion
689
- super -c "values 100::float64" # outputs: 100.0
846
+ super -s -c "values 100::float64" # outputs: 100.0
690
847
 
691
848
  # Chaining casts
692
- super -c "values (123.45::int64)::string" # outputs: "123"
849
+ super -s -c "values (123.45::int64)::string" # outputs: "123"
693
850
  ```
694
851
 
695
852
  **Important:**
@@ -729,13 +886,13 @@ SuperDB has a `round()` function that rounds to the nearest integer:
729
886
 
730
887
  ```bash
731
888
  # Round to nearest integer (single argument only)
732
- super -c "values round(3.14)" # outputs: 3.0
733
- super -c "values round(-1.5)" # outputs: -2.0
734
- super -c "values round(1234.567)" # outputs: 1235.0
889
+ super -s -c "values round(3.14)" # outputs: 3.0
890
+ super -s -c "values round(-1.5)" # outputs: -2.0
891
+ super -s -c "values round(1234.567)" # outputs: 1235.0
735
892
 
736
893
  # For rounding to specific decimal places, use the multiply-cast-divide pattern
737
- super -c "values ((1234.567 * 100)::int64 / 100.0)" # outputs: 1234.56 (2 decimals)
738
- super -c "values ((1234.567 * 10)::int64 / 10.0)" # outputs: 1234.5 (1 decimal)
894
+ super -s -c "values ((1234.567 * 100)::int64 / 100.0)" # outputs: 1234.56 (2 decimals)
895
+ super -s -c "values ((1234.567 * 10)::int64 / 10.0)" # outputs: 1234.5 (1 decimal)
739
896
  ```
740
897
 
741
898
  **Key points:**
@@ -11,7 +11,7 @@ last_updated: "2026-02-17"
11
11
 
12
12
  # Getting Bash Text into SuperDB
13
13
 
14
- The companion to [sup_to_bash](sup_to_bash.md), this covers the reverse: safely
14
+ The companion to [sup_to_bash]({% link docs/tutorials/sup_to_bash.md %}), this covers the reverse: safely
15
15
  getting raw text from Bash into SuperDB.
16
16
 
17
17
  ## The Problem
@@ -0,0 +1,73 @@
1
+ ---
2
+ title: "Cloudflare Log Durations"
3
+ name: cloudflare-durations
4
+ description: "Parsing Cloudflare edge timestamps, computing request durations, and bucketing for analysis."
5
+ layout: default
6
+ nav_order: 12
7
+ parent: Tutorials
8
+ superdb_version: "0.3.0"
9
+ last_updated: "2026-04-05"
10
+ ---
11
+
12
+ # Cloudflare Log Durations
13
+
14
+ *Narrative tutorial — examples reference external Cloudflare log data.*
15
+
16
+ Many Cloudflare log entries include edge timestamps like `@EdgeStartTimestamp`
17
+ and `@EdgeEndTimestamp`. Computing request durations from these is a common
18
+ analysis task — and a good example of SuperDB's string cleaning, time parsing,
19
+ and bucketing capabilities.
20
+
21
+ ## The Problem
22
+
23
+ Cloudflare timestamps often arrive with extra escaping:
24
+
25
+ ```
26
+ "@EdgeStartTimestamp":"\"2025-04-22T18:16:46Z\""
27
+ ```
28
+
29
+ We need to strip the escaped quotes, parse as time values, compute durations,
30
+ and then analyze the distribution.
31
+
32
+ ## Step 1: Clean and Compute Durations
33
+
34
+ ```bash
35
+ super -s -c "
36
+ drop Message, Service, Env
37
+ | start := regexp_replace(this['@EdgeStartTimestamp'], '[^A-Z0-9-:]', ''),
38
+ end := regexp_replace(this['@EdgeEndTimestamp'], '[^A-Z0-9-:]', '')
39
+ | start := start::time, end := end::time
40
+ | dur := end - start
41
+ | cut start, end, dur
42
+ " cloudflare-extract.csv > cf-durations.sup
43
+ ```
44
+
45
+ Key techniques:
46
+ - `regexp_replace` strips everything except alphanumerics, hyphens, and colons
47
+ - `::time` casts the cleaned strings to time values
48
+ - Duration is simply `end - start` — SuperDB handles time arithmetic natively
49
+
50
+ ## Step 2: Bucket and Analyze
51
+
52
+ ```bash
53
+ super -s -c "
54
+ log_count := collect(this) by bucket(dur, 3s)
55
+ | log_count := len(log_count)
56
+ | sort bucket
57
+ " cf-durations.sup
58
+ ```
59
+
60
+ This groups requests into 3-second duration buckets and counts how many fall
61
+ into each, giving a histogram of request latencies.
62
+
63
+ ## Variations
64
+
65
+ Adjust the bucket size for different granularity:
66
+
67
+ ```bash
68
+ # Fine-grained: 500ms buckets
69
+ super -s -c "count() by bucket(dur, 500ms) | sort bucket" cf-durations.sup
70
+
71
+ # Coarse: 30s buckets
72
+ super -s -c "count() by bucket(dur, 30s) | sort bucket" cf-durations.sup
73
+ ```
@@ -1,16 +1,50 @@
1
1
  ---
2
2
  title: "Moar Subqueries"
3
3
  name: moar-subqueries
4
- description: "Additional subquery patterns including fork and full sub-selects."
4
+ description: "Additional subquery patterns including collect-first, fork, and full sub-selects."
5
5
  layout: default
6
6
  nav_order: 10
7
7
  parent: Tutorials
8
- superdb_version: "0.2.0"
9
- last_updated: "2026-02-15"
8
+ superdb_version: "0.3.0"
9
+ last_updated: "2026-04-05"
10
10
  ---
11
11
 
12
12
  # Moar Subqueries
13
13
 
14
+ ## Collect-First Pattern ("Go Up Before Drilling Down")
15
+
16
+ A common problem: you need to both aggregate the full dataset AND filter it
17
+ based on those aggregation results. But SuperDB streams data — once it's
18
+ consumed, it's gone.
19
+
20
+ The collect-first pattern solves this by buffering everything into a single
21
+ record, then using lateral subqueries to derive summaries while keeping access
22
+ to all the data:
23
+
24
+ ```
25
+ from data.json
26
+ | collect(this) | {data: this}
27
+ | put top_ten := [
28
+ unnest data
29
+ | aggregate count := count() by table
30
+ | sort -r count
31
+ | head 10
32
+ | values table
33
+ ]
34
+ | unnest data
35
+ | where table in top_ten
36
+ | aggregate count := count() by table, bucket(ts, 1h)
37
+ | sort table, bucket
38
+ ```
39
+
40
+ The idea: collect everything first ("go up"), derive what you need (top ten
41
+ tables), then drill back down into the raw data using those results as a filter.
42
+
43
+ **Tradeoff:** This buffers the entire dataset into memory. For large datasets,
44
+ consider the fork-and-join approach from
45
+ [Subqueries]({% link docs/tutorials/subqueries.md %}) instead, which stays
46
+ streamable.
47
+
14
48
  ## Fork
15
49
 
16
50
  One hassle to this approach is the limit of 2 forks. Nesting forks works, but
@@ -18,8 +52,8 @@ makes constructing this query a bit more difficult.
18
52
 
19
53
  ## Full Sub-Selects
20
54
 
21
- As of 20250815 build, this is much, much slower. I'm guessing it's doing a full
22
- reload of the data file each time.
55
+ Much slower than pipe-style subqueries because the data file gets re-read each
56
+ time.
23
57
 
24
58
  ```
25
59
  select
@@ -17,8 +17,6 @@ superdb.
17
17
 
18
18
  ## Correlated Subqueries
19
19
 
20
- [//]: # (TODO: file versions - phil's versions from Slack - NOT versions - issue #54)
21
-
22
20
  Let's start with this simple dataset:
23
21
 
24
22
  ```json lines
@@ -129,6 +127,57 @@ super -s -c '
129
127
  {id:4,date:"2025-02-28",foo:9}
130
128
  ```
131
129
 
130
+ ### Fork-and-Join: A Streamable Alternative
131
+
132
+ The lateral subquery approach above uses `collect` to buffer the entire input
133
+ into a single value before iterating. This works well for small datasets, but
134
+ `collect` has limits on how large a single value can be. For larger datasets,
135
+ a fork-and-join approach avoids that limitation by keeping things streamable.
136
+
137
+ The idea is a self-join: raw data on one side, aggregated data on the other,
138
+ joined on the matching fields.
139
+
140
+ ```mdtest-command
141
+ super -s -c '
142
+ from data.json
143
+ | inner join (
144
+ from data.json
145
+ | foo := max(foo) by date
146
+ ) on {left.date, left.foo}={right.date, right.foo}
147
+ | values left
148
+ | sort date'
149
+ ```
150
+ ```mdtest-output
151
+ {id:1,date:"2025-02-27",foo:3}
152
+ {id:4,date:"2025-02-28",foo:9}
153
+ ```
154
+
155
+ This can also use `fork` to read the input once instead of naming the file
156
+ twice:
157
+
158
+ ```mdtest-command
159
+ super -s -c '
160
+ from data.json
161
+ | fork
162
+ ( pass )
163
+ ( foo := max(foo) by date )
164
+ | inner join on {left.date, left.foo}={right.date, right.foo}
165
+ | values left
166
+ | sort date'
167
+ ```
168
+ ```mdtest-output
169
+ {id:1,date:"2025-02-27",foo:3}
170
+ {id:4,date:"2025-02-28",foo:9}
171
+ ```
172
+
173
+ With `fork`, the data flows through a single unnamed input — one branch
174
+ passes records through, the other aggregates. The multi-field join key uses
175
+ the `{left.x, left.y}={right.x, right.y}` record syntax (see
176
+ [multi-value joins](../join/#multi-value-joins)).
177
+
178
+ The tradeoff: fork-and-join is more verbose, but it avoids the `collect`
179
+ size limit and works with streaming pipelines.
180
+
132
181
  ## Subquery with Related Data Join
133
182
 
134
183
  A more realistic scenario: find the records with the top `score` per date, and
@@ -49,6 +49,7 @@ This table covers ALL breaking changes. Complex items reference detailed section
49
49
  | count type | returns `uint64` | returns `int64` |
50
50
  | Dynamic from | `from pool` | `from f'{pool}'` (see section) |
51
51
  | BSUP format | BSUP v1 | BSUP v2 (v1 no longer readable) |
52
+ | collect (empty) | no output on empty stream | returns `null` (see section) |
52
53
  | collect/union | preserves all errors | drops `error("quiet")` values |
53
54
  | concat/f-strings | errors propagate | `null` values ignored |
54
55
 
@@ -456,6 +457,28 @@ super-0.2.0 -s data.bsup > data.sup
456
457
  super -f bsup data.sup > data-v2.bsup
457
458
  ```
458
459
 
460
+ ### collect on empty stream returns null
461
+
462
+ In 0.1.0+, `collect()` on an empty stream returns `null` instead of producing
463
+ no output. This can cause subtle downstream bugs — `this in null` drops all
464
+ records instead of preserving them:
465
+
466
+ ```
467
+ -- Empty collect returns null:
468
+ values [1,2,3] | unnest this | where false | collect(this)
469
+ -- Returns: null
470
+
471
+ -- Downstream gotcha: "not in null" filters out everything:
472
+ values ["a","b","c"] | unnest this | where not (this in null) | collect(this)
473
+ -- Returns: null (all records dropped!)
474
+
475
+ -- Guard with coalesce or check for empty array:
476
+ values ["a","b","c"] | unnest this
477
+ | where not (this in coalesce(null, []))
478
+ | collect(this)
479
+ -- Returns: ["a","b","c"]
480
+ ```
481
+
459
482
  ### collect and union drop quiet errors
460
483
 
461
484
  In `collect` and `union` aggregate functions, `error("quiet")` values are now
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@chrismo/superkit",
3
- "version": "1.0.0",
3
+ "version": "1.1.0",
4
4
  "description": "SuperDB toolkit — docs, recipes, grok patterns, and CLI tools for the super binary",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -43,4 +43,4 @@
43
43
  "engines": {
44
44
  "node": ">=18.0.0"
45
45
  }
46
- }
46
+ }