npm - @dx-do/cli - Versions diffs - 5.2.49 → 6.0.1 - Mend

@dx-do/cli 5.2.49 → 6.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (185)

package/dist-node/nassql-cookbook-n8kc0mff.md ADDED

@@ -0,0 +1,812 @@
+---
+id: nassql-cookbook
+title: NASSQL cookbook — operations reference and recipe catalog
+applies_to: nassql
+tags: [reference, recipes, authoring]
+related: [nassql-quickstart, mm-cookbook, gotchas, run-query-vs-run-partial]
+---
+# NASSQL cookbook
+Full operation reference for NASSQL (`/metrics/nassQuery`) authoring, plus a
+recipe catalog. Use after `nassql-quickstart.md` when you need the breadth
+of ops, the column-availability table, or a concrete pattern to start
+from. NASSQL embeds Metrics Metadata specifiers in `FROM_METADATA` and
+`JOIN_METADATA`; `mm-cookbook.md` documents that vocabulary.
+## NASSQL at a glance
+A NASSQL query is a **pipeline of operations** (`query: [op1, op2, ...]`)
+executed in order. Each op transforms the working set. The first op is
+always a source; the last op is almost always `KEEP` (or `DESCRIBE` while
+iterating).
+```mermaid
+flowchart LR
+  src["source: FROM_TOPOLOGY / FROM_METADATA / FROM / FROM_DATA"] --> joins["joins (optional): JOIN_TOPOLOGY / JOIN_METADATA / JOIN_DATA"]
+  joins --> transforms["transforms: GROUP / FILTER / WINDOW / MAP_STRING / FORMAT_TIME / DISTINCT / ORDER"]
+  transforms --> aggs["aggregations: COUNT / SUM / MEAN / TOP / BOTTOM / FIRST / LAST / QUANTILE / AGG"]
+  aggs --> shape["shape: KEEP (always last)"]
+```
+Mental model: a **dataframe pipeline**. Source ops produce rows with named
+columns; transforms add/rename/filter columns; aggregations collapse rows;
+`KEEP` projects the final column set.
+## Top-level envelope
+| Field | Type | Purpose |
+|-------|------|---------|
+| `query` | `QueryFunctionSpec[]` | Pipeline ops (required) |
+| `limit` | number | Row cap on the final result |
+| `authorizationView` | string | Auth-view restriction |
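As a minimal sketch of the envelope described in the table, the snippet below assembles a request body from a list of ops. `buildEnvelope` is a hypothetical helper, not part of the CLI; only the field names `query`, `limit`, and `authorizationView` come from the table.

```javascript
// Hypothetical helper: assemble a NASSQL request body from a list of ops.
// Only `query`, `limit`, and `authorizationView` are taken from the table above.
function buildEnvelope(ops, options = {}) {
  if (!Array.isArray(ops) || ops.length === 0) {
    throw new Error("query must be a non-empty array of ops");
  }
  const envelope = { query: ops };
  // Optional fields are only emitted when the caller supplies them.
  if (options.limit !== undefined) envelope.limit = options.limit;
  if (options.authorizationView !== undefined) {
    envelope.authorizationView = options.authorizationView;
  }
  return envelope;
}

const countAll = buildEnvelope(
  [
    { op: "FROM_METADATA", querySpecifier: { op: "ALL" }, alias: "all_metrics" },
    { op: "COUNT", as: "total" },
    { op: "KEEP", columns: ["total"] },
  ],
  { limit: 10 }
);
console.log(JSON.stringify(countAll, null, 2));
```

The resulting object has the same shape as the "Count all metrics on the tenant" recipe later in this file.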
+## Source ops
+Each source op produces an initial row set with a known column shape.
+**`alias` is a reference scope, NOT a row-key prefix.** Setting
+`alias: "health"` on a `FROM` does **not** rename its output columns —
+`rows[0]` keys stay flat (`data.value`, `metric.source`, …). Aliases
+are how the engine disambiguates expression-context references (AGG
+inner `column`, FILTER predicates, MAP `fn`, SCRIPT references) when
+multiple FROMs / JOINs share column names. So:
+- `AGG { spec: [{ op: "LAST", column: "health.data.value", as: "Service Health" }] }` — works (expression context).
+- `KEEP { columns: ["health.data.value"] }` — returns no rows; the bare
+  `data.value` is in `rows[0]`, not the prefixed form.
+- See `gotchas.md` for the full rule and the disambiguation pattern when
+  two FROMs emit the same column name.
+### `FROM_TOPOLOGY` — vertices/edges as rows
+`querySpecifier` is typed as a full TAS query, but **the server honors
+only the `filter` field**. Other fields inside `querySpecifier` (`limit`,
+`projection`, `order`, etc.) are silently ignored or cause a 400. Use the
+top-level `limit` for row capping.
+```json
+{
+  "op": "FROM_TOPOLOGY",
+  "querySpecifier": {
+    "filter": { "op": "ATTRIBUTE", "expressions": [
+      { "name": "type", "values": ["HOST"], "operator": "IN" } ] }
+  },
+  "alias": "hosts"
+}
+```
+Produces columns: `vertex.id`, `vertex.externalId`, `vertex.startTime`,
+`vertex.endTime`, and `vertex.attr.<attribute_name>` for each vertex
+attribute.
+### `FROM_METADATA` — metric metadata as rows
+`querySpecifier` is a **Metrics Metadata `QuerySpecifier`** (NOT a TAS
+query). See `mm-cookbook.md` for the specifier vocabulary.
+```json
+{
+  "op": "FROM_METADATA",
+  "querySpecifier": { "op": "ALL" },
+  "alias": "all_metadata"
+}
+```
+```json
+{
+  "op": "FROM_METADATA",
+  "querySpecifier": {
+    "op": "SPEC",
+    "sourceNameSpecifier": { "op": "REGEX", "pattern": ".*Infrastructure.*" },
+    "attributeNameSpecifier": { "op": "REGEX", "pattern": ".*CPU.*" }
+  },
+  "alias": "cpu_metrics"
+}
+```
+Produces columns: `metric.id`, `metric.source`, `metric.path`,
+`metric.name`, `metric.description`, `metric.firstSeen`, `metric.lastSeen`
+(matching the `FROM_METADATA` row under Column conventions).
+### `FROM` — raw time-series data
+Same `querySpecifier` shape as `FROM_METADATA` (Metrics Metadata
+specifier), but pulls actual data points.
+```json
+{
+  "op": "FROM",
+  "querySpecifier": {
+    "op": "SPEC",
+    "sourceNameSpecifier": { "op": "REGEX", "pattern": ".*Infrastructure.*" },
+    "attributeNameSpecifier": { "op": "REGEX", "pattern": ".*CPU.*Utilization.*" }
+  },
+  "queryRange": { "endTime": 0, "rangeSize": 3600000, "frequency": 60000 },
+  "alias": "cpu_data"
+}
+```
+Produces columns: `data.time`, `data.value`, `metric.source`, `metric.path`,
+`metric.id`. (`metric.id` is available without any `JOIN_METADATA` —
+verified by `[FROM, KEEP({columns:["metric.id"]})]`.)
+`queryRange` is optional but recommended for windowed aggregations.
+**`metric.attr.<name>` is also accessible in expression contexts** (AGG
+inner `column`, FILTER predicates, MAP `fn`, SCRIPT references) even
+though those keys don't appear in `rows[0]`. The engine resolves them
+at execution time against the matched metric's metadata. Example:
+`AGG { spec: [{ op: "LAST", column: "metric.attr.service_name", as: "Service Name" }] }`
+works.
+**Diagnostic: verify a FROM is matching anything.** Run the FROM in
+isolation with `KEEP(metric.id)` and inspect the row count. An empty
+result means the specifier matches no metric in this tenant, which is
+common when an ATTRIBUTE predicate (e.g. `is_Custom_health IN ["true"]`)
+doesn't match the values actually stored. The data-store editor exposes
+a "Run JUST this step" control on FROM / FROM_METADATA for exactly this
+purpose.
+### `FROM_DATA` — raw time-series by metric id list
+Use this source when you already have specific metric IDs (e.g. from a prior `FROM_METADATA`).
+```json
+{
+  "op": "FROM_DATA",
+  "querySpecifier": { "metrics": ["m-id-1", "m-id-2"] },
+  "queryRange": { "endTime": 0, "rangeSize": 3600000, "frequency": 60000 }
+}
+```
+Produces columns: `data.time`, `data.interval`, `data.min`, `data.max`,
+`data.value`, `data.count`, `metric.source`, `metric.path`, `metric.id`.
+Same `metric.attr.<name>` lazy-resolution rule as `FROM` for expression
+contexts.
+### `FROM_TABLE` / `SHOW_TABLES`
+Internal NASSQL table access; rarely used in normal queries.
+## Join ops
+### `JOIN_METADATA`
+After a `FROM_TOPOLOGY` source, join in metric metadata for those vertices.
+```json
+{
+  "op": "JOIN_METADATA",
+  "querySpecifier": { "op": "ALL" },
+  "alias": "vertex_metrics"
+}
+```
+`querySpecifier` is a Metrics Metadata specifier. Adds `metric.source`,
+`metric.path`, `metric.id`, `metric.firstSeen`, `metric.lastSeen` columns.
+`joinType`: `INNER` (default) | `LEFT`. Use `LEFT` when you want to keep
+vertices that have no metrics (rows with null metric columns are retained).
+### `JOIN_TOPOLOGY`
+The inverse of `JOIN_METADATA`: after a metadata source, join the topology
+for the metrics' source vertices. It is configured purely via
+`externalIdColumn` / `joinType` / `rowsClampSize` and takes **no
+`querySpecifier`**, because the rows already point at vertices.
+```json
+{ "op": "JOIN_TOPOLOGY", "externalIdColumn": "metric.source", "joinType": "INNER" }
+```
+### `JOIN_DATA`
+After a metadata or topology source, join time-series data for the matching
+metrics.
+```json
+{
+  "op": "JOIN_DATA",
+  "metricIdColumn": "metric.id",
+  "queryRange": { "endTime": 0, "rangeSize": 3600000, "frequency": 60000 }
+}
+```
+## Transform ops
+### `GROUP`
+Group rows by the listed columns. Required before windowed/aggregated ops
+if you want per-group results (e.g. per source).
+```json
+{ "op": "GROUP", "columns": ["metric.source"] }
+```
+**Context-dependent column-drop.** `GROUP` itself doesn't immediately
+prune columns — it just registers the grouping key. The drop happens at
+the **next aggregator** (`AGG` / `COUNT` / `SUM` / `MEAN` / `MIN` /
+`MAX` / `FIRST` / `LAST` / `QUANTILE` / `NASS_AGG`):
+- `GROUP → AGG` (or any aggregator) → output collapses to *grouping
+  key columns* + each aggregator's `as` outputs. Non-grouped,
+  non-aggregated columns are gone after this point.
+- `GROUP → BOTTOM` / `→ TOP` / `→ ORDER` / `→ KEEP` (no aggregator
+  between) → upstream columns are preserved; `GROUP` is effectively
+  a no-op for column shape.
+**Repeated `GROUP` overrides** the previous user-grouping key. If you
+need a multi-pass pipeline, GROUP-then-aggregate-then-GROUP-again works
+because the second GROUP starts a fresh key.
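The column-drop rule can be pictured with a small in-memory imitation of `GROUP` followed by `COUNT`. This is an illustration of the documented output shape only, not the engine's implementation; `groupCount` and the sample rows are made up for the example.

```javascript
// Illustrative only: mimics the documented column shape of GROUP followed by COUNT.
// Output rows carry the grouping key columns plus the aggregator's `as` column.
function groupCount(rows, groupColumns, as = "count") {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(groupColumns.map((c) => row[c]));
    if (!groups.has(key)) {
      const out = {};
      for (const c of groupColumns) out[c] = row[c];
      out[as] = 0;
      groups.set(key, out);
    }
    groups.get(key)[as] += 1;
  }
  return [...groups.values()];
}

// Hypothetical sample rows shaped like FROM_METADATA output.
const metadataRows = [
  { "metric.source": "hostA", "metric.path": "CPU|Utilization" },
  { "metric.source": "hostA", "metric.path": "Memory|Used" },
  { "metric.source": "hostB", "metric.path": "CPU|Utilization" },
];

const grouped = groupCount(metadataRows, ["metric.source"], "metric_count");
console.log(grouped);
```

Note that `metric.path` is absent from the output rows, which is why a later `KEEP` can only list the grouping key and the `as` column.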
+### `FILTER`
+Filter rows by a `QueryFilterPredicateSpec`. Predicate ops:
+| Op | Use |
+|----|-----|
+| `NUMERIC` | `{ op:"NUMERIC", column, operator: EQ\|LT\|LE\|GT\|GE\|NE, value }` |
+| `IN` | `{ op:"IN", column, values: [...] }` |
+| `REGEX` | `{ op:"REGEX", column, pattern, ignoreCase }` |
+| `EXPR` | `{ op:"EXPR", spec: "string expression" }` |
+| `AND` | `{ op:"AND", spec: [predA, predB, ...] }` |
+| `OR`  | `{ op:"OR", spec: [predA, predB, ...] }` |
+| `NOT` | `{ op:"NOT", spec: predA }` |
+```json
+{ "op": "FILTER", "spec": {
+  "op": "NUMERIC", "column": "metric_count", "operator": "GT", "value": 2000 } }
+```
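To show how the predicate shapes in the table compose, here is a small local evaluator. It is a sketch for reasoning about predicates offline, not the server's evaluator: `matches` is a hypothetical function, `EXPR` is deliberately unsupported, and the `REGEX` branch uses JavaScript's partial-match `RegExp.test`, which may differ from how the server anchors patterns.

```javascript
// Local sketch of the FILTER predicate shapes; not the server implementation.
const NUMERIC_OPERATORS = {
  EQ: (a, b) => a === b,
  NE: (a, b) => a !== b,
  LT: (a, b) => a < b,
  LE: (a, b) => a <= b,
  GT: (a, b) => a > b,
  GE: (a, b) => a >= b,
};

function matches(pred, row) {
  switch (pred.op) {
    case "NUMERIC":
      return NUMERIC_OPERATORS[pred.operator](Number(row[pred.column]), pred.value);
    case "IN":
      return pred.values.includes(row[pred.column]);
    case "REGEX":
      // Assumption: partial match; the server's anchoring rule is not stated here.
      return new RegExp(pred.pattern, pred.ignoreCase ? "i" : "").test(String(row[pred.column]));
    case "AND":
      return pred.spec.every((p) => matches(p, row));
    case "OR":
      return pred.spec.some((p) => matches(p, row));
    case "NOT":
      return !matches(pred.spec, row);
    default:
      throw new Error("unsupported predicate op: " + pred.op);
  }
}

const predicate = {
  op: "AND",
  spec: [
    { op: "NUMERIC", column: "metric_count", operator: "GT", value: 2000 },
    { op: "NOT", spec: { op: "REGEX", column: "metric.source", pattern: "test", ignoreCase: true } },
  ],
};

// Hypothetical rows as they might look after GROUP + COUNT.
const filterRows = [
  { "metric.source": "prod-db", metric_count: 3500 },
  { "metric.source": "TEST-db", metric_count: 9000 },
  { "metric.source": "prod-web", metric_count: 120 },
];

const kept = filterRows.filter((row) => matches(predicate, row));
console.log(kept.map((r) => r["metric.source"]));
```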
+### `FILTER_EXPR`
+String-based filter expression for cases the predicate spec cannot express.
+```json
+{ "op": "FILTER_EXPR", "spec": "metric_count > 100 && metric.source matches '.*prod.*'" }
+```
+### `ORDER`
+Sort by one or more columns.
+```json
+{ "op": "ORDER",
+  "columns": [{ "column": "metric_count", "sortDescending": true }],
+  "topN": 10 }
+```
+`topN` is an optional cap — equivalent to `TOP` without a separate op.
+### `KEEP` — must be last
+Project the final column set. Optional `as[]` renames columns positionally.
+```json
+{ "op": "KEEP", "columns": ["metric.source", "metric_count"] }
+```
+```json
+{ "op": "KEEP", "columns": ["vertex.attr.name", "metric_count"],
+  "as": ["host", "metrics"] }
+```
+If any op follows `KEEP`, the API rejects the query with
+`"KEEP function has to be the last one"`.
+### `DISTINCT`
+Collapse duplicate rows.
+```json
+{ "op": "DISTINCT" }
+```
+### `MAP_STRING`
+Apply a regex to a column, capturing groups into new named columns.
+```json
+{ "op": "MAP_STRING",
+  "column": "vertex.attr.name", "pattern": "(.*)/(.*)",
+  "as": ["dir", "leaf"] }
+```
+`as[]` length should match the number of regex capture groups. Use
+`filterNotMatching: true` to drop rows that do not match.
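The capture-group mapping can be tried locally with the sketch below. `mapString` is a hypothetical stand-in that follows the description above; in particular, leaving non-matching rows untouched when `filterNotMatching` is off is an assumption, since the text only states what happens when it is on.

```javascript
// Sketch of MAP_STRING semantics: regex capture groups become new named columns.
function mapString(rows, { column, pattern, as, filterNotMatching = false }) {
  const regex = new RegExp(pattern);
  const out = [];
  for (const row of rows) {
    const match = regex.exec(String(row[column]));
    if (!match) {
      // Assumption: without filterNotMatching the row passes through unchanged.
      if (!filterNotMatching) out.push({ ...row });
      continue;
    }
    const mapped = { ...row };
    // Capture group i+1 feeds the i-th name in `as`.
    as.forEach((name, i) => {
      mapped[name] = match[i + 1];
    });
    out.push(mapped);
  }
  return out;
}

// Hypothetical vertex rows.
const vertexRows = [
  { "vertex.attr.name": "payments/api" },
  { "vertex.attr.name": "standalone" },
];

const split = mapString(vertexRows, {
  column: "vertex.attr.name",
  pattern: "(.*)/(.*)",
  as: ["dir", "leaf"],
  filterNotMatching: true,
});
console.log(split);
```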
+### `MAP`
+Per-row arithmetic expression. `fn` supports column references, arithmetic
+(+ - * / ^), comparisons, logical operators, sqrt, and parentheses.
+```json
+{ "op": "MAP", "fn": "sqrt(data.value)", "as": "sqrt_value" }
+```
+### `FORMAT_TIME`
+Format an epoch-millisecond column as a string.
+```json
+{ "op": "FORMAT_TIME", "column": "vertex.endTime",
+  "as": "last_seen", "pattern": "yyyy-MM-dd HH:mm:ss",
+  "timezone": "UTC" }
+```
+`duration: true` formats the value as a duration instead of a timestamp.
+### `FORMAT`
+General-purpose string formatter using a Java-style format spec.
+```json
+{ "op": "FORMAT", "format": "%s (%d)", "columns": ["name", "count"], "as": "label" }
+```
+## Window ops
+### `WINDOW`
+Time-bucket rows into windows of `every` milliseconds. Use **before** an
+aggregation to get per-window aggregates.
+```json
+{ "op": "WINDOW", "every": 3600000 }
+```
+`align`: `ABSOLUTE | LEFT | RIGHT`. `incomplete: false` (default) drops
+partial windows.
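A rough picture of what `WINDOW` followed by `MEAN` computes is sketched below. Several things here are assumptions rather than documented behavior: buckets are aligned to epoch zero, each window is half-open, and partial windows are not dropped. `windowMean` is a hypothetical helper; the `window.timestart` / `window.timeend` names follow the column conventions table.

```javascript
// Sketch: bucket points into fixed-size windows, then average each bucket.
// Assumes epoch-aligned, half-open windows [start, start + every).
function windowStart(timeMs, every) {
  return Math.floor(timeMs / every) * every;
}

function windowMean(rows, every, column, as) {
  const buckets = new Map();
  for (const row of rows) {
    const start = windowStart(row["data.time"], every);
    const bucket = buckets.get(start) ?? { sum: 0, n: 0 };
    bucket.sum += row[column];
    bucket.n += 1;
    buckets.set(start, bucket);
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([start, bucket]) => ({
      "window.timestart": start,
      "window.timeend": start + every,
      [as]: bucket.sum / bucket.n,
    }));
}

const HOUR_MS = 3600000;
// Hypothetical data points shaped like FROM output.
const points = [
  { "data.time": 0, "data.value": 10 },
  { "data.time": 1800000, "data.value": 30 },
  { "data.time": 3600000, "data.value": 50 },
];

const hourly = windowMean(points, HOUR_MS, "data.value", "avg_cpu");
console.log(hourly);
```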
+### `WINDOW_CALENDAR`
+Calendar-aligned windows.
+```json
+{ "op": "WINDOW_CALENDAR", "calendarInterval": "HOUR", "timeZone": "UTC" }
+```
+`calendarInterval`: `MINUTE | HOUR | DAY | WEEK | MONTH | QUARTER | YEAR`.
+## Aggregation ops
+Most take an optional `as` (output column name) and a `column` (input);
+the table lists each op's exact fields. Aggregations **collapse**
+non-grouped, non-aggregated columns: only `GROUP` columns and the
+produced `as` columns survive.
+| Op | Field shape | Purpose |
+|----|-------------|---------|
+| `COUNT` | `{ as }` (no column required — counts rows) | Row count |
+| `SUM` | `{ column, as }` | Sum |
+| `MEAN` | `{ column, as, weightColumn? }` | Mean (optionally weighted) |
+| `MIN` / `MAX` | `{ column, as }` | Extremes |
+| `FIRST` / `LAST` | `{ column, as, orderSrc? }` | First/last by time or `orderSrc` |
+| `QUANTILE` | `{ column, as, index, scale?, method? }` | Quantile / percentile |
+| `DERIVATIVE` | `{ column, as, unit?, negative?, timeSrc? }` | Per-time-unit slope |
+| `DIFFERENCE` | `{ column, as, negative?, timeSrc? }` | Successive differences |
+| `AGG` | `{ spec: [{ op, column, as, ... }, ...] }` | Multi-aggregation in one pass |
+| `NASS_AGG` | `{ fromAlias?, as }` | Aggregation specific to nass values |
+| `TOP` | `{ column, n, sortAscending? }` | Top-N by column |
+| `BOTTOM` | `{ column, n, sortAscending? }` | Bottom-N by column |
+`TOP`/`BOTTOM` use `n`, not `count`. `topN` exists only on `ORDER`.
+**Default sort direction differs between TOP and BOTTOM** — when
+`sortAscending` is omitted:
+- `TOP` → `sortAscending = false` (descending; **largest** first).
+- `BOTTOM` → `sortAscending = true` (ascending; **smallest** first).
+Setting `sortAscending: true` on `TOP` is legal but produces the same
+ordering as `BOTTOM` — at that point use `BOTTOM` for clarity.
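The default-direction rule can be checked with a few lines of local code. `takeN`, `top`, and `bottom` are hypothetical helpers that only imitate the sort defaults described above.

```javascript
// Sketch of the TOP / BOTTOM defaults for `sortAscending`.
function takeN(rows, { column, n, sortAscending }) {
  const sorted = [...rows].sort((a, b) =>
    sortAscending ? a[column] - b[column] : b[column] - a[column]
  );
  return sorted.slice(0, n);
}

// TOP defaults to descending, BOTTOM to ascending, when sortAscending is omitted.
const top = (rows, spec) =>
  takeN(rows, { ...spec, sortAscending: spec.sortAscending ?? false });
const bottom = (rows, spec) =>
  takeN(rows, { ...spec, sortAscending: spec.sortAscending ?? true });

// Hypothetical per-source counts.
const sourceCounts = [
  { "metric.source": "a", metric_count: 5 },
  { "metric.source": "b", metric_count: 40 },
  { "metric.source": "c", metric_count: 12 },
];

const largest = top(sourceCounts, { column: "metric_count", n: 2 });
const smallest = bottom(sourceCounts, { column: "metric_count", n: 2 });
const topAscending = top(sourceCounts, { column: "metric_count", n: 2, sortAscending: true });
console.log(largest, smallest, topAscending);
```

Here `topAscending` comes out in the same order as `smallest`, which is the situation where `BOTTOM` reads more clearly.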
+## Debug ops
+### `DESCRIBE`
+Returns the column schema of the working set instead of the data. Insert
+anywhere while authoring to learn what columns are available.
+```json
+{ "op": "DESCRIBE" }
+```
+Use it after every source/join op while iterating; remove for the final
+query.
+### `LOG`
+Emits an intermediate-result log without modifying the rows.
+`countRecords=true` logs only the row count. `excludeFinalResult=true`
+prevents the LOG output from being included in the response. **Remove
+before saving the final query.**
+```json
+{ "op": "LOG", "name": "after_join", "limit": 5, "excludeFinalResult": true }
+```
+### `SCRIPT`
+Execute a script body that emits computed columns. Significantly slower
+than native ops (disables query-plan optimizations and runs server-side
+per row) — use sparingly; prefer `MAP` / `FORMAT` / `DERIVATIVE` /
+`DIFFERENCE` for arithmetic / formatting.
+**Canonical signature** (the function shape the engine invokes; copy
+this skeleton verbatim and fill in the body):
+```js
+(function nassqlfn(rows) {
+  // rows is an array, and each row is an array of the input column values
+  // — each row's positions map to inputColumns[] in the same order.
+  // return an array of rows; each row should be an array of length =
+  // outputColumns.length, in outputColumns[] order.
+})
+```
+Worked example — pick the first non-null/non-blank column and emit it
+as `Health`:
+```json
+{
+  "op": "SCRIPT",
+  "inputColumns": ["Custom Health", "Service Health"],
+  "outputColumns": ["Health"],
+  "script": "(function nassqlfn(rows) { return rows.map(function(row) { return [row[0] != null && String(row[0]).trim() !== '' ? row[0] : row[1]]; }); })"
+}
+```
+`inputColumns[]` declares the upstream columns the script can read
+(positional), and `outputColumns[]` declares the columns it emits
+(also positional). The visual editor pre-populates new SCRIPT steps
+with the skeleton above.
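Before sending a `SCRIPT` step to the server, it can help to exercise the script body locally. The harness below is a sketch under stated assumptions: `runScriptStep` is hypothetical, it assumes the script returns exactly one output row per input row, and it merges the outputs back onto the upstream columns as the column conventions table describes. The step shown is shaped like the worked example but uses a simplified script of its own.

```javascript
// Local harness for a SCRIPT step; only run script strings you trust.
function runScriptStep(step, rows) {
  // The script string is a parenthesized function expression, so `return <script>` yields the function.
  const fn = new Function("return " + step.script)();
  // Each input row is an array in inputColumns[] order.
  const input = rows.map((row) => step.inputColumns.map((name) => row[name]));
  const output = fn(input);
  if (output.length !== rows.length) {
    throw new Error("this sketch assumes one output row per input row");
  }
  return rows.map((row, i) => {
    const merged = { ...row };
    step.outputColumns.forEach((name, j) => {
      merged[name] = output[i][j];
    });
    return merged;
  });
}

const healthStep = {
  op: "SCRIPT",
  inputColumns: ["Custom Health", "Service Health"],
  outputColumns: ["Health"],
  script:
    "(function nassqlfn(rows) { return rows.map(function (row) { return [row[0] != null ? row[0] : row[1]]; }); })",
};

// Hypothetical upstream rows.
const healthRows = [
  { "Custom Health": null, "Service Health": "OK" },
  { "Custom Health": "DEGRADED", "Service Health": "OK" },
];

const withHealth = runScriptStep(healthStep, healthRows);
console.log(withHealth.map((r) => r.Health));
```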
+## Column conventions
+| After op | Columns added |
+|----------|---------------|
+| `FROM_METADATA` | `metric.id`, `metric.source`, `metric.path`, `metric.name`, `metric.description`, `metric.firstSeen`, `metric.lastSeen` (plus `metric.attr.<X>` resolvable in expression contexts) |
+| `FROM` (data via metadata spec) | `data.time`, `data.value`, `metric.source`, `metric.path`, `metric.id` (plus `metric.attr.<X>` resolvable in expression contexts) |
+| `FROM_DATA` (data via metric ids) | `data.time`, `data.interval`, `data.min`, `data.max`, `data.value`, `data.count`, `metric.source`, `metric.path`, `metric.id` (plus `metric.attr.<X>` resolvable in expression contexts) |
+| `FROM_TOPOLOGY` | `vertex.id`, `vertex.externalId`, `vertex.startTime`, `vertex.endTime`, `vertex.attr.<X>` per attribute |
+| `JOIN_METADATA` (after `FROM_TOPOLOGY`) | + `metric.source`, `metric.path`, `metric.id`, `metric.firstSeen`, `metric.lastSeen` |
+| `JOIN_TOPOLOGY` (after metadata) | + the vertex columns above |
+| `JOIN_DATA` | + `data.time`, `data.interval`, `data.min`, `data.max`, `data.value`, `data.count` |
+| `MAP` / `MAP_STRING` / `FORMAT` / `FORMAT_TIME` / `DERIVATIVE` / `DIFFERENCE` | + the op's `as` column (preserves all upstream columns) |
+| `SCRIPT` | + every column named in `outputColumns[]` (preserves all upstream columns) |
+| `AGG` / `COUNT` / `SUM` / `MEAN` / `MIN` / `MAX` / `FIRST` / `LAST` / `QUANTILE` | output is *grouping key* + each aggregator's `as` (defaults to source column for most; `count` for COUNT). Non-grouped non-aggregated columns are dropped. |
+| `NASS_AGG` | output is *grouping key* + a burst of `<as>`-prefixed columns `<as>.{time, interval, min, max, value, count}` (`<as>` defaults to `fromAlias`) |
+| `KEEP` | terminal — projects to `columns[]` (renamed via positional `as[]`) |
+| `WINDOW` / `WINDOW_CALENDAR` | + `window.timestart`, `window.timeend` (only materialize after the next aggregator; before that they're a marker) |
+**Key naming surprises:**
+- `FROM` (data) gives `metric.source` (NOT `metric.sourceName`) and `data.value` (NOT `metric.value`).
+- All FROM/FROM_DATA/FROM_METADATA emit `metric.id` — no `JOIN_METADATA` needed to access it.
+- `WINDOW` + aggregation collapses non-grouped columns — always `GROUP` before `WINDOW`.
+- `metric.attr.<name>` and `vertex.attr.<name>` are resolvable in expression
+  contexts (AGG inner `column`, FILTER predicates, MAP `fn`, SCRIPT references)
+  even though they don't appear in the row-level column list. The engine
+  resolves them lazily at execution time.
+## Pipeline ordering rules
+Distilled from live validation:
+1. **Source first** — `FROM_TOPOLOGY` / `FROM_METADATA` / `FROM` /
+   `FROM_DATA` must be the first op.
+2. **`KEEP` last** — anything after `KEEP` is rejected with
+   `"KEEP function has to be the last one"`.
+3. **`GROUP` before `WINDOW`** — for time-series per-entity aggregates,
+   group first or the entity column collapses away.
+4. **`GROUP` before aggregations** — to get per-group results; otherwise
+   the aggregation runs across all rows.
+5. **REGEX inside SPEC** — never use `{ op: "REGEX", ... }` directly as a
+   `querySpecifier`. Wrap in `SPEC` with `sourceNameSpecifier` and/or
+   `attributeNameSpecifier` (see `mm-cookbook.md`).
+6. **`JOIN_*` after compatible source** — `JOIN_TOPOLOGY` requires a
+   metadata source above; `JOIN_METADATA` typically follows
+   `FROM_TOPOLOGY`.
+See `gotchas.md` for more pitfalls.
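Several of the ordering rules above are mechanical enough to check before submitting a query. The linter below is a minimal sketch covering rules 1, 2, 3, and 5 only; `lintPipeline` is a hypothetical helper and the server remains the authority on what is valid.

```javascript
// Sketch: client-side checks for a subset of the pipeline ordering rules.
const SOURCE_OPS = new Set(["FROM_TOPOLOGY", "FROM_METADATA", "FROM", "FROM_DATA"]);

function lintPipeline(ops) {
  const problems = [];
  // Rule 1: the first op must be a source.
  if (ops.length === 0 || !SOURCE_OPS.has(ops[0].op)) {
    problems.push("rule 1: first op must be a source");
  }
  // Rule 2: KEEP, if present, must be the last op.
  const keepIndex = ops.findIndex((o) => o.op === "KEEP");
  if (keepIndex !== -1 && keepIndex !== ops.length - 1) {
    problems.push("rule 2: KEEP must be the last op");
  }
  // Rule 3: a GROUP that appears after a window op comes too late.
  const windowIndex = ops.findIndex((o) => o.op === "WINDOW" || o.op === "WINDOW_CALENDAR");
  const groupIndex = ops.findIndex((o) => o.op === "GROUP");
  if (windowIndex !== -1 && groupIndex > windowIndex) {
    problems.push("rule 3: GROUP should come before WINDOW");
  }
  // Rule 5: REGEX must be wrapped in SPEC, never used directly as querySpecifier.
  for (const o of ops) {
    if (o.querySpecifier && o.querySpecifier.op === "REGEX") {
      problems.push("rule 5: wrap REGEX in a SPEC specifier on " + o.op);
    }
  }
  return problems;
}

const goodPipeline = [
  { op: "FROM_METADATA", querySpecifier: { op: "ALL" } },
  { op: "GROUP", columns: ["metric.source"] },
  { op: "COUNT", as: "metric_count" },
  { op: "KEEP", columns: ["metric.source", "metric_count"] },
];

const badPipeline = [
  { op: "FROM", querySpecifier: { op: "REGEX", pattern: ".*CPU.*" } },
  { op: "WINDOW", every: 3600000 },
  { op: "GROUP", columns: ["metric.source"] },
  { op: "KEEP", columns: ["metric.source"] },
  { op: "MEAN", column: "data.value", as: "avg_cpu" },
];

console.log(lintPipeline(goodPipeline));
console.log(lintPipeline(badPipeline));
```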
+## Recipe catalog
+Each recipe links to a worked example shipped in the corpus —
+`corpus_get("queries", "<id>")` returns the full payload + per-op
+descriptions.
+### "Count all metrics on the tenant"
+`corpus_get("queries", "20-nassql-from-metadata-basic")`
+```json
+{
+  "query": [
+    { "op": "FROM_METADATA", "querySpecifier": { "op": "ALL" }, "alias": "all_metrics" },
+    { "op": "COUNT", "as": "total" },
+    { "op": "KEEP", "columns": ["total"] }
+  ],
+  "limit": 10
+}
+```
+### "List metric paths matching source/attribute regex"
+`corpus_get("queries", "21-nassql-from-metadata-regex")`
+```json
+{
+  "query": [
+    { "op": "FROM_METADATA",
+      "querySpecifier": {
+        "op": "SPEC",
+        "sourceNameSpecifier": { "op": "REGEX", "pattern": ".*Infrastructure.*" },
+        "attributeNameSpecifier": { "op": "REGEX", "pattern": ".*CPU.*" }
+      },
+      "alias": "cpu_metrics" },
+    { "op": "KEEP", "columns": ["metric.source", "metric.path"] }
+  ],
+  "limit": 50
+}
+```
+### "Top N most-emitting metric sources"
+`corpus_get("queries", "25-nassql-group-order-top")` (also see
+`corpus_get("queries", "03-discover-sources")` for the same shape without
+TOP for a full sorted list)
+```json
+{
+  "query": [
+    { "op": "FROM_METADATA", "querySpecifier": { "op": "ALL" }, "alias": "all_metadata" },
+    { "op": "GROUP", "columns": ["metric.source"] },
+    { "op": "COUNT", "as": "metric_count" },
+    { "op": "TOP", "column": "metric_count", "n": 10 },
+    { "op": "KEEP", "columns": ["metric.source", "metric_count"] }
+  ],
+  "limit": 10
+}
+```
+### "Bottom N (least-active) sources"
+`corpus_get("queries", "32-nassql-bottom-aggregation")`: same shape as the
+TOP recipe above, with `BOTTOM` in place of `TOP`.
+### "List entities of a type, projected"
+`corpus_get("queries", "22-nassql-from-topology")`
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["HOST"], "operator": "IN" } ] } },
+      "alias": "hosts" },
+    { "op": "KEEP", "columns": ["vertex.attr.name", "vertex.attr.type", "vertex.externalId"] }
+  ],
+  "limit": 20
+}
+```
+### "Cross-domain count: entities -> metric counts per entity"
+`corpus_get("queries", "23-nassql-join-topology-metadata")`
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["AGENT"], "operator": "IN" } ] } },
+      "alias": "agents" },
+    { "op": "JOIN_METADATA", "querySpecifier": { "op": "ALL" }, "alias": "agent_metrics" },
+    { "op": "GROUP", "columns": ["vertex.attr.name", "vertex.attr.type"] },
+    { "op": "COUNT", "as": "metric_count" },
+    { "op": "ORDER", "columns": [{ "column": "metric_count", "sortDescending": true }] },
+    { "op": "KEEP", "columns": ["vertex.attr.name", "vertex.attr.type", "metric_count"] }
+  ],
+  "limit": 20
+}
+```
+### "Per-source mean over a window"
+`corpus_get("queries", "24-nassql-from-data-window-mean")` — note `GROUP`
+is **before** `WINDOW` so `metric.source` survives.
+```json
+{
+  "query": [
+    { "op": "FROM",
+      "querySpecifier": {
+        "op": "SPEC",
+        "sourceNameSpecifier": { "op": "REGEX", "pattern": ".*Infrastructure.*" },
+        "attributeNameSpecifier": { "op": "REGEX", "pattern": ".*CPU.*Utilization.*" }
+      },
+      "alias": "cpu_data" },
+    { "op": "GROUP", "columns": ["metric.source"] },
+    { "op": "WINDOW", "every": 3600000 },
+    { "op": "MEAN", "column": "data.value", "as": "avg_cpu" },
+    { "op": "KEEP", "columns": ["metric.source", "avg_cpu"] }
+  ],
+  "limit": 50
+}
+```
+### "Filter rows by numeric threshold"
+`corpus_get("queries", "26-nassql-filter-predicate")`
+```json
+{
+  "query": [
+    { "op": "FROM_METADATA", "querySpecifier": { "op": "ALL" }, "alias": "all_metrics" },
+    { "op": "GROUP", "columns": ["metric.source"] },
+    { "op": "COUNT", "as": "metric_count" },
+    { "op": "FILTER", "spec": { "op": "NUMERIC", "column": "metric_count", "operator": "GT", "value": 2000 } },
+    { "op": "ORDER", "columns": [{ "column": "metric_count", "sortDescending": true }] },
+    { "op": "KEEP", "columns": ["metric.source", "metric_count"] }
+  ],
+  "limit": 50
+}
+```
+### "Distinct rows / projection only"
+`corpus_get("queries", "27-nassql-distinct-keep")`
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["k8s_NAMESPACE"], "operator": "IN" } ] } },
+      "alias": "namespaces" },
+    { "op": "KEEP", "columns": ["vertex.attr.name", "vertex.attr.type"] }
+  ],
+  "limit": 50
+}
+```
+### "Format an epoch column as a date string"
+`corpus_get("queries", "28-nassql-format-time")`
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["AGENT"], "operator": "IN" } ] } },
+      "alias": "agents" },
+    { "op": "FORMAT_TIME", "column": "vertex.endTime",
+      "as": "last_seen", "pattern": "yyyy-MM-dd HH:mm:ss" },
+    { "op": "KEEP", "columns": ["vertex.attr.name", "vertex.attr.type", "last_seen"] }
+  ],
+  "limit": 20
+}
+```
+### "Discover columns at any pipeline step"
+`corpus_get("queries", "29-nassql-describe-log")` (also
+`corpus_get("queries", "04-discover-metadata-columns")`)
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY", "querySpecifier": { "filter": { } }, "alias": "x" },
+    { "op": "JOIN_METADATA", "querySpecifier": { "op": "ALL" }, "alias": "y" },
+    { "op": "DESCRIBE" }
+  ],
+  "limit": 100
+}
+```
+**MCP shortcut for the source-step case.** When the question is
+specifically "what `metric.attr.*` / `vertex.attr.*` columns will my
+source op expose?", `nassql_step_columns` answers it in one call. Pass
+your in-progress payload + the 0-based index of the source / join op and
+it returns `{attrs: [...], kind, durationMs}`. The handler picks the
+right probe automatically (MM `queryMetric` for `FROM_METADATA`, a
+`KEEP(metric.id)` slice + MM lookup for `FROM`, `COLLECT_ATTRIBUTE_NAMES`
+for `FROM_TOPOLOGY`, an external-id harvest + COLLECT for
+`JOIN_TOPOLOGY`). Use the MCP probe when authoring; use the `DESCRIBE`
+recipe above when you need to see the entire row shape (including
+non-attribute columns) or the join surface across two sources at once.
+### "Verify a FROM is matching any metrics"
+When a downstream aggregation produces nulls or zero rows, isolate
+each source op and confirm it actually matches something.
+```json
+{
+  "query": [
+    { "op": "FROM",
+      "querySpecifier": { "op": "SPEC",
+        "sourceNameSpecifier": { "op": "EXACT", "names": ["<your-source>"] },
+        "attributeNameSpecifier": { "op": "REGEX", "pattern": "<your-pattern>" } },
+      "alias": "probe" },
+    { "op": "KEEP", "columns": ["metric.id"] }
+  ],
+  "limit": 50
+}
+```
+Result rows are `[["metric.id"], <id1>, <id2>, …]`. Zero data rows ⇒
+the specifier doesn't match in this tenant. From here, sample one of
+the IDs via Metrics-Metadata to inspect the metric's attributes and
+refine the filter against actual stored values. The data-store editor
+exposes a "Run JUST this step" control on `FROM` / `FROM_METADATA`
+that runs this exact pipeline and shows the result in the Debug pane.
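When a query has several `FROM` / `FROM_METADATA` steps, the isolation probe can be generated rather than hand-written. The sketch below derives one probe payload per source op; `buildProbe` and `probesFor` are hypothetical helpers, and the source names in the sample query are placeholders.

```javascript
// Sketch: derive a "does this source match anything?" probe for each FROM / FROM_METADATA op.
const PROBEABLE_OPS = new Set(["FROM", "FROM_METADATA"]);

function buildProbe(sourceOp, limit = 50) {
  if (!PROBEABLE_OPS.has(sourceOp.op)) {
    throw new Error("probe only applies to FROM / FROM_METADATA, got " + sourceOp.op);
  }
  // The source op in isolation, followed by KEEP(metric.id).
  return { query: [sourceOp, { op: "KEEP", columns: ["metric.id"] }], limit };
}

function probesFor(envelope) {
  return envelope.query.filter((o) => PROBEABLE_OPS.has(o.op)).map((o) => buildProbe(o));
}

// Placeholder query; the source name and pattern are illustrative.
const underTest = {
  query: [
    {
      op: "FROM",
      querySpecifier: {
        op: "SPEC",
        sourceNameSpecifier: { op: "EXACT", names: ["example-source"] },
        attributeNameSpecifier: { op: "REGEX", pattern: ".*Health.*" },
      },
      alias: "probe",
    },
    { op: "GROUP", columns: ["metric.source"] },
    { op: "COUNT", as: "n" },
    { op: "KEEP", columns: ["metric.source", "n"] },
  ],
  limit: 20,
};

const probes = probesFor(underTest);
console.log(JSON.stringify(probes[0], null, 2));
```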
+### "Extract substrings into new columns via regex"
+`corpus_get("queries", "30-nassql-map-string")`
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["k8s_POD"], "operator": "IN" } ] } },
+      "alias": "pods" },
+    { "op": "MAP_STRING", "column": "vertex.attr.name",
+      "pattern": "(.*)", "as": ["pod_label"] },
+    { "op": "KEEP", "columns": ["pod_label", "vertex.attr.type"] }
+  ],
+  "limit": 20
+}
+```
+### "Cross-domain: entities + filtered metric counts per entity"
+`corpus_get("queries", "31-nassql-join-data-sum")` — shows `JOIN_METADATA`
+with a `SPEC` specifier to filter the joined metrics.
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": { "op": "ATTRIBUTE", "expressions": [
+        { "name": "type", "values": ["AGENT"], "operator": "IN" } ] } },
+      "alias": "agents" },
+    { "op": "JOIN_METADATA",
+      "querySpecifier": { "op": "SPEC",
+        "attributeNameSpecifier": { "op": "REGEX", "pattern": ".*Responses Per Interval.*" } },
+      "alias": "response_metrics" },
+    { "op": "GROUP", "columns": ["vertex.attr.name"] },
+    { "op": "COUNT", "as": "response_metric_count" },
+    { "op": "ORDER", "columns": [{ "column": "response_metric_count", "sortDescending": true }] },
+    { "op": "KEEP", "columns": ["vertex.attr.name", "response_metric_count"] }
+  ],
+  "limit": 20
+}
+```
+### "Topology traversal -> aggregation pipeline"
+`corpus_get("queries", "33-nassql-cross-domain-pipeline")` — shows TAS
+`TRAVERSE` inside `FROM_TOPOLOGY.querySpecifier`.
+```json
+{
+  "query": [
+    { "op": "FROM_TOPOLOGY",
+      "querySpecifier": { "filter": {
+        "op": "TRAVERSE",
+        "input": { "op": "ATTRIBUTE", "expressions": [
+          { "name": "type", "values": ["k8s_CLUSTER"], "operator": "IN" } ] },
+        "traverse": [{ "direction": "ANY", "repeat": 3 }],
+        "includeInput": true } },
+      "alias": "k8s_entities" },
+    { "op": "GROUP", "columns": ["vertex.attr.type"] },
+    { "op": "COUNT", "as": "entity_count" },
+    { "op": "ORDER", "columns": [{ "column": "entity_count", "sortDescending": true }] },
+    { "op": "KEEP", "columns": ["vertex.attr.type", "entity_count"] }
+  ],
+  "limit": 20
+}
+```
+### "All metric sources sorted by count" (no TOP cap)
+`corpus_get("queries", "03-discover-sources")` — same shape as the TOP
+recipe without the TOP op.
+## Authoring tips
+1. **Sketch the pipeline first.** Source → joins → transforms →
+   aggregation → KEEP. If you cannot name the source op, you have not
+   decided whether you want entities, metric metadata, or raw data.
+2. **Insert `DESCRIBE` early and often.** After every source/join op
+   while iterating. Remove for the final query.
+3. **Group before windowed aggregations** so the entity column survives
+   into `KEEP`.
+4. **`KEEP` last, always.** And only list columns that exist after the
+   aggregations.
+5. **Use the recipes verbatim** when starting from a familiar pattern —
+   substitute attribute names, regex patterns, and limits, but keep the
+   op shapes intact.
+6. **For `FROM` / `FROM_METADATA` regex matching**, remember the REGEX op
+   must be wrapped in a `SPEC` specifier. See `mm-cookbook.md`.
+7. **Verify intermediate stages** with `run_partial_query` (`upToStep:
+   N`) — see `run-query-vs-run-partial.md`.