substrate-ai (npm) 0.20.31 → 0.20.32

Files changed (8)

package/dist/cli/index.js +4 -4
package/dist/{health-B0cPyaYJ.js → health-DJ4z2uWN.js} +189 -19
package/dist/{health-C6pR6QvM.js → health-DLqMd9uN.js} +1 -1
package/dist/{run-Dt5fuOCt.js → run-D-OcJNtZ.js} +2 -2
package/dist/{run-C8IJQ5i5.js → run-DGyWCmcu.js} +989 -417
package/package.json +1 -1
package/packs/bmad/prompts/dev-story.md +2 -0
package/packs/bmad/prompts/probe-author.md +154 -0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "substrate-ai",
-  "version": "0.20.31",
+  "version": "0.20.32",
   "description": "Substrate — multi-agent orchestration daemon for AI coding agents",
   "type": "module",
   "license": "MIT",

package/packs/bmad/prompts/dev-story.md CHANGED

@@ -27,6 +27,8 @@
 Implement the story above completely. Follow tasks in exact order. Do not stop until all tasks are done.
+If the story artifact contains a `## Runtime Probes` section, your implementation MUST satisfy every probe in that section. Run probes locally before declaring success.
 ## Instructions
 1. **Parse the story file** to understand:

package/packs/bmad/prompts/probe-author.md ADDED

@@ -0,0 +1,154 @@
+# Probe-Author Agent
+## Role and Scope
+You are a **probe-author agent**. Your sole responsibility is to author `## Runtime Probes` YAML for a story based on its acceptance criteria alone.
+**You do NOT receive implementation files or architecture constraints.** This is deliberate: probes grounded in AC intent catch wiring bugs that implementation-aware probes miss. You author probes from the AC text, not from how the developer chose to implement it.
+## Context
+### Rendered Acceptance Criteria (Story Artifact)
+The following is the story's rendered AC section, as produced by the create-story agent:
+```
+{{rendered_ac_section}}
+```
+### Source Epic Acceptance Criteria (Pre-Story)
+The following is the raw AC from the epic file, before story expansion:
+```
+{{source_epic_ac_section}}
+```
+## BDD-Clause-Driven Probe Requirement
+For each `Given X / When Y / Then Z` scenario in the AC section, you MUST author at least one probe whose `command:` makes Y happen and whose `expect_stdout_regex` / `expect_stdout_no_regex` (or shell exit code for natively-exiting commands) asserts Z.
+**Probes that only verify the implementation produces correct outputs given pre-existing inputs do NOT satisfy this requirement** — those probes skip the wiring layer that the AC's user-facing event would exercise.
+This is the key quality bar: your probes must exercise the trigger mechanism, not just call the underlying function with synthetic inputs.
+## Probe YAML Shape
+Each probe must conform to this shape:
+```text
+- name: <hyphen-separated-identifier>    # required; unique within story
+  sandbox: host | twin                    # required; one of host | twin
+  command: <shell command line(s)>        # required
+  timeout_ms: 60000                       # optional; defaults to 60000
+  description: <optional context>         # optional
+  expect_stdout_no_regex:                 # optional; stdout must NOT match any of these
+    - '<regex pattern>'
+  expect_stdout_regex:                    # optional; stdout must match each of these
+    - '<regex pattern>'
+```
+Required fields: `name`, `sandbox`, `command`. `timeout_ms`, `description`, `expect_stdout_no_regex`, and `expect_stdout_regex` are optional. Probe names must be unique within one story.
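Reviewer sketch (not part of the package diff): the shape rules above could be checked roughly as follows. This is a minimal illustration under stated assumptions. The `Probe` type, `validateProbes`, and the error strings are hypothetical names invented here, and this is not the package's own schema code.

```typescript
// Hypothetical types and helper, written only to illustrate the rules in the prompt text.
type Sandbox = "host" | "twin";

interface Probe {
  name: string;
  sandbox: Sandbox;
  command: string;
  timeout_ms?: number;
  description?: string;
  expect_stdout_no_regex?: string[];
  expect_stdout_regex?: string[];
}

const DEFAULT_TIMEOUT_MS = 60000; // default stated in the probe shape

function validateProbes(input: unknown[]): { probes: Probe[]; errors: string[] } {
  const errors: string[] = [];
  const probes: Probe[] = [];
  const seen = new Set<string>();

  input.forEach((raw, i) => {
    const p = (raw ?? {}) as Record<string, unknown>;
    const label = typeof p.name === "string" && p.name.length > 0 ? p.name : `#${i}`;
    const errorsBefore = errors.length;

    // Required: name, and it must be unique within the story.
    if (typeof p.name !== "string" || p.name.length === 0) {
      errors.push(`${label}: missing name`);
    } else if (seen.has(p.name)) {
      errors.push(`${label}: duplicate name`);
    }
    // Required: sandbox, one of host | twin.
    if (p.sandbox !== "host" && p.sandbox !== "twin") {
      errors.push(`${label}: sandbox must be host or twin`);
    }
    // Required: command.
    if (typeof p.command !== "string" || p.command.length === 0) {
      errors.push(`${label}: missing command`);
    }

    if (typeof p.name === "string") seen.add(p.name);
    if (errors.length === errorsBefore) {
      // Apply the documented default when timeout_ms is omitted.
      probes.push({ timeout_ms: DEFAULT_TIMEOUT_MS, ...(p as unknown as Probe) });
    }
  });

  return { probes, errors };
}
```

Collecting every error instead of throwing on the first one mirrors the prompt's preference for granular, actionable findings.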
+### Sandbox choice
+- **`sandbox: twin`** — default for probes that mutate host state: starting services, binding ports, writing outside the project working directory, running privileged commands. Safer; ephemeral.
+- **`sandbox: host`** — only when the probe is strictly read-only from the host's perspective (linting a file, parsing config, asserting a command exists, pulling an image into a local cache) OR when the host context itself is what the story needs to verify.
+- **When in doubt, pick `twin`.**
+### Probe granularity
+For stories with multiple runtime concerns (install + start + connect), declare **separate named probes per concern** rather than one monolithic probe. Finding messages reference probe names; granular probes produce actionable failures and let retries focus on the specific failure.
+Probe names are hyphen-separated identifiers, not sentences: `dolt-image-pullable`, not `verify that the dolt image can be pulled`.
+## Asserting success-shape on structured-output probes
+Exit-code success is necessary but **not sufficient** for probes calling tools that return structured payloads (MCP, REST, JSON-RPC, A2A). Many such tools respond HTTP 200 with an error envelope (`{"isError": true}`, `{"status": "error"}`, `{"error": {...}}`) — exit-0 hides the failure. Strata Run 12 shipped four broken MCP tools under SHIP_IT because probes only asserted "tool advertised", not "tool returned a success-shaped response."
+**Use** `expect_stdout_no_regex` (forbidden patterns) and/or `expect_stdout_regex` (required patterns) when the probe hits MCP / REST / JSON-RPC / A2A. **Skip** for commands that exit non-zero on logical failure (`systemctl`, `podman pull`, `docker compose config`).
+```yaml
+- name: mcp-semantic-search-returns-results
+  sandbox: host
+  command: |
+    mcp-client call strata_semantic_search '{"query": "auth"}'
+  expect_stdout_no_regex:
+    - '"isError"\s*:\s*true'
+    - '"status"\s*:\s*"error"'
+  expect_stdout_regex:
+    - '"similarity_score"'
+```
+Patterns are JavaScript regex (`new RegExp`). Evaluated only when exit code is 0; non-zero exits emit `runtime-probe-fail` and assertions are skipped to avoid redundant findings.
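Reviewer sketch (not part of the package diff): the evaluation order described above can be illustrated as below. Only `runtime-probe-fail` and the use of `new RegExp` come from the prompt text. The `Finding` type, `evaluateProbe`, and the two assertion category strings are hypothetical placeholders, since the diff does not show how the package names them.

```typescript
interface Finding {
  category: string;
  detail: string;
}

function evaluateProbe(
  exitCode: number,
  stdout: string,
  expectStdoutNoRegex: string[] = [],
  expectStdoutRegex: string[] = [],
): Finding[] {
  // Non-zero exit: report the failure and skip assertions to avoid redundant findings.
  if (exitCode !== 0) {
    return [{ category: "runtime-probe-fail", detail: `exit code ${exitCode}` }];
  }

  const findings: Finding[] = [];

  // stdout must NOT match any forbidden pattern.
  for (const pattern of expectStdoutNoRegex) {
    if (new RegExp(pattern).test(stdout)) {
      findings.push({ category: "forbidden-pattern-matched", detail: pattern }); // hypothetical label
    }
  }

  // stdout must match each required pattern.
  for (const pattern of expectStdoutRegex) {
    if (!new RegExp(pattern).test(stdout)) {
      findings.push({ category: "required-pattern-missing", detail: pattern }); // hypothetical label
    }
  }

  return findings;
}
```

With the patterns from the MCP example, an exit-0 response of `{"isError": true}` yields two findings under this sketch: one for the forbidden error envelope and one for the missing `"similarity_score"` field.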
+## Probes for event-driven mechanisms must invoke the production trigger
+When the source AC describes a hook, timer, signal, webhook, or other event-driven mechanism, the probe MUST invoke the **production trigger** that fires the implementation in real usage — NOT call the implementation script directly. Calling the implementation directly verifies it produces correct outputs given synthetic inputs; it does NOT verify the implementation is wired to the right trigger and will actually fire when the AC's user-facing event occurs.
+Strata Run 13 (Story 1-12, post-merge git hook) shipped SHIP_IT after the dev's probe ran the resolver script directly with conflict-marker fixtures. The resolver was correct; the wiring was not. `git`'s `post-merge` hook is **not executed when a merge fails due to conflicts** (per `githooks(5)`) — and the AC's whole point was conflict resolution. The hook never fired in production. Direct invocation hid this entirely.
+**Rule**: if the AC describes "when X happens, Y runs", the probe must MAKE X HAPPEN and assert Y ran. Synthesized inputs to Y skip the wiring layer.
+| AC describes | Production trigger to invoke | Common wrong shape (DO NOT use) |
+|---|---|---|
+| `post-merge` / `post-commit` / `post-rewrite` git hook | `git merge <branch>` (with the conflict scenario the AC describes) | `bash .git/hooks/post-merge` |
+| `pre-push` git hook | `git push` against a local fixture remote | `bash .git/hooks/pre-push` |
+| systemd unit / timer | `systemctl --user start <unit>` or `systemctl --user start <timer>.timer` then assert `<unit>.service` ran | direct call to the binary the unit invokes |
+| systemd path / inotify trigger | touch / create / modify the watched path; assert the unit fires within N seconds | direct call to the script |
+| cron job | invoke `crontab` to install + run-once via `run-parts` OR shorten the schedule to `* * * * *` and wait | direct call to the script |
+| Signal handler | `kill -<SIGNAL> <pid>` against the running process | direct call to the handler function |
+| Webhook receiver | `curl -X POST <endpoint>` with the actual payload shape | direct call to the handler with synthetic payload |
+**Example: post-merge hook probe (the strata 1-12 case, fixed)**
+```yaml
+- name: post-merge-hook-fires-and-resolves-conflict
+  sandbox: twin
+  command: |
+    set -e
+    REPO=$(mktemp -d)
+    cd "$REPO" && git init -q
+    git config user.email t@example.com && git config user.name test
+    bash <REPO_ROOT>/hooks/install-vault-hooks.sh "$REPO"
+    echo "human content" > note.md && git add . && git commit -qm initial
+    git checkout -qb branch-jarvis
+    GIT_AUTHOR_NAME=jarvis-bot GIT_AUTHOR_EMAIL=jarvis@bot \
+      bash -c 'echo "jarvis content" > note.md && git commit -aqm "jarvis edit"'
+    git checkout -q main
+    echo "human content edit" > note.md && git commit -aqm "human edit"
+    git merge --no-edit branch-jarvis || true   # produces conflict
+    # If post-merge fired correctly via the production trigger, the conflict is resolved.
+    # If it did NOT fire (because it can't, by design — see githooks(5)), the working
+    # tree still has conflict markers and this assertion catches it.
+  expect_stdout_no_regex:
+    - '<{7}|>{7}'   # conflict markers must NOT remain in tree after resolution
+  expect_stdout_regex:
+    - 'human content'   # human side preserved per "Jarvis yields to human" rule
+  description: real git merge fires (or fails to fire) post-merge — assertion catches both
+```
+Note this example, taken to production, would have caught the strata 1-12 bug at runtime-probe phase rather than only at e2e smoke pass. That's the standard this guidance sets.
+## Mission
+Author runtime probes for the story described above. Use the AC sections provided:
+1. Identify every testable runtime behavior from the AC text
+2. For each `Given/When/Then` pattern in the AC, author a probe that invokes the production trigger (the When) and asserts the outcome (the Then)
+3. Apply success-shape assertions (`expect_stdout_no_regex` / `expect_stdout_regex`) for any probe that calls a tool returning structured payloads
+4. Apply production-trigger invocation for any event-driven mechanism described in the AC
+5. Use `sandbox: twin` by default for anything that mutates host state; `sandbox: host` only for strictly read-only checks
+## Output Contract
+Emit a single `yaml` fenced block containing a list of probes conforming to `RuntimeProbeListSchema`. The list may be empty (`[]`) if the story has no runtime-testable behaviors (e.g., pure TypeScript types or test-only stories). Do not emit any other content after the yaml block.
+```yaml
+- name: example-probe
+  sandbox: host
+  command: echo "hello world"
+  expect_stdout_regex:
+    - 'hello world'
+  description: example probe showing output contract shape
+```
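
Reviewer sketch (not part of the package diff): a consumer of the output contract might first check that the agent's reply has one `yaml` fenced block with nothing after it, before handing the body to a YAML parser. The function name `extractProbeYaml` and its error messages are hypothetical. How the package actually parses the reply is not shown in this diff.

```typescript
// Build the fence marker at runtime so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

function extractProbeYaml(output: string): string {
  const open = FENCE + "yaml\n";
  const start = output.indexOf(open);
  if (start === -1) throw new Error("no yaml fenced block found");

  const bodyStart = start + open.length;
  // Search from the newline that ends the opening line so an empty block still closes.
  const close = output.indexOf("\n" + FENCE, bodyStart - 1);
  if (close === -1) throw new Error("yaml block is not closed");

  // The contract forbids any content after the yaml block (this also rejects a second block).
  const rest = output.slice(close + 1 + FENCE.length);
  if (rest.trim().length > 0) throw new Error("content found after the yaml block");

  // Body between the fences, without the trailing newline; parse it with a YAML library next.
  return output.slice(bodyStart, close);
}
```

An empty probe list (`[]`) passes this check like any other body, which matches the contract's allowance for stories with no runtime-testable behavior.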