npm - aiwg - Versions diffs - 2026.5.5 → 2026.5.7 - Mend

aiwg 2026.5.5 → 2026.5.7

Files changed (40)

package/CLAUDE.md CHANGED

@@ -663,10 +663,10 @@ Before pushing a version tag:
    npm run uat:serve-live
    ```
    Tests skip cleanly when `AIWG_SANDBOX_ENDPOINT` is unset or unreachable, so this is a safe no-op gate. Run before any release that touches `src/serve/`, the executor contract, or the MC ↔ serve bridge.
-5. **Commit and tag** - `git tag -m "vX.X.X" vX.X.X`
-6. **Push tag to Gitea** - `git push origin main --tags` (automatically creates Gitea Release)
-7. **Optionally mirror to GitHub** - `git push github main --tags`
-8. **Update/Create GitHub Release manually** - via `gh release create|edit`
+5. **Commit the release prep** — `git commit` the package.json/CHANGELOG/announcement bump. Do NOT use plain `git tag -a` or `git tag -s` (they sign with `user.signingkey`, which is typically the maintainer's *personal commit-signing key* — wrong key for tags; the supply-chain gate `tools/ci/verify-signed-tag.sh` will reject in CI).
+6. **Cut the tag via the wrapper** — `tools/release/cut-tag.sh <X.Y.Z>`. Runs 10 pre-tag checks (CalVer shape, `package.json` + `marketplace.json` lockstep, CHANGELOG entry, announcement file present, release-signing key both present locally AND published in `.gitea/keys/maintainers.asc`) and signs with `-u <RELEASE_KEY_FINGERPRINT>` (default: `FE9272F0BC5781E1DE77FAAA719AB63879E84CE8`, the `AIWG Release Signing <release@aiwg.io>` key per the two-key model from commit `a13dabc5`). See the v2026.5.5 incident note in `docs/contributing/versioning.md` for what happens when this is skipped.
+7. **Push tag to Gitea** — `git push origin main --tags`. Triggers `gitea-release.yml` + `npm-publish.yml` (both gated on signed-tag verify).
+8. **Mirror to GitHub** — `git push github main --tags`. Triggers `github-mirror.yml` which creates the GitHub Release using `docs/releases/v<version>-announcement.md` as the body. **No manual `gh release create` needed for stable releases** — the workflow handles it. Pre-release tags (`-rc.*`, `-alpha.*`, `-beta.*`, `-nightly.*`) skip GitHub-Release creation by design.
 ### Version Format
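
Two of the pre-tag checks named in step 6 (CalVer shape, and `package.json` + `marketplace.json` lockstep) and the pre-release rule from step 8 can be illustrated with a minimal sketch. This is not the contents of `tools/release/cut-tag.sh`, which this diff does not show. The `YYYY.M.PATCH` pattern is an assumption inferred from versions such as 2026.5.7, and the function names `pre_tag_problems` and `skips_github_release` are hypothetical.

```python
import re

# Assumed CalVer shape (YYYY.M.PATCH), inferred from versions such as 2026.5.7.
# The optional suffix covers the pre-release tags listed in step 8.
CALVER_RE = re.compile(r"^\d{4}\.\d{1,2}\.\d+(-(rc|alpha|beta|nightly)\..+)?$")
PRERELEASE_RE = re.compile(r"-(rc|alpha|beta|nightly)\.")


def pre_tag_problems(version, package_version, marketplace_version):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    if not CALVER_RE.match(version):
        problems.append(f"{version!r} is not CalVer-shaped")
    if package_version != version:
        problems.append(f"package.json has {package_version!r}, expected {version!r}")
    if marketplace_version != version:
        problems.append(f"marketplace.json has {marketplace_version!r}, expected {version!r}")
    return problems


def skips_github_release(tag):
    """True for pre-release tags, which skip GitHub Release creation by design."""
    return PRERELEASE_RE.search(tag) is not None
```

Returning a list of problems rather than failing on the first one lets a wrapper report every mismatch in a single run.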

package/agentic/code/addons/agent-loop/agents/ralph-verifier.md CHANGED

@@ -14,6 +14,12 @@ allowed-tools: Bash, Read, Glob
 You verify completion criteria for agent loops - determining if a task iteration succeeded by running verification commands and analyzing their output.
+## Companion skill
+When the loop is started without explicit `--completion`, the criterion you verify is produced by the `infer-completion-criteria` skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). It derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts.
+You do not run that skill yourself — the loop orchestrator (`ralph-loop` agent or external launcher) calls it during initialization. Your job is to take whatever criterion is in the loop state and verify it. The skill writes its rationale into `.aiwg/ralph/<loop-id>/progress.md` (or `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` for external loops); when reporting verification results, you may reference that rationale so the user sees the full evidence chain.
 ## Capabilities
 ### Verification Methods

package/agentic/code/addons/agent-loop/manifest.json CHANGED

@@ -50,7 +50,8 @@
     "execute-feedback",
     "reflection-injection",
     "auto-test-execution",
-    "mission-control"
+    "mission-control",
+    "infer-completion-criteria"
   ],
   "agents": [
     "ralph-loop",

package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 namespace: aiwg
 name: agent-loop
 description: Detect requests for iterative autonomous agent loops and route to the appropriate loop executor
-version: 3.0.0
+version: 3.1.0
 platforms: [all]
 ---
@@ -22,7 +22,7 @@ platforms: [all]
 # Agent Loop Skill
-You detect when users want iterative autonomous task execution and route to the appropriate loop command.
+You detect when users want iterative autonomous task execution and route to an internal, in-session loop by default. External daemon loops are opt-in and require an explicit request.
 ## Loop Taxonomy
@@ -30,10 +30,33 @@ This skill is the **detection and routing layer** for autonomous agent loops —
 | Loop Type | Implementation | Description |
 |-----------|---------------|-------------|
-| **Al** | `ralph` command | Basic iterate-until-complete with learning extraction |
+| **Internal Agent Loop** | current assistant session / internal loop | Default visible iterate-until-complete workflow in the active session |
+| **Al** | internal `ralph` concept | Basic iterate-until-complete when named without external qualifiers |
+| **External Agent Loop** | `agent-loop-ext` / `ralph-external` daemon | Explicitly requested background, detached, crash-resilient, or resumable work |
 | *(future)* | — | Reflection loops, critic-actor loops, branching loops |
-Currently routes all detected requests to the iterative loop executor. As new loop types are added, this skill will route based on task characteristics.
+Generic loop requests route to the internal in-session loop. As new loop types are added, this skill will route based on task characteristics.
+## Routing Policy
+### Default: Internal/In-Session Loop
+Use the internal loop when the user says `agent-loop`, `al`, `ralph`, `loop`, `iterate`, `keep trying`, `fix until green`, `address issues`, `handle all listed issues`, or supplies an iteration bound such as `--iterations 200` without explicit external wording.
+Run the work visibly in the current assistant session:
+1. Establish completion criteria.
+2. Act on the next bounded slice of work.
+3. Verify with the relevant checks.
+4. Adapt and continue until completion, blocker, or the requested iteration cap.
+Do not launch detached processes, background sessions, or the Ralph external daemon for generic loop requests.
+### Explicit External Route
+Route to `agent-loop-ext` / `ralph-external` only when the user explicitly asks for external execution, background execution, a daemon, detached operation, crash resilience, session survival, resume-later behavior, unattended long-running work, or when they name `agent-loop-ext`, `ralph-external`, or the Ralph daemon directly.
+If the user says `ralph` without external/background/daemon qualifiers, treat it as the internal loop concept.
 ## Triggers
@@ -50,16 +73,16 @@ Alternate expressions and non-obvious activations (primary phrases are matched a
 | Pattern | Example | Action |
 |---------|---------|--------|
-| `ralph this: X` | "ralph this: fix all lint errors" | Extract task, infer completion |
-| `ralph: X` | "ralph: migrate to TypeScript" | Extract task, infer completion |
-| `ralph it` | "ralph it" (after task description) | Use conversation context |
-| `keep trying until X` | "keep trying until tests pass" | Task = current context, completion = X |
-| `loop until X` | "loop until coverage >80%" | Task = improve coverage, completion = X |
-| `iterate until X` | "iterate until no errors" | Task = fix errors, completion = X |
-| `run until passes` | "run until passes" | Infer test command |
-| `fix until green` | "fix until green" | Task = fix tests, completion = tests pass |
-| `keep fixing until X` | "keep fixing until lint is clean" | Task = fix lint, completion = X |
-| `al: X` | "al: fix all lint errors" | Shortcut for agent-loop, extract task |
+| `ralph this: X` | "ralph this: fix all lint errors" | Internal loop: extract task, infer completion |
+| `ralph: X` | "ralph: migrate to TypeScript" | Internal loop: extract task, infer completion |
+| `ralph it` | "ralph it" (after task description) | Internal loop: use conversation context |
+| `keep trying until X` | "keep trying until tests pass" | Internal loop: task = current context, completion = X |
+| `loop until X` | "loop until coverage >80%" | Internal loop: task = improve coverage, completion = X |
+| `iterate until X` | "iterate until no errors" | Internal loop: task = fix errors, completion = X |
+| `run until passes` | "run until passes" | Internal loop: infer test command |
+| `fix until green` | "fix until green" | Internal loop: task = fix tests, completion = tests pass |
+| `keep fixing until X` | "keep fixing until lint is clean" | Internal loop: task = fix lint, completion = X |
+| `al: X` | "al: fix all lint errors" | Internal loop shortcut: extract task |
 ## Extraction Logic
@@ -74,7 +97,19 @@ Alternate expressions and non-obvious activations (primary phrases are matched a
 ### Completion Inference
-When user doesn't specify explicit verification:
+When the user doesn't specify explicit verification, delegate to the **`infer-completion-criteria`** skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). That skill runs a deterministic 5-layer pipeline:
+1. **Task verb** → criterion class (test-pass, type-clean, regression-gate, coverage, lint-clean, build-pass, implement-feature)
+2. **Project context files** (CLAUDE.md / AGENTS.md / AIWG.md) → canonical commands from the Development section
+3. **Package manifests** (`package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.) → discovered scripts
+4. **CI configuration** (`.github/workflows/`, `.gitea/workflows/`, GitLab/CircleCI/Jenkins) → team's actual "passes" definition
+5. **`.aiwg/` artifacts** (test-strategy, related use cases by ID match, prior progress files) → project-specific gates
+Synthesis is validated against the `vague-discretion` rule and emits a structured YAML proposal with criterion, verification command, rationale chain, confidence level, and alternatives considered.
+**Use the inline table below ONLY as a last-resort fallback** when the inference skill is unavailable (degraded environment, missing skill deployment). It is intentionally narrow — JavaScript/Node-centric — and represents prior state before `infer-completion-criteria` was added.
+Legacy fallback table:
 | Task Pattern | Inferred Completion |
 |--------------|---------------------|
@@ -86,6 +121,8 @@ When user doesn't specify explicit verification:
 | "migrate to ESM" | "node runs without errors" |
 | "refactor X" | "npm test passes" (preserve behavior) |
+When the inference skill IS available, prefer it. The skill handles multi-language projects, monorepos, CI-defined gates, use-case acceptance criteria, and the refusal case (truly vague tasks like "make it better" that have no measurable criterion).
 ### Examples
 **User**: "ralph this: migrate all files in lib/ to ESM"
@@ -93,7 +130,7 @@ When user doesn't specify explicit verification:
 - Task: "migrate all files in lib/ to ESM"
 - Completion (inferred): "node --experimental-vm-modules lib/index.js runs without errors"
-**Action**: Invoke `/ralph "migrate all files in lib/ to ESM" --completion "node --experimental-vm-modules lib/index.js succeeds"`
+**Action**: Run an internal loop in the current session for `migrate all files in lib/ to ESM` until the inferred completion command succeeds
 ---
@@ -102,7 +139,7 @@ When user doesn't specify explicit verification:
 - Task: "fix failing tests" (from context or implied)
 - Completion: "npm test passes with 0 failures"
-**Action**: Invoke `/ralph "fix failing tests" --completion "npm test passes"`
+**Action**: Run an internal loop in the current session until `npm test` passes
 ---
@@ -111,7 +148,7 @@ When user doesn't specify explicit verification:
 - Task: (from conversation context about auth validation)
 - Completion: (infer based on task type)
-**Action**: Invoke `/ralph "{context-based task}" --completion "{inferred criteria}"`
+**Action**: Run an internal loop in the current session using the context-based task and inferred criteria
 ---
@@ -120,7 +157,16 @@ When user doesn't specify explicit verification:
 - Task: "add tests to improve coverage"
 - Completion: "npm run coverage shows >80%"
-**Action**: Invoke `/ralph "add tests to improve coverage" --completion "coverage report shows >80%"`
+**Action**: Run an internal loop in the current session until the coverage report shows more than 80%
+---
+**User**: "Run this in the background with crash recovery and let me attach later"
+**Extraction**:
+- Task: (from conversation context)
+- Completion: (infer based on task type)
+**Action**: Route to `agent-loop-ext` / `ralph-external` because the user explicitly requested background crash-resilient execution
 ## Clarification Prompts
@@ -146,9 +192,9 @@ What task should I repeat until success?
 What command tells me when it's done?
 ```
-## Multi-Loop Support
+## External/Multi-Loop Support
-**Version 2.0** adds concurrent loop execution with registry tracking.
+**Version 2.0** added concurrent loop execution with registry tracking. This applies to explicit external daemon loops, not ordinary internal `agent-loop` requests.
 ### Concurrency Limits
@@ -164,7 +210,7 @@ All loops have unique identifiers:
 ### --loop-id Parameter
-Users can optionally specify a loop ID:
+Users can optionally specify a loop ID for external daemon loops:
 ```
 /ralph "fix tests" --completion "npm test passes" --loop-id ralph-my-fixes-12345678
@@ -174,7 +220,7 @@ If not provided, ID is auto-generated from task description.
 ### Registry Tracking
-All active loops tracked in `.aiwg/ralph/registry.json`:
+External active loops are tracked in `.aiwg/ralph/registry.json`:
 ```json
 {
@@ -195,7 +241,7 @@ All active loops tracked in `.aiwg/ralph/registry.json`:
 ### Concurrent Loop Behavior
-**When starting a new loop**:
+**When starting a new external loop**:
 1. Check registry: `active_loops.length < 4`
 2. If at limit: Show error with active loop list
@@ -259,22 +305,33 @@ Multi-loop structure per loop:
 ## Invocation
-Once task and completion are extracted/confirmed, invoke the loop executor skill with:
+Once task and completion are extracted/confirmed, use the default internal route unless explicit external wording is present.
+For the default internal route:
+- **Task**: The extracted task description
+- **Completion criteria**: The verification command or condition
+- **Max iterations**: If user mentioned iteration limit
+- **Timeout**: If user mentioned time limit
+- **Operation**: Iterate in the current assistant session with visible progress and verification after meaningful changes
+For explicit external daemon routes:
 - **Task**: The extracted task description
 - **Completion criteria**: The verification command or condition
 - **Max iterations**: If user mentioned iteration limit
 - **Timeout**: If user mentioned time limit
 - **Loop ID**: If user wants a custom loop identifier
+- **Operation**: Route through `agent-loop-ext` / `ralph-external`, then surface status, log, attach, and abort commands
-### Multi-Loop Examples
+### External Multi-Loop Examples
 **Parallel bug fixes**:
 ```
-User: "ralph: fix TypeScript errors in src/"
+User: "run an external ralph loop to fix TypeScript errors in src/"
 → Loop 1: ralph-fix-ts-errors-a1b2c3d4
-User: "also ralph: add missing tests in lib/"
+User: "also run an external ralph loop to add missing tests in lib/"
 → Loop 2: ralph-add-tests-b2c3d4e5
 Both running in parallel until completion criteria met.
@@ -282,7 +339,7 @@ Both running in parallel until completion criteria met.
 **Sequential with manual abort**:
 ```
-User: "ralph: refactor entire auth module"
+User: "run an external ralph loop to refactor the entire auth module"
 → Loop 1: ralph-refactor-auth-c3d4e5f6 (running)
 User: "actually, abort that and just fix the login bug"
@@ -296,12 +353,16 @@ User: "actually, abort that and just fix the login bug"
 - The skill is **exclusive** - once triggered, handle the entire request
 - Always confirm extraction before invoking if there's ambiguity
 - Prefer inferring completion criteria over asking (ask only if truly unclear)
-- Check registry capacity before starting new loops
-- Show helpful errors with active loop list when at capacity
+- Default ambiguous requests to the internal in-session loop
+- Do not start `ralph-external`, detached daemons, or background `aiwg ralph` processes unless the user explicitly asks for them
+- Check registry capacity before starting explicit external loops
+- Show helpful errors with active loop list when explicit external loops are at capacity
 ## Related
-- `ralph` skill - the iterative loop executor implementation
+- `infer-completion-criteria` skill - derives measurable `--completion` from project state when the user doesn't supply one
+- `ralph` skill - legacy name for the iterative loop concept; `agent-loop` is canonical and defaults to in-session execution
+- `agent-loop-ext` skill - crash-resilient external loop with state persistence
 - `ralph-status` skill - check loop progress
 - `ralph-resume` skill - continue interrupted loops
 - `ralph-abort` skill - abort active loops
@@ -311,6 +372,7 @@ User: "actually, abort that and just fix the login bug"
 ## Version History
+- **3.1.0**: Defaulted generic `agent-loop` routing to internal in-session loops; require explicit wording for external daemon loops
 - **3.0.0**: Renamed from `ralph-loop` to `agent-loop`; added loop taxonomy (Issue #558)
 - **2.0.0**: Added multi-loop support with registry tracking (Issue #268)
 - **1.0.0**: Initial single-loop implementation
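
The routing policy introduced in 3.1.0 (internal by default, external only on explicit wording) can be sketched as a small classifier. The skill itself is a prompt, not code, so this only illustrates the decision rule. The marker list paraphrases the "Explicit External Route" paragraph, and the names `EXTERNAL_MARKERS` and `route` are hypothetical.

```python
# Phrases that count as explicit external wording (paraphrased from the
# "Explicit External Route" paragraph; the exact list is an assumption).
EXTERNAL_MARKERS = (
    "agent-loop-ext",
    "ralph-external",
    "external",
    "background",
    "daemon",
    "detached",
    "crash recovery",
    "crash-resilient",
    "crash resilience",
    "resume later",
    "unattended",
)


def route(request: str) -> str:
    """Return "external" only when the request names an external marker."""
    text = request.lower()
    if any(marker in text for marker in EXTERNAL_MARKERS):
        return "external"
    # Generic wording ("ralph", "al", "loop", "fix until green", an
    # iteration cap) stays in the current session.
    return "internal"
```

Substring matching is deliberately crude here: a task description that happens to contain a word such as "background" would be misrouted, so a real router would need to separate the routing request from the task text.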

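The capacity rule for external loops (`active_loops.length < 4`, with an error listing the active loops at the limit) can be sketched as follows. The registry schema is truncated in this diff, so the `loop_id` field on each entry is a hypothetical name, as are `MAX_ACTIVE_EXTERNAL_LOOPS` and `capacity_error`.

```python
from typing import Optional

# Limit taken from the "active_loops.length < 4" check in the skill text.
MAX_ACTIVE_EXTERNAL_LOOPS = 4


def capacity_error(registry: dict) -> Optional[str]:
    """Return None when a new external loop may start, else an error message."""
    active = registry.get("active_loops", [])
    if len(active) < MAX_ACTIVE_EXTERNAL_LOOPS:
        return None
    # "loop_id" is an assumed field name; the real entry layout is not shown.
    ids = ", ".join(str(loop.get("loop_id", "?")) for loop in active)
    return f"At capacity ({len(active)}/{MAX_ACTIVE_EXTERNAL_LOOPS}). Active loops: {ids}"
```

Under the 3.1.0 routing change this check would apply only to explicit external loops, not to ordinary in-session requests.
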
package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md CHANGED

@@ -5,7 +5,7 @@ legacyName: ralph-external
 platforms: [all]
 description: Crash-resilient external agent loop with state persistence and CI/CD integration
 commandHint:
-  argumentHint: "\"<objective>\" --completion \"<criteria>\" [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet]"
+  argumentHint: "\"<objective>\" [--completion \"<criteria>\"] [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet] [--auto-criteria | --no-infer-completion]"
   allowedTools: Bash, Read, Write
   model: sonnet
   category: automation
@@ -60,7 +60,7 @@ Users may say:
 ### Objective (required)
 The task the loop should accomplish. Passed as the first positional argument.
-### --completion (required)
+### --completion (optional — inferred when omitted)
 Success criteria as a verifiable command. The loop exits when this command returns exit code 0.
 **Good examples**:
@@ -68,6 +68,14 @@ Success criteria as a verifiable command. The loop exits when this command retur
 - `--completion "npx tsc --noEmit exits with code 0"`
 - `--completion "coverage report shows >80%"`
+**When omitted**: the launcher invokes the `infer-completion-criteria` skill before the external loop starts. The skill derives a measurable criterion from project state (CLAUDE.md / AGENTS.md / AIWG.md, package manifests, CI configuration, `.aiwg/` artifacts) and emits a structured proposal with rationale. The proposal is written to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` and used as the loop's gate.
+Because `agent-loop-ext` runs externally (potentially headless / in CI), the confirmation flow is:
+- Interactive session (TTY attached): show proposal, accept `Y / n / edit` like the in-session `ralph` skill
+- Non-interactive / `--auto-criteria` / CI environment: use the inferred criterion if confidence is `high`, otherwise fail fast and print the proposal as a diagnostic so the user can re-launch with `--completion` explicitly
+Pass `--no-infer-completion` to require explicit `--completion` and fail before launch if missing. See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`.
 ### --max-iterations (default: 10)
 Maximum iterations before the loop halts and saves state for manual review.
@@ -90,8 +98,12 @@ Suppress verbose progress output. Completion banner is always shown.
 When triggered:
-1. Validate that `--completion` criteria are specified and verifiable
-2. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
+1. **Resolve completion criteria**:
+   - If `--completion` is provided → use it directly
+   - Else if `--no-infer-completion` is set → fail fast before launch with a helpful error
+   - Else → invoke `infer-completion-criteria` skill, persist proposal to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml`, confirm or auto-adopt per session-interactivity rules above
+2. Validate the resolved criterion is verifiable (can be checked via command)
+3. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
 3. Generate a unique `loop-id` (8-character hex) and create the loop state file at `.aiwg/ralph-external/loops/<loop-id>.json`
 4. Write the initial state: `{ objective, completionCriteria, maxIterations, timeout, provider, status: "pending", iteration: 0 }`
 5. If `--branch` is specified, create the git branch now
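
The "Resolve completion criteria" step and the confirmation flow described above can be sketched as one decision function. This is an illustration under stated assumptions, not the launcher's code: the function name `resolve_completion`, its parameters, and the `("use" | "confirm", criterion)` return shape are all hypothetical, and the inference itself is represented only by its outputs (`inferred`, `confidence`).

```python
from typing import Optional, Tuple


def resolve_completion(
    completion: Optional[str],
    no_infer: bool,
    interactive: bool,
    inferred: Optional[str] = None,
    confidence: str = "low",
) -> Tuple[str, str]:
    """Decide which criterion gates the loop and whether the user must confirm it."""
    if completion:
        # An explicit --completion always wins.
        return ("use", completion)
    if no_infer:
        # --no-infer-completion: fail before launch.
        raise ValueError("--completion is required when --no-infer-completion is set")
    if inferred is None:
        # Refusal case: the task had no measurable criterion.
        raise ValueError("no completion criterion could be inferred; pass --completion")
    if interactive:
        # TTY attached: show the proposal and let the user accept or edit it.
        return ("confirm", inferred)
    if confidence == "high":
        # Headless / CI: adopt only high-confidence proposals.
        return ("use", inferred)
    raise ValueError(f"inferred criterion has {confidence} confidence; pass --completion")
```

For example, a headless run with a `medium`-confidence proposal raises, matching the "fail fast and print the proposal" behavior, while the same proposal in an interactive session yields `("confirm", ...)`.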