npm - aiwg - Versions diffs - 2026.5.5 → 2026.5.6 - Mend

aiwg 2026.5.5 → 2026.5.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/CLAUDE.md CHANGED Viewed

@@ -663,10 +663,10 @@ Before pushing a version tag:
    npm run uat:serve-live
    ```
    Tests skip cleanly when `AIWG_SANDBOX_ENDPOINT` is unset or unreachable, so this is a safe no-op gate. Run before any release that touches `src/serve/`, the executor contract, or the MC ↔ serve bridge.
-5. **Commit and tag** - `git tag -m "vX.X.X" vX.X.X`
-6. **Push tag to Gitea** - `git push origin main --tags` (automatically creates Gitea Release)
-7. **Optionally mirror to GitHub** - `git push github main --tags`
-8. **Update/Create GitHub Release manually** - via `gh release create|edit`
+5. **Commit the release prep** — `git commit` the package.json/CHANGELOG/announcement bump. Do NOT use plain `git tag -a` or `git tag -s` (they sign with `user.signingkey`, which is typically the maintainer's *personal commit-signing key* — wrong key for tags; the supply-chain gate `tools/ci/verify-signed-tag.sh` will reject in CI).
+6. **Cut the tag via the wrapper** — `tools/release/cut-tag.sh <X.Y.Z>`. Runs 10 pre-tag checks (CalVer shape, `package.json` + `marketplace.json` lockstep, CHANGELOG entry, announcement file present, release-signing key both present locally AND published in `.gitea/keys/maintainers.asc`) and signs with `-u <RELEASE_KEY_FINGERPRINT>` (default: `FE9272F0BC5781E1DE77FAAA719AB63879E84CE8`, the `AIWG Release Signing <release@aiwg.io>` key per the two-key model from commit `a13dabc5`). See the v2026.5.5 incident note in `docs/contributing/versioning.md` for what happens when this is skipped.
+7. **Push tag to Gitea** — `git push origin main --tags`. Triggers `gitea-release.yml` + `npm-publish.yml` (both gated on signed-tag verify).
+8. **Mirror to GitHub** — `git push github main --tags`. Triggers `github-mirror.yml` which creates the GitHub Release using `docs/releases/v<version>-announcement.md` as the body. **No manual `gh release create` needed for stable releases** — the workflow handles it. Pre-release tags (`-rc.*`, `-alpha.*`, `-beta.*`, `-nightly.*`) skip GitHub-Release creation by design.
 ### Version Format

package/agentic/code/addons/agent-loop/agents/ralph-verifier.md CHANGED Viewed

@@ -14,6 +14,12 @@ allowed-tools: Bash, Read, Glob
 You verify completion criteria for agent loops - determining if a task iteration succeeded by running verification commands and analyzing their output.
+## Companion skill
+When the loop is started without explicit `--completion`, the criterion you verify is produced by the `infer-completion-criteria` skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). It derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts.
+You do not run that skill yourself — the loop orchestrator (`ralph-loop` agent or external launcher) calls it during initialization. Your job is to take whatever criterion is in the loop state and verify it. The skill writes its rationale into `.aiwg/ralph/<loop-id>/progress.md` (or `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` for external loops); when reporting verification results, you may reference that rationale so the user sees the full evidence chain.
 ## Capabilities
 ### Verification Methods

package/agentic/code/addons/agent-loop/manifest.json CHANGED Viewed

@@ -50,7 +50,8 @@
     "execute-feedback",
     "reflection-injection",
     "auto-test-execution",
-    "mission-control"
+    "mission-control",
+    "infer-completion-criteria"
   ],
   "agents": [
     "ralph-loop",

package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md CHANGED Viewed

@@ -74,7 +74,19 @@ Alternate expressions and non-obvious activations (primary phrases are matched a
 ### Completion Inference
-When user doesn't specify explicit verification:
+When the user doesn't specify explicit verification, delegate to the **`infer-completion-criteria`** skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). That skill runs a deterministic 5-layer pipeline:
+1. **Task verb** → criterion class (test-pass, type-clean, regression-gate, coverage, lint-clean, build-pass, implement-feature)
+2. **Project context files** (CLAUDE.md / AGENTS.md / AIWG.md) → canonical commands from the Development section
+3. **Package manifests** (`package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.) → discovered scripts
+4. **CI configuration** (`.github/workflows/`, `.gitea/workflows/`, GitLab/CircleCI/Jenkins) → team's actual "passes" definition
+5. **`.aiwg/` artifacts** (test-strategy, related use cases by ID match, prior progress files) → project-specific gates
+Synthesis is validated against the `vague-discretion` rule and emits a structured YAML proposal with criterion, verification command, rationale chain, confidence level, and alternatives considered.
+**Use the inline table below ONLY as a last-resort fallback** when the inference skill is unavailable (degraded environment, missing skill deployment). It is intentionally narrow — JavaScript/Node-centric — and represents prior state before `infer-completion-criteria` was added.
+Legacy fallback table:
 | Task Pattern | Inferred Completion |
 |--------------|---------------------|
@@ -86,6 +98,8 @@ When user doesn't specify explicit verification:
 | "migrate to ESM" | "node runs without errors" |
 | "refactor X" | "npm test passes" (preserve behavior) |
+When the inference skill IS available, prefer it. The skill handles multi-language projects, monorepos, CI-defined gates, use-case acceptance criteria, and the refusal case (truly vague tasks like "make it better" that have no measurable criterion).
 ### Examples
 **User**: "ralph this: migrate all files in lib/ to ESM"
@@ -301,7 +315,9 @@ User: "actually, abort that and just fix the login bug"
 ## Related
-- `ralph` skill - the iterative loop executor implementation
+- `infer-completion-criteria` skill - derives measurable `--completion` from project state when the user doesn't supply one
+- `ralph` skill - the iterative loop executor implementation (legacy name; `agent-loop` is canonical)
+- `agent-loop-ext` skill - crash-resilient external loop with state persistence
 - `ralph-status` skill - check loop progress
 - `ralph-resume` skill - continue interrupted loops
 - `ralph-abort` skill - abort active loops

package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md CHANGED Viewed

@@ -5,7 +5,7 @@ legacyName: ralph-external
 platforms: [all]
 description: Crash-resilient external agent loop with state persistence and CI/CD integration
 commandHint:
-  argumentHint: "\"<objective>\" --completion \"<criteria>\" [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet]"
+  argumentHint: "\"<objective>\" [--completion \"<criteria>\"] [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet] [--auto-criteria | --no-infer-completion]"
   allowedTools: Bash, Read, Write
   model: sonnet
   category: automation
@@ -60,7 +60,7 @@ Users may say:
 ### Objective (required)
 The task the loop should accomplish. Passed as the first positional argument.
-### --completion (required)
+### --completion (optional — inferred when omitted)
 Success criteria as a verifiable command. The loop exits when this command returns exit code 0.
 **Good examples**:
@@ -68,6 +68,14 @@ Success criteria as a verifiable command. The loop exits when this command retur
 - `--completion "npx tsc --noEmit exits with code 0"`
 - `--completion "coverage report shows >80%"`
+**When omitted**: the launcher invokes the `infer-completion-criteria` skill before the external loop starts. The skill derives a measurable criterion from project state (CLAUDE.md / AGENTS.md / AIWG.md, package manifests, CI configuration, `.aiwg/` artifacts) and emits a structured proposal with rationale. The proposal is written to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` and used as the loop's gate.
+Because `agent-loop-ext` runs externally (potentially headless / in CI), the confirmation flow is:
+- Interactive session (TTY attached): show proposal, accept `Y / n / edit` like the in-session `ralph` skill
+- Non-interactive / `--auto-criteria` / CI environment: use the inferred criterion if confidence is `high`, otherwise fail fast and print the proposal as a diagnostic so the user can re-launch with `--completion` explicitly
+Pass `--no-infer-completion` to require explicit `--completion` and fail before launch if missing. See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`.
 ### --max-iterations (default: 10)
 Maximum iterations before the loop halts and saves state for manual review.
@@ -90,8 +98,12 @@ Suppress verbose progress output. Completion banner is always shown.
 When triggered:
-1. Validate that `--completion` criteria are specified and verifiable
-2. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
+1. **Resolve completion criteria**:
+   - If `--completion` is provided → use it directly
+   - Else if `--no-infer-completion` is set → fail fast before launch with a helpful error
+   - Else → invoke `infer-completion-criteria` skill, persist proposal to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml`, confirm or auto-adopt per session-interactivity rules above
+2. Validate the resolved criterion is verifiable (can be checked via command)
+3. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
 3. Generate a unique `loop-id` (8-character hex) and create the loop state file at `.aiwg/ralph-external/loops/<loop-id>.json`
 4. Write the initial state: `{ objective, completionCriteria, maxIterations, timeout, provider, status: "pending", iteration: 0 }`
 5. If `--branch` is specified, create the git branch now

package/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md ADDED Viewed

@@ -0,0 +1,323 @@
+---
+namespace: aiwg
+name: infer-completion-criteria
+aliases: [agent-loop-infer-completion, al-infer-completion, ralph-infer-completion]
+platforms: [all]
+description: Infer measurable completion criteria for an agent-loop task from project docs, code, and AIWG standards when the user has not supplied --completion explicitly
+commandHint:
+  argumentHint: '"<task description>" [--task-type code|test|docs|refactor] [--non-interactive]'
+  allowedTools: "Read, Glob, Grep, Bash"
+  model: sonnet
+  category: automation
+---
+# Infer Completion Criteria
+## Purpose
+When a user starts an `agent-loop` task without supplying `--completion`, this skill derives a measurable, verifiable completion criterion from project state. The output must satisfy the `vague-discretion` rule: a concrete shell command or file-inspection check that returns pass/fail unambiguously.
+Iteration is only as good as its gate. A loop with a vague gate ("until it's done") runs forever or exits prematurely. This skill is what turns "agent-loop this" into "agent-loop this until `<measurable thing>`."
+The canonical name for the iterative-loop addon is **agent-loop**. `ralph` is the legacy name for the executor skill, retained as an alias; `al` is a short form. The detection/routing skill is `agent-loop` (which delegates to this skill when criteria are missing); the executor is `ralph` (canonical name forthcoming). Everywhere this skill says "agent-loop" you can read "ralph" as the legacy equivalent.
+## When This Skill Runs
+This skill is invoked by:
+- The `agent-loop` detection-and-routing skill when it parses a user request without explicit completion criteria
+- The `ralph` executor skill during Phase 1 initialization when `--completion` is omitted
+- The `agent-loop-ext` external-loop launcher during pre-launch resolution when `--completion` is omitted
+- Direct invocation via `aiwg discover "infer completion"` → `aiwg show skill infer-completion-criteria` when a user wants to preview the inferred criterion before committing to a loop
+This skill does **not** run when `--completion` is explicit. The user's word is authoritative.
+## Inference Pipeline
+The skill is a deterministic walk through five evidence layers, plus one synthesis step. Each layer contributes candidate criteria; the synthesis picks the strongest measurable one and explains the chain of evidence.
+### Layer 1 — The task verb
+Parse the user's task description for an intent verb. Map to a default criterion class:
+| Verb / phrase | Criterion class |
+|---|---|
+| "fix tests", "make tests pass", "test failure" | Test suite passes (exit 0) |
+| "add tests", "increase coverage", "test coverage" | Coverage threshold met |
+| "fix types", "type errors", "migrate to typescript" | Type checker exits 0 |
+| "fix lint", "clean up warnings", "style" | Linter exits 0 |
+| "build", "make it compile" | Build command exits 0 |
+| "refactor", "extract", "rename" | Tests still pass AND build still passes (regression gate) |
+| "implement <X>", "add feature <X>" | Tests for the new code exist and pass |
+| "document", "add docs", "JSDoc" | Coverage check on docstrings/JSDoc presence |
+| "fix bug", "resolve issue #N" | Specific test for that bug passes AND existing suite still green |
+| "migrate", "upgrade" | Build + test + lint all green (no regression) |
+If the verb is ambiguous, the skill falls back to "regression gate" (build + test + lint all green) as the safest default.
+### Layer 2 — Project conventions in CLAUDE.md / AGENTS.md / AIWG.md
+Read the project's context files. AIWG-managed projects often declare commands directly:
+```bash
+# Run tests
+npm test
+# Type check
+npx tsc --noEmit
+# Lint markdown
+npm exec markdownlint-cli2 "**/*.md"
+```
+Extract these as the canonical commands for their respective domains. The Development section of `CLAUDE.md` is the highest-trust source here — it's what the project's maintainers run.
+Also scan for explicit completion-criterion conventions. Some projects state "a commit is not finished until CI passes" — that signals the CI command (or equivalent local invocation) is the gate.
+### Layer 3 — Package manifests and config
+Inspect the project's manifest files to discover scripts and tools:
+| Manifest | Where to look |
+|---|---|
+| `package.json` | `scripts.test`, `scripts.lint`, `scripts.build`, `scripts.coverage`, `scripts.typecheck` |
+| `Cargo.toml` | implies `cargo test`, `cargo build`, `cargo clippy` |
+| `pyproject.toml` | `[tool.pytest]`, `[tool.ruff]`, `[tool.mypy]`, `scripts.*` |
+| `go.mod` | implies `go test ./...`, `go vet ./...`, `go build ./...` |
+| `Gemfile` | implies `bundle exec rspec`, `bundle exec rubocop` |
+| `pom.xml` / `build.gradle` | `mvn test`, `mvn verify`, `gradle test` |
+| `.tool-versions` / `mise.toml` | language version pins inform which tool is canonical |
+When multiple scripts exist (e.g. `test`, `test:unit`, `test:integration`), prefer the script the project's own docs reference. If the docs don't reference any, prefer the most specific match to the task verb (e.g. for "fix integration test" → `test:integration`).
+### Layer 4 — CI configuration
+CI files encode the team's actual definition of "passes":
+| CI system | Scan |
+|---|---|
+| GitHub Actions | `.github/workflows/*.yml` — extract `run:` steps from non-deploy jobs |
+| Gitea Actions | `.gitea/workflows/*.yml` — same |
+| GitLab CI | `.gitlab-ci.yml` — extract `script:` from test/lint jobs |
+| CircleCI | `.circleci/config.yml` |
+| Jenkins | `Jenkinsfile` |
+The first non-trivial verification step in the primary workflow is the team's canonical "done" gate. If CI runs `npm test && npm run lint && npm run typecheck` in order, the inferred criterion is "all three exit 0."
+### Layer 5 — AIWG artifacts
+If the project has a `.aiwg/` directory, scan for relevant context:
+- `.aiwg/testing/test-strategy.md` — declared verification approach
+- `.aiwg/architecture/software-architecture-doc.md` — architectural quality gates
+- `.aiwg/security/security-gates.md` — security-related criteria
+- `.aiwg/quality/code-review-guide.md` — code quality bars
+- `.aiwg/activity.log` — recent operations that may indicate what "done" looked like for similar past tasks
+- `.aiwg/working/<related-progress-files>.md` — prior task progress files; mine the "Completion criteria" sections
+If the project has a related use case (`.aiwg/requirements/UC-*.md`) whose ID is in the task description, pull that use case's acceptance criteria — those ARE the completion criteria.
+### Synthesis step
+Combine the layers into a single proposed criterion. The decision logic:
+1. If a use case in `.aiwg/requirements/` matches the task, use its acceptance criteria verbatim. Done.
+2. Otherwise, take the verb-class default from Layer 1 and instantiate it using the canonical command from Layer 2 (CLAUDE.md) > Layer 4 (CI) > Layer 3 (manifest).
+3. If the task is in the "regression gate" class, AND the project's CI runs more than one verification, combine them: `command-A passes AND command-B passes AND command-C passes`.
+4. If no canonical command is found in any layer (unusual — typically only on empty-scaffold projects), fall back to:
+   - `<file or change exists in git diff against HEAD~1>` — pure structural check
+   - And inform the user that a substantive verification command should be added.
+### Apply AIWG standards (vague-discretion)
+Validate the proposal against the `vague-discretion` rule:
+- Criterion must be expressible as a shell command (or shell pipeline) that exits 0 on pass, non-zero on fail.
+- Criterion must NOT use the words "good enough", "thorough", "comprehensive", "complete" without a measurable suffix.
+- Criterion must NOT be self-referential ("the agent is satisfied" — no).
+- Criterion must have an implicit or explicit `max-iterations` cap (ralph's default 10 is the floor; very large refactors may need 20).
+If the proposal fails any of these, regenerate. If after two regenerations the proposal still fails, surface the problem to the user with the diagnostic ("could not find a measurable verification command — please supply one explicitly").
+## Output Contract
+The skill emits a single block of structured output for the calling skill (`agent-loop` router, `ralph` executor, or `agent-loop-ext` launcher) to consume:
+```yaml
+proposed_completion:
+  criterion: "npm test passes AND npx tsc --noEmit exits 0"
+  verification_command: "npm test && npx tsc --noEmit"
+  rationale:
+    - "Task verb 'refactor' triggers regression gate (Layer 1)"
+    - "package.json scripts.test = 'jest --coverage' (Layer 3)"
+    - "CLAUDE.md Development section references both npm test and npx tsc --noEmit (Layer 2)"
+    - ".github/workflows/ci.yml runs both as required checks (Layer 4)"
+  confidence: high  # high | medium | low
+  alternatives_considered:
+    - criterion: "npm run lint exits 0"
+      rejected_because: "Lint is not in CI required checks for this repo"
+  max_iterations_suggestion: 10
+  needs_human_confirmation: false  # true if confidence == low OR criterion is unusual
+```
+When the skill runs in non-interactive mode (via `aiwg al --auto-criteria` or the equivalent on `agent-loop` / `agent-loop-ext`), `needs_human_confirmation: false` proceeds directly. Otherwise the consuming skill (the `agent-loop` router or the `ralph` / `agent-loop-ext` executor) shows the proposal to the user and confirms.
+## Interaction With The User
+In interactive mode, after running the pipeline, present the proposal:
+```
+No --completion criteria was provided. I inferred:
+  Criterion: npm test passes AND npx tsc --noEmit exits 0
+  Verification: `npm test && npx tsc --noEmit`
+Evidence:
+  - Task verb "refactor" → regression gate
+  - package.json scripts.test = jest --coverage
+  - CLAUDE.md Development section references both checks
+  - .github/workflows/ci.yml requires both
+Proceed with this criterion? [Y/n/edit]
+```
+User options:
+- `Y` (default): start the loop with the inferred criterion
+- `n`: abort and request the user supply `--completion` explicitly
+- `edit`: accept a manual edit to the criterion before proceeding
+Use the platform's native interaction tool when available (per `native-ux-tools` rule). On Claude Code, that means `AskUserQuestion`.
+## When Inference Should NOT Run
+- `--completion` is explicit → use the user's criterion, don't second-guess
+- `--no-infer-completion` flag is passed → fail fast with a helpful error if `--completion` is also missing
+- The task description is itself a criterion (e.g. "make `npx tsc --noEmit` pass") → extract the command from the task, don't re-infer
+## Edge Cases
+| Case | Handling |
+|---|---|
+| No `package.json`, no manifest, no CI | Scan project root for any executable test runner (`pytest`, `go test`, `cargo test`, `make test`, `Makefile` target `test`). Fall back to the structural check if none found. |
+| Multiple test commands (`test:unit`, `test:integration`, `test:e2e`) | Prefer the one nearest to the task scope. If task mentions "unit", use `test:unit`. If the task is broad, prefer the union via `&&`. |
+| CI runs tests on multiple OS/Node versions | Use the local invocation (`npm test`), not the matrix runner. The matrix is a deploy concern. |
+| Monorepo with multiple packages | Detect from `pnpm-workspace.yaml` / `lerna.json` / `turbo.json` / workspaces field. If the task scope is one package, infer that package's commands. If the task spans the monorepo, use the top-level `test` script. |
+| Project has no tests at all | This is a finding. The inferred criterion should be "tests exist for the new code AND those tests pass." Surface to the user that the project lacks a baseline test suite — that's important context for the loop's expectations. |
+| Project's tests are currently broken (the task IS to fix them) | Set the criterion to the passing condition. The whole point of the loop is to get from current red state to green. |
+| Conflict between layers (CLAUDE.md says X, CI says Y) | Prefer CLAUDE.md (closer to the maintainer's intent). Note the discrepancy in the rationale. |
+## Interaction With Other AIWG Rules
+| Rule | How this skill respects it |
+|---|---|
+| `vague-discretion` | The whole point — produces measurable, command-form criteria |
+| `instruction-comprehension` | Reads the task description carefully; doesn't override explicit user instructions |
+| `research-before-decision` | Layer-walk IS research; doesn't propose criteria without evidence |
+| `human-authorization` | Confirms with user before starting loop (unless `--auto-criteria` explicitly granted) |
+| `auto-compact-continue` | The inferred criterion IS what the loop continues toward; no "should I keep working" prompts |
+| `cli-secondary` | Uses `aiwg discover` to find related skills if the task verb is unusual |
+## Examples
+### Example 1: Simple test task on a TypeScript project
+```
+$ agent-loop "fix the failing auth tests"   # or: aiwg al / aiwg ralph (legacy)
+Inferring completion criteria...
+  Layer 1 verb: "fix tests" → test-pass class
+  Layer 2 CLAUDE.md: "npm test" is the canonical test command
+  Layer 3 package.json: scripts.test = "jest"
+  Layer 4 CI: .github/workflows/ci.yml runs `npm test`
+  Layer 5: no related use case found
+Proposed criterion: npm test passes (exit 0)
+Verification: `npm test`
+Confidence: high
+Proceed? [Y/n/edit]
+```
+### Example 2: Refactor with no tests
+```
+$ agent-loop "extract auth logic into a separate module"
+Inferring completion criteria...
+  Layer 1 verb: "extract" → regression gate
+  Layer 2 CLAUDE.md: no Development section
+  Layer 3 package.json: scripts.test = "echo 'no tests'" (degenerate)
+  Layer 4 CI: no workflows found
+  Layer 5: no related use case found
+Warning: project has no functional test suite or CI configuration.
+Falling back to structural verification.
+Proposed criterion: The new module exists, the original code references it,
+                    AND `npx tsc --noEmit` still exits 0
+Verification: `test -f src/auth/index.ts && grep -q 'from.*src/auth' src/main.ts && npx tsc --noEmit`
+Confidence: low
+This is a weak gate. Consider supplying --completion explicitly
+or adding a test suite first.
+Proceed? [Y/n/edit]
+```
+### Example 3: Task with an explicit use case reference
+```
+$ agent-loop "implement UC-AUTH-001"
+Inferring completion criteria...
+  Layer 1 verb: "implement" → tests-exist class
+  Layer 5: found .aiwg/requirements/UC-AUTH-001-user-login.md
+Using acceptance criteria from UC-AUTH-001:
+  - [ ] User can log in with valid email/password
+  - [ ] Invalid credentials show clear error message
+  - [ ] Account locks after 5 failed attempts
+  - [ ] Login completes within 2 seconds
+Proposed criterion: All acceptance criteria from UC-AUTH-001 verified by tests,
+                    AND `npm test -- --testPathPattern=auth` passes
+Verification: `npm test -- --testPathPattern=auth`
+Confidence: high (acceptance criteria are explicit)
+Proceed? [Y/n/edit]
+```
+### Example 4: Refusal case
+```
+$ agent-loop "make the code better"
+Inferring completion criteria...
+  Layer 1 verb: "make better" → AMBIGUOUS, no clear criterion class
+  Layer 5: no related use case
+Cannot infer measurable criteria for this task.
+"Make the code better" is vague (per AIWG vague-discretion rule).
+A loop with no measurable gate runs forever or exits prematurely.
+Please supply --completion with a concrete check, e.g.:
+  --completion "npm test passes AND npm run lint exits 0"
+  --completion "all functions in src/utils/ have JSDoc"
+  --completion "complexity score from eslint < 10 for all files"
+Or rephrase the task with a concrete intent:
+  agent-loop "reduce cyclomatic complexity in src/utils/"
+  agent-loop "add JSDoc to all exported functions in src/api/"
+```
+## References
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Measurable criteria requirement
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Layer-walk research pattern
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/instruction-comprehension.md — Don't override explicit user criteria
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md — Criterion IS what the loop continues toward
+- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md — The detection/routing skill that delegates here when `--completion` is missing
+- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/ralph/SKILL.md — The legacy executor skill that consumes this output (`ralph` is legacy for `agent-loop`'s executor)
+- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md — The crash-resilient external loop, same delegation pattern
+- @$AIWG_ROOT/agentic/code/addons/agent-loop/agents/ralph-verifier.md — Runs the verification command this skill proposes

package/agentic/code/addons/agent-loop/skills/ralph/SKILL.md CHANGED Viewed

@@ -6,7 +6,7 @@ deprecated_names: [ralph]
 platforms: [all]
 description: Execute iterative task loop until completion criteria are met - iteration beats perfection
 commandHint:
-  argumentHint: '"<task>" --completion "<criteria>" [--max-iterations N] [--timeout M] [--interactive --guidance "text"]'
+  argumentHint: '"<task>" [--completion "<criteria>"] [--max-iterations N] [--timeout M] [--interactive --guidance "text"] [--auto-criteria | --no-infer-completion]'
   allowedTools: "Task, Read, Write, Bash, Glob, Grep, TodoWrite, Edit"
   model: opus
   category: automation
@@ -50,7 +50,7 @@ The task to execute. Should be:
 - Measurable completion state
 - Self-contained (all context provided)
-### --completion (required)
+### --completion (optional — inferred when omitted)
 Success criteria. Must be:
 - Verifiable (tests, lint, compilation)
 - Specific (not subjective)
@@ -66,6 +66,10 @@ Success criteria. Must be:
 - `--completion "code looks good"`
 - `--completion "feature is done"`
+**When omitted**: the loop delegates to the `infer-completion-criteria` skill, which derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts. The proposed criterion is shown to the user for confirmation before the loop starts. Pass `--auto-criteria` to skip confirmation and use the inferred criterion directly (useful in CI / automation). Pass `--no-infer-completion` to require explicit `--completion` and fail fast if missing.
+See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md` for the inference pipeline.
 ### --max-iterations (default: 10)
 Safety limit on iterations. Prevents infinite loops.
@@ -94,12 +98,21 @@ Create feature branch for loop work.
 ### Phase 1: Initialization
-1. Parse task and completion criteria
-2. Validate criteria are verifiable (can be checked via command)
-3. Create `.aiwg/ralph/` workspace if not exists
-4. Initialize iteration counter (i=0)
-5. Create feature branch if --branch specified
-6. Log initialization
+1. Parse task
+2. **Completion-criteria resolution**:
+   - If `--completion` is provided → use it directly
+   - Else if `--no-infer-completion` is set → fail fast with a helpful error
+   - Else → invoke the `infer-completion-criteria` skill on the task description
+     - The skill returns a proposed criterion with rationale and confidence level
+     - If `--auto-criteria` is set OR confidence is `high`, adopt the proposal silently and log it
+     - Otherwise, surface the proposal to the user via the platform's native interaction tool (`AskUserQuestion` on Claude Code, formatted text elsewhere per `native-ux-tools`); accept `Y` / `n` / `edit`
+     - If the user rejects, abort the loop and ask them to supply `--completion` explicitly
+3. Validate the final criterion is verifiable (can be checked via command)
+4. Create `.aiwg/ralph/` workspace if not exists
+5. Initialize iteration counter (i=0)
+6. Create feature branch if --branch specified
+7. **Write the criterion and its rationale into the loop's progress file** (`.aiwg/ralph/<loop-id>/progress.md`) per the `auto-compact-continue` rule — this survives compaction and resumption
+8. Log initialization
 **Communicate**:
 ```

package/agentic/code/addons/aiwg-utils/manifest.json CHANGED Viewed

@@ -122,7 +122,8 @@
     "post-commit-index-refresh",
     "soul-enforcement",
     "skill-discovery",
-    "cli-secondary"
+    "cli-secondary",
+    "auto-compact-continue"
   ],
   "templates": [
     "devkit/addon-manifest",

package/agentic/code/addons/aiwg-utils/rules/RULES-INDEX.md CHANGED Viewed

@@ -4,10 +4,15 @@ Core meta-utility rules for agent coordination, context management, and platform
 ---
-## AIWG Utilities Rules (20 rules — active with aiwg-utils addon)
+## AIWG Utilities Rules (21 rules — active with aiwg-utils addon)
 ### HIGH
+#### auto-compact-continue
+**Summary**: The answer to "should I keep working?" is always YES — until the task's measurable completion criteria are met or the user redirects. Context pressure, long tool outputs, and high iteration counts are not scope questions. When context fills, the right response is to compact, checkpoint to durable storage (activity log, progress file at `.aiwg/working/<task>-progress.md`, git history, AIWG memory), and continue. Trust platform auto-compact (REF-910); load-bearing state must live in `CLAUDE.md` / `AGENTS.md` / `AIWG.md` / activity log / progress file / git / `.aiwg/working/` — never only in conversation turns. Maintain a `## Compact Instructions` section so the summarizer preserves completion criteria, last successful step, failed approaches (do not let them be re-attempted), and authorization questions. Apply aggressive in-session compression (REF-122: passive=6% savings, aggressive=22.7%); update the progress file every 10–15 tool calls on long tasks. Exceptions: authorization gates (per `human-authorization`), 3-attempts-failed escalation (per `anti-laziness` Rule 6), explicit user redirect, genuinely ambiguous new directive classification (per `skill-discovery` Rule 0). "Should I continue?" is never the right question.
+**When to apply**: Any long-running session; before context fills; after long tool outputs; when crossing iteration thresholds; when resuming after compaction; when tempted to ask the user permission to keep going
+**Full rule**: @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md
 #### cli-secondary
 **Summary**: AIWG is agentic-first. The agent's strict priority order for any action: (1) **local skill** already loaded in context (kernel skills, framework quickrefs, deployed agents), (2) **discovered skill** via `aiwg discover` + `aiwg show`, (3) raw CLI command, (4) manual file edits as last resort. The skill carries the priming (pre-flight checks, dry-run preview, preservation logic, gates) that the CLI alone lacks. Sole exception: discovery/finder commands (`aiwg discover`, `aiwg show`, `aiwg list`, `aiwg status`, `aiwg version`, `aiwg runtime-info`, status/info subcommands) stay primary — they ARE the priming entry points and the bridge from priority 2 to priority 1. For mixed commands (`aiwg index`, `aiwg packages`, `aiwg ops`, `aiwg storage`), classify per subcommand. Includes a pairing table covering use/refresh/regenerate/doctor/init/promote/scaffold-*/add-*/doc-sync/lint/cleanup-audit/sdlc-accelerate/ralph/mc/steward/index build/ops actions/storage migrate.
 **When to apply**: Before invoking any AIWG CLI command, before writing skill/agent docs that reference CLI commands, when updating quickrefs or routing tables, when filing pairings audits

package/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md ADDED Viewed

@@ -0,0 +1,257 @@
+# Auto-Compact and Continue
+**Enforcement Level**: HIGH
+**Scope**: All long-running agent sessions and tool-using agents across all platforms
+**Addon**: aiwg-utils (core, universal)
+**Research Basis**: REF-909 (Anthropic — Effective Harnesses for Long-Running Agents), REF-910 (Anthropic — Compaction), REF-122 (Active Context Compression / Focus Agent), REF-128 (Context Window Management Strategies)
+## Overview
+When context grows long, the wrong response is to stop and ask the user "should I keep working?" The right response is to compact, checkpoint, and continue. AIWG agents must treat the AIWG memory system, activity log, and on-disk artifacts as the durable substrate that survives context resets, and must continue until the original task's measurable completion criteria are met.
+## Problem Statement
+Frontier models — especially under conservative harness defaults — increasingly stop mid-task to ask the user permission to continue when context budget gets tight, when a long tool result lands, or when an iteration count crosses some internal threshold. The user has already authorized the task. The question "should I keep working?" is the agent dumping its context-management responsibility onto the human. Every time this happens, the user has to re-engage, re-establish state, and pay a thinking tax that exists because the agent treated a routine context event as if it were a scope question.
+The failure mode is structurally similar to `vague-discretion` (loops that exit on "good enough") and `anti-laziness` Rule 6 (premature abandonment). This rule names the specific subset: **stopping to ask about continuation when continuation is the obvious correct action**.
+## The Single Rule, Stated Plainly
+**The answer to "should I keep working?" is always YES — until the task's stated completion criteria are met or the user has redirected.** Context pressure is not a scope question. Long tool output is not a scope question. Crossing iteration N is not a scope question. The right response to all of them is: compact, checkpoint to durable storage, and continue.
+The exceptions are narrow and named:
+1. **Authorization gates** (`human-authorization` rule) — a destructive or out-of-scope action *was* discovered. Ask about the action, not about continuation.
+2. **3-attempts-failed escalation** (`anti-laziness` rule, Rule 6) — three honest attempts to solve the same blocker have failed; escalate with full context.
+3. **Explicit user redirect** — the user has paused or changed direction since you started.
+4. **Genuinely ambiguous classification** of a new directive (`skill-discovery` rule, Rule 0) — ask one clarifying question, do not stall.
+Nothing else justifies stopping to ask "should I keep working?"
+## Mandatory Rules
+### Rule 1: Never Ask "Should I Continue?" When Context Is the Reason
+If your reason for asking the user is any of the following — context is getting long, you've been working a while, this is a big task, the next step might be expensive — do not ask. Compact and continue.
+**FORBIDDEN**:
+```
+Agent: "I've completed phases 1 and 2. Context is getting long.
+        Should I continue to phase 3?"
+Agent: "This has been a complex investigation. Would you like me
+        to wrap up here, or should I keep going?"
+Agent: "I notice we're at iteration 8. Do you want me to keep
+        iterating or stop?"
+```
+**REQUIRED**:
+```
+Agent: *writes checkpoint to .aiwg/working/<task-slug>-progress.md*
+       *appends activity log entry*
+       *continues to phase 3*
+Agent: *continues investigation; if context tightens, writes findings
+       to a working file and references it in subsequent turns*
+Agent: *continues iterating against the measurable completion condition;
+       if no progress for 3 consecutive iterations, applies the
+       anti-laziness recovery protocol, not a continuation prompt*
+```
+### Rule 2: Use the Harness, Don't Ask the Human
+AIWG already ships the durable-storage substrate that makes auto-compact safe. Before context fills, write the load-bearing state to one or more of these:
+| Substrate | What lives there | Survives compaction? |
+|---|---|---|
+| `CLAUDE.md` / `AGENTS.md` / `AIWG.md` | Project conventions, framework rules, the task contract itself | Yes (system-prompt scope) |
+| `.aiwg/activity.log` | Append-only timeline of operations performed | Yes (on disk, never in context) |
+| AIWG memory (`~/.claude/projects/.../memory/`) | Durable facts the user explicitly chose to keep | Yes (system-prompt scope on Claude Code; equivalent surfaces on other providers) |
+| `.aiwg/working/<task>-progress.md` | The progress file (see Rule 3) | Yes (on disk) |
+| Git history | Each meaningful step as a commit | Yes (on disk + remote) |
+| `.aiwg/working/` | Intermediate artifacts, scratch results | Yes (on disk) |
+If the load-bearing state for the current task lives only in conversation turns, you are setting up a future failure. Move it to disk before context fills, not after.
+### Rule 3: Write a Progress File for Multi-Phase Work
+For any task that you reasonably expect to span more than ~20 tool calls or more than one compaction window, write a progress file at `.aiwg/working/<task-slug>-progress.md` and update it at each meaningful checkpoint. The file must include:
+```markdown
+# Progress: <task name>
+## Task contract
+- Original request: <verbatim quote of the user ask>
+- Completion criteria (measurable, per vague-discretion): <bullets>
+- Authorization scope: <what is in-scope; what requires asking>
+## Current status
+- Phase: <where we are>
+- Last successful step: <what just worked>
+- Next action: <what to do when this file is re-read>
+## Completed steps
+- [x] <step with brief result and link to artifact / commit ref>
+- [x] <step with brief result and link to artifact / commit ref>
+## Failed approaches (do not retry)
+- <approach> — failed because <reason>; learning: <what we know now>
+## Open questions / deferred items
+- <item> — deferred because <reason>; will need authorization to <action>
+## State references
+- Activity log entries: <range or recent IDs>
+- Commits: <hashes>
+- Artifacts: <paths under .aiwg/working/ or final locations>
+```
+The progress file's **Failed approaches** section is the single most underrated artifact (REF-909). Without it, post-compaction agents re-discover known dead ends. With it, the next agent reads the file once, knows where to resume, and skips the blind alleys.
+### Rule 4: Trust the Platform's Auto-Compact
+On Claude Code (and equivalents), auto-compact runs automatically when context approaches the limit (REF-910). The right response is *not* to preempt it by asking the user to clear context. The right response is:
+1. Make sure your durable state is current (Rule 2).
+2. Let compaction run.
+3. After compaction, the conversation history is summarized but `CLAUDE.md` / `AGENTS.md` / `AIWG.md`, the activity log, the progress file, and all on-disk artifacts are intact.
+4. Read the progress file (Rule 3) and continue from "Next action."
+If you are on a platform that does *not* auto-compact and you can see context getting tight, write the checkpoint and use whatever the platform's manual compact / new-session mechanism is, then resume — do not ask the user to make that decision for you.
+### Rule 5: Honor the Compact Instructions
+A `## Compact Instructions` section in `CLAUDE.md` / `AGENTS.md` / `AIWG.md` (or the equivalent provider context file) tells the auto-compact summarizer what to preserve. If the project has one, your in-context summaries should bias toward the same priorities. If the project lacks one and you are working on a long-running task, contribute one — propose to the user that AIWG's `aiwg-regenerate` skill add it, or write the file directly under team-directive scope.
+A minimum-viable Compact Instructions block:
+```markdown
+## Compact Instructions
+When summarizing this conversation for compaction, preserve:
+1. The current task's completion criteria verbatim.
+2. The last successful step and any verification command that proved it.
+3. Failed approaches and the reason each failed (do not let them be re-attempted).
+4. References to `.aiwg/working/*-progress.md`, `.aiwg/activity.log`,
+   and any in-flight commits.
+5. Pending authorization questions that were raised but not answered.
+6. Open scope boundaries (what is in/out of scope for this task).
+Discard:
+- Exploratory reasoning traces leading to already-known conclusions.
+- Tool outputs that were superseded by later, more authoritative reads.
+- Greetings, status banners, and other non-load-bearing prose.
+```
+### Rule 6: Aggressive, Not Passive, Compaction Discipline
+REF-122 (Focus Agent / Active Context Compression) is unambiguous: passive instructions to compress yield ~6% savings; aggressive instructions (compress every 10–15 tool calls; consolidate findings; prune raw history) yield ~22.7%. The mechanism that makes "always continue" safe is *actively compacting during the session*, not waiting for auto-compact at the limit. Apply the same discipline:
+- After every meaningful tool-result observation, ask: "Is the raw content load-bearing for the remaining work, or is the conclusion?" If conclusion-only, summarize it into your next thought and stop carrying the raw output forward.
+- After every 10–15 tool calls on a long task, write a progress-file update.
+- Before spawning a subagent (`subagent-scoping` rule), pass conclusions, not raw history (`context-bloat` rule).
+### Rule 7: Distinguish Continuation From Authorization
+The hardest case, and the one this rule exists to draw a sharp line through:
+| Situation | Right action |
+|---|---|
+| Context getting tight, task is in-scope, no new scope discovered | **Continue** (compact + checkpoint) |
+| Context getting tight, but a destructive or out-of-scope action is the next required step | **Ask about the action** (per `human-authorization`), not about continuation |
+| Context getting tight, three honest attempts have failed on the same blocker | **Escalate per anti-laziness Rule 6** with full context, not a generic "should I continue?" |
+| Context getting tight, the user just sent a new directive that supersedes the current task | **Classify per skill-discovery Rule 0**; the old task may be done, the new task starts fresh |
+In all four cases, the question framed to the user (if any) is specific and load-bearing. "Should I keep working?" is never the right question — it carries no information for the user and no signal back for the agent.
+### Rule 8: Recovery After Compaction
+When a session resumes from compaction (you notice prior turns are now summarized, or you are reading a progress file from a previous session):
+1. **Read the progress file first.** It is the canonical state.
+2. **Read recent activity log entries.** They are the timeline.
+3. **Check git status and recent commits.** They show what landed.
+4. **Verify the completion criteria are still met by your read.** If the summary says "phase 2 done" but git shows uncommitted changes from phase 2, reconcile before continuing.
+5. **Skip the "failed approaches" set.** Do not re-try them.
+6. **Resume from "Next action."**
+This is the AIWG-flavored implementation of REF-909's initializer-agent / coding-agent pattern. The progress file *is* the bridge between sessions; treating it as authoritative is what makes the bridge load-bearing.
+## Detection Heuristics
+You may be in violation of this rule if:
+| Symptom | Likely cause |
+|---|---|
+| Output ends with "Would you like me to continue?" or "Should I keep going?" | Asked a scope question that was actually a context question |
+| Output offers a menu like "I can stop here or proceed with X — which?" when X is clearly the next obvious step | Same |
+| You wrote a long summary of "what we've done so far" and asked the user to direct the next step | The summary belonged in a progress file, not in the conversation |
+| You stopped before the user's measurable completion criteria were met | You confused fatigue / context / iteration count with scope completion |
+| You re-attempted an approach the prior session marked as failed | You did not read the progress file's "Failed approaches" section |
+| You asked the user to decide between two technically equivalent paths to the same outcome | The decision was yours to make under `instruction-comprehension` — make it and continue |
+## Recovery When This Rule Was Violated
+If you notice you just asked "should I continue?" or its semantic equivalent:
+1. **Reframe in the same turn.** "Continuing — writing checkpoint to `.aiwg/working/<task>-progress.md` and proceeding to <next step>." Do not wait for the user to confirm.
+2. **Write the progress file** if it does not already exist.
+3. **Append an activity-log entry** noting the checkpoint and resumption.
+4. **Continue.**
+If the user has *already* responded ("yes, keep going") then the cost is paid; treat it as a lesson and update your in-session checklist to avoid the second occurrence.
+## Interaction With Other Rules
+| Rule | Relationship |
+|---|---|
+| `vague-discretion` | This rule is the operational counterpart: vague-discretion forbids vague completion criteria; this rule forbids stopping to ask about continuation against measurable criteria |
+| `anti-laziness` | Rule 6 of anti-laziness names the 3-attempts-failed escalation case; this rule covers the much more common context-pressure case |
+| `human-authorization` | Authorization is for scope changes and destructive actions, not for "should I keep working" |
+| `instruction-comprehension` | Track the user's stated completion criteria. The criteria, not the agent's fatigue, decide when to stop |
+| `skill-discovery` | Rule 0 handles new-directive classification; this rule handles "no new directive, just context pressure" |
+| `activity-log` | The activity log is one of the durable substrates this rule depends on |
+| `subagent-scoping` / `context-bloat` | When you delegate, pass conclusions not raw history; same logic applies in the main agent |
+| `context-budget` | When `AIWG_CONTEXT_WINDOW` is set, the compaction trigger is tighter; budget more aggressively |
+## Platform Applicability
+Universal across all AIWG-supported providers:
+- **Claude Code**: Auto-compact is the platform default. This rule says: trust it, prepare for it, do not ask the user to decide for it.
+- **OpenAI Codex**: Smaller context window; aggressive checkpointing to disk is more important. The 32KB `AGENTS.md` cap means the progress file and activity log do the heavy lifting.
+- **GitHub Copilot, Cursor, Warp, Factory, OpenCode, Windsurf, OpenClaw, Hermes**: All have their own context handling; the AIWG durable substrate (memory, activity log, working files, git) is platform-neutral and is the cross-platform answer.
+## Checklist
+Before responding "should I continue?" or any semantic equivalent:
+- [ ] Did the original task contract include a measurable completion criterion?
+- [ ] Has that criterion been met? If yes, *say so and stop*; if no, *continue*.
+- [ ] Is the reason I want to stop "context is long"? Then write a progress file and continue.
+- [ ] Is the reason I want to stop "I've done a lot"? Then write a progress file and continue.
+- [ ] Is there an authorization gate (destructive / out-of-scope) blocking the next step? Then ask about *that specific action*, not about continuation.
+- [ ] Have I made 3 honest attempts at the same blocker without progress? Then escalate per `anti-laziness`, with full context.
+- [ ] Did the user redirect? Then classify the new directive per `skill-discovery` Rule 0.
+If none of those apply: **continue**.
+## References
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Measurable completion criteria
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/instruction-comprehension.md — Track the user's stated criteria
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/human-authorization.md — When asking *is* the right move
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/activity-log.md — One of the durable substrates
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/context-budget.md — Aggressive budgeting when `AIWG_CONTEXT_WINDOW` is set
+- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/skill-discovery.md — Rule 0: classify new directives
+- @$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/rules/anti-laziness.md — Rule 6: 3-attempts escalation
+- REF-909 (Anthropic — Effective Harnesses for Long-Running Agents) — initializer-agent / coding-agent pattern, progress files
+- REF-910 (Anthropic — Compaction) — auto-compact mechanics, Compact Instructions, what survives
+- REF-122 (Active Context Compression / Focus Agent) — aggressive vs passive compression discipline
+- REF-128 (Context Window Management Strategies) — effective context is 30-40% smaller than advertised
+---
+**Rule Status**: ACTIVE
+**Last Updated**: 2026-05-14

package/agentic/code/frameworks/sdlc-complete/skills/flow-release/SKILL.md CHANGED Viewed

@@ -166,9 +166,23 @@ The config's `policy` block applies at every gate:
 - **Post-tag failures**: never delete pushed tags. Increment patch and re-run the flow.
 - **Gate failures with `hard_stop: true`**: halt immediately, surface the failure log, do not advance.
 - **Gate failures with `hard_stop: false`**: log a warning, continue.
+- **Supply-chain gate (signed-tag verify) failure**: this is the recovery exception to "never delete pushed tags." If `tools/ci/verify-signed-tag.sh` rejects the tag (wrong signing key, expired key, missing-from-maintainers.asc), no artifacts are emitted by `npm-publish.yml` / `gitea-release.yml` / `github-mirror.yml` — the bad tag is an empty shell. Recovery: `git tag -d <tag>`, push delete to both remotes (`git push origin :refs/tags/<tag>` and `git push github :refs/tags/<tag>`), then re-cut via `tools/release/cut-tag.sh <version>` which forces the release key. Document the incident in `docs/contributing/versioning.md` for the next release.
 Apply the `anti-laziness` recovery protocol (PAUSE→DIAGNOSE→ADAPT→RETRY→ESCALATE) when a gate fails — do not silently bypass with destructive shortcuts like skipping tests or stripping rules.
+## Tag-cutting must use the wrapper
+**`git tag -a` and `git tag -s` are NOT to be used directly by this skill.** The maintainer's global git config typically has `tag.gpgsign=true` and `user.signingkey=<personal-commit-signing-key>`, which causes plain `git tag` invocations to sign with the **wrong key** — the personal key, not the release key. The supply-chain gate will reject the tag and no artifacts will ship.
+Always use `tools/release/cut-tag.sh <version>` for the tag step. The wrapper:
+1. Runs 10 pre-tag sanity checks (CalVer shape, package.json + marketplace.json lockstep, CHANGELOG entry, announcement file present, release-signing key present locally AND published in `.gitea/keys/maintainers.asc`)
+2. Signs with `-u <RELEASE_KEY_FINGERPRINT>` (defaults to the AIWG release key; override via `AIWG_RELEASE_KEY_FINGERPRINT` env var for forks)
+3. Verifies the local signature via `git tag -v` before declaring success
+4. Does NOT push automatically — push is left to the operator so a final sanity step can run
+This is the canonical path. Any release config that templates a raw `git tag -s` is incorrect and must be migrated to call the wrapper.
 ## Defaults when no config exists
 If `.aiwg/release.config` is missing, scaffold one from the schema and ask the operator to review before continuing. The AIWG repo's own config is the reference implementation; new projects can copy it as a starting point.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "aiwg",
-  "version": "2026.5.5",
+  "version": "2026.5.6",
   "description": "Deployment tool and support utility for AI context. Copies agents, skills, commands, rules, and behaviors into the paths each AI platform reads (Claude Code, Codex, Copilot, Cursor, Warp, OpenClaw, and 4 more) so one source of truth works across 10 platforms. Optional utilities for persistent artifact memory, background orchestration, autonomous loops, and artifact indexing.",
   "type": "module",
   "bin": {

package/tools/release/cut-tag.sh ADDED Viewed

@@ -0,0 +1,250 @@
+#!/usr/bin/env bash
+# Cut a signed AIWG release tag with the release-signing key.
+#
+# This wrapper exists because git's `user.signingkey` is typically set to
+# a maintainer's PERSONAL commit-signing key. Tags created via plain
+# `git tag -s` (or even `git tag -a` when `tag.gpgsign=true` is set) will
+# then be signed with that personal key, NOT the release key published in
+# `.gitea/keys/maintainers.asc`. The supply-chain gate
+# (`tools/ci/verify-signed-tag.sh`) correctly rejects such tags — but the
+# error surfaces in CI rather than at the moment of the mistake.
+#
+# This script forces the correct release key via `git tag -s -u <fingerprint>`
+# and runs pre-tag sanity checks so the gate-fail loop becomes a local
+# fail-fast loop instead.
+#
+# Background: a13dabc5 (two-key model — personal key signs commits, release
+# key signs tags) and v2026.5.5 incident (signed-tag gate caught wrong-key
+# tag on commit fd2ba49e — recovered via re-tag with this procedure).
+#
+# Usage:
+#   tools/release/cut-tag.sh <version> [-m "<custom tag message>"]
+#
+# Example:
+#   tools/release/cut-tag.sh 2026.5.6
+#   tools/release/cut-tag.sh 2026.5.6 -m "v2026.5.6 — Some Theme"
+#
+# What it does (fail-fast at each step):
+#   1. Verifies <version> matches the CalVer YYYY.M.PATCH shape (no leading zeros)
+#   2. Verifies package.json version matches <version>
+#   3. Verifies .claude-plugin/marketplace.json metadata.version matches (PUW-038 lockstep)
+#   4. Verifies CHANGELOG.md has an entry for [<version>]
+#   5. Verifies docs/releases/v<version>-announcement.md exists
+#   6. Verifies the AIWG release-signing key is available locally
+#   7. Verifies the release-signing key is published in .gitea/keys/maintainers.asc
+#   8. Creates the signed tag with `git tag -s -u <release-key-fingerprint>`
+#   9. Verifies the tag's signature locally before any push (`git tag -v`)
+#  10. Reports next-step push commands; does NOT push automatically
+#
+# The push step is intentionally left to the operator so this script can
+# also be run as a sanity check before a release ceremony.
+set -euo pipefail
+# ---------------------------------------------------------------------------
+# Configuration — single source of truth for the release-signing key
+# ---------------------------------------------------------------------------
+# Fingerprint of the AIWG release-signing key, per SECURITY.md and
+# .gitea/keys/maintainers.asc. Override via $AIWG_RELEASE_KEY_FINGERPRINT
+# for forks or migration scenarios.
+RELEASE_KEY_FINGERPRINT="${AIWG_RELEASE_KEY_FINGERPRINT:-FE9272F0BC5781E1DE77FAAA719AB63879E84CE8}"
+# ---------------------------------------------------------------------------
+# Argument parsing
+# ---------------------------------------------------------------------------
+if [ $# -lt 1 ]; then
+  cat <<EOF >&2
+Usage: $0 <version> [-m "<tag message>"]
+Examples:
+  $0 2026.5.6
+  $0 2026.5.6 -m "v2026.5.6 — Some Release Theme"
+The default tag message is "vX.Y.Z — <CalVer date>" if not provided.
+EOF
+  exit 1
+fi
+VERSION="$1"
+shift
+TAG_MESSAGE=""
+while [ $# -gt 0 ]; do
+  case "$1" in
+    -m|--message)
+      TAG_MESSAGE="$2"
+      shift 2
+      ;;
+    *)
+      echo "Unknown flag: $1" >&2
+      exit 1
+      ;;
+  esac
+done
+TAG="v${VERSION}"
+cat <<EOF
+====================================================================
+AIWG cut-tag — preflight checks for ${TAG}
+====================================================================
+EOF
+# ---------------------------------------------------------------------------
+# 1. CalVer shape: YYYY.M.PATCH (no leading zeros on M or PATCH)
+# ---------------------------------------------------------------------------
+if ! [[ "$VERSION" =~ ^[0-9]{4}\.([1-9]|1[0-2])\.([0-9]|[1-9][0-9]+)$ ]]; then
+  cat <<EOF >&2
+FAIL: '$VERSION' does not match CalVer 'YYYY.M.PATCH'.
+       - Month must be 1-12 with NO leading zero (use '5' not '05').
+       - Patch must have no leading zero either.
+       See docs/contributing/versioning.md.
+EOF
+  exit 1
+fi
+echo "  [1/10] CalVer shape OK: $VERSION"
+# ---------------------------------------------------------------------------
+# 2. package.json lockstep
+# ---------------------------------------------------------------------------
+PKG_VERSION=$(node -e "console.log(JSON.parse(require('fs').readFileSync('package.json','utf8')).version)")
+if [ "$PKG_VERSION" != "$VERSION" ]; then
+  cat <<EOF >&2
+FAIL: package.json version is '$PKG_VERSION', expected '$VERSION'.
+       Run: npm version $VERSION --no-git-tag-version
+       Or edit package.json manually and commit.
+EOF
+  exit 1
+fi
+echo "  [2/10] package.json lockstep OK"
+# ---------------------------------------------------------------------------
+# 3. .claude-plugin/marketplace.json metadata.version lockstep (PUW-038 #1139)
+# ---------------------------------------------------------------------------
+MP_VERSION=$(node -e "console.log(JSON.parse(require('fs').readFileSync('.claude-plugin/marketplace.json','utf8')).metadata.version)")
+if [ "$MP_VERSION" != "$VERSION" ]; then
+  cat <<EOF >&2
+FAIL: .claude-plugin/marketplace.json metadata.version is '$MP_VERSION',
+       expected '$VERSION' (PUW-038 lockstep — #1139).
+       Update the file and commit.
+EOF
+  exit 1
+fi
+echo "  [3/10] marketplace.json lockstep OK"
+# ---------------------------------------------------------------------------
+# 4. CHANGELOG.md entry exists
+# ---------------------------------------------------------------------------
+if ! grep -q "^## \[${VERSION}\]" CHANGELOG.md; then
+  cat <<EOF >&2
+FAIL: CHANGELOG.md does not contain '## [${VERSION}]'.
+       Add the release entry before tagging — see docs/releases/.
+EOF
+  exit 1
+fi
+echo "  [4/10] CHANGELOG.md entry OK"
+# ---------------------------------------------------------------------------
+# 5. Release announcement file exists
+# ---------------------------------------------------------------------------
+ANNOUNCEMENT="docs/releases/v${VERSION}-announcement.md"
+if [ ! -f "$ANNOUNCEMENT" ]; then
+  cat <<EOF >&2
+FAIL: $ANNOUNCEMENT does not exist.
+       Create the release announcement before tagging.
+       See docs/releases/v2026.5.5-announcement.md as a template.
+EOF
+  exit 1
+fi
+echo "  [5/10] Announcement file OK"
+# ---------------------------------------------------------------------------
+# 6. Release-signing key available locally
+# ---------------------------------------------------------------------------
+if ! gpg --list-secret-keys "$RELEASE_KEY_FINGERPRINT" >/dev/null 2>&1; then
+  cat <<EOF >&2
+FAIL: Release-signing key $RELEASE_KEY_FINGERPRINT not found in local
+       GPG keyring. Either import it (gpg --import) or override the
+       fingerprint with \$AIWG_RELEASE_KEY_FINGERPRINT.
+       See docs/contributing/versioning.md → "Signing your release tag".
+EOF
+  exit 1
+fi
+echo "  [6/10] Release-signing key present locally"
+# ---------------------------------------------------------------------------
+# 7. Release-signing key published in maintainers.asc
+# ---------------------------------------------------------------------------
+# Extract fingerprints from the committed keyring and check ours is among
+# them. We use `gpg --show-keys --with-colons` so the output is parseable
+# without depending on locale.
+if ! gpg --show-keys --with-colons .gitea/keys/maintainers.asc 2>/dev/null \
+     | awk -F: '$1=="fpr" {print $10}' | grep -qx "$RELEASE_KEY_FINGERPRINT"; then
+  cat <<EOF >&2
+FAIL: Release-signing key $RELEASE_KEY_FINGERPRINT is NOT published in
+       .gitea/keys/maintainers.asc. The supply-chain gate will reject
+       the tag. Publish the key:
+         gpg --armor --export $RELEASE_KEY_FINGERPRINT >> .gitea/keys/maintainers.asc
+         git add .gitea/keys/maintainers.asc
+         git commit -m 'security: republish release signing key'
+EOF
+  exit 1
+fi
+echo "  [7/10] Release-signing key published in maintainers.asc"
+# ---------------------------------------------------------------------------
+# 8. Create signed tag with explicit release key (-u override)
+# ---------------------------------------------------------------------------
+if [ -z "$TAG_MESSAGE" ]; then
+  TODAY=$(date -u '+%Y-%m-%d')
+  TAG_MESSAGE="${TAG} — ${TODAY}"
+fi
+# Refuse to overwrite an existing local tag — operators must run
+# `git tag -d $TAG` explicitly first to acknowledge the destructive op.
+if git rev-parse "$TAG" >/dev/null 2>&1; then
+  cat <<EOF >&2
+FAIL: Tag '$TAG' already exists locally. Refusing to overwrite.
+       To recreate intentionally:
+         git tag -d $TAG
+         git push origin :refs/tags/$TAG    # if it was already pushed
+         $0 $VERSION                          # re-run this script
+EOF
+  exit 1
+fi
+git tag -s -u "$RELEASE_KEY_FINGERPRINT" "$TAG" -m "$TAG_MESSAGE"
+echo "  [8/10] Signed tag '$TAG' created with release key"
+# ---------------------------------------------------------------------------
+# 9. Local verify (mirror of the CI gate logic)
+# ---------------------------------------------------------------------------
+if ! git tag -v "$TAG" >/dev/null 2>&1; then
+  cat <<EOF >&2
+FAIL: Local 'git tag -v $TAG' verification did not succeed.
+       This means even the local GPG can't verify what it just signed —
+       check that the release key is not expired or revoked.
+       Deleting the bad tag for cleanup:
+EOF
+  git tag -d "$TAG" >/dev/null 2>&1 || true
+  exit 1
+fi
+echo "  [9/10] Local 'git tag -v' verification passed"
+# ---------------------------------------------------------------------------
+# 10. Report next steps; do NOT auto-push
+# ---------------------------------------------------------------------------
+cat <<EOF
+  [10/10] Ready to push. The push step is left manual on purpose —
+  inspect the tag once more with 'git tag -v $TAG' if you want, then:
+    git push origin main --tags
+    git push github main --tags    # if mirroring to GitHub
+  After push, watch for the gitea-release / npm-publish / github-mirror
+  workflows. The supply-chain gate runs first; a green gate is the
+  precondition for any artifact emission.
+====================================================================
+EOF