aiwg 2026.5.4 → 2026.5.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (39)
  1. package/CLAUDE.md +4 -4
  2. package/README.md +11 -0
  3. package/agentic/code/addons/agent-loop/agents/ralph-verifier.md +6 -0
  4. package/agentic/code/addons/agent-loop/manifest.json +2 -1
  5. package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md +18 -2
  6. package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md +16 -4
  7. package/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md +323 -0
  8. package/agentic/code/addons/agent-loop/skills/ralph/SKILL.md +21 -8
  9. package/agentic/code/addons/aiwg-utils/manifest.json +4 -2
  10. package/agentic/code/addons/aiwg-utils/rules/RULES-INDEX.md +6 -1
  11. package/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md +257 -0
  12. package/agentic/code/frameworks/sdlc-complete/skills/flow-release/SKILL.md +14 -0
  13. package/agentic/code/frameworks/sdlc-complete/templates/aiwg-sections/02b-discover-first.md +42 -0
  14. package/agentic/code/frameworks/sdlc-complete/templates/aiwg-sections/manifest.json +7 -0
  15. package/agentic/code/frameworks/sdlc-complete/templates/copilot/copilot-instructions.md.aiwg-template +11 -0
  16. package/dist/src/cli/handlers/use.d.ts.map +1 -1
  17. package/dist/src/cli/handlers/use.js +21 -0
  18. package/dist/src/cli/handlers/use.js.map +1 -1
  19. package/dist/src/cli/project-isolation/detect.d.ts +8 -0
  20. package/dist/src/cli/project-isolation/detect.d.ts.map +1 -0
  21. package/dist/src/cli/project-isolation/detect.js +70 -0
  22. package/dist/src/cli/project-isolation/detect.js.map +1 -0
  23. package/dist/src/cli/project-isolation/index.d.ts +6 -0
  24. package/dist/src/cli/project-isolation/index.d.ts.map +1 -0
  25. package/dist/src/cli/project-isolation/index.js +7 -0
  26. package/dist/src/cli/project-isolation/index.js.map +1 -0
  27. package/dist/src/cli/project-isolation/signals.d.ts +4 -0
  28. package/dist/src/cli/project-isolation/signals.d.ts.map +1 -0
  29. package/dist/src/cli/project-isolation/signals.js +17 -0
  30. package/dist/src/cli/project-isolation/signals.js.map +1 -0
  31. package/dist/src/cli/project-isolation/warning.d.ts +20 -0
  32. package/dist/src/cli/project-isolation/warning.d.ts.map +1 -0
  33. package/dist/src/cli/project-isolation/warning.js +102 -0
  34. package/dist/src/cli/project-isolation/warning.js.map +1 -0
  35. package/dist/src/mcp/tools/discovery.mjs +1 -1
  36. package/package.json +1 -1
  37. package/tools/agents/providers/hermes.mjs +21 -5
  38. package/tools/release/cut-tag.sh +250 -0
  39. package/tools/warp/setup-warp.mjs +73 -1
package/CLAUDE.md CHANGED
@@ -663,10 +663,10 @@ Before pushing a version tag:
663
663
  npm run uat:serve-live
664
664
  ```
665
665
  Tests skip cleanly when `AIWG_SANDBOX_ENDPOINT` is unset or unreachable, so this is a safe no-op gate. Run before any release that touches `src/serve/`, the executor contract, or the MC ↔ serve bridge.
666
- 5. **Commit and tag** - `git tag -m "vX.X.X" vX.X.X`
667
- 6. **Push tag to Gitea** - `git push origin main --tags` (automatically creates Gitea Release)
668
- 7. **Optionally mirror to GitHub** - `git push github main --tags`
669
- 8. **Update/Create GitHub Release manually** - via `gh release create|edit`
666
+ 5. **Commit the release prep**: `git commit` the package.json/CHANGELOG/announcement bump. Do NOT tag with plain `git tag -s` (or `git tag -a` when `tag.gpgSign` is enabled). Without `-u`, git signs with `user.signingkey`, which is typically the maintainer's *personal commit-signing key*. That is the wrong key for tags, and the supply-chain gate `tools/ci/verify-signed-tag.sh` will reject the tag in CI.
667
+ 6. **Cut the tag via the wrapper**: `tools/release/cut-tag.sh <X.Y.Z>`. The wrapper runs 10 pre-tag checks. These include CalVer shape, `package.json` + `marketplace.json` lockstep, a CHANGELOG entry, the announcement file, and the release-signing key being both present locally AND published in `.gitea/keys/maintainers.asc`. It then signs with `-u <RELEASE_KEY_FINGERPRINT>` (default: `FE9272F0BC5781E1DE77FAAA719AB63879E84CE8`, the `AIWG Release Signing <release@aiwg.io>` key per the two-key model from commit `a13dabc5`). See the v2026.5.5 incident note in `docs/contributing/versioning.md` for what happens when this step is skipped.
668
+ 7. **Push tag to Gitea**: `git push origin main --tags`. Triggers `gitea-release.yml` + `npm-publish.yml` (both gated on signed-tag verification).
669
+ 8. **Mirror to GitHub**: `git push github main --tags`. Triggers `github-mirror.yml`, which creates the GitHub Release using `docs/releases/v<version>-announcement.md` as the body. **No manual `gh release create` is needed for stable releases**; the workflow handles it. Pre-release tags (`-rc.*`, `-alpha.*`, `-beta.*`, `-nightly.*`) skip GitHub Release creation by design.
670
670
 
671
671
  ### Version Format
672
672
 
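Steps 6 and 8 depend on the shape of the tag. `cut-tag.sh` checks CalVer format and version lockstep, and the GitHub mirror skips pre-release suffixes. The TypeScript sketch below models those predicates under stated assumptions. It is not the wrapper's actual logic. It assumes a `vYYYY.M.N` tag (matching versions like `2026.5.6`) and treats `-rc.`, `-alpha.`, `-beta.`, and `-nightly.` as the only pre-release markers. All function names are hypothetical.

```typescript
// Hypothetical helpers; the exact rules enforced by tools/release/cut-tag.sh may differ.
const PRERELEASE = /-(rc|alpha|beta|nightly)\./;
// Assumed CalVer shape: vYYYY.M.N without leading zeros, optional pre-release suffix.
const CALVER_TAG = /^v(\d{4})\.([1-9]\d?)\.(0|[1-9]\d*)(-(rc|alpha|beta|nightly)\.[0-9A-Za-z.]+)?$/;

// True when the tag matches the assumed shape and the month is 1-12.
function isCalVerTag(tag: string): boolean {
  const m = CALVER_TAG.exec(tag);
  if (!m) return false;
  const month = Number(m[2]);
  return month >= 1 && month <= 12;
}

// Pre-release tags skip GitHub Release creation in the mirror workflow.
function isPrerelease(tag: string): boolean {
  return PRERELEASE.test(tag);
}

// Lockstep: the tag, package.json version, and marketplace.json version must agree.
function inLockstep(tag: string, packageVersion: string, marketplaceVersion: string): boolean {
  return tag === `v${packageVersion}` && packageVersion === marketplaceVersion;
}
```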
package/README.md CHANGED
@@ -49,6 +49,17 @@ AIWG is a deployment tool and support utility for AI context. At its core, `aiwg
49
49
 
50
50
  Around that core, AIWG ships utilities for things the base platforms do not handle on their own: persistent artifact memory (`.aiwg/`), background orchestration (`aiwg mc`), autonomous loops (`aiwg ralph`), artifact indexing (`aiwg index`), cost telemetry, health diagnostics, and more. Most are opt-in. The deployment layer works standalone as plain text files the platform reads natively.
51
51
 
52
+ ### Project scope (recommended) vs user scope (global)
53
+
54
+ `aiwg use` writes artifacts at one of two scopes. Both are supported as first-class options (see ADR-NUA-001 in `.aiwg/studies/novice-user-adoption/`):
55
+
56
+ - **Project scope** — default. Run `aiwg use sdlc` from a project root and the artifacts land in `./.claude/agents/`, `./.claude/skills/`, etc. One project's agent set never bleeds into another's session. **This is the recommended default for most use cases.**
57
+ - **User scope (global install)** — `aiwg use sdlc --scope user` writes to `~/.claude/agents/`, `~/.claude/skills/`, etc. Same artifact set loads into every session, regardless of project. Fits "AIWG in every conversation" workflows and is the canonical mode for OpenClaw and Hermes (whose primary discovery is user-scope).
58
+
59
+ The trade-off is real: when the same agent set loads into every session, context from one project can bleed into reasoning about another. Research (REF-720, *Lost in Multi-Turn Conversation*, MSR/Salesforce 2025) measured an average 39% performance drop when a task is spread across a multi-turn conversation rather than given in a single, fully specified prompt. Cross-project context bleed is a related risk. The non-blocking project-isolation warning surfaces the trade-off at deploy time so the scope choice is informed. Neither scope is wrong; pick the one that fits the workflow.
60
+
61
+ See `docs/cli-reference.md` (under `aiwg use` → "Scope models") for the per-provider details and the global-install rough-edge inventory.
62
+
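In practice, the scope rule comes down to choosing a base directory. The sketch below is illustrative only. `scopeFromArgs` and `claudeArtifactDir` are hypothetical helpers, not AIWG's implementation. Only the Claude Code layout described above (`./.claude/...` versus `~/.claude/...`) is modeled, and only the space-separated `--scope user` form is parsed.

```typescript
import { join } from "node:path";
import { homedir } from "node:os";

type Scope = "project" | "user";
type ArtifactKind = "agents" | "skills";

// Project scope is the default; `--scope user` switches to the global install.
function scopeFromArgs(argv: string[]): Scope {
  const i = argv.indexOf("--scope");
  return i !== -1 && argv[i + 1] === "user" ? "user" : "project";
}

// Project scope writes under the project root, user scope under the home directory.
function claudeArtifactDir(
  scope: Scope,
  kind: ArtifactKind,
  projectRoot: string = process.cwd(),
  home: string = homedir(),
): string {
  const base = scope === "user" ? home : projectRoot;
  return join(base, ".claude", kind);
}
```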
52
63
  ## Simple Building Blocks
53
64
 
54
65
  AIWG ships five primitive artifact types. All are plain text:
package/agentic/code/addons/agent-loop/agents/ralph-verifier.md CHANGED
@@ -14,6 +14,12 @@ allowed-tools: Bash, Read, Glob
14
14
 
15
15
  You verify completion criteria for agent loops - determining if a task iteration succeeded by running verification commands and analyzing their output.
16
16
 
17
+ ## Companion skill
18
+
19
+ When the loop is started without explicit `--completion`, the criterion you verify is produced by the `infer-completion-criteria` skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). It derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts.
20
+
21
+ You do not run that skill yourself — the loop orchestrator (`ralph-loop` agent or external launcher) calls it during initialization. Your job is to take whatever criterion is in the loop state and verify it. The skill writes its rationale into `.aiwg/ralph/<loop-id>/progress.md` (or `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` for external loops); when reporting verification results, you may reference that rationale so the user sees the full evidence chain.
22
+
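The verifier's core contract is to run the criterion's command and treat exit code 0 as a pass. Below is a minimal Node.js sketch of that check. It assumes the command runs through the system shell with a timeout. `verify` and `VerificationResult` are hypothetical names, not the agent's actual tooling.

```typescript
import { spawnSync } from "node:child_process";

interface VerificationResult {
  passed: boolean;
  exitCode: number | null; // null when the process was killed (e.g. by the timeout)
  output: string;          // combined stdout + stderr, usable as evidence in the report
}

// Runs a verification command through the shell; only exit code 0 counts as a pass.
function verify(command: string, cwd: string = process.cwd(), timeoutMs = 600_000): VerificationResult {
  const result = spawnSync(command, { cwd, shell: true, encoding: "utf8", timeout: timeoutMs });
  return {
    passed: result.status === 0,
    exitCode: result.status,
    output: `${result.stdout ?? ""}${result.stderr ?? ""}`,
  };
}
```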
17
23
  ## Capabilities
18
24
 
19
25
  ### Verification Methods
package/agentic/code/addons/agent-loop/manifest.json CHANGED
@@ -50,7 +50,8 @@
50
50
  "execute-feedback",
51
51
  "reflection-injection",
52
52
  "auto-test-execution",
53
- "mission-control"
53
+ "mission-control",
54
+ "infer-completion-criteria"
54
55
  ],
55
56
  "agents": [
56
57
  "ralph-loop",
package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md CHANGED
@@ -74,7 +74,19 @@ Alternate expressions and non-obvious activations (primary phrases are matched a
74
74
 
75
75
  ### Completion Inference
76
76
 
77
- When user doesn't specify explicit verification:
77
+ When the user doesn't specify explicit verification, delegate to the **`infer-completion-criteria`** skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). That skill runs a deterministic 5-layer pipeline:
78
+
79
+ 1. **Task verb** → criterion class (test-pass, type-clean, regression-gate, coverage, lint-clean, build-pass, implement-feature)
80
+ 2. **Project context files** (CLAUDE.md / AGENTS.md / AIWG.md) → canonical commands from the Development section
81
+ 3. **Package manifests** (`package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.) → discovered scripts
82
+ 4. **CI configuration** (`.github/workflows/`, `.gitea/workflows/`, GitLab/CircleCI/Jenkins) → team's actual "passes" definition
83
+ 5. **`.aiwg/` artifacts** (test-strategy, related use cases by ID match, prior progress files) → project-specific gates
84
+
85
+ The synthesized criterion is validated against the `vague-discretion` rule, and the skill emits a structured YAML proposal containing the criterion, verification command, rationale chain, confidence level, and alternatives considered.
86
+
87
+ **Use the inline table below ONLY as a last-resort fallback** when the inference skill is unavailable (degraded environment, missing skill deployment). It is intentionally narrow (JavaScript/Node-centric) and reflects the behavior before `infer-completion-criteria` was added.
88
+
89
+ Legacy fallback table:
78
90
 
79
91
  | Task Pattern | Inferred Completion |
80
92
  |--------------|---------------------|
@@ -86,6 +98,8 @@ When user doesn't specify explicit verification:
86
98
  | "migrate to ESM" | "node runs without errors" |
87
99
  | "refactor X" | "npm test passes" (preserve behavior) |
88
100
 
101
+ When the inference skill IS available, prefer it. The skill handles multi-language projects, monorepos, CI-defined gates, use-case acceptance criteria, and the refusal case (truly vague tasks like "make it better" that have no measurable criterion).
102
+
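The resolution order described above can be sketched as a single resolver: explicit flag first, then the inference skill, then the legacy table. This sketch makes three assumptions. The inference skill is modeled as an optional callback that returns `null` when it refuses a vague task. Only two rows of the legacy table are encoded. The names `resolveCompletion` and `LEGACY_FALLBACK` are hypothetical.

```typescript
// Returns a criterion, or null when the task is too vague to measure.
type InferSkill = (task: string) => string | null;

// Two rows from the legacy fallback table; the real table has more entries.
const LEGACY_FALLBACK: Array<[RegExp, string]> = [
  [/\bmigrate\b.*\besm\b/i, "node runs without errors"],
  [/\brefactor\b/i, "npm test passes"],
];

function resolveCompletion(task: string, explicit?: string, infer?: InferSkill): string | null {
  // An explicit --completion is authoritative.
  if (explicit) return explicit;
  // Prefer the inference skill; a refusal (null) is respected, not overridden by the table.
  if (infer) return infer(task);
  // Degraded environment: fall back to the narrow legacy table.
  for (const [pattern, criterion] of LEGACY_FALLBACK) {
    if (pattern.test(task)) return criterion;
  }
  return null;
}
```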
89
103
  ### Examples
90
104
 
91
105
  **User**: "ralph this: migrate all files in lib/ to ESM"
@@ -301,7 +315,9 @@ User: "actually, abort that and just fix the login bug"
301
315
 
302
316
  ## Related
303
317
 
304
- - `ralph` skill - the iterative loop executor implementation
318
+ - `infer-completion-criteria` skill - derives measurable `--completion` from project state when the user doesn't supply one
319
+ - `ralph` skill - the iterative loop executor implementation (legacy name; `agent-loop` is canonical)
320
+ - `agent-loop-ext` skill - crash-resilient external loop with state persistence
305
321
  - `ralph-status` skill - check loop progress
306
322
  - `ralph-resume` skill - continue interrupted loops
307
323
  - `ralph-abort` skill - abort active loops
package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md CHANGED
@@ -5,7 +5,7 @@ legacyName: ralph-external
5
5
  platforms: [all]
6
6
  description: Crash-resilient external agent loop with state persistence and CI/CD integration
7
7
  commandHint:
8
- argumentHint: "\"<objective>\" --completion \"<criteria>\" [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet]"
8
+ argumentHint: "\"<objective>\" [--completion \"<criteria>\"] [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet] [--auto-criteria | --no-infer-completion]"
9
9
  allowedTools: Bash, Read, Write
10
10
  model: sonnet
11
11
  category: automation
@@ -60,7 +60,7 @@ Users may say:
60
60
  ### Objective (required)
61
61
  The task the loop should accomplish. Passed as the first positional argument.
62
62
 
63
- ### --completion (required)
63
+ ### --completion (optional — inferred when omitted)
64
64
  Success criteria as a verifiable command. The loop exits when this command returns exit code 0.
65
65
 
66
66
  **Good examples**:
@@ -68,6 +68,14 @@ Success criteria as a verifiable command. The loop exits when this command retur
68
68
  - `--completion "npx tsc --noEmit exits with code 0"`
69
69
  - `--completion "coverage report shows >80%"`
70
70
 
71
+ **When omitted**: the launcher invokes the `infer-completion-criteria` skill before the external loop starts. The skill derives a measurable criterion from project state (CLAUDE.md / AGENTS.md / AIWG.md, package manifests, CI configuration, `.aiwg/` artifacts) and emits a structured proposal with rationale. The proposal is written to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` and used as the loop's gate.
72
+
73
+ Because `agent-loop-ext` runs externally (potentially headless / in CI), the confirmation flow is:
74
+ - Interactive session (TTY attached): show proposal, accept `Y / n / edit` like the in-session `ralph` skill
75
+ - Non-interactive / `--auto-criteria` / CI environment: use the inferred criterion if confidence is `high`, otherwise fail fast and print the proposal as a diagnostic so the user can re-launch with `--completion` explicitly
76
+
77
+ Pass `--no-infer-completion` to require explicit `--completion` and fail before launch if missing. See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`.
78
+
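The interactivity rules for `agent-loop-ext` fit in a small decision function. This sketch treats a session as interactive only when three conditions hold: stdin and stdout are both TTYs, `--auto-criteria` is absent, and a `CI` environment variable is not set. The `CI` check is a common convention, not something this skill specifies. `confirmationPolicy` and `detectSession` are hypothetical.

```typescript
type Confidence = "high" | "medium" | "low";
type Decision = "prompt" | "adopt" | "fail";

interface Session {
  tty: boolean;
  autoCriteria: boolean;
  ci: boolean;
}

// Assumption: a non-empty CI variable other than "false" marks a CI environment.
function detectSession(autoCriteria: boolean, env: Record<string, string | undefined> = process.env): Session {
  const ciValue = env.CI;
  return {
    tty: Boolean(process.stdin.isTTY && process.stdout.isTTY),
    autoCriteria,
    ci: ciValue !== undefined && ciValue !== "" && ciValue !== "false",
  };
}

// Interactive sessions show the proposal (Y / n / edit). Headless runs auto-adopt only
// a high-confidence criterion and otherwise fail fast, printing the proposal as a diagnostic.
function confirmationPolicy(session: Session, confidence: Confidence): Decision {
  const interactive = session.tty && !session.autoCriteria && !session.ci;
  if (interactive) return "prompt";
  return confidence === "high" ? "adopt" : "fail";
}
```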
71
79
  ### --max-iterations (default: 10)
72
80
  Maximum iterations before the loop halts and saves state for manual review.
73
81
 
@@ -90,8 +98,12 @@ Suppress verbose progress output. Completion banner is always shown.
90
98
 
91
99
  When triggered:
92
100
 
93
- 1. Validate that `--completion` criteria are specified and verifiable
94
- 2. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
101
+ 1. **Resolve completion criteria**:
102
+ - If `--completion` is provided use it directly
103
+ - Else if `--no-infer-completion` is set → fail fast before launch with a helpful error
104
+ - Else → invoke `infer-completion-criteria` skill, persist proposal to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml`, confirm or auto-adopt per session-interactivity rules above
105
+ 2. Validate the resolved criterion is verifiable (can be checked via command)
106
+ 3. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
95
107
  3. Generate a unique `loop-id` (8-character hex) and create the loop state file at `.aiwg/ralph-external/loops/<loop-id>.json`
96
108
  4. Write the initial state: `{ objective, completionCriteria, maxIterations, timeout, provider, status: "pending", iteration: 0 }`
97
109
  5. If `--branch` is specified, create the git branch now
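The opening launch steps can be sketched as below: resolve the criterion, generate an 8-character hex `loop-id`, and build the pending state. The sketch makes three assumptions. The inference skill is a callback. `--auto-criteria` and `--no-infer-completion` are treated as mutually exclusive, which is one reading of the `[--auto-criteria | --no-infer-completion]` hint. The state is returned rather than written to disk. `prepareLaunch` is hypothetical.

```typescript
import { randomBytes } from "node:crypto";

interface LaunchOptions {
  objective: string;
  completion?: string;
  noInferCompletion?: boolean;
  autoCriteria?: boolean;
  maxIterations?: number;
  timeout?: number;
  provider?: string;
}

interface LoopState {
  objective: string;
  completionCriteria: string;
  maxIterations: number;
  timeout?: number;
  provider?: string;
  status: "pending";
  iteration: number;
}

function prepareLaunch(opts: LaunchOptions, infer: (objective: string) => string) {
  if (opts.autoCriteria && opts.noInferCompletion) {
    throw new Error("--auto-criteria and --no-infer-completion cannot be combined");
  }
  // Explicit criterion wins; otherwise fail fast or infer.
  let criterion = opts.completion;
  if (!criterion) {
    if (opts.noInferCompletion) throw new Error("--no-infer-completion requires an explicit --completion");
    criterion = infer(opts.objective);
  }
  // 4 random bytes render as exactly 8 hex characters.
  const loopId = randomBytes(4).toString("hex");
  const state: LoopState = {
    objective: opts.objective,
    completionCriteria: criterion,
    maxIterations: opts.maxIterations ?? 10,
    timeout: opts.timeout,
    provider: opts.provider,
    status: "pending",
    iteration: 0,
  };
  return { loopId, statePath: `.aiwg/ralph-external/loops/${loopId}.json`, state };
}
```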
package/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md ADDED
@@ -0,0 +1,323 @@
1
+ ---
2
+ namespace: aiwg
3
+ name: infer-completion-criteria
4
+ aliases: [agent-loop-infer-completion, al-infer-completion, ralph-infer-completion]
5
+ platforms: [all]
6
+ description: Infer measurable completion criteria for an agent-loop task from project docs, code, and AIWG standards when the user has not supplied --completion explicitly
7
+ commandHint:
8
+ argumentHint: '"<task description>" [--task-type code|test|docs|refactor] [--non-interactive]'
9
+ allowedTools: "Read, Glob, Grep, Bash"
10
+ model: sonnet
11
+ category: automation
12
+ ---
13
+
14
+ # Infer Completion Criteria
15
+
16
+ ## Purpose
17
+
18
+ When a user starts an `agent-loop` task without supplying `--completion`, this skill derives a measurable, verifiable completion criterion from project state. The output must satisfy the `vague-discretion` rule: a concrete shell command or file-inspection check that returns pass/fail unambiguously.
19
+
20
+ Iteration is only as good as its gate. A loop with a vague gate ("until it's done") runs forever or exits prematurely. This skill is what turns "agent-loop this" into "agent-loop this until `<measurable thing>`."
21
+
22
+ The canonical name for the iterative-loop addon is **agent-loop**. `ralph` is the legacy name for the executor skill, retained as an alias, and `al` is a short form. The detection/routing skill is `agent-loop` (which delegates to this skill when criteria are missing); the executor is `ralph` (canonical name forthcoming). Wherever this skill says "agent-loop", read "ralph" as the legacy equivalent.
23
+
24
+ ## When This Skill Runs
25
+
26
+ This skill is invoked by:
27
+
28
+ - The `agent-loop` detection-and-routing skill when it parses a user request without explicit completion criteria
29
+ - The `ralph` executor skill during Phase 1 initialization when `--completion` is omitted
30
+ - The `agent-loop-ext` external-loop launcher during pre-launch resolution when `--completion` is omitted
31
+ - Direct invocation via `aiwg discover "infer completion"` → `aiwg show skill infer-completion-criteria` when a user wants to preview the inferred criterion before committing to a loop
32
+
33
+ This skill does **not** run when `--completion` is explicit. The user's word is authoritative.
34
+
35
+ ## Inference Pipeline
36
+
37
+ The skill is a deterministic walk through five evidence layers, plus one synthesis step. Each layer contributes candidate criteria; the synthesis picks the strongest measurable one and explains the chain of evidence.
38
+
39
+ ### Layer 1 — The task verb
40
+
41
+ Parse the user's task description for an intent verb. Map to a default criterion class:
42
+
43
+ | Verb / phrase | Criterion class |
44
+ |---|---|
45
+ | "fix tests", "make tests pass", "test failure" | Test suite passes (exit 0) |
46
+ | "add tests", "increase coverage", "test coverage" | Coverage threshold met |
47
+ | "fix types", "type errors", "migrate to typescript" | Type checker exits 0 |
48
+ | "fix lint", "clean up warnings", "style" | Linter exits 0 |
49
+ | "build", "make it compile" | Build command exits 0 |
50
+ | "refactor", "extract", "rename" | Tests still pass AND build still passes (regression gate) |
51
+ | "implement <X>", "add feature <X>" | Tests for the new code exist and pass |
52
+ | "document", "add docs", "JSDoc" | Coverage check on docstrings/JSDoc presence |
53
+ | "fix bug", "resolve issue #N" | Specific test for that bug passes AND existing suite still green |
54
+ | "migrate", "upgrade" | Build + test + lint all green (no regression) |
55
+
56
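+ If the verb is ambiguous but the task still names a concrete change, the skill falls back to "regression gate" (build + test + lint all green) as the safest default. A task with no actionable intent at all, such as "make the code better", is refused instead (see Example 4).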
57
+
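Layer 1 can be approximated by an ordered keyword matcher. This is only a sketch. The regular expressions are loose stand-ins for the phrases in the table. Rule order resolves overlaps; for example, "migrate to typescript" is checked before plain "migrate". Unmatched tasks fall back to the regression gate. The class identifiers and `classifyTask` are hypothetical.

```typescript
type CriterionClass =
  | "test-pass" | "coverage" | "type-clean" | "lint-clean" | "build-pass"
  | "regression-gate" | "implement-feature" | "docs-coverage" | "bug-fix" | "migration";

// Ordered rules: more specific phrases come first so they win over generic verbs.
const RULES: Array<[RegExp, CriterionClass]> = [
  [/\bmigrate to typescript\b|\btype errors?\b|\bfix (the )?types\b/, "type-clean"],
  [/\badd tests\b|\bcoverage\b/, "coverage"],
  [/\bfix\b.*\btests?\b|\btests? fail|\bmake tests pass\b/, "test-pass"],
  [/\bfix\b.*\bbug\b|\bresolve issue\b/, "bug-fix"],
  [/\bfix lint\b|\bwarnings\b|\bstyle\b/, "lint-clean"],
  [/\bmake it compile\b|\bbuild\b/, "build-pass"],
  [/\brefactor\b|\bextract\b|\brename\b/, "regression-gate"],
  [/\bimplement\b|\badd feature\b/, "implement-feature"],
  [/\bdocument\b|\badd docs\b|\bjsdoc\b/, "docs-coverage"],
  [/\bmigrate\b|\bupgrade\b/, "migration"],
];

function classifyTask(task: string): CriterionClass {
  const text = task.toLowerCase();
  for (const [pattern, cls] of RULES) {
    if (pattern.test(text)) return cls;
  }
  return "regression-gate"; // unrecognized verb: safest default per Layer 1
}
```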
58
+ ### Layer 2 — Project conventions in CLAUDE.md / AGENTS.md / AIWG.md
59
+
60
+ Read the project's context files. AIWG-managed projects often declare commands directly:
61
+
62
+ ```bash
63
+ # Run tests
64
+ npm test
65
+
66
+ # Type check
67
+ npx tsc --noEmit
68
+
69
+ # Lint markdown
70
+ npm exec markdownlint-cli2 "**/*.md"
71
+ ```
72
+
73
+ Extract these as the canonical commands for their respective domains. The Development section of `CLAUDE.md` is the highest-trust source here — it's what the project's maintainers run.
74
+
75
+ Also scan for explicit completion-criterion conventions. Some projects state "a commit is not finished until CI passes" — that signals the CI command (or equivalent local invocation) is the gate.
76
+
77
+ ### Layer 3 — Package manifests and config
78
+
79
+ Inspect the project's manifest files to discover scripts and tools:
80
+
81
+ | Manifest | Where to look |
82
+ |---|---|
83
+ | `package.json` | `scripts.test`, `scripts.lint`, `scripts.build`, `scripts.coverage`, `scripts.typecheck` |
84
+ | `Cargo.toml` | implies `cargo test`, `cargo build`, `cargo clippy` |
85
+ | `pyproject.toml` | `[tool.pytest]`, `[tool.ruff]`, `[tool.mypy]`, `scripts.*` |
86
+ | `go.mod` | implies `go test ./...`, `go vet ./...`, `go build ./...` |
87
+ | `Gemfile` | implies `bundle exec rspec`, `bundle exec rubocop` |
88
+ | `pom.xml` / `build.gradle` | `mvn test`, `mvn verify`, `gradle test` |
89
+ | `.tool-versions` / `mise.toml` | language version pins inform which tool is canonical |
90
+
91
+ When multiple scripts exist (e.g. `test`, `test:unit`, `test:integration`), prefer the script the project's own docs reference. If the docs don't reference any, prefer the most specific match to the task verb (e.g. for "fix integration test" → `test:integration`).
92
+
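The script-preference rule can be sketched over a parsed `package.json` `scripts` object. Two assumptions apply. A script counts as "referenced by the docs" when the docs contain `npm run <name>`, or `npm test` for the plain `test` script. Task specificity is judged by the script's suffix after `test:`. `pickTestScript` is hypothetical.

```typescript
// Escape a script name for safe use inside a RegExp.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function pickTestScript(scripts: Record<string, string>, docsText: string, task: string): string | undefined {
  const candidates = Object.keys(scripts).filter((n) => n === "test" || n.startsWith("test:"));
  // A script the project's own docs invoke wins; the lookahead stops "test" matching "test:unit".
  const referenced = candidates.find(
    (n) =>
      new RegExp(`npm run ${escapeRegExp(n)}(?![\\w:-])`).test(docsText) ||
      (n === "test" && /\bnpm test(?![\w:-])/.test(docsText)),
  );
  if (referenced) return referenced;
  // Otherwise prefer the script whose suffix appears in the task ("integration" → test:integration).
  const lowerTask = task.toLowerCase();
  const specific = candidates.find((n) => {
    const suffix = n.split(":")[1];
    return suffix !== undefined && suffix !== "" && lowerTask.includes(suffix);
  });
  // Fall back to the generic script, if any.
  return specific ?? (candidates.includes("test") ? "test" : candidates[0]);
}
```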
93
+ ### Layer 4 — CI configuration
94
+
95
+ CI files encode the team's actual definition of "passes":
96
+
97
+ | CI system | Scan |
98
+ |---|---|
99
+ | GitHub Actions | `.github/workflows/*.yml` — extract `run:` steps from non-deploy jobs |
100
+ | Gitea Actions | `.gitea/workflows/*.yml` — same |
101
+ | GitLab CI | `.gitlab-ci.yml` — extract `script:` from test/lint jobs |
102
+ | CircleCI | `.circleci/config.yml` |
103
+ | Jenkins | `Jenkinsfile` |
104
+
105
+ The first non-trivial verification step in the primary workflow is the team's canonical "done" gate. If CI runs `npm test && npm run lint && npm run typecheck` in order, the inferred criterion is "all three exit 0."
106
+
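Once a workflow file has been parsed (with any YAML parser), collecting its verification steps is a short walk over its jobs. The heuristics below are assumptions, not the skill's rules. Jobs whose ID mentions deploy, release, or publish are skipped. Install steps are treated as setup. Multi-line `run:` blocks are split into one command per line. `ciGate` and the `Workflow` shape are hypothetical and cover only the fields used.

```typescript
interface WorkflowStep { name?: string; uses?: string; run?: string }
interface Workflow { jobs: Record<string, { steps?: WorkflowStep[] }> }

const DEPLOY_JOB = /deploy|release|publish/i;
const SETUP_STEP = /^(npm (ci|install)|pnpm install|yarn install)\b|^yarn$/;

// Joins the non-setup `run:` commands of non-deploy jobs into one && gate.
function ciGate(workflow: Workflow): string | undefined {
  const commands: string[] = [];
  for (const [jobId, job] of Object.entries(workflow.jobs)) {
    if (DEPLOY_JOB.test(jobId)) continue;
    for (const step of job.steps ?? []) {
      for (const line of (step.run ?? "").split("\n")) {
        const cmd = line.trim();
        if (cmd !== "" && !SETUP_STEP.test(cmd)) commands.push(cmd);
      }
    }
  }
  return commands.length > 0 ? commands.join(" && ") : undefined;
}
```

For a workflow that runs `npm test`, `npm run lint`, and `npm run typecheck`, this yields the three-command gate described above.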
107
+ ### Layer 5 — AIWG artifacts
108
+
109
+ If the project has a `.aiwg/` directory, scan for relevant context:
110
+
111
+ - `.aiwg/testing/test-strategy.md` — declared verification approach
112
+ - `.aiwg/architecture/software-architecture-doc.md` — architectural quality gates
113
+ - `.aiwg/security/security-gates.md` — security-related criteria
114
+ - `.aiwg/quality/code-review-guide.md` — code quality bars
115
+ - `.aiwg/activity.log` — recent operations that may indicate what "done" looked like for similar past tasks
116
+ - `.aiwg/working/<related-progress-files>.md` — prior task progress files; mine the "Completion criteria" sections
117
+
118
+ If the project has a related use case (`.aiwg/requirements/UC-*.md`) whose ID is in the task description, pull that use case's acceptance criteria — those ARE the completion criteria.
119
+
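Matching a use case by ID and pulling its acceptance criteria might look like the sketch below. It makes three assumptions. IDs follow the `UC-<AREA>-<NNN>` shape seen in Example 3. Use-case files are named `<ID>.md` or `<ID>-<slug>.md`. Acceptance criteria are Markdown checklist items. Both functions are hypothetical.

```typescript
const UC_ID = /\bUC-[A-Z]+-\d+\b/g;

// Returns the first requirements file whose name starts with an ID mentioned in the task.
function findUseCaseFile(task: string, requirementFiles: string[]): string | undefined {
  const ids = task.match(UC_ID) ?? [];
  for (const id of ids) {
    const hit = requirementFiles.find((f) => f === `${id}.md` || f.startsWith(`${id}-`));
    if (hit) return hit;
  }
  return undefined;
}

// Extracts "- [ ]" / "- [x]" checklist items; these ARE the completion criteria.
function acceptanceCriteria(markdown: string): string[] {
  return markdown
    .split(/\r?\n/)
    .map((line) => /^\s*- \[[ xX]\] (.+)$/.exec(line)?.[1]?.trim())
    .filter((item): item is string => item !== undefined);
}
```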
120
+ ### Synthesis step
121
+
122
+ Combine the layers into a single proposed criterion. The decision logic:
123
+
124
+ 1. If a use case in `.aiwg/requirements/` matches the task, use its acceptance criteria verbatim. Done.
125
+ 2. Otherwise, take the verb-class default from Layer 1 and instantiate it using the canonical command from Layer 2 (CLAUDE.md) > Layer 4 (CI) > Layer 3 (manifest).
126
+ 3. If the task is in the "regression gate" class, AND the project's CI runs more than one verification, combine them: `command-A passes AND command-B passes AND command-C passes`.
127
+ 4. If no canonical command is found in any layer (unusual — typically only on empty-scaffold projects), fall back to:
128
+ - `<file or change exists in git diff against HEAD~1>` — pure structural check
129
+ - And inform the user that a substantive verification command should be added.
130
+
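The synthesis precedence can be sketched as follows. This is one reading of steps 1 to 4. It assumes each layer's commands have already been filtered to the task's criterion class. When the class is a regression gate, every command from the winning layer is chained with `&&`. `synthesize` and `Evidence` are hypothetical.

```typescript
type Layer = "use-case" | "claude-md" | "ci" | "manifest";

interface Evidence {
  useCaseCriteria?: string[]; // Layer 5: acceptance criteria of a matching use case
  claudeMd?: string[];        // Layer 2: commands from the Development section
  ci?: string[];              // Layer 4: verification steps from the primary workflow
  manifest?: string[];        // Layer 3: scripts discovered in package manifests
}

type Synthesis =
  | { source: "use-case"; acceptanceCriteria: string[] }
  | { source: Exclude<Layer, "use-case">; command: string }
  | { source: "structural" }; // no command anywhere: fall back to a structural check

function synthesize(ev: Evidence, regressionGate: boolean): Synthesis {
  // A matching use case wins outright; its acceptance criteria are used verbatim.
  if (ev.useCaseCriteria && ev.useCaseCriteria.length > 0) {
    return { source: "use-case", acceptanceCriteria: ev.useCaseCriteria };
  }
  // Command precedence: CLAUDE.md > CI > manifest.
  const ordered: Array<[Exclude<Layer, "use-case">, string[] | undefined]> = [
    ["claude-md", ev.claudeMd],
    ["ci", ev.ci],
    ["manifest", ev.manifest],
  ];
  for (const [source, commands] of ordered) {
    if (!commands || commands.length === 0) continue;
    const command = regressionGate ? commands.join(" && ") : commands[0];
    return { source, command };
  }
  return { source: "structural" };
}
```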
131
+ ### Apply AIWG standards (vague-discretion)
132
+
133
+ Validate the proposal against the `vague-discretion` rule:
134
+
135
+ - Criterion must be expressible as a shell command (or shell pipeline) that exits 0 on pass, non-zero on fail.
136
+ - Criterion must NOT use the words "good enough", "thorough", "comprehensive", "complete" without a measurable suffix.
137
+ - Criterion must NOT be self-referential ("the agent is satisfied" — no).
138
+ - Criterion must have an implicit or explicit `max-iterations` cap (ralph's default 10 is the floor; very large refactors may need 20).
139
+
140
+ If the proposal fails any of these, regenerate. If after two regenerations the proposal still fails, surface the problem to the user with the diagnostic ("could not find a measurable verification command — please supply one explicitly").
141
+
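The validate-and-regenerate loop can be sketched as a checker plus a retry wrapper. The checks are deliberately crude stand-ins for the rule. A vague word is accepted only if a number, comparator, or percentage follows it. Self-reference is detected by a keyword pattern. The `max-iterations` floor is read as a minimum of 10. `violations` and `proposeWithRetries` are hypothetical.

```typescript
interface Proposal {
  criterion: string;
  verificationCommand: string;
  maxIterations?: number; // undefined means the implicit default cap
}

const VAGUE_WORDS = ["good enough", "thorough", "comprehensive", "complete"];
const MEASURABLE = /\d|[<>]=?|%/; // heuristic: a number, comparator, or percentage
const SELF_REFERENTIAL = /\b(agent|assistant|model)\b.*\b(satisfied|happy|confident)\b/i;
const DEFAULT_MAX_ITERATIONS = 10;

function violations(p: Proposal): string[] {
  const problems: string[] = [];
  if (p.verificationCommand.trim() === "") problems.push("criterion has no shell command");
  const text = p.criterion.toLowerCase();
  for (const word of VAGUE_WORDS) {
    const match = new RegExp(`\\b${word}\\b`).exec(text);
    if (match && !MEASURABLE.test(text.slice(match.index + word.length))) {
      problems.push(`"${word}" used without a measurable suffix`);
    }
  }
  if (SELF_REFERENTIAL.test(p.criterion)) problems.push("criterion is self-referential");
  if ((p.maxIterations ?? DEFAULT_MAX_ITERATIONS) < DEFAULT_MAX_ITERATIONS) {
    problems.push(`max-iterations below the floor of ${DEFAULT_MAX_ITERATIONS}`);
  }
  return problems;
}

// Initial attempt plus up to two regenerations, then surface the problem to the user.
function proposeWithRetries(generate: (attempt: number) => Proposal, maxRegenerations = 2): Proposal {
  for (let attempt = 0; attempt <= maxRegenerations; attempt++) {
    const proposal = generate(attempt);
    if (violations(proposal).length === 0) return proposal;
  }
  throw new Error("could not find a measurable verification command; please supply one explicitly");
}
```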
142
+ ## Output Contract
143
+
144
+ The skill emits a single block of structured output for the calling skill (`agent-loop` router, `ralph` executor, or `agent-loop-ext` launcher) to consume:
145
+
146
+ ```yaml
147
+ proposed_completion:
148
+ criterion: "npm test passes AND npx tsc --noEmit exits 0"
149
+ verification_command: "npm test && npx tsc --noEmit"
150
+ rationale:
151
+ - "Task verb 'refactor' triggers regression gate (Layer 1)"
152
+ - "package.json scripts.test = 'jest --coverage' (Layer 3)"
153
+ - "CLAUDE.md Development section references both npm test and npx tsc --noEmit (Layer 2)"
154
+ - ".github/workflows/ci.yml runs both as required checks (Layer 4)"
155
+ confidence: high # high | medium | low
156
+ alternatives_considered:
157
+ - criterion: "npm run lint exits 0"
158
+ rejected_because: "Lint is not in CI required checks for this repo"
159
+ max_iterations_suggestion: 10
160
+ needs_human_confirmation: false # true if confidence == low OR criterion is unusual
161
+ ```
162
+
163
+ When the skill runs in non-interactive mode (via `aiwg al --auto-criteria` or the equivalent on `agent-loop` / `agent-loop-ext`), `needs_human_confirmation: false` proceeds directly. Otherwise the consuming skill (the `agent-loop` router or the `ralph` / `agent-loop-ext` executor) shows the proposal to the user and confirms.
164
+
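For consumers written in TypeScript, the output contract maps onto a typed interface. The sketch assumes the YAML has already been parsed. Field names mirror the contract. `needsHumanConfirmation` and `nextStep` are hypothetical helpers that encode the two stated rules. First, low confidence or an unusual criterion requires confirmation. Second, only non-interactive runs without that flag proceed directly.

```typescript
type Confidence = "high" | "medium" | "low";

interface ProposedCompletion {
  criterion: string;
  verification_command: string;
  rationale: string[];
  confidence: Confidence;
  alternatives_considered: Array<{ criterion: string; rejected_because: string }>;
  max_iterations_suggestion: number;
  needs_human_confirmation: boolean;
}

// Confirmation is required when confidence is low or the criterion is unusual.
function needsHumanConfirmation(confidence: Confidence, unusual: boolean): boolean {
  return confidence === "low" || unusual;
}

// Non-interactive runs proceed directly only when confirmation is not required;
// otherwise the consuming skill shows the proposal and confirms.
function nextStep(p: ProposedCompletion, nonInteractive: boolean): "proceed" | "confirm" {
  return nonInteractive && !p.needs_human_confirmation ? "proceed" : "confirm";
}
```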
165
+ ## Interaction With The User
166
+
167
+ In interactive mode, after running the pipeline, present the proposal:
168
+
169
+ ```
170
+ No --completion criterion was provided. I inferred:
171
+
172
+ Criterion: npm test passes AND npx tsc --noEmit exits 0
173
+ Verification: `npm test && npx tsc --noEmit`
174
+
175
+ Evidence:
176
+ - Task verb "refactor" → regression gate
177
+ - package.json scripts.test = jest --coverage
178
+ - CLAUDE.md Development section references both checks
179
+ - .github/workflows/ci.yml requires both
180
+
181
+ Proceed with this criterion? [Y/n/edit]
182
+ ```
183
+
184
+ User options:
185
+ - `Y` (default): start the loop with the inferred criterion
186
+ - `n`: abort and request the user supply `--completion` explicitly
187
+ - `edit`: accept a manual edit to the criterion before proceeding
188
+
189
+ Use the platform's native interaction tool when available (per the `native-ux-tools` rule). On Claude Code, that means `AskUserQuestion`.
190
+
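When no native question tool is available and the prompt falls back to plain text, the `[Y/n/edit]` reply can be parsed as below. Treating an empty reply as `Y` follows the "(default)" marker. Two behaviors are assumptions of this hypothetical `parseReply`: accepting `yes`/`no` spellings, and returning `undefined` to trigger a re-prompt.

```typescript
type Reply = "accept" | "abort" | "edit";

function parseReply(input: string): Reply | undefined {
  const answer = input.trim().toLowerCase();
  if (answer === "" || answer === "y" || answer === "yes") return "accept"; // Y is the default
  if (answer === "n" || answer === "no") return "abort";                    // user supplies --completion
  if (answer === "edit") return "edit";                                     // edit the criterion first
  return undefined;                                                         // unrecognized: ask again
}
```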
191
+ ## When Inference Should NOT Run
192
+
193
+ - `--completion` is explicit → use the user's criterion, don't second-guess
194
+ - `--no-infer-completion` flag is passed → fail fast with a helpful error if `--completion` is also missing
195
+ - The task description is itself a criterion (e.g. "make `npx tsc --noEmit` pass") → extract the command from the task, don't re-infer
196
+
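The third case, a task that already states its own criterion, can be detected before the pipeline runs. This sketch assumes the command appears in backticks and the task uses a pass/exit verb. Without that verb check, a backticked path such as `src/auth.ts` would be mistaken for a command. `inlineCriterion` is hypothetical.

```typescript
const PASS_VERB = /\b(pass|passes|succeed|succeeds|exit|exits)\b/i;

// "make `npx tsc --noEmit` pass" → "npx tsc --noEmit"; anything else → undefined (run the pipeline).
function inlineCriterion(task: string): string | undefined {
  if (!PASS_VERB.test(task)) return undefined;
  const match = /`([^`]+)`/.exec(task);
  const command = match?.[1].trim();
  return command ? command : undefined;
}
```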
197
+ ## Edge Cases
198
+
199
+ | Case | Handling |
200
+ |---|---|
201
+ | No `package.json`, no manifest, no CI | Scan project root for any executable test runner (`pytest`, `go test`, `cargo test`, `make test`, `Makefile` target `test`). Fall back to the structural check if none found. |
202
+ | Multiple test commands (`test:unit`, `test:integration`, `test:e2e`) | Prefer the one nearest to the task scope. If task mentions "unit", use `test:unit`. If the task is broad, prefer the union via `&&`. |
203
+ | CI runs tests on multiple OS/Node versions | Use the local invocation (`npm test`), not the matrix runner. The matrix is a deploy concern. |
204
+ | Monorepo with multiple packages | Detect from `pnpm-workspace.yaml` / `lerna.json` / `turbo.json` / workspaces field. If the task scope is one package, infer that package's commands. If the task spans the monorepo, use the top-level `test` script. |
205
+ | Project has no tests at all | This is a finding. The inferred criterion should be "tests exist for the new code AND those tests pass." Surface to the user that the project lacks a baseline test suite — that's important context for the loop's expectations. |
206
+ | Project's tests are currently broken (the task IS to fix them) | Set the criterion to the passing condition. The whole point of the loop is to get from current red state to green. |
207
+ | Conflict between layers (CLAUDE.md says X, CI says Y) | Prefer CLAUDE.md (closer to the maintainer's intent). Note the discrepancy in the rationale. |
208
+
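Monorepo detection from the edge-case table comes down to checking a few marker files and the `workspaces` field. The sketch assumes the root file listing and parsed `package.json` are already available. It handles both the array form of `workspaces` and the `{ packages: [...] }` object form used by Yarn classic. `isMonorepo` is hypothetical.

```typescript
interface PackageJson {
  workspaces?: string[] | { packages?: string[] };
  scripts?: Record<string, string>;
}

const MONOREPO_MARKERS = ["pnpm-workspace.yaml", "lerna.json", "turbo.json"];

function isMonorepo(rootFiles: string[], pkg: PackageJson | undefined): boolean {
  // Any workspace-tool marker file at the root is enough.
  if (rootFiles.some((f) => MONOREPO_MARKERS.includes(f))) return true;
  const ws = pkg?.workspaces;
  // npm/Yarn array form: "workspaces": ["packages/*"]
  if (Array.isArray(ws)) return ws.length > 0;
  // Yarn classic object form: "workspaces": { "packages": [...] }
  return (ws?.packages?.length ?? 0) > 0;
}
```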
209
+ ## Interaction With Other AIWG Rules
210
+
211
+ | Rule | How this skill respects it |
212
+ |---|---|
213
+ | `vague-discretion` | The whole point — produces measurable, command-form criteria |
214
+ | `instruction-comprehension` | Reads the task description carefully; doesn't override explicit user instructions |
215
+ | `research-before-decision` | Layer-walk IS research; doesn't propose criteria without evidence |
216
+ | `human-authorization` | Confirms with user before starting loop (unless `--auto-criteria` explicitly granted) |
217
+ | `auto-compact-continue` | The inferred criterion IS what the loop continues toward; no "should I keep working" prompts |
218
+ | `cli-secondary` | Uses `aiwg discover` to find related skills if the task verb is unusual |
219
+
220
+ ## Examples
221
+
222
+ ### Example 1: Simple test task on a TypeScript project
223
+
224
+ ```
225
+ $ agent-loop "fix the failing auth tests" # or: aiwg al / aiwg ralph (legacy)
226
+
227
+ Inferring completion criteria...
228
+ Layer 1 verb: "fix tests" → test-pass class
229
+ Layer 2 CLAUDE.md: "npm test" is the canonical test command
230
+ Layer 3 package.json: scripts.test = "jest"
231
+ Layer 4 CI: .github/workflows/ci.yml runs `npm test`
232
+ Layer 5: no related use case found
233
+
234
+ Proposed criterion: npm test passes (exit 0)
235
+ Verification: `npm test`
236
+ Confidence: high
237
+
238
+ Proceed? [Y/n/edit]
239
+ ```
240
+
241
+ ### Example 2: Refactor with no tests
242
+
243
+ ```
244
+ $ agent-loop "extract auth logic into a separate module"
245
+
246
+ Inferring completion criteria...
247
+ Layer 1 verb: "extract" → regression gate
248
+ Layer 2 CLAUDE.md: no Development section
249
+ Layer 3 package.json: scripts.test = "echo 'no tests'" (degenerate)
250
+ Layer 4 CI: no workflows found
251
+ Layer 5: no related use case found
252
+
253
+ Warning: project has no functional test suite or CI configuration.
254
+ Falling back to structural verification.
255
+
256
+ Proposed criterion: The new module exists, the original code references it,
257
+ AND `npx tsc --noEmit` still exits 0
258
+ Verification: `test -f src/auth/index.ts && grep -q 'from.*src/auth' src/main.ts && npx tsc --noEmit`
259
+ Confidence: low
260
+
261
+ This is a weak gate. Consider supplying --completion explicitly
262
+ or adding a test suite first.
263
+
264
+ Proceed? [Y/n/edit]
265
+ ```
266
+
267
+ ### Example 3: Task with an explicit use case reference
268
+
269
+ ```
270
+ $ agent-loop "implement UC-AUTH-001"
271
+
272
+ Inferring completion criteria...
273
+ Layer 1 verb: "implement" → tests-exist class
274
+ Layer 5: found .aiwg/requirements/UC-AUTH-001-user-login.md
275
+
276
+ Using acceptance criteria from UC-AUTH-001:
277
+ - [ ] User can log in with valid email/password
278
+ - [ ] Invalid credentials show clear error message
279
+ - [ ] Account locks after 5 failed attempts
280
+ - [ ] Login completes within 2 seconds
281
+
282
+ Proposed criterion: All acceptance criteria from UC-AUTH-001 verified by tests,
283
+ AND `npm test -- --testPathPattern=auth` passes
284
+ Verification: `npm test -- --testPathPattern=auth`
285
+ Confidence: high (acceptance criteria are explicit)
286
+
287
+ Proceed? [Y/n/edit]
288
+ ```
289
+
290
+ ### Example 4: Refusal case
291
+
292
+ ```
293
+ $ agent-loop "make the code better"
294
+
295
+ Inferring completion criteria...
296
+ Layer 1 verb: "make better" → AMBIGUOUS, no clear criterion class
297
+ Layer 5: no related use case
298
+
299
+ Cannot infer measurable criteria for this task.
300
+
301
+ "Make the code better" is vague (per AIWG vague-discretion rule).
302
+ A loop with no measurable gate runs forever or exits prematurely.
303
+
304
+ Please supply --completion with a concrete check, e.g.:
305
+ --completion "npm test passes AND npm run lint exits 0"
306
+ --completion "all functions in src/utils/ have JSDoc"
307
+ --completion "complexity score from eslint < 10 for all files"
308
+
309
+ Or rephrase the task with a concrete intent:
310
+ agent-loop "reduce cyclomatic complexity in src/utils/"
311
+ agent-loop "add JSDoc to all exported functions in src/api/"
312
+ ```
313
+
314
+ ## References
315
+
316
+ - @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Measurable criteria requirement
317
+ - @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Layer-walk research pattern
318
+ - @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/instruction-comprehension.md — Don't override explicit user criteria
319
+ - @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md — Criterion IS what the loop continues toward
320
+ - @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md — The detection/routing skill that delegates here when `--completion` is missing
321
+ - @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/ralph/SKILL.md — The legacy executor skill that consumes this output (`ralph` is the deprecated name for `agent-loop`'s executor)
322
+ - @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md — The crash-resilient external loop, same delegation pattern
323
+ - @$AIWG_ROOT/agentic/code/addons/agent-loop/agents/ralph-verifier.md — Runs the verification command this skill proposes
@@ -6,7 +6,7 @@ deprecated_names: [ralph]
6
6
  platforms: [all]
7
7
  description: Execute iterative task loop until completion criteria are met - iteration beats perfection
8
8
  commandHint:
9
- argumentHint: '"<task>" --completion "<criteria>" [--max-iterations N] [--timeout M] [--interactive --guidance "text"]'
9
+ argumentHint: '"<task>" [--completion "<criteria>"] [--max-iterations N] [--timeout M] [--interactive --guidance "text"] [--auto-criteria | --no-infer-completion]'
10
10
  allowedTools: "Task, Read, Write, Bash, Glob, Grep, TodoWrite, Edit"
11
11
  model: opus
12
12
  category: automation
@@ -50,7 +50,7 @@ The task to execute. Should be:
50
50
  - Measurable completion state
51
51
  - Self-contained (all context provided)
52
52
 
53
- ### --completion (required)
53
+ ### --completion (optional — inferred when omitted)
54
54
  Success criteria. Must be:
55
55
  - Verifiable (tests, lint, compilation)
56
56
  - Specific (not subjective)
@@ -66,6 +66,10 @@ Success criteria. Must be:
66
66
  - `--completion "code looks good"`
67
67
  - `--completion "feature is done"`
68
68
 
69
+ **When omitted**: the loop delegates to the `infer-completion-criteria` skill, which derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts. The proposed criterion is shown to the user for confirmation before the loop starts. Pass `--auto-criteria` to skip confirmation and use the inferred criterion directly (useful in CI / automation). Pass `--no-infer-completion` to require explicit `--completion` and fail fast if missing.
70
+
71
+ See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md` for the inference pipeline.
72
+
69
73
  ### --max-iterations (default: 10)
70
74
  Safety limit on iterations. Prevents infinite loops.
71
75
 
@@ -94,12 +98,21 @@ Create feature branch for loop work.
94
98
 
95
99
  ### Phase 1: Initialization
96
100
 
97
- 1. Parse task and completion criteria
98
- 2. Validate criteria are verifiable (can be checked via command)
99
- 3. Create `.aiwg/ralph/` workspace if not exists
100
- 4. Initialize iteration counter (i=0)
101
- 5. Create feature branch if --branch specified
102
- 6. Log initialization
101
+ 1. Parse task
102
+ 2. **Completion-criteria resolution**:
103
- If `--completion` is provided → use it directly
104
+ - Else if `--no-infer-completion` is set → fail fast with a helpful error
105
+ - Else invoke the `infer-completion-criteria` skill on the task description
106
+ - The skill returns a proposed criterion with rationale and confidence level
107
+ - If `--auto-criteria` is set OR confidence is `high`, adopt the proposal silently and log it
108
+ - Otherwise, surface the proposal to the user via the platform's native interaction tool (`AskUserQuestion` on Claude Code, formatted text elsewhere per `native-ux-tools`); accept `Y` / `n` / `edit`
109
+ - If the user rejects, abort the loop and ask them to supply `--completion` explicitly
110
+ 3. Validate the final criterion is verifiable (can be checked via command)
111
+ 4. Create `.aiwg/ralph/` workspace if not exists
112
+ 5. Initialize iteration counter (i=0)
113
+ 6. Create feature branch if --branch specified
114
+ 7. **Write the criterion and its rationale into the loop's progress file** (`.aiwg/ralph/<loop-id>/progress.md`) per the `auto-compact-continue` rule — this survives compaction and resumption
115
+ 8. Log initialization
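The resolution order in step 2 can be sketched as a pure TypeScript function. This is a minimal sketch under stated assumptions, not the skill's implementation. `resolveCompletion`, `Proposal`, `LoopFlags`, and the `infer`/`ask` callbacks are hypothetical names. `ask` stands in for `AskUserQuestion` or its text fallback. The diff does not say how `edit` behaves, so treating it as "adopt the user's edited text" is an assumption.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical shape of what infer-completion-criteria returns.
interface Proposal {
  criterion: string;
  rationale: string;
  confidence: Confidence;
}

// Hypothetical parsed flags: --completion, --auto-criteria, --no-infer-completion.
interface LoopFlags {
  completion?: string;
  autoCriteria?: boolean;
  noInferCompletion?: boolean;
}

type Answer = "Y" | "n" | "edit";

type Resolution =
  | { kind: "explicit"; criterion: string }
  | { kind: "inferred"; criterion: string; userConfirmed: boolean }
  | { kind: "abort"; reason: string };

function resolveCompletion(
  task: string,
  flags: LoopFlags,
  infer: (task: string) => Proposal,
  ask: (proposal: Proposal) => { answer: Answer; edited?: string },
): Resolution {
  // An explicit --completion is used as-is and never second-guessed.
  if (flags.completion) return { kind: "explicit", criterion: flags.completion };
  // --no-infer-completion: fail fast instead of inferring.
  if (flags.noInferCompletion) {
    return { kind: "abort", reason: "--completion is required when --no-infer-completion is set" };
  }
  const proposal = infer(task);
  // --auto-criteria or high confidence: adopt without prompting (the caller logs it).
  if (flags.autoCriteria || proposal.confidence === "high") {
    return { kind: "inferred", criterion: proposal.criterion, userConfirmed: false };
  }
  // Otherwise confirm with the user: Y / n / edit.
  const { answer, edited } = ask(proposal);
  if (answer === "Y") return { kind: "inferred", criterion: proposal.criterion, userConfirmed: true };
  if (answer === "edit" && edited) return { kind: "inferred", criterion: edited, userConfirmed: true };
  return { kind: "abort", reason: "Criterion rejected; supply --completion explicitly" };
}
```

The `explicit` branch comes first on purpose. It mirrors the instruction-comprehension reference, which says inference must never override a criterion the user already gave.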
103
116
 
104
117
  **Communicate**:
105
118
  ```
@@ -36,7 +36,8 @@
36
36
  "consolidation": {
37
37
  "strategy": "index-with-links",
38
38
  "rulesIndex": "rules/RULES-INDEX.md",
39
- "deployIndexOnly": true
39
+ "deployIndexOnly": false,
40
+ "_note": "deployIndexOnly was true (#1343); changed to false 2026-05-14 so individual rule files (including skill-discovery.md) deploy to every provider's rules directory alongside the index. RULES-INDEX.md remains the canonical entry point; its links now resolve to local files on every provider, not just Claude/Cursor."
40
41
  },
41
42
  "commands": [],
42
43
  "agents": [
@@ -121,7 +122,8 @@
121
122
  "post-commit-index-refresh",
122
123
  "soul-enforcement",
123
124
  "skill-discovery",
124
- "cli-secondary"
125
+ "cli-secondary",
126
+ "auto-compact-continue"
125
127
  ],
126
128
  "templates": [
127
129
  "devkit/addon-manifest",
@@ -4,10 +4,15 @@ Core meta-utility rules for agent coordination, context management, and platform
4
4
 
5
5
  ---
6
6
 
7
- ## AIWG Utilities Rules (20 rules — active with aiwg-utils addon)
7
+ ## AIWG Utilities Rules (21 rules — active with aiwg-utils addon)
8
8
 
9
9
  ### HIGH
10
10
 
11
+ #### auto-compact-continue
12
+ **Summary**: The answer to "should I keep working?" is always YES — until the task's measurable completion criteria are met or the user redirects. Context pressure, long tool outputs, and high iteration counts are not scope questions. When context fills, the right response is to compact, checkpoint to durable storage (activity log, progress file at `.aiwg/working/<task>-progress.md`, git history, AIWG memory), and continue. Trust platform auto-compact (REF-910); load-bearing state must live in `CLAUDE.md` / `AGENTS.md` / `AIWG.md` / activity log / progress file / git / `.aiwg/working/` — never only in conversation turns. Maintain a `## Compact Instructions` section so the summarizer preserves completion criteria, last successful step, failed approaches (do not let them be re-attempted), and authorization questions. Apply aggressive in-session compression (REF-122: passive=6% savings, aggressive=22.7%); update the progress file every 10–15 tool calls on long tasks. Exceptions: authorization gates (per `human-authorization`), 3-attempts-failed escalation (per `anti-laziness` Rule 6), explicit user redirect, genuinely ambiguous new directive classification (per `skill-discovery` Rule 0). "Should I continue?" is never the right question.
13
+ **When to apply**: Any long-running session; before context fills; after long tool outputs; when crossing iteration thresholds; when resuming after compaction; when tempted to ask the user permission to keep going
14
+ **Full rule**: @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md
15
+
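The rule's "update the progress file every 10–15 tool calls" cadence can be sketched as a small counter. This minimal TypeScript sketch rests on several assumptions. The interval of 12 is one arbitrary value inside the rule's range. The `Progress` fields mirror what the rule says the summarizer must preserve, but the rule does not define a file format. `renderProgress` and `CheckpointTracker` are hypothetical names. The `write` callback stands in for writing a file such as `.aiwg/working/<task>-progress.md`.

```typescript
// Hypothetical progress state: the fields the rule says must survive compaction.
interface Progress {
  completionCriteria: string;
  lastSuccessfulStep: string;
  failedApproaches: string[];
  pendingAuthorization: string[];
}

function renderProgress(p: Progress): string {
  const list = (items: string[]) => (items.length ? items.map((i) => `- ${i}`).join("\n") : "- (none)");
  return [
    "# Progress",
    "",
    `**Completion criteria**: ${p.completionCriteria}`,
    `**Last successful step**: ${p.lastSuccessfulStep}`,
    "",
    "## Failed approaches (do not retry)",
    list(p.failedApproaches),
    "",
    "## Pending authorization questions",
    list(p.pendingAuthorization),
    "",
  ].join("\n");
}

// Writes a checkpoint every `interval` tool calls, then resets the counter.
class CheckpointTracker {
  private callsSinceCheckpoint = 0;
  private readonly write: (markdown: string) => void;
  private readonly interval: number;

  constructor(write: (markdown: string) => void, interval = 12) {
    this.write = write;
    this.interval = interval;
  }

  // Call after each tool invocation; returns true when a checkpoint was written.
  afterToolCall(state: Progress): boolean {
    this.callsSinceCheckpoint += 1;
    if (this.callsSinceCheckpoint < this.interval) return false;
    this.write(renderProgress(state));
    this.callsSinceCheckpoint = 0;
    return true;
  }
}
```

Keeping failed approaches in the durable file, rather than only in conversation turns, is what stops them being re-attempted after auto-compact summarizes the session.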
11
16
  #### cli-secondary
12
17
  **Summary**: AIWG is agentic-first. The agent's strict priority order for any action: (1) **local skill** already loaded in context (kernel skills, framework quickrefs, deployed agents), (2) **discovered skill** via `aiwg discover` + `aiwg show`, (3) raw CLI command, (4) manual file edits as last resort. The skill carries the priming (pre-flight checks, dry-run preview, preservation logic, gates) that the CLI alone lacks. Sole exception: discovery/finder commands (`aiwg discover`, `aiwg show`, `aiwg list`, `aiwg status`, `aiwg version`, `aiwg runtime-info`, status/info subcommands) stay primary — they ARE the priming entry points and the bridge from priority 2 to priority 1. For mixed commands (`aiwg index`, `aiwg packages`, `aiwg ops`, `aiwg storage`), classify per subcommand. Includes a pairing table covering use/refresh/regenerate/doctor/init/promote/scaffold-*/add-*/doc-sync/lint/cleanup-audit/sdlc-accelerate/ralph/mc/steward/index build/ops actions/storage migrate.
13
18
  **When to apply**: Before invoking any AIWG CLI command, before writing skill/agent docs that reference CLI commands, when updating quickrefs or routing tables, when filing pairings audits