aiwg 2026.5.4 → 2026.5.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +4 -4
- package/README.md +11 -0
- package/agentic/code/addons/agent-loop/agents/ralph-verifier.md +6 -0
- package/agentic/code/addons/agent-loop/manifest.json +2 -1
- package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md +18 -2
- package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md +16 -4
- package/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md +323 -0
- package/agentic/code/addons/agent-loop/skills/ralph/SKILL.md +21 -8
- package/agentic/code/addons/aiwg-utils/manifest.json +4 -2
- package/agentic/code/addons/aiwg-utils/rules/RULES-INDEX.md +6 -1
- package/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md +257 -0
- package/agentic/code/frameworks/sdlc-complete/skills/flow-release/SKILL.md +14 -0
- package/agentic/code/frameworks/sdlc-complete/templates/aiwg-sections/02b-discover-first.md +42 -0
- package/agentic/code/frameworks/sdlc-complete/templates/aiwg-sections/manifest.json +7 -0
- package/agentic/code/frameworks/sdlc-complete/templates/copilot/copilot-instructions.md.aiwg-template +11 -0
- package/dist/src/cli/handlers/use.d.ts.map +1 -1
- package/dist/src/cli/handlers/use.js +21 -0
- package/dist/src/cli/handlers/use.js.map +1 -1
- package/dist/src/cli/project-isolation/detect.d.ts +8 -0
- package/dist/src/cli/project-isolation/detect.d.ts.map +1 -0
- package/dist/src/cli/project-isolation/detect.js +70 -0
- package/dist/src/cli/project-isolation/detect.js.map +1 -0
- package/dist/src/cli/project-isolation/index.d.ts +6 -0
- package/dist/src/cli/project-isolation/index.d.ts.map +1 -0
- package/dist/src/cli/project-isolation/index.js +7 -0
- package/dist/src/cli/project-isolation/index.js.map +1 -0
- package/dist/src/cli/project-isolation/signals.d.ts +4 -0
- package/dist/src/cli/project-isolation/signals.d.ts.map +1 -0
- package/dist/src/cli/project-isolation/signals.js +17 -0
- package/dist/src/cli/project-isolation/signals.js.map +1 -0
- package/dist/src/cli/project-isolation/warning.d.ts +20 -0
- package/dist/src/cli/project-isolation/warning.d.ts.map +1 -0
- package/dist/src/cli/project-isolation/warning.js +102 -0
- package/dist/src/cli/project-isolation/warning.js.map +1 -0
- package/dist/src/mcp/tools/discovery.mjs +1 -1
- package/package.json +1 -1
- package/tools/agents/providers/hermes.mjs +21 -5
- package/tools/release/cut-tag.sh +250 -0
- package/tools/warp/setup-warp.mjs +73 -1
package/CLAUDE.md
CHANGED
@@ -663,10 +663,10 @@ Before pushing a version tag:
 npm run uat:serve-live
 ```
 Tests skip cleanly when `AIWG_SANDBOX_ENDPOINT` is unset or unreachable, so this is a safe no-op gate. Run before any release that touches `src/serve/`, the executor contract, or the MC ↔ serve bridge.
-5. **Commit
-6. **
-7. **
-8. **
+5. **Commit the release prep** — `git commit` the package.json/CHANGELOG/announcement bump. Do NOT use plain `git tag -a` or `git tag -s` (they sign with `user.signingkey`, which is typically the maintainer's *personal commit-signing key* — wrong key for tags; the supply-chain gate `tools/ci/verify-signed-tag.sh` will reject in CI).
+6. **Cut the tag via the wrapper** — `tools/release/cut-tag.sh <X.Y.Z>`. Runs 10 pre-tag checks (CalVer shape, `package.json` + `marketplace.json` lockstep, CHANGELOG entry, announcement file present, release-signing key both present locally AND published in `.gitea/keys/maintainers.asc`) and signs with `-u <RELEASE_KEY_FINGERPRINT>` (default: `FE9272F0BC5781E1DE77FAAA719AB63879E84CE8`, the `AIWG Release Signing <release@aiwg.io>` key per the two-key model from commit `a13dabc5`). See the v2026.5.5 incident note in `docs/contributing/versioning.md` for what happens when this is skipped.
+7. **Push tag to Gitea** — `git push origin main --tags`. Triggers `gitea-release.yml` + `npm-publish.yml` (both gated on signed-tag verify).
+8. **Mirror to GitHub** — `git push github main --tags`. Triggers `github-mirror.yml` which creates the GitHub Release using `docs/releases/v<version>-announcement.md` as the body. **No manual `gh release create` needed for stable releases** — the workflow handles it. Pre-release tags (`-rc.*`, `-alpha.*`, `-beta.*`, `-nightly.*`) skip GitHub-Release creation by design.
 
 ### Version Format
 
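Three of the pre-tag checks from step 6, plus the pre-release rule from step 8, can be expressed as plain predicates. This is a minimal sketch, not the logic of `tools/release/cut-tag.sh`, whose body does not appear in this diff. Two things are assumptions: the CalVer pattern `YYYY.M.N`, which is inferred from version strings such as `2026.5.6`, and every function name.

```typescript
// Sketch only: hypothetical helpers, not aiwg's cut-tag.sh implementation.

// Assumed CalVer shape for stable versions: 4-digit year, month 1-12, counter.
const CALVER = /^(\d{4})\.(\d{1,2})\.(\d+)$/;

function isCalVer(version: string): boolean {
  const m = CALVER.exec(version);
  if (m === null) return false;
  const month = Number(m[2]);
  return month >= 1 && month <= 12;
}

// "package.json + marketplace.json lockstep": both manifests must carry
// exactly the version being tagged.
function inLockstep(version: string, pkgVersion: string, marketplaceVersion: string): boolean {
  return version === pkgVersion && version === marketplaceVersion;
}

// Step 8: tags with these pre-release suffixes skip GitHub Release creation.
const PRERELEASE = /-(rc|alpha|beta|nightly)\./;

function createsGitHubRelease(tag: string): boolean {
  return !PRERELEASE.test(tag);
}
```

The checks are deliberately independent, so a wrapper can run all of them and report every failure at once rather than stopping at the first.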
package/README.md
CHANGED
@@ -49,6 +49,17 @@ AIWG is a deployment tool and support utility for AI context. At its core, `aiwg
 
 Around that core, AIWG ships utilities for things the base platforms do not handle on their own: persistent artifact memory (`.aiwg/`), background orchestration (`aiwg mc`), autonomous loops (`aiwg ralph`), artifact indexing (`aiwg index`), cost telemetry, health diagnostics, and more. Most are opt-in. The deployment layer works standalone as plain text files the platform reads natively.
 
+### Project scope (recommended) vs user scope (global)
+
+`aiwg use` writes artifacts at one of two scopes. Both are first-class supported (see ADR-NUA-001 in `.aiwg/studies/novice-user-adoption/`):
+
+- **Project scope** — default. Run `aiwg use sdlc` from a project root and the artifacts land in `./.claude/agents/`, `./.claude/skills/`, etc. One project's agent set never bleeds into another's session. **This is the recommended default for most use cases.**
+- **User scope (global install)** — `aiwg use sdlc --scope user` writes to `~/.claude/agents/`, `~/.claude/skills/`, etc. Same artifact set loads into every session, regardless of project. Fits "AIWG in every conversation" workflows and is the canonical mode for OpenClaw and Hermes (whose primary discovery is user-scope).
+
+The trade-off is real: when the same agent set loads into every session, context from one project can bleed into reasoning about another. Research (REF-720, *Lost in Multi-Turn Conversation*, MSR/Salesforce 2025) measured a 39% capability drop when this happens. The non-blocking project-isolation warning surfaces the trade-off at deploy time so the scope choice is informed. Neither scope is wrong; pick the one that fits the workflow.
+
+See `docs/cli-reference.md` (under `aiwg use` → "Scope models") for the per-provider details and the global-install rough-edge inventory.
+
 ## Simple Building Blocks
 
 AIWG ships five primitive artifact types. All are plain text:
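The two scopes in the new README section differ only in the base directory that `aiwg use` writes under. The sketch below assumes the Claude Code layout the README gives (`./.claude/<kind>/` for project scope, `~/.claude/<kind>/` for user scope). The function and type names are hypothetical, and other providers' paths are not modeled.

```typescript
// Sketch only: hypothetical helper mirroring the README's two scopes.
type Scope = "project" | "user";
type ArtifactKind = "agents" | "skills";

function deployDir(
  kind: ArtifactKind,
  projectRoot: string,
  homeDir: string,
  scope: Scope = "project", // project scope is the documented default
): string {
  const base = scope === "user" ? homeDir : projectRoot;
  return `${base}/.claude/${kind}/`;
}
```

In these terms, `aiwg use sdlc --scope user` corresponds to passing `"user"`. Omitting the flag keeps the per-project default, which is why one project's agents never reach another project's session.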
package/agentic/code/addons/agent-loop/agents/ralph-verifier.md
CHANGED
@@ -14,6 +14,12 @@ allowed-tools: Bash, Read, Glob
 
 You verify completion criteria for agent loops - determining if a task iteration succeeded by running verification commands and analyzing their output.
 
+## Companion skill
+
+When the loop is started without explicit `--completion`, the criterion you verify is produced by the `infer-completion-criteria` skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). It derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts.
+
+You do not run that skill yourself — the loop orchestrator (`ralph-loop` agent or external launcher) calls it during initialization. Your job is to take whatever criterion is in the loop state and verify it. The skill writes its rationale into `.aiwg/ralph/<loop-id>/progress.md` (or `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` for external loops); when reporting verification results, you may reference that rationale so the user sees the full evidence chain.
+
 ## Capabilities
 
 ### Verification Methods
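The verifier reads the inference rationale from one of two locations, depending on how the loop was launched. The two paths come from the text above. The `LoopRef` shape and the helper name are hypothetical.

```typescript
// Sketch only: hypothetical shape describing the loop being verified.
interface LoopRef {
  external: boolean; // true for agent-loop-ext runs
  id: string;        // <loop-id> for in-session loops, <run-id> for external ones
}

// Location where infer-completion-criteria leaves its rationale.
function rationalePath(loop: LoopRef): string {
  return loop.external
    ? `.aiwg/ralph-external/${loop.id}/inferred-completion.yaml`
    : `.aiwg/ralph/${loop.id}/progress.md`;
}
```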
package/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md
CHANGED
@@ -74,7 +74,19 @@ Alternate expressions and non-obvious activations (primary phrases are matched a
 
 ### Completion Inference
 
-When user doesn't specify explicit verification:
+When the user doesn't specify explicit verification, delegate to the **`infer-completion-criteria`** skill (`@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`). That skill runs a deterministic 5-layer pipeline:
+
+1. **Task verb** → criterion class (test-pass, type-clean, regression-gate, coverage, lint-clean, build-pass, implement-feature)
+2. **Project context files** (CLAUDE.md / AGENTS.md / AIWG.md) → canonical commands from the Development section
+3. **Package manifests** (`package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.) → discovered scripts
+4. **CI configuration** (`.github/workflows/`, `.gitea/workflows/`, GitLab/CircleCI/Jenkins) → team's actual "passes" definition
+5. **`.aiwg/` artifacts** (test-strategy, related use cases by ID match, prior progress files) → project-specific gates
+
+Synthesis is validated against the `vague-discretion` rule and emits a structured YAML proposal with criterion, verification command, rationale chain, confidence level, and alternatives considered.
+
+**Use the inline table below ONLY as a last-resort fallback** when the inference skill is unavailable (degraded environment, missing skill deployment). It is intentionally narrow — JavaScript/Node-centric — and represents prior state before `infer-completion-criteria` was added.
+
+Legacy fallback table:
 
 | Task Pattern | Inferred Completion |
 |--------------|---------------------|
@@ -86,6 +98,8 @@ When user doesn't specify explicit verification:
 | "migrate to ESM" | "node runs without errors" |
 | "refactor X" | "npm test passes" (preserve behavior) |
 
+When the inference skill IS available, prefer it. The skill handles multi-language projects, monorepos, CI-defined gates, use-case acceptance criteria, and the refusal case (truly vague tasks like "make it better" that have no measurable criterion).
+
 ### Examples
 
 **User**: "ralph this: migrate all files in lib/ to ESM"
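The precedence in the updated `agent-loop` skill can be sketched as a short chain: an explicit criterion wins, then the inference skill, and the legacy table is consulted only when the skill is unavailable. The `infer` callback is a hypothetical stand-in for the skill. The table holds only the two legacy rows visible in this diff, because the rest of the table falls outside the hunks.

```typescript
// Sketch only: hypothetical stand-in for infer-completion-criteria.
// Returning null models the refusal case ("make it better").
type Infer = (task: string) => string | null;

// Partial legacy table: only the rows visible in this diff.
const LEGACY_TABLE: Array<[RegExp, string]> = [
  [/migrate to ESM/i, "node runs without errors"],
  [/^refactor\b/i, "npm test passes"],
];

function resolveCriterion(
  task: string,
  explicit: string | undefined,
  infer: Infer | undefined, // undefined = skill not deployed (degraded environment)
): string | null {
  if (explicit !== undefined) return explicit; // explicit --completion always wins
  if (infer !== undefined) return infer(task); // preferred path; may refuse
  for (const [pattern, criterion] of LEGACY_TABLE) {
    if (pattern.test(task)) return criterion; // last-resort fallback only
  }
  return null;
}
```

A refusal from the skill is passed through as `null` rather than falling back to the table. This matches the text's point that truly vague tasks have no measurable criterion to recover.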
@@ -301,7 +315,9 @@ User: "actually, abort that and just fix the login bug"
 
 ## Related
 
-- `
+- `infer-completion-criteria` skill - derives measurable `--completion` from project state when the user doesn't supply one
+- `ralph` skill - the iterative loop executor implementation (legacy name; `agent-loop` is canonical)
+- `agent-loop-ext` skill - crash-resilient external loop with state persistence
 - `ralph-status` skill - check loop progress
 - `ralph-resume` skill - continue interrupted loops
 - `ralph-abort` skill - abort active loops
package/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md
CHANGED
@@ -5,7 +5,7 @@ legacyName: ralph-external
 platforms: [all]
 description: Crash-resilient external agent loop with state persistence and CI/CD integration
 commandHint:
-argumentHint: "\"<objective>\" --completion \"<criteria>\" [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet]"
+argumentHint: "\"<objective>\" [--completion \"<criteria>\"] [--max-iterations N] [--timeout M] [--provider <p>] [--no-commit] [--branch <name>] [--quiet] [--auto-criteria | --no-infer-completion]"
 allowedTools: Bash, Read, Write
 model: sonnet
 category: automation
@@ -60,7 +60,7 @@ Users may say:
 ### Objective (required)
 The task the loop should accomplish. Passed as the first positional argument.
 
-### --completion (
+### --completion (optional — inferred when omitted)
 Success criteria as a verifiable command. The loop exits when this command returns exit code 0.
 
 **Good examples**:
@@ -68,6 +68,14 @@ Success criteria as a verifiable command. The loop exits when this command retur
 - `--completion "npx tsc --noEmit exits with code 0"`
 - `--completion "coverage report shows >80%"`
 
+**When omitted**: the launcher invokes the `infer-completion-criteria` skill before the external loop starts. The skill derives a measurable criterion from project state (CLAUDE.md / AGENTS.md / AIWG.md, package manifests, CI configuration, `.aiwg/` artifacts) and emits a structured proposal with rationale. The proposal is written to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml` and used as the loop's gate.
+
+Because `agent-loop-ext` runs externally (potentially headless / in CI), the confirmation flow is:
+- Interactive session (TTY attached): show proposal, accept `Y / n / edit` like the in-session `ralph` skill
+- Non-interactive / `--auto-criteria` / CI environment: use the inferred criterion if confidence is `high`, otherwise fail fast and print the proposal as a diagnostic so the user can re-launch with `--completion` explicitly
+
+Pass `--no-infer-completion` to require explicit `--completion` and fail before launch if missing. See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md`.
+
 ### --max-iterations (default: 10)
 Maximum iterations before the loop halts and saves state for manual review.
 
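The `agent-loop-ext` confirmation rules reduce to a three-outcome decision. This sketch assumes the caller already knows three things: whether a TTY is attached, whether `--auto-criteria` was passed, and whether it is running in CI. `Decision`, `LaunchContext` and `confirmationDecision` are hypothetical names, not identifiers from the skill.

```typescript
// Sketch only: hypothetical names; the rules come from the text above.
type Confidence = "high" | "medium" | "low";
type Decision = "prompt" | "adopt" | "fail";

interface LaunchContext {
  ttyAttached: boolean;
  autoCriteria: boolean; // --auto-criteria
  ci: boolean;           // running inside a CI environment
}

// Interactive sessions always get the Y / n / edit prompt. Headless runs
// adopt only high-confidence proposals; anything else fails fast so the
// user can relaunch with an explicit --completion.
function confirmationDecision(ctx: LaunchContext, confidence: Confidence): Decision {
  const headless = !ctx.ttyAttached || ctx.autoCriteria || ctx.ci;
  if (!headless) return "prompt";
  return confidence === "high" ? "adopt" : "fail";
}
```

Passing `--auto-criteria` from an attached terminal counts as headless here, because the text groups that flag with the non-interactive path.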
@@ -90,8 +98,12 @@ Suppress verbose progress output. Completion banner is always shown.
 
 When triggered:
 
-1.
-
+1. **Resolve completion criteria**:
+- If `--completion` is provided → use it directly
+- Else if `--no-infer-completion` is set → fail fast before launch with a helpful error
+- Else → invoke `infer-completion-criteria` skill, persist proposal to `.aiwg/ralph-external/<run-id>/inferred-completion.yaml`, confirm or auto-adopt per session-interactivity rules above
+2. Validate the resolved criterion is verifiable (can be checked via command)
+3. Check for an existing `.aiwg/ralph-external/` workspace; create if absent
 3. Generate a unique `loop-id` (8-character hex) and create the loop state file at `.aiwg/ralph-external/loops/<loop-id>.json`
 4. Write the initial state: `{ objective, completionCriteria, maxIterations, timeout, provider, status: "pending", iteration: 0 }`
 5. If `--branch` is specified, create the git branch now
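Three launcher steps can be sketched together: criterion resolution (step 1), `loop-id` generation, and the initial state write. This is a hypothetical illustration. `resolveCompletion`, `newLoopId`, `initialState` and the `infer` callback are not names from the skill. `Math.random` is used only to keep the sketch dependency-free and is not a collision-safe ID source. The `timeout` unit is not stated in this section.

```typescript
// Sketch only: hypothetical names and shapes.
interface ExtFlags {
  completion?: string;         // --completion
  noInferCompletion?: boolean; // --no-infer-completion
}

// Step 1 resolution order: explicit flag, then fail fast, then infer.
function resolveCompletion(flags: ExtFlags, infer: () => string): string {
  if (flags.completion !== undefined) return flags.completion;
  if (flags.noInferCompletion) {
    throw new Error("--no-infer-completion is set but --completion is missing");
  }
  return infer();
}

// 8-character lowercase hex loop-id (Math.random: illustration only).
function newLoopId(): string {
  const n = Math.floor(Math.random() * 0x100000000);
  return ("00000000" + n.toString(16)).slice(-8);
}

interface LoopState {
  objective: string;
  completionCriteria: string;
  maxIterations: number;
  timeout: number; // unit not specified in this section
  provider: string;
  status: string;
  iteration: number;
}

function initialState(
  objective: string,
  completionCriteria: string,
  provider: string,
  timeout: number,
  maxIterations = 10, // documented --max-iterations default
): LoopState {
  return { objective, completionCriteria, maxIterations, timeout, provider, status: "pending", iteration: 0 };
}

const stateFile = (loopId: string): string => `.aiwg/ralph-external/loops/${loopId}.json`;
```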
package/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md
ADDED
@@ -0,0 +1,323 @@
+---
+namespace: aiwg
+name: infer-completion-criteria
+aliases: [agent-loop-infer-completion, al-infer-completion, ralph-infer-completion]
+platforms: [all]
+description: Infer measurable completion criteria for an agent-loop task from project docs, code, and AIWG standards when the user has not supplied --completion explicitly
+commandHint:
+argumentHint: '"<task description>" [--task-type code|test|docs|refactor] [--non-interactive]'
+allowedTools: "Read, Glob, Grep, Bash"
+model: sonnet
+category: automation
+---
+
+# Infer Completion Criteria
+
+## Purpose
+
+When a user starts an `agent-loop` task without supplying `--completion`, this skill derives a measurable, verifiable completion criterion from project state. The output must satisfy the `vague-discretion` rule: a concrete shell command or file-inspection check that returns pass/fail unambiguously.
+
+Iteration is only as good as its gate. A loop with a vague gate ("until it's done") runs forever or exits prematurely. This skill is what turns "agent-loop this" into "agent-loop this until `<measurable thing>`."
+
+The canonical name for the iterative-loop addon is **agent-loop**. `ralph` is the legacy name for the executor skill, retained as an alias; `al` is a short form. The detection/routing skill is `agent-loop` (which delegates to this skill when criteria are missing); the executor is `ralph` (canonical name forthcoming). Everywhere this skill says "agent-loop" you can read "ralph" as the legacy equivalent.
+
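The Purpose section's point about gates can be made concrete with a loop skeleton. The gate is a pass/fail check run after every iteration, and `max-iterations` bounds the run even when the gate never passes. This is a sketch only. `runLoop`, `step` and `gate` are hypothetical names, and the gate is modeled as a plain function rather than a shell command.

```typescript
// Sketch only: hypothetical skeleton, not the ralph executor.
interface LoopResult {
  passed: boolean;
  iterations: number;
}

// Run `step` until `gate` reports success or the iteration cap is reached.
function runLoop(step: (i: number) => void, gate: () => boolean, maxIterations = 10): LoopResult {
  for (let i = 1; i <= maxIterations; i++) {
    step(i);
    if (gate()) return { passed: true, iterations: i };
  }
  return { passed: false, iterations: maxIterations };
}
```

If the gate is vague, `gate()` has no honest implementation: the loop either stops at an arbitrary point or always burns the full cap. That is the failure this skill exists to prevent.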
+## When This Skill Runs
+
+This skill is invoked by:
+
+- The `agent-loop` detection-and-routing skill when it parses a user request without explicit completion criteria
+- The `ralph` executor skill during Phase 1 initialization when `--completion` is omitted
+- The `agent-loop-ext` external-loop launcher during pre-launch resolution when `--completion` is omitted
+- Direct invocation via `aiwg discover "infer completion"` → `aiwg show skill infer-completion-criteria` when a user wants to preview the inferred criterion before committing to a loop
+
+This skill does **not** run when `--completion` is explicit. The user's word is authoritative.
+
+## Inference Pipeline
+
+The skill is a deterministic walk through five evidence layers, plus one synthesis step. Each layer contributes candidate criteria; the synthesis picks the strongest measurable one and explains the chain of evidence.
+
+### Layer 1 — The task verb
+
+Parse the user's task description for an intent verb. Map to a default criterion class:
+
+| Verb / phrase | Criterion class |
+|---|---|
+| "fix tests", "make tests pass", "test failure" | Test suite passes (exit 0) |
+| "add tests", "increase coverage", "test coverage" | Coverage threshold met |
+| "fix types", "type errors", "migrate to typescript" | Type checker exits 0 |
+| "fix lint", "clean up warnings", "style" | Linter exits 0 |
+| "build", "make it compile" | Build command exits 0 |
+| "refactor", "extract", "rename" | Tests still pass AND build still passes (regression gate) |
+| "implement <X>", "add feature <X>" | Tests for the new code exist and pass |
+| "document", "add docs", "JSDoc" | Coverage check on docstrings/JSDoc presence |
+| "fix bug", "resolve issue #N" | Specific test for that bug passes AND existing suite still green |
+| "migrate", "upgrade" | Build + test + lint all green (no regression) |
+
+If the verb is ambiguous, the skill falls back to "regression gate" (build + test + lint all green) as the safest default.
+
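The Layer 1 table works as an ordered list of phrase patterns, with the regression gate as the fallback for anything unmatched. In this sketch, seven labels reuse the class names listed in the `agent-loop` skill. `docs-coverage` and `bug-fix` are hypothetical labels for the two rows that list does not name. The regexes only approximate the table's phrases, and their order matters: "migrate to typescript" must reach the type-clean rule before the generic migrate rule.

```typescript
// Sketch only: approximate patterns; two class labels are hypothetical.
type CriterionClass =
  | "test-pass" | "coverage" | "type-clean" | "lint-clean" | "build-pass"
  | "regression-gate" | "implement-feature" | "docs-coverage" | "bug-fix";

// First match wins, so the more specific phrases come first.
const VERB_RULES: Array<[RegExp, CriterionClass]> = [
  [/\b(fix|make)\b.*\btests?\b|\btest failure/i, "test-pass"],
  [/\badd tests?\b|\bcoverage\b/i, "coverage"],
  [/\btypes?\b|\btype errors?\b|\btypescript\b/i, "type-clean"],
  [/\blint\b|\bwarnings?\b|\bstyle\b/i, "lint-clean"],
  [/\brefactor|\bextract|\brename/i, "regression-gate"],
  [/\bimplement\b|\badd feature\b/i, "implement-feature"],
  [/\bdocument|\badd docs\b|\bjsdoc\b/i, "docs-coverage"],
  [/\bfix bug\b|\bresolve issue\b/i, "bug-fix"],
  [/\bmigrate\b|\bupgrade\b/i, "regression-gate"],
  [/\bbuild\b|\bcompile\b/i, "build-pass"],
];

function classifyTask(task: string): CriterionClass {
  for (const [pattern, cls] of VERB_RULES) {
    if (pattern.test(task)) return cls;
  }
  return "regression-gate"; // ambiguous verb: the table's safest default
}
```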
+### Layer 2 — Project conventions in CLAUDE.md / AGENTS.md / AIWG.md
+
+Read the project's context files. AIWG-managed projects often declare commands directly:
+
+```bash
+# Run tests
+npm test
+
+# Type check
+npx tsc --noEmit
+
+# Lint markdown
+npm exec markdownlint-cli2 "**/*.md"
+```
+
+Extract these as the canonical commands for their respective domains. The Development section of `CLAUDE.md` is the highest-trust source here — it's what the project's maintainers run.
+
+Also scan for explicit completion-criterion conventions. Some projects state "a commit is not finished until CI passes" — that signals the CI command (or equivalent local invocation) is the gate.
+
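Layer 2 comes down to reading `bash` fenced blocks out of a context file and pairing each command with the comment line above it. This sketch assumes the comment-then-command layout of the CLAUDE.md excerpt above. `extractCommands` is a hypothetical name. A fuller version would also limit the scan to the Development section and handle multi-line commands.

```typescript
// Sketch only: pairs "# label" lines with the next command inside bash
// fences, following the layout of the CLAUDE.md excerpt.
function extractCommands(markdown: string): Map<string, string> {
  const found = new Map<string, string>();
  let inBash = false;
  let label = "";
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("```")) {
      // Open only on a bash fence; any fence closes an open bash block.
      inBash = !inBash && line === "```bash";
      continue;
    }
    if (!inBash || line === "") continue;
    if (line.startsWith("#")) {
      label = line.replace(/^#+\s*/, "");
    } else if (label !== "") {
      found.set(label, line);
      label = "";
    }
  }
  return found;
}
```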
+### Layer 3 — Package manifests and config
+
+Inspect the project's manifest files to discover scripts and tools:
+
+| Manifest | Where to look |
+|---|---|
+| `package.json` | `scripts.test`, `scripts.lint`, `scripts.build`, `scripts.coverage`, `scripts.typecheck` |
+| `Cargo.toml` | implies `cargo test`, `cargo build`, `cargo clippy` |
+| `pyproject.toml` | `[tool.pytest]`, `[tool.ruff]`, `[tool.mypy]`, `scripts.*` |
+| `go.mod` | implies `go test ./...`, `go vet ./...`, `go build ./...` |
+| `Gemfile` | implies `bundle exec rspec`, `bundle exec rubocop` |
+| `pom.xml` / `build.gradle` | `mvn test`, `mvn verify`, `gradle test` |
+| `.tool-versions` / `mise.toml` | language version pins inform which tool is canonical |
+
+When multiple scripts exist (e.g. `test`, `test:unit`, `test:integration`), prefer the script the project's own docs reference. If the docs don't reference any, prefer the most specific match to the task verb (e.g. for "fix integration test" → `test:integration`).
+
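The Layer 3 tie-break can be sketched against the `scripts` object of `package.json`: a script the docs reference wins, then the most specific match to the task. `pickScript` is hypothetical. "Most specific" is approximated here as a script whose suffix (the part after `:`) appears in the task text, which is an assumption on top of the rule above.

```typescript
// Sketch only: choose among package.json scripts for one domain ("test", "lint", ...).
function pickScript(
  scripts: Record<string, string>,
  domain: string,
  docReferences: string[], // script names the project's own docs mention
  task: string,
): string | undefined {
  const candidates = Object.keys(scripts).filter(
    (s) => s === domain || s.startsWith(`${domain}:`),
  );
  // 1. Prefer whatever the project's docs reference.
  const documented = candidates.find((s) => docReferences.indexOf(s) !== -1);
  if (documented !== undefined) return documented;
  // 2. Otherwise prefer a suffix named in the task ("integration" -> test:integration).
  const lowered = task.toLowerCase();
  const specific = candidates.find((s) => {
    const suffix = s.split(":")[1];
    return suffix !== undefined && lowered.indexOf(suffix) !== -1;
  });
  if (specific !== undefined) return specific;
  // 3. Fall back to the bare domain script, else whatever exists (possibly none).
  return candidates.indexOf(domain) !== -1 ? domain : candidates[0];
}
```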
+### Layer 4 — CI configuration
+
+CI files encode the team's actual definition of "passes":
+
+| CI system | Scan |
+|---|---|
+| GitHub Actions | `.github/workflows/*.yml` — extract `run:` steps from non-deploy jobs |
+| Gitea Actions | `.gitea/workflows/*.yml` — same |
+| GitLab CI | `.gitlab-ci.yml` — extract `script:` from test/lint jobs |
+| CircleCI | `.circleci/config.yml` |
+| Jenkins | `Jenkinsfile` |
+
+The first non-trivial verification step in the primary workflow is the team's canonical "done" gate. If CI runs `npm test && npm run lint && npm run typecheck` in order, the inferred criterion is "all three exit 0."
+
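Layer 4's `run:` extraction can be sketched over a workflow that has already been parsed into a plain object. YAML parsing needs a third-party library, so that step is assumed. The text does not define which jobs count as "deploy" or which steps are "trivial". This sketch assumes job names containing deploy/release/publish for the first and dependency installs for the second. It also assumes single-line `run:` steps, since multi-line scripts cannot simply be joined with `&&`.

```typescript
// Sketch only: minimal shape of a parsed Actions workflow (fields we read).
interface Workflow {
  jobs: Record<string, { steps: Array<{ run?: string; uses?: string }> }>;
}

// Assumptions: deploy jobs recognized by name; installs are setup, not verification.
const DEPLOY_JOB = /deploy|release|publish/i;
const SETUP_STEP = /^(npm (ci|install)|pnpm install|yarn( install)?)$/;

function ciGate(wf: Workflow): string | undefined {
  const commands: string[] = [];
  for (const [jobName, job] of Object.entries(wf.jobs)) {
    if (DEPLOY_JOB.test(jobName)) continue;
    for (const step of job.steps) {
      const cmd = step.run?.trim(); // `uses:` steps have no run command
      if (cmd && !SETUP_STEP.test(cmd)) commands.push(cmd);
    }
  }
  // Every verification must exit 0, so the gate chains them with &&.
  return commands.length > 0 ? commands.join(" && ") : undefined;
}
```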
+### Layer 5 — AIWG artifacts
+
+If the project has a `.aiwg/` directory, scan for relevant context:
+
+- `.aiwg/testing/test-strategy.md` — declared verification approach
+- `.aiwg/architecture/software-architecture-doc.md` — architectural quality gates
+- `.aiwg/security/security-gates.md` — security-related criteria
+- `.aiwg/quality/code-review-guide.md` — code quality bars
+- `.aiwg/activity.log` — recent operations that may indicate what "done" looked like for similar past tasks
+- `.aiwg/working/<related-progress-files>.md` — prior task progress files; mine the "Completion criteria" sections
+
+If the project has a related use case (`.aiwg/requirements/UC-*.md`) whose ID is in the task description, pull that use case's acceptance criteria — those ARE the completion criteria.
+
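The Layer 5 use-case rule depends on two lookups: finding an ID in the task text, and finding a matching file under `.aiwg/requirements/`. The ID pattern `UC-<AREA>-<NNN>` is inferred from the `UC-AUTH-001` example and may not cover every project's naming. `findUseCase` is hypothetical.

```typescript
// Sketch only: assumed ID shape, based on the UC-AUTH-001 example.
const UC_ID = /\bUC-[A-Z0-9]+-\d+\b/g;

// Return the first requirements file whose name begins with an ID
// mentioned in the task description.
function findUseCase(task: string, requirementFiles: string[]): string | undefined {
  const ids = task.toUpperCase().match(UC_ID) ?? [];
  for (const id of ids) {
    const hit = requirementFiles.find((f) => {
      const base = f.split("/").pop() ?? "";
      return base.startsWith(`${id}-`) || base === `${id}.md`;
    });
    if (hit !== undefined) return hit;
  }
  return undefined;
}
```

Matching on `<ID>-` rather than a bare prefix keeps `UC-AUTH-001` from accidentally matching `UC-AUTH-0010`.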
+### Synthesis step
+
+Combine the layers into a single proposed criterion. The decision logic:
+
+1. If a use case in `.aiwg/requirements/` matches the task, use its acceptance criteria verbatim. Done.
+2. Otherwise, take the verb-class default from Layer 1 and instantiate it using the canonical command from Layer 2 (CLAUDE.md) > Layer 4 (CI) > Layer 3 (manifest).
+3. If the task is in the "regression gate" class, AND the project's CI runs more than one verification, combine them: `command-A passes AND command-B passes AND command-C passes`.
+4. If no canonical command is found in any layer (unusual — typically only on empty-scaffold projects), fall back to:
+- `<file or change exists in git diff against HEAD~1>` — pure structural check
+- And inform the user that a substantive verification command should be added.
+
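The synthesis rules can be sketched as a function over the evidence the earlier layers gathered. The `Evidence` and `Synthesis` shapes are hypothetical. Each layer's command is optional, precedence follows rule 2 (CLAUDE.md, then CI, then manifest), and rule 4's structural check appears as the placeholder string from the text, because its concrete form depends on the change being made.

```typescript
// Sketch only: hypothetical evidence shape produced by Layers 1-5.
interface Evidence {
  useCaseCriteria?: string;  // Layer 5: acceptance criteria of a matching use case
  claudeMd?: string;         // Layer 2 command for the task's class
  ci?: string;               // Layer 4 command
  manifest?: string;         // Layer 3 command
  ciVerifications: string[]; // every verification command CI runs
  regressionGate: boolean;   // Layer 1 class is "regression gate"
}

interface Synthesis {
  criterion: string;
  structuralFallback: boolean; // true when rule 4 applied; caller must warn
}

function synthesize(ev: Evidence): Synthesis {
  // Rule 1: a matching use case wins outright.
  if (ev.useCaseCriteria !== undefined) return { criterion: ev.useCaseCriteria, structuralFallback: false };
  // Rule 3: regression-gate tasks chain every CI verification.
  if (ev.regressionGate && ev.ciVerifications.length > 1) {
    return { criterion: ev.ciVerifications.join(" && "), structuralFallback: false };
  }
  // Rule 2: CLAUDE.md > CI > manifest.
  const command = ev.claudeMd ?? ev.ci ?? ev.manifest;
  if (command !== undefined) return { criterion: command, structuralFallback: false };
  // Rule 4: nothing measurable found anywhere.
  return { criterion: "<file or change exists in git diff against HEAD~1>", structuralFallback: true };
}
```

Rule 3 is checked before rule 2 only as an implementation convenience. For regression-gate tasks it widens the single command from rule 2 into the full CI chain.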
+### Apply AIWG standards (vague-discretion)
+
+Validate the proposal against the `vague-discretion` rule:
+
+- Criterion must be expressible as a shell command (or shell pipeline) that exits 0 on pass, non-zero on fail.
+- Criterion must NOT use the words "good enough", "thorough", "comprehensive", "complete" without a measurable suffix.
+- Criterion must NOT be self-referential ("the agent is satisfied" — no).
+- Criterion must have an implicit or explicit `max-iterations` cap (ralph's default 10 is the floor; very large refactors may need 20).
+
+If the proposal fails any of these, regenerate. If after two regenerations the proposal still fails, surface the problem to the user with the diagnostic ("could not find a measurable verification command — please supply one explicitly").
+
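The `vague-discretion` checks and the two-regeneration limit translate into a small validator. The banned-word list comes from the rule text above. The text does not define a "measurable suffix", so this sketch assumes that a digit or a backticked command anywhere in the criterion is enough. The self-reference check is a single illustrative pattern, not an exhaustive one. All names are hypothetical.

```typescript
// Sketch only: hypothetical validator for the vague-discretion checks.
interface Proposal {
  criterion: string;
  verificationCommand: string;
  maxIterations: number;
}

const VAGUE_WORDS = ["good enough", "thorough", "comprehensive", "complete"];
const MEASURABLE = /\d|`[^`]+`/; // assumption: a number or a `command` is measurable
const SELF_REFERENTIAL = /\bagent is satisfied\b/i;

// Empty result = the proposal passes.
function vagueDiscretionViolations(p: Proposal): string[] {
  const problems: string[] = [];
  if (p.verificationCommand.trim() === "") problems.push("no shell command to run");
  const text = p.criterion.toLowerCase();
  const vague = VAGUE_WORDS.filter((w) => text.indexOf(w) !== -1);
  if (vague.length > 0 && !MEASURABLE.test(p.criterion)) {
    problems.push(`vague wording without a measurable suffix: ${vague.join(", ")}`);
  }
  if (SELF_REFERENTIAL.test(p.criterion)) problems.push("self-referential criterion");
  if (!(p.maxIterations >= 10)) problems.push("max-iterations below the default floor of 10");
  return problems;
}

// One initial attempt plus two regenerations, then the documented diagnostic.
function validateWithRetries(generate: (attempt: number) => Proposal): Proposal {
  for (let attempt = 0; attempt <= 2; attempt++) {
    const p = generate(attempt);
    if (vagueDiscretionViolations(p).length === 0) return p;
  }
  throw new Error("could not find a measurable verification command — please supply one explicitly");
}
```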
+## Output Contract
+
+The skill emits a single block of structured output for the calling skill (`agent-loop` router, `ralph` executor, or `agent-loop-ext` launcher) to consume:
+
+```yaml
+proposed_completion:
+  criterion: "npm test passes AND npx tsc --noEmit exits 0"
+  verification_command: "npm test && npx tsc --noEmit"
+  rationale:
+    - "Task verb 'refactor' triggers regression gate (Layer 1)"
+    - "package.json scripts.test = 'jest --coverage' (Layer 3)"
+    - "CLAUDE.md Development section references both npm test and npx tsc --noEmit (Layer 2)"
+    - ".github/workflows/ci.yml runs both as required checks (Layer 4)"
+  confidence: high # high | medium | low
+  alternatives_considered:
+    - criterion: "npm run lint exits 0"
+      rejected_because: "Lint is not in CI required checks for this repo"
+  max_iterations_suggestion: 10
+  needs_human_confirmation: false # true if confidence == low OR criterion is unusual
+```
+
+When the skill runs in non-interactive mode (via `aiwg al --auto-criteria` or the equivalent on `agent-loop` / `agent-loop-ext`), `needs_human_confirmation: false` proceeds directly. Otherwise the consuming skill (the `agent-loop` router or the `ralph` / `agent-loop-ext` executor) shows the proposal to the user and confirms.
+
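The fields under `proposed_completion` map directly onto a type that a calling skill could check against. Below is a TypeScript rendering of the YAML, with field names copied from it, plus the `needs_human_confirmation` rule from its inline comment. "Unusual" is not defined in the text, so it is passed in as a flag. `proceedsWithoutPrompt` is a hypothetical helper for the non-interactive rule.

```typescript
// Sketch only: field names mirror the YAML contract; helpers are hypothetical.
type Confidence = "high" | "medium" | "low";

interface ProposedCompletion {
  criterion: string;
  verification_command: string;
  rationale: string[];
  confidence: Confidence;
  alternatives_considered: Array<{ criterion: string; rejected_because: string }>;
  max_iterations_suggestion: number;
  needs_human_confirmation: boolean;
}

// "true if confidence == low OR criterion is unusual"
function needsHumanConfirmation(confidence: Confidence, unusual: boolean): boolean {
  return confidence === "low" || unusual;
}

// Non-interactive callers proceed directly only when no confirmation is needed;
// every other caller shows the proposal first.
function proceedsWithoutPrompt(p: ProposedCompletion, nonInteractive: boolean): boolean {
  return nonInteractive && !p.needs_human_confirmation;
}
```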
+## Interaction With The User
+
+In interactive mode, after running the pipeline, present the proposal:
+
+```
+No --completion criteria was provided. I inferred:
+
+Criterion: npm test passes AND npx tsc --noEmit exits 0
+Verification: `npm test && npx tsc --noEmit`
+
+Evidence:
+- Task verb "refactor" → regression gate
+- package.json scripts.test = jest --coverage
+- CLAUDE.md Development section references both checks
+- .github/workflows/ci.yml requires both
+
+Proceed with this criterion? [Y/n/edit]
+```
+
+User options:
+- `Y` (default): start the loop with the inferred criterion
+- `n`: abort and request the user supply `--completion` explicitly
+- `edit`: accept a manual edit to the criterion before proceeding
+
+Use the platform's native interaction tool when available (per `native-ux-tools` rule). On Claude Code, that means `AskUserQuestion`.
+
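Handling the `[Y/n/edit]` reply is a branch with `Y` as the default. The sketch covers only the decision for a plain-text prompt; on Claude Code the text calls for `AskUserQuestion` instead. `interpretReply` and the `reprompt` outcome are hypothetical. The text does not say what happens on unrecognized input, so asking again is an assumption.

```typescript
// Sketch only: hypothetical reply handling for a plain-text prompt.
type ReplyAction =
  | { kind: "start"; criterion: string } // Y (default): run with the proposal
  | { kind: "abort" }                    // n: user must supply --completion
  | { kind: "edit" }                     // edit: collect a revised criterion first
  | { kind: "reprompt" };                // assumption: unrecognized input asks again

function interpretReply(reply: string, proposed: string): ReplyAction {
  const r = reply.trim().toLowerCase();
  if (r === "" || r === "y" || r === "yes") return { kind: "start", criterion: proposed };
  if (r === "n" || r === "no") return { kind: "abort" };
  if (r === "edit") return { kind: "edit" };
  return { kind: "reprompt" };
}
```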
+## When Inference Should NOT Run
+
+- `--completion` is explicit → use the user's criterion, don't second-guess
+- `--no-infer-completion` flag is passed → fail fast with a helpful error if `--completion` is also missing
+- The task description is itself a criterion (e.g. "make `npx tsc --noEmit` pass") → extract the command from the task, don't re-infer
+
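The three skip conditions can be checked before any layer runs. For the third, the sketch looks for a backticked command in the task text, as in the "make `npx tsc --noEmit` pass" example. Treating the first backticked span as the command is an assumption, and both function names are hypothetical.

```typescript
// Sketch only: hypothetical pre-checks before running the pipeline.

// Use a backticked command from the task verbatim, if there is one.
function commandFromTask(task: string): string | undefined {
  const m = /`([^`]+)`/.exec(task);
  return m?.[1]?.trim();
}

function shouldInfer(task: string, explicitCompletion?: string, noInfer = false): boolean {
  if (explicitCompletion !== undefined) return false; // the user's word is authoritative
  if (noInfer) return false;                          // caller fails fast instead
  return commandFromTask(task) === undefined;         // task already names its check
}
```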
+## Edge Cases
+
+| Case | Handling |
+|---|---|
+| No `package.json`, no manifest, no CI | Scan project root for any executable test runner (`pytest`, `go test`, `cargo test`, `make test`, `Makefile` target `test`). Fall back to the structural check if none found. |
+| Multiple test commands (`test:unit`, `test:integration`, `test:e2e`) | Prefer the one nearest to the task scope. If task mentions "unit", use `test:unit`. If the task is broad, prefer the union via `&&`. |
+| CI runs tests on multiple OS/Node versions | Use the local invocation (`npm test`), not the matrix runner. The matrix is a deploy concern. |
+| Monorepo with multiple packages | Detect from `pnpm-workspace.yaml` / `lerna.json` / `turbo.json` / workspaces field. If the task scope is one package, infer that package's commands. If the task spans the monorepo, use the top-level `test` script. |
+| Project has no tests at all | This is a finding. The inferred criterion should be "tests exist for the new code AND those tests pass." Surface to the user that the project lacks a baseline test suite — that's important context for the loop's expectations. |
+| Project's tests are currently broken (the task IS to fix them) | Set the criterion to the passing condition. The whole point of the loop is to get from current red state to green. |
+| Conflict between layers (CLAUDE.md says X, CI says Y) | Prefer CLAUDE.md (closer to the maintainer's intent). Note the discrepancy in the rationale. |
+
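Monorepo detection depends on a handful of root-level markers. This sketch works over an in-memory view of the project root. The marker list comes from the table; `isMonorepo` and the `RootFiles` shape are hypothetical.

```typescript
// Sketch only: root-level files keyed by name, with their text content.
type RootFiles = Record<string, string | undefined>;

const WORKSPACE_MARKERS = ["pnpm-workspace.yaml", "lerna.json", "turbo.json"];

function isMonorepo(root: RootFiles): boolean {
  if (WORKSPACE_MARKERS.some((f) => f in root)) return true;
  const pkg = root["package.json"];
  if (pkg === undefined) return false;
  try {
    // npm/yarn workspaces are declared in package.json's "workspaces" field.
    return (JSON.parse(pkg) as { workspaces?: unknown }).workspaces !== undefined;
  } catch {
    return false; // unparseable manifest: treat as a single package
  }
}
```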
+## Interaction With Other AIWG Rules
+
+| Rule | How this skill respects it |
+|---|---|
+| `vague-discretion` | The whole point — produces measurable, command-form criteria |
+| `instruction-comprehension` | Reads the task description carefully; doesn't override explicit user instructions |
+| `research-before-decision` | Layer-walk IS research; doesn't propose criteria without evidence |
+| `human-authorization` | Confirms with user before starting loop (unless `--auto-criteria` explicitly granted) |
+| `auto-compact-continue` | The inferred criterion IS what the loop continues toward; no "should I keep working" prompts |
+| `cli-secondary` | Uses `aiwg discover` to find related skills if the task verb is unusual |
+
+## Examples
+
+### Example 1: Simple test task on a TypeScript project
+
+```
+$ agent-loop "fix the failing auth tests" # or: aiwg al / aiwg ralph (legacy)
+
+Inferring completion criteria...
+Layer 1 verb: "fix tests" → test-pass class
+Layer 2 CLAUDE.md: "npm test" is the canonical test command
+Layer 3 package.json: scripts.test = "jest"
+Layer 4 CI: .github/workflows/ci.yml runs `npm test`
+Layer 5: no related use case found
+
+Proposed criterion: npm test passes (exit 0)
+Verification: `npm test`
+Confidence: high
+
+Proceed? [Y/n/edit]
+```
+
+### Example 2: Refactor with no tests
+
+```
+$ agent-loop "extract auth logic into a separate module"
+
+Inferring completion criteria...
+Layer 1 verb: "extract" → regression gate
+Layer 2 CLAUDE.md: no Development section
+Layer 3 package.json: scripts.test = "echo 'no tests'" (degenerate)
+Layer 4 CI: no workflows found
+Layer 5: no related use case found
+
+Warning: project has no functional test suite or CI configuration.
+Falling back to structural verification.
+
+Proposed criterion: The new module exists, the original code references it,
+AND `npx tsc --noEmit` still exits 0
+Verification: `test -f src/auth/index.ts && grep -q 'from.*src/auth' src/main.ts && npx tsc --noEmit`
+Confidence: low
+
+This is a weak gate. Consider supplying --completion explicitly
+or adding a test suite first.
+
+Proceed? [Y/n/edit]
+```
+
+### Example 3: Task with an explicit use case reference
+
+```
+$ agent-loop "implement UC-AUTH-001"
+
+Inferring completion criteria...
+Layer 1 verb: "implement" → tests-exist class
+Layer 5: found .aiwg/requirements/UC-AUTH-001-user-login.md
+
+Using acceptance criteria from UC-AUTH-001:
+- [ ] User can log in with valid email/password
+- [ ] Invalid credentials show clear error message
+- [ ] Account locks after 5 failed attempts
+- [ ] Login completes within 2 seconds
+
+Proposed criterion: All acceptance criteria from UC-AUTH-001 verified by tests,
+AND `npm test -- --testPathPattern=auth` passes
+Verification: `npm test -- --testPathPattern=auth`
+Confidence: high (acceptance criteria are explicit)
+
+Proceed? [Y/n/edit]
+```
+
+### Example 4: Refusal case
+
+```
+$ agent-loop "make the code better"
+
+Inferring completion criteria...
+Layer 1 verb: "make better" → AMBIGUOUS, no clear criterion class
+Layer 5: no related use case
+
+Cannot infer measurable criteria for this task.
+
+"Make the code better" is vague (per AIWG vague-discretion rule).
+A loop with no measurable gate runs forever or exits prematurely.
+
+Please supply --completion with a concrete check, e.g.:
+--completion "npm test passes AND npm run lint exits 0"
+--completion "all functions in src/utils/ have JSDoc"
+--completion "complexity score from eslint < 10 for all files"
+
+Or rephrase the task with a concrete intent:
+agent-loop "reduce cyclomatic complexity in src/utils/"
|
|
311
|
+
agent-loop "add JSDoc to all exported functions in src/api/"
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
## References
|
|
315
|
+
|
|
316
|
+
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/vague-discretion.md — Measurable criteria requirement
|
|
317
|
+
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/research-before-decision.md — Layer-walk research pattern
|
|
318
|
+
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/instruction-comprehension.md — Don't override explicit user criteria
|
|
319
|
+
- @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md — Criterion IS what the loop continues toward
|
|
320
|
+
- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop/SKILL.md — The detection/routing skill that delegates here when `--completion` is missing
|
|
321
|
+
- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/ralph/SKILL.md — The legacy executor skill that consumes this output (`ralph` is legacy for `agent-loop`'s executor)
|
|
322
|
+
- @$AIWG_ROOT/agentic/code/addons/agent-loop/skills/agent-loop-ext/SKILL.md — The crash-resilient external loop, same delegation pattern
|
|
323
|
+
- @$AIWG_ROOT/agentic/code/addons/agent-loop/agents/ralph-verifier.md — Runs the verification command this skill proposes
|
|
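
The four examples above share one layer walk: classify the task verb, look for a test command that can actually fail, and refuse when the verb gives no measurable target. Below is a minimal TypeScript sketch of that walk under stated assumptions, not the skill's implementation. The verb table, the `ProjectFacts` shape, and the names `classifyVerb`, `isDegenerate`, and `inferCriterion` are hypothetical; Layer 2 (CLAUDE.md) is folded into the test-command fact, the use-case branch verifies with plain `npm test` instead of a path-filtered run, and Example 2's structural gate is reduced to its `npx tsc --noEmit` part.

```typescript
// Hypothetical sketch of the infer-completion-criteria layer walk.
// The verb table, types, and confidence heuristic are illustrative assumptions.

type CriterionClass = "test-pass" | "regression-gate" | "tests-exist";
type Confidence = "high" | "low";

interface ProjectFacts {
  testScript?: string; // Layer 3: scripts.test from package.json, if present
  ciRunsTests: boolean; // Layer 4: a CI workflow runs the test command
  useCasePath?: string; // Layer 5: matching use-case document, if any
}

type Inference =
  | { kind: "proposal"; criterion: string; verification: string; confidence: Confidence }
  | { kind: "refusal"; reason: string };

// Layer 1: map the task's leading verb to a criterion class.
function classifyVerb(task: string): CriterionClass | undefined {
  const t = task.trim().toLowerCase();
  if (/^fix\b.*\btests?\b/.test(t)) return "test-pass";
  if (/^(extract|refactor|rename|move)\b/.test(t)) return "regression-gate";
  if (/^implement\b/.test(t)) return "tests-exist";
  return undefined; // e.g. "make the code better" has no measurable target
}

// A test script that cannot fail counts as "no test suite" (Example 2).
function isDegenerate(script: string | undefined): boolean {
  return script === undefined || /^echo\b/.test(script.trim());
}

function inferCriterion(task: string, facts: ProjectFacts): Inference {
  const cls = classifyVerb(task);
  if (cls === undefined && facts.useCasePath === undefined) {
    return { kind: "refusal", reason: "Cannot infer measurable criteria for this task." };
  }
  const hasTests = !isDegenerate(facts.testScript);
  if (hasTests && facts.useCasePath !== undefined) {
    return {
      kind: "proposal",
      criterion: `All acceptance criteria in ${facts.useCasePath} verified by tests, AND npm test passes`,
      verification: "npm test",
      confidence: "high", // explicit acceptance criteria
    };
  }
  if (hasTests) {
    return {
      kind: "proposal",
      criterion: "npm test passes (exit 0)",
      verification: "npm test",
      confidence: facts.ciRunsTests ? "high" : "low", // CI agreement raises confidence
    };
  }
  // No functional suite: fall back to a weak structural gate.
  return {
    kind: "proposal",
    criterion: "npx tsc --noEmit still exits 0",
    verification: "npx tsc --noEmit",
    confidence: "low",
  };
}
```

The confidence rule is a deliberately simple heuristic (CI agreeing with package.json, or a matched use case, yields `high`); the skill's own scoring may weigh the layers differently.
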
@@ -6,7 +6,7 @@ deprecated_names: [ralph]
 platforms: [all]
 description: Execute iterative task loop until completion criteria are met - iteration beats perfection
 commandHint:
-argumentHint: '"<task>" --completion "<criteria>" [--max-iterations N] [--timeout M] [--interactive --guidance "text"]'
+argumentHint: '"<task>" [--completion "<criteria>"] [--max-iterations N] [--timeout M] [--interactive --guidance "text"] [--auto-criteria | --no-infer-completion]'
 allowedTools: "Task, Read, Write, Bash, Glob, Grep, TodoWrite, Edit"
 model: opus
 category: automation
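
The new `argumentHint` makes `--completion` optional and adds `--auto-criteria` and `--no-infer-completion`. A hand-rolled TypeScript sketch of parsing arguments of that shape, under stated assumptions: `LoopArgs` and `parseLoopArgs` are hypothetical names, treating the two inference flags as mutually exclusive is our reading of the `|` in the hint, and the default of 10 for `--max-iterations` comes from the skill's own option list. It does not describe how AIWG's CLI actually parses these options.

```typescript
// Hypothetical parser for the agent-loop argumentHint.
// Field names and validation rules are illustrative assumptions, not AIWG's CLI.

interface LoopArgs {
  task: string;
  completion?: string;
  maxIterations: number; // the skill documents a default of 10
  timeout?: number; // unit not stated by the hint; kept as a bare number
  interactive: boolean;
  guidance?: string;
  autoCriteria: boolean;
  noInferCompletion: boolean;
}

const isFlag = (s: string): boolean => s.slice(0, 2) === "--";

function parseLoopArgs(argv: readonly string[]): LoopArgs {
  const out: LoopArgs = {
    task: "",
    maxIterations: 10,
    interactive: false,
    autoCriteria: false,
    noInferCompletion: false,
  };
  // Returns the value that follows a flag, or fails if it is missing.
  const valueAfter = (i: number, flag: string): string => {
    const v = argv[i + 1];
    if (v === undefined || isFlag(v)) throw new Error(`${flag} requires a value`);
    return v;
  };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--completion":
        out.completion = valueAfter(i, arg);
        i++;
        break;
      case "--max-iterations": {
        const n = Number(valueAfter(i, arg));
        if (!isFinite(n) || Math.floor(n) !== n || n < 1) {
          throw new Error("--max-iterations must be a positive integer");
        }
        out.maxIterations = n;
        i++;
        break;
      }
      case "--timeout": {
        const m = Number(valueAfter(i, arg));
        if (!(m > 0)) throw new Error("--timeout must be a positive number");
        out.timeout = m;
        i++;
        break;
      }
      case "--guidance":
        out.guidance = valueAfter(i, arg);
        i++;
        break;
      case "--interactive":
        out.interactive = true;
        break;
      case "--auto-criteria":
        out.autoCriteria = true;
        break;
      case "--no-infer-completion":
        out.noInferCompletion = true;
        break;
      default:
        if (isFlag(arg)) throw new Error(`unknown flag ${arg}`);
        if (out.task !== "") throw new Error("only one task may be given");
        out.task = arg;
    }
  }
  if (out.task === "") throw new Error("a task is required");
  // Our reading of "[--auto-criteria | --no-infer-completion]": at most one.
  if (out.autoCriteria && out.noInferCompletion) {
    throw new Error("--auto-criteria and --no-infer-completion are mutually exclusive");
  }
  return out;
}
```
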
@@ -50,7 +50,7 @@ The task to execute. Should be:
 - Measurable completion state
 - Self-contained (all context provided)
 
-### --completion (
+### --completion (optional — inferred when omitted)
 Success criteria. Must be:
 - Verifiable (tests, lint, compilation)
 - Specific (not subjective)
@@ -66,6 +66,10 @@ Success criteria. Must be:
 - `--completion "code looks good"`
 - `--completion "feature is done"`
 
+**When omitted**: the loop delegates to the `infer-completion-criteria` skill, which derives a measurable criterion from project docs (CLAUDE.md / AGENTS.md / AIWG.md), package manifests, CI configuration, and `.aiwg/` artifacts. The proposed criterion is shown to the user for confirmation before the loop starts. Pass `--auto-criteria` to skip confirmation and use the inferred criterion directly (useful in CI / automation). Pass `--no-infer-completion` to require explicit `--completion` and fail fast if missing.
+
+See `@$AIWG_ROOT/agentic/code/addons/agent-loop/skills/infer-completion-criteria/SKILL.md` for the inference pipeline.
+
 ### --max-iterations (default: 10)
 Safety limit on iterations. Prevents infinite loops.
 
@@ -94,12 +98,21 @@ Create feature branch for loop work.
 
 ### Phase 1: Initialization
 
-1. Parse task
-2.
-
-
-
-
+1. Parse task
+2. **Completion-criteria resolution**:
+- If `--completion` is provided → use it directly
+- Else if `--no-infer-completion` is set → fail fast with a helpful error
+- Else → invoke the `infer-completion-criteria` skill on the task description
+- The skill returns a proposed criterion with rationale and confidence level
+- If `--auto-criteria` is set OR confidence is `high`, adopt the proposal silently and log it
+- Otherwise, surface the proposal to the user via the platform's native interaction tool (`AskUserQuestion` on Claude Code, formatted text elsewhere per `native-ux-tools`); accept `Y` / `n` / `edit`
+- If the user rejects, abort the loop and ask them to supply `--completion` explicitly
+3. Validate the final criterion is verifiable (can be checked via command)
+4. Create `.aiwg/ralph/` workspace if not exists
+5. Initialize iteration counter (i=0)
+6. Create feature branch if --branch specified
+7. **Write the criterion and its rationale into the loop's progress file** (`.aiwg/ralph/<loop-id>/progress.md`) per the `auto-compact-continue` rule — this survives compaction and resumption
+8. Log initialization
 
 **Communicate**:
 ```
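
Step 2 of Phase 1 above fixes a precedence order for the completion criterion. A TypeScript sketch of that order as a pure function, so each branch can be read and tested on its own. It is a sketch under assumptions, not the skill's code: the `Proposal`, `Flags`, `Answer`, and `Resolution` shapes and the `resolveCriterion` name are hypothetical, the `infer` callback stands in for the `infer-completion-criteria` skill, `ask` stands in for `AskUserQuestion` or a formatted-text prompt, and `edit` is modeled as the user returning a replacement criterion.

```typescript
// Hypothetical sketch of Phase 1, step 2 (completion-criteria resolution).

type Confidence = "high" | "low";

interface Proposal {
  criterion: string;
  rationale: string;
  confidence: Confidence;
}

interface Flags {
  completion?: string;
  autoCriteria: boolean;
  noInferCompletion: boolean;
}

// Y = accept, n = reject, edit = the user supplies a replacement criterion.
type Answer = { kind: "Y" } | { kind: "n" } | { kind: "edit"; criterion: string };

type Resolution =
  | { ok: true; criterion: string; source: "explicit" | "inferred-silent" | "inferred-confirmed" | "edited" }
  | { ok: false; error: string };

function resolveCriterion(
  task: string,
  flags: Flags,
  infer: (task: string) => Proposal, // stands in for the inference skill
  ask: (p: Proposal) => Answer, // stands in for the platform's interaction tool
  log: (msg: string) => void = () => {},
): Resolution {
  // An explicit --completion always wins.
  if (flags.completion !== undefined) {
    return { ok: true, criterion: flags.completion, source: "explicit" };
  }
  // --no-infer-completion: fail fast instead of inferring.
  if (flags.noInferCompletion) {
    return { ok: false, error: 'No --completion given and inference is disabled; pass --completion "<criteria>".' };
  }
  const p = infer(task);
  // Adopt silently, but log, when --auto-criteria is set or confidence is high.
  if (flags.autoCriteria || p.confidence === "high") {
    log(`Adopted inferred criterion: ${p.criterion} (${p.rationale}, confidence ${p.confidence})`);
    return { ok: true, criterion: p.criterion, source: "inferred-silent" };
  }
  // Otherwise ask the user: Y / n / edit.
  const answer = ask(p);
  if (answer.kind === "Y") return { ok: true, criterion: p.criterion, source: "inferred-confirmed" };
  if (answer.kind === "edit") return { ok: true, criterion: answer.criterion, source: "edited" };
  return { ok: false, error: "Criterion rejected; supply --completion explicitly." };
}
```

Passing `infer` and `ask` in as callbacks keeps the precedence logic separate from whichever interaction tool the platform provides.
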
@@ -36,7 +36,8 @@
 "consolidation": {
 "strategy": "index-with-links",
 "rulesIndex": "rules/RULES-INDEX.md",
-"deployIndexOnly":
+"deployIndexOnly": false,
+"_note": "deployIndexOnly was true (#1343); changed to false 2026-05-14 so individual rule files (including skill-discovery.md) deploy to every provider's rules directory alongside the index. RULES-INDEX.md remains the canonical entry point; its links now resolve to local files on every provider, not just Claude/Cursor."
 },
 "commands": [],
 "agents": [
@@ -121,7 +122,8 @@
 "post-commit-index-refresh",
 "soul-enforcement",
 "skill-discovery",
-"cli-secondary"
+"cli-secondary",
+"auto-compact-continue"
 ],
 "templates": [
 "devkit/addon-manifest",
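
Per the `_note` above, setting `deployIndexOnly` to `false` makes every listed rule file deploy next to `RULES-INDEX.md`, and the second manifest hunk adds `auto-compact-continue` to that list. A small TypeScript sketch of the selection this implies, assuming each rule name maps to `rules/<name>.md` (as it does for `auto-compact-continue.md` and `skill-discovery.md`, which the diff names). `Consolidation` and `rulesToDeploy` are hypothetical and do not describe AIWG's actual deployer.

```typescript
// Hypothetical sketch: which rule files a deployer would copy, given the
// consolidation settings in the aiwg-utils manifest. Not AIWG's actual deployer.

interface Consolidation {
  strategy: string;
  rulesIndex: string;
  deployIndexOnly: boolean;
}

function rulesToDeploy(c: Consolidation, ruleNames: readonly string[]): string[] {
  // The index is always deployed as the canonical entry point.
  const files: string[] = [c.rulesIndex];
  if (c.deployIndexOnly) return files;
  // With deployIndexOnly false, each listed rule ships too (assumed path
  // rules/<name>.md), so the index's links resolve to local files.
  for (const name of ruleNames) files.push(`rules/${name}.md`);
  return files;
}
```
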
@@ -4,10 +4,15 @@ Core meta-utility rules for agent coordination, context management, and platform
 
 ---
 
-## AIWG Utilities Rules (
+## AIWG Utilities Rules (21 rules — active with aiwg-utils addon)
 
 ### HIGH
 
+#### auto-compact-continue
+**Summary**: The answer to "should I keep working?" is always YES — until the task's measurable completion criteria are met or the user redirects. Context pressure, long tool outputs, and high iteration counts are not scope questions. When context fills, the right response is to compact, checkpoint to durable storage (activity log, progress file at `.aiwg/working/<task>-progress.md`, git history, AIWG memory), and continue. Trust platform auto-compact (REF-910); load-bearing state must live in `CLAUDE.md` / `AGENTS.md` / `AIWG.md` / activity log / progress file / git / `.aiwg/working/` — never only in conversation turns. Maintain a `## Compact Instructions` section so the summarizer preserves completion criteria, last successful step, failed approaches (do not let them be re-attempted), and authorization questions. Apply aggressive in-session compression (REF-122: passive=6% savings, aggressive=22.7%); update the progress file every 10–15 tool calls on long tasks. Exceptions: authorization gates (per `human-authorization`), 3-attempts-failed escalation (per `anti-laziness` Rule 6), explicit user redirect, genuinely ambiguous new directive classification (per `skill-discovery` Rule 0). "Should I continue?" is never the right question.
+**When to apply**: Any long-running session; before context fills; after long tool outputs; when crossing iteration thresholds; when resuming after compaction; when tempted to ask the user permission to keep going
+**Full rule**: @$AIWG_ROOT/agentic/code/addons/aiwg-utils/rules/auto-compact-continue.md
+
 #### cli-secondary
 **Summary**: AIWG is agentic-first. The agent's strict priority order for any action: (1) **local skill** already loaded in context (kernel skills, framework quickrefs, deployed agents), (2) **discovered skill** via `aiwg discover` + `aiwg show`, (3) raw CLI command, (4) manual file edits as last resort. The skill carries the priming (pre-flight checks, dry-run preview, preservation logic, gates) that the CLI alone lacks. Sole exception: discovery/finder commands (`aiwg discover`, `aiwg show`, `aiwg list`, `aiwg status`, `aiwg version`, `aiwg runtime-info`, status/info subcommands) stay primary — they ARE the priming entry points and the bridge from priority 2 to priority 1. For mixed commands (`aiwg index`, `aiwg packages`, `aiwg ops`, `aiwg storage`), classify per subcommand. Includes a pairing table covering use/refresh/regenerate/doctor/init/promote/scaffold-*/add-*/doc-sync/lint/cleanup-audit/sdlc-accelerate/ralph/mc/steward/index build/ops actions/storage migrate.
 **When to apply**: Before invoking any AIWG CLI command, before writing skill/agent docs that reference CLI commands, when updating quickrefs or routing tables, when filing pairings audits
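
The `auto-compact-continue` summary above asks for two concrete habits: refresh a progress file every 10 to 15 tool calls on long tasks, and keep a `## Compact Instructions` section that preserves the completion criteria, the last successful step, failed approaches, and authorization questions. A TypeScript sketch of both, under assumptions: the 12-call default interval (one point inside the stated range), the `LoopState` shape, and the Markdown layout are illustrative, and writing the rendered string to `.aiwg/working/<task>-progress.md` is left to the caller.

```typescript
// Hypothetical sketch of the checkpoint cadence and Compact Instructions content
// described by the auto-compact-continue rule; shapes and layout are assumptions.

interface LoopState {
  task: string;
  completionCriteria: string;
  lastSuccessfulStep: string;
  failedApproaches: string[]; // recorded so they are not re-attempted after compaction
  authorizationQuestions: string[];
}

// True on every `interval`-th tool call. The rule suggests 10 to 15 on long
// tasks; 12 is an arbitrary pick inside that range.
function shouldCheckpoint(toolCalls: number, interval = 12): boolean {
  return toolCalls > 0 && toolCalls % interval === 0;
}

// Renders a list, or an explicit "(none)" so an empty section is not mistaken for lost state.
function bullets(items: readonly string[]): string {
  return items.length === 0 ? "- (none)" : items.map((s) => `- ${s}`).join("\n");
}

// Builds the Markdown body for a progress file.
function renderProgress(s: LoopState): string {
  return [
    `# Progress: ${s.task}`,
    "",
    "## Compact Instructions",
    "",
    `- Completion criteria: ${s.completionCriteria}`,
    `- Last successful step: ${s.lastSuccessfulStep}`,
    "",
    "### Failed approaches (do not re-attempt)",
    "",
    bullets(s.failedApproaches),
    "",
    "### Open authorization questions",
    "",
    bullets(s.authorizationQuestions),
    "",
  ].join("\n");
}
```

Calling `renderProgress` whenever `shouldCheckpoint` fires keeps the load-bearing state on disk, so the loop can continue after compaction without asking whether to keep going.
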