@doidor/agentrig 0.5.3
- package/LICENSE +21 -0
- package/README.md +224 -0
- package/dist/agent/claude.js +125 -0
- package/dist/agent/claude.js.map +1 -0
- package/dist/agent/copilot.js +147 -0
- package/dist/agent/copilot.js.map +1 -0
- package/dist/agent/index.js +17 -0
- package/dist/agent/index.js.map +1 -0
- package/dist/agent/provider.js +10 -0
- package/dist/agent/provider.js.map +1 -0
- package/dist/cli.js +169 -0
- package/dist/cli.js.map +1 -0
- package/dist/commands/compile.js +42 -0
- package/dist/commands/compile.js.map +1 -0
- package/dist/commands/dashboard.js +35 -0
- package/dist/commands/dashboard.js.map +1 -0
- package/dist/commands/doctor.js +40 -0
- package/dist/commands/doctor.js.map +1 -0
- package/dist/commands/eval.js +178 -0
- package/dist/commands/eval.js.map +1 -0
- package/dist/commands/init.js +100 -0
- package/dist/commands/init.js.map +1 -0
- package/dist/commands/update.js +176 -0
- package/dist/commands/update.js.map +1 -0
- package/dist/core/activity.js +80 -0
- package/dist/core/activity.js.map +1 -0
- package/dist/core/audit.js +112 -0
- package/dist/core/audit.js.map +1 -0
- package/dist/core/compile.js +250 -0
- package/dist/core/compile.js.map +1 -0
- package/dist/core/fsutil.js +45 -0
- package/dist/core/fsutil.js.map +1 -0
- package/dist/core/install.js +97 -0
- package/dist/core/install.js.map +1 -0
- package/dist/core/knowledge.js +34 -0
- package/dist/core/knowledge.js.map +1 -0
- package/dist/core/logger.js +31 -0
- package/dist/core/logger.js.map +1 -0
- package/dist/core/paths.js +22 -0
- package/dist/core/paths.js.map +1 -0
- package/dist/core/setupsteps.js +72 -0
- package/dist/core/setupsteps.js.map +1 -0
- package/dist/core/state.js +19 -0
- package/dist/core/state.js.map +1 -0
- package/dist/core/surfaces.js +62 -0
- package/dist/core/surfaces.js.map +1 -0
- package/dist/prompts/index.js +117 -0
- package/dist/prompts/index.js.map +1 -0
- package/dist/version.js +26 -0
- package/dist/version.js.map +1 -0
- package/knowledge/PRINCIPLES.md +106 -0
- package/knowledge/manifest.json +247 -0
- package/knowledge/templates/AGENTS.md +66 -0
- package/knowledge/templates/AGENTS.package.example.md +19 -0
- package/knowledge/templates/agents/README.md +33 -0
- package/knowledge/templates/agents/developer.md +7 -0
- package/knowledge/templates/agents/developer.yml +7 -0
- package/knowledge/templates/agents/judge.md +6 -0
- package/knowledge/templates/agents/judge.yml +6 -0
- package/knowledge/templates/agents/reviewer.md +6 -0
- package/knowledge/templates/agents/reviewer.yml +7 -0
- package/knowledge/templates/agents/triager.md +8 -0
- package/knowledge/templates/agents/triager.yml +8 -0
- package/knowledge/templates/dashboard/dashboard.mjs +261 -0
- package/knowledge/templates/eval/RUBRIC.md +94 -0
- package/knowledge/templates/eval/axes.json +56 -0
- package/knowledge/templates/eval/checks.json +304 -0
- package/knowledge/templates/eval/sandbox/eval-rules.md +23 -0
- package/knowledge/templates/eval/scenarios/README.md +24 -0
- package/knowledge/templates/eval/scenarios/add-small-feature.md +28 -0
- package/knowledge/templates/eval/scenarios/fix-failing-test.md +27 -0
- package/knowledge/templates/eval/scenarios/review-catches-bug.md +30 -0
- package/knowledge/templates/eval/score.mjs +257 -0
- package/knowledge/templates/eval/static-audit.mjs +112 -0
- package/knowledge/templates/harness/ORCHESTRATION.md +53 -0
- package/knowledge/templates/harness/state-machine.yml +105 -0
- package/knowledge/templates/mcp/mcp.json +12 -0
- package/knowledge/templates/rules/README.md +32 -0
- package/knowledge/templates/rules/code-review.md +26 -0
- package/knowledge/templates/rules/coding-standards.md +15 -0
- package/knowledge/templates/rules/no-debug-logging.md +16 -0
- package/knowledge/templates/rules/security.md +23 -0
- package/knowledge/templates/scripts/repair-worktrees.sh +124 -0
- package/knowledge/templates/skills/fix-ci/SKILL.md +17 -0
- package/knowledge/templates/skills/harness-eval/SKILL.md +83 -0
- package/knowledge/templates/skills/self-verify/SKILL.md +25 -0
- package/knowledge/templates/skills/skill-authoring/SKILL.md +35 -0
- package/knowledge/templates/skills/skill-improver/SKILL.md +23 -0
- package/knowledge/templates/skills/verify-loop/SKILL.md +35 -0
- package/knowledge/templates/wiki/README.md +23 -0
- package/knowledge/templates/wiki/_TEMPLATE.md +16 -0
- package/knowledge/templates/wiki/index.md +29 -0
- package/knowledge/templates/wiki/troubleshooting.md +14 -0
- package/package.json +70 -0
@@ -0,0 +1,106 @@
# AgentRig — Principles of a successful agent harness

> This is AgentRig's **canonical, editable** copy of the harness principles. Edit it freely;
> `agentrig update` will carry your edits into any repo that uses AgentRig.
> Synthesized from `infinity-microsoft/epichan`, `office-shared/fluent-agent`, and
> `microsoft/fluentui`.

A *harness* is the surrounding scaffolding (orchestration, prompts, skills, memory, evaluation)
that lets autonomous coding agents reliably **triage → implement → review → judge → merge** with
minimal human babysitting. AgentRig installs an opinionated harness into any repo, keeps context of
what the repo is about, and ships a way to **evaluate the harness itself**.

Each principle below names the concrete artifact(s) AgentRig installs and how the harness audit
(`agentrig eval --static`) scores it.

---

## 1. Treat the workflow as an explicit state machine
Every task moves through named states (`ingested → queued → implementing → reviewing → judging →
ready_to_merge → merged → closed`) and every transition declares its trigger. The DAG is the
contract; agents do not invent transitions and reviewers cannot skip gates.
**Artifact:** `.agentrig/harness/state-machine.yml`.
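
A minimal sketch of how an engine might enforce principle 1. The state names come from the text above; the table layout, the `trigger` and `role` values, and both function names are assumptions for illustration, since the real schema of `state-machine.yml` is not shown here.

```javascript
// Hypothetical transition table. Only the state names are taken from the
// principles text; triggers and roles are illustrative guesses.
const TRANSITIONS = [
  { from: "ingested", to: "queued", trigger: "agent", role: "triager" },
  { from: "queued", to: "implementing", trigger: "agent", role: "developer" },
  { from: "implementing", to: "reviewing", trigger: "agent", role: "developer" },
  { from: "reviewing", to: "judging", trigger: "agent", role: "reviewer" },
  { from: "judging", to: "ready_to_merge", trigger: "agent", role: "judge" },
  { from: "ready_to_merge", to: "merged", trigger: "human" },
  { from: "merged", to: "closed", trigger: "event" },
];

// Return the declared transition, or null when the move is not in the table.
function findTransition(from, to) {
  return TRANSITIONS.find((t) => t.from === from && t.to === to) ?? null;
}

// Throw when a move is undeclared or attempted by a role that does not own it.
function assertTransition(from, to, role) {
  const t = findTransition(from, to);
  if (!t) throw new Error(`undeclared transition: ${from} -> ${to}`);
  if (t.role && t.role !== role) {
    throw new Error(`${role} may not move ${from} -> ${to}`);
  }
  return t;
}
```

Calling `assertTransition("queued", "merged", "developer")` throws, which is the point: a move that is not declared cannot happen, so no agent can skip the review and judging gates.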

## 2. Specialize roles, vary models
Route each state to a *role* (`triager`, `developer`, `reviewer`, `judge`), each with a short prompt
and its own `model_tier`. Run the reviewer on a **different model than the developer**: mitigating
single-model bias matters more than any prompt tweak. The roster is extensible: add new agent types
(`designer`, `security-reviewer`, …) by dropping a `<role>.{yml,md}` in and wiring a transition.
**Artifact:** `.agentrig/agents/{triager,developer,reviewer,judge}.{yml,md}` (+ `README.md`) with
distinct models.

## 3. Externalize state in a system of record
GitHub is the source of truth. Labels are the contract, not decoration. Pollers reconcile the
engine against GitHub on a cadence; events drive reactive transitions. If the engine crashes,
GitHub still tells you the truth. A **dashboard** surfaces the live picture: which tasks sit in which
state (by label), who they're assigned to, plus harness score and eval status.
**Artifact:** labels/state mapping in the state machine + MCP GitHub server +
`.agentrig/dashboard/dashboard.mjs` (`agentrig dashboard`).

## 4. Skills are procedural memory; rules are reflexes
Skills (`SKILL.md` with YAML frontmatter for triggers, `allowed-tools`, `argument-hint`) encode
*how to do one thing well*. They are composable, auto-discovered, tool-scoped, and mirrored across
vendor surfaces (`.claude/`, `.copilot/`, `.agents/`, …). Rules are glob-scoped and auto-loaded
when matching files are edited, with an explicit priority order.
**Artifact:** `.agents/skills/*/SKILL.md`, `.agents/rules/*.md` + `README.md`.

## 5. Self-verify before handoff
After producing work, the implementing agent runs its own verification loop (build/test/visual)
pinned to its own HEAD and decides between *iterate*, *continue*, or *self-park*. The reviewer is
only invoked once the producer's loop has converged. Cap iteration attempts (N=3), then self-park
rather than hand a red build to the reviewer.
**Artifact:** `.agents/skills/self-verify/SKILL.md`.
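
The iterate / continue / self-park decision in principle 5 can be sketched as a bounded loop. This is an illustration only: `selfVerify`, the `verify` and `fix` callbacks, and the result shape are hypothetical names, not the contents of the `self-verify` skill.

```javascript
// Run `verify` up to `maxAttempts` times, calling `fix` between red runs.
// `verify` is assumed to return { ok: boolean, log: string }.
function selfVerify(verify, fix, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = verify();
    if (result.ok) {
      // Green: the loop has converged, so hand off to the reviewer.
      return { decision: "continue", attempts: attempt };
    }
    // Red: iterate, unless this was the final allowed attempt.
    if (attempt < maxAttempts) fix(result);
  }
  // Still red after the cap: park the task instead of handing it off.
  return { decision: "self-park", attempts: maxAttempts };
}
```

Because the cap is a parameter with a default of 3, the same loop works if a repo chooses a different limit.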

## 6. Independent, rubric-driven evaluation
Score work on explicit axes with credit tiers (0 / 0.5 / 1.0), a mandatory **issue code** plus
evidence whenever a score is below full credit, and a deterministic aggregator (never hand-edited
JSON). This is how you tell whether a prompt change made the agent better or worse — and it is how
you evaluate **the harness itself**.
**Artifact:** `.agentrig/eval/` (RUBRIC.md, checks.json, scenarios, score.mjs, static-audit.mjs)
and the `harness-eval` skill.
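
As a sketch of what "deterministic aggregator" could mean in practice, the block below validates tiered scores and computes a weighted mean. It is not the shipped `score.mjs`: the entry shape, the axis names, the `issueCode` and `evidence` fields, and the equal default weights are all assumptions.

```javascript
// The three credit tiers named in the principle.
const TIERS = [0, 0.5, 1.0];

// Validate one axis entry: the score must be a known tier, and anything
// below full credit must carry an issue code plus evidence.
function checkScore(entry) {
  if (!TIERS.includes(entry.score)) {
    throw new Error(`${entry.axis}: score must be 0, 0.5, or 1.0`);
  }
  if (entry.score < 1.0 && !(entry.issueCode && entry.evidence)) {
    throw new Error(`${entry.axis}: issue code and evidence required`);
  }
  return entry;
}

// Weighted mean over all axes. Weights default to 1 (hypothetical).
function aggregate(entries, weights = {}) {
  let total = 0;
  let weightSum = 0;
  for (const e of entries.map(checkScore)) {
    const w = weights[e.axis] ?? 1;
    total += w * e.score;
    weightSum += w;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}
```

The same inputs always give the same aggregate, so two runs can be compared directly when judging whether a prompt change helped.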

## 7. Hermetic per-agent environments
Each concurrent agent runs in its **own git worktree** so developers, reviewers, and judges never
trip over each other's working trees or lockfiles. A repair script prunes stale worktree metadata
before every add. Isolation is a hard prerequisite for multi-agent throughput.
**Artifact:** `scripts/repair-worktrees.sh` + worktree guidance in the wiki.

## 8. Continuous self-improvement: every mistake is a prompt bug
Agents log new gotchas to a tiered memory (central committed wiki → local git-ignored wiki →
session scratch). A `skill-improver` turns reviewer feedback into instruction-surface changes that
must pass a **prevention test** ("would this new wording have changed the original failure?").
Strict admission tests stop duplication from killing the wiki.
**Artifact:** `.agents/wiki/` + `.agents/skills/skill-improver/SKILL.md`.

## 9. Human-in-the-loop where reversibility is low
Low-reversibility actions are recommend-then-apply: the agent surfaces proposed changes and waits
for explicit `apply`/`approve`/`skip`. Certain labels are **human-only gates** the agent must never
apply or even name. These are deliberate trust boundaries, not friction.
**Artifact:** human-gate declarations in the state machine + rules.

## 10. Hard limits and safety nets
Set `max_review_iterations`, `max_diff_chars`, a token `runaway_cap`, and `pre_pr`/`pre_merge`
hooks. Protected files require a human-override label. A recovery scan re-queues anything stuck too
long. These caps keep an agent pool from melting the repo.
**Artifact:** `limits:` block in `.agentrig/harness/state-machine.yml`.
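
To illustrate how the three numeric caps in principle 10 might be checked, here is a small sketch. The key names come from the text; the numeric values, the `task` fields, and `checkLimits` itself are hypothetical, and the real `limits:` block may be shaped differently.

```javascript
// Hypothetical values; real ones would be read from the `limits:` block.
const exampleLimits = {
  max_review_iterations: 3,
  max_diff_chars: 20000,
  runaway_cap: 500000,
};

// Return the names of every limit the task currently exceeds.
// `task` fields (reviewIterations, diffChars, tokensUsed) are assumptions.
function checkLimits(task, limits) {
  const violations = [];
  if (task.reviewIterations > limits.max_review_iterations) {
    violations.push("max_review_iterations");
  }
  if (task.diffChars > limits.max_diff_chars) {
    violations.push("max_diff_chars");
  }
  if (task.tokensUsed > limits.runaway_cap) {
    violations.push("runaway_cap");
  }
  return violations;
}
```

Returning the list of violated limits, rather than a boolean, lets the engine report exactly which cap stopped a task.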

## 11. One canonical source, projected to every agent surface (local + remote)
The harness keeps **one** source of truth (`AGENTS.md` + `.agents/rules/` + `.agents/skills/`) and
**projects** it into each ecosystem's native discovery format so *any* agent benefits without
lock-in — local CLIs **and** remote/cloud agents:
- **GitHub Copilot (remote coding agent + IDE):** `.github/copilot-instructions.md`,
  path-scoped `.github/instructions/*.instructions.md` (`applyTo` globs), and
  `.github/workflows/copilot-setup-steps.yml` for the cloud agent's environment.
- **Claude Code:** `CLAUDE.md`. **Cursor:** `.cursor/rules/*.mdc`. **OpenCode/Codex:** `AGENTS.md`.
- **MCP** mirrored to each surface (`.mcp.json`, `.vscode/mcp.json`, `.github/copilot/mcp.json`).

This is the meta-harness payoff: assign an issue to the web GitHub Copilot agent and it sees the same
rules/setup/MCP as your local Copilot CLI, Claude Code, or Cursor. Projections regenerate from the
source; never hand-edit the generated files.
**Artifact:** the compiler (`agentrig compile`) + the projected files above; symlinked vendor dirs
for skills.

## 12. Instructions are the source of truth, not existing code
A short, unmissable **Critical Rules** block at the top of `AGENTS.md` beats a 50-page contributing
guide. Pair it with package-local AGENTS.md, golden-principles docs, and a directory map so an
agent can answer "what should I do?" without spelunking. Legacy code is not the spec.
**Artifact:** root `AGENTS.md` with a `Critical Rules` section + repo context.
@@ -0,0 +1,247 @@
{
  "$schema": "agentrig-manifest/1",
  "knowledgeVersion": "0.3.3",
  "description": "Declares which best-practice artifacts AgentRig installs into a target repo and where. `src` is relative to the knowledge/ root; `dest` is relative to the target repo root. `kind`: file | dir | template. Templates contain {{PLACEHOLDERS}} the agent fills from its investigation; deterministic installs substitute known values and leave the rest for the agent.",
  "artifacts": [
    {
      "id": "agents-md",
      "principle": 12,
      "src": "templates/AGENTS.md",
      "dest": "AGENTS.md",
      "kind": "template",
      "merge": "markers"
    },
    {
      "id": "principles",
      "principle": 12,
      "src": "PRINCIPLES.md",
      "dest": ".agentrig/PRINCIPLES.md",
      "kind": "file"
    },
    {
      "id": "agents-package-example",
      "principle": 12,
      "src": "templates/AGENTS.package.example.md",
      "dest": ".agentrig/AGENTS.package.example.md",
      "kind": "file"
    },
    {
      "id": "state-machine",
      "principle": 1,
      "src": "templates/harness/state-machine.yml",
      "dest": ".agentrig/harness/state-machine.yml",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "orchestration",
      "principle": 1,
      "src": "templates/harness/ORCHESTRATION.md",
      "dest": ".agentrig/harness/ORCHESTRATION.md",
      "kind": "file"
    },
    {
      "id": "role-developer-yml",
      "principle": 2,
      "src": "templates/agents/developer.yml",
      "dest": ".agentrig/agents/developer.yml",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-developer-md",
      "principle": 2,
      "src": "templates/agents/developer.md",
      "dest": ".agentrig/agents/developer.md",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-reviewer-yml",
      "principle": 2,
      "src": "templates/agents/reviewer.yml",
      "dest": ".agentrig/agents/reviewer.yml",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-reviewer-md",
      "principle": 2,
      "src": "templates/agents/reviewer.md",
      "dest": ".agentrig/agents/reviewer.md",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-judge-yml",
      "principle": 2,
      "src": "templates/agents/judge.yml",
      "dest": ".agentrig/agents/judge.yml",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-judge-md",
      "principle": 2,
      "src": "templates/agents/judge.md",
      "dest": ".agentrig/agents/judge.md",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-triager-yml",
      "principle": 2,
      "src": "templates/agents/triager.yml",
      "dest": ".agentrig/agents/triager.yml",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "role-triager-md",
      "principle": 2,
      "src": "templates/agents/triager.md",
      "dest": ".agentrig/agents/triager.md",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "roles-readme",
      "principle": 2,
      "src": "templates/agents/README.md",
      "dest": ".agentrig/agents/README.md",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "skill-self-verify",
      "principle": 5,
      "src": "templates/skills/self-verify",
      "dest": ".agents/skills/self-verify",
      "kind": "dir"
    },
    {
      "id": "skill-fix-ci",
      "principle": 4,
      "src": "templates/skills/fix-ci",
      "dest": ".agents/skills/fix-ci",
      "kind": "dir"
    },
    {
      "id": "skill-skill-improver",
      "principle": 8,
      "src": "templates/skills/skill-improver",
      "dest": ".agents/skills/skill-improver",
      "kind": "dir"
    },
    {
      "id": "skill-harness-eval",
      "principle": 6,
      "src": "templates/skills/harness-eval",
      "dest": ".agents/skills/harness-eval",
      "kind": "dir"
    },
    {
      "id": "skill-verify-loop",
      "principle": 5,
      "src": "templates/skills/verify-loop",
      "dest": ".agents/skills/verify-loop",
      "kind": "dir"
    },
    {
      "id": "skill-authoring",
      "principle": 4,
      "src": "templates/skills/skill-authoring",
      "dest": ".agents/skills/skill-authoring",
      "kind": "dir"
    },
    {
      "id": "rules",
      "principle": 4,
      "src": "templates/rules",
      "dest": ".agents/rules",
      "kind": "dir"
    },
    {
      "id": "wiki",
      "principle": 8,
      "src": "templates/wiki",
      "dest": ".agents/wiki",
      "kind": "dir"
    },
    {
      "id": "mcp",
      "principle": 11,
      "src": "templates/mcp/mcp.json",
      "dest": ".mcp.json",
      "kind": "file",
      "refresh": "preserve"
    },
    {
      "id": "worktree-script",
      "principle": 7,
      "src": "templates/scripts/repair-worktrees.sh",
      "dest": "scripts/repair-worktrees.sh",
      "kind": "file",
      "mode": "0755"
    },
    {
      "id": "eval-rubric",
      "principle": 6,
      "src": "templates/eval/RUBRIC.md",
      "dest": ".agentrig/eval/RUBRIC.md",
      "kind": "file"
    },
    {
      "id": "eval-axes",
      "principle": 6,
      "src": "templates/eval/axes.json",
      "dest": ".agentrig/eval/axes.json",
      "kind": "file"
    },
    {
      "id": "eval-checks",
      "principle": 6,
      "src": "templates/eval/checks.json",
      "dest": ".agentrig/eval/checks.json",
      "kind": "file"
    },
    {
      "id": "eval-static-audit",
      "principle": 6,
      "src": "templates/eval/static-audit.mjs",
      "dest": ".agentrig/eval/static-audit.mjs",
      "kind": "file",
      "mode": "0755"
    },
    {
      "id": "eval-score",
      "principle": 6,
      "src": "templates/eval/score.mjs",
      "dest": ".agentrig/eval/score.mjs",
      "kind": "file",
      "mode": "0755"
    },
    {
      "id": "eval-scenarios",
      "principle": 6,
      "src": "templates/eval/scenarios",
      "dest": ".agentrig/eval/scenarios",
      "kind": "dir"
    },
    {
      "id": "eval-sandbox",
      "principle": 6,
      "src": "templates/eval/sandbox",
      "dest": ".agentrig/eval/sandbox",
      "kind": "dir"
    },
    {
      "id": "dashboard",
      "principle": 3,
      "src": "templates/dashboard/dashboard.mjs",
      "dest": ".agentrig/dashboard/dashboard.mjs",
      "kind": "file",
      "mode": "0755"
    }
  ]
}
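
One way to read the manifest above is as an install plan. The sketch below is a guess at the semantics, not AgentRig's installer: it assumes `"refresh": "preserve"` means "do not overwrite a file that already exists" and `"merge": "markers"` means "update only the marked sections of an existing file". `planInstall` and the `exists` predicate are hypothetical names.

```javascript
// Turn manifest artifacts into a list of planned actions.
// `exists(dest)` is a caller-supplied predicate (for example, a wrapper
// around a filesystem check) so the planner itself stays pure.
function planInstall(manifest, exists) {
  return manifest.artifacts.map((a) => {
    const present = exists(a.dest);
    let action = "write";
    if (present && a.refresh === "preserve") {
      action = "skip"; // assumed: keep the user's edited copy
    } else if (present && a.merge === "markers") {
      action = "merge-markers"; // assumed: refresh marked sections only
    }
    return {
      id: a.id,
      dest: a.dest,
      action,
      // "0755" is an octal string in the manifest; parse it as base 8.
      mode: a.mode ? parseInt(a.mode, 8) : undefined,
    };
  });
}
```

Keeping the existence check outside the function makes the plan easy to test and to print as a dry run before anything is written.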
@@ -0,0 +1,66 @@
# {{REPO_NAME}} — Agent instructions

> Managed in part by [AgentRig](https://github.com/). Sections between AgentRig markers are
> refreshed by `agentrig update`; edit outside the markers (and the repo-specific context) freely.

## Critical Rules (read first, every time)
<!-- AGENTRIG:critical-rules:start -->
1. **Instructions are the source of truth, not existing code.** This repo may contain legacy
   patterns that predate current standards. When code and these instructions disagree, follow the
   instructions and flag the discrepancy.
2. **Self-verify before handoff.** Run the project's build/test/lint and the `self-verify` skill
   before you mark work ready. Never hand a red build to a reviewer.
3. **Never skip a state-machine gate** (`.agentrig/harness/state-machine.yml`) and never apply a
   human-only label. Low-reversibility actions are recommend-then-apply.
4. **Respect hard limits** (diff size, review iterations, token cap) declared in the state machine.
5. **Every mistake is a prompt bug.** When you hit a gotcha, record it in `.agents/wiki/` and, if a
   skill or rule should have prevented it, run `skill-improver`.
<!-- AGENTRIG:critical-rules:end -->

## What this repository is
<!-- AGENTRIG:context:start -->
{{REPO_SUMMARY}}

See `.agentrig/context.md` for the full, agent-authored investigation of this repository.
<!-- AGENTRIG:context:end -->

## How to build, test, and lint
<!-- AGENTRIG:commands:start -->
- **Install:** `{{INSTALL_CMD}}`
- **Build:** `{{BUILD_CMD}}`
- **Test:** `{{TEST_CMD}}`
- **Lint:** `{{LINT_CMD}}`
<!-- AGENTRIG:commands:end -->

## Directory map
<!-- AGENTRIG:dirmap:start -->
{{DIRECTORY_MAP}}
<!-- AGENTRIG:dirmap:end -->

## The harness
<!-- AGENTRIG:harness:start -->
- **Workflow / state machine:** `.agentrig/harness/state-machine.yml`
- **Agent roles & models:** `.agentrig/agents/` (triager, developer, reviewer, judge — each on a
  varied model; reviewer differs from developer on purpose). See `.agentrig/agents/README.md` to add
  new agent types.
- **Skills (procedural memory):** `.agents/skills/`
  <!-- AGENTRIG:skills-inventory:start -->
  {{SKILLS_INVENTORY}}
  <!-- AGENTRIG:skills-inventory:end -->
- **Rules (reflexes, glob-scoped):** `.agents/rules/`
- **Memory / wiki:** `.agents/wiki/` (see `index.md` for what belongs where)
- **Tooling (MCP):** `.mcp.json`
- **Agent surfaces (compiled):** `agentrig compile` projects this file + `.agents/rules/` into every
  agent's native format — `.github/copilot-instructions.md` & `.github/instructions/` (Copilot, web +
  IDE), `CLAUDE.md` (Claude Code), `.cursor/rules/` (Cursor), `.vscode/mcp.json`, and
  `.github/workflows/copilot-setup-steps.yml`. Edit the source here, not the generated files.
- **Surfaces:** `.claude` / `.copilot` / `.opencode` / `.codex` symlink to `.agents` so any vendor CLI
  sees the same skills/rules/wiki.
- **Orchestration contract:** `.agentrig/harness/ORCHESTRATION.md`
- **Dashboard:** `agentrig dashboard` (or `node .agentrig/dashboard/dashboard.mjs`) — agent roster,
  live GitHub tasks per harness label, harness score, and eval status. `--html` for a web view.
- **Evaluate the harness itself:** `agentrig eval --static` or `node .agentrig/eval/static-audit.mjs`;
  see `.agentrig/eval/RUBRIC.md`.
- **Package-local instructions:** drop an `AGENTS.md` in a subpackage to add scope-specific rules;
  it augments this root file. See `.agentrig/AGENTS.package.example.md`.
<!-- AGENTRIG:harness:end -->
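
The template above relies on two mechanisms: `{{PLACEHOLDER}}` substitution and marker-delimited sections that `agentrig update` refreshes. The sketch below shows one plausible way each could work. Both functions are hypothetical helpers written for illustration, not AgentRig's code; only the marker and placeholder syntax is taken from the template.

```javascript
// Substitute known {{NAME}} placeholders and leave unknown ones in place,
// so an agent can fill the rest later.
function fillPlaceholders(text, values) {
  return text.replace(/\{\{([A-Z_]+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}

// Replace the body between <!-- AGENTRIG:<name>:start --> and
// <!-- AGENTRIG:<name>:end -->, leaving everything outside untouched.
function refreshSection(doc, name, body) {
  const start = `<!-- AGENTRIG:${name}:start -->`;
  const end = `<!-- AGENTRIG:${name}:end -->`;
  const i = doc.indexOf(start);
  if (i === -1) return doc; // no start marker: nothing to refresh
  const j = doc.indexOf(end, i + start.length);
  if (j === -1) return doc; // unbalanced markers: leave the file alone
  return doc.slice(0, i + start.length) + "\n" + body + "\n" + doc.slice(j);
}
```

Returning the document unchanged when a marker is missing is a deliberate choice in this sketch: a user who deleted the markers keeps full ownership of that section.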
@@ -0,0 +1,19 @@
# <package name> — Agent instructions (package-local)

> Drop a file like this at the root of a subpackage/subtree. It **augments** the repo-root
> `AGENTS.md` with scope-specific guidance; it does not replace the root Critical Rules.

## Scope
Applies to everything under this directory.

## What this package is
One or two sentences: purpose, public surface, who depends on it.

## Local rules
- Build/test/lint commands specific to this package, if they differ from the root.
- Conventions that only apply here (naming, layering, allowed dependencies).
- Files/areas to treat as protected or generated.

## Pointers
- Root policy: `/AGENTS.md`
- Path-scoped reflexes: add a glob-scoped rule under `.agents/rules/` instead of repeating it here.
@@ -0,0 +1,33 @@
# Agent roles (principle 2 — specialize roles, vary models)

The harness routes each state of the workflow to a **specialized agent type**, each with its own
short prompt and its own model. Running different roles on **different models** is deliberate:
single-model-bias mitigation surfaces problems no single model would catch alone.

## Roster (installed by default)

| Role | File | Default model | Drives state |
|------|------|---------------|--------------|
| **triager** | `triager.{yml,md}` | `gpt-5-mini` (low) | `ingested → queued` |
| **developer** | `developer.{yml,md}` | `claude-sonnet-4.5` (high) | `queued → implementing → reviewing` |
| **reviewer** | `reviewer.{yml,md}` | `gpt-5` (high) | `reviewing` |
| **judge** | `judge.{yml,md}` | `claude-opus-4.5` (high) | `judging → ready_to_merge` |

> Keep the **reviewer on a different model family than the developer**. The audit
> (`agentrig eval --static`) checks for this.

## Each role has two files
- `<role>.yml` — declarative config: `role`, `model`, `model_tier`, `allowed_tools`, and the
  `prompt` path. Skills are auto-discovered from `.agents/skills/`, so no skill list is needed.
- `<role>.md` — the role's short prompt (keep it to a few imperative lines).

## Adding a new agent type
1. Create `agents/<role>.yml` and `agents/<role>.md`. Pick a model that differs from adjacent roles
   in the pipeline.
2. Wire the role into `.agentrig/harness/state-machine.yml` by giving a transition
   `trigger: agent` and `role: <role>`.
3. If the role needs a new procedure, add a skill under `.agents/skills/`.

Example roles you might add: `designer` (visual/UX work), `security-reviewer`, `release-manager`,
`docs-writer`. The pipeline is yours to extend — the state machine is the contract that keeps it
coherent.
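
The README above says the audit checks that the reviewer runs on a different model family than the developer. How the shipped audit does this is not shown, so the following is only a sketch of one possible check. It assumes the family is the part of the model id before the first hyphen, which holds for the default roster but is a heuristic, and both function names are hypothetical.

```javascript
// Heuristic: treat the text before the first "-" as the model family,
// e.g. "claude-sonnet-4.5" -> "claude", "gpt-5" -> "gpt".
function modelFamily(model) {
  return model.split("-")[0].toLowerCase();
}

// True only when both roles exist and their model families differ.
// `roles` is an array of parsed role configs with `role` and `model` keys.
function reviewerDiffersFromDeveloper(roles) {
  const developer = roles.find((r) => r.role === "developer");
  const reviewer = roles.find((r) => r.role === "reviewer");
  if (!developer || !reviewer) return false;
  return modelFamily(developer.model) !== modelFamily(reviewer.model);
}
```

With the default roster (`claude-sonnet-4.5` for the developer, `gpt-5` for the reviewer) the check passes; pointing both roles at models from the same family makes it fail.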
@@ -0,0 +1,7 @@
You are the **developer**. Implement the smallest correct change that fully satisfies the task.

- Follow `AGENTS.md` Critical Rules and the glob-scoped rules in `.agents/rules/`.
- Run `self-verify` (build + test + lint) before requesting review. Iterate up to 3 times; if still
  red, self-park with a clear note rather than handing a broken diff to the reviewer.
- Keep the diff under the `max_diff_chars` limit. Split work if it grows larger.
- Log any new gotcha to `.agents/wiki/`.
@@ -0,0 +1,7 @@
# Developer role (principle 2). Implements the change in the `implementing` state.
role: developer
model: claude-sonnet-4.5
model_tier: high
# Skills are auto-discovered from .agents/skills; no explicit list needed.
allowed_tools: [read, write, edit, bash, grep, glob]
prompt: agents/developer.md
@@ -0,0 +1,6 @@
You are the **judge**. Score the completed work against `.agentrig/eval/RUBRIC.md`.

- Use credit tiers 0 / 0.5 / 1.0. Any score < 1.0 REQUIRES an issue code and one line of evidence.
- Save results with `.agentrig/eval/score.mjs` (never hand-edit the JSON).
- Pass to `ready_to_merge` only if the aggregate clears the threshold; otherwise return to
  `implementing` with the failing axes.
@@ -0,0 +1,6 @@
You are the **reviewer**, running a different model than the developer on purpose.

- Review only the diff. Surface bugs, security issues, and logic errors — never style nits.
- Score with confidence per category; refuse to surface low-signal comments.
- If you request changes, return to `implementing` with a concrete, testable reason.
- You may not apply human-only labels (see the state machine).
@@ -0,0 +1,7 @@
# Reviewer role (principle 2). Deliberately a DIFFERENT model family than the developer
# to mitigate single-model bias — divergent verdicts surface problems neither model alone catches.
role: reviewer
model: gpt-5
model_tier: high
allowed_tools: [read, grep, glob, bash]
prompt: agents/reviewer.md
@@ -0,0 +1,8 @@
You are the **triager**. Turn a freshly `ingested` task into a well-formed `queued` one.

- Read the issue/task and the repo context (`.agentrig/context.md`).
- Recommend labels, an assignee/role, and a size estimate. Surface them as a proposal table and
  **wait for explicit apply/approve** — never apply human-only labels (see the state machine).
- Confirm the task is actionable (clear acceptance criteria). If not, ask for clarification rather
  than queueing ambiguous work.
- When approved, move the task to `queued`.
@@ -0,0 +1,8 @@
# Triager role (principles 2 and 9). Moves `ingested` tasks to `queued`: recommend labels/assignees,
# size the work, and gate on human approval for low-reversibility calls. Uses a fast, cheap model
# on purpose — triage is high-volume and should not burn a premium tier.
role: triager
model: gpt-5-mini
model_tier: low
allowed_tools: [read, grep, glob, bash]
prompt: agents/triager.md