@doidor/agentrig 0.5.3

Files changed (94)
  1. package/LICENSE +21 -0
  2. package/README.md +224 -0
  3. package/dist/agent/claude.js +125 -0
  4. package/dist/agent/claude.js.map +1 -0
  5. package/dist/agent/copilot.js +147 -0
  6. package/dist/agent/copilot.js.map +1 -0
  7. package/dist/agent/index.js +17 -0
  8. package/dist/agent/index.js.map +1 -0
  9. package/dist/agent/provider.js +10 -0
  10. package/dist/agent/provider.js.map +1 -0
  11. package/dist/cli.js +169 -0
  12. package/dist/cli.js.map +1 -0
  13. package/dist/commands/compile.js +42 -0
  14. package/dist/commands/compile.js.map +1 -0
  15. package/dist/commands/dashboard.js +35 -0
  16. package/dist/commands/dashboard.js.map +1 -0
  17. package/dist/commands/doctor.js +40 -0
  18. package/dist/commands/doctor.js.map +1 -0
  19. package/dist/commands/eval.js +178 -0
  20. package/dist/commands/eval.js.map +1 -0
  21. package/dist/commands/init.js +100 -0
  22. package/dist/commands/init.js.map +1 -0
  23. package/dist/commands/update.js +176 -0
  24. package/dist/commands/update.js.map +1 -0
  25. package/dist/core/activity.js +80 -0
  26. package/dist/core/activity.js.map +1 -0
  27. package/dist/core/audit.js +112 -0
  28. package/dist/core/audit.js.map +1 -0
  29. package/dist/core/compile.js +250 -0
  30. package/dist/core/compile.js.map +1 -0
  31. package/dist/core/fsutil.js +45 -0
  32. package/dist/core/fsutil.js.map +1 -0
  33. package/dist/core/install.js +97 -0
  34. package/dist/core/install.js.map +1 -0
  35. package/dist/core/knowledge.js +34 -0
  36. package/dist/core/knowledge.js.map +1 -0
  37. package/dist/core/logger.js +31 -0
  38. package/dist/core/logger.js.map +1 -0
  39. package/dist/core/paths.js +22 -0
  40. package/dist/core/paths.js.map +1 -0
  41. package/dist/core/setupsteps.js +72 -0
  42. package/dist/core/setupsteps.js.map +1 -0
  43. package/dist/core/state.js +19 -0
  44. package/dist/core/state.js.map +1 -0
  45. package/dist/core/surfaces.js +62 -0
  46. package/dist/core/surfaces.js.map +1 -0
  47. package/dist/prompts/index.js +117 -0
  48. package/dist/prompts/index.js.map +1 -0
  49. package/dist/version.js +26 -0
  50. package/dist/version.js.map +1 -0
  51. package/knowledge/PRINCIPLES.md +106 -0
  52. package/knowledge/manifest.json +247 -0
  53. package/knowledge/templates/AGENTS.md +66 -0
  54. package/knowledge/templates/AGENTS.package.example.md +19 -0
  55. package/knowledge/templates/agents/README.md +33 -0
  56. package/knowledge/templates/agents/developer.md +7 -0
  57. package/knowledge/templates/agents/developer.yml +7 -0
  58. package/knowledge/templates/agents/judge.md +6 -0
  59. package/knowledge/templates/agents/judge.yml +6 -0
  60. package/knowledge/templates/agents/reviewer.md +6 -0
  61. package/knowledge/templates/agents/reviewer.yml +7 -0
  62. package/knowledge/templates/agents/triager.md +8 -0
  63. package/knowledge/templates/agents/triager.yml +8 -0
  64. package/knowledge/templates/dashboard/dashboard.mjs +261 -0
  65. package/knowledge/templates/eval/RUBRIC.md +94 -0
  66. package/knowledge/templates/eval/axes.json +56 -0
  67. package/knowledge/templates/eval/checks.json +304 -0
  68. package/knowledge/templates/eval/sandbox/eval-rules.md +23 -0
  69. package/knowledge/templates/eval/scenarios/README.md +24 -0
  70. package/knowledge/templates/eval/scenarios/add-small-feature.md +28 -0
  71. package/knowledge/templates/eval/scenarios/fix-failing-test.md +27 -0
  72. package/knowledge/templates/eval/scenarios/review-catches-bug.md +30 -0
  73. package/knowledge/templates/eval/score.mjs +257 -0
  74. package/knowledge/templates/eval/static-audit.mjs +112 -0
  75. package/knowledge/templates/harness/ORCHESTRATION.md +53 -0
  76. package/knowledge/templates/harness/state-machine.yml +105 -0
  77. package/knowledge/templates/mcp/mcp.json +12 -0
  78. package/knowledge/templates/rules/README.md +32 -0
  79. package/knowledge/templates/rules/code-review.md +26 -0
  80. package/knowledge/templates/rules/coding-standards.md +15 -0
  81. package/knowledge/templates/rules/no-debug-logging.md +16 -0
  82. package/knowledge/templates/rules/security.md +23 -0
  83. package/knowledge/templates/scripts/repair-worktrees.sh +124 -0
  84. package/knowledge/templates/skills/fix-ci/SKILL.md +17 -0
  85. package/knowledge/templates/skills/harness-eval/SKILL.md +83 -0
  86. package/knowledge/templates/skills/self-verify/SKILL.md +25 -0
  87. package/knowledge/templates/skills/skill-authoring/SKILL.md +35 -0
  88. package/knowledge/templates/skills/skill-improver/SKILL.md +23 -0
  89. package/knowledge/templates/skills/verify-loop/SKILL.md +35 -0
  90. package/knowledge/templates/wiki/README.md +23 -0
  91. package/knowledge/templates/wiki/_TEMPLATE.md +16 -0
  92. package/knowledge/templates/wiki/index.md +29 -0
  93. package/knowledge/templates/wiki/troubleshooting.md +14 -0
  94. package/package.json +70 -0
@@ -0,0 +1,106 @@
1
+ # AgentRig — Principles of a successful agent harness
2
+
3
+ > This is AgentRig's **canonical, editable** copy of the harness principles. Edit it freely;
4
+ > `agentrig update` will carry your edits into any repo that uses AgentRig.
5
+ > Synthesized from `infinity-microsoft/epichan`, `office-shared/fluent-agent`, and
6
+ > `microsoft/fluentui`.
7
+
8
+ A *harness* is the surrounding scaffolding (orchestration, prompts, skills, memory, evaluation)
9
+ that lets autonomous coding agents reliably **triage → implement → review → judge → merge** with
10
+ minimal human babysitting. AgentRig installs an opinionated harness into any repo, records context
11
+ about what the repo does, and ships a way to **evaluate the harness itself**.
12
+
13
+ Each principle below names the concrete artifact(s) AgentRig installs and how the harness audit
14
+ (`agentrig eval --static`) scores it.
15
+
16
+ ---
17
+
18
+ ## 1. Treat the workflow as an explicit state machine
19
+ Every task moves through named states (`ingested → queued → implementing → reviewing → judging →
20
+ ready_to_merge → merged → closed`) and every transition declares its trigger. The DAG is the
21
+ contract; agents do not invent transitions and reviewers cannot skip gates.
22
+ **Artifact:** `.agentrig/harness/state-machine.yml`.
23
+
24
+ ## 2. Specialize roles, vary models
25
+ Route each state to a *role* (`triager`, `developer`, `reviewer`, `judge`), each with a short prompt
26
+ and its own `model_tier`. Run the reviewer on a **different model than the developer**:
27
+ single-model-bias mitigation matters more than any prompt tweak. The roster is extensible: add new agent types
28
+ (`designer`, `security-reviewer`, …) by dropping a `<role>.{yml,md}` in and wiring a transition.
29
+ **Artifact:** `.agentrig/agents/{triager,developer,reviewer,judge}.{yml,md}` (+ `README.md`) with
30
+ distinct models.
31
+
32
+ ## 3. Externalize state in a system of record
33
+ GitHub is the source of truth. Labels are the contract, not decoration. Pollers reconcile the
34
+ engine against GitHub on a cadence; events drive reactive transitions. If the engine crashes,
35
+ GitHub still tells you the truth. A **dashboard** surfaces the live picture: which tasks sit in which
36
+ state (by label), who they're assigned to, plus harness score and eval status.
37
+ **Artifact:** labels/state mapping in the state machine + MCP GitHub server +
38
+ `.agentrig/dashboard/dashboard.mjs` (`agentrig dashboard`).
39
+
40
+ ## 4. Skills are procedural memory; rules are reflexes
41
+ Skills (`SKILL.md` with YAML frontmatter for triggers, `allowed-tools`, `argument-hint`) encode
42
+ *how to do one thing well*. They are composable, auto-discovered, tool-scoped, and mirrored across
43
+ vendor surfaces (`.claude/`, `.copilot/`, `.agents/`, …). Rules are glob-scoped and auto-loaded
44
+ when matching files are edited, with an explicit priority order.
45
+ **Artifact:** `.agents/skills/*/SKILL.md`, `.agents/rules/*.md` + `README.md`.
46
+
47
+ ## 5. Self-verify before handoff
48
+ After producing work, the implementing agent runs its own verification loop (build/test/visual)
49
+ pinned to its own HEAD and decides between *iterate*, *continue*, or *self-park*. The reviewer is
50
+ only invoked once the producer's loop has converged. Cap iteration attempts (N=3) and fall back.
51
+ **Artifact:** `.agents/skills/self-verify/SKILL.md`.
52
+
53
+ ## 6. Independent, rubric-driven evaluation
54
+ Score work on explicit axes with credit tiers (0 / 0.5 / 1.0), a mandatory **issue code** plus
55
+ evidence whenever a score is below full credit, and a deterministic aggregator (never hand-edited JSON). This
56
+ is how you tell whether a prompt change made the agent better or worse — and it is how you evaluate
57
+ **the harness itself**.
58
+ **Artifact:** `.agentrig/eval/` (RUBRIC.md, checks.json, scenarios, score.mjs, static-audit.mjs)
59
+ and the `harness-eval` skill.
60
+
61
+ ## 7. Hermetic per-agent environments
62
+ Each concurrent agent runs in its **own git worktree** so developers, reviewers, and judges never
63
+ trip over each other's working trees or lockfiles. A repair script prunes stale worktree metadata
64
+ before every add. Isolation is a hard prerequisite for multi-agent throughput.
65
+ **Artifact:** `scripts/repair-worktrees.sh` + worktree guidance in the wiki.
66
+
67
+ ## 8. Continuous self-improvement: every mistake is a prompt bug
68
+ Agents log new gotchas to a tiered memory (central committed wiki → local git-ignored wiki →
69
+ session scratch). A `skill-improver` turns reviewer feedback into instruction-surface changes that
70
+ must pass a **prevention test** ("would this new wording have changed the original failure?").
71
+ Strict admission tests stop duplication from killing the wiki.
72
+ **Artifact:** `.agents/wiki/` + `.agents/skills/skill-improver/SKILL.md`.
73
+
74
+ ## 9. Human-in-the-loop where reversibility is low
75
+ Low-reversibility actions are recommend-then-apply: the agent surfaces proposed changes and waits
76
+ for explicit `apply`/`approve`/`skip`. Certain labels are **human-only gates** the agent must never
77
+ apply or even name. These are deliberate trust boundaries, not friction.
78
+ **Artifact:** human-gate declarations in the state machine + rules.
79
+
80
+ ## 10. Hard limits and safety nets
81
+ Set `max_review_iterations`, `max_diff_chars`, a token `runaway_cap`, and `pre_pr`/`pre_merge`
82
+ hooks. Protected files require a human-override label. A recovery scan re-queues anything stuck too
83
+ long. These caps keep an agent pool from melting the repo.
84
+ **Artifact:** `limits:` block in `.agentrig/harness/state-machine.yml`.
85
+
86
+ ## 11. One canonical source, projected to every agent surface (local + remote)
87
+ The harness keeps **one** source of truth (`AGENTS.md` + `.agents/rules/` + `.agents/skills/`) and
88
+ **projects** it into each ecosystem's native discovery format so *any* agent benefits without
89
+ lock-in — local CLIs **and** remote/cloud agents:
90
+ - **GitHub Copilot (remote coding agent + IDE):** `.github/copilot-instructions.md`,
91
+ path-scoped `.github/instructions/*.instructions.md` (`applyTo` globs), and
92
+ `.github/workflows/copilot-setup-steps.yml` for the cloud agent's environment.
93
+ - **Claude Code:** `CLAUDE.md`. **Cursor:** `.cursor/rules/*.mdc`. **OpenCode/Codex:** `AGENTS.md`.
94
+ - **MCP** mirrored to each surface (`.mcp.json`, `.vscode/mcp.json`, `.github/copilot/mcp.json`).
95
+
96
+ This is the meta-harness payoff: assign an issue to the web GitHub Copilot agent and it sees the same
97
+ rules/setup/MCP as your local Copilot CLI, Claude Code, or Cursor. Projections regenerate from the
98
+ source; never hand-edit the generated files.
99
+ **Artifact:** the compiler (`agentrig compile`) + the projected files above; symlinked vendor dirs
100
+ for skills.
101
+
102
+ ## 12. Instructions are the source of truth, not existing code
103
+ A short, unmissable **Critical Rules** block at the top of `AGENTS.md` beats a 50-page contributing
104
+ guide. Pair it with package-local AGENTS.md, golden-principles docs, and a directory map so an
105
+ agent can answer "what should I do?" without spelunking. Legacy code is not the spec.
106
+ **Artifact:** root `AGENTS.md` with a `Critical Rules` section + repo context.
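Principle 1 in the file above treats the workflow as a contract in which agents may only take declared transitions. A minimal sketch of that idea follows. The `TRANSITIONS` table and the `canTransition` and `transition` helpers are hypothetical names for illustration; the real contract lives in `.agentrig/harness/state-machine.yml`, whose schema is not shown in this diff. The return edges to `implementing` are taken from the reviewer and judge prompts.

```javascript
// Hypothetical transition table mirroring the named states in principle 1.
// The real declaration lives in .agentrig/harness/state-machine.yml.
const TRANSITIONS = {
  ingested: ["queued"],
  queued: ["implementing"],
  implementing: ["reviewing"],
  reviewing: ["judging", "implementing"], // reviewer may request changes
  judging: ["ready_to_merge", "implementing"], // judge may send work back
  ready_to_merge: ["merged"],
  merged: ["closed"],
  closed: [],
};

// True only when the table declares a move from `from` to `to`.
function canTransition(from, to) {
  const allowed = TRANSITIONS[from];
  return Array.isArray(allowed) && allowed.includes(to);
}

// Returns a copy of the task in the new state, or throws on an
// undeclared transition instead of silently inventing one.
function transition(task, to) {
  if (!canTransition(task.state, to)) {
    throw new Error(`Undeclared transition: ${task.state} -> ${to}`);
  }
  return { ...task, state: to };
}
```

Throwing on an undeclared move, rather than logging and continuing, is one way to make "reviewers cannot skip gates" enforceable in code.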
@@ -0,0 +1,247 @@
1
+ {
2
+ "$schema": "agentrig-manifest/1",
3
+ "knowledgeVersion": "0.3.3",
4
+ "description": "Declares which best-practice artifacts AgentRig installs into a target repo and where. `src` is relative to the knowledge/ root; `dest` is relative to the target repo root. `kind`: file | dir | template. Templates contain {{PLACEHOLDERS}} the agent fills from its investigation; deterministic installs substitute known values and leave the rest for the agent.",
5
+ "artifacts": [
6
+ {
7
+ "id": "agents-md",
8
+ "principle": 12,
9
+ "src": "templates/AGENTS.md",
10
+ "dest": "AGENTS.md",
11
+ "kind": "template",
12
+ "merge": "markers"
13
+ },
14
+ {
15
+ "id": "principles",
16
+ "principle": 12,
17
+ "src": "PRINCIPLES.md",
18
+ "dest": ".agentrig/PRINCIPLES.md",
19
+ "kind": "file"
20
+ },
21
+ {
22
+ "id": "agents-package-example",
23
+ "principle": 12,
24
+ "src": "templates/AGENTS.package.example.md",
25
+ "dest": ".agentrig/AGENTS.package.example.md",
26
+ "kind": "file"
27
+ },
28
+ {
29
+ "id": "state-machine",
30
+ "principle": 1,
31
+ "src": "templates/harness/state-machine.yml",
32
+ "dest": ".agentrig/harness/state-machine.yml",
33
+ "kind": "file",
34
+ "refresh": "preserve"
35
+ },
36
+ {
37
+ "id": "orchestration",
38
+ "principle": 1,
39
+ "src": "templates/harness/ORCHESTRATION.md",
40
+ "dest": ".agentrig/harness/ORCHESTRATION.md",
41
+ "kind": "file"
42
+ },
43
+ {
44
+ "id": "role-developer-yml",
45
+ "principle": 2,
46
+ "src": "templates/agents/developer.yml",
47
+ "dest": ".agentrig/agents/developer.yml",
48
+ "kind": "file",
49
+ "refresh": "preserve"
50
+ },
51
+ {
52
+ "id": "role-developer-md",
53
+ "principle": 2,
54
+ "src": "templates/agents/developer.md",
55
+ "dest": ".agentrig/agents/developer.md",
56
+ "kind": "file",
57
+ "refresh": "preserve"
58
+ },
59
+ {
60
+ "id": "role-reviewer-yml",
61
+ "principle": 2,
62
+ "src": "templates/agents/reviewer.yml",
63
+ "dest": ".agentrig/agents/reviewer.yml",
64
+ "kind": "file",
65
+ "refresh": "preserve"
66
+ },
67
+ {
68
+ "id": "role-reviewer-md",
69
+ "principle": 2,
70
+ "src": "templates/agents/reviewer.md",
71
+ "dest": ".agentrig/agents/reviewer.md",
72
+ "kind": "file",
73
+ "refresh": "preserve"
74
+ },
75
+ {
76
+ "id": "role-judge-yml",
77
+ "principle": 2,
78
+ "src": "templates/agents/judge.yml",
79
+ "dest": ".agentrig/agents/judge.yml",
80
+ "kind": "file",
81
+ "refresh": "preserve"
82
+ },
83
+ {
84
+ "id": "role-judge-md",
85
+ "principle": 2,
86
+ "src": "templates/agents/judge.md",
87
+ "dest": ".agentrig/agents/judge.md",
88
+ "kind": "file",
89
+ "refresh": "preserve"
90
+ },
91
+ {
92
+ "id": "role-triager-yml",
93
+ "principle": 2,
94
+ "src": "templates/agents/triager.yml",
95
+ "dest": ".agentrig/agents/triager.yml",
96
+ "kind": "file",
97
+ "refresh": "preserve"
98
+ },
99
+ {
100
+ "id": "role-triager-md",
101
+ "principle": 2,
102
+ "src": "templates/agents/triager.md",
103
+ "dest": ".agentrig/agents/triager.md",
104
+ "kind": "file",
105
+ "refresh": "preserve"
106
+ },
107
+ {
108
+ "id": "roles-readme",
109
+ "principle": 2,
110
+ "src": "templates/agents/README.md",
111
+ "dest": ".agentrig/agents/README.md",
112
+ "kind": "file",
113
+ "refresh": "preserve"
114
+ },
115
+ {
116
+ "id": "skill-self-verify",
117
+ "principle": 5,
118
+ "src": "templates/skills/self-verify",
119
+ "dest": ".agents/skills/self-verify",
120
+ "kind": "dir"
121
+ },
122
+ {
123
+ "id": "skill-fix-ci",
124
+ "principle": 4,
125
+ "src": "templates/skills/fix-ci",
126
+ "dest": ".agents/skills/fix-ci",
127
+ "kind": "dir"
128
+ },
129
+ {
130
+ "id": "skill-skill-improver",
131
+ "principle": 8,
132
+ "src": "templates/skills/skill-improver",
133
+ "dest": ".agents/skills/skill-improver",
134
+ "kind": "dir"
135
+ },
136
+ {
137
+ "id": "skill-harness-eval",
138
+ "principle": 6,
139
+ "src": "templates/skills/harness-eval",
140
+ "dest": ".agents/skills/harness-eval",
141
+ "kind": "dir"
142
+ },
143
+ {
144
+ "id": "skill-verify-loop",
145
+ "principle": 5,
146
+ "src": "templates/skills/verify-loop",
147
+ "dest": ".agents/skills/verify-loop",
148
+ "kind": "dir"
149
+ },
150
+ {
151
+ "id": "skill-authoring",
152
+ "principle": 4,
153
+ "src": "templates/skills/skill-authoring",
154
+ "dest": ".agents/skills/skill-authoring",
155
+ "kind": "dir"
156
+ },
157
+ {
158
+ "id": "rules",
159
+ "principle": 4,
160
+ "src": "templates/rules",
161
+ "dest": ".agents/rules",
162
+ "kind": "dir"
163
+ },
164
+ {
165
+ "id": "wiki",
166
+ "principle": 8,
167
+ "src": "templates/wiki",
168
+ "dest": ".agents/wiki",
169
+ "kind": "dir"
170
+ },
171
+ {
172
+ "id": "mcp",
173
+ "principle": 11,
174
+ "src": "templates/mcp/mcp.json",
175
+ "dest": ".mcp.json",
176
+ "kind": "file",
177
+ "refresh": "preserve"
178
+ },
179
+ {
180
+ "id": "worktree-script",
181
+ "principle": 7,
182
+ "src": "templates/scripts/repair-worktrees.sh",
183
+ "dest": "scripts/repair-worktrees.sh",
184
+ "kind": "file",
185
+ "mode": "0755"
186
+ },
187
+ {
188
+ "id": "eval-rubric",
189
+ "principle": 6,
190
+ "src": "templates/eval/RUBRIC.md",
191
+ "dest": ".agentrig/eval/RUBRIC.md",
192
+ "kind": "file"
193
+ },
194
+ {
195
+ "id": "eval-axes",
196
+ "principle": 6,
197
+ "src": "templates/eval/axes.json",
198
+ "dest": ".agentrig/eval/axes.json",
199
+ "kind": "file"
200
+ },
201
+ {
202
+ "id": "eval-checks",
203
+ "principle": 6,
204
+ "src": "templates/eval/checks.json",
205
+ "dest": ".agentrig/eval/checks.json",
206
+ "kind": "file"
207
+ },
208
+ {
209
+ "id": "eval-static-audit",
210
+ "principle": 6,
211
+ "src": "templates/eval/static-audit.mjs",
212
+ "dest": ".agentrig/eval/static-audit.mjs",
213
+ "kind": "file",
214
+ "mode": "0755"
215
+ },
216
+ {
217
+ "id": "eval-score",
218
+ "principle": 6,
219
+ "src": "templates/eval/score.mjs",
220
+ "dest": ".agentrig/eval/score.mjs",
221
+ "kind": "file",
222
+ "mode": "0755"
223
+ },
224
+ {
225
+ "id": "eval-scenarios",
226
+ "principle": 6,
227
+ "src": "templates/eval/scenarios",
228
+ "dest": ".agentrig/eval/scenarios",
229
+ "kind": "dir"
230
+ },
231
+ {
232
+ "id": "eval-sandbox",
233
+ "principle": 6,
234
+ "src": "templates/eval/sandbox",
235
+ "dest": ".agentrig/eval/sandbox",
236
+ "kind": "dir"
237
+ },
238
+ {
239
+ "id": "dashboard",
240
+ "principle": 3,
241
+ "src": "templates/dashboard/dashboard.mjs",
242
+ "dest": ".agentrig/dashboard/dashboard.mjs",
243
+ "kind": "file",
244
+ "mode": "0755"
245
+ }
246
+ ]
247
+ }
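The manifest above maps each artifact's `src` (relative to `knowledge/`) to a `dest` in the target repo. A rough sketch of how such a manifest could be turned into an install plan follows. `planInstall` and `byPrinciple` are hypothetical helpers, not functions from the package, and the reading of `refresh: "preserve"` as "do not overwrite an existing file" is an assumption the manifest itself does not spell out.

```javascript
// Builds one copy operation per artifact.
// `existing` is a Set of dest paths already present in the target repo.
// Assumption: refresh "preserve" means an existing dest is left untouched.
function planInstall(manifest, existing = new Set()) {
  return manifest.artifacts.map((a) => ({
    id: a.id,
    from: `knowledge/${a.src}`,
    to: a.dest,
    kind: a.kind,
    action: a.refresh === "preserve" && existing.has(a.dest) ? "skip" : "copy",
  }));
}

// Groups artifact ids by the principle number they implement.
function byPrinciple(manifest) {
  const groups = {};
  for (const a of manifest.artifacts) {
    if (!groups[a.principle]) groups[a.principle] = [];
    groups[a.principle].push(a.id);
  }
  return groups;
}

// Three entries copied from the manifest above.
const sampleManifest = {
  artifacts: [
    { id: "state-machine", principle: 1, src: "templates/harness/state-machine.yml",
      dest: ".agentrig/harness/state-machine.yml", kind: "file", refresh: "preserve" },
    { id: "orchestration", principle: 1, src: "templates/harness/ORCHESTRATION.md",
      dest: ".agentrig/harness/ORCHESTRATION.md", kind: "file" },
    { id: "dashboard", principle: 3, src: "templates/dashboard/dashboard.mjs",
      dest: ".agentrig/dashboard/dashboard.mjs", kind: "file", mode: "0755" },
  ],
};
```

Calling `planInstall(sampleManifest, new Set([".agentrig/harness/state-machine.yml"]))` would mark the state machine as `skip` and the other two entries as `copy` under the stated assumption.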
@@ -0,0 +1,66 @@
1
+ # {{REPO_NAME}} — Agent instructions
2
+
3
+ > Managed in part by [AgentRig](https://github.com/). Sections between AgentRig markers are
4
+ > refreshed by `agentrig update`; edit outside the markers (and the repo-specific context) freely.
5
+
6
+ ## Critical Rules (read first, every time)
7
+ <!-- AGENTRIG:critical-rules:start -->
8
+ 1. **Instructions are the source of truth, not existing code.** This repo may contain legacy
9
+ patterns that predate current standards. When code and these instructions disagree, follow the
10
+ instructions and flag the discrepancy.
11
+ 2. **Self-verify before handoff.** Run the project's build/test/lint and the `self-verify` skill
12
+ before you mark work ready. Never hand a red build to a reviewer.
13
+ 3. **Never skip a state-machine gate** (`.agentrig/harness/state-machine.yml`) and never apply a
14
+ human-only label. Low-reversibility actions are recommend-then-apply.
15
+ 4. **Respect hard limits** (diff size, review iterations, token cap) declared in the state machine.
16
+ 5. **Every mistake is a prompt bug.** When you hit a gotcha, record it in `.agents/wiki/` and, if a
17
+ skill or rule should have prevented it, run `skill-improver`.
18
+ <!-- AGENTRIG:critical-rules:end -->
19
+
20
+ ## What this repository is
21
+ <!-- AGENTRIG:context:start -->
22
+ {{REPO_SUMMARY}}
23
+
24
+ See `.agentrig/context.md` for the full, agent-authored investigation of this repository.
25
+ <!-- AGENTRIG:context:end -->
26
+
27
+ ## How to build, test, and lint
28
+ <!-- AGENTRIG:commands:start -->
29
+ - **Install:** `{{INSTALL_CMD}}`
30
+ - **Build:** `{{BUILD_CMD}}`
31
+ - **Test:** `{{TEST_CMD}}`
32
+ - **Lint:** `{{LINT_CMD}}`
33
+ <!-- AGENTRIG:commands:end -->
34
+
35
+ ## Directory map
36
+ <!-- AGENTRIG:dirmap:start -->
37
+ {{DIRECTORY_MAP}}
38
+ <!-- AGENTRIG:dirmap:end -->
39
+
40
+ ## The harness
41
+ <!-- AGENTRIG:harness:start -->
42
+ - **Workflow / state machine:** `.agentrig/harness/state-machine.yml`
43
+ - **Agent roles & models:** `.agentrig/agents/` (triager, developer, reviewer, judge — each on a
44
+ varied model; reviewer differs from developer on purpose). See `.agentrig/agents/README.md` to add
45
+ new agent types.
46
+ - **Skills (procedural memory):** `.agents/skills/`
47
+ <!-- AGENTRIG:skills-inventory:start -->
48
+ {{SKILLS_INVENTORY}}
49
+ <!-- AGENTRIG:skills-inventory:end -->
50
+ - **Rules (reflexes, glob-scoped):** `.agents/rules/`
51
+ - **Memory / wiki:** `.agents/wiki/` (see `index.md` for what belongs where)
52
+ - **Tooling (MCP):** `.mcp.json`
53
+ - **Agent surfaces (compiled):** `agentrig compile` projects this file + `.agents/rules/` into every
54
+ agent's native format — `.github/copilot-instructions.md` & `.github/instructions/` (Copilot, web +
55
+ IDE), `CLAUDE.md` (Claude Code), `.cursor/rules/` (Cursor), `.vscode/mcp.json`, and
56
+ `.github/workflows/copilot-setup-steps.yml`. Edit the source here, not the generated files.
57
+ - **Surfaces:** `.claude` / `.copilot` / `.opencode` / `.codex` symlink to `.agents` so any vendor CLI
58
+ sees the same skills/rules/wiki.
59
+ - **Orchestration contract:** `.agentrig/harness/ORCHESTRATION.md`
60
+ - **Dashboard:** `agentrig dashboard` (or `node .agentrig/dashboard/dashboard.mjs`) — agent roster,
61
+ live GitHub tasks per harness label, harness score, and eval status. `--html` for a web view.
62
+ - **Evaluate the harness itself:** `agentrig eval --static` or `node .agentrig/eval/static-audit.mjs`;
63
+ see `.agentrig/eval/RUBRIC.md`.
64
+ - **Package-local instructions:** drop an `AGENTS.md` in a subpackage to add scope-specific rules;
65
+ it augments this root file. See `.agentrig/AGENTS.package.example.md`.
66
+ <!-- AGENTRIG:harness:end -->
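The template above relies on two mechanisms: `<!-- AGENTRIG:<name>:start -->` / `<!-- AGENTRIG:<name>:end -->` markers whose contents are refreshed, and `{{PLACEHOLDER}}` tokens that are filled when a value is known. The sketch below shows one way both could work. `refreshSection` and `fillPlaceholders` are hypothetical names, and this is not necessarily how `agentrig update` implements its `markers` merge.

```javascript
// Escapes a string so it can be used literally inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces the body between the start and end markers of one named
// section, leaving everything outside the markers untouched.
function refreshSection(doc, name, body) {
  const start = `<!-- AGENTRIG:${name}:start -->`;
  const end = `<!-- AGENTRIG:${name}:end -->`;
  const re = new RegExp(`${escapeRegExp(start)}[\\s\\S]*?${escapeRegExp(end)}`);
  if (!re.test(doc)) return doc; // section missing: leave the file alone
  // A function replacer avoids "$" in `body` being read as a pattern.
  return doc.replace(re, () => `${start}\n${body}\n${end}`);
}

// Fills {{PLACEHOLDER}} tokens with known values; unknown tokens are
// left in place for the agent to complete later.
function fillPlaceholders(text, values) {
  return text.replace(/\{\{([A-Z_]+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
}
```

Because only the text between a marker pair is rewritten, edits made outside the markers survive a refresh, which matches the note at the top of the template.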
@@ -0,0 +1,19 @@
1
+ # <package name> — Agent instructions (package-local)
2
+
3
+ > Drop a file like this at the root of a subpackage/subtree. It **augments** the repo-root
4
+ > `AGENTS.md` with scope-specific guidance; it does not replace the root Critical Rules.
5
+
6
+ ## Scope
7
+ Applies to everything under this directory.
8
+
9
+ ## What this package is
10
+ One or two sentences: purpose, public surface, who depends on it.
11
+
12
+ ## Local rules
13
+ - Build/test/lint commands specific to this package, if they differ from the root.
14
+ - Conventions that only apply here (naming, layering, allowed dependencies).
15
+ - Files/areas to treat as protected or generated.
16
+
17
+ ## Pointers
18
+ - Root policy: `/AGENTS.md`
19
+ - Path-scoped reflexes: add a glob-scoped rule under `.agents/rules/` instead of repeating it here.
@@ -0,0 +1,33 @@
1
+ # Agent roles (principle 2 — specialize roles, vary models)
2
+
3
+ The harness routes each state of the workflow to a **specialized agent type**, each with its own
4
+ short prompt and its own model. Running different roles on **different models** is deliberate:
5
+ single-model-bias mitigation surfaces problems no single model would catch alone.
6
+
7
+ ## Roster (installed by default)
8
+
9
+ | Role | File | Default model | Drives state |
10
+ |------|------|---------------|--------------|
11
+ | **triager** | `triager.{yml,md}` | `gpt-5-mini` (low) | `ingested → queued` |
12
+ | **developer** | `developer.{yml,md}` | `claude-sonnet-4.5` (high) | `queued → implementing → reviewing` |
13
+ | **reviewer** | `reviewer.{yml,md}` | `gpt-5` (high) | `reviewing` |
14
+ | **judge** | `judge.{yml,md}` | `claude-opus-4.5` (high) | `judging → ready_to_merge` |
15
+
16
+ > Keep the **reviewer on a different model family than the developer**. The audit
17
+ > (`agentrig eval --static`) checks for this.
18
+
19
+ ## Each role has two files
20
+ - `<role>.yml` — declarative config: `role`, `model`, `model_tier`, `allowed_tools`, and the
21
+ `prompt` path. Skills are auto-discovered from `.agents/skills/`, so no skill list is needed.
22
+ - `<role>.md` — the role's short prompt (keep it to a few imperative lines).
23
+
24
+ ## Adding a new agent type
25
+ 1. Create `agents/<role>.yml` and `agents/<role>.md`. Pick a model that differs from adjacent roles
26
+ in the pipeline.
27
+ 2. Wire the role into `.agentrig/harness/state-machine.yml` by giving a transition
28
+ `trigger: agent` and `role: <role>`.
29
+ 3. If the role needs a new procedure, add a skill under `.agents/skills/`.
30
+
31
+ Example roles you might add: `designer` (visual/UX work), `security-reviewer`, `release-manager`,
32
+ `docs-writer`. The pipeline is yours to extend — the state machine is the contract that keeps it
33
+ coherent.
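The README above says the static audit checks that the reviewer runs on a different model family than the developer. A minimal sketch of such a check follows. The `modelFamily` heuristic (text before the first hyphen) and the `checkReviewerDiffers` name are assumptions for illustration; the actual audit rule in `checks.json` is not shown here.

```javascript
// Hypothetical heuristic: the text before the first "-" is the family,
// so "claude-sonnet-4.5" -> "claude" and "gpt-5" -> "gpt".
function modelFamily(model) {
  return model.split("-")[0].toLowerCase();
}

// Compares the developer and reviewer families from parsed role configs.
function checkReviewerDiffers(roles) {
  const developer = modelFamily(roles.developer.model);
  const reviewer = modelFamily(roles.reviewer.model);
  return { pass: developer !== reviewer, developer, reviewer };
}

// Default roster values from the table above.
const defaultRoles = {
  developer: { model: "claude-sonnet-4.5" },
  reviewer: { model: "gpt-5" },
};
```

With the default roster the check passes, since `claude` and `gpt` differ; pointing both roles at models from the same family would fail it.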
@@ -0,0 +1,7 @@
1
+ You are the **developer**. Implement the smallest correct change that fully satisfies the task.
2
+
3
+ - Follow `AGENTS.md` Critical Rules and the glob-scoped rules in `.agents/rules/`.
4
+ - Run `self-verify` (build + test + lint) before requesting review. Iterate up to 3 times; if still
5
+ red, self-park with a clear note rather than handing a broken diff to the reviewer.
6
+ - Keep the diff under the `max_diff_chars` limit. Split work if it grows larger.
7
+ - Log any new gotcha to `.agents/wiki/`.
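The developer prompt above caps self-verification at three iterations and then self-parks. The sketch below shows one way to express that loop. `selfVerify`, `runChecks`, and `fix` are hypothetical names, and reading "iterate up to 3 times" as three verification attempts in total is an assumption.

```javascript
// `runChecks` returns true when build/test/lint are green.
// `fix` attempts a repair between failed attempts.
// Both are hypothetical callbacks supplied by the caller.
function selfVerify(runChecks, fix, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (runChecks()) {
      return { outcome: "continue", attempts: attempt };
    }
    // No repair after the final failed attempt; the work is parked instead.
    if (attempt < maxAttempts) fix(attempt);
  }
  return { outcome: "self-park", attempts: maxAttempts };
}
```

Returning an explicit `self-park` outcome, rather than handing off anyway, keeps a red build from ever reaching the reviewer.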
@@ -0,0 +1,7 @@
1
+ # Developer role (principle 2). Implements the change in the `implementing` state.
2
+ role: developer
3
+ model: claude-sonnet-4.5
4
+ model_tier: high
5
+ # Skills are auto-discovered from .agents/skills; no explicit list needed.
6
+ allowed_tools: [read, write, edit, bash, grep, glob]
7
+ prompt: agents/developer.md
@@ -0,0 +1,6 @@
1
+ You are the **judge**. Score the completed work against `.agentrig/eval/RUBRIC.md`.
2
+
3
+ - Use credit tiers 0 / 0.5 / 1.0. Any score < 1.0 REQUIRES an issue code and one line of evidence.
4
+ - Save results with `.agentrig/eval/score.mjs` (never hand-edit the JSON).
5
+ - Pass to `ready_to_merge` only if the aggregate clears the threshold; otherwise return to
6
+ `implementing` with the failing axes.
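The judge prompt above requires credit tiers of 0 / 0.5 / 1.0, an issue code plus evidence for any score below 1.0, and a threshold on the aggregate. A minimal sketch of a deterministic aggregator with those rules follows. The `aggregate` function, the field names, the unweighted mean, and the threshold value are all assumptions; the shipped `score.mjs` and `axes.json` may define weights and a different result shape.

```javascript
const TIERS = [0, 0.5, 1];

// Validates every axis score, then computes an unweighted mean.
// Assumption: each entry looks like { axis, score, issueCode?, evidence? }.
function aggregate(scores, threshold) {
  if (scores.length === 0) throw new Error("no scores to aggregate");
  for (const s of scores) {
    if (!TIERS.includes(s.score)) {
      throw new Error(`${s.axis}: score must be 0, 0.5, or 1.0`);
    }
    if (s.score < 1 && !(s.issueCode && s.evidence)) {
      throw new Error(`${s.axis}: a score below 1.0 needs an issue code and evidence`);
    }
  }
  const mean = scores.reduce((sum, s) => sum + s.score, 0) / scores.length;
  return {
    mean,
    pass: mean >= threshold,
    // Axes to hand back to `implementing` when the aggregate fails.
    failingAxes: scores.filter((s) => s.score < 1).map((s) => s.axis),
  };
}
```

Rejecting an incomplete entry outright, instead of scoring it anyway, is what keeps the result reproducible and removes any reason to hand-edit the JSON.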
@@ -0,0 +1,6 @@
1
+ # Judge role (principles 2 and 6). Independent, rubric-driven scoring before merge.
2
+ role: judge
3
+ model: claude-opus-4.5
4
+ model_tier: high
5
+ allowed_tools: [read, grep, glob, bash]
6
+ prompt: agents/judge.md
@@ -0,0 +1,6 @@
1
+ You are the **reviewer**, running a different model than the developer on purpose.
2
+
3
+ - Review only the diff. Surface bugs, security issues, and logic errors — never style nits.
4
+ - Score with confidence per category; refuse to surface low-signal comments.
5
+ - If you request changes, return to `implementing` with a concrete, testable reason.
6
+ - You may not apply human-only labels (see the state machine).
@@ -0,0 +1,7 @@
1
+ # Reviewer role (principle 2). Deliberately a DIFFERENT model family than the developer
2
+ # to mitigate single-model bias — divergent verdicts surface problems neither model alone catches.
3
+ role: reviewer
4
+ model: gpt-5
5
+ model_tier: high
6
+ allowed_tools: [read, grep, glob, bash]
7
+ prompt: agents/reviewer.md
@@ -0,0 +1,8 @@
1
+ You are the **triager**. Turn a freshly `ingested` task into a well-formed `queued` one.
2
+
3
+ - Read the issue/task and the repo context (`.agentrig/context.md`).
4
+ - Recommend labels, an assignee/role, and a size estimate. Surface them as a proposal table and
5
+ **wait for explicit apply/approve** — never apply human-only labels (see the state machine).
6
+ - Confirm the task is actionable (clear acceptance criteria). If not, ask for clarification rather
7
+ than queueing ambiguous work.
8
+ - When approved, move the task to `queued`.
@@ -0,0 +1,8 @@
1
+ # Triager role (principles 2 and 9). Moves `ingested` tasks to `queued`: recommend labels/assignees,
2
+ # size the work, and gate on human approval for low-reversibility calls. Uses a fast, cheap model
3
+ # on purpose — triage is high-volume and should not burn a premium tier.
4
+ role: triager
5
+ model: gpt-5-mini
6
+ model_tier: low
7
+ allowed_tools: [read, grep, glob, bash]
8
+ prompt: agents/triager.md