@seanyao/roll 0.5.0 → 2.602.2

Files changed (181)
  1. package/CHANGELOG.md +736 -0
  2. package/LICENSE +21 -0
  3. package/README.md +65 -165
  4. package/bin/dream-test-quality-scan +110 -0
  5. package/bin/roll +15030 -814
  6. package/conventions/config.yaml +17 -1
  7. package/conventions/global/AGENTS.md +146 -100
  8. package/conventions/global/CLAUDE.md +1 -21
  9. package/conventions/global/GEMINI.md +8 -22
  10. package/conventions/global/project_rules.md +9 -0
  11. package/conventions/templates/backend-service/AGENTS.md +30 -81
  12. package/conventions/templates/backend-service/GEMINI.md +3 -3
  13. package/conventions/templates/backend-service/project_rules.md +16 -0
  14. package/conventions/templates/cli/AGENTS.md +31 -58
  15. package/conventions/templates/cli/CLAUDE.md +3 -5
  16. package/conventions/templates/cli/GEMINI.md +3 -3
  17. package/conventions/templates/cli/project_rules.md +16 -0
  18. package/conventions/templates/frontend-only/AGENTS.md +29 -64
  19. package/conventions/templates/frontend-only/GEMINI.md +3 -3
  20. package/conventions/templates/frontend-only/project_rules.md +14 -0
  21. package/conventions/templates/fullstack/AGENTS.md +31 -79
  22. package/conventions/templates/fullstack/CLAUDE.md +1 -1
  23. package/conventions/templates/fullstack/GEMINI.md +3 -3
  24. package/conventions/templates/fullstack/project_rules.md +15 -0
  25. package/lib/README.md +42 -0
  26. package/lib/__pycache__/github_sync.cpython-314.pyc +0 -0
  27. package/lib/__pycache__/loop-fmt.cpython-314.pyc +0 -0
  28. package/lib/__pycache__/loop_result_eval.cpython-314.pyc +0 -0
  29. package/lib/__pycache__/loop_unstick.cpython-314.pyc +0 -0
  30. package/lib/__pycache__/model_prices.cpython-314.pyc +0 -0
  31. package/lib/__pycache__/prices_fetcher.cpython-314.pyc +0 -0
  32. package/lib/__pycache__/roll-home.cpython-314.pyc +0 -0
  33. package/lib/__pycache__/roll-loop-status.cpython-314.pyc +0 -0
  34. package/lib/__pycache__/roll_git.cpython-314.pyc +0 -0
  35. package/lib/__pycache__/roll_render.cpython-314.pyc +0 -0
  36. package/lib/__pycache__/slides-render.cpython-314.pyc +0 -0
  37. package/lib/agent_usage/README.md +49 -0
  38. package/lib/agent_usage/__init__.py +108 -0
  39. package/lib/agent_usage/__pycache__/__init__.cpython-314.pyc +0 -0
  40. package/lib/agent_usage/__pycache__/gemini.cpython-314.pyc +0 -0
  41. package/lib/agent_usage/__pycache__/kimi.cpython-314.pyc +0 -0
  42. package/lib/agent_usage/__pycache__/openai.cpython-314.pyc +0 -0
  43. package/lib/agent_usage/__pycache__/pi.cpython-314.pyc +0 -0
  44. package/lib/agent_usage/__pycache__/pi_emit.cpython-314.pyc +0 -0
  45. package/lib/agent_usage/__pycache__/qwen.cpython-314.pyc +0 -0
  46. package/lib/agent_usage/gemini.py +127 -0
  47. package/lib/agent_usage/kimi.py +278 -0
  48. package/lib/agent_usage/kimi_emit.py +123 -0
  49. package/lib/agent_usage/openai.py +126 -0
  50. package/lib/agent_usage/pi.py +200 -0
  51. package/lib/agent_usage/pi_emit.py +135 -0
  52. package/lib/agent_usage/qwen.py +128 -0
  53. package/lib/backfill-pi-usage.py +243 -0
  54. package/lib/changelog_audit.py +155 -0
  55. package/lib/changelog_generate.py +263 -0
  56. package/lib/context_feed_budget.sh +194 -0
  57. package/lib/github_sync.py +876 -0
  58. package/lib/i18n/README.md +54 -0
  59. package/lib/i18n/agent.sh +75 -0
  60. package/lib/i18n/alert.sh +20 -0
  61. package/lib/i18n/backlog.sh +96 -0
  62. package/lib/i18n/brief.sh +5 -0
  63. package/lib/i18n/changelog.sh +5 -0
  64. package/lib/i18n/ci.sh +15 -0
  65. package/lib/i18n/debug.sh +0 -0
  66. package/lib/i18n/doctor.sh +44 -0
  67. package/lib/i18n/dream.sh +0 -0
  68. package/lib/i18n/init.sh +91 -0
  69. package/lib/i18n/lang.sh +10 -0
  70. package/lib/i18n/loop.sh +140 -0
  71. package/lib/i18n/migrate.sh +74 -0
  72. package/lib/i18n/offboard.sh +31 -0
  73. package/lib/i18n/onboard.sh +0 -0
  74. package/lib/i18n/peer.sh +41 -0
  75. package/lib/i18n/peer_help.sh +25 -0
  76. package/lib/i18n/peer_reset.sh +7 -0
  77. package/lib/i18n/peer_status.sh +5 -0
  78. package/lib/i18n/prices.sh +3 -0
  79. package/lib/i18n/prices_refresh.sh +17 -0
  80. package/lib/i18n/prices_show.sh +7 -0
  81. package/lib/i18n/propose.sh +0 -0
  82. package/lib/i18n/release.sh +0 -0
  83. package/lib/i18n/research.sh +0 -0
  84. package/lib/i18n/review_pr.sh +0 -0
  85. package/lib/i18n/sentinel.sh +0 -0
  86. package/lib/i18n/setup.sh +3 -0
  87. package/lib/i18n/shared.sh +157 -0
  88. package/lib/i18n/skills/roll-brief.sh +47 -0
  89. package/lib/i18n/skills/roll-build.sh +97 -0
  90. package/lib/i18n/skills/roll-design.sh +18 -0
  91. package/lib/i18n/skills/roll-fix.sh +53 -0
  92. package/lib/i18n/skills/roll-loop.sh +28 -0
  93. package/lib/i18n/skills/roll-onboard.sh +33 -0
  94. package/lib/i18n/skills_catalog.sh +30 -0
  95. package/lib/i18n/slides.sh +3 -0
  96. package/lib/i18n/slides_build.sh +38 -0
  97. package/lib/i18n/slides_delete.sh +19 -0
  98. package/lib/i18n/slides_list.sh +14 -0
  99. package/lib/i18n/slides_logs.sh +12 -0
  100. package/lib/i18n/slides_new.sh +15 -0
  101. package/lib/i18n/slides_preview.sh +14 -0
  102. package/lib/i18n/slides_templates.sh +7 -0
  103. package/lib/i18n/status.sh +21 -0
  104. package/lib/i18n/update.sh +24 -0
  105. package/lib/i18n.sh +211 -0
  106. package/lib/loop-exit-summary.py +393 -0
  107. package/lib/loop-fmt.py +589 -0
  108. package/lib/loop_pick_agent.py +316 -0
  109. package/lib/loop_result_eval.py +469 -0
  110. package/lib/loop_unstick.py +180 -0
  111. package/lib/model_prices.py +194 -0
  112. package/lib/prices/README.md +35 -0
  113. package/lib/prices/snapshot-2026-05-22.json +22 -0
  114. package/lib/prices/snapshot-2026-05-23-deepseek.json +15 -0
  115. package/lib/prices/snapshot-2026-05-23-kimi.json +15 -0
  116. package/lib/prices_fetcher.py +285 -0
  117. package/lib/roll-backlog.py +225 -0
  118. package/lib/roll-brief.py +286 -0
  119. package/lib/roll-help.py +158 -0
  120. package/lib/roll-home.py +556 -0
  121. package/lib/roll-init.py +156 -0
  122. package/lib/roll-loop-status.py +1683 -0
  123. package/lib/roll-loop-story.py +191 -0
  124. package/lib/roll-onboard-render.py +378 -0
  125. package/lib/roll-peer.py +252 -0
  126. package/lib/roll-plan-validate.py +386 -0
  127. package/lib/roll-setup.py +102 -0
  128. package/lib/roll-status.py +367 -0
  129. package/lib/roll_git.py +41 -0
  130. package/lib/roll_render.py +414 -0
  131. package/lib/slides/components/README.md +123 -0
  132. package/lib/slides/components/cards-2.html +9 -0
  133. package/lib/slides/components/cards-3.html +9 -0
  134. package/lib/slides/components/cards-4.html +9 -0
  135. package/lib/slides/components/compare.html +22 -0
  136. package/lib/slides/components/highlight.html +9 -0
  137. package/lib/slides/components/pipeline.html +12 -0
  138. package/lib/slides/components/plain.html +7 -0
  139. package/lib/slides/components/quote.html +4 -0
  140. package/lib/slides/components/timeline.html +9 -0
  141. package/lib/slides/templates/introduction-v3.html +571 -0
  142. package/lib/slides/templates/pitch.html +0 -0
  143. package/lib/slides-render.py +778 -0
  144. package/lib/slides-validate.py +357 -0
  145. package/lib/test_quality_gate.py +143 -0
  146. package/package.json +8 -7
  147. package/skills/roll-.changelog/SKILL.md +406 -33
  148. package/skills/roll-.clarify/SKILL.md +5 -2
  149. package/skills/roll-.dream/SKILL.md +374 -0
  150. package/skills/roll-.echo/SKILL.md +5 -2
  151. package/skills/roll-.qa/SKILL.md +57 -3
  152. package/skills/roll-.review/SKILL.md +42 -3
  153. package/skills/roll-brief/SKILL.md +209 -0
  154. package/skills/roll-build/SKILL.md +308 -63
  155. package/skills/roll-debug/SKILL.md +341 -162
  156. package/skills/roll-debug/injectable-bb.js +263 -0
  157. package/skills/roll-deck/SKILL.md +296 -0
  158. package/skills/roll-design/ENGINEERING_CHECKLIST.md +1 -1
  159. package/skills/roll-design/SKILL.md +733 -94
  160. package/skills/roll-doc/SKILL.md +595 -0
  161. package/skills/roll-doctor/SKILL.md +192 -0
  162. package/skills/roll-fix/SKILL.md +149 -32
  163. package/skills/{roll-jot → roll-idea}/SKILL.md +18 -10
  164. package/skills/roll-loop/SKILL.md +579 -0
  165. package/skills/roll-notes/SKILL.md +103 -0
  166. package/skills/roll-onboard/SKILL.md +234 -0
  167. package/skills/roll-peer/SKILL.md +336 -0
  168. package/skills/roll-propose/SKILL.md +157 -0
  169. package/skills/roll-review-pr/SKILL.md +58 -0
  170. package/skills/roll-sentinel/SKILL.md +11 -2
  171. package/skills/roll-spar/SKILL.md +8 -6
  172. package/template/.github/workflows/ci.yml +5 -2
  173. package/template/AGENTS.md +20 -74
  174. package/skills/roll-research/SKILL.md +0 -307
  175. package/skills/roll-research/references/schema.json +0 -162
  176. package/skills/roll-research/scripts/md_to_pdf.py +0 -289
  177. package/tools/roll-fetch/SKILL.md +0 -182
  178. package/tools/roll-fetch/package.json +0 -15
  179. package/tools/roll-fetch/smart-web-fetch.js +0 -558
  180. package/tools/roll-probe/SKILL.md +0 -84
  181. package/template/{BACKLOG.md → .roll/backlog.md} +0 -0
@@ -0,0 +1,234 @@
1
+ ---
2
+ name: roll-onboard
3
+ license: MIT
4
+ description: Interactive onboarding for legacy projects. Reads existing code, understands the project, asks 9 questions in 3 groups (cognition / scope / privacy), and writes .roll/onboard-plan.yaml as the contract for `roll init --apply` to execute.
5
+ ---
6
+
7
+ # Roll Onboard
8
+
9
+ > Follows the Architecture Constraints, Development Discipline, and Engineering Common Sense defined in the project AGENTS.md.
10
+
11
+ Interactive onboarding flow for **legacy projects**: existing code that needs to adopt the Roll convention without disrupting how the team already works.
12
+
13
+ ## Trigger
14
+
15
+ This skill runs when:
16
+
17
+ - `roll init` detected a legacy project (≥10 source files, no `AGENTS.md`)
18
+ - The CLI told the user to open an AI agent and run `$roll-onboard`
19
+ - The user has now invoked you here
20
+
21
+ ## Hard responsibility boundary
22
+
23
+ You are the **cognition layer**. Your job ends with writing a plan file.
24
+
25
+ | You do | You do NOT |
26
+ |--------|------------|
27
+ | Read project code, infer type/domains/modules | Modify any source file |
28
+ | Call `roll-doc --dry-run` to get a gap report | Call `roll-doc` (write mode) |
29
+ | Ask the user 9 questions across 3 groups | Decide for the user |
30
+ | Produce `.roll/onboard-plan.yaml` | Write `.gitignore` |
31
+ | | Run `roll init --apply` |
32
+
33
+ Hard constraint: **AI cannot create files in the user's project other than `.roll/onboard-plan.yaml`.** Anything else is `bash`'s job (`roll init --apply`).
34
+
35
+ ## Inputs you must read
36
+
37
+ 1. The repository tree (use the project's own structure to infer technologies)
38
+ 2. Any existing `README.md` / `package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod` etc. as evidence
39
+ 3. `roll-doc --dry-run` output → identifies what documentation gaps exist
40
+ 4. The path-audit pattern: scan for legacy structure markers (`BACKLOG.md`, `docs/features/`, etc.) — if any are present, REFUSE and tell the user to run `roll migrate` first
41
+
42
+ ## Workflow
43
+
44
+ ### Step 0 — Pre-flight
45
+
46
+ 1. Check that you're in a legacy project root (no `AGENTS.md`, has source code)
47
+ 2. If `BACKLOG.md` or `docs/features/` already present → STOP, tell user to run `roll migrate` first (this is a partial-migration project, not legacy)
48
+ 3. Check `.roll/onboard-plan.yaml` doesn't already exist; if it does, ask user whether to overwrite
49
+
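The pre-flight checks above can be sketched as a pure function over a list of repository-relative paths. This is an illustrative sketch, not code from the package: the function name, the return strings, and the set of "source" suffixes are assumptions, and the threshold of 10 source files is borrowed from the Trigger section.

```python
from pathlib import PurePosixPath

PLAN_PATH = ".roll/onboard-plan.yaml"
# Assumption: which suffixes count as "source code" is not specified in the skill.
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".rb", ".sh"}
MIN_SOURCE_FILES = 10  # from the Trigger section (>= 10 source files)


def preflight(paths):
    """Return 'ok', an 'ask: ...' prompt, or a 'stop: ...' reason."""
    paths = list(paths)
    # 1. Legacy project root: no AGENTS.md, has source code.
    n_source = sum(1 for p in paths if PurePosixPath(p).suffix in SOURCE_SUFFIXES)
    if "AGENTS.md" in paths or n_source < MIN_SOURCE_FILES:
        return "stop: not a legacy project"
    # 2. Legacy Roll structure markers mean a partial migration.
    if any(p == "BACKLOG.md" or p.startswith("docs/features/") for p in paths):
        return "stop: run `roll migrate` first"
    # 3. An existing plan needs the user's consent before it is overwritten.
    if PLAN_PATH in paths:
        return "ask: overwrite existing plan?"
    return "ok"
```

Taking a path list rather than walking the disk keeps the checks easy to exercise; the list could come from `git ls-files` or any directory walk.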
50
+ ### Step 1 — Read code, build understanding
51
+
52
+ Walk the repo. Identify:
53
+ - **type**: one of `backend-service` / `frontend-only` / `fullstack` / `cli`
54
+ - **description**: 1-2 sentence summary of what this project does
55
+ - **domains**: top business/technical domains (e.g., "auth", "billing", "search")
56
+ - **key_modules**: top 3-5 modules that hold most of the logic
57
+
58
+ ### Step 1b — Phase 2 analysis: business model, tech, tests (US-ONBOARD-016)
59
+
60
+ A single onboard now produces three structured analysis sections in the plan
61
+ (`domain_model`, `tech_analysis`, `test_assessment`). Build them here so Step 4
62
+ can serialise them.
63
+
64
+ **`domain_model`** — from the code you read, identify the bounded contexts. For
65
+ each: a `name`, its `aggregates` (the entities that own consistency), and its
66
+ `ubiquitous_language` (the domain terms the code/docs actually use). If you
67
+ genuinely cannot infer contexts from the code, emit an empty
68
+ `bounded_contexts: []` — do NOT invent contexts that aren't in the code.
69
+
70
+ **`tech_analysis`** — `stack` (languages/frameworks evidenced by manifests),
71
+ `dependencies` (from `package.json` / `pyproject.toml` / `go.mod` / `Cargo.toml`
72
+ etc.), `architecture_notes` (observed structure, not aspirational), and `risks`
73
+ (each a mapping with a `description`; optionally `severity: LOW|MEDIUM|HIGH` and
74
+ an `evidence: detected|inferred` tag).
75
+
76
+ **`test_assessment`** — this section is under a **hard anti-hallucination
77
+ constraint** (next sub-step). Do NOT write it from intuition.
78
+
79
+ #### The verifiable test scan (ANTI-HALLUCINATION HARD CONSTRAINT)
80
+
81
+ Every `test_assessment` claim must be backed by a real filesystem scan you run
82
+ here — never by "what a project like this usually needs". Run these probes and
83
+ record the raw counts/paths:
84
+
85
+ 1. **Count test files** by the conventional patterns:
86
+ - `*.test.*` / `*.spec.*` (JS/TS), `*_test.go` (Go), `test_*.py` / `*_test.py` (Python), `*_spec.rb` (Ruby), `*Test.java` (Java)
87
+ - e.g. `git ls-files | grep -cE '\.(test|spec)\.[jt]sx?$'` and similar per pattern
88
+ 2. **Probe for runner configs**: `jest.config.*`, `pytest.ini` (or `[tool.pytest]` in `pyproject.toml`), `.mocharc.*`, `vitest.config.*`, `karma.conf.*`, `phpunit.xml`, `go test` (implied by `*_test.go`)
89
+ 3. **Probe for a `coverage/` directory** (and `.coverage` / `coverage.xml` artifacts)
90
+
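A minimal sketch of the three probes, assuming the input is a list of repository-relative paths (for example the output of `git ls-files`, plus untracked coverage artifacts if those matter). The function name, the result keys, and the exact regular expressions are assumptions made for illustration; only the filename patterns themselves come from the list above.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Filename patterns taken from probe 1; the JS/TS regex mirrors the grep example.
TEST_PATTERNS = {
    "js_ts": re.compile(r"\.(test|spec)\.[jt]sx?$"),
    "go": re.compile(r"_test\.go$"),
    "python": re.compile(r"(^|/)(test_[^/]*|[^/]*_test)\.py$"),
    "ruby": re.compile(r"_spec\.rb$"),
    "java": re.compile(r"Test\.java$"),
}
# Probe 2, by file name only. `[tool.pytest]` inside pyproject.toml is NOT
# covered here because that needs the file contents, not just the path.
RUNNER_CONFIG = re.compile(
    r"^(jest\.config\..+|pytest\.ini|\.mocharc\..+|vitest\.config\..+"
    r"|karma\.conf\..+|phpunit\.xml)$"
)


def scan(paths):
    """Return raw counts/paths for test files, runner configs, coverage artifacts."""
    counts = Counter()
    runners, coverage = [], []
    for p in paths:
        pure = PurePosixPath(p)
        for label, pattern in TEST_PATTERNS.items():
            if pattern.search(p):
                counts[label] += 1
        if RUNNER_CONFIG.match(pure.name):
            runners.append(p)
        # Probe 3: a coverage/ directory, or .coverage / coverage.xml files.
        if pure.name in (".coverage", "coverage.xml") or "coverage" in pure.parts[:-1]:
            coverage.append(p)
    return {"test_files": dict(counts), "runner_configs": runners, "coverage": coverage}
```

An empty result for any key is itself a finding to record, not a reason to skip the dimension.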
91
+ Then turn the raw findings into claims, each a **mapping** carrying a `claim`
92
+ string plus an `evidence` tag:
93
+
94
+ - `evidence: detected` — the scan directly found it (e.g. "42 `*.test.ts` files detected", "vitest.config.ts present", "coverage/ directory present").
95
+ - `evidence: inferred` — a judgement you derived FROM the detected facts (e.g. "unit layer present but no E2E config — integration coverage likely thin"). The inference must trace back to something the scan detected.
96
+
97
+ **"none detected" rule**: when a probe finds nothing, you MUST say so explicitly
98
+ with a tagged claim — `{claim: "none detected", evidence: detected}` (a scan that
99
+ ran and returned zero is a genuine `detected` finding). You must NOT silently
100
+ omit the dimension, and you must NOT invent generic filler like "needs more E2E
101
+ tests" / "consider adding integration tests" with no detected basis. The plan
102
+ validator (`lib/roll-plan-validate.py`) rejects any untagged free-text claim, so
103
+ filler will fail `roll init --apply`.
104
+
105
+ Map the findings into the three buckets:
106
+ - `current_layers`: what test layers actually exist (each tagged `detected`)
107
+ - `gaps`: dimensions where the scan found nothing (`none detected`, tagged `detected`) or thin coverage you can justify (`inferred`)
108
+ - `recommended_actions`: actions that trace to a detected gap (tag `inferred`); if nothing is missing, this bucket may be `[]`
109
+
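The tagging rules and the three buckets can be sketched as follows. This assumes the raw scan findings arrive as a dict with hypothetical keys `test_files` (label to count), `runner_configs` (list of paths), and `coverage` (list of paths). The claim wording beyond the literal `none detected` is illustrative, and whether the real validator accepts a suffix after that phrase is not confirmed by the skill text.

```python
def claim(text, evidence):
    """Build one tagged claim mapping; untagged free text is never emitted."""
    if evidence not in ("detected", "inferred"):
        raise ValueError(f"unknown evidence tag: {evidence!r}")
    return {"claim": text, "evidence": evidence}


def build_test_assessment(findings):
    """Map raw scan findings into current_layers / gaps / recommended_actions."""
    layers, gaps, actions = [], [], []
    dimensions = [
        ("test files", sum(findings["test_files"].values())),
        ("runner configs", len(findings["runner_configs"])),
        ("coverage artifacts", len(findings["coverage"])),
    ]
    for name, count in dimensions:
        if count:
            layers.append(claim(f"{count} {name} detected", "detected"))
        else:
            # A scan that ran and returned zero is a genuine `detected` finding.
            gaps.append(claim(f"none detected: {name}", "detected"))
            # Actions must trace back to a detected gap, hence `inferred`.
            actions.append(claim(f"add {name} (inferred from zero-result scan)", "inferred"))
    return {"current_layers": layers, "gaps": gaps, "recommended_actions": actions}
```

When nothing is missing, `gaps` and `recommended_actions` both come out as `[]`, which matches the rule that the last bucket may be empty.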
110
+ ### Step 2 — Get gap report
111
+
112
+ Run `roll-doc --dry-run` (READ-ONLY mode). This reports:
113
+ - Which standard Roll artifacts (BACKLOG, features, domain models) are missing
114
+ - Which existing docs Roll could `include` rather than regenerate
115
+
116
+ ### Step 3 — Nine questions in three groups
117
+
118
+ Present these in chat. **Aim for total time ≤ 3 minutes.** Group 1 confirms your understanding; group 2 scopes the work; group 3 handles privacy and next steps.
119
+
120
+ **$(msg onboard.questions_group1)**
121
+
122
+ 1. $(msg onboard.q1 "[type]" "[description]")
123
+ 2. $(msg onboard.q2 "[domain A, domain B, …]")
124
+ 3. $(msg onboard.q3 "[X, Y, Z]")
125
+
126
+ **$(msg onboard.questions_group2)**
127
+
128
+ 4. $(msg onboard.q4) Multi-select:
129
+ - `backlog` — initial BACKLOG with seeded stories
130
+ - `features` — features index + per-feature spec stubs
131
+ - `domain` — DDD context map
132
+ - `briefs` — directory for `$roll-brief` outputs
133
+ 5. Of these existing docs, which should I `include` rather than regenerate?
134
+ - (list candidates: README.md, docs/architecture.md, etc.)
135
+ 6. Put drafts inside `.roll/`? (default: yes; "no" means use the legacy `docs/` layout — not recommended for new adoption)
136
+
137
+ **Group 3 — Privacy & next steps**
138
+
139
+ 7. Add `.roll/` to `.gitignore`? (yes = keep project management private; no = commit it like Roll itself does)
140
+ 8. Sync Roll conventions to which AI tools? Multi-select from detected agents (claude / cursor / codex / kimi / deepseek / pi / opencode / agy / trae)
141
+ 9. Enable `roll loop` autonomous execution after init?
142
+
143
+ ### Step 4 — Write plan file
144
+
145
+ Write `.roll/onboard-plan.yaml` with this exact schema (validated by `lib/roll-plan-validate.py`):
146
+
147
+ ```yaml
148
+ version: 1
149
+ generated_at: "2026-05-19T14:30:00+08:00" # current ISO 8601, your timezone OK
150
+
151
+ project_understanding:
152
+ type: cli # one of: backend-service / frontend-only / fullstack / cli
153
+ description: "..."
154
+ domains: [...]
155
+ key_modules: [...]
156
+
157
+ scope:
158
+ approved: [backlog, features, domain] # user's Q4 multi-select
159
+ declined: [briefs] # what they said no to
160
+
161
+ include_existing:
162
+ - README.md # user's Q5 selections
163
+ - docs/architecture.md
164
+
165
+ privacy:
166
+ gitignore_dot_roll: true # user's Q7
167
+
168
+ sync_targets: [claude, cursor] # user's Q8
169
+ enable_loop: false # user's Q9
170
+ agent_routes_template: default # user's Q10 — agent routing preset
171
+ # one of: default / minimal / heavy / skip
172
+ # default = pi/deepseek/claude + history (US-AGENT-002)
173
+ # minimal = single agent (pi), no history
174
+ # heavy = pi/deepseek/claude/kimi + larger window
175
+ # skip = don't seed .roll/agent-routes.yaml
176
+
177
+ # ── US-ONBOARD-016: Phase 2 analysis sections (optional, but emit all three) ──
178
+ # All three are validated only when present, so they are backward-compatible,
179
+ # but a normal onboard should produce them from Step 1b.
180
+
181
+ domain_model:
182
+ bounded_contexts: # [] if none can be inferred — never invent
183
+ - name: auth
184
+ aggregates: [User, Session]
185
+ ubiquitous_language: [login, token, refresh]
186
+
187
+ tech_analysis:
188
+ stack: [bash, python3] # evidenced by manifests
189
+ dependencies: [pyyaml] # from package.json / pyproject / go.mod / ...
190
+ architecture_notes: ["single-binary CLI + python helpers in lib/"]
191
+ risks:
192
+ - description: "no automated test run on macOS bash 3.2"
193
+ severity: HIGH # optional: LOW | MEDIUM | HIGH
194
+ evidence: detected # optional: detected | inferred
195
+
196
+ # test_assessment — ANTI-HALLUCINATION: every claim is a mapping with an
197
+ # `evidence` tag (detected | inferred). A zero-result scan is `none detected`
198
+ # tagged `detected`. Untagged free-text is REJECTED by the validator.
199
+ test_assessment:
200
+ current_layers:
201
+ - claim: "112 bats files detected under tests/" # evidence: a real scan count
202
+ evidence: detected
203
+ gaps:
204
+ - claim: "none detected" # e.g. no coverage/ dir found
205
+ evidence: detected
206
+ recommended_actions: # [] if nothing is missing
207
+ - claim: "add a macOS CI runner (inferred from launchd-only test skips)"
208
+ evidence: inferred # an inference traceable to a detected fact
209
+ ```
210
+
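To make the "untagged free-text is rejected" rule concrete, here is a sketch of a shape check for the `test_assessment` section. It is not the code in `lib/roll-plan-validate.py`, whose implementation is not shown in this diff. The function name and error messages are assumptions, and the section is taken as an already-parsed dict so the example needs no YAML library.

```python
ALLOWED_EVIDENCE = {"detected", "inferred"}
BUCKETS = ("current_layers", "gaps", "recommended_actions")


def validate_test_assessment(section):
    """Return a list of error strings; an empty list means the shape is acceptable."""
    errors = []
    for bucket in BUCKETS:
        # A missing or empty bucket is treated as [] in this sketch.
        for i, item in enumerate(section.get(bucket) or []):
            where = f"test_assessment.{bucket}[{i}]"
            if not isinstance(item, dict):
                errors.append(f"{where}: untagged free-text claim")
                continue
            text = item.get("claim")
            if not isinstance(text, str) or not text.strip():
                errors.append(f"{where}: missing claim string")
            if item.get("evidence") not in ALLOWED_EVIDENCE:
                errors.append(f"{where}: evidence must be detected or inferred")
    return errors
```

A bare string such as `"needs more E2E tests"` fails the first check, which is the filler case the skill warns about.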
211
+ Then tell the user:
212
+
213
+ > Onboard conversation done. Plan saved to `.roll/onboard-plan.yaml`.
214
+ > Return to your terminal and run:
215
+ >
216
+ > roll init --apply
217
+ >
218
+ > The plan expires in 24 hours.
219
+
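The 24-hour expiry can be checked with a few lines of standard-library Python. This sketch assumes the window is measured from the plan's `generated_at` field; the skill text states the 24 hours but not where the clock starts, so treat that as an assumption, along with the function name.

```python
from datetime import datetime, timedelta, timezone

PLAN_TTL = timedelta(hours=24)


def plan_expired(generated_at, now=None):
    """True when more than 24 hours have passed since `generated_at` (ISO 8601)."""
    created = datetime.fromisoformat(generated_at)
    if created.tzinfo is None:
        # Without an offset the comparison would be ambiguous across timezones.
        raise ValueError("generated_at must carry a UTC offset")
    now = now or datetime.now(timezone.utc)
    return now - created > PLAN_TTL
```

Comparing timezone-aware datetimes lets a plan written with a `+08:00` offset be checked correctly on a machine in any other zone.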
220
+ ### Step 5 — Stop
221
+
222
+ Do NOT run `roll init --apply` yourself. Do NOT modify other project files. Your job is done.
223
+
224
+ ## When NOT to use
225
+
226
+ - **Not a legacy project**: empty dir or fresh project → use plain `roll init` instead
227
+ - **Has BACKLOG.md or docs/features/**: this is a pre-2.0 Roll project → run `roll migrate` first
228
+ - **Has .roll/ already**: already onboarded → don't re-run
229
+
230
+ ## Failure modes
231
+
232
+ - User aborts mid-conversation → don't write partial plan; tell user to re-run from scratch
233
+ - User answers contradict the gap report (e.g., declines `features` but has lots of code) → repeat the conflicting question once before accepting; if they confirm, respect the choice
234
+ - You can't read enough code to fill `project_understanding` (e.g., binary repo) → write a placeholder plan but ask user to fill in `type` and `description` manually before applying
@@ -0,0 +1,336 @@
1
+ ---
2
+ name: roll-peer
3
+ license: MIT
4
+ allowed-tools: "Read, Bash, Write, Edit"
5
+ description: |
6
+ Cross-agent peer review skill. When a task enters a decision phase (planning,
7
+ high-risk, ambiguous, destructive), triggers a bidirectional negotiation with
8
+ another AI agent via a unified protocol. Up to 3 rounds. If consensus is not
9
+ reached, escalates to the human user. Includes adaptive peer routing based on
10
+ task type and historical success rate.
11
+ Trigger: /peer, "叫上 peer", "peer review", or auto-triggered at workflow gates.
12
+ ---
13
+
14
+ # Roll Peer (Cross-Agent Peer Review)
15
+
16
+ > Follows the Architecture Constraints, Development Discipline, and Engineering
17
+ > Common Sense defined in the project AGENTS.md.
18
+
19
+ ## Credits
20
+
21
+ Cross-agent consultation protocol inspired by
22
+ [friend-skill](https://github.com/fpyluck/friend-skill) (MIT) by fpyluck.
23
+ Independent implementation for the Roll toolchain.
24
+
25
+ ## Trigger
26
+
27
+ **Manual:**
28
+ - `/peer`
29
+ - "叫上 peer"
30
+ - "peer review 一下"
31
+ - "和 peer 商量"
32
+
33
+ **Auto-triggered (with 10s opt-out):**
34
+ - `roll-build` enters Plan Mode (executable plans / architecture decisions)
35
+ - `roll-spar` Attacker and Defender disagree
36
+ - High context pressure (large number of files read / tools executed)
37
+ - Destructive / irreversible operations (`rm -rf`, production deploy, global config changes)
38
+ - High-risk signal words ("重要 / 关键 / 别搞砸 / important / critical")
39
+ - Cross-repository / cross-toolchain / ambiguous permission boundaries
40
+
41
+ **Never trigger:**
42
+ - Single-file changes
43
+ - Clear, well-defined fixes
44
+ - Simple refactoring
45
+
46
+ ## Protocol: `[PEER_REVIEW]`
47
+
48
+ ### Marker Format
49
+
50
+ The marker **must** appear on the first non-empty line of the message:
51
+
52
+ ```markdown
53
+ [PEER_REVIEW round=N tool=<from>→<to>]
54
+ ```
55
+
56
+ - `round=N`: Current round number (1–3)
57
+ - `tool=<from>→<to>`: Direction of this message (e.g., `kimi→claude`)
58
+
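A sketch of how a receiver might parse the marker. It assumes the marker occupies the whole first non-empty line and that tool names consist of word characters, dots, or hyphens; neither detail is stated in the protocol, and `parse_marker` is a hypothetical name.

```python
import re

MARKER = re.compile(r"^\[PEER_REVIEW round=([1-3]) tool=([\w.-]+)→([\w.-]+)\]$")


def parse_marker(message):
    """Return {'round', 'from', 'to'} from the first non-empty line, else None."""
    for line in message.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        match = MARKER.match(stripped)
        if match is None:
            return None  # first non-empty line is not a valid marker
        return {"round": int(match.group(1)), "from": match.group(2), "to": match.group(3)}
    return None
```

Only the first non-empty line is inspected, so a marker that appears later in the message is deliberately ignored.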
59
+ ### Three-State Resolution + Escape
60
+
61
+ Allowed states only. No invented words.
62
+
63
+ - **AGREE**: Accept the current proposal. Proceed to execution.
64
+ - **REFINE**: Direction is correct, but specific changes are needed. Proceed to next round.
65
+ - **OBJECT**: The proposal is wrong. Provide an alternative. Proceed to next round.
66
+ - **ESCALATE**: Round 3 reached without AGREE, or a round fails due to API/token error. Hand off to the human user.
67
+
68
+ After each round decision, emit a `peer` event to the cycle event stream:
69
+
70
+ ```bash
71
+ # $round = current round number, $total = max rounds, $verdict = AGREE/REFINE/OBJECT/ESCALATE
72
+ # $agents = e.g. "claude→deepseek"
73
+ _loop_event peer "${round}/${total}" "$verdict" "$agents" 2>/dev/null || true
74
+ ```
75
+
76
+ If information is insufficient:
77
+ ```
78
+ REFINE: Need to confirm X/Y/Z with the user first.
79
+ ```
80
+
81
+ ### Context Handoff Card (required for round=1)
82
+
83
+ When the task involves a local project, the first message must include:
84
+
85
+ ```markdown
86
+ ## Project Handoff (round=1 required)
87
+ - Project root: <absolute path>
88
+ - Execution environment: <shell / container / devcontainer / remote / N/A>
89
+ - Project type: <language + framework>
90
+ - Virtual environment: <absolute path / conda env / container name / N/A>
91
+ - Activation command: <one-line executable string, or N/A>
92
+ - Key tool calls:
93
+ - test: <one-line command or N/A>
94
+ - build: <one-line command or N/A>
95
+ - run: <one-line command or N/A>
96
+ - lint: <one-line command or N/A>
97
+ - Key conventions / constraints: <2–3 items, or N/A>
98
+ - Related file pointers: <absolute paths or @references, or N/A>
99
+ ```
100
+
101
+ Rules:
102
+ - Paths must be **absolute**.
103
+ - Commands must be **one-line executable strings**, not descriptions.
104
+ - Prefer commands that do **not** require an activated environment (absolute interpreter paths, `uv run`, `docker compose exec`).
105
+ - Do not copy README text. List file pointers only.
106
+ - Never include secrets, tokens, credentials, or `.env` content.
107
+ - Even if logically a continuation, treat as round=1 if the peer has **no prior context**.
108
+ - **Do NOT** prefill the peer with your own root-cause analysis, proposed fix, or leading questions — see the *Independent Judgment Rule* below. The handoff card is for context, not conclusions.
109
+
110
+ ### Anti-Hallucination Rule
111
+
112
+ When mentioning specific paths, function names, commands, line numbers, or tool results, you **must cite the source** ("I read X at line Y"). If a detail is unverified, state "unverified" explicitly.
113
+
114
+ ### Independent Judgment Rule (round=1)
115
+
116
+ The whole point of peer review is to surface a **second, independent** read. If
117
+ the reviewer's own root-cause analysis, fix diff, and leading questions are sent
118
+ to the peer up front, the peer can only AGREE inside the reviewer's frame — and
119
+ that AGREE carries no signal. The reviewer **must complete their own analysis
120
+ before opening round=1**; skipping that step turns peer review into a search for
121
+ endorsement.
122
+
123
+ Round=1 message **must NOT include**:
124
+
125
+ - The reviewer's own root-cause analysis ("the bug is in function X at line Y because…").
126
+ - The reviewer's own proposed fix, patch, or diff.
127
+ - Leading questions of the form "do you agree with my conclusion on X?" / "is the change I made on Y safe?" — these lock the peer into the reviewer's framing.
128
+ - Specific line numbers, function names, or branch points the reviewer has already identified as relevant — let the peer locate them.
129
+
130
+ Round=1 message **should include**:
131
+
132
+ - The Project Handoff Card (above).
133
+ - Symptoms exactly as observed: the user's reported error, terminal output verbatim, the precise commands that triggered it.
134
+ - Necessary external context: the goal of the work, the date / version under test, anything the peer cannot infer from the repo alone.
135
+ - Key file pointers as **entry points** (paths only — let the peer choose what to read and how deep).
136
+ - An open invitation: "independently identify the root cause, propose a fix, and call out any test gaps."
137
+
138
+ After receiving the peer's round=1 reply, the reviewer **compares** their own
139
+ conclusion to the peer's and routes the next action:
140
+
141
+ | Reviewer's own conclusion vs. peer's conclusion | Next action |
142
+ |---|---|
143
+ | Same root cause + same fix direction | High confidence — AGREE and proceed to execution |
144
+ | Same root cause, different fix direction | REFINE — open round=2 to reconcile the fix |
145
+ | Different root cause | OBJECT — open round=2; at least one of the two analyses is wrong |
146
+ | Peer asks for more context | REFINE — supply the missing context, then re-evaluate |
147
+
148
+ #### Example (bad — endorsement-seeking)
149
+
150
+ ```
151
+ Bug is in `cmd_init` at line 932 — the v2 demo renderer fires unconditionally.
152
+ My fix: gate it behind `--demo`. Q1: is this over-killed? Q2: should I
153
+ refactor the renderer instead? Q3: are the tests strong enough?
154
+ ```
155
+
156
+ The peer can only AGREE or quibble inside the reviewer's framing.
157
+
158
+ #### Example (good — independent analysis)
159
+
160
+ ```
161
+ Symptoms: user ran `roll init` on /path/X and saw [verbatim terminal output A];
162
+ then ran `roll backlog` and saw [verbatim terminal output B]. Project background:
163
+ [project shape]. Entry points: `bin/roll`, `lib/roll-init.py`, `tests/`.
164
+ Independently identify the root cause and propose a fix.
165
+ ```
166
+
167
+ The peer reads, locates, and proposes on its own. The reviewer then compares.
168
+
169
+ ## State Machine
170
+
171
+ ### Per Negotiation (Single Task)
172
+
173
+ ```
174
+ Running
175
+ ├── AGREE (any round) → Execute proposal
176
+ ├── Round == 3, no AGREE → ESCALATE (failed_max_rounds)
177
+ ├── API/token error → ESCALATE (failed_api_error)
178
+ └── User aborts → ESCALATE (user_abort)
179
+ ```
180
+
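The per-negotiation state machine can be sketched as a fold over the events of each round. The event names `API_ERROR` and `USER_ABORT` and the function name are assumptions for illustration; the outcome labels are the ones used in the diagram above.

```python
MAX_ROUNDS = 3


def negotiation_outcome(events):
    """Reduce per-round events to 'agreed', a failure label, or 'running'."""
    for round_no, event in enumerate(events, start=1):
        if event == "AGREE":
            return "agreed"  # AGREE in any round ends the negotiation
        if event == "API_ERROR":
            return "failed_api_error"  # ESCALATE
        if event == "USER_ABORT":
            return "user_abort"  # ESCALATE
        if event not in ("REFINE", "OBJECT"):
            raise ValueError(f"unknown event: {event!r}")
        if round_no == MAX_ROUNDS:
            return "failed_max_rounds"  # round 3 without AGREE -> ESCALATE
    return "running"
```

For example, `["REFINE", "AGREE"]` resolves to `agreed`, while three rounds of `REFINE` or `OBJECT` resolve to `failed_max_rounds`.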
181
+ ### Per Peer Pair (e.g., kimi→claude)
182
+
183
+ Stored in `~/.roll/.peer-state/` (flat key files per pair):
184
+
185
+ ```yaml
186
+ kimi→claude:
187
+ status: active # active | degraded | abandoned
188
+ streak: 0 # consecutive failure count
189
+ last_outcome: agreed
190
+ history:
191
+ - { time: "2026-05-08T23:30:00+08:00", outcome: agreed, rounds: 2, tag: architecture }
192
+ ```
193
+
194
+ Rules:
195
+ - `streak >= 3` → automatically mark as `abandoned`
196
+ - `abandoned` peer pairs are skipped by the bridge script
197
+ - Human can reset via `roll peer reset <from> <to>` or `roll peer reset --all`
198
+ - If a peer pair is abandoned, the bridge falls back to the next candidate in the capability map
199
+
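A sketch of the streak bookkeeping for one peer pair, using the field names from the YAML above. The function name is hypothetical, resetting `streak` to 0 on an agreed outcome is inferred from "consecutive failure count", and the `degraded` status is left untouched because the skill does not say when it applies.

```python
ABANDON_THRESHOLD = 3  # "streak >= 3" from the rules above


def record_outcome(state, outcome, rounds, tag, time):
    """Update one peer pair's state dict in place and return it."""
    if outcome == "agreed":
        state["streak"] = 0  # a success breaks the run of failures
    else:
        state["streak"] = state.get("streak", 0) + 1
    state["last_outcome"] = outcome
    state.setdefault("history", []).append(
        {"time": time, "outcome": outcome, "rounds": rounds, "tag": tag}
    )
    if state["streak"] >= ABANDON_THRESHOLD:
        state["status"] = "abandoned"  # skipped until a human resets the pair
    return state
```

Once a pair is `abandoned` this function never reactivates it, which mirrors the rule that only a human reset brings it back.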
200
+ ## Peer Routing (Adaptive)
201
+
202
+ ### Capability Map (Task Type → Preferred Peer Order)
203
+
204
+ ```yaml
205
+ peer:
206
+ capability_map:
207
+ architecture: [claude, deepseek, kimi, pi]
208
+ security: [claude, deepseek, pi, kimi]
209
+ test: [codex, kimi, deepseek, claude]
210
+ refactor: [deepseek, kimi, claude, pi]
211
+ default: [deepseek, kimi, claude, pi]
212
+ ```
213
+
214
+ ### Adaptive Adjustment
215
+
216
+ After each negotiation, record:
217
+ - `outcome`: agreed / failed_max_rounds / failed_api_error
218
+ - `rounds`: number of rounds consumed
219
+ - `tag`: task type
220
+
221
+ If `streak` for a peer pair reaches the configured threshold (default: 3 consecutive failures), mark as `abandoned`. The next task of the same type will try the next candidate in `capability_map`.
222
+
223
+ ### Peer Detection
224
+
225
+ The bridge script detects installed peers via `command -v <tool>`. Only installed tools are considered. The current running tool is excluded (`exclude_self: true`).
226
+
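Routing and detection together can be sketched as a walk down the capability map. `shutil.which` is used here as the Python counterpart of `command -v`. The function name, the `abandoned` set of `(from, to)` tuples, and the injectable `is_installed` hook are assumptions for illustration, not the bridge script's actual interface.

```python
import shutil

CAPABILITY_MAP = {
    "architecture": ["claude", "deepseek", "kimi", "pi"],
    "security": ["claude", "deepseek", "pi", "kimi"],
    "test": ["codex", "kimi", "deepseek", "claude"],
    "refactor": ["deepseek", "kimi", "claude", "pi"],
    "default": ["deepseek", "kimi", "claude", "pi"],
}


def pick_peer(tag, current_tool, abandoned=(), is_installed=None):
    """Return the first usable peer for a task tag, or None if none qualifies."""
    if is_installed is None:
        is_installed = lambda tool: shutil.which(tool) is not None
    for candidate in CAPABILITY_MAP.get(tag, CAPABILITY_MAP["default"]):
        if candidate == current_tool:
            continue  # exclude_self: true
        if (current_tool, candidate) in abandoned:
            continue  # abandoned pairs are skipped
        if not is_installed(candidate):
            continue  # only installed tools are considered
        return candidate
    return None
```

Passing `is_installed` explicitly keeps the selection logic testable on machines where none of the peer CLIs are present.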
227
+ For `deepseek`, also check if serve mode is available as a more reliable alternative:
228
+ ```bash
229
+ command -v deepseek >/dev/null 2>&1 && { deepseek serve --help 2>/dev/null; true; } | grep -q -- "--http" && echo "serve_mode"
230
+ ```
231
+ If serve mode is available, prefer HTTP transport over direct CLI invocation.
232
+
233
+ ### Peer Invocation Reference
234
+
235
+ | Peer | Non-interactive command | Reliability | Notes |
236
+ |------|------------------------|-------------|-------|
237
+ | `claude` | `claude -p "<prompt>"` | ✅ Verified | Native, stable |
238
+ | `deepseek` | `deepseek "<prompt>"` | ✅ Verified | No TTY dependency |
239
+ | `deepseek` (serve) | `curl localhost:<port>/v1/...` | ✅ High | Start with `deepseek serve --http`; preferred over direct CLI |
240
+ | `kimi` | `kimi --quiet -p "<prompt>"` | ✅ Verified | `--quiet` is alias for `--print --output-format text --final-message-only`; prompt via `-p` |
241
+ | `pi` | `pi -p "<prompt>"` | ✅ Verified | Clean non-interactive output |
242
+ | `opencode` | `opencode run "<prompt>"` | ✅ Verified | Works non-interactively |
243
+ | `codex` | `codex exec "<prompt>"` | ⚠️ Auth required | Token must be valid; re-login with `codex login` if expired |
244
+
245
+ **CLI vs. API Key**: The `claude`, `deepseek`, `kimi`, and `codex` CLIs authenticate via existing subscription accounts, so no separate API key is required. This is the primary advantage of CLI transport over the MCP/HTTP approach.
246
+
247
+ ## Inline Display Mode (Manual Triggers)
248
+
249
+ When peer review is manually triggered by a human (via `/peer`, "叫上 peer" ("bring in a peer"), etc.), the executing agent **must display each round inline in the current conversation**. This applies regardless of which agent is executing: Claude, DeepSeek, Kimi, Pi, or any other.
250
+
251
+ **Per-round display format:**
252
+
253
+ ```
254
+ ─── Peer Review · Round N ───────────────────────────────
255
+ → Sending to [peer]:
256
+ {full message sent to peer}
257
+
258
+ ← [peer] responds:
259
+ {peer's full response, verbatim}
260
+
261
+ ◆ My analysis: {executing agent's reaction and position for this round}
262
+ ─────────────────────────────────────────────────────────
263
+ ```
264
+
265
+ **Rules:**
266
+ - Peer CLI calls must be **synchronous** (do NOT use background/async execution).
267
+ - The outgoing round=1 message must follow the *Independent Judgment Rule* above — no root-cause analysis, no fix diff, no leading questions.
268
+ - Show the outgoing message **before** calling the peer, so the user sees what's being asked.
269
+ - Relay the peer's response **verbatim** before adding your own analysis.
270
+ - After the peer's reply, the reviewer's own analysis block must explicitly state whether the peer's root cause and fix direction match the reviewer's own (independent) conclusion — that comparison is what determines the next round's action.
271
+ - If a peer call fails or times out, report it immediately inline and either retry or ESCALATE.
272
+ - Negotiation log is written to `<project>/.roll/peer/logs/` as usual.
273
+
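The ordering constraints in these rules (show the outgoing message first, call the peer synchronously, relay the reply verbatim, then analyze) can be sketched as one function. Everything here is illustrative: `run_round`, `call_peer`, `analyze`, and `emit` are hypothetical names, the rule width of 57 characters is an assumption, and the `✗` failure line is an invented marker since the format above does not define one.

```python
RULE_WIDTH = 57  # assumed width of the horizontal rule


def run_round(n, peer, outgoing, call_peer, analyze, emit=print):
    """Display one peer-review round inline, in the order the rules require."""
    emit(f"─── Peer Review · Round {n} ".ljust(RULE_WIDTH, "─"))
    emit(f"→ Sending to [{peer}]:")
    emit(outgoing)  # shown before the peer is called
    emit("")
    try:
        response = call_peer(outgoing)  # blocking call, no background execution
    except Exception as exc:
        # Report the failure inline; the caller decides to retry or ESCALATE.
        emit(f"✗ [{peer}] call failed: {exc}")
        emit("─" * RULE_WIDTH)
        return None
    emit(f"← [{peer}] responds:")
    emit(response)  # relayed verbatim, before any analysis
    emit("")
    emit(f"◆ My analysis: {analyze(response)}")
    emit("─" * RULE_WIDTH)
    return response
```

Passing `emit` as a parameter keeps the sketch testable; in an interactive session it would simply be the agent's normal output.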
274
+ **Why inline, not tmux:** When a human manually triggers peer review inside an agent's interactive session, the conversation IS the visible interface. tmux auto-attach is only relevant for CLI-launched background sessions (`bin/roll peer`), not for skill invocations.
275
+
276
+ ## Workflow Integration
277
+
278
+ ### `roll-build` Plan Mode
279
+
280
+ After generating an executable plan, before proceeding to TCR:
281
+
282
+ 1. Assess plan complexity (file count, cross-module impact, risk level)
283
+ 2. If complexity > threshold, prompt user:
284
+ ```
285
+ This plan affects 5 files across 3 modules. Estimated peer review: 2–3 rounds, ~X tokens.
286
+ Press Enter to launch peer review, or type 'n' to skip. Auto-executing in 10s...
287
+ ```
288
+ 3. If user does not abort within 10s, invoke `roll peer` with `--tag architecture`
289
+ 4. Wait for result:
290
+ - AGREE → proceed to TCR
291
+ - REFINE/OBJECT → incorporate feedback and regenerate plan
292
+ - ESCALATE → present both proposals to user for final decision
293
+
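Step 4 amounts to a small dispatch on the peer's verdict. A minimal sketch follows; the action labels and the function name `next_action` are hypothetical, and raising on an unrecognized verdict is an assumption, since the text does not say how unexpected values are handled.

```python
# Verdict names come from step 4; the action labels are illustrative only.
ACTIONS = {
    "AGREE": "proceed_to_tcr",
    "REFINE": "regenerate_plan",
    "OBJECT": "regenerate_plan",
    "ESCALATE": "ask_user",
}


def next_action(verdict):
    """Map a peer verdict to the next workflow step."""
    try:
        return ACTIONS[verdict.strip().upper()]
    except KeyError:
        # Assumption: fail loudly rather than guess at an unknown verdict.
        raise ValueError(f"unknown peer verdict: {verdict!r}") from None
```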
294
+ ### `roll-spar`
295
+
296
+ When Attacker and Defender reach a stalemate (both tests pass but interpretations differ):
297
+
298
+ 1. Auto-invoke `roll peer` with `--tag test`
299
+ 2. Use the peer's verdict as tie-breaker
300
+
301
+ ## Output Artifacts
302
+
303
+ - **Negotiation log**: `<project>/.roll/peer/logs/<timestamp>_<from>_<to>.md`
304
+ - **Structured record**: `<project>/.roll/peer/runs.jsonl`
305
+ - **State file**: `~/.roll/.peer-state/`
306
+ - **Decision record**: If AGREE, append summary to `docs/decisions/` or `.roll/backlog.md` (optional)
307
+
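To make the artifact layout concrete, here is a sketch of writing the structured record and deriving the log path. The paths follow the list above; the record fields are borrowed from the state history example and are an assumption about what `runs.jsonl` actually contains, as are the helper names `append_run` and `log_path`.

```python
import json
from pathlib import Path


def append_run(project_root, record):
    """Append one JSON object per line to <project>/.roll/peer/runs.jsonl."""
    path = Path(project_root) / ".roll" / "peer" / "runs.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def log_path(project_root, timestamp, from_tool, to_tool):
    """Build <project>/.roll/peer/logs/<timestamp>_<from>_<to>.md."""
    name = f"{timestamp}_{from_tool}_{to_tool}.md"
    return Path(project_root) / ".roll" / "peer" / "logs" / name
```

Appending one line per run keeps the file easy to tail and to parse incrementally.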
308
+ ## Configuration
309
+
310
+ User overrides in `~/.roll/config.yaml`:
311
+
312
+ ```yaml
313
+ peer:
314
+ max_rounds: 3
315
+ opt_out_seconds: 10
316
+ call_timeout: 180 # seconds per round; configure based on your API latency
317
+ fallback: file_mailbox # direct_cli | file_mailbox | auto
318
+ capability_map:
319
+ architecture: [claude, deepseek, kimi, pi]
320
+ security: [claude, deepseek, pi, kimi]
321
+ test: [codex, kimi, deepseek, claude]
322
+ refactor: [deepseek, kimi, claude, pi]
323
+ default: [deepseek, kimi, claude, pi]
324
+ adaptive:
325
+ streak_threshold: 3
326
+ min_samples: 3
327
+ ```
328
+
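One way user overrides could be layered over built-in values is a recursive merge, sketched below. Treating the values shown above as the defaults is an assumption, as is the merge behavior itself; `DEFAULTS` and `deep_merge` are hypothetical names, and loading the YAML file is left out to keep the example dependency-free.

```python
import copy

# Assumed defaults, copied from the example config (capability_map omitted).
DEFAULTS = {
    "peer": {
        "max_rounds": 3,
        "opt_out_seconds": 10,
        "call_timeout": 180,
        "fallback": "file_mailbox",
        "adaptive": {"streak_threshold": 3, "min_samples": 3},
    }
}


def deep_merge(base, override):
    """Return a new dict where `override` wins, merging nested dicts key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

With this approach a user file that sets only `peer.adaptive.streak_threshold` leaves `min_samples` and every other key at its default. Lists such as a `capability_map` entry are replaced wholesale rather than merged.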
329
+ ## Limitations
330
+
331
+ 1. **Reverse link reliability**: Direct CLI calls are preferred, but reliability varies by tool (see the Peer Invocation Reference table). If a peer fails consistently, the adaptive streak tracker marks the pair `abandoned` and the bridge falls back to the next candidate. The file mailbox (`<project>/.roll/peer/mailbox/`) is the last-resort fallback.
332
+ - `deepseek serve --http` is the most reliable option when available — prefer it over direct `deepseek` CLI invocation.
333
+ - `codex exec` has known TTY/Ink issues in non-interactive environments; treat as low-priority fallback.
334
+ 2. **Cost**: Every peer review consumes tokens on both sides. Only trigger for tasks where the cost of a wrong decision exceeds the cost of peer review. DeepSeek is the most cost-effective peer for general use.
335
+ 3. **Context window**: Large project handoff cards may consume significant context. Keep file pointers concise.
336
+ 4. **Tool differences**: Claude, DeepSeek, Kimi, Codex, and Pi interpret skills and AGENTS.md differently. The peer may apply the protocol slightly differently. This is expected and acceptable — the protocol is designed to tolerate variation.