@seanyao/roll 0.5.0 → 2.602.1
- package/CHANGELOG.md +717 -0
- package/LICENSE +21 -0
- package/README.md +65 -165
- package/bin/dream-test-quality-scan +110 -0
- package/bin/roll +14897 -815
- package/conventions/config.yaml +17 -1
- package/conventions/global/AGENTS.md +146 -100
- package/conventions/global/CLAUDE.md +1 -21
- package/conventions/global/GEMINI.md +8 -22
- package/conventions/global/project_rules.md +9 -0
- package/conventions/templates/backend-service/AGENTS.md +30 -81
- package/conventions/templates/backend-service/GEMINI.md +3 -3
- package/conventions/templates/backend-service/project_rules.md +16 -0
- package/conventions/templates/cli/AGENTS.md +31 -58
- package/conventions/templates/cli/CLAUDE.md +3 -5
- package/conventions/templates/cli/GEMINI.md +3 -3
- package/conventions/templates/cli/project_rules.md +16 -0
- package/conventions/templates/frontend-only/AGENTS.md +29 -64
- package/conventions/templates/frontend-only/GEMINI.md +3 -3
- package/conventions/templates/frontend-only/project_rules.md +14 -0
- package/conventions/templates/fullstack/AGENTS.md +31 -79
- package/conventions/templates/fullstack/CLAUDE.md +1 -1
- package/conventions/templates/fullstack/GEMINI.md +3 -3
- package/conventions/templates/fullstack/project_rules.md +15 -0
- package/lib/README.md +42 -0
- package/lib/__pycache__/github_sync.cpython-314.pyc +0 -0
- package/lib/__pycache__/loop-fmt.cpython-314.pyc +0 -0
- package/lib/__pycache__/loop_result_eval.cpython-314.pyc +0 -0
- package/lib/__pycache__/loop_unstick.cpython-314.pyc +0 -0
- package/lib/__pycache__/model_prices.cpython-314.pyc +0 -0
- package/lib/__pycache__/prices_fetcher.cpython-314.pyc +0 -0
- package/lib/__pycache__/roll-home.cpython-314.pyc +0 -0
- package/lib/__pycache__/roll-loop-status.cpython-314.pyc +0 -0
- package/lib/__pycache__/roll_git.cpython-314.pyc +0 -0
- package/lib/__pycache__/roll_render.cpython-314.pyc +0 -0
- package/lib/__pycache__/slides-render.cpython-314.pyc +0 -0
- package/lib/agent_usage/README.md +49 -0
- package/lib/agent_usage/__init__.py +108 -0
- package/lib/agent_usage/__pycache__/__init__.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/gemini.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/kimi.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/openai.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/pi.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/pi_emit.cpython-314.pyc +0 -0
- package/lib/agent_usage/__pycache__/qwen.cpython-314.pyc +0 -0
- package/lib/agent_usage/gemini.py +127 -0
- package/lib/agent_usage/kimi.py +278 -0
- package/lib/agent_usage/kimi_emit.py +123 -0
- package/lib/agent_usage/openai.py +126 -0
- package/lib/agent_usage/pi.py +200 -0
- package/lib/agent_usage/pi_emit.py +135 -0
- package/lib/agent_usage/qwen.py +128 -0
- package/lib/backfill-pi-usage.py +243 -0
- package/lib/changelog_audit.py +155 -0
- package/lib/changelog_generate.py +263 -0
- package/lib/context_feed_budget.sh +194 -0
- package/lib/github_sync.py +876 -0
- package/lib/i18n/README.md +54 -0
- package/lib/i18n/agent.sh +75 -0
- package/lib/i18n/alert.sh +20 -0
- package/lib/i18n/backlog.sh +96 -0
- package/lib/i18n/brief.sh +5 -0
- package/lib/i18n/changelog.sh +5 -0
- package/lib/i18n/ci.sh +15 -0
- package/lib/i18n/debug.sh +0 -0
- package/lib/i18n/doctor.sh +44 -0
- package/lib/i18n/dream.sh +0 -0
- package/lib/i18n/init.sh +91 -0
- package/lib/i18n/lang.sh +10 -0
- package/lib/i18n/loop.sh +140 -0
- package/lib/i18n/migrate.sh +74 -0
- package/lib/i18n/offboard.sh +31 -0
- package/lib/i18n/onboard.sh +0 -0
- package/lib/i18n/peer.sh +41 -0
- package/lib/i18n/peer_help.sh +25 -0
- package/lib/i18n/peer_reset.sh +7 -0
- package/lib/i18n/peer_status.sh +5 -0
- package/lib/i18n/prices.sh +3 -0
- package/lib/i18n/prices_refresh.sh +17 -0
- package/lib/i18n/prices_show.sh +7 -0
- package/lib/i18n/propose.sh +0 -0
- package/lib/i18n/release.sh +0 -0
- package/lib/i18n/research.sh +0 -0
- package/lib/i18n/review_pr.sh +0 -0
- package/lib/i18n/sentinel.sh +0 -0
- package/lib/i18n/setup.sh +3 -0
- package/lib/i18n/shared.sh +157 -0
- package/lib/i18n/skills/roll-brief.sh +47 -0
- package/lib/i18n/skills/roll-build.sh +97 -0
- package/lib/i18n/skills/roll-design.sh +18 -0
- package/lib/i18n/skills/roll-fix.sh +53 -0
- package/lib/i18n/skills/roll-loop.sh +28 -0
- package/lib/i18n/skills/roll-onboard.sh +33 -0
- package/lib/i18n/skills_catalog.sh +30 -0
- package/lib/i18n/slides.sh +3 -0
- package/lib/i18n/slides_build.sh +38 -0
- package/lib/i18n/slides_delete.sh +19 -0
- package/lib/i18n/slides_list.sh +14 -0
- package/lib/i18n/slides_logs.sh +12 -0
- package/lib/i18n/slides_new.sh +15 -0
- package/lib/i18n/slides_preview.sh +14 -0
- package/lib/i18n/slides_templates.sh +7 -0
- package/lib/i18n/status.sh +21 -0
- package/lib/i18n/update.sh +24 -0
- package/lib/i18n.sh +211 -0
- package/lib/loop-exit-summary.py +393 -0
- package/lib/loop-fmt.py +589 -0
- package/lib/loop_pick_agent.py +316 -0
- package/lib/loop_result_eval.py +469 -0
- package/lib/loop_unstick.py +180 -0
- package/lib/model_prices.py +186 -0
- package/lib/prices/README.md +35 -0
- package/lib/prices/snapshot-2026-05-22.json +22 -0
- package/lib/prices/snapshot-2026-05-23-deepseek.json +15 -0
- package/lib/prices/snapshot-2026-05-23-kimi.json +14 -0
- package/lib/prices_fetcher.py +285 -0
- package/lib/roll-backlog.py +225 -0
- package/lib/roll-brief.py +286 -0
- package/lib/roll-help.py +158 -0
- package/lib/roll-home.py +556 -0
- package/lib/roll-init.py +156 -0
- package/lib/roll-loop-status.py +1683 -0
- package/lib/roll-loop-story.py +191 -0
- package/lib/roll-onboard-render.py +378 -0
- package/lib/roll-peer.py +252 -0
- package/lib/roll-plan-validate.py +386 -0
- package/lib/roll-setup.py +102 -0
- package/lib/roll-status.py +367 -0
- package/lib/roll_git.py +41 -0
- package/lib/roll_render.py +414 -0
- package/lib/slides/components/README.md +123 -0
- package/lib/slides/components/cards-2.html +9 -0
- package/lib/slides/components/cards-3.html +9 -0
- package/lib/slides/components/cards-4.html +9 -0
- package/lib/slides/components/compare.html +22 -0
- package/lib/slides/components/highlight.html +9 -0
- package/lib/slides/components/pipeline.html +12 -0
- package/lib/slides/components/plain.html +7 -0
- package/lib/slides/components/quote.html +4 -0
- package/lib/slides/components/timeline.html +9 -0
- package/lib/slides/templates/introduction-v3.html +571 -0
- package/lib/slides/templates/pitch.html +0 -0
- package/lib/slides-render.py +778 -0
- package/lib/slides-validate.py +357 -0
- package/lib/test_quality_gate.py +143 -0
- package/package.json +8 -7
- package/skills/roll-.changelog/SKILL.md +406 -33
- package/skills/roll-.clarify/SKILL.md +5 -2
- package/skills/roll-.dream/SKILL.md +374 -0
- package/skills/roll-.echo/SKILL.md +5 -2
- package/skills/roll-.qa/SKILL.md +57 -3
- package/skills/roll-.review/SKILL.md +42 -3
- package/skills/roll-brief/SKILL.md +209 -0
- package/skills/roll-build/SKILL.md +308 -63
- package/skills/roll-debug/SKILL.md +341 -162
- package/skills/roll-debug/injectable-bb.js +263 -0
- package/skills/roll-deck/SKILL.md +296 -0
- package/skills/roll-design/ENGINEERING_CHECKLIST.md +1 -1
- package/skills/roll-design/SKILL.md +727 -94
- package/skills/roll-doc/SKILL.md +595 -0
- package/skills/roll-doctor/SKILL.md +192 -0
- package/skills/roll-fix/SKILL.md +149 -32
- package/skills/{roll-jot → roll-idea}/SKILL.md +18 -10
- package/skills/roll-loop/SKILL.md +578 -0
- package/skills/roll-notes/SKILL.md +103 -0
- package/skills/roll-onboard/SKILL.md +234 -0
- package/skills/roll-peer/SKILL.md +336 -0
- package/skills/roll-propose/SKILL.md +157 -0
- package/skills/roll-review-pr/SKILL.md +58 -0
- package/skills/roll-sentinel/SKILL.md +11 -2
- package/skills/roll-spar/SKILL.md +8 -6
- package/template/.github/workflows/ci.yml +5 -2
- package/template/AGENTS.md +20 -74
- package/skills/roll-research/SKILL.md +0 -307
- package/skills/roll-research/references/schema.json +0 -162
- package/skills/roll-research/scripts/md_to_pdf.py +0 -289
- package/tools/roll-fetch/SKILL.md +0 -182
- package/tools/roll-fetch/package.json +0 -15
- package/tools/roll-fetch/smart-web-fetch.js +0 -558
- package/tools/roll-probe/SKILL.md +0 -84
- /package/template/{BACKLOG.md → .roll/backlog.md} +0 -0
@@ -0,0 +1,234 @@
+---
+name: roll-onboard
+license: MIT
+description: Interactive onboarding for legacy projects. Reads existing code, understands the project, asks 9 questions in 3 groups (cognition / scope / privacy), and writes .roll/onboard-plan.yaml as the contract for `roll init --apply` to execute.
+---
+
+# Roll Onboard
+
+> Follows the Architecture Constraints, Development Discipline, and Engineering Common Sense defined in the project AGENTS.md.
+
+Interactive onboarding flow for **legacy projects**: existing code that needs to adopt the Roll convention without disrupting how the team already works.
+
+## Trigger
+
+This skill runs when:
+
+- `roll init` detected a legacy project (≥10 source files, no `AGENTS.md`)
+- The CLI told the user to open an AI agent and run `$roll-onboard`
+- The user has now invoked you here
+
+## Hard responsibility boundary
+
+You are the **cognition layer**. Your job ends with writing a plan file.
+
+| You do | You do NOT |
+|--------|------------|
+| Read project code, infer type/domains/modules | Modify any source file |
+| Call `roll-doc --dry-run` to get a gap report | Call `roll-doc` (write mode) |
+| Ask the user 9 questions across 3 groups | Decide for the user |
+| Produce `.roll/onboard-plan.yaml` | Write `.gitignore` |
+| Produce `.roll/onboard-plan.yaml` | Run `roll init --apply` |
+
+Hard constraint: **AI cannot create files in the user's project other than `.roll/onboard-plan.yaml`.** Anything else is `bash`'s job (`roll init --apply`).
+
+## Inputs you must read
+
+1. The repository tree (use the project's own structure to infer technologies)
+2. Any existing `README.md` / `package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod` etc. as evidence
+3. `roll-doc --dry-run` output → identifies what documentation gaps exist
+4. The path-audit pattern: scan for legacy structure markers (`BACKLOG.md`, `docs/features/`, etc.) — if any are present, REFUSE and tell the user to run `roll migrate` first
+
+## Workflow
+
+### Step 0 — Pre-flight
+
+1. Check that you're in a legacy project root (no `AGENTS.md`, has source code)
+2. If `BACKLOG.md` or `docs/features/` is already present → STOP, tell the user to run `roll migrate` first (this is a partial-migration project, not legacy)
+3. Check that `.roll/onboard-plan.yaml` doesn't already exist; if it does, ask the user whether to overwrite it
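The three pre-flight checks above could be sketched roughly as follows. This is only an illustration: the function name `preflight` and its return labels are hypothetical, the "has source code" check is omitted, and nothing here is taken from the actual `roll` implementation.

```python
from pathlib import Path

# Marker paths come from the pre-flight steps; everything else is hypothetical.
LEGACY_MARKERS = ("BACKLOG.md", "docs/features")
PLAN_PATH = Path(".roll") / "onboard-plan.yaml"


def preflight(root: Path) -> str:
    """Return a label describing what the onboarding skill should do next."""
    # Step 0.1: a project that already has AGENTS.md is not a legacy project.
    if (root / "AGENTS.md").exists():
        return "not_legacy"
    # Step 0.2: old Roll layout markers mean `roll migrate` must run first.
    if any((root / marker).exists() for marker in LEGACY_MARKERS):
        return "migrate_first"
    # Step 0.3: an existing plan needs the user's consent before overwriting.
    if (root / PLAN_PATH).exists():
        return "confirm_overwrite"
    return "ok"
```

A caller would stop on anything other than `"ok"`, asking the user first in the `"confirm_overwrite"` case.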
+
+### Step 1 — Read code, build understanding
+
+Walk the repo. Identify:
+- **type**: one of `backend-service` / `frontend-only` / `fullstack` / `cli`
+- **description**: 1-2 sentence summary of what this project does
+- **domains**: top business/technical domains (e.g., "auth", "billing", "search")
+- **key_modules**: top 3-5 modules that hold most of the logic
+
+### Step 1b — Phase 2 analysis: business model, tech, tests (US-ONBOARD-016)
+
+A single onboard now produces three structured analysis sections in the plan
+(`domain_model`, `tech_analysis`, `test_assessment`). Build them here so Step 4
+can serialise them.
+
+**`domain_model`** — from the code you read, identify the bounded contexts. For
+each: a `name`, its `aggregates` (the entities that own consistency), and its
+`ubiquitous_language` (the domain terms the code/docs actually use). If you
+genuinely cannot infer contexts from the code, emit an empty
+`bounded_contexts: []` — do NOT invent contexts that aren't in the code.
+
+**`tech_analysis`** — `stack` (languages/frameworks evidenced by manifests),
+`dependencies` (from `package.json` / `pyproject.toml` / `go.mod` / `Cargo.toml`
+etc.), `architecture_notes` (observed structure, not aspirational), and `risks`
+(each a mapping with a `description`; optionally `severity: LOW|MEDIUM|HIGH` and
+an `evidence: detected|inferred` tag).
+
+**`test_assessment`** — this section is under a **hard anti-hallucination
+constraint** (next sub-step). Do NOT write it from intuition.
+
+#### The verifiable test scan (ANTI-HALLUCINATION HARD CONSTRAINT)
+
+Every `test_assessment` claim must be backed by a real filesystem scan you run
+here — never by "what a project like this usually needs". Run these probes and
+record the raw counts/paths:
+
+1. **Count test files** by the conventional patterns:
+   - `*.test.*` / `*.spec.*` (JS/TS), `*_test.go` (Go), `test_*.py` / `*_test.py` (Python), `*_spec.rb` (Ruby), `*Test.java` (Java)
+   - e.g. `git ls-files | grep -cE '\.(test|spec)\.[jt]sx?$'` and similar per pattern
+2. **Probe for runner configs**: `jest.config.*`, `pytest.ini` (or `[tool.pytest]` in `pyproject.toml`), `.mocharc.*`, `vitest.config.*`, `karma.conf.*`, `phpunit.xml`, `go test` (implied by `*_test.go`)
+3. **Probe for a `coverage/` directory** (and `.coverage` / `coverage.xml` artifacts)
+
+Then turn the raw findings into claims, each a **mapping** carrying a `claim`
+string plus an `evidence` tag:
+
+- `evidence: detected` — the scan directly found it (e.g. "42 `*.test.ts` files detected", "vitest.config.ts present", "coverage/ directory present").
+- `evidence: inferred` — a judgement you derived FROM the detected facts (e.g. "unit layer present but no E2E config — integration coverage likely thin"). The inference must trace back to something the scan detected.
+
+**"none detected" rule**: when a probe finds nothing, you MUST say so explicitly
+with a tagged claim — `{claim: "none detected", evidence: detected}` (a scan that
+ran and returned zero is a genuine `detected` finding). You must NOT silently
+omit the dimension, and you must NOT invent generic filler like "needs more E2E
+tests" / "consider adding integration tests" with no detected basis. The plan
+validator (`lib/roll-plan-validate.py`) rejects any untagged free-text claim, so
+filler will fail `roll init --apply`.
+
+Map the findings into the three buckets:
+- `current_layers`: what test layers actually exist (each tagged `detected`)
+- `gaps`: dimensions where the scan found nothing (`none detected`, tagged `detected`) or thin coverage you can justify (`inferred`)
+- `recommended_actions`: actions that trace to a detected gap (tag `inferred`); if nothing is missing, this bucket may be `[]`
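As a rough illustration of the scan-then-tag flow described above, the sketch below counts test files from a list of repository paths (for example the output of `git ls-files`) and turns the counts into tagged claims. The function names, the layer labels, and the way the coverage probe is folded in are assumptions made for this example; the real scan and the real validator may work differently.

```python
import re
from collections import Counter

# Patterns mirror the conventional test-file names listed in the scan steps.
# Layer labels and function names are hypothetical.
TEST_PATTERNS = {
    "js/ts": re.compile(r"\.(test|spec)\.[jt]sx?$"),
    "go": re.compile(r"_test\.go$"),
    "python": re.compile(r"(^|/)test_[^/]*\.py$|_test\.py$"),
    "ruby": re.compile(r"_spec\.rb$"),
    "java": re.compile(r"Test\.java$"),
}

NONE_DETECTED = {"claim": "none detected", "evidence": "detected"}


def count_test_files(paths):
    """Count paths per layer; a path is counted once per pattern it matches."""
    counts = Counter()
    for path in paths:
        for layer, pattern in TEST_PATTERNS.items():
            if pattern.search(path):
                counts[layer] += 1
    return counts


def build_assessment(paths):
    """Turn raw scan results into tagged claims; never emits untagged text."""
    counts = count_test_files(paths)
    current_layers = [
        {"claim": f"{n} {layer} test files detected", "evidence": "detected"}
        for layer, n in sorted(counts.items())
    ]
    gaps = []
    if not counts:  # test-file probe ran and found nothing
        gaps.append(dict(NONE_DETECTED))
    has_coverage = any(
        p.startswith("coverage/") or p in (".coverage", "coverage.xml")
        for p in paths
    )
    if not has_coverage:  # coverage probe ran and found nothing
        gaps.append(dict(NONE_DETECTED))
    # Recommended actions need a human-readable inference, so this sketch
    # leaves the bucket empty rather than generating filler.
    return {"current_layers": current_layers, "gaps": gaps, "recommended_actions": []}
```

Because every claim is built from a count or an explicit zero result, nothing in the output can be free-text filler, which is the property the "none detected" rule is after.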
+
+### Step 2 — Get gap report
+
+Run `roll-doc --dry-run` (READ-ONLY mode). This reports:
+- Which standard Roll artifacts (BACKLOG, features, domain models) are missing
+- Which existing docs Roll could `include` rather than regenerate
+
+### Step 3 — Three groups of nine questions
+
+Present these in chat. **Aim for total time ≤ 3 minutes.** Group 1 confirms your understanding; group 2 scopes the work; group 3 handles privacy and next steps.
+
+**$(msg onboard.questions_group1)**
+
+1. $(msg onboard.q1 "[type]" "[description]")
+2. $(msg onboard.q2 "[domain A, domain B, …]")
+3. $(msg onboard.q3 "[X, Y, Z]")
+
+**$(msg onboard.questions_group2)**
+
+4. $(msg onboard.q4) Multi-select:
+   - `backlog` — initial BACKLOG with seeded stories
+   - `features` — features index + per-feature spec stubs
+   - `domain` — DDD context map
+   - `briefs` — directory for `$roll-brief` outputs
+5. Of these existing docs, which should I `include` rather than regenerate?
+   - (list candidates: README.md, docs/architecture.md, etc.)
+6. Put drafts inside `.roll/`? (default: yes; "no" means use the legacy `docs/` layout — not recommended for new adoption)
+
+**Group 3 — Privacy & next steps**
+
+7. Add `.roll/` to `.gitignore`? (yes = keep project management private; no = commit it like Roll itself does)
+8. Sync Roll conventions to which AI tools? Multi-select from detected agents (claude / cursor / codex / kimi / deepseek / pi / opencode / agy / trae)
+9. Enable `roll loop` autonomous execution after init?
+
+### Step 4 — Write plan file
+
+Write `.roll/onboard-plan.yaml` with this exact schema (validated by `lib/roll-plan-validate.py`):
+
+```yaml
+version: 1
+generated_at: "2026-05-19T14:30:00+08:00" # current ISO 8601, your timezone OK
+
+project_understanding:
+  type: cli # one of: backend-service / frontend-only / fullstack / cli
+  description: "..."
+  domains: [...]
+  key_modules: [...]
+
+scope:
+  approved: [backlog, features, domain] # user's Q4 multi-select
+  declined: [briefs] # what they said no to
+
+include_existing:
+  - README.md # user's Q5 selections
+  - docs/architecture.md
+
+privacy:
+  gitignore_dot_roll: true # user's Q7
+
+sync_targets: [claude, cursor] # user's Q8
+enable_loop: false # user's Q9
+agent_routes_template: default # user's Q10 — agent routing preset
+# one of: default / minimal / heavy / skip
+# default = pi/deepseek/claude + history (US-AGENT-002)
+# minimal = single agent (pi), no history
+# heavy = pi/deepseek/claude/kimi + larger window
+# skip = don't seed .roll/agent-routes.yaml
+
+# ── US-ONBOARD-016: Phase 2 analysis sections (optional, but emit all three) ──
+# All three are validated only when present, so they are backward-compatible,
+# but a normal onboard should produce them from Step 1b.
+
+domain_model:
+  bounded_contexts: # [] if none can be inferred — never invent
+    - name: auth
+      aggregates: [User, Session]
+      ubiquitous_language: [login, token, refresh]
+
+tech_analysis:
+  stack: [bash, python3] # evidenced by manifests
+  dependencies: [pyyaml] # from package.json / pyproject / go.mod / ...
+  architecture_notes: ["single-binary CLI + python helpers in lib/"]
+  risks:
+    - description: "no automated test run on macOS bash 3.2"
+      severity: HIGH # optional: LOW | MEDIUM | HIGH
+      evidence: detected # optional: detected | inferred
+
+# test_assessment — ANTI-HALLUCINATION: every claim is a mapping with an
+# `evidence` tag (detected | inferred). A zero-result scan is `none detected`
+# tagged `detected`. Untagged free-text is REJECTED by the validator.
+test_assessment:
+  current_layers:
+    - claim: "112 bats files detected under tests/" # evidence: a real scan count
+      evidence: detected
+  gaps:
+    - claim: "none detected" # e.g. no coverage/ dir found
+      evidence: detected
+  recommended_actions: # [] if nothing is missing
+    - claim: "add a macOS CI runner (inferred from launchd-only test skips)"
+      evidence: inferred # an inference traceable to a detected fact
+```
+
+Then tell the user:
+
+> Onboard conversation done. Plan saved to `.roll/onboard-plan.yaml`.
+> Return to your terminal and run:
+>
+>     roll init --apply
+>
+> The plan expires in 24 hours.
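The 24-hour expiry can be reasoned about directly from the plan's `generated_at` field. The sketch below is a guess at what such a check might look like; `plan_is_expired` is a hypothetical name, and how `roll init --apply` actually enforces expiry is not shown in this diff. It assumes `generated_at` always carries a UTC offset, as in the schema example.

```python
from datetime import datetime, timedelta, timezone

PLAN_TTL = timedelta(hours=24)  # taken from "The plan expires in 24 hours"


def plan_is_expired(generated_at: str, now: datetime) -> bool:
    """True when the plan is older than PLAN_TTL.

    Both values must be timezone-aware; comparing an aware datetime with a
    naive one raises TypeError, so offsets are assumed to be present.
    """
    created = datetime.fromisoformat(generated_at)
    return now - created > PLAN_TTL


# Usage: compare against the current time in UTC.
expired_now = plan_is_expired(
    "2026-05-19T14:30:00+08:00", datetime.now(timezone.utc)
)
```

Working in aware datetimes means the `+08:00` offset in the example plan is handled without any manual conversion.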
+
+### Step 5 — Stop
+
+Do NOT run `roll init --apply` yourself. Do NOT modify other project files. Your job is done.
+
+## When NOT to use
+
+- **Not a legacy project**: empty dir or fresh project → use plain `roll init` instead
+- **Has BACKLOG.md or docs/features/**: this is a pre-2.0 Roll project → run `roll migrate` first
+- **Has .roll/ already**: already onboarded → don't re-run
+
+## Failure modes
+
+- User aborts mid-conversation → don't write partial plan; tell user to re-run from scratch
+- User answers contradict the gap report (e.g., declines `features` but has lots of code) → ask the contradictory question once more before accepting; if they confirm, respect the choice
+- You can't read enough code to fill `project_understanding` (e.g., binary repo) → write a placeholder plan but ask user to fill in `type` and `description` manually before applying
@@ -0,0 +1,336 @@
+---
+name: roll-peer
+license: MIT
+allowed-tools: "Read, Bash, Write, Edit"
+description: |
+  Cross-agent peer review skill. When a task enters a decision phase (planning,
+  high-risk, ambiguous, destructive), triggers a bidirectional negotiation with
+  another AI agent via a unified protocol. Up to 3 rounds. If consensus is not
+  reached, escalates to the human user. Includes adaptive peer routing based on
+  task type and historical success rate.
+  Trigger: /peer, "叫上 peer" ("bring in a peer"), "peer review", or auto-triggered at workflow gates.
+---
+
+# Roll Peer (Cross-Agent Peer Review)
+
+> Follows the Architecture Constraints, Development Discipline, and Engineering
+> Common Sense defined in the project AGENTS.md.
+
+## Credits
+
+Cross-agent consultation protocol inspired by
+[friend-skill](https://github.com/fpyluck/friend-skill) (MIT) by fpyluck.
+Independent implementation for the Roll toolchain.
+
+## Trigger
+
+**Manual:**
+- `/peer`
+- "叫上 peer" ("bring in a peer")
+- "peer review 一下" ("do a quick peer review")
+- "和 peer 商量" ("talk it over with a peer")
+
+**Auto-triggered (with 10s opt-out):**
+- `roll-build` enters Plan Mode (executable plans / architecture decisions)
+- `roll-spar` Attacker and Defender disagree
+- High context pressure (large number of files read / tools executed)
+- Destructive / irreversible operations (`rm -rf`, production deploy, global config changes)
+- High-risk signal words ("重要" (important) / "关键" (critical) / "别搞砸" (don't mess this up), and the English "important" / "critical")
+- Cross-repository / cross-toolchain / ambiguous permission boundaries
+
+**Never trigger:**
+- Single-file changes
+- Clear, well-defined fixes
+- Simple refactoring
+
+## Protocol: `[PEER_REVIEW]`
+
+### Marker Format
+
+The marker **must** appear on the first non-empty line of the message:
+
+```markdown
+[PEER_REVIEW round=N tool=<from>→<to>]
+```
+
+- `round=N`: Current round number (1–3)
+- `tool=<from>→<to>`: Direction of this message (e.g., `kimi→claude`)
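A receiver could recognise the marker with a small parser like the one below. This is a sketch under assumptions: the regular expression, the `Marker` type, and the set of characters allowed in tool names are all invented for illustration and are not taken from the Roll bridge script.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: round is 1-3, tool names are word characters, dots, dashes.
MARKER_RE = re.compile(r"^\[PEER_REVIEW round=([1-3]) tool=([\w.-]+)→([\w.-]+)\]$")


class Marker(NamedTuple):
    round: int
    sender: str
    receiver: str


def parse_marker(message: str) -> Optional[Marker]:
    """Parse the marker from the first non-empty line, or return None."""
    for line in message.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        match = MARKER_RE.match(stripped)
        # Only the first non-empty line counts; a marker further down is ignored.
        return Marker(int(match[1]), match[2], match[3]) if match else None
    return None
```

Restricting the round to `[1-3]` in the pattern means an out-of-range round is treated the same as a missing marker.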
+
+### Three-State Resolution + Escape
+
+Allowed states only. No invented words.
+
+- **AGREE**: Accept the current proposal. Proceed to execution.
+- **REFINE**: Direction is correct, but specific changes are needed. Proceed to next round.
+- **OBJECT**: The proposal is wrong. Provide an alternative. Proceed to next round.
+- **ESCALATE**: Round 3 reached without AGREE, or a round fails due to API/token error. Hand off to the human user.
+
+After each round decision, emit a `peer` event to the cycle event stream:
+
+```bash
+# $round = current round number, $total = max rounds, $verdict = AGREE/REFINE/OBJECT/ESCALATE
+# $agents = e.g. "claude→deepseek"
+_loop_event peer "${round}/${total}" "$verdict" "$agents" 2>/dev/null || true
+```
+
+If information is insufficient:
+```
+REFINE: Need to confirm X/Y/Z with the user first.
+```
+
+### Context Handoff Card (required for round=1)
+
+When the task involves a local project, the first message must include:
+
+```markdown
+## Project Handoff (round=1 required)
+- Project root: <absolute path>
+- Execution environment: <shell / container / devcontainer / remote / N/A>
+- Project type: <language + framework>
+- Virtual environment: <absolute path / conda env / container name / N/A>
+- Activation command: <one-line executable string, or N/A>
+- Key tool calls:
+  - test: <one-line command or N/A>
+  - build: <one-line command or N/A>
+  - run: <one-line command or N/A>
+  - lint: <one-line command or N/A>
+- Key conventions / constraints: <2–3 items, or N/A>
+- Related file pointers: <absolute paths or @references, or N/A>
+```
+
+Rules:
+- Paths must be **absolute**.
+- Commands must be **one-line executable strings**, not descriptions.
+- Prefer commands that do **not** require an activated environment (absolute interpreter paths, `uv run`, `docker compose exec`).
+- Do not copy README text. List file pointers only.
+- Never include secrets, tokens, credentials, or `.env` content.
+- Even if logically a continuation, treat as round=1 if the peer has **no prior context**.
+- **Do NOT** prefill the peer with your own root-cause analysis, proposed fix, or leading questions — see the *Independent Judgment Rule* below. The handoff card is for context, not conclusions.
+
+### Anti-Hallucination Rule
+
+When mentioning specific paths, function names, commands, line numbers, or tool results, you **must cite the source** ("I read X at line Y"). If a detail is unverified, state "unverified" explicitly.
+
+### Independent Judgment Rule (round=1)
+
+The whole point of peer review is to surface a **second, independent** read. If
+the reviewer's own root-cause analysis, fix diff, and leading questions are sent
+to the peer up front, the peer can only AGREE inside the reviewer's frame — and
+that AGREE carries no signal. The reviewer **must complete their own analysis
+before opening round=1**; skipping that step turns peer review into a search for
+endorsement.
+
+Round=1 message **must NOT include**:
+
+- The reviewer's own root-cause analysis ("the bug is in function X at line Y because…").
+- The reviewer's own proposed fix, patch, or diff.
+- Leading questions of the form "do you agree with my conclusion on X?" / "is the change I made on Y safe?" — these lock the peer into the reviewer's framing.
+- Specific line numbers, function names, or branch points the reviewer has already identified as relevant — let the peer locate them.
+
+Round=1 message **should include**:
+
+- The Project Handoff Card (above).
+- Symptoms exactly as observed: the user's reported error, terminal output verbatim, the precise commands that triggered it.
+- Necessary external context: the goal of the work, the date / version under test, anything the peer cannot infer from the repo alone.
+- Key file pointers as **entry points** (paths only — let the peer choose what to read and how deep).
+- An open invitation: "independently identify the root cause, propose a fix, and call out any test gaps."
+
+After receiving the peer's round=1 reply, the reviewer **compares** their own
+conclusion to the peer's and routes the next action:
+
+| Reviewer's own conclusion vs. peer's conclusion | Next action |
+|---|---|
+| Same root cause + same fix direction | High confidence — AGREE and proceed to execution |
+| Same root cause, different fix direction | REFINE — open round=2 to reconcile the fix |
+| Different root cause | OBJECT — open round=2; at least one of the two analyses is wrong |
+| Peer asks for more context | REFINE — supply the missing context, then re-evaluate |
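The routing table above reduces to a small decision function. The sketch below is illustrative only: `next_action` and its boolean parameters are hypothetical, and deciding whether two analyses share a root cause is a judgement the reviewer makes before calling anything like this.

```python
def next_action(same_root_cause: bool, same_fix: bool,
                peer_needs_context: bool = False) -> str:
    """Map the reviewer-vs-peer comparison onto AGREE / REFINE / OBJECT."""
    if peer_needs_context:
        return "REFINE"   # supply the missing context, then re-evaluate
    if not same_root_cause:
        return "OBJECT"   # at least one of the two analyses is wrong
    if same_fix:
        return "AGREE"    # high confidence: proceed to execution
    return "REFINE"       # same diagnosis, reconcile the fix in round=2
```

The context check comes first because a peer that lacks information has not produced a conclusion to compare against yet.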
+
+#### Example (bad — endorsement-seeking)
+
+```
+Bug is in `cmd_init` at line 932 — the v2 demo renderer fires unconditionally.
+My fix: gate it behind `--demo`. Q1: is this overkill? Q2: should I
+refactor the renderer instead? Q3: are the tests strong enough?
+```
+
+The peer can only AGREE or quibble inside the reviewer's framing.
+
+#### Example (good — independent analysis)
+
+```
+Symptoms: user ran `roll init` on /path/X and saw [verbatim terminal output A];
+then ran `roll backlog` and saw [verbatim terminal output B]. Project background:
+[project shape]. Entry points: `bin/roll`, `lib/roll-init.py`, `tests/`.
+Independently identify the root cause and propose a fix.
+```
+
+The peer reads, locates, and proposes on its own. The reviewer then compares.
+
+## State Machine
+
+### Per Negotiation (Single Task)
+
+```
+Running
+  ├── AGREE (any round) → Execute proposal
+  ├── Round == 3, no AGREE → ESCALATE (failed_max_rounds)
+  ├── API/token error → ESCALATE (failed_api_error)
+  └── User aborts → ESCALATE (user_abort)
+```
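One way to read the per-negotiation state machine is as a pure function over the verdicts seen so far. The sketch below is an assumption-laden illustration: `resolve`, its return tuple, and the `"EXECUTE"` / `"RUNNING"` labels are invented here, while the outcome strings (`agreed`, `failed_max_rounds`, `failed_api_error`, `user_abort`) come from the skill text.

```python
MAX_ROUNDS = 3  # "Up to 3 rounds" in the skill description


def resolve(verdicts, api_error=False, user_abort=False):
    """Return (state, outcome) for the verdicts seen so far."""
    if user_abort:
        return ("ESCALATE", "user_abort")
    if api_error:
        return ("ESCALATE", "failed_api_error")
    # AGREE in any of the first MAX_ROUNDS rounds ends the negotiation.
    if "AGREE" in verdicts[:MAX_ROUNDS]:
        return ("EXECUTE", "agreed")
    if len(verdicts) >= MAX_ROUNDS:
        return ("ESCALATE", "failed_max_rounds")
    return ("RUNNING", None)  # REFINE / OBJECT: another round is allowed
```

Keeping this free of I/O makes each transition in the diagram easy to check one case at a time.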

### Per Peer Pair (e.g., kimi→claude)

Stored in `~/.roll/.peer-state/` (flat key files per pair):

```yaml
kimi→claude:
  status: active          # active | degraded | abandoned
  streak: 0               # consecutive failure count
  last_outcome: agreed
  history:
    - { time: "2026-05-08T23:30:00+08:00", outcome: agreed, rounds: 2, tag: architecture }
```

Rules:
- `streak >= 3` (the default `streak_threshold`) → automatically mark the pair as `abandoned`
- `abandoned` peer pairs are skipped by the bridge script
- A human can reset state via `roll peer reset <from> <to>` or `roll peer reset --all`
- If a peer pair is abandoned, the bridge falls back to the next candidate in the capability map
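
As an illustration of the streak rule, here is a hedged sketch of how one pair record could be updated after a negotiation. The function `record_outcome` is hypothetical. It also assumes that `agreed` resets the streak, that any `failed_*` outcome increments it, and that `user_abort` leaves it unchanged; the source does not spell out these details.

```python
def record_outcome(state: dict, outcome: str, rounds: int, tag: str,
                   time: str, streak_threshold: int = 3) -> dict:
    """Update one peer-pair record in place and return it."""
    state.setdefault("status", "active")
    state.setdefault("streak", 0)
    if outcome == "agreed":
        state["streak"] = 0                      # assumption: success resets the streak
    elif outcome.startswith("failed_"):
        state["streak"] += 1                     # consecutive failure count
    # Assumption: other outcomes (e.g. user_abort) do not touch the streak.
    state["last_outcome"] = outcome
    state.setdefault("history", []).append(
        {"time": time, "outcome": outcome, "rounds": rounds, "tag": tag}
    )
    if state["streak"] >= streak_threshold:
        state["status"] = "abandoned"            # skipped until a human resets it
    return state
```

The sketch never moves a pair back from `abandoned` to `active`, matching the rule that only a manual `roll peer reset` does so.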

## Peer Routing (Adaptive)

### Capability Map (Task Type → Preferred Peer Order)

```yaml
peer:
  capability_map:
    architecture: [claude, deepseek, kimi, pi]
    security:     [claude, deepseek, pi, kimi]
    test:         [codex, kimi, deepseek, claude]
    refactor:     [deepseek, kimi, claude, pi]
    default:      [deepseek, kimi, claude, pi]
```

### Adaptive Adjustment

After each negotiation, record:
- `outcome`: agreed / failed_max_rounds / failed_api_error
- `rounds`: number of rounds consumed
- `tag`: task type

If `streak` for a peer pair reaches the configured threshold (default: 3 consecutive failures), mark the pair as `abandoned`. The next task of the same type will try the next candidate in `capability_map`.
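
The fallback behaviour can be sketched as a walk over the preferred order for a tag. This is an illustrative sketch only: `pick_peer` and its parameters are hypothetical names, and representing abandoned pairs as a set of `(from, to)` tuples is an assumption about how the state would be loaded.

```python
from typing import Optional

CAPABILITY_MAP = {
    "architecture": ["claude", "deepseek", "kimi", "pi"],
    "security": ["claude", "deepseek", "pi", "kimi"],
    "test": ["codex", "kimi", "deepseek", "claude"],
    "refactor": ["deepseek", "kimi", "claude", "pi"],
    "default": ["deepseek", "kimi", "claude", "pi"],
}

def pick_peer(tag: str, current_tool: str, installed: set,
              abandoned: set, capability_map: dict = CAPABILITY_MAP) -> Optional[str]:
    """Return the first usable peer for `tag`, or None if no candidate is left."""
    order = capability_map.get(tag, capability_map["default"])
    for peer in order:
        if peer == current_tool:                 # exclude_self
            continue
        if peer not in installed:                # only installed tools count
            continue
        if (current_tool, peer) in abandoned:    # skip abandoned pairs
            continue
        return peer
    return None
```

Returning `None` leaves the caller to decide what happens when every candidate is excluded, for example escalating to the user.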

### Peer Detection

The bridge script detects installed peers via `command -v <tool>`. Only installed tools are considered. The current running tool is excluded (`exclude_self: true`).

For `deepseek`, also check if serve mode is available as a more reliable alternative:
```bash
command -v deepseek && { deepseek serve --help 2>/dev/null; true; } | grep -q "\-\-http" && echo "serve_mode"
```
If serve mode is available, prefer HTTP transport over direct CLI invocation.
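
For readers who prefer Python over shell, the same detection idea can be sketched with `shutil.which`, which looks up executables on `PATH` much as `command -v` does for external commands (it does not see shell builtins, functions, or aliases). The function `detect_peers` and its injectable `which` parameter are assumptions added here to keep the sketch testable.

```python
import shutil
from typing import Callable, Iterable, List, Optional

def detect_peers(candidates: Iterable[str], current_tool: str,
                 which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the candidates that are installed, excluding the running tool."""
    return [
        name for name in candidates
        if name != current_tool          # exclude_self: true
        and which(name) is not None      # only installed tools are considered
    ]
```

Order is preserved, so a capability-map list can be filtered directly without losing its preference ranking.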

### Peer Invocation Reference

| Peer | Non-interactive command | Reliability | Notes |
|------|------------------------|-------------|-------|
| `claude` | `claude -p "<prompt>"` | ✅ Verified | Native, stable |
| `deepseek` | `deepseek "<prompt>"` | ✅ Verified | No TTY dependency |
| `deepseek` (serve) | `curl localhost:<port>/v1/...` | ✅ High | Start with `deepseek serve --http`; preferred over direct CLI |
| `kimi` | `kimi --quiet -p "<prompt>"` | ✅ Verified | `--quiet` is alias for `--print --output-format text --final-message-only`; prompt via `-p` |
| `pi` | `pi -p "<prompt>"` | ✅ Verified | Clean non-interactive output |
| `opencode` | `opencode run "<prompt>"` | ✅ Verified | Works non-interactively |
| `codex` | `codex exec "<prompt>"` | ⚠️ Auth required | Token must be valid; re-login with `codex login` if expired |

**CLI vs. API Key**: `claude`, `deepseek`, `kimi`, `codex` CLIs authenticate via existing subscription accounts — no separate API key required. This is the primary advantage of CLI transport over the MCP/HTTP approach.
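
A hedged sketch of how the table could drive a synchronous call: the argument lists below are copied from the table's direct-CLI rows, while `PEER_ARGV`, `build_argv`, and `call_peer` are hypothetical names, and applying a 180-second timeout per call is an assumption based on the `call_timeout` setting.

```python
import subprocess
from typing import List

# Argument prefixes taken from the table above; the prompt is appended last.
PEER_ARGV = {
    "claude": ["claude", "-p"],
    "deepseek": ["deepseek"],
    "kimi": ["kimi", "--quiet", "-p"],
    "pi": ["pi", "-p"],
    "opencode": ["opencode", "run"],
    "codex": ["codex", "exec"],
}

def build_argv(peer: str, prompt: str) -> List[str]:
    """Build the non-interactive command line for one peer."""
    return PEER_ARGV[peer] + [prompt]

def call_peer(peer: str, prompt: str, call_timeout: int = 180) -> str:
    """Run the peer synchronously and return its stdout.

    Raises subprocess.TimeoutExpired on timeout and
    subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(build_argv(peer, prompt), capture_output=True,
                            text=True, timeout=call_timeout)
    result.check_returncode()
    return result.stdout
```

Passing the prompt as a single list element avoids shell quoting problems with multi-line prompts.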

## Inline Display Mode (Manual Triggers)

When peer review is manually triggered by a human (via `/peer`, "叫上 peer" ("bring in a peer"), etc.), the executing agent **must display each round inline in the current conversation**. This applies regardless of which agent is executing: Claude, DeepSeek, Kimi, Pi, or any other.

**Per-round display format:**

```
─── Peer Review · Round N ───────────────────────────────
→ Sending to [peer]:
{full message sent to peer}

← [peer] responds:
{peer's full response, verbatim}

◆ My analysis: {Claude/executing agent's reaction and position for this round}
─────────────────────────────────────────────────────────
```

**Rules:**
- Peer CLI calls must be **synchronous** (do NOT use background/async execution).
- The outgoing round=1 message must follow the *Independent Judgment Rule* above — no root-cause analysis, no fix diff, no leading questions.
- Show the outgoing message **before** calling the peer, so the user sees what's being asked.
- Relay the peer's response **verbatim** before adding your own analysis.
- After the peer's reply, the reviewer's own analysis block must explicitly state whether the peer's root cause and fix direction match the reviewer's own (independent) conclusion — that comparison is what determines the next round's action.
- If a peer call fails or times out, report it immediately inline and either retry or ESCALATE.
- The negotiation log is written to `<project>/.roll/peer/logs/` as usual.

**Why inline, not tmux:** When a human manually triggers peer review inside an agent's interactive session, the conversation IS the visible interface. tmux auto-attach is only relevant for CLI-launched background sessions (`bin/roll peer`), not for skill invocations.
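
The per-round format lends itself to two small rendering helpers, split so that the outgoing message can be shown before the peer is called. This is only a sketch: the helper names and the rule width of 57 characters are assumptions, not values taken from the tool.

```python
WIDTH = 57  # assumed width of the horizontal rules

def render_outgoing(round_no: int, peer: str, message: str) -> str:
    """Render the header and outgoing message; print this BEFORE calling the peer."""
    header = f"─── Peer Review · Round {round_no} "
    header += "─" * max(0, WIDTH - len(header))
    return f"{header}\n→ Sending to {peer}:\n{message}\n"

def render_reply(peer: str, reply: str, analysis: str) -> str:
    """Render the verbatim peer reply followed by the agent's own analysis."""
    return (f"← {peer} responds:\n{reply}\n\n"
            f"◆ My analysis: {analysis}\n" + "─" * WIDTH)
```

Keeping the two halves separate means a failed or timed-out call still leaves the outgoing message visible in the conversation.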

## Workflow Integration

### `roll-build` Plan Mode

After generating an executable plan, before proceeding to TCR:

1. Assess plan complexity (file count, cross-module impact, risk level)
2. If complexity > threshold, prompt user:
   ```
   This plan affects 5 files across 3 modules. Estimated peer review: 2–3 rounds, ~X tokens.
   Press Enter to launch peer review, or type 'n' to skip. Auto-executing in 10s...
   ```
3. If user does not abort within 10s, invoke `roll peer` with `--tag architecture`
4. Wait for result:
   - AGREE → proceed to TCR
   - REFINE/OBJECT → incorporate feedback and regenerate plan
   - ESCALATE → present both proposals to user for final decision
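
Step 4 amounts to a lookup from verdict to next action. A minimal sketch follows, in which the function name and the action labels (`proceed_to_tcr`, `regenerate_plan`, `ask_user`) are hypothetical placeholders for whatever `roll-build` actually does at each branch.

```python
def after_peer_review(verdict: str) -> str:
    """Map a peer-review result to the next roll-build action."""
    actions = {
        "AGREE": "proceed_to_tcr",
        "REFINE": "regenerate_plan",   # incorporate feedback first
        "OBJECT": "regenerate_plan",   # incorporate feedback first
        "ESCALATE": "ask_user",        # present both proposals
    }
    try:
        return actions[verdict]
    except KeyError:
        raise ValueError(f"unknown peer verdict: {verdict!r}") from None
```

Raising on an unknown verdict rather than defaulting to "proceed" keeps a malformed peer reply from silently approving a plan.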

### `roll-spar`

When Attacker and Defender reach a stalemate (both tests pass but interpretations differ):

1. Auto-invoke `roll peer` with `--tag test`
2. Use the peer's verdict as tie-breaker

## Output Artifacts

- **Negotiation log**: `<project>/.roll/peer/logs/<timestamp>_<from>_<to>.md`
- **Structured record**: `<project>/.roll/peer/runs.jsonl`
- **State file**: `~/.roll/.peer-state/`
- **Decision record**: If AGREE, append summary to `docs/decisions/` or `.roll/backlog.md` (optional)
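
As an example of the structured record, one JSON object per line could be appended to `runs.jsonl` as sketched below. The helper `append_run` is hypothetical, and the record fields are an assumption modelled on the per-pair `history` entries; the actual schema is not given in this document.

```python
import json
from pathlib import Path

def append_run(project_dir: str, record: dict) -> Path:
    """Append one negotiation record to <project>/.roll/peer/runs.jsonl."""
    path = Path(project_dir) / ".roll" / "peer" / "runs.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)   # create the directory on first use
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps names such as "kimi→claude" readable
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Append-only JSON Lines keeps each run independently parseable, so a partially written file still yields every complete earlier record.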

## Configuration

User overrides in `~/.roll/config.yaml`:

```yaml
peer:
  max_rounds: 3
  opt_out_seconds: 10
  call_timeout: 180          # seconds per round; configure based on your API latency
  fallback: file_mailbox     # direct_cli | file_mailbox | auto
  capability_map:
    architecture: [claude, deepseek, kimi, pi]
    security:     [claude, deepseek, pi, kimi]
    test:         [codex, kimi, deepseek, claude]
    refactor:     [deepseek, kimi, claude, pi]
    default:      [deepseek, kimi, claude, pi]
  adaptive:
    streak_threshold: 3
    min_samples: 3
```
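
One plausible way to apply user overrides is a recursive merge over already-parsed settings, so that a user only needs to list the keys they change. The sketch below assumes that merge semantics; the document does not state how `roll` combines defaults with `~/.roll/config.yaml`. It works on plain dictionaries to stay within the standard library, and `DEFAULTS` and `merge_config` are hypothetical names.

```python
import copy

DEFAULTS = {
    "peer": {
        "max_rounds": 3,
        "opt_out_seconds": 10,
        "call_timeout": 180,
        "fallback": "file_mailbox",
        "adaptive": {"streak_threshold": 3, "min_samples": 3},
    }
}

def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: nested mappings are merged, other values are replaced."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)   # lists such as peer orders are replaced whole
    return merged
```

Under this assumption, overriding only `peer.adaptive.streak_threshold` would leave `min_samples` and every other default untouched.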

## Limitations

1. **Reverse link reliability**: Direct CLI calls are preferred. Reliability varies by tool; see the Peer Invocation Reference table. If a peer fails consistently, the adaptive streak tracker marks it `abandoned` and falls back to the next candidate. File mailbox (`<project>/.roll/peer/mailbox/`) is the last-resort fallback.
   - `deepseek serve --http` is the most reliable option when available; prefer it over direct `deepseek` CLI invocation.
   - `codex exec` has known TTY/Ink issues in non-interactive environments; treat it as a low-priority fallback.
2. **Cost**: Every peer review consumes tokens on both sides. Only trigger for tasks where the cost of a wrong decision exceeds the cost of peer review. DeepSeek is the most cost-effective peer for general use.
3. **Context window**: Large project handoff cards may consume significant context. Keep file pointers concise.
4. **Tool differences**: Claude, DeepSeek, Kimi, Codex, and Pi interpret skills and AGENTS.md differently, so a peer may apply the protocol slightly differently. This is expected and acceptable: the protocol is designed to tolerate variation.