cs-scientist-plugin 0.1.0
- package/PROTOCOL.md +431 -0
- package/README.md +256 -0
- package/agents/cs-scientist-arbiter.md +124 -0
- package/agents/cs-scientist-consultant.md +111 -0
- package/agents/cs-scientist-critic.md +234 -0
- package/agents/cs-scientist-dev.md +439 -0
- package/agents/cs-scientist-research.md +426 -0
- package/agents/cs-scientist-teach.md +430 -0
- package/agents/cs-scientist.md +201 -0
- package/agents/planner.md +41 -0
- package/agents/writer.md +35 -0
- package/bin/install.js +109 -0
- package/index.js +3 -0
- package/package.json +40 -0
- package/skills/concept-explainer.md +78 -0
- package/skills/deep-research.md +98 -0
- package/skills/kb-validate.md +101 -0
- package/skills/lesson-plan.md +107 -0
- package/skills/negative-results.md +100 -0
- package/skills/notebooklm.md +95 -0
- package/skills/paper-outline.md +143 -0
- package/skills/parallel-research.md +85 -0
- package/skills/project-onboarding.md +118 -0
- package/skills/session-status.md +79 -0
- package/skills/writing-plans/SKILL.md +152 -0
- package/skills/writing-plans/plan-document-reviewer-prompt.md +49 -0
package/PROTOCOL.md
ADDED
|
@@ -0,0 +1,431 @@
|
|
|
1
|
+
# CS-Scientist Plugin — Communication Protocol v1.1
|
|
2
|
+
|
|
3
|
+
This document is the authoritative contract for all agents in the plugin.
|
|
4
|
+
Before writing or modifying any agent prompt, read this file.
|
|
5
|
+
If a behavior conflicts with this document, this document wins.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## Agent Registry
|
|
10
|
+
|
|
11
|
+
Each agent has exactly one responsibility. If it does more than one thing, it is incorrectly designed.
|
|
12
|
+
|
|
13
|
+
| Agent | Single Responsibility | Writes to disk | Reads from disk |
|
|
14
|
+
|-------|-----------------------|---------------|-----------------|
|
|
15
|
+
| `cs-scientist` | Routing + session init | `session_state.json`, `goals.md` (init only) | `session_state.json` |
|
|
16
|
+
| `cs-scientist-research` | Research loop (phases 1–10) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB |
|
|
17
|
+
| `cs-scientist-dev` | Dev loop (phases 1–7) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB |
|
|
18
|
+
| `cs-scientist-teach` | Teaching loop (phases 1–7) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB, `lesson.md` | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB, sources |
|
|
19
|
+
| `cs-scientist-critic` | Gate validation | Nothing | Nothing |
|
|
20
|
+
| `cs-scientist-consultant` | Domain diagnosis | Nothing | Nothing |
|
|
21
|
+
| `cs-scientist-arbiter` | Council synthesis | Nothing | Nothing |
|
|
22
|
+
|
|
23
|
+
Sub-agents (critic, consultant, arbiter) **never touch disk**.
|
|
24
|
+
They receive structured input and return structured output. Period.
|
|
25
|
+
Sub-agent outcomes are logged by the mode agent that called them, not by the sub-agent itself.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Session Files
|
|
30
|
+
|
|
31
|
+
Every session lives in `.cs-scientist/{session_id}/` relative to the project root.
|
|
32
|
+
Every session contains these core files; mode agents may add mode-specific files such as `lesson.md` (teach):
|
|
33
|
+
|
|
34
|
+
| File | Purpose | Append-only |
|
|
35
|
+
|------|---------|-------------|
|
|
36
|
+
| `session_state.json` | State machine — where the session is right now | No (overwritten) |
|
|
37
|
+
| `goals.md` | Goal tracker — what the session is trying to achieve | No (updated) |
|
|
38
|
+
| `activity_log.jsonl` | Action history — what each agent did per turn | Yes |
|
|
39
|
+
| `knowledge_base.md` | Verified facts accumulator | No (updated) |
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## session_state.json
|
|
44
|
+
|
|
45
|
+
The single file that defines where the session is.
|
|
46
|
+
Everything else (KB, logs) is content. This is the state machine.
|
|
47
|
+
|
|
48
|
+
```json
|
|
49
|
+
{
|
|
50
|
+
"schema_version": "1.2",
|
|
51
|
+
"session_id": "topic-slug_mode_YYYYMMDD",
|
|
52
|
+
"mode": "research | dev | teach | null",
|
|
53
|
+
"topic": "string",
|
|
54
|
+
"verifier": "string — external truth criterion",
|
|
55
|
+
|
|
56
|
+
"phase": "SCOPE | DECOMPOSE | RETRIEVE | TRIANGULATE | PROPOSE | EXPERIMENT | ANALYZE | SYNTHESIZE | CRITIQUE | DOCUMENT | DESIGN | PLAN | IMPLEMENT | VERIFY | ITERATE | INTAKE | MAP | SCAFFOLD | EXPLAIN | ITERATE_TEACH",
|
|
57
|
+
"phase_status": "active | blocked | completed",
|
|
58
|
+
|
|
59
|
+
"gates": {
|
|
60
|
+
"GATE_1": "pending | pass | fail",
|
|
61
|
+
"GATE_2": "pending | pass | fail",
|
|
62
|
+
"GATE_3": "pending | pass | fail",
|
|
63
|
+
"GATE_1_DEV": "pending | pass | fail",
|
|
64
|
+
"GATE_2_DEV": "pending | pass | fail",
|
|
65
|
+
"GATE_1_TEACH": "pending | pass | fail",
|
|
66
|
+
"GATE_2_TEACH": "pending | pass | fail",
|
|
67
|
+
"GATE_3_TEACH": "pending | pass | fail"
|
|
68
|
+
},
|
|
69
|
+
|
|
70
|
+
"active_artifact_ref": "path or inline ID of artifact being worked on",
|
|
71
|
+
"next_action": "concrete action, no ambiguity, for the next turn",
|
|
72
|
+
"blocked_reason": "string | null",
|
|
73
|
+
|
|
74
|
+
"iteration_count": 0,
|
|
75
|
+
"last_updated": "ISO8601",
|
|
76
|
+
"last_agent": "id of the agent that wrote this",
|
|
77
|
+
"last_agent_summary": "≤1 sentence of what that agent did"
|
|
78
|
+
}
|
|
79
|
+
```
|
|
80
|
+
|
|
81
|
+
### Write rules
|
|
82
|
+
|
|
83
|
+
- Orchestrator writes only on init (creates the file)
|
|
84
|
+
- Mode agents write after every phase or gate transition — not before, not mid-phase
|
|
85
|
+
- Sub-agents never write this file
|
|
86
|
+
- If `next_action` is empty or ambiguous, the file is malformed
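The write rules above can be partly checked by machine. The following is a minimal sketch, assuming a hypothetical helper named `validate_state` that is not part of the plugin. It only tests the enumerated values from the schema and whether `next_action` is empty; judging whether `next_action` is ambiguous is left to a human or a critic.

```python
import json

# Allowed values copied from the schema above; the helper name is hypothetical.
PHASE_STATUSES = {"active", "blocked", "completed"}
GATE_STATES = {"pending", "pass", "fail"}


def validate_state(raw: str) -> list[str]:
    """Return a list of problems found in a session_state.json document."""
    state = json.loads(raw)
    problems = []
    if state.get("phase_status") not in PHASE_STATUSES:
        problems.append("phase_status must be active, blocked, or completed")
    for gate, value in state.get("gates", {}).items():
        if value not in GATE_STATES:
            problems.append(f"{gate} has invalid state {value!r}")
    # An empty next_action makes the file malformed; ambiguity is not checked here.
    if not str(state.get("next_action") or "").strip():
        problems.append("next_action is empty")
    return problems
```

An empty list means the file passed these checks, not that it is fully valid against the protocol.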
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## goals.md
|
|
91
|
+
|
|
92
|
+
Tracks what the session is trying to achieve at every level.
|
|
93
|
+
Written by the orchestrator on init, updated by mode agents as work progresses.
|
|
94
|
+
|
|
95
|
+
### Schema
|
|
96
|
+
|
|
97
|
+
```markdown
|
|
98
|
+
# Session Goals
|
|
99
|
+
|
|
100
|
+
## Primary Goal
|
|
101
|
+
{set at init, never modified — the overarching objective of this session}
|
|
102
|
+
|
|
103
|
+
## Active
|
|
104
|
+
- [ ] HIGH | {goal description} | phase: {phase_name}
|
|
105
|
+
- [ ] MEDIUM | {goal description} | phase: {phase_name}
|
|
106
|
+
- [ ] LOW | {goal description} | phase: {phase_name}
|
|
107
|
+
|
|
108
|
+
## Completed
|
|
109
|
+
- [x] {goal description} | closed_at: {phase_name} | result: {one sentence}
|
|
110
|
+
|
|
111
|
+
## Blocked
|
|
112
|
+
- [!] {goal description} | blocked_by: {reason} | unblocks_when: {condition}
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
### Write rules
|
|
116
|
+
|
|
117
|
+
- Orchestrator writes the Primary Goal and initial Active goals at init
|
|
118
|
+
- Orchestrator may also close or add goals if the user requests it mid-session
|
|
119
|
+
- Mode agents add goals when a phase reveals new objectives
|
|
120
|
+
- Mode agents move goals from Active → Completed after phase exit with a verified result
|
|
121
|
+
- Mode agents move goals from Active → Blocked when a gate returns FAIL and the cause is unresolved
|
|
122
|
+
- Priority values are fixed: `HIGH`, `MEDIUM`, `LOW` — no other values
|
|
123
|
+
- Primary Goal section is never modified after init
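The Active to Completed transition can be illustrated with a short sketch. It assumes the line formats shown in the schema above; the function name `complete_goal` and the regular expression are hypothetical and only cover the Active form.

```python
import re

# Matches an Active goal line as laid out in the schema; priorities are fixed.
ACTIVE_LINE = re.compile(
    r"^- \[ \] (?P<priority>HIGH|MEDIUM|LOW) \| (?P<goal>.+?) \| phase: (?P<phase>\w+)$"
)


def complete_goal(line: str, closed_at: str, result: str) -> str:
    """Rewrite an Active line into the Completed form defined by the schema."""
    match = ACTIVE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a valid Active goal line")
    return f"- [x] {match['goal']} | closed_at: {closed_at} | result: {result}"
```

A line with a priority other than `HIGH`, `MEDIUM`, or `LOW` is rejected, which mirrors the fixed-priority rule.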
|
|
124
|
+
|
|
125
|
+
---
|
|
126
|
+
|
|
127
|
+
## activity_log.jsonl
|
|
128
|
+
|
|
129
|
+
Append-only log of every significant action. One JSON object per line.
|
|
130
|
+
Mode agents read the last 5 entries at the start of every turn to reconstruct what happened in the previous context window.
|
|
131
|
+
|
|
132
|
+
### Schema (one entry per line)
|
|
133
|
+
|
|
134
|
+
```json
|
|
135
|
+
{
|
|
136
|
+
"ts": "ISO8601",
|
|
137
|
+
"agent": "cs-scientist | cs-scientist-research | cs-scientist-dev | cs-scientist-teach",
|
|
138
|
+
"phase": "SCOPE | RETRIEVE | ...",
|
|
139
|
+
"action_type": "phase_enter | phase_complete | gate_dispatch | gate_return | subagent_dispatch | subagent_return | kb_update | goal_update | session_init | session_resume",
|
|
140
|
+
"summary": "≤1 sentence — what happened",
|
|
141
|
+
"result": "outcome or null",
|
|
142
|
+
"iteration": 0
|
|
143
|
+
}
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
### Write rules
|
|
147
|
+
|
|
148
|
+
- Written after every significant action — not every thought, every action
|
|
149
|
+
- Significant actions: phase enter, phase complete, gate dispatch, gate return, sub-agent dispatch, sub-agent return, KB update, goal state change
|
|
150
|
+
- Sub-agent outcomes are logged by the mode agent that called them, with `action_type: "subagent_return"`
|
|
151
|
+
- Never rewrite or delete entries — append only
|
|
152
|
+
- `iteration` increments on each full Verified Loop cycle, not on each action
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## Dispatch and Return Contracts
|
|
157
|
+
|
|
158
|
+
All inter-agent communication uses this envelope:
|
|
159
|
+
|
|
160
|
+
```
|
|
161
|
+
[DISPATCH → {agent_id}]
|
|
162
|
+
---
|
|
163
|
+
{payload structured per agent spec below}
|
|
164
|
+
---
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
[RETURN → {agent_id}]
|
|
169
|
+
---
|
|
170
|
+
{structured response payload}
|
|
171
|
+
---
|
|
172
|
+
```
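The envelope can be built and parsed with a few lines of code. This is a sketch under the assumption that the header and the two `---` delimiters each sit on their own line, as shown above; `wrap`, `unwrap`, and the regular expression are hypothetical names, not part of the plugin.

```python
import re

# Header line, then a payload fenced by '---' lines. The payload may itself
# contain '---' lines; only the final one closes the envelope.
ENVELOPE = re.compile(
    r"\[(DISPATCH|RETURN) → (?P<agent>[\w-]+)\]\n---\n(?P<payload>.*?)\n---\s*\Z",
    re.DOTALL,
)


def wrap(kind: str, agent_id: str, payload: str) -> str:
    """Build a DISPATCH or RETURN envelope around an already structured payload."""
    if kind not in ("DISPATCH", "RETURN"):
        raise ValueError(f"unknown envelope kind: {kind}")
    return f"[{kind} → {agent_id}]\n---\n{payload}\n---\n"


def unwrap(message: str) -> tuple[str, str, str]:
    """Return (kind, agent_id, payload) or raise ValueError on a malformed envelope."""
    match = ENVELOPE.fullmatch(message)
    if match is None:
        raise ValueError("malformed envelope")
    return match.group(1), match.group("agent"), match.group("payload")
```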
|
|
173
|
+
|
|
174
|
+
### cs-scientist-critic
|
|
175
|
+
|
|
176
|
+
**Dispatch payload:**
|
|
177
|
+
```
|
|
178
|
+
GATE: GATE_1 | GATE_2 | GATE_3 | GATE_1_DEV | GATE_2_DEV | GATE_1_TEACH | GATE_2_TEACH | GATE_3_TEACH | CRITIQUE_LIBRE
|
|
179
|
+
ARTIFACT:
|
|
180
|
+
{artifact verbatim}
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
**Return payload:**
|
|
184
|
+
```
|
|
185
|
+
VERDICT: PASS | FAIL | HUMAN_REQUIRED
|
|
186
|
+
GATE: {gate_id}
|
|
187
|
+
FAILURES:
|
|
188
|
+
- {failure 1}
|
|
189
|
+
- {failure 2}
|
|
190
|
+
HUMAN_QUESTIONS:
|
|
191
|
+
- {question if HUMAN_REQUIRED}
|
|
192
|
+
PASS_NOTES:
|
|
193
|
+
- {caveats if PASS}
|
|
194
|
+
```
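A mode agent that receives this return payload needs the verdict and the three lists in a usable form. The sketch below assumes each list item starts with `- ` and each key sits at the start of its line; `parse_critic_return` is a hypothetical name.

```python
def parse_critic_return(payload: str) -> dict:
    """Parse the critic return payload into a verdict, gate id, and three lists."""
    parsed = {"VERDICT": None, "GATE": None,
              "FAILURES": [], "HUMAN_QUESTIONS": [], "PASS_NOTES": []}
    current = None  # name of the list section currently being filled
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("- ") and current:
            parsed[current].append(line[2:])
            continue
        key, sep, value = line.partition(":")
        if sep and key in ("VERDICT", "GATE"):
            parsed[key] = value.strip()
            current = None
        elif sep and key in ("FAILURES", "HUMAN_QUESTIONS", "PASS_NOTES"):
            current = key
    if parsed["VERDICT"] not in ("PASS", "FAIL", "HUMAN_REQUIRED"):
        raise ValueError(f"unknown verdict: {parsed['VERDICT']!r}")
    return parsed
```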
|
|
195
|
+
|
|
196
|
+
### cs-scientist-consultant
|
|
197
|
+
|
|
198
|
+
**Dispatch payload:**
|
|
199
|
+
```
|
|
200
|
+
DOMAIN: {one sentence}
|
|
201
|
+
GATE_DIAGNOSIS: {FAILURES verbatim from Critic return}
|
|
202
|
+
FAILED_ARTIFACT:
|
|
203
|
+
{artifact verbatim}
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
**Return payload:**
|
|
207
|
+
```
|
|
208
|
+
ROOT_CAUSE: {specific, not generic}
|
|
209
|
+
CORRECTION: {concrete change — which line/section and how}
|
|
210
|
+
WHY_APPROACH_FAILED: {domain-specific reason}
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
### cs-scientist-arbiter
|
|
214
|
+
|
|
215
|
+
**Dispatch payload:**
|
|
216
|
+
```
|
|
217
|
+
BRIEF: {~800 tokens of shared context}
|
|
218
|
+
DEFENDER_A: {structured output from defender agent}
|
|
219
|
+
DEFENDER_B: {structured output from defender agent}
|
|
220
|
+
DEFENDER_C: {structured output from defender agent}
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
**Return payload:**
|
|
224
|
+
```
|
|
225
|
+
SYNTHESIS:
|
|
226
|
+
- In {situation A}: → Option {X} | {reason ≤1 line}
|
|
227
|
+
- In {situation B}: → Option {Y} | {reason ≤1 line}
|
|
228
|
+
NOT_RECOMMENDED_IF:
|
|
229
|
+
- Option A: {disqualifying condition}
|
|
230
|
+
- Option B: {disqualifying condition}
|
|
231
|
+
- Option C: {disqualifying condition}
|
|
232
|
+
FOR_CURRENT_CONTEXT:
|
|
233
|
+
→ {recommended option} | {justification ≤3 lines}
|
|
234
|
+
```
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
|
|
238
|
+
## Iron Rule — applied in every agent prompt that touches disk
|
|
239
|
+
|
|
240
|
+
```
|
|
241
|
+
FIRST action every turn:
|
|
242
|
+
1. Read session_state.json
|
|
243
|
+
2. Read goals.md
|
|
244
|
+
3. Read last 5 entries of activity_log.jsonl
|
|
245
|
+
|
|
246
|
+
AFTER every significant action:
|
|
247
|
+
4. Append one entry to activity_log.jsonl
|
|
248
|
+
|
|
249
|
+
AFTER any phase or gate transition:
|
|
250
|
+
5. Update session_state.json
|
|
251
|
+
6. Update goals.md if goal state changed
|
|
252
|
+
|
|
253
|
+
If session_state.json does not exist: stop and notify the user — do not improvise state.
|
|
254
|
+
If activity_log.jsonl does not exist: create it empty, then proceed.
|
|
255
|
+
If goals.md does not exist: stop and notify the user — do not improvise goals.
|
|
256
|
+
```
|
|
257
|
+
|
|
258
|
+
This is not a guideline. It appears as a NEVER block in every mode agent.
|
|
259
|
+
|
|
260
|
+
---
|
|
261
|
+
|
|
262
|
+
## Isolation Rule
|
|
263
|
+
|
|
264
|
+
The value of sub-agents comes from having zero session context.
|
|
265
|
+
When dispatching, pass only the structured artifact — never session history, never "for context."
|
|
266
|
+
A critic with session context cannot evaluate adversarially.
|
|
267
|
+
|
|
268
|
+
---
|
|
269
|
+
|
|
270
|
+
## Gate Failure Routing
|
|
271
|
+
|
|
272
|
+
Before acting on a gate failure, classify the cause:
|
|
273
|
+
|
|
274
|
+
```
|
|
275
|
+
Does the failure mention methodological terms ("falsifiable", "circular", "verifier")?
|
|
276
|
+
→ METHODOLOGICAL → mode agent corrects directly (max 2 attempts, then HUMAN_REQUIRED)
|
|
277
|
+
|
|
278
|
+
Does the failure mention datasets, algorithms, libraries, domain terminology?
|
|
279
|
+
→ DOMAIN → dispatch cs-scientist-consultant before retrying
|
|
280
|
+
```
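The classification above can be approximated with a keyword check. This is a deliberately crude sketch: the term list is limited to the three examples quoted above, `route_failure` is a hypothetical name, and treating every non-methodological failure as a domain failure is an assumption rather than something the protocol states.

```python
METHODOLOGICAL_TERMS = ("falsifiable", "circular", "verifier")
MAX_DIRECT_ATTEMPTS = 2


def route_failure(failures: list[str], attempts_so_far: int) -> str:
    """Return the next step for a FAIL verdict: self-correct, consult, or escalate."""
    text = " ".join(failures).lower()
    if any(term in text for term in METHODOLOGICAL_TERMS):
        # Direct correction is capped; after two attempts a human decides.
        if attempts_so_far >= MAX_DIRECT_ATTEMPTS:
            return "HUMAN_REQUIRED"
        return "METHODOLOGICAL"
    # Assumption: anything without a methodological term is a domain failure.
    return "DOMAIN"
```

A return value of `DOMAIN` means the mode agent dispatches `cs-scientist-consultant` before retrying.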
|
|
281
|
+
|
|
282
|
+
---
|
|
283
|
+
|
|
284
|
+
## Project Health Check
|
|
285
|
+
|
|
286
|
+
The orchestrator runs this check **before** asking the user for mode (Research, Dev, or Teach).
|
|
287
|
+
Never skip it. Never run it after the session has started.
|
|
288
|
+
The goal is to ensure the project has the files it needs to be maintainable by agents and humans.
|
|
289
|
+
|
|
290
|
+
### Universal files — any project
|
|
291
|
+
|
|
292
|
+
| File | Purpose | Action if missing |
|
|
293
|
+
|------|---------|-------------------|
|
|
294
|
+
| `AGENTS.md` or `CLAUDE.md` | AI agent instructions for this project | Run AGENTS.md questionnaire (see below) |
|
|
295
|
+
| `README.md` | What it is, how to run, how to test | Notify user — do not create without their input |
|
|
296
|
+
| `.gitignore` | Exclude files from version control | Create base version for detected project type |
|
|
297
|
+
| `CHANGELOG.md` | Version history | Create empty with standard header |
|
|
298
|
+
| `docs/adr/` | Architecture Decision Records | Propose creating first ADR if architectural decisions are made during session |
|
|
299
|
+
|
|
300
|
+
### Project-type files
|
|
301
|
+
|
|
302
|
+
Detect project type from existing files, then check for:
|
|
303
|
+
|
|
304
|
+
| Type | Required files |
|
|
305
|
+
|------|---------------|
|
|
306
|
+
| Node / Web | `package.json`, `.env.example` |
|
|
307
|
+
| Python | `pyproject.toml` or `requirements.txt`, `.python-version` |
|
|
308
|
+
| Container | `Dockerfile`, `docker-compose.yml` |
|
|
309
|
+
| CI/CD active | `.github/workflows/` or equivalent |
|
|
310
|
+
| ML / Data | `data/README.md`, `MODEL_CARD.md` |
|
|
311
|
+
|
|
312
|
+
### Rules
|
|
313
|
+
|
|
314
|
+
- Missing universal files → notify user with a checklist before proceeding
|
|
315
|
+
- Missing project-type files → notify, offer to create, do not block the session
|
|
316
|
+
- `README.md` is never auto-created — it requires human context
|
|
317
|
+
- `AGENTS.md` triggers the questionnaire below — it is the most important file for agents
|
|
318
|
+
- ADRs are proposed, never created silently
|
|
319
|
+
|
|
320
|
+
---
|
|
321
|
+
|
|
322
|
+
## AGENTS.md Questionnaire
|
|
323
|
+
|
|
324
|
+
Run when `AGENTS.md` (or `CLAUDE.md`) does not exist.
|
|
325
|
+
Ask exactly these 8 questions — no more, no less.
|
|
326
|
+
Every line in the resulting file must earn its place. Verbose autogenerated files hurt agent performance.
|
|
327
|
+
|
|
328
|
+
```
|
|
329
|
+
1. What type of project is this?
|
|
330
|
+
(web app / CLI / library / data pipeline / ML / embedded system / other)
|
|
331
|
+
|
|
332
|
+
2. What language(s) and main frameworks?
|
|
333
|
+
|
|
334
|
+
3. What architecture pattern does it follow?
|
|
335
|
+
(monolith / microservices / MVC / event-driven / CQRS / no formal architecture)
|
|
336
|
+
|
|
337
|
+
4. What are the exact commands to: run locally, run tests, deploy?
|
|
338
|
+
|
|
339
|
+
5. What is the "done" criterion for a task?
|
|
340
|
+
(tests pass / linter clean / review approved / specific metrics)
|
|
341
|
+
|
|
342
|
+
6. What should an AI agent NEVER do in this project?
|
|
343
|
+
(touch prod / modify migrations / push without review / other)
|
|
344
|
+
|
|
345
|
+
7. Are there non-obvious architectural decisions an agent must know about?
|
|
346
|
+
(something that looks wrong but has a reason)
|
|
347
|
+
|
|
348
|
+
8. Are there hard constraints?
|
|
349
|
+
(performance targets, security requirements, compliance, backwards compatibility)
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
From the answers, generate a minimal `AGENTS.md` with these sections only:
|
|
353
|
+
|
|
354
|
+
```markdown
|
|
355
|
+
# AGENTS.md
|
|
356
|
+
|
|
357
|
+
## Project
|
|
358
|
+
{type} — {language/framework} — {architecture pattern}
|
|
359
|
+
|
|
360
|
+
## Run
|
|
361
|
+
{exact commands}
|
|
362
|
+
|
|
363
|
+
## Done means
|
|
364
|
+
{done criterion}
|
|
365
|
+
|
|
366
|
+
## Never
|
|
367
|
+
- {forbidden action 1}
|
|
368
|
+
- {forbidden action 2}
|
|
369
|
+
|
|
370
|
+
## Non-obvious decisions
|
|
371
|
+
- {decision}: {reason}
|
|
372
|
+
|
|
373
|
+
## Hard constraints
|
|
374
|
+
- {constraint}
|
|
375
|
+
```
|
|
376
|
+
|
|
377
|
+
Omit any section for which the user had no answer. Do not pad with generic advice.
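The omit-if-unanswered rule is easy to get wrong when templating, so here is a minimal sketch of it. The function `render_agents_md` and the idea of keying answers by section title are assumptions made for illustration; the section titles themselves come from the template above.

```python
SECTIONS = ["Project", "Run", "Done means", "Never",
            "Non-obvious decisions", "Hard constraints"]


def render_agents_md(answers: dict[str, str]) -> str:
    """Render AGENTS.md, skipping every section the user left unanswered."""
    parts = ["# AGENTS.md"]
    for title in SECTIONS:
        body = answers.get(title, "").strip()
        if body:  # blank or missing answers produce no heading at all
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts) + "\n"
```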
|
|
378
|
+
|
|
379
|
+
---
|
|
380
|
+
|
|
381
|
+
## Verifier Hierarchy
|
|
382
|
+
|
|
383
|
+
Inspired by DeepMind's AlphaProof: the most rigorous verification is the most formal available.
|
|
384
|
+
When defining the external truth criterion (SCOPE phase, GATE_1), choose the highest applicable level.
|
|
385
|
+
|
|
386
|
+
```
|
|
387
|
+
Level 1 — Formal (strongest)
|
|
388
|
+
Compiler, type checker, proof assistant (Lean, Coq), constraint solver.
|
|
389
|
+
If it compiles and type-checks, the property holds by construction.
|
|
390
|
+
Use when: type safety, protocol conformance, mathematical theorems.
|
|
391
|
+
|
|
392
|
+
Level 2 — Automated test
|
|
393
|
+
Unit test, integration test, property-based test (Hypothesis, QuickCheck).
|
|
394
|
+
Executable and repeatable. Fails predictably.
|
|
395
|
+
Use when: functional correctness, contract compliance, regression prevention.
|
|
396
|
+
|
|
397
|
+
Level 3 — Empirical measurement
|
|
398
|
+
Benchmark, statistical test, A/B comparison, reproducible experiment.
|
|
399
|
+
Results must be reproducible with the same seed/dataset.
|
|
400
|
+
Use when: performance, quality metrics, model evaluation.
|
|
401
|
+
|
|
402
|
+
Level 4 — Expert review
|
|
403
|
+
A qualified human evaluates against explicit criteria.
|
|
404
|
+
The criteria must be written down before the review — not derived from the output.
|
|
405
|
+
Use when: no automated verifier exists for the property being checked.
|
|
406
|
+
|
|
407
|
+
Level 5 — Self-assessment (weakest — avoid)
|
|
408
|
+
The model evaluates its own output.
|
|
409
|
+
Only acceptable as a preliminary filter before a higher-level verifier.
|
|
410
|
+
Never as the final gate.
|
|
411
|
+
```
|
|
412
|
+
|
|
413
|
+
### Rules
|
|
414
|
+
|
|
415
|
+
- The mode agent declares the verifier level in `session_state.json` during SCOPE
|
|
416
|
+
- Never accept a Level 5 verifier as a gate — it is a Verified Loop violation
|
|
417
|
+
- If no verifier above Level 4 exists for a claim, the claim stays `[HYPOTHESIS]`
|
|
418
|
+
- When multiple verifiers apply, use the highest level — do not settle for a lower one
|
|
419
|
+
because it is more convenient
|
|
420
|
+
- Error messages from Level 1/2 verifiers feed back into the next attempt verbatim —
|
|
421
|
+
not summarized, not interpreted. The raw error is the structured input.
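Choosing among available verifiers can be expressed as picking the lowest-numbered level. The sketch below uses hypothetical names (`VerifierLevel`, `choose_verifier`) and reads the `[HYPOTHESIS]` rule as "nothing stronger than Level 5 is available", which is one interpretation of the wording above.

```python
from enum import IntEnum


class VerifierLevel(IntEnum):
    FORMAL = 1
    AUTOMATED_TEST = 2
    EMPIRICAL = 3
    EXPERT_REVIEW = 4
    SELF_ASSESSMENT = 5


def choose_verifier(available: set[VerifierLevel]) -> VerifierLevel:
    """Pick the strongest (lowest-numbered) level; refuse self-assessment as a gate."""
    usable = available - {VerifierLevel.SELF_ASSESSMENT}
    if not usable:
        # Interpretation: with only self-assessment left, the claim stays [HYPOTHESIS].
        raise ValueError("no acceptable verifier: the claim stays [HYPOTHESIS]")
    return min(usable)
```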
|
|
422
|
+
|
|
423
|
+
---
|
|
424
|
+
|
|
425
|
+
## Schema Version
|
|
426
|
+
|
|
427
|
+
When this protocol changes in a breaking way, increment `schema_version` in `session_state.json`.
|
|
428
|
+
Mode agents must reject state files with a schema version they do not recognize.
|
|
429
|
+
|
|
430
|
+
Current version: `1.2`
|
|
431
|
+
Changes from `1.1`: Added Project Health Check and AGENTS.md Questionnaire sections.
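The rejection rule can be sketched as a guard applied when the state file is loaded. The name `load_state` is hypothetical, and accepting only the current version `1.2` is an assumption; an agent could equally keep a longer list of versions it knows how to read.

```python
import json

KNOWN_SCHEMA_VERSIONS = {"1.2"}  # assumption: only the current version is accepted


def load_state(raw: str) -> dict:
    """Parse session_state.json and refuse versions this agent does not recognize."""
    state = json.loads(raw)
    version = state.get("schema_version")
    if version not in KNOWN_SCHEMA_VERSIONS:
        raise ValueError(f"unrecognized schema_version: {version!r}")
    return state
```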
|
package/README.md
ADDED
|
@@ -0,0 +1,256 @@
|
|
|
1
|
+
# cs-scientist-plugin
|
|
2
|
+
|
|
3
|
+
A multi-agent system for rigorous research, development, and teaching — built for [opencode](https://opencode.ai) and [Claude Code](https://claude.ai/code).
|
|
4
|
+
|
|
5
|
+
The core principle is borrowed from DeepMind's most reliable systems (AlphaFold, AlphaProof, FunSearch):
|
|
6
|
+
|
|
7
|
+
> **The model proposes. An external verifier decides.**
|
|
8
|
+
|
|
9
|
+
No self-assessed output advances to the next phase. Every gate is evaluated by a fresh agent with zero session context.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## What it does
|
|
14
|
+
|
|
15
|
+
Three operating modes, each a structured loop with adversarial gates:
|
|
16
|
+
|
|
17
|
+
| Mode | Purpose | External verifier |
|
|
18
|
+
|------|---------|------------------|
|
|
19
|
+
| **RESEARCH** | Investigate a topic: hypothesis, sources, triangulation, report | Reproducible experiment or ≥3 independent sources |
|
|
20
|
+
| **DEV** | Build something with correctness guarantees: TDD, verified design, traced decisions | Compiler / type checker / tests (formality hierarchy enforced) |
|
|
21
|
+
| **TEACH** | Learn or teach from provided source materials: progressive explanation, tiered exercises | Student can solve Tier 3 exercises that recall alone cannot answer |
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
25
|
+
## Agents
|
|
26
|
+
|
|
27
|
+
| Agent | Role |
|
|
28
|
+
|-------|------|
|
|
29
|
+
| `cs-scientist` | Orchestrator — routes to modes, runs project health check, initializes session |
|
|
30
|
+
| `cs-scientist-research` | Research loop — 10 phases: SCOPE → DECOMPOSE → RETRIEVE → TRIANGULATE → PROPOSE → EXPERIMENT → ANALYZE → SYNTHESIZE → CRITIQUE → DOCUMENT |
|
|
31
|
+
| `cs-scientist-dev` | Dev loop — 7 phases: SCOPE → DESIGN → PLAN → IMPLEMENT → VERIFY → ITERATE → DOCUMENT |
|
|
32
|
+
| `cs-scientist-teach` | Teaching loop — 7 phases: INTAKE → MAP → SCAFFOLD → EXPLAIN → VERIFY → ITERATE → DOCUMENT |
|
|
33
|
+
| `cs-scientist-critic` | Adversarial gate validator — zero session context, structured PASS/FAIL/HUMAN_REQUIRED |
|
|
34
|
+
| `cs-scientist-consultant` | Domain expert for gate failures caused by missing domain knowledge |
|
|
35
|
+
| `cs-scientist-arbiter` | Council of State synthesis — evaluates 3+ competing options situationally |
|
|
36
|
+
|
|
37
|
+
Sub-agents (critic, consultant, arbiter) **never touch disk**. They receive a structured artifact and return a structured verdict.
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Skills
|
|
42
|
+
|
|
43
|
+
| Skill | When to use |
|
|
44
|
+
|-------|-------------|
|
|
45
|
+
| `deep-research` | Directed research on a single question with KB-compatible output |
|
|
46
|
+
| `parallel-research` | Independent multi-angle searches with no cross-contamination |
|
|
47
|
+
| `kb-validate` | Validate KB integrity before a gate: tag consistency, source completeness, circular refs |
|
|
48
|
+
| `session-status` | Human-readable session state + ready-to-paste resume block after context compaction |
|
|
49
|
+
| `negative-results` | Document what didn't work and why: gate failures, refuted hypotheses, discarded approaches |
|
|
50
|
+
| `notebooklm` | Convert a completed research report to podcast script, FAQ, or executive briefing |
|
|
51
|
+
| `writing-plans` | Ultra-detailed TDD implementation plans — actual code in every step, no placeholders |
|
|
52
|
+
| `project-onboarding` | Generate a Day 1 guide for a new team member from the current repo state |
|
|
53
|
+
| `concept-explainer` | Explain a concept at 3 levels (accessible / practitioner / researcher) without starting a session |
|
|
54
|
+
| `paper-outline` | Map a research session KB to an academic paper skeleton |
|
|
55
|
+
| `lesson-plan` | Generate a structured lesson plan for class preparation |
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Install
|
|
60
|
+
|
|
61
|
+
```bash
|
|
62
|
+
npm install -g cs-scientist-plugin
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
The postinstall script detects which tools you have and asks before copying anything.
|
|
66
|
+
|
|
67
|
+
**Manual install:**
|
|
68
|
+
|
|
69
|
+
```bash
|
|
70
|
+
git clone https://github.com/QuiquiMatCom2004/cs-scientist-plugin.git
|
|
71
|
+
cd cs-scientist-plugin
|
|
72
|
+
node bin/install.js
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
**Options:**
|
|
76
|
+
|
|
77
|
+
```
|
|
78
|
+
node bin/install.js # interactive — asks per platform
|
|
79
|
+
node bin/install.js --opencode # opencode only
|
|
80
|
+
node bin/install.js --claude # Claude Code only
|
|
81
|
+
node bin/install.js --force # overwrite existing files on update
|
|
82
|
+
```
|
|
83
|
+
|
|
84
|
+
**Where files go:**
|
|
85
|
+
|
|
86
|
+
| Platform | Agents | Skills |
|
|
87
|
+
|----------|--------|--------|
|
|
88
|
+
| opencode | `~/.config/opencode/agents/` | `~/.config/opencode/skills/` |
|
|
89
|
+
| Claude Code | `~/.claude/agents/` | `~/.claude/commands/` |
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
## Quick start
|
|
94
|
+
|
|
95
|
+
Activate the orchestrator:
|
|
96
|
+
|
|
97
|
+
```
|
|
98
|
+
/cs-scientist
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
Or say: `investiga`, `desarrolla con rigor`, `quiero aprender`, `modo research`, `modo dev`, `modo teach`.
|
|
102
|
+
|
|
103
|
+
The orchestrator runs a project health check, asks which mode and topic, initializes three session files in `.cs-scientist/{session_id}/`, and dispatches to the mode agent.
|
|
104
|
+
|
|
105
|
+
### Research
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
/cs-scientist
|
|
109
|
+
→ What is the impact of transformers on NLP compared to RNNs?
|
|
110
|
+
→ A) RESEARCH
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
The research agent runs 10 phases. GATE_1 verifies the question is falsifiable. GATE_2 verifies ≥3 independent sources per claim. GATE_3 verifies the hypothesis is falsifiable and non-circular.
|
|
114
|
+
|
|
115
|
+
### Dev
|
|
116
|
+
|
|
117
|
+
```
|
|
118
|
+
/cs-scientist
|
|
119
|
+
→ Implement a sliding window rate limiter with Redis
|
|
120
|
+
→ B) DEV
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
The dev agent runs 7 phases. GATE_1_DEV verifies the done criterion is external and binary. GATE_2_DEV verifies the design is unambiguous. Phase 3 PLAN invokes the `writing-plans` skill to produce ultra-detailed TDD steps with actual code in every task.
|
|
124
|
+
|
|
125
|
+
### Teach
|
|
126
|
+
|
|
127
|
+
```
|
|
128
|
+
/cs-scientist
|
|
129
|
+
→ I want to understand backpropagation
|
|
130
|
+
→ C) TEACH
|
|
131
|
+
```
|
|
132
|
+
|
|
133
|
+
The teach agent loads your source materials (papers, books, lecture notes), maps concept dependencies, scaffolds a lesson from your current level to the objective, and teaches each concept with a 7-step extractor:
|
|
134
|
+
|
|
135
|
+
1. **Minimal intuition** — using only vocabulary you already have
|
|
136
|
+
2. **Formal definition** — verbatim from source
|
|
137
|
+
3. **Best example** — concrete, not abstract
|
|
138
|
+
4. **Counter-example** — what this is NOT
|
|
139
|
+
5. **Implication** — what knowing this lets you do
|
|
140
|
+
6. **Connection backwards** — how this links to what came before
|
|
141
|
+
7. **What it unlocks** — preview of what comes next
|
|
142
|
+
|
|
143
|
+
Verification: Tier 1 (recall), Tier 2 (apply in source domain), Tier 3 (new scenario where recalling the source is insufficient — reasoning is mandatory).
|
|
144
|
+
|
|
145
|
+
### Resuming after context compaction
|
|
146
|
+
|
|
147
|
+
```
|
|
148
|
+
/session-status
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
Returns the current phase, gate states, active goals, last 5 actions, and a ready-to-paste dispatch block. Single copy-paste to resume.
|
|
152
|
+
|
|
153
|
+
---
|
|
154
|
+
|
|
155
|
+
## Session files
|
|
156
|
+
|
|
157
|
+
Every session creates `.cs-scientist/{session_id}/` in the project root:
|
|
158
|
+
|
|
159
|
+
```
|
|
160
|
+
.cs-scientist/
|
|
161
|
+
└── topic-slug_mode_YYYYMMDD/
|
|
162
|
+
├── session_state.json # state machine — phase, gates, next action
|
|
163
|
+
├── goals.md # goal tracker — active, completed, blocked
|
|
164
|
+
├── activity_log.jsonl # append-only action history (last 5 read each turn)
|
|
165
|
+
├── knowledge_base.md # verified findings accumulator
|
|
166
|
+
├── plan.md # (dev) TDD implementation plan
|
|
167
|
+
└── lesson.md # (teach) lesson with exercises and solutions
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
`session_state.json` is the single source of truth for where the session is.
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## The Verified Loop
|
|
175
|
+
|
|
176
|
+
Every mode runs a variant of this cycle:
|
|
177
|
+
|
|
178
|
+
```
|
|
179
|
+
PROPOSE → CRITIQUE → VERIFY (external) → PERSIST → ITERATE
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
What makes it rigorous:
|
|
183
|
+
|
|
184
|
+
- **External verifier** — defined in SCOPE before any work starts. Not "I'll review it" — a concrete test, benchmark, or experiment that a fresh agent can check independently.
|
|
185
|
+
- **Adversarial critic** — evaluates artifacts with zero session context. Cannot be reassured by prior conversation. Only the artifact matters.
|
|
186
|
+
- **Structured failures** — `FAILURES` must cite the exact part of the artifact that fails. "Needs improvement" is not a failure.
|
|
187
|
+
- **Verifier hierarchy** — formal verifiers (compiler, type checker, proof assistant) are preferred over tests, tests over empirical measurement, empirical over human review. Self-assessment is never a final gate.
|
|
188
|
+
|
|
189
|
+
### Gates
|
|
190
|
+
|
|
191
|
+
| Gate | Phase transition | What the critic checks |
|
|
192
|
+
|------|-----------------|----------------------|
|
|
193
|
+
| `GATE_1` | Research SCOPE | Falsifiable question, external binary truth criterion, explicit scope |
|
|
194
|
+
| `GATE_2` | Research TRIANGULATE | ≥3 independent sources per `[FACT]`, contradictions documented |
|
|
195
|
+
| `GATE_3` | Research PROPOSE | Hypothesis falsifiable, non-circular, evidence from `[VERIFIED]` only |
|
|
196
|
+
| `GATE_1_DEV` | Dev SCOPE | External binary verifier, unambiguous done criterion, explicit constraints |
|
|
197
|
+
| `GATE_2_DEV` | Dev DESIGN | Any developer can implement without clarifying questions, all `[DECISION]` entries present |
|
|
198
|
+
| `GATE_1_TEACH` | Teach INTAKE | Objective is measurable (a capability, not "understand X"), sources sufficient |
|
|
199
|
+
| `GATE_2_TEACH` | Teach SCAFFOLD | All bridges start from student's actual knowledge, no forward dependencies |
|
|
200
|
+
| `GATE_3_TEACH` | Teach VERIFY | Tier 3 exercises cannot be answered by recall alone |
|
|
201
|
+
|
|
202
|
+
### Gate failure routing
|
|
203
|
+
|
|
204
|
+
```
|
|
205
|
+
Is the failure methodological? ("verifier not binary", "circular", "ambiguous")
|
|
206
|
+
→ Mode agent corrects directly — max 2 attempts, then HUMAN_REQUIRED
|
|
207
|
+
|
|
208
|
+
Is the failure domain-specific? (unknown framework behavior, unclear protocol)
|
|
209
|
+
→ Dispatch cs-scientist-consultant — one correction, then retry
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
## Knowledge base tags
|
|
215
|
+
|
|
216
|
+
Research and teach sessions accumulate findings in `knowledge_base.md`:
|
|
217
|
+
|
|
218
|
+
**Research KB:**
|
|
219
|
+
```
|
|
220
|
+
[FACT] — verified by ≥3 independent sources
|
|
221
|
+
[VERIFIED] — confirmed by experiment
|
|
222
|
+
[HYPOTHESIS] — plausible but unverified
|
|
223
|
+
[SYNTHESIS] — model-generated connection between verified facts
|
|
224
|
+
[REFUTED] — tested and disproven
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
**Teach KB:**
|
|
228
|
+
```
|
|
229
|
+
[CORE] — fundamental concept on the critical path to the objective
|
|
230
|
+
[ADVANCED] — builds on CORE, required for full depth
|
|
231
|
+
[APPLIED] — application of a concept to a specific domain
|
|
232
|
+
[PREREQUISITE] — needed but not taught in this session
|
|
233
|
+
[MISCONCEPTION] — common wrong understanding, addressed explicitly in EXPLAIN
|
|
234
|
+
```
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
|
|
238
|
+
## Protocol
|
|
239
|
+
|
|
240
|
+
The full inter-agent communication contract is in `PROTOCOL.md`. It defines the dispatch/return envelope format, gate criteria per gate type, the Iron Rule (3 mandatory reads per turn), verifier hierarchy, and session file schemas.
|
|
241
|
+
|
|
242
|
+
If an agent behavior conflicts with `PROTOCOL.md`, the protocol wins.
|
|
243
|
+
|
|
244
|
+
---
|
|
245
|
+
|
|
246
|
+
## Requirements
|
|
247
|
+
|
|
248
|
+
- Node.js ≥ 18
|
|
249
|
+
- [opencode](https://opencode.ai) and/or [Claude Code](https://claude.ai/code)
|
|
250
|
+
- A model API key configured in your tool
|
|
251
|
+
|
|
252
|
+
---
|
|
253
|
+
|
|
254
|
+
## License
|
|
255
|
+
|
|
256
|
+
MIT
|