cs-scientist-plugin 0.1.0

package/PROTOCOL.md ADDED
@@ -0,0 +1,431 @@
1
+ # CS-Scientist Plugin — Communication Protocol v1.2
2
+
3
+ This document is the authoritative contract for all agents in the plugin.
4
+ Before writing or modifying any agent prompt, read this file.
5
+ If a behavior conflicts with this document, this document wins.
6
+
7
+ ---
8
+
9
+ ## Agent Registry
10
+
11
+ Each agent has exactly one responsibility. If it does more than one thing, it is incorrectly designed.
12
+
13
+ | Agent | Single Responsibility | Writes to disk | Reads from disk |
14
+ |-------|-----------------------|---------------|-----------------|
15
+ | `cs-scientist` | Routing + session init | `session_state.json`, `goals.md` (init only) | `session_state.json` |
16
+ | `cs-scientist-research` | Research loop (phases 1–10) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB |
17
+ | `cs-scientist-dev` | Dev loop (phases 1–7) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB |
18
+ | `cs-scientist-teach` | Teaching loop (phases 1–7) | `session_state.json`, `goals.md`, `activity_log.jsonl`, KB, `lesson.md` | `session_state.json`, `goals.md`, `activity_log.jsonl` (last 5), KB, sources |
19
+ | `cs-scientist-critic` | Gate validation | Nothing | Nothing |
20
+ | `cs-scientist-consultant` | Domain diagnosis | Nothing | Nothing |
21
+ | `cs-scientist-arbiter` | Council synthesis | Nothing | Nothing |
22
+
23
+ Sub-agents (critic, consultant, arbiter) **never touch disk**.
24
+ They receive structured input and return structured output. Period.
25
+ Sub-agent outcomes are logged by the mode agent that called them, not by the sub-agent itself.
26
+
27
+ ---
28
+
29
+ ## Session Files
30
+
31
+ Every session lives in `.cs-scientist/{session_id}/` relative to the project root.
32
+ Every session contains these core files (dev sessions also add `plan.md`, and teach sessions add `lesson.md`):
33
+
34
+ | File | Purpose | Append-only |
35
+ |------|---------|-------------|
36
+ | `session_state.json` | State machine — where the session is right now | No (overwritten) |
37
+ | `goals.md` | Goal tracker — what the session is trying to achieve | No (updated) |
38
+ | `activity_log.jsonl` | Action history — what each agent did per turn | Yes |
39
+ | `knowledge_base.md` | Verified facts accumulator | No (updated) |
40
+
41
+ ---
42
+
43
+ ## session_state.json
44
+
45
+ The single file that defines where the session is.
46
+ Everything else (KB, logs) is content. This is the state machine.
47
+
48
+ ```json
49
+ {
50
+ "schema_version": "1.2",
51
+ "session_id": "topic-slug_mode_YYYYMMDD",
52
+ "mode": "research | dev | teach | null",
53
+ "topic": "string",
54
+ "verifier": "string — external truth criterion",
55
+
56
+ "phase": "SCOPE | DECOMPOSE | RETRIEVE | TRIANGULATE | PROPOSE | EXPERIMENT | ANALYZE | SYNTHESIZE | CRITIQUE | DOCUMENT | DESIGN | PLAN | IMPLEMENT | VERIFY | ITERATE | INTAKE | MAP | SCAFFOLD | EXPLAIN | ITERATE_TEACH",
57
+ "phase_status": "active | blocked | completed",
58
+
59
+ "gates": {
60
+ "GATE_1": "pending | pass | fail",
61
+ "GATE_2": "pending | pass | fail",
62
+ "GATE_3": "pending | pass | fail",
63
+ "GATE_1_DEV": "pending | pass | fail",
64
+ "GATE_2_DEV": "pending | pass | fail",
65
+ "GATE_1_TEACH": "pending | pass | fail",
66
+ "GATE_2_TEACH": "pending | pass | fail",
67
+ "GATE_3_TEACH": "pending | pass | fail"
68
+ },
69
+
70
+ "active_artifact_ref": "path or inline ID of artifact being worked on",
71
+ "next_action": "concrete action, no ambiguity, for the next turn",
72
+ "blocked_reason": "string | null",
73
+
74
+ "iteration_count": 0,
75
+ "last_updated": "ISO8601",
76
+ "last_agent": "id of the agent that wrote this",
77
+ "last_agent_summary": "≤1 sentence of what that agent did"
78
+ }
79
+ ```
80
+
81
+ ### Write rules
82
+
83
+ - Orchestrator writes only on init (creates the file)
84
+ - Mode agents write after every phase or gate transition — not before, not mid-phase
85
+ - Sub-agents never write this file
86
+ - If `next_action` is empty or ambiguous, the file is malformed
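The write rules above can be checked mechanically before a mode agent trusts the file. The following is a minimal sketch, not part of the plugin: the function name `validate_session_state` and the choice of `REQUIRED_KEYS` are assumptions made for illustration, and "ambiguous" `next_action` values are not detectable this way, only empty ones.

```python
import json

PHASE_STATUSES = {"active", "blocked", "completed"}
GATE_VALUES = {"pending", "pass", "fail"}
# Assumption: the protocol does not say which keys are mandatory; this set is illustrative.
REQUIRED_KEYS = {
    "schema_version", "session_id", "mode", "topic", "verifier", "phase",
    "phase_status", "gates", "next_action", "last_updated", "last_agent",
}


def validate_session_state(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks well-formed."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(state, dict):
        return ["top-level value must be an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - state.keys())]
    if state.get("phase_status") not in PHASE_STATUSES:
        problems.append("phase_status must be active, blocked, or completed")

    gates = state.get("gates")
    if isinstance(gates, dict):
        for gate, value in gates.items():
            if value not in GATE_VALUES:
                problems.append(f"{gate} has invalid value: {value!r}")

    # The protocol treats an empty next_action as a malformed file.
    next_action = state.get("next_action")
    if not isinstance(next_action, str) or not next_action.strip():
        problems.append("next_action is empty")
    return problems
```

An agent could call this right after reading the file and stop with a user notification if the returned list is non-empty.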
87
+
88
+ ---
89
+
90
+ ## goals.md
91
+
92
+ Tracks what the session is trying to achieve at every level.
93
+ Written by the orchestrator on init, updated by mode agents as work progresses.
94
+
95
+ ### Schema
96
+
97
+ ```markdown
98
+ # Session Goals
99
+
100
+ ## Primary Goal
101
+ {set at init, never modified — the overarching objective of this session}
102
+
103
+ ## Active
104
+ - [ ] HIGH | {goal description} | phase: {phase_name}
105
+ - [ ] MEDIUM | {goal description} | phase: {phase_name}
106
+ - [ ] LOW | {goal description} | phase: {phase_name}
107
+
108
+ ## Completed
109
+ - [x] {goal description} | closed_at: {phase_name} | result: {one sentence}
110
+
111
+ ## Blocked
112
+ - [!] {goal description} | blocked_by: {reason} | unblocks_when: {condition}
113
+ ```
114
+
115
+ ### Write rules
116
+
117
+ - Orchestrator writes the Primary Goal and initial Active goals at init
118
+ - Orchestrator may also close or add goals if the user requests it mid-session
119
+ - Mode agents add goals when a phase reveals new objectives
120
+ - Mode agents move goals from Active → Completed after phase exit with a verified result
121
+ - Mode agents move goals from Active → Blocked when a gate returns FAIL and the cause is unresolved
122
+ - Priority values are fixed: `HIGH`, `MEDIUM`, `LOW` — no other values
123
+ - Primary Goal section is never modified after init
124
+
125
+ ---
126
+
127
+ ## activity_log.jsonl
128
+
129
+ Append-only log of every significant action. One JSON object per line.
130
+ Mode agents read the last 5 entries at the start of every turn to reconstruct what happened in the previous context window.
131
+
132
+ ### Schema (one entry per line)
133
+
134
+ ```json
135
+ {
136
+ "ts": "ISO8601",
137
+ "agent": "cs-scientist | cs-scientist-research | cs-scientist-dev | cs-scientist-teach",
138
+ "phase": "SCOPE | RETRIEVE | ...",
139
+ "action_type": "phase_enter | phase_complete | gate_dispatch | gate_return | subagent_dispatch | subagent_return | kb_update | goal_update | session_init | session_resume",
140
+ "summary": "≤1 sentence — what happened",
141
+ "result": "outcome or null",
142
+ "iteration": 0
143
+ }
144
+ ```
145
+
146
+ ### Write rules
147
+
148
+ - Written after every significant action — not every thought, every action
149
+ - Significant actions: phase enter, phase complete, gate dispatch, gate return, sub-agent dispatch, sub-agent return, KB update, goal state change
150
+ - Sub-agent outcomes are logged by the mode agent that called them, with `action_type: "subagent_return"`
151
+ - Never rewrite or delete entries — append only
152
+ - `iteration` increments on each full Verified Loop cycle, not on each action
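The append-only rule and the "last 5 entries" read can be sketched as two small helpers. This is an illustrative sketch only: `append_entry` and `read_last_entries` are hypothetical names, and the field set simply mirrors the schema above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_entry(log_path: Path, agent: str, phase: str, action_type: str,
                 summary: str, result: Optional[str] = None, iteration: int = 0) -> None:
    """Append exactly one JSON object as one line; never touch earlier lines."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "phase": phase,
        "action_type": action_type,
        "summary": summary,
        "result": result,
        "iteration": iteration,
    }
    # Mode "a" only ever adds to the end of the file.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_last_entries(log_path: Path, n: int = 5) -> list[dict]:
    """Return the last n entries, oldest first; an absent log yields an empty list."""
    if not log_path.exists():
        return []
    lines = [ln for ln in log_path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return [json.loads(ln) for ln in lines[-n:]]
```

Reading the whole file is fine for a sketch; a long-running session might prefer to read from the end of the file instead.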
153
+
154
+ ---
155
+
156
+ ## Dispatch and Return Contracts
157
+
158
+ All inter-agent communication uses this envelope:
159
+
160
+ ```
161
+ [DISPATCH → {agent_id}]
162
+ ---
163
+ {payload structured per agent spec below}
164
+ ---
165
+ ```
166
+
167
+ ```
168
+ [RETURN → {agent_id}]
169
+ ---
170
+ {structured response payload}
171
+ ---
172
+ ```
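A minimal sketch of building and unpacking this envelope follows. The helper names `wrap` and `unwrap` and the regular expression are assumptions for illustration; the protocol itself only fixes the textual shape shown above.

```python
import re

# Header line, a '---' delimiter, the payload, and a closing '---' delimiter.
ENVELOPE = re.compile(
    r"\A\[(DISPATCH|RETURN) → ([^\]]+)\]\n---\n(.*)\n---\s*\Z",
    re.DOTALL,
)


def wrap(kind: str, agent_id: str, payload: str) -> str:
    """Build a DISPATCH or RETURN envelope around a payload."""
    if kind not in ("DISPATCH", "RETURN"):
        raise ValueError(f"unknown envelope kind: {kind!r}")
    return f"[{kind} → {agent_id}]\n---\n{payload.strip()}\n---"


def unwrap(message: str) -> tuple[str, str, str]:
    """Split an envelope into (kind, agent_id, payload)."""
    match = ENVELOPE.match(message)
    if match is None:
        raise ValueError("message is not a well-formed envelope")
    kind, agent_id, payload = match.groups()
    return kind, agent_id, payload
```

For example, `unwrap(wrap("DISPATCH", "cs-scientist-critic", payload))` gives back the kind, the agent id, and the stripped payload.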
173
+
174
+ ### cs-scientist-critic
175
+
176
+ **Dispatch payload:**
177
+ ```
178
+ GATE: GATE_1 | GATE_2 | GATE_3 | GATE_1_DEV | GATE_2_DEV | GATE_1_TEACH | GATE_2_TEACH | GATE_3_TEACH | CRITIQUE_LIBRE
179
+ ARTIFACT:
180
+ {artifact verbatim}
181
+ ```
182
+
183
+ **Return payload:**
184
+ ```
185
+ VERDICT: PASS | FAIL | HUMAN_REQUIRED
186
+ GATE: {gate_id}
187
+ FAILURES:
188
+ - {failure 1}
189
+ - {failure 2}
190
+ HUMAN_QUESTIONS:
191
+ - {question if HUMAN_REQUIRED}
192
+ PASS_NOTES:
193
+ - {caveats if PASS}
194
+ ```
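Because the mode agent must act on the critic's verdict and log it, it helps to turn the return payload into a structure. The parser sketched here is an assumption about how that could be done (the name `parse_critic_return` is hypothetical); it accepts only the three verdicts listed above.

```python
VERDICTS = {"PASS", "FAIL", "HUMAN_REQUIRED"}
LIST_FIELDS = ("FAILURES", "HUMAN_QUESTIONS", "PASS_NOTES")


def parse_critic_return(text: str) -> dict:
    """Parse a critic return payload into scalar and list fields."""
    parsed = {"VERDICT": None, "GATE": None, **{name: [] for name in LIST_FIELDS}}
    current = None  # list field currently being filled, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "---":
            continue
        if line.startswith("- ") and current:
            parsed[current].append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in LIST_FIELDS:
            current = key
        elif sep and key in ("VERDICT", "GATE"):
            parsed[key] = value.strip()
            current = None
    if parsed["VERDICT"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {parsed['VERDICT']!r}")
    return parsed
```

Bullets are checked before the `KEY:` split so that a failure message containing a colon is kept whole.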
195
+
196
+ ### cs-scientist-consultant
197
+
198
+ **Dispatch payload:**
199
+ ```
200
+ DOMAIN: {one sentence}
201
+ GATE_DIAGNOSIS: {FAILURES verbatim from Critic return}
202
+ FAILED_ARTIFACT:
203
+ {artifact verbatim}
204
+ ```
205
+
206
+ **Return payload:**
207
+ ```
208
+ ROOT_CAUSE: {specific, not generic}
209
+ CORRECTION: {concrete change — which line/section and how}
210
+ WHY_APPROACH_FAILED: {domain-specific reason}
211
+ ```
212
+
213
+ ### cs-scientist-arbiter
214
+
215
+ **Dispatch payload:**
216
+ ```
217
+ BRIEF: {~800 tokens of shared context}
218
+ DEFENDER_A: {structured output from defender agent}
219
+ DEFENDER_B: {structured output from defender agent}
220
+ DEFENDER_C: {structured output from defender agent}
221
+ ```
222
+
223
+ **Return payload:**
224
+ ```
225
+ SYNTHESIS:
226
+ - In {situation A}: → Option {X} | {reason ≤1 line}
227
+ - In {situation B}: → Option {Y} | {reason ≤1 line}
228
+ NOT_RECOMMENDED_IF:
229
+ - Option A: {disqualifying condition}
230
+ - Option B: {disqualifying condition}
231
+ - Option C: {disqualifying condition}
232
+ FOR_CURRENT_CONTEXT:
233
+ → {recommended option} | {justification ≤3 lines}
234
+ ```
235
+
236
+ ---
237
+
238
+ ## Iron Rule — applied in every agent prompt that touches disk
239
+
240
+ ```
241
+ FIRST action every turn:
242
+ 1. Read session_state.json
243
+ 2. Read goals.md
244
+ 3. Read last 5 entries of activity_log.jsonl
245
+
246
+ AFTER every significant action:
247
+ 4. Append one entry to activity_log.jsonl
248
+
249
+ AFTER any phase or gate transition:
250
+ 5. Update session_state.json
251
+ 6. Update goals.md if goal state changed
252
+
253
+ If session_state.json does not exist: stop and notify the user — do not improvise state.
254
+ If activity_log.jsonl does not exist: create it empty, then proceed.
255
+ If goals.md does not exist: stop and notify the user — do not improvise goals.
256
+ ```
257
+
258
+ This is not a guideline. It appears as a NEVER block in every mode agent.
259
+
260
+ ---
261
+
262
+ ## Isolation Rule
263
+
264
+ The value of sub-agents comes from having zero session context.
265
+ When dispatching, pass only the structured artifact — never session history, never "for context."
266
+ A critic with session context cannot evaluate adversarially.
267
+
268
+ ---
269
+
270
+ ## Gate Failure Routing
271
+
272
+ Before acting on a gate failure, classify the cause:
273
+
274
+ ```
275
+ Does the failure mention methodological terms ("falsifiable", "circular", "verifier")?
276
+ → METHODOLOGICAL → mode agent corrects directly (max 2 attempts, then HUMAN_REQUIRED)
277
+
278
+ Does the failure mention datasets, algorithms, libraries, domain terminology?
279
+ → DOMAIN → dispatch cs-scientist-consultant before retrying
280
+ ```
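The classification above can be approximated with a keyword check. This is a rough sketch under stated assumptions: the three methodological terms come from the rule above, while treating every other failure as a domain failure is a simplification, and the function name and return labels are hypothetical.

```python
METHODOLOGICAL_TERMS = ("falsifiable", "circular", "verifier")
MAX_DIRECT_ATTEMPTS = 2


def route_gate_failure(failures: list[str], attempts_so_far: int) -> str:
    """Decide the next step after a gate returns FAIL."""
    text = " ".join(failures).lower()
    if any(term in text for term in METHODOLOGICAL_TERMS):
        # Methodological: the mode agent corrects directly, at most twice.
        if attempts_so_far >= MAX_DIRECT_ATTEMPTS:
            return "HUMAN_REQUIRED"
        return "CORRECT_DIRECTLY"
    # Assumption: anything without a methodological term is a domain failure.
    return "DISPATCH_CONSULTANT"
```

A real agent prompt would likely judge the failure text more carefully than a substring match; the sketch only makes the two branches and the attempt cap explicit.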
281
+
282
+ ---
283
+
284
+ ## Project Health Check
285
+
286
+ The orchestrator runs this check **before** asking the user for mode (Research, Dev, or Teach).
287
+ Never skip it. Never run it after the session has started.
288
+ The goal is to ensure the project has the files it needs to be maintainable by agents and humans.
289
+
290
+ ### Universal files — any project
291
+
292
+ | File | Purpose | Action if missing |
293
+ |------|---------|-------------------|
294
+ | `AGENTS.md` or `CLAUDE.md` | AI agent instructions for this project | Run AGENTS.md questionnaire (see below) |
295
+ | `README.md` | What it is, how to run, how to test | Notify user — do not create without their input |
296
+ | `.gitignore` | Exclude files from version control | Create base version for detected project type |
297
+ | `CHANGELOG.md` | Version history | Create empty with standard header |
298
+ | `docs/adr/` | Architecture Decision Records | Propose creating first ADR if architectural decisions are made during session |
299
+
300
+ ### Project-type files
301
+
302
+ Detect project type from existing files, then check for:
303
+
304
+ | Type | Required files |
305
+ |------|---------------|
306
+ | Node / Web | `package.json`, `.env.example` |
307
+ | Python | `pyproject.toml` or `requirements.txt`, `.python-version` |
308
+ | Container | `Dockerfile`, `docker-compose.yml` |
309
+ | CI/CD active | `.github/workflows/` or equivalent |
310
+ | ML / Data | `data/README.md`, `MODEL_CARD.md` |
311
+
312
+ ### Rules
313
+
314
+ - Missing universal files → notify user with a checklist before proceeding
315
+ - Missing project-type files → notify, offer to create, do not block the session
316
+ - `README.md` is never auto-created — it requires human context
317
+ - `AGENTS.md` triggers the questionnaire below — it is the most important file for agents
318
+ - ADRs are proposed, never created silently
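The universal-files part of the check could be sketched as follows. The function name `missing_universal_files` is hypothetical, and the sketch only reports what is missing; deciding what to create, propose, or ask the user about stays with the rules above.

```python
from pathlib import Path


def missing_universal_files(root: Path) -> list[str]:
    """List the universal files absent from the project root."""
    missing = []
    # Either AGENTS.md or CLAUDE.md satisfies the agent-instructions requirement.
    if not ((root / "AGENTS.md").is_file() or (root / "CLAUDE.md").is_file()):
        missing.append("AGENTS.md or CLAUDE.md")
    for name in ("README.md", ".gitignore", "CHANGELOG.md"):
        if not (root / name).is_file():
            missing.append(name)
    if not (root / "docs" / "adr").is_dir():
        missing.append("docs/adr/")
    return missing
```

The returned list maps directly onto the checklist the orchestrator shows the user before proceeding.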
319
+
320
+ ---
321
+
322
+ ## AGENTS.md Questionnaire
323
+
324
+ Run when `AGENTS.md` (or `CLAUDE.md`) does not exist.
325
+ Ask exactly these 8 questions — no more, no less.
326
+ Every line in the resulting file must earn its place. Verbose autogenerated files hurt agent performance.
327
+
328
+ ```
329
+ 1. What type of project is this?
330
+ (web app / CLI / library / data pipeline / ML / embedded system / other)
331
+
332
+ 2. What language(s) and main frameworks?
333
+
334
+ 3. What architecture pattern does it follow?
335
+ (monolith / microservices / MVC / event-driven / CQRS / no formal architecture)
336
+
337
+ 4. What are the exact commands to: run locally, run tests, deploy?
338
+
339
+ 5. What is the "done" criterion for a task?
340
+ (tests pass / linter clean / review approved / specific metrics)
341
+
342
+ 6. What should an AI agent NEVER do in this project?
343
+ (touch prod / modify migrations / push without review / other)
344
+
345
+ 7. Are there non-obvious architectural decisions an agent must know about?
346
+ (something that looks wrong but has a reason)
347
+
348
+ 8. Are there hard constraints?
349
+ (performance targets, security requirements, compliance, backwards compatibility)
350
+ ```
351
+
352
+ From the answers, generate a minimal `AGENTS.md` with these sections only:
353
+
354
+ ```markdown
355
+ # AGENTS.md
356
+
357
+ ## Project
358
+ {type} — {language/framework} — {architecture pattern}
359
+
360
+ ## Run
361
+ {exact commands}
362
+
363
+ ## Done means
364
+ {done criterion}
365
+
366
+ ## Never
367
+ - {forbidden action 1}
368
+ - {forbidden action 2}
369
+
370
+ ## Non-obvious decisions
371
+ - {decision}: {reason}
372
+
373
+ ## Hard constraints
374
+ - {constraint}
375
+ ```
376
+
377
+ Omit any section for which the user had no answer. Do not pad with generic advice.
378
+
379
+ ---
380
+
381
+ ## Verifier Hierarchy
382
+
383
+ Inspired by DeepMind's AlphaProof: the most rigorous verification is the most formal available.
384
+ When defining the external truth criterion (SCOPE phase, GATE_1), choose the strongest applicable level (Level 1 is the strongest).
385
+
386
+ ```
387
+ Level 1 — Formal (strongest)
388
+ Compiler, type checker, proof assistant (Lean, Coq), constraint solver.
389
+ If it compiles and type-checks, the property holds by construction.
390
+ Use when: type safety, protocol conformance, mathematical theorems.
391
+
392
+ Level 2 — Automated test
393
+ Unit test, integration test, property-based test (Hypothesis, QuickCheck).
394
+ Executable and repeatable. Fails predictably.
395
+ Use when: functional correctness, contract compliance, regression prevention.
396
+
397
+ Level 3 — Empirical measurement
398
+ Benchmark, statistical test, A/B comparison, reproducible experiment.
399
+ Results must be reproducible with the same seed/dataset.
400
+ Use when: performance, quality metrics, model evaluation.
401
+
402
+ Level 4 — Expert review
403
+ A qualified human evaluates against explicit criteria.
404
+ The criteria must be written down before the review — not derived from the output.
405
+ Use when: no automated verifier exists for the property being checked.
406
+
407
+ Level 5 — Self-assessment (weakest — avoid)
408
+ The model evaluates its own output.
409
+ Only acceptable as a preliminary filter before a higher-level verifier.
410
+ Never as the final gate.
411
+ ```
412
+
413
+ ### Rules
414
+
415
+ - The mode agent declares the verifier level in `session_state.json` during SCOPE
416
+ - Never accept a Level 5 verifier as a gate — it is a Verified Loop violation
417
+ - If no verifier above Level 4 exists for a claim, the claim stays `[HYPOTHESIS]`
418
+ - When multiple verifiers apply, use the strongest one (the lowest level number); do not settle for a weaker one
418
+ because it is more convenient
420
+ - Error messages from Level 1/2 verifiers feed back into the next attempt verbatim —
421
+ not summarized, not interpreted. The raw error is the structured input.
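Two of these rules, "use the strongest applicable verifier" and "never gate on Level 5", reduce to a small selection function. This is a sketch with a hypothetical name, `select_gate_verifier`; it assumes the caller already knows which levels are available for the claim.

```python
from typing import Iterable, Optional

LEVEL_NAMES = {
    1: "formal",
    2: "automated test",
    3: "empirical measurement",
    4: "expert review",
    5: "self-assessment",
}


def select_gate_verifier(available_levels: Iterable[int]) -> Optional[int]:
    """Pick the strongest usable level, or None if only self-assessment is available."""
    # Level 1 is strongest, so the minimum number wins; Level 5 is never a gate.
    usable = [level for level in available_levels if 1 <= level <= 4]
    return min(usable) if usable else None
```

A `None` result means no acceptable gate verifier exists, so the claim cannot pass a gate on the evidence available.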
422
+
423
+ ---
424
+
425
+ ## Schema Version
426
+
427
+ When this protocol changes in a breaking way, increment `schema_version` in `session_state.json`.
428
+ Mode agents must reject state files with a schema version they do not recognize.
429
+
430
+ Current version: `1.2`
431
+ Changes from `1.1`: Added Project Health Check and AGENTS.md Questionnaire sections.
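The rejection rule can be expressed as a guard that runs before any other use of the state. In this sketch the set of recognized versions is an assumption that contains only the current version stated above, and `check_schema_version` is a hypothetical name.

```python
# Assumption: this agent recognizes only the current version listed in this section.
RECOGNIZED_SCHEMA_VERSIONS = {"1.2"}


def check_schema_version(state: dict) -> None:
    """Raise if the state file declares a schema version this agent does not know."""
    version = state.get("schema_version")
    if version not in RECOGNIZED_SCHEMA_VERSIONS:
        raise ValueError(
            f"unrecognized schema_version {version!r}; "
            f"expected one of {sorted(RECOGNIZED_SCHEMA_VERSIONS)}"
        )
```

Failing loudly here matches the protocol's stance that agents must not improvise when state is missing or unfamiliar.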
package/README.md ADDED
@@ -0,0 +1,256 @@
1
+ # cs-scientist-plugin
2
+
3
+ A multi-agent system for rigorous research, development, and teaching — built for [opencode](https://opencode.ai) and [Claude Code](https://claude.ai/code).
4
+
5
+ The core principle is borrowed from DeepMind's most reliable systems (AlphaFold, AlphaProof, FunSearch):
6
+
7
+ > **The model proposes. An external verifier decides.**
8
+
9
+ No self-assessed output advances to the next phase. Every gate is evaluated by a fresh agent with zero session context.
10
+
11
+ ---
12
+
13
+ ## What it does
14
+
15
+ Three operating modes, each a structured loop with adversarial gates:
16
+
17
+ | Mode | Purpose | External verifier |
18
+ |------|---------|------------------|
19
+ | **RESEARCH** | Investigate a topic: hypothesis, sources, triangulation, report | Reproducible experiment or ≥3 independent sources |
20
+ | **DEV** | Build something with correctness guarantees: TDD, verified design, traced decisions | Compiler / type checker / tests (formality hierarchy enforced) |
21
+ | **TEACH** | Learn or teach from provided source materials: progressive explanation, tiered exercises | Student can solve Tier 3 exercises that recall alone cannot answer |
22
+
23
+ ---
24
+
25
+ ## Agents
26
+
27
+ | Agent | Role |
28
+ |-------|------|
29
+ | `cs-scientist` | Orchestrator — routes to modes, runs project health check, initializes session |
30
+ | `cs-scientist-research` | Research loop — 10 phases: SCOPE → DECOMPOSE → RETRIEVE → TRIANGULATE → PROPOSE → EXPERIMENT → ANALYZE → SYNTHESIZE → CRITIQUE → DOCUMENT |
31
+ | `cs-scientist-dev` | Dev loop — 7 phases: SCOPE → DESIGN → PLAN → IMPLEMENT → VERIFY → ITERATE → DOCUMENT |
32
+ | `cs-scientist-teach` | Teaching loop — 7 phases: INTAKE → MAP → SCAFFOLD → EXPLAIN → VERIFY → ITERATE → DOCUMENT |
33
+ | `cs-scientist-critic` | Adversarial gate validator — zero session context, structured PASS/FAIL/HUMAN_REQUIRED |
34
+ | `cs-scientist-consultant` | Domain expert for gate failures caused by missing domain knowledge |
35
+ | `cs-scientist-arbiter` | Council of State synthesis — evaluates 3+ competing options situationally |
36
+
37
+ Sub-agents (critic, consultant, arbiter) **never touch disk**. They receive a structured artifact and return a structured verdict.
38
+
39
+ ---
40
+
41
+ ## Skills
42
+
43
+ | Skill | When to use |
44
+ |-------|-------------|
45
+ | `deep-research` | Directed research on a single question with KB-compatible output |
46
+ | `parallel-research` | Independent multi-angle searches with no cross-contamination |
47
+ | `kb-validate` | Validate KB integrity before a gate: tag consistency, source completeness, circular refs |
48
+ | `session-status` | Human-readable session state + ready-to-paste resume block after context compaction |
49
+ | `negative-results` | Document what didn't work and why: gate failures, refuted hypotheses, discarded approaches |
50
+ | `notebooklm` | Convert a completed research report to podcast script, FAQ, or executive briefing |
51
+ | `writing-plans` | Ultra-detailed TDD implementation plans — actual code in every step, no placeholders |
52
+ | `project-onboarding` | Generate a Day 1 guide for a new team member from the current repo state |
53
+ | `concept-explainer` | Explain a concept at 3 levels (accessible / practitioner / researcher) without starting a session |
54
+ | `paper-outline` | Map a research session KB to an academic paper skeleton |
55
+ | `lesson-plan` | Generate a structured lesson plan for class preparation |
56
+
57
+ ---
58
+
59
+ ## Install
60
+
61
+ ```bash
62
+ npm install -g cs-scientist-plugin
63
+ ```
64
+
65
+ The postinstall script detects which tools you have and asks before copying anything.
66
+
67
+ **Manual install:**
68
+
69
+ ```bash
70
+ git clone https://github.com/QuiquiMatCom2004/cs-scientist-plugin.git
71
+ cd cs-scientist-plugin
72
+ node bin/install.js
73
+ ```
74
+
75
+ **Options:**
76
+
77
+ ```
78
+ node bin/install.js # interactive — asks per platform
79
+ node bin/install.js --opencode # opencode only
80
+ node bin/install.js --claude # Claude Code only
81
+ node bin/install.js --force # overwrite existing files on update
82
+ ```
83
+
84
+ **Where files go:**
85
+
86
+ | Platform | Agents | Skills |
87
+ |----------|--------|--------|
88
+ | opencode | `~/.config/opencode/agents/` | `~/.config/opencode/skills/` |
89
+ | Claude Code | `~/.claude/agents/` | `~/.claude/commands/` |
90
+
91
+ ---
92
+
93
+ ## Quick start
94
+
95
+ Activate the orchestrator:
96
+
97
+ ```
98
+ /cs-scientist
99
+ ```
100
+
101
+ Or say: `investiga`, `desarrolla con rigor`, `quiero aprender`, `modo research`, `modo dev`, `modo teach`.
102
+
103
+ The orchestrator runs a project health check, asks which mode and topic, initializes three session files in `.cs-scientist/{session_id}/`, and dispatches to the mode agent.
104
+
105
+ ### Research
106
+
107
+ ```
108
+ /cs-scientist
109
+ → What is the impact of transformers on NLP compared to RNNs?
110
+ → A) RESEARCH
111
+ ```
112
+
113
+ The research agent runs 10 phases. GATE_1 verifies the question is falsifiable. GATE_2 verifies ≥3 independent sources per claim. GATE_3 verifies the hypothesis is falsifiable and non-circular.
114
+
115
+ ### Dev
116
+
117
+ ```
118
+ /cs-scientist
119
+ → Implement a sliding window rate limiter with Redis
120
+ → B) DEV
121
+ ```
122
+
123
+ The dev agent runs 7 phases. GATE_1_DEV verifies the done criterion is external and binary. GATE_2_DEV verifies the design is unambiguous. Phase 3 PLAN invokes the `writing-plans` skill to produce ultra-detailed TDD steps with actual code in every task.
124
+
125
+ ### Teach
126
+
127
+ ```
128
+ /cs-scientist
129
+ → I want to understand backpropagation
130
+ → C) TEACH
131
+ ```
132
+
133
+ The teach agent loads your source materials (papers, books, lecture notes), maps concept dependencies, scaffolds a lesson from your current level to the objective, and teaches each concept with a 7-step extractor:
134
+
135
+ 1. **Minimal intuition** — using only vocabulary you already have
136
+ 2. **Formal definition** — verbatim from source
137
+ 3. **Best example** — concrete, not abstract
138
+ 4. **Counter-example** — what this is NOT
139
+ 5. **Implication** — what knowing this lets you do
140
+ 6. **Connection backwards** — how this links to what came before
141
+ 7. **What it unlocks** — preview of what comes next
142
+
143
+ Verification: Tier 1 (recall), Tier 2 (apply in source domain), Tier 3 (new scenario where recalling the source is insufficient — reasoning is mandatory).
144
+
145
+ ### Resuming after context compaction
146
+
147
+ ```
148
+ /session-status
149
+ ```
150
+
151
+ Returns the current phase, gate states, active goals, last 5 actions, and a ready-to-paste dispatch block. Single copy-paste to resume.
152
+
153
+ ---
154
+
155
+ ## Session files
156
+
157
+ Every session creates `.cs-scientist/{session_id}/` in the project root:
158
+
159
+ ```
160
+ .cs-scientist/
161
+ └── topic-slug_mode_YYYYMMDD/
162
+ ├── session_state.json # state machine — phase, gates, next action
163
+ ├── goals.md # goal tracker — active, completed, blocked
164
+ ├── activity_log.jsonl # append-only action history (last 5 read each turn)
165
+ ├── knowledge_base.md # verified findings accumulator
166
+ ├── plan.md # (dev) TDD implementation plan
167
+ └── lesson.md # (teach) lesson with exercises and solutions
168
+ ```
169
+
170
+ `session_state.json` is the single source of truth for where the session is.
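For illustration, the directory name pattern `topic-slug_mode_YYYYMMDD` could be produced like this. The slug rule (lowercase, runs of non-alphanumerics collapsed to a hyphen) is an assumption, as is the helper name `make_session_id`; the plugin may slugify topics differently.

```python
import re
from datetime import date


def make_session_id(topic: str, mode: str, day: date) -> str:
    """Build a session id of the form topic-slug_mode_YYYYMMDD."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{slug}_{mode}_{day:%Y%m%d}"
```

Under these assumptions, `make_session_id("Sliding window rate limiter", "dev", date(2025, 1, 31))` returns `sliding-window-rate-limiter_dev_20250131`.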
171
+
172
+ ---
173
+
174
+ ## The Verified Loop
175
+
176
+ Every mode runs a variant of this cycle:
177
+
178
+ ```
179
+ PROPOSE → CRITIQUE → VERIFY (external) → PERSIST → ITERATE
180
+ ```
181
+
182
+ What makes it rigorous:
183
+
184
+ - **External verifier** — defined in SCOPE before any work starts. Not "I'll review it" — a concrete test, benchmark, or experiment that a fresh agent can check independently.
185
+ - **Adversarial critic** — evaluates artifacts with zero session context. Cannot be reassured by prior conversation. Only the artifact matters.
186
+ - **Structured failures** — `FAILURES` must cite the exact part of the artifact that fails. "Needs improvement" is not a failure.
187
+ - **Verifier hierarchy** — formal verifiers (compiler, type checker, proof assistant) are preferred over tests, tests over empirical measurement, empirical over human review. Self-assessment is never a final gate.
188
+
189
+ ### Gates
190
+
191
+ | Gate | Phase transition | What the critic checks |
192
+ |------|-----------------|----------------------|
193
+ | `GATE_1` | Research SCOPE | Falsifiable question, external binary truth criterion, explicit scope |
194
+ | `GATE_2` | Research TRIANGULATE | ≥3 independent sources per `[FACT]`, contradictions documented |
195
+ | `GATE_3` | Research PROPOSE | Hypothesis falsifiable, non-circular, evidence from `[VERIFIED]` only |
196
+ | `GATE_1_DEV` | Dev SCOPE | External binary verifier, unambiguous done criterion, explicit constraints |
197
+ | `GATE_2_DEV` | Dev DESIGN | Any developer can implement without clarifying questions, all `[DECISION]` entries present |
198
+ | `GATE_1_TEACH` | Teach INTAKE | Objective is measurable (a capability, not "understand X"), sources sufficient |
199
+ | `GATE_2_TEACH` | Teach SCAFFOLD | All bridges start from student's actual knowledge, no forward dependencies |
200
+ | `GATE_3_TEACH` | Teach VERIFY | Tier 3 exercises cannot be answered by recall alone |
201
+
202
+ ### Gate failure routing
203
+
204
+ ```
205
+ Is the failure methodological? ("verifier not binary", "circular", "ambiguous")
206
+ → Mode agent corrects directly — max 2 attempts, then HUMAN_REQUIRED
207
+
208
+ Is the failure domain-specific? (unknown framework behavior, unclear protocol)
209
+ → Dispatch cs-scientist-consultant — one correction, then retry
210
+ ```
211
+
212
+ ---
213
+
214
+ ## Knowledge base tags
215
+
216
+ Research and teach sessions accumulate findings in `knowledge_base.md`:
217
+
218
+ **Research KB:**
219
+ ```
220
+ [FACT] — verified by ≥3 independent sources
221
+ [VERIFIED] — confirmed by experiment
222
+ [HYPOTHESIS] — plausible but unverified
223
+ [SYNTHESIS] — model-generated connection between verified facts
224
+ [REFUTED] — tested and disproven
225
+ ```
226
+
227
+ **Teach KB:**
228
+ ```
229
+ [CORE] — fundamental concept on the critical path to the objective
230
+ [ADVANCED] — builds on CORE, required for full depth
231
+ [APPLIED] — application of a concept to a specific domain
232
+ [PREREQUISITE] — needed but not taught in this session
233
+ [MISCONCEPTION]— common wrong understanding, addressed explicitly in EXPLAIN
234
+ ```
235
+
236
+ ---
237
+
238
+ ## Protocol
239
+
240
+ The full inter-agent communication contract is in `PROTOCOL.md`. It defines the dispatch/return envelope format, gate criteria per gate type, the Iron Rule (3 mandatory reads per turn), verifier hierarchy, and session file schemas.
241
+
242
+ If an agent behavior conflicts with `PROTOCOL.md`, the protocol wins.
243
+
244
+ ---
245
+
246
+ ## Requirements
247
+
248
+ - Node.js ≥ 18
249
+ - [opencode](https://opencode.ai) and/or [Claude Code](https://claude.ai/code)
250
+ - A model API key configured in your tool
251
+
252
+ ---
253
+
254
+ ## License
255
+
256
+ MIT