@neikyun/ciel 6.11.0 → 6.11.2
- package/assets/.claude/hooks/memory-engine.py +29 -4
- package/assets/.claude/settings.json +8 -8
- package/assets/commands/ciel-create-skill.md +2 -2
- package/assets/commands/ciel-status.md +1 -1
- package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
- package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
- package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
- package/assets/skills/workflow/ask-window/SKILL.md +119 -0
- package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
- package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
- package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
- package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
- package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
- package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
- package/assets/skills/workflow/diverge/SKILL.md +91 -0
- package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
- package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
- package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
- package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
- package/assets/skills/workflow/memoire/SKILL.md +198 -0
- package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
- package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
- package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
- package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
- package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
- package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
- package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
- package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
- package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
- package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
- package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
- package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
- package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
- package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
- package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
- package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
- package/package.json +1 -1
|
@@ -0,0 +1,174 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: debug-reasoning-rca
|
|
3
|
+
description: How to debug systematically — hypothesis-driven root cause analysis methodology. 3 parallel hypotheses, fault-type taxonomy (model/context/orchestration/environment), semantic diff between expected and actual behavior. For bugs, incidents, flaky tests, regressions, production failures.
|
|
4
|
+
allowed-tools: Read, Grep, Glob, Bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Systematic Debugging — Root Cause Analysis Methodology
|
|
8
|
+
|
|
9
|
+
## What this covers
|
|
10
|
+
|
|
11
|
+
How to find the real cause of a bug, not just patch the symptom. Default LLM failure: jump to the first plausible fix. Proper debugging is hypothesis-driven (Hunt & Thomas) and catches 75% more recurrences (STRATUS 2025).
|
|
12
|
+
|
|
13
|
+
## Core principle
|
|
14
|
+
|
|
15
|
+
**Never propose a fix before a hypothesis is SUPPORTED by evidence.** "It might be this, let me fix it" is forbidden.
|
|
16
|
+
|
|
17
|
+
## Step 1: Gather context
|
|
18
|
+
|
|
19
|
+
Before hypothesizing, understand the failure:
|
|
20
|
+
|
|
21
|
+
- **Read the error literally** — stack trace, log line, exit code. What does the system actually say?
|
|
22
|
+
- **Read the failing code** at the exact `file:line` from the trace
|
|
23
|
+
- **Check recent changes** — `git log -p --since="7 days ago" -- <scope>`. A recent bug usually has a recent cause.
|
|
24
|
+
- **Run the repro** once and capture full output
|
|
25
|
+
|
|
26
|
+
Skip this step = hypotheses based on vibes.
|
|
27
|
+
|
|
28
|
+
## Step 2: Generate 3 hypotheses
|
|
29
|
+
|
|
30
|
+
Generate EXACTLY 3 **causally distinct** hypotheses. Not 3 variants of the same theory.
|
|
31
|
+
|
|
32
|
+
Format:
|
|
33
|
+
```
|
|
34
|
+
H<n>: <cause> → <mechanism> → <observable effect>
|
|
35
|
+
Evidence for: <what would be true if correct>
|
|
36
|
+
Evidence against: <what would be true if wrong>
|
|
37
|
+
Fault-type: [MODEL | CONTEXT | ORCHESTRATION | ENVIRONMENT]
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
### Fault-type taxonomy
|
|
41
|
+
|
|
42
|
+
| Type | What it means | Example |
|
|
43
|
+
|------|--------------|---------|
|
|
44
|
+
| **MODEL** | Code logic wrong | Off-by-one, wrong algorithm, wrong assumption |
|
|
45
|
+
| **CONTEXT** | Missing/stale input | Wrong config, race window, state leak |
|
|
46
|
+
| **ORCHESTRATION** | Infrastructure misconfigured | Retry/timeout wrong, queue backlog |
|
|
47
|
+
| **ENVIRONMENT** | External change | Dependency drift, OS change, infra outage |
|
|
48
|
+
|
|
49
|
+
**Distribution rule**: hypotheses must span AT LEAST 2 fault-types. Three MODEL hypotheses = tunnel vision.
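The format and the distribution rule above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the `Hypothesis` record and the `check_hypothesis_set` helper are hypothetical names introduced here for illustration, and the only rules encoded are the ones stated above (exactly 3 hypotheses, fault types from the taxonomy, at least 2 distinct fault types).

```python
from dataclasses import dataclass

# Fault types taken from the taxonomy table above.
FAULT_TYPES = {"MODEL", "CONTEXT", "ORCHESTRATION", "ENVIRONMENT"}


@dataclass(frozen=True)
class Hypothesis:
    """One H<n> entry: cause -> mechanism -> observable effect (hypothetical record)."""
    cause: str
    mechanism: str
    effect: str
    fault_type: str


def check_hypothesis_set(hypotheses):
    """Return a list of rule violations; an empty list means the set passes."""
    problems = []
    if len(hypotheses) != 3:
        problems.append(f"expected exactly 3 hypotheses, got {len(hypotheses)}")
    used_types = {h.fault_type for h in hypotheses}
    unknown = sorted(used_types - FAULT_TYPES)
    if unknown:
        problems.append(f"unknown fault types: {unknown}")
    if len(used_types) < 2:
        problems.append("tunnel vision: hypotheses span fewer than 2 fault types")
    return problems
```

Three MODEL hypotheses would come back with the tunnel-vision violation, while a MODEL / CONTEXT / ENVIRONMENT set would return an empty list. The check says nothing about whether the hypotheses are causally distinct; that still needs judgment.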
|
|
50
|
+
|
|
51
|
+
## Step 3: Validate (targeted checks)
|
|
52
|
+
|
|
53
|
+
For each hypothesis, run ONE targeted check (not fix):
|
|
54
|
+
|
|
55
|
+
- MODEL → add a log line or unit test asserting the expected invariant
|
|
56
|
+
- CONTEXT → dump actual input/config at failure point; diff vs expected
|
|
57
|
+
- ORCHESTRATION → check retry count, timeout, queue depth at failure time
|
|
58
|
+
- ENVIRONMENT → `<pkg-mgr> list | grep <dep>` vs lockfile; `uname -a`
|
|
59
|
+
|
|
60
|
+
Record: evidence collected, hypothesis supported/refuted/inconclusive.
|
|
61
|
+
|
|
62
|
+
## Step 4: Semantic diff
|
|
63
|
+
|
|
64
|
+
Once supported, write the diff between expected and actual:
|
|
65
|
+
|
|
66
|
+
```
|
|
67
|
+
EXPECTED: <behavior that should happen>
|
|
68
|
+
ACTUAL: <behavior that happens>
|
|
69
|
+
GAP: <precise mechanism>
|
|
70
|
+
ROOT: <why the gap exists — not "because of the bug", the underlying why>
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
If ROOT reads like "because the code is buggy" — you've only found the symptom. Ask "why" again.
|
|
74
|
+
|
|
75
|
+
## Step 5: Fix (two layers)
|
|
76
|
+
|
|
77
|
+
- **Direct fix** — address the supported hypothesis (the bug itself)
|
|
78
|
+
- **Systemic fix** — address why the bug was possible (missing test, missing alert, missing type)
|
|
79
|
+
|
|
80
|
+
The systemic fix is what keeps the bug from recurring (the 75% figure cited under "What this covers"). Don't skip it on Critical bugs.
|
|
81
|
+
|
|
82
|
+
## Output format
|
|
83
|
+
|
|
84
|
+
```
|
|
85
|
+
## RCA VERDICT
|
|
86
|
+
|
|
87
|
+
### Symptom
|
|
88
|
+
<1 sentence>
|
|
89
|
+
|
|
90
|
+
### Repro
|
|
91
|
+
<exact command or "flaky — triggers ~1/N runs">
|
|
92
|
+
|
|
93
|
+
### Hypotheses explored
|
|
94
|
+
H1 [MODEL]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
95
|
+
H2 [CONTEXT]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
96
|
+
H3 [ORCHESTRATION]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
97
|
+
|
|
98
|
+
### Root cause
|
|
99
|
+
<hypothesis number>: <cause>
|
|
100
|
+
|
|
101
|
+
### Semantic diff
|
|
102
|
+
EXPECTED/ACTUAL/GAP/ROOT
|
|
103
|
+
|
|
104
|
+
### Fix
|
|
105
|
+
- Direct: <exact code change>
|
|
106
|
+
- Systemic: <test/alert/process to add>
|
|
107
|
+
|
|
108
|
+
### Confidence
|
|
109
|
+
HIGH | MEDIUM | LOW — <why>
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
## Auto-inference (before asking the user)
|
|
113
|
+
|
|
114
|
+
Exhaust these sources before flagging input as unknown:
|
|
115
|
+
|
|
116
|
+
- **SYMPTOM** → grep last error in user's prompt; tail service logs; check recent PR descriptions
|
|
117
|
+
- **REPRO** → read `package.json` scripts, `Makefile`, `README.md`, test files, CI workflow
|
|
118
|
+
- **SCOPE** → `git diff HEAD~10 --stat` then rank by overlap with symptom keywords
|
|
119
|
+
- **RECENT_CHANGES** → `git log --since="7 days ago" --oneline -- <scope>`
|
|
120
|
+
|
|
121
|
+
State inferred values as `[ASSUMED from <source>]`. Only flag as `[UNKNOWN]` if truly blocking.
|
|
122
|
+
|
|
123
|
+
## How to verify
|
|
124
|
+
|
|
125
|
+
- [ ] Exactly 3 causally distinct hypotheses generated, spanning at least 2 fault types?
|
|
126
|
+
- [ ] Each hypothesis has a fault type from the taxonomy?
|
|
127
|
+
- [ ] Semantic diff completed (EXPECTED / ACTUAL / GAP / ROOT)?
|
|
128
|
+
- [ ] Root cause identified with evidence (file:line)?
|
|
129
|
+
- [ ] Fix addresses root cause, not symptom?
|
|
130
|
+
- [ ] Confidence level stated (HIGH/MEDIUM/LOW)?
|
|
131
|
+
|
|
132
|
+
## Anti-patterns
|
|
133
|
+
|
|
134
|
+
- **Patch-the-symptom**: add try/catch without understanding WHY it failed
|
|
135
|
+
- **Fix-the-test**: modify assertion to match wrong behavior instead of fixing code
|
|
136
|
+
- **Guess-and-check**: 5 commits titled "try fix" — no hypothesis discipline
|
|
137
|
+
- **First-hypothesis-wins**: commit first theory without validating alternatives
|
|
138
|
+
- **No repro, no RCA**: chasing intermittent bugs without deterministic repro burns hours
|
|
139
|
+
|
|
140
|
+
## Structured RCA methods (complementary)
|
|
141
|
+
|
|
142
|
+
The 3-hypothesis method above is the default — fast, hypothesis-driven, good for most bugs. For complex, recurrent, or systemic problems, these structured RCA methods add depth.
|
|
143
|
+
|
|
144
|
+
### Decision guide
|
|
145
|
+
|
|
146
|
+
| Problem type | Method | Why |
|
|
147
|
+
|-------------|--------|-----|
|
|
148
|
+
| Linear, single-symptom | **3 hypotheses** (default) | Fastest — parallel hypotheses, minimal overhead |
|
|
149
|
+
| Recurrent incident, process failure | **5 Whys** | Iterative questioning reaches systemic root cause |
|
|
150
|
+
| Multi-factor, need exhaustive exploration | **Ishikawa (Fishbone)** | 6M families (Method/Machine/Manpower/Material/Milieu/Measurement) guide complete coverage |
|
|
151
|
+
| Multi-layer, complex system | **Drill Down / Tree Diagram** | Decompose recursively (build → deploy → runtime → data) into atomic sub-causes; visualize as tree |
|
|
152
|
+
| Interacting causes, feedback loops | **Relations Diagram** | Map causal links, count outbound/inbound arrows to find drivers vs effects |
|
|
153
|
+
|
|
154
|
+
**When to use the full sequence**: if the problem involves ≥ 3 interacting factors across distinct system layers, use the full chain: Ishikawa (explore) → Relations Diagram (map interactions) → 5 Whys on each promising node → Tree Diagram (document). For simpler problems, pick one method from the guide.
|
|
155
|
+
|
|
156
|
+
### 5 Whys
|
|
157
|
+
|
|
158
|
+
Ask "why?" iteratively (5× typical) on the symptom. Each answer becomes the next question. Stop when the cause is systemic/process-level, not technical. **Anti-pattern**: stopping at "error 500" — the real cause may be "no integration test catches this path."
|
|
159
|
+
|
|
160
|
+
### Ishikawa (Fishbone)
|
|
161
|
+
|
|
162
|
+
Draw a horizontal spine ending at the problem (fish head). Add diagonal bones for 6 families: Method, Machine, Manpower, Material, Milieu, Measurement (adapt to software: Technology, Data/API). Branch sub-causes off each family. **Anti-pattern**: filling every family superficially — depth > breadth.
|
|
163
|
+
|
|
164
|
+
### Drill Down / Tree Diagram
|
|
165
|
+
|
|
166
|
+
Decompose the problem into 2-4 MECE sub-causes at each level, recursing until atomic (directly fixable). Visualize the result as a hierarchical tree with AND/OR logic per branch. These are the same analytical process — decomposition (Drill Down) and visualization (Tree Diagram). **Anti-pattern**: stopping at shallow levels — "module X crashes" isn't actionable, "method Y throws Z when condition W" is.
|
|
167
|
+
|
|
168
|
+
### Relations Diagram
|
|
169
|
+
|
|
170
|
+
List all discovered factors. For each pair, ask if causation exists and in which direction. Draw arrows. Count outbound (drivers) vs inbound (effects). Nodes with the most outbound arrows are root cause candidates. **Anti-pattern**: connecting everything — if most factors connect to most others, the diagram is not discriminating; focus on clear causal links only.
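The arrow-counting step can be sketched in a few lines. This is an illustrative sketch only: `rank_drivers` is a hypothetical helper, and the edge list is assumed to contain only the clear causal links, written as `(cause, effect)` pairs.

```python
from collections import Counter


def rank_drivers(edges):
    """Rank factors by outbound arrows (drivers first), then by fewest inbound.

    edges: iterable of (cause, effect) pairs, one per arrow in the diagram.
    Returns a list of (factor, outbound_count, inbound_count) tuples.
    """
    outbound = Counter()
    inbound = Counter()
    for cause, effect in edges:
        outbound[cause] += 1
        inbound[effect] += 1
    factors = set(outbound) | set(inbound)
    return sorted(
        ((f, outbound[f], inbound[f]) for f in factors),
        # Most outbound first; ties broken by fewest inbound, then by name.
        key=lambda row: (-row[1], row[2], row[0]),
    )
```

Factors at the top of the result are root cause candidates; factors at the bottom with only inbound arrows are effects. If most rows show similar counts, the diagram is probably over-connected, which is the anti-pattern described above.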
|
|
171
|
+
|
|
172
|
+
## Key insight
|
|
173
|
+
|
|
174
|
+
The hardest part of debugging is not finding the fix — it's resisting the urge to fix before understanding. The 3-hypothesis discipline forces you to consider alternatives before committing to one.
|
|
@@ -0,0 +1,118 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: depth-classifier
|
|
3
|
+
description: Classifies a coding task as Trivial, Standard, or Critical based on mechanical signals (auth paths, security code, DB tables, diff size, route handlers). Use at the start of every Ciel workflow to determine which downstream skills to invoke. Returns a one-word depth + rationale + pipeline recommendation.
|
|
4
|
+
allowed-tools: Read, Grep, Glob
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# depth-classifier — Classify task depth
|
|
8
|
+
|
|
9
|
+
Gatekeeper skill at the entry of every Ciel workflow. Wrong classification = wrong depth = either waste (over-processing trivial) or risk (under-processing critical).
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Inputs
|
|
14
|
+
|
|
15
|
+
- **task**: the task description in natural language (from `/ciel <task>` or user message)
|
|
16
|
+
- **project-root** (optional): absolute path, defaults to CWD
|
|
17
|
+
- **overlay** (optional): `ciel-overlay.md` content if available
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Classification signals
|
|
22
|
+
|
|
23
|
+
### Critical if ANY match:
|
|
24
|
+
|
|
25
|
+
- Path patterns: `auth/`, `security/`, `Token`, `Password`, `Secret`, `Session`, `Crypto`
|
|
26
|
+
- DB table names: `users`, `sessions`, `tokens`, `accounts`, `credentials`, `2fa`, `api_keys`
|
|
27
|
+
- Code patterns: `.executeQuery`, `.executeUpdate`, raw SQL, `userId` (server-provided vs client-provided), `role`, `permission`
|
|
28
|
+
- Task keywords: "authentication", "authorization", "payment", "migration (DB schema)", "JWT", "OAuth", "encryption", "2FA", "session"
|
|
29
|
+
- Scope: touches user data, money, audit trails
|
|
30
|
+
|
|
31
|
+
### Standard if ANY match (and not Critical):
|
|
32
|
+
|
|
33
|
+
- Path patterns: `routes/`, `controllers/`, `services/`, `components/`, `hooks/`
|
|
34
|
+
- **CI/CD & pipeline files**: `.github/workflows/*.yml`, `.gitlab-ci.yml`, `.circleci/`, `Dockerfile`, `docker-compose*.yml`, `Jenkinsfile`, `.buildkite/`, `.drone.yml`
|
|
35
|
+
- **PR-review signals**:
|
|
36
|
+
- Prompt contains a PR number (`#\d+`, `PR \d+`, `pull request \d+`) OR phrases "open PR", "review PR", "fix PR", "merge PR"
|
|
37
|
+
- Planned tool calls include `gh pr list`, `gh pr view`, `gh pr checks`, `gh pr review`, `gh pr merge` (any variant: `--auto`, `--squash`, `--merge`, `--rebase`)
|
|
38
|
+
- Planned edits touch any CI/CD pipeline file (see the CI/CD bullet above)
|
|
39
|
+
- Diff scope (estimated): more than 1 file OR more than 50 changed lines
|
|
40
|
+
- Code patterns: `validate`, `sanitize`, `rateLimit`, route handlers, state management
|
|
41
|
+
- Task keywords: "add endpoint", "new component", "refactor", "extract helper", "feature", "integration"
|
|
42
|
+
|
|
43
|
+
**Floor rule**: if ANY PR-review signal OR any CI/CD-file signal is present, depth is **at minimum Standard** — Trivial is disqualified even if the diff is small. PR review plus CI fix is never "just a one-line change".
|
|
44
|
+
|
|
45
|
+
### Trivial otherwise:
|
|
46
|
+
|
|
47
|
+
- Rename, typo, 1-line fix, copyright update, README edit
|
|
48
|
+
- Single-file localized change ≤ 10 lines
|
|
49
|
+
- No business logic change
|
|
50
|
+
|
|
51
|
+
### Default rule
|
|
52
|
+
|
|
53
|
+
If unsure → **Standard**. If touching user data or auth → **Critical**.
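The precedence of these rules (Critical first, then the Standard floor, then Trivial, with Standard as the fallback) can be sketched as a first-pass hint. This is a simplified illustration under stated assumptions, not the skill's implementation: `classify` and its parameters are hypothetical, only a subset of the signals listed above is encoded, and path matching alone is not enough to decide (the actual code change still has to be inspected).

```python
import re

# Subset of the signals listed above; the full lists are longer.
CRITICAL_KEYWORDS = ("authentication", "authorization", "payment", "jwt",
                     "oauth", "encryption", "2fa", "session")
CRITICAL_PATH_PARTS = ("auth/", "security/", "Token", "Password",
                       "Secret", "Session", "Crypto")
STANDARD_PATH_PARTS = ("routes/", "controllers/", "services/",
                       "components/", "hooks/")
CI_FILE_RE = re.compile(
    r"(^|/)(\.github/workflows/[^/]+\.yml|\.gitlab-ci\.yml|Dockerfile"
    r"|docker-compose[^/]*\.yml|Jenkinsfile|\.drone\.yml)$"
)
PR_RE = re.compile(
    r"#\d+|\bPR \d+|\bpull request \d+|\b(?:open|review|fix|merge) PR\b",
    re.IGNORECASE,
)


def _path_hit(paths, parts):
    """True if any path contains any of the given substrings."""
    return any(part in path for path in paths for part in parts)


def classify(task, paths=(), changed_lines=0):
    """Return 'Critical', 'Standard', or 'Trivial' for a task description."""
    # Critical wins over everything, regardless of diff size.
    if any(k in task.lower() for k in CRITICAL_KEYWORDS) or _path_hit(paths, CRITICAL_PATH_PARTS):
        return "Critical"
    # Floor rule: PR-review or CI/CD signals disqualify Trivial.
    floor = bool(PR_RE.search(task)) or any(CI_FILE_RE.search(p) for p in paths)
    if (floor or _path_hit(paths, STANDARD_PATH_PARTS)
            or len(paths) > 1 or changed_lines > 50):
        return "Standard"
    if len(paths) <= 1 and changed_lines <= 10:
        return "Trivial"
    # Default rule: unsure means Standard.
    return "Standard"
```

A one-line README typo comes out Trivial, a one-line change to a workflow file comes out Standard because of the floor rule, and a one-character fix under `auth/` comes out Critical.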
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Pipeline recommendations
|
|
58
|
+
|
|
59
|
+
Return pipeline for each depth:
|
|
60
|
+
|
|
61
|
+
### Trivial
|
|
62
|
+
`quoi-framer` → `pattern-fitness-check` → `faire-gatekeeper` → `relire-critic` (inline) → push → `meta-critiquer`
|
|
63
|
+
|
|
64
|
+
### Standard
|
|
65
|
+
`quoi-framer` → `avec-quoi-versioner` → [researcher agent + explorer agent IN PARALLEL] → `evaluer-sizer` → `faire-gatekeeper` → `critic` agent MODE=RELIRE → `prouver-verifier` → `meta-critiquer`
|
|
66
|
+
|
|
67
|
+
### Critical
|
|
68
|
+
All of Standard + `stride-analyzer` (after `avec-quoi-versioner`) + `security-regression-check` (between FAIRE and RELIRE) + critic agent MANDATORY
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Output format
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
## DEPTH CLASSIFICATION
|
|
76
|
+
|
|
77
|
+
Depth: **Trivial | Standard | Critical**
|
|
78
|
+
|
|
79
|
+
Signals detected:
|
|
80
|
+
- [signal 1 with source — e.g. "path matches /auth/"]
|
|
81
|
+
- [signal 2]
|
|
82
|
+
|
|
83
|
+
Rationale: [1-2 sentences]
|
|
84
|
+
|
|
85
|
+
Pipeline:
|
|
86
|
+
1. <skill>
|
|
87
|
+
2. <skill>
|
|
88
|
+
...
|
|
89
|
+
|
|
90
|
+
Agents required:
|
|
91
|
+
- [researcher: yes/no]
|
|
92
|
+
- [explorer: yes/no]
|
|
93
|
+
- [critic: yes/no]
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
## Guardrails
|
|
99
|
+
|
|
100
|
+
- **Asymmetric bias**: when borderline between Trivial/Standard → Standard wins. When borderline between Standard/Critical → Critical wins. Missing a Critical is worse than over-processing a Standard.
|
|
101
|
+
- **Auth/security override**: any mention of auth, credentials, tokens, or user identity → Critical regardless of diff size
|
|
102
|
+
- **Single-line fix can still be Critical**: e.g. a 1-char fix in an auth check is Critical
|
|
103
|
+
- **Don't infer from filename alone**: `UserService.kt` could be Trivial if the change is a rename. Look at the actual code change being proposed.
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## How to verify
|
|
108
|
+
|
|
109
|
+
- [ ] Classification signals checked (Critical, Standard, Trivial)?
|
|
110
|
+
- [ ] Pipeline recommendation provided?
|
|
111
|
+
- [ ] Default rule applied (Unsure → Standard)?
|
|
112
|
+
- [ ] Auth/security files → Critical?
|
|
113
|
+
|
|
114
|
+
## When triggered
|
|
115
|
+
|
|
116
|
+
- Automatically at start of `/ciel <task>` via the `ciel` orchestrator
|
|
117
|
+
- By `UserPromptSubmit` hook (light classification hint injected into context)
|
|
118
|
+
- Explicitly when depth is ambiguous after initial assessment
|
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: diverge
|
|
3
|
+
description: How to explore 2-3 radically different approaches before choosing one (Ciel v5 step 5). Used after AVEC QUOI, before RECHERCHE. Prevents single-approach bias and premature convergence. Use when the task is non-trivial and there are multiple valid approaches.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Divergent Exploration — 2-3 Approaches Before Choosing (Ciel v5)
|
|
7
|
+
|
|
8
|
+
## What this covers
|
|
9
|
+
|
|
10
|
+
How to explore multiple approaches before committing to one. In Ciel v5, this is step 5 (DIVERGE). The goal is to avoid premature convergence on the first viable approach that comes to mind.
|
|
11
|
+
|
|
12
|
+
## Core principle
|
|
13
|
+
|
|
14
|
+
**Generate 2-3 approaches before evaluating any of them.** The first approach that works is rarely the best. In v5, DIVERGE happens after AVEC QUOI (versions checked) and before RECHERCHE (external research).
|
|
15
|
+
|
|
16
|
+
## When to use
|
|
17
|
+
|
|
18
|
+
- Non-trivial tasks with multiple valid solutions
|
|
19
|
+
- Architectural decisions
|
|
20
|
+
- Library/framework choices
|
|
21
|
+
- Design patterns
|
|
22
|
+
- Database schema design
|
|
23
|
+
- API design
|
|
24
|
+
|
|
25
|
+
**When NOT to use**: 1-line fix, rename, trivial config change, obvious solution.
|
|
26
|
+
|
|
27
|
+
## The process
|
|
28
|
+
|
|
29
|
+
### Step 1: Generate at least 2 approaches
|
|
30
|
+
|
|
31
|
+
For each approach, describe:
|
|
32
|
+
- What it does (1-2 sentences)
|
|
33
|
+
- Key trade-offs (not "it's better" -- specific pros/cons)
|
|
34
|
+
- Implementation effort (rough estimate)
|
|
35
|
+
- Risk level (low/medium/high)
|
|
36
|
+
|
|
37
|
+
Approaches should be GENUINELY different. Not "use React vs use React with hooks" -- same approach. Compare:
|
|
38
|
+
- "Use PostgreSQL vs MySQL" (trivial database choice)
|
|
39
|
+
- "Use REST vs GraphQL" (genuinely different)
|
|
40
|
+
|
|
41
|
+
### Step 2: Let them compete (don't pick a winner yet)
|
|
42
|
+
|
|
43
|
+
Generate approaches WITHOUT evaluating them. Evaluation happens in EVALUER (step 9), after RECHERCHE has gathered external data about each approach.
|
|
44
|
+
|
|
45
|
+
Common trap: generating 2 approaches but immediately choosing the first one without research.
|
|
46
|
+
|
|
47
|
+
### Step 3: Document for EVALUER
|
|
48
|
+
|
|
49
|
+
Pass every approach (with its trade-offs, effort, risk) to the EVALUER phase. The researcher should check documentation for ALL of them.
|
|
50
|
+
|
|
51
|
+
## Output format
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
## DIVERGE
|
|
55
|
+
|
|
56
|
+
### Approach A: <name>
|
|
57
|
+
What: <1-2 sentences>
|
|
58
|
+
Trade-offs:
|
|
59
|
+
+ <pro>
|
|
60
|
+
- <con>
|
|
61
|
+
Effort: <XS/S/M/L/XL>
|
|
62
|
+
Risk: <low/medium/high>
|
|
63
|
+
|
|
64
|
+
### Approach B: <name>
|
|
65
|
+
What: <1-2 sentences>
|
|
66
|
+
Trade-offs:
|
|
67
|
+
+ <pro>
|
|
68
|
+
- <con>
|
|
69
|
+
Effort: <XS/S/M/L/XL>
|
|
70
|
+
Risk: <low/medium/high>
|
|
71
|
+
|
|
72
|
+
### (Optional) Approach C: <name>
|
|
73
|
+
...
|
|
74
|
+
```
|
|
75
|
+
|
|
76
|
+
## Common rationalizations
|
|
77
|
+
|
|
78
|
+
| Rationalization | Reality |
|
|
79
|
+
|---|---|
|
|
80
|
+
| "I already know the best approach" | You know the first approach that came to mind. That's not the same as the best approach. Generate 2-3 then compare. |
|
|
81
|
+
| "Diverging takes too long" | It takes 5 minutes. Committing to the wrong approach costs days. The math is clear. |
|
|
82
|
+
| "There's only one valid way to do this" | There are almost always 2+ valid approaches. If you can't think of alternatives, you don't understand the problem well enough. |
|
|
83
|
+
|
|
84
|
+
## How to verify
|
|
85
|
+
|
|
86
|
+
- [ ] >= 2 genuinely different approaches generated?
|
|
87
|
+
- [ ] Approaches are different in kind, not degree?
|
|
88
|
+
- [ ] Trade-offs documented for each?
|
|
89
|
+
- [ ] Effort estimated?
|
|
90
|
+
- [ ] Risk assessed?
|
|
91
|
+
- [ ] Evaluation deferred to next phase (EVALUER)?
|
|
@@ -0,0 +1,196 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: doc-validator-official
|
|
3
|
+
description: Before generating code that calls an external library, framework, or API, fetches the OFFICIAL documentation for the exact version in use and validates that each proposed API call (function name, signature, parameters, return type) exists as cited. Rejects reliance on Stack Overflow/blog posts when official docs exist. Forces citations for every non-trivial API use. The primary anti-hallucination gate for the RECHERCHE step.
|
|
4
|
+
allowed-tools: Read, Grep, Glob, Bash, WebFetch, WebSearch
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# doc-validator-official — Official docs first, blogs never
|
|
8
|
+
|
|
9
|
+
LLM hallucination of APIs is the #1 coding failure mode (ISSTA 2025): functions that don't exist, signatures from the wrong version, invented parameters, fabricated return types. Advanced RAG against official docs targets this class of bug directly.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Inputs (infer before asking — see orchestrator's Autonomy protocol)
|
|
14
|
+
|
|
15
|
+
```
|
|
16
|
+
TARGET_STACK: [language + framework + version — e.g., "TypeScript 5.5 + React 19"]
|
|
17
|
+
PROPOSED_APIS: [list of function/class/method calls the implementation will use]
|
|
18
|
+
PACKAGE_SOURCES: [paths to package.json / go.mod / requirements.txt / Cargo.toml]
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
### Auto-inference sources (exhaust BEFORE asking the user)
|
|
22
|
+
|
|
23
|
+
- **PACKAGE_SOURCES** → `find . -maxdepth 3 -name 'package.json' -o -name 'go.mod' -o -name 'requirements.txt' -o -name 'pyproject.toml' -o -name 'Cargo.toml' -o -name 'Gemfile'` — pick up every manifest without asking.
|
|
24
|
+
- **TARGET_STACK** → derive from PACKAGE_SOURCES (read the files, extract versions of the key libs). Cross-check with `ciel-overlay.md`.
|
|
25
|
+
- **PROPOSED_APIS** → parse from the user's task description + any referenced code diff. If user said "use stripe to refund X", APIs = `stripe.refunds.create`, `stripe.paymentIntents.retrieve`, etc.
|
|
26
|
+
|
|
27
|
+
Only BLOCK if no manifest file exists at all (greenfield project with no deps yet) — then ask once "Which package.json / go.mod should I validate against?".
|
|
28
|
+
|
|
29
|
+
---
|
|
30
|
+
|
|
31
|
+
## Phase 1 — Extract exact versions
|
|
32
|
+
|
|
33
|
+
Read package manifests. For each lib in PROPOSED_APIS extract the pinned version:
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
# npm/yarn/pnpm
|
|
37
|
+
jq -r '.dependencies + .devDependencies | to_entries[] | "\(.key) \(.value)"' package.json
|
|
38
|
+
|
|
39
|
+
# go
|
|
40
|
+
grep -E '^\s*<lib>' go.mod
|
|
41
|
+
|
|
42
|
+
# python
|
|
43
|
+
grep -nE '^[[:space:]"]*<lib>' requirements.txt pyproject.toml
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
Record as `{lib_name, pinned_version, source_file:line}`.
|
|
47
|
+
|
|
48
|
+
If version is a range (`^1.2.0`) → resolve the actual installed version from lockfile (`package-lock.json`, `yarn.lock`, `uv.lock`, `Cargo.lock`). Never validate against a range.
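For npm projects, resolving a range to the installed version can be sketched as below. This is a minimal illustration, assuming a `package-lock.json` in the standard npm layout (a `packages` map keyed by install path in lockfileVersion 2 and 3, a `dependencies` map keyed by name in lockfileVersion 1); `resolve_from_package_lock` is a hypothetical helper name, and other lockfile formats (`yarn.lock`, `uv.lock`, `Cargo.lock`) need their own parsing.

```python
import json


def resolve_from_package_lock(lock_text, lib):
    """Return the installed version of `lib` from package-lock.json text, or None."""
    lock = json.loads(lock_text)
    # lockfileVersion 2 and 3: flat "packages" map keyed by install path.
    entry = lock.get("packages", {}).get(f"node_modules/{lib}")
    if entry is None:
        # lockfileVersion 1: "dependencies" map keyed by package name.
        entry = lock.get("dependencies", {}).get(lib)
    return entry.get("version") if entry else None
```

A `None` result means the lib is declared in the manifest but not installed (or is only present as a nested dependency), which is worth flagging rather than falling back to the range.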
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Phase 2 — Locate official docs
|
|
53
|
+
|
|
54
|
+
For each lib, find the CANONICAL doc URL for the exact version. Priority order:
|
|
55
|
+
|
|
56
|
+
1. **Versioned docs site** — `https://reactjs.org/docs/v19.0.0/` or `https://fastapi.tiangolo.com/release-notes/`
|
|
57
|
+
2. **Repo `/docs/` at the tag** — `https://github.com/org/repo/tree/v1.2.0/docs`
|
|
58
|
+
3. **README at the tag** — `https://github.com/org/repo/blob/v1.2.0/README.md`
|
|
59
|
+
4. **Context7 MCP** (if available) — provides up-to-date official docs for thousands of libs
|
|
60
|
+
|
|
61
|
+
### Reject these sources
|
|
62
|
+
|
|
63
|
+
- Stack Overflow answers (even highly upvoted — often stale)
|
|
64
|
+
- Medium/dev.to blog posts (version drift, author may have been wrong)
|
|
65
|
+
- AI-generated tutorials (recursion hazard)
|
|
66
|
+
- Forum posts without corroboration by official docs
|
|
67
|
+
|
|
68
|
+
These may GUIDE investigation but never JUSTIFY an API claim.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Phase 3 — Validate each proposed API
|
|
73
|
+
|
|
74
|
+
For each item in PROPOSED_APIS:
|
|
75
|
+
|
|
76
|
+
1. **Fetch the official doc page** for that function/class.
|
|
77
|
+
2. **Verify the signature matches** — function exists, parameter names and types match, return type matches.
|
|
78
|
+
3. **Verify version availability** — "Added in vX.Y" metadata. If the pinned version < X.Y, the API doesn't exist in this project yet.
|
|
79
|
+
4. **Capture citation** — URL + section header + (if possible) quoted signature.
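The version-availability check in step 3 is a numeric comparison, not a string comparison (as strings, `"1.10.0"` sorts before `"1.9.0"`). A minimal sketch, assuming plain dotted numeric versions with no pre-release or build suffixes; `parse_version` and `api_available` are hypothetical helper names:

```python
def parse_version(text):
    """Parse 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def api_available(pinned, added_in):
    """True if the pinned version is at or after the 'Added in' version."""
    pinned_v, added_v = parse_version(pinned), parse_version(added_in)
    # Pad the shorter tuple with zeros so '1.4' compares like '1.4.0'.
    width = max(len(pinned_v), len(added_v))
    pinned_v += (0,) * (width - len(pinned_v))
    added_v += (0,) * (width - len(added_v))
    return pinned_v >= added_v
```

For versions with pre-release tags (such as `19.0.0-rc.1`) this sketch raises a `ValueError`; a real check would need a proper version parser for the ecosystem in question.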
|
|
80
|
+
|
|
81
|
+
Output per API:
|
|
82
|
+
```
|
|
83
|
+
[VALID] lib.funcName(a: T1, b: T2): T3
|
|
84
|
+
Source: <URL>#section
|
|
85
|
+
Cited: "funcName(a, b) → T3 — Added in 1.4.0"
|
|
86
|
+
Pinned: 1.5.2 ✓
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
or:
|
|
90
|
+
```
|
|
91
|
+
[INVALID] lib.funcName — NOT FOUND in v1.5.2 docs
|
|
92
|
+
Similar: lib.otherFunc (did you mean this?)
|
|
93
|
+
Action: rename or choose a different lib
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
or:
|
|
97
|
+
```
|
|
98
|
+
[AMBIGUOUS] lib.funcName exists but signature differs
|
|
99
|
+
Doc says: funcName(a: string, opts?: Opts) → Promise<T>
|
|
100
|
+
Proposed: funcName(a, b) — missing opts wrapping
|
|
101
|
+
Action: rewrite call site to match doc signature
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
---
|
|
105
|
+
|
|
106
|
+
## Phase 4 — Citation enforcement
|
|
107
|
+
|
|
108
|
+
Every non-trivial API use in the final implementation MUST have a citation comment OR be documented in the PR description. Trivial = stdlib builtin (`Array.map`, `str.split`). Non-trivial = third-party lib, framework-specific, version-sensitive stdlib (e.g., `Intl.Segmenter`).
|
|
109
|
+
|
|
110
|
+
Citation format in code (optional; acceptable to omit if 3+ citations would clutter the file):
|
|
111
|
+
```typescript
|
|
112
|
+
// Per react.dev/reference/react/useTransition (v19)
|
|
113
|
+
const [isPending, startTransition] = useTransition();
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
Citation format in PR description (mandatory for Critical tasks):
|
|
117
|
+
```
|
|
118
|
+
## External APIs used
|
|
119
|
+
- `react.useTransition` — react.dev/reference/react/useTransition (v19)
|
|
120
|
+
- `drizzle-orm.select().from()` — orm.drizzle.team/docs/select (v0.33)
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## Phase 5 — Training-cutoff awareness
|
|
126
|
+
|
|
127
|
+
If a lib in PROPOSED_APIS was released or had a major version AFTER your knowledge cutoff (January 2026), explicitly flag:
|
|
128
|
+
|
|
129
|
+
```
|
|
130
|
+
[CUTOFF-WARNING] lib <name> vX.Y (released 2026-MM-DD)
|
|
131
|
+
Your training data does not reliably cover this version.
|
|
132
|
+
MANDATORY: fetch live docs, do not rely on pattern-matching from memory.
|
|
133
|
+
```
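The flagging rule can be sketched as a date comparison. This is an illustration only: `cutoff_warning` is a hypothetical helper, the release date is assumed to come from the registry or the repo tag, and reading "January 2026" as the last day of that month is an assumption made here.

```python
from datetime import date

# Assumption: "January 2026" is read as the end of that month.
KNOWLEDGE_CUTOFF = date(2026, 1, 31)


def cutoff_warning(lib, version, released):
    """Return the warning block for a post-cutoff release, or None otherwise."""
    if released <= KNOWLEDGE_CUTOFF:
        return None
    return (
        f"[CUTOFF-WARNING] lib {lib} v{version} (released {released.isoformat()})\n"
        "Your training data does not reliably cover this version.\n"
        "MANDATORY: fetch live docs, do not rely on pattern-matching from memory."
    )
```

When the release date cannot be determined, treating the version as post-cutoff is the safer default.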
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
## Output format
|
|
138
|
+
|
|
139
|
+
```
|
|
140
|
+
## DOC VALIDATION
|
|
141
|
+
|
|
142
|
+
### Versions resolved
|
|
143
|
+
- react 19.0.2 (from package-lock.json:1234)
|
|
144
|
+
- drizzle-orm 0.33.1 (from package-lock.json:5678)
|
|
145
|
+
|
|
146
|
+
### API validation
|
|
147
|
+
[VALID] react.useTransition — react.dev/.../useTransition (v19)
|
|
148
|
+
[VALID] drizzle-orm.select — orm.drizzle.team/docs/select (v0.33)
|
|
149
|
+
[INVALID] drizzle-orm.raw — not in v0.33, renamed to sql.raw in v0.30+
|
|
150
|
+
[AMBIGUOUS] react.use — signature changed in v19, proposed call uses v18 shape
|
|
151
|
+
|
|
152
|
+
### Cutoff warnings
|
|
153
|
+
- drizzle-orm 0.33 (released 2026-02) — post-cutoff, relied on live fetch
|
|
154
|
+
|
|
155
|
+
### Verdict
|
|
156
|
+
BLOCKING: 1 INVALID, 1 AMBIGUOUS — cannot proceed until resolved
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
---
|
|
160
|
+
|
|
161
|
+
## Guardrails
|
|
162
|
+
|
|
163
|
+
- **Never infer an API from "it should exist"** — if you can't cite the doc page, the API doesn't exist for your purposes.
|
|
164
|
+
- **Exact version, never range** — validating against a range produces false positives.
|
|
165
|
+
- **Reject blog/SO as primary source** — they may CONFIRM, never ESTABLISH.
|
|
166
|
+
- **Cutoff-flag everything post-January 2026** — your memory is wrong often enough to require external validation.
|
|
167
|
+
- **If docs don't exist** (tiny lib, no website, just README) → read the source directly at the tag. No README + no source available → replace the lib.
|
|
168
|
+
- **Budget**: 5 APIs × 2 min lookups = 10 min max. Beyond 10 APIs, batch via a single doc-site crawl or ask user to narrow.
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## How to verify
|
|
173
|
+
|
|
174
|
+
- [ ] Exact versions extracted from lock files?
|
|
175
|
+
- [ ] Official docs located for each API call?
|
|
176
|
+
- [ ] Each proposed API validated (function name, signature, params, return type)?
|
|
177
|
+
- [ ] Citations enforced (URL or source file:line for every non-trivial API)?
|
|
178
|
+
- [ ] Training-cutoff awareness applied (if lib updated after cutoff)?
|
|
179
|
+
- [ ] Per-API status issued (VALID / INVALID / AMBIGUOUS) and an overall verdict stated?
|
|
180
|
+
|
|
181
|
+
## When triggered
|
|
182
|
+
|
|
183
|
+
- RECHERCHE step for Standard/Critical tasks using external libs
|
|
184
|
+
- Before any code using a lib published/updated after your knowledge cutoff
|
|
185
|
+
- When `@ciel-researcher` is dispatched for API design
|
|
186
|
+
- When user says "use library X" and you have no strong prior
|
|
187
|
+
- After `ai-failure-modes-detector` flags an invented-API risk
|
|
188
|
+
|
|
189
|
+
---
|
|
190
|
+
|
|
191
|
+
## References
|
|
192
|
+
|
|
193
|
+
- ISSTA 2025 — "LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation"
|
|
194
|
+
- arxiv 2404.00971 — "Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code"
|
|
195
|
+
- Mintlify — AI hallucination prevention via accurate docs
|
|
196
|
+
- Context7 MCP — `@upstash/context7-mcp` for live official-doc retrieval
|