@neikyun/ciel 6.11.0 → 6.11.2

Files changed (39)
  1. package/assets/.claude/hooks/memory-engine.py +29 -4
  2. package/assets/.claude/settings.json +8 -8
  3. package/assets/commands/ciel-create-skill.md +2 -2
  4. package/assets/commands/ciel-status.md +1 -1
  5. package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
  6. package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
  7. package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
  8. package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
  9. package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
  10. package/assets/skills/workflow/ask-window/SKILL.md +119 -0
  11. package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
  12. package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
  13. package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
  14. package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
  15. package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
  16. package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
  17. package/assets/skills/workflow/diverge/SKILL.md +91 -0
  18. package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
  19. package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
  20. package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
  21. package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
  22. package/assets/skills/workflow/memoire/SKILL.md +198 -0
  23. package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
  24. package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
  25. package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
  26. package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
  27. package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
  28. package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
  29. package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
  30. package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
  31. package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
  32. package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
  33. package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
  34. package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
  35. package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
  36. package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
  37. package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
  38. package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
  39. package/package.json +1 -1
@@ -0,0 +1,174 @@
1
+ ---
2
+ name: debug-reasoning-rca
3
+ description: How to debug systematically — hypothesis-driven root cause analysis methodology. 3 parallel hypotheses, fault-type taxonomy (model/context/orchestration/environment), semantic diff between expected and actual behavior. For bugs, incidents, flaky tests, regressions, production failures.
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # Systematic Debugging — Root Cause Analysis Methodology
8
+
9
+ ## What this covers
10
+
11
+ How to find the real cause of a bug, not just patch the symptom. Default LLM failure: jump to the first plausible fix. Proper debugging is hypothesis-driven (Hunt & Thomas) and catches 75% more recurrences (STRATUS 2025).
12
+
13
+ ## Core principle
14
+
15
+ **Never propose a fix before a hypothesis is SUPPORTED by evidence.** "It might be this, let me fix it" is forbidden.
16
+
17
+ ## Step 1: Gather context
18
+
19
+ Before hypothesizing, understand the failure:
20
+
21
+ - **Read the error literally** — stack trace, log line, exit code. What does the system actually say?
22
+ - **Read the failing code** at the exact `file:line` from the trace
23
+ - **Check recent changes** — `git log -p --since="7 days ago" -- <scope>`. A recent bug usually has a recent cause.
24
+ - **Run the repro** once and capture full output
25
+
26
+ Skip this step = hypotheses based on vibes.
27
+
28
+ ## Step 2: Generate 3 hypotheses
29
+
30
+ Generate EXACTLY 3 **causally distinct** hypotheses. Not 3 variants of the same theory.
31
+
32
+ Format:
33
+ ```
34
+ H<n>: <cause> → <mechanism> → <observable effect>
35
+ Evidence for: <what would be true if correct>
36
+ Evidence against: <what would be true if wrong>
37
+ Fault-type: [MODEL | CONTEXT | ORCHESTRATION | ENVIRONMENT]
38
+ ```
39
+
40
+ ### Fault-type taxonomy
41
+
42
+ | Type | What it means | Example |
43
+ |------|--------------|---------|
44
+ | **MODEL** | Code logic wrong | Off-by-one, wrong algorithm, wrong assumption |
45
+ | **CONTEXT** | Missing/stale input | Wrong config, race window, state leak |
46
+ | **ORCHESTRATION** | Infrastructure misconfigured | Retry/timeout wrong, queue backlog |
47
+ | **ENVIRONMENT** | External change | Dependency drift, OS change, infra outage |
48
+
49
+ **Distribution rule**: hypotheses must span AT LEAST 2 fault-types. Three MODEL hypotheses = tunnel vision.
50
+
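The hypothesis format and distribution rule above can be checked mechanically. The following is a minimal Python sketch, not part of the skill itself: `Hypothesis` and `check_hypothesis_set` are hypothetical names introduced for illustration. It enforces only the count and fault-type rules, because causal distinctness still needs human judgment.

```python
from dataclasses import dataclass

# Fault types copied from the taxonomy table above.
FAULT_TYPES = {"MODEL", "CONTEXT", "ORCHESTRATION", "ENVIRONMENT"}


@dataclass(frozen=True)
class Hypothesis:
    """One H<n> entry: cause -> mechanism -> observable effect (hypothetical type)."""
    cause: str
    mechanism: str
    effect: str
    fault_type: str


def check_hypothesis_set(hypotheses: list[Hypothesis]) -> list[str]:
    """Return rule violations; an empty list means the set passes."""
    problems = []
    if len(hypotheses) != 3:
        problems.append(f"expected exactly 3 hypotheses, got {len(hypotheses)}")
    used = {h.fault_type for h in hypotheses}
    unknown = sorted(used - FAULT_TYPES)
    if unknown:
        problems.append(f"unknown fault types: {unknown}")
    if len(used) < 2:
        # Distribution rule: a single fault type signals tunnel vision.
        problems.append("hypotheses must span at least 2 fault types")
    return problems
```

Three MODEL hypotheses would come back with the single violation "hypotheses must span at least 2 fault types", which is the tunnel-vision case the rule is meant to catch.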
51
+ ## Step 3: Validate (targeted checks)
52
+
53
+ For each hypothesis, run ONE targeted check (not fix):
54
+
55
+ - MODEL → add a log line or unit test asserting the expected invariant
56
+ - CONTEXT → dump actual input/config at failure point; diff vs expected
57
+ - ORCHESTRATION → check retry count, timeout, queue depth at failure time
58
+ - ENVIRONMENT → `<pkg-mgr> list | grep <dep>` vs lockfile; `uname -a`
59
+
60
+ Record: evidence collected, hypothesis supported/refuted/inconclusive.
61
+
62
+ ## Step 4: Semantic diff
63
+
64
+ Once supported, write the diff between expected and actual:
65
+
66
+ ```
67
+ EXPECTED: <behavior that should happen>
68
+ ACTUAL: <behavior that happens>
69
+ GAP: <precise mechanism>
70
+ ROOT: <why the gap exists — not "because of the bug", the underlying why>
71
+ ```
72
+
73
+ If ROOT reads like "because the code is buggy" — you've only found the symptom. Ask "why" again.
74
+
75
+ ## Step 5: Fix (two layers)
76
+
77
+ - **Direct fix** — address the supported hypothesis (the bug itself)
78
+ - **Systemic fix** — address why the bug was possible (missing test, missing alert, missing type)
79
+
80
+ Systemic fix is the 75% MTTR-reduction lever. Don't skip it on Critical bugs.
81
+
82
+ ## Output format
83
+
84
+ ```
85
+ ## RCA VERDICT
86
+
87
+ ### Symptom
88
+ <1 sentence>
89
+
90
+ ### Repro
91
+ <exact command or "flaky — triggers ~1/N runs">
92
+
93
+ ### Hypotheses explored
94
+ H1 [MODEL]: <cause> — <supported|refuted|inconclusive> — <evidence>
95
+ H2 [CONTEXT]: <cause> — <supported|refuted|inconclusive> — <evidence>
96
+ H3 [ORCHESTRATION]: <cause> — <supported|refuted|inconclusive> — <evidence>
97
+
98
+ ### Root cause
99
+ <hypothesis number>: <cause>
100
+
101
+ ### Semantic diff
102
+ EXPECTED/ACTUAL/GAP/ROOT
103
+
104
+ ### Fix
105
+ - Direct: <exact code change>
106
+ - Systemic: <test/alert/process to add>
107
+
108
+ ### Confidence
109
+ HIGH | MEDIUM | LOW — <why>
110
+ ```
111
+
112
+ ## Auto-inference (before asking the user)
113
+
114
+ Exhaust these sources before flagging input as unknown:
115
+
116
+ - **SYMPTOM** → grep last error in user's prompt; tail service logs; check recent PR descriptions
117
+ - **REPRO** → read `package.json` scripts, `Makefile`, `README.md`, test files, CI workflow
118
+ - **SCOPE** → `git diff HEAD~10 --stat` then rank by overlap with symptom keywords
119
+ - **RECENT_CHANGES** → `git log --since="7 days ago" --oneline -- <scope>`
120
+
121
+ State inferred values as `[ASSUMED from <source>]`. Only flag as `[UNKNOWN]` if truly blocking.
122
+
123
+ ## How to verify
124
+
125
+ - [ ] Exactly 3 causally distinct hypotheses generated, spanning at least 2 fault types?
126
+ - [ ] Each hypothesis has a fault type from the taxonomy?
127
+ - [ ] Semantic diff completed (EXPECTED, ACTUAL, GAP, ROOT)?
128
+ - [ ] Root cause identified with evidence (file:line)?
129
+ - [ ] Fix addresses root cause, not symptom?
130
+ - [ ] Confidence level stated (HIGH/MEDIUM/LOW)?
131
+
132
+ ## Anti-patterns
133
+
134
+ - **Patch-the-symptom**: add try/catch without understanding WHY it failed
135
+ - **Fix-the-test**: modify assertion to match wrong behavior instead of fixing code
136
+ - **Guess-and-check**: 5 commits titled "try fix" — no hypothesis discipline
137
+ - **First-hypothesis-wins**: commit first theory without validating alternatives
138
+ - **No repro, no RCA**: chasing intermittent bugs without deterministic repro burns hours
139
+
140
+ ## Structured RCA methods (complementary)
141
+
142
+ The 3-hypothesis method above is the default — fast, hypothesis-driven, good for most bugs. For complex, recurrent, or systemic problems, these structured RCA methods add depth.
143
+
144
+ ### Decision guide
145
+
146
+ | Problem type | Method | Why |
147
+ |-------------|--------|-----|
148
+ | Linear, single-symptom | **3 hypotheses** (default) | Fastest — parallel hypotheses, minimal overhead |
149
+ | Recurrent incident, process failure | **5 Whys** | Iterative questioning reaches systemic root cause |
150
+ | Multi-factor, need exhaustive exploration | **Ishikawa (Fishbone)** | 6M families (Method/Machine/Manpower/Material/Milieu/Measurement) guide complete coverage |
151
+ | Multi-layer, complex system | **Drill Down / Tree Diagram** | Decompose recursively (build → deploy → runtime → data) into atomic sub-causes; visualize as tree |
152
+ | Interacting causes, feedback loops | **Relations Diagram** | Map causal links, count outbound/inbound arrows to find drivers vs effects |
153
+
154
+ **When to use the full sequence**: if the problem involves ≥ 3 interacting factors across distinct system layers, use the full chain: Ishikawa (explore) → Relations Diagram (map interactions) → 5 Whys on each promising node → Tree Diagram (document). For simpler problems, pick one method from the guide.
155
+
156
+ ### 5 Whys
157
+
158
+ Ask "why?" iteratively (5× typical) on the symptom. Each answer becomes the next question. Stop when the cause is systemic/process-level, not technical. **Anti-pattern**: stopping at "error 500" — the real cause may be "no integration test catches this path."
159
+
160
+ ### Ishikawa (Fishbone)
161
+
162
+ Draw a horizontal spine ending at the problem (fish head). Add diagonal bones for 6 families: Method, Machine, Manpower, Material, Milieu, Measurement (adapt to software: Technology, Data/API). Branch sub-causes off each family. **Anti-pattern**: filling every family superficially — depth > breadth.
163
+
164
+ ### Drill Down / Tree Diagram
165
+
166
+ Decompose the problem into 2-4 MECE sub-causes at each level, recursing until atomic (directly fixable). Visualize the result as a hierarchical tree with AND/OR logic per branch. These are the same analytical process — decomposition (Drill Down) and visualization (Tree Diagram). **Anti-pattern**: stopping at shallow levels — "module X crashes" isn't actionable, "method Y throws Z when condition W" is.
167
+
168
+ ### Relations Diagram
169
+
170
+ List all discovered factors. For each pair, ask if causation exists and in which direction. Draw arrows. Count outbound (drivers) vs inbound (effects). Nodes with the most outbound arrows are root cause candidates. **Anti-pattern**: connecting everything — if most factors connect to most others, the diagram is not discriminating; focus on clear causal links only.
171
+
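The arrow counting described for the Relations Diagram can be sketched in a few lines of Python. This is an illustrative sketch only: `rank_drivers` is a hypothetical helper, and the factor names in the usage note are invented. It assumes the causal links have already been judged by a person and are passed in as `(cause, effect)` pairs.

```python
from collections import Counter


def rank_drivers(edges: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Rank factors by outbound arrows (drivers first).

    Returns (factor, outbound, inbound) rows, sorted so that factors with
    the most outbound and fewest inbound arrows come first.
    """
    outbound = Counter(cause for cause, _ in edges)
    inbound = Counter(effect for _, effect in edges)
    factors = set(outbound) | set(inbound)
    rows = [(f, outbound[f], inbound[f]) for f in factors]
    # Sort by outbound descending, then inbound ascending, then name for stability.
    return sorted(rows, key=lambda r: (-r[1], r[2], r[0]))
```

For example, with the links `("no integration test", "bad deploy")`, `("no integration test", "schema drift")`, `("schema drift", "bad deploy")`, and `("bad deploy", "error 500")`, the first row is the factor with two outbound and zero inbound arrows, and "error 500" lands last as a pure effect.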
172
+ ## Key insight
173
+
174
+ The hardest part of debugging is not finding the fix — it's resisting the urge to fix before understanding. The 3-hypothesis discipline forces you to consider alternatives before committing to one.
@@ -0,0 +1,118 @@
1
+ ---
2
+ name: depth-classifier
3
+ description: Classifies a coding task as Trivial, Standard, or Critical based on mechanical signals (auth paths, security code, DB tables, diff size, route handlers). Use at the start of every Ciel workflow to determine which downstream skills to invoke. Returns a one-word depth + rationale + pipeline recommendation.
4
+ allowed-tools: Read, Grep, Glob
5
+ ---
6
+
7
+ # depth-classifier — Classify task depth
8
+
9
+ Gatekeeper skill at the entry of every Ciel workflow. Wrong classification = wrong depth = either waste (over-processing trivial) or risk (under-processing critical).
10
+
11
+ ---
12
+
13
+ ## Inputs
14
+
15
+ - **task**: the task description in natural language (from `/ciel <task>` or user message)
16
+ - **project-root** (optional): absolute path, defaults to CWD
17
+ - **overlay** (optional): `ciel-overlay.md` content if available
18
+
19
+ ---
20
+
21
+ ## Classification signals
22
+
23
+ ### Critical if ANY match:
24
+
25
+ - Path patterns: `auth/`, `security/`, `Token`, `Password`, `Secret`, `Session`, `Crypto`
26
+ - DB table names: `users`, `sessions`, `tokens`, `accounts`, `credentials`, `2fa`, `api_keys`
27
+ - Code patterns: `.executeQuery`, `.executeUpdate`, raw SQL, `userId` (server-provided vs client-provided), `role`, `permission`
28
+ - Task keywords: "authentication", "authorization", "payment", "migration (DB schema)", "JWT", "OAuth", "encryption", "2FA", "session"
29
+ - Scope: touches user data, money, audit trails
30
+
31
+ ### Standard if ANY match (and not Critical):
32
+
33
+ - Path patterns: `routes/`, `controllers/`, `services/`, `components/`, `hooks/`
34
+ - **CI/CD & pipeline files**: `.github/workflows/*.yml`, `.gitlab-ci.yml`, `.circleci/`, `Dockerfile`, `docker-compose*.yml`, `Jenkinsfile`, `.buildkite/`, `.drone.yml`
35
+ - **PR-review signals**:
36
+ - Prompt contains a PR number (`#\d+`, `PR \d+`, `pull request \d+`) OR phrases "open PR", "review PR", "fix PR", "merge PR"
37
+ - Planned tool calls include `gh pr list`, `gh pr view`, `gh pr checks`, `gh pr review`, `gh pr merge` (any variant: `--auto`, `--squash`, `--merge`, `--rebase`)
38
+ - Planned edits touch any CI/CD pipeline file (see the CI/CD bullet above)
39
+ - Diff scope (estimated): > 1 file OR > 50 lines change
40
+ - Code patterns: `validate`, `sanitize`, `rateLimit`, route handlers, state management
41
+ - Task keywords: "add endpoint", "new component", "refactor", "extract helper", "feature", "integration"
42
+
43
+ **Floor rule**: if ANY PR-review signal OR any CI/CD-file signal is present, depth is **at minimum Standard** — Trivial is disqualified even if the diff is small. PR review plus CI fix is never "just a one-line change".
44
+
45
+ ### Trivial otherwise:
46
+
47
+ - Rename, typo, 1-line fix, copyright update, README edit
48
+ - Single-file localized change ≤ 10 lines
49
+ - No business logic change
50
+
51
+ ### Default rule
52
+
53
+ If unsure → **Standard**. If touching user data or auth → **Critical**.
54
+
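The signal ordering above (Critical first, then the Standard floor, then Trivial) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `classify_depth` is a hypothetical function, it covers only the path, task-keyword, PR, CI/CD, and diff-size signals, and it skips DB-table and code-pattern signals. Because it matches names only, it is a first pass; the actual proposed change still needs to be read before settling the depth.

```python
import re

# Subsets of the signals listed above; regexes are illustrative, not exhaustive.
CRITICAL_PATH = re.compile(r"auth/|security/|Token|Password|Secret|Session|Crypto")
CRITICAL_TASK = re.compile(
    r"authentication|authorization|payment|jwt|oauth|encryption|2fa|session", re.I)
STANDARD_PATH = re.compile(r"routes/|controllers/|services/|components/|hooks/")
CI_FILE = re.compile(
    r"\.github/workflows/.*\.yml$|\.gitlab-ci\.yml$|\.circleci/|Dockerfile$"
    r"|docker-compose.*\.yml$|Jenkinsfile$|\.buildkite/|\.drone\.yml$")
PR_SIGNAL = re.compile(
    r"#\d+|\bPR \d+|pull request \d+|\b(?:open|review|fix|merge) PR\b", re.I)


def classify_depth(task: str, paths: list[str], changed_lines: int) -> str:
    """Return 'Critical', 'Standard', or 'Trivial' for a planned change."""
    # Critical wins on ANY match, regardless of diff size.
    if CRITICAL_TASK.search(task) or any(CRITICAL_PATH.search(p) for p in paths):
        return "Critical"
    # Floor rule: PR-review or CI/CD signals disqualify Trivial.
    floor = bool(PR_SIGNAL.search(task)) or any(CI_FILE.search(p) for p in paths)
    if (floor
            or any(STANDARD_PATH.search(p) for p in paths)
            or len(paths) > 1
            # More than 10 lines is outside the Trivial bound, so the
            # "if unsure -> Standard" default applies.
            or changed_lines > 10):
        return "Standard"
    return "Trivial"
```

A one-line rename under `src/auth/` still returns `Critical`, while a one-line edit to `.github/workflows/ci.yml` returns `Standard` through the floor rule.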
55
+ ---
56
+
57
+ ## Pipeline recommendations
58
+
59
+ Return pipeline for each depth:
60
+
61
+ ### Trivial
62
+ `quoi-framer` → `pattern-fitness-check` → `faire-gatekeeper` → `relire-critic` (inline) → push → `meta-critiquer`
63
+
64
+ ### Standard
65
+ `quoi-framer` → `avec-quoi-versioner` → [researcher agent + explorer agent IN PARALLEL] → `evaluer-sizer` → `faire-gatekeeper` → `critic` agent MODE=RELIRE → `prouver-verifier` → `meta-critiquer`
66
+
67
+ ### Critical
68
+ All of Standard + `stride-analyzer` (after `avec-quoi-versioner`) + `security-regression-check` (between FAIRE and RELIRE) + critic agent MANDATORY
69
+
70
+ ---
71
+
72
+ ## Output format
73
+
74
+ ```
75
+ ## DEPTH CLASSIFICATION
76
+
77
+ Depth: **Trivial | Standard | Critical**
78
+
79
+ Signals detected:
80
+ - [signal 1 with source — e.g. "path matches /auth/"]
81
+ - [signal 2]
82
+
83
+ Rationale: [1-2 sentences]
84
+
85
+ Pipeline:
86
+ 1. <skill>
87
+ 2. <skill>
88
+ ...
89
+
90
+ Agents required:
91
+ - [researcher: yes/no]
92
+ - [explorer: yes/no]
93
+ - [critic: yes/no]
94
+ ```
95
+
96
+ ---
97
+
98
+ ## Guardrails
99
+
100
+ - **Asymmetric bias**: when borderline between Trivial/Standard → Standard wins. When borderline between Standard/Critical → Critical wins. Missing a Critical is worse than over-processing a Standard.
101
+ - **Auth/security override**: any mention of auth, credentials, tokens, or user identity → Critical regardless of diff size
102
+ - **Single-line fix can still be Critical**: e.g. a 1-char fix in an auth check is Critical
103
+ - **Don't infer from filename alone**: `UserService.kt` could be Trivial if the change is a rename. Look at the actual code change being proposed.
104
+
105
+ ---
106
+
107
+ ## How to verify
108
+
109
+ - [ ] Classification signals checked (Critical, Standard, Trivial)?
110
+ - [ ] Pipeline recommendation provided?
111
+ - [ ] Default rule applied (Unsure → Standard)?
112
+ - [ ] Auth/security files → Critical?
113
+
114
+ ## When triggered
115
+
116
+ - Automatically at start of `/ciel <task>` via the `ciel` orchestrator
117
+ - By `UserPromptSubmit` hook (light classification hint injected into context)
118
+ - Explicitly when depth is ambiguous after initial assessment
@@ -0,0 +1,91 @@
1
+ ---
2
+ name: diverge
3
+ description: How to explore 2-3 radically different approaches before choosing one (Ciel v5 step 5). Used after AVEC QUOI, before RECHERCHE. Prevents single-approach bias and premature convergence. Use when the task is non-trivial and there are multiple valid approaches.
4
+ ---
5
+
6
+ # Divergent Exploration — 2-3 Approaches Before Choosing (Ciel v5)
7
+
8
+ ## What this covers
9
+
10
+ How to explore multiple approaches before committing to one. In Ciel v5, this is step 5 (DIVERGE). The goal is to avoid premature convergence on the first viable approach that comes to mind.
11
+
12
+ ## Core principle
13
+
14
+ **Generate 2-3 approaches before evaluating any of them.** The first approach that works is rarely the best. In v5, DIVERGE happens after AVEC QUOI (versions checked) and before RECHERCHE (external research).
15
+
16
+ ## When to use
17
+
18
+ - Non-trivial tasks with multiple valid solutions
19
+ - Architectural decisions
20
+ - Library/framework choices
21
+ - Design patterns
22
+ - Database schema design
23
+ - API design
24
+
25
+ **When NOT to use**: 1-line fix, rename, trivial config change, obvious solution.
26
+
27
+ ## The process
28
+
29
+ ### Step 1: Generate at least 2 approaches
30
+
31
+ For each approach, describe:
32
+ - What it does (1-2 sentences)
33
+ - Key trade-offs (not "it's better" -- specific pros/cons)
34
+ - Implementation effort (rough estimate)
35
+ - Risk level (low/medium/high)
36
+
37
+ Approaches should be GENUINELY different. "Use React vs use React with hooks" is one approach stated twice. Examples:
38
+ - Bad: "Use PostgreSQL vs MySQL" (trivial database choice)
39
+ - Good: "Use REST vs GraphQL" (genuinely different)
40
+
41
+ ### Step 2: Let them compete (do not decide yet)
42
+
43
+ Generate approaches WITHOUT evaluating them. Evaluation happens in EVALUER (step 9), after RECHERCHE has gathered external data about each approach.
44
+
45
+ Common trap: generating 2 approaches but immediately choosing the first one without research.
46
+
47
+ ### Step 3: Document for EVALUER
48
+
49
+ Pass every approach (with its trade-offs, effort, and risk) to the EVALUER phase. The researcher should check documentation for EACH approach.
50
+
51
+ ## Output format
52
+
53
+ ```
54
+ ## DIVERGE
55
+
56
+ ### Approach A: <name>
57
+ What: <1-2 sentences>
58
+ Trade-offs:
59
+ + <pro>
60
+ - <con>
61
+ Effort: <XS/S/M/L/XL>
62
+ Risk: <low/medium/high>
63
+
64
+ ### Approach B: <name>
65
+ What: <1-2 sentences>
66
+ Trade-offs:
67
+ + <pro>
68
+ - <con>
69
+ Effort: <XS/S/M/L/XL>
70
+ Risk: <low/medium/high>
71
+
72
+ ### (Optional) Approach C: <name>
73
+ ...
74
+ ```
75
+
76
+ ## Common rationalizations
77
+
78
+ | Rationalization | Reality |
79
+ |---|---|
80
+ | "I already know the best approach" | You know the first approach that came to mind. That's not the same as the best approach. Generate 2-3 then compare. |
81
+ | "Diverging takes too long" | It takes 5 minutes. Committing to the wrong approach costs days. The math is clear. |
82
+ | "There's only one valid way to do this" | There are almost always 2+ valid approaches. If you can't think of alternatives, you don't understand the problem well enough. |
83
+
84
+ ## How to verify
85
+
86
+ - [ ] >= 2 genuinely different approaches generated?
87
+ - [ ] Approaches are different in kind, not degree?
88
+ - [ ] Trade-offs documented for each?
89
+ - [ ] Effort estimated?
90
+ - [ ] Risk assessed?
91
+ - [ ] Evaluation deferred to next phase (EVALUER)?
@@ -0,0 +1,196 @@
1
+ ---
2
+ name: doc-validator-official
3
+ description: Before generating code that calls an external library, framework, or API, fetches the OFFICIAL documentation for the exact version in use and validates that each proposed API call (function name, signature, parameters, return type) exists as cited. Rejects reliance on Stack Overflow/blog posts when official docs exist. Forces citations for every non-trivial API use. The primary anti-hallucination gate for the RECHERCHE step.
4
+ allowed-tools: Read, Grep, Glob, Bash, WebFetch, WebSearch
5
+ ---
6
+
7
+ # doc-validator-official — Official docs first, blogs never
8
+
9
+ LLM hallucination of APIs is the #1 coding failure mode (ISSTA 2025). It shows up as functions that don't exist, signatures from the wrong version, invented parameters, and fabricated return types. Advanced RAG against official docs eliminates this class of bug.
10
+
11
+ ---
12
+
13
+ ## Inputs (infer before asking — see orchestrator's Autonomy protocol)
14
+
15
+ ```
16
+ TARGET_STACK: [language + framework + version — e.g., "TypeScript 5.5 + React 19"]
17
+ PROPOSED_APIS: [list of function/class/method calls the implementation will use]
18
+ PACKAGE_SOURCES: [paths to package.json / go.mod / requirements.txt / Cargo.toml]
19
+ ```
20
+
21
+ ### Auto-inference sources (exhaust BEFORE asking the user)
22
+
23
+ - **PACKAGE_SOURCES** → `find . -maxdepth 3 -name 'package.json' -o -name 'go.mod' -o -name 'requirements.txt' -o -name 'pyproject.toml' -o -name 'Cargo.toml' -o -name 'Gemfile'` — pick up every manifest without asking.
24
+ - **TARGET_STACK** → derive from PACKAGE_SOURCES (read the files, extract versions of the key libs). Cross-check with `ciel-overlay.md`.
25
+ - **PROPOSED_APIS** → parse from the user's task description + any referenced code diff. If user said "use stripe to refund X", APIs = `stripe.refunds.create`, `stripe.paymentIntents.retrieve`, etc.
26
+
27
+ Only BLOCK if no manifest file exists at all (greenfield project with no deps yet) — then ask once "Which package.json / go.mod should I validate against?".
28
+
29
+ ---
30
+
31
+ ## Phase 1 — Extract exact versions
32
+
33
+ Read package manifests. For each lib in PROPOSED_APIS extract the pinned version:
34
+
35
+ ```bash
36
+ # npm/yarn/pnpm
37
+ jq -r '.dependencies + .devDependencies | to_entries[] | "\(.key) \(.value)"' package.json
38
+
39
+ # go
40
+ grep -E '^\s*<lib>' go.mod
41
+
42
+ # python
43
+ grep -E '^[[:space:]]*"?<lib>' requirements.txt pyproject.toml
44
+ ```
45
+
46
+ Record as `{lib_name, pinned_version, source_file:line}`.
47
+
48
+ If version is a range (`^1.2.0`) → resolve the actual installed version from lockfile (`package-lock.json`, `yarn.lock`, `uv.lock`, `Cargo.lock`). Never validate against a range.
49
+
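For npm projects, resolving a range to the installed version means reading the lockfile rather than `package.json`. A minimal Python sketch follows, assuming the lockfile has already been parsed into a dict; `resolve_installed_version` is a hypothetical helper, and it only looks at top-level installs (nested copies under another package's `node_modules` are ignored).

```python
from typing import Optional


def resolve_installed_version(lock: dict, name: str) -> Optional[str]:
    """Look up the exact installed version of `name` in parsed package-lock.json data."""
    # lockfileVersion 2 and 3 key top-level installs as "node_modules/<name>".
    entry = lock.get("packages", {}).get(f"node_modules/{name}")
    if entry is None:
        # lockfileVersion 1 uses a "dependencies" map keyed by package name.
        entry = lock.get("dependencies", {}).get(name)
    # None means the package is not in the lockfile at all.
    return entry.get("version") if entry else None
```

In practice the dict would come from `json.load` on `package-lock.json`. A `None` result means there is no exact version to validate against yet.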
50
+ ---
51
+
52
+ ## Phase 2 — Locate official docs
53
+
54
+ For each lib, find the CANONICAL doc URL for the exact version. Priority order:
55
+
56
+ 1. **Versioned docs site** — `https://reactjs.org/docs/v19.0.0/` or `https://fastapi.tiangolo.com/release-notes/`
57
+ 2. **Repo `/docs/` at the tag** — `https://github.com/org/repo/tree/v1.2.0/docs`
58
+ 3. **README at the tag** — `https://github.com/org/repo/blob/v1.2.0/README.md`
59
+ 4. **Context7 MCP** (if available) — provides up-to-date official docs for thousands of libs
60
+
61
+ ### Reject these sources
62
+
63
+ - Stack Overflow answers (even highly upvoted — often stale)
64
+ - Medium/dev.to blog posts (version drift, author may have been wrong)
65
+ - AI-generated tutorials (recursion hazard)
66
+ - Forum posts without corroboration by official docs
67
+
68
+ These may GUIDE investigation but never JUSTIFY an API claim.
69
+
70
+ ---
71
+
72
+ ## Phase 3 — Validate each proposed API
73
+
74
+ For each item in PROPOSED_APIS:
75
+
76
+ 1. **Fetch the official doc page** for that function/class.
77
+ 2. **Verify the signature matches** — function exists, parameter names and types match, return type matches.
78
+ 3. **Verify version availability** — "Added in vX.Y" metadata. If the pinned version < X.Y, the API doesn't exist in this project yet.
79
+ 4. **Capture citation** — URL + section header + (if possible) quoted signature.
80
+
81
+ Output per API:
82
+ ```
83
+ [VALID] lib.funcName(a: T1, b: T2): T3
84
+ Source: <URL>#section
85
+ Cited: "funcName(a, b) → T3 — Added in 1.4.0"
86
+ Pinned: 1.5.2 ✓
87
+ ```
88
+
89
+ or:
90
+ ```
91
+ [INVALID] lib.funcName — NOT FOUND in v1.5.2 docs
92
+ Similar: lib.otherFunc (did you mean this?)
93
+ Action: rename or choose a different lib
94
+ ```
95
+
96
+ or:
97
+ ```
98
+ [AMBIGUOUS] lib.funcName exists but signature differs
99
+ Doc says: funcName(a: string, opts?: Opts) → Promise<T>
100
+ Proposed: funcName(a, b) — missing opts wrapping
101
+ Action: rewrite call site to match doc signature
102
+ ```
103
+
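The version-availability check in step 3 is a numeric comparison, not a string comparison (as strings, "1.10.0" sorts before "1.9.0"). A small Python sketch under simplifying assumptions: `parse_version` and `api_available` are hypothetical names, and only plain dotted numeric versions are handled, so a pre-release tag such as `1.4.0-rc.1` raises `ValueError` here.

```python
def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn '1.4' or 'v1.4.0' into a zero-padded tuple such as (1, 4, 0)."""
    parts = [int(part) for part in text.lstrip("v").split(".")]
    # Pad so that '1.4' and '1.4.0' compare as equal.
    return tuple(parts + [0] * (width - len(parts)))


def api_available(pinned: str, added_in: str) -> bool:
    """True if the pinned version is at least the 'Added in' version."""
    return parse_version(pinned) >= parse_version(added_in)
```

With a pinned version of `1.5.2` and an API marked "Added in 1.4.0", `api_available("1.5.2", "1.4.0")` is true; with a pinned `1.3.9` it is false, which corresponds to the `[INVALID]` case.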
104
+ ---
105
+
106
+ ## Phase 4 — Citation enforcement
107
+
108
+ Every non-trivial API use in the final implementation MUST have a citation comment OR be documented in the PR description. Trivial = stdlib builtin (`Array.map`, `str.split`). Non-trivial = third-party lib, framework-specific, version-sensitive stdlib (e.g., `Intl.Segmenter`).
109
+
110
+ Citation format in code (optional; use the PR description instead when 3+ citations would clutter the code):
111
+ ```typescript
112
+ // Per react.dev/reference/react/useTransition (v19)
113
+ const [isPending, startTransition] = useTransition();
114
+ ```
115
+
116
+ Citation format in PR description (mandatory for Critical tasks):
117
+ ```
118
+ ## External APIs used
119
+ - `react.useTransition` — react.dev/reference/react/useTransition (v19)
120
+ - `drizzle-orm.select().from()` — orm.drizzle.team/docs/select (v0.33)
121
+ ```
122
+
123
+ ---
124
+
125
+ ## Phase 5 — Training-cutoff awareness
126
+
127
+ If a lib in PROPOSED_APIS was released or had a major version AFTER your knowledge cutoff (January 2026), explicitly flag:
128
+
129
+ ```
130
+ [CUTOFF-WARNING] lib <name> vX.Y (released 2026-MM-DD)
131
+ Your training data does not reliably cover this version.
132
+ MANDATORY: fetch live docs, do not rely on pattern-matching from memory.
133
+ ```
134
+
135
+ ---
136
+
137
+ ## Output format
138
+
139
+ ```
140
+ ## DOC VALIDATION
141
+
142
+ ### Versions resolved
143
+ - react 19.0.2 (from package-lock.json:1234)
144
+ - drizzle-orm 0.33.1 (from package-lock.json:5678)
145
+
146
+ ### API validation
147
+ [VALID] react.useTransition — react.dev/.../useTransition (v19)
148
+ [VALID] drizzle-orm.select — orm.drizzle.team/docs/select (v0.33)
149
+ [INVALID] drizzle-orm.raw — not in v0.33, renamed to sql.raw in v0.30+
150
+ [AMBIGUOUS] react.use — signature changed in v19, proposed call uses v18 shape
151
+
152
+ ### Cutoff warnings
153
+ - drizzle-orm 0.33 (released 2026-02) — post-cutoff, relied on live fetch
154
+
155
+ ### Verdict
156
+ BLOCKING: 1 INVALID, 1 AMBIGUOUS — cannot proceed until resolved
157
+ ```
158
+
159
+ ---
160
+
161
+ ## Guardrails
162
+
163
+ - **Never infer an API from "it should exist"** — if you can't cite the doc page, the API doesn't exist for your purposes.
164
+ - **Exact version, never range** — validating against a range produces false positives.
165
+ - **Reject blog/SO as primary source** — they may CONFIRM, never ESTABLISH.
166
+ - **Cutoff-flag everything post-January 2026** — your memory is wrong often enough to require external validation.
167
+ - **If docs don't exist** (tiny lib, no website, just README) → read the source directly at the tag. No README + no source available → replace the lib.
168
+ - **Budget**: 5 APIs × 2 min lookups = 10 min max. Beyond 10 APIs, batch via a single doc-site crawl or ask user to narrow.
169
+
170
+ ---
171
+
172
+ ## How to verify
173
+
174
+ - [ ] Exact versions extracted from lock files?
175
+ - [ ] Official docs located for each API call?
176
+ - [ ] Each proposed API validated (function name, signature, params, return type)?
177
+ - [ ] Citations enforced (file:line or URL for every API)?
178
+ - [ ] Training-cutoff awareness applied (if lib updated after cutoff)?
179
+ - [ ] VERDICT issued (VALID / INVALID / AMBIGUOUS)?
180
+
181
+ ## When triggered
182
+
183
+ - RECHERCHE step for Standard/Critical tasks using external libs
184
+ - Before any code using a lib published/updated after your knowledge cutoff
185
+ - When `@ciel-researcher` is dispatched for API design
186
+ - When user says "use library X" and you have no strong prior
187
+ - After `ai-failure-modes-detector` flags an invented-API risk
188
+
189
+ ---
190
+
191
+ ## References
192
+
193
+ - ISSTA 2025 — "LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation"
194
+ - arxiv 2404.00971 — "Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code"
195
+ - Mintlify — AI hallucination prevention via accurate docs
196
+ - Context7 MCP — `@upstash/context7-mcp` for live official-doc retrieval