@neikyun/ciel 6.10.1 → 6.11.1
- package/assets/.claude/hooks/memory-engine.py +256 -0
- package/assets/commands/ciel-audit.md +42 -0
- package/assets/commands/ciel-create-skill.md +2 -2
- package/assets/commands/ciel-status.md +1 -1
- package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
- package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
- package/assets/skills/ciel/SKILL.md +2 -1
- package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
- package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
- package/assets/skills/workflow/ask-window/SKILL.md +119 -0
- package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
- package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
- package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
- package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
- package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
- package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
- package/assets/skills/workflow/diverge/SKILL.md +91 -0
- package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
- package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
- package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
- package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
- package/assets/skills/workflow/memoire/SKILL.md +198 -0
- package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
- package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
- package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
- package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
- package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
- package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
- package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
- package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
- package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
- package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
- package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
- package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
- package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
- package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
- package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
- package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
- package/package.json +1 -1
|
@@ -0,0 +1,135 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: critiquer-auditor
|
|
3
|
+
description: How to audit code comprehensively — 7-dimension review methodology covering expected behavior, assumptions, scope, code-vs-model comparison, STRIDE security, pattern consistency, and findings with severity. For PR reviews, retrospective audits, and "is this code correct?" questions.
|
|
4
|
+
allowed-tools: Read, Grep, Glob, Bash, WebSearch
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Code Audit — 7-Dimension Review Methodology
|
|
8
|
+
|
|
9
|
+
## What this covers
|
|
10
|
+
|
|
11
|
+
How to do a thorough code audit. Distinct from quick self-review (relire-critic) — this is the comprehensive methodology for PR reviews, retrospective audits, and quality checks.
|
|
12
|
+
|
|
13
|
+
## Core principle
|
|
14
|
+
|
|
15
|
+
**Read the diff/changed files FIRST.** All dimensions operate on actual code, never on assumptions. Descriptions lie; code doesn't.
|
|
16
|
+
|
|
17
|
+
## Dimension 1: Expected behavior model
|
|
18
|
+
|
|
19
|
+
From issue/spec/PR description: "what was this SUPPOSED to do?"
|
|
20
|
+
|
|
21
|
+
- Build a bypass signal checklist for this change type BEFORE scanning code
|
|
22
|
+
- If external lib involved: search `[lib] [version] anti-patterns common mistakes`
|
|
23
|
+
|
|
24
|
+
Output: 1-2 sentence behavior model + min 3 bypass signals to look for.
|
|
25
|
+
|
|
26
|
+
## Dimension 2: Assumptions
|
|
27
|
+
|
|
28
|
+
- Git blame: why was the original code written this way?
|
|
29
|
+
- Surface 3 assumptions, verify each (grep / blame / read)
|
|
30
|
+
|
|
31
|
+
Output: 3 assumptions + verification status each.
|
|
32
|
+
|
|
33
|
+
## Dimension 3: Scope
|
|
34
|
+
|
|
35
|
+
- "What if we do nothing?" considered?
|
|
36
|
+
- Scope of change proportional to the problem?
|
|
37
|
+
|
|
38
|
+
Output: counterfactual + proportionality judgment.
|
|
39
|
+
|
|
40
|
+
## Dimension 4: Code vs model + STRIDE + OPS
|
|
41
|
+
|
|
42
|
+
- Code matches expected behavior model? (grep-backed)
|
|
43
|
+
- All bypass signals checked from dimension 1's list?
|
|
44
|
+
- **STRIDE all 6 categories**: S / T / R / I / D / E — mark N/A explicitly, never skip silently
|
|
45
|
+
- OPS lens: unclosed connections, memory leaks, locks, 100x volume
|
|
46
|
+
|
|
47
|
+
### STRIDE reference
|
|
48
|
+
|
|
49
|
+
| Category | What to check |
|----------|--------------|
| **S**poofing | Authentication bypass, identity assumption |
| **T**ampering | Data integrity, unauthorized modification |
| **R**epudiation | Audit trail, logging completeness |
| **I**nformation disclosure | Data exposure, error messages, logs |
| **D**enial of service | Resource exhaustion, infinite loops, missing limits |
| **E**levation of privilege | Authorization bypass, role escalation |
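
The "never skip silently" rule can be checked mechanically. The following is a minimal sketch, assuming the STRIDE section of an audit has already been parsed into a dictionary keyed by category letter; `check_stride` and that dictionary shape are hypothetical and not part of Ciel.

```python
# Hypothetical helper: verifies that a STRIDE section is explicit for all 6 categories.
STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information disclosure",
    "D": "Denial of service",
    "E": "Elevation of privilege",
}


def check_stride(report: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the section is complete."""
    issues = []
    for letter, name in STRIDE.items():
        entry = report.get(letter, "").strip()
        if not entry:
            issues.append(f"{letter} ({name}): missing, mark N/A explicitly")
        elif entry.upper() == "N/A":
            # The output format expects "N/A because X", not a bare N/A.
            issues.append(f"{letter} ({name}): N/A needs a reason")
    entries = [report.get(letter, "").strip().upper() for letter in STRIDE]
    if all(e.startswith("N/A") for e in entries):
        issues.append("all six categories are N/A: probe each one again")
    return issues
```

An empty result only shows that every category was addressed; it says nothing about whether the probing behind each entry was adequate.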
|
|
57
|
+
|
|
58
|
+
## Dimension 5: Consistency
|
|
59
|
+
|
|
60
|
+
- Grep: pattern used consistently elsewhere in the codebase?
|
|
61
|
+
- Layer boundaries respected (no business logic in routes, no DB in controllers)?
|
|
62
|
+
- Health thresholds from overlay met (complexity, coverage)?
|
|
63
|
+
|
|
64
|
+
## Dimension 6: Findings with severity
|
|
65
|
+
|
|
66
|
+
Format: `RISQUE: X parce que Y — IMPACT: Z`
|
|
67
|
+
|
|
68
|
+
Severity levels:
|
|
69
|
+
- **BLOCKING** — must fix before merge (correctness, security, data loss). Requires specific FIX.
|
|
70
|
+
- **IMPORTANT** — should fix (degraded behavior, tech debt with near-term risk)
|
|
71
|
+
- **MINOR** — nice to fix (style, naming, low-risk improvement)
|
|
72
|
+
- **VALIDATED** — explicitly checked and confirmed correct
|
|
73
|
+
|
|
74
|
+
Every finding: RISQUE format. Every BLOCKING: specific FIX + NOT-X (what solution must NOT do).
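
As an illustration of these rules, here is a minimal sketch of a finding record that renders the RISQUE format and rejects a BLOCKING finding without FIX and NOT-X. The `Finding` class, its field names, and the ` | FIX:` / ` | NOT-X:` separators are assumptions made for the example; the skill only specifies the `RISQUE: X parce que Y ... IMPACT: Z` shape.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ("BLOCKING", "IMPORTANT", "MINOR", "VALIDATED")


@dataclass
class Finding:
    severity: str
    risk: str                    # X: what can go wrong
    cause: str                   # Y: why it can go wrong
    impact: str                  # Z: consequence
    fix: Optional[str] = None    # required for BLOCKING
    not_x: Optional[str] = None  # what the fix must NOT do (BLOCKING only)

    def problems(self) -> list[str]:
        """Return rule violations; an empty list means the finding is well-formed."""
        issues = []
        if self.severity not in SEVERITIES:
            issues.append(f"unknown severity {self.severity!r}")
        if self.severity == "BLOCKING":
            if not self.fix:
                issues.append("BLOCKING requires a specific FIX")
            if not self.not_x:
                issues.append("BLOCKING requires NOT-X")
        return issues

    def render(self) -> str:
        # \u2014 is the dash that separates the cause from IMPACT in the RISQUE format.
        line = (f"{self.severity}: RISQUE: {self.risk} parce que {self.cause}"
                f" \u2014 IMPACT: {self.impact}")
        if self.fix:
            line += f" | FIX: {self.fix}"      # separator is an assumption
        if self.not_x:
            line += f" | NOT-X: {self.not_x}"  # separator is an assumption
        return line
```

Calling `problems()` before `render()` keeps a non-actionable BLOCKING finding out of the report.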
|
|
75
|
+
|
|
76
|
+
## Dimension 7: Close the loop
|
|
77
|
+
|
|
78
|
+
- New anti-pattern found? → add to Guards or project overlay
|
|
79
|
+
- New failure mode? → add Guard immediately
|
|
80
|
+
- Capture learnings for future reference
|
|
81
|
+
|
|
82
|
+
## Output format
|
|
83
|
+
|
|
84
|
+
```
|
|
85
|
+
## AUDIT
|
|
86
|
+
|
|
87
|
+
### Expected behavior
|
|
88
|
+
<1-2 sentences + bypass signals>
|
|
89
|
+
|
|
90
|
+
### Assumptions
|
|
91
|
+
1. <assumption> — verified: <yes/no, evidence>
|
|
92
|
+
2. ...
|
|
93
|
+
3. ...
|
|
94
|
+
|
|
95
|
+
### Scope
|
|
96
|
+
- Nothing-counterfactual: <consequence if no change>
|
|
97
|
+
- Scope proportional: <yes/no, reason>
|
|
98
|
+
|
|
99
|
+
### Code vs model + STRIDE
|
|
100
|
+
- Code vs model: <matches | deviates at file:line>
|
|
101
|
+
- Bypass signals: <N/3 flagged>
|
|
102
|
+
- STRIDE:
|
|
103
|
+
- S: <N/A because X | RISQUE: ...>
|
|
104
|
+
- T/R/I/D/E: ...
|
|
105
|
+
|
|
106
|
+
### Consistency
|
|
107
|
+
- Pattern: <grep evidence>
|
|
108
|
+
- Layers: <clean | violation at file:line>
|
|
109
|
+
- Thresholds: <met | violation>
|
|
110
|
+
|
|
111
|
+
### Findings
|
|
112
|
+
BLOCKING: <RISQUE + FIX>
|
|
113
|
+
IMPORTANT: <RISQUE + FIX/ACCEPT>
|
|
114
|
+
MINOR: <note>
|
|
115
|
+
VALIDATED: <what was verified>
|
|
116
|
+
|
|
117
|
+
### Learnings
|
|
118
|
+
- New Guard: <yes/no>
|
|
119
|
+
- Overlay update: <yes/no>
|
|
120
|
+
```
|
|
121
|
+
|
|
122
|
+
## How to verify
|
|
123
|
+
|
|
124
|
+
- [ ] All 7 dimensions completed (Expected behavior, Assumptions, Scope, Code vs model + STRIDE, Consistency, Findings, Learnings)?
|
|
125
|
+
- [ ] All 6 STRIDE categories present (even if N/A)?
|
|
126
|
+
- [ ] Findings have severity (BLOCKING/IMPORTANT/MINOR)?
|
|
127
|
+
- [ ] VALIDATED section identifies what code got right?
|
|
128
|
+
- [ ] Learnings captured?
|
|
129
|
+
|
|
130
|
+
## Common mistakes
|
|
131
|
+
|
|
132
|
+
- **Operating from PR description alone**: always read the actual code
|
|
133
|
+
- **Skipping STRIDE categories**: all 6 must be explicit, even if N/A
|
|
134
|
+
- **BLOCKING without FIX**: if you can't name the fix, it's not actionable enough for BLOCKING
|
|
135
|
+
- **No VALIDATED section**: reviews that only report problems miss what the code got right
|
|
@@ -0,0 +1,134 @@
|
|
|
1
|
+
# critiquer-auditor — Reference
|
|
2
|
+
|
|
3
|
+
## STRIDE — audit probes (7-step audit context)
|
|
4
|
+
|
|
5
|
+
Use these probes when running COMPARER on each STRIDE category. Mark N/A explicitly; never skip.
|
|
6
|
+
|
|
7
|
+
### S — Spoofing
|
|
8
|
+
- Can I impersonate another user/service in this code path?
|
|
9
|
+
- Identity: client-supplied or server-resolved?
|
|
10
|
+
- WebSocket / SSE / GraphQL subscription: same auth as REST?
|
|
11
|
+
|
|
12
|
+
### T — Tampering
|
|
13
|
+
- Input modified in transit? HTTPS? Signatures?
|
|
14
|
+
- Idempotency keys present?
|
|
15
|
+
- CSRF protection on state-changing endpoints?
|
|
16
|
+
|
|
17
|
+
### R — Repudiation
|
|
18
|
+
- Audit log coverage: who, what, when recorded?
|
|
19
|
+
- Log integrity: append-only? remote-shipped?
|
|
20
|
+
|
|
21
|
+
### I — Information Disclosure
|
|
22
|
+
- Error messages: stack traces? SQL? paths?
|
|
23
|
+
- Logs: PII? secrets?
|
|
24
|
+
- Response bodies: over-fetching? unprojected columns?
|
|
25
|
+
- Resource enumeration: does a 404 vs 403 distinction (or a timing difference) reveal what exists?
|
|
26
|
+
|
|
27
|
+
### D — Denial of Service
|
|
28
|
+
- Rate limiting per IP/user/endpoint?
|
|
29
|
+
- Resource bounds: payload size, query depth, file upload?
|
|
30
|
+
- Algorithmic complexity on user-controlled input?
|
|
31
|
+
- Regex catastrophic backtracking?
|
|
32
|
+
|
|
33
|
+
### E — Elevation of Privilege
|
|
34
|
+
- Permission check BEFORE action?
|
|
35
|
+
- Horizontal escalation: can user A read user B's data?
|
|
36
|
+
- Vertical escalation: mass assignment setting `isAdmin`?
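
To make the mass-assignment probe concrete, here is a minimal sketch of the usual countermeasure: an explicit allowlist of client-updatable fields. The function name, the field names, and the dictionary-based user record are all hypothetical; a real codebase would apply the same idea in its own ORM or DTO layer.

```python
# Hypothetical allowlist: the only fields a client may set on its own profile.
ALLOWED_PROFILE_FIELDS = {"display_name", "email"}


def apply_profile_update(user: dict, payload: dict) -> dict:
    """Return an updated copy of `user`, refusing any field outside the allowlist."""
    rejected = set(payload) - ALLOWED_PROFILE_FIELDS
    if rejected:
        # A payload carrying e.g. "isAdmin" is rejected instead of merged blindly.
        raise ValueError(f"fields not updatable by clients: {sorted(rejected)}")
    return {**user, **payload}
```

During an audit, the bypass signal is the opposite pattern: a request body merged into a persisted object with no per-field filter.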
|
|
37
|
+
|
|
38
|
+
## Severity rubric
|
|
39
|
+
|
|
40
|
+
### BLOCKING
|
|
41
|
+
- Correctness bug: code produces wrong result for some input
|
|
42
|
+
- Security: any STRIDE finding that an attacker can exploit
|
|
43
|
+
- Data loss: delete/overwrite without backup/confirm
|
|
44
|
+
- Production crash: uncaught exception on common path
|
|
45
|
+
|
|
46
|
+
### IMPORTANT
|
|
47
|
+
- Degraded behavior: works but slow / intermittent
|
|
48
|
+
- Tech debt with near-term risk: pattern that will break at 2x current load
|
|
49
|
+
- Accessibility violation: keyboard/screen reader broken
|
|
50
|
+
- Test debt: feature ships without meaningful test
|
|
51
|
+
|
|
52
|
+
### MINOR
|
|
53
|
+
- Naming / style inconsistency
|
|
54
|
+
- Unused import
|
|
55
|
+
- Todo comment for future work
|
|
56
|
+
- Minor DRY violation (< 3 copies)
|
|
57
|
+
|
|
58
|
+
### VALIDATED
|
|
59
|
+
- Explicit callout of what was checked and confirmed correct
|
|
60
|
+
- Useful because it shows the reviewer's mental map
|
|
61
|
+
- Helps author understand what was covered vs skipped
|
|
62
|
+
|
|
63
|
+
## Counterfactual analysis
|
|
64
|
+
|
|
65
|
+
Questions to answer in QUESTIONNER step:
|
|
66
|
+
|
|
67
|
+
- What if we merged without this change? What breaks?
|
|
68
|
+
- Is there a 10% of this change that would solve 90% of the problem?
|
|
69
|
+
- Is this fixing a symptom or a cause? If symptom: where's the cause?
|
|
70
|
+
- Is this change reversible? If yes, risk is lower.
|
|
71
|
+
|
|
72
|
+
## Bypass signal checklist (build in APPRENDRE)
|
|
73
|
+
|
|
74
|
+
Common bypass signals to look for per framework:
|
|
75
|
+
|
|
76
|
+
### React / frontend
|
|
77
|
+
- `window.*` or `document.*` inside components
|
|
78
|
+
- `useEffect` with no dependency array
|
|
79
|
+
- Direct DOM manipulation via `ref.current`
|
|
80
|
+
- `dangerouslySetInnerHTML` with non-sanitized input
|
|
81
|
+
|
|
82
|
+
### Backend / JVM
|
|
83
|
+
- Raw SQL string concatenation
|
|
84
|
+
- `catch(Exception e) { }` or `catch → null`
|
|
85
|
+
- `as` cast without type guard (Kotlin) or unchecked cast (Java)
|
|
86
|
+
- Thread creation without pool
|
|
87
|
+
|
|
88
|
+
### Async / concurrent
|
|
89
|
+
- `async` function called without `await`
|
|
90
|
+
- Promise created but not awaited
|
|
91
|
+
- Race conditions on shared state
|
|
92
|
+
- Timeout of 0 or infinite
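
The first two signals are easy to miss in review because the code looks like an ordinary call. The sketch below shows the same failure in Python's `asyncio` (the list above is phrased for JavaScript promises, so this is an analogy, not the original context); `save` and `main` are hypothetical names.

```python
import asyncio


async def save(record: dict, store: list) -> None:
    await asyncio.sleep(0)  # stands in for real I/O
    store.append(record)


async def main() -> tuple[int, int]:
    store: list = []
    pending = save({"id": 1}, store)  # bypass signal: coroutine created, nothing awaited
    before = len(store)               # save() has not started running yet
    await pending                     # the fix: await it (or schedule it and keep the task)
    after = len(store)
    return before, after


result = asyncio.run(main())  # (0, 1)
```

If the `await pending` line were dropped, the record would never be stored and Python would only emit a "never awaited" warning, which is why this pattern deserves an explicit grep during an audit.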
|
|
93
|
+
|
|
94
|
+
## Layer boundary violations
|
|
95
|
+
|
|
96
|
+
- Business logic in routes / controllers → should be in services
|
|
97
|
+
- DB calls in controllers → should be behind repository
|
|
98
|
+
- UI logic in models → should be in view layer
|
|
99
|
+
- Tests reaching across layers without mocks
|
|
100
|
+
|
|
101
|
+
## Overlay thresholds
|
|
102
|
+
|
|
103
|
+
If `ciel-overlay.md` exists and contains a `## Santé du code` section, check its thresholds:
|
|
104
|
+
|
|
105
|
+
```
### Santé du code
- Complexité cyclomatique: < 15 par fonction
- Profondeur d'imbrication: < 4
- Taille de fonction: < 50 lignes
- Couverture test: > 80% lignes modifiées
```
|
|
112
|
+
|
|
113
|
+
If any violation: IMPORTANT finding (can be demoted to MINOR if tiny exceedance).
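
A minimal sketch of this grading rule follows, using the four thresholds from the overlay example. The reference does not define "tiny exceedance", so the 10% tolerance (`TINY`) is an assumption, as are the metric key names and the `grade` function itself.

```python
from typing import Optional

# (limit, kind): "max" metrics must stay below the limit, "min" metrics above it.
THRESHOLDS = {
    "cyclomatic_complexity": (15, "max"),
    "nesting_depth": (4, "max"),
    "function_lines": (50, "max"),
    "modified_line_coverage": (80, "min"),
}
TINY = 0.10  # assumption: a "tiny exceedance" is within 10% of the limit


def grade(metric: str, value: float) -> Optional[str]:
    """Return None if the threshold is met, else the severity of the finding."""
    limit, kind = THRESHOLDS[metric]
    met = value < limit if kind == "max" else value > limit
    if met:
        return None
    overshoot = abs(value - limit) / limit
    return "MINOR" if overshoot <= TINY else "IMPORTANT"
```

For example, `grade("cyclomatic_complexity", 16)` is demoted to `"MINOR"` under this tolerance, while a value of 30 stays `"IMPORTANT"`.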
|
|
114
|
+
|
|
115
|
+
## Learnings capture format (CAPITALISER)
|
|
116
|
+
|
|
117
|
+
When `learnings-capture` is invoked from CAPITALISER:
|
|
118
|
+
|
|
119
|
+
```
[YYYY-MM-DD] MISTAKE: <what happened, 1 line>
→ RULE: <how to avoid in future, 1 line>
→ Invoke: <which skill/guard catches this>
→ Evidence: <file:line where it was found>
```
|
|
125
|
+
|
|
126
|
+
This format feeds into `ciel-overlay.md` under `## Leçons projet` (project-specific) or `.claude/learnings.md` (general).
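
As a sketch of how such an entry could be produced consistently, the helper below builds the four-line block and enforces the one-line constraint. `learning_entry` is a hypothetical function, and the sample values (including the `src/db.py:42` location) are invented for illustration.

```python
from datetime import date


def learning_entry(mistake: str, rule: str, invoke: str, evidence: str, on: date) -> str:
    """Build one learnings entry; every field must fit on a single line."""
    fields = {"MISTAKE": mistake, "RULE": rule, "Invoke": invoke, "Evidence": evidence}
    for name, value in fields.items():
        if not value.strip() or "\n" in value:
            raise ValueError(f"{name} must be a single non-empty line")
    return "\n".join([
        f"[{on.isoformat()}] MISTAKE: {mistake}",
        f"→ RULE: {rule}",
        f"→ Invoke: {invoke}",
        f"→ Evidence: {evidence}",
    ])


entry = learning_entry(
    mistake="SQL built by string concatenation",
    rule="always use bound parameters",
    invoke="security-regression-check",
    evidence="src/db.py:42",  # hypothetical location
    on=date(2025, 1, 15),
)
```

Appending `entry` to the overlay or to `.claude/learnings.md` is left to the caller, since the target file depends on whether the lesson is project-specific.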
|
|
127
|
+
|
|
128
|
+
## Anti-patterns in audits
|
|
129
|
+
|
|
130
|
+
- Reviewing without reading the diff first → operating on assumptions
|
|
131
|
+
- STRIDE performed but all 6 "N/A" → didn't actually probe each category
|
|
132
|
+
- Only finding problems (no VALIDATED) → unclear what was checked
|
|
133
|
+
- BLOCKING without FIX → not actionable, author can't resolve
|
|
134
|
+
- Copying PR description into audit → pure theater, no independent thought
|
|
@@ -0,0 +1,174 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: debug-reasoning-rca
|
|
3
|
+
description: How to debug systematically — hypothesis-driven root cause analysis methodology. 3 parallel hypotheses, fault-type taxonomy (model/context/orchestration/environment), semantic diff between expected and actual behavior. For bugs, incidents, flaky tests, regressions, production failures.
|
|
4
|
+
allowed-tools: Read, Grep, Glob, Bash
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Systematic Debugging — Root Cause Analysis Methodology
|
|
8
|
+
|
|
9
|
+
## What this covers
|
|
10
|
+
|
|
11
|
+
How to find the real cause of a bug, not just patch the symptom. Default LLM failure: jump to the first plausible fix. Proper debugging is hypothesis-driven (Hunt & Thomas) and catches 75% more recurrences (STRATUS 2025).
|
|
12
|
+
|
|
13
|
+
## Core principle
|
|
14
|
+
|
|
15
|
+
**Never propose a fix before a hypothesis is SUPPORTED by evidence.** "It might be this, let me fix it" is forbidden.
|
|
16
|
+
|
|
17
|
+
## Step 1: Gather context
|
|
18
|
+
|
|
19
|
+
Before hypothesizing, understand the failure:
|
|
20
|
+
|
|
21
|
+
- **Read the error literally** — stack trace, log line, exit code. What does the system actually say?
|
|
22
|
+
- **Read the failing code** at the exact `file:line` from the trace
|
|
23
|
+
- **Check recent changes** — `git log -p --since="7 days ago" -- <scope>`. A recent bug usually has a recent cause.
|
|
24
|
+
- **Run the repro** once and capture full output
|
|
25
|
+
|
|
26
|
+
Skip this step = hypotheses based on vibes.
|
|
27
|
+
|
|
28
|
+
## Step 2: Generate 3 hypotheses
|
|
29
|
+
|
|
30
|
+
Generate EXACTLY 3 **causally distinct** hypotheses. Not 3 variants of the same theory.
|
|
31
|
+
|
|
32
|
+
Format:
|
|
33
|
+
```
H<n>: <cause> → <mechanism> → <observable effect>
Evidence for: <what would be true if correct>
Evidence against: <what would be true if wrong>
Fault-type: [MODEL | CONTEXT | ORCHESTRATION | ENVIRONMENT]
```
|
|
39
|
+
|
|
40
|
+
### Fault-type taxonomy
|
|
41
|
+
|
|
42
|
+
| Type | What it means | Example |
|------|--------------|---------|
| **MODEL** | Code logic wrong | Off-by-one, wrong algorithm, wrong assumption |
| **CONTEXT** | Missing/stale input | Wrong config, race window, state leak |
| **ORCHESTRATION** | Infrastructure misconfigured | Retry/timeout wrong, queue backlog |
| **ENVIRONMENT** | External change | Dependency drift, OS change, infra outage |
|
|
48
|
+
|
|
49
|
+
**Distribution rule**: hypotheses must span AT LEAST 2 fault-types. Three MODEL hypotheses = tunnel vision.
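
The distribution rule is simple enough to check automatically. A minimal sketch, assuming the fault-type of each hypothesis has been collected into a list; `check_distribution` is a hypothetical helper, not part of the skill.

```python
FAULT_TYPES = ("MODEL", "CONTEXT", "ORCHESTRATION", "ENVIRONMENT")


def check_distribution(fault_types: list[str]) -> list[str]:
    """Return violations of the 3-hypothesis and distribution rules."""
    issues = []
    if len(fault_types) != 3:
        issues.append(f"expected exactly 3 hypotheses, got {len(fault_types)}")
    for fault_type in fault_types:
        if fault_type not in FAULT_TYPES:
            issues.append(f"unknown fault-type {fault_type!r}")
    if len(set(fault_types)) < 2:
        issues.append("all hypotheses share one fault-type: tunnel vision")
    return issues
```

The check covers the labels only; whether three hypotheses are causally distinct still needs human judgment.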
|
|
50
|
+
|
|
51
|
+
## Step 3: Validate (targeted checks)
|
|
52
|
+
|
|
53
|
+
For each hypothesis, run ONE targeted check (not fix):
|
|
54
|
+
|
|
55
|
+
- MODEL → add a log line or unit test asserting the expected invariant
|
|
56
|
+
- CONTEXT → dump actual input/config at failure point; diff vs expected
|
|
57
|
+
- ORCHESTRATION → check retry count, timeout, queue depth at failure time
|
|
58
|
+
- ENVIRONMENT → `<pkg-mgr> list | grep <dep>` vs lockfile; `uname -a`
|
|
59
|
+
|
|
60
|
+
Record: evidence collected, hypothesis supported/refuted/inconclusive.
|
|
61
|
+
|
|
62
|
+
## Step 4: Semantic diff
|
|
63
|
+
|
|
64
|
+
Once supported, write the diff between expected and actual:
|
|
65
|
+
|
|
66
|
+
```
|
|
67
|
+
EXPECTED: <behavior that should happen>
|
|
68
|
+
ACTUAL: <behavior that happens>
|
|
69
|
+
GAP: <precise mechanism>
|
|
70
|
+
ROOT: <why the gap exists — not "because of the bug", the underlying why>
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
If ROOT reads like "because the code is buggy" — you've only found the symptom. Ask "why" again.
|
|
74
|
+
|
|
75
|
+
## Step 5: Fix (two layers)
|
|
76
|
+
|
|
77
|
+
- **Direct fix** — address the supported hypothesis (the bug itself)
|
|
78
|
+
- **Systemic fix** — address why the bug was possible (missing test, missing alert, missing type)
|
|
79
|
+
|
|
80
|
+
Systemic fix is the 75% MTTR-reduction lever. Don't skip it on Critical bugs.
|
|
81
|
+
|
|
82
|
+
## Output format
|
|
83
|
+
|
|
84
|
+
```
|
|
85
|
+
## RCA VERDICT
|
|
86
|
+
|
|
87
|
+
### Symptom
|
|
88
|
+
<1 sentence>
|
|
89
|
+
|
|
90
|
+
### Repro
|
|
91
|
+
<exact command or "flaky — triggers ~1/N runs">
|
|
92
|
+
|
|
93
|
+
### Hypotheses explored
|
|
94
|
+
H1 [MODEL]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
95
|
+
H2 [CONTEXT]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
96
|
+
H3 [ORCHESTRATION]: <cause> — <supported|refuted|inconclusive> — <evidence>
|
|
97
|
+
|
|
98
|
+
### Root cause
|
|
99
|
+
<hypothesis number>: <cause>
|
|
100
|
+
|
|
101
|
+
### Semantic diff
|
|
102
|
+
EXPECTED/ACTUAL/GAP/ROOT
|
|
103
|
+
|
|
104
|
+
### Fix
|
|
105
|
+
- Direct: <exact code change>
|
|
106
|
+
- Systemic: <test/alert/process to add>
|
|
107
|
+
|
|
108
|
+
### Confidence
|
|
109
|
+
HIGH | MEDIUM | LOW — <why>
|
|
110
|
+
```
|
|
111
|
+
|
|
112
|
+
## Auto-inference (before asking the user)
|
|
113
|
+
|
|
114
|
+
Exhaust these sources before flagging input as unknown:
|
|
115
|
+
|
|
116
|
+
- **SYMPTOM** → grep last error in user's prompt; tail service logs; check recent PR descriptions
|
|
117
|
+
- **REPRO** → read `package.json` scripts, `Makefile`, `README.md`, test files, CI workflow
|
|
118
|
+
- **SCOPE** → `git diff HEAD~10 --stat` then rank by overlap with symptom keywords
|
|
119
|
+
- **RECENT_CHANGES** → `git log --since="7 days ago" --oneline -- <scope>`
|
|
120
|
+
|
|
121
|
+
State inferred values as `[ASSUMED from <source>]`. Only flag as `[UNKNOWN]` if truly blocking.
|
|
122
|
+
|
|
123
|
+
## How to verify
|
|
124
|
+
|
|
125
|
+
- [ ] Exactly 3 causally distinct hypotheses generated (not just 1)?
|
|
126
|
+
- [ ] Each hypothesis has a fault type from the taxonomy?
|
|
127
|
+
- [ ] Semantic diff completed (EXPECTED, ACTUAL, GAP, ROOT)?
|
|
128
|
+
- [ ] Root cause identified with evidence (file:line)?
|
|
129
|
+
- [ ] Fix addresses root cause, not symptom?
|
|
130
|
+
- [ ] Confidence level stated (HIGH/MEDIUM/LOW)?
|
|
131
|
+
|
|
132
|
+
## Anti-patterns
|
|
133
|
+
|
|
134
|
+
- **Patch-the-symptom**: add try/catch without understanding WHY it failed
|
|
135
|
+
- **Fix-the-test**: modify assertion to match wrong behavior instead of fixing code
|
|
136
|
+
- **Guess-and-check**: 5 commits titled "try fix" — no hypothesis discipline
|
|
137
|
+
- **First-hypothesis-wins**: commit first theory without validating alternatives
|
|
138
|
+
- **No repro, no RCA**: chasing intermittent bugs without deterministic repro burns hours
|
|
139
|
+
|
|
140
|
+
## Structured RCA methods (complementary)
|
|
141
|
+
|
|
142
|
+
The 3-hypothesis method above is the default — fast, hypothesis-driven, good for most bugs. For complex, recurrent, or systemic problems, these structured RCA methods add depth.
|
|
143
|
+
|
|
144
|
+
### Decision guide
|
|
145
|
+
|
|
146
|
+
| Problem type | Method | Why |
|-------------|--------|-----|
| Linear, single-symptom | **3 hypotheses** (default) | Fastest: parallel hypotheses, minimal overhead |
| Recurrent incident, process failure | **5 Whys** | Iterative questioning reaches systemic root cause |
| Multi-factor, need exhaustive exploration | **Ishikawa (Fishbone)** | 6M families (Method/Machine/Manpower/Material/Milieu/Measurement) guide complete coverage |
| Multi-layer, complex system | **Drill Down / Tree Diagram** | Decompose recursively (build → deploy → runtime → data) into atomic sub-causes; visualize as tree |
| Interacting causes, feedback loops | **Relations Diagram** | Map causal links, count outbound/inbound arrows to find drivers vs effects |
|
|
153
|
+
|
|
154
|
+
**When to use the full sequence**: if the problem involves ≥ 3 interacting factors across distinct system layers, use the full chain: Ishikawa (explore) → Relations Diagram (map interactions) → 5 Whys on each promising node → Tree Diagram (document). For simpler problems, pick one method from the guide.
|
|
155
|
+
|
|
156
|
+
### 5 Whys
|
|
157
|
+
|
|
158
|
+
Ask "why?" iteratively (5× typical) on the symptom. Each answer becomes the next question. Stop when the cause is systemic/process-level, not technical. **Anti-pattern**: stopping at "error 500" — the real cause may be "no integration test catches this path."
|
|
159
|
+
|
|
160
|
+
### Ishikawa (Fishbone)
|
|
161
|
+
|
|
162
|
+
Draw a horizontal spine ending at the problem (fish head). Add diagonal bones for 6 families: Method, Machine, Manpower, Material, Milieu, Measurement (adapt to software: Technology, Data/API). Branch sub-causes off each family. **Anti-pattern**: filling every family superficially — depth > breadth.
|
|
163
|
+
|
|
164
|
+
### Drill Down / Tree Diagram
|
|
165
|
+
|
|
166
|
+
Decompose the problem into 2-4 MECE (mutually exclusive, collectively exhaustive) sub-causes at each level, recursing until each is atomic (directly fixable). Visualize the result as a hierarchical tree with AND/OR logic per branch. The two names describe two halves of the same analytical process: decomposition (Drill Down) and visualization (Tree Diagram). **Anti-pattern**: stopping at shallow levels. "Module X crashes" isn't actionable; "method Y throws Z when condition W" is.
|
|
167
|
+
|
|
168
|
+
### Relations Diagram
|
|
169
|
+
|
|
170
|
+
List all discovered factors. For each pair, ask if causation exists and in which direction. Draw arrows. Count outbound (drivers) vs inbound (effects). Nodes with the most outbound arrows are root cause candidates. **Anti-pattern**: connecting everything — if most factors connect to most others, the diagram is not discriminating; focus on clear causal links only.
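
The arrow counting can be sketched in a few lines once the causal links are written down as `(cause, effect)` pairs. `rank_drivers` and the example factors are hypothetical; the ranking rule (most outbound arrows first, fewest inbound as tie-break) is one reasonable reading of the method, not a prescribed one.

```python
from collections import Counter


def rank_drivers(edges: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Rank factors as (name, outbound, inbound), likeliest root causes first."""
    outbound = Counter(cause for cause, _ in edges)
    inbound = Counter(effect for _, effect in edges)
    nodes = set(outbound) | set(inbound)
    rows = [(node, outbound[node], inbound[node]) for node in nodes]
    # Most outbound arrows first; fewer inbound arrows breaks ties; name keeps order stable.
    return sorted(rows, key=lambda row: (-row[1], row[2], row[0]))


# Hypothetical incident: each pair reads "cause -> effect".
edges = [
    ("no integration test", "unvalidated config"),
    ("no integration test", "error 500"),
    ("unvalidated config", "error 500"),
    ("unvalidated config", "retry storm"),
    ("retry storm", "error 500"),
]
ranked = rank_drivers(edges)
```

Here `ranked[0]` is `("no integration test", 2, 0)`, a pure driver, and the last row is `("error 500", 0, 3)`, a pure effect.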
|
|
171
|
+
|
|
172
|
+
## Key insight
|
|
173
|
+
|
|
174
|
+
The hardest part of debugging is not finding the fix — it's resisting the urge to fix before understanding. The 3-hypothesis discipline forces you to consider alternatives before committing to one.
|
|
@@ -0,0 +1,118 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: depth-classifier
|
|
3
|
+
description: Classifies a coding task as Trivial, Standard, or Critical based on mechanical signals (auth paths, security code, DB tables, diff size, route handlers). Use at the start of every Ciel workflow to determine which downstream skills to invoke. Returns a one-word depth + rationale + pipeline recommendation.
|
|
4
|
+
allowed-tools: Read, Grep, Glob
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# depth-classifier — Classify task depth
|
|
8
|
+
|
|
9
|
+
Gatekeeper skill at the entry of every Ciel workflow. Wrong classification = wrong depth = either waste (over-processing trivial) or risk (under-processing critical).
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Inputs
|
|
14
|
+
|
|
15
|
+
- **task**: the task description in natural language (from `/ciel <task>` or user message)
|
|
16
|
+
- **project-root** (optional): absolute path, defaults to CWD
|
|
17
|
+
- **overlay** (optional): `ciel-overlay.md` content if available
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Classification signals
|
|
22
|
+
|
|
23
|
+
### Critical if ANY match:
|
|
24
|
+
|
|
25
|
+
- Path patterns: `auth/`, `security/`, `Token`, `Password`, `Secret`, `Session`, `Crypto`
|
|
26
|
+
- DB table names: `users`, `sessions`, `tokens`, `accounts`, `credentials`, `2fa`, `api_keys`
|
|
27
|
+
- Code patterns: `.executeQuery`, `.executeUpdate`, raw SQL, `userId` (server-provided vs client-provided), `role`, `permission`
|
|
28
|
+
- Task keywords: "authentication", "authorization", "payment", "migration (DB schema)", "JWT", "OAuth", "encryption", "2FA", "session"
|
|
29
|
+
- Scope: touches user data, money, audit trails
|
|
30
|
+
|
|
31
|
+
### Standard if ANY match (and not Critical):
|
|
32
|
+
|
|
33
|
+
- Path patterns: `routes/`, `controllers/`, `services/`, `components/`, `hooks/`
|
|
34
|
+
- **CI/CD & pipeline files**: `.github/workflows/*.yml`, `.gitlab-ci.yml`, `.circleci/`, `Dockerfile`, `docker-compose*.yml`, `Jenkinsfile`, `.buildkite/`, `.drone.yml`
|
|
35
|
+
- **PR-review signals**:
|
|
36
|
+
- Prompt contains a PR number (`#\d+`, `PR \d+`, `pull request \d+`) OR phrases "open PR", "review PR", "fix PR", "merge PR"
|
|
37
|
+
- Planned tool calls include `gh pr list`, `gh pr view`, `gh pr checks`, `gh pr review`, `gh pr merge` (any variant: `--auto`, `--squash`, `--merge`, `--rebase`)
|
|
38
|
+
- Planned edits touch any CI/CD pipeline file (see row above)
|
|
39
|
+
- Diff scope (estimated): > 1 file OR > 50 lines change
|
|
40
|
+
- Code patterns: `validate`, `sanitize`, `rateLimit`, route handlers, state management
|
|
41
|
+
- Task keywords: "add endpoint", "new component", "refactor", "extract helper", "feature", "integration"
|
|
42
|
+
|
|
43
|
+
**Floor rule**: if ANY PR-review signal OR any CI/CD-file signal is present, depth is **at minimum Standard** — Trivial is disqualified even if the diff is small. PR review plus CI fix is never "just a one-line change".
|
|
44
|
+
|
|
45
|
+
### Trivial otherwise:
|
|
46
|
+
|
|
47
|
+
- Rename, typo, 1-line fix, copyright update, README edit
|
|
48
|
+
- Single-file localized change ≤ 10 lines
|
|
49
|
+
- No business logic change
|
|
50
|
+
|
|
51
|
+
### Default rule
|
|
52
|
+
|
|
53
|
+
If unsure → **Standard**. If touching user data or auth → **Critical**.
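
The signals above can be approximated mechanically. The sketch below covers only a subset (path patterns, task keywords, PR and CI/CD signals, diff size) and ignores code patterns and DB table names; `classify`, its parameters, and the crude substring matching are assumptions, and the "look at the actual code change" judgment still has to happen outside it.

```python
import re

CRITICAL_PATH = re.compile(r"auth/|security/|Token|Password|Secret|Session|Crypto")
CRITICAL_WORDS = ("authentication", "authorization", "payment", "jwt",
                  "oauth", "encryption", "2fa", "session")
STANDARD_PATH = re.compile(r"routes/|controllers/|services/|components/|hooks/")
CI_FILE = re.compile(
    r"\.github/workflows/.*\.yml$|\.gitlab-ci\.yml$|\.circleci/|Dockerfile$"
    r"|docker-compose.*\.yml$|Jenkinsfile$|\.buildkite/|\.drone\.yml$"
)
PR_SIGNAL = re.compile(
    r"#\d+|\bPR \d+|pull request \d+|\b(?:open|review|fix|merge) PR\b",
    re.IGNORECASE,
)


def classify(task: str, paths: list[str], changed_lines: int) -> str:
    """Return 'Trivial', 'Standard' or 'Critical' from a subset of the signals."""
    text = task.lower()
    if any(CRITICAL_PATH.search(p) for p in paths) or any(w in text for w in CRITICAL_WORDS):
        return "Critical"
    # Floor rule: PR-review or CI/CD signals rule out Trivial.
    floor = bool(PR_SIGNAL.search(task)) or any(CI_FILE.search(p) for p in paths)
    if (floor or any(STANDARD_PATH.search(p) for p in paths)
            or len(paths) > 1 or changed_lines > 50):
        return "Standard"
    if len(paths) == 1 and changed_lines <= 10:
        return "Trivial"
    return "Standard"  # default rule: unsure means Standard
```

For instance, `classify("bump node version, see PR 42", [".github/workflows/ci.yml"], 1)` returns `"Standard"` because of the floor rule, even though the diff is a single line.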
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Pipeline recommendations
|
|
58
|
+
|
|
59
|
+
Return pipeline for each depth:
|
|
60
|
+
|
|
61
|
+
### Trivial
|
|
62
|
+
`quoi-framer` → `pattern-fitness-check` → `faire-gatekeeper` → `relire-critic` (inline) → push → `meta-critiquer`
|
|
63
|
+
|
|
64
|
+
### Standard
|
|
65
|
+
`quoi-framer` → `avec-quoi-versioner` → [researcher agent + explorer agent IN PARALLEL] → `evaluer-sizer` → `faire-gatekeeper` → `critic` agent MODE=RELIRE → `prouver-verifier` → `meta-critiquer`
|
|
66
|
+
|
|
67
|
+
### Critical
|
|
68
|
+
All of Standard + `stride-analyzer` (after `avec-quoi-versioner`) + `security-regression-check` (between FAIRE and RELIRE) + critic agent MANDATORY
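
The three pipelines can be expressed as data, with Critical derived from Standard. This is a sketch under two assumptions: that "FAIRE" corresponds to the `faire-gatekeeper` step and "RELIRE" to the critic step, and that the parallel researcher/explorer agents can be represented as one list entry. `pipeline` and `insert_after` are hypothetical helpers.

```python
TRIVIAL = ["quoi-framer", "pattern-fitness-check", "faire-gatekeeper",
           "relire-critic (inline)", "push", "meta-critiquer"]
STANDARD = ["quoi-framer", "avec-quoi-versioner",
            "researcher agent + explorer agent (parallel)",
            "evaluer-sizer", "faire-gatekeeper", "critic agent MODE=RELIRE",
            "prouver-verifier", "meta-critiquer"]


def insert_after(steps: list[str], anchor: str, new_step: str) -> list[str]:
    """Return a copy of `steps` with `new_step` placed right after `anchor`."""
    i = steps.index(anchor)
    return steps[:i + 1] + [new_step] + steps[i + 1:]


def pipeline(depth: str) -> list[str]:
    if depth == "Trivial":
        return list(TRIVIAL)
    if depth == "Standard":
        return list(STANDARD)
    if depth == "Critical":
        steps = insert_after(STANDARD, "avec-quoi-versioner", "stride-analyzer")
        # Assumption: FAIRE is the faire-gatekeeper step, so the check lands before RELIRE.
        return insert_after(steps, "faire-gatekeeper", "security-regression-check")
    raise ValueError(f"unknown depth {depth!r}")
```

Deriving Critical from Standard keeps the two lists from drifting apart when a Standard step is added or renamed.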
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## Output format
|
|
73
|
+
|
|
74
|
+
```
|
|
75
|
+
## DEPTH CLASSIFICATION
|
|
76
|
+
|
|
77
|
+
Depth: **Trivial | Standard | Critical**
|
|
78
|
+
|
|
79
|
+
Signals detected:
|
|
80
|
+
- [signal 1 with source — e.g. "path matches /auth/"]
|
|
81
|
+
- [signal 2]
|
|
82
|
+
|
|
83
|
+
Rationale: [1-2 sentences]
|
|
84
|
+
|
|
85
|
+
Pipeline:
|
|
86
|
+
1. <skill>
|
|
87
|
+
2. <skill>
|
|
88
|
+
...
|
|
89
|
+
|
|
90
|
+
Agents required:
|
|
91
|
+
- [researcher: yes/no]
|
|
92
|
+
- [explorer: yes/no]
|
|
93
|
+
- [critic: yes/no]
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
## Guardrails
|
|
99
|
+
|
|
100
|
+
- **Asymmetric bias**: when borderline between Trivial/Standard → Standard wins. When borderline between Standard/Critical → Critical wins. Missing a Critical is worse than over-processing a Standard.
|
|
101
|
+
- **Auth/security override**: any mention of auth, credentials, tokens, or user identity → Critical regardless of diff size
|
|
102
|
+
- **Single-line fix can still be Critical**: e.g. a 1-char fix in an auth check is Critical
|
|
103
|
+
- **Don't infer from filename alone**: `UserService.kt` could be Trivial if the change is a rename. Look at the actual code change being proposed.
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## How to verify
|
|
108
|
+
|
|
109
|
+
- [ ] Classification signals checked (Critical, Standard, Trivial)?
|
|
110
|
+
- [ ] Pipeline recommendation provided?
|
|
111
|
+
- [ ] Default rule applied (Unsure → Standard)?
|
|
112
|
+
- [ ] Auth/security files → Critical?
|
|
113
|
+
|
|
114
|
+
## When triggered
|
|
115
|
+
|
|
116
|
+
- Automatically at start of `/ciel <task>` via the `ciel` orchestrator
|
|
117
|
+
- By `UserPromptSubmit` hook (light classification hint injected into context)
|
|
118
|
+
- Explicitly when depth is ambiguous after initial assessment
|
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: diverge
|
|
3
|
+
description: How to explore 2-3 radically different approaches before choosing one (Ciel v5 step 5). Used after AVEC QUOI, before RECHERCHE. Prevents single-approach bias and premature convergence. Use when the task is non-trivial and there are multiple valid approaches.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Divergent Exploration — 2-3 Approaches Before Choosing (Ciel v5)
|
|
7
|
+
|
|
8
|
+
## What this covers
|
|
9
|
+
|
|
10
|
+
How to explore multiple approaches before committing to one. In Ciel v5, this is step 5 (DIVERGE). The goal is to avoid premature convergence on the first viable approach that comes to mind.
|
|
11
|
+
|
|
12
|
+
## Core principle
|
|
13
|
+
|
|
14
|
+
**Generate 2-3 approaches before evaluating any of them.** The first approach that works is rarely the best. In v5, DIVERGE happens after AVEC QUOI (versions checked) and before RECHERCHE (external research).
|
|
15
|
+
|
|
16
|
+
## When to use
|
|
17
|
+
|
|
18
|
+
- Non-trivial tasks with multiple valid solutions
|
|
19
|
+
- Architectural decisions
|
|
20
|
+
- Library/framework choices
|
|
21
|
+
- Design patterns
|
|
22
|
+
- Database schema design
|
|
23
|
+
- API design
|
|
24
|
+
|
|
25
|
+
**When NOT to use**: 1-line fix, rename, trivial config change, obvious solution.
|
|
26
|
+
|
|
27
|
+
## The process
|
|
28
|
+
|
|
29
|
+
### Step 1: Generate at least 2 approaches
|
|
30
|
+
|
|
31
|
+
For each approach, describe:
|
|
32
|
+
- What it does (1-2 sentences)
|
|
33
|
+
- Key trade-offs (not "it's better" -- specific pros/cons)
|
|
34
|
+
- Implementation effort (rough estimate)
|
|
35
|
+
- Risk level (low/medium/high)
|
|
36
|
+
|
|
37
|
+
Approaches should be GENUINELY different. "Use React vs use React with hooks" is the same approach twice. Two contrasting examples, the first too shallow and the second acceptable:
|
|
38
|
+
- "Use PostgreSQL vs MySQL" (trivial database choice)
|
|
39
|
+
- "Use REST vs GraphQL" (genuinely different)
|
|
40
|
+
|
|
41
|
+
### Step 2: Let them compete (do not decide yet)
|
|
42
|
+
|
|
43
|
+
Generate approaches WITHOUT evaluating them. Evaluation happens in EVALUER (step 9), after RECHERCHE has gathered external data about each approach.
|
|
44
|
+
|
|
45
|
+
Common trap: generating 2 approaches but immediately choosing the first one without research.
|
|
46
|
+
|
|
47
|
+
### Step 3: Document for EVALUER
|
|
48
|
+
|
|
49
|
+
Pass all generated approaches (with their trade-offs, effort, risk) to the EVALUER phase. The researcher should check documentation for EACH approach, not only the favorite.
|
|
50
|
+
|
|
51
|
+
## Output format
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
## DIVERGE
|
|
55
|
+
|
|
56
|
+
### Approach A: <name>
|
|
57
|
+
What: <1-2 sentences>
|
|
58
|
+
Trade-offs:
|
|
59
|
+
+ <pro>
|
|
60
|
+
- <con>
|
|
61
|
+
Effort: <XS/S/M/L/XL>
|
|
62
|
+
Risk: <low/medium/high>
|
|
63
|
+
|
|
64
|
+
### Approach B: <name>
|
|
65
|
+
What: <1-2 sentences>
|
|
66
|
+
Trade-offs:
|
|
67
|
+
+ <pro>
|
|
68
|
+
- <con>
|
|
69
|
+
Effort: <XS/S/M/L/XL>
|
|
70
|
+
Risk: <low/medium/high>
|
|
71
|
+
|
|
72
|
+
### (Optional) Approach C: <name>
|
|
73
|
+
...
|
|
74
|
+
```
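
A minimal sketch of how the DIVERGE block above could be generated from structured data. The `Approach` record and `render_diverge` are hypothetical; note that the record deliberately has no "winner" or "score" field, since evaluation belongs to EVALUER.

```python
from dataclasses import dataclass

EFFORTS = ("XS", "S", "M", "L", "XL")
RISKS = ("low", "medium", "high")


@dataclass
class Approach:
    name: str
    what: str
    pros: list[str]
    cons: list[str]
    effort: str
    risk: str


def render_diverge(approaches: list[Approach]) -> str:
    """Render 2-3 approaches in the DIVERGE layout, without ranking them."""
    if not 2 <= len(approaches) <= 3:
        raise ValueError("DIVERGE expects 2-3 approaches")
    blocks = ["## DIVERGE"]
    for label, a in zip("ABC", approaches):
        if a.effort not in EFFORTS or a.risk not in RISKS:
            raise ValueError(f"invalid effort or risk for {a.name!r}")
        lines = [f"### Approach {label}: {a.name}", f"What: {a.what}", "Trade-offs:"]
        lines += [f"+ {pro}" for pro in a.pros]
        lines += [f"- {con}" for con in a.cons]
        lines += [f"Effort: {a.effort}", f"Risk: {a.risk}"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```

Rejecting a single-approach input at render time makes "I already know the best approach" fail loudly instead of silently skipping the step.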
|
|
75
|
+
|
|
76
|
+
## Common rationalizations
|
|
77
|
+
|
|
78
|
+
| Rationalization | Reality |
|---|---|
| "I already know the best approach" | You know the first approach that came to mind. That's not the same as the best approach. Generate 2-3 then compare. |
| "Diverging takes too long" | It takes 5 minutes. Committing to the wrong approach costs days. The math is clear. |
| "There's only one valid way to do this" | There are almost always 2+ valid approaches. If you can't think of alternatives, you don't understand the problem well enough. |
|
|
83
|
+
|
|
84
|
+
## How to verify
|
|
85
|
+
|
|
86
|
+
- [ ] >= 2 genuinely different approaches generated?
|
|
87
|
+
- [ ] Approaches are different in kind, not degree?
|
|
88
|
+
- [ ] Trade-offs documented for each?
|
|
89
|
+
- [ ] Effort estimated?
|
|
90
|
+
- [ ] Risk assessed?
|
|
91
|
+
- [ ] Evaluation deferred to next phase (EVALUER)?
|