@neikyun/ciel 6.10.1 → 6.11.1

Files changed (40)
  1. package/assets/.claude/hooks/memory-engine.py +256 -0
  2. package/assets/commands/ciel-audit.md +42 -0
  3. package/assets/commands/ciel-create-skill.md +2 -2
  4. package/assets/commands/ciel-status.md +1 -1
  5. package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
  6. package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
  7. package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
  8. package/assets/skills/ciel/SKILL.md +2 -1
  9. package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
  10. package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
  11. package/assets/skills/workflow/ask-window/SKILL.md +119 -0
  12. package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
  13. package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
  14. package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
  15. package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
  16. package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
  17. package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
  18. package/assets/skills/workflow/diverge/SKILL.md +91 -0
  19. package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
  20. package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
  21. package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
  22. package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
  23. package/assets/skills/workflow/memoire/SKILL.md +198 -0
  24. package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
  25. package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
  26. package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
  27. package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
  28. package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
  29. package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
  30. package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
  31. package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
  32. package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
  33. package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
  34. package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
  35. package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
  36. package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
  37. package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
  38. package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
  39. package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
  40. package/package.json +1 -1
@@ -0,0 +1,91 @@
1
+ ---
2
+ name: quoi-framer
3
+ description: How to frame a task before starting. Forces an explicit goal, a NOT-X constraint, a shared intention, and a measurable definition of done. For Ciel v5 pipeline step 2 (QUOI). Use after the DOCS phase, before the ASK phase.
4
+ ---
5
+
6
+ # Task Framing — Define Before You Start (Ciel v5)
7
+
8
+ ## What this covers
9
+
10
+ How to define a task clearly before doing any work. This is step 2 (QUOI) of the Ciel v5 pipeline, applied after the DOCS phase (step 1) and before ASK (step 3). It prevents scope drift, wasted research, and "I thought you meant..." conversations.
11
+
12
+ ## Core principle
13
+
14
+ **State the goal, the constraint, the intention, and the done criteria BEFORE researching or coding.** If you can't state these in 5 lines, you don't understand the task yet.
15
+
16
+ ## The 5 output gates (ALL required)
17
+
18
+ ### 1. Expected result
19
+
20
+ One sentence. Concrete and testable.
21
+
22
+ - BAD: "Improve the API"
23
+ - GOOD: "GET /api/users returns a paginated list with page+limit query params"
24
+
25
+ ### 2. NOT-X constraint
26
+
27
+ At least 1 concrete thing the solution MUST NOT do:
28
+ - "NOT-X: no N+1 queries"
29
+ - "NOT-X: no new dependencies added"
30
+ - "NOT-X: no breaking changes to existing callers"
31
+ - "NOT-X: no schema migration"
32
+
33
+ "no bad code" is not NOT-X. "No global state mutation" is.
34
+
35
+ ### 3. Shared intentions (v5)
36
+
37
+ State what you are looking for, not what you expect to find. This guides exploration without biasing it:
38
+ - BAD: "Find where pdfmake is used for PDF export" (searches for one specific solution)
39
+ - GOOD: "Understand how exports are handled in this project" (open intention)
40
+
41
+ The intention is passed to @ciel-explorer to guide scent-following without creating confirmation bias.
42
+
43
+ ### 4. Definition of done
44
+
45
+ Measurable before research starts:
46
+ - "Done when: endpoint returns 200 with `{items, total, page}` shape, test passes on staging, no perf regression vs baseline"
47
+
48
+ "done when it works" is not acceptable. Specify the observable signal.
49
+
50
+ ### 5. DOCS gate (v5)
51
+
52
+ Before framing, verify that documentation has been read:
53
+ - README.md (project overview and conventions)
54
+ - ADRs if they exist (architecture decisions)
55
+ - Tickets/specs (requirements context)
56
+ - .ciel/map.json (existing project map)
57
+ - ciel-overlay.md (project overlay)
58
+
59
+ ## Output format
60
+
61
+ ```
62
+ ## QUOI
63
+
64
+ Expected result: <one sentence>
65
+ NOT-X: <concrete constraint>
66
+ Intentions: <what I'm looking for (open question)>
67
+ Done when: <measurable criteria>
68
+ Docs read: <yes — README, ADRs, map, tickets>
69
+ ```
70
+
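The output format above lends itself to a mechanical completeness check. The following is a minimal sketch, not part of the package: the function name `missing_quoi_fields` and the example frame are illustrative assumptions, and only the five field labels come from the format itself.

```python
import re

# Field labels taken from the QUOI output format above.
REQUIRED_FIELDS = ("Expected result", "NOT-X", "Intentions", "Done when", "Docs read")


def missing_quoi_fields(frame):
    """Return the required labels that are absent or left empty in a QUOI block."""
    missing = []
    for label in REQUIRED_FIELDS:
        # A field counts as filled when "<label>: <non-empty text>" sits on one line.
        pattern = rf"^{re.escape(label)}:[ \t]*\S"
        if not re.search(pattern, frame, flags=re.MULTILINE):
            missing.append(label)
    return missing


# Hypothetical frame that forgot the DOCS gate.
example = """## QUOI

Expected result: GET /api/users returns a paginated list with page+limit query params
NOT-X: no N+1 queries
Intentions: Understand how pagination is handled in this project
Done when: endpoint returns 200 with {items, total, page} shape
"""

print(missing_quoi_fields(example))
```

The check only verifies presence. Whether a NOT-X is concrete or a done criterion is measurable still needs a human or model judgment.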
71
+ ## Common rationalizations
72
+
73
+ | Rationalization | Reality |
74
+ |---|---|
75
+ | "This is simple, I don't need to frame it" | Simple tasks benefit from 2-line frames. The frame costs 10 seconds. Scope drift costs hours. |
76
+ | "I already know what to build" | Write it down anyway. Writing forces precision. "I know" is how ambiguity hides. |
77
+ | "NOT-X is obvious" | If it's obvious, writing it takes 2 seconds. If you can't write it, it wasn't obvious. |
78
+
79
+ ## How to verify
80
+
81
+ - [ ] Expected result: 1 sentence, describes WHAT not HOW?
82
+ - [ ] NOT-X constraint: >= 1 explicit exclusion?
83
+ - [ ] Shared intentions: open question, not solution-biased?
84
+ - [ ] Definition of done: >= 1 measurable criterion?
85
+ - [ ] DOCS gate: documentation has been read?
86
+
87
+ ## When to re-frame
88
+
89
+ - Start of any task (before research, after DOCS)
90
+ - When scope drift is detected (3+ files touched without re-checking goal)
91
+ - When the user changes direction mid-task
@@ -0,0 +1,99 @@
1
+ ---
2
+ name: relire-critic
3
+ description: How to self-review code effectively — hostile critique methodology, risk taxonomy, and quality checklist. Generates exactly 3 targeted critiques (functional, import/API, data assumption) then resolves each. Applicable after any code change.
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # Code Self-Review — Hostile Critique Methodology
8
+
9
+ ## What this covers
10
+
11
+ How to review your own code as if someone else wrote it. Self-review fails because the author reinforces their own blind spots (degeneration of thought, CriticBench 2024). This methodology forces adversarial thinking.
12
+
13
+ ## Core principle
14
+
15
+ Read changed files **as if someone else wrote them**. Your job is to find what could fail, not to confirm what works.
16
+
17
+ ## Methodology: 3 RISQUES
18
+
19
+ Generate EXACTLY 3 specific critiques of the changed code. Not 2, not 5 — 3 forces focus.
20
+
21
+ ### Mandatory distribution
22
+
23
+ Each set of 3 RISQUES must include:
24
+
25
+ 1. **Functional risk** — what breaks for users? "This fails when..."
26
+ 2. **Import/API surface check** — does this import path actually exist? Is the API contract correct?
27
+ 3. **Data assumption check** — does this DB column / response shape / format actually match reality?
28
+
29
+ ### Specificity rules
30
+
31
+ - Concrete, not abstract: "might have bugs" is invalid
32
+ - Reference specific `file:line` where the risk lives
33
+ - Can't generate 3 specific critiques → you don't understand the code → read more
34
+
35
+ ### Format
36
+
37
+ ```
38
+ RISQUE: [what could fail] parce que [root cause] — IMPACT: [consequence]
39
+ ```
40
+
41
+ ## Resolution
42
+
43
+ For each RISQUE, choose ONE:
44
+
45
+ - **FIX**: exact correction needed — name the code change
46
+ - **ACCEPT**: why the risk is acceptable (TTL? cosmetic? window < 1s?)
47
+ - **DEFER**: issue reference + why out of scope
48
+
49
+ If 0 fixes needed → suspicious. Re-examine for specificity.
50
+
51
+ ## Quality checklist (8 items)
52
+
53
+ Apply after resolving RISQUES:
54
+
55
+ 1. Quality gates respected? (complexity < 15, nesting < 4, functions < 50 lines)
56
+ 2. All new imports exist in actual files at stated paths?
57
+ 3. All DB columns referenced exist in real schema?
58
+ 4. Test mocks on same host:port as actual requests?
59
+ 5. Tests could fail independently of implementation?
60
+ 6. Duplicated logic with existing code?
61
+ 7. Linter clean? (0 new violations vs base branch)
62
+ 8. Would a staff engineer approve this without changes?
63
+
64
+ Each item: evidence (`file:line` or command output) or explicit "N/A because X".
65
+
66
+ ## Output format
67
+
68
+ ```
69
+ ## RISQUES
70
+ 1. RISQUE: <X> parce que <Y> — IMPACT: <Z>
71
+ → FIX/ACCEPT/DEFER: <resolution>
72
+ 2. ...
73
+ 3. ...
74
+
75
+ ## CHECKLIST
76
+ - [✓/✗/N/A] <item> — <evidence>
77
+ ...
78
+
79
+ ## VERDICT
80
+ BLOCKING: <list or "none">
81
+ IMPORTANT: <list or "none">
82
+ MINOR: <list or "none">
83
+ ```
84
+
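The "exactly 3 RISQUES, each with a resolution" rule can be checked mechanically before a review is accepted. This is a sketch under assumptions: `check_risques` is a hypothetical helper, the resolution is assumed to sit on the line right after its RISQUE (as in the output format above), and the separator before `IMPACT:` is matched loosely as any single non-space character.

```python
import re

# One RISQUE line in the "RISQUE: X parce que Y <sep> IMPACT: Z" format.
RISQUE_RE = re.compile(r"RISQUE:\s*(.+?)\s+parce que\s+(.+?)\s+\S\s+IMPACT:\s*(.+)")
RESOLUTION_RE = re.compile(r"→\s*(FIX|ACCEPT|DEFER):\s*\S")


def check_risques(review):
    """Return a list of problems found in the RISQUES part of a review (empty = OK)."""
    lines = review.splitlines()
    problems = []
    risque_indexes = [i for i, line in enumerate(lines) if RISQUE_RE.search(line)]
    if len(risque_indexes) != 3:
        problems.append(f"expected exactly 3 RISQUES, found {len(risque_indexes)}")
    for i in risque_indexes:
        # The resolution is expected on the line right after its RISQUE.
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if not RESOLUTION_RE.search(following):
            problems.append(f"RISQUE on line {i + 1} has no FIX/ACCEPT/DEFER resolution")
    return problems


# Hypothetical review with only two critiques.
review = """## RISQUES
1. RISQUE: pagination returns duplicates parce que ORDER BY is missing - IMPACT: users see repeated rows
   → FIX: add ORDER BY id to the query in src/users.py:47
2. RISQUE: import fails parce que the module path changed - IMPACT: service does not start
   → ACCEPT: path verified with grep, covered by startup test
"""

print(check_risques(review))
```

The mandatory distribution (functional, import/API, data assumption) is not checked here, because the format does not label the category of each RISQUE.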
85
+ ## How to verify
86
+
87
+ - [ ] Exactly 3 RISQUES (no more, no less)?
88
+ - [ ] Distribution: 1 functional + 1 import + 1 data-assumption?
89
+ - [ ] Each RISQUE has file:line evidence?
90
+ - [ ] Each RISQUE has resolution (FIX/ACCEPT/DEFER)?
91
+ - [ ] Quality checklist (8 items) completed?
92
+ - [ ] VERDICT issued (BLOCKING/IMPORTANT/MINOR)?
93
+
94
+ ## Common mistakes
95
+
96
+ - **Generic critiques**: "might not scale" → too vague. "Loads all users into memory at line 47, O(n)" → specific.
97
+ - **Skipping distribution**: all 3 are functional risks, no import or data check → incomplete.
98
+ - **Too many RISQUES**: 5 critiques dilute focus. Pick top 3 by severity.
99
+ - **Not reading code**: reviewing the description instead of the actual file → always read code first.
@@ -0,0 +1,86 @@
1
+ ---
2
+ name: security-regression-check
3
+ description: How to check for security regressions in a diff — greps for new inputs, removed auth blocks, new external calls, new file access, new SQL/eval, and new trust boundaries. Attacker-eye review of what changed, not what was intended.
4
+ allowed-tools: Read, Grep, Bash
5
+ ---
6
+
7
+ # Security Regression Check — Attacker Eyes on the Diff
8
+
9
+ ## What this covers
10
+
11
+ How to check if a code change introduced security regressions. Assuming "I fixed A without touching B" is NOT a check. Read the diff with attacker eyes and ask: what did my fix add that wasn't there before?
12
+
13
+ ## Core principle
14
+
15
+ **Read `+` lines with attacker eyes, not author eyes.** The author's intent is irrelevant. What can an external actor do with this code path?
16
+
17
+ ## Process
18
+
19
+ ### 1. Capture the diff
20
+
21
+ ```bash
22
+ git diff --unified=3 HEAD
23
+ ```
24
+
25
+ ### 2. Grep for risk signals
26
+
27
+ | Signal | What to search | Why it matters |
28
+ |--------|---------------|----------------|
29
+ | New request param reads | `call.parameters[`, `request.body.`, `req.query.`, `req.params.` | New inputs = new validation surface |
30
+ | Removed auth blocks | `-` lines with `authenticate`, `requireAuth`, `verifyToken`, `checkPermission` | Removed auth = privilege escalation |
31
+ | New external calls | `+` lines with `fetch(`, `axios(`, `httpClient.` | New outbound = SSRF / data exfil risk |
32
+ | New file reads/writes | `+` lines with `File(`, `fs.readFile`, `fs.writeFile`, `Path(` | New FS access = path traversal risk |
33
+ | New SQL | `+` lines with SELECT, INSERT, UPDATE, DELETE | New queries = injection risk if concat |
34
+ | New eval/exec | `+` lines with `eval(`, `Function(`, `exec(` | Code injection risk |
35
+ | New trust boundaries | `+` lines with cookies, tokens, sessions | New trust = new spoofing surface |
36
+
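The table above can be applied to a unified diff with a few regular expressions. The sketch below is an illustration, not the package's implementation: `scan_diff` and the sample diff are hypothetical, and the patterns are copied from the table as plain substrings, so false positives (for example `token` inside an unrelated identifier) are expected and need manual triage.

```python
import re

# (signal name, diff sign, pattern); patterns copied from the table above.
SIGNALS = [
    ("new input", "+", re.compile(r"call\.parameters\[|request\.body\.|req\.query\.|req\.params\.")),
    ("removed auth", "-", re.compile(r"authenticate|requireAuth|verifyToken|checkPermission")),
    ("new external call", "+", re.compile(r"fetch\(|axios\(|httpClient\.")),
    ("new file access", "+", re.compile(r"File\(|fs\.readFile|fs\.writeFile|Path\(")),
    ("new SQL", "+", re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b")),
    ("new eval/exec", "+", re.compile(r"\beval\(|\bFunction\(|\bexec\(")),
    ("new trust boundary", "+", re.compile(r"cookie|token|session", re.IGNORECASE)),
]


def scan_diff(diff_text):
    """Return (signal, line) pairs for each added/removed diff line matching a signal."""
    findings = []
    for line in diff_text.splitlines():
        # Skip file headers ("+++ b/..." and "--- a/..."); they are not code changes.
        if line.startswith(("+++", "---")):
            continue
        for name, sign, pattern in SIGNALS:
            if line.startswith(sign) and pattern.search(line[1:]):
                findings.append((name, line[1:].strip()))
    return findings


# Hypothetical diff: auth middleware removed, user-controlled URL fetched.
diff = """--- a/src/routes.js
+++ b/src/routes.js
@@ -10,4 +10,4 @@
-  app.get('/export', requireAuth, handler)
+  app.get('/export', handler)
+  const target = req.query.url
+  const res = await fetch(target)
"""

for finding in scan_diff(diff):
    print(finding)
```

In practice the input would be the output of `git diff --unified=3 HEAD`. Each finding still has to be classified by hand in step 3.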
37
+ ### 3. Classify each finding
38
+
39
+ - **Critical** — must address before merge
40
+ - **Important** — document + address OR accept with rationale
41
+ - **Informational** — note for reflection
42
+
43
+ ## Output format
44
+
45
+ ```
46
+ ## SECURITY REGRESSION CHECK
47
+
48
+ Diff scope: <N files, +X -Y lines>
49
+
50
+ ### New inputs (from request)
51
+ - <file:line> — <new param> — <has validation?>
52
+
53
+ ### Removed/modified auth
54
+ - <file:line> — <what changed>
55
+
56
+ ### New external calls
57
+ - <file:line> — <target | dynamic URL risk>
58
+
59
+ ### New file/FS access
60
+ - <file:line> — <path controlled by user?>
61
+
62
+ ### New SQL / eval
63
+ - <file:line> — <parameterized? safe?>
64
+
65
+ ### New trust boundaries
66
+ - <file:line> — <cookie/token/session change>
67
+
68
+ ### VERDICT
69
+ - Critical: <list or none>
70
+ - Important: <list or none>
71
+ - Informational: <list or none>
72
+ ```
73
+
74
+ ## How to verify
75
+
76
+ - [ ] Diff captured and reviewed?
77
+ - [ ] Risk signals grepped (new inputs, removed auth, external calls, file access, SQL/eval, trust boundaries)?
78
+ - [ ] Each finding classified (Critical / Important / Informational)?
79
+ - [ ] VERDICT issued (Critical / Important / Informational, each a list or "none")?
80
+ - [ ] Attacker perspective applied?
81
+
82
+ ## Key rules
83
+
84
+ - **Diff scope matters**: 500-line diff → process in chunks. Fatigue causes misses.
85
+ - **Don't trust commit messages**: "just a refactor" still needs the check. Refactors routinely remove validation.
86
+ - **"No error" ≠ safe**: absence of error messages doesn't mean the change is secure.
@@ -0,0 +1,85 @@
1
+ ---
2
+ name: self-consistency-verifier
3
+ description: How to verify AI-generated code by generating 3 independent solutions, comparing them at syntactic/AST/behavioral levels, and scoring consistency. Divergent solutions indicate model uncertainty — re-prompt with constraints or escalate. Based on IdentityChain (2024) and Consistency-Aided Tested Code Generation (ACM 2025).
4
+ allowed-tools: Read, Grep, Glob, Bash, Write
5
+ ---
6
+
7
+ # Self-Consistency Verifier — If Three of You Disagree, One of You Is Wrong
8
+
9
+ ## What this covers
10
+
11
+ How to verify AI-generated code by generating 3 diverse solutions and comparing them. A confident LLM that generates 3 semantically identical solutions is probably right. A confident LLM that generates 3 divergent solutions is the dangerous case — it'll ship whichever came out first. Self-consistency is the cheapest high-signal uncertainty estimator available.
12
+
13
+ ## Core principle
14
+
15
+ **Divergence is diagnostic.** When solutions disagree, the disagreement itself tells you what constraint is missing. Don't just pick one — understand WHY they differ.
16
+
17
+ ## Methodology
18
+
19
+ ### Generate 3 diverse solutions
20
+
21
+ Re-prompt the LLM 3 times with diversifying seeds. The goal is divergent initial approaches, not different variable names.
22
+
23
+ **Diversification strategies** (pick 3 out of 5):
24
+ 1. **Constraint-reorder** — restate the problem with constraints in a different order
25
+ 2. **Language-shift** — ask for pseudocode first, THEN translate to target language
26
+ 3. **Test-first** — ask for test cases first, THEN the implementation
27
+ 4. **Adversarial framing** — "what would break this naïve solution?" then write the robust version
28
+ 5. **Reference implementation** — "find the canonical pattern" then adapt
29
+
30
+ ### Compare at 3 levels
31
+
32
+ **Level A — Syntactic (cheap)**
33
+ - Run formatter, normalize whitespace, compute textual diff
34
+ - Identical after format → consistency HIGH, skip to verdict
35
+ - Differ only in variable names → consistency HIGH
36
+ - Structural diff → proceed to Level B
37
+
38
+ **Level B — AST-level (medium)**
39
+ - Parse each solution to AST
40
+ - Compare: function signatures, control flow shape, side-effect surface, data shape flow
41
+ - Score: `consistency = matched_nodes / total_nodes`. ≥0.85 = HIGH, 0.60-0.85 = MEDIUM, <0.60 = LOW
42
+
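The text gives the score as `matched_nodes / total_nodes` without saying how nodes are matched. One possible reading, sketched below for Python sources, counts AST node types per solution and takes the multiset overlap of each pair. That matching rule, the pairwise averaging, and the function names are assumptions of this sketch. Only the formula and the HIGH/MEDIUM/LOW bands come from the text.

```python
import ast
from collections import Counter
from itertools import combinations


def node_counts(source):
    """Count AST node types, ignoring identifiers so renamed variables still match."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def pair_consistency(a, b):
    """matched_nodes / total_nodes for two solutions, via multiset overlap of node types."""
    ca, cb = node_counts(a), node_counts(b)
    matched = sum((ca & cb).values())  # nodes present in both solutions
    total = sum((ca | cb).values())    # nodes present in either solution
    return matched / total


def consistency(solutions):
    """Average the pairwise scores and map the result to the bands from Level B."""
    pairs = list(combinations(solutions, 2))
    score = sum(pair_consistency(a, b) for a, b in pairs) / len(pairs)
    if score >= 0.85:
        return score, "HIGH"
    if score >= 0.60:
        return score, "MEDIUM"
    return score, "LOW"


# Two solutions that differ only in naming, and one with a different control flow.
s1 = "def total(xs):\n    return sum(xs)\n"
s2 = "def total(values):\n    return sum(values)\n"
s3 = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"

print(consistency([s1, s2]))
print(consistency([s1, s2, s3]))
```

Counting node types ignores ordering, so two solutions with the same statements in a different order score as identical. A stricter comparison would diff the trees themselves.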
43
+ **Level C — Behavioral (expensive, Critical only)**
44
+ - Generate 10-20 property-based test cases (`fast-check` / `hypothesis`)
45
+ - Run each solution against the same test cases
46
+ - All 3 pass all cases → consistency HIGH
47
+ - Divergent pass/fail patterns → at least one is wrong; use majority vote + investigate outlier
48
+
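Level C can be sketched as running all solutions on the same inputs and grouping them by observed behavior. To stay dependency-free, this example uses the standard library's `random` module as a stand-in for a property-based generator such as `hypothesis`. The helper `behavioral_vote` and the three `mean_*` functions are hypothetical.

```python
import random
from collections import Counter


def behavioral_vote(solutions, inputs):
    """Run every solution on the same inputs and group them by observed behavior."""
    signatures = {}
    for name, fn in solutions.items():
        observed = []
        for value in inputs:
            try:
                observed.append(repr(fn(value)))
            except Exception as exc:  # raising is part of the behavior being compared
                observed.append(f"raised {type(exc).__name__}")
        signatures[name] = tuple(observed)
    # The most common behavior signature is the majority candidate.
    top_signature, top_count = Counter(signatures.values()).most_common(1)[0]
    if top_count == 1 and len(signatures) > 1:
        # Nobody agrees with anybody: the "all three disagree" case.
        return {"majority": [], "outliers": sorted(signatures)}
    majority = sorted(n for n, sig in signatures.items() if sig == top_signature)
    outliers = sorted(n for n in signatures if n not in majority)
    return {"majority": majority, "outliers": outliers}


def mean_a(xs):
    return sum(xs) / len(xs)


def mean_b(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs)


def mean_c(xs):
    return sum(xs) / len(xs) if xs else 0.0


rng = random.Random(0)  # fixed seed so the run is reproducible
cases = [[rng.randint(-5, 5) for _ in range(rng.randint(0, 4))] for _ in range(20)]
cases.append([])  # make sure the empty-list edge case is always exercised

result = behavioral_vote({"mean_a": mean_a, "mean_b": mean_b, "mean_c": mean_c}, cases)
print(result)
```

Here the outlier `mean_c` is the only solution that handles the empty list, which is why the outlier deserves investigation before the majority is adopted.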
49
+ ### Interpret divergence
50
+
51
+ | Divergence type | Interpretation | Action |
52
+ |---|---|---|
53
+ | One solution handles edge case X, others don't | Missing explicit constraint | Add constraint, re-generate |
54
+ | Solutions use different libraries | Library choice under-specified | Pin the lib, pick one |
55
+ | Solutions use different algorithms with different complexity | Performance under-specified | Add perf constraint |
56
+ | Solutions have different error-handling | Error model under-specified | Specify what errors to surface |
57
+ | Two agree, one is an outlier | The outlier is either wrong or caught something the other two missed | Investigate the outlier, then use the majority unless it exposes a missed case |
58
+ | All three disagree | Problem under-specified or too hard | Escalate to human |
59
+
60
+ ## Key points
61
+
62
+ - **Cost budget**: Critical = full 3-level compare, ≤15 min. Standard = syntactic + AST only, ≤5 min. Trivial = skip entirely
63
+ - **Don't re-generate with the same prompt** — identical prompts produce highly similar outputs; the check becomes trivial. Always diversify
64
+ - **Don't majority-vote blindly** — an outlier that catches an edge case the other two missed is the RIGHT answer. Investigate before voting
65
+ - **AST compare requires a parser** — if the target language lacks easy AST access, fall back to behavioral compare or skip Level B
66
+ - **Three is the magic number** — two is a tie, four is diminishing returns
67
+
68
+ ## Common anti-patterns
69
+
70
+ 1. **Same-prompt re-generation**: identical prompts produce near-identical outputs, making the check trivial and useless
71
+ 2. **Blind majority voting**: an outlier may be the only one that caught a real edge case — investigate before discarding
72
+ 3. **Skipping divergence analysis**: the WHY of divergence is more valuable than the score itself
73
+ 4. **Running behavioral tests on every task**: reserve for Critical code only; syntactic + AST is enough for Standard
74
+
75
+ ## How to verify
76
+
77
+ - **Score threshold**: ≥0.85 = HIGH confidence, proceed. 0.60-0.85 = MEDIUM, adopt majority + add tests. <0.60 = LOW, re-prompt or escalate
78
+ - **Edge case surfacing**: divergence analysis should produce at least 1 concrete edge case to test
79
+ - **Constraint improvement**: after divergence, the problem statement should have more constraints than before
80
+
81
+ ## References
82
+
83
+ - IdentityChain — openreview.net/forum?id=caW7LdAALh — self-consistency for code LLMs
84
+ - ACM 2025 — "Consistency-Aided Tested Code Generation with LLM" (dl.acm.org/doi/pdf/10.1145/3728902)
85
+ - arxiv 2507.06920 — "Rethinking Verification for LLM Code Generation: From Generation to Testing"
@@ -0,0 +1,101 @@
1
+ ---
2
+ name: spike-mode
3
+ description: How to use SPIKE mode in Ciel v5, a prototype/exploration mode with relaxed gates. Create .ciel/exploration.active to enter spike mode. Quality gates are relaxed and code is marked FIXME/TODO. Used for POC, draft, experimental, and throwaway code, which must be refactored properly afterwards.
4
+ ---
5
+
6
+ # SPIKE Mode — Explore Without Commitment (Ciel v5)
7
+
8
+ ## What this covers
9
+
10
+ How to use SPIKE mode in Ciel v5 for prototyping and exploration. Use it when you need to test an idea quickly without going through the full quality pipeline. The mode is triggered by creating a `.ciel/exploration.active` file in the project root.
11
+
12
+ ## Core principle
13
+
14
+ **Speed over quality during exploration. Quality over speed for production.** SPIKE mode exists because sometimes you need to write throwaway code to validate an approach. But throwaway code that stays is technical debt.
15
+
16
+ ## When to use SPIKE mode
17
+
18
+ - Prototyping a new feature
19
+ - Testing a library integration
20
+ - Exploring a complex refactoring
21
+ - Validating an architecture approach
22
+ - POC / proof of concept
23
+ - "I don't know if this will work, let me try"
24
+
25
+ Do NOT use SPIKE mode for:
26
+ - Production code
27
+ - Code you plan to keep
28
+ - Critical/security code
29
+ - Code you already know how to implement
30
+
31
+ ## How to enter SPIKE mode
32
+
33
+ ```bash
34
+ mkdir -p .ciel && touch .ciel/exploration.active
35
+ ```
36
+
37
+ The plugin detects this file and:
38
+ - Relaxes gates 1 (test-first) and 4 (quality)
39
+ - Injects SPIKE mode indicator in system prompt
40
+ - Marks all code as experimental
41
+
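How the plugin detects the marker file is not shown in this diff. The sketch below only illustrates the file-based toggle described above. The helper names are hypothetical, and only the `.ciel/exploration.active` path comes from the text.

```python
from pathlib import Path

# Marker file named in the section above, relative to the project root.
SPIKE_MARKER = Path(".ciel") / "exploration.active"


def spike_mode_active(project_root="."):
    """Return True when the SPIKE marker file exists under the project root."""
    return (Path(project_root) / SPIKE_MARKER).is_file()


def enter_spike_mode(project_root="."):
    """Create the marker, including the .ciel directory if it is missing."""
    marker = Path(project_root) / SPIKE_MARKER
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def exit_spike_mode(project_root="."):
    """Remove the marker; do nothing if SPIKE mode was not active."""
    (Path(project_root) / SPIKE_MARKER).unlink(missing_ok=True)
```

Keeping the state in a single file means the mode is visible to `git status` and to any other tool that reads the working tree.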
42
+ ## How to exit SPIKE mode
43
+
44
+ ```bash
45
+ rm .ciel/exploration.active
46
+ ```
47
+
48
+ Then, once the exploration is done:
49
+ - Refactor the experimental code properly
50
+ - Add tests
51
+ - Follow the full pipeline
52
+
53
+ ## What changes in SPIKE mode
54
+
55
+ | Gate | Standard mode | SPIKE mode |
56
+ |------|---------------|------------|
57
+ | Test-first (RED) | Blocking | Relaxed |
58
+ | Alternatives | Required | Recommended |
59
+ | Idiomatic | Required | Recommended |
60
+ | Quality (complexity, nesting) | Enforced | Relaxed |
61
+ | Removal safety | Required | Required |
62
+ | Boy-scout | Recommended | Recommended |
63
+ | FIXME/TODO markers | Optional | MANDATORY |
64
+
65
+ ## Output format
66
+
67
+ When in SPIKE mode, add this to the plan:
68
+
69
+ ```
70
+ ## SPIKE MODE
71
+
72
+ Goal: <what are we trying to learn/prove?>
73
+ Exit criteria: <when is this exploration done?>
74
+ Markers: <files marked FIXME/TODO>
75
+ Follow-up task: <describe the proper implementation>
76
+ ```
77
+
78
+ ## Common rationalizations
79
+
80
+ | Rationalization | Reality |
81
+ |---|---|
82
+ | "I'll clean up the spike code later" | You won't. If you don't schedule the cleanup immediately, spike code becomes permanent debt. |
83
+ | "The gates are annoying, I'll use spike mode" | SPIKE mode is for when you DON'T KNOW the solution, not for when you don't WANT to write tests. |
84
+ | "This is just a quick prototype, no need for FIXME" | Unmarked prototype code looks like production code. Without FIXME, nobody knows it needs refactoring. Future you included. |
85
+ | "SPIKE mode means no rules" | SPIKE relaxes the gates but does not remove them. Security and removal safety stay active. |
86
+
87
+ ## Rules
88
+
89
+ - **Code written in SPIKE mode MUST be marked FIXME or TODO**
90
+ - **SPIKE code MUST be refactored or removed after exploration**
91
+ - **SPIKE mode does not bypass security gates** (removal safety still applies)
92
+ - **Do not commit SPIKE code without refactoring**
93
+ - **SPIKE mode is for individual exploration sessions, not for PRs**
94
+
95
+ ## How to verify
96
+
97
+ - [ ] .ciel/exploration.active exists?
98
+ - [ ] All exploratory code has FIXME/TODO markers?
99
+ - [ ] Exit criteria defined?
100
+ - [ ] Follow-up task created for proper implementation?
101
+ - [ ] No SPIKE code committed without refactoring?
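The marker rule ("all exploratory code has FIXME/TODO markers") can be verified with a small scan. This is a sketch under assumptions: `files_missing_markers` is a hypothetical helper that takes a mapping of path to file content, and it treats one marker anywhere in a file as sufficient.

```python
import re

MARKER_RE = re.compile(r"\b(FIXME|TODO)\b")


def files_missing_markers(changed_files):
    """Return the paths whose content carries no FIXME or TODO marker."""
    return sorted(
        path for path, text in changed_files.items() if not MARKER_RE.search(text)
    )


# Hypothetical spike session touching two files.
spike_files = {
    "src/export/pdf.py": "# FIXME(spike): replace with the real renderer\nrender = None\n",
    "src/export/csv.py": "def to_csv(rows):\n    return ''\n",
}

print(files_missing_markers(spike_files))
```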
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: stride-analyzer
3
+ description: "How to threat model with STRIDE using a 3-pass methodology: risk-rank by mechanical signals, STRIDE 6 categories (Spoofing/Tampering/Repudiation/Info Disclosure/DoS/Elevation) with grep evidence, and killer checklist. For auth, DB schema, payment, security changes."
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # STRIDE Threat Modeling — Security Analysis Methodology
8
+
9
+ ## What this covers
10
+
11
+ How to do a security threat model using STRIDE. STRIDE is the framework; grep is the evidence. No theater — every finding needs `file:line` proof.
12
+
13
+ ## Core principle
14
+
15
+ **Anti-theater rule**: every checklist item needs evidence (file:line or grep output). "Checked ✓" with no evidence = not checked.
16
+
17
+ ## Pass 1: Risk rank (mechanical signals)
18
+
19
+ Classify the change:
20
+
21
+ - **Critical** if ANY: `auth/`, `security/`, DB tables (users, sessions, tokens), `.executeQuery`, `.executeUpdate`, `userId`, `password`, `token`, `secret`
22
+ - **Important** if ANY: diff > 5 files, `validate`, `sanitize`, `rateLimit`, route handlers
23
+ - **Routine** otherwise
24
+
25
+ → Critical = all 3 passes. Important = passes 2+3. Routine = pass 3 only.
26
+
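Pass 1 is mechanical enough to sketch as a function. The version below is an approximation under assumptions: `risk_rank` is a hypothetical helper, the signals are matched as plain substrings over the changed paths and the diff text, and the "route handlers" signal is left out because detecting it depends on the framework.

```python
import re

# Signals copied from Pass 1; table names are the examples given in the text.
CRITICAL_RE = re.compile(
    r"auth/|security/|\b(users|sessions|tokens)\b|\.executeQuery|\.executeUpdate"
    r"|userId|password|token|secret"
)
IMPORTANT_RE = re.compile(r"validate|sanitize|rateLimit")


def risk_rank(changed_paths, diff_text):
    """Classify a change as Critical, Important, or Routine from mechanical signals."""
    haystack = "\n".join(changed_paths) + "\n" + diff_text
    if CRITICAL_RE.search(haystack):
        return "Critical"
    if len(changed_paths) > 5 or IMPORTANT_RE.search(haystack):
        return "Important"
    return "Routine"


# Hypothetical changes, one per rank.
print(risk_rank(["src/auth/login.ts"], "+ const ok = check(user)"))
print(risk_rank(["src/forms/email.ts"], "+ validate(email)"))
print(risk_rank(["README.md"], "+ typo fix"))
```

Substring matching errs on the side of over-ranking, which seems the safer failure mode for a gate that decides how much analysis to run.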
27
+ ## Pass 2: STRIDE 6 categories (Critical/Important)
28
+
29
+ For each category, answer with grep-backed evidence:
30
+
31
+ | Category | Question | Evidence type |
32
+ |----------|----------|--------------|
33
+ | **S**poofing | Can I impersonate someone? | Auth checks, token validation |
34
+ | **T**ampering | Can input be modified in transit? | Input validation, integrity checks |
35
+ | **R**epudiation | Can a user deny this action? | Audit logging, timestamps |
36
+ | **I**nfo Disclosure | What leaks? | Error messages, logs, responses |
37
+ | **D**oS | Can this be flooded/exhausted? | Rate limits, resource bounds |
38
+ | **E**levation | Can I access what I shouldn't? | Authorization checks, role validation |
39
+
40
+ Each answer: grep-backed or "N/A because X". **Mark N/A explicitly, never skip silently.**
41
+
42
+ **OPS lens** (overlaid on STRIDE): unclosed connections, memory leaks, locks, behavior at 100x volume.
43
+
44
+ ## Pass 3: Killer checklist (all levels)
45
+
46
+ - Same field = same validation everywhere? (grep to verify)
47
+ - Same domain = same auth on ALL transports (REST + WS + SSE)?
48
+ - Identity fields resolved server-side, never client-supplied?
49
+ - SQL parameterized, never interpolated?
50
+ - PII touched = anonymization covered?
51
+
52
+ Each item: evidence (`file:line` or grep output) or N/A.
53
+
54
+ ## Output format
55
+
56
+ ```
57
+ ## STRIDE ANALYSIS
58
+
59
+ ### Risk rank: <Critical | Important | Routine>
60
+ Signals: <list>
61
+
62
+ ### STRIDE (if Critical/Important)
63
+ - S (Spoofing): <N/A because X | RISQUE: ... — evidence: file:line>
64
+ - T (Tampering): <...>
65
+ - R (Repudiation): <...>
66
+ - I (Info Disclosure): <...>
67
+ - D (DoS): <...>
68
+ - E (Elevation): <...>
69
+
70
+ OPS: <connections | memory | locks | 100x volume>
71
+
72
+ ### Killer checklist
73
+ - [✓/✗] Same validation everywhere — evidence: <grep output>
74
+ - [✓/✗] Auth parity across transports — evidence: <...>
75
+ - [✓/✗] Identity server-side — evidence: <...>
76
+ - [✓/✗] SQL parameterized — evidence: <...>
77
+ - [✓/✗] PII anonymization — evidence: <...>
78
+
79
+ ### VERDICT
80
+ BLOCKING: <list or none>
81
+ IMPORTANT: <list or none>
82
+ ```
83
+
84
+ ## How to verify
85
+
86
+ - [ ] Pass 1 (Risk rank) completed with mechanical signals?
87
+ - [ ] Pass 2 (STRIDE 6 categories) — all categories have findings or explicit "N/A because X"?
88
+ - [ ] Pass 3 (Killer checklist) completed?
89
+ - [ ] VERDICT issued (BLOCKING / IMPORTANT, each a list or "none")?
90
+ - [ ] Evidence format: `file:line` or grep output?
91
+
92
+ ## Key rules
93
+
94
+ - **Don't skip categories silently**: every STRIDE category gets a finding or explicit "N/A because X"
95
+ - **Evidence format**: `path/to/file.ext:123` or `grep -n "pattern" src/` output
96
+ - **Rotate stale items**: if a checklist item catches nothing in 10+ audits, consider replacing it
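The anti-theater rule can be partly enforced by checking that every checklist line cites evidence. The sketch below is an assumption-laden illustration: `unevidenced_items` is a hypothetical helper, and "evidence" is approximated as a `file.ext:123` reference (which also matches `grep -n` output) or an explicit "N/A because" reason.

```python
import re

# Matches "path/to/file.ext:123", the evidence format named in the key rules.
EVIDENCE_RE = re.compile(r"[\w./-]+\.\w+:\d+")


def unevidenced_items(checklist):
    """Return checklist lines that cite neither file:line evidence nor an N/A reason."""
    missing = []
    for line in checklist.splitlines():
        if not line.lstrip().startswith("- ["):
            continue  # not a checklist item
        if EVIDENCE_RE.search(line) or "N/A because" in line:
            continue
        missing.append(line.strip())
    return missing


# Hypothetical checklist: the second item claims a check without proof.
checklist = """- [✓] SQL parameterized - evidence: src/db/users.py:88
- [✓] Identity server-side - evidence: checked
- [✗] PII anonymization - N/A because no PII fields are touched
"""

print(unevidenced_items(checklist))
```

The check confirms that a reference is present, not that the cited line actually supports the claim.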