@neikyun/ciel 6.11.0 → 6.11.2

Files changed (39)
  1. package/assets/.claude/hooks/memory-engine.py +29 -4
  2. package/assets/.claude/settings.json +8 -8
  3. package/assets/commands/ciel-create-skill.md +2 -2
  4. package/assets/commands/ciel-status.md +1 -1
  5. package/assets/platforms/opencode/.opencode/agents/ciel-improver.md +2 -2
  6. package/assets/platforms/opencode/.opencode/commands/ciel-create-skill.md +2 -2
  7. package/assets/platforms/opencode/.opencode/commands/ciel-memory-bootstrap.md +195 -0
  8. package/assets/skills/workflow/adr-auto/SKILL.md +88 -0
  9. package/assets/skills/workflow/ai-failure-modes-detector/SKILL.md +180 -0
  10. package/assets/skills/workflow/ask-window/SKILL.md +119 -0
  11. package/assets/skills/workflow/avec-quoi-versioner/SKILL.md +111 -0
  12. package/assets/skills/workflow/ci-watcher/SKILL.md +194 -0
  13. package/assets/skills/workflow/critiquer-auditor/SKILL.md +135 -0
  14. package/assets/skills/workflow/critiquer-auditor/reference.md +134 -0
  15. package/assets/skills/workflow/debug-reasoning-rca/SKILL.md +174 -0
  16. package/assets/skills/workflow/depth-classifier/SKILL.md +118 -0
  17. package/assets/skills/workflow/diverge/SKILL.md +91 -0
  18. package/assets/skills/workflow/doc-validator-official/SKILL.md +196 -0
  19. package/assets/skills/workflow/evaluer-sizer/SKILL.md +112 -0
  20. package/assets/skills/workflow/faire-gatekeeper/SKILL.md +99 -0
  21. package/assets/skills/workflow/flux-narrator/SKILL.md +93 -0
  22. package/assets/skills/workflow/memoire/SKILL.md +198 -0
  23. package/assets/skills/workflow/memoire-consolidator/SKILL.md +91 -0
  24. package/assets/skills/workflow/meta-critiquer/SKILL.md +112 -0
  25. package/assets/skills/workflow/modern-patterns-checker/SKILL.md +166 -0
  26. package/assets/skills/workflow/pattern-fitness-check/SKILL.md +108 -0
  27. package/assets/skills/workflow/playwright-visual-critic/SKILL.md +98 -0
  28. package/assets/skills/workflow/pr-review-responder/SKILL.md +214 -0
  29. package/assets/skills/workflow/prouver-verifier/SKILL.md +184 -0
  30. package/assets/skills/workflow/prouver-verifier/reference.md +152 -0
  31. package/assets/skills/workflow/quoi-framer/SKILL.md +91 -0
  32. package/assets/skills/workflow/relire-critic/SKILL.md +99 -0
  33. package/assets/skills/workflow/security-regression-check/SKILL.md +86 -0
  34. package/assets/skills/workflow/self-consistency-verifier/SKILL.md +85 -0
  35. package/assets/skills/workflow/spike-mode/SKILL.md +101 -0
  36. package/assets/skills/workflow/stride-analyzer/SKILL.md +96 -0
  37. package/assets/skills/workflow/stride-analyzer/reference.md +144 -0
  38. package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md +119 -0
  39. package/package.json +1 -1
@@ -0,0 +1,99 @@
1
+ ---
2
+ name: relire-critic
3
+ description: How to self-review code effectively — hostile critique methodology, risk taxonomy, and quality checklist. Generates exactly 3 targeted critiques (functional, import/API, data assumption) then resolves each. Applicable after any code change.
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # Code Self-Review — Hostile Critique Methodology
8
+
9
+ ## What this covers
10
+
11
+ How to review your own code as if someone else wrote it. Self-review fails because the author reinforces their own blind spots (degeneration of thought, CriticBench 2024). This methodology forces adversarial thinking.
12
+
13
+ ## Core principle
14
+
15
+ Read changed files **as if someone else wrote them**. Your job is to find what could fail, not to confirm what works.
16
+
17
+ ## Methodology: 3 RISQUES
18
+
19
+ Generate EXACTLY 3 specific critiques of the changed code, each written as a RISQUE (French for "risk"; `parce que` in the format below means "because"). Not 2, not 5: 3 forces focus.
20
+
21
+ ### Mandatory distribution
22
+
23
+ Each set of 3 RISQUES must include:
24
+
25
+ 1. **Functional risk** — what breaks for users? "This fails when..."
26
+ 2. **Import/API surface check** — does this import path actually exist? Is the API contract correct?
27
+ 3. **Data assumption check** — does this DB column / response shape / format actually match reality?
28
+
29
+ ### Specificity rules
30
+
31
+ - Concrete, not abstract: "might have bugs" is invalid
32
+ - Reference specific `file:line` where the risk lives
33
+ - Can't generate 3 specific critiques → you don't understand the code → read more
34
+
35
+ ### Format
36
+
37
+ ```
38
+ RISQUE: [what could fail] parce que [root cause] — IMPACT: [consequence]
39
+ ```
40
+
41
+ ## Resolution
42
+
43
+ For each RISQUE, choose ONE:
44
+
45
+ - **FIX**: exact correction needed — name the code change
46
+ - **ACCEPT**: why the risk is acceptable (TTL? cosmetic? window < 1s?)
47
+ - **DEFER**: issue reference + why out of scope
48
+
49
+ If 0 fixes needed → suspicious. Re-examine for specificity.
50
+
51
+ ## Quality checklist (8 items)
52
+
53
+ Apply after resolving RISQUES:
54
+
55
+ 1. Quality gates respected? (complexity < 15, nesting < 4, functions < 50 lines)
56
+ 2. All new imports exist in actual files at stated paths?
57
+ 3. All DB columns referenced exist in real schema?
58
+ 4. Test mocks on same host:port as actual requests?
59
+ 5. Tests could fail independently of implementation?
60
+ 6. No logic duplicated from existing code?
61
+ 7. Linter clean? (0 new violations vs base branch)
62
+ 8. Would a staff engineer approve this without changes?
63
+
64
+ Each item: evidence (`file:line` or command output) or explicit "N/A because X".
65
+
66
+ ## Output format
67
+
68
+ ```
69
+ ## RISQUES
70
+ 1. RISQUE: <X> parce que <Y> — IMPACT: <Z>
71
+ → FIX/ACCEPT/DEFER: <resolution>
72
+ 2. ...
73
+ 3. ...
74
+
75
+ ## CHECKLIST
76
+ - [✓/✗/N/A] <item> — <evidence>
77
+ ...
78
+
79
+ ## VERDICT
80
+ BLOCKING: <list or "none">
81
+ IMPORTANT: <list or "none">
82
+ MINOR: <list or "none">
83
+ ```
84
+
85
+ ## How to verify
86
+
87
+ - [ ] Exactly 3 RISQUES (no more, no less)?
88
+ - [ ] Distribution: 1 functional + 1 import + 1 data-assumption?
89
+ - [ ] Each RISQUE has file:line evidence?
90
+ - [ ] Each RISQUE has resolution (FIX/ACCEPT/DEFER)?
91
+ - [ ] Quality checklist (8 items) completed?
92
+ - [ ] VERDICT issued (BLOCKING/IMPORTANT/MINOR)?
93
+
94
+ ## Common mistakes
95
+
96
+ - **Generic critiques**: "might not scale" → too vague. "Loads all users into memory at line 47, O(n)" → specific.
97
+ - **Skipping distribution**: all 3 are functional risks, no import or data check → incomplete.
98
+ - **Too many RISQUES**: 5 critiques dilute focus. Pick top 3 by severity.
99
+ - **Not reading code**: reviewing the description instead of the actual file → always read code first.
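The "How to verify" items for relire-critic (exactly 3 RISQUES, each with `file:line` evidence and a resolution) lend themselves to a mechanical check. The following is a minimal sketch, not part of the package: `check_risques` and the sample review are hypothetical, and the regexes assume the output format shown in the skill.

```python
import re

RESOLUTIONS = ("FIX", "ACCEPT", "DEFER")

def check_risques(review: str) -> list[str]:
    """Return a list of problems found in a RISQUES section (empty list = well-formed)."""
    problems = []
    lines = [line.strip() for line in review.splitlines() if line.strip()]
    # Lines that declare a RISQUE, e.g. "1. RISQUE: <X> parce que <Y> ... IMPACT: <Z>"
    risque_idx = [i for i, line in enumerate(lines) if re.match(r"^\d+\.\s*RISQUE:", line)]
    if len(risque_idx) != 3:
        problems.append(f"expected exactly 3 RISQUES, found {len(risque_idx)}")
    for n, i in enumerate(risque_idx, start=1):
        line = lines[i]
        if "parce que" not in line or "IMPACT:" not in line:
            problems.append(f"RISQUE {n}: missing 'parce que' or 'IMPACT:'")
        # file:line evidence such as src/app.py:47 (assumed shape of a reference)
        if not re.search(r"[\w./-]+\.\w+:\d+", line):
            problems.append(f"RISQUE {n}: no file:line reference")
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not re.match(rf"^→\s*({'|'.join(RESOLUTIONS)}):", nxt):
            problems.append(f"RISQUE {n}: no FIX/ACCEPT/DEFER resolution")
    return problems

sample = """\
## RISQUES
1. RISQUE: login fails for empty email at src/auth.py:47 parce que no null check — IMPACT: 500 error
→ FIX: add guard clause before lookup
2. RISQUE: import of utils.dates at src/report.py:3 may not resolve parce que module was renamed — IMPACT: crash at startup
→ FIX: update import path
3. RISQUE: column users.last_seen read at src/db.py:88 parce que schema assumed — IMPACT: query error
→ ACCEPT: column verified in migration 0042
"""
print(check_risques(sample))  # prints []
```

This only checks the shape of the output. Whether the three critiques follow the functional / import / data-assumption distribution still needs a human read.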
@@ -0,0 +1,86 @@
1
+ ---
2
+ name: security-regression-check
3
+ description: How to check for security regressions in a diff — greps for new inputs, removed auth blocks, new external calls, new file access, new SQL/eval, and new trust boundaries. Attacker-eye review of what changed, not what was intended.
4
+ allowed-tools: Read, Grep, Bash
5
+ ---
6
+
7
+ # Security Regression Check — Attacker Eyes on the Diff
8
+
9
+ ## What this covers
10
+
11
+ How to check if a code change introduced security regressions. Assuming "I fixed A without touching B" is NOT a check. Read the diff with attacker eyes: what did my fix add that wasn't there before?
12
+
13
+ ## Core principle
14
+
15
+ **Read `+` lines with attacker eyes, not author eyes.** The author's intent is irrelevant. What can an external actor do with this code path?
16
+
17
+ ## Process
18
+
19
+ ### 1. Capture the diff
20
+
21
+ ```bash
22
+ git diff --unified=3 HEAD
23
+ ```
24
+
25
+ ### 2. Grep for risk signals
26
+
27
+ | Signal | What to search | Why it matters |
28
+ |--------|---------------|----------------|
29
+ | New request param reads | `call.parameters[`, `request.body.`, `req.query.`, `req.params.` | New inputs = new validation surface |
30
+ | Removed auth blocks | `-` lines with `authenticate`, `requireAuth`, `verifyToken`, `checkPermission` | Removed auth = privilege escalation |
31
+ | New external calls | `+` lines with `fetch(`, `axios(`, `httpClient.` | New outbound = SSRF / data exfil risk |
32
+ | New file reads/writes | `+` lines with `File(`, `fs.readFile`, `fs.writeFile`, `Path(` | New FS access = path traversal risk |
33
+ | New SQL | `+` lines with SELECT, INSERT, UPDATE, DELETE | New queries = injection risk if built by string concatenation |
34
+ | New eval/exec | `+` lines with `eval(`, `Function(`, `exec(` | Code injection risk |
35
+ | New trust boundaries | `+` lines with cookies, tokens, sessions | New trust = new spoofing surface |
36
+
37
+ ### 3. Classify each finding
38
+
39
+ - **Critical** — must address before merge
40
+ - **Important** — document + address OR accept with rationale
41
+ - **Informational** — note for reflection
42
+
43
+ ## Output format
44
+
45
+ ```
46
+ ## SECURITY REGRESSION CHECK
47
+
48
+ Diff scope: <N files, +X -Y lines>
49
+
50
+ ### New inputs (from request)
51
+ - <file:line> — <new param> — <has validation?>
52
+
53
+ ### Removed/modified auth
54
+ - <file:line> — <what changed>
55
+
56
+ ### New external calls
57
+ - <file:line> — <target | dynamic URL risk>
58
+
59
+ ### New file/FS access
60
+ - <file:line> — <path controlled by user?>
61
+
62
+ ### New SQL / eval
63
+ - <file:line> — <parameterized? safe?>
64
+
65
+ ### New trust boundaries
66
+ - <file:line> — <cookie/token/session change>
67
+
68
+ ### VERDICT
69
+ - Critical: <list or none>
70
+ - Important: <list or none>
71
+ - Informational: <list or none>
72
+ ```
73
+
74
+ ## How to verify
75
+
76
+ - [ ] Diff captured and reviewed?
77
+ - [ ] Risk signals grepped (new inputs, removed auth, external calls, file access, SQL/eval, trust boundaries)?
78
+ - [ ] Each finding classified (Critical / Important / Informational)?
79
+ - [ ] VERDICT issued (Critical / Important / Informational lists)?
80
+ - [ ] Attacker perspective applied?
81
+
82
+ ## Key rules
83
+
84
+ - **Diff scope matters**: 500-line diff → process in chunks. Fatigue causes misses.
85
+ - **Don't trust commit messages**: "just a refactor" still needs the check. Refactors routinely remove validation.
86
+ - **"No error" ≠ safe**: absence of error messages doesn't mean the change is secure.
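The risk-signal table in the security regression check can be applied to a unified diff in one pass. The sketch below is illustrative only: `scan_diff` and `SIGNALS` are hypothetical names, the patterns are copied from the table, and treating "new request param reads" as a `+`-line signal is an assumption. Content lines that themselves start with `--- ` or `+++ ` are mistaken for file headers, which a real tool would need to handle.

```python
import re

# (label, diff sign, pattern): "+" scans added lines, "-" scans removed lines.
SIGNALS = [
    ("new input", "+", re.compile(r"call\.parameters\[|request\.body\.|req\.query\.|req\.params\.")),
    ("removed auth", "-", re.compile(r"authenticate|requireAuth|verifyToken|checkPermission")),
    ("new external call", "+", re.compile(r"fetch\(|axios\(|httpClient\.")),
    ("new file access", "+", re.compile(r"File\(|fs\.readFile|fs\.writeFile|Path\(")),
    ("new SQL", "+", re.compile(r"\b(SELECT|INSERT|UPDATE|DELETE)\b")),
    ("new eval/exec", "+", re.compile(r"\beval\(|\bFunction\(|\bexec\(")),
    ("new trust boundary", "+", re.compile(r"cookie|token|session", re.IGNORECASE)),
]

def scan_diff(diff_text: str) -> list[tuple[str, str, str]]:
    """Return (signal label, file, line content) for every matching diff line."""
    findings = []
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the new version, e.g. "+++ b/src/app.js"
            current_file = line[4:].removeprefix("b/")
            continue
        if line.startswith("--- "):
            continue
        if not line or line[0] not in "+-":
            continue
        for label, sign, pattern in SIGNALS:
            if line[0] == sign and pattern.search(line[1:]):
                findings.append((label, current_file, line[1:].strip()))
    return findings

sample_diff = """\
--- a/src/routes/user.js
+++ b/src/routes/user.js
@@ -10,4 +10,4 @@
-  requireAuth(req);
+  const id = req.query.id;
+  const res = await fetch(baseUrl + id);
"""
for finding in scan_diff(sample_diff):
    print(finding)
```

A hit is only a prompt to read the line with attacker eyes. Classification into Critical / Important / Informational stays a manual step.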
@@ -0,0 +1,85 @@
1
+ ---
2
+ name: self-consistency-verifier
3
+ description: How to verify AI-generated code by generating 3 independent solutions, comparing them at syntactic/AST/behavioral levels, and scoring consistency. Divergent solutions indicate model uncertainty — re-prompt with constraints or escalate. Based on IdentityChain (2024) and Consistency-Aided Tested Code Generation (ACM 2025).
4
+ allowed-tools: Read, Grep, Glob, Bash, Write
5
+ ---
6
+
7
+ # Self-Consistency Verifier — If Three of You Disagree, One of You Is Wrong
8
+
9
+ ## What this covers
10
+
11
+ How to verify AI-generated code by generating 3 diverse solutions and comparing them. A confident LLM that generates 3 semantically identical solutions is probably right. A confident LLM that generates 3 divergent solutions is the dangerous case — it'll ship whichever came out first. Self-consistency is the cheapest high-signal uncertainty estimator available.
12
+
13
+ ## Core principle
14
+
15
+ **Divergence is diagnostic.** When solutions disagree, the disagreement itself tells you what constraint is missing. Don't just pick one — understand WHY they differ.
16
+
17
+ ## Methodology
18
+
19
+ ### Generate 3 diverse solutions
20
+
21
+ Re-prompt the LLM 3 times with diversifying seeds. The goal is divergent initial approaches, not different variable names.
22
+
23
+ **Diversification strategies** (pick 3 out of 5):
24
+ 1. **Constraint-reorder** — restate the problem with constraints in a different order
25
+ 2. **Language-shift** — ask for pseudocode first, THEN translate to target language
26
+ 3. **Test-first** — ask for test cases first, THEN the implementation
27
+ 4. **Adversarial framing** — "what would break this naïve solution?" then write the robust version
28
+ 5. **Reference implementation** — "find the canonical pattern" then adapt
29
+
30
+ ### Compare at 3 levels
31
+
32
+ **Level A — Syntactic (cheap)**
33
+ - Run formatter, normalize whitespace, compute textual diff
34
+ - Identical after format → consistency HIGH, skip to verdict
35
+ - Differ only in variable names → consistency HIGH
36
+ - Structural diff → proceed to Level B
37
+
38
+ **Level B — AST-level (medium)**
39
+ - Parse each solution to AST
40
+ - Compare: function signatures, control flow shape, side-effect surface, data shape flow
41
+ - Score: `consistency = matched_nodes / total_nodes`. ≥0.85 = HIGH, 0.60-0.85 = MEDIUM, <0.60 = LOW
42
+
43
+ **Level C — Behavioral (expensive, Critical only)**
44
+ - Generate 10-20 property-based test cases (`fast-check` / `hypothesis`)
45
+ - Run each solution against the same test cases
46
+ - All 3 pass all cases → consistency HIGH
47
+ - Divergent pass/fail patterns → at least one is wrong; use majority vote + investigate outlier
48
+
49
+ ### Interpret divergence
50
+
51
+ | Divergence type | Interpretation | Action |
52
+ |---|---|---|
53
+ | One solution handles edge case X, others don't | Missing explicit constraint | Add constraint, re-generate |
54
+ | Solutions use different libraries | Library choice under-specified | Pin the lib, pick one |
55
+ | Solutions use different algorithms with different complexity | Performance under-specified | Add perf constraint |
56
+ | Solutions have different error-handling | Error model under-specified | Specify what errors to surface |
57
+ | Two agree, one is outlier | Either the outlier is wrong or it caught something the other two missed | Investigate the outlier, then use the majority unless the outlier exposed a real gap |
58
+ | All three disagree | Problem under-specified or too hard | Escalate to human |
59
+
60
+ ## Key points
61
+
62
+ - **Cost budget**: Critical = full 3-level compare, ≤15 min. Standard = syntactic + AST only, ≤5 min. Trivial = skip entirely
63
+ - **Don't re-generate with the same prompt** — identical prompts produce highly similar outputs; the check becomes trivial. Always diversify
64
+ - **Don't majority-vote blindly** — an outlier that catches an edge case the other two missed is the RIGHT answer. Investigate before voting
65
+ - **AST compare requires a parser** — if the target language lacks easy AST access, fall back to behavioral compare or skip Level B
66
+ - **Three is the magic number** — two is a tie, four is diminishing returns
67
+
68
+ ## Common anti-patterns
69
+
70
+ 1. **Same-prompt re-generation**: identical prompts produce near-identical outputs, making the check trivial and useless
71
+ 2. **Blind majority voting**: an outlier may be the only one that caught a real edge case — investigate before discarding
72
+ 3. **Skipping divergence analysis**: the WHY of divergence is more valuable than the score itself
73
+ 4. **Running behavioral tests on every task**: reserve for Critical code only; syntactic + AST is enough for Standard
74
+
75
+ ## How to verify
76
+
77
+ - **Score threshold**: ≥0.85 = HIGH confidence, proceed. 0.60-0.85 = MEDIUM, adopt majority + add tests. <0.60 = LOW, re-prompt or escalate
78
+ - **Edge case surfacing**: divergence analysis should produce at least 1 concrete edge case to test
79
+ - **Constraint improvement**: after divergence, the problem statement should have more constraints than before
80
+
81
+ ## References
82
+
83
+ - IdentityChain — openreview.net/forum?id=caW7LdAALh — self-consistency for code LLMs
84
+ - ACM 2025 — "Consistency-Aided Tested Code Generation with LLM" (dl.acm.org/doi/pdf/10.1145/3728902)
85
+ - arxiv 2507.06920 — "Rethinking Verification for LLM Code Generation: From Generation to Testing"
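The Level B score `consistency = matched_nodes / total_nodes` leaves "matched" undefined. One possible reading, sketched here for Python sources with the standard `ast` module, counts the multiset overlap of AST node types, so renamed variables do not lower the score. This definition, the function names, and the choice of the lowest pairwise score as the aggregate are all assumptions for illustration. The approach ignores node order and nesting, so it is a coarse proxy at best.

```python
import ast
from collections import Counter

def node_profile(source: str) -> Counter:
    """Count AST node types; identifiers are ignored, so renamed variables compare equal."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

def consistency(a: str, b: str) -> float:
    """matched_nodes / total_nodes, with 'matched' taken as the multiset overlap of node types."""
    pa, pb = node_profile(a), node_profile(b)
    matched = sum((pa & pb).values())
    total = max(sum(pa.values()), sum(pb.values()))
    return matched / total if total else 1.0

def band(score: float) -> str:
    """Map a score to the HIGH / MEDIUM / LOW thresholds from the skill."""
    if score >= 0.85:
        return "HIGH"
    if score >= 0.60:
        return "MEDIUM"
    return "LOW"

def overall(solutions: list[str]) -> float:
    """Lowest pairwise score across all solutions (a conservative aggregate)."""
    pairs = [(i, j) for i in range(len(solutions)) for j in range(i + 1, len(solutions))]
    return min(consistency(solutions[i], solutions[j]) for i, j in pairs)

s1 = "def total(xs):\n    return sum(xs)\n"
s2 = "def total(values):\n    return sum(values)\n"
s3 = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"

print(consistency(s1, s2))        # prints 1.0 (only names differ)
print(band(overall([s1, s2, s3])))
```

Here `s3` computes the same result with a loop, so its structure diverges from the other two even though its behavior matches. That is the case the skill sends on to Level C.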
@@ -0,0 +1,101 @@
1
+ ---
2
+ name: spike-mode
3
+ description: How to use SPIKE mode in Ciel v5, a prototype/exploration mode with relaxed gates. Create .ciel/exploration.active to enter spike mode. Quality gates relaxed, code marked FIXME/TODO. Used for POC, draft, experimental, throwaway code. Must be refactored properly after.
4
+ ---
5
+
6
+ # SPIKE Mode — Explore Without Commitment (Ciel v5)
7
+
8
+ ## What this covers
9
+
10
+ How to use SPIKE mode in Ciel v5 for prototyping and exploration, when you need to test an idea quickly without going through the full quality pipeline. The mode is triggered by creating a `.ciel/exploration.active` file in the project root.
11
+
12
+ ## Core principle
13
+
14
+ **Speed over quality during exploration. Quality over speed for production.** SPIKE mode exists because sometimes you need to write throwaway code to validate an approach. But throwaway code that stays is technical debt.
15
+
16
+ ## When to use SPIKE mode
17
+
18
+ - Prototyping a new feature
19
+ - Testing a library integration
20
+ - Exploring a complex refactoring
21
+ - Validating an architecture approach
22
+ - POC / proof of concept
23
+ - "I don't know if this will work, let me try"
24
+
25
+ Do NOT use SPIKE mode for:
26
+ - Production code
27
+ - Code you plan to keep
28
+ - Critical/security code
29
+ - Code you already know how to implement
30
+
31
+ ## How to enter SPIKE mode
32
+
33
+ ```bash
34
+ touch .ciel/exploration.active
35
+ ```
36
+
37
+ The plugin detects this file and:
38
+ - Relaxes gates 1 (test-first) and 4 (quality)
39
+ - Injects SPIKE mode indicator in system prompt
40
+ - Marks all code as experimental
41
+
42
+ ## How to exit SPIKE mode
43
+
44
+ ```bash
45
+ rm .ciel/exploration.active
46
+ ```
47
+
48
+ Then, once the exploration is done:
49
+ - Refactor the experimental code properly
50
+ - Add tests
51
+ - Follow the full pipeline
52
+
53
+ ## What changes in SPIKE mode
54
+
55
+ | Gate | Standard mode | SPIKE mode |
56
+ |------|---------------|------------|
57
+ | Test-first (RED) | Blocking | Relaxed |
58
+ | Alternatives | Required | Recommended |
59
+ | Idiomatic | Required | Recommended |
60
+ | Quality (complexity, nesting) | Enforced | Relaxed |
61
+ | Removal safety | Required | Required |
62
+ | Boy-scout | Recommended | Recommended |
63
+ | FIXME/TODO markers | Optional | MANDATORY |
64
+
65
+ ## Output format
66
+
67
+ When in SPIKE mode, add this to the plan:
68
+
69
+ ```
70
+ ## SPIKE MODE
71
+
72
+ Goal: <what are we trying to learn/prove?>
73
+ Exit criteria: <when is this exploration done?>
74
+ Markers: <files marked FIXME/TODO>
75
+ Follow-up task: <describe the proper implementation>
76
+ ```
77
+
78
+ ## Common rationalizations
79
+
80
+ | Rationalization | Reality |
81
+ |---|---|
82
+ | "I'll clean up the spike code later" | You won't. If you don't schedule the cleanup immediately, spike code becomes permanent debt. |
83
+ | "The gates are annoying, I'll use spike mode" | SPIKE mode is for when you DON'T KNOW the solution, not for when you don't WANT to write tests. |
84
+ | "This is just a quick prototype, no need for FIXME" | Unmarked prototype code looks like production code. Without FIXME, nobody knows it needs refactoring. Future you included. |
85
+ | "SPIKE mode means no rules" | SPIKE mode relaxes the gates but does not remove them. Security and removal-safety gates stay active. |
86
+
87
+ ## Rules
88
+
89
+ - **Code written in SPIKE mode MUST be marked FIXME or TODO**
90
+ - **SPIKE code MUST be refactored or removed after exploration**
91
+ - **SPIKE mode does not bypass security gates** (removal safety still applies)
92
+ - **Do not commit SPIKE code without refactoring**
93
+ - **SPIKE mode is for individual exploration sessions, not for PRs**
94
+
95
+ ## How to verify
96
+
97
+ - [ ] .ciel/exploration.active exists?
98
+ - [ ] All exploratory code has FIXME/TODO markers?
99
+ - [ ] Exit criteria defined?
100
+ - [ ] Follow-up task created for proper implementation?
101
+ - [ ] No SPIKE code committed without refactoring?
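Two of the SPIKE-mode verification items (marker file present, exploratory code carrying FIXME/TODO) can be checked with a few lines of Python. This is a sketch under assumptions: `spike_active` and `unmarked_files` are hypothetical helpers, not Ciel APIs, and the caller decides which files count as exploratory code.

```python
from pathlib import Path
import tempfile

# Marker path taken from the skill: .ciel/exploration.active under the project root.
MARKER_FILE = Path(".ciel") / "exploration.active"

def spike_active(project_root: Path) -> bool:
    """True when the SPIKE marker file exists under the given project root."""
    return (project_root / MARKER_FILE).is_file()

def unmarked_files(paths: list[Path]) -> list[Path]:
    """Return the files that contain neither FIXME nor TODO."""
    missing = []
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        if "FIXME" not in text and "TODO" not in text:
            missing.append(path)
    return missing

# Demo in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".ciel").mkdir()
    (root / MARKER_FILE).touch()
    marked = root / "probe.py"
    marked.write_text("# FIXME: spike, replace before merge\nprint('hi')\n")
    bare = root / "draft.py"
    bare.write_text("print('no marker')\n")
    print(spike_active(root))                                # prints True
    print([p.name for p in unmarked_files([marked, bare])])  # prints ['draft.py']
```

A check like this could run before a commit to catch the "no SPIKE code committed without refactoring" rule, though how Ciel itself enforces it is not shown in this diff.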
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: stride-analyzer
3
+ description: "How to threat model with STRIDE using a 3-pass methodology: risk-rank by mechanical signals, STRIDE 6 categories (Spoofing/Tampering/Repudiation/Info Disclosure/DoS/Elevation) with grep evidence, and killer checklist. For auth, DB schema, payment, security changes."
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # STRIDE Threat Modeling — Security Analysis Methodology
8
+
9
+ ## What this covers
10
+
11
+ How to do a security threat model using STRIDE. STRIDE is the framework; grep is the evidence. No theater — every finding needs `file:line` proof.
12
+
13
+ ## Core principle
14
+
15
+ **Anti-theater rule**: every checklist item needs evidence (file:line or grep output). "Checked ✓" with no evidence = not checked.
16
+
17
+ ## Pass 1: Risk rank (mechanical signals)
18
+
19
+ Classify the change:
20
+
21
+ - **Critical** if ANY: `auth/`, `security/`, DB tables (users, sessions, tokens), `.executeQuery`, `.executeUpdate`, `userId`, `password`, `token`, `secret`
22
+ - **Important** if ANY: diff > 5 files, `validate`, `sanitize`, `rateLimit`, route handlers
23
+ - **Routine** otherwise
24
+
25
+ → Pass 1 always runs, since it sets the rank. Critical and Important changes continue through passes 2 and 3; Routine changes go straight to pass 3.
26
+
27
+ ## Pass 2: STRIDE 6 categories (Critical/Important)
28
+
29
+ For each category, answer with grep-backed evidence:
30
+
31
+ | Category | Question | Evidence type |
32
+ |----------|----------|--------------|
33
+ | **S**poofing | Can I impersonate someone? | Auth checks, token validation |
34
+ | **T**ampering | Can input be modified in transit? | Input validation, integrity checks |
35
+ | **R**epudiation | Can a user deny this action? | Audit logging, timestamps |
36
+ | **I**nfo Disclosure | What leaks? | Error messages, logs, responses |
37
+ | **D**oS | Can this be flooded/exhausted? | Rate limits, resource bounds |
38
+ | **E**levation | Can I access what I shouldn't? | Authorization checks, role validation |
39
+
40
+ Each answer: grep-backed or "N/A because X". **Mark N/A explicitly, never skip silently.**
41
+
42
+ **OPS lens** (overlaid on STRIDE): unclosed connections, memory leaks, locks, behavior at 100x volume.
43
+
44
+ ## Pass 3: Killer checklist (all levels)
45
+
46
+ - Same field = same validation everywhere? (grep to verify)
47
+ - Same domain = same auth on ALL transports (REST + WS + SSE)?
48
+ - Identity fields resolved server-side, never client-supplied?
49
+ - SQL parameterized, never interpolated?
50
+ - PII touched = anonymization covered?
51
+
52
+ Each item: evidence (`file:line` or grep output) or N/A.
53
+
54
+ ## Output format
55
+
56
+ ```
57
+ ## STRIDE ANALYSIS
58
+
59
+ ### Risk rank: <Critical | Important | Routine>
60
+ Signals: <list>
61
+
62
+ ### STRIDE (if Critical/Important)
63
+ - S (Spoofing): <N/A because X | RISQUE: ... — evidence: file:line>
64
+ - T (Tampering): <...>
65
+ - R (Repudiation): <...>
66
+ - I (Info Disclosure): <...>
67
+ - D (DoS): <...>
68
+ - E (Elevation): <...>
69
+
70
+ OPS: <connections | memory | locks | 100x volume>
71
+
72
+ ### Killer checklist
73
+ - [✓/✗] Same validation everywhere — evidence: <grep output>
74
+ - [✓/✗] Auth parity across transports — evidence: <...>
75
+ - [✓/✗] Identity server-side — evidence: <...>
76
+ - [✓/✗] SQL parameterized — evidence: <...>
77
+ - [✓/✗] PII anonymization — evidence: <...>
78
+
79
+ ### VERDICT
80
+ BLOCKING: <list or none>
81
+ IMPORTANT: <list or none>
82
+ ```
83
+
84
+ ## How to verify
85
+
86
+ - [ ] Pass 1 (Risk rank) completed with mechanical signals?
87
+ - [ ] Pass 2 (STRIDE 6 categories) — all categories have findings or explicit "N/A because X"?
88
+ - [ ] Pass 3 (Killer checklist) completed?
89
+ - [ ] VERDICT issued (BLOCKING / IMPORTANT)?
90
+ - [ ] Evidence format: `file:line` or grep output?
91
+
92
+ ## Key rules
93
+
94
+ - **Don't skip categories silently**: every STRIDE category gets a finding or explicit "N/A because X"
95
+ - **Evidence format**: `path/to/file.ext:123` or `grep -n "pattern" src/` output
96
+ - **Rotate stale items**: if a checklist item catches nothing in 10+ audits, consider replacing it
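Pass 1 of the STRIDE skill is described as mechanical, so it can be approximated with pattern matching over the changed paths and the diff text. The sketch below is one possible reading: `risk_rank` is a hypothetical helper, the DB-table signal is reduced to the bare words `users`, `sessions`, `tokens`, and the "route handlers" signal is left out because detecting it depends on the framework.

```python
import re

# Critical signals from Pass 1; bare substrings such as "token" match broadly by design.
CRITICAL = re.compile(
    r"auth/|security/|\.executeQuery|\.executeUpdate|userId|password|token|secret"
    r"|\b(users|sessions|tokens)\b"
)
# Important signals from Pass 1 (route-handler detection omitted, see lead-in).
IMPORTANT = re.compile(r"validate|sanitize|rateLimit")

def risk_rank(changed_files: list[str], diff_text: str) -> str:
    """Classify a change as Critical, Important, or Routine from mechanical signals."""
    haystack = "\n".join(changed_files) + "\n" + diff_text
    if CRITICAL.search(haystack):
        return "Critical"
    if len(changed_files) > 5 or IMPORTANT.search(haystack):
        return "Important"
    return "Routine"

print(risk_rank(["src/auth/login.kt"], ""))                      # prints Critical
print(risk_rank(["src/forms/signup.kt"], "+ validate(input)"))   # prints Important
print(risk_rank(["README.md"], "+ fix typo"))                    # prints Routine
```

Because the signals are plain substrings, false positives are expected. Over-ranking a change costs a few extra checks, while under-ranking skips Pass 2, so erring toward Critical seems the safer default.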
@@ -0,0 +1,144 @@
1
+ # stride-analyzer — Reference
2
+
3
+ ## STRIDE — detailed category probes
4
+
5
+ ### S — Spoofing (identity)
6
+
7
+ Can I impersonate another user/service/system?
8
+
9
+ Probes:
10
+ - Grep for `userId` / `user_id` coming from request params vs resolved server-side (JWT, session)
11
+ - Grep for identity claims trusted without verification (e.g. `X-User-Id` header accepted as-is)
12
+ - Check auth middleware ordering: is authentication before authorization?
13
+ - WebSocket/SSE: is the same auth applied? (common gap: REST auth is bulletproof, WS accepts any token)
14
+
15
+ Evidence format:
16
+ ```
17
+ - Spoofing: userId extracted from JWT claim at JwtMiddleware.kt:45 — not client-supplied ✓
18
+ ```
19
+
20
+ ### T — Tampering (data integrity)
21
+
22
+ Can input be modified in transit or at rest without detection?
23
+
24
+ Probes:
25
+ - HTTPS everywhere? Grep for `http://` (non-localhost)
26
+ - CSRF tokens on state-changing endpoints?
27
+ - Signed cookies / signed JWTs? What algorithm? (HS256 vs RS256 considerations)
28
+ - Database writes: is the audit trail immutable? (INSERT-only tables for events)
29
+
30
+ ### R — Repudiation (non-denial)
31
+
32
+ Can a user deny having performed an action?
33
+
34
+ Probes:
35
+ - Audit log coverage: what events are logged? With what identity?
36
+ - Log tampering resistance: append-only? Logged externally?
37
+ - Timestamp source: server-controlled? Synced?
38
+
39
+ ### I — Information Disclosure
40
+
41
+ What information leaks to unauthorized parties?
42
+
43
+ Probes:
44
+ - Error messages: do they include stack traces / SQL / paths / credentials?
45
+ - Logs: do they contain PII, secrets, tokens?
46
+ - API responses: over-fetching? `SELECT *` instead of projected columns?
47
+ - 404 vs 403 distinction: do differing status codes or response timing reveal whether a resource exists?
48
+ - Autocomplete endpoints: leak usernames / emails?
49
+
50
+ ### D — Denial of Service
51
+
52
+ Can this be flooded or exhausted?
53
+
54
+ Probes:
55
+ - Rate limiting: per-IP? per-user? per-endpoint?
56
+ - Resource bounds: max payload size? max query depth (GraphQL)? max file upload?
57
+ - Algorithmic complexity: O(n²) loops on user-controlled n?
58
+ - Connection pooling: max connections? timeout?
59
+ - Regex catastrophic backtracking on user input?
60
+
61
+ ### E — Elevation of Privilege
62
+
63
+ Can I access what I shouldn't?
64
+
65
+ Probes:
66
+ - RBAC/ABAC correctness: does the permission check run before the action?
67
+ - Horizontal privilege escalation: can user A read user B's data with API manipulation?
68
+ - Vertical privilege escalation: can user become admin via some path?
69
+ - Mass assignment: can user set `isAdmin` via PATCH body?
70
+
71
+ ## OPS lens (overlaid on STRIDE)
72
+
73
+ - **Unclosed connections**: grep for `conn.close()` / `client.close()` / `try-with-resources` / `use {}` — every open should have a close
74
+ - **Memory leaks**: long-lived caches without eviction? Unbounded collections? Listeners not removed?
75
+ - **Locks**: deadlock-prone order? Held across I/O?
76
+ - **100x volume**: if traffic grew 100x tomorrow, what breaks first?
77
+
78
+ ## Killer checklist — detail
79
+
80
+ ### Same field = same validation everywhere
81
+
82
+ If `email` is validated one way in `RegisterRoute.kt` and another way in `ProfileUpdateRoute.kt`, an attacker uses the weaker one. Validation must be centralized.
83
+
84
+ ```bash
85
+ # Find all places email is validated
86
+ grep -rn "email" --include='*.kt' src/ | grep -iE 'valid|sanitize|check'
87
+ ```
88
+
89
+ Evidence: all call sites converge on a single validator.
90
+
91
+ ### Same domain = same auth on ALL transports
92
+
93
+ REST endpoint has auth; WebSocket channel for the same resource doesn't (or uses different auth). Attacker bypasses via WebSocket.
94
+
95
+ ```bash
96
+ grep -rn "authenticate" src/ --include='*.kt'
97
+ grep -rni "socket\|websocket\|sse\|webFluxClient" src/
98
+ ```
99
+
100
+ ### Identity resolved server-side
101
+
102
+ ```bash
103
+ # Any userId coming from request body/path?
104
+ grep -rn 'call.parameters\["userId"\]' src/
105
+ grep -rn 'request.body.userId' src/
106
+ # Should all be via JWT/session claim
107
+ ```
108
+
109
+ ### SQL parameterized
110
+
111
+ ```bash
112
+ # Find string interpolation in SQL
113
+ grep -rn "\\\$" src/ --include='*.kt' | grep -iE 'sql|query'
114
+ grep -rn '"SELECT.*" + ' src/
115
+ ```
116
+
117
+ ### PII anonymization
118
+
119
+ ```bash
120
+ # Find logging of user fields
121
+ grep -rn "logger.info.*user" src/
122
+ grep -rn "println.*email\|println.*phone" src/
123
+ ```
124
+
125
+ ## Multi-PR delegation
126
+
127
+ When the same reviewer has done 2+ STRIDE passes on related PRs in one session, blind spots compound. Delegate the 2nd pass to a subagent:
128
+
129
+ ```
130
+ Task(subagent_type="Explore", prompt="""
131
+ Run STRIDE Pass 2 on this diff. Fresh eyes, no session history.
132
+ CHANGED_FILES: [...]
133
+ FOCUS: category you feel is weakest
134
+ """)
135
+ ```
136
+
137
+ ## Stale item rotation
138
+
139
+ Tracked via `learnings-capture`: if a killer checklist item passes (✓) in 10+ audits without catching anything, flag for review. Either:
140
+
141
+ - The codebase is genuinely clean on that dimension → consider removing item
142
+ - The item is too vague to fail → tighten the check
143
+
144
+ Replace with a newer, more specific check.
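The stale-item rule (10+ audits with a pass and no catch) is easy to evaluate once audit outcomes are recorded. How `learnings-capture` stores them is not shown in this diff, so the sketch below assumes a plain mapping from checklist item to a list of per-audit booleans, and reads "10+ audits" as the 10 most recent audits in a row. `stale_items` is a hypothetical helper.

```python
def stale_items(history: dict[str, list[bool]], threshold: int = 10) -> list[str]:
    """history maps a checklist item to per-audit flags: True = the item caught something."""
    stale = []
    for item, caught in history.items():
        recent = caught[-threshold:]
        # Stale only if there are at least `threshold` audits and none of them caught anything.
        if len(recent) >= threshold and not any(recent):
            stale.append(item)
    return stale

history = {
    "SQL parameterized": [False] * 12,           # 12 clean audits in a row
    "PII anonymization": [False] * 9 + [True],   # caught something in the latest audit
    "Identity server-side": [False] * 4,         # too few audits to judge
}
print(stale_items(history))  # prints ['SQL parameterized']
```

A flagged item is a candidate for review, not automatic removal: the choice between dropping it and tightening it remains the judgment call described in this section.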