npm - @neikyun/ciel - Versions diffs - 6.10.1 → 6.11.1 - Mend

@neikyun/ciel 6.10.1 → 6.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/assets/skills/workflow/quoi-framer/SKILL.md ADDED

@@ -0,0 +1,91 @@
+---
+name: quoi-framer
+description: How to frame a task before starting. Forces an explicit goal, a NOT-X constraint, a shared intention, and a measurable definition of done. For Ciel v5 pipeline step 2 (QUOI). Use after the DOCS phase, before the ASK phase.
+---
+# Task Framing — Define Before You Start (Ciel v5)
+## What this covers
+How to define a task clearly before doing any work. This is step 2 of the Ciel v5 pipeline (QUOI): it is applied after the DOCS phase (step 1) and before ASK (step 3). It prevents scope drift, wasted research, and "I thought you meant..." conversations.
+## Core principle
+**State the goal, the constraint, the intention, and the done criteria BEFORE researching or coding.** If you can't state these in 5 lines, you don't understand the task yet.
+## The 5 output gates (ALL required)
+### 1. Expected result
+One sentence. Concrete and testable.
+- BAD: "Improve the API"
+- GOOD: "GET /api/users returns a paginated list with page+limit query params"
+### 2. NOT-X constraint
+At least 1 concrete thing the solution MUST NOT do:
+- "NOT-X: no N+1 queries"
+- "NOT-X: no new dependencies added"
+- "NOT-X: no breaking changes to existing callers"
+- "NOT-X: no schema migration"
+"no bad code" is not NOT-X. "No global state mutation" is.
+### 3. Shared intentions (v5)
+State what you are looking for, not what you expect to find. This guides exploration without biasing it:
+- BAD: "Find where pdfmake is used for PDF export" (searches for a specific solution)
+- GOOD: "Understand how exports are handled in this project" (open-ended intention)
+The intention is passed to @ciel-explorer to guide scent-following without creating confirmation bias.
+### 4. Definition of done
+Measurable before research starts:
+- "Done when: endpoint returns 200 with `{items, total, page}` shape, test passes on staging, no perf regression vs baseline"
+"done when it works" is not acceptable. Specify the observable signal.
+### 5. DOCS gate (v5)
+Before framing, verify that documentation has been read:
+- README.md (project overview and conventions)
+- ADRs if they exist (architecture decisions)
+- Tickets/specs (requirements context)
+- .ciel/map.json (existing project map)
+- ciel-overlay.md (project overlay)
+## Output format
+```
+## QUOI
+Expected result: <one sentence>
+NOT-X: <concrete constraint>
+Intentions: <what I'm looking for (open question)>
+Done when: <measurable criteria>
+Docs read: <yes — README, ADRs, map, tickets>
+```
+## Common rationalizations
+| Rationalization | Reality |
+|---|---|
+| "This is simple, I don't need to frame it" | Simple tasks benefit from 2-line frames. The frame costs 10 seconds. Scope drift costs hours. |
+| "I already know what to build" | Write it down anyway. Writing forces precision. "I know" is how ambiguity hides. |
+| "NOT-X is obvious" | If it's obvious, writing it takes 2 seconds. If you can't write it, it wasn't obvious. |
+## How to verify
+- [ ] QUOI statement: 1 sentence, describes WHAT not HOW?
+- [ ] NOT-X constraint: >= 1 explicit exclusion?
+- [ ] Shared intentions: open question, not solution-biased?
+- [ ] Definition of done: >= 1 measurable criterion?
+- [ ] DOCS gate: documentation has been read?
+## When to re-frame
+- Start of any task (before research, after DOCS)
+- When scope drift is detected (3+ files touched without re-checking goal)
+- When the user changes direction mid-task
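Reviewer sketch (not part of the package): the QUOI output format in quoi-framer/SKILL.md above lends itself to a mechanical completeness check. The snippet below is a minimal illustration in Python. The function name `missing_quoi_fields` and the sample frame are hypothetical; only the five field labels come from the skill file.

```python
import re

# Field labels taken from the "Output format" block of the skill file.
REQUIRED_FIELDS = ("Expected result", "NOT-X", "Intentions", "Done when", "Docs read")


def missing_quoi_fields(frame: str) -> list[str]:
    """Return the required QUOI fields that are absent, empty, or still a <placeholder>."""
    missing = []
    for field in REQUIRED_FIELDS:
        # Match "Field: value" on a single line; [ \t]* avoids running into the next line.
        match = re.search(rf"^{re.escape(field)}:[ \t]*(.*)$", frame, flags=re.MULTILINE)
        value = match.group(1).strip() if match else ""
        if not value or re.fullmatch(r"<.*>", value):
            missing.append(field)
    return missing


# Hypothetical frame, with the last field left empty on purpose.
frame = """## QUOI
Expected result: GET /api/users returns a paginated list with page+limit query params
NOT-X: no N+1 queries
Intentions: Understand how list endpoints are paginated in this project
Done when: endpoint returns 200 with {items, total, page} shape
Docs read:
"""

print(missing_quoi_fields(frame))
```

A check like this only confirms that each gate was filled in. Whether the expected result is testable or the NOT-X is concrete still needs a human read.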

package/assets/skills/workflow/relire-critic/SKILL.md ADDED

@@ -0,0 +1,99 @@
+---
+name: relire-critic
+description: How to self-review code effectively — hostile critique methodology, risk taxonomy, and quality checklist. Generates exactly 3 targeted critiques (functional, import/API, data assumption) then resolves each. Applicable after any code change.
+allowed-tools: Read, Grep, Glob, Bash
+---
+# Code Self-Review — Hostile Critique Methodology
+## What this covers
+How to review your own code as if someone else wrote it. Self-review fails because the author reinforces their own blind spots (degeneration of thought, CriticBench 2024). This methodology forces adversarial thinking.
+## Core principle
+Read changed files **as if someone else wrote them**. Your job is to find what could fail, not to confirm what works.
+## Methodology: 3 RISQUES
+Generate EXACTLY 3 specific critiques of the changed code. Not 2, not 5 — 3 forces focus.
+### Mandatory distribution
+Each set of 3 RISQUES must include:
+1. **Functional risk** — what breaks for users? "This fails when..."
+2. **Import/API surface check** — does this import path actually exist? Is the API contract correct?
+3. **Data assumption check** — does this DB column / response shape / format actually match reality?
+### Specificity rules
+- Concrete, not abstract: "might have bugs" is invalid
+- Reference specific `file:line` where the risk lives
+- Can't generate 3 specific critiques → you don't understand the code → read more
+### Format
+```
+RISQUE: [what could fail] parce que [root cause] — IMPACT: [consequence]
+```
+## Resolution
+For each RISQUE, choose ONE:
+- **FIX**: exact correction needed — name the code change
+- **ACCEPT**: why the risk is acceptable (TTL? cosmetic? window < 1s?)
+- **DEFER**: issue reference + why out of scope
+If 0 fixes needed → suspicious. Re-examine for specificity.
+## Quality checklist (8 items)
+Apply after resolving RISQUES:
+1. Quality gates respected? (complexity < 15, nesting < 4, functions < 50 lines)
+2. All new imports exist in actual files at stated paths?
+3. All DB columns referenced exist in real schema?
+4. Test mocks on same host:port as actual requests?
+5. Tests could fail independently of implementation?
+6. Duplicated logic with existing code?
+7. Linter clean? (0 new violations vs base branch)
+8. Would a staff engineer approve this without changes?
+Each item: evidence (`file:line` or command output) or explicit "N/A because X".
+## Output format
+```
+## RISQUES
+1. RISQUE: <X> parce que <Y> — IMPACT: <Z>
+   → FIX/ACCEPT/DEFER: <resolution>
+2. ...
+3. ...
+## CHECKLIST
+- [✓/✗/N/A] <item> — <evidence>
+...
+## VERDICT
+BLOCKING: <list or "none">
+IMPORTANT: <list or "none">
+MINOR: <list or "none">
+```
+## How to verify
+- [ ] Exactly 3 RISQUES (no more, no less)?
+- [ ] Distribution: 1 functional + 1 import + 1 data-assumption?
+- [ ] Each RISQUE has file:line evidence?
+- [ ] Each RISQUE has resolution (FIX/ACCEPT/DEFER)?
+- [ ] Quality checklist (8 items) completed?
+- [ ] VERDICT issued (BLOCKING/IMPORTANT/MINOR)?
+## Common mistakes
+- **Generic critiques**: "might not scale" → too vague. "Loads all users into memory at line 47, O(n)" → specific.
+- **Skipping distribution**: all 3 are functional risks, no import or data check → incomplete.
+- **Too many RISQUES**: 5 critiques dilute focus. Pick top 3 by severity.
+- **Not reading code**: reviewing the description instead of the actual file → always read code first.
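Reviewer sketch (not part of the package): two of the "How to verify" items in relire-critic/SKILL.md above, the exact count of 3 and the `file:line` evidence, can be checked mechanically against the RISQUE line format. The names `RISQUE_LINE`, `FILE_LINE`, and `review_problems` are hypothetical, and so is the sample report. The functional/import/data distribution is not checked here because it needs judgment.

```python
import re

# One RISQUE line: "RISQUE: <what> parce que <cause> \u2014 IMPACT: <consequence>",
# optionally preceded by a list number. \u2014 is the dash used in the skill's format.
RISQUE_LINE = re.compile(
    r"^\s*(?:\d+\.\s*)?RISQUE:\s*(?P<what>.+?)\s+parce que\s+(?P<cause>.+?)"
    r"\s+\u2014\s+IMPACT:\s*(?P<impact>.+)$"
)
# Rough shape of a file:line reference such as src/users.ts:47 (assumed convention).
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+")


def review_problems(report: str) -> list[str]:
    """List format problems found in a self-review report."""
    risks = [m for line in report.splitlines() if (m := RISQUE_LINE.match(line))]
    problems = []
    if len(risks) != 3:
        problems.append(f"expected exactly 3 RISQUES, found {len(risks)}")
    for index, risk in enumerate(risks, start=1):
        if not FILE_LINE.search(risk.group(0)):
            problems.append(f"RISQUE {index} has no file:line reference")
    return problems


# Hypothetical report with only two risks, the second lacking evidence.
report = (
    "1. RISQUE: pagination breaks at src/users.ts:47 parce que limit is unchecked"
    " \u2014 IMPACT: 500 on large pages\n"
    "2. RISQUE: import may not resolve parce que the path was renamed"
    " \u2014 IMPACT: build failure\n"
)

for problem in review_problems(report):
    print(problem)
```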

package/assets/skills/workflow/security-regression-check/SKILL.md ADDED

@@ -0,0 +1,86 @@
+---
+name: security-regression-check
+description: How to check for security regressions in a diff — greps for new inputs, removed auth blocks, new external calls, new file access, new SQL/eval, and new trust boundaries. Attacker-eye review of what changed, not what was intended.
+allowed-tools: Read, Grep, Bash
+---
+# Security Regression Check — Attacker Eyes on the Diff
+## What this covers
+How to check if a code change introduced security regressions. The hypothesis: "I fixed A without touching B" is NOT a check. Read the diff with attacker eyes — what did my fix add that wasn't there before?
+## Core principle
+**Read `+` lines with attacker eyes, not author eyes.** The author's intent is irrelevant. What can an external actor do with this code path?
+## Process
+### 1. Capture the diff
+```bash
+git diff --unified=3 HEAD
+```
+### 2. Grep for risk signals
+| Signal | What to search | Why it matters |
+|--------|---------------|----------------|
+| New request param reads | `call.parameters[`, `request.body.`, `req.query.`, `req.params.` | New inputs = new validation surface |
+| Removed auth blocks | `-` lines with `authenticate`, `requireAuth`, `verifyToken`, `checkPermission` | Removed auth = privilege escalation |
+| New external calls | `+` lines with `fetch(`, `axios(`, `httpClient.` | New outbound = SSRF / data exfil risk |
+| New file reads/writes | `+` lines with `File(`, `fs.readFile`, `fs.writeFile`, `Path(` | New FS access = path traversal risk |
+| New SQL | `+` lines with SELECT, INSERT, UPDATE, DELETE | New queries = injection risk if concat |
+| New eval/exec | `+` lines with `eval(`, `Function(`, `exec(` | Code injection risk |
+| New trust boundaries | `+` lines with cookies, tokens, sessions | New trust = new spoofing surface |
+### 3. Classify each finding
+- **Critical** — must address before merge
+- **Important** — document + address OR accept with rationale
+- **Informational** — note for reflection
+## Output format
+```
+## SECURITY REGRESSION CHECK
+Diff scope: <N files, +X -Y lines>
+### New inputs (from request)
+- <file:line> — <new param> — <has validation?>
+### Removed/modified auth
+- <file:line> — <what changed>
+### New external calls
+- <file:line> — <target | dynamic URL risk>
+### New file/FS access
+- <file:line> — <path controlled by user?>
+### New SQL / eval
+- <file:line> — <parameterized? safe?>
+### New trust boundaries
+- <file:line> — <cookie/token/session change>
+### VERDICT
+- Critical: <list or none>
+- Important: <list or none>
+- Informational: <list or none>
+```
+## How to verify
+- [ ] Diff captured and reviewed?
+- [ ] Risk signals grepped (new inputs, removed auth, external calls, file access, SQL/eval, trust boundaries)?
+- [ ] Each finding classified (Critical / Important / Informational)?
+- [ ] VERDICT issued (Critical / Important / Informational lists, or none)?
+- [ ] Attacker perspective applied?
+## Key rules
+- **Diff scope matters**: 500-line diff → process in chunks. Fatigue causes misses.
+- **Don't trust commit messages**: "just a refactor" still needs the check. Refactors routinely remove validation.
+- **"No error" ≠ safe**: absence of error messages doesn't mean the change is secure.
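Reviewer sketch (not part of the package): the "Grep for risk signals" table in security-regression-check/SKILL.md above can be approximated by a small scanner over unified-diff text. The search strings are copied from the table. The function name `scan_diff` and the sample diff are hypothetical, and the SQL and trust-boundary rows are left out because plain substring matching is too noisy for them.

```python
# Substrings to look for on added ("+") lines, taken from the skill's table.
ADDED_SIGNALS = {
    "new input": ("call.parameters[", "request.body.", "req.query.", "req.params."),
    "new external call": ("fetch(", "axios(", "httpClient."),
    "new file access": ("File(", "fs.readFile", "fs.writeFile", "Path("),
    "new eval/exec": ("eval(", "Function(", "exec("),
}
# Substrings to look for on removed ("-") lines.
REMOVED_SIGNALS = {
    "removed auth": ("authenticate", "requireAuth", "verifyToken", "checkPermission"),
}


def scan_diff(diff_text: str) -> list[tuple[int, str, str]]:
    """Return (diff line number, signal label, line content) for each hit."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith(("+++", "---")):
            continue  # file headers, not content (also skips rare content lines shaped like them)
        if line.startswith("+"):
            table = ADDED_SIGNALS
        elif line.startswith("-"):
            table = REMOVED_SIGNALS
        else:
            continue  # context lines and hunk headers
        for label, needles in table.items():
            if any(needle in line for needle in needles):
                findings.append((number, label, line[1:].strip()))
    return findings


# Hypothetical diff: an auth middleware is dropped and a query param is read.
diff = """\
--- a/src/routes/users.ts
+++ b/src/routes/users.ts
@@ -10,4 +10,4 @@
-router.get("/users", requireAuth, listUsers);
+router.get("/users", listUsers);
+const page = req.query.page;
 const limit = 20;
"""

for number, label, content in scan_diff(diff):
    print(f"diff line {number}: {label}: {content}")
```

Substring hits are only leads: `Function(` will also match `myFunction(`, so each finding still needs the attacker-eye read described in the skill.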

package/assets/skills/workflow/self-consistency-verifier/SKILL.md ADDED

@@ -0,0 +1,85 @@
+---
+name: self-consistency-verifier
+description: How to verify AI-generated code by generating 3 independent solutions, comparing them at syntactic/AST/behavioral levels, and scoring consistency. Divergent solutions indicate model uncertainty — re-prompt with constraints or escalate. Based on IdentityChain (2024) and Consistency-Aided Tested Code Generation (ACM 2025).
+allowed-tools: Read, Grep, Glob, Bash, Write
+---
+# Self-Consistency Verifier — If Three of You Disagree, One of You Is Wrong
+## What this covers
+How to verify AI-generated code by generating 3 diverse solutions and comparing them. A confident LLM that generates 3 semantically identical solutions is probably right. A confident LLM that generates 3 divergent solutions is the dangerous case — it'll ship whichever came out first. Self-consistency is the cheapest high-signal uncertainty estimator available.
+## Core principle
+**Divergence is diagnostic.** When solutions disagree, the disagreement itself tells you what constraint is missing. Don't just pick one — understand WHY they differ.
+## Methodology
+### Generate 3 diverse solutions
+Re-prompt the LLM 3 times with diversifying seeds. The goal is divergent initial approaches, not different variable names.
+**Diversification strategies** (pick 3 out of 5):
+1. **Constraint-reorder** — restate the problem with constraints in a different order
+2. **Language-shift** — ask for pseudocode first, THEN translate to target language
+3. **Test-first** — ask for test cases first, THEN the implementation
+4. **Adversarial framing** — "what would break this naïve solution?" then write the robust version
+5. **Reference implementation** — "find the canonical pattern" then adapt
+### Compare at 3 levels
+**Level A — Syntactic (cheap)**
+- Run formatter, normalize whitespace, compute textual diff
+- Identical after format → consistency HIGH, skip to verdict
+- Differ only in variable names → consistency HIGH
+- Structural diff → proceed to Level B
+**Level B — AST-level (medium)**
+- Parse each solution to AST
+- Compare: function signatures, control flow shape, side-effect surface, data shape flow
+- Score: `consistency = matched_nodes / total_nodes`. ≥0.85 = HIGH, 0.60 to <0.85 = MEDIUM, <0.60 = LOW
+**Level C — Behavioral (expensive, Critical only)**
+- Generate 10-20 property-based test cases (`fast-check` / `hypothesis`)
+- Run each solution against the same test cases
+- All 3 pass all cases → consistency HIGH
+- Divergent pass/fail patterns → at least one is wrong; use majority vote + investigate outlier
+### Interpret divergence
+| Divergence type | Interpretation | Action |
+|---|---|---|
+| One solution handles edge case X, others don't | Missing explicit constraint | Add constraint, re-generate |
+| Solutions use different libraries | Library choice under-specified | Pin the lib, pick one |
+| Solutions use different algorithms with different complexity | Performance under-specified | Add perf constraint |
+| Solutions have different error-handling | Error model under-specified | Specify what errors to surface |
+| Two agree, one is outlier | Either the outlier is wrong or it caught something the other two missed | Investigate the outlier for a missed insight, then use the majority |
+| All three disagree | Problem under-specified or too hard | Escalate to human |
+## Key points
+- **Cost budget**: Critical = full 3-level compare, ≤15 min. Standard = syntactic + AST only, ≤5 min. Trivial = skip entirely
+- **Don't re-generate with the same prompt** — identical prompts produce highly similar outputs; the check becomes trivial. Always diversify
+- **Don't majority-vote blindly** — an outlier that catches an edge case the other two missed is the RIGHT answer. Investigate before voting
+- **AST compare requires a parser** — if the target language lacks easy AST access, fall back to behavioral compare or skip Level B
+- **Three is the magic number** — two is a tie, four is diminishing returns
+## Common anti-patterns
+1. **Same-prompt re-generation**: identical prompts produce near-identical outputs, making the check trivial and useless
+2. **Blind majority voting**: an outlier may be the only one that caught a real edge case — investigate before discarding
+3. **Skipping divergence analysis**: the WHY of divergence is more valuable than the score itself
+4. **Running behavioral tests on every task**: reserve for Critical code only; syntactic + AST is enough for Standard
+## How to verify
+- **Score threshold**: ≥0.85 = HIGH confidence, proceed. 0.60 to <0.85 = MEDIUM, adopt majority + add tests. <0.60 = LOW, re-prompt or escalate
+- **Edge case surfacing**: divergence analysis should produce at least 1 concrete edge case to test
+- **Constraint improvement**: after divergence, the problem statement should have more constraints than before
+## References
+- IdentityChain — openreview.net/forum?id=caW7LdAALh — self-consistency for code LLMs
+- ACM 2025 — "Consistency-Aided Tested Code Generation with LLM" (dl.acm.org/doi/pdf/10.1145/3728902)
+- arxiv 2507.06920 — "Rethinking Verification for LLM Code Generation: From Generation to Testing"
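Reviewer sketch (not part of the package): self-consistency-verifier/SKILL.md above gives the Level B score as `matched_nodes / total_nodes` without saying how nodes are matched. One possible reading, shown here for Python sources only, counts AST node types as multisets and divides the intersection size by the union size. That matching rule, the function names, and the three sample solutions are all assumptions made for illustration. The thresholds are the ones stated in the skill file.

```python
import ast
from collections import Counter
from itertools import combinations


def node_counts(source: str) -> Counter:
    """Count AST node types in a piece of Python source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def consistency(a: str, b: str) -> float:
    """Assumed reading of matched_nodes / total_nodes: multiset intersection over union."""
    ca, cb = node_counts(a), node_counts(b)
    matched = sum((ca & cb).values())
    total = sum((ca | cb).values())
    return matched / total if total else 1.0


def band(score: float) -> str:
    """Map a score to the bands given in the skill file."""
    if score >= 0.85:
        return "HIGH"
    if score >= 0.60:
        return "MEDIUM"
    return "LOW"


def overall(solutions: list[str]) -> float:
    """Worst pairwise score, so that a single outlier pulls the result down."""
    return min(consistency(x, y) for x, y in combinations(solutions, 2))


# Hypothetical solutions: two differ only in names, one uses an explicit loop.
solution_a = "def total(xs):\n    return sum(xs)\n"
solution_b = "def total(values):\n    return sum(values)\n"
solution_c = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"

score = overall([solution_a, solution_b, solution_c])
print(band(score), round(score, 2))
```

Counting node types ignores tree shape, so this is a coarse signal. It treats renamed variables as identical, which matches the skill's Level A rule, but it can also score structurally different code as similar.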

package/assets/skills/workflow/spike-mode/SKILL.md ADDED

@@ -0,0 +1,101 @@
+---
+name: spike-mode
+description: How to use SPIKE mode in Ciel v5, a prototype/exploration mode with relaxed gates. Create .ciel/exploration.active to enter spike mode. Quality gates are relaxed and code is marked FIXME/TODO. Used for POC, draft, experimental, throwaway code. Must be refactored properly afterward.
+---
+# SPIKE Mode — Explore Without Commitment (Ciel v5)
+## What this covers
+How to use SPIKE mode in Ciel v5 for prototyping and exploration. When you need to test an idea quickly without going through the full quality pipeline. The mode is triggered by creating a `.ciel/exploration.active` file in the project root.
+## Core principle
+**Speed over quality during exploration. Quality over speed for production.** SPIKE mode exists because sometimes you need to write throwaway code to validate an approach. But throwaway code that stays is technical debt.
+## When to use SPIKE mode
+- Prototyping a new feature
+- Testing a library integration
+- Exploring a complex refactoring
+- Validating an architecture approach
+- POC / proof of concept
+- "I don't know if this will work, let me try"
+Do NOT use SPIKE mode for:
+- Production code
+- Code you plan to keep
+- Critical/security code
+- Code you already know how to implement
+## How to enter SPIKE mode
+```bash
+touch .ciel/exploration.active
+```
+The plugin detects this file and:
+- Relaxes gates 1 (test-first) and 4 (quality)
+- Injects SPIKE mode indicator in system prompt
+- Marks all code as experimental
+## How to exit SPIKE mode
+```bash
+rm .ciel/exploration.active
+```
+Then, once the exploration is done:
+- Refactor the experimental code properly
+- Add tests
+- Follow the full pipeline
+## What changes in SPIKE mode
+| Gate | Standard mode | SPIKE mode |
+|------|---------------|------------|
+| Test-first (RED) | Blocking | Relaxed |
+| Alternatives | Required | Recommended |
+| Idiomatic | Required | Recommended |
+| Quality (complexity, nesting) | Enforced | Relaxed |
+| Removal safety | Required | Required |
+| Boy-scout | Recommended | Recommended |
+| FIXME/TODO markers | Optional | MANDATORY |
+## Output format
+When in SPIKE mode, add this to the plan:
+```
+## SPIKE MODE
+Goal: <what are we trying to learn/prove?>
+Exit criteria: <when is this exploration done?>
+Markers: <files marked FIXME/TODO>
+Follow-up task: <describe the proper implementation>
+```
+## Common rationalizations
+| Rationalization | Reality |
+|---|---|
+| "I'll clean up the spike code later" | You won't. If you don't schedule the cleanup immediately, spike code becomes permanent debt. |
+| "The gates are annoying, I'll use spike mode" | SPIKE mode is for when you DON'T KNOW the solution, not for when you don't WANT to write tests. |
+| "This is just a quick prototype, no need for FIXME" | Unmarked prototype code looks like production code. Without FIXME, nobody knows it needs refactoring. Future you included. |
+| "SPIKE mode means no rules" | SPIKE relaxes the gates but does not remove them. Security and removal safety remain active. |
+## Rules
+- **Code written in SPIKE mode MUST be marked FIXME or TODO**
+- **SPIKE code MUST be refactored or removed after exploration**
+- **SPIKE mode does not bypass security gates** (removal safety still applies)
+- **Do not commit SPIKE code without refactoring**
+- **SPIKE mode is for individual exploration sessions, not for PRs**
+## How to verify
+- [ ] .ciel/exploration.active exists?
+- [ ] All exploratory code has FIXME/TODO markers?
+- [ ] Exit criteria defined?
+- [ ] Follow-up task created for proper implementation?
+- [ ] No SPIKE code committed without refactoring?
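Reviewer sketch (not part of the package): the first two "How to verify" items in spike-mode/SKILL.md above can be checked with a few lines of Python. The diff does not show how the plugin itself detects the marker file, so this is an independent illustration. The function names and the demo file names are hypothetical; only the `.ciel/exploration.active` path and the FIXME/TODO rule come from the skill file.

```python
import tempfile
from pathlib import Path


def spike_active(root: Path) -> bool:
    """True when the marker file described in the skill exists under root."""
    return (root / ".ciel" / "exploration.active").is_file()


def unmarked_files(root: Path, changed: list[str]) -> list[str]:
    """Return the changed files (relative paths) that contain neither FIXME nor TODO."""
    unmarked = []
    for relative in changed:
        text = (root / relative).read_text(encoding="utf-8", errors="replace")
        if "FIXME" not in text and "TODO" not in text:
            unmarked.append(relative)
    return unmarked


# Demo in a throwaway directory with two hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".ciel").mkdir()
    (root / ".ciel" / "exploration.active").touch()
    (root / "draft.py").write_text("# FIXME: spike code, refactor before merge\n", encoding="utf-8")
    (root / "helper.py").write_text("print('no marker here')\n", encoding="utf-8")

    print(spike_active(root))
    print(unmarked_files(root, ["draft.py", "helper.py"]))
```

The list of changed files is passed in by the caller. In practice it could come from `git diff --name-only`, which is left out here to keep the sketch self-contained.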

package/assets/skills/workflow/stride-analyzer/SKILL.md ADDED

@@ -0,0 +1,96 @@
+---
+name: stride-analyzer
+description: How to threat model with STRIDE — 3-pass methodology: risk-rank by mechanical signals, STRIDE 6 categories (Spoofing/Tampering/Repudiation/Info Disclosure/DoS/Elevation) with grep evidence, and killer checklist. For auth, DB schema, payment, security changes.
+allowed-tools: Read, Grep, Glob, Bash
+---
+# STRIDE Threat Modeling — Security Analysis Methodology
+## What this covers
+How to do a security threat model using STRIDE. STRIDE is the framework; grep is the evidence. No theater — every finding needs `file:line` proof.
+## Core principle
+**Anti-theater rule**: every checklist item needs evidence (file:line or grep output). "Checked ✓" with no evidence = not checked.
+## Pass 1: Risk rank (mechanical signals)
+Classify the change:
+- **Critical** if ANY: `auth/`, `security/`, DB tables (users, sessions, tokens), `.executeQuery`, `.executeUpdate`, `userId`, `password`, `token`, `secret`
+- **Important** if ANY: diff > 5 files, `validate`, `sanitize`, `rateLimit`, route handlers
+- **Routine** otherwise
+→ Critical = all 3 passes. Important = passes 2+3. Routine = pass 3 only.
+## Pass 2: STRIDE 6 categories (Critical/Important)
+For each category, answer with grep-backed evidence:
+| Category | Question | Evidence type |
+|----------|----------|--------------|
+| **S**poofing | Can I impersonate someone? | Auth checks, token validation |
+| **T**ampering | Can input be modified in transit? | Input validation, integrity checks |
+| **R**epudiation | Can a user deny this action? | Audit logging, timestamps |
+| **I**nfo Disclosure | What leaks? | Error messages, logs, responses |
+| **D**oS | Can this be flooded/exhausted? | Rate limits, resource bounds |
+| **E**levation | Can I access what I shouldn't? | Authorization checks, role validation |
+Each answer: grep-backed or "N/A because X". **Mark N/A explicitly, never skip silently.**
+**OPS lens** (overlaid on STRIDE): unclosed connections, memory leaks, locks, behavior at 100x volume.
+## Pass 3: Killer checklist (all levels)
+- Same field = same validation everywhere? (grep to verify)
+- Same domain = same auth on ALL transports (REST + WS + SSE)?
+- Identity fields resolved server-side, never client-supplied?
+- SQL parameterized, never interpolated?
+- PII touched = anonymization covered?
+Each item: evidence (`file:line` or grep output) or N/A.
+## Output format
+```
+## STRIDE ANALYSIS
+### Risk rank: <Critical | Important | Routine>
+Signals: <list>
+### STRIDE (if Critical/Important)
+- S (Spoofing): <N/A because X | RISQUE: ... — evidence: file:line>
+- T (Tampering): <...>
+- R (Repudiation): <...>
+- I (Info Disclosure): <...>
+- D (DoS): <...>
+- E (Elevation): <...>
+OPS: <connections | memory | locks | 100x volume>
+### Killer checklist
+- [✓/✗] Same validation everywhere — evidence: <grep output>
+- [✓/✗] Auth parity across transports — evidence: <...>
+- [✓/✗] Identity server-side — evidence: <...>
+- [✓/✗] SQL parameterized — evidence: <...>
+- [✓/✗] PII anonymization — evidence: <...>
+### VERDICT
+BLOCKING: <list or none>
+IMPORTANT: <list or none>
+```
+## How to verify
+- [ ] Pass 1 (Risk rank) completed with mechanical signals?
+- [ ] Pass 2 (STRIDE 6 categories) — all categories have findings or explicit "N/A because X"?
+- [ ] Pass 3 (Killer checklist) completed?
+- [ ] VERDICT issued (BLOCKING / IMPORTANT lists, or none)?
+- [ ] Evidence format: `file:line` or grep output?
+## Key rules
+- **Don't skip categories silently**: every STRIDE category gets a finding or explicit "N/A because X"
+- **Evidence format**: `path/to/file.ext:123` or `grep -n "pattern" src/` output
+- **Rotate stale items**: if a checklist item catches nothing in 10+ audits, consider replacing it
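Reviewer sketch (not part of the package): Pass 1 of stride-analyzer/SKILL.md above ranks a change from mechanical signals, which maps naturally onto a substring classifier. The signal strings are copied from the skill file. The function name `risk_rank` and the sample inputs are hypothetical. The "DB tables" and "route handlers" signals are omitted because recognizing them requires project-specific knowledge.

```python
# Signals from Pass 1 of the skill file (DB tables and route handlers omitted).
CRITICAL_SIGNALS = (
    "auth/", "security/", ".executeQuery", ".executeUpdate",
    "userId", "password", "token", "secret",
)
IMPORTANT_SIGNALS = ("validate", "sanitize", "rateLimit")


def risk_rank(changed_files: list[str], diff_text: str) -> tuple[str, list[str]]:
    """Return (rank, matched signals) for a change, following the Pass 1 rules."""
    haystack = "\n".join(changed_files) + "\n" + diff_text
    critical = [signal for signal in CRITICAL_SIGNALS if signal in haystack]
    if critical:
        return "Critical", critical
    important = [signal for signal in IMPORTANT_SIGNALS if signal in haystack]
    if len(changed_files) > 5:
        important.append("diff > 5 files")
    if important:
        return "Important", important
    return "Routine", []


# Hypothetical changes.
print(risk_rank(["src/auth/login.ts"], ""))
print(risk_rank(["src/form.ts"], "+ validate(input)"))
print(risk_rank(["README.md"], "+ typo fix"))
```

Returning the matched signals alongside the rank fills the `Signals: <list>` line of the skill's output format directly.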