npm - @neikyun/ciel - Versions diffs - 6.10.1 → 6.11.1 - Mend

@neikyun/ciel 6.10.1 → 6.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (40)

package/assets/skills/workflow/doc-validator-official/SKILL.md ADDED

@@ -0,0 +1,196 @@
+---
+name: doc-validator-official
+description: Before generating code that calls an external library, framework, or API, fetches the OFFICIAL documentation for the exact version in use and validates that each proposed API call (function name, signature, parameters, return type) exists as cited. Rejects reliance on Stack Overflow/blog posts when official docs exist. Forces citations for every non-trivial API use. The primary anti-hallucination gate for the RECHERCHE step.
+allowed-tools: Read, Grep, Glob, Bash, WebFetch, WebSearch
+---
+# doc-validator-official — Official docs first, blogs never
+API hallucination is the #1 failure mode of LLM-generated code (ISSTA 2025): functions that don't exist, signatures from the wrong version, invented parameters, fabricated return types. Retrieval against the official docs for the pinned version targets this class of bug directly.
+---
+## Inputs (infer before asking — see orchestrator's Autonomy protocol)
+```
+TARGET_STACK: [language + framework + version — e.g., "TypeScript 5.5 + React 19"]
+PROPOSED_APIS: [list of function/class/method calls the implementation will use]
+PACKAGE_SOURCES: [paths to package.json / go.mod / requirements.txt / Cargo.toml]
+```
+### Auto-inference sources (exhaust BEFORE asking the user)
+- **PACKAGE_SOURCES** → `find . -maxdepth 3 -name 'package.json' -o -name 'go.mod' -o -name 'requirements.txt' -o -name 'pyproject.toml' -o -name 'Cargo.toml' -o -name 'Gemfile'` — pick up every manifest without asking.
+- **TARGET_STACK** → derive from PACKAGE_SOURCES (read the files, extract versions of the key libs). Cross-check with `ciel-overlay.md`.
+- **PROPOSED_APIS** → parse from the user's task description + any referenced code diff. If the user said "use stripe to refund X", APIs = `stripe.refunds.create`, `stripe.paymentIntents.retrieve`, etc.
+Only BLOCK if no manifest file exists at all (greenfield project with no deps yet) — then ask once "Which package.json / go.mod should I validate against?".
+---
+## Phase 1 — Extract exact versions
+Read package manifests. For each lib in PROPOSED_APIS extract the pinned version:
+```bash
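+# npm/yarn/pnpm: list every declared dependency with its version spec
+# (`// {}` keeps jq from failing when one of the two sections is absent)
+jq -r '(.dependencies // {}) + (.devDependencies // {}) | to_entries[] | "\(.key) \(.value)"' package.json
+# go: -n prints the line number needed for the source_file:line record
+grep -nE '^[[:space:]]*<lib>' go.mod
+# python: pyproject.toml lists dependencies as indented, quoted strings
+grep -nE '^[[:space:]]*"?<lib>' requirements.txt pyproject.toml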
+```
+Record as `{lib_name, pinned_version, source_file:line}`.
+If version is a range (`^1.2.0`) → resolve the actual installed version from lockfile (`package-lock.json`, `yarn.lock`, `uv.lock`, `Cargo.lock`). Never validate against a range.
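The lockfile lookup described above can be sketched as follows for npm. This is a minimal illustration, assuming a `package-lock.json` with `lockfileVersion` 2 or 3, where installed packages are keyed by `node_modules/<name>` under `packages`. The `PackageLock` type, the `resolveInstalledVersion` helper, and the inlined lockfile excerpt are hypothetical names introduced for the example, not part of the skill.

```typescript
// Minimal shape of an npm package-lock.json (lockfileVersion 2 or 3).
// Only the fields this sketch reads are modelled.
interface PackageLock {
  packages?: Record<string, { version?: string }>;
}

// Returns the exact installed version of a top-level dependency,
// or undefined when the lockfile has no entry for it.
function resolveInstalledVersion(lock: PackageLock, name: string): string | undefined {
  return lock.packages?.[`node_modules/${name}`]?.version;
}

// Hypothetical lockfile excerpt, inlined so the example is self-contained.
const lock: PackageLock = {
  packages: {
    "": {},
    "node_modules/react": { version: "19.0.2" },
  },
};

const reactVersion = resolveInstalledVersion(lock, "react");
const missingVersion = resolveInstalledVersion(lock, "left-pad");
```

Here `reactVersion` is the exact version to validate against, while an `undefined` result (as for `left-pad`) signals that the dependency is declared but not installed, or not declared at all.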
+---
+## Phase 2 — Locate official docs
+For each lib, find the CANONICAL doc URL for the exact version. Priority order:
+1. **Versioned docs site** — `https://reactjs.org/docs/v19.0.0/` or `https://fastapi.tiangolo.com/release-notes/`
+2. **Repo `/docs/` at the tag** — `https://github.com/org/repo/tree/v1.2.0/docs`
+3. **README at the tag** — `https://github.com/org/repo/blob/v1.2.0/README.md`
+4. **Context7 MCP** (if available) — provides up-to-date official docs for thousands of libs
+### Reject these sources
+- Stack Overflow answers (even highly upvoted — often stale)
+- Medium/dev.to blog posts (version drift, author may have been wrong)
+- AI-generated tutorials (recursion hazard)
+- Forum posts without corroboration by official docs
+These may GUIDE investigation but never JUSTIFY an API claim.
+---
+## Phase 3 — Validate each proposed API
+For each item in PROPOSED_APIS:
+1. **Fetch the official doc page** for that function/class.
+2. **Verify the signature matches** — function exists, parameter names and types match, return type matches.
+3. **Verify version availability** — "Added in vX.Y" metadata. If the pinned version < X.Y, the API doesn't exist in this project yet.
+4. **Capture citation** — URL + section header + (if possible) quoted signature.
+Output per API:
+```
+[VALID] lib.funcName(a: T1, b: T2): T3
+  Source: <URL>#section
+  Cited: "funcName(a, b) → T3 — Added in 1.4.0"
+  Pinned: 1.5.2 ✓
+```
+or:
+```
+[INVALID] lib.funcName — NOT FOUND in v1.5.2 docs
+  Similar: lib.otherFunc (did you mean this?)
+  Action: rename or choose a different lib
+```
+or:
+```
+[AMBIGUOUS] lib.funcName exists but signature differs
+  Doc says: funcName(a: string, opts?: Opts) → Promise<T>
+  Proposed: funcName(a, b) — missing opts wrapping
+  Action: rewrite call site to match doc signature
+```
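The version-availability check in step 3 comes down to comparing the pinned version with the "Added in" version from the docs. A minimal sketch, assuming plain `major.minor.patch` versions with no pre-release tags; `parseVersion`, `isLower`, and `checkAvailability` are hypothetical helper names, not part of the skill.

```typescript
type Verdict = "VALID" | "INVALID";

// Parses "1.5.2" into [1, 5, 2]; missing parts count as 0.
// Pre-release tags and build metadata are not handled in this sketch.
function parseVersion(v: string): [number, number, number] {
  const [major = 0, minor = 0, patch = 0] = v.split(".").map((p) => parseInt(p, 10));
  return [major, minor, patch];
}

// True when version a is strictly lower than version b.
function isLower(a: string, b: string): boolean {
  const [a1, a2, a3] = parseVersion(a);
  const [b1, b2, b3] = parseVersion(b);
  if (a1 !== b1) return a1 < b1;
  if (a2 !== b2) return a2 < b2;
  return a3 < b3;
}

// An API added in `addedIn` is unavailable when the pinned version is older.
function checkAvailability(pinned: string, addedIn: string): Verdict {
  return isLower(pinned, addedIn) ? "INVALID" : "VALID";
}

const available = checkAvailability("1.5.2", "1.4.0");
const tooOld = checkAvailability("1.3.9", "1.4.0");
```

Comparing the parts as numbers rather than strings matters: as strings, "1.10.0" would sort before "1.9.0".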
+---
+## Phase 4 — Citation enforcement
+Every non-trivial API use in the final implementation MUST have a citation comment OR be documented in the PR description. Trivial = stdlib builtin (`Array.map`, `str.split`). Non-trivial = third-party lib, framework-specific, version-sensitive stdlib (e.g., `Intl.Segmenter`).
+Citation format in code (optional; may be skipped when 3+ citations would clutter the file):
+```typescript
+// Per react.dev/reference/react/useTransition (v19)
+const [isPending, startTransition] = useTransition();
+```
+Citation format in PR description (mandatory for Critical tasks):
+```
+## External APIs used
+- `react.useTransition` — react.dev/reference/react/useTransition (v19)
+- `drizzle-orm.select().from()` — orm.drizzle.team/docs/select (v0.33)
+```
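One way to enforce the rule mechanically is to diff the APIs the implementation uses against the APIs that carry a citation. The sketch below assumes both lists have already been collected as plain strings; `findUncited` is a hypothetical helper, and how the lists are gathered is outside its scope.

```typescript
// Returns the non-trivial APIs that have no citation yet.
function findUncited(used: string[], cited: string[]): string[] {
  const citedSet = new Set(cited);
  return used.filter((api) => !citedSet.has(api));
}

const used = ["react.useTransition", "drizzle-orm.select().from()", "stripe.refunds.create"];
const cited = ["react.useTransition", "drizzle-orm.select().from()"];

// Anything left over still needs a code comment or a PR-description entry.
const uncited = findUncited(used, cited);
```

An empty result means every non-trivial API use is covered.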
+---
+## Phase 5 — Training-cutoff awareness
+If a lib in PROPOSED_APIS was released or had a major version AFTER your knowledge cutoff (January 2026), explicitly flag:
+```
+[CUTOFF-WARNING] lib <name> vX.Y (released 2026-MM-DD)
+  Your training data does not reliably cover this version.
+  MANDATORY: fetch live docs, do not rely on pattern-matching from memory.
+```
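The flag can be produced by a plain date comparison. A minimal sketch, assuming release dates and the cutoff are ISO `YYYY-MM-DD` strings (which compare correctly as text) and treating "January 2026" as `2026-01-31`; that date, the `cutoffWarning` helper, and the `examplelib` package are assumptions made for illustration.

```typescript
// Returns the warning text for a post-cutoff release, or null when the
// release falls on or before the cutoff. Dates are ISO YYYY-MM-DD strings.
function cutoffWarning(
  lib: string,
  version: string,
  releaseDate: string,
  cutoffDate: string,
): string | null {
  if (releaseDate <= cutoffDate) return null;
  return [
    `[CUTOFF-WARNING] lib ${lib} v${version} (released ${releaseDate})`,
    "  Your training data does not reliably cover this version.",
    "  MANDATORY: fetch live docs, do not rely on pattern-matching from memory.",
  ].join("\n");
}

// Assumption: "January 2026" is read as the last day of that month.
const CUTOFF = "2026-01-31";

const flagged = cutoffWarning("examplelib", "2.0.0", "2026-02-10", CUTOFF);
const notFlagged = cutoffWarning("examplelib", "1.9.0", "2025-11-03", CUTOFF);
```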
+---
+## Output format
+```
+## DOC VALIDATION
+### Versions resolved
+- react 19.0.2 (from package-lock.json:1234)
+- drizzle-orm 0.33.1 (from package-lock.json:5678)
+### API validation
+[VALID]      react.useTransition — react.dev/.../useTransition (v19)
+[VALID]      drizzle-orm.select — orm.drizzle.team/docs/select (v0.33)
+[INVALID]    drizzle-orm.raw — not in v0.33, renamed to sql.raw in v0.30+
+[AMBIGUOUS]  react.use — signature changed in v19, proposed call uses v18 shape
+### Cutoff warnings
+- drizzle-orm 0.33 (released 2026-02) — post-cutoff, relied on live fetch
+### Verdict
+BLOCKING: 1 INVALID, 1 AMBIGUOUS — cannot proceed until resolved
+```
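The verdict line follows directly from the per-API statuses. A minimal sketch of that roll-up: the `BLOCKING` wording mirrors the template above, while the `PASS` wording for the all-valid case is an assumption, since the source only shows a blocking example.

```typescript
type Status = "VALID" | "INVALID" | "AMBIGUOUS";

// Any INVALID or AMBIGUOUS entry blocks; otherwise the validation passes.
function verdict(statuses: Status[]): string {
  const invalid = statuses.filter((s) => s === "INVALID").length;
  const ambiguous = statuses.filter((s) => s === "AMBIGUOUS").length;
  if (invalid === 0 && ambiguous === 0) return "PASS: all APIs validated";
  return `BLOCKING: ${invalid} INVALID, ${ambiguous} AMBIGUOUS`;
}

// The four statuses from the template above.
const blocking = verdict(["VALID", "VALID", "INVALID", "AMBIGUOUS"]);
const passing = verdict(["VALID", "VALID"]);
```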
+---
+## Guardrails
+- **Never infer an API from "it should exist"** — if you can't cite the doc page, the API doesn't exist for your purposes.
+- **Exact version, never range** — validating against a range produces false positives.
+- **Reject blog/SO as primary source** — they may CONFIRM, never ESTABLISH.
+- **Cutoff-flag everything post-January 2026** — your memory is wrong often enough to require external validation.
+- **If docs don't exist** (tiny lib, no website, just README) → read the source directly at the tag. No README + no source available → replace the lib.
+- **Budget**: about 2 min per lookup, so 5 APIs = 10 min max. Beyond 10 APIs, batch via a single doc-site crawl or ask the user to narrow.
+---
+## How to verify
+- [ ] Exact versions extracted from lock files?
+- [ ] Official docs located for each API call?
+- [ ] Each proposed API validated (function name, signature, params, return type)?
+- [ ] Citations enforced (file:line or URL for every API)?
+- [ ] Training-cutoff awareness applied (if lib updated after cutoff)?
+- [ ] VERDICT issued (VALID / INVALID / AMBIGUOUS)?
+## When triggered
+- RECHERCHE step for Standard/Critical tasks using external libs
+- Before any code using a lib published/updated after your knowledge cutoff
+- When `@ciel-researcher` is dispatched for API design
+- When the user says "use library X" and you have no strong prior
+- After `ai-failure-modes-detector` flags an invented-API risk
+---
+## References
+- ISSTA 2025 — "LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation"
+- arxiv 2404.00971 — "Beyond Functional Correctness: Exploring Hallucinations in LLM-Generated Code"
+- Mintlify — AI hallucination prevention via accurate docs
+- Context7 MCP — `@upstash/context7-mcp` for live official-doc retrieval

package/assets/skills/workflow/evaluer-sizer/SKILL.md ADDED

@@ -0,0 +1,112 @@
+---
+name: evaluer-sizer
+description: How to size and assess risk before coding — back-of-envelope sizing, pre-mortem (2 failure modes), recent-churn check, diverged alternatives (v5), and counterfactual ("what if we do nothing?"). For Ciel v5 pipeline step 9 (EVALUER). Use after DIVERGE and RECHERCHE, before ASK2.
+---
+# Pre-Implementation Sizing — 5 Cheap Gates (Ciel v5)
+## What this covers
+How to sanity-check an approach before committing to it. In v5, this step follows DIVERGE (step 5), which explored 2-3 approaches; here we evaluate the selected one. The 5 gates take about 2 minutes and prevent hours of wasted work.
+## Core principle
+**Quantify before coding.** "Small memory footprint" is not sizing. "~2 KB × 10k entries = 20 MB" is.
+## Gate 1: Back-of-envelope sizing
+Compute rough estimates:
+- Memory: bytes per row × row count
+- Connections: concurrent users × connections per user
+- Throughput: req/s × processing time per req = requests in flight (workers needed)
+- Storage: items × avg size × retention
+Does the solution fit in the budget? If caching requires 10 GB and the server has 2 GB -> wrong solution, don't start.
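The memory estimate and the budget check fit in a few lines. A minimal sketch using decimal units (1 MB = 10^6 bytes); `SizingInput`, `estimateMemory`, and the 2 kB entry size are illustrative assumptions, while the 10 GB vs 2 GB case mirrors the example in the text.

```typescript
interface SizingInput {
  bytesPerEntry: number;
  entryCount: number;
  budgetBytes: number;
}

// Total footprint is bytes per entry times entry count; it fits when it
// does not exceed the budget.
function estimateMemory(input: SizingInput): { totalBytes: number; fits: boolean } {
  const totalBytes = input.bytesPerEntry * input.entryCount;
  return { totalBytes, fits: totalBytes <= input.budgetBytes };
}

const MB = 1e6;
const GB = 1e9;

// 2 kB x 10k entries = 20 MB, well inside a 2 GB budget.
const smallCache = estimateMemory({ bytesPerEntry: 2000, entryCount: 10000, budgetBytes: 2 * GB });

// 1 MB x 10k entries = 10 GB against a 2 GB server: wrong solution.
const bigCache = estimateMemory({ bytesPerEntry: 1 * MB, entryCount: 10000, budgetBytes: 2 * GB });
```

The same multiply-then-compare shape applies to connections, throughput, and storage; only the units change.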
+## Gate 2: Pre-mortem
+State explicitly: "In production, this could fail in these 2 ways:"
+1. <failure mode 1>
+2. <failure mode 2>
+Can't imagine 2 failure modes -> don't understand the system well enough.
+## Gate 3: Recent churn
+```bash
+git log --oneline --since="7 days" -- <impacted files>
+```
+If 2+ commits in the last week touched the same module:
+- Read those commits BEFORE proposing your fix
+- Someone already fixed this area twice this week -> incomplete mental model
+- Your "fix" might be the 3rd attempt at the same bug
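The 2+ commits rule can be applied to the text that `git log --oneline` prints, one line per commit. A minimal sketch, assuming the command output has already been captured as a string; `countCommits`, `churnAdvice`, and the sample log are hypothetical.

```typescript
// Counts commits in `git log --oneline` output: one non-empty line per commit.
function countCommits(gitLogOutput: string): number {
  return gitLogOutput.split("\n").filter((line) => line.trim() !== "").length;
}

// Applies the gate's threshold: 2 or more recent commits is a churn signal.
function churnAdvice(gitLogOutput: string): string {
  const n = countCommits(gitLogOutput);
  return n >= 2
    ? `${n} commits in the last 7 days: read them before proposing a fix`
    : `${n} commits in the last 7 days: no churn signal`;
}

// Hypothetical output, inlined for the example.
const sampleLog = "a1b2c3d fix: retry on timeout\n9f8e7d6 fix: retry on timeout (again)\n";

const advice = churnAdvice(sampleLog);
```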
+## Gate 4: Diverged approach comparison (v5)
+Compare the approaches explored during DIVERGE (step 5):
+- Approach A (from DIVERGE): <summary>
+- Approach B (from DIVERGE): <summary>
+- Selected: <A or B> because <reason>
+- Why NOT the other: <specific limitation, not "it's worse">
+If only 1 approach was explored -> DIVERGE was incomplete.
+## Gate 5: Counterfactual
+**Counterfactual**: "What if we do NOTHING?" If doing nothing solves 80% of the problem with 0 risk -> reconsider scope.
+## Output format
+```
+## EVALUER
+### Sizing
+- Memory: <estimate>
+- Connections: <estimate>
+- Throughput: <estimate>
+- Storage: <estimate>
+- Fit: <yes -- within budget | no -- what breaks>
+### Pre-mortem (2 ways this could fail)
+1. <failure mode>
+2. <failure mode>
+### Recent churn
+- Commits in last 7 days: <N>
+- Relevant: <list>
+- Read them? <yes -- findings>
+### Diverged approach comparison (v5)
+- A: <summary>
+- B: <summary>
+- Selected: <A/B> because <reason>
+### Counterfactual
+- What if nothing? <consequence>
+- 80% solve with 0 risk? <yes -> reconsider | no -> proceed>
+```
+## How to verify
+- [ ] Sizing has concrete numbers (request rate, data volume, latency budget)?
+- [ ] Pre-mortem identifies >= 2 specific failure modes?
+- [ ] Recent churn checked (git log for affected files)?
+- [ ] >= 2 approaches compared from DIVERGE?
+- [ ] Counterfactual stated ("what if we don't do this")?
+- [ ] Selected approach justified with specific reason?
+## Common rationalizations
+| Rationalization | Reality |
+|---|---|
+| "I'll size it as I go" | Sizing after coding is guessing. Sizing before prevents committing to the wrong approach. |
+| "I can't estimate without coding" | Back-of-envelope takes 2 minutes. 2 minutes of thinking saves 2 hours of coding the wrong thing. |
+| "Pre-mortem is pessimistic" | Pre-mortem is the cheapest bug fix you'll ever write. Imagining failure costs nothing. Production failure costs everything. |
+| "Diverging is a waste of time, the first approach is fine" | The first approach is rarely the best. It's just the first. Generating 2-3 approaches takes 5 minutes. Committing to the wrong one takes days. |
+## Common mistakes
+- **Hand-waving sizing**: "small footprint" without numbers
+- **Pre-mortem = tests**: "might have bugs" is useless. "Query times out at > 10k notifications" is useful.
+- **Fake alternatives**: "React over Assembly" is not real. "Page vs cursor pagination because API is public" is real.
+- **Single-approach bias (v5)**: if DIVERGE was skipped, EVALUER cannot compare alternatives. Go back to DIVERGE.

package/assets/skills/workflow/faire-gatekeeper/SKILL.md ADDED

@@ -0,0 +1,99 @@
+---
+name: faire-gatekeeper
+description: How to implement code safely, with 6 quality gates for Ciel v5 (test-first, alternatives, idiomatic, quality, removal, boy-scout). SPIKE mode relaxes gates. A checklist for implementation discipline during FAIRE (step 11).
+---
+# Implementation Safety — 6 Quality Gates (Ciel v5)
+## What this covers
+How to implement code with discipline during the Ciel v5 FAIRE phase (step 11). These gates run during coding and are enforced by the plugin or hooks. SPIKE mode relaxes some gates.
+## Core principle
+**Check gates per-file, not per-task.** Each write/edit gets its own gate check. In SPIKE mode, gates 1 and 4 are relaxed and gate 6 is non-blocking, but the code must be marked FIXME/TODO.
+## The 6 gates (v5)
+### Gate 1: Test-first (RED)
+Do not write source code before the test exists. If no `*.test.*` file exists:
+- OpenCode: plugin blocks via tool.execute.before
+- Claude Code: hook blocks via exit 2
+**SPIKE mode**: relaxed. Exploration code may be written without tests, but must be marked FIXME/TODO.
+### Gate 2: Alternatives
+"I chose X over Y because [reason]." No Y named -> research alternatives first.
+### Gate 3: Idiomatic
+Common bypass signals that need justification:
+- `window.*` / `document.*` in React -> why not hook/ref/router?
+- `for` + raw SQL -> why not batch/ORM?
+- `catch(e) { return null }` -> why not Result/sealed class?
+- `as X` without type guard -> why not `is X`?
+- Copying a block for the 3rd+ time -> why not extract a helper?
+### Gate 4: Quality
+- Complexity: < 15 (cyclomatic)
+- Nesting: < 4 levels
+- Function length: < 50 lines
+- File length: < 400 lines
+**SPIKE mode**: relaxed. Exploration code may be longer or complex, but must be marked FIXME/TODO.
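The four thresholds can be checked in one pass once the metrics are known. A minimal sketch: the limits come from the list above, while `FileMetrics`, `LIMITS`, and `qualityViolations` are hypothetical names, and how the metrics are measured (linter, plugin, or by hand) is outside the sketch.

```typescript
interface FileMetrics {
  complexity: number;     // cyclomatic complexity of the worst function
  nesting: number;        // deepest nesting level
  functionLength: number; // longest function, in lines
  fileLength: number;     // file length, in lines
}

// Each metric must stay strictly below its limit.
const LIMITS: FileMetrics = { complexity: 15, nesting: 4, functionLength: 50, fileLength: 400 };

// Returns the names of the metrics that reach or exceed their limit.
function qualityViolations(m: FileMetrics): string[] {
  return (Object.keys(LIMITS) as (keyof FileMetrics)[]).filter((k) => m[k] >= LIMITS[k]);
}

const clean = qualityViolations({ complexity: 8, nesting: 2, functionLength: 30, fileLength: 120 });
const tooComplex = qualityViolations({ complexity: 15, nesting: 2, functionLength: 30, fileLength: 120 });
```

An empty array means Gate 4 passes; otherwise the returned names say which limit to address first.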
+### Gate 5: Removal safety
+If you are DELETING code:
+- Who uses it? (grep for imports/references)
+- What replaces it?
+- What degrades if not replaced?
+- Is there a migration path?
+### Gate 6: Boy-scout (v5)
+After the change: is the code better than before?
+- Minor improvements count: better naming, removed dead code, added missing test
+- If the file was already touched, leaving it better is cheap
+- If nothing to improve, note "status quo" explicitly
+## Output format
+```
+## FAIRE gates
+Gate 1 (test-first): <PASS | BLOCKED | SPIKE>
+Gate 2 (alternatives): <X > Y because ...>
+Gate 3 (idiomatic): <PASS | bypass justified: ...>
+Gate 4 (quality): <complexity N, nesting N, length N | PASS>
+Gate 5 (removal): <no removal | safe: ...>
+Gate 6 (boy-scout): <improved: ... | status quo>
+```
+## How to verify
+- [ ] Test exists (or spike mode active)?
+- [ ] Alternative considered and documented?
+- [ ] No unjustified framework bypass?
+- [ ] Complexity < 15, nesting < 4, function < 50 lines, file < 400 lines?
+- [ ] Removals checked for dependents?
+- [ ] Code better than before?
+## SPIKE mode behavior
+When `.ciel/exploration.active` exists:
+- Gate 1 (test-first): relaxed
+- Gate 4 (quality): relaxed
+- Gates 2, 3, 5: always active
+- Gate 6 (boy-scout): recommended but not blocking
+- Exploration code MUST be marked FIXME or TODO
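The list above amounts to a small lookup from gate number and mode to an enforcement level. A minimal sketch, assuming the caller has already determined whether `.ciel/exploration.active` exists and passes that in as a boolean; `Enforcement` and `gateEnforcement` are hypothetical names, not the plugin's actual API.

```typescript
type Gate = 1 | 2 | 3 | 4 | 5 | 6;
type Enforcement = "blocking" | "relaxed" | "recommended";

// Outside SPIKE mode every gate blocks. In SPIKE mode gates 1 and 4 are
// relaxed, gate 6 is recommended only, and gates 2, 3, 5 stay blocking.
function gateEnforcement(gate: Gate, spikeActive: boolean): Enforcement {
  if (!spikeActive) return "blocking";
  if (gate === 1 || gate === 4) return "relaxed";
  if (gate === 6) return "recommended";
  return "blocking";
}

const testFirstInSpike = gateEnforcement(1, true);
const removalInSpike = gateEnforcement(5, true);
```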
+## Key rules
+- **Gates are non-negotiable in Standard/Critical mode**: plugin/hooks enforce them
+- **SPIKE mode is for exploration only**: gates are relaxed, but code must be refactored properly after
+- **Gate 5 matters most**: deleting code without checking dependents is the fastest way to break production
+- **Boy-scout is the cheapest improvement**: if you already read the file, clean it up

package/assets/skills/workflow/flux-narrator/SKILL.md ADDED

@@ -0,0 +1,93 @@
+---
+name: flux-narrator
+description: How to trace data flow through a system — trigger → handler → service → state → output, with boundaries, assumptions, and break points called out. Essential before implementation or test writing.
+allowed-tools: Read, Grep
+---
+# Data Flow Tracing — Narrate Before You Code
+## What this covers
+How to trace and narrate data flow through a system. If you can't narrate the flow, you don't understand the system — read more code before implementing.
+## Core narration
+Format: `"When [trigger] → [handler fires] → [function calls] → [data flows] → [output]"`
+Example:
+```
+When user clicks "Save" on ProfileForm →
+  → ProfileForm.tsx:handleSubmit (component boundary)
+  → useUpdateProfile hook fires (state boundary)
+  → fetch('/api/users/:id/profile', {method: 'PATCH'}) (network boundary)
+  → Ktor Route at routes/UserRoute.kt:PATCH /:id/profile
+  → UserService.updateProfile (service layer)
+  → UserRepository.save (DB layer)
+  → return HTTP 200 with updated user
+  → React Query cache updates on success, UI re-renders
+  → Toast notification: "Profile saved"
+```
+## 3 cross-cutting dimensions
+### BOUNDARIES
+Where does control pass between layers? Each boundary is a place where contracts can break.
+### ASSUMPTIONS
+What must be true for this flow to work? E.g. "assumes user is authenticated", "assumes DB connection is not exhausted".
+### BREAK POINTS
+Where can the flow fail WITHOUT visible error? E.g. silent swallowed exceptions, network retries that mask failures, caching that hides stale data.
+**Break points ≠ assumptions**: an assumption is "must be true"; a break point is "how it fails silently even when all assumptions hold".
+## Test-specific items (when writing tests)
+When the task involves writing tests, also determine:
+- **Test level**: unit / integration / E2E — justify the choice
+- **URL routing**: request `host:port` vs handler `host:port` — match or mismatch? (CI often differs from local)
+- **Mock lifecycle**: fires at module load? function call? render cycle?
+- **Timing**: expected delay in ms / CI runner capabilities (fake timers? timeout?)
+## Output format
+```
+## FLUX
+When <trigger>
+  → <layer 1: component/handler — file:function>
+  → <layer 2: service/function — file:function>
+  → <layer 3: DB/API/store>
+  → <output: state change / HTTP response / side effect>
+### Boundaries
+- <list: where control crosses layers>
+### Assumptions
+- <list: what must be true>
+### Break points (silent failures)
+- <list: how the flow fails without visible error>
+[If writing tests:]
+### Test-specific
+- Test level: <unit | integration | E2E> — <justification>
+- URL routing: MATCH ✓ | MISMATCH ⚠️
+- Mock lifecycle: <module load | function call | render>
+- Timing: <X ms>, CI: <capable | insufficient ⚠️>
+```
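A flow can also be held as data, so the narration is rendered rather than typed and the layer count is checked rather than eyeballed. A minimal sketch: `Step`, `Flow`, `hasMinimumLayers`, and `narrate` are hypothetical names, and the sample reuses part of the ProfileForm example from the narration section.

```typescript
interface Step {
  layer: string;    // e.g. "service layer"
  location: string; // file:function or similar
}

interface Flow {
  trigger: string;
  steps: Step[];
  output: string;
}

// Trigger and output count as layers, so at least one middle step is needed
// to reach the minimum of 3.
function hasMinimumLayers(flow: Flow): boolean {
  return flow.steps.length + 2 >= 3;
}

// Renders the flow in the "When <trigger> → ... → <output>" shape.
function narrate(flow: Flow): string {
  const lines = flow.steps.map((s) => `  → ${s.location} (${s.layer})`);
  return [`When ${flow.trigger}`, ...lines, `  → ${flow.output}`].join("\n");
}

const saveProfile: Flow = {
  trigger: 'user clicks "Save" on ProfileForm',
  steps: [
    { layer: "component boundary", location: "ProfileForm.tsx:handleSubmit" },
    { layer: "service layer", location: "UserService.updateProfile" },
    { layer: "DB layer", location: "UserRepository.save" },
  ],
  output: "HTTP 200 with updated user",
};

const narration = narrate(saveProfile);
```

The structure does not replace grepping the call graph; each `location` should still come from the code, not from memory.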
+## How to verify
+- [ ] ≥ 3 layers in the flow (trigger → middle → output)?
+- [ ] BOUNDARIES identified?
+- [ ] ASSUMPTIONS listed (what must be true)?
+- [ ] BREAK POINTS identified (silent failures)?
+- [ ] Narration based on grep (not memory)?
+## Key rules
+- **Minimum 3 layers**: trigger → middle → output. Only 2 = don't understand the flow.
+- **Don't narrate from memory**: grep the actual call graph. Pattern-matching produces plausible but wrong narrations.
+- **Test items mandatory when writing tests**: skipping any one risks CI/local mismatch or flaky tests.