npm - @neikyun/ciel - Versions diffs - 6.10.1 → 6.11.1 - Mend

@neikyun/ciel 6.10.1 → 6.11.1

Files changed (40)

package/assets/skills/workflow/stride-analyzer/reference.md ADDED

@@ -0,0 +1,144 @@
+# stride-analyzer — Reference
+## STRIDE — detailed category probes
+### S — Spoofing (identity)
+Can I impersonate another user/service/system?
+Probes:
+- Grep for `userId` / `user_id` coming from request params vs resolved server-side (JWT, session)
+- Grep for identity claims trusted without verification (e.g. `X-User-Id` header accepted as-is)
+- Check auth middleware ordering: is authentication before authorization?
+- WebSocket/SSE: is the same auth applied? (common gap: REST auth is bulletproof, WS accepts any token)
+Evidence format:
+```
+- Spoofing: userId extracted from JWT claim at JwtMiddleware.kt:45 — not client-supplied ✓
+```
+### T — Tampering (data integrity)
+Can input be modified in transit or at rest without detection?
+Probes:
+- HTTPS everywhere? Grep for `http://` (non-localhost)
+- CSRF tokens on state-changing endpoints?
+- Signed cookies / signed JWTs? What algorithm? (HS256 vs RS256 considerations)
+- Database writes: is the audit trail immutable? (INSERT-only tables for events)
+### R — Repudiation (non-repudiation)
+Can a user deny having performed an action?
+Probes:
+- Audit log coverage: what events are logged? With what identity?
+- Log tampering resistance: append-only? Logged externally?
+- Timestamp source: server-controlled? Synced?
+### I — Information Disclosure
+What information leaks to unauthorized parties?
+Probes:
+- Error messages: do they include stack traces / SQL / paths / credentials?
+- Logs: do they contain PII, secrets, tokens?
+- API responses: over-fetching? `SELECT *` instead of projected columns?
+- 404 vs 403 distinction: does the status code or response timing reveal whether a resource exists?
+- Autocomplete endpoints: leak usernames / emails?
+### D — Denial of Service
+Can this be flooded or exhausted?
+Probes:
+- Rate limiting: per-IP? per-user? per-endpoint?
+- Resource bounds: max payload size? max query depth (GraphQL)? max file upload?
+- Algorithmic complexity: O(n²) loops on user-controlled n?
+- Connection pooling: max connections? timeout?
+- Regex catastrophic backtracking on user input?
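The rate-limiting probe can be made concrete with a token bucket. The sketch below is a minimal in-memory version; the class name `TokenBucketLimiter`, the per-key map, and the injected `nowMs` clock are assumptions for illustration, not something the skill prescribes.

```typescript
// Token bucket: each key (user id, IP, or endpoint) holds up to `capacity`
// tokens, refilled at `refillPerSec`. One request spends one token.
class TokenBucketLimiter {
  private readonly buckets = new Map<string, { tokens: number; last: number }>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // `nowMs` is passed in (hypothetical design choice) so tests stay deterministic.
  allow(key: string, nowMs: number): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, last: nowMs };
    const elapsedSec = (nowMs - bucket.last) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.last = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

// Capacity 2, refill 1 token per second: two requests pass, the third is
// rejected, and one more passes after a second has elapsed.
const limiter = new TokenBucketLimiter(2, 1);
const results = [
  limiter.allow("u1", 0),
  limiter.allow("u1", 0),
  limiter.allow("u1", 0),
  limiter.allow("u1", 1000),
];
```

Per-IP versus per-user limiting is only a choice of `key`. The map here grows without bound, which is the kind of unbounded collection the OPS lens asks about; a production limiter would need to evict idle keys.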
+### E — Elevation of Privilege
+Can I access what I shouldn't?
+Probes:
+- RBAC/ABAC correctness: does the permission check run before the action?
+- Horizontal privilege escalation: can user A read user B's data with API manipulation?
+- Vertical privilege escalation: can user become admin via some path?
+- Mass assignment: can user set `isAdmin` via PATCH body?
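The mass-assignment probe is easiest to reason about with an explicit allowlist. The following is a minimal sketch under assumed names (`User`, `PATCHABLE_FIELDS`, `applyPatch` are all hypothetical): only allowlisted fields are copied from the untrusted PATCH body, so `isAdmin` cannot be set by the client.

```typescript
// Hypothetical user record; field names are illustrative only.
interface User {
  id: string;
  displayName: string;
  email: string;
  isAdmin: boolean;
}

// The only fields a user may change through a PATCH body.
const PATCHABLE_FIELDS = ["displayName", "email"] as const;

// Copy allowlisted string fields from the untrusted body; ignore everything else.
function applyPatch(user: User, body: Record<string, unknown>): User {
  const updated: User = { ...user };
  for (const field of PATCHABLE_FIELDS) {
    const value = body[field];
    if (typeof value === "string") {
      updated[field] = value;
    }
  }
  return updated;
}

const original: User = { id: "u1", displayName: "Ada", email: "ada@example.com", isAdmin: false };
// The body tries to smuggle in isAdmin; it is silently dropped.
const patched = applyPatch(original, { displayName: "Ada L.", isAdmin: true });
```

An allowlist fails closed: a field added to `User` later stays unpatchable until someone adds it deliberately, whereas a denylist would expose it by default.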
+## OPS lens (overlaid on STRIDE)
+- **Unclosed connections**: grep for `conn.close()` / `client.close()` / `try-with-resources` / `use {}` — every open should have a close
+- **Memory leaks**: long-lived caches without eviction? Unbounded collections? Listeners not removed?
+- **Locks**: deadlock-prone order? Held across I/O?
+- **100x volume**: if traffic grew 100x tomorrow, what breaks first?
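For the unclosed-connections item, one way to guarantee that every open has a close is to route all usage through a helper that closes in `finally`, similar in spirit to Kotlin's `use {}`. `Closeable`, `withResource`, and `FakeConnection` below are hypothetical names used only for this sketch.

```typescript
// Hypothetical minimal interface for anything that must be released.
interface Closeable {
  close(): void;
}

// Open a resource, hand it to `use`, and close it even if `use` throws.
function withResource<R extends Closeable, T>(open: () => R, use: (resource: R) => T): T {
  const resource = open();
  try {
    return use(resource);
  } finally {
    resource.close();
  }
}

// Fake connection that records whether close() was called.
class FakeConnection implements Closeable {
  closed = false;
  query(sqlText: string): string {
    return `result of ${sqlText}`;
  }
  close(): void {
    this.closed = true;
  }
}

const conn = new FakeConnection();
const rows = withResource(() => conn, (c) => c.query("SELECT 1"));
```

This version is synchronous; a resource with an asynchronous `close()` would need an `async` variant that awaits both the callback and the close.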
+## Killer checklist — detail
+### Same field = same validation everywhere
+If `email` is validated one way in `RegisterRoute.kt` and another way in `ProfileUpdateRoute.kt`, an attacker uses the weaker one. Validation must be centralized.
+```bash
+# Find all places email is validated
+grep -rn "email" --include='*.kt' src/ | grep -iE 'valid|sanitize|check'
+```
+Evidence: all call sites converge on a single validator.
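What "converging on a single validator" can look like is sketched below. The function and handler names are hypothetical, and the validation rules are deliberately simplistic; the point is only that both entry points call the same function, so neither can be the weaker one.

```typescript
// Single source of truth for email validation. The rules are illustrative
// only and are not a full RFC 5322 check.
function parseEmail(raw: string): string | null {
  const email = raw.trim().toLowerCase();
  const parts = email.split("@");
  if (parts.length !== 2 || email.length > 254) return null;
  const [local, domain] = parts;
  if (!local || !domain || !domain.includes(".") || /\s/.test(email)) return null;
  return email;
}

// Hypothetical registration handler.
function register(input: { email: string }): string {
  const email = parseEmail(input.email);
  if (email === null) throw new Error("invalid email");
  return `registered ${email}`;
}

// Hypothetical profile-update handler: same validator, same rules.
function updateProfile(input: { email: string }): string {
  const email = parseEmail(input.email);
  if (email === null) throw new Error("invalid email");
  return `updated ${email}`;
}
```

With this shape, the grep above should surface exactly one definition of the validation logic plus its call sites.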
+### Same domain = same auth on ALL transports
+REST endpoint has auth; WebSocket channel for the same resource doesn't (or uses different auth). Attacker bypasses via WebSocket.
+```bash
+grep -rn "authenticate" src/ --include='*.kt'
+grep -rni "socket\|websocket\|sse\|webFluxClient" src/
+```
+### Identity resolved server-side
+```bash
+# Any userId coming from request body/path?
+grep -rn 'call.parameters\["userId"\]' src/
+grep -rn 'request.body.userId' src/
+# Should all be via JWT/session claim
+```
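A minimal sketch of the pattern these greps are checking for, under an assumed request shape: `claims` is populated only by auth middleware after signature verification, while `params` is fully client-controlled. `AuthedRequest`, `resolveUserId`, and `canAccessProfile` are hypothetical names.

```typescript
// Hypothetical request shape. `claims` is set only after the token signature
// has been verified server-side; `params` comes straight from the client.
interface AuthedRequest {
  claims?: { sub: string };
  params: Record<string, string>;
}

// Resolve the acting user from verified claims, never from client-supplied fields.
function resolveUserId(req: AuthedRequest): string {
  if (!req.claims) {
    throw new Error("unauthenticated");
  }
  return req.claims.sub;
}

// A client-supplied userId is treated as a resource selector and compared with
// the resolved identity; it is never trusted as the identity itself.
function canAccessProfile(req: AuthedRequest): boolean {
  const actor = resolveUserId(req);
  return req.params["userId"] === actor;
}

const ownProfile = canAccessProfile({ claims: { sub: "alice" }, params: { userId: "alice" } });
const otherProfile = canAccessProfile({ claims: { sub: "alice" }, params: { userId: "bob" } });
```

A `userId` in the path is not automatically a finding; what matters is whether it is used as the identity or only as a selector that is then authorized against the verified claim.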
+### SQL parameterized
+```bash
+# Find string interpolation in SQL
+grep -rn "\\\$" src/ --include='*.kt' | grep -iE 'sql|query'
+grep -rn '"SELECT.*" + ' src/
+```
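For contrast with the interpolation patterns these greps look for, here is a small sketch of a parameterized query built with a tagged template. The `sql` tag and the `SqlQuery` shape are assumptions made for illustration, and the `$1, $2, ...` placeholder style is the PostgreSQL convention; other drivers use different placeholders.

```typescript
// Hypothetical query shape: SQL text with placeholders, values kept separate.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Tagged template: interpolated values become $1, $2, ... placeholders and are
// returned separately, so they are never spliced into the SQL text.
function sql(strings: TemplateStringsArray, ...values: unknown[]): SqlQuery {
  let text = strings[0] ?? "";
  for (let i = 0; i < values.length; i++) {
    text += `$${i + 1}${strings[i + 1] ?? ""}`;
  }
  return { text, values };
}

// A classic injection payload stays a plain value; the SQL text is unchanged by it.
const email = "x' OR '1'='1";
const query = sql`SELECT id FROM users WHERE email = ${email} AND active = ${true}`;
```

The resulting `text` contains only placeholders, and the payload travels in `values`, where the database driver binds it as data rather than parsing it as SQL.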
+### PII anonymization
+```bash
+# Find logging of user fields
+grep -rn "logger.info.*user" src/
+grep -rn "println.*email\|println.*phone" src/
+```
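One possible remediation when these greps do find user fields in logs is to pass objects through a redaction step before logging. The sketch below is an assumption-laden example: the `PII_FIELDS` list and the `redact` helper are hypothetical and would need to match the real schema.

```typescript
// Field names treated as PII here are an assumption; adjust to the real schema.
const PII_FIELDS = new Set(["email", "phone", "password", "token"]);

// Return a copy that is safe to log: PII values are masked, and nested objects
// and arrays are walked recursively. The input is not mutated.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = PII_FIELDS.has(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

const safeToLog = redact({
  id: 7,
  email: "ada@example.com",
  profile: { phone: "555-0100", city: "Paris" },
});
```

Matching on key names catches structured fields only; PII embedded in free-text messages would still need the grep-based review above.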
+## Multi-PR delegation
+When the same reviewer has done 2+ STRIDE passes on related PRs in one session, blind spots compound. Delegate the 2nd pass to a subagent:
+```
+Task(subagent_type="Explore", prompt="""
+Run STRIDE PASS 2 on this diff. Fresh eyes, no session history.
+CHANGED_FILES: [...]
+FOCUS: category you feel is weakest
+""")
+```
+## Stale item rotation
+Tracked via `learnings-capture`: if a killer checklist item passes (✓) in 10+ audits without catching anything, flag it for review. Either:
+- The codebase is genuinely clean on that dimension → consider removing the item
+- The item is too vague to fail → tighten the check
+If the item is removed, replace it with a newer, more specific check.
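The rotation rule can be expressed as a small filter. How `learnings-capture` actually stores audit history is not shown in this diff, so the `ChecklistItemStats` shape below is purely hypothetical; only the threshold of 10 comes from the text.

```typescript
// Hypothetical per-item audit statistics.
interface ChecklistItemStats {
  name: string;
  passes: number;  // audits in which the item passed
  catches: number; // audits in which the item found a real issue
}

// From the rule above: 10+ passing audits without catching anything.
const STALE_THRESHOLD = 10;

// Names of items that should be flagged for review.
function staleItems(items: ChecklistItemStats[]): string[] {
  return items
    .filter((item) => item.catches === 0 && item.passes >= STALE_THRESHOLD)
    .map((item) => item.name);
}

const flagged = staleItems([
  { name: "SQL parameterized", passes: 12, catches: 0 },
  { name: "PII anonymization", passes: 15, catches: 2 },
  { name: "Identity resolved server-side", passes: 4, catches: 0 },
]);
```

Flagging only produces a review candidate; deciding between removing the item and tightening it remains a human judgment.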

package/assets/skills/workflow/test-strategy-vitest-playwright/SKILL.md ADDED

@@ -0,0 +1,119 @@
+---
+name: test-strategy-vitest-playwright
+description: How to plan a test strategy — test pyramid (70/20/10), what to test at each level (unit/integration/E2E), what to mock vs hit real, property-based testing for boundaries, and keeping the suite fast. 2026 convention: browser-native runners, accessibility-tree assertions over screenshots.
+allowed-tools: Read, Grep, Glob, Bash
+---
+# Test Strategy — Pyramid, Not Ice-Cream Cone
+## What this covers
+How to decide which tests go where, what to mock, and how to keep a test suite fast. The anti-pattern is 70% E2E Playwright, 5% unit — slow CI, flaky, expensive. The 2026 pyramid: most tests at the unit level, very few real-browser E2E.
+## Core principle
+**Most tests should be unit tests.** E2E is for critical user paths across 3+ components, not coverage inflation. If you're writing E2E because "it's hard to isolate", the code needs a refactor, not more tests.
+## The 2026 pyramid (target ratios)
+```
+        ┌────────────────┐
+        │  E2E (10%)     │  Playwright — critical user paths only
+        ├────────────────┤
+        │  Integ (20%)   │  Vitest + MSW (no real network) OR test DB
+        ├────────────────┤
+        │                │
+        │  Unit (70%)    │  Vitest — pure logic, reducers, utils
+        │                │
+        └────────────────┘
+```
+Property-based testing (`fast-check`) cuts across all levels for boundary conditions.
+## Unit testing (Vitest)
+**When**: pure function, reducer, class method with deterministic input→output.
+- Test ONE behavior per test (not "mega tests")
+- No real filesystem, no real network, no real DB
+- Run in < 50ms each
+- Should fail if implementation logic breaks (not if formatting breaks)
+```typescript
+it('paginates offset correctly when page is 0', () => {
+  expect(paginate({ page: 0, size: 10 }).offset).toBe(0);
+});
+```
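The test above exercises a `paginate` function that the diff does not show. A minimal sketch of a pure function consistent with that assertion might look like the following; zero-based pages, the `{ offset, limit }` return shape, and the input validation are all assumptions.

```typescript
// Hypothetical input and output shapes for the function under test.
interface PageRequest {
  page: number;
  size: number;
}

interface PageWindow {
  offset: number;
  limit: number;
}

// Pure, deterministic, no I/O: pages are zero-based, so page 0 starts at offset 0.
function paginate({ page, size }: PageRequest): PageWindow {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  return { offset: page * size, limit: size };
}
```

Because the function has no dependencies, each behavior (first page, later page, invalid input) can get its own small test with no mocks.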
+## Integration testing (Vitest + MSW)
+**When**: module touches an external system (HTTP API, DB, cache) but you want fast deterministic runs.
+- MSW mocks the HTTP layer at the network level (not at the `fetch` level)
+- Seed the test with a realistic fixture response
+- DB: use `vitest-environment` + SQLite in-memory OR Testcontainers for the real engine
+- Run in < 500ms each
+**Anti-pattern**: mocking your own modules. If you mock your own user-service, you're just testing that you wrote mocks correctly.
+## E2E testing (Playwright)
+**When**: critical user path across ≥ 3 components (login → browse → checkout → confirm).
+- **Accessibility-tree assertions** (`page.getByRole('button', { name: 'Submit' })`) — deterministic, doesn't break on CSS changes
+- **Avoid screenshot assertions** for behavior — use for visual regression only, and only on static content
+- Seed DB via a test setup script, NOT through the UI (too slow)
+- One test = one user journey, not twelve
+## Property-based testing (fast-check)
+**When**: boundary conditions are the risk — off-by-one, null, empty, max int, unicode.
+- State the PROPERTY ("sorting is idempotent: sort(sort(x)) === sort(x)")
+- Let fast-check generate 100+ inputs
+- Use `fc.pre()` to filter invalid inputs (not to avoid branches of logic)
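To show the idea without depending on a library, the sketch below hand-rolls the loop that a property-based tool automates: generate many inputs from a seeded PRNG and check the idempotence property on each. This is not the fast-check API; fast-check additionally shrinks failing inputs to minimal counterexamples. All function names here are hypothetical.

```typescript
// Small seeded PRNG (mulberry32 variant) so every run generates the same inputs.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Integer arrays of length 0..20 with values in -100..100, so empty arrays,
// negatives, and duplicates are all likely to appear.
function randomIntArray(rand: () => number): number[] {
  const length = Math.floor(rand() * 21);
  return Array.from({ length }, () => Math.floor(rand() * 201) - 100);
}

// Function under test: returns a sorted copy.
const sortAsc = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);

// Property: sorting is idempotent. Returns the first counterexample, or null.
function findCounterexample(runs: number, seed: number): number[] | null {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = randomIntArray(rand);
    const once = sortAsc(input);
    const twice = sortAsc(once);
    if (JSON.stringify(once) !== JSON.stringify(twice)) return input;
  }
  return null;
}

const counterexample = findCounterexample(200, 42);
```

Fixing the seed is what keeps such a test deterministic in CI, which ties in with the "Randomness: seeded PRNG" guidance elsewhere in this skill.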
+## What to mock, what to hit real
+| System | Mock? | Rationale |
+|---|---|---|
+| External HTTP APIs | Yes (MSW) | Flaky, slow, rate-limited |
+| Internal microservices | Yes (MSW) for unit/integ; real for E2E | Keep blast radius small |
+| Database | Real (in-memory or container) | Too many bugs hide in ORM/raw-SQL mismatch |
+| Time (`Date.now`) | Yes (vi.useFakeTimers) | Non-determinism otherwise |
+| Randomness | Yes (seeded PRNG) | Same reason |
+| Filesystem | Real (temp dir) for integ; mock for unit | `memfs` is fine for pure tests |
+| Auth tokens | Real signed test token | Mocked tokens hide signature-validation bugs |
+| Third-party SDK | Mock at module boundary | Not at network level |
+## Key points
+- Pyramid ratios are targets, not strict quotas — a pure-UI feature may skew E2E higher; a pure-algorithm feature may be 95% unit
+- No E2E without unit first
+- One test per behavior — tests named `it('does many things', ...)` are a code smell
+- Avoid snapshot tests for dynamic output — they become "update snapshots" rituals that don't catch bugs
+- Accessibility-tree > CSS selectors in Playwright — `getByRole` survives refactors
+- Flaky test policy: first flake → debug. Second flake → quarantine (`test.skip` + ISSUE). Third flake → delete unless Critical
+## Common anti-patterns
+1. **Ice-cream cone**: 70% E2E, 5% unit — slow, flaky, expensive to maintain
+2. **Coverage theater**: high coverage number but tests don't catch real bugs
+3. **Mocking yourself**: mocking your own modules proves nothing except that mocks work
+4. **Mega-tests**: one test covering 5 scenarios — split them
+5. **Screenshot assertions for behavior**: brittle, break on font changes; use accessibility-tree assertions instead
+6. **E2E as unit replacement**: E2E tests are 100x slower; use them only for integration across real browsers
+## How to verify your test strategy is good
+- **Runtime budget**: total suite < 3 min for pre-commit + CI
+- **Mutation testing**: change production code → does a test fail?
+- **New person test**: can someone understand the feature from tests alone?
+- **Bug regression**: when a bug is found, add a test that would have caught it at the lowest level possible
+## References
+- defined.net/blog/modern-frontend-testing — Vitest + Storybook + Playwright stack
+- playwright.dev/docs/best-practices — accessibility-tree assertions
+- fast-check.dev — property-based testing in TS/JS
+- hypothesis.works — property-based testing in Python (equivalent concepts)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@neikyun/ciel",
-  "version": "6.10.1",
+  "version": "6.11.1",
   "description": "Ciel — Deep-reasoning pipeline for LLM-assisted development. OpenCode plugin + multi-platform CLI (OpenCode, Claude Code, more).",
   "main": "./dist/plugin/index.js",
   "types": "./dist/plugin/index.d.ts",