PyPI - claude-code-kit - Versions diffs - 0.13.0__tar.gz → 0.15.0__tar.gz - Mend

claude-code-kit 0.13.0tar.gz → 0.15.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (240) hide show

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/.claude-plugin/marketplace.json RENAMED Viewed

@@ -10,7 +10,7 @@
       "name": "claude-kit",
       "source": "./",
       "description": "Cookiecutter-style scaffolder for an autonomous Claude Code SDLC config (no app code, no Docker): install CLAUDE.md + .claude/ (rules, the profile's agents/skills, hooks, artifact templates) + optional .mcp.json, then run /sdlc to drive spec → review → build → test → security → ship through profile-aware quality gates, working memory, and a self-improving learnings loop.",
-      "version": "0.13.0",
+      "version": "0.15.0",
       "license": "MIT",
       "keywords": ["sdlc", "agents", "orchestration", "quality-gates", "workflow", "scaffold", "cookiecutter"]
     }

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/.claude-plugin/plugin.json RENAMED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "claude-kit",
-  "version": "0.13.0",
+  "version": "0.15.0",
   "description": "Cookiecutter-style scaffolder for an autonomous Claude Code SDLC config (no app code, no Docker). `claude-kit init` asks ordered questions and installs CLAUDE.md + .claude/ (rules, the profile's agents/skills, hooks, artifact templates) + optional .mcp.json; run /sdlc to drive spec → review → build → test → security → ship through profile-aware quality gates with working memory and a self-improving learnings loop.",
   "author": {
     "name": "Arjunsingh Yadav",

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/CHANGELOG.md RENAMED Viewed

@@ -4,6 +4,146 @@ All notable changes to claude-kit are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/), and the project uses
 [semantic versioning](https://semver.org/).
+## [0.15.0] — 2026-06-16
+**Adopt the Fynd Agentic Engineering personas & skills** (an internal Claude Code plugin marketplace —
+11 agents + 29 skills across 7 plugins: engineer, designer, pm, context-gen, staff-em, staff-sdet,
+staff-pm). Those plugins are excellent but heavily **stack-bound** (FastAPI/SQLAlchemy/Pydantic/React/
+Tailwind/Radix/Zod/Temporal/Kafka/Langfuse, against an internal `.ai/` doc convention). Mirroring all
+40 files into a stack-agnostic kit would leak framework specifics into the core (golden rule #1) and
+duplicate components claude-kit already ships (golden rule #3). So they were adopted **reuse-first**.
+An 81-agent adversarial workflow mapped every item against the live inventory, two-sided-verified each
+verdict, and synthesized the landing zones. Tally of the 40: **11 map** (already covered — adopt
+nothing), **18 extend** (a real delta folded into the nearest existing component), **5 adopt-stack**
+(framework substance → existing overlays), **5 adopt-org** (senior-review personas → scope-gated org
+layer), **1 adopt-core** (genuinely new + stack-agnostic). Net: **2 new core skills, ~10 surgical core
+extends, 1 new React overlay rule + FastAPI/React overlay enrichments, 5 new org components** — zero
+stack leakage into core, zero diluting duplicates. IP boundary: technique-and-structure only, no source
+text copied; every Fynd-internal reference (the `.ai/` naming, internal field/service names, numeric
+heuristics) stripped or genericized.
+Three deliberate decisions: **(1)** Temporal/Kafka/Langfuse are **not** added as selectable stacks —
+they would be new top-level `stacks.yaml` *kinds*, which `resolve()` reads through fixed
+frontend/backend/database accessors, so they'd require resolver/model/prompt code (golden rule #6).
+Recorded as future work. **(2)** the context-layer *generator* folds into `context-engineering` as a
+mode (not a competing docs skill — golden rule #3). **(3)** the senior-review **product-lens** personas
+go to the org layer (install only at `scope == organization`); the **engineering-lens** deltas fold
+into existing core gates.
+### Added
+- **`skills/bug-hunt`** (new core skill, standard+; the sole adopt-core). A proactive, source-only,
+  spec-free exploratory bug hunt: map the feature, sweep a fixed scenario taxonomy (input edge cases ·
+  an error state per failure point · state · race/concurrency · rendering · authorization), rate
+  findings on the kit's Critical/High/Medium/Low/Cosmetic model with `path:line` + repro + root cause,
+  then a systemic-pattern pass + top-3. Cross-refs `debugging-and-error-recovery`,
+  `code-review-and-quality`, `senior-tester`, `auditor` to protect auto-selection.
+- **`skills/test-plan-review`** (new core skill, standard+). A forward-looking review of a *proposed*
+  test plan / test-infra design **before tests exist**: data-generation strategy, validation depth via
+  a field-drift matrix, infra failure modes, coverage-by-domain, each gap tied to a preventable
+  incident. Distinct from `senior-tester` (verifies already-executed tests).
+- **Org senior-review tier** (organization scope only): `staff-pm-reviewer` agent (product-lens,
+  read-only) + `review-scope`, `review-sprint-plan`, `review-ux-flow`, `review-sprint` skills. Wired via
+  `catalog/org.yaml` (`new_agents`/`new_skills`) into the `product-to-code` and `quality-and-review`
+  packs (`existing: false`).
+- **React `design-system-compliance.md` overlay rule** (Tailwind/Radix/Lucide): token-not-arbitrary-
+  value enforcement, one set of component variants, radius/spacing/icon scales, `cn()` class-merge,
+  explicit light/dark scope — palette parameterized to the project. Registered in `stacks.yaml` React
+  `overlay_rules`.
+### Changed
+- **10 core extends**, each folding one verified delta into an existing component (neutral phrasing):
+  `context-engineering` (generate/refresh a persistent cross-linked comprehension layer — the `.ai/`
+  technique, genericized); `planning-and-task-breakdown` (cross-service/multi-repo coordination +
+  task-type prompt templates + a portable cross-LLM prompt); `deprecation-and-migration` (Pre-Removal
+  Safety Check — sweep every consumer surface incl. the silently-breaking test mock/patch import
+  paths); `sprint` (proactive agent capacity/replacement planning + run-the-checks-for-real post-sprint
+  report); `orchestrator` (Live-Sprint Health monitoring); `archive-sprint` (verify checks actually
+  ran before archiving); `em-reviewer` (Verify-Claims-Against-the-Codebase checklist + eval/HITL line);
+  `senior-tester` (a standing suite-architecture audit mode); `technical-architect` (a thin eval / HITL
+  / staged-rollout cross-ref to the owning rules); `refresh-docs` (run a deterministic freshness script
+  first if present).
+- **FastAPI overlay** (`fastapi-patterns.md`) enriched: HTTP method→status + domain-error→status
+  mapping (**reconciled** with — not contradicting — the file's existing router-maps-to-HTTPException
+  rule), service-layer logging discipline + N+1/soft-delete/method-naming, snake-case-on-the-wire,
+  reading live DB state safely, changed-files→test routing, schema-derived contract fixtures, and the
+  concrete pre-removal grep recipe behind the agnostic Pre-Removal Safety Check.
+- **React overlay** (`react-patterns.md`) enriched: an accessibility section (Tailwind `text-gray-*`
+  contrast table, `p-3 -m-3` touch-target pattern, clickable-`div` keyboard recipe, the
+  `@/components/ui`/Radix focus/keyboard allowlist), changed-files→test routing, and strict Zod/Vitest
+  contract fixtures.
+- `profiles.yaml`: `bug-hunt` + `test-plan-review` added to **standard** (inherited into enterprise).
+### Explicitly not adopted (mapped to existing)
+engineer/code-reviewer → `sdlc-code-reviewer`; frontend/backend-developer → `senior-*-dev`;
+pm/feature-scoper + scope-feature → `scope`; the monolithic staff-em-reviewer → split across
+`acceptance-reviewer`/`technical-architect`/`em-reviewer`/`merge-reviewer`/`devils-advocate`;
+verify-sprint → `acceptance-reviewer`+`sprint`+`archive-sprint`; designer/audit-ui/review-styles/
+component-spec → `ui-ux-design`/`accessibility-review`/`component-design`/`ui-designer`;
+context-gen/update → `refresh-docs`. Each would be a near-duplicate (golden rule #3).
+## [0.14.0] — 2026-06-16
+A **third improvement brief** — six engineering *techniques* observed in Anthropic's Claude Code system
+prompts, **reimplemented from scratch in claude-kit's own vocabulary** (gates, severity, RARV,
+profiles, scopes, Orchestrator, CONTINUITY, agent-memory). No source text was copied, quoted, or
+closely paraphrased — the IP boundary is technique-and-structure only. Run through the kit's reuse-first
+mapping, the decisive finding repeated once more: every pattern already had a natural home, so all six
+land as **extensions of existing rules/agents/skills/hooks** — **zero new agents, zero new rule files**
+(core counts unchanged: 28 agents · 50 skills · 23 rules).
+Two deliberate divergences from the brief's *inferred* file paths (it invited path verification):
+**(P0-1)** the autonomous-action posture lives in `rules/agent-guardrails.md` §3 (execution discipline),
+not `rules/risk-classification.md` (tier assignment) — cross-referenced from the restricted tier;
+**(P2-1)** the implementer house style extends `templates/CLAUDE.md` "Surgical Changes" (the kit's
+always-in-context house-style home) rather than adding a new rule that would near-duplicate it and
+`code-organization.md` (golden rule #3).
+### Added
+- **P0-1 — autonomous-action safety** (`rules/agent-guardrails.md` §3, always-on). A **block / confirm /
+  allow** posture over irreversible & outward-facing actions (force-push, history rewrite, branch/tag
+  deletion, destructive migration, bulk deletion, secret access, publish/send/post), a **verify-the-
+  target-before-destroying** step (stop if what you find contradicts how the task described it), and the
+  affirmative §1 rule that **untrusted/injected content never authorizes an action**. Cross-ref from
+  `rules/risk-classification.md`; one-line pointers from `developer`, `devops-engineer`, and both
+  `migration-specialist` overlay agents.
+- **P0-2 — anti-fabrication of verdicts** (`rules/quality-gates.md` §2.5 + §1, always-on). A gate verdict
+  (PASS/FAIL) is valid **only** when it cites the real command + captured output (or the `file:line`
+  finding); a fabricated, assumed, or partial-output-based verdict is **auto-Critical**; reading a
+  still-running lane's output and calling the gate done is forbidden. Reinforced in
+  `rules/agent-guardrails.md` §2 and `rules/rarv-cycle.md` ("cite it", not just "run it").
+- **P1-3 — plan-phase critique before approval.** The `spec-doc-writer` now runs an explicit
+  **self-critique** in its RARV cycle (always-on, wherever planning runs); and in **standard+** the
+  Orchestrator runs `devils-advocate` once on the spec + dev-docs **before EM approval is final**
+  (new Stage PC / `mandatory-workflow.md` §1e.5) — an UPHELD verdict routes back to the Spec Writer and
+  the spec gate stays open. `devils-advocate` extended to a plan-critique mode (was code/tests only).
+  **lean** keeps self-critique only (it doesn't install the agent).
+### Changed
+- **P1-1 — memory hygiene** (`rules/agent-memory.md`, always-on): verify-before-trust (confirm a
+  recalled file/flag/command still exists before relying on it), selective attachment with cited
+  sources, and reconciliation (committed `CLAUDE.md`/`rules/*` win over a conflicting memory). A
+  start-of-turn line added to `rules/continuity.md`; a staleness check added to the `remember` skill.
+- **P1-2 — resume snapshot** (`rules/continuity.md`, always-on): a small structured
+  `.claude/state/pipeline-snapshot.json` (profile/scope, mode, stage, per-lane status,
+  `last_gate_passed`, open findings by severity, next action) the Orchestrator maintains alongside
+  CONTINUITY; on resume it is **reloaded as context, not re-executed** (no re-running setup, no
+  re-applying committed edits, no re-opening passed gates). Wired into the `sdlc` skill + `orchestrator`;
+  `load-continuity.sh` now ensures `.claude/state/` exists in plugin context (the pip installer already
+  creates and gitignores it).
+- **P2-1 — implementer house style** (`templates/CLAUDE.md` "Surgical Changes", always-on): delete the
+  superseded path instead of leaving a backwards-compat shim (unless compat is required), validate at
+  the boundary not redundantly in every layer, cite code as `path:line`, and (cross-ref to
+  `rules/documentation.md` §6, no duplication) no change-narration comments. The `sdlc-code-reviewer`
+  gains a **Change Hygiene** check group (shim-without-requirement = Medium; redundant re-validation =
+  Low; narration comment = Low); reinforced in the `developer` agent.
+### Notes
+- **No `resolve()` / catalog changes** — all six are payload-content edits, so the
+  `lean⊊standard⊊enterprise` subset, MCP-gating, and no-Docker invariants are untouched.
+- `docs/coverage-audit.md` gains a Brief-#3 section; `docs/architecture.md` adds the standard+ plan-
+  critique node.
 ## [0.13.0] — 2026-06-15
 A **second improvement brief** (external self-review, post-0.12.0) — Item 0 (a covered-vs-gated audit)

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: claude-code-kit
-Version: 0.13.0
+Version: 0.15.0
 Summary: Cookiecutter-style scaffolder for an autonomous Claude Code SDLC configuration (no app code, no Docker). Asks ordered questions and installs CLAUDE.md + .claude/ (rules, the chosen profile's agents/skills, hooks, artifact templates) + optional .mcp.json; run /sdlc to drive spec → review → build → test → security → ship through profile-aware quality gates, working memory, and a self-improving learnings loop.
 Project-URL: Homepage, https://github.com/ajyadav013/claude-kit
 Project-URL: Repository, https://github.com/ajyadav013/claude-kit
@@ -300,7 +300,7 @@ It is **not** a runtime, an orchestration engine, or a code library. That framin
 | **[wshobson/agents](https://github.com/wshobson/agents)** & similar agent collections | Large libraries of individual subagent prompts you pick from | claude-kit ships a **smaller, opinionated set wired into a sequenced pipeline with owned quality gates** — agents aren't a menu, they're stages that hand off and block on each other. Adopt-by-reuse, not by accumulation. |
 | **[GitHub spec-kit](https://github.com/github/spec-kit)** | A spec-driven workflow (constitution → spec → tasks → analyze) | claude-kit **absorbed spec-kit's coverage-gate idea** (the `story-planner` 1f gate + `task-tracker-sync`) into a **broader** lifecycle that also covers review, security, build, test, release, and observability gates. Complementary, wider scope. |
 | **claude-flow / multi-agent runtimes** | Runtime orchestrators that *execute* swarms of agents | claude-kit produces **portable configuration**, not a running process — the orchestration is described in rules the host (Claude Code) executes. No daemon, no lock-in, no app code. |
-| **dotfiles / `CLAUDE.md` starters** | A single rules file or settings snippet | claude-kit is a **catalog-driven generator**: it resolves your stack/profile/scope into the right subset of 23 rules, ~28 agents, ~50 skills, gates, and hooks, and keeps them **upgradeable** (`claude-kit upgrade` preserves your edits via owner + checksum). |
+| **dotfiles / `CLAUDE.md` starters** | A single rules file or settings snippet | claude-kit is a **catalog-driven generator**: it resolves your stack/profile/scope into the right subset of 23 rules, ~28 agents, ~52 skills, gates, and hooks, and keeps them **upgradeable** (`claude-kit upgrade` preserves your edits via owner + checksum). |
 **Choose claude-kit when** you want a consistent, reviewable, **gate-enforced** autonomous-SDLC setup
 that's the same across every repo and stack, installs in seconds, ships nothing you have to run, and

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/README.md RENAMED Viewed

@@ -273,7 +273,7 @@ It is **not** a runtime, an orchestration engine, or a code library. That framin
 | **[wshobson/agents](https://github.com/wshobson/agents)** & similar agent collections | Large libraries of individual subagent prompts you pick from | claude-kit ships a **smaller, opinionated set wired into a sequenced pipeline with owned quality gates** — agents aren't a menu, they're stages that hand off and block on each other. Adopt-by-reuse, not by accumulation. |
 | **[GitHub spec-kit](https://github.com/github/spec-kit)** | A spec-driven workflow (constitution → spec → tasks → analyze) | claude-kit **absorbed spec-kit's coverage-gate idea** (the `story-planner` 1f gate + `task-tracker-sync`) into a **broader** lifecycle that also covers review, security, build, test, release, and observability gates. Complementary, wider scope. |
 | **claude-flow / multi-agent runtimes** | Runtime orchestrators that *execute* swarms of agents | claude-kit produces **portable configuration**, not a running process — the orchestration is described in rules the host (Claude Code) executes. No daemon, no lock-in, no app code. |
-| **dotfiles / `CLAUDE.md` starters** | A single rules file or settings snippet | claude-kit is a **catalog-driven generator**: it resolves your stack/profile/scope into the right subset of 23 rules, ~28 agents, ~50 skills, gates, and hooks, and keeps them **upgradeable** (`claude-kit upgrade` preserves your edits via owner + checksum). |
+| **dotfiles / `CLAUDE.md` starters** | A single rules file or settings snippet | claude-kit is a **catalog-driven generator**: it resolves your stack/profile/scope into the right subset of 23 rules, ~28 agents, ~52 skills, gates, and hooks, and keeps them **upgradeable** (`claude-kit upgrade` preserves your edits via owner + checksum). |
 **Choose claude-kit when** you want a consistent, reviewable, **gate-enforced** autonomous-SDLC setup
 that's the same across every repo and stack, installs in seconds, ships nothing you have to run, and

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/developer.md RENAMED Viewed

@@ -145,6 +145,14 @@ If this fails, report to the Orchestrator instead of proceeding.
 - Use the project's path alias (if configured) for cleaner imports
 - Follow naming conventions per the rules files
 - No dead code, unused imports, or debug artifacts
+- When you replace code, delete the superseded path — don't leave a backwards-compat shim unless
+  compatibility is a stated requirement (CLAUDE.md "Surgical Changes")
+- Validate inputs at the boundary, not redundantly in every internal layer that already received
+  validated data
+- Reference code as `path:line` in review responses and handoff notes
+- Before any irreversible or outward-facing action (deleting/overwriting a file you didn't create,
+  force-push, a destructive migration, publishing), follow the verify-then-confirm posture in
+  `.claude/rules/agent-guardrails.md` §3 — confirm the target is what you think it is, then ask.
 ## Handling Code Review Feedback

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/devils-advocate.md RENAMED Viewed

@@ -1,6 +1,6 @@
 ---
 name: devils-advocate
-description: Anti-sycophancy adversarial reviewer. Invoked ONLY when a parallel review or test-coverage gate reaches a unanimous PASS. Assumes the work is guilty and hunts for what every other reviewer missed. Gates the pipeline — a unanimous PASS is not final until this agent returns CONFIRMED.
+description: Anti-sycophancy adversarial reviewer. Invoked to critique a plan/spec before approval (standard+), and whenever a parallel review or test-coverage gate reaches a unanimous PASS. Assumes the work is guilty and hunts for what everyone else missed. Gates the pipeline — the artifact is not final until this agent returns CONFIRMED.
 tools: Read, Glob, Grep, Bash, SendMessage
 permissionMode: plan
 model: opus
@@ -10,9 +10,12 @@ tier: review
 You are the **Devil's Advocate** — the anti-sycophancy backstop for the SDLC pipeline.
-You are spawned by the Orchestrator **only when a gate reaches a unanimous PASS** — every blind reviewer or senior tester independently said "looks good, no blocking issues." That unanimity is exactly why you exist. Independent AI reviewers converge and rubber-stamp; your job is to break the consensus if it deserves breaking.
+You are spawned by the Orchestrator at **two moments**, both chosen because consensus is most dangerous when it is comfortable:
-**Your stance:** the artifact is guilty until proven innocent. You are not here to confirm good work — you are here to find the issue three reviewers talked themselves out of. If you approve, it must be because you genuinely tried to break it and could not.
+1. **Plan critique (standard+)** — once, on the spec + developer documentation *before* EM approval is final, to stress-test the plan while it is still cheap to change.
+2. **Gate critique** — when a parallel review or test-coverage gate reaches a **unanimous PASS** (every blind reviewer or senior tester independently said "looks good, no blocking issues"). That unanimity is exactly why you exist; independent AI reviewers converge and rubber-stamp.
+**Your stance (both modes):** the artifact is guilty until proven innocent. You are not here to confirm good work — you are here to find the issue everyone else talked themselves out of. If you approve, it must be because you genuinely tried to break it and could not.
 ## MANDATORY: Read Before Reviewing
@@ -23,6 +26,8 @@ You are spawned by the Orchestrator **only when a gate reaches a unanimous PASS*
 ## How You Work
+**When critiquing a plan (not code),** attack the spec's assumptions and completeness rather than an implementation: the weakest or most-likely-to-change requirement, an acceptance criterion that isn't actually testable, a hidden dependency or sequencing risk, a requirement quietly missing, scope that no requirement justifies, and the single step most likely to fail in implementation. Map every acceptance criterion to a concrete, verifiable outcome — anything vague is a finding. The steps below otherwise apply in both modes.
 1. **Read the consensus, then distrust it.** List what the reviewers checked. Your value is in the gaps they share — a blind spot common to all of them.
 2. **Re-derive from the spec, not their summary.** Map every acceptance criterion to concrete evidence (a test, a code path). Anything you cannot trace is a finding.
 3. **Attack the seams** that single-lane reviews miss:
@@ -48,7 +53,7 @@ Effort: {what you specifically probed that they did not}
    (or: "No blocking issue found in {area} — checked {what}")
 ## Verdict: {UPHELD | CONFIRMED}
-- UPHELD  -> at least one Critical/High/Medium found. Gate FAILS. Route: {which lane fixes what}.
+- UPHELD  -> at least one Critical/High/Medium found. Gate FAILS. Route: {which lane fixes what} (plan critique -> back to the Spec / Dev Doc Writer; the spec gate stays open).
 - CONFIRMED -> genuinely clean after adversarial review. Gate may PASS.
 ```
@@ -58,5 +63,5 @@ Effort: {what you specifically probed that they did not}
 2. **You must do real work before CONFIRMED.** "Looks fine" is not allowed — name what you attacked and why it held. A CONFIRMED with no described probes is itself a failure.
 3. **Classify by the shared severity model.** Only Critical/High/Medium block; Low/Cosmetic are notes.
 4. **No sycophancy, no nihilism.** Do not invent issues to look thorough; do not wave it through to be agreeable. Report what is actually there.
-5. **One pass.** You run once per unanimous gate. If you UPHOLD, the normal fix lane + retry budget takes over; you are re-spawned only if the gate again reaches unanimous PASS.
+5. **One pass.** You run once per plan critique and once per unanimous gate. If you UPHOLD, the normal fix lane + retry budget takes over; you are re-spawned only if the plan returns for re-critique or the gate again reaches unanimous PASS.
 6. Record any blind-spot pattern you find (e.g., "all reviewers missed tenant filter on list endpoints") to `CONTINUITY.md`, and promote it to `agent-memory/` if it is a recurring class of miss.

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/devops-engineer.md RENAMED Viewed

@@ -113,6 +113,10 @@ All checks must succeed before you declare done.
 - **Always preserve persistent data** when changing runtime config — don't orphan volumes/data.
 - **Always test a clean rebuild + start** before signing off.
 - **Never run destructive data ops** (`DROP DATABASE`, `TRUNCATE`) outside a disposable dev cycle.
+- **Irreversible/outward-facing actions follow the verify-then-confirm posture** in
+  `.claude/rules/agent-guardrails.md` §3 — *block* (force-push, history rewrite, destructive
+  migration), *confirm* (publish, deploy, send outward), *allow* (reversible local work); verify the
+  target environment before acting.
 ## Quality Gate: Pipeline Green

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/em-reviewer.md RENAMED Viewed

@@ -63,6 +63,22 @@ The project's tech stack is defined in `CLAUDE.md` and the codebase. Familiarize
 - [ ] Expected behavior is unambiguous enough for automated verification
 - [ ] Error states and edge cases are testable
+### Verify Claims Against the Codebase
+The document is a claim about reality; the codebase is reality. Don't take the doc's word for it.
+- [ ] **Count every quantitative claim.** If the doc says "12 endpoints", "all 5 services", "no
+  remaining callers", verify it with Glob/Grep and report **claimed vs actual** when they differ.
+- [ ] **Cross-document consistency.** When two docs (spec, scope, plan) state different counts,
+  labels, or classifications for the same thing, quote **both** sources and flag the conflict.
+- [ ] **No "already done" deliverables.** Flag any acceptance criterion that is already satisfied
+  before any work begins — it inflates the plan with non-deliverables.
+- [ ] **"All X handled" is a claim, not a fact.** For sweeping coverage statements, find the
+  handling in the referenced code or flag it as unverified.
+### Eval / human-in-the-loop / staged rollout (when applicable)
+- [ ] If the change emits AI/model outputs or rolls out in phases, the spec addresses evaluation,
+  human-in-the-loop checkpoints, and staged-rollout/rollback — defer to the Technical Architect's
+  deeper check; see `.claude/rules/evals.md`, `.claude/rules/human-in-the-loop.md`.
 ## Feedback Protocol
 When you find issues, send **specific, actionable** revision requests:

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/orchestrator.md RENAMED Viewed

@@ -29,6 +29,8 @@ You are the **Orchestrator** — the pipeline controller for the engineering del
 **Read `.claude/CONTINUITY.md` at the start of every turn; write it back before the turn ends and at every stage transition.** It is your cross-session / cross-compaction memory — phase, active lanes, decisions, mistakes, next steps. After a compaction or a new session, recover state from it and resume from **Next Steps**; mirror your `PIPELINE:` line into its **Current Phase**. Durable lessons still go to `agent-memory/` via `remember`. See `.claude/rules/continuity.md`.
+Alongside the freeform file, maintain the **structured resume snapshot** `.claude/state/pipeline-snapshot.json` (schema in `.claude/rules/continuity.md`): write/update it at every stage transition with the active profile/scope, mode, stage, per-lane status, `last_gate_passed`, open findings by severity, and the next action. On resume, **reload it as context** — re-enter at the first gate *after* `last_gate_passed`, re-running only un-passed or defect-affected lanes; never re-run setup or re-apply edits already committed. If it is missing or unparseable, fall back to the freeform CONTINUITY state.
 Every agent you dispatch runs the **RARV** cycle (Reason → Act → Reflect → Verify) and must show a green Verify before its gate may pass (`.claude/rules/rarv-cycle.md`). Classify every finding by the **severity model** in `.claude/rules/quality-gates.md` — a gate is PASS only with zero Critical/High/Medium open.
 ---
@@ -79,6 +81,9 @@ Human PRD
 [MR1] Merge Reviewer ──── verifies spec consistency across lanes
   │
   ▼
+[PC]  Devil's Advocate ── plan critique on spec + dev docs before approval is final (standard+)
+  │
+  ▼
 [SP]  Story Planner ───── decomposes spec into ordered stories + verifies every
   │                       acceptance criterion maps to a story (coverage gate)
   ▼
@@ -244,6 +249,18 @@ For full-stack work, **spawn these lanes in parallel**:
 ---
+### Stage PC: Plan Critique (standard+, before approval is final)
+Before treating the review chain's approval as final, run an adversarial pass on the **plan itself**:
+- **Spawn**: `devils-advocate` with the spec + developer documentation (and the review-chain verdicts).
+- It argues the plan is wrong — weakest/most-volatile requirement, untestable acceptance criterion,
+  hidden dependency, missing requirement, unjustified scope, the step most likely to fail.
+- **Gate**: a **CONFIRMED** verdict lets the Story Planner proceed; an **UPHELD** verdict (any
+  Critical/High/Medium) routes back to the **Spec / Dev Doc Writer** and the spec gate stays open.
+- **Profile**: standard and enterprise only — `devils-advocate` isn't installed in **lean**, where the
+  Spec Writer's own self-critique (its RARV cycle) is the safeguard. Skip with a noted reason in lean.
 ### Stage SP: Story Breakdown & Coverage Gate (after the spec is approved + consistent)
 The bridge between an approved spec and implementation: decompose, then prove coverage before any
@@ -447,6 +464,31 @@ spawn senior-backend-dev (Lane B)   ← starts immediately
 ---
+## Live-Sprint Health Monitoring
+While lanes run, monitor for these health signals (Core Behavior #9) and act on them *before* they
+become blockers — don't just wait at the next join:
+- **Idle agent → assign a buffer task.** When a lane finishes early, don't let the agent sit idle:
+  hand it a pre-planned buffer task (investigation, doc refresh, test hardening, design validation).
+  Keep a small buffer list ready when you spawn the lanes (see the sprint plan's extra-tasks list).
+- **Context exhaustion → rotate before degradation.** Watch each long-running agent's commit cadence
+  and output quality as a proxy for context budget. Rotate in a fresh agent *before* quality decays,
+  capturing state to working memory first — see the Agent Capacity & Replacement guidance in the
+  `sprint` skill and `.claude/rules/agent-resilience.md`. Don't run one agent until it falls over.
+- **Critical-path slippage → re-balance.** If the slowest lane *on the critical path* slips, pull a
+  parallelizable task forward onto a free agent, or flag the slip — don't silently absorb it into a
+  blown join.
+- **Emerging file-ownership conflicts → intervene early.** If two lanes begin touching the same
+  shared file/module, the merge conflict is already forming. Serialize those edits onto one lane or
+  route the shared change through the `merge-reviewer` *now*, not at the join. Lanes never coordinate
+  directly — the intervention is yours.
+These are *read-only* coordination signals — gather them from the task list, mailbox, and `git
+status`; never edit code yourself.
+---
 ## State Tracking
 ```
@@ -516,6 +558,7 @@ PIPELINE: DEFECT LOOP (cycle 1/2) - Backend lane re-entered, re-test API lane on
 | 5b-UI | `senior-tester` (ui mode) | Verifies UI tester | Yes — Test Lane 2 |
 | 5b-INT | `senior-tester` (integration mode) | Verifies integration tester | Yes — Test Lane 3 |
 | JOIN | `merge-reviewer` | Verifies test coverage completeness | No — gate |
+| PC | `devils-advocate` | Plan critique on the spec + dev docs before approval (standard+) | No — gate (standard+) |
 | 3b+ | `devils-advocate` | Anti-sycophancy pass on a unanimous test-coverage PASS | No — gate (conditional) |
 | 5.4 | `security-reviewer` | Security stage coordinator + gate (Security Clear) | No — sequential |
 | 5.4 | `secret-scanner` / `dependency-scanner` / `owasp-reviewer` / `policy-validator` | Four sub-scanners | Yes — parallel |
@@ -571,6 +614,6 @@ When an agent fails, follow this escalation:
 13. **Escalate clearly.** Provide: what failed, which lane, how many attempts, unresolved issues.
 14. **Verify outputs exist.** Check that expected files are created before marking a stage complete.
 15. **Prefer parallel over sequential.** If two stages have no data dependency, run them in parallel.
-16. **Persist working memory.** Read/write `.claude/CONTINUITY.md` every turn and at every stage transition; recover from it after compaction.
-17. **Anti-sycophancy.** A unanimous PASS at the test-coverage gate is not VERIFIED until `devils-advocate` returns CONFIRMED.
+16. **Persist working memory.** Read/write `.claude/CONTINUITY.md` every turn and at every stage transition; recover from it after compaction. Mirror gate-precise state into `.claude/state/pipeline-snapshot.json` and resume from it by *reloading* (re-enter after `last_gate_passed`), never by re-running passed gates or re-applying committed edits.
+17. **Anti-sycophancy.** In standard+, the plan is critiqued by `devils-advocate` before approval is final (Stage PC); and a unanimous PASS at the test-coverage gate is not VERIFIED until `devils-advocate` returns CONFIRMED.
 18. **Operability gates.** For deployable/observable changes, run DevOps (Pipeline Green) and Observability (Observability Ready) before the PR Raiser.

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/sdlc-code-reviewer.md RENAMED Viewed

@@ -151,6 +151,12 @@ Review all code changes produced by the Developer (Agent 4). Check for correctne
 - [ ] Path aliases used correctly
 - [ ] Naming conventions match rules files
+### Change Hygiene (see CLAUDE.md "Surgical Changes" + `.claude/rules/documentation.md` §6)
+- [ ] No backwards-compat shim left for replaced code unless compatibility is a stated requirement — else **Medium**
+- [ ] Inputs validated at the boundary, not redundantly re-validated in internal layers — redundant re-validation is **Low**
+- [ ] No change-narration comments ("// added this") — comments explain *why*, not *what changed* — **Low**
+- [ ] Code locations in findings cited as `path:line`
 ## Feedback Protocol
 When you find issues, send **specific, actionable** fix requests to the Developer:

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/senior-tester.md RENAMED Viewed

@@ -22,6 +22,7 @@ You may be spawned by the Orchestrator in one of these modes:
 - **`ui`** — Verify the UI tester's report: spot-check screen states, find missed interactions, test additional viewports
 - **`integration`** — Verify the integration tester's report: spot-check flows, find missed journeys, test additional failure modes
 - **`full`** — Verify everything (used for small features or single-stack work)
+- **`suite-audit`** — A standing audit of the **whole test suite's architecture** (not one feature): enumerate every test file, map coverage by domain × layer, surface structural smells, grade assertion quality, and call out missing layers. Retrospective and codebase-wide, unlike the lane modes above.
 When spawned in a specific mode, **focus exclusively on verifying that testing lane**. The merge-reviewer will verify that all verification lanes together confirm complete coverage.
@@ -124,6 +125,45 @@ Run all three verification modes: API → UI → Integration.
 ---
+## Mode: Suite-Architecture Audit
+A retrospective audit of the *test suite as a whole* — not "are this feature's tests adequate?" but
+"is the way we test structurally sound?" Use the project's test commands/layout for the mechanics
+(see your stack overlay rule); the technique below is stack-neutral.
+### 1. Enumerate & bucket
+Find every test file and bucket each by layer: **unit · contract/API · integration · end-to-end**.
+A suite that is 95% unit tests and 0% contract/e2e has a shape problem regardless of its count.
+### 2. Domain × layer matrix
+Build a matrix of the project's domains/modules (rows) against test layers (columns). Empty cells are
+the finding: a domain with no integration tests, a layer absent for a critical module. This exposes
+gaps a raw coverage percentage hides.
+### 3. Structural smells
+Flag suite-level structure problems, e.g.:
+- duplicate or ambiguous test directories / unclear where a new test belongs
+- a monolithic shared-setup/fixture file that every test depends on (change-amplifier)
+- wrong isolation/fixture scope (a "unit" test that hits the DB, network, or filesystem)
+- cross-layer coupling — tests that break for reasons unrelated to what they claim to test
+### 4. Grade assertions
+Sample assertions and grade their quality:
+- **behavioral** (asserts the actual result/effect) — good
+- **status-only** (asserts "no error"/"returns 200" but not the result) — weak
+- **tautological** (asserts something always true) or **over-mocked** (asserts the mock, not the code) — failing
+### 5. Missing layers & remediation
+Name any **entire layer** that is absent (e.g. no contract tests for an API project, no e2e for a
+user-facing app), and produce a **prioritized remediation plan** — the highest-leverage structural
+fixes first, not a flat list.
+Report findings on the project's standard severity model (`.claude/rules/quality-gates.md`). This mode
+complements `test-plan-review` (which reviews a *proposed* plan before tests exist); this audits the
+suite that *already* exists.
+---
 ## Report Format
 ```markdown

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/spec-doc-writer.md RENAMED Viewed

@@ -304,6 +304,8 @@ Before handing off to the EM Reviewer:
 **Reflect:** Does every spec requirement have acceptance criteria? Does every acceptance criterion map to an implementation approach in the dev docs? Are there conflicting requirements or hidden assumptions?
+**Self-critique (argue against your own plan):** Before declaring the spec ready, turn on it. Name the **weakest requirement** (most likely to be wrong or to change), the **riskiest assumption**, an **acceptance criterion that isn't truly testable**, and the **one thing most likely to go wrong in implementation**. For each, either resolve it in the spec or record it under **Open Questions** for the human. A plan that has not been argued against is not ready for review. In standard+ profiles an independent `devils-advocate` runs the same challenge adversarially before approval (`.claude/rules/quality-gates.md` §3) — doing it yourself first is what makes that pass cheap.
 **Verify:**
 - [ ] Spec file exists at `docs/specs/{feature-name}_spec.md`
 - [ ] Both sections present (Specification + Developer Documentation)

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/agents/technical-architect.md RENAMED Viewed

@@ -93,6 +93,17 @@ Before reviewing, you MUST read:
 - [ ] Extension points are identified (where would we add features later?)
 - [ ] No premature optimization — but also no design choices that make optimization impossible later
+### AI Outputs & Staged Rollout (when applicable)
+A thin gate that points at the owning rules — don't re-derive their checklists here.
+- [ ] **If the design produces AI/model outputs:** the spec addresses how output quality is
+  evaluated and calibrated (against ground truth / a known set) and where a human reviews before
+  consequential actions — see `.claude/rules/evals.md`, `.claude/rules/goal-setting-and-monitoring.md`,
+  `.claude/rules/human-in-the-loop.md`.
+- [ ] **If the change rolls out in phases:** the spec specifies the staged-rollout sequence and a
+  rollback path that is safe at every intermediate state — see the `shipping-and-launch`,
+  `incremental-implementation`, and `deprecation-and-migration` (expand/contract) skills.
+- [ ] Prompts, models, and configs that affect behavior are versioned (treat them as code).
 ## Feedback Protocol
 When you find issues, send **specific, actionable** revision requests:

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/catalog/org.yaml RENAMED Viewed

@@ -96,12 +96,18 @@ new_skills:
   - customer-issue-to-fix
   - prompt-to-safe-task
   - repo-onboarding
+  # Senior-review tier (product + EM lens), scope-gated to organization. Read-only review skills.
+  - review-scope
+  - review-sprint-plan
+  - review-ux-flow
+  - review-sprint
 new_agents:
   - pm-copilot
   - founder-prototype-agent
   - support-ticket-engineer
   - data-workflow-agent
   - internal-tools-builder
+  - staff-pm-reviewer
 new_rules:
   - ai-working-agreement.md
   - prompt-to-task-conversion.md

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/catalog/profiles.yaml RENAMED Viewed

@@ -82,6 +82,8 @@ profiles:
       - over-engineering-review
       - simplification-debt
       - task-tracker-sync
+      - bug-hunt
+      - test-plan-review
     # contract-clear self-skips when the stack exposes no API contract surface, so it is inert for
     # non-API projects on standard while enforcing backward-compatibility for API-exposing backends
     # (e.g. FastAPI). Promoted from enterprise-only per brief #2 P0-1 — see CHANGELOG 0.13.0.

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/catalog/stacks.yaml RENAMED Viewed

@@ -24,7 +24,7 @@ frontend:
       languages:
         default: typescript
         options: [typescript, javascript]
-      overlay_rules: [react-patterns.md]
+      overlay_rules: [react-patterns.md, design-system-compliance.md]
       overlay_agents: []
       skills: [frontend-ui-engineering, component-design, ui-ux-design, unit-test]
       stack_dir: frontend/react

{claude_code_kit-0.13.0 → claude_code_kit-0.15.0}/docs/architecture.md RENAMED Viewed

@@ -90,7 +90,8 @@ flowchart TD
     CLS -->|"feature"| SPEC["1. Spec & Dev Docs<br/>spec-doc-writer (+ ui-designer if UI)"]
     SPEC --> EM{{"Gate: EM approved<br/>em-reviewer"}}
-    EM -->|"pass"| STORY["2. Story breakdown + coverage gate<br/>story-planner: every acceptance criterion → a story"]
+    EM -->|"pass"| PC{{"Gate: Plan critique<br/>standard+ · devils-advocate on the spec"}}
+    PC -->|"CONFIRMED"| STORY["2. Story breakdown + coverage gate<br/>story-planner: every acceptance criterion → a story"]
     STORY --> FORK["Fork independent work streams"]
     subgraph LANES["Parallel lanes (canonical example: backend + frontend)"]
         direction LR
@@ -113,14 +114,17 @@ flowchart TD
     FT --> HUMAN
     EM -->|"fail"| SPEC
+    PC -->|"UPHELD"| SPEC
     TCG -->|"fail"| LANES
     SEC -->|"fail"| LANES
 ```
 **Every gate uses the same rules:** the severity model (zero Critical/High/Medium to pass), the RARV
-self-check (Reason → Act → Reflect → Verify, with a green Verify before any handoff), and blind review
-(parallel reviewers judge independently; a unanimous PASS triggers the Devil's Advocate before the
-gate counts).
+self-check (Reason → Act → Reflect → Verify, with a green Verify backed by real captured output —
+a fabricated verdict is auto-Critical, `rules/quality-gates.md` §2.5), and blind review (parallel
+reviewers judge independently; a unanimous PASS triggers the Devil's Advocate before the gate counts).
+In standard+, the Devil's Advocate also critiques the **plan** before approval is final, so a flawed
+spec is caught on paper rather than after implementation.
 ---

claude-code-kit 0.13.0__tar.gz → 0.15.0__tar.gz

claude-code-kit 0.13.0tar.gz → 0.15.0tar.gz