npm - contextdevkit - Versions diffs - 1.8.0 - Mend

contextdevkit 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (345) hide show

package/templates/contextkit/behaviors-examples.md ADDED Viewed

@@ -0,0 +1,183 @@
+# Behavioral Discipline — examples
+> Concrete before/after diffs for the four guidelines in
+> [`behaviors.md`](./behaviors.md). Each shows what an LLM commonly does wrong and
+> the disciplined alternative. Examples are in Python for brevity; the *behavior*
+> is language-agnostic.
+>
+> Adapted from the MIT-licensed `karpathy-guidelines` skill (Andrej Karpathy's
+> observations on LLM coding pitfalls).
+---
+## 1. Think before coding
+### Hidden assumptions
+**Request:** "Add a feature to export user data."
+❌ **What LLMs do** — implement `export_users(format='json')` that dumps *all*
+users to a hardcoded file path with a guessed field list. Four silent
+assumptions (scope, location, fields, privacy), any of which can be wrong.
+✅ **Disciplined** — surface the assumptions and ask before coding:
+```
+Before implementing, I need to clarify:
+1. Scope: all users or a filtered subset? (privacy implications)
+2. Delivery: browser download, background job + email, or an API endpoint?
+3. Fields: which ones? (some may be sensitive)
+4. Volume: how many rows typically? (changes the approach)
+Simplest path: a paginated JSON endpoint. File exports need more info — preference?
+```
+### Multiple interpretations
+**Request:** "Make the search faster."
+❌ **What LLMs do** — silently add caching *and* indexes *and* async in one
+200-line diff, optimizing a dimension the user may not care about.
+✅ **Disciplined** — name the interpretations with rough cost, then let the user pick:
+latency (indexes/cache, ~2h) · throughput (async/pooling, ~4h) · perceived speed
+(progressive results, ~3h). "Current ~500ms typical — which matters most?"
+---
+## 2. Simplicity first
+### Over-abstraction
+**Request:** "Add a function to calculate a discount."
+❌ **What LLMs do** — `DiscountStrategy(ABC)` + `PercentageDiscount` +
+`FixedDiscount` + `DiscountConfig` + `DiscountCalculator`: 30+ lines of patterns
+for one calculation.
+✅ **Disciplined:**
+```python
+def calculate_discount(amount: float, percent: float) -> float:
+    """Discount amount. percent is 0-100."""
+    return amount * (percent / 100)
+```
+> Add the strategy pattern **when a second discount type actually exists** —
+> refactor then, not now.
+### Speculative features
+**Request:** "Save user preferences to the database."
+❌ **What LLMs do** — a `PreferenceManager` with injected cache + validator and a
+`save(..., merge=True, validate=True, notify=False)` signature: caching,
+validation, merging, and notifications nobody asked for.
+✅ **Disciplined:**
+```python
+def save_preferences(db, user_id: int, preferences: dict):
+    db.execute("UPDATE users SET preferences = ? WHERE id = ?",
+               (json.dumps(preferences), user_id))
+```
+> Add caching when performance bites, validation when bad data appears, merging
+> when the requirement emerges.
+---
+## 3. Surgical changes
+### Drive-by refactoring
+**Request:** "Fix the bug where empty emails crash the validator."
+❌ **What LLMs do** — fix the empty-email check *and* "improve" email validation,
+*and* add username length/alphanumeric rules, *and* add a docstring, *and* rewrite
+the comments. The bug fix is buried in unrelated changes.
+✅ **Disciplined** — change only the lines that fix empty-email handling:
+```diff
+  def validate_user(user_data):
+      # Check email format
+-     if not user_data.get('email'):
++     email = user_data.get('email', '')
++     if not email or not email.strip():
+          raise ValueError("Email required")
+```
+Username validation, docstrings, and comment rewrites are *not in this diff*.
+### Style drift
+**Request:** "Add logging to the upload function."
+❌ **What LLMs do** — while adding logging, switch `'single'` to `"double"`
+quotes, add type hints, insert a docstring, and reformat whitespace. Every one of
+those is noise that hides the real change and fights the file's existing style.
+✅ **Disciplined** — add the logging lines and **nothing else**; match the file's
+existing single quotes, no-type-hint, existing-boolean-pattern style. The diff
+contains logging and only logging.
+---
+## 4. Goal-driven execution
+### Reproduce before fixing
+**Request:** "The sorting breaks when there are duplicate scores."
+❌ **What LLMs do** — immediately rewrite the sort key without confirming the bug,
+so there's no proof the fix addresses the real failure (or that it won't return).
+✅ **Disciplined** — write the failing test first, *then* fix:
+```python
+def test_sort_with_duplicate_scores():
+    scores = [{'name':'Alice','score':100},{'name':'Bob','score':100},{'name':'Charlie','score':90}]
+    result = sort_scores(scores)              # run repeatedly — order must be stable
+    assert [s['name'] for s in result] == ['Alice', 'Bob', 'Charlie']
+# Verify it FAILS (reproduces the non-deterministic order), then:
+def sort_scores(scores):
+    """Score desc, then name asc for ties (stable)."""
+    return sorted(scores, key=lambda s: (-s['score'], s['name']))
+# Verify it now passes, consistently.
+```
+### Incremental with verification
+**Request:** "Add rate limiting to the API."
+❌ **What LLMs do** — a 300-line Redis-backed, multi-strategy, configurable
+system in one commit with no verification steps.
+✅ **Disciplined** — a plan where each step is independently verifiable:
+```
+1. In-memory limit, one endpoint → verify: 11th request gets 429
+2. Extract to middleware (all endpoints) → verify: applies to /users + /posts; old tests green
+3. Redis backend (multi-server) → verify: limit persists across restart; shared across instances
+4. Per-endpoint config → verify: /search 10/min, /users 100/min
+```
+---
+## Anti-pattern summary
+| Guideline | Anti-pattern | Fix |
+|---|---|---|
+| Think before coding | Silently assumes scope/format/fields | List assumptions, ask before coding |
+| Simplicity first | Strategy pattern for one calculation | One function until a 2nd consumer is real |
+| Surgical changes | Reformats + adds type hints while fixing a bug | Only the lines that fix the reported issue |
+| Goal-driven | "I'll review and improve the code" | "Failing test for X → make it pass → no regressions" |
+## The key insight
+The overcomplicated versions aren't obviously wrong — they follow real design
+patterns. The problem is **timing**: complexity added *before it's needed* is
+harder to read, more bug-prone, slower to ship, and harder to test. Good code
+solves **today's** problem simply; it doesn't pre-solve tomorrow's.

package/templates/contextkit/behaviors.md ADDED Viewed

@@ -0,0 +1,116 @@
+# Behavioral Discipline — how the agent *acts* while coding
+> The behavioral companion to [`best-practices.md`](./best-practices.md).
+> `best-practices.md` says **what good code looks like** (structure, naming,
+> errors); this file says **how to behave while producing it** (clarify before
+> coding, stay surgical, loop to a verified goal). The constitution in `CLAUDE.md`
+> has absolute priority; this file expands its behavioral clauses.
+>
+> **Source & credit.** Adapted from
+> [Andrej Karpathy's observations on LLM coding pitfalls](https://x.com/karpathy/status/2015883857489522876)
+> via the MIT-licensed `karpathy-guidelines` skill. Generalized to ContextDevKit's
+> voice and reconciled with the constitution.
+>
+> **Tradeoff (stated honestly).** These guidelines bias toward **caution over
+> raw speed**. For trivial, unambiguous tasks, use judgment — don't perform a
+> ceremony the work doesn't need. The discipline is a guardrail, not a tax.
+---
+## 1. Think before coding
+**Don't assume. Don't hide confusion. Surface tradeoffs — *before* the diff.**
+- **State assumptions explicitly.** If the request has gaps (scope, format,
+  fields, volume, privacy), name what you're assuming. If an assumption is
+  load-bearing and you're unsure, **ask** instead of guessing.
+- **Present interpretations; don't pick silently.** "Make the search faster" can
+  mean latency, throughput, or perceived speed — lay out the options with their
+  rough cost and let the user choose.
+- **Push back when warranted.** If a simpler approach exists, say so. If the ask
+  conflicts with an immutable rule or an ADR, refuse and explain — silently
+  complying with a bad instruction is not service.
+- **Stop on genuine confusion.** Name what's unclear and ask. One clarifying
+  question now beats a wrong 200-line diff later.
+**Fits the kit.** This is the moment-to-moment companion to `/new-adr` (decide
+the big call *before* implementing) and `/dev-start` (lock the objective). The
+`architect` and `product-owner` agents exist precisely to resolve ambiguity —
+route to them when the unknown is design or scope.
+## 2. Simplicity first
+**The minimum code that solves the problem. Nothing speculative.**
+- No features beyond what was asked. No flags, options, or "configurability"
+  nobody requested.
+- **No abstraction for single-use code.** A strategy pattern / factory / base
+  class earns its place when a *second* real consumer appears — not before.
+- No error handling for impossible scenarios; handle the failures that can
+  actually happen at the boundary (constitution §4).
+- If you wrote 200 lines and 50 would do, rewrite it. Ask: *"Would a senior
+  engineer call this overcomplicated?"* If yes, simplify before showing it.
+**Fits the kit.** This is constitution §9 ("build what is asked; defer the
+rest") and §1 (line budget) as an *in-the-moment* habit. The trap is **timing**:
+the over-engineered version isn't "wrong" — it adds complexity *before it's
+needed*. Add it later, when the requirement is real.
+## 3. Surgical changes
+**Touch only what the task requires. Clean up only your own mess.**
+- **Don't "improve" adjacent code, comments, or formatting.** Don't refactor what
+  isn't broken on the way past.
+- **Match the surrounding style — even if you'd do it differently.** Quote style,
+  type-hint usage, naming cadence, spacing: conform to the file you're in. A diff
+  that reformats is a diff that hides its real change.
+- **Every changed line traces to the request.** If you can't explain a line by
+  the task, it shouldn't be in the diff.
+- **Orphans:** remove imports/variables/functions *your* change made unused.
+  **Don't** delete pre-existing dead code unless asked — *mention* it instead.
+**Fits the kit (the one real tension).** The constitution rewards refactoring by
+responsibility (§1, §2) and the kit ships `/analyze-code-ia-practices`,
+`/tech-debt-sweep`, and `/dev-start "refactor X"`. That is not a contradiction:
+**refactoring is a deliberate, scoped task — never an opportunistic side effect
+of an unrelated change.** When you spot a real structural smell mid-task, note it
+and offer a focused follow-up; don't fold it into the current diff. `/dev-start`
+already locks scope and blocks opportunistic refactors — this rule makes the same
+discipline always-on, even without it.
+## 4. Goal-driven execution
+**Define a verifiable success criterion. Loop until it's met.**
+- **Turn vague tasks into checkable goals.** "Add validation" → "tests for the
+  invalid inputs, then make them pass." "Fix the bug" → "a test that reproduces
+  it, then make it green." "Refactor X" → "the suite is green before and after."
+- **Reproduce before you fix.** Write the failing test that captures the bug
+  *first* — it proves you understood it and guards against its return (this is
+  `/bug-hunt`'s root-cause-first stance, and constitution-grade for any fix).
+- **State a brief plan for multi-step work**, each step paired with its check:
+  ```
+  1. <step> → verify: <check>
+  2. <step> → verify: <check>
+  ```
+- **Loop independently on strong criteria.** A clear success test lets you
+  iterate without pestering the user; a weak one ("make it work") guarantees
+  rework. Don't claim done until the check is green — report honestly if it isn't.
+**Fits the kit.** This ties together what the kit already provides: `/bug-hunt`
+(root cause first), the QA squad (`/test-plan`, `/scaffold-tests`, `/qa-signoff`),
+`best-practices.md` §H7 (behavior tests that catch the bug), and the TodoWrite
+plan. This rule is the thread that connects them on every task.
+---
+## These guidelines are working if…
+- clarifying questions arrive **before** implementation, not after a wrong diff;
+- diffs are smaller and every line traces to the request;
+- fewer rewrites for overcomplication;
+- fixes ship with a test that fails without them.
+See [`behaviors-examples.md`](./behaviors-examples.md) for concrete before/after
+diffs of each anti-pattern.

package/templates/contextkit/best-practices.md ADDED Viewed

@@ -0,0 +1,323 @@
+# AI-Assisted Coding Best Practices — The Rubric
+> The principles `/analyze-code-ia-practices` checks against. Mirrors the
+> constitution in `CLAUDE.md` — keep them in sync. Tune thresholds via
+> `contextkit/config.json` (`l5.lineBudget`, etc.).
+>
+> **Paired with `review-protocol.md`** (severity vocabulary, scope, the
+> protocol the auditor follows, and the scanner contract). This file says
+> *what good looks like*; the protocol file says *how to apply it*.
+>
+> **Stack-agnostic by design.** ContextDevKit installs into any language and any
+> stack — this rubric speaks in principles, not in framework names. If your
+> project wants concrete stack-specific rules (e.g. ORM patterns, framework
+> conventions, security controls particular to your stack), write them in a
+> sibling `best-practices.local.md` your project owns. The kit will not
+> invent domain content (constitution, rule 9).
+>
+> **Why this rubric exists.** AI-assisted code fails in predictable ways, and
+> the *expensive* failures are not the ones a linter sees. Logic married to
+> the database, authorization holes, duplicated domain rules, state with no
+> single source of truth — that is what sinks a project. God files and bad
+> names are real, but they are the *cheap* tier. The tiers below are ordered
+> by what actually costs the project, not by what a regex can match.
+## Risk model
+A finding's weight is not "which rule." It is:
+> **severity ≈ likelihood × blast radius × cost-to-fix-later**
+- **likelihood** — how often this bites in practice.
+- **blast radius** — how much breaks, or how much data leaks, when it does.
+- **cost-to-fix-later** — cheap now, expensive once it has spread or shipped.
+That is why the rubric is reviewed **top-down by tier**:
+1. **System & architecture** — wrong here and everything downstream inherits it.
+2. **Module & function hygiene** — real, but local and cheap to fix.
+**Security is deliberately *not* a tier here.** The kit already has a
+dedicated security ecosystem (`code-security`, `security`, `infra-security`
+agents; `/audit`, `/deps-audit`, `/security-setup` commands). Duplicating
+those rules in this rubric would create two places to maintain them and
+violate the rubric's own §H3 (separation of concerns). For security
+findings, dispatch the security-team — see *Adjacent concerns* at the foot
+of this file.
+## Rule shape
+Every rule below has the same four blocks — **Principle / Smells / Fix /
+Don't over-apply** — because an agent applies a rule well only when it knows
+what triggers a look, what to propose, *and where to stay quiet.* The
+over-apply clause is the calibration; a respected guardrail beats a flagged
+false positive. (Severity, scope, and reporting format live in
+`review-protocol.md`.)
+---
+# TIER 1 — System & architecture
+The highest-leverage tier, and the one a line-counting linter is blind to.
+The scanner gives the agent almost nothing here — read the code.
+## S1 — Dependency direction
+**Principle.** Dependencies point *inward*: UI and domain logic do not depend
+on infrastructure. The persistence client, the HTTP transport, and the
+framework live at the edges and depend on the core, not the reverse. The
+domain should not know how it is stored or transported.
+**Smells.** A `domain/` or business module importing the DB/ORM client, the
+network library, the router, or generated persistence types. Generated row
+types used as the domain model everywhere. "It's easier to just import the
+client here."
+**Fix.** Invert it: the core defines an interface (a repository/port); infra
+implements it; the dependency is injected at the edge. The domain stays
+ignorant of how things are persisted or transported.
+**Don't over-apply.** A genuinely thin app — a CRUD admin panel with no real
+domain — doesn't need a hexagonal cathedral. Match ceremony to complexity.
+The *direction* rule still holds; the *depth* of layering is negotiable.
+## S2 — Boundaries & encapsulation
+**Principle.** Each module has a deliberate public surface; everything else
+is internal. Callers depend on the contract, not the guts. The shape of a
+module's public API is a *decision*, not a side effect of which files happen
+to be re-exported.
+**Smells.** Deep imports into a module's internals
+(`feature/x/internal/helpers`). A barrel that re-exports everything, so
+nothing is actually private. Two modules reaching into each other's state.
+A consumer mutating a value the producer hands back.
+**Fix.** Export the contract from the module's entry point; keep the rest
+unexported. If a caller genuinely needs an internal, that internal probably
+wants promoting to the public API — on purpose, not by accident. Return
+immutable values or copies across the boundary.
+**Don't over-apply.** Don't add barrel/index ceremony to a two-file feature.
+The point is intentional surface, not bureaucratic surface.
+## S3 — Coupling & cycles
+**Principle.** Low coupling between modules; no import cycles. Watch
+**fan-in** (many things depend on this — change it carefully) and
+**fan-out** (this depends on many things — it's probably doing too much).
+**Smells.** Circular imports. A "god module" everything imports. A change
+here forcing edits in five unrelated places. A module that pulls in half the
+codebase to do its job.
+**Fix.** Break the cycle by extracting the shared piece both sides depend on,
+or by inverting one direction with an interface. High fan-out usually means a
+missing decomposition: the module is orchestrating *and* doing the work; pull
+the work into focused collaborators it calls.
+**Don't over-apply.** Shared kernel utilities (types, pure helpers) having
+high fan-in is normal and fine — that's what they're *for*. The smell is
+high fan-in on something that *changes often*, not high fan-in on a stable
+primitive.
+## S4 — State location & single source of truth
+**Principle.** Every piece of state has **one source of truth.** Server
+state and client/UI state are different things and live in different places.
+Domain rules live once. Derived data is computed, not stored.
+**Smells.** The same server data fetched and cached in several places,
+drifting out of sync. The same validation or business rule copy-pasted across
+screens. Server data and UI-only state mashed into one blob. Two flags that
+must always agree (and don't, on some path).
+**Fix.** Server state → one query/cache layer (a query hook, a cache, a
+loader, whatever your stack offers), read everywhere from that one place.
+Client/UI state → local and minimal. A rule that appears twice → extract it
+to one function. Derive, don't duplicate.
+**Don't over-apply.** Not every local state is a sin. Genuinely ephemeral
+UI state (a toggle, an input draft, a hover flag) belongs where it lives. The
+target is *duplicated* and *drifting* state, not all state.
+---
+# TIER 2 — Module & function hygiene
+Real and worth fixing — but local, cheap, and lower-leverage than Tier 1.
+This is where the deterministic scanner does most of its work.
+## H1 — Complexity & cohesion (line count is a trip-wire, not a verdict)
+**Principle.** A unit is too big when it carries more than one reason to
+change or exceeds what a reader can hold in their head. The constitution
+sets a **280 useful lines** budget (tolerance ~308) — treat that line count
+as a trip-wire that *starts* the conversation, not as the verdict.
+**Smells.** Scanner trip-wire: yellow at **240+**, RED (`BLOCKER`) at
+**> 308**. The real smells the agent has to spot beyond the line count: deep
+nesting; a function doing two jobs; a name needing a conjunction
+(`validateAndSave`, `fetchAndTransform`, or the Portuguese
+`validarESalvar`, `buscarETransformar`); grab-bag modules (`utils.ts`,
+`manager.ts`, `helpers.ts` that accrete unrelated functions); long parameter
+lists; bodies that mix abstraction levels.
+**Fix — refactor by responsibility, in order of preference:**
+- **Extract a unit with one job** — a service/use-case, a pure helper
+  module, a custom hook (UI state/effects), a sub-component (a `renderX()`
+  becomes a real component), a validator/schema module, a mapper/adapter.
+- **Promote inline render functions** (`renderHeader()`, `renderList()`) to
+  real components.
+- **Lift complex state** (`> 2 useState + ≥ 1 effect`) into a custom hook.
+- **Separate layers** — leaked business logic in a transport handler moves
+  to a service; *that's* the split, never an arbitrary line cut.
+- **Name the orchestrator by its intent** (`processOrder`, not
+  `validateAndSaveOrder`).
+**Don't over-apply.** A 300-line dumb DTO/constants/types file is fine —
+flat, cohesive, no branching. Document the cohesion in a header comment and
+move on. Conversely, a 90-line function with three responsibilities and five
+levels of nesting is rotten *under* the limit. Judge complexity, not length.
+An orchestrator that composes single-purpose units is doing one job
+(coordinating) — don't shatter cohesive logic to chase a number.
+**280 starts analysis; it is not a guillotine.**
+## H2 — Single Responsibility
+**Principle.** Each function/module/component has **one reason to change.**
+**Smells.** A name needing a conjunction signals two jobs. Grab-bag modules
+that accrete unrelated functions. Bodies mixing abstraction levels (one line
+of high-level intent followed by ten lines of low-level twiddling). Very
+long parameter lists.
+**Fix.** Extract the parts; give the caller an intention-revealing name. If
+a module is a grab-bag, split it by domain. If a function takes 8
+parameters, it probably wants an object — or it's actually two functions.
+**Don't over-apply.** An orchestrator that *composes* single-purpose units
+is doing one job — coordinating. SRP is about *reasons to change*, not
+literal single statements. Don't shatter cohesive logic to chase the rule.
+## H3 — Separation of concerns (unit level)
+**Principle.** Visual logic ≠ business logic ≠ data access. Components stay
+"dumb"; logic lives in hooks/services/helpers. Transport dispatches and
+never holds business rules. Side effects (IO, network, clock, randomness)
+are isolated and injectable.
+**Smells.** Network calls inside view code (`fetch()` in JSX). Business math
+in a route handler or controller. `new Date()`, `Math.random()`, or direct
+env reads buried in business code that pretends to be pure.
+**Fix.** Push logic out of components into hooks/services. Make controllers
+*call* a service rather than contain one. Inject the clock/random/IO so
+they're swappable and testable. Leaked business logic in a controller →
+move it to a service; that's the split, not an arbitrary line cut.
+**Don't over-apply.** Don't stand up a DI container or a repository layer
+for a three-file script. Match the ceremony to the size of the project.
+This is the unit-level companion to **S1**; depth scales with complexity.
+## H4 — Errors
+**Principle.** Validate at the boundary, fail fast with typed, descriptive
+errors. Never swallow exceptions silently; never leak stack traces to end
+users.
+**Smells.** Empty `catch {}`. A `catch` that only `console.log`s and then
+continues as if nothing happened. Returning `null`/`undefined` on failure
+with no signal to the caller. Throwing bare strings. Rendering raw
+`error.message` straight into the UI.
+**Fix.** Validate inputs where they enter the system. Throw or return typed
+errors with context. Log the technical detail with a correlation id, show
+the user a clean generic message, and let unexpected errors propagate to a
+boundary that handles them.
+**Don't over-apply.** Deliberate best-effort handling is legitimate —
+fire-and-forget analytics, optional cache warmups — *as long as the intent
+is explicit and logged*. That's not swallowing.
+## H5 — Naming
+**Principle.** Names reveal intent and domain. Readability beats clever or
+compact code — a good name is the first layer of documentation.
+**Smells.** Meaningful identifiers wearing meaningless names: `data`,
+`temp`, `obj`, `val`, `result`, `arr`, `foo`, `thing` (and equivalents in
+any language — e.g. Portuguese `dados`, `valor`, `resultado`).
+`manager`/`helper`/`util` as the actual thing carrying meaning.
+**Fix.** Rename to the domain concept (`invoices` not `data`, `retryCount`
+not `x`); booleans as predicates (`isLoading`); carry units in the name
+(`timeoutMs`).
+**Don't over-apply.** Short names are fine in tight scope — `i`/`j` in
+loops, `x`/`y` in math or coordinates, `err` in a catch, `_` for an unused
+arg, single letters in a one-line lambda. The ban is for identifiers that
+*carry meaning*, not throwaways.
+## H6 — Documentation
+**Principle.** Doc-comment non-trivial business logic, hooks, and public
+functions (`@param`/`@returns`/`@throws`). Comments explain the **why**,
+not the obvious **what**.
+**Smells.** Comments restating the code (`// increment i` above `i++`).
+Stale comments contradicting the code. A public function that throws with
+no `@throws`. A non-obvious workaround with no rationale.
+**Fix.** Delete the redundant ones. Add the *why* for anything surprising.
+Document the contract on public surfaces — what it promises, what it
+guarantees, what it can throw.
+**Don't over-apply.** Trivial one-liners and self-evident code don't need
+JSDoc. Don't demand ceremony where the name already says it.
+## H7 — Tests
+**Principle.** Critical paths and failure modes earn tests that would
+**actually catch the bug** — tests of *behavior*, not implementation
+details.
+**Smells.** Happy-path-only suites. Tests asserting internal calls or
+object shape instead of observable outcome. Snapshot-only coverage. No test
+for the error/edge cases the code clearly handles.
+**Fix.** Test contracts and behavior; cover the failure modes; write the
+test that fails if the bug returns. See the QA squad (`/test-plan`,
+`/scaffold-tests`, `/qa-signoff`) for the wider testing plan.
+**Don't over-apply.** Coverage % is not the goal; risk is. Don't pad with
+trivial tests to hit a number.
+---
+# Adjacent concerns — what this rubric does *not* cover
+The kit deliberately splits responsibilities across specialised
+agents/commands. The rubric stays in its lane:
+- **Security (AppSec, supply-chain, infra/cloud)** — `code-security`,
+  `security`, `infra-security` agents; `/audit`, `/deps-audit`,
+  `/security-setup`. Dispatch the security-team for findings of any of:
+  trust-boundary validation, server-side authorization, secret/PII
+  exposure, injection/unsafe sinks, dependency CVEs, IAM, IaC misconfig.
+- **Accessibility** — `accessibility` agent (design-team).
+- **Privacy (LGPD / regional)** — `privacy-lgpd` agent (compliance-team).
+- **Architectural decisions (new designs, dependency adoption)** —
+  `architect` agent + `/new-adr` + `/simulate-impact`. The rubric audits
+  *existing* code's architecture (Tier 1); `architect` decides *new*
+  shape.
+- **QA / test strategy** — `qa-orchestrator` and the qa-* squad;
+  `/test-plan`, `/scaffold-tests`, `/qa-signoff`. H7 in this rubric is a
+  code-quality lens on tests; QA owns the deeper test strategy.
+---
+For severity vocabulary, scope (when Tier 2 is relaxed for spikes), the
+review protocol the auditor follows, and the contract between scanner and
+agent — see **[`review-protocol.md`](./review-protocol.md)**.

package/templates/contextkit/config.json ADDED Viewed

@@ -0,0 +1,66 @@
+{
+  "level": 2,
+  "setup": { "completed": false },
+  "practices": { "active": false },
+  "ledger": {
+    "important": [
+      "src/",
+      "lib/",
+      "app/",
+      "apps/",
+      "packages/",
+      "components/",
+      "pages/",
+      "server/",
+      "contextkit/",
+      ".claude/",
+      ".github/",
+      "CLAUDE.md",
+      "package.json",
+      "tsconfig.json",
+      "pyproject.toml",
+      "go.mod",
+      "Cargo.toml"
+    ],
+    "irrelevant": [
+      "node_modules/",
+      "dist/",
+      "build/",
+      "out/",
+      ".next/",
+      ".turbo/",
+      ".expo/",
+      ".svelte-kit/",
+      "coverage/",
+      "__pycache__/",
+      "target/",
+      "vendor/",
+      ".context-snapshot.md",
+      ".claude/.sessions/",
+      ".claude/.workspace/"
+    ],
+    "registration": ["contextkit/memory/SESSIONS.md", "docs/CHANGELOG.md"]
+  },
+  "l3": { "mainBranch": "main" },
+  "qa": {
+    "criticalPaths": [],
+    "coverageTarget": { "lines": 80, "branches": 70 }
+  },
+  "l5": {
+    "highRiskPaths": [],
+    "lineBudget": { "yellow": 240, "red": 308 },
+    "contractGlobs": [],
+    "distill": {
+      "observeWindow": 10,
+      "proposeAfterSessions": 30,
+      "archiveLedgersOlderThanDays": 7
+    },
+    "techDebtSweep": {
+      "default": "full",
+      "profiles": {
+        "full": { "cadenceDays": 14, "scope": "all" },
+        "quick": { "cadenceDays": 3, "scope": "red-zone-only" }
+      }
+    }
+  }
+}