agent-harness-kit 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55) hide show
  1. package/.claude-plugin/marketplace.json +27 -0
  2. package/.claude-plugin/plugin.json +25 -0
  3. package/LICENSE +21 -0
  4. package/README.md +165 -0
  5. package/bin/cli.mjs +261 -0
  6. package/package.json +64 -0
  7. package/src/core/detect-stack.mjs +181 -0
  8. package/src/core/doctor.mjs +106 -0
  9. package/src/core/patch-package-json.mjs +53 -0
  10. package/src/core/render-templates.mjs +277 -0
  11. package/src/core/upgrade.mjs +274 -0
  12. package/src/templates/.claude/agents/api-consistency-reviewer.md +33 -0
  13. package/src/templates/.claude/agents/architecture-reviewer.md.hbs +41 -0
  14. package/src/templates/.claude/agents/performance-reviewer.md +35 -0
  15. package/src/templates/.claude/agents/reliability-reviewer.md +38 -0
  16. package/src/templates/.claude/agents/security-reviewer.md +39 -0
  17. package/src/templates/.claude/hooks/hooks.json.hbs +39 -0
  18. package/src/templates/.claude/settings.json.hbs +25 -0
  19. package/src/templates/.claude/skills/add-adr/SKILL.md +60 -0
  20. package/src/templates/.claude/skills/add-feature/SKILL.md.hbs +50 -0
  21. package/src/templates/.claude/skills/debug-flow/SKILL.md.hbs +38 -0
  22. package/src/templates/.claude/skills/doc-drift-scan/SKILL.md +43 -0
  23. package/src/templates/.claude/skills/eval-runner/SKILL.md +55 -0
  24. package/src/templates/.claude/skills/garbage-collection/SKILL.md.hbs +49 -0
  25. package/src/templates/.claude/skills/inspect-app/SKILL.md +57 -0
  26. package/src/templates/.claude/skills/inspect-module/SKILL.md.hbs +53 -0
  27. package/src/templates/.claude/skills/propose-harness-improvement/SKILL.md +43 -0
  28. package/src/templates/.claude/skills/structural-test-author/SKILL.md.hbs +46 -0
  29. package/src/templates/.claude/skills/write-skill/SKILL.md +39 -0
  30. package/src/templates/CLAUDE.md.hbs +70 -0
  31. package/src/templates/_adapter-python/.importlinter +14 -0
  32. package/src/templates/_adapter-python/harness/__init__.py +0 -0
  33. package/src/templates/_adapter-python/harness/eval_runner.py +281 -0
  34. package/src/templates/_adapter-python/harness/structural_test.py +195 -0
  35. package/src/templates/_adapter-typescript/.dependency-cruiser.cjs +27 -0
  36. package/src/templates/_adapter-typescript/eslint.config.mjs +38 -0
  37. package/src/templates/_adapter-typescript/harness/eval-runner.mjs +322 -0
  38. package/src/templates/_adapter-typescript/harness/structural-test.mjs +125 -0
  39. package/src/templates/_ci/.github/workflows/eval-nightly.yml +59 -0
  40. package/src/templates/_ci/.github/workflows/harness.yml +55 -0
  41. package/src/templates/docs/adr/0001-use-agent-harness-kit.md.hbs +56 -0
  42. package/src/templates/docs/agent-failures.md +25 -0
  43. package/src/templates/docs/architecture.md.hbs +47 -0
  44. package/src/templates/docs/core-beliefs.md.hbs +41 -0
  45. package/src/templates/docs/golden-principles.md.hbs +80 -0
  46. package/src/templates/docs/tech-debt-tracker.md +30 -0
  47. package/src/templates/feature_list.json.hbs +29 -0
  48. package/src/templates/harness.config.json.hbs +40 -0
  49. package/src/templates/scripts/dev-up.sh.hbs +51 -0
  50. package/src/templates/scripts/harness-report.mjs +189 -0
  51. package/src/templates/scripts/install-git-hooks.sh +18 -0
  52. package/src/templates/scripts/pre-push.sh +21 -0
  53. package/src/templates/scripts/precompletion-checklist.sh.hbs +99 -0
  54. package/src/templates/scripts/structural-test-on-edit.sh.hbs +53 -0
  55. package/src/templates/scripts/telemetry-on-skill.sh +26 -0
@@ -0,0 +1,43 @@
1
+ ---
2
+ name: doc-drift-scan
3
+ description: Use this skill weekly, before releases, or when the user mentions "stale docs", "doc drift", "docs are wrong", or "the README is out of date". Cross-checks every code path, file path, and command referenced in `docs/` and `CLAUDE.md` against the current repo state and produces a list of stale references — the doc-gardening agent pattern.
4
+ allowed-tools: Read, Glob, Grep, Bash(test -e:*), Bash(command -v:*)
5
+ suggested-turns: 12
6
+ ---
7
+
8
+ ## Steps
9
+
10
+ 1. **Extract references.** From `docs/**/*.md` and `CLAUDE.md`, extract:
11
+ - Backtick paths matching `[a-z0-9_/.\\-]+\\.(md|ts|tsx|js|py|json|yml|yaml|sh)`
12
+ - Backtick commands (first token after a backtick).
13
+ - Markdown links to local files (`./...` or `../...`).
14
+ 2. **Validate each.**
15
+ - Paths: `test -e <path>`. If missing, mark as drift.
16
+ - Commands: `command -v <cmd>`. If not on PATH, mark as drift (but allow
17
+ a configured allowlist for known optional tools).
18
+ 3. **Group findings.**
19
+ - `missing-paths`: file moved or deleted.
20
+ - `wrong-layer-claim`: doc says module is in layer X, structural test
21
+ says layer Y.
22
+ - `outdated-commands`: command no longer exists or signature changed.
23
+ 4. **Open ONE PR.** Label `doc-drift`. Patch all findings in one commit. Do
24
+ not merge.
25
+
26
+ ## Output contract
27
+
28
+ ```
29
+ ### Doc-drift scan: <date>
30
+ ### References checked: <N>
31
+ ### Drifted: <count>
32
+ ### PR opened: #<n>
33
+ ### Top 3 drifts:
34
+ 1. ...
35
+ 2. ...
36
+ 3. ...
37
+ ```
38
+
39
+ ## Anti-patterns
40
+
41
+ - Don't fix code to match docs. Fix docs to match code.
42
+ - Don't propose deleting a doc that drifted — drift means the doc was once
43
+ useful. Update or supersede it.
@@ -0,0 +1,55 @@
1
+ ---
2
+ name: eval-runner
3
+ description: Use this skill whenever a skill, subagent, or hook is changed, before merging to main, or when the user mentions "eval", "regression test for the harness", or "is the harness still working". This skill is a thin wrapper — it runs a single shell command (`npm run harness:eval` or `python -m harness.eval_runner`) and summarizes the JSONL output. Do not implement the eval logic yourself; the runner is already deterministic.
4
+ allowed-tools: Bash(npm run harness:eval:*), Bash(python -m harness.eval_runner:*), Read
5
+ suggested-turns: 2
6
+ ---
7
+
8
+ ## When to use
9
+
10
+ The user said any of: "run the evals", "regression-test the harness", "is the
11
+ harness still working", "eval the kit changes".
12
+
13
+ ## Steps
14
+
15
+ This is a **2-turn skill**. Don't over-engineer.
16
+
17
+ 1. **Run the script.** Pick the right invocation based on stack:
18
+ - TypeScript: `npm run harness:eval -- --quick --transport=mock`
19
+ - Python: `python -m harness.eval_runner --quick --transport=mock`
20
+
21
+ Use `--quick` (3 tasks, ~30s) by default. Switch to `--full` only if the
22
+ user asked for it. Use `--transport=claude-cli` only if the user explicitly
23
+ wants a real (paid) run.
24
+
25
+ 2. **Summarize the JSONL output.** Read the latest file in
26
+ `.harness/eval/results/` and produce:
27
+
28
+ ```
29
+ ### Eval run: <sha>
30
+ ### Set: quick | full
31
+ ### Transport: mock | claude-cli
32
+ ### Tasks: <pass>/<total>
33
+ ### Failed dimensions:
34
+ - <task-id>.outcome: <info>
35
+ - <task-id>.process: missing skills [<list>]
36
+ ### Verdict: PASS | FAIL
37
+ ```
38
+
39
+ That's it. Stop after the summary.
40
+
41
+ ## What NOT to do
42
+
43
+ - **Don't re-implement the eval logic.** The runner is a deterministic script.
44
+ If you find yourself writing tool calls one by one, you've misread this skill.
45
+ - **Don't run `--full --transport=claude-cli` without permission.** That's a
46
+ ~$2 paid API run. Always confirm first.
47
+ - **Don't fix failing tasks.** This skill reports the result; fixing is a
48
+ separate task that uses `/structural-test-author` or
49
+ `/propose-harness-improvement`.
50
+
51
+ ## Anti-pattern flag
52
+
53
+ If you (the agent reading this skill) feel like you need >3 turns to do this,
54
+ stop and re-read the steps. The whole skill is: spawn a subprocess, read a
55
+ file, format a summary.
@@ -0,0 +1,49 @@
1
+ ---
2
+ name: garbage-collection
3
+ description: Use this skill on Fridays, before tagging a release, or when the user mentions "cleanup", "tech debt", "AI slop", "GC", or "garbage collection". Runs the deterministic linters, structural tests, and doc-drift scans, then proposes the top-3 highest-leverage cleanups (with risk/cost/benefit) — does NOT auto-merge. This is the solo-dev shrunk version of OpenAI's Friday garbage-collection ritual.
4
+ allowed-tools: Read, Glob, Grep, Bash(npm run:*), Bash(pytest:*), Bash(ruff:*), Bash(git:*), Bash(gh:*)
5
+ suggested-turns: 15
6
+ ---
7
+
8
+ ## Steps
9
+
10
+ 1. **Capture baseline.** Run the full suite and save the output:
11
+ - {{#if isPython}}`python -m harness.structural_test`{{else}}`npm run harness:check`{{/if}}
12
+ - {{#if isPython}}`ruff check . --output-format json`{{else}}`npm run lint -- --format json`{{/if}}
13
+ - Save to `.harness/gc-<date>.json`.
14
+ 2. **Classify violations.** For each finding:
15
+ - **Layer violation** → fix.
16
+ - **Duplicate utility** (same function body in 2+ places) → propose
17
+ extraction to `src/shared/`.
18
+ - **Dead import** → remove.
19
+ - **Doc drift** (a path in `docs/architecture.md` no longer exists) →
20
+ invoke `doc-drift-scan` skill.
21
+ - **Hand-rolled helper** matching a shared utility → propose replacement.
22
+ 3. **Score** each candidate fix on three 1–5 dimensions:
23
+ - **Risk**: 1 = trivial rename, 5 = touches multiple modules.
24
+ - **Cost**: tokens + minutes.
25
+ - **Benefit**: how often this drift recurs (read `.harness/gc-history.json`).
26
+ 4. **Propose ONLY the top 3** cleanups (solo-dev cap; OpenAI does dozens, you
27
+ do 3). Open them as separate PRs with `gh pr create --label gc --draft`.
28
+ 5. **Append a row** to `.harness/gc-history.json`:
29
+ `{ "date": "...", "violations_found": N, "fixes_opened": M, "total_tokens": K }`.
30
+ 6. **Stop.** Do not merge anything. Human review required at solo scale.
31
+
32
+ ## Output contract
33
+
34
+ ```
35
+ ### GC run: <date>
36
+ ### Violations found: <N>
37
+ ### Top 3 fixes proposed:
38
+ 1. <slug> — risk:<1-5> cost:<1-5> benefit:<1-5> — PR #<n>
39
+ 2. ...
40
+ 3. ...
41
+ ### Fixes deferred (not in top 3): <count>
42
+ ```
43
+
44
+ ## Anti-patterns
45
+
46
+ - Don't open more than 3 PRs in one run. The cap is a feature, not a bug.
47
+ - Don't auto-merge — the entire point of solo-scale GC is human review.
48
+ - Don't suppress findings that score low; record them in
49
+ `.harness/gc-history.json` so trends surface over time.
@@ -0,0 +1,57 @@
1
+ ---
2
+ name: inspect-app
3
+ description: Use this skill whenever the user asks to "test the UI", "check what the app looks like", "inspect the page", "verify the dev server is up", or before claiming a UI feature is done. Boots the dev server via scripts/dev-up.sh and drives the failing flow through Playwright MCP if installed (else falls back to curl + lightweight HTML capture). Mirrors the OpenAI Chrome-DevTools-Protocol-into-runtime pattern at solo scale — verify the running app, don't trust the type checker alone.
4
+ allowed-tools: Read, Bash(scripts/dev-up.sh), Bash(curl:*), Bash(playwright:*), Bash(node:*)
5
+ suggested-turns: 12
6
+ ---
7
+
8
+ ## When to use
9
+
10
+ The user said any of: "what does the page look like", "test the UI flow",
11
+ "is the dev server up", "before merging the UI work", or invoked this skill
12
+ explicitly via `/inspect-app`. Also auto-invokes from `/debug-flow` when the
13
+ bug is UI-shaped.
14
+
15
+ ## Steps
16
+
17
+ 1. **Detect dev server.** Read `harness.config.json` for the framework.
18
+ - If a process is already listening on the expected port (3000 / 8000 /
19
+ 5000 depending on framework), reuse it.
20
+ - Else: `bash scripts/dev-up.sh &` in the background and wait up to 30s
21
+ for the readiness probe.
22
+ 2. **Capture mode — Playwright MCP (preferred).** If `mcp__playwright__*`
23
+ tools are available:
24
+ - `mcp__playwright__browser_navigate` to the target URL
25
+ - `mcp__playwright__browser_snapshot` for accessibility tree
26
+ - `mcp__playwright__browser_take_screenshot` for a visual
27
+ - Optionally drive a single user flow (click → fill → click → wait)
28
+ 3. **Capture mode — curl fallback.** If MCP is unavailable:
29
+ - `curl -i -s -o response.body "http://localhost:$PORT$PATH"` for headers + body
30
+ - `wc -l response.body` to size-check
31
+ - `grep -E '<title>|<h1>' response.body | head -5` for sanity
32
+ 4. **Diff against expectation.** If the user gave an expected element /
33
+ text, grep for it. If not, just report what's on the page.
34
+ 5. **Cleanup.** Kill the dev server we started (don't kill ones already
35
+ running before this session).
36
+
37
+ ## Output contract
38
+
39
+ ```
40
+ ### App inspection
41
+ **URL:** http://localhost:<port><path>
42
+ **Status:** <HTTP status>
43
+ **Title:** <page title or first H1>
44
+ **Mode:** playwright-mcp | curl-fallback
45
+ **Screenshot:** <path or "n/a">
46
+ **Findings:** <bulleted list of matches/mismatches against expectation>
47
+ ```
48
+
49
+ ## Anti-patterns
50
+
51
+ - Don't claim a UI feature is done without running this skill once.
52
+ - Don't leave a dev server running after the inspection — kill what you
53
+ started.
54
+ - Don't grep for "Error" alone as a failure signal — many pages legitimately
55
+ contain that word. Match against specific expected text instead.
56
+ - Don't take screenshots of pages with secrets or test fixtures with PII —
57
+ the screenshot lands on disk.
@@ -0,0 +1,53 @@
1
+ ---
2
+ name: inspect-module
3
+ description: Use this skill whenever the user mentions "explore", "inspect", "understand", "what does X do", "where is Y", or before adding a new feature in an unfamiliar area. Produces a structured map of one module — files, exports, dependencies, layer assignment, and recent commits — without reading the entire codebase. Always invoke this skill before editing an unfamiliar module so the agent has accurate context, not guesses.
4
+ allowed-tools: Read, Glob, Grep, Bash(git log:*), Bash(git ls-tree:*), Bash(tree:*)
5
+ suggested-turns: 6
6
+ ---
7
+
8
+ ## When to use
9
+
10
+ The user asked any of: "how does X work", "what's in src/foo", "before I add
11
+ feature Y, what's in the area?", "explore <path>", "show me the shape of
12
+ <module>".
13
+
14
+ ## Steps
15
+
16
+ 1. **Resolve the target.** If the user gave a feature name (not a path), grep
17
+ `feature_list.json` for it. If multiple paths match, ask the user which.
18
+ 2. **Recent activity.** Run `git log --oneline -20 -- <target>` and read the
19
+ top 3 commit messages.
20
+ 3. **Layer.** Cross-reference the path against `harness.config.json` →
21
+ `domains[].layers` to determine the layer.
22
+ 4. **Public surface.** List exported symbols:
23
+ - TypeScript: `grep -nE "^export " <target>/**/*.ts`
24
+ - Python: `grep -nE "^def |^class |^[A-Z_][A-Z0-9_]+ ?=" <target>/**/*.py`
25
+ 5. **Dependencies.** List inbound imports (who depends on this module) and
26
+ outbound imports (what this module depends on). Verify both respect the
27
+ forward-only layer order; if not, **stop and report the violation** before
28
+ proceeding with any change plan.
29
+ 6. **Risks.** Flag any of: dynamic imports, eval, shell-out with
30
+ interpolation, missing tests for an exported function.
31
+
32
+ ## Output contract
33
+
34
+ Produce a Markdown report with these sections, in this order:
35
+
36
+ ```
37
+ ### Module: <path>
38
+ ### Layer: <layer-name>
39
+ ### Public surface: <list>
40
+ ### Inbound deps: <list of paths>
41
+ ### Outbound deps: <list of paths or external packages>
42
+ ### Recent changes: <top 3 commit messages>
43
+ ### Risks: <bulleted list, "none" if clean>
44
+ ```
45
+
46
+ End with: "I am ready to make changes to `<module>`. The architecture-reviewer
47
+ subagent will be invoked on completion."
48
+
49
+ ## Anti-patterns
50
+
51
+ - Don't read every file in the module — sample exports, then drill in only
52
+ where the user's task points.
53
+ - Don't propose changes in this skill. This is read-only context-gathering.
@@ -0,0 +1,43 @@
1
+ ---
2
+ name: propose-harness-improvement
3
+ description: Use this skill whenever the agent makes a mistake, the user observes an avoidable failure, a pattern recurs, or someone says "the agent keeps doing X". Files an "Engineer the Harness" entry — Mitchell Hashimoto's discipline: every failure becomes a permanent prevention mechanism. Always invoke this instead of just fixing the immediate symptom.
4
+ allowed-tools: Read, Edit, Write, Bash(git diff:*)
5
+ suggested-turns: 8
6
+ ---
7
+
8
+ ## Steps
9
+
10
+ 1. **Triage.** Ask: "What just went wrong? What was the agent's intended
11
+ behavior? What's the symptom?"
12
+ 2. **Classify.** One of:
13
+ - **(a) Missing context** — the agent didn't know something. Fix: add to
14
+ `docs/`.
15
+ - **(b) Missing rule** — the agent did something forbidden by an
16
+ unwritten rule. Fix: invoke `/structural-test-author`.
17
+ - **(c) Missing tool/skill** — the agent reached for the wrong tool. Fix:
18
+ invoke `/write-skill`.
19
+ - **(d) Wrong layer / architecture** — the structure invited the
20
+ mistake. Fix: write an ADR via `/add-adr`.
21
+ 3. **Append entry** to `docs/agent-failures.md` with: date, symptom, fix,
22
+ fix-type, file modified.
23
+ 4. **Apply the fix in the right place.** NEVER paper over with a CLAUDE.md
24
+ "be careful" sentence unless rule (a) applies — and even then, only as a
25
+ pointer to a longer doc.
26
+ 5. **Update PROGRESS.** Append `harness-improvement: <slug>` to
27
+ `.harness/PROGRESS.md`.
28
+
29
+ ## Output contract
30
+
31
+ ```
32
+ ### Failure: <one-line summary>
33
+ ### Classification: (a|b|c|d) <name>
34
+ ### Fix applied at: <file:line>
35
+ ### docs/agent-failures.md entry: §<n>
36
+ ```
37
+
38
+ ## Anti-patterns (block on these)
39
+
40
+ - Don't add a vague "be careful with X" sentence to CLAUDE.md.
41
+ - Don't add a rule whose enforcement is also LLM-based.
42
+ - Don't use this skill to log unrelated cleanup ideas — those go in
43
+ `docs/tech-debt-tracker.md`.
@@ -0,0 +1,46 @@
1
+ ---
2
+ name: structural-test-author
3
+ description: Use this skill whenever the user wants to add a new architectural rule, prevent a recurring agent mistake, or codify a pattern from golden-principles.md. Generates a {{#if isPython}}libcst-based Python{{else}}ts-morph-based TypeScript{{/if}} structural test plus the matching {{#if isPython}}import-linter contract{{else}}eslint-plugin-boundaries rule{{/if}} entry. Always prefer this over leaving rules in prose.
4
+ allowed-tools: Read, Edit, Write, Bash(npm test:*), Bash(pytest:*)
5
+ suggested-turns: 15
6
+ ---
7
+
8
+ ## Steps
9
+
10
+ 1. **Phrase the rule.** Ask the user: "What invariant do you want enforced?
11
+ Phrase it as: 'No code in layer X may import from layer Y' or 'Every
12
+ <thing> must <do>'."
13
+ 2. **Layer rules first.** If the rule is layer-based, edit `harness.config.json`
14
+ `domains[].layers` and the {{#if isPython}}`.importlinter`{{else}}`eslint.config.js`{{/if}}
15
+ config — DO NOT write a custom test for layer rules; the existing test
16
+ already supports them via configuration.
17
+ 3. **Structural rules.** If the rule is structural but not layer-based (e.g.
18
+ "every controller must call validateAt"), open
19
+ `{{#if isPython}}harness/structural_test.py{{else}}harness/structural-test.ts{{/if}}`:
20
+ - {{#if isPython}}Use `libcst.CSTVisitor` subclass — preserves whitespace and comments.{{else}}Use `Project` + `getSourceFiles()` + AST visitors — the canonical ts-morph pattern.{{/if}}
21
+ 4. **Add a fixture test.** Create a file in `tests/structural/` that contains
22
+ a deliberately-violating snippet, and verify the rule fails on it.
23
+ 5. **Run against the whole repo.** If the rule fails on existing code, choose:
24
+ - **(a)** fix the existing code, OR
25
+ - **(b)** add the existing violations to `.harness/structural-baseline.json`
26
+ so only **new** violations block. (PMD/baseline pattern.)
27
+ 6. **Document.** Append the rule and its rationale (one paragraph traced to a
28
+ specific past failure) to `docs/golden-principles.md`.
29
+ 7. **Log the harness change.** Run `/propose-harness-improvement` to record
30
+ this as a permanent harness improvement.
31
+
32
+ ## Output contract
33
+
34
+ ```
35
+ ### Rule added: <one-line description>
36
+ ### Files changed: <list>
37
+ ### New violations on existing code: <count> — baselined: yes/no
38
+ ### golden-principles.md entry: §<n>
39
+ ```
40
+
41
+ ## Anti-patterns
42
+
43
+ - Don't write a rule whose enforcement is also LLM-based — that just recurses.
44
+ - Don't write a rule that requires runtime information to evaluate (e.g.
45
+ "this function must not take more than 100ms"). Those go in evals or
46
+ observability, not structural tests.
@@ -0,0 +1,39 @@
1
+ ---
2
+ name: write-skill
3
+ description: Use this skill whenever the user asks to "create a skill", "add a slash command", "package a workflow", or "make X reusable across sessions". Generates a SKILL.md with valid YAML frontmatter (name regex, description ≤ 1024 chars, body ≤ 500 lines) and supporting scripts/references/assets. Tests the skill by simulating an auto-discovery prompt.
4
+ allowed-tools: Read, Edit, Write, Bash(ls:*)
5
+ suggested-turns: 8
6
+ ---
7
+
8
+ ## Steps
9
+
10
+ 1. **Validate the name.** Must match `^[a-z0-9]+(-[a-z0-9]+)*$` and be ≤ 64
11
+ characters.
12
+ 2. **Write a "pushy" description.** Third-person, ≤ 1024 chars. Explicitly
13
+ mention triggers ("Use this skill whenever the user mentions <X>, <Y>,
14
+ <Z>"). Models under-trigger skills with shy descriptions.
15
+ 3. **Body sections,** in this order: `## When to use`, `## Steps`,
16
+ `## Output contract`, `## Anti-patterns`. Cap body at 500 lines.
17
+ 4. **Externalize deterministic logic.** If the skill needs deterministic work
18
+ (parsing, formatting, computation), put it in `scripts/<name>.sh` (or `.py`
19
+ / `.mjs`) under the skill directory and reference it via a `Bash(...)`
20
+ tool call. SKILL.md stays declarative.
21
+ 5. **Test discovery.** Open a fresh Claude Code session and prompt with one
22
+ of the description triggers. Verify the skill auto-loads.
23
+
24
+ ## Output contract
25
+
26
+ ```
27
+ ### Skill: /<name>
28
+ ### Description bytes: <count>/1024
29
+ ### Body lines: <count>/500
30
+ ### Allowed tools: <list>
31
+ ### Discovery trigger tested: yes/no
32
+ ```
33
+
34
+ ## Anti-patterns
35
+
36
+ - Don't write a description that starts with "This skill…" — start with "Use
37
+ this skill whenever the user…" so triggers are front-loaded.
38
+ - Don't pack two unrelated workflows into one skill. Split them.
39
+ - Don't grant `Bash(*:*)` in `allowed-tools`. Pin specific commands.
@@ -0,0 +1,70 @@
1
+ # {{projectName}} — Agent Working Notes
2
+
3
+ {{description}} {{language}}/{{framework}} project. Single-developer hobby
4
+ project. This file is intentionally short — it is a **table of contents**, not
5
+ an encyclopedia.
6
+
7
+ ## Build & Run
8
+
9
+ - Install: `{{installCmd}}`
10
+ - Dev: `{{devCmd}}`
11
+ - Test: `{{testCmd}}`
12
+ - Lint: `{{lintCmd}}`
13
+ - Structural: `{{#if isPython}}python -m harness.structural_test{{else}}npm run harness:check{{/if}}` (must pass before any PR)
14
+
15
+ ## Architecture (brief)
16
+
17
+ Layer order, enforced mechanically:
18
+
19
+ **{{layersJoined}}** — code may only depend forward. Cross-cutting concerns
20
+ enter via `providers/`.
21
+
22
+ Full diagram and rationale: `docs/architecture.md`.
23
+
24
+ ## Golden principles (must hold)
25
+
26
+ 1. Prefer shared utilities in `src/shared/` over new helpers.
27
+ 2. Validate at boundaries; never probe data shape "YOLO-style".
28
+ 3. Each test is end-to-end through one feature in `feature_list.json`.
29
+
30
+ Full list: `docs/golden-principles.md`.
31
+
32
+ ## Where to look (read on demand)
33
+
34
+ - `docs/architecture.md` — read when adding a new module or moving code.
35
+ - `docs/adr/` — read when changing public APIs.
36
+ - `docs/golden-principles.md` — read before any refactor.
37
+ - `feature_list.json` — read before claiming a feature is done.
38
+ - `.harness/PROGRESS.md` — read at session start; write at session end.
39
+
40
+ ## Skills you should use
41
+
42
+ - `/inspect-module <path>` when you need to understand existing code.
43
+ - `/add-feature <description>` when adding new capability — never freestyle.
44
+ - `/structural-test-author <layer>` when adding a new structural rule.
45
+ - `/garbage-collection` every Friday or before tagging a release.
46
+ - `/eval-runner` before merging any change to a skill or agent file.
47
+
48
+ ## Subagents you should delegate to (do NOT inline these reviews)
49
+
50
+ - `architecture-reviewer` — for any cross-layer change.
51
+ - `security-reviewer` — for any auth, input handling, or secret-touching change.
52
+ - `reliability-reviewer` — for any new error path, retry loop, or async boundary.
53
+
54
+ ## Workflow contract
55
+
56
+ 1. Start session: run `/inspect-module .` and read `.harness/PROGRESS.md`.
57
+ 2. Pick ONE feature from `feature_list.json` whose `passes: false`.
58
+ 3. Implement. Run the structural test. If it fails, FIX before continuing.
59
+ 4. Self-verify with the matching reviewer subagent(s).
60
+ 5. Commit with descriptive message. Append a line to `.harness/PROGRESS.md`.
61
+ 6. Update `feature_list.json` (`passes: true`) **only after** end-to-end test passes.
62
+
63
+ ## What NOT to do
64
+
65
+ - Don't add a new layer without an ADR.
66
+ - Don't disable the structural test to make a PR pass.
67
+ - Don't write code that the structural test cannot reason about (no dynamic
68
+ imports across layers).
69
+ - Don't update CLAUDE.md without proposing a harness improvement
70
+ (`/propose-harness-improvement`).
@@ -0,0 +1,14 @@
1
+ [importlinter]
2
+ root_package = app
3
+ include_external_packages = True
4
+
5
+ [importlinter:contract:layered]
6
+ name = Forward-only layer flow per domain
7
+ type = layers
8
+ layers =
9
+ app.ui
10
+ app.runtime
11
+ app.service
12
+ app.repo
13
+ app.config
14
+ app.types