agent-harness-kit 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/marketplace.json +27 -0
- package/.claude-plugin/plugin.json +25 -0
- package/LICENSE +21 -0
- package/README.md +165 -0
- package/bin/cli.mjs +261 -0
- package/package.json +64 -0
- package/src/core/detect-stack.mjs +181 -0
- package/src/core/doctor.mjs +106 -0
- package/src/core/patch-package-json.mjs +53 -0
- package/src/core/render-templates.mjs +277 -0
- package/src/core/upgrade.mjs +274 -0
- package/src/templates/.claude/agents/api-consistency-reviewer.md +33 -0
- package/src/templates/.claude/agents/architecture-reviewer.md.hbs +41 -0
- package/src/templates/.claude/agents/performance-reviewer.md +35 -0
- package/src/templates/.claude/agents/reliability-reviewer.md +38 -0
- package/src/templates/.claude/agents/security-reviewer.md +39 -0
- package/src/templates/.claude/hooks/hooks.json.hbs +39 -0
- package/src/templates/.claude/settings.json.hbs +25 -0
- package/src/templates/.claude/skills/add-adr/SKILL.md +60 -0
- package/src/templates/.claude/skills/add-feature/SKILL.md.hbs +50 -0
- package/src/templates/.claude/skills/debug-flow/SKILL.md.hbs +38 -0
- package/src/templates/.claude/skills/doc-drift-scan/SKILL.md +43 -0
- package/src/templates/.claude/skills/eval-runner/SKILL.md +55 -0
- package/src/templates/.claude/skills/garbage-collection/SKILL.md.hbs +49 -0
- package/src/templates/.claude/skills/inspect-app/SKILL.md +57 -0
- package/src/templates/.claude/skills/inspect-module/SKILL.md.hbs +53 -0
- package/src/templates/.claude/skills/propose-harness-improvement/SKILL.md +43 -0
- package/src/templates/.claude/skills/structural-test-author/SKILL.md.hbs +46 -0
- package/src/templates/.claude/skills/write-skill/SKILL.md +39 -0
- package/src/templates/CLAUDE.md.hbs +70 -0
- package/src/templates/_adapter-python/.importlinter +14 -0
- package/src/templates/_adapter-python/harness/__init__.py +0 -0
- package/src/templates/_adapter-python/harness/eval_runner.py +281 -0
- package/src/templates/_adapter-python/harness/structural_test.py +195 -0
- package/src/templates/_adapter-typescript/.dependency-cruiser.cjs +27 -0
- package/src/templates/_adapter-typescript/eslint.config.mjs +38 -0
- package/src/templates/_adapter-typescript/harness/eval-runner.mjs +322 -0
- package/src/templates/_adapter-typescript/harness/structural-test.mjs +125 -0
- package/src/templates/_ci/.github/workflows/eval-nightly.yml +59 -0
- package/src/templates/_ci/.github/workflows/harness.yml +55 -0
- package/src/templates/docs/adr/0001-use-agent-harness-kit.md.hbs +56 -0
- package/src/templates/docs/agent-failures.md +25 -0
- package/src/templates/docs/architecture.md.hbs +47 -0
- package/src/templates/docs/core-beliefs.md.hbs +41 -0
- package/src/templates/docs/golden-principles.md.hbs +80 -0
- package/src/templates/docs/tech-debt-tracker.md +30 -0
- package/src/templates/feature_list.json.hbs +29 -0
- package/src/templates/harness.config.json.hbs +40 -0
- package/src/templates/scripts/dev-up.sh.hbs +51 -0
- package/src/templates/scripts/harness-report.mjs +189 -0
- package/src/templates/scripts/install-git-hooks.sh +18 -0
- package/src/templates/scripts/pre-push.sh +21 -0
- package/src/templates/scripts/precompletion-checklist.sh.hbs +99 -0
- package/src/templates/scripts/structural-test-on-edit.sh.hbs +53 -0
- package/src/templates/scripts/telemetry-on-skill.sh +26 -0
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: doc-drift-scan
|
|
3
|
+
description: Use this skill weekly, before releases, or when the user mentions "stale docs", "doc drift", "docs are wrong", or "the README is out of date". Cross-checks every code path, file path, and command referenced in `docs/` and `CLAUDE.md` against the current repo state and produces a list of stale references — the doc-gardening agent pattern.
|
|
4
|
+
allowed-tools: Read, Glob, Grep, Bash(test -e:*), Bash(command -v:*)
|
|
5
|
+
suggested-turns: 12
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Steps
|
|
9
|
+
|
|
10
|
+
1. **Extract references.** From `docs/**/*.md` and `CLAUDE.md`, extract:
|
|
11
|
+
- Backtick paths matching `[a-z0-9_/.\\-]+\\.(md|ts|tsx|js|py|json|yml|yaml|sh)`
|
|
12
|
+
- Backtick commands (first token after a backtick).
|
|
13
|
+
- Markdown links to local files (`./...` or `../...`).
|
|
14
|
+
2. **Validate each.**
|
|
15
|
+
- Paths: `test -e <path>`. If missing, mark as drift.
|
|
16
|
+
- Commands: `command -v <cmd>`. If not on PATH, mark as drift (but allow
|
|
17
|
+
a configured allowlist for known optional tools).
|
|
18
|
+
3. **Group findings.**
|
|
19
|
+
- `missing-paths`: file moved or deleted.
|
|
20
|
+
- `wrong-layer-claim`: doc says module is in layer X, structural test
|
|
21
|
+
says layer Y.
|
|
22
|
+
- `outdated-commands`: command no longer exists or signature changed.
|
|
23
|
+
4. **Open ONE PR.** Label `doc-drift`. Patch all findings in one commit. Do
|
|
24
|
+
not merge.
|
|
25
|
+
|
|
26
|
+
## Output contract
|
|
27
|
+
|
|
28
|
+
```
|
|
29
|
+
### Doc-drift scan: <date>
|
|
30
|
+
### References checked: <N>
|
|
31
|
+
### Drifted: <count>
|
|
32
|
+
### PR opened: #<n>
|
|
33
|
+
### Top 3 drifts:
|
|
34
|
+
1. ...
|
|
35
|
+
2. ...
|
|
36
|
+
3. ...
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Anti-patterns
|
|
40
|
+
|
|
41
|
+
- Don't fix code to match docs. Fix docs to match code.
|
|
42
|
+
- Don't propose deleting a doc that drifted — drift means the doc was once
|
|
43
|
+
useful. Update or supersede it.
|
|
@@ -0,0 +1,55 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: eval-runner
|
|
3
|
+
description: Use this skill whenever a skill, subagent, or hook is changed, before merging to main, or when the user mentions "eval", "regression test for the harness", or "is the harness still working". This skill is a thin wrapper — it runs a single shell command (`npm run harness:eval` or `python -m harness.eval_runner`) and summarizes the JSONL output. Do not implement the eval logic yourself; the runner is already deterministic.
|
|
4
|
+
allowed-tools: Bash(npm run harness:eval:*), Bash(python -m harness.eval_runner:*), Read
|
|
5
|
+
suggested-turns: 2
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## When to use
|
|
9
|
+
|
|
10
|
+
The user said any of: "run the evals", "regression-test the harness", "is the
|
|
11
|
+
harness still working", "eval the kit changes".
|
|
12
|
+
|
|
13
|
+
## Steps
|
|
14
|
+
|
|
15
|
+
This is a **2-turn skill**. Don't over-engineer.
|
|
16
|
+
|
|
17
|
+
1. **Run the script.** Pick the right invocation based on stack:
|
|
18
|
+
- TypeScript: `npm run harness:eval -- --quick --transport=mock`
|
|
19
|
+
- Python: `python -m harness.eval_runner --quick --transport=mock`
|
|
20
|
+
|
|
21
|
+
Use `--quick` (3 tasks, ~30s) by default. Switch to `--full` only if the
|
|
22
|
+
user asked for it. Use `--transport=claude-cli` only if the user explicitly
|
|
23
|
+
wants a real (paid) run.
|
|
24
|
+
|
|
25
|
+
2. **Summarize the JSONL output.** Read the latest file in
|
|
26
|
+
`.harness/eval/results/` and produce:
|
|
27
|
+
|
|
28
|
+
```
|
|
29
|
+
### Eval run: <sha>
|
|
30
|
+
### Set: quick | full
|
|
31
|
+
### Transport: mock | claude-cli
|
|
32
|
+
### Tasks: <pass>/<total>
|
|
33
|
+
### Failed dimensions:
|
|
34
|
+
- <task-id>.outcome: <info>
|
|
35
|
+
- <task-id>.process: missing skills [<list>]
|
|
36
|
+
### Verdict: PASS | FAIL
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
That's it. Stop after the summary.
|
|
40
|
+
|
|
41
|
+
## What NOT to do
|
|
42
|
+
|
|
43
|
+
- **Don't re-implement the eval logic.** The runner is a deterministic script.
|
|
44
|
+
If you find yourself writing tool calls one by one, you've misread this skill.
|
|
45
|
+
- **Don't run `--full --transport=claude-cli` without permission.** That's a
|
|
46
|
+
~$2 paid API run. Always confirm first.
|
|
47
|
+
- **Don't fix failing tasks.** This skill reports the result; fixing is a
|
|
48
|
+
separate task that uses `/structural-test-author` or
|
|
49
|
+
`/propose-harness-improvement`.
|
|
50
|
+
|
|
51
|
+
## Anti-pattern flag
|
|
52
|
+
|
|
53
|
+
If you (the agent reading this skill) feel like you need >3 turns to do this,
|
|
54
|
+
stop and re-read the steps. The whole skill is: spawn a subprocess, read a
|
|
55
|
+
file, format a summary.
|
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: garbage-collection
|
|
3
|
+
description: Use this skill on Fridays, before tagging a release, or when the user mentions "cleanup", "tech debt", "AI slop", "GC", or "garbage collection". Runs the deterministic linters, structural tests, and doc-drift scans, then proposes the top-3 highest-leverage cleanups (with risk/cost/benefit) — does NOT auto-merge. This is the solo-dev shrunk version of OpenAI's Friday garbage-collection ritual.
|
|
4
|
+
allowed-tools: Read, Glob, Grep, Bash(npm run:*), Bash(pytest:*), Bash(ruff:*), Bash(git:*), Bash(gh:*)
|
|
5
|
+
suggested-turns: 15
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Steps
|
|
9
|
+
|
|
10
|
+
1. **Capture baseline.** Run the full suite and save the output:
|
|
11
|
+
- {{#if isPython}}`python -m harness.structural_test`{{else}}`npm run harness:check`{{/if}}
|
|
12
|
+
- {{#if isPython}}`ruff check . --output-format json`{{else}}`npm run lint -- --format json`{{/if}}
|
|
13
|
+
- Save to `.harness/gc-<date>.json`.
|
|
14
|
+
2. **Classify violations.** For each finding:
|
|
15
|
+
- **Layer violation** → fix.
|
|
16
|
+
- **Duplicate utility** (same function body in 2+ places) → propose
|
|
17
|
+
extraction to `src/shared/`.
|
|
18
|
+
- **Dead import** → remove.
|
|
19
|
+
- **Doc drift** (a path in `docs/architecture.md` no longer exists) →
|
|
20
|
+
invoke `doc-drift-scan` skill.
|
|
21
|
+
- **Hand-rolled helper** matching a shared utility → propose replacement.
|
|
22
|
+
3. **Score** each candidate fix on three 1–5 dimensions:
|
|
23
|
+
- **Risk**: 1 = trivial rename, 5 = touches multiple modules.
|
|
24
|
+
- **Cost**: tokens + minutes.
|
|
25
|
+
- **Benefit**: how often this drift recurs (read `.harness/gc-history.json`).
|
|
26
|
+
4. **Propose ONLY the top 3** cleanups (solo-dev cap; OpenAI does dozens, you
|
|
27
|
+
do 3). Open them as separate PRs with `gh pr create --label gc --draft`.
|
|
28
|
+
5. **Append a row** to `.harness/gc-history.json`:
|
|
29
|
+
`{ "date": "...", "violations_found": N, "fixes_opened": M, "total_tokens": K }`.
|
|
30
|
+
6. **Stop.** Do not merge anything. Human review required at solo scale.
|
|
31
|
+
|
|
32
|
+
## Output contract
|
|
33
|
+
|
|
34
|
+
```
|
|
35
|
+
### GC run: <date>
|
|
36
|
+
### Violations found: <N>
|
|
37
|
+
### Top 3 fixes proposed:
|
|
38
|
+
1. <slug> — risk:<1-5> cost:<1-5> benefit:<1-5> — PR #<n>
|
|
39
|
+
2. ...
|
|
40
|
+
3. ...
|
|
41
|
+
### Fixes deferred (not in top 3): <count>
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
## Anti-patterns
|
|
45
|
+
|
|
46
|
+
- Don't open more than 3 PRs in one run. The cap is a feature, not a bug.
|
|
47
|
+
- Don't auto-merge — the entire point of solo-scale GC is human review.
|
|
48
|
+
- Don't suppress findings that score low; record them in
|
|
49
|
+
`.harness/gc-history.json` so trends surface over time.
|
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: inspect-app
|
|
3
|
+
description: Use this skill whenever the user asks to "test the UI", "check what the app looks like", "inspect the page", "verify the dev server is up", or before claiming a UI feature is done. Boots the dev server via scripts/dev-up.sh and drives the failing flow through Playwright MCP if installed (else falls back to curl + lightweight HTML capture). Mirrors the OpenAI Chrome-DevTools-Protocol-into-runtime pattern at solo scale — verify the running app, don't trust the type checker alone.
|
|
4
|
+
allowed-tools: Read, Bash(scripts/dev-up.sh), Bash(curl:*), Bash(playwright:*), Bash(node:*)
|
|
5
|
+
suggested-turns: 12
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## When to use
|
|
9
|
+
|
|
10
|
+
The user said any of: "what does the page look like", "test the UI flow",
|
|
11
|
+
"is the dev server up", "before merging the UI work", or invoked this skill
|
|
12
|
+
explicitly via `/inspect-app`. Also auto-invokes from `/debug-flow` when the
|
|
13
|
+
bug is UI-shaped.
|
|
14
|
+
|
|
15
|
+
## Steps
|
|
16
|
+
|
|
17
|
+
1. **Detect dev server.** Read `harness.config.json` for the framework.
|
|
18
|
+
- If a process is already listening on the expected port (3000 / 8000 /
|
|
19
|
+
5000 depending on framework), reuse it.
|
|
20
|
+
- Else: `bash scripts/dev-up.sh &` in the background and wait up to 30s
|
|
21
|
+
for the readiness probe.
|
|
22
|
+
2. **Capture mode — Playwright MCP (preferred).** If `mcp__playwright__*`
|
|
23
|
+
tools are available:
|
|
24
|
+
- `mcp__playwright__browser_navigate` to the target URL
|
|
25
|
+
- `mcp__playwright__browser_snapshot` for accessibility tree
|
|
26
|
+
- `mcp__playwright__browser_take_screenshot` for a visual
|
|
27
|
+
- Optionally drive a single user flow (click → fill → click → wait)
|
|
28
|
+
3. **Capture mode — curl fallback.** If MCP is unavailable:
|
|
29
|
+
- `curl -i -s -o response.body "http://localhost:$PORT$PATH"` for headers + body
|
|
30
|
+
- `wc -l response.body` to size-check
|
|
31
|
+
- `grep -E '<title>|<h1>' response.body | head -5` for sanity
|
|
32
|
+
4. **Diff against expectation.** If the user gave an expected element /
|
|
33
|
+
text, grep for it. If not, just report what's on the page.
|
|
34
|
+
5. **Cleanup.** Kill the dev server we started (don't kill ones already
|
|
35
|
+
running before this session).
|
|
36
|
+
|
|
37
|
+
## Output contract
|
|
38
|
+
|
|
39
|
+
```
|
|
40
|
+
### App inspection
|
|
41
|
+
**URL:** http://localhost:<port><path>
|
|
42
|
+
**Status:** <HTTP status>
|
|
43
|
+
**Title:** <page title or first H1>
|
|
44
|
+
**Mode:** playwright-mcp | curl-fallback
|
|
45
|
+
**Screenshot:** <path or "n/a">
|
|
46
|
+
**Findings:** <bulleted list of matches/mismatches against expectation>
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Anti-patterns
|
|
50
|
+
|
|
51
|
+
- Don't claim a UI feature is done without running this skill once.
|
|
52
|
+
- Don't leave a dev server running after the inspection — kill what you
|
|
53
|
+
started.
|
|
54
|
+
- Don't grep for "Error" alone as a failure signal — many pages legitimately
|
|
55
|
+
contain that word. Match against specific expected text instead.
|
|
56
|
+
- Don't take screenshots of pages with secrets or test fixtures with PII —
|
|
57
|
+
the screenshot lands on disk.
|
|
@@ -0,0 +1,53 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: inspect-module
|
|
3
|
+
description: Use this skill whenever the user mentions "explore", "inspect", "understand", "what does X do", "where is Y", or before adding a new feature in an unfamiliar area. Produces a structured map of one module — files, exports, dependencies, layer assignment, and recent commits — without reading the entire codebase. Always invoke this skill before editing an unfamiliar module so the agent has accurate context, not guesses.
|
|
4
|
+
allowed-tools: Read, Glob, Grep, Bash(git log:*), Bash(git ls-tree:*), Bash(tree:*)
|
|
5
|
+
suggested-turns: 6
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## When to use
|
|
9
|
+
|
|
10
|
+
The user asked any of: "how does X work", "what's in src/foo", "before I add
|
|
11
|
+
feature Y, what's in the area?", "explore <path>", "show me the shape of
|
|
12
|
+
<module>".
|
|
13
|
+
|
|
14
|
+
## Steps
|
|
15
|
+
|
|
16
|
+
1. **Resolve the target.** If the user gave a feature name (not a path), grep
|
|
17
|
+
`feature_list.json` for it. If multiple paths match, ask the user which.
|
|
18
|
+
2. **Recent activity.** Run `git log --oneline -20 -- <target>` and read the
|
|
19
|
+
top 3 commit messages.
|
|
20
|
+
3. **Layer.** Cross-reference the path against `harness.config.json` →
|
|
21
|
+
`domains[].layers` to determine the layer.
|
|
22
|
+
4. **Public surface.** List exported symbols:
|
|
23
|
+
- TypeScript: `grep -nE "^export " <target>/**/*.ts`
|
|
24
|
+
- Python: `grep -nE "^def |^class |^[A-Z_][A-Z0-9_]+ ?=" <target>/**/*.py`
|
|
25
|
+
5. **Dependencies.** List inbound imports (who depends on this module) and
|
|
26
|
+
outbound imports (what this module depends on). Verify both respect the
|
|
27
|
+
forward-only layer order; if not, **stop and report the violation** before
|
|
28
|
+
proceeding with any change plan.
|
|
29
|
+
6. **Risks.** Flag any of: dynamic imports, eval, shell-out with
|
|
30
|
+
interpolation, missing tests for an exported function.
|
|
31
|
+
|
|
32
|
+
## Output contract
|
|
33
|
+
|
|
34
|
+
Produce a Markdown report with these sections, in this order:
|
|
35
|
+
|
|
36
|
+
```
|
|
37
|
+
### Module: <path>
|
|
38
|
+
### Layer: <layer-name>
|
|
39
|
+
### Public surface: <list>
|
|
40
|
+
### Inbound deps: <list of paths>
|
|
41
|
+
### Outbound deps: <list of paths or external packages>
|
|
42
|
+
### Recent changes: <top 3 commit messages>
|
|
43
|
+
### Risks: <bulleted list, "none" if clean>
|
|
44
|
+
```
|
|
45
|
+
|
|
46
|
+
End with: "I am ready to make changes to `<module>`. The architecture-reviewer
|
|
47
|
+
subagent will be invoked on completion."
|
|
48
|
+
|
|
49
|
+
## Anti-patterns
|
|
50
|
+
|
|
51
|
+
- Don't read every file in the module — sample exports, then drill in only
|
|
52
|
+
where the user's task points.
|
|
53
|
+
- Don't propose changes in this skill. This is read-only context-gathering.
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: propose-harness-improvement
|
|
3
|
+
description: Use this skill whenever the agent makes a mistake, the user observes an avoidable failure, a pattern recurs, or someone says "the agent keeps doing X". Files an "Engineer the Harness" entry — Mitchell Hashimoto's discipline: every failure becomes a permanent prevention mechanism. Always invoke this instead of just fixing the immediate symptom.
|
|
4
|
+
allowed-tools: Read, Edit, Write, Bash(git diff:*)
|
|
5
|
+
suggested-turns: 8
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Steps
|
|
9
|
+
|
|
10
|
+
1. **Triage.** Ask: "What just went wrong? What was the agent's intended
|
|
11
|
+
behavior? What's the symptom?"
|
|
12
|
+
2. **Classify.** One of:
|
|
13
|
+
- **(a) Missing context** — the agent didn't know something. Fix: add to
|
|
14
|
+
`docs/`.
|
|
15
|
+
- **(b) Missing rule** — the agent did something forbidden by an
|
|
16
|
+
unwritten rule. Fix: invoke `/structural-test-author`.
|
|
17
|
+
- **(c) Missing tool/skill** — the agent reached for the wrong tool. Fix:
|
|
18
|
+
invoke `/write-skill`.
|
|
19
|
+
- **(d) Wrong layer / architecture** — the structure invited the
|
|
20
|
+
mistake. Fix: write an ADR via `/add-adr`.
|
|
21
|
+
3. **Append entry** to `docs/agent-failures.md` with: date, symptom, fix,
|
|
22
|
+
fix-type, file modified.
|
|
23
|
+
4. **Apply the fix in the right place.** NEVER paper over with a CLAUDE.md
|
|
24
|
+
"be careful" sentence unless rule (a) applies — and even then, only as a
|
|
25
|
+
pointer to a longer doc.
|
|
26
|
+
5. **Update PROGRESS.** Append `harness-improvement: <slug>` to
|
|
27
|
+
`.harness/PROGRESS.md`.
|
|
28
|
+
|
|
29
|
+
## Output contract
|
|
30
|
+
|
|
31
|
+
```
|
|
32
|
+
### Failure: <one-line summary>
|
|
33
|
+
### Classification: (a|b|c|d) <name>
|
|
34
|
+
### Fix applied at: <file:line>
|
|
35
|
+
### docs/agent-failures.md entry: §<n>
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Anti-patterns (block on these)
|
|
39
|
+
|
|
40
|
+
- Don't add a vague "be careful with X" sentence to CLAUDE.md.
|
|
41
|
+
- Don't add a rule whose enforcement is also LLM-based.
|
|
42
|
+
- Don't use this skill to log unrelated cleanup ideas — those go in
|
|
43
|
+
`docs/tech-debt-tracker.md`.
|
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: structural-test-author
|
|
3
|
+
description: Use this skill whenever the user wants to add a new architectural rule, prevent a recurring agent mistake, or codify a pattern from golden-principles.md. Generates a {{#if isPython}}libcst-based Python{{else}}ts-morph-based TypeScript{{/if}} structural test plus the matching {{#if isPython}}import-linter contract{{else}}eslint-plugin-boundaries rule{{/if}} entry. Always prefer this over leaving rules in prose.
|
|
4
|
+
allowed-tools: Read, Edit, Write, Bash(npm test:*), Bash(pytest:*)
|
|
5
|
+
suggested-turns: 15
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Steps
|
|
9
|
+
|
|
10
|
+
1. **Phrase the rule.** Ask the user: "What invariant do you want enforced?
|
|
11
|
+
Phrase it as: 'No code in layer X may import from layer Y' or 'Every
|
|
12
|
+
<thing> must <do>'."
|
|
13
|
+
2. **Layer rules first.** If the rule is layer-based, edit `harness.config.json`
|
|
14
|
+
`domains[].layers` and the {{#if isPython}}`.importlinter`{{else}}`eslint.config.js`{{/if}}
|
|
15
|
+
config — DO NOT write a custom test for layer rules; the existing test
|
|
16
|
+
already supports them via configuration.
|
|
17
|
+
3. **Structural rules.** If the rule is structural but not layer-based (e.g.
|
|
18
|
+
"every controller must call validateAt"), open
|
|
19
|
+
`{{#if isPython}}harness/structural_test.py{{else}}harness/structural-test.ts{{/if}}`:
|
|
20
|
+
- {{#if isPython}}Use `libcst.CSTVisitor` subclass — preserves whitespace and comments.{{else}}Use `Project` + `getSourceFiles()` + AST visitors — the canonical ts-morph pattern.{{/if}}
|
|
21
|
+
4. **Add a fixture test.** Create a file in `tests/structural/` that contains
|
|
22
|
+
a deliberately-violating snippet, and verify the rule fails on it.
|
|
23
|
+
5. **Run against the whole repo.** If the rule fails on existing code, choose:
|
|
24
|
+
- **(a)** fix the existing code, OR
|
|
25
|
+
- **(b)** add the existing violations to `.harness/structural-baseline.json`
|
|
26
|
+
so only **new** violations block. (PMD/baseline pattern.)
|
|
27
|
+
6. **Document.** Append the rule and its rationale (one paragraph traced to a
|
|
28
|
+
specific past failure) to `docs/golden-principles.md`.
|
|
29
|
+
7. **Log the harness change.** Run `/propose-harness-improvement` to record
|
|
30
|
+
this as a permanent harness improvement.
|
|
31
|
+
|
|
32
|
+
## Output contract
|
|
33
|
+
|
|
34
|
+
```
|
|
35
|
+
### Rule added: <one-line description>
|
|
36
|
+
### Files changed: <list>
|
|
37
|
+
### New violations on existing code: <count> — baselined: yes/no
|
|
38
|
+
### golden-principles.md entry: §<n>
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
## Anti-patterns
|
|
42
|
+
|
|
43
|
+
- Don't write a rule whose enforcement is also LLM-based — that just recurses.
|
|
44
|
+
- Don't write a rule that requires runtime information to evaluate (e.g.
|
|
45
|
+
"this function must not take more than 100ms"). Those go in evals or
|
|
46
|
+
observability, not structural tests.
|
|
@@ -0,0 +1,39 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: write-skill
|
|
3
|
+
description: Use this skill whenever the user asks to "create a skill", "add a slash command", "package a workflow", or "make X reusable across sessions". Generates a SKILL.md with valid YAML frontmatter (name regex, description ≤ 1024 chars, body ≤ 500 lines) and supporting scripts/references/assets. Tests the skill by simulating an auto-discovery prompt.
|
|
4
|
+
allowed-tools: Read, Edit, Write, Bash(ls:*)
|
|
5
|
+
suggested-turns: 8
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Steps
|
|
9
|
+
|
|
10
|
+
1. **Validate the name.** Must match `^[a-z0-9]+(-[a-z0-9]+)*$` and be ≤ 64
|
|
11
|
+
characters.
|
|
12
|
+
2. **Write a "pushy" description.** Third-person, ≤ 1024 chars. Explicitly
|
|
13
|
+
mention triggers ("Use this skill whenever the user mentions <X>, <Y>,
|
|
14
|
+
<Z>"). Models under-trigger skills with shy descriptions.
|
|
15
|
+
3. **Body sections,** in this order: `## When to use`, `## Steps`,
|
|
16
|
+
`## Output contract`, `## Anti-patterns`. Cap body at 500 lines.
|
|
17
|
+
4. **Externalize deterministic logic.** If the skill needs deterministic work
|
|
18
|
+
(parsing, formatting, computation), put it in `scripts/<name>.sh` (or `.py`
|
|
19
|
+
/ `.mjs`) under the skill directory and reference it via a `Bash(...)`
|
|
20
|
+
tool call. SKILL.md stays declarative.
|
|
21
|
+
5. **Test discovery.** Open a fresh Claude Code session and prompt with one
|
|
22
|
+
of the description triggers. Verify the skill auto-loads.
|
|
23
|
+
|
|
24
|
+
## Output contract
|
|
25
|
+
|
|
26
|
+
```
|
|
27
|
+
### Skill: /<name>
|
|
28
|
+
### Description bytes: <count>/1024
|
|
29
|
+
### Body lines: <count>/500
|
|
30
|
+
### Allowed tools: <list>
|
|
31
|
+
### Discovery trigger tested: yes/no
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
## Anti-patterns
|
|
35
|
+
|
|
36
|
+
- Don't write a description that starts with "This skill…" — start with "Use
|
|
37
|
+
this skill whenever the user…" so triggers are front-loaded.
|
|
38
|
+
- Don't pack two unrelated workflows into one skill. Split them.
|
|
39
|
+
- Don't grant `Bash(*:*)` in `allowed-tools`. Pin specific commands.
|
|
@@ -0,0 +1,70 @@
|
|
|
1
|
+
# {{projectName}} — Agent Working Notes
|
|
2
|
+
|
|
3
|
+
{{description}} {{language}}/{{framework}} project. Single-developer hobby
|
|
4
|
+
project. This file is intentionally short — it is a **table of contents**, not
|
|
5
|
+
an encyclopedia.
|
|
6
|
+
|
|
7
|
+
## Build & Run
|
|
8
|
+
|
|
9
|
+
- Install: `{{installCmd}}`
|
|
10
|
+
- Dev: `{{devCmd}}`
|
|
11
|
+
- Test: `{{testCmd}}`
|
|
12
|
+
- Lint: `{{lintCmd}}`
|
|
13
|
+
- Structural: `{{#if isPython}}python -m harness.structural_test{{else}}npm run harness:check{{/if}}` (must pass before any PR)
|
|
14
|
+
|
|
15
|
+
## Architecture (brief)
|
|
16
|
+
|
|
17
|
+
Layer order, enforced mechanically:
|
|
18
|
+
|
|
19
|
+
**{{layersJoined}}** — code may only depend forward. Cross-cutting concerns
|
|
20
|
+
enter via `providers/`.
|
|
21
|
+
|
|
22
|
+
Full diagram and rationale: `docs/architecture.md`.
|
|
23
|
+
|
|
24
|
+
## Golden principles (must hold)
|
|
25
|
+
|
|
26
|
+
1. Prefer shared utilities in `src/shared/` over new helpers.
|
|
27
|
+
2. Validate at boundaries; never probe data shape "YOLO-style".
|
|
28
|
+
3. Each test is end-to-end through one feature in `feature_list.json`.
|
|
29
|
+
|
|
30
|
+
Full list: `docs/golden-principles.md`.
|
|
31
|
+
|
|
32
|
+
## Where to look (read on demand)
|
|
33
|
+
|
|
34
|
+
- `docs/architecture.md` — read when adding a new module or moving code.
|
|
35
|
+
- `docs/adr/` — read when changing public APIs.
|
|
36
|
+
- `docs/golden-principles.md` — read before any refactor.
|
|
37
|
+
- `feature_list.json` — read before claiming a feature is done.
|
|
38
|
+
- `.harness/PROGRESS.md` — read at session start; write at session end.
|
|
39
|
+
|
|
40
|
+
## Skills you should use
|
|
41
|
+
|
|
42
|
+
- `/inspect-module <path>` when you need to understand existing code.
|
|
43
|
+
- `/add-feature <description>` when adding new capability — never freestyle.
|
|
44
|
+
- `/structural-test-author <layer>` when adding a new structural rule.
|
|
45
|
+
- `/garbage-collection` every Friday or before tagging a release.
|
|
46
|
+
- `/eval-runner` before merging any change to a skill or agent file.
|
|
47
|
+
|
|
48
|
+
## Subagents you should delegate to (do NOT inline these reviews)
|
|
49
|
+
|
|
50
|
+
- `architecture-reviewer` — for any cross-layer change.
|
|
51
|
+
- `security-reviewer` — for any auth, input handling, or secret-touching change.
|
|
52
|
+
- `reliability-reviewer` — for any new error path, retry loop, or async boundary.
|
|
53
|
+
|
|
54
|
+
## Workflow contract
|
|
55
|
+
|
|
56
|
+
1. Start session: run `/inspect-module .` and read `.harness/PROGRESS.md`.
|
|
57
|
+
2. Pick ONE feature from `feature_list.json` whose `passes: false`.
|
|
58
|
+
3. Implement. Run the structural test. If it fails, FIX before continuing.
|
|
59
|
+
4. Self-verify with the matching reviewer subagent(s).
|
|
60
|
+
5. Commit with descriptive message. Append a line to `.harness/PROGRESS.md`.
|
|
61
|
+
6. Update `feature_list.json` (`passes: true`) **only after** end-to-end test passes.
|
|
62
|
+
|
|
63
|
+
## What NOT to do
|
|
64
|
+
|
|
65
|
+
- Don't add a new layer without an ADR.
|
|
66
|
+
- Don't disable the structural test to make a PR pass.
|
|
67
|
+
- Don't write code that the structural test cannot reason about (no dynamic
|
|
68
|
+
imports across layers).
|
|
69
|
+
- Don't update CLAUDE.md without proposing a harness improvement
|
|
70
|
+
(`/propose-harness-improvement`).
|
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
[importlinter]
|
|
2
|
+
root_package = app
|
|
3
|
+
include_external_packages = True
|
|
4
|
+
|
|
5
|
+
[importlinter:contract:layered]
|
|
6
|
+
name = Forward-only layer flow per domain
|
|
7
|
+
type = layers
|
|
8
|
+
layers =
|
|
9
|
+
app.ui
|
|
10
|
+
app.runtime
|
|
11
|
+
app.service
|
|
12
|
+
app.repo
|
|
13
|
+
app.config
|
|
14
|
+
app.types
|
|
File without changes
|