@pieerry/harness-kit 4.0.0 → 4.1.0

@@ -12,6 +12,8 @@ Registered in [`AGENTS.md`](../../../AGENTS.md). Agent definition: [`product-man
12
12
 
13
13
  Also invokable as sub-agent via Task tool with `subagent_type: "product-manager"`.
14
14
 
15
+ PRP explorer (`skills/prp/SKILL.md`) consults optional context cache before grep: cached graphify graph or repomix pack at `.claude/runtime/cache/{graphify,repomix}/`. See [`.claude/shared/context-strategy.md`](../../../.claude/shared/context-strategy.md). Built via `/context:graph` or `/context:pack` — manual, optional.
16
+
15
17
  ## Tree
16
18
 
17
19
  ```
@@ -90,6 +92,11 @@ jq -s '.[] | select(.feature_id | contains("dispatch"))' .claude/runtime/outputs
90
92
 
91
93
  After a PRP is approved, engineering picks it up via the [staff-software-engineer plugin](../staff-software-engineer/README.md). The SSE plugin reads `.claude/runtime/outputs/pm/prp/{feature_id}.md` and runs plan → dev → test → pr stages, all writing to the same `.claude/runtime/outputs/pm/tokens/{feature_id}.json` file. Full feature lifecycle in one token log.
92
94
 
95
+ Engineering can choose:
96
+ - `/sse:run` — full pipeline through merged PR
97
+ - `/sse:run --local` — plan, dev, test; stop before PR
98
+ - `/sse:sdd` — spec-driven loop using the PRP's `Success criteria (verifiable)` + `Validation gates` as goal predicate. PRP authors should write **testable** criteria — vague bullets fail the `prp-has-acceptance-criteria` pre-flight sensor and block the loop.
99
+
93
100
  ## Status bar
94
101
 
95
102
  The repo's status-line (`.claude/hooks/status-line.sh`) detects PM activity automatically. If any PRD or PRP file under this plugin was modified in the last hour, it switches the status bar to pipeline mode and falls back to the engineering picker otherwise.
@@ -25,8 +25,14 @@ Read:
25
25
  - guides/templates/prp.md
26
26
  - guides/pipeline.md
27
27
  - guides/examples/good-prp-example.md
28
+ - .claude/shared/context-strategy.md — pick the right tier when exploring target repos
28
29
 
29
- Explore target repos. Ask user for repo paths if not provided. Use Grep and Read to map files. Capture file:line. Never invent paths.
30
+ Explore target repos. Ask user for repo paths if not provided. Use the context-strategy tier order:
31
+ 1. Cached graphify graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → query for symbols/callers (much cheaper than grep)
32
+ 2. Cached repomix pack at `.claude/runtime/cache/repomix/*.xml` → read for full file content
33
+ 3. Fall back to Grep + Read on live repo
34
+
35
+ Capture file:line. Never invent paths. If a target repo is large + uncached, suggest the user run `/context:graph {repo}` before continuing — don't auto-build.
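The tier order in this hunk can be sketched in Python. This is a minimal illustration and not part of the package: `pick_context_tier` and its return labels are hypothetical names, and only the cache paths are taken from the skill text.

```python
from pathlib import Path


def pick_context_tier(project_root: str, slug: str) -> str:
    """Return which context source to consult first, following the tier order.

    Hypothetical helper: the name and the "graph"/"pack"/"grep" labels are
    assumptions; the cache locations mirror the paths in the skill text.
    """
    cache = Path(project_root) / ".claude" / "runtime" / "cache"

    # 1. Cached graphify graph: query for symbols and callers.
    graph = cache / "graphify" / slug / "graphify-out" / "graph.json"
    if graph.is_file():
        return "graph"

    # 2. Cached repomix pack: read for full file content.
    repomix_dir = cache / "repomix"
    if repomix_dir.is_dir() and any(repomix_dir.glob("*.xml")):
        return "pack"

    # 3. Nothing cached: fall back to Grep + Read on the live repo.
    return "grep"
```

The helper only reports which tier is available. It never builds a cache, which matches the "don't auto-build" rule in the text.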
30
36
 
31
37
  Save to .claude/runtime/outputs/pm/prp/{feature_id}.md.
32
38
 
@@ -10,10 +10,13 @@ Registered in [`AGENTS.md`](../../../AGENTS.md). Agent definition: [`staff-softw
10
10
  - `/sse:dev`: implement the plan in code, run convention gates
11
11
  - `/sse:test`: run the project test suite
12
12
  - `/sse:pr`: open the draft PR
13
- - `/sse:run`: full pipeline, plan to pr
13
+ - `/sse:run`: full pipeline, plan to pr. Flags: `--local` (stop before PR), `--sdd` (hand off to spec-driven loop), `--no-monitor` (skip PR monitor only)
14
+ - `/sse:sdd`: spec-driven dev loop. Plan once, then dev↔test↔spec-satisfied eval until PRP spec met. Cap 3 iters. Local only — never auto-opens PR.
14
15
 
15
16
  Also invokable as sub-agent via Task tool with `subagent_type: "staff-software-engineer"`.
16
17
 
18
+ Optional context helpers (separate namespace, manual): `/context:pack <feature_id>` and `/context:graph [repo]`. Plan stage + SDD supervisor eval consult the cache when present. See [`.claude/shared/context-strategy.md`](../../../.claude/shared/context-strategy.md).
19
+
17
20
  ## Tree
18
21
 
19
22
  ```
@@ -25,23 +28,28 @@ Also invokable as sub-agent via Task tool with `subagent_type: "staff-software-e
25
28
  │ ├── mobile/SKILL.md iOS/Android defaults
26
29
  │ └── devops/SKILL.md CI/IaC defaults
27
30
  ├── guides/
28
- │ ├── pipeline.md retry, approval, token accounting
31
+ │ ├── pipeline.md retry, approval, token accounting, variants (--local, sdd)
32
+ │ ├── sdd-loop.md spec-driven loop algorithm, predicate from PRP
29
33
  │ ├── coding-style.md team code style
30
34
  │ ├── commit-style.md Conventional Commits with TICKET
31
35
  │ └── conventions-override.md how project overrides work
32
- ├── sensors/ plan, dev, test, pr structure + conventions
33
- └── evals/ plan/dev/test/pr quality rubrics
36
+ ├── sensors/ plan, dev, test, pr structure + conventions + prp-has-acceptance-criteria (sdd pre-flight)
37
+ └── evals/ plan/dev/test/pr quality rubrics + spec-satisfied (sdd supervisor)
34
38
 
35
39
  .claude/runtime/ ← state + outputs
36
40
  ├── hooks/staff-software-engineer/ phase markers, sensor gates
37
41
  ├── scripts/staff-software-engineer/ symlinks to PM scripts (sensor-runner, token-phase)
38
- └── outputs/sse/
39
- ├── plan/ generated plans
40
- ├── dev/ dev summaries
41
- ├── test/ test results
42
- ├── pr/ opened PR records
43
- ├── tokens/ per-feature phase tokens JSON
44
- └── .markers/ phase start/end markers (transient)
42
+ ├── outputs/sse/
43
+ ├── plan/ generated plans
44
+ ├── dev/ dev summaries
45
+ ├── test/ test results
46
+ ├── pr/ opened PR records
47
+ ├── sdd/ sdd-loop transcripts (per-iter eval verdicts)
48
+ │ ├── tokens/ per-feature phase tokens JSON
49
+ │ └── .markers/ phase start/end markers (transient)
50
+ └── cache/ optional context tools (gitignored)
51
+ ├── repomix/{feature_id}.xml per-feature snapshot (ephemeral, cleared on /pipeline:reset)
52
+ └── graphify/{slug}/graphify-out/ per-repo knowledge graph (long-lived, manual rebuild)
45
53
  ```
46
54
 
47
55
  ## How conventions work
@@ -73,6 +81,12 @@ Plugin skills read both. Project rules win. See `guides/conventions-override.md`
73
81
  | Test command detection | commands/test.md |
74
82
  | PR template | commands/pr.md, hooks/post-eval-pr.sh |
75
83
  | Sensors | sensors/*.md |
84
+ | SDD loop algorithm | guides/sdd-loop.md |
85
+ | SDD iter cap | guides/sdd-loop.md + commands/sdd.md |
86
+ | SDD predicate rubric | evals/spec-satisfied.md |
87
+ | SDD pre-flight check | sensors/prp-has-acceptance-criteria.md |
88
+ | --local behavior | commands/run.md (## Flags) |
89
+ | Context tier order | ../../shared/context-strategy.md |
76
90
 
77
91
  ## Connects to PM plugin
78
92
 
@@ -0,0 +1,66 @@
1
+ # Eval: Spec Satisfied
2
+
3
+ Type: LLM-judge (supervisor)
4
+ Mode: goal predicate
5
+ Verdict: PASS or FAIL
6
+
7
+ Decide if current repo state satisfies source PRP. Run by independent session (no prior context from worker turn) to avoid worker self-judging.
8
+
9
+ ## Inputs
10
+
11
+ - source PRP: `.claude/runtime/outputs/pm/prp/{feature_id}.md`
12
+ - latest dev summary: `.claude/runtime/outputs/sse/dev/{feature_id}.md`
13
+ - latest test report: `.claude/runtime/outputs/sse/test/{feature_id}.md`
14
+ - diff: `git diff {base}...HEAD` on dev branch
15
+ - **optional richer context** (per `.claude/shared/context-strategy.md`):
16
+ - cached pack `.claude/runtime/cache/repomix/{feature_id}.xml` if present → read for surrounding code
17
+ - cached graph `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` if present → query for callers of touched symbols (impact analysis)
18
+
19
+ ## Rubric
20
+
21
+ Walk each criterion. For each, mark MET / NOT MET / UNCLEAR with evidence (file:line, test name, or diff hunk).
22
+
23
+ ### 1. Success criteria coverage
24
+ Every bullet under PRP section 3 "Success criteria (verifiable)" mapped to:
25
+ - code change that implements it (file:line), AND
26
+ - test that asserts it (test name)
27
+
28
+ Missing either side → NOT MET.
29
+
30
+ ### 2. Validation gates green
31
+ Every command in PRP section 6 "Validation gates" bash block exit 0 in latest test report. UNCLEAR if command not run.
32
+
33
+ ### 3. Manual verification items
34
+ Each `- [ ]` under "Manual verification" in PRP. Worker cannot tick these — flag as PENDING for user. Do not block goal on these; report separately.
35
+
36
+ ### 4. Scope discipline
37
+ No code change outside PRP "Repos and files touched" without justification in dev summary. Out-of-scope diff hunks → flag for review, not auto-fail.
38
+
39
+ ## Verdict
40
+
41
+ - All success criteria MET + all validation gates green → **PASS**
42
+ - Any criterion NOT MET → **FAIL** with specific next-iter focus
43
+ - Any UNCLEAR → **FAIL** (treat as not satisfied; iter again or add evidence)
44
+
45
+ ## Output
46
+
47
+ ```json
48
+ {
49
+ "verdict": "PASS|FAIL",
50
+ "criteria": [
51
+ {"criterion": "...", "status": "MET|NOT_MET|UNCLEAR", "evidence": "file:line | test name | diff hunk"}
52
+ ],
53
+ "validation_gates": [
54
+ {"command": "...", "exit_code": 0, "status": "GREEN|RED|NOT_RUN"}
55
+ ],
56
+ "manual_pending": ["..."],
57
+ "scope_flags": ["..."],
58
+ "next_iter_focus": "specific fix for FAIL; omit on PASS"
59
+ }
60
+ ```
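As a rough illustration of how the verdict rules map onto the JSON shape above, the following Python sketch derives PASS or FAIL from the `criteria` and `validation_gates` arrays. The function name `decide_verdict` is hypothetical, and treating an empty list as FAIL is an assumption (the rubric does not cover that case).

```python
def decide_verdict(criteria, validation_gates):
    """Apply the rubric's verdict rules to parsed eval output.

    PASS only when every criterion is MET and every gate is GREEN.
    NOT_MET, UNCLEAR, RED and NOT_RUN all lead to FAIL.
    Assumption: an empty criteria or gates list is treated as FAIL,
    since there is nothing to judge against.
    """
    if not criteria or not validation_gates:
        return "FAIL"
    all_met = all(c["status"] == "MET" for c in criteria)
    all_green = all(g["status"] == "GREEN" for g in validation_gates)
    return "PASS" if all_met and all_green else "FAIL"
```

Manual verification items and scope flags are deliberately left out of the decision, since the rubric reports them separately and does not block the goal on them.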
61
+
62
+ ## On FAIL
63
+
64
+ `/sse:sdd` loop reads `next_iter_focus`, regenerates dev step targeting only that. Re-runs test. Re-evals.
65
+
66
+ Max 3 iters. Cap hit → hard stop, write loop transcript + final verdict to `.claude/runtime/outputs/sse/sdd/{feature_id}.md`, return blocker.
@@ -9,6 +9,12 @@ Shared rules for plan, dev, test, pr stages. Edit retry, approval, publish, toke
9
9
  3. test: run project test suite. Capture results.
10
10
  4. pr: open draft PR with standard template.
11
11
 
12
+ ### Variants
13
+
14
+ - `/sse:run` runs 1→4 sequentially.
15
+ - `/sse:run --local` runs 1→3, skips 4. Use for local dev+test without push.
16
+ - `/sse:sdd` replaces 2+3 with goal-loop: dev↔test↔spec-satisfied eval, cap 3 iters, predicate from PRP. Stops local. See `sdd-loop.md`.
17
+
12
18
  ## Retry policy
13
19
 
14
20
  Max attempts per stage: 3.
@@ -67,6 +73,7 @@ To merge with PM tokens file, SSE token-phase.py writes to same path under this
67
73
  - tests fail after dev
68
74
  - gh CLI not available for pr stage
69
75
  - missing required input after one clarification
76
+ - SDD: 3 loop iters complete without spec-satisfied PASS verdict
70
77
 
71
78
  ## Conventions override
72
79
 
@@ -0,0 +1,127 @@
1
+ # SDD Loop
2
+
3
+ Spec-driven dev loop. Used by `/sse:sdd`. Wraps plan + dev + test in goal-loop. Stops local. No PR.
4
+
5
+ Inspired by Claude Code `/goal` (May 2026): worker model attempts work, independent evaluator checks goal met, retry until met or cap hit.
6
+
7
+ ## When to use
8
+
9
+ - Want to iterate locally on a feature until spec satisfied
10
+ - Not ready to push or open PR
11
+ - PRP exists with testable success criteria + validation gates
12
+
13
+ When not:
14
+ - No PRP — run `/product-manager:prp` first
15
+ - Want PR opened automatically — use `/sse:run`
16
+ - PRP success criteria vague — sensor blocks, fix PRP first
17
+
18
+ ## Inputs
19
+
20
+ - source PRP: latest approved in `.claude/runtime/outputs/pm/prp/` or path arg
21
+ - feature_id derived from PRP filename
22
+
23
+ ## Predicate (goal completion condition)
24
+
25
+ Built from PRP:
26
+ - every bullet in `## 3) What` → `Success criteria (verifiable):` MET (code + test exist)
27
+ - every command in `## 6) Validation gates` bash block exit 0
28
+
29
+ Eval rubric: `evals/spec-satisfied.md`. Independent session reads PRP + dev summary + test report + git diff. Returns PASS or FAIL with `next_iter_focus`.
30
+
31
+ ## Algorithm
32
+
33
+ ```
34
+ 1. pre-flight
35
+ - sensor: prp-has-acceptance-criteria → block if FAIL
36
+ - read PRP, extract criteria + gates
37
+
38
+ 2. plan once
39
+ - invoke /sse:plan (normal gates, eval, retry up to 3)
40
+ - get approved plan
41
+
42
+ 3. goal loop (cap = 3 iters)
43
+ iter = 1
44
+ while iter <= 3:
45
+ a. /sse:dev step (worker)
46
+ - if iter > 1, focus on `next_iter_focus` from prior eval
47
+ b. /sse:test
48
+ c. supervisor eval: spec-satisfied.md
49
+ - fresh session, no worker context
50
+ - reads PRP + dev/{id}.md + test/{id}.md + git diff
51
+ d. verdict == PASS:
52
+ - write sdd/{feature_id}.md transcript
53
+ - exit loop, success
54
+ e. verdict == FAIL:
55
+ - append iter result to sdd transcript
56
+ - iter += 1
57
+ - continue
58
+
59
+ 4. cap hit (iter > 3)
60
+ - hard stop
61
+ - write sdd/{feature_id}.md with full transcript + final verdict
62
+ - return blocker, list NOT_MET criteria + UNCLEAR items
63
+ ```
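Step 3 of the algorithm can be sketched as ordinary control flow. The Python below is a minimal sketch, not the harness implementation: `sdd_goal_loop` is a hypothetical name, and the three callables stand in for `/sse:dev`, `/sse:test`, and the fresh-session supervisor eval, which in the real pipeline are slash commands rather than functions.

```python
MAX_ITERS = 3  # cap stated in the guide


def sdd_goal_loop(run_dev, run_test, run_eval, max_iters=MAX_ITERS):
    """Run dev, test, eval until the eval returns PASS or the cap is hit.

    run_dev(focus)  stand-in for /sse:dev; focus is None on iteration 1
    run_test()      stand-in for /sse:test
    run_eval()      stand-in for the supervisor eval; returns a dict with
                    "verdict" and, on FAIL, "next_iter_focus"
    """
    transcript = []
    focus = None
    for iteration in range(1, max_iters + 1):
        run_dev(focus)
        run_test()
        result = run_eval()
        transcript.append({
            "iter": iteration,
            "verdict": result["verdict"],
            "next_iter_focus": result.get("next_iter_focus"),
        })
        if result["verdict"] == "PASS":
            return {"verdict": "PASS", "iters": iteration, "transcript": transcript}
        # FAIL: carry the evaluator's focus into the next dev step.
        focus = result.get("next_iter_focus")
    # Cap hit: hard stop, the caller reports a blocker.
    return {"verdict": "FAIL", "iters": max_iters, "transcript": transcript}
```

Passing the stages in as callables keeps the sketch testable without the harness; the pre-flight sensor and the single plan step would run before this function is called.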
64
+
65
+ ## Output artifact
66
+
67
+ `.claude/runtime/outputs/sse/sdd/{feature_id}.md`:
68
+
69
+ ```markdown
70
+ # SDD Loop: {feature_id}
71
+
72
+ Source PRP: {path}
73
+ Iterations: {N}
74
+ Final verdict: PASS | FAIL (cap hit)
75
+
76
+ ## Iteration 1
77
+ - dev summary: {path}
78
+ - test report: {path}
79
+ - eval verdict: FAIL
80
+ - next focus: {string}
81
+
82
+ ## Iteration 2
83
+ ...
84
+
85
+ ## Final state
86
+ - branch: {branch}
87
+ - commits: {N}
88
+ - criteria MET: {N}/{total}
89
+ - gates GREEN: {N}/{total}
90
+ - manual pending: [...]
91
+
92
+ ## Next step
93
+ PASS → review diff, run `/sse:pr` when ready
94
+ FAIL → address blockers, re-run `/sse:sdd`
95
+ ```
96
+
97
+ ## Token accounting
98
+
99
+ Per-iter phases tracked under same `.markers/` dir as standard pipeline:
100
+ - `{feature_id}.sdd-iter{N}-dev.{start,end}`
101
+ - `{feature_id}.sdd-iter{N}-test.{start,end}`
102
+ - `{feature_id}.sdd-iter{N}-eval.{start,end}`
103
+
104
+ Reuses existing `scripts/staff-software-engineer/token-phase.py` — pass phase name as arg.
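For illustration, the marker naming scheme above expands as follows. This is a sketch only: `sdd_marker_names` is a hypothetical helper, and it does not call or reproduce `token-phase.py`, whose interface is not shown in this diff.

```python
def sdd_marker_names(feature_id: str, iteration: int) -> list[str]:
    """List the start/end marker file names for one SDD iteration."""
    return [
        f"{feature_id}.sdd-iter{iteration}-{phase}.{edge}"
        for phase in ("dev", "test", "eval")
        for edge in ("start", "end")
    ]
```

Each iteration therefore produces six marker files, two per phase.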
105
+
106
+ ## Stop conditions
107
+
108
+ - 3 iters complete without PASS
109
+ - pre-flight sensor fails (PRP not testable)
110
+ - `/sse:plan` hard stops (3 plan retries exhausted)
111
+ - user interrupts mid-loop
112
+
113
+ ## Why separate from /sse:run
114
+
115
+ `/sse:run` is monolithic plan→dev→test→pr. SDD loop adds iter inside dev+test, omits pr. Two reasons:
116
+
117
+ 1. **Local dev workflow**: want repeated dev↔test cycle with spec-judge feedback, not single shot
118
+ 2. **PR gate**: human reviews loop transcript before pushing. Auto-PR on goal PASS is too eager — supervisor can be wrong
119
+
120
+ `/sse:run --local` exists as cheap alternative (skip pr stage, no loop). SDD loop is the spec-driven mode.
121
+
122
+ ## References
123
+
124
+ - predicate source: `guides/templates/prp.md` sections 3, 6 (product-manager plugin)
125
+ - worker stage: `commands/sse/dev.md`, `commands/sse/test.md`
126
+ - evaluator: `evals/spec-satisfied.md`
127
+ - gating sensor: `sensors/prp-has-acceptance-criteria.md`
@@ -0,0 +1,35 @@
1
+ # Sensor: PRP Has Acceptance Criteria
2
+
3
+ Type: structural
4
+ Mode: hard gate
5
+
6
+ `/sse:sdd` uses PRP as load-bearing spec. Goal-loop predicate built from PRP. PRP missing testable criteria → predicate undefined → loop cannot judge done. Block early.
7
+
8
+ ## Check
9
+
10
+ Source PRP must contain both:
11
+
12
+ 1. **Success criteria (verifiable):** bullet list under section `## 3) What`. Each bullet must be testable (observable behavior, output, or measurable state). Vague bullets (e.g. "works well", "user-friendly") fail.
13
+
14
+ 2. **Validation gates** section `## 6) Validation gates` with non-empty bash block. Empty fenced block or placeholder `{commands the executor MUST run}` fails.
15
+
16
+ ## Pass conditions
17
+
18
+ - `Success criteria (verifiable):` literal string present
19
+ - At least 1 bullet under it (line starts with `- ` or `* `)
20
+ - No bullet contains placeholder `{` `}` braces
21
+ - `## 6) Validation gates` present
22
+ - Bash fence under it contains at least 1 non-comment, non-placeholder command
23
+
24
+ ## On failure
25
+
26
+ Block. Return blocker pointing at missing/weak section. Tell user to run `/product-manager:prp` and address feedback before retrying `/sse:sdd`.
27
+
28
+ ## Example failure output
29
+
30
+ ```
31
+ prp-has-acceptance-criteria: FAIL
32
+ file: .claude/runtime/outputs/pm/prp/{feature_id}.md
33
+ reason: section "Success criteria (verifiable)" empty
34
+ fix: add 2-4 testable bullets under section 3
35
+ ```
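The structural pass conditions lend themselves to a small checker. The Python below is a hedged sketch of one way to implement them, not the sensor's actual code: `check_prp` is a hypothetical name, and treating any line containing `{` as a placeholder is a crude assumption that would misfire on legitimate shell syntax such as `${VAR}`. Vague wording ("works well") is not detectable this way and would still need a reviewer or an LLM judge.

```python
FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


def check_prp(text: str) -> list[str]:
    """Return failure reasons; an empty list means the structural checks pass."""
    failures = []
    lines = text.splitlines()

    # Success criteria: literal marker, at least one bullet, no placeholder braces.
    marker = "Success criteria (verifiable):"
    idx = next((i for i, line in enumerate(lines) if marker in line), None)
    if idx is None:
        failures.append("missing 'Success criteria (verifiable):'")
    else:
        bullets = []
        for line in lines[idx + 1:]:
            stripped = line.strip()
            if stripped.startswith(("- ", "* ")):
                bullets.append(stripped)
            elif stripped:
                break  # first non-blank, non-bullet line ends the list
        if not bullets:
            failures.append("no bullets under success criteria")
        elif any("{" in b and "}" in b for b in bullets):
            failures.append("success criteria contain placeholder braces")

    # Validation gates: heading present, fenced block with a real command.
    heading = "## 6) Validation gates"
    gate_idx = next(
        (i for i, line in enumerate(lines) if line.strip().startswith(heading)), None
    )
    if gate_idx is None:
        failures.append("missing '## 6) Validation gates'")
    else:
        commands, in_fence = [], False
        for line in lines[gate_idx + 1:]:
            stripped = line.strip()
            if stripped.startswith(FENCE):
                if in_fence:
                    break  # closing fence
                in_fence = True
                continue
            if not in_fence and stripped.startswith("## "):
                break  # next section reached without a fence
            if in_fence and stripped and not stripped.startswith("#") and "{" not in stripped:
                commands.append(stripped)
        if not commands:
            failures.append("validation gates block has no real command")

    return failures
```

Returning a list of reasons rather than a boolean makes it straightforward to produce a blocker message like the example failure output above.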
@@ -0,0 +1,65 @@
1
+ ---
2
+ description: Build queryable knowledge graph of target repo with graphify. Cached per repo, reused across features.
3
+ ---
4
+
5
+ Build graphify knowledge graph for a long-lived target repo. Tree-sitter local for code (no API key). Adds semantic edges via LLM for docs/PDFs/images if present (`--with-docs` later, v2).
6
+
7
+ ## Inputs
8
+
9
+ - Arg 1 (optional): target repo path (default = cwd)
10
+ - `--update`: incremental refresh, merge with existing graph (cheap, fast)
11
+ - `--deep`: aggressive edge inference (slower, higher fidelity)
12
+ - `--wiki`: also build wiki-style markdown index
13
+
14
+ ## Pre-flight
15
+
16
+ Run `command -v graphify`. Missing → block with install hint:
17
+ ```
18
+ graphify not installed
19
+ install: uv tool install graphifyy | pipx install graphifyy | pip install graphifyy
20
+ note: PyPI pkg is graphifyy (double y), CLI cmd is graphify
21
+ docs: https://github.com/safishamsi/graphify
22
+ ```
23
+
24
+ ## Steps
25
+
26
+ 1. Bash: `.claude/scripts/graph-repo.sh [target] [--update] [--deep] [--wiki]`
27
+ 2. Script writes `.claude/runtime/cache/graphify/{repo_slug}/graphify-out/`
28
+ 3. Outputs `graph.json` (queryable), `graph.html` (interactive), `GRAPH_REPORT.md` (audit)
29
+ 4. Report cache path + open command for graph.html
30
+
31
+ ## When to use
32
+
33
+ - Target repo touched by many features (e.g. main product monorepo) — amortize build cost
34
+ - Plan stage needs "find all callers of X", "where does Y live" — graph query >> grep
35
+ - PRP `## 4) Context → Repos and files touched` discovery
36
+ - ~71× token reduction on queries vs raw file reads (per graphify benchmarks)
37
+
38
+ When NOT to use:
39
+ - One-off small repo — overkill
40
+ - Hot repo with high commit churn: graph goes stale fast unless the `--update` hook is installed (see Hot-repo section)
41
+ - No `graphify` binary in env
42
+
43
+ ## Reply format
44
+
45
+ ```
46
+ Graph build complete.
47
+ target: {target_dir}
48
+ slug: {repo_slug}
49
+ path: .claude/runtime/cache/graphify/{slug}/graphify-out/
50
+ view: open .claude/runtime/cache/graphify/{slug}/graphify-out/graph.html
51
+ report: .claude/runtime/cache/graphify/{slug}/graphify-out/GRAPH_REPORT.md
52
+ next: /sse:plan + /sse:sdd will consult graph.json when present
53
+ ```
54
+
55
+ ## Hot-repo: auto-update hook
56
+
57
+ For frequently-changing target repos, install graphify's git hook:
58
+ ```bash
59
+ cd {target_repo} && graphify hook install
60
+ ```
61
+ Each commit triggers `graphify . --update`. Graph stays fresh, harness-kit cache reads latest. Document this in target repo's CONTRIBUTING.
62
+
63
+ ## Privacy note
64
+
65
+ Code-only mode runs Tree-sitter locally — no network calls. Only the optional `--with-docs` step (not enabled by default in this wrapper) sends semantic descriptions of non-code files to a configured LLM. Raw source code never leaves the machine.
@@ -0,0 +1,59 @@
1
+ ---
2
+ description: Pack target repo with repomix. Snapshot for AI context. Cached per feature_id.
3
+ ---
4
+
5
+ Build repomix snapshot of target repo. Used by `/sse:plan`, `/sse:sdd` supervisor eval, and multi-repo PRPs.
6
+
7
+ ## Inputs
8
+
9
+ - Arg 1 (required): `feature_id` (matches PRD/PRP filename basename)
10
+ - Arg 2 (optional): target repo path (default = cwd)
11
+ - `--include "glob"`: narrow scope (e.g. `--include "src/auth/**"`). For several globs, pass one comma-separated value; the wrapper script keeps only the last `--include` it sees.
12
+ - `--style xml|markdown`: output format. Default xml.
13
+
14
+ ## Pre-flight
15
+
16
+ Run `command -v repomix`. Missing → block with install hint:
17
+ ```
18
+ repomix not installed
19
+ install: npm i -g repomix | brew install repomix
20
+ docs: https://repomix.com
21
+ ```
22
+
23
+ ## Steps
24
+
25
+ 1. Bash: `.claude/scripts/pack-repo.sh <feature_id> [target] [--include glob] [--style xml]`
26
+ 2. Script writes `.claude/runtime/cache/repomix/{feature_id}.{style}`
27
+ 3. Report path + size + token estimate
28
+
29
+ ## When to use
30
+
31
+ - Target repo > 50 files and stages keep re-grepping same context
32
+ - Multi-repo feature: pack each repo, supervisor eval reads concat
33
+ - SDD loop: pack `{changed files} ∪ {PRP "Repos and files touched"}` for cheaper per-iter eval
34
+
35
+ When NOT to use:
36
+ - Small repo (<20 files) — grep is fine
37
+ - Pack would exceed context budget (>200k tokens) — narrow with `--include`
38
+
39
+ ## Reply format
40
+
41
+ ```
42
+ Pack complete.
43
+ feature: {feature_id}
44
+ target: {target_dir}
45
+ style: {xml|markdown}
46
+ include: {glob | (default .gitignore aware)}
47
+ path: .claude/runtime/cache/repomix/{feature_id}.{ext}
48
+ size: {N} bytes
49
+ tokens: {N} (repomix estimate)
50
+ next: consumed automatically by /sse:plan, /sse:sdd if cache present
51
+ ```
52
+
53
+ ## Cleanup
54
+
55
+ Pack invalidated when:
56
+ - `/pipeline:reset` runs (clears all caches for active feature)
57
+ - Manual: `rm .claude/runtime/cache/repomix/{feature_id}.*`
58
+
59
+ Pack is **snapshot, not live**. Re-run when target repo materially changes mid-feature.
@@ -24,6 +24,13 @@ Read:
24
24
  - .claude/agents/staff-software-engineer/guides/coding-style.md
25
25
  - area-specific skill: .claude/agents/staff-software-engineer/skills/{area}/SKILL.md (area = backend, web, mobile, devops)
26
26
  - project conventions if present: {repo}/.claude/conventions/{area}.md (see .claude/agents/staff-software-engineer/guides/conventions-override.md)
27
+ - .claude/shared/context-strategy.md — pick the right tier for target-repo lookups
28
+
29
+ Context lookups (per `context-strategy.md`):
30
+ - Cached graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → query for callers/refs instead of grepping
31
+ - Cached pack at `.claude/runtime/cache/repomix/{feature_id}.xml` → read for full file snapshot
32
+ - Neither present → fall back to grep + Read on live repo
33
+ - Don't double-load. If pack/graph covers a PRP-listed file, skip the grep for it.
27
34
 
28
35
  Save to .claude/runtime/outputs/sse/plan/{feature_id}.md.
29
36
 
@@ -1,15 +1,25 @@
1
1
  ---
2
- description: Run the full engineering pipeline. Plan, dev, test, pr in sequence.
2
+ description: Run the full engineering pipeline. Plan, dev, test, pr in sequence. Pass --local to stop after test (no PR). Pass --sdd for spec-driven loop variant.
3
3
  ---
4
4
 
5
5
  Run end to end.
6
6
 
7
- 1. Invoke /sse:plan. Wait for approval marker on plan.
8
- 2. Invoke /sse:dev. Implements plan in code.
9
- 3. Invoke /sse:test. Runs project test suite.
10
- 4. Invoke /sse:pr. Opens pull request.
11
- 5. Invoke /sse:pr-monitor. Arms backoff polling, auto-clears pipeline state on merge. Skip if user passed `--no-monitor` or `gh pr view` already returns MERGED.
12
- 6. Return summary.
7
+ ## Flags
8
+
9
+ - `--local`: skip /sse:pr + /sse:pr-monitor. Stop after /sse:test. Use when you want dev+test locally without a push.
10
+ - `--sdd`: hand off to `/sse:sdd` instead (spec-driven loop, plan once + dev↔test↔eval loop, also local-only). See `.claude/commands/sse/sdd.md`. Mutually exclusive with `--local`.
11
+ - `--no-monitor`: skip /sse:pr-monitor only. PR still opens.
12
+
13
+ ## Steps
14
+
15
+ 1. `--sdd` set → invoke /sse:sdd with same args (minus --sdd). Return its result. Skip rest.
16
+ 2. Invoke /sse:plan. Wait for approval marker on plan.
17
+ 3. Invoke /sse:dev. Implements plan in code.
18
+ 4. Invoke /sse:test. Runs project test suite.
19
+ 5. `--local` set → stop here. Print summary (omit PR + Monitor lines). Tell user `next: review diff, /sse:pr when ready`.
20
+ 6. Invoke /sse:pr. Opens pull request.
21
+ 7. Invoke /sse:pr-monitor. Arms backoff polling, auto-clears pipeline state on merge. Skip if user passed `--no-monitor` or `gh pr view` already returns MERGED.
22
+ 8. Return summary.
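The flag handling in the steps above amounts to a small dispatch table. The following Python sketch shows the resulting stage order for each flag combination. `stages_for_run` is a hypothetical name, raising an error when both `--local` and `--sdd` are passed is an assumption (the text only says they are mutually exclusive), and the "already MERGED" check for the monitor is not modeled.

```python
def stages_for_run(local=False, sdd=False, no_monitor=False):
    """Return the ordered list of commands /sse:run would invoke."""
    if local and sdd:
        # Assumption: reject the combination outright.
        raise ValueError("--local and --sdd are mutually exclusive")
    if sdd:
        return ["/sse:sdd"]  # hand off, skip the rest
    stages = ["/sse:plan", "/sse:dev", "/sse:test"]
    if local:
        return stages  # stop before PR
    stages.append("/sse:pr")
    if not no_monitor:
        stages.append("/sse:pr-monitor")
    return stages
```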
13
23
 
14
24
  Follow .claude/agents/staff-software-engineer/guides/pipeline.md for retry, approval markers, token accounting, publish behavior.
15
25
 
@@ -0,0 +1,98 @@
1
+ ---
2
+ description: Spec-driven dev loop. Plan once, then dev↔test↔eval loop until PRP spec satisfied. Local only, no PR.
3
+ ---
4
+
5
+ Run spec-driven dev loop. Follow .claude/agents/staff-software-engineer/guides/sdd-loop.md.
6
+
7
+ ## Inputs
8
+
9
+ - Optional arg: path to source PRP. Default = latest approved PRP in `.claude/runtime/outputs/pm/prp/`.
10
+ - None found → abort. Tell user to run `/product-manager:prp` first.
11
+
12
+ feature_id = basename of PRP file (no .md).
13
+
14
+ ## Pre-flight (hard gate)
15
+
16
+ Run sensor `.claude/agents/staff-software-engineer/sensors/prp-has-acceptance-criteria.md` on source PRP. Fail → block, return blocker. Do not proceed.
17
+
18
+ Print header card. Format: `.claude/scripts/stage-card.md`.
19
+
20
+ ```
21
+ ━━━ /sse:sdd · {feature_id} ━━━
22
+ **guides:** sdd-loop.md, pipeline.md
23
+ **refs:** prp/{feature_id}.md
24
+ **sensors:** prp-has-acceptance-criteria
25
+ **eval:** spec-satisfied (supervisor, per iter)
26
+ **next:** /sse:pr (after PASS, manual trigger)
27
+ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
28
+ ```
29
+
30
+ ## Steps
31
+
32
+ 1. **Plan once.** Invoke `/sse:plan`. Wait approved marker. Hard stop if plan fails.
33
+
34
+ 2. **Goal loop. Cap = 3 iters.**
35
+
36
+ For iter in 1..3:
37
+
38
+ a. Write marker `.claude/runtime/outputs/sse/.markers/{feature_id}.sdd-iter{N}-dev.start`.
39
+ b. Invoke `/sse:dev`. If iter > 1, pass `--focus="{next_iter_focus from prior eval}"`.
40
+ c. Invoke `/sse:test`. Capture report path.
41
+ d. Run supervisor eval `.claude/agents/staff-software-engineer/evals/spec-satisfied.md`. **Use fresh session** (Task tool with subagent_type=general-purpose, prompt includes PRP path + dev summary path + test report path + `git diff main...HEAD`). Worker context must not leak.
42
+ e. Parse eval JSON output. Verdict PASS → break. FAIL → record `next_iter_focus`, continue.
43
+
44
+ 3. **Write transcript** `.claude/runtime/outputs/sse/sdd/{feature_id}.md`. Format per `guides/sdd-loop.md` § Output artifact.
45
+
46
+ 4. **PASS path:** append approval marker:
47
+ ```
48
+ <!-- approved: {YYYY-MM-DD} verdict=PASS iters={N} -->
49
+ ```
50
+
51
+ 5. **FAIL path (cap hit):** append:
52
+ ```
53
+ <!-- blocked: {YYYY-MM-DD} verdict=FAIL iters=3 -->
54
+ ```
55
+ Return blocker listing NOT_MET criteria + UNCLEAR items.
56
+
57
+ ## Reply format
58
+
59
+ PASS:
60
+ ```
61
+ SDD loop PASS. {N} iter(s).
62
+ feature: {feature_id}
63
+ branch: {branch}
64
+ commits: {M} ({short-sha}, ...)
65
+ criteria: {met}/{total} MET
66
+ gates: {green}/{total} GREEN
67
+ manual: {N} pending (user verify)
68
+ transcript: .claude/runtime/outputs/sse/sdd/{feature_id}.md
69
+ next: review diff, run /sse:pr when ready
70
+ ```
71
+
72
+ FAIL (cap hit):
73
+ ```
74
+ SDD loop FAIL. cap hit at 3 iters.
75
+ feature: {feature_id}
76
+ branch: {branch}
77
+ blockers:
78
+ - criterion: "{text}" status: NOT_MET reason: {evidence}
79
+ - gate: "{cmd}" status: RED exit: {N}
80
+ transcript: .claude/runtime/outputs/sse/sdd/{feature_id}.md
81
+ next: address blockers, re-run /sse:sdd
82
+ ```
83
+
84
+ ## Context tiers
85
+
86
+ Before iter 1, check for cached context per `.claude/shared/context-strategy.md`:
87
+ - repomix pack at `.claude/runtime/cache/repomix/{feature_id}.xml` → supervisor eval reads it for richer judgment
88
+ - graphify graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → supervisor eval queries for "does diff break callers"
89
+
90
+ Neither present + repo big → suggest user run `/context:pack {feature_id} --include "{paths from PRP}"` once before continuing. Don't auto-build (manual cmds only, per harness pattern).
91
+
92
+ ## Rules
93
+
94
+ - **No PR.** This command never opens PR. User decides via `/sse:pr`.
95
+ - **No push.** Commits stay local on dev branch.
96
+ - **Fresh evaluator session.** Worker self-eval defeats supervisor pattern.
97
+ - **3 iter cap.** Do not exceed. Cap hit = real signal of spec/code mismatch.
98
+ - **Print iter banner** between iters: `━━━ SDD iter {N}/3 ━━━` so user can follow loop.
File without changes
File without changes
File without changes
@@ -0,0 +1,58 @@
1
+ #!/usr/bin/env bash
2
+ # Build knowledge graph of target repo with graphify.
3
+ # Output: .claude/runtime/cache/graphify/{repo_slug}/graphify-out/
4
+ #
5
+ # Code files processed locally via Tree-sitter — no API key needed.
6
+ # Non-code files (docs, PDFs, images) require an LLM key if present.
7
+ #
8
+ # Usage:
9
+ # graph-repo.sh [target_dir] [--update] [--deep] [--wiki]
10
+ #
11
+ # Defaults: target_dir = cwd, fresh build (rerun overwrites)
12
+
13
+ set -e
14
+
15
+ TARGET="${PWD}"
16
+ EXTRA_ARGS=()
17
+
18
+ while [ $# -gt 0 ]; do
19
+ case "$1" in
20
+ --update) EXTRA_ARGS+=(--update); shift ;;
21
+ --deep) EXTRA_ARGS+=(--mode deep); shift ;;
22
+ --wiki) EXTRA_ARGS+=(--wiki); shift ;;
23
+ --*) echo "unknown flag: $1 (pass-through flags: --update --deep --wiki)"; exit 2 ;;
24
+ *) TARGET="$1"; shift ;;
25
+ esac
26
+ done
27
+
28
+ command -v graphify >/dev/null 2>&1 || {
29
+ echo "graphify not installed"
30
+ echo "install: uv tool install graphifyy | pipx install graphifyy | pip install graphifyy"
31
+ echo "note: PyPI pkg is graphifyy (double y), CLI cmd is graphify"
32
+ echo "docs: https://github.com/safishamsi/graphify"
33
+ exit 127
34
+ }
35
+
36
+ ABS_TARGET="$(cd "$TARGET" && pwd)"
37
+ SLUG="$(basename "$ABS_TARGET")-$(echo "$ABS_TARGET" | shasum | cut -c1-8)"
38
+ CACHE_DIR=".claude/runtime/cache/graphify/$SLUG"
39
+
40
+ mkdir -p "$CACHE_DIR"
41
+
42
+ echo "graph: $ABS_TARGET → $CACHE_DIR/graphify-out/"
43
+ [ ${#EXTRA_ARGS[@]} -gt 0 ] && echo " args: ${EXTRA_ARGS[*]}"
44
+
45
+ # graphify writes graphify-out/ into cwd. cd cache, run with abs target.
46
+ (cd "$CACHE_DIR" && graphify "$ABS_TARGET" "${EXTRA_ARGS[@]}")
47
+
48
+ OUT="$CACHE_DIR/graphify-out"
49
+ if [ ! -d "$OUT" ]; then
50
+ echo "expected output dir not found: $OUT"
51
+ exit 1
52
+ fi
53
+
54
+ echo "ok"
55
+ echo " graph: $OUT/graph.json"
56
+ echo " view: open $OUT/graph.html"
57
+ echo " report: $OUT/GRAPH_REPORT.md"
58
+ echo " re-run with --update for incremental refresh"
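A consumer that needs to locate the graph for a given repo has to reproduce the script's slug. The Python sketch below mirrors the `basename` plus 8-character hash scheme, assuming `shasum` is run with its default SHA-1 algorithm and that `echo` appends a trailing newline to the hashed path. `graph_cache_dir` is a hypothetical helper, and paths that involve symlinks may normalize differently from the shell's `cd ... && pwd`.

```python
import hashlib
import os
from pathlib import Path


def graph_cache_dir(target: str, project_root: str = ".") -> Path:
    """Compute the graphify-out cache directory the script would use for target."""
    abs_target = os.path.abspath(target)
    # echo "$ABS_TARGET" | shasum  hashes the path plus a newline (SHA-1 by default).
    digest = hashlib.sha1((abs_target + "\n").encode()).hexdigest()[:8]
    slug = f"{os.path.basename(abs_target)}-{digest}"
    return Path(project_root) / ".claude" / "runtime" / "cache" / "graphify" / slug / "graphify-out"
```

Including a hash of the absolute path in the slug keeps two checkouts with the same directory name from sharing a cache entry.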
@@ -0,0 +1,54 @@
1
+ #!/usr/bin/env bash
2
+ # Pack target repo with repomix. Snapshot for AI context.
3
+ # Output: .claude/runtime/cache/repomix/{feature_id}.xml
4
+ #
5
+ # Usage:
6
+ # pack-repo.sh <feature_id> [target_dir] [--include glob] [--style xml|markdown]
7
+ #
8
+ # Defaults:
9
+ # target_dir = cwd
10
+ # style = xml
11
+ # include = repomix default (.gitignore aware)
12
+
13
+ set -e
14
+
15
+ FEATURE_ID="${1:?feature_id required}"
16
+ shift || true
17
+
18
+ TARGET="${PWD}"
19
+ STYLE="xml"
20
+ INCLUDE=""
21
+
22
+ while [ $# -gt 0 ]; do
23
+ case "$1" in
24
+ --include) INCLUDE="$2"; shift 2 ;;
25
+ --style) STYLE="$2"; shift 2 ;;
26
+ --*) echo "unknown flag: $1"; exit 2 ;;
27
+ *) TARGET="$1"; shift ;;
28
+ esac
29
+ done
30
+
31
+ command -v repomix >/dev/null 2>&1 || {
32
+ echo "repomix not installed"
33
+ echo "install: npm i -g repomix | brew install repomix"
34
+ echo "docs: https://repomix.com"
35
+ exit 127
36
+ }
37
+
38
+ CACHE_ROOT="$PWD/.claude/runtime/cache/repomix"  # absolute, so -o still points here after cd "$TARGET"
39
+ mkdir -p "$CACHE_ROOT"
40
+ OUT="$CACHE_ROOT/${FEATURE_ID}.${STYLE}"
41
+
42
+ ARGS=(--style "$STYLE" -o "$OUT")
43
+ [ -n "$INCLUDE" ] && ARGS+=(--include "$INCLUDE")
44
+
45
+ echo "pack: $TARGET → $OUT"
46
+ (cd "$TARGET" && repomix "${ARGS[@]}" --quiet)
47
+
48
+ TOKENS=$(grep -oE "Total Tokens:.*" "$OUT" 2>/dev/null | head -1 || true)
49
+ SIZE=$(wc -c < "$OUT" | tr -d ' ')
50
+
51
+ echo "ok"
52
+ echo " path: $OUT"
53
+ echo " size: ${SIZE} bytes"
54
+ if [ -n "$TOKENS" ]; then echo " tokens: $TOKENS"; fi
@@ -92,7 +92,12 @@
92
92
  "Bash(echo \"exit=$?\")",
93
93
  "Bash(echo \"next exit=$? \\(expect 1, nothing to do\\)\")",
94
94
  "Bash(:)",
95
- "Bash(ffmpeg -y -i out/demo.mp4 -vf \"fps=8,scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer:bayer_scale=5\" -loop 0 preview.gif)"
95
+ "Bash(ffmpeg -y -i out/demo.mp4 -vf \"fps=8,scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer:bayer_scale=5\" -loop 0 preview.gif)",
96
+ "Bash(mkdir -p /Users/pierryborges/Development/harness-kit/.claude/runtime/outputs/sse/sdd)",
97
+ "Bash(touch /Users/pierryborges/Development/harness-kit/.claude/runtime/outputs/sse/sdd/.gitkeep)",
98
+ "WebFetch(domain:graphify.net)",
99
+ "Bash(mkdir -p /Users/pierryborges/Development/harness-kit/.claude/commands/context /Users/pierryborges/Development/harness-kit/.claude/shared /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/repomix /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/graphify)",
100
+ "Bash(touch /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/repomix/.gitkeep /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/graphify/.gitkeep)"
96
101
  ]
97
102
  }
98
103
  }
@@ -0,0 +1,102 @@
1
+ # Context Strategy
2
+
3
+ Shared guide for PM + SSE agents. When pulling target-repo context into a stage, pick the right tier.
4
+
5
+ Three tiers. Cost goes up from tier 1 to tier 3. Capability goes up too. Match tier to need.
6
+
7
+ | Tier | Tool | When | Cost | Freshness |
8
+ |---|---|---|---|---|
9
+ | 1 | `grep` / `Read` | small repo, narrow question, one-shot | free | always live |
10
+ | 2 | `repomix` snapshot | feature-scoped context, deterministic handoff | low (one pack per feature) | frozen at pack time |
11
+ | 3 | `graphify` graph | long-lived repo, queryable, multi-feature reuse | medium upfront (one build per repo), tiny per query | merged via `--update` per commit (with hook) |
12
+
13
+ ## Decision tree
14
+
15
+ ```
16
+ question = "where does X live?" or "what calls Y?"
17
+ ├── repo <20 files? ──── grep / Read
18
+ ├── graph cached + repo unchanged? ── query graph.json
19
+ ├── graph stale or missing + repo big? ── /context:graph (one shot)
20
+ └── feature-scoped narrow scope? ──── /context:pack --include "..."
21
+
22
+ question = "give me the full repo for an independent eval"
23
+ ├── repo packs under 200k tokens? ── /context:pack
24
+ └── too big? ── /context:pack --include "{narrowed glob}"
25
+
26
+ question = "supervisor eval reads diff + minimal context (SDD loop)"
27
+ ├── pack of {changed files} ∪ {PRP files touched} → /context:pack --include
28
+ └── add graph query for impact analysis (v2)
29
+ ```
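The first question in the decision tree above can be sketched as a small shell function. This is only an illustration under stated assumptions: `pick_context_tier` and its three arguments are hypothetical names, not part of harness-kit, the caller is assumed to supply the "repo changed" signal, and any repo of 20 or more files is treated as "big". The feature-scoped `--include` branch is left out because it depends on the question being asked, not on repo state.

```shell
# Hypothetical helper: map repo state to a context tier.
# $1 = file count, $2 = path to cached graph.json, $3 = "yes" if repo changed since the graph was built
pick_context_tier() (
  file_count="$1"
  graph_json="$2"
  repo_changed="$3"

  if [ "$file_count" -lt 20 ]; then
    echo "grep"            # small repo: tier 1
  elif [ -f "$graph_json" ] && [ "$repo_changed" = "no" ]; then
    echo "graph-query"     # cached graph still matches the repo: query it
  else
    echo "context:graph"   # graph stale or missing: rebuild via /context:graph
  fi
)
```

For example, `pick_context_tier 500 .claude/runtime/cache/graphify/{slug}/graphify-out/graph.json no` prints `graph-query` only when that file exists.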
30
+
31
+ ## Per-stage hookup
32
+
33
+ ### `/product-manager:prp`
34
+ - § 4 Context discovery: prefer graph query (if cached) over grep
35
+ - `Repos and files touched` list: graph + grep both OK; graph faster
36
+ - No pack here — PRP is upstream of pack
37
+
38
+ ### `/sse:plan`
39
+ - Read order:
40
+ 1. source PRP
41
+ 2. cached graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` if present
42
+ 3. cached pack at `.claude/runtime/cache/repomix/{feature_id}.{ext}` if present
43
+ 4. fall back to grep + Read on target repo
44
+ - Don't double-load. If pack covers all PRP-listed files, skip grep.
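The `/sse:plan` read order above could be resolved along these lines. This is a minimal sketch, not the plugin's actual logic: `plan_context_sources` is a hypothetical helper, and it assumes the fallback to grep applies only when neither cache is present.

```shell
# Hypothetical helper: list cached context sources in the documented read order.
# $1 = cache root, $2 = repo slug, $3 = feature id
plan_context_sources() (
  cache_root="$1"
  slug="$2"
  feature_id="$3"
  found="no"

  graph="$cache_root/graphify/$slug/graphify-out/graph.json"
  if [ -f "$graph" ]; then
    echo "graph:$graph"
    found="yes"
  fi

  # The pack extension follows the repomix style, so match any extension.
  for pack in "$cache_root/repomix/$feature_id".*; do
    if [ -f "$pack" ]; then
      echo "pack:$pack"
      found="yes"
    fi
  done

  # Nothing cached: fall back to grep + Read on the live repo.
  if [ "$found" = "no" ]; then
    echo "fallback:grep"
  fi
)
```

A caller would read the source PRP first, then consume these lines top to bottom.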
45
+
46
+ ### `/sse:dev`
47
+ - Never reads stale pack/graph for the live code — code is mutating per commit.
48
+ - Reads plan only. Use grep/Read on live repo for guidance lookups.
49
+
50
+ ### `/sse:test`
51
+ - No context tools. Runs detected test command.
52
+
53
+ ### `/sse:sdd` supervisor eval
54
+ - Fresh session reads:
55
+ 1. PRP (full)
56
+ 2. dev summary
57
+ 3. test report
58
+ 4. `git diff main...HEAD`
59
+ - If pack exists: ALSO read `cache/repomix/{feature_id}.{ext}` for richer judgment
60
+ - If graph exists: ALSO query graph for "does diff break callers of touched symbols"
61
+
62
+ ## Cache layout
63
+
64
+ ```
65
+ .claude/runtime/cache/
66
+ ├── repomix/
67
+ │ ├── .gitkeep
68
+ │ └── {feature_id}.xml ephemeral, per-feature, cleared on /pipeline:reset
69
+ └── graphify/
70
+ ├── .gitkeep
71
+ └── {repo_slug}/ long-lived, per-repo, manual rebuild or --update hook
72
+ └── graphify-out/
73
+ ├── graph.json
74
+ ├── graph.html
75
+ └── GRAPH_REPORT.md
76
+ ```
77
+
78
+ `{repo_slug}` = `basename(abs_target)` + `-` + `shasum(abs_target)[:8]`. Stable across machines only when the absolute path is identical.
79
+
80
+ ## Invalidation
81
+
82
+ | Cache | Invalidated by |
83
+ |---|---|
84
+ | `repomix/{feature_id}.*` | `/pipeline:reset`, manual `rm`, or stage detecting target diff > 100 LOC since pack |
85
+ | `graphify/{slug}/` | manual `/context:graph --update`, graphify git-hook auto-commit refresh, or manual `rm -rf` |
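The "target diff > 100 LOC since pack" rule could be checked by summing `git diff --numstat` output. The sketch below is an assumption about how a stage might do it. `changed_loc` and `pack_is_stale` are hypothetical names, and how the pack-time commit is recorded is not specified in this diff.

```shell
# Hypothetical helpers: read `git diff --numstat` lines on stdin.
# Each line is "<added>\t<deleted>\t<path>"; binary files show "-" and are skipped.
changed_loc() {
  awk '$1 ~ /^[0-9]+$/ { n += $1 } $2 ~ /^[0-9]+$/ { n += $2 } END { print n + 0 }'
}

# Succeeds when more than 100 lines changed, matching the threshold in the table above.
pack_is_stale() {
  [ "$(changed_loc)" -gt 100 ]
}
```

Usage would look like `git -C "$TARGET" diff --numstat "$PACK_COMMIT" | pack_is_stale && echo "repack"`, where `PACK_COMMIT` is a placeholder for whatever ref the stage stored at pack time.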
86
+
87
+ ## Cost notes
88
+
89
+ - **Repomix**: ~1-2s build for medium repo. Token count printed. Fits inside context.
90
+ - **Graphify code-only**: Tree-sitter local, ~5-30s for medium repo, no API key, no network.
91
+ - **Graphify --with-docs**: LLM semantic extraction on docs/PDFs/images. Sends semantic descriptions only (not raw code). Requires API key per their docs. Opt-in only; this harness defaults to code-only.
92
+
93
+ ## Install
94
+
95
+ Both optional. Detect on `hk install`; print install hint if missing. Don't auto-install.
96
+
97
+ ```
98
+ repomix: npm i -g repomix | brew install repomix
99
+ graphify: uv tool install graphifyy | pipx install graphifyy
100
+ ```
101
+
102
+ PyPI package name is `graphifyy` (double y), CLI command is `graphify`.
package/AGENTS.md CHANGED
@@ -31,8 +31,10 @@ CLAUDE.md ← project context (style, role, conventions
31
31
  │ └── staff-software-engineer.md orchestrator agent
32
32
  ├── commands/ ← slash-command entry points
33
33
  │ ├── product-manager/ /product-manager:{prd,prp,run}
34
- │ ├── sse/ /sse:{plan,dev,test,pr,run}
34
+ │ ├── sse/ /sse:{plan,dev,test,pr,run,sdd}
35
+ │ ├── context/ /context:{pack,graph}
35
36
  │ └── pipeline/ /pipeline:{continue,reset}
37
+ ├── shared/ ← cross-agent guides (context-strategy.md)
36
38
  ├── conventions/ ← generic conventions (overridable per repo)
37
39
  ├── hooks/ ← root lifecycle hooks (session-start, prompt, postedit, postwrite, status-line, activity-pre-read)
38
40
  ├── scripts/ ← root utilities (pipeline.py, activity.py, pr-monitor.py)
@@ -58,6 +60,7 @@ CLAUDE.md ← project context (style, role, conventions
58
60
  |---|---|---|---|
59
61
  | `product-manager` | Generate PRD then PRP for a squad/feature | `/product-manager:run` | `.claude/agents/product-manager.md` |
60
62
  | `staff-software-engineer` | Full engineering pipeline: plan → dev → test → pr | `/sse:run` | `.claude/agents/staff-software-engineer.md` |
63
+ | `staff-software-engineer` (sdd) | Spec-driven dev loop, local only: plan once + dev↔test↔eval until PRP spec met | `/sse:sdd` | `.claude/agents/staff-software-engineer/guides/sdd-loop.md` |
61
64
 
62
65
  ---
63
66
 
@@ -75,6 +78,10 @@ When the user types a slash command, the entry point is unambiguous. When the us
75
78
  | "implement the plan" | `/sse:dev` |
76
79
  | "run tests" | `/sse:test` |
77
80
  | "open the PR" | `/sse:pr` |
81
+ | "dev + test locally, no PR" | `/sse:run --local` |
82
+ | "spec-driven loop until PRP met" | `/sse:sdd` |
83
+ | "snapshot a repo for AI context" | `/context:pack <feature_id>` |
84
+ | "build knowledge graph of a repo" | `/context:graph` |
78
85
  | "continue the active pipeline" | `/pipeline:continue` |
79
86
  | "abandon active feature" | `/pipeline:reset` |
80
87
 
@@ -94,6 +101,21 @@ Each stage:
94
101
 
95
102
  Phase markers under `.claude/runtime/outputs/{pm,sse}/.markers/` track stage boundaries (`{feature}.{phase}.{start,end}`) for token accounting and pipeline status.
96
103
 
104
+ ### SDD variant
105
+
106
+ `/sse:sdd` adds a spec-driven loop replacing the single-shot `dev → test`:
107
+
108
+ ```
109
+ prd → prp → plan → [dev ↔ test ↔ spec-satisfied eval] → [user gate] → pr
110
+ ↑ loop, cap 3 iters stops local
111
+ ```
112
+
113
+ - Predicate built from PRP `## 3) What → Success criteria (verifiable)` + `## 6) Validation gates`.
114
+ - Pre-flight sensor `prp-has-acceptance-criteria` blocks if PRP not testable.
115
+ - Per-iter supervisor eval `spec-satisfied` runs in fresh session (no worker context).
116
+ - Transcript at `.claude/runtime/outputs/sse/sdd/{feature_id}.md`.
117
+ - PR never auto-opened. User triggers `/sse:pr` after reviewing transcript.
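The control flow described above (iterate, evaluate, stop at PASS or at the cap) can be sketched as a bounded loop. This is an illustration only: `sdd_loop` and its callback are hypothetical, and the real loop runs dev, test, and a fresh-session supervisor eval rather than a single shell callback.

```shell
# Hypothetical sketch: run an eval callback until it passes or the cap is reached.
# $1 = name of a function or command that receives the iteration number and
#      returns 0 on PASS, $2 = iteration cap (defaults to 3, as documented above)
sdd_loop() (
  eval_cmd="$1"
  cap="${2:-3}"
  iter=1
  while [ "$iter" -le "$cap" ]; do
    if "$eval_cmd" "$iter"; then
      echo "PASS iter=$iter"
      exit 0
    fi
    iter=$((iter + 1))
  done
  # Cap hit without a PASS verdict: report a blocker and stop.
  echo "BLOCKED cap=$cap"
  exit 1
)
```

Either way the loop stops locally; opening the PR stays a separate manual step.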
118
+
97
119
  ---
98
120
 
99
121
  ## Runtime (state, outputs, hooks)
@@ -103,7 +125,8 @@ Generated artifacts and lifecycle hooks live under `.claude/runtime/`.
103
125
  | Path | Contents |
104
126
  |---|---|
105
127
  | `runtime/outputs/pm/{prd,prp,tokens,.markers}/` | PRD/PRP artifacts, token JSONs, phase markers |
106
- | `runtime/outputs/sse/{plan,dev,test,pr,tokens,.markers}/` | Plan/dev/test/pr artifacts, token JSONs, phase markers |
128
+ | `runtime/outputs/sse/{plan,dev,test,pr,sdd,tokens,.markers}/` | Plan/dev/test/pr/sdd-loop artifacts, token JSONs, phase markers |
129
+ | `runtime/cache/{repomix,graphify}/` | Optional context cache: repomix snapshots (per feature_id) + graphify graphs (per repo). See `.claude/shared/context-strategy.md` |
107
130
  | `runtime/hooks/<agent>/` | Per-agent lifecycle hooks (post-write, post-eval, pre-prp-check) |
108
131
  | `runtime/scripts/<agent>/` | Per-agent utilities (sensor-runner, token-phase, link-validator, confluence-publish) |
109
132
 
package/CLAUDE.md CHANGED
@@ -51,12 +51,24 @@ Engineering pipeline. Slash commands:
51
51
  - `/sse:dev` implement the plan, run convention gates
52
52
  - `/sse:test` run repo tests
53
53
  - `/sse:pr` open the draft PR
54
- - `/sse:run` full SSE pipeline
54
+ - `/sse:run` full SSE pipeline. `--local` skip PR. `--sdd` use spec-driven loop variant.
55
+ - `/sse:sdd` spec-driven dev loop. Plan once + dev↔test↔eval loop until PRP spec satisfied. Local only, no PR.
55
56
 
56
57
  Sub-agent `staff-software-engineer` is also Task-tool-invokable. Assets: `.claude/agents/staff-software-engineer/`.
57
58
 
58
59
  Full pipeline order: `prd → prp → plan → dev → test → pr`. Each stage gets an approval marker. The status bar tracks the current one.
59
60
 
61
+ SDD variant: `prd → prp → plan → [dev ↔ test ↔ spec-satisfied eval] → [user gate] → pr`. Loop cap 3 iters. Predicate built from PRP "Success criteria (verifiable)" + "Validation gates". See `.claude/agents/staff-software-engineer/guides/sdd-loop.md`.
62
+
63
+ ### context tools (optional, manual)
64
+
65
+ Two opt-in helpers for big target repos. Both bind to external CLIs; missing binary → cmd prints install hint.
66
+
67
+ - `/context:pack <feature_id>` — `repomix` snapshot to `.claude/runtime/cache/repomix/{feature_id}.xml`. Ephemeral per feature. Cleared on `/pipeline:reset`.
68
+ - `/context:graph [repo]` — `graphify` knowledge graph to `.claude/runtime/cache/graphify/{slug}/graphify-out/`. Long-lived per repo, code-only mode needs no API key (Tree-sitter local).
69
+
70
+ PRP, plan, and SDD supervisor eval consult cache when present and fall back to grep otherwise. Tier order + when-to-use in `.claude/shared/context-strategy.md`.
71
+
60
72
  ## Project conventions override
61
73
 
62
74
  Each target repo can override SSE defaults with files in `.claude/conventions/`:
package/README.md CHANGED
@@ -2,9 +2,9 @@
2
2
 
3
3
  # harness-kit
4
4
 
5
- From idea to merged PR. One pipeline. Six stages.
5
+ From idea to merged PR. One pipeline. Six stages. Or a spec-driven loop until the spec is met.
6
6
 
7
- [![Version](https://img.shields.io/badge/version-4.0.0-blue.svg)](VERSION)
7
+ [![Version](https://img.shields.io/badge/version-4.1.0-blue.svg)](VERSION)
8
8
  [![Claude Code](https://img.shields.io/badge/Claude%20Code-AGENTS.md-8b5cf6.svg)](https://claude.ai/code)
9
9
  [![Agents](https://img.shields.io/badge/agents-2-success.svg)](#agents)
10
10
  [![License](https://img.shields.io/badge/license-MIT-lightgrey.svg)](LICENSE)
@@ -29,6 +29,15 @@ prd → prp → plan → dev → test → pr
29
29
 
30
30
  Each stage produces a markdown artifact, gated by **deterministic sensors** (pass/fail) and a **scored eval** (≥ 8.0). After the PR opens, an in-session monitor watches for merge.
31
31
 
32
+ For local-only work, two extra modes skip the PR stage:
33
+
34
+ ```
35
+ /sse:run --local plan → dev → test → STOP (single shot, no loop)
36
+ /sse:sdd plan → [dev↔test↔eval]×3 → STOP (spec-driven goal loop)
37
+ ```
38
+
39
+ `/sse:sdd` is the SDD variant: the PRP is the spec, and an independent supervisor session re-checks the repo against `Success criteria (verifiable)` + `Validation gates` after every dev↔test iteration. PR is never auto-opened — user runs `/sse:pr` after reviewing the loop transcript.
40
+
32
41
  ---
33
42
 
34
43
  ## Install
@@ -51,11 +60,83 @@ CLI: `hk install` · `hk update` · `hk uninstall` · `hk status` · `hk version
51
60
 
52
61
  ---
53
62
 
63
+ ## Getting started
64
+
65
+ Pick the flow that matches the task. All of them share the same pipeline state, so you can switch between them mid-feature.
66
+
67
+ ### Big task — full pipeline (PM + Eng)
68
+
69
+ A new feature with stakes, ambiguity, or a Jira ticket attached. You want a written PRD, a thought-through PRP, a plan, code, tests, and a PR.
70
+
71
+ ```
72
+ /product-manager:run # drafts PRD then PRP, with sensor + eval gates
73
+ /sse:run # plans, implements, tests, opens PR, watches for merge
74
+ ```
75
+
76
+ Approve each artifact when prompted. The status bar tracks where you are in the six stages.
77
+
78
+ ### Spec only — no code yet
79
+
80
+ You need the PRD and PRP to align with stakeholders before any engineering work. Stop after the PRP.
81
+
82
+ ```
83
+ /product-manager:run
84
+ ```
85
+
86
+ When eng is ready, hand them the repo and they run `/sse:run` against the approved PRP.
87
+
88
+ ### Dev only — small change, plan in your head
89
+
90
+ A bug fix, a small enhancement, or a refactor where writing a PRD would be theatre. Skip PM, run engineering directly.
91
+
92
+ ```
93
+ /sse:run # plan → dev → test → PR
94
+ /sse:run --local # plan → dev → test, stop before PR (push manually later)
95
+ ```
96
+
97
+ Or run a single stage if that's all you need:
98
+
99
+ ```
100
+ /sse:plan # just the plan
101
+ /sse:dev # just the code (against an approved plan)
102
+ /sse:test # just the tests
103
+ /sse:pr # just open the PR
104
+ ```
105
+
106
+ ### Spec-driven loop — iterate locally until the PRP is satisfied
107
+
108
+ You have an approved PRP and want Claude to loop dev↔test until the spec actually passes, judged by an independent supervisor session. No PR until you say so.
109
+
110
+ ```
111
+ /sse:sdd # plan once + dev↔test↔spec-satisfied eval, cap 3 iters
112
+ # review .claude/runtime/outputs/sse/sdd/{feature_id}.md
113
+ /sse:pr # manual gate when ready
114
+ ```
115
+
116
+ The loop predicate is built from the PRP's `Success criteria (verifiable)` and `Validation gates` sections. Both must be present and concrete, or the `prp-has-acceptance-criteria` sensor blocks before the first iteration runs. If the cap is hit without a PASS verdict, the loop returns a blocker listing the unmet criteria.
117
+
118
+ ### Resume — pick up where you left off
119
+
120
+ Closed the session, restarted Claude Code, or got interrupted. State persists at `.claude/.pipeline-state.json`.
121
+
122
+ ```
123
+ /pipeline:continue # next pending stage for the active feature
124
+ /pipeline:reset # abandon the active run and start fresh
125
+ ```
126
+
127
+ When the PR merges, the in-session monitor clears state automatically.
128
+
129
+ ---
130
+
54
131
  ## Use it
55
132
 
56
133
  ```
57
134
  /product-manager:run draft PRD then PRP
58
135
  /sse:run plan, dev, test, open PR, watch for merge
136
+ /sse:run --local plan, dev, test — stop before PR
137
+ /sse:sdd spec-driven loop: dev↔test↔eval until PRP met, no PR
138
+ /context:pack <feature_id> repomix snapshot of target repo (per-feature cache)
139
+ /context:graph [repo] graphify knowledge graph of a target repo (per-repo cache)
59
140
  /pipeline:continue resume next pending stage
60
141
  /pipeline:reset abandon active run
61
142
  ```
@@ -70,8 +151,9 @@ Need just one stage? Each is its own slash command:
70
151
  | `dev` | `/sse:dev` | `code-conventions`, `test-coverage`, `dev-structure` · `dev-quality` |
71
152
  | `test` | `/sse:test` | `test-structure` · `test-quality` |
72
153
  | `pr` | `/sse:pr` | `pr-structure` · `pr-quality` · auto-arms `/sse:pr-monitor` |
154
+ | `sdd` | `/sse:sdd` | `prp-has-acceptance-criteria` (pre-flight) · `spec-satisfied` per iter (fresh session) · cap 3 iters |
73
155
 
74
- Sensors block on failure (Claude regenerates). Evals score; threshold 8.0; retried up to 3 times.
156
+ Sensors block on failure (Claude regenerates). Evals score; threshold 8.0; retried up to 3 times. SDD eval returns PASS/FAIL — FAIL re-enters the loop with a `next_iter_focus` hint.
75
157
 
76
158
  ---
77
159
 
@@ -87,12 +169,13 @@ Registered in [`AGENTS.md`](./AGENTS.md) at the repo root. Each ships its own se
87
169
  - Guides: `pipeline.md`, `prd-guidelines.md`, `prp-guidelines.md`, `writing-style.md`, `templates/`, `examples/`
88
170
  - [Full docs →](.claude/agents/product-manager/README.md)
89
171
 
90
- ### `staff-software-engineer` — turns an approved PRP into a merged PR
172
+ ### `staff-software-engineer` — turns an approved PRP into a merged PR (or a satisfied spec)
91
173
 
92
174
  - Skills: `backend`, `web`, `mobile`, `devops` (auto-detected from repo)
93
- - Sensors: 6 (`plan-structure`, `code-conventions`, `test-coverage`, `dev-structure`, `test-structure`, `pr-structure`)
94
- - Evals: 4 (`plan`, `dev`, `test`, `pr` quality)
95
- - Guides: `pipeline.md`, `coding-style.md`, `commit-style.md`, `conventions-override.md`
175
+ - Sensors: 7 (`plan-structure`, `code-conventions`, `test-coverage`, `dev-structure`, `test-structure`, `pr-structure`, `prp-has-acceptance-criteria`)
176
+ - Evals: 5 (`plan`, `dev`, `test`, `pr` quality; `spec-satisfied` supervisor for SDD loop)
177
+ - Guides: `pipeline.md`, `coding-style.md`, `commit-style.md`, `conventions-override.md`, `sdd-loop.md`
178
+ - Modes: `/sse:run` (full pipeline), `/sse:run --local` (skip PR), `/sse:sdd` (spec-driven loop)
96
179
  - [Full docs →](.claude/agents/staff-software-engineer/README.md)
97
180
 
98
181
  ---
@@ -146,13 +229,15 @@ What `hk install` lays down in your repo:
146
229
  ├── CLAUDE.md workspace style + role
147
230
  └── .claude/
148
231
  ├── agents/ agent definitions (sensors, evals, guides, skills)
149
- ├── commands/ slash command entry points
232
+ ├── commands/ slash command entry points (pm, sse, context, pipeline)
233
+ ├── shared/ cross-agent guides (context-strategy.md)
150
234
  ├── hooks/ status-line + lifecycle hooks
151
- ├── scripts/ pipeline.py · activity.py · pr-monitor.py
235
+ ├── scripts/ pipeline.py · activity.py · pr-monitor.py · pack-repo.sh · graph-repo.sh
152
236
  ├── runtime/
153
237
  │ ├── hooks/<agent>/ per-agent lifecycle (post-write, post-eval, pre-prp-check)
154
238
  │ ├── scripts/<agent>/ per-agent utilities (sensor-runner, token-phase, link-validator)
155
- └── outputs/{pm,sse}/ generated artifacts, markers, tokens
239
+ │ ├── outputs/{pm,sse}/ generated artifacts, markers, tokens (incl. sse/sdd/ loop transcripts)
240
+ │ └── cache/ repomix packs + graphify graphs (optional, gitignored)
156
241
  ├── conventions/ your per-repo overrides
157
242
  └── settings.json hook wiring
158
243
  ```
@@ -163,14 +248,25 @@ Full path-by-path map in [`AGENTS.md`](./AGENTS.md).
163
248
 
164
249
  ## Tooling
165
250
 
166
- | Tool | Why |
167
- |------|-----|
168
- | [Claude Code](https://claude.ai/code) | agent runtime |
169
- | python3 | sensors, token accounting, pipeline state |
170
- | [gh CLI](https://cli.github.com/) | opens PR, polls for merge |
171
- | git | branch + commit ops |
251
+ | Tool | Why | Required |
252
+ |------|-----|----------|
253
+ | [Claude Code](https://claude.ai/code) | agent runtime | yes |
254
+ | python3 | sensors, token accounting, pipeline state | yes |
255
+ | [gh CLI](https://cli.github.com/) | opens PR, polls for merge | for `/sse:pr` |
256
+ | git | branch + commit ops | yes |
257
+ | [repomix](https://repomix.com) | snapshot target repo for AI context (`/context:pack`) | optional |
258
+ | [graphify](https://github.com/safishamsi/graphify) | queryable knowledge graph of a repo (`/context:graph`) | optional |
259
+
260
+ Install optional tools:
261
+
262
+ ```bash
263
+ npm i -g repomix # or: brew install repomix
264
+ uv tool install graphifyy # or: pipx install graphifyy (CLI cmd is `graphify`)
265
+ ```
266
+
267
+ `hk install` detects both and prints a hint if missing — never auto-installs. See [`.claude/shared/context-strategy.md`](.claude/shared/context-strategy.md) for when each tier is worth it (grep vs pack vs graph).
172
268
 
173
- Optional: `jq` for token JSON queries. `JIRA_USERNAME` + `JIRA_API_TOKEN` to publish PRD/PRP to Confluence.
269
+ Other optional: `jq` for token JSON queries. `JIRA_USERNAME` + `JIRA_API_TOKEN` to publish PRD/PRP to Confluence.
174
270
 
175
271
  ---
176
272
 
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.0.0
1
+ 4.1.0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@pieerry/harness-kit",
3
- "version": "4.0.0",
3
+ "version": "4.1.0",
4
4
  "description": "Claude Code harness for product + engineering delivery. From idea to merged PR, one pipeline.",
5
5
  "author": "Space Metrics AI",
6
6
  "license": "MIT",
package/setup/install.sh CHANGED
@@ -24,6 +24,18 @@ fi
24
24
  command -v git >/dev/null 2>&1 || { echo "git not found"; exit 1; }
25
25
  command -v python3 >/dev/null 2>&1 || { echo "python3 not found"; exit 1; }
26
26
 
27
+ # Optional context tools (repomix snapshot, graphify knowledge graph).
28
+ # Both are no-ops if missing — /context:pack and /context:graph will print install hints.
29
+ if ! command -v repomix >/dev/null 2>&1; then
30
+ echo " optional: repomix not found. /context:pack disabled until you install it."
31
+ echo " npm i -g repomix | brew install repomix"
32
+ fi
33
+ if ! command -v graphify >/dev/null 2>&1; then
34
+ echo " optional: graphify not found. /context:graph disabled until you install it."
35
+ echo " uv tool install graphifyy | pipx install graphifyy | pip install graphifyy"
36
+ echo " (PyPI pkg graphifyy, double y; CLI cmd graphify)"
37
+ fi
38
+
27
39
  OLD_VERSION="$(cat "$TARGET/.claude/.hk-version" 2>/dev/null || echo "")"
28
40
 
29
41
  if [ -n "$OLD_VERSION" ] && [ "$OLD_VERSION" != "$VERSION" ]; then
@@ -57,30 +69,35 @@ for agent in product-manager staff-software-engineer; do
57
69
  done
58
70
 
59
71
  # 3) Per-agent runtime hooks + scripts (NOT outputs — that's target-side state)
60
- mkdir -p "$TARGET/.claude/runtime/hooks" "$TARGET/.claude/runtime/scripts" "$TARGET/.claude/runtime/outputs/pm/.markers" "$TARGET/.claude/runtime/outputs/sse/.markers"
72
+ mkdir -p "$TARGET/.claude/runtime/hooks" "$TARGET/.claude/runtime/scripts" "$TARGET/.claude/runtime/outputs/pm/.markers" "$TARGET/.claude/runtime/outputs/sse/.markers" "$TARGET/.claude/runtime/outputs/sse/sdd" "$TARGET/.claude/runtime/cache/repomix" "$TARGET/.claude/runtime/cache/graphify"
73
+ touch "$TARGET/.claude/runtime/cache/repomix/.gitkeep" "$TARGET/.claude/runtime/cache/graphify/.gitkeep"
61
74
  for agent in product-manager staff-software-engineer; do
62
75
  rm -rf "$TARGET/.claude/runtime/hooks/$agent"
63
76
  cp -R "$SOURCE_ROOT/.claude/runtime/hooks/$agent" "$TARGET/.claude/runtime/hooks/$agent"
64
- rm -rf "$TARGET/.claude/runtime/scripts/$agent"
65
- cp -R "$SOURCE_ROOT/.claude/runtime/scripts/$agent" "$TARGET/.claude/runtime/scripts/$agent"
66
77
  done
67
78
 
68
- # Re-resolve symlinks in SSE scripts (point at PM's shared utilities)
69
- for f in "$TARGET/.claude/runtime/scripts/staff-software-engineer/"*; do
70
- if [ -L "$f" ]; then
71
- name="$(basename "$f")"
72
- rm "$f"
73
- ln -sf "../product-manager/$name" "$f"
74
- fi
79
+ # PM scripts: copy real files. SSE scripts: build dir of symlinks → PM's shared utilities.
80
+ # (SSE scripts in source repo are symlinks; npm pack drops symlinks, so we recreate target-side.)
81
+ rm -rf "$TARGET/.claude/runtime/scripts/product-manager"
82
+ cp -R "$SOURCE_ROOT/.claude/runtime/scripts/product-manager" "$TARGET/.claude/runtime/scripts/product-manager"
83
+
84
+ rm -rf "$TARGET/.claude/runtime/scripts/staff-software-engineer"
85
+ mkdir -p "$TARGET/.claude/runtime/scripts/staff-software-engineer"
86
+ for name in sensor-runner.py token-phase.py; do
87
+ ln -sf "../product-manager/$name" "$TARGET/.claude/runtime/scripts/staff-software-engineer/$name"
75
88
  done
76
89
 
77
90
  # 4) Slash commands
78
91
  mkdir -p "$TARGET/.claude/commands"
79
- for ns in product-manager sse pipeline; do
92
+ for ns in product-manager sse pipeline context; do
80
93
  rm -rf "$TARGET/.claude/commands/$ns"
81
94
  cp -R "$SOURCE_ROOT/.claude/commands/$ns" "$TARGET/.claude/commands/$ns"
82
95
  done
83
96
 
97
+ # 4.5) Shared cross-agent guides
98
+ mkdir -p "$TARGET/.claude/shared"
99
+ cp -R "$SOURCE_ROOT/.claude/shared/." "$TARGET/.claude/shared/"
100
+
84
101
  # 5) Root harness hooks (status-line + pipeline tracking + live activity)
85
102
  mkdir -p "$TARGET/.claude/hooks"
86
103
  for h in status-line.sh pipeline-prompt.sh pipeline-postwrite.sh pipeline-postedit.sh pipeline-session-start.sh activity-pre-read.sh; do
@@ -88,12 +105,16 @@ for h in status-line.sh pipeline-prompt.sh pipeline-postwrite.sh pipeline-posted
88
105
  chmod +x "$TARGET/.claude/hooks/$h"
89
106
  done
90
107
 
91
- # 6) Root harness scripts (pipeline state, activity, PR monitor, stage-card)
108
+ # 6) Root harness scripts (pipeline state, activity, PR monitor, stage-card, context wrappers)
92
109
  mkdir -p "$TARGET/.claude/scripts"
93
110
  for s in pipeline.py activity.py pr-monitor.py; do
94
111
  cp "$SOURCE_ROOT/.claude/scripts/$s" "$TARGET/.claude/scripts/$s"
95
112
  chmod +x "$TARGET/.claude/scripts/$s"
96
113
  done
114
+ for s in pack-repo.sh graph-repo.sh; do
115
+ cp "$SOURCE_ROOT/.claude/scripts/$s" "$TARGET/.claude/scripts/$s"
116
+ chmod +x "$TARGET/.claude/scripts/$s"
117
+ done
97
118
  cp "$SOURCE_ROOT/.claude/scripts/stage-card.md" "$TARGET/.claude/scripts/stage-card.md"
98
119
 
99
120
  # 6.5) Strip any __pycache__/.pyc that hitched a ride from the source repo
@@ -209,5 +230,6 @@ echo "$VERSION" > "$TARGET/.claude/.hk-version"
209
230
 
210
231
  echo "done. restart Claude Code to load."
211
232
  echo " /product-manager:prd | :prp | :run"
212
- echo " /sse:plan | :dev | :test | :pr | :pr-monitor | :run"
233
+ echo " /sse:plan | :dev | :test | :pr | :pr-monitor | :run | :sdd"
234
+ echo " /context:pack | :graph"
213
235
  echo " /pipeline:continue | :reset"