npm - @pieerry/harness-kit - Versions diffs - 4.0.1 → 4.1.0 - Mend

@pieerry/harness-kit 4.0.1 → 4.1.0

Files changed (25)

package/.claude/agents/product-manager/README.md +7 -0
package/.claude/agents/product-manager/skills/prp/SKILL.md +7 -1
package/.claude/agents/staff-software-engineer/README.md +25 -11
package/.claude/agents/staff-software-engineer/evals/spec-satisfied.md +66 -0
package/.claude/agents/staff-software-engineer/guides/pipeline.md +7 -0
package/.claude/agents/staff-software-engineer/guides/sdd-loop.md +127 -0
package/.claude/agents/staff-software-engineer/sensors/prp-has-acceptance-criteria.md +35 -0
package/.claude/commands/context/graph.md +65 -0
package/.claude/commands/context/pack.md +59 -0
package/.claude/commands/sse/plan.md +7 -0
package/.claude/commands/sse/run.md +17 -7
package/.claude/commands/sse/sdd.md +98 -0
package/.claude/runtime/cache/graphify/.gitkeep +0 -0
package/.claude/runtime/cache/repomix/.gitkeep +0 -0
package/.claude/runtime/outputs/sse/sdd/.gitkeep +0 -0
package/.claude/scripts/graph-repo.sh +58 -0
package/.claude/scripts/pack-repo.sh +54 -0
package/.claude/settings.local.json +6 -1
package/.claude/shared/context-strategy.md +102 -0
package/AGENTS.md +25 -2
package/CLAUDE.md +13 -1
package/README.md +58 -17
package/VERSION +1 -1
package/package.json +1 -1
package/setup/install.sh +26 -4

package/.claude/agents/product-manager/README.md CHANGED

@@ -12,6 +12,8 @@ Registered in [`AGENTS.md`](../../../AGENTS.md). Agent definition: [`product-man
 Also invokable as sub-agent via Task tool with `subagent_type: "product-manager"`.
+PRP explorer (`skills/prp/SKILL.md`) consults optional context cache before grep: cached graphify graph or repomix pack at `.claude/runtime/cache/{graphify,repomix}/`. See [`.claude/shared/context-strategy.md`](../../../.claude/shared/context-strategy.md). Built via `/context:graph` or `/context:pack` — manual, optional.
 ## Tree
 ```
@@ -90,6 +92,11 @@ jq -s '.[] | select(.feature_id | contains("dispatch"))' .claude/runtime/outputs
 After a PRP is approved, engineering picks it up via the [staff-software-engineer plugin](../staff-software-engineer/README.md). The SSE plugin reads `.claude/runtime/outputs/pm/prp/{feature_id}.md` and runs plan → dev → test → pr stages, all writing to the same `.claude/runtime/outputs/pm/tokens/{feature_id}.json` file. Full feature lifecycle in one token log.
+Engineering can choose:
+- `/sse:run` — full pipeline through merged PR
+- `/sse:run --local` — plan, dev, test; stop before PR
+- `/sse:sdd` — spec-driven loop using the PRP's `Success criteria (verifiable)` + `Validation gates` as goal predicate. PRP authors should write **testable** criteria — vague bullets fail the `prp-has-acceptance-criteria` pre-flight sensor and block the loop.
 ## Status bar
 The repo's status-line (`.claude/hooks/status-line.sh`) detects PM activity automatically. If any PRD or PRP file under this plugin was modified in the last hour, it switches the status bar to pipeline mode and falls back to the engineering picker otherwise.

package/.claude/agents/product-manager/skills/prp/SKILL.md CHANGED

@@ -25,8 +25,14 @@ Read:
 - guides/templates/prp.md
 - guides/pipeline.md
 - guides/examples/good-prp-example.md
+- .claude/shared/context-strategy.md — pick the right tier when exploring target repos
-Explore target repos. Ask user for repo paths if not provided. Use Grep and Read to map files. Capture file:line. Never invent paths.
+Explore target repos. Ask user for repo paths if not provided. Use the context-strategy tier order:
+1. Cached graphify graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → query for symbols/callers (much cheaper than grep)
+2. Cached repomix pack at `.claude/runtime/cache/repomix/*.xml` → read for full file content
+3. Fall back to Grep + Read on live repo
+Capture file:line. Never invent paths. If a target repo is large + uncached, suggest the user run `/context:graph {repo}` before continuing — don't auto-build.
 Save to .claude/runtime/outputs/pm/prp/{feature_id}.md.
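
The tier order above can be sketched as a small lookup. This is only an illustration, assuming the cache layout quoted in the skill text; `available_tiers` and its `root`/`slug` parameters are hypothetical names, not part of harness-kit.

```python
import tempfile
from pathlib import Path
from typing import List

CACHE = Path(".claude/runtime/cache")


def available_tiers(root: Path, slug: str) -> List[str]:
    """List usable context tiers in lookup order; grep is always the fallback."""
    tiers = []
    graph = root / CACHE / "graphify" / slug / "graphify-out" / "graph.json"
    if graph.is_file():
        tiers.append("graph")  # tier 1: query symbols/callers
    pack_dir = root / CACHE / "repomix"
    # The skill text matches any cached pack: repomix/*.xml
    if pack_dir.is_dir() and any(pack_dir.glob("*.xml")):
        tiers.append("pack")  # tier 2: read full file content
    tiers.append("grep")  # tier 3: Grep + Read on the live repo
    return tiers


with tempfile.TemporaryDirectory() as tmp:
    # Nothing cached yet, so only the fallback tier is offered.
    print(available_tiers(Path(tmp), "shop-api-1a2b3c4d"))  # ['grep']
```

Returning a list rather than a single tier reflects that the graph and the pack answer different questions (callers versus file content), so a caller may consult both before falling back to grep.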

package/.claude/agents/staff-software-engineer/README.md CHANGED

@@ -10,10 +10,13 @@ Registered in [`AGENTS.md`](../../../AGENTS.md). Agent definition: [`staff-softw
 - `/sse:dev`: implement the plan in code, run convention gates
 - `/sse:test`: run the project test suite
 - `/sse:pr`: open the draft PR
-- `/sse:run`: full pipeline, plan to pr
+- `/sse:run`: full pipeline, plan to pr. Flags: `--local` (stop before PR), `--sdd` (hand off to spec-driven loop), `--no-monitor` (skip PR monitor only)
+- `/sse:sdd`: spec-driven dev loop. Plan once, then dev↔test↔spec-satisfied eval until PRP spec met. Cap 3 iters. Local only — never auto-opens PR.
 Also invokable as sub-agent via Task tool with `subagent_type: "staff-software-engineer"`.
+Optional context helpers (separate namespace, manual): `/context:pack <feature_id>` and `/context:graph [repo]`. Plan stage + SDD supervisor eval consult the cache when present. See [`.claude/shared/context-strategy.md`](../../../.claude/shared/context-strategy.md).
 ## Tree
 ```
@@ -25,23 +28,28 @@ Also invokable as sub-agent via Task tool with `subagent_type: "staff-software-e
 │   ├── mobile/SKILL.md                    iOS/Android defaults
 │   └── devops/SKILL.md                    CI/IaC defaults
 ├── guides/
-│   ├── pipeline.md                        retry, approval, token accounting
+│   ├── pipeline.md                        retry, approval, token accounting, variants (--local, sdd)
+│   ├── sdd-loop.md                        spec-driven loop algorithm, predicate from PRP
 │   ├── coding-style.md                    team code style
 │   ├── commit-style.md                    Conventional Commits with TICKET
 │   └── conventions-override.md            how project overrides work
-├── sensors/                               plan, dev, test, pr structure + conventions
-└── evals/                                 plan/dev/test/pr quality rubrics
+├── sensors/                               plan, dev, test, pr structure + conventions + prp-has-acceptance-criteria (sdd pre-flight)
+└── evals/                                 plan/dev/test/pr quality rubrics + spec-satisfied (sdd supervisor)
 .claude/runtime/                          ← state + outputs
 ├── hooks/staff-software-engineer/         phase markers, sensor gates
 ├── scripts/staff-software-engineer/       symlinks to PM scripts (sensor-runner, token-phase)
-└── outputs/sse/
-    ├── plan/                              generated plans
-    ├── dev/                               dev summaries
-    ├── test/                              test results
-    ├── pr/                                opened PR records
-    ├── tokens/                            per-feature phase tokens JSON
-    └── .markers/                          phase start/end markers (transient)
+├── outputs/sse/
+│   ├── plan/                              generated plans
+│   ├── dev/                               dev summaries
+│   ├── test/                              test results
+│   ├── pr/                                opened PR records
+│   ├── sdd/                               sdd-loop transcripts (per-iter eval verdicts)
+│   ├── tokens/                            per-feature phase tokens JSON
+│   └── .markers/                          phase start/end markers (transient)
+└── cache/                                 optional context tools (gitignored)
+    ├── repomix/{feature_id}.xml           per-feature snapshot (ephemeral, cleared on /pipeline:reset)
+    └── graphify/{slug}/graphify-out/      per-repo knowledge graph (long-lived, manual rebuild)
 ```
 ## How conventions work
@@ -73,6 +81,12 @@ Plugin skills read both. Project rules win. See `guides/conventions-override.md`
 | Test command detection | commands/test.md |
 | PR template | commands/pr.md, hooks/post-eval-pr.sh |
 | Sensors | sensors/*.md |
+| SDD loop algorithm | guides/sdd-loop.md |
+| SDD iter cap | guides/sdd-loop.md + commands/sdd.md |
+| SDD predicate rubric | evals/spec-satisfied.md |
+| SDD pre-flight check | sensors/prp-has-acceptance-criteria.md |
+| --local behavior | commands/run.md (## Flags) |
+| Context tier order | ../../shared/context-strategy.md |
 ## Connects to PM plugin

package/.claude/agents/staff-software-engineer/evals/spec-satisfied.md ADDED

@@ -0,0 +1,66 @@
+# Eval: Spec Satisfied
+Type: LLM-judge (supervisor)
+Mode: goal predicate
+Verdict: PASS or FAIL
+Decide if current repo state satisfies source PRP. Run by independent session (no prior context from worker turn) to avoid worker self-judging.
+## Inputs
+- source PRP: `.claude/runtime/outputs/pm/prp/{feature_id}.md`
+- latest dev summary: `.claude/runtime/outputs/sse/dev/{feature_id}.md`
+- latest test report: `.claude/runtime/outputs/sse/test/{feature_id}.md`
+- diff: `git diff {base}...HEAD` on dev branch
+- **optional richer context** (per `.claude/shared/context-strategy.md`):
+  - cached pack `.claude/runtime/cache/repomix/{feature_id}.xml` if present → read for surrounding code
+  - cached graph `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` if present → query for callers of touched symbols (impact analysis)
+## Rubric
+Walk each criterion. For each, mark MET / NOT MET / UNCLEAR with evidence (file:line, test name, or diff hunk).
+### 1. Success criteria coverage
+Every bullet under PRP section 3 "Success criteria (verifiable)" mapped to:
+- code change that implements it (file:line), AND
+- test that asserts it (test name)
+Missing either side → NOT MET.
+### 2. Validation gates green
+Every command in PRP section 6 "Validation gates" bash block exit 0 in latest test report. UNCLEAR if command not run.
+### 3. Manual verification items
+Each `- [ ]` under "Manual verification" in PRP. Worker cannot tick these — flag as PENDING for user. Do not block goal on these; report separately.
+### 4. Scope discipline
+No code change outside PRP "Repos and files touched" without justification in dev summary. Out-of-scope diff hunks → flag for review, not auto-fail.
+## Verdict
+- All success criteria MET + all validation gates green → **PASS**
+- Any criterion NOT MET → **FAIL** with specific next-iter focus
+- Any UNCLEAR → **FAIL** (treat as not satisfied; iter again or add evidence)
+## Output
+```json
+{
+  "verdict": "PASS|FAIL",
+  "criteria": [
+    {"criterion": "...", "status": "MET|NOT_MET|UNCLEAR", "evidence": "file:line | test name | diff hunk"}
+  ],
+  "validation_gates": [
+    {"command": "...", "exit_code": 0, "status": "GREEN|RED|NOT_RUN"}
+  ],
+  "manual_pending": ["..."],
+  "scope_flags": ["..."],
+  "next_iter_focus": "specific fix for FAIL; omit on PASS"
+}
+```
+## On FAIL
+`/sse:sdd` loop reads `next_iter_focus`, regenerates dev step targeting only that. Re-runs test. Re-evals.
+Max 3 iters. Cap hit → hard stop, write loop transcript + final verdict to `.claude/runtime/outputs/sse/sdd/{feature_id}.md`, return blocker.
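
The verdict rules in this eval reduce to a conjunction over the JSON output. The sketch below assumes the `criteria` and `validation_gates` shapes from the Output section; `decide_verdict` is a hypothetical helper. Treating an empty list as FAIL is an added assumption, since the rubric does not cover that case.

```python
from typing import Dict, List


def decide_verdict(criteria: List[Dict], gates: List[Dict]) -> str:
    """PASS only when every criterion is MET and every gate is GREEN."""
    # Assumption: an empty list means nothing was verified, so it cannot pass.
    if not criteria or not gates:
        return "FAIL"
    # NOT_MET and UNCLEAR both count as not satisfied.
    all_met = all(c["status"] == "MET" for c in criteria)
    # RED and NOT_RUN both count as not green.
    all_green = all(g["status"] == "GREEN" for g in gates)
    return "PASS" if all_met and all_green else "FAIL"


report = {
    "criteria": [
        {"criterion": "login returns 200", "status": "MET", "evidence": "test_login_ok"},
        {"criterion": "rate limit enforced", "status": "UNCLEAR", "evidence": ""},
    ],
    "validation_gates": [{"command": "npm test", "exit_code": 0, "status": "GREEN"}],
}
print(decide_verdict(report["criteria"], report["validation_gates"]))  # FAIL
```

`manual_pending` and `scope_flags` are deliberately left out of the decision, matching the rubric: manual items are reported separately and out-of-scope hunks are flagged for review rather than auto-failed.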

package/.claude/agents/staff-software-engineer/guides/pipeline.md CHANGED

@@ -9,6 +9,12 @@ Shared rules for plan, dev, test, pr stages. Edit retry, approval, publish, toke
 3. test: run project test suite. Capture results.
 4. pr: open draft PR with standard template.
+### Variants
+- `/sse:run` runs 1→4 sequentially.
+- `/sse:run --local` runs 1→3, skips 4. Use for local dev+test without push.
+- `/sse:sdd` replaces 2+3 with goal-loop: dev↔test↔spec-satisfied eval, cap 3 iters, predicate from PRP. Stops local. See `sdd-loop.md`.
 ## Retry policy
 Max attempts per stage: 3.
@@ -67,6 +73,7 @@ To merge with PM tokens file, SSE token-phase.py writes to same path under this
 - tests fail after dev
 - gh CLI not available for pr stage
 - missing required input after one clarification
+- SDD: 3 loop iters complete without spec-satisfied PASS verdict
 ## Conventions override

package/.claude/agents/staff-software-engineer/guides/sdd-loop.md ADDED

@@ -0,0 +1,127 @@
+# SDD Loop
+Spec-driven dev loop. Used by `/sse:sdd`. Wraps plan + dev + test in goal-loop. Stops local. No PR.
+Inspired by Claude Code `/goal` (May 2026): worker model attempts work, independent evaluator checks goal met, retry until met or cap hit.
+## When to use
+- Want to iterate locally on a feature until spec satisfied
+- Not ready to push or open PR
+- PRP exists with testable success criteria + validation gates
+When not:
+- No PRP — run `/product-manager:prp` first
+- Want PR opened automatically — use `/sse:run`
+- PRP success criteria vague — sensor blocks, fix PRP first
+## Inputs
+- source PRP: latest approved in `.claude/runtime/outputs/pm/prp/` or path arg
+- feature_id derived from PRP filename
+## Predicate (goal completion condition)
+Built from PRP:
+- every bullet in `## 3) What` → `Success criteria (verifiable):` MET (code + test exist)
+- every command in `## 6) Validation gates` bash block exit 0
+Eval rubric: `evals/spec-satisfied.md`. Independent session reads PRP + dev summary + test report + git diff. Returns PASS or FAIL with `next_iter_focus`.
+## Algorithm
+```
+1. pre-flight
+   - sensor: prp-has-acceptance-criteria → block if FAIL
+   - read PRP, extract criteria + gates
+2. plan once
+   - invoke /sse:plan (normal gates, eval, retry up to 3)
+   - get approved plan
+3. goal loop (cap = 3 iters)
+   iter = 1
+   while iter <= 3:
+     a. /sse:dev step (worker)
+        - if iter > 1, focus on `next_iter_focus` from prior eval
+     b. /sse:test
+     c. supervisor eval: spec-satisfied.md
+        - fresh session, no worker context
+        - reads PRP + dev/{id}.md + test/{id}.md + git diff
+     d. verdict == PASS:
+        - write sdd/{feature_id}.md transcript
+        - exit loop, success
+     e. verdict == FAIL:
+        - append iter result to sdd transcript
+        - iter += 1
+        - continue
+4. cap hit (iter > 3)
+   - hard stop
+   - write sdd/{feature_id}.md with full transcript + final verdict
+   - return blocker, list NOT_MET criteria + UNCLEAR items
+```
+## Output artifact
+`.claude/runtime/outputs/sse/sdd/{feature_id}.md`:
+```markdown
+# SDD Loop: {feature_id}
+Source PRP: {path}
+Iterations: {N}
+Final verdict: PASS | FAIL (cap hit)
+## Iteration 1
+- dev summary: {path}
+- test report: {path}
+- eval verdict: FAIL
+- next focus: {string}
+## Iteration 2
+...
+## Final state
+- branch: {branch}
+- commits: {N}
+- criteria MET: {N}/{total}
+- gates GREEN: {N}/{total}
+- manual pending: [...]
+## Next step
+PASS → review diff, run `/sse:pr` when ready
+FAIL → address blockers, re-run `/sse:sdd`
+```
+## Token accounting
+Per-iter phases tracked under same `.markers/` dir as standard pipeline:
+- `{feature_id}.sdd-iter{N}-dev.{start,end}`
+- `{feature_id}.sdd-iter{N}-test.{start,end}`
+- `{feature_id}.sdd-iter{N}-eval.{start,end}`
+Reuses existing `scripts/staff-software-engineer/token-phase.py` — pass phase name as arg.
+## Stop conditions
+- 3 iters complete without PASS
+- pre-flight sensor fails (PRP not testable)
+- `/sse:plan` hard stops (3 plan retries exhausted)
+- user interrupts mid-loop
+## Why separate from /sse:run
+`/sse:run` is monolithic plan→dev→test→pr. SDD loop adds iter inside dev+test, omits pr. Two reasons:
+1. **Local dev workflow**: want repeated dev↔test cycle with spec-judge feedback, not single shot
+2. **PR gate**: human reviews loop transcript before pushing. Auto-PR on goal PASS is too eager — supervisor can be wrong
+`/sse:run --local` exists as cheap alternative (skip pr stage, no loop). SDD loop is the spec-driven mode.
+## References
+- predicate source: `guides/templates/prp.md` sections 3, 6 (product-manager plugin)
+- worker stage: `commands/sse/dev.md`, `commands/sse/test.md`
+- evaluator: `evals/spec-satisfied.md`
+- gating sensor: `sensors/prp-has-acceptance-criteria.md`
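
Step 3 of the algorithm in this guide (the capped goal loop) can be sketched as follows. The three callables stand in for `/sse:dev`, `/sse:test`, and the fresh-session supervisor eval; they are hypothetical hooks for illustration, and pre-flight and planning are assumed to have already succeeded.

```python
from typing import Callable, Dict, List, Optional

MAX_ITERS = 3  # cap stated in the guide


def run_goal_loop(
    dev: Callable[[Optional[str]], None],
    test: Callable[[], None],
    evaluate: Callable[[], Dict[str, str]],
) -> Dict[str, object]:
    """Repeat dev -> test -> eval until PASS or the iteration cap is hit."""
    transcript: List[Dict[str, str]] = []
    focus: Optional[str] = None  # no focus on the first iteration
    for iteration in range(1, MAX_ITERS + 1):
        dev(focus)
        test()
        result = evaluate()
        transcript.append(result)
        if result["verdict"] == "PASS":
            return {"verdict": "PASS", "iters": iteration, "transcript": transcript}
        # Feed the evaluator's hint into the next dev step.
        focus = result.get("next_iter_focus")
    return {"verdict": "FAIL", "iters": MAX_ITERS, "transcript": transcript}


# Stub run: the evaluator fails twice, then passes on the third iteration.
seen_focus: List[Optional[str]] = []
verdicts = iter([
    {"verdict": "FAIL", "next_iter_focus": "add test for criterion 2"},
    {"verdict": "FAIL", "next_iter_focus": "fix lint gate"},
    {"verdict": "PASS"},
])
outcome = run_goal_loop(seen_focus.append, lambda: None, lambda: next(verdicts))
print(outcome["verdict"], outcome["iters"])  # PASS 3
```

Passing the evaluator in as a callable keeps the worker and the judge separate in the sketch, which mirrors the guide's requirement that the supervisor runs without worker context.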

package/.claude/agents/staff-software-engineer/sensors/prp-has-acceptance-criteria.md ADDED

@@ -0,0 +1,35 @@
+# Sensor: PRP Has Acceptance Criteria
+Type: structural
+Mode: hard gate
+`/sse:sdd` uses PRP as load-bearing spec. Goal-loop predicate built from PRP. PRP missing testable criteria → predicate undefined → loop cannot judge done. Block early.
+## Check
+Source PRP must contain both:
+1. **Success criteria (verifiable):** bullet list under section `## 3) What`. Each bullet must be testable (observable behavior, output, or measurable state). Vague bullets (e.g. "works well", "user-friendly") fail.
+2. **Validation gates** section `## 6) Validation gates` with non-empty bash block. Empty fenced block or placeholder `{commands the executor MUST run}` fails.
+## Pass conditions
+- `Success criteria (verifiable):` literal string present
+- At least 1 bullet under it (line starts with `- ` or `* `)
+- No bullet contains placeholder `{` `}` braces
+- `## 6) Validation gates` present
+- Bash fence under it contains at least 1 non-comment, non-placeholder command
+## On failure
+Block. Return blocker pointing at missing/weak section. Tell user to run `/product-manager:prp` and address feedback before retrying `/sse:sdd`.
+## Example failure output
+```
+prp-has-acceptance-criteria: FAIL
+  file: .claude/runtime/outputs/pm/prp/{feature_id}.md
+  reason: section "Success criteria (verifiable)" empty
+  fix:    add 2-4 testable bullets under section 3
+```
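
The structural pass conditions of this sensor could be checked mechanically along these lines. This is a sketch under assumptions: `check_prp` is a hypothetical function, a placeholder command is taken to be a line consisting entirely of `{...}`, and the "vague bullet" judgment (e.g. "works well") is not covered because it needs a reader, not a pattern.

```python
import re
from typing import List

CRITERIA_HEADING = "Success criteria (verifiable):"
GATES_HEADING = "## 6) Validation gates"
FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def check_prp(text: str) -> List[str]:
    """Return failure reasons; an empty list means the structural checks pass."""
    reasons = []
    lines = text.splitlines()

    idx = next((i for i, ln in enumerate(lines) if CRITERIA_HEADING in ln), None)
    if idx is None:
        reasons.append("missing 'Success criteria (verifiable):'")
    else:
        bullets = []
        for ln in lines[idx + 1:]:
            stripped = ln.strip()
            if stripped.startswith(("- ", "* ")):
                bullets.append(stripped)
            elif stripped:  # first non-blank, non-bullet line ends the list
                break
        if not bullets:
            reasons.append("no bullets under success criteria")
        elif any("{" in b or "}" in b for b in bullets):
            reasons.append("success criteria contain placeholder braces")

    gate_idx = next((i for i, ln in enumerate(lines) if ln.startswith(GATES_HEADING)), None)
    if gate_idx is None:
        reasons.append("missing '## 6) Validation gates'")
    else:
        commands, in_fence = [], False
        for ln in lines[gate_idx + 1:]:
            stripped = ln.strip()
            if stripped.startswith(FENCE):
                if in_fence:
                    break  # closing fence
                in_fence = True
                continue
            if not in_fence and stripped.startswith("## "):
                break  # next section reached without a fence
            is_placeholder = re.fullmatch(r"\{.*\}", stripped) is not None
            if in_fence and stripped and not stripped.startswith("#") and not is_placeholder:
                commands.append(stripped)
        if not commands:
            reasons.append("validation gates block has no runnable command")
    return reasons


good_prp = "\n".join([
    "## 3) What",
    "Success criteria (verifiable):",
    "- POST /login returns 200 for valid credentials",
    "## 6) Validation gates",
    FENCE + "bash",
    "# run unit tests",
    "npm test",
    FENCE,
])
print(check_prp(good_prp))  # []
```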

package/.claude/commands/context/graph.md ADDED

@@ -0,0 +1,65 @@
+---
+description: Build queryable knowledge graph of target repo with graphify. Cached per repo, reused across features.
+---
+Build graphify knowledge graph for a long-lived target repo. Tree-sitter local for code (no API key). Adds semantic edges via LLM for docs/PDFs/images if present (`--with-docs` later, v2).
+## Inputs
+- Arg 1 (optional): target repo path (default = cwd)
+- `--update`: incremental refresh, merge with existing graph (cheap, fast)
+- `--deep`: aggressive edge inference (slower, higher fidelity)
+- `--wiki`: also build wiki-style markdown index
+## Pre-flight
+Run `command -v graphify`. Missing → block with install hint:
+```
+graphify not installed
+install: uv tool install graphifyy  |  pipx install graphifyy  |  pip install graphifyy
+note:    PyPI pkg is graphifyy (double y), CLI cmd is graphify
+docs:    https://github.com/safishamsi/graphify
+```
+## Steps
+1. Bash: `.claude/scripts/graph-repo.sh [target] [--update] [--deep] [--wiki]`
+2. Script writes `.claude/runtime/cache/graphify/{repo_slug}/graphify-out/`
+3. Outputs `graph.json` (queryable), `graph.html` (interactive), `GRAPH_REPORT.md` (audit)
+4. Report cache path + open command for graph.html
+## When to use
+- Target repo touched by many features (e.g. main product monorepo) — amortize build cost
+- Plan stage needs "find all callers of X", "where does Y live" — graph query >> grep
+- PRP `## 4) Context → Repos and files touched` discovery
+- ~71× token reduction on queries vs raw file reads (per graphify benchmarks)
+When NOT to use:
+- One-off small repo — overkill
+- Hot repo with high commit churn — graph stales fast, use `--update` hook (see Hot-repo section)
+- No `graphify` binary in env
+## Reply format
+```
+Graph build complete.
+  target:  {target_dir}
+  slug:    {repo_slug}
+  path:    .claude/runtime/cache/graphify/{slug}/graphify-out/
+  view:    open .claude/runtime/cache/graphify/{slug}/graphify-out/graph.html
+  report:  .claude/runtime/cache/graphify/{slug}/graphify-out/GRAPH_REPORT.md
+  next:    /sse:plan + /sse:sdd will consult graph.json when present
+```
+## Hot-repo: auto-update hook
+For frequently-changing target repos, install graphify's git hook:
+```bash
+cd {target_repo} && graphify hook install
+```
+Each commit triggers `graphify . --update`. Graph stays fresh, harness-kit cache reads latest. Document this in target repo's CONTRIBUTING.
+## Privacy note
+Code-only mode runs Tree-sitter locally — no network calls. Only the optional `--with-docs` step (not enabled by default in this wrapper) sends semantic descriptions of non-code files to a configured LLM. Raw source code never leaves the machine.

package/.claude/commands/context/pack.md ADDED

@@ -0,0 +1,59 @@
+---
+description: Pack target repo with repomix. Snapshot for AI context. Cached per feature_id.
+---
+Build repomix snapshot of target repo. Used by `/sse:plan`, `/sse:sdd` supervisor eval, and multi-repo PRPs.
+## Inputs
+- Arg 1 (required): `feature_id` (matches PRD/PRP filename basename)
+- Arg 2 (optional): target repo path (default = cwd)
+- `--include "glob"`: narrow scope (e.g. `--include "src/auth/**"`). Pass multiple globs as one comma-separated value.
+- `--style xml|markdown`: output format. Default xml.
+## Pre-flight
+Run `command -v repomix`. Missing → block with install hint:
+```
+repomix not installed
+install: npm i -g repomix  |  brew install repomix
+docs:    https://repomix.com
+```
+## Steps
+1. Bash: `.claude/scripts/pack-repo.sh <feature_id> [target] [--include glob] [--style xml]`
+2. Script writes `.claude/runtime/cache/repomix/{feature_id}.{style}`
+3. Report path + size + token estimate
+## When to use
+- Target repo > 50 files and stages keep re-grepping same context
+- Multi-repo feature: pack each repo, supervisor eval reads concat
+- SDD loop: pack `{changed files} ∪ {PRP "Repos and files touched"}` for cheaper per-iter eval
+When NOT to use:
+- Small repo (<20 files) — grep is fine
+- Pack would exceed context budget (>200k tokens) — narrow with `--include`
+## Reply format
+```
+Pack complete.
+  feature:  {feature_id}
+  target:   {target_dir}
+  style:    {xml|markdown}
+  include:  {glob | (default .gitignore aware)}
+  path:     .claude/runtime/cache/repomix/{feature_id}.{ext}
+  size:     {N} bytes
+  tokens:   {N} (repomix estimate)
+  next:     consumed automatically by /sse:plan, /sse:sdd if cache present
+```
+## Cleanup
+Pack invalidated when:
+- `/pipeline:reset` runs (clears all caches for active feature)
+- Manual: `rm .claude/runtime/cache/repomix/{feature_id}.*`
+Pack is **snapshot, not live**. Re-run when target repo materially changes mid-feature.

package/.claude/commands/sse/plan.md CHANGED

@@ -24,6 +24,13 @@ Read:
 - .claude/agents/staff-software-engineer/guides/coding-style.md
 - area-specific skill: .claude/agents/staff-software-engineer/skills/{area}/SKILL.md (area = backend, web, mobile, devops)
 - project conventions if present: {repo}/.claude/conventions/{area}.md (see .claude/agents/staff-software-engineer/guides/conventions-override.md)
+- .claude/shared/context-strategy.md — pick the right tier for target-repo lookups
+Context lookups (per `context-strategy.md`):
+- Cached graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → query for callers/refs instead of grepping
+- Cached pack at `.claude/runtime/cache/repomix/{feature_id}.xml` → read for full file snapshot
+- Neither present → fall back to grep + Read on live repo
+- Don't double-load. If pack/graph covers a PRP-listed file, skip the grep for it.
 Save to .claude/runtime/outputs/sse/plan/{feature_id}.md.

package/.claude/commands/sse/run.md CHANGED

@@ -1,15 +1,25 @@
 ---
-description: Run the full engineering pipeline. Plan, dev, test, pr in sequence.
+description: Run the full engineering pipeline. Plan, dev, test, pr in sequence. Pass --local to stop after test (no PR). Pass --sdd for spec-driven loop variant.
 ---
 Run end to end.
-1. Invoke /sse:plan. Wait for approval marker on plan.
-2. Invoke /sse:dev. Implements plan in code.
-3. Invoke /sse:test. Runs project test suite.
-4. Invoke /sse:pr. Opens pull request.
-5. Invoke /sse:pr-monitor. Arms backoff polling, auto-clears pipeline state on merge. Skip if user passed `--no-monitor` or `gh pr view` already returns MERGED.
-6. Return summary.
+## Flags
+- `--local`: skip /sse:pr + /sse:pr-monitor. Stop after /sse:test. Use when want dev+test locally without push.
+- `--sdd`: hand off to `/sse:sdd` instead (spec-driven loop, plan once + dev↔test↔eval loop, also local-only). See `.claude/commands/sse/sdd.md`. Mutually exclusive with `--local`.
+- `--no-monitor`: skip /sse:pr-monitor only. PR still opens.
+## Steps
+1. `--sdd` set → invoke /sse:sdd with same args (minus --sdd). Return its result. Skip rest.
+2. Invoke /sse:plan. Wait for approval marker on plan.
+3. Invoke /sse:dev. Implements plan in code.
+4. Invoke /sse:test. Runs project test suite.
+5. `--local` set → stop here. Print summary (omit PR + Monitor lines). Tell user `next: review diff, /sse:pr when ready`.
+6. Invoke /sse:pr. Opens pull request.
+7. Invoke /sse:pr-monitor. Arms backoff polling, auto-clears pipeline state on merge. Skip if user passed `--no-monitor` or `gh pr view` already returns MERGED.
+8. Return summary.
 Follow .claude/agents/staff-software-engineer/guides/pipeline.md for retry, approval markers, token accounting, publish behavior.
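
How the three flags change the stage sequence can be summarized in a few lines. `stages_for` is a hypothetical helper written for illustration. Raising an error when `--sdd` and `--local` are combined is an assumption: the command text says they are mutually exclusive but not how a conflict is handled. The "already MERGED" skip for the monitor is not modeled.

```python
from typing import List, Set


def stages_for(flags: Set[str]) -> List[str]:
    """Return the commands /sse:run would invoke, in order, for a flag set."""
    if "--sdd" in flags and "--local" in flags:
        # Assumption: a conflicting flag pair is rejected outright.
        raise ValueError("--sdd and --local are mutually exclusive")
    if "--sdd" in flags:
        return ["/sse:sdd"]  # hand off; the rest is skipped
    stages = ["/sse:plan", "/sse:dev", "/sse:test"]
    if "--local" in flags:
        return stages  # stop before the PR stage
    stages.append("/sse:pr")
    if "--no-monitor" not in flags:
        stages.append("/sse:pr-monitor")
    return stages


print(stages_for({"--local"}))  # ['/sse:plan', '/sse:dev', '/sse:test']
```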

package/.claude/commands/sse/sdd.md ADDED

@@ -0,0 +1,98 @@
+---
+description: Spec-driven dev loop. Plan once, then dev↔test↔eval loop until PRP spec satisfied. Local only, no PR.
+---
+Run spec-driven dev loop. Follow .claude/agents/staff-software-engineer/guides/sdd-loop.md.
+## Inputs
+- Optional arg: path to source PRP. Default = latest approved PRP in `.claude/runtime/outputs/pm/prp/`.
+- None found → abort. Tell user to run `/product-manager:prp` first.
+feature_id = basename of PRP file (no .md).
+## Pre-flight (hard gate)
+Run sensor `.claude/agents/staff-software-engineer/sensors/prp-has-acceptance-criteria.md` on source PRP. Fail → block, return blocker. Do not proceed.
+Print header card. Format: `.claude/scripts/stage-card.md`.
+```
+━━━ /sse:sdd · {feature_id} ━━━
+**guides:**  sdd-loop.md, pipeline.md
+**refs:**    prp/{feature_id}.md
+**sensors:** prp-has-acceptance-criteria
+**eval:**    spec-satisfied (supervisor, per iter)
+**next:**    /sse:pr (after PASS, manual trigger)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+## Steps
+1. **Plan once.** Invoke `/sse:plan`. Wait approved marker. Hard stop if plan fails.
+2. **Goal loop. Cap = 3 iters.**
+   For iter in 1..3:
+   a. Write marker `.claude/runtime/outputs/sse/.markers/{feature_id}.sdd-iter{N}-dev.start`.
+   b. Invoke `/sse:dev`. If iter > 1, pass `--focus="{next_iter_focus from prior eval}"`.
+   c. Invoke `/sse:test`. Capture report path.
+   d. Run supervisor eval `.claude/agents/staff-software-engineer/evals/spec-satisfied.md`. **Use fresh session** (Task tool with subagent_type=general-purpose, prompt includes PRP path + dev summary path + test report path + `git diff main...HEAD`). Worker context must not leak.
+   e. Parse eval JSON output. Verdict PASS → break. FAIL → record `next_iter_focus`, continue.
+3. **Write transcript** `.claude/runtime/outputs/sse/sdd/{feature_id}.md`. Format per `guides/sdd-loop.md` § Output artifact.
+4. **PASS path:** append approval marker:
+   ```
+   <!-- approved: {YYYY-MM-DD} verdict=PASS iters={N} -->
+   ```
+5. **FAIL path (cap hit):** append:
+   ```
+   <!-- blocked: {YYYY-MM-DD} verdict=FAIL iters=3 -->
+   ```
+   Return blocker listing NOT_MET criteria + UNCLEAR items.
+## Reply format
+PASS:
+```
+SDD loop PASS. {N} iter(s).
+  feature:    {feature_id}
+  branch:     {branch}
+  commits:    {M} ({short-sha}, ...)
+  criteria:   {met}/{total} MET
+  gates:      {green}/{total} GREEN
+  manual:     {N} pending (user verify)
+  transcript: .claude/runtime/outputs/sse/sdd/{feature_id}.md
+  next:       review diff, run /sse:pr when ready
+```
+FAIL (cap hit):
+```
+SDD loop FAIL. cap hit at 3 iters.
+  feature:    {feature_id}
+  branch:     {branch}
+  blockers:
+    - criterion: "{text}" status: NOT_MET reason: {evidence}
+    - gate: "{cmd}" status: RED exit: {N}
+  transcript: .claude/runtime/outputs/sse/sdd/{feature_id}.md
+  next:       address blockers, re-run /sse:sdd
+```
+## Context tiers
+Before iter 1, check for cached context per `.claude/shared/context-strategy.md`:
+- repomix pack at `.claude/runtime/cache/repomix/{feature_id}.xml` → supervisor eval reads it for richer judgment
+- graphify graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` → supervisor eval queries for "does diff break callers"
+Neither present + repo big → suggest user run `/context:pack {feature_id} --include "{paths from PRP}"` once before continuing. Don't auto-build (manual cmds only, per harness pattern).
+## Rules
+- **No PR.** This command never opens PR. User decides via `/sse:pr`.
+- **No push.** Commits stay local on dev branch.
+- **Fresh evaluator session.** Worker self-eval defeats supervisor pattern.
+- **3 iter cap.** Do not exceed. Cap hit = real signal of spec/code mismatch.
+- **Print iter banner** between iters: `━━━ SDD iter {N}/3 ━━━` so user can follow loop.
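
The marker file names and the transcript footer used by this command follow fixed patterns, sketched below. `marker_path` and `transcript_footer` are hypothetical helpers. The phase names come from the token-accounting list in `guides/sdd-loop.md`, and the footer format from steps 4 and 5 above.

```python
from datetime import date

MARKER_DIR = ".claude/runtime/outputs/sse/.markers"
PHASES = {"dev", "test", "eval"}
EDGES = {"start", "end"}


def marker_path(feature_id: str, iteration: int, phase: str, edge: str) -> str:
    """Build the per-iteration phase marker path."""
    if phase not in PHASES or edge not in EDGES:
        raise ValueError(f"unknown phase/edge: {phase}.{edge}")
    return f"{MARKER_DIR}/{feature_id}.sdd-iter{iteration}-{phase}.{edge}"


def transcript_footer(verdict: str, iters: int, today: date) -> str:
    """Build the HTML comment appended to the transcript on PASS or cap hit."""
    word = "approved" if verdict == "PASS" else "blocked"
    return f"<!-- {word}: {today.isoformat()} verdict={verdict} iters={iters} -->"


print(marker_path("checkout-v2", 2, "dev", "start"))
print(transcript_footer("PASS", 2, date(2026, 1, 15)))
```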

package/.claude/runtime/cache/graphify/.gitkeep ADDED

File without changes

package/.claude/runtime/cache/repomix/.gitkeep ADDED

File without changes

package/.claude/runtime/outputs/sse/sdd/.gitkeep ADDED

File without changes

package/.claude/scripts/graph-repo.sh ADDED

@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+# Build knowledge graph of target repo with graphify.
+# Output: .claude/runtime/cache/graphify/{repo_slug}/graphify-out/
+#
+# Code files processed locally via Tree-sitter — no API key needed.
+# Non-code files (docs, PDFs, images) require an LLM key if present.
+#
+# Usage:
+#   graph-repo.sh [target_dir] [--update] [--deep] [--wiki]
+#
+# Defaults: target_dir = cwd, fresh build (rerun overwrites)
+set -e
+TARGET="${PWD}"
+EXTRA_ARGS=()
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --update) EXTRA_ARGS+=(--update); shift ;;
+    --deep)   EXTRA_ARGS+=(--mode deep); shift ;;
+    --wiki)   EXTRA_ARGS+=(--wiki); shift ;;
+    --*)      echo "unknown flag: $1 (pass-through flags: --update --deep --wiki)"; exit 2 ;;
+    *)        TARGET="$1"; shift ;;
+  esac
+done
+command -v graphify >/dev/null 2>&1 || {
+  echo "graphify not installed"
+  echo "install: uv tool install graphifyy  |  pipx install graphifyy  |  pip install graphifyy"
+  echo "note:    PyPI pkg is graphifyy (double y), CLI cmd is graphify"
+  echo "docs:    https://github.com/safishamsi/graphify"
+  exit 127
+}
+ABS_TARGET="$(cd "$TARGET" && pwd)"
+SLUG="$(basename "$ABS_TARGET")-$(echo "$ABS_TARGET" | shasum | cut -c1-8)"
+CACHE_DIR=".claude/runtime/cache/graphify/$SLUG"
+mkdir -p "$CACHE_DIR"
+echo "graph: $ABS_TARGET → $CACHE_DIR/graphify-out/"
+[ ${#EXTRA_ARGS[@]} -gt 0 ] && echo "  args: ${EXTRA_ARGS[*]}"
+# graphify writes graphify-out/ into cwd. cd cache, run with abs target.
+(cd "$CACHE_DIR" && graphify "$ABS_TARGET" "${EXTRA_ARGS[@]}")
+OUT="$CACHE_DIR/graphify-out"
+if [ ! -d "$OUT" ]; then
+  echo "expected output dir not found: $OUT"
+  exit 1
+fi
+echo "ok"
+echo "  graph:  $OUT/graph.json"
+echo "  view:   open $OUT/graph.html"
+echo "  report: $OUT/GRAPH_REPORT.md"
+echo "  re-run with --update for incremental refresh"
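
Other tools that need to locate a cached graph have to reproduce the slug this script derives. A Python sketch of the same derivation follows, assuming `shasum` runs with its default algorithm (SHA-1) and the target is not the filesystem root; `repo_slug` and `graph_json_path` are hypothetical names.

```python
import hashlib
import os


def repo_slug(abs_target: str) -> str:
    """Mirror: "$(basename "$ABS_TARGET")-$(echo "$ABS_TARGET" | shasum | cut -c1-8)"."""
    # `echo` appends a newline, so the hashed bytes are the path plus "\n".
    digest = hashlib.sha1((abs_target + "\n").encode()).hexdigest()
    return f"{os.path.basename(abs_target)}-{digest[:8]}"


def graph_json_path(abs_target: str) -> str:
    """Cache location of graph.json, relative to the harness-kit checkout."""
    return f".claude/runtime/cache/graphify/{repo_slug(abs_target)}/graphify-out/graph.json"


print(repo_slug("/home/dev/shop-api"))
```

Hashing the absolute path keeps two checkouts with the same directory name in separate cache folders. The flip side is that moving a repo to a new path produces a new slug and so a fresh graph build.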

package/.claude/scripts/pack-repo.sh ADDED

@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+# Pack target repo with repomix. Snapshot for AI context.
+# Output: .claude/runtime/cache/repomix/{feature_id}.{style} (style defaults to xml)
+#
+# Usage:
+#   pack-repo.sh <feature_id> [target_dir] [--include glob] [--style xml|markdown]
+#
+# Defaults:
+#   target_dir = cwd
+#   style      = xml
+#   include    = repomix default (.gitignore aware)
+set -e
+FEATURE_ID="${1:?feature_id required}"
+shift || true
+TARGET="${PWD}"
+STYLE="xml"
+INCLUDE=""
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --include) INCLUDE="$2"; shift 2 ;;
+    --style)   STYLE="$2"; shift 2 ;;
+    --*)       echo "unknown flag: $1"; exit 2 ;;
+    *)         TARGET="$1"; shift ;;
+  esac
+done
+command -v repomix >/dev/null 2>&1 || {
+  echo "repomix not installed"
+  echo "install: npm i -g repomix  |  brew install repomix"
+  echo "docs:    https://repomix.com"
+  exit 127
+}
+CACHE_ROOT=".claude/runtime/cache/repomix"
+mkdir -p "$CACHE_ROOT"
+OUT="$CACHE_ROOT/${FEATURE_ID}.${STYLE}"
+ARGS=(--style "$STYLE" -o "$OUT")
+[ -n "$INCLUDE" ] && ARGS+=(--include "$INCLUDE")
+echo "pack: $TARGET → $OUT"
+(cd "$TARGET" && repomix "${ARGS[@]}" --quiet)
+TOKENS=$(grep -oE "Total Tokens:.*" "$OUT" 2>/dev/null | head -1 || true)
+SIZE=$(wc -c < "$OUT" | tr -d ' ')
+echo "ok"
+echo "  path:   $OUT"
+echo "  size:   ${SIZE} bytes"
+[ -n "$TOKENS" ] && echo "  tokens: $TOKENS"
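
In pack-repo.sh, `$OUT` is a relative path, while repomix runs inside `(cd "$TARGET" && ...)`. Assuming repomix resolves `-o` against its working directory (not confirmed here), a `target_dir` other than the current directory would place the file under the target rather than under the invoking directory's cache. The sketch below shows one way to anchor the path before the `cd`. `write_snapshot` is a hypothetical stand-in for the repomix call, and the directory names are illustrative only.

```shell
# Sketch only: write_snapshot is a hypothetical stand-in for the repomix call.
write_snapshot() { echo "snapshot of $PWD" > "$1"; }

workdir="$(mktemp -d)"
mkdir -p "$workdir/harness/cache" "$workdir/target"
cd "$workdir/harness"

out_rel="cache/demo.xml"
out_abs="$PWD/$out_rel"   # anchor to the invoking directory before any cd

# The subshell changes directory, but the absolute path still points at the harness cache.
(cd "$workdir/target" && write_snapshot "$out_abs")
ls "$workdir/harness/cache"
```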

package/.claude/settings.local.json CHANGED

@@ -92,7 +92,12 @@
       "Bash(echo \"exit=$?\")",
       "Bash(echo \"next exit=$? \\(expect 1, nothing to do\\)\")",
       "Bash(:)",
-      "Bash(ffmpeg -y -i out/demo.mp4 -vf \"fps=8,scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer:bayer_scale=5\" -loop 0 preview.gif)"
+      "Bash(ffmpeg -y -i out/demo.mp4 -vf \"fps=8,scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=64[p];[s1][p]paletteuse=dither=bayer:bayer_scale=5\" -loop 0 preview.gif)",
+      "Bash(mkdir -p /Users/pierryborges/Development/harness-kit/.claude/runtime/outputs/sse/sdd)",
+      "Bash(touch /Users/pierryborges/Development/harness-kit/.claude/runtime/outputs/sse/sdd/.gitkeep)",
+      "WebFetch(domain:graphify.net)",
+      "Bash(mkdir -p /Users/pierryborges/Development/harness-kit/.claude/commands/context /Users/pierryborges/Development/harness-kit/.claude/shared /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/repomix /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/graphify)",
+      "Bash(touch /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/repomix/.gitkeep /Users/pierryborges/Development/harness-kit/.claude/runtime/cache/graphify/.gitkeep)"
     ]
   }
 }

package/.claude/shared/context-strategy.md ADDED

@@ -0,0 +1,102 @@
+# Context Strategy
+Shared guide for PM + SSE agents. When pulling target-repo context into a stage, pick the right tier.
+Three tiers. Cost goes up left → right. Capability goes up too. Match tier to need.
+| Tier | Tool | When | Cost | Freshness |
+|---|---|---|---|---|
+| 1 | `grep` / `Read` | small repo, narrow question, one-shot | free | always live |
+| 2 | `repomix` snapshot | feature-scoped context, deterministic handoff | low (one pack per feature) | frozen at pack time |
+| 3 | `graphify` graph | long-lived repo, queryable, multi-feature reuse | medium upfront (one build per repo), tiny per query | merged via `--update` per commit (with hook) |
+## Decision tree
+```
+question = "where does X live?" or "what calls Y?"
+  ├── repo <20 files? ──── grep / Read
+  ├── graph cached + repo unchanged? ── query graph.json
+  ├── graph stale or missing + repo big? ── /context:graph (one shot)
+  └── feature-scoped narrow scope? ──── /context:pack --include "..."
+question = "give me the full repo for an independent eval"
+  ├── repo packs under 200k tokens? ── /context:pack
+  └── too big? ── /context:pack --include "{narrowed glob}"
+question = "supervisor eval reads diff + minimal context (SDD loop)"
+  ├── pack of {changed files} ∪ {PRP files touched} → /context:pack --include
+  └── add graph query for impact analysis (v2)
+```
+## Per-stage hookup
+### `/product-manager:prp`
+- § 4 Context discovery: prefer graph query (if cached) over grep
+- `Repos and files touched` list: graph + grep both OK; graph faster
+- No pack here — PRP is upstream of pack
+### `/sse:plan`
+- Read order:
+  1. source PRP
+  2. cached graph at `.claude/runtime/cache/graphify/{slug}/graphify-out/graph.json` if present
+  3. cached pack at `.claude/runtime/cache/repomix/{feature_id}.{ext}` if present
+  4. fall back to grep + Read on target repo
+- Don't double-load. If pack covers all PRP-listed files, skip grep.
+### `/sse:dev`
+- Never reads stale pack/graph for the live code — code is mutating per commit.
+- Reads plan only. Use grep/Read on live repo for guidance lookups.
+### `/sse:test`
+- No context tools. Runs detected test command.
+### `/sse:sdd` supervisor eval
+- Fresh session reads:
+  1. PRP (full)
+  2. dev summary
+  3. test report
+  4. `git diff main...HEAD`
+- If pack exists: ALSO read `cache/repomix/{feature_id}.{ext}` for richer judgment
+- If graph exists: ALSO query graph for "does diff break callers of touched symbols"
+## Cache layout
+```
+.claude/runtime/cache/
+├── repomix/
+│   ├── .gitkeep
+│   └── {feature_id}.xml             ephemeral, per-feature, cleared on /pipeline:reset
+└── graphify/
+    ├── .gitkeep
+    └── {repo_slug}/                 long-lived, per-repo, manual rebuild or --update hook
+        └── graphify-out/
+            ├── graph.json
+            ├── graph.html
+            └── GRAPH_REPORT.md
+```
+`{repo_slug}` = `basename(abs_target)` + `-` + `shasum(abs_target)[:8]`. Stable across machines for same path.
+## Invalidation
+| Cache | Invalidated by |
+|---|---|
+| `repomix/{feature_id}.*` | `/pipeline:reset`, manual `rm`, or stage detecting target diff > 100 LOC since pack |
+| `graphify/{slug}/` | manual `/context:graph --update`, graphify git-hook auto-commit refresh, or manual `rm -rf` |
+## Cost notes
+- **Repomix**: ~1-2s build for medium repo. Token count printed. Fits inside context.
+- **Graphify code-only**: Tree-sitter local, ~5-30s for medium repo, no API key, no network.
+- **Graphify --with-docs**: LLM semantic extraction on docs/PDFs/images. Sends semantic descriptions only (not raw code). Requires API key per their docs. Opt-in only; this harness defaults to code-only.
+## Install
+Both optional. Detect on `hk install`; print install hint if missing. Don't auto-install.
+```
+repomix:   npm i -g repomix   |   brew install repomix
+graphify:  uv tool install graphifyy   |   pipx install graphifyy
+```
+PyPI package name is `graphifyy` (double y), CLI command is `graphify`.
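
The first branch of the decision tree in context-strategy.md can be read as a small selection function. This is a rough sketch under stated assumptions: `pick_tier` and its three arguments are hypothetical, the 20-file threshold comes from the tree, and the relative order of the pack and graph branches is a guess, since the tree lists them as siblings.

```shell
# Hypothetical helper; args: file_count, graph_state (fresh|stale|missing), scope (feature|repo)
pick_tier() {
  local file_count="$1" graph_state="$2" scope="$3"
  if [ "$file_count" -lt 20 ]; then
    echo "grep / Read"                 # small repo: live lookups are enough
  elif [ "$graph_state" = "fresh" ]; then
    echo "query graph.json"            # cached graph, repo unchanged
  elif [ "$scope" = "feature" ]; then
    echo "/context:pack --include"     # narrow, feature-scoped snapshot
  else
    echo "/context:graph"              # big repo, graph stale or missing
  fi
}

pick_tier 12 missing repo
pick_tier 400 fresh repo
pick_tier 400 stale feature
pick_tier 400 missing repo
```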

package/AGENTS.md CHANGED

@@ -31,8 +31,10 @@ CLAUDE.md                          ← project context (style, role, conventions
 │   └── staff-software-engineer.md orchestrator agent
 ├── commands/                      ← slash-command entry points
 │   ├── product-manager/           /product-manager:{prd,prp,run}
-│   ├── sse/                       /sse:{plan,dev,test,pr,run}
+│   ├── sse/                       /sse:{plan,dev,test,pr,run,sdd}
+│   ├── context/                   /context:{pack,graph}
 │   └── pipeline/                  /pipeline:{continue,reset}
+├── shared/                        ← cross-agent guides (context-strategy.md)
 ├── conventions/                   ← generic conventions (overridable per repo)
 ├── hooks/                         ← root lifecycle hooks (session-start, prompt, postedit, postwrite, status-line, activity-pre-read)
 ├── scripts/                       ← root utilities (pipeline.py, activity.py, pr-monitor.py)
@@ -58,6 +60,7 @@ CLAUDE.md                          ← project context (style, role, conventions
 |---|---|---|---|
 | `product-manager` | Generate PRD then PRP for a squad/feature | `/product-manager:run` | `.claude/agents/product-manager.md` |
 | `staff-software-engineer` | Full engineering pipeline: plan → dev → test → pr | `/sse:run` | `.claude/agents/staff-software-engineer.md` |
+| `staff-software-engineer` (sdd) | Spec-driven dev loop, local only: plan once + dev↔test↔eval until PRP spec met | `/sse:sdd` | `.claude/agents/staff-software-engineer/guides/sdd-loop.md` |
 ---
@@ -75,6 +78,10 @@ When the user types a slash command, the entry point is unambiguous. When the us
 | "implement the plan" | `/sse:dev` |
 | "run tests" | `/sse:test` |
 | "open the PR" | `/sse:pr` |
+| "dev + test locally, no PR" | `/sse:run --local` |
+| "spec-driven loop until PRP met" | `/sse:sdd` |
+| "snapshot a repo for AI context" | `/context:pack <feature_id>` |
+| "build knowledge graph of a repo" | `/context:graph` |
 | "continue the active pipeline" | `/pipeline:continue` |
 | "abandon active feature" | `/pipeline:reset` |
@@ -94,6 +101,21 @@ Each stage:
 Phase markers under `.claude/runtime/outputs/{pm,sse}/.markers/` track stage boundaries (`{feature}.{phase}.{start,end}`) for token accounting and pipeline status.
+### SDD variant
+`/sse:sdd` adds a spec-driven loop replacing the single-shot `dev → test`:
+```
+prd → prp → plan → [dev ↔ test ↔ spec-satisfied eval]  →  [user gate]  →  pr
+                          ↑ loop, cap 3 iters             stops local
+```
+- Predicate built from PRP `## 3) What → Success criteria (verifiable)` + `## 6) Validation gates`.
+- Pre-flight sensor `prp-has-acceptance-criteria` blocks if PRP not testable.
+- Per-iter supervisor eval `spec-satisfied` runs in fresh session (no worker context).
+- Transcript at `.claude/runtime/outputs/sse/sdd/{feature_id}.md`.
+- PR never auto-opened. User triggers `/sse:pr` after reviewing transcript.
 ---
 ## Runtime (state, outputs, hooks)
@@ -103,7 +125,8 @@ Generated artifacts and lifecycle hooks live under `.claude/runtime/`.
 | Path | Contents |
 |---|---|
 | `runtime/outputs/pm/{prd,prp,tokens,.markers}/` | PRD/PRP artifacts, token JSONs, phase markers |
-| `runtime/outputs/sse/{plan,dev,test,pr,tokens,.markers}/` | Plan/dev/test/pr artifacts, token JSONs, phase markers |
+| `runtime/outputs/sse/{plan,dev,test,pr,sdd,tokens,.markers}/` | Plan/dev/test/pr/sdd-loop artifacts, token JSONs, phase markers |
+| `runtime/cache/{repomix,graphify}/` | Optional context cache: repomix snapshots (per feature_id) + graphify graphs (per repo). See `.claude/shared/context-strategy.md` |
 | `runtime/hooks/<agent>/` | Per-agent lifecycle hooks (post-write, post-eval, pre-prp-check) |
 | `runtime/scripts/<agent>/` | Per-agent utilities (sensor-runner, token-phase, link-validator, confluence-publish) |
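
The control flow of the SDD variant described in this AGENTS.md diff (loop dev, test, and eval until a PASS verdict, capped at 3 iterations) can be sketched as follows. Everything here is illustrative: `run_iteration` is a hypothetical stub standing in for the dev, test, and supervisor-eval steps, and it simply passes on the second iteration.

```shell
MAX_ITERS=3   # loop cap stated in the SDD variant

# Hypothetical stub: a real iteration would run dev, test, then the spec-satisfied eval.
run_iteration() {
  if [ "$1" -ge 2 ]; then echo "PASS"; else echo "FAIL"; fi
}

sdd_loop() {
  local i=1 verdict
  while [ "$i" -le "$MAX_ITERS" ]; do
    verdict="$(run_iteration "$i")"
    echo "iter $i: $verdict"
    if [ "$verdict" = "PASS" ]; then
      return 0                      # stop locally; the PR stays a manual step
    fi
    i=$((i + 1))
  done
  echo "blocked: cap of $MAX_ITERS iterations reached without PASS"
  return 1
}

sdd_loop
```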

package/CLAUDE.md CHANGED

@@ -51,12 +51,24 @@ Engineering pipeline. Slash commands:
 - `/sse:dev` implement the plan, run convention gates
 - `/sse:test` run repo tests
 - `/sse:pr` open the draft PR
-- `/sse:run` full SSE pipeline
+- `/sse:run` full SSE pipeline. `--local` skip PR. `--sdd` use spec-driven loop variant.
+- `/sse:sdd` spec-driven dev loop. Plan once + dev↔test↔eval loop until PRP spec satisfied. Local only, no PR.
 Sub-agent `staff-software-engineer` is also Task-tool-invokable. Assets: `.claude/agents/staff-software-engineer/`.
 Full pipeline order: `prd → prp → plan → dev → test → pr`. Each stage gets an approval marker. The status bar tracks the current one.
+SDD variant: `prd → prp → plan → [dev ↔ test ↔ spec-satisfied eval] → [user gate] → pr`. Loop cap 3 iters. Predicate built from PRP "Success criteria (verifiable)" + "Validation gates". See `.claude/agents/staff-software-engineer/guides/sdd-loop.md`.
+### context tools (optional, manual)
+Two opt-in helpers for big target repos. Both bind to external CLIs; missing binary → cmd prints install hint.
+- `/context:pack <feature_id>` — `repomix` snapshot to `.claude/runtime/cache/repomix/{feature_id}.xml`. Ephemeral per feature. Cleared on `/pipeline:reset`.
+- `/context:graph [repo]` — `graphify` knowledge graph to `.claude/runtime/cache/graphify/{slug}/graphify-out/`. Long-lived per repo, code-only mode needs no API key (Tree-sitter local).
+PRP, plan, and SDD supervisor eval consult cache when present and fall back to grep otherwise. Tier order + when-to-use in `.claude/shared/context-strategy.md`.
 ## Project conventions override
 Each target repo can override SSE defaults with files in `.claude/conventions/`:

package/README.md CHANGED

@@ -2,9 +2,9 @@
 # harness-kit
-From idea to merged PR. One pipeline. Six stages.
+From idea to merged PR. One pipeline. Six stages. Or spec-driven loop until the spec is met.
-[![Version](https://img.shields.io/badge/version-4.0.1-blue.svg)](VERSION)
+[![Version](https://img.shields.io/badge/version-4.1.0-blue.svg)](VERSION)
 [![Claude Code](https://img.shields.io/badge/Claude%20Code-AGENTS.md-8b5cf6.svg)](https://claude.ai/code)
 [![Agents](https://img.shields.io/badge/agents-2-success.svg)](#agents)
 [![License](https://img.shields.io/badge/license-MIT-lightgrey.svg)](LICENSE)
@@ -29,6 +29,15 @@ prd → prp → plan → dev → test → pr
 Each stage produces a markdown artifact, gated by **deterministic sensors** (pass/fail) and a **scored eval** (≥ 8.0). After the PR opens, an in-session monitor watches for merge.
+For local-only work, two extra modes skip the PR stage:
+```
+/sse:run --local          plan → dev → test → STOP            (single shot, no loop)
+/sse:sdd                  plan → [dev↔test↔eval]×3 → STOP     (spec-driven goal loop)
+```
+`/sse:sdd` is the SDD variant: the PRP is the spec, and an independent supervisor session re-checks the repo against `Success criteria (verifiable)` + `Validation gates` after every dev↔test iteration. PR is never auto-opened — user runs `/sse:pr` after reviewing the loop transcript.
 ---
 ## Install
@@ -82,6 +91,7 @@ A bug fix, a small enhancement, or a refactor where writing a PRD would be theat
 ```
 /sse:run                 # plan → dev → test → PR
+/sse:run --local         # plan → dev → test, stop before PR (push manually later)
 ```
 Or run a single stage if that's all you need:
@@ -93,6 +103,18 @@ Or run a single stage if that's all you need:
 /sse:pr                  # just open the PR
 ```
+### Spec-driven loop — iterate locally until the PRP is satisfied
+You have an approved PRP and want Claude to loop dev↔test until the spec actually passes, judged by an independent supervisor session. No PR until you say so.
+```
+/sse:sdd                 # plan once + dev↔test↔spec-satisfied eval, cap 3 iters
+# review .claude/runtime/outputs/sse/sdd/{feature_id}.md
+/sse:pr                  # manual gate when ready
+```
+The loop predicate is built from the PRP's `Success criteria (verifiable)` and `Validation gates` sections — both must be present and concrete, or the `prp-has-acceptance-criteria` sensor blocks before the first iteration runs. Cap hit without a PASS verdict returns a blocker listing the unmet criteria.
 ### Resume — pick up where you left off
 Closed the session, restarted Claude Code, or got interrupted. State persists at `.claude/.pipeline-state.json`.
@@ -111,6 +133,10 @@ When the PR merges, the in-session monitor clears state automatically.
 ```
 /product-manager:run           draft PRD then PRP
 /sse:run                       plan, dev, test, open PR, watch for merge
+/sse:run --local               plan, dev, test — stop before PR
+/sse:sdd                       spec-driven loop: dev↔test↔eval until PRP met, no PR
+/context:pack <feature_id>     repomix snapshot of target repo (per-feature cache)
+/context:graph [repo]          graphify knowledge graph of a target repo (per-repo cache)
 /pipeline:continue             resume next pending stage
 /pipeline:reset                abandon active run
 ```
@@ -125,8 +151,9 @@ Need just one stage? Each is its own slash command:
 | `dev` | `/sse:dev` | `code-conventions`, `test-coverage`, `dev-structure` · `dev-quality` |
 | `test` | `/sse:test` | `test-structure` · `test-quality` |
 | `pr` | `/sse:pr` | `pr-structure` · `pr-quality` · auto-arms `/sse:pr-monitor` |
+| `sdd` | `/sse:sdd` | `prp-has-acceptance-criteria` (pre-flight) · `spec-satisfied` per iter (fresh session) · cap 3 iters |
-Sensors block on failure (Claude regenerates). Evals score; threshold 8.0; retried up to 3 times.
+Sensors block on failure (Claude regenerates). Evals score; threshold 8.0; retried up to 3 times. SDD eval returns PASS/FAIL — FAIL re-enters the loop with a `next_iter_focus` hint.
 ---
@@ -142,12 +169,13 @@ Registered in [`AGENTS.md`](./AGENTS.md) at the repo root. Each ships its own se
 - Guides: `pipeline.md`, `prd-guidelines.md`, `prp-guidelines.md`, `writing-style.md`, `templates/`, `examples/`
 - [Full docs →](.claude/agents/product-manager/README.md)
-### `staff-software-engineer` — turns an approved PRP into a merged PR
+### `staff-software-engineer` — turns an approved PRP into a merged PR (or a satisfied spec)
 - Skills: `backend`, `web`, `mobile`, `devops` (auto-detected from repo)
-- Sensors: 6 (`plan-structure`, `code-conventions`, `test-coverage`, `dev-structure`, `test-structure`, `pr-structure`)
-- Evals: 4 (`plan`, `dev`, `test`, `pr` quality)
-- Guides: `pipeline.md`, `coding-style.md`, `commit-style.md`, `conventions-override.md`
+- Sensors: 7 (`plan-structure`, `code-conventions`, `test-coverage`, `dev-structure`, `test-structure`, `pr-structure`, `prp-has-acceptance-criteria`)
+- Evals: 5 (`plan`, `dev`, `test`, `pr` quality; `spec-satisfied` supervisor for SDD loop)
+- Guides: `pipeline.md`, `coding-style.md`, `commit-style.md`, `conventions-override.md`, `sdd-loop.md`
+- Modes: `/sse:run` (full pipeline), `/sse:run --local` (skip PR), `/sse:sdd` (spec-driven loop)
 - [Full docs →](.claude/agents/staff-software-engineer/README.md)
 ---
@@ -201,13 +229,15 @@ What `hk install` lays down in your repo:
 ├── CLAUDE.md                    workspace style + role
 └── .claude/
     ├── agents/                  agent definitions (sensors, evals, guides, skills)
-    ├── commands/                slash command entry points
+    ├── commands/                slash command entry points (pm, sse, context, pipeline)
+    ├── shared/                  cross-agent guides (context-strategy.md)
     ├── hooks/                   status-line + lifecycle hooks
-    ├── scripts/                 pipeline.py · activity.py · pr-monitor.py
+    ├── scripts/                 pipeline.py · activity.py · pr-monitor.py · pack-repo.sh · graph-repo.sh
     ├── runtime/
     │   ├── hooks/<agent>/       per-agent lifecycle (post-write, post-eval, pre-prp-check)
     │   ├── scripts/<agent>/     per-agent utilities (sensor-runner, token-phase, link-validator)
-    │   └── outputs/{pm,sse}/    generated artifacts, markers, tokens
+    │   ├── outputs/{pm,sse}/    generated artifacts, markers, tokens (incl. sse/sdd/ loop transcripts)
+    │   └── cache/               repomix packs + graphify graphs (optional, gitignored)
     ├── conventions/             your per-repo overrides
     └── settings.json            hook wiring
 ```
@@ -218,14 +248,25 @@ Full path-by-path map in [`AGENTS.md`](./AGENTS.md).
 ## Tooling
-| Tool | Why |
-|------|-----|
-| [Claude Code](https://claude.ai/code) | agent runtime |
-| python3 | sensors, token accounting, pipeline state |
-| [gh CLI](https://cli.github.com/) | opens PR, polls for merge |
-| git | branch + commit ops |
+| Tool | Why | Required |
+|------|-----|----------|
+| [Claude Code](https://claude.ai/code) | agent runtime | yes |
+| python3 | sensors, token accounting, pipeline state | yes |
+| [gh CLI](https://cli.github.com/) | opens PR, polls for merge | for `/sse:pr` |
+| git | branch + commit ops | yes |
+| [repomix](https://repomix.com) | snapshot target repo for AI context (`/context:pack`) | optional |
+| [graphify](https://github.com/safishamsi/graphify) | queryable knowledge graph of a repo (`/context:graph`) | optional |
+Install optional tools:
+```bash
+npm i -g repomix           # or: brew install repomix
+uv tool install graphifyy  # or: pipx install graphifyy   (CLI cmd is `graphify`)
+```
+`hk install` detects both and prints a hint if missing — never auto-installs. See [`.claude/shared/context-strategy.md`](.claude/shared/context-strategy.md) for when each tier is worth it (grep vs pack vs graph).
-Optional: `jq` for token JSON queries. `JIRA_USERNAME` + `JIRA_API_TOKEN` to publish PRD/PRP to Confluence.
+Other optional: `jq` for token JSON queries. `JIRA_USERNAME` + `JIRA_API_TOKEN` to publish PRD/PRP to Confluence.
 ---

package/VERSION CHANGED

@@ -1 +1 @@
-4.0.1
+4.1.0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@pieerry/harness-kit",
-  "version": "4.0.1",
+  "version": "4.1.0",
   "description": "Claude Code harness for product + engineering delivery. From idea to merged PR, one pipeline.",
   "author": "Space Metrics AI",
   "license": "MIT",

package/setup/install.sh CHANGED

@@ -24,6 +24,18 @@ fi
 command -v git >/dev/null 2>&1 || { echo "git not found"; exit 1; }
 command -v python3 >/dev/null 2>&1 || { echo "python3 not found"; exit 1; }
+# Optional context tools (repomix snapshot, graphify knowledge graph).
+# Both are no-ops if missing — /context:pack and /context:graph will print install hints.
+if ! command -v repomix >/dev/null 2>&1; then
+  echo "  optional: repomix not found. /context:pack disabled until you install it."
+  echo "            npm i -g repomix   |   brew install repomix"
+fi
+if ! command -v graphify >/dev/null 2>&1; then
+  echo "  optional: graphify not found. /context:graph disabled until you install it."
+  echo "            uv tool install graphifyy   |   pipx install graphifyy   |   pip install graphifyy"
+  echo "            (PyPI pkg graphifyy, double y; CLI cmd graphify)"
+fi
 OLD_VERSION="$(cat "$TARGET/.claude/.hk-version" 2>/dev/null || echo "")"
 if [ -n "$OLD_VERSION" ] && [ "$OLD_VERSION" != "$VERSION" ]; then
@@ -57,7 +69,8 @@ for agent in product-manager staff-software-engineer; do
 done
 # 3) Per-agent runtime hooks + scripts (NOT outputs — that's target-side state)
-mkdir -p "$TARGET/.claude/runtime/hooks" "$TARGET/.claude/runtime/scripts" "$TARGET/.claude/runtime/outputs/pm/.markers" "$TARGET/.claude/runtime/outputs/sse/.markers"
+mkdir -p "$TARGET/.claude/runtime/hooks" "$TARGET/.claude/runtime/scripts" "$TARGET/.claude/runtime/outputs/pm/.markers" "$TARGET/.claude/runtime/outputs/sse/.markers" "$TARGET/.claude/runtime/outputs/sse/sdd" "$TARGET/.claude/runtime/cache/repomix" "$TARGET/.claude/runtime/cache/graphify"
+touch "$TARGET/.claude/runtime/cache/repomix/.gitkeep" "$TARGET/.claude/runtime/cache/graphify/.gitkeep"
 for agent in product-manager staff-software-engineer; do
   rm -rf "$TARGET/.claude/runtime/hooks/$agent"
   cp -R "$SOURCE_ROOT/.claude/runtime/hooks/$agent" "$TARGET/.claude/runtime/hooks/$agent"
@@ -76,11 +89,15 @@ done
 # 4) Slash commands
 mkdir -p "$TARGET/.claude/commands"
-for ns in product-manager sse pipeline; do
+for ns in product-manager sse pipeline context; do
   rm -rf "$TARGET/.claude/commands/$ns"
   cp -R "$SOURCE_ROOT/.claude/commands/$ns" "$TARGET/.claude/commands/$ns"
 done
+# 4.5) Shared cross-agent guides
+mkdir -p "$TARGET/.claude/shared"
+cp -R "$SOURCE_ROOT/.claude/shared/." "$TARGET/.claude/shared/"
 # 5) Root harness hooks (status-line + pipeline tracking + live activity)
 mkdir -p "$TARGET/.claude/hooks"
 for h in status-line.sh pipeline-prompt.sh pipeline-postwrite.sh pipeline-postedit.sh pipeline-session-start.sh activity-pre-read.sh; do
@@ -88,12 +105,16 @@ for h in status-line.sh pipeline-prompt.sh pipeline-postwrite.sh pipeline-posted
   chmod +x "$TARGET/.claude/hooks/$h"
 done
-# 6) Root harness scripts (pipeline state, activity, PR monitor, stage-card)
+# 6) Root harness scripts (pipeline state, activity, PR monitor, stage-card, context wrappers)
 mkdir -p "$TARGET/.claude/scripts"
 for s in pipeline.py activity.py pr-monitor.py; do
   cp "$SOURCE_ROOT/.claude/scripts/$s" "$TARGET/.claude/scripts/$s"
   chmod +x "$TARGET/.claude/scripts/$s"
 done
+for s in pack-repo.sh graph-repo.sh; do
+  cp "$SOURCE_ROOT/.claude/scripts/$s" "$TARGET/.claude/scripts/$s"
+  chmod +x "$TARGET/.claude/scripts/$s"
+done
 cp "$SOURCE_ROOT/.claude/scripts/stage-card.md" "$TARGET/.claude/scripts/stage-card.md"
 # 6.5) Strip any __pycache__/.pyc that hitched a ride from the source repo
@@ -209,5 +230,6 @@ echo "$VERSION" > "$TARGET/.claude/.hk-version"
 echo "done. restart Claude Code to load."
 echo "  /product-manager:prd | :prp | :run"
-echo "  /sse:plan | :dev | :test | :pr | :pr-monitor | :run"
+echo "  /sse:plan | :dev | :test | :pr | :pr-monitor | :run | :sdd"
+echo "  /context:pack | :graph"
 echo "  /pipeline:continue | :reset"