npm - @hallucination-studio/harness-engine - Versions diffs - 1.0.0-beta.8.87407 → 1.0.0-beta.9.bb2cd30 - Mend

@hallucination-studio/harness-engine 1.0.0-beta.8.87407 → 1.0.0-beta.9.bb2cd30

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.md +23 -6
package/package.json +8 -2
package/skills/harness-repo-bootstrap/SKILL.md +18 -7
package/skills/harness-repo-bootstrap/evals/cases.json +8 -0
package/skills/harness-repo-bootstrap/evals/run_evals.py +453 -2
package/skills/harness-repo-bootstrap/references/evaluation-loop.md +2 -0
package/skills/harness-repo-bootstrap/references/exec-plans.md +14 -4
package/skills/harness-repo-bootstrap/references/workflow.md +6 -0
package/skills/harness-repo-bootstrap/scripts/manage_harness.py +1016 -22

package/README.md CHANGED

@@ -19,6 +19,8 @@ ask for missing high-impact facts, create the harness files, and keep future wor
 - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
 - Enforces a local harness check without assuming the user's project has CI.
 - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
+- Enforces a local quality gate for execution plans; failed scores write `## Rework Required` into the plan and block `plan-close`.
+- Tracks resumable workstreams so interrupted features, refactors, reliability work, and cleanup efforts can be recovered from repo state instead of chat history.
 ## Why It Exists
@@ -89,8 +91,11 @@ The intended workflow is:
 5. Log durable knowledge into active plans.
 6. Write the durable facts into permanent docs.
 7. Mark knowledge as written using ID plus evidence text.
-8. Run the local harness check before handoff.
-9. Close the execution plan only after the durable docs are updated.
+8. Score the finished work across product, UX/operator clarity, architecture, reliability, and security.
+9. If the quality gate fails, implement the generated `## Rework Required` items and score again.
+10. For phased or resumable work, update `Phase Continuity` and `docs/exec-plans/workstreams.md`.
+11. Close the execution plan only after the quality gate passes, phase continuity is recorded, and durable docs are updated.
+12. Run the local harness check before handoff.
 The installed skill exposes the underlying script at:
@@ -105,9 +110,20 @@ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py analyze -
 python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
 python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py init --repo . --answers answers.json
 python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py quality-score --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --product-correctness 8 --ux-operator-clarity 8 --architecture-maintainability 8 --reliability-observability 8 --security-data-handling 8
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py phase-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --mode multi-phase --workstream feature-name --current-phase 1 --next-phase 2 --continuation docs/exec-plans/workstreams.md#feature-name --next-action "Create Phase 2 plan"
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py workstream-upsert --repo . --id feature-name --status active --current-plan docs/exec-plans/active/2026-06-11-feature-name.md --next-action "Create Phase 2 plan"
 python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo .
 ```
+The quality gate is intentionally local and repository-owned. It does not require the user's
+project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
+has passed, and `check` reports active plans whose quality gate is missing or failing.
+For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
+ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
+continues, pauses, completes, or stops, and where the next agent should resume.
 ## Generated Harness Shape
 A typical initialized target repository receives:
@@ -127,6 +143,7 @@ docs/
 ├── exec-plans/
 │   ├── active/
 │   ├── completed/
+│   ├── workstreams.md
 │   └── tech-debt-tracker.md
 ├── generated/
 ├── product-specs/
@@ -184,14 +201,14 @@ These scores describe the current implementation, not an external guarantee.
 | Layer | Score | Notes |
 | --- | ---: | --- |
 | Product fit | 8.5 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. The main missing piece is broader real-world usage data across more project types. |
-| Skill workflow design | 8.5 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, close. The current skill is opinionated but still adapts to target repositories. |
-| Knowledge-closure loop | 8 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+| Skill workflow design | 9 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, score, rework, record continuity, close. The current skill is opinionated but still adapts to target repositories. |
+| Knowledge, quality, and workstream closure loop | 8.7 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication, `quality-score` blocks closure until failed dimensions are reworked, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
 | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
 | Generated harness docs | 7.5 / 10 | Covers architecture, plans, reliability, security, frontend policy, references, generated artifacts, and SOPs. Templates still require Codex to tighten project-specific language after generation. |
-| Evaluation coverage | 7.5 / 10 | Includes empty-repo init, frontend analysis, closed-loop plan behavior, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
+| Evaluation coverage | 8.2 / 10 | Includes empty-repo init, frontend analysis, quality-gated closed-loop plan behavior, phase continuity, workstream recovery, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
 | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
 | User-project safety | 8.5 / 10 | The skill avoids adding CI to target projects by default and uses local harness checks instead. It preserves unmanaged files unless forced. |
-| Overall | 8.1 / 10 | Usable and coherent, with the highest leverage still in richer evals and more structured plan/knowledge state. |
+| Overall | 8.5 / 10 | Usable and coherent, with the highest leverage still in richer end-to-end evals and more structured plan/knowledge/quality/workstream state. |
 ## Reference
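
The quality gate that the README changes describe can be pictured roughly as follows. This is a minimal sketch, not the code in `manage_harness.py`: the pass threshold of 7, the function name `evaluate_quality_gate`, and the wording of the rework items are assumptions made for illustration. Only the five dimension names and the `## Rework Required` heading come from the diff.

```python
from typing import Dict, List, Tuple

# Dimension names taken from the quality-score flags shown in the diff.
DIMENSIONS = (
    "product-correctness",
    "ux-operator-clarity",
    "architecture-maintainability",
    "reliability-observability",
    "security-data-handling",
)

# Assumption: the diff does not state the real pass threshold.
PASS_THRESHOLD = 7


def evaluate_quality_gate(scores: Dict[str, int]) -> Tuple[bool, List[str]]:
    """Return (passed, rework_lines) for a full set of 0-10 scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores: {', '.join(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")

    failing = [d for d in DIMENSIONS if scores[d] < PASS_THRESHOLD]
    if not failing:
        return True, []

    # Hypothetical rework wording; the real section content is not in the diff.
    lines = ["## Rework Required", ""]
    lines += [
        f"- Raise `{d}` from {scores[d]} to at least {PASS_THRESHOLD}."
        for d in failing
    ]
    return False, lines
```

Under these assumptions, a caller would append the returned lines to the plan file when the gate fails and refuse to close the plan until a later scoring run returns `True`.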

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@hallucination-studio/harness-engine",
-  "version": "1.0.0-beta.8.87407",
+  "version": "1.0.0-beta.9.bb2cd30",
   "description": "Install a Codex skill that bootstraps and updates an advanced harness-engineering repository layout.",
   "repository": {
     "type": "git",
@@ -19,7 +19,13 @@
   },
   "files": [
     "bin",
-    "skills"
+    "skills/**/SKILL.md",
+    "skills/**/agents/**",
+    "skills/**/assets/**",
+    "skills/**/evals/*.json",
+    "skills/**/evals/*.py",
+    "skills/**/references/**",
+    "skills/**/scripts/*.py"
   ],
   "license": "MIT"
 }
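
The narrowed `files` list above replaces the blanket `skills` directory entry with explicit patterns. The sketch below approximates how such patterns select paths. It is an illustration only: npm uses its own matcher, the translation in `glob_to_regex` covers just `*`, `**`, and `**/`, and the helper names are hypothetical. One plausible effect, not stated in the diff, is that byte-code caches and stray files under `skills/` no longer ship.

```python
import re
from typing import List, Pattern

# The skill-related patterns from the new "files" array in package.json.
NEW_FILE_PATTERNS = [
    "skills/**/SKILL.md",
    "skills/**/agents/**",
    "skills/**/assets/**",
    "skills/**/evals/*.json",
    "skills/**/evals/*.py",
    "skills/**/references/**",
    "skills/**/scripts/*.py",
]


def glob_to_regex(pattern: str) -> Pattern[str]:
    """Translate a simplified glob into a regex (approximation, not npm's matcher)."""
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


_COMPILED = [glob_to_regex(p) for p in NEW_FILE_PATTERNS]


def is_packaged(path: str) -> bool:
    """Return True if a forward-slash relative path matches any pattern."""
    return any(rx.match(path) for rx in _COMPILED)
```

For example, `is_packaged("skills/harness-repo-bootstrap/scripts/manage_harness.py")` holds under this approximation, while a `__pycache__/*.pyc` file beneath `scripts/` does not match any pattern.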

package/skills/harness-repo-bootstrap/SKILL.md CHANGED

@@ -18,12 +18,17 @@ Run the packaged script to inspect the target repository before editing files. U
    - `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`
    - `python3 scripts/manage_harness.py update --repo <target-repo> --answers <answers.json>`
 7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
-8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`.
+8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
 9. Before closing the task, write those facts into their durable docs.
-10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; use `--append` only when the exact fact should be appended mechanically.
-11. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
-12. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
-13. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Use `--append` only when the exact fact should be appended mechanically.
+11. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
+12. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
+13. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --ux-operator-clarity <0-10> --architecture-maintainability <0-10> --reliability-observability <0-10> --security-data-handling <0-10>`.
+14. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
+15. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
+16. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+17. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+18. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
 ## Reading Order
@@ -45,10 +50,15 @@ Run the packaged script to inspect the target repository before editing files. U
 - Do not overwrite existing files unless the human asked for it or you pass `--force`.
 - Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
 - Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+- Read `docs/exec-plans/workstreams.md` before resuming interrupted feature, refactor, reliability, security, frontend, or cleanup work.
 - Treat `docs/sops/` as mechanical operating procedures, not background reading.
 - When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
-- Prefer `knowledge-mark-written --id ... --evidence ...` so durable docs can use natural wording instead of duplicated exact fact strings.
-- Use `plan-close` as the final guardrail so plan state and durable docs stay synchronized.
+- Prefer `knowledge-mark-written --id ... --evidence-file ...` so durable docs can use natural wording without shell quoting failures or duplicated exact fact strings.
+- Use `defect-log` for every bug found by tests, evals, browser validation, or code review; unresolved defects must block handoff.
+- Use `defect-resolve` only after the implementation is fixed and you can cite passing validation or code evidence.
+- Use `quality-score` before `plan-close`; failed scores must drive rework, not handoff.
+- Use `phase-set` and `workstream-upsert` before `plan-close` for Phase 1/2/3 or any other resumable multi-plan work.
+- Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
 - Use `check` as the local handoff guardrail for user repositories.
 - Run `python3 evals/run_evals.py` after skill changes and treat failures as iteration input.
 - Do not add CI to user repositories unless the human explicitly asks for it.
@@ -58,6 +68,7 @@ Run the packaged script to inspect the target repository before editing files. U
 - Keep `AGENTS.md` short and routing-oriented.
 - Keep durable knowledge in repo docs, not in chat-only explanations.
 - Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
+- Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
 - Keep generated material under `docs/generated/`.
 - Keep external, model-friendly references under `docs/references/`.
 - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
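
The defect loop introduced in the SKILL.md steps above (`defect-log` forces the gate to fail, `defect-resolve` clears it once fix evidence exists) can be modelled in a few lines. This is a sketch under stated assumptions rather than the packaged implementation: the `BUG-001` ID format, the class names, and the in-memory storage are hypothetical. The severity levels P0 to P3 and the evidence requirement are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Severity levels as listed in the defect-log usage line.
SEVERITIES = ("P0", "P1", "P2", "P3")


@dataclass
class Defect:
    id: str
    severity: str
    summary: str
    evidence: str
    fix_evidence: Optional[str] = None

    @property
    def resolved(self) -> bool:
        return self.fix_evidence is not None


@dataclass
class PlanDefects:
    """In-memory stand-in for the defect state a plan file would carry."""

    defects: List[Defect] = field(default_factory=list)

    def log(self, severity: str, summary: str, evidence: str) -> str:
        if severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {severity}")
        # Hypothetical ID format; the real one is not shown in the diff.
        defect_id = f"BUG-{len(self.defects) + 1:03d}"
        self.defects.append(Defect(defect_id, severity, summary, evidence))
        return defect_id

    def resolve(self, defect_id: str, fix_evidence: str) -> None:
        if not fix_evidence.strip():
            raise ValueError("fix evidence is required to resolve a defect")
        for defect in self.defects:
            if defect.id == defect_id:
                defect.fix_evidence = fix_evidence
                return
        raise KeyError(defect_id)

    def blocks_gate(self) -> bool:
        """Any unresolved defect keeps the quality gate failing."""
        return any(not d.resolved for d in self.defects)
```

In this model, logging a defect makes `blocks_gate()` return `True` until `resolve` is called with non-empty evidence, which mirrors the rule that unresolved defects must block handoff.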

package/skills/harness-repo-bootstrap/evals/cases.json CHANGED

@@ -11,6 +11,14 @@
     "id": "closed-loop-plan",
     "description": "Execution plans should refuse to close until durable knowledge is written back."
   },
+  {
+    "id": "plan-path-canonicalization",
+    "description": "Plan commands should canonicalize absolute plan paths before updating workstreams."
+  },
+  {
+    "id": "defect-recovery-loop",
+    "description": "Validation or review defects should block quality gates until resolved with evidence."
+  },
   {
     "id": "preserve-unmanaged-docs",
     "description": "Existing user-owned harness files should be skipped unless explicitly forced."