@hallucination-studio/harness-engine 1.0.0-beta.8.87407 → 1.0.0-beta.9.bb2cd30

package/README.md CHANGED
@@ -19,6 +19,8 @@ ask for missing high-impact facts, create the harness files, and keep future wor
  - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
  - Enforces a local harness check without assuming the user's project has CI.
  - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
+ - Enforces a local quality gate for execution plans; failed scores write `## Rework Required` into the plan and block `plan-close`.
+ - Tracks resumable workstreams so interrupted features, refactors, reliability work, and cleanup efforts can be recovered from repo state instead of chat history.
 
  ## Why It Exists
 
@@ -89,8 +91,11 @@ The intended workflow is:
  5. Log durable knowledge into active plans.
  6. Write the durable facts into permanent docs.
  7. Mark knowledge as written using ID plus evidence text.
- 8. Run the local harness check before handoff.
- 9. Close the execution plan only after the durable docs are updated.
+ 8. Score the finished work across product, UX/operator clarity, architecture, reliability, and security.
+ 9. If the quality gate fails, implement the generated `## Rework Required` items and score again.
+ 10. For phased or resumable work, update `Phase Continuity` and `docs/exec-plans/workstreams.md`.
+ 11. Close the execution plan only after the quality gate passes, phase continuity is recorded, and durable docs are updated.
+ 12. Run the local harness check before handoff.
 
  The installed skill exposes the underlying script at:
 
@@ -105,9 +110,20 @@ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py analyze -
  python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
  python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py init --repo . --answers answers.json
  python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py quality-score --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --product-correctness 8 --ux-operator-clarity 8 --architecture-maintainability 8 --reliability-observability 8 --security-data-handling 8
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py phase-set --repo . --plan docs/exec-plans/active/2026-06-11-feature-name.md --mode multi-phase --workstream feature-name --current-phase 1 --next-phase 2 --continuation docs/exec-plans/workstreams.md#feature-name --next-action "Create Phase 2 plan"
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py workstream-upsert --repo . --id feature-name --status active --current-plan docs/exec-plans/active/2026-06-11-feature-name.md --next-action "Create Phase 2 plan"
  python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo .
  ```
 
+ The quality gate is intentionally local and repository-owned. It does not require the user's
+ project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
+ has passed, and `check` reports active plans whose quality gate is missing or failing.
+
+ For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
+ ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
+ continues, pauses, completes, or stops, and where the next agent should resume.
+
  ## Generated Harness Shape
 
  A typical initialized target repository receives:
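
Reviewer note: the quality-gate behavior added in the hunk above (five scored dimensions, a generated `## Rework Required` section on failure, and `plan-close` refusing to proceed) can be pictured with a minimal Python sketch. This is an illustration only, not the code in `manage_harness.py`. The pass threshold, the `GateResult` type, and the function names are hypothetical, and the `open_defects` input is an assumption based on the `defect-log` step described elsewhere in this diff.

```python
from dataclasses import dataclass

# The five dimension names mirror the quality-score flags shown in the diff.
DIMENSIONS = (
    "product-correctness",
    "ux-operator-clarity",
    "architecture-maintainability",
    "reliability-observability",
    "security-data-handling",
)

# ASSUMPTION: the diff does not state a pass threshold; 7 is hypothetical.
PASS_THRESHOLD = 7


@dataclass
class GateResult:
    passed: bool
    rework_section: str  # Markdown to append to the plan; empty when passed


def evaluate_gate(scores: dict[str, int], open_defects: int = 0) -> GateResult:
    """Fail the gate if any dimension is below threshold or a defect is open."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    failing = [d for d in DIMENSIONS if scores[d] < PASS_THRESHOLD]
    items = [
        f"- Raise `{d}` from {scores[d]} to at least {PASS_THRESHOLD}."
        for d in failing
    ]
    if open_defects:
        items.append(f"- Resolve {open_defects} open defect(s).")
    if not items:
        return GateResult(passed=True, rework_section="")
    section = "## Rework Required\n" + "\n".join(items) + "\n"
    return GateResult(passed=False, rework_section=section)


def can_close(result: GateResult) -> bool:
    """Hypothetical plan-close check: closing requires a passing gate."""
    return result.passed
```

In this sketch a failing result carries the Markdown that would be written into the plan, so the rework list and the close decision come from the same evaluation.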
@@ -127,6 +143,7 @@ docs/
  ├── exec-plans/
  │ ├── active/
  │ ├── completed/
+ │ ├── workstreams.md
  │ └── tech-debt-tracker.md
  ├── generated/
  ├── product-specs/
@@ -184,14 +201,14 @@ These scores describe the current implementation, not an external guarantee.
  | Layer | Score | Notes |
  | --- | ---: | --- |
  | Product fit | 8.5 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. The main missing piece is broader real-world usage data across more project types. |
- | Skill workflow design | 8.5 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, close. The current skill is opinionated but still adapts to target repositories. |
- | Knowledge-closure loop | 8 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+ | Skill workflow design | 9 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, score, rework, record continuity, close. The current skill is opinionated but still adapts to target repositories. |
+ | Knowledge, quality, and workstream closure loop | 8.7 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication, `quality-score` blocks closure until failed dimensions are reworked, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
  | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
  | Generated harness docs | 7.5 / 10 | Covers architecture, plans, reliability, security, frontend policy, references, generated artifacts, and SOPs. Templates still require Codex to tighten project-specific language after generation. |
- | Evaluation coverage | 7.5 / 10 | Includes empty-repo init, frontend analysis, closed-loop plan behavior, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
+ | Evaluation coverage | 8.2 / 10 | Includes empty-repo init, frontend analysis, quality-gated closed-loop plan behavior, phase continuity, workstream recovery, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
  | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
  | User-project safety | 8.5 / 10 | The skill avoids adding CI to target projects by default and uses local harness checks instead. It preserves unmanaged files unless forced. |
- | Overall | 8.1 / 10 | Usable and coherent, with the highest leverage still in richer evals and more structured plan/knowledge state. |
+ | Overall | 8.5 / 10 | Usable and coherent, with the highest leverage still in richer end-to-end evals and more structured plan/knowledge/quality/workstream state. |
 
  ## Reference
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@hallucination-studio/harness-engine",
- "version": "1.0.0-beta.8.87407",
+ "version": "1.0.0-beta.9.bb2cd30",
  "description": "Install a Codex skill that bootstraps and updates an advanced harness-engineering repository layout.",
  "repository": {
  "type": "git",
@@ -19,7 +19,13 @@
  },
  "files": [
  "bin",
- "skills"
+ "skills/**/SKILL.md",
+ "skills/**/agents/**",
+ "skills/**/assets/**",
+ "skills/**/evals/*.json",
+ "skills/**/evals/*.py",
+ "skills/**/references/**",
+ "skills/**/scripts/*.py"
  ],
  "license": "MIT"
  }
@@ -18,12 +18,17 @@ Run the packaged script to inspect the target repository before editing files. U
  - `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`
  - `python3 scripts/manage_harness.py update --repo <target-repo> --answers <answers.json>`
  7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
- 8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`.
+ 8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`. Use `--fact-file <file>` when the fact contains shell-sensitive characters.
  9. Before closing the task, write those facts into their durable docs.
- 10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; use `--append` only when the exact fact should be appended mechanically.
- 11. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
- 12. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
- 13. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+ 10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; prefer `--evidence-file <file>` when evidence contains backticks, globs, quotes, pipes, or other shell-sensitive characters. Use `--append` only when the exact fact should be appended mechanically.
+ 11. If validation, evals, browser checks, or code review reveal a bug, immediately run `python3 scripts/manage_harness.py defect-log --repo <target-repo> --plan <plan-file> --severity <P0|P1|P2|P3> --summary "<bug>" --evidence "<failing check>"`. This forces the quality gate to fail.
+ 12. Fix logged defects, then run `python3 scripts/manage_harness.py defect-resolve --repo <target-repo> --plan <plan-file> --id <bug-id> --fix-evidence "<passing check or code evidence>"`.
+ 13. Score the finished work with `python3 scripts/manage_harness.py quality-score --repo <target-repo> --plan <plan-file> --product-correctness <0-10> --ux-operator-clarity <0-10> --architecture-maintainability <0-10> --reliability-observability <0-10> --security-data-handling <0-10>`.
+ 14. If `quality-score` fails, treat `## Rework Required` in the plan as the next implementation input, fix the work, then run `quality-score` again.
+ 15. For phased or resumable work, run `python3 scripts/manage_harness.py phase-set --repo <target-repo> --plan <plan-file> --mode <multi-phase|paused|completed|stopped> --workstream <id> --current-phase <n> --continuation <target> --next-action "<next action>"`, then update `workstreams.md` with `workstream-upsert`.
+ 16. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+ 17. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+ 18. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
 
  ## Reading Order
 
@@ -45,10 +50,15 @@ Run the packaged script to inspect the target repository before editing files. U
  - Do not overwrite existing files unless the human asked for it or you pass `--force`.
  - Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
  - Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+ - Read `docs/exec-plans/workstreams.md` before resuming interrupted feature, refactor, reliability, security, frontend, or cleanup work.
  - Treat `docs/sops/` as mechanical operating procedures, not background reading.
  - When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
- - Prefer `knowledge-mark-written --id ... --evidence ...` so durable docs can use natural wording instead of duplicated exact fact strings.
- - Use `plan-close` as the final guardrail so plan state and durable docs stay synchronized.
+ - Prefer `knowledge-mark-written --id ... --evidence-file ...` so durable docs can use natural wording without shell quoting failures or duplicated exact fact strings.
+ - Use `defect-log` for every bug found by tests, evals, browser validation, or code review; unresolved defects must block handoff.
+ - Use `defect-resolve` only after the implementation is fixed and you can cite passing validation or code evidence.
+ - Use `quality-score` before `plan-close`; failed scores must drive rework, not handoff.
+ - Use `phase-set` and `workstream-upsert` before `plan-close` for Phase 1/2/3 or any other resumable multi-plan work.
+ - Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
  - Use `check` as the local handoff guardrail for user repositories.
  - Run `python3 evals/run_evals.py` after skill changes and treat failures as iteration input.
  - Do not add CI to user repositories unless the human explicitly asks for it.
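
Reviewer note: the new preference for `--evidence-file` over `--evidence` in the hunk above is about keeping backticks, globs, quotes, and pipes away from the shell. A minimal Python sketch of that idea follows. The subcommand and flags are taken from this diff, while the helper name, the `K-001` ID, and the plan path are hypothetical placeholders. It assumes the script is invoked with an argument list rather than through a shell.

```python
import shlex
import tempfile
from pathlib import Path

# Script path as written in the SKILL.md steps; adjust for an installed copy.
SCRIPT = "scripts/manage_harness.py"


def mark_written_argv(
    repo: str, plan: str, knowledge_id: str, evidence: str, workdir: Path
) -> list[str]:
    """Write evidence to a file and build argv for knowledge-mark-written."""
    # The evidence text never appears on the command line, so characters
    # such as backticks or pipes cannot be reinterpreted by a shell.
    evidence_file = workdir / f"{knowledge_id}-evidence.txt"
    evidence_file.write_text(evidence, encoding="utf-8")
    return [
        "python3", SCRIPT, "knowledge-mark-written",
        "--repo", repo,
        "--plan", plan,
        "--id", knowledge_id,
        "--evidence-file", str(evidence_file),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        argv = mark_written_argv(
            repo=".",
            plan="docs/exec-plans/active/plan.md",  # hypothetical plan path
            knowledge_id="K-001",                   # hypothetical ID format
            evidence="Run `make test | tee log` for *.py files",
            workdir=Path(tmp),
        )
        # Show the command for inspection only; passing argv to
        # subprocess.run(argv, check=True) would execute it without a shell.
        print(shlex.join(argv))
```

The same pattern would apply to `knowledge-log --fact-file`, which this diff introduces for the same reason.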
@@ -58,6 +68,7 @@ Run the packaged script to inspect the target repository before editing files. U
  - Keep `AGENTS.md` short and routing-oriented.
  - Keep durable knowledge in repo docs, not in chat-only explanations.
  - Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
+ - Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
  - Keep generated material under `docs/generated/`.
  - Keep external, model-friendly references under `docs/references/`.
  - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
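
Reviewer note: the diff adds `docs/exec-plans/workstreams.md` as a recovery ledger but does not show its format. The Python sketch below is one way to picture an upsert-by-ID ledger. The field names mirror the `workstream-upsert` flags (`--id`, `--status`, `--current-plan`, `--next-action`). The Markdown layout, the `Workstream` type, and the choice of which statuses count as resumable are assumptions, not the package's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class Workstream:
    id: str
    status: str
    current_plan: str
    next_action: str


def upsert(ledger: dict[str, Workstream], entry: Workstream) -> None:
    """Insert a new workstream or replace the entry with the same id."""
    ledger[entry.id] = entry


def render(ledger: dict[str, Workstream]) -> str:
    """Render the ledger as Markdown, one heading per workstream id.

    ASSUMPTION: a heading per id is used here so that an anchor such as
    workstreams.md#feature-name could resolve; the real layout may differ.
    """
    parts = ["# Workstreams\n"]
    for ws in sorted(ledger.values(), key=lambda w: w.id):
        parts.append(
            f"## {ws.id}\n"
            f"- Status: {ws.status}\n"
            f"- Current plan: {ws.current_plan}\n"
            f"- Next action: {ws.next_action}\n"
        )
    return "\n".join(parts)


def resumable(ledger: dict[str, Workstream]) -> list[Workstream]:
    """Return workstreams an agent could pick up again.

    ASSUMPTION: "active" and "paused" are treated as resumable statuses.
    """
    return [ws for ws in ledger.values() if ws.status in {"active", "paused"}]
```

Keying the ledger by workstream ID means repeated upserts for the same feature update one entry instead of appending duplicates, which is what makes the file usable as a resume point.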
@@ -11,6 +11,14 @@
  "id": "closed-loop-plan",
  "description": "Execution plans should refuse to close until durable knowledge is written back."
  },
+ {
+ "id": "plan-path-canonicalization",
+ "description": "Plan commands should canonicalize absolute plan paths before updating workstreams."
+ },
+ {
+ "id": "defect-recovery-loop",
+ "description": "Validation or review defects should block quality gates until resolved with evidence."
+ },
  {
  "id": "preserve-unmanaged-docs",
  "description": "Existing user-owned harness files should be skipped unless explicitly forced."