@hallucination-studio/harness-engine 1.0.0-beta.10.9ff10d9 → 1.0.0-beta.11.2a4849a

package/README.md CHANGED
@@ -18,6 +18,7 @@ ask for missing high-impact facts, create the harness files, and keep future wor
  - Creates execution-plan folders for active and completed plans.
  - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
  - Reconciles managed harnesses through the same `init` flow, refreshing managed files and backfilling newly introduced managed files while preserving unmanaged docs.
+ - Provides `clean` to remove transient harness runtime state, add `.gitignore` entries, and untrack already committed harness runtime files so a follow-up commit deletes them from the remote.
  - Enforces a local harness check without assuming the user's project has CI.
  - Previews and optionally removes stale unreferenced generated evidence under `docs/generated/`.
  - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
@@ -161,12 +162,44 @@ python3 .codex/skills/harness-engine/scripts/manage_harness.py workstream-upsert
  python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo .
  python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14
  python3 .codex/skills/harness-engine/scripts/manage_harness.py evidence-prune --repo . --older-than-days 14 --apply
+ python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo .
+ python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo . --apply
  ```

  The quality gate is intentionally local and repository-owned. It does not require the user's
  project to have CI. `plan-close` refuses to move a plan to `completed` unless `quality-score`
  has passed, and `check` reports active plans whose quality gate is missing or failing.

+ ## Version Control Policy
+
+ Commit harness docs that carry durable repository knowledge: `AGENTS.md`, `ARCHITECTURE.md`,
+ `docs/PLANS.md`, `docs/QUALITY_SCORE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`,
+ `docs/FRONTEND.md`, `docs/sops/`, `docs/product-specs/`, `docs/design-docs/`,
+ `docs/references/`, and intentional execution-plan state.
+
+ Do not commit local skill installs or generated evidence by default. `clean --apply` adds these ignores:
+
+ ```gitignore
+ # harness-engine transient files
+ .codex/skills/
+ docs/generated/
+ # end harness-engine transient files
+ ```
+
+ If those files were already committed or pushed, run:
+
+ ```bash
+ python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo .
+ python3 .codex/skills/harness-engine/scripts/manage_harness.py clean --repo . --apply
+ git status --short
+ git diff --cached --stat
+ git commit -m "Remove harness runtime artifacts from git"
+ git push
+ ```
+
+ `clean --apply` removes local generated evidence and stale task snapshots, then uses
+ `git rm --cached` to stage removal of tracked harness runtime files from git and the remote.
+
  For multi-phase work, `Phase Continuity` and `docs/exec-plans/workstreams.md` form the recovery
  ledger. A plan like `Local Workbench Phase 1` can close only after it records whether the workstream
  continues, pauses, completes, or stops, and where the next agent should resume.
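Reviewer sketch for the `.gitignore` block introduced in the Version Control Policy above: one way a delimited block like this can be kept idempotent is to replace whatever sits between the two marker comments, or append the block when no markers exist. The helper name `upsert_ignore_block` is hypothetical, and the function works on strings only so it can be tried without a repository; the markers and entries are the ones shown in the policy.

```python
import re

BLOCK_START = "# harness-engine transient files"
BLOCK_END = "# end harness-engine transient files"
ENTRIES = [".codex/skills/", "docs/generated/"]


def upsert_ignore_block(existing: str) -> str:
    """Return .gitignore text containing exactly one managed block.

    Hypothetical helper for illustration; it never touches the filesystem.
    """
    block = "\n".join([BLOCK_START, *ENTRIES, BLOCK_END])
    pattern = re.compile(
        rf"(^|\n){re.escape(BLOCK_START)}\n.*?\n{re.escape(BLOCK_END)}(?=\n|$)",
        flags=re.DOTALL,
    )
    if pattern.search(existing):
        # Replace the old block in place, keeping the newline that preceded it.
        updated = pattern.sub(lambda m: m.group(1) + block, existing)
    else:
        # Append the block, separated from user entries by one blank line.
        prefix = existing.rstrip()
        updated = f"{prefix}\n\n{block}" if prefix else block
    return updated.rstrip() + "\n"


once = upsert_ignore_block("node_modules/\n")
twice = upsert_ignore_block(once)
```

Running the function a second time on its own output leaves the text unchanged, which is the property that makes repeated `clean --apply` runs safe for user-owned ignore entries in this sketch.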
@@ -252,7 +285,7 @@ These scores describe the current implementation, not an external guarantee.
  | Knowledge, quality, and workstream closure loop | 9.1 / 10 | Stable knowledge IDs plus exact destination evidence reduce noisy doc duplication, `quality-score` rejects missing evidence notes, defects block closure until resolved, and workstreams make phased work recoverable. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
  | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
  | Generated harness docs | 8.4 / 10 | Covers architecture, plans, reliability, security, frontend policy, issue workflows, references, generated artifacts, and SOPs. The docs now front-load exact knowledge evidence, per-dimension quality notes, and plan placeholder cleanup, but templates still require Codex to tighten project-specific language after generation. |
- | Evaluation coverage | 9 / 10 | `npm test` runs 12 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
+ | Evaluation coverage | 9 / 10 | `npm test` runs 13 structured eval cases covering empty-repo init, frontend analysis, init reconciliation, clean command behavior for local runtime state and already tracked artifacts, issue workflow coverage, closed-loop plan behavior, phase continuity, path canonicalization, defect recovery, required quality-score notes, exact knowledge evidence, generated-evidence cleanup, eval report shape, and user-owned doc preservation. A fully automated Codex child-agent E2E would raise this further. |
  | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
  | User-project safety | 8.8 / 10 | The skill avoids adding CI to target projects by default, preserves unmanaged files unless forced, and requires evidence-backed closure for defects and durable knowledge. More destructive-change simulation in evals would improve this score. |
  | Overall | 9 / 10 | The skill is now strong enough for regular use: self evals pass across the structured suite, real acceptance covered initial scaffold plus frontend and backend issue workflows, and the main failure modes found during acceptance are now documented and eval-covered. Remaining leverage is automated child-agent E2E coverage and structured plan metadata. |
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@hallucination-studio/harness-engine",
-   "version": "1.0.0-beta.10.9ff10d9",
+   "version": "1.0.0-beta.11.2a4849a",
    "description": "Install the harness-engine Codex skill for initializing and reconciling advanced repository harness docs.",
    "repository": {
      "type": "git",
@@ -28,7 +28,8 @@ Run the packaged script to inspect the target repository before editing files. U
  17. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
  18. Before handoff, run `python3 .codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
  19. To review stale generated evidence, run `python3 scripts/manage_harness.py evidence-prune --repo <target-repo>` first; it is dry-run by default. Add `--apply` only after checking the candidate list.
- 20. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+ 20. To clean transient harness runtime files or remove already committed runtime files from the remote, run `python3 scripts/manage_harness.py clean --repo <target-repo>` first; it is dry-run by default. Add `--apply` to clean local runtime state, update `.gitignore`, and stage `git rm --cached` removals, then commit and push.
+ 21. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.

  ## Reading Order

@@ -63,6 +64,7 @@ Run the packaged script to inspect the target repository before editing files. U
  - Use `plan-close` as the final guardrail so plan state, quality score, and durable docs stay synchronized.
  - Use `check` as the local handoff guardrail for user repositories.
  - Use `evidence-prune` as a cleanup preview for old unreferenced files under `docs/generated/`; it never deletes unless `--apply` is present.
+ - Use `clean` when `.codex/skills/`, `docs/generated/`, or stale `docs/exec-plans/active|completed/` files need cleanup or were already committed. It never changes files or the git index unless `--apply` is present.
  - Run `python3 evals/run_evals.py` after skill changes, read the structured report, and treat per-case failures as iteration input.
  - Do not add CI to user repositories unless the human explicitly asks for it.

@@ -72,7 +74,7 @@ Run the packaged script to inspect the target repository before editing files. U
  - Keep durable knowledge in repo docs, not in chat-only explanations.
  - Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
  - Keep resumable workstreams in `docs/exec-plans/workstreams.md`.
- - Keep generated material under `docs/generated/`.
+ - Keep generated evidence under `docs/generated/`; it is local runtime output and is ignored by git unless the human intentionally promotes a specific artifact into tracked docs.
  - Keep external, model-friendly references under `docs/references/`.
  - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.

@@ -11,6 +11,10 @@
    "id": "init-reconciles-existing-harness",
    "description": "Init should reconcile an existing harness by refreshing managed files and adding newly introduced managed files."
  },
+ {
+   "id": "clean-removes-runtime-state-and-untracks-artifacts",
+   "description": "Clean should dry-run by default, remove local harness runtime state, update .gitignore, and stage removal of tracked harness runtime artifacts when applied."
+ },
  {
    "id": "closed-loop-plan",
    "description": "Execution plans should refuse to close until durable knowledge is written back."
@@ -197,6 +197,66 @@ def test_init_reconciles_existing_harness(tmp_root):
      assert_exists(repo, "docs/sops/evidence-first-eval-loop.md")


+ def test_clean_removes_runtime_state_and_untracks_artifacts(tmp_root):
+     repo = tmp_root / "clean-repo"
+     repo.mkdir()
+     subprocess.run(["git", "init"], cwd=repo, text=True, capture_output=True, check=True)
+     tracked_files = [
+         ".codex/skills/harness-engine/SKILL.md",
+         "docs/generated/canvas-polish-desktop-final.png",
+         "docs/generated/harness-analysis.json",
+         "docs/exec-plans/active/2026-06-11-old-task.md",
+         "docs/exec-plans/completed/2026-06-11-old-task.md",
+     ]
+     for relative_path in tracked_files:
+         path = repo / relative_path
+         path.parent.mkdir(parents=True, exist_ok=True)
+         path.write_text("tracked runtime artifact\n")
+     subprocess.run(["git", "add", *tracked_files], cwd=repo, text=True, capture_output=True, check=True)
+     subprocess.run(
+         ["git", "commit", "-m", "track runtime artifacts"],
+         cwd=repo,
+         text=True,
+         capture_output=True,
+         check=True,
+     )
+
+     dry_run = run_manager("clean", "--repo", str(repo))
+     if dry_run["mode"] != "dry-run" or dry_run["tracked_candidate_count"] != len(tracked_files):
+         raise AssertionError("clean should dry-run tracked runtime artifact candidates")
+     if set(dry_run["tracked_candidates"]) != set(tracked_files):
+         raise AssertionError("clean tracked candidates should match tracked harness runtime artifacts")
+     if "docs/generated/canvas-polish-desktop-final.png" not in set(dry_run["local_candidates"]):
+         raise AssertionError("clean should preview local generated evidence cleanup")
+     for relative_path in tracked_files:
+         if not (repo / relative_path).exists():
+             raise AssertionError("clean dry-run should not delete local files")
+
+     applied = run_manager("clean", "--repo", str(repo), "--apply")
+     if applied["mode"] != "apply" or set(applied["removed_from_index"]) != set(tracked_files):
+         raise AssertionError("clean --apply should remove candidates from the git index")
+     assert_contains(repo, ".gitignore", ".codex/skills/")
+     assert_contains(repo, ".gitignore", "docs/generated/")
+     status = subprocess.run(
+         ["git", "status", "--short"],
+         cwd=repo,
+         text=True,
+         capture_output=True,
+         check=True,
+     ).stdout
+     for relative_path in tracked_files:
+         if f"D  {relative_path}" not in status:
+             raise AssertionError(f"clean should stage index deletion for {relative_path}")
+     for relative_path in tracked_files:
+         if relative_path.startswith(".codex/skills/"):
+             if not (repo / relative_path).exists():
+                 raise AssertionError(f"clean should keep local skill install file for {relative_path}")
+         elif (repo / relative_path).exists():
+             raise AssertionError(f"clean should delete local runtime file for {relative_path}")
+     if "A  .gitignore" not in status:
+         raise AssertionError("clean should stage the new .gitignore block")
+
+
  def test_closed_loop_plan(tmp_root):
      repo = tmp_root / "loop-repo"
      repo.mkdir()
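Reviewer sketch for the status assertions in the eval above: in the short format of `git status`, each entry is two status columns (index, then working tree), a space, and the path, so a staged deletion shows `D` in the first column and a newly staged file shows `A`. The helper `staged_paths` is hypothetical and the sample text is hand-written for illustration, not captured output; renames and quoted paths are outside what this sketch handles.

```python
def staged_paths(status_text: str, index_code: str) -> list[str]:
    """Return paths whose index (first) status column equals index_code.

    Assumes the plain `git status --short` layout `XY path`, where X is the
    index status and Y is the working-tree status.
    """
    paths = []
    for line in status_text.splitlines():
        if len(line) > 3 and line[0] == index_code and line[2] == " ":
            paths.append(line[3:])
    return paths


# Hand-written sample: one staged deletion, one staged addition,
# one unstaged modification, one untracked file.
sample = (
    "D  docs/generated/harness-analysis.json\n"
    "A  .gitignore\n"
    " M README.md\n"
    "?? notes.txt\n"
)
deleted = staged_paths(sample, "D")
added = staged_paths(sample, "A")
```

Reading the first column explicitly, instead of searching for a substring, avoids confusing a staged deletion with a working-tree deletion, which would appear as ` D` in the second column.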
@@ -1091,6 +1151,7 @@ EVALS = [
      ("empty-repo-init", test_empty_repo_init),
      ("frontend-analysis", test_frontend_analysis),
      ("init-reconciles-existing-harness", test_init_reconciles_existing_harness),
+     ("clean-removes-runtime-state-and-untracks-artifacts", test_clean_removes_runtime_state_and_untracks_artifacts),
      ("closed-loop-plan", test_closed_loop_plan),
      ("phase-continuity-workstream", test_phase_continuity_workstream),
      ("plan-path-canonicalization", test_plan_path_canonicalization),
@@ -7,6 +7,9 @@ Every generated file starts with a managed marker:
  Init behavior:

  - `init`: create missing files for new repositories; when an existing managed harness is detected, refresh managed files and create missing files while preserving unmanaged files
+ - `clean` removes transient runtime state under `docs/generated/`, `docs/exec-plans/active/`, and `docs/exec-plans/completed/`
+ - `clean` maintains `.gitignore` entries for `.codex/skills/` and `docs/generated/`
+ - `clean` previews or stages removal of already tracked harness runtime files from the git index; it requires `--apply` before running `git rm --cached`

  Use `init` as the normal workspace command so creation and reconciliation share one path. Use `--force` only when the human explicitly accepts overwriting.

@@ -48,6 +48,7 @@ After the scaffold exists:
  - run `quality-score` after implementation and validation, with evidence notes for every dimension
  - if `quality-score` fails, implement the `## Rework Required` items and score again
  - use `phase-set` and `workstream-upsert` when a plan belongs to phased or resumable work
+ - use `clean` when harness runtime files need cleanup or were already committed; review dry-run output first, then apply, commit, and push the staged removals
  - use `plan-close` to verify no durable knowledge is left stranded in the active plan
  - before `plan-close`, replace generic plan placeholders with task-specific scope, constraints, steps, validation, and completion notes; delete unused ad hoc durable-knowledge TODOs
  - run `.codex/skills/harness-engine/scripts/manage_harness.py check --repo <target-repo>` before handoff
@@ -5,13 +5,35 @@ import hashlib
  import json
  import os
  import re
+ import subprocess
  import time
  from datetime import UTC, datetime
  from pathlib import Path

  MANAGED_MARKER = "<!-- harness-engine:managed -->"
+ OBSOLETE_MANAGED_MARKERS = [
+     "<!-- harness-repo-bootstrap:managed -->",
+     "<!-- harness-init:managed -->",
+ ]
  DEFAULT_KNOWLEDGE_PLACEHOLDER = "- [ ] Add durable facts here as they emerge -> <destination-doc>"
  DEFAULT_DEFECT_PLACEHOLDER = "None."
+ GITIGNORE_BLOCK_START = "# harness-engine transient files"
+ GITIGNORE_BLOCK_END = "# end harness-engine transient files"
+ GITIGNORE_ENTRIES = [
+     ".codex/skills/",
+     "docs/generated/",
+ ]
+ CLEAN_INIT_DIRS = [
+     "docs/generated",
+     "docs/exec-plans/active",
+     "docs/exec-plans/completed",
+ ]
+ GIT_CLEAN_PATHS = [
+     ".codex/skills",
+     "docs/generated",
+     "docs/exec-plans/active",
+     "docs/exec-plans/completed",
+ ]
  PLAN_TEMPLATE = """# Execution Plan: {title}

  ## Goal
@@ -673,7 +695,7 @@ def detect_existing_managed_files(repo):
          path = repo / relative_path
          if path.exists():
              try:
-                 if is_managed_text(path.read_text()):
+                 if is_harness_owned_text(path.read_text()):
                      managed.append(relative_path)
              except UnicodeDecodeError:
                  continue
@@ -738,10 +760,102 @@ def ensure_parent(path):
      path.parent.mkdir(parents=True, exist_ok=True)


+ def ensure_gitignore(repo):
+     path = repo / ".gitignore"
+     existing = path.read_text() if path.exists() else ""
+     block_lines = [GITIGNORE_BLOCK_START, *GITIGNORE_ENTRIES, GITIGNORE_BLOCK_END]
+     block = "\n".join(block_lines)
+     pattern = re.compile(
+         rf"(^|\n){re.escape(GITIGNORE_BLOCK_START)}\n.*?\n{re.escape(GITIGNORE_BLOCK_END)}(?=\n|$)",
+         flags=re.DOTALL,
+     )
+     if pattern.search(existing):
+         updated = pattern.sub(lambda match: match.group(1) + block, existing)
+     else:
+         prefix = existing.rstrip()
+         updated = f"{prefix}\n\n{block}" if prefix else block
+     updated = updated.rstrip() + "\n"
+     changed = updated != existing
+     if changed:
+         path.write_text(updated)
+     return {
+         "path": ".gitignore",
+         "updated": changed,
+         "entries": GITIGNORE_ENTRIES,
+     }
+
+
+ def clean_init_state(repo):
+     cleaned = []
+     for relative_dir in CLEAN_INIT_DIRS:
+         root = repo / relative_dir
+         if not root.exists():
+             continue
+         for path in sorted(root.rglob("*"), reverse=True):
+             if path.is_file() or path.is_symlink():
+                 cleaned.append(str(path.relative_to(repo)))
+                 path.unlink()
+             elif path.is_dir():
+                 try:
+                     path.rmdir()
+                 except OSError:
+                     pass
+     return cleaned
+
+
+ def git_tracked_harness_runtime_files(repo, roots=None):
+     roots = roots or GIT_CLEAN_PATHS
+     result = subprocess.run(
+         ["git", "-C", str(repo), "ls-files", *roots],
+         text=True,
+         capture_output=True,
+         check=False,
+     )
+     if result.returncode != 0:
+         raise RuntimeError(result.stderr.strip() or "git ls-files failed")
+     return [line for line in result.stdout.splitlines() if line.strip()]
+
+
+ def git_untrack_files(repo, paths):
+     if not paths:
+         return []
+     result = subprocess.run(
+         ["git", "-C", str(repo), "rm", "-r", "--cached", "--", *paths],
+         text=True,
+         capture_output=True,
+         check=False,
+     )
+     if result.returncode != 0:
+         raise RuntimeError(result.stderr.strip() or "git rm --cached failed")
+     return paths
+
+
+ def git_add_paths(repo, paths):
+     if not paths:
+         return []
+     result = subprocess.run(
+         ["git", "-C", str(repo), "add", "--", *paths],
+         text=True,
+         capture_output=True,
+         check=False,
+     )
+     if result.returncode != 0:
+         raise RuntimeError(result.stderr.strip() or "git add failed")
+     return paths
+
+
  def is_managed_text(text):
      return text.startswith(MANAGED_MARKER)


+ def is_obsolete_managed_text(text):
+     return any(text.startswith(marker) for marker in OBSOLETE_MANAGED_MARKERS)
+
+
+ def is_harness_owned_text(text):
+     return is_managed_text(text) or is_obsolete_managed_text(text)
+
+
  def slugify(value):
      normalized = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
      return normalized or "task"
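Reviewer sketch for `clean_init_state` in the hunk above: the reverse-sorted traversal matters because a parent path always sorts before its children, so reversing the order visits files and subdirectories before the directory that contains them, and `rmdir` can then succeed on directories that have just been emptied. The helper name `remove_tree_contents` and the temporary layout are hypothetical; this is a standalone illustration under those assumptions, not the package's function.

```python
import tempfile
from pathlib import Path


def remove_tree_contents(root: Path) -> list[str]:
    """Delete files under root, then any directories left empty.

    Reverse-sorted paths place `a/b/file` before `a/b` and `a/b` before `a`,
    so children are handled before their parents.
    """
    removed = []
    for path in sorted(root.rglob("*"), reverse=True):
        if path.is_file() or path.is_symlink():
            removed.append(path.relative_to(root).as_posix())
            path.unlink()
        elif path.is_dir():
            try:
                path.rmdir()  # succeeds only when the directory is empty
            except OSError:
                pass
    return removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "generated"
    (root / "shots" / "desktop").mkdir(parents=True)
    (root / "report.json").write_text("{}\n")
    (root / "shots" / "desktop" / "final.png").write_text("stub\n")
    removed = remove_tree_contents(root)
    leftovers = list(root.rglob("*"))
```

Note that the root directory itself is kept, which matches the hunk's behavior of iterating `root.rglob("*")` without removing `root`.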
@@ -1580,7 +1694,7 @@ def should_write(path, refresh_managed, force):
      if force:
          return True
      try:
-         is_managed = is_managed_text(path.read_text())
+         is_managed = is_harness_owned_text(path.read_text())
      except UnicodeDecodeError:
          return False
      if refresh_managed and is_managed:
@@ -2218,6 +2332,48 @@ def command_evidence_prune(args):
      write_json(args.output, result)


+ def command_clean(args):
+     repo = Path(args.repo).resolve()
+     candidates = git_tracked_harness_runtime_files(repo)
+     local_clean_candidates = []
+     for relative_dir in CLEAN_INIT_DIRS:
+         root = repo / relative_dir
+         if not root.exists():
+             continue
+         for path in sorted(root.rglob("*")):
+             if path.is_file() or path.is_symlink():
+                 local_clean_candidates.append(str(path.relative_to(repo)))
+     gitignore = None
+     removed_from_index = []
+     cleaned = []
+     if args.apply:
+         gitignore = ensure_gitignore(repo)
+         cleaned = clean_init_state(repo)
+         removed_from_index = git_untrack_files(repo, candidates)
+         if gitignore["updated"]:
+             git_add_paths(repo, [gitignore["path"]])
+     result = {
+         "repo": str(repo),
+         "mode": "apply" if args.apply else "dry-run",
+         "tracked_candidate_count": len(candidates),
+         "tracked_candidates": candidates,
+         "local_candidate_count": len(local_clean_candidates),
+         "local_candidates": local_clean_candidates,
+         "gitignore": gitignore,
+         "removed_from_index": removed_from_index,
+         "cleaned": cleaned,
+         "next_steps": (
+             [
+                 "Review staged changes with `git status --short` and `git diff --cached --stat`.",
+                 "Commit and push to remove these files from the remote repository.",
+             ]
+             if args.apply
+             else ["Re-run with `--apply` to clean local harness runtime files, update .gitignore, and stage git index removals."]
+         ),
+     }
+     write_json(args.output, result)
+
+
  def build_parser():
      parser = argparse.ArgumentParser(description="Manage the harness repo scaffold.")
      subparsers = parser.add_subparsers(dest="command", required=True)
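Reviewer sketch for the report that `command_clean` writes: a caller that wants to review a dry-run before re-running with `--apply` could reduce the JSON to a one-line summary. The helper `summarize_clean_report` is hypothetical, it reads only keys that appear in the `result` dict above, and the sample report is hand-written for illustration, not captured output.

```python
import json


def summarize_clean_report(report: dict) -> str:
    """Build a one-line summary from a `clean` JSON report.

    The summary wording is an illustrative choice, not part of the package.
    """
    tracked = report.get("tracked_candidates", [])
    local = report.get("local_candidates", [])
    if report.get("mode") == "dry-run":
        return f"dry-run: {len(tracked)} tracked and {len(local)} local candidates; nothing changed"
    removed = report.get("removed_from_index", [])
    return f"apply: {len(removed)} paths staged for removal from the index"


# Hand-written sample shaped like the dry-run result.
sample = json.loads(
    '{"mode": "dry-run",'
    ' "tracked_candidates": [".codex/skills/harness-engine/SKILL.md"],'
    ' "local_candidates": ["docs/generated/harness-analysis.json"],'
    ' "removed_from_index": []}'
)
summary = summarize_clean_report(sample)
```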
@@ -2361,6 +2517,12 @@ def build_parser():
      evidence_prune.add_argument("--output")
      evidence_prune.set_defaults(func=command_evidence_prune)

+     clean = subparsers.add_parser("clean")
+     clean.add_argument("--repo", required=True)
+     clean.add_argument("--apply", action="store_true")
+     clean.add_argument("--output")
+     clean.set_defaults(func=command_clean)
+
      return parser

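Reviewer sketch for the parser registration above: because `--apply` is a `store_true` flag, it defaults to `False`, which is what makes `clean` a dry-run unless the flag is passed. The standalone builder `build_clean_parser` is hypothetical and registers only the `clean` subcommand, without the `func` dispatch used in the real script.

```python
import argparse


def build_clean_parser() -> argparse.ArgumentParser:
    """Standalone parser with only a `clean` subcommand, for illustration."""
    parser = argparse.ArgumentParser(prog="manage_harness.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    clean = subparsers.add_parser("clean")
    clean.add_argument("--repo", required=True)
    clean.add_argument("--apply", action="store_true")  # absent means dry-run
    clean.add_argument("--output")
    return parser


parser = build_clean_parser()
preview = parser.parse_args(["clean", "--repo", "."])
applied = parser.parse_args(["clean", "--repo", ".", "--apply"])
```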