@hallucination-studio/harness-engine 1.0.0-beta.8.87407

package/README.md ADDED
@@ -0,0 +1,198 @@
+ # Harness Engine
+
+ Harness Engine packages a Codex skill that bootstraps an agent-first repository harness.
+ It turns the repository-shaping ideas from OpenAI's
+ ["Harness engineering: leveraging Codex in an agent-first world"](https://openai.com/index/harness-engineering/)
+ into an installable `npx` workflow.
+
+ The package does not install a harness into this repository. This repository builds and publishes
+ the installer. Users install the bundled `harness-repo-bootstrap` skill into their own project or
+ global Codex skill directory, then ask Codex to use that skill to analyze the target repository,
+ ask for missing high-impact facts, create the harness files, and keep future work closed-loop.
+
+ ## What This Project Does
+
+ - Installs the `harness-repo-bootstrap` Codex skill locally, globally, or into a custom skills directory.
+ - Provides a repository analyzer that detects language, package manager, frontend signals, existing harness files, missing execution-plan state, and missing SOPs.
+ - Generates a short routing-style `AGENTS.md` plus durable system-of-record docs such as `ARCHITECTURE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`, `docs/QUALITY_SCORE.md`, and `docs/FRONTEND.md`.
+ - Creates execution-plan folders for active and completed plans.
+ - Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
+ - Enforces a local harness check without assuming the user's project has CI.
+ - Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
+
+ ## Why It Exists
+
+ The OpenAI harness engineering article argues that agent-first repositories work better when the
+ repo itself becomes the system of record: a short `AGENTS.md` routes agents into deeper docs,
+ execution plans live beside the code, and validation loops are mechanical rather than remembered.
+ This project packages that shape as a reusable Codex skill.
+
+ The goal is not to blindly copy a template. The skill first analyzes the target repo, then asks the
+ human to confirm product, reliability, security, frontend, and quality facts that cannot be safely
+ inferred from code alone.
+
+ ## Install
+
+ The npm package is scoped as `@hallucination-studio/harness-engine`. The installed command name is
+ still `harness-engine`.
+
+ Install the latest stable release into the current repository:
+
+ ```bash
+ npx @hallucination-studio/harness-engine install --local
+ ```
+
+ Install the latest beta build from `main`:
+
+ ```bash
+ npx @hallucination-studio/harness-engine@beta install --local
+ ```
+
+ Install globally into `${CODEX_HOME:-~/.codex}/skills`:
+
+ ```bash
+ npx @hallucination-studio/harness-engine install --global
+ ```
+
+ Install into a custom skills directory:
+
+ ```bash
+ npx @hallucination-studio/harness-engine install --path /path/to/skills
+ ```
+
+ Replace an existing installed skill:
+
+ ```bash
+ npx @hallucination-studio/harness-engine install --local --force
+ ```
+
+ Show where the skill would be installed:
+
+ ```bash
+ npx @hallucination-studio/harness-engine where --local
+ ```
+
+ ## Use The Skill In A Target Repo
+
+ After installing, open Codex in the target repository and invoke:
+
+ ```text
+ $harness-repo-bootstrap
+ ```
+
+ The intended workflow is:
+
+ 1. Analyze the target repository.
+ 2. Ask the human only for unresolved, high-impact facts.
+ 3. Initialize or update the harness files.
+ 4. Create execution plans for multi-step work.
+ 5. Log durable knowledge into active plans.
+ 6. Write the durable facts into permanent docs.
+ 7. Mark knowledge as written using ID plus evidence text.
+ 8. Run the local harness check before handoff.
+ 9. Close the execution plan only after the durable docs are updated.
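
Steps 5 through 9 describe a closure rule: a plan should close only once every logged fact has been marked written with evidence. A minimal sketch of that rule, assuming a hypothetical in-memory plan shape (`knowledge` entries with `id`, `destination`, and `evidence`) and a plain substring check for evidence. The real logic lives in `scripts/manage_harness.py`, which this diff does not show, and may differ:

```javascript
// Hypothetical model of the closure rule; every field name is an assumption.
function unresolvedKnowledge(plan, docs) {
  // docs maps a destination path to that document's current text.
  return plan.knowledge.filter((entry) => {
    const text = docs[entry.destination] || "";
    // An entry counts as resolved only when evidence was recorded and that
    // evidence text actually appears in the destination document.
    return !(entry.evidence && text.includes(entry.evidence));
  });
}

// A plan may close only when no logged fact is left unresolved.
function canClosePlan(plan, docs) {
  return unresolvedKnowledge(plan, docs).length === 0;
}
```

Matching on evidence text rather than the exact logged fact is what would let the permanent docs use natural wording, as the README describes.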
+
+ The installed skill exposes the underlying script at:
+
+ ```bash
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py --help
+ ```
+
+ Common commands:
+
+ ```bash
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py analyze --repo . --output analysis.json
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py init --repo . --answers answers.json
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+ python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo .
+ ```
+
+ ## Generated Harness Shape
+
+ A typical initialized target repository receives:
+
+ ```text
+ AGENTS.md
+ ARCHITECTURE.md
+ docs/
+ ├── DESIGN.md
+ ├── FRONTEND.md
+ ├── PLANS.md
+ ├── PRODUCT_SENSE.md
+ ├── QUALITY_SCORE.md
+ ├── RELIABILITY.md
+ ├── SECURITY.md
+ ├── design-docs/
+ ├── exec-plans/
+ │   ├── active/
+ │   ├── completed/
+ │   └── tech-debt-tracker.md
+ ├── generated/
+ ├── product-specs/
+ ├── references/
+ └── sops/
+ ```
+
+ `AGENTS.md` is intentionally short. It is a router, not an encyclopedia.
+
+ ## Version Channels
+
+ - `latest`: Stable releases created from GitHub Release tags. The workflow derives the package version from the release tag and publishes to npm with the `latest` dist-tag.
+ - `beta`: Every push to `main` publishes a unique prerelease version like `1.0.0-beta.<run-number>.<short-sha>` with the `beta` dist-tag. npm cannot overwrite an existing version, so the `beta` tag moves forward to the newest main build.
+ - `nightly`: A scheduled daily build publishes versions like `1.0.0-nightly.<yyyymmdd>.<run-number>` with the `nightly` dist-tag.
+ - Manual test builds: The release workflow can be run manually. By default it performs `npm publish --dry-run` with a generated `-test.<run-number>` version. Set `dry_run=false` to publish a test package to a non-`latest` dist-tag such as `next`.
+
+ Examples:
+
+ ```bash
+ npx @hallucination-studio/harness-engine install --local
+ npx @hallucination-studio/harness-engine@beta install --local
+ npx @hallucination-studio/harness-engine@nightly install --local
+ ```
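
The version formats listed under Version Channels can be told apart mechanically. A minimal sketch, assuming the formats are exactly as the README describes them; `classifyVersion` is a hypothetical helper and not part of the package:

```javascript
// Hypothetical helper (not part of harness-engine): classify a version string
// by the channel formats described in the README.
function classifyVersion(version) {
  // beta: <x.y.z>-beta.<run-number>.<short-sha>
  const beta = /^(\d+\.\d+\.\d+)-beta\.(\d+)\.([0-9a-z]+)$/i.exec(version);
  if (beta) {
    return { channel: "beta", base: beta[1], runNumber: Number(beta[2]), shortSha: beta[3] };
  }
  // nightly: <x.y.z>-nightly.<yyyymmdd>.<run-number>
  const nightly = /^(\d+\.\d+\.\d+)-nightly\.(\d{8})\.(\d+)$/.exec(version);
  if (nightly) {
    return { channel: "nightly", base: nightly[1], date: nightly[2], runNumber: Number(nightly[3]) };
  }
  // A plain x.y.z version is treated as a stable release.
  if (/^\d+\.\d+\.\d+$/.test(version)) {
    return { channel: "latest", base: version };
  }
  return { channel: "unknown", base: null };
}
```

Applied to this package's own version, `classifyVersion("1.0.0-beta.8.87407")` reports the `beta` channel with run number 8 and short SHA `87407`.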
+
+ ## Local Development
+
+ Run the skill evaluations:
+
+ ```bash
+ npm test
+ ```
+
+ Smoke-test installation:
+
+ ```bash
+ npm run smoke:install
+ ```
+
+ Check npm package contents:
+
+ ```bash
+ npm run pack:check
+ ```
+
+ The publish workflows expect an npm token when trusted publishing is not yet configured:
+
+ ```text
+ GitHub Actions secret: NPM_TOKEN
+ ```
+
+ ## Implementation Quality Score
+
+ These scores describe the current implementation, not an external guarantee.
+
+ | Layer | Score | Notes |
+ | --- | ---: | --- |
+ | Product fit | 8.5 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. The main missing piece is broader real-world usage data across more project types. |
+ | Skill workflow design | 8.5 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, close. The current skill is opinionated but still adapts to target repositories. |
+ | Knowledge-closure loop | 8 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+ | CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
+ | Generated harness docs | 7.5 / 10 | Covers architecture, plans, reliability, security, frontend policy, references, generated artifacts, and SOPs. Templates still require Codex to tighten project-specific language after generation. |
+ | Evaluation coverage | 7.5 / 10 | Includes empty-repo init, frontend analysis, closed-loop plan behavior, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
+ | Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
+ | User-project safety | 8.5 / 10 | The skill avoids adding CI to target projects by default and uses local harness checks instead. It preserves unmanaged files unless forced. |
+ | Overall | 8.1 / 10 | Usable and coherent, with the highest leverage still in richer evals and more structured plan/knowledge state. |
+
+ ## Reference
+
+ - OpenAI: [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)
package/bin/install.js ADDED
@@ -0,0 +1,154 @@
+ #!/usr/bin/env node
+
+ const fs = require("fs");
+ const os = require("os");
+ const path = require("path");
+
+ const PACKAGE_ROOT = path.resolve(__dirname, "..");
+ const SKILL_NAME = "harness-repo-bootstrap";
+ const SOURCE_SKILL_DIR = path.join(PACKAGE_ROOT, "skills", SKILL_NAME);
+
+ function printHelp() {
+   console.log(`harness-engine
+
+ Usage:
+ npx @hallucination-studio/harness-engine install [--local | --global | --path <dir>] [--force]
+ npx @hallucination-studio/harness-engine where [--local | --global | --path <dir>]
+
+ Options:
+ --local Install into <cwd>/.codex/skills
+ --global Install into \${CODEX_HOME:-~/.codex}/skills
+ --path <dir> Install into a custom skills directory
+ --force Replace an existing installed skill
+ -h, --help Show this help text
+ `);
+ }
+
+ function parseArgs(argv) {
+   const result = {
+     command: "install",
+     mode: null,
+     customPath: null,
+     force: false
+   };
+
+   const args = [...argv];
+   if (args.length > 0 && !args[0].startsWith("-")) {
+     result.command = args.shift();
+   }
+
+   for (let i = 0; i < args.length; i += 1) {
+     const arg = args[i];
+     if (arg === "--local") {
+       result.mode = "local";
+     } else if (arg === "--global") {
+       result.mode = "global";
+     } else if (arg === "--path") {
+       result.mode = "custom";
+       result.customPath = args[i + 1];
+       i += 1;
+     } else if (arg === "--force") {
+       result.force = true;
+     } else if (arg === "-h" || arg === "--help") {
+       result.command = "help";
+     } else {
+       throw new Error(`Unknown argument: ${arg}`);
+     }
+   }
+
+   if (result.mode === "custom" && !result.customPath) {
+     throw new Error("--path requires a directory value");
+   }
+
+   if (!result.mode) {
+     result.mode = "local";
+   }
+
+   return result;
+ }
+
+ function resolveSkillsDir(mode, customPath) {
+   if (mode === "local") {
+     return path.join(process.cwd(), ".codex", "skills");
+   }
+
+   if (mode === "global") {
+     const codexHome = process.env.CODEX_HOME || path.join(os.homedir(), ".codex");
+     return path.join(codexHome, "skills");
+   }
+
+   return path.resolve(process.cwd(), customPath);
+ }
+
+ function copyDir(sourceDir, targetDir) {
+   fs.mkdirSync(targetDir, { recursive: true });
+   for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
+     const sourcePath = path.join(sourceDir, entry.name);
+     const targetPath = path.join(targetDir, entry.name);
+     if (entry.isDirectory()) {
+       copyDir(sourcePath, targetPath);
+     } else {
+       fs.copyFileSync(sourcePath, targetPath);
+       const stat = fs.statSync(sourcePath);
+       fs.chmodSync(targetPath, stat.mode);
+     }
+   }
+ }
+
+ function installSkill(destinationDir, force) {
+   const skillTargetDir = path.join(destinationDir, SKILL_NAME);
+   if (!fs.existsSync(SOURCE_SKILL_DIR)) {
+     throw new Error(`Bundled skill not found: ${SOURCE_SKILL_DIR}`);
+   }
+
+   if (fs.existsSync(skillTargetDir)) {
+     if (!force) {
+       throw new Error(`Skill already exists at ${skillTargetDir}. Re-run with --force to replace it.`);
+     }
+     fs.rmSync(skillTargetDir, { recursive: true, force: true });
+   }
+
+   fs.mkdirSync(destinationDir, { recursive: true });
+   copyDir(SOURCE_SKILL_DIR, skillTargetDir);
+   return skillTargetDir;
+ }
+
+ function main() {
+   let args;
+   try {
+     args = parseArgs(process.argv.slice(2));
+   } catch (error) {
+     console.error(`Error: ${error.message}`);
+     printHelp();
+     process.exit(1);
+   }
+
+   if (args.command === "help") {
+     printHelp();
+     return;
+   }
+
+   const destinationDir = resolveSkillsDir(args.mode, args.customPath);
+
+   if (args.command === "where") {
+     console.log(path.join(destinationDir, SKILL_NAME));
+     return;
+   }
+
+   if (args.command !== "install") {
+     console.error(`Unknown command: ${args.command}`);
+     printHelp();
+     process.exit(1);
+   }
+
+   try {
+     const installedPath = installSkill(destinationDir, args.force);
+     console.log(`Installed ${SKILL_NAME} to ${installedPath}`);
+     console.log("Invoke it in Codex with $harness-repo-bootstrap.");
+   } catch (error) {
+     console.error(`Install failed: ${error.message}`);
+     process.exit(1);
+   }
+ }
+
+ main();
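
One edge case in `parseArgs`: `--path` accepts whatever token follows it, so `install --path --force` records `--force` as the directory and leaves `force` unset. A minimal sketch of a stricter value reader, assuming option values never begin with `-`; `readOptionValue` is a hypothetical helper and not part of the published `bin/install.js`:

```javascript
// Hypothetical helper (not in the published bin/install.js): read the value
// that follows an option flag, rejecting a missing value or another flag.
function readOptionValue(args, index, flag) {
  const value = args[index + 1];
  if (value === undefined || value.startsWith("-")) {
    throw new Error(`${flag} requires a directory value`);
  }
  return value;
}

// Possible use inside a parse loop shaped like parseArgs:
//   result.customPath = readOptionValue(args, i, "--path");
//   i += 1;
```

The trade-off is that a directory whose name really starts with `-` would have to be passed as `./-name`.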
package/package.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "name": "@hallucination-studio/harness-engine",
+   "version": "1.0.0-beta.8.87407",
+   "description": "Install a Codex skill that bootstraps and updates an advanced harness-engineering repository layout.",
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/hallucination-studio/harness-engine.git"
+   },
+   "bin": {
+     "harness-engine": "bin/install.js"
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "scripts": {
+     "test": "python3 skills/harness-repo-bootstrap/evals/run_evals.py",
+     "smoke:install": "node scripts/smoke_install.js",
+     "pack:check": "npm pack --dry-run"
+   },
+   "files": [
+     "bin",
+     "skills"
+   ],
+   "license": "MIT"
+ }
@@ -0,0 +1,68 @@
+ ---
+ name: harness-repo-bootstrap
+ description: Bootstrap or refresh an advanced harness-engineering repository shape for Codex-driven projects. Use when Codex needs to analyze a repository, ask the human to confirm high-impact product and architecture facts, and then create or update AGENTS.md, architecture docs, policy docs, plan folders, reference folders, and SOP-backed starter files for the repository.
+ ---
+
+ # Harness Repo Bootstrap
+
+ Run the packaged script to inspect the target repository before editing files. Use the generated analysis to decide what to ask the human, what durable knowledge is missing from the repo, and which execution-plan and SOP files must be created or updated.
+
+ ## Workflow
+
+ 1. Run `python3 scripts/manage_harness.py analyze --repo <target-repo> --output <analysis.json>`.
+ 2. Read `analysis.json`.
+ 3. Ask the human only the unresolved, high-impact questions from `human_confirmations`.
+ 4. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
+ 5. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
+ 6. Run one of:
+    - `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`
+    - `python3 scripts/manage_harness.py update --repo <target-repo> --answers <answers.json>`
+ 7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
+ 8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`.
+ 9. Before closing the task, write those facts into their durable docs.
+ 10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; use `--append` only when the exact fact should be appended mechanically.
+ 11. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+ 12. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+ 13. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
+
+ ## Reading Order
+
+ - Read [references/workflow.md](references/workflow.md) first for the operating model and question policy.
+ - Read [references/file-map.md](references/file-map.md) when deciding which generated file to update.
+ - Read [references/question-catalog.md](references/question-catalog.md) when the analysis surfaces ambiguous product, security, reliability, or frontend facts.
+ - Read [references/knowledge-capture.md](references/knowledge-capture.md) when you discover facts that should survive chat history.
+ - Read [references/exec-plans.md](references/exec-plans.md) before planning or updating any multi-step work.
+ - Read [references/sop-index.md](references/sop-index.md) to choose the right SOP for architecture, UI validation, observability, or knowledge capture work.
+ - Read [references/template-policy.md](references/template-policy.md) before overwriting existing files.
+ - Read [references/evaluation-loop.md](references/evaluation-loop.md) before changing the skill, templates, scripts, or policy references.
+
+ ## Command Rules
+
+ - Prefer `analyze` before `init` or `update`.
+ - Prefer the draft, test, evaluate, iterate loop for changes to this skill.
+ - Prefer `init` when the target repo has none of the managed files.
+ - Prefer `update` when the repo already contains any managed file or a partial harness layout.
+ - Do not overwrite existing files unless the human asked for it or you pass `--force`.
+ - Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
+ - Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+ - Treat `docs/sops/` as mechanical operating procedures, not background reading.
+ - When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
+ - Prefer `knowledge-mark-written --id ... --evidence ...` so durable docs can use natural wording instead of duplicated exact fact strings.
+ - Use `plan-close` as the final guardrail so plan state and durable docs stay synchronized.
+ - Use `check` as the local handoff guardrail for user repositories.
+ - Run `python3 evals/run_evals.py` after skill changes and treat failures as iteration input.
+ - Do not add CI to user repositories unless the human explicitly asks for it.
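
The `init` versus `update` rule in the Command Rules reduces to a presence check. A minimal sketch, assuming a hypothetical `MANAGED_FILES` list borrowed from the README's generated layout; the authoritative list is defined by `scripts/manage_harness.py`, which this diff does not show:

```javascript
const fs = require("fs");
const path = require("path");

// Assumption: an illustrative subset of managed files, not the real list.
const MANAGED_FILES = ["AGENTS.md", "ARCHITECTURE.md", "docs/RELIABILITY.md", "docs/SECURITY.md"];

// Hypothetical helper: pick the subcommand the Command Rules would prefer.
function chooseCommand(repoDir, managedFiles = MANAGED_FILES) {
  const present = managedFiles.filter((rel) => fs.existsSync(path.join(repoDir, rel)));
  // No managed file at all means a fresh scaffold; anything else is an update.
  return present.length === 0 ? "init" : "update";
}
```

A repository holding only `AGENTS.md` would therefore be routed to `update`, matching the "any managed file or a partial harness layout" wording.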
+
+ ## Output Rules
+
+ - Keep `AGENTS.md` short and routing-oriented.
+ - Keep durable knowledge in repo docs, not in chat-only explanations.
+ - Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
+ - Keep generated material under `docs/generated/`.
+ - Keep external, model-friendly references under `docs/references/`.
+ - Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
+
+ ## Assets
+
+ - Scaffold templates live under [assets/repo-template](assets/repo-template).
+ - SOP starter docs live under [assets/sops](assets/sops).
@@ -0,0 +1,4 @@
+ interface:
+   display_name: "Harness Repo Bootstrap"
+   short_description: "Scaffold advanced Codex harness docs"
+   default_prompt: "Use $harness-repo-bootstrap to analyze this repository and scaffold or refresh its advanced harness documentation."
@@ -0,0 +1 @@
+ starter files are generated by scripts/manage_harness.py
@@ -0,0 +1 @@
+ sop starter files are generated by scripts/manage_harness.py
@@ -0,0 +1,18 @@
+ [
+   {
+     "id": "empty-repo-init",
+     "description": "Empty repositories should receive the full advanced harness scaffold."
+   },
+   {
+     "id": "frontend-analysis",
+     "description": "Frontend repositories should trigger frontend-specific confirmation and policy output."
+   },
+   {
+     "id": "closed-loop-plan",
+     "description": "Execution plans should refuse to close until durable knowledge is written back."
+   },
+   {
+     "id": "preserve-unmanaged-docs",
+     "description": "Existing user-owned harness files should be skipped unless explicitly forced."
+   }
+ ]
+ ]