npm - @hallucination-studio/harness-engine - Versions diffs - 1.0.0-beta.8.87407 - Mend

@hallucination-studio/harness-engine 1.0.0-beta.8.87407

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md ADDED

@@ -0,0 +1,198 @@
+# Harness Engine
+Harness Engine packages a Codex skill that bootstraps an agent-first repository harness.
+It turns the repository-shaping ideas from OpenAI's
+["Harness engineering: leveraging Codex in an agent-first world"](https://openai.com/index/harness-engineering/)
+into an installable `npx` workflow.
+The package does not install a harness into this repository. This repository builds and publishes
+the installer. Users install the bundled `harness-repo-bootstrap` skill into their own project or
+global Codex skill directory, then ask Codex to use that skill to analyze the target repository,
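+ask for missing high-impact facts, create the harness files, and keep future work closed-loop: facts learned
+during a task are written back into the repository's durable docs before the task's plan is closed.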
+## What This Project Does
+- Installs the `harness-repo-bootstrap` Codex skill locally, globally, or into a custom skills directory.
+- Provides a repository analyzer that detects language, package manager, frontend signals, existing harness files, missing execution-plan state, and missing SOPs.
+- Generates a short routing-style `AGENTS.md` plus durable system-of-record docs such as `ARCHITECTURE.md`, `docs/RELIABILITY.md`, `docs/SECURITY.md`, `docs/QUALITY_SCORE.md`, and `docs/FRONTEND.md`.
+- Creates execution-plan folders for active and completed plans.
+- Adds SOPs for architecture setup, knowledge capture, local observability, and UI validation.
+- Provides a local harness check (`check`) to run before handoff, without assuming the user's project has CI.
+- Supports durable knowledge closure with stable knowledge IDs and evidence text, so permanent docs can use natural wording instead of duplicated checklist strings.
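The README does not spell out the analyzer's detection rules. As a rough sketch only, signal detection of this kind could look like the Node.js function below. The lockfile table, the frontend dependency list, the harness file list, and `detectRepoSignals` are all hypothetical and do not describe what `manage_harness.py analyze` actually does.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical lockfile -> package manager table (not the real analyzer's rules).
const LOCKFILES = [
  ["pnpm-lock.yaml", "pnpm"],
  ["yarn.lock", "yarn"],
  ["package-lock.json", "npm"],
  ["poetry.lock", "poetry"],
  ["Cargo.lock", "cargo"],
];

// Hypothetical dependency names treated as frontend signals.
const FRONTEND_DEPS = ["react", "vue", "svelte", "@angular/core", "next", "vite"];

// Hypothetical subset of harness files reported as already present.
const HARNESS_FILES = ["AGENTS.md", "ARCHITECTURE.md", "docs/PLANS.md"];

function detectRepoSignals(repoRoot) {
  const exists = (rel) => fs.existsSync(path.join(repoRoot, rel));

  const packageManagers = LOCKFILES
    .filter(([file]) => exists(file))
    .map(([, manager]) => manager);

  let frontendSignals = [];
  if (exists("package.json")) {
    const pkg = JSON.parse(fs.readFileSync(path.join(repoRoot, "package.json"), "utf8"));
    // Spreading undefined is a no-op, so a missing section is harmless.
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    frontendSignals = FRONTEND_DEPS.filter((name) => name in deps);
  }

  const harnessFiles = HARNESS_FILES.filter((rel) => exists(rel));
  return { packageManagers, frontendSignals, harnessFiles };
}
```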
+## Why It Exists
+The OpenAI harness engineering article argues that agent-first repositories work better when the
+repo itself becomes the system of record: a short `AGENTS.md` routes agents into deeper docs,
+execution plans live beside the code, and validation loops are mechanical rather than remembered.
+This project packages that shape as a reusable Codex skill.
+The goal is not to blindly copy a template. The skill first analyzes the target repo, then asks the
+human to confirm product, reliability, security, frontend, and quality facts that cannot be safely
+inferred from code alone.
+## Install
+The npm package is scoped as `@hallucination-studio/harness-engine`, but the command it installs is
+named `harness-engine`.
+Install the latest stable release into the current repository:
+```bash
+npx @hallucination-studio/harness-engine install --local
+```
+Install the latest beta build from `main`:
+```bash
+npx @hallucination-studio/harness-engine@beta install --local
+```
+Install globally into `${CODEX_HOME:-~/.codex}/skills`:
+```bash
+npx @hallucination-studio/harness-engine install --global
+```
+Install into a custom skills directory:
+```bash
+npx @hallucination-studio/harness-engine install --path /path/to/skills
+```
+Replace an existing installed skill:
+```bash
+npx @hallucination-studio/harness-engine install --local --force
+```
+Show where the skill would be installed:
+```bash
+npx @hallucination-studio/harness-engine where --local
+```
+## Use The Skill In A Target Repo
+After installing, open Codex in the target repository and invoke:
+```text
+$harness-repo-bootstrap
+```
+The intended workflow is:
+1. Analyze the target repository.
+2. Ask the human only for unresolved, high-impact facts.
+3. Initialize or update the harness files.
+4. Create execution plans for multi-step work.
+5. Log durable knowledge into active plans.
+6. Write the durable facts into permanent docs.
+7. Mark knowledge as written using ID plus evidence text.
+8. Run the local harness check before handoff.
+9. Close the execution plan only after the durable docs are updated.
+With a `--local` install, the skill's underlying script is at:
+```bash
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py --help
+```
+Common commands:
+```bash
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py analyze --repo . --output analysis.json
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py sample-answers --analysis analysis.json --output answers.json
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py init --repo . --answers answers.json
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py plan-start --repo . --slug feature-name --goal "Implement the feature"
+python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo .
+```
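Because each step is a plain subprocess call, the sequence can be scripted. The sketch below is a hypothetical Node.js wrapper, not part of the package: `harnessArgv` and `runHarness` are illustrative names, and the script path assumes a `--local` install run from the target repository root.

```javascript
const { spawnSync } = require("child_process");
const path = require("path");

// Assumes a --local install, run from the target repository root.
const SCRIPT = path.join(".codex", "skills", "harness-repo-bootstrap", "scripts", "manage_harness.py");

// Hypothetical helper: ("check", { repo: "." }) -> [SCRIPT, "check", "--repo", "."].
function harnessArgv(subcommand, options = {}) {
  const argv = [SCRIPT, subcommand];
  for (const [flag, value] of Object.entries(options)) {
    argv.push(`--${flag}`, String(value));
  }
  return argv;
}

// Runs one step. argv is passed without a shell, so values such as
// "Implement the feature" need no extra quoting.
function runHarness(subcommand, options) {
  const result = spawnSync("python3", harnessArgv(subcommand, options), { stdio: "inherit" });
  if (result.error) throw result.error; // e.g. python3 is not on PATH
  if (result.status !== 0) {
    throw new Error(`manage_harness.py ${subcommand} exited with status ${result.status}`);
  }
}

// Example (not executed here):
// runHarness("analyze", { repo: ".", output: "analysis.json" });
// runHarness("check", { repo: "." });
```

A wrapper like this cannot replace the human step between `sample-answers` and `init`. The placeholders in `answers.json` still have to be filled in from the repository and the human's confirmed answers.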
+## Generated Harness Shape
+A typical initialized target repository receives:
+```text
+AGENTS.md
+ARCHITECTURE.md
+docs/
+├── DESIGN.md
+├── FRONTEND.md
+├── PLANS.md
+├── PRODUCT_SENSE.md
+├── QUALITY_SCORE.md
+├── RELIABILITY.md
+├── SECURITY.md
+├── design-docs/
+├── exec-plans/
+│   ├── active/
+│   ├── completed/
+│   └── tech-debt-tracker.md
+├── generated/
+├── product-specs/
+├── references/
+└── sops/
+```
+`AGENTS.md` is intentionally short. It is a router, not an encyclopedia.
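A plain existence check is a quick way to see which parts of the tree above are missing. The following is a hypothetical Node.js sketch. The path lists are copied by hand from the tree, and `missingHarnessPaths` is not the script's `check` command, whose actual rules are not documented in this README.

```javascript
const fs = require("fs");
const path = require("path");

// Hand-copied from the tree above; an assumption, not the script's check list.
const EXPECTED_FILES = [
  "AGENTS.md",
  "ARCHITECTURE.md",
  "docs/DESIGN.md",
  "docs/FRONTEND.md",
  "docs/PLANS.md",
  "docs/PRODUCT_SENSE.md",
  "docs/QUALITY_SCORE.md",
  "docs/RELIABILITY.md",
  "docs/SECURITY.md",
  "docs/exec-plans/tech-debt-tracker.md",
];
const EXPECTED_DIRS = [
  "docs/design-docs",
  "docs/exec-plans/active",
  "docs/exec-plans/completed",
  "docs/generated",
  "docs/product-specs",
  "docs/references",
  "docs/sops",
];

// Returns paths that are missing or have the wrong type (file vs directory).
function missingHarnessPaths(repoRoot) {
  const hasType = (rel, wantDir) => {
    const full = path.join(repoRoot, rel);
    if (!fs.existsSync(full)) return false;
    const stat = fs.statSync(full);
    return wantDir ? stat.isDirectory() : stat.isFile();
  };
  return [
    ...EXPECTED_FILES.filter((rel) => !hasType(rel, false)),
    ...EXPECTED_DIRS.filter((rel) => !hasType(rel, true)),
  ];
}
```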
+## Version Channels
+- `latest`: Stable releases created from GitHub Release tags. The workflow derives the package version from the release tag and publishes to npm with the `latest` dist-tag.
+- `beta`: Every push to `main` publishes a unique prerelease version like `1.0.0-beta.<run-number>.<short-sha>` with the `beta` dist-tag. Because npm does not allow an existing version to be republished, each build gets its own version, and the `beta` dist-tag moves forward to the newest `main` build.
+- `nightly`: A scheduled daily build publishes versions like `1.0.0-nightly.<yyyymmdd>.<run-number>` with the `nightly` dist-tag.
+- Manual test builds: The release workflow can be run manually. By default it performs `npm publish --dry-run` with a generated `-test.<run-number>` version. Set `dry_run=false` to publish a test package to a non-`latest` dist-tag such as `next`.
+Examples:
+```bash
+npx @hallucination-studio/harness-engine install --local
+npx @hallucination-studio/harness-engine@beta install --local
+npx @hallucination-studio/harness-engine@nightly install --local
+```
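The prerelease formats above are simple string templates. The following Node.js sketch assumes CI supplies the run number and short SHA. `betaVersion`, `nightlyVersion`, and `prereleaseIsValid` are hypothetical helpers, not code from the release workflow, and the use of the UTC date is also an assumption.

```javascript
// Hypothetical helpers; the release workflow's real inputs are assumptions here.
const BASE_VERSION = "1.0.0";

// beta: 1.0.0-beta.<run-number>.<short-sha>
function betaVersion(runNumber, shortSha) {
  return `${BASE_VERSION}-beta.${runNumber}.${shortSha}`;
}

// nightly: 1.0.0-nightly.<yyyymmdd>.<run-number>, using the UTC date (assumed).
function nightlyVersion(date, runNumber) {
  const yyyy = String(date.getUTCFullYear());
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${BASE_VERSION}-nightly.${yyyy}${mm}${dd}.${runNumber}`;
}

// Simplified SemVer 2.0.0 prerelease check (ignores build metadata):
// identifiers match [0-9A-Za-z-]+ and numeric ones must not have leading zeros.
function prereleaseIsValid(version) {
  const dash = version.indexOf("-");
  if (dash === -1) return true; // no prerelease part
  return version
    .slice(dash + 1)
    .split(".")
    .every((id) => /^[0-9A-Za-z-]+$/.test(id) && !/^0[0-9]+$/.test(id));
}
```

The leading-zero rule matters for the beta channel. A short SHA that contains only digits and starts with `0` would produce an invalid SemVer version, so a workflow might need to guard against that case.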
+## Local Development
+Run the skill evaluations:
+```bash
+npm test
+```
+Smoke-test installation:
+```bash
+npm run smoke:install
+```
+Check npm package contents:
+```bash
+npm run pack:check
+```
+The publish workflows expect an npm token when trusted publishing is not yet configured:
+```text
+GitHub Actions secret: NPM_TOKEN
+```
+## Implementation Quality Score
+These scores describe the current implementation, not an external guarantee.
+| Layer | Score | Notes |
+| --- | ---: | --- |
+| Product fit | 8.5 / 10 | Clear purpose: install a Codex skill that creates and maintains an agent-first repository harness. The main missing piece is broader real-world usage data across more project types. |
+| Skill workflow design | 8.5 / 10 | Strong progressive workflow: analyze, confirm, initialize/update, plan, capture knowledge, validate, close. The current skill is opinionated but still adapts to target repositories. |
+| Knowledge-closure loop | 8 / 10 | Stable knowledge IDs plus evidence text reduce noisy doc duplication. Future work could move plan state into structured sidecar metadata instead of Markdown parsing. |
+| CLI installer | 8 / 10 | Simple local/global/custom install modes, force replacement, and path discovery. It is intentionally minimal and does not manage Codex runtime configuration. |
+| Generated harness docs | 7.5 / 10 | Covers architecture, plans, reliability, security, frontend policy, references, generated artifacts, and SOPs. Templates still require Codex to tighten project-specific language after generation. |
+| Evaluation coverage | 7.5 / 10 | Includes empty-repo init, frontend analysis, closed-loop plan behavior, user-owned doc preservation, and installer smoke tests. More end-to-end Codex acceptance tests would raise confidence. |
+| Release automation | 8 / 10 | Supports stable release, beta on every main commit, nightly, manual dry-run, artifacts, provenance, and token fallback. npm first-publish/trusted-publishing setup still requires external configuration. |
+| User-project safety | 8.5 / 10 | The skill avoids adding CI to target projects by default and uses local harness checks instead. It preserves unmanaged files unless forced. |
+| Overall | 8.1 / 10 | Usable and coherent, with the highest leverage still in richer evals and more structured plan/knowledge state. |
+## Reference
+- OpenAI: [Harness engineering: leveraging Codex in an agent-first world](https://openai.com/index/harness-engineering/)

package/bin/install.js ADDED

@@ -0,0 +1,154 @@
+#!/usr/bin/env node
+const fs = require("fs");
+const os = require("os");
+const path = require("path");
+const PACKAGE_ROOT = path.resolve(__dirname, "..");
+const SKILL_NAME = "harness-repo-bootstrap";
+const SOURCE_SKILL_DIR = path.join(PACKAGE_ROOT, "skills", SKILL_NAME);
+function printHelp() {
+  console.log(`harness-engine
+Usage:
+  npx @hallucination-studio/harness-engine install [--local | --global | --path <dir>] [--force]
+  npx @hallucination-studio/harness-engine where [--local | --global | --path <dir>]
+Options:
+  --local         Install into <cwd>/.codex/skills
+  --global        Install into \${CODEX_HOME:-~/.codex}/skills
+  --path <dir>    Install into a custom skills directory
+  --force         Replace an existing installed skill
+  -h, --help      Show this help text
+`);
+}
+function parseArgs(argv) {
+  const result = {
+    command: "install",
+    mode: null,
+    customPath: null,
+    force: false
+  };
+  const args = [...argv];
+  if (args.length > 0 && !args[0].startsWith("-")) {
+    result.command = args.shift();
+  }
+  for (let i = 0; i < args.length; i += 1) {
+    const arg = args[i];
+    if (arg === "--local") {
+      result.mode = "local";
+    } else if (arg === "--global") {
+      result.mode = "global";
+    } else if (arg === "--path") {
+      result.mode = "custom";
+      result.customPath = args[i + 1];
+      i += 1;
+    } else if (arg === "--force") {
+      result.force = true;
+    } else if (arg === "-h" || arg === "--help") {
+      result.command = "help";
+    } else {
+      throw new Error(`Unknown argument: ${arg}`);
+    }
+  }
+  if (result.mode === "custom" && !result.customPath) {
+    throw new Error("--path requires a directory value");
+  }
+  if (!result.mode) {
+    result.mode = "local";
+  }
+  return result;
+}
+function resolveSkillsDir(mode, customPath) {
+  if (mode === "local") {
+    return path.join(process.cwd(), ".codex", "skills");
+  }
+  if (mode === "global") {
+    const codexHome = process.env.CODEX_HOME || path.join(os.homedir(), ".codex");
+    return path.join(codexHome, "skills");
+  }
+  return path.resolve(process.cwd(), customPath);
+}
+function copyDir(sourceDir, targetDir) {
+  fs.mkdirSync(targetDir, { recursive: true });
+  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
+    const sourcePath = path.join(sourceDir, entry.name);
+    const targetPath = path.join(targetDir, entry.name);
+    if (entry.isDirectory()) {
+      copyDir(sourcePath, targetPath);
+    } else {
+      fs.copyFileSync(sourcePath, targetPath);
+      const stat = fs.statSync(sourcePath);
+      fs.chmodSync(targetPath, stat.mode);
+    }
+  }
+}
+function installSkill(destinationDir, force) {
+  const skillTargetDir = path.join(destinationDir, SKILL_NAME);
+  if (!fs.existsSync(SOURCE_SKILL_DIR)) {
+    throw new Error(`Bundled skill not found: ${SOURCE_SKILL_DIR}`);
+  }
+  if (fs.existsSync(skillTargetDir)) {
+    if (!force) {
+      throw new Error(`Skill already exists at ${skillTargetDir}. Re-run with --force to replace it.`);
+    }
+    fs.rmSync(skillTargetDir, { recursive: true, force: true });
+  }
+  fs.mkdirSync(destinationDir, { recursive: true });
+  copyDir(SOURCE_SKILL_DIR, skillTargetDir);
+  return skillTargetDir;
+}
+function main() {
+  let args;
+  try {
+    args = parseArgs(process.argv.slice(2));
+  } catch (error) {
+    console.error(`Error: ${error.message}`);
+    printHelp();
+    process.exit(1);
+  }
+  if (args.command === "help") {
+    printHelp();
+    return;
+  }
+  const destinationDir = resolveSkillsDir(args.mode, args.customPath);
+  if (args.command === "where") {
+    console.log(path.join(destinationDir, SKILL_NAME));
+    return;
+  }
+  if (args.command !== "install") {
+    console.error(`Unknown command: ${args.command}`);
+    printHelp();
+    process.exit(1);
+  }
+  try {
+    const installedPath = installSkill(destinationDir, args.force);
+    console.log(`Installed ${SKILL_NAME} to ${installedPath}`);
+    console.log("Invoke it in Codex with $harness-repo-bootstrap.");
+  } catch (error) {
+    console.error(`Install failed: ${error.message}`);
+    process.exit(1);
+  }
+}
+main();

package/package.json ADDED

@@ -0,0 +1,25 @@
+{
+  "name": "@hallucination-studio/harness-engine",
+  "version": "1.0.0-beta.8.87407",
+  "description": "Install a Codex skill that bootstraps and updates an advanced harness-engineering repository layout.",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/hallucination-studio/harness-engine.git"
+  },
+  "bin": {
+    "harness-engine": "bin/install.js"
+  },
+  "publishConfig": {
+    "access": "public"
+  },
+  "scripts": {
+    "test": "python3 skills/harness-repo-bootstrap/evals/run_evals.py",
+    "smoke:install": "node scripts/smoke_install.js",
+    "pack:check": "npm pack --dry-run"
+  },
+  "files": [
+    "bin",
+    "skills"
+  ],
+  "license": "MIT"
+}

package/skills/harness-repo-bootstrap/SKILL.md ADDED

@@ -0,0 +1,68 @@
+---
+name: harness-repo-bootstrap
+description: Bootstrap or refresh an advanced harness-engineering repository shape for Codex-driven projects. Use when Codex needs to analyze a repository, ask the human to confirm high-impact product and architecture facts, and then create or update AGENTS.md, architecture docs, policy docs, plan folders, reference folders, and SOP-backed starter files for the repository.
+---
+# Harness Repo Bootstrap
+Run the packaged script to inspect the target repository before editing files. Use the generated analysis to decide what to ask the human, what durable knowledge is missing from the repo, and which execution-plan and SOP files must be created or updated.
+## Workflow
+1. Run `python3 scripts/manage_harness.py analyze --repo <target-repo> --output <analysis.json>`.
+2. Read `analysis.json`.
+3. Ask the human only the unresolved, high-impact questions from `human_confirmations`.
+4. Run `python3 scripts/manage_harness.py sample-answers --analysis <analysis.json> --output <answers.json>`.
+5. Fill the placeholders in `answers.json` from the repository and the human's confirmed answers.
+6. Run one of:
+   - `python3 scripts/manage_harness.py init --repo <target-repo> --answers <answers.json>`
+   - `python3 scripts/manage_harness.py update --repo <target-repo> --answers <answers.json>`
+7. If the task is multi-step, run `python3 scripts/manage_harness.py plan-start --repo <target-repo> --slug <task-name> --goal "<goal>"`.
+8. If you learn durable facts during the work, run `python3 scripts/manage_harness.py knowledge-log --repo <target-repo> --plan <plan-file> --fact "<fact>" --destination <durable-doc>` and keep the returned `id`.
+9. Before closing the task, write those facts into their durable docs.
+10. Run `python3 scripts/manage_harness.py knowledge-mark-written --repo <target-repo> --plan <plan-file> --id <knowledge-id> --evidence "<text already in durable doc>"`; use `--append` only when the exact fact should be appended mechanically.
+11. Close the plan with `python3 scripts/manage_harness.py plan-close --repo <target-repo> --plan <plan-file> --summary "<summary>"`.
+12. Before handoff, run `python3 .codex/skills/harness-repo-bootstrap/scripts/manage_harness.py check --repo <target-repo>` from an installed target repository.
+13. After changing this skill, run `python3 evals/run_evals.py` and iterate until it passes.
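Steps 8 to 11 form a gate. A plan should not close while logged knowledge is still unwritten, and a knowledge item counts as written only when its evidence text already appears in the destination doc. The following in-memory Node.js sketch illustrates that gate under stated assumptions. The real script keeps plan state in Markdown, and the object shape, the `K1`-style IDs, and the function names are hypothetical.

```javascript
// Hypothetical in-memory model; the real script stores this state in Markdown.
function knowledgeLog(plan, fact, destination) {
  const id = `K${plan.knowledge.length + 1}`;
  plan.knowledge.push({ id, fact, destination, written: false, evidence: null });
  return id; // keep this id for the later mark-written step
}

// readDoc(destination) returns the current text of a durable doc.
function knowledgeMarkWritten(plan, id, evidence, readDoc) {
  const item = plan.knowledge.find((k) => k.id === id);
  if (!item) throw new Error(`Unknown knowledge id: ${id}`);
  // The doc may word the fact naturally; only the evidence text must appear.
  if (!(readDoc(item.destination) || "").includes(evidence)) {
    throw new Error(`Evidence not found in ${item.destination}`);
  }
  item.written = true;
  item.evidence = evidence;
}

// Refuses to close while any logged knowledge is still unwritten.
function planClose(plan, summary) {
  const pending = plan.knowledge.filter((k) => !k.written).map((k) => k.id);
  if (pending.length > 0) {
    throw new Error(`Cannot close plan; unwritten knowledge: ${pending.join(", ")}`);
  }
  plan.status = "completed";
  plan.summary = summary;
}
```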
+## Reading Order
+- Read [references/workflow.md](references/workflow.md) first for the operating model and question policy.
+- Read [references/file-map.md](references/file-map.md) when deciding which generated file to update.
+- Read [references/question-catalog.md](references/question-catalog.md) when the analysis surfaces ambiguous product, security, reliability, or frontend facts.
+- Read [references/knowledge-capture.md](references/knowledge-capture.md) when you discover facts that should survive chat history.
+- Read [references/exec-plans.md](references/exec-plans.md) before planning or updating any multi-step work.
+- Read [references/sop-index.md](references/sop-index.md) to choose the right SOP for architecture, UI validation, observability, or knowledge capture work.
+- Read [references/template-policy.md](references/template-policy.md) before overwriting existing files.
+- Read [references/evaluation-loop.md](references/evaluation-loop.md) before changing the skill, templates, scripts, or policy references.
+## Command Rules
+- Prefer `analyze` before `init` or `update`.
+- Prefer the draft, test, evaluate, iterate loop for changes to this skill.
+- Prefer `init` when the target repo has none of the managed files.
+- Prefer `update` when the repo already contains any managed file or a partial harness layout.
+- Do not overwrite existing files unless the human asked for it or you pass `--force`.
+- Treat the generated files as starting points. After generation, tighten them with repository-specific details instead of leaving placeholders behind.
+- Treat `docs/exec-plans/` as required state for multi-step work, not optional notes.
+- Treat `docs/sops/` as mechanical operating procedures, not background reading.
+- When you answer a question using facts that are not yet in the repo but should be reusable, write them into a durable doc before finishing.
+- Prefer `knowledge-mark-written --id ... --evidence ...` so durable docs can use natural wording instead of duplicated exact fact strings.
+- Use `plan-close` as the final guardrail so plan state and durable docs stay synchronized.
+- Use `check` as the local handoff guardrail for user repositories.
+- Run `python3 evals/run_evals.py` after skill changes and treat failures as iteration input.
+- Do not add CI to user repositories unless the human explicitly asks for it.
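The `init` versus `update` rule reduces to one question: does any managed path already exist? The following hypothetical Node.js sketch illustrates it. It assumes an illustrative subset of managed paths, because the script's real list is not shown here.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical subset of managed paths; the script's real list may differ.
const MANAGED_PATHS = ["AGENTS.md", "ARCHITECTURE.md", "docs/PLANS.md", "docs/exec-plans", "docs/sops"];

// "init" only when none of the managed paths exist; "update" as soon as
// any of them (a full or partial harness layout) is present.
function chooseHarnessCommand(repoRoot) {
  const present = MANAGED_PATHS.filter((rel) => fs.existsSync(path.join(repoRoot, rel)));
  return present.length === 0 ? "init" : "update";
}
```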
+## Output Rules
+- Keep `AGENTS.md` short and routing-oriented.
+- Keep durable knowledge in repo docs, not in chat-only explanations.
+- Keep plans under `docs/exec-plans/active/` and move finished plans to `docs/exec-plans/completed/`.
+- Keep generated material under `docs/generated/`.
+- Keep external, model-friendly references under `docs/references/`.
+- Keep SOPs explicit and task-triggered so the next agent can follow the same path mechanically.
+## Assets
+- Scaffold templates live under [assets/repo-template](assets/repo-template).
+- SOP starter docs live under [assets/sops](assets/sops).

package/skills/harness-repo-bootstrap/agents/openai.yaml ADDED

@@ -0,0 +1,4 @@
+interface:
+  display_name: "Harness Repo Bootstrap"
+  short_description: "Scaffold advanced Codex harness docs"
+  default_prompt: "Use $harness-repo-bootstrap to analyze this repository and scaffold or refresh its advanced harness documentation."

package/skills/harness-repo-bootstrap/assets/repo-template/.keep ADDED

@@ -0,0 +1 @@
+ starter files are generated by scripts/manage_harness.py

package/skills/harness-repo-bootstrap/assets/sops/.keep ADDED

@@ -0,0 +1 @@
+ sop starter files are generated by scripts/manage_harness.py

package/skills/harness-repo-bootstrap/evals/cases.json ADDED

@@ -0,0 +1,18 @@
+[
+  {
+    "id": "empty-repo-init",
+    "description": "Empty repositories should receive the full advanced harness scaffold."
+  },
+  {
+    "id": "frontend-analysis",
+    "description": "Frontend repositories should trigger frontend-specific confirmation and policy output."
+  },
+  {
+    "id": "closed-loop-plan",
+    "description": "Execution plans should refuse to close until durable knowledge is written back."
+  },
+  {
+    "id": "preserve-unmanaged-docs",
+    "description": "Existing user-owned harness files should be skipped unless explicitly forced."
+  }
+]
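The `preserve-unmanaged-docs` case describes a skip-unless-forced write policy, which is the same shape `bin/install.js` uses for `--force`. A minimal Node.js sketch follows. `writeHarnessFile` and its return values are hypothetical and not taken from `manage_harness.py`.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical writer: an existing (possibly user-owned) file is left
// untouched unless `force` is set, mirroring the eval case above.
function writeHarnessFile(repoRoot, relPath, content, { force = false } = {}) {
  const target = path.join(repoRoot, relPath);
  if (fs.existsSync(target) && !force) {
    return "skipped";
  }
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, content);
  return "written";
}
```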