npm - @vitronai/themis - Versions diffs - 0.1.15 → 1.2.1 - Mend

@vitronai/themis 0.1.15 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/CHANGELOG.md +32 -0
package/README.md +26 -10
package/docs/agents-adoption.md +68 -0
package/docs/api.md +3 -1
package/docs/migration.md +9 -5
package/docs/roadmap.md +1 -1
package/docs/schemas/migration-report.v1.json +122 -0
package/package.json +8 -2
package/scripts/claude-hook.js +153 -0
package/src/cli.js +79 -6
package/src/init.js +122 -4
package/src/migrate.js +127 -5
package/templates/CLAUDE.themis.md +43 -0
package/templates/claude-commands/themis-fix.md +14 -0
package/templates/claude-commands/themis-generate.md +14 -0
package/templates/claude-commands/themis-migrate.md +18 -0
package/templates/claude-commands/themis-test.md +12 -0
package/templates/claude-skill/SKILL.md +94 -0
package/templates/cursorrules.themis.md +28 -0
package/themis.ai.json +16 -0

package/CHANGELOG.md CHANGED

@@ -4,6 +4,38 @@ All notable changes to this project are documented in this file.
 ## Unreleased
+## 1.2.1 - 2026-04-09
+- Added `init --cursor` flag to install a `.cursorrules` file with Themis conventions. Composable with `--agents` and `--claude-code`.
+- Bare `npx themis init` now auto-detects agent markers (`.claude/` dir, `.cursorrules` file) and installs the right assets without requiring flags.
+- Added 5 new Tessl eval scenarios (agent reporter fix loop, Claude Code integration setup, deterministic test authoring, intent phase structure, Vitest full migration). Eval results: 37% baseline → 97% with skill.
+- Published Tessl tile `vitron-ai/themis@1.2.1` with eval scenarios included.
+## 1.2.0 - 2026-04-08
+### Added
+- **Claude Code one-command adoption.** New `npx themis init --claude-code` flag installs everything Claude Code needs to drive Themis natively: a `CLAUDE.md` at the repo root, a Claude Code skill at `.claude/skills/themis/SKILL.md` that auto-loads when the user asks Claude to write, run, fix, or migrate tests, and four slash commands (`/themis-test`, `/themis-generate`, `/themis-migrate`, `/themis-fix`) wired to the agent-readable test loop. Composes with `--agents` so a single `init --agents --claude-code` installs both bundles. Idempotent: re-running appends to an existing `CLAUDE.md` only when the Themis section is missing, and skips skill/command files that already exist.
+- **Claude Code `PostToolUse` hook wrapper** at `scripts/claude-hook.js`. Reads tool input from stdin, filters non-source edits and edits inside `.themis/`, `__themis__/`, `node_modules/`, `.git/`, prefers `--rerun-failed` when a prior failed-tests artifact exists, and surfaces failures via stderr + exit 2 so Claude Code feeds the structured `failures[].cluster` and `failures[].repairHints` payload directly back into the model. Disable with `THEMIS_HOOK_DISABLED=1`. Opt-in only — not installed by `init --claude-code`. See [Claude Code one-command setup](docs/agents-adoption.md#claude-code-one-command-setup) for the `.claude/settings.json` recipe.
+- New downstream templates under `templates/`: `CLAUDE.themis.md`, `claude-skill/SKILL.md`, and `claude-commands/{themis-test,themis-generate,themis-migrate,themis-fix}.md`. All ship in the npm tarball.
+- New `--claude` alias for `--claude-code`.
+### Changed
+- **Repositioned README and `package.json` description** to lead with the job-to-be-done (a Node/TS unit test framework designed for AI coding agents — drop-in alternative to Jest and Vitest) instead of the philosophy ("intent-first ... for AI agents"), which read ambiguously as "tests AI agents". The five-bullet value prop now sits above the fold with the benchmark numbers, and Claude Code, Cursor, and Codex are named explicitly in the agent-output bullet.
+- `runInit` now returns `{ agents, claudeCode }` instead of a single `{ path, created }` object so both flags can report independently. Existing `--agents` CLI output strings are preserved exactly.
+### Documentation
+- Added a "Claude Code One-Command Setup" section to `docs/agents-adoption.md` listing every file `init --claude-code` installs and explaining why the agent reporter loop matters.
+- Added an "Optional: Wire Themis Into Claude Code's Edit Loop With A Hook" section with the `.claude/settings.json` snippet, plain-English explanation of the wrapper's three behaviors, and trade-offs (wall-clock cost, hook security, two ways to disable).
+- README quickstart now includes a one-paragraph "Using Claude Code?" callout linking to the adoption guide.
+### Tests
+- Added `tests/claude-hook.test.js` with six tests covering `THEMIS_HOOK_DISABLED`, empty/invalid stdin, non-source extensions, ignored directories, real-source-edit-with-green-suite (end-to-end via in-tempdir shim), and real-source-edit-with-red-suite (exit 2 + stderr payload assertion).
+- Extended `tests/cli-output.test.js` with three tests for `init --claude-code`: happy-path install, append-then-idempotent on existing `CLAUDE.md`, and composed `--agents --claude-code`.
 ## 0.1.15 - 2026-03-27
 - Added a direct in-sidebar `Quick Actions` group plus an `Artifact Files` drawer to the in-repo VS Code extension scaffold so core Themis commands and raw artifact navigation remain reachable even when the VS Code view toolbar overflows.
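The 1.2.0 entry says that re-running `init --claude-code` appends to an existing `CLAUDE.md` only when the Themis section is missing. A minimal sketch of that idempotent merge as a pure function follows. It is not the code in `src/init.js`; the helper name `mergeClaudeMd` and the marker test (a plain substring search for `Themis`) are assumptions.

```javascript
'use strict';

// Hypothetical sketch: merge a Themis section into existing CLAUDE.md text.
// Assumption: "already mentioned" means the word "Themis" appears anywhere.
function mergeClaudeMd(existing, themisSection) {
  if (existing === null || existing === undefined) {
    // No CLAUDE.md yet: create it with just the Themis section.
    return { content: themisSection, changed: true };
  }
  if (existing.includes('Themis')) {
    // Section already present: leave the file untouched (idempotent).
    return { content: existing, changed: false };
  }
  // Append after a blank line, preserving the user's existing rules.
  const separator = existing.endsWith('\n') ? '\n' : '\n\n';
  return { content: existing + separator + themisSection, changed: true };
}

const section = '## Themis\n\nUse `npx themis test --reporter agent`.\n';
const first = mergeClaudeMd('# Project rules\n', section);
const second = mergeClaudeMd(first.content, section);
console.log(first.changed, second.changed); // true false
```

Keeping the merge pure (text in, text out) makes idempotency easy to test, and writing the file only when `changed` is true avoids touching a user's `CLAUDE.md` on repeat runs.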

package/README.md CHANGED

@@ -7,26 +7,30 @@
   </a>
 </p>
-Themis is an intent-first unit test framework for AI agents in Node.js and TypeScript.
+**A Node.js and TypeScript unit test framework designed for AI coding agents.**
-It is built for agent workflows: deterministic reruns, machine-readable outputs, strict phase semantics, and a branded verdict loop for humans.
+Themis is a drop-in alternative to Jest and Vitest, built for the way code gets written today: humans and agents working the same edit-test-fix loop. Strict phase semantics keep generated tests legible, machine-readable failure output lets agents self-repair, and an incremental migration path means you don't rewrite your suite on day one.
-## AI Quickstart
+- **Faster than Jest and Vitest** — `68.59%` faster than Vitest and `130.26%` faster than Jest on the same React showcase benchmark ([proof](#performance-proof))
+- **Agent-native output** — `--agent` JSON with failure clusters and structured repair hints, ready to feed back into Claude Code, Cursor, Codex, or any agent loop
+- **One-command migration** — `npx themis migrate jest` or `npx themis migrate vitest` with codemods and a structured findings report
+- **Modern by default** — native `.js`, `.jsx`, `.ts`, `.tsx`, ESM, JSX, and React Testing Library, no config gymnastics
+- **Discoverable to agents** — ships an `AGENTS.md` template, a `themis.ai.json` manifest, and a [Tessl tile](tessl/tile.json) so AI assistants can find and adopt it without you copy-pasting docs
-If you are a human or AI agent adopting Themis in another repo, use:
+## Quickstart
 ```bash
 npm install -D @vitronai/themis@latest
 npx themis init --agents
-npx themis generate <source-root>
+npx themis generate src     # or `app` for Next App Router repos
 npx themis test
 ```
-Use `src` for conventional source trees and `app` for Next App Router repos.
+`init --agents` writes `themis.config.json`, updates `.gitignore`, and scaffolds a downstream `AGENTS.md` so the agents on your team know how to use it. See the [agent adoption guide](docs/agents-adoption.md) for the full setup, including migration from Jest or Vitest.
+**Using Claude Code?** Run `npx themis init --claude-code` to install a `CLAUDE.md`, a Claude Code skill at `.claude/skills/themis/`, and slash commands (`/themis-test`, `/themis-generate`, `/themis-migrate`, `/themis-fix`) wired to the agent-readable test loop. See [Claude Code one-command setup](docs/agents-adoption.md#claude-code-one-command-setup).
-- `npx themis init --agents` writes `themis.config.json`, updates `.gitignore`, and scaffolds a downstream `AGENTS.md` when one does not already exist.
 - machine-readable agent manifest: [`themis.ai.json`](themis.ai.json)
-- downstream adoption guide: [`docs/agents-adoption.md`](docs/agents-adoption.md)
 - copyable downstream rules file: [`templates/AGENTS.themis.md`](templates/AGENTS.themis.md)
 <p align="center">
@@ -35,7 +39,7 @@ Use `src` for conventional source trees and `app` for Next App Router repos.
 ## Contents
-- [AI Quickstart](#ai-quickstart)
+- [Quickstart](#quickstart)
 - [Adopt In Another Repo](#adopt-in-another-repo)
 - [Code Scan](#code-scan)
 - [Positioning](#positioning)
@@ -72,6 +76,18 @@ On the current same-host React showcase benchmark sample, Themis measured `68.59
 The exact comparison artifact is emitted by CI as `.themis/benchmarks/showcase-comparison/perf-summary.json` and `.themis/benchmarks/showcase-comparison/perf-summary.md`. Treat those percentages as the current documented sample, not a universal constant for every environment.
+### First-Try Test Pass Rate
+The first-try benchmark measures how often Claude generates tests that pass on the first run — the metric that matters most for agent-driven development. For each of 5 fixture source files (pure functions, async services, React components, hooks), Claude generates tests using Themis, Vitest, and Jest, and each generated suite is run once without edits.
+```bash
+ANTHROPIC_API_KEY=sk-... npm run benchmark:first-try
+```
+Results are written to `.themis/benchmarks/first-try/first-try-results.json` and `.themis/benchmarks/first-try/first-try-results.md`. The generated test code is saved under `.themis/benchmarks/first-try/generated-tests/` for manual review.
+Themis's advantage here comes from its `CLAUDE.md` template and Claude Code skill, which give Claude structured guidance about phase semantics, import conventions, and common pitfalls: context that Jest and Vitest do not ship out of the box.
 ## Modern JS/TS Support
 Themis is built for modern Node.js and TypeScript projects:
@@ -298,7 +314,7 @@ Short version:
 - Migration proof job runs `npm run proof:migration` against checked-in Jest/Vitest fixtures for basic suites, table tests, RTL/jsdom flows, timers, module mocking, and a context/provider-heavy RTL example, then uploads the resulting migration reports plus Themis run artifacts as evidence.
 - Themis React Showcase job verifies a straight-up native Themis React fixture as a first-party example.
 - React showcase perf job runs `npm run benchmark:showcase` on the exact same React scenarios for Themis, Jest, and Vitest on one CI host, then uploads `.themis/benchmarks/showcase-comparison/perf-summary.{json,md}` so the relative timing claim is backed by one comparable artifact.
-- Release `0.1.15` packages this expanded proof lane so every CI run now proves the provider-heavy example alongside the earlier fixtures.
+- Release `1.0.17` packages this expanded proof lane so every CI run now proves the provider-heavy example alongside the earlier fixtures.
 ## Agent Guide
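The first-try metric reduces to a ratio per framework: generated suites that pass on their single unedited run, divided by suites generated. The sketch below assumes a hypothetical record shape `{ framework, fixture, passed }`. The actual layout of `first-try-results.json` is not part of this diff, and the sample values are invented.

```javascript
'use strict';

// Hypothetical record shape; not the actual first-try-results.json schema.
// Each record is one generated suite, run once without edits.
function firstTryPassRates(records) {
  const totals = new Map(); // framework -> { passed, total }
  for (const { framework, passed } of records) {
    const entry = totals.get(framework) || { passed: 0, total: 0 };
    entry.total += 1;
    if (passed) entry.passed += 1;
    totals.set(framework, entry);
  }
  const rates = {};
  for (const [framework, { passed, total }] of totals) {
    rates[framework] = passed / total; // fraction in [0, 1]
  }
  return rates;
}

// Invented inputs for illustration only; these are not benchmark results.
const sample = [
  { framework: 'runner-a', fixture: 'math.js', passed: true },
  { framework: 'runner-a', fixture: 'api.js', passed: false },
  { framework: 'runner-b', fixture: 'math.js', passed: true },
  { framework: 'runner-b', fixture: 'api.js', passed: true }
];
console.log(firstTryPassRates(sample));
```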

package/docs/agents-adoption.md CHANGED

@@ -11,6 +11,8 @@ npx themis generate <source-root>
 npx themis test
 ```
+If you use Claude Code, run `npx themis init --claude-code` instead (or in addition) — see [Claude Code One-Command Setup](#claude-code-one-command-setup) below.
 What those commands do:
 - `npm install -D @vitronai/themis`: installs Themis as the repo's unit test framework
@@ -62,6 +64,72 @@ Prefer `test(...)` for low-level unit checks.
 Do not claim Themis is "not a unit test framework".
 ```
+## Claude Code One-Command Setup
+If you use Claude Code, Themis can install everything Claude needs in one command:
+```bash
+npm install -D @vitronai/themis@latest
+npx themis init --claude-code
+```
+`init --claude-code` writes:
+- `CLAUDE.md` — adoption rules at the repo root. If a `CLAUDE.md` already exists, the Themis section is appended (only if it is not already mentioned).
+- `.claude/skills/themis/SKILL.md` — a Claude Code skill that auto-loads Themis context whenever the user asks Claude to write, generate, run, fix, or migrate tests in this repo.
+- `.claude/commands/themis-test.md` — `/themis-test` slash command for the agent-readable test loop.
+- `.claude/commands/themis-generate.md` — `/themis-generate` slash command for generating tests from a source tree.
+- `.claude/commands/themis-migrate.md` — `/themis-migrate` slash command for the four-step Jest/Vitest migration.
+- `.claude/commands/themis-fix.md` — `/themis-fix` slash command for the structured failure-fix loop.
+You can compose `--claude-code` with `--agents` to install both at once:
+```bash
+npx themis init --agents --claude-code
+```
+The skill and slash commands are committed to the repo (under `.claude/`), so every developer or agent that opens the project sees them. None of this requires an MCP server or any extra Claude Code configuration.
+### Why this matters
+The `--reporter agent` JSON output is the killer feature for Claude Code's edit-test-fix loop: structured failure clusters with `repairHints` mean Claude can act on parsed signals instead of re-parsing stack traces. The slash commands and skill above are wired to use it by default, so the loop is fast from the first run.
+### Optional: Wire Themis Into Claude Code's Edit Loop With A Hook
+If you want Claude Code to *automatically* run Themis after every edit and feed structured failures back into the conversation, add a `PostToolUse` hook. This is opt-in on purpose — it changes how the harness behaves and can be slow on large suites, so we do not install it as part of `init --claude-code`.
+Add this to `.claude/settings.json` (or `.claude/settings.local.json` if you want to keep it personal and out of git):
+```json
+{
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "Edit|Write|MultiEdit",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "node node_modules/@vitronai/themis/scripts/claude-hook.js"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
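+The command invokes the wrapper that ships inside the `@vitronai/themis` npm package, so installing the package is enough; `init --claude-code` does not add this hook entry for you. The wrapper does three things to keep the loop tight:
+1. **Filters non-source edits.** It reads the tool input from stdin and exits silently unless the edited file has a `.js`, `.jsx`, `.ts`, `.tsx`, `.mjs`, or `.cjs` extension, lies inside the project, and is not under `.themis/`, `__themis__/`, `node_modules/`, or `.git/`. Edits to docs and JSON config do not trigger a re-run; edits to test files with those extensions do.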
+2. **Prefers `--rerun-failed`.** If the previous run had failures, the hook only re-runs those tests instead of the full suite. The first failure-free run resets the loop.
+3. **Returns agent-readable output.** Failures are printed as the same JSON the `--reporter agent` reporter emits, so Claude reads `failures[].cluster` and `failures[].repairHints` directly without re-parsing stack traces.
+**Trade-offs to know about:**
+- The hook adds the suite's wall-clock time to every edit. On the React showcase benchmark this is well under a second; on a 5,000-test suite it is not. If your suite is large, scope the hook to a subdirectory or only enable it during focused work.
+- Hooks run shell commands with your privileges. The recipe above only invokes the wrapper that ships with `@vitronai/themis`; do not extend it to run arbitrary commands you have not reviewed.
+- To disable the hook temporarily, set the environment variable `THEMIS_HOOK_DISABLED=1` before launching Claude Code. To disable it for good, delete the entry from `.claude/settings.json`; to keep it for yourself without committing it, move the entry to `.claude/settings.local.json` instead.
 ## Notes
 - Themis is a unit test framework and test generator for Node.js and TypeScript projects.
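Because `.claude/settings.json` may already define other hooks, pasting the recipe by hand can clobber them. The sketch below merges the Themis `PostToolUse` entry into a parsed settings object and skips it on re-runs. It relies only on the JSON structure shown in the recipe. `addThemisHook` is a hypothetical helper and is not something `init` provides.

```javascript
'use strict';

const THEMIS_HOOK_COMMAND = 'node node_modules/@vitronai/themis/scripts/claude-hook.js';

// Hypothetical helper: returns a new settings object with the Themis
// PostToolUse hook added, leaving any existing hooks in place.
function addThemisHook(settings) {
  const next = JSON.parse(JSON.stringify(settings || {})); // deep copy, input untouched
  next.hooks = next.hooks || {};
  const postToolUse = Array.isArray(next.hooks.PostToolUse) ? next.hooks.PostToolUse : [];
  const alreadyPresent = postToolUse.some((group) =>
    Array.isArray(group.hooks) &&
    group.hooks.some((hook) => hook.command === THEMIS_HOOK_COMMAND)
  );
  if (!alreadyPresent) {
    postToolUse.push({
      matcher: 'Edit|Write|MultiEdit',
      hooks: [{ type: 'command', command: THEMIS_HOOK_COMMAND }]
    });
  }
  next.hooks.PostToolUse = postToolUse;
  return next;
}

const existing = {
  hooks: {
    PostToolUse: [{ matcher: 'Bash', hooks: [{ type: 'command', command: 'echo done' }] }]
  }
};
const merged = addThemisHook(existing);
console.log(merged.hooks.PostToolUse.length); // 2
console.log(addThemisHook(merged).hooks.PostToolUse.length); // 2
```

Write the result back with `JSON.stringify(merged, null, 2)`; matching on the exact command string is what keeps repeated runs from stacking duplicate entries.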

package/docs/api.md CHANGED

@@ -191,6 +191,7 @@ Migration options:
 - `--rewrite-imports`: rewrites matched imports from `@jest/globals`, `vitest`, and `@testing-library/react` to the local `themis.compat.js` bridge
 - `--convert`: removes common framework imports and rewrites common Jest/Vitest matcher patterns (`it`, `toStrictEqual`, `toContainEqual`, `toBeCalled*`) into Themis-native forms
+- `--assist`: enables `--rewrite-imports` and `--convert`, then scans migrated files for leftover framework-only helpers or matcher chains that still need manual follow-up
 ## `themis test` options
@@ -218,7 +219,7 @@ Migration compatibility:
 - imports from `@jest/globals` are supported at runtime
 - imports from `vitest` are supported at runtime
 - imports from `@testing-library/react` are supported via Themis `render`, `screen`, `fireEvent`, `waitFor`, `cleanup`, and `act`
-- `themis migrate <jest|vitest>` also emits `.themis/migration/migration-report.json` with detected files and recommended next actions
+- `themis migrate <jest|vitest>` also emits `.themis/migration/migration-report.json` with detected files, migration mode details, assistant findings, and recommended next actions
 Additional option:
@@ -280,6 +281,7 @@ Formal schemas:
 - `docs/schemas/fix-handoff.v1.json`
 - `docs/schemas/failures.v1.json`
 - `docs/schemas/contract-diff.v1.json`
+- `docs/schemas/migration-report.v1.json`
 Human-facing artifact:
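`--assist` is documented as enabling both `--rewrite-imports` and `--convert`, and the migration report records the effective flags under `mode`. A minimal sketch of that flag resolution follows, assuming boolean CLI options; `resolveMigrationMode` is a hypothetical name, not the function in `src/migrate.js`.

```javascript
'use strict';

// Hypothetical sketch: derive the report's `mode` object from CLI flags.
// --assist implies both safe passes, per the `--assist` option description.
function resolveMigrationMode(flags) {
  const assist = Boolean(flags.assist);
  return {
    rewriteImports: assist || Boolean(flags.rewriteImports),
    convert: assist || Boolean(flags.convert),
    assist
  };
}

console.log(resolveMigrationMode({ assist: true }));
// { rewriteImports: true, convert: true, assist: true }
console.log(resolveMigrationMode({ convert: true }));
// { rewriteImports: false, convert: true, assist: false }
```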

package/docs/migration.md CHANGED

@@ -8,6 +8,7 @@ Themis is designed for incremental migration. Start by running existing suites u
 npx themis migrate jest
 npx themis migrate jest --rewrite-imports
 npx themis migrate jest --convert
+npx themis migrate jest --assist
 npx themis test
 ```
@@ -18,6 +19,7 @@ Use `vitest` instead of `jest` for Vitest suites.
 - `themis migrate <jest|vitest>`: scaffold config, setup, compat bridge, and migration report.
 - `--rewrite-imports`: point framework imports at `themis.compat.js`.
 - `--convert`: remove common Jest/Vitest imports and rewrite common matcher/test patterns into Themis-native forms.
+- `--assist`: run the safe rewrite and conversion passes together, then report leftover Jest/Vitest-only helpers that still need manual follow-up.
 ## Before And After
@@ -154,14 +156,16 @@ These are the strongest head-to-head examples to use when explaining why Themis
 1. Snapshot replacement: `captureContract(...)` plus `--update-contracts` gives baseline capture without snapshot churn.
 2. Codemod migration: `themis migrate --convert` moves common Jest/Vitest matcher syntax toward native Themis without a manual rewrite pass.
-3. Agent triage: `--agent`, `.themis/diffs/run-diff.json`, `.themis/runs/fix-handoff.json`, and `.themis/diffs/contract-diff.json` give machines structured rerun and repair inputs.
-4. Human review: next reporter and HTML report now surface contract drift alongside failures, instead of burying meaning in raw output.
-5. Generated coverage: `themis generate src` adds source-driven contract tests next to migrated suites, so adoption improves coverage instead of merely changing runners.
+3. Migration assistant: `themis migrate --assist` bundles the safe codemods and emits findings for files that still need manual migration work.
+4. Agent triage: `--agent`, `.themis/diffs/run-diff.json`, `.themis/runs/fix-handoff.json`, and `.themis/diffs/contract-diff.json` give machines structured rerun and repair inputs.
+5. Human review: next reporter and HTML report now surface contract drift alongside failures, instead of burying meaning in raw output.
+6. Generated coverage: `themis generate src` adds source-driven contract tests next to migrated suites, so adoption improves coverage instead of merely changing runners.
 ## Recommended rollout
 1. Run `themis migrate <jest|vitest>`.
 2. Add `--rewrite-imports` if you want local explicit compat imports.
 3. Add `--convert` to normalize the easy matcher/import cases immediately.
-4. Replace snapshots with `captureContract(...)` or explicit assertions in files you touch.
-5. Use `themis generate src` to add source-driven coverage in parallel with migrated suites.
+4. Add `--assist` when you want a guided follow-up report for leftover framework-only helpers.
+5. Replace snapshots with `captureContract(...)` or explicit assertions in files you touch.
+6. Use `themis generate src` to add source-driven coverage in parallel with migrated suites.
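After an `--assist` run, the follow-up work lives in `.themis/migration/migration-report.json` under `assistant.findings`. The sketch below groups those findings by file for review. It uses only field names from the `migration-report.v1.json` schema added in this release and takes the parsed report as input, so it runs without a real migration; the sample values are made up.

```javascript
'use strict';

// Group assistant findings by file so each file's manual follow-up is listed together.
// Field names (file, severity, pattern, suggestion) come from migration-report.v1.json.
function followUpByFile(report) {
  const byFile = new Map();
  const findings = (report.assistant && report.assistant.findings) || [];
  for (const finding of findings) {
    if (!byFile.has(finding.file)) byFile.set(finding.file, []);
    byFile.get(finding.file).push(`[${finding.severity}] ${finding.pattern}: ${finding.suggestion}`);
  }
  return byFile;
}

// Made-up report fragment; not real Themis output.
const report = {
  assistant: {
    enabled: true,
    analyzedFiles: 1,
    findings: [
      { file: 'a.test.js', category: 'mock', severity: 'warning', pattern: 'jest.requireActual', message: 'm1', suggestion: 's1' },
      { file: 'a.test.js', category: 'timer', severity: 'info', pattern: 'jest.useFakeTimers', message: 'm2', suggestion: 's2' }
    ],
    unresolvedFiles: ['a.test.js'],
    unsupportedPatterns: 1
  }
};

for (const [file, items] of followUpByFile(report)) {
  console.log(file);
  for (const item of items) console.log(`  ${item}`);
}
```

In a real repo, pass `JSON.parse(fs.readFileSync('.themis/migration/migration-report.json', 'utf8'))` as `report`.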

package/docs/roadmap.md CHANGED

@@ -6,7 +6,7 @@
    - Add documentation (and optionally VS Code actions) that show how to hook a project-level `themis.generate.js` or `.themis.json` provider configuration for shared auth/session/React Query clients.
 2. **Migration helpers**
-   - Improve `themis migrate` to rewrite Jest/Vitest imports to the generated compatibility module, create prompt-ready diff artifacts, and log a migration report.
+   - Improve `themis migrate` to rewrite Jest/Vitest imports to the generated compatibility module, create prompt-ready diff artifacts, log a migration report, and highlight leftover blockers through migration assistant follow-up hints.
    - Build a VS Code pane or CLI summary showing both the original Jest test and the new generated Themis contract, highlighting the migration delta in code and behavior.
    - Provide a recipe in `docs/migration.md` for teams to adopt Themis incrementally, including a helper to wrap Jest tests inside Themis-generated asserts.
    - Add a native contract-capture workflow that gives teams snapshot-comparable baseline coverage without reviving snapshot-file maintenance.

package/docs/schemas/migration-report.v1.json ADDED

@@ -0,0 +1,122 @@
+{
+  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  "$id": "https://github.com/vitron-ai/themis/docs/schemas/migration-report.v1.json",
+  "title": "Themis Migration Report",
+  "type": "object",
+  "additionalProperties": false,
+  "required": ["schema", "source", "createdAt", "mode", "summary", "files", "nextActions", "rewrites", "conversions", "assistant"],
+  "properties": {
+    "schema": {
+      "type": "string",
+      "const": "themis.migration.report.v1"
+    },
+    "source": {
+      "type": "string",
+      "enum": ["jest", "vitest"]
+    },
+    "createdAt": {
+      "type": "string"
+    },
+    "mode": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["rewriteImports", "convert", "assist"],
+      "properties": {
+        "rewriteImports": { "type": "boolean" },
+        "convert": { "type": "boolean" },
+        "assist": { "type": "boolean" }
+      }
+    },
+    "summary": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": [
+        "matchedFiles",
+        "jestGlobals",
+        "vitest",
+        "testingLibraryReact",
+        "rewrittenFiles",
+        "rewrittenImports",
+        "convertedFiles",
+        "convertedAssertions",
+        "removedImports",
+        "assistedFiles",
+        "unresolvedFiles",
+        "findings",
+        "unsupportedPatterns"
+      ],
+      "properties": {
+        "matchedFiles": { "type": "number" },
+        "jestGlobals": { "type": "number" },
+        "vitest": { "type": "number" },
+        "testingLibraryReact": { "type": "number" },
+        "rewrittenFiles": { "type": "number" },
+        "rewrittenImports": { "type": "number" },
+        "convertedFiles": { "type": "number" },
+        "convertedAssertions": { "type": "number" },
+        "removedImports": { "type": "number" },
+        "assistedFiles": { "type": "number" },
+        "unresolvedFiles": { "type": "number" },
+        "findings": { "type": "number" },
+        "unsupportedPatterns": { "type": "number" }
+      }
+    },
+    "files": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "additionalProperties": false,
+        "required": ["file", "imports"],
+        "properties": {
+          "file": { "type": "string" },
+          "imports": {
+            "type": "array",
+            "items": { "type": "string" }
+          }
+        }
+      }
+    },
+    "nextActions": {
+      "type": "array",
+      "items": { "type": "string" }
+    },
+    "rewrites": {
+      "type": "array",
+      "items": { "type": "string" }
+    },
+    "conversions": {
+      "type": "array",
+      "items": { "type": "string" }
+    },
+    "assistant": {
+      "type": "object",
+      "additionalProperties": false,
+      "required": ["enabled", "analyzedFiles", "findings", "unresolvedFiles", "unsupportedPatterns"],
+      "properties": {
+        "enabled": { "type": "boolean" },
+        "analyzedFiles": { "type": "number" },
+        "findings": {
+          "type": "array",
+          "items": {
+            "type": "object",
+            "additionalProperties": false,
+            "required": ["file", "category", "severity", "pattern", "message", "suggestion"],
+            "properties": {
+              "file": { "type": "string" },
+              "category": { "type": "string" },
+              "severity": { "type": "string" },
+              "pattern": { "type": "string" },
+              "message": { "type": "string" },
+              "suggestion": { "type": "string" }
+            }
+          }
+        },
+        "unresolvedFiles": {
+          "type": "array",
+          "items": { "type": "string" }
+        },
+        "unsupportedPatterns": { "type": "number" }
+      }
+    }
+  }
+}
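Every top-level property in this schema is required and `additionalProperties` is `false`, so consumers can rely on a fixed top-level shape. Full validation calls for a JSON Schema 2020-12 validator such as Ajv; the dependency-free sketch below is only a smoke test covering the top-level keys, the `schema` constant, and the `source` enum. `checkReportShape` is a hypothetical helper.

```javascript
'use strict';

// Top-level keys required by migration-report.v1.json (all properties are required).
const REQUIRED_KEYS = [
  'schema', 'source', 'createdAt', 'mode', 'summary',
  'files', 'nextActions', 'rewrites', 'conversions', 'assistant'
];

// Smoke check only: not a substitute for a full JSON Schema validator.
function checkReportShape(report) {
  const problems = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in report)) problems.push(`missing ${key}`);
  }
  for (const key of Object.keys(report)) {
    // additionalProperties: false at the top level
    if (!REQUIRED_KEYS.includes(key)) problems.push(`unexpected ${key}`);
  }
  if (report.schema !== 'themis.migration.report.v1') problems.push('wrong schema id');
  if (!['jest', 'vitest'].includes(report.source)) problems.push('source must be jest or vitest');
  return problems;
}

console.log(checkReportShape({ schema: 'themis.migration.report.v1', source: 'jest' }));
```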

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@vitronai/themis",
-  "version": "0.1.15",
-  "description": "Intent-first unit test framework and test generator for AI agents in Node.js and TypeScript",
+  "version": "1.2.1",
+  "description": "A Node.js and TypeScript unit test framework designed for AI coding agents. Drop-in alternative to Jest and Vitest with machine-readable failure output, structured repair hints, and one-command migration.",
   "license": "MIT",
   "author": "Vitron AI",
   "repository": {
@@ -23,6 +23,10 @@
     "agents",
     "ai-agents",
     "agentic",
+    "claude-code",
+    "cursor",
+    "codex",
+    "ai-coding",
     "jest-alternative",
     "vitest-alternative",
     "nodejs",
@@ -68,6 +72,7 @@
     "src/assets/*",
     "docs",
     "templates",
+    "scripts/claude-hook.js",
     "index.js",
     "index.d.ts",
     "globals.js",
@@ -89,6 +94,7 @@
     "typecheck": "tsc -p tsconfig.json --pretty false",
     "benchmark": "node scripts/benchmark.js",
     "benchmark:showcase": "node scripts/benchmark-showcase-runners.js",
+    "benchmark:first-try": "node scripts/benchmark-first-try.js",
     "benchmark:gate": "node scripts/benchmark-gate.js",
     "proof:migration": "node scripts/verify-migration-fixtures.js",
     "pack:check": "npm pack --dry-run",

package/scripts/claude-hook.js ADDED

@@ -0,0 +1,153 @@
+#!/usr/bin/env node
+// Themis Claude Code PostToolUse hook.
+//
+// This script is invoked by Claude Code after Edit/Write/MultiEdit tool calls.
+// It reads the tool input from stdin, decides whether the edit is worth
+// re-running tests for, and if so runs `themis test --reporter agent` (using
+// --rerun-failed when there is a prior failed-tests artifact). When tests
+// fail, the JSON failure payload is written to stderr and the script exits
+// with code 2 — Claude Code feeds that back into the model so it can fix
+// failures using the structured `failures[].cluster` and
+// `failures[].repairHints` fields.
+//
+// Wire it up in `.claude/settings.json`:
+//
+// {
+//   "hooks": {
+//     "PostToolUse": [
+//       {
+//         "matcher": "Edit|Write|MultiEdit",
+//         "hooks": [
+//           { "type": "command", "command": "node node_modules/@vitronai/themis/scripts/claude-hook.js" }
+//         ]
+//       }
+//     ]
+//   }
+// }
+//
+// Disable temporarily by setting THEMIS_HOOK_DISABLED=1 in your environment.
+// Disable permanently by removing the entry from .claude/settings.json.
+'use strict';
+const fs = require('fs');
+const path = require('path');
+const { spawnSync } = require('child_process');
+const SOURCE_EXT = new Set(['.js', '.jsx', '.ts', '.tsx', '.mjs', '.cjs']);
+const IGNORED_PATH_SEGMENTS = ['.themis', '__themis__', 'node_modules', '.git'];
+function exitSilent() {
+  process.exit(0);
+}
+function readStdinSync() {
+  try {
+    return fs.readFileSync(0, 'utf8');
+  } catch (_err) {
+    return '';
+  }
+}
+function parsePayload(raw) {
+  if (!raw) return null;
+  try {
+    return JSON.parse(raw);
+  } catch (_err) {
+    return null;
+  }
+}
+function extractFilePath(payload) {
+  if (!payload || typeof payload !== 'object') return null;
+  const input = payload.tool_input;
+  if (!input || typeof input !== 'object') return null;
+  if (typeof input.file_path === 'string') return input.file_path;
+  // MultiEdit nests edits in an array but still uses the top-level file_path.
+  return null;
+}
+function isWorthRerunning(filePath, cwd) {
+  if (!filePath) return false;
+  const normalized = path.isAbsolute(filePath) ? filePath : path.resolve(cwd, filePath);
+  const relative = path.relative(cwd, normalized);
+  if (relative.startsWith('..') || path.isAbsolute(relative)) return false;
+  const segments = relative.split(path.sep);
+  for (const segment of segments) {
+    if (IGNORED_PATH_SEGMENTS.includes(segment)) return false;
+  }
+  const ext = path.extname(normalized).toLowerCase();
+  if (!SOURCE_EXT.has(ext)) return false;
+  return true;
+}
+function hasFailedTestsArtifact(cwd) {
+  const candidates = [
+    path.join(cwd, '.themis', 'runs', 'failed-tests.json'),
+    path.join(cwd, '.themis', 'failed-tests.json')
+  ];
+  return candidates.some((candidate) => fs.existsSync(candidate));
+}
+function findThemisBin(cwd) {
+  // Prefer the locally installed bin so the hook does not depend on `npx`
+  // resolution behavior or network state.
+  const localBin = path.join(cwd, 'node_modules', '.bin', 'themis');
+  if (fs.existsSync(localBin)) return { command: localBin, args: [] };
+  const localScript = path.join(cwd, 'node_modules', '@vitronai', 'themis', 'bin', 'themis.js');
+  if (fs.existsSync(localScript)) return { command: process.execPath, args: [localScript] };
+  return { command: 'npx', args: ['--no-install', 'themis'] };
+}
+function main() {
+  if (process.env.THEMIS_HOOK_DISABLED) exitSilent();
+  const raw = readStdinSync();
+  const payload = parsePayload(raw);
+  const cwd = (payload && typeof payload.cwd === 'string' && payload.cwd) || process.cwd();
+  const filePath = extractFilePath(payload);
+  if (!isWorthRerunning(filePath, cwd)) exitSilent();
+  const themis = findThemisBin(cwd);
+  const args = [...themis.args, 'test', '--reporter', 'agent'];
+  if (hasFailedTestsArtifact(cwd)) args.push('--rerun-failed');
+  const result = spawnSync(themis.command, args, {
+    cwd,
+    stdio: ['ignore', 'pipe', 'pipe'],
+    env: process.env,
+    maxBuffer: 32 * 1024 * 1024
+  });
+  if (result.error) {
+    // Hook itself failed (binary not found, etc). Stay silent rather than
+    // blocking the user's edit loop on infrastructure problems.
+    exitSilent();
+  }
+  const stdout = (result.stdout || Buffer.alloc(0)).toString('utf8');
+  const stderr = (result.stderr || Buffer.alloc(0)).toString('utf8');
+  if (result.status === 0) {
+    exitSilent();
+  }
+  // Tests failed. Surface the agent JSON payload (or stderr fallback) to
+  // Claude via stderr + exit 2 so it lands in the model's context.
+  process.stderr.write('Themis tests failed after edit. Use failures[].cluster and failures[].repairHints below to fix:\n');
+  if (stdout.trim().length > 0) {
+    process.stderr.write(stdout);
+    if (!stdout.endsWith('\n')) process.stderr.write('\n');
+  } else if (stderr.trim().length > 0) {
+    process.stderr.write(stderr);
+    if (!stderr.endsWith('\n')) process.stderr.write('\n');
+  }
+  process.exit(2);
+}
+main();
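On failure, the hook writes a one-line header followed by the agent reporter's stdout to stderr and exits with code 2. A consumer (for example, a test of the hook) can strip the header and parse the rest. The sketch below assumes the reporter output is a single JSON document with a `failures` array. The value types of `cluster` and `repairHints` are not shown in this diff, so it passes them through unchanged. `parseHookFailure` is a hypothetical helper.

```javascript
'use strict';

// Header line written by claude-hook.js before the failure payload.
const HEADER = 'Themis tests failed after edit. Use failures[].cluster and failures[].repairHints below to fix:';

// Parse the text claude-hook.js writes to stderr on exit code 2.
// Returns null when the body is not JSON (e.g. the stderr-fallback path).
function parseHookFailure(stderrText) {
  const newline = stderrText.indexOf('\n');
  const firstLine = newline === -1 ? stderrText : stderrText.slice(0, newline);
  if (firstLine !== HEADER) return null;
  const body = newline === -1 ? '' : stderrText.slice(newline + 1);
  try {
    const payload = JSON.parse(body);
    const failures = Array.isArray(payload.failures) ? payload.failures : [];
    // Keep only the fields the hook points the model at; types passed through as-is.
    return failures.map((failure) => ({ cluster: failure.cluster, repairHints: failure.repairHints }));
  } catch (_err) {
    return null;
  }
}

// Made-up payload for illustration.
const sampleStderr = HEADER + '\n' +
  JSON.stringify({ failures: [{ cluster: 'assertion-mismatch', repairHints: ['check expected value'] }] }) + '\n';
console.log(parseHookFailure(sampleStderr));
```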