@vitronai/themis 1.2.1 → 1.2.2
- package/README.md +84 -470
- package/docs/tutorial-claude-code.md +230 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -7,15 +7,17 @@
 </a>
 </p>

-**A
+**A unit test framework built for AI coding agents.**

-
+Drop-in alternative to Jest and Vitest. Agents write tests, get structured failure output, and self-repair — all in the same edit-test-fix loop.

-- **Faster
-- **Agent-native
-- **One-command migration** — `npx themis migrate jest` or `
-- **Modern by default** —
-- **Discoverable
+- **Faster** — 68.59% faster than Vitest, 130.26% faster than Jest on the same benchmark ([proof](#performance))
+- **Agent-native** — `--agent` JSON with failure clusters and structured repair hints
+- **One-command migration** — `npx themis migrate jest` or `vitest` with codemods
+- **Modern by default** — `.ts`, `.tsx`, `.js`, `.jsx`, ESM, React Testing Library, no config gymnastics
+- **Discoverable** — ships `AGENTS.md`, `themis.ai.json`, and a [Tessl tile](tessl/tile.json) so agents find and adopt it automatically
+
+---

 ## Quickstart

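Note on the `Faster` bullet in the hunk above: the README does not say how "percent faster" is computed. A minimal sketch of one common reading, assumed here, is that the figure is the other framework's median wall-clock time minus Themis's, divided by Themis's. The function names and the sample timings below are hypothetical and are not taken from the package or its benchmark artifact.

```js
// Sketch only: one possible reading of "X% faster". The formula is an
// assumption; the README does not define it.

// Median of a non-empty list of numbers.
function median(samples) {
  if (samples.length === 0) throw new RangeError('median needs at least one sample');
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// How much longer the baseline took, as a percentage of the candidate's time.
function percentFaster(baselineMs, candidateMs) {
  if (!(baselineMs > 0) || !(candidateMs > 0)) {
    throw new RangeError('timings must be positive');
  }
  return ((baselineMs - candidateMs) / candidateMs) * 100;
}

// Hypothetical wall-clock samples in milliseconds (made up for illustration).
const candidateRuns = [100, 90, 110];
const baselineRuns = [210, 190, 200];
console.log(percentFaster(median(baselineRuns), median(candidateRuns))); // 100
```

Under this reading, "130.26% faster" would mean the baseline's median is about 2.3 times the candidate's. Other definitions give different numbers, so the CI artifact named in the README remains the reference.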
@@ -26,350 +28,51 @@ npx themis generate src # or `app` for Next App Router repos
 npx themis test
 ```

-`init --agents` writes
-
-**Using Claude Code?** Run `npx themis init --claude-code` to install a `CLAUDE.md`, a Claude Code skill at `.claude/skills/themis/`, and slash commands (`/themis-test`, `/themis-generate`, `/themis-migrate`, `/themis-fix`) wired to the agent-readable test loop. See [Claude Code one-command setup](docs/agents-adoption.md#claude-code-one-command-setup).
-
-- machine-readable agent manifest: [`themis.ai.json`](themis.ai.json)
-- copyable downstream rules file: [`templates/AGENTS.themis.md`](templates/AGENTS.themis.md)
-
-<p align="center">
-<img src="src/assets/themisVerdictEngine.png" alt="Themis verdict engine art" width="960">
-</p>
-
-## Contents
-
-- [Quickstart](#quickstart)
-- [Adopt In Another Repo](#adopt-in-another-repo)
-- [Code Scan](#code-scan)
-- [Positioning](#positioning)
-- [Performance Proof](#performance-proof)
-- [Modern JS/TS Support](#modern-jsts-support)
-- [Commands](#commands)
-- [Agent Guide](#agent-guide)
-- [VS Code](#vs-code)
-- [Mocks And UI Primitives](#mocks-and-ui-primitives)
-- [Intent Syntax](#intent-syntax)
-- [Config](#config)
-- [TypeScript](#typescript)
-- [Benchmark](#benchmark)
-- [Publish Readiness](#publish-readiness)
-- [Agent Adoption Guide](docs/agents-adoption.md)
-- [Why Themis](docs/why-themis.md)
-- [API Reference](docs/api.md)
-- [Showcase Comparisons](docs/showcases.md)
-- [Release Policy](docs/release-policy.md)
-- [Publish Guide](docs/publish.md)
-
-## Positioning
-
-- Best-in-class unit testing for AI agents in Node.js and TypeScript
-- Deterministic execution with fast rerun loops
-- Agent-native JSON and HTML reporting
-- Structured contract workflows instead of opaque snapshot files
-- Incremental migration path from Jest/Vitest without rewriting everything on day one
-- AI verdict engine for human triage and machine automation
-
-## Performance Proof
-
-On the current same-host React showcase benchmark sample, Themis measured `68.59%` faster than Vitest and `130.26%` faster than Jest on median wall-clock time for the same two-spec suite.
-
-The exact comparison artifact is emitted by CI as `.themis/benchmarks/showcase-comparison/perf-summary.json` and `.themis/benchmarks/showcase-comparison/perf-summary.md`. Treat those percentages as the current documented sample, not a universal constant for every environment.
-
-### First-Try Test Pass Rate
-
-The first-try benchmark measures how often Claude generates tests that pass on the first run — the metric that matters most for agent-driven development. For each of 5 fixture source files (pure functions, async services, React components, hooks), Claude generates tests using Themis, Vitest, and Jest, and each generated suite is run once without edits.
-
-```bash
-ANTHROPIC_API_KEY=sk-... npm run benchmark:first-try
-```
-
-Results are written to `.themis/benchmarks/first-try/first-try-results.json` and `.themis/benchmarks/first-try/first-try-results.md`. The generated test code is saved under `.themis/benchmarks/first-try/generated-tests/` for manual review.
-
-Themis's advantage here comes from its `CLAUDE.md` template and Claude Code skill, which give Claude structured guidance about phase semantics, import conventions, and common pitfalls — context that Jest and Vitest users do not ship out of the box.
-
-## Modern JS/TS Support
-
-Themis is built for modern Node.js and TypeScript projects:
+`init --agents` writes config, updates `.gitignore`, and scaffolds a downstream `AGENTS.md`.

--
-- ESM `.js` loading in `type: "module"` projects
-- `tsconfig` path alias resolution
-- `node` and `jsdom` environments
-- `setupFiles` for harness bootstrapping
-- `testIgnore` patterns for deterministic discovery boundaries
-- first-party mocks, spies, and deterministic UI primitives
-- compatibility imports for `@jest/globals`, `vitest`, and `@testing-library/react`
-- `--watch`, `--rerun-failed`, `--isolation in-process`, and `--cache` for tight local and agent rerun loops
+**Using Claude Code?** Run `npx themis init --claude-code` to install `CLAUDE.md`, a Claude Code skill, and slash commands (`/themis-test`, `/themis-generate`, `/themis-migrate`, `/themis-fix`).

-
+---

-
-- Verdict engine art: [`src/assets/themisVerdictEngine.png`](src/assets/themisVerdictEngine.png)
-- HTML verdict report art: [`src/assets/themisReport.png`](src/assets/themisReport.png)
-- Background art used by the report: [`src/assets/themisBg.png`](src/assets/themisBg.png)
+## Performance

-
+On the same React showcase benchmark, Themis measured **68.59% faster than Vitest** and **130.26% faster than Jest** on median wall-clock time. The comparison artifact is emitted by CI as `.themis/benchmarks/showcase-comparison/perf-summary.json`.

-
-TypeScript-generated suites use `import` syntax so downstream ESLint and ESM-style rules do not flag Themis output as legacy `require(...)` code.
-
-If another repo wants its agents to reliably choose Themis, put the framework choice directly in that repo's agent instructions instead of assuming agents will infer it from package metadata alone.
-
-For a copy-paste downstream setup guide, see [`docs/agents-adoption.md`](docs/agents-adoption.md).
-
-For a ready-to-copy downstream agent rules file, see [`templates/AGENTS.themis.md`](templates/AGENTS.themis.md).
-
-Generate the next-gen HTML report:
-
-```bash
-npx themis test --reporter html
-```
-
-Use the AI-agent payload:
+The first-try benchmark measures how often an AI agent generates tests that pass on the first run — the metric that matters most for agent-driven development:

 ```bash
-
+ANTHROPIC_API_KEY=sk-... npm run benchmark:first-try
 ```

-
+---

-
-npx themis test --watch --isolation in-process --cache --reporter next
-```
+## How it works

-
+### Plain English → structured tests

-```
-
-
-
-
-npx themis migrate jest --rewrite-imports --convert
-npx themis migrate vitest --rewrite-imports --convert
-npx themis test
-```
+```js
+intent('user can sign in', ({ context, run, verify, cleanup }) => {
+  context('a valid user', (ctx) => {
+    ctx.user = { email: 'a@b.com', password: 'pw' };
+  });

-
+  run('the user submits credentials', (ctx) => {
+    ctx.result = { ok: true };
+  });

-
+  verify('authentication succeeds', (ctx) => {
+    expect(ctx.result.ok).toBe(true);
+  });

-
-
-
+  cleanup('remove test state', (ctx) => {
+    delete ctx.user;
+  });
+});
 ```

-
-
-- checks the scanned export names when Themis can resolve them exactly
-- asserts the normalized runtime export contract directly in generated source
-- adds scenario adapters for React components/hooks, Next app/router files, route handlers, and service functions when Themis can infer or read useful inputs
-- captures React interaction and hook state-transition contracts when event handlers or stateful methods are available
-- asserts DOM-state and behavioral flow contracts directly for generated React and Next component adapters
-- emits async behavioral flow contracts for generated React and Next component adapters when flow plans are inferred or hinted, including richer inferred input/submit/loading/success paths for common async forms
-- supports provider-driven DOM flow contracts for empty, disabled, retry, error, and recovery states with attribute- and role-aware assertions
-- fails with a regeneration hint when the source drifts after the scan
-
-Themis also supports per-file generation hints with sidecars like `src/components/Button.themis.json` so humans and agents can provide props, component flows, args, route requests, and route context. When those sidecars do not exist yet, `--write-hints` can scaffold them automatically from the current source analysis.
-
-This is the core alternative to snapshot-driven testing: generated and hand-written tests assert normalized contracts in readable source, so diffs stay reviewable and updates stay intentional.
-
-For repo-wide generation defaults, add `themis.generate.js` or `themis.generate.cjs` at the project root. Providers in that file can match source paths, supply shared props/args/flow plans, register runtime mocks for generated UI scenarios, and wrap generated component renders so generated DOM contracts run inside the same provider shells humans use in app tests. Providers can also declare preset wrapper metadata for router, Next navigation, auth/session shells, React Query, Zustand, and Redux-style app state patterns, including route history/state, query status, auth permissions, and store selector/action metadata.
-
-For CI and agent loops, Themis can also enforce generation quality instead of only writing files. Strict runs emit a structured backlog, fail on unresolved scan debt, and hand back exact remediation commands.
-
-Use these flags to control the generation loop:
-
-- `--json`: machine-readable payload for agents, including prompt-ready next steps
-- `--plan`: alias for `--review --json` with persisted handoff artifacts
-- `--review`: dry-run create/update/remove decisions without writing files
-- `--update`: refresh existing generated files only
-- `--clean`: remove generated files for the selected scope
-- `--changed`: target changed files in a git worktree
-- `--write-hints`: scaffold missing `.themis.json` sidecars so the next generate pass has explicit component props, hook args, service args, and route requests
-- `--scenario`: limit generation to one adapter family such as `react-hook`, `next-app-component`, or `next-route-handler`
-- `--min-confidence`: keep only entries at or above a confidence threshold
-- `--strict`: fail the generate run on skips, conflicts, or entries below `high` confidence
-- `--fail-on-skips`, `--fail-on-conflicts`: turn unresolved scan debt into a non-zero exit code
-- `--require-confidence`: fail if selected generated tests fall below a confidence threshold
-- `--files`, `--match-source`, `--match-export`, `--include`, `--exclude`: narrow the scan scope
-- `--force`: replace a conflicting non-Themis file
-- `--output <dir>`: change the generated test directory
-
-Every generation run also writes:
-
-- `.themis/generate/generate-map.json`: source-to-generated-test mapping plus scenario/confidence metadata
-- `.themis/generate/generate-last.json`: the full machine-readable generate payload
-- `.themis/generate/generate-handoff.json`: a compact agent handoff artifact with prompt-ready next actions
-- `.themis/generate/generate-backlog.json`: unresolved skips, conflicts, and confidence debt with suggested fixes
-
-Local test loops can also opt into a zero IPC execution path:
-
-- `npx themis test --isolation in-process`: executes suites in-process instead of worker mode
-- `npx themis test --watch --isolation in-process --cache`: keeps a fast local rerun loop with file-level result caching
-- `npx themis test --isolation worker`: keeps process isolation for CI or global-heavy suites
-
-When generated tests fail, Themis also writes:
-
-- `.themis/runs/fix-handoff.json`: a deduped failure-to-fix artifact that maps generated failures back to source files, categories, repair strategies, candidate files, and remediation commands
+Phase names: `context`, `run`, `verify`, `cleanup`. Legacy aliases (`arrange/act/assert`, `given/when/then`) also supported.

-
-
-```bash
-npx themis test --fix
-```
-
-`--fix` reads `.themis/runs/fix-handoff.json`, regenerates the affected source targets with `--update`, scaffolds hints when the repair strategy needs them, and reruns the suite.
-
-Migration scaffolds also write:
-
-- `.themis/migration/migration-report.json`: a machine-readable inventory of detected Jest/Vitest compatibility imports and recommended next actions
-- `themis.compat.js`: an optional local compatibility bridge used by `themis migrate --rewrite-imports`
-
-## Why Themis
-
-See [`docs/why-themis.md`](docs/why-themis.md) for positioning, differentiators, and community messaging.
-
-Short version:
-
-- Themis aims to deliver the benefits people reach for in snapshots, without snapshot rot.
-- Prefer explicit, normalized contracts over broad output dumps.
-- Keep changes reviewable through source assertions, machine-readable artifacts, and diff-oriented rerun workflows.
-- See [`docs/showcases.md`](docs/showcases.md) for direct Jest/Vitest comparison examples.
-
-## Reference Docs
-
-- API reference: [`docs/api.md`](docs/api.md)
-- Agent adoption guide: [`docs/agents-adoption.md`](docs/agents-adoption.md)
-- Migration guide: [`docs/migration.md`](docs/migration.md)
-- Release policy: [`docs/release-policy.md`](docs/release-policy.md)
-- Publish guide: [`docs/publish.md`](docs/publish.md)
-- VS Code extension notes: [`docs/vscode-extension.md`](docs/vscode-extension.md)
-- Agent result schema: [`docs/schemas/agent-result.v1.json`](docs/schemas/agent-result.v1.json)
-- Generate result schema: [`docs/schemas/generate-result.v1.json`](docs/schemas/generate-result.v1.json)
-- Generate map schema: [`docs/schemas/generate-map.v1.json`](docs/schemas/generate-map.v1.json)
-- Generate handoff schema: [`docs/schemas/generate-handoff.v1.json`](docs/schemas/generate-handoff.v1.json)
-- Generate backlog schema: [`docs/schemas/generate-backlog.v1.json`](docs/schemas/generate-backlog.v1.json)
-- Fix handoff schema: [`docs/schemas/fix-handoff.v1.json`](docs/schemas/fix-handoff.v1.json)
-- Failures artifact schema: [`docs/schemas/failures.v1.json`](docs/schemas/failures.v1.json)
-- Contract diff schema: [`docs/schemas/contract-diff.v1.json`](docs/schemas/contract-diff.v1.json)
-- Changelog: [`CHANGELOG.md`](CHANGELOG.md)
-
-## Commands
-
-- `npx themis init`: creates `themis.config.json`, adds `.themis/` to `.gitignore`, and adds `__themis__/reports/` plus `__themis__/shims/` to `.gitignore`.
-- `npx themis init --agents`: does the same and also writes a downstream `AGENTS.md` from the Themis template if the repo does not already have one.
-- `npx themis generate src`: scans source files and generates contract tests under `__themis__/tests`, using `.generated.test.ts` for TS/TSX sources and `.generated.test.js` for JS/JSX sources.
-- `npx themis generate src --json`: emits a machine-readable generation payload for agents and automation.
-- `npx themis generate src --plan`: emits a planning payload and handoff artifact without writing generated tests.
-- `npx themis generate src --review --json`: previews create/update/remove decisions without writing files.
-- `npx themis generate src --review --strict --json`: fails fast on unresolved generation debt while still emitting a machine-readable plan.
-- `npx themis generate src --write-hints`: scaffolds missing hint sidecars and uses them in the same generate pass.
-- `npx themis generate src --update`: refreshes existing generated tests only.
-- `npx themis generate src --clean`: removes generated tests for the selected scope.
-- `npx themis generate src --changed`: regenerates against changed files in the current git worktree.
-- `npx themis generate src --scenario react-hook --min-confidence high`: targets one adapter family at a confidence threshold.
-- `npx themis generate app --scenario next-route-handler`: focuses generation on Next app router request handlers.
-- `npx themis migrate jest`: scaffolds a Themis config/setup bridge for existing Jest suites and gitignores `.themis/` plus `__themis__/reports/` and `__themis__/shims/`.
-- `npx themis migrate jest --rewrite-imports`: rewrites matched Jest/Vitest/Testing Library imports to a local `themis.compat.js` bridge file.
-- `npx themis migrate jest --convert`: applies codemods for common Jest/Vitest matcher/import patterns so suites move closer to native Themis style.
-- `npx themis migrate vitest`: scaffolds the same bridge for Vitest suites and gitignores `.themis/` plus `__themis__/reports/` and `__themis__/shims/`.
-- `npx themis generate src --require-confidence high`: enforces a quality bar for all selected generated tests.
-- `npx themis generate src --files src/routes/ping.ts`: targets one or more explicit source files.
-- `npx themis generate src --match-source "routes/" --match-export "GET|POST"`: narrows generation by source path and exported symbol.
-- `npx themis generate src --output tests/contracts`: writes generated tests to a custom directory.
-- `npx themis generate src --force`: replaces conflicting files in the target output directory.
-- `npx themis test`: discovers and runs tests.
-- `npx themis test --next`: next-gen console output mode.
-- `npx themis test --json`: emits JSON result payload.
-- `npx themis test --agent`: emits AI-agent-oriented JSON schema.
-- `npx themis test --reporter html`: generates a next-gen HTML report file.
-- `npx themis test --reporter html --html-output reports/themis.html`: writes HTML report to a custom path.
-- `npx themis test --watch`: reruns the suite when watched project files change.
-- `npx themis test --watch --isolation in-process --cache`: runs a zero IPC cached local loop for fast edit/rerun cycles.
-- `npx themis test --workers 8`: overrides worker count (positive integer).
-- `npx themis test --isolation in-process`: runs test files in-process instead of worker processes.
-- `npx themis test --cache`: enables file-level result caching for in-process local loops.
-- `npx themis test --environment jsdom`: runs tests in a browser-like DOM environment.
-- `npx themis test --stability 3`: runs the suite three times and classifies each test as `stable_pass`, `stable_fail`, or `unstable`.
-- `npx themis test --match "intent DSL"`: runs only tests whose full name matches regex.
-- `npx themis test --rerun-failed`: reruns failing tests from `.themis/runs/failed-tests.json`.
-- `npx themis test --fix`: applies generated-test autofixes from `.themis/runs/fix-handoff.json` and reruns the suite.
-- `npx themis test --update-contracts --match "suite > case"`: accepts reviewed `captureContract(...)` changes for a narrow slice of the suite.
-- `npx themis test --no-memes`: disables meme phase aliases (`cook`, `yeet`, `vibecheck`, `wipe`).
-- `npx themis test --lexicon classic|themis`: rebrands human-readable status labels in `next/spec` reporters.
-- `npm run lint`: runs ESLint across the CLI, runtime, scripts, tests, and the VS Code extension scaffold.
-- `npm run validate`: runs lint, test, typecheck, and benchmark gate in one command.
-- `npm run typecheck`: validates TypeScript types for Themis globals and DSL contracts.
-- `npm run benchmark:gate`: fails when benchmark performance exceeds the configured threshold.
-- `npm run pack:check`: previews the npm publish payload.
-- `npm run proof:migration`: migrates checked-in Jest/Vitest fixture suites and proves they run cleanly under Themis.
-
-## CI & Release Proof
-
-- Lint job runs `npm run lint` on Node 20.
-- Compatibility job runs `npm test` on Node 18 and 20.
-- Release surface job runs `npm run typecheck`, `npm run pack:check`, the HTML + agent reports, verifies `.themis/diffs/contract-diff.json`, produces `.themis/benchmarks/benchmark-last.json`/`.themis/benchmarks/migration-proof.json`, and uploads all of the artifacts for later inspection.
-- Perf gate job runs `npm run benchmark:gate` with `BENCH_MAX_AVG_MS=2500` to guard against regressions before publishing.
-- Migration proof job runs `npm run proof:migration` against checked-in Jest/Vitest fixtures for basic suites, table tests, RTL/jsdom flows, timers, module mocking, and a context/provider-heavy RTL example, then uploads the resulting migration reports plus Themis run artifacts as evidence.
-- Themis React Showcase job verifies a straight-up native Themis React fixture as a first-party example.
-- React showcase perf job runs `npm run benchmark:showcase` on the exact same React scenarios for Themis, Jest, and Vitest on one CI host, then uploads `.themis/benchmarks/showcase-comparison/perf-summary.{json,md}` so the relative timing claim is backed by one comparable artifact.
-- Release `1.0.17` packages this expanded proof lane so every CI run now proves the provider-heavy example alongside the earlier fixtures.
-
-## Agent Guide
-
-[`AGENTS.md`](AGENTS.md) is the AI-agent contributor contract for this repository. It tells agents working on Themis itself how to write tests, preserve determinism, and update artifact contracts safely.
-
-It is not a package-discovery mechanism for every external repo. If another project wants its agents to use Themis, that project should say so in its own `AGENTS.md`, rules, or agent prompt.
-
-For downstream install, generation, and migration guidance, see [`docs/agents-adoption.md`](docs/agents-adoption.md).
-
-For a copyable downstream rules file, see [`templates/AGENTS.themis.md`](templates/AGENTS.themis.md).
-
-You do not need an MCP server just to make agents use Themis. Package metadata, docs, CLI commands, and explicit downstream repo instructions are the primary adoption path. An MCP integration could be useful later for richer editor or automation workflows, but it is optional.
-
-Themis writes artifacts under `.themis/`:
-
-- `.themis/runs/last-run.json`: full machine-readable run payload.
-- `.themis/runs/failed-tests.json`: compact failure list for retry loops.
-- `.themis/diffs/run-diff.json`: diff against the previous run, including new and resolved failures.
-- `.themis/runs/run-history.json`: rolling recent-run history for agent comparison loops.
-- `.themis/runs/fix-handoff.json`: source-oriented repair handoff for generated test failures.
-- `.themis/migration/migration-report.json`: compatibility inventory and next actions for migrated Jest/Vitest suites.
-- `.themis/diffs/contract-diff.json`: contract capture drift, updates, and update commands for `captureContract(...)` workflows.
-- `.themis/generate/generate-last.json`: latest machine-readable generate payload.
-- `.themis/generate/generate-map.json`: source-to-generated-test mapping.
-- `.themis/generate/generate-handoff.json`: prompt-ready generate handoff payload.
-- `.themis/generate/generate-backlog.json`: unresolved generate debt and suggested remediation.
-- `themis.compat.js`: optional local compat bridge for rewritten migration imports.
-- `.themis/benchmarks/benchmark-last.json`: latest benchmark comparison payload, including migration proof output.
-- `.themis/benchmarks/migration-proof.json`: synthetic migration-conversion proof artifact emitted by `npm run benchmark`.
-- `__themis__/reports/report.html`: interactive HTML verdict report.
-- `__themis__/shims/`: reserved namespace for framework-owned compatibility shims when a fallback file is truly needed. Themis should prefer built-in support first and should not drop ad hoc shim files into `tests/`.
-
-`--agent` output includes deterministic failure fingerprints, grouped `analysis.failureClusters`, stability classifications, previous-run comparison data, and a direct generated-test repair hint via `npx themis test --fix`. Fix handoff entries also carry repair strategies, candidate files, and autofix commands for tighter failure-to-fix loops.
-
-Machine-facing reporters intentionally emit compact JSON. Agents and tooling should parse the payloads rather than depend on whitespace formatting.
-
-The HTML reporter is designed for agent-adjacent review workflows too: it combines verdict status, slow-test surfacing, artifact navigation, and interactive file filtering in one report.
-
-## VS Code
-
-The repo now includes a thin VS Code extension scaffold at [`packages/themis-vscode`](packages/themis-vscode).
-
-The extension is intentionally artifact-driven:
-
-- reads `.themis/runs/last-run.json`, `.themis/runs/failed-tests.json`, `.themis/diffs/run-diff.json`, `.themis/generate/generate-last.json`, `.themis/generate/generate-map.json`, `.themis/generate/generate-backlog.json`, and `__themis__/reports/report.html`
-- shows the latest verdict and failures in a sidebar
-- adds generated-review navigation for source/test/hint mappings plus unresolved generation backlog
-- reruns Themis from VS Code commands
-- opens the HTML report inside a webview
-
-It does not replace the CLI. The CLI and `.themis/**` artifacts remain the source of truth.
-
-## Mocks And UI Primitives
-
-Themis now ships first-party test utilities for agent-generated tests:
+### Mocks and UI primitives

 ```js
 mock('../src/api', () => ({
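Note on the `intent(...)` example in the hunk above: the README names four phases (`context`, `run`, `verify`, `cleanup`) whose callbacks share one `ctx` object. The sketch below is a toy, synchronous runner written only to illustrate that ordering. It is not Themis's implementation, `runIntent` is a hypothetical name, and running cleanup even after a failed phase is an assumption about sensible behavior rather than something the README states.

```js
// Toy illustration only: NOT how Themis executes intents.
// `runIntent` is a hypothetical helper; phases are collected, then run in order.
function runIntent(name, define) {
  const phases = { context: [], run: [], verify: [], cleanup: [] };
  const register = (kind) => (label, fn) => {
    phases[kind].push({ label, fn });
  };
  define({
    context: register('context'),
    run: register('run'),
    verify: register('verify'),
    cleanup: register('cleanup'),
  });

  const ctx = {}; // shared state passed to every phase callback
  const log = []; // records the order in which steps were started
  let error = null;
  try {
    for (const kind of ['context', 'run', 'verify']) {
      for (const step of phases[kind]) {
        log.push(`${kind}: ${step.label}`);
        step.fn(ctx);
      }
    }
  } catch (err) {
    error = err; // the first failing step stops the remaining non-cleanup steps
  }
  // Assumption: cleanup steps run whether or not an earlier phase threw.
  for (const step of phases.cleanup) {
    log.push(`cleanup: ${step.label}`);
    step.fn(ctx);
  }
  return { name, ok: error === null, error, log };
}

const result = runIntent('user can sign in', ({ context, run, verify, cleanup }) => {
  context('a valid user', (ctx) => {
    ctx.user = { email: 'a@b.com', password: 'pw' };
  });
  run('the user submits credentials', (ctx) => {
    ctx.result = { ok: true };
  });
  verify('authentication succeeds', (ctx) => {
    if (ctx.result.ok !== true) throw new Error('expected ok to be true');
  });
  cleanup('remove test state', (ctx) => {
    delete ctx.user;
  });
});
console.log(result.ok, result.log);
```

Collecting the steps first and executing them afterwards keeps the order fixed regardless of the order in which the phase functions are called inside the callback.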
@@ -378,121 +81,47 @@ mock('../src/api', () => ({

 const { fetchUser } = require('../src/api');

-test('captures
+test('mock captures calls', () => {
   const user = fetchUser();
   expect(fetchUser).toHaveBeenCalledTimes(1);
   expect(user).toMatchObject({ id: 'u_1', name: 'Ada' });
 });
 ```

-
-
-```js
-test('captures a stable response contract', () => {
-  const payload = {
-    status: 'ok',
-    flags: ['fast', 'deterministic']
-  };
-
-  captureContract('status payload', payload);
-  expect(payload.status).toBe('ok');
-});
-```
-
-Themis intentionally avoids first-party snapshot-file workflows. Prefer direct assertions, generated contract tests, and explicit flow expectations over large opaque snapshots. The goal is comparable baseline coverage with better reviewability: normalized contracts, focused assertions, machine-readable artifacts, and intentional updates instead of broad snapshot re-acceptance.
-
-Available globals:
-
-- `fn(...)`
-- `spyOn(object, methodName)`
-- `mock(moduleId, factoryOrExports)`
-- `unmock(moduleId)`
-- `clearAllMocks()`
-- `resetAllMocks()`
-- `restoreAllMocks()`
+For `jsdom` tests, Themis ships `render`, `screen`, `fireEvent`, `waitFor`, `useFakeTimers`, `mockFetch`, and more. Full list in the [API reference](docs/api.md).

-
+### Code generation

-
-- `screen.getByText(...)`
-- `screen.getByRole(...)`
-- `screen.getByLabelText(...)`
-- `fireEvent.click/change/input/submit/keyDown(...)`
-- `waitFor(asyncAssertion)`
-- `cleanup()`
-- `useFakeTimers()`, `advanceTimersByTime(ms)`, `runAllTimers()`, `useRealTimers()`
-- `flushMicrotasks()`
-- `mockFetch(...)`, `resetFetchMocks()`, `restoreFetch()`
+Themis scans your source tree and generates contract tests for exported modules, React components, hooks, Next.js routes, and services:

-
-
-
-test('submits the form', async () => {
-  render(<button onClick={() => document.body.setAttribute('data-state', 'sent')}>Send</button>);
-
-  fireEvent.click(screen.getByRole('button', { name: 'Send' }));
-
-  await waitFor(() => {
-    expect(document.body).toHaveAttribute('data-state', 'sent');
-  });
-});
+```bash
+npx themis generate src
+npx themis test
 ```

-
+When generated tests fail:

-```
-
-  useFakeTimers();
-  const fetchMock = mockFetch({ json: { ok: true } });
-
-  let done = false;
-  setTimeout(async () => {
-    const response = await fetch('/api/status');
-    const payload = await response.json();
-    done = payload.ok;
-  }, 50);
-
-  advanceTimersByTime(50);
-  await flushMicrotasks();
-
-  expect(done).toBe(true);
-  expect(fetchMock).toHaveBeenCalled();
-  useRealTimers();
-  restoreFetch();
-});
+```bash
+npx themis test --fix
 ```

-
-
-Themis supports a strict code-native intent DSL:
-
-```js
-intent('user can sign in', ({ context, run, verify, cleanup }) => {
-  context('a valid user', (ctx) => {
-    ctx.user = { email: 'a@b.com', password: 'pw' };
-  });
-
-  run('the user submits credentials', (ctx) => {
-    ctx.result = { ok: true };
|
|
477
|
-
});
|
|
108
|
+
`--fix` regenerates affected targets and reruns the suite. See the [API reference](docs/api.md) for all generation flags (`--review`, `--plan`, `--write-hints`, `--strict`, `--changed`, etc.).
|
|
478
109
|
|
|
479
|
-
|
|
480
|
-
expect(ctx.result.ok).toBe(true);
|
|
481
|
-
});
|
|
110
|
+
### Migration
|
|
482
111
|
|
|
483
|
-
|
|
484
|
-
|
|
485
|
-
|
|
486
|
-
|
|
112
|
+
```bash
|
|
113
|
+
npx themis migrate jest
|
|
114
|
+
npx themis migrate vitest
|
|
115
|
+
npx themis test
|
|
487
116
|
```
|
|
488
117
|
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
118
|
+
One command scaffolds a compatibility bridge. Add `--rewrite-imports` to rewrite import paths and `--convert` to apply codemods. See the [migration guide](docs/migration.md).
|
|
119
|
+
|
|
120
|
+
---
|
|
492
121
|
|
|
493
122
|
## Config
|
|
494
123
|
|
|
495
|
-
`themis.config.json
|
|
124
|
+
`themis.config.json`:
|
|
496
125
|
|
|
497
126
|
```json
|
|
498
127
|
{
|
|
@@ -503,28 +132,15 @@ Easter egg aliases are also available: `cook`, `yeet`, `vibecheck`, `wipe`.
|
|
|
503
132
|
"reporter": "next",
|
|
504
133
|
"environment": "node",
|
|
505
134
|
"setupFiles": ["tests/setup.ts"],
|
|
506
|
-
"tsconfigPath": "tsconfig.json"
|
|
507
|
-
"htmlReportPath": "__themis__/reports/report.html",
|
|
508
|
-
"testIgnore": ["^tests/fixtures(?:/|$)"]
|
|
135
|
+
"tsconfigPath": "tsconfig.json"
|
|
509
136
|
}
|
|
510
137
|
```
|
|
511
138
|
|
|
512
|
-
|
|
513
|
-
Themis discovers both `testDir` and `generatedTestsDir` by default. Use `testIgnore` only for fixture folders, scratch suites, or other paths you intentionally want to skip.
|
|
514
|
-
Themis also stubs common frontend style and asset imports under Node or jsdom runs, including `.css`, `.scss`, `.png`, `.jpg`, `.svg`, and common font/media files, so repos should not need ad hoc `tests/*.cjs` setup files just to make those imports load.
|
|
515
|
-
|
|
516
|
-
## TypeScript
|
|
517
|
-
|
|
518
|
-
The package ships first-party typings for:
|
|
519
|
-
|
|
520
|
-
- programmatic APIs (`collectAndRun`, `runTests`, config helpers)
|
|
521
|
-
- global test APIs (`describe`, `test`, `intent`, hooks, `expect`)
|
|
522
|
-
- typed intent context (`intent<MyCtx>(...)`)
|
|
523
|
-
- project-aware module loading for `ts`, `tsx`, ESM `js`, `jsx`, `tsconfig` path aliases, and setup files
|
|
139
|
+
Use `environment: "jsdom"` for DOM-driven tests. Themis auto-stubs common style/asset imports (`.css`, `.scss`, `.png`, `.svg`, etc.).
|
|
524
140
|
|
|
525
|
-
|
|
141
|
+
---
|
|
526
142
|
|
|
527
|
-
|
|
143
|
+
## TypeScript
|
|
528
144
|
|
|
529
145
|
```json
|
|
530
146
|
{
|
|
@@ -534,37 +150,35 @@ Use the global types in your project with:
|
|
|
534
150
|
}
|
|
535
151
|
```
|
|
536
152
|
|
|
537
|
-
|
|
153
|
+
Themis ships first-party typings for all test APIs and typed intent context, plus project-aware module loading for `.ts`, `.tsx`, ESM `.js`, `.jsx`, and `tsconfig` path aliases.
|
|
538
154
|
|
|
539
|
-
|
|
540
|
-
npm run benchmark
|
|
541
|
-
npm run benchmark:showcase
|
|
542
|
-
npm run benchmark:gate
|
|
543
|
-
```
|
|
155
|
+
---
|
|
544
156
|
|
|
545
|
-
|
|
157
|
+
## Pair with Alethia
|
|
546
158
|
|
|
547
|
-
|
|
548
|
-
- `BENCH_TESTS_PER_FILE` (default `25`)
|
|
549
|
-
- `BENCH_REPEATS` (default `3`)
|
|
550
|
-
- `BENCH_WORKERS` (default `4`)
|
|
551
|
-
- `BENCH_INCLUDE_EXTERNAL=1` to include Jest/Vitest/Bun comparisons
|
|
552
|
-
- `BENCH_MAX_AVG_MS` to override the gate threshold
|
|
553
|
-
- `BENCH_GATE_CONFIG` to point `benchmark:gate` at a custom config file
|
|
554
|
-
- `SHOWCASE_BENCH_WARMUPS` (default `1`) for the same-spec React showcase comparison
|
|
555
|
-
- `SHOWCASE_BENCH_REPEATS` (default `5`) for the same-spec React showcase comparison
|
|
159
|
+
Themis owns the unit/contract layer. [Alethia](https://github.com/vitron-ai/alethia) owns the E2E/policy layer. Together they form the tightest test loop an autonomous coding agent can sit inside:
|
|
556
160
|
|
|
557
|
-
|
|
558
|
-
|
|
559
|
-
|
|
161
|
+
1. Agent generates code
|
|
162
|
+
2. **Themis** verifies the contract in milliseconds
|
|
163
|
+
3. **Alethia** verifies the running app in a real browser, under safety policy, with a signed audit trail
|
|
560
164
|
|
|
561
|
-
|
|
165
|
+
Themis can also be used on its own; it is MIT-licensed and does not require Alethia.
|
|
562
166
|
|
|
563
|
-
|
|
167
|
+
---
|
|
564
168
|
|
|
565
|
-
|
|
566
|
-
|
|
567
|
-
|
|
568
|
-
|
|
169
|
+
## Reference docs
|
|
170
|
+
|
|
171
|
+
- [API reference](docs/api.md) — all CLI flags, globals, matchers, mocks, UI primitives
|
|
172
|
+
- [Agent adoption guide](docs/agents-adoption.md) — downstream repo setup
|
|
173
|
+
- [Migration guide](docs/migration.md) — Jest/Vitest migration details
|
|
174
|
+
- [Why Themis](docs/why-themis.md) — positioning and differentiators
|
|
175
|
+
- [Showcase comparisons](docs/showcases.md) — direct Jest/Vitest examples
|
|
176
|
+
- [Tutorial: Testing with Claude Code](docs/tutorial-claude-code.md)
|
|
177
|
+
- [VS Code extension](docs/vscode-extension.md)
|
|
178
|
+
- [Release policy](docs/release-policy.md)
|
|
179
|
+
- [Publish guide](docs/publish.md)
|
|
180
|
+
- [Changelog](CHANGELOG.md)
|
|
569
181
|
|
|
570
|
-
|
|
182
|
+
<p align="center">
|
|
183
|
+
<img src="src/assets/themisVerdictEngine.png" alt="Themis verdict engine art" width="960">
|
|
184
|
+
</p>
|
|
@@ -0,0 +1,230 @@
|
|
|
1
|
+
# Testing With Claude Code and Themis
|
|
2
|
+
|
|
3
|
+
A step-by-step walkthrough showing how Themis turns Claude Code into a test-writing machine that gets it right on the first try.
|
|
4
|
+
|
|
5
|
+
## The Problem
|
|
6
|
+
|
|
7
|
+
When you ask Claude Code to write unit tests, it reaches for Jest or Vitest by default. The tests it generates are often correct, but just as often they have subtle issues: wrong import paths, misused mocking APIs, snapshot tests where assertions would be better, setup files where the framework handles things natively. You end up in an edit-test-fix loop that burns time and context window.
|
|
8
|
+
|
|
9
|
+
Themis fixes this by shipping structured guidance directly to Claude Code — a skill, slash commands, and a `CLAUDE.md` that tells Claude exactly how to write, run, and fix tests. No copy-pasting docs. No explaining the framework. Claude just knows.
|
|
10
|
+
|
|
11
|
+
## What You'll See
|
|
12
|
+
|
|
13
|
+
By the end of this tutorial you'll have:
|
|
14
|
+
|
|
15
|
+
1. A Node.js project with Themis installed and Claude Code fully wired up
|
|
16
|
+
2. Generated tests that pass on the first run
|
|
17
|
+
3. A structured failure-fix loop where Claude reads machine-parseable repair hints instead of raw stack traces
|
|
18
|
+
4. Slash commands (`/themis-test`, `/themis-generate`, `/themis-fix`) that work out of the box
|
|
19
|
+
|
|
20
|
+
## Step 1: Set Up a Project
|
|
21
|
+
|
|
22
|
+
Start with any Node.js or TypeScript project. For this tutorial we'll use a small utility library.
|
|
23
|
+
|
|
24
|
+
```bash
|
|
25
|
+
mkdir demo-project && cd demo-project
|
|
26
|
+
npm init -y
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
Create a source file at `src/cart.js`:
|
|
30
|
+
|
|
31
|
+
```js
|
|
32
|
+
class Cart {
|
|
33
|
+
constructor() {
|
|
34
|
+
this.items = [];
|
|
35
|
+
}
|
|
36
|
+
|
|
37
|
+
add(item) {
|
|
38
|
+
if (!item || !item.name || typeof item.price !== 'number') {
|
|
39
|
+
throw new TypeError('Item must have a name and a numeric price');
|
|
40
|
+
}
|
|
41
|
+
const existing = this.items.find((i) => i.name === item.name);
|
|
42
|
+
if (existing) {
|
|
43
|
+
existing.quantity += item.quantity || 1;
|
|
44
|
+
} else {
|
|
45
|
+
this.items.push({ ...item, quantity: item.quantity || 1 });
|
|
46
|
+
}
|
|
47
|
+
}
|
|
48
|
+
|
|
49
|
+
remove(name) {
|
|
50
|
+
const index = this.items.findIndex((i) => i.name === name);
|
|
51
|
+
if (index === -1) throw new Error(`Item "${name}" not in cart`);
|
|
52
|
+
this.items.splice(index, 1);
|
|
53
|
+
}
|
|
54
|
+
|
|
55
|
+
total() {
|
|
56
|
+
return this.items.reduce((sum, item) => sum + item.price * item.quantity, 0);
|
|
57
|
+
}
|
|
58
|
+
|
|
59
|
+
checkout(paymentMethod) {
|
|
60
|
+
if (this.items.length === 0) throw new Error('Cannot checkout an empty cart');
|
|
61
|
+
const receipt = {
|
|
62
|
+
items: this.items.map((i) => ({ ...i })),
|
|
63
|
+
total: this.total(),
|
|
64
|
+
paymentMethod,
|
|
65
|
+
timestamp: new Date().toISOString()
|
|
66
|
+
};
|
|
67
|
+
this.items = [];
|
|
68
|
+
return receipt;
|
|
69
|
+
}
|
|
70
|
+
}
|
|
71
|
+
|
|
72
|
+
module.exports = { Cart };
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
## Step 2: Install Themis With Claude Code Integration
|
|
76
|
+
|
|
77
|
+
```bash
|
|
78
|
+
npm install -D @vitronai/themis@latest
|
|
79
|
+
npx themis init --claude-code
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
That one command installs:
|
|
83
|
+
|
|
84
|
+
- `CLAUDE.md` — adoption rules at the repo root that Claude Code reads automatically
|
|
85
|
+
- `.claude/skills/themis/SKILL.md` — a skill that auto-loads when Claude sees a test-related request
|
|
86
|
+
- `.claude/commands/themis-test.md` — `/themis-test` slash command
|
|
87
|
+
- `.claude/commands/themis-generate.md` — `/themis-generate` slash command
|
|
88
|
+
- `.claude/commands/themis-migrate.md` — `/themis-migrate` slash command
|
|
89
|
+
- `.claude/commands/themis-fix.md` — `/themis-fix` slash command
|
|
90
|
+
|
|
91
|
+
You can verify:
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
cat CLAUDE.md # Themis adoption rules
|
|
95
|
+
ls .claude/skills/ # themis/SKILL.md
|
|
96
|
+
ls .claude/commands/ # four slash command files
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
## Step 3: Generate Tests
|
|
100
|
+
|
|
101
|
+
Open Claude Code in the project and type:
|
|
102
|
+
|
|
103
|
+
```
|
|
104
|
+
/themis-generate src
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
Claude uses the installed skill context to run `npx themis generate src`. Generated tests land under `__themis__/tests/` as `.generated.test.js` files. These are deterministic, contract-style tests — not LLM-generated guesses.
|
|
108
|
+
|
|
109
|
+
## Step 4: Run the Test Loop
|
|
110
|
+
|
|
111
|
+
```
|
|
112
|
+
/themis-test
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
This runs `npx themis test --reporter agent` and Claude reads the structured JSON output. If everything passes, you're done. If there are failures, Claude sees:
|
|
116
|
+
|
|
117
|
+
```json
|
|
118
|
+
{
|
|
119
|
+
"failures": [
|
|
120
|
+
{
|
|
121
|
+
"cluster": "cart-checkout-validation",
|
|
122
|
+
"repairHints": ["checkout() throws when cart is empty — test passes an empty cart but expects success"],
|
|
123
|
+
"sourceFile": "src/cart.js",
|
|
124
|
+
"lineNumber": 32,
|
|
125
|
+
"expected": "{ items: [], total: 0 }",
|
|
126
|
+
"actual": "Error: Cannot checkout an empty cart"
|
|
127
|
+
}
|
|
128
|
+
]
|
|
129
|
+
}
|
|
130
|
+
```
|
|
131
|
+
|
|
132
|
+
Instead of re-reading a raw stack trace, Claude acts on the `repairHints` directly. This is the key difference: structured signals instead of unstructured error output.
|
|
133
|
+
|
|
134
|
+
## Step 5: Ask Claude to Write More Tests
|
|
135
|
+
|
|
136
|
+
Now ask Claude to add coverage for edge cases:
|
|
137
|
+
|
|
138
|
+
```
|
|
139
|
+
Write additional tests for the Cart class covering:
|
|
140
|
+
- adding duplicate items increments quantity
|
|
141
|
+
- removing a non-existent item throws
|
|
142
|
+
- checkout clears the cart
|
|
143
|
+
- total with no items returns 0
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
Because the Themis skill is loaded, Claude will:
|
|
147
|
+
|
|
148
|
+
1. Use `intent(...)` for behavior tests and `test(...)` for pure unit checks
|
|
149
|
+
2. Follow the four-phase shape: context, run, verify, cleanup
|
|
150
|
+
3. Use `expect(...)` assertions (not snapshots)
|
|
151
|
+
4. Place tests alongside the generated ones, not in a random `tests/` directory
|
|
152
|
+
|
|
153
|
+
Run `/themis-test` again to verify.
|
|
154
|
+
|
|
155
|
+
## Step 6: Fix Failures (When They Happen)
|
|
156
|
+
|
|
157
|
+
If any test fails, use:
|
|
158
|
+
|
|
159
|
+
```
|
|
160
|
+
/themis-fix
|
|
161
|
+
```
|
|
162
|
+
|
|
163
|
+
Claude will:
|
|
164
|
+
|
|
165
|
+
1. Run `npx themis test --reporter agent` to get the current failures
|
|
166
|
+
2. Group failures by `cluster` — fixes within a cluster share a root cause
|
|
167
|
+
3. Read `repairHints` before looking at the stack trace
|
|
168
|
+
4. Apply the smallest fix that addresses the root cause
|
|
169
|
+
5. Re-run with `--rerun-failed` to confirm the fix without running the full suite
|
|
170
|
+
|
|
171
|
+
This cluster-based fixing is faster than fixing tests one at a time, and the `--rerun-failed` flag means you don't pay the cost of a full suite run after each fix.
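
The cluster-first ordering above can be sketched in a few lines of JavaScript. The field names (`failures`, `cluster`, `repairHints`, `sourceFile`) are taken from the sample payload in Step 4; the `groupByCluster` helper and the sample failures are hypothetical, and the real `--reporter agent` schema may carry more or different fields.

```js
// Hypothetical helper: groups agent-reporter failures by their `cluster` key.
// Field names follow the sample payload in Step 4; the real schema may differ.
function groupByCluster(report) {
  const groups = new Map();
  for (const failure of report.failures ?? []) {
    const key = failure.cluster ?? 'unclustered';
    if (!groups.has(key)) {
      groups.set(key, { count: 0, files: new Set(), hints: new Set() });
    }
    const group = groups.get(key);
    group.count += 1;
    if (failure.sourceFile) group.files.add(failure.sourceFile);
    // Sets deduplicate hints that repeat across failures in one cluster.
    for (const hint of failure.repairHints ?? []) group.hints.add(hint);
  }
  return groups;
}

// Illustrative input only; not captured from a real run.
const sampleReport = {
  failures: [
    {
      cluster: 'cart-checkout-validation',
      sourceFile: 'src/cart.js',
      repairHints: ['checkout() throws when cart is empty']
    },
    {
      cluster: 'cart-checkout-validation',
      sourceFile: 'src/cart.js',
      repairHints: ['checkout() throws when cart is empty']
    },
    {
      cluster: 'cart-remove-missing',
      sourceFile: 'src/cart.js',
      repairHints: ['remove() throws for unknown item names']
    }
  ]
};

const clusters = groupByCluster(sampleReport);
for (const [name, group] of clusters) {
  console.log(`${name}: ${group.count} failure(s), hints: ${[...group.hints].join('; ')}`);
}
```

Deduplicating hints per cluster is the point of the exercise: two failures with the same root cause collapse into one item to fix.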
|
|
172
|
+
|
|
173
|
+
## Step 7: Optional — Wire Up the Automated Hook
|
|
174
|
+
|
|
175
|
+
For the tightest possible loop, add a PostToolUse hook that runs Themis automatically after every edit Claude makes:
|
|
176
|
+
|
|
177
|
+
Add this to `.claude/settings.json`:
|
|
178
|
+
|
|
179
|
+
```json
|
|
180
|
+
{
|
|
181
|
+
"hooks": {
|
|
182
|
+
"PostToolUse": [
|
|
183
|
+
{
|
|
184
|
+
"matcher": "Edit|Write|MultiEdit",
|
|
185
|
+
"hooks": [
|
|
186
|
+
{
|
|
187
|
+
"type": "command",
|
|
188
|
+
"command": "node node_modules/@vitronai/themis/scripts/claude-hook.js"
|
|
189
|
+
}
|
|
190
|
+
]
|
|
191
|
+
}
|
|
192
|
+
]
|
|
193
|
+
}
|
|
194
|
+
}
|
|
195
|
+
```
|
|
196
|
+
|
|
197
|
+
Now every time Claude edits a `.js`/`.ts`/`.jsx`/`.tsx` file, Themis runs automatically. If tests fail, the structured failure JSON is fed back into the conversation — Claude sees it immediately and can fix it in the next turn without you running anything.
|
|
198
|
+
|
|
199
|
+
The hook is smart about scope:
|
|
200
|
+
|
|
201
|
+
- Skips non-source edits (docs, config, etc.)
|
|
202
|
+
- Uses `--rerun-failed` when there's a prior failure artifact
|
|
203
|
+
- Exits silently when tests pass (no context noise)
|
|
204
|
+
- Set `THEMIS_HOOK_DISABLED=1` to pause it temporarily
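
The scoping rules in the list above can be expressed as a small decision function. This is a sketch under assumptions, not the contents of the shipped `claude-hook.js`: `planHookRun`, its parameters, and the idea of combining `--reporter agent` with `--rerun-failed` in one invocation are hypothetical, while the flags themselves, the file extensions, and `THEMIS_HOOK_DISABLED` come from this tutorial.

```js
const path = require('node:path');

// Extensions treated as source edits, per the tutorial's description of the hook.
const SOURCE_EXTENSIONS = new Set(['.js', '.ts', '.jsx', '.tsx']);

// Hypothetical decision function: returns null when the hook should do nothing,
// otherwise the argument list that would be passed to `npx`.
function planHookRun({ filePath, hasFailureArtifact, env = {} }) {
  if (env.THEMIS_HOOK_DISABLED === '1') return null; // paused by the user
  if (!SOURCE_EXTENSIONS.has(path.extname(filePath))) return null; // docs, config, etc.
  const args = ['themis', 'test', '--reporter', 'agent'];
  if (hasFailureArtifact) args.push('--rerun-failed'); // narrow the run to known failures
  return args;
}

console.log(planHookRun({ filePath: 'README.md', hasFailureArtifact: false }));
console.log(planHookRun({ filePath: 'src/cart.js', hasFailureArtifact: true }));
```

The sketch covers only the decision about whether and how to run; staying silent on a passing run would be handled by whatever code executes the command and inspects its result.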
|
|
205
|
+
|
|
206
|
+
## Why This Works
|
|
207
|
+
|
|
208
|
+
The magic is not in Themis being a better test runner (though it is faster). The magic is in the **structured agent context**:
|
|
209
|
+
|
|
210
|
+
1. **The skill** tells Claude exactly when and how to use Themis — it auto-loads without you mentioning the framework
|
|
211
|
+
2. **The CLAUDE.md** provides rules about what to avoid (no setup shims, no snapshots as defaults, no ad-hoc test directories)
|
|
212
|
+
3. **The `--reporter agent` output** gives Claude machine-parseable failure data with repair hints, instead of raw stack traces it has to re-parse
|
|
213
|
+
4. **The slash commands** encode the correct workflow so Claude doesn't have to figure out which flags to pass
|
|
214
|
+
|
|
215
|
+
In Tessl evaluations across 10 scenarios, agents scored **37% without** the Themis skill context and **97% with it**. The context is the product.
|
|
216
|
+
|
|
217
|
+
## What's Next
|
|
218
|
+
|
|
219
|
+
- **Migrate from Jest or Vitest**: Run `/themis-migrate` — Claude walks through the four-step incremental migration
|
|
220
|
+
- **Cursor users**: Run `npx themis init --cursor` to install `.cursorrules`
|
|
221
|
+
- **Claude Code and Cursor together**: `npx themis init --agents --claude-code --cursor`
|
|
222
|
+
- **Auto-detection**: A bare `npx themis init` detects which agents are present and installs the right assets automatically
|
|
223
|
+
|
|
224
|
+
## Links
|
|
225
|
+
|
|
226
|
+
- npm: [`@vitronai/themis`](https://www.npmjs.com/package/@vitronai/themis)
|
|
227
|
+
- GitHub: [vitron-ai/themis](https://github.com/vitron-ai/themis)
|
|
228
|
+
- Tessl tile: [vitron-ai/themis](https://tessl.io/registry/vitron-ai/themis)
|
|
229
|
+
- Eval results: [37% baseline → 97% with skill](https://tessl.io/eval-runs/019d72a0-8211-74ea-84ef-a8e336ead3d2)
|
|
230
|
+
- Adoption guide: [`docs/agents-adoption.md`](agents-adoption.md)
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@vitronai/themis",
|
|
3
|
-
"version": "1.2.
|
|
3
|
+
"version": "1.2.2",
|
|
4
4
|
"description": "A Node.js and TypeScript unit test framework designed for AI coding agents. Drop-in alternative to Jest and Vitest with machine-readable failure output, structured repair hints, and one-command migration.",
|
|
5
5
|
"license": "MIT",
|
|
6
6
|
"author": "Vitron AI",
|