slice-tournament-zoo 0.5.6
- package/LICENSE +202 -0
- package/README.md +357 -0
- package/bin/stz.mjs +15 -0
- package/package.json +35 -0
- package/src/README.md +19 -0
- package/src/bridge.ts +950 -0
- package/src/budget.ts +78 -0
- package/src/cli.ts +126 -0
- package/src/cost-tracker.ts +59 -0
- package/src/escalation.ts +89 -0
- package/src/eval-runner.ts +220 -0
- package/src/grpo.ts +54 -0
- package/src/hack-detector.ts +124 -0
- package/src/index.ts +17 -0
- package/src/merge.ts +245 -0
- package/src/mock/README.md +40 -0
- package/src/mock/interfaces.ts +114 -0
- package/src/mock/mock.ts +223 -0
- package/src/mock/orchestrator.ts +457 -0
- package/src/pressure.ts +81 -0
- package/src/project.ts +335 -0
- package/src/seal.ts +182 -0
- package/src/selection.ts +128 -0
- package/src/specdiff.ts +141 -0
- package/src/state.ts +95 -0
- package/src/taxonomy.ts +161 -0
- package/src/types.ts +305 -0
package/LICENSE
ADDED
@@ -0,0 +1,202 @@

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2026 Robert Li

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
package/README.md
ADDED
@@ -0,0 +1,357 @@
<div align="center">

# Slice Tournament Zoo (STZ)

<pre>
 ██████╗ ████████╗ ███████╗
██╔════╝ ╚══██╔══╝ ╚══███╔╝
╚█████╗     ██║      ███╔╝
 ╚═══██╗    ██║     ███╔╝
██████╔╝    ██║    ███████╗
╚═════╝     ╚═╝    ╚══════╝
</pre>

[](https://github.com/dr-robert-li/slice-tournament-zoo/actions/workflows/ci.yml)
[](./LICENSE)
[](./package.json)

</div>

> An agentic-coding harness for "software-engineering dark factories with
> auditable outputs." Each slice is one interface contract plus its
> implementation plus its tests, implemented adversarially by N **specimens**.
> Survivors are selected by an eval-gate and a pairwise LLM judge against a
> **frozen, sealed** test suite the implementers never see. Every run leaves a
> markdown audit trail a human can replay.

## Contents

- [Requirements](#requirements)
- [Install](#install)
- [Use](#use)
- [Example commands and workflows](#example-commands-and-workflows)
- [Uninstall](#uninstall)
- [The pipeline](#the-pipeline-two-levels)
- [The audit tree](#the-stz-audit-tree)
- [Documentation](#documentation)
- [License](#license)

## Requirements

- Node.js 20 or newer.
- For the in-session harness: Claude Code (the CLI, desktop, or web app).
- No database, no vector service, no API keys beyond what Claude Code already
  uses for its subagents.

## Install

STZ installs two ways: as a global CLI via **npm**, or as a **Claude Code
plugin**. They are complementary — the plugin drives the in-session `/stz:*`
commands, and the npm CLI gives you `stz init`, `stz run`, and direct
`stz bridge` access. Installing the npm CLI also satisfies the plugin's bridge
dependency without any `${CLAUDE_PLUGIN_ROOT}` fallback.

### Via npm (global CLI)

```bash
npm i -g slice-tournament-zoo # from npm
# or install straight from GitHub (no npm publish needed):
npm i -g dr-robert-li/slice-tournament-zoo
```

This puts `stz` on your `PATH` (`stz`, `stz init`, `stz run`, `stz bridge …`)
and bundles its `tsx` runtime, so it works offline after install. Requires
Node.js 20+. Run `stz` with no arguments to see the banner and commands.

### As a Claude Code plugin

From inside Claude Code, add the marketplace and install the plugin:

```text
/plugin marketplace add dr-robert-li/slice-tournament-zoo
/plugin install stz
```

This registers the project commands (`/stz:new`, `/stz:research`, `/stz:validate`,
`/stz:standards`, `/stz:tests`, `/stz:slice`, `/stz:summary`, `/stz:pipeline`,
`/stz:merge`) and `/stz:run`, the subagents (the per-slice specimen, judge,
test-author, cross-reference, documenter plus the project-level researcher,
validator, conventions, test-planner, slicer, summarizer), and a SessionStart hook
that announces STZ when a project contains a `.stz/` tree. Restart the session (or
reload) so the definitions load.

The plugin calls a bundled `stz bridge` CLI for every deterministic decision. If
you installed the npm CLI above, the commands use that `stz` directly. Otherwise
they resolve the bundled copy via `${CLAUDE_PLUGIN_ROOT}`, with no `PATH` setup
needed (Node.js 20+ is the only requirement; the bundled copy fetches `tsx` via
`npx` on first use, so that first call needs network).

> Developing STZ itself, or running the engine without Claude Code? See
> [`docs/development/local-and-testing.md`](./docs/development/local-and-testing.md).

## Use

### Scaffold a project

```bash
stz init . # create the .stz/ taxonomy + AGENTS.md in the current repo
```

This writes the tiered `.stz/` tree (`00-intent` through `90-audit`) and an
`AGENTS.md` table of contents. Nothing else is required to start.

### The full pipeline (in Claude Code)

`/stz:run` handles one slice. The full pipeline takes a project from an idea to a
completion report, one command per phase (a get-shit-done-style UX):

```text
/stz:new        elicit intent + done-predicates + run config (batched Q&A)
/stz:research   external (docs, prior art) + internal (codebase) research
/stz:validate   ground-truth: verify each claim against reality, not recall
/stz:standards  style, architecture, naming conventions
/stz:tests      test strategy + coverage targets, locked BEFORE implementation
/stz:slice      collaborative breakdown into a DAG of vertical slices
/stz:run <id>   the adversarial tournament, once per slice
/stz:summary    aggregate every document into one completion report
```

`/stz:pipeline` is a dashboard: it shows project-phase and per-slice status (plus
the run config), then dispatches the recommended next step (and can run
independent slices in parallel).

#### Run configuration (set once, applied everywhere)

`/stz:new` batches its questions per area and, at the end, captures a **run
config** the rest of the pipeline obeys — stored in `.stz/00-intent/run-config.json`
and surfaced by `stz bridge project-status` so every later command reads it in one
call:

| Choice | Set in `/stz:new` | Consumed by |
|---|---|---|
| **Slicing granularity** (`coarse`/`balanced`/`fine`) | area E | `/stz:slice` |
| **Specimen fan-out** (N, 2–16) | area E | `/stz:run` (the number of specimens) |
| **Model per role** (planning, research, execution, testing, validation, judging) | area E | each phase's subagent `model` |
| **Strictness** (coverage target, mutation policy, conventions) | area E | `/stz:standards`, `/stz:tests` |

Model choices follow the get-shit-done "Other" pattern: pick a suggested combo
(Balanced / Thrifty / Max quality) or type your own spawn alias
(`opus`/`sonnet`/`haiku`/`fable`) or model id. Anything unset falls back to a
balanced default, so the pipeline always has a complete config.
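
The fallback behaviour described above can be pictured as a partial config merged over defaults. The sketch below is illustrative only: the `RunConfig` field names, the default values, the clamping of fan-out, and `resolveRunConfig` are assumptions made for this example, not the actual schema of `.stz/00-intent/run-config.json`.

```typescript
// Hypothetical shape; the real run-config.json schema may differ.
type Granularity = "coarse" | "balanced" | "fine";
type Role = "planning" | "research" | "execution" | "testing" | "validation" | "judging";

interface RunConfig {
  granularity: Granularity;
  fanOut: number; // number of specimens, 2-16 per the table above
  models: Record<Role, string>;
}

type PartialRunConfig = Partial<Omit<RunConfig, "models">> & {
  models?: Partial<Record<Role, string>>;
};

// Assumed "balanced" defaults, chosen only for illustration.
const DEFAULTS: RunConfig = {
  granularity: "balanced",
  fanOut: 4,
  models: {
    planning: "sonnet", research: "sonnet", execution: "sonnet",
    testing: "sonnet", validation: "sonnet", judging: "sonnet",
  },
};

// Fill every unset field from DEFAULTS and clamp fan-out into 2..16
// (clamping rather than rejecting is an assumption of this sketch).
function resolveRunConfig(partial: PartialRunConfig): RunConfig {
  const fanOut = Math.min(16, Math.max(2, Math.round(partial.fanOut ?? DEFAULTS.fanOut)));
  return {
    granularity: partial.granularity ?? DEFAULTS.granularity,
    fanOut,
    models: { ...DEFAULTS.models, ...partial.models },
  };
}

const cfg = resolveRunConfig({ fanOut: 40, models: { judging: "opus" } });
console.log(cfg.fanOut, cfg.models.judging, cfg.models.planning); // prints: 16 opus sonnet
```

Because every field has a default, later phases in this sketch never need to handle a missing value.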

`--auto` means different things by scope, so keep the mental model straight:

- `/stz:run slice-01` runs that one slice's tournament and nothing else.
- `/stz:run slice-01 --auto` runs that one slice with no approval pause (it skips
  the human winner-approval gate). It does **not** cascade to other slices.
- The project phase commands (`/stz:new --auto`, `/stz:research --auto`, …) each
  chain to the next phase.
- `/stz:pipeline --auto` runs everything: it walks the DAG in dependency order,
  fires `/stz:run` for each runnable slice (independent slices in the frontier in
  parallel), and continues through to `/stz:summary`. This is the entry point for
  "do the whole project automatically."

Two human gates remain even in full auto: confirming a done-predicate in
`/stz:new`, and approving the slice breakdown in `/stz:slice`.

#### Dark-factory mode (lights-out, fully autonomous)

`--auto` still pauses at those two human gates. **Dark-factory mode** goes one
step further: it skips *every* downstream human gate — the `/stz:slice` approval
and the `/stz:run` winner-approval — and runs the whole pipeline lights-out to a
final `/stz:summary` completion report. The only gate it cannot skip is the F2
done-predicate confirmation in `/stz:new`; acceptance criteria are never
auto-invented. Everything the run decides (DAG, winners, GRPO advantages, hack
findings) still lands in the `.stz/` audit tree for after-the-fact review.

It is offered once at the end of elicitation (after the predicate gate) and can be
flipped at any point:

```bash
stz bridge project-dark-factory --root . --on # engage; --off to disengage
```

The toggle only flips the `darkFactory` flag in the run config — it never resets
your fan-out / models / strictness. `project-status` hoists the flag to the top
level, so engaging it between phases takes effect immediately. See
[`docs/development/dark-factory.md`](docs/development/dark-factory.md) for the full
contract.
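
As a rough picture of "flips one flag, leaves the rest alone", the sketch below toggles `darkFactory` on a copy of a config object. Only the `darkFactory` name comes from the text above; the other fields and the `setDarkFactory` helper are hypothetical and not taken from the bridge implementation.

```typescript
// Hypothetical in-memory view of the run config; only `darkFactory` is named
// in the README, the other fields are assumptions for this example.
interface RunConfigFlags {
  darkFactory: boolean;
  fanOut: number;
  strictness: string;
}

// Return a copy with only `darkFactory` changed; every other field is kept.
function setDarkFactory(config: RunConfigFlags, on: boolean): RunConfigFlags {
  return { ...config, darkFactory: on };
}

const original: RunConfigFlags = { darkFactory: false, fanOut: 6, strictness: "strict" };
const engaged = setDarkFactory(original, true);
console.log(engaged.darkFactory, engaged.fanOut, engaged.strictness); // prints: true 6 strict
```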

The DAG ordering and per-slice seeding are backed by the deterministic
`stz bridge project-status` (which computes the runnable frontier). The `--auto`
chaining itself is orchestration the agent follows from the command markdown, not
a hard-coded loop.
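
One way to read "runnable frontier" is: the slices that are not yet started and whose dependencies are all done. The sketch below computes that set under this reading; the `SliceNode` shape, the status names, and `runnableFrontier` are assumptions for illustration, not the output format of `stz bridge project-status`.

```typescript
type SliceStatus = "pending" | "running" | "done" | "failed";

// Hypothetical DAG node; the real slice manifest may use different fields.
interface SliceNode {
  id: string;
  dependsOn: string[];
  status: SliceStatus;
}

// A slice is runnable when it is still pending and every dependency is done.
function runnableFrontier(slices: SliceNode[]): string[] {
  const status = new Map(slices.map((s) => [s.id, s.status] as const));
  return slices
    .filter((s) => s.status === "pending" && s.dependsOn.every((d) => status.get(d) === "done"))
    .map((s) => s.id);
}

const dag: SliceNode[] = [
  { id: "slice-01", dependsOn: [], status: "done" },
  { id: "slice-02", dependsOn: ["slice-01"], status: "pending" },
  { id: "slice-03", dependsOn: ["slice-01"], status: "pending" },
  { id: "slice-04", dependsOn: ["slice-02", "slice-03"], status: "pending" },
];
const frontier = runnableFrontier(dag);
console.log(frontier.join(", ")); // prints: slice-02, slice-03
```

Slices in the same frontier have no unmet dependencies on each other, which is why they can be dispatched in parallel.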

Each project-level phase writes its own `.stz/` tier and is settled once, before
any slice runs. When `/stz:slice` seeds the DAG, each slice inherits those early
phases as done, leaving only the tournament half for `/stz:run`. Project status
is derived from each slice's own `state.json`, so an interrupted pipeline resumes
by re-reading state. A worked run of the front phases (a `slugify` library) lives
in [`examples/full-pipeline/`](./examples/full-pipeline).

### Run a slice as a tournament (in Claude Code)

```text
/stz:run slice-01
```

You, the session, become the orchestrator. The command:

1. Reads or elicits the slice manifest (the contract plus at least one
   machine-checkable done-predicate). It refuses prose-only acceptance.
2. Spawns a frozen **test-author** subagent to write the sealed held-out suite.
3. Spawns N **specimen** subagents in parallel, each implementing the contract
   with a different strategy.
4. Runs the real eval runner over each specimen with `stz bridge eval`
   (executed sealed suite, V8 coverage, mutation survival, hack-pattern
   detection), then gates them.
5. Spawns **judge** subagents for pairwise votes across the survivors.
6. Selects a winner with `stz bridge select` (two-stage selection plus GRPO).
7. Pauses for your approval of the winner, then spawns a **documenter** and
   writes the spec-diff, pressure log, and audit journal.

Every exact decision is made by the CLI, never by the agent's own arithmetic.
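
To make "GRPO" in step 6 concrete: in GRPO as it is commonly described, each candidate's advantage is its reward minus the group mean, divided by the group standard deviation. The sketch below implements that textbook normalisation. Whether `stz bridge select` uses exactly this formula is not stated here, and the reward values and function names are hypothetical.

```typescript
// Group-relative advantage: A_i = (r_i - mean(r)) / std(r), computed within one group.
// Returns all zeros when every reward is equal (std = 0).
function groupRelativeAdvantages(rewards: number[]): number[] {
  if (rewards.length === 0) return [];
  const mean = rewards.reduce((a, b) => a + b, 0) / rewards.length;
  const variance = rewards.reduce((a, r) => a + (r - mean) ** 2, 0) / rewards.length;
  const std = Math.sqrt(variance);
  return rewards.map((r) => (std === 0 ? 0 : (r - mean) / std));
}

// Index of the largest value (the first one wins ties).
function argmax(values: number[]): number {
  return values.reduce((best, v, i) => (v > values[best] ? i : best), 0);
}

const rewards = [0.9, 0.6, 0.3]; // hypothetical per-survivor scores
const advantages = groupRelativeAdvantages(rewards);
const winnerIndex = argmax(advantages);
console.log(advantages.map((a) => a.toFixed(2)).join(" "), "winner index:", winnerIndex);
```

Keeping this arithmetic in a deterministic function, rather than asking a model to compute it, matches the rule stated above that exact decisions belong to the CLI.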

## Example commands and workflows

### A whole project (the full pipeline)

Run the project-level phases once, let `/stz:slice` break the work into a DAG and
seed the slices, then let `/stz:pipeline` drive each slice's tournament in
dependency order:

```text
/stz:new        # elicit intent + done-predicates
/stz:research   # external + internal research
/stz:validate   # ground-truth the research
/stz:standards  # conventions
/stz:tests      # test strategy, before any code
/stz:slice      # co-design the slice DAG; seeds 40-slices/<id> manifests
/stz:pipeline   # dashboard: dispatches /stz:run for each slice in dep order
/stz:summary    # completion report once the slices are done
```

You do not hand-author slice manifests or run `/stz:run` by hand here. `/stz:slice`
creates the manifests and `/stz:pipeline` sequences the tournaments. To run the
whole thing automatically, `/stz:pipeline --auto` walks the DAG and dispatches
each slice through to the summary. (Note: `/stz:run --auto` is single-slice only;
it just skips that slice's winner-approval pause and does not cascade.)

### A single slice, standalone (no project)

For a one-off slice without the project pipeline, `/stz:run <name>` elicits its
own contract and one done-predicate if no manifest exists, then runs the
tournament. Afterwards you read the result from the audit tree:

```text
/stz:run payment-validator
```

```bash
cat .stz/40-slices/payment-validator/spec-diff.md   # intent vs as-built
cat .stz/50-pressure/payment-validator/pressure.md  # why the losers lost
cat .stz/90-audit/journal.md                        # the replayable event log
```

### Inspect a worked example without running anything

```bash
# a real tournament (one slice)
cat examples/clamp-tournament/stz-tree/40-slices/slice-01/tournament.md
# a real project front-pipeline (slugify)
cat examples/full-pipeline/stz-tree/90-audit/SUMMARY.md
```

`clamp-tournament`: four specimens implement `clamp`; a planted network-bypass
cheater passes all 304 sealed checks but is disqualified at the gate; the winner
is chosen by six judge votes and the highest GRPO advantage. `full-pipeline`: the
project phases run for a `slugify` library through to a seeded slice DAG.
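
The `clamp-tournament` outcome shows that passing the sealed suite is necessary but not sufficient. A minimal sketch of such a gate follows, assuming a simplified result shape; `EvalResult`, `evalGate`, and the finding label are hypothetical, and the coverage and mutation checks that the real gate also applies are left out.

```typescript
// Hypothetical per-specimen eval result; the real eval-runner output may differ.
interface EvalResult {
  specimen: string;
  sealedPassed: number;
  sealedTotal: number;
  hackFindings: string[]; // e.g. ["network-bypass"]
}

interface GateDecision {
  specimen: string;
  survives: boolean;
  reason: string;
}

// A specimen survives only if it passes every sealed check AND has no hack findings.
function evalGate(results: EvalResult[]): GateDecision[] {
  return results.map((r) => {
    if (r.hackFindings.length > 0) {
      return { specimen: r.specimen, survives: false, reason: `hack: ${r.hackFindings.join(", ")}` };
    }
    if (r.sealedPassed < r.sealedTotal) {
      return { specimen: r.specimen, survives: false, reason: "sealed suite failures" };
    }
    return { specimen: r.specimen, survives: true, reason: "passed" };
  });
}

const decisions = evalGate([
  { specimen: "a", sealedPassed: 304, sealedTotal: 304, hackFindings: [] },
  { specimen: "cheater", sealedPassed: 304, sealedTotal: 304, hackFindings: ["network-bypass"] },
  { specimen: "b", sealedPassed: 290, sealedTotal: 304, hackFindings: [] },
]);
const survivors = decisions.filter((d) => d.survives).map((d) => d.specimen);
console.log(survivors.join(", ")); // prints: a
```

Checking hack findings before the pass count means a cheater is reported as a cheater even when its test score is perfect.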

## Uninstall

### Remove the plugin

```text
/plugin uninstall stz
/plugin marketplace remove dr-robert-li/slice-tournament-zoo
```

### Remove the CLI

```bash
npm uninstall -g slice-tournament-zoo # if you installed with `npm i -g`
npm unlink -g slice-tournament-zoo    # if you used `npm link`
```

### Remove harness data from a project

The `.stz/` tree and the `AGENTS.md` table of contents are the only things STZ
writes into your repo. Delete them to remove all harness state:

```bash
rm -rf .stz AGENTS.md
```

Nothing else is touched. There is no external state to clean up.

## The pipeline (two levels)

The pipeline runs at two levels. The **project level** settles intent, research,
conventions, and test strategy once for the whole project. **Slice
disaggregation** then breaks the work into a DAG and seeds each slice, marking
those early phases done so they are not repeated. Each slice then runs only the
**tournament half**.

```text
PROJECT (once):
  elicit (/stz:new) -> research (/stz:research) -> ground-truth (/stz:validate)
    -> standards (/stz:standards) -> test strategy (/stz:tests)
    -> slice disaggregation (/stz:slice) [seeds each slice; early phases done]

PER SLICE (/stz:run <id>, sequenced by /stz:pipeline over the DAG):
  test-author (frozen, sealed held-out suite)
    -> spawn N specimens in parallel
    -> eval-gate (sealed suite + coverage + mutation + hack-pattern detect)
    -> judge (pairwise votes, GRPO group-relative advantage)
    -> winner -> as-built spec -> spec-diff -> state.json checkpoint

FINISH:
  /stz:summary -> completion report across every slice

failure (bounded): no passers -> 1 GRPO retry -> 1 replan -> halt + report
```
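
The bounded failure path in the last line of the diagram can be read as a small state machine: one retry, then one replan, then halt. The sketch below encodes that reading; the type names, the `nextAction` helper, and the counters are assumptions for illustration and are not taken from `escalation.ts`.

```typescript
type Action = "retry" | "replan" | "halt";

// Hypothetical bookkeeping for how much of each budget has been spent.
interface EscalationState {
  retriesUsed: number;
  replansUsed: number;
}

// Budgets taken from the diagram: one retry, then one replan, then halt.
const MAX_RETRIES = 1;
const MAX_REPLANS = 1;

// Decide what to do after a round that produced no passing specimen.
function nextAction(s: EscalationState): { action: Action; state: EscalationState } {
  if (s.retriesUsed < MAX_RETRIES) {
    return { action: "retry", state: { ...s, retriesUsed: s.retriesUsed + 1 } };
  }
  if (s.replansUsed < MAX_REPLANS) {
    return { action: "replan", state: { ...s, replansUsed: s.replansUsed + 1 } };
  }
  return { action: "halt", state: s };
}

let escalation: EscalationState = { retriesUsed: 0, replansUsed: 0 };
const trace: Action[] = [];
for (let i = 0; i < 3; i++) {
  const step = nextAction(escalation);
  trace.push(step.action);
  escalation = step.state;
}
console.log(trace.join(" -> ")); // prints: retry -> replan -> halt
```

Because both budgets are finite, repeated failures always end in `halt` rather than looping.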

Note: the standalone mock demo (`stz run`, no Claude Code) runs all eight phases
inside a single slice for a self-contained, no-network smoke test. The two-level
split above is the real in-session flow.

## The `.stz/` audit tree

| Tier | Purpose |
| ---- | ------- |
| `00-intent/` | project + intent manifests, elicitation, done-predicates |
| `10-research/` | external/internal research, ground-truth validation |
| `20-standards/` | versioned conventions, ADRs |
| `30-tests/` | test strategy, rubric, sealed held-out suite |
| `40-slices/` | the slice DAG, manifests, specimen prototypes, tournament, spec-diff |
| `50-pressure/` | culled specimens' diffs and critiques (the pressure log) |
| `90-audit/` | project state, journal, call ledger, cost, completion report, SUMMARY |

## Documentation

For contributors and anyone going past day-to-day operation:

- **Contributing** — setup, the architecture rule, the quality bar:
  [`CONTRIBUTING.md`](./CONTRIBUTING.md).
- **Source layout** — the `src/` module map: [`src/README.md`](./src/README.md).
- **Local development & testing** — run the engine without Claude Code, the mock
  pipeline, CI checks: [`docs/development/local-and-testing.md`](./docs/development/local-and-testing.md).
- **The bridge CLI** — the deterministic `stz bridge` subcommands:
  [`docs/development/bridge-cli.md`](./docs/development/bridge-cli.md).
- **Sealed-suite integrity** — the guide-vs-sensor contract behind the frozen
  held-out suite: [`docs/development/sealed-suite.md`](./docs/development/sealed-suite.md).
- **Requirement-to-test mapping** — [`docs/TESTPLAN.md`](./docs/TESTPLAN.md).
- **What is real versus deferred** — [`docs/AS-BUILT.md`](./docs/AS-BUILT.md).

## License

[Apache-2.0](./LICENSE).
package/bin/stz.mjs
ADDED
@@ -0,0 +1,15 @@
#!/usr/bin/env node
// npx stz entrypoint. Runs the TS CLI via tsx so the package needs no build
// step (N10: minimal toolchain). For a published build this would point at
// compiled dist/cli.js; for the source-available template repo, tsx is fine.
import { spawnSync } from "node:child_process";
import { fileURLToPath } from "node:url";
import { dirname, join } from "node:path";

const here = dirname(fileURLToPath(import.meta.url));
const cli = join(here, "..", "src", "cli.ts");
const r = spawnSync("npx", ["tsx", cli, ...process.argv.slice(2)], {
  stdio: "inherit",
  shell: process.platform === "win32",
});
process.exit(r.status ?? 1);
package/package.json
ADDED
@@ -0,0 +1,35 @@
{
  "name": "slice-tournament-zoo",
  "version": "0.5.6",
  "description": "STZ: a contract-bounded slice pipeline that implements each slice adversarially via an N-specimen tournament with frozen sealed tests, GRPO-style selection, layered anti-reward-hacking, and a replayable markdown audit trail.",
  "license": "Apache-2.0",
  "type": "module",
  "bin": {
    "stz": "bin/stz.mjs"
  },
  "exports": {
    ".": "./src/index.ts"
  },
  "files": [
    "src",
    "bin"
  ],
  "scripts": {
    "stz": "tsx src/cli.ts",
    "test": "vitest run",
    "test:watch": "vitest",
    "typecheck": "tsc --noEmit",
    "prepublishOnly": "npm run typecheck && npm test"
  },
  "engines": {
    "node": ">=20"
  },
  "dependencies": {
    "tsx": "^4.19.0"
  },
  "devDependencies": {
    "@types/node": "^22.10.0",
    "typescript": "^5.7.0",
    "vitest": "^2.1.0"
  }
}
package/src/README.md
ADDED
@@ -0,0 +1,19 @@
# Module map (`src/`)

Production spine: `types.ts` (schema), `taxonomy.ts` (tree and frontmatter),
`state.ts` (checkpoint and recovery), `grpo.ts`, `selection.ts`,
`hack-detector.ts`, `escalation.ts`, `budget.ts`, `cost-tracker.ts`,
`pressure.ts`, `specdiff.ts`, `eval-runner.ts` (real tests, coverage, mutation),
`project.ts` (the project DAG driver), and `bridge.ts` (the in-session CLI,
per-slice and project subcommands).

The `mock/` subfolder is the no-network testing harness (the `stz run` demo):
its orchestrator, the model-layer seam, and the deterministic mock. Not part of
the production path — see [`mock/`](./mock).

## Further reading

- The requirement-to-test mapping is in [`docs/TESTPLAN.md`](../docs/TESTPLAN.md).
- What is real versus deferred is in [`docs/AS-BUILT.md`](../docs/AS-BUILT.md).
- Running the engine locally / in CI: [`docs/development/local-and-testing.md`](../docs/development/local-and-testing.md).
- The deterministic bridge CLI: [`docs/development/bridge-cli.md`](../docs/development/bridge-cli.md).