skilltest 0.4.0 → 0.5.0
- package/README.md +13 -0
- package/dist/index.js +479 -125
- package/dist/index.js.map +1 -1
- package/package.json +1 -1
package/README.md
CHANGED
@@ -128,12 +128,17 @@ Flow:
 - realistic fake skills
 4. Computes TP, TN, FP, FN, precision, recall, F1.
 
+For reproducible fake-skill sampling, pass `--seed <number>`. When a seed is used,
+terminal and JSON output include it so the run can be repeated exactly. If you use
+`.skilltestrc`, `trigger.seed` sets the default and the CLI flag overrides it.
+
 Flags:
 
 - `--model <model>` default: `claude-sonnet-4-5-20250929`
 - `--provider <anthropic|openai>` default: `anthropic`
 - `--queries <path>` use custom queries JSON
 - `--num-queries <n>` default: `20` (must be even)
+- `--seed <number>` RNG seed for reproducible fake-skill sampling
 - `--save-queries <path>` save generated query set
 - `--api-key <key>` explicit key override
 - `--verbose` show full model decision text
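Note on the hunk above: the README says a seed makes fake-skill sampling repeatable. The sketch below shows one common way to get that property, assuming a small seeded PRNG feeding a Fisher-Yates shuffle. It is not taken from `package/dist/index.js`; `mulberry32`, `sampleWithSeed`, and the skill names are hypothetical, and the actual package may use a different generator.

```javascript
// Hypothetical sketch: mulberry32 is a small 32-bit PRNG used here only to
// illustrate seeding. It is an assumption, not skilltest's implementation.
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    // Map the 32-bit result to a float in [0, 1).
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Shuffle a copy of `items` with a seeded Fisher-Yates pass and take the
// first `count` entries. The same seed always yields the same sample.
function sampleWithSeed(items, count, seed) {
  const rng = mulberry32(seed);
  const pool = items.slice();
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

// Illustrative data only; these are not skilltest's fake skills.
const fakeSkills = ["pdf-tools", "git-helper", "sql-runner", "image-resize", "csv-clean", "log-search"];
const firstRun = sampleWithSeed(fakeSkills, 3, 123);
const secondRun = sampleWithSeed(fakeSkills, 3, 123);
console.log(firstRun, secondRun); // both runs print the same three names
```

Because the generator's state depends only on the seed, reporting the seed in terminal and JSON output is enough for someone else to repeat the run.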
@@ -178,6 +183,7 @@ Flags:
 - `--api-key <key>` explicit key override
 - `--queries <path>` custom trigger queries JSON
 - `--num-queries <n>` default: `20` (must be even)
+- `--seed <number>` RNG seed for reproducible trigger sampling
 - `--prompts <path>` custom eval prompts JSON
 - `--min-f1 <n>` default: `0.8`
 - `--min-assert-pass-rate <n>` default: `0.9`
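Note on the hunk above: the README lists TP, TN, FP, FN, precision, recall, and F1 as the trigger metrics, and `--min-f1` defaults to `0.8`. The sketch below works through the standard formulas for one hypothetical 20-query run. `triggerMetrics` is a hypothetical name, and both the zero-denominator handling and the `>=` comparison against the threshold are assumptions rather than confirmed skilltest behavior.

```javascript
// Standard precision / recall / F1 from a confusion matrix.
// Returning 0 when a denominator is 0 is an assumed convention.
function triggerMetrics({ tp, tn, fp, fn }) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Hypothetical run: 10 should-trigger and 10 should-not-trigger queries.
const counts = { tp: 8, fn: 2, fp: 1, tn: 9 };
const metrics = triggerMetrics(counts);

// Assumed gate: pass when F1 meets or exceeds the --min-f1 default of 0.8.
const minF1 = 0.8;
const passed = metrics.f1 >= minF1;
console.log(metrics, passed);
```

For these counts, precision is 8/9, recall is 8/10, and F1 is 16/19 (about 0.842), so the run would clear a 0.8 threshold under the assumed comparison.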
@@ -240,6 +246,12 @@ skilltest eval ./skill --json
 skilltest check ./skill --json
 ```
 
+Seeded trigger example:
+
+```bash
+skilltest trigger ./skill --seed 123
+```
+
 ## API Keys
 
 Anthropic:
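Note on the hunk above: the README text added in 0.5.0 states that `trigger.seed` in `.skilltestrc` sets the default seed and the `--seed` flag overrides it. A minimal sketch of that precedence follows, assuming the config has already been parsed into an object with a nested `trigger.seed` key. `resolveSeed` and the config shape are hypothetical and not taken from the package source.

```javascript
// Pick the effective seed: an explicit CLI value wins, otherwise fall back
// to the config default, otherwise leave the run unseeded (undefined).
function resolveSeed(cliSeed, config) {
  if (cliSeed !== undefined) {
    return cliSeed;
  }
  return config?.trigger?.seed;
}

// Assumed shape of a parsed .skilltestrc; only trigger.seed is from the README.
const parsedConfig = { trigger: { seed: 42 } };

console.log(resolveSeed(123, parsedConfig));       // CLI flag given: uses the flag value
console.log(resolveSeed(undefined, parsedConfig)); // no flag: uses the config default
console.log(resolveSeed(undefined, {}));           // neither set: run stays unseeded
```

Under this reading, `skilltest trigger ./skill --seed 123` would use `123` even when the config file sets a different `trigger.seed`.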
@@ -344,6 +356,7 @@ Smoke tests:
 ```bash
 node dist/index.js lint test-fixtures/sample-skill/
 node dist/index.js trigger test-fixtures/sample-skill/ --num-queries 2
+node dist/index.js trigger test-fixtures/sample-skill/ --queries path/to/queries.json --seed 123
 node dist/index.js eval test-fixtures/sample-skill/ --prompts test-fixtures/eval-prompts.json
 node dist/index.js check test-fixtures/sample-skill/ --num-queries 2 --prompts test-fixtures/eval-prompts.json
 ```