skilltest 0.4.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -128,12 +128,17 @@ Flow:
  - realistic fake skills
  4. Computes TP, TN, FP, FN, precision, recall, F1.
 
+ For reproducible fake-skill sampling, pass `--seed <number>`. When a seed is used,
+ terminal and JSON output include it so the run can be repeated exactly. If you use
+ `.skilltestrc`, `trigger.seed` sets the default and the CLI flag overrides it.
+
  Flags:
 
  - `--model <model>` default: `claude-sonnet-4-5-20250929`
  - `--provider <anthropic|openai>` default: `anthropic`
  - `--queries <path>` use custom queries JSON
  - `--num-queries <n>` default: `20` (must be even)
+ - `--seed <number>` RNG seed for reproducible fake-skill sampling
  - `--save-queries <path>` save generated query set
  - `--api-key <key>` explicit key override
  - `--verbose` show full model decision text
@@ -178,6 +183,7 @@ Flags:
  - `--api-key <key>` explicit key override
  - `--queries <path>` custom trigger queries JSON
  - `--num-queries <n>` default: `20` (must be even)
+ - `--seed <number>` RNG seed for reproducible trigger sampling
  - `--prompts <path>` custom eval prompts JSON
  - `--min-f1 <n>` default: `0.8`
  - `--min-assert-pass-rate <n>` default: `0.9`
@@ -240,6 +246,12 @@ skilltest eval ./skill --json
  skilltest check ./skill --json
  ```
 
+ Seeded trigger example:
+
+ ```bash
+ skilltest trigger ./skill --seed 123
+ ```
+
  ## API Keys
 
  Anthropic:
@@ -344,6 +356,7 @@ Smoke tests:
  ```bash
  node dist/index.js lint test-fixtures/sample-skill/
  node dist/index.js trigger test-fixtures/sample-skill/ --num-queries 2
+ node dist/index.js trigger test-fixtures/sample-skill/ --queries path/to/queries.json --seed 123
  node dist/index.js eval test-fixtures/sample-skill/ --prompts test-fixtures/eval-prompts.json
  node dist/index.js check test-fixtures/sample-skill/ --num-queries 2 --prompts test-fixtures/eval-prompts.json
  ```