@pickled-dev/cli 0.16.1 → 0.17.0
- package/README.md +38 -2
- package/dist/index.js +114 -112
- package/package.json +1 -1
package/README.md
CHANGED
@@ -76,12 +76,14 @@ Run agent scenarios against registered sources.
 | `-o, --output <file>` | Save JSON report to file |
 | `-v, --verbose` | Show progress while scenarios run |
 | `-t, --threshold <n>` | Minimum score percent needed to pass |
-| `--target <name>` |
+| `--target <name>` | Restrict to the named target. Overrides `matrix.target` for non-matrix scenarios; for matrix scenarios, also acts as `--interface` unless that flag is explicitly set. |
 | `--scenario <name>` | Run only the named scenario (CI-matrix-friendly) |
-| `--interface <name>` | Matrix cell filter: run only cells with this interface
+| `--interface <name>` | Matrix cell filter: run only cells with this interface. Takes precedence over `--target` for matrix cells. |
 | `--source <name>` | Matrix cell filter: run only cells with this source id |
 | `--toolset <name>` | Matrix cell filter: run only cells with this toolset name |
 
+`--target` and `--interface` are related but distinct: `--target` is the legacy flag that narrows the top-level `matrix.target` axis (used before per-scenario `scenario.matrix.interfaces` shipped in v0.16.0). When `--target` is the only flag passed, the CLI also applies it as `--interface` so matrix scenarios narrow consistently. Pass `--interface` explicitly to override.
+
 The cell filters work with `scenario.matrix` declarations. Designed for GitHub Actions matrix usage where each CI job runs one cell:
 
 ```yaml
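Reviewer note on the hunk above: the `--target` / `--interface` precedence it documents can be sketched as a small resolution function. This is a minimal illustration of the README wording, not the code in `dist/index.js`; the `CliFlags` and `ResolvedFilters` types, the `resolveFilters` name, and the `interfaceName` field are hypothetical.

```typescript
// Hypothetical shape of the parsed flags. Only the flag names
// (--target, --interface) come from the README.
interface CliFlags {
  target?: string | undefined;        // value of --target
  interfaceName?: string | undefined; // value of --interface
}

// Hypothetical result: the filters a runner would apply.
interface ResolvedFilters {
  // Overrides `matrix.target` for non-matrix scenarios.
  target: string | undefined;
  // Interface filter applied to matrix cells.
  cellInterface: string | undefined;
}

function resolveFilters(flags: CliFlags): ResolvedFilters {
  return {
    target: flags.target,
    // An explicit --interface wins for matrix cells;
    // otherwise --target doubles as the interface filter.
    cellInterface: flags.interfaceName ?? flags.target,
  };
}
```

Under this sketch, passing only `--target` narrows matrix cells by the same name, while passing both flags leaves `--target` in effect for non-matrix scenarios and uses `--interface` for matrix cells.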
@@ -160,6 +162,40 @@ Codebase loader safety defaults: skips directories (`onlyFiles`), does not follo
 
 URL sources are NOT scanned by the audit's trap cross-reference in v1; they are fetched only during `pickled check`.
 
+## Toolsets
+
+Matrix mode (`scenario.matrix.toolsets`) iterates each scenario across named toolset profiles. v0.16.x ships two:
+
+- **`none`** (the deterministic baseline). Pickled injects the cell's active source content into the agent's prompt. Citation contract applies if `requiredSources` is declared. Same scoring shape as non-matrix scenarios.
+- **`web`** on Claude Code only. Maps to `allowedTools: ["WebSearch", "WebFetch"]` on the cell's Claude Code target. Source is NOT injected; the cell's prompt is rewritten to name the active source as the discovery target ("the canonical source for this question is at ..."). Citation contract is skipped; the cell scores on traps + `expected.includes`/`excludes`.
+
+Declare profiles at the top level of `pickled.yml`:
+
+```yaml
+toolsets:
+  none: {}
+  web:
+    webSearch: true
+    webFetch: true
+```
+
+Then reference them per scenario:
+
+```yaml
+scenarios:
+  - name: "Install"
+    matrix:
+      interfaces: [quick]
+      sources: [llms]
+      toolsets: [none, web]
+    expected:
+      includes: ["bunx pickled"]
+```
+
+That scenario produces 2 cells: `[quick · llms · none]` (injected) and `[quick · llms · web]` (discovered via tools).
+
+Custom toolset names that have no recognized adapter throw a clear "not yet implemented" error per cell. Web toolset on a non-Claude-Code interface throws "implemented only on the claude-code interface" so the misconfiguration is obvious.
+
 ## Targets
 
 Pickled ships three target shapes today. Each target is a distinct surface that exercises the agent differently; results are comparable but not identical.
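Reviewer note on the Toolsets hunk above: the "2 cells" count follows from treating `scenario.matrix` as a Cartesian product of its three axes, which the `--interface`, `--source`, and `--toolset` flags then narrow. The sketch below illustrates that reading only; `ScenarioMatrix`, `Cell`, `CellFilters`, `expandCells`, `filterCells`, and `label` are hypothetical names, and the real expansion order in the CLI is an assumption.

```typescript
// Hypothetical types; field names mirror the YAML keys in the README example.
interface ScenarioMatrix {
  interfaces: string[];
  sources: string[];
  toolsets: string[];
}

interface Cell {
  interfaceName: string;
  source: string;
  toolset: string;
}

// Optional cell filters, one per CLI flag (--interface, --source, --toolset).
interface CellFilters {
  interfaceName?: string | undefined;
  source?: string | undefined;
  toolset?: string | undefined;
}

// Cartesian product of the three axes, in declaration order (assumed).
function expandCells(matrix: ScenarioMatrix): Cell[] {
  const cells: Cell[] = [];
  for (const interfaceName of matrix.interfaces) {
    for (const source of matrix.sources) {
      for (const toolset of matrix.toolsets) {
        cells.push({ interfaceName, source, toolset });
      }
    }
  }
  return cells;
}

// Keep only the cells that match every filter that was actually provided.
function filterCells(cells: Cell[], filters: CellFilters): Cell[] {
  return cells.filter(
    (cell) =>
      (filters.interfaceName === undefined || cell.interfaceName === filters.interfaceName) &&
      (filters.source === undefined || cell.source === filters.source) &&
      (filters.toolset === undefined || cell.toolset === filters.toolset),
  );
}

// Render a cell the way the README writes it, e.g. [quick · llms · web].
function label(cell: Cell): string {
  return `[${cell.interfaceName} · ${cell.source} · ${cell.toolset}]`;
}
```

With the README's `Install` scenario as input, `expandCells` yields one cell per toolset, and `filterCells(cells, { toolset: "web" })` leaves the single cell a CI job started with `--toolset web` would run.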