@pickled-dev/cli 0.17.2 → 0.18.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +23 -5
- package/dist/index.js +112 -112
- package/package.json +1 -1
package/README.md
@@ -164,10 +164,17 @@ URL sources are NOT scanned by the audit's trap cross-reference in v1; they are
 
 ## Toolsets
 
-Matrix mode (`scenario.matrix.toolsets`) iterates each scenario across named toolset profiles.
+Matrix mode (`scenario.matrix.toolsets`) iterates each scenario across named toolset profiles. Three shapes ship today:
 
 - **`none`** (the deterministic baseline). Pickled injects the cell's active source content into the agent's prompt. Citation contract applies if `requiredSources` is declared. Same scoring shape as non-matrix scenarios.
-- **`web`** on Claude Code only.
+- **`web`** on Claude Code only. Set `webSearch: true` and/or `webFetch: true`. The cell scopes the SDK's built-in tool set to exactly those web tools (`tools: ["WebSearch", "WebFetch"]`) so default Read/Edit/Bash do not leak, and adds them to `allowedTools` so they execute without permission prompts. Source is NOT injected; the cell's prompt is rewritten to name the active source as the discovery target. Citation contract is skipped; the cell scores on traps + `expected.includes`/`excludes` + tool-use provenance.
+- **`mcp`** on Claude Code only. Declare `mcpServers` (a map of server name to `McpServerConfig` with `stdio`, `http`, or `sse` transport). The cell sets `tools: []` (all built-ins disabled; MCP tools come from `mcpServers`) and `allowedTools: ["mcp__<server>__*", ...]` (auto-permission for the configured server namespaces). Tool-use provenance accepts any invocation of `mcp__<server>__*` for any configured server.
+
+The SDK `tools` option (not `allowedTools`) is what actually restricts which tools the agent can call: `allowedTools` is only a permission-prompt-bypass list. Pickled sets both for non-none cells so the agent is confined to the configured tool path with no fallback to local filesystem tools.
+
+Tool-use provenance (web and MCP) is a hard veto. A cell that does not invoke at least one of the configured tools is forced to `NO` with confidence `0`, because an answer pulled from model prior knowledge cannot testify to the tool path the cell is meant to test.
+
+For non-none cells, scenario-level `context` overrides for `allowedTools`, `disallowedTools`, and `mcpServers` are ignored: the toolset declaration is the single source of truth so the cell label honestly describes what the agent had available. None cells still honor `context` as before.
 
 Declare profiles at the top level of `pickled.yml`:
 
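The split between `tools` (what exists) and `allowedTools` (what skips the permission prompt), the context-override rule, and the provenance veto can be sketched together. This is a minimal TypeScript illustration under stated assumptions, not Pickled's implementation: the `Toolset`, `ToolOptions`, and `Verdict` shapes and the `resolveCellTools`/`applyProvenanceVeto` names are hypothetical, `none` is assumed to be a reserved toolset name, and `toolCalls` is assumed to be the list of tool names the agent invoked during the cell. It also encodes the README's rejection rules for mixed and empty toolsets.

```typescript
// Hypothetical, simplified shapes; Pickled's real types are not shown in the README.
type McpServerConfig =
  | { type: "stdio"; command: string; args?: string[] }
  | { type: "http" | "sse"; url: string; headers?: Record<string, string> };

interface Toolset {
  webSearch?: boolean;
  webFetch?: boolean;
  mcpServers?: Record<string, McpServerConfig>;
}

interface ToolOptions {
  tools?: string[]; // restricts which built-in tools the agent can call at all
  allowedTools?: string[]; // only bypasses permission prompts
  disallowedTools?: string[];
  mcpServers?: Record<string, McpServerConfig>;
}

interface Verdict {
  answer: "YES" | "NO";
  confidence: number;
}

// Assumption: "none" is a reserved toolset name.
function resolveCellTools(name: string, toolset: Toolset, context: ToolOptions): ToolOptions {
  if (name === "none") return { ...context }; // none cells still honor scenario context

  const web: string[] = [];
  if (toolset.webSearch) web.push("WebSearch");
  if (toolset.webFetch) web.push("WebFetch");
  const servers = Object.keys(toolset.mcpServers ?? {});

  if (web.length > 0 && servers.length > 0) {
    throw new Error(`toolset "${name}" mixes web flags and mcpServers; declare separate toolsets`);
  }
  // For non-none cells, `context` is deliberately ignored below.
  if (web.length > 0) {
    return { tools: web, allowedTools: [...web] };
  }
  if (servers.length > 0) {
    return {
      tools: [], // all built-ins disabled; MCP tools come from mcpServers
      allowedTools: servers.map((s) => `mcp__${s}__*`),
      mcpServers: toolset.mcpServers,
    };
  }
  throw new Error(`toolset "${name}" declares neither webSearch/webFetch nor mcpServers`);
}

// True if any invoked tool matches an allowedTools entry (trailing "*" = prefix match).
function usedConfiguredTool(options: ToolOptions, toolCalls: string[]): boolean {
  const patterns = options.allowedTools ?? [];
  return toolCalls.some((call) =>
    patterns.some((p) => (p.endsWith("*") ? call.startsWith(p.slice(0, -1)) : call === p)),
  );
}

// Hard veto: a non-none cell that never touched its tool path is forced to NO / 0.
function applyProvenanceVeto(name: string, options: ToolOptions, toolCalls: string[], verdict: Verdict): Verdict {
  if (name === "none" || usedConfiguredTool(options, toolCalls)) return verdict;
  return { answer: "NO", confidence: 0 };
}
```

Matching `mcp__<server>__*` by prefix keeps the check namespace-exact: `mcp__context7x__foo` does not satisfy a `context7` toolset.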
@@ -177,8 +184,19 @@ toolsets:
   web:
     webSearch: true
     webFetch: true
+  context7_mcp:
+    mcpServers:
+      context7:
+        type: http
+        url: https://mcp.context7.com/mcp
+        headers:
+          CONTEXT7_API_KEY: ${CONTEXT7_API_KEY}
 ```
 
+Mixing `webSearch`/`webFetch` and `mcpServers` in the same toolset is rejected: declare separate toolsets so provenance can be attributed to one tool path.
+
+String values in `pickled.yml` that match `${UPPER_SNAKE_CASE}` are replaced with the corresponding `process.env` entry at load time, so secrets (MCP auth headers, API keys) stay out of the config file. Missing env vars become empty strings so the failure surfaces at the call site (e.g., a 401 from the MCP server) rather than at config load. Bun auto-loads `.env`, so the conventional dotfile works.
+
 Then reference them per scenario:
 
 ```yaml
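The `${UPPER_SNAKE_CASE}` substitution can be pictured as a recursive walk over the parsed YAML. This is a minimal TypeScript sketch, not Pickled's loader: `interpolateEnv` is a hypothetical name, the environment is passed in explicitly (you would call it with `process.env` under Bun or Node), and it assumes a reference embedded in a longer string (e.g. `Bearer ${TOKEN}`) is replaced as well, which the README does not state. Only string values are rewritten; keys are left alone.

```typescript
// Matches ${UPPER_SNAKE_CASE}. Assumption: names start with an uppercase letter.
const ENV_REF = /\$\{([A-Z][A-Z0-9_]*)\}/g;

type Env = Record<string, string | undefined>;

// Recursively substitutes ${VAR} references inside string values of a parsed config.
function interpolateEnv(value: unknown, env: Env): unknown {
  if (typeof value === "string") {
    // Missing vars become "" so the failure shows up at the call site, not at load time.
    return value.replace(ENV_REF, (_match: string, name: string) => env[name] ?? "");
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolateEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      out[key] = interpolateEnv(item, env); // keys are not interpolated
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Returning `""` for a missing variable instead of throwing is the trade-off the README describes: config loading never fails on a secret, and a bad or absent key surfaces as an auth error from the server that needed it.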
@@ -187,14 +205,14 @@ scenarios:
     matrix:
       interfaces: [quick]
       sources: [llms]
-      toolsets: [none, web]
+      toolsets: [none, web, context7_mcp]
     expected:
       includes: ["bunx pickled"]
 ```
 
-That scenario produces
+That scenario produces 3 cells: `[quick · llms · none]` (injected), `[quick · llms · web]` (discovered via web tools), `[quick · llms · context7_mcp]` (discovered via Context7 MCP).
 
-
+Toolsets that declare neither web flags nor `mcpServers`, and toolsets on a non-Claude-Code interface, throw clear errors per cell so the misconfiguration is obvious.
 
 ## Targets
 
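The cell count is the Cartesian product of the matrix axes: 1 interface × 1 source × 3 toolsets = 3 cells in the example scenario. A minimal TypeScript sketch of that expansion, assuming axes expand in declaration order with toolsets varying fastest (which matches the cell labels listed in the README) and that the caller supplies a hypothetical `isClaudeCode` lookup. `Matrix`, `Cell`, `checkCell`, and `expandMatrix` are illustrative names, not Pickled's API, and only the non-Claude-Code interface error is modeled; thrown errors are caught and recorded per cell so one misconfigured cell does not hide the others.

```typescript
// Hypothetical shapes; the README shows the YAML, not Pickled's internals.
interface Matrix {
  interfaces: string[];
  sources: string[];
  toolsets: string[];
}

interface Cell {
  label: string;
  interfaceName: string;
  source: string;
  toolset: string;
  error?: string; // set only when this particular cell is misconfigured
}

// Hypothetical per-cell check: non-none toolsets require a Claude Code interface.
function checkCell(interfaceName: string, toolset: string, isClaudeCode: (name: string) => boolean): void {
  if (toolset !== "none" && !isClaudeCode(interfaceName)) {
    throw new Error(`toolset "${toolset}" requires a Claude Code interface, but "${interfaceName}" is not one`);
  }
}

// Cartesian product interfaces x sources x toolsets, in declaration order.
function expandMatrix(matrix: Matrix, isClaudeCode: (name: string) => boolean): Cell[] {
  const cells: Cell[] = [];
  for (const interfaceName of matrix.interfaces) {
    for (const source of matrix.sources) {
      for (const toolset of matrix.toolsets) {
        const cell: Cell = { label: `[${interfaceName} · ${source} · ${toolset}]`, interfaceName, source, toolset };
        try {
          checkCell(interfaceName, toolset, isClaudeCode);
        } catch (err) {
          // Record the error on this cell only; the remaining cells are still produced.
          cell.error = err instanceof Error ? err.message : String(err);
        }
        cells.push(cell);
      }
    }
  }
  return cells;
}
```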