npm - @pickled-dev/cli - Versions diffs - 0.17.2 → 0.18.0 - Mend

@pickled-dev/cli 0.17.2 → 0.18.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (3)

package/README.md CHANGED

@@ -164,10 +164,17 @@ URL sources are NOT scanned by the audit's trap cross-reference in v1; they are
 ## Toolsets
-Matrix mode (`scenario.matrix.toolsets`) iterates each scenario across named toolset profiles. Two profiles ship today:
+Matrix mode (`scenario.matrix.toolsets`) iterates each scenario across named toolset profiles. Three shapes ship today:
 - **`none`** (the deterministic baseline). Pickled injects the cell's active source content into the agent's prompt. Citation contract applies if `requiredSources` is declared. Same scoring shape as non-matrix scenarios.
-- **`web`** on Claude Code only. Maps to `allowedTools: ["WebSearch", "WebFetch"]` on the cell's Claude Code target. Source is NOT injected; the cell's prompt is rewritten to name the active source as the discovery target ("the canonical source for this question is at ..."). Citation contract is skipped; the cell scores on traps + `expected.includes`/`excludes` + tool-use provenance. Tool-use provenance is a hard veto: a cell that does not invoke at least one of the configured web tools is forced to `NO` with confidence `0`, because an answer pulled from model prior knowledge cannot testify to the tool path the cell is meant to test.
+- **`web`** on Claude Code only. Set `webSearch: true` and/or `webFetch: true`. The cell scopes the SDK's built-in tool set to exactly those web tools (`tools: ["WebSearch", "WebFetch"]`) so default Read/Edit/Bash do not leak, and adds them to `allowedTools` so they execute without permission prompts. Source is NOT injected; the cell's prompt is rewritten to name the active source as the discovery target. Citation contract is skipped; the cell scores on traps + `expected.includes`/`excludes` + tool-use provenance.
+- **`mcp`** on Claude Code only. Declare `mcpServers` (a map of server name to `McpServerConfig` with `stdio`, `http`, or `sse` transport). The cell sets `tools: []` (all built-ins disabled; MCP tools come from `mcpServers`) and `allowedTools: ["mcp__<server>__*", ...]` (auto-permission for the configured server namespaces). Tool-use provenance accepts any invocation of `mcp__<server>__*` for any configured server.
+The SDK `tools` option (not `allowedTools`) is what actually restricts which tools the agent can call: `allowedTools` is only a permission-prompt-bypass list. Pickled sets both for non-none cells so the agent is confined to the configured tool path with no fallback to local filesystem tools.
+Tool-use provenance (web and MCP) is a hard veto. A cell that does not invoke at least one of the configured tools is forced to `NO` with confidence `0`, because an answer pulled from model prior knowledge cannot testify to the tool path the cell is meant to test.
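Here is a minimal sketch of the hard veto. It assumes the harness records the names of the tools the agent invoked during a cell. `Verdict`, `CellVerdict`, `ToolPath`, `matchesToolPath`, and `applyProvenanceVeto` are hypothetical names for illustration, not Pickled's internals. The `0..1` confidence scale is also an assumption. Only the `NO`/`0` outcome and the `mcp__<server>__*` namespace come from the text above.

```typescript
// Hypothetical shapes; Pickled's real scoring types are not shown in this README.
type Verdict = "YES" | "NO";

interface CellVerdict {
  verdict: Verdict;
  confidence: number; // assumed 0..1 scale
}

// The configured tool path for a non-none cell.
type ToolPath =
  | { kind: "web"; tools: string[] } // e.g. ["WebSearch", "WebFetch"]
  | { kind: "mcp"; servers: string[] }; // server names from mcpServers

// Does one invoked tool name belong to the configured tool path?
function matchesToolPath(toolName: string, path: ToolPath): boolean {
  if (path.kind === "web") {
    return path.tools.includes(toolName);
  }
  // MCP tools are namespaced mcp__<server>__<tool>; the trailing "__"
  // keeps server "context7" from matching a server named "context7x".
  return path.servers.some((server) => toolName.startsWith(`mcp__${server}__`));
}

// Hard veto: without at least one configured-tool call, the scored verdict is
// discarded and the cell becomes NO with confidence 0.
function applyProvenanceVeto(
  scored: CellVerdict,
  invokedTools: string[],
  path: ToolPath,
): CellVerdict {
  const usedConfiguredTool = invokedTools.some((name) => matchesToolPath(name, path));
  return usedConfiguredTool ? scored : { verdict: "NO", confidence: 0 };
}
```

Under this sketch, consider a `context7_mcp` cell whose transcript contains only `WebSearch` calls. It is vetoed even if its answer matched `expected.includes`. Any `mcp__context7__*` call, by contrast, lets the scored verdict stand.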
+For non-none cells, scenario-level `context` overrides for `allowedTools`, `disallowedTools`, and `mcpServers` are ignored: the toolset declaration is the single source of truth so the cell label honestly describes what the agent had available. `none` cells still honor `context` as before.
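The mapping from a toolset declaration to the SDK's `tools`, `allowedTools`, and `mcpServers` options can be sketched as a pure function. This is an illustration under stated assumptions, not Pickled's source. `Toolset`, `ScenarioContext`, `SdkToolOptions`, and `toolOptionsFor` are hypothetical names. Three behaviours are also assumptions:

- recognizing `none` by name;
- leaving `tools` unset for `none` cells;
- listing only the enabled web tools when a single flag is set.

The sketch further assumes the toolset has already been validated (web flags or `mcpServers`, not both).

```typescript
// Simplified stand-in for the SDK's McpServerConfig (stdio, http, or sse fields).
type McpServerConfig = Record<string, unknown>;

interface Toolset {
  webSearch?: boolean;
  webFetch?: boolean;
  mcpServers?: Record<string, McpServerConfig>;
}

// Scenario-level overrides; only `none` cells honor them.
interface ScenarioContext {
  allowedTools?: string[];
  disallowedTools?: string[];
  mcpServers?: Record<string, McpServerConfig>;
}

interface SdkToolOptions {
  tools?: string[]; // which tools exist for the agent at all
  allowedTools?: string[]; // which of those run without a permission prompt
  disallowedTools?: string[];
  mcpServers?: Record<string, McpServerConfig>;
}

// For non-none cells `context` is deliberately ignored, so the cell label
// matches what the agent actually had available.
function toolOptionsFor(
  name: string,
  toolset: Toolset | undefined,
  context: ScenarioContext,
): SdkToolOptions {
  if (name === "none") {
    // Source is injected into the prompt; keep the scenario's own settings.
    return {
      allowedTools: context.allowedTools,
      disallowedTools: context.disallowedTools,
      mcpServers: context.mcpServers,
    };
  }
  if (toolset === undefined) {
    throw new Error(`toolset "${name}" is not declared`);
  }

  const servers = Object.keys(toolset.mcpServers ?? {});
  if (servers.length > 0) {
    // MCP: no built-ins at all; tools come only from the declared servers.
    return {
      tools: [],
      allowedTools: servers.map((server) => `mcp__${server}__*`),
      mcpServers: toolset.mcpServers,
    };
  }

  // Web: expose only the enabled web tools, so Read/Edit/Bash never appear.
  const web: string[] = [];
  if (toolset.webSearch) web.push("WebSearch");
  if (toolset.webFetch) web.push("WebFetch");
  return { tools: web, allowedTools: [...web] };
}
```

Setting `tools` and `allowedTools` to the same list is what makes the web cell both confined and prompt-free. Setting only `allowedTools` would auto-approve the web tools while leaving the default built-ins callable.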
 Declare profiles at the top level of `pickled.yml`:
@@ -177,8 +184,19 @@ toolsets:
   web:
     webSearch: true
     webFetch: true
+  context7_mcp:
+    mcpServers:
+      context7:
+        type: http
+        url: https://mcp.context7.com/mcp
+        headers:
+          CONTEXT7_API_KEY: ${CONTEXT7_API_KEY}
 ```
+Mixing `webSearch`/`webFetch` and `mcpServers` in the same toolset is rejected: declare separate toolsets so provenance can be attributed to one tool path.
+String values in `pickled.yml` that match `${UPPER_SNAKE_CASE}` are replaced with the corresponding `process.env` entry at load time, so secrets (MCP auth headers, API keys) stay out of the config file. Missing env vars become empty strings so the failure surfaces at the call site (e.g., a 401 from the MCP server) rather than at config load. Bun auto-loads `.env`, so the conventional dotfile works.
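The substitution rule can be sketched as a recursive walk over the parsed config object. This is a minimal illustration, not Pickled's loader. `interpolateEnv` is a hypothetical name, and the pattern `[A-Z][A-Z0-9_]*` is an assumed reading of "upper snake case". Replacing placeholders embedded inside longer strings, not only whole-string values, is also an assumption. The environment is passed in explicitly; under Bun or Node you would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Assumed pattern: "${" + an upper-snake-case name + "}".
const PLACEHOLDER = /\$\{([A-Z][A-Z0-9_]*)\}/g;

// Recursively replace placeholders in every string value of a parsed config.
// Keys are left alone; missing variables become "" so the failure shows up at
// the call site (e.g. a 401 from an MCP server) instead of at config load.
function interpolateEnv(value: unknown, env: Env): unknown {
  if (typeof value === "string") {
    return value.replace(PLACEHOLDER, (_match, name: string) => env[name] ?? "");
  }
  if (Array.isArray(value)) {
    return value.map((item) => interpolateEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = interpolateEnv(inner, env);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

An unset name resolves to `""`, so a misspelled variable produces an empty header value rather than a load error. That is the trade-off described above: the MCP server's 401 is the signal.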
 Then reference them per scenario:
 ```yaml
@@ -187,14 +205,14 @@ scenarios:
     matrix:
       interfaces: [quick]
       sources: [llms]
-      toolsets: [none, web]
+      toolsets: [none, web, context7_mcp]
     expected:
       includes: ["bunx pickled"]
 ```
-That scenario produces 2 cells: `[quick · llms · none]` (injected) and `[quick · llms · web]` (discovered via tools).
+That scenario produces 3 cells: `[quick · llms · none]` (injected), `[quick · llms · web]` (discovered via web tools), `[quick · llms · context7_mcp]` (discovered via Context7 MCP).
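The cell count is the Cartesian product of the matrix axes: 1 interface × 1 source × 3 toolsets = 3 cells. A minimal sketch follows. It assumes these three axes are the only ones and that cells are enumerated in declaration order. `Matrix`, `Cell`, and `expandMatrix` are hypothetical names; only the `[interface · source · toolset]` label format comes from the text above.

```typescript
interface Matrix {
  interfaces: string[];
  sources: string[];
  toolsets: string[];
}

interface Cell {
  iface: string;
  source: string;
  toolset: string;
  label: string;
}

// One cell per (interface, source, toolset) combination; toolsets vary fastest.
function expandMatrix(m: Matrix): Cell[] {
  const cells: Cell[] = [];
  for (const iface of m.interfaces) {
    for (const source of m.sources) {
      for (const toolset of m.toolsets) {
        cells.push({ iface, source, toolset, label: `[${iface} · ${source} · ${toolset}]` });
      }
    }
  }
  return cells;
}
```

Consider a call with `interfaces: ["quick"]`, `sources: ["llms"]`, and `toolsets: ["none", "web", "context7_mcp"]`. `expandMatrix` returns the three cells above in that order. Adding a second source would double the count to 6.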
-Custom toolset names that have no recognized adapter throw a clear "not yet implemented" error per cell. Web toolset on a non-Claude-Code interface throws "implemented only on the claude-code interface" so the misconfiguration is obvious.
+Toolsets that declare neither web flags nor `mcpServers`, and toolsets on a non-Claude-Code interface, throw clear errors per cell so the misconfiguration is obvious.
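The README names three misconfigurations: web flags mixed with `mcpServers`, neither declared, and a non-Claude-Code interface. All three can be sketched as one check per cell. `ToolsetDecl` and `validateToolset` are hypothetical names, and the error wording is illustrative. This README does not show how Pickled decides whether a cell runs on Claude Code, so the sketch takes that as a boolean.

```typescript
interface ToolsetDecl {
  webSearch?: boolean;
  webFetch?: boolean;
  mcpServers?: Record<string, unknown>;
}

// Throws unless a non-none toolset maps to exactly one tool path on Claude Code.
function validateToolset(name: string, decl: ToolsetDecl, onClaudeCode: boolean): void {
  if (name === "none") return; // injected baseline: no tools to check

  const hasWeb = decl.webSearch === true || decl.webFetch === true;
  const hasMcp = Object.keys(decl.mcpServers ?? {}).length > 0;

  if (hasWeb && hasMcp) {
    // Provenance could not be attributed to a single tool path.
    throw new Error(`toolset "${name}": declare web flags and mcpServers in separate toolsets`);
  }
  if (!hasWeb && !hasMcp) {
    throw new Error(`toolset "${name}": declares neither webSearch/webFetch nor mcpServers`);
  }
  if (!onClaudeCode) {
    throw new Error(`toolset "${name}": only implemented on Claude Code`);
  }
}
```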
 ## Targets