@pickled-dev/cli 0.3.0 → 0.5.1
- package/README.md +52 -64
- package/dist/index.js +237 -216
- package/package.json +6 -4
package/README.md
CHANGED
@@ -1,8 +1,8 @@
 # @pickled-dev/cli
 
->
+> Test what agents actually understand about your product
 
-
+Pickled runs scenarios against real agent targets, checks citations against registered sources, and matches declared traps deterministically. No LLM grades another LLM.
 
 ## Installation
 
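The new README text says Pickled checks citations against registered sources and matches declared traps deterministically. A minimal TypeScript sketch of what such checks could look like follows. It is not taken from `dist/index.js`: the names `Trap`, `findFiredTrap`, `extractCitations`, and `checkCitations` are hypothetical, and it assumes that citations appear as bracketed ids such as `[readme]` and that a trap is a literal substring rather than a regular expression.

```typescript
// Hypothetical sketch, not Pickled's actual implementation.

interface Trap {
  name: string; // e.g. "old_v2_api"
  match: string; // stale text to look for (assumed literal, not a regex)
  reason: string; // explanation shown when the trap fires
}

interface CitationCheck {
  missing: string[]; // required sources the response never cited
  unknown: string[]; // cited ids that are not registered sources
}

// Return the first trap whose text appears verbatim in the response, if any.
function findFiredTrap(response: string, traps: Trap[]): Trap | undefined {
  for (const trap of traps) {
    if (response.indexOf(trap.match) !== -1) {
      return trap;
    }
  }
  return undefined;
}

// Collect unique bracketed ids such as [readme] (assumed citation syntax).
function extractCitations(response: string): string[] {
  const pattern = /\[([A-Za-z0-9_-]+)\]/g;
  const ids: string[] = [];
  let found: RegExpExecArray | null = pattern.exec(response);
  while (found !== null) {
    if (ids.indexOf(found[1]) === -1) {
      ids.push(found[1]);
    }
    found = pattern.exec(response);
  }
  return ids;
}

// Compare cited ids with the registered and required source ids.
function checkCitations(
  response: string,
  registered: string[],
  required: string[],
): CitationCheck {
  const cited = extractCitations(response);
  return {
    missing: required.filter((id) => cited.indexOf(id) === -1),
    unknown: cited.filter((id) => registered.indexOf(id) === -1),
  };
}
```

Both checks are plain string operations, so the same response always produces the same result. No model is involved in grading, which is the property the README emphasizes.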
@@ -22,25 +22,26 @@ Creates a `pickled.yml` file:
 
 ```yaml
 tool:
-  name: "your-
-  description: "What your
+  name: "your-product"
+  description: "What your product does"
 
-
-
-
+docs:
+  sources:
+    readme: ./README.md
 
+scenarios:
   - name: "Getting started"
-    prompt: "How do I set up this
+    prompt: "How do I install and set up this product?"
+    requiredSources: [readme]
 
-
-    prompt: "Show me a basic example of using this tool"
+threshold: 80
 ```
 
 ### 2. Edit your config
 
-
+Declare the sources agents should cite, the scenarios they should answer, and any stale patterns you want traps to catch.
 
-### 3. Run check
+### 3. Run the check
 
 ```bash
 pickled check
@@ -52,75 +53,62 @@ pickled check
 
 Create a starter `pickled.yml` config file.
 
+### `pickled audit [path]`
+
+Static scan of agent-context files. No LLM calls.
+
 ### `pickled check [path]`
 
-Run
+Run agent scenarios against registered sources.
 
-| Option | Description
-| --------------------- |
-| `--json` | Output as JSON
-| `-o, --output <file>` | Save report to file
-| `-v, --verbose` | Show
-| `-t, --threshold <n>` |
+| Option | Description |
+| --------------------- | ----------------------------------- |
+| `--json` | Output as JSON |
+| `-o, --output <file>` | Save JSON report to file |
+| `-v, --verbose` | Show progress while scenarios run |
+| `-t, --threshold <n>` | Minimum score percent needed to pass |
 
 ## Example Output
 
-```
-
-
-
+```text
+pickled check
+-------------------------------------------------------
 Tool: zod
-
-
-
-
-
-
-
-
-
-
+Sources: [readme], [llms]
+Scenarios: 1
+
+Scenario: Error handling
+✗ Trap fired (0%)
+trap: old_v2_api
+reason: Deprecated in Zod 4; use z.treeifyError()
+match: "ZodError.format()"
+cited: [readme], [llms]
+
+-------------------------------------------------------
+Overall: 0 / 100 · threshold 80 · run fails
+Review fired traps before trusting this surface.
 ```
 
-##
-
-| Score | Status | Meaning |
-|-------|--------|---------|
-| 90%+ | Well preserved | AI nails it |
-| 70-89% | Fresh | Good, minor gaps |
-| 50-69% | Going stale | Needs attention |
-| <50% | Gone sour | Major documentation gaps |
-
-## Config Reference
+## Result Labels
 
-
-
-
-
-
-
-
-
-    target: target-name # Optional: specific target
-
-targets: # Optional: named targets
-  claude-sonnet:
-    category: cli
-    provider: claude-code
-    model: claude-sonnet-4-20250514
-
-threshold: 80 # Optional: min score % to pass
-```
+| Label | Meaning |
+| ----- | ------- |
+| `Well grounded` | Required sources cited. No unknown sources. High confidence. |
+| `Grounded` | Required sources cited. No unknown sources. Lower confidence. |
+| `Partially grounded` | Some required citations are missing, or unknown citations appeared. |
+| `Trap fired` | A declared stale pattern matched. Score is forced to 0 for that scenario. |
+| `Ungrounded` | No valid citations, or every citation is unknown. |
+| `Error` | The target failed before Pickled could score the response. |
 
-## CI
+## CI
 
 ```yaml
 # GitHub Actions
-- name: Check
+- name: Check agent legibility
   run: pickled check --threshold 80
 ```
 
-Fail the
+Fail the run when the overall score falls below the threshold.
 
 ## Local Development
 
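The Result Labels table and the CI threshold rule added in this version can be read as a small decision procedure. The TypeScript sketch below is one possible reading, not the package's code. The `ScenarioOutcome` shape, the `labelFor` and `runPasses` names, the order in which the rules are checked, and the choice of a plain mean for the overall score are all assumptions made for illustration.

```typescript
// Hypothetical sketch of the Result Labels table; not Pickled's actual code.

type Label =
  | "Well grounded"
  | "Grounded"
  | "Partially grounded"
  | "Trap fired"
  | "Ungrounded"
  | "Error";

interface ScenarioOutcome {
  targetFailed: boolean; // the target failed before scoring
  trapFired: boolean; // a declared stale pattern matched
  citedKnown: number; // citations that resolve to registered sources
  citedUnknown: number; // citations that do not
  missingRequired: number; // required sources never cited
  highConfidence: boolean; // placeholder: the README does not say how this is derived
}

// Apply the table rows in an assumed priority order.
function labelFor(outcome: ScenarioOutcome): Label {
  if (outcome.targetFailed) return "Error";
  if (outcome.trapFired) return "Trap fired";
  if (outcome.citedKnown === 0) return "Ungrounded"; // none valid, or all unknown
  if (outcome.missingRequired > 0 || outcome.citedUnknown > 0) {
    return "Partially grounded";
  }
  return outcome.highConfidence ? "Well grounded" : "Grounded";
}

// Overall score is assumed to be the mean of per-scenario scores (0 to 100).
// A run with no scenarios is treated as failing, which is also an assumption.
function runPasses(scores: number[], threshold: number): boolean {
  if (scores.length === 0) return false;
  const overall = scores.reduce((sum, score) => sum + score, 0) / scores.length;
  return overall >= threshold;
}
```

Under these assumptions, the example output in the README (one scenario, trap fired, score 0, threshold 80) fails the run because `runPasses([0], 80)` is `false`.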