@pickled-dev/cli 0.4.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +53 -64
  2. package/dist/index.js +225 -204
  3. package/package.json +5 -4
package/README.md CHANGED
@@ -1,8 +1,8 @@
  # @pickled-dev/cli

- > Stay fresh in AI 🥒
+ > Test what agents actually understand about your product

- Test how well AI responds to questions about your developer tool. Define scenarios, run checks, and see your freshness score.
+ Pickled runs scenarios against real agent targets, checks citations against registered sources, and matches declared traps deterministically. No LLM grades another LLM.

  ## Installation

@@ -22,25 +22,26 @@ Creates a `pickled.yml` file:

  ```yaml
  tool:
-   name: "your-tool"
-   description: "What your tool does"
+   name: "your-product"
+   description: "What your product does"

- scenarios:
-   - name: "Installation"
-     prompt: "How do I install this tool?"
+ docs:
+   sources:
+     readme: ./README.md

+ scenarios:
    - name: "Getting started"
-     prompt: "How do I set up this tool for my project?"
+     prompt: "How do I install and set up this product?"
+     requiredSources: [readme]

-   - name: "Basic usage"
-     prompt: "Show me a basic example of using this tool"
+ threshold: 80
  ```

  ### 2. Edit your config

- Update `pickled.yml` with your actual tool info and scenarios developers might ask about.
+ Declare the sources agents should cite, the scenarios they should answer, and any stale patterns you want traps to catch.

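The relationship between declared sources and scenarios in the 0.6.0 config can be sketched in TypeScript. This is an illustration only, not Pickled's actual loader: the `PickledConfig` type mirrors just the keys visible in the starter file, and `findUnknownSources` is a hypothetical helper that assumes every `requiredSources` entry must name a key under `docs.sources`.

```typescript
// Hypothetical types mirroring only the keys visible in the starter pickled.yml.
interface Scenario {
  name: string;
  prompt: string;
  requiredSources?: string[];
}

interface PickledConfig {
  tool: { name: string; description: string };
  docs: { sources: Record<string, string> };
  scenarios: Scenario[];
  threshold?: number;
}

// Assumed rule: each requiredSources entry must be declared under docs.sources.
// Returns one message per entry that is not declared.
function findUnknownSources(config: PickledConfig): string[] {
  const declared = new Set(Object.keys(config.docs.sources));
  const problems: string[] = [];
  for (const scenario of config.scenarios) {
    for (const source of scenario.requiredSources ?? []) {
      if (!declared.has(source)) {
        problems.push(
          `scenario "${scenario.name}" requires undeclared source "${source}"`
        );
      }
    }
  }
  return problems;
}

// The starter config from the README, written as a plain object.
const starterConfig: PickledConfig = {
  tool: { name: "your-product", description: "What your product does" },
  docs: { sources: { readme: "./README.md" } },
  scenarios: [
    {
      name: "Getting started",
      prompt: "How do I install and set up this product?",
      requiredSources: ["readme"],
    },
  ],
  threshold: 80,
};

console.log(findUnknownSources(starterConfig)); // prints []
```

A check like this would catch a scenario that requires `llms` before `llms` has been added under `docs.sources`.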
- ### 3. Run check
+ ### 3. Run the check

  ```bash
  pickled check
@@ -52,75 +53,63 @@ pickled check

  Create a starter `pickled.yml` config file.

+ ### `pickled audit [path]`
+
+ Static scan of agent-context files. No LLM calls.
+
  ### `pickled check [path]`

- Run freshness checks and report results.
+ Run agent scenarios against registered sources.

- | Option | Description |
- | --------------------- | ---------------------- |
- | `--json` | Output as JSON |
- | `-o, --output <file>` | Save report to file |
- | `-v, --verbose` | Show detailed progress |
- | `-t, --threshold <n>` | Min score % to pass |
+ | Option | Description |
+ | --------------------- | ------------------------------------------------ |
+ | `--json` | Output as JSON |
+ | `-o, --output <file>` | Save JSON report to file |
+ | `-v, --verbose` | Show progress while scenarios run |
+ | `-t, --threshold <n>` | Minimum score percent needed to pass |
+ | `--target <name>` | Run only the named target (overrides matrix) |

  ## Example Output

- ```
- 🥒 Freshness Check
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
+ ```text
+ pickled check
+ -------------------------------------------------------
  Tool: zod
-
- [default] ✓ "Installation" - Well preserved (92%)
- [default] ✓ "Basic parsing" - Fresh (85%)
- [default] ⚠ "Error handling" - Going stale (65%)
- Missing: safeParse details
-
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
- Freshness Score: 81% 🥒🥒🥒🥒░
-
- 🥒 Looking fresh! Your docs are doing well.
+ Sources: [readme], [llms]
+ Scenarios: 1
+
+ Scenario: Error handling
+ ✗ Trap fired (0%)
+ trap: old_v2_api
+ reason: Deprecated in Zod 4; use z.treeifyError()
+ match: "ZodError.format()"
+ cited: [readme], [llms]
+
+ -------------------------------------------------------
+ Overall: 0 / 100 · threshold 80 · run fails
+ Review fired traps before trusting this surface.
  ```

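The new example output reports a trap by name, reason, and matched text. A minimal TypeScript sketch of how a deterministic trap check could work is shown here. It is not taken from the package source: the `Trap` field names, the `findFiredTraps` and `applyTraps` helpers, and the use of literal substring matching (rather than, say, regular expressions) are all assumptions for illustration.

```typescript
// Hypothetical trap shape; the field names are assumptions for illustration.
interface Trap {
  name: string;
  reason: string;
  pattern: string; // literal text that marks a stale answer
}

interface TrapHit {
  trap: string;
  reason: string;
  match: string;
}

// Deterministic check: plain substring search, with no model in the loop.
function findFiredTraps(response: string, traps: Trap[]): TrapHit[] {
  return traps
    .filter((t) => response.includes(t.pattern))
    .map((t) => ({ trap: t.name, reason: t.reason, match: t.pattern }));
}

// A fired trap forces the scenario score to 0, whatever the citation score was.
function applyTraps(citationScore: number, hits: TrapHit[]): number {
  return hits.length > 0 ? 0 : citationScore;
}

const sampleTraps: Trap[] = [
  {
    name: "old_v2_api",
    reason: "Deprecated in Zod 4; use z.treeifyError()",
    pattern: "ZodError.format()",
  },
];

// A made-up agent answer that cites both sources but uses the stale API.
const sampleResponse =
  "Call ZodError.format() to get a nested error object. [readme] [llms]";

const sampleHits = findFiredTraps(sampleResponse, sampleTraps);
console.log(sampleHits.length, applyTraps(90, sampleHits)); // prints: 1 0
```

The same response and the same trap list always produce the same hits, which is the property the README's "No LLM grades another LLM" line describes.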
- ## Freshness Scores
-
- | Score | Status | Meaning |
- |-------|--------|---------|
- | 90%+ | Well preserved | AI nails it |
- | 70-89% | Fresh | Good, minor gaps |
- | 50-69% | Going stale | Needs attention |
- | <50% | Gone sour | Major documentation gaps |
-
- ## Config Reference
+ ## Result Labels

- ```yaml
- tool:
-   name: "tool-name" # Required: your tool's name
-   description: "desc" # Required: what it does
-
- scenarios: # Required: scenarios to check
-   - name: "Scenario name" # Display name
-     prompt: "The question" # What to ask AI
-     target: target-name # Optional: specific target
-
- targets: # Optional: named targets
-   claude-sonnet:
-     category: cli
-     provider: claude-code
-     model: claude-sonnet-4-20250514
-
- threshold: 80 # Optional: min score % to pass
- ```
+ | Label | Meaning |
+ | ----- | ------- |
+ | `Well grounded` | Required sources cited. No unknown sources. High confidence. |
+ | `Grounded` | Required sources cited. No unknown sources. Lower confidence. |
+ | `Partially grounded` | Some required citations are missing, or unknown citations appeared. |
+ | `Trap fired` | A declared stale pattern matched. Score is forced to 0 for that scenario. |
+ | `Ungrounded` | No valid citations, or every citation is unknown. |
+ | `Error` | The target failed before Pickled could score the response. |

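The label table can be read as a small decision procedure. The TypeScript sketch below is one possible reading, not the package's implementation. The `ScenarioOutcome` shape and the `labelFor` helper are hypothetical. The order in which the rules are tried is an assumption. "Confidence" is passed in as a boolean because the README does not say how it is computed.

```typescript
type Label =
  | "Well grounded"
  | "Grounded"
  | "Partially grounded"
  | "Trap fired"
  | "Ungrounded"
  | "Error";

// Hypothetical summary of one scenario run; every field name is an assumption.
interface ScenarioOutcome {
  targetFailed: boolean;       // the target errored before scoring
  trapFired: boolean;          // a declared stale pattern matched
  cited: string[];             // source keys the response cited
  requiredSources: string[];   // from the scenario's requiredSources
  registeredSources: string[]; // keys declared under docs.sources
  highConfidence: boolean;     // stand-in for the undocumented confidence signal
}

function labelFor(o: ScenarioOutcome): Label {
  if (o.targetFailed) return "Error";
  if (o.trapFired) return "Trap fired";

  const registered = new Set(o.registeredSources);
  const valid = o.cited.filter((c) => registered.has(c));
  const unknownCount = o.cited.length - valid.length;

  // Covers both "no valid citations" and "every citation is unknown".
  if (valid.length === 0) return "Ungrounded";

  const missingRequired = o.requiredSources.some((r) => !valid.includes(r));
  if (missingRequired || unknownCount > 0) return "Partially grounded";

  return o.highConfidence ? "Well grounded" : "Grounded";
}

const baseOutcome: ScenarioOutcome = {
  targetFailed: false,
  trapFired: false,
  cited: ["readme"],
  requiredSources: ["readme"],
  registeredSources: ["readme", "llms"],
  highConfidence: true,
};

console.log(labelFor(baseOutcome)); // prints Well grounded
console.log(labelFor({ ...baseOutcome, cited: ["readme", "blog"] })); // prints Partially grounded
```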
- ## CI/CD Integration
+ ## CI

  ```yaml
  # GitHub Actions
- - name: Check AI freshness
+ - name: Check agent legibility
    run: pickled check --threshold 80
  ```

- Fail the build if AI can't answer questions about your tool correctly.
+ Fail the run when the overall score falls below the threshold.

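The threshold rule in the new CI text can be sketched as follows. This is a hedged illustration rather than the CLI's code: `overallScore` and `exitCodeFor` are hypothetical names, averaging per-scenario scores is an assumption about how the overall score is derived, and signalling failure through a nonzero exit code is the usual CI convention rather than something the README states.

```typescript
// Assumption: the overall score is the rounded mean of per-scenario scores (0-100).
function overallScore(scenarioScores: number[]): number {
  if (scenarioScores.length === 0) return 0;
  const total = scenarioScores.reduce((sum, s) => sum + s, 0);
  return Math.round(total / scenarioScores.length);
}

// Fails (exit code 1) only when the overall score is strictly below the threshold.
function exitCodeFor(scenarioScores: number[], threshold: number): number {
  return overallScore(scenarioScores) >= threshold ? 0 : 1;
}

// The single trapped scenario from the example output: 0 / 100 against threshold 80.
console.log(exitCodeFor([0], 80)); // prints 1

// Two made-up passing scenarios.
console.log(exitCodeFor([92, 85], 80)); // prints 0
```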
  ## Local Development