agent-skill-evals 0.1.0
- package/README.md +155 -0
- package/dist/agent/index.d.mts +3 -0
- package/dist/agent/index.mjs +2 -0
- package/dist/agent-CM7fIL_C.mjs +1525 -0
- package/dist/agent-CM7fIL_C.mjs.map +1 -0
- package/dist/assertion-entries-CfmNt-fp.d.mts +9 -0
- package/dist/assertion-entries-CfmNt-fp.d.mts.map +1 -0
- package/dist/assertions/index.d.mts +47 -0
- package/dist/assertions/index.d.mts.map +1 -0
- package/dist/assertions/index.mjs +574 -0
- package/dist/assertions/index.mjs.map +1 -0
- package/dist/index-4l7TCFny.d.mts +90 -0
- package/dist/index-4l7TCFny.d.mts.map +1 -0
- package/dist/internal-services-5-mRgNls.mjs +226 -0
- package/dist/internal-services-5-mRgNls.mjs.map +1 -0
- package/dist/internal-services-DbsekQ_K.d.mts +76 -0
- package/dist/internal-services-DbsekQ_K.d.mts.map +1 -0
- package/dist/skill-checks/index.d.mts +113 -0
- package/dist/skill-checks/index.d.mts.map +1 -0
- package/dist/skill-checks/index.mjs +408 -0
- package/dist/skill-checks/index.mjs.map +1 -0
- package/package.json +56 -0
package/README.md
ADDED
@@ -0,0 +1,155 @@

# agent-skill-evals

Agent Skill Evals helps you test agent skills with [Promptfoo](https://www.promptfoo.dev/).

## Install

```bash
pnpm add -D promptfoo agent-skill-evals
```

Use it to:

1. Check a `SKILL.md` file and its tests before you run an agent.
2. Copy a sample project, run an agent in the copy, save evidence, and check what changed.

Agent Skill Evals is a Promptfoo plugin package. Promptfoo is the eval runner:
it reads the YAML configs, runs providers, and calls assertions. Keep using
`promptfoo eval`; Agent Skill Evals adds skill-focused providers and assertions
that Promptfoo can load from `file://` paths.

## Add Agent Skill Evals To Promptfoo

Create an `agent-skill-evals/` folder in your project and add small loader files:

```js
// agent-skill-evals/agent.js
export { default } from "agent-skill-evals/agent";
```

```js
// agent-skill-evals/skill-checks.js
export { default } from "agent-skill-evals/skill-checks";
```

```js
// agent-skill-evals/assertions.js
export { default } from "agent-skill-evals/assertions";
export * from "agent-skill-evals/assertions";
```

Use these files from normal Promptfoo configs with `file://./agent-skill-evals/...`.

## Entrypoints

The package uses Promptfoo-facing subpaths:

- `agent-skill-evals/agent`
- `agent-skill-evals/skill-checks`
- `agent-skill-evals/assertions`

There is no root import from `agent-skill-evals`.

## Metrics

- `skill.checks` checks a skill file and its tests before an agent runs.
- `skill.test` checks evidence after an agent run.
- `skill.budget` checks real-agent token usage after an agent run.
- `skill.activation`, `skill.budgets`, `skill.context`, `skill.instructions`, `skill.tests`, and `skill.verifiers` run focused Skill Checks.

## Example: Check A Skill Before Runtime

Use `skill.checks` to check that a `SKILL.md` file and its tests are ready to
run:

```yaml
description: Skill checks

prompts:
  - "skill-check"

providers:
  - id: file://./agent-skill-evals/skill-checks.js

tests:
  - description: bugfix skill checks
    vars:
      skillPath: ./skills/bugfix-workflow
      testsGlob: ./tests/bugfix-workflow.yaml
    assert:
      - type: javascript
        metric: skill.checks
        value: file://./agent-skill-evals/assertions.js
        config:
          metric: skill.checks
```

Run it with Promptfoo:

```bash
promptfoo eval -c promptfoo.skill-checks.yaml
```
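
For orientation, a file-based Promptfoo `javascript` assertion is a function that receives the provider output and a context object and returns a grading result. The sketch below is not the package's implementation; it only illustrates how one assertion file could dispatch on the `config.metric` value set in the config above. The `placeholderChecks` table and its pass criteria are hypothetical stand-ins that merely confirm the expected test variables are present.

```javascript
// Hypothetical sketch of a Promptfoo file-based `javascript` assertion.
// This is NOT the agent-skill-evals implementation.

// Placeholder checks keyed by the `config.metric` value from the YAML config.
const placeholderChecks = {
  "skill.checks": (context) => Boolean(context.vars?.skillPath),
  "skill.test": (context) => Boolean(context.vars?.fixture),
};

// Promptfoo calls the exported function with the provider output and a
// context object, and accepts a { pass, score, reason } grading result.
function assertSkillMetric(output, context) {
  const metric = context.config?.metric;
  const check = placeholderChecks[metric];
  if (!check) {
    return { pass: false, score: 0, reason: `unknown metric: ${metric}` };
  }
  const pass = check(context);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass ? `${metric} passed` : `${metric} failed: missing test variable`,
  };
}

module.exports = assertSkillMetric;
```

Keying the checks by metric name keeps a single assertion file reusable across configs, which matches how the examples here point every `assert` entry at the same `assertions.js` loader.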

## Example: Test A Skill On A Copied Project

Use `skill.test` when you want the agent to work on a copied fixture and then
check the evidence from that run:

```yaml
description: Runtime skill test

prompts:
  - "{{prompt}}"

providers:
  - id: file://./agent-skill-evals/agent.js
    config:
      command: codex
      args:
        - exec
        - --json
        - --full-auto

tests:
  - description: fixes login redirect locally
    vars:
      prompt: Fix the login redirect bug.
      fixture: ./fixtures/login-bug
      preconditions:
        - verifier.fails:
            run: ./verify_login_redirect.sh
      should:
        - verifier.succeeds:
            run: ./verify_login_redirect.sh
        - file.contains:
            path: app.js
            text: /dashboard
      should_not:
        - file.changes_outside_scope:
            scope:
              - app.js
    assert:
      - type: javascript
        metric: skill.test
        value: file://./agent-skill-evals/assertions.js
        config:
          metric: skill.test
```

Run it with Promptfoo:

```bash
promptfoo eval -c promptfoo.codex.yaml
```
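
The `preconditions` and `should` entries in the config above pair a verifier that fails before the agent runs with the same verifier succeeding afterwards. As a rough illustration only (not the package's code), that idea can be sketched in Node.js. `runVerifier` and `meetsExpectation` are hypothetical helper names, and the sketch assumes a verifier counts as succeeding when its process exits with status 0.

```javascript
const { spawnSync } = require("node:child_process");

// Run a verifier command inside `cwd` and record how it exited.
// Assumption: exit status 0 means the verifier succeeded.
function runVerifier(command, args, cwd) {
  const result = spawnSync(command, args, { cwd, encoding: "utf8" });
  return {
    ok: result.status === 0,
    status: result.status,
    stdout: result.stdout,
    stderr: result.stderr,
  };
}

// Compare a verifier outcome with the expectation named in the config.
function meetsExpectation(expectation, outcome) {
  if (expectation === "verifier.fails") return !outcome.ok;
  if (expectation === "verifier.succeeds") return outcome.ok;
  throw new Error(`unsupported expectation: ${expectation}`);
}
```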

Agent Skill Evals copies `fixture` before the agent runs. The original sample project
stays clean, and `skill.test` checks the recorded evidence: changed files,
command results, tool calls, final output, usage, and run details.
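
The copy-then-compare flow described above can be sketched with Node.js built-ins. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names (`copyFixture`, `snapshot`, `changedFiles`, `changesOutsideScope`) are hypothetical, changes are detected by comparing SHA-256 content hashes, and scope entries are matched as exact relative paths.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Copy the fixture into a fresh temp directory so the original stays untouched.
function copyFixture(fixtureDir) {
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "skill-eval-"));
  fs.cpSync(fixtureDir, workDir, { recursive: true });
  return workDir;
}

// Map each file's relative path (with "/" separators) to a content hash.
function snapshot(root, dir = root, out = new Map()) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      snapshot(root, full, out);
    } else if (entry.isFile()) {
      const hash = crypto.createHash("sha256").update(fs.readFileSync(full)).digest("hex");
      out.set(path.relative(root, full).split(path.sep).join("/"), hash);
    }
  }
  return out;
}

// List files that were added, removed, or modified between two snapshots.
function changedFiles(before, after) {
  const paths = new Set([...before.keys(), ...after.keys()]);
  return [...paths].filter((p) => before.get(p) !== after.get(p)).sort();
}

// Changed files that are not listed in `scope` (exact path match only).
function changesOutsideScope(changed, scope) {
  return changed.filter((p) => !scope.includes(p));
}
```

In this sketch, a snapshot taken right after `copyFixture` and another taken after the agent finishes give the list of changed files, and `changesOutsideScope(changed, ["app.js"])` mirrors the intent of the `file.changes_outside_scope` entry in the example config.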

## Docs

- [Getting Started](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/getting-started.md)
- [Promptfoo Setup](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/promptfoo-setup.md)
- [Runtime Checks](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/runtime-checks.md)
- [Brand Deck Example](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/examples/brand-deck-skill.md)
- [Metrics](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/metrics.md)

package/dist/agent/index.d.mts
ADDED
@@ -0,0 +1,3 @@
import { i as EvidenceSnapshot } from "../internal-services-DbsekQ_K.mjs";
import { i as evidenceFromSnapshot, n as AgentSkillEvalsProviderMetadata, r as EvidenceCollector, t as AgentSkillEvalsProvider } from "../index-4l7TCFny.mjs";
export { AgentSkillEvalsProvider, AgentSkillEvalsProvider as default, AgentSkillEvalsProviderMetadata, EvidenceCollector, EvidenceSnapshot, evidenceFromSnapshot };