agent-skill-evals 0.1.0

package/README.md ADDED
@@ -0,0 +1,155 @@
1
+ # agent-skill-evals
2
+
3
+ Agent Skill Evals helps you test agent skills with [Promptfoo](https://www.promptfoo.dev/).
4
+
5
+ ## Install
6
+
7
+ ```bash
8
+ pnpm add -D promptfoo agent-skill-evals
9
+ ```
10
+
11
+ Use it to:
12
+
13
+ 1. Check a `SKILL.md` file and its tests before you run an agent.
14
+ 2. Copy a sample project, run an agent in the copy, save evidence, and check what changed.
15
+
16
+ Agent Skill Evals is a Promptfoo plugin package. Promptfoo is the eval runner:
17
+ it reads the YAML configs, runs providers, and calls assertions. Keep using
18
+ `promptfoo eval`; Agent Skill Evals adds skill-focused providers and assertions
19
+ that Promptfoo can load from `file://` paths.
20
+
21
+ ## Add Agent Skill Evals To Promptfoo
22
+
23
+ Create an `agent-skill-evals/` folder in your project and add small loader files:
24
+
25
+ ```js
26
+ // agent-skill-evals/agent.js
27
+ export { default } from "agent-skill-evals/agent";
28
+ ```
29
+
30
+ ```js
31
+ // agent-skill-evals/skill-checks.js
32
+ export { default } from "agent-skill-evals/skill-checks";
33
+ ```
34
+
35
+ ```js
36
+ // agent-skill-evals/assertions.js
37
+ export { default } from "agent-skill-evals/assertions";
38
+ export * from "agent-skill-evals/assertions";
39
+ ```
40
+
41
+ Use these files from normal Promptfoo configs with `file://./agent-skill-evals/...`.
42
+
43
+ ## Entrypoints
44
+
45
+ The package uses Promptfoo-facing subpaths:
46
+
47
+ - `agent-skill-evals/agent`
48
+ - `agent-skill-evals/skill-checks`
49
+ - `agent-skill-evals/assertions`
50
+
51
+ There is no root import from `agent-skill-evals`.
52
+
53
+ ## Metrics
54
+
55
+ - `skill.checks` checks a skill file and its tests before an agent runs.
56
+ - `skill.test` checks evidence after an agent run.
57
+ - `skill.budget` checks real-agent token usage after an agent run.
58
+ - `skill.activation`, `skill.budgets`, `skill.context`, `skill.instructions`, `skill.tests`, and `skill.verifiers` run focused Skill Checks.
59
+
60
+ ## Example: Check A Skill Before Runtime
61
+
62
+ Use `skill.checks` to check that a `SKILL.md` file and its tests are ready to
63
+ run:
64
+
65
+ ```yaml
66
+ description: Skill checks
67
+
68
+ prompts:
69
+ - "skill-check"
70
+
71
+ providers:
72
+ - id: file://./agent-skill-evals/skill-checks.js
73
+
74
+ tests:
75
+ - description: bugfix skill checks
76
+ vars:
77
+ skillPath: ./skills/bugfix-workflow
78
+ testsGlob: ./tests/bugfix-workflow.yaml
79
+ assert:
80
+ - type: javascript
81
+ metric: skill.checks
82
+ value: file://./agent-skill-evals/assertions.js
83
+ config:
84
+ metric: skill.checks
85
+ ```
86
+
87
+ Save the config as `promptfoo.skill-checks.yaml` and run it with Promptfoo:
88
+
89
+ ```bash
90
+ promptfoo eval -c promptfoo.skill-checks.yaml
91
+ ```
92
+
93
+ ## Example: Test A Skill On A Copied Project
94
+
95
+ Use `skill.test` when you want the agent to work on a copied fixture and then
96
+ check the evidence from that run:
97
+
98
+ ```yaml
99
+ description: Runtime skill test
100
+
101
+ prompts:
102
+ - "{{prompt}}"
103
+
104
+ providers:
105
+ - id: file://./agent-skill-evals/agent.js
106
+ config:
107
+ command: codex
108
+ args:
109
+ - exec
110
+ - --json
111
+ - --full-auto
112
+
113
+ tests:
114
+ - description: fixes login redirect locally
115
+ vars:
116
+ prompt: Fix the login redirect bug.
117
+ fixture: ./fixtures/login-bug
118
+ preconditions:
119
+ - verifier.fails:
120
+ run: ./verify_login_redirect.sh
121
+ should:
122
+ - verifier.succeeds:
123
+ run: ./verify_login_redirect.sh
124
+ - file.contains:
125
+ path: app.js
126
+ text: /dashboard
127
+ should_not:
128
+ - file.changes_outside_scope:
129
+ scope:
130
+ - app.js
131
+ assert:
132
+ - type: javascript
133
+ metric: skill.test
134
+ value: file://./agent-skill-evals/assertions.js
135
+ config:
136
+ metric: skill.test
137
+ ```
138
+
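The `preconditions` and `should` entries in the config above pair a `verifier.fails` check with a `verifier.succeeds` check on the same script. One plausible reading is that the verifier must fail before the agent runs and pass afterwards, so that a passing run proves the agent fixed something. The sketch below illustrates that idea only. It assumes a verifier "succeeds" when its process exits with code 0, and `verifierPasses` and `judgeRun` are hypothetical helper names, not part of the agent-skill-evals API.

```javascript
// Sketch only: these helpers are hypothetical and are not exported by agent-skill-evals.
const { spawnSync } = require("node:child_process");

// Run a verifier command inside `cwd` and report whether it exited with code 0.
// Assumption: exit code 0 means "succeeds", anything else means "fails".
function verifierPasses(command, args, cwd) {
  const result = spawnSync(command, args, { cwd, encoding: "utf8" });
  return result.status === 0;
}

// Combine the before/after verifier results into one verdict.
// A bug-fix test is only meaningful if the verifier failed before the agent ran.
function judgeRun(passedBefore, passedAfter) {
  if (passedBefore) {
    return { pass: false, reason: "precondition not met: verifier already passes" };
  }
  if (!passedAfter) {
    return { pass: false, reason: "verifier still fails after the agent run" };
  }
  return { pass: true, reason: "verifier went from failing to passing" };
}
```

In a real run, `verifierPasses` would be called once on the copied fixture before the agent starts and once after it finishes, and both results would be passed to `judgeRun`.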
139
+ Save the config as `promptfoo.codex.yaml` and run it with Promptfoo:
140
+
141
+ ```bash
142
+ promptfoo eval -c promptfoo.codex.yaml
143
+ ```
144
+
145
+ Agent Skill Evals copies `fixture` before the agent runs. The original sample project
146
+ stays clean, and `skill.test` checks the recorded evidence: changed files,
147
+ command results, tool calls, final output, usage, and run details.
148
+
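The copy-then-compare flow described above can be sketched with Node's standard library. This is a minimal illustration of the idea, not the package's implementation: `copyFixture`, `snapshot`, `changedFiles`, and `changesOutsideScope` are hypothetical names, and the sketch assumes that hashing file contents before and after the run is an acceptable way to detect changed files.

```javascript
// Sketch only: these helpers are hypothetical and are not exported by agent-skill-evals.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const crypto = require("node:crypto");

// Copy the fixture into a fresh temp directory so the original stays untouched.
function copyFixture(fixtureDir) {
  const workDir = fs.mkdtempSync(path.join(os.tmpdir(), "skill-run-"));
  fs.cpSync(fixtureDir, workDir, { recursive: true });
  return workDir;
}

// Map each file path (relative to `root`) to a SHA-256 hash of its contents.
function snapshot(root, dir = root, out = new Map()) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      snapshot(root, full, out);
    } else if (entry.isFile()) {
      const hash = crypto.createHash("sha256").update(fs.readFileSync(full)).digest("hex");
      out.set(path.relative(root, full), hash);
    }
  }
  return out;
}

// List files that were added, removed, or modified between two snapshots.
function changedFiles(before, after) {
  const names = new Set([...before.keys(), ...after.keys()]);
  return [...names].filter((name) => before.get(name) !== after.get(name)).sort();
}

// Keep only the changed files that are not listed in the allowed scope.
function changesOutsideScope(changed, scope) {
  return changed.filter((name) => !scope.includes(name));
}
```

A run would take a snapshot of the copy before the agent starts and another after it finishes. An empty result from `changesOutsideScope(changed, ["app.js"])` would correspond to the `file.changes_outside_scope` expectation in the example config holding.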
149
+ ## Docs
150
+
151
+ - [Getting Started](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/getting-started.md)
152
+ - [Promptfoo Setup](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/promptfoo-setup.md)
153
+ - [Runtime Checks](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/runtime-checks.md)
154
+ - [Brand Deck Example](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/examples/brand-deck-skill.md)
155
+ - [Metrics](https://github.com/akshay5995/agent-skill-evals/blob/main/docs/guide/metrics.md)
@@ -0,0 +1,3 @@
1
+ import { i as EvidenceSnapshot } from "../internal-services-DbsekQ_K.mjs";
2
+ import { i as evidenceFromSnapshot, n as AgentSkillEvalsProviderMetadata, r as EvidenceCollector, t as AgentSkillEvalsProvider } from "../index-4l7TCFny.mjs";
3
+ export { AgentSkillEvalsProvider, AgentSkillEvalsProvider as default, AgentSkillEvalsProviderMetadata, EvidenceCollector, EvidenceSnapshot, evidenceFromSnapshot };
@@ -0,0 +1,2 @@
1
+ import { i as evidenceFromSnapshot, r as EvidenceCollector, t as AgentSkillEvalsProvider } from "../agent-CM7fIL_C.mjs";
2
+ export { AgentSkillEvalsProvider, AgentSkillEvalsProvider as default, EvidenceCollector, evidenceFromSnapshot };