@probelabs/visor 0.1.174-ee → 0.1.175-ee

Files changed (51)
  1. package/README.md +3 -2
  2. package/dist/cli-main.d.ts.map +1 -1
  3. package/dist/docs/guides/tdd-assistant-workflows.md +519 -0
  4. package/dist/docs/testing/dsl-reference.md +93 -0
  5. package/dist/examples/lifecycle-hooks.tests.yaml +62 -0
  6. package/dist/generated/config-schema.d.ts +28 -7
  7. package/dist/generated/config-schema.d.ts.map +1 -1
  8. package/dist/generated/config-schema.json +31 -7
  9. package/dist/index.js +330 -25
  10. package/dist/providers/ai-check-provider.d.ts.map +1 -1
  11. package/dist/providers/mcp-custom-sse-server.d.ts +4 -0
  12. package/dist/providers/mcp-custom-sse-server.d.ts.map +1 -1
  13. package/dist/sdk/{check-provider-registry-53C2ZIXJ.mjs → check-provider-registry-K34RCO6G.mjs} +3 -3
  14. package/dist/sdk/{check-provider-registry-UPQNHHFF.mjs → check-provider-registry-O36CQEGD.mjs} +3 -3
  15. package/dist/sdk/{chunk-GKSSG5IM.mjs → chunk-4Z6HTWGJ.mjs} +153 -14
  16. package/dist/sdk/chunk-4Z6HTWGJ.mjs.map +1 -0
  17. package/dist/sdk/{chunk-2PL2YH3B.mjs → chunk-FZPCP444.mjs} +153 -14
  18. package/dist/sdk/chunk-FZPCP444.mjs.map +1 -0
  19. package/dist/sdk/{chunk-W4KCJM6J.mjs → chunk-MLXGCLZJ.mjs} +29 -8
  20. package/dist/sdk/chunk-MLXGCLZJ.mjs.map +1 -0
  21. package/dist/sdk/{config-BVL3KFMB.mjs → config-4JMBJKWS.mjs} +2 -2
  22. package/dist/sdk/{schedule-tool-5KDBDCFO.mjs → schedule-tool-XOXKUW5G.mjs} +3 -3
  23. package/dist/sdk/{schedule-tool-UMDRCNO5.mjs → schedule-tool-XVSYLH4Z.mjs} +3 -3
  24. package/dist/sdk/{schedule-tool-handler-5EPTHBLS.mjs → schedule-tool-handler-3I6AZ4N7.mjs} +3 -3
  25. package/dist/sdk/{schedule-tool-handler-MUF5V36L.mjs → schedule-tool-handler-CFMFHDUL.mjs} +3 -3
  26. package/dist/sdk/sdk.d.mts +9 -1
  27. package/dist/sdk/sdk.d.ts +9 -1
  28. package/dist/sdk/sdk.js +172 -12
  29. package/dist/sdk/sdk.js.map +1 -1
  30. package/dist/sdk/sdk.mjs +2 -2
  31. package/dist/sdk/{workflow-check-provider-EWMZEEES.mjs → workflow-check-provider-ETM452BO.mjs} +3 -3
  32. package/dist/sdk/{workflow-check-provider-RQUCBAYY.mjs → workflow-check-provider-EV6VCG7M.mjs} +3 -3
  33. package/dist/test-runner/conversation-sugar.d.ts.map +1 -1
  34. package/dist/test-runner/index.d.ts +19 -0
  35. package/dist/test-runner/index.d.ts.map +1 -1
  36. package/dist/test-runner/validator.d.ts.map +1 -1
  37. package/dist/types/config.d.ts +9 -1
  38. package/dist/types/config.d.ts.map +1 -1
  39. package/package.json +1 -1
  40. package/dist/sdk/chunk-2PL2YH3B.mjs.map +0 -1
  41. package/dist/sdk/chunk-GKSSG5IM.mjs.map +0 -1
  42. package/dist/sdk/chunk-W4KCJM6J.mjs.map +0 -1
  43. package/dist/sdk/{check-provider-registry-53C2ZIXJ.mjs.map → check-provider-registry-K34RCO6G.mjs.map} +0 -0
  44. package/dist/sdk/{check-provider-registry-UPQNHHFF.mjs.map → check-provider-registry-O36CQEGD.mjs.map} +0 -0
  45. package/dist/sdk/{config-BVL3KFMB.mjs.map → config-4JMBJKWS.mjs.map} +0 -0
  46. package/dist/sdk/{schedule-tool-5KDBDCFO.mjs.map → schedule-tool-XOXKUW5G.mjs.map} +0 -0
  47. package/dist/sdk/{schedule-tool-UMDRCNO5.mjs.map → schedule-tool-XVSYLH4Z.mjs.map} +0 -0
  48. package/dist/sdk/{schedule-tool-handler-5EPTHBLS.mjs.map → schedule-tool-handler-3I6AZ4N7.mjs.map} +0 -0
  49. package/dist/sdk/{schedule-tool-handler-MUF5V36L.mjs.map → schedule-tool-handler-CFMFHDUL.mjs.map} +0 -0
  50. package/dist/sdk/{workflow-check-provider-EWMZEEES.mjs.map → workflow-check-provider-ETM452BO.mjs.map} +0 -0
  51. package/dist/sdk/{workflow-check-provider-RQUCBAYY.mjs.map → workflow-check-provider-EV6VCG7M.mjs.map} +0 -0
package/README.md CHANGED
@@ -34,6 +34,7 @@ Visor is an open-source workflow engine that lets you define multi-step AI pipel
34
34
  | **Chat assistant / Bot** | [Bot Integrations](docs/bot-integrations.md) | [teams-assistant.yaml](examples/teams-assistant.yaml) |
35
35
  | **Run shell commands + AI** | [Command Provider](docs/command-provider.md) | [ai-with-bash.yaml](examples/ai-with-bash.yaml) |
36
36
  | **Connect MCP tools** | [MCP Provider](docs/mcp-provider.md) | [mcp-provider-example.yaml](examples/mcp-provider-example.yaml) |
37
+ | **Add API integrations (TDD)** | [Guide: TDD Assistant Workflows](docs/guides/tdd-assistant-workflows.md) | [workable.tests.yaml](https://github.com/TykTechnologies/REFINE/blob/main/Oel/tests/workable.tests.yaml) |
37
38
 
38
39
  > **First time?** Run `npx visor init` to scaffold a working config, then `npx visor` to run it.
39
40
 
@@ -774,7 +775,7 @@ Learn more: [docs/enterprise-policy.md](docs/enterprise-policy.md)
774
775
  [Configuration](docs/configuration.md) · [AI config](docs/ai-configuration.md) · [CLI commands](docs/commands.md) · [GitHub Auth](docs/github-auth.md) · [CI/CLI mode](docs/ci-cli-mode.md) · [GitHub Action reference](docs/action-reference.md) · [Migration](docs/migration.md) · [FAQ](docs/faq.md) · [Glossary](docs/glossary.md)
775
776
 
776
777
  **Guides:**
777
- [Tools & Toolkits](docs/tools-and-toolkits.md) · [Assistant workflows](docs/assistant-workflows.md) · [Workflow creation](docs/workflow-creation-guide.md) · [Workflow style guide](docs/guides/workflow-style-guide.md) · [Dependencies](docs/dependencies.md) · [forEach propagation](docs/foreach-dependency-propagation.md) · [Failure routing](docs/failure-routing.md) · [Router patterns](docs/router-patterns.md) · [Lifecycle hooks](docs/lifecycle-hooks.md) · [Liquid templates](docs/liquid-templates.md) · [Schema-template system](docs/schema-templates.md) · [Fail conditions](docs/fail-if.md) · [Failure conditions schema](docs/failure-conditions-schema.md) · [Failure conditions impl](docs/failure-conditions-implementation.md) · [Timeouts](docs/timeouts.md) · [Execution limits](docs/limits.md) · [Event triggers](docs/event-triggers.md) · [Output formats](docs/output-formats.md) · [Output formatting](docs/output-formatting.md) · [Default output schema](docs/default-output-schema.md) · [Output history](docs/output-history.md) · [Reusable workflows](docs/workflows.md) · [Criticality modes](docs/guides/criticality-modes.md) · [Fault management](docs/guides/fault-management-and-contracts.md)
778
+ [Tools & Toolkits](docs/tools-and-toolkits.md) · [Assistant workflows](docs/assistant-workflows.md) · [TDD for assistant workflows](docs/guides/tdd-assistant-workflows.md) · [Workflow creation](docs/workflow-creation-guide.md) · [Workflow style guide](docs/guides/workflow-style-guide.md) · [Dependencies](docs/dependencies.md) · [forEach propagation](docs/foreach-dependency-propagation.md) · [Failure routing](docs/failure-routing.md) · [Router patterns](docs/router-patterns.md) · [Lifecycle hooks](docs/lifecycle-hooks.md) · [Liquid templates](docs/liquid-templates.md) · [Schema-template system](docs/schema-templates.md) · [Fail conditions](docs/fail-if.md) · [Failure conditions schema](docs/failure-conditions-schema.md) · [Failure conditions impl](docs/failure-conditions-implementation.md) · [Timeouts](docs/timeouts.md) · [Execution limits](docs/limits.md) · [Event triggers](docs/event-triggers.md) · [Output formats](docs/output-formats.md) · [Output formatting](docs/output-formatting.md) · [Default output schema](docs/default-output-schema.md) · [Output history](docs/output-history.md) · [Reusable workflows](docs/workflows.md) · [Criticality modes](docs/guides/criticality-modes.md) · [Fault management](docs/guides/fault-management-and-contracts.md)
778
779
 
779
780
  **Providers:**
780
781
  [A2A](docs/a2a-provider.md) · [Command](docs/command-provider.md) · [Script](docs/script.md) · [MCP](docs/mcp-provider.md) · [MCP tools for AI](docs/mcp.md) · [Claude Code](docs/claude-code.md) · [AI custom tools](docs/ai-custom-tools.md) · [AI custom tools usage](docs/ai-custom-tools-usage.md) · [Custom tools](docs/custom-tools.md) · [GitHub ops](docs/github-ops.md) · [Git checkout](docs/providers/git-checkout.md) · [HTTP integration](docs/http.md) · [Memory](docs/memory.md) · [Human input](docs/human-input-provider.md) · [Custom providers](docs/pluggable.md)
@@ -783,7 +784,7 @@ Learn more: [docs/enterprise-policy.md](docs/enterprise-policy.md)
783
784
  [Security](docs/security.md) · [Performance](docs/performance.md) · [Observability](docs/observability.md) · [Debugging](docs/debugging.md) · [Debug visualizer](docs/debug-visualizer.md) · [Telemetry setup](docs/telemetry-setup.md) · [Dashboards](docs/dashboards/README.md) · [Troubleshooting](docs/troubleshooting.md) · [Suppressions](docs/suppressions.md) · [GitHub checks](docs/GITHUB_CHECKS.md) · [Bot integrations](docs/bot-integrations.md) · [Slack](docs/slack-integration.md) · [Telegram](docs/telegram-integration.md) · [Email](docs/email-integration.md) · [WhatsApp](docs/whatsapp-integration.md) · [Teams](docs/teams-integration.md) · [Scheduler](docs/scheduler.md) · [Sandbox engines](docs/sandbox-engines.md)
784
785
 
785
786
  **Testing:**
786
- [Getting started](docs/testing/getting-started.md) · [DSL reference](docs/testing/dsl-reference.md) · [Flows](docs/testing/flows.md) · [Fixtures & mocks](docs/testing/fixtures-and-mocks.md) · [Assertions](docs/testing/assertions.md) · [Cookbook](docs/testing/cookbook.md) · [CLI & reporters](docs/testing/cli.md) · [CI integration](docs/testing/ci.md) · [Troubleshooting](docs/testing/troubleshooting.md)
787
+ [Getting started](docs/testing/getting-started.md) · [DSL reference](docs/testing/dsl-reference.md) · [Flows](docs/testing/flows.md) · [Fixtures & mocks](docs/testing/fixtures-and-mocks.md) · [Assertions](docs/testing/assertions.md) · [Cookbook](docs/testing/cookbook.md) · [TDD for assistants](docs/guides/tdd-assistant-workflows.md) · [CLI & reporters](docs/testing/cli.md) · [CI integration](docs/testing/ci.md) · [Troubleshooting](docs/testing/troubleshooting.md)
787
788
 
788
789
  **Enterprise:**
789
790
  [Licensing](docs/licensing.md) · [Enterprise policy](docs/enterprise-policy.md) · [Scheduler storage](docs/scheduler-storage.md) · [Database operations](docs/database-operations.md) · [Capacity planning](docs/capacity-planning.md) · [Production deployment](docs/production-deployment.md) · [Deployment](docs/DEPLOYMENT.md)
package/dist/cli-main.d.ts.map CHANGED
@@ -1 +1 @@
1
- {"version":3,"file":"","sourceRoot":"","sources":["file:///home/runner/work/visor/visor/src/cli-main.ts"],"names":[],"mappings":"AAirCA;;GAEG;AACH,wBAAsB,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,CA4gE1C"}
1
+ {"version":3,"file":"","sourceRoot":"","sources":["file:///home/runner/work/visor/visor/src/cli-main.ts"],"names":[],"mappings":"AAkrCA;;GAEG;AACH,wBAAsB,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,CA4gE1C"}
package/dist/docs/guides/tdd-assistant-workflows.md ADDED
@@ -0,0 +1,519 @@
1
+ # Test-Driven Development for Assistant Workflows
2
+
3
+ Build and iterate on AI assistant workflows using visor's test framework. This guide covers the full cycle: define your workflow, write tests with mocks, then run against real AI to iterate on prompt quality and assertions.
4
+
5
+ ## The TDD Cycle
6
+
7
+ 1. **Define the workflow** — skills, tools, knowledge, intents
8
+ 2. **Write tests with mocks** — expected conversations and assertions
9
+ 3. **Run with mocks** — validate structure, routing, and assertion logic
10
+ 4. **Run with `--no-mocks`** — real AI + real tools, iterate on quality
11
+ 5. **Refine** — fix prompts, relax over-strict assertions, improve knowledge
12
+
13
+ ## Setting Up
14
+
15
+ ### Project Structure
16
+
17
+ ```
18
+ my-assistant/
19
+ ├── assistant.yaml # main workflow config
20
+ ├── config/
21
+ │   └── skills.yaml # skill definitions
22
+ ├── docs/
23
+ │   └── api-reference.md # knowledge docs for skills
24
+ ├── tests/
25
+ │   └── skills.tests.yaml # test file
26
+ └── .env # API tokens (not committed)
27
+ ```
28
+
29
+ ### Test File Basics
30
+
31
+ Test files extend your main config:
32
+
33
+ ```yaml
34
+ # tests/skills.tests.yaml
35
+ version: "1.0"
36
+ extends: "../assistant.yaml"
37
+
38
+ tests:
39
+   defaults:
40
+     strict: false # required for --no-mocks (internal steps run without expectations)
41
+
42
+   cases:
43
+     - name: basic-question
44
+       conversation:
45
+         routing: { max_loops: 2 }
46
+         turns:
47
+           - role: user
48
+             text: "What services do we run?"
49
+             mocks:
50
+               chat:
51
+                 text: "We run 3 services: API gateway, dashboard, and pump."
52
+                 intent: chat
53
+                 skills: []
54
+             expect:
55
+               outputs:
56
+                 - step: chat
57
+                   path: text
58
+                   matches: "(?i)gateway|dashboard|pump"
59
+ ```
60
+
61
+ Key fields:
62
+ - **`extends`** — imports the full workflow so the test runner knows all steps, skills, and routing
63
+ - **`strict: false`** — prevents failures from internal steps (routing, config building) that don't have assertions
64
+ - **`conversation`** — sugar syntax that auto-expands turns into flow stages with message history
65
+
66
+ ## Writing Conversation Tests
67
+
68
+ ### Single-Turn Test
69
+
70
+ The simplest test: one user message, assert on the response.
71
+
72
+ ```yaml
73
+ - name: greeting
74
+   conversation:
75
+     routing: { max_loops: 2 }
76
+     turns:
77
+       - role: user
78
+         text: "Hello, who are you?"
79
+         mocks:
80
+           chat:
81
+             text: "I'm your engineering assistant. I can help with code, deployments, and more."
82
+             intent: chat
83
+             skills: []
84
+         expect:
85
+           calls:
86
+             - step: chat
87
+               exactly: 1
88
+           outputs:
89
+             - step: chat
90
+               path: text
91
+               matches: "(?i)assistant|help"
92
+ ```
93
+
94
+ ### Multi-Turn Conversation
95
+
96
+ Each turn's history automatically includes previous turns. Mock response text becomes assistant messages in subsequent turns.
97
+
98
+ ```yaml
99
+ - name: code-explore-then-explain
100
+   conversation:
101
+     routing: { max_loops: 2 }
102
+     turns:
103
+       - role: user
104
+         text: "Find the authentication middleware in the backend service"
105
+         mocks:
106
+           chat:
107
+             text: "Found `auth.go` in `internal/middleware/`. It checks JWT tokens."
108
+             intent: chat
109
+             skills: [code-explorer]
110
+         expect:
111
+           outputs:
112
+             - step: chat
113
+               path: text
114
+               matches: "(?i)auth|middleware"
115
+
116
+       - role: user
117
+         text: "Explain how the JWT validation works in that auth middleware you found"
118
+         mocks:
119
+           chat:
120
+             text: "The middleware extracts the Bearer token, validates the signature..."
121
+             intent: chat
122
+             skills: [code-explorer]
123
+         expect:
124
+           llm_judge:
125
+             - step: chat
126
+               turn: current
127
+               path: text
128
+               prompt: "Does the response explain JWT validation with technical details?"
129
+               schema:
130
+                 properties:
131
+                   explains_jwt:
132
+                     type: boolean
133
+                 required: [explains_jwt]
134
+               assert:
135
+                 explains_jwt: true
136
+ ```
137
+
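Annotation (not part of the package diff): the history rule described above can be illustrated with a minimal TypeScript sketch. The `Message` and `Turn` types and the `buildHistories` function are hypothetical names chosen for illustration, not visor's implementation. The only behavior taken from the guide is that each turn sees all previous turns and that the mocked `chat.text` becomes the assistant message for later turns.

```typescript
// Hypothetical types for illustration; not visor's internal API.
type Message = { role: "user" | "assistant"; text: string };
type Turn = { text: string; mockText?: string };

// For each user turn, return the message history the chat step would see:
// all earlier user/assistant messages plus the current user message.
function buildHistories(turns: Turn[]): Message[][] {
  const histories: Message[][] = [];
  const soFar: Message[] = [];
  for (const turn of turns) {
    soFar.push({ role: "user", text: turn.text });
    histories.push(soFar.slice());
    // The mocked response becomes the assistant message for later turns.
    if (turn.mockText !== undefined) {
      soFar.push({ role: "assistant", text: turn.mockText });
    }
  }
  return histories;
}

const histories = buildHistories([
  { text: "Find the authentication middleware", mockText: "Found `auth.go`." },
  { text: "Explain how the JWT validation works" },
]);
console.log(histories[1].map((m) => m.role).join(",")); // prints: user,assistant,user
```

The second turn's history contains the first user message, the mocked assistant reply, and the new user message, which is why fictional but plausible mock text matters for follow-up turns.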
138
+ ### Mock Response Format
139
+
140
+ Mocks simulate what the `chat` step returns:
141
+
142
+ ```yaml
143
+ mocks:
144
+   chat:
145
+     text: "The response text..."
146
+     intent: chat # which intent was classified
147
+     skills: [code-explorer] # which skills were activated
148
+ ```
149
+
150
+ Use **fictional data** in mocks. The mock text becomes the assistant message in subsequent turn history.
151
+
152
+ ## Assertion Types
153
+
154
+ ### Regex Matching (`outputs`)
155
+
156
+ Pattern-match on response fields:
157
+
158
+ ```yaml
159
+ expect:
160
+   outputs:
161
+     - step: chat
162
+       path: text
163
+       matches: "(?i)kubernetes|k8s"
164
+     - step: chat
165
+       turn: current # only check this turn's output
166
+       path: text
167
+       matches: "(?i)deploy"
168
+ ```
169
+
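Annotation (not part of the package diff): the patterns in this guide start with `(?i)` to request case-insensitive matching. JavaScript's `RegExp` does not accept a bare leading `(?i)` group, so a runner written in JavaScript has to translate it somehow. How visor does this is not shown in this diff. The sketch below is one possible approach under that assumption, and `compilePattern` and `matches` are hypothetical helper names.

```typescript
// Hypothetical helper: convert a pattern with an optional leading "(?i)"
// into a JavaScript RegExp that uses the equivalent "i" flag.
function compilePattern(pattern: string): RegExp {
  const prefix = "(?i)";
  if (pattern.slice(0, prefix.length) === prefix) {
    return new RegExp(pattern.slice(prefix.length), "i");
  }
  return new RegExp(pattern);
}

// True when the pattern matches anywhere in the text.
function matches(text: string, pattern: string): boolean {
  return compilePattern(pattern).test(text);
}

console.log(matches("Deployed to Kubernetes", "(?i)kubernetes|k8s")); // prints: true
console.log(matches("Deployed to Kubernetes", "kubernetes|k8s")); // prints: false
```

The second call fails only because of letter case, which is the situation the `(?i)` prefix in the examples above is guarding against when real AI output varies in capitalization.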
170
+ ### Call Counting (`calls`)
171
+
172
+ Assert how many times a step ran:
173
+
174
+ ```yaml
175
+ expect:
176
+   calls:
177
+     - step: chat
178
+       exactly: 1
179
+ ```
180
+
181
+ ### LLM Judge (`llm_judge`)
182
+
183
+ Semantic evaluation — assert on meaning, not exact text. This is the most powerful assertion for AI responses:
184
+
185
+ ```yaml
186
+ expect:
187
+   llm_judge:
188
+     - step: chat
189
+       turn: current
190
+       path: text
191
+       prompt: |
192
+         Does the response provide a clear architectural overview
193
+         with component names and their responsibilities?
194
+       schema:
195
+         properties:
196
+           names_components:
197
+             type: boolean
198
+             description: "Names specific components or services?"
199
+           explains_responsibilities:
200
+             type: boolean
201
+             description: "Explains what each component does?"
202
+         required: [names_components, explains_responsibilities]
203
+       assert:
204
+         names_components: true
205
+         explains_responsibilities: true
206
+ ```
207
+
208
+ The judge returns a JSON object matching your schema. `assert` checks specific fields. You can include fields in the schema for observability without asserting them.
209
+
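Annotation (not part of the package diff): the split between `schema` (what the judge reports) and `assert` (what the test enforces) can be sketched as a small comparison. `JudgeResult` and `failedAssertions` are hypothetical names for illustration. The sketch assumes asserted values are primitives compared with strict equality, which covers the boolean fields used in this guide but is not confirmed as visor's comparison rule.

```typescript
// Hypothetical shape: the JSON object returned by the judge, and the
// contents of the `assert` block, both keyed by schema field name.
type JudgeResult = { [field: string]: unknown };

// Return the names of asserted fields whose judged value differs.
// Fields present in the result but absent from `expected` are ignored,
// so they stay visible in the judge output without gating the test.
function failedAssertions(result: JudgeResult, expected: JudgeResult): string[] {
  return Object.keys(expected).filter((field) => result[field] !== expected[field]);
}

const judged: JudgeResult = { names_components: true, explains_responsibilities: false };

// Only names_components is asserted, so the weak explanation is tolerated.
console.log(JSON.stringify(failedAssertions(judged, { names_components: true }))); // prints: []

// Asserting both fields surfaces the failing one.
console.log(
  JSON.stringify(
    failedAssertions(judged, { names_components: true, explains_responsibilities: true })
  )
); // prints: ["explains_responsibilities"]
```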
210
+ ### Cross-Turn Assertions
211
+
212
+ Reference previous turns using `turn: N` (1-based):
213
+
214
+ ```yaml
215
+ # In turn 2, verify turn 1 was good too
216
+ expect:
217
+   llm_judge:
218
+     - step: chat
219
+       turn: current
220
+       path: text
221
+       prompt: "Does turn 2 build on the context from turn 1?"
222
+       ...
223
+     - step: chat
224
+       turn: 1
225
+       path: text
226
+       prompt: "Did turn 1 provide a good foundation?"
227
+       ...
228
+ ```
229
+
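Annotation (not part of the package diff): because `turn: N` is 1-based while arrays in most languages are 0-based, an off-by-one mistake is easy to make when reasoning about cross-turn assertions. The sketch below shows the mapping. `TurnRef` and `resolveTurnIndex` are hypothetical names, and rejecting references to turns later than the current one is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical helper, not visor's API: map a `turn` reference to a
// zero-based index into the list of per-turn outputs.
type TurnRef = number | "current";

function resolveTurnIndex(turn: TurnRef, currentTurn: number): number {
  // `currentTurn` is the 1-based number of the turn holding the assertion.
  const target = turn === "current" ? currentTurn : turn;
  if (target % 1 !== 0 || target < 1 || target > currentTurn) {
    throw new RangeError("turn must be a whole number between 1 and " + currentTurn);
  }
  return target - 1;
}

const outputs = ["turn 1 answer", "turn 2 answer"];
console.log(outputs[resolveTurnIndex(1, 2)]); // prints: turn 1 answer
console.log(outputs[resolveTurnIndex("current", 2)]); // prints: turn 2 answer
```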
230
+ ## Running Tests
231
+
232
+ ```bash
233
+ # Validate structure (no AI calls)
234
+ visor test --config tests/skills.tests.yaml
235
+
236
+ # Run a single case
237
+ visor test --config tests/skills.tests.yaml --case basic-question
238
+
239
+ # Run with real AI and real tools
240
+ visor test --config tests/skills.tests.yaml --no-mocks
241
+
242
+ # Real AI, single case
243
+ visor test --config tests/skills.tests.yaml --case basic-question --no-mocks
244
+
245
+ # Debug mode
246
+ visor test --config tests/skills.tests.yaml --case basic-question --no-mocks --debug
247
+ ```
248
+
249
+ ### Mock vs No-Mock Mode
250
+
251
+ | | Mock mode | `--no-mocks` |
252
+ |--|-----------|-------------|
253
+ | AI calls | Mocked responses | Real AI provider |
254
+ | Tools | Not called | Real tool execution |
255
+ | `routing.max_loops` | Use `0` | Use `2+` (AI needs iterations for tool calls) |
256
+ | `strict` | Can be `true` | Must be `false` (internal steps fire) |
257
+ | Speed | Fast (seconds) | Slow (AI latency + tool calls) |
258
+ | Use for | Structure validation, CI | Prompt quality iteration |
259
+
260
+ ## Iterating with `--no-mocks`
261
+
262
+ This is where the real work happens. Common issues and fixes:
263
+
264
+ ### AI ignores tool guidance
265
+
266
+ **Symptom:** AI calls wrong endpoint, uses wrong arguments, or skips required steps.
267
+
268
+ **Fix:** Improve the knowledge doc with explicit instructions:
269
+
270
+ ```yaml
271
+ # In your skill knowledge:
272
+ knowledge: |
273
+   ### Important: Always search by project ID
274
+   When looking up items, **always** use `/projects/{id}/items`
275
+   — never the global `/items` endpoint which returns all projects.
276
+ ```
277
+
278
+ Also make user prompts more specific:
279
+
280
+ ```yaml
281
+ # Before (too vague):
282
+ text: "List the items in review"
283
+
284
+ # After (explicit):
285
+ text: "List the items in review stage for the Backend project"
286
+ ```
287
+
288
+ ### Assertion too strict for real responses
289
+
290
+ **Symptom:** Mock includes specific details but real AI response formats them differently.
291
+
292
+ **Fix:** Keep the field in the schema for observability but remove from `assert`:
293
+
294
+ ```yaml
295
+ schema:
296
+   properties:
297
+     lists_items:
298
+       type: boolean
299
+     includes_links:
300
+       type: boolean
301
+       description: "Includes profile links?"
302
+   required: [lists_items, includes_links]
303
+ assert:
304
+   lists_items: true
305
+   # includes_links intentionally not asserted — real API
306
+   # doesn't always return this without extra calls
307
+ ```
308
+
309
+ ### Turn 2 loses context from Turn 1
310
+
311
+ **Symptom:** AI asks "which items?" instead of referencing turn 1 results.
312
+
313
+ **Fix:** Make follow-up prompts self-contained:
314
+
315
+ ```yaml
316
+ # Before:
317
+ text: "Now compare these candidates"
318
+
319
+ # After:
320
+ text: "Now compare the 3 candidates you just evaluated for the SRE role"
321
+ ```
322
+
323
+ ### `strict: false` not working
324
+
325
+ **Symptom:** Test fails with "Step executed without expect: chat.route-intent".
326
+
327
+ **Fix:** `strict: false` must be at the **test case level** or in `tests.defaults`, not inside `conversation:`:
328
+
329
+ ```yaml
330
+ # Wrong — ignored by conversation sugar:
331
+ conversation:
332
+   strict: false
333
+   turns: ...
334
+
335
+ # Correct — applied to expanded flow stages:
336
+ - name: my-test
337
+   strict: false
338
+   conversation:
339
+     turns: ...
340
+
341
+ # Or set globally:
342
+ tests:
343
+   defaults:
344
+     strict: false
345
+ ```
346
+
347
+ ## Example: Adding an API Integration Skill
348
+
349
+ Here's a complete example adding an external REST API as a skill.
350
+
351
+ ### 1. Define the Skill
352
+
353
+ ```yaml
354
+ # config/skills.yaml
355
+ - id: hr-system
356
+   description: |
357
+     request relates to recruiting, hiring pipeline, candidates,
358
+     job postings, or HR pipeline management.
359
+     Examples: "list candidates", "show open positions", "evaluate candidate"
360
+   tools:
361
+     hr-api:
362
+       type: http_client
363
+       base_url: "https://api.hr-system.com/v3"
364
+       auth:
365
+         type: bearer
366
+         token: "${HR_API_TOKEN}"
367
+       headers:
368
+         Content-Type: "application/json"
369
+   knowledge: |
370
+     {% readfile "docs/hr-api-reference.md" %}
371
+ ```
372
+
373
+ ### 2. Write the Knowledge Doc
374
+
375
+ ```markdown
376
+ ## HR API Reference
377
+
378
+ Call the `hr-api` tool with these arguments:
379
+
380
+ | Argument | Required | Description |
381
+ |----------|----------|-------------|
382
+ | `path` | yes | API path (e.g. `/jobs`, `/candidates/{id}`) |
383
+ | `method` | no | HTTP method (default: `GET`) |
384
+ | `query` | no | Query parameters |
385
+ | `body` | no | Request body for POST/PUT |
386
+
387
+ ### Endpoints
388
+
389
+ | Operation | Method | Path |
390
+ |-----------|--------|------|
391
+ | List jobs | GET | `/jobs` |
392
+ | List candidates | GET | `/jobs/{id}/candidates` |
393
+ | Get candidate | GET | `/jobs/{id}/candidates/{cid}` |
394
+
395
+ ### Important
396
+ Always use job-specific endpoints (`/jobs/{id}/candidates`)
397
+ — never the global `/candidates` endpoint.
398
+ ```
399
+
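Annotation (not part of the package diff): the argument table above (`path`, `method`, `query`, `body`) together with the skill's `base_url` implies a simple mapping from a tool call to an HTTP request. The sketch below illustrates that mapping. `HttpToolArgs`, `BuiltRequest`, and `buildRequest` are hypothetical names, the `stage` query parameter is invented for the example, and the actual request construction inside visor's `http_client` tool is not shown in this diff.

```typescript
// Hypothetical shapes mirroring the argument table in the knowledge doc.
type HttpToolArgs = {
  path: string;
  method?: string;
  query?: { [key: string]: string };
  body?: unknown;
};
type BuiltRequest = { method: string; url: string; body: string | undefined };

// Combine the skill's base_url with the tool arguments.
function buildRequest(baseUrl: string, args: HttpToolArgs): BuiltRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = args.path.charAt(0) === "/" ? args.path : "/" + args.path;
  const query = args.query || {};
  const pairs = Object.keys(query).map(
    (key) => encodeURIComponent(key) + "=" + encodeURIComponent(query[key])
  );
  return {
    method: args.method || "GET", // the table lists GET as the default
    url: base + path + (pairs.length > 0 ? "?" + pairs.join("&") : ""),
    body: args.body === undefined ? undefined : JSON.stringify(args.body),
  };
}

const request = buildRequest("https://api.hr-system.com/v3", {
  path: "/jobs/42/candidates",
  query: { stage: "screening" }, // hypothetical query parameter
});
console.log(request.method, request.url);
// prints: GET https://api.hr-system.com/v3/jobs/42/candidates?stage=screening
```

Seeing the full URL makes it easier to spot, in `--debug` output, when the AI called the global `/candidates` path instead of the job-specific one.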
400
+ ### 3. Write Tests
401
+
402
+ ```yaml
403
+ # tests/hr.tests.yaml
404
+ version: "1.0"
405
+ extends: "../assistant.yaml"
406
+
407
+ tests:
408
+   defaults:
409
+     strict: false
410
+
411
+   cases:
412
+     - name: hr-pipeline-stats
413
+       description: "Show candidate counts per pipeline stage"
414
+       conversation:
415
+         routing: { max_loops: 2 }
416
+         turns:
417
+           - role: user
418
+             text: "Show me candidate statistics per stage for the SRE role"
419
+             mocks:
420
+               chat:
421
+                 text: |
422
+                   Pipeline for **Site Reliability Engineer**:
423
+                   | Stage | Count |
424
+                   |-------|-------|
425
+                   | Sourced | 12 |
426
+                   | Applied | 8 |
427
+                   | Screening | 5 |
428
+                 intent: chat
429
+                 skills: [hr-system]
430
+             expect:
431
+               outputs:
432
+                 - step: chat
433
+                   path: text
434
+                   matches: "(?i)sourced|applied|screen"
435
+               llm_judge:
436
+                 - step: chat
437
+                   turn: current
438
+                   path: text
439
+                   prompt: |
440
+                     Does the response show candidates broken down
441
+                     by pipeline stage with numbers?
442
+                   schema:
443
+                     properties:
444
+                       has_stage_breakdown:
445
+                         type: boolean
446
+                       covers_multiple_stages:
447
+                         type: boolean
448
+                   assert:
449
+                     has_stage_breakdown: true
450
+                     covers_multiple_stages: true
451
+ ```
452
+
453
+ ### 4. Iterate
454
+
455
+ ```bash
456
+ # First: validate test structure
457
+ visor test --config tests/hr.tests.yaml
458
+
459
+ # Then: run against real AI + API
460
+ visor test --config tests/hr.tests.yaml --no-mocks
461
+
462
+ # Iterate on a specific failing case
463
+ visor test --config tests/hr.tests.yaml --case hr-pipeline-stats --no-mocks --debug
464
+ ```
465
+
466
+ Each iteration typically involves:
467
+ 1. Run `--no-mocks` and read the failure
468
+ 2. Fix the knowledge doc (wrong endpoint? missing guidance?) or the user prompt (too vague?)
469
+ 3. Relax assertions that are too strict for real responses
470
+ 4. Re-run until green
471
+
472
+ ## Example: MCP Command Tool Skill
473
+
474
+ Skills can also use external MCP servers:
475
+
476
+ ```yaml
477
+ # config/skills.yaml
478
+ - id: jira
479
+   description: |
480
+     request relates to Jira issues, tickets, sprints, or project tracking.
481
+   tools:
482
+     atlassian:
483
+       command: uvx
484
+       args: [mcp-atlassian]
485
+       env:
486
+         JIRA_URL: "${JIRA_URL}"
487
+         JIRA_API_TOKEN: "${JIRA_API_TOKEN}"
488
+       allowedMethods: [jira_get_issue, jira_search, jira_create_issue]
489
+   knowledge: |
490
+     You have access to Jira via the atlassian MCP tool.
491
+     Use `jira_search` with JQL queries to find issues.
492
+     Use `jira_get_issue` with an issue key like `PROJ-123`.
493
+ ```
494
+
495
+ Test the same way — conversation sugar works identically regardless of tool type.
496
+
497
+ ## Checklist
498
+
499
+ When adding a new skill with tests:
500
+
501
+ - [ ] Add skill to `config/skills.yaml` with description, tools, and knowledge
502
+ - [ ] Write knowledge doc in `docs/` (be explicit about which endpoints/methods to use)
503
+ - [ ] Add secrets to `.env`
504
+ - [ ] Create `tests/<skill>.tests.yaml` with `extends`
505
+ - [ ] Write first test case with mocks — run `visor test` to validate
506
+ - [ ] Run with `--no-mocks` — iterate on knowledge doc and prompts
507
+ - [ ] Add multi-turn tests for complex flows
508
+ - [ ] Relax assertions that are too strict for real responses
509
+
510
+ ## Related Documentation
511
+
512
+ - [Getting Started](../testing/getting-started.md) — test framework basics
513
+ - [DSL Reference](../testing/dsl-reference.md) — complete test YAML schema
514
+ - [Assertions](../testing/assertions.md) — all assertion types including LLM judge
515
+ - [Flows](../testing/flows.md) — multi-stage tests and conversation sugar
516
+ - [Fixtures and Mocks](../testing/fixtures-and-mocks.md) — mock format reference
517
+ - [Cookbook](../testing/cookbook.md) — copy-pasteable test recipes
518
+ - [Assistant Workflows](../assistant-workflows.md) — skills, intents, and tool types
519
+ - [Tools & Toolkits](../tools-and-toolkits.md) — tool definition reference
package/dist/docs/testing/dsl-reference.md CHANGED
@@ -29,6 +29,16 @@ tests:
29
29
  tags: "local,fast" # or [local, fast]
30
30
  exclude_tags: "experimental,slow" # or [experimental, slow]
31
31
 
32
+ hooks: # (optional) lifecycle hooks
33
+   before_all:
34
+     exec: <shell-command> # runs once before all cases
35
+   after_all:
36
+     exec: <shell-command> # runs once after all cases (always)
37
+   before_each:
38
+     exec: <shell-command> # runs before each case
39
+   after_each:
40
+     exec: <shell-command> # runs after each case (always)
41
+
32
42
  fixtures: [] # (optional) suite-level custom fixtures
33
43
 
34
44
  cases:
@@ -37,6 +47,13 @@ tests:
37
47
  skip: false|true
38
48
  ai_include_code_context: false # per-case override
39
49
 
50
+ hooks: # (optional) per-case lifecycle hooks
51
+   before:
52
+     exec: <shell-command> # runs before this case
53
+   after:
54
+     exec: <shell-command> # runs after this case (always)
55
+     timeout: 10000 # optional timeout in ms
56
+
40
57
  # Single-event case
41
58
  event: pr_opened | pr_updated | pr_closed | issue_opened | issue_comment | manual
42
59
  fixture: <builtin|{ builtin, overrides }>
@@ -85,6 +102,82 @@ tests:
85
102
  error_code: 500
86
103
  ```
87
104
 
105
+ ## Lifecycle Hooks
106
+
107
+ Hooks let you run shell commands at key points in the test lifecycle — useful for seeding databases, starting servers, or cleaning up test data.
108
+
109
+ ### Suite-level hooks
110
+
111
+ Defined under `tests.hooks`:
112
+
113
+ ```yaml
114
+ tests:
115
+   hooks:
116
+     before_all:
117
+       exec: npx tsx test-data/seed-db.ts
118
+     after_all:
119
+       exec: npx tsx test-data/clean-db.ts
120
+     before_each:
121
+       exec: npx tsx test-data/reset-state.ts
122
+     after_each:
123
+       exec: npx tsx test-data/cleanup-case.ts
124
+   cases: [...]
125
+ ```
126
+
127
+ | Hook | When | Runs |
128
+ |------|------|------|
129
+ | `before_all` | Once before any case | If it fails, all cases are skipped |
130
+ | `after_all` | Once after all cases | Always runs (like `finally`) |
131
+ | `before_each` | Before every case | If it fails, that case is skipped |
132
+ | `after_each` | After every case | Always runs (like `finally`) |
133
+
134
+ ### Case-level hooks
135
+
136
+ Defined under `case.hooks`:
137
+
138
+ ```yaml
139
+ cases:
140
+   - name: update-settlement
141
+     hooks:
142
+       before:
143
+         exec: npx tsx test-data/seed-db.ts --case update-settlement
144
+       after:
145
+         exec: npx tsx test-data/seed-db.ts --clean
146
+         timeout: 10000 # optional, default 30000ms
147
+     event: manual
148
+     mocks: { ... }
149
+ ```
150
+
151
+ | Hook | When | Runs |
152
+ |------|------|------|
153
+ | `before` | Before this specific case (after `before_each`) | If it fails, the case is skipped |
154
+ | `after` | After this specific case (before `after_each`) | Always runs (like `finally`) |
155
+
156
+ ### Hook properties
157
+
158
+ | Property | Type | Required | Description |
159
+ |----------|------|----------|-------------|
160
+ | `exec` | string | yes | Shell command to run |
161
+ | `timeout` | number | no | Timeout in ms (default: 30000) |
162
+
163
+ ### Execution order
164
+
165
+ For each case, hooks run in this order:
166
+
167
+ 1. `before_each` (suite)
168
+ 2. `before` (case)
169
+ 3. *test execution*
170
+ 4. `after` (case)
171
+ 5. `after_each` (suite)
172
+
173
+ Hooks inherit all environment variables from the parent process, so seed scripts can use the same `DB_PATH`, API keys, etc. that your checks use.
174
+
175
+ ### Error handling
176
+
177
+ - If `before_all` fails → all cases are skipped and reported as failed
178
+ - If `before_each` or `before` fails → that case is skipped and reported as failed
179
+ - `after`, `after_each`, and `after_all` always run, even if the test or a prior hook failed
180
+
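Annotation (not part of the package diff): the execution order and error-handling rules above amount to a try/finally structure around each case. The sketch below models hooks as plain functions that throw on failure instead of shell commands. `Step`, `CaseHooks`, and `runCase` are hypothetical names, not the test runner's API. How a failing `after` hook affects the case result is not stated in the documentation, so this sketch simply ignores such failures.

```typescript
// Hypothetical model: each hook or test is a function that throws on failure.
type Step = () => void;
interface CaseHooks {
  beforeEach?: Step; // suite-level before_each
  before?: Step; // case-level before
  after?: Step; // case-level after
  afterEach?: Step; // suite-level after_each
}

function runCase(test: Step, hooks: CaseHooks): "passed" | "failed" {
  let ok = true;
  try {
    if (hooks.beforeEach) hooks.beforeEach();
    if (hooks.before) hooks.before();
    test(); // skipped when a before-hook throws
  } catch (err) {
    ok = false; // a failed before-hook or test fails the case
  } finally {
    // `after` then `after_each` always run, each isolated from the other.
    for (const hook of [hooks.after, hooks.afterEach]) {
      try {
        if (hook) hook();
      } catch (err) {
        // Assumption: after-hook failures do not change the result here.
      }
    }
  }
  return ok ? "passed" : "failed";
}

const log: string[] = [];
const result = runCase(
  () => { log.push("test"); },
  {
    beforeEach: () => { log.push("before_each"); },
    before: () => { throw new Error("seed failed"); },
    after: () => { log.push("after"); },
    afterEach: () => { log.push("after_each"); },
  }
);
console.log(result, log.join(" > ")); // prints: failed before_each > after > after_each
```

In the run shown, the failing `before` hook prevents the test from executing, yet both cleanup hooks still run in the documented order.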
88
181
  ## Fixtures
89
182
 
90
183
  - Built-in GitHub fixtures: `gh.pr_open.minimal`, `gh.pr_sync.minimal`, `gh.pr_closed.minimal`, `gh.issue_open.minimal`, `gh.issue_comment.standard`, `gh.issue_comment.visor_help`, `gh.issue_comment.visor_regenerate`.