@probelabs/visor 0.1.174-ee → 0.1.175-ee
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +3 -2
- package/dist/cli-main.d.ts.map +1 -1
- package/dist/docs/guides/tdd-assistant-workflows.md +519 -0
- package/dist/docs/testing/dsl-reference.md +93 -0
- package/dist/examples/lifecycle-hooks.tests.yaml +62 -0
- package/dist/generated/config-schema.d.ts +28 -7
- package/dist/generated/config-schema.d.ts.map +1 -1
- package/dist/generated/config-schema.json +31 -7
- package/dist/index.js +330 -25
- package/dist/providers/ai-check-provider.d.ts.map +1 -1
- package/dist/providers/mcp-custom-sse-server.d.ts +4 -0
- package/dist/providers/mcp-custom-sse-server.d.ts.map +1 -1
- package/dist/sdk/{check-provider-registry-53C2ZIXJ.mjs → check-provider-registry-K34RCO6G.mjs} +3 -3
- package/dist/sdk/{check-provider-registry-UPQNHHFF.mjs → check-provider-registry-O36CQEGD.mjs} +3 -3
- package/dist/sdk/{chunk-GKSSG5IM.mjs → chunk-4Z6HTWGJ.mjs} +153 -14
- package/dist/sdk/chunk-4Z6HTWGJ.mjs.map +1 -0
- package/dist/sdk/{chunk-2PL2YH3B.mjs → chunk-FZPCP444.mjs} +153 -14
- package/dist/sdk/chunk-FZPCP444.mjs.map +1 -0
- package/dist/sdk/{chunk-W4KCJM6J.mjs → chunk-MLXGCLZJ.mjs} +29 -8
- package/dist/sdk/chunk-MLXGCLZJ.mjs.map +1 -0
- package/dist/sdk/{config-BVL3KFMB.mjs → config-4JMBJKWS.mjs} +2 -2
- package/dist/sdk/{schedule-tool-5KDBDCFO.mjs → schedule-tool-XOXKUW5G.mjs} +3 -3
- package/dist/sdk/{schedule-tool-UMDRCNO5.mjs → schedule-tool-XVSYLH4Z.mjs} +3 -3
- package/dist/sdk/{schedule-tool-handler-5EPTHBLS.mjs → schedule-tool-handler-3I6AZ4N7.mjs} +3 -3
- package/dist/sdk/{schedule-tool-handler-MUF5V36L.mjs → schedule-tool-handler-CFMFHDUL.mjs} +3 -3
- package/dist/sdk/sdk.d.mts +9 -1
- package/dist/sdk/sdk.d.ts +9 -1
- package/dist/sdk/sdk.js +172 -12
- package/dist/sdk/sdk.js.map +1 -1
- package/dist/sdk/sdk.mjs +2 -2
- package/dist/sdk/{workflow-check-provider-EWMZEEES.mjs → workflow-check-provider-ETM452BO.mjs} +3 -3
- package/dist/sdk/{workflow-check-provider-RQUCBAYY.mjs → workflow-check-provider-EV6VCG7M.mjs} +3 -3
- package/dist/test-runner/conversation-sugar.d.ts.map +1 -1
- package/dist/test-runner/index.d.ts +19 -0
- package/dist/test-runner/index.d.ts.map +1 -1
- package/dist/test-runner/validator.d.ts.map +1 -1
- package/dist/types/config.d.ts +9 -1
- package/dist/types/config.d.ts.map +1 -1
- package/package.json +1 -1
- package/dist/sdk/chunk-2PL2YH3B.mjs.map +0 -1
- package/dist/sdk/chunk-GKSSG5IM.mjs.map +0 -1
- package/dist/sdk/chunk-W4KCJM6J.mjs.map +0 -1
- /package/dist/sdk/{check-provider-registry-53C2ZIXJ.mjs.map → check-provider-registry-K34RCO6G.mjs.map} +0 -0
- /package/dist/sdk/{check-provider-registry-UPQNHHFF.mjs.map → check-provider-registry-O36CQEGD.mjs.map} +0 -0
- /package/dist/sdk/{config-BVL3KFMB.mjs.map → config-4JMBJKWS.mjs.map} +0 -0
- /package/dist/sdk/{schedule-tool-5KDBDCFO.mjs.map → schedule-tool-XOXKUW5G.mjs.map} +0 -0
- /package/dist/sdk/{schedule-tool-UMDRCNO5.mjs.map → schedule-tool-XVSYLH4Z.mjs.map} +0 -0
- /package/dist/sdk/{schedule-tool-handler-5EPTHBLS.mjs.map → schedule-tool-handler-3I6AZ4N7.mjs.map} +0 -0
- /package/dist/sdk/{schedule-tool-handler-MUF5V36L.mjs.map → schedule-tool-handler-CFMFHDUL.mjs.map} +0 -0
- /package/dist/sdk/{workflow-check-provider-EWMZEEES.mjs.map → workflow-check-provider-ETM452BO.mjs.map} +0 -0
- /package/dist/sdk/{workflow-check-provider-RQUCBAYY.mjs.map → workflow-check-provider-EV6VCG7M.mjs.map} +0 -0
package/README.md
CHANGED
@@ -34,6 +34,7 @@ Visor is an open-source workflow engine that lets you define multi-step AI pipel
 | **Chat assistant / Bot** | [Bot Integrations](docs/bot-integrations.md) | [teams-assistant.yaml](examples/teams-assistant.yaml) |
 | **Run shell commands + AI** | [Command Provider](docs/command-provider.md) | [ai-with-bash.yaml](examples/ai-with-bash.yaml) |
 | **Connect MCP tools** | [MCP Provider](docs/mcp-provider.md) | [mcp-provider-example.yaml](examples/mcp-provider-example.yaml) |
+| **Add API integrations (TDD)** | [Guide: TDD Assistant Workflows](docs/guides/tdd-assistant-workflows.md) | [workable.tests.yaml](https://github.com/TykTechnologies/REFINE/blob/main/Oel/tests/workable.tests.yaml) |
 
 > **First time?** Run `npx visor init` to scaffold a working config, then `npx visor` to run it.
 
@@ -774,7 +775,7 @@ Learn more: [docs/enterprise-policy.md](docs/enterprise-policy.md)
 [Configuration](docs/configuration.md) · [AI config](docs/ai-configuration.md) · [CLI commands](docs/commands.md) · [GitHub Auth](docs/github-auth.md) · [CI/CLI mode](docs/ci-cli-mode.md) · [GitHub Action reference](docs/action-reference.md) · [Migration](docs/migration.md) · [FAQ](docs/faq.md) · [Glossary](docs/glossary.md)
 
 **Guides:**
-[Tools & Toolkits](docs/tools-and-toolkits.md) · [Assistant workflows](docs/assistant-workflows.md) · [Workflow creation](docs/workflow-creation-guide.md) · [Workflow style guide](docs/guides/workflow-style-guide.md) · [Dependencies](docs/dependencies.md) · [forEach propagation](docs/foreach-dependency-propagation.md) · [Failure routing](docs/failure-routing.md) · [Router patterns](docs/router-patterns.md) · [Lifecycle hooks](docs/lifecycle-hooks.md) · [Liquid templates](docs/liquid-templates.md) · [Schema-template system](docs/schema-templates.md) · [Fail conditions](docs/fail-if.md) · [Failure conditions schema](docs/failure-conditions-schema.md) · [Failure conditions impl](docs/failure-conditions-implementation.md) · [Timeouts](docs/timeouts.md) · [Execution limits](docs/limits.md) · [Event triggers](docs/event-triggers.md) · [Output formats](docs/output-formats.md) · [Output formatting](docs/output-formatting.md) · [Default output schema](docs/default-output-schema.md) · [Output history](docs/output-history.md) · [Reusable workflows](docs/workflows.md) · [Criticality modes](docs/guides/criticality-modes.md) · [Fault management](docs/guides/fault-management-and-contracts.md)
+[Tools & Toolkits](docs/tools-and-toolkits.md) · [Assistant workflows](docs/assistant-workflows.md) · [TDD for assistant workflows](docs/guides/tdd-assistant-workflows.md) · [Workflow creation](docs/workflow-creation-guide.md) · [Workflow style guide](docs/guides/workflow-style-guide.md) · [Dependencies](docs/dependencies.md) · [forEach propagation](docs/foreach-dependency-propagation.md) · [Failure routing](docs/failure-routing.md) · [Router patterns](docs/router-patterns.md) · [Lifecycle hooks](docs/lifecycle-hooks.md) · [Liquid templates](docs/liquid-templates.md) · [Schema-template system](docs/schema-templates.md) · [Fail conditions](docs/fail-if.md) · [Failure conditions schema](docs/failure-conditions-schema.md) · [Failure conditions impl](docs/failure-conditions-implementation.md) · [Timeouts](docs/timeouts.md) · [Execution limits](docs/limits.md) · [Event triggers](docs/event-triggers.md) · [Output formats](docs/output-formats.md) · [Output formatting](docs/output-formatting.md) · [Default output schema](docs/default-output-schema.md) · [Output history](docs/output-history.md) · [Reusable workflows](docs/workflows.md) · [Criticality modes](docs/guides/criticality-modes.md) · [Fault management](docs/guides/fault-management-and-contracts.md)
 
 **Providers:**
 [A2A](docs/a2a-provider.md) · [Command](docs/command-provider.md) · [Script](docs/script.md) · [MCP](docs/mcp-provider.md) · [MCP tools for AI](docs/mcp.md) · [Claude Code](docs/claude-code.md) · [AI custom tools](docs/ai-custom-tools.md) · [AI custom tools usage](docs/ai-custom-tools-usage.md) · [Custom tools](docs/custom-tools.md) · [GitHub ops](docs/github-ops.md) · [Git checkout](docs/providers/git-checkout.md) · [HTTP integration](docs/http.md) · [Memory](docs/memory.md) · [Human input](docs/human-input-provider.md) · [Custom providers](docs/pluggable.md)
@@ -783,7 +784,7 @@ Learn more: [docs/enterprise-policy.md](docs/enterprise-policy.md)
 [Security](docs/security.md) · [Performance](docs/performance.md) · [Observability](docs/observability.md) · [Debugging](docs/debugging.md) · [Debug visualizer](docs/debug-visualizer.md) · [Telemetry setup](docs/telemetry-setup.md) · [Dashboards](docs/dashboards/README.md) · [Troubleshooting](docs/troubleshooting.md) · [Suppressions](docs/suppressions.md) · [GitHub checks](docs/GITHUB_CHECKS.md) · [Bot integrations](docs/bot-integrations.md) · [Slack](docs/slack-integration.md) · [Telegram](docs/telegram-integration.md) · [Email](docs/email-integration.md) · [WhatsApp](docs/whatsapp-integration.md) · [Teams](docs/teams-integration.md) · [Scheduler](docs/scheduler.md) · [Sandbox engines](docs/sandbox-engines.md)
 
 **Testing:**
-[Getting started](docs/testing/getting-started.md) · [DSL reference](docs/testing/dsl-reference.md) · [Flows](docs/testing/flows.md) · [Fixtures & mocks](docs/testing/fixtures-and-mocks.md) · [Assertions](docs/testing/assertions.md) · [Cookbook](docs/testing/cookbook.md) · [CLI & reporters](docs/testing/cli.md) · [CI integration](docs/testing/ci.md) · [Troubleshooting](docs/testing/troubleshooting.md)
+[Getting started](docs/testing/getting-started.md) · [DSL reference](docs/testing/dsl-reference.md) · [Flows](docs/testing/flows.md) · [Fixtures & mocks](docs/testing/fixtures-and-mocks.md) · [Assertions](docs/testing/assertions.md) · [Cookbook](docs/testing/cookbook.md) · [TDD for assistants](docs/guides/tdd-assistant-workflows.md) · [CLI & reporters](docs/testing/cli.md) · [CI integration](docs/testing/ci.md) · [Troubleshooting](docs/testing/troubleshooting.md)
 
 **Enterprise:**
 [Licensing](docs/licensing.md) · [Enterprise policy](docs/enterprise-policy.md) · [Scheduler storage](docs/scheduler-storage.md) · [Database operations](docs/database-operations.md) · [Capacity planning](docs/capacity-planning.md) · [Production deployment](docs/production-deployment.md) · [Deployment](docs/DEPLOYMENT.md)
package/dist/cli-main.d.ts.map
CHANGED
@@ -1 +1 @@
-{"version":3,"file":"","sourceRoot":"","sources":["file:///home/runner/work/visor/visor/src/cli-main.ts"],"names":[],"mappings":"
+{"version":3,"file":"","sourceRoot":"","sources":["file:///home/runner/work/visor/visor/src/cli-main.ts"],"names":[],"mappings":"AAkrCA;;GAEG;AACH,wBAAsB,IAAI,IAAI,OAAO,CAAC,IAAI,CAAC,CA4gE1C"}
package/dist/docs/guides/tdd-assistant-workflows.md
@@ -0,0 +1,519 @@
# Test-Driven Development for Assistant Workflows

Build and iterate on AI assistant workflows using visor's test framework. This guide covers the full cycle: define your workflow, write tests with mocks, then run against real AI to iterate on prompt quality and assertions.

## The TDD Cycle

1. **Define the workflow** — skills, tools, knowledge, intents
2. **Write tests with mocks** — expected conversations and assertions
3. **Run with mocks** — validate structure, routing, and assertion logic
4. **Run with `--no-mocks`** — real AI + real tools, iterate on quality
5. **Refine** — fix prompts, relax over-strict assertions, improve knowledge

## Setting Up

### Project Structure

```
my-assistant/
├── assistant.yaml          # main workflow config
├── config/
│   └── skills.yaml         # skill definitions
├── docs/
│   └── api-reference.md    # knowledge docs for skills
├── tests/
│   └── skills.tests.yaml   # test file
└── .env                    # API tokens (not committed)
```

### Test File Basics

Test files extend your main config:

```yaml
# tests/skills.tests.yaml
version: "1.0"
extends: "../assistant.yaml"

tests:
  defaults:
    strict: false   # required for --no-mocks (internal steps run without expectations)

  cases:
    - name: basic-question
      conversation:
        routing: { max_loops: 2 }
        turns:
          - role: user
            text: "What services do we run?"
            mocks:
              chat:
                text: "We run 3 services: API gateway, dashboard, and pump."
                intent: chat
                skills: []
            expect:
              outputs:
                - step: chat
                  path: text
                  matches: "(?i)gateway|dashboard|pump"
```

Key fields:

- **`extends`** — imports the full workflow so the test runner knows all steps, skills, and routing
- **`strict: false`** — prevents failures from internal steps (routing, config building) that don't have assertions
- **`conversation`** — sugar syntax that auto-expands turns into flow stages with message history

## Writing Conversation Tests

### Single-Turn Test

The simplest test: one user message, assert on the response.

```yaml
- name: greeting
  conversation:
    routing: { max_loops: 2 }
    turns:
      - role: user
        text: "Hello, who are you?"
        mocks:
          chat:
            text: "I'm your engineering assistant. I can help with code, deployments, and more."
            intent: chat
            skills: []
        expect:
          calls:
            - step: chat
              exactly: 1
          outputs:
            - step: chat
              path: text
              matches: "(?i)assistant|help"
```

### Multi-Turn Conversation

Each turn's history automatically includes previous turns. Mock response text becomes assistant messages in subsequent turns.

```yaml
- name: code-explore-then-explain
  conversation:
    routing: { max_loops: 2 }
    turns:
      - role: user
        text: "Find the authentication middleware in the backend service"
        mocks:
          chat:
            text: "Found `auth.go` in `internal/middleware/`. It checks JWT tokens."
            intent: chat
            skills: [code-explorer]
        expect:
          outputs:
            - step: chat
              path: text
              matches: "(?i)auth|middleware"

      - role: user
        text: "Explain how the JWT validation works in that auth middleware you found"
        mocks:
          chat:
            text: "The middleware extracts the Bearer token, validates the signature..."
            intent: chat
            skills: [code-explorer]
        expect:
          llm_judge:
            - step: chat
              turn: current
              path: text
              prompt: "Does the response explain JWT validation with technical details?"
              schema:
                properties:
                  explains_jwt:
                    type: boolean
                required: [explains_jwt]
              assert:
                explains_jwt: true
```

### Mock Response Format

Mocks simulate what the `chat` step returns:

```yaml
mocks:
  chat:
    text: "The response text..."
    intent: chat              # which intent was classified
    skills: [code-explorer]   # which skills were activated
```

Use **fictional data** in mocks. The mock text becomes the assistant message in subsequent turn history.
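To make the history rule concrete, here is a minimal TypeScript sketch of how conversation sugar could expand `turns` into one flow stage per turn. It is not visor's implementation: the `Turn`, `Stage`, and `Message` shapes, the `expandConversation` name, and the `<case>#turn-N` stage naming are all hypothetical. The only behavior taken from this guide is that each turn sees the previous turns, with each earlier mock `text` replayed as an assistant message.

```typescript
// Hypothetical shapes mirroring the YAML fields used in this guide.
interface ChatMock {
  text: string; // replayed as the assistant message in later turns
  intent: string; // intent the router is pretended to have classified
  skills: string[]; // skills the router is pretended to have activated
}

interface Turn {
  role: "user";
  text: string;
  mocks?: { chat?: ChatMock };
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

interface Stage {
  name: string;
  history: Message[]; // everything said before this turn
  input: string; // this turn's user message
  mocks?: { chat?: ChatMock };
}

// Expand conversation turns into one stage per turn with accumulated history.
function expandConversation(caseName: string, turns: Turn[]): Stage[] {
  const history: Message[] = [];
  return turns.map((turn, i) => {
    const stage: Stage = {
      name: `${caseName}#turn-${i + 1}`,
      history: [...history], // copy, so later turns do not leak into this stage
      input: turn.text,
      mocks: turn.mocks,
    };
    history.push({ role: "user", content: turn.text });
    const reply = turn.mocks?.chat?.text;
    if (reply !== undefined) {
      // The mock text stands in for the assistant's answer in later turns.
      history.push({ role: "assistant", content: reply });
    }
    return stage;
  });
}
```

With `--no-mocks`, a runner would append the real assistant reply instead of `mocks.chat.text`; the sketch covers only the mocked path, which is what keeps mocked multi-turn tests deterministic.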

## Assertion Types

### Regex Matching (`outputs`)

Pattern-match on response fields:

```yaml
expect:
  outputs:
    - step: chat
      path: text
      matches: "(?i)kubernetes|k8s"
    - step: chat
      turn: current         # only check this turn's output
      path: text
      matches: "(?i)deploy"
```
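The patterns above start with an inline `(?i)` flag. JavaScript's `RegExp` rejects a bare leading `(?i)` group, so a JavaScript-based runner has to translate it somehow. The sketch below shows one way, plus a dotted-path lookup for `path: text`. `getPath`, `compilePattern`, and `matchesPattern` are hypothetical helpers; this diff does not show how visor itself resolves `path` or regex flags.

```typescript
const CASE_INSENSITIVE_PREFIX = "(?i)";

// Resolve a dotted path such as "text" or "items.0.name" against a step output.
function getPath(value: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((acc, key) => {
    if (acc === null || typeof acc !== "object") return undefined;
    return (acc as Record<string, unknown>)[key];
  }, value);
}

// Translate a leading "(?i)" into JavaScript's "i" flag; other patterns compile as-is.
function compilePattern(pattern: string): RegExp {
  return pattern.startsWith(CASE_INSENSITIVE_PREFIX)
    ? new RegExp(pattern.slice(CASE_INSENSITIVE_PREFIX.length), "i")
    : new RegExp(pattern);
}

// True when the field at `path` is a string that the pattern matches anywhere.
function matchesPattern(output: unknown, path: string, pattern: string): boolean {
  const field = getPath(output, path);
  return typeof field === "string" && compilePattern(pattern).test(field);
}
```

Because `RegExp.prototype.test` searches the whole string, matching in this sketch is unanchored: `(?i)deploy` also accepts "Deployment finished". Add `^` and `$` when a whole-field match is intended.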

### Call Counting (`calls`)

Assert how many times a step ran:

```yaml
expect:
  calls:
    - step: chat
      exactly: 1
```
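Conceptually, a `calls` check compares how often a step appears in the execution trace with the expected count. A minimal sketch, assuming the trace is a flat list of executed step names (a hypothetical representation; visor's real trace format is not part of this diff), with `checkCalls` as an illustrative name:

```typescript
interface CallExpectation {
  step: string;
  exactly: number;
}

// Return one message per failed expectation; an empty array means every check passed.
function checkCalls(executed: string[], expectations: CallExpectation[]): string[] {
  const counts = new Map<string, number>();
  for (const step of executed) {
    counts.set(step, (counts.get(step) ?? 0) + 1);
  }
  return expectations
    .filter((e) => (counts.get(e.step) ?? 0) !== e.exactly)
    .map((e) => `${e.step}: expected exactly ${e.exactly} call(s), got ${counts.get(e.step) ?? 0}`);
}
```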

### LLM Judge (`llm_judge`)

Semantic evaluation — assert on meaning, not exact text. This is the most powerful assertion for AI responses:

```yaml
expect:
  llm_judge:
    - step: chat
      turn: current
      path: text
      prompt: |
        Does the response provide a clear architectural overview
        with component names and their responsibilities?
      schema:
        properties:
          names_components:
            type: boolean
            description: "Names specific components or services?"
          explains_responsibilities:
            type: boolean
            description: "Explains what each component does?"
        required: [names_components, explains_responsibilities]
      assert:
        names_components: true
        explains_responsibilities: true
```

The judge returns a JSON object matching your schema. `assert` checks specific fields. You can include fields in the schema for observability without asserting them.
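The split between `schema` and `assert` amounts to a plain comparison over the judge's parsed JSON verdict. In the hypothetical `checkVerdict` below (not visor's API), fields listed in `required` must be present, fields listed in `assert` must equal the expected value, and any other field is recorded but can never fail the test. Obtaining the verdict from the judge model is out of scope; the sketch assumes it has already been parsed.

```typescript
type Verdict = Record<string, unknown>;

interface JudgeSpec {
  required?: string[];
  assert?: Record<string, unknown>;
}

// Return failure messages; fields that are not asserted never produce failures.
function checkVerdict(verdict: Verdict, spec: JudgeSpec): string[] {
  const failures: string[] = [];
  for (const field of spec.required ?? []) {
    if (!(field in verdict)) failures.push(`missing required field: ${field}`);
  }
  for (const [field, expected] of Object.entries(spec.assert ?? {})) {
    if (verdict[field] !== expected) {
      failures.push(`${field}: expected ${JSON.stringify(expected)}, got ${JSON.stringify(verdict[field])}`);
    }
  }
  return failures;
}
```

Strict equality (`!==`) is enough for the boolean flags used throughout this guide; asserting on objects or arrays would need a deep comparison.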

### Cross-Turn Assertions

Reference previous turns using `turn: N` (1-based):

```yaml
# In turn 2, verify turn 1 was good too
expect:
  llm_judge:
    - step: chat
      turn: current
      path: text
      prompt: "Does turn 2 build on the context from turn 1?"
      ...
    - step: chat
      turn: 1
      path: text
      prompt: "Did turn 1 provide a good foundation?"
      ...
```
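`turn: current` and `turn: N` both pick one entry from a step's per-turn outputs. A minimal sketch of that lookup, assuming outputs are stored in turn order; `resolveTurn` and `outputForTurn` are hypothetical names:

```typescript
type TurnRef = "current" | number;

// Map a turn reference to a 0-based index into per-turn outputs.
// `currentTurn` is 1-based, matching the `turn: N` convention.
function resolveTurn(ref: TurnRef, currentTurn: number): number {
  const oneBased = ref === "current" ? currentTurn : ref;
  if (!Number.isInteger(oneBased) || oneBased < 1 || oneBased > currentTurn) {
    throw new RangeError(`turn ${String(ref)} is not available at turn ${currentTurn}`);
  }
  return oneBased - 1;
}

function outputForTurn<T>(outputsByTurn: T[], ref: TurnRef, currentTurn: number): T {
  return outputsByTurn[resolveTurn(ref, currentTurn)];
}
```

The sketch rejects a reference to a turn later than the one being evaluated, on the assumption that an assertion can only see turns that have already run.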

## Running Tests

```bash
# Validate structure (no AI calls)
visor test --config tests/skills.tests.yaml

# Run a single case
visor test --config tests/skills.tests.yaml --case basic-question

# Run with real AI and real tools
visor test --config tests/skills.tests.yaml --no-mocks

# Real AI, single case
visor test --config tests/skills.tests.yaml --case basic-question --no-mocks

# Debug mode
visor test --config tests/skills.tests.yaml --case basic-question --no-mocks --debug
```

### Mock vs No-Mock Mode

| | Mock mode | `--no-mocks` |
|--|-----------|-------------|
| AI calls | Mocked responses | Real AI provider |
| Tools | Not called | Real tool execution |
| `routing.max_loops` | Use `0` | Use `2+` (AI needs iterations for tool calls) |
| `strict` | Can be `true` | Must be `false` (internal steps fire) |
| Speed | Fast (seconds) | Slow (AI latency + tool calls) |
| Use for | Structure validation, CI | Prompt quality iteration |

## Iterating with `--no-mocks`

This is where the real work happens. Common issues and fixes:

### AI ignores tool guidance

**Symptom:** AI calls wrong endpoint, uses wrong arguments, or skips required steps.

**Fix:** Improve the knowledge doc with explicit instructions:

```yaml
# In your skill knowledge:
knowledge: |
  ### Important: Always search by project ID
  When looking up items, **always** use `/projects/{id}/items`
  — never the global `/items` endpoint which returns all projects.
```

Also make user prompts more specific:

```yaml
# Before (too vague):
text: "List the items in review"

# After (explicit):
text: "List the items in review stage for the Backend project"
```

### Assertion too strict for real responses

**Symptom:** Mock includes specific details but real AI response formats them differently.

**Fix:** Keep the field in the schema for observability but remove from `assert`:

```yaml
schema:
  properties:
    lists_items:
      type: boolean
    includes_links:
      type: boolean
      description: "Includes profile links?"
  required: [lists_items, includes_links]
assert:
  lists_items: true
  # includes_links intentionally not asserted — real API
  # doesn't always return this without extra calls
```

### Turn 2 loses context from Turn 1

**Symptom:** AI asks "which items?" instead of referencing turn 1 results.

**Fix:** Make follow-up prompts self-contained:

```yaml
# Before:
text: "Now compare these candidates"

# After:
text: "Now compare the 3 candidates you just evaluated for the SRE role"
```

### `strict: false` not working

**Symptom:** Test fails with "Step executed without expect: chat.route-intent".

**Fix:** `strict: false` must be at the **test case level** or in `tests.defaults`, not inside `conversation:`:

```yaml
# Wrong — ignored by conversation sugar:
conversation:
  strict: false
  turns: ...

# Correct — applied to expanded flow stages:
- name: my-test
  strict: false
  conversation:
    turns: ...

# Or set globally:
tests:
  defaults:
    strict: false
```
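The placement rule can be written as a small resolution function. This is a hypothetical sketch with three stated assumptions: a case-level `strict` overrides `tests.defaults.strict`, `conversation.strict` is never read (matching the "Wrong" example above), and strict mode is on when neither is set. The second function produces the error text quoted in the symptom for every executed step that has no expectation. `resolveStrict` and `unexpectedStepErrors` are illustrative names, not visor's API.

```typescript
interface CaseConfig {
  name: string;
  strict?: boolean;
  conversation?: { strict?: boolean; turns: unknown[] };
}

interface SuiteDefaults {
  strict?: boolean;
}

// conversation.strict is deliberately ignored; only case level and defaults count.
function resolveStrict(testCase: CaseConfig, defaults: SuiteDefaults): boolean {
  return testCase.strict ?? defaults.strict ?? true; // `true` fallback is an assumption
}

// In strict mode, every executed step needs an expectation; each step is reported once.
function unexpectedStepErrors(executed: string[], expected: Set<string>, strict: boolean): string[] {
  if (!strict) return [];
  return Array.from(new Set(executed))
    .filter((step) => !expected.has(step))
    .map((step) => `Step executed without expect: ${step}`);
}
```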

## Example: Adding an API Integration Skill

Here's a complete example adding an external REST API as a skill.

### 1. Define the Skill

```yaml
# config/skills.yaml
- id: hr-system
  description: |
    request relates to recruiting, hiring pipeline, candidates,
    job postings, or HR pipeline management.
    Examples: "list candidates", "show open positions", "evaluate candidate"
  tools:
    hr-api:
      type: http_client
      base_url: "https://api.hr-system.com/v3"
      auth:
        type: bearer
        token: "${HR_API_TOKEN}"
      headers:
        Content-Type: "application/json"
  knowledge: |
    {% readfile "docs/hr-api-reference.md" %}
```
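The `${HR_API_TOKEN}` placeholder keeps the secret in `.env` instead of in `config/skills.yaml`. Here is a minimal sketch of that kind of substitution using a hypothetical `interpolateEnv` helper; visor's actual rules (defaults, escaping, handling of unset variables) may differ. This version throws on an unset variable so a missing token fails fast rather than sending an empty bearer token.

```typescript
// Replace ${NAME} placeholders with values from `env`; an unset name is an error.
function interpolateEnv(template: string, env: Record<string, string | undefined>): string {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) {
      throw new Error(`environment variable ${name} is not set`);
    }
    return value;
  });
}
```

In a Node.js process the call would be `interpolateEnv("${HR_API_TOKEN}", process.env)`.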

### 2. Write the Knowledge Doc

```markdown
## HR API Reference

Call the `hr-api` tool with these arguments:

| Argument | Required | Description |
|----------|----------|-------------|
| `path` | yes | API path (e.g. `/jobs`, `/candidates/{id}`) |
| `method` | no | HTTP method (default: `GET`) |
| `query` | no | Query parameters |
| `body` | no | Request body for POST/PUT |

### Endpoints

| Operation | Method | Path |
|-----------|--------|------|
| List jobs | GET | `/jobs` |
| List candidates | GET | `/jobs/{id}/candidates` |
| Get candidate | GET | `/jobs/{id}/candidates/{cid}` |

### Important
Always use job-specific endpoints (`/jobs/{id}/candidates`)
— never the global `/candidates` endpoint.
```
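The argument table maps naturally onto an HTTP request. Below is a sketch, under the assumption that an `http_client` tool call is assembled from the tool's `base_url`, `auth`, and `headers` plus the `path`, `method`, `query`, and `body` arguments; `buildRequest` and its types are hypothetical, not visor's implementation. One detail is worth keeping in any version: the path is appended to `base_url` by concatenation, because `new URL("/jobs", "https://api.hr-system.com/v3")` resolves to `https://api.hr-system.com/jobs` and silently drops `/v3`.

```typescript
interface HttpToolConfig {
  base_url: string;
  auth?: { type: "bearer"; token: string };
  headers?: Record<string, string>;
}

interface HttpToolArgs {
  path: string;
  method?: string;
  query?: Record<string, string | number>;
  body?: unknown;
}

interface PreparedRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body?: string;
}

function buildRequest(config: HttpToolConfig, args: HttpToolArgs): PreparedRequest {
  // Concatenate instead of resolving, so a base path such as /v3 survives.
  const base = config.base_url.replace(/\/+$/, "");
  const path = args.path.startsWith("/") ? args.path : `/${args.path}`;
  const url = new URL(base + path);
  for (const [key, value] of Object.entries(args.query ?? {})) {
    url.searchParams.set(key, String(value));
  }
  const headers: Record<string, string> = { ...(config.headers ?? {}) };
  if (config.auth?.type === "bearer") {
    headers["Authorization"] = `Bearer ${config.auth.token}`;
  }
  return {
    url: url.toString(),
    method: (args.method ?? "GET").toUpperCase(), // GET is the documented default
    headers,
    body: args.body === undefined ? undefined : JSON.stringify(args.body),
  };
}
```

The result can be passed to the built-in `fetch` in Node.js 18+ as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.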

### 3. Write Tests

```yaml
# tests/hr.tests.yaml
version: "1.0"
extends: "../assistant.yaml"

tests:
  defaults:
    strict: false

  cases:
    - name: hr-pipeline-stats
      description: "Show candidate counts per pipeline stage"
      conversation:
        routing: { max_loops: 2 }
        turns:
          - role: user
            text: "Show me candidate statistics per stage for the SRE role"
            mocks:
              chat:
                text: |
                  Pipeline for **Site Reliability Engineer**:
                  | Stage | Count |
                  |-------|-------|
                  | Sourced | 12 |
                  | Applied | 8 |
                  | Screening | 5 |
                intent: chat
                skills: [hr-system]
            expect:
              outputs:
                - step: chat
                  path: text
                  matches: "(?i)sourced|applied|screen"
              llm_judge:
                - step: chat
                  turn: current
                  path: text
                  prompt: |
                    Does the response show candidates broken down
                    by pipeline stage with numbers?
                  schema:
                    properties:
                      has_stage_breakdown:
                        type: boolean
                      covers_multiple_stages:
                        type: boolean
                  assert:
                    has_stage_breakdown: true
                    covers_multiple_stages: true
```

### 4. Iterate

```bash
# First: validate test structure
visor test --config tests/hr.tests.yaml

# Then: run against real AI + API
visor test --config tests/hr.tests.yaml --no-mocks

# Iterate on a specific failing case
visor test --config tests/hr.tests.yaml --case hr-pipeline-stats --no-mocks --debug
```

Each iteration typically involves:
1. Run `--no-mocks` and read the failure
2. Fix the knowledge doc (wrong endpoint? missing guidance?) or the user prompt (too vague?)
3. Relax assertions that are too strict for real responses
4. Re-run until green

## Example: MCP Command Tool Skill

Skills can also use external MCP servers:

```yaml
# config/skills.yaml
- id: jira
  description: |
    request relates to Jira issues, tickets, sprints, or project tracking.
  tools:
    atlassian:
      command: uvx
      args: [mcp-atlassian]
      env:
        JIRA_URL: "${JIRA_URL}"
        JIRA_API_TOKEN: "${JIRA_API_TOKEN}"
      allowedMethods: [jira_get_issue, jira_search, jira_create_issue]
  knowledge: |
    You have access to Jira via the atlassian MCP tool.
    Use `jira_search` with JQL queries to find issues.
    Use `jira_get_issue` with an issue key like `PROJ-123`.
```
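`allowedMethods` narrows what the MCP server exposes to the three operations the skill needs. Its effect can be pictured as an allowlist filter over the tools the server advertises. `filterAllowedTools` and `McpTool` are hypothetical (real MCP tool listings also carry fields such as `inputSchema`), and treating a missing `allowedMethods` as "expose everything" is an assumption of this sketch.

```typescript
interface McpTool {
  name: string;
  description?: string;
}

// Keep only tools named in the allowlist; with no allowlist, everything stays visible (assumed).
function filterAllowedTools(tools: McpTool[], allowedMethods?: string[]): McpTool[] {
  if (!allowedMethods) return tools;
  const allowed = new Set(allowedMethods);
  return tools.filter((tool) => allowed.has(tool.name));
}
```

Listing write operations such as `jira_create_issue` explicitly also makes it easy to review what the assistant is allowed to change.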

Test the same way — conversation sugar works identically regardless of tool type.

## Checklist

When adding a new skill with tests:

- [ ] Add skill to `config/skills.yaml` with description, tools, and knowledge
- [ ] Write knowledge doc in `docs/` (be explicit about which endpoints/methods to use)
- [ ] Add secrets to `.env`
- [ ] Create `tests/<skill>.tests.yaml` with `extends`
- [ ] Write first test case with mocks — run `visor test` to validate
- [ ] Run with `--no-mocks` — iterate on knowledge doc and prompts
- [ ] Add multi-turn tests for complex flows
- [ ] Relax assertions that are too strict for real responses

## Related Documentation

- [Getting Started](../testing/getting-started.md) — test framework basics
- [DSL Reference](../testing/dsl-reference.md) — complete test YAML schema
- [Assertions](../testing/assertions.md) — all assertion types including LLM judge
- [Flows](../testing/flows.md) — multi-stage tests and conversation sugar
- [Fixtures and Mocks](../testing/fixtures-and-mocks.md) — mock format reference
- [Cookbook](../testing/cookbook.md) — copy-pasteable test recipes
- [Assistant Workflows](../assistant-workflows.md) — skills, intents, and tool types
- [Tools & Toolkits](../tools-and-toolkits.md) — tool definition reference
package/dist/docs/testing/dsl-reference.md
CHANGED
@@ -29,6 +29,16 @@ tests:
     tags: "local,fast"          # or [local, fast]
     exclude_tags: "experimental,slow"  # or [experimental, slow]
 
+  hooks:                        # (optional) lifecycle hooks
+    before_all:
+      exec: <shell-command>     # runs once before all cases
+    after_all:
+      exec: <shell-command>     # runs once after all cases (always)
+    before_each:
+      exec: <shell-command>     # runs before each case
+    after_each:
+      exec: <shell-command>     # runs after each case (always)
+
   fixtures: []                  # (optional) suite-level custom fixtures
 
   cases:
@@ -37,6 +47,13 @@ tests:
       skip: false|true
       ai_include_code_context: false # per-case override
 
+      hooks:                    # (optional) per-case lifecycle hooks
+        before:
+          exec: <shell-command> # runs before this case
+        after:
+          exec: <shell-command> # runs after this case (always)
+          timeout: 10000        # optional timeout in ms
+
       # Single-event case
       event: pr_opened | pr_updated | pr_closed | issue_opened | issue_comment | manual
       fixture: <builtin|{ builtin, overrides }>
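The `exec`/`timeout` pair maps closely onto Node's `child_process.spawnSync`. The sketch below is an assumption about how one hook could be run, not visor's code: the command goes through a shell, inherits the parent environment (as the Lifecycle Hooks section of this reference states), and is killed after `timeout` milliseconds, defaulting to the documented 30000. `runHook`, `HookConfig`, and `HookResult` are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical shape mirroring the documented hook properties.
interface HookConfig {
  exec: string; // shell command to run
  timeout?: number; // milliseconds
}

interface HookResult {
  ok: boolean; // true only for a clean exit with status 0
  status: number | null; // exit code, or null if the process was killed
  timedOut: boolean;
}

const DEFAULT_HOOK_TIMEOUT_MS = 30000; // default from the hook properties table

function runHook(hook: HookConfig): HookResult {
  const result = spawnSync(hook.exec, {
    shell: true, // run `exec` as a shell command line
    env: process.env, // inherit the parent environment (DB_PATH, API keys, ...)
    stdio: "inherit", // stream hook output into the test log
    timeout: hook.timeout ?? DEFAULT_HOOK_TIMEOUT_MS,
  });
  // spawnSync reports a timeout through `error` instead of throwing.
  const errorCode = (result.error as (Error & { code?: string }) | undefined)?.code;
  return {
    ok: result.error === undefined && result.status === 0,
    status: result.status,
    timedOut: errorCode === "ETIMEDOUT",
  };
}
```

A runner built on this would treat `ok: false` from a `before` hook as "skip the case and report it as failed", in line with the documented error handling.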
@@ -85,6 +102,82 @@ tests:
 error_code: 500
 ```
 
+## Lifecycle Hooks
+
+Hooks let you run shell commands at key points in the test lifecycle — useful for seeding databases, starting servers, or cleaning up test data.
+
+### Suite-level hooks
+
+Defined under `tests.hooks`:
+
+```yaml
+tests:
+  hooks:
+    before_all:
+      exec: npx tsx test-data/seed-db.ts
+    after_all:
+      exec: npx tsx test-data/clean-db.ts
+    before_each:
+      exec: npx tsx test-data/reset-state.ts
+    after_each:
+      exec: npx tsx test-data/cleanup-case.ts
+  cases: [...]
+```
+
+| Hook | When | Runs |
+|------|------|------|
+| `before_all` | Once before any case | If it fails, all cases are skipped |
+| `after_all` | Once after all cases | Always runs (like `finally`) |
+| `before_each` | Before every case | If it fails, that case is skipped |
+| `after_each` | After every case | Always runs (like `finally`) |
+
+### Case-level hooks
+
+Defined under `case.hooks`:
+
+```yaml
+cases:
+  - name: update-settlement
+    hooks:
+      before:
+        exec: npx tsx test-data/seed-db.ts --case update-settlement
+      after:
+        exec: npx tsx test-data/seed-db.ts --clean
+        timeout: 10000   # optional, default 30000ms
+    event: manual
+    mocks: { ... }
+```
+
+| Hook | When | Runs |
+|------|------|------|
+| `before` | Before this specific case (after `before_each`) | If it fails, the case is skipped |
+| `after` | After this specific case (before `after_each`) | Always runs (like `finally`) |
+
+### Hook properties
+
+| Property | Type | Required | Description |
+|----------|------|----------|-------------|
+| `exec` | string | yes | Shell command to run |
+| `timeout` | number | no | Timeout in ms (default: 30000) |
+
+### Execution order
+
+For each case, hooks run in this order:
+
+1. `before_each` (suite)
+2. `before` (case)
+3. *test execution*
+4. `after` (case)
+5. `after_each` (suite)
+
+Hooks inherit all environment variables from the parent process, so seed scripts can use the same `DB_PATH`, API keys, etc. that your checks use.
+
+### Error handling
+
+- If `before_all` fails → all cases are skipped and reported as failed
+- If `before_each` or `before` fails → that case is skipped and reported as failed
+- `after`, `after_each`, and `after_all` always run, even if the test or a prior hook failed
+
 ## Fixtures
 
 - Built-in GitHub fixtures: `gh.pr_open.minimal`, `gh.pr_sync.minimal`, `gh.pr_closed.minimal`, `gh.issue_open.minimal`, `gh.issue_comment.standard`, `gh.issue_comment.visor_help`, `gh.issue_comment.visor_regenerate`.