rspec-agents 0.1.0
- checksums.yaml +7 -0
- data/bin/rspec-agents +24 -0
- data/lib/async_workers/channel_config.rb +34 -0
- data/lib/async_workers/doc/process_manager_design.md +512 -0
- data/lib/async_workers/errors.rb +21 -0
- data/lib/async_workers/managed_process.rb +284 -0
- data/lib/async_workers/output_stream.rb +86 -0
- data/lib/async_workers/rpc_channel.rb +159 -0
- data/lib/async_workers/transport/base.rb +57 -0
- data/lib/async_workers/transport/stdio_transport.rb +91 -0
- data/lib/async_workers/transport/unix_socket_transport.rb +112 -0
- data/lib/async_workers/worker_group.rb +175 -0
- data/lib/async_workers.rb +17 -0
- data/lib/rspec/agents/agent_response.rb +61 -0
- data/lib/rspec/agents/agents/base.rb +123 -0
- data/lib/rspec/agents/cli.rb +342 -0
- data/lib/rspec/agents/conversation.rb +308 -0
- data/lib/rspec/agents/criterion.rb +237 -0
- data/lib/rspec/agents/doc/2026_01_22_observer-system-design.md +757 -0
- data/lib/rspec/agents/doc/2026_01_23_parallel_spec_runner-design.md +1060 -0
- data/lib/rspec/agents/doc/2026_01_27_event_serialization-design.md +294 -0
- data/lib/rspec/agents/doc/2026_01_27_experiment_aggregation_design.md +831 -0
- data/lib/rspec/agents/doc/2026_01_29_rspec-agents-studio-design.md +1332 -0
- data/lib/rspec/agents/doc/2026_01_29_testing-framework-design.md +1037 -0
- data/lib/rspec/agents/doc/2026_02_04-parallel-runner-ui.md +537 -0
- data/lib/rspec/agents/doc/2026_02_05_html_renderer_extensions.md +708 -0
- data/lib/rspec/agents/doc/scenario_guide.md +289 -0
- data/lib/rspec/agents/dsl/agent_proxy.rb +141 -0
- data/lib/rspec/agents/dsl/criterion_definition.rb +78 -0
- data/lib/rspec/agents/dsl/graph_builder.rb +38 -0
- data/lib/rspec/agents/dsl/runner_factory.rb +52 -0
- data/lib/rspec/agents/dsl/scenario_set_dsl.rb +166 -0
- data/lib/rspec/agents/dsl/test_context.rb +223 -0
- data/lib/rspec/agents/dsl/user_proxy.rb +71 -0
- data/lib/rspec/agents/dsl.rb +398 -0
- data/lib/rspec/agents/evaluation_result.rb +44 -0
- data/lib/rspec/agents/event_bus.rb +78 -0
- data/lib/rspec/agents/events.rb +141 -0
- data/lib/rspec/agents/isolated_event_bus.rb +86 -0
- data/lib/rspec/agents/judge.rb +244 -0
- data/lib/rspec/agents/llm/anthropic.rb +143 -0
- data/lib/rspec/agents/llm/base.rb +64 -0
- data/lib/rspec/agents/llm/mock.rb +181 -0
- data/lib/rspec/agents/llm/response.rb +52 -0
- data/lib/rspec/agents/matchers.rb +554 -0
- data/lib/rspec/agents/message.rb +81 -0
- data/lib/rspec/agents/metadata.rb +120 -0
- data/lib/rspec/agents/observers/base.rb +70 -0
- data/lib/rspec/agents/observers/parallel_terminal_observer.rb +151 -0
- data/lib/rspec/agents/observers/rpc_notify_observer.rb +43 -0
- data/lib/rspec/agents/observers/terminal_observer.rb +103 -0
- data/lib/rspec/agents/parallel/controller.rb +284 -0
- data/lib/rspec/agents/parallel/example_discovery.rb +153 -0
- data/lib/rspec/agents/parallel/partitioner.rb +31 -0
- data/lib/rspec/agents/parallel/run_result.rb +22 -0
- data/lib/rspec/agents/parallel/ui/interactive_ui.rb +605 -0
- data/lib/rspec/agents/parallel/ui/interleaved_ui.rb +139 -0
- data/lib/rspec/agents/parallel/ui/output_adapter.rb +127 -0
- data/lib/rspec/agents/parallel/ui/quiet_ui.rb +100 -0
- data/lib/rspec/agents/parallel/ui/ui_factory.rb +53 -0
- data/lib/rspec/agents/parallel/ui/ui_mode.rb +101 -0
- data/lib/rspec/agents/prompt_builders/base.rb +113 -0
- data/lib/rspec/agents/prompt_builders/criterion_evaluation.rb +136 -0
- data/lib/rspec/agents/prompt_builders/goal_achievement_evaluation.rb +142 -0
- data/lib/rspec/agents/prompt_builders/grounding_evaluation.rb +172 -0
- data/lib/rspec/agents/prompt_builders/intent_evaluation.rb +111 -0
- data/lib/rspec/agents/prompt_builders/topic_classification.rb +105 -0
- data/lib/rspec/agents/prompt_builders/user_simulation.rb +131 -0
- data/lib/rspec/agents/runners/headless_runner.rb +272 -0
- data/lib/rspec/agents/runners/parallel_terminal_runner.rb +220 -0
- data/lib/rspec/agents/runners/terminal_runner.rb +186 -0
- data/lib/rspec/agents/runners/user_simulator.rb +261 -0
- data/lib/rspec/agents/scenario.rb +133 -0
- data/lib/rspec/agents/scenario_loader.rb +145 -0
- data/lib/rspec/agents/serialization/conversation_renderer.rb +161 -0
- data/lib/rspec/agents/serialization/extension.rb +199 -0
- data/lib/rspec/agents/serialization/extensions/core_extension.rb +66 -0
- data/lib/rspec/agents/serialization/presenters.rb +281 -0
- data/lib/rspec/agents/serialization/run_data_aggregator.rb +197 -0
- data/lib/rspec/agents/serialization/run_data_builder.rb +189 -0
- data/lib/rspec/agents/serialization/templates/_alpine.min.js +5 -0
- data/lib/rspec/agents/serialization/templates/_base_components.css +196 -0
- data/lib/rspec/agents/serialization/templates/_base_components.js +46 -0
- data/lib/rspec/agents/serialization/templates/_conversation_fragment.html.haml +34 -0
- data/lib/rspec/agents/serialization/templates/_metadata_default.html.haml +17 -0
- data/lib/rspec/agents/serialization/templates/_scripts.js +89 -0
- data/lib/rspec/agents/serialization/templates/_styles.css +1211 -0
- data/lib/rspec/agents/serialization/templates/conversation_document.html.haml +29 -0
- data/lib/rspec/agents/serialization/templates/test_suite.html.haml +238 -0
- data/lib/rspec/agents/serialization/test_suite_renderer.rb +207 -0
- data/lib/rspec/agents/serialization.rb +374 -0
- data/lib/rspec/agents/simulator_config.rb +336 -0
- data/lib/rspec/agents/spec_executor.rb +494 -0
- data/lib/rspec/agents/stable_example_id.rb +147 -0
- data/lib/rspec/agents/templates/user_simulation.erb +9 -0
- data/lib/rspec/agents/tool_call.rb +53 -0
- data/lib/rspec/agents/topic.rb +307 -0
- data/lib/rspec/agents/topic_graph.rb +236 -0
- data/lib/rspec/agents/triggers.rb +122 -0
- data/lib/rspec/agents/turn.rb +63 -0
- data/lib/rspec/agents/turn_executor.rb +91 -0
- data/lib/rspec/agents/version.rb +7 -0
- data/lib/rspec/agents.rb +145 -0
- metadata +242 -0
@@ -0,0 +1,294 @@
# Event Serialization System Design Document

## 1. Problem Domain & Requirements

### 1.1 Overview

The rspec-agents framework emits events during test execution via the EventBus system (see `observer-system-design.md`). This document describes a serialization layer that:

- Captures events into structured data for rendering and analysis
- Supports extensible metadata (LLM tracing, custom user data)
- Enables file-based persistence for later viewing
- Works with both single-process and parallel test execution

### 1.2 Existing Infrastructure

- **14 event types** in `lib/rspec_agents/events.rb` using `Data.define`
- **EventBus** singleton with thread-safe pub/sub (supports multiple subscribers)

### 1.3 Design Goals

1. **Generic metadata**: Extensible structure for arbitrary nested data (tracing, custom fields)
2. **Single builder class**: Same `RunDataBuilder` used in single-process and parallel (controller) modes
3. **Canonical AgentResponse**: `AgentResponse` event is the authoritative source for tool calls
4. **Computed summaries**: Statistics calculated on-demand, not stored

### 1.4 Non-Goals

- Streaming output (JSONL real-time tailing)
- Backward compatibility with `lib/rspec/agents` format
- Streaming chunks for token-by-token display

---

## 2. Architecture Overview

### 2.1 Single Process Mode

```
┌─────────────────────────────────────────────────────────────────────────┐
│                              RSpec Process                              │
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                   │
│  │ RSpec Hooks  │  │ Conversation │  │ LLM Adapter  │                   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘                   │
│         │                 │                 │                           │
│         └─────────────────┼─────────────────┘                           │
│                           ▼                                             │
│                    ┌──────────────┐                                     │
│                    │   EventBus   │                                     │
│                    └──────┬───────┘                                     │
│                           │                                             │
│                           ▼                                             │
│                    ┌──────────────┐                                     │
│                    │RunDataBuilder│                                     │
│                    └──────┬───────┘                                     │
│                           │                                             │
│                           ▼                                             │
│                    ┌──────────────┐                                     │
│                    │   RunData    │                                     │
│                    └──────────────┘                                     │
└─────────────────────────────────────────────────────────────────────────┘
```

### 2.2 Parallel Mode

```
┌─────────────────────────────────────────────────────────────────────────┐
│                            Worker Processes                             │
│                                                                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
│  │    Worker 1     │  │    Worker 2     │  │    Worker 3     │          │
│  │                 │  │                 │  │                 │          │
│  │  EventBus ──────┼──┼─ EventBus ──────┼──┼─ EventBus ──────┼──┐       │
│  │     (local)     │  │     (local)     │  │     (local)     │  │       │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │       │
│                                                                 │       │
└─────────────────────────────────────────────────────────────────┼───────┘
                                                                  │
                                        Events forwarded via IPC  │
                                                                  ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           Controller Process                            │
│                                                                         │
│                    ┌──────────────┐                                     │
│                    │   EventBus   │  (receives forwarded events)        │
│                    └──────┬───────┘                                     │
│                           │                                             │
│                           ▼                                             │
│                    ┌──────────────┐                                     │
│                    │RunDataBuilder│  (same class as single mode)        │
│                    └──────┬───────┘                                     │
│                           │                                             │
│                           ▼                                             │
│                    ┌──────────────┐                                     │
│                    │   RunData    │                                     │
│                    └──────────────┘                                     │
└─────────────────────────────────────────────────────────────────────────┘
```

The key design decision: workers forward serialized events via IPC, and the controller reconstructs them. This allows the same `RunDataBuilder` class to work identically in both modes; it simply subscribes to whichever EventBus is available.

---

## 3. Metadata System

The `Metadata` class is implemented in `lib/rspec_agents/metadata.rb`. It provides:

- Dynamic attribute access (`metadata.field = value`)
- Hash-style access (`metadata[:key]`)
- Scoped assignment (`metadata.scope!(:tracing) { |t| t.latency = 123 }`)
- Nested scopes and deep access via `dig`

See the implementation for full API details.
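
To make the four access styles concrete, here is a minimal sketch. It is not the code in `lib/rspec_agents/metadata.rb`: the class name `SketchMetadata`, the `to_h` helper, and the choice to store nested scopes as child objects are all assumptions made for illustration, and the real class may behave differently (for example when reading an unknown attribute).

```ruby
# Hypothetical stand-in for Metadata; names and internals are assumptions.
class SketchMetadata
  def initialize
    @data = {}
  end

  # Hash-style access: metadata[:key] and metadata[:key] = value
  def [](key)
    @data[key.to_sym]
  end

  def []=(key, value)
    @data[key.to_sym] = value
  end

  # Scoped assignment: yields (and returns) a nested metadata object.
  def scope!(name)
    child = (@data[name.to_sym] ||= SketchMetadata.new)
    yield child if block_given?
    child
  end

  # Deep access across nested scopes: metadata.dig(:tracing, :latency)
  def dig(key, *rest)
    value = self[key]
    return value if rest.empty? || value.nil?

    value.respond_to?(:dig) ? value.dig(*rest) : nil
  end

  def to_h
    @data.transform_values { |v| v.is_a?(SketchMetadata) ? v.to_h : v }
  end

  # Dynamic attribute access: writers always work, readers only for known keys.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=") && args.size == 1
      self[key.chomp("=")] = args.first
    elsif args.empty? && @data.key?(name)
      @data[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?("=") || @data.key?(name) || super
  end
end

metadata = SketchMetadata.new
metadata.model = "example-model"
metadata.scope!(:tracing) { |t| t.latency = 123 }
```

After these calls, `metadata[:model]` and `metadata.model` read the same value, and `metadata.dig(:tracing, :latency)` reaches into the nested scope.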

---

## 4. Data Model

### 4.1 Entity Hierarchy

```
RunData
├── run_id: String
├── started_at: Time
├── finished_at: Time?
├── seed: Integer
├── examples: Hash<String, ExampleData>
└── summary(): SummaryStats (computed)

ExampleData
├── id: String (from RSpec example_id)
├── file: String
├── description: String
├── location: String ("file:line")
├── status: Symbol (:pending, :running, :passed, :failed)
├── started_at: Time
├── finished_at: Time?
├── duration_ms: Integer?
├── exception: ExceptionData?
├── conversation: ConversationData?
├── evaluations: Array<EvaluationData>
└── metadata: Metadata

ConversationData
├── started_at: Time
├── ended_at: Time?
├── turns: Array<TurnData>
├── final_topic: String?
└── metadata: Metadata

TurnData
├── number: Integer (1-indexed)
├── user_message: MessageData
├── agent_response: MessageData?
├── tool_calls: Array<ToolCallData>
├── topic: String?
└── metadata: Metadata

MessageData
├── role: Symbol (:user, :agent)
├── content: String
├── timestamp: Time
├── source: Symbol? (:simulator, :script; user messages only)
└── metadata: Metadata (includes tracing for agent responses)

ToolCallData
├── name: String
├── arguments: Hash
├── result: String?
├── error: String?
├── timestamp: Time
└── metadata: Metadata

EvaluationData
├── name: String
├── description: String
├── passed: Boolean
├── reasoning: String?
├── timestamp: Time
└── metadata: Metadata

ExceptionData
├── class_name: String
├── message: String
└── backtrace: Array<String> (first 10 lines)
```

### 4.2 Design Decisions

**Computed summaries**: `RunData#summary` returns statistics (pass/fail counts, total duration) calculated on-demand rather than stored. This avoids synchronization issues in parallel mode and ensures consistency.
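
A computed summary can be sketched as a method that walks the examples on every call. The classes below (`ExampleSketch`, `SummarySketch`, `RunDataSketch`) are hypothetical, trimmed-down stand-ins for `ExampleData`, `SummaryStats`, and `RunData`, and the exact set of summary fields is an assumption.

```ruby
# Hypothetical, trimmed-down stand-ins; field names are assumptions.
ExampleSketch = Struct.new(:id, :status, :duration_ms, keyword_init: true)
SummarySketch = Struct.new(:total, :passed, :failed, :pending, :duration_ms, keyword_init: true)

class RunDataSketch
  attr_reader :examples

  def initialize
    @examples = {} # example id => ExampleSketch
  end

  # Recomputed on every call; no counters are stored on the run itself.
  def summary
    list = @examples.values
    SummarySketch.new(
      total: list.size,
      passed: list.count { |e| e.status == :passed },
      failed: list.count { |e| e.status == :failed },
      pending: list.count { |e| e.status == :pending },
      duration_ms: list.sum { |e| e.duration_ms || 0 }
    )
  end
end

run = RunDataSketch.new
run.examples["spec/a_spec.rb[1:1]"] = ExampleSketch.new(id: "spec/a_spec.rb[1:1]", status: :passed, duration_ms: 120)
run.examples["spec/a_spec.rb[1:2]"] = ExampleSketch.new(id: "spec/a_spec.rb[1:2]", status: :failed, duration_ms: 80)
summary = run.summary
```

Because nothing is cached, an example whose status changes later is reflected the next time `summary` is called, with no counters to keep in sync.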

**AgentResponse is canonical**: The `AgentResponse` event from the EventBus is the authoritative source for tool calls. Any interim `ToolCallCompleted` events are discarded when `AgentResponse` arrives; no merging occurs.

**Metadata at every level**: Each data class has a `Metadata` field. This supports different use cases:
- `MessageData.metadata`: LLM tracing (tokens, latency, model)
- `ToolCallData.metadata`: Database records, API timing
- `ExampleData.metadata`: Custom test metadata

---

## 5. Event Types

### 5.1 Tool Call Events

Tool calls are captured via `ToolCallCompleted` events:

| Event | Purpose | Key Fields |
|-------|---------|------------|
| `ToolCallCompleted` | Records completed tool execution | `tool_name`, `arguments`, `result`, `error`, `metadata` |

### 5.2 Modified Events

| Event | Change |
|-------|--------|
| `AgentResponse` | Added `metadata` field (Hash) for tracing data |
| `SuiteStarted` | Added `seed` field for RSpec seed |

### 5.3 Event Flow for a Turn

```
1. UserMessage        → User says something
2. ToolCallCompleted  → Tool execution completes (optional, may repeat)
3. AgentResponse      → Agent responds (canonical source for tool_calls)
4. [Next turn or ExamplePassed/ExampleFailed]
```

The `AgentResponse` event is the authoritative source for tool calls. `ToolCallCompleted` events may be used for real-time display but are replaced by `AgentResponse.tool_calls` in the final data.

---

## 6. RunDataBuilder

### 6.1 Responsibilities

The `RunDataBuilder` class:
- Subscribes to all relevant events on the EventBus
- Maintains internal state for in-progress turns
- Builds the `RunData` structure incrementally
- Is thread-safe (uses mutex for parallel event arrival)

### 6.2 State Management

The builder tracks state per example:

**ConversationData creation**: Triggered by `ExampleStarted`. Each example gets a `ConversationData` initialized when the example begins.

**Turn tracking**: The builder tracks "current turns" per example, that is, turns that have received a `UserMessage` but haven't yet been finalized. A turn is finalized when:
- A new `UserMessage` arrives (starts next turn)
- `ExamplePassed` or `ExampleFailed` is received
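
The two finalization rules can be sketched as follows. This is an illustrative sketch, not the gem's `RunDataBuilder`: the class `TurnTrackerSketch`, its method names, and the plain-string messages are assumptions. It keeps one in-progress turn per example and guards all state with a mutex, as section 6.1 describes.

```ruby
# Hypothetical sketch of per-example turn tracking; all names are assumptions.
class TurnTrackerSketch
  TurnSketch = Struct.new(:number, :user_message, :agent_response, keyword_init: true)

  def initialize
    @mutex = Mutex.new
    @current = {}                             # example id => in-progress turn
    @finished = Hash.new { |h, k| h[k] = [] } # example id => finalized turns
  end

  def user_message(example_id, text)
    @mutex.synchronize do
      finalize(example_id) # a new UserMessage closes the previous turn
      number = @finished[example_id].size + 1
      @current[example_id] = TurnSketch.new(number: number, user_message: text)
    end
  end

  def agent_response(example_id, text)
    @mutex.synchronize do
      turn = @current[example_id]
      turn.agent_response = text if turn
    end
  end

  # Called for ExamplePassed or ExampleFailed.
  def example_finished(example_id)
    @mutex.synchronize { finalize(example_id) }
  end

  def turns(example_id)
    @mutex.synchronize { @finished[example_id].dup }
  end

  private

  # Moves the in-progress turn (if any) into the finalized list.
  def finalize(example_id)
    turn = @current.delete(example_id)
    @finished[example_id] << turn if turn
  end
end

tracker = TurnTrackerSketch.new
tracker.user_message("ex1", "Hi")
tracker.agent_response("ex1", "Hello!")
tracker.user_message("ex1", "Book a room")
tracker.example_finished("ex1")
```

In this run the first turn is finalized by the second `user_message` call, and the second turn (which never received an agent response) is finalized when the example ends.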

### 6.3 Tool Call Handling

When `AgentResponse` arrives:
- The `tool_calls` field from the event is used directly
- Any interim `ToolCallCompleted` events are discarded (no merging)
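
A minimal sketch of this replace-not-merge rule is shown below. `TurnToolCallsSketch`, `ToolCallSketch`, and the handler names are hypothetical, and the sketch assumes the tool calls on `AgentResponse` arrive as an array of hashes.

```ruby
# Hypothetical sketch: AgentResponse replaces interim tool calls wholesale.
ToolCallSketch = Struct.new(:name, :arguments, :result, keyword_init: true)

class TurnToolCallsSketch
  attr_reader :tool_calls

  def initialize
    @tool_calls = []
  end

  # Interim ToolCallCompleted: kept only so a live display has something to show.
  def on_tool_call_completed(name:, arguments:, result: nil)
    @tool_calls << ToolCallSketch.new(name: name, arguments: arguments, result: result)
  end

  # AgentResponse: its tool_calls list overwrites whatever was collected so far.
  def on_agent_response(tool_calls:)
    @tool_calls = tool_calls.map { |tc| ToolCallSketch.new(**tc) }
  end
end

turn = TurnToolCallsSketch.new
turn.on_tool_call_completed(name: "search", arguments: { q: "hotels" })
turn.on_tool_call_completed(name: "search", arguments: { q: "hotels" }, result: "3 results")
turn.on_agent_response(tool_calls: [{ name: "search", arguments: { q: "hotels" }, result: "3 results" }])
```

Assigning the array rather than appending to it means duplicated or out-of-order interim events cannot leak into the final data.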

---

## 7. Serialization

### 7.1 JSON Format

All data classes implement `to_h` and `self.from_h` for JSON serialization:
- Times serialize as ISO 8601 with milliseconds
- Symbols serialize as strings
- Nested structures recurse
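
The round trip can be sketched for a single data class. `MessageSketch` is a hypothetical, reduced version of `MessageData` (no `source` or `metadata`), and the string hash keys and UTC normalization are assumptions rather than documented behavior.

```ruby
require "json"
require "time"

# Hypothetical sketch of one data class; field names follow MessageData in 4.1.
class MessageSketch
  attr_reader :role, :content, :timestamp

  def initialize(role:, content:, timestamp:)
    @role = role
    @content = content
    @timestamp = timestamp
  end

  def to_h
    {
      "role" => role.to_s,                          # Symbol -> String
      "content" => content,
      "timestamp" => timestamp.getutc.iso8601(3)    # ISO 8601 with milliseconds
    }
  end

  def self.from_h(hash)
    new(
      role: hash.fetch("role").to_sym,              # String -> Symbol
      content: hash.fetch("content"),
      timestamp: Time.iso8601(hash.fetch("timestamp"))
    )
  end
end

original = MessageSketch.new(
  role: :agent,
  content: "Hello",
  timestamp: Time.utc(2026, 1, 27, 12, 0, 0, 500_000) # 500 ms past the minute
)
json = JSON.generate(original.to_h)
restored = MessageSketch.from_h(JSON.parse(json))
```

A class with nested fields would call `to_h` and `from_h` on its children in the same way, which is what "nested structures recurse" refers to.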

### 7.2 File Operations

`JsonFile` provides simple read/write:

```ruby
JsonFile.write("tmp/rspec_agents/run.json", run_data)
run_data = JsonFile.read("tmp/rspec_agents/run.json")
```

---

## 8. Parallel Execution

### 8.1 Event Forwarding

In parallel mode:
1. Each worker has its own local EventBus
2. Workers serialize events and forward via IPC (mechanism depends on parallel runner)
3. Controller receives events and republishes to its EventBus
4. Controller's `RunDataBuilder` processes events identically to single-process mode
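
The four steps can be sketched in one script. Everything here is an assumption for illustration: the `IO.pipe` stands in for whatever IPC the parallel runner provides, `BusSketch` is a toy pub/sub rather than the gem's EventBus, and the JSON envelope (`type` plus `payload`) is an invented wire format. The event is a `Struct` only to keep the sketch short; the gem's events use `Data.define`.

```ruby
require "json"

# Hypothetical event type and lookup table used to rebuild events by name.
UserMessageSketch = Struct.new(:example_id, :text, keyword_init: true)
EVENT_TYPES = { "UserMessageSketch" => UserMessageSketch }.freeze

# Toy pub/sub bus; not the gem's EventBus.
class BusSketch
  def initialize
    @subscribers = []
  end

  def subscribe(&block)
    @subscribers << block
  end

  def publish(event)
    @subscribers.each { |subscriber| subscriber.call(event) }
  end
end

reader, writer = IO.pipe

# Steps 1-2: the worker's local bus serializes each event as one JSON line.
worker_bus = BusSketch.new
worker_bus.subscribe do |event|
  writer.puts(JSON.generate("type" => event.class.name, "payload" => event.to_h))
end

# Steps 3-4: the controller rebuilds events and republishes them on its own
# bus, where a builder would subscribe exactly as in single-process mode.
controller_bus = BusSketch.new
received = []
controller_bus.subscribe { |event| received << event }

worker_bus.publish(UserMessageSketch.new(example_id: "ex1", text: "Hi"))
writer.close

reader.each_line do |line|
  message = JSON.parse(line)
  payload = message.fetch("payload").transform_keys(&:to_sym)
  event_class = EVENT_TYPES.fetch(message.fetch("type"))
  controller_bus.publish(event_class.new(**payload))
end
reader.close
```

In a real run the worker and controller halves live in separate processes; they share a script here only so the sketch is self-contained.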

### 8.2 Design Rationale

By forwarding events rather than partial data structures:
- Workers don't need to know about the `RunData` serialization format; they only serialize individual events
- Controller uses the same builder class
- Event ordering is preserved per example (though interleaved across examples)