rspec-agents 0.1.0

Files changed (104)
  1. checksums.yaml +7 -0
  2. data/bin/rspec-agents +24 -0
  3. data/lib/async_workers/channel_config.rb +34 -0
  4. data/lib/async_workers/doc/process_manager_design.md +512 -0
  5. data/lib/async_workers/errors.rb +21 -0
  6. data/lib/async_workers/managed_process.rb +284 -0
  7. data/lib/async_workers/output_stream.rb +86 -0
  8. data/lib/async_workers/rpc_channel.rb +159 -0
  9. data/lib/async_workers/transport/base.rb +57 -0
  10. data/lib/async_workers/transport/stdio_transport.rb +91 -0
  11. data/lib/async_workers/transport/unix_socket_transport.rb +112 -0
  12. data/lib/async_workers/worker_group.rb +175 -0
  13. data/lib/async_workers.rb +17 -0
  14. data/lib/rspec/agents/agent_response.rb +61 -0
  15. data/lib/rspec/agents/agents/base.rb +123 -0
  16. data/lib/rspec/agents/cli.rb +342 -0
  17. data/lib/rspec/agents/conversation.rb +308 -0
  18. data/lib/rspec/agents/criterion.rb +237 -0
  19. data/lib/rspec/agents/doc/2026_01_22_observer-system-design.md +757 -0
  20. data/lib/rspec/agents/doc/2026_01_23_parallel_spec_runner-design.md +1060 -0
  21. data/lib/rspec/agents/doc/2026_01_27_event_serialization-design.md +294 -0
  22. data/lib/rspec/agents/doc/2026_01_27_experiment_aggregation_design.md +831 -0
  23. data/lib/rspec/agents/doc/2026_01_29_rspec-agents-studio-design.md +1332 -0
  24. data/lib/rspec/agents/doc/2026_01_29_testing-framework-design.md +1037 -0
  25. data/lib/rspec/agents/doc/2026_02_04-parallel-runner-ui.md +537 -0
  26. data/lib/rspec/agents/doc/2026_02_05_html_renderer_extensions.md +708 -0
  27. data/lib/rspec/agents/doc/scenario_guide.md +289 -0
  28. data/lib/rspec/agents/dsl/agent_proxy.rb +141 -0
  29. data/lib/rspec/agents/dsl/criterion_definition.rb +78 -0
  30. data/lib/rspec/agents/dsl/graph_builder.rb +38 -0
  31. data/lib/rspec/agents/dsl/runner_factory.rb +52 -0
  32. data/lib/rspec/agents/dsl/scenario_set_dsl.rb +166 -0
  33. data/lib/rspec/agents/dsl/test_context.rb +223 -0
  34. data/lib/rspec/agents/dsl/user_proxy.rb +71 -0
  35. data/lib/rspec/agents/dsl.rb +398 -0
  36. data/lib/rspec/agents/evaluation_result.rb +44 -0
  37. data/lib/rspec/agents/event_bus.rb +78 -0
  38. data/lib/rspec/agents/events.rb +141 -0
  39. data/lib/rspec/agents/isolated_event_bus.rb +86 -0
  40. data/lib/rspec/agents/judge.rb +244 -0
  41. data/lib/rspec/agents/llm/anthropic.rb +143 -0
  42. data/lib/rspec/agents/llm/base.rb +64 -0
  43. data/lib/rspec/agents/llm/mock.rb +181 -0
  44. data/lib/rspec/agents/llm/response.rb +52 -0
  45. data/lib/rspec/agents/matchers.rb +554 -0
  46. data/lib/rspec/agents/message.rb +81 -0
  47. data/lib/rspec/agents/metadata.rb +120 -0
  48. data/lib/rspec/agents/observers/base.rb +70 -0
  49. data/lib/rspec/agents/observers/parallel_terminal_observer.rb +151 -0
  50. data/lib/rspec/agents/observers/rpc_notify_observer.rb +43 -0
  51. data/lib/rspec/agents/observers/terminal_observer.rb +103 -0
  52. data/lib/rspec/agents/parallel/controller.rb +284 -0
  53. data/lib/rspec/agents/parallel/example_discovery.rb +153 -0
  54. data/lib/rspec/agents/parallel/partitioner.rb +31 -0
  55. data/lib/rspec/agents/parallel/run_result.rb +22 -0
  56. data/lib/rspec/agents/parallel/ui/interactive_ui.rb +605 -0
  57. data/lib/rspec/agents/parallel/ui/interleaved_ui.rb +139 -0
  58. data/lib/rspec/agents/parallel/ui/output_adapter.rb +127 -0
  59. data/lib/rspec/agents/parallel/ui/quiet_ui.rb +100 -0
  60. data/lib/rspec/agents/parallel/ui/ui_factory.rb +53 -0
  61. data/lib/rspec/agents/parallel/ui/ui_mode.rb +101 -0
  62. data/lib/rspec/agents/prompt_builders/base.rb +113 -0
  63. data/lib/rspec/agents/prompt_builders/criterion_evaluation.rb +136 -0
  64. data/lib/rspec/agents/prompt_builders/goal_achievement_evaluation.rb +142 -0
  65. data/lib/rspec/agents/prompt_builders/grounding_evaluation.rb +172 -0
  66. data/lib/rspec/agents/prompt_builders/intent_evaluation.rb +111 -0
  67. data/lib/rspec/agents/prompt_builders/topic_classification.rb +105 -0
  68. data/lib/rspec/agents/prompt_builders/user_simulation.rb +131 -0
  69. data/lib/rspec/agents/runners/headless_runner.rb +272 -0
  70. data/lib/rspec/agents/runners/parallel_terminal_runner.rb +220 -0
  71. data/lib/rspec/agents/runners/terminal_runner.rb +186 -0
  72. data/lib/rspec/agents/runners/user_simulator.rb +261 -0
  73. data/lib/rspec/agents/scenario.rb +133 -0
  74. data/lib/rspec/agents/scenario_loader.rb +145 -0
  75. data/lib/rspec/agents/serialization/conversation_renderer.rb +161 -0
  76. data/lib/rspec/agents/serialization/extension.rb +199 -0
  77. data/lib/rspec/agents/serialization/extensions/core_extension.rb +66 -0
  78. data/lib/rspec/agents/serialization/presenters.rb +281 -0
  79. data/lib/rspec/agents/serialization/run_data_aggregator.rb +197 -0
  80. data/lib/rspec/agents/serialization/run_data_builder.rb +189 -0
  81. data/lib/rspec/agents/serialization/templates/_alpine.min.js +5 -0
  82. data/lib/rspec/agents/serialization/templates/_base_components.css +196 -0
  83. data/lib/rspec/agents/serialization/templates/_base_components.js +46 -0
  84. data/lib/rspec/agents/serialization/templates/_conversation_fragment.html.haml +34 -0
  85. data/lib/rspec/agents/serialization/templates/_metadata_default.html.haml +17 -0
  86. data/lib/rspec/agents/serialization/templates/_scripts.js +89 -0
  87. data/lib/rspec/agents/serialization/templates/_styles.css +1211 -0
  88. data/lib/rspec/agents/serialization/templates/conversation_document.html.haml +29 -0
  89. data/lib/rspec/agents/serialization/templates/test_suite.html.haml +238 -0
  90. data/lib/rspec/agents/serialization/test_suite_renderer.rb +207 -0
  91. data/lib/rspec/agents/serialization.rb +374 -0
  92. data/lib/rspec/agents/simulator_config.rb +336 -0
  93. data/lib/rspec/agents/spec_executor.rb +494 -0
  94. data/lib/rspec/agents/stable_example_id.rb +147 -0
  95. data/lib/rspec/agents/templates/user_simulation.erb +9 -0
  96. data/lib/rspec/agents/tool_call.rb +53 -0
  97. data/lib/rspec/agents/topic.rb +307 -0
  98. data/lib/rspec/agents/topic_graph.rb +236 -0
  99. data/lib/rspec/agents/triggers.rb +122 -0
  100. data/lib/rspec/agents/turn.rb +63 -0
  101. data/lib/rspec/agents/turn_executor.rb +91 -0
  102. data/lib/rspec/agents/version.rb +7 -0
  103. data/lib/rspec/agents.rb +145 -0
  104. metadata +242 -0
@@ -0,0 +1,294 @@
1
+ # Event Serialization System Design Document
2
+
3
+ ## 1. Problem Domain & Requirements
4
+
5
+ ### 1.1 Overview
6
+
7
+ The rspec-agents framework emits events during test execution via the EventBus system (see `observer-system-design.md`). This document describes a serialization layer that:
8
+
9
+ - Captures events into structured data for rendering and analysis
10
+ - Supports extensible metadata (LLM tracing, custom user data)
11
+ - Enables file-based persistence for later viewing
12
+ - Works with both single-process and parallel test execution
13
+
14
+ ### 1.2 Existing Infrastructure
15
+
16
+ - **14 event types** in `lib/rspec_agents/events.rb` using `Data.define`
17
+ - **EventBus** singleton with thread-safe pub/sub (supports multiple subscribers)
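To make the pub/sub contract concrete, here is a minimal sketch of a thread-safe singleton bus with multiple subscribers. `EventBusSketch`, `subscribe`, and `publish` are assumed names for illustration only; the gem's actual `EventBus` API may differ.

```ruby
require "singleton"

# Hypothetical stand-in for the EventBus; not the gem's implementation.
class EventBusSketch
  include Singleton

  def initialize
    @mutex = Mutex.new
    @subscribers = []
  end

  # Registers a handler block and returns it.
  def subscribe(&handler)
    @mutex.synchronize { @subscribers << handler }
    handler
  end

  # Copies the subscriber list under the lock and calls handlers outside it,
  # so a handler can publish or subscribe without deadlocking.
  def publish(event)
    handlers = @mutex.synchronize { @subscribers.dup }
    handlers.each { |handler| handler.call(event) }
  end
end

seen = []
bus = EventBusSketch.instance
bus.subscribe { |event| seen << [:first, event] }
bus.subscribe { |event| seen << [:second, event] }
bus.publish(:example_started)
```

Every subscriber receives each published event, in subscription order.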
18
+
19
+ ### 1.3 Design Goals
20
+
21
+ 1. **Generic metadata**: Extensible structure for arbitrary nested data (tracing, custom fields)
22
+ 2. **Single builder class**: Same `RunDataBuilder` used in single-process and parallel (controller) modes
23
+ 3. **Canonical AgentResponse**: `AgentResponse` event is the authoritative source for tool calls
24
+ 4. **Computed summaries**: Statistics calculated on-demand, not stored
25
+
26
+ ### 1.4 Non-Goals
27
+
28
+ - Streaming output (JSONL real-time tailing)
29
+ - Backward compatibility with `lib/rspec/agents` format
30
+ - Streaming chunks for token-by-token display
31
+
32
+ ---
33
+
34
+ ## 2. Architecture Overview
35
+
36
+ ### 2.1 Single Process Mode
37
+
38
+ ```
39
+ ┌─────────────────────────────────────────────────────────────────────────┐
40
+ │                              RSpec Process                              │
41
+ │                                                                         │
42
+ │          ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
43
+ │          │ RSpec Hooks  │  │ Conversation │  │ LLM Adapter  │           │
44
+ │          └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
45
+ │                 │                 │                 │                   │
46
+ │                 └─────────────────┼─────────────────┘                   │
47
+ │                                   ▼                                     │
48
+ │                            ┌──────────────┐                             │
49
+ │                            │   EventBus   │                             │
50
+ │                            └──────┬───────┘                             │
51
+ │                                   │                                     │
52
+ │                                   ▼                                     │
53
+ │                            ┌──────────────┐                             │
54
+ │                            │RunDataBuilder│                             │
55
+ │                            └──────┬───────┘                             │
56
+ │                                   │                                     │
57
+ │                                   ▼                                     │
58
+ │                            ┌──────────────┐                             │
59
+ │                            │   RunData    │                             │
60
+ │                            └──────────────┘                             │
61
+ └─────────────────────────────────────────────────────────────────────────┘
62
+ ```
63
+
64
+ ### 2.2 Parallel Mode
65
+
66
+ ```
67
+ ┌─────────────────────────────────────────────────────────────────────────┐
68
+ │                            Worker Processes                             │
69
+ │                                                                         │
70
+ │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐          │
71
+ │  │    Worker 1     │  │    Worker 2     │  │    Worker 3     │          │
72
+ │  │                 │  │                 │  │                 │          │
73
+ │  │  EventBus ──────┼──┼─ EventBus ──────┼──┼─ EventBus ──────┼──┐       │
74
+ │  │  (local)        │  │  (local)        │  │  (local)        │  │       │
75
+ │  └─────────────────┘  └─────────────────┘  └─────────────────┘  │       │
76
+ │                                                                 │       │
77
+ └─────────────────────────────────────────────────────────────────┼───────┘
78
+
79
+                                         Events forwarded via IPC  │
80
+
81
+ ┌─────────────────────────────────────────────────────────────────────────┐
82
+ │                           Controller Process                            │
83
+ │                                                                         │
84
+ │                    ┌──────────────┐                                     │
85
+ │                    │   EventBus   │  (receives forwarded events)        │
86
+ │                    └──────┬───────┘                                     │
87
+ │                           │                                             │
88
+ │                           ▼                                             │
89
+ │                    ┌──────────────┐                                     │
90
+ │                    │RunDataBuilder│  (same class as single mode)        │
91
+ │                    └──────┬───────┘                                     │
92
+ │                           │                                             │
93
+ │                           ▼                                             │
94
+ │                    ┌──────────────┐                                     │
95
+ │                    │   RunData    │                                     │
96
+ │                    └──────────────┘                                     │
97
+ └─────────────────────────────────────────────────────────────────────────┘
98
+ ```
99
+
100
+ The key design decision: workers forward serialized events via IPC, and the controller reconstructs them. This allows the same `RunDataBuilder` class to work identically in both modes, because it subscribes to whichever EventBus exists in its own process.
101
+
102
+ ---
103
+
104
+ ## 3. Metadata System
105
+
106
+ The `Metadata` class is implemented in `lib/rspec_agents/metadata.rb`. It provides:
107
+
108
+ - Dynamic attribute access (`metadata.field = value`)
109
+ - Hash-style access (`metadata[:key]`)
110
+ - Scoped assignment (`metadata.scope!(:tracing) { |t| t.latency = 123 }`)
111
+ - Nested scopes and deep access via `dig`
112
+
113
+ See the implementation for full API details.
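As a rough illustration of the four access styles listed above, the following sketch shows one way such a class could behave. `MetadataSketch` is a hypothetical stand-in written for this example, not the code in `metadata.rb`; only `scope!`, `[]`, and `dig` are names taken from the list.

```ruby
# Hypothetical sketch of the described behaviour; not the gem's Metadata class.
class MetadataSketch
  def initialize(data = {})
    @data = data
  end

  # Hash-style access: metadata[:key]
  def [](key)
    @data[key.to_sym]
  end

  def []=(key, value)
    @data[key.to_sym] = value
  end

  # Scoped assignment: yields a nested MetadataSketch stored under `name`.
  def scope!(name)
    child = (@data[name.to_sym] ||= MetadataSketch.new)
    yield child if block_given?
    child
  end

  # Deep access through nested scopes (or nested hashes).
  def dig(key, *rest)
    value = self[key]
    return value if rest.empty? || value.nil?

    value.dig(*rest)
  end

  def to_h
    @data.transform_values { |v| v.is_a?(MetadataSketch) ? v.to_h : v }
  end

  # Dynamic attribute access: metadata.field = value / metadata.field
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      self[key.chomp("=")] = args.first
    elsif args.empty?
      self[key]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    key.end_with?("=") || @data.key?(key.to_sym) || super
  end
end

metadata = MetadataSketch.new
metadata.model = "example-model"
metadata.scope!(:tracing) { |t| t.latency = 123 }
latency = metadata.dig(:tracing, :latency)
```

In this sketch a scope is simply another `MetadataSketch` stored under its name, which is what lets `dig` walk through nested scopes.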
114
+
115
+ ---
116
+
117
+ ## 4. Data Model
118
+
119
+ ### 4.1 Entity Hierarchy
120
+
121
+ ```
122
+ RunData
123
+ ├── run_id: String
124
+ ├── started_at: Time
125
+ ├── finished_at: Time?
126
+ ├── seed: Integer
127
+ ├── examples: Hash<String, ExampleData>
128
+ └── summary(): SummaryStats (computed)
129
+
130
+ ExampleData
131
+ ├── id: String (from RSpec example_id)
132
+ ├── file: String
133
+ ├── description: String
134
+ ├── location: String ("file:line")
135
+ ├── status: Symbol (:pending, :running, :passed, :failed)
136
+ ├── started_at: Time
137
+ ├── finished_at: Time?
138
+ ├── duration_ms: Integer?
139
+ ├── exception: ExceptionData?
140
+ ├── conversation: ConversationData?
141
+ ├── evaluations: Array<EvaluationData>
142
+ └── metadata: Metadata
143
+
144
+ ConversationData
145
+ ├── started_at: Time
146
+ ├── ended_at: Time?
147
+ ├── turns: Array<TurnData>
148
+ ├── final_topic: String?
149
+ └── metadata: Metadata
150
+
151
+ TurnData
152
+ ├── number: Integer (1-indexed)
153
+ ├── user_message: MessageData
154
+ ├── agent_response: MessageData?
155
+ ├── tool_calls: Array<ToolCallData>
156
+ ├── topic: String?
157
+ └── metadata: Metadata
158
+
159
+ MessageData
160
+ ├── role: Symbol (:user, :agent)
161
+ ├── content: String
162
+ ├── timestamp: Time
163
+ ├── source: Symbol? (:simulator, :script — user messages only)
164
+ └── metadata: Metadata (includes tracing for agent responses)
165
+
166
+ ToolCallData
167
+ ├── name: String
168
+ ├── arguments: Hash
169
+ ├── result: String?
170
+ ├── error: String?
171
+ ├── timestamp: Time
172
+ └── metadata: Metadata
173
+
174
+ EvaluationData
175
+ ├── name: String
176
+ ├── description: String
177
+ ├── passed: Boolean
178
+ ├── reasoning: String?
179
+ ├── timestamp: Time
180
+ └── metadata: Metadata
181
+
182
+ ExceptionData
183
+ ├── class_name: String
184
+ ├── message: String
185
+ └── backtrace: Array<String> (first 10 lines)
186
+ ```
187
+
188
+ ### 4.2 Design Decisions
189
+
190
+ **Computed summaries**: `RunData#summary` returns statistics (pass/fail counts, total duration) calculated on demand rather than stored. This avoids synchronization issues in parallel mode and keeps the statistics consistent with the examples they are derived from.
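A minimal sketch of the computed-summary idea, assuming simplified stand-ins for `RunData` and `ExampleData` (the `*Sketch` names and the keys of the returned hash are illustrative, not the gem's):

```ruby
# Illustrative stand-in for ExampleData with only the fields needed here.
ExampleSketch = Struct.new(:id, :status, :duration_ms, keyword_init: true)

# Illustrative stand-in for RunData.
class RunDataSketch
  attr_reader :examples

  def initialize
    @examples = {} # example id => ExampleSketch
  end

  # Recomputed from the examples on every call, so there is no stored
  # counter that could drift out of sync with the examples hash.
  def summary
    statuses = @examples.values.map(&:status)
    {
      total: statuses.size,
      passed: statuses.count(:passed),
      failed: statuses.count(:failed),
      pending: statuses.count(:pending),
      duration_ms: @examples.values.sum { |e| e.duration_ms.to_i }
    }
  end
end

run = RunDataSketch.new
run.examples["a"] = ExampleSketch.new(id: "a", status: :passed, duration_ms: 120)
run.examples["b"] = ExampleSketch.new(id: "b", status: :failed, duration_ms: 80)
run.examples["c"] = ExampleSketch.new(id: "c", status: :running)
stats = run.summary
```

Examples that are still running contribute to `total` but not to the pass/fail counts, and a missing duration counts as zero.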
191
+
192
+ **AgentResponse is canonical**: The `AgentResponse` event from the EventBus is the authoritative source for tool calls. Any interim `ToolCallCompleted` events are discarded when `AgentResponse` arrives—no merging occurs.
193
+
194
+ **Metadata at every level**: Each data class has a `Metadata` field. This supports different use cases:
195
+ - `MessageData.metadata`: LLM tracing (tokens, latency, model)
196
+ - `ToolCallData.metadata`: Database records, API timing
197
+ - `ExampleData.metadata`: Custom test metadata
198
+
199
+ ---
200
+
201
+ ## 5. Event Types
202
+
203
+ ### 5.1 Tool Call Events
204
+
205
+ Tool calls are captured via `ToolCallCompleted` events:
206
+
207
+ | Event | Purpose | Key Fields |
208
+ |-------|---------|------------|
209
+ | `ToolCallCompleted` | Records completed tool execution | `tool_name`, `arguments`, `result`, `error`, `metadata` |
210
+
211
+ ### 5.2 Modified Events
212
+
213
+ | Event | Change |
214
+ |-------|--------|
215
+ | `AgentResponse` | Added `metadata` field (Hash) for tracing data |
216
+ | `SuiteStarted` | Added `seed` field for RSpec seed |
217
+
218
+ ### 5.3 Event Flow for a Turn
219
+
220
+ ```
221
+ 1. UserMessage → User says something
222
+ 2. ToolCallCompleted → Tool execution completes (optional, may repeat)
223
+ 3. AgentResponse → Agent responds (canonical source for tool_calls)
224
+ 4. [Next turn or ExamplePassed/ExampleFailed]
225
+ ```
226
+
227
+ The `AgentResponse` event is the authoritative source for tool calls. `ToolCallCompleted` events may be used for real-time display but are replaced by `AgentResponse.tool_calls` in the final data.
228
+
229
+ ---
230
+
231
+ ## 6. RunDataBuilder
232
+
233
+ ### 6.1 Responsibilities
234
+
235
+ The `RunDataBuilder` class:
236
+ - Subscribes to all relevant events on the EventBus
237
+ - Maintains internal state for in-progress turns
238
+ - Builds the `RunData` structure incrementally
239
+ - Is thread-safe (uses a mutex to serialize events that arrive concurrently)
240
+
241
+ ### 6.2 State Management
242
+
243
+ The builder tracks state per example:
244
+
245
+ **ConversationData creation**: Triggered by `ExampleStarted`. Each example gets a `ConversationData` initialized when the example begins.
246
+
247
+ **Turn tracking**: The builder tracks "current turns" per example—turns that have received a `UserMessage` but haven't yet been finalized. A turn is finalized when:
248
+ - A new `UserMessage` arrives (starts next turn)
249
+ - `ExamplePassed` or `ExampleFailed` is received
250
+
251
+ ### 6.3 Tool Call Handling
252
+
253
+ When `AgentResponse` arrives:
254
+ - The `tool_calls` field from the event is used directly
255
+ - Any interim `ToolCallCompleted` events are discarded (no merging)
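The turn lifecycle from sections 6.2 and 6.3 can be sketched as follows. This is a simplified illustration under assumptions: the `TurnTrackerSketch` class, its `on_*` method names, and the hash-based turn shape are invented here and are not the `RunDataBuilder` API.

```ruby
# Hypothetical sketch of per-example turn tracking; not the gem's RunDataBuilder.
class TurnTrackerSketch
  attr_reader :turns # example id => finalized turns

  def initialize
    @mutex = Mutex.new
    @turns = Hash.new { |hash, key| hash[key] = [] }
    @current = {} # example id => in-progress turn
  end

  def on_user_message(example_id, text)
    @mutex.synchronize do
      finalize(example_id) # a new UserMessage closes the previous turn
      @current[example_id] = {
        number: @turns[example_id].size + 1, # 1-indexed
        user_message: text,
        agent_response: nil,
        tool_calls: []
      }
    end
  end

  # Interim tool calls are kept only until AgentResponse arrives.
  def on_tool_call_completed(example_id, tool_call)
    @mutex.synchronize do
      turn = @current[example_id]
      turn[:tool_calls] << tool_call if turn
    end
  end

  def on_agent_response(example_id, text, tool_calls)
    @mutex.synchronize do
      turn = @current[example_id]
      next unless turn

      turn[:agent_response] = text
      turn[:tool_calls] = tool_calls.dup # replaces interim calls, no merging
    end
  end

  # Called for ExamplePassed or ExampleFailed.
  def on_example_finished(example_id)
    @mutex.synchronize { finalize(example_id) }
  end

  private

  def finalize(example_id)
    turn = @current.delete(example_id)
    @turns[example_id] << turn if turn
  end
end

tracker = TurnTrackerSketch.new
tracker.on_user_message("ex1", "Book a room")
tracker.on_tool_call_completed("ex1", { name: "search", interim: true })
tracker.on_agent_response("ex1", "Done", [{ name: "search" }, { name: "book" }])
tracker.on_user_message("ex1", "Thanks")
tracker.on_example_finished("ex1")
```

After this sequence the first turn holds only the two tool calls carried by the agent response; the interim entry is gone.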
256
+
257
+ ---
258
+
259
+ ## 7. Serialization
260
+
261
+ ### 7.1 JSON Format
262
+
263
+ All data classes implement `to_h` and `self.from_h` for JSON serialization:
264
+ - Times serialize as ISO 8601 with milliseconds
265
+ - Symbols serialize as strings
266
+ - Nested structures recurse
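A sketch of the `to_h` / `from_h` round trip for a single class, assuming a trimmed-down stand-in for `MessageData` (the `MessageSketch` name and the string hash keys are illustrative choices; metadata is left out for brevity):

```ruby
require "json"
require "time" # Time#iso8601 and Time.iso8601

# Minimal stand-in for MessageData; not the gem's class.
class MessageSketch
  attr_reader :role, :content, :timestamp

  def initialize(role:, content:, timestamp:)
    @role = role
    @content = content
    @timestamp = timestamp
  end

  def to_h
    {
      "role" => role.to_s,                        # Symbol -> String
      "content" => content,
      "timestamp" => timestamp.getutc.iso8601(3)  # ISO 8601 with milliseconds
    }
  end

  def self.from_h(hash)
    new(
      role: hash.fetch("role").to_sym,
      content: hash.fetch("content"),
      timestamp: Time.iso8601(hash.fetch("timestamp"))
    )
  end
end

message = MessageSketch.new(
  role: :agent,
  content: "Hello",
  timestamp: Time.utc(2026, 1, 27, 12, 0, 0, 500_000)
)
json = JSON.generate(message.to_h)
restored = MessageSketch.from_h(JSON.parse(json))
```

Note that millisecond precision means any sub-millisecond part of a timestamp is lost on the way through JSON.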
267
+
268
+ ### 7.2 File Operations
269
+
270
+ `JsonFile` provides simple read/write:
271
+
272
+ ```ruby
273
+ JsonFile.write("tmp/rspec_agents/run.json", run_data)
274
+ run_data = JsonFile.read("tmp/rspec_agents/run.json")
275
+ ```
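For orientation, a helper with this shape could be as small as the sketch below. `JsonFileSketch` is a hypothetical module written for this example; in particular, it returns the parsed Hash from `read`, whereas the real `JsonFile.read` presumably rebuilds a `RunData` object via `from_h`.

```ruby
require "json"
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for JsonFile; not the gem's implementation.
module JsonFileSketch
  # Writes anything that responds to to_h as pretty-printed JSON,
  # creating the parent directory if needed.
  def self.write(path, data)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, JSON.pretty_generate(data.to_h))
  end

  # Returns the parsed Hash; rebuilding objects is left to the caller.
  def self.read(path)
    JSON.parse(File.read(path))
  end
end

loaded = Dir.mktmpdir do |dir|
  path = File.join(dir, "rspec_agents", "run.json")
  JsonFileSketch.write(path, { run_id: "run-1", seed: 1234, examples: {} })
  JsonFileSketch.read(path)
end
```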
276
+
277
+ ---
278
+
279
+ ## 8. Parallel Execution
280
+
281
+ ### 8.1 Event Forwarding
282
+
283
+ In parallel mode:
284
+ 1. Each worker has its own local EventBus
285
+ 2. Workers serialize events and forward them via IPC (the mechanism depends on the parallel runner)
286
+ 3. Controller receives events and republishes them to its EventBus
287
+ 4. Controller's `RunDataBuilder` processes events identically to single-process mode
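The forwarding steps above can be sketched as an encode/decode pair. Everything here is an assumption made for illustration: the JSON envelope with `type` and `payload` keys, the helper names, and the use of `Struct` as a stand-in for the `Data.define` event classes (chosen so the sketch also runs on Ruby versions before 3.2).

```ruby
require "json"

# Struct stand-ins for two of the event classes; field names are illustrative.
UserMessageSketch = Struct.new(:example_id, :text, keyword_init: true)
AgentResponseSketch = Struct.new(:example_id, :text, :tool_calls, keyword_init: true)

EVENT_TYPES = {
  "UserMessage" => UserMessageSketch,
  "AgentResponse" => AgentResponseSketch
}.freeze

# Worker side: turn an event into one line of JSON for the IPC channel.
def encode_event(event)
  type = EVENT_TYPES.key(event.class)
  JSON.generate("type" => type, "payload" => event.to_h)
end

# Controller side: rebuild the event object before republishing it locally.
def decode_event(line)
  message = JSON.parse(line)
  klass = EVENT_TYPES.fetch(message.fetch("type"))
  klass.new(**message.fetch("payload").transform_keys(&:to_sym))
end

original = AgentResponseSketch.new(
  example_id: "ex1",
  text: "Done",
  tool_calls: [{ "name" => "search" }]
)
decoded = decode_event(encode_event(original))
```

Nested hashes come back with string keys after the JSON round trip, so a real implementation would need to decide where symbol keys are restored.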
288
+
289
+ ### 8.2 Design Rationale
290
+
291
+ By forwarding events rather than partial data structures:
292
+ - Workers only serialize individual events and don't need to know the `RunData` file format
293
+ - Controller uses the same builder class
294
+ - Event ordering is preserved per example (though interleaved across examples)