looped 0.1.0
- checksums.yaml +7 -0
- data/PLAN.md +856 -0
- data/README.md +340 -0
- data/docs/self-improving-coding-agent.md +374 -0
- data/exe/looped +115 -0
- data/lib/looped/agent.rb +188 -0
- data/lib/looped/application.rb +252 -0
- data/lib/looped/judge.rb +90 -0
- data/lib/looped/memory.rb +96 -0
- data/lib/looped/optimizer.rb +267 -0
- data/lib/looped/signatures.rb +40 -0
- data/lib/looped/state.rb +120 -0
- data/lib/looped/tools/read_file.rb +35 -0
- data/lib/looped/tools/run_command.rb +56 -0
- data/lib/looped/tools/search_code.rb +38 -0
- data/lib/looped/tools/write_file.rb +37 -0
- data/lib/looped/types.rb +53 -0
- data/lib/looped/version.rb +6 -0
- data/lib/looped.rb +100 -0
- data/looped.gemspec +47 -0
- metadata +246 -0
data/PLAN.md (added)

# Looped - Self-Improving Coding Agent

## Overview

**looped** is a standalone Ruby gem that provides a **self-improving coding agent**. It:

1. Uses **DSPy.rb ReAct** with coding tools for controlled, auditable actions
2. Implements **ephemeral memory** with context engineering (rich storage, lean prompts)
3. **Continuously evolves prompts** using GEPA, which runs as a background async task
4. Evaluates its results with an **LLM-as-judge** (configurable model)
5. **Persists state to disk** (`~/.looped/`) for cross-session learning
6. Carries **full Sorbet type annotations** throughout

**Dependency**: `dspy-rb` gem (including `dspy-gepa`)

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ Foreground                                                      │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Looped::Agent                                               │ │
│ │ - Loads current best instructions from disk                 │ │
│ │ - Handles coding tasks with DSPy::ReAct + tools             │ │
│ │ - Writes results to training buffer                         │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼ writes
┌─────────────────────────────────────────────────────────────────┐
│ ~/.looped/                                                      │
│ ├── instructions.json    # Current best instructions            │
│ ├── frontier.json        # Pareto frontier state                │
│ ├── training_buffer.json # Recent task results for learning     │
│ └── history/             # Historical training data             │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▲ reads/updates
┌─────────────────────────────────────────────────────────────────┐
│ Background (Async Task)                                         │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Looped::Optimizer                                           │ │
│ │ - Monitors training buffer for new results                  │ │
│ │ - Runs GEPA reflection cycles when buffer has data          │ │
│ │ - Evaluates candidates against validation set               │ │
│ │ - Hot-swaps instructions.json when improvement found        │ │
│ └─────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```

## Gem Structure

```
looped/
├── looped.gemspec
├── Gemfile
├── README.md
├── LICENSE.txt
├── bin/
│   └── looped                  # CLI entry point
├── lib/
│   ├── looped.rb               # Main entry, Looped.start
│   └── looped/
│       ├── version.rb
│       ├── types.rb            # Sorbet T::Struct types
│       ├── signatures.rb       # DSPy Signatures
│       ├── agent.rb            # Looped::Agent (ReAct-based)
│       ├── optimizer.rb        # Looped::Optimizer (GEPA wrapper)
│       ├── state.rb            # Looped::State (file persistence)
│       ├── judge.rb            # Looped::Judge (LLM-as-judge)
│       ├── memory.rb           # Looped::Memory (context engineering)
│       └── tools/
│           ├── base.rb         # Common tool functionality
│           ├── read_file.rb
│           ├── write_file.rb
│           ├── run_command.rb  # Docker-sandboxed
│           └── search_code.rb
└── spec/
    ├── spec_helper.rb
    ├── looped/
    │   ├── agent_spec.rb
    │   ├── optimizer_spec.rb
    │   ├── state_spec.rb
    │   └── memory_spec.rb
    └── integration/
        └── full_loop_spec.rb   # VCR-recorded integration test
```

## Implementation Plan

### Step 1: Sorbet Types (lib/looped/types.rb)

```ruby
# typed: strict
# frozen_string_literal: true

module Looped
  module Types
    extend T::Sig

    # Rich memory entry for storage/analytics
    class MemoryEntry < T::Struct
      const :action_type, String
      const :action_input, T::Hash[String, T.untyped]
      const :action_output, String
      const :timestamp, String
      const :model_id, T.nilable(String)
      const :error, T.nilable(String)
      const :tokens_used, T.nilable(Integer)
    end

    # Lean context entry for prompts
    class ActionSummary < T::Struct
      const :action, String
      const :result, String
    end

    # Training result stored to buffer
    class TrainingResult < T::Struct
      const :task, String
      const :solution, String
      const :score, Float
      const :feedback, String
      const :timestamp, String
    end

    # Persisted instructions with metadata
    class Instructions < T::Struct
      const :thought_generator, T.nilable(String)
      const :observation_processor, T.nilable(String)
      const :score, Float
      const :generation, Integer
      const :updated_at, String
    end

    # Judgment from LLM-as-judge
    class Judgment < T::Struct
      const :score, Float
      const :passed, T::Boolean
      const :critique, String
      const :suggestions, T::Array[String]
    end
  end
end
```

### Step 2: DSPy Signatures (lib/looped/signatures.rb)

```ruby
# typed: strict
# frozen_string_literal: true

module Looped
  # Main coding task signature
  class CodingTaskSignature < DSPy::Signature
    description "Complete a coding task in any programming language."

    input do
      const :task, String
      const :context, String, default: ''
      const :history, T::Array[Types::ActionSummary], default: []
    end

    output do
      const :solution, String
      const :files_modified, T::Array[String]
    end
  end

  # LLM-as-Judge signature
  class JudgeSignature < DSPy::Signature
    description "Evaluate code quality and correctness."

    input do
      const :task, String
      const :solution, String
      const :expected_behavior, String
    end

    output do
      const :score, Float
      const :passed, T::Boolean
      const :critique, String
      const :suggestions, T::Array[String]
    end
  end
end
```

### Step 3: Memory with Context Engineering (lib/looped/memory.rb)

```ruby
# typed: strict
# frozen_string_literal: true

require 'time' # Time#iso8601

module Looped
  class Memory
    extend T::Sig

    DEFAULT_MAX_ENTRIES = 10
    DEFAULT_MAX_RESULT_LENGTH = 500

    sig { params(max_entries: Integer, max_result_length: Integer).void }
    def initialize(max_entries: DEFAULT_MAX_ENTRIES, max_result_length: DEFAULT_MAX_RESULT_LENGTH)
      @entries = T.let([], T::Array[Types::MemoryEntry])
      @max_entries = max_entries
      @max_result_length = max_result_length
    end

    sig { params(action: String, input: T::Hash[String, T.untyped], output: String, model_id: T.nilable(String)).void }
    def add(action:, input:, output:, model_id: nil)
      @entries << Types::MemoryEntry.new(
        action_type: action,
        action_input: input,
        action_output: output,
        timestamp: Time.now.utc.iso8601,
        model_id: model_id,
        error: nil,
        tokens_used: nil
      )
    end

    # Lean view for prompts: only the most recent entries, each clipped
    sig { returns(T::Array[Types::ActionSummary]) }
    def to_context
      @entries.last(@max_entries).map do |entry|
        Types::ActionSummary.new(
          action: summarize_action(entry),
          result: truncate(entry.action_output)
        )
      end
    end

    # Full rich log for storage/analytics (a copy, so callers cannot mutate it)
    sig { returns(T::Array[Types::MemoryEntry]) }
    def entries
      @entries.dup
    end

    sig { void }
    def clear
      @entries.clear
    end

    private

    sig { params(entry: Types::MemoryEntry).returns(String) }
    def summarize_action(entry)
      input_summary = entry.action_input.map { |k, v| "#{k}=#{v.to_s[0..50]}" }.join(', ')
      "#{entry.action_type}(#{input_summary})"
    end

    sig { params(text: String).returns(String) }
    def truncate(text)
      return text if text.length <= @max_result_length

      "#{text[0...@max_result_length]}..."
    end
  end
end
```
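
To see what the prompt actually receives, here is a minimal Sorbet-free sketch that mirrors `Memory`'s compaction rules with plain `Struct`s. `LeanMemory`, `Entry`, and `Summary` are hypothetical stand-ins, not gem classes: only the last `max_entries` steps survive, each input value is clipped to 51 characters by `v.to_s[0..50]`, and each observation is cut at `max_result_length` with a trailing `...`.

```ruby
require 'time'

# Plain-Struct stand-ins for Types::MemoryEntry and Types::ActionSummary
Entry = Struct.new(:action_type, :action_input, :action_output, :timestamp, keyword_init: true)
Summary = Struct.new(:action, :result, keyword_init: true)

# Hypothetical Sorbet-free mirror of Looped::Memory's compaction rules
class LeanMemory
  def initialize(max_entries: 10, max_result_length: 500)
    @entries = []
    @max_entries = max_entries
    @max_result_length = max_result_length
  end

  # Rich storage: every step is kept, with a timestamp
  def add(action:, input:, output:)
    @entries << Entry.new(action_type: action, action_input: input,
                          action_output: output, timestamp: Time.now.utc.iso8601)
  end

  def entries
    @entries.dup
  end

  # Lean context: only the most recent steps, each clipped
  def to_context
    @entries.last(@max_entries).map do |entry|
      inputs = entry.action_input.map { |k, v| "#{k}=#{v.to_s[0..50]}" }.join(', ')
      Summary.new(action: "#{entry.action_type}(#{inputs})", result: clip(entry.action_output))
    end
  end

  private

  def clip(text)
    text.length <= @max_result_length ? text : "#{text[0...@max_result_length]}..."
  end
end

memory = LeanMemory.new(max_entries: 3, max_result_length: 20)
5.times do |i|
  memory.add(action: 'read_file', input: { 'path' => "lib/file_#{i}.rb" }, output: 'x' * 100)
end

context = memory.to_context
# Prints 3 lines (file_2 to file_4); each result is 20 x's followed by "..."
context.each { |s| puts "#{s.action} -> #{s.result}" }
```

All five steps remain available through `entries` for analytics, while only three clipped summaries would reach the `history` input of `CodingTaskSignature`.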

### Step 4: State Persistence (lib/looped/state.rb)

```ruby
# typed: strict
# frozen_string_literal: true

require 'json'
require 'fileutils'
require 'time' # Time#iso8601

module Looped
  class State
    extend T::Sig

    STORAGE_DIR = T.let(File.expand_path('~/.looped'), String)

    sig { void }
    def initialize
      FileUtils.mkdir_p(STORAGE_DIR)
      FileUtils.mkdir_p(File.join(STORAGE_DIR, 'history'))
    end

    sig { returns(T.nilable(Types::Instructions)) }
    def load_instructions
      path = instructions_path
      return nil unless File.exist?(path)

      data = JSON.parse(File.read(path), symbolize_names: true)
      Types::Instructions.new(
        thought_generator: data.dig(:instructions, :thought_generator),
        observation_processor: data.dig(:instructions, :observation_processor),
        # Coerce in case the file holds e.g. `1` instead of `1.0`
        score: (data[:score] || 0.0).to_f,
        generation: (data[:generation] || 0).to_i,
        updated_at: data[:updated_at] || Time.now.utc.iso8601
      )
    end

    sig { params(instructions: T::Hash[Symbol, T.nilable(String)], score: Float, generation: Integer).void }
    def save_instructions(instructions:, score:, generation:)
      data = {
        instructions: instructions,
        score: score,
        generation: generation,
        updated_at: Time.now.utc.iso8601
      }
      File.write(instructions_path, JSON.pretty_generate(data))
    end

    sig { params(result: Types::TrainingResult).void }
    def append_training_result(result)
      buffer = load_training_buffer
      buffer << result.serialize
      File.write(training_buffer_path, JSON.pretty_generate(buffer))
    end

    sig { returns(T::Array[Types::TrainingResult]) }
    def peek_training_buffer
      load_training_buffer.map { |data| deserialize_training_result(data) }
    end

    sig { returns(T::Array[Types::TrainingResult]) }
    def consume_training_buffer
      # Read once so the archived records are exactly the ones returned
      raw = load_training_buffer
      return [] if raw.empty?

      # Archive to history
      archive_path = File.join(STORAGE_DIR, 'history', "#{Time.now.to_i}.json")
      File.write(archive_path, JSON.pretty_generate(raw))

      # Clear buffer
      File.write(training_buffer_path, '[]')

      raw.map { |data| deserialize_training_result(data) }
    end

    private

    sig { returns(String) }
    def instructions_path
      File.join(STORAGE_DIR, 'instructions.json')
    end

    sig { returns(String) }
    def training_buffer_path
      File.join(STORAGE_DIR, 'training_buffer.json')
    end

    sig { returns(T::Array[T::Hash[Symbol, T.untyped]]) }
    def load_training_buffer
      return [] unless File.exist?(training_buffer_path)

      JSON.parse(File.read(training_buffer_path), symbolize_names: true)
    end

    sig { params(data: T::Hash[Symbol, T.untyped]).returns(Types::TrainingResult) }
    def deserialize_training_result(data)
      Types::TrainingResult.new(
        task: data[:task],
        solution: data[:solution],
        score: data[:score].to_f,
        feedback: data[:feedback],
        timestamp: data[:timestamp]
      )
    end
  end
end
```
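
`append_training_result` and `consume_training_buffer` each read, modify, and rewrite `training_buffer.json` without locking. Two `looped` processes sharing `~/.looped/` could therefore drop each other's results, and a crash during `File.write` could leave truncated JSON. The sketch below shows one way to harden this, assuming a sidecar lock file and that the temp file and buffer live on the same filesystem (so `File.rename` is atomic on POSIX). `LockedBuffer`, `append`, and `consume` are hypothetical names, not part of the plan, and `flock` is advisory: it only guards against writers that also take the lock.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Hypothetical hardened buffer: every read-modify-write runs under an
# exclusive flock, and writes go to a temp file that is renamed into place.
class LockedBuffer
  def initialize(path)
    @path = path
    @lock_path = "#{path}.lock"
    FileUtils.mkdir_p(File.dirname(path))
  end

  def append(record)
    with_lock { write_atomically(read + [record]) }
  end

  # Returns every record and empties the buffer in one locked step,
  # so an append from another process cannot slip in between.
  def consume
    with_lock do
      records = read
      write_atomically([]) unless records.empty?
      records
    end
  end

  private

  def with_lock
    File.open(@lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
      lock.flock(File::LOCK_EX)
      yield
    end # closing the lock file releases the lock
  end

  def read
    return [] unless File.exist?(@path)

    JSON.parse(File.read(@path), symbolize_names: true)
  end

  def write_atomically(records)
    tmp = "#{@path}.tmp.#{Process.pid}"
    File.write(tmp, JSON.pretty_generate(records))
    File.rename(tmp, @path) # readers see either the old or the new file, never half of one
  end
end

dir = Dir.mktmpdir
buffer = LockedBuffer.new(File.join(dir, 'training_buffer.json'))
buffer.append({ task: 'fix failing spec', score: 0.9 })
buffer.append({ task: 'add README section', score: 0.6 })

consumed = buffer.consume   # both records; the buffer is now empty
remaining = buffer.consume  # nothing left to consume
FileUtils.rm_rf(dir)
```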

### Step 5: Tools (lib/looped/tools/)

```ruby
# typed: strict
# frozen_string_literal: true

require 'fileutils'
require 'open3'

# lib/looped/tools/read_file.rb
module Looped
  module Tools
    class ReadFile < DSPy::Tools::Base
      extend T::Sig

      tool_name 'read_file'
      tool_description 'Read contents of a file at the given path'

      sig { params(path: String).returns(String) }
      def call(path:)
        File.read(path)
      rescue Errno::ENOENT
        "Error: File not found: #{path}"
      rescue Errno::EACCES
        "Error: Permission denied: #{path}"
      rescue => e
        "Error: #{e.message}"
      end
    end

    # lib/looped/tools/write_file.rb
    class WriteFile < DSPy::Tools::Base
      extend T::Sig

      tool_name 'write_file'
      tool_description 'Write content to a file at the given path'

      sig { params(path: String, content: String).returns(String) }
      def call(path:, content:)
        FileUtils.mkdir_p(File.dirname(path))
        File.write(path, content)
        "Successfully wrote #{content.bytesize} bytes to #{path}"
      rescue => e
        "Error: #{e.message}"
      end
    end

    # lib/looped/tools/search_code.rb
    class SearchCode < DSPy::Tools::Base
      extend T::Sig

      tool_name 'search_code'
      tool_description 'Search for a pattern in code files using ripgrep'

      sig { params(pattern: String, path: String, file_type: T.nilable(String)).returns(String) }
      def call(pattern:, path: '.', file_type: nil)
        cmd = ['rg', '--line-number', '--no-heading']
        cmd += ['--type', file_type] if file_type
        # -e keeps patterns that start with '-' from being parsed as flags
        cmd += ['-e', pattern, path]

        output, error, status = Open3.capture3(*cmd)
        # ripgrep exits 0 on matches, 1 when nothing matched, 2 on errors
        case status.exitstatus
        when 0 then output
        when 1 then "No matches found for: #{pattern}"
        else "Error: #{error.strip}"
        end
      rescue => e
        "Error: #{e.message}"
      end
    end

    # lib/looped/tools/run_command.rb
    class RunCommand < DSPy::Tools::Base
      extend T::Sig

      DEFAULT_TIMEOUT = 30

      tool_name 'run_command'
      tool_description 'Execute a shell command in a Docker sandbox and return output'

      sig { params(command: String, timeout: Integer).returns(String) }
      def call(command:, timeout: DEFAULT_TIMEOUT)
        # TODO: Implement Docker sandbox via trusted-sandbox gem
        # For now, run on the host in a new process group with a timeout.
        # Timeout.timeout around Open3.capture2e would raise but leave the
        # child running, so the whole group is killed explicitly instead.
        Open3.popen2e(command, pgroup: true) do |stdin, output, waiter|
          stdin.close
          reader = Thread.new { output.read } # drain the pipe so the child never blocks on a full buffer

          unless waiter.join(timeout)
            Process.kill('KILL', -waiter.pid)
            reader.join
            return "Error: Command timed out after #{timeout} seconds"
          end

          "Exit code: #{waiter.value.exitstatus}\n#{reader.value}"
        end
      rescue => e
        "Error: #{e.message}"
      end
    end
  end
end
```
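
`ReadFile` and `WriteFile` accept any path, including `~/.looped/instructions.json` itself, so the agent could overwrite its own evolving prompt. If the file tools should stay inside a project directory, one option is to resolve every path against a workspace root before touching the filesystem. This is a sketch under stated assumptions: `Workspace` and `resolve` are hypothetical names, the root would be passed in when the tools are built, symlinks are not followed (a link inside the root can still point outside it), and it does nothing for `run_command`, which is what the planned sandbox is for.

```ruby
# Hypothetical guard that keeps file-tool paths inside a workspace root.
class Workspace
  class OutsideRootError < StandardError; end

  attr_reader :root

  def initialize(root)
    @root = File.expand_path(root)
  end

  # Expands `path` relative to the root and rejects anything that escapes it,
  # such as "../secrets", "~/.looped/...", or an absolute path elsewhere.
  def resolve(path)
    full = File.expand_path(path, @root)
    inside = full == @root || full.start_with?(@root + File::SEPARATOR)
    raise OutsideRootError, "Path escapes workspace: #{path}" unless inside

    full
  end
end

workspace = Workspace.new('/srv/project')
puts workspace.resolve('lib/looped.rb') # => /srv/project/lib/looped.rb

begin
  workspace.resolve('../../etc/passwd')
rescue Workspace::OutsideRootError => e
  puts e.message # => Path escapes workspace: ../../etc/passwd
end
```

A tool's `call` would then read or write `workspace.resolve(path)` and return the error message as a string, matching how the tools above already report failures.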

### Step 6: Judge (lib/looped/judge.rb)

```ruby
# typed: strict
# frozen_string_literal: true

module Looped
  class Judge < DSPy::Predict
    extend T::Sig

    sig { void }
    def initialize
      super(JudgeSignature)
    end

    sig { params(task: String, solution: String, expected_behavior: String).returns(Types::Judgment) }
    def evaluate(task:, solution:, expected_behavior:)
      result = call(
        task: task,
        solution: solution,
        expected_behavior: expected_behavior
      )

      Types::Judgment.new(
        score: result.score,
        passed: result.passed,
        critique: result.critique,
        suggestions: result.suggestions
      )
    end
  end
end
```
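
The plan's sample output shows judge scores such as 0.82 and 0.9, but `JudgeSignature` only declares `score` as a `Float`, so nothing stops the judge model from returning 8.5 or a negative number. Such a value would skew GEPA's search and the optimizer's `best_score_value > current_score` check. Assuming a 0.0 to 1.0 scale is intended, one defensive option is to normalize the score inside `Judge#evaluate` before building the `Types::Judgment`; `normalize_score` below is a hypothetical helper, not part of the plan.

```ruby
# Hypothetical helper: coerce a raw judge score into a finite Float in
# [0.0, 1.0], treating missing or non-numeric values as 0.0.
def normalize_score(raw)
  return 0.0 if raw.nil?

  score = Float(raw, exception: false) # nil instead of raising on bad input
  return 0.0 if score.nil? || score.nan? # NaN cannot be clamped

  score.clamp(0.0, 1.0)
end

puts normalize_score(0.82)  # => 0.82
puts normalize_score(1.7)   # => 1.0
puts normalize_score('n/a') # => 0.0
```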

### Step 7: Agent (lib/looped/agent.rb)

```ruby
# typed: strict
# frozen_string_literal: true

require 'time' # Time#iso8601

module Looped
  class Agent < DSPy::Module
    extend T::Sig

    # Lifecycle callbacks wrapped around #forward
    around :update_memory
    around :record_for_training

    sig { params(tools: T::Array[DSPy::Tools::Base], state: State, max_context_entries: Integer).void }
    def initialize(tools:, state:, max_context_entries: 10)
      super()
      @state = state
      @memory = T.let(Memory.new(max_entries: max_context_entries), Memory)
      @react = T.let(
        DSPy::ReAct.new(CodingTaskSignature, tools: tools, max_iterations: 15),
        DSPy::ReAct
      )
      @judge = T.let(Judge.new, Judge)
      @current_task = T.let(nil, T.nilable(String))

      reload_instructions
    end

    sig { void }
    def reload_instructions
      instructions = @state.load_instructions
      return unless instructions

      apply_instructions(instructions)
      puts "[agent] Loaded instructions (gen #{instructions.generation}, score #{instructions.score.round(2)})"
    end

    sig { params(instructions: Types::Instructions).void }
    def apply_instructions(instructions)
      thought = instructions.thought_generator
      return unless thought

      # Reassign: with_instruction returns an updated instance and may not
      # change @react in place
      @react = @react.with_instruction(thought)
    end

    sig { returns(T::Hash[Symbol, T.nilable(String)]) }
    def extract_instructions
      {
        thought_generator: @react.named_predictors['thought_generator']&.instruction,
        observation_processor: @react.named_predictors['observation_processor']&.instruction
      }
    end

    sig { params(task: String, context: String).returns(T.untyped) }
    def forward(task:, context: '')
      @current_task = task
      history = @memory.to_context

      @react.forward(
        task: task,
        context: context,
        history: history
      )
    end

    private

    sig { params(_args: T.untyped, kwargs: T.untyped, block: T.proc.returns(T.untyped)).returns(T.untyped) }
    def update_memory(_args, kwargs, &block)
      result = yield

      # Store every ReAct step in rich memory; #to_context trims it for the next prompt.
      # to_s guards against nil or non-String actions and observations.
      result.history&.each do |step|
        @memory.add(
          action: step[:action].to_s,
          input: step[:action_input] || {},
          output: step[:observation].to_s
        )
      end

      result
    end

    sig { params(_args: T.untyped, kwargs: T.untyped, block: T.proc.returns(T.untyped)).returns(T.untyped) }
    def record_for_training(_args, kwargs, &block)
      result = yield
      task = @current_task

      return result unless task

      # Score the run with the judge and queue it for the background optimizer
      judgment = @judge.evaluate(
        task: task,
        solution: result.solution || '',
        expected_behavior: "Task completed successfully"
      )

      training_result = Types::TrainingResult.new(
        task: task,
        solution: result.solution || '',
        score: judgment.score,
        feedback: judgment.critique,
        timestamp: Time.now.utc.iso8601
      )

      @state.append_training_result(training_result)

      result
    end
  end
end
```

### Step 8: Optimizer (lib/looped/optimizer.rb)

```ruby
# typed: strict
# frozen_string_literal: true

module Looped
  class Optimizer
    extend T::Sig

    MIN_BUFFER_SIZE = 10
    POLL_INTERVAL = 60

    sig do
      params(
        state: State,
        agent_builder: T.proc.returns(Agent),
        judge_lm: DSPy::LM,
        on_improvement: T.nilable(T.proc.void)
      ).void
    end
    def initialize(state:, agent_builder:, judge_lm:, on_improvement: nil)
      @state = state
      @agent_builder = agent_builder
      @judge_lm = judge_lm
      @reflection_lm = T.let(DSPy::ReflectionLM.new('openai/gpt-4o-mini'), DSPy::ReflectionLM)
      @on_improvement = on_improvement
    end

    sig { void }
    def run_forever
      loop do
        begin
          check_and_optimize
        rescue => e
          puts "[optimizer] Error: #{e.message}"
        end

        sleep POLL_INTERVAL
      end
    end

    private

    sig { void }
    def check_and_optimize
      buffer = @state.peek_training_buffer
      return if buffer.size < MIN_BUFFER_SIZE

      puts "[optimizer] Found #{buffer.size} results. Running GEPA..."

      buffer = @state.consume_training_buffer

      trainset = buffer.map do |result|
        DSPy::Example.new(
          inputs: { task: result.task },
          expected: { expected_behavior: result.feedback }
        )
      end

      # Every fifth example goes to validation, the rest to training
      train, val = trainset.partition.with_index { |_, i| i % 5 != 0 }

      agent = @agent_builder.call
      current = @state.load_instructions

      if current
        agent.apply_instructions(current)
      end

      gepa = DSPy::Teleprompt::GEPA.new(
        metric: create_metric,
        reflection_lm: @reflection_lm,
        config: { max_metric_calls: 50, minibatch_size: 4 }
      )

      result = gepa.compile(agent, trainset: train, valset: val)
      current_score = current&.score || 0.0

      if result.best_score_value > current_score
        puts "[optimizer] Improvement! #{current_score.round(2)} → #{result.best_score_value.round(2)}"

        @state.save_instructions(
          instructions: result.optimized_program.extract_instructions,
          score: result.best_score_value,
          generation: (current&.generation || 0) + 1
        )

        @on_improvement&.call
      else
        puts "[optimizer] No improvement. Score: #{result.best_score_value.round(2)}"
      end
    end

    sig { returns(T.proc.params(example: DSPy::Example, prediction: T.untyped).returns(DSPy::Prediction)) }
    def create_metric
      judge = Judge.new
      judge.configure { |c| c.lm = @judge_lm }

      lambda do |example, prediction|
        judgment = judge.evaluate(
          task: example.input_values[:task],
          solution: prediction.solution || '',
          expected_behavior: example.expected_values[:expected_behavior]
        )

        DSPy::Prediction.new(score: judgment.score, feedback: judgment.critique)
      end
    end
  end
end
```
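
The `partition.with_index` line sends every fifth example (indices 0, 5, 10, ...) to the validation set and the rest to training. A quick plain-Ruby check of what that yields for a buffer of exactly `MIN_BUFFER_SIZE` results, with integers standing in for the `DSPy::Example` objects:

```ruby
# Same split expression as check_and_optimize; integers stand in for examples.
MIN_BUFFER_SIZE = 10

examples = (0...MIN_BUFFER_SIZE).to_a
train, val = examples.partition.with_index { |_, i| i % 5 != 0 }

puts "train: #{train.inspect}" # => train: [1, 2, 3, 4, 6, 7, 8, 9]
puts "val:   #{val.inspect}"   # => val:   [0, 5]
```

At the minimum buffer size GEPA therefore validates on two examples, and because each cycle consumes a fresh buffer, `best_score_value` and the stored `current.score` are measured on different validation sets. If hot-swaps turn out to be noisy, a larger `MIN_BUFFER_SIZE` or re-scoring the current instructions on the new validation set before comparing are possible adjustments.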

### Step 9: Main Entry Point (lib/looped.rb)

```ruby
# typed: strict
# frozen_string_literal: true

require 'async'
require 'dspy'
require 'dspy/gepa'

require_relative 'looped/version'
require_relative 'looped/types'
require_relative 'looped/signatures'
require_relative 'looped/memory'
require_relative 'looped/state'
require_relative 'looped/tools/read_file'
require_relative 'looped/tools/write_file'
require_relative 'looped/tools/search_code'
require_relative 'looped/tools/run_command'
require_relative 'looped/judge'
require_relative 'looped/agent'
require_relative 'looped/optimizer'

module Looped
  extend T::Sig

  class << self
    extend T::Sig

    sig { params(judge_model: T.nilable(String), agent_model: T.nilable(String)).void }
    def start(judge_model: nil, agent_model: nil)
      app = Application.new(judge_model: judge_model, agent_model: agent_model)
      app.run
    end
  end

  class Application
    extend T::Sig

    sig { params(judge_model: T.nilable(String), agent_model: T.nilable(String)).void }
    def initialize(judge_model: nil, agent_model: nil)
      @state = T.let(State.new, State)
      @judge_lm = T.let(
        DSPy::LM.new(judge_model || ENV['LOOPED_JUDGE_MODEL'] || 'openai/gpt-4o'),
        DSPy::LM
      )
      @agent_lm = T.let(
        DSPy::LM.new(agent_model || ENV['LOOPED_AGENT_MODEL'] || 'openai/gpt-4o-mini'),
        DSPy::LM
      )

      @agent = T.let(build_agent, Agent)
      @optimizer = T.let(
        Optimizer.new(
          state: @state,
          agent_builder: -> { build_agent },
          judge_lm: @judge_lm,
          on_improvement: -> { @agent.reload_instructions }
        ),
        Optimizer
      )
    end

    sig { void }
    def run
      Async do |task|
        optimizer_task = task.async { @optimizer.run_forever }

        puts "[looped] Optimizer started (background)"
        puts "[looped] Agent ready. Type a task or 'quit' to exit."

        loop do
          print "\n> "
          input = $stdin.gets&.chomp
          break if input.nil? || input == 'quit'
          next if input.empty?

          begin
            result = @agent.forward(task: input)
            puts "\n#{result.solution}"
          rescue => e
            puts "[error] #{e.message}"
          end
        end

        optimizer_task.stop
        puts "\n[looped] Goodbye! State saved to ~/.looped/"
      end
    end

    private

    sig { returns(Agent) }
    def build_agent
      tools = T.let([
        Tools::ReadFile.new,
        Tools::WriteFile.new,
        Tools::RunCommand.new,
        Tools::SearchCode.new
      ], T::Array[DSPy::Tools::Base])

      agent = Agent.new(tools: tools, state: @state)
      agent.configure { |c| c.lm = @agent_lm }
      agent
    end
  end
end
```

## Usage

```bash
# Single command - the optimizer runs as an async background task
looped
```

An interactive session, with GEPA learning in the background:

```
[agent] Loaded instructions (gen 3, score 0.82)
[looped] Optimizer started (background)
[looped] Agent ready. Type a task or 'quit' to exit.

> Fix the failing test in spec/user_spec.rb
[agent] Reading spec/user_spec.rb...
[agent] Applied fix. Judge score: 0.9

> quit

[looped] Goodbye! State saved to ~/.looped/
```

## Design Decisions

1. **Standalone gem**: `looped` depends on `dspy-rb` and `dspy-gepa`
2. **Sorbet types**: Full `T::Struct` types for all data structures
3. **Sandbox**: Docker containers via the `trusted-sandbox` gem for RunCommand (planned; the Step 5 version still runs commands on the host, guarded only by a timeout)
4. **Training data**: Real usage; the agent learns from your actual coding tasks
5. **Judge model**: Configurable via the `LOOPED_JUDGE_MODEL` env var (the agent model likewise via `LOOPED_AGENT_MODEL`)
6. **GEPA trigger**: A background async task continuously monitors the training buffer and optimizes
7. **Persistence**: File-based, in `~/.looped/`
8. **Concurrency**: The `async` gem; a single command runs both the agent and the optimizer

## Testing Strategy

1. **Unit tests** for Memory, State, Tools (isolated, fast)
2. **Integration tests** with VCR for Judge and Agent
3. **Smoke test** for the GEPA optimization loop
4. **TDD approach**: Write failing tests first, then implement

## Documentation

- `docs/self-improving-coding-agent.md` - Tutorial article explaining the architecture step-by-step