looped 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,374 @@
# Building a Self-Improving Coding Agent

When you use an LLM-based coding agent, every task generates valuable feedback: what worked, what failed, and why. Most agents throw this data away. **Looped** captures it and uses GEPA to continuously improve your agent's prompts in the background.

This article walks through building a coding agent that gets better the more you use it.

## The Problem

Traditional coding agents have static prompts. You craft instructions once, deploy, and hope for the best. When the agent struggles with certain tasks, you manually tweak prompts based on intuition.

What if the agent could learn from its own performance?

## The Solution: Continuous Prompt Optimization

Looped combines three ideas:

1. **ReAct Agent** - A coding agent with tools (read files, write files, run commands)
2. **LLM-as-Judge** - Every task gets scored and critiqued automatically
3. **GEPA Optimizer** - Runs in the background, improving prompts based on accumulated feedback

```
┌─────────────────────────────────────────────────────────────────┐
│                        You Use The Agent                        │
│  > Fix the failing test in user_spec.rb                         │
│  [agent] Reading file... Found issue... Applied fix.            │
└─────────────────────────────────────────────────────────────────┘

                    ▼  Every task gets judged
┌─────────────────────────────────────────────────────────────────┐
│                          LLM-as-Judge                           │
│  Score: 0.85                                                    │
│  Critique: "Fixed the test but didn't handle edge case X"       │
└─────────────────────────────────────────────────────────────────┘

                    ▼  Results accumulate
┌─────────────────────────────────────────────────────────────────┐
│                  Training Buffer (~/.looped/)                   │
│  [task1, score: 0.9,  feedback: "..."]                          │
│  [task2, score: 0.7,  feedback: "..."]                          │
│  [task3, score: 0.85, feedback: "..."]                          │
└─────────────────────────────────────────────────────────────────┘

                    ▼  Every 60 seconds, GEPA checks
┌─────────────────────────────────────────────────────────────────┐
│                      Background Optimizer                       │
│  "Found 12 results. Running reflection..."                      │
│  "Improvement! 0.82 → 0.89"                                     │
│  → Hot-swaps new instructions into the agent                    │
└─────────────────────────────────────────────────────────────────┘
```

## Step 1: The Simplest Coding Agent

Let's start with a basic ReAct agent that can read and write files:

```ruby
require 'dspy'

class CodingTaskSignature < DSPy::Signature
  description "Complete a coding task."

  input do
    const :task, String
    const :context, String, default: ''
  end

  output do
    const :solution, String
    const :files_modified, T::Array[String]
  end
end

# Define tools
class ReadFileTool < DSPy::Tools::Base
  tool_name 'read_file'
  tool_description 'Read contents of a file'

  sig { params(path: String).returns(String) }
  def call(path:)
    File.read(path)
  rescue => e
    "Error: #{e.message}"
  end
end

class WriteFileTool < DSPy::Tools::Base
  tool_name 'write_file'
  tool_description 'Write content to a file'

  sig { params(path: String, content: String).returns(String) }
  def call(path:, content:)
    File.write(path, content)
    "Wrote #{content.length} bytes to #{path}"
  end
end

# Create the agent
tools = [ReadFileTool.new, WriteFileTool.new]
agent = DSPy::ReAct.new(CodingTaskSignature, tools: tools)

# Use it
result = agent.forward(task: "Add a greeting method to lib/hello.rb")
puts result.solution
```

This works, but the agent never improves. Let's add evaluation.

## Step 2: Add LLM-as-Judge

Every task should be evaluated. We define a Judge signature that scores and critiques:

```ruby
class JudgeSignature < DSPy::Signature
  description "Evaluate code quality and correctness."

  input do
    const :task, String
    const :solution, String
    const :expected_behavior, String
  end

  output do
    const :score, Float                   # 0.0 - 1.0
    const :passed, T::Boolean
    const :critique, String               # "Missing edge case handling for..."
    const :suggestions, T::Array[String]
  end
end

class Judge < DSPy::Predict
  def initialize
    super(JudgeSignature)
  end

  def evaluate(task:, solution:, expected_behavior:)
    call(
      task: task,
      solution: solution,
      expected_behavior: expected_behavior
    )
  end
end
```

The critique is crucial—it becomes the feedback that GEPA uses to improve prompts.

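Later, the agent flattens a judgment into a single feedback string before persisting it (the `Agent` class calls a `to_feedback` helper for this). Here is a minimal, hypothetical sketch of that flattening; the field names mirror `JudgeSignature`, but the exact formatting is an assumption, not Looped's actual implementation:

```ruby
# Hypothetical to_feedback helper: collapse a judgment into the one feedback
# string that gets stored with each training result. Field names mirror
# JudgeSignature; the layout here is illustrative.
def to_feedback(score:, critique:, suggestions: [])
  lines = [format('Score: %.2f', score), "Critique: #{critique}"]
  lines << "Suggestions: #{suggestions.join('; ')}" unless suggestions.empty?
  lines.join("\n")
end
```

For example, `to_feedback(score: 0.7, critique: "Missed nil check", suggestions: ["Add guard clause"])` yields a three-line string the reflection step can read directly.
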
## Step 3: Memory with Context Engineering

As the agent works, it accumulates history. But we don't want to dump raw data into the prompt. We use the **two-struct pattern**: rich storage, lean context.

```ruby
# Rich struct for storage (debugging, analytics)
class MemoryEntry < T::Struct
  const :action_type, String
  const :action_input, T::Hash[String, T.untyped]
  const :action_output, String
  const :timestamp, String
  const :model_id, T.nilable(String)
  const :tokens_used, T.nilable(Integer)
end

# Lean struct for prompts (only what the LLM needs)
class ActionSummary < T::Struct
  const :action, String  # "read_file(path=lib/hello.rb)"
  const :result, String  # First 500 chars of output
end
```

The Memory class handles the transformation:

```ruby
require 'time'  # for Time#iso8601

class Memory
  def initialize(max_entries: 10)
    @entries = []
    @max_entries = max_entries
  end

  def add(action:, input:, output:)
    @entries << MemoryEntry.new(
      action_type: action,
      action_input: input,
      action_output: output,
      timestamp: Time.now.utc.iso8601
    )
  end

  # Shape into lean context for the LLM
  def to_context
    @entries.last(@max_entries).map do |entry|
      ActionSummary.new(
        action: "#{entry.action_type}(#{summarize(entry.action_input)})",
        result: truncate(entry.action_output, 500)
      )
    end
  end

  private

  # Compact "key=value" rendering of the action input hash
  def summarize(input)
    input.map { |key, value| "#{key}=#{value}" }.join(', ')
  end

  # Cap long outputs so they don't blow up the prompt
  def truncate(text, limit)
    text.length > limit ? "#{text[0, limit]}..." : text
  end
end
```

## Step 4: Persist Training Data

Every task result needs to be saved so the optimizer can learn from it:

```ruby
class State
  STORAGE_DIR = File.expand_path('~/.looped')

  def append_training_result(result)
    buffer = load_buffer
    buffer << {
      task: result.task,
      solution: result.solution,
      score: result.score,
      feedback: result.feedback,
      timestamp: Time.now.utc.iso8601
    }
    save_buffer(buffer)
  end

  def consume_training_buffer
    buffer = load_buffer
    archive_buffer(buffer)  # Save to history/
    clear_buffer
    buffer
  end
end
```

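The snippet above elides `load_buffer`, `save_buffer`, and `clear_buffer`. A minimal sketch of what they might look like, assuming a single JSON file under the storage directory (the class name and file layout here are illustrative, not Looped's actual implementation):

```ruby
require 'json'
require 'fileutils'

# Illustrative buffer persistence: one JSON file holding an array of
# training-result hashes. Names and layout are assumptions for this sketch.
class BufferStore
  def initialize(dir)
    @path = File.join(dir, 'training_buffer.json')
    FileUtils.mkdir_p(dir)
  end

  # Returns [] when no buffer has been written yet
  def load_buffer
    File.exist?(@path) ? JSON.parse(File.read(@path)) : []
  end

  def save_buffer(buffer)
    File.write(@path, JSON.pretty_generate(buffer))
  end

  def clear_buffer
    save_buffer([])
  end
end
```

JSON round-trips cleanly here because each result is a flat hash of strings and numbers; anything richer (structs, timestamps) should be serialized explicitly before appending.
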
## Step 5: Background GEPA Optimizer

The magic happens here. A background task monitors the training buffer and runs GEPA when enough data accumulates:

```ruby
class Optimizer
  MIN_BUFFER_SIZE = 10
  POLL_INTERVAL = 60  # seconds

  def run_forever
    loop do
      check_and_optimize
      sleep POLL_INTERVAL
    end
  end

  private

  def check_and_optimize
    buffer = @state.peek_training_buffer
    return if buffer.size < MIN_BUFFER_SIZE

    puts "[optimizer] Found #{buffer.size} results. Running GEPA..."

    # Convert to DSPy examples
    trainset = buffer.map do |result|
      DSPy::Example.new(
        inputs: { task: result[:task] },
        expected: { expected_behavior: result[:feedback] }
      )
    end

    # Run GEPA
    gepa = DSPy::Teleprompt::GEPA.new(
      metric: create_metric,
      config: { max_metric_calls: 50, minibatch_size: 4 }
    )

    result = gepa.compile(@agent, trainset: trainset)

    if result.best_score_value > current_score
      puts "[optimizer] Improvement! #{current_score} → #{result.best_score_value}"
      save_new_instructions(result.optimized_program)
      notify_agent_to_reload
    end
  end
end
```

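`create_metric` is left undefined above. GEPA needs a metric that returns both a score and the textual feedback its reflection step reads. As a self-contained illustration of that shape (a simple keyword-coverage heuristic, not the judge-backed metric Looped actually uses, and the `(score, feedback)` pair is an assumption about the metric contract):

```ruby
# Illustrative feedback metric: score a solution by how much of the expected
# behavior it covers, and return critique text for GEPA's reflection LLM.
# A real metric would delegate to the Judge; this sketch stays self-contained.
FeedbackResult = Struct.new(:score, :feedback)

def create_metric
  lambda do |example, prediction|
    expected = example[:expected_behavior].to_s.downcase.scan(/\w+/).uniq
    solution = prediction[:solution].to_s.downcase
    covered  = expected.count { |word| solution.include?(word) }
    score    = expected.empty? ? 0.0 : covered.fdiv(expected.size)
    FeedbackResult.new(score, "Covered #{covered}/#{expected.size} expected terms")
  end
end
```

The lambda takes an example and a prediction and returns a scored result, matching how it is passed as `metric: create_metric` above.
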
## Step 6: Putting It Together with Async

The final piece: run the agent and optimizer together using Ruby's async gem:

```ruby
require 'async'

class Application
  def run
    Async do |task|
      # Background: Optimizer checks every 60s
      optimizer_task = task.async { @optimizer.run_forever }

      # Foreground: Interactive agent
      puts "[looped] Agent ready. Type a task or 'quit' to exit."

      loop do
        print "\n> "
        input = $stdin.gets&.chomp
        break if input.nil? || input == 'quit'

        result = @agent.forward(task: input)
        puts result.solution
      end

      optimizer_task.stop
    end
  end
end

# Run it
Application.new.run
```

## How GEPA Improves Prompts

GEPA's reflection loop works like this:

1. **Sample failures** from your training buffer
2. **Show them to a reflection LLM** along with the current instruction
3. **Ask for improvements**: "Given these failures, how should we modify the instruction?"
4. **Test the new instruction** on a validation set
5. **Keep it if it's better** (Pareto frontier prevents regression)

The judge's critique is key. Instead of just "score: 0.7", it provides actionable feedback like:

> "The agent fixed the immediate bug but didn't check for nil values in the input. Consider adding defensive checks."

GEPA's reflection LLM sees this and might propose:

> "When fixing bugs, always check for edge cases like nil inputs, empty arrays, and boundary conditions before applying the fix."

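The steps above can be sketched as a single accept-if-better iteration. This is a simplification: real GEPA maintains a Pareto frontier of candidate instructions rather than one, and `reflect`/`validate` below are stand-ins for the reflection LLM call and the validation run:

```ruby
# One reflection iteration, with the LLM call and validation injected as
# lambdas so the control flow is visible. Names here are illustrative.
def reflect_once(instruction, failures, reflect:, validate:)
  critiques = failures.map { |f| "- #{f[:critique]}" }.join("\n")
  candidate = reflect.call(instruction, critiques)  # ask for an improved instruction
  # Keep the candidate only if it beats the current instruction on validation
  validate.call(candidate) > validate.call(instruction) ? candidate : instruction
end
```

With stub lambdas, `reflect_once('v1', failures, reflect: ->(_i, _c) { 'v2' }, validate: ->(i) { scores[i] })` returns `'v2'` only when `'v2'` validates higher, otherwise it keeps `'v1'`.
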
## Usage

```bash
$ looped
[looped] Optimizer started (background)
[looped] Agent ready (gen 1, score 0.75)

> Fix the failing test in spec/user_spec.rb
[agent] Reading spec/user_spec.rb...
[agent] Found assertion failure on line 42...
[agent] Applied fix.

> Add input validation to signup controller
[agent] Reading app/controllers/signup_controller.rb...
...

[optimizer] Found 12 results. Running GEPA...
[optimizer] Improvement! 0.75 → 0.82
[agent] Hot-reloaded instructions (gen 2)

> quit
[looped] Goodbye! State saved to ~/.looped/
```

Over time, the agent learns your codebase patterns, common mistake categories, and effective fix strategies—all automatically.

## Key Takeaways

1. **Capture everything** - Every task generates training data (task, solution, score, critique)

2. **LLM-as-judge provides rich feedback** - Critiques like "missed edge case X" are more useful than bare scores

3. **Background optimization is seamless** - Users don't wait; improvements happen asynchronously

4. **Hot-reload keeps the agent fresh** - New instructions apply immediately without restart

5. **Pareto frontier prevents regression** - GEPA won't accept changes that break previously working cases

## What's Next

- **Sandbox tool execution** - Use Docker for safe command execution
- **Per-tool feedback** - Separate optimization for different tool usage patterns
- **Multi-model routing** - Route complex tasks to stronger models
- **Persistent memory** - Remember context across sessions

The code is available at [github.com/vicentereig/looped](https://github.com/vicentereig/looped).
data/exe/looped ADDED
@@ -0,0 +1,115 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require 'optparse'
require 'looped'

options = {}

parser = OptionParser.new do |opts|
  opts.banner = <<~BANNER
    Looped - Self-improving coding agent powered by DSPy.rb + GEPA

    Usage: looped [options] [task]

    If no task is provided, starts interactive mode.

  BANNER

  opts.separator 'Options:'

  opts.on('-m', '--model MODEL', 'Agent model (default: openai/gpt-4o-mini)') do |model|
    options[:model] = model
  end

  opts.on('-j', '--judge-model MODEL', 'Judge model for evaluation') do |model|
    options[:judge_model] = model
  end

  opts.on('-r', '--reflection-model MODEL', 'GEPA reflection model') do |model|
    options[:reflection_model] = model
  end

  opts.on('-i', '--max-iterations N', Integer, 'Max ReAct iterations (default: 10)') do |n|
    options[:max_iterations] = n
  end

  opts.on('-c', '--context FILE', 'Load context from file') do |file|
    options[:context_file] = file
  end

  opts.on('--no-optimizer', 'Disable background optimizer') do
    options[:no_optimizer] = true
  end

  opts.on('-v', '--version', 'Show version') do
    puts "looped #{Looped::VERSION}"
    exit
  end

  opts.on('-h', '--help', 'Show this help message') do
    puts opts
    exit
  end

  opts.separator ''
  opts.separator 'Environment Variables:'
  opts.separator '  OPENAI_API_KEY      OpenAI API key (required for openai/* models)'
  opts.separator '  ANTHROPIC_API_KEY   Anthropic API key (for anthropic/* models)'
  opts.separator '  GEMINI_API_KEY      Google API key (for gemini/* models)'
  opts.separator '  LOOPED_MODEL        Default agent model'
  opts.separator '  LOOPED_JUDGE_MODEL  Default judge model'
  opts.separator '  LOOPED_STORAGE_DIR  Storage directory (default: ~/.looped)'
  opts.separator ''
  opts.separator 'Examples:'
  opts.separator '  looped                         # Interactive mode'
  opts.separator '  looped "Write a fibonacci function in Ruby"'
  opts.separator '  looped -m openai/gpt-4o "Fix the bug in main.rb"'
  opts.separator '  looped -c project_context.md "Add unit tests"'
  opts.separator ''
end

begin
  parser.parse!
rescue OptionParser::InvalidOption, OptionParser::MissingArgument => e
  puts "Error: #{e.message}"
  puts ''
  puts parser
  exit 1
end

# Load context from file if specified
context = ''
if options[:context_file]
  unless File.exist?(options[:context_file])
    puts "Error: Context file not found: #{options[:context_file]}"
    exit 1
  end
  context = File.read(options[:context_file])
end

# Build run options, filtering out nil values
run_options = {
  model: options[:model],
  judge_model: options[:judge_model],
  reflection_model: options[:reflection_model],
  max_iterations: options[:max_iterations] || 10
}.compact

# If a task is provided as argument, run single task mode
if ARGV.any?
  task = ARGV.join(' ')
  result = Looped.execute(task: task, context: context, **run_options)

  # Judge scores are on a 0.0 - 1.0 scale
  puts "Score: #{result.score.round(2)}/1.0"
  puts ''
  puts 'Solution:'
  puts result.solution
  puts ''
  puts 'Feedback:'
  puts result.feedback
  exit(result.score >= 0.7 ? 0 : 1)
end

# Interactive mode
Looped.run(**run_options)
@@ -0,0 +1,188 @@
# typed: strict
# frozen_string_literal: true

module Looped
  class Agent
    extend T::Sig

    DEFAULT_MODEL = 'openai/gpt-4o-mini'
    DEFAULT_MAX_ITERATIONS = 10

    sig { returns(DSPy::ReAct) }
    attr_reader :react

    sig { returns(Memory) }
    attr_reader :memory

    sig { returns(State) }
    attr_reader :state

    sig { returns(Judge) }
    attr_reader :judge

    sig { returns(T.nilable(String)) }
    attr_reader :instructions_mtime

    sig { params(model: T.nilable(String), max_iterations: Integer, judge_model: T.nilable(String)).void }
    def initialize(model: nil, max_iterations: DEFAULT_MAX_ITERATIONS, judge_model: nil)
      @model_id = T.let(model || ENV.fetch('LOOPED_MODEL', DEFAULT_MODEL), String)
      @max_iterations = T.let(max_iterations, Integer)
      @memory = T.let(Memory.new, Memory)
      @state = T.let(State.new, State)
      @judge = T.let(Judge.new(model: judge_model), Judge)
      @instructions_mtime = T.let(nil, T.nilable(String))

      # Build the ReAct agent with tools
      @react = T.let(build_react_agent, DSPy::ReAct)

      # Load any existing instructions
      maybe_reload_instructions
    end

    sig { params(task: String, context: String).returns(Types::TrainingResult) }
    def run(task:, context: '')
      # Check for instruction hot-reload
      maybe_reload_instructions

      # Clear memory for new task
      @memory.clear

      # Execute the agent
      result = execute_task(task: task, context: context)

      # Judge the result
      judgment = @judge.evaluate(task: task, solution: result[:solution])

      # Create training result
      training_result = Types::TrainingResult.new(
        task: task,
        solution: result[:solution],
        score: judgment.score,
        feedback: @judge.to_feedback(judgment),
        timestamp: Time.now.utc.iso8601
      )

      # Persist for GEPA optimization
      @state.append_training_result(training_result)

      training_result
    end

    sig { void }
    def reload_instructions
      instructions = @state.load_instructions
      return unless instructions

      # Create a new ReAct agent with the updated instructions
      thought_instruction = instructions.thought_generator
      observation_instruction = instructions.observation_processor

      if thought_instruction || observation_instruction
        @react = build_react_agent(
          thought_instruction: thought_instruction,
          observation_instruction: observation_instruction
        )

        # Track mtime for hot-reload detection
        @instructions_mtime = instructions.updated_at
      end
    end

    private

    sig { params(thought_instruction: T.nilable(String), observation_instruction: T.nilable(String)).returns(DSPy::ReAct) }
    def build_react_agent(thought_instruction: nil, observation_instruction: nil)
      # Configure DSPy with our model
      DSPy.configure do |config|
        config.lm = DSPy::LM.new(@model_id, api_key: resolve_api_key(@model_id))
      end

      # Build tools
      tools = [
        Tools::ReadFile.new,
        Tools::WriteFile.new,
        Tools::SearchCode.new,
        Tools::RunCommand.new
      ]

      # Create base ReAct agent
      agent = DSPy::ReAct.new(
        Looped::CodingTaskSignature,
        tools: tools,
        max_iterations: @max_iterations
      )

      # Apply custom instructions if present
      agent = agent.with_instruction(thought_instruction) if thought_instruction

      agent
    end

    sig { params(task: String, context: String).returns(T::Hash[Symbol, T.untyped]) }
    def execute_task(task:, context:)
      # Build history from memory for context
      history = @memory.to_context

      # Run ReAct
      result = @react.forward(
        task: task,
        context: context,
        history: history
      )

      # Record actions in memory
      result.history.each do |entry|
        # Normalize action_input to a hash
        action_input = entry[:action_input]
        action_input = case action_input
                       when Hash then action_input
                       when String then { 'input' => action_input }
                       when NilClass then {}
                       else { 'value' => action_input.to_s }
                       end

        @memory.add(
          action_type: entry[:action] || 'unknown',
          action_input: action_input,
          action_output: entry[:observation]&.to_s || '',
          model_id: @model_id
        )
      end

      {
        solution: result.solution,
        files_modified: result.files_modified,
        iterations: result.iterations,
        tools_used: result.tools_used
      }
    end

    sig { void }
    def maybe_reload_instructions
      instructions = @state.load_instructions
      return unless instructions

      # Reload if mtime changed
      reload_instructions if @instructions_mtime != instructions.updated_at
    end

    sig { params(model_id: String).returns(T.nilable(String)) }
    def resolve_api_key(model_id)
      provider = model_id.split('/').first
      case provider
      when 'openai'
        ENV['OPENAI_API_KEY']
      when 'anthropic'
        ENV['ANTHROPIC_API_KEY']
      when 'gemini', 'google'
        ENV['GEMINI_API_KEY']
      else
        ENV['OPENAI_API_KEY']
      end
    end
  end
end