looped 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/PLAN.md +856 -0
- data/README.md +340 -0
- data/docs/self-improving-coding-agent.md +374 -0
- data/exe/looped +115 -0
- data/lib/looped/agent.rb +188 -0
- data/lib/looped/application.rb +252 -0
- data/lib/looped/judge.rb +90 -0
- data/lib/looped/memory.rb +96 -0
- data/lib/looped/optimizer.rb +267 -0
- data/lib/looped/signatures.rb +40 -0
- data/lib/looped/state.rb +120 -0
- data/lib/looped/tools/read_file.rb +35 -0
- data/lib/looped/tools/run_command.rb +56 -0
- data/lib/looped/tools/search_code.rb +38 -0
- data/lib/looped/tools/write_file.rb +37 -0
- data/lib/looped/types.rb +53 -0
- data/lib/looped/version.rb +6 -0
- data/lib/looped.rb +100 -0
- data/looped.gemspec +47 -0
- metadata +246 -0
data/docs/self-improving-coding-agent.md
ADDED

# Building a Self-Improving Coding Agent

When you use an LLM-based coding agent, every task generates valuable feedback: what worked, what failed, and why. Most agents throw this data away. **Looped** captures it and uses GEPA to continuously improve your agent's prompts in the background.

This article walks through building a coding agent that gets better the more you use it.

## The Problem

Traditional coding agents have static prompts. You craft instructions once, deploy, and hope for the best. When the agent struggles with certain tasks, you manually tweak prompts based on intuition.

What if the agent could learn from its own performance?

## The Solution: Continuous Prompt Optimization

Looped combines three ideas:

1. **ReAct Agent** - A coding agent with tools (read files, write files, run commands)
2. **LLM-as-Judge** - Every task gets scored and critiqued automatically
3. **GEPA Optimizer** - Runs in the background, improving prompts based on accumulated feedback

```
┌─────────────────────────────────────────────────────────────────┐
│                        You Use The Agent                        │
│  > Fix the failing test in user_spec.rb                         │
│  [agent] Reading file... Found issue... Applied fix.            │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Every task gets judged
┌─────────────────────────────────────────────────────────────────┐
│                          LLM-as-Judge                           │
│  Score: 0.85                                                    │
│  Critique: "Fixed the test but didn't handle edge case X"       │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Results accumulate
┌─────────────────────────────────────────────────────────────────┐
│                  Training Buffer (~/.looped/)                   │
│  [task1, score: 0.9, feedback: "..."]                           │
│  [task2, score: 0.7, feedback: "..."]                           │
│  [task3, score: 0.85, feedback: "..."]                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼ Every 60 seconds, GEPA checks
┌─────────────────────────────────────────────────────────────────┐
│                      Background Optimizer                       │
│  "Found 12 results. Running reflection..."                      │
│  "Improvement! 0.82 → 0.89"                                     │
│  → Hot-swaps new instructions into the agent                    │
└─────────────────────────────────────────────────────────────────┘
```

## Step 1: The Simplest Coding Agent

Let's start with a basic ReAct agent that can read and write files:

```ruby
require 'dspy'

class CodingTaskSignature < DSPy::Signature
  description "Complete a coding task."

  input do
    const :task, String
    const :context, String, default: ''
  end

  output do
    const :solution, String
    const :files_modified, T::Array[String]
  end
end

# Define tools
class ReadFileTool < DSPy::Tools::Base
  tool_name 'read_file'
  tool_description 'Read contents of a file'

  sig { params(path: String).returns(String) }
  def call(path:)
    File.read(path)
  rescue => e
    "Error: #{e.message}"
  end
end

class WriteFileTool < DSPy::Tools::Base
  tool_name 'write_file'
  tool_description 'Write content to a file'

  sig { params(path: String, content: String).returns(String) }
  def call(path:, content:)
    File.write(path, content)
    "Wrote #{content.length} bytes to #{path}"
  end
end

# Create the agent
tools = [ReadFileTool.new, WriteFileTool.new]
agent = DSPy::ReAct.new(CodingTaskSignature, tools: tools)

# Use it
result = agent.forward(task: "Add a greeting method to lib/hello.rb")
puts result.solution
```

This works, but the agent never improves. Let's add evaluation.

## Step 2: Add LLM-as-Judge

Every task should be evaluated. We define a Judge signature that scores and critiques:

```ruby
class JudgeSignature < DSPy::Signature
  description "Evaluate code quality and correctness."

  input do
    const :task, String
    const :solution, String
    const :expected_behavior, String
  end

  output do
    const :score, Float # 0.0 - 1.0
    const :passed, T::Boolean
    const :critique, String # "Missing edge case handling for..."
    const :suggestions, T::Array[String]
  end
end

class Judge < DSPy::Predict
  def initialize
    super(JudgeSignature)
  end

  def evaluate(task:, solution:, expected_behavior:)
    call(
      task: task,
      solution: solution,
      expected_behavior: expected_behavior
    )
  end
end
```

The critique is crucial—it becomes the feedback that GEPA uses to improve prompts.
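Later code in the gem (`agent.rb`) flattens a judgment into a single feedback string via a `to_feedback` helper before storing it. The helper's body is not shown there; a minimal sketch, assuming it simply concatenates the score, critique, and suggestions:

```ruby
# Hypothetical sketch of to_feedback: flatten a judgment into one
# feedback string for the training buffer. The fields mirror
# JudgeSignature's outputs; this body is illustrative, not the
# gem's implementation.
Judgment = Struct.new(:score, :passed, :critique, :suggestions, keyword_init: true)

def to_feedback(judgment)
  lines = ["Score: #{judgment.score.round(2)} (#{judgment.passed ? 'passed' : 'failed'})"]
  lines << judgment.critique
  lines << "Suggestions: #{judgment.suggestions.join('; ')}" unless judgment.suggestions.empty?
  lines.join("\n")
end
```

A string like this gives GEPA's reflection step far more to work with than a bare number.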

## Step 3: Memory with Context Engineering

As the agent works, it accumulates history. But we don't want to dump raw data into the prompt. We use the **two-struct pattern**: rich storage, lean context.

```ruby
# Rich struct for storage (debugging, analytics)
class MemoryEntry < T::Struct
  const :action_type, String
  const :action_input, T::Hash[String, T.untyped]
  const :action_output, String
  const :timestamp, String
  const :model_id, T.nilable(String)
  const :tokens_used, T.nilable(Integer)
end

# Lean struct for prompts (only what the LLM needs)
class ActionSummary < T::Struct
  const :action, String # "read_file(path=lib/hello.rb)"
  const :result, String # First 500 chars of output
end
```

The Memory class handles the transformation:

```ruby
require 'time' # for Time#iso8601

class Memory
  def initialize(max_entries: 10)
    @entries = []
    @max_entries = max_entries
  end

  def add(action:, input:, output:)
    @entries << MemoryEntry.new(
      action_type: action,
      action_input: input,
      action_output: output,
      timestamp: Time.now.utc.iso8601
    )
  end

  # Shape into lean context for the LLM
  def to_context
    @entries.last(@max_entries).map do |entry|
      ActionSummary.new(
        action: "#{entry.action_type}(#{summarize(entry.action_input)})",
        result: truncate(entry.action_output, 500)
      )
    end
  end

  private

  # Render an input hash as "key=value" pairs
  def summarize(input)
    input.map { |k, v| "#{k}=#{v}" }.join(', ')
  end

  def truncate(text, limit)
    text.length > limit ? "#{text[0, limit]}..." : text
  end
end
```

## Step 4: Persist Training Data

Every task result needs to be saved so the optimizer can learn from it:

```ruby
class State
  STORAGE_DIR = File.expand_path('~/.looped')

  def append_training_result(result)
    buffer = load_buffer
    buffer << {
      task: result.task,
      solution: result.solution,
      score: result.score,
      feedback: result.feedback,
      timestamp: Time.now.utc.iso8601
    }
    save_buffer(buffer)
  end

  def consume_training_buffer
    buffer = load_buffer
    archive_buffer(buffer) # Save to history/
    clear_buffer
    buffer
  end
end
```
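The `load_buffer`/`save_buffer` helpers are elided above. One plausible backing, assuming a single JSON file in the storage directory (the gem's actual `state.rb` may differ; the `dir:` parameter is an illustration convenience where the real class hardcodes `STORAGE_DIR`):

```ruby
require 'json'
require 'fileutils'

# Hypothetical JSON-file-backed buffer store, sketched for illustration;
# not the gem's State implementation.
class BufferStore
  def initialize(dir:)
    @path = File.join(dir, 'training_buffer.json')
    FileUtils.mkdir_p(dir)
  end

  def load
    File.exist?(@path) ? JSON.parse(File.read(@path), symbolize_names: true) : []
  end

  def save(buffer)
    File.write(@path, JSON.pretty_generate(buffer))
  end

  def append(entry)
    buffer = load
    buffer << entry
    save(buffer)
  end
end
```

A flat JSON array keeps the buffer trivially inspectable with `cat` or `jq`, which matters when you're debugging why the optimizer did (or didn't) fire.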

## Step 5: Background GEPA Optimizer

The magic happens here. A background task monitors the training buffer and runs GEPA when enough data accumulates:

```ruby
class Optimizer
  MIN_BUFFER_SIZE = 10
  POLL_INTERVAL = 60

  def run_forever
    loop do
      check_and_optimize
      sleep POLL_INTERVAL
    end
  end

  private

  def check_and_optimize
    buffer = @state.peek_training_buffer
    return if buffer.size < MIN_BUFFER_SIZE

    puts "[optimizer] Found #{buffer.size} results. Running GEPA..."

    # Convert to DSPy examples
    trainset = buffer.map do |result|
      DSPy::Example.new(
        inputs: { task: result[:task] },
        expected: { expected_behavior: result[:feedback] }
      )
    end

    # Run GEPA
    gepa = DSPy::Teleprompt::GEPA.new(
      metric: create_metric,
      config: { max_metric_calls: 50, minibatch_size: 4 }
    )

    result = gepa.compile(@agent, trainset: trainset)

    if result.best_score_value > current_score
      puts "[optimizer] Improvement! #{current_score} → #{result.best_score_value}"
      save_new_instructions(result.optimized_program)
      notify_agent_to_reload
    end
  end
end
```
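`create_metric` is elided above. GEPA needs a callable that scores each prediction and, ideally, surfaces textual feedback for reflection. A simplified stand-in that reuses the judge (the `MetricResult` struct and the exact callable shape are assumptions; dspy.rb's GEPA defines its own metric interface):

```ruby
# Hypothetical metric factory: wrap the judge so the optimizer gets both
# a numeric score and the critique as reflection feedback. The struct
# shape is illustrative, not dspy.rb's result type.
MetricResult = Struct.new(:score, :feedback, keyword_init: true)

def create_metric(judge)
  lambda do |example, prediction|
    judgment = judge.evaluate(
      task: example.inputs[:task],
      solution: prediction.solution,
      expected_behavior: example.expected[:expected_behavior]
    )
    MetricResult.new(score: judgment.score, feedback: judgment.critique)
  end
end
```

Routing the critique through the metric is what closes the loop: the same text the judge wrote at task time becomes the evidence GEPA reflects over.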

## Step 6: Putting It Together with Async

The final piece: run the agent and optimizer together using Ruby's async gem:

```ruby
require 'async'

class Application
  def run
    Async do |task|
      # Background: Optimizer checks every 60s
      optimizer_task = task.async { @optimizer.run_forever }

      # Foreground: Interactive agent
      puts "[looped] Agent ready. Type a task or 'quit' to exit."

      loop do
        print "\n> "
        input = $stdin.gets&.chomp
        break if input.nil? || input == 'quit'

        result = @agent.forward(task: input)
        puts result.solution
      end

      optimizer_task.stop
    end
  end
end

# Run it
Looped.start
```

## How GEPA Improves Prompts

GEPA's reflection loop works like this:

1. **Sample failures** from your training buffer
2. **Show them to a reflection LLM** along with the current instruction
3. **Ask for improvements**: "Given these failures, how should we modify the instruction?"
4. **Test the new instruction** on a validation set
5. **Keep it if it's better** (Pareto frontier prevents regression)
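The steps above can be sketched as a single round (purely illustrative; GEPA's actual search maintains a Pareto frontier of candidate programs rather than one incumbent instruction):

```ruby
# Illustrative single reflection round, not GEPA's internals: sample the
# worst-scoring results, ask a reflection LM for a rewritten instruction,
# and keep it only if it validates better than the current one.
def reflection_round(instruction, buffer, reflect:, validate:)
  failures = buffer.sort_by { |r| r[:score] }.first(3)
  candidate = reflect.call(instruction, failures)
  validate.call(candidate) > validate.call(instruction) ? candidate : instruction
end
```

The `reflect` and `validate` callables stand in for the reflection LLM and the validation-set evaluation, respectively.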

The judge's critique is key. Instead of just "score: 0.7", it provides actionable feedback like:

> "The agent fixed the immediate bug but didn't check for nil values in the input. Consider adding defensive checks."

GEPA's reflection LLM sees this and might propose:

> "When fixing bugs, always check for edge cases like nil inputs, empty arrays, and boundary conditions before applying the fix."

## Usage

```bash
$ looped
[looped] Optimizer started (background)
[looped] Agent ready (gen 1, score 0.75)

> Fix the failing test in spec/user_spec.rb
[agent] Reading spec/user_spec.rb...
[agent] Found assertion failure on line 42...
[agent] Applied fix.

> Add input validation to signup controller
[agent] Reading app/controllers/signup_controller.rb...
...

[optimizer] Found 12 results. Running GEPA...
[optimizer] Improvement! 0.75 → 0.82
[agent] Hot-reloaded instructions (gen 2)

> quit
[looped] Goodbye! State saved to ~/.looped/
```

Over time, the agent learns your codebase patterns, common mistake categories, and effective fix strategies—all automatically.

## Key Takeaways

1. **Capture everything** - Every task generates training data (task, solution, score, critique)
2. **LLM-as-judge provides rich feedback** - Critiques like "missed edge case X" are more useful than bare scores
3. **Background optimization is seamless** - Users don't wait; improvements happen asynchronously
4. **Hot-reload keeps the agent fresh** - New instructions apply immediately without restart
5. **Pareto frontier prevents regression** - GEPA won't accept changes that break previously working cases

## What's Next

- **Sandbox tool execution** - Use Docker for safe command execution
- **Per-tool feedback** - Separate optimization for different tool usage patterns
- **Multi-model routing** - Route complex tasks to stronger models
- **Persistent memory** - Remember context across sessions

The code is available at [github.com/vicentereig/looped](https://github.com/vicentereig/looped).
data/exe/looped
ADDED

```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require 'optparse'
require 'looped'

options = {}

parser = OptionParser.new do |opts|
  opts.banner = <<~BANNER
    Looped - Self-improving coding agent powered by DSPy.rb + GEPA

    Usage: looped [options] [task]

    If no task is provided, starts interactive mode.

  BANNER

  opts.separator 'Options:'

  opts.on('-m', '--model MODEL', 'Agent model (default: openai/gpt-4o-mini)') do |model|
    options[:model] = model
  end

  opts.on('-j', '--judge-model MODEL', 'Judge model for evaluation') do |model|
    options[:judge_model] = model
  end

  opts.on('-r', '--reflection-model MODEL', 'GEPA reflection model') do |model|
    options[:reflection_model] = model
  end

  opts.on('-i', '--max-iterations N', Integer, 'Max ReAct iterations (default: 10)') do |n|
    options[:max_iterations] = n
  end

  opts.on('-c', '--context FILE', 'Load context from file') do |file|
    options[:context_file] = file
  end

  opts.on('--no-optimizer', 'Disable background optimizer') do
    options[:no_optimizer] = true
  end

  opts.on('-v', '--version', 'Show version') do
    puts "looped #{Looped::VERSION}"
    exit
  end

  opts.on('-h', '--help', 'Show this help message') do
    puts opts
    exit
  end

  opts.separator ''
  opts.separator 'Environment Variables:'
  opts.separator '  OPENAI_API_KEY       OpenAI API key (required for openai/* models)'
  opts.separator '  ANTHROPIC_API_KEY    Anthropic API key (for anthropic/* models)'
  opts.separator '  GEMINI_API_KEY       Google API key (for gemini/* models)'
  opts.separator '  LOOPED_MODEL         Default agent model'
  opts.separator '  LOOPED_JUDGE_MODEL   Default judge model'
  opts.separator '  LOOPED_STORAGE_DIR   Storage directory (default: ~/.looped)'
  opts.separator ''
  opts.separator 'Examples:'
  opts.separator '  looped                       # Interactive mode'
  opts.separator '  looped "Write a fibonacci function in Ruby"'
  opts.separator '  looped -m openai/gpt-4o "Fix the bug in main.rb"'
  opts.separator '  looped -c project_context.md "Add unit tests"'
  opts.separator ''
end

begin
  parser.parse!
rescue OptionParser::InvalidOption, OptionParser::MissingArgument => e
  puts "Error: #{e.message}"
  puts ''
  puts parser
  exit 1
end

# Load context from file if specified
context = ''
if options[:context_file]
  unless File.exist?(options[:context_file])
    puts "Error: Context file not found: #{options[:context_file]}"
    exit 1
  end
  context = File.read(options[:context_file])
end

# Build run options, filtering out nil values
run_options = {
  model: options[:model],
  judge_model: options[:judge_model],
  reflection_model: options[:reflection_model],
  max_iterations: options[:max_iterations] || 10
}.compact

# If a task is provided as argument, run single task mode
if ARGV.any?
  task = ARGV.join(' ')
  result = Looped.execute(task: task, context: context, **run_options)

  puts "Score: #{result.score.round(2)}/10"
  puts ''
  puts 'Solution:'
  puts result.solution
  puts ''
  puts 'Feedback:'
  puts result.feedback
  exit(result.score >= 7.0 ? 0 : 1)
end

# Interactive mode
Looped.run(**run_options)
```
data/lib/looped/agent.rb
ADDED

```ruby
# typed: strict
# frozen_string_literal: true

module Looped
  class Agent
    extend T::Sig

    DEFAULT_MODEL = 'openai/gpt-4o-mini'
    DEFAULT_MAX_ITERATIONS = 10

    sig { returns(DSPy::ReAct) }
    attr_reader :react

    sig { returns(Memory) }
    attr_reader :memory

    sig { returns(State) }
    attr_reader :state

    sig { returns(Judge) }
    attr_reader :judge

    sig { returns(T.nilable(String)) }
    attr_reader :instructions_mtime

    sig { params(model: T.nilable(String), max_iterations: Integer, judge_model: T.nilable(String)).void }
    def initialize(model: nil, max_iterations: DEFAULT_MAX_ITERATIONS, judge_model: nil)
      @model_id = T.let(model || ENV.fetch('LOOPED_MODEL', DEFAULT_MODEL), String)
      @max_iterations = T.let(max_iterations, Integer)
      @memory = T.let(Memory.new, Memory)
      @state = T.let(State.new, State)
      @judge = T.let(Judge.new(model: judge_model), Judge)
      @instructions_mtime = T.let(nil, T.nilable(String))

      # Build the ReAct agent with tools
      @react = T.let(build_react_agent, DSPy::ReAct)

      # Load any existing instructions
      maybe_reload_instructions
    end

    sig { params(task: String, context: String).returns(Types::TrainingResult) }
    def run(task:, context: '')
      # Check for instruction hot-reload
      maybe_reload_instructions

      # Clear memory for new task
      @memory.clear

      # Execute the agent
      result = execute_task(task: task, context: context)

      # Judge the result
      judgment = @judge.evaluate(task: task, solution: result[:solution])

      # Create training result
      training_result = Types::TrainingResult.new(
        task: task,
        solution: result[:solution],
        score: judgment.score,
        feedback: @judge.to_feedback(judgment),
        timestamp: Time.now.utc.iso8601
      )

      # Persist for GEPA optimization
      @state.append_training_result(training_result)

      training_result
    end

    sig { void }
    def reload_instructions
      instructions = @state.load_instructions
      return unless instructions

      # Create a new react agent with the updated instructions
      thought_instruction = instructions.thought_generator
      observation_instruction = instructions.observation_processor

      if thought_instruction || observation_instruction
        @react = build_react_agent(
          thought_instruction: thought_instruction,
          observation_instruction: observation_instruction
        )

        # Track mtime for hot-reload detection
        @instructions_mtime = instructions.updated_at
      end
    end

    private

    sig { params(thought_instruction: T.nilable(String), observation_instruction: T.nilable(String)).returns(DSPy::ReAct) }
    def build_react_agent(thought_instruction: nil, observation_instruction: nil)
      # Configure DSPy with our model
      DSPy.configure do |config|
        config.lm = DSPy::LM.new(@model_id, api_key: resolve_api_key(@model_id))
      end

      # Build tools
      tools = [
        Tools::ReadFile.new,
        Tools::WriteFile.new,
        Tools::SearchCode.new,
        Tools::RunCommand.new
      ]

      # Create base ReAct agent
      agent = DSPy::ReAct.new(
        Looped::CodingTaskSignature,
        tools: tools,
        max_iterations: @max_iterations
      )

      # Apply custom instructions if present
      if thought_instruction
        agent = agent.with_instruction(thought_instruction)
      end

      agent
    end

    sig { params(task: String, context: String).returns(T::Hash[Symbol, T.untyped]) }
    def execute_task(task:, context:)
      # Build history from memory for context
      history = @memory.to_context

      # Run ReAct
      result = @react.forward(
        task: task,
        context: context,
        history: history
      )

      # Record actions in memory
      result.history.each do |entry|
        # Normalize action_input to a hash
        action_input = entry[:action_input]
        action_input = case action_input
                       when Hash then action_input
                       when String then { 'input' => action_input }
                       when NilClass then {}
                       else { 'value' => action_input.to_s }
                       end

        @memory.add(
          action_type: entry[:action] || 'unknown',
          action_input: action_input,
          action_output: entry[:observation]&.to_s || '',
          model_id: @model_id
        )
      end

      {
        solution: result.solution,
        files_modified: result.files_modified,
        iterations: result.iterations,
        tools_used: result.tools_used
      }
    end

    sig { void }
    def maybe_reload_instructions
      instructions = @state.load_instructions
      return unless instructions

      # Reload if mtime changed
      if @instructions_mtime != instructions.updated_at
        reload_instructions
      end
    end

    sig { params(model_id: String).returns(T.nilable(String)) }
    def resolve_api_key(model_id)
      provider = model_id.split('/').first
      case provider
      when 'openai'
        ENV['OPENAI_API_KEY']
      when 'anthropic'
        ENV['ANTHROPIC_API_KEY']
      when 'gemini', 'google'
        ENV['GEMINI_API_KEY']
      else
        ENV['OPENAI_API_KEY']
      end
    end
  end
end
```