completion-kit 0.1.0.rc1 → 0.1.0 (RubyGems version diff)


Files changed (41)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: cdecf15deb685a524186a2bd4ba48268e10475da0d2cc2914969318893268f70
-  data.tar.gz: ce395abf147434f9a825f79902e47b171b074038d20c06bf7b86b80ba70eaa00
+  metadata.gz: f8b122b978bb3d74051e14d734203da11cd951ef6ea0cb60b0459215812000e9
+  data.tar.gz: 458b93ab81bf13dcaf1fd7431e6934899855b4e6513ef16e82b4cd085dcb3e67
 SHA512:
-  metadata.gz: 9d13fd1d1863c87ca0f7ed78eb1853ac921ff39862b84c678e5b5c9977f1832edcfb50c52b2371f2dc88ad6360be22a78d39ef53385239939a1ac9297df444ff
-  data.tar.gz: bf95df5b178ccfe455350d216a894836a85cfeff5c117d0206b87211ef638c1d690790350168b5a9ad84e8e3895add8c89acdea98738a1c0d2025115679d939f
+  metadata.gz: d3c49afda3bb03eda67c89df4b5a8144da2259e8002ff35e906fa7516e0d115c0c7e3549a6b2de78f5478b72fefdcf758a0f6e042f83d3befc430c0bb26e7fa0
+  data.tar.gz: ed5fb37dcacd8c3bc1947d75b7ff83e4abde0fbcc418944ac7f3f4dc837679eb6d5cce5c144546ddd59aa4bc6548bdb83ac03c4b13428b916ef6e40b0f510e10

data/README.md CHANGED

@@ -1,29 +1,44 @@
 <p align="center">
-  <img src="https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/logo.png" alt="CompletionKit logo" width="120" />
+  <img src="https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/logo.png" alt="CompletionKit" width="360" />
 </p>
-# CompletionKit
-[![CI](https://github.com/homemade-software-inc/completion-kit/actions/workflows/ci.yml/badge.svg)](https://github.com/homemade-software-inc/completion-kit/actions/workflows/ci.yml)
-![coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)
-![dependencies](https://img.shields.io/badge/dependencies-7-blue)
-[![Dependabot](https://img.shields.io/badge/dependabot-enabled-blue?logo=dependabot)](https://github.com/homemade-software-inc/completion-kit/network/updates)
-You need to know whether your prompts produce the output you expect, consistently, across real data. CompletionKit gives you that, inside your Rails app.
+<p align="center">
+  <a href="https://github.com/homemade-software-inc/completion-kit/actions/workflows/ci.yml"><img src="https://github.com/homemade-software-inc/completion-kit/actions/workflows/ci.yml/badge.svg" alt="CI" /></a>
+  <img src="https://img.shields.io/badge/coverage-100%25-brightgreen" alt="coverage" />
+</p>
-Mount the engine, bring your prompts and datasets, and every input runs through a model you pick. Each output is scored against your own metrics and rubrics by an LLM-as-judge. When you change a prompt, re-run the same dataset and see exactly what got better and what broke — and when the scores tell you something's off, CompletionKit can suggest an improved version of the prompt based on the reviews, which you inspect as a diff and apply as a new version.
+Your prompts need tests too.
-Drive it from the web UI, from the REST API, or from Claude Code and other MCP-aware agents via the built-in Model Context Protocol server. All three share the same state — your prompts, runs, datasets, and scores are one source of truth.
+Run every prompt against real data. Score each output with an LLM judge against criteria you define. Change anything: the prompt, the model, the temperature, the dataset. Re-run and see exactly what got better and what broke. When the scores tell you something's off, CompletionKit suggests an improved prompt based on the judge's actual feedback on your runs. You inspect the diff, apply it as a new version, and verify the improvement.
 It's the difference between "this prompt seems to work" and "this prompt scores 4.3 out of 5 across 200 inputs, up from 3.8 last version."
+**[completionkit.com](https://completionkit.com)** | **[RubyGems](https://rubygems.org/gems/completion-kit)**
 ![Prompts index](https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/screenshots/prompts.png)
 ![Prompt detail with metrics and rubrics](https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/screenshots/prompt-detail.png)
 ![Test run with scored results](https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/screenshots/test-run.png)
-## Setup
+## Quick Start
+### Run the standalone app
+The fastest way to start. No existing Rails app needed.
+```bash
+git clone https://github.com/homemade-software-inc/completion-kit.git
+cd completion-kit/standalone
+bundle install
+bin/rails completion_kit:install:migrations
+bin/rails db:migrate
+bin/rails server
+```
+Visit `http://localhost:3000`. Add a provider credential (Settings), create a prompt, upload a CSV dataset, and run it.
+### Or mount as an engine in your existing Rails app
 ```ruby
 gem "completion-kit"
@@ -34,26 +49,40 @@ bin/rails generate completion_kit:install
 bin/rails db:migrate
 ```
-Set your provider keys via environment variables or the generated initializer:
+The engine mounts at `/completion_kit` in your app.
-```bash
-OPENAI_API_KEY=...
-ANTHROPIC_API_KEY=...
-LLAMA_API_KEY=...
-LLAMA_API_ENDPOINT=...
-```
+## Providers
+CompletionKit discovers available models from each provider's API automatically.
-Available models are discovered dynamically from each provider's API.
+| Provider | Env vars | What it covers |
+|----------|----------|----------------|
+| **OpenAI** | `OPENAI_API_KEY` | GPT-5, GPT-4.1, GPT-4o, etc. |
+| **Anthropic** | `ANTHROPIC_API_KEY` | Claude Opus, Sonnet, Haiku |
+| **Ollama / local endpoint** | `OLLAMA_API_ENDPOINT` (default: `http://localhost:11434/v1`) | Any model you've `ollama pull`-ed, or any OpenAI-compatible local server (vLLM, LM Studio, llama.cpp) |
+| **OpenRouter** | `OPENROUTER_API_KEY` | 100+ models from 30+ providers through one API key |
-### Encryption keys
+Set these as environment variables or configure them in the generated initializer. You can also add provider credentials through the web UI under Settings.
-Provider API keys are stored using [Rails Active Record encryption](https://guides.rubyonrails.org/active_record_encryption.html), so the host app must have encryption keys configured. If you haven't set them up already:
+### Encryption
+Provider API keys are encrypted at rest using [Active Record encryption](https://guides.rubyonrails.org/active_record_encryption.html). You need three encryption keys configured before the app will boot in production.
+Generate them:
 ```bash
 bin/rails db:encryption:init
 ```
-Copy the generated keys into `config/credentials.yml.enc` under `active_record_encryption`, or set the equivalent environment variables. CompletionKit won't boot without valid keys in production.
+Then set them as environment variables:
+```bash
+COMPLETION_KIT_ENCRYPTION_PRIMARY_KEY=<generated value>
+COMPLETION_KIT_ENCRYPTION_DETERMINISTIC_KEY=<generated value>
+COMPLETION_KIT_ENCRYPTION_KEY_DERIVATION_SALT=<generated value>
+```
+Or add them to `config/credentials.yml.enc` under `active_record_encryption`. In development, the standalone app uses built-in fallback values so you can skip this step locally.
 ## Authentication
@@ -62,7 +91,6 @@ CompletionKit requires authentication in production. In development, routes are
 ### Basic Auth (recommended for simple setups)
 ```ruby
-# config/initializers/completion_kit.rb
 CompletionKit.configure do |c|
   c.username = "admin"
   c.password = ENV["COMPLETION_KIT_PASSWORD"]
@@ -72,58 +100,49 @@ end
 ### Custom Auth (Devise, etc.)
 ```ruby
-# config/initializers/completion_kit.rb
 CompletionKit.configure do |c|
   c.auth_strategy = ->(controller) { controller.authenticate_user! }
 end
 ```
-Only one mode can be active — setting both raises a `ConfigurationError`.
+Only one mode can be active.
-## Usage
+## How it works
-1. Create a prompt with `{{variable}}` placeholders
-2. Create a test run and paste CSV data (headers match variable names)
-3. Generate outputs, run AI review, inspect scored results
+1. **Create a prompt** with `{{variable}}` placeholders
+2. **Upload a dataset.** A CSV where column headers match the variable names.
+3. **Run it** against a model and score outputs with an LLM judge against criteria you define.
+4. **Iterate.** Change the prompt, the model, the temperature, or the dataset and re-run. CompletionKit versions your prompts so you can always compare against previous results.
+5. **Get suggestions.** When scores drop, ask CompletionKit for an AI-generated improvement. The suggestion is based on the judge's actual per-response feedback, not generic prompt-engineering advice. Inspect the diff and apply it as a new version.
-## Programmatic access
+## Concepts
-CompletionKit exposes every resource through both a REST JSON API and an MCP server. Both share the same bearer-token auth, so configure once and use either interface:
+- **Prompt.** A versioned template with `{{variable}}` placeholders. Publishing freezes the template; editing a published prompt creates a new version.
+- **Dataset.** A CSV of real inputs. Each row becomes one test case.
+- **Run.** One execution of a prompt against a dataset. Captures every input (model, temperature, metrics) and stores all outputs and scores.
+- **Response.** The model's output for one dataset row, with reviews attached.
+- **Metric.** An evaluation dimension with a name, instruction, evaluation steps, and a 1-5 star scoring scale. The LLM judge uses this to score each response.
+- **Metric Group.** A reusable group of metrics you can apply to a run as a set.
+- **Provider Credential.** An API key for a model provider. Encrypted at rest, never returned through the API.
+## REST API
+Every resource is accessible via a bearer-token JSON API:
 ```ruby
-# config/initializers/completion_kit.rb
 CompletionKit.configure { |c| c.api_token = ENV["COMPLETION_KIT_API_TOKEN"] }
 ```
-### Concepts
-These are the objects you'll work with, whether through the UI, the REST API, or the MCP server:
-- **Prompt** — A named, versioned template with `{{variable}}` placeholders. Publishing a prompt freezes its template so runs always reference a known version; editing a published prompt creates a new version.
-- **Dataset** — A CSV of real inputs. Column headers match the prompt's `{{variable}}` names, and each row becomes one test case.
-- **Run** — A single execution of a prompt against a dataset. Tracks progress, stores outputs, and records which metrics were used for scoring.
-- **Response** — The model's output for one row of the dataset, with any reviews attached.
-- **Metric** — One evaluation dimension: a name, an instruction, evaluation steps, and 1–5-star rubric bands. The judge uses a metric to score a response.
-- **Criteria** — A named, reusable bundle of metrics you can apply to a run in one step.
-- **Provider Credential** — An API key for a model provider (OpenAI, Anthropic, Ollama, OpenRouter). Encrypted at rest using Rails' Active Record encryption, and never returned through the API.
-### REST API
 ```bash
 curl -H "Authorization: Bearer $TOKEN" \
   http://localhost:3000/completion_kit/api/v1/prompts
-curl -X POST http://localhost:3000/completion_kit/api/v1/prompts \
-  -H "Authorization: Bearer $TOKEN" \
-  -H "Content-Type: application/json" \
-  -d '{"name":"summarizer","template":"Summarize: {{text}}","llm_model":"gpt-4.1"}'
 ```
-Mount the engine, then visit **`/completion_kit/api_reference`** in your running app for per-endpoint documentation with copy-to-clipboard curl examples pre-filled with your token.
+Visit `/completion_kit/api_reference` in your running app for per-endpoint docs with copy-to-clipboard curl examples.
-### MCP server
+## MCP server
-CompletionKit also runs a [Model Context Protocol](https://modelcontextprotocol.io) server at the `/mcp` path within the engine mount, exposing the same resources as 36 tools (one per CRUD action plus process actions like `runs_generate` and `prompts_publish`). Point Claude Code, Cursor, or any other MCP client at it:
+CompletionKit runs a [Model Context Protocol](https://modelcontextprotocol.io) server at `/completion_kit/mcp`, exposing every resource as tools that MCP-aware clients (Claude Code, Cursor, etc.) can drive directly:
 ```json
 {
@@ -136,40 +155,21 @@ CompletionKit also runs a [Model Context Protocol](https://modelcontextprotocol.
 }
 ```
-The in-app API reference page also ships install snippets you can copy straight into your MCP client config.
-## Standalone App
-CompletionKit ships with a standalone Rails app you can deploy as a hosted service.
-### Quick Start
-```bash
-cd standalone
-bundle install
-bin/rails completion_kit:install:migrations
-bin/rails db:migrate
-bin/rails server
-```
-Visit `http://localhost:3000` for the home page, or `http://localhost:3000/completion_kit` for the engine UI.
+The in-app API reference page has install snippets you can copy straight into your MCP client config.
-### Configuration
+## Deploying the standalone app
-Set environment variables:
+Any Rails-friendly host works (Fly, Heroku, Render, Docker, etc.). Point it at a Postgres instance via `DATABASE_URL`, set your provider env vars, and run `cd standalone && bin/rails db:migrate` on each deploy.
 | Variable | Purpose | Default |
 |----------|---------|---------|
-| `COMPLETION_KIT_API_TOKEN` | Bearer token for REST API and MCP access | (none — API disabled) |
+| `COMPLETION_KIT_API_TOKEN` | Bearer token for REST API and MCP | (none, API disabled) |
 | `COMPLETION_KIT_USERNAME` | Web UI login username | `admin` |
-| `COMPLETION_KIT_PASSWORD` | Web UI login password | (none — open in dev) |
-| `DATABASE_URL` | PostgreSQL connection string (production) | SQLite in dev |
-### Deploying
+| `COMPLETION_KIT_PASSWORD` | Web UI login password | (none, open in dev) |
-Any Rails-friendly host works — Fly, Heroku, Render, self-managed Docker, etc. Point your host at a Postgres instance via `DATABASE_URL`, set the environment variables above, and run `cd standalone && bin/rails db:migrate` on each deploy.
+You also need the three `COMPLETION_KIT_ENCRYPTION_*` keys from the [Encryption](#encryption) section above.
-When the gem ships a new engine migration, install it into your standalone app locally and commit the generated file before pushing:
+When the gem ships a new migration, install it locally and commit before pushing:
 ```bash
 cd standalone
@@ -178,14 +178,9 @@ bin/rails db:migrate
 git add db/migrate/ && git commit -m "install new engine migration"
 ```
-That way your host's `db:migrate` picks up the new file on the next deploy. Don't run `completion_kit:install:migrations` on the host itself — migration files are source artifacts, they belong in git.
-## Development
+## Contributing
-```bash
-bundle install
-bundle exec rspec
-```
+See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and pull request guidelines.
 ## License
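
The README hunk above describes prompts as templates with `{{variable}}` placeholders that are filled from dataset columns whose headers match the variable names. The diff does not show how the engine performs that substitution, so the following is only a minimal plain-Ruby sketch of the idea. `render_template` and the sample rows are hypothetical, and each hash stands in for one parsed CSV row.

```ruby
# Hypothetical renderer (not the gem's implementation): replaces each
# {{name}} placeholder with the value stored under "name" in the row.
# Hash#fetch raises KeyError when a placeholder has no matching column,
# which surfaces header/variable mismatches early.
def render_template(template, row)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) { row.fetch(Regexp.last_match(1)).to_s }
end

template = "Summarize in a {{tone}} tone: {{text}}"

# Each hash stands in for one CSV row keyed by its column headers.
rows = [
  { "text" => "Rails is a web framework.", "tone" => "formal" },
  { "text" => "Ruby is a language.", "tone" => "casual" }
]

rendered = rows.map { |row| render_template(template, row) }
puts rendered
```

Each row yields one rendered prompt, which matches the README's statement that each dataset row becomes one test case.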

data/app/controllers/completion_kit/api/v1/metric_groups_controller.rb ADDED

@@ -0,0 +1,62 @@
+module CompletionKit
+  module Api
+    module V1
+      class MetricGroupsController < BaseController
+        before_action :set_metric_group, only: [:show, :update, :destroy]
+        def index
+          render json: MetricGroup.order(created_at: :desc)
+        end
+        def show
+          render json: @metric_group
+        end
+        def create
+          metric_group = MetricGroup.new(metric_group_params.except(:metric_ids))
+          if metric_group.save
+            replace_metric_memberships(metric_group, params[:metric_ids]) if params.key?(:metric_ids)
+            render json: metric_group.reload, status: :created
+          else
+            render json: {errors: metric_group.errors}, status: :unprocessable_entity
+          end
+        end
+        def update
+          if @metric_group.update(metric_group_params.except(:metric_ids))
+            replace_metric_memberships(@metric_group, params[:metric_ids]) if params.key?(:metric_ids)
+            render json: @metric_group.reload
+          else
+            render json: {errors: @metric_group.errors}, status: :unprocessable_entity
+          end
+        end
+        def destroy
+          @metric_group.destroy!
+          head :no_content
+        end
+        private
+        def set_metric_group
+          @metric_group = MetricGroup.find(params[:id])
+        rescue ActiveRecord::RecordNotFound
+          not_found
+        end
+        def metric_group_params
+          params.permit(:name, :description, metric_ids: [])
+        end
+        def replace_metric_memberships(metric_group, metric_ids)
+          return unless metric_ids
+          metric_group.metric_group_memberships.delete_all
+          Array(metric_ids).reject(&:blank?).each_with_index do |metric_id, index|
+            metric_group.metric_group_memberships.create!(metric_id: metric_id, position: index + 1)
+          end
+        end
+      end
+    end
+  end
+end
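
The `replace_metric_memberships` helper in the controller above drops blank entries from `metric_ids` and assigns 1-based positions in submission order. A minimal sketch of just that normalization, written in plain Ruby without Rails: `membership_rows` and `blank_value?` are hypothetical names, and `blank_value?` only approximates ActiveSupport's `blank?`.

```ruby
# Hypothetical stand-in for ActiveSupport's `blank?`:
# treats nil, empty, and whitespace-only values as blank.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Hypothetical helper mirroring the normalization step only:
# wrap in an array, drop blanks, number the survivors from 1.
def membership_rows(metric_ids)
  Array(metric_ids)
    .reject { |id| blank_value?(id) }
    .each_with_index
    .map { |id, index| { metric_id: id, position: index + 1 } }
end

rows = membership_rows(["7", "", "3", nil])
rows.each { |row| puts row.inspect }
```

Blank entries are removed before numbering, so positions stay contiguous even when the submitted array contains empty strings.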

data/app/controllers/completion_kit/api/v1/metrics_controller.rb CHANGED

@@ -43,7 +43,7 @@ module CompletionKit
         end
         def metric_params
-          params.permit(:name, :instruction, evaluation_steps: [], rubric_bands: [:stars, :description])
+          params.permit(:name, :instruction, rubric_bands: [:stars, :description])
         end
       end
     end

data/app/controllers/completion_kit/metric_groups_controller.rb ADDED

@@ -0,0 +1,67 @@
+module CompletionKit
+  class MetricGroupsController < ApplicationController
+    before_action :set_metric_group, only: [:show, :edit, :update, :destroy]
+    def index
+      @metric_groups = MetricGroup.includes(:metrics).order(:name)
+    end
+    def show
+    end
+    def new
+      @metric_group = MetricGroup.new
+      @metrics = Metric.order(:name)
+    end
+    def edit
+      @metrics = Metric.order(:name)
+    end
+    def create
+      @metric_group = MetricGroup.new(metric_group_params.except(:metric_ids))
+      @metrics = Metric.order(:name)
+      if @metric_group.save
+        replace_metric_memberships
+        redirect_to metric_group_path(@metric_group), notice: "Metric group was successfully created."
+      else
+        render :new, status: :unprocessable_entity
+      end
+    end
+    def update
+      @metrics = Metric.order(:name)
+      if @metric_group.update(metric_group_params.except(:metric_ids))
+        replace_metric_memberships
+        redirect_to metric_group_path(@metric_group), notice: "Metric group was successfully updated."
+      else
+        render :edit, status: :unprocessable_entity
+      end
+    end
+    def destroy
+      @metric_group.destroy
+      redirect_to metric_groups_path, notice: "Metric group was successfully destroyed."
+    end
+    private
+    def set_metric_group
+      @metric_group = MetricGroup.find(params[:id])
+    end
+    def metric_group_params
+      params.require(:metric_group).permit(:name, :description, metric_ids: [])
+    end
+    def replace_metric_memberships
+      metric_ids = Array(metric_group_params[:metric_ids]).reject(&:blank?)
+      @metric_group.metric_group_memberships.delete_all
+      metric_ids.each_with_index do |metric_id, index|
+        @metric_group.metric_group_memberships.create!(metric_id: metric_id, position: index + 1)
+      end
+    end
+  end
+end

data/app/controllers/completion_kit/metrics_controller.rb CHANGED

@@ -3,7 +3,7 @@ module CompletionKit
     before_action :set_metric, only: [:show, :edit, :update, :destroy]
     def index
-      @metrics = Metric.order(:name)
+      @metrics = Metric.includes(:metric_groups).order(:name)
     end
     def show
@@ -46,7 +46,7 @@ module CompletionKit
     end
     def metric_params
-      params.require(:metric).permit(:name, :instruction, evaluation_steps: [], rubric_bands: [:stars, :description])
+      params.require(:metric).permit(:name, :instruction, rubric_bands: [:stars, :description])
     end
   end
 end

data/app/controllers/completion_kit/runs_controller.rb CHANGED

@@ -112,7 +112,7 @@ module CompletionKit
     def load_form_collections
       @prompts = Prompt.order(:name)
       @datasets = Dataset.order(:name)
-      @criterias = Criteria.includes(:metrics).order(:name)
+      @metric_groups = MetricGroup.includes(:metrics).order(:name)
       @all_metrics = Metric.order(:name)
     end

data/app/models/completion_kit/metric.rb CHANGED

@@ -8,12 +8,11 @@ module CompletionKit
       { "stars" => 1, "description" => "Fails to meet the criteria. Major errors or completely off-target." }
     ].freeze
-    has_many :criteria_memberships, dependent: :destroy
-    has_many :criterias, through: :criteria_memberships, source: :criteria
+    has_many :metric_group_memberships, dependent: :destroy
+    has_many :metric_groups, through: :metric_group_memberships, source: :metric_group
     has_many :reviews, dependent: :nullify
     serialize :rubric_bands, coder: JSON
-    serialize :evaluation_steps, coder: JSON
     validates :name, presence: true
     validates :key, uniqueness: true, allow_nil: true
@@ -74,7 +73,7 @@ module CompletionKit
     def as_json(options = {})
       {
         id: id, name: name, key: key, instruction: instruction,
-        evaluation_steps: evaluation_steps, rubric_bands: rubric_bands,
+        rubric_bands: rubric_bands,
         created_at: created_at, updated_at: updated_at
       }
     end
@@ -86,7 +85,6 @@ module CompletionKit
     end
     def set_defaults
-      self.evaluation_steps ||= []
       self.rubric_bands = self.class.default_rubric_bands if rubric_bands.blank?
     end

data/app/models/completion_kit/metric_group.rb ADDED

@@ -0,0 +1,22 @@
+module CompletionKit
+  class MetricGroup < ApplicationRecord
+    self.table_name = "completion_kit_metric_groups"
+    has_many :metric_group_memberships, -> { order(:position, :id) }, dependent: :destroy
+    has_many :metrics, through: :metric_group_memberships
+    validates :name, presence: true
+    def ordered_metrics
+      metric_group_memberships.includes(:metric).map(&:metric).compact
+    end
+    def as_json(options = {})
+      {
+        id: id, name: name, description: description,
+        created_at: created_at, updated_at: updated_at,
+        metric_ids: metric_ids
+      }
+    end
+  end
+end

data/app/models/completion_kit/metric_group_membership.rb ADDED

@@ -0,0 +1,20 @@
+module CompletionKit
+  class MetricGroupMembership < ApplicationRecord
+    self.table_name = "completion_kit_metric_group_memberships"
+    belongs_to :metric_group, class_name: "CompletionKit::MetricGroup", foreign_key: "metric_group_id"
+    belongs_to :metric
+    validates :metric_id, uniqueness: { scope: :metric_group_id }
+    before_validation :set_default_position
+    private
+    def set_default_position
+      return if position.present? || metric_group.blank?
+      self.position = metric_group.metric_group_memberships.maximum(:position).to_i + 1
+    end
+  end
+end
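
`set_default_position` above relies on `maximum(:position).to_i + 1`. ActiveRecord's `maximum` returns `nil` when the group has no memberships yet, and `nil.to_i` is `0`, so the first membership gets position 1. The same arithmetic in plain Ruby, with a hypothetical `next_position` helper operating on an in-memory array instead of a database query:

```ruby
# Hypothetical helper: computes the next position from the positions
# already present. Array#max returns nil for an empty array, and
# nil.to_i is 0, so an empty group starts at 1.
def next_position(existing_positions)
  existing_positions.compact.max.to_i + 1
end

puts next_position([])        # empty group
puts next_position([1, 2, 5]) # gaps are not filled; the result follows the maximum
```

Because the value follows the current maximum rather than the count, deleting a membership from the middle of a group leaves a gap instead of renumbering.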

data/app/models/completion_kit/run.rb CHANGED

@@ -114,7 +114,6 @@ module CompletionKit
             response.expected_output,
             prompt.template,
             criteria: metric.respond_to?(:instruction) ? metric.instruction.to_s : "",
-            evaluation_steps: metric.respond_to?(:evaluation_steps) ? metric.evaluation_steps : nil,
             rubric_text: metric.respond_to?(:display_rubric_text) ? metric.display_rubric_text : nil,
             input_data: response.input_data
           )

data/app/services/completion_kit/judge_service.rb CHANGED

@@ -8,11 +8,11 @@ module CompletionKit
       @judge_client = LlmClient.for_model(@judge_model, ApiConfig.for_model(@judge_model))
     end
-    def evaluate(output, expected_output = nil, prompt = nil, criteria: nil, evaluation_steps: nil, rubric_text: nil, human_examples: nil, input_data: nil, **_extras)
+    def evaluate(output, expected_output = nil, prompt = nil, criteria: nil, rubric_text: nil, human_examples: nil, input_data: nil, **_extras)
       return { score: 1, feedback: "Judge not configured" } unless @judge_client.configured?
       judge_prompt = build_judge_prompt(output, expected_output, prompt,
-        criteria: criteria, evaluation_steps: evaluation_steps,
+        criteria: criteria,
         rubric_text: rubric_text, human_examples: human_examples,
         input_data: input_data)
@@ -27,7 +27,7 @@ module CompletionKit
     private
-    def build_judge_prompt(output, expected_output, prompt, criteria: nil, evaluation_steps: nil, rubric_text: nil, human_examples: nil, input_data: nil)
+    def build_judge_prompt(output, expected_output, prompt, criteria: nil, rubric_text: nil, human_examples: nil, input_data: nil)
       judge_prompt = <<~PROMPT
         You are an expert evaluator. You MUST respond with ONLY two lines in this exact format, nothing else:
@@ -44,10 +44,6 @@ module CompletionKit
         judge_prompt += "\nCriteria: #{criteria}\n"
       end
-      if evaluation_steps.present? && evaluation_steps.any?
-        judge_prompt += "\nEvaluation steps:\n#{evaluation_steps.each_with_index.map { |step, i| "#{i + 1}. #{step}" }.join("\n")}\n"
-      end
       if human_examples.present?
         judge_prompt += "\nCalibration examples:\n"
         human_examples.each_with_index do |example, index|
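
One consequence of this hunk worth noting: `evaluate` drops the `evaluation_steps:` keyword but keeps its `**_extras` catch-all, so a caller that still passes `evaluation_steps:` should have the value silently absorbed rather than hit an `ArgumentError`. The sketch below shows that Ruby behavior with a simplified stand-in method. It is not the real `JudgeService#evaluate`, and the return value is invented for illustration.

```ruby
# Simplified stand-in for the new signature: known keywords are named,
# anything else lands in the **_extras hash and is otherwise unused.
def evaluate(output, criteria: nil, rubric_text: nil, **_extras)
  { output: output, criteria: criteria, ignored: _extras.keys }
end

# A caller written against the old signature still works;
# the removed keyword is collected and ignored.
result = evaluate("text", criteria: "Be concise", evaluation_steps: ["Check length"])
puts result.inspect
```

The trade-off of a catch-all is that stale or misspelled keywords no longer fail loudly, so callers that depended on evaluation steps reaching the judge prompt would need to be found by other means.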

data/app/services/completion_kit/mcp_dispatcher.rb CHANGED

@@ -33,7 +33,7 @@ module CompletionKit
         McpTools::Responses.definitions +
         McpTools::Datasets.definitions +
         McpTools::Metrics.definitions +
-        McpTools::Criteria.definitions +
+        McpTools::MetricGroups.definitions +
         McpTools::ProviderCredentials.definitions
     end
@@ -44,7 +44,7 @@ module CompletionKit
       when /\Aresponses_/            then McpTools::Responses.call(name, arguments)
       when /\Adatasets_/             then McpTools::Datasets.call(name, arguments)
       when /\Ametrics_/              then McpTools::Metrics.call(name, arguments)
-      when /\Acriteria_/             then McpTools::Criteria.call(name, arguments)
+      when /\Ametric_groups_/        then McpTools::MetricGroups.call(name, arguments)
       when /\Aprovider_credentials_/ then McpTools::ProviderCredentials.call(name, arguments)
       else raise MethodNotFound, "Unknown tool: #{name}"
       end
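
The dispatcher routes tool calls by anchored name prefix, and this hunk swaps the `criteria_` prefix for `metric_groups_`. A minimal sketch of the same prefix-routing pattern in plain Ruby follows. The `ROUTES` table, the `route` helper, and the tool names `metrics_list`, `metric_groups_list`, and `criteria_list` are assumptions for illustration, since the diff shows only the prefixes.

```ruby
# Hypothetical routing table: anchored prefix patterns mapped to handler
# names. \A anchors at the start of the string, so "metrics_" cannot
# match a name beginning with "metric_groups_".
ROUTES = {
  /\Ametrics_/ => :metrics,
  /\Ametric_groups_/ => :metric_groups,
  /\Aprovider_credentials_/ => :provider_credentials
}.freeze

# Returns the handler for the first matching prefix, or raises for
# names that match none (as the dispatcher's else branch does).
def route(tool_name)
  ROUTES.each { |pattern, handler| return handler if pattern.match?(tool_name) }
  raise ArgumentError, "Unknown tool: #{tool_name}"
end

puts route("metrics_list")
puts route("metric_groups_list")
```

Under this pattern, a client still calling a `criteria_`-prefixed tool after the rename would fall through to the unknown-tool branch, so MCP client configurations that reference the old names would likely need updating.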