RubyGems - completion-kit - Versions diffs - 0.5.10 → 0.5.11 - Mend

completion-kit 0.5.10 → 0.5.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

checksums.yaml +4 -4
data/README.md +15 -15
data/app/assets/stylesheets/completion_kit/application.css +3 -0
data/app/controllers/completion_kit/api/v1/runs_controller.rb +1 -1
data/app/controllers/completion_kit/runs_controller.rb +8 -2
data/app/jobs/completion_kit/judge_review_job.rb +1 -1
data/app/models/completion_kit/run.rb +55 -10
data/app/services/completion_kit/mcp_tools/runs.rb +6 -4
data/app/views/completion_kit/api_reference/_body.html.erb +1 -1
data/app/views/completion_kit/responses/show.html.erb +26 -11
data/app/views/completion_kit/runs/_form.html.erb +50 -3
data/app/views/completion_kit/runs/_row.html.erb +6 -2
data/app/views/completion_kit/runs/_status_header.html.erb +5 -1
data/app/views/completion_kit/runs/edit.html.erb +6 -2
data/app/views/completion_kit/runs/show.html.erb +24 -15
data/db/migrate/20260514000001_allow_judge_only_runs.rb +6 -0
data/lib/completion_kit/version.rb +1 -1
metadata +2 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 64a36635f11bc5109afed180697b763a1c8231347b4912aa452271fd19694a45
-  data.tar.gz: 366ef22faa5914db74402b7ae1c9769f50f14ae8204e2144a245b8d29f89b464
+  metadata.gz: ed531ae29162bb91d2c463c3ff4eb20b5da469b9b7a21baddf5054a0ccc15041
+  data.tar.gz: b86aea95b2e1cf73abf6514093565dc07b12dc0f4fe5c5c5c8b80db3fbdfa83d
 SHA512:
-  metadata.gz: f84d7a7e0bcd5a3a054e84231cb390026dd07ac469b34ff8bb8a10a6f9f77fbbcb53748b4814d7b86aa94d1e67f41f3ad1e9c8eb1907f0ac56388169765174d6
-  data.tar.gz: ab627c627f3c88d1ef4a8f14afa0d72d7a92acc83b1107a4ebfeb25dacaf7a04282dfe3cfa5657d3c474171e508dec9944db95da526cbe8b3d94bbd0a9ef32a7
+  metadata.gz: 04ae500020e71d52c41073c36a6741bc47b94a06ceec6548720d6022a60ce7422be8a354d627c39a7be8174af2ce65219041c5d99ad175157c7bf4b4eaf8f056
+  data.tar.gz: 261daeeb1555b3aecb8e2e18edb7f14ebdc37f974c2713ecbe12c43a281e8109edc45222be0f6039c325a0428e89c1c11a1a7104f0a36ace2b618fb2ef1cb7e8
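
`checksums.yaml` records SHA256 and SHA512 digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`, so the new 0.5.11 values above can be checked locally. Below is a minimal sketch. It assumes you have already unpacked the gem (a `.gem` file is a plain tar archive) and decompressed `checksums.yaml.gz`. The helper name `checksum_mismatches` is hypothetical.

```ruby
require "digest"
require "yaml"

# Maps the algorithm keys used in checksums.yaml to Ruby digest classes.
ALGORITHMS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper. `members` maps a member name ("metadata.gz",
# "data.tar.gz") to its raw bytes. Returns [algorithm, member] pairs whose
# recorded digest does not match; members not supplied are skipped.
def checksum_mismatches(checksums_yaml, members)
  recorded = YAML.safe_load(checksums_yaml)
  recorded.flat_map do |algo, entries|
    digest = ALGORITHMS.fetch(algo)
    entries.filter_map do |member, expected|
      next unless members.key?(member)
      actual = digest.hexdigest(members[member])
      [algo, member] unless actual == expected
    end
  end
end
```

For a genuine 0.5.11 gem, `checksum_mismatches(File.read("checksums.yaml"), "data.tar.gz" => File.binread("data.tar.gz"))` should return an empty array.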

data/README.md CHANGED

@@ -14,21 +14,23 @@ Run every prompt against real data. Score each output with an LLM judge against
 It's the difference between "this prompt seems to work" and "this prompt scores 4.3 out of 5 across 200 inputs, up from 3.8 last version."
-**[completionkit.com](https://completionkit.com)** | **[RubyGems](https://rubygems.org/gems/completion-kit)**
+**[Start on completionkit.com →](https://completionkit.com)** | **[RubyGems](https://rubygems.org/gems/completion-kit)**
-> **CompletionKit Cloud** — hosted, managed CompletionKit with zero setup. Same engine, run for you. See plans at [completionkit.com/pricing](https://completionkit.com/pricing).
+> **Just want to use it?** [CompletionKit Cloud](https://completionkit.com) is the same engine, fully hosted — zero install, no Rails ops, plans at [completionkit.com/pricing](https://completionkit.com/pricing).
 ![Test run with scored results](https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/screenshots/test-run.png)
-## Quick Start
+## Three ways to run it
-### Use CompletionKit Cloud
+Same engine, same UI, same REST API and MCP server — pick the deployment that fits.
-The fastest way to start — no install, no servers to run. Sign up at [completionkit.com](https://completionkit.com) and you get the same engine you'd self-host, hosted for you. Best fit if you want to skip the Rails ops.
+### 1. Hosted — [completionkit.com](https://completionkit.com) (recommended)
-### Or run the standalone app
+The fastest path. Sign up and you're running on the same engine you'd self-host, without touching a Rails app. No `db:migrate`, no Puma, no Solid Queue, no provider key management — multi-tenant workspaces, your team logs in, you go. Plans at [completionkit.com/pricing](https://completionkit.com/pricing).
-Self-host the same engine. No existing Rails app needed.
+### 2. Self-hosted — the bundled standalone Rails app
+Run it on your own infra. No existing Rails app required; Postgres + any Rails-friendly host (Fly, Render, Heroku, Docker, …).
 ```bash
 git clone https://github.com/homemade-software-inc/completion-kit.git
@@ -38,7 +40,7 @@ bin/rails completion_kit:install:migrations
 bin/rails db:migrate
 ```
-Then run **both** processes — a web server and a Solid Queue worker. In two terminals:
+Run **both** a web server and a Solid Queue worker. In two terminals:
 ```bash
 bin/rails server
@@ -50,9 +52,9 @@ bin/jobs
 Or with [foreman](https://github.com/ddollar/foreman) in one terminal: `foreman start -f Procfile.dev`.
-Visit `http://localhost:3000`. Add a provider credential (Settings), create a prompt, upload a CSV dataset, and run it.
+Visit `http://localhost:3000`. Add a provider credential (Settings), create a prompt, upload a CSV dataset, and run it. See [Deploying self-hosted](#deploying-self-hosted) for the production-env setup.
-### Or mount as an engine in your existing Rails app
+### 3. Rails engine — mount into your existing Rails app
 ```ruby
 gem "completion-kit"
@@ -63,11 +65,9 @@ bin/rails generate completion_kit:install
 bin/rails db:migrate
 ```
-The engine mounts at `/completion_kit` in your app. CompletionKit's generate and judge flows enqueue Active Job jobs (`CompletionKit::GenerateRowJob`, `CompletionKit::JudgeReviewJob`, `CompletionKit::RunCompletionCheckJob`), so your host app needs an Active Job adapter that actually processes them — Solid Queue, Sidekiq, GoodJob, etc. The `:async` adapter is **not** suitable for production: it runs jobs in the web Puma's thread pool with no durability and no retry, and a long LLM call will block request handling.
-### Host-app layout integration
+The engine mounts at `/completion_kit`. Generate / judge flows enqueue Active Job jobs (`CompletionKit::GenerateRowJob`, `CompletionKit::JudgeReviewJob`, `CompletionKit::RunCompletionCheckJob`), so your host app needs an Active Job adapter that actually processes them — Solid Queue, Sidekiq, GoodJob, etc. The `:async` adapter is **not** suitable for production: it runs jobs in the web Puma's thread pool with no durability and no retry, and a long LLM call will block request handling.
-If your host app overrides the engine layout (e.g. `layout "application"` on engine controllers, or rendering engine views inside your own shell), include both the engine's stylesheet and JavaScript in that layout:
+**Host-app layout integration.** If your host app overrides the engine layout (e.g. `layout "application"` on engine controllers, or rendering engine views inside your own shell), include both the engine's stylesheet and JavaScript in that layout:
 ```erb
 <%= stylesheet_link_tag "completion_kit/application", media: "all" %>
@@ -183,7 +183,7 @@ CompletionKit runs a [Model Context Protocol](https://modelcontextprotocol.io) s
 The in-app API reference page has install snippets you can copy straight into your MCP client config.
-## Deploying the standalone app
+## Deploying self-hosted
 Any Rails-friendly host works (Fly, Heroku, Render, Docker, etc.). Point it at a Postgres instance via `DATABASE_URL`, set your provider env vars, and run `cd standalone && bin/rails db:migrate` on each deploy.

data/app/assets/stylesheets/completion_kit/application.css CHANGED

@@ -1537,6 +1537,9 @@ tr:hover .ck-chip--publish {
   display: grid;
   gap: 0.4rem;
 }
+.ck-field[hidden] {
+  display: none;
+}
 .ck-field--spacious {
   margin-top: 0.3rem;

data/app/controllers/completion_kit/api/v1/runs_controller.rb CHANGED

@@ -76,7 +76,7 @@ module CompletionKit
         end
         def run_params
-          params.permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature,
+          params.permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, :output_column,
             metric_ids: [], tag_names: [])
         end
       end

data/app/controllers/completion_kit/runs_controller.rb CHANGED

@@ -84,6 +84,7 @@ module CompletionKit
         dataset_id: @run.dataset_id,
         judge_model: @run.judge_model,
         temperature: @run.temperature,
+        output_column: @run.output_column,
         tag_names: @run.tag_names,
         status: "pending"
       )
@@ -108,6 +109,11 @@ module CompletionKit
     end
     def suggest
+      if @run.prompt.nil?
+        redirect_to run_path(@run), alert: "Judge-only runs don't have a prompt to improve."
+        return
+      end
       service = PromptImprovementService.new(@run)
       result = service.suggest
       suggestion = @run.suggestions.create!(
@@ -159,13 +165,13 @@ module CompletionKit
     end
     def run_params
-      params.require(:run).permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, metric_ids: [], tag_names: [])
+      params.require(:run).permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, :output_column, metric_ids: [], tag_names: [])
     end
     # Editing a run that already has results forks a new run — but only when a
     # field that affects generation or judging changed. Renaming or retagging is
     # pure metadata and updates the run in place.
-    GENERATION_RUN_FIELDS = %i[prompt_id dataset_id judge_model temperature].freeze
+    GENERATION_RUN_FIELDS = %i[prompt_id dataset_id judge_model temperature output_column].freeze
     def run_generation_changed?
       GENERATION_RUN_FIELDS.each do |field|

data/app/jobs/completion_kit/judge_review_job.rb CHANGED

@@ -54,7 +54,7 @@ module CompletionKit
       evaluation = judge.evaluate(
         response.response_text,
         response.expected_output,
-        run.prompt.template,
+        run.prompt&.template,
         criteria: metric.instruction.to_s,
         rubric_text: metric.display_rubric_text,
         input_data: response.input_data

data/app/models/completion_kit/run.rb CHANGED

@@ -5,7 +5,7 @@ module CompletionKit
     STATUSES = %w[pending running completed failed].freeze
-    belongs_to :prompt
+    belongs_to :prompt, optional: true
     belongs_to :dataset, optional: true
     has_many :responses, dependent: :destroy
     has_many :run_metrics, -> { order(:position) }, dependent: :destroy
@@ -15,10 +15,18 @@ module CompletionKit
     validates :name, presence: true
     validates :status, inclusion: { in: STATUSES }
     validate :dataset_supplies_prompt_variables
+    validate :judge_only_run_supplies_output_column
     before_validation :set_default_status, on: :create
     before_validation :set_auto_name, on: :create
+    # A judge-only run grades a pre-existing column on the dataset instead of
+    # generating new outputs. No prompt is attached; the response text is read
+    # from row[output_column]; no LLM generation happens.
+    def judge_only?
+      prompt.nil?
+    end
     def missing_dataset_variables
       return [] unless prompt
       vars = prompt.variables
@@ -89,9 +97,14 @@ module CompletionKit
       return fail_with_summary!("Dataset has no rows") if rows.empty?
-      client = LlmClient.for_model(prompt.llm_model, ApiConfig.for_model(prompt.llm_model))
-      unless client.configured?
-        return fail_with_summary!("LLM API not configured: #{client.configuration_errors.join(', ')}")
+      if judge_only?
+        column = output_column.presence || "actual_output"
+        return fail_with_summary!("Dataset has no \"#{column}\" column") unless dataset && dataset.headers.include?(column)
+      else
+        client = LlmClient.for_model(prompt.llm_model, ApiConfig.for_model(prompt.llm_model))
+        unless client.configured?
+          return fail_with_summary!("LLM API not configured: #{client.configuration_errors.join(', ')}")
+        end
       end
       transaction do
@@ -105,14 +118,27 @@ module CompletionKit
         )
         rows.each_with_index do |row, index|
           input = row.empty? ? nil : row.to_json
-          response = responses.create!(
+          attrs = {
             status: "pending",
             row_index: index,
             input_data: input,
             expected_output: row["expected_output"]
-          )
-          GenerateRowJob.perform_later(id, response.id)
+          }
+          if judge_only?
+            attrs[:status] = "succeeded"
+            attrs[:response_text] = row[output_column.presence || "actual_output"].to_s
+          end
+          response = responses.create!(attrs)
+          if judge_only?
+            metrics.each { |m| JudgeReviewJob.perform_later(response.id, m.id) } if judge_configured?
+          else
+            GenerateRowJob.perform_later(id, response.id)
+          end
         end
+        RunCompletionCheckJob.perform_later(id) if judge_only?
       end
       broadcast_ui
@@ -168,6 +194,7 @@ module CompletionKit
       {
         id: id, name: name, status: status, prompt_id: prompt_id,
         dataset_id: dataset_id, judge_model: judge_model, temperature: temperature,
+        output_column: output_column,
         created_at: created_at, updated_at: updated_at,
         responses_count: responses.count, avg_score: avg_score,
         progress_current: snap[:generated_done],
@@ -274,10 +301,14 @@ module CompletionKit
     def set_auto_name
       return if name.present?
-      return unless prompt.present?
-      count = Run.where(prompt_id: prompt_id).count + 1
-      self.name = "#{prompt.name} — v#{prompt.version_number} ##{count}"
+      if prompt.present?
+        count = Run.where(prompt_id: prompt_id).count + 1
+        self.name = "#{prompt.name} — v#{prompt.version_number} ##{count}"
+      elsif dataset.present?
+        count = Run.where(prompt_id: nil, dataset_id: dataset.id).count + 1
+        self.name = "#{dataset.name} — judge-only ##{count}"
+      end
     end
     def dataset_supplies_prompt_variables
@@ -290,5 +321,19 @@ module CompletionKit
         errors.add(:dataset_id, "is missing columns required by the prompt: #{missing.join(', ')}")
       end
     end
+    def judge_only_run_supplies_output_column
+      return if prompt.present?
+      if dataset.nil?
+        errors.add(:dataset_id, "is required for a judge-only run (no prompt)")
+        return
+      end
+      column = output_column.presence || "actual_output"
+      unless dataset.headers.include?(column)
+        errors.add(:output_column, "\"#{column}\" is not a column on dataset \"#{dataset.name}\"")
+      end
+    end
   end
 end
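
In 0.5.11 a run with no prompt is a judge-only run. Validation requires a dataset whose headers include `output_column`, which defaults to `actual_output`. The run-start code then creates each response already marked `succeeded`, with `response_text` read from that column, instead of enqueuing `GenerateRowJob`. The plain-Ruby sketch below restates those rules without ActiveRecord so they can be exercised in isolation. `JudgeOnlyRun`, `effective_output_column` and `response_attrs` are hypothetical names, and `dataset_headers` stands in for `dataset.headers`.

```ruby
require "json"

DEFAULT_OUTPUT_COLUMN = "actual_output"

# Hypothetical stand-in for the judge-only parts of CompletionKit::Run.
JudgeOnlyRun = Struct.new(:prompt, :dataset_headers, :output_column, keyword_init: true) do
  def judge_only?
    prompt.nil?
  end

  # Mirrors `output_column.presence || "actual_output"` (blank falls back).
  def effective_output_column
    output_column.to_s.strip.empty? ? DEFAULT_OUTPUT_COLUMN : output_column
  end

  # Same checks as judge_only_run_supplies_output_column, as plain messages.
  def errors
    return [] unless judge_only?
    return ["dataset is required for a judge-only run (no prompt)"] if dataset_headers.nil?

    col = effective_output_column
    dataset_headers.include?(col) ? [] : ["\"#{col}\" is not a column on the dataset"]
  end

  # Attributes for one pre-filled response, following the judge-only branch
  # of the row loop: no generation, the dataset value is the response.
  def response_attrs(row, index)
    {
      status: "succeeded",
      row_index: index,
      input_data: row.empty? ? nil : row.to_json,
      expected_output: row["expected_output"],
      response_text: row[effective_output_column].to_s
    }
  end
end
```
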

data/app/services/completion_kit/mcp_tools/runs.rb CHANGED

@@ -15,16 +15,17 @@ module CompletionKit
           handler: :get
         },
         "runs_create" => {
-          description: "Create a run",
+          description: "Create a run. Omit prompt_id and provide output_column for a judge-only run that grades a pre-existing dataset column instead of generating new outputs.",
           inputSchema: {
             type: "object",
             properties: {
               name: {type: "string"}, prompt_id: {type: "integer"},
               dataset_id: {type: "integer"}, judge_model: {type: "string"},
+              output_column: {type: "string", description: "Dataset column to grade when prompt_id is omitted; defaults to \"actual_output\"."},
               metric_ids: {type: "array", items: {type: "integer"}},
               tag_names: {type: "array", items: {type: "string"}}
             },
-            required: ["name", "prompt_id"]
+            required: ["name"]
           },
           handler: :create
         },
@@ -35,6 +36,7 @@ module CompletionKit
             properties: {
               id: {type: "integer"}, name: {type: "string"},
               dataset_id: {type: "integer"}, judge_model: {type: "string"},
+              output_column: {type: "string"},
               metric_ids: {type: "array", items: {type: "integer"}},
               tag_names: {type: "array", items: {type: "string"}}
             },
@@ -63,7 +65,7 @@ module CompletionKit
       end
       def self.create(args)
-        run = Run.new(args.slice("name", "prompt_id", "dataset_id", "judge_model"))
+        run = Run.new(args.slice("name", "prompt_id", "dataset_id", "judge_model", "output_column"))
         if run.save
           run.replace_metrics!(args["metric_ids"])
           run.update!(tag_names: args["tag_names"]) if args.key?("tag_names")
@@ -75,7 +77,7 @@ module CompletionKit
       def self.update(args)
         run = Run.find(args["id"])
-        if run.update(args.except("id", "metric_ids", "tag_names").slice("name", "dataset_id", "judge_model"))
+        if run.update(args.except("id", "metric_ids", "tag_names").slice("name", "dataset_id", "judge_model", "output_column"))
           run.replace_metrics!(args["metric_ids"]) if args.key?("metric_ids")
           run.update!(tag_names: args["tag_names"]) if args.key?("tag_names")
           text_result(run.reload.as_json)

data/app/views/completion_kit/api_reference/_body.html.erb CHANGED

@@ -121,7 +121,7 @@
       <div class="ck-api-endpoint">
         <p class="ck-api-method"><span class="ck-chip ck-chip--soft">POST</span> /api/v1/runs</p>
         <p class="ck-meta-copy">Create a new run.</p>
-        <p class="ck-api-params"><strong>Required:</strong>&ensp;<code>prompt_id</code>&emsp;<strong>Optional:</strong>&ensp;<code>name</code>, <code>dataset_id</code>, <code>metric_ids</code>, <code>judge_model</code></p>
+        <p class="ck-api-params"><strong>Optional:</strong>&ensp;<code>name</code>, <code>prompt_id</code>, <code>dataset_id</code>, <code>metric_ids</code>, <code>judge_model</code>, <code>output_column</code> (judge-only: omit <code>prompt_id</code> and grade a dataset column instead, default <code>actual_output</code>)</p>
         <%= render "completion_kit/api_reference/example", base_url: base_url, token: token, real_token: real_token, cmd: "curl -X POST #{base_url}/api/v1/runs \\\n  -H \"Authorization: Bearer #{token}\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"prompt_id\": 1, \"dataset_id\": 1, \"metric_ids\": [1, 2]}'" %>
       </div>
       <div class="ck-api-endpoint">
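
The API reference's curl example still shows a prompt-driven run. For a judge-only run, the body omits `prompt_id` and names the column to grade. Below is a Ruby sketch that builds (but does not send) that request with `Net::HTTP`. The base URL, token and IDs are placeholders, and the `/completion_kit` path assumes the engine's default mount point.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: a POST /api/v1/runs request for a judge-only run.
def judge_only_run_request(base_url, token, dataset_id:, metric_ids:, judge_model:, output_column: "actual_output")
  # Ensure a trailing slash so URI.join appends rather than replaces the last segment.
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  uri = URI.join(base, "api/v1/runs")

  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = {
    dataset_id: dataset_id,
    metric_ids: metric_ids,
    judge_model: judge_model,
    output_column: output_column
  }.to_json
  request
end
```

To send it, use `Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: req.uri.scheme == "https") { |http| http.request(req) }`.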

data/app/views/completion_kit/responses/show.html.erb CHANGED

@@ -1,6 +1,10 @@
 <ol class="ck-breadcrumb">
-  <li><%= link_to "Prompts", prompts_path %></li>
-  <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
+  <% if @run.prompt %>
+    <li><%= link_to "Prompts", prompts_path %></li>
+    <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
+  <% else %>
+    <li><%= link_to "Runs", runs_path %></li>
+  <% end %>
   <li><%= link_to @run.name, run_path(@run) %></li>
   <li>Response #<%= @response_number %></li>
 </ol>
@@ -30,20 +34,29 @@
     <span class="ck-run-config__key">Run</span>
     <%= link_to @run.name, run_path(@run), class: "ck-link" %>
   </div>
-  <div class="ck-run-config__row">
-    <span class="ck-run-config__key">Prompt</span>
-    <%= link_to @run.prompt.display_name, prompt_path(@run.prompt), class: "ck-link" %>
-  </div>
+  <% if @run.prompt %>
+    <div class="ck-run-config__row">
+      <span class="ck-run-config__key">Prompt</span>
+      <%= link_to @run.prompt.display_name, prompt_path(@run.prompt), class: "ck-link" %>
+    </div>
+  <% else %>
+    <div class="ck-run-config__row">
+      <span class="ck-run-config__key">Output</span>
+      <span>Dataset column <code><%= @run.output_column.presence || "actual_output" %></code></span>
+    </div>
+  <% end %>
   <% if @run.dataset %>
     <div class="ck-run-config__row">
       <span class="ck-run-config__key">Dataset</span>
       <%= link_to @run.dataset.name, dataset_path(@run.dataset), class: "ck-link" %>
     </div>
   <% end %>
-  <div class="ck-run-config__row">
-    <span class="ck-run-config__key">Model</span>
-    <span style="text-transform: none;"><%= @run.prompt.llm_model %></span>
-  </div>
+  <% if @run.prompt %>
+    <div class="ck-run-config__row">
+      <span class="ck-run-config__key">Model</span>
+      <span style="text-transform: none;"><%= @run.prompt.llm_model %></span>
+    </div>
+  <% end %>
   <% if @run.judge_model.present? %>
     <div class="ck-run-config__row">
       <span class="ck-run-config__key">Judge</span>
@@ -60,7 +73,9 @@
 <section class="ck-card--spaced">
   <div class="ck-prompt-preview__header">
     <p class="ck-kicker">Response</p>
-    <span class="ck-chip ck-chip--soft" style="text-transform: none;"><%= @run.prompt.llm_model %></span>
+    <% if @run.prompt %>
+      <span class="ck-chip ck-chip--soft" style="text-transform: none;"><%= @run.prompt.llm_model %></span>
+    <% end %>
   </div>
   <pre class="ck-code"><%= @response.response_text %></pre>
 </section>

data/app/views/completion_kit/runs/_form.html.erb CHANGED

@@ -17,6 +17,17 @@
     </div>
     <div class="ck-field">
+      <label class="ck-checkbox-label">
+        <%= check_box_tag "run[judge_only]", "1", run.persisted? && run.judge_only?, id: "run_judge_only", class: "ck-checkbox" %>
+        <span class="ck-checkbox-label__box" aria-hidden="true"></span>
+        <span class="ck-checkbox-label__body">
+          <span class="ck-checkbox-label__text">Judge-only run</span>
+          <span class="ck-checkbox-label__hint">Grade an existing column on the dataset instead of running a prompt. Roughly half the LLM calls per row.</span>
+        </span>
+      </label>
+    </div>
+    <div class="ck-field" id="prompt-field">
       <%= form.label :prompt_id, "Prompt", class: "ck-label" %>
       <%= form.select :prompt_id,
             @prompts.map { |p|
@@ -43,6 +54,12 @@
       </div>
     </div>
+    <div class="ck-field" id="output-column-field" hidden>
+      <%= form.label :output_column, "Output column", class: "ck-label" %>
+      <%= form.text_field :output_column, value: run.output_column.presence || "actual_output", class: "ck-input", id: "run_output_column", placeholder: "actual_output" %>
+      <p class="ck-field-hint">Name of the dataset column whose value will be graded as the response. Defaults to <code>actual_output</code>.</p>
+    </div>
     <div class="ck-field" id="dataset-field">
       <%= form.label :dataset_id, "Dataset", class: "ck-label" %>
       <% if @datasets.empty? %>
@@ -157,6 +174,15 @@
 function updateRunForm() {
   var promptEl = document.getElementById('run_prompt_id');
   var judgeEl = document.getElementById('run_judge_model');
+  var judgeOnlyEl = document.getElementById('run_judge_only');
+  var judgeOnly = !!(judgeOnlyEl && judgeOnlyEl.checked);
+  var promptField = document.getElementById('prompt-field');
+  var outputColumnField = document.getElementById('output-column-field');
+  var outputColumnEl = document.getElementById('run_output_column');
+  if (promptField) promptField.hidden = judgeOnly;
+  if (outputColumnField) outputColumnField.hidden = !judgeOnly;
+  if (judgeOnly && promptEl) promptEl.value = '';
   var prompt = promptEl ? promptEl.value : '';
   var judge = judgeEl ? judgeEl.value : '';
   var metrics = document.querySelectorAll('input[name="run[metric_ids][]"]:checked');
@@ -222,11 +248,28 @@ function updateRunForm() {
     }
   }
-  var valid = prompt !== '';
+  var valid;
+  if (judgeOnly) {
+    valid = !!dataset;
+    if (dataset && datasetEl && outputColumnEl) {
+      var headersJudge = (datasetEl.options[datasetEl.selectedIndex] && datasetEl.options[datasetEl.selectedIndex].dataset.headers ? datasetEl.options[datasetEl.selectedIndex].dataset.headers.split(/,\s*/) : []).filter(Boolean);
+      var col = (outputColumnEl.value || 'actual_output').trim();
+      if (col === '' || headersJudge.indexOf(col) === -1) {
+        valid = false;
+        if (datasetField) datasetField.className = 'ck-field ck-field--error';
+        if (datasetHint) datasetHint.textContent = "Dataset has no \"" + col + "\" column — pick a different output column or dataset.";
+      }
+    } else if (!dataset) {
+      if (datasetField) datasetField.className = 'ck-field ck-field--info';
+      if (datasetHint) datasetHint.textContent = 'Judge-only runs need a dataset that supplies the output column.';
+    }
+  } else {
+    valid = prompt !== '';
+    if (hasVars && !dataset) valid = false;
+    if (missingVars.length > 0) valid = false;
+  }
   if (judge && metrics.length === 0) valid = false;
   if (!judge && metrics.length > 0) valid = false;
-  if (hasVars && !dataset) valid = false;
-  if (missingVars.length > 0) valid = false;
   if (submitBtn) submitBtn.disabled = !valid;
   ckUpdateMetricGroupsState();
@@ -260,9 +303,13 @@ function ckUpdateMetricGroupsState() {
 var judgeEl = document.getElementById('run_judge_model');
 var promptEl = document.getElementById('run_prompt_id');
 var datasetEl = document.getElementById('run_dataset_id');
+var judgeOnlyEl = document.getElementById('run_judge_only');
+var outputColumnEl = document.getElementById('run_output_column');
 if (judgeEl) judgeEl.addEventListener('change', updateRunForm);
 if (promptEl) promptEl.addEventListener('change', updateRunForm);
 if (datasetEl) datasetEl.addEventListener('change', updateRunForm);
+if (judgeOnlyEl) judgeOnlyEl.addEventListener('change', updateRunForm);
+if (outputColumnEl) outputColumnEl.addEventListener('input', updateRunForm);
 document.querySelectorAll('input[name="run[metric_ids][]"]').forEach(function(cb) {
   cb.addEventListener('change', updateRunForm);
 });
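
`updateRunForm()` now has two validity paths: one for judge-only runs and one for prompt runs. The Ruby restatement below is a sketch for reasoning about, or testing, those rules outside the browser. `run_form_valid?` is a hypothetical helper. The code that computes `missingVars` sits outside this diff, so the prompt-variable check here assumes it means "prompt variables not found in the dataset headers".

```ruby
# Hypothetical restatement of the submit-enable rules in updateRunForm().
# dataset_headers is nil when no dataset is selected.
def run_form_valid?(judge_only:, prompt_id:, dataset_headers:, output_column:,
                    judge_model:, metric_count:, prompt_variables: [])
  valid =
    if judge_only
      # JS: (outputColumnEl.value || 'actual_output').trim()
      column = (output_column.to_s.empty? ? "actual_output" : output_column.to_s).strip
      !dataset_headers.nil? && !column.empty? && dataset_headers.include?(column)
    else
      has_prompt = !prompt_id.to_s.empty?
      missing = dataset_headers ? prompt_variables - dataset_headers : []
      has_prompt && (prompt_variables.empty? || !dataset_headers.nil?) && missing.empty?
    end

  # A judge model and at least one metric must be chosen together, or neither.
  judge_set = !judge_model.to_s.empty?
  valid && (judge_set == metric_count.positive?)
end
```
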

data/app/views/completion_kit/runs/_row.html.erb CHANGED

@@ -6,8 +6,12 @@
         <strong><%= run.name %></strong>
       </span>
       <div class="ck-runs-table__config">
-        <%= link_to run.prompt.name, prompt_path(run.prompt), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>
-        <span class="ck-runs-table__version">v<%= run.prompt.version_number %></span>
+        <% if run.prompt %>
+          <%= link_to run.prompt.name, prompt_path(run.prompt), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>
+          <span class="ck-runs-table__version">v<%= run.prompt.version_number %></span>
+        <% else %>
+          <span class="ck-runs-table__version">Judge-only</span>
+        <% end %>
         <% if run.dataset %>
           <span class="ck-runs-table__sep">·</span>
           <%= link_to run.dataset.name, dataset_path(run.dataset), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>

data/app/views/completion_kit/runs/_status_header.html.erb CHANGED

@@ -19,7 +19,11 @@
         <span class="ck-status-badge__label"><%= run.status.upcase %></span>
       </span>
       <h1 class="ck-title"><%= run.name %></h1>
-      <p class="ck-meta-copy"><%= link_to run.prompt.display_name, prompt_path(run.prompt), class: "ck-link" %>&ensp;<span class="ck-chip" style="text-transform: none;"><%= run.prompt.llm_model %></span></p>
+      <% if run.prompt %>
+        <p class="ck-meta-copy"><%= link_to run.prompt.display_name, prompt_path(run.prompt), class: "ck-link" %>&ensp;<span class="ck-chip" style="text-transform: none;"><%= run.prompt.llm_model %></span></p>
+      <% else %>
+        <p class="ck-meta-copy">Judge-only run — grading column <code><%= run.output_column.presence || "actual_output" %></code><% if run.dataset %> on <%= link_to run.dataset.name, dataset_path(run.dataset), class: "ck-link" %><% end %></p>
+      <% end %>
     </div>
     <%= render "completion_kit/runs/actions", run: run %>
   </section>

data/app/views/completion_kit/runs/edit.html.erb CHANGED

@@ -1,6 +1,10 @@
 <ol class="ck-breadcrumb">
-  <li><%= link_to "Prompts", prompts_path %></li>
-  <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
+  <% if @run.prompt %>
+    <li><%= link_to "Prompts", prompts_path %></li>
+    <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
+  <% else %>
+    <li><%= link_to "Runs", runs_path %></li>
+  <% end %>
   <li><%= link_to @run.name, run_path(@run) %></li>
   <li>Edit</li>
 </ol>

data/app/views/completion_kit/runs/show.html.erb CHANGED

@@ -59,24 +59,33 @@
   </div>
 </div>
-<div class="ck-prompt-preview">
-  <div class="ck-prompt-preview__header">
-    <p class="ck-kicker">Prompt</p>
-    <% latest_suggestion = @run.suggestions.order(created_at: :desc).first %>
-    <% if latest_suggestion %>
-      <%= link_to "View suggestion", suggestion_path(latest_suggestion, from: "run"), class: ck_button_classes(:light, variant: :outline) + " ck-button--sm" %>
-    <% elsif @run.status == "completed" && @run.responses.joins(:reviews).exists? %>
-      <%= button_to suggest_run_path(@run), method: :post, class: ck_button_classes(:light, variant: :outline) + " ck-button--sm", form_class: "inline-block" do %>
-        <%= heroicon_tag "sparkles", variant: :outline, class: "ck-magic-icon", "aria-hidden": "true" %>
-        Suggest improvements
+<% if @run.prompt %>
+  <div class="ck-prompt-preview">
+    <div class="ck-prompt-preview__header">
+      <p class="ck-kicker">Prompt</p>
+      <% latest_suggestion = @run.suggestions.order(created_at: :desc).first %>
+      <% if latest_suggestion %>
+        <%= link_to "View suggestion", suggestion_path(latest_suggestion, from: "run"), class: ck_button_classes(:light, variant: :outline) + " ck-button--sm" %>
+      <% elsif @run.status == "completed" && @run.responses.joins(:reviews).exists? %>
+        <%= button_to suggest_run_path(@run), method: :post, class: ck_button_classes(:light, variant: :outline) + " ck-button--sm", form_class: "inline-block" do %>
+          <%= heroicon_tag "sparkles", variant: :outline, class: "ck-magic-icon", "aria-hidden": "true" %>
+          Suggest improvements
+        <% end %>
       <% end %>
+    </div>
+    <p class="ck-prompt-preview__text" id="prompt_text"><%= @run.prompt.template %></p>
+    <% if @run.prompt.template.length > 200 %>
+      <button type="button" class="ck-disclosure-toggle" id="prompt_toggle" aria-expanded="false" aria-controls="prompt_text" onclick="var t=document.getElementById('prompt_text');var l=this;var expanded=t.classList.toggle('ck-prompt-preview__text--expanded');l.firstChild.textContent=expanded?'Show less':'Show more';l.setAttribute('aria-expanded',expanded?'true':'false')"><span>Show more</span></button>
     <% end %>
   </div>
-  <p class="ck-prompt-preview__text" id="prompt_text"><%= @run.prompt.template %></p>
-  <% if @run.prompt.template.length > 200 %>
-    <button type="button" class="ck-disclosure-toggle" id="prompt_toggle" aria-expanded="false" aria-controls="prompt_text" onclick="var t=document.getElementById('prompt_text');var l=this;var expanded=t.classList.toggle('ck-prompt-preview__text--expanded');l.firstChild.textContent=expanded?'Show less':'Show more';l.setAttribute('aria-expanded',expanded?'true':'false')"><span>Show more</span></button>
-  <% end %>
-</div>
+<% else %>
+  <div class="ck-prompt-preview">
+    <div class="ck-prompt-preview__header">
+      <p class="ck-kicker">Output source</p>
+    </div>
+    <p class="ck-prompt-preview__text">Dataset column <code><%= @run.output_column.presence || "actual_output" %></code> — no prompt generated these outputs.</p>
+  </div>
+<% end %>
 <% if @run.dataset %>
   <dialog id="dataset-preview-<%= @run.id %>" class="ck-modal" onclick="if(event.target===this)this.close()">

data/db/migrate/20260514000001_allow_judge_only_runs.rb ADDED

@@ -0,0 +1,6 @@
+class AllowJudgeOnlyRuns < ActiveRecord::Migration[8.1]
+  def change
+    change_column_null :completion_kit_runs, :prompt_id, true
+    add_column :completion_kit_runs, :output_column, :string
+  end
+end

data/lib/completion_kit/version.rb CHANGED

@@ -1,3 +1,3 @@
 module CompletionKit
-  VERSION = "0.5.10"
+  VERSION = "0.5.11"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: completion-kit
 version: !ruby/object:Gem::Version
-  version: 0.5.10
+  version: 0.5.11
 platform: ruby
 authors:
 - Damien Bastin
@@ -381,6 +381,7 @@ files:
 - db/migrate/20260509000001_create_completion_kit_tags.rb
 - db/migrate/20260509000002_create_completion_kit_taggings.rb
 - db/migrate/20260513000001_create_completion_kit_mcp_sessions.rb
+- db/migrate/20260514000001_allow_judge_only_runs.rb
 - lib/completion-kit.rb
 - lib/completion_kit.rb
 - lib/completion_kit/concurrency_check.rb