completion-kit 0.5.10 → 0.5.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 64a36635f11bc5109afed180697b763a1c8231347b4912aa452271fd19694a45
-  data.tar.gz: 366ef22faa5914db74402b7ae1c9769f50f14ae8204e2144a245b8d29f89b464
+  metadata.gz: ed531ae29162bb91d2c463c3ff4eb20b5da469b9b7a21baddf5054a0ccc15041
+  data.tar.gz: b86aea95b2e1cf73abf6514093565dc07b12dc0f4fe5c5c5c8b80db3fbdfa83d
 SHA512:
-  metadata.gz: f84d7a7e0bcd5a3a054e84231cb390026dd07ac469b34ff8bb8a10a6f9f77fbbcb53748b4814d7b86aa94d1e67f41f3ad1e9c8eb1907f0ac56388169765174d6
-  data.tar.gz: ab627c627f3c88d1ef4a8f14afa0d72d7a92acc83b1107a4ebfeb25dacaf7a04282dfe3cfa5657d3c474171e508dec9944db95da526cbe8b3d94bbd0a9ef32a7
+  metadata.gz: 04ae500020e71d52c41073c36a6741bc47b94a06ceec6548720d6022a60ce7422be8a354d627c39a7be8174af2ce65219041c5d99ad175157c7bf4b4eaf8f056
+  data.tar.gz: 261daeeb1555b3aecb8e2e18edb7f14ebdc37f974c2713ecbe12c43a281e8109edc45222be0f6039c325a0428e89c1c11a1a7104f0a36ace2b618fb2ef1cb7e8
data/README.md CHANGED
@@ -14,21 +14,23 @@ Run every prompt against real data. Score each output with an LLM judge against
14
14
 
15
15
  It's the difference between "this prompt seems to work" and "this prompt scores 4.3 out of 5 across 200 inputs, up from 3.8 last version."
16
16
 
17
- **[completionkit.com](https://completionkit.com)** | **[RubyGems](https://rubygems.org/gems/completion-kit)**
17
+ **[Start on completionkit.com](https://completionkit.com)** | **[RubyGems](https://rubygems.org/gems/completion-kit)**
18
18
 
19
- > **CompletionKit Cloud**: hosted, managed CompletionKit with zero setup. Same engine, run for you. See plans at [completionkit.com/pricing](https://completionkit.com/pricing).
19
+ > **Just want to use it?** [CompletionKit Cloud](https://completionkit.com) is the same engine, fully hosted: zero install, no Rails ops, plans at [completionkit.com/pricing](https://completionkit.com/pricing).
20
20
 
21
21
  ![Test run with scored results](https://raw.githubusercontent.com/homemade-software-inc/completion-kit/main/docs/screenshots/test-run.png)
22
22
 
23
- ## Quick Start
23
+ ## Three ways to run it
24
24
 
25
- ### Use CompletionKit Cloud
25
+ Same engine, same UI, same REST API and MCP server — pick the deployment that fits.
26
26
 
27
- The fastest way to start: no install, no servers to run. Sign up at [completionkit.com](https://completionkit.com) and you get the same engine you'd self-host, hosted for you. Best fit if you want to skip the Rails ops.
27
+ ### 1. Hosted — [completionkit.com](https://completionkit.com) (recommended)
28
28
 
29
- ### Or run the standalone app
29
+ The fastest path. Sign up and you're running on the same engine you'd self-host, without touching a Rails app. No `db:migrate`, no Puma, no Solid Queue, no provider key management — multi-tenant workspaces, your team logs in, you go. Plans at [completionkit.com/pricing](https://completionkit.com/pricing).
30
30
 
31
- Self-host the same engine. No existing Rails app needed.
31
+ ### 2. Self-hosted: the bundled standalone Rails app
32
+
33
+ Run it on your own infra. No existing Rails app required; Postgres + any Rails-friendly host (Fly, Render, Heroku, Docker, …).
32
34
 
33
35
  ```bash
34
36
  git clone https://github.com/homemade-software-inc/completion-kit.git
@@ -38,7 +40,7 @@ bin/rails completion_kit:install:migrations
38
40
  bin/rails db:migrate
39
41
  ```
40
42
 
41
- Then run **both** processes — a web server and a Solid Queue worker. In two terminals:
43
+ Run **both** a web server and a Solid Queue worker. In two terminals:
42
44
 
43
45
  ```bash
44
46
  bin/rails server
@@ -50,9 +52,9 @@ bin/jobs
50
52
 
51
53
  Or with [foreman](https://github.com/ddollar/foreman) in one terminal: `foreman start -f Procfile.dev`.
52
54
 
53
- Visit `http://localhost:3000`. Add a provider credential (Settings), create a prompt, upload a CSV dataset, and run it.
55
+ Visit `http://localhost:3000`. Add a provider credential (Settings), create a prompt, upload a CSV dataset, and run it. See [Deploying self-hosted](#deploying-self-hosted) for the production-env setup.
54
56
 
55
- ### Or mount as an engine in your existing Rails app
57
+ ### 3. Rails engine: mount into your existing Rails app
56
58
 
57
59
  ```ruby
58
60
  gem "completion-kit"
@@ -63,11 +65,9 @@ bin/rails generate completion_kit:install
63
65
  bin/rails db:migrate
64
66
  ```
65
67
 
66
- The engine mounts at `/completion_kit` in your app. CompletionKit's generate and judge flows enqueue Active Job jobs (`CompletionKit::GenerateRowJob`, `CompletionKit::JudgeReviewJob`, `CompletionKit::RunCompletionCheckJob`), so your host app needs an Active Job adapter that actually processes them — Solid Queue, Sidekiq, GoodJob, etc. The `:async` adapter is **not** suitable for production: it runs jobs in the web Puma's thread pool with no durability and no retry, and a long LLM call will block request handling.
67
-
68
- ### Host-app layout integration
68
+ The engine mounts at `/completion_kit`. Generate / judge flows enqueue Active Job jobs (`CompletionKit::GenerateRowJob`, `CompletionKit::JudgeReviewJob`, `CompletionKit::RunCompletionCheckJob`), so your host app needs an Active Job adapter that actually processes them — Solid Queue, Sidekiq, GoodJob, etc. The `:async` adapter is **not** suitable for production: it runs jobs in the web Puma's thread pool with no durability and no retry, and a long LLM call will block request handling.
69
69
 
70
- If your host app overrides the engine layout (e.g. `layout "application"` on engine controllers, or rendering engine views inside your own shell), include both the engine's stylesheet and JavaScript in that layout:
70
+ **Host-app layout integration.** If your host app overrides the engine layout (e.g. `layout "application"` on engine controllers, or rendering engine views inside your own shell), include both the engine's stylesheet and JavaScript in that layout:
71
71
 
72
72
  ```erb
73
73
  <%= stylesheet_link_tag "completion_kit/application", media: "all" %>
@@ -183,7 +183,7 @@ CompletionKit runs a [Model Context Protocol](https://modelcontextprotocol.io) s
183
183
 
184
184
  The in-app API reference page has install snippets you can copy straight into your MCP client config.
185
185
 
186
- ## Deploying the standalone app
186
+ ## Deploying self-hosted
187
187
 
188
188
  Any Rails-friendly host works (Fly, Heroku, Render, Docker, etc.). Point it at a Postgres instance via `DATABASE_URL`, set your provider env vars, and run `cd standalone && bin/rails db:migrate` on each deploy.
189
189
 
@@ -1537,6 +1537,9 @@ tr:hover .ck-chip--publish {
1537
1537
  display: grid;
1538
1538
  gap: 0.4rem;
1539
1539
  }
1540
+ .ck-field[hidden] {
1541
+ display: none;
1542
+ }
1540
1543
 
1541
1544
  .ck-field--spacious {
1542
1545
  margin-top: 0.3rem;
@@ -76,7 +76,7 @@ module CompletionKit
76
76
  end
77
77
 
78
78
  def run_params
79
- params.permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature,
79
+ params.permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, :output_column,
80
80
  metric_ids: [], tag_names: [])
81
81
  end
82
82
  end
@@ -84,6 +84,7 @@ module CompletionKit
84
84
  dataset_id: @run.dataset_id,
85
85
  judge_model: @run.judge_model,
86
86
  temperature: @run.temperature,
87
+ output_column: @run.output_column,
87
88
  tag_names: @run.tag_names,
88
89
  status: "pending"
89
90
  )
@@ -108,6 +109,11 @@ module CompletionKit
108
109
  end
109
110
 
110
111
  def suggest
112
+ if @run.prompt.nil?
113
+ redirect_to run_path(@run), alert: "Judge-only runs don't have a prompt to improve."
114
+ return
115
+ end
116
+
111
117
  service = PromptImprovementService.new(@run)
112
118
  result = service.suggest
113
119
  suggestion = @run.suggestions.create!(
@@ -159,13 +165,13 @@ module CompletionKit
159
165
  end
160
166
 
161
167
  def run_params
162
- params.require(:run).permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, metric_ids: [], tag_names: [])
168
+ params.require(:run).permit(:name, :prompt_id, :dataset_id, :judge_model, :temperature, :output_column, metric_ids: [], tag_names: [])
163
169
  end
164
170
 
165
171
  # Editing a run that already has results forks a new run — but only when a
166
172
  # field that affects generation or judging changed. Renaming or retagging is
167
173
  # pure metadata and updates the run in place.
168
- GENERATION_RUN_FIELDS = %i[prompt_id dataset_id judge_model temperature].freeze
174
+ GENERATION_RUN_FIELDS = %i[prompt_id dataset_id judge_model temperature output_column].freeze
169
175
 
170
176
  def run_generation_changed?
171
177
  GENERATION_RUN_FIELDS.each do |field|
@@ -54,7 +54,7 @@ module CompletionKit
54
54
  evaluation = judge.evaluate(
55
55
  response.response_text,
56
56
  response.expected_output,
57
- run.prompt.template,
57
+ run.prompt&.template,
58
58
  criteria: metric.instruction.to_s,
59
59
  rubric_text: metric.display_rubric_text,
60
60
  input_data: response.input_data
@@ -5,7 +5,7 @@ module CompletionKit
 
     STATUSES = %w[pending running completed failed].freeze
 
-    belongs_to :prompt
+    belongs_to :prompt, optional: true
     belongs_to :dataset, optional: true
     has_many :responses, dependent: :destroy
     has_many :run_metrics, -> { order(:position) }, dependent: :destroy
@@ -15,10 +15,18 @@ module CompletionKit
     validates :name, presence: true
     validates :status, inclusion: { in: STATUSES }
     validate :dataset_supplies_prompt_variables
+    validate :judge_only_run_supplies_output_column
 
     before_validation :set_default_status, on: :create
     before_validation :set_auto_name, on: :create
 
+    # A judge-only run grades a pre-existing column on the dataset instead of
+    # generating new outputs. No prompt is attached; the response text is read
+    # from row[output_column]; no LLM generation happens.
+    def judge_only?
+      prompt.nil?
+    end
+
     def missing_dataset_variables
       return [] unless prompt
       vars = prompt.variables
@@ -89,9 +97,14 @@ module CompletionKit
 
       return fail_with_summary!("Dataset has no rows") if rows.empty?
 
-      client = LlmClient.for_model(prompt.llm_model, ApiConfig.for_model(prompt.llm_model))
-      unless client.configured?
-        return fail_with_summary!("LLM API not configured: #{client.configuration_errors.join(', ')}")
+      if judge_only?
+        column = output_column.presence || "actual_output"
+        return fail_with_summary!("Dataset has no \"#{column}\" column") unless dataset && dataset.headers.include?(column)
+      else
+        client = LlmClient.for_model(prompt.llm_model, ApiConfig.for_model(prompt.llm_model))
+        unless client.configured?
+          return fail_with_summary!("LLM API not configured: #{client.configuration_errors.join(', ')}")
+        end
       end
 
       transaction do
@@ -105,14 +118,27 @@ module CompletionKit
         )
         rows.each_with_index do |row, index|
           input = row.empty? ? nil : row.to_json
-          response = responses.create!(
+          attrs = {
             status: "pending",
             row_index: index,
             input_data: input,
             expected_output: row["expected_output"]
-          )
-          GenerateRowJob.perform_later(id, response.id)
+          }
+          if judge_only?
+            attrs[:status] = "succeeded"
+            attrs[:response_text] = row[output_column.presence || "actual_output"].to_s
+          end
+
+          response = responses.create!(attrs)
+
+          if judge_only?
+            metrics.each { |m| JudgeReviewJob.perform_later(response.id, m.id) } if judge_configured?
+          else
+            GenerateRowJob.perform_later(id, response.id)
+          end
         end
+
+        RunCompletionCheckJob.perform_later(id) if judge_only?
       end
 
       broadcast_ui
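
The per-row branching in the hunk above can be sketched in plain Ruby, without Rails. This is only an illustration under stated assumptions: `response_attrs` and `DEFAULT_OUTPUT_COLUMN` are hypothetical names that do not exist in the gem, and the blank check is a rough stand-in for ActiveSupport's `presence`.

```ruby
require "json"

# Hypothetical constant; the diff uses the literal "actual_output" inline.
DEFAULT_OUTPUT_COLUMN = "actual_output"

# Hypothetical helper (not part of the gem): builds the attributes for one
# response row the way the hunk above does inline.
def response_attrs(row, index, judge_only:, output_column: nil)
  attrs = {
    status: "pending",
    row_index: index,
    input_data: row.empty? ? nil : JSON.generate(row),
    expected_output: row["expected_output"]
  }

  if judge_only
    # Rough equivalent of `output_column.presence || "actual_output"`.
    column = output_column.to_s.strip.empty? ? DEFAULT_OUTPUT_COLUMN : output_column
    # No generation step: the row is already "succeeded" and its text comes
    # straight from the dataset column.
    attrs[:status] = "succeeded"
    attrs[:response_text] = row[column].to_s
  end

  attrs
end

row = { "question" => "2 + 2?", "expected_output" => "4", "actual_output" => "four" }
generated = response_attrs(row, 0, judge_only: false)
graded    = response_attrs(row, 0, judge_only: true)
```

In the generation path the row stays `pending` until a generate job fills it in; in the judge-only path the row is complete on creation, which is why the diff enqueues the judge jobs directly.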
@@ -168,6 +194,7 @@ module CompletionKit
       {
         id: id, name: name, status: status, prompt_id: prompt_id,
         dataset_id: dataset_id, judge_model: judge_model, temperature: temperature,
+        output_column: output_column,
         created_at: created_at, updated_at: updated_at,
         responses_count: responses.count, avg_score: avg_score,
         progress_current: snap[:generated_done],
@@ -274,10 +301,14 @@ module CompletionKit
 
     def set_auto_name
       return if name.present?
-      return unless prompt.present?
 
-      count = Run.where(prompt_id: prompt_id).count + 1
-      self.name = "#{prompt.name} — v#{prompt.version_number} ##{count}"
+      if prompt.present?
+        count = Run.where(prompt_id: prompt_id).count + 1
+        self.name = "#{prompt.name} — v#{prompt.version_number} ##{count}"
+      elsif dataset.present?
+        count = Run.where(prompt_id: nil, dataset_id: dataset.id).count + 1
+        self.name = "#{dataset.name} — judge-only ##{count}"
+      end
     end
 
     def dataset_supplies_prompt_variables
@@ -290,5 +321,19 @@ module CompletionKit
         errors.add(:dataset_id, "is missing columns required by the prompt: #{missing.join(', ')}")
       end
     end
+
+    def judge_only_run_supplies_output_column
+      return if prompt.present?
+
+      if dataset.nil?
+        errors.add(:dataset_id, "is required for a judge-only run (no prompt)")
+        return
+      end
+
+      column = output_column.presence || "actual_output"
+      unless dataset.headers.include?(column)
+        errors.add(:output_column, "\"#{column}\" is not a column on dataset \"#{dataset.name}\"")
+      end
+    end
   end
 end
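
The new `judge_only_run_supplies_output_column` validation boils down to three outcomes, sketched here as a standalone function. This is a minimal sketch, not the gem's code: `SketchDataset` and `judge_only_errors` are hypothetical names, and errors are returned as a plain hash instead of being added to an Active Record `errors` object.

```ruby
# Hypothetical stand-in for the dataset record; only `name` and `headers`
# are needed to mirror the validation.
SketchDataset = Struct.new(:name, :headers)

# Returns a hash of field => message, empty when the run would be valid
# as far as this one validation is concerned.
def judge_only_errors(prompt:, dataset:, output_column: nil)
  # Runs with a prompt are not judge-only, so this check does not apply.
  return {} if prompt

  # A judge-only run has nowhere to read outputs from without a dataset.
  return { dataset_id: "is required for a judge-only run (no prompt)" } if dataset.nil?

  # Rough equivalent of `output_column.presence || "actual_output"`.
  column = output_column.to_s.strip.empty? ? "actual_output" : output_column
  return {} if dataset.headers.include?(column)

  { output_column: "\"#{column}\" is not a column on dataset \"#{dataset.name}\"" }
end

dataset = SketchDataset.new("support-tickets", ["question", "model_answer"])

missing_default = judge_only_errors(prompt: nil, dataset: dataset)
explicit_column = judge_only_errors(prompt: nil, dataset: dataset, output_column: "model_answer")
```

Here `missing_default` carries an `:output_column` error because the dataset has no `actual_output` header, while `explicit_column` is empty.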
@@ -15,16 +15,17 @@ module CompletionKit
15
15
  handler: :get
16
16
  },
17
17
  "runs_create" => {
18
- description: "Create a run",
18
+ description: "Create a run. Omit prompt_id and provide output_column for a judge-only run that grades a pre-existing dataset column instead of generating new outputs.",
19
19
  inputSchema: {
20
20
  type: "object",
21
21
  properties: {
22
22
  name: {type: "string"}, prompt_id: {type: "integer"},
23
23
  dataset_id: {type: "integer"}, judge_model: {type: "string"},
24
+ output_column: {type: "string", description: "Dataset column to grade when prompt_id is omitted; defaults to \"actual_output\"."},
24
25
  metric_ids: {type: "array", items: {type: "integer"}},
25
26
  tag_names: {type: "array", items: {type: "string"}}
26
27
  },
27
- required: ["name", "prompt_id"]
28
+ required: ["name"]
28
29
  },
29
30
  handler: :create
30
31
  },
@@ -35,6 +36,7 @@ module CompletionKit
35
36
  properties: {
36
37
  id: {type: "integer"}, name: {type: "string"},
37
38
  dataset_id: {type: "integer"}, judge_model: {type: "string"},
39
+ output_column: {type: "string"},
38
40
  metric_ids: {type: "array", items: {type: "integer"}},
39
41
  tag_names: {type: "array", items: {type: "string"}}
40
42
  },
@@ -63,7 +65,7 @@ module CompletionKit
63
65
  end
64
66
 
65
67
  def self.create(args)
66
- run = Run.new(args.slice("name", "prompt_id", "dataset_id", "judge_model"))
68
+ run = Run.new(args.slice("name", "prompt_id", "dataset_id", "judge_model", "output_column"))
67
69
  if run.save
68
70
  run.replace_metrics!(args["metric_ids"])
69
71
  run.update!(tag_names: args["tag_names"]) if args.key?("tag_names")
@@ -75,7 +77,7 @@ module CompletionKit
75
77
 
76
78
  def self.update(args)
77
79
  run = Run.find(args["id"])
78
- if run.update(args.except("id", "metric_ids", "tag_names").slice("name", "dataset_id", "judge_model"))
80
+ if run.update(args.except("id", "metric_ids", "tag_names").slice("name", "dataset_id", "judge_model", "output_column"))
79
81
  run.replace_metrics!(args["metric_ids"]) if args.key?("metric_ids")
80
82
  run.update!(tag_names: args["tag_names"]) if args.key?("tag_names")
81
83
  text_result(run.reload.as_json)
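
As an illustration of the relaxed `runs_create` schema, the sketch below checks a judge-only argument hash against `required: ["name"]` and applies the same `slice` the handler uses. The argument values (run name, dataset id, judge model name, column name, metric ids) are made up for the example, and `REQUIRED_ARGS` / `CREATE_KEYS` are hypothetical constants, not names from the gem.

```ruby
# Mirrors `required: ["name"]` from the updated inputSchema.
REQUIRED_ARGS = ["name"].freeze

# Mirrors the keys passed to `Run.new` via `args.slice(...)` in the diff.
CREATE_KEYS = %w[name prompt_id dataset_id judge_model output_column].freeze

# Example tool arguments for a judge-only run: no prompt_id, an explicit
# output_column. All values here are placeholders.
args = {
  "name" => "Judge existing answers",
  "dataset_id" => 7,
  "judge_model" => "example-judge-model",
  "output_column" => "model_answer",
  "metric_ids" => [1, 2]
}

missing   = REQUIRED_ARGS - args.keys
run_attrs = args.slice(*CREATE_KEYS)
```

`metric_ids` is deliberately absent from `run_attrs`: in the handler it is applied afterwards through `replace_metrics!`, not through `Run.new`.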
@@ -121,7 +121,7 @@
121
121
  <div class="ck-api-endpoint">
122
122
  <p class="ck-api-method"><span class="ck-chip ck-chip--soft">POST</span> /api/v1/runs</p>
123
123
  <p class="ck-meta-copy">Create a new run.</p>
124
- <p class="ck-api-params"><strong>Required:</strong>&ensp;<code>prompt_id</code>&emsp;<strong>Optional:</strong>&ensp;<code>name</code>, <code>dataset_id</code>, <code>metric_ids</code>, <code>judge_model</code></p>
124
+ <p class="ck-api-params"><strong>Optional:</strong>&ensp;<code>name</code>, <code>prompt_id</code>, <code>dataset_id</code>, <code>metric_ids</code>, <code>judge_model</code>, <code>output_column</code> (judge-only: omit <code>prompt_id</code> and grade a dataset column instead, default <code>actual_output</code>)</p>
125
125
  <%= render "completion_kit/api_reference/example", base_url: base_url, token: token, real_token: real_token, cmd: "curl -X POST #{base_url}/api/v1/runs \\\n -H \"Authorization: Bearer #{token}\" \\\n -H \"Content-Type: application/json\" \\\n -d '{\"prompt_id\": 1, \"dataset_id\": 1, \"metric_ids\": [1, 2]}'" %>
126
126
  </div>
127
127
  <div class="ck-api-endpoint">
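
The example in the reference page still shows a prompt-based request. A judge-only request to the same endpoint might look like the sketch below, which only builds the request and does not send it. The base URL, token, ids, and judge model name are placeholders, and `judge_only_run_request` is a hypothetical helper; the body is sent as top-level JSON keys, matching the existing curl example.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a POST /api/v1/runs
# request for a judge-only run, i.e. one without prompt_id.
def judge_only_run_request(base_url, token, dataset_id:, metric_ids:, judge_model:, output_column: nil)
  uri = URI("#{base_url}/api/v1/runs")
  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"

  body = { dataset_id: dataset_id, metric_ids: metric_ids, judge_model: judge_model }
  # Omitting output_column lets the server fall back to "actual_output".
  body[:output_column] = output_column if output_column
  request.body = JSON.generate(body)
  request
end

request = judge_only_run_request(
  "https://example.test", "TOKEN",
  dataset_id: 1, metric_ids: [1, 2],
  judge_model: "example-judge-model", output_column: "model_answer"
)
payload = JSON.parse(request.body)
```

To actually send it, something like `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }` would work against a real deployment.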
@@ -1,6 +1,10 @@
1
1
  <ol class="ck-breadcrumb">
2
- <li><%= link_to "Prompts", prompts_path %></li>
3
- <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
2
+ <% if @run.prompt %>
3
+ <li><%= link_to "Prompts", prompts_path %></li>
4
+ <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
5
+ <% else %>
6
+ <li><%= link_to "Runs", runs_path %></li>
7
+ <% end %>
4
8
  <li><%= link_to @run.name, run_path(@run) %></li>
5
9
  <li>Response #<%= @response_number %></li>
6
10
  </ol>
@@ -30,20 +34,29 @@
30
34
  <span class="ck-run-config__key">Run</span>
31
35
  <%= link_to @run.name, run_path(@run), class: "ck-link" %>
32
36
  </div>
33
- <div class="ck-run-config__row">
34
- <span class="ck-run-config__key">Prompt</span>
35
- <%= link_to @run.prompt.display_name, prompt_path(@run.prompt), class: "ck-link" %>
36
- </div>
37
+ <% if @run.prompt %>
38
+ <div class="ck-run-config__row">
39
+ <span class="ck-run-config__key">Prompt</span>
40
+ <%= link_to @run.prompt.display_name, prompt_path(@run.prompt), class: "ck-link" %>
41
+ </div>
42
+ <% else %>
43
+ <div class="ck-run-config__row">
44
+ <span class="ck-run-config__key">Output</span>
45
+ <span>Dataset column <code><%= @run.output_column.presence || "actual_output" %></code></span>
46
+ </div>
47
+ <% end %>
37
48
  <% if @run.dataset %>
38
49
  <div class="ck-run-config__row">
39
50
  <span class="ck-run-config__key">Dataset</span>
40
51
  <%= link_to @run.dataset.name, dataset_path(@run.dataset), class: "ck-link" %>
41
52
  </div>
42
53
  <% end %>
43
- <div class="ck-run-config__row">
44
- <span class="ck-run-config__key">Model</span>
45
- <span style="text-transform: none;"><%= @run.prompt.llm_model %></span>
46
- </div>
54
+ <% if @run.prompt %>
55
+ <div class="ck-run-config__row">
56
+ <span class="ck-run-config__key">Model</span>
57
+ <span style="text-transform: none;"><%= @run.prompt.llm_model %></span>
58
+ </div>
59
+ <% end %>
47
60
  <% if @run.judge_model.present? %>
48
61
  <div class="ck-run-config__row">
49
62
  <span class="ck-run-config__key">Judge</span>
@@ -60,7 +73,9 @@
60
73
  <section class="ck-card--spaced">
61
74
  <div class="ck-prompt-preview__header">
62
75
  <p class="ck-kicker">Response</p>
63
- <span class="ck-chip ck-chip--soft" style="text-transform: none;"><%= @run.prompt.llm_model %></span>
76
+ <% if @run.prompt %>
77
+ <span class="ck-chip ck-chip--soft" style="text-transform: none;"><%= @run.prompt.llm_model %></span>
78
+ <% end %>
64
79
  </div>
65
80
  <pre class="ck-code"><%= @response.response_text %></pre>
66
81
  </section>
@@ -17,6 +17,17 @@
17
17
  </div>
18
18
 
19
19
  <div class="ck-field">
20
+ <label class="ck-checkbox-label">
21
+ <%= check_box_tag "run[judge_only]", "1", run.persisted? && run.judge_only?, id: "run_judge_only", class: "ck-checkbox" %>
22
+ <span class="ck-checkbox-label__box" aria-hidden="true"></span>
23
+ <span class="ck-checkbox-label__body">
24
+ <span class="ck-checkbox-label__text">Judge-only run</span>
25
+ <span class="ck-checkbox-label__hint">Grade an existing column on the dataset instead of running a prompt. Roughly half the LLM calls per row.</span>
26
+ </span>
27
+ </label>
28
+ </div>
29
+
30
+ <div class="ck-field" id="prompt-field">
20
31
  <%= form.label :prompt_id, "Prompt", class: "ck-label" %>
21
32
  <%= form.select :prompt_id,
22
33
  @prompts.map { |p|
@@ -43,6 +54,12 @@
43
54
  </div>
44
55
  </div>
45
56
 
57
+ <div class="ck-field" id="output-column-field" hidden>
58
+ <%= form.label :output_column, "Output column", class: "ck-label" %>
59
+ <%= form.text_field :output_column, value: run.output_column.presence || "actual_output", class: "ck-input", id: "run_output_column", placeholder: "actual_output" %>
60
+ <p class="ck-field-hint">Name of the dataset column whose value will be graded as the response. Defaults to <code>actual_output</code>.</p>
61
+ </div>
62
+
46
63
  <div class="ck-field" id="dataset-field">
47
64
  <%= form.label :dataset_id, "Dataset", class: "ck-label" %>
48
65
  <% if @datasets.empty? %>
@@ -157,6 +174,15 @@
157
174
  function updateRunForm() {
158
175
  var promptEl = document.getElementById('run_prompt_id');
159
176
  var judgeEl = document.getElementById('run_judge_model');
177
+ var judgeOnlyEl = document.getElementById('run_judge_only');
178
+ var judgeOnly = !!(judgeOnlyEl && judgeOnlyEl.checked);
179
+ var promptField = document.getElementById('prompt-field');
180
+ var outputColumnField = document.getElementById('output-column-field');
181
+ var outputColumnEl = document.getElementById('run_output_column');
182
+ if (promptField) promptField.hidden = judgeOnly;
183
+ if (outputColumnField) outputColumnField.hidden = !judgeOnly;
184
+ if (judgeOnly && promptEl) promptEl.value = '';
185
+
160
186
  var prompt = promptEl ? promptEl.value : '';
161
187
  var judge = judgeEl ? judgeEl.value : '';
162
188
  var metrics = document.querySelectorAll('input[name="run[metric_ids][]"]:checked');
@@ -222,11 +248,28 @@ function updateRunForm() {
222
248
  }
223
249
  }
224
250
 
225
- var valid = prompt !== '';
251
+ var valid;
252
+ if (judgeOnly) {
253
+ valid = !!dataset;
254
+ if (dataset && datasetEl && outputColumnEl) {
255
+ var headersJudge = (datasetEl.options[datasetEl.selectedIndex] && datasetEl.options[datasetEl.selectedIndex].dataset.headers ? datasetEl.options[datasetEl.selectedIndex].dataset.headers.split(/,\s*/) : []).filter(Boolean);
256
+ var col = (outputColumnEl.value || 'actual_output').trim();
257
+ if (col === '' || headersJudge.indexOf(col) === -1) {
258
+ valid = false;
259
+ if (datasetField) datasetField.className = 'ck-field ck-field--error';
260
+ if (datasetHint) datasetHint.textContent = "Dataset has no \"" + col + "\" column — pick a different output column or dataset.";
261
+ }
262
+ } else if (!dataset) {
263
+ if (datasetField) datasetField.className = 'ck-field ck-field--info';
264
+ if (datasetHint) datasetHint.textContent = 'Judge-only runs need a dataset that supplies the output column.';
265
+ }
266
+ } else {
267
+ valid = prompt !== '';
268
+ if (hasVars && !dataset) valid = false;
269
+ if (missingVars.length > 0) valid = false;
270
+ }
226
271
  if (judge && metrics.length === 0) valid = false;
227
272
  if (!judge && metrics.length > 0) valid = false;
228
- if (hasVars && !dataset) valid = false;
229
- if (missingVars.length > 0) valid = false;
230
273
  if (submitBtn) submitBtn.disabled = !valid;
231
274
 
232
275
  ckUpdateMetricGroupsState();
@@ -260,9 +303,13 @@ function ckUpdateMetricGroupsState() {
260
303
  var judgeEl = document.getElementById('run_judge_model');
261
304
  var promptEl = document.getElementById('run_prompt_id');
262
305
  var datasetEl = document.getElementById('run_dataset_id');
306
+ var judgeOnlyEl = document.getElementById('run_judge_only');
307
+ var outputColumnEl = document.getElementById('run_output_column');
263
308
  if (judgeEl) judgeEl.addEventListener('change', updateRunForm);
264
309
  if (promptEl) promptEl.addEventListener('change', updateRunForm);
265
310
  if (datasetEl) datasetEl.addEventListener('change', updateRunForm);
311
+ if (judgeOnlyEl) judgeOnlyEl.addEventListener('change', updateRunForm);
312
+ if (outputColumnEl) outputColumnEl.addEventListener('input', updateRunForm);
266
313
  document.querySelectorAll('input[name="run[metric_ids][]"]').forEach(function(cb) {
267
314
  cb.addEventListener('change', updateRunForm);
268
315
  });
@@ -6,8 +6,12 @@
6
6
  <strong><%= run.name %></strong>
7
7
  </span>
8
8
  <div class="ck-runs-table__config">
9
- <%= link_to run.prompt.name, prompt_path(run.prompt), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>
10
- <span class="ck-runs-table__version">v<%= run.prompt.version_number %></span>
9
+ <% if run.prompt %>
10
+ <%= link_to run.prompt.name, prompt_path(run.prompt), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>
11
+ <span class="ck-runs-table__version">v<%= run.prompt.version_number %></span>
12
+ <% else %>
13
+ <span class="ck-runs-table__version">Judge-only</span>
14
+ <% end %>
11
15
  <% if run.dataset %>
12
16
  <span class="ck-runs-table__sep">·</span>
13
17
  <%= link_to run.dataset.name, dataset_path(run.dataset), class: "ck-runs-table__config-link", onclick: "event.stopPropagation();" %>
@@ -19,7 +19,11 @@
19
19
  <span class="ck-status-badge__label"><%= run.status.upcase %></span>
20
20
  </span>
21
21
  <h1 class="ck-title"><%= run.name %></h1>
22
- <p class="ck-meta-copy"><%= link_to run.prompt.display_name, prompt_path(run.prompt), class: "ck-link" %>&ensp;<span class="ck-chip" style="text-transform: none;"><%= run.prompt.llm_model %></span></p>
22
+ <% if run.prompt %>
23
+ <p class="ck-meta-copy"><%= link_to run.prompt.display_name, prompt_path(run.prompt), class: "ck-link" %>&ensp;<span class="ck-chip" style="text-transform: none;"><%= run.prompt.llm_model %></span></p>
24
+ <% else %>
25
+ <p class="ck-meta-copy">Judge-only run — grading column <code><%= run.output_column.presence || "actual_output" %></code><% if run.dataset %> on <%= link_to run.dataset.name, dataset_path(run.dataset), class: "ck-link" %><% end %></p>
26
+ <% end %>
23
27
  </div>
24
28
  <%= render "completion_kit/runs/actions", run: run %>
25
29
  </section>
@@ -1,6 +1,10 @@
1
1
  <ol class="ck-breadcrumb">
2
- <li><%= link_to "Prompts", prompts_path %></li>
3
- <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
2
+ <% if @run.prompt %>
3
+ <li><%= link_to "Prompts", prompts_path %></li>
4
+ <li><%= link_to @run.prompt.name, prompt_path(@run.prompt) %></li>
5
+ <% else %>
6
+ <li><%= link_to "Runs", runs_path %></li>
7
+ <% end %>
4
8
  <li><%= link_to @run.name, run_path(@run) %></li>
5
9
  <li>Edit</li>
6
10
  </ol>
@@ -59,24 +59,33 @@
59
59
  </div>
60
60
  </div>
61
61
 
62
- <div class="ck-prompt-preview">
63
- <div class="ck-prompt-preview__header">
64
- <p class="ck-kicker">Prompt</p>
65
- <% latest_suggestion = @run.suggestions.order(created_at: :desc).first %>
66
- <% if latest_suggestion %>
67
- <%= link_to "View suggestion", suggestion_path(latest_suggestion, from: "run"), class: ck_button_classes(:light, variant: :outline) + " ck-button--sm" %>
68
- <% elsif @run.status == "completed" && @run.responses.joins(:reviews).exists? %>
69
- <%= button_to suggest_run_path(@run), method: :post, class: ck_button_classes(:light, variant: :outline) + " ck-button--sm", form_class: "inline-block" do %>
70
- <%= heroicon_tag "sparkles", variant: :outline, class: "ck-magic-icon", "aria-hidden": "true" %>
71
- Suggest improvements
62
+ <% if @run.prompt %>
63
+ <div class="ck-prompt-preview">
64
+ <div class="ck-prompt-preview__header">
65
+ <p class="ck-kicker">Prompt</p>
66
+ <% latest_suggestion = @run.suggestions.order(created_at: :desc).first %>
67
+ <% if latest_suggestion %>
68
+ <%= link_to "View suggestion", suggestion_path(latest_suggestion, from: "run"), class: ck_button_classes(:light, variant: :outline) + " ck-button--sm" %>
69
+ <% elsif @run.status == "completed" && @run.responses.joins(:reviews).exists? %>
70
+ <%= button_to suggest_run_path(@run), method: :post, class: ck_button_classes(:light, variant: :outline) + " ck-button--sm", form_class: "inline-block" do %>
71
+ <%= heroicon_tag "sparkles", variant: :outline, class: "ck-magic-icon", "aria-hidden": "true" %>
72
+ Suggest improvements
73
+ <% end %>
72
74
  <% end %>
75
+ </div>
76
+ <p class="ck-prompt-preview__text" id="prompt_text"><%= @run.prompt.template %></p>
77
+ <% if @run.prompt.template.length > 200 %>
78
+ <button type="button" class="ck-disclosure-toggle" id="prompt_toggle" aria-expanded="false" aria-controls="prompt_text" onclick="var t=document.getElementById('prompt_text');var l=this;var expanded=t.classList.toggle('ck-prompt-preview__text--expanded');l.firstChild.textContent=expanded?'Show less':'Show more';l.setAttribute('aria-expanded',expanded?'true':'false')"><span>Show more</span></button>
73
79
  <% end %>
74
80
  </div>
75
- <p class="ck-prompt-preview__text" id="prompt_text"><%= @run.prompt.template %></p>
76
- <% if @run.prompt.template.length > 200 %>
77
- <button type="button" class="ck-disclosure-toggle" id="prompt_toggle" aria-expanded="false" aria-controls="prompt_text" onclick="var t=document.getElementById('prompt_text');var l=this;var expanded=t.classList.toggle('ck-prompt-preview__text--expanded');l.firstChild.textContent=expanded?'Show less':'Show more';l.setAttribute('aria-expanded',expanded?'true':'false')"><span>Show more</span></button>
78
- <% end %>
79
- </div>
81
+ <% else %>
82
+ <div class="ck-prompt-preview">
83
+ <div class="ck-prompt-preview__header">
84
+ <p class="ck-kicker">Output source</p>
85
+ </div>
86
+ <p class="ck-prompt-preview__text">Dataset column <code><%= @run.output_column.presence || "actual_output" %></code> — no prompt generated these outputs.</p>
87
+ </div>
88
+ <% end %>
80
89
 
81
90
  <% if @run.dataset %>
82
91
  <dialog id="dataset-preview-<%= @run.id %>" class="ck-modal" onclick="if(event.target===this)this.close()">
@@ -0,0 +1,6 @@
+class AllowJudgeOnlyRuns < ActiveRecord::Migration[8.1]
+  def change
+    change_column_null :completion_kit_runs, :prompt_id, true
+    add_column :completion_kit_runs, :output_column, :string
+  end
+end
@@ -1,3 +1,3 @@
 module CompletionKit
-  VERSION = "0.5.10"
+  VERSION = "0.5.11"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: completion-kit
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.5.10
4
+ version: 0.5.11
5
5
  platform: ruby
6
6
  authors:
7
7
  - Damien Bastin
@@ -381,6 +381,7 @@ files:
381
381
  - db/migrate/20260509000001_create_completion_kit_tags.rb
382
382
  - db/migrate/20260509000002_create_completion_kit_taggings.rb
383
383
  - db/migrate/20260513000001_create_completion_kit_mcp_sessions.rb
384
+ - db/migrate/20260514000001_allow_judge_only_runs.rb
384
385
  - lib/completion-kit.rb
385
386
  - lib/completion_kit.rb
386
387
  - lib/completion_kit/concurrency_check.rb