minitest-promptfoo 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 1687874b77d0e965b186cf7f488579e65c9b318f9c24c0a888f9871aa20d30ff
4
+ data.tar.gz: 37315fe836966e9ebae13d72012140e3abd8a06c09456e492d68655896312e8b
5
+ SHA512:
6
+ metadata.gz: 0a53f83579b00a493e0bf824a8a6cf784c13aefc4a5ef401efc3a59e5632178f7213380855093e6a7de319c6cbde0e4113d58bdf38dae04e3a1b0ff9e63b9f5c
7
+ data.tar.gz: f269bc6f6127042ea83118a262b794ddb288995533178290ce2dcc00b197548767bbef1397f2760bb65abb38ad0a8771932085ad63668e9b37a1ecfb5764dbc2
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ 3.4.3
data/CHANGELOG.md ADDED
@@ -0,0 +1,26 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ### Added
11
+ - Initial release of minitest-promptfoo
12
+ - Core `Minitest::Promptfoo::Test` class for prompt testing
13
+ - Configuration system for promptfoo executable path
14
+ - Support for multiple providers
15
+ - Assertion DSL: `includes`, `matches`, `equals`, `json_includes`, `javascript`, `rubric`
16
+ - Rails integration with automatic prompt file discovery
17
+ - Support for both .ptmpl and .liquid prompt formats
18
+ - Pre-rendering support for template conflicts
19
+ - Debug mode with `DEBUG_PROMPT_TEST` environment variable
20
+ - Verbose mode for detailed failure messages
21
+ - Comprehensive README with examples
22
+ - Basic test coverage
23
+
24
+ ## [0.1.0] - TBD
25
+
26
+ - Initial release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2025 Chris Waters
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,326 @@
1
+ # Minitest::Promptfoo
2
+
3
+ A thin Minitest wrapper around [promptfoo](https://www.promptfoo.dev/) that brings prompt testing to Ruby projects. Test your LLM prompts with a familiar Minitest-like DSL, supporting multiple providers and assertion types.
4
+
5
+ ## Why Test Your Prompts?
6
+
7
+ LLM outputs are non-deterministic, but that doesn't mean you can't test them. With minitest-promptfoo, you can:
8
+
9
+ - Ensure prompts produce expected types of responses
10
+ - Validate JSON structure in responses
11
+ - Use LLM-as-judge for qualitative evaluation
12
+ - Test against multiple providers simultaneously
13
+ - Catch prompt regressions before they hit production
14
+
15
+ ## Installation
16
+
17
+ Add this line to your application's Gemfile:
18
+
19
+ ```ruby
20
+ gem 'minitest-promptfoo'
21
+ ```
22
+
23
+ And then execute:
24
+
25
+ ```bash
26
+ $ bundle install
27
+ ```
28
+
29
+ Or install it yourself as:
30
+
31
+ ```bash
32
+ $ gem install minitest-promptfoo
33
+ ```
34
+
35
+ ### Promptfoo Setup
36
+
37
+ You'll need promptfoo installed. You can either:
38
+
39
+ 1. Install it locally via npm:
40
+ ```bash
41
+ npm install -D promptfoo
42
+ ```
43
+
44
+ 2. Or use npx (no installation required):
45
+ ```bash
46
+ # The gem will automatically fall back to `npx promptfoo`
47
+ ```
48
+
49
+ ## Basic Usage
50
+
51
+ ### Plain Ruby Projects
52
+
53
+ ```ruby
54
+ require 'minitest/autorun'
55
+ require 'minitest/promptfoo'
56
+
57
+ class GreetingPromptTest < Minitest::Promptfoo::Test
58
+ # Set provider(s) for all tests in this class
59
+ self.providers = "openai:gpt-4o-mini"
60
+
61
+ def prompt_path
62
+ "prompts/greeting.ptmpl" # Or .liquid
63
+ end
64
+
65
+ def test_generates_professional_greeting
66
+ assert_prompt(vars: { name: "Alice" }) do |response|
67
+ response.includes("Hello Alice")
68
+ response.matches(/^[A-Z]/) # Starts with a capital letter
69
+ response.rubric("Response is professional and courteous")
70
+ end
71
+ end
72
+
73
+ def test_validates_json_structure
74
+ assert_prompt(vars: { format: "json" }) do |response|
75
+ response.json_includes(key: "greeting", value: "Hello")
76
+ response.json_includes(key: "sentiment", value: "positive")
77
+ end
78
+ end
79
+ end
80
+ ```
81
+
82
+ ### Rails Projects
83
+
84
+ In Rails, the gem automatically discovers prompt files based on test file paths:
85
+
86
+ ```ruby
87
+ # test/services/greeting_service_test.rb
88
+ class GreetingServiceTest < Minitest::Promptfoo::RailsTest
89
+ self.providers = "openai:gpt-4o-mini"
90
+
91
+ # Automatically finds app/services/greeting_service.ptmpl
92
+ # No need to define prompt_path!
93
+
94
+ def test_greeting_is_friendly
95
+ assert_prompt(vars: { name: "Bob" }) do |response|
96
+ response.includes("Hello Bob")
97
+ response.rubric("Greeting is warm and welcoming", threshold: 0.7)
98
+ end
99
+ end
100
+ end
101
+ ```
102
+
103
+ ## Configuration
104
+
105
+ Configure the gem in your test helper or setup file:
106
+
107
+ ```ruby
108
+ # test/test_helper.rb
109
+ require 'minitest/promptfoo'
110
+
111
+ Minitest::Promptfoo.configure do |config|
112
+ # Optional: specify custom promptfoo executable path
113
+ config.promptfoo_executable = "./node_modules/.bin/promptfoo"
114
+
115
+ # Optional: set root path for resolving prompt files
116
+ config.root_path = Rails.root # or Dir.pwd
117
+ end
118
+ ```
119
+
120
+ ## Assertion Types
121
+
122
+ ### String Matching
123
+
124
+ ```ruby
125
+ assert_prompt(vars: { topic: "weather" }) do |response|
126
+ # Contains substring
127
+ response.includes("sunny")
128
+
129
+ # Matches regex
130
+ response.matches(/\d+°[CF]/)
131
+
132
+ # Exact equality
133
+ response.equals("It's a beautiful day!")
134
+ end
135
+ ```
136
+
137
+ ### JSON Validation
138
+
139
+ ```ruby
140
+ assert_prompt(vars: { query: "status" }) do |response|
141
+ response.json_includes(key: "status", value: "success")
142
+ response.json_includes(key: "code", value: 200)
143
+ end
144
+ ```
145
+
146
+ ### Custom JavaScript
147
+
148
+ ```ruby
149
+ assert_prompt(vars: { count: 5 }) do |response|
150
+ response.javascript("parseInt(output) > 3")
151
+ response.javascript("output.split(' ').length <= 10")
152
+ end
153
+ ```
154
+
155
+ ### LLM-as-Judge
156
+
157
+ ```ruby
158
+ assert_prompt(vars: { tone: "professional" }) do |response|
159
+ response.rubric("Response is professional and courteous")
160
+ response.rubric("Uses business-appropriate language", threshold: 0.8)
161
+ end
162
+ ```
163
+
164
+ ## Multiple Providers
165
+
166
+ Test your prompt across multiple providers:
167
+
168
+ ```ruby
169
+ class MultiProviderTest < Minitest::Promptfoo::Test
170
+ self.providers = [
171
+ "openai:gpt-4o-mini",
172
+ "openai:chat:anthropic:claude-3-7-sonnet",
173
+ "openai:chat:google:gemini-2.0-flash"
174
+ ]
175
+
176
+ def prompt_path
177
+ "prompts/greeting.ptmpl"
178
+ end
179
+
180
+ def test_works_across_providers
181
+ assert_prompt(vars: { name: "Alice" }) do |response|
182
+ response.includes("Alice")
183
+ end
184
+ end
185
+ end
186
+ ```
187
+
188
+ ## Provider Configuration
189
+
190
+ Pass custom configuration to providers:
191
+
192
+ ```ruby
193
+ def test_json_response_format
194
+ json_provider = {
195
+ id: "openai:gpt-4o-mini",
196
+ config: {
197
+ response_format: { type: "json_object" },
198
+ temperature: 0.7
199
+ }
200
+ }
201
+
202
+ assert_prompt(vars: { input: "data" }, providers: json_provider) do |response|
203
+ response.json_includes(key: "result", value: "success")
204
+ end
205
+ end
206
+ ```
207
+
208
+ ## Prompt File Formats
209
+
210
+ ### Promptfoo Templates (.ptmpl)
211
+
212
+ Use double-brace syntax for variables:
213
+
214
+ ```
215
+ You are a helpful assistant.
216
+
217
+ Greet the user named {{name}} in a {{tone}} manner.
218
+ ```
219
+
220
+ ### Liquid Templates (.liquid)
221
+
222
+ Single-brace variable syntax (converted internally to promptfoo's double braces):
223
+
224
+ ```
225
+ You are a helpful assistant.
226
+
227
+ Greet the user named {name} in a {tone} manner.
228
+ ```
229
+
230
+ ## Pre-rendering Templates
231
+
232
+ If your prompt contains syntax that conflicts with promptfoo's templating (like analyzing Liquid code), pre-render it:
233
+
234
+ ```ruby
235
+ def test_liquid_code_analysis
236
+ assert_prompt(
237
+ vars: { code: "{{user.name | upcase}}" },
238
+ pre_render: true
239
+ ) do |response|
240
+ response.includes("variable interpolation")
241
+ end
242
+ end
243
+ ```
244
+
245
+ ## Debugging
246
+
247
+ Enable debug output to see the generated promptfoo config:
248
+
249
+ ```bash
250
+ DEBUG_PROMPT_TEST=1 bundle exec rake test
251
+ ```
252
+
253
+ Or enable verbose mode for detailed failure messages:
254
+
255
+ ```ruby
256
+ assert_prompt(vars: { name: "Alice" }, verbose: true) do |response|
257
+ response.rubric("Be friendly")
258
+ end
259
+ ```
260
+
261
+ ## Real-World Example
262
+
263
+ ```ruby
264
+ class CustomerSupportPromptTest < Minitest::Promptfoo::Test
265
+ self.providers = "openai:gpt-4o-mini"
266
+
267
+ def prompt_path
268
+ "prompts/customer_support.ptmpl"
269
+ end
270
+
271
+ def test_handles_refund_request_professionally
272
+ assert_prompt(vars: {
273
+ issue: "item arrived damaged",
274
+ customer_name: "Jane Doe"
275
+ }) do |response|
276
+ response.includes("Jane")
277
+ response.rubric("Acknowledges the issue empathetically")
278
+ response.rubric("Offers clear next steps")
279
+ response.rubric("Maintains professional tone")
280
+ response.matches(/refund|replacement/i)
281
+ end
282
+ end
283
+
284
+ def test_escalates_complex_issues
285
+ assert_prompt(vars: {
286
+ issue: "legal complaint about data breach",
287
+ customer_name: "John Smith"
288
+ }) do |response|
289
+ response.rubric("Recognizes this requires escalation")
290
+ response.rubric("Does not make promises outside of AI's authority")
291
+ response.includes("escalate")
292
+ end
293
+ end
294
+ end
295
+ ```
296
+
297
+ ## Differences from ActiveSupport::TestCase
298
+
299
+ When using `Minitest::Promptfoo::Test` (non-Rails), note these differences:
300
+
301
+ - No fixtures or setup helpers from Rails
302
+ - Must explicitly define `prompt_path`
303
+ - No automatic database transaction rollbacks
304
+ - Uses plain Minitest assertions
305
+
306
+ For Rails projects, use `Minitest::Promptfoo::RailsTest` to get all Rails testing features plus automatic prompt discovery.
307
+
308
+ ## Development
309
+
310
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
311
+
312
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
313
+
314
+ ## Contributing
315
+
316
+ Bug reports and pull requests are welcome on GitHub at https://github.com/christhesoul/minitest-promptfoo.
317
+
318
+ ## License
319
+
320
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
321
+
322
+ ## Credits
323
+
324
+ Built with love on top of:
325
+ - [promptfoo](https://www.promptfoo.dev/) - The excellent prompt testing framework
326
+ - [minitest](https://github.com/minitest/minitest) - Ruby's favorite testing library
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "bundler/gem_tasks"
4
+ require "minitest/test_task"
5
+
6
+ Minitest::TestTask.create
7
+
8
+ require "standard/rake"
9
+
10
+ task default: %i[test standard]
@@ -0,0 +1,3 @@
1
+ You are a helpful assistant.
2
+
3
+ Greet the user named {{name}} in a {{tone}} manner.
@@ -0,0 +1,34 @@
1
+ # frozen_string_literal: true
2
+
3
+ # Example usage of minitest-promptfoo
4
+ #
5
+ # To run this example:
6
+ # 1. Create a prompt file at examples/greeting.ptmpl
7
+ # 2. Run: ruby examples/simple_prompt_test.rb
8
+
9
+ require "bundler/setup"
10
+ require "minitest/autorun"
11
+ require "minitest/promptfoo"
12
+
13
+ class SimplePromptTest < Minitest::Promptfoo::Test
14
+ # Use the echo provider for testing (doesn't call actual LLMs)
15
+ self.providers = "echo"
16
+
17
+ def prompt_path
18
+ File.join(__dir__, "greeting.ptmpl")
19
+ end
20
+
21
+ def test_prompt_includes_name
22
+ assert_prompt(vars: {name: "Alice", tone: "friendly"}) do |response|
23
+ response.includes("Alice")
24
+ response.includes("friendly")
25
+ end
26
+ end
27
+
28
+ def test_prompt_with_different_tone
29
+ assert_prompt(vars: {name: "Bob", tone: "professional"}) do |response|
30
+ response.includes("Bob")
31
+ response.includes("professional")
32
+ end
33
+ end
34
+ end
@@ -0,0 +1,79 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+
5
+ module Minitest
6
+ module Promptfoo
7
+ # DSL for building promptfoo assertions in a minitest-like style
8
+ #
9
+ # Example:
10
+ # builder = AssertionBuilder.new
11
+ # builder.includes("Hello")
12
+ # builder.matches(/\d+/)
13
+ # builder.rubric("Response is professional")
14
+ # builder.to_promptfoo_assertions
15
+ class AssertionBuilder
16
+ def initialize
17
+ @assertions = []
18
+ end
19
+
20
+ # String inclusion check
21
+ def includes(text)
22
+ @assertions << {
23
+ "type" => "contains",
24
+ "value" => text
25
+ }
26
+ end
27
+
28
+ # Regex pattern matching
29
+ def matches(pattern)
30
+ @assertions << {
31
+ "type" => "regex",
32
+ "value" => pattern.source
33
+ }
34
+ end
35
+
36
+ # Exact equality check
37
+ def equals(expected)
38
+ @assertions << {
39
+ "type" => "equals",
40
+ "value" => expected
41
+ }
42
+ end
43
+
44
+ # JSON structure validation using JavaScript
45
+ def json_includes(key:, value:)
46
+ @assertions << {
47
+ "type" => "is-json"
48
+ }
49
+ # Handle both string output (needs parsing) and object output (already parsed)
50
+ @assertions << {
51
+ "type" => "javascript",
52
+ "value" => "(typeof output === 'string' ? JSON.parse(output) : output)[#{key.inspect}] === #{value.to_json}"
53
+ }
54
+ end
55
+
56
+ # Custom JavaScript assertion
57
+ def javascript(js_code)
58
+ @assertions << {
59
+ "type" => "javascript",
60
+ "value" => js_code
61
+ }
62
+ end
63
+
64
+ # LLM-as-judge rubric evaluation
65
+ def rubric(criteria, threshold: 0.5)
66
+ @assertions << {
67
+ "type" => "llm-rubric",
68
+ "value" => criteria,
69
+ "threshold" => threshold
70
+ }
71
+ end
72
+
73
+ # Convert to promptfoo assertion format
74
+ def to_promptfoo_assertions
75
+ @assertions
76
+ end
77
+ end
78
+ end
79
+ end
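Note on `json_includes`: the JavaScript check is built by string interpolation, so it can help to see the exact text that reaches promptfoo. The sketch below is a standalone illustration, not part of the gem; `json_includes_js` is a hypothetical helper that copies only the interpolation from the method above.

```ruby
require "json"

# Hypothetical helper (not in the gem): reproduces the string that
# AssertionBuilder#json_includes stores as the "javascript" assertion value.
def json_includes_js(key, value)
  "(typeof output === 'string' ? JSON.parse(output) : output)" \
    "[#{key.inspect}] === #{value.to_json}"
end

puts json_includes_js("status", "success")
puts json_includes_js("code", 200)
```

Because the key is rendered with `inspect`, a String key becomes a quoted JavaScript string, while a Symbol key would be rendered as `:status`, which is not valid JavaScript. Passing String keys, as the README examples do, appears to be the safe choice.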
@@ -0,0 +1,49 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minitest
4
+ module Promptfoo
5
+ class Configuration
6
+ attr_accessor :promptfoo_executable, :root_path
7
+
8
+ def initialize
9
+ @promptfoo_executable = nil
10
+ @root_path = Dir.pwd
11
+ end
12
+
13
+ # Resolves the promptfoo executable path
14
+ # Priority: configured path > local node_modules/.bin/promptfoo > npx promptfoo
15
+ def resolve_executable
16
+ return promptfoo_executable if promptfoo_executable && executable_exists?(promptfoo_executable)
17
+
18
+ # Try local node_modules
19
+ local_bin = File.join(root_path, "node_modules", ".bin", "promptfoo")
20
+ return local_bin if executable_exists?(local_bin)
21
+
22
+ # Fall back to npx
23
+ "npx promptfoo"
24
+ end
25
+
26
+ private
27
+
28
+ def executable_exists?(path)
29
+ File.exist?(path) && File.executable?(path)
30
+ end
31
+ end
32
+
33
+ class << self
34
+ attr_writer :configuration
35
+
36
+ def configuration
37
+ @configuration ||= Configuration.new
38
+ end
39
+
40
+ def configure
41
+ yield(configuration)
42
+ end
43
+
44
+ def reset_configuration!
45
+ @configuration = Configuration.new
46
+ end
47
+ end
48
+ end
49
+ end
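Note on `resolve_executable`: the lookup has three steps (configured path, local `node_modules/.bin`, then `npx`). The sketch below restates that order as a standalone function so it can be tried without the gem; `resolve_promptfoo` and the temporary directory layout are assumptions made for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical standalone version of the lookup order used by
# Configuration#resolve_executable.
def resolve_promptfoo(configured, root_path)
  usable = ->(path) { File.exist?(path) && File.executable?(path) }

  # 1. An explicitly configured path wins when it points at an executable file.
  return configured if configured && usable.call(configured)

  # 2. Otherwise prefer a project-local install.
  local_bin = File.join(root_path, "node_modules", ".bin", "promptfoo")
  return local_bin if usable.call(local_bin)

  # 3. Last resort: let npx find or download promptfoo.
  "npx promptfoo"
end

Dir.mktmpdir do |root|
  # Nothing installed yet, so the npx fallback is returned.
  puts resolve_promptfoo(nil, root)

  # Create a fake local executable and resolve again.
  bin_dir = File.join(root, "node_modules", ".bin")
  FileUtils.mkdir_p(bin_dir)
  fake = File.join(bin_dir, "promptfoo")
  File.write(fake, "#!/bin/sh\n")
  FileUtils.chmod(0o755, fake)
  puts resolve_promptfoo(nil, root)
end
```

A configured path that does not exist is skipped silently rather than raising, so a typo in `promptfoo_executable` falls through to the later steps.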
@@ -0,0 +1,226 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "json"
4
+
5
+ module Minitest
6
+ module Promptfoo
7
+ # Formats promptfoo test failures into human-readable error messages
8
+ class FailureFormatter
9
+ def initialize(verbose: false)
10
+ @verbose = verbose
11
+ end
12
+
13
+ # Main entry point: formats a complete failure message from promptfoo results
14
+ def format_results(passing_providers, failing_providers)
15
+ msg = "Prompt evaluation results:\n"
16
+
17
+ passing_providers.each do |provider_id|
18
+ msg += " ✓ #{provider_id}\n"
19
+ end
20
+
21
+ failing_providers.each do |failure|
22
+ msg += " ✗ #{failure[:id]}\n"
23
+ end
24
+
25
+ msg += "\n"
26
+
27
+ failing_providers.each do |failure|
28
+ msg += format_provider_failure(failure[:id], failure[:result])
29
+ msg += "\n"
30
+ end
31
+
32
+ unless @verbose
33
+ msg += "💡 Tip: Add `verbose: true` to assert_prompt for detailed debugging output\n"
34
+ end
35
+
36
+ msg
37
+ end
38
+
39
+ private
40
+
41
+ def format_provider_failure(provider_id, provider_result)
42
+ output_text = provider_result.dig("response", "output") || provider_result.dig("output")
43
+ error = provider_result.dig("error") || provider_result.dig("response", "error")
44
+ grading_result = provider_result.dig("gradingResult") || {}
45
+ component_results = grading_result.dig("componentResults") || []
46
+
47
+ msg = "#{provider_id} FAILED:\n\n"
48
+
49
+ msg += format_api_error(error) if error&.length&.positive?
50
+ msg += format_response_output(output_text, error)
51
+
52
+ assertion_failures = extract_assertion_failures(component_results)
53
+ msg += format_assertion_failures(assertion_failures, output_text) if assertion_failures.any?
54
+
55
+ msg += format_verbose_output(provider_result) if @verbose
56
+
57
+ msg
58
+ end
59
+
60
+ def format_api_error(error)
61
+ "API Error:\n #{error}\n\n"
62
+ end
63
+
64
+ def format_response_output(output_text, error)
65
+ if output_text && output_text.to_s.length > 0
66
+ formatted_output = output_text.is_a?(String) ? output_text : JSON.pretty_generate(output_text)
67
+ "Response:\n #{formatted_output.gsub("\n", "\n ")}\n\n"
68
+ elsif !error || error.length == 0
69
+ "No response received from provider\n\n"
70
+ else
71
+ ""
72
+ end
73
+ end
74
+
75
+ def format_assertion_failures(assertion_failures, output_text)
76
+ msg = "Failures:\n"
77
+
78
+ # If JSON parsing failed, only show that error (other failures are consequences)
79
+ json_parse_failure = assertion_failures.find { |f| f[:type] == "is-json" }
80
+
81
+ if json_parse_failure
82
+ msg += format_assertion_failure(json_parse_failure, output_text)
83
+ else
84
+ assertion_failures.each do |failure|
85
+ msg += format_assertion_failure(failure, output_text)
86
+ end
87
+ end
88
+
89
+ msg
90
+ end
91
+
92
+ def format_verbose_output(provider_result)
93
+ "\nRaw Provider Result (verbose mode):\n" \
94
+ " #{JSON.pretty_generate(provider_result).gsub("\n", "\n ")}\n"
95
+ end
96
+
97
+ def extract_assertion_failures(component_results)
98
+ component_results.select { |result| !result.dig("pass") }.map do |result|
99
+ {
100
+ type: result.dig("assertion", "type"),
101
+ value: result.dig("assertion", "value"),
102
+ threshold: result.dig("assertion", "threshold"),
103
+ score: result.dig("score"),
104
+ reason: result.dig("reason"),
105
+ named_scores: result.dig("namedScores")
106
+ }
107
+ end
108
+ end
109
+
110
+ def format_assertion_failure(failure, output_text)
111
+ case failure[:type]
112
+ when "llm-rubric"
113
+ format_rubric_failure(failure)
114
+ when "contains"
115
+ " ✗ includes(#{failure[:value].inspect}) - not found in response\n"
116
+ when "regex"
117
+ " ✗ matches(/#{failure[:value]}/) - pattern not found\n"
118
+ when "equals"
119
+ " ✗ equals(#{failure[:value].inspect}) - response does not match\n"
120
+ when "javascript"
121
+ format_javascript_failure(failure, output_text)
122
+ when "is-json"
123
+ format_invalid_json_failure(failure, output_text)
124
+ else
125
+ " ✗ #{failure[:type]} assertion failed\n"
126
+ end
127
+ end
128
+
129
+ def format_javascript_failure(failure, output_text)
130
+ js_code = failure[:value].to_s
131
+
132
+ if json_assertion?(js_code)
133
+ parsed = parse_json_assertion(js_code)
134
+ if parsed
135
+ key = parsed[:key]
136
+ expected = parsed[:expected]
137
+ actual_value = extract_json_value(output_text, key.to_s)
138
+ msg = " ✗ json_includes(key: #{key.inspect})\n"
139
+ msg += " Expected: #{expected.inspect}\n"
140
+ msg += " Actual: #{actual_value.inspect}\n"
141
+ return msg
142
+ end
143
+ end
144
+
145
+ " ✗ javascript assertion failed\n"
146
+ end
147
+
148
+ def format_invalid_json_failure(failure, output_text)
149
+ msg = " ✗ response is not valid JSON\n"
150
+
151
+ if output_text && output_text.to_s.length > 0
152
+ text = output_text.is_a?(String) ? output_text : JSON.pretty_generate(output_text)
153
+ snippet = (text.length > 100) ? "#{text[0, 100]}..." : text
154
+ msg += " Output: #{snippet.inspect}\n"
155
+ end
156
+
157
+ msg
158
+ end
159
+
160
+ def format_rubric_failure(failure)
161
+ score = failure[:score] || 0
162
+ threshold = failure[:threshold] || 0.5
163
+
164
+ msg = " ✗ rubric (score: #{score.round(2)}/#{threshold})\n"
165
+ if score >= threshold
166
+ msg += " Note: Score meets threshold but one or more criteria failed\n"
167
+ msg += " Promptfoo requires ALL criteria to pass, not just the aggregate score\n"
168
+ end
169
+
170
+ if @verbose
171
+ criteria = failure[:value]
172
+ reason = failure[:reason]
173
+
174
+ if criteria && criteria.to_s.length > 0
175
+ msg += "\n Rubric criteria:\n"
176
+ criteria.split("\n").each do |line|
177
+ msg += " #{line}\n" if line.strip.length > 0
178
+ end
179
+ end
180
+
181
+ if reason && reason.to_s.length > 0
182
+ msg += "\n Judge feedback:\n"
183
+ reason.split("\n").each do |line|
184
+ msg += " #{line}\n"
185
+ end
186
+ end
187
+
188
+ msg += "\n"
189
+ end
190
+
191
+ msg
192
+ end
193
+
194
+ # JSON assertion helpers
195
+
196
+ def json_assertion?(js_code)
197
+ js_code.to_s.match?(/JSON\.parse\(output\)(?: : output\))?\[/)
198
+ end
199
+
200
+ def parse_json_assertion(js_code)
201
+ match = js_code.match(/JSON\.parse\(output\)(?: : output\))?\[(['"])(.+?)\1\]\s*===\s*(.+)/)
202
+ return nil unless match
203
+
204
+ key = match[2]
205
+ expected_json = match[3]
206
+
207
+ expected_value = begin
208
+ JSON.parse(expected_json)
209
+ rescue JSON::ParserError
210
+ expected_json
211
+ end
212
+
213
+ {key: key, expected: expected_value}
214
+ end
215
+
216
+ def extract_json_value(output_text, key)
217
+ return nil unless output_text && output_text.to_s.length > 0
218
+
219
+ parsed = output_text.is_a?(String) ? JSON.parse(output_text) : output_text
220
+ parsed[key]
221
+ rescue JSON::ParserError
222
+ nil
223
+ end
224
+ end
225
+ end
226
+ end
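Note on the JSON assertion helpers: they recover the key and the expected value from the JavaScript string that `json_includes` generated. The sketch below shows that round trip in isolation. `JSON_INCLUDES_PATTERN` and `parse_json_includes` are illustrative names, and the pattern is an assumption written against the ternary form built in `AssertionBuilder#json_includes`, not the gem's own expression.

```ruby
require "json"

# Assumed pattern (not the gem's own): matches the tail of the ternary form
#   (typeof output === 'string' ? JSON.parse(output) : output)["key"] === value
JSON_INCLUDES_PATTERN =
  /JSON\.parse\(output\) : output\)\[(["'])(.+?)\1\]\s*===\s*(.+)\z/

# Returns {key:, expected:} for a json_includes-style check, or nil otherwise.
def parse_json_includes(js_code)
  match = JSON_INCLUDES_PATTERN.match(js_code)
  return nil unless match

  expected = begin
    JSON.parse(match[3])
  rescue JSON::ParserError
    match[3] # keep the raw text when it is not valid JSON
  end

  {key: match[2], expected: expected}
end

js = %q{(typeof output === 'string' ? JSON.parse(output) : output)["code"] === 200}
p parse_json_includes(js)
p parse_json_includes("output.length > 3")
```

Hand-written `javascript` assertions that do not follow this shape return `nil`, which corresponds to the generic "javascript assertion failed" message in the formatter.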
@@ -0,0 +1,66 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "open3"
4
+ require "json"
5
+
6
+ module Minitest
7
+ module Promptfoo
8
+ # Handles execution of the promptfoo CLI and parsing of results
9
+ class PromptfooRunner
10
+ class ExecutionError < StandardError; end
11
+
12
+ def initialize(configuration)
13
+ @configuration = configuration
14
+ end
15
+
16
+ # Executes promptfoo CLI with the given config and options
17
+ # Returns a hash with :success, :stdout, :stderr keys
18
+ def execute(config_path, working_dir, pre_render: false, show_output: false)
19
+ env_vars = build_env_vars(pre_render: pre_render)
20
+ cmd = build_command(config_path)
21
+
22
+ if show_output
23
+ execute_with_output(env_vars, cmd, working_dir)
24
+ else
25
+ execute_silently(env_vars, cmd, working_dir)
26
+ end
27
+ end
28
+
29
+ # Parses promptfoo JSON output file
30
+ def parse_output(output_path)
31
+ return {} unless File.exist?(output_path)
32
+
33
+ JSON.parse(File.read(output_path))
34
+ rescue JSON::ParserError => e
35
+ raise ExecutionError, "Failed to parse promptfoo output: #{e.message}"
36
+ end
37
+
38
+ private
39
+
40
+ def build_env_vars(pre_render:)
41
+ pre_render ? {"PROMPTFOO_DISABLE_TEMPLATING" => "true"} : {}
42
+ end
43
+
44
+ def build_command(config_path)
45
+ base_cmd = @configuration.resolve_executable
46
+ args = ["eval", "-c", config_path, "--no-cache"]
47
+
48
+ if base_cmd.start_with?("npx")
49
+ base_cmd.split + args
50
+ else
51
+ [base_cmd] + args
52
+ end
53
+ end
54
+
55
+ def execute_with_output(env_vars, cmd, working_dir)
56
+ success = system(env_vars, *cmd, chdir: working_dir)
57
+ {success: success, stdout: "", stderr: ""}
58
+ end
59
+
60
+ def execute_silently(env_vars, cmd, working_dir)
61
+ stdout, stderr, status = Open3.capture3(env_vars, *cmd, chdir: working_dir)
62
+ {success: status.success?, stdout: stdout, stderr: stderr}
63
+ end
64
+ end
65
+ end
66
+ end
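Note on `execute_silently`: it relies on `Open3.capture3` accepting an environment hash as its first argument and a `chdir:` option. The sketch below exercises the same call shape with the current Ruby interpreter standing in for promptfoo, so it runs without Node installed; the stand-in command is an assumption made purely for illustration.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Same environment variable the runner sets when pre_render is true.
env = {"PROMPTFOO_DISABLE_TEMPLATING" => "true"}

# Stand-in command: a child Ruby process that echoes the variable back.
cmd = [RbConfig.ruby, "-e", "print ENV.fetch('PROMPTFOO_DISABLE_TEMPLATING')"]

stdout, stderr, status = Open3.capture3(env, *cmd, chdir: Dir.tmpdir)

# Same result shape as PromptfooRunner#execute_silently.
result = {success: status.success?, stdout: stdout, stderr: stderr}
p result
```

Passing the command as an array (rather than one string) avoids shell interpretation of the config path, which is presumably why `build_command` splits `"npx promptfoo"` into separate words.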
@@ -0,0 +1,85 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minitest
4
+ module Promptfoo
5
+ # Rails integration for automatic prompt file discovery
6
+ #
7
+ # Automatically discovers .ptmpl or .liquid prompt files based on Rails conventions:
8
+ # app/services/foo/bar.ptmpl → test/services/foo/bar_test.rb
9
+ #
10
+ # Usage:
11
+ # class MyPromptTest < Minitest::Promptfoo::RailsTest
12
+ # # No need to define prompt_path, it's auto-discovered!
13
+ #
14
+ # test "generates greeting" do
15
+ # assert_prompt(vars: { name: "Alice" }) do |response|
16
+ # response.includes("Hello Alice")
17
+ # end
18
+ # end
19
+ # end
20
+ module Rails
21
+ def self.included(base)
22
+ base.class_eval do
23
+ # Override prompt_path to use Rails convention-based discovery
24
+ def prompt_path
25
+ @prompt_path ||= resolve_prompt_path_rails
26
+ end
27
+
28
+ private
29
+
30
+ def resolve_prompt_path_rails
31
+ test_file_path = method(name).source_location[0]
32
+ test_dir = File.dirname(test_file_path)
33
+ test_basename = File.basename(test_file_path, "_test.rb")
34
+
35
+ app_dir = test_dir.gsub(%r{^(.*/)?test/}, '\1app/')
36
+
37
+ [".ptmpl", ".liquid"].each do |ext|
38
+ candidate = File.join(app_dir, "#{test_basename}#{ext}")
39
+ return candidate if File.exist?(candidate)
40
+ end
41
+
42
+ raise Minitest::Promptfoo::Test::PromptNotFoundError, "Could not find prompt file for #{test_file_path}"
43
+ end
44
+ end
45
+ end
46
+ end
47
+
48
+ # Convenience class that combines Test + Rails integration
49
+ # Inherits from ActiveSupport::TestCase if available, otherwise Minitest::Test
50
+ if defined?(ActiveSupport::TestCase)
51
+ class RailsTest < ActiveSupport::TestCase
52
+ include Minitest::Promptfoo::Rails
53
+
54
+ # Borrow all the assertion methods from Test
55
+ # but keep ActiveSupport::TestCase as the base
56
+ include Minitest::Promptfoo::Test.instance_methods(false).each_with_object(Module.new) { |m, mod|
57
+ mod.define_method(m, Minitest::Promptfoo::Test.instance_method(m))
58
+ }
59
+
60
+ # Include class methods
61
+ class << self
62
+ attr_accessor :_providers
63
+
64
+ def providers
65
+ @_providers || "echo"
66
+ end
67
+
68
+ def providers=(value)
69
+ @_providers = value
70
+ end
71
+
72
+ def inherited(subclass)
73
+ super
74
+ subclass._providers = _providers
75
+ end
76
+ end
77
+ end
78
+ else
79
+ # Fallback if ActiveSupport isn't available
80
+ class RailsTest < Test
81
+ include Rails
82
+ end
83
+ end
84
+ end
85
+ end
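Note on prompt discovery: the mapping from a test file to its prompt file is pure string manipulation, so it can be checked without Rails. The sketch below reproduces the path arithmetic from `resolve_prompt_path_rails` and lists the candidates it would probe; `candidate_prompt_paths` is a hypothetical helper, and the `File.exist?` check is left out so the example runs anywhere.

```ruby
# Hypothetical helper mirroring the path arithmetic in resolve_prompt_path_rails.
def candidate_prompt_paths(test_file_path)
  test_dir = File.dirname(test_file_path)
  basename = File.basename(test_file_path, "_test.rb")

  # Swap the "test/" directory segment for "app/".
  app_dir = test_dir.gsub(%r{^(.*/)?test/}, '\1app/')

  # .ptmpl is probed before .liquid, matching the order in the module.
  [".ptmpl", ".liquid"].map { |ext| File.join(app_dir, "#{basename}#{ext}") }
end

puts candidate_prompt_paths("/repo/test/services/greeting_service_test.rb")
```

One thing this makes visible: the pattern requires a trailing slash after `test`, so a test file sitting directly in `test/` (directory name `/repo/test`) is not rewritten and its candidates stay under `test/`.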
@@ -0,0 +1,238 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "yaml"
4
+ require "tmpdir"
5
+ require "minitest/test"
6
+ require_relative "assertion_builder"
7
+ require_relative "failure_formatter"
8
+ require_relative "promptfoo_runner"
9
+
10
+ module Minitest
11
+ module Promptfoo
12
+ # Base class for testing LLM prompts using promptfoo.
13
+ #
14
+ # Recommended Usage (Minitest-like DSL):
15
+ # class MyPromptTest < Minitest::Promptfoo::Test
16
+ # # Set provider(s) for ALL tests in this class (DRY!)
17
+ # # Providers can be strings or hashes with config (see promptfoo docs)
18
+ # self.providers = [
19
+ # "openai:gpt-4o-mini", # Simple string format
20
+ # {
21
+ # id: "openai:chat:anthropic:claude-3-7-sonnet",
22
+ # config: { response_format: { type: "json_object" } } # With config
23
+ # }
24
+ # ]
25
+ #
26
+ # def prompt_path
27
+ # "prompts/greeting.ptmpl" # Or .liquid
28
+ # end
29
+ #
30
+ # def test_generates_professional_greeting
31
+ # assert_prompt(vars: { name: "Alice" }) do |response|
32
+ # response.includes("Hello Alice")
33
+ # response.matches(/^[A-Z]/) # Starts with capital letter
34
+ # response.rubric("Response is professional and courteous")
35
+ # end
36
+ # end
37
+ # end
38
+ class Test < Minitest::Test
39
+ class PromptNotFoundError < StandardError; end
40
+ class EvaluationError < StandardError; end
41
+
42
+ # Class-level configuration
43
+ class << self
44
+ def debug?
45
+ ENV["DEBUG_PROMPT_TEST"] == "1"
46
+ end
47
+
48
+ def providers
49
+ @providers || "echo"
50
+ end
51
+
52
+ attr_writer :providers
53
+
54
+ def inherited(subclass)
55
+ super
56
+ subclass.providers = providers if defined?(@providers)
57
+ end
58
+ end
59
+
60
+ def prompt_path
61
+ raise NotImplementedError, "#{self.class}#prompt_path must be implemented"
62
+ end
63
+
64
+ def prompt_content
65
+ @prompt_content ||= begin
66
+ path = prompt_path
67
+ raise PromptNotFoundError, "Prompt file not found: #{path}" unless File.exist?(path)
68
+ File.read(path, encoding: "UTF-8")
69
+ end
70
+ end
71
+
72
+ # Minitest-like DSL for prompt testing
73
+ #
74
+ # Example:
75
+ # assert_prompt(vars: { input: "test" }) do |response|
76
+ # response.includes("expected text")
77
+ # response.matches(/\d{3}-\d{4}/)
78
+ # response.rubric("Response is professional and courteous")
79
+ # end
80
+ def assert_prompt(vars:, providers: nil, verbose: false, pre_render: false, &block)
81
+ builder = AssertionBuilder.new
82
+ yield(builder)
83
+
84
+ output = evaluate_prompt(
85
+ prompt_text: prompt_content,
86
+ vars: vars,
87
+ providers: providers,
88
+ assertions: builder.to_promptfoo_assertions,
89
+ verbose: verbose,
90
+ pre_render: pre_render
91
+ )
92
+
93
+ # Real assertion: verify promptfoo produced results
94
+ assert(output.any?, "Promptfoo evaluation produced no output")
95
+
96
+ output
97
+ end
98
+
99
+ def evaluate_prompt(prompt_text:, vars:, providers: nil, assertions: [], pre_render: false, verbose: false, show_output: false)
100
+ Dir.mktmpdir do |tmpdir|
101
+ config_path = File.join(tmpdir, "promptfooconfig.yaml")
102
+ output_path = File.join(tmpdir, "output.json")
103
+
104
+ # Convert single-brace {var} syntax to double-brace {{var}} for promptfoo
105
+ promptfoo_text = prompt_text.gsub(/(?<!\{)\{(\w+)\}(?!\})/, '{{\1}}')
106
+
107
+ if pre_render
108
+ vars.each do |key, value|
109
+ promptfoo_text = promptfoo_text.gsub("{{#{key}}}", value.to_s)
110
+ end
111
+ config_vars = {}
112
+ else
113
+ config_vars = vars
114
+ end
115
+
116
+ # Use provided provider(s) or fall back to class-level default
117
+ providers_array = wrap_array(providers || self.class.providers)
118
+
119
+ config = build_promptfoo_config(
120
+ prompt: promptfoo_text,
121
+ vars: config_vars,
122
+ providers: providers_array,
123
+ assertions: assertions,
124
+ output_path: output_path
125
+ )
126
+
127
+ config_yaml = YAML.dump(config)
128
+ File.write(config_path, config_yaml)
129
+
130
+ debug("Promptfoo Config", config_yaml)
131
+
132
+ runner = PromptfooRunner.new(Minitest::Promptfoo.configuration)
133
+ result = runner.execute(config_path, tmpdir, show_output: show_output, pre_render: pre_render)
134
+
135
+ debug("Promptfoo Result", result.inspect)
136
+
137
+ output = runner.parse_output(output_path)
138
+
139
+ unless result[:success] || output.any?
140
+ raise EvaluationError, <<~ERROR
141
+ promptfoo evaluation failed
142
+ STDOUT: #{result[:stdout]}
143
+ STDERR: #{result[:stderr]}
144
+ ERROR
145
+ end
146
+
147
+ check_provider_failures(output, providers_array, verbose: verbose) if assertions.any?
148
+
149
+ output
150
+ end
151
+ end
152
+
153
+ private
154
+
155
+ def check_provider_failures(output, providers, verbose: false)
156
+ results = output.dig("results", "results") || []
157
+ passing_providers = []
158
+ failing_providers = []
159
+
160
+ results.each do |provider_result|
161
+ provider_id = provider_result.dig("provider", "id")
162
+ success = provider_result["success"]
163
+
164
+ if success
165
+ passing_providers << provider_id
166
+ else
167
+ failing_providers << {
168
+ id: provider_id,
169
+ result: provider_result
170
+ }
171
+ end
172
+ end
173
+
174
+ if failing_providers.any?
175
+ formatter = FailureFormatter.new(verbose: verbose)
176
+ error_msg = formatter.format_results(passing_providers, failing_providers)
177
+ flunk(error_msg)
178
+ end
179
+ end
180
+
181
+ def build_promptfoo_config(prompt:, vars:, providers:, assertions:, output_path:)
182
+ normalized_providers = providers.map do |provider|
183
+ case provider
184
+ when String
185
+ provider
186
+ when Hash
187
+ deep_stringify_keys(provider)
188
+ end
189
+ end
190
+
191
+ {
192
+ "prompts" => [prompt],
193
+ "providers" => normalized_providers,
194
+ "tests" => [
195
+ {
196
+ "vars" => vars.transform_keys(&:to_s),
197
+ "assert" => assertions
198
+ }
199
+ ],
200
+ "outputPath" => output_path
201
+ }
202
+ end
203
+
204
+ def debug(title, content)
205
+ return unless self.class.debug?
206
+
207
+ warn "\n=== #{title} ==="
208
+ warn content
209
+ warn "=" * (title.length + 8)
210
+ warn ""
211
+ end
212
+
213
+ # Simple array wrapper (replaces ActiveSupport's Array.wrap)
214
+ def wrap_array(object)
215
+ case object
216
+ when nil then []
217
+ when Array then object
218
+ else [object]
219
+ end
220
+ end
221
+
222
+ # Simple deep stringify keys (replaces ActiveSupport method)
223
+ def deep_stringify_keys(hash)
224
+ hash.each_with_object({}) do |(key, value), result|
225
+ result[key.to_s] = stringify_value(value)
226
+ end
227
+ end
228
+
229
+ def stringify_value(value)
230
+ case value
231
+ when Hash then deep_stringify_keys(value)
232
+ when Array then value.map { |v| stringify_value(v) }
233
+ else value
234
+ end
235
+ end
236
+ end
237
+ end
238
+ end
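The placeholder handling inside `evaluate_prompt` above (single-brace to double-brace conversion, then optional pre-rendering) can be tried in isolation. The following is a minimal sketch using only core Ruby. The helper names `to_promptfoo_syntax` and `pre_render` are hypothetical and are not part of the gem; only the regular expression and the `{{key}}` substitution pattern are taken from the diff.

```ruby
# Hypothetical helpers (not part of minitest-promptfoo) that mirror two
# steps of evaluate_prompt from the diff above.

# Convert single-brace {var} placeholders to double-brace {{var}}.
# The lookbehind/lookahead leave already double-braced placeholders alone.
def to_promptfoo_syntax(text)
  text.gsub(/(?<!\{)\{(\w+)\}(?!\})/, '{{\1}}')
end

# Substitute each var directly into the text. The block form of gsub is
# used so that a backslash sequence in a value (such as \1) is inserted
# literally instead of being read as a backreference.
def pre_render(text, vars)
  vars.reduce(text) do |acc, (key, value)|
    acc.gsub("{{#{key}}}") { value.to_s }
  end
end

template = "Hello {name}, your code is {{code}}."
converted = to_promptfoo_syntax(template)
rendered = pre_render(converted, {name: "Ada", code: 'a\1b'})

puts converted
puts rendered
```

With the string-replacement form of `gsub` (`gsub(pattern, value.to_s)`), a value containing `\1` or `\0` would be treated as a backreference, which is why the sketch passes the replacement as a block.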
@@ -0,0 +1,7 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Minitest
4
+ module Promptfoo
5
+ VERSION = "0.1.0"
6
+ end
7
+ end
@@ -0,0 +1,16 @@
1
+ # frozen_string_literal: true
2
+
3
+ require_relative "promptfoo/version"
4
+ require_relative "promptfoo/configuration"
5
+ require_relative "promptfoo/test"
6
+
7
+ # Auto-load Rails integration if Rails is detected
8
+ if defined?(Rails)
9
+ require_relative "promptfoo/rails"
10
+ end
11
+
12
+ module Minitest
13
+ module Promptfoo
14
+ class Error < StandardError; end
15
+ end
16
+ end
@@ -0,0 +1,6 @@
1
+ module Minitest
2
+ module Promptfoo
3
+ VERSION: String
4
+ # See the RBS writing guides: https://github.com/ruby/rbs#guides
5
+ end
6
+ end
metadata ADDED
@@ -0,0 +1,103 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: minitest-promptfoo
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Chris Waters
8
+ bindir: exe
9
+ cert_chain: []
10
+ date: 1980-01-02 00:00:00.000000000 Z
11
+ dependencies:
12
+ - !ruby/object:Gem::Dependency
13
+ name: minitest
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '5.0'
19
+ type: :runtime
20
+ prerelease: false
21
+ version_requirements: !ruby/object:Gem::Requirement
22
+ requirements:
23
+ - - "~>"
24
+ - !ruby/object:Gem::Version
25
+ version: '5.0'
26
+ - !ruby/object:Gem::Dependency
27
+ name: rake
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '13.0'
33
+ type: :development
34
+ prerelease: false
35
+ version_requirements: !ruby/object:Gem::Requirement
36
+ requirements:
37
+ - - "~>"
38
+ - !ruby/object:Gem::Version
39
+ version: '13.0'
40
+ - !ruby/object:Gem::Dependency
41
+ name: standard
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 1.35.1
47
+ type: :development
48
+ prerelease: false
49
+ version_requirements: !ruby/object:Gem::Requirement
50
+ requirements:
51
+ - - ">="
52
+ - !ruby/object:Gem::Version
53
+ version: 1.35.1
54
+ description: A thin Minitest wrapper around promptfoo that brings prompt testing to
55
+ Ruby projects. Test LLM prompts with a familiar Minitest-like DSL, supporting multiple
56
+ providers and assertion types.
57
+ email:
58
+ - chris.waters@shopify.com
59
+ executables: []
60
+ extensions: []
61
+ extra_rdoc_files: []
62
+ files:
63
+ - ".ruby-version"
64
+ - CHANGELOG.md
65
+ - LICENSE.txt
66
+ - README.md
67
+ - Rakefile
68
+ - examples/greeting.ptmpl
69
+ - examples/simple_prompt_test.rb
70
+ - lib/minitest/promptfoo.rb
71
+ - lib/minitest/promptfoo/assertion_builder.rb
72
+ - lib/minitest/promptfoo/configuration.rb
73
+ - lib/minitest/promptfoo/failure_formatter.rb
74
+ - lib/minitest/promptfoo/promptfoo_runner.rb
75
+ - lib/minitest/promptfoo/rails.rb
76
+ - lib/minitest/promptfoo/test.rb
77
+ - lib/minitest/promptfoo/version.rb
78
+ - sig/minitest/promptfoo.rbs
79
+ homepage: https://github.com/christhesoul/minitest-promptfoo
80
+ licenses:
81
+ - MIT
82
+ metadata:
83
+ homepage_uri: https://github.com/christhesoul/minitest-promptfoo
84
+ source_code_uri: https://github.com/christhesoul/minitest-promptfoo
85
+ changelog_uri: https://github.com/christhesoul/minitest-promptfoo/blob/main/CHANGELOG.md
86
+ rdoc_options: []
87
+ require_paths:
88
+ - lib
89
+ required_ruby_version: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - ">="
92
+ - !ruby/object:Gem::Version
93
+ version: 2.7.0
94
+ required_rubygems_version: !ruby/object:Gem::Requirement
95
+ requirements:
96
+ - - ">="
97
+ - !ruby/object:Gem::Version
98
+ version: '0'
99
+ requirements: []
100
+ rubygems_version: 3.6.7
101
+ specification_version: 4
102
+ summary: Minitest integration for promptfoo - test your LLM prompts with confidence
103
+ test_files: []
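The version constraints in the metadata above (`~> 5.0` for minitest, `>= 1.35.1` for standard) can be checked against concrete versions with RubyGems' own classes. This is a small sketch, assuming a stock Ruby with RubyGems available. The helper `allowed?` and the sample version numbers are illustrative and do not come from the package.

```ruby
require "rubygems"

# Constraints copied from the gemspec metadata above.
minitest = Gem::Requirement.new("~> 5.0")
standard = Gem::Requirement.new(">= 1.35.1")

# Hypothetical helper: does the requirement accept this version string?
def allowed?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allowed?(minitest, "5.25.4") # any 5.x release satisfies "~> 5.0"
puts allowed?(minitest, "6.0.0")  # the next major version does not
puts allowed?(standard, "1.35.1") # ">=" accepts the lower bound itself
```

The pessimistic operator `~> 5.0` is equivalent to `>= 5.0, < 6.0`, so the runtime dependency admits every Minitest 5 release but no later major version.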