minitest-promptfoo 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +26 -0
- data/LICENSE.txt +21 -0
- data/README.md +326 -0
- data/Rakefile +10 -0
- data/examples/greeting.ptmpl +3 -0
- data/examples/simple_prompt_test.rb +34 -0
- data/lib/minitest/promptfoo/assertion_builder.rb +79 -0
- data/lib/minitest/promptfoo/configuration.rb +49 -0
- data/lib/minitest/promptfoo/failure_formatter.rb +226 -0
- data/lib/minitest/promptfoo/promptfoo_runner.rb +66 -0
- data/lib/minitest/promptfoo/rails.rb +85 -0
- data/lib/minitest/promptfoo/test.rb +238 -0
- data/lib/minitest/promptfoo/version.rb +7 -0
- data/lib/minitest/promptfoo.rb +16 -0
- data/sig/minitest/promptfoo.rbs +6 -0
- metadata +103 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 1687874b77d0e965b186cf7f488579e65c9b318f9c24c0a888f9871aa20d30ff
+  data.tar.gz: 37315fe836966e9ebae13d72012140e3abd8a06c09456e492d68655896312e8b
+SHA512:
+  metadata.gz: 0a53f83579b00a493e0bf824a8a6cf784c13aefc4a5ef401efc3a59e5632178f7213380855093e6a7de319c6cbde0e4113d58bdf38dae04e3a1b0ff9e63b9f5c
+  data.tar.gz: f269bc6f6127042ea83118a262b794ddb288995533178290ce2dcc00b197548767bbef1397f2760bb65abb38ad0a8771932085ad63668e9b37a1ecfb5764dbc2
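A `.gem` file is a plain tar archive. Its members include `metadata.gz`, `data.tar.gz` and a gzipped `checksums.yaml.gz`, and the digests above cover the first two. The sketch below is one way to re-check them after extracting a downloaded gem (`tar -xf minitest-promptfoo-0.1.0.gem`, then `gunzip checksums.yaml.gz`). The helper name `verify_gem_checksums` is hypothetical, and only the SHA256 and SHA512 sections shown here are handled.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: re-computes the digests listed in a checksums.yaml
# document for files extracted from a .gem archive into `extracted_dir`.
# Returns e.g. { "SHA256" => { "metadata.gz" => true, "data.tar.gz" => false }, ... }
def verify_gem_checksums(checksums_yaml, extracted_dir)
  digesters = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }

  YAML.safe_load(checksums_yaml).each_with_object({}) do |(algorithm, entries), report|
    digester = digesters.fetch(algorithm) # KeyError for algorithms not handled here
    report[algorithm] = entries.to_h do |file, expected|
      actual = digester.file(File.join(extracted_dir, file)).hexdigest
      # to_s guards against YAML reading an all-digit value as an Integer
      [file, actual == expected.to_s]
    end
  end
end
```

Run from the extraction directory, `verify_gem_checksums(File.read("checksums.yaml"), ".")` should report `true` for every entry if the archive is intact.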
data/.ruby-version
ADDED
@@ -0,0 +1 @@
+3.4.3
data/CHANGELOG.md
ADDED
@@ -0,0 +1,26 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Added
+- Initial release of minitest-promptfoo
+- Core `Minitest::Promptfoo::Test` class for prompt testing
+- Configuration system for promptfoo executable path
+- Support for multiple providers
+- Assertion DSL: `includes`, `matches`, `equals`, `json_includes`, `javascript`, `rubric`
+- Rails integration with automatic prompt file discovery
+- Support for both .ptmpl and .liquid prompt formats
+- Pre-rendering support for template conflicts
+- Debug mode with `DEBUG_PROMPT_TEST` environment variable
+- Verbose mode for detailed failure messages
+- Comprehensive README with examples
+- Basic test coverage
+
+## [0.1.0] - TBD
+
+- Initial release
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2025 Chris Waters
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,326 @@
+# Minitest::Promptfoo
+
+A thin Minitest wrapper around [promptfoo](https://www.promptfoo.dev/) that brings prompt testing to Ruby projects. Test your LLM prompts with a familiar Minitest-like DSL, supporting multiple providers and assertion types.
+
+## Why Test Your Prompts?
+
+LLM outputs are non-deterministic, but that doesn't mean you can't test them. With minitest-promptfoo, you can:
+
+- Ensure prompts produce expected types of responses
+- Validate JSON structure in responses
+- Use LLM-as-judge for qualitative evaluation
+- Test against multiple providers simultaneously
+- Catch prompt regressions before they hit production
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'minitest-promptfoo'
+```
+
+And then execute:
+
+```bash
+$ bundle install
+```
+
+Or install it yourself as:
+
+```bash
+$ gem install minitest-promptfoo
+```
+
+### Promptfoo Setup
+
+You'll need promptfoo installed. You can either:
+
+1. Install it locally via npm:
+   ```bash
+   npm install -D promptfoo
+   ```
+
+2. Or use npx (no installation required):
+   ```bash
+   # The gem will automatically fall back to `npx promptfoo`
+   ```
+
+## Basic Usage
+
+### Plain Ruby Projects
+
+```ruby
+require 'minitest/autorun'
+require 'minitest/promptfoo'
+
+class GreetingPromptTest < Minitest::Promptfoo::Test
+  # Set provider(s) for all tests in this class
+  self.providers = "openai:gpt-4o-mini"
+
+  def prompt_path
+    "prompts/greeting.ptmpl" # Or .liquid
+  end
+
+  def test_generates_professional_greeting
+    assert_prompt(vars: { name: "Alice" }) do |response|
+      response.includes("Hello Alice")
+      response.matches(/[A-Z]/) # Starts with capital
+      response.rubric("Response is professional and courteous")
+    end
+  end
+
+  def test_validates_json_structure
+    assert_prompt(vars: { format: "json" }) do |response|
+      response.json_includes(key: "greeting", value: "Hello")
+      response.json_includes(key: "sentiment", value: "positive")
+    end
+  end
+end
+```
+
+### Rails Projects
+
+In Rails, the gem automatically discovers prompt files based on test file paths:
+
+```ruby
+# test/services/greeting_service_test.rb
+class GreetingServiceTest < Minitest::Promptfoo::RailsTest
+  self.providers = "openai:gpt-4o-mini"
+
+  # Automatically finds app/services/greeting_service.ptmpl
+  # No need to define prompt_path!
+
+  def test_greeting_is_friendly
+    assert_prompt(vars: { name: "Bob" }) do |response|
+      response.includes("Hello Bob")
+      response.rubric("Greeting is warm and welcoming", threshold: 0.7)
+    end
+  end
+end
+```
+
+## Configuration
+
+Configure the gem in your test helper or setup file:
+
+```ruby
+# test/test_helper.rb
+require 'minitest/promptfoo'
+
+Minitest::Promptfoo.configure do |config|
+  # Optional: specify custom promptfoo executable path
+  config.promptfoo_executable = "./node_modules/.bin/promptfoo"
+
+  # Optional: set root path for resolving prompt files
+  config.root_path = Rails.root # or Dir.pwd
+end
+```
+
+## Assertion Types
+
+### String Matching
+
+```ruby
+assert_prompt(vars: { topic: "weather" }) do |response|
+  # Contains substring
+  response.includes("sunny")
+
+  # Matches regex
+  response.matches(/\d+°[CF]/)
+
+  # Exact equality
+  response.equals("It's a beautiful day!")
+end
+```
+
+### JSON Validation
+
+```ruby
+assert_prompt(vars: { query: "status" }) do |response|
+  response.json_includes(key: "status", value: "success")
+  response.json_includes(key: "code", value: 200)
+end
+```
+
+### Custom JavaScript
+
+```ruby
+assert_prompt(vars: { count: 5 }) do |response|
+  response.javascript("parseInt(output) > 3")
+  response.javascript("output.split(' ').length <= 10")
+end
+```
+
+### LLM-as-Judge
+
+```ruby
+assert_prompt(vars: { tone: "professional" }) do |response|
+  response.rubric("Response is professional and courteous")
+  response.rubric("Uses business-appropriate language", threshold: 0.8)
+end
+```
+
+## Multiple Providers
+
+Test your prompt across multiple providers:
+
+```ruby
+class MultiProviderTest < Minitest::Promptfoo::Test
+  self.providers = [
+    "openai:gpt-4o-mini",
+    "openai:chat:anthropic:claude-3-7-sonnet",
+    "openai:chat:google:gemini-2.0-flash"
+  ]
+
+  def prompt_path
+    "prompts/greeting.ptmpl"
+  end
+
+  def test_works_across_providers
+    assert_prompt(vars: { name: "Alice" }) do |response|
+      response.includes("Alice")
+    end
+  end
+end
+```
+
+## Provider Configuration
+
+Pass custom configuration to providers:
+
+```ruby
+def test_json_response_format
+  json_provider = {
+    id: "openai:gpt-4o-mini",
+    config: {
+      response_format: { type: "json_object" },
+      temperature: 0.7
+    }
+  }
+
+  assert_prompt(vars: { input: "data" }, providers: json_provider) do |response|
+    response.json_includes(key: "result", value: "success")
+  end
+end
+```
+
+## Prompt File Formats
+
+### Promptfoo Templates (.ptmpl)
+
+Use double-brace syntax for variables:
+
+```
+You are a helpful assistant.
+
+Greet the user named {{name}} in a {{tone}} manner.
+```
+
+### Liquid Templates (.liquid)
+
+Standard Liquid syntax (converted internally):
+
+```
+You are a helpful assistant.
+
+Greet the user named {name} in a {tone} manner.
+```
+
+## Pre-rendering Templates
+
+If your prompt contains syntax that conflicts with promptfoo's templating (like analyzing Liquid code), pre-render it:
+
+```ruby
+def test_liquid_code_analysis
+  assert_prompt(
+    vars: { code: "{{user.name | upcase}}" },
+    pre_render: true
+  ) do |response|
+    response.includes("variable interpolation")
+  end
+end
+```
+
+## Debugging
+
+Enable debug output to see the generated promptfoo config:
+
+```bash
+DEBUG_PROMPT_TEST=1 bundle exec rake test
+```
+
+Or enable verbose mode for detailed failure messages:
+
+```ruby
+assert_prompt(vars: { name: "Alice" }, verbose: true) do |response|
+  response.rubric("Be friendly")
+end
+```
+
+## Real-World Example
+
+```ruby
+class CustomerSupportPromptTest < Minitest::Promptfoo::Test
+  self.providers = "openai:gpt-4o-mini"
+
+  def prompt_path
+    "prompts/customer_support.ptmpl"
+  end
+
+  def test_handles_refund_request_professionally
+    assert_prompt(vars: {
+      issue: "item arrived damaged",
+      customer_name: "Jane Doe"
+    }) do |response|
+      response.includes("Jane")
+      response.rubric("Acknowledges the issue empathetically")
+      response.rubric("Offers clear next steps")
+      response.rubric("Maintains professional tone")
+      response.matches(/refund|replacement/i)
+    end
+  end
+
+  def test_escalates_complex_issues
+    assert_prompt(vars: {
+      issue: "legal complaint about data breach",
+      customer_name: "John Smith"
+    }) do |response|
+      response.rubric("Recognizes this requires escalation")
+      response.rubric("Does not make promises outside of AI's authority")
+      response.includes("escalate")
+    end
+  end
+end
+```
+
+## Differences from ActiveSupport::TestCase
+
+When using `Minitest::Promptfoo::Test` (non-Rails), note these differences:
+
+- No fixtures or setup helpers from Rails
+- Must explicitly define `prompt_path`
+- No automatic database transaction rollbacks
+- Uses plain Minitest assertions
+
+For Rails projects, use `Minitest::Promptfoo::RailsTest` to get all Rails testing features plus automatic prompt discovery.
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/christhesoul/minitest-promptfoo.
+
+## License
+
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+## Credits
+
+Built with love on top of:
+- [promptfoo](https://www.promptfoo.dev/) - The excellent prompt testing framework
+- [minitest](https://github.com/minitest/minitest) - Ruby's favorite testing library
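The README's Rails section says prompt files are discovered from the test file path. Judging from `resolve_prompt_path_rails` in `rails.rb` further down, the rule works like this:

1. Take the directory of the file that defines the test method.
2. Replace its last `test/` directory segment with `app/`.
3. Strip the `_test.rb` suffix.
4. Try `.ptmpl`, then `.liquid`.

The standalone sketch below reproduces only that string transformation. The helper name `prompt_candidates` is hypothetical. The gem itself also checks `File.exist?` on each candidate and raises `PromptNotFoundError` when none exists.

```ruby
# Hypothetical standalone version of the path mapping used by
# Minitest::Promptfoo::Rails#resolve_prompt_path_rails (no File.exist? check).
def prompt_candidates(test_file_path)
  test_dir = File.dirname(test_file_path)
  test_basename = File.basename(test_file_path, "_test.rb")

  # Greedy (.*/) makes this swap the last "test/" directory segment for "app/".
  app_dir = test_dir.gsub(%r{^(.*/)?test/}, '\1app/')

  [".ptmpl", ".liquid"].map { |ext| File.join(app_dir, "#{test_basename}#{ext}") }
end

p prompt_candidates("/srv/shop/test/services/greeting_service_test.rb")
# => ["/srv/shop/app/services/greeting_service.ptmpl", "/srv/shop/app/services/greeting_service.liquid"]
```

One consequence: a test file placed directly in `test/` (for example `test/greeting_test.rb`) has the directory `.../test`, with no trailing slash. The pattern therefore does not match, and the lookup stays inside `test/`.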
data/examples/simple_prompt_test.rb
ADDED
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+
+# Example usage of minitest-promptfoo
+#
+# To run this example:
+# 1. Create a prompt file at examples/greeting.ptmpl
+# 2. Run: ruby examples/simple_prompt_test.rb
+
+require "bundler/setup"
+require "minitest/autorun"
+require "minitest/promptfoo"
+
+class SimplePromptTest < Minitest::Promptfoo::Test
+  # Use the echo provider for testing (doesn't call actual LLMs)
+  self.providers = "echo"
+
+  def prompt_path
+    File.join(__dir__, "greeting.ptmpl")
+  end
+
+  def test_prompt_includes_name
+    assert_prompt(vars: {name: "Alice", tone: "friendly"}) do |response|
+      response.includes("Alice")
+      response.includes("friendly")
+    end
+  end
+
+  def test_prompt_with_different_tone
+    assert_prompt(vars: {name: "Bob", tone: "professional"}) do |response|
+      response.includes("Bob")
+      response.includes("professional")
+    end
+  end
+end
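The example above uses promptfoo's `echo` provider, which returns the prompt itself as the output instead of calling a model. So `includes("friendly")` passes only because the template interpolates `tone` into the prompt text. `examples/greeting.ptmpl` is listed in this release (3 lines), but its content is not shown here. The sketch below therefore assumes it matches the README's `.ptmpl` sample. It also replaces promptfoo's template rendering with a bare `{{var}}` substitution (no filters or tags), so treat it as an illustration rather than a faithful renderer.

```ruby
# Assumed template body: copied from the README's .ptmpl sample, not from
# the (unshown) examples/greeting.ptmpl.
TEMPLATE = <<~PROMPT
  You are a helpful assistant.

  Greet the user named {{name}} in a {{tone}} manner.
PROMPT

# Simplified stand-in for promptfoo's template rendering: replaces plain
# {{var}} placeholders only. Raises KeyError if a variable is missing.
def render_prompt(template, vars)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) { vars.fetch(Regexp.last_match(1).to_sym).to_s }
end

# With the echo provider, the "response" the assertions inspect is this string.
echoed = render_prompt(TEMPLATE, { name: "Alice", tone: "friendly" })
puts echoed
```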
data/lib/minitest/promptfoo/assertion_builder.rb
ADDED
@@ -0,0 +1,79 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Minitest
+  module Promptfoo
+    # DSL for building promptfoo assertions in a minitest-like style
+    #
+    # Example:
+    #   builder = AssertionBuilder.new
+    #   builder.includes("Hello")
+    #   builder.matches(/\d+/)
+    #   builder.rubric("Response is professional")
+    #   builder.to_promptfoo_assertions
+    class AssertionBuilder
+      def initialize
+        @assertions = []
+      end
+
+      # String inclusion check
+      def includes(text)
+        @assertions << {
+          "type" => "contains",
+          "value" => text
+        }
+      end
+
+      # Regex pattern matching
+      def matches(pattern)
+        @assertions << {
+          "type" => "regex",
+          "value" => pattern.source
+        }
+      end
+
+      # Exact equality check
+      def equals(expected)
+        @assertions << {
+          "type" => "equals",
+          "value" => expected
+        }
+      end
+
+      # JSON structure validation using JavaScript
+      def json_includes(key:, value:)
+        @assertions << {
+          "type" => "is-json"
+        }
+        # Handle both string output (needs parsing) and object output (already parsed)
+        @assertions << {
+          "type" => "javascript",
+          "value" => "(typeof output === 'string' ? JSON.parse(output) : output)[#{key.inspect}] === #{value.to_json}"
+        }
+      end
+
+      # Custom JavaScript assertion
+      def javascript(js_code)
+        @assertions << {
+          "type" => "javascript",
+          "value" => js_code
+        }
+      end
+
+      # LLM-as-judge rubric evaluation
+      def rubric(criteria, threshold: 0.5)
+        @assertions << {
+          "type" => "llm-rubric",
+          "value" => criteria,
+          "threshold" => threshold
+        }
+      end
+
+      # Convert to promptfoo assertion format
+      def to_promptfoo_assertions
+        @assertions
+      end
+    end
+  end
+end
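Two details of the generated assertions are easy to miss.

- `matches` sends only `Regexp#source`, so Ruby options never reach promptfoo. For example, the `i` in the README's `/refund|replacement/i` is dropped. JavaScript regex syntax also differs from Ruby's in places: `\A` and `\z`, for instance, are not anchors in JavaScript.
- `json_includes` quotes the key with `key.inspect`. For a Symbol this yields `:greeting`, which is not valid JavaScript.

The sketch below shows one possible workaround for each. Both helper names are hypothetical and not part of the gem. The flag-preserving path assumes promptfoo's `javascript` assertion exposes the model output as `output`, which the gem's own `json_includes` expression already relies on.

```ruby
require "json"

# Hypothetical helper: builds a promptfoo assertion hash from a Ruby Regexp.
# Plain patterns keep the gem's "regex" form; patterns with /i or /m become a
# "javascript" assertion so the flags survive.
def regex_assertion(pattern)
  if (pattern.options & Regexp::EXTENDED).nonzero?
    raise ArgumentError, "/x has no JavaScript equivalent"
  end

  flags = []
  flags << "i" if (pattern.options & Regexp::IGNORECASE).nonzero?
  flags << "s" if (pattern.options & Regexp::MULTILINE).nonzero? # Ruby /m is JS dotAll

  if flags.empty?
    { "type" => "regex", "value" => pattern.source }
  else
    {
      "type" => "javascript",
      "value" => "new RegExp(#{pattern.source.to_json}, #{flags.join.to_json}).test(String(output))"
    }
  end
end

# Hypothetical key quoting for json_includes: a JSON string literal is also a
# JavaScript string literal, and Symbol keys are converted to strings first.
def json_key_literal(key)
  key.to_s.to_json
end
```

Anchoring also matters for the README's `matches(/[A-Z]/) # Starts with capital`. Without `^`, that pattern succeeds if a capital letter appears anywhere in the response.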
data/lib/minitest/promptfoo/configuration.rb
ADDED
@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+
+module Minitest
+  module Promptfoo
+    class Configuration
+      attr_accessor :promptfoo_executable, :root_path
+
+      def initialize
+        @promptfoo_executable = nil
+        @root_path = Dir.pwd
+      end
+
+      # Resolves the promptfoo executable path
+      # Priority: configured path > npx promptfoo
+      def resolve_executable
+        return promptfoo_executable if promptfoo_executable && executable_exists?(promptfoo_executable)
+
+        # Try local node_modules
+        local_bin = File.join(root_path, "node_modules", ".bin", "promptfoo")
+        return local_bin if executable_exists?(local_bin)
+
+        # Fall back to npx
+        "npx promptfoo"
+      end
+
+      private
+
+      def executable_exists?(path)
+        File.exist?(path) && File.executable?(path)
+      end
+    end
+
+    class << self
+      attr_writer :configuration
+
+      def configuration
+        @configuration ||= Configuration.new
+      end
+
+      def configure
+        yield(configuration)
+      end
+
+      def reset_configuration!
+        @configuration = Configuration.new
+      end
+    end
+  end
+end
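The comment on `resolve_executable` lists two sources, but the method checks three, in order:

1. The configured `promptfoo_executable`, used only if that file exists and is executable.
2. `node_modules/.bin/promptfoo` under `root_path`.
3. The literal string `npx promptfoo`.

The standalone sketch below mirrors that order so it can be exercised against a temporary directory. `resolve_promptfoo` is a hypothetical name, not the gem's API.

```ruby
# Hypothetical standalone mirror of Configuration#resolve_executable.
# Lookup order:
#   1. the configured executable, if it exists and is executable
#   2. <root_path>/node_modules/.bin/promptfoo
#   3. the string "npx promptfoo" (split into argv later by PromptfooRunner)
def resolve_promptfoo(configured, root_path)
  runnable = ->(path) { File.exist?(path) && File.executable?(path) }

  return configured if configured && runnable.call(configured)

  local_bin = File.join(root_path, "node_modules", ".bin", "promptfoo")
  return local_bin if runnable.call(local_bin)

  "npx promptfoo"
end

puts resolve_promptfoo(nil, Dir.pwd)
```

Step 1 only accepts an existing executable file, so a configured value that is a command line rather than a path (for example `"bunx promptfoo"`) is ignored, and resolution falls through to steps 2 and 3.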
data/lib/minitest/promptfoo/failure_formatter.rb
ADDED
@@ -0,0 +1,226 @@
+# frozen_string_literal: true
+
+require "json"
+
+module Minitest
+  module Promptfoo
+    # Formats promptfoo test failures into human-readable error messages
+    class FailureFormatter
+      def initialize(verbose: false)
+        @verbose = verbose
+      end
+
+      # Main entry point: formats a complete failure message from promptfoo results
+      def format_results(passing_providers, failing_providers)
+        msg = "Prompt evaluation results:\n"
+
+        passing_providers.each do |provider_id|
+          msg += " ✓ #{provider_id}\n"
+        end
+
+        failing_providers.each do |failure|
+          msg += " ✗ #{failure[:id]}\n"
+        end
+
+        msg += "\n"
+
+        failing_providers.each do |failure|
+          msg += format_provider_failure(failure[:id], failure[:result])
+          msg += "\n"
+        end
+
+        unless @verbose
+          msg += "💡 Tip: Add `verbose: true` to assert_prompt for detailed debugging output\n"
+        end
+
+        msg
+      end
+
+      private
+
+      def format_provider_failure(provider_id, provider_result)
+        output_text = provider_result.dig("response", "output") || provider_result.dig("output")
+        error = provider_result.dig("error") || provider_result.dig("response", "error")
+        grading_result = provider_result.dig("gradingResult") || {}
+        component_results = grading_result.dig("componentResults") || []
+
+        msg = "#{provider_id} FAILED:\n\n"
+
+        msg += format_api_error(error) if error&.length&.positive?
+        msg += format_response_output(output_text, error)
+
+        assertion_failures = extract_assertion_failures(component_results)
+        msg += format_assertion_failures(assertion_failures, output_text) if assertion_failures.any?
+
+        msg += format_verbose_output(provider_result) if @verbose
+
+        msg
+      end
+
+      def format_api_error(error)
+        "API Error:\n #{error}\n\n"
+      end
+
+      def format_response_output(output_text, error)
+        if output_text && output_text.to_s.length > 0
+          formatted_output = output_text.is_a?(String) ? output_text : JSON.pretty_generate(output_text)
+          "Response:\n #{formatted_output.gsub("\n", "\n ")}\n\n"
+        elsif !error || error.length == 0
+          "No response received from provider\n\n"
+        else
+          ""
+        end
+      end
+
+      def format_assertion_failures(assertion_failures, output_text)
+        msg = "Failures:\n"
+
+        # If JSON parsing failed, only show that error (other failures are consequences)
+        json_parse_failure = assertion_failures.find { |f| f[:type] == "is-json" }
+
+        if json_parse_failure
+          msg += format_assertion_failure(json_parse_failure, output_text)
+        else
+          assertion_failures.each do |failure|
+            msg += format_assertion_failure(failure, output_text)
+          end
+        end
+
+        msg
+      end
+
+      def format_verbose_output(provider_result)
+        "\nRaw Provider Result (verbose mode):\n" \
+          " #{JSON.pretty_generate(provider_result).gsub("\n", "\n ")}\n"
+      end
+
+      def extract_assertion_failures(component_results)
+        component_results.select { |result| !result.dig("pass") }.map do |result|
+          {
+            type: result.dig("assertion", "type"),
+            value: result.dig("assertion", "value"),
+            threshold: result.dig("assertion", "threshold"),
+            score: result.dig("score"),
+            reason: result.dig("reason"),
+            named_scores: result.dig("namedScores")
+          }
+        end
+      end
+
+      def format_assertion_failure(failure, output_text)
+        case failure[:type]
+        when "llm-rubric"
+          format_rubric_failure(failure)
+        when "contains"
+          " ✗ includes(#{failure[:value].inspect}) - not found in response\n"
+        when "regex"
+          " ✗ matches(/#{failure[:value]}/) - pattern not found\n"
+        when "equals"
+          " ✗ equals(#{failure[:value].inspect}) - response does not match\n"
+        when "javascript"
+          format_javascript_failure(failure, output_text)
+        when "is-json"
+          format_invalid_json_failure(failure, output_text)
+        else
+          " ✗ #{failure[:type]} assertion failed\n"
+        end
+      end
+
+      def format_javascript_failure(failure, output_text)
+        js_code = failure[:value].to_s
+
+        if json_assertion?(js_code)
+          parsed = parse_json_assertion(js_code)
+          if parsed
+            key = parsed[:key]
+            expected = parsed[:expected]
+            actual_value = extract_json_value(output_text, key.to_s)
+            msg = " ✗ json_includes(key: #{key.inspect})\n"
+            msg += " Expected: #{expected.inspect}\n"
+            msg += " Actual: #{actual_value.inspect}\n"
+            return msg
+          end
+        end
+
+        " ✗ javascript assertion failed\n"
+      end
+
+      def format_invalid_json_failure(failure, output_text)
+        msg = " ✗ response is not valid JSON\n"
+
+        if output_text && output_text.to_s.length > 0
+          text = output_text.is_a?(String) ? output_text : JSON.pretty_generate(output_text)
+          snippet = (text.length > 100) ? "#{text[0..100]}..." : text
+          msg += " Output: #{snippet.inspect}\n"
+        end
+
+        msg
+      end
+
+      def format_rubric_failure(failure)
+        score = failure[:score] || 0
+        threshold = failure[:threshold] || 0.5
+
+        msg = " ✗ rubric (score: #{score.round(2)}/#{threshold})\n"
+        if score >= threshold
+          msg += " Note: Score meets threshold but one or more criteria failed\n"
+          msg += " Promptfoo requires ALL criteria to pass, not just the aggregate score\n"
+        end
+
+        if @verbose
+          criteria = failure[:value]
+          reason = failure[:reason]
+
+          if criteria && criteria.to_s.length > 0
+            msg += "\n Rubric criteria:\n"
+            criteria.split("\n").each do |line|
+              msg += " #{line}\n" if line.strip.length > 0
+            end
+          end
+
+          if reason && reason.to_s.length > 0
+            msg += "\n Judge feedback:\n"
+            reason.split("\n").each do |line|
+              msg += " #{line}\n"
+            end
+          end
+
+          msg += "\n"
+        end
+
+        msg
+      end
+
+      # JSON assertion helpers
+
+      def json_assertion?(js_code)
+        js_code.to_s.match?(/JSON\.parse\(output\)\[/)
+      end
+
+      def parse_json_assertion(js_code)
+        match = js_code.match(/JSON\.parse\(output\)\[(['"])(.+?)\1\]\s*===\s*(.+)/)
+        return nil unless match
+
+        key = match[2]
+        expected_json = match[3]
+
+        expected_value = begin
+          JSON.parse(expected_json)
+        rescue JSON::ParserError
+          expected_json
+        end
+
+        {key: key, expected: expected_value}
+      end
+
+      def extract_json_value(output_text, key)
+        return nil unless output_text && output_text.to_s.length > 0
+
+        parsed = output_text.is_a?(String) ? JSON.parse(output_text) : output_text
+        parsed[key]
+      rescue JSON::ParserError
+        nil
+      end
+    end
+  end
+end
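`format_javascript_failure` prints its `Expected:`/`Actual:` breakdown only when `json_assertion?` finds the substring `JSON.parse(output)[`. The expression emitted by `AssertionBuilder#json_includes`, however, reads `(typeof output === 'string' ? JSON.parse(output) : output)[...]`. As published, that branch therefore appears never to fire for builder-generated assertions, and those failures fall through to the generic `javascript assertion failed` line. The sketch below checks that claim and shows one hypothetical pattern that accepts both spellings. The constant and method names are illustrative only.

```ruby
require "json"

# Detection pattern as published in FailureFormatter#json_assertion?
PUBLISHED_DETECTION = /JSON\.parse\(output\)\[/

# Hypothetical replacement: also allows the " : output)" tail of the ternary
# that AssertionBuilder#json_includes generates.
TOLERANT_JSON_ASSERTION = /JSON\.parse\(output\)(?: : output\))?\[(['"])(.+?)\1\]\s*===\s*(.+)/

# Reproduces the string AssertionBuilder#json_includes builds for a key/value pair.
def builder_expression(key, value)
  "(typeof output === 'string' ? JSON.parse(output) : output)[#{key.inspect}] === #{value.to_json}"
end

# Same shape as FailureFormatter#parse_json_assertion, using the tolerant pattern.
def parse_json_assertion(js_code)
  match = js_code.match(TOLERANT_JSON_ASSERTION)
  return nil unless match

  expected = begin
    JSON.parse(match[3]) # json 2.x accepts scalar documents such as 200 or "ok"
  rescue JSON::ParserError
    match[3]
  end
  { key: match[2], expected: expected }
end

js = builder_expression("code", 200)
puts js.match?(PUBLISHED_DETECTION) # false: the published check misses it
p parse_json_assertion(js)
```

Because the tolerant pattern keeps the same capture groups, it could stand in for both the detection and the parsing regex without changing how the key and expected value are extracted.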
data/lib/minitest/promptfoo/promptfoo_runner.rb
ADDED
@@ -0,0 +1,66 @@
+# frozen_string_literal: true
+
+require "open3"
+require "json"
+
+module Minitest
+  module Promptfoo
+    # Handles execution of the promptfoo CLI and parsing of results
+    class PromptfooRunner
+      class ExecutionError < StandardError; end
+
+      def initialize(configuration)
+        @configuration = configuration
+      end
+
+      # Executes promptfoo CLI with the given config and options
+      # Returns a hash with :success, :stdout, :stderr keys
+      def execute(config_path, working_dir, pre_render: false, show_output: false)
+        env_vars = build_env_vars(pre_render: pre_render)
+        cmd = build_command(config_path)
+
+        if show_output
+          execute_with_output(env_vars, cmd, working_dir)
+        else
+          execute_silently(env_vars, cmd, working_dir)
+        end
+      end
+
+      # Parses promptfoo JSON output file
+      def parse_output(output_path)
+        return {} unless File.exist?(output_path)
+
+        JSON.parse(File.read(output_path))
+      rescue JSON::ParserError => e
+        raise ExecutionError, "Failed to parse promptfoo output: #{e.message}"
+      end
+
+      private
+
+      def build_env_vars(pre_render:)
+        pre_render ? {"PROMPTFOO_DISABLE_TEMPLATING" => "true"} : {}
+      end
+
+      def build_command(config_path)
+        base_cmd = @configuration.resolve_executable
+        args = ["eval", "-c", config_path, "--no-cache"]
+
+        if base_cmd.start_with?("npx")
+          base_cmd.split + args
+        else
+          [base_cmd] + args
+        end
+      end
+
+      def execute_with_output(env_vars, cmd, working_dir)
+        success = system(env_vars, *cmd, chdir: working_dir)
+        {success: success, stdout: "", stderr: ""}
+      end
+
+      def execute_silently(env_vars, cmd, working_dir)
+        stdout, stderr, status = Open3.capture3(env_vars, *cmd, chdir: working_dir)
+        {success: status.success?, stdout: stdout, stderr: stderr}
+      end
+    end
+  end
+end
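`PromptfooRunner` passes the environment hash and the `chdir:` option straight through to `Kernel#system` or `Open3.capture3`, with the command splatted as separate argv entries, so no shell is involved. It splits the executable string on whitespace only when it starts with `npx`. The sketch below exercises the same two techniques without promptfoo installed, running the current Ruby interpreter (`RbConfig.ruby`) in place of the CLI. `build_argv` and `run_captured` are hypothetical names that mirror `build_command` and `execute_silently`.

```ruby
require "open3"
require "rbconfig"
require "tmpdir"

# Hypothetical mirror of PromptfooRunner#build_command: only an "npx ..."
# string is split on whitespace; any other executable stays one argv entry.
def build_argv(base_cmd, config_path)
  args = ["eval", "-c", config_path, "--no-cache"]
  base_cmd.start_with?("npx") ? base_cmd.split + args : [base_cmd] + args
end

# Hypothetical mirror of PromptfooRunner#execute_silently: env hash first,
# argv splatted (no shell), chdir passed through as a spawn option.
def run_captured(env_vars, argv, working_dir)
  stdout, stderr, status = Open3.capture3(env_vars, *argv, chdir: working_dir)
  { success: status.success?, stdout: stdout, stderr: stderr }
end

# Stand the Ruby interpreter in for the promptfoo CLI and report what the
# child process sees for the pre_render env var and its working directory.
script = 'print ENV.fetch("PROMPTFOO_DISABLE_TEMPLATING", "unset"), "|", Dir.pwd'
result = run_captured({ "PROMPTFOO_DISABLE_TEMPLATING" => "true" }, [RbConfig.ruby, "-e", script], Dir.tmpdir)
flag, cwd = result[:stdout].split("|", 2)
puts "PROMPTFOO_DISABLE_TEMPLATING=#{flag}, cwd=#{cwd}"
```

Passing argv as separate elements means a configured executable path containing spaces is handed to the OS unchanged, with no shell quoting involved.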
data/lib/minitest/promptfoo/rails.rb
ADDED
@@ -0,0 +1,85 @@
+# frozen_string_literal: true
+
+module Minitest
+  module Promptfoo
+    # Rails integration for automatic prompt file discovery
+    #
+    # Automatically discovers .ptmpl or .liquid prompt files based on Rails conventions:
+    #   app/services/foo/bar.ptmpl → test/services/foo/bar_test.rb
+    #
+    # Usage:
+    #   class MyPromptTest < Minitest::Promptfoo::RailsTest
+    #     # No need to define prompt_path, it's auto-discovered!
+    #
+    #     test "generates greeting" do
+    #       assert_prompt(vars: { name: "Alice" }) do |response|
+    #         response.includes("Hello Alice")
+    #       end
+    #     end
+    #   end
+    module Rails
+      def self.included(base)
+        base.class_eval do
+          # Override prompt_path to use Rails convention-based discovery
+          def prompt_path
+            @prompt_path ||= resolve_prompt_path_rails
+          end
+
+          private
+
+          def resolve_prompt_path_rails
+            test_file_path = method(name).source_location[0]
+            test_dir = File.dirname(test_file_path)
+            test_basename = File.basename(test_file_path, "_test.rb")
+
+            app_dir = test_dir.gsub(%r{^(.*/)?test/}, '\1app/')
+
+            [".ptmpl", ".liquid"].each do |ext|
+              candidate = File.join(app_dir, "#{test_basename}#{ext}")
+              return candidate if File.exist?(candidate)
+            end
+
+            raise PromptNotFoundError, "Could not find prompt file for #{test_file_path}"
+          end
+        end
+      end
+    end
+
+    # Convenience class that combines Test + Rails integration
+    # Inherits from ActiveSupport::TestCase if available, otherwise Minitest::Test
+    if defined?(ActiveSupport::TestCase)
+      class RailsTest < ActiveSupport::TestCase
+        include Minitest::Promptfoo::Rails
+
+        # Borrow all the assertion methods from Test
+        # but keep ActiveSupport::TestCase as the base
+        include Minitest::Promptfoo::Test.instance_methods(false).each_with_object(Module.new) { |m, mod|
+          mod.define_method(m, Minitest::Promptfoo::Test.instance_method(m))
+        }
+
+        # Include class methods
+        class << self
+          attr_accessor :_providers
+
+          def providers
+            @_providers || "echo"
+          end
+
+          def providers=(value)
+            @_providers = value
+          end
+
+          def inherited(subclass)
+            super
+            subclass._providers = _providers
75
|
+
end
|
|
76
|
+
end
|
|
77
|
+
end
|
|
78
|
+
else
|
|
79
|
+
# Fallback if ActiveSupport isn't available
|
|
80
|
+
class RailsTest < Test
|
|
81
|
+
include Rails
|
|
82
|
+
end
|
|
83
|
+
end
|
|
84
|
+
end
|
|
85
|
+
end
|
|
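The convention-based lookup in `resolve_prompt_path_rails` boils down to a pure path transformation. The sketch below reproduces it without touching the filesystem; `sketch_prompt_candidates` is a hypothetical helper and the example paths are invented. It uses `sub` where the original uses `gsub`; with the `^` anchor the pattern can match only once in a path without newlines, so the result is the same.

```ruby
# Hypothetical pure-function version of the mapping done in resolve_prompt_path_rails.
# It only computes candidate paths; it never checks whether they exist.
def sketch_prompt_candidates(test_file_path)
  test_dir = File.dirname(test_file_path)
  test_basename = File.basename(test_file_path, "_test.rb")

  # The greedy (.*/)? means the LAST "test/" directory segment is swapped for "app/"
  app_dir = test_dir.sub(%r{^(.*/)?test/}, '\1app/')

  # Same order as the original: .ptmpl is tried before .liquid
  [".ptmpl", ".liquid"].map { |ext| File.join(app_dir, "#{test_basename}#{ext}") }
end

p sketch_prompt_candidates("/srv/shop/test/services/foo/bar_test.rb")
p sketch_prompt_candidates("test/services/foo/bar_test.rb")
```

A test file that sits directly in `test/` (so its directory has no trailing `test/` segment) is not rewritten by this pattern, and the lookup would then search next to the test file itself.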
@@ -0,0 +1,238 @@
# frozen_string_literal: true

require "yaml"
require "tmpdir"
require "minitest/test"
require_relative "assertion_builder"
require_relative "failure_formatter"
require_relative "promptfoo_runner"

module Minitest
  module Promptfoo
    # Base class for testing LLM prompts using promptfoo.
    #
    # Recommended Usage (Minitest-like DSL):
    #   class MyPromptTest < Minitest::Promptfoo::Test
    #     # Set provider(s) for ALL tests in this class (DRY!)
    #     # Providers can be strings or hashes with config (see promptfoo docs)
    #     self.providers = [
    #       "openai:gpt-4o-mini", # Simple string format
    #       {
    #         id: "openai:chat:anthropic:claude-3-7-sonnet",
    #         config: { response_format: { type: "json_object" } } # With config
    #       }
    #     ]
    #
    #     def prompt_path
    #       "prompts/greeting.ptmpl" # Or .liquid
    #     end
    #
    #     # Plain Minitest::Test has no `test "..."` class method, so use def test_*
    #     def test_generates_professional_greeting
    #       assert_prompt(vars: { name: "Alice" }) do |response|
    #         response.includes("Hello Alice")
    #         response.matches(/^[A-Z]/) # Starts with a capital letter
    #         response.rubric("Response is professional and courteous")
    #       end
    #     end
    #   end
    class Test < Minitest::Test
      class PromptNotFoundError < StandardError; end
      class EvaluationError < StandardError; end

      # Class-level configuration
      class << self
        def debug?
          ENV["DEBUG_PROMPT_TEST"] == "1"
        end

        def providers
          @providers || "echo"
        end

        attr_writer :providers

        def inherited(subclass)
          super
          subclass.providers = providers if defined?(@providers)
        end
      end

      def prompt_path
        raise NotImplementedError, "#{self.class}#prompt_path must be implemented"
      end

      def prompt_content
        @prompt_content ||= begin
          path = prompt_path
          raise PromptNotFoundError, "Prompt file not found: #{path}" unless File.exist?(path)
          File.read(path, encoding: "UTF-8")
        end
      end

      # Minitest-like DSL for prompt testing
      #
      # Example:
      #   assert_prompt(vars: { input: "test" }) do |response|
      #     response.includes("expected text")
      #     response.matches(/\d{3}-\d{4}/)
      #     response.rubric("Response is professional and courteous")
      #   end
      def assert_prompt(vars:, providers: nil, verbose: false, pre_render: false, &block)
        builder = AssertionBuilder.new
        yield(builder)

        output = evaluate_prompt(
          prompt_text: prompt_content,
          vars: vars,
          providers: providers,
          assertions: builder.to_promptfoo_assertions,
          verbose: verbose,
          pre_render: pre_render
        )

        # Real assertion: verify promptfoo produced results
        assert(output.any?, "Promptfoo evaluation produced no output")

        output
      end

      def evaluate_prompt(prompt_text:, vars:, providers: nil, assertions: [], pre_render: false, verbose: false, show_output: false)
        Dir.mktmpdir do |tmpdir|
          config_path = File.join(tmpdir, "promptfooconfig.yaml")
          output_path = File.join(tmpdir, "output.json")

          # Convert single-brace {var} syntax to double-brace {{var}} for promptfoo
          promptfoo_text = prompt_text.gsub(/(?<!\{)\{(\w+)\}(?!\})/, '{{\1}}')

          if pre_render
            vars.each do |key, value|
              # Block form inserts the value literally (no \0 or \1 back-reference expansion)
              promptfoo_text = promptfoo_text.gsub("{{#{key}}}") { value.to_s }
            end
            config_vars = {}
          else
            config_vars = vars
          end

          # Use provided provider(s) or fall back to class-level default
          providers_array = wrap_array(providers || self.class.providers)

          config = build_promptfoo_config(
            prompt: promptfoo_text,
            vars: config_vars,
            providers: providers_array,
            assertions: assertions,
            output_path: output_path
          )

          config_yaml = YAML.dump(config)
          File.write(config_path, config_yaml)

          debug("Promptfoo Config", config_yaml)

          runner = PromptfooRunner.new(Minitest::Promptfoo.configuration)
          result = runner.execute(config_path, tmpdir, show_output: show_output, pre_render: pre_render)

          debug("Promptfoo Result", result.inspect)

          output = runner.parse_output(output_path)

          unless result[:success] || output.any?
            raise EvaluationError, <<~ERROR
              promptfoo evaluation failed
              STDOUT: #{result[:stdout]}
              STDERR: #{result[:stderr]}
            ERROR
          end

          check_provider_failures(output, providers_array, verbose: verbose) if assertions.any?

          output
        end
      end

      private

      def check_provider_failures(output, providers, verbose: false)
        results = output.dig("results", "results") || []
        passing_providers = []
        failing_providers = []

        results.each do |provider_result|
          provider_id = provider_result.dig("provider", "id")
          success = provider_result["success"]

          if success
            passing_providers << provider_id
          else
            failing_providers << {
              id: provider_id,
              result: provider_result
            }
          end
        end

        if failing_providers.any?
          formatter = FailureFormatter.new(verbose: verbose)
          error_msg = formatter.format_results(passing_providers, failing_providers)
          flunk(error_msg)
        end
      end

      def build_promptfoo_config(prompt:, vars:, providers:, assertions:, output_path:)
        normalized_providers = providers.map do |provider|
          case provider
          when String
            provider
          when Hash
            deep_stringify_keys(provider)
          end
        end

        {
          "prompts" => [prompt],
          "providers" => normalized_providers,
          "tests" => [
            {
              "vars" => vars.transform_keys(&:to_s),
              "assert" => assertions
            }
          ],
          "outputPath" => output_path
        }
      end

      def debug(title, content)
        return unless self.class.debug?

        warn "\n=== #{title} ==="
        warn content
        warn "=" * (title.length + 8)
        warn ""
      end

      # Simple array wrapper (replaces ActiveSupport's Array.wrap)
      def wrap_array(object)
        case object
        when nil then []
        when Array then object
        else [object]
        end
      end

      # Simple deep stringify keys (replaces ActiveSupport method)
      def deep_stringify_keys(hash)
        hash.each_with_object({}) do |(key, value), result|
          result[key.to_s] = stringify_value(value)
        end
      end

      def stringify_value(value)
        case value
        when Hash then deep_stringify_keys(value)
        when Array then value.map { |v| stringify_value(v) }
        else value
        end
      end
    end
  end
end
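The two templating steps in `evaluate_prompt` (single-brace to double-brace conversion, then optional pre-rendering) can be exercised in isolation. This is a sketch that assumes variable names are plain `\w+` identifiers, as the regex above requires; `sketch_render` is a hypothetical name.

```ruby
# Hypothetical standalone mirror of the templating steps in Test#evaluate_prompt.
def sketch_render(prompt_text, vars, pre_render:)
  # {name} -> {{name}}; existing {{name}} placeholders are left untouched by the lookarounds
  text = prompt_text.gsub(/(?<!\{)\{(\w+)\}(?!\})/, '{{\1}}')
  return text unless pre_render

  vars.each do |key, value|
    # Block form: the value is inserted literally, with no back-reference expansion
    text = text.gsub("{{#{key}}}") { value.to_s }
  end
  text
end

p sketch_render("Hello {name}, meet {{friend}}", {}, pre_render: false)
# => "Hello {{name}}, meet {{friend}}"

puts sketch_render("{v}", {v: 'x\0y'}, pre_render: true) # prints x\0y
puts "{{v}}".gsub("{{v}}", 'x\0y')                         # prints x{{v}}y
```

The last line shows why the block form matters: with a replacement string, `\0` is expanded to the matched text, so a variable value containing backslash sequences would be silently rewritten.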
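To see roughly what ends up in `promptfooconfig.yaml`, the config-building step and its helpers can be reproduced standalone. The helper names (`sketch_wrap`, `sketch_stringify`, `sketch_config`) are hypothetical, and the `contains` assertion hash is only an assumed example of promptfoo's assertion format, not necessarily what `AssertionBuilder` emits.

```ruby
require "yaml"

# Hypothetical mirror of Test#wrap_array: nil -> [], Array -> itself, anything else -> [object]
def sketch_wrap(object)
  case object
  when nil then []
  when Array then object
  else [object]
  end
end

# Hypothetical mirror of deep_stringify_keys/stringify_value: recursively converts Hash keys to Strings
def sketch_stringify(value)
  case value
  when Hash then value.each_with_object({}) { |(k, v), h| h[k.to_s] = sketch_stringify(v) }
  when Array then value.map { |v| sketch_stringify(v) }
  else value
  end
end

# Hypothetical mirror of Test#build_promptfoo_config (wrapping folded in for convenience)
def sketch_config(prompt:, vars:, providers:, assertions:, output_path:)
  {
    "prompts" => [prompt],
    "providers" => sketch_wrap(providers).map { |pr| pr.is_a?(Hash) ? sketch_stringify(pr) : pr },
    "tests" => [{"vars" => vars.transform_keys(&:to_s), "assert" => assertions}],
    "outputPath" => output_path
  }
end

config = sketch_config(
  prompt: "Hello {{name}}",
  vars: {name: "Alice"},
  providers: ["echo", {id: "openai:gpt-4o-mini", config: {temperature: 0}}],
  assertions: [{"type" => "contains", "value" => "Alice"}], # assumed assertion shape
  output_path: "/tmp/output.json"
)
puts YAML.dump(config)
```

Two design choices show up here. `Kernel#Array` would turn a Hash provider into an array of key/value pairs (`Array({id: "echo"})` is `[[:id, "echo"]]`), which is why a dedicated wrapper is used. Keys are stringified because `YAML.dump` writes Symbol keys as `:id`, which promptfoo would not read as a plain `id` key.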
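The class-level `providers` setting relies on the `inherited` hook to copy an explicitly assigned value into subclasses. A minimal sketch with hypothetical class names (`SketchBase`, `SketchParent`, `SketchChild`, `SketchOverride`) makes that behaviour observable without loading Minitest.

```ruby
# Hypothetical standalone classes mirroring the `class << self` block in Test.
class SketchBase
  class << self
    attr_writer :providers

    def providers
      @providers || "echo"
    end

    def inherited(subclass)
      super
      # Copy only an explicitly set value; otherwise the subclass falls back to "echo" on its own
      subclass.providers = providers if defined?(@providers)
    end
  end
end

class SketchParent < SketchBase
  self.providers = ["openai:gpt-4o-mini"]
end

class SketchChild < SketchParent; end

class SketchOverride < SketchParent
  self.providers = "echo"
end

p SketchBase.providers
p SketchChild.providers
p SketchOverride.providers
```

The hook copies the reference at the moment the subclass is defined: reassigning the parent's providers later does not reach existing subclasses, and an Array value is shared between parent and child rather than duplicated.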
@@ -0,0 +1,16 @@
# frozen_string_literal: true

require_relative "promptfoo/version"
require_relative "promptfoo/configuration"
require_relative "promptfoo/test"

# Auto-load Rails integration if Rails is detected
if defined?(Rails)
  require_relative "promptfoo/rails"
end

module Minitest
  module Promptfoo
    class Error < StandardError; end
  end
end
metadata
ADDED
@@ -0,0 +1,103 @@
--- !ruby/object:Gem::Specification
name: minitest-promptfoo
version: !ruby/object:Gem::Version
  version: 0.1.0
platform: ruby
authors:
- Chris Waters
bindir: exe
cert_chain: []
date: 1980-01-02 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '5.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '5.0'
- !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '13.0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: '13.0'
- !ruby/object:Gem::Dependency
  name: standard
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.35.1
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.35.1
description: A thin Minitest wrapper around promptfoo that brings prompt testing to
  Ruby projects. Test LLM prompts with a familiar Minitest-like DSL, supporting multiple
  providers and assertion types.
email:
- chris.waters@shopify.com
executables: []
extensions: []
extra_rdoc_files: []
files:
- ".ruby-version"
- CHANGELOG.md
- LICENSE.txt
- README.md
- Rakefile
- examples/greeting.ptmpl
- examples/simple_prompt_test.rb
- lib/minitest/promptfoo.rb
- lib/minitest/promptfoo/assertion_builder.rb
- lib/minitest/promptfoo/configuration.rb
- lib/minitest/promptfoo/failure_formatter.rb
- lib/minitest/promptfoo/promptfoo_runner.rb
- lib/minitest/promptfoo/rails.rb
- lib/minitest/promptfoo/test.rb
- lib/minitest/promptfoo/version.rb
- sig/minitest/promptfoo.rbs
homepage: https://github.com/christhesoul/minitest-promptfoo
licenses:
- MIT
metadata:
  homepage_uri: https://github.com/christhesoul/minitest-promptfoo
  source_code_uri: https://github.com/christhesoul/minitest-promptfoo
  changelog_uri: https://github.com/christhesoul/minitest-promptfoo/blob/main/CHANGELOG.md
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: 2.7.0
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.6.7
specification_version: 4
summary: Minitest integration for promptfoo - test your LLM prompts with confidence
test_files: []