RubyGems - qualspec - Versions diffs - 0.1.0 → 0.1.1 - Mend

qualspec 0.1.0 → 0.1.1

Files changed (18)

checksums.yaml +4 -4
data/.DS_Store +0 -0
data/.rubocop_todo.yml +29 -19
data/docs/.DS_Store +0 -0
data/docs/to_implement/factory_bot_integration_design.md +819 -0
data/docs/to_implement/variants_first_pass.md +480 -0
data/examples/README.md +63 -0
data/examples/prompt_variants_factory.rb +98 -0
data/examples/results/simple_variant_comparison.json +340 -0
data/examples/simple_variant_comparison.rb +68 -0
data/examples/variant_comparison.rb +71 -0
data/lib/qualspec/prompt_variant.rb +94 -0
data/lib/qualspec/suite/dsl.rb +3 -5
data/lib/qualspec/suite/reporter.rb +2 -6
data/lib/qualspec/suite/runner.rb +3 -7
data/lib/qualspec/version.rb +1 -1
data/qualspec_structure.md +80 -0
metadata +13 -2

data/lib/qualspec/suite/runner.rb CHANGED

@@ -81,9 +81,7 @@ module Qualspec
         end
         # Phase 2: Judge all responses together
-        if responses.any?
-          judge_responses(responses, scenario, variant, temperature, progress: progress)
-        end
+        judge_responses(responses, scenario, variant, temperature, progress: progress) if responses.any?
         # Record errors
         record_errors(errors, scenario, variant, temperature)
@@ -231,8 +229,7 @@ module Qualspec
         @finished_at = nil
       end
-      def record_response(candidate:, scenario:, variant: 'default', temperature: nil,
-                          response:, duration_ms: nil, cost: nil, variant_data: nil)
+      def record_response(candidate:, scenario:, response:, variant: 'default', temperature: nil, duration_ms: nil, cost: nil, variant_data: nil)
         # Store in nested structure
         @responses[candidate] ||= {}
         @responses[candidate][scenario] ||= {}
@@ -253,8 +250,7 @@ module Qualspec
         @costs[candidate] += cost
       end
-      def record_evaluation(candidate:, scenario:, variant: 'default', temperature: nil,
-                            criteria:, evaluation:, winner: nil)
+      def record_evaluation(candidate:, scenario:, criteria:, evaluation:, variant: 'default', temperature: nil, winner: nil)
         @evaluations << {
           candidate: candidate,
           scenario: scenario,
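
Note on the two signature hunks above: they only move the required keyword arguments (`response:`, `criteria:`, `evaluation:`) ahead of the optional ones, which looks like a RuboCop-style cleanup (the same release also touches `.rubocop_todo.yml`). Since Ruby binds keyword arguments by name, existing callers should be unaffected. A minimal sketch of that equivalence, using hypothetical stand-in methods `record_old` and `record_new` rather than the gem's real `record_response`:

```ruby
# Hypothetical stand-ins for the two signature styles; not the gem's real methods.

# Old style: a required keyword (response:) listed after optional ones.
def record_old(candidate:, scenario:, variant: 'default', temperature: nil, response:)
  [candidate, scenario, variant, temperature, response]
end

# New style: all required keywords first, optional keywords last.
def record_new(candidate:, scenario:, response:, variant: 'default', temperature: nil)
  [candidate, scenario, variant, temperature, response]
end

# Callers pass keywords by name, so the declaration order does not matter.
args = { response: 'hi', scenario: 's1', candidate: 'gpt4' }
old_result = record_old(**args)
new_result = record_new(**args)
same_result = old_result == new_result
```

Both calls fill `variant` and `temperature` from their defaults, so the change is a readability fix rather than an API change.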

data/lib/qualspec/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Qualspec
-  VERSION = '0.1.0'
+  VERSION = '0.1.1'
 end

data/qualspec_structure.md ADDED

@@ -0,0 +1,80 @@
+# Qualspec - Key Structure
+## Repository: estiens/qualspec
+### Description
+LLM-judged qualitative testing for Ruby. Evaluate AI agents, compare models, and test subjective qualities that traditional assertions can't capture.
+### Core Library Files (lib/qualspec/)
+- **builtin_rubrics.rb** - Built-in evaluation criteria
+- **client.rb** - API client for LLM interactions
+- **configuration.rb** - Configuration management
+- **evaluation.rb** - Core evaluation logic
+- **judge.rb** - LLM judge implementation
+- **recorder.rb** - VCR integration for recording
+- **rspec.rb** - RSpec integration entry point
+- **rubric.rb** - Custom rubric definitions
+- **version.rb** - Version info
+### Subdirectories
+- **rspec/** - RSpec helpers and matchers
+- **suite/** - Evaluation suite components
+### Configuration Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| QUALSPEC_API_KEY | API key (required) | - |
+| QUALSPEC_API_URL | API endpoint | https://openrouter.ai/api/v1 |
+| QUALSPEC_MODEL | Default model for candidates | google/gemini-3-flash-preview |
+| QUALSPEC_JUDGE_MODEL | Model used as judge | Same as QUALSPEC_MODEL |
+### Key Features
+1. **Model Comparison CLI** - Compare multiple models on the same prompts
+2. **LLM Judge** - Use an LLM to evaluate responses qualitatively
+3. **RSpec Integration** - Test your agents with qualitative assertions
+4. **Built-in Rubrics** - Pre-defined evaluation criteria
+5. **Custom Rubrics** - Define your own evaluation criteria
+6. **VCR Recording** - Record and replay API calls for testing
+7. **HTML Reports** - Generate visual comparison reports
+### Example: Model Comparison
+```ruby
+# eval/comparison.rb
+Qualspec.evaluation "Model Comparison" do
+  candidates do
+    candidate "gpt4", model: "openai/gpt-4"
+    candidate "claude", model: "anthropic/claude-3-sonnet"
+  end
+  scenario "helpfulness" do
+    prompt "How do I center a div in CSS?"
+    eval "provides a working solution"
+    eval "explains the approach"
+  end
+end
+```
+### Example: RSpec Integration
+```ruby
+require "qualspec/rspec"
+RSpec.describe MyAgent do
+  include Qualspec::RSpec::Helpers
+  it "responds helpfully" do
+    response = MyAgent.call("Hello")
+    result = qualspec_evaluate(response, "responds in a friendly manner")
+    expect(result).to be_passing
+  end
+end
+```
+### CLI Usage
+```shell
+# Run comparison
+qualspec eval/comparison.rb
+# Generate HTML report
+qualspec --html report.html eval/comparison.rb
+```
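
Note on the environment-variable table in the added file: it describes a fallback chain in which `QUALSPEC_JUDGE_MODEL` defaults to whatever `QUALSPEC_MODEL` resolves to. A minimal sketch of that resolution order, assuming a hypothetical helper `resolve_settings` (the gem's actual `configuration.rb` is not shown in this diff and may work differently):

```ruby
# Hypothetical helper illustrating the documented defaults; not the gem's code.
DEFAULT_API_URL = 'https://openrouter.ai/api/v1'
DEFAULT_MODEL = 'google/gemini-3-flash-preview'

def resolve_settings(env)
  # The candidate model falls back to the documented default.
  model = env.fetch('QUALSPEC_MODEL', DEFAULT_MODEL)
  {
    api_key: env.fetch('QUALSPEC_API_KEY'), # required: raises KeyError when missing
    api_url: env.fetch('QUALSPEC_API_URL', DEFAULT_API_URL),
    model: model,
    # The judge model falls back to the resolved candidate model.
    judge_model: env.fetch('QUALSPEC_JUDGE_MODEL', model)
  }
end

# A plain Hash stands in for ENV here; ENV supports the same fetch calls.
settings = resolve_settings({ 'QUALSPEC_API_KEY' => 'sk-example' })
```

Passing `ENV` instead of the hash would read the real process environment.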

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: qualspec
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - Eric Stiens
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-12-26 00:00:00.000000000 Z
+date: 2026-01-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: faraday
@@ -62,6 +62,7 @@ executables:
 extensions: []
 extra_rdoc_files: []
 files:
+- ".DS_Store"
 - ".qualspec_cassettes/comparison_test.yml"
 - ".qualspec_cassettes/quick_test.yml"
 - ".rspec"
@@ -70,12 +71,16 @@ files:
 - CHANGELOG.md
 - README.md
 - Rakefile
+- docs/.DS_Store
 - docs/configuration.md
 - docs/evaluation-suites.md
 - docs/getting-started.md
 - docs/recording.md
 - docs/rspec-integration.md
 - docs/rubrics.md
+- docs/to_implement/factory_bot_integration_design.md
+- docs/to_implement/variants_first_pass.md
+- examples/README.md
 - examples/cassettes/qualspec_rspec_integration_basic_evaluation_evaluates_responses_with_inline_criteria.yml
 - examples/cassettes/qualspec_rspec_integration_basic_evaluation_provides_detailed_feedback_on_failure.yml
 - examples/cassettes/qualspec_rspec_integration_comparative_evaluation_compares_multiple_responses.yml
@@ -86,9 +91,13 @@ files:
 - examples/comparison.rb
 - examples/model_comparison.rb
 - examples/persona_test.rb
+- examples/prompt_variants_factory.rb
 - examples/quick_test.rb
 - examples/report.html
+- examples/results/simple_variant_comparison.json
 - examples/rspec_example_spec.rb
+- examples/simple_variant_comparison.rb
+- examples/variant_comparison.rb
 - exe/qualspec
 - lib/qualspec.rb
 - lib/qualspec/builtin_rubrics.rb
@@ -96,6 +105,7 @@ files:
 - lib/qualspec/configuration.rb
 - lib/qualspec/evaluation.rb
 - lib/qualspec/judge.rb
+- lib/qualspec/prompt_variant.rb
 - lib/qualspec/recorder.rb
 - lib/qualspec/rspec.rb
 - lib/qualspec/rspec/configuration.rb
@@ -112,6 +122,7 @@ files:
 - lib/qualspec/suite/runner.rb
 - lib/qualspec/suite/scenario.rb
 - lib/qualspec/version.rb
+- qualspec_structure.md
 - sig/qualspec.rbs
 homepage: https://github.com/estiens/qualspec
 licenses: