RubyGems - ask-eval - Versions diffs - 0.1.0 → 0.1.1 - Mend

ask-eval 0.1.0 → 0.1.1


Files changed (5)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 255e463739c832d6784d161c3f6b26b267dbaaf6e29b27268ac8b49bd3a3fa05
-  data.tar.gz: 842364b8ea52fbae673bd53ebec870627481cac8a9214b98d92476daa609789c
+  metadata.gz: bade74b9a66f3d955fea90e17535033015f504f0b6221c332a07c7c947a486c1
+  data.tar.gz: 228d85c034b3f9f50fef305c0a5422179959e98906cc3262306bde76219bbabe
 SHA512:
-  metadata.gz: dade52a1b23dc7415788d02c858ee3ca2a0c14739a45f2bcb172b72818cc003db215c2bda019cf68fe072fdc8f40d3aa0a1ae8628df98e5e9e52b435c14c0d0b
-  data.tar.gz: e82d90fb63e88693df7f5c4e1c89dbb166ade6617bfa3980a0a7ba1cfdcc355b7d8abcc458882d60a84d701f98c21cb741c65b586399b1826701c061b3123936
+  metadata.gz: f393cad79fb781b4b76caa6e3bc036dd8b2787d6389dd9b59bdf23d01f025fbd5dbda0e4b4068ef3abd313e62f1f4c063abee67876b56c5c860a16ee775c3c5b
+  data.tar.gz: e8b4d2cb025fbb53cf9d76db336b636c6583c759a76e4b90771e384064e7e57926b4e526dc78c7d0689f7d5b29289998ab4c789345a3e35d29373cf5ac433c19
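
The digests above can be checked locally. A `.gem` file is a tar archive that contains `metadata.gz` and `data.tar.gz`, so after unpacking it the new `data.tar.gz` digest can be compared against the value in this hunk. The following is a minimal sketch using only Ruby's standard library; the helper name `checksum_matches?` and the local path `data.tar.gz` are assumptions for illustration, not part of ask-eval or RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems or ask-eval): returns true when
# the file at `path` hashes to `expected_hex` under the given digest class.
def checksum_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex
end

# The 0.1.1 SHA256 entries, copied from the diff above.
checksums = YAML.safe_load(<<~YAML)
  SHA256:
    metadata.gz: bade74b9a66f3d955fea90e17535033015f504f0b6221c332a07c7c947a486c1
    data.tar.gz: 228d85c034b3f9f50fef305c0a5422179959e98906cc3262306bde76219bbabe
YAML

expected = checksums.dig("SHA256", "data.tar.gz")

# Assumed location: data.tar.gz unpacked from the .gem archive into the
# current directory. Nothing is checked if the file is absent.
path = "data.tar.gz"
puts checksum_matches?(path, expected) if File.exist?(path)
```

Passing `algorithm: Digest::SHA512` with the matching SHA512 entry works the same way.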

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+## [0.1.1] - 2026-06-25
+### Changed
+- Expanded tests: Runner(12t), TestCase(9t), DSL(29t), Configuration(10t), MinitestPlugin(20t), Reporters(16t). Infrastructure: rubocop, overcommit, CI matrix, gemspec, SimpleCov.
 # Changelog
 ## [0.1.0] - 2026-06-10
@@ -13,7 +17,7 @@
 - Batch evaluation runner (`Ask::Eval::Runner`)
 - CI reporters: Console, JUnit XML, GitHub Actions annotations
 - Cost tracking with `CostTracker` — per-model pricing, summary reports
+- Custom judge API — subclass `Ask::Eval::Judge` with `#call`, `#system_prompt`, `#user_message`
 - Zero runtime dependencies — deterministic assertions work standalone
 - Optional ask-llm-providers integration for judge models
 - Tests: 88 minitest tests covering all components
-</RUBY>

data/README.md CHANGED

@@ -138,9 +138,9 @@ bundle exec rake test
 ## Design Philosophy
-**This gem should NOT be a port of ruby_llm-tribunal.** See the comparison:
+**This gem is NOT a port of ruby_llm-tribunal.** See the comparison below:
-| ruby_llm-tribunal (~500 lines, 25+ files) | ask-eval (~300 lines, 10 files) |
+| ruby_llm-tribunal | ask-eval |
 |---|---|
 | Standalone evaluator with its own API | **Minitest-native assertions** — drops into existing tests |
 | 10 judges (including niche: jailbreak, PII, refusal) | **5 essential judges** — faithful, hallucination, bias, toxicity, correctness |
@@ -148,10 +148,64 @@ bundle exec rake test
 | Dataset management, red teaming, custom judges | **No datasets, no red teaming.** Focus on what matters for 80% of users. |
 | Tied to RubyLLM for judge model | **Any model as judge** — cheap gpt-4o-mini, accurate claude, or local |
 | Cost tracking: none | **Cost tracking per evaluation** |
-| Snapshot testing: none | **Eval snapshots for regression detection** |
+| Snapshot testing: none | **Eval snapshots for regression detection** (v0.2.0) |
 | Test framework integration: requires include | **Minitest plugin** — auto-loads with `require "ask/eval/minitest"` |
 ## License
 MIT
 </RUBY>
+## Custom Judges
+The 5 built-in judges cover common cases, but you can create your own by
+subclassing `Ask::Eval::Judge`:
+```ruby
+class BrandVoiceJudge < Ask::Eval::Judge
+  def call(tc)
+    query_judge(tc)
+  end
+  private
+  def system_prompt
+    <<~PROMPT
+      You are a brand voice evaluator. Determine if the response matches our guidelines:
+      - Friendly but professional tone
+      - No jargon or technical terms
+      - Empathetic and helpful
+      Respond in JSON format:
+      { "passed": true/false, "score": 0.0-1.0, "reason": "..." }
+    PROMPT
+  end
+  def user_message(tc)
+    "Response to evaluate: " + tc.actual_output
+  end
+end
+# Use it directly
+judge = BrandVoiceJudge.new(model: my_model)
+result = judge.call(Ask::Eval::TestCase.new(actual_output: response))
+puts result.reason if result.passed?
+```
+### Using a lambda for custom evaluation
+For simple checks, pass a callable directly as the `model:` parameter --
+you do not need a full judge class:
+```ruby
+assert_faithful response, context: docs, model: ->(messages) {
+  { content: JSON.generate({ passed: true, score: 1.0, reason: "All good" }) }
+}
+```
+No registration system needed. Subclassing `Judge` and implementing
+`#call`, `#system_prompt`, and `#user_message` is the entire API.
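
The lambda form added to the README suggests a simple way to test judge-driven code without calling a real model: a callable that returns a fixed verdict. The sketch below assumes what the README's lambda implies, namely that the model callable receives a list of messages and returns a hash whose `:content` is a JSON string with `passed`, `score`, and `reason`. The helper `stub_judge_model` and the `{ role:, content: }` message shape are hypothetical, and the block does not load ask-eval itself.

```ruby
require "json"

# Hypothetical helper (not part of ask-eval): builds a callable that stands in
# for a judge model. It ignores the incoming messages and always returns the
# same verdict, JSON-encoded under :content like the lambda in the README.
def stub_judge_model(passed:, score:, reason:)
  lambda do |_messages|
    { content: JSON.generate({ passed: passed, score: score, reason: reason }) }
  end
end

model = stub_judge_model(passed: false, score: 0.2, reason: "Off-brand tone")

# Assumed message shape; the real structure passed by ask-eval may differ.
reply = model.call([{ role: "user", content: "Response to evaluate: hi" }])

# Decode the verdict the same way a caller of the model would have to.
verdict = JSON.parse(reply[:content], symbolize_names: true)
puts verdict[:reason] # prints "Off-brand tone"
```

If the assumption about the return shape holds, such a stub could be passed as `model:` to a judge or assertion to exercise both the passing and failing paths deterministically.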

data/lib/ask/eval/version.rb CHANGED

@@ -1,5 +1,5 @@
 module Ask
   module Eval
-    VERSION = "0.1.0"
+    VERSION = "0.1.1"
   end
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ask-eval
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.1.1
 platform: ruby
 authors:
 - Kaka Ruto