RubyGems - braintrust - Versions diffs - 0.2.1 → 0.3.1 - Mend

braintrust 0.2.1 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

checksums.yaml +4 -4
data/README.md +163 -10
data/lib/braintrust/api/functions.rb +3 -1
data/lib/braintrust/api/internal/btql.rb +3 -33
data/lib/braintrust/contrib/rails/server/application_controller.rb +34 -0
data/lib/braintrust/contrib/rails/server/engine.rb +72 -0
data/lib/braintrust/contrib/rails/server/eval_controller.rb +36 -0
data/lib/braintrust/contrib/rails/server/generator.rb +43 -0
data/lib/braintrust/contrib/rails/server/health_controller.rb +15 -0
data/lib/braintrust/contrib/rails/server/list_controller.rb +16 -0
data/lib/braintrust/contrib/rails/server/routes.rb +8 -0
data/lib/braintrust/contrib/rails/server.rb +20 -0
data/lib/braintrust/eval/context.rb +84 -21
data/lib/braintrust/eval/evaluator.rb +16 -2
data/lib/braintrust/eval/runner.rb +120 -75
data/lib/braintrust/eval.rb +22 -2
data/lib/braintrust/internal/retry.rb +41 -0
data/lib/braintrust/prompt.rb +11 -5
data/lib/braintrust/scorer.rb +55 -4
data/lib/braintrust/server/handlers/eval.rb +8 -168
data/lib/braintrust/server/handlers/list.rb +3 -41
data/lib/braintrust/server/rack.rb +2 -0
data/lib/braintrust/server/services/eval_service.rb +226 -0
data/lib/braintrust/server/services/list_service.rb +64 -0
data/lib/braintrust/trace/span_processor.rb +0 -5
data/lib/braintrust/version.rb +1 -1
metadata +26 -127

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 747b190f21c7de342f85390f8a51b17628e23fa2436776989a3ebe637bf9d596
-  data.tar.gz: 1e6c0c59c9ce56d499a04d8424506c56e2c2ad359506a6d5175c7173dc4ab238
+  metadata.gz: 27e146b06451b844b1e6416353b20f6bd572c3d1169a12a439745cb7280ce0ec
+  data.tar.gz: d726e3a146a2180bf2714846d56e65fa9d3ef1ce773adb116a8e6b1b79ba823c
 SHA512:
-  metadata.gz: 3f652583ec04f5b874e3417db4cc0dff7f43341eeffe686466b8caad5614ed336e8580ac7533ef100726f09cdb264900e0f454edd11328e611513ffc8f77d3cb
-  data.tar.gz: 3316d0cb4ccc77e2d0c0ae48c033b6f5c026237d85c85e75e139434023f713820c3790a98090dac636ae6d44127692279404bcd3ab88b0d50a3de3127d38e3a6
+  metadata.gz: 69e5150452e9dde1491664af1137cc05a9a5b651dbb5fdee27ff8a09e0e11b51c283c163019566045e1771679ed6f2eece4dd1753aa06f899e3681e7c6b99d15
+  data.tar.gz: 28cc8c86bdc13db8d33ad0dc28325c0d858f37ba1b9f41212c52e514eed649b14596c66153bca58de251c4c6dd1ddcb170d24ae100a33f912f49349671821f7a

data/README.md CHANGED

@@ -21,6 +21,7 @@ This is the official Ruby SDK for [Braintrust](https://www.braintrust.dev), for
   - [Attachments](#attachments)
   - [Viewing traces](#viewing-traces)
 - [Evals](#evals)
+  - [Tasks](#tasks)
   - [Datasets](#datasets)
   - [Scorers](#scorers)
   - [Dev Server](#dev-server)
@@ -259,6 +260,50 @@ Braintrust::Eval.run(
 )
 ```
+See [eval.rb](./examples/eval.rb) for a full example.
+### Tasks
+Define the code being evaluated as a lambda or a class. Tasks receive `input:` as a keyword argument:
+```ruby
+# Lambda
+task = ->(input:) { classify(input) }
+# Class-based (auto-derives name from class: "food_classifier")
+class FoodClassifier
+  include Braintrust::Task
+  def call(input:)
+    classify(input)
+  end
+end
+```
+#### With parameters
+Tasks can accept `parameters:` as input to drive their behavior:
+```ruby
+task = ->(input:, parameters:) {
+  value = parameters["value"]
+  from_unit = parameters["from_unit"] || 'c'
+  to_unit = parameters["to_unit"] || 'f'
+  convert_temp(temperature: value, from_unit: from_unit, to_unit: to_unit)
+}
+Braintrust::Eval.run(
+  project: "my-project",
+  cases: [...],
+  task: task,
+  scorers: [...],
+  parameters: {"value" => 23.0}
+)
+```
+See [parameters.rb](./examples/eval/parameters.rb) for a full example.
 ### Datasets
 Use test cases from a Braintrust dataset:
@@ -287,6 +332,8 @@ Braintrust::Eval.run(
 )
 ```
+See [dataset.rb](./examples/eval/dataset.rb) for a full example.
 ### Scorers
 Use scoring functions defined in Braintrust:
@@ -315,6 +362,46 @@ Braintrust::Eval.run(
 )
 ```
+See [remote_functions.rb](./examples/eval/remote_functions.rb) for a full example.
+#### Scorer metadata
+Scorers can return a Hash with `:score` and `:metadata` to attach structured context to the score. The metadata is logged on the scorer's span and visible in the Braintrust UI for debugging and filtering:
+```ruby
+Braintrust::Scorer.new("translation") do |expected:, output:|
+  common_words = output.downcase.split & expected.downcase.split
+  overlap = common_words.size.to_f / expected.split.size
+  {
+    score: overlap,
+    metadata: {word_overlap: common_words.size, missing_words: expected.downcase.split - output.downcase.split}
+  }
+end
+```
+See [scorer_metadata.rb](./examples/eval/scorer_metadata.rb) for a full example.
+#### Multiple scores from one scorer
+When several scores can be computed together (e.g. in one LLM call), a scorer can return an `Array` of score hashes instead of a single value. Each entry appears as a separate score column in the Braintrust UI:
+```ruby
+Braintrust::Scorer.new("summary_quality") do |output:, expected:|
+  words = output.downcase.split
+  key_terms = expected[:key_terms]
+  covered = key_terms.count { |t| words.include?(t) }
+  [
+    {name: "coverage", score: covered.to_f / key_terms.size, metadata: {missing: key_terms - words}},
+    {name: "conciseness", score: words.size <= expected[:max_words] ? 1.0 : 0.0}
+  ]
+end
+```
+`name` and `score` are required; `metadata` is optional.
+See [multi_score.rb](./examples/eval/multi_score.rb) for a full example.
 #### Trace scoring
 Scorers can access the full evaluation trace (all spans generated by the task) by declaring a `trace:` keyword parameter. This is useful for inspecting intermediate LLM calls, validating tool usage, or checking the message thread:
@@ -344,11 +431,28 @@ Braintrust::Eval.run(
 )
 ```
-See examples: [eval.rb](./examples/eval.rb), [dataset.rb](./examples/eval/dataset.rb), [remote_functions.rb](./examples/eval/remote_functions.rb), [trace_scoring.rb](./examples/eval/trace_scoring.rb)
+See [trace_scoring.rb](./examples/eval/trace_scoring.rb) for a full example.
+#### Scorer parameters
+Scorers can also accept `parameters:` to use runtime configuration in their scoring logic. Like tasks, scorers that don't declare `parameters:` are unaffected:
+```ruby
+Braintrust::Scorer.new("threshold_match") do |expected:, output:, parameters:|
+  threshold = parameters["threshold"] || 0.8
+  similarity(output, expected) >= threshold ? 1.0 : 0.0
+end
+```
+See [parameters.rb](./examples/eval/parameters.rb) for a full example.
 ### Dev Server
-Run evaluations from the Braintrust web UI against code in your own application. Define evaluators, pass them to the dev server, and start serving:
+Run evaluations from the Braintrust web UI against code in your own application.
+#### Run as a Rack app
+Define evaluators, pass them to the dev server, and start serving:
 ```ruby
 # eval_server.ru
@@ -374,10 +478,21 @@ run Braintrust::Server::Rack.app(
 )
 ```
+Add your Rack server to your Gemfile:
+```ruby
+gem "rack"
+gem "puma" # recommended
+```
+Then start the server:
 ```bash
 bundle exec rackup eval_server.ru -p 8300 -o 0.0.0.0
 ```
+See example: [server/eval.ru](./examples/server/eval.ru)
 **Custom evaluators**
 Evaluators can also be defined as subclasses:
@@ -394,6 +509,51 @@ class FoodClassifier < Braintrust::Eval::Evaluator
 end
 ```
+#### Run as a Rails engine
+Use the Rails engine when your evaluators live inside an existing Rails app and you want to mount the Braintrust eval server into that application.
+Define each evaluator in its own file, for example under `app/evaluators/`:
+```ruby
+# app/evaluators/food_classifier.rb
+class FoodClassifier < Braintrust::Eval::Evaluator
+  def task
+    ->(input:) { classify(input) }
+  end
+  def scorers
+    [Braintrust::Scorer.new("exact_match") { |expected:, output:| output == expected ? 1.0 : 0.0 }]
+  end
+end
+```
+Then generate the Braintrust initializer:
+```bash
+bin/rails generate braintrust:server
+```
+```ruby
+# config/routes.rb
+Rails.application.routes.draw do
+  mount Braintrust::Contrib::Rails::Server::Engine, at: "/braintrust"
+end
+```
+The generator writes `config/initializers/braintrust_server.rb`, where you can review or customize the slug-to-evaluator mapping it discovers from `app/evaluators/**/*.rb` and `evaluators/**/*.rb`.
+See example: [contrib/rails/eval.rb](./examples/contrib/rails/eval.rb)
+**Developing locally**
+If you want to skip authentication on incoming eval requests while developing locally:
+- **For Rack**: Pass `auth: :none` to `Braintrust::Server::Rack.app(...)`
+- **For Rails**: Set `config.auth = :none` in `config/initializers/braintrust_server.rb`
+*NOTE: Setting `:none` disables authentication on incoming requests into your server; executing evals requires a `BRAINTRUST_API_KEY` to fetch resources.*
 **Supported web servers**
 The dev server requires the `rack` gem and a Rack-compatible web server.
@@ -405,14 +565,7 @@ The dev server requires the `rack` gem and a Rack-compatible web server.
 | [Passenger](https://www.phusionpassenger.com/) | 6.x               |                                      |
 | [WEBrick](https://github.com/ruby/webrick)     | Not supported     | Does not support server-sent events. |
-Add your chosen server to your Gemfile:
-```ruby
-gem "rack"
-gem "puma" # recommended
-```
-See example: [server/eval.ru](./examples/server/eval.ru)
+See examples: [server/eval.ru](./examples/server/eval.ru),
 ## Documentation

data/lib/braintrust/api/functions.rb CHANGED

@@ -25,13 +25,15 @@ module Braintrust
       # List functions with optional filters
       # GET /v1/function?project_name=X&...
       # @param project_name [String, nil] Filter by project name
+      # @param project_id [String, nil] Filter by project ID (UUID)
       # @param function_name [String, nil] Filter by function name
       # @param slug [String, nil] Filter by slug
       # @param limit [Integer, nil] Limit number of results
       # @return [Hash] Response with "objects" array
-      def list(project_name: nil, function_name: nil, slug: nil, limit: nil)
+      def list(project_name: nil, project_id: nil, function_name: nil, slug: nil, limit: nil)
         params = {}
         params["project_name"] = project_name if project_name
+        params["project_id"] = project_id if project_id
         params["function_name"] = function_name if function_name
         params["slug"] = slug if slug
         params["limit"] = limit if limit
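
Note on the hunk above: the new `project_id` filter follows the same "only send what was given" pattern as the other filters. A minimal sketch of that pattern, assuming a hypothetical standalone helper `function_list_query` (it is not part of the gem and only builds the query string):

```ruby
require "uri"

# Hypothetical helper, for illustration only: builds the query string for
# GET /v1/function from optional filters, dropping any filter that is nil.
def function_list_query(project_name: nil, project_id: nil, function_name: nil, slug: nil, limit: nil)
  params = {
    "project_name" => project_name,
    "project_id" => project_id,
    "function_name" => function_name,
    "slug" => slug,
    "limit" => limit
  }.compact # remove nil-valued filters
  URI.encode_www_form(params)
end

puts function_list_query(project_id: "proj-123", limit: 5)
```

Using `Hash#compact` keeps the filter list in one place; the gem itself assigns each key conditionally, which behaves the same for `nil` values.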

data/lib/braintrust/api/internal/btql.rb CHANGED

@@ -11,19 +11,6 @@ module Braintrust
       # Internal BTQL client for querying spans.
       # Not part of the public API — instantiated directly where needed.
       class BTQL
-        # Maximum number of retries before returning partial results.
-        # Covers both freshness lag (partially indexed) and ingestion lag
-        # (spans not yet visible to BTQL after OTel flush).
-        MAX_FRESHNESS_RETRIES = 7
-        # Base delay (seconds) between retries (doubles each attempt, capped).
-        FRESHNESS_BASE_DELAY = 1.0
-        # Maximum delay (seconds) between retries. Caps exponential growth
-        # so we keep polling at a reasonable rate in the later window.
-        # Schedule: 1, 2, 4, 8, 8, 8, 8 = ~39s total worst-case.
-        MAX_FRESHNESS_DELAY = 8.0
         def initialize(state)
           @state = state
         end
@@ -31,36 +18,19 @@ module Braintrust
         # Query spans belonging to a specific trace within an object.
         #
         # Builds a BTQL SQL query that matches the root_span_id and excludes scorer spans.
-        # Retries with exponential backoff if the response indicates data is not yet fresh.
+        # Returns a single-shot result; callers are responsible for retry and error handling.
         #
         # @param object_type [String] e.g. "experiment"
         # @param object_id [String] Object UUID
         # @param root_span_id [String] Hex trace ID of the root span
-        # @return [Array<Hash>] Parsed span data
+        # @return [Array(Array<Hash>, String)] [rows, freshness]
         def trace_spans(object_type:, object_id:, root_span_id:)
           query = build_trace_query(
             object_type: object_type,
             object_id: object_id,
             root_span_id: root_span_id
           )
-          payload = {query: query, fmt: "jsonl"}
-          retries = 0
-          loop do
-            rows, freshness = execute_query(payload)
-            # Return when data is fresh AND non-empty, or we've exhausted retries.
-            # We retry on empty even when "complete" because there is ingestion lag
-            # between OTel flush and BTQL indexing — the server may report "complete"
-            # before it knows about newly-flushed spans.
-            return rows if (freshness == "complete" && !rows.empty?) || retries >= MAX_FRESHNESS_RETRIES
-            retries += 1
-            delay = [FRESHNESS_BASE_DELAY * (2**(retries - 1)), MAX_FRESHNESS_DELAY].min
-            sleep(delay)
-          end
-        rescue => e
-          Braintrust::Log.warn("[BTQL] Query failed: #{e.message}")
-          []
+          execute_query(query: query, fmt: "jsonl")
         end
         private
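
Note on the hunk above: `trace_spans` now returns `[rows, freshness]` once and leaves retrying to the caller. The sketch below shows how a caller might reproduce the removed capped-backoff schedule (1, 2, 4, 8, 8, 8, 8 seconds). The module name, `fetch_with_retry`, and the injectable `sleeper` are assumptions made for illustration; the gem's own `internal/retry.rb` is not shown in this section and may differ.

```ruby
# Hypothetical caller-side retry sketch; names are illustrative, not the gem's API.
module BackoffSketch
  BASE_DELAY = 1.0  # seconds; doubles each attempt
  MAX_DELAY = 8.0   # cap on the exponential growth
  MAX_RETRIES = 7

  module_function

  # Delay before retry number `attempt` (1-based).
  def backoff_delay(attempt)
    [BASE_DELAY * (2**(attempt - 1)), MAX_DELAY].min
  end

  # Calls the block until it yields fresh, non-empty rows or retries run out.
  # The block must return [rows, freshness], like the new trace_spans.
  def fetch_with_retry(sleeper: ->(seconds) { sleep(seconds) })
    retries = 0
    loop do
      rows, freshness = yield
      return rows if (freshness == "complete" && !rows.empty?) || retries >= MAX_RETRIES
      retries += 1
      sleeper.call(backoff_delay(retries))
    end
  end
end
```

Passing a custom `sleeper` lets tests record the delays instead of actually sleeping.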

data/lib/braintrust/contrib/rails/server/application_controller.rb ADDED

@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        class ApplicationController < ActionController::API
+          before_action :authenticate!
+          private
+          def authenticate!
+            auth_result = Engine.auth_strategy.authenticate(request.env)
+            unless auth_result
+              render json: {"error" => "Unauthorized"}, status: :unauthorized
+              return
+            end
+            request.env["braintrust.auth"] = auth_result
+            @braintrust_auth = auth_result
+          end
+          def parse_json_body
+            body = request.body.read
+            return nil if body.nil? || body.empty?
+            JSON.parse(body)
+          rescue JSON::ParserError
+            nil
+          end
+        end
+      end
+    end
+  end
+end
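
Note on the file above: `parse_json_body` treats an empty body and malformed JSON the same way, returning `nil` so the controller can answer with a 400. A standalone sketch of that behavior, assuming a variant that takes any IO-like object instead of reading `request.body` (the parameter is an adaptation for illustration):

```ruby
require "json"
require "stringio"

# Standalone variant for illustration: reads from any IO-like object
# rather than from the Rails request.
def parse_json_body(io)
  body = io.read
  return nil if body.nil? || body.empty?
  JSON.parse(body)
rescue JSON::ParserError
  nil # malformed JSON is reported to the caller as "no body"
end

p parse_json_body(StringIO.new('{"name": "food-classifier"}'))
p parse_json_body(StringIO.new("{not json"))
```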

data/lib/braintrust/contrib/rails/server/engine.rb ADDED

@@ -0,0 +1,72 @@
+# frozen_string_literal: true
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        class Engine < ::Rails::Engine
+          isolate_namespace Braintrust::Contrib::Rails::Server
+          config.evaluators = {}
+          config.auth = :clerk_token
+          # Register the engine's routes file so Rails loads it during initialization.
+          paths["config/routes.rb"] << File.expand_path("routes.rb", __dir__)
+          initializer "braintrust.server.cors" do |app|
+            app.middleware.use Braintrust::Server::Middleware::Cors
+          end
+          # Class-level helpers that read from engine config.
+          def self.evaluators
+            config.evaluators
+          end
+          def self.auth_strategy
+            resolve_auth(config.auth)
+          end
+          def self.list_service
+            Braintrust::Server::Services::List.new(-> { config.evaluators })
+          end
+          # Long-lived so the state cache persists across requests.
+          def self.eval_service
+            @eval_service ||= Braintrust::Server::Services::Eval.new(-> { config.evaluators })
+          end
+          # Support the explicit `|config|` style used by this integration while
+          # still delegating zero-arity DSL blocks to Rails' native implementation.
+          def self.configure(&block)
+            return super if block&.arity == 0
+            yield config if block
+          end
+          def self.resolve_auth(auth)
+            case auth
+            when :none
+              Braintrust::Server::Auth::NoAuth.new
+            when :clerk_token
+              Braintrust::Server::Auth::ClerkToken.new
+            when Symbol, String
+              raise ArgumentError, "Unknown auth strategy #{auth.inspect}. Expected :none, :clerk_token, or an auth object."
+            else
+              auth
+            end
+          end
+          private_class_method :resolve_auth
+          generators do
+            require "braintrust/contrib/rails/server/generator"
+          end
+        end
+      end
+    end
+  end
+end
+require_relative "application_controller"
+require_relative "health_controller"
+require_relative "list_controller"
+require_relative "eval_controller"
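
Note on the file above: `resolve_auth` maps known symbols to strategy objects, rejects unknown symbols and strings, and passes any other object through as a custom strategy. A minimal sketch of that dispatch, using a hypothetical `AuthSketch` module with a stand-in `NoAuth` class (the real `Braintrust::Server::Auth` classes are not shown in this section):

```ruby
# Illustrative stand-ins; not the gem's auth classes.
module AuthSketch
  class NoAuth
    # Accepts every request.
    def authenticate(_env)
      true
    end
  end

  def self.resolve_auth(auth)
    case auth
    when :none
      NoAuth.new
    when Symbol, String
      # Any other symbol or string is a typo or an unsupported strategy name.
      raise ArgumentError, "Unknown auth strategy #{auth.inspect}"
    else
      auth # assume a custom object that responds to #authenticate(env)
    end
  end
end
```

Because the `Symbol, String` branch comes after the known symbols, a misspelled strategy fails at resolution time instead of being treated as a custom auth object.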

data/lib/braintrust/contrib/rails/server/eval_controller.rb ADDED

@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        class EvalController < ApplicationController
+          include ActionController::Live
+          def create
+            body = parse_json_body
+            unless body
+              render json: {"error" => "Invalid JSON body"}, status: :bad_request
+              return
+            end
+            result = Engine.eval_service.validate(body)
+            if result[:error]
+              render json: {"error" => result[:error]}, status: result[:status]
+              return
+            end
+            response.headers["Content-Type"] = "text/event-stream"
+            response.headers["Cache-Control"] = "no-cache"
+            response.headers["Connection"] = "keep-alive"
+            sse = Braintrust::Server::SSEWriter.new { |chunk| response.stream.write(chunk) }
+            Engine.eval_service.stream(result, auth: @braintrust_auth, sse: sse)
+          ensure
+            response.stream.close
+          end
+        end
+      end
+    end
+  end
+end
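
Note on the file above: the controller hands `SSEWriter` a block that forwards each chunk to `response.stream`. The writer itself is not part of this section, so the sketch below is only an assumption about how such a writer could frame server-sent events (`event:` and `data:` lines followed by a blank line). The class name `SketchSSEWriter` and its `event` method are hypothetical.

```ruby
require "json"

# Hypothetical writer, for illustration only: frames each event in the
# server-sent events wire format and passes the chunk to the given block.
class SketchSSEWriter
  def initialize(&sink)
    @sink = sink
  end

  def event(name, data)
    @sink.call("event: #{name}\ndata: #{JSON.generate(data)}\n\n")
  end
end

chunks = []
sse = SketchSSEWriter.new { |chunk| chunks << chunk }
sse.event("progress", {"done" => 1})
print chunks.first
```

Keeping the sink as a block means the same writer can target a Rack body, a Rails live stream, or an array in tests.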

data/lib/braintrust/contrib/rails/server/generator.rb ADDED

@@ -0,0 +1,43 @@
+# frozen_string_literal: true
+require "rails/generators"
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        module Generators
+          class ServerGenerator < ::Rails::Generators::Base
+            namespace "braintrust:server"
+            source_root File.expand_path("templates", __dir__)
+            def create_initializer
+              @evaluators = discovered_evaluators
+              template "initializer.rb.tt", "config/initializers/braintrust_server.rb"
+            end
+            private
+            def discovered_evaluators
+              evaluator_roots.flat_map do |root|
+                Dir[File.join(destination_root, root, "**/*.rb")].sort.map do |file|
+                  relative_path = file.delete_prefix("#{File.join(destination_root, root)}/").sub(/\.rb\z/, "")
+                  {
+                    class_name: relative_path.split("/").map(&:camelize).join("::"),
+                    slug: relative_path.tr("/", "-").tr("_", "-")
+                  }
+                end
+              end
+            end
+            def evaluator_roots
+              %w[app/evaluators evaluators].select do |root|
+                Dir.exist?(File.join(destination_root, root))
+              end
+            end
+          end
+        end
+      end
+    end
+  end
+end
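
Note on the file above: the generator derives a class name and a slug from each evaluator file's path relative to its root. A plain-Ruby sketch of that mapping follows. `sketch_camelize` is a simplified stand-in for ActiveSupport's `camelize` (it ignores acronym inflections), and `evaluator_entry` is a hypothetical name.

```ruby
# Simplified stand-in for ActiveSupport's String#camelize.
def sketch_camelize(segment)
  segment.split("_").map(&:capitalize).join
end

# relative_path is the file path below the evaluator root, without ".rb".
def evaluator_entry(relative_path)
  {
    class_name: relative_path.split("/").map { |s| sketch_camelize(s) }.join("::"),
    slug: relative_path.tr("/", "-").tr("_", "-")
  }
end

p evaluator_entry("food_classifier")
p evaluator_entry("nlp/food_classifier")
```

Nested directories become namespaces in the class name and extra hyphenated segments in the slug.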

data/lib/braintrust/contrib/rails/server/health_controller.rb ADDED

@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        class HealthController < ApplicationController
+          def show
+            render json: {"status" => "ok"}
+          end
+        end
+      end
+    end
+  end
+end

data/lib/braintrust/contrib/rails/server/list_controller.rb ADDED

@@ -0,0 +1,16 @@
+# frozen_string_literal: true
+module Braintrust
+  module Contrib
+    module Rails
+      module Server
+        class ListController < ApplicationController
+          def show
+            result = Engine.list_service.call
+            render json: result
+          end
+        end
+      end
+    end
+  end
+end

data/lib/braintrust/contrib/rails/server/routes.rb ADDED

@@ -0,0 +1,8 @@
+# frozen_string_literal: true
+Braintrust::Contrib::Rails::Server::Engine.routes.draw do
+  get "/", to: "health#show"
+  get "/list", to: "list#show"
+  post "/list", to: "list#show"
+  post "/eval", to: "eval#create"
+end

data/lib/braintrust/contrib/rails/server.rb ADDED

@@ -0,0 +1,20 @@
+# frozen_string_literal: true
+begin
+  require "action_controller"
+  require "rails/engine"
+rescue LoadError
+  raise LoadError,
+    "Rails (actionpack + railties) is required for the Braintrust Rails server engine. " \
+    "Add `gem 'rails'` or `gem 'actionpack'` and `gem 'railties'` to your Gemfile."
+end
+require "json"
+require_relative "../../eval"
+require_relative "../../server/sse"
+require_relative "../../server/auth/no_auth"
+require_relative "../../server/auth/clerk_token"
+require_relative "../../server/middleware/cors"
+require_relative "../../server/services/list_service"
+require_relative "../../server/services/eval_service"
+require_relative "server/engine"