RubyGems - langsmith-sdk - Versions diffs - 0.3.2 → 0.4.0 - Mend

langsmith-sdk 0.3.2 → 0.4.0

This diff shows the contents of two publicly released versions of this package as they appear on RubyGems. It is provided for informational purposes only.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +14 -1
data/README.md +63 -2
data/langsmith.gemspec +38 -0
data/lib/langsmith/context.rb +18 -1
data/lib/langsmith/evaluation/experiment_runner.rb +27 -16
data/lib/langsmith/evaluation.rb +3 -1
data/lib/langsmith/run_tree.rb +4 -1
data/lib/langsmith/version.rb +1 -1
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eff170a34b29bf7c78f0f7cc81f3634aefff1e0b33c50cc88d37568e9da2c24a
-  data.tar.gz: 13c343e6387ad70f82c9d1f85f3dd07f5d0a5aa09a6cb2f35b873b28c59520b6
+  metadata.gz: 60f8521d25bfd486266a608d4ef371ca88966598d65efb62f1ee0d16934e7589
+  data.tar.gz: 8fcba710ec2d8d5eb8a017baba74b419cddf1ab927343bd3a17afc2430fe8d95
 SHA512:
-  metadata.gz: f5a525d017355d9a0aa320a5d1dd4f64893404f7856bd640df61d1aa12d868edefc2e8f29891512f88a6a8c838e0869be462edbcf956d5b8656856e73b90f427
-  data.tar.gz: 87476c11dd5789b1460e0ff8d386fa6e843c0c3073253eb2de2f96fb3c8baf6c58aa2b148b51ebc0797a2e206e41398c3a595c331ce586c67f8dbcffc47bf0d2
+  metadata.gz: 13851df9e6cabd412d88cd45919ba0d8168bc54d9f3168de09c4c9373cf385fd44d96d7b26d949969eb0e5bcae99036a5604e9e7283ede390b31dd875f9d7f2c
+  data.tar.gz: 4a46c7321921dfcbcb2d7a7828e13d51d9f4c66984903562f265acca9dfde452f97a4ab375d0a5c0587fb4a6fda536655777494cfae1e13b2697f2d048e398ad
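
The SHA256 and SHA512 values above are digests of the two archives packed inside the `.gem` file, which is a plain, uncompressed tar. The following is a minimal sketch of checking them locally, assuming the `.gem` has already been downloaded (for example with `gem fetch langsmith-sdk -v 0.4.0`). The helpers `gem_entries` and `checksum_mismatches` are hypothetical, not part of RubyGems or this SDK. Only `Gem::Package::TarReader` and the `Digest` classes are real library APIs.

```ruby
require "digest"
require "rubygems/package"

# Digest classes keyed by the algorithm names used in checksums.yaml.
DIGESTS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Hypothetical helper: reads the named entries of a .gem (an uncompressed
# tar archive) into memory, keyed by entry name.
def gem_entries(io, names = %w[metadata.gz data.tar.gz])
  entries = {}
  Gem::Package::TarReader.new(io).each do |entry|
    entries[entry.full_name] = entry.read if names.include?(entry.full_name)
  end
  entries
end

# Hypothetical helper: returns an "ALGO entry" label for every digest that is
# missing or differs. `checksums` has the shape of a parsed checksums.yaml:
#   { "SHA256" => { "metadata.gz" => "...", "data.tar.gz" => "..." }, ... }
def checksum_mismatches(entries, checksums)
  checksums.flat_map do |algo, expected_by_entry|
    digest = DIGESTS.fetch(algo)
    expected_by_entry.filter_map do |name, expected|
      body = entries[name]
      actual = body && digest.hexdigest(body)
      "#{algo} #{name}" unless actual == expected
    end
  end
end
```

Against a real download, you would call `File.open("langsmith-sdk-0.4.0.gem", "rb") { |f| checksum_mismatches(gem_entries(f), expected) }`. Here `expected` is a hash holding the 0.4.0 values listed above. An empty array means both archives match.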

data/CHANGELOG.md CHANGED

@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [0.4.0] - 2026-02-17
+### Added
+- Multi-tenant evaluation support with `tenant_id` parameter in `ExperimentRunner`
+- Context tracking for evaluation root run tenant ID
+- Tenant ID propagation to dataset, experiment, and feedback API calls
+### Changed
+- Improved experiment cleanup with ensure block in `ExperimentRunner#run`
 ## [0.3.2] - 2026-02-11
 ### Fixed
@@ -89,7 +101,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `prompt` - Prompt template rendering
 - `parser` - Output parsing operations
-[Unreleased]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.3.2...HEAD
+[Unreleased]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.4.0...HEAD
+[0.4.0]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.3.2...v0.4.0
 [0.3.2]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.3.1...v0.3.2
 [0.3.1]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.3.0...v0.3.1
 [0.3.0]: https://github.com/felipekb/langsmith-ruby-sdk/compare/v0.2.0...v0.3.0
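
The changelog footer follows the Keep a Changelog convention. Each version heading has a link reference that points to a GitHub compare view from the previous tag, and `[Unreleased]` compares the newest tag to `HEAD`. The sketch below regenerates those footer lines from a newest-first version list. The `compare_links` helper is hypothetical, and the repository URL is the one used in the diff.

```ruby
REPO = "https://github.com/felipekb/langsmith-ruby-sdk"

# Hypothetical helper: builds Keep a Changelog footer links from versions
# listed newest first. The oldest version has no predecessor, so it gets no
# compare link here.
def compare_links(versions, repo: REPO)
  lines = ["[Unreleased]: #{repo}/compare/v#{versions.first}...HEAD"]
  versions.each_cons(2) do |current, previous|
    lines << "[#{current}]: #{repo}/compare/v#{previous}...v#{current}"
  end
  lines
end

puts compare_links(%w[0.4.0 0.3.2 0.3.1])
```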

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # LangSmith Ruby SDK
-A Ruby SDK for [LangSmith](https://smith.langchain.com/) tracing and observability.
+A Ruby SDK for [LangSmith](https://smith.langchain.com/) tracing, experiments, and evaluations.
 ## Installation
@@ -164,6 +164,67 @@ Langsmith.trace("openai_call", run_type: "llm") do |run|
 end
 ```
+## Evaluations (Datasets + Experiments)
+Run your app against a LangSmith dataset and attach evaluator feedback to each traced example run:
+```ruby
+require "langsmith"
+summary = Langsmith::Evaluation.run(
+  dataset_id: "dataset-uuid",
+  experiment_name: "qa-baseline-v1",
+  description: "First baseline on FAQ dataset",
+  metadata: { model: "gpt-4", prompt_version: 3 },
+  evaluators: {
+    correctness: lambda { |outputs:, reference_outputs:, inputs:, run:|
+      predicted = outputs[:answer].to_s.strip.downcase
+      expected = reference_outputs[:answer].to_s.strip.downcase
+      {
+        score: predicted == expected ? 1.0 : 0.0,
+        value: predicted,
+        comment: "question=#{inputs[:question]} run_id=#{run[:id]}"
+      }
+    },
+    has_answer: ->(outputs:, **) { outputs[:answer].to_s.empty? ? 0.0 : 1.0 }
+  }
+) do |example|
+  # Wrap each dataset example in a trace so feedback can attach to the run.
+  Langsmith.trace("qa_inference", run_type: "chain", inputs: example[:inputs]) do
+    answer = MyApp.answer(example[:inputs][:question])
+    { answer: answer }
+  end
+end
+pp summary
+```
+### Evaluator Contract
+Each evaluator receives keyword arguments:
+- `outputs:` your block return value
+- `reference_outputs:` `example[:outputs]` from the dataset
+- `inputs:` `example[:inputs]` from the dataset
+- `run:` the LangSmith run hash for the traced example
+Evaluator return values:
+- `Numeric` -> used as `score`
+- `true` / `false` -> converted to `1.0` / `0.0`
+- `Hash` -> expected keys: `:score`, `:value`, `:comment`
+- `nil` -> skip feedback creation for that evaluator
+If one evaluator raises, the others still run. If your example block raises, the example is marked failed and the experiment continues.
+### Evaluation Summary
+`Langsmith::Evaluation.run` returns:
+- `:experiment_id`
+- `:total`
+- `:succeeded`
+- `:failed`
+- `:results` (per-example `:example_id`, `:run_id`, `:status`, `:error`, `:feedback`)
 ## Examples
 See [`examples/LLM_TRACING.md`](examples/LLM_TRACING.md) for comprehensive examples including:
@@ -175,6 +236,7 @@ See [`examples/LLM_TRACING.md`](examples/LLM_TRACING.md) for comprehensive examp
 - Error handling and retries
 - Multi-tenant tracing
 - Per-trace project overrides
+- Dataset experiments and evaluations (see section above)
 ## Development
@@ -183,4 +245,3 @@ After checking out the repo, run `bundle install` to install dependencies. Then,
 ## License
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
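
The README's evaluator contract amounts to two steps: a normalization of the evaluator's return value, and error isolation so that each evaluator runs independently. The sketch below re-implements that contract from the README's description only. `normalize_evaluator_result` and `run_evaluators_sketch` are hypothetical names, and the gem's own `normalize_result` body is not part of this diff. The shapes used for a skipped (`nil`) result and a failed evaluator are taken from `execute_evaluator` in the `experiment_runner.rb` diff. Feedback creation via the client is omitted.

```ruby
# Hypothetical: converts an evaluator's return value into feedback fields,
# following the README contract. The gem's normalize_result may differ.
def normalize_evaluator_result(result)
  case result
  when true, false then { score: result ? 1.0 : 0.0 }
  when Numeric then { score: result }
  when Hash then result.slice(:score, :value, :comment)
  else raise ArgumentError, "unsupported evaluator result: #{result.inspect}"
  end
end

# Hypothetical: runs every evaluator with the contract's keyword arguments.
# A nil result is recorded as skipped, and an exception is recorded without
# stopping the remaining evaluators.
def run_evaluators_sketch(evaluators, outputs:, reference_outputs:, inputs:, run:)
  evaluators.each_with_object({}) do |(key, evaluator), feedback|
    result = evaluator.call(outputs: outputs, reference_outputs: reference_outputs,
                            inputs: inputs, run: run)
    feedback[key] =
      if result.nil?
        { score: nil, success: true, skipped: true }
      else
        normalize_evaluator_result(result).merge(success: true)
      end
  rescue StandardError => e
    feedback[key] = { score: nil, success: false, error: e.message }
  end
end
```

Because the `rescue` is attached to the block rather than the whole loop, a raising evaluator only affects its own entry. This matches the README's statement that the other evaluators still run.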

data/langsmith.gemspec ADDED

@@ -0,0 +1,38 @@
+# frozen_string_literal: true
+require_relative "lib/langsmith/version"
+Gem::Specification.new do |spec|
+  spec.name = "langsmith-sdk"
+  spec.version = Langsmith::VERSION
+  spec.authors = ["Felipe Cabezudo"]
+  spec.email = ["felipecabedilo@gmail.com"]
+  spec.summary = "Ruby SDK for LangSmith tracing and observability"
+  spec.description = "A Ruby client for LangSmith, providing tracing and observability for LLM applications"
+  spec.homepage = "https://github.com/felipekb/langsmith-ruby-sdk"
+  spec.license = "MIT"
+  spec.required_ruby_version = ">= 3.1.0"
+  spec.metadata["allowed_push_host"] = "https://rubygems.org"
+  spec.metadata["homepage_uri"] = spec.homepage
+  spec.metadata["source_code_uri"] = spec.homepage
+  spec.metadata["changelog_uri"] = "#{spec.homepage}/blob/main/CHANGELOG.md"
+  spec.metadata["rubygems_mfa_required"] = "true"
+  spec.files = Dir.chdir(__dir__) do
+    `git ls-files -z`.split("\x0").reject do |f|
+      (File.expand_path(f) == __FILE__) ||
+        f.start_with?(*%w[bin/ test/ spec/ features/ .git .github appveyor Gemfile])
+    end
+  end
+  spec.bindir = "exe"
+  spec.executables = spec.files.grep(%r{\Aexe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+  # Runtime dependencies
+  spec.add_dependency "concurrent-ruby", ">= 1.1", "< 3.0"
+  spec.add_dependency "faraday", "~> 2.0"
+  spec.add_dependency "faraday-net_http_persistent", "~> 2.0"
+  spec.add_dependency "faraday-retry", "~> 2.0"
+end
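
`spec.files` is built from `git ls-files` minus a prefix deny-list. `String#start_with?` does plain prefix matching, so the `.git` entry also excludes `.gitignore` and `.github/`, and `Gemfile` also excludes `Gemfile.lock`. The sketch below applies the same filter to a fixed path list so it runs without a Git checkout. `packaged_files`, `executables`, and the sample paths are hypothetical.

```ruby
# Prefix deny-list copied from the gemspec above.
EXCLUDED_PREFIXES = %w[bin/ test/ spec/ features/ .git .github appveyor Gemfile].freeze

# Mirrors the gemspec's reject block: drop the gemspec itself (compared as
# absolute paths) and anything that starts with an excluded prefix.
def packaged_files(paths, gemspec_path)
  paths.reject do |f|
    File.expand_path(f) == gemspec_path || f.start_with?(*EXCLUDED_PREFIXES)
  end
end

# Mirrors spec.executables: basenames of the packaged files under exe/.
def executables(files)
  files.grep(%r{\Aexe/}) { |f| File.basename(f) }
end
```

The 0.4.0 `metadata` file list does include `langsmith.gemspec`. This suggests the self-exclusion did not match when this version was built. One possible cause is that `__FILE__` was a relative path while `File.expand_path(f)` is always absolute.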

data/lib/langsmith/context.rb CHANGED

@@ -14,7 +14,9 @@ module Langsmith
     CONTEXT_KEY = :langsmith_run_stack
     EVALUATION_CONTEXT_KEY = :langsmith_evaluation_context
     EVALUATION_ROOT_RUN_ID_KEY = :langsmith_evaluation_root_run_id
-    private_constant :CONTEXT_KEY, :EVALUATION_CONTEXT_KEY, :EVALUATION_ROOT_RUN_ID_KEY
+    EVALUATION_ROOT_RUN_TENANT_ID_KEY = :langsmith_evaluation_root_run_tenant_id
+    private_constant :CONTEXT_KEY, :EVALUATION_CONTEXT_KEY, :EVALUATION_ROOT_RUN_ID_KEY,
+                     :EVALUATION_ROOT_RUN_TENANT_ID_KEY
     class << self
       # Returns the current run stack for this thread.
@@ -56,6 +58,7 @@ module Langsmith
         Thread.current[CONTEXT_KEY] = []
         Thread.current[EVALUATION_CONTEXT_KEY] = nil
         Thread.current[EVALUATION_ROOT_RUN_ID_KEY] = nil
+        Thread.current[EVALUATION_ROOT_RUN_TENANT_ID_KEY] = nil
       end
       # Check if there's an active trace context
@@ -99,6 +102,19 @@ module Langsmith
         Thread.current[EVALUATION_ROOT_RUN_ID_KEY]
       end
+      # Stores the root run tenant ID for the current evaluation example.
+      #
+      # @param tenant_id [String, nil] the root run's tenant ID
+      def set_evaluation_root_run_tenant_id(tenant_id)
+        Thread.current[EVALUATION_ROOT_RUN_TENANT_ID_KEY] = tenant_id
+      end
+      # Returns the root run tenant ID for the current evaluation example, or nil.
+      # @return [String, nil]
+      def evaluation_root_run_tenant_id
+        Thread.current[EVALUATION_ROOT_RUN_TENANT_ID_KEY]
+      end
       # Execute a block with evaluation context set.
       # Context is cleared in ensure block even if the block raises.
       #
@@ -110,6 +126,7 @@ module Langsmith
       ensure
         Thread.current[EVALUATION_CONTEXT_KEY] = nil
         Thread.current[EVALUATION_ROOT_RUN_ID_KEY] = nil
+        Thread.current[EVALUATION_ROOT_RUN_TENANT_ID_KEY] = nil
       end
     end
   end
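
The new tenant key follows the same pattern as the existing evaluation keys. Values live in `Thread.current[...]`, which Ruby scopes to the current fiber and therefore to each thread, and `with_evaluation` clears them in `ensure`. Below is a standalone sketch of that pattern. `EvalContextSketch` and its `:sketch_*` keys are hypothetical names, not the gem's private constants.

```ruby
# Standalone sketch of the Context pattern; not the gem's Langsmith::Context.
module EvalContextSketch
  ACTIVE_KEY = :sketch_evaluation_active
  ROOT_RUN_ID_KEY = :sketch_evaluation_root_run_id
  ROOT_RUN_TENANT_ID_KEY = :sketch_evaluation_root_run_tenant_id

  class << self
    def evaluating?
      Thread.current[ACTIVE_KEY] == true
    end

    # Records the root run's ID and tenant ID for the current fiber.
    def set_root_run(run_id, tenant_id)
      Thread.current[ROOT_RUN_ID_KEY] = run_id
      Thread.current[ROOT_RUN_TENANT_ID_KEY] = tenant_id
    end

    def root_run_id
      Thread.current[ROOT_RUN_ID_KEY]
    end

    def root_run_tenant_id
      Thread.current[ROOT_RUN_TENANT_ID_KEY]
    end

    # Marks the current fiber as evaluating for the duration of the block and
    # clears every evaluation key afterwards, even if the block raises.
    def with_evaluation
      Thread.current[ACTIVE_KEY] = true
      yield
    ensure
      Thread.current[ACTIVE_KEY] = nil
      Thread.current[ROOT_RUN_ID_KEY] = nil
      Thread.current[ROOT_RUN_TENANT_ID_KEY] = nil
    end
  end
end
```

Clearing the tenant key in the same `ensure` as the run ID means a failed example cannot leak its tenant ID into the next example processed on the same thread.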

data/lib/langsmith/evaluation/experiment_runner.rb CHANGED

@@ -13,13 +13,16 @@ module Langsmith
       # @param description [String, nil] optional experiment description
       # @param metadata [Hash, nil] optional experiment metadata
       # @param evaluators [Hash] map of evaluator key to callable
+      # @param tenant_id [String, nil] tenant ID for dataset/session/feedback API calls
       # @param block [Proc] block that receives each example and produces a result
-      def initialize(dataset_id:, experiment_name:, description: nil, metadata: nil, evaluators: {}, &block)
+      def initialize(dataset_id:, experiment_name:, description: nil, metadata: nil, evaluators: {}, tenant_id: nil,
+                     &block)
         @dataset_id = dataset_id
         @experiment_name = experiment_name
         @description = description
         @metadata = metadata
         @evaluators = evaluators
+        @tenant_id = tenant_id
         @block = block
       end
@@ -27,22 +30,22 @@ module Langsmith
       #
       # @return [Hash] summary with :experiment_id, :total, :succeeded, :failed, :results
       def run
-        examples = client.list_examples(dataset_id: @dataset_id)
+        experiment_id = nil
+        examples = client.list_examples(dataset_id: @dataset_id, tenant_id: @tenant_id)
         experiment = client.create_experiment(
           name: @experiment_name,
           dataset_id: @dataset_id,
           description: @description,
-          metadata: @metadata
+          metadata: @metadata,
+          tenant_id: @tenant_id
         )
         experiment_id = experiment[:id]
         results = examples.map { |example| run_example(example, experiment_id) }
-        Langsmith.flush
-        client.close_experiment(experiment_id: experiment_id, end_time: Time.now.utc.iso8601)
         build_summary(experiment_id, results)
+      ensure
+        close_experiment(experiment_id) if experiment_id
       end
       private
@@ -54,45 +57,53 @@ module Langsmith
       def run_example(example, experiment_id)
         outputs = nil
         run_id = nil
+        run_tenant_id = nil
         begin
           Context.with_evaluation(experiment_id: experiment_id, example_id: example[:id]) do
             outputs = @block.call(example)
             run_id = Context.evaluation_root_run_id
+            run_tenant_id = Context.evaluation_root_run_tenant_id
           end
         rescue StandardError => e
           return { example_id: example[:id], run_id: nil, status: :error, error: e.message, feedback: nil }
         end
-        feedback = run_evaluators(example, outputs, run_id)
+        feedback = run_evaluators(example, outputs, run_id, run_tenant_id)
         { example_id: example[:id], run_id: run_id, status: :success, error: nil, feedback: feedback }
       rescue StandardError => e
         { example_id: example[:id], run_id: run_id, status: :success, error: e.message, feedback: nil }
       end
-      def run_evaluators(example, outputs, run_id)
+      def run_evaluators(example, outputs, run_id, run_tenant_id)
         return nil if @evaluators.empty? || run_id.nil?
+        tenant_id = run_tenant_id || @tenant_id
         Langsmith.flush
-        run = fetch_run_with_retry(run_id)
+        run = fetch_run_with_retry(run_id, tenant_id: tenant_id)
         @evaluators.each_with_object({}) do |(key, evaluator), feedback|
-          feedback[key] = execute_evaluator(key, evaluator, example, outputs, run_id, run)
+          feedback[key] = execute_evaluator(key, evaluator, example, outputs, run_id, run, tenant_id)
         end
       end
       # LangSmith has indexing lag after batch ingest — the run may not be
       # queryable immediately. Retry a few times with a short delay.
-      def fetch_run_with_retry(run_id, retries: 3, delay: 1)
-        client.read_run(run_id: run_id)
+      def fetch_run_with_retry(run_id, tenant_id:, retries: 3, delay: 1)
+        client.read_run(run_id: run_id, tenant_id: tenant_id)
       rescue Client::APIError => e
         raise unless e.status_code == 404 && retries.positive?
         sleep(delay)
-        fetch_run_with_retry(run_id, retries: retries - 1, delay: delay)
+        fetch_run_with_retry(run_id, tenant_id: tenant_id, retries: retries - 1, delay: delay)
+      end
+      def close_experiment(experiment_id)
+        Langsmith.flush
+        client.close_experiment(experiment_id: experiment_id, end_time: Time.now.utc.iso8601, tenant_id: @tenant_id)
       end
-      def execute_evaluator(key, evaluator, example, outputs, run_id, run)
+      def execute_evaluator(key, evaluator, example, outputs, run_id, run, tenant_id)
         result = evaluator.call(
           outputs: outputs,
           reference_outputs: example[:outputs],
@@ -102,7 +113,7 @@ module Langsmith
         return { score: nil, success: true, skipped: true } if result.nil?
         normalized = normalize_result(result)
-        client.create_feedback(run_id: run_id, key: key.to_s, **normalized)
+        client.create_feedback(run_id: run_id, key: key.to_s, tenant_id: tenant_id, **normalized)
         normalized.merge(success: true)
       rescue StandardError => e
         { score: nil, success: false, error: e.message }
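
Two control-flow changes in `ExperimentRunner` are easy to miss in the diff:

- `run` now closes the experiment from an `ensure` clause. An exception raised after creation still closes the experiment, while a failure in `list_examples` or `create_experiment` leaves `experiment_id` nil and skips the close.
- The tenant ID is passed on every attempt of the 404 retry.

The sketch below shows both. It uses a hypothetical `RecordingClient` in place of `Langsmith::Client` and a hypothetical `NotFound` error in place of a 404 `Client::APIError`. It also uses Ruby's `retry` instead of the diff's recursive call.

```ruby
# Hypothetical stand-in for Langsmith::Client that only records calls.
class RecordingClient
  # Stands in for a Client::APIError whose status_code is 404.
  NotFound = Class.new(StandardError)

  attr_reader :calls

  def initialize(missing_reads: 0)
    @calls = []
    @missing_reads = missing_reads # read_run failures before the run "appears"
  end

  def create_experiment(tenant_id:)
    @calls << [:create_experiment, tenant_id]
    { id: "exp-1" }
  end

  def read_run(run_id:, tenant_id:)
    @calls << [:read_run, run_id, tenant_id]
    if @missing_reads.positive?
      @missing_reads -= 1
      raise NotFound, "run #{run_id} not indexed yet"
    end
    { id: run_id, tenant_id: tenant_id }
  end

  def close_experiment(experiment_id:, tenant_id:)
    @calls << [:close_experiment, experiment_id, tenant_id]
  end
end

# Same shape as ExperimentRunner#run: close in ensure, but only if the
# experiment was actually created.
def run_experiment(client, examples, tenant_id: nil)
  experiment_id = nil
  experiment_id = client.create_experiment(tenant_id: tenant_id)[:id]
  examples.map { |example| yield example }
ensure
  client.close_experiment(experiment_id: experiment_id, tenant_id: tenant_id) if experiment_id
end

# Same policy as fetch_run_with_retry: retry only "not found", a bounded
# number of times, passing the tenant ID on every attempt.
def fetch_run_with_retry(client, run_id, tenant_id:, retries: 3, delay: 0)
  client.read_run(run_id: run_id, tenant_id: tenant_id)
rescue RecordingClient::NotFound
  raise unless retries.positive?

  sleep(delay)
  retries -= 1
  retry
end
```

One consequence of the `ensure` placement: if `close_experiment` (which now also calls `Langsmith.flush`) raises while another exception is propagating, the newer exception is the one the caller sees.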

data/lib/langsmith/evaluation.rb CHANGED

@@ -22,15 +22,17 @@ module Langsmith
     # @param description [String, nil] optional experiment description
     # @param metadata [Hash, nil] optional experiment metadata
     # @param evaluators [Hash] map of evaluator key to callable (see ExperimentRunner)
+    # @param tenant_id [String, nil] tenant ID for dataset/session/feedback API calls
     # @yield [Hash] each dataset example
     # @return [Hash] summary with :experiment_id, :total, :succeeded, :failed, :results
-    def self.run(dataset_id:, experiment_name:, description: nil, metadata: nil, evaluators: {}, &block)
+    def self.run(dataset_id:, experiment_name:, description: nil, metadata: nil, evaluators: {}, tenant_id: nil, &block)
       ExperimentRunner.new(
         dataset_id: dataset_id,
         experiment_name: experiment_name,
         description: description,
         metadata: metadata,
         evaluators: evaluators,
+        tenant_id: tenant_id,
         &block
       ).run
     end
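
`Evaluation.run` now forwards `tenant_id:` unchanged to `ExperimentRunner.new`, and its documented return value is the summary hash. `build_summary` itself is not part of this diff. The sketch below is a plausible derivation of the counts from the per-example results, assuming `:succeeded` counts results whose `:status` is `:success`. The name `build_summary_sketch` is hypothetical.

```ruby
# Hypothetical sketch; the gem's build_summary is not shown in this diff.
def build_summary_sketch(experiment_id, results)
  succeeded = results.count { |result| result[:status] == :success }
  {
    experiment_id: experiment_id,
    total: results.size,
    succeeded: succeeded,
    failed: results.size - succeeded,
    results: results
  }
end
```

Under this assumption, consider an example whose block succeeded but whose evaluation step later raised (for instance, `read_run` failing). `run_example` returns it with `status: :success` and a non-nil `:error`, so it counts as succeeded. Callers may want to inspect `:error` as well as `:status`.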

data/lib/langsmith/run_tree.rb CHANGED

@@ -140,7 +140,10 @@ module Langsmith
     # attach feedback to it later. Only root runs (no parent) register;
     # child runs must not overwrite.
     def register_evaluation_root_run(effective_parent_id)
-      Context.set_evaluation_root_run_id(@run.id) if effective_parent_id.nil? && Context.evaluating?
+      return unless effective_parent_id.nil? && Context.evaluating?
+      Context.set_evaluation_root_run_id(@run.id)
+      Context.set_evaluation_root_run_tenant_id(@run.tenant_id)
     end
     # Sanitize block results to prevent circular references.
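
The `register_evaluation_root_run` change replaces the one-line conditional with a guard clause. This way the run ID and the new tenant ID are recorded together, and only for a root run created while an evaluation is active. The sketch below shows that rule using hypothetical stand-ins (`RunStub`, `RootRunRegistry`) instead of `RunTree` and `Context`.

```ruby
RunStub = Struct.new(:id, :tenant_id, keyword_init: true)

# Hypothetical stand-in for the evaluation slots in Langsmith::Context.
class RootRunRegistry
  attr_reader :root_run_id, :root_run_tenant_id

  def initialize(evaluating:)
    @evaluating = evaluating
  end

  def evaluating?
    @evaluating
  end

  # Same guard as RunTree#register_evaluation_root_run: ignore child runs
  # (non-nil parent) and runs created outside an evaluation.
  def register(run, effective_parent_id)
    return unless effective_parent_id.nil? && evaluating?

    @root_run_id = run.id
    @root_run_tenant_id = run.tenant_id
  end
end
```

The recorded tenant ID can be nil when the run has none. In that case `run_evaluators` falls back to the runner-level `tenant_id` (`run_tenant_id || @tenant_id`).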

data/lib/langsmith/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module Langsmith
-  VERSION = "0.3.2"
+  VERSION = "0.4.0"
 end
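
The changelog says the project follows Semantic Versioning, and the new `tenant_id:` keyword is listed under "Added", which is consistent with a minor bump from 0.3.2 to 0.4.0. For applications that pin this gem, the pessimistic operator behaves differently depending on how many segments are pinned. The check below uses RubyGems' own `Gem::Requirement` and `Gem::Version`. The `allows?` helper is hypothetical, and the constraint strings are examples, not recommendations.

```ruby
require "rubygems"

# Hypothetical helper: true when `version` satisfies every constraint string.
def allows?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

p allows?("0.4.0", "~> 0.3.2")        # false (~> 0.3.2 means >= 0.3.2, < 0.4)
p allows?("0.4.0", "~> 0.3")          # true  (~> 0.3 means >= 0.3, < 1.0)
p allows?("2.12.0", "~> 2.0")         # true  (faraday constraint in the gemspec)
p allows?("3.0.0", ">= 1.1", "< 3.0") # false (concurrent-ruby upper bound)
```

A Gemfile entry such as `gem "langsmith-sdk", "~> 0.3.2"` will therefore not pick up 0.4.0, while `"~> 0.3"` will.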

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: langsmith-sdk
 version: !ruby/object:Gem::Version
-  version: 0.3.2
+  version: 0.4.0
 platform: ruby
 authors:
 - Felipe Cabezudo
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2026-02-11 00:00:00.000000000 Z
+date: 2026-02-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: concurrent-ruby
@@ -91,6 +91,7 @@ files:
 - examples/complex_agent.rb
 - examples/llm_tracing.rb
 - examples/openai_integration.rb
+- langsmith.gemspec
 - lib/langsmith.rb
 - lib/langsmith/batch_processor.rb
 - lib/langsmith/client.rb