RubyGems - active_genie - Versions diffs - 0.0.12 → 0.0.19 - Mend

active_genie 0.0.12 → 0.0.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37)

checksums.yaml +4 -4
data/README.md +65 -22
data/VERSION +1 -1
data/lib/active_genie/battle/README.md +7 -7
data/lib/active_genie/battle/basic.rb +48 -32
data/lib/active_genie/battle.rb +4 -0
data/lib/active_genie/clients/anthropic_client.rb +84 -0
data/lib/active_genie/clients/base_client.rb +241 -0
data/lib/active_genie/clients/google_client.rb +135 -0
data/lib/active_genie/clients/helpers/retry.rb +29 -0
data/lib/active_genie/clients/openai_client.rb +70 -91
data/lib/active_genie/clients/unified_client.rb +4 -4
data/lib/active_genie/concerns/loggable.rb +44 -0
data/lib/active_genie/configuration/log_config.rb +1 -1
data/lib/active_genie/configuration/providers/anthropic_config.rb +54 -0
data/lib/active_genie/configuration/providers/base_config.rb +85 -0
data/lib/active_genie/configuration/providers/deepseek_config.rb +54 -0
data/lib/active_genie/configuration/providers/google_config.rb +56 -0
data/lib/active_genie/configuration/providers/openai_config.rb +54 -0
data/lib/active_genie/configuration/providers_config.rb +7 -4
data/lib/active_genie/configuration/runtime_config.rb +35 -0
data/lib/active_genie/configuration.rb +18 -4
data/lib/active_genie/data_extractor/basic.rb +16 -3
data/lib/active_genie/data_extractor.rb +4 -0
data/lib/active_genie/logger.rb +40 -21
data/lib/active_genie/ranking/elo_round.rb +71 -50
data/lib/active_genie/ranking/free_for_all.rb +31 -14
data/lib/active_genie/ranking/player.rb +11 -16
data/lib/active_genie/ranking/players_collection.rb +4 -4
data/lib/active_genie/ranking/ranking.rb +74 -19
data/lib/active_genie/ranking/ranking_scoring.rb +3 -3
data/lib/active_genie/scoring/basic.rb +44 -25
data/lib/active_genie/scoring/recommended_reviewers.rb +1 -1
data/lib/active_genie/scoring.rb +3 -0
data/lib/tasks/benchmark.rake +27 -0
metadata +92 -70
data/lib/active_genie/configuration/openai_config.rb +0 -56

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: fbc4033f3c0880973cf732d704921bb61c3cb861c438b732aa6ebe0cc88b2de4
-  data.tar.gz: 11794293170ec43a1d3b3d09e5ec488351a22428a5370ae0e42266a6f8098646
+  metadata.gz: 12c5c526a10f93e649ca39c4789a21065c0f2d329bd57248260b1fd997507296
+  data.tar.gz: 6fc9074a5282f1b9759c41dd98aaa31a179bc495964839e8bfc42e891b15e7d2
 SHA512:
-  metadata.gz: f6b13b3f36a8e516e5126d4cea0de52d046e498834cd10b90b1905460e9bba5ff6e7c4cebd58e6ea2c073176566601c8b806015baa01f11f28792942eec6ca1d
-  data.tar.gz: fc79aed12aaba0ca335d3a7ef8405f4a7c2809de9d5e41ca14fa7bd843d9fb69db8e124b6a2e96321432575a58b271c724a52fd3c1a1165c5ecfe346bedd4b5f
+  metadata.gz: 76672044e7a1a88779100b9b0f32d75547ac1105031f5ca40d6bbce71a34edecadb63e0dda99c6ca23ee03647bf5dd09a3b3bc98a38faad3a07fdbc245f87dc4
+  data.tar.gz: 2ac5ed0ae70edbf7a736e7b7bc3408a16c76628c7efecb277291641bb8b04e8ea633a36c16dc0e86851bbf3c40d82c3f4456af0e7e3ea1beea057bb60660ad6a
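
The values in this hunk are hex digests of the two archives packed inside the `.gem` file, so they change on every release. A minimal sketch of how such digests could be recomputed with Ruby's standard library, assuming the archive bytes are already in memory; the `digests_for` helper and the sample string are hypothetical and not part of RubyGems:

```ruby
require 'digest'

# Hypothetical helper: returns the two hex digests that checksums.yaml
# records for one archive, given that archive's raw bytes.
def digests_for(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

sums = digests_for('example bytes, not a real archive')
puts sums['SHA256'].length # 64 hex characters
puts sums['SHA512'].length # 128 hex characters
```

Comparing the recomputed strings with the `+` lines above would confirm that a downloaded archive matches the published 0.0.19 release.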

data/README.md CHANGED

@@ -2,16 +2,10 @@
 > The lodash for GenAI, stop reinventing the wheel
 [![Gem Version](https://badge.fury.io/rb/active_genie.svg?icon=si%3Arubygems)](https://badge.fury.io/rb/active_genie)
-[![Ruby](https://github.com/roriz/active_genie/actions/workflows/ruby.yml/badge.svg)](https://github.com/roriz/active_genie/actions/workflows/ruby.yml)
+[![Ruby](https://github.com/roriz/active_genie/actions/workflows/benchmark.yml/badge.svg)](https://github.com/roriz/active_genie/actions/workflows/benchmark.yml)
-ActiveGenie is a Ruby gem that provides a polished, production-ready interface for working with Generative AI (GenAI) models. Just like Lodash or ActiveStorage, ActiveGenie simplifies GenAI integration in your Ruby applications.
-## Features
-- 🎯 **Data Extraction**: Extract structured data from unstructured text with type validation
-- 📊 **Data Scoring**: Multi-reviewer evaluation system
-- ⚔️ **Data Battle**: Battle between two data like a political debate
-- 💭 **Data Ranking**: Consistent rank data using scoring + elo ranking + battles
+ActiveGenie is a Ruby gem that provides valuable solutions powered by Generative AI (GenAI) models. Just like Lodash or ActiveStorage, ActiveGenie brings a set of Modules reach real value fast and reliable.
+ActiveGenie is backed by a custom benchmarking system that ensures consistent quality and performance across different models and providers in every release.
 ## Installation
@@ -41,6 +35,7 @@ end
 ## Quick Start
 ### Data Extractor
 Extract structured data from text using AI-powered analysis, handling informal language and complex expressions.
 ```ruby
@@ -55,13 +50,17 @@ schema = {
     minimum: 0
   },
   size: {
-    type: 'integer',
+    type: 'number',
     minimum: 35,
     maximum: 46
   }
 }
-result = ActiveGenie::DataExtractor.call(text, schema)
+result = ActiveGenie::DataExtractor.call(
+  text,
+  schema,
+  config: { provider: :openai, model: 'gpt-4o-mini' } # optional
+)
 # => {
 #      brand: "Nike",
 #      brand_explanation: "Brand name found at start of text",
@@ -72,6 +71,8 @@ result = ActiveGenie::DataExtractor.call(text, schema)
 #    }
 ```
+*Recommended model*: `gpt-4o-mini`
 Features:
 - Structured data extraction with type validation
 - Schema-based extraction with custom constraints
@@ -80,14 +81,18 @@ Features:
 See the [Data Extractor README](lib/active_genie/data_extractor/README.md) for informal text processing, advanced schemas, and detailed interface documentation.
-### Data Scoring
+### Scoring
 Text evaluation system that provides detailed scoring and feedback using multiple expert reviewers. Get balanced scoring through AI-powered expert reviewers that automatically adapt to your content.
 ```ruby
 text = "The code implements a binary search algorithm with O(log n) complexity"
 criteria = "Evaluate technical accuracy and clarity"
-result = ActiveGenie::Scoring.basic(text, criteria)
+result = ActiveGenie::Scoring.basic(
+  text,
+  criteria,
+  config: { provider: :anthropic, model: 'claude-3-5-haiku-20241022' } # optional
+)
 # => {
 #      algorithm_expert_score: 95,
 #      algorithm_expert_reasoning: "Accurately describes binary search and its complexity",
@@ -97,6 +102,8 @@ result = ActiveGenie::Scoring.basic(text, criteria)
 #    }
 ```
+*Recommended model*: `claude-3-5-haiku-20241022`
 Features:
 - Multi-reviewer evaluation with automatic expert selection
 - Detailed feedback with scoring reasoning
@@ -105,26 +112,33 @@ Features:
 See the [Scoring README](lib/active_genie/scoring/README.md) for advanced usage, custom reviewers, and detailed interface documentation.
-### Data Battle
+### Battle
 AI-powered battle evaluation system that determines winners between two players based on specified criteria.
 ```ruby
 require 'active_genie'
-player_a = "Implementation uses dependency injection for better testability"
-player_b = "Code has high test coverage but tightly coupled components"
+player_1 = "Implementation uses dependency injection for better testability"
+player_2 = "Code has high test coverage but tightly coupled components"
 criteria = "Evaluate code quality and maintainability"
-result = ActiveGenie::Battle.call(player_a, player_b, criteria)
+result = ActiveGenie::Battle.call(
+  player_1,
+  player_2,
+  criteria,
+  config: { provider: :google, model: 'gemini-2.0-flash-lite' } # optional
+)
 # => {
 #      winner_player: "Implementation uses dependency injection for better testability",
-#      reasoning: "Player A's implementation demonstrates better maintainability through dependency injection,
-#                 which allows for easier testing and component replacement. While Player B has good test coverage,
+#      reasoning: "Player 1 implementation demonstrates better maintainability through dependency injection,
+#                 which allows for easier testing and component replacement. While Player 2 has good test coverage,
 #                 the tight coupling makes the code harder to maintain and modify.",
 #      what_could_be_changed_to_avoid_draw: "Focus on specific architectural patterns and design principles"
 #    }
 ```
+*Recommended model*: `gemini-2.0-flash-lite`
 Features:
 - Multi-reviewer evaluation with automatic expert selection
 - Detailed feedback with scoring reasoning
@@ -133,23 +147,28 @@ Features:
 See the [Battle README](lib/active_genie/battle/README.md) for advanced usage, custom reviewers, and detailed interface documentation.
-### Data Ranking
+### Ranking
 The Ranking module provides competitive ranking through multi-stage evaluation:
 ```ruby
 require 'active_genie'
 players = ['REST API', 'GraphQL API', 'SOAP API', 'gRPC API', 'Websocket API']
 criteria = "Best one to be used into a high changing environment"
-result = ActiveGenie::Ranking.call(players, criteria)
+result = ActiveGenie::Ranking.call(
+  players,
+  criteria,
+  config: { provider: :google, model: 'gemini-2.0-flash-lite' } # optional
+)
 # => {
 #      winner_player: "gRPC API",
 #      reasoning: "gRPC API is the best one to be used into a high changing environment",
 #    }
 ```
+*Recommended model*: `gemini-2.0-flash-lite`
 - **Multi-phase ranking system** combining expert scoring and ELO algorithms
 - **Automatic elimination** of inconsistent performers using statistical analysis
 - **Dynamic ranking adjustments** based on simulated pairwise battles, from bottom to top
@@ -157,10 +176,34 @@ result = ActiveGenie::Ranking.call(players, criteria)
 See the [Ranking README](lib/active_genie/ranking/README.md) for implementation details, configuration, and advanced ranking strategies.
 ### Text Summarizer (Future)
+### Categorizer (Future)
 ### Language detector (Future)
 ### Translator (Future)
 ### Sentiment analyzer (Future)
+## Benchmarking 🧪
+ActiveGenie includes a comprehensive benchmarking system to ensure consistent, high-quality outputs across different LLM models and providers.
+```ruby
+# Run all benchmarks
+bundle exec rake active_genie:benchmark
+# Run benchmarks for a specific module
+bundle exec rake active_genie:benchmark[data_extractor]
+```
+### Latest Results
+| Model | Overall Precision |
+|-------|-------------------|
+| claude-3-5-haiku-20241022 | 92.25% |
+| gemini-2.0-flash-lite | 84.25% |
+| gpt-4o-mini | 62.75% |
+| deepseek-chat | 57.25% |
+See the [Benchmark README](benchmark/README.md) for detailed results, methodology, and how to contribute to our test suite.
 ## Configuration
 | Config | Description | Default |

data/VERSION CHANGED

@@ -1 +1 @@
-0.0.12
+0.0.19

data/lib/active_genie/battle/README.md CHANGED

@@ -12,11 +12,11 @@ AI-powered battle evaluation system that determines winners between two players
 Evaluate a battle between two players with simple text content:
 ```ruby
-player_a = "Implementation uses dependency injection for better testability"
-player_b = "Code has high test coverage but tightly coupled components"
+player_1 = "Implementation uses dependency injection for better testability"
+player_2 = "Code has high test coverage but tightly coupled components"
 criteria = "Evaluate code quality and maintainability"
-result = ActiveGenie::Battle::Basic.call(player_a, player_b, criteria)
+result = ActiveGenie::Battle::Basic.call(player_1, player_2, criteria)
 # => {
 #      winner_player: "Implementation uses dependency injection for better testability",
 #      reasoning: "Player A's implementation demonstrates better maintainability through dependency injection,
@@ -27,13 +27,13 @@ result = ActiveGenie::Battle::Basic.call(player_a, player_b, criteria)
 ```
 ## Interface
-### Basic.call(player_a, player_b, criteria, config: {})
-- `player_a` [String, Hash] - The content or submission from the first player
-- `player_b` [String, Hash] - The content or submission from the second player
+### Basic.call(player_1, player_2, criteria, config: {})
+- `player_1` [String, Hash] - The content or submission from the first player
+- `player_2` [String, Hash] - The content or submission from the second player
 - `criteria` [String] - The evaluation criteria or rules to assess against
 - `config` [Hash] - Additional configuration config that modify the battle evaluation behavior
 Returns a Hash containing:
-- `winner_player` [String, Hash] - The winning player's content (either player_a or player_b)
+- `winner_player` [String, Hash] - The winning player's content (either player_1 or player_2)
 - `reasoning` [String] - Detailed explanation of why the winner was chosen
 - `what_could_be_changed_to_avoid_draw` [String] - A suggestion on how to avoid a draw

data/lib/active_genie/battle/basic.rb CHANGED

@@ -18,17 +18,17 @@ module ActiveGenie::Battle
       new(...).call
     end
-    # @param player_a [String] The content or submission from the first player
-    # @param player_b [String] The content or submission from the second player
+    # @param player_1 [String] The content or submission from the first player
+    # @param player_2 [String] The content or submission from the second player
     # @param criteria [String] The evaluation criteria or rules to assess against
-    # @param config [Hash] Additional configuration config that modify the battle evaluation behavior
+    # @param config [Hash] Additional configuration options that modify the battle evaluation behavior
     # @return [Hash] The evaluation result containing the winner and reasoning
-    #   @return [String] :winner The @param player_a or player_b
+    #   @return [String] :winner The winner, either player_1 or player_2
     #   @return [String] :reasoning Detailed explanation of why the winner was chosen
     #   @return [String] :what_could_be_changed_to_avoid_draw A suggestion on how to avoid a draw
-    def initialize(player_a, player_b, criteria, config: {})
-      @player_a = player_a
-      @player_b = player_b
+    def initialize(player_1, player_2, criteria, config: {})
+      @player_1 = player_1
+      @player_2 = player_2
       @criteria = criteria
       @config = ActiveGenie::Configuration.to_h(config)
     end
@@ -37,8 +37,8 @@ module ActiveGenie::Battle
       messages = [
         {  role: 'system', content: PROMPT },
         {  role: 'user', content: "criteria: #{@criteria}" },
-        {  role: 'user', content: "player_a: #{@player_a}" },
-        {  role: 'user', content: "player_b: #{@player_b}" },
+        {  role: 'user', content: "player_1: #{@player_1}" },
+        {  role: 'user', content: "player_2: #{@player_2}" },
       ]
       response = ::ActiveGenie::Clients::UnifiedClient.function_calling(
@@ -48,6 +48,15 @@ module ActiveGenie::Battle
         config: @config
       )
+      ActiveGenie::Logger.debug({
+        code: :battle,
+        player_1: @player_1[0..30],
+        player_2: @player_2[0..30],
+        criteria: @criteria[0..30],
+        winner: response['impartial_judge_winner'],
+        reasoning: response['impartial_judge_winner_reasoning']
+      })
       response_formatted(response)
     end
@@ -56,23 +65,22 @@ module ActiveGenie::Battle
     def response_formatted(response)
       winner = response['impartial_judge_winner']
       loser = case response['impartial_judge_winner']
-              when 'player_a' then 'player_b'
-              when 'player_b' then 'player_a'
-              else 'draw'
+              when 'player_1' then 'player_2'
+              when 'player_2' then 'player_1'
               end
-      { winner:, loser:, reasoning: response['impartial_judge_winner_reasoning'] }
+      { 'winner' => winner, 'loser' => loser, 'reasoning' => response['impartial_judge_winner_reasoning'] }
     end
     PROMPT = <<~PROMPT
-    Based on two players, player_a and player_b, they will battle against each other based on criteria. Criteria are vital as they provide a clear metric to compare the players. Follow these criteria strictly.
+    Based on two players, player_1 and player_2, they will battle against each other based on criteria. Criteria are vital as they provide a clear metric to compare the players. Follow these criteria strictly.
     # Steps
-    1. Player_a sells himself, highlighting his strengths and how he meets the criteria. Max of 100 words.
-    2. Player_b sells himself, highlighting his strengths and how he meets the criteria. Max of 100 words.
-    3. Player_a argues why he is the winner compared to player_b. Max of 100 words.
-    4. Player_b counter-argues why he is the winner compared to player_a. Max of 100 words.
-    5. The impartial judge chooses which player as the winner.
+    1. player_1 presents their strengths and how they meet the criteria. Max of 100 words.
+    2. player_2 presents their strengths and how they meet the criteria. Max of 100 words.
+    3. player_1 argues why they should be the winner compared to player_2. Max of 100 words.
+    4. player_2 counter-argues why they should be the winner compared to player_1. Max of 100 words.
+    5. The impartial judge chooses the winner.
     # Output Format
     - The impartial judge chooses this player as the winner.
@@ -85,25 +93,25 @@ module ActiveGenie::Battle
     FUNCTION =  {
       name: 'battle_evaluation',
-      description: 'Evaluate a battle between player_a and player_b using predefined criteria and identify the winner.',
-      schema: {
+      description: 'Evaluate a battle between player_1 and player_2 using predefined criteria and identify the winner.',
+      parameters: {
         type: "object",
         properties: {
-          player_a_sell_himself: {
+          player_1_sell_himself: {
             type: 'string',
-            description: 'player_a sell himself, highlighting his strengths and how he meets the criteria. Max of 100 words.',
+            description: 'player_1 presents their strengths and how they meet the criteria. Max of 100 words.',
           },
-          player_b_sell_himself: {
+          player_2_sell_himself: {
             type: 'string',
-            description: 'player_b sell himself, highlighting his strengths and how he meets the criteria. Max of 100 words.',
+            description: 'player_2 presents their strengths and how they meet the criteria. Max of 100 words.',
           },
-          player_a_arguments: {
+          player_1_arguments: {
             type: 'string',
-            description: 'player_a arguments why he is the winner compared to player_b. Max of 100 words.',
+            description: 'player_1 arguments for why they should be the winner compared to player_2. Max of 100 words.',
           },
-          player_b_counter: {
+          player_2_counter: {
             type: 'string',
-            description: 'player_b counter arguments why he is the winner compared to player_a. Max of 100 words.',
+            description: 'player_2 counter arguments for why they should be the winner compared to player_1. Max of 100 words.',
           },
           impartial_judge_winner_reasoning: {
             type: 'string',
@@ -111,10 +119,18 @@ module ActiveGenie::Battle
           },
           impartial_judge_winner: {
             type: 'string',
-            description: 'The impartial judge chose this player as the winner.',
-            enum: ['player_a', 'player_b', 'draw']
+            description: 'Who is the winner based on the impartial judge reasoning?',
+            enum: ['player_1', 'player_2']
           },
-        }
+        },
+        required: [
+          'player_1_sell_himself',
+          'player_2_sell_himself',
+          'player_1_arguments',
+          'player_2_counter',
+          'impartial_judge_winner_reasoning',
+          'impartial_judge_winner'
+        ]
       }
     }
   end
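
Two behavior changes in this file are easy to miss when reading the hunks. The result hash now uses string keys instead of symbol keys, and with `'draw'` removed from the enum the `case` has no `else` branch, so `loser` is `nil` for any winner value other than `player_1` or `player_2`. A standalone sketch that restates the added `response_formatted` lines outside the class; the sample responses are made up for illustration:

```ruby
# Restatement of the new response_formatted logic, lifted out of
# ActiveGenie::Battle::Basic so it can run on its own.
def response_formatted(response)
  winner = response['impartial_judge_winner']
  loser = case winner
          when 'player_1' then 'player_2'
          when 'player_2' then 'player_1'
          end
  { 'winner' => winner, 'loser' => loser, 'reasoning' => response['impartial_judge_winner_reasoning'] }
end

result = response_formatted({
  'impartial_judge_winner' => 'player_2',
  'impartial_judge_winner_reasoning' => 'Lower coupling' # made-up sample text
})

puts result['winner']        # player_2
puts result['loser']         # player_1
puts result[:winner].inspect # nil, symbol keys no longer match

# A value outside the enum leaves the loser unset.
puts response_formatted({ 'impartial_judge_winner' => 'draw' })['loser'].inspect # nil
```

Callers written against 0.0.12 that read `result[:winner]` would need to switch to `result['winner']`.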

data/lib/active_genie/battle.rb CHANGED

@@ -9,5 +9,9 @@ module ActiveGenie
     def basic(...)
       Basic.call(...)
     end
+    def call(...)
+      Basic.call(...)
+    end
   end
 end
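
The added `call` forwards every argument to `Basic.call` with Ruby's `...` syntax, which makes it an alias of the existing `basic` entry point. A minimal sketch of that delegation pattern, using a hypothetical `DemoBattle` module whose `Basic` class only echoes its arguments; the use of `module_function` is an assumption, since the hunk does not show how the real module exposes its methods:

```ruby
module DemoBattle
  # Stand-in for ActiveGenie::Battle::Basic; it makes no LLM request and
  # simply returns the arguments it received.
  class Basic
    def self.call(...)
      new(...).call
    end

    def initialize(player_1, player_2, criteria, config: {})
      @args = { player_1: player_1, player_2: player_2, criteria: criteria, config: config }
    end

    def call
      @args
    end
  end

  module_function

  # Both entry points forward positional and keyword arguments unchanged.
  def basic(...)
    Basic.call(...)
  end

  def call(...)
    Basic.call(...)
  end
end

via_basic = DemoBattle.basic('a', 'b', 'quality', config: { provider: :google })
via_call  = DemoBattle.call('a', 'b', 'quality', config: { provider: :google })
puts via_basic == via_call # true
```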

data/lib/active_genie/clients/anthropic_client.rb ADDED

@@ -0,0 +1,84 @@
+require 'json'
+require 'net/http'
+require 'uri'
+require_relative './helpers/retry'
+require_relative './base_client'
+module ActiveGenie::Clients
+  # Client for interacting with the Anthropic (Claude) API with json response
+  class AnthropicClient < BaseClient
+    class AnthropicError < ClientError; end
+    class RateLimitError < AnthropicError; end
+    ANTHROPIC_VERSION = '2023-06-01'
+    ANTHROPIC_ENDPOINT = '/v1/messages'
+    def initialize(config)
+      super(config)
+    end
+    # Requests structured JSON output from the Anthropic Claude model based on a schema.
+    #
+    # @param messages [Array<Hash>] A list of messages representing the conversation history.
+    #   Each hash should have :role ('user', 'assistant', or 'system') and :content (String).
+    #   Claude uses 'user', 'assistant', and 'system' roles.
+    # @param function [Hash] A JSON schema definition describing the desired output format.
+    # @param model_tier [Symbol, nil] A symbolic representation of the model quality/size tier.
+    # @param config [Hash] Optional configuration overrides:
+    #   - :api_key [String] Override the default API key.
+    #   - :model [String] Override the model name directly.
+    #   - :max_retries [Integer] Max retries for the request.
+    #   - :retry_delay [Integer] Initial delay for retries.
+    #   - :anthropic_version [String] Override the default Anthropic API version.
+    # @return [Hash, nil] The parsed JSON object matching the schema, or nil if parsing fails or content is empty.
+    def function_calling(messages, function, model_tier: nil, config: {})
+      model = config[:runtime][:model] || @app_config.tier_to_model(model_tier)
+      system_message = messages.find { |m| m[:role] == 'system' }&.dig(:content) || ''
+      user_messages = messages.select { |m| m[:role] == 'user' || m[:role] == 'assistant' }
+        .map { |m| { role: m[:role], content: m[:content] } }
+      anthropic_function = function.dup
+      anthropic_function[:input_schema] = function[:parameters]
+      anthropic_function.delete(:parameters)
+      payload = {
+        model:,
+        system: system_message,
+        messages: user_messages,
+        tools: [anthropic_function],
+        tool_choice: { name: anthropic_function[:name], type: 'tool' },
+        max_tokens: config[:runtime][:max_tokens],
+        temperature: config[:runtime][:temperature] || 0,
+      }
+      api_key = config[:runtime][:api_key] || @app_config.api_key
+      headers = {
+        'x-api-key': api_key,
+        'anthropic-version': config[:anthropic_version] || ANTHROPIC_VERSION
+      }.compact
+      retry_with_backoff(config:) do
+        start_time = Time.now
+        response = post(ANTHROPIC_ENDPOINT, payload, headers: headers, config: config)
+        content = response.dig('content', 0, 'input')
+        ActiveGenie::Logger.trace({
+          code: :llm_usage,
+          input_tokens: response.dig('usage', 'input_tokens'),
+          output_tokens: response.dig('usage', 'output_tokens'),
+          total_tokens: response.dig('usage', 'input_tokens') + response.dig('usage', 'output_tokens'),
+          model: payload[:model],
+          duration: Time.now - start_time,
+          usage: response.dig('usage')
+        })
+        ActiveGenie::Logger.trace({code: :function_calling, payload:, parsed_response: content})
+        content
+      end
+    end
+  end
+end
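
The new client adapts the gem's provider-neutral inputs to the shape of Anthropic's Messages API: the JSON schema moves from `:parameters` to `:input_schema`, and the system prompt is pulled out of the message list into a top-level field. A minimal sketch of those two transformations as standalone functions; the helper names `to_anthropic_tool` and `split_messages` are hypothetical and do not exist in the gem:

```ruby
# Hypothetical helper mirroring the added lines: the schema is renamed
# from :parameters to :input_schema on a shallow copy.
def to_anthropic_tool(function)
  tool = function.dup # the caller's hash keeps its :parameters key
  tool[:input_schema] = tool.delete(:parameters)
  tool
end

# Hypothetical helper: returns the system prompt and the remaining
# user/assistant turns as two separate values.
def split_messages(messages)
  system = messages.find { |m| m[:role] == 'system' }&.dig(:content) || ''
  turns = messages.select { |m| %w[user assistant].include?(m[:role]) }
                  .map { |m| { role: m[:role], content: m[:content] } }
  [system, turns]
end

function = { name: 'battle_evaluation', description: 'demo', parameters: { type: 'object' } }
tool = to_anthropic_tool(function)
puts tool.keys.inspect            # [:name, :description, :input_schema]
puts function.key?(:parameters)   # true

system, turns = split_messages([
  { role: 'system', content: 'You are a judge' },
  { role: 'user', content: 'criteria: quality' }
])
puts system       # You are a judge
puts turns.length # 1
```

One point worth checking when reviewing the added file: `function_calling` defaults `config` to `{}` but reads `config[:runtime][:model]`, which raises `NoMethodError` whenever the hash has no `:runtime` key. `config.dig(:runtime, :model)` would be a nil-safe alternative, assuming callers are not guaranteed to pass `:runtime`.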