RubyGems - ruby-skill-bench - Versions diffs - 1.0.1 → 1.1.0 - Mend

ruby-skill-bench 1.0.1 → 1.1.0

Files changed (16)

checksums.yaml +4 -4
data/README.md +145 -0
data/lib/skill_bench/agent/react_agent.rb +2 -1
data/lib/skill_bench/clients/all.rb +1 -0
data/lib/skill_bench/clients/base_client.rb +2 -5
data/lib/skill_bench/clients/request_builder.rb +2 -4
data/lib/skill_bench/clients/response_builder.rb +91 -0
data/lib/skill_bench/clients/response_error_handler.rb +5 -17
data/lib/skill_bench/clients/retry_handler.rb +4 -7
data/lib/skill_bench/constants.rb +58 -0
data/lib/skill_bench/execution/context_hydrator.rb +16 -6
data/lib/skill_bench/execution/sandbox.rb +18 -3
data/lib/skill_bench/tools/run_command.rb +2 -17
data/lib/skill_bench/version.rb +1 -1
data/lib/skill_bench.rb +1 -0
metadata +3 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: d3c4edfe40e04251d2e7b758e7c630ee9affaa9e8170ceb0fa379d61bacc81e6
-  data.tar.gz: e9ef2eb8ef7a524d607c6e44705df772feec8939a376b516adff032eeeb8b535
+  metadata.gz: d2ad524e13bc006a56f0197d07b3ba7b0ce2f99f60b61f0739c3d5bc0d75a687
+  data.tar.gz: a920c473148b52584653acbb1e91cb3973791c09de6c4df994a77c097eabc476
 SHA512:
-  metadata.gz: b92554c769e34205d1c197bd67a9ca2ae61876b83c5429e202c667831100470fa9f1ed48a297ea184855e33e7ac3945fb513909b2344634078b8090750325dc9
-  data.tar.gz: 7ae92f1331f2061cccf42a1f27f80cbe41c73d54d0909499900efa84ad3984edada8e7df10b5a018717861234974918cdfac80b5242483df108272093eec8deb
+  metadata.gz: c1f131af9bcde90e7fc3a7e6bef7f3770edfa4e2826ee19c3aabf5c210d6d3b6e5bdd460778a87f6fdc77b5b99bc17b2225e1b79de31674ad4acfe1bbc89f862
+  data.tar.gz: d8e3791c91242b25779afa3a21c57daeb06995bc4c65b01ca6f378a69491aeec95c5844ea541e5dcaf18b46d8e7f153ffb38ff2bb060ab5dacd653c8c1026bcd

data/README.md CHANGED

@@ -859,6 +859,151 @@ bundle exec ruby -Itest test/integration_test.rb
 - `test/agent_eval/` — CLI, models, and service tests
 - `test/clients/` — Provider client tests
+---
+## Security
+### Threat Model
+Ruby Skill Bench is designed with security as a primary concern. The system executes AI agents in isolated environments and must protect against various attack vectors:
+- **Path Traversal:** Preventing agents from accessing files outside the sandbox
+- **Command Injection:** Preventing execution of arbitrary shell commands
+- **Resource Exhaustion:** Preventing denial-of-service through resource consumption
+- **Information Leakage:** Protecting sensitive data like API keys
+### Security Features
+#### Path Traversal Protection
+- **Symlink Validation:** All symlinks are validated to ensure they don't escape the sandbox
+- **TOCTOU Mitigation:** Path validation is re-checked after directory creation operations
+- **Path Normalization:** All paths are normalized and validated against working directory boundaries
+- **Character Validation:** Paths are validated against strict character patterns
+#### Command Execution Security
+- **Command Allowlist:** Only explicitly allowed commands can be executed
+- **Dangerous Commands Blocklist:** Dangerous commands (bash, curl, sudo, etc.) are always blocked
+- **Shell Tokenization:** Commands are tokenized before execution to prevent shell injection
+- **Docker Isolation:** Commands can be executed in isolated Docker containers with hardened security settings
+#### Docker Security Hardening
+When Docker is available, containers are launched with hardened security settings:
+- **Non-root User:** Containers run as a non-root user
+- **Privilege Prevention:** `--security-opt no-new-privileges` prevents privilege escalation
+- **Capability Dropping:** All Linux capabilities are dropped except minimal needed ones
+- **Network Isolation:** `--network none` disables network access
+- **Read-only Root:** Container filesystem is read-only (except for mounted volumes)
+#### Resource Limits
+- **File Size Limits:** Individual files in context hydration are limited to 50KB
+- **Total Context Size:** Total context size is limited to 1MB to prevent memory exhaustion
+- **Execution Timeout:** Commands are limited to a configurable timeout (default: 30 seconds)
+- **Max Iterations:** Agent loops are limited to prevent infinite loops
+### API Key Security
+- **Environment Variables:** API keys are loaded from environment variables, not hardcoded
+- **Configuration Hierarchy:** Keys can be set in `skill-bench.json` or environment variables
+- **No Logging:** API keys are never logged or exposed in error messages
+- **Provider-Specific Keys:** Each provider uses its own API key configuration
+### Best Practices for Users
+1. **Never Commit API Keys:** Never commit `skill-bench.json` with API keys to version control
+2. **Use Environment Variables:** Prefer environment variables for sensitive configuration
+3. **Minimal Command Allowlist:** Only allow commands necessary for your evals
+4. **Regular Updates:** Keep dependencies updated to patch security vulnerabilities
+5. **Review Changes:** Review skill files before execution to ensure they don't contain malicious code
+### Reporting Security Issues
+If you discover a security vulnerability:
+1. **Do Not Open a Public Issue:** Send a private email to the maintainers
+2. **Provide Details:** Include steps to reproduce and potential impact
+3. **Allow Time for Fix:** Give maintainers time to address the issue before disclosure
+4. **Follow Responsible Disclosure:** Follow responsible disclosure practices
+---
+## Troubleshooting
+### Common Issues and Solutions
+#### Configuration Issues
+**Problem:** "Config load failed, using mock provider"
+- **Solution:** Ensure your `skill-bench.json` file is properly formatted JSON and contains required fields
+- **Check:** Verify the file exists in your project root or home directory
+**Problem:** "API Key not set for [Provider]"
+- **Solution:** Set the appropriate environment variable (e.g., `SKILL_BENCH_OPENAI_API_KEY`) or add it to your `skill-bench.json`
+- **Check:** Run `env | grep SKILL_BENCH` to verify environment variables are set
+**Problem:** "No allowed commands configured"
+- **Solution:** Add `allowed_commands` array to your `skill-bench.json` with the commands you want to allow
+- **Check:** Ensure commands are in the allowlist and not in the dangerous commands list
+#### Execution Issues
+**Problem:** "Command execution timed out"
+- **Solution:** Increase `max_execution_time` in your `skill-bench.json` or simplify the task
+- **Check:** Verify the command isn't hanging or waiting for input
+**Problem:** "Docker container failed to start"
+- **Solution:** Ensure Docker is running and you have permissions to run Docker commands
+- **Check:** Run `docker info` to verify Docker daemon is accessible
+**Problem:** "Context hydration failed"
+- **Solution:** Verify the source path exists and is a directory
+- **Check:** Ensure the path is within the base directory and file sizes are under limits
+#### Network Issues
+**Problem:** "Network Error: Connection refused"
+- **Solution:** Check your internet connection and API provider status
+- **Check:** Verify the base URL in your configuration is correct
+**Problem:** "API Request failed: 429"
+- **Solution:** This is a rate limit error. The system will retry automatically
+- **Check:** Reduce request frequency or check your API quota
+#### Test Failures
+**Problem:** Tests fail with "WebMock::NetConnectNotAllowedError"
+- **Solution:** This occurs when tests try to make real HTTP requests. Ensure test stubs are properly configured
+- **Check:** Verify WebMock is properly stubbing the expected URLs
+**Problem:** "E2E sibling repositories not present"
+- **Solution:** This is expected if you don't have the agent-mcp-runtime repository cloned
+- **Check:** These tests will be skipped and won't affect the overall test results
+### Debug Mode
+For detailed debugging, you can enable verbose logging:
+```bash
+# Set environment variable for verbose logging
+export SKILL_BENCH_DEBUG=true
+skill-bench run my-eval --skill=my-skill
+```
+### Getting Help
+If you encounter issues not covered here:
+1. Check the [GitHub Issues](https://github.com/igmarin/ruby-skill-bench/issues) for similar problems
+2. Create a new issue with detailed information about your environment and the problem
+3. Include Ruby version, SkillBench version, and error messages
+4. Provide steps to reproduce the issue
+---
 ## CI/CD Integration
 GitHub Actions workflow included (`.github/workflows/ci.yml`):

data/lib/skill_bench/agent/react_agent.rb CHANGED

@@ -1,5 +1,6 @@
 # frozen_string_literal: true
+require_relative '../constants'
 require_relative 'react_agent/step'
 require_relative 'react_agent/loop_runner'
@@ -29,7 +30,7 @@ module SkillBench
       def initialize(params)
         @system_prompt = params[:system_prompt]
         @initial_prompt = params[:initial_prompt]
-        @max_iterations = params[:max_iterations] || 25
+        @max_iterations = params[:max_iterations] || Constants::ReactAgent::DEFAULT_MAX_ITERATIONS
         @working_dir = params[:working_dir] || Dir.pwd
         @container_id = params[:container_id]
         @client_params = params[:client_params] || {}

data/lib/skill_bench/clients/all.rb CHANGED

@@ -2,6 +2,7 @@
 require_relative 'response_parser'
 require_relative 'response_error_handler'
+require_relative 'response_builder'
 require_relative 'request_builder'
 require_relative 'retry_handler'
 require_relative 'base_client'

data/lib/skill_bench/clients/base_client.rb CHANGED

@@ -4,6 +4,7 @@ require_relative '../config'
 require_relative 'provider_config'
 require_relative 'response_parser'
 require_relative 'response_error_handler'
+require_relative 'response_builder'
 require_relative 'request_builder'
 require_relative 'retry_handler'
@@ -135,7 +136,7 @@ module SkillBench
                   else
                     "#{missing.first} not set for #{@provider_display_name}"
                   end
-        { success: false, response: { error: { message: message } }, result: message, status: 'error' }
+        ResponseBuilder.error(message: message)
       end
       # Extracts the message hash from the provider's specific response body structure.
@@ -182,10 +183,6 @@ module SkillBench
         message = extract_message(parsed)
         return missing_message_response(response, parsed) unless ResponseParser.valid_message?(message)
-        success_response(parsed, message)
-      end
-      def success_response(parsed, message)
         content = ResponseParser.extract_content(message)
         {
           success: true,

data/lib/skill_bench/clients/request_builder.rb CHANGED

@@ -1,22 +1,20 @@
 # frozen_string_literal: true
 require 'faraday'
+require_relative '../constants'
 module SkillBench
   module Clients
     # Builds and executes HTTP requests to LLM provider APIs.
     # Encapsulates Faraday connection setup and request execution.
     class RequestBuilder
-      DEFAULT_OPEN_TIMEOUT = 10
-      DEFAULT_TIMEOUT = 120
       # Creates a Faraday connection with JSON middleware.
       #
       # @param base_url [String] The API base URL
       # @param open_timeout [Integer] Connection open timeout in seconds
       # @param timeout [Integer] Request timeout in seconds
       # @return [Faraday::Connection] Configured Faraday connection
-      def self.build_connection(base_url, open_timeout: DEFAULT_OPEN_TIMEOUT, timeout: DEFAULT_TIMEOUT)
+      def self.build_connection(base_url, open_timeout: Constants::HttpClient::DEFAULT_OPEN_TIMEOUT, timeout: Constants::HttpClient::DEFAULT_TIMEOUT)
         Faraday.new(url: base_url) do |f|
           f.request :json
           f.response :json

data/lib/skill_bench/clients/response_builder.rb ADDED

@@ -0,0 +1,91 @@
+# frozen_string_literal: true
+module SkillBench
+  module Clients
+    # Service object for building standardized response hashes.
+    # Eliminates duplication of error response formatting across the codebase.
+    class ResponseBuilder
+      # Builds a standardized error response.
+      #
+      # @param message [String] The error message.
+      # @param status [String] The status identifier (default: 'error').
+      # @return [Hash] Standardized error response hash.
+      def self.error(message:, status: 'error')
+        {
+          success: false,
+          response: { error: { message: message } },
+          result: message,
+          status: status
+        }
+      end
+      # Builds a standardized success response.
+      #
+      # @param content [String] The response content.
+      # @param metadata [Hash] Additional metadata to include in response.
+      # @return [Hash] Standardized success response hash.
+      def self.success(content:, metadata: {})
+        {
+          success: true,
+          result: content,
+          response: { content: content }.merge(metadata),
+          status: 'success'
+        }
+      end
+      # Builds a standardized API error response.
+      #
+      # @param error_message [String] The API error message.
+      # @param usage [Hash] Token usage information.
+      # @return [Hash] Standardized API error response hash.
+      def self.api_error(error_message:, usage: {})
+        {
+          success: false,
+          result: "API Error: #{error_message}",
+          usage: usage,
+          response: { error: { message: "API Error: #{error_message}" } },
+          status: 'error'
+        }
+      end
+      # Builds a standardized network error response.
+      #
+      # @param error_message [String] The network error message.
+      # @return [Hash] Standardized network error response hash.
+      def self.network_error(error_message:)
+        {
+          success: false,
+          response: { error: { message: "Network Error: #{error_message}" } },
+          result: "Network Error: #{error_message}",
+          status: 'error'
+        }
+      end
+      # Builds a standardized parsing error response.
+      #
+      # @param error_message [String] The parsing error message.
+      # @return [Hash] Standardized parsing error response hash.
+      def self.parsing_error(error_message:)
+        {
+          success: false,
+          response: { error: { message: "Parsing Error: #{error_message}" } },
+          result: "Parsing Error: #{error_message}",
+          status: 'error'
+        }
+      end
+      # Builds a standardized unexpected error response.
+      #
+      # @param error_message [String] The unexpected error message.
+      # @return [Hash] Standardized unexpected error response hash.
+      def self.unexpected_error(error_message:)
+        {
+          success: false,
+          response: { error: { message: "Unexpected Error: #{error_message}" } },
+          result: "Unexpected Error: #{error_message}",
+          status: 'error'
+        }
+      end
+    end
+  end
+end
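
The builder methods return plain hashes, so callers can extend a result with `Hash#merge`, which is how `response_error_handler.rb` attaches `usage` and `code` in this release. A minimal standalone sketch follows; `ResponseBuilderSketch` is a trimmed copy of `ResponseBuilder.error`, and the `usage` and `http_status` values are made up for illustration:

```ruby
# Trimmed copy of ResponseBuilder.error so the sketch runs without the gem.
module ResponseBuilderSketch
  def self.error(message:, status: 'error')
    {
      success: false,
      response: { error: { message: message } },
      result: message,
      status: status
    }
  end
end

# Hypothetical stand-ins for usage_extractor.call(parsed) and response.status.
usage = { total_tokens: 12 }
http_status = 502

# merge returns a new hash; the base hash from the builder is left unchanged.
response = ResponseBuilderSketch
  .error(message: 'LLM response missing message content')
  .merge(usage: usage, code: http_status)

puts response[:result]
puts response[:code]
```

Because `merge` lets later keys win, a caller could also override `status` this way, so the builder's defaults are conventions rather than guarantees.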

data/lib/skill_bench/clients/response_error_handler.rb CHANGED

@@ -23,14 +23,8 @@ module SkillBench
           error_msg += " - #{detail}"
         end
-        {
-          success: false,
-          result: error_msg,
-          usage: usage_extractor.call(parsed),
-          response: { error: { message: error_msg } },
-          status: 'error',
-          code: response.status
-        }
+        base_response = ResponseBuilder.api_error(error_message: error_msg, usage: usage_extractor.call(parsed))
+        base_response.merge(code: response.status)
       end
       # Creates an error response when the LLM response has no message content.
@@ -41,14 +35,8 @@ module SkillBench
       # @return [Hash] Standardized error response
       def self.missing_message_response(response, parsed, &usage_extractor)
         error_msg = 'LLM response missing message content'
-        {
-          success: false,
-          result: error_msg,
-          usage: usage_extractor.call(parsed),
-          response: { error: { message: error_msg } },
-          status: 'error',
-          code: response.status
-        }
+        base_response = ResponseBuilder.error(message: error_msg)
+        base_response.merge(usage: usage_extractor.call(parsed), code: response.status)
       end
       # Handles an exception by logging and returning a standardized error response.
@@ -58,7 +46,7 @@ module SkillBench
       # @return [Hash] Standardized error response
       def self.handle_exception(error, type)
         log_error(error)
-        { success: false, result: "#{type}: #{error.message}", status: 'error' }
+        ResponseBuilder.error(message: "#{type}: #{error.message}")
       end
       # Logs an error message and backtrace to Rails.logger or stderr.

data/lib/skill_bench/clients/retry_handler.rb CHANGED

@@ -2,6 +2,7 @@
 require 'faraday'
 require_relative '../error_logger'
+require_relative '../constants'
 module SkillBench
   module Clients
@@ -9,10 +10,6 @@ module SkillBench
     # Retries on transient errors (429, 503). Raises permanent errors immediately.
     # Returns the block result on success.
     class RetryHandler
-      RETRYABLE_STATUSES = [429, 503].freeze
-      MAX_DELAY = 30 # Maximum delay cap in seconds
       # Executes the given block with retry logic.
       #
       # @param max_attempts [Integer] Maximum number of attempts (default: 3).
@@ -21,7 +18,7 @@ module SkillBench
       # @return [Object] The block's return value on success.
       # @raise [Faraday::Error] On non-retryable errors or after exhausting retries.
       # @raise [ArgumentError] if no block is given or max_attempts < 1.
-      def self.call(max_attempts: 3, base_delay: 1, &block)
+      def self.call(max_attempts: Constants::HttpClient::DEFAULT_MAX_RETRIES, base_delay: Constants::HttpClient::DEFAULT_RETRY_DELAY, &block)
         raise ArgumentError, 'RetryHandler requires a block' unless block
         raise ArgumentError, 'max_attempts must be >= 1' if max_attempts < 1
@@ -59,11 +56,11 @@ module SkillBench
       private
       def retryable?(status, attempt)
-        RETRYABLE_STATUSES.include?(status) && attempt < @max_attempts
+        Constants::HttpClient::RETRYABLE_STATUSES.include?(status) && attempt < @max_attempts
       end
       def compute_delay(attempt)
-        [@base_delay * (2**(attempt - 1)), MAX_DELAY].min
+        [@base_delay * (2**(attempt - 1)), Constants::ReactAgent::DEFAULT_MAX_DELAY].min
       end
       def extract_status(error)
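
The delay formula in `compute_delay` is exponential backoff with a cap. The sketch below reproduces only that arithmetic as a free function; `backoff_delay` is a hypothetical name, and the two constants copy the values of `Constants::HttpClient::DEFAULT_RETRY_DELAY` and `Constants::ReactAgent::DEFAULT_MAX_DELAY` from this release:

```ruby
BASE_DELAY = 1  # seconds, mirrors DEFAULT_RETRY_DELAY
MAX_DELAY = 30  # seconds, mirrors DEFAULT_MAX_DELAY

# Doubles the delay on each attempt (1-indexed) and clamps it to max_delay.
def backoff_delay(attempt, base_delay: BASE_DELAY, max_delay: MAX_DELAY)
  [base_delay * (2**(attempt - 1)), max_delay].min
end

delays = (1..7).map { |attempt| backoff_delay(attempt) }
puts delays.inspect # => [1, 2, 4, 8, 16, 30, 30]
```

With a base delay of 1 second the cap first applies on the sixth attempt, where the uncapped value would be 32.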

data/lib/skill_bench/constants.rb ADDED

@@ -0,0 +1,58 @@
+# frozen_string_literal: true
+module SkillBench
+  # Centralized configuration constants for the SkillBench system.
+  # This eliminates magic numbers and provides a single source of truth
+  # for configurable values across the codebase.
+  module Constants
+    # ReAct Agent Configuration
+    module ReactAgent
+      DEFAULT_MAX_ITERATIONS = 25
+      DEFAULT_MAX_DELAY = 30 # Maximum delay cap in seconds for retry logic
+    end
+    # HTTP Client Configuration
+    module HttpClient
+      DEFAULT_OPEN_TIMEOUT = 10
+      DEFAULT_TIMEOUT = 120
+      DEFAULT_MAX_RETRIES = 3
+      DEFAULT_RETRY_DELAY = 1
+      RETRYABLE_STATUSES = [429, 503].freeze
+    end
+    # Context Hydration Configuration
+    module ContextHydration
+      MAX_FILE_SIZE = 50_000 # Maximum file size in bytes
+      MAX_TOTAL_CONTEXT_SIZE = 1_000_000 # Maximum total context size in bytes (1MB)
+      TEXT_EXTENSIONS = %w[.md .rb .json .yml .yaml .txt].freeze
+    end
+    # Sandbox Configuration
+    module Sandbox
+      DOCKER_IMAGE_NAME = 'evaluator-sandbox'
+    end
+    # Tool Execution Configuration
+    module Tools
+      DANGEROUS_COMMANDS = %w[
+        bash sh zsh fish dash ksh csh tcsh
+        python python3 python2 ruby perl node
+        php lua tcl wish
+        curl wget nc ncat socat
+        eval exec
+        sudo su doas
+        chmod chown mount umount
+        dd mkfs fdisk parted
+        insmod rmmod modprobe
+        systemctl service
+        passwd useradd userdel groupadd groupdel
+      ].freeze
+    end
+    # File Path Configuration
+    module FilePath
+      ALLOWED_PATH_PATTERN = %r{\A[a-zA-Z0-9._\-/]+\z}
+      MAX_PATH_LENGTH = 4096
+    end
+  end
+end
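
The diff does not show where `FilePath::ALLOWED_PATH_PATTERN` and `MAX_PATH_LENGTH` are consumed. As a rough sketch of what a check built on them might look like, the hypothetical `acceptable_path_string?` below copies both values and applies them to a few inputs:

```ruby
ALLOWED_PATH_PATTERN = %r{\A[a-zA-Z0-9._\-/]+\z}
MAX_PATH_LENGTH = 4096

# Hypothetical helper: checks length and character set only.
def acceptable_path_string?(path)
  path.length <= MAX_PATH_LENGTH && ALLOWED_PATH_PATTERN.match?(path)
end

puts acceptable_path_string?('lib/skill_bench/constants.rb') # => true
puts acceptable_path_string?('notes with spaces.txt')        # => false
puts acceptable_path_string?("ok\nbad")                      # => false
puts acceptable_path_string?('../etc/passwd')                # => true
```

The last case is the important one: dots and slashes are allowed characters, so the pattern alone does not reject `..` segments. A character check of this kind has to be combined with a separate boundary check on the resolved path.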

data/lib/skill_bench/execution/context_hydrator.rb CHANGED

@@ -2,6 +2,7 @@
 require 'pathname'
 require 'cgi'
+require_relative '../constants'
 module SkillBench
   module Execution
@@ -10,10 +11,6 @@ module SkillBench
     class ContextHydrator
       # Error message returned when context hydration fails.
       HYDRATION_FAILED = 'Failed to hydrate context from source path'
-      # File extensions considered for context hydration.
-      TEXT_EXTENSIONS = %w[.md .rb .json .yml .yaml .txt].freeze
-      # Maximum file size (in bytes) for files included in context hydration.
-      MAX_FILE_SIZE = 50_000
       # Loads and formats source context files.
       #
@@ -50,6 +47,8 @@ module SkillBench
         return missing_path_result unless full_path.exist? && full_path.directory?
         context_files = collect_context_files(full_path)
+        return missing_path_result unless validate_total_size?(context_files)
         xml_context = build_xml(context_files)
         { success: true, response: { context: xml_context } }
@@ -65,12 +64,23 @@ module SkillBench
       end
       def collect_context_files(full_path)
-        pattern = full_path.join("*{#{TEXT_EXTENSIONS.join(',')}}").to_s
+        pattern = full_path.join("*{#{Constants::ContextHydration::TEXT_EXTENSIONS.join(',')}}").to_s
         Dir.glob(pattern).reject { |f| File.symlink?(f) }
-                         .select { |f| File.size(f) <= MAX_FILE_SIZE }
+                         .select { |f| File.size(f) <= Constants::ContextHydration::MAX_FILE_SIZE }
                          .sort
       end
+      def validate_total_size?(context_files)
+        total_size = context_files.sum { |f| File.size(f) }
+        return true if total_size <= Constants::ContextHydration::MAX_TOTAL_CONTEXT_SIZE
+        SkillBench::ErrorLogger.log_error(
+          StandardError.new("Total context size #{total_size} exceeds maximum #{Constants::ContextHydration::MAX_TOTAL_CONTEXT_SIZE}"),
+          'ContextHydrator'
+        )
+        false
+      end
       # Builds the XML structure wrapping the contents of the context files.
       #
       # @param context_files [Array<String>] List of absolute paths to context files.
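
The hydrator now applies two limits in sequence: a per-file cap while collecting files, then a cap on the combined size of whatever survived. The sketch below imitates that order on a temporary directory. The helper names are hypothetical, only `.txt` files are globbed for brevity, and the two constants copy the values from `Constants::ContextHydration`:

```ruby
require 'tmpdir'

MAX_FILE_SIZE = 50_000
MAX_TOTAL_CONTEXT_SIZE = 1_000_000

# Step 1: drop symlinks and any file above the per-file cap.
def eligible_files(dir)
  Dir.glob(File.join(dir, '*.txt'))
     .reject { |f| File.symlink?(f) }
     .select { |f| File.size(f) <= MAX_FILE_SIZE }
     .sort
end

# Step 2: check the combined size of the remaining files.
def within_total_limit?(files, limit: MAX_TOTAL_CONTEXT_SIZE)
  files.sum { |f| File.size(f) } <= limit
end

result = Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'small.txt'), 'a' * 100)
  File.write(File.join(dir, 'big.txt'), 'b' * 60_000) # over the per-file cap
  files = eligible_files(dir)
  {
    names: files.map { |f| File.basename(f) },
    ok: within_total_limit?(files),
    tight: within_total_limit?(files, limit: 50)
  }
end

puts result.inspect
```

Since oversized files are filtered out first, the total cap with these values can only trip when at least 21 eligible files are present, because 20 files of 50,000 bytes add up to exactly 1,000,000 bytes.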

data/lib/skill_bench/execution/sandbox.rb CHANGED

@@ -3,6 +3,7 @@
 require 'fileutils'
 require 'tmpdir'
 require 'open3'
+require_relative '../constants'
 module SkillBench
   module Execution
@@ -143,18 +144,32 @@ module SkillBench
       # Starts a Docker container for isolated command execution.
       # Builds the image only if it does not already exist.
+      # Uses hardened security settings for production safety.
       #
       # @raise [RuntimeError] when the Docker image cannot be built or the container fails to start.
       def start_container
-        image_name = 'evaluator-sandbox'
+        image_name = Constants::Sandbox::DOCKER_IMAGE_NAME
         docker_dir = File.expand_path('docker', __dir__)
         # Build image (Docker layer cache handles no-op builds)
         raise "Failed to build Docker image #{image_name}" unless system('docker', 'build', '-t', image_name, docker_dir, '--quiet')
-        # Start a detached container mounting the sandbox dir to /sandbox
+        # Start a detached container with hardened security settings
+        # --user $(id -u):$(id -g): Runs as non-root user
+        # --security-opt no-new-privileges: Prevents privilege escalation
+        # --cap-drop ALL: Drops all Linux capabilities
+        # --cap-add CHOWN, DAC_OVERRIDE: Adds back minimal capabilities for git operations
+        # --network none: Disables network access for additional isolation
         stdout, stderr, status = Open3.capture3(
-          'docker', 'run', '-d', '--rm', '-v', "#{@path}:/sandbox", image_name
+          'docker', 'run', '-d', '--rm',
+          '--user', "#{Process.uid}:#{Process.gid}",
+          '--security-opt', 'no-new-privileges',
+          '--cap-drop', 'ALL',
+          '--cap-add', 'CHOWN',
+          '--cap-add', 'DAC_OVERRIDE',
+          '--network', 'none',
+          '-v', "#{@path}:/sandbox:rw",
+          image_name
         )
         raise "Failed to start Docker container: #{stderr}" unless status.success?
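
To make the new flag set easier to inspect, the sketch below assembles the same argument list without invoking Docker. `docker_run_argv` is a hypothetical helper, and the path and IDs in the usage line are placeholders:

```ruby
# Hypothetical helper: builds the argv for `docker run` but does not execute it.
def docker_run_argv(sandbox_path, image_name, uid: Process.uid, gid: Process.gid)
  [
    'docker', 'run', '-d', '--rm',
    '--user', "#{uid}:#{gid}",              # same UID/GID as the calling process
    '--security-opt', 'no-new-privileges',  # block privilege escalation
    '--cap-drop', 'ALL',                    # drop every capability...
    '--cap-add', 'CHOWN',                   # ...then add back two
    '--cap-add', 'DAC_OVERRIDE',
    '--network', 'none',                    # no network inside the container
    '-v', "#{sandbox_path}:/sandbox:rw",
    image_name
  ]
end

argv = docker_run_argv('/tmp/sandbox-example', 'evaluator-sandbox', uid: 1000, gid: 1000)
puts argv.join(' ')
```

Two things are worth noting. Passing the arguments as separate strings to `Open3.capture3` means no shell parses them, so the sandbox path is not subject to shell interpretation. Also, `--user` takes the host process's IDs, so the container is non-root only when the host process is, and the argv shown in this hunk does not include a `--read-only` flag.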

data/lib/skill_bench/tools/run_command.rb CHANGED

@@ -4,27 +4,12 @@ require 'open3'
 require 'timeout'
 require 'shellwords'
 require_relative '../config'
+require_relative '../constants'
 module SkillBench
   module Tools
     # Handles executing a shell command within the working directory.
     class RunCommand
-      # Commands that are always blocked even if listed in allowed_commands,
-      # because they can be used to escape the sandbox or execute arbitrary code.
-      DANGEROUS_COMMANDS = %w[
-        bash sh zsh fish dash ksh csh tcsh
-        python python3 python2 ruby perl node
-        php lua tcl wish
-        curl wget nc ncat socat
-        eval exec
-        sudo su doas
-        chmod chown mount umount
-        dd mkfs fdisk parted
-        insmod rmmod modprobe
-        systemctl service
-        passwd useradd userdel groupadd groupdel
-      ].freeze
       # @return [Hash] The tool definition for the LLM API.
       def self.definition
         {
@@ -59,7 +44,7 @@ module SkillBench
         return 'Error: Empty command.' if argv.empty?
         base_cmd = argv.first
-        return "Error: Command '#{base_cmd}' is blocked for security reasons." if DANGEROUS_COMMANDS.include?(base_cmd)
+        return "Error: Command '#{base_cmd}' is blocked for security reasons." if Constants::Tools::DANGEROUS_COMMANDS.include?(base_cmd)
         allowed = SkillBench::Config.allowed_commands
         return 'Error: No allowed commands configured. Set allowed_commands in skill-bench.json or use --mode mock.' if allowed.nil?
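
The order of checks visible in this hunk is: empty command, then blocklist, then allowlist. The sketch below walks through that order. It is not the gem's implementation: `check_command` is a hypothetical name, the use of `Shellwords.split` to produce `argv` is an assumption (the hunk shows only `require 'shellwords'`), the blocklist is abbreviated, and the wording of the final allowlist error is invented:

```ruby
require 'shellwords'

# Abbreviated; the full list lives in Constants::Tools::DANGEROUS_COMMANDS.
DANGEROUS = %w[bash sh curl wget sudo].freeze

# Returns :ok or an error string, checking the blocklist before the allowlist.
def check_command(command, allowed)
  argv = Shellwords.split(command) # assumed tokenizer
  return 'Error: Empty command.' if argv.empty?

  base_cmd = argv.first
  return "Error: Command '#{base_cmd}' is blocked for security reasons." if DANGEROUS.include?(base_cmd)
  return 'Error: No allowed commands configured.' if allowed.nil?
  return "Error: Command '#{base_cmd}' is not allowed." unless allowed.include?(base_cmd) # invented message

  :ok
end

puts check_command('ls -la', %w[ls cat])
puts check_command('bash -c "ls"', %w[ls bash])
puts check_command('/bin/bash -c "ls"', %w[ls])
```

Checking the blocklist first means a dangerous command stays blocked even if it was added to `allowed_commands`. In this sketch the comparison is an exact match on the first token, so `/bin/bash` is not caught by the blocklist and is rejected only because it is absent from the allowlist.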

data/lib/skill_bench/version.rb CHANGED

@@ -2,5 +2,5 @@
 module SkillBench
   # The current gem version.
-  VERSION = '1.0.1'
+  VERSION = '1.1.0'
 end

data/lib/skill_bench.rb CHANGED

@@ -8,6 +8,7 @@
 # Core modules
 require_relative 'skill_bench/version'
+require_relative 'skill_bench/constants'
 require_relative 'skill_bench/dimension'
 require_relative 'skill_bench/criteria'
 require_relative 'skill_bench/delta_report'

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ruby-skill-bench
 version: !ruby/object:Gem::Version
-  version: 1.0.1
+  version: 1.1.0
 platform: ruby
 authors:
 - Ismael Marin
@@ -147,6 +147,7 @@ files:
 - lib/skill_bench/clients/providers/opencode.rb
 - lib/skill_bench/clients/providers/openrouter.rb
 - lib/skill_bench/clients/request_builder.rb
+- lib/skill_bench/clients/response_builder.rb
 - lib/skill_bench/clients/response_error_handler.rb
 - lib/skill_bench/clients/response_parser.rb
 - lib/skill_bench/clients/retry_handler.rb
@@ -162,6 +163,7 @@ files:
 - lib/skill_bench/config/facade_writers.rb
 - lib/skill_bench/config/json_loader.rb
 - lib/skill_bench/config/store.rb
+- lib/skill_bench/constants.rb
 - lib/skill_bench/criteria.rb
 - lib/skill_bench/delta_report.rb
 - lib/skill_bench/dimension.rb