ruby-skill-bench 0.1.0 → 1.0.1

Files changed (40)
  1. checksums.yaml +4 -4
  2. data/README.md +86 -0
  3. data/lib/skill_bench/cli/compare_command.rb +91 -0
  4. data/lib/skill_bench/cli/help_printer.rb +9 -1
  5. data/lib/skill_bench/cli/run_command.rb +6 -4
  6. data/lib/skill_bench/cli.rb +7 -4
  7. data/lib/skill_bench/clients/all.rb +1 -0
  8. data/lib/skill_bench/clients/providers/mock.rb +56 -0
  9. data/lib/skill_bench/commands/run.rb +6 -2
  10. data/lib/skill_bench/config/applier.rb +1 -0
  11. data/lib/skill_bench/config/defaults.rb +1 -0
  12. data/lib/skill_bench/config/facade_readers.rb +7 -0
  13. data/lib/skill_bench/config/json_loader.rb +3 -3
  14. data/lib/skill_bench/config/store.rb +5 -0
  15. data/lib/skill_bench/config.rb +10 -1
  16. data/lib/skill_bench/delta_report.rb +20 -0
  17. data/lib/skill_bench/execution/source_path_resolver.rb +59 -3
  18. data/lib/skill_bench/registry/pack_resolver.rb +119 -0
  19. data/lib/skill_bench/services/agent_spawner_service.rb +114 -0
  20. data/lib/skill_bench/services/compare_option_parser.rb +55 -0
  21. data/lib/skill_bench/services/comparison_reporter.rb +97 -0
  22. data/lib/skill_bench/services/comparison_runner.rb +49 -0
  23. data/lib/skill_bench/services/context_loader_service.rb +42 -0
  24. data/lib/skill_bench/services/error_response_builder.rb +119 -0
  25. data/lib/skill_bench/services/eval_resolver.rb +33 -0
  26. data/lib/skill_bench/services/exit_code_calculator.rb +39 -0
  27. data/lib/skill_bench/services/judge_params_builder.rb +54 -0
  28. data/lib/skill_bench/services/manifest_finder.rb +36 -0
  29. data/lib/skill_bench/services/output_formatter.rb +28 -0
  30. data/lib/skill_bench/services/prompt_builder_service.rb +98 -0
  31. data/lib/skill_bench/services/provider_resolver.rb +73 -0
  32. data/lib/skill_bench/services/runner_service.rb +84 -315
  33. data/lib/skill_bench/services/skill_resolver.rb +37 -9
  34. data/lib/skill_bench/services/skill_resolver_service.rb +70 -0
  35. data/lib/skill_bench/services/source_path_resolver_service.rb +45 -0
  36. data/lib/skill_bench/services/trend_recorder_service.rb +67 -0
  37. data/lib/skill_bench/services/variant_parser.rb +32 -0
  38. data/lib/skill_bench/services/variant_resolver.rb +63 -0
  39. data/lib/skill_bench/version.rb +1 -1
  40. metadata +23 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 58713d379ef4db5ce99a695309440159115257fee5b995ed6c6b8f1cbdca13b7
- data.tar.gz: 52cab6f8913582728c66fa1d34d3109958c5c5c33bbf9ffd5969f5b5cb13908e
+ metadata.gz: d3c4edfe40e04251d2e7b758e7c630ee9affaa9e8170ceb0fa379d61bacc81e6
+ data.tar.gz: e9ef2eb8ef7a524d607c6e44705df772feec8939a376b516adff032eeeb8b535
  SHA512:
- metadata.gz: 74cc26703d9cb9da5362ed0450987e02af86847025c7c687d6519b191f921a300bce66a166a5596a956bedb7c6b52c45cc29a003bf3f93df8e491565608ca9af
- data.tar.gz: 1dce2943558b3c0672be0950905e140892546928f0ddb89ddcd5b6457d99e9b562915f77239e3ab71d290737ed1705977f5f4abfc4f5eed505ec9783fa28d46f
+ metadata.gz: b92554c769e34205d1c197bd67a9ca2ae61876b83c5429e202c667831100470fa9f1ed48a297ea184855e33e7ac3945fb513909b2344634078b8090750325dc9
+ data.tar.gz: 7ae92f1331f2061cccf42a1f27f80cbe41c73d54d0909499900efa84ad3984edada8e7df10b5a018717861234974918cdfac80b5242483df108272093eec8deb
data/README.md CHANGED
@@ -7,6 +7,21 @@
 
  *A high-fidelity evaluation engine for benchmarking AI agent skills across any stack (Rails-first, but extensible).*
 
+ ## Part of the AI Skill Ecosystem
+
+ This repo is one of 6 in a composable AI skill ecosystem:
+
+ | Repo | Role |
+ |------|------|
+ | [`ruby-core-skills`](https://github.com/igmarin/ruby-core-skills) | 15 shared Ruby skills + process discipline |
+ | [`rails-agent-skills`](https://github.com/igmarin/rails-agent-skills) | 28 Rails-specific skills + 9 agents |
+ | [`hanakai-yaku`](https://github.com/igmarin/hanakai-yaku) | 35 Hanami/dry-rb skills + 10 agents |
+ | [`agnostic-planning-skills`](https://github.com/igmarin/agnostic-planning-skills) | 10 planning skills + 4 agents |
+ | [`agent-mcp-runtime`](https://github.com/igmarin/agent-mcp-runtime) | Rust CLI runtime (pack resolution, MCP) |
+ | [**`ruby-skill-bench`**](https://github.com/igmarin/ruby-skill-bench) | Benchmark/eval engine |
+
+ See the [Ecosystem Overview](https://github.com/igmarin/agent-mcp-runtime/blob/main/docs/ecosystem.md) for the full architecture.
+
  ---
 
  ## Features
@@ -343,6 +358,77 @@ Both skill contexts are concatenated and sent to the agent. The judge evaluates
 
  ---
 
+ ## Multi-Repo Skill Benchmarking
+
+ Skills in the ecosystem are split across multiple repos:
+ - `ruby-core-skills` — 15 shared Ruby skills (DDD, patterns, process discipline)
+ - `rails-agent-skills` — 28 Rails-specific skills
+ - `hanakai-yaku` — 35 Hanami/dry-rb skills
+
+ To benchmark a skill from an external repo, use the `--skill` flag:
+
+ ```bash
+ # Benchmark a core skill
+ skill-bench run evals/skills/write-yard-docs/basic \
+ --skill /path/to/ruby-core-skills/skills/patterns/write-yard-docs
+
+ # Benchmark a Rails skill
+ skill-bench run evals/skills/code-review/pr-review \
+ --skill /path/to/rails-agent-skills/skills/code-quality/code-review
+ ```
+
+ ### Config-Based Multi-Repo Resolution
+
+ Configure `skill_sources` in `skill-bench.json` to automatically resolve skills across repos without `--skill` every time:
+
+ ```json
+ {
+ "provider": "openai",
+ "model": "gpt-4o",
+ "skill_sources": {
+ "core": "../ruby-core-skills/skills",
+ "rails": "../rails-agent-skills/skills",
+ "hanami": "../hanakai-yaku/skills"
+ }
+ }
+ ```
+
+ Each key is a source name (for logging), each value is a path to a `skills/` directory. When a skill is not found locally, SkillBench iterates through `skill_sources` and uses the first match.
+
+ ### Pack-Based Resolution (`--pack`)
+
+ Resolve skills via the ecosystem registry manifest (from `agent-mcp-runtime`):
+
+ ```bash
+ # Run an eval using the Rails pack's version of code-review
+ skill-bench run evals/skills/code-review/basic \
+ --skill code-review \
+ --pack rails
+
+ # Override the default registry manifest path
+ skill-bench run evals/skills/code-review/basic \
+ --skill code-review \
+ --pack rails \
+ --registry-manifest /path/to/registry.json
+ ```
+
+ ### Variant Comparison (`compare`)
+
+ Compare the same skill across two pack variants to measure context-dependent performance:
+
+ ```bash
+ skill-bench compare code-review \
+ --variant-a "pack:rails" \
+ --variant-b "pack:hanami" \
+ --eval evals/skills/code-review/basic
+ ```
+
+ The `--variant` spec supports two forms:
+ - `pack:<name>` — resolve via registry manifest
+ - `/absolute/path` or `relative/path` — use a direct path
+
+ ---
+
  ## File Reference: What Lives on Disk
 
  SkillBench creates and manages three files in your project. Understanding them helps you iterate faster.
data/lib/skill_bench/cli/compare_command.rb ADDED
@@ -0,0 +1,91 @@
+ # frozen_string_literal: true
+
+ require_relative '../services/compare_option_parser'
+ require_relative '../services/variant_parser'
+ require_relative '../services/comparison_runner'
+ require_relative '../services/comparison_reporter'
+ require_relative '../services/exit_code_calculator'
+
+ module SkillBench
+ module Cli
+ # Handles the `skill-bench compare` command.
+ # Runs the same eval with two skill variants and reports the comparison.
+ class CompareCommand
+ # Parses argv and executes the comparison.
+ #
+ # @param argv [Array<String>] Raw CLI arguments
+ # @return [Integer] Exit code
+ def self.call(argv)
+ new(argv).call
+ end
+
+ # @param argv [Array<String>] Raw CLI arguments
+ def initialize(argv)
+ @argv = argv
+ end
+
+ # Parses options, runs both variants, and prints a comparison report.
+ #
+ # @return [Integer] Exit code (0 if both pass, 1 otherwise)
+ def call
+ options = Services::CompareOptionParser.call(@argv)
+
+ skill_name = @argv.shift
+ return error_missing_skill unless skill_name
+ return error_missing_variant_a unless options[:variant_a]
+ return error_missing_variant_b unless options[:variant_b]
+ return error_missing_eval unless options[:eval]
+
+ variant_a = Services::VariantParser.call(options[:variant_a])
+ variant_b = Services::VariantParser.call(options[:variant_b])
+
+ puts "--- Running Variant A: #{options[:variant_a]} ---"
+ puts "--- Running Variant B: #{options[:variant_b]} ---"
+
+ results = Services::ComparisonRunner.call(
+ variant_a,
+ variant_b,
+ skill_name,
+ options[:eval]
+ )
+
+ Services::ComparisonReporter.call(
+ results[:result_a],
+ results[:result_b],
+ options[:variant_a],
+ options[:variant_b]
+ )
+
+ Services::ExitCodeCalculator.call(results[:result_a], results[:result_b])
+ rescue SkillBench::HelpRequested
+ 0
+ rescue StandardError => e
+ warn "Error: #{e.message}"
+ 1
+ end
+
+ private
+
+ def error_missing_skill
+ warn 'Error: skill name is required'
+ warn 'Usage: skill-bench compare <skill-name> --variant-a <spec> --variant-b <spec> --eval <path>'
+ 1
+ end
+
+ def error_missing_variant_a
+ warn 'Error: --variant-a is required'
+ 1
+ end
+
+ def error_missing_variant_b
+ warn 'Error: --variant-b is required'
+ 1
+ end
+
+ def error_missing_eval
+ warn 'Error: --eval is required'
+ 1
+ end
+ end
+ end
+ end
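Note on the variant specs: the `compare` command above passes each `--variant-a` / `--variant-b` string to `Services::VariantParser`, whose source does not appear in this part of the diff. As a rough illustration of the two spec forms the README describes (`pack:<name>` and a direct path), a parser might look like the sketch below. The module name `VariantSpecSketch` and the shape of the returned hash are assumptions made for this example, not the gem's actual API.

```ruby
# Hypothetical sketch, not the gem's VariantParser.
# It only illustrates the two spec forms documented in the README.
module VariantSpecSketch
  PACK_PREFIX = 'pack:'

  # Returns { type: :pack, name: ... } for "pack:<name>" specs,
  # and { type: :path, path: ... } for everything else.
  def self.parse(spec)
    raise ArgumentError, 'variant spec must not be empty' if spec.nil? || spec.strip.empty?

    if spec.start_with?(PACK_PREFIX)
      name = spec.delete_prefix(PACK_PREFIX)
      raise ArgumentError, 'pack name must not be empty' if name.empty?

      { type: :pack, name: name }
    else
      # Absolute and relative paths are treated the same way here.
      { type: :path, path: spec }
    end
  end
end

p VariantSpecSketch.parse('pack:rails')
p VariantSpecSketch.parse('../rails-agent-skills/skills/code-quality/code-review')
```

Treating anything without the `pack:` prefix as a path keeps the sketch small; a real parser might also check that the path exists.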
data/lib/skill_bench/cli/help_printer.rb CHANGED
@@ -19,11 +19,19 @@ module SkillBench
  Providers: #{providers}
  --force Overwrite existing config file
 
- run <eval> --skill <name> [--skill <name>] [--format FORMAT]
+ run <eval> --skill <name> [--skill <name>] [--format FORMAT] [--pack NAME]
  Run an evaluation
  --skill Skill to use (can be specified multiple times)
+ --pack Pack context for registry-based skill resolution
+ --registry-manifest PATH Path to registry.json manifest
  --format Output format: human, json, junit (default: human)
 
+ compare <skill-name> --variant-a SPEC --variant-b SPEC --eval PATH
+ Compare the same skill across two pack variants
+ --variant-a First variant (e.g., "pack:rails" or "/path/to/skill")
+ --variant-b Second variant (e.g., "pack:hanami")
+ --eval Path to the eval directory
+
  skill new <name> [--mode MODE] [--template TYPE]
  Create a new skill
  --mode simple, advanced, or rails (default: simple)
data/lib/skill_bench/cli/run_command.rb CHANGED
@@ -29,7 +29,7 @@ module SkillBench
 
  eval_name = @argv.shift
  return error_missing_eval unless eval_name
- return error_missing_skill if options[:skill_names].empty?
+ return error_missing_skill if options[:skill_names].empty? && !options[:pack]
 
  options[:eval_name] = eval_name
  exec_options = options.reject { |key| key == :format }
@@ -48,6 +48,8 @@ module SkillBench
  OptionParser.new do |opts|
  opts.banner = 'Usage: skill-bench run <eval> [options]'
  opts.on('--skill NAME', 'Skill to use (can be specified multiple times)') { |v| options[:skill_names] << v }
+ opts.on('--pack NAME', 'Pack context for skill resolution') { |v| options[:pack] = v }
+ opts.on('--registry-manifest PATH', 'Path to registry.json manifest') { |v| options[:registry_manifest] = v }
  opts.on('--format FORMAT', 'Output format (human, json, junit)') { |v| options[:format] = v.to_sym }
  opts.on('-h', '--help', 'Prints this help') do
  puts opts
@@ -58,13 +60,13 @@ module SkillBench
 
  def error_missing_eval
  warn 'Error: eval name is required'
- warn 'Usage: skill-bench run <eval> --skill <name>'
+ warn 'Usage: skill-bench run <eval> [--skill <name>] [--pack <name>]'
  1
  end
 
  def error_missing_skill
- warn 'Error: skill name is required'
- warn 'Usage: skill-bench run <eval> --skill <name>'
+ warn 'Error: skill name or pack is required'
+ warn 'Usage: skill-bench run <eval> --skill <name> [--pack <name>]'
  1
  end
  end
data/lib/skill_bench/cli.rb CHANGED
@@ -2,6 +2,7 @@
 
  require_relative 'cli/init_command'
  require_relative 'cli/run_command'
+ require_relative 'cli/compare_command'
  require_relative 'cli/skill_command'
  require_relative 'cli/eval_command'
  require_relative 'cli/help_printer'
@@ -18,6 +19,7 @@ module SkillBench
  # @param argv [Array<String>] Raw CLI arguments.
  # @return [Integer] Exit code.
  def self.call(argv)
+ Config.reset
  new(argv).call
  end
 
@@ -35,10 +37,11 @@ module SkillBench
 
  subcommand = @argv.shift
  case subcommand
- when 'init' then Cli::InitCommand.call(@argv)
- when 'run' then Cli::RunCommand.call(@argv)
- when 'skill' then Cli::SkillCommand.call(@argv)
- when 'eval' then Cli::EvalCommand.call(@argv)
+ when 'init' then Cli::InitCommand.call(@argv)
+ when 'run' then Cli::RunCommand.call(@argv)
+ when 'compare' then Cli::CompareCommand.call(@argv)
+ when 'skill' then Cli::SkillCommand.call(@argv)
+ when 'eval' then Cli::EvalCommand.call(@argv)
  when '-h', '--help', 'help'
  help.call
  else
data/lib/skill_bench/clients/all.rb CHANGED
@@ -17,3 +17,4 @@ require_relative 'providers/opencode'
  require_relative 'providers/groq'
  require_relative 'providers/deepseek'
  require_relative 'providers/openrouter'
+ require_relative 'providers/mock'
data/lib/skill_bench/clients/providers/mock.rb ADDED
@@ -0,0 +1,56 @@
+ # frozen_string_literal: true
+
+ require_relative '../provider_registry'
+ require 'json'
+
+ module SkillBench
+ module Clients
+ module Providers
+ # Mock LLM client for testing and local validation.
+ class Mock
+ SkillBench::Clients::ProviderRegistry.register(:mock, self)
+
+ # Mock call implementation to simulate LLM responses for test suites.
+ #
+ # @param system_prompt [String] system prompt instructions.
+ # @param messages [Array<Hash>] chat history messages.
+ # @param _options [Hash] additional keyword options.
+ # @return [Hash] mock response hash.
+ def self.call(system_prompt:, messages:, **_options)
+ _ = system_prompt
+ prompt = messages.first[:content] || messages.first['content'] || ''
+
+ # Parse dimensions from prompt
+ dimensions = {}
+ prompt.scan(/-\s+([^:]+):\s+max_score=(\d+)/).each do |name, max_score|
+ max = max_score.to_i
+ # Give baseline slightly lower score than context to simulate improvement
+ is_context = prompt.match?(/## Skill Context\s+\S+/)
+ score = is_context ? (max * 0.95).round : (max * 0.8).round
+ dimensions[name] = {
+ 'score' => score,
+ 'max_score' => max,
+ 'reasoning' => "Mock evaluation for #{name}"
+ }
+ end
+
+ dimensions['correctness'] = { 'score' => 8, 'max_score' => 10, 'reasoning' => 'Mock correctness' } if dimensions.empty?
+
+ content = {
+ 'dimensions' => dimensions,
+ 'overall_reasoning' => 'Mock evaluation overall reasoning'
+ }.to_json
+
+ {
+ success: true,
+ response: {
+ message: {
+ content: content
+ }
+ }
+ }
+ end
+ end
+ end
+ end
+ end
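Note on the mock scoring: the provider above derives its scores entirely from the prompt text. It scans for `- <name>: max_score=<n>` lines and awards 95% of the maximum when a non-empty `## Skill Context` section is present, and 80% otherwise. The standalone sketch below isolates that step so the behaviour can be checked without loading the gem. The module name `MockScoringSketch` and both sample prompts are invented for illustration; the two regular expressions and the factors are copied from the diff.

```ruby
# Sketch only: MockScoringSketch and the sample prompts are hypothetical.
# The regexes and the 0.95 / 0.8 factors come from the mock provider diff.
module MockScoringSketch
  DIMENSION_PATTERN = /-\s+([^:]+):\s+max_score=(\d+)/
  CONTEXT_PATTERN = /## Skill Context\s+\S+/

  # Maps each rubric dimension found in the prompt to its mock score.
  def self.scores(prompt)
    factor = prompt.match?(CONTEXT_PATTERN) ? 0.95 : 0.8
    prompt.scan(DIMENSION_PATTERN).to_h do |name, max_score|
      [name, (max_score.to_i * factor).round]
    end
  end
end

BASELINE_PROMPT = "Rubric:\n- correctness: max_score=20\n- style: max_score=100\n"
CONTEXT_PROMPT = "## Skill Context\nPrefer YARD tags.\n\n#{BASELINE_PROMPT}"

# Baseline run: correctness 16, style 80
p MockScoringSketch.scores(BASELINE_PROMPT)
# Context run: correctness 19, style 95
p MockScoringSketch.scores(CONTEXT_PROMPT)
```

Because the factor depends only on whether the context heading is followed by text, the mock always reports an improvement for the context run, which makes it suitable for wiring tests but not for judging skill quality.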
data/lib/skill_bench/commands/run.rb CHANGED
@@ -9,11 +9,15 @@ module SkillBench
  # Run an eval with specified skill(s)
  # @param eval_name [String] Name of eval to run (e.g., 'test-eval' or 'evals/test-eval')
  # @param skill_names [Array<String>] Names of skills to use
+ # @param pack [String, nil] Optional pack name for registry-based skill resolution
+ # @param registry_manifest [String, nil] Optional path to registry.json manifest
  # @return [Hash] Result with pass/fail and score
- def self.run(eval_name:, skill_names:)
+ def self.run(eval_name:, skill_names:, pack: nil, registry_manifest: nil)
  Services::RunnerService.call(
  eval_name: eval_name,
- skill_names: skill_names
+ skill_names: skill_names,
+ pack: pack,
+ registry_manifest: registry_manifest
  )
  end
  end
data/lib/skill_bench/config/applier.rb CHANGED
@@ -41,6 +41,7 @@ module SkillBench
  assign_current_provider
  @store.assign_max_execution_time(@data[:max_execution_time]) if @data.key?(:max_execution_time)
  @store.assign_allowed_commands(@data[:allowed_commands]) if @data.key?(:allowed_commands)
+ @store.skill_sources = @data[:skill_sources] if @data.key?(:skill_sources)
  end
 
  def apply_provider_values
data/lib/skill_bench/config/defaults.rb CHANGED
@@ -19,6 +19,7 @@ module SkillBench
  current_llm_provider: :openai,
  max_execution_time: 30,
  allowed_commands: nil,
+ skill_sources: {},
  llm_providers_config: {
  openai: { api_key: nil, model: 'gpt-4o' },
  anthropic: { api_key: nil, model: 'claude-sonnet-4-20250514' },
data/lib/skill_bench/config/facade_readers.rb CHANGED
@@ -32,6 +32,13 @@ module SkillBench
  store.llm_providers_config
  end
 
+ # Returns skill sources mapping.
+ #
+ # @return [Hash, nil] skill source name → directory path
+ def skill_sources
+ store.skill_sources
+ end
+
  # Returns the API key for the current LLM provider.
  #
  # @return [String, nil] API key for the current provider
data/lib/skill_bench/config/json_loader.rb CHANGED
@@ -29,9 +29,9 @@ module SkillBench
  data = JSON.parse(File.read(@path), symbolize_names: true)
  return warn_invalid_config unless data.is_a?(Hash)
 
- success(data.slice(:current_llm_provider, :max_execution_time, :allowed_commands)
- .compact
- .merge(providers: normalized_providers(data[:providers])))
+ success_data = data.slice(:current_llm_provider, :max_execution_time, :allowed_commands, :skill_sources).compact
+ success_data[:current_llm_provider] ||= data[:provider] if data.key?(:provider)
+ success(success_data.merge(providers: normalized_providers(data[:providers])))
  rescue JSON::ParserError => e
  log_parse_error(e)
  failure('Failed to parse config file')
data/lib/skill_bench/config/store.rb CHANGED
@@ -24,6 +24,11 @@ module SkillBench
  # @return [Hash, nil] provider configuration by provider name
  attr_accessor :llm_providers_config
 
+ # Returns skill sources mapping.
+ #
+ # @return [Hash, nil] skill source name → directory path
+ attr_accessor :skill_sources
+
  # Initializes a new configuration store with empty provider settings.
  def initialize
  @llm_providers_config = {}
data/lib/skill_bench/config.rb CHANGED
@@ -74,7 +74,9 @@ module SkillBench
  @store = Config::Store.new
  apply_defaults
  apply_json_config(home_config_path)
- apply_json_config(Pathname.new(Dir.pwd).join(CONFIG_FILENAME))
+ local_path = Pathname.new(Dir.pwd).join(CONFIG_FILENAME)
+ is_workspace_file = File.exist?(File.join(Dir.pwd, 'ruby-skill-bench.gemspec'))
+ apply_json_config(local_path) unless defined?(Minitest) && is_workspace_file
  apply_env_overrides
  end
 
@@ -122,6 +124,13 @@ module SkillBench
  store.llm_providers_config || {}
  end
 
+ # Returns skill sources mapping.
+ #
+ # @return [Hash, nil] skill source name → directory path
+ def skill_sources
+ store.skill_sources || {}
+ end
+
  # Returns API key from configuration.
  #
  # @return [String, nil] API key
data/lib/skill_bench/delta_report.rb CHANGED
@@ -49,6 +49,26 @@ module SkillBench
  { success: false, response: { error: { message: e.message } } }
  end
 
+ # Compatibility methods for ComparisonReporter
+
+ # Returns the list of dimensions from the context run.
+ #
+ # @return [Array<Object>] List of objects responding to name and score
+ def dimensions
+ return [] unless context_dimensions
+
+ context_dimensions.map do |name, dim_hash|
+ Struct.new(:name, :score).new(name.to_s, dim_hash[:score] || dim_hash['score'])
+ end
+ end
+
+ # Returns the total context score.
+ #
+ # @return [Numeric, nil]
+ def total
+ context_total
+ end
+
  private
 
  attr_reader :baseline, :context
data/lib/skill_bench/execution/source_path_resolver.rb CHANGED
@@ -1,5 +1,7 @@
  # frozen_string_literal: true
 
+ require 'pathname'
+
  module SkillBench
  module Execution
  # Resolves the source skill or workflow path for a given evaluation target.
@@ -8,6 +10,8 @@ module SkillBench
  #
  # @param eval_folder_path [String] Relative path to the eval directory.
  # @param skill_path [String, nil] Optional explicit override for the source directory.
+ # @param skill_sources [Hash] Optional skill source name → directory path mapping for fallback.
+ # When provided and local resolution does not yield an existing path, each source is checked.
  # @return [String, nil] The resolved source path relative to the evaluator repo root, or nil if unmappable.
  # @example Infer a skill source path (NEW format):
  # SkillBench::Execution::SourcePathResolver.call(
@@ -19,12 +23,57 @@ module SkillBench
  # eval_folder_path: 'evals/skills/code-quality/rails-code-review/review-order'
  # )
  # # => "skills/code-quality/rails-code-review"
- def self.call(eval_folder_path:, skill_path: nil)
+ def self.call(eval_folder_path:, skill_path: nil, skill_sources: {})
  return skill_path if skill_path && !skill_path.empty?
 
- segments = eval_folder_path.to_s.split('/').reject(&:empty?)
+ segments = Pathname.new(eval_folder_path.to_s).each_filename.to_a
+
+ local = resolve_skills_path(segments) || resolve_workflows_path(segments)
+
+ unless local.nil? || skill_sources.empty?
+ skill_name = extract_skill_name(segments)
+ return local unless skill_name
+ return local if skill_exists_at?(local)
+
+ skill_sources.each_value do |source_path|
+ candidate = find_skill_in_source(source_path, skill_name)
+ return candidate if candidate
+ end
+ end
+
+ local
+ end
+
+ # Extracts the skill name from the eval path segments.
+ #
+ # @param segments [Array<String>] Path segments
+ # @return [String, nil] Skill name or nil
+ def self.extract_skill_name(segments)
+ index = segments.rindex('skills')
+ return nil unless index
+
+ remaining = segments[(index + 1)..]
+ return nil if remaining.empty?
 
- resolve_skills_path(segments) || resolve_workflows_path(segments)
+ remaining[0]
+ end
+
+ # Finds a skill directory within a source path by name.
+ #
+ # @param source_path [String] Root directory containing skill categories
+ # @param skill_name [String] Name of the skill to find
+ # @return [String, nil] Path to the skill directory or nil
+ def self.find_skill_in_source(source_path, skill_name)
+ return nil unless source_path && Dir.exist?(source_path)
+
+ Dir.glob(File.join(source_path, '*')).each do |entry|
+ next unless Dir.exist?(entry)
+
+ candidate = File.join(entry, skill_name)
+ return candidate if Dir.exist?(candidate) && File.exist?(File.join(candidate, 'SKILL.md'))
+ end
+
+ nil
  end
 
  private_class_method def self.resolve_skills_path(segments)
@@ -55,6 +104,13 @@ module SkillBench
  workflow_name = segments[index + 1]
  "workflows/#{workflow_name}" if workflow_name
  end
+
+ private_class_method def self.skill_exists_at?(path)
+ return false unless path
+
+ full_path = path.end_with?('SKILL.md') ? path : File.join(path, 'SKILL.md')
+ File.exist?(full_path)
+ end
  end
  end
  end
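Note on the `skill_sources` fallback: when the locally inferred path has no `SKILL.md`, the resolver above walks each configured source and returns the first `<source>/<category>/<skill>/SKILL.md` match. The sketch below reproduces that lookup against a temporary directory tree so the first-match behaviour can be seen in isolation. The method names `find_skill_sketch` and `first_source_match` and the directory layout are made up for this example; only the search rule (one category level, `SKILL.md` required) is taken from the diff.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper mirroring the rule in find_skill_in_source:
# look one category level below the source root for <skill>/SKILL.md.
def find_skill_sketch(source_path, skill_name)
  return nil unless source_path && Dir.exist?(source_path)

  Dir.children(source_path).sort.each do |category|
    candidate = File.join(source_path, category, skill_name)
    return candidate if File.file?(File.join(candidate, 'SKILL.md'))
  end
  nil
end

# Hypothetical helper: iterate sources in insertion order, first hit wins.
def first_source_match(skill_sources, skill_name)
  skill_sources.each_value do |source_path|
    found = find_skill_sketch(source_path, skill_name)
    return found if found
  end
  nil
end

Dir.mktmpdir do |root|
  core = File.join(root, 'core')
  rails = File.join(root, 'rails')
  FileUtils.mkdir_p(File.join(core, 'patterns', 'write-yard-docs'))
  File.write(File.join(core, 'patterns', 'write-yard-docs', 'SKILL.md'), "# demo\n")
  FileUtils.mkdir_p(File.join(rails, 'code-quality', 'code-review'))
  File.write(File.join(rails, 'code-quality', 'code-review', 'SKILL.md'), "# demo\n")

  sources = { 'core' => core, 'rails' => rails }
  match = first_source_match(sources, 'code-review')
  puts match.delete_prefix("#{root}/") # prints "rails/code-quality/code-review"
end
```

Since Ruby hashes preserve insertion order, the order of keys in `skill_sources` decides which repository wins when two sources contain a skill with the same name.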