robot_lab 0.0.9 → 0.0.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (70)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +53 -0
  3. data/README.md +210 -1
  4. data/Rakefile +2 -1
  5. data/docs/api/core/result.md +123 -0
  6. data/docs/api/core/robot.md +182 -0
  7. data/docs/api/errors.md +185 -0
  8. data/docs/guides/building-robots.md +125 -0
  9. data/docs/guides/creating-networks.md +21 -0
  10. data/docs/guides/index.md +10 -0
  11. data/docs/guides/knowledge.md +182 -0
  12. data/docs/guides/mcp-integration.md +106 -0
  13. data/docs/guides/memory.md +2 -0
  14. data/docs/guides/observability.md +486 -0
  15. data/docs/guides/ractor-parallelism.md +364 -0
  16. data/docs/superpowers/plans/2026-04-14-ractor-integration.md +1538 -0
  17. data/docs/superpowers/specs/2026-04-14-ractor-integration-design.md +258 -0
  18. data/examples/19_token_tracking.rb +128 -0
  19. data/examples/20_circuit_breaker.rb +153 -0
  20. data/examples/21_learning_loop.rb +164 -0
  21. data/examples/22_context_compression.rb +179 -0
  22. data/examples/23_convergence.rb +137 -0
  23. data/examples/24_structured_delegation.rb +150 -0
  24. data/examples/25_history_search/conversation.jsonl +30 -0
  25. data/examples/25_history_search.rb +136 -0
  26. data/examples/26_document_store/api_versioning_adr.md +52 -0
  27. data/examples/26_document_store/incident_postmortem.md +46 -0
  28. data/examples/26_document_store/postgres_runbook.md +49 -0
  29. data/examples/26_document_store/redis_caching_guide.md +48 -0
  30. data/examples/26_document_store/sidekiq_guide.md +51 -0
  31. data/examples/26_document_store.rb +147 -0
  32. data/examples/27_incident_response/incident_response.rb +244 -0
  33. data/examples/28_mcp_discovery.rb +112 -0
  34. data/examples/29_ractor_tools.rb +243 -0
  35. data/examples/30_ractor_network.rb +256 -0
  36. data/examples/README.md +136 -0
  37. data/examples/prompts/skill_with_mcp_test.md +9 -0
  38. data/examples/prompts/skill_with_robot_name_test.md +5 -0
  39. data/examples/prompts/skill_with_tools_test.md +6 -0
  40. data/lib/robot_lab/bus_poller.rb +149 -0
  41. data/lib/robot_lab/convergence.rb +69 -0
  42. data/lib/robot_lab/delegation_future.rb +93 -0
  43. data/lib/robot_lab/document_store.rb +155 -0
  44. data/lib/robot_lab/error.rb +25 -0
  45. data/lib/robot_lab/history_compressor.rb +205 -0
  46. data/lib/robot_lab/mcp/client.rb +17 -5
  47. data/lib/robot_lab/mcp/connection_poller.rb +187 -0
  48. data/lib/robot_lab/mcp/server.rb +7 -2
  49. data/lib/robot_lab/mcp/server_discovery.rb +110 -0
  50. data/lib/robot_lab/mcp/transports/stdio.rb +6 -0
  51. data/lib/robot_lab/memory.rb +103 -6
  52. data/lib/robot_lab/network.rb +44 -9
  53. data/lib/robot_lab/ractor_boundary.rb +42 -0
  54. data/lib/robot_lab/ractor_job.rb +37 -0
  55. data/lib/robot_lab/ractor_memory_proxy.rb +85 -0
  56. data/lib/robot_lab/ractor_network_scheduler.rb +154 -0
  57. data/lib/robot_lab/ractor_worker_pool.rb +117 -0
  58. data/lib/robot_lab/robot/bus_messaging.rb +43 -65
  59. data/lib/robot_lab/robot/history_search.rb +69 -0
  60. data/lib/robot_lab/robot.rb +228 -11
  61. data/lib/robot_lab/robot_result.rb +24 -5
  62. data/lib/robot_lab/run_config.rb +1 -1
  63. data/lib/robot_lab/text_analysis.rb +103 -0
  64. data/lib/robot_lab/tool.rb +42 -3
  65. data/lib/robot_lab/tool_config.rb +1 -1
  66. data/lib/robot_lab/version.rb +1 -1
  67. data/lib/robot_lab/waiter.rb +49 -29
  68. data/lib/robot_lab.rb +25 -0
  69. data/mkdocs.yml +1 -0
  70. metadata +72 -2
data/examples/21_learning_loop.rb
@@ -0,0 +1,164 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example 21: Learning Accumulation Loop
+ #
+ # Demonstrates robot.learn() for building up observations between runs.
+ # A code reviewer robot analyzes Ruby snippets. After each review, the
+ # caller records a key insight as a learning. On the next run, those
+ # learnings are automatically prepended to the user message so the robot
+ # can incorporate prior observations without needing a persistent chat.
+ #
+ # Demonstrates:
+ # - robot.learn(text) — adds a learning, deduplicates automatically
+ # - robot.learnings — read the accumulated list
+ # - Learnings injected as "LEARNINGS FROM PREVIOUS RUNS:" prefix
+ # - Superset dedup: a broader learning replaces narrower earlier ones
+ # - Memory persistence: learnings survive rebuilding with the same Memory
+ #
+ # Usage:
+ #   ANTHROPIC_API_KEY=your_key ruby examples/21_learning_loop.rb
+
+ ENV["ROBOT_LAB_TEMPLATE_PATH"] ||= File.join(__dir__, "prompts")
+
+ require_relative "../lib/robot_lab"
+
+ SNIPPETS = [
+   {
+     code: <<~RUBY,
+       def process(items)
+         results = []
+         items.each do |item|
+           results << item * 2
+         end
+         results
+       end
+     RUBY
+     insight: "This codebase prefers map/collect over manual array accumulation"
+   },
+   {
+     code: <<~RUBY,
+       def find_user(id)
+         user = User.find(id)
+         if user != nil
+           return user
+         end
+         return nil
+       end
+     RUBY
+     insight: "Explicit nil comparisons and redundant returns appear frequently here"
+   },
+   {
+     code: <<~RUBY,
+       def calculate_total(cart)
+         total = 0
+         cart.items.each do |item|
+           if item.discount != nil
+             total = total + (item.price - item.discount)
+           else
+             total = total + item.price
+           end
+         end
+         return total
+       end
+     RUBY
+     insight: "Cart/pricing logic tends to have missing edge cases around nil discounts and zero values"
+   }
+ ].freeze
+
+ puts "=" * 65
+ puts "Example 21: Learning Accumulation Loop"
+ puts "=" * 65
+ puts
+
+ robot = RobotLab.build(
+   name: "code_reviewer",
+   system_prompt: <<~PROMPT,
+     You are a concise Ruby code reviewer. For each snippet:
+     1. Identify the main issue (one sentence).
+     2. Show the improved version (code block).
+     Keep responses under 80 words total.
+   PROMPT
+   model: "claude-haiku-4-5-20251001"
+ )
+
+ SNIPPETS.each_with_index do |item, i|
+   run_number = i + 1
+
+   # ---------------------------------------------------------------
+   # Show what learnings are active going into this run
+   # ---------------------------------------------------------------
+   puts "--- Run #{run_number} ---"
+   if robot.learnings.empty?
+     puts "Learnings: (none yet)"
+   else
+     puts "Learnings injected into this prompt:"
+     robot.learnings.each { |l| puts " • #{l}" }
+   end
+   puts
+
+   # ---------------------------------------------------------------
+   # Run the robot — accumulated learnings are prepended automatically
+   # ---------------------------------------------------------------
+   result = robot.run("Review this Ruby snippet:\n\n#{item[:code]}")
+
+   puts "Review:"
+   puts result.reply&.strip&.gsub(/^/, " ")
+   puts
+
+   # ---------------------------------------------------------------
+   # Record the insight from this run as a learning
+   # ---------------------------------------------------------------
+   robot.learn(item[:insight])
+   puts "Added learning: #{item[:insight].inspect}"
+   puts
+ end
+
+ # ---------------------------------------------------------------
+ # Show the full accumulated learning list
+ # ---------------------------------------------------------------
+ puts "=" * 65
+ puts "Accumulated learnings (#{robot.learnings.size} total):"
+ robot.learnings.each_with_index { |l, i| puts " #{i + 1}. #{l}" }
+ puts
+
+ # ---------------------------------------------------------------
+ # Demonstrate superset dedup: a broader learning replaces narrower ones
+ # ---------------------------------------------------------------
+ puts "--- Deduplication demo ---"
+ puts
+ robot2 = RobotLab.build(name: "reviewer2", system_prompt: "You review code.")
+
+ robot2.learn("avoid using puts")
+ robot2.learn("avoid using puts and p in production code") # covers the first
+
+ puts "Learnings after adding broader statement (should be 1, not 2):"
+ robot2.learnings.each_with_index { |l, i| puts " #{i + 1}. #{l}" }
+ puts
+
+ # ---------------------------------------------------------------
+ # Demonstrate persistence: learnings survive a robot rebuild using
+ # the same Memory object
+ # ---------------------------------------------------------------
+ puts "--- Persistence across rebuild ---"
+ puts
+
+ shared_memory = robot.instance_variable_get(:@memory)
+ rebuilt = RobotLab.build(
+   name: "code_reviewer",
+   system_prompt: "You review code.",
+   model: "claude-haiku-4-5-20251001"
+ )
+ rebuilt.instance_variable_set(:@memory, shared_memory)
+
+ # Trigger the memory restore path
+ persisted = shared_memory.get(:learnings)
+ rebuilt.instance_variable_set(:@learnings, Array(persisted))
+
+ puts "Learnings on rebuilt robot (#{rebuilt.learnings.size}):"
+ rebuilt.learnings.each_with_index { |l, i| puts " #{i + 1}. #{l}" }
+ puts
+
+ puts "=" * 65
+ puts "Learning loop demo complete."
+ puts "=" * 65
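The superset dedup exercised above can be sketched in a few lines. This is a hypothetical stand-in for illustration only: the `LearningStore` class and its word-set containment heuristic are assumptions, not the gem's actual implementation.

```ruby
# Hypothetical sketch of superset-style learning dedup: a new learning
# replaces any existing learning whose words it fully contains, and is
# itself skipped if an existing learning already covers it.
class LearningStore
  attr_reader :learnings

  def initialize
    @learnings = []
  end

  def learn(text)
    new_words = words(text)
    # Drop existing learnings subsumed by the new one (or identical to it).
    @learnings.reject! { |l| (words(l) - new_words).empty? }
    # Skip the new learning if a surviving one already covers it.
    return @learnings if @learnings.any? { |l| (new_words - words(l)).empty? }

    @learnings << text
  end

  private

  def words(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

store = LearningStore.new
store.learn("avoid using puts")
store.learn("avoid using puts and p in production code") # subsumes the first
store.learn("avoid using puts")                          # already covered
```

A real implementation would likely also normalize stemming and punctuation; the containment check is the minimum that reproduces the "should be 1, not 2" behavior shown in the example.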
data/examples/22_context_compression.rb
@@ -0,0 +1,179 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example 22: Context Window Compression
+ #
+ # Demonstrates robot.compress_history() for reducing token usage in long
+ # conversations. Old turns are scored against the recent context using
+ # stemmed term-frequency cosine similarity. High-relevance turns are kept
+ # verbatim; irrelevant turns are dropped; medium-relevance turns can be
+ # summarized by a second robot.
+ #
+ # Demonstrates:
+ # - robot.compress_history() — drop/keep/summarize old turns in-place
+ # - recent_turns: N — last N user+assistant pairs always protected
+ # - keep_threshold: / drop_threshold: — tunable relevance bands
+ # - summarizer: — optional lambda(text) -> String for medium-relevance
+ # - Token reduction reported before/after compression
+ #
+ # Requires:
+ #   gem 'classifier', '~> 2.3'  # add to your Gemfile
+ #
+ # Usage:
+ #   ANTHROPIC_API_KEY=your_key ruby examples/22_context_compression.rb
+
+ ENV["ROBOT_LAB_TEMPLATE_PATH"] ||= File.join(__dir__, "prompts")
+
+ require_relative "../lib/robot_lab"
+
+ # ---------------------------------------------------------------------------
+ # Check optional dependency
+ # ---------------------------------------------------------------------------
+ begin
+   require "classifier"
+ rescue LoadError
+   puts "This example requires the classifier gem."
+   puts "Add to your Gemfile: gem 'classifier', '~> 2.3'"
+   exit 1
+ end
+
+ # ---------------------------------------------------------------------------
+ # Helper to count approximate tokens (rough: 4 chars per token)
+ # ---------------------------------------------------------------------------
+ def approx_tokens(messages)
+   messages.sum do |m|
+     content = m.respond_to?(:content) ? m.content.to_s : m.to_s
+     (content.length / 4.0).ceil
+   end
+ end
+
+ puts "=" * 60
+ puts "Example 22: Context Window Compression"
+ puts "=" * 60
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Build a robot and simulate a long conversation on two topics
+ # ---------------------------------------------------------------------------
+ bot = RobotLab.build(
+   name: "assistant",
+   system_prompt: "You are a concise Ruby expert. Reply in 2-3 sentences."
+ )
+
+ puts "Simulating a long conversation (no real LLM calls)..."
+ puts
+
+ # Simulate a conversation history with two distinct topics:
+ #   older turns: Ruby metaprogramming (will become irrelevant)
+ #   recent turns: Rails routing (current topic)
+
+ require "ostruct"
+
+ FakeMsg = Struct.new(:role, :content, :tool_calls, :stop_reason) do
+   def text? = true
+   def tool_use? = false
+   def system? = role == :system
+   def user? = role == :user
+   def assistant? = role == :assistant
+ end
+
+ def fake(role, content)
+   FakeMsg.new(role.to_sym, content, nil, :stop)
+ end
+
+ history = [
+   fake(:system, "You are a concise Ruby expert. Reply in 2-3 sentences."),
+
+   # --- OLD TOPIC: Ruby metaprogramming (5 turns back) ---
+   fake(:user, "Explain Ruby's method_missing and when to use it."),
+   fake(:assistant, "method_missing is called when an object receives an undefined message. It's useful for DSLs and proxy objects, but add respond_to_missing? too. Use sparingly as it hurts performance."),
+   fake(:user, "What's the difference between define_method and def?"),
+   fake(:assistant, "define_method creates methods dynamically from a block, capturing closure variables. def is the static keyword form. Use define_method when the method body depends on runtime values."),
+   fake(:user, "How does BasicObject differ from Object in Ruby?"),
+   fake(:assistant, "BasicObject is the root class with minimal methods, useful for proxy and DSL objects that must not inherit standard methods. Object inherits from BasicObject and adds Kernel, making it the normal base class."),
+
+   # --- RECENT TOPIC: Rails routing (last 2 turns, always protected) ---
+   fake(:user, "How does Rails routing work with resourceful controllers?"),
+   fake(:assistant, "resources :posts generates 7 RESTful routes mapping HTTP verbs to controller actions. You can nest resources and add member/collection routes. Run rails routes to see everything."),
+   fake(:user, "What is the difference between member and collection routes in Rails?"),
+   fake(:assistant, "Member routes operate on a specific resource (needs :id), collection routes operate on the whole collection. Use member { get :preview } and collection { get :search } inside a resources block.")
+ ]
+
+ before_count = history.size
+ before_tokens = approx_tokens(history)
+
+ puts "Before compression:"
+ puts " Messages : #{before_count}"
+ puts " ~Tokens : #{before_tokens}"
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Option A: Drop medium-relevance turns (no summarizer)
+ # ---------------------------------------------------------------------------
+ compressor_a = RobotLab::HistoryCompressor.new(
+   messages: history,
+   recent_turns: 2,
+   keep_threshold: 0.25,
+   drop_threshold: 0.05,
+   summarizer: nil
+ )
+
+ result_a = compressor_a.call
+ tokens_a = approx_tokens(result_a)
+
+ puts "After compression (drop mode, recent_turns: 2, keep: 0.25, drop: 0.05):"
+ puts " Messages : #{result_a.size} (removed #{before_count - result_a.size})"
+ puts " ~Tokens : #{tokens_a} (saved #{before_tokens - tokens_a})"
+ puts " Kept roles: #{result_a.map(&:role).join(', ')}"
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Option B: With a summarizer lambda for medium-relevance turns
+ # ---------------------------------------------------------------------------
+ puts "After compression (summarize mode, keep: 0.5, drop: 0.05):"
+
+ summarizer = lambda do |text|
+   # In production this would call a small LLM robot.
+   # Here we fake it by taking the first sentence.
+   text.split(/[.!?]/).first.to_s.strip + "."
+ end
+
+ compressor_b = RobotLab::HistoryCompressor.new(
+   messages: history,
+   recent_turns: 2,
+   keep_threshold: 0.5,
+   drop_threshold: 0.05,
+   summarizer: summarizer
+ )
+
+ result_b = compressor_b.call
+ tokens_b = approx_tokens(result_b)
+
+ puts " Messages : #{result_b.size} (removed #{before_count - result_b.size})"
+ puts " ~Tokens : #{tokens_b} (saved #{before_tokens - tokens_b})"
+ puts " Kept roles: #{result_b.map(&:role).join(', ')}"
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Show the LLM summarizer pattern (not executed — requires API key)
+ # ---------------------------------------------------------------------------
+ puts "=" * 60
+ puts "LLM summarizer pattern (requires API key):"
+ puts "=" * 60
+ puts <<~RUBY
+
+   summarizer_bot = RobotLab.build(
+     name: "summarizer",
+     system_prompt: "Summarize the following text in one sentence."
+   )
+
+   robot.compress_history(
+     recent_turns: 3,
+     keep_threshold: 0.6,
+     drop_threshold: 0.2,
+     summarizer: ->(text) { summarizer_bot.run("Summarize: \#{text}").reply }
+   )
+
+ RUBY
+
+ puts "Done."
data/examples/23_convergence.rb
@@ -0,0 +1,137 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example 23: Debate Convergence Detection
+ #
+ # Demonstrates RobotLab::Convergence for detecting when two independent
+ # agents have reached the same conclusion. This enables a fast-path that
+ # skips an expensive reconciler LLM call when verifiers already agree.
+ #
+ # Demonstrates:
+ # - Convergence.similarity(a, b) — 0.0..1.0 cosine similarity score
+ # - Convergence.detected?(a, b) — boolean above default threshold (0.85)
+ # - Convergence.detected?(a, b, threshold: 0.6) — custom threshold
+ # - Router fast-path pattern: skip reconciler when verifiers agree
+ #
+ # Requires:
+ #   gem 'classifier', '~> 2.3'  # add to your Gemfile
+ #
+ # Usage:
+ #   ANTHROPIC_API_KEY=your_key ruby examples/23_convergence.rb
+
+ ENV["ROBOT_LAB_TEMPLATE_PATH"] ||= File.join(__dir__, "prompts")
+
+ require_relative "../lib/robot_lab"
+
+ # ---------------------------------------------------------------------------
+ # Check optional dependency
+ # ---------------------------------------------------------------------------
+ begin
+   require "classifier"
+ rescue LoadError
+   puts "This example requires the classifier gem."
+   puts "Add to your Gemfile: gem 'classifier', '~> 2.3'"
+   exit 1
+ end
+
+ puts "=" * 60
+ puts "Example 23: Debate Convergence Detection"
+ puts "=" * 60
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Similarity scoring
+ # ---------------------------------------------------------------------------
+ pairs = {
+   "Identical responses" => [
+     "The time complexity of quicksort is O(n log n) average case and O(n²) worst case. Use merge sort for guaranteed O(n log n).",
+     "The time complexity of quicksort is O(n log n) average case and O(n²) worst case. Use merge sort for guaranteed O(n log n)."
+   ],
+   "Semantically similar (same conclusion)" => [
+     "Quicksort has average O(n log n) time complexity but degrades to O(n²) in the worst case. Prefer merge sort when stability matters.",
+     "The average time complexity of quicksort is O(n log n). In the worst case it becomes O(n²), so merge sort is safer for sorted input."
+   ],
+   "Partially related (same topic, different focus)" => [
+     "Quicksort is O(n log n) average case. It is in-place and cache-friendly, making it fast in practice despite worst-case concerns.",
+     "Merge sort guarantees O(n log n) in all cases and is stable. It requires O(n) extra space unlike the in-place quicksort."
+   ],
+   "Unrelated responses" => [
+     "Quicksort has average O(n log n) time complexity but degrades to O(n²) in the worst case.",
+     "The Pacific Ocean is the largest and deepest ocean on Earth, covering more than thirty percent of the planet surface area."
+   ]
+ }
+
+ puts "Similarity scores:"
+ puts "-" * 60
+ pairs.each do |label, (a, b)|
+   score = RobotLab::Convergence.similarity(a, b)
+   converged = RobotLab::Convergence.detected?(a, b, threshold: 0.6)
+   puts " #{label}"
+   puts " Score: #{"%.3f" % score} | Converged at 0.60: #{converged ? "YES" : "no"}"
+   puts
+ end
+
+ # ---------------------------------------------------------------------------
+ # Router fast-path pattern
+ # ---------------------------------------------------------------------------
+ puts "=" * 60
+ puts "Router fast-path pattern"
+ puts "=" * 60
+ puts <<~RUBY
+
+   # Two verifier robots run in parallel and store their replies in shared memory.
+   # The router checks convergence before dispatching to the expensive reconciler.
+
+   router = ->(args) do
+     a = args.context[:verifier_a]&.reply.to_s
+     b = args.context[:verifier_b]&.reply.to_s
+
+     if RobotLab::Convergence.detected?(a, b)
+       nil # Both agree — skip reconciler, network halts here
+     else
+       ["reconciler"]
+     end
+   end
+
+   network = RobotLab.create_network(
+     name: "fact_check",
+     robots: [verifier_a, verifier_b, reconciler],
+     router: router
+   )
+
+   result = network.run(message: "Is this claim accurate?")
+
+ RUBY
+
+ # ---------------------------------------------------------------------------
+ # Demonstrate with simulated verifier outputs
+ # ---------------------------------------------------------------------------
+ puts "Simulating verifier fast-path:"
+ puts "-" * 60
+
+ verifier_outputs = [
+   {
+     label: "Verifiers agree (skip reconciler)",
+     a: "The claim is accurate. Photosynthesis converts light energy into glucose using carbon dioxide and water, producing oxygen as a byproduct.",
+     b: "This is correct. Photosynthesis uses sunlight, CO₂, and water to produce glucose and oxygen in plant cells.",
+     threshold: 0.5
+   },
+   {
+     label: "Verifiers disagree (call reconciler)",
+     a: "The claim is accurate. The Great Wall of China is visible from space with the naked eye at low Earth orbit.",
+     b: "The claim is false. Astronauts confirm the Great Wall cannot be seen from space without magnification because it is far too narrow.",
+     threshold: 0.7
+   }
+ ]
+
+ verifier_outputs.each do |scenario|
+   score = RobotLab::Convergence.similarity(scenario[:a], scenario[:b])
+   converged = RobotLab::Convergence.detected?(scenario[:a], scenario[:b], threshold: scenario[:threshold])
+   action = converged ? "SKIP reconciler (fast-path)" : "CALL reconciler"
+
+   puts " #{scenario[:label]}"
+   puts " Score: #{"%.3f" % score} → #{action}"
+   puts
+ end
+
+ puts "Done."
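A threshold check like `Convergence.detected?` can be approximated with any text-similarity measure. A minimal sketch using Jaccard word overlap in place of the gem's cosine scoring; the function names and the 0.85 default are assumptions taken from the comments above, not the gem's code:

```ruby
# Illustrative convergence check: two replies are "converged" when their
# word-overlap similarity meets the threshold (assumed default 0.85).
def jaccard_similarity(a, b)
  wa = a.downcase.scan(/[a-z0-9']+/).uniq
  wb = b.downcase.scan(/[a-z0-9']+/).uniq
  return 0.0 if wa.empty? || wb.empty?

  (wa & wb).size.fdiv((wa | wb).size)
end

def converged?(a, b, threshold: 0.85)
  jaccard_similarity(a, b) >= threshold
end

# Identical conclusions converge; unrelated ones fall well below threshold,
# mirroring the skip-reconciler / call-reconciler fork in the router above.
converged?("merge sort is safer here", "merge sort is safer here") # converges
converged?("merge sort is safer here", "the pacific ocean is deep") # does not
```

Jaccard is coarser than cosine similarity (it ignores term frequency), so real thresholds would need retuning; the fast-path structure is the same.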
data/examples/24_structured_delegation.rb
@@ -0,0 +1,150 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Example 24: Structured Delegation
+ #
+ # Demonstrates robot.delegate(to:, task:) for structured inter-robot calls,
+ # in both synchronous (blocking) and asynchronous (parallel fan-out) modes.
+ #
+ # Demonstrates:
+ # - robot.delegate(to:, task:) — sync: blocks, returns RobotResult
+ # - robot.delegate(to:, task:, async: true) — async: returns DelegationFuture
+ # - future.value / future.value(timeout: N) — block until result ready
+ # - future.resolved? — non-blocking poll
+ # - result.delegated_by — which robot delegated
+ # - result.robot_name — which robot did the work
+ # - result.duration — wall-clock seconds for the delegated call
+ # - result.input_tokens / result.output_tokens — delegatee's token usage
+ # - Contrast with bus messaging (fire-and-forget) and pipelines (predefined)
+ #
+ # Usage:
+ #   ANTHROPIC_API_KEY=your_key ruby examples/24_structured_delegation.rb
+
+ ENV["ROBOT_LAB_TEMPLATE_PATH"] ||= File.join(__dir__, "prompts")
+
+ require_relative "../lib/robot_lab"
+
+ puts "=" * 60
+ puts "Example 24: Structured Delegation"
+ puts "=" * 60
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Build a manager and two specialist robots
+ # ---------------------------------------------------------------------------
+ manager = RobotLab.build(
+   name: "manager",
+   system_prompt: "You are a project manager. Delegate tasks concisely."
+ )
+
+ summarizer = RobotLab.build(
+   name: "summarizer",
+   system_prompt: "You are a concise summarizer. Produce a 1-2 sentence summary."
+ )
+
+ analyst = RobotLab.build(
+   name: "analyst",
+   system_prompt: "You are a data analyst. Identify the key metric in one sentence."
+ )
+
+ # ---------------------------------------------------------------------------
+ # Manager delegates to each specialist in turn
+ # ---------------------------------------------------------------------------
+ document = <<~TEXT
+   Q4 revenue came in at $4.2M, up 18% year-over-year. Customer acquisition
+   cost dropped to $120, the lowest in three years. Churn held steady at 2.1%.
+   Net promoter score improved from 42 to 58. The mobile app drove 34% of new
+   sign-ups, compared to 19% in Q3.
+ TEXT
+
+ puts "Document:"
+ puts document
+ puts "-" * 60
+
+ # ---------------------------------------------------------------------------
+ # Synchronous delegation — sequential, blocks until each result arrives
+ # ---------------------------------------------------------------------------
+ puts "── Synchronous (sequential) ──────────────────────────────"
+ puts
+
+ puts "Delegating to summarizer (blocking)..."
+ summary_result = manager.delegate(to: summarizer, task: "Summarize this report:\n\n#{document}")
+
+ puts "Summary (from #{summary_result.robot_name}, delegated by #{summary_result.delegated_by}):"
+ puts " #{summary_result.reply}"
+ puts " Duration: #{"%.2f" % summary_result.duration}s | " \
+      "Tokens: #{summary_result.input_tokens} in / #{summary_result.output_tokens} out"
+ puts
+
+ puts "Delegating to analyst (blocking)..."
+ analysis_result = manager.delegate(to: analyst, task: "What is the single most important metric here?\n\n#{document}")
+
+ puts "Analysis (from #{analysis_result.robot_name}, delegated by #{analysis_result.delegated_by}):"
+ puts " #{analysis_result.reply}"
+ puts " Duration: #{"%.2f" % analysis_result.duration}s | " \
+      "Tokens: #{analysis_result.input_tokens} in / #{analysis_result.output_tokens} out"
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Asynchronous delegation — parallel fan-out, results collected later
+ # ---------------------------------------------------------------------------
+ puts "── Asynchronous (parallel fan-out) ───────────────────────"
+ puts
+
+ # Fresh robots — each delegate call should start from a clean slate
+ async_summarizer = RobotLab.build(
+   name: "summarizer",
+   system_prompt: "You are a concise summarizer. Produce a 1-2 sentence summary."
+ )
+ async_analyst = RobotLab.build(
+   name: "analyst",
+   system_prompt: "You are a data analyst. Identify the key metric in one sentence."
+ )
+
+ puts "Firing both delegations in parallel..."
+ t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+
+ f_summary = manager.delegate(to: async_summarizer, task: "Summarize this report:\n\n#{document}", async: true)
+ f_analysis = manager.delegate(to: async_analyst, task: "What is the single most important metric?\n\n#{document}", async: true)
+
+ puts "Both futures launched. Futures resolved? " \
+      "summary=#{f_summary.resolved?} analysis=#{f_analysis.resolved?}"
+ puts "Collecting results..."
+
+ summary = f_summary.value(timeout: 60)
+ analysis = f_analysis.value(timeout: 60)
+
+ elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
+
+ puts
+ puts "Summary (#{summary.robot_name}): #{summary.reply}"
+ puts "Analysis (#{analysis.robot_name}): #{analysis.reply}"
+ puts
+ puts "Total wall time with parallelism: #{"%.2f" % elapsed}s " \
+      "(vs ~#{"%.2f" % (summary.duration + analysis.duration)}s sequential)"
+ puts
+
+ # ---------------------------------------------------------------------------
+ # Contrast with the alternatives
+ # ---------------------------------------------------------------------------
+ puts "=" * 60
+ puts "When to use delegate vs. the alternatives"
+ puts "=" * 60
+ puts <<~TEXT
+
+   bus messaging — fire-and-forget; no return value; async
+     use when: you want to notify without waiting
+
+   pipeline — predefined sequence; robots share memory
+     use when: you have a fixed workflow graph
+
+   delegate() — synchronous; blocks; returns RobotResult with metadata
+     use when: one robot needs the result of another's work
+
+   delegate(async:true) — returns DelegationFuture immediately
+     use when: you want to run multiple delegates in
+       parallel and collect results when ready
+
+ TEXT
+
+ puts "Done."
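The `DelegationFuture` contract the example relies on (`value`, `value(timeout:)`, `resolved?`) can be sketched with a plain Thread. This `SimpleFuture` is a hypothetical stand-in for illustration, not the gem's actual class:

```ruby
# Minimal future built on a Thread, approximating the DelegationFuture
# surface shown above: launch work immediately, poll with resolved?,
# and block for the result with value(timeout:).
class SimpleFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Non-blocking poll: a finished thread is no longer alive.
  def resolved?
    !@thread.alive?
  end

  # Block until the result is ready; with a timeout, raise instead of
  # returning a bogus value (Thread#join returns nil on timeout).
  def value(timeout: nil)
    if timeout
      raise "delegation timed out after #{timeout}s" unless @thread.join(timeout)
    end
    @thread.value
  end
end

f = SimpleFuture.new { sleep 0.05; 6 * 7 }
answer = f.value(timeout: 5)
```

Fan-out then falls out naturally: launch several `SimpleFuture`s, then call `value` on each, so total wall time tracks the slowest task rather than the sum.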
@@ -0,0 +1,30 @@
1
+ {"role":"user","content":"We need to add a `status` column to the orders table. There are about 4 million rows in production. What's the safest migration approach?"}
2
+ {"role":"assistant","content":"For a 4M-row table on Postgres, use strong_migrations: add the column with a default of null first, then backfill in batches of 1000 rows using a separate script, and finally add the NOT NULL constraint with a DEFAULT only after the backfill completes. This avoids a full table lock. Set lock_timeout to 5 seconds and statement_timeout to 10 seconds on the migration connection so a slow query can't block deploys."}
3
+ {"role":"user","content":"Should we add an index on status at the same time?"}
4
+ {"role":"assistant","content":"Create it concurrently: `add_index :orders, :status, algorithm: :concurrent`. This builds the index without holding an exclusive lock. Run it in a separate migration from the column addition, and monitor pg_stat_activity during the build. Expect it to take 2-5 minutes on a 4M-row table; plan a maintenance window if your DB is under heavy write load during that time."}
5
+ {"role":"user","content":"What about the foreign key from order_items to orders? We have cascade deletes enabled."}
6
+ {"role":"assistant","content":"Cascade deletes with a busy foreign key can cause unexpected lock chains. Before the migration, check pg_locks for any long-running transactions that hold row locks on orders. If you see contention, defer the cascade to application-level logic: remove the database-level ON DELETE CASCADE, handle the deletion in an after_destroy callback, and wrap it in a transaction with a retry on deadlock. Log any cascade failures to a dead_letter table so nothing silently disappears."}
7
+ {"role":"user","content":"The GET /api/v2/orders endpoint is averaging 800ms in production. Our SLA is 200ms. Where do we start?"}
8
+ {"role":"assistant","content":"Start with rack-mini-profiler in staging with the pp=flamegraph query parameter — it will show exactly where the time is going. My first guess is N+1 queries: every order record is probably triggering individual SELECTs on users, products, and shipments. Add includes(:user, :line_items => :product, :shipments) to your base scope. Run EXPLAIN (ANALYZE, BUFFERS) on the resulting query; you want to see index scans, not sequential scans."}
9
+ {"role":"user","content":"We found 14 N+1 queries. After fixing those, we're down to 420ms. Still too slow."}
10
+ {"role":"assistant","content":"Good progress. Next suspect is serialization. If you're using ActiveModelSerializers or JBuilder, move to Blueprinter or oj with a hand-rolled hash — serialization overhead on large collections is significant. Also check whether you're calling .count anywhere inside a loop; that's a hidden query. Beyond that, look at pagination: if the endpoint returns more than 50 records, add Kaminari or cursor-based pagination and cap the page size. A 50-record page should serialize in under 5ms."}
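A hand-rolled serializer of the kind suggested above is just a method building plain hashes and one `JSON.generate` call (field names are illustrative):

```ruby
require "json"

# Build a plain hash per order instead of going through a serializer
# framework; generating JSON from an array of hashes is very cheap.
def serialize_order(order)
  {
    id:     order[:id],
    status: order[:status],
    total:  format("%.2f", order[:total_cents] / 100.0)
  }
end

orders = [
  { id: 1, status: "paid",    total_cents: 1999 },
  { id: 2, status: "pending", total_cents: 450  }
]

payload = JSON.generate(orders.map { |o| serialize_order(o) })
```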
+ {"role":"user","content":"Response is 210ms now. Can we cache this endpoint?"}
+ {"role":"assistant","content":"Yes. Use Russian doll caching: cache the full collection response keyed on max(orders.updated_at) plus the user_id and page number. Set a 5-minute TTL and add a cache-control: public, max-age=60 header for CDN caching. Invalidate on order write using an after_commit hook. For authenticated endpoints, ensure Vary: Authorization is in the response headers so the CDN does not serve one user's data to another. Monitor hit rate with Rails.cache.stats — aim for > 80% before calling it done."}
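In plain Ruby, that collection cache key might be assembled like this (key layout is illustrative; in Rails you would hand these parts to Rails.cache.fetch). Any write bumps max updated_at, which changes the key and so invalidates the entry naturally:

```ruby
require "digest"

# Key the cached page on the newest updated_at in the collection plus
# the user and page, so a write to any order produces a fresh key.
def orders_cache_key(orders, user_id:, page:)
  max_updated = orders.map { |o| o[:updated_at] }.max
  raw = "orders/v1/user-#{user_id}/page-#{page}/#{max_updated.to_i}"
  Digest::SHA256.hexdigest(raw)
end

orders = [
  { id: 1, updated_at: Time.at(1_700_000_000) },
  { id: 2, updated_at: Time.at(1_700_000_500) }
]

key_before = orders_cache_key(orders, user_id: 42, page: 1)
orders[0][:updated_at] = Time.at(1_700_001_000)   # an order was written
key_after  = orders_cache_key(orders, user_id: 42, page: 1)
```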
+ {"role":"user","content":"Our current deploy takes 22 minutes from merge to production. The team is frustrated. What's slowing it down?"}
+ {"role":"assistant","content":"Pull the CI timing breakdown. Common culprits in that range: (1) test suite parallelism is low — if you're running 2000 tests on a single container, split across 4 and save 8-10 minutes; (2) Docker image builds are not caching gem layers — pin the Gemfile.lock COPY before the rest of the source so the bundle install layer is reused on unchanged deps; (3) you may be running integration tests that hit external APIs or spin up the full Rails app — stub those with WebMock or VCR in CI and save 3-4 minutes. Which CI platform are you on?"}
+ {"role":"user","content":"GitHub Actions. We're already using matrix builds but our Docker layer caching is definitely off."}
+ {"role":"assistant","content":"Add cache-from: type=gha and cache-to: type=gha,mode=max to your docker/build-push-action step. With GHA cache, the gem install layer will be a cache hit on most PRs and saves 5-7 minutes. Also enable --mount=type=cache,target=/usr/local/bundle in your Dockerfile RUN step for the bundle install — this caches gems inside the BuildKit layer cache. Together these typically cut Docker build time from 8 minutes to under 2."}
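Wired into the workflow and Dockerfile, those two changes look roughly like this (action versions, step layout, and image tags are illustrative):

```yaml
# .github/workflows/ci.yml (excerpt)
- uses: docker/build-push-action@v5
  with:
    push: true
    tags: myapp:${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max
```

```dockerfile
# Dockerfile (excerpt): copy lockfiles before the rest of the source so
# the bundle install layer is reused, and cache gems in a BuildKit mount.
COPY Gemfile Gemfile.lock ./
RUN --mount=type=cache,target=/usr/local/bundle bundle install
COPY . .
```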
+ {"role":"user","content":"What's the right rollback procedure if something goes wrong post-deploy?"}
+ {"role":"assistant","content":"Maintain a three-step rollback playbook: (1) Feature flag off — if the broken feature is behind a flag, disable it in LaunchDarkly or Flipper without touching code; (2) ECS rollback to previous image — update the task definition to the previous image tag; this takes under 60 seconds. (3) Database rollback — only if a migration ran and is reversible. Run db:rollback on each app instance, then redeploy the old image. Keep an #incidents Slack channel with a pinned deploy log showing the last 5 image SHAs so the on-call engineer does not have to dig."}
+ {"role":"user","content":"Should we do blue-green deploys?"}
+ {"role":"assistant","content":"Blue-green makes sense if your rollback SLA is under 30 seconds. The overhead: you need to run double the app servers during deploy (cost), and any migrations must be backwards-compatible with both the old and new code simultaneously — no column renames, no dropping columns the old code reads. A simpler middle ground is a canary deploy: route 5% of traffic to the new version, watch error rate and p99 for 10 minutes, then promote. This requires a weighted target group split in the load balancer."}
+ {"role":"user","content":"Our invoice generation Sidekiq jobs are failing intermittently with Stripe timeout errors. How should we handle retries?"}
+ {"role":"assistant","content":"Use Sidekiq's built-in retry with exponential backoff, but add a custom retry delay for Stripe errors specifically. In the job class: sidekiq_retry_in { |count| (count ** 4) + 15 + rand(30) * count }. This gives you roughly 15s, 1m, 5m, 17m, 34m for retries 1-5. More importantly, make the job idempotent: before calling Stripe, check whether an invoice with your idempotency key already exists. Stripe accepts an Idempotency-Key header — set it to \"invoice-ORDER_ID-ATTEMPT\" so a duplicate call returns the already-created invoice rather than billing twice."}
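The backoff curve and the idempotency key above can be checked in plain Ruby (helper names are hypothetical; the block body is what the job would hand to sidekiq_retry_in):

```ruby
# Delay the job would pass to sidekiq_retry_in; `count` is the zero-based
# retry number, and rand(30) adds jitter so retries do not stampede.
def stripe_retry_delay(count)
  (count**4) + 15 + rand(30) * count
end

# Stable idempotency key per order and attempt: a retried Stripe call
# with the same Idempotency-Key header returns the already-created
# invoice instead of billing twice.
def idempotency_key(order_id, attempt)
  "invoice-#{order_id}-#{attempt}"
end

delays = (0..4).map { |c| stripe_retry_delay(c) }
```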
+ {"role":"user","content":"We're seeing jobs pile up in the dead queue when Stripe is degraded. 500+ dead jobs after a 20-minute outage."}
+ {"role":"assistant","content":"That pile-up usually means max_retries is too low and jobs exhaust retries during the outage window. Raise it to 10 retries (covers ~6 hours of backoff). Add a Sidekiq job middleware that checks a circuit breaker flag in Redis before processing: if stripe:circuit_open is set, re-enqueue the job with a 5-minute delay instead of retrying immediately. Set the circuit breaker when you detect 3 consecutive Stripe failures, and auto-clear it after 10 minutes. This preserves the dead queue for genuine bugs rather than temporary outages."}
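A minimal sketch of that breaker, with an injected clock and in-memory state standing in for the stripe:circuit_open key in Redis (a real app would use SET with an EX expiry instead of tracking the open timestamp by hand):

```ruby
# Circuit breaker: opens after 3 consecutive failures, auto-clears
# after a 10-minute window. In-memory stand-in for a Redis flag.
class StripeCircuit
  FAILURE_THRESHOLD = 3
  OPEN_FOR = 600 # seconds

  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  def record_failure
    @failures += 1
    return if @failures < FAILURE_THRESHOLD
    @opened_at = @clock.call   # trip the breaker
    @failures = 0
  end

  def record_success
    @failures = 0
  end

  def open?
    return false unless @opened_at
    if @clock.call - @opened_at >= OPEN_FOR
      @opened_at = nil         # auto-clear after the window
      false
    else
      true
    end
  end
end

# Three consecutive failures open the circuit; 10 simulated minutes
# later it clears itself.
now = 0.0
breaker = StripeCircuit.new(clock: -> { now })
3.times { breaker.record_failure }
was_open = breaker.open?
now += 601
open_after_window = breaker.open?
```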
+ {"role":"user","content":"How do we handle the dead queue cleanup after an outage?"}
+ {"role":"assistant","content":"Build a Rake task that pulls jobs from the dead set, filters by error class and job age, and re-enqueues them in batches of 50 with a 1-second sleep between batches to avoid hammering the newly-recovered service. Log each re-enqueue with the original job args and failure reason. Run it manually after confirming Stripe is healthy. Never blindly drain the dead queue — inspect the error messages first; some failures may be data issues that need a code fix before retry."}
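The batching core of that Rake task can be sketched without Sidekiq — a plain array stands in for the dead set, and a block stands in for re-enqueueing:

```ruby
# Re-enqueue dead jobs in fixed-size batches with a pause between
# batches so the freshly recovered service is not hammered.
# Returns the number of batches processed.
def requeue_in_batches(jobs, batch_size: 50, pause: 1.0)
  batches = 0
  jobs.each_slice(batch_size) do |batch|
    batch.each { |job| yield job }
    batches += 1
    sleep(pause) if pause.positive?
  end
  batches
end

# Stand-in for Sidekiq's dead set. Filter by error class first, as
# described above — data bugs should not be blindly re-enqueued.
dead = (1..120).map { |i| { jid: i, error: "Stripe::APIConnectionError" } }
dead << { jid: 999, error: "NoMethodError" }

retryable = dead.select { |j| j[:error].start_with?("Stripe::") }
requeued  = []
batches = requeue_in_batches(retryable, pause: 0.0) { |job| requeued << job[:jid] }
```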
+ {"role":"user","content":"We have two new engineers starting Monday. What should their first week look like?"}
+ {"role":"assistant","content":"Day 1: environment setup (README-driven, should take under 2 hours), 1:1 with their manager, and a codebase walkthrough covering the main domain models. Days 2-3: pair on a small bug or chore ticket — something real but low-stakes. Day 4: own a small feature end-to-end, including writing the test. Day 5: retro on what was confusing in the onboarding docs and update the README. The goal by end of week one is that they've shipped something, however small, and feel unblocked asking questions."}
+ {"role":"user","content":"Thanks"}
+ {"role":"assistant","content":"Sure!"}