RubyGems - turnkit - Versions diffs - 0.2.9 → 0.3.0 - Mend

turnkit 0.2.9 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32) hide show

checksums.yaml +4 -4
data/CHANGELOG.md +17 -0
data/README.md +112 -6
data/UPGRADE.md +37 -299
data/lib/turnkit/adapters/ruby_llm.rb +29 -0
data/lib/turnkit/agent.rb +61 -7
data/lib/turnkit/budget.rb +44 -10
data/lib/turnkit/compaction.rb +16 -4
data/lib/turnkit/error.rb +2 -0
data/lib/turnkit/generators/turnkit/install/templates/create_turnkit_tables.rb +0 -1
data/lib/turnkit/load_skill_tool.rb +29 -0
data/lib/turnkit/memory_store.rb +11 -0
data/lib/turnkit/message.rb +14 -7
data/lib/turnkit/message_projection.rb +17 -2
data/lib/turnkit/output_audit.rb +92 -0
data/lib/turnkit/output_policy.rb +127 -0
data/lib/turnkit/result.rb +29 -4
data/lib/turnkit/run.rb +4 -3
data/lib/turnkit/schema_check.rb +68 -0
data/lib/turnkit/skill.rb +16 -2
data/lib/turnkit/store.rb +6 -0
data/lib/turnkit/stores/active_record_store.rb +10 -2
data/lib/turnkit/sub_agent_tool.rb +2 -1
data/lib/turnkit/system_prompt.rb +1 -1
data/lib/turnkit/tool.rb +2 -21
data/lib/turnkit/tool_call.rb +3 -3
data/lib/turnkit/tool_runner.rb +40 -11
data/lib/turnkit/turn.rb +162 -18
data/lib/turnkit/version.rb +1 -1
data/lib/turnkit/workflow.rb +24 -69
data/lib/turnkit.rb +16 -9
metadata +6 -2

checksums.yaml CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a7069b120432ec902d846961157f5635c946602a8298ed4471f09dde3e3e3e0d
-  data.tar.gz: '09a5d64ff294f89ebde99a6cf1d36dc8731c6cabbf06216d4e9b9551cbe88a1e'
+  metadata.gz: b5694bc97b2f735e5076574e2863ee5addc41926bd85edf02e1835263ffb3516
+  data.tar.gz: 65286330a1d0b4bbd0e3e6c11ba73abd836fb22a44ae4b3ab48a58ecf9d19425
 SHA512:
-  metadata.gz: de794838f5979194aa2469890848eb7cd60932d6f223e95d17be4d8912a6f2777afb55143f9776d7093be2072451c4a7ba0aa83ca8783c82a29375da56a11c90
-  data.tar.gz: c037fb4946a252ebf9bb2e0f99b76cca23d60f29275ce4e07a15f71232d4fdc0dce23337ad1b4b47bacd7df50ca7eedd3cf050c82167bcccb30debaa70cdfe22
+  metadata.gz: 2b3674abf0cae37286a04431f0ceb02a30e282c715e4d6d96e51c0a08d600c94a9fee6c82bf178c0b97ff080ee221b00a18b5409e72003d92c7a5430b34d5733
+  data.tar.gz: 7141f5cc00df42bfaf0e9b035d75f54e0b7c9b14ff71a8c95805242b32835fb410358f7f42ff2161f89896425b62fd840c05dcc2504f555430450517dc61bf9b

data/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,22 @@
 # Changelog
+## 0.3.0 - 2026-06-10
+- Make the task-runtime API skills-first and intentionally breaking: `max_spend` is the only spend-limit name and output validation is exposed as `output_policy` / `policy_audit`.
+- Store message content as ordered typed parts, with text derived from content and tool calls/results persisted in the transcript instead of metadata.
+- Add `load_skill` for progressively disclosed available skills.
+- Add output-policy revision loops with `output_retries`, including skill/policy rehydration in revision prompts.
+- Add deterministic `input_schema` validation before turns are created.
+- Ensure terminal tools never orphan sibling tool calls; skipped siblings receive cancelled executions and tool-result messages.
+- Add turn claiming, tool-runner heartbeats, persisted budget resume, and sub-agent failure details.
+## 0.2.10 - 2026-06-10
+- Add output audits and file-backed output policies for validating final run output.
+- Add per-tool execution limits and explicit budget errors.
+- Improve workflow event callbacks, model telemetry events, and compaction usage accounting.
+- Add an Amazon memo writer example and batched page reading in the workflow researcher example.
 ## 0.2.9 - 2026-06-08
 - Add `TurnKit::Workflow` for reusable single-orchestrator task runtimes with workflow skills, tools, guardrails, compaction, and run monitoring.

data/README.md CHANGED Viewed

@@ -5,7 +5,7 @@
 [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE.md)
 Build durable Ruby and Rails agents with conversations, runs, workflows, tools,
-skills, sub-agents, and persistence.
+skills, output audits, sub-agents, and persistence.
 ## Installation
@@ -65,6 +65,11 @@ For runnable, API-key-free examples of the three core entry points, see
 - agent run: one bounded application task;
 - workflow: reusable task runner with skills, tools, and limits.
+For fuller workflow examples, see:
+- [`examples/workflow_researcher`](examples/workflow_researcher): source-grounded research with web tools, batch reads, per-tool limits, and deep monitoring;
+- [`examples/amazon_memo_writer`](examples/amazon_memo_writer): strict memo generation with research tools, a structured terminal submit tool, deterministic format checks, and an LLM output policy.
 ### Models
 Set a model:
@@ -208,6 +213,10 @@ workflow = TurnKit::Workflow.new(
   max_spend: 0.25,
   max_iterations: 12,
   max_tool_executions: 25,
+  max_tool_executions_by_name: {
+    web_search: 2,
+    read_web_page: 8
+  },
   compaction: {
     context_limit: 64_000,
     threshold: 0.75
@@ -253,6 +262,10 @@ Prompt caching and compaction solve different problems:
 - budgets (`max_spend`, `max_iterations`, `max_tool_executions`) keep autonomous
   loops bounded.
+Use `max_tool_executions_by_name` when a workflow needs different budgets for
+different tools. For example, allow many cheap reads but only one final submit
+tool, or cap web searches while allowing a batch page reader.
 Reach for separate agents and `sub_agents` only when the isolation is worth the
 extra model calls, such as different models, different tool permissions,
 parallel specialist review, or separate durable child conversations.
@@ -289,6 +302,97 @@ class SaveBrief < TurnKit::Tool
 end
 ```
+### Output audits and policies
+Use output audits for deterministic checks that should not depend on another
+model call: required headings, source counts, forbidden characters, JSON shape,
+or project-specific formatting rules.
+```ruby
+no_em_dash = ->(output) do
+  next unless output.include?("—")
+  { rule: "no_em_dash", message: "contains an em dash" }
+end
+numbered_lists_only = ->(output) do
+  lines = output.lines.each_with_index.filter_map do |line, index|
+    index + 1 if line.match?(/^\s*[-*]\s+/)
+  end
+  next if lines.empty?
+  {
+    rule: "numbered_lists_only",
+    message: "contains unordered list markers",
+    metadata: { lines: lines }
+  }
+end
+workflow = TurnKit::Workflow.new(
+  name: "memo_writer",
+  output_policy: [no_em_dash, numbered_lists_only],
+  output_policy_mode: :fail
+)
+```
+Run checks directly when you want to test a renderer or policy without calling a
+model:
+```ruby
+audit = TurnKit.check_output_policy(
+  "1. Recommendation\n- unordered item — fix this\n",
+  constraints: [no_em_dash, numbered_lists_only]
+)
+puts audit.clean?
+puts audit.messages
+```
+Use `output_policy` when a semantic judge is worth the extra model call. The
+policy can be a `.md`, `.markdown`, or `.txt` file path, a `TurnKit::Skill`, a
+`TurnKit::OutputPolicy`, or any object that responds to `#call` or `#check`.
+```ruby
+workflow = TurnKit::Workflow.new(
+  name: "memo_writer",
+  output_policy: "app/ai/policies/amazon_memo.md",
+  output_policy_model: "gpt-4.1-mini",
+  output_policy_thinking: { effort: :low },
+  output_policy_mode: :report
+)
+```
+`output_policy_mode: :report` records violations while allowing the run to
+complete. `:fail` marks the run failed after recording the output and audit;
+`:fail` is the default for contract-driven workflows. Policy model usage and
+cost are counted on the parent run.
+Add `output_retries:` to turn policy failures into bounded revision loops instead
+of dead ends:
+```ruby
+voice = TurnKit::Skill.from_file("app/ai/skills/memo_voice.md")
+workflow = TurnKit::Workflow.new(
+  name: "memo_writer",
+  skills: [voice],
+  output_policy: [voice, no_em_dash],
+  output_retries: 2,
+  input_schema: {
+    "type" => "object",
+    "required" => ["project_id"],
+    "properties" => { "project_id" => { "type" => "string" } }
+  }
+)
+```
+`skills:` are always loaded into the prompt. `available_skills:` are listed in
+`<skills_available>` and exposed through the `load_skill` tool, so the model can
+load full instructions on demand. Every advertised tool call receives exactly one
+tool result, including validation errors, budget denials, and calls skipped after
+a terminal tool ends the turn.
 ### Prompt Preview
 Preview a pending turn:
@@ -326,9 +430,7 @@ class SaveReport < TurnKit::Tool
   parameter :title, :string, required: true
   parameter :body, :string, required: true
-  def self.ends_turn? = true
-  def self.completion_message(result)
+  terminal! do |result|
     "Saved #{result.fetch("report_id")}."
   end
@@ -544,9 +646,12 @@ TurnKit.reconcile_stale!
 | `TurnKit.max_iterations` | Limit model loop iterations. |
 | `TurnKit.max_depth` | Limit sub-agent depth. |
 | `TurnKit.max_tool_executions` | Limit tool calls per turn. |
+| `TurnKit.max_tool_executions_by_name` | Limit specific tools independently. |
 | `TurnKit.timeout` | Limit turn runtime. |
 | `TurnKit.max_spend` | Limit estimated turn cost. |
 | `TurnKit.compaction` | Configure context compaction. |
+| `TurnKit.output_policy_model` | Default model for file-backed output policies. |
+| `TurnKit.output_policy_thinking` | Default thinking config for file-backed output policies. |
 | `TurnKit.on_event` | Subscribe to lifecycle events. |
 Set options globally:
@@ -555,11 +660,12 @@ Set options globally:
 TurnKit.default_model = "gpt-4.1-mini"
 TurnKit.max_spend = 0.25
 TurnKit.max_iterations = 25
+TurnKit.max_tool_executions_by_name = { web_search: 2 }
+TurnKit.output_policy_model = "gpt-4.1-mini"
 TurnKit.timeout = 300
 ```
-`TurnKit.cost_limit` remains supported as the internal/legacy name for
-`max_spend`.
+`max_spend` is the only spend-limit name in the public API.
 Set options per agent:

data/UPGRADE.md CHANGED Viewed

@@ -1,313 +1,51 @@
 # Upgrade Guide
-This guide covers migrating to the workflow-based task-runtime API. The
-recommended migration is about making the three work shapes easier to read:
+## 0.3.0 is a clean break
-- conversations for durable multi-turn threads;
-- runs for one non-interactive application task;
-- workflows for reusable task runners with tools, skills, limits, and policy.
+TurnKit 0.3.0 intentionally removes the short-lived legacy names from the 0.2
+series. The gem is pre-1.0 and the durable transcript schema changed, so migrate
+by updating call sites and reinstalling the generated tables for new projects.
-## Quick summary
+### Renames
-Before changing call sites, bump TurnKit to the latest version and run your
-test suite against the new release.
+- `TurnKit.cost_limit` → `TurnKit.max_spend`
+- `Agent.new(cost_limit:)` → `Agent.new(max_spend:)`
+- `Workflow.new(cost_limit:)` / `workflow.run(cost_limit:)` → `max_spend:`
+- `output_audit:` → `output_policy:`
+- `output_audit_mode:` → `output_policy_mode:`
+- `run.output_audit` → `run.policy_audit`
+- `run.output_audit_clean?` → `run.policy_clean?`
+- `TurnKit.audit_output(...)` → `TurnKit.check_output_policy(...)`
-```ruby
-# Gemfile
-gem "turnkit", "~> 0.2.9"
-```
+The audit result class remains `TurnKit::OutputAudit`; only the public option and
+run-accessor names changed.
-```sh
-bundle update turnkit
-```
+### Message schema
-Use workflows for reusable autonomous task runners.
+`turnkit_messages.text` was removed. Message `content` is now the canonical
+ordered array of parts:
-Recommended new forms:
+- `text`
+- `thinking`
+- `tool_call`
+- `tool_result`
+- opaque provider parts
-```ruby
-TurnKit.configure do |config|
-  config.model = "gpt-5.2"
-  config.max_spend = 0.25
-end
+`Message#text` is derived from text parts. New Rails installs should regenerate
+the install migration; there is no compatibility shim for older schemas.
-workflow = TurnKit::Workflow.new(name: "brief_writer", tools: [WebSearch, SaveBrief])
-run = workflow.run("Create a source-grounded brief.", input: { topic: "Rails 8" })
+### Workflows
-puts run.output
-```
+`TurnKit::Workflow` now forwards options directly to `Agent`. Use
+`workflow.options[:name]` or `workflow.agent` for inspection instead of per-option
+workflow attr readers. Workflow `instructions:` compose with the orchestrator
+preamble by default; pass `preamble: false` to opt out.
-## Configuration
+### Skills and policy loops
-### Model name
-Before:
-```ruby
-TurnKit.default_model = "gpt-5.2"
-```
-After:
-```ruby
-TurnKit.model = "gpt-5.2"
-```
-`TurnKit.default_model` remains supported. `TurnKit.model` is the shorter public
-alias for app code and initializers.
-### Global setup
-Before:
-```ruby
-TurnKit.default_model = "gpt-5.2"
-TurnKit.cost_limit = 0.25
-TurnKit.max_iterations = 12
-```
-After:
-```ruby
-TurnKit.configure do |config|
-  config.model = "gpt-5.2"
-  config.max_spend = 0.25
-  config.max_iterations = 12
-end
-```
-`TurnKit.configure` simply yields the `TurnKit` module. There is no separate
-configuration object or DSL.
-### Spend limit naming
-Before:
-```ruby
-TurnKit.cost_limit = 0.25
-```
-After:
-```ruby
-TurnKit.max_spend = 0.25
-```
-`cost_limit` remains supported. Prefer `max_spend` in application-facing code
-because it matches how developers think about autonomous runs.
-## Running application tasks
-### Agent tasks
-Before:
-```ruby
-run = agent.run(task: "Classify this lead.", input: lead.attributes)
-puts run.output_text
-```
-After:
-```ruby
-run = agent.run("Classify this lead.", input: lead.attributes)
-puts run.output
-```
-The keyword form still works. The positional string is the recommended form for
-the common case. `Agent#run` uses task prompt behavior by default; pass
-`prompt_mode: :full` if you need conversation-style prompt behavior for a run.
-### Pending runs
-No behavior change.
-```ruby
-run = agent.run("Classify later.", async: true)
-request = run.preview
-run.run!
-```
-The existing keyword form remains valid:
-```ruby
-run = agent.run(task: "Classify later.", async: true)
-```
-## Workflows
-The preferred name for reusable autonomous task runtimes is now workflow. A
-workflow packages:
-- one task-mode orchestrator
-- workflow skills
-- tools
-- guardrails
-- compaction
-- optional persistence/action tools
-### Construction
-```ruby
-workflow = TurnKit::Workflow.new(
-  name: "sales_enrichment",
-  tools: [AccountLookup, WebSearch, SaveEnrichment],
-  skills: [sales_research_skill],
-  max_spend: 0.25
-)
-```
-### Running
-```ruby
-run = workflow.run(
-  "Enrich this account for responsible outreach.",
-  input: account.attributes
-)
-```
-`task:` remains supported.
-## Run inspection
-New convenience methods were added to `TurnKit::Run`.
-Before:
-```ruby
-run.output_text
-run.tool_executions
-run.turn_records.length
-TurnKit.store.load_turn(run.id)["error"]
-```
-After:
-```ruby
-run.output
-run.tool_calls
-run.steps
-run.error
-```
-Old methods remain available. Prefer the shorter methods in application code,
-examples, and docs.
-## Save/action tools
-Use `terminal!` for tools that complete the run by saving an artifact or taking
-the final action.
-Before:
-```ruby
-class SaveBrief < TurnKit::Tool
-  def self.ends_turn? = true
-  def self.completion_message(result) = "Saved #{result.fetch("id")}."
-  def call(title:, body:, context:)
-    { "id" => Brief.create!(title: title, body: body).id }
-  end
-end
-```
-After:
-```ruby
-class SaveBrief < TurnKit::Tool
-  terminal! { |result| "Saved #{result.fetch("id")}." }
-  def call(title:, body:, context:)
-    { "id" => Brief.create!(title: title, body: body).id }
-  end
-end
-```
-The old `ends_turn?` and `completion_message` methods remain supported. Prefer
-`terminal!` for readability.
-## Tool instances
-If a tool needs constructor arguments, register an instance instead of a class.
-Before, this may have failed at runtime:
-```ruby
-class WebSearch < TurnKit::Tool
-  def initialize(client:)
-    @client = client
-  end
-end
-agent = TurnKit::Agent.new(tools: [WebSearch])
-```
-After:
-```ruby
-client = SearchClient.new(api_key: ENV.fetch("SEARCH_API_KEY"))
-agent = TurnKit::Agent.new(tools: [WebSearch.new(client: client)])
-```
-This is the recommended pattern for API clients, test doubles, and per-tenant
-dependencies.
-## Multi-agent workflows
-If you previously modeled every role as a separate agent, consider migrating the
-default path to one workflow with a workflow skill.
-Before:
-```ruby
-researcher = TurnKit::Agent.new(name: "researcher", tools: [WebSearch])
-writer = TurnKit::Agent.new(name: "writer")
-verifier = TurnKit::Agent.new(name: "verifier")
-orchestrator = TurnKit::Agent.new(
-  name: "orchestrator",
-  sub_agents: [researcher, writer, verifier]
-)
-```
-After:
-```ruby
-workflow = TurnKit::Skill.new(
-  key: "source_grounded_brief",
-  name: "Source Grounded Brief",
-  content: <<~TEXT
-    Research first. Build an evidence pack. Draft only from evidence. Verify
-    important claims. Revise unsupported claims before final output.
-  TEXT
-)
-source_brief = TurnKit::Workflow.new(
-  name: "source_brief",
-  skills: [workflow],
-  tools: [WebSearch, ReadWebPage, SaveBrief],
-  max_spend: 0.25,
-  max_tool_executions: 20
-)
-```
-Keep separate agents when the isolation is worth the extra model calls:
-- different models
-- different tool permissions
-- adversarial review
-- parallel specialist research
-- separate durable child conversations
-## Suggested migration order
-1. Replace `TurnKit.default_model =` with `TurnKit.model =` in app-level config.
-2. Wrap global settings in `TurnKit.configure` if you have more than one.
-3. Use `TurnKit::Workflow.new(name: "...")` for reusable autonomous task runners.
-4. Replace `run(task: "...")` with `run("...")` where it improves readability.
-5. Replace `run.output_text` with `run.output` in application code.
-6. Replace save/action tool overrides with `terminal!` when convenient.
-7. Consider collapsing role-agent workflows into one workflow plus workflow skills if
-   cost or complexity is a concern.
-Run your test suite after migrating call sites.
+- `available_skills:` now exposes a real `load_skill` tool.
+- `output_policy:` accepts `TurnKit::Skill` instances.
+- `output_retries:` controls bounded revision loops. The default policy mode is
+  now `:fail`; use `output_policy_mode: :report` if dirty output should complete.
+- `input_schema:` validates application input before any conversation or turn is
+  created.

data/lib/turnkit/adapters/ruby_llm.rb CHANGED Viewed

@@ -182,6 +182,7 @@ module TurnKit
           )
           Result.new(
             text: response_text(response),
+            parts: response_parts(response, tool_calls: tool_calls),
             output_data: response_data(response),
             tool_calls: tool_calls,
             usage: usage,
@@ -189,6 +190,34 @@ module TurnKit
           )
         end
+        def response_parts(response, tool_calls:)
+          content = response.respond_to?(:content) ? response.content : response
+          parts = case content
+          when Array
+            content.map { |part| normalize_provider_part(part) }
+          when Hash
+            [ { "type" => "text", "text" => content.to_json } ]
+          else
+            text = content.to_s
+            text.empty? ? [] : [ { "type" => "text", "text" => text } ]
+          end.compact
+          parts + Array(tool_calls).map { |call| { "type" => "tool_call", "id" => call.id, "name" => call.name, "arguments" => call.arguments } }
+        end
+        def normalize_provider_part(part)
+          attrs = part.respond_to?(:to_h) ? part.to_h.transform_keys(&:to_s) : nil
+          return { "type" => "text", "text" => part.to_s } unless attrs
+          case attrs["type"].to_s
+          when "text", "output_text"
+            { "type" => "text", "text" => attrs["text"] || attrs["content"].to_s }
+          when "thinking", "reasoning"
+            { "type" => "thinking", "text" => attrs["text"] || attrs["content"].to_s, "signature" => attrs["signature"], "redacted" => attrs["redacted"] || false }.compact
+          else
+            { "type" => "provider", "kind" => attrs["type"].to_s, "data" => attrs }
+          end
+        end
         def response_text(response)
           content = response.respond_to?(:content) ? response.content : response
           content.is_a?(Hash) || content.is_a?(Array) ? content.to_json : content.to_s