turnkit 0.2.9 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: a7069b120432ec902d846961157f5635c946602a8298ed4471f09dde3e3e3e0d
4
- data.tar.gz: '09a5d64ff294f89ebde99a6cf1d36dc8731c6cabbf06216d4e9b9551cbe88a1e'
3
+ metadata.gz: b5694bc97b2f735e5076574e2863ee5addc41926bd85edf02e1835263ffb3516
4
+ data.tar.gz: 65286330a1d0b4bbd0e3e6c11ba73abd836fb22a44ae4b3ab48a58ecf9d19425
5
5
  SHA512:
6
- metadata.gz: de794838f5979194aa2469890848eb7cd60932d6f223e95d17be4d8912a6f2777afb55143f9776d7093be2072451c4a7ba0aa83ca8783c82a29375da56a11c90
7
- data.tar.gz: c037fb4946a252ebf9bb2e0f99b76cca23d60f29275ce4e07a15f71232d4fdc0dce23337ad1b4b47bacd7df50ca7eedd3cf050c82167bcccb30debaa70cdfe22
6
+ metadata.gz: 2b3674abf0cae37286a04431f0ceb02a30e282c715e4d6d96e51c0a08d600c94a9fee6c82bf178c0b97ff080ee221b00a18b5409e72003d92c7a5430b34d5733
7
+ data.tar.gz: 7141f5cc00df42bfaf0e9b035d75f54e0b7c9b14ff71a8c95805242b32835fb410358f7f42ff2161f89896425b62fd840c05dcc2504f555430450517dc61bf9b
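To check a downloaded release against these values, hash the archive members yourself. Below is a minimal sketch that uses Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical, and the sketch assumes you have already extracted `data.tar.gz` (or `metadata.gz`) from the `.gem` tar archive.

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA256 hex digest matches the
# value recorded under SHA256 in checksums.yaml (case-insensitive).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Example, assuming data.tar.gz was extracted from the 0.3.0 .gem archive:
# sha256_matches?("data.tar.gz",
#   "65286330a1d0b4bbd0e3e6c11ba73abd836fb22a44ae4b3ab48a58ecf9d19425")
```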
data/CHANGELOG.md CHANGED
@@ -1,5 +1,22 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.3.0 - 2026-06-10
4
+
5
+ - Make the task-runtime API skills-first and intentionally breaking: `max_spend` is the only spend-limit name and output validation is exposed as `output_policy` / `policy_audit`.
6
+ - Store message content as ordered typed parts, with text derived from content and tool calls/results persisted in the transcript instead of metadata.
7
+ - Add `load_skill` for progressively disclosed available skills.
8
+ - Add output-policy revision loops with `output_retries`, including skill/policy rehydration in revision prompts.
9
+ - Add deterministic `input_schema` validation before turns are created.
10
+ - Ensure terminal tools never orphan sibling tool calls; skipped siblings receive cancelled executions and tool-result messages.
11
+ - Add turn claiming, tool-runner heartbeats, persisted budget resume, and sub-agent failure details.
12
+
13
+ ## 0.2.10 - 2026-06-10
14
+
15
+ - Add output audits and file-backed output policies for validating final run output.
16
+ - Add per-tool execution limits and explicit budget errors.
17
+ - Improve workflow event callbacks, model telemetry events, and compaction usage accounting.
18
+ - Add an Amazon memo writer example and batched page reading in the workflow researcher example.
19
+
3
20
  ## 0.2.9 - 2026-06-08
4
21
 
5
22
  - Add `TurnKit::Workflow` for reusable single-orchestrator task runtimes with workflow skills, tools, guardrails, compaction, and run monitoring.
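The `output_retries` revision loop from the 0.3.0 entry can be pictured as a generate, check, and revise cycle with a retry cap. This is a sketch, not TurnKit's implementation. `run_with_revisions` and `RevisionResult` are hypothetical names. The sketch assumes each check returns `nil` on success or a violation hash, following the convention of the README's audit lambdas. It also reads `output_retries` as the number of extra attempts allowed after the first.

```ruby
# Hypothetical result object for the sketch; not a TurnKit class.
RevisionResult = Struct.new(:output, :violations, :attempts, keyword_init: true) do
  def clean?
    violations.empty?
  end
end

# Generate a draft, run every check, and pass the violation messages back as
# feedback until the output is clean or the retry budget is spent.
def run_with_revisions(checks:, output_retries:)
  feedback = nil
  attempts = 0
  loop do
    attempts += 1
    output = yield(feedback)
    violations = checks.filter_map { |check| check.call(output) }
    if violations.empty? || attempts > output_retries
      return RevisionResult.new(output: output, violations: violations, attempts: attempts)
    end
    feedback = violations.map { |violation| violation[:message] }.join("; ")
  end
end

# Usage: the block stands in for a model call that receives revision feedback.
no_em_dash = ->(output) do
  { rule: "no_em_dash", message: "contains an em dash" } if output.include?("\u2014")
end

drafts = ["Fast \u2014 and cheap", "Fast and cheap"]
result = run_with_revisions(checks: [no_em_dash], output_retries: 2) { |_feedback| drafts.shift }
result.clean?   # => true
result.attempts # => 2
```

Once the cap is reached, the last output is returned together with its violations, so the caller can still decide between reporting and failing.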
data/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE.md)
6
6
 
7
7
  Build durable Ruby and Rails agents with conversations, runs, workflows, tools,
8
- skills, sub-agents, and persistence.
8
+ skills, output audits, sub-agents, and persistence.
9
9
 
10
10
  ## Installation
11
11
 
@@ -65,6 +65,11 @@ For runnable, API-key-free examples of the three core entry points, see
65
65
  - agent run: one bounded application task;
66
66
  - workflow: reusable task runner with skills, tools, and limits.
67
67
 
68
+ For fuller workflow examples, see:
69
+
70
+ - [`examples/workflow_researcher`](examples/workflow_researcher): source-grounded research with web tools, batch reads, per-tool limits, and deep monitoring;
71
+ - [`examples/amazon_memo_writer`](examples/amazon_memo_writer): strict memo generation with research tools, a structured terminal submit tool, deterministic format checks, and an LLM output policy.
72
+
68
73
  ### Models
69
74
 
70
75
  Set a model:
@@ -208,6 +213,10 @@ workflow = TurnKit::Workflow.new(
208
213
  max_spend: 0.25,
209
214
  max_iterations: 12,
210
215
  max_tool_executions: 25,
216
+ max_tool_executions_by_name: {
217
+ web_search: 2,
218
+ read_web_page: 8
219
+ },
211
220
  compaction: {
212
221
  context_limit: 64_000,
213
222
  threshold: 0.75
@@ -253,6 +262,10 @@ Prompt caching and compaction solve different problems:
253
262
  - budgets (`max_spend`, `max_iterations`, `max_tool_executions`) keep autonomous
254
263
  loops bounded.
255
264
 
265
+ Use `max_tool_executions_by_name` when a workflow needs different budgets for
266
+ different tools. For example, allow many cheap reads but only one call to the
267
+ final submit tool, or cap web searches while allowing more calls to a batch page reader.
268
+
256
269
  Reach for separate agents and `sub_agents` only when the isolation is worth the
257
270
  extra model calls, such as different models, different tool permissions,
258
271
  parallel specialist review, or separate durable child conversations.
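Per-tool limits sit on top of the global `max_tool_executions` cap. Below is a minimal sketch of that bookkeeping. It assumes a hypothetical `ToolBudget` class that raises a hypothetical `ToolBudgetExceeded` error before a call would exceed either limit. TurnKit itself reports budget denials back to the model as tool results, and its internals may differ.

```ruby
# Hypothetical error raised when a call would exceed a budget.
class ToolBudgetExceeded < StandardError; end

# Tracks a global execution cap plus optional per-tool caps.
class ToolBudget
  def initialize(max_total:, max_by_name: {})
    @max_total = max_total
    @max_by_name = max_by_name.transform_keys(&:to_s)
    @counts = Hash.new(0)
    @total = 0
  end

  # Records one execution of +name+, raising before any limit is exceeded.
  def charge!(name)
    name = name.to_s
    raise ToolBudgetExceeded, "max_tool_executions (#{@max_total}) reached" if @total >= @max_total

    limit = @max_by_name[name]
    raise ToolBudgetExceeded, "#{name} limit (#{limit}) reached" if limit && @counts[name] >= limit

    @counts[name] += 1
    @total += 1
  end

  # Calls still allowed for +name+: the tighter of its own cap and the global cap.
  def remaining(name)
    global = @max_total - @total
    limit = @max_by_name[name.to_s]
    limit ? [limit - @counts[name.to_s], global].min : global
  end
end

budget = ToolBudget.new(max_total: 25, max_by_name: { web_search: 2, read_web_page: 8 })
2.times { budget.charge!(:web_search) }
budget.remaining(:web_search)    # => 0
budget.remaining(:read_web_page) # => 8
# budget.charge!(:web_search) would now raise ToolBudgetExceeded.
```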
@@ -289,6 +302,97 @@ class SaveBrief < TurnKit::Tool
289
302
  end
290
303
  ```
291
304
 
305
+ ### Output audits and policies
306
+
307
+ Use output audits for deterministic checks that should not depend on another
308
+ model call: required headings, source counts, forbidden characters, JSON shape,
309
+ or project-specific formatting rules.
310
+
311
+ ```ruby
312
+ no_em_dash = ->(output) do
313
+ next unless output.include?("—")
314
+
315
+ { rule: "no_em_dash", message: "contains an em dash" }
316
+ end
317
+
318
+ numbered_lists_only = ->(output) do
319
+ lines = output.lines.each_with_index.filter_map do |line, index|
320
+ index + 1 if line.match?(/^\s*[-*]\s+/)
321
+ end
322
+
323
+ next if lines.empty?
324
+
325
+ {
326
+ rule: "numbered_lists_only",
327
+ message: "contains unordered list markers",
328
+ metadata: { lines: lines }
329
+ }
330
+ end
331
+
332
+ workflow = TurnKit::Workflow.new(
333
+ name: "memo_writer",
334
+ output_policy: [no_em_dash, numbered_lists_only],
335
+ output_policy_mode: :fail
336
+ )
337
+ ```
338
+
339
+ Run checks directly when you want to test a renderer or policy without calling a
340
+ model:
341
+
342
+ ```ruby
343
+ audit = TurnKit.check_output_policy(
344
+ "1. Recommendation\n- unordered item — fix this\n",
345
+ constraints: [no_em_dash, numbered_lists_only]
346
+ )
347
+
348
+ puts audit.clean?
349
+ puts audit.messages
350
+ ```
351
+
352
+ Use `output_policy` when a semantic judge is worth the extra model call. The
353
+ policy can be a `.md`, `.markdown`, or `.txt` file path, a `TurnKit::Skill`, a
354
+ `TurnKit::OutputPolicy`, or any object that responds to `#call` or `#check`.
355
+
356
+ ```ruby
357
+ workflow = TurnKit::Workflow.new(
358
+ name: "memo_writer",
359
+ output_policy: "app/ai/policies/amazon_memo.md",
360
+ output_policy_model: "gpt-4.1-mini",
361
+ output_policy_thinking: { effort: :low },
362
+ output_policy_mode: :report
363
+ )
364
+ ```
365
+
366
+ `output_policy_mode: :report` records violations while allowing the run to
367
+ complete. `:fail` marks the run failed after recording the output and audit;
368
+ `:fail` is the default for contract-driven workflows. Policy model usage and
369
+ cost are counted on the parent run.
370
+
371
+ Add `output_retries:` to turn policy failures into bounded revision loops instead
372
+ of dead ends:
373
+
374
+ ```ruby
375
+ voice = TurnKit::Skill.from_file("app/ai/skills/memo_voice.md")
376
+
377
+ workflow = TurnKit::Workflow.new(
378
+ name: "memo_writer",
379
+ skills: [voice],
380
+ output_policy: [voice, no_em_dash],
381
+ output_retries: 2,
382
+ input_schema: {
383
+ "type" => "object",
384
+ "required" => ["project_id"],
385
+ "properties" => { "project_id" => { "type" => "string" } }
386
+ }
387
+ )
388
+ ```
389
+
390
+ `skills:` are always loaded into the prompt. `available_skills:` are listed in
391
+ `<skills_available>` and exposed through the `load_skill` tool, so the model can
392
+ load full instructions on demand. Every advertised tool call receives exactly one
393
+ tool result, including validation errors, budget denials, and calls skipped after
394
+ a terminal tool ends the turn.
395
+
292
396
  ### Prompt Preview
293
397
 
294
398
  Preview a pending turn:
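The audit lambdas above follow a simple contract: each constraint returns `nil` when the output passes and a violation hash when it fails. Below is a plain-Ruby sketch of how such constraints can be collected into a result that offers `clean?` and `messages`. `AuditSketch` and `audit_output` are hypothetical names. This is not `TurnKit.check_output_policy` or `TurnKit::OutputAudit`, only an illustration of the same convention.

```ruby
# Hypothetical audit result: a list of violation hashes.
AuditSketch = Struct.new(:violations) do
  def clean?
    violations.empty?
  end

  def messages
    violations.map { |violation| violation[:message] }
  end
end

# Run every constraint; nil means "passed", a hash is recorded as a violation.
def audit_output(output, constraints:)
  AuditSketch.new(constraints.filter_map { |constraint| constraint.call(output) })
end

no_em_dash = ->(output) do
  next unless output.include?("\u2014")

  { rule: "no_em_dash", message: "contains an em dash" }
end

numbered_lists_only = ->(output) do
  lines = output.lines.each_with_index.filter_map do |line, index|
    index + 1 if line.match?(/^\s*[-*]\s+/)
  end
  next if lines.empty?

  { rule: "numbered_lists_only", message: "contains unordered list markers", metadata: { lines: lines } }
end

audit = audit_output("1. Recommendation\n- unordered item \u2014 fix this\n",
                     constraints: [no_em_dash, numbered_lists_only])
audit.clean?   # => false
audit.messages # => ["contains an em dash", "contains unordered list markers"]
```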
@@ -326,9 +430,7 @@ class SaveReport < TurnKit::Tool
326
430
  parameter :title, :string, required: true
327
431
  parameter :body, :string, required: true
328
432
 
329
- def self.ends_turn? = true
330
-
331
- def self.completion_message(result)
433
+ terminal! do |result|
332
434
  "Saved #{result.fetch("report_id")}."
333
435
  end
334
436
 
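A terminal tool such as `SaveReport` ends the turn, yet every tool call must still receive exactly one result. One way to meet both rules is to keep iterating over the remaining sibling calls and emit a cancelled result for each. The sketch below uses the hypothetical names `ToolCall` and `execute_tool_calls` and a simple hash shape for results. TurnKit's real transcript parts may carry different fields.

```ruby
# Hypothetical call record for the sketch.
ToolCall = Struct.new(:id, :name, :arguments, keyword_init: true)

# Execute calls in order. After a terminal tool succeeds, later sibling calls
# are not run, but each still receives one "cancelled" result so no call is
# left without a matching tool result.
def execute_tool_calls(calls, tools:, terminal:)
  turn_ended = false
  calls.map do |call|
    result = { "type" => "tool_result", "tool_call_id" => call.id }
    tool = tools[call.name]
    if turn_ended
      result.merge("status" => "cancelled", "content" => "Skipped: a terminal tool ended the turn.")
    elsif tool.nil?
      result.merge("status" => "error", "content" => "Unknown tool: #{call.name}")
    else
      output = tool.call(call.arguments)
      turn_ended = terminal.include?(call.name)
      result.merge("status" => "ok", "content" => output)
    end
  end
end

tools = {
  "save_report" => ->(args) { "Saved #{args.fetch("title")}." },
  "web_search" => ->(_args) { "3 results" }
}
calls = [
  ToolCall.new(id: "call_1", name: "save_report", arguments: { "title" => "Q3 review" }),
  ToolCall.new(id: "call_2", name: "web_search", arguments: { "query" => "rails 8" })
]
execute_tool_calls(calls, tools: tools, terminal: ["save_report"]).map { |r| r["status"] }
# => ["ok", "cancelled"]
```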
@@ -544,9 +646,12 @@ TurnKit.reconcile_stale!
544
646
  | `TurnKit.max_iterations` | Limit model loop iterations. |
545
647
  | `TurnKit.max_depth` | Limit sub-agent depth. |
546
648
  | `TurnKit.max_tool_executions` | Limit tool calls per turn. |
649
+ | `TurnKit.max_tool_executions_by_name` | Limit specific tools independently. |
547
650
  | `TurnKit.timeout` | Limit turn runtime. |
548
651
  | `TurnKit.max_spend` | Limit estimated turn cost. |
549
652
  | `TurnKit.compaction` | Configure context compaction. |
653
+ | `TurnKit.output_policy_model` | Default model for file-backed output policies. |
654
+ | `TurnKit.output_policy_thinking` | Default thinking config for file-backed output policies. |
550
655
  | `TurnKit.on_event` | Subscribe to lifecycle events. |
551
656
 
552
657
  Set options globally:
@@ -555,11 +660,12 @@ Set options globally:
555
660
  TurnKit.default_model = "gpt-4.1-mini"
556
661
  TurnKit.max_spend = 0.25
557
662
  TurnKit.max_iterations = 25
663
+ TurnKit.max_tool_executions_by_name = { web_search: 2 }
664
+ TurnKit.output_policy_model = "gpt-4.1-mini"
558
665
  TurnKit.timeout = 300
559
666
  ```
560
667
 
561
- `TurnKit.cost_limit` remains supported as the internal/legacy name for
562
- `max_spend`.
668
+ `max_spend` is the only spend-limit name in the public API.
563
669
 
564
670
  Set options per agent:
565
671
 
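The 0.2 names are removed rather than aliased, so option hashes built from config files or environment variables are a likely place for stale keys to survive the upgrade. A small guard can fail fast with a rename hint before the hash reaches TurnKit. It is sketched here with a hypothetical `LEGACY_OPTIONS` table, drawn from the rename list in UPGRADE.md, and a hypothetical `reject_legacy_options!` helper.

```ruby
# Option renames listed in the 0.3.0 upgrade notes (old key => new key).
LEGACY_OPTIONS = {
  cost_limit: :max_spend,
  output_audit: :output_policy,
  output_audit_mode: :output_policy_mode
}.freeze

# Hypothetical guard: return the options unchanged, or raise with rename hints
# if any removed 0.2 key is still present (symbol or string keys).
def reject_legacy_options!(options)
  stale = options.keys.map(&:to_sym) & LEGACY_OPTIONS.keys
  return options if stale.empty?

  hints = stale.map { |key| "#{key}: -> #{LEGACY_OPTIONS.fetch(key)}:" }
  raise ArgumentError, "TurnKit 0.3.0 removed these options; rename #{hints.join(", ")}"
end

options = reject_legacy_options!({ max_spend: 0.25, max_iterations: 12 })
# TurnKit::Workflow.new(name: "memo_writer", **options)
```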
data/UPGRADE.md CHANGED
@@ -1,313 +1,51 @@
1
1
  # Upgrade Guide
2
2
 
3
- This guide covers migrating to the workflow-based task-runtime API. The
4
- recommended migration is about making the three work shapes easier to read:
3
+ ## 0.3.0 is a clean break
5
4
 
6
- - conversations for durable multi-turn threads;
7
- - runs for one non-interactive application task;
8
- - workflows for reusable task runners with tools, skills, limits, and policy.
5
+ TurnKit 0.3.0 intentionally removes the short-lived legacy names from the 0.2
6
+ series. The gem is pre-1.0 and the durable transcript schema changed, so migrate
7
+ by updating call sites and reinstalling the generated tables for new projects.
9
8
 
10
- ## Quick summary
9
+ ### Renames
11
10
 
12
- Before changing call sites, bump TurnKit to the latest version and run your
13
- test suite against the new release.
11
+ - `TurnKit.cost_limit` → `TurnKit.max_spend`
12
+ - `Agent.new(cost_limit:)` → `Agent.new(max_spend:)`
13
+ - `Workflow.new(cost_limit:)` / `workflow.run(cost_limit:)` → `max_spend:`
14
+ - `output_audit:` → `output_policy:`
15
+ - `output_audit_mode:` → `output_policy_mode:`
16
+ - `run.output_audit` → `run.policy_audit`
17
+ - `run.output_audit_clean?` → `run.policy_clean?`
18
+ - `TurnKit.audit_output(...)` → `TurnKit.check_output_policy(...)`
14
19
 
15
- ```ruby
16
- # Gemfile
17
- gem "turnkit", "~> 0.2.9"
18
- ```
20
+ The audit result class remains `TurnKit::OutputAudit`; only the public option and
21
+ run-accessor names changed.
19
22
 
20
- ```sh
21
- bundle update turnkit
22
- ```
23
+ ### Message schema
23
24
 
24
- Use workflows for reusable autonomous task runners.
25
+ `turnkit_messages.text` was removed. Message `content` is now the canonical
26
+ ordered array of parts:
25
27
 
26
- Recommended new forms:
28
+ - `text`
29
+ - `thinking`
30
+ - `tool_call`
31
+ - `tool_result`
32
+ - opaque provider parts
27
33
 
28
- ```ruby
29
- TurnKit.configure do |config|
30
- config.model = "gpt-5.2"
31
- config.max_spend = 0.25
32
- end
34
+ `Message#text` is derived from text parts. New Rails installs should regenerate
35
+ the install migration; there is no compatibility shim for older schemas.
33
36
 
34
- workflow = TurnKit::Workflow.new(name: "brief_writer", tools: [WebSearch, SaveBrief])
35
- run = workflow.run("Create a source-grounded brief.", input: { topic: "Rails 8" })
37
+ ### Workflows
36
38
 
37
- puts run.output
38
- ```
39
+ `TurnKit::Workflow` now forwards options directly to `Agent`. Use
40
+ `workflow.options[:name]` or `workflow.agent` for inspection instead of per-option
41
+ workflow attr readers. Workflow `instructions:` compose with the orchestrator
42
+ preamble by default; pass `preamble: false` to opt out.
39
43
 
40
- ## Configuration
44
+ ### Skills and policy loops
41
45
 
42
- ### Model name
43
-
44
- Before:
45
-
46
- ```ruby
47
- TurnKit.default_model = "gpt-5.2"
48
- ```
49
-
50
- After:
51
-
52
- ```ruby
53
- TurnKit.model = "gpt-5.2"
54
- ```
55
-
56
- `TurnKit.default_model` remains supported. `TurnKit.model` is the shorter public
57
- alias for app code and initializers.
58
-
59
- ### Global setup
60
-
61
- Before:
62
-
63
- ```ruby
64
- TurnKit.default_model = "gpt-5.2"
65
- TurnKit.cost_limit = 0.25
66
- TurnKit.max_iterations = 12
67
- ```
68
-
69
- After:
70
-
71
- ```ruby
72
- TurnKit.configure do |config|
73
- config.model = "gpt-5.2"
74
- config.max_spend = 0.25
75
- config.max_iterations = 12
76
- end
77
- ```
78
-
79
- `TurnKit.configure` simply yields the `TurnKit` module. There is no separate
80
- configuration object or DSL.
81
-
82
- ### Spend limit naming
83
-
84
- Before:
85
-
86
- ```ruby
87
- TurnKit.cost_limit = 0.25
88
- ```
89
-
90
- After:
91
-
92
- ```ruby
93
- TurnKit.max_spend = 0.25
94
- ```
95
-
96
- `cost_limit` remains supported. Prefer `max_spend` in application-facing code
97
- because it matches how developers think about autonomous runs.
98
-
99
- ## Running application tasks
100
-
101
- ### Agent tasks
102
-
103
- Before:
104
-
105
- ```ruby
106
- run = agent.run(task: "Classify this lead.", input: lead.attributes)
107
- puts run.output_text
108
- ```
109
-
110
- After:
111
-
112
- ```ruby
113
- run = agent.run("Classify this lead.", input: lead.attributes)
114
- puts run.output
115
- ```
116
-
117
- The keyword form still works. The positional string is the recommended form for
118
- the common case. `Agent#run` uses task prompt behavior by default; pass
119
- `prompt_mode: :full` if you need conversation-style prompt behavior for a run.
120
-
121
- ### Pending runs
122
-
123
- No behavior change.
124
-
125
- ```ruby
126
- run = agent.run("Classify later.", async: true)
127
- request = run.preview
128
- run.run!
129
- ```
130
-
131
- The existing keyword form remains valid:
132
-
133
- ```ruby
134
- run = agent.run(task: "Classify later.", async: true)
135
- ```
136
-
137
- ## Workflows
138
-
139
- The preferred name for reusable autonomous task runtimes is now workflow. A
140
- workflow packages:
141
-
142
- - one task-mode orchestrator
143
- - workflow skills
144
- - tools
145
- - guardrails
146
- - compaction
147
- - optional persistence/action tools
148
-
149
- ### Construction
150
-
151
- ```ruby
152
- workflow = TurnKit::Workflow.new(
153
- name: "sales_enrichment",
154
- tools: [AccountLookup, WebSearch, SaveEnrichment],
155
- skills: [sales_research_skill],
156
- max_spend: 0.25
157
- )
158
- ```
159
-
160
- ### Running
161
-
162
- ```ruby
163
- run = workflow.run(
164
- "Enrich this account for responsible outreach.",
165
- input: account.attributes
166
- )
167
- ```
168
-
169
- `task:` remains supported.
170
-
171
- ## Run inspection
172
-
173
- New convenience methods were added to `TurnKit::Run`.
174
-
175
- Before:
176
-
177
- ```ruby
178
- run.output_text
179
- run.tool_executions
180
- run.turn_records.length
181
- TurnKit.store.load_turn(run.id)["error"]
182
- ```
183
-
184
- After:
185
-
186
- ```ruby
187
- run.output
188
- run.tool_calls
189
- run.steps
190
- run.error
191
- ```
192
-
193
- Old methods remain available. Prefer the shorter methods in application code,
194
- examples, and docs.
195
-
196
- ## Save/action tools
197
-
198
- Use `terminal!` for tools that complete the run by saving an artifact or taking
199
- the final action.
200
-
201
- Before:
202
-
203
- ```ruby
204
- class SaveBrief < TurnKit::Tool
205
- def self.ends_turn? = true
206
- def self.completion_message(result) = "Saved #{result.fetch("id")}."
207
-
208
- def call(title:, body:, context:)
209
- { "id" => Brief.create!(title: title, body: body).id }
210
- end
211
- end
212
- ```
213
-
214
- After:
215
-
216
- ```ruby
217
- class SaveBrief < TurnKit::Tool
218
- terminal! { |result| "Saved #{result.fetch("id")}." }
219
-
220
- def call(title:, body:, context:)
221
- { "id" => Brief.create!(title: title, body: body).id }
222
- end
223
- end
224
- ```
225
-
226
- The old `ends_turn?` and `completion_message` methods remain supported. Prefer
227
- `terminal!` for readability.
228
-
229
- ## Tool instances
230
-
231
- If a tool needs constructor arguments, register an instance instead of a class.
232
-
233
- Before, this may have failed at runtime:
234
-
235
- ```ruby
236
- class WebSearch < TurnKit::Tool
237
- def initialize(client:)
238
- @client = client
239
- end
240
- end
241
-
242
- agent = TurnKit::Agent.new(tools: [WebSearch])
243
- ```
244
-
245
- After:
246
-
247
- ```ruby
248
- client = SearchClient.new(api_key: ENV.fetch("SEARCH_API_KEY"))
249
- agent = TurnKit::Agent.new(tools: [WebSearch.new(client: client)])
250
- ```
251
-
252
- This is the recommended pattern for API clients, test doubles, and per-tenant
253
- dependencies.
254
-
255
- ## Multi-agent workflows
256
-
257
- If you previously modeled every role as a separate agent, consider migrating the
258
- default path to one workflow with a workflow skill.
259
-
260
- Before:
261
-
262
- ```ruby
263
- researcher = TurnKit::Agent.new(name: "researcher", tools: [WebSearch])
264
- writer = TurnKit::Agent.new(name: "writer")
265
- verifier = TurnKit::Agent.new(name: "verifier")
266
-
267
- orchestrator = TurnKit::Agent.new(
268
- name: "orchestrator",
269
- sub_agents: [researcher, writer, verifier]
270
- )
271
- ```
272
-
273
- After:
274
-
275
- ```ruby
276
- workflow = TurnKit::Skill.new(
277
- key: "source_grounded_brief",
278
- name: "Source Grounded Brief",
279
- content: <<~TEXT
280
- Research first. Build an evidence pack. Draft only from evidence. Verify
281
- important claims. Revise unsupported claims before final output.
282
- TEXT
283
- )
284
-
285
- source_brief = TurnKit::Workflow.new(
286
- name: "source_brief",
287
- skills: [workflow],
288
- tools: [WebSearch, ReadWebPage, SaveBrief],
289
- max_spend: 0.25,
290
- max_tool_executions: 20
291
- )
292
- ```
293
-
294
- Keep separate agents when the isolation is worth the extra model calls:
295
-
296
- - different models
297
- - different tool permissions
298
- - adversarial review
299
- - parallel specialist research
300
- - separate durable child conversations
301
-
302
- ## Suggested migration order
303
-
304
- 1. Replace `TurnKit.default_model =` with `TurnKit.model =` in app-level config.
305
- 2. Wrap global settings in `TurnKit.configure` if you have more than one.
306
- 3. Use `TurnKit::Workflow.new(name: "...")` for reusable autonomous task runners.
307
- 4. Replace `run(task: "...")` with `run("...")` where it improves readability.
308
- 5. Replace `run.output_text` with `run.output` in application code.
309
- 6. Replace save/action tool overrides with `terminal!` when convenient.
310
- 7. Consider collapsing role-agent workflows into one workflow plus workflow skills if
311
- cost or complexity is a concern.
312
-
313
- Run your test suite after migrating call sites.
46
+ - `available_skills:` now exposes a real `load_skill` tool.
47
+ - `output_policy:` accepts `TurnKit::Skill` instances.
48
+ - `output_retries:` controls bounded revision loops. The default policy mode is
49
+ now `:fail`; use `output_policy_mode: :report` if dirty output should complete.
50
+ - `input_schema:` validates application input before any conversation or turn is
51
+ created.
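Ruby's standard library has no JSON Schema validator, so full coverage of the spec would need a dedicated library. The sketch below implements only the keywords that appear in the README's `input_schema` example (`type`, `required`, `properties`). It illustrates collecting input errors before any conversation or turn exists; a caller would raise when the returned list is non-empty. `input_errors` and `TYPE_CHECKS` are hypothetical names, and TurnKit's own validator may support much more of the spec.

```ruby
# JSON Schema primitive type names mapped to Ruby checks (subset only).
TYPE_CHECKS = {
  "object" => ->(v) { v.is_a?(Hash) },
  "array" => ->(v) { v.is_a?(Array) },
  "string" => ->(v) { v.is_a?(String) },
  "integer" => ->(v) { v.is_a?(Integer) },
  "number" => ->(v) { v.is_a?(Numeric) },
  "boolean" => ->(v) { v == true || v == false }
}.freeze

# Returns a list of human-readable errors; an empty list means the input is valid.
def input_errors(input, schema, path = "input")
  type = schema["type"]
  # Unknown type names raise KeyError, which keeps the sketch strict.
  return ["#{path} must be of type #{type}"] if type && !TYPE_CHECKS.fetch(type).call(input)
  return [] unless input.is_a?(Hash)

  data = input.transform_keys(&:to_s)
  errors = Array(schema["required"]).filter_map do |key|
    "#{path}.#{key} is required" unless data.key?(key)
  end
  (schema["properties"] || {}).each do |key, subschema|
    errors.concat(input_errors(data[key], subschema, "#{path}.#{key}")) if data.key?(key)
  end
  errors
end

schema = {
  "type" => "object",
  "required" => ["project_id"],
  "properties" => { "project_id" => { "type" => "string" } }
}
input_errors({ project_id: "p_123" }, schema) # => []
input_errors({ project_id: 42 }, schema)      # => ["input.project_id must be of type string"]
```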
@@ -182,6 +182,7 @@ module TurnKit
182
182
  )
183
183
  Result.new(
184
184
  text: response_text(response),
185
+ parts: response_parts(response, tool_calls: tool_calls),
185
186
  output_data: response_data(response),
186
187
  tool_calls: tool_calls,
187
188
  usage: usage,
@@ -189,6 +190,34 @@ module TurnKit
189
190
  )
190
191
  end
191
192
 
193
+ def response_parts(response, tool_calls:)
194
+ content = response.respond_to?(:content) ? response.content : response
195
+ parts = case content
196
+ when Array
197
+ content.map { |part| normalize_provider_part(part) }
198
+ when Hash
199
+ [ { "type" => "text", "text" => content.to_json } ]
200
+ else
201
+ text = content.to_s
202
+ text.empty? ? [] : [ { "type" => "text", "text" => text } ]
203
+ end.compact
204
+ parts + Array(tool_calls).map { |call| { "type" => "tool_call", "id" => call.id, "name" => call.name, "arguments" => call.arguments } }
205
+ end
206
+
207
+ def normalize_provider_part(part)
208
+ attrs = part.respond_to?(:to_h) ? part.to_h.transform_keys(&:to_s) : nil
209
+ return { "type" => "text", "text" => part.to_s } unless attrs
210
+
211
+ case attrs["type"].to_s
212
+ when "text", "output_text"
213
+ { "type" => "text", "text" => attrs["text"] || attrs["content"].to_s }
214
+ when "thinking", "reasoning"
215
+ { "type" => "thinking", "text" => attrs["text"] || attrs["content"].to_s, "signature" => attrs["signature"], "redacted" => attrs["redacted"] || false }.compact
216
+ else
217
+ { "type" => "provider", "kind" => attrs["type"].to_s, "data" => attrs }
218
+ end
219
+ end
220
+
192
221
  def response_text(response)
193
222
  content = response.respond_to?(:content) ? response.content : response
194
223
  content.is_a?(Hash) || content.is_a?(Array) ? content.to_json : content.to_s
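With `turnkit_messages.text` gone, any text shown to users has to be recovered from the ordered parts. The sketch below copies `normalize_provider_part` from the hunk above so it runs on its own. It pairs that method with a hypothetical `text_from_parts` helper, which concatenates `text` parts in order and skips thinking, tool, and provider parts. Joining with an empty separator is an assumption; `Message#text` may join differently.

```ruby
# Copied from the diff above: map one provider content part to a TurnKit part hash.
def normalize_provider_part(part)
  attrs = part.respond_to?(:to_h) ? part.to_h.transform_keys(&:to_s) : nil
  return { "type" => "text", "text" => part.to_s } unless attrs

  case attrs["type"].to_s
  when "text", "output_text"
    { "type" => "text", "text" => attrs["text"] || attrs["content"].to_s }
  when "thinking", "reasoning"
    { "type" => "thinking", "text" => attrs["text"] || attrs["content"].to_s,
      "signature" => attrs["signature"], "redacted" => attrs["redacted"] || false }.compact
  else
    { "type" => "provider", "kind" => attrs["type"].to_s, "data" => attrs }
  end
end

# Hypothetical helper: derive display text from text parts only, in order.
def text_from_parts(parts)
  parts.select { |part| part["type"] == "text" }.map { |part| part["text"].to_s }.join
end

parts = [
  normalize_provider_part({ type: "reasoning", text: "Outline first.", signature: "sig_1" }),
  normalize_provider_part({ "type" => "output_text", "text" => "Hello, " }),
  normalize_provider_part("world."),
  normalize_provider_part({ type: "web_search_call", id: "ws_1" })
]
parts.map { |part| part["type"] } # => ["thinking", "text", "text", "provider"]
text_from_parts(parts)            # => "Hello, world."
```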