turnkit 0.2.8 → 0.2.10

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 2859e971c248c783c407f498d81c9dc489f89120dc273edb517b19e56ef42111
-   data.tar.gz: 2afce740b36683dd513a47770353b7fb74ed174297e96ab7e17efece82d446a6
+   metadata.gz: 268561a36c656098e1d23ea6de4c17616358ff931e05e1389e707a9e28fe458b
+   data.tar.gz: 8f6731d78fed5b3e3cc94d781c4f4e26accc4f8d05842b5c56eb58a6e7448907
  SHA512:
-   metadata.gz: 3e70a71f00507cad12b7ea9d8f8506d4ae16ddbd7dbfbf3e59808ebdc244392e9dde14cf7ee4ee2d7578351bca0af906bb4ede6a6419e8136cf00706b2115307
-   data.tar.gz: ba707c678fb3dee1211d0e0d2e57eccfe1ac4e15567fb985127602043085693643c70f6f7f5779259f2baf3b437b739d23db3196f500f00ae157a854d6543a44
+   metadata.gz: ae0a246b5937e586c808a25d28f051bafc54c2a922a52d89160eb3f5ef3bf7360b1d637cbb0c170d41eb74cd536638b6f9a1880275bd0ccd2fc8dcb4ac44db5c
+   data.tar.gz: 7ffebcfeadf51f193c7f2277a0842c2f56e00d9ff95d502915924f2a6d7e10744a0a710d1d2f5b1865182a9de21b2cce30edc3e94c16f49626912b93b1fc7063
data/CHANGELOG.md CHANGED
@@ -1,12 +1,19 @@
  # Changelog
 
- ## 0.2.8 - 2026-06-08
+ ## 0.2.10 - 2026-06-10
 
- - Add autonomous task fleets as reusable single-orchestrator runtimes with workflow skills, tools, guardrails, compaction, and run monitoring.
- - Add `Agent#run` and `TurnKit::Run` for non-interactive application tasks.
- - Improve task-runtime DX with `TurnKit.configure`, `TurnKit.model`, `TurnKit.max_spend`, `TurnKit.fleet`, positional `run("task")`, `run.output`, `run.tool_calls`, and `Tool.terminal!`.
+ - Add output audits and file-backed output policies for validating final run output.
+ - Add per-tool execution limits and explicit budget errors.
+ - Improve workflow event callbacks, model telemetry events, and compaction usage accounting.
+ - Add an Amazon memo writer example and batched page reading in the workflow researcher example.
+
+ ## 0.2.9 - 2026-06-08
+
+ - Add `TurnKit::Workflow` for reusable single-orchestrator task runtimes with workflow skills, tools, guardrails, compaction, and run monitoring.
+ - Add `Agent#run` and `TurnKit::Run` for non-interactive application tasks, with task prompt behavior by default.
+ - Improve task-runtime DX with `TurnKit.configure`, `TurnKit.model`, `TurnKit.max_spend`, `TurnKit::Workflow`, positional `run("task")`, `run.output`, `run.tool_calls`, and `Tool.terminal!`.
  - Support tool instances with constructor-injected dependencies.
- - Add a fleet researcher example and upgrade guide.
+ - Add a workflow researcher example and upgrade guide.
 
  ## 0.2.6 - 2026-06-07
 
data/README.md CHANGED
@@ -4,7 +4,8 @@
  [![Ruby](https://img.shields.io/badge/ruby-%3E%3D%203.1-red.svg)](https://www.ruby-lang.org)
  [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE.md)
 
- Build durable Ruby and Rails agents with tools, skills, sub-agents, and persistence.
+ Build durable Ruby and Rails agents with conversations, runs, workflows, tools,
+ skills, output audits, sub-agents, and persistence.
 
  ## Installation
 
@@ -57,6 +58,18 @@ puts run.output
 
  ## Usage
 
+ For runnable, API-key-free examples of the three core entry points, see
+ [`examples/core_api`](examples/core_api):
+
+ - conversation: durable thread over time;
+ - agent run: one bounded application task;
+ - workflow: reusable task runner with skills, tools, and limits.
+
+ For fuller workflow examples, see:
+
+ - [`examples/workflow_researcher`](examples/workflow_researcher): source-grounded research with web tools, batch reads, per-tool limits, and deep monitoring;
+ - [`examples/amazon_memo_writer`](examples/amazon_memo_writer): strict memo generation with research tools, a structured terminal submit tool, deterministic format checks, and an LLM output policy.
+
  ### Models
 
  Set a model:
@@ -92,6 +105,23 @@ Use these common providers:
 
  Expect `TurnKit::ModelAccessError` for obvious key mistakes.
 
+ To run eligible coding tasks against a ChatGPT Plus/Pro Codex subscription instead of provider API-key billing, use the Codex adapter. It shells out to the official `codex exec` CLI, so authenticate Codex first:
+
+ ```sh
+ codex login --device-auth
+ ```
+
+ Then configure TurnKit:
+
+ ```ruby
+ TurnKit.configure do |config|
+   config.client = TurnKit::Adapters::Codex.new(sandbox: "read-only")
+   config.model = "gpt-5.4"
+ end
+ ```
+
+ The Codex adapter does not store ChatGPT tokens or read `~/.codex/auth.json` directly. It reuses Codex CLI auth and records token usage with no TurnKit provider cost, because usage is charged against the user's ChatGPT/Codex plan limits.
+
  ### Conversations
 
  Create a conversation:
@@ -118,10 +148,13 @@ turn = conversation.run!
  puts turn.output_text
  ```
 
- ### Application Tasks
+ ### Runs
+
+ Use `Agent#run` when your application needs one non-interactive result. A run is
+ the AI equivalent of a service object call: one input, one job, one output.
 
- Use `Agent#run` when your application is executing a task instead of chatting
- with a user:
+ Reach for a run when the task is bounded, such as classification, extraction,
+ summarization, routing, scoring, or structured JSON generation.
 
  ```ruby
  agent = TurnKit::Agent.new(
@@ -135,7 +168,6 @@ agent = TurnKit::Agent.new(
      },
      required: ["priority", "reason"]
    },
-   prompt_mode: :task
  )
 
  run = agent.run(
@@ -146,8 +178,10 @@ run = agent.run(
  puts run.output_data
  ```
 
- `Agent#run` is a small wrapper over TurnKit's existing conversation and turn
- engine. Existing `conversation.ask` usage is still supported.
+ `Agent#run` uses task prompt behavior by default: it treats the input as the
+ contract, avoids follow-up questions, and returns the best result it can. It is a
+ small wrapper over TurnKit's existing conversation and turn engine. Existing
+ `conversation.ask` usage is still supported for multi-turn threads.
 
  Prepare a pending run without calling the model:
 
@@ -157,31 +191,39 @@ request = run.preview
  run.run!
  ```
 
- ### Fleets
+ ### Workflows
 
- Use a fleet when you want to package a reusable autonomous workflow: one
- task-mode orchestrator, workflow skills, tools, defaults, and guardrails. A
- fleet is not a requirement for multi-agent work; it is the reusable runtime for
- getting from input to output.
+ Use a workflow when a run graduates into a reusable production capability: a
+ named task runner with workflow skills, tools, defaults, guardrails, compaction,
+ and output policy.
+
+ Workflows earn their keep when the task has a repeatable operating
+ procedure: inspect app data, gather context, use sources, draft, verify, save,
+ and stop under budget. They are overkill for simple classification or extraction
+ runs.
 
  ```ruby
  source_grounded_brief = TurnKit::Skill.from_file("app/ai/skills/source_grounded_brief.md")
 
- fleet = TurnKit.fleet(
-   "brief_writer",
+ workflow = TurnKit::Workflow.new(
+   name: "brief_writer",
    instructions: "Create source-grounded briefs and verify claims before final output.",
    skills: [source_grounded_brief],
    tools: [WebSearch.new, ReadWebPage.new, SaveBrief],
    max_spend: 0.25,
    max_iterations: 12,
    max_tool_executions: 25,
+   max_tool_executions_by_name: {
+     web_search: 2,
+     read_web_page: 8
+   },
    compaction: {
      context_limit: 64_000,
      threshold: 0.75
    }
  )
 
- run = fleet.run(
+ run = workflow.run(
    "Create a source-grounded brief.",
    input: { topic: "Rails 8 Solid Queue" }
  )
@@ -198,11 +240,40 @@ model-tool loop:
  model → tool → result → model → tool → result → final
  ```
 
- `auto_run` is an alias for `run` when you want the name to emphasize autonomous
- execution:
+ For repeated workflows, keep instructions, skills, and tools stable and pass the
+ per-run data through `input:`. This gives provider prompt caching the best chance
+ to reuse the stable workflow prompt while each run supplies dynamic data.
+
+ ### Choosing runs, conversations, and workflows
+
+ Use the smallest entry point that matches the shape of work:
+
+ | Entry point | Use when | Tradeoffs |
+ | --- | --- | --- |
+ | `Conversation` | A user or app will keep adding messages over time. | Best for durable threads and follow-up steering; history grows, so long threads need compaction. |
+ | `Agent#run` | Your app needs one bounded result now. | Best for simple production tasks; repeated complex policies can sprawl across callers. |
+ | `TurnKit::Workflow` | A task becomes a named reusable workflow with tools, skills, limits, and observability. | Best cache and packaging story for repeated autonomous work; overkill for one-off/simple tasks. |
+
+ Prompt caching and compaction solve different problems:
+
+ - prompt caching reduces the cost of repeated stable instructions, tools, and
+   skills;
+ - compaction reduces the cost of long dynamic histories;
+ - budgets (`max_spend`, `max_iterations`, `max_tool_executions`) keep autonomous
+   loops bounded.
+
+ Use `max_tool_executions_by_name` when a workflow needs different budgets for
+ different tools. For example, allow many cheap reads but only one final submit
+ tool, or cap web searches while allowing a batch page reader.
+
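A per-tool budget of the kind described above can be pictured as a counter keyed by tool name, checked before each tool call. The sketch below is plain Ruby and is not TurnKit's implementation: `ToolBudget`, `BudgetExceeded`, `record!`, and `count` are hypothetical names used only for illustration, and the limits mirror the README example.

```ruby
# Hypothetical sketch of per-tool execution limits; not TurnKit's actual code.
class BudgetExceeded < StandardError; end

class ToolBudget
  # total: cap across all tools; by_name: optional cap per tool name.
  def initialize(total:, by_name: {})
    @total = total
    @by_name = by_name.transform_keys(&:to_sym)
    @counts = Hash.new(0)
  end

  # Records one execution, raising before the count would pass a cap.
  def record!(tool_name)
    name = tool_name.to_sym
    limit = @by_name[name]
    if limit && @counts[name] >= limit
      raise BudgetExceeded, "#{name} exceeded its limit of #{limit}"
    end
    if @counts.values.sum >= @total
      raise BudgetExceeded, "total tool executions exceeded #{@total}"
    end

    @counts[name] += 1
  end

  # Number of recorded executions for one tool.
  def count(tool_name) = @counts[tool_name.to_sym]
end

budget = ToolBudget.new(total: 25, by_name: { web_search: 2, read_web_page: 8 })
2.times { budget.record!(:web_search) }

begin
  budget.record!(:web_search) # third search is over the per-tool cap
rescue BudgetExceeded => e
  puts e.message
end
```

Checking the per-tool cap before the total cap means the error names the specific tool that ran out, which is usually the more actionable message.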
+ Reach for separate agents and `sub_agents` only when the isolation is worth the
+ extra model calls, such as different models, different tool permissions,
+ parallel specialist review, or separate durable child conversations.
+
+ Run a workflow with `run`:
 
  ```ruby
- run = fleet.auto_run(
+ run = workflow.run(
    "Create compliant outreach for this account.",
    input: lead.attributes,
    max_spend: 0.25,
@@ -215,10 +286,6 @@ run = fleet.auto_run(
  )
  ```
 
- Reach for separate agents and `sub_agents` only when the isolation is worth the
- extra model calls, such as different models, different tool permissions,
- parallel specialist review, or separate durable child conversations.
-
  Use `terminal!` for save or action tools that complete the run:
 
  ```ruby
@@ -235,6 +302,71 @@ class SaveBrief < TurnKit::Tool
  end
  ```
 
+ ### Output audits and policies
+
+ Use output audits for deterministic checks that should not depend on another
+ model call: required headings, source counts, forbidden characters, JSON shape,
+ or project-specific formatting rules.
+
+ ```ruby
+ no_em_dash = ->(output) do
+   next unless output.include?("—")
+
+   { rule: "no_em_dash", message: "contains an em dash" }
+ end
+
+ numbered_lists_only = ->(output) do
+   lines = output.lines.each_with_index.filter_map do |line, index|
+     index + 1 if line.match?(/^\s*[-*]\s+/)
+   end
+
+   next if lines.empty?
+
+   {
+     rule: "numbered_lists_only",
+     message: "contains unordered list markers",
+     metadata: { lines: lines }
+   }
+ end
+
+ workflow = TurnKit::Workflow.new(
+   name: "memo_writer",
+   output_audit: [no_em_dash, numbered_lists_only],
+   output_audit_mode: :fail
+ )
+ ```
+
+ Run checks directly when you want to test a renderer or policy without calling a
+ model:
+
+ ```ruby
+ audit = TurnKit.audit_output(
+   "1. Recommendation\n- unordered item — fix this\n",
+   constraints: [no_em_dash, numbered_lists_only]
+ )
+
+ puts audit.clean?
+ puts audit.messages
+ ```
+
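The two audit lambdas in the README hunk above are ordinary callables, so their behavior can be checked in plain Ruby before they are wired into a workflow. The sketch below repeats them as constants and collects the non-nil results. `run_audits` is a hypothetical helper, not part of TurnKit, and it makes no claim about how `TurnKit.audit_output` works internally.

```ruby
# Audit callables copied from the README example; each returns nil when clean.
NO_EM_DASH = ->(output) do
  next unless output.include?("—")

  { rule: "no_em_dash", message: "contains an em dash" }
end

NUMBERED_LISTS_ONLY = ->(output) do
  # Collect 1-based line numbers that start with "-" or "*" list markers.
  lines = output.lines.each_with_index.filter_map do |line, index|
    index + 1 if line.match?(/^\s*[-*]\s+/)
  end

  next if lines.empty?

  {
    rule: "numbered_lists_only",
    message: "contains unordered list markers",
    metadata: { lines: lines }
  }
end

# Hypothetical helper: call every audit and keep only the violations.
def run_audits(output, audits)
  audits.filter_map { |audit| audit.call(output) }
end

sample = "1. Recommendation\n- unordered item — fix this\n"
run_audits(sample, [NO_EM_DASH, NUMBERED_LISTS_ONLY]).each do |violation|
  puts "#{violation[:rule]}: #{violation[:message]}"
end
```

Returning `nil` for a clean result and a Hash for a violation keeps each check independent, so a failing rule can be unit-tested without a model call.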
+ Use `output_policy` when a semantic judge is worth the extra model call. The
+ policy can be a `.md`, `.markdown`, or `.txt` file path, a `TurnKit::OutputPolicy`,
+ or any object that responds to `#call` or `#check`.
+
+ ```ruby
+ workflow = TurnKit::Workflow.new(
+   name: "memo_writer",
+   output_policy: "app/ai/policies/amazon_memo.md",
+   output_policy_model: "gpt-4.1-mini",
+   output_policy_thinking: { effort: :low },
+   output_policy_mode: :report
+ )
+ ```
+
+ `output_policy_mode: :report` records violations while allowing the run to
+ complete. `:fail` marks the run failed after recording the output and audit.
+ Policy model usage and cost are counted on the parent run.
+
  ### Prompt Preview
 
  Preview a pending turn:
@@ -272,9 +404,7 @@ class SaveReport < TurnKit::Tool
    parameter :title, :string, required: true
    parameter :body, :string, required: true
 
-   def self.ends_turn? = true
-
-   def self.completion_message(result)
+   terminal! do |result|
      "Saved #{result.fetch("report_id")}."
    end
 
@@ -490,19 +620,28 @@ TurnKit.reconcile_stale!
  | `TurnKit.max_iterations` | Limit model loop iterations. |
  | `TurnKit.max_depth` | Limit sub-agent depth. |
  | `TurnKit.max_tool_executions` | Limit tool calls per turn. |
+ | `TurnKit.max_tool_executions_by_name` | Limit specific tools independently. |
  | `TurnKit.timeout` | Limit turn runtime. |
- | `TurnKit.cost_limit` | Limit estimated turn cost. |
+ | `TurnKit.max_spend` | Limit estimated turn cost. |
  | `TurnKit.compaction` | Configure context compaction. |
+ | `TurnKit.output_policy_model` | Default model for file-backed output policies. |
+ | `TurnKit.output_policy_thinking` | Default thinking config for file-backed output policies. |
  | `TurnKit.on_event` | Subscribe to lifecycle events. |
 
  Set options globally:
 
  ```ruby
  TurnKit.default_model = "gpt-4.1-mini"
+ TurnKit.max_spend = 0.25
  TurnKit.max_iterations = 25
+ TurnKit.max_tool_executions_by_name = { web_search: 2 }
+ TurnKit.output_policy_model = "gpt-4.1-mini"
  TurnKit.timeout = 300
  ```
 
+ `TurnKit.cost_limit` remains supported as the internal/legacy name for
+ `max_spend`.
+
  Set options per agent:
 
  ```ruby
data/UPGRADE.md CHANGED
@@ -1,13 +1,27 @@
  # Upgrade Guide
 
- This guide covers migrating to the newer task-runtime API. The changes are
- mostly additive: existing `Agent`, `Conversation`, `Tool`, and `Fleet` code
- should continue to work. The recommended migration is about improving developer
- experience and making autonomous workflows easier to read.
+ This guide covers migrating to the workflow-based task-runtime API. The
+ recommended migration is about making the three work shapes easier to read:
+
+ - conversations for durable multi-turn threads;
+ - runs for one non-interactive application task;
+ - workflows for reusable task runners with tools, skills, limits, and policy.
 
  ## Quick summary
 
- You do **not** need to rewrite existing code immediately.
+ Before changing call sites, bump TurnKit to the latest version and run your
+ test suite against the new release.
+
+ ```ruby
+ # Gemfile
+ gem "turnkit", "~> 0.2.9"
+ ```
+
+ ```sh
+ bundle update turnkit
+ ```
+
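The `~> 0.2.9` constraint in the Gemfile snippet above is RubyGems' pessimistic operator: it accepts 0.2.9 and later patch releases but stops before 0.3.0, so it also resolves to 0.2.10. A quick check with the RubyGems API that ships with Ruby (`satisfies?` is a small helper defined here for illustration):

```ruby
require "rubygems"

# Returns true when `version` falls inside the given constraint.
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[0.2.8 0.2.9 0.2.10 0.3.0].each do |version|
  puts "#{version}: #{satisfies?('~> 0.2.9', version)}"
end
# 0.2.8: false
# 0.2.9: true
# 0.2.10: true
# 0.3.0: false
```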
+ Use workflows for reusable autonomous task runners.
 
  Recommended new forms:
 
@@ -17,23 +31,12 @@ TurnKit.configure do |config|
    config.max_spend = 0.25
  end
 
- fleet = TurnKit.fleet("brief_writer", tools: [WebSearch, SaveBrief])
- run = fleet.run("Create a source-grounded brief.", input: { topic: "Rails 8" })
+ workflow = TurnKit::Workflow.new(name: "brief_writer", tools: [WebSearch, SaveBrief])
+ run = workflow.run("Create a source-grounded brief.", input: { topic: "Rails 8" })
 
  puts run.output
  ```
 
- Old forms still work:
-
- ```ruby
- TurnKit.default_model = "gpt-5.2"
-
- fleet = TurnKit::Fleet.new(name: "brief_writer", tools: [WebSearch, SaveBrief])
- run = fleet.run(task: "Create a source-grounded brief.", input: { topic: "Rails 8" })
-
- puts run.output_text
- ```
-
  ## Configuration
 
  ### Model name
@@ -112,7 +115,8 @@ puts run.output
  ```
 
  The keyword form still works. The positional string is the recommended form for
- the common case.
+ the common case. `Agent#run` uses task prompt behavior by default; pass
+ `prompt_mode: :full` if you need conversation-style prompt behavior for a run.
 
  ### Pending runs
 
@@ -130,10 +134,10 @@ The existing keyword form remains valid:
  run = agent.run(task: "Classify later.", async: true)
  ```
 
- ## Fleets
+ ## Workflows
 
- The fleet mental model changed from “many agents” to “one reusable autonomous
- task runtime.” A fleet packages:
+ The preferred name for reusable autonomous task runtimes is now workflow. A
+ workflow packages:
 
  - one task-mode orchestrator
  - workflow skills
@@ -144,10 +148,8 @@ task runtime.” A fleet packages:
 
  ### Construction
 
- Before:
-
  ```ruby
- fleet = TurnKit::Fleet.new(
+ workflow = TurnKit::Workflow.new(
    name: "sales_enrichment",
    tools: [AccountLookup, WebSearch, SaveEnrichment],
    skills: [sales_research_skill],
@@ -155,34 +157,10 @@ fleet = TurnKit::Fleet.new(
  )
  ```
 
- After:
-
- ```ruby
- fleet = TurnKit.fleet(
-   "sales_enrichment",
-   tools: [AccountLookup, WebSearch, SaveEnrichment],
-   skills: [sales_research_skill],
-   max_spend: 0.25
- )
- ```
-
- `TurnKit::Fleet.new` remains supported.
-
  ### Running
 
- Before:
-
  ```ruby
- run = fleet.run(
-   task: "Enrich this account for responsible outreach.",
-   input: account.attributes
- )
- ```
-
- After:
-
- ```ruby
- run = fleet.run(
+ run = workflow.run(
    "Enrich this account for responsible outreach.",
    input: account.attributes
  )
@@ -190,17 +168,6 @@ run = fleet.run(
 
  `task:` remains supported.
 
- ### Auto-run alias
-
- No behavior change.
-
- ```ruby
- run = fleet.auto_run("Enrich this account.", input: account.attributes)
- ```
-
- Use `auto_run` when the name helps communicate that the fleet should navigate
- from input to output on its own. It is an alias for `run`.
-
  ## Run inspection
 
  New convenience methods were added to `TurnKit::Run`.
@@ -285,10 +252,10 @@ agent = TurnKit::Agent.new(tools: [WebSearch.new(client: client)])
  This is the recommended pattern for API clients, test doubles, and per-tenant
  dependencies.
 
- ## Multi-agent fleets
+ ## Multi-agent workflows
 
  If you previously modeled every role as a separate agent, consider migrating the
- default path to one fleet with a workflow skill.
+ default path to one workflow with a workflow skill.
 
  Before:
 
@@ -315,8 +282,8 @@ workflow = TurnKit::Skill.new(
    TEXT
  )
 
- fleet = TurnKit.fleet(
-   "source_brief",
+ source_brief = TurnKit::Workflow.new(
+   name: "source_brief",
    skills: [workflow],
    tools: [WebSearch, ReadWebPage, SaveBrief],
    max_spend: 0.25,
@@ -336,11 +303,11 @@ Keep separate agents when the isolation is worth the extra model calls:
 
  1. Replace `TurnKit.default_model =` with `TurnKit.model =` in app-level config.
  2. Wrap global settings in `TurnKit.configure` if you have more than one.
- 3. Replace `TurnKit::Fleet.new(name: ...)` with `TurnKit.fleet("...")` in new code.
+ 3. Use `TurnKit::Workflow.new(name: "...")` for reusable autonomous task runners.
  4. Replace `run(task: "...")` with `run("...")` where it improves readability.
  5. Replace `run.output_text` with `run.output` in application code.
  6. Replace save/action tool overrides with `terminal!` when convenient.
- 7. Consider collapsing role-agent fleets into one fleet plus workflow skills if
+ 7. Consider collapsing role-agent workflows into one workflow plus workflow skills if
     cost or complexity is a concern.
 
- None of these steps are required for existing code to keep working.
+ Run your test suite after migrating call sites.
@@ -0,0 +1,160 @@
+ # frozen_string_literal: true
+
+ require "json"
+ require "open3"
+ require "tempfile"
+
+ module TurnKit
+   module Adapters
+     class Codex < Client
+       Status = Struct.new(:successful, keyword_init: true) do
+         def success? = successful
+       end
+
+       attr_reader :command, :sandbox, :working_directory
+
+       def initialize(command: ENV.fetch("CODEX_COMMAND", "codex"), sandbox: "read-only", working_directory: Dir.pwd, runner: nil)
+         @command = command.to_s
+         @sandbox = sandbox
+         @working_directory = working_directory
+         @runner = runner || method(:run_command)
+       end
+
+       def validate!(model:)
+         raise ModelAccessError, "codex command is required" if command.empty?
+         raise ModelAccessError, "#{command.inspect} was not found. Install OpenAI Codex CLI and run `codex login --device-auth`." unless executable?(command)
+
+         stdout, stderr, status = @runner.call([ command, "login", "status" ], stdin_data: nil, chdir: working_directory)
+         return true if status.success?
+
+         message = [ stderr, stdout ].join("\n").strip
+         hint = "Run `#{command} login --device-auth` to connect your ChatGPT/Codex subscription."
+         raise ModelAccessError, [ "Codex is not authenticated.", message, hint ].reject(&:empty?).join(" ")
+       end
+
+       def chat(model:, messages:, tools:, instructions:, temperature: nil, thinking: nil, output_schema: nil, metadata: nil, on_event: nil)
+         raise ToolError, "TurnKit tools are not supported by the Codex adapter; Codex uses its own local tools" if Array(tools).any?
+
+         with_tempfiles(output_schema: output_schema) do |schema_file, output_file|
+           command = exec_command(model: model, schema_file: schema_file&.path, output_file: output_file.path)
+           stdout, stderr, status = @runner.call(command, stdin_data: prompt_for(messages: messages, instructions: instructions), chdir: working_directory)
+           emit_codex_events(stdout, on_event: on_event)
+           raise ModelAccessError, stderr.strip.empty? ? "codex exec failed" : stderr.strip unless status.success?
+
+           text = read_output(output_file, stdout)
+           Result.new(
+             text: text,
+             output_data: parse_output_data(text, output_schema: output_schema),
+             usage: usage_from_jsonl(stdout),
+             model: model
+           )
+         end
+       end
+
+       private
+         def exec_command(model:, schema_file:, output_file:)
+           args = [ command, "exec", "--json" ]
+           args += [ "--sandbox", sandbox.to_s ] if sandbox
+           args += [ "--model", model.to_s ] unless model.to_s.empty? || model.to_s == "codex"
+           args += [ "--output-schema", schema_file ] if schema_file
+           args += [ "-o", output_file, "-" ]
+           args
+         end
+
+         def prompt_for(messages:, instructions:)
+           parts = []
+           parts << "System instructions:\n#{instructions}" unless instructions.to_s.empty?
+           Array(messages).each do |message|
+             attrs = message.respond_to?(:to_h) ? message.to_h : message
+             attrs = attrs.transform_keys(&:to_s)
+             role = attrs["role"] || "user"
+             content = attrs["content"] || attrs["text"] || ""
+             parts << "#{role}:\n#{content}"
+           end
+           parts.join("\n\n")
+         end
+
+         def with_tempfiles(output_schema:)
+           output_file = Tempfile.new([ "turnkit-codex-output", ".txt" ])
+           schema_file = nil
+           if output_schema
+             schema_file = Tempfile.new([ "turnkit-codex-schema", ".json" ])
+             schema_file.write(JSON.generate(output_schema))
+             schema_file.flush
+           end
+
+           yield schema_file, output_file
+         ensure
+           schema_file&.close!
+           output_file&.close!
+         end
+
+         def read_output(output_file, stdout)
+           output_file.rewind
+           text = output_file.read.to_s
+           return text unless text.empty?
+
+           final_message_from_jsonl(stdout) || stdout.to_s
+         end
+
+         def final_message_from_jsonl(stdout)
+           events = parse_jsonl(stdout)
+           messages = events.filter_map do |event|
+             item = event["item"]
+             next unless item.is_a?(Hash) && item["type"] == "agent_message"
+
+             item["text"]
+           end
+           messages.last
+         end
+
+         def parse_output_data(text, output_schema:)
+           return nil unless output_schema
+
+           JSON.parse(text)
+         rescue JSON::ParserError
+           nil
+         end
+
+         def usage_from_jsonl(stdout)
+           usage = parse_jsonl(stdout).filter_map { |event| event["usage"] if event.is_a?(Hash) }.last || {}
+           input = usage["input_tokens"].to_i
+           cached = usage["cached_input_tokens"].to_i
+           Usage.new(
+             input_tokens: [ input - cached, 0 ].max,
+             output_tokens: usage["output_tokens"].to_i,
+             cached_tokens: cached,
+             thinking_tokens: usage["reasoning_output_tokens"].to_i
+           )
+         end
+
+         def emit_codex_events(stdout, on_event:)
+           return unless on_event
+
+           parse_jsonl(stdout).each do |event|
+             on_event.call(type: "codex.#{event.fetch("type", "event")}", payload: event)
+           end
+         end
+
+         def parse_jsonl(stdout)
+           stdout.to_s.each_line.filter_map do |line|
+             JSON.parse(line)
+           rescue JSON::ParserError
+             nil
+           end
+         end
+
+         def executable?(name)
+           return true if @runner != method(:run_command)
+           return File.executable?(name) if name.include?(File::SEPARATOR)
+
+           ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? { |path| File.executable?(File.join(path, name)) }
+         end
+
+         def run_command(command, stdin_data:, chdir:)
+           stdout, stderr, status = Open3.capture3(*command, stdin_data: stdin_data, chdir: chdir)
+           [ stdout, stderr, status ]
+         end
+     end
+   end
+ end
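
As a standalone illustration of the usage accounting in `usage_from_jsonl` above, the sketch below parses JSON Lines text, takes the last event that carries a `usage` object, and subtracts cached tokens from the input count. It returns a plain Hash instead of TurnKit's `Usage` class, `usage_summary` is a hypothetical name, and the sample events are invented for illustration; they are not real `codex exec` output.

```ruby
require "json"

# Parse JSON Lines, skipping any line that is not valid JSON.
def parse_jsonl(text)
  text.to_s.each_line.filter_map do |line|
    JSON.parse(line)
  rescue JSON::ParserError
    nil
  end
end

# Summarize the last usage event. Cached tokens are reported separately
# and removed from the input count, clamped at zero.
def usage_summary(text)
  usage = parse_jsonl(text).filter_map { |event| event["usage"] if event.is_a?(Hash) }.last || {}
  input = usage["input_tokens"].to_i
  cached = usage["cached_input_tokens"].to_i
  {
    input_tokens: [input - cached, 0].max,
    output_tokens: usage["output_tokens"].to_i,
    cached_tokens: cached
  }
end

# Invented sample stream, including one line that is not JSON.
sample = <<~JSONL
  {"type":"example.started"}
  not json
  {"type":"example.completed","usage":{"input_tokens":1200,"cached_input_tokens":800,"output_tokens":150}}
JSONL

p usage_summary(sample)
```

Reporting uncached input separately from cached input avoids double counting when both figures are later summed for telemetry, which appears to be the intent of the `[ input - cached, 0 ].max` expression in the adapter.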