turnkit 0.3.0 → 0.4.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b5694bc97b2f735e5076574e2863ee5addc41926bd85edf02e1835263ffb3516
- data.tar.gz: 65286330a1d0b4bbd0e3e6c11ba73abd836fb22a44ae4b3ab48a58ecf9d19425
+ metadata.gz: 67553a737fbce38e2402167aeb2cc799c55390c5c7c9740a8d73d056b0679a06
+ data.tar.gz: fc395c09f05e8ba640ec9dda4907419f0a76a47e8a45607435c5dde0630c3562
  SHA512:
- metadata.gz: 2b3674abf0cae37286a04431f0ceb02a30e282c715e4d6d96e51c0a08d600c94a9fee6c82bf178c0b97ff080ee221b00a18b5409e72003d92c7a5430b34d5733
- data.tar.gz: 7141f5cc00df42bfaf0e9b035d75f54e0b7c9b14ff71a8c95805242b32835fb410358f7f42ff2161f89896425b62fd840c05dcc2504f555430450517dc61bf9b
+ metadata.gz: 45864840f7bc24d3626e0e3bb849f2ea3d71c1fdb8a2faa7ff19725ab0b6932d205962208c2ea75be7014d8f0befa7c6ebd754e0f5dfedd7db6875fff90254a9
+ data.tar.gz: 7dc83b0922078e52fb220438097514c0efba0d10740a9cec40e810f316d42c6e50fdc67f92db2cb9e9564fdcab376a1e01286d8821effa4a9246e51f8baf6c1c
data/CHANGELOG.md CHANGED
@@ -1,5 +1,18 @@
  # Changelog
 
+ ## 0.4.1 - 2026-06-19
+
+ - Add first-class media analysis with `Turn#view_media`, `TurnKit.view_media`, and `TurnKit::ViewMediaTool`.
+ - Normalize media inputs for paths, URLs, IO/bytes, and Rails Active Storage-compatible attachments.
+ - Persist media analysis messages with model, provider, usage, cost, structured output, events, and output policy support.
+ - Add a Gemini 3 Flash media-analysis smoke example.
+
+ ## 0.4.0 - 2026-06-19
+
+ - Add first-class image generation with `Turn#paint`, `TurnKit.paint`, and `TurnKit::ImageTool`.
+ - Persist generated images as durable image messages with normalized metadata, usage, cost, and event callbacks.
+ - Add image output policy support and a `generate-image` CLI smoke example for Gemini 16:9 image generation.
+
  ## 0.3.0 - 2026-06-10
 
  - Make the task-runtime API skills-first and intentionally breaking: `max_spend` is the only spend-limit name and output validation is exposed as `output_policy` / `policy_audit`.
data/README.md CHANGED
@@ -459,6 +459,121 @@ puts turn.output_text
 
  Rely on TurnKit to validate tools and model-provided arguments.
 
+ ### Images
+
+ Generate images inside a durable turn with `turn.paint`. The image call uses the
+ configured client adapter, records usage and cost on the turn, persists an image
+ message, and emits `image.requested` / `image.completed` events.
+
+ ```ruby
+ image = turn.paint(
+ "Create a 16:9 editorial header image for the article.",
+ model: "gemini-3-pro-image-preview",
+ provider: :gemini,
+ size: "1024x576",
+ metadata: { article_id: article.id }
+ )
+
+ image.url # provider-hosted URL when returned
+ image.to_blob # generated bytes for base64 responses, or fetched URL bytes
+ image.mime_type # "image/png"
+ ```
+
+ For reusable workflow steps, subclass `TurnKit::ImageTool`:
+
+ ```ruby
+ class GenerateHeaderImage < TurnKit::ImageTool
+ description "Generate an article header image."
+ parameter :title, :string, required: true
+
+ model "gemini-3-pro-image-preview"
+ provider :gemini
+ size "1024x576"
+
+ def prompt(title:)
+ "Create a 16:9 editorial header image for #{title}."
+ end
+ end
+ ```
+
+ Rails apps can attach generated images from the event stream without TurnKit
+ taking a dependency on Active Storage:
+
+ ```ruby
+ TurnKit.on_event = ->(event) do
+ next unless event.type == "image.completed"
+
+ image = TurnKit::ImageResult.from_h(event.payload.fetch(:image))
+ Article.find(event.payload.dig(:metadata, :article_id)).header_image.attach(
+ io: StringIO.new(image.to_blob),
+ filename: "header.png",
+ content_type: image.mime_type
+ )
+ end
+ ```
+
+ Require an image before completion with `TurnKit::OutputPolicy.require_image`.
+
+ ### Media analysis
+
+ Analyze existing images, PDFs, audio, or video inside a durable turn with
+ `turn.view_media`. Media inputs can be local paths, URLs, IO-like objects,
+ `TurnKit::MediaInput.bytes(...)`, or Rails Active Storage blobs/attachments.
+ TurnKit records usage and cost on the turn, persists a media analysis message,
+ and emits `media.requested` / `media.completed` / `media.failed` events.
+
+ ```ruby
+ analysis = turn.view_media(
+ article.header_image,
+ objective: "Verify this generated header matches the article art direction.",
+ model: "gemini-2.5-pro",
+ provider: :gemini,
+ metadata: { article_id: article.id }
+ )
+
+ analysis.text # text analysis
+ analysis.data # structured output when requested
+ analysis.media # normalized media metadata
+ ```
+
+ For bytes, provide a MIME type so adapters can pass the media correctly:
+
+ ```ruby
+ media = TurnKit::MediaInput.bytes(
+ File.binread("header.png"),
+ mime_type: "image/png",
+ filename: "header.png"
+ )
+ ```
+
+ For reusable workflow steps, subclass `TurnKit::ViewMediaTool`:
+
+ ```ruby
+ class ReviewHeaderImage < TurnKit::ViewMediaTool
+ description "Review a generated article header image."
+ parameter :article_id, :integer, required: true
+
+ model "gemini-2.5-pro"
+ provider :gemini
+
+ def media(article_id:)
+ Article.find(article_id).header_image
+ end
+
+ def objective(article_id:)
+ "Review this generated image against the article art direction."
+ end
+
+ def metadata(article_id:)
+ { article_id: article_id }
+ end
+ end
+ ```
+
+ Require a media review before completion with
+ `TurnKit::OutputPolicy.require_media_analysis`. TurnKit persists media metadata
+ and analysis text, not raw media bytes.
+
  ### Structured Output
 
  Define a schema:
@@ -41,6 +41,40 @@ module TurnKit
  normalize_response(response, model: model)
  end
 
+ def paint(prompt:, model:, provider: nil, size: nil, assume_model_exists: nil, input_images: nil, mask: nil, params: {}, metadata: nil, on_event: nil)
+ require "ruby_llm"
+
+ configure_from_environment
+ kwargs = paint_kwargs(
+ model: model,
+ provider: provider,
+ assume_model_exists: assume_model_exists || false,
+ size: size || "1024x1024",
+ with: input_images,
+ mask: mask,
+ params: params || {}
+ )
+ image = ::RubyLLM.paint(prompt, **kwargs)
+ normalize_image_response(image, model: model, provider: provider, params: { "size" => size || "1024x1024" }.merge(params || {}), metadata: metadata)
+ end
+
+ def view_media(media:, objective:, model:, provider: nil, output_schema: nil, params: {}, metadata: nil, on_event: nil)
+ require "ruby_llm"
+
+ configure_from_environment
+ media_input = MediaInput.wrap(media)
+ content = ::RubyLLM::Content.new(objective.to_s)
+ content.add_attachment(media_input.attachment_source, filename: media_input.filename)
+
+ chat = ::RubyLLM.chat(model: model)
+ chat.with_schema(normalize_schema(output_schema)) if output_schema
+ chat.with_params(**params) if params && !params.empty?
+ chat.add_message(role: :user, content: content)
+
+ response = complete_without_tool_execution(chat)
+ normalize_media_analysis_response(response, media: media_input, model: model, provider: provider, params: params || {}, metadata: metadata)
+ end
+
  private
  def configure_from_environment
  config = ::RubyLLM.config
@@ -246,6 +280,70 @@ module TurnKit
 
  response.cost&.total
  end
+
+ def paint_kwargs(kwargs)
+ parameters = ::RubyLLM::Image.method(:paint).parameters
+ return kwargs if parameters.any? { |kind, _| kind == :keyrest }
+
+ accepted = parameters.filter_map { |kind, name| name if %i[key keyreq].include?(kind) }
+ unsupported = kwargs.keys.select { |key| !accepted.include?(key) && !blank?(kwargs[key]) }
+ raise ArgumentError, "RubyLLM image generation does not support: #{unsupported.join(", ")}" if unsupported.any?
+
+ kwargs.slice(*accepted)
+ end
+
+ def blank?(value)
+ value.nil? || value == false || (value.respond_to?(:empty?) && value.empty?)
+ end
+
+ def normalize_image_response(image, model:, provider:, params:, metadata:)
+ usage = Usage.new(
+ input_tokens: image_usage_value(image, "input_tokens"),
+ output_tokens: image_usage_value(image, "output_tokens"),
+ cost: response_cost(image)
+ )
+ part = ImageResult.new(
+ url: image.respond_to?(:url) ? image.url : nil,
+ data: image.respond_to?(:data) ? image.data : nil,
+ mime_type: image.respond_to?(:mime_type) ? image.mime_type : nil,
+ revised_prompt: image.respond_to?(:revised_prompt) ? image.revised_prompt : nil,
+ model: image.respond_to?(:model_id) ? image.model_id : model,
+ provider: provider&.to_s,
+ usage: usage,
+ params: params,
+ metadata: metadata || {}
+ ).to_h.merge("type" => "image")
+
+ Result.new(parts: [ part ], usage: usage, model: part["model"], output_data: { "type" => "image", "images" => [ part ] })
+ end
+
+ def normalize_media_analysis_response(response, media:, model:, provider:, params:, metadata:)
+ usage = Usage.new(
+ input_tokens: token_value(response, :input_tokens),
+ output_tokens: token_value(response, :output_tokens),
+ cached_tokens: token_value(response, :cached_tokens),
+ cache_write_tokens: token_value(response, :cache_creation_tokens),
+ thinking_tokens: thinking_token_value(response),
+ cost: response_cost(response)
+ )
+ part = MediaAnalysisResult.new(
+ text: response_text(response),
+ data: response_data(response),
+ model: response.respond_to?(:model_id) ? response.model_id : model,
+ provider: provider&.to_s,
+ usage: usage,
+ params: params,
+ media: media.to_h,
+ metadata: metadata || {}
+ ).to_h.merge("type" => "media_analysis")
+
+ Result.new(parts: [ part ], usage: usage, model: part["model"], output_data: { "type" => "media_analysis", "media_analyses" => [ part ] })
+ end
+
+ def image_usage_value(image, key)
+ usage = image.respond_to?(:usage) ? image.usage || {} : {}
+ (usage[key] || usage[key.to_sym]).to_i
+ end
  end
  end
  end
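The `paint_kwargs` helper in the hunk above inspects the target method's signature with `Method#parameters`, passes everything through when the target accepts `**rest`, and otherwise raises only for options the caller actually set. A minimal standalone sketch of that idea follows; `FakeBackend` and `filter_kwargs` are hypothetical names used for illustration and are not part of TurnKit or RubyLLM.

```ruby
# Hypothetical backend whose signature lacks some of the optional keywords.
module FakeBackend
  def self.paint(prompt, model:, size: "1024x1024")
    "#{prompt} via #{model} at #{size}"
  end
end

# Same blank test as the diff: nil, false, and empty collections count as unset.
def blank?(value)
  value.nil? || value == false || (value.respond_to?(:empty?) && value.empty?)
end

# Keep only keywords the target declares. Unset extras are dropped silently;
# extras that carry a value raise, so a caller's option is never ignored.
def filter_kwargs(target, kwargs)
  parameters = target.parameters
  return kwargs if parameters.any? { |kind, _| kind == :keyrest }

  accepted = parameters.filter_map { |kind, name| name if %i[key keyreq].include?(kind) }
  unsupported = kwargs.keys.select { |key| !accepted.include?(key) && !blank?(kwargs[key]) }
  raise ArgumentError, "unsupported options: #{unsupported.join(", ")}" if unsupported.any?

  kwargs.slice(*accepted)
end

target = FakeBackend.method(:paint)
supported = filter_kwargs(target, { model: "demo", size: "512x512", mask: nil, params: {} })
rendered = FakeBackend.paint("a header image", **supported)
```

The design choice worth noting is the asymmetry: a `nil` mask is dropped without complaint, while a real mask raises `ArgumentError` instead of being discarded.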
@@ -9,5 +9,13 @@ module TurnKit
  def chat(model:, messages:, tools:, instructions:, temperature: nil, thinking: nil, output_schema: nil, metadata: nil, on_event: nil)
  raise NotImplementedError
  end
+
+ def paint(prompt:, model:, provider: nil, size: nil, assume_model_exists: nil, input_images: nil, mask: nil, params: {}, metadata: nil, on_event: nil)
+ raise NotImplementedError
+ end
+
+ def view_media(media:, objective:, model:, provider: nil, output_schema: nil, params: {}, metadata: nil, on_event: nil)
+ raise NotImplementedError
+ end
  end
  end
@@ -0,0 +1,51 @@
+ # frozen_string_literal: true
+
+ require "base64"
+ require "open-uri"
+
+ module TurnKit
+ class ImageResult
+ attr_reader :url, :data, :mime_type, :revised_prompt, :model, :provider, :usage, :params, :metadata
+
+ def self.from_h(value)
+ new(**value.transform_keys(&:to_sym))
+ end
+
+ def initialize(url: nil, data: nil, mime_type: nil, revised_prompt: nil, model: nil, provider: nil, usage: Usage.new, params: {}, metadata: {}, **)
+ @url = url
+ @data = data
+ @mime_type = mime_type
+ @revised_prompt = revised_prompt
+ @model = model
+ @provider = provider
+ @usage = usage.is_a?(Usage) ? usage : Usage.from_h(usage || {})
+ @params = params || {}
+ @metadata = metadata || {}
+ end
+
+ def to_blob
+ raise Error, "image has no url or data" if url.to_s.empty? && data.to_s.empty?
+
+ data ? Base64.decode64(data) : URI.open(url, &:read)
+ end
+
+ def cost
+ Cost.from_usage(usage, model: model)
+ end
+
+ def to_h
+ {
+ "url" => url,
+ "data" => data,
+ "mime_type" => mime_type,
+ "revised_prompt" => revised_prompt,
+ "model" => model,
+ "provider" => provider,
+ "usage" => usage.to_h,
+ "cost" => cost.to_h,
+ "params" => params,
+ "metadata" => metadata
+ }.compact
+ end
+ end
+ end
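`ImageResult#to_blob` above prefers inline base64 `data` and only falls back to fetching `url`. The sketch below reproduces the base64 branch with a stand-in struct so it runs offline; `SketchImage` is a hypothetical name, and the URL branch is deliberately left unimplemented here rather than calling `URI.open` as the diff does.

```ruby
require "base64"

# Hypothetical stand-in for an image result; not the gem's class.
SketchImage = Struct.new(:url, :data, keyword_init: true) do
  def to_blob
    raise ArgumentError, "image has no url or data" if url.to_s.empty? && data.to_s.empty?
    return Base64.decode64(data) if data

    # The diff fetches the URL at this point; omitted to keep the sketch offline.
    raise NotImplementedError, "URL fetch omitted in this sketch"
  end
end

# Round-trip a few binary bytes (the PNG signature) through base64.
png_signature = "\x89PNG\r\n\x1A\n".b
image = SketchImage.new(data: Base64.strict_encode64(png_signature))
blob = image.to_blob
```

Because the decoded bytes come back as a binary string, wrapping them in `StringIO`, as the README's Active Storage example does, is enough to hand them to an uploader.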
@@ -0,0 +1,30 @@
+ # frozen_string_literal: true
+
+ module TurnKit
+ class ImageTool < Tool
+ class << self
+ %i[model provider size assume_model_exists params].each do |name|
+ define_method(name) do |value = nil|
+ instance_variable_set("@#{name}", value) unless value.nil?
+ instance_variable_get("@#{name}")
+ end
+ end
+ end
+
+ def call(turnkit_context:, **arguments)
+ turnkit_context.turn.paint(
+ prompt(**arguments),
+ model: self.class.model,
+ provider: self.class.provider,
+ size: self.class.size,
+ assume_model_exists: self.class.assume_model_exists,
+ params: self.class.params || {},
+ metadata: metadata(**arguments)
+ ).to_h
+ end
+
+ def metadata(**)
+ {}
+ end
+ end
+ end
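The class-level macros in `ImageTool` (`model`, `provider`, `size`, and so on) act as combined setters and getters: called with a value they store it in a class-level instance variable, called without one they read it back. The following sketch isolates that pattern; `SketchTool`, `HeaderImage`, and `Thumbnail` are hypothetical classes, and the real `TurnKit::Tool` base class may add behavior not shown in this diff.

```ruby
# Hypothetical base class demonstrating the setter/getter macro pattern.
class SketchTool
  class << self
    %i[model provider size].each do |name|
      define_method(name) do |value = nil|
        # With an argument: store on this class. Without: read it back.
        instance_variable_set("@#{name}", value) unless value.nil?
        instance_variable_get("@#{name}")
      end
    end
  end
end

class HeaderImage < SketchTool
  model "demo-image-model"
  size "1024x576"
end

# Class-level instance variables belong to the class that set them,
# so a subclass of HeaderImage starts with nothing configured.
class Thumbnail < HeaderImage
end
```

In this sketch `Thumbnail.model` is `nil`, since the values live on `HeaderImage` itself. Whether TurnKit subclasses inherit settings depends on `Tool`, which this diff does not show.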
@@ -0,0 +1,48 @@
+ # frozen_string_literal: true
+
+ module TurnKit
+ class MediaAnalysisResult
+ attr_reader :text, :data, :model, :provider, :usage, :params, :media, :metadata, :error
+
+ def self.from_h(value)
+ new(**value.transform_keys(&:to_sym))
+ end
+
+ def initialize(text: "", data: nil, model: nil, provider: nil, usage: Usage.new, params: {}, media: {}, metadata: {}, error: nil, **)
+ @text = text.to_s
+ @data = data
+ @model = model
+ @provider = provider
+ @usage = usage.is_a?(Usage) ? usage : Usage.from_h(usage || {})
+ @params = params || {}
+ @media = media || {}
+ @metadata = metadata || {}
+ @error = error
+ end
+
+ def data?
+ !data.nil?
+ end
+
+ alias structured? data?
+
+ def cost
+ Cost.from_usage(usage, model: model)
+ end
+
+ def to_h
+ {
+ "text" => text,
+ "data" => data,
+ "model" => model,
+ "provider" => provider,
+ "usage" => usage.to_h,
+ "cost" => cost.to_h,
+ "params" => params,
+ "media" => media,
+ "metadata" => metadata,
+ "error" => error
+ }.compact
+ end
+ end
+ end
@@ -0,0 +1,208 @@
+ # frozen_string_literal: true
+
+ require "pathname"
+ require "stringio"
+ require "uri"
+
+ module TurnKit
+ class MediaInput
+ SUPPORTED_MIME_TYPES = %w[image/png image/jpeg image/webp image/gif application/pdf].freeze
+ EXTENSION_MIME_TYPES = {
+ ".png" => "image/png",
+ ".jpg" => "image/jpeg",
+ ".jpeg" => "image/jpeg",
+ ".webp" => "image/webp",
+ ".gif" => "image/gif",
+ ".pdf" => "application/pdf",
+ ".mp3" => "audio/mpeg",
+ ".wav" => "audio/wav",
+ ".m4a" => "audio/mp4",
+ ".mp4" => "video/mp4",
+ ".mov" => "video/quicktime",
+ ".webm" => "video/webm"
+ }.freeze
+
+ attr_reader :source, :mime_type, :filename, :metadata, :source_type
+
+ def self.wrap(value, **options)
+ value.is_a?(self) && options.empty? ? value : new(value, **options)
+ end
+
+ def self.bytes(data, mime_type:, filename: nil, metadata: {})
+ new(data, source_type: :bytes, mime_type: mime_type, filename: filename, metadata: metadata)
+ end
+
+ def initialize(source, mime_type: nil, filename: nil, metadata: {}, source_type: nil)
+ @source = source
+ @source_type = (source_type || infer_source_type).to_s
+ @filename = filename || infer_filename
+ @mime_type = mime_type || infer_mime_type
+ @metadata = metadata || {}
+
+ validate!
+ end
+
+ def kind
+ return "image" if mime_type&.start_with?("image/")
+ return "audio" if mime_type&.start_with?("audio/")
+ return "video" if mime_type&.start_with?("video/")
+ return "pdf" if mime_type == "application/pdf"
+
+ nil
+ end
+
+ def byte_size
+ case source_type
+ when "path"
+ File.size(source.to_s) if File.file?(source.to_s)
+ when "bytes"
+ source.bytesize
+ when "io"
+ source.size if source.respond_to?(:size)
+ when "active_storage"
+ active_storage_byte_size
+ end
+ end
+
+ def url
+ source.to_s if source_type == "url"
+ end
+
+ def path
+ source.to_s if source_type == "path"
+ end
+
+ def attachment_source
+ case source_type
+ when "bytes"
+ StringIO.new(source)
+ else
+ source
+ end
+ end
+
+ def to_h
+ {
+ "kind" => kind,
+ "mime_type" => mime_type,
+ "filename" => filename,
+ "byte_size" => byte_size,
+ "url" => url,
+ "path" => path,
+ "metadata" => metadata
+ }.compact
+ end
+
+ private
+ def infer_source_type
+ return :url if source.to_s.match?(%r{\Ahttps?://})
+ return :active_storage if active_storage?
+ return :path if source.is_a?(Pathname) || (source.is_a?(String) && File.exist?(source))
+ return :io if source.respond_to?(:read)
+ return :bytes if source.is_a?(String)
+
+ raise ArgumentError, "unsupported media input: #{source.class}"
+ end
+
+ def infer_filename
+ case source_type
+ when "url"
+ basename = File.basename(URI(source.to_s).path).to_s
+ basename.empty? ? nil : basename
+ when "path"
+ File.basename(source.to_s)
+ when "io"
+ source.respond_to?(:path) ? File.basename(source.path.to_s) : nil
+ when "active_storage"
+ active_storage_filename
+ end
+ end
+
+ def infer_mime_type
+ active_storage_content_type || mime_from_filename || mime_from_marcel
+ end
+
+ def mime_from_filename
+ EXTENSION_MIME_TYPES[File.extname(filename.to_s).downcase]
+ end
+
+ def mime_from_marcel
+ require "marcel"
+
+ Marcel::MimeType.for(marcel_io, name: filename)
+ rescue LoadError
+ nil
+ ensure
+ rewind_source
+ end
+
+ def marcel_io
+ case source_type
+ when "path"
+ Pathname.new(source.to_s)
+ when "bytes"
+ StringIO.new(source)
+ when "io"
+ source
+ else
+ nil
+ end
+ end
+
+ def validate!
+ return if mime_type.nil?
+ return if SUPPORTED_MIME_TYPES.include?(mime_type)
+ return if mime_type.start_with?("audio/", "video/")
+
+ raise ArgumentError, "unsupported media type: #{mime_type}"
+ end
+
+ def active_storage?
+ return false unless defined?(ActiveStorage)
+
+ (defined?(ActiveStorage::Blob) && source.is_a?(ActiveStorage::Blob)) ||
+ (defined?(ActiveStorage::Attached::One) && source.is_a?(ActiveStorage::Attached::One)) ||
+ (defined?(ActiveStorage::Attached::Many) && source.is_a?(ActiveStorage::Attached::Many))
+ end
+
+ def active_storage_filename
+ if defined?(ActiveStorage::Blob) && source.is_a?(ActiveStorage::Blob)
+ source.filename.to_s
+ elsif source.respond_to?(:filename)
+ source.filename.to_s
+ elsif source.respond_to?(:blob)
+ source.blob&.filename&.to_s
+ elsif source.respond_to?(:blobs)
+ source.blobs.first&.filename&.to_s
+ end
+ end
+
+ def active_storage_content_type
+ if defined?(ActiveStorage::Blob) && source.is_a?(ActiveStorage::Blob)
+ source.content_type
+ elsif source.respond_to?(:content_type)
+ source.content_type
+ elsif source.respond_to?(:blob)
+ source.blob&.content_type
+ elsif source.respond_to?(:blobs)
+ source.blobs.first&.content_type
+ end
+ end
+
+ def active_storage_byte_size
+ if defined?(ActiveStorage::Blob) && source.is_a?(ActiveStorage::Blob)
+ source.byte_size
+ elsif source.respond_to?(:byte_size)
+ source.byte_size
+ elsif source.respond_to?(:blob)
+ source.blob&.byte_size
+ elsif source.respond_to?(:blobs)
+ source.blobs.first&.byte_size
+ end
+ end
+
+ def rewind_source
+ source.rewind if source_type == "io" && source.respond_to?(:rewind)
+ end
+ end
+ end
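`MediaInput` classifies its source by trying checks in a fixed order (URL, Active Storage, path, IO, raw bytes) and then maps the filename extension to a MIME type before falling back to Marcel. The sketch below mirrors that precedence without the Active Storage and Marcel steps; `sketch_source_type`, `sketch_mime_type`, and the trimmed `EXTENSIONS` table are illustrative names, not gem API.

```ruby
require "pathname"
require "stringio"

# A trimmed copy of the extension table from the diff.
EXTENSIONS = {
  ".png" => "image/png",
  ".jpg" => "image/jpeg",
  ".pdf" => "application/pdf",
  ".mp3" => "audio/mpeg"
}.freeze

# Same precedence as the diff, minus Active Storage: URL, path, IO, bytes.
def sketch_source_type(source)
  return :url if source.to_s.match?(%r{\Ahttps?://})
  return :path if source.is_a?(Pathname) || (source.is_a?(String) && File.exist?(source))
  return :io if source.respond_to?(:read)
  return :bytes if source.is_a?(String)

  raise ArgumentError, "unsupported media input: #{source.class}"
end

# Extension lookup is case-insensitive; unknown extensions give nil.
def sketch_mime_type(filename)
  EXTENSIONS[File.extname(filename.to_s).downcase]
end
```

One consequence of this ordering is that a plain `String` is treated as a path whenever a file with that name exists, and as bytes otherwise. Passing raw data through `MediaInput.bytes`, as the README recommends, sets the source type explicitly and avoids that ambiguity.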
@@ -3,7 +3,7 @@
  module TurnKit
  class Message
  ROLES = %w[user assistant tool].freeze
- KINDS = %w[text tool_call tool_result context_summary].freeze
+ KINDS = %w[text tool_call tool_result context_summary image media_analysis].freeze
 
  attr_reader :id, :conversation_id, :turn_id, :role, :kind, :sequence
  attr_reader :content, :tool_execution_id, :provider_message_id, :metadata, :created_at
@@ -57,6 +57,14 @@ module TurnKit
  kind == "context_summary"
  end
 
+ def image?
+ kind == "image"
+ end
+
+ def media_analysis?
+ kind == "media_analysis"
+ end
+
  def text
  content.filter_map do |part|
  attrs = stringify(part)
@@ -44,6 +44,10 @@ module TurnKit
  when "tool_result"
  part = message.content.find { |candidate| candidate.fetch("type") == "tool_result" }
  { role: :tool, content: part&.fetch("text", message.text) || message.text, tool_call_id: part&.fetch("tool_call_id", nil) }
+ when "image"
+ { role: :assistant, content: projected_images }
+ when "media_analysis"
+ { role: :assistant, content: projected_media_analyses }
  else
  { role: message.role.to_sym, content: message.text }
  end
@@ -65,5 +69,23 @@ module TurnKit
  { "id" => part.fetch("id"), "name" => part.fetch("name"), "arguments" => part["arguments"] || {} }
  end
  end
+
+ def projected_images
+ message.content.filter_map do |part|
+ next unless part.fetch("type") == "image"
+
+ attrs = part.slice("url", "mime_type", "model", "provider", "revised_prompt").compact
+ "Generated image: #{attrs.to_json}"
+ end.join("\n")
+ end
+
+ def projected_media_analyses
+ message.content.filter_map do |part|
+ next unless part.fetch("type") == "media_analysis"
+
+ media = part.fetch("media", {}).slice("kind", "mime_type", "filename", "url").compact
+ [ "Media analysis: #{media.to_json}", part["text"].to_s ].reject(&:empty?).join("\n")
+ end.join("\n")
+ end
  end
  end
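The projection helpers above turn a persisted image message into a short text line for later model calls, keeping only a whitelist of keys. A standalone sketch of `projected_images` follows; `sketch_projected_images` and the sample part are made up for illustration, and the sample includes a `"data"` key to show that base64 payloads are not replayed into the prompt.

```ruby
require "json"

# Same whitelist-and-compact approach as the diff, applied to plain hashes.
def sketch_projected_images(content)
  content.filter_map do |part|
    next unless part.fetch("type") == "image"

    attrs = part.slice("url", "mime_type", "model", "provider", "revised_prompt").compact
    "Generated image: #{attrs.to_json}"
  end.join("\n")
end

# Hypothetical persisted message content with one image part and one text part.
content = [
  { "type" => "image", "url" => "https://example.com/header.png",
    "mime_type" => "image/png", "data" => "aGVsbG8=", "revised_prompt" => nil },
  { "type" => "text", "text" => "ignored by this projection" }
]
projected = sketch_projected_images(content)
```

Only `url` and `mime_type` survive here: `data` is outside the whitelist, and the `nil` `revised_prompt` is removed by `compact`.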
@@ -31,6 +31,24 @@ module TurnKit
  new(name: skill.key, content: skill.content, **options)
  end
 
+ def self.require_image
+ lambda do |output, output_data: nil, turn: nil, **|
+ data = output_data.is_a?(Hash) ? output_data : output
+ images = data.is_a?(Hash) ? data["images"] || data[:images] : nil
+ has_image = Array(images).any? || turn&.conversation&.messages_for_turn(turn)&.any?(&:image?)
+ { rule: "image_required", message: "output must include an image result" } unless has_image
+ end
+ end
+
+ def self.require_media_analysis
+ lambda do |output, output_data: nil, turn: nil, **|
+ data = output_data.is_a?(Hash) ? output_data : output
+ analyses = data.is_a?(Hash) ? data["media_analyses"] || data[:media_analyses] : nil
+ has_analysis = Array(analyses).any? || turn&.conversation&.messages_for_turn(turn)&.any?(&:media_analysis?)
+ { rule: "media_analysis_required", message: "output must include a media analysis result" } unless has_analysis
+ end
+ end
+
  def initialize(content:, name: "output_policy", model: nil, thinking: nil, client: nil)
  @name = name.to_s
  @content = content.to_s
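`OutputPolicy.require_image` returns a lambda that yields `nil` when the output carries an image and a violation hash otherwise. The sketch below reproduces the data-inspection half of that check; it omits the fallback that scans the turn's persisted messages, and how TurnKit consumes the returned hash is not shown in this diff.

```ruby
# A reduced version of the policy lambda: inspect structured output only.
require_image = lambda do |output, output_data: nil, **|
  data = output_data.is_a?(Hash) ? output_data : output
  images = data.is_a?(Hash) ? data["images"] || data[:images] : nil
  { rule: "image_required", message: "output must include an image result" } unless Array(images).any?
end

# Plain text output has no images, so the policy reports a violation.
violation = require_image.call("plain text")

# Structured output with at least one image passes (the lambda returns nil).
passing = require_image.call("", output_data: { "images" => [{ "url" => "https://example.com/a.png" }] })
```

Accepting both string and symbol keys and ignoring unknown keywords through `**` keeps the lambda tolerant of callers that pass extra context such as `turn:`.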
@@ -28,6 +28,30 @@ module TurnKit
  tool_calls.any?
  end
 
+ def images
+ parts.filter_map do |part|
+ next unless part["type"] == "image"
+
+ ImageResult.from_h(part)
+ end
+ end
+
+ def image?
+ images.any?
+ end
+
+ def media_analyses
+ parts.filter_map do |part|
+ next unless part["type"] == "media_analysis"
+
+ MediaAnalysisResult.from_h(part)
+ end
+ end
+
+ def media_analysis?
+ media_analyses.any?
+ end
+
  private
  def synthesize_parts(text:, tool_calls:)
  parts = []
@@ -49,6 +49,9 @@ module TurnKit
  context = ToolContext.new(turn: turn, execution: execution)
  payload = begin
  normalize_payload(call_tool(tool, tool_call.arguments, context: context))
+ rescue BudgetError => error
+ finish_error(execution, tool_call, error.message, details: { "class" => error.class.name, "budget_denied" => true })
+ raise
  rescue StandardError => error
  return finish_error(execution, tool_call, error.message, details: { "class" => error.class.name })
  end
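The hunk above adds a `rescue BudgetError` clause ahead of the generic `rescue StandardError`, so a budget failure is recorded on the tool execution and then re-raised instead of being converted into an ordinary tool error. A small sketch of that ordering follows. `SketchBudgetError` and `run_tool` are hypothetical, and the sketch assumes the budget error descends from `StandardError`, which is why it must be listed first.

```ruby
# Assumed for this sketch: the budget error is a StandardError subclass.
class SketchBudgetError < StandardError; end

# Record every failure, but only swallow the ordinary ones.
def run_tool(log)
  yield
rescue SketchBudgetError => error
  log << { "message" => error.message, "budget_denied" => true }
  raise # propagate so the surrounding turn can stop spending
rescue StandardError => error
  log << { "message" => error.message }
  :tool_error # ordinary failures become a tool result, not an exception
end

log = []
ordinary = run_tool(log) { raise "boom" }
```

Ruby tries rescue clauses top to bottom, so placing the specific class after `StandardError` would make it unreachable.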
data/lib/turnkit/turn.rb CHANGED
@@ -167,6 +167,112 @@ module TurnKit
  result
  end
 
+ def paint(prompt, model:, provider: nil, size: nil, assume_model_exists: nil, input_images: nil, mask: nil, params: {}, metadata: {}, client: nil)
+ claimed_standalone = false
+ case status
+ when "pending"
+ claimed = store.claim_turn(id, from: "pending", to: "running", started_at: Clock.now, heartbeat_at: Clock.now)
+ raise Error, "turn is already running" unless claimed
+
+ @record = claimed
+ @started_at = @record["started_at"]
+ @budget = Budget.resume(store: store, root_turn_id: root_turn_id, limits: budget_limits)
+ claimed_standalone = true
+ emit("turn.started", status: status, model: model)
+ when "running"
+ # Image tools call this while their parent turn is running.
+ else
+ raise Error, "cannot paint for #{status} turn"
+ end
+
+ image_client = client || agent.effective_client
+ request = {
+ prompt: prompt,
+ model: model,
+ provider: provider,
+ size: size,
+ assume_model_exists: assume_model_exists,
+ input_images: input_images,
+ mask: mask,
+ params: params || {},
+ metadata: { turn_id: id, conversation_id: conversation.id }.merge(metadata || {})
+ }
+
+ image_client.validate!(model: model)
+ emit("image.requested", request.except(:input_images, :mask))
+ result = call_image_client(image_client, request)
+ result_cost = Cost.from_usage(result.usage, model: result.model || model)
+ add_usage!(result.usage, cost: result_cost)
+ budget.add_cost!(result_cost.total)
+ image = result.images.first
+ raise Error, "image client returned no image" unless image
+ raise Error, "image client returned image without url or data" if image.url.to_s.empty? && image.data.to_s.empty?
+
+ persist_image_message(image)
+ emit("image.completed", image: image.to_h, model: image.model || model, provider: image.provider || provider&.to_s, mime_type: image.mime_type, usage: result.usage.to_h, cost: result_cost.to_h, metadata: metadata || {})
+ complete_with_output(image.url.to_s, output_data: { "type" => "image", "images" => [ image.to_h ] }, audit: check_policy(image.url.to_s, output_data: { "type" => "image", "images" => [ image.to_h ] })) if claimed_standalone
+ image
+ rescue StandardError => error
+ if claimed_standalone
+ update!(status: "failed", error: { "class" => error.class.name, "message" => error.message }, completed_at: Clock.now)
+ emit("turn.failed", error: { "class" => error.class.name, "message" => error.message })
+ end
+ raise
+ end
+
+ def view_media(media, objective:, model:, provider: nil, output_schema: nil, params: {}, metadata: {}, client: nil)
+ claimed_standalone = false
+ case status
+ when "pending"
+ claimed = store.claim_turn(id, from: "pending", to: "running", started_at: Clock.now, heartbeat_at: Clock.now)
+ raise Error, "turn is already running" unless claimed
+
+ @record = claimed
+ @started_at = @record["started_at"]
+ @budget = Budget.resume(store: store, root_turn_id: root_turn_id, limits: budget_limits)
+ claimed_standalone = true
+ emit("turn.started", status: status, model: model)
+ when "running"
+ # Media tools call this while their parent turn is running.
+ else
+ raise Error, "cannot view media for #{status} turn"
+ end
+
+ media_input = MediaInput.wrap(media)
+ media_client = client || agent.effective_client
+ request = {
+ media: media_input,
+ objective: objective,
+ model: model,
+ provider: provider,
+ output_schema: output_schema,
+ params: params || {},
+ metadata: { turn_id: id, conversation_id: conversation.id }.merge(metadata || {})
+ }
+
+ media_client.validate!(model: model)
+ emit("media.requested", request.except(:media).merge(media: media_input.to_h))
+ result = call_media_client(media_client, request)
+ result_cost = Cost.from_usage(result.usage, model: result.model || model)
+ add_usage!(result.usage, cost: result_cost)
+ budget.add_cost!(result_cost.total)
+ analysis = result.media_analyses.first
+ raise Error, "media client returned no media analysis" unless analysis
+
+ persist_media_analysis_message(analysis)
+ output_data = { "type" => "media_analysis", "media_analyses" => [ analysis.to_h ] }
+ emit("media.completed", analysis: analysis.to_h, model: analysis.model || model, provider: analysis.provider || provider&.to_s, media: media_input.to_h, usage: result.usage.to_h, cost: result_cost.to_h, metadata: metadata || {})
+ complete_with_output(analysis.text, output_data: output_data, audit: check_policy(analysis.text, output_data: output_data)) if claimed_standalone
+ analysis
+ rescue StandardError => error
+ emit("media.failed", error: { "class" => error.class.name, "message" => error.message }, metadata: metadata || {}) if status == "running" || claimed_standalone
+ if claimed_standalone
+ update!(status: "failed", error: { "class" => error.class.name, "message" => error.message }, completed_at: Clock.now)
+ emit("turn.failed", error: { "class" => error.class.name, "message" => error.message })
+ end
+ raise
+ end
+
  private
  def model_request
  prompt = SystemPrompt.new(agent: agent, turn: self, conversation: conversation, mode: prompt_mode || agent.effective_prompt_mode(turn: self))
@@ -214,6 +320,26 @@ module TurnKit
  end
  end
 
+ def call_image_client(client, request)
+ kwargs = request.merge(on_event: ->(event) { emit_event(event) })
+ accepted = client.method(:paint).parameters.filter_map do |kind, name|
+ return client.paint(**kwargs) if kind == :keyrest
+
+ name if %i[key keyreq].include?(kind)
+ end
+ client.paint(**kwargs.slice(*accepted))
+ end
+
+ def call_media_client(client, request)
+ kwargs = request.merge(on_event: ->(event) { emit_event(event) })
+ accepted = client.method(:view_media).parameters.filter_map do |kind, name|
+ return client.view_media(**kwargs) if kind == :keyrest
+
+ name if %i[key keyreq].include?(kind)
+ end
+ client.view_media(**kwargs.slice(*accepted))
+ end
+
  def llm_messages
  MessageProjection.for(TurnKit::Compaction.project(conversation.messages_for_turn(self)))
  end
@@ -271,12 +397,28 @@ module TurnKit
  )
  emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
  result.tool_calls.each { |call| emit("tool_call.created", id: call.id, name: call.name) }
+ elsif result.image?
+ message = conversation.append_message(role: "assistant", kind: "image", content: result.images.map { |image| image.to_h.merge("type" => "image") }, turn_id: id, metadata: { "output_data" => result.output_data }.compact)
+ emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
+ elsif result.media_analysis?
+ message = conversation.append_message(role: "assistant", kind: "media_analysis", content: result.media_analyses.map { |analysis| analysis.to_h.merge("type" => "media_analysis") }, turn_id: id, metadata: { "output_data" => result.output_data }.compact)
+ emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
  else
  message = conversation.append_message(role: "assistant", kind: "text", text: result.text, turn_id: id, metadata: { "output_data" => result.output_data }.compact)
  emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
  end
  end
 
+ def persist_image_message(image)
+ message = conversation.append_message(role: "assistant", kind: "image", content: [ image.to_h.merge("type" => "image") ], turn_id: id, metadata: { "output_data" => { "type" => "image", "images" => [ image.to_h ] } })
+ emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
+ end
+
+ def persist_media_analysis_message(analysis)
+ message = conversation.append_message(role: "assistant", kind: "media_analysis", content: [ analysis.to_h.merge("type" => "media_analysis") ], turn_id: id, metadata: { "output_data" => { "type" => "media_analysis", "media_analyses" => [ analysis.to_h ] } })
+ emit("message.created", message_id: message.id, role: message.role, kind: message.kind)
+ end
+
  def append_terminal_completion(runner, execution)
  message = runner.completion_message(execution)
  assistant = conversation.append_message(role: "assistant", kind: "text", text: message, turn_id: id)
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module TurnKit
-   VERSION = "0.3.0"
+   VERSION = "0.4.1"
  end
@@ -0,0 +1,30 @@
+ # frozen_string_literal: true
+
+ module TurnKit
+   class ViewMediaTool < Tool
+     class << self
+       %i[model provider output_schema params].each do |name|
+         define_method(name) do |value = nil|
+           instance_variable_set("@#{name}", value) unless value.nil?
+           instance_variable_get("@#{name}")
+         end
+       end
+     end
+
+     def call(turnkit_context:, **arguments)
+       turnkit_context.turn.view_media(
+         media(**arguments),
+         objective: objective(**arguments),
+         model: self.class.model,
+         provider: self.class.provider,
+         output_schema: self.class.output_schema,
+         params: self.class.params || {},
+         metadata: metadata(**arguments)
+       ).to_h
+     end
+
+     def metadata(**)
+       {}
+     end
+   end
+ end
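`ViewMediaTool` defines `model`, `provider`, `output_schema`, and `params` as class-level methods that act as both setter and getter: calling one with an argument stores the value, and calling it with no argument reads it back. The sketch below isolates that pattern so it can run without turnkit. `SketchTool` and `ReceiptReader` are hypothetical stand-ins, and the model name is a placeholder rather than a real model identifier.

```ruby
# Stand-in base class showing only the class-level accessor pattern.
class SketchTool
  class << self
    %i[model provider].each do |name|
      # With an argument: store it on the class. Without: return the stored value.
      define_method(name) do |value = nil|
        instance_variable_set("@#{name}", value) unless value.nil?
        instance_variable_get("@#{name}")
      end
    end
  end
end

# A subclass configures itself declaratively in its class body.
class ReceiptReader < SketchTool
  model "example-vision-model"
end

configured_model = ReceiptReader.model     # value set in the class body
unset_provider   = ReceiptReader.provider  # never set, so nil
parent_model     = SketchTool.model        # class-level ivars are per class, so nil
```

Two properties of this pattern are worth noting. The values live in instance variables of each class object, so a subclass does not inherit a value set on its parent. And because `nil` means "read", a stored value cannot be reset to `nil` through the same method.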
data/lib/turnkit.rb CHANGED
@@ -22,6 +22,9 @@ require_relative "turnkit/client"
  require_relative "turnkit/conversation"
  require_relative "turnkit/message"
  require_relative "turnkit/record"
+ require_relative "turnkit/image_result"
+ require_relative "turnkit/media_input"
+ require_relative "turnkit/media_analysis_result"
  require_relative "turnkit/result"
  require_relative "turnkit/skill"
  require_relative "turnkit/output_audit"
@@ -34,6 +37,8 @@ require_relative "turnkit/store"
  require_relative "turnkit/memory_store"
  require_relative "turnkit/compaction"
  require_relative "turnkit/tool"
+ require_relative "turnkit/image_tool"
+ require_relative "turnkit/view_media_tool"
  require_relative "turnkit/tool_call"
  require_relative "turnkit/tool_execution"
  require_relative "turnkit/sub_agent_tool"
@@ -109,4 +114,14 @@ module TurnKit
    def self.check_output_policy(output, constraints: [], context: {})
      OutputAudit.check(output, constraints: constraints, context: context)
    end
+
+   def self.paint(prompt, model:, provider: nil, size: nil, assume_model_exists: nil, input_images: nil, mask: nil, params: {}, metadata: {}, client: nil)
+     image_client = client || self.client
+     image_client.paint(prompt: prompt, model: model, provider: provider, size: size, assume_model_exists: assume_model_exists, input_images: input_images, mask: mask, params: params, metadata: metadata).images.first
+   end
+
+   def self.view_media(media, objective:, model:, provider: nil, output_schema: nil, params: {}, metadata: {}, client: nil)
+     media_client = client || self.client
+     media_client.view_media(media: media, objective: objective, model: model, provider: provider, output_schema: output_schema, params: params, metadata: metadata).media_analyses.first
+   end
  end
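The new module-level helpers accept an optional `client:` and return only the first element of the client's result (`images.first` or `media_analyses.first`). The sketch below mirrors the shape of `TurnKit.view_media` against a stub client to show how that delegation behaves. It does not load turnkit: `SketchKit`, `StubMediaClient`, `SketchResult`, and `SketchAnalysis` are hypothetical stand-ins, the file name and model name are placeholders, and the real result objects may expose more than the single `text` field assumed here.

```ruby
# Stand-in value objects; the real turnkit result classes are not shown in this diff.
SketchAnalysis = Struct.new(:text, keyword_init: true)
SketchResult   = Struct.new(:media_analyses, keyword_init: true)

# Stub client that records the request it received and returns one canned analysis.
class StubMediaClient
  attr_reader :last_request

  def view_media(**request)
    @last_request = request
    SketchResult.new(media_analyses: [SketchAnalysis.new(text: "stub analysis of #{request[:media]}")])
  end
end

module SketchKit
  # Same keyword shape as the helper in the diff, except that client: is
  # required here because this sketch has no default client to fall back on.
  def self.view_media(media, objective:, model:, client:, provider: nil, output_schema: nil, params: {}, metadata: {})
    client.view_media(
      media: media, objective: objective, model: model, provider: provider,
      output_schema: output_schema, params: params, metadata: metadata
    ).media_analyses.first
  end
end

stub = StubMediaClient.new
analysis = SketchKit.view_media("receipt.png", objective: "Extract the total", model: "example-model", client: stub)
```

Because the helper takes `.first`, a client that returns an empty `media_analyses` array would yield `nil`, so callers may want to guard for that case.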
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: turnkit
  version: !ruby/object:Gem::Version
-   version: 0.3.0
+   version: 0.4.1
  platform: ruby
  authors:
  - Sam Couch
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2026-06-10 00:00:00.000000000 Z
+ date: 2026-06-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: ruby_llm
@@ -57,7 +57,11 @@ files:
  - lib/turnkit/generators/turnkit/install/templates/turn.rb
  - lib/turnkit/generators/turnkit/install_generator.rb
  - lib/turnkit/id.rb
+ - lib/turnkit/image_result.rb
+ - lib/turnkit/image_tool.rb
  - lib/turnkit/load_skill_tool.rb
+ - lib/turnkit/media_analysis_result.rb
+ - lib/turnkit/media_input.rb
  - lib/turnkit/memory_store.rb
  - lib/turnkit/message.rb
  - lib/turnkit/message_projection.rb
@@ -84,6 +88,7 @@ files:
  - lib/turnkit/turn.rb
  - lib/turnkit/usage.rb
  - lib/turnkit/version.rb
+ - lib/turnkit/view_media_tool.rb
  - lib/turnkit/workflow.rb
  homepage: https://github.com/samuelcouch/turnkit
  licenses: