RubyGems - turnkit - Versions diffs - 0.2.4 → 0.2.6 - Mend

turnkit 0.2.4 → 0.2.6


Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +8 -0
data/README.md +306 -3
data/lib/turnkit/adapters/ruby_llm.rb +12 -1
data/lib/turnkit/agent.rb +23 -2
data/lib/turnkit/client.rb +1 -1
data/lib/turnkit/compaction.rb +406 -0
data/lib/turnkit/conversation.rb +15 -4
data/lib/turnkit/cost.rb +9 -4
data/lib/turnkit/error.rb +1 -0
data/lib/turnkit/message.rb +21 -1
data/lib/turnkit/message_projection.rb +28 -1
data/lib/turnkit/turn.rb +21 -2
data/lib/turnkit/usage.rb +7 -3
data/lib/turnkit/version.rb +1 -1
data/lib/turnkit.rb +3 -0
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 75121664c1e081304931fbf125db92a9abc8b9062f920c7e33f7759b52ce51ec
-  data.tar.gz: ccabe905d199d955d281c936a019995a3bd9bc29c0fc009160ea924de4605835
+  metadata.gz: 34429a11d156c9631705ec193c77c2ad166fb3dffc182a7b730cffd38b52f694
+  data.tar.gz: c497d2042388a33e80c037145e82a6adf1cc47286073441b7fb7f21fcd4a89b7
 SHA512:
-  metadata.gz: ff0fa50aabb4c4b4fd9ea6f3ae78b62a4b020522a083f96605028dca2f4ca50a4fb6a9b98b36070e070d38a36b205ebf343823b520f5b0e5b4fe7a06b643cdce
-  data.tar.gz: beec35d2fc1f51cc6fe674d12d72e0ec1b44722bdcfab28019e9ab2d2ae313c684125989647e6d5d389f80b2df5f98dd33aa3c154e0af7da0885d2b8bec0221c
+  metadata.gz: 330444b7c8964271b8f11ec562f22c331cf6f00d470880082edc1efa263c33708e68b436ed29276c417ec173044993ecb105c05c788fa84405ac34f90f9521a2
+  data.tar.gz: 5bb9900c687ffa6c9eed0678c0d1a36bba08c79ceb0a4ab3767046e772394b4794c5f81f2b9a52142411873f8aea1bc19e148b101b00fc4fd1cb6fe89933f531

data/CHANGELOG.md CHANGED

@@ -1,5 +1,13 @@
 # Changelog
+## 0.2.6 - 2026-06-07
+- Add automatic context compaction for long conversations. TurnKit now stores append-only `context_summary` messages and projects compacted history into future model calls while keeping the full transcript durable.
+## 0.2.5 - 2026-06-06
+- Add per-agent and per-turn provider thinking configuration.
 ## 0.2.4 - 2026-06-06
 - Add Anthropic prompt cache support for stable system prompt sections.

data/README.md CHANGED

@@ -22,12 +22,21 @@ bundle install
 ## Quick Start
-Set a provider key:
+Set a provider key. TurnKit uses RubyLLM under the hood and defaults to Anthropic Claude:
 ```sh
 export ANTHROPIC_API_KEY=...
 ```
+| Provider | Env var | Example model |
+| --- | --- | --- |
+| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-5` |
+| OpenAI | `OPENAI_API_KEY` | `gpt-4.1-mini` |
+| Gemini | `GEMINI_API_KEY` | `gemini-2.5-flash` |
+> [!WARNING]
+> TurnKit defaults to `claude-sonnet-4-5`. If `ANTHROPIC_API_KEY` is unset or blank, set `TurnKit.default_model` to a provider you have configured.
 Create an agent:
 ```ruby
@@ -68,6 +77,52 @@ Set an OpenAI model:
 TurnKit.default_model = "gpt-4.1-mini"
 ```
+Use Gemini:
+```sh
+export GEMINI_API_KEY=...
+```
+Set a Gemini model:
+```ruby
+TurnKit.default_model = "gemini-2.5-flash"
+```
+### Thinking
+Enable provider reasoning or extended thinking per agent:
+```ruby
+agent = TurnKit::Agent.new(
+  name: "reasoner",
+  model: "claude-sonnet-4-5",
+  thinking: { budget: 4_000 }
+)
+```
+Use effort-based thinking for providers that support it:
+```ruby
+agent = TurnKit::Agent.new(
+  name: "reasoner",
+  model: "gemini-2.5-flash",
+  thinking: { effort: :high }
+)
+```
+Override or disable thinking for one turn:
+```ruby
+conversation = agent.conversation
+conversation.ask("Solve this carefully.", thinking: { budget: 8_000 })
+conversation.ask("Answer quickly.", thinking: nil)
+```
+TurnKit passes `thinking` to RubyLLM as `{ effort:, budget: }`. Anthropic requires `budget`; Gemini and OpenRouter can use `effort`, `budget`, or both depending on the model.
+When the provider reports reasoning usage, TurnKit records it as `thinking_tokens` and includes it in usage totals and cost calculation.
 ### Conversations
 Create a conversation:
@@ -93,6 +148,93 @@ turn = conversation.run!
 puts turn.output_text
 ```
+### Context compaction
+TurnKit automatically compacts long conversations. Older messages are summarized for future model calls, while the original transcript remains stored durably.
+```ruby
+conversation = agent.conversation
+conversation.ask("Work through this long task.")
+```
+By default, compaction is enabled and uses the current turn model for the summary call. If a turn runs with `gpt-5`, compaction uses `gpt-5` unless you configure a separate summary model.
+Disable compaction globally:
+```ruby
+TurnKit.compaction = false
+```
+Use a different model for summaries:
+```ruby
+TurnKit.compaction = {
+  model: "gpt-4.1-mini"
+}
+```
+You can also configure the compaction threshold and estimated context limit:
+```ruby
+TurnKit.compaction = {
+  model: "gpt-4.1-mini",
+  threshold: 0.75,
+  context_limit: 128_000
+}
+```
+Configure compaction for one agent:
+```ruby
+agent = TurnKit::Agent.new(
+  name: "engineer",
+  model: "gpt-5",
+  compaction: {
+    model: "gpt-4.1-mini",
+    threshold: 0.75,
+    context_limit: 128_000
+  }
+)
+```
+In this example, normal turns use `gpt-5` and compaction summaries use `gpt-4.1-mini`.
+Override the model for one manual compaction:
+```ruby
+conversation.compact!(model: "gpt-4.1-mini")
+conversation.compact!(focus: "billing migration", model: "gpt-4.1-mini")
+```
+Disable compaction for a single turn:
+```ruby
+conversation.ask("Continue", compact: false)
+```
+Manually compact a conversation:
+```ruby
+conversation.compact!
+conversation.compact!(focus: "billing migration")
+```
+Compaction is append-only: TurnKit stores a `context_summary` message with metadata describing the message range it replaces for model projection. The original messages are not deleted, so `conversation.messages` remains the full durable transcript. Future model calls see a compacted projection that includes a reference-only summary and the recent tail.
+The model-visible projection uses a synthetic summary exchange followed by recent messages:
+```text
+user: What did we do so far?
+assistant: [CONTEXT COMPACTION — REFERENCE ONLY] ...
+user: latest request
+```
+For a local smoke test without calling a real provider, run:
+```sh
+ruby script/manual_compaction.rb
+```
 ### Tools
 Create a tool:
@@ -130,6 +272,76 @@ turn = agent.conversation.ask("Save a short status report.")
 puts turn.output_text
 ```
+#### Defining application tools
+Tools are classes, not instances. Namespaced tools work fine, and the default tool name comes from the class name: `Assistant::Tools::WebSearch` becomes `web_search`.
+```ruby
+module Assistant
+  module Tools
+    class WebSearch < TurnKit::Tool
+      description "Search the web for current information."
+      usage_hint "Use when current external information is needed."
+      parameter :objective, :string, required: true
+      parameter :search_queries, :array, required: false
+      def call(objective:, search_queries: nil, context:)
+        ParallelClient.new.web_search(
+          objective: objective,
+          search_queries: search_queries
+        )
+      end
+    end
+  end
+end
+```
+Register tool classes on the agent:
+```ruby
+agent = TurnKit::Agent.new(
+  name: "researcher",
+  tools: [
+    Assistant::Tools::WebSearch,
+    Assistant::Tools::ReadWebPage
+  ]
+)
+```
+#### Tool context
+Every tool receives a `context:` object. Use it for logging, correlation, persistence, and domain scoping:
+```ruby
+def call(query:, context:)
+  context.turn       # The TurnKit::Turn being run
+  context.execution  # The TurnKit::ToolExecution for this tool call
+  { query: query }
+end
+```
+If your application already uses a `context:` keyword for something else, use `turnkit_context:` instead:
+```ruby
+def call(query:, turnkit_context:)
+  { turn_id: turnkit_context.turn.id, query: query }
+end
+```
+#### Tool return values
+Prefer returning a `Hash`. TurnKit serializes the normalized value as the tool result:
+| Return value | Stored tool result |
+| --- | --- |
+| `Hash` | Keys are stringified. |
+| `Array` | Wrapped as `{ "items" => [...] }`. |
+| Scalar | Wrapped as `{ "result" => value.to_s }`. |
+Avoid returning arbitrary objects unless you convert them to a plain Hash or Array first.
 ### Skills
 Load a skill:
@@ -260,7 +472,7 @@ Create a client:
 ```ruby
 class MyClient < TurnKit::Client
-  def chat(model:, messages:, tools:, instructions:, temperature: nil, metadata: nil)
+  def chat(model:, messages:, tools:, instructions:, temperature: nil, thinking: nil, metadata: nil)
     TurnKit::Result.new(
       text: "provider response",
       model: model,
@@ -295,6 +507,17 @@ Install Rails persistence:
 bin/rails generate turnkit:install
 ```
+The installer creates:
+- `config/initializers/turnkit.rb`
+- `app/models/turnkit/conversation.rb`
+- `app/models/turnkit/turn.rb`
+- `app/models/turnkit/message.rb`
+- `app/models/turnkit/tool_execution.rb`
+- a migration for TurnKit persistence
+The generated migration currently uses `ActiveRecord::Migration[7.1]`. In a newer Rails app, update that version if your app requires it, for example `ActiveRecord::Migration[8.1]`.
 Run migrations:
 ```sh
@@ -307,12 +530,88 @@ Configure Rails:
 TurnKit.store = TurnKit::ActiveRecordStore.new
 ```
+Suggested Rails file layout for your application AI code:
+```text
+app/models/assistant/
+  tools/
+    web_search.rb
+    read_web_page.rb
+  skills/
+  prompts/
+```
+If you prefer to keep AI infrastructure out of `app/models`, add an autoloaded directory such as:
+```text
+app/ai/
+  tools/
+  skills/
+  prompts/
+```
 Reconcile stale turns:
 ```ruby
 TurnKit.reconcile_stale!
 ```
+#### Debugging Rails persistence
+Inspect the latest persisted turn in a Rails console:
+```ruby
+turn = Turnkit::Turn.order(created_at: :desc).first
+turn.status
+turn.error
+turn.output_text
+```
+Check whether the model actually called tools:
+```ruby
+Turnkit::ToolExecution
+  .where(turn_uid: turn.uid)
+  .order(:created_at)
+  .map { |execution|
+    {
+      name: execution.tool_name,
+      status: execution.status,
+      arguments: execution.arguments,
+      result_keys: execution.result&.keys,
+      error: execution.error
+    }
+  }
+```
+#### Live smoke test
+Use a model whose provider key is configured, then run a real tool-using turn:
+```ruby
+TurnKit.default_model = "gpt-4.1-mini"
+agent = TurnKit::Agent.new(
+  name: "researcher",
+  instructions: "Use web_search, then read_web_page, before answering.",
+  tools: [
+    Assistant::Tools::WebSearch,
+    Assistant::Tools::ReadWebPage
+  ]
+)
+turn = agent.conversation.ask(
+  "Search for the TurnKit Ruby gem, read the first useful result, then summarize it."
+)
+puts turn.output_text
+pp Turnkit::ToolExecution
+  .where(turn_uid: turn.id)
+  .order(:created_at)
+  .pluck(:tool_name, :status, :error)
+```
 ## Options
 Configure defaults:
@@ -327,6 +626,7 @@ TurnKit.cost_limit = nil
 TurnKit.cost_rates = {}
 TurnKit.cost_calculator = nil
 TurnKit.prompt_cache = :auto
+TurnKit.compaction = true
 ```
 Override an agent:
@@ -337,7 +637,8 @@ agent = TurnKit::Agent.new(
   model: "gpt-4.1-mini",
   max_iterations: 10,
   timeout: 60,
-  cost_limit: 0.25
+  cost_limit: 0.25,
+  thinking: { effort: :low }
 )
 ```
@@ -350,9 +651,11 @@ agent = TurnKit::Agent.new(
 | `timeout` | Limit seconds per root turn. |
 | `max_tool_executions` | Limit tool calls per root turn. |
 | `cost_limit` | Limit cost per root turn. |
+| `thinking` | Configure provider reasoning or extended thinking per agent. |
 | `cost_rates` | Override prices by model. |
 | `cost_calculator` | Override cost calculation. |
 | `prompt_cache` | Use provider prompt caching. |
+| `compaction` | Enable, disable, or configure automatic context compaction. |
 ## Contributing
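
Note on the README's tool sections: the naming rule (`Assistant::Tools::WebSearch` becomes `web_search`) and the return-value table can be read as two small pure functions. The sketch below is only an illustration of those documented rules, not TurnKit's implementation. `default_tool_name` and `normalize_tool_result` are hypothetical helper names, and the choice to stringify only top-level Hash keys is an assumption, since the README does not say how nested keys are handled.

```ruby
# Hypothetical helpers that mirror the rules stated in the README.
# They are NOT part of TurnKit; names and details are assumptions.

# "Assistant::Tools::WebSearch" -> "web_search"
def default_tool_name(class_name)
  class_name
    .split("::").last                       # drop the namespace
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')      # insert "_" at camel-case boundaries
    .downcase
end

# Normalize a tool's return value the way the README table describes.
def normalize_tool_result(value)
  case value
  when Hash
    value.transform_keys(&:to_s)            # Hash: keys are stringified (top level only here)
  when Array
    { "items" => value }                    # Array: wrapped under "items"
  else
    { "result" => value.to_s }              # Scalar: wrapped as a string under "result"
  end
end

puts default_tool_name("Assistant::Tools::ReadWebPage")
p normalize_tool_result(%w[a b])
p normalize_tool_result(42)
```

Returning a `Hash` from a tool avoids the wrapping step entirely, which is presumably why the README recommends it.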

data/lib/turnkit/adapters/ruby_llm.rb CHANGED

@@ -3,7 +3,7 @@
 module TurnKit
   module Adapters
     class RubyLLM < Client
-      def chat(model:, messages:, tools:, instructions:, temperature: nil, metadata: nil)
+      def chat(model:, messages:, tools:, instructions:, temperature: nil, thinking: nil, metadata: nil)
         require "ruby_llm"
         configure_from_environment
@@ -11,6 +11,7 @@ module TurnKit
         chat = ::RubyLLM.chat(model: model)
         add_instructions(chat, instructions, model: model)
         chat.with_temperature(temperature) if temperature
+        apply_thinking(chat, thinking)
         Array(tools).each { |tool| chat.with_tool(ruby_llm_tool(tool)) }
         Array(messages).each { |message| add_message(chat, message) }
@@ -27,6 +28,11 @@ module TurnKit
           config.openrouter_api_key ||= ENV["OPENROUTER_API_KEY"]
         end
+        def apply_thinking(chat, thinking)
+          thinking = Agent.normalize_thinking(thinking)
+          chat.with_thinking(**thinking) if thinking
+        end
         def complete_without_tool_execution(chat)
           provider = chat.instance_variable_get(:@provider)
           provider.complete(
@@ -123,6 +129,7 @@ module TurnKit
             output_tokens: token_value(response, :output_tokens),
             cached_tokens: token_value(response, :cached_tokens),
             cache_write_tokens: token_value(response, :cache_creation_tokens),
+            thinking_tokens: thinking_token_value(response),
             cost: response_cost(response)
           )
           Result.new(
@@ -137,6 +144,10 @@ module TurnKit
           response.respond_to?(method) ? response.public_send(method).to_i : 0
         end
+        def thinking_token_value(response)
+          token_value(response, :thinking_tokens).nonzero? || token_value(response, :reasoning_tokens)
+        end
         def response_cost(response)
           return unless response.respond_to?(:cost)
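
Note on `thinking_token_value`: the new method relies on `Integer#nonzero?`, which returns the receiver when it is non-zero and `nil` otherwise, so a zero or missing `thinking_tokens` falls through to `reasoning_tokens`. The sketch below copies the two adapter methods from the diff into a standalone script. The `Struct` response classes are hypothetical stand-ins for provider responses, not RubyLLM types.

```ruby
# Stand-in response objects (assumptions for illustration only).
ThinkingResponse  = Struct.new(:thinking_tokens)
ReasoningResponse = Struct.new(:reasoning_tokens)
PlainResponse     = Struct.new(:output_tokens)

# Copied from the adapter: missing readers and nil values both become 0.
def token_value(response, method)
  response.respond_to?(method) ? response.public_send(method).to_i : 0
end

# Copied from the diff: prefer thinking_tokens, fall back to reasoning_tokens.
def thinking_token_value(response)
  token_value(response, :thinking_tokens).nonzero? || token_value(response, :reasoning_tokens)
end

p thinking_token_value(ThinkingResponse.new(120))
p thinking_token_value(ReasoningResponse.new(45))
p thinking_token_value(PlainResponse.new(10))
```

Because `token_value` always returns an Integer, the fallback branch yields `0` rather than `nil` when neither field is present.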

data/lib/turnkit/agent.rb CHANGED

@@ -4,11 +4,11 @@ module TurnKit
   class Agent
     attr_reader :name, :description, :model, :instructions, :tools, :skills, :available_skills, :sub_agents
     attr_reader :client, :store, :max_iterations, :timeout, :cost_limit, :max_depth, :max_tool_executions
-    attr_reader :prompt_sections, :system_prompt, :prompt_mode
+    attr_reader :prompt_sections, :system_prompt, :prompt_mode, :thinking, :compaction
     def initialize(name:, description: "", model: nil, instructions: "", tools: [], skills: [], available_skills: [], sub_agents: [],
       system_prompt: nil, prompt_sections: nil, prompt_mode: nil, client: nil, store: nil,
-      max_iterations: nil, timeout: nil, cost_limit: nil, max_depth: nil, max_tool_executions: nil)
+      max_iterations: nil, timeout: nil, cost_limit: nil, max_depth: nil, max_tool_executions: nil, thinking: nil, compaction: nil)
       @name = name.to_s
       @description = description.to_s
       @model = model
@@ -27,9 +27,26 @@ module TurnKit
       @cost_limit = cost_limit
       @max_depth = max_depth
       @max_tool_executions = max_tool_executions
+      @thinking = self.class.normalize_thinking(thinking)
+      @compaction = compaction
       raise ArgumentError, "name is required" if @name.empty?
     end
+    def self.normalize_thinking(value)
+      return nil if value.nil?
+      attrs = value.respond_to?(:to_h) ? value.to_h : value
+      raise ArgumentError, "thinking must be a hash" unless attrs.is_a?(Hash)
+      attrs = attrs.transform_keys(&:to_sym)
+      unknown = attrs.keys - %i[effort budget]
+      raise ArgumentError, "unknown thinking attributes: #{unknown.join(", ")}" if unknown.any?
+      raise ArgumentError, "thinking requires :effort or :budget" if attrs[:effort].nil? && attrs[:budget].nil?
+      raise ArgumentError, "thinking budget must be an Integer" if attrs[:budget] && !attrs[:budget].is_a?(Integer)
+      attrs.slice(:effort, :budget).compact
+    end
     def conversation(model: nil, subject: nil, metadata: {})
       store = effective_store
       record = store.create_conversation(
@@ -53,6 +70,10 @@ module TurnKit
       model || TurnKit.default_model
     end
+    def effective_thinking
+      thinking
+    end
     def effective_client
       client || TurnKit.client
     end
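
Note on `Agent.normalize_thinking`: the validation rules are easiest to see by running them. The sketch below lifts the method body from the diff into a top-level function (the standalone placement is the only change) and exercises the accepted and rejected shapes.

```ruby
# Body copied from Agent.normalize_thinking in the diff; defined at top level
# here only so the example runs without the gem.
def normalize_thinking(value)
  return nil if value.nil?

  attrs = value.respond_to?(:to_h) ? value.to_h : value
  raise ArgumentError, "thinking must be a hash" unless attrs.is_a?(Hash)

  attrs = attrs.transform_keys(&:to_sym)
  unknown = attrs.keys - %i[effort budget]
  raise ArgumentError, "unknown thinking attributes: #{unknown.join(", ")}" if unknown.any?
  raise ArgumentError, "thinking requires :effort or :budget" if attrs[:effort].nil? && attrs[:budget].nil?
  raise ArgumentError, "thinking budget must be an Integer" if attrs[:budget] && !attrs[:budget].is_a?(Integer)

  attrs.slice(:effort, :budget).compact
end

p normalize_thinking("budget" => 4_000)            # string keys are symbolized
p normalize_thinking(effort: :high, budget: nil)   # nil entries are dropped

# Each of these shapes is rejected with an ArgumentError.
[{ budget: "4000" }, { level: 1 }, {}, "high"].each do |bad|
  begin
    normalize_thinking(bad)
  rescue ArgumentError => error
    puts "#{bad.inspect}: #{error.message}"
  end
end
```

Note that `effort` is not type-checked, so both `:high` and `"high"` pass through unchanged.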

data/lib/turnkit/client.rb CHANGED

@@ -2,7 +2,7 @@
 module TurnKit
   class Client
-    def chat(model:, messages:, tools:, instructions:, temperature: nil, metadata: nil)
+    def chat(model:, messages:, tools:, instructions:, temperature: nil, thinking: nil, metadata: nil)
       raise NotImplementedError
     end
   end

data/lib/turnkit/compaction.rb ADDED

@@ -0,0 +1,406 @@
+# frozen_string_literal: true
+module TurnKit
+  module Compaction
+    DEFAULTS = {
+      "enabled" => true,
+      "threshold" => 0.75,
+      "context_limit" => 128_000,
+      "reserved_tokens" => 20_000,
+      "head_messages" => 0,
+      "tail_messages" => 12,
+      "tail_tokens" => 8_000,
+      "summary_ratio" => 0.20,
+      "min_summary_tokens" => 1_000,
+      "max_summary_tokens" => 12_000,
+      "tool_output_max_chars" => 2_000,
+      "model" => nil,
+      "client" => nil
+    }.freeze
+    KNOWN_KEYS = DEFAULTS.keys.freeze
+    COMPACTION_SYSTEM_PROMPT = <<~TEXT.strip
+      You are an anchored context summarization assistant for TurnKit conversations.
+      Summarize only the conversation history you are given. Recent turns may be kept verbatim outside your summary, so focus on older context that still matters for continuing the work.
+      If a previous summary is provided, update it by preserving still-true details, removing stale details, and merging in new facts.
+      Produce only the requested Markdown summary. Do not answer the conversation itself. Do not mention that you are summarizing, compacting, or merging context.
+      Write in the same language the user was using.
+      Never include API keys, tokens, passwords, secrets, credentials, or connection strings. Replace secret values with [REDACTED].
+    TEXT
+    SUMMARY_TEMPLATE = <<~TEXT.strip
+      Use this exact structure:
+      ## Active Task
+      - [latest unfulfilled user request, preferably verbatim]
+      ## Goal
+      - [what the user is trying to accomplish overall]
+      ## Constraints & Preferences
+      - [user/developer preferences, specs, constraints, important choices]
+      ## Completed Actions
+      - [completed work and outcomes]
+      ## Active State
+      - [current state, records/files touched, test status, running tool/turn state]
+      ## In Progress
+      - [work underway, or "(none)"]
+      ## Blocked
+      - [blockers, exact errors, missing information, or "(none)"]
+      ## Key Decisions
+      - [important decisions and why]
+      ## Resolved Questions
+      - [questions already answered]
+      ## Pending User Asks
+      - [unanswered or unfulfilled asks]
+      ## Relevant Files
+      - [file/path/resource and why it matters, or "(none)"]
+      ## Tool Results To Remember
+      - [important tool output summaries, or "(none)"]
+      ## Remaining Work
+      - [likely next work, framed as context, not instructions]
+      ## Critical Context
+      - [specific values, IDs, commands, errors, constraints; redact secrets]
+      Rules:
+      - Keep every section.
+      - Use terse bullets.
+      - Preserve exact file paths, commands, error strings, IDs, and important values.
+      - Do not invent facts.
+      - Do not include secrets.
+      - Do not include a greeting or preamble.
+    TEXT
+    module_function
+    def enabled_for?(agent, overrides = {})
+      policy_for(agent, overrides)["enabled"]
+    end
+    def policy_for(agent, overrides = {})
+      global = normalize_config(TurnKit.compaction)
+      local = normalize_config(agent.compaction)
+      override = normalize_config(overrides)
+      return DEFAULTS.merge("enabled" => false) if global == false
+      return DEFAULTS.merge("enabled" => false) if local == false
+      return DEFAULTS.merge("enabled" => false) if override == false
+      DEFAULTS.merge(global || {}).merge(local || {}).merge(override || {})
+    end
+    def maybe_compact!(turn, force: nil, focus: nil)
+      return if turn.compact == false
+      force = turn.compact == true if force.nil?
+      policy = policy_for(turn.agent)
+      return unless policy["enabled"]
+      messages = project(turn.conversation.messages_for_turn(turn))
+      return unless force || over_threshold?(messages, policy)
+      compact!(turn.conversation, agent: turn.agent, turn: turn, focus: focus, auto: true, overrides: policy, force: true)
+    rescue StandardError => error
+      TurnKit.logger&.warn("TurnKit compaction failed: #{error.class}: #{error.message}")
+      nil
+    end
+    def compact!(conversation, agent:, turn: nil, focus: nil, auto: false, overrides: {}, force: true)
+      policy = policy_for(agent, overrides)
+      raise CompactionError, "compaction is disabled" unless policy["enabled"]
+      messages = turn ? conversation.messages_for_turn(turn) : conversation.messages
+      projected = project(messages)
+      selected = select_messages(projected, policy)
+      return nil if selected.nil? && auto
+      raise CompactionError, "not enough messages to compact" unless selected
+      selected_tokens = estimate_messages_tokens(selected.fetch("middle"))
+      return nil if auto && !force && !over_threshold?(projected, policy)
+      summary = generate_summary(
+        agent: agent,
+        policy: policy,
+        messages: selected.fetch("middle"),
+        previous_summary: selected["previous_summary"]&.text,
+        focus: focus,
+        target_tokens: summary_budget(selected_tokens, policy),
+        fallback_model: turn&.model || conversation.model || agent.effective_model,
+        conversation_id: conversation.id,
+        turn_id: turn&.id
+      )
+      append_summary(conversation, turn: turn, summary: summary, selected: selected, policy: policy, focus: focus, auto: auto, input_tokens: selected_tokens)
+    rescue CompactionError
+      raise
+    rescue StandardError => error
+      raise CompactionError, "#{error.class}: #{error.message}"
+    end
+    def project(messages)
+      rows = Array(messages).sort_by { |message| [ message.sequence.to_i, message.id ] }
+      summaries = active_summaries(rows)
+      ranges = summaries.filter_map { |summary| range_for(summary) }
+      summaries_by_id = summaries.to_h { |summary| [ summary.id, summary ] }
+      inserted = {}
+      projected = []
+      rows.each do |message|
+        summaries.each do |summary|
+          range = range_for(summary)
+          next unless range
+          next if inserted[summary.id]
+          next unless range.begin <= message.sequence.to_i
+          projected << summary
+          inserted[summary.id] = true
+        end
+        if message.context_summary?
+          projected << message if summaries_by_id[message.id] && !inserted[message.id] && !range_for(message)
+          inserted[message.id] = true if summaries_by_id[message.id]
+          next
+        end
+        next if ranges.any? { |range| range.cover?(message.sequence.to_i) }
+        projected << message
+      end
+      summaries.each do |summary|
+        next if inserted[summary.id]
+        projected << summary
+        inserted[summary.id] = true
+      end
+      projected
+    end
+    def estimate_messages_tokens(messages)
+      Array(messages).sum { |message| estimate_text_tokens(message.text) + 8 }
+    end
+    def estimate_text_tokens(text)
+      (text.to_s.length / 4.0).ceil
+    end
+    def summary_budget(input_tokens, policy)
+      budget = (input_tokens.to_i * policy["summary_ratio"].to_f).ceil
+      budget = [ budget, policy["min_summary_tokens"].to_i ].max
+      [ budget, policy["max_summary_tokens"].to_i ].min
+    end
+    def over_threshold?(messages, policy)
+      usable = [ policy["context_limit"].to_i - policy["reserved_tokens"].to_i, 1 ].max
+      estimate_messages_tokens(messages) >= (usable * policy["threshold"].to_f)
+    end
+    def select_messages(messages, policy)
+      rows = Array(messages)
+      return nil if rows.length <= policy["head_messages"].to_i + 1
+      previous_summary = rows.reverse.find(&:context_summary?)
+      candidates = rows.reject(&:context_summary?)
+      return nil if candidates.length <= policy["head_messages"].to_i + 1
+      head_count = policy["head_messages"].to_i
+      tail_start = tail_start_index(candidates, policy)
+      tail_start = [ tail_start, head_count ].max
+      tail_start = expand_tail_start_for_tool_pairs(candidates, tail_start)
+      middle = candidates[head_count...tail_start]
+      return nil if middle.nil? || middle.empty?
+      from_sequence = middle.first.sequence.to_i
+      through_sequence = middle.last.sequence.to_i
+      if previous_summary
+        from_sequence = [ from_sequence, previous_summary.sequence.to_i ].min
+        through_sequence = [ through_sequence, previous_summary.sequence.to_i ].max
+      end
+      {
+        "middle" => middle,
+        "previous_summary" => previous_summary,
+        "replaces_from_sequence" => from_sequence,
+        "replaces_through_sequence" => through_sequence,
+        "tail_start_sequence" => candidates[tail_start]&.sequence
+      }
+    end
+    def build_prompt(previous_summary:, focus:, target_tokens:)
+      parts = []
+      if previous_summary && !previous_summary.empty?
+        parts << <<~TEXT.strip
+          Update the anchored summary below using the conversation history above.
+          Preserve still-true details, remove stale details, and merge in new facts. Remove stale details that are no longer relevant or have been superseded.
+          <previous-summary>
+          #{previous_summary}
+          </previous-summary>
+        TEXT
+      else
+        parts << <<~TEXT.strip
+          Create a structured context checkpoint for the conversation history above.
+          This summary will replace older TurnKit messages in future model prompts while the original messages remain stored durably.
+        TEXT
+      end
+      if focus && !focus.to_s.strip.empty?
+        parts << <<~TEXT.strip
+          Focus topic: "#{focus}"
+          Preserve extra detail related to this focus topic. Summarize unrelated context more aggressively, but do not omit constraints or active blockers that affect the current task.
+        TEXT
+      end
+      parts << "Target length: approximately #{target_tokens} tokens."
+      parts << SUMMARY_TEMPLATE
+      parts.join("\n\n")
+    end
+    def normalize_config(value)
+      case value
+      when nil, true
+        nil
+      when false
+        false
+      when Hash
+        attrs = value.transform_keys(&:to_s)
+        unknown = attrs.keys - KNOWN_KEYS
+        raise ConfigError, "unknown compaction options: #{unknown.join(", ")}" if unknown.any?
+        attrs
+      else
+        raise ConfigError, "compaction must be true, false, nil, or a Hash"
+      end
+    end
+    def range_for(summary)
+      metadata = summary.compaction_metadata
+      from = metadata["replaces_from_sequence"]
+      through = metadata["replaces_through_sequence"]
+      return nil unless from && through
+      (from.to_i..through.to_i)
+    end
+    def active_summaries(messages)
+      summaries = Array(messages).select(&:context_summary?).sort_by { |summary| summary.sequence.to_i }
+      active = []
+      summaries.reverse_each do |summary|
+        next if active.any? { |newer| (range_for(newer)&.cover?(summary.sequence.to_i)) }
+        active << summary
+      end
+      active.reverse
+    end
+    def tail_start_index(messages, policy)
+      max_messages = policy["tail_messages"].to_i
+      max_tokens = policy["tail_tokens"].to_i
+      count = 0
+      tokens = 0
+      index = messages.length
+      (messages.length - 1).downto(0) do |i|
+        message_tokens = estimate_text_tokens(messages[i].text) + 8
+        break if count >= max_messages
+        break if count.positive? && tokens + message_tokens > max_tokens
+        count += 1
+        tokens += message_tokens
+        index = i
+      end
+      index
+    end
+    def expand_tail_start_for_tool_pairs(messages, tail_start)
+      index = tail_start
+      while index.positive? && messages[index]&.tool_result?
+        call_id = messages[index].metadata["tool_call_id"]
+        call_index = (index - 1).downto(0).find do |i|
+          messages[i].tool_call? && Array(messages[i].metadata["tool_calls"]).any? { |call| call["id"] == call_id || call[:id] == call_id }
+        end
+        break unless call_index
+        index = call_index
+      end
+      index
+    end
+    def generate_summary(agent:, policy:, messages:, previous_summary:, focus:, target_tokens:, fallback_model:, conversation_id:, turn_id:)
+      client = policy["client"] || agent.effective_client
+      model = policy["model"] || fallback_model
+      safe_messages = messages.map { |message| sanitize_message(message, policy) }
+      prompt = build_prompt(previous_summary: previous_summary, focus: focus, target_tokens: target_tokens)
+      result = client.chat(
+        model: model,
+        messages: MessageProjection.for(safe_messages) + [ { role: :user, content: prompt } ],
+        tools: [],
+        instructions: COMPACTION_SYSTEM_PROMPT,
+        metadata: { compaction: true, conversation_id: conversation_id, turn_id: turn_id }
+      )
+      text = result.text.to_s.strip
+      raise CompactionError, "compaction model returned an empty summary" if text.empty?
+      text
+    end
+    def sanitize_message(message, policy)
+      return message unless message.tool_result?
+      max = policy["tool_output_max_chars"].to_i
+      return message if max <= 0 || message.text.length <= max
+      attrs = message.to_h
+      text = "#{message.text[0, max]}\n\n[Tool result truncated for compaction]"
+      Message.new(attrs.merge("text" => text, "content" => [ { "type" => "text", "text" => text } ]))
+    end
+    def append_summary(conversation, turn:, summary:, selected:, policy:, focus:, auto:, input_tokens:)
+      model = policy["model"] || turn&.model || conversation.model || conversation.agent.effective_model
+      conversation.append_message(
+        role: "assistant",
+        kind: "context_summary",
+        text: summary,
+        turn_id: turn&.id,
+        metadata: {
+          "compaction" => {
+            "auto" => auto,
+            "focus" => focus,
+            "replaces_from_sequence" => selected.fetch("replaces_from_sequence"),
+            "replaces_through_sequence" => selected.fetch("replaces_through_sequence"),
+            "tail_start_sequence" => selected["tail_start_sequence"],
+            "summary_model" => model,
+            "input_tokens" => input_tokens,
+            "summary_tokens" => estimate_text_tokens(summary),
+            "created_for_turn_id" => turn&.id,
+            "created_at" => Clock.now.iso8601
+          }.compact
+        }
+      )
+    end
+  end
+end
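
Note on the compaction arithmetic: with the `DEFAULTS` above, the usable window is 128,000 - 20,000 = 108,000 estimated tokens, so automatic compaction triggers at 108,000 × 0.75 = 81,000. The sketch below re-implements `estimate_text_tokens`, `over_threshold?`'s trigger point, and `summary_budget` as standalone functions to make those numbers checkable. It operates on plain strings instead of `Message` objects, `trigger_point` is a hypothetical helper name, and `clamp` stands in for the original max-then-min pair (equivalent as long as the minimum does not exceed the maximum).

```ruby
# Subset of the DEFAULTS hash from compaction.rb.
POLICY = {
  "threshold" => 0.75,
  "context_limit" => 128_000,
  "reserved_tokens" => 20_000,
  "summary_ratio" => 0.20,
  "min_summary_tokens" => 1_000,
  "max_summary_tokens" => 12_000
}.freeze

# Rough heuristic from the diff: about four characters per token.
def estimate_text_tokens(text)
  (text.to_s.length / 4.0).ceil
end

# Each message carries a flat 8-token overhead on top of its text.
def estimate_messages_tokens(texts)
  texts.sum { |text| estimate_text_tokens(text) + 8 }
end

# Estimated token count at which over_threshold? starts returning true.
def trigger_point(policy)
  usable = [policy["context_limit"] - policy["reserved_tokens"], 1].max
  usable * policy["threshold"]
end

# Summary target: a ratio of the input, clamped to the min/max bounds.
def summary_budget(input_tokens, policy)
  budget = (input_tokens * policy["summary_ratio"]).ceil
  budget.clamp(policy["min_summary_tokens"], policy["max_summary_tokens"])
end

puts trigger_point(POLICY)
puts estimate_messages_tokens(["a" * 400, ""])
puts summary_budget(2_000, POLICY)
puts summary_budget(100_000, POLICY)
```

Since the estimate is character-based, the real token count for a given provider may differ noticeably; the `reserved_tokens` margin appears to exist partly to absorb that error.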

data/lib/turnkit/conversation.rb CHANGED

@@ -2,6 +2,8 @@
 module TurnKit
   class Conversation
+    THINKING_UNSET = Object.new.freeze
     attr_reader :agent, :id, :store, :model, :subject, :metadata
     def initialize(agent:, record:, store:, model:, subject: nil, metadata: {})
@@ -24,12 +26,16 @@ module TurnKit
       async ? turn : turn.run!
     end
-    def run!(trigger_message_id: nil, model: nil, budget: nil, parent_turn: nil, parent_tool_execution: nil, depth: 0, agent: self.agent)
-      build_turn(trigger_message_id: trigger_message_id, model: model, budget: budget, parent_turn: parent_turn, parent_tool_execution: parent_tool_execution, depth: depth, agent: agent).run!
+    def run!(trigger_message_id: nil, model: nil, budget: nil, parent_turn: nil, parent_tool_execution: nil, depth: 0, agent: self.agent, thinking: THINKING_UNSET, compact: nil)
+      build_turn(trigger_message_id: trigger_message_id, model: model, budget: budget, parent_turn: parent_turn, parent_tool_execution: parent_tool_execution, depth: depth, agent: agent, thinking: thinking, compact: compact).run!
     end
-    def build_turn(trigger_message_id: nil, model: nil, budget: nil, parent_turn: nil, parent_tool_execution: nil, depth: 0, agent: self.agent)
+    def build_turn(trigger_message_id: nil, model: nil, budget: nil, parent_turn: nil, parent_tool_execution: nil, depth: 0, agent: self.agent, thinking: THINKING_UNSET, compact: nil)
       snapshot = latest_message_sequence
+      effective_thinking = thinking.equal?(THINKING_UNSET) ? agent.effective_thinking : Agent.normalize_thinking(thinking)
+      options = { "trigger_message_id" => trigger_message_id }.compact
+      options["thinking"] = effective_thinking
+      options["compact"] = compact unless compact.nil?
       record = store.create_turn(
         "conversation_id" => id,
         "agent_name" => agent.name,
@@ -39,11 +45,16 @@ module TurnKit
         "context_message_sequence" => snapshot,
         "status" => "pending",
         "model" => model || self.model || agent.effective_model,
-        "options" => { "trigger_message_id" => trigger_message_id }.compact
+        "options" => options
       )
       Turn.new(agent: agent, conversation: self, record: record, store: store, budget: budget, depth: depth)
     end
+    def compact!(focus: nil, model: nil)
+      overrides = { "model" => model }.compact
+      TurnKit::Compaction.compact!(self, agent: agent, focus: focus, auto: false, overrides: overrides)
+    end
     def messages
       store.list_messages(id).map { |attrs| Message.new(attrs) }
     end
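
The `THINKING_UNSET` constant above is a sentinel default: it lets `build_turn` tell "the caller passed nothing" apart from "the caller explicitly passed `nil`". A minimal sketch of the pattern, assuming a hypothetical `AGENT_DEFAULT` value and a hypothetical `resolve_thinking` helper (neither is part of TurnKit, and the `Agent.normalize_thinking` step is omitted because its behavior is not shown in this diff):

```ruby
# Sentinel object: unique identity, so no real argument can be confused with it.
UNSET = Object.new.freeze

# Assumed agent-level default, for illustration only.
AGENT_DEFAULT = "medium"

# Hypothetical helper mirroring the check in build_turn.
def resolve_thinking(thinking = UNSET)
  # equal? compares object identity, so nil and false are not treated as "unset".
  return AGENT_DEFAULT if thinking.equal?(UNSET)

  thinking
end

p resolve_thinking          # falls back to the agent default
p resolve_thinking(nil)     # explicit nil is preserved (thinking disabled)
p resolve_thinking("high")  # explicit value wins
```

In the diff, the resolved value is always written to `options["thinking"]`, while `options["compact"]` is written only when `compact` is not `nil`.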

data/lib/turnkit/cost.rb CHANGED

@@ -2,10 +2,10 @@
 module TurnKit
   class Cost
-    COMPONENTS = %i[input output cache_read cache_write].freeze
+    COMPONENTS = %i[input output cache_read cache_write thinking].freeze
     PER_MILLION = 1_000_000.0
-    attr_reader :input, :output, :cache_read, :cache_write
+    attr_reader :input, :output, :cache_read, :cache_write, :thinking
     def self.aggregate(costs)
       costs = costs.compact
@@ -55,6 +55,7 @@ module TurnKit
         output: amount(usage.output_tokens, rates[:output] || rates[:output_per_million]),
         cache_read: amount(usage.cached_tokens, rates[:cache_read] || rates[:cached_input] || rates[:cache_read_input_per_million] || rates[:cached_input_per_million]),
         cache_write: amount(usage.cache_write_tokens, rates[:cache_write] || rates[:cache_creation] || rates[:cache_write_input_per_million] || rates[:cache_creation_input_per_million]),
+        thinking: amount(usage.thinking_tokens, rates[:thinking] || rates[:reasoning] || rates[:thinking_output] || rates[:reasoning_output] || rates[:thinking_output_per_million] || rates[:reasoning_output_per_million]),
         strict: true
       )
     end
@@ -70,7 +71,8 @@ module TurnKit
           input: usage.input_tokens,
           output: usage.output_tokens,
           cached: usage.cached_tokens,
-          cache_creation: usage.cache_write_tokens
+          cache_creation: usage.cache_write_tokens,
+          thinking: usage.thinking_tokens
         )
         from_hash(::RubyLLM::Cost.new(tokens: tokens, model: model_info).to_h)
       else
@@ -92,6 +94,7 @@ module TurnKit
         output: hash[:output],
         cache_read: hash[:cache_read] || hash[:cached_input],
         cache_write: hash[:cache_write] || hash[:cache_creation],
+        thinking: hash[:thinking] || hash[:reasoning] || hash[:thinking_output] || hash[:reasoning_output],
         total: hash[:total]
       )
     end
@@ -119,11 +122,12 @@ module TurnKit
       tokens.to_i * price.to_f / PER_MILLION
     end
-    def initialize(input: nil, output: nil, cache_read: nil, cache_write: nil, total: nil, strict: false)
+    def initialize(input: nil, output: nil, cache_read: nil, cache_write: nil, thinking: nil, total: nil, strict: false)
       @input = number(input)
       @output = number(output)
       @cache_read = number(cache_read)
       @cache_write = number(cache_write)
+      @thinking = number(thinking)
       @total = number(total)
       @strict = strict
     end
@@ -142,6 +146,7 @@ module TurnKit
         "output" => output,
         "cache_read" => cache_read,
         "cache_write" => cache_write,
+        "thinking" => thinking,
         "total" => total
       }.compact
     end
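
The thinking component is priced like the other components: a per-million-token rate looked up under several alias keys, then multiplied by the token count. A rough sketch of that lookup and arithmetic, assuming hypothetical helper names (`thinking_rate`) and an assumed `nil` guard in `amount` that is not visible in the hunks above:

```ruby
PER_MILLION = 1_000_000.0

# Alias keys tried in order, taken from the diff above.
THINKING_RATE_KEYS = %i[
  thinking reasoning thinking_output reasoning_output
  thinking_output_per_million reasoning_output_per_million
].freeze

# Hypothetical helper: first configured alias wins, nil if none is set.
def thinking_rate(rates)
  key = THINKING_RATE_KEYS.find { |k| rates[k] }
  key && rates[key]
end

# Same arithmetic as the diff; the nil guard is an assumption.
def amount(tokens, price)
  return nil if price.nil?

  tokens.to_i * price.to_f / PER_MILLION
end

rates = { input: 3.0, output: 15.0, reasoning: 15.0 } # illustrative rates only
thinking_cost = amount(2_000, thinking_rate(rates))   # 2_000 * 15.0 / 1_000_000
puts thinking_cost
```

With no thinking rate configured, this sketch yields `nil` for the component rather than zero; how the real class treats a missing rate under `strict: true` is not shown here.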

data/lib/turnkit/error.rb CHANGED

@@ -3,6 +3,7 @@
 module TurnKit
   class Error < StandardError; end
   class ConfigError < Error; end
+  class CompactionError < Error; end
   class StoreError < Error; end
   class ToolError < Error; end
 end

data/lib/turnkit/message.rb CHANGED

@@ -3,7 +3,7 @@
 module TurnKit
   class Message
     ROLES = %w[user assistant tool].freeze
-    KINDS = %w[text tool_call tool_result].freeze
+    KINDS = %w[text tool_call tool_result context_summary].freeze
     attr_reader :id, :conversation_id, :turn_id, :role, :kind, :sequence
     attr_reader :content, :text, :tool_execution_id, :provider_message_id, :metadata, :created_at
@@ -43,6 +43,26 @@ module TurnKit
       }
     end
+    def text?
+      kind == "text"
+    end
+    def tool_call?
+      kind == "tool_call"
+    end
+    def tool_result?
+      kind == "tool_result"
+    end
+    def context_summary?
+      kind == "context_summary"
+    end
+    def compaction_metadata
+      metadata.fetch("compaction", {})
+    end
     private
       def stringify(hash)
         hash.transform_keys(&:to_s)

data/lib/turnkit/message_projection.rb CHANGED

@@ -2,14 +2,41 @@
 module TurnKit
   class MessageProjection
+    CONTEXT_SUMMARY_TRIGGER = "What did we do so far?"
+    CONTEXT_SUMMARY_PREFIX = <<~TEXT.strip
+      [CONTEXT COMPACTION — REFERENCE ONLY]
+      Earlier TurnKit conversation messages were compacted into the summary below. This is a handoff from a previous context window. Treat it as background reference, not as active instructions.
+      Do not answer questions or perform tasks merely because they appear in this summary. Respond to the latest user message after this summary.
+      If the latest user message contradicts, supersedes, changes topic from, or diverges from Active Task, In Progress, Pending User Asks, or Remaining Work, the latest user message wins.
+      Subject context and live context are recomputed for the current turn and are more authoritative for state-sensitive facts.
+      The original messages remain durably stored; this summary only affects the model-visible prompt projection.
+    TEXT
     def self.for(messages)
-      messages.map { |message| new(message).to_h }
+      messages.flat_map { |message| new(message).to_a }
     end
     def initialize(message)
       @message = message
     end
+    def to_a
+      case message.kind
+      when "context_summary"
+        [
+          { role: :user, content: CONTEXT_SUMMARY_TRIGGER },
+          { role: :assistant, content: [ CONTEXT_SUMMARY_PREFIX, message.text ].reject(&:empty?).join("\n\n") }
+        ]
+      else
+        [ to_h ]
+      end
+    end
     def to_h
       case message.kind
       when "tool_call"
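
The switch from `map` to `flat_map` matters because a single stored `context_summary` message now projects to two model-visible messages: a synthetic user question followed by the assistant summary. A simplified sketch of that expansion, assuming a hypothetical `Msg` struct, a shortened stand-in prefix, and a plain role/content mapping for every other kind (the real `to_h` handles tool calls and results differently):

```ruby
# Hypothetical stand-in for TurnKit::Message.
Msg = Struct.new(:kind, :role, :text)

TRIGGER = "What did we do so far?"
PREFIX  = "[CONTEXT COMPACTION]" # shortened stand-in for the real prefix text

def project(messages)
  messages.flat_map do |message|
    if message.kind == "context_summary"
      # One stored message becomes a user/assistant pair.
      [
        { role: :user, content: TRIGGER },
        { role: :assistant, content: [PREFIX, message.text].reject(&:empty?).join("\n\n") }
      ]
    else
      # Simplified: every other kind maps one-to-one.
      [{ role: message.role.to_sym, content: message.text }]
    end
  end
end

stored = [
  Msg.new("context_summary", "assistant", "User asked for a refund policy draft."),
  Msg.new("text", "user", "Now shorten it.")
]
projected = project(stored)
puts projected.length
```

Because `flat_map` flattens one level, the result stays a flat array of message hashes even though some inputs expand to two entries.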

data/lib/turnkit/turn.rb CHANGED

@@ -6,7 +6,7 @@ module TurnKit
     attr_reader :agent, :conversation, :store, :budget, :depth
     attr_reader :id, :conversation_id, :agent_name, :parent_turn_id, :parent_tool_execution_id
-    attr_reader :root_turn_id, :context_message_sequence, :model
+    attr_reader :root_turn_id, :context_message_sequence, :model, :thinking, :compact
     attr_reader :started_at
     def initialize(agent:, conversation:, record:, store:, budget: nil, depth: 0)
@@ -22,6 +22,8 @@ module TurnKit
       @root_turn_id = @record["root_turn_id"] || id
       @context_message_sequence = @record["context_message_sequence"].to_i
       @model = @record["model"] || agent.effective_model
+      @thinking = thinking_from_options
+      @compact = compact_from_options
       @started_at = @record["started_at"]
       @budget = budget || agent.build_budget
       @depth = depth
@@ -34,12 +36,14 @@ module TurnKit
       loop do
         budget.check!(depth: depth)
         budget.count_iteration!
+        TurnKit::Compaction.maybe_compact!(self)
         result = agent.effective_client.chat(
           model: model,
           messages: llm_messages,
           tools: agent.effective_tools,
           instructions: agent.system_prompt_for(turn: self, conversation: conversation),
+          thinking: thinking,
           metadata: { turn_id: id, conversation_id: conversation.id }
         )
         result_cost = Cost.from_usage(result.usage, model: result.model || model)
@@ -94,6 +98,8 @@ module TurnKit
     def reload
       @record = store.load_turn(id)
+      @thinking = thinking_from_options
+      @compact = compact_from_options
       self
     end
@@ -103,7 +109,19 @@ module TurnKit
     private
       def llm_messages
-        MessageProjection.for(conversation.messages_for_turn(self))
+        MessageProjection.for(TurnKit::Compaction.project(conversation.messages_for_turn(self)))
+      end
+      def thinking_from_options
+        options = (@record["options"] || {}).transform_keys(&:to_s)
+        return Agent.normalize_thinking(options["thinking"]) if options.key?("thinking")
+        agent.effective_thinking
+      end
+      def compact_from_options
+        options = (@record["options"] || {}).transform_keys(&:to_s)
+        options["compact"] if options.key?("compact")
       end
       def persist_assistant_message(result)
@@ -133,6 +151,7 @@ module TurnKit
           "output_tokens" => current["output_tokens"].to_i + usage.output_tokens,
           "cached_tokens" => current["cached_tokens"].to_i + usage.cached_tokens,
           "cache_write_tokens" => current["cache_write_tokens"].to_i + usage.cache_write_tokens,
+          "thinking_tokens" => current["thinking_tokens"].to_i + usage.thinking_tokens,
           "total_tokens" => current["total_tokens"].to_i + usage.total_tokens
         }
         totals["cost_details"] = aggregate_cost(current["cost_details"], cost).to_h if cost&.total

data/lib/turnkit/usage.rb CHANGED

@@ -2,7 +2,7 @@
 module TurnKit
   class Usage
-    attr_reader :input_tokens, :output_tokens, :cached_tokens, :cache_write_tokens, :cost
+    attr_reader :input_tokens, :output_tokens, :cached_tokens, :cache_write_tokens, :thinking_tokens, :cost
     def self.aggregate(usages)
       usages = usages.compact
@@ -13,6 +13,7 @@ module TurnKit
         output_tokens: usages.sum(&:output_tokens),
         cached_tokens: usages.sum(&:cached_tokens),
         cache_write_tokens: usages.sum(&:cache_write_tokens),
+        thinking_tokens: usages.sum(&:thinking_tokens),
         cost: cost
       )
     end
@@ -29,20 +30,22 @@ module TurnKit
         output_tokens: attrs["output_tokens"],
         cached_tokens: attrs["cached_tokens"],
         cache_write_tokens: attrs["cache_write_tokens"],
+        thinking_tokens: attrs["thinking_tokens"] || attrs["reasoning_tokens"],
         cost: cost
       )
     end
-    def initialize(input_tokens: 0, output_tokens: 0, cached_tokens: 0, cache_write_tokens: 0, cost: nil)
+    def initialize(input_tokens: 0, output_tokens: 0, cached_tokens: 0, cache_write_tokens: 0, thinking_tokens: 0, cost: nil)
       @input_tokens = input_tokens.to_i
       @output_tokens = output_tokens.to_i
       @cached_tokens = cached_tokens.to_i
       @cache_write_tokens = cache_write_tokens.to_i
+      @thinking_tokens = thinking_tokens.to_i
       @cost = cost
     end
     def total_tokens
-      input_tokens + output_tokens + cached_tokens + cache_write_tokens
+      input_tokens + output_tokens + cached_tokens + cache_write_tokens + thinking_tokens
     end
     def to_h
@@ -51,6 +54,7 @@ module TurnKit
         "output_tokens" => output_tokens,
         "cached_tokens" => cached_tokens,
         "cache_write_tokens" => cache_write_tokens,
+        "thinking_tokens" => thinking_tokens,
         "total_tokens" => total_tokens,
         "cost" => cost
       }.compact
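
The effect of the new field on totals can be seen in a reduced sketch. `UsageSketch` is a hypothetical cut-down class, not the gem's `Usage`; it keeps only three counters plus the `reasoning_tokens` fallback shown in the diff:

```ruby
# Hypothetical reduced version of the Usage class, for illustration only.
class UsageSketch
  attr_reader :input_tokens, :output_tokens, :thinking_tokens

  def self.from_h(attrs)
    new(
      input_tokens: attrs["input_tokens"],
      output_tokens: attrs["output_tokens"],
      # Accept either key, as the diff does.
      thinking_tokens: attrs["thinking_tokens"] || attrs["reasoning_tokens"]
    )
  end

  def initialize(input_tokens: 0, output_tokens: 0, thinking_tokens: 0)
    # nil.to_i is 0, so missing counters are tolerated.
    @input_tokens = input_tokens.to_i
    @output_tokens = output_tokens.to_i
    @thinking_tokens = thinking_tokens.to_i
  end

  def total_tokens
    input_tokens + output_tokens + thinking_tokens
  end
end

usage = UsageSketch.from_h("input_tokens" => 100, "output_tokens" => 20, "reasoning_tokens" => 30)
puts usage.total_tokens
```

Since `total_tokens` adds thinking tokens on top of output tokens, a provider that already counts reasoning tokens inside its output figure would be counted twice by this sum; whether the adapter adjusts for that is not visible in this hunk.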

data/lib/turnkit/version.rb CHANGED

@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 module TurnKit
-  VERSION = "0.2.4"
+  VERSION = "0.2.6"
 end

data/lib/turnkit.rb CHANGED

@@ -25,6 +25,7 @@ require_relative "turnkit/prompt_contribution"
 require_relative "turnkit/system_prompt"
 require_relative "turnkit/store"
 require_relative "turnkit/memory_store"
+require_relative "turnkit/compaction"
 require_relative "turnkit/tool"
 require_relative "turnkit/tool_call"
 require_relative "turnkit/tool_execution"
@@ -43,6 +44,7 @@ module TurnKit
     attr_accessor :default_model, :client, :store, :logger
     attr_accessor :max_iterations, :timeout, :max_depth, :max_tool_executions
     attr_accessor :cost_limit, :prompt_cache
+    attr_accessor :compaction
     attr_accessor :cost_rates, :cost_calculator
     attr_accessor :prompt_sections, :prompt_behavior, :available_skills
     attr_accessor :prompt_data_max_chars, :context_contributors
@@ -59,6 +61,7 @@ module TurnKit
   self.max_depth = 3
   self.max_tool_executions = 100
   self.prompt_cache = :auto
+  self.compaction = true
   self.cost_rates = {}
   self.prompt_sections = SystemPrompt::DEFAULT_SECTIONS.dup
   self.prompt_data_max_chars = 20_000

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: turnkit
 version: !ruby/object:Gem::Version
-  version: 0.2.4
+  version: 0.2.6
 platform: ruby
 authors:
 - Sam Couch
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2026-06-06 00:00:00.000000000 Z
+date: 2026-06-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ruby_llm
@@ -42,6 +42,7 @@ files:
 - lib/turnkit/budget.rb
 - lib/turnkit/client.rb
 - lib/turnkit/clock.rb
+- lib/turnkit/compaction.rb
 - lib/turnkit/conversation.rb
 - lib/turnkit/cost.rb
 - lib/turnkit/error.rb