kairos-chain 3.5.0 → 3.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (53)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 330c7ed792c82c5484002e3d4c34eadee17b12a3dff89b4740bf331f49973565
-  data.tar.gz: eaec92628e858a914b10863221c3463e05767e7e01dfaaf5fe430ff667a4c04a
+  metadata.gz: e63245a9c8dd3d8e79b83ad1eba7697cbf2075832b1afbae7080c4b1dea0e0de
+  data.tar.gz: 56ec7236649c644bd73fccbe35989640ac169831e8300531821c46ff01805475
 SHA512:
-  metadata.gz: d55b480528d2a810bf254948ea867f888e6c426e370fff6ec069f146479fbd2e80908b7ea0f2349f713f76763fca34d108eadd5f3592eb51c91125f36466c9bc
-  data.tar.gz: 72d9eeb8df3c7185d6ad411e4bc77305b74d29002c2022e529fcab0de52fd6706c4048f508039c9b94ebff3db96aa6eeb30a69b736fedeb74e67e867ad4ed912
+  metadata.gz: cb2ddc02ed24e10dcd76e6cf6f8149588b135011b8e777c52324961fc06ecb44a6810c1b97cdedda66bbcc8bcca484400a7f6a245dfbe7c1b831b7b2b0b04003
+  data.tar.gz: 455843011cef593d7ff7693d66f22be19dad14f70fe5b26cccc131cae903362089e35367e4410ba91138a9822b9b8adb6823f7ca98420bb6d9811e8f5233f106
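
The new digests can be checked locally. A minimal sketch, assuming the `.gem` archive has already been unpacked so that `data.tar.gz` sits in the current directory; the helper name `sha256_matches?` and the file location are illustrative and not part of the gem:

```ruby
require 'digest'

# Hypothetical helper: compare a file's SHA256 digest with an expected hex string.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Expected value for 3.6.0, copied from the checksums.yaml hunk above.
EXPECTED_DATA_SHA256 =
  '56ec7236649c644bd73fccbe35989640ac169831e8300531821c46ff01805475'

# Only runs the check when the unpacked archive is actually present.
if File.exist?('data.tar.gz')
  ok = sha256_matches?('data.tar.gz', EXPECTED_DATA_SHA256)
  puts ok ? 'data.tar.gz OK' : 'data.tar.gz MISMATCH'
end
```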

data/CHANGELOG.md CHANGED

@@ -4,6 +4,71 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 This project follows [Semantic Versioning](https://semver.org/).
+## [3.6.0] - 2026-03-28
+### Added
+- **Agent SkillSet** — OODA cognitive loop for autonomous task execution
+  - `agent_start`: Initialize agent session with mandate and goal
+  - `agent_step`: Execute one OODA cycle (Observe → Orient → Decide → Act via autoexec)
+  - `agent_status`: View cycle history and active mandates
+  - `agent_stop`: End agent session with reflection
+  - Cumulative progress file (`progress.jsonl`) for cross-cycle continuity
+  - Loop detection via decision_payload summary comparison
+  - Multi-cycle mandate progression with checkpoint
+  - 90 tests across M1-M4 milestones
+- **mcp_client SkillSet** — Connect to external MCP servers as a client
+  - `mcp_connect`: Establish connection to remote MCP server (HTTP JSON-RPC)
+  - `mcp_disconnect`: Close connection and unregister proxy tools
+  - `mcp_list_remote`: List available tools on connected server
+  - `ProxyTool`: Dynamic tool proxying with namespace prefixing
+  - `ConnectionManager`: Singleton with lifecycle management
+  - Dual blacklist (Agent + InvocationContext) for security
+  - ORIENT_TOOLS integration for Agent SkillSet awareness
+  - 25 tests (Client 6, ConnectionManager 7, ProxyTool 4, Registry 3, E2E 5)
+- **Attestation Nudge** (MMP SkillSet) — Proactive attestation prompts
+  - Tracks usage of acquired skills, suggests attestation after threshold
+  - `register_gate(:attestation_nudge)` passive observer (zero L0 changes)
+  - Gate detects `resource_read`/`knowledge_get` access to received skills
+  - In-memory tool_name/file_path indexes for O(1) gate miss path
+  - `flock(LOCK_EX)` atomic JSON file updates
+  - Time-window throttling: `cooldown_hours` + `nudge_interval_hours`
+  - Passive decline: nudge emission starts cooldown
+  - Nudge footer on 5 MMP tools (browse, connect, details, preview, freshness)
+  - `sanitize_for_display` for remote metadata in nudge messages
+  - 39 tests, 4 rounds of multi-LLM review (3/3 APPROVE including Codex)
+- **InvocationContext** — Tool invocation chain tracking
+  - Depth limiting, caller tracking, mandate propagation
+  - Whitelist/blacklist policy enforcement at registry boundary
+  - `derive` method for Agent SkillSet tool_names extraction
+  - 59 tests
+### Changed
+- **L1 Knowledge Consolidation** (4 → 3 skills):
+  - `multi_llm_review_workflow` v3.1: merged with `multi_llm_design_review` (methodology + CLI execution in single skill)
+  - `multi_llm_reviewer_evaluation` v1.1: Codex convergence behavior data, APPROVE signal reliability
+  - `design_to_implementation_workflow` v1.1: self-review phase, implementation review phase, Persona Assembly merge gate
+  - Deleted: `multi_llm_design_review` (absorbed into `multi_llm_review_workflow`)
+  - Self-referential review: v3.0 reviewed by its own multi-LLM process → v3.1
+- **meeting_attest_skill**: Fail-closed when `content_hash` is nil (previously fail-open)
+- **autoexec**: Enhanced `task_dsl` and `plan_store` for Agent SkillSet integration
+### Fixed
+- **Phase 4 review fixes**: Notification method, restore hook, race condition, stale proxy
+- **Mandate save race**: Single atomic save (no update_status then stale save)
+- **Attestation Nudge race condition**: `rebuild_indexes_from(data)` inside `with_locked_data`
+- **Attestation Nudge index staleness**: `mark_attested` rebuilds indexes
+- **Attestation Nudge JSON recovery**: `with_locked_data` recovers from corrupted JSON
+---
 ## [3.5.0] - 2026-03-27
 ### Added
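
The changelog mentions `flock(LOCK_EX)` atomic JSON updates and a `with_locked_data` helper that recovers from corrupted JSON, but the implementation is not part of the hunks shown here. The following is only a sketch of how such a helper could look; the signature, the empty-Hash fallback, and the file name are assumptions:

```ruby
require 'json'
require 'tmpdir'

# Hypothetical stand-in for the gem's `with_locked_data`.
# Takes an exclusive lock, yields the parsed Hash, and writes the
# (possibly mutated) Hash back before the lock is released.
def with_locked_data(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX)
    raw = f.read
    data = begin
      raw.empty? ? {} : JSON.parse(raw)
    rescue JSON::ParserError
      {} # assumed recovery strategy: start over from an empty state
    end
    data = {} unless data.is_a?(Hash)
    result = yield(data)
    f.rewind
    f.write(JSON.generate(data))
    f.flush
    f.truncate(f.pos) # drop leftover bytes if the new content is shorter
    result
  end # closing the file releases the lock
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'nudge_state.json')
  File.write(path, '{not json') # simulate a corrupted state file
  with_locked_data(path) { |d| d['uses'] = d.fetch('uses', 0) + 1 }
  with_locked_data(path) { |d| d['uses'] += 1 }
  puts File.read(path) # prints {"uses":2}
end
```

Doing the read, the mutation, and the write under one lock is what closes the window described under "Attestation Nudge race condition"; any index rebuild would go inside the same block.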

data/lib/kairos_mcp/invocation_context.rb ADDED

@@ -0,0 +1,118 @@
+# frozen_string_literal: true
+require 'securerandom'
+module KairosMcp
+  # Tracks invocation chain metadata for internal tool-to-tool calls.
+  # Carries depth, caller, mandate, and policy (whitelist/blacklist) through
+  # the entire invocation chain. Created by BaseTool#invoke_tool, threaded
+  # through ToolRegistry#call_tool.
+  class InvocationContext
+    MAX_DEPTH = 10
+    attr_reader :depth, :caller_tool, :mandate_id, :token_budget,
+                :whitelist, :blacklist, :root_invocation_id
+    def initialize(depth: 0, caller_tool: nil, mandate_id: nil,
+                   token_budget: nil, whitelist: nil, blacklist: nil,
+                   root_invocation_id: nil)
+      @depth = depth
+      @caller_tool = caller_tool
+      @mandate_id = mandate_id
+      @token_budget = token_budget
+      @whitelist = whitelist
+      @blacklist = blacklist
+      @root_invocation_id = root_invocation_id || SecureRandom.hex(8)
+    end
+    # Create a child context for a nested invocation.
+    # Inherits all policy from the parent; increments depth.
+    def child(caller_tool:)
+      raise DepthExceededError, "Max invocation depth (#{MAX_DEPTH}) exceeded" if @depth >= MAX_DEPTH
+      self.class.new(
+        depth: @depth + 1,
+        caller_tool: caller_tool,
+        mandate_id: @mandate_id,
+        token_budget: @token_budget,
+        whitelist: @whitelist&.dup,
+        blacklist: @blacklist&.dup,
+        root_invocation_id: @root_invocation_id
+      )
+    end
+    # Derive a new context with modified blacklist, preserving all other fields.
+    # Used by agent ACT phase to selectively unblock autoexec tools.
+    # Does NOT increment depth — child() does that at invoke_tool time.
+    def derive(blacklist_remove: [], blacklist_add: [])
+      new_blacklist = Array(@blacklist).dup
+      blacklist_remove.each { |pat| new_blacklist.delete(pat) }
+      blacklist_add.each { |pat| new_blacklist << pat unless new_blacklist.include?(pat) }
+      self.class.new(
+        depth: @depth,
+        caller_tool: @caller_tool,
+        mandate_id: @mandate_id,
+        token_budget: @token_budget,
+        whitelist: @whitelist&.dup,
+        blacklist: new_blacklist.empty? ? nil : new_blacklist,
+        root_invocation_id: @root_invocation_id
+      )
+    end
+    # Serialize to a plain Hash for passing through tool arguments.
+    # Only includes policy-relevant fields (whitelist, blacklist, mandate_id, token_budget).
+    def to_h
+      {
+        'whitelist' => @whitelist,
+        'blacklist' => @blacklist,
+        'mandate_id' => @mandate_id,
+        'token_budget' => @token_budget
+      }
+    end
+    def to_json(*args)
+      require 'json'
+      to_h.to_json(*args)
+    end
+    # Reconstruct policy from a Hash (e.g., parsed from tool arguments).
+    # Only restores policy fields — depth and caller are not transferred.
+    def self.from_h(hash)
+      return nil if hash.nil?
+      new(
+        whitelist: hash['whitelist'],
+        blacklist: hash['blacklist'],
+        mandate_id: hash['mandate_id'],
+        token_budget: hash['token_budget']
+      )
+    end
+    def self.from_json(json_string)
+      require 'json'
+      from_h(JSON.parse(json_string))
+    end
+    # Check if a tool is allowed by whitelist/blacklist policy.
+    # Blacklist is checked first (deny wins). Both use fnmatch patterns.
+    # For namespaced tools (e.g., "peer1/agent_start"), also checks
+    # the bare name ("agent_start") to prevent blacklist bypass via
+    # remote proxy tool namespace prefix.
+    def allowed?(tool_name)
+      names = [tool_name]
+      names << tool_name.split('/').last if tool_name.include?('/')
+      if @blacklist
+        return false if names.any? { |n| @blacklist.any? { |pat| File.fnmatch(pat, n) } }
+      end
+      if @whitelist
+        return names.any? { |n| @whitelist.any? { |pat| File.fnmatch(pat, n) } }
+      end
+      true
+    end
+    class DepthExceededError < StandardError; end
+    class PolicyDeniedError < StandardError; end
+  end
+end
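
To see how the deny-wins policy in `allowed?` behaves, the logic can be restated as a standalone function. This is a sketch for experimentation only: the free function and the tool names are illustrative, while the matching rules mirror the method above.

```ruby
# Standalone restatement of InvocationContext#allowed? (illustrative only).
def allowed?(tool_name, whitelist: nil, blacklist: nil)
  names = [tool_name]
  # For "peer1/agent_start", also test the bare name "agent_start".
  names << tool_name.split('/').last if tool_name.include?('/')

  # Blacklist is evaluated first, so a match here always denies.
  return false if blacklist && names.any? { |n| blacklist.any? { |pat| File.fnmatch(pat, n) } }
  # With a whitelist present, only matching names pass.
  return names.any? { |n| whitelist.any? { |pat| File.fnmatch(pat, n) } } if whitelist

  true
end

puts allowed?('agent_start', blacklist: ['agent_*'])        # false
puts allowed?('peer1/agent_start', blacklist: ['agent_*'])  # false (bare name matches)
puts allowed?('knowledge_get', blacklist: ['agent_*'])      # true
puts allowed?('knowledge_get', whitelist: ['chain_*'])      # false
puts allowed?('chain_status', whitelist: ['chain_*'], blacklist: ['chain_*']) # false
```

The second call is the case the comment in the source describes: the pattern `agent_*` does not match the full string `peer1/agent_start`, so without the bare-name check a namespaced proxy tool would slip past the blacklist.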

data/lib/kairos_mcp/protocol.rb CHANGED

@@ -168,9 +168,10 @@ module KairosMcp
     end
     def handle_tools_list
-      {
-        tools: @tool_registry.list_tools
-      }
+      # Filter namespaced proxy tools (e.g., "peer1/tool") from external clients
+      # to prevent infinite proxy loops. Internal call_tool/tool_exists? still sees them.
+      tools = @tool_registry.list_tools.reject { |t| t[:name].to_s.include?('/') }
+      { tools: tools }
     end
     def handle_tools_call(params)
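
The effect of the new `reject` in `handle_tools_list` can be illustrated on plain data. A small sketch, assuming tool schemas are Hashes with a `:name` key as the hunk implies; the method name `externally_visible` and the sample tools are made up for the example:

```ruby
# Illustrative helper: hide namespaced proxy tools ("peer/tool") from a listing.
def externally_visible(tools)
  tools.reject { |t| t[:name].to_s.include?('/') }
end

tools = [
  { name: 'knowledge_get' },
  { name: 'peer1/knowledge_get' }, # proxy tool registered by mcp_connect
  { name: :agent_status }          # to_s keeps symbol names working
]

p externally_visible(tools).map { |t| t[:name].to_s }
# => ["knowledge_get", "agent_status"]
```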

data/lib/kairos_mcp/skill_tool_adapter.rb CHANGED

@@ -5,8 +5,8 @@ module KairosMcp
   # Adapter that wraps a Skill with tool_config as an MCP Tool
   # This allows skills defined in kairos.rb to be exposed as MCP tools
   class SkillToolAdapter < Tools::BaseTool
-    def initialize(skill, safety = nil)
-      super(safety)
+    def initialize(skill, safety = nil, registry: nil)
+      super(safety, registry: registry)
       @skill = skill
       @tool_config = skill.tool_config
     end

data/lib/kairos_mcp/tool_registry.rb CHANGED

@@ -130,6 +130,9 @@ module KairosMcp
       # Skill-based tools (from kairos.rb with tool block)
       register_skill_tools if skill_tools_enabled?
+      # Restore dynamic proxy tools from active mcp_client connections (Phase 4)
+      restore_dynamic_tools
     end
     # Register tools from enabled SkillSets
@@ -154,26 +157,11 @@ module KairosMcp
       Kairos.skills.each do |skill|
         next unless skill.has_tool?  # Only skills with tool block and executor
-        adapter = SkillToolAdapter.new(skill, @safety)
+        adapter = SkillToolAdapter.new(skill, @safety, registry: self)
         register(adapter)
       end
     end
-    def skill_tools_enabled?
-      SkillsConfig.load['skill_tools_enabled'] == true
-    end
-    def register_if_defined(class_name)
-      klass = Object.const_get(class_name)
-      register(klass.new(@safety))
-    rescue NameError
-      # Class not defined yet (file might not exist), ignore
-    end
-    def register(tool)
-      @tools[tool.name] = tool
-    end
     def set_workspace(roots)
       @safety.set_workspace(roots)
     end
@@ -182,16 +170,71 @@ module KairosMcp
       @tools.values.map(&:to_schema)
     end
-    def call_tool(name, arguments)
+    # Register a pre-built tool instance (e.g., proxy tools from mcp_client).
+    # Cannot overwrite local (non-proxy) tools to prevent accidental replacement.
+    def register_dynamic_tool(tool_instance)
+      name = tool_instance.name
+      existing = @tools[name]
+      if existing && !existing.respond_to?(:remote_name)
+        raise "Cannot override local tool '#{name}' with dynamic registration"
+      end
+      @tools[name] = tool_instance
+    end
+    # Remove a dynamically registered tool (e.g., on mcp_disconnect).
+    def unregister_tool(name)
+      @tools.delete(name)
+    end
+    def call_tool(name, arguments, invocation_context: nil)
       tool = @tools[name]
       unless tool
         raise "Tool not found: #{name}"
       end
+      # Defense-in-depth: enforce invocation policy at the registry boundary.
+      # This duplicates the check in BaseTool#invoke_tool so that direct
+      # call_tool calls with a context also respect whitelist/blacklist.
+      if invocation_context && !invocation_context.allowed?(name)
+        raise InvocationContext::PolicyDeniedError,
+              "Tool '#{name}' blocked by invocation policy at registry boundary"
+      end
       self.class.run_gates(name, arguments, @safety)
       tool.call(arguments)
     rescue GateDeniedError => e
       [{ type: 'text', text: JSON.pretty_generate({ error: 'forbidden', message: e.message }) }]
+    rescue InvocationContext::DepthExceededError, InvocationContext::PolicyDeniedError => e
+      [{ type: 'text', text: JSON.pretty_generate({ error: 'invocation_denied', message: e.message }) }]
+    end
+    private
+    def skill_tools_enabled?
+      SkillsConfig.load['skill_tools_enabled'] == true
+    end
+    def register_if_defined(class_name)
+      klass = Object.const_get(class_name)
+      register(klass.new(@safety, registry: self))
+    rescue NameError
+      # Class not defined yet (file might not exist), ignore
+    end
+    def register(tool)
+      @tools[tool.name] = tool
+    end
+    # Restore dynamic proxy tools from active mcp_client connections.
+    # Called at the end of register_tools so that HTTP-mode registries
+    # (which are recreated per request) pick up existing connections.
+    def restore_dynamic_tools
+      return unless defined?(KairosMcp::SkillSets::McpClient::ConnectionManager)
+      conn_mgr = KairosMcp::SkillSets::McpClient::ConnectionManager.instance
+      conn_mgr.restore_proxy_tools(self, @safety)
+    rescue StandardError
+      nil  # mcp_client SkillSet may not be loaded
     end
   end
 end
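
The override guard in `register_dynamic_tool` relies on duck typing: a tool counts as a proxy if it responds to `remote_name`. The sketch below reproduces that rule with stand-in classes; `DemoLocalTool`, `DemoProxyTool`, and `MiniRegistry` are hypothetical names, not classes from the gem.

```ruby
# Stand-ins: only the proxy variant responds to :remote_name.
DemoLocalTool = Struct.new(:name)
DemoProxyTool = Struct.new(:name, :remote_name)

class MiniRegistry
  def initialize
    @tools = {}
  end

  def register(tool)
    @tools[tool.name] = tool
  end

  # Same rule as the hunk above: proxies may replace proxies, never local tools.
  def register_dynamic_tool(tool)
    existing = @tools[tool.name]
    if existing && !existing.respond_to?(:remote_name)
      raise "Cannot override local tool '#{tool.name}' with dynamic registration"
    end
    @tools[tool.name] = tool
  end

  def unregister_tool(name)
    @tools.delete(name)
  end

  def [](name)
    @tools[name]
  end
end

registry = MiniRegistry.new
registry.register(DemoLocalTool.new('knowledge_get'))
registry.register_dynamic_tool(DemoProxyTool.new('peer1/search', 'search'))
registry.register_dynamic_tool(DemoProxyTool.new('peer1/search', 'search_v2')) # allowed

begin
  registry.register_dynamic_tool(DemoProxyTool.new('knowledge_get', 'knowledge_get'))
rescue RuntimeError => e
  puts e.message
end
```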

data/lib/kairos_mcp/tools/base_tool.rb CHANGED

@@ -1,8 +1,28 @@
+require_relative '../invocation_context'
 module KairosMcp
   module Tools
     class BaseTool
-      def initialize(safety = nil)
+      def initialize(safety = nil, registry: nil)
         @safety = safety
+        @registry = registry
+      end
+      # Invoke another tool through the same ToolRegistry, preserving the
+      # full gate pipeline and invocation policy (whitelist/blacklist/depth).
+      # Only available when the tool was registered with a registry reference.
+      def invoke_tool(tool_name, arguments = {}, context: nil)
+        raise "Tool invocation not available (no registry)" unless @registry
+        ctx = context || InvocationContext.new
+        child_ctx = ctx.child(caller_tool: name)
+        unless child_ctx.allowed?(tool_name)
+          raise InvocationContext::PolicyDeniedError,
+                "Tool '#{tool_name}' blocked by invocation policy (caller: #{name})"
+        end
+        @registry.call_tool(tool_name, arguments, invocation_context: child_ctx)
       end
       def name
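
Because `invoke_tool` falls back to a fresh `InvocationContext` when no `context:` is given, depth only accumulates if each nested call passes its context on. The sketch below shows the limit being hit in that case. `DemoContext` is a trimmed, hypothetical copy that keeps just the depth logic of `InvocationContext#child`.

```ruby
# Trimmed copy of the depth logic; not the gem's class.
class DemoContext
  MAX_DEPTH = 10
  class DepthExceededError < StandardError; end

  attr_reader :depth, :caller_tool

  def initialize(depth: 0, caller_tool: nil)
    @depth = depth
    @caller_tool = caller_tool
  end

  # Each nested invocation gets a child context one level deeper.
  def child(caller_tool:)
    raise DepthExceededError, "Max invocation depth (#{MAX_DEPTH}) exceeded" if @depth >= MAX_DEPTH

    self.class.new(depth: @depth + 1, caller_tool: caller_tool)
  end
end

ctx = DemoContext.new
hops = 0
begin
  loop do
    ctx = ctx.child(caller_tool: "tool_#{hops}")
    hops += 1
  end
rescue DemoContext::DepthExceededError => e
  puts "stopped after #{hops} nested calls: #{e.message}"
  # prints: stopped after 10 nested calls: Max invocation depth (10) exceeded
end
```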

data/lib/kairos_mcp/version.rb CHANGED

@@ -1,4 +1,4 @@
 module KairosMcp
-  VERSION = "3.5.0"
+  VERSION = "3.6.0"
   CHANGELOG_URL = "https://github.com/masaomi/KairosChain_2026/blob/main/CHANGELOG.md"
 end

data/templates/knowledge/design_to_implementation_workflow/design_to_implementation_workflow.md ADDED

@@ -0,0 +1,196 @@
+---
+name: design_to_implementation_workflow
+description: "Full-lifecycle workflow for complex features: design review, self-review, implementation review, and final merge gate. Derived from Service Grant + Attestation Nudge experiments."
+version: "1.1"
+tags:
+  - workflow
+  - implementation
+  - multi-llm
+  - design-review
+  - methodology
+  - self-review
+---
+# Design-to-Implementation Workflow
+## Overview
+A structured workflow for implementing complex features (Tier 2+) that maximizes
+quality through multiple review checkpoints. Each checkpoint finds categorically
+different bugs.
+## Full Lifecycle Model (v1.1)
+```
+┌─────────────────────────────────────────────────────────────┐
+│ DESIGN PHASE                                                │
+│                                                             │
+│  Draft v0.1 ──→ Multi-LLM Review R1 ──→ Fix ──→ v0.2      │
+│                 (structural gaps)                           │
+│                                                             │
+│  v0.2 ──→ Multi-LLM Review R2 ──→ Fix ──→ v0.3            │
+│            (fix correctness)                                │
+│                                                             │
+│  Convergence: 0 FAIL, 2/3+ APPROVE                         │
+├─────────────────────────────────────────────────────────────┤
+│ IMPLEMENTATION PHASE                                        │
+│                                                             │
+│  Implement from v0.3 ──→ Tests pass                        │
+│                                                             │
+│  Self-Review (Agent subagent) ──→ Fix P0/P1                │
+│  (race conditions, edge cases, code quality)                │
+│                                                             │
+│  Tests pass again                                           │
+├─────────────────────────────────────────────────────────────┤
+│ VERIFICATION PHASE                                          │
+│                                                             │
+│  Multi-LLM Implementation Review ──→ Fix                   │
+│  (missing wiring, fail-open, integration gaps)              │
+│                                                             │
+│  Final Multi-LLM Review + Persona Assembly                  │
+│  (merge gate: 3/3 APPROVE = merge-ready)                   │
+└─────────────────────────────────────────────────────────────┘
+```
+## When to Use This Workflow
+| Tier | Scope | Design Review | Self-Review | Impl Review | Final Review |
+|------|-------|--------------|-------------|-------------|--------------|
+| 1 | Single file, known pattern | Skip | Optional | Skip | Skip |
+| 2 | Multi-file, SkillSet feature | 1-2 rounds | Recommended | 1 round | Optional |
+| 3 | Cross-component, new subsystem | 2-3 rounds | Required | 1 round | Required |
+| 3+ | Security-critical | 2-3 rounds | Required | 1 round | Required + Persona Assembly |
+## Phase Details
+### Design Phase
+#### Solo Design (v0.1)
+- Single LLM (Opus-class) produces initial design
+- Include: architecture, component design, schema, error handling, phase boundaries
+- Output: Complete design document with pseudocode
+#### Multi-LLM Review Rounds
+- **3 reviewers**: Claude Opus 4.6 + Codex GPT-5.4 + Composer-2
+- **Convergence criteria**: 0 FAIL, 2/3+ APPROVE
+- **Typical rounds**: 2-3 for Tier 3 complexity
+- **Convergence curve**:
+  - R1: Structural gaps — "this is missing" (existence)
+  - R2: Fix correctness — "the fix is wrong" (accuracy)
+  - R3: Refinement — "minor adjustments" (polish)
+### Implementation Phase
+#### Implementation
+- Single Opus-class LLM for context preservation
+- Follow design document's phase ordering
+- Implement → test within each component before moving to next
+#### Self-Review (NEW in v1.1)
+Before requesting external multi-LLM review, run a self-review using an Agent subagent:
+```
+Agent(subagent_type: "general-purpose"):
+  "Review [file] for bugs, race conditions, edge cases,
+   test coverage gaps. Categorize as P0/P1/P2."
+```
+**Why self-review matters**:
+- Finds P0 bugs cheaply (no external LLM cost)
+- Catches implementation-level issues design review can't see
+- Example: P0 race condition in `rebuild_indexes` (unlocked file read) — found by self-review, invisible to design review
+**What self-review finds** (confirmed in Attestation Nudge session):
+- Race conditions in file I/O patterns
+- Index staleness after state transitions
+- Missing error recovery paths (corrupted JSON)
+- Test coverage gaps for edge cases
+### Verification Phase
+#### Implementation Review (NEW in v1.1)
+After self-review fixes, run full multi-LLM review of the **implemented code** (not design doc):
+**Key difference from design review**: Implementation review finds **categorically different bugs**:
+| Design Review Finds | Implementation Review Finds |
+|--------------------|-----------------------------|
+| "This API doesn't exist" | "This method has no call site" |
+| "The key model is inconsistent" | "The fail-open path is exploitable" |
+| "Session concept is undefined" | "The return type doesn't match the guard" |
+**Attestation Nudge data point**:
+- Design review: 8 findings across 2 rounds (structural + correctness)
+- Implementation review: 5 findings in 1 round (wiring + integration)
+- **Zero overlap** between design and implementation findings
+#### Final Review + Persona Assembly
+For Tier 3+ or pre-merge gates:
+```
+Claude Persona Assembly (4 personas):
+  Kairos    — Philosophical alignment, layer boundaries
+  Guardian  — Security, fail-safe behavior, flock correctness
+  Pragmatist — Code quality, test coverage, performance
+  Skeptic   — What breaks first? Scale? Silent failures?
+```
+**When to use Persona Assembly**:
+- Final merge gate for Tier 3+ features
+- Safety-critical components
+- NOT for intermediate rounds (diminishing returns)
+**Merge criteria**: 3/3 APPROVE with 0 FAIL. Codex APPROVE is the strongest signal (see `multi_llm_reviewer_evaluation`).
+## Effort Level Selection
+| Phase | Effort | Rationale |
+|-------|--------|-----------|
+| Design review | High | Maximize gap detection |
+| Implementation | Medium | Design is detailed; faithful translation |
+| Self-review | Low | Quick Agent pass, fix obvious issues |
+| Implementation review | High | Find wiring/integration bugs |
+| Final review | High | Merge gate with Persona Assembly |
+## Tool Usage During Implementation
+| Tool | Purpose | Timing |
+|------|---------|--------|
+| knowledge_get (L1) | Load domain context | Session start |
+| context_save (L2) | Save session progress | Session end / milestone |
+| Agent (subagent) | Self-review | After implementation, before external review |
+### What NOT to Use During Implementation
+- **Autonomos**: Overhead of observe/orient/decide is wasteful when design document
+  already serves as roadmap. Save for exploratory phases.
+- **autoexec**: Designed for structured JSON step plans, not free-form coding
+- **Agent team**: Context fragmentation across agents. Single LLM preserves
+  cross-component coherence for tightly-coupled implementations.
+## Convergence Data
+### Service Grant (Tier 3, 2026-03-18)
+- Design: v1.0 → v1.4, 3 review rounds, 3 LLMs
+- Design review findings: R1: 8 P0/P1, R2: 2 FAIL + 28 CONCERN, R3: 0 FAIL
+- Implementation: Phase 0-3, 2 rounds implementation review
+- Total bugs found: 8 (design) + 13 (implementation review) + 2 (during coding)
+### Attestation Nudge (Tier 2, 2026-03-28)
+- Design: v0.1 → v0.3, 2 review rounds, 3 LLMs
+- Self-review: 4 fixes (P0-1 race, P1-4 staleness, P1-6 test gap, P2-2 recovery)
+- Implementation review: 3 fixes (missing call site, fail-open attest, escaping)
+- Final review (Persona Assembly): 0 FAIL, 3/3 APPROVE
+- **Codex convergence**: REJECT → REJECT → REJECT → APPROVE (4 rounds)
+## Anti-Patterns
+- Implementing Phase 2+ when Phase 1 prerequisites aren't met
+- Using agent team for implementation (context fragmentation)
+- Skipping self-review (misses cheap P0 fixes)
+- Skipping implementation review (design review can't find wiring bugs)
+- Treating Codex REJECT as "too strict" without investigating (usually substantive)
+- Using Persona Assembly in every round (diminishing returns; save for final gate)
+- Implementing without design review for Tier 3 complexity ("just implement it")
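
The two gates in the workflow document ("0 FAIL, 2/3+ APPROVE" for convergence, "3/3 APPROVE with 0 FAIL" for merge) are simple enough to express as predicates. A sketch under stated assumptions: the document defines no data format, so the verdict symbols (`:approve`, `:concern`, `:reject`, `:fail`) and both method names are invented for illustration.

```ruby
# Design-phase convergence: no FAIL and at least two thirds APPROVE.
def converged?(verdicts)
  return false if verdicts.empty? || verdicts.include?(:fail)

  # Integer arithmetic avoids float comparison: approvals / total >= 2/3.
  verdicts.count(:approve) * 3 >= verdicts.size * 2
end

# Merge gate: every reviewer approves (which also implies 0 FAIL).
def merge_ready?(verdicts)
  !verdicts.empty? && verdicts.all? { |v| v == :approve }
end

puts converged?(%i[approve approve concern])    # true
puts converged?(%i[approve approve fail])       # false
puts merge_ready?(%i[approve approve concern])  # false
puts merge_ready?(%i[approve approve approve])  # true
```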