kairos-chain 3.5.0 → 3.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +65 -0
- data/lib/kairos_mcp/invocation_context.rb +118 -0
- data/lib/kairos_mcp/protocol.rb +4 -3
- data/lib/kairos_mcp/skill_tool_adapter.rb +2 -2
- data/lib/kairos_mcp/tool_registry.rb +60 -17
- data/lib/kairos_mcp/tools/base_tool.rb +21 -1
- data/lib/kairos_mcp/version.rb +1 -1
- data/templates/knowledge/design_to_implementation_workflow/design_to_implementation_workflow.md +196 -0
- data/templates/knowledge/multi_llm_review_workflow/multi_llm_review_workflow.md +358 -0
- data/templates/knowledge/multi_llm_reviewer_evaluation/multi_llm_reviewer_evaluation.md +185 -0
- data/templates/skillsets/agent/config/agent.yml +43 -0
- data/templates/skillsets/agent/lib/agent/cognitive_loop.rb +146 -0
- data/templates/skillsets/agent/lib/agent/mandate_adapter.rb +45 -0
- data/templates/skillsets/agent/lib/agent/message_format.rb +33 -0
- data/templates/skillsets/agent/lib/agent/session.rb +193 -0
- data/templates/skillsets/agent/lib/agent.rb +6 -0
- data/templates/skillsets/agent/skillset.json +21 -0
- data/templates/skillsets/agent/test/test_agent_m1.rb +329 -0
- data/templates/skillsets/agent/test/test_agent_m2.rb +625 -0
- data/templates/skillsets/agent/test/test_agent_m3.rb +710 -0
- data/templates/skillsets/agent/test/test_agent_m4.rb +545 -0
- data/templates/skillsets/agent/tools/agent_start.rb +150 -0
- data/templates/skillsets/agent/tools/agent_status.rb +75 -0
- data/templates/skillsets/agent/tools/agent_step.rb +481 -0
- data/templates/skillsets/agent/tools/agent_stop.rb +74 -0
- data/templates/skillsets/autoexec/lib/autoexec/plan_store.rb +46 -14
- data/templates/skillsets/autoexec/lib/autoexec/risk_classifier.rb +26 -0
- data/templates/skillsets/autoexec/lib/autoexec/task_dsl.rb +81 -8
- data/templates/skillsets/autoexec/tools/autoexec_plan.rb +7 -2
- data/templates/skillsets/autoexec/tools/autoexec_run.rb +126 -10
- data/templates/skillsets/mcp_client/config/mcp_client.yml +15 -0
- data/templates/skillsets/mcp_client/lib/mcp_client/client.rb +110 -0
- data/templates/skillsets/mcp_client/lib/mcp_client/connection_manager.rb +127 -0
- data/templates/skillsets/mcp_client/lib/mcp_client/proxy_tool.rb +62 -0
- data/templates/skillsets/mcp_client/lib/mcp_client.rb +5 -0
- data/templates/skillsets/mcp_client/skillset.json +14 -0
- data/templates/skillsets/mcp_client/test/test_mcp_client.rb +487 -0
- data/templates/skillsets/mcp_client/tools/mcp_connect.rb +116 -0
- data/templates/skillsets/mcp_client/tools/mcp_disconnect.rb +55 -0
- data/templates/skillsets/mcp_client/tools/mcp_list_remote.rb +50 -0
- data/templates/skillsets/mmp/config/meeting.yml +6 -0
- data/templates/skillsets/mmp/lib/mmp/attestation_nudge.rb +277 -0
- data/templates/skillsets/mmp/lib/mmp.rb +24 -0
- data/templates/skillsets/mmp/tools/meeting_acquire_skill.rb +13 -0
- data/templates/skillsets/mmp/tools/meeting_attest_skill.rb +19 -0
- data/templates/skillsets/mmp/tools/meeting_browse.rb +11 -1
- data/templates/skillsets/mmp/tools/meeting_check_freshness.rb +12 -2
- data/templates/skillsets/mmp/tools/meeting_connect.rb +12 -1
- data/templates/skillsets/mmp/tools/meeting_get_skill_details.rb +11 -1
- data/templates/skillsets/mmp/tools/meeting_preview_skill.rb +11 -1
- metadata +33 -4
- data/templates/knowledge/multi_llm_design_review/multi_llm_design_review.md +0 -398
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e63245a9c8dd3d8e79b83ad1eba7697cbf2075832b1afbae7080c4b1dea0e0de
+  data.tar.gz: 56ec7236649c644bd73fccbe35989640ac169831e8300531821c46ff01805475
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cb2ddc02ed24e10dcd76e6cf6f8149588b135011b8e777c52324961fc06ecb44a6810c1b97cdedda66bbcc8bcca484400a7f6a245dfbe7c1b831b7b2b0b04003
+  data.tar.gz: 455843011cef593d7ff7693d66f22be19dad14f70fe5b26cccc131cae903362089e35367e4410ba91138a9822b9b8adb6823f7ca98420bb6d9811e8f5233f106
```
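Each value in `checksums.yaml` is a hex digest of one member of the `.gem` archive (`metadata.gz` or `data.tar.gz`). As a sketch of how a consumer might re-check them, the hypothetical `checksum_mismatches` helper below (not part of RubyGems or kairos-chain) compares the recorded digests against member bytes that are assumed to have been extracted already.

```ruby
require 'digest'
require 'yaml'

# Map the algorithm names used as top-level keys in checksums.yaml
# to Ruby's digest classes.
DIGESTS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: returns [algorithm, member] pairs whose recorded digest
# does not match the bytes in `members` (member name => raw bytes).
def checksum_mismatches(checksums_yaml, members)
  recorded = YAML.safe_load(checksums_yaml) || {}
  mismatches = []
  recorded.each do |algorithm, entries|
    digest_class = DIGESTS[algorithm]
    next unless digest_class # skip algorithms this sketch does not know

    entries.each do |member, expected|
      bytes = members[member]
      actual = bytes && digest_class.hexdigest(bytes)
      mismatches << [algorithm, member] unless actual == expected
    end
  end
  mismatches
end

# Build a checksums document from known bytes, then verify against it.
members = { 'metadata.gz' => 'meta', 'data.tar.gz' => 'data' }
yaml = DIGESTS.to_h { |algo, klass| [algo, members.transform_values { |b| klass.hexdigest(b) }] }.to_yaml
checksum_mismatches(yaml, members)                                # => []
checksum_mismatches(yaml, members.merge('data.tar.gz' => 'oops')) # => [["SHA256", "data.tar.gz"], ["SHA512", "data.tar.gz"]]
```

Reading the member bytes out of the `.gem` tar archive is left out here; RubyGems' `Gem::Package::TarReader` is one way to do it.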
data/CHANGELOG.md
CHANGED

```diff
@@ -4,6 +4,71 @@ All notable changes to the `kairos-chain` gem will be documented in this file.
 
 This project follows [Semantic Versioning](https://semver.org/).
 
+## [3.6.0] - 2026-03-28
+
+### Added
+
+- **Agent SkillSet** — OODA cognitive loop for autonomous task execution
+  - `agent_start`: Initialize agent session with mandate and goal
+  - `agent_step`: Execute one OODA cycle (Observe → Orient → Decide → Act via autoexec)
+  - `agent_status`: View cycle history and active mandates
+  - `agent_stop`: End agent session with reflection
+  - Cumulative progress file (`progress.jsonl`) for cross-cycle continuity
+  - Loop detection via decision_payload summary comparison
+  - Multi-cycle mandate progression with checkpoint
+  - 90 tests across M1-M4 milestones
+
+- **mcp_client SkillSet** — Connect to external MCP servers as a client
+  - `mcp_connect`: Establish connection to remote MCP server (HTTP JSON-RPC)
+  - `mcp_disconnect`: Close connection and unregister proxy tools
+  - `mcp_list_remote`: List available tools on connected server
+  - `ProxyTool`: Dynamic tool proxying with namespace prefixing
+  - `ConnectionManager`: Singleton with lifecycle management
+  - Dual blacklist (Agent + InvocationContext) for security
+  - ORIENT_TOOLS integration for Agent SkillSet awareness
+  - 25 tests (Client 6, ConnectionManager 7, ProxyTool 4, Registry 3, E2E 5)
+
+- **Attestation Nudge** (MMP SkillSet) — Proactive attestation prompts
+  - Tracks usage of acquired skills, suggests attestation after threshold
+  - `register_gate(:attestation_nudge)` passive observer (zero L0 changes)
+  - Gate detects `resource_read`/`knowledge_get` access to received skills
+  - In-memory tool_name/file_path indexes for O(1) gate miss path
+  - `flock(LOCK_EX)` atomic JSON file updates
+  - Time-window throttling: `cooldown_hours` + `nudge_interval_hours`
+  - Passive decline: nudge emission starts cooldown
+  - Nudge footer on 5 MMP tools (browse, connect, details, preview, freshness)
+  - `sanitize_for_display` for remote metadata in nudge messages
+  - 39 tests, 4 rounds of multi-LLM review (3/3 APPROVE including Codex)
+
+- **InvocationContext** — Tool invocation chain tracking
+  - Depth limiting, caller tracking, mandate propagation
+  - Whitelist/blacklist policy enforcement at registry boundary
+  - `derive` method for Agent SkillSet tool_names extraction
+  - 59 tests
+
+### Changed
+
+- **L1 Knowledge Consolidation** (4 → 3 skills):
+  - `multi_llm_review_workflow` v3.1: merged with `multi_llm_design_review` (methodology + CLI execution in single skill)
+  - `multi_llm_reviewer_evaluation` v1.1: Codex convergence behavior data, APPROVE signal reliability
+  - `design_to_implementation_workflow` v1.1: self-review phase, implementation review phase, Persona Assembly merge gate
+  - Deleted: `multi_llm_design_review` (absorbed into `multi_llm_review_workflow`)
+  - Self-referential review: v3.0 reviewed by its own multi-LLM process → v3.1
+
+- **meeting_attest_skill**: Fail-closed when `content_hash` is nil (previously fail-open)
+
+- **autoexec**: Enhanced `task_dsl` and `plan_store` for Agent SkillSet integration
+
+### Fixed
+
+- **Phase 4 review fixes**: Notification method, restore hook, race condition, stale proxy
+- **Mandate save race**: Single atomic save (no update_status then stale save)
+- **Attestation Nudge race condition**: `rebuild_indexes_from(data)` inside `with_locked_data`
+- **Attestation Nudge index staleness**: `mark_attested` rebuilds indexes
+- **Attestation Nudge JSON recovery**: `with_locked_data` recovers from corrupted JSON
+
+---
+
 ## [3.5.0] - 2026-03-27
 
 ### Added
```
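The Attestation Nudge entries mention `flock(LOCK_EX)` JSON updates and a `with_locked_data` helper that recovers from corrupted JSON. The file implementing it (`mmp/lib/mmp/attestation_nudge.rb`) is not part of this excerpt, so the following is only a sketch of that general pattern: the helper name `with_locked_json`, the empty-hash fallback, and the rewrite-in-place strategy are illustrative assumptions, not the gem's code.

```ruby
require 'json'

# Sketch: yield the parsed contents of a JSON state file while holding an
# exclusive advisory lock, then write the (possibly mutated) data back.
def with_locked_json(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |file|
    file.flock(File::LOCK_EX) # blocks until no cooperating process holds the lock

    raw = file.read
    data =
      begin
        raw.strip.empty? ? {} : JSON.parse(raw)
      rescue JSON::ParserError
        {} # corrupted content: fall back to an empty state instead of raising
      end

    result = yield data

    file.rewind
    file.write(JSON.generate(data))
    file.flush
    file.truncate(file.pos) # drop leftover bytes when the new JSON is shorter
    result
  end # closing the file releases the lock
end
```

An exclusive `flock` serializes processes that also take the lock; it does not make the rewrite crash-safe, which would need a write-to-temp-file-and-rename step.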
data/lib/kairos_mcp/invocation_context.rb
ADDED

```diff
@@ -0,0 +1,118 @@
+# frozen_string_literal: true
+
+require 'securerandom'
+
+module KairosMcp
+  # Tracks invocation chain metadata for internal tool-to-tool calls.
+  # Carries depth, caller, mandate, and policy (whitelist/blacklist) through
+  # the entire invocation chain. Created by BaseTool#invoke_tool, threaded
+  # through ToolRegistry#call_tool.
+  class InvocationContext
+    MAX_DEPTH = 10
+
+    attr_reader :depth, :caller_tool, :mandate_id, :token_budget,
+                :whitelist, :blacklist, :root_invocation_id
+
+    def initialize(depth: 0, caller_tool: nil, mandate_id: nil,
+                   token_budget: nil, whitelist: nil, blacklist: nil,
+                   root_invocation_id: nil)
+      @depth = depth
+      @caller_tool = caller_tool
+      @mandate_id = mandate_id
+      @token_budget = token_budget
+      @whitelist = whitelist
+      @blacklist = blacklist
+      @root_invocation_id = root_invocation_id || SecureRandom.hex(8)
+    end
+
+    # Create a child context for a nested invocation.
+    # Inherits all policy from the parent; increments depth.
+    def child(caller_tool:)
+      raise DepthExceededError, "Max invocation depth (#{MAX_DEPTH}) exceeded" if @depth >= MAX_DEPTH
+
+      self.class.new(
+        depth: @depth + 1,
+        caller_tool: caller_tool,
+        mandate_id: @mandate_id,
+        token_budget: @token_budget,
+        whitelist: @whitelist&.dup,
+        blacklist: @blacklist&.dup,
+        root_invocation_id: @root_invocation_id
+      )
+    end
+
+    # Derive a new context with modified blacklist, preserving all other fields.
+    # Used by agent ACT phase to selectively unblock autoexec tools.
+    # Does NOT increment depth — child() does that at invoke_tool time.
+    def derive(blacklist_remove: [], blacklist_add: [])
+      new_blacklist = Array(@blacklist).dup
+      blacklist_remove.each { |pat| new_blacklist.delete(pat) }
+      blacklist_add.each { |pat| new_blacklist << pat unless new_blacklist.include?(pat) }
+
+      self.class.new(
+        depth: @depth,
+        caller_tool: @caller_tool,
+        mandate_id: @mandate_id,
+        token_budget: @token_budget,
+        whitelist: @whitelist&.dup,
+        blacklist: new_blacklist.empty? ? nil : new_blacklist,
+        root_invocation_id: @root_invocation_id
+      )
+    end
+
+    # Serialize to a plain Hash for passing through tool arguments.
+    # Only includes policy-relevant fields (whitelist, blacklist, mandate_id, token_budget).
+    def to_h
+      {
+        'whitelist' => @whitelist,
+        'blacklist' => @blacklist,
+        'mandate_id' => @mandate_id,
+        'token_budget' => @token_budget
+      }
+    end
+
+    def to_json(*args)
+      require 'json'
+      to_h.to_json(*args)
+    end
+
+    # Reconstruct policy from a Hash (e.g., parsed from tool arguments).
+    # Only restores policy fields — depth and caller are not transferred.
+    def self.from_h(hash)
+      return nil if hash.nil?
+
+      new(
+        whitelist: hash['whitelist'],
+        blacklist: hash['blacklist'],
+        mandate_id: hash['mandate_id'],
+        token_budget: hash['token_budget']
+      )
+    end
+
+    def self.from_json(json_string)
+      require 'json'
+      from_h(JSON.parse(json_string))
+    end
+
+    # Check if a tool is allowed by whitelist/blacklist policy.
+    # Blacklist is checked first (deny wins). Both use fnmatch patterns.
+    # For namespaced tools (e.g., "peer1/agent_start"), also checks
+    # the bare name ("agent_start") to prevent blacklist bypass via
+    # remote proxy tool namespace prefix.
+    def allowed?(tool_name)
+      names = [tool_name]
+      names << tool_name.split('/').last if tool_name.include?('/')
+
+      if @blacklist
+        return false if names.any? { |n| @blacklist.any? { |pat| File.fnmatch(pat, n) } }
+      end
+      if @whitelist
+        return names.any? { |n| @whitelist.any? { |pat| File.fnmatch(pat, n) } }
+      end
+      true
+    end
+
+    class DepthExceededError < StandardError; end
+    class PolicyDeniedError < StandardError; end
+  end
+end
```
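To see what `allowed?` accepts and rejects without loading the gem, its rules can be restated as a free function. `policy_allows?` below is a hypothetical standalone copy of the method's logic (deny wins, `File.fnmatch` patterns, bare-name check for namespaced tools), not an API that kairos-chain exposes.

```ruby
# Hypothetical standalone restatement of InvocationContext#allowed? (not a
# kairos-chain API), so the policy rules can be tried without the gem.
def policy_allows?(tool_name, whitelist: nil, blacklist: nil)
  names = [tool_name]
  # Namespaced proxy tools ("peer1/agent_start") are also checked by bare name.
  names << tool_name.split('/').last if tool_name.include?('/')

  # Deny wins: a blacklist hit on either name rejects the tool.
  if blacklist && names.any? { |n| blacklist.any? { |pat| File.fnmatch(pat, n) } }
    return false
  end
  # With a whitelist present, some name must match some pattern.
  return names.any? { |n| whitelist.any? { |pat| File.fnmatch(pat, n) } } if whitelist

  true
end

policy_allows?('agent_start')                                     # => true
policy_allows?('peer1/agent_start', blacklist: ['agent_*'])       # => false
policy_allows?('peer1/knowledge_get', whitelist: ['knowledge_*']) # => true
policy_allows?('knowledge_get', whitelist: ['knowledge_*'], blacklist: ['knowledge_get']) # => false
```

Two consequences follow from the rules as written. Because the bare name is also tried against the whitelist, a proxied `peer1/knowledge_get` is admitted by a `knowledge_*` whitelist. And because `File.fnmatch` is called without `File::FNM_PATHNAME`, `*` also matches `/`, so a pattern such as `*_start` matches `peer1/agent_start` directly.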
data/lib/kairos_mcp/protocol.rb
CHANGED

```diff
@@ -168,9 +168,10 @@ module KairosMcp
     end
 
     def handle_tools_list
-
-
-      }
+      # Filter namespaced proxy tools (e.g., "peer1/tool") from external clients
+      # to prevent infinite proxy loops. Internal call_tool/tool_exists? still sees them.
+      tools = @tool_registry.list_tools.reject { |t| t[:name].to_s.include?('/') }
+      { tools: tools }
     end
 
     def handle_tools_call(params)
```
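The `handle_tools_list` change can be illustrated in isolation. `tools_list_response` below is a hypothetical helper (not part of kairos-chain) that applies the same `include?('/')` filter to a list of tool schemas and wraps the result in a JSON-RPC 2.0 response whose `result` has the MCP `tools/list` shape `{ tools: [...] }`; the `schemas` array stands in for `ToolRegistry#list_tools`.

```ruby
require 'json'

# Hypothetical helper: hide namespaced proxy tools ("server/tool") from an
# external tools/list response, using the same filter as the diff.
def tools_list_response(id, schemas)
  visible = schemas.reject { |t| t[:name].to_s.include?('/') }
  { jsonrpc: '2.0', id: id, result: { tools: visible } }
end

schemas = [
  { name: 'knowledge_get', description: 'local tool' },
  { name: 'peer1/knowledge_get', description: 'proxy for a remote server' }
]
puts JSON.generate(tools_list_response(1, schemas))
# prints {"jsonrpc":"2.0","id":1,"result":{"tools":[{"name":"knowledge_get","description":"local tool"}]}}
```

Hiding the proxies only changes what external clients are offered; as the diff's own comment notes, internal `call_tool` lookups still see them.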
data/lib/kairos_mcp/skill_tool_adapter.rb
CHANGED

```diff
@@ -5,8 +5,8 @@ module KairosMcp
   # Adapter that wraps a Skill with tool_config as an MCP Tool
   # This allows skills defined in kairos.rb to be exposed as MCP tools
   class SkillToolAdapter < Tools::BaseTool
-    def initialize(skill, safety = nil)
-      super(safety)
+    def initialize(skill, safety = nil, registry: nil)
+      super(safety, registry: registry)
       @skill = skill
       @tool_config = skill.tool_config
     end
```
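The adapter change matters because `BaseTool#invoke_tool` raises when `@registry` is nil, and a subclass that keeps calling `super(safety)` silently drops the registry. The stand-in classes below (`ToolBase`, `UpdatedAdapter`, and `LegacyAdapter` are hypothetical names, not kairos-chain classes) show the difference.

```ruby
# Stand-in for BaseTool's new constructor: positional safety, keyword registry.
class ToolBase
  attr_reader :registry

  def initialize(safety = nil, registry: nil)
    @safety = safety
    @registry = registry
  end

  # Mirrors the guard at the top of BaseTool#invoke_tool.
  def can_invoke?
    !@registry.nil?
  end
end

# Updated pattern from the diff: accept registry and forward it explicitly.
class UpdatedAdapter < ToolBase
  def initialize(skill, safety = nil, registry: nil)
    super(safety, registry: registry)
    @skill = skill
  end
end

# Pre-3.6.0 pattern: registry is never forwarded, so it stays nil.
class LegacyAdapter < ToolBase
  def initialize(skill, safety = nil)
    super(safety)
    @skill = skill
  end
end

registry = Object.new
UpdatedAdapter.new(:skill, :safety, registry: registry).can_invoke? # => true
LegacyAdapter.new(:skill, :safety).can_invoke?                      # => false
```

A bare `super` (without parentheses) would forward `skill` as well and raise `ArgumentError`, which is why the explicit argument list is needed.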
data/lib/kairos_mcp/tool_registry.rb
CHANGED

```diff
@@ -130,6 +130,9 @@ module KairosMcp
 
       # Skill-based tools (from kairos.rb with tool block)
       register_skill_tools if skill_tools_enabled?
+
+      # Restore dynamic proxy tools from active mcp_client connections (Phase 4)
+      restore_dynamic_tools
     end
 
     # Register tools from enabled SkillSets
@@ -154,26 +157,11 @@ module KairosMcp
 
       Kairos.skills.each do |skill|
         next unless skill.has_tool? # Only skills with tool block and executor
-        adapter = SkillToolAdapter.new(skill, @safety)
+        adapter = SkillToolAdapter.new(skill, @safety, registry: self)
         register(adapter)
       end
     end
 
-    def skill_tools_enabled?
-      SkillsConfig.load['skill_tools_enabled'] == true
-    end
-
-    def register_if_defined(class_name)
-      klass = Object.const_get(class_name)
-      register(klass.new(@safety))
-    rescue NameError
-      # Class not defined yet (file might not exist), ignore
-    end
-
-    def register(tool)
-      @tools[tool.name] = tool
-    end
-
     def set_workspace(roots)
       @safety.set_workspace(roots)
     end
@@ -182,16 +170,71 @@ module KairosMcp
       @tools.values.map(&:to_schema)
     end
 
-
+    # Register a pre-built tool instance (e.g., proxy tools from mcp_client).
+    # Cannot overwrite local (non-proxy) tools to prevent accidental replacement.
+    def register_dynamic_tool(tool_instance)
+      name = tool_instance.name
+      existing = @tools[name]
+      if existing && !existing.respond_to?(:remote_name)
+        raise "Cannot override local tool '#{name}' with dynamic registration"
+      end
+      @tools[name] = tool_instance
+    end
+
+    # Remove a dynamically registered tool (e.g., on mcp_disconnect).
+    def unregister_tool(name)
+      @tools.delete(name)
+    end
+
+    def call_tool(name, arguments, invocation_context: nil)
       tool = @tools[name]
       unless tool
         raise "Tool not found: #{name}"
       end
 
+      # Defense-in-depth: enforce invocation policy at the registry boundary.
+      # This duplicates the check in BaseTool#invoke_tool so that direct
+      # call_tool calls with a context also respect whitelist/blacklist.
+      if invocation_context && !invocation_context.allowed?(name)
+        raise InvocationContext::PolicyDeniedError,
+              "Tool '#{name}' blocked by invocation policy at registry boundary"
+      end
+
       self.class.run_gates(name, arguments, @safety)
       tool.call(arguments)
     rescue GateDeniedError => e
       [{ type: 'text', text: JSON.pretty_generate({ error: 'forbidden', message: e.message }) }]
+    rescue InvocationContext::DepthExceededError, InvocationContext::PolicyDeniedError => e
+      [{ type: 'text', text: JSON.pretty_generate({ error: 'invocation_denied', message: e.message }) }]
+    end
+
+    private
+
+    def skill_tools_enabled?
+      SkillsConfig.load['skill_tools_enabled'] == true
+    end
+
+    def register_if_defined(class_name)
+      klass = Object.const_get(class_name)
+      register(klass.new(@safety, registry: self))
+    rescue NameError
+      # Class not defined yet (file might not exist), ignore
+    end
+
+    def register(tool)
+      @tools[tool.name] = tool
+    end
+
+    # Restore dynamic proxy tools from active mcp_client connections.
+    # Called at the end of register_tools so that HTTP-mode registries
+    # (which are recreated per request) pick up existing connections.
+    def restore_dynamic_tools
+      return unless defined?(KairosMcp::SkillSets::McpClient::ConnectionManager)
+
+      conn_mgr = KairosMcp::SkillSets::McpClient::ConnectionManager.instance
+      conn_mgr.restore_proxy_tools(self, @safety)
+    rescue StandardError
+      nil # mcp_client SkillSet may not be loaded
     end
   end
 end
```
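The rules for `register_dynamic_tool` and `unregister_tool` can be exercised with a cut-down registry. `MiniRegistry`, `LocalTool`, and `ProxyStub` below are hypothetical stand-ins, not kairos-chain classes; as in the diff, a tool counts as a proxy purely because it responds to `remote_name`.

```ruby
# Hypothetical stand-ins: a local tool has no #remote_name, a proxy does.
LocalTool = Struct.new(:name)
ProxyStub = Struct.new(:name, :remote_name)

# Cut-down registry reproducing the dynamic-registration rules from the diff.
class MiniRegistry
  def initialize
    @tools = {}
  end

  # Static registration (private in the real registry; public here for the demo).
  def register(tool)
    @tools[tool.name] = tool
  end

  # Refuses to replace a tool that is not a proxy; proxies may be replaced.
  def register_dynamic_tool(tool)
    existing = @tools[tool.name]
    if existing && !existing.respond_to?(:remote_name)
      raise "Cannot override local tool '#{tool.name}' with dynamic registration"
    end

    @tools[tool.name] = tool
  end

  def unregister_tool(name)
    @tools.delete(name)
  end

  def tool_names
    @tools.keys
  end
end

registry = MiniRegistry.new
registry.register(LocalTool.new('knowledge_get'))
registry.register_dynamic_tool(ProxyStub.new('peer1/knowledge_get', 'knowledge_get'))
registry.register_dynamic_tool(ProxyStub.new('peer1/knowledge_get', 'knowledge_get')) # replacing a proxy is allowed
registry.tool_names # => ["knowledge_get", "peer1/knowledge_get"]

begin
  registry.register_dynamic_tool(ProxyStub.new('knowledge_get', 'knowledge_get'))
rescue RuntimeError => e
  puts e.message # prints: Cannot override local tool 'knowledge_get' with dynamic registration
end

registry.unregister_tool('peer1/knowledge_get')
registry.tool_names # => ["knowledge_get"]
```

Because the check is duck-typed, any tool object that defines `remote_name` is treated as replaceable by `register_dynamic_tool`.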
data/lib/kairos_mcp/tools/base_tool.rb
CHANGED

```diff
@@ -1,8 +1,28 @@
+require_relative '../invocation_context'
+
 module KairosMcp
   module Tools
     class BaseTool
-      def initialize(safety = nil)
+      def initialize(safety = nil, registry: nil)
         @safety = safety
+        @registry = registry
+      end
+
+      # Invoke another tool through the same ToolRegistry, preserving the
+      # full gate pipeline and invocation policy (whitelist/blacklist/depth).
+      # Only available when the tool was registered with a registry reference.
+      def invoke_tool(tool_name, arguments = {}, context: nil)
+        raise "Tool invocation not available (no registry)" unless @registry
+
+        ctx = context || InvocationContext.new
+        child_ctx = ctx.child(caller_tool: name)
+
+        unless child_ctx.allowed?(tool_name)
+          raise InvocationContext::PolicyDeniedError,
+                "Tool '#{tool_name}' blocked by invocation policy (caller: #{name})"
+        end
+
+        @registry.call_tool(tool_name, arguments, invocation_context: child_ctx)
       end
 
       def name
```
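`invoke_tool` derives a child context on every hop, and `InvocationContext#child` raises once the parent's depth has reached `MAX_DEPTH` (10). The self-contained model below (hypothetical names `MiniContext`, `DepthExceeded`, and `runaway_chain`; not kairos-chain code) shows how far a self-recursive tool chain gets before that guard fires.

```ruby
class DepthExceeded < StandardError; end

# Minimal model of InvocationContext's depth bookkeeping.
class MiniContext
  MAX_DEPTH = 10
  attr_reader :depth

  def initialize(depth: 0)
    @depth = depth
  end

  # Same guard as InvocationContext#child: refuse once depth reaches MAX_DEPTH.
  def child
    raise DepthExceeded, "Max invocation depth (#{MAX_DEPTH}) exceeded" if depth >= MAX_DEPTH

    MiniContext.new(depth: depth + 1)
  end
end

# A tool that keeps invoking itself with the context it was given.
# Returns the deepest depth reached before the guard stopped it.
def runaway_chain(ctx)
  runaway_chain(ctx.child)
rescue DepthExceeded
  ctx.depth
end

runaway_chain(MiniContext.new) # => 10
```

In the real code, passing `context: nil` starts a fresh root context at depth 0, and `from_h` does not restore depth, so depth accumulates only while each hop passes its context along.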
data/lib/kairos_mcp/version.rb
CHANGED
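Only the `version.rb` file name survives in this excerpt, but the package moves from 3.5.0 to 3.6.0 and the CHANGELOG states that the project follows Semantic Versioning, so this is a minor release. RubyGems' `Gem::Version` and `Gem::Requirement` can show how common constraint styles in a dependent's Gemfile or gemspec treat the new version:

```ruby
require 'rubygems'

new_version = Gem::Version.new('3.6.0')

# Pessimistic constraints: "~> 3.5" means >= 3.5 and < 4.0,
# while "~> 3.5.0" means >= 3.5.0 and < 3.6.
{
  '~> 3.5' => Gem::Requirement.new('~> 3.5'),
  '~> 3.5.0' => Gem::Requirement.new('~> 3.5.0'),
  '>= 3.5.0' => Gem::Requirement.new('>= 3.5.0')
}.each do |label, requirement|
  puts "#{label}: #{requirement.satisfied_by?(new_version)}"
end
# prints:
# ~> 3.5: true
# ~> 3.5.0: false
# >= 3.5.0: true
```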
data/templates/knowledge/design_to_implementation_workflow/design_to_implementation_workflow.md
ADDED

```diff
@@ -0,0 +1,196 @@
+---
+name: design_to_implementation_workflow
+description: "Full-lifecycle workflow for complex features: design review, self-review, implementation review, and final merge gate. Derived from Service Grant + Attestation Nudge experiments."
+version: "1.1"
+tags:
+- workflow
+- implementation
+- multi-llm
+- design-review
+- methodology
+- self-review
+---
+
+# Design-to-Implementation Workflow
+
+## Overview
+
+A structured workflow for implementing complex features (Tier 2+) that maximizes
+quality through multiple review checkpoints. Each checkpoint finds categorically
+different bugs.
+
+## Full Lifecycle Model (v1.1)
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│  DESIGN PHASE                                               │
+│                                                             │
+│  Draft v0.1 ──→ Multi-LLM Review R1 ──→ Fix ──→ v0.2        │
+│                 (structural gaps)                           │
+│                                                             │
+│  v0.2 ──→ Multi-LLM Review R2 ──→ Fix ──→ v0.3              │
+│           (fix correctness)                                 │
+│                                                             │
+│  Convergence: 0 FAIL, 2/3+ APPROVE                          │
+├─────────────────────────────────────────────────────────────┤
+│  IMPLEMENTATION PHASE                                       │
+│                                                             │
+│  Implement from v0.3 ──→ Tests pass                         │
+│                                                             │
+│  Self-Review (Agent subagent) ──→ Fix P0/P1                 │
+│    (race conditions, edge cases, code quality)              │
+│                                                             │
+│  Tests pass again                                           │
+├─────────────────────────────────────────────────────────────┤
+│  VERIFICATION PHASE                                         │
+│                                                             │
+│  Multi-LLM Implementation Review ──→ Fix                    │
+│    (missing wiring, fail-open, integration gaps)            │
+│                                                             │
+│  Final Multi-LLM Review + Persona Assembly                  │
+│    (merge gate: 3/3 APPROVE = merge-ready)                  │
+└─────────────────────────────────────────────────────────────┘
+```
+
+## When to Use This Workflow
+
+| Tier | Scope | Design Review | Self-Review | Impl Review | Final Review |
+|------|-------|--------------|-------------|-------------|--------------|
+| 1 | Single file, known pattern | Skip | Optional | Skip | Skip |
+| 2 | Multi-file, SkillSet feature | 1-2 rounds | Recommended | 1 round | Optional |
+| 3 | Cross-component, new subsystem | 2-3 rounds | Required | 1 round | Required |
+| 3+ | Security-critical | 2-3 rounds | Required | 1 round | Required + Persona Assembly |
+
+## Phase Details
+
+### Design Phase
+
+#### Solo Design (v0.1)
+- Single LLM (Opus-class) produces initial design
+- Include: architecture, component design, schema, error handling, phase boundaries
+- Output: Complete design document with pseudocode
+
+#### Multi-LLM Review Rounds
+- **3 reviewers**: Claude Opus 4.6 + Codex GPT-5.4 + Composer-2
+- **Convergence criteria**: 0 FAIL, 2/3+ APPROVE
+- **Typical rounds**: 2-3 for Tier 3 complexity
+- **Convergence curve**:
+  - R1: Structural gaps — "this is missing" (existence)
+  - R2: Fix correctness — "the fix is wrong" (accuracy)
+  - R3: Refinement — "minor adjustments" (polish)
+
+### Implementation Phase
+
+#### Implementation
+- Single Opus-class LLM for context preservation
+- Follow design document's phase ordering
+- Implement → test within each component before moving to next
+
+#### Self-Review (NEW in v1.1)
+
+Before requesting external multi-LLM review, run a self-review using an Agent subagent:
+
+```
+Agent(subagent_type: "general-purpose"):
+  "Review [file] for bugs, race conditions, edge cases,
+   test coverage gaps. Categorize as P0/P1/P2."
+```
+
+**Why self-review matters**:
+- Finds P0 bugs cheaply (no external LLM cost)
+- Catches implementation-level issues design review can't see
+- Example: P0 race condition in `rebuild_indexes` (unlocked file read) — found by self-review, invisible to design review
+
+**What self-review finds** (confirmed in Attestation Nudge session):
+- Race conditions in file I/O patterns
+- Index staleness after state transitions
+- Missing error recovery paths (corrupted JSON)
+- Test coverage gaps for edge cases
+
+### Verification Phase
+
+#### Implementation Review (NEW in v1.1)
+
+After self-review fixes, run full multi-LLM review of the **implemented code** (not design doc):
+
+**Key difference from design review**: Implementation review finds **categorically different bugs**:
+
+| Design Review Finds | Implementation Review Finds |
+|--------------------|-----------------------------|
+| "This API doesn't exist" | "This method has no call site" |
+| "The key model is inconsistent" | "The fail-open path is exploitable" |
+| "Session concept is undefined" | "The return type doesn't match the guard" |
+
+**Attestation Nudge data point**:
+- Design review: 8 findings across 2 rounds (structural + correctness)
+- Implementation review: 5 findings in 1 round (wiring + integration)
+- **Zero overlap** between design and implementation findings
+
+#### Final Review + Persona Assembly
+
+For Tier 3+ or pre-merge gates:
+
+```
+Claude Persona Assembly (4 personas):
+  Kairos — Philosophical alignment, layer boundaries
+  Guardian — Security, fail-safe behavior, flock correctness
+  Pragmatist — Code quality, test coverage, performance
+  Skeptic — What breaks first? Scale? Silent failures?
+```
+
+**When to use Persona Assembly**:
+- Final merge gate for Tier 3+ features
+- Safety-critical components
+- NOT for intermediate rounds (diminishing returns)
+
+**Merge criteria**: 3/3 APPROVE with 0 FAIL. Codex APPROVE is the strongest signal (see `multi_llm_reviewer_evaluation`).
+
+## Effort Level Selection
+
+| Phase | Effort | Rationale |
+|-------|--------|-----------|
+| Design review | High | Maximize gap detection |
+| Implementation | Medium | Design is detailed; faithful translation |
+| Self-review | Low | Quick Agent pass, fix obvious issues |
+| Implementation review | High | Find wiring/integration bugs |
+| Final review | High | Merge gate with Persona Assembly |
+
+## Tool Usage During Implementation
+
+| Tool | Purpose | Timing |
+|------|---------|--------|
+| knowledge_get (L1) | Load domain context | Session start |
+| context_save (L2) | Save session progress | Session end / milestone |
+| Agent (subagent) | Self-review | After implementation, before external review |
+
+### What NOT to Use During Implementation
+- **Autonomos**: Overhead of observe/orient/decide is wasteful when design document
+  already serves as roadmap. Save for exploratory phases.
+- **autoexec**: Designed for structured JSON step plans, not free-form coding
+- **Agent team**: Context fragmentation across agents. Single LLM preserves
+  cross-component coherence for tightly-coupled implementations.
+
+## Convergence Data
+
+### Service Grant (Tier 3, 2026-03-18)
+- Design: v1.0 → v1.4, 3 review rounds, 3 LLMs
+- Design review findings: R1: 8 P0/P1, R2: 2 FAIL + 28 CONCERN, R3: 0 FAIL
+- Implementation: Phase 0-3, 2 rounds implementation review
+- Total bugs found: 8 (design) + 13 (implementation review) + 2 (during coding)
+
+### Attestation Nudge (Tier 2, 2026-03-28)
+- Design: v0.1 → v0.3, 2 review rounds, 3 LLMs
+- Self-review: 4 fixes (P0-1 race, P1-4 staleness, P1-6 test gap, P2-2 recovery)
+- Implementation review: 3 fixes (missing call site, fail-open attest, escaping)
+- Final review (Persona Assembly): 0 FAIL, 3/3 APPROVE
+- **Codex convergence**: REJECT → REJECT → REJECT → APPROVE (4 rounds)
+
+## Anti-Patterns
+
+- Implementing Phase 2+ when Phase 1 prerequisites aren't met
+- Using agent team for implementation (context fragmentation)
+- Skipping self-review (misses cheap P0 fixes)
+- Skipping implementation review (design review can't find wiring bugs)
+- Treating Codex REJECT as "too strict" without investigating (usually substantive)
+- Using Persona Assembly in every round (diminishing returns; save for final gate)
+- Implementing without design review for Tier 3 complexity ("just implement it")
```
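The workflow states two numeric gates: design-review convergence at 0 FAIL with 2/3+ APPROVE, and the merge gate at 3/3 APPROVE with 0 FAIL. A hedged sketch encoding them follows; the method names and the input representation (one verdict string per reviewer plus a count of FAIL findings) are assumptions, not part of kairos-chain or the knowledge file.

```ruby
# Design-phase convergence: no FAIL findings and at least two thirds of
# reviewers returning APPROVE (integer arithmetic avoids float rounding).
def design_converged?(verdicts, fail_count)
  return false if verdicts.empty?

  fail_count.zero? && verdicts.count('APPROVE') * 3 >= verdicts.size * 2
end

# Final merge gate: every reviewer approves and no FAIL findings remain.
def merge_ready?(verdicts, fail_count)
  !verdicts.empty? && fail_count.zero? && verdicts.all?('APPROVE')
end

design_converged?(%w[APPROVE APPROVE REJECT], 0)  # => true
design_converged?(%w[APPROVE APPROVE APPROVE], 2) # => false (FAIL findings remain)
merge_ready?(%w[APPROVE APPROVE APPROVE], 0)      # => true
merge_ready?(%w[APPROVE APPROVE REJECT], 0)       # => false
```

The two predicates differ only in the approval threshold, matching the document's distinction between iterating on a design (a two-thirds majority suffices) and gating a merge (unanimity).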