@rune-kit/rune 2.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +357 -0
- package/agents/.gitkeep +0 -0
- package/agents/architect.md +29 -0
- package/agents/asset-creator.md +11 -0
- package/agents/audit.md +11 -0
- package/agents/autopsy.md +11 -0
- package/agents/brainstorm.md +11 -0
- package/agents/browser-pilot.md +11 -0
- package/agents/coder.md +29 -0
- package/agents/completion-gate.md +11 -0
- package/agents/constraint-check.md +11 -0
- package/agents/context-engine.md +11 -0
- package/agents/cook.md +11 -0
- package/agents/db.md +11 -0
- package/agents/debug.md +11 -0
- package/agents/dependency-doctor.md +11 -0
- package/agents/deploy.md +11 -0
- package/agents/design.md +11 -0
- package/agents/docs-seeker.md +11 -0
- package/agents/fix.md +11 -0
- package/agents/hallucination-guard.md +11 -0
- package/agents/incident.md +11 -0
- package/agents/integrity-check.md +11 -0
- package/agents/journal.md +11 -0
- package/agents/launch.md +11 -0
- package/agents/logic-guardian.md +11 -0
- package/agents/marketing.md +11 -0
- package/agents/onboard.md +11 -0
- package/agents/perf.md +11 -0
- package/agents/plan.md +11 -0
- package/agents/preflight.md +11 -0
- package/agents/problem-solver.md +11 -0
- package/agents/rescue.md +11 -0
- package/agents/research.md +11 -0
- package/agents/researcher.md +29 -0
- package/agents/review-intake.md +11 -0
- package/agents/review.md +11 -0
- package/agents/reviewer.md +28 -0
- package/agents/safeguard.md +11 -0
- package/agents/sast.md +11 -0
- package/agents/scanner.md +28 -0
- package/agents/scope-guard.md +11 -0
- package/agents/scout.md +11 -0
- package/agents/sentinel.md +11 -0
- package/agents/sequential-thinking.md +11 -0
- package/agents/session-bridge.md +11 -0
- package/agents/skill-forge.md +11 -0
- package/agents/skill-router.md +11 -0
- package/agents/surgeon.md +11 -0
- package/agents/team.md +11 -0
- package/agents/test.md +11 -0
- package/agents/trend-scout.md +11 -0
- package/agents/verification.md +11 -0
- package/agents/video-creator.md +11 -0
- package/agents/watchdog.md +11 -0
- package/agents/worktree.md +11 -0
- package/commands/.gitkeep +0 -0
- package/commands/rune.md +168 -0
- package/compiler/__tests__/openclaw-adapter.test.js +140 -0
- package/compiler/__tests__/parser.test.js +55 -0
- package/compiler/adapters/antigravity.js +59 -0
- package/compiler/adapters/claude.js +37 -0
- package/compiler/adapters/cursor.js +67 -0
- package/compiler/adapters/generic.js +60 -0
- package/compiler/adapters/index.js +45 -0
- package/compiler/adapters/openclaw.js +150 -0
- package/compiler/adapters/windsurf.js +60 -0
- package/compiler/bin/rune.js +288 -0
- package/compiler/doctor.js +153 -0
- package/compiler/emitter.js +240 -0
- package/compiler/parser.js +208 -0
- package/compiler/transformer.js +69 -0
- package/compiler/transforms/branding.js +27 -0
- package/compiler/transforms/cross-references.js +29 -0
- package/compiler/transforms/frontmatter.js +38 -0
- package/compiler/transforms/hooks.js +68 -0
- package/compiler/transforms/subagents.js +36 -0
- package/compiler/transforms/tool-names.js +60 -0
- package/contexts/dev.md +34 -0
- package/contexts/research.md +43 -0
- package/contexts/review.md +55 -0
- package/extensions/ai-ml/PACK.md +517 -0
- package/extensions/analytics/PACK.md +557 -0
- package/extensions/backend/PACK.md +678 -0
- package/extensions/chrome-ext/PACK.md +995 -0
- package/extensions/content/PACK.md +381 -0
- package/extensions/devops/PACK.md +520 -0
- package/extensions/ecommerce/PACK.md +280 -0
- package/extensions/gamedev/PACK.md +393 -0
- package/extensions/mobile/PACK.md +273 -0
- package/extensions/saas/PACK.md +805 -0
- package/extensions/security/PACK.md +536 -0
- package/extensions/trading/PACK.md +597 -0
- package/extensions/ui/PACK.md +947 -0
- package/package.json +47 -0
- package/skills/.gitkeep +0 -0
- package/skills/adversary/SKILL.md +271 -0
- package/skills/asset-creator/SKILL.md +157 -0
- package/skills/audit/SKILL.md +466 -0
- package/skills/autopsy/SKILL.md +200 -0
- package/skills/ba/SKILL.md +279 -0
- package/skills/brainstorm/SKILL.md +266 -0
- package/skills/browser-pilot/SKILL.md +168 -0
- package/skills/completion-gate/SKILL.md +151 -0
- package/skills/constraint-check/SKILL.md +165 -0
- package/skills/context-engine/SKILL.md +176 -0
- package/skills/cook/SKILL.md +636 -0
- package/skills/db/SKILL.md +256 -0
- package/skills/debug/SKILL.md +240 -0
- package/skills/dependency-doctor/SKILL.md +235 -0
- package/skills/deploy/SKILL.md +174 -0
- package/skills/design/DESIGN-REFERENCE.md +365 -0
- package/skills/design/SKILL.md +462 -0
- package/skills/doc-processor/SKILL.md +254 -0
- package/skills/docs/SKILL.md +336 -0
- package/skills/docs-seeker/SKILL.md +166 -0
- package/skills/fix/SKILL.md +192 -0
- package/skills/git/SKILL.md +285 -0
- package/skills/hallucination-guard/SKILL.md +204 -0
- package/skills/incident/SKILL.md +241 -0
- package/skills/integrity-check/SKILL.md +169 -0
- package/skills/journal/SKILL.md +190 -0
- package/skills/launch/SKILL.md +330 -0
- package/skills/logic-guardian/SKILL.md +240 -0
- package/skills/marketing/SKILL.md +229 -0
- package/skills/mcp-builder/SKILL.md +311 -0
- package/skills/onboard/SKILL.md +298 -0
- package/skills/perf/SKILL.md +297 -0
- package/skills/plan/SKILL.md +520 -0
- package/skills/preflight/SKILL.md +231 -0
- package/skills/problem-solver/SKILL.md +284 -0
- package/skills/rescue/SKILL.md +434 -0
- package/skills/research/SKILL.md +122 -0
- package/skills/review/SKILL.md +354 -0
- package/skills/review-intake/SKILL.md +222 -0
- package/skills/safeguard/SKILL.md +188 -0
- package/skills/sast/SKILL.md +190 -0
- package/skills/scaffold/SKILL.md +276 -0
- package/skills/scope-guard/SKILL.md +150 -0
- package/skills/scout/SKILL.md +232 -0
- package/skills/sentinel/SKILL.md +320 -0
- package/skills/sentinel-env/SKILL.md +226 -0
- package/skills/sequential-thinking/SKILL.md +234 -0
- package/skills/session-bridge/SKILL.md +287 -0
- package/skills/skill-forge/SKILL.md +317 -0
- package/skills/skill-router/SKILL.md +267 -0
- package/skills/surgeon/SKILL.md +203 -0
- package/skills/team/SKILL.md +397 -0
- package/skills/test/SKILL.md +271 -0
- package/skills/trend-scout/SKILL.md +145 -0
- package/skills/verification/SKILL.md +201 -0
- package/skills/video-creator/SKILL.md +201 -0
- package/skills/watchdog/SKILL.md +166 -0
- package/skills/worktree/SKILL.md +140 -0

@@ -0,0 +1,168 @@

---
name: browser-pilot
description: Playwright browser automation. Navigates URLs, takes screenshots, checks accessibility tree, interacts with UI elements, and reports findings.
metadata:
  author: runedev
  version: "0.2.0"
  layer: L3
  model: sonnet
  group: media
  tools: "Read, Bash, Glob, Grep"
---

# browser-pilot

## Purpose

Browser automation for testing and verification using MCP Playwright tools. Navigates to URLs, captures accessibility snapshots and screenshots, interacts with UI elements (click, type, fill form), and reports findings with visual evidence.

## Called By (inbound)

- `test` (L2): e2e and visual testing
- `deploy` (L2): verify live deployment
- `debug` (L2): capture browser console errors
- `marketing` (L2): screenshot for assets
- `launch` (L1): verify live site after deployment
- `perf` (L2): Lighthouse / Core Web Vitals measurement

## Calls (outbound)

None — pure L3 utility using Playwright MCP tools.

## Executable Instructions

### Step 1: Receive Task

Accept input from calling skill:

- `url` — target URL to open
- `task` — what to do: `screenshot` | `check_elements` | `fill_form` | `test_flow` | `console_errors`
- `interactions` — optional list of actions (click X, type Y into Z, etc.)
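
The sketch below validates the Step 1 input before any browser is opened. It is written in JavaScript, the language of this package's compiler. The `validateTask` helper and the rule that only http/https URLs are accepted are illustrative assumptions. They are not part of browser-pilot's documented contract.

```javascript
// Task names come from Step 1; the validation rules themselves are illustrative.
const TASKS = new Set(["screenshot", "check_elements", "fill_form", "test_flow", "console_errors"]);

function validateTask({ url, task, interactions = [] }) {
  const errors = [];
  let parsed = null;
  try {
    parsed = new URL(url);
  } catch {
    errors.push(`invalid url: ${url}`);
  }
  // Assumption: only web pages are targets, so non-http(s) schemes are rejected.
  if (parsed && parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    errors.push(`unsupported protocol: ${parsed.protocol}`);
  }
  if (!TASKS.has(task)) errors.push(`unknown task: ${task}`);
  if (!Array.isArray(interactions)) errors.push("interactions must be a list");
  return { ok: errors.length === 0, errors };
}
```

If a malformed request is rejected here, Step 2 never opens a browser that Step 7 would then have to close.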

### Step 2: Navigate

Open the target URL using the Playwright MCP navigate tool:

```
mcp__plugin_playwright_playwright__browser_navigate({ url: "<url>" })
```

Wait for the page to load. If navigation fails (timeout or error), report UNREACHABLE and skip directly to Step 7 (Close).

### Step 3: Snapshot

Capture the accessibility tree to understand page structure:

```
mcp__plugin_playwright_playwright__browser_snapshot()
```

Use the snapshot to:

- Identify interactive elements (buttons, inputs, links)
- Find specific elements referenced in the task
- Detect accessibility issues (missing labels, broken roles)

### Step 4: Interact

Based on the task, perform interactions using Playwright MCP tools:

- **Click**: `mcp__plugin_playwright_playwright__browser_click({ ref: "<ref>", element: "<description>" })`
- **Type**: `mcp__plugin_playwright_playwright__browser_type({ ref: "<ref>", text: "<value>" })`
- **Fill form**: `mcp__plugin_playwright_playwright__browser_fill_form({ fields: [...] })`
- **Navigate back**: `mcp__plugin_playwright_playwright__browser_navigate_back()`
- **Select option**: `mcp__plugin_playwright_playwright__browser_select_option({ ref: "<ref>", values: [...] })`

Limit: max 20 interactions per session. If the task requires more, stop and report partial results.

After each interaction, take a new snapshot to verify the result before proceeding.
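
The interaction budget and the snapshot-after-each-action rule can be sketched as a loop. This sketch assumes hypothetical async `perform` and `snapshot` callbacks that wrap the Playwright MCP tools. The status rule is also an assumption: any error gives FAILED, and hitting the cap gives PARTIAL. It reuses the SUCCESS | PARTIAL | FAILED values from the Step 6 report.

```javascript
const MAX_INTERACTIONS = 20; // Constraint 2

// `perform(action)` and `snapshot()` are hypothetical async wrappers around
// Playwright MCP tools such as browser_click and browser_snapshot.
async function runInteractions(interactions, perform, snapshot) {
  const batch = interactions.slice(0, MAX_INTERACTIONS);
  const remaining = interactions.slice(MAX_INTERACTIONS);
  const log = [];

  for (const action of batch) {
    try {
      await perform(action);
      await snapshot(); // fresh snapshot before the next action
      log.push({ action, result: "success" });
    } catch (err) {
      log.push({ action, result: `error: ${err.message}` });
    }
  }

  // Assumed status rule: any error -> FAILED, truncation at the cap -> PARTIAL.
  const failed = log.some((entry) => entry.result !== "success");
  const status = failed ? "FAILED" : remaining.length > 0 ? "PARTIAL" : "SUCCESS";
  return { status, log, remaining };
}
```

Returning `remaining` lets the report list what was not tested when the session stops at the cap.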

### Step 5: Screenshot

Capture visual evidence:

```
mcp__plugin_playwright_playwright__browser_take_screenshot({ type: "png" })
```

For full-page capture (landing pages, long content):

```
mcp__plugin_playwright_playwright__browser_take_screenshot({ type: "png", fullPage: true })
```

Save with a descriptive filename if the `filename` param is supported.

### Step 6: Report

Compile findings into a structured report:

```
## Browser Report: [url]

- **Task**: [task description]
- **Status**: SUCCESS | PARTIAL | FAILED

### Page Info
- HTTP Status: [status]
- Load outcome: [loaded | timeout | error]

### Accessibility Findings
- [finding from snapshot — missing labels, broken roles, etc.]

### Interaction Log
- [action taken] → [result: success | element not found | error]

### Console Errors
- [error message — source]

### Screenshots
- [screenshot path or description]

### Summary
- [overall assessment — what works, what failed, any critical issues]
```

### Step 7: Close

Always close the browser when done:

```
mcp__plugin_playwright_playwright__browser_close()
```

This step is mandatory even if earlier steps fail. Use a try-finally pattern in your reasoning.
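
The sketch below expresses the try-finally discipline in code. `browser` is a hypothetical object that exposes async `navigate` and `close` wrappers around the MCP tools. The sketch also shows that an UNREACHABLE result from Step 2 still reaches the close call.

```javascript
// `browser` is a hypothetical wrapper: { navigate(url), close() }, both async.
async function runBrowserTask(browser, url, work) {
  try {
    try {
      await browser.navigate(url);
    } catch (err) {
      return { status: "UNREACHABLE", reason: err.message };
    }
    return await work(browser);
  } finally {
    // Step 7: runs after success, after UNREACHABLE, and when `work` throws.
    await browser.close();
  }
}
```

The `finally` block runs before either `return` value leaves the function, so no exit path skips `close()`.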

## Output Format

Structured Browser Report with task status, page info, accessibility findings, interaction log, console errors, screenshots, and summary. See Step 6 Report above for full template.

## Constraints

1. MUST close browser when done — Step 7 is non-optional even if earlier steps fail
2. MUST NOT exceed 20 interactions per session
3. MUST NOT store credentials or sensitive data in interaction logs
4. MUST take screenshot evidence before reporting visual findings
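
Constraint 3 can be enforced by redacting values before they reach the interaction log. This is a minimal sketch. It assumes a hypothetical log entry shape `{ action, target, value }` and a keyword list for sensitive fields that you would tune for each application.

```javascript
// Keyword list for sensitive fields is an assumption; extend it per application.
const SENSITIVE = /pass(word)?|token|secret|api[-_ ]?key|auth|otp|card|cvv|ssn/i;

// entry: hypothetical log shape { action, target, value }
function redactEntry(entry) {
  const target = entry.target ?? "";
  if (entry.value !== undefined && SENSITIVE.test(target)) {
    // Return a copy so the caller's original entry is never mutated.
    return { ...entry, value: "[REDACTED]" };
  }
  return entry;
}
```

Matching on the field description errs toward over-redaction. For example, a field labeled "Author" matches `auth`, which is the safer failure for a log.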

## Sharp Edges

Known failure modes for this skill. Check these before declaring done.

| Failure Mode | Severity | Mitigation |
|---|---|---|
| Not closing browser when done (including on error) | CRITICAL | Constraint 1: Step 7 browser_close() is mandatory — treat as try-finally |
| Storing credentials or tokens in interaction logs | HIGH | Constraint 3: redact all sensitive values before logging |
| Exceeding 20 interactions without stopping and reporting partial | MEDIUM | Constraint 2: stop at 20, report what was tested and what remains |
| Reporting visual findings without screenshot evidence | MEDIUM | Constraint 4: screenshot before reporting — "looks broken" without screenshot is invalid |

## Done When

- URL navigated successfully (or UNREACHABLE reported)
- Page snapshot captured for accessibility context
- All requested interactions completed (or partial with reason if >20)
- Screenshot taken as visual evidence
- Console errors captured if task requested them
- Browser closed (Step 7 executed)
- Browser Report emitted with status, findings, and screenshot reference

## Cost Profile

~500-1500 tokens input, ~300-800 tokens output. Sonnet for interaction logic.

@@ -0,0 +1,151 @@

---
name: completion-gate
description: "Validates agent claims against evidence trail. Catches 'done' without proof, 'tests pass' without output, 'fixed' without verification. Called by cook and team at workflow end."
user-invocable: false
metadata:
  author: runedev
  version: "1.1.0"
  layer: L3
  model: haiku
  group: validation
  tools: "Read, Bash, Glob, Grep"
---

# completion-gate

## Purpose

The lie detector for agent claims. Validates that what an agent says it did actually happened — with evidence. Catches the #1 failure mode in AI coding: claiming completion without proof.

<HARD-GATE>
Every claim requires evidence. No evidence = UNCONFIRMED = BLOCK.
"I ran the tests and they pass" without stdout = UNCONFIRMED.
"I fixed the bug" without before/after diff = UNCONFIRMED.
"Build succeeds" without build output = UNCONFIRMED.
</HARD-GATE>

## Triggers

- Called by `cook` in Phase 5d (quality gate)
- Called by `team` before merging stream results
- Called by any skill that reports "done" to an orchestrator
- Auto-trigger: when agent says "done", "complete", "fixed", "passing"

## Calls (outbound)

None — pure validator. Reads evidence, produces verdict.

## Called By (inbound)

- `cook` (L1): Phase 5d — validate completion claims before commit
- `team` (L1): validate cook reports from parallel streams

## Execution

### Step 1 — Collect Claims

Parse the agent's output for completion claims. Common claim patterns:

```
CLAIM PATTERNS:
  "tests pass" / "all tests passing" / "test suite green"
  "build succeeds" / "build complete" / "compiles clean"
  "no lint errors" / "lint clean"
  "fixed" / "resolved" / "bug is gone"
  "implemented" / "feature complete" / "done"
  "no security issues" / "sentinel passed"
```

Extract each claim as: `{ claim: string, source_skill: string }`
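
Claim extraction can start as a regex pass over the agent's output. This sketch assumes sentences are split on terminal punctuation and line breaks. The regexes only approximate the CLAIM PATTERNS list. The extra `type` field is an assumption, used to look up the matching row in Step 2's evidence table.

```javascript
// Regexes approximate the CLAIM PATTERNS list; extend them as new phrasings appear.
const CLAIM_PATTERNS = [
  { type: "tests pass", re: /\b(all )?tests? (pass|passing)\b|test suite (is )?green/i },
  { type: "build succeeds", re: /\bbuild (succeeds|complete)\b|compiles clean/i },
  { type: "lint clean", re: /\bno lint errors\b|\blint clean\b/i },
  { type: "fixed", re: /\b(fixed|resolved)\b|bug is gone/i },
  { type: "implemented", re: /\bimplemented\b|feature complete|\bdone\b/i },
  { type: "no security issues", re: /no security issues|sentinel passed/i },
];

function extractClaims(text, sourceSkill) {
  const claims = [];
  // Split into sentences on terminal punctuation or line breaks.
  for (const sentence of text.split(/(?<=[.!?])\s+|\n+/)) {
    for (const { type, re } of CLAIM_PATTERNS) {
      if (re.test(sentence)) {
        claims.push({ claim: sentence.trim(), type, source_skill: sourceSkill });
      }
    }
  }
  return claims;
}
```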

### Step 2 — Match Evidence

For each claim, look for corresponding evidence in the conversation context:

| Claim Type | Required Evidence | Where to Find |
|---|---|---|
| "tests pass" | Test runner stdout with pass count | Bash output from test command |
| "build succeeds" | Build command stdout showing success | Bash output from build command |
| "lint clean" | Linter stdout (even if empty = 0 errors) | Bash output from lint command |
| "fixed" | Git diff showing the change + test proving fix | Edit/Write tool calls + test output |
| "implemented" | Files created/modified matching the plan | Write/Edit tool calls vs plan |
| "no security issues" | Sentinel report with PASS verdict | Sentinel skill output |
| "coverage ≥ X%" | Coverage tool output with actual percentage | Test runner with coverage flag |

### Step 3 — Validate Each Claim

For each claim + evidence pair:

```
IF evidence exists AND evidence supports claim:
  → CONFIRMED
IF evidence exists BUT contradicts claim:
  → CONTRADICTED (most serious — agent is wrong)
IF no evidence found:
  → UNCONFIRMED (agent may be right but didn't prove it)
```
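
For the "tests pass" row, the Step 3 decision can be made mechanical. This sketch assumes runner stdout contains counts in the `42 passed, 0 failed` shape used in the Step 4 example. Real runners format their summaries differently, so treat the regexes as placeholders.

```javascript
// Assumes stdout with counts shaped like "42 passed, 0 failed".
function validateTestClaim(stdout) {
  if (!stdout) return "UNCONFIRMED"; // no test output captured at all
  const passed = /(\d+)\s+passed/i.exec(stdout);
  const failed = /(\d+)\s+failed/i.exec(stdout);
  if (failed && Number(failed[1]) > 0) return "CONTRADICTED"; // output shows failures
  if (passed && Number(passed[1]) > 0) return "CONFIRMED";
  return "UNCONFIRMED"; // output present but no usable pass count
}
```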

### Step 4 — Report

```
## Completion Gate Report
- **Status**: CONFIRMED | UNCONFIRMED | CONTRADICTED
- **Claims Checked**: [count]
- **Confirmed**: [count] | **Unconfirmed**: [count] | **Contradicted**: [count]

### Claim Validation
| # | Claim | Evidence | Verdict |
|---|---|---|---|
| 1 | "All tests pass" | Bash: `npm test` → "42 passed, 0 failed" | CONFIRMED |
| 2 | "Build succeeds" | No build command output found | UNCONFIRMED |
| 3 | "No lint errors" | Bash: `npm run lint` → "3 errors" | CONTRADICTED |

### Gaps (if any)
- Claim 2: Re-run `npm run build` and capture output
- Claim 3: Agent claimed clean but lint shows 3 errors — fix required

### Verdict
CONTRADICTED — 1 claim contradicted, 1 claim lacks evidence. Cannot proceed to commit.
```

## Verdict Rules

```
ALL claims CONFIRMED → overall CONFIRMED (proceed)
ANY claim CONTRADICTED → overall CONTRADICTED (BLOCK — fix the contradiction)
ANY claim UNCONFIRMED (and none CONTRADICTED) → overall UNCONFIRMED (BLOCK — provide evidence)
```
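
The verdict rules reduce to a precedence order: CONTRADICTED outranks UNCONFIRMED, which outranks CONFIRMED. The sketch below encodes that order. Treating an empty claim list as UNCONFIRMED is an assumption, on the reasoning that checking nothing proves nothing.

```javascript
// Precedence from the Verdict Rules: CONTRADICTED > UNCONFIRMED > CONFIRMED.
function overallVerdict(verdicts) {
  if (verdicts.length === 0) return "UNCONFIRMED"; // assumption: nothing checked, nothing proven
  if (verdicts.includes("CONTRADICTED")) return "CONTRADICTED";
  if (verdicts.includes("UNCONFIRMED")) return "UNCONFIRMED";
  return "CONFIRMED";
}

// Only CONFIRMED lets the calling orchestrator proceed; both other verdicts BLOCK.
const canProceed = (verdict) => verdict === "CONFIRMED";
```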

## Output Format

Completion Gate Report with status (CONFIRMED/UNCONFIRMED/CONTRADICTED), claim validation table, gaps, and verdict. See Step 4 Report above for full template.

## Constraints

1. MUST check every completion claim against actual tool output — not agent narrative
2. MUST flag missing evidence as UNCONFIRMED — absence of proof is not proof of absence
3. MUST flag contradictions as CONTRADICTED — this is more serious than missing evidence
4. MUST NOT accept "I verified it" as evidence — show the command output
5. MUST be fast (haiku) — this runs on every cook completion

## Sharp Edges

| Failure Mode | Severity | Mitigation |
|---|---|---|
| Agent rephrases claim to avoid detection | MEDIUM | Pattern matching covers common phrasings — extend as new patterns emerge |
| Evidence from a DIFFERENT test run (stale) | HIGH | Check that evidence timestamp/context matches current changes |
| Agent pre-generates evidence by running commands proactively | LOW | This is actually GOOD behavior — we want agents to provide evidence |
| Completion-gate itself claims "all confirmed" without evidence | CRITICAL | Gate report MUST include the evidence table — no table = report is invalid |

## Done When

- All completion claims extracted from agent output
- Each claim matched against tool output evidence
- Verdict table emitted with claim/evidence/verdict for each item
- Overall verdict: CONFIRMED / UNCONFIRMED / CONTRADICTED
- If not CONFIRMED: specific gaps listed with remediation steps

## Cost Profile

~500-1000 tokens input, ~200-500 tokens output. Haiku for speed. Runs frequently as part of cook's quality phase.

@@ -0,0 +1,165 @@

---
name: constraint-check
description: "Meta-validator for HARD-GATEs. Verifies that skills' mandatory constraints were followed during a workflow. Called by cook, team, and audit to audit discipline compliance."
user-invocable: false
metadata:
  author: runedev
  version: "1.1.0"
  layer: L3
  model: haiku
  group: validation
  tools: "Read, Glob, Grep"
---

# constraint-check

## Purpose

The internal affairs department for Rune skills. Checks whether HARD-GATEs and mandatory constraints were actually followed during a workflow — not just claimed to be followed. Reads the constraint definitions from skill files and audits the conversation trail for compliance.

While `completion-gate` checks if claims have evidence, `constraint-check` checks if the PROCESS was followed. Did you actually write tests before code? Did you actually get plan approval? Did you actually run sentinel?

## Triggers

- Called by `cook` (L1) at end of workflow as discipline audit
- Called by `team` (L1) to verify stream agents followed constraints
- Called by `audit` (L2) during quality dimension assessment
- `/rune constraint-check` — manual audit of current session

## Calls (outbound)

None — pure read-only validator.

## Called By (inbound)

- `cook` (L1): end-of-workflow discipline audit
- `team` (L1): verify stream agent compliance
- `audit` (L2): quality dimension
- User: manual session audit

## Execution

### Step 1 — Identify Active Skills

Parse the conversation/workflow to identify which skills were invoked:

```
Extract from context:
- Skills invoked via Skill tool (exact list)
- Skills referenced in agent narrative
- Phase progression (cook phases completed)
```

### Step 2 — Load Constraint Definitions

For each invoked skill, extract HARD-GATEs and numbered constraints:

```
For each skill in invoked_skills:
  Read: skills/<skill>/SKILL.md
  Extract:
    - <HARD-GATE> blocks → mandatory, violation = BLOCK
    - ## Constraints numbered list → required, violation = WARN
    - ## Mesh Gates table → required gates
```
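
The sketch below runs the Step 2 extraction over a SKILL.md string. It assumes HARD-GATE blocks are delimited by literal `<HARD-GATE>` tags and that constraints are the `1.`-style items under a `## Constraints` heading, as in the skill files in this package. It leaves out Mesh Gates tables.

```javascript
// Assumes literal <HARD-GATE> tags and "1."-style items under "## Constraints".
function extractConstraints(skillMd) {
  const hardGates = [...skillMd.matchAll(/<HARD-GATE>([\s\S]*?)<\/HARD-GATE>/g)]
    .map((m) => m[1].trim());

  const lines = skillMd.split(/\r?\n/);
  const start = lines.findIndex((line) => line.trim() === "## Constraints");
  const constraints = [];
  if (start !== -1) {
    for (const line of lines.slice(start + 1)) {
      if (line.startsWith("## ")) break; // the next H2 ends the section
      const item = /^\d+\.\s+(.*)$/.exec(line);
      if (item) constraints.push(item[1]);
    }
  }
  return { hardGates, constraints };
}
```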

### Step 3 — Audit Compliance

Check each constraint against the conversation evidence:

| Constraint Type | How to Verify | Evidence Source |
|---|---|---|
| "MUST write tests BEFORE code" | Test file Write/Edit calls precede implementation Write/Edit calls | Tool call ordering |
| "MUST get user approval" | User message containing "go"/"yes"/"proceed" after plan | Conversation history |
| "MUST run verification" | Bash command with test/lint/build output | Tool call results |
| "MUST show actual output" | Stdout captured in agent response | Agent messages |
| "MUST NOT modify files outside scope" | Git diff files vs plan file list | Git + plan comparison |
| "Iron Law: delete code before test" | No implementation code exists before test creation | Tool call ordering |
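
The test-before-code check can run on tool-call sequence numbers rather than message order. This sketch assumes a hypothetical record shape `{ seq, tool, path }` and a filename heuristic for test files. Any non-test Write/Edit counts as implementation. Both the record shape and the heuristic need adapting to the real tool log and project layout. The evidence string mirrors the wording of the Iron Law example in the Step 5 report.

```javascript
// Heuristic test-file detector (assumption; adapt to the project layout).
const isTestFile = (path) =>
  /(^|\/)(__tests__|tests?)\//.test(path) ||
  /\.(test|spec)\.[jt]sx?$/.test(path) ||
  /(^|\/)test_[^/]*\.py$/.test(path);

// toolCalls: hypothetical records { seq, tool, path } taken from the tool log.
function checkTestsFirst(toolCalls) {
  const writes = toolCalls
    .filter((call) => call.tool === "Write" || call.tool === "Edit")
    .sort((a, b) => a.seq - b.seq); // sequence numbers, not message order
  const firstTest = writes.find((call) => isTestFile(call.path));
  const firstImpl = writes.find((call) => !isTestFile(call.path));

  if (firstImpl && (!firstTest || firstImpl.seq < firstTest.seq)) {
    const testPart = firstTest ? `test file created at #${firstTest.seq}` : "any test file";
    return {
      ok: false,
      evidence: `implementation code written at tool_call #${firstImpl.seq} BEFORE ${testPart}`,
    };
  }
  return { ok: true };
}
```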

### Step 4 — Classify Violations

| Violation Type | Severity | Meaning |
|---------------|----------|---------|
| HARD-GATE violation | BLOCK | Skill says this is non-negotiable |
| Constraint violation | WARN | Skill says this is required but not fatal |
| Best practice skip | INFO | Recommended but optional |
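
The sketch below rolls classified violations up into the report's Status line. The mapping is an assumption, since the skill does not spell it out: any BLOCK gives CRITICAL_VIOLATION, any WARN gives VIOLATIONS_FOUND, and INFO-only gives COMPLIANT.

```javascript
// Step 4 severities; the Status mapping below is an assumption.
const SEVERITY = { "HARD-GATE": "BLOCK", constraint: "WARN", "best-practice": "INFO" };

function summarizeViolations(violations) {
  const counts = { BLOCK: 0, WARN: 0, INFO: 0 };
  for (const v of violations) {
    const severity = SEVERITY[v.kind];
    if (!severity) throw new Error(`unknown violation kind: ${v.kind}`);
    counts[severity] += 1;
  }
  const status =
    counts.BLOCK > 0 ? "CRITICAL_VIOLATION" : counts.WARN > 0 ? "VIOLATIONS_FOUND" : "COMPLIANT";
  return { status, counts };
}
```

Keeping INFO out of the status matches the Sharp Edges warning against treating optional steps as blockers.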

### Step 5 — Report

```
## Constraint Check Report
- **Status**: COMPLIANT | VIOLATIONS_FOUND | CRITICAL_VIOLATION
- **Skills Audited**: [count]
- **Constraints Checked**: [count]
- **Violations**: [count by severity]

### HARD-GATE Violations (BLOCK)
- [skill:test] Iron Law: implementation code written at tool_call #12 BEFORE test file created at #15
- [skill:cook] Plan Gate: Phase 4 started without user approval message

### Constraint Violations (WARN)
- [skill:verification] Constraint 2: "All tests pass" claimed at message #20 without stdout evidence
- [skill:sentinel] Constraint 3: files scanned list not included in report

### Compliance Summary
| Skill | HARD-GATEs | Constraints | Status |
|-------|-----------|-------------|--------|
| cook | 2/3 ✗ | 7/7 ✓ | BLOCK |
| test | 0/1 ✗ | 9/9 ✓ | BLOCK |
| verification | 1/1 ✓ | 5/6 (1 WARN) | WARN |
| sentinel | 1/1 ✓ | 6/7 (1 WARN) | WARN |

### Remediation
- BLOCK: test Iron Law — delete implementation, restart with test-first
- BLOCK: cook Plan Gate (present the plan and wait for user approval before resuming Phase 4)
- WARN: verification — re-run and capture stdout
- WARN: sentinel (include the files scanned list in the report)
```

## Constraint Catalog (Quick Reference)

Key HARD-GATEs across skills that constraint-check audits:

| Skill | HARD-GATE | Check Method |
|---|---|---|
| test | Tests BEFORE code (Iron Law) | Tool call ordering |
| cook | Scout before plan, plan before code | Phase progression |
| plan | Every code phase has test entry | Plan content |
| verification | Evidence for every claim | Stdout capture |
| sentinel | BLOCK = halt pipeline | No commit after BLOCK |
| preflight | BLOCK = halt pipeline | No commit after BLOCK |
| debug | No code changes during debug | No Write/Edit in debug |
| debug | 3-fix escalation | Fix attempt counter |
| brainstorm | No implementation before approval | User message check |

## Output Format

Constraint Check Report with status (COMPLIANT/VIOLATIONS_FOUND/CRITICAL_VIOLATION), HARD-GATE violations, constraint violations, compliance summary table, and remediation steps. See Step 5 Report above for full template.

## Constraints

1. MUST check all HARD-GATEs for every invoked skill — not just the ones that seem relevant
2. MUST use tool call ordering (not agent narrative) to verify temporal constraints
3. MUST distinguish HARD-GATE violations (BLOCK) from constraint violations (WARN)
4. MUST report specific evidence for each violation — not just "violated"
5. MUST NOT accept agent's self-report as compliance evidence — check independently

## Sharp Edges

| Failure Mode | Severity | Mitigation |
|---|---|---|
| Agent self-reports compliance and constraint-check trusts it | CRITICAL | Constraint 5: check tool calls independently, not agent narrative |
| Only checking cook constraints, missing test/sentinel/etc | HIGH | Constraint 1: audit ALL invoked skills, not just the orchestrator |
| Temporal check wrong (tool calls reordered in context) | MEDIUM | Use tool call sequence numbers, not message ordering |
| Too strict on optional steps (INFO treated as BLOCK) | LOW | Step 4 classification: only HARD-GATE = BLOCK, constraints = WARN |

## Done When

- All invoked skills identified from context
- HARD-GATEs and constraints extracted from each skill's SKILL.md
- Each constraint checked against conversation evidence
- Violations classified as BLOCK/WARN/INFO
- Compliance summary table emitted per skill
- Remediation steps listed for each violation

## Cost Profile

~1000-2000 tokens input, ~500-1000 tokens output. Haiku for speed — reads skill files and checks tool call ordering.
@@ -0,0 +1,176 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: context-engine
|
|
3
|
+
description: "Context window management. Auto-triggered when context is filling up. Triggers smart compaction and preserves critical information across compaction boundaries. Called by L1 orchestrators at context thresholds."
|
|
4
|
+
user-invocable: false
|
|
5
|
+
metadata:
|
|
6
|
+
author: runedev
|
|
7
|
+
version: "0.4.0"
|
|
8
|
+
layer: L3
|
|
9
|
+
model: haiku
|
|
10
|
+
group: state
|
|
11
|
+
tools: "Read, Glob, Grep"
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# context-engine
|
|
15
|
+
|
|
16
|
+
## Purpose
|
|
17
|
+
|
|
18
|
+
Context window management for long sessions. Detects when context is approaching limits, triggers smart compaction preserving critical decisions and progress, and coordinates with session-bridge to save state before compaction. Prevents the common failure mode of losing important context mid-workflow.
|
|
19
|
+
|
|
20
|
+
### Behavioral Contexts
|
|
21
|
+
|
|
22
|
+
Context-engine also manages **behavioral mode injection** via `contexts/` directory. Three modes are available:
|
|
23
|
+
|
|
24
|
+
| Mode | File | When to Use |
|
|
25
|
+
|------|------|-------------|
|
|
26
|
+
| `dev` | `contexts/dev.md` | Active coding — bias toward action, code-first |
|
|
27
|
+
| `research` | `contexts/research.md` | Investigation — read widely, evidence-based |
|
|
28
|
+
| `review` | `contexts/review.md` | Code review — systematic, severity-labeled |
|
|
29
|
+
|
|
30
|
+
**Mode activation**: Orchestrators (cook, team, rescue) can set the active mode by writing to `.rune/active-context.md`. The session-start hook injects the active context file into the session. Mode switches mid-session are supported — the orchestrator updates the file and references the new behavioral rules.
|
|
31
|
+
|
|
32
|
+
**Default**: If no `.rune/active-context.md` exists, no behavioral mode is injected (standard Claude behavior).
|
|
33
|
+
|
|
34
|
+
## Triggers
|
|
35
|
+
|
|
36
|
+
- Called by `cook` and `team` automatically at context boundaries
|
|
37
|
+
- Auto-trigger: when tool call count exceeds threshold or context utilization is high
|
- Auto-trigger: before compaction events

## Calls (outbound)

# Exception: L3→L3 coordination
- `session-bridge` (L3): coordinate state save when context is critical

## Called By (inbound)

- Auto-triggered at phase boundaries and context thresholds by L1 orchestrators

## Execution

### Step 1 — Count tool calls

Count the total number of tool calls made so far in this session. It is the only metric the skill can observe reliably: Claude Code does not expose token usage, and any estimate of it will be dangerously inaccurate.

Do NOT attempt to estimate token percentages. Tool count is a directional proxy, not a precise measurement.
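
Because the count is the only input, the bookkeeping stays small. This is a minimal sketch, assuming the orchestrator sees each tool call as it happens. `ToolCallCounter` and `record` are hypothetical names. The per-tool breakdown is an extra assumption. It helps because a session dominated by Grep/Glob consumes less context than one dominated by large Reads.

```python
from collections import Counter


class ToolCallCounter:
    """Hypothetical per-session tally of tool calls (no token estimates)."""

    def __init__(self) -> None:
        self.by_tool: Counter = Counter()

    def record(self, tool_name: str) -> None:
        # One increment per tool call, regardless of what the tool returned.
        self.by_tool[tool_name] += 1

    @property
    def total(self) -> int:
        # The directional proxy that Step 2 classifies.
        return sum(self.by_tool.values())
```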

### Step 2 — Classify health

Map tool call count to health level:

```
GREEN (<50 calls) — Healthy, continue normally
YELLOW (50-80 calls) — Load only essential files going forward
ORANGE (80-120 calls) — Recommend /compact at next logical boundary
RED (>120 calls) — Trigger immediate compaction, save state first
```

These thresholds are directional heuristics, not precise limits. Sessions with many large file reads may hit context limits earlier; sessions with mostly Grep/Glob may go longer.
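
The mapping can be written as a scan over exclusive upper bounds. The published ranges share an endpoint: 80 appears in both YELLOW and ORANGE. This sketch therefore assumes a boundary count belongs to the higher level. The exception is 120, which stays ORANGE because RED is strictly >120. That reading is an assumption, as are the names `Health` and `classify_health`. The enum values reuse the recommendation strings from the Output Format.

```python
from enum import Enum


class Health(Enum):
    GREEN = "continue"
    YELLOW = "load-essential-only"
    ORANGE = "compact-at-boundary"
    RED = "compact-immediately"


# (level, exclusive upper bound); anything past the last bound is RED.
STANDARD = ((Health.GREEN, 50), (Health.YELLOW, 80), (Health.ORANGE, 121))


def classify_health(tool_calls: int, bounds=STANDARD) -> Health:
    """Map a tool-call count to a health level (directional, not precise)."""
    for level, upper in bounds:
        if tool_calls < upper:
            return level
    return Health.RED
```

Passing a different `bounds` tuple lets the same function serve other threshold sets without changing the logic.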

#### Large-File Adjustment

Projects with large source files (Python modules are often 500-1500 LOC, and Java files are similar) consume significantly more context per `Read` call. Once the detection rule below fires, apply a 0.8x multiplier to all thresholds, rounded up to the nearest 5 calls:

```
Adjusted thresholds (large-file sessions):
GREEN (<40 calls) — Healthy, continue normally
YELLOW (40-65 calls) — Load only essential files going forward
ORANGE (65-100 calls) — Recommend /compact at next logical boundary
RED (>100 calls) — Trigger immediate compaction, save state first
```

Detection: count `Read` tool calls that returned >500 lines. If ≥3 such calls → activate large-file thresholds for the remainder of the session.
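
The detection rule and the adjusted values can be sketched together. This sketch makes two assumptions:

- The adjusted table is the standard one scaled by 0.8 and rounded up to the nearest 5 calls, which reproduces 40/65/100.
- Once triggered, the switch stays on for the rest of the session.

`LargeFileDetector`, `record_read` and `adjusted_bounds` are hypothetical names. `Fraction` keeps the scaling exact, avoiding floating-point rounding surprises.

```python
import math
from fractions import Fraction

STANDARD_BOUNDS = (50, 80, 120)  # GREEN/YELLOW, YELLOW/ORANGE, ORANGE/RED edges
LARGE_READ_LINES = 500           # a Read returning more lines than this is "large"
LARGE_READS_TO_ACTIVATE = 3


def adjusted_bounds(bounds=STANDARD_BOUNDS, factor=Fraction(4, 5), step=5):
    """Scale each threshold by 0.8x, then round up to the nearest `step` calls."""
    return tuple(step * math.ceil(b * factor / step) for b in bounds)


class LargeFileDetector:
    """Hypothetical tracker that switches to adjusted thresholds once triggered."""

    def __init__(self) -> None:
        self.large_reads = 0
        self.active = False

    def record_read(self, lines_returned: int) -> None:
        if lines_returned > LARGE_READ_LINES:
            self.large_reads += 1
        if self.large_reads >= LARGE_READS_TO_ACTIVATE:
            self.active = True  # sticky for the remainder of the session

    def bounds(self):
        return adjusted_bounds() if self.active else STANDARD_BOUNDS
```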

### Step 3 — If YELLOW

Emit advisory to the calling orchestrator:

> "[X] tool calls. Load only essential files. Avoid reading full files when Grep will do."

Do NOT trigger compaction yet. Continue execution.

### Step 4 — If ORANGE

Emit recommendation to the calling orchestrator:

> "[X] tool calls. Recommend /compact at next phase boundary (after current module completes)."

Identify the next safe boundary (end of current loop iteration, end of current file being processed) and flag it.
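
The Step 3 and Step 4 advisories are fixed templates with the count filled in. The sketch below uses a hypothetical `advisory` function. It returns `None` for GREEN (no advisory) and refuses RED, which must take the save-first path of Step 5 instead.

```python
from typing import Optional

TEMPLATES = {
    "YELLOW": "{n} tool calls. Load only essential files. "
              "Avoid reading full files when Grep will do.",
    "ORANGE": "{n} tool calls. Recommend /compact at next phase boundary "
              "(after current module completes).",
}


def advisory(level: str, tool_calls: int) -> Optional[str]:
    """Return the YELLOW/ORANGE message, or None when no advisory applies."""
    if level == "RED":
        # RED goes through session-bridge first; it is never a plain advisory.
        raise ValueError("RED requires the save-first path (Step 5)")
    template = TEMPLATES.get(level)  # GREEN has no template, so None
    return template.format(n=tool_calls) if template else None
```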

### Step 5 — If RED

Immediately trigger state save via `rune:session-bridge` (Save Mode) before any compaction occurs.

Pass to session-bridge:
- Current task and phase description
- List of files touched this session
- Decisions made (architectural choices, conventions established)
- Remaining tasks not yet started

After session-bridge confirms save, emit:

> "Context CRITICAL ([X] tool calls, likely near limit). State saved to .rune/. Run /compact now."

Block further tool calls until compaction is acknowledged.
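
The ordering in Step 5 (save, confirm, then tell the user to compact) is the part that must not break. This minimal sketch assumes session-bridge can be modeled as a callable that returns `True` once the save is confirmed. `StateSnapshot`, `StateNotSaved`, `handle_red` and `save_state` are hypothetical names, not Rune APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StateSnapshot:
    """Hypothetical container for the four items passed to session-bridge."""
    task_and_phase: str
    files_touched: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    remaining_tasks: List[str] = field(default_factory=list)


class StateNotSaved(RuntimeError):
    pass


def handle_red(tool_calls: int, snapshot: StateSnapshot,
               save_state: Callable[[StateSnapshot], bool]) -> str:
    """Save first; return the compaction message only after a confirmed save."""
    if not save_state(snapshot):
        # Never signal compaction without a confirmed save: state loss is irreversible.
        raise StateNotSaved("session-bridge did not confirm the save")
    return (f"Context CRITICAL ({tool_calls} tool calls, likely near limit). "
            "State saved to .rune/. Run /compact now.")
```

Blocking further tool calls until compaction is acknowledged would be the caller's job and is not modeled here.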

### Step 6 — Report

Emit the context health report to the calling skill.

## Context Health Levels

```
GREEN (<50 calls) — Healthy, continue normally
YELLOW (50-80 calls) — Load only essential files
ORANGE (80-120 calls) — Recommend /compact at next logical boundary
RED (>120 calls) — Save state NOW via session-bridge, compact immediately
```

Note: These are tool call counts, NOT token percentages. Claude Code does not expose context utilization to skills. Tool count is a directional signal only.

## Output Format

```
## Context Health
- **Tool Calls**: [count]
- **Status**: GREEN | YELLOW | ORANGE | RED
- **Recommendation**: continue | load-essential-only | compact-at-boundary | compact-immediately
- **Note**: Tool count is a directional proxy. Check CLI status bar for actual context usage.

### Critical Context (preserved on compaction)
- Task: [current task]
- Phase: [current phase]
- Decisions: [count saved to .rune/]
- Files touched: [list]
- Blockers: [if any]
```
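
Filling in the report template is mechanical. This sketch assumes the orchestrator supplies each field as a plain value. It also reads "[if any]" as "omit the Blockers line when there are none". `render_report` and its parameters are hypothetical.

```python
from typing import List, Optional

RECOMMENDATION = {
    "GREEN": "continue",
    "YELLOW": "load-essential-only",
    "ORANGE": "compact-at-boundary",
    "RED": "compact-immediately",
}


def render_report(tool_calls: int, status: str, task: str, phase: str,
                  decisions_saved: int, files: List[str],
                  blockers: Optional[str] = None) -> str:
    """Render the Context Health report in the Output Format layout."""
    lines = [
        "## Context Health",
        f"- **Tool Calls**: {tool_calls}",
        f"- **Status**: {status}",
        f"- **Recommendation**: {RECOMMENDATION[status]}",
        "- **Note**: Tool count is a directional proxy. "
        "Check CLI status bar for actual context usage.",
        "",
        "### Critical Context (preserved on compaction)",
        f"- Task: {task}",
        f"- Phase: {phase}",
        f"- Decisions: {decisions_saved} saved to .rune/",
        f"- Files touched: {', '.join(files) if files else 'none'}",
    ]
    if blockers:  # assumption: the line is emitted only when blockers exist
        lines.append(f"- Blockers: {blockers}")
    return "\n".join(lines)
```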

## Constraints

1. MUST preserve context fidelity — no summarizing away critical details
2. MUST flag context conflicts between skills — never silently pick one
3. MUST NOT inject stale context from previous sessions without marking it as historical

## Sharp Edges

Known failure modes for this skill. Check these before declaring done.

| Failure Mode | Severity | Mitigation |
|---|---|---|
| Triggering compaction without saving state first | CRITICAL | Step 5 (RED): session-bridge MUST run before any compaction — state loss is irreversible |
| Blocking tool calls when context is ORANGE (not RED) | MEDIUM | ORANGE = recommend only; blocking is only for RED (>120 calls, or >100 with large-file thresholds) |
| Injecting stale context from a previous session without marking it historical | HIGH | Constraint 3: all loaded context must include a session date marker |
| Premature compaction from over-estimated utilization | MEDIUM | Tool count is directional only — sessions with heavy Read calls may need lower thresholds; only block at confirmed RED |
| Not activating large-file adjustment on Python/Java codebases | MEDIUM | Track Read calls returning >500 lines; if ≥3 occur, switch to adjusted (0.8x) thresholds for the session |

## Done When

- Tool call count captured
- Health level classified from count thresholds (GREEN / YELLOW / ORANGE / RED)
- Appropriate advisory emitted matching health level (no advisory for GREEN)
- If RED: session-bridge called and confirmed saved before compaction signal
- Context Health Report emitted with tool count, status, and recommendation

## Cost Profile

~200-500 tokens input, ~100-200 tokens output. Haiku for minimal overhead. Runs frequently as a background monitor.