copilot-tap-extension 2.0.7 → 2.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (58)
  1. package/README.md +4 -1
  2. package/SOUL.md +51 -0
  3. package/bin/install.mjs +7 -1
  4. package/dist/copilot-instructions.md +15 -0
  5. package/dist/extension.mjs +823 -29
  6. package/dist/skills/tap-goal/SKILL.md +13 -2
  7. package/dist/skills/tap-loop/SKILL.md +6 -0
  8. package/dist/skills/tap-monitor/SKILL.md +19 -3
  9. package/dist/skills/tap-orchestrate/SKILL.md +81 -0
  10. package/dist/version.json +1 -1
  11. package/docs/adr/0001-persistent-config-default-ownership.md +33 -0
  12. package/docs/adr/0002-local-provider-gateway-runtime-security.md +36 -0
  13. package/docs/adr/0003-emitter-delivery-lifecycle.md +68 -0
  14. package/docs/adr/0004-persistent-config-canonical-streams.md +86 -0
  15. package/docs/adr/0005-provider-sdk-push-and-dynamic-tools.md +48 -0
  16. package/docs/adr/0006-command-emitter-cwd-workspace-boundary.md +46 -0
  17. package/docs/adr/0007-runtime-session-workspace-context.md +62 -0
  18. package/docs/evals.md +41 -0
  19. package/docs/evolution-of-tap-icon.html +989 -0
  20. package/docs/providers.md +242 -0
  21. package/docs/recipes/adaptive-agent.md +303 -0
  22. package/docs/recipes/agent-brainstorm/100-extension-ideas.md +288 -0
  23. package/docs/recipes/agent-brainstorm/deep-ideas.md +216 -0
  24. package/docs/recipes/ambient-guardian.md +314 -0
  25. package/docs/recipes/browser-bridge.md +162 -0
  26. package/docs/recipes/codex-goals-for-tap-goal.md +136 -0
  27. package/docs/recipes/copilot-sdk-canvas.md +147 -0
  28. package/docs/recipes/deferred-cognition.md +310 -0
  29. package/docs/recipes/provider-integration-patterns.md +93 -0
  30. package/docs/recipes/provider-interface-advanced.md +1364 -0
  31. package/docs/recipes/provider-interface-core-profile.md +568 -0
  32. package/docs/recipes/tap-control-plane-roadmap.md +60 -0
  33. package/docs/recipes/universal-tool-gateway.md +202 -0
  34. package/docs/reference.md +229 -0
  35. package/docs/use-cases.md +348 -0
  36. package/package.json +4 -1
  37. package/providers/detour/README.md +84 -0
  38. package/providers/detour/bridge.js +219 -0
  39. package/providers/detour/index.mjs +322 -0
  40. package/providers/detour/package-lock.json +577 -0
  41. package/providers/detour/package.json +19 -0
  42. package/providers/detour/scripts/build.mjs +31 -0
  43. package/providers/detour/src/bridge.js +256 -0
  44. package/providers/detour/src/contracts.js +40 -0
  45. package/providers/detour/src/inspector.js +260 -0
  46. package/providers/detour/src/inspector.test.mjs +53 -0
  47. package/providers/detour/src/panel.js +465 -0
  48. package/providers/detour/src/provider-core.js +233 -0
  49. package/providers/detour/src/provider-core.test.mjs +185 -0
  50. package/providers/detour/src/react-context-core.js +143 -0
  51. package/providers/detour/src/react-context.js +44 -0
  52. package/providers/detour/src/react-context.test.mjs +41 -0
  53. package/providers/templates/README.md +23 -0
  54. package/providers/templates/ci-review-provider.mjs +46 -0
  55. package/providers/templates/detour-workflow-provider.mjs +41 -0
  56. package/providers/templates/jira-github-provider.mjs +42 -0
  57. package/providers/templates/provider-utils.mjs +45 -0
  58. package/providers/templates/sast-triage-provider.mjs +51 -0
@@ -0,0 +1,314 @@
1
+ # Recipe: Ambient Guardian — Continuous Background Intelligence
2
+
3
+ ## The insight
4
+
5
+ Skills fire when invoked. tap fires when something happens. The gap between these two is **time** — the 30 seconds between a teammate's force-push and your next `git push` that will conflict. The 90 seconds between a deploy and the error spike it causes. The silent period while CI is failing and you're still writing code that depends on it passing.
6
+
7
+ The Ambient Guardian is a pattern where tap maintains a continuous awareness of your environment and interrupts **only when something needs your attention right now**. Not a dashboard. Not a notification system. A runtime that understands what you're doing and correlates it with what's happening around you.
8
+
9
+ ## Architecture
10
+
11
+ ```
12
+ ┌──────────────────────────────────────────────────────────┐
13
+ │ Copilot CLI session │
14
+ │ │
15
+ │ ┌─────────────────────────────────────────────────────┐ │
16
+ │ │ Ambient Guardian (tap extension layer) │ │
17
+ │ │ │ │
18
+ │ │ onPreToolUse ──► gate actions against live state │ │
19
+ │ │ onPostToolUse ──► track what you're working on │ │
20
+ │ │ transform callbacks ──► rewrite rules per context │ │
21
+ │ └────────┬───────────┬───────────┬────────────────────┘ │
22
+ │ │ │ │ │
23
+ │ ┌────────▼──┐ ┌──────▼───┐ ┌────▼──────┐ │
24
+ │ │ Emitter: │ │ Emitter: │ │ Emitter: │ │
25
+ │ │ git state │ │ CI watch │ │ env probe │ │
26
+ │ │ (30s poll)│ │ (gh api) │ │ (custom) │ │
27
+ │ └────────┬──┘ └──────┬───┘ └────┬──────┘ │
28
+ │ │ │ │ │
29
+ │ ┌────────▼───────────▼──────────▼──────┐ │
30
+ │ │ Correlation PromptEmitter (idle) │ │
31
+ │ │ Reads all streams, finds patterns, │ │
32
+ │ │ decides what to surface │ │
33
+ │ └───────────────────────────────────────┘ │
34
+ └──────────────────────────────────────────────────────────┘
35
+ ```
36
+
37
+ ## Why skills can't do this
38
+
39
+ A skill can check CI status when you ask. But:
40
+
41
+ 1. You don't ask until you're about to push — by then you've built on a broken foundation for 20 minutes.
42
+ 2. A skill can't correlate a deploy, an error spike, and a PR comment that happened 90 seconds apart. It sees one thing at a time.
43
+ 3. A skill can't block a `git push` before it runs. `onPreToolUse` can.
44
+ 4. A skill can't rewrite the system prompt to say "be conservative, production is degraded." Transform callbacks can.
45
+
46
+ The value is in what happens **between user messages** — the silence when nobody is asking questions but the world is changing.
47
+
48
+ ## Components
49
+
50
+ ### 1. Environment emitters (the eyes)
51
+
52
+ Three CommandEmitters running continuously:
53
+
54
+ **Git state watcher** — polls every 30 seconds:
55
+ ```bash
56
+ git fetch --quiet 2>/dev/null; \
57
+ echo "branch=$(git branch --show-current)"; \
58
+ echo "ahead=$(git rev-list --count @{u}..HEAD 2>/dev/null || echo 0)"; \
59
+ echo "behind=$(git rev-list --count HEAD..@{u} 2>/dev/null || echo 0)"; \
60
+ echo "dirty=$(git status --porcelain | wc -l | tr -d ' ')"; \
61
+ echo "conflicts=$(git diff --name-only --diff-filter=U | wc -l | tr -d ' ')"
62
+ ```
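The watcher prints plain `key=value` lines, while the gating example in section 4 reads fields such as `gitState.behind` as numbers. Here is a minimal sketch of that parsing step, assuming the emitter hands its raw stdout to a helper. `parseGitState` is a hypothetical name, not a tap API:

```js
// Hypothetical helper: turn the git watcher's key=value stdout into an object.
// Count fields become numbers; everything else (e.g. branch) stays a string.
// Values are trimmed because `wc -l` pads its output on some platforms.
const NUMERIC_KEYS = new Set(["ahead", "behind", "dirty", "conflicts"]);

function parseGitState(stdout) {
  const state = {};
  for (const line of stdout.split("\n")) {
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip blank or malformed lines
    const key = line.slice(0, eq).trim();
    const raw = line.slice(eq + 1).trim();
    state[key] = NUMERIC_KEYS.has(key) ? Number.parseInt(raw, 10) || 0 : raw;
  }
  return state;
}
```

For example, `parseGitState("branch=feature/auth\nbehind=2")` yields `{ branch: "feature/auth", behind: 2 }`.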
63
+
64
+ **CI watcher** — polls GitHub Actions:
65
+ ```bash
66
+ gh run list --branch "$(git branch --show-current)" --limit 3 --json status,conclusion,name,createdAt
67
+ ```
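`gh run list --json` returns an array of run objects. Here is a sketch of reducing it to the one fact the guardian gates on, assuming the CI emitter stores a parsed summary. `summarizeRuns` and its return shape are hypothetical:

```js
// Hypothetical helper: reduce `gh run list --json status,conclusion,name,createdAt`
// output to a summary. In GitHub Actions, `status` is the lifecycle
// (queued, in_progress, completed) and `conclusion` is the outcome
// (success, failure, ...), empty or null until the run completes.
function summarizeRuns(jsonText) {
  const runs = JSON.parse(jsonText);
  if (!Array.isArray(runs) || runs.length === 0) return null;
  // Sort newest first instead of relying on the CLI's ordering.
  const sorted = [...runs].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt)
  );
  const latest = sorted[0];
  return {
    name: latest.name,
    status: latest.status,
    conclusion: latest.conclusion || null,
    failing: latest.status === "completed" && latest.conclusion === "failure",
  };
}
```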
68
+
69
+ **Deploy/infra probe** — customizable per project (Kubernetes, AWS, Vercel, etc.):
70
+ ```bash
71
+ kubectl get pods -l app=myservice --no-headers | awk '{print $1, $3, $4, $5}'
72
+ ```
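Here is a sketch of turning the probe's output into pod records. `parsePods` and the restart threshold are hypothetical. Only the first three fields are read, because some kubectl versions print restarts as `4 (2m ago)`, which shifts the later columns:

```js
// Hypothetical helper: parse "<name> <status> <restarts> ..." lines from the
// probe and flag unhealthy pods. The default threshold is an assumption.
function parsePods(stdout, { maxRestarts = 3 } = {}) {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => {
      const [name, status, restarts] = line.split(/\s+/);
      const restartCount = Number.parseInt(restarts, 10) || 0;
      const unhealthy =
        status === "CrashLoopBackOff" ||
        status === "Error" ||
        restartCount > maxRestarts;
      return { name, status, restarts: restartCount, unhealthy };
    });
}
```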
73
+
74
+ ### 2. EventFilter rules (noise control)
75
+
76
+ ```json
77
+ [
78
+ { "match": "behind=0", "outcome": "drop" },
79
+ { "match": "dirty=0", "outcome": "drop" },
80
+ { "match": "conflicts=[1-9]", "outcome": "inject" },
81
+ { "match": "behind=[1-9]", "outcome": "surface" },
82
+ { "match": "\"conclusion\":\\s*\"failure\"", "outcome": "inject" },
83
+ { "match": "CrashLoopBackOff", "outcome": "inject" },
84
+ { "match": ".*", "outcome": "keep" }
85
+ ]
86
+ ```
87
+
88
+ Most polls produce nothing interesting → dropped. Only real signals break through.
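The rules above read like an ordered list that ends in a catch-all, which suggests first-match-wins. That is an assumption here, as is treating `match` as an unanchored JavaScript regular expression. Here is a sketch of an evaluator under those assumptions, using a subset of the rules:

```js
// Hypothetical evaluator. ASSUMES first-match-wins and that `match` is an
// unanchored JavaScript regex tested against the event text; tap's real
// EventFilter semantics may differ.
function compileRules(rules) {
  return rules.map(({ match, outcome }) => ({ re: new RegExp(match), outcome }));
}

function classify(compiled, eventText, fallback = "keep") {
  for (const { re, outcome } of compiled) {
    if (re.test(eventText)) return outcome; // first matching rule decides
  }
  return fallback;
}

const rules = compileRules([
  { match: "behind=0", outcome: "drop" },
  { match: "conflicts=[1-9]", outcome: "inject" },
  { match: "behind=[1-9]", outcome: "surface" },
  { match: "CrashLoopBackOff", outcome: "inject" },
  { match: ".*", outcome: "keep" },
]);
```

Under these assumptions, ordering matters if the git watcher's five lines arrive as one event: `behind=0` would drop a report that also contains `conflicts=2`. If each output line is its own event, the order is harmless.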
89
+
90
+ ### 3. Correlation engine (the brain)
91
+
92
+ A PromptEmitter on an idle schedule that reads across all streams:
93
+
94
+ ```
95
+ prompt: |
96
+ You are a background correlation engine. Read the recent events
97
+ from all streams and look for patterns:
98
+ - Did something change in one stream that explains an event in another?
99
+ - Is there a time correlation between events across streams?
100
+ - Is the developer's current work going to collide with something
101
+ that just happened?
102
+
103
+ Only report if you find a genuine correlation. Say nothing if
104
+ everything looks normal. Be terse — one sentence max.
105
+
106
+ Stream history:
107
+ {{git_stream_last_10}}
108
+ {{ci_stream_last_10}}
109
+ {{deploy_stream_last_10}}
110
+ ```
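The `{{git_stream_last_10}}` placeholders imply a templating step before the prompt is sent. Here is a sketch that assumes placeholders of the form `{{<stream>_stream_last_<n>}}` and a per-stream history of `{ at, text }` events. Both function names are hypothetical:

```js
// Hypothetical renderer for the correlation prompt. ASSUMES placeholders like
// {{<stream>_stream_last_<n>}} and history[stream] = [{ at, text }], oldest first.
function renderCorrelationPrompt(template, history) {
  return template.replace(/\{\{(\w+?)_stream_last_(\d+)\}\}/g, (_, stream, n) => {
    const events = (history[stream] || []).slice(-Number(n));
    if (events.length === 0) return `[${stream}] (no events)`;
    return events.map((e) => `[${stream} ${e.at}] ${e.text}`).join("\n");
  });
}

// The prompt asks the model to say nothing when all is normal, so an empty or
// whitespace-only reply should produce no injection at all.
function shouldInject(reply) {
  return typeof reply === "string" && reply.trim().length > 0;
}
```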
111
+
112
+ ### 4. Action gating (onPreToolUse)
113
+
114
+ Before tool calls execute, the guardian checks live state:
115
+
116
+ ```js
117
+ onPreToolUse: async ({ toolName, toolArgs }) => {
118
+ // Block push if CI is failing
119
+ if (toolName === "shell" && isGitPush(toolArgs.command)) {
120
+ const ciState = streams.latest("ci-watch");
121
+ if (ciState?.conclusion === "failure") {
122
+ return {
123
+ permissionDecision: "deny",
124
+ permissionDecisionReason:
125
+ "CI is currently failing on this branch. Fix the failing " +
126
+ "tests before pushing, or the failure will block the PR."
127
+ };
128
+ }
129
+ }
130
+
131
+ // Warn before editing files that have upstream changes
132
+ if (toolName === "edit") {
133
+ const gitState = streams.latest("git-watch");
134
+ if (gitState?.behind > 0) {
135
+ return {
136
+ additionalContext:
137
+ `Warning: your branch is ${gitState.behind} commits behind ` +
138
+ `origin. The file you're editing may have upstream changes. ` +
139
+ `Consider pulling first.`
140
+ };
141
+ }
142
+ }
143
+ }
144
+ ```
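The example calls `isGitPush` without defining it. Here is one heuristic version, hypothetical and deliberately simple: it splits on common shell separators and skips git's global options such as `-C <path>`, but it does not understand quoting, subshells, or aliases:

```js
// Hypothetical best-effort check for whether a shell command runs `git push`.
// Global options that take a separate argument consume the next token too.
const OPTIONS_WITH_ARG = new Set(["-C", "-c", "--git-dir", "--work-tree", "--namespace"]);

function isGitPush(command) {
  const segments = command.split(/&&|\|\||[;|\n]/);
  return segments.some((segment) => {
    const tokens = segment.trim().split(/\s+/);
    const gitIndex = tokens.findIndex((t) => t === "git" || t.endsWith("/git"));
    if (gitIndex === -1) return false;
    let i = gitIndex + 1;
    while (i < tokens.length && tokens[i].startsWith("-")) {
      i += OPTIONS_WITH_ARG.has(tokens[i]) ? 2 : 1;
    }
    return tokens[i] === "push";
  });
}
```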
145
+
146
+ ### 5. Context-adaptive system prompt (transform callbacks)
147
+
148
+ ```js
149
+ registerTransformCallbacks(new Map([
150
+ ["code_change_rules", (current) => {
151
+ const branch = streams.latest("git-watch")?.branch;
152
+ const ciStatus = streams.latest("ci-watch")?.conclusion; // latest run's outcome
153
+ const deploying = streams.latest("deploy-watch")?.deploying;
154
+
155
+ const additions = [];
156
+
157
+ if (branch === "main" || branch === "master") {
158
+ additions.push(
159
+ "You are on the production branch. Require explicit user " +
160
+ "confirmation before any file write. Suggest a feature branch."
161
+ );
162
+ }
163
+
164
+ if (ciStatus === "failure") {
165
+ additions.push(
166
+ "CI is currently failing. Prioritize fixing tests over new features."
167
+ );
168
+ }
169
+
170
+ if (deploying) {
171
+ additions.push(
172
+ "A production deploy is in progress. Do not suggest database " +
173
+ "migrations or infrastructure changes until it completes."
174
+ );
175
+ }
176
+
177
+ return additions.length > 0
178
+ ? current + "\n\n" + additions.join("\n")
179
+ : current;
180
+ }]
181
+ ]));
182
+ ```
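Both examples above read from a `streams` object that the recipe does not define. Here is a minimal in-memory version, assuming each emitter pushes already-parsed events. `createStreamStore`, `push`, `latest`, and `last` are hypothetical names:

```js
// Hypothetical store behind `streams.latest(...)`: a bounded history per
// stream, newest event last.
function createStreamStore({ maxPerStream = 50 } = {}) {
  const histories = new Map();
  return {
    push(name, event) {
      const list = histories.get(name) || [];
      list.push(event);
      // Drop the oldest events once the stream exceeds its bound.
      if (list.length > maxPerStream) list.splice(0, list.length - maxPerStream);
      histories.set(name, list);
    },
    latest(name) {
      const list = histories.get(name);
      return list && list.length > 0 ? list[list.length - 1] : undefined;
    },
    last(name, n) {
      return (histories.get(name) || []).slice(-n);
    },
  };
}
```

Returning `undefined` for a stream with no events is what keeps the optional chaining (`?.`) in the gating and transform examples safe.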
183
+
184
+ ## Example scenarios
185
+
186
+ ### Scenario A: The silent conflict
187
+
188
+ ```
189
+ You're writing code on feature/auth (10 minutes in)
190
+
191
+
192
+ Git emitter detects: branch is now 2 commits behind origin
193
+
194
+
195
+ EventFilter: behind=[1-9] → surface
196
+
197
+
198
+ Timeline shows: "※ tap: feature/auth is 2 commits behind origin"
199
+
200
+
201
+ You keep working (it's just a surface, not an inject)
202
+
203
+
204
+ 5 minutes later, you ask Copilot to edit src/auth.ts
205
+
206
+
207
+ onPreToolUse fires → checks git state → one of the upstream
208
+ commits touched src/auth.ts
209
+
210
+
211
+ Copilot receives: "Warning: src/auth.ts was modified in an upstream
212
+ commit (abc123 by Alice, 7 min ago). Your edit may conflict.
213
+ Consider pulling first."
214
+
215
+
216
+ You pull, resolve cleanly, then continue. Saved 20 minutes of
217
+ merge conflict debugging.
218
+ ```
219
+
220
+ ### Scenario B: The cascading failure
221
+
222
+ ```
223
+ 3 events arrive over 90 seconds:
224
+
225
+ 2:01pm — deploy emitter: "v2.4.2 deployed to prod"
226
+ 2:02pm — CI emitter: "staging pipeline failed: connection refused"
227
+ 2:03pm — deploy emitter: "pod auth-service restart count: 4"
228
+
229
+
230
+ Correlation PromptEmitter (idle) reads all streams:
231
+
232
+
233
+ Injects: "Deploy v2.4.2 is causing auth-service crash loops
234
+ (4 restarts in 2 min). Staging CI is failing with connection
235
+ refused — likely same root cause. Consider rolling back."
236
+
237
+
238
+ Meanwhile, transform callback has already added to system prompt:
239
+ "Production is degraded. Do not suggest changes to auth-service
240
+ configuration. Prioritize investigation and rollback."
241
+
242
+
243
+ You say: "rollback" — Copilot already knows the context,
244
+ runs the rollback command immediately.
245
+ ```
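The correlation in Scenario B depends on events from different streams landing close together in time. One cheap option, assumed here rather than taken from tap, is a plain-code pre-filter that only wakes the correlation PromptEmitter when events from at least two streams fall inside one sliding window. `findCrossStreamBurst` and its defaults are hypothetical:

```js
// Hypothetical pre-filter: return the first time window (events sorted by
// `at`, in ms) that spans at most windowMs and covers at least minStreams
// distinct streams, or null if there is none.
function findCrossStreamBurst(events, { windowMs = 120_000, minStreams = 2 } = {}) {
  const sorted = [...events].sort((a, b) => a.at - b.at);
  let start = 0;
  for (let end = 0; end < sorted.length; end++) {
    // Shrink the window from the left until it fits.
    while (sorted[end].at - sorted[start].at > windowMs) start++;
    const window = sorted.slice(start, end + 1);
    if (new Set(window.map((e) => e.stream)).size >= minStreams) return window;
  }
  return null;
}
```

With the Scenario B timestamps mapped to 0, 60,000, and 120,000 ms, the deploy and CI events already qualify.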
246
+
247
+ ### Scenario C: The preemptive gate
248
+
249
+ ```
250
+ You ask Copilot: "push my changes"
251
+
252
+
253
+ onPreToolUse fires for shell(git push)
254
+
255
+
256
+ Guardian checks:
257
+ ✗ CI status: failure (test/auth.spec.ts)
258
+ ✗ Uncommitted files: 2 files not in this branch's scope
259
+ ✓ Branch: feature/auth (not main)
260
+ ✓ No deploy in progress
261
+
262
+
263
+ Returns: permissionDecision: "deny"
264
+ reason: "CI is failing on test/auth.spec.ts (your branch).
265
+ Also, you have 2 uncommitted files (config.json,
266
+ .env.local) that aren't related to this PR.
267
+ Fix the test first, then stash or commit the
268
+ unrelated files."
269
+
270
+
271
+ Copilot: "I can't push right now — CI is failing and you
272
+ have unrelated uncommitted files. Want me to fix the
273
+ failing test first?"
274
+ ```
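Scenario C folds several checks into one deny with a single combined reason. Here is a sketch of that aggregation in which the shape of `state` and the check list are hypothetical. Returning `undefined` when every check passes mirrors the gating example in section 4, which also returns nothing when it has no objection:

```js
// Hypothetical push gate: each check returns null when it passes or a
// human-readable reason when it fails.
const pushChecks = [
  (s) => (s.ci?.conclusion === "failure" ? `CI is failing (${s.ci.name}).` : null),
  (s) =>
    s.git?.branch === "main" || s.git?.branch === "master"
      ? "You are pushing directly to the default branch."
      : null,
  (s) => (s.deploy?.deploying ? "A production deploy is in progress." : null),
  (s) =>
    s.unrelatedFiles?.length
      ? `Uncommitted files outside this change: ${s.unrelatedFiles.join(", ")}.`
      : null,
];

function evaluatePushGate(state) {
  const reasons = pushChecks.map((check) => check(state)).filter(Boolean);
  if (reasons.length === 0) return undefined; // no objection
  return {
    permissionDecision: "deny",
    permissionDecisionReason: reasons.join(" "),
  };
}
```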
275
+
276
+ ## Configuration
277
+
278
+ In `tap.config.json`:
279
+
280
+ ```json
281
+ {
282
+ "guardian": {
283
+ "emitters": {
284
+ "git": { "every": "30s", "enabled": true },
285
+ "ci": { "every": "60s", "enabled": true },
286
+ "deploy": { "command": "kubectl get pods ...", "every": "60s", "enabled": false }
287
+ },
288
+ "correlation": { "schedule": "idle", "enabled": true },
289
+ "gating": {
290
+ "blockPushOnCIFailure": true,
291
+ "warnOnUpstreamChanges": true,
292
+ "blockMainBranchWrites": true
293
+ }
294
+ }
295
+ }
296
+ ```
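Here is a sketch of loading the `guardian` block, assuming `every` values are a whole number followed by `ms`, `s`, `m`, or `h`. `parseDuration`, `enabledEmitters`, and the 10-second floor are hypothetical:

```js
// Hypothetical config helpers for the guardian block of tap.config.json.
const UNIT_MS = { ms: 1, s: 1000, m: 60_000, h: 3_600_000 };

function parseDuration(text) {
  const match = /^(\d+)(ms|s|m|h)$/.exec(String(text).trim());
  if (!match) throw new Error(`Invalid duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

function enabledEmitters(guardian, { minIntervalMs = 10_000 } = {}) {
  return Object.entries(guardian.emitters || {})
    .filter(([, cfg]) => cfg.enabled)
    .map(([name, cfg]) => ({
      name,
      // Clamp very short intervals so a typo cannot hammer git or the API.
      intervalMs: Math.max(parseDuration(cfg.every), minIntervalMs),
      command: cfg.command, // undefined when the config omits it
    }));
}
```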
297
+
298
+ ## Phased delivery
299
+
300
+ | Phase | Scope |
301
+ |---|---|
302
+ | **1. Git + CI emitters** | Two CommandEmitters with EventFilter rules, surface/inject thresholds |
303
+ | **2. onPreToolUse gating** | Block push on CI failure, warn on upstream conflicts |
304
+ | **3. Transform callbacks** | Context-adaptive system prompt based on branch/CI/deploy state |
305
+ | **4. Correlation engine** | PromptEmitter that reads across streams and synthesizes |
306
+ | **5. Configuration** | Per-project guardian config in tap.config.json |
307
+
308
+ ## Open questions
309
+
310
+ - **Polling frequency** — 30s for git, 60s for CI? Configurable per project?
311
+ - **Gate strictness** — deny vs. warn? Should the user be able to override gates?
312
+ - **Correlation prompt** — how to keep it cheap (token-wise) while effective?
313
+ - **Multi-repo** — does the guardian follow you across repos, or reset per project?
314
+ - **Override mechanism** — `--force` style escape hatch for gates?
@@ -0,0 +1,162 @@
1
+ # Recipe: Browser Bridge — Copilot CLI ↔ Live Web Pages
2
+
3
+ Connect Copilot CLI to any browser tab via a local WebSocket relay and [Detour](https://chromewebstore.google.com/detail/detour/cinkplogkjggmgdkaflhlemcdhchninp) (a Chrome extension that injects scripts into pages).
4
+
5
+ ## How it works
6
+
7
+ ```
8
+ Copilot CLI (※ tap) ◄─ws─► Bridge Server ◄─ws─► Injected JS (via Detour)
9
+                            ws://localhost:9400  running in page MAIN world
10
+ ```
11
+
12
+ 1. A standalone **bridge server** runs locally on a WebSocket port.
13
+ 2. **Detour injects a client script** into target pages — no changes to Detour needed. Detour already runs arbitrary JS in the MAIN world and bypasses CSP. The bridge client is just another script it injects.
14
+ 3. **tap dynamically registers tools** with `session.registerTools()` when the bridge connects, so Copilot sees browser tools appear and disappear as the bridge connects and disconnects.
15
+ 4. **Push events** (console, annotations) flow from the browser through a tap emitter into the Copilot session.
16
+
17
+ ## Architecture
18
+
19
+ ### Bridge server (standalone)
20
+
21
+ A minimal Node.js WebSocket relay. Clients self-identify as `agent` (Copilot) or `browser` (injected page script). The bridge routes messages between them.
22
+
23
+ ```
24
+ npx copilot-bridge
25
+ # or
26
+ node bridge/server.mjs
27
+ ```
28
+
29
+ Zero knowledge of tap or Detour — it just relays JSON.
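Here is a sketch of the routing core, kept transport-agnostic so it can sit behind any WebSocket server (the `ws` npm package is one common choice). `createRelay` and the `error` field in its fallback response are assumptions; the protocol section only shows successful `data` responses:

```js
// Hypothetical relay core. A "client" is anything with send(string).
// Requests flow agent -> browser; responses and pushes flow browser -> agent.
function createRelay() {
  const peers = { agent: null, browser: null };
  const roleOf = new Map();

  function deliver(role, msg) {
    const target = peers[role];
    if (target) target.send(JSON.stringify(msg));
    return Boolean(target);
  }

  return {
    receive(client, raw) {
      let msg;
      try { msg = JSON.parse(raw); } catch { return; } // ignore non-JSON frames
      if (msg.type === "hello" && (msg.role === "agent" || msg.role === "browser")) {
        peers[msg.role] = client; // one peer per role; the newest wins
        roleOf.set(client, msg.role);
        return;
      }
      const role = roleOf.get(client);
      if (role === "agent" && msg.type === "request") {
        if (!deliver("browser", msg)) {
          client.send(JSON.stringify({ type: "response", id: msg.id, error: "no browser connected" }));
        }
      } else if (role === "browser" && (msg.type === "response" || msg.type === "push")) {
        deliver("agent", msg);
      }
    },
    disconnect(client) {
      const role = roleOf.get(client);
      if (role && peers[role] === client) peers[role] = null;
      roleOf.delete(client);
    },
  };
}
```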
30
+
31
+ ### Injected script (via Detour)
32
+
33
+ A self-contained JS file hosted locally or on a CDN, added to Detour as a script injection rule for the target pages. It:
34
+
35
+ - Connects to `ws://localhost:9400`
36
+ - Identifies as `browser`
37
+ - Handles action requests (screenshot, DOM query, JS exec)
38
+ - Pushes events (console, annotations) to the bridge
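Here is a sketch of the injected client's request handling. Handlers are passed in so the dispatch logic can run outside a browser. `handleRequest`, `pageHandlers`, and the `error` field are hypothetical:

```js
// Hypothetical dispatcher: always answers with a response carrying the
// request's id, whether the handler succeeds, throws, or does not exist.
async function handleRequest(msg, handlers) {
  const handler = handlers[msg.action];
  if (!handler) {
    return { type: "response", id: msg.id, error: `unsupported action: ${msg.action}` };
  }
  try {
    const data = await handler(msg.params || {});
    return { type: "response", id: msg.id, data };
  } catch (err) {
    return { type: "response", id: msg.id, error: String(err?.message ?? err) };
  }
}

// Example handler table for a real page (uses browser globals, so it is only
// meaningful when the handler actually runs inside a page).
const pageHandlers = {
  "page.info": () => ({ url: location.href, title: document.title, readyState: document.readyState }),
};
```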
39
+
40
+ ### tap integration — dynamic tool registration
41
+
42
+ The Copilot SDK supports `session.registerTools()` at runtime (`CopilotSession.registerTools`). tap doesn't need to predefine browser tools — it registers them when the bridge connects and removes them when it disconnects.
43
+
44
+ ```js
45
+ // When bridge connects and browser is available:
46
+ session.registerTools([
47
+ ...existingTapTools,
48
+ {
49
+ name: "tap_browser_screenshot",
50
+ description: "Capture the visible browser viewport as a PNG screenshot",
51
+ handler: async () => bridge.request("screenshot")
52
+ },
53
+ {
54
+ name: "tap_browser_exec",
55
+ description: "Execute JS in the page MAIN world, return result",
56
+ parameters: { type: "object", properties: { js: { type: "string" } }, required: ["js"] },
57
+ handler: async ({ js }) => bridge.request("js.exec", { js })
58
+ }
59
+ ]);
60
+
61
+ // When bridge disconnects — re-register without browser tools:
62
+ session.registerTools(existingTapTools);
63
+ ```
64
+
65
+ This pattern generalizes: any WebSocket-connected service can surface tools into Copilot at runtime — not just the browser bridge. The bridge announces what actions the connected browser supports, and tap materializes them as tools.
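Here is one way to materialize announced actions as tools, assuming the bridge forwards a list of action names. The `tap_browser_` prefix, the dot-to-underscore naming, and the generic `params` schema are illustrative choices, not tap's actual tool names:

```js
// Hypothetical: build tool definitions (shaped like the registerTools
// example) from action names such as "screenshot" or "dom.query".
// `request(action, params)` sends the action over the bridge.
function toolsFromActions(actions, request) {
  return actions.map((action) => ({
    name: `tap_browser_${action.replace(/\./g, "_")}`,
    description: `Run the "${action}" action in the connected browser tab`,
    parameters: {
      type: "object",
      properties: { params: { type: "object" } },
    },
    handler: async ({ params } = {}) => request(action, params || {}),
  }));
}
```

Re-registering only `existingTapTools` on disconnect, as in the example above, removes them again.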
66
+
67
+ ## Protocol
68
+
69
+ JSON over WebSocket. Request/response with correlation IDs.
70
+
71
+ ### Handshake
72
+
73
+ ```json
74
+ { "type": "hello", "role": "agent", "name": "copilot-tap" }
75
+ { "type": "hello", "role": "browser", "name": "detour-bridge-client" }
76
+ ```
77
+
78
+ ### Request → Response
79
+
80
+ ```jsonc
81
+ // agent sends
82
+ { "type": "request", "id": "r1", "action": "screenshot", "params": {} }
83
+
84
+ // browser responds
85
+ { "type": "response", "id": "r1", "data": { "image": "data:image/png;base64,..." } }
86
+ ```
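Here is a sketch of the agent side of request/response correlation: each request gets a fresh `id`, the matching response settles its promise, and unanswered requests time out. `createRequester` and the timeout default are hypothetical:

```js
// Hypothetical agent-side requester. `send(string)` writes to the socket;
// feed every incoming frame to handleMessage.
function createRequester(send, { timeoutMs = 10_000 } = {}) {
  const pending = new Map();
  let counter = 0;

  function request(action, params = {}) {
    const id = `r${++counter}`;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        pending.delete(id);
        reject(new Error(`timeout waiting for ${action} (${id})`));
      }, timeoutMs);
      pending.set(id, { resolve, reject, timer });
      send(JSON.stringify({ type: "request", id, action, params }));
    });
  }

  // Returns true if the frame settled a pending request.
  function handleMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.type !== "response" || !pending.has(msg.id)) return false;
    const { resolve, reject, timer } = pending.get(msg.id);
    clearTimeout(timer);
    pending.delete(msg.id);
    if (msg.error) reject(new Error(msg.error));
    else resolve(msg.data);
    return true;
  }

  return { request, handleMessage };
}
```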
87
+
88
+ ### Push (browser → agent, unsolicited)
89
+
90
+ ```json
91
+ { "type": "push", "action": "comment", "data": { "text": "Fix this button", "selector": "#submit-btn", "url": "https://..." } }
92
+ ```
93
+
94
+ ## Actions
95
+
96
+ | Action | Direction | What it does |
97
+ |---|---|---|
98
+ | `screenshot` | agent → browser | `html2canvas` or Canvas API capture of viewport |
99
+ | `dom.query` | agent → browser | `querySelector` → outerHTML, textContent, attributes |
100
+ | `dom.react` | agent → browser | React fiber walk → component name, file, line, props |
101
+ | `js.exec` | agent → browser | Run arbitrary JS in page context, return result |
102
+ | `page.info` | agent → browser | URL, title, meta, `document.readyState` |
103
+ | `comment` | browser → agent | User annotation from page → Copilot session |
104
+ | `console` | browser → agent | Intercepted `console.*` calls → tap emitter |
105
+ | `navigate` | agent → browser | `window.location.href = url` |
106
+
107
+ ## Use cases
108
+
109
+ ### Get a screenshot into Copilot
110
+
111
+ ```
112
+ > take a screenshot of the current page
113
+ ```
114
+
115
+ tap calls `tap_browser_screenshot` → bridge → injected script captures viewport → base64 flows back → Copilot sees the image.
116
+
117
+ ### React component context (like react-grab)
118
+
119
+ ```
120
+ > what React component renders the sidebar?
121
+ ```
122
+
123
+ tap calls `tap_browser_react_context` (optionally after `tap_browser_query` to locate the element by selector) → the injected script walks the React fiber tree → returns component name, source file, line number, and props → Copilot has the context without searching the codebase.
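Here is a sketch of the fiber walk behind `dom.react`. It relies on React internals rather than a public API, so it may break across React versions. The function names are hypothetical, and it returns prop names only, to avoid serializing functions or cyclic values:

```js
// Hypothetical sketch. React DOM stores the fiber on the element under a key
// starting with "__reactFiber$" (React 17+) or "__reactInternalInstance$"
// (React 16). `_debugSource` exists only in some development builds.
function findFiber(node) {
  const key = Object.keys(node).find(
    (k) => k.startsWith("__reactFiber$") || k.startsWith("__reactInternalInstance$")
  );
  return key ? node[key] : null;
}

function reactContextFor(node) {
  // Walk up from the host (DOM) fiber to the nearest function/class component.
  // memo/forwardRef wrappers have object types and are skipped.
  for (let fiber = findFiber(node); fiber; fiber = fiber.return) {
    if (typeof fiber.type === "function") {
      const source = fiber._debugSource || null;
      return {
        component: fiber.type.displayName || fiber.type.name || "Anonymous",
        file: source ? source.fileName : null,
        line: source ? source.lineNumber : null,
        props: Object.keys(fiber.memoizedProps || {}),
      };
    }
  }
  return null;
}
```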
124
+
125
+ ### Live console monitoring
126
+
127
+ A tap CommandEmitter connects to the bridge and streams `console` push events. EventFilter injects errors and warnings into the session and keeps everything else:
128
+
129
+ ```json
130
+ [{ "match": "error|warn|uncaught", "outcome": "inject" },
131
+  { "match": ".*", "outcome": "keep" }]
132
+ ```
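Here is a sketch of the injected script's side of console capture: wrap each method, push a `console` message, then call the original so DevTools output is unchanged. `interceptConsole` and the message's `data` fields are hypothetical:

```js
// Hypothetical console capture. `push(message)` would send over the bridge.
function safeStringify(value) {
  try {
    const json = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch {
    return String(value); // e.g. circular references
  }
}

function interceptConsole(target, push, levels = ["log", "info", "warn", "error"]) {
  const originals = {};
  for (const level of levels) {
    const original = target[level];
    originals[level] = original;
    target[level] = (...args) => {
      try {
        const text = args.map((a) => (typeof a === "string" ? a : safeStringify(a))).join(" ");
        push({ type: "push", action: "console", data: { level, text } });
      } catch {
        // A bridge failure must never break the page's own logging.
      }
      return original.apply(target, args);
    };
  }
  return () => Object.assign(target, originals); // undo: restore originals
}
```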
133
+
134
+ ### Page annotations → Copilot
135
+
136
+ The user selects an element on the page and types a comment. The injected script pushes it to the bridge → tap injects it into the Copilot session. It works like react-grab, but the context goes straight into the conversation instead of the clipboard.
137
+
138
+ ### Copilot drives the browser
139
+
140
+ ```
141
+ > click the submit button and tell me what happens
142
+ ```
143
+
144
+ tap calls `tap_browser_exec` with `document.querySelector('#submit').click()` → injected script runs it → returns result or captures DOM changes.
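Here is a sketch of a `js.exec` handler. Indirect eval (`(0, eval)(code)`) evaluates in global scope, which in a Detour-injected script is the page's MAIN world, and the result is converted to something JSON-safe before it travels back. `execInPage` is a hypothetical name. Because it runs arbitrary code, the localhost-only binding raised in the open questions matters here:

```js
// Hypothetical js.exec handler: evaluates code in global scope, awaits any
// promise it returns, and reports either a JSON-safe value or an error.
async function execInPage(code) {
  try {
    const value = await (0, eval)(code); // indirect eval: global scope
    return { ok: true, value: toJsonSafe(value) };
  } catch (err) {
    return { ok: false, error: String(err?.message ?? err) };
  }
}

function toJsonSafe(value) {
  if (value === undefined) return null;
  try {
    return JSON.parse(JSON.stringify(value));
  } catch {
    return String(value); // functions, circular structures, BigInt, ...
  }
}
```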
145
+
146
+ ## Phased delivery
147
+
148
+ | Phase | Scope |
149
+ |---|---|
150
+ | **1. Prove the round-trip** | Bridge server + injected client script + `screenshot` action + one tap tool |
151
+ | **2. DOM + React context** | `dom.query`, `dom.react`, `page.info` actions and tap tools |
152
+ | **3. Bidirectional** | `js.exec`, `comment` push, `console` push, `navigate` |
153
+ | **4. Polish** | Auto-reconnect, multi-tab targeting, annotation overlay UI, error handling |
154
+
155
+ ## Open questions
156
+
157
+ - **Bridge as npm package?** `npx copilot-bridge` or should tap auto-start it?
158
+ - **Multi-tab** — target active tab by default, allow tab ID targeting?
159
+ - **Screenshot method** — `html2canvas` (full fidelity) vs Canvas API (faster)?
160
+ - **Security** — localhost-only binding, optional shared secret?
161
+ - **Image delivery** — base64 inline vs write to temp file and return path?
162
+ - **React context** — bundle react-grab extraction logic or write lightweight version?
@@ -0,0 +1,136 @@
1
+ # Codex Goals lessons for `/tap-goal`
2
+
3
+ This recipe records the design lessons borrowed from OpenAI's
4
+ [Using Goals in Codex](https://developers.openai.com/cookbook/examples/codex/using_goals_in_codex)
5
+ guide and maps them to ※ tap's `/tap-goal` skill.
6
+
7
+ ## Core lesson
8
+
9
+ A goal is a **completion contract** attached to the current thread:
10
+
11
+ ```text
12
+ work -> check evidence -> continue, complete, or stop blocked
13
+ ```
14
+
15
+ It is not open-ended background autonomy. The objective persists, but evidence
16
+ decides whether work is complete.
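The loop above can be sketched as a single decision step. The status names come from the iteration ledger later in this recipe; the input shape and `decideNext` are hypothetical:

```js
// Hypothetical decision step for one goal iteration. Evidence decides
// completion; reaching maxRuns is a handoff, never a success.
// verification: "passed" | "failed" | "unavailable"
function decideNext({ verification, constraintsHeld, runs, maxRuns }) {
  if (verification === "passed" && constraintsHeld) return "complete";
  if (verification === "unavailable") return "blocked"; // cannot prove completion
  if (runs >= maxRuns) return "budget-limited";
  return "progressing";
}
```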
17
+
18
+ ## Strong goal contract
19
+
20
+ Before starting a goal loop, make these fields explicit:
21
+
22
+ | Field | Purpose |
23
+ | --- | --- |
24
+ | Outcome | Desired end state |
25
+ | Verification surface | Test, benchmark, command output, artifact, source material, or report that proves completion |
26
+ | Constraints | What must not regress |
27
+ | Boundaries | Files, tools, data, repositories, or resources in scope |
28
+ | Iteration policy | How to choose the next experiment/action after each attempt |
29
+ | Blocked stop condition | When to stop, what to report, and what would unlock progress |
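Here is a sketch of a pre-flight check against this table. The camelCase keys and `missingContractFields` are assumptions about how a contract might be represented:

```js
// Hypothetical pre-flight check: list contract fields that are absent or blank.
const CONTRACT_FIELDS = [
  "outcome",
  "verificationSurface",
  "constraints",
  "boundaries",
  "iterationPolicy",
  "blockedStopCondition",
];

function missingContractFields(contract) {
  return CONTRACT_FIELDS.filter((field) => {
    const value = contract[field];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

The weak goal that follows would report five missing fields; the strong one fills all six.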
30
+
31
+ Weak:
32
+
33
+ ```text
34
+ /tap-goal improve performance
35
+ ```
36
+
37
+ Strong:
38
+
39
+ ```text
40
+ /tap-goal Reduce p95 checkout latency below 120 ms, verified by the checkout benchmark,
41
+ while keeping the correctness suite green. Use only checkout service files,
42
+ benchmark fixtures, and related tests. Between iterations, record what changed,
43
+ what the benchmark showed, and the next best experiment. If blocked, stop with
44
+ attempted paths, evidence, blocker, and next input needed.
45
+ ```
46
+
47
+ ## Runtime mapping in tap
48
+
49
+ Codex Goals continue at safe idle boundaries. ※ tap supports the same pattern with an idle schedule:
50
+
51
+ ```json
52
+ { "prompt": "...", "every": "idle", "maxRuns": 50 }
53
+ ```
54
+
55
+ Copilot CLI autopilot can keep a session continuously busy, in which case an
56
+ idle schedule may never fire, so `/tap-goal` also supports timed, autopilot-compatible goals:
57
+
58
+ ```json
59
+ { "prompt": "...", "everySchedule": ["2m", "5m", "10m"], "maxRuns": 50 }
60
+ ```
61
+
62
+ Timed prompt sends that are deferred because the session is busy do not count
63
+ against the `maxRuns` iteration budget.
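Here is a sketch of that budget rule for timed goals. It assumes `everySchedule` lists successive delays with the last one repeating, which is a guess at the semantics; `createGoalScheduler` is hypothetical:

```js
// Hypothetical timed-goal scheduler. ASSUMES everySchedule holds successive
// delays and the last entry repeats. Busy ticks are deferred, not counted.
const UNIT_MS = { s: 1000, m: 60_000, h: 3_600_000 };
const toMs = (d) => {
  const m = /^(\d+)([smh])$/.exec(d);
  if (!m) throw new Error(`Invalid duration: ${d}`);
  return Number(m[1]) * UNIT_MS[m[2]];
};

function createGoalScheduler({ everySchedule, maxRuns }) {
  const delays = everySchedule.map(toMs);
  let runs = 0;
  return {
    get runs() { return runs; },
    nextDelayMs() {
      return delays[Math.min(runs, delays.length - 1)];
    },
    // Called when a timer fires: returns "sent", "deferred", or "exhausted".
    tick({ sessionBusy }) {
      if (runs >= maxRuns) return "exhausted";
      if (sessionBusy) return "deferred"; // retry later; budget untouched
      runs += 1;
      return "sent";
    },
  };
}
```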
64
+
65
+ ## Evidence audit before completion
66
+
67
+ Before a goal stops as complete, the prompt must record:
68
+
69
+ ```text
70
+ GOAL COMPLETE
71
+ Verification surface checked: <specific evidence>
72
+ Result observed: <what it showed>
73
+ Constraints checked: <what did not regress>
74
+ Conclusion: complete
75
+ ```
76
+
77
+ If the verification surface cannot be checked, the goal is blocked, not
78
+ complete.
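Here is a sketch of enforcing that rule mechanically: parse the report and treat anything missing, empty, or still a `<placeholder>` as blocked. `auditCompletion` is a hypothetical helper:

```js
// Hypothetical audit of a GOAL COMPLETE report.
const REQUIRED_FIELDS = [
  "Verification surface checked",
  "Result observed",
  "Constraints checked",
  "Conclusion",
];

function auditCompletion(report) {
  const lines = report.split("\n").map((line) => line.trim()).filter(Boolean);
  if (lines[0] !== "GOAL COMPLETE") {
    return { status: "blocked", missing: ["GOAL COMPLETE header"] };
  }
  const fields = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(":");
    if (colon > 0) fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  // A field is missing if absent, empty, or still an unfilled <placeholder>.
  const missing = REQUIRED_FIELDS.filter((name) => !fields[name] || /^<.*>$/.test(fields[name]));
  const complete = missing.length === 0 && fields.Conclusion === "complete";
  return { status: complete ? "complete" : "blocked", missing };
}
```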
79
+
80
+ ## Iteration ledger
81
+
82
+ Each iteration should post a structured EventStream note with `tap_post`:
83
+
84
+ ```text
85
+ ITERATION RECORD
86
+ Iteration: <runs> of <maxRuns>
87
+ Action taken: <smallest useful action>
88
+ Evidence checked: <test/output/artifact/result>
89
+ Status: progressing | complete | blocked | budget-limited
90
+ Next best action: <next step>
91
+ ```
92
+
93
+ This makes the EventStream an audit trail rather than only a notification log.
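Here is a sketch of producing the record in a fixed shape so the trail stays machine-readable. `formatIterationRecord` is hypothetical, and rejecting unknown statuses is a design choice:

```js
// Hypothetical formatter for the ITERATION RECORD note posted with tap_post.
const STATUSES = new Set(["progressing", "complete", "blocked", "budget-limited"]);

function formatIterationRecord({ runs, maxRuns, action, evidence, status, next }) {
  if (!STATUSES.has(status)) throw new Error(`Unknown status: ${status}`);
  return [
    "ITERATION RECORD",
    `Iteration: ${runs} of ${maxRuns}`,
    `Action taken: ${action}`,
    `Evidence checked: ${evidence}`,
    `Status: ${status}`,
    `Next best action: ${next}`,
  ].join("\n");
}
```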
94
+
95
+ ## Research and reproduction goals
96
+
97
+ For research goals, maintain a claim ledger:
98
+
99
+ ```text
100
+ Claim: <specific claim>
101
+ Route: <how it was tested>
102
+ Evidence surface: <what was checked>
103
+ Status: confirmed | approximate-support | blocked | uncertain
104
+ Remaining uncertainty: <what is missing>
105
+ ```
106
+
107
+ The final output should preserve epistemic levels instead of flattening partial
108
+ support into success.
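Here is a sketch of a roll-up that keeps every level visible instead of flattening to pass/fail. `summarizeClaims` and the `partial` verdict label are hypothetical:

```js
// Hypothetical claim-ledger roll-up. The overall verdict is "confirmed" only
// when every claim is confirmed; otherwise the per-level counts tell the story.
const LEVELS = ["confirmed", "approximate-support", "blocked", "uncertain"];

function summarizeClaims(claims) {
  const counts = Object.fromEntries(LEVELS.map((level) => [level, 0]));
  for (const claim of claims) {
    if (!LEVELS.includes(claim.status)) throw new Error(`Unknown status: ${claim.status}`);
    counts[claim.status] += 1;
  }
  const allConfirmed = claims.length > 0 && counts.confirmed === claims.length;
  return { counts, overall: allConfirmed ? "confirmed" : "partial" };
}
```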
109
+
110
+ ## Figure lessons from the Codex guide
111
+
112
+ The guide's figures reinforce these workflow rules:
113
+
114
+ 1. A goal turns a one-turn exchange into an evidence-checked continuation loop.
115
+ 2. Goal state is thread-scoped and includes durable state, continuation,
116
+ controls, and evidence checks.
117
+ 3. Continuation is gated: active goal, idle thread, and no queued user input.
118
+ 4. Strong goals visibly name end state, verification surface, and constraints.
119
+ 5. Research goals decompose source claims into evidence channels before status.
120
+ 6. Final research output preserves confirmed, approximate, blocked, and
121
+ uncertain support levels.
122
+ 7. The UI example shows goal mode as an explicit command/input affordance rather
123
+ than hidden background work.
124
+
125
+ ## Budget handling
126
+
127
+ `maxRuns` is a safety budget. Reaching it means "budget-limited handoff," not
128
+ "goal complete." The final budget-limited iteration should post:
129
+
130
+ ```text
131
+ BUDGET LIMITED
132
+ Progress: <what was achieved>
133
+ Evidence gathered: <what is known>
134
+ Remaining work: <what is not done>
135
+ Recommended next goal/budget: <next invocation>
136
+ ```