copilot-tap-extension 2.0.7 → 2.0.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (58)
  1. package/README.md +4 -1
  2. package/SOUL.md +51 -0
  3. package/bin/install.mjs +7 -1
  4. package/dist/copilot-instructions.md +15 -0
  5. package/dist/extension.mjs +823 -29
  6. package/dist/skills/tap-goal/SKILL.md +13 -2
  7. package/dist/skills/tap-loop/SKILL.md +6 -0
  8. package/dist/skills/tap-monitor/SKILL.md +19 -3
  9. package/dist/skills/tap-orchestrate/SKILL.md +81 -0
  10. package/dist/version.json +1 -1
  11. package/docs/adr/0001-persistent-config-default-ownership.md +33 -0
  12. package/docs/adr/0002-local-provider-gateway-runtime-security.md +36 -0
  13. package/docs/adr/0003-emitter-delivery-lifecycle.md +68 -0
  14. package/docs/adr/0004-persistent-config-canonical-streams.md +86 -0
  15. package/docs/adr/0005-provider-sdk-push-and-dynamic-tools.md +48 -0
  16. package/docs/adr/0006-command-emitter-cwd-workspace-boundary.md +46 -0
  17. package/docs/adr/0007-runtime-session-workspace-context.md +62 -0
  18. package/docs/evals.md +41 -0
  19. package/docs/evolution-of-tap-icon.html +989 -0
  20. package/docs/providers.md +242 -0
  21. package/docs/recipes/adaptive-agent.md +303 -0
  22. package/docs/recipes/agent-brainstorm/100-extension-ideas.md +288 -0
  23. package/docs/recipes/agent-brainstorm/deep-ideas.md +216 -0
  24. package/docs/recipes/ambient-guardian.md +314 -0
  25. package/docs/recipes/browser-bridge.md +162 -0
  26. package/docs/recipes/codex-goals-for-tap-goal.md +136 -0
  27. package/docs/recipes/copilot-sdk-canvas.md +147 -0
  28. package/docs/recipes/deferred-cognition.md +310 -0
  29. package/docs/recipes/provider-integration-patterns.md +93 -0
  30. package/docs/recipes/provider-interface-advanced.md +1364 -0
  31. package/docs/recipes/provider-interface-core-profile.md +568 -0
  32. package/docs/recipes/tap-control-plane-roadmap.md +60 -0
  33. package/docs/recipes/universal-tool-gateway.md +202 -0
  34. package/docs/reference.md +229 -0
  35. package/docs/use-cases.md +348 -0
  36. package/package.json +4 -1
  37. package/providers/detour/README.md +84 -0
  38. package/providers/detour/bridge.js +219 -0
  39. package/providers/detour/index.mjs +322 -0
  40. package/providers/detour/package-lock.json +577 -0
  41. package/providers/detour/package.json +19 -0
  42. package/providers/detour/scripts/build.mjs +31 -0
  43. package/providers/detour/src/bridge.js +256 -0
  44. package/providers/detour/src/contracts.js +40 -0
  45. package/providers/detour/src/inspector.js +260 -0
  46. package/providers/detour/src/inspector.test.mjs +53 -0
  47. package/providers/detour/src/panel.js +465 -0
  48. package/providers/detour/src/provider-core.js +233 -0
  49. package/providers/detour/src/provider-core.test.mjs +185 -0
  50. package/providers/detour/src/react-context-core.js +143 -0
  51. package/providers/detour/src/react-context.js +44 -0
  52. package/providers/detour/src/react-context.test.mjs +41 -0
  53. package/providers/templates/README.md +23 -0
  54. package/providers/templates/ci-review-provider.mjs +46 -0
  55. package/providers/templates/detour-workflow-provider.mjs +41 -0
  56. package/providers/templates/jira-github-provider.mjs +42 -0
  57. package/providers/templates/provider-utils.mjs +45 -0
  58. package/providers/templates/sast-triage-provider.mjs +51 -0
package/dist/skills/tap-goal/SKILL.md CHANGED
@@ -133,13 +133,19 @@ Continuation rules:
  - If only 1 iteration remains and the goal is not complete, do not start broad new work. Produce a budget-limited handoff instead.
  - Do not treat budget exhaustion or a lifecycle "reached run budget" message as success.
  - If this iteration makes no progress-producing tool calls beyond required status/ledger bookkeeping and does not change evidence, call `tap_post` with `channel='<goal-emitter-name>'` and a no-action handoff, then stop the emitter rather than spinning.
+ - If the remaining delta is unchanged from the previous ITERATION RECORD, post a STALLED LOOP record and stop rather than spending the rest of the budget.
 
  Evidence-audit rules:
  - Before marking complete, identify the verification surface from the goal contract.
  - Check the evidence directly: test output, benchmark result, file content, diff, generated artifact, source material, or other concrete proof.
+ - When the evidence is a workspace file, EventStream entry, or already-run command result, call `tap_verify_goal_output` or `tap_audit_claims` before GOAL COMPLETE.
  - Check listed constraints for regressions.
  - If the verification surface cannot be checked, treat the goal as blocked, not complete.
  - Completion requires an explicit evidence audit in the final response and in the EventStream.
+ - Wrap machine-readable EventStream records with explicit markers:
+ `=== BEGIN_ITERATION_RECORD ===` / `=== END_ITERATION_RECORD ===`,
+ `=== BEGIN_GOAL_COMPLETE ===` / `=== END_GOAL_COMPLETE ===`,
+ `=== BEGIN_GOAL_BLOCKED ===` / `=== END_GOAL_BLOCKED ===`.
 
  Research/audit goal rules:
  - For research, reproduction, audit, or investigation goals, maintain a claim ledger.
@@ -148,21 +154,26 @@ Research/audit goal rules:
 
  On this iteration:
  1. Briefly assess current progress toward the goal and the remaining iteration budget.
- 2. If the goal is already achieved, first call `tap_post` with `channel='<goal-emitter-name>'` and a GOAL COMPLETE evidence audit in `message`, then call tap_stop_emitter for '<goal-emitter-name>' with scope='temporary', report that the goal is complete, and stop.
+ 2. If the goal is already achieved, first call `tap_verify_goal_output` or `tap_audit_claims` against the verification surface. If verification passes, call `tap_post` with `channel='<goal-emitter-name>'` and a marked GOAL COMPLETE evidence audit in `message`, then call tap_stop_emitter for '<goal-emitter-name>' with scope='temporary', report that the goal is complete, and stop.
  3. If the goal is blocked by missing information, permissions, failing external systems, or an unsafe action, first call `tap_post` with `channel='<goal-emitter-name>'` and a GOAL BLOCKED report in `message`, then call tap_stop_emitter for '<goal-emitter-name>' with scope='temporary', report the blocker, and stop.
- 4. If this is the final iteration and the goal is not complete, do not start substantive new work. Call `tap_post` with `channel='<goal-emitter-name>'` and a BUDGET LIMITED summary in `message`: progress, evidence gathered, remaining work, and recommended next goal or budget. Then leave a concise handoff.
+ 4. If this is the final iteration and the goal is not complete, do not start substantive new work. Call `tap_post` with `channel='<goal-emitter-name>'` and a BUDGET LIMITED summary in `message`: progress, evidence gathered, remaining work, recommended next `/tap-goal ...` invocation, and suggested fresh budget. Then leave a concise handoff.
  5. Otherwise, choose the next smallest useful action toward the goal that fits the remaining budget and perform it.
  6. Validate the action using the repository's existing checks when relevant.
  7. End by calling `tap_post` with `channel='<goal-emitter-name>'` and an ITERATION RECORD in `message` containing:
  - iteration and budget used
  - action taken
  - evidence checked and result
+ - claim ledger entries when this is a research/audit goal
+ - remaining_delta or unchanged_delta status
  - current status: progressing, complete, blocked, or budget-limited
  - next best action
+ - branch, commit SHA, PR URL, run URL, or issue key when relevant
  8. End the user-visible response with the same concise progress update, what remains, and the next best step if the loop stops before completion.
 
  Safety rules:
  - Do not make unrelated changes.
+ - Do not modify this goal emitter's own `every`, `everySchedule`, `maxRuns`, event filter, or goal contract while it is running unless the user explicitly asks.
+ - Do not spawn additional emitters from this goal unless orchestration is explicitly part of the goal contract.
  - Do not mark the goal complete unless the objective is actually achieved and no required work remains.
  - Do not treat reaching the iteration budget as success.
  - Do not continue if the next step requires explicit user approval.
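The tap-goal prompt now wraps machine-readable records in explicit BEGIN_/END_ markers. The minimal sketch below shows how a consumer could pull those records out of a posted message. `extractRecords` is a hypothetical helper, not a tap API. It also assumes two things: markers are never nested, and an unterminated record should be ignored.

```javascript
// Hypothetical helper: extractRecords() is not part of tap. It only shows how
// the marker-wrapped records named in the tap-goal prompt could be read back.
const RECORD_KINDS = ["ITERATION_RECORD", "GOAL_COMPLETE", "GOAL_BLOCKED"];

function extractRecords(message) {
  const records = [];
  for (const kind of RECORD_KINDS) {
    const begin = `=== BEGIN_${kind} ===`;
    const end = `=== END_${kind} ===`;
    let from = 0;
    for (;;) {
      const start = message.indexOf(begin, from);
      if (start === -1) break;
      const bodyStart = start + begin.length;
      const stop = message.indexOf(end, bodyStart);
      if (stop === -1) break; // unterminated record: ignore the rest
      records.push({ kind, offset: start, body: message.slice(bodyStart, stop).trim() });
      from = stop + end.length;
    }
  }
  // Report records in the order they appear in the message.
  return records.sort((a, b) => a.offset - b.offset);
}

// Example: an iteration record followed by a completion record.
const message = [
  "=== BEGIN_ITERATION_RECORD ===",
  "iteration: 3/10",
  "status: complete",
  "=== END_ITERATION_RECORD ===",
  "=== BEGIN_GOAL_COMPLETE ===",
  "evidence: npm test exited 0",
  "=== END_GOAL_COMPLETE ===",
].join("\n");

console.log(extractRecords(message).map((r) => r.kind));
// → [ 'ITERATION_RECORD', 'GOAL_COMPLETE' ]
```

Explicit end markers let a reader reject a truncated record instead of guessing where free-form text stops.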
package/dist/skills/tap-loop/SKILL.md CHANGED
@@ -7,6 +7,12 @@ user-invocable: true
 
  Create a timed or idle PromptEmitter with `tap_start_emitter`.
 
+ If the request includes a completion condition such as "until", "keep going
+ until", "stop when", "work until done", or "iterate until complete", do not
+ create a plain loop. Redirect to `/tap-goal` semantics instead, because the
+ user is asking for a completion contract with evidence, budget, and stop
+ conditions rather than a recurring prompt.
+
  ## Expected input
 
  Interpret the invocation as:
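The new tap-loop paragraph redirects completion-style requests to `/tap-goal`. The skill leaves that judgment to the model. The sketch below is only a hypothetical keyword heuristic (`wantsGoalSemantics` is not part of tap) that treats the listed phrases as a routing rule. The bare word "until" already covers three of the listed phrases; they are kept here to mirror the skill text.

```javascript
// Hypothetical heuristic: the tap-loop skill leaves this decision to the model.
// The phrase list mirrors the examples in the skill text.
const COMPLETION_PHRASES = [
  "until",
  "keep going until",
  "stop when",
  "work until done",
  "iterate until complete",
];

function wantsGoalSemantics(request) {
  const text = request.toLowerCase();
  return COMPLETION_PHRASES.some((phrase) => {
    // Allow any run of whitespace between words and require word boundaries.
    const pattern = `\\b${phrase.split(" ").join("\\s+")}\\b`;
    return new RegExp(pattern).test(text);
  });
}

console.log(wantsGoalSemantics("every 5m, rerun the flaky test until it passes")); // → true (use /tap-goal)
console.log(wantsGoalSemantics("every 10m, summarize new issues")); // → false (plain loop)
```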
package/dist/skills/tap-monitor/SKILL.md CHANGED
@@ -63,9 +63,25 @@ Steps:
  - Lines that indicate important events (errors, warnings, state changes) → candidates for `{ "match": "...", "outcome": "inject" }`.
  - Lines that are never relevant at all → candidates for tighter keep/drop rules.
  4. Compare what you see against the current filter patterns for emitter '<command-emitter-name>'.
- 5. Only update if the evidence clearly justifies a change (signal-to-noise is poor or a pattern is clearly wrong).
- 6. If an update is needed, call tap_set_event_filter with the revised patterns for emitter '<command-emitter-name>'.
- 7. Do not report your findings to the user unless you made a change. If you made a change, send one short message describing what you updated and why.
+ 5. Use this shared contract when judging the stream:
+ - stream_purpose: <why the user wanted this monitor>
+ - signal_vocabulary: errors, warnings, failures, state changes, explicit success/failure markers
+ - noise_vocabulary: timestamps-only, heartbeat-only, repeated unchanged status, empty pings
+ 6. Only update if the evidence clearly justifies a change (signal-to-noise is poor or a pattern is clearly wrong).
+ 7. If an update is needed, call tap_set_event_filter with the revised patterns for emitter '<command-emitter-name>'.
+ 8. Always call tap_post with channel '<stream-name>' and a REVIEW RECORD wrapped in markers:
+ === BEGIN_REVIEW_RECORD ===
+ {
+ "reviewed_at": "<ISO timestamp>",
+ "entries_examined": <number>,
+ "issue_type": "noise_pattern|missed_signal|over_filtering|duplicate_inject|no_change",
+ "patterns_changed": ["short label for each change"],
+ "remaining_noise_delta": ["what still looks noisy or uncertain"],
+ "signal_vocabulary": ["terms treated as signal"],
+ "noise_vocabulary": ["terms treated as noise"]
+ }
+ === END_REVIEW_RECORD ===
+ 9. Do not report your findings to the user unless you made a change. If you made a change, send one short message describing what you updated and why.
  ```
 
  Substitute the real emitter name and stream name into the prompt before passing it to `tap_start_emitter`.
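With the tap-monitor change, every review posts a REVIEW RECORD as JSON between markers. A consumer could validate those records roughly as follows. `parseReviewRecord` and its checks are assumptions built on the field names and `issue_type` values given in the prompt. They are not behavior tap is known to enforce.

```javascript
// Hypothetical validator for the REVIEW RECORD posted by the tap-monitor prompt.
// Field names and issue_type values come from the prompt; the checks are assumptions.
const ISSUE_TYPES = new Set([
  "noise_pattern", "missed_signal", "over_filtering", "duplicate_inject", "no_change",
]);
const LIST_FIELDS = ["patterns_changed", "remaining_noise_delta", "signal_vocabulary", "noise_vocabulary"];

function parseReviewRecord(message) {
  const begin = "=== BEGIN_REVIEW_RECORD ===";
  const end = "=== END_REVIEW_RECORD ===";
  const start = message.indexOf(begin);
  if (start === -1) return { ok: false, error: "missing BEGIN marker" };
  const stop = message.indexOf(end, start + begin.length);
  if (stop === -1) return { ok: false, error: "missing END marker" };

  let record;
  try {
    record = JSON.parse(message.slice(start + begin.length, stop));
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  if (record === null || typeof record !== "object") return { ok: false, error: "record must be an object" };
  if (Number.isNaN(Date.parse(record.reviewed_at))) return { ok: false, error: "reviewed_at is not a timestamp" };
  if (!Number.isInteger(record.entries_examined) || record.entries_examined < 0) {
    return { ok: false, error: "entries_examined must be a non-negative integer" };
  }
  if (!ISSUE_TYPES.has(record.issue_type)) return { ok: false, error: `unknown issue_type: ${record.issue_type}` };
  for (const field of LIST_FIELDS) {
    if (!Array.isArray(record[field])) return { ok: false, error: `${field} must be an array` };
  }
  return { ok: true, record };
}

// Illustrative record with the prompt's placeholders filled in.
const posted = [
  "=== BEGIN_REVIEW_RECORD ===",
  JSON.stringify({
    reviewed_at: "2024-01-01T00:00:00Z",
    entries_examined: 40,
    issue_type: "noise_pattern",
    patterns_changed: ["drop heartbeat-only lines"],
    remaining_noise_delta: [],
    signal_vocabulary: ["error", "FAILED"],
    noise_vocabulary: ["heartbeat"],
  }),
  "=== END_REVIEW_RECORD ===",
].join("\n");

console.log(parseReviewRecord(posted).ok); // → true
```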
@@ -0,0 +1,81 @@
+ ---
+ name: tap-orchestrate
+ description: "Create a coordinator PromptEmitter for multi-agent tap workflows with role-specific sub-emitters, gated handoffs, and evidence records. Use when the user asks to orchestrate multiple agents, roles, workstreams, or parallel implementation/review/test phases."
+ argument-hint: "<objective and roles>"
+ user-invocable: true
+ ---
+
+ Create a coordinator PromptEmitter that manages a multi-agent workflow using tap
+ emitters and EventStreams.
+
+ Use this for work that naturally decomposes into roles such as planner,
+ implementer, reviewer, tester, documenter, provider-builder, or release
+ coordinator. Do not use it for a single straightforward task.
+
+ ## What to create
+
+ Use `tap_start_emitter` to create a **coordinator PromptEmitter**:
+
+ - Name: `orchestrate-<objective-slug>`.
+ - Prompt: a self-contained orchestration contract.
+ - Schedule: `everySchedule = ["2m", "5m", "10m"]`.
+ - `lifespan = "temporary"` unless the user explicitly asks for persistence.
+ - `ownership = "modelOwned"`.
+ - `subscribe = false`.
+ - `maxRuns = 50` unless the user gives a budget.
+
+ The coordinator may create role-specific PromptEmitters only when the role has a
+ clear deliverable and verification surface. Each role emitter should write its
+ handoff to an EventStream with a stable name:
+
+ ```text
+ orchestrate-<objective>-<role>
+ ```
+
+ ## Coordinator prompt contract
+
+ The coordinator prompt must include:
+
+ ```text
+ Objective: <user objective>
+ Roles: <role list, deliverables, and verification surface>
+ Gate policy:
+ - Do not hand off to the next role until required artifacts or EventStream notes exist.
+ - Read role EventStreams with tap_stream_history before deciding a gate is satisfied.
+ - If parallel work is safe, create independent role emitters in the same iteration.
+ - If a role blocks, post ORCHESTRATION BLOCKED and stop the coordinator.
+ Audit trail:
+ - After every decision, call tap_post to the coordinator stream with ORCHESTRATION RECORD:
+ role, gate, evidence checked, decision, next handoff.
+ Safety:
+ - Do not spawn duplicate role emitters.
+ - Do not mutate another role's scope unless the coordinator evidence supports it.
+ - Stop all role emitters when the orchestration completes or blocks.
+ ```
+
+ ## Required behavior
+
+ 1. Parse the objective and any requested roles.
+ 2. If roles are missing, infer a minimal role set from the objective:
+ planner, implementer, reviewer, validator.
+ 3. Create the coordinator PromptEmitter only; do not immediately create role
+ emitters in the setup turn. The coordinator will create them when it runs.
+ 4. Confirm:
+ - coordinator emitter name and stream
+ - roles
+ - gate policy
+ - max iteration budget
+
+ ## Good role patterns
+
+ - **planner**: produce plan and boundaries; verification is a plan note.
+ - **implementer**: make code/doc changes; verification is diff + focused checks.
+ - **reviewer**: inspect changes; verification is review note with findings.
+ - **validator**: run tests/build/evals; verification is command evidence.
+ - **release**: bump/push/publish only after validator passes.
+
+ ## When not to use
+
+ Do not create orchestration for a normal `/tap-goal` objective that one agent can
+ complete directly. Orchestration adds coordination cost and should only be used
+ when parallel roles or gated handoffs are genuinely useful.
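The tap-orchestrate skill gates each handoff on EventStream evidence read with `tap_stream_history`. The sketch below models only the sequential case, for the default planner, implementer, reviewer, validator role set. The following are hypothetical conventions, not part of the skill:

- `slugify`, `roleStreamName`, and `nextDecision`.
- The rule that a role passes its gate by posting a message starting with `HANDOFF` and reports a block with one starting `BLOCKED`.
- The plain `history` object standing in for stream history.

```javascript
// Hypothetical sketch of the coordinator's sequential gate check.
// Assumptions (not defined by the skill): a role satisfies its gate by posting
// a message starting with "HANDOFF" and reports a block with one starting "BLOCKED".
// `history` stands in for what tap_stream_history would return per stream.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Stream naming from the skill: orchestrate-<objective>-<role>
function roleStreamName(objective, role) {
  return `orchestrate-${slugify(objective)}-${slugify(role)}`;
}

function nextDecision(objective, roles, history) {
  for (const role of roles) {
    const entries = history[roleStreamName(objective, role)] ?? [];
    if (entries.some((e) => e.startsWith("BLOCKED"))) {
      return { decision: "ORCHESTRATION BLOCKED", role };
    }
    if (!entries.some((e) => e.startsWith("HANDOFF"))) {
      // Gate not satisfied yet: this role is the current handoff target.
      return { decision: "waiting", role };
    }
  }
  return { decision: "complete" };
}

const roles = ["planner", "implementer", "reviewer", "validator"];
const history = {
  "orchestrate-ship-v2-api-planner": ["HANDOFF plan note posted"],
};
console.log(nextDecision("Ship v2 API", roles, history));
// → { decision: 'waiting', role: 'implementer' }
```

Parallel roles, which the gate policy also allows, would need a dependency graph rather than this linear scan.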
package/dist/version.json CHANGED
@@ -1,3 +1,3 @@
  {
- "version": "2.0.7"
+ "version": "2.0.9"
  }
@@ -0,0 +1,33 @@
+ # ADR 0001: Persistent config emitters default to user ownership
+
+ ## Status
+
+ Accepted
+
+ ## Context
+
+ No `docs/adr/0000-template.md` existed when this ADR was written, so this file uses the standard ADR sections.
+
+ The project documentation treats on-disk emitter definitions as persistent workflows and recommends `userOwned` ownership for persistent, recurring, or policy-bearing emitters. Existing normalization code defaulted missing persisted emitter ownership and lifespan to `modelOwned` and `temporary`, even though configured emitters are restored from disk and listed as persistent definitions.
+
+ That mismatch can weaken the protection model for manually edited config files: an emitter with no explicit ownership may be treated as model-owned during normalization even though it came from persistent user config.
+
+ ## Decision
+
+ Persisted/on-disk emitter definitions default to:
+
+ - `ownership: "userOwned"`
+ - `lifespan: "persistent"`
+
+ Normalization still honors explicit compatibility fields:
+
+ - `ownership` or legacy `managedBy`, including explicit `modelOwned`
+ - `lifespan` or legacy `scope`, including explicit `temporary`
+
+ Runtime-created temporary/model-owned emitters keep their explicit normalized fields when serialized, so this default only applies when persisted config omits ownership/lifespan signals.
+
+ ## Consequences
+
+ - Manually authored config aligns with the documented safety default for persistent workflows.
+ - Legacy config with explicit `modelOwned` or `temporary` remains compatible.
+ - Future changes that alter config ownership/lifespan defaults should update or supersede this ADR.
1
+ # ADR 0002: Local provider gateway token and runtime shutdown boundary
2
+
3
+ ## Status
4
+
5
+ Accepted
6
+
7
+ ## Context
8
+
9
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR style.
10
+
11
+ The provider gateway lets local external processes register tools with a Copilot session over WebSocket. That boundary needs safe defaults for network exposure, token discovery, token lifetime, and shutdown cleanup. ADR 0001 covers persisted emitter ownership defaults and does not cover provider gateway security or runtime lifecycle. The extension entrypoint should not own provider protocol details; it delegates session lifecycle handling to the runtime facade.
12
+
13
+ Providers may be launched from the active Copilot environment, from a sibling terminal, or via the provider SDK. Sibling processes cannot reliably inherit `TAP_PROVIDER_TOKEN`, so the gateway needs a local discovery path without turning the token into durable configuration.
14
+
15
+ ## Decision
16
+
17
+ - The provider WebSocket gateway binds to loopback by default: `127.0.0.1:9400`. Any non-loopback host must be an explicit runtime override.
18
+ - On gateway start, generate a fresh provider token for the running gateway instance.
19
+ - Publish the token in both supported discovery locations:
20
+ - `TAP_PROVIDER_TOKEN` in the gateway process environment.
21
+ - `<COPILOT_HOME or ~/.copilot>/extensions/tap/.provider-token` for sibling local providers and SDK auto-discovery.
22
+ - Create the token directory with restrictive permissions (`0700`) and write the token file with restrictive permissions (`0600`), including a best-effort chmod after write.
23
+ - Treat the token file as runtime state, not config: remove it and clear `TAP_PROVIDER_TOKEN` when the gateway stops.
24
+ - On session shutdown, the runtime facade owns provider lifecycle coordination. Entrypoints delegate shutdown listener registration to the cached runtime, which de-duplicates the effective `session.shutdown` handler across extension reloads and logs cleanup failures instead of allowing fire-and-forget rejections. The runtime sends `session.lifecycle` with `state: "shutdown.pending"` and a runtime-owned deadline (currently 10 seconds). Stop accepting new gateway connections immediately, but keep existing provider sockets open until they send `goodbye`, all sockets drain, or the deadline expires. After the deadline, close remaining provider sockets.
25
+ - Runtime session-shutdown cleanup uses the shutdown-specific emitter wait path from ADR 0003 before reporting cleanup complete; ordinary user/tool stop requests remain non-blocking stop requests.
26
+ - Bound-provider protocol failures that can represent an in-flight tool call's terminal response must fail deterministically. If a malformed message, oversized message, invalid `tool.result`, or syntactically valid `tool.result` with an unknown call id cannot be correlated while provider calls are pending, reject exactly one pending call with the protocol/validation error. If multiple calls are pending and correlation is impossible, disconnect the provider and reject all pending calls with `DISCONNECTED`. Unknown-id `tool.result` messages are protocol errors and are not delivered to normal tool-result callbacks.
27
+
28
+ ## Consequences
29
+
30
+ - Local providers can connect from sibling terminals without manually copying environment variables, while the gateway remains limited to loopback by default.
31
+ - Token exposure is scoped to the current OS user profile and gateway runtime. The token is still bearer auth, so users must not share it or expose the token file to untrusted processes.
32
+ - Gateway stop acts as token revocation by deleting the token file and clearing the environment value.
33
+ - Providers get a bounded cleanup window during session shutdown, avoiding abrupt termination when they respond promptly and preventing indefinite shutdown hangs when they do not.
34
+ - Repeated extension reloads do not accumulate active shutdown cleanup handlers against the cached runtime, and rejected async shutdown cleanup is visible in stderr/session diagnostics.
35
+ - Invalid or uncorrelatable terminal provider behavior cannot leave Copilot-facing tool promises pending indefinitely, including tools without declared timeouts. Some parse/payload errors remain non-fatal when no calls are pending, but become fail-fast while ambiguous in-flight calls could otherwise be orphaned.
36
+ - Future changes to provider token discovery, host binding, token persistence, runtime shutdown ownership/deadlines/listener ownership, or pending-call fail-fast semantics should update or supersede this ADR.
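The ADR 0002 fail-fast rule for uncorrelatable terminal provider messages reduces to a three-way decision on how many calls are pending. The sketch makes these assumptions:

- Pending calls live in a `Map` from call id to an object with a `reject` function.
- A `disconnect` callback closes the provider socket.
- `handleUncorrelatedTerminal` and its returned strings are hypothetical.

```javascript
// Hypothetical sketch of ADR 0002's rule for a terminal provider message
// (malformed, oversized, invalid, or unknown-id tool.result) that cannot be
// matched to a pending call. `pending` maps call id -> { reject }.
function handleUncorrelatedTerminal(pending, error, disconnect) {
  if (pending.size === 0) {
    // Nothing can be orphaned, so the error stays non-fatal here.
    return "ignored";
  }
  if (pending.size === 1) {
    // Exactly one candidate: reject it with the protocol/validation error.
    const [[id, call]] = pending;
    pending.delete(id);
    call.reject(error);
    return "rejected-one";
  }
  // Several candidates and no way to choose: disconnect and reject them all.
  disconnect();
  for (const call of pending.values()) {
    const disconnected = new Error("provider disconnected");
    disconnected.code = "DISCONNECTED";
    call.reject(disconnected);
  }
  pending.clear();
  return "disconnected";
}

// Example: two in-flight calls, then an unknown-id tool.result arrives.
const rejected = [];
const inFlight = new Map([
  ["call-1", { reject: (e) => rejected.push(["call-1", e.code]) }],
  ["call-2", { reject: (e) => rejected.push(["call-2", e.code]) }],
]);
console.log(handleUncorrelatedTerminal(inFlight, new Error("unknown call id"), () => {}));
// → disconnected
```

Rejecting every pending call in the ambiguous case trades some spurious failures for the guarantee the ADR asks for: no Copilot-facing tool promise is left pending indefinitely.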
@@ -0,0 +1,68 @@
+ # ADR 0003: Emitter delivery lifecycle is retryable and transactional
+
+ ## Status
+
+ Accepted
+
+ ## Context
+
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR style.
+
+ Persistent emitter auto-start, stream session-injector policy, notification delivery, and supervisor persistence happen across separate modules. A persistent emitter may produce output during session startup, before the Copilot session is attached to the cached runtime, and it may route `surface` or `inject` outcomes through the stream's session injector immediately.
+
+ The lifecycle contract needs to avoid these failure modes:
+
+ - auto-started persistent emitters must not silently suppress `surface`/`inject` outcomes when no explicit stream injector is configured;
+ - queued notification delivery must not discard a batch solely because the session is not attached yet;
+ - a failed persistent start must not report failure while leaving a newly-started emitter running outside durable config.
+ - startup surface logs must not disappear when emitters produce `surface` output before the Copilot session object is attached to the cached runtime.
+ - session shutdown cleanup must not report completion before child processes close, except through a bounded timeout path.
+ - idle PromptEmitters auto-started during session startup must not remain `WAITING` forever solely because the initial `session.idle` transition happened before the runtime's session activity bridge was attached.
+
+ ## Decision
+
+ - Config bootstrap preserves explicit `subscribe: false` on emitter definitions.
+ - When a persistent stream already has a configured session injector, auto-start does not overwrite that injector during emitter start; the persisted stream policy remains authoritative.
+ - When no explicit stream injector is configured for the emitter stream, bootstrap allows the emitter's normal subscription default to apply so `surface`/`inject` filter outcomes have an enabled delivery path.
+ - Notification dispatch removes a batch from the queue only for an attempted send, and requeues that batch at the front when delivery fails because the session is not attached. Retry is delayed and single-flight to avoid tight retry loops while preserving notification ordering. The retry queue is bounded in memory: new updates are dropped when the queue is full, and retry requeues preserve the failed batch at the front while dropping any overflow from the tail. Drops are reported through session diagnostics.
+ - Session end/shutdown advances the notification dispatch generation, cancels pending retry timers, and clears queued-but-unsent notifications so stale background updates from one Copilot session cannot be injected into a later session.
+ - Session timeline logs emitted before initial attach are queued in bounded memory by the session port and replayed after attach. This covers startup `surface` delivery races without changing post-attach logging behavior.
+ - Supervisor start is transactional around newly-started emitters: if post-start persistence fails, the supervisor requests a bounded stop-and-wait for the new emitter, removes/restores the in-memory emitter entry after the stop settles, restores the prior session-injector state, and best-effort restores the previous persistent emitter config before surfacing the failure. If the bounded rollback wait times out or stop fails, the runtime emitter entry and current injector state remain visible for manual cleanup while durable config is still restored best-effort.
+ - Bootstrap restoration of an emitter that already exists in persistent config is
+ a runtime-only start path. When bootstrap does not request any durable
+ mutation, supervisor start skips config rewrites and persistence rollback so a
+ read-only or temporarily unwritable config file cannot by itself prevent
+ auto-start recovery. User-initiated persistent starts and persistent injector
+ updates continue to use the transactional persistence behavior above.
+ - Scheduled emitter iterations must clear `inFlight` in a `finally` path and convert unexpected thrown/rejected iteration failures into deterministic failed iteration results so unhandled rejections cannot strand an emitter in `RUNNING`.
+ - Ordinary emitter `stop()` remains a request/transition operation for tool compatibility. Shutdown uses a separate wait path that requests stop, waits for child `close`/in-flight completion, and returns per-emitter outcomes (`stopped`, `timedOut`, or `failed`) instead of discarding timeout/rejection details. Hook cleanup summaries report those outcomes rather than claiming unconditional success.
+ - After session listeners are attached, the runtime synthesizes one initial idle lifecycle nudge by marking the session port idle and calling the supervisor's existing `onSessionIdle()` path. Later real activity events clear scheduled idle work through the normal session-activity transition. This gives persistent idle PromptEmitters auto-started during `onSessionStart` a deterministic first scheduling path even when the SDK does not replay an already-observed `session.idle` event to late listeners.
+ - CommandEmitter event delivery is resolved through the shared stream delivery policy seam, preserving the existing EventFilter + SessionInjector matrix:
+ - `drop` stores nothing and increments the dropped-line count.
+ - `keep` stores the event and surfaces it only when the stream SessionInjector is enabled with `delivery: "all"`.
+ - `surface` stores the event and surfaces it only when the stream SessionInjector is enabled with `delivery: "surface"` or `delivery: "all"`.
+ - `inject` stores the event and enqueues session injection when the stream SessionInjector is enabled with `delivery: "important"`, `"all"`, `"surface"`, or `"inject"`; it surfaces only for enabled `delivery: "surface"` or `"all"`.
+ - Nullish SessionInjector delivery continues to default to `surface`; disabled SessionInjectors and `keep`/`drop`/unknown non-null delivery modes do not proactively surface or inject.
+ - System notifications emitted by the line router continue to use the same SessionInjector injection decision for enqueueing, but they are not timeline-surfaced by that path.
+ - `handlePromptResult()` remains append-only compatibility code; this decision does not introduce PromptEmitter assistant-response capture.
+
+ ## Consequences
+
+ - Persistent emitters with `inject` or `surface` event-filter outcomes can deliver immediately after auto-start without requiring a duplicate stream entry.
+ - User-configured persistent stream injector policy remains the source of truth when both a stream definition and an emitter definition exist.
+ - Session startup races defer notification delivery instead of losing background events.
+ - Startup surface events can appear after attach instead of being silently suppressed by a temporarily detached session port.
+ - Startup idle PromptEmitters can run after attach without waiting for a second idle transition; if real session activity follows attach, the existing activity bridge cancels the pending idle timer.
+ - Retried notifications retain FIFO order relative to later queued updates that remain inside the bounded retry queue; deterministic overflow drops prefer preserving older/retried work over newer tail entries.
+ - Session lifecycle clearing prevents queued notification retries from crossing session boundaries.
+ - A caller that sees persistent emitter start fail should not also have a hidden running emitter to clean up; if rollback cannot confirm settlement before the bounded timeout, the emitter remains visible in runtime state for manual cleanup.
+ - Persistent emitter auto-start from config can succeed even when the existing
+ config file cannot be rewritten, provided no durable state change was
+ requested by bootstrap.
+ - Shutdown cleanup can wait for real process closure without changing the public meaning of user/tool stop requests, and session summaries can distinguish emitters that stopped, timed out, or failed during cleanup.
+ - The shared delivery seam centralizes CommandEmitter delivery decisions without changing EventFilter outcomes, SessionInjector authority, notification retry behavior, or PromptEmitter capture semantics.
+ - Future changes to auto-start subscription defaults, bootstrap persistence
+ writes, notification/log replay policy, attach-time idle nudging, scheduled
+ iteration failure handling, notification retry bounds/session clearing,
+ shutdown wait reporting behavior, or supervisor start rollback semantics
+ should update or supersede this ADR.
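The CommandEmitter delivery matrix in ADR 0003 is compact enough to write as a pure function. This is a sketch, not the shared delivery seam itself. The following parts are assumptions:

- The function name `resolveDelivery`.
- The `{ enabled, delivery }` injector shape.
- Throwing on an unknown outcome.

The outcome/delivery rules follow the ADR bullets.

```javascript
// Sketch of ADR 0003's EventFilter outcome x SessionInjector matrix.
// Assumed injector shape: { enabled: boolean, delivery?: string } or null.
const SURFACE_MODES = new Set(["surface", "all"]);
const INJECT_MODES = new Set(["important", "all", "surface", "inject"]);

function resolveDelivery(outcome, injector) {
  const enabled = Boolean(injector && injector.enabled);
  // Nullish delivery defaults to "surface"; unknown modes match neither set.
  const delivery = injector?.delivery ?? "surface";
  switch (outcome) {
    case "drop": // store nothing; caller increments its dropped-line count
      return { store: false, surface: false, inject: false, dropped: true };
    case "keep":
      return { store: true, surface: enabled && delivery === "all", inject: false, dropped: false };
    case "surface":
      return { store: true, surface: enabled && SURFACE_MODES.has(delivery), inject: false, dropped: false };
    case "inject":
      return {
        store: true,
        surface: enabled && SURFACE_MODES.has(delivery),
        inject: enabled && INJECT_MODES.has(delivery),
        dropped: false,
      };
    default:
      throw new Error(`unknown EventFilter outcome: ${outcome}`); // assumption
  }
}

console.log(resolveDelivery("inject", { enabled: true, delivery: "important" }));
// → { store: true, surface: false, inject: true, dropped: false }
```

Keeping the matrix in one pure function is one way to read the ADR's "shared stream delivery policy seam": every caller gets the same answer for the same outcome and injector state.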
@@ -0,0 +1,86 @@
1
+ # ADR 0004: Persistent config stream injector and alias semantics
2
+
3
+ ## Status
4
+
5
+ Accepted
6
+
7
+ ## Context
8
+
9
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR style.
10
+
11
+ ADR 0001 records that persisted emitter definitions default to user-owned,
12
+ persistent ownership semantics. Persistent stream definitions have the same
13
+ on-disk/user-authored character, but session-injector normalization defaulted
14
+ missing ownership and lifespan to model-owned, temporary values while runtime
15
+ bootstrap applied persisted streams as user-owned, persistent definitions.
16
+
17
+ Emitter config also has two names for the destination EventStream:
18
+
19
+ - `stream` is the documented config field.
20
+ - `channel` is a legacy alias still accepted by older tools and config files.
21
+
22
+ Keeping both fields in normalized/serialized config lets a stale `channel`
23
+ silently override an edited `stream` in runtime paths that consume only
24
+ `channel`.
25
+
26
+ ## Decision
27
+
28
+ - Persisted stream `sessionInjector` entries default to:
29
+ - `ownership: "userOwned"`
30
+ - `lifespan: "persistent"`
31
+ - Explicit compatibility fields are still honored:
32
+ - `ownership` or legacy `managedBy`
33
+ - `lifespan` or legacy `scope`
34
+ - `stream` is the canonical persisted emitter destination field.
35
+ - `channel` remains accepted as an input alias for backwards compatibility.
36
+ - When both `stream` and `channel` are present and conflict, `stream` wins.
37
+ - Normalization and serialization drop the legacy `channel` alias after resolving
38
+ the canonical `stream`, preventing stale aliases from being persisted again.
39
+ - Runtime emitter normalization and configured-emitter projection prefer
40
+ `stream` over `channel` so user-authored config and runtime routing agree.
41
+ - Config migration persistence is best-effort: loading a readable config should
42
+ succeed even if saving the canonical migrated form fails. The store should skip
43
+ the migration save when the parsed on-disk JSON already equals its canonical form.
44
+ - Config loading is transactional. The store builds candidate cwd/path/config
45
+ state in local variables and commits it only after read, parse, and migration all
46
+ succeed, or after the config search completes successfully with no file found.
47
+ If a load fails, the previous runtime config remains active and subsequent
48
+ saves are refused until a later load succeeds.
49
+ - Persisted stream entries must include an explicit, non-blank string `name`.
50
+ `name: "main"` remains valid, but missing, blank, non-string, or otherwise
51
+ non-normalizable names are config validation errors and are never defaulted to
52
+ `main`.
53
+ - A persisted stream entry with only metadata such as `name` and `description`
54
+ does not define durable SessionInjector policy. Applying such an entry keeps
55
+ the runtime injector on the non-protected default
56
+ (`modelOwned`/`temporary`, disabled) instead of synthesizing a
57
+ `userOwned`/`persistent` injector. Only an explicit `sessionInjector` or
58
+ legacy `subscription` object receives the persisted stream injector defaults
59
+ above.
60
+ - Bootstrap auto-start of emitter definitions already present in config is a
61
+ runtime restoration path, not a durable config update. It must not require a
62
+ config rewrite when no persisted emitter or stream policy is being changed.
63
+
64
+ ## Consequences
65
+
66
+ - Hand-authored stream injector config aligns with runtime persistent semantics
67
+ and the user-owned defaults documented for durable workflows.
68
+ - Existing channel-only config remains valid and is migrated to `stream`.
69
+ - Editing `stream` is deterministic even if an old `channel` field remains.
70
+ - Read-only or temporarily unwritable config files no longer prevent the
71
+ extension from using an otherwise valid config; users receive a warning when
72
+ canonical migration persistence fails.
73
+ - Malformed or unreadable config files cannot replace the last known-good
74
+ runtime config and cannot be overwritten later by an empty or partially loaded
75
+ state via `save()`.
76
+ - Malformed persisted stream entries fail config normalization instead of
77
+ silently enabling or modifying the default `main` stream.
78
+ - Bare persisted stream entries preserve stream metadata without turning normal
79
+ unforced SessionInjector updates into protected user-owned mutations.
80
+ - Auto-start restoration can recover already-persisted emitters from a
81
+ read-only config file, while user-initiated durable emitter starts and stream
82
+ injector changes still persist and roll back on save failure.
83
+ - Future changes to persistent config defaults, stream-name validation, emitter
84
+ destination aliases, transactional load/save safety, bootstrap restoration
85
+ writes, or migration write-failure behavior should update or supersede this
86
+ ADR.
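A minimal sketch of the ADR 0004 destination-alias, stream-name, and injector-default rules, written in JavaScript to match the package's `.mjs` sources. `normalizeEmitterDestination` and `normalizePersistedStream` are hypothetical names, not the package's exports. Because the ADR does not spell out the canonical identifier rules, this sketch assumes that any trimmed, non-blank string is a valid stream name.

```javascript
// Hypothetical helpers illustrating ADR 0004; the names are not the package's
// exports, and treating any trimmed non-blank string as a valid stream name is
// an assumption (the ADR does not spell out the canonical identifier rules).

// `stream` is canonical; legacy `channel` is only an input alias. When both
// are present, `stream` wins, and `channel` is dropped so it is never re-saved.
function normalizeEmitterDestination(emitter) {
  const { channel, ...rest } = emitter;
  const stream = rest.stream ?? channel;
  return stream === undefined ? rest : { ...rest, stream };
}

// Persisted stream entries need an explicit, non-blank string `name`;
// a missing or invalid name is an error and is never defaulted to "main".
function normalizePersistedStream(entry) {
  if (typeof entry?.name !== "string" || entry.name.trim() === "") {
    throw new Error("persisted stream entry requires a non-blank string name");
  }
  const normalized = { ...entry, name: entry.name.trim() };

  // Only an explicit `sessionInjector` (or legacy `subscription`) object gets
  // the userOwned/persistent defaults; metadata-only entries get no injector.
  const injector = entry.sessionInjector ?? entry.subscription;
  if (injector !== null && typeof injector === "object") {
    normalized.sessionInjector = {
      ...injector,
      ownership: injector.ownership ?? injector.managedBy ?? "userOwned",
      lifespan: injector.lifespan ?? injector.scope ?? "persistent",
    };
  }
  return normalized;
}
```

Because the alias is removed during normalization, re-serializing a config that once contained `channel` writes only `stream`. That is the property the ADR relies on to stop stale aliases from being persisted again.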
@@ -0,0 +1,48 @@
1
+ # ADR 0005: Bound-provider SDK push and dynamic tools
2
+
3
+ ## Status
4
+
5
+ Accepted
6
+
7
+ ## Context
8
+
9
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR style.
10
+
11
+ The provider SDK exposes `push`, `surface`, `keep`, and `updateTools`. Detour already calls `provider.push()`. Before this decision, the gateway accepted only the `auth`, `hello`, `tool.result`, and `goodbye` message types from providers, so SDK pushes and dynamic tool updates sent after binding were rejected as unknown message types.
12
+
13
+ The full provider-interface design contains broader advanced-provider features such as hooks updates, context updates, stream queries, subscriptions, all-session binding, pairing auth, revisions, and update acknowledgments. Those features are intentionally outside this follow-up.
14
+
15
+ ## Decision
16
+
17
+ - A provider must still authenticate and send `hello` for exactly one active session before using the new messages.
18
+ - While Bound, a provider may send `push` with:
19
+   - `level`: `keep`, `surface`, or `inject`
20
+   - `event`: non-empty text
21
+   - optional `stream`, defaulting in the SDK to the provider name
22
+   - optional `sessionId`, which must match the session selected in `hello`
23
+   - optional object `metadata`
24
+ - Explicit push stream names use the same canonical EventStream identifier rules as stream tools. The gateway rejects non-normalizable stream names instead of falling back to `main`; accepted pushes use the canonical stream name consistently for storage, notifications, and return values. If a push omits `stream`, the runtime uses the canonical provider name/id when possible and otherwise falls back to `main`.
25
+ - Push delivery is immediate and session-bound:
26
+   - `keep` appends to the EventStream only.
27
+   - `surface` appends and logs to the Copilot timeline.
28
+   - `inject` appends, logs, and enqueues a session injection through the existing retrying notification dispatcher.
29
+ - Provider push delivery uses the shared stream delivery policy seam in provider-authoritative mode. The provider-selected `level` remains the complete delivery policy for that push: provider `inject` is not gated by the destination stream's SessionInjector, and provider `keep`/`surface` semantics are unchanged.
30
+ - Provider push surfacing is best-effort: timeline logging failures must not create unhandled promise rejections or prevent the already-appended stream event from remaining stored.
31
+ - While Bound, a provider may send `tools.update` with a complete replacement `tools` array using the same validation and 100-tool cap as `hello.tools`.
32
+ - `tools.update` is accepted only for the bound session. A supplied `sessionId` must match the session selected in `hello`.
33
+ - Successful `tools.update` replaces the provider's registry entry, updates the connection's active tool definitions, and schedules the same debounced session tool refresh used by provider connect/disconnect.
34
+ - Rejected `tools.update` messages leave the previous provider tool list active.
35
+ - Existing in-flight tool calls are not cancelled when a successful update removes their tool definition; they continue to their result, timeout, cancellation, or disconnect outcome.
36
+ - `hello.ack` may include `sessionId` so SDK providers can observe the bound session.
37
+ - This minimal contract does not add hooks updates, context updates, stream queries, subscriptions, all-session binding, pairing auth, per-update revisions, or success acknowledgments.
38
+
39
+ ## Consequences
40
+
41
+ - SDK providers can use `provider.push()`, `provider.surface()`, `provider.keep()`, and `provider.updateTools()` without being rejected by the gateway.
42
+ - Detour page messages can reach the Copilot session instead of failing as unknown provider messages.
43
+ - The shared delivery seam records the provider delivery matrix alongside CommandEmitter delivery policy while preserving this ADR's existing `keep`/`surface`/`inject` behavior.
44
+ - Invalid provider-selected stream names fail closed instead of silently misrouting events into `main`.
45
+ - Dynamic tools remain simple and deterministic: each update replaces the provider's complete tool list and reuses the existing reload path.
46
+ - Providers do not receive a success ack for `tools.update`; they should treat absence of `error` as success in the minimal profile.
47
+ - The gateway remains single-session-bound for external providers, preserving the current security boundary.
48
+ - Future changes to provider push semantics, dynamic tool update acknowledgments/revisions, multi-session binding, pairing auth, or advanced provider capabilities should update or supersede this ADR.
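A minimal sketch of the gateway-side checks that ADR 0005 describes for a bound provider. `validatePush`, `applyToolsUpdate`, `canonicalStreamName`, and the `connection`/`binding` shapes are hypothetical. The stream-name regex is an assumption that stands in for the real EventStream identifier rules, and the per-tool validation shared with `hello.tools` is left out.

```javascript
// Hypothetical gateway-side checks for ADR 0005's minimal bound-provider
// contract. canonicalStreamName() stands in for the real EventStream
// identifier rules, which the ADR does not spell out; its regex is an assumption.
const PUSH_LEVELS = new Set(["keep", "surface", "inject"]);
const MAX_TOOLS = 100;

function canonicalStreamName(name) {
  if (typeof name !== "string") return null;
  const trimmed = name.trim();
  return /^[A-Za-z0-9][A-Za-z0-9._-]*$/.test(trimmed) ? trimmed : null;
}

function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns { push } on success or { error } on rejection.
function validatePush(msg, binding) {
  if (!PUSH_LEVELS.has(msg.level)) return { error: "level must be keep, surface, or inject" };
  if (typeof msg.event !== "string" || msg.event === "") return { error: "event must be non-empty text" };
  if (msg.sessionId !== undefined && msg.sessionId !== binding.sessionId) {
    return { error: "sessionId does not match the bound session" };
  }
  if (msg.metadata !== undefined && !isPlainObject(msg.metadata)) return { error: "metadata must be an object" };

  let stream;
  if (msg.stream !== undefined) {
    // An explicit but invalid stream fails closed instead of falling back to "main".
    stream = canonicalStreamName(msg.stream);
    if (stream === null) return { error: "invalid stream name" };
  } else {
    // Omitted stream: canonical provider name when possible, otherwise "main".
    stream = canonicalStreamName(binding.providerName) ?? "main";
  }
  return { push: { level: msg.level, event: msg.event, stream, metadata: msg.metadata } };
}

// tools.update is a complete replacement; a rejected update leaves the
// previous list active, and success sends no ack (absence of error = success).
// scheduleToolRefresh stands in for the debounced session tool refresh.
function applyToolsUpdate(connection, msg) {
  if (msg.sessionId !== undefined && msg.sessionId !== connection.sessionId) {
    return { error: "sessionId does not match the bound session" };
  }
  // Per-tool validation shared with hello.tools is omitted from this sketch.
  if (!Array.isArray(msg.tools) || msg.tools.length > MAX_TOOLS) {
    return { error: `tools must be an array of at most ${MAX_TOOLS} entries` };
  }
  connection.tools = msg.tools;
  connection.scheduleToolRefresh();
  return {};
}
```

Returning an empty object on success mirrors the ADR's minimal profile, in which a provider treats the absence of `error` as success because no acknowledgment is sent.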
@@ -0,0 +1,46 @@
1
+ # ADR 0006: Command-emitter cwd stays within the session workspace
2
+
3
+ ## Status
4
+
5
+ Accepted
6
+
7
+ ## Context
8
+
9
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR style.
10
+
11
+ Command emitters accept an optional `cwd` field so a watcher can run from a
12
+ subdirectory such as `services/api`. Before this decision, `cwd` resolution used
13
+ `path.resolve(sessionCwd, requestedCwd)`, which also accepted absolute paths and
14
+ relative traversal that escaped the Copilot session working directory.
15
+
16
+ That behavior made a model-supplied emitter request capable of changing the
17
+ process working directory outside the workspace boundary selected for the
18
+ session. ADR 0001 and ADR 0004 cover persistent ownership/config defaults, and
19
+ ADR 0002 covers provider gateway security; they do not define command-emitter
20
+ working-directory boundaries.
21
+
22
+ ## Decision
23
+
24
+ - Command-emitter `cwd` is interpreted as a path relative to the session cwd.
25
+ - Omitted or blank `cwd` uses the session cwd.
26
+ - `cwd: "."` is allowed and resolves to the session cwd.
27
+ - Subdirectories under the session cwd are allowed.
28
+ - Absolute `cwd` values are rejected.
29
+ - Relative paths that resolve outside the session cwd, including `..` traversal,
30
+ are rejected.
31
+ - This is API hardening for emitter configuration. It is not a shell sandbox:
32
+ the spawned command still runs with the user's normal OS permissions, and the
33
+ command itself may access files or change directories according to those
34
+ permissions.
35
+
36
+ ## Consequences
37
+
38
+ - Tool callers and persisted configs must express command-emitter working
39
+ directories as workspace-relative paths.
40
+ - Existing configs that used absolute paths or traversal outside the session cwd
41
+ will fail validation during auto-start until rewritten as in-workspace
42
+ relative paths.
43
+ - The session cwd remains the durable boundary for command-emitter placement,
44
+ reducing accidental or model-initiated execution from unrelated directories.
45
+ - Future changes to command-emitter cwd resolution, workspace-boundary behavior,
46
+ or stronger process sandboxing should update or supersede this ADR.
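A minimal sketch of the ADR 0006 cwd rules, using Node's `path` module. `resolveEmitterCwd` is a hypothetical name, not the package's actual helper. The sketch compares paths lexically and does not follow symlinks, which is consistent with the ADR's statement that this is API hardening rather than a sandbox.

```javascript
import path from "node:path";

// Hypothetical helper sketching ADR 0006's rules; not the package's actual
// function. It compares paths lexically and does not resolve symlinks.
function resolveEmitterCwd(sessionCwd, requestedCwd) {
  const base = path.resolve(sessionCwd);

  // Omitted or blank cwd uses the session cwd.
  if (requestedCwd === undefined || requestedCwd === null) return base;
  if (typeof requestedCwd !== "string") throw new TypeError("cwd must be a string");
  if (requestedCwd.trim() === "") return base;

  // Absolute paths are rejected outright rather than resolved.
  if (path.isAbsolute(requestedCwd)) {
    throw new Error(`cwd must be relative to the session cwd: ${requestedCwd}`);
  }

  // A relative path that climbs out of the base (e.g. "..", "../x") escapes
  // the workspace; "." yields "" here and maps to the session cwd itself.
  const resolved = path.resolve(base, requestedCwd);
  const rel = path.relative(base, resolved);
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`cwd escapes the session cwd: ${requestedCwd}`);
  }
  return resolved;
}
```

The check runs on the result of `path.relative` rather than on a raw `startsWith("..")`, so a legitimate subdirectory named something like `..cache` is not rejected by mistake.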
@@ -0,0 +1,62 @@
1
+ # ADR 0007: Runtime session workspace context boundary
2
+
3
+ ## Status
4
+
5
+ Accepted
6
+
7
+ ## Context
8
+
9
+ No `docs/adr/0000-template.md` exists, so this ADR follows the existing ADR
10
+ style.
11
+
12
+ Runtime services previously shared the active session workspace through loose
13
+ getter/setter cwd closures. That made the mutable cwd boundary easy to
14
+ thread through services but left ownership split across runtime subsystem
15
+ bootstrap, config loading, emitter service fallback behavior, and supervisor cwd
16
+ validation.
17
+
18
+ ADR 0004 requires persistent config loading to be transactional: failed loads
19
+ must not replace the last-known-good config state. ADR 0006 requires
20
+ command-emitter `cwd` values to remain relative to the session workspace and to
21
+ be validated by the existing workspace-boundary rules. Those decisions remain
22
+ the source of truth for config safety and emitter cwd semantics; this ADR records
23
+ where the runtime owns the shared session/workspace context.
24
+
25
+ ## Decision
26
+
27
+ - Introduce a runtime-owned session/workspace context for active session
28
+ identity metadata, current session/base cwd, current config cwd, and emitter
29
+ cwd resolution.
30
+ - Runtime subsystem construction creates or receives this context once and then
31
+ hands out narrow capability views:
32
+   - config bootstrap receives config cwd resolution/commit capabilities;
33
+   - emitter services and supervisor receive emitter cwd resolution capabilities.
34
+ - Config bootstrap resolves the candidate cwd before loading config, then
35
+ commits the runtime context's base/config cwd only after `configStore.load()`
36
+ succeeds or completes the no-config-found path. This keeps runtime cwd
37
+ ownership aligned with ADR 0004 last-known-good load semantics.
38
+ - Emitter cwd resolution remains delegated to the existing path validation
39
+ helpers, preserving ADR 0006 behavior:
40
+   - omitted, blank, or `.` emitter `cwd` uses the session cwd;
41
+   - subdirectories under the session cwd are allowed;
42
+   - absolute paths and traversal outside the session cwd are rejected;
43
+   - this remains cwd placement hardening, not a shell sandbox.
44
+ - Emitter start fallback preserves the prior nullish-only base cwd behavior:
45
+ omitted/null base cwd options use the current session cwd, while explicitly
46
+ supplied base cwd values are normalized as supplied.
47
+ - Public tool names, provider protocol behavior, emitter delivery semantics,
48
+ persistence defaults, and config canonicalization rules are unchanged.
49
+
50
+ ## Consequences
51
+
52
+ - The mutable session/workspace boundary has one owner instead of ad hoc closure
53
+ plumbing across services.
54
+ - Dependency injection remains capability-specific: consumers receive config or
55
+ emitter workspace capabilities rather than a broad runtime object.
56
+ - Failed config loads cannot advance the runtime context's config/base cwd ahead
57
+ of the last-known-good config load state.
58
+ - Command-emitter cwd validation remains centralized on the existing helper and
59
+ keeps the ADR 0006 workspace-relative contract.
60
+ - Future changes to runtime session/workspace ownership, config cwd commit
61
+ timing, base cwd fallback semantics, or emitter cwd validation ownership
62
+ should update or supersede this ADR.
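A minimal sketch of the ADR 0007 context and its narrow capability views. `createSessionWorkspaceContext`, `configView`, `emitterView`, and `bootstrapConfig` are hypothetical names. The emitter cwd validator is injected to mirror the ADR's "delegated to the existing path validation helpers", and `configStore.load()` is treated as synchronous only to keep the sketch short.

```javascript
// Hypothetical sketch of ADR 0007's runtime-owned session/workspace context.
// Names are illustrative; the emitter cwd validator is injected to mirror
// "delegated to the existing path validation helpers", and configStore.load()
// is treated as synchronous for brevity.
function createSessionWorkspaceContext({ initialCwd, validateEmitterCwd }) {
  let sessionCwd = initialCwd; // current session/base cwd
  let configCwd = initialCwd; // cwd of the last successful config load

  return {
    // Narrow capability view for config bootstrap.
    configView: {
      configCwd: () => configCwd,
      commit(cwd) {
        sessionCwd = cwd;
        configCwd = cwd;
      },
    },
    // Narrow capability view for emitter services and the supervisor.
    emitterView: {
      // Nullish-only fallback: undefined/null base cwd uses the session cwd;
      // any other supplied value is handed to the validator as supplied.
      resolveCwd: (requestedCwd, baseCwd) =>
        validateEmitterCwd(baseCwd ?? sessionCwd, requestedCwd),
    },
  };
}

// Load with the candidate cwd and commit only if load() returns (config found
// or none found). If load() throws, the previous base/config cwd stays in
// place, matching ADR 0004's last-known-good rule.
function bootstrapConfig(configView, candidateCwd, configStore) {
  configStore.load(candidateCwd);
  configView.commit(candidateCwd);
}
```

Keeping the mutable state in closures that only the context can reach means consumers hold the capabilities they were handed, not a broad runtime object, which matches the ADR's dependency-injection consequence.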
package/docs/evals.md ADDED
@@ -0,0 +1,41 @@
1
+ # Evals
2
+
3
+ Testing infrastructure for copilot-tap-extension.
4
+
5
+ ## Quick validation
6
+
7
+ ```bash
8
+ npm run check # syntax check
9
+ npm run evals:smoke # smoke test
10
+ npm run evals:validate-modes # interactive vs prompt-mode gap
11
+ ```
12
+
13
+ ## How the eval runner works
14
+
15
+ `evals/run.mjs` starts one ACP server, creates fresh SDK sessions, and mounts the shared runtime from `src/tap-runtime.mjs` directly into those sessions. This means `smoke` and `run` exercise the same EventStream/EventEmitter logic as the extension without depending on `.github/extensions` being discovered in a headless session.
16
+
17
+ The runner writes prompt, response, error, and full event-transcript artifacts under `evals/results/...`.
18
+
19
+ ## Supported paths
20
+
21
+ The reliable supported paths are:
22
+
23
+ 1. **Interactive foreground Copilot sessions**
24
+ 2. **ACP/SDK sessions that mount the shared runtime directly**
25
+
26
+ Do **not** treat headless prompt-mode or other non-interactive repo-extension loading as reliable. Use `validate-modes` to prove that distinction.
27
+
28
+ ## Extension-loader evals
29
+
30
+ The real repo-scoped extension loader is validated separately. `npm run evals:validate-modes` probes `copilot -p` with the actual `.github/extensions` entrypoint, then compares the result with the same prompt run in an interactive session.
31
+
32
+ For interactive executor evals:
33
+
34
+ ```bash
35
+ node evals/run.mjs prepare-interactive --case E001
36
+ # run the printed prompt inside an interactive `copilot` session
37
+ # then run the printed /share command
38
+ node evals/run.mjs judge-interactive --run-dir "<printed-run-dir>"
39
+ ```
40
+
41
+ This keeps the executor in a foreground Copilot session where the extension can attach, uses `/share <path>` to persist the transcript, and runs a tool-free ACP judge against the transcript plus config snapshots. If you reuse one session for multiple cases, run `/clear` before each subsequent case.