copilot-tap-extension 2.0.8 → 2.0.9
- package/README.md +2 -1
- package/SOUL.md +51 -0
- package/bin/install.mjs +2 -1
- package/dist/copilot-instructions.md +5 -0
- package/dist/extension.mjs +361 -20
- package/dist/version.json +1 -1
- package/docs/adr/0001-persistent-config-default-ownership.md +33 -0
- package/docs/adr/0002-local-provider-gateway-runtime-security.md +36 -0
- package/docs/adr/0003-emitter-delivery-lifecycle.md +68 -0
- package/docs/adr/0004-persistent-config-canonical-streams.md +86 -0
- package/docs/adr/0005-provider-sdk-push-and-dynamic-tools.md +48 -0
- package/docs/adr/0006-command-emitter-cwd-workspace-boundary.md +46 -0
- package/docs/adr/0007-runtime-session-workspace-context.md +62 -0
- package/docs/evals.md +41 -0
- package/docs/evolution-of-tap-icon.html +989 -0
- package/docs/providers.md +242 -0
- package/docs/recipes/adaptive-agent.md +303 -0
- package/docs/recipes/agent-brainstorm/100-extension-ideas.md +288 -0
- package/docs/recipes/agent-brainstorm/deep-ideas.md +216 -0
- package/docs/recipes/ambient-guardian.md +314 -0
- package/docs/recipes/browser-bridge.md +162 -0
- package/docs/recipes/codex-goals-for-tap-goal.md +136 -0
- package/docs/recipes/copilot-sdk-canvas.md +147 -0
- package/docs/recipes/deferred-cognition.md +310 -0
- package/docs/recipes/provider-integration-patterns.md +93 -0
- package/docs/recipes/provider-interface-advanced.md +1364 -0
- package/docs/recipes/provider-interface-core-profile.md +568 -0
- package/docs/recipes/tap-control-plane-roadmap.md +60 -0
- package/docs/recipes/universal-tool-gateway.md +202 -0
- package/docs/reference.md +229 -0
- package/docs/use-cases.md +348 -0
- package/package.json +4 -1
- package/providers/detour/README.md +84 -0
- package/providers/detour/bridge.js +219 -0
- package/providers/detour/index.mjs +322 -0
- package/providers/detour/package-lock.json +577 -0
- package/providers/detour/package.json +19 -0
- package/providers/detour/scripts/build.mjs +31 -0
- package/providers/detour/src/bridge.js +256 -0
- package/providers/detour/src/contracts.js +40 -0
- package/providers/detour/src/inspector.js +260 -0
- package/providers/detour/src/inspector.test.mjs +53 -0
- package/providers/detour/src/panel.js +465 -0
- package/providers/detour/src/provider-core.js +233 -0
- package/providers/detour/src/provider-core.test.mjs +185 -0
- package/providers/detour/src/react-context-core.js +143 -0
- package/providers/detour/src/react-context.js +44 -0
- package/providers/detour/src/react-context.test.mjs +41 -0
- package/providers/templates/README.md +23 -0
- package/providers/templates/ci-review-provider.mjs +46 -0
- package/providers/templates/detour-workflow-provider.mjs +41 -0
- package/providers/templates/jira-github-provider.mjs +42 -0
- package/providers/templates/provider-utils.mjs +45 -0
- package/providers/templates/sast-triage-provider.mjs +51 -0
|
@@ -0,0 +1,242 @@
|
|
|
1
|
+
# Extending ※ tap with Providers
|
|
2
|
+
|
|
3
|
+
External processes can register tools with your Copilot session through the **Provider Interface**. A provider connects via WebSocket, authenticates, declares its tools, and handles calls — without knowing anything about the Copilot SDK.
|
|
4
|
+
|
|
5
|
+
```
|
|
6
|
+
┌─────────────────┐ WebSocket (JSON) ┌─────────────────┐
|
|
7
|
+
│ ※ tap Gateway │◄── ws://localhost:9400 ──► │ Provider │
|
|
8
|
+
│ │ │ (your process) │
|
|
9
|
+
│ Owns Copilot SDK │ ── sessions ──────────► │ Knows nothing │
|
|
10
|
+
│ Runs WS server │ ◄── auth ───────────── │ about Copilot │
|
|
11
|
+
│ Registers tools │ ── hello.ack ─────────► │ Declares tools │
|
|
12
|
+
│ Dispatches calls │ ◄── hello ──────────── │ Handles calls │
|
|
13
|
+
│ │ ── tool.call ─────────► │ │
|
|
14
|
+
│ │ ◄── tool.result ────── │ │
|
|
15
|
+
│ │ ◄── push ───────────── │ Pushes events │
|
|
16
|
+
│ │ ◄── tools.update ───── │ Updates tools │
|
|
17
|
+
└─────────────────┘ └─────────────────┘
|
|
18
|
+
```
|
|
19
|
+
|
|
20
|
+
## Quick start
|
|
21
|
+
|
|
22
|
+
### 1. Start a Copilot session
|
|
23
|
+
|
|
24
|
+
The gateway starts automatically on loopback port 9400 (`127.0.0.1`, reachable as `localhost`) when ※ tap loads. It generates an auth token and exposes it in two local-only discovery locations:
|
|
25
|
+
|
|
26
|
+
- `TAP_PROVIDER_TOKEN` for providers launched with the Copilot environment.
|
|
27
|
+
- `<COPILOT_HOME or ~/.copilot>/extensions/tap/.provider-token` for sibling terminals and SDK auto-discovery.
|
|
28
|
+
|
|
29
|
+
The token directory is created with restrictive permissions (`0700`), the token file is written as `0600`, and the token file is removed when the gateway stops.
|
|
30
|
+
|
|
31
|
+
### 2. Write a provider
|
|
32
|
+
|
|
33
|
+
A provider is any process that speaks the WebSocket protocol. Here's a minimal example in Node.js:
|
|
34
|
+
|
|
35
|
+
```js
|
|
36
|
+
import WebSocket from "ws";
|
|
37
|
+
import fs from "node:fs";
|
|
38
|
+
import os from "node:os";
|
|
39
|
+
import path from "node:path";
|
|
40
|
+
|
|
41
|
+
function discoverToken() {
|
|
42
|
+
if (process.env.TAP_PROVIDER_TOKEN) return process.env.TAP_PROVIDER_TOKEN;
|
|
43
|
+
const copilotHome = process.env.COPILOT_HOME || path.join(os.homedir(), ".copilot");
|
|
44
|
+
return fs.readFileSync(path.join(copilotHome, "extensions", "tap", ".provider-token"), "utf8").trim();
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
const TOKEN = discoverToken();
|
|
48
|
+
const ws = new WebSocket("ws://localhost:9400");
|
|
49
|
+
|
|
50
|
+
ws.on("open", () => {
|
|
51
|
+
ws.send(JSON.stringify({ type: "auth", token: TOKEN }));
|
|
52
|
+
});
|
|
53
|
+
|
|
54
|
+
ws.on("message", (raw) => {
|
|
55
|
+
const msg = JSON.parse(raw.toString()); // raw may be a Buffer; decode before parsing
|
|
56
|
+
|
|
57
|
+
switch (msg.type) {
|
|
58
|
+
case "sessions":
|
|
59
|
+
// Bind to the first available session and register tools
if (!msg.active?.length) { console.error("No active sessions to bind to"); break; }
|
|
60
|
+
ws.send(JSON.stringify({
|
|
61
|
+
type: "hello",
|
|
62
|
+
name: "my-provider",
|
|
63
|
+
protocolVersion: 2,
|
|
64
|
+
session: msg.active[0].id,
|
|
65
|
+
tools: [{
|
|
66
|
+
name: "greet",
|
|
67
|
+
description: "Greet someone by name",
|
|
68
|
+
parameters: {
|
|
69
|
+
type: "object",
|
|
70
|
+
properties: { name: { type: "string" } },
|
|
71
|
+
required: ["name"]
|
|
72
|
+
}
|
|
73
|
+
}]
|
|
74
|
+
}));
|
|
75
|
+
break;
|
|
76
|
+
|
|
77
|
+
case "hello.ack":
|
|
78
|
+
console.log(`Registered as ${msg.providerId}`);
|
|
79
|
+
break;
|
|
80
|
+
|
|
81
|
+
case "tool.call":
|
|
82
|
+
// Handle the call and return a result
|
|
83
|
+
ws.send(JSON.stringify({
|
|
84
|
+
type: "tool.result",
|
|
85
|
+
id: msg.id,
|
|
86
|
+
data: `Hello, ${msg.args.name}!`
|
|
87
|
+
}));
|
|
88
|
+
break;
|
|
89
|
+
|
|
90
|
+
case "tool.cancel":
|
|
91
|
+
ws.send(JSON.stringify({
|
|
92
|
+
type: "tool.result",
|
|
93
|
+
id: msg.id,
|
|
94
|
+
error: "Cancelled",
|
|
95
|
+
errorCode: "CANCELLED"
|
|
96
|
+
}));
|
|
97
|
+
break;
|
|
98
|
+
|
|
99
|
+
case "session.lifecycle":
|
|
100
|
+
if (msg.state === "shutdown.pending") {
|
|
101
|
+
ws.send(JSON.stringify({ type: "goodbye", reason: "session ending" }));
|
|
102
|
+
ws.close();
|
|
103
|
+
}
|
|
104
|
+
break;
|
|
105
|
+
|
|
106
|
+
case "error":
|
|
107
|
+
console.error(`[${msg.code}]: ${msg.message}`);
|
|
108
|
+
break;
|
|
109
|
+
}
|
|
110
|
+
});
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
### 3. Run it
|
|
114
|
+
|
|
115
|
+
```bash
|
|
116
|
+
# If you are in the terminal where Copilot is running, the token is also in env:
|
|
117
|
+
echo $TAP_PROVIDER_TOKEN # macOS/Linux
|
|
118
|
+
echo %TAP_PROVIDER_TOKEN% # Windows
|
|
119
|
+
|
|
120
|
+
# In another terminal, either pass the token explicitly:
|
|
121
|
+
TAP_PROVIDER_TOKEN=ptk-... node my-provider.mjs
|
|
122
|
+
|
|
123
|
+
# Or let the SDK/sample discover it from
|
|
124
|
+
# <COPILOT_HOME or ~/.copilot>/extensions/tap/.provider-token:
|
|
125
|
+
node my-provider.mjs
|
|
126
|
+
```
|
|
127
|
+
|
|
128
|
+
Once connected, the `greet` tool appears in Copilot alongside the existing ※ tap tools. Ask Copilot to use it:
|
|
129
|
+
|
|
130
|
+
> _"Use the greet tool to say hello to Alice"_
|
|
131
|
+
|
|
132
|
+
## Connection lifecycle
|
|
133
|
+
|
|
134
|
+
```
|
|
135
|
+
AwaitAuth ──auth──► AwaitHello ──hello──► Bound ──goodbye/disconnect──► Disconnected
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
1. **AwaitAuth** — Provider sends `auth` with the token. Gateway responds with `sessions` (list of active sessions).
|
|
139
|
+
2. **AwaitHello** — Provider sends `hello` with its name, protocol version, session choice, and tool definitions. Gateway responds with `hello.ack`.
|
|
140
|
+
3. **Bound** — Provider receives `tool.call` messages and responds with `tool.result`. It may also send `push` events or replace its tool list with `tools.update`. Gateway sends `session.lifecycle` events.
|
|
141
|
+
4. **Disconnected** — On `goodbye`, WebSocket close, or crash. All tools are removed and in-flight calls fail.
|
|
142
|
+
|
|
143
|
+
On `session.lifecycle` with `state: "shutdown.pending"`, send `goodbye` promptly. The gateway keeps existing provider sockets open until `goodbye` or the shutdown deadline, then closes any remaining sockets.
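
The state progression above can be sketched as a small transition table. This is an illustrative model only, not the gateway's implementation: the `TRANSITIONS` table, `BOUND_MESSAGES` set, `nextState` function, and the synthetic `"disconnect"` event (standing in for a socket close or crash) are all hypothetical names.

```javascript
// Hypothetical model of the provider connection states described above.
// It tracks only provider-sent messages plus a synthetic "disconnect" event.
const TRANSITIONS = {
  AwaitAuth: { auth: "AwaitHello" },
  AwaitHello: { hello: "Bound" },
  Bound: { goodbye: "Disconnected" },
  Disconnected: {},
};

// Messages a bound provider may send without leaving the Bound state.
const BOUND_MESSAGES = new Set(["tool.result", "push", "tools.update"]);

function nextState(state, messageType) {
  // A socket close or crash ends the connection from any state.
  if (messageType === "disconnect") return "Disconnected";
  const next = TRANSITIONS[state]?.[messageType];
  if (next) return next;
  if (state === "Bound" && BOUND_MESSAGES.has(messageType)) return "Bound";
  throw new Error(`Unexpected "${messageType}" in state ${state}`);
}
```

A provider-side client could use a table like this to refuse to send `hello` before `sessions` has arrived, rather than relying on the gateway to reject it.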
|
|
144
|
+
|
|
145
|
+
## Message reference
|
|
146
|
+
|
|
147
|
+
| Direction | Type | When |
|
|
148
|
+
|---|---|---|
|
|
149
|
+
| Provider → Gateway | `auth` | First message — send the token |
|
|
150
|
+
| Gateway → Provider | `sessions` | After auth — pick a session |
|
|
151
|
+
| Provider → Gateway | `hello` | After sessions — register tools |
|
|
152
|
+
| Gateway → Provider | `hello.ack` | Bound — tools are live; includes `providerId` and `sessionId` |
|
|
153
|
+
| Gateway → Provider | `tool.call` | Copilot invokes your tool |
|
|
154
|
+
| Provider → Gateway | `tool.result` | Your response (exactly one per call) |
|
|
155
|
+
| Gateway → Provider | `tool.cancel` | Timeout/interrupt — respond with `CANCELLED` |
|
|
156
|
+
| Provider → Gateway | `push` | Store, surface, or inject a provider event |
|
|
157
|
+
| Provider → Gateway | `tools.update` | Replace this provider's tool list |
|
|
158
|
+
| Gateway → Provider | `session.lifecycle` | Session state changes (`started`, `idle`, `shutdown.pending`) |
|
|
159
|
+
| Gateway → Provider | `error` | Something went wrong |
|
|
160
|
+
| Provider → Gateway | `goodbye` | Before disconnecting |
|
|
161
|
+
|
|
162
|
+
## Tool definitions
|
|
163
|
+
|
|
164
|
+
Each tool in the `hello` message needs:
|
|
165
|
+
|
|
166
|
+
| Field | Required | Description |
|
|
167
|
+
|---|---|---|
|
|
168
|
+
| `name` | yes | Unique tool name (must not conflict with tap tools or other providers) |
|
|
169
|
+
| `description` | yes | What the tool does |
|
|
170
|
+
| `parameters` | yes | JSON Schema object describing the arguments |
|
|
171
|
+
| `timeout` | no | Max execution time in ms |
|
|
172
|
+
|
|
173
|
+
A provider can register up to **100 tools**.
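
A provider may want to check its own tool list against the field table and the 100-tool limit before sending `hello`. The following is a minimal client-side sketch; `validateTools` and `MAX_TOOLS` are hypothetical names, and the gateway's actual validation may be stricter or differ in detail.

```javascript
const MAX_TOOLS = 100; // limit stated above

// Returns a list of human-readable problems; an empty list means the
// definitions look well-formed by the rules in the table above.
function validateTools(tools) {
  if (!Array.isArray(tools)) return ["tools must be an array"];
  const problems = [];
  if (tools.length > MAX_TOOLS) {
    problems.push(`too many tools: ${tools.length} > ${MAX_TOOLS}`);
  }
  const seen = new Set();
  tools.forEach((tool, i) => {
    const hasName = typeof tool?.name === "string" && tool.name.length > 0;
    const label = hasName ? tool.name : `#${i}`;
    if (!hasName) problems.push(`tool ${label}: missing name`);
    else if (seen.has(tool.name)) problems.push(`tool ${label}: duplicate name`);
    else seen.add(tool.name);
    if (typeof tool?.description !== "string" || !tool.description) {
      problems.push(`tool ${label}: missing description`);
    }
    if (typeof tool?.parameters !== "object" || tool.parameters === null) {
      problems.push(`tool ${label}: parameters must be a JSON Schema object`);
    }
    if (tool?.timeout !== undefined && !(Number.isFinite(tool.timeout) && tool.timeout > 0)) {
      problems.push(`tool ${label}: timeout must be a positive number of ms`);
    }
  });
  return problems;
}
```

This only catches local mistakes. Conflicts with tap tools or other providers can only be detected by the gateway.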
|
|
174
|
+
|
|
175
|
+
Bound providers may replace the entire list with:
|
|
176
|
+
|
|
177
|
+
```json
|
|
178
|
+
{ "type": "tools.update", "tools": [ /* same tool definition shape as hello.tools */ ] }
|
|
179
|
+
```
|
|
180
|
+
|
|
181
|
+
The update is session-bound: if `sessionId` is supplied it must match the session selected in `hello`. Success is silent and triggers the same debounced tool refresh as provider connect/disconnect. Rejected updates receive `error` (for example `TOOL_CONFLICT`), and the previously registered tool list stays active. In-flight calls to tools removed by an accepted update continue to their normal result, timeout, cancellation, or disconnect outcome.
|
|
182
|
+
|
|
183
|
+
## Push events
|
|
184
|
+
|
|
185
|
+
The provider SDK helpers map to bound-provider `push` messages:
|
|
186
|
+
|
|
187
|
+
```js
|
|
188
|
+
provider.keep("stored in the provider stream only");
|
|
189
|
+
provider.surface("visible in the Copilot timeline");
|
|
190
|
+
provider.push("inject this into the active session");
|
|
191
|
+
```
|
|
192
|
+
|
|
193
|
+
Wire shape:
|
|
194
|
+
|
|
195
|
+
```json
|
|
196
|
+
{ "type": "push", "level": "inject", "event": "Browser page asks for help", "stream": "detour" }
|
|
197
|
+
```
|
|
198
|
+
|
|
199
|
+
`level` must be `keep`, `surface`, or `inject`. `stream` is optional and defaults to the provider name. Pushes are delivered only to the session chosen in `hello`; an optional `sessionId` must match that bound session. `metadata` may be a JSON object and is stored with the event.
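
For providers that do not use the SDK helpers, the wire rules above can be wrapped in a small builder. This is a sketch under the stated rules only; `buildPush` and `PUSH_LEVELS` are hypothetical names, not part of the provider SDK.

```javascript
const PUSH_LEVELS = new Set(["keep", "surface", "inject"]);

// Builds the JSON text of a push message, enforcing the rules above:
// a valid level, optional stream/sessionId, and metadata as a JSON object.
function buildPush(level, event, { stream, sessionId, metadata } = {}) {
  if (!PUSH_LEVELS.has(level)) throw new Error(`invalid push level: ${level}`);
  const message = { type: "push", level, event };
  if (stream !== undefined) message.stream = stream; // gateway defaults to provider name
  if (sessionId !== undefined) message.sessionId = sessionId; // must match the bound session
  if (metadata !== undefined) {
    if (typeof metadata !== "object" || metadata === null || Array.isArray(metadata)) {
      throw new Error("metadata must be a JSON object");
    }
    message.metadata = metadata;
  }
  return JSON.stringify(message);
}
```

The result can be passed directly to `ws.send(...)` once the provider is bound.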
|
|
200
|
+
|
|
201
|
+
## Error handling
|
|
202
|
+
|
|
203
|
+
| Code | Fatal? | Meaning |
|
|
204
|
+
|---|---|---|
|
|
205
|
+
| `AUTH_FAILED` | Yes | Bad token — connection closes |
|
|
206
|
+
| `UNSUPPORTED_VERSION` | Yes | Wrong `protocolVersion` — connection closes |
|
|
207
|
+
| `INVALID_SESSION` | No | Session ID doesn't exist — pick another |
|
|
208
|
+
| `TOOL_CONFLICT` | No | Tool name already taken — rename and retry |
|
|
209
|
+
| `PAYLOAD_TOO_LARGE` | No | Message exceeds size limit |
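
One way a provider might act on this table is a lookup that separates fatal codes from recoverable ones. The sketch below is illustrative: `ERROR_POLICY` and `classifyError` are hypothetical names, the `fatal` flags come from the table, and the suggested actions are assumptions about reasonable client behavior rather than documented requirements.

```javascript
// fatal flags mirror the table above; the action strings are suggestions only.
const ERROR_POLICY = {
  AUTH_FAILED: { fatal: true, action: "re-read the token, then reconnect" },
  UNSUPPORTED_VERSION: { fatal: true, action: "fix protocolVersion before reconnecting" },
  INVALID_SESSION: { fatal: false, action: "send hello again with another session id" },
  TOOL_CONFLICT: { fatal: false, action: "rename the tool and retry" },
  PAYLOAD_TOO_LARGE: { fatal: false, action: "shrink the message and resend" },
};

// Unknown codes are treated as non-fatal here; that default is an assumption.
function classifyError(msg) {
  return ERROR_POLICY[msg.code] ?? { fatal: false, action: "log and continue" };
}
```

In the `case "error":` branch of the quick-start example, a provider could call `classifyError(msg)` and stop retrying when `fatal` is true.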
|
|
210
|
+
|
|
211
|
+
Payload limits: `tool.result` max 5 MB, all other messages max 2 MB.
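
A provider can check a message against these limits before sending it. The sketch below assumes 1 MB means 1024 × 1024 bytes and that the limit applies to the UTF-8 size of the serialized JSON; the text above does not specify either, and `checkPayload` is a hypothetical helper.

```javascript
const MB = 1024 * 1024; // assumption: the doc does not say 10^6 vs 2^20 bytes
const TOOL_RESULT_LIMIT = 5 * MB;
const DEFAULT_LIMIT = 2 * MB;

// Measures the UTF-8 size of the serialized message and compares it with
// the limit for its type.
function checkPayload(message) {
  const bytes = new TextEncoder().encode(JSON.stringify(message)).length;
  const limit = message.type === "tool.result" ? TOOL_RESULT_LIMIT : DEFAULT_LIMIT;
  return { bytes, limit, ok: bytes <= limit };
}
```

Checking locally lets a provider truncate or summarize a large result instead of receiving `PAYLOAD_TOO_LARGE` from the gateway.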
|
|
212
|
+
|
|
213
|
+
While a bound provider has in-flight `tool.call` messages, protocol violations
fail fast. This covers malformed JSON, oversized messages, and invalid
`tool.result` messages that cannot be correlated to a call. With one pending
call, that call is rejected with the protocol error; with multiple pending
calls, the provider is disconnected and all in-flight calls fail with
`DISCONNECTED`.
|
|
218
|
+
|
|
219
|
+
## Writing providers in other languages
|
|
220
|
+
|
|
221
|
+
The protocol is plain JSON over WebSocket. Any language with a WebSocket client works. See [the full spec](./docs/recipes/provider-interface-core-profile.md) for a Python example.
|
|
222
|
+
|
|
223
|
+
## Multiple providers
|
|
224
|
+
|
|
225
|
+
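Multiple providers can connect simultaneously. Each provider owns its own tool list, but tool names must be unique across all providers and the built-in tap tools; a clash is rejected with `TOOL_CONFLICT`. The gateway debounces tool registration (200 ms), so several providers connecting at about the same time trigger only one reload.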
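
The debouncing behavior can be illustrated with a trailing-edge debounce: each new request restarts a timer, and the reload runs once after the requests stop. This is a generic sketch, not the gateway's code; `createDebounced` is a hypothetical name and the 200 ms default simply mirrors the figure above.

```javascript
// Returns a function that schedules fn to run once, delayMs after the
// most recent call. Calls arriving within the window collapse into one run.
function createDebounced(fn, delayMs = 200) {
  let timer = null;
  return () => {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      fn();
    }, delayMs);
  };
}
```

With this shape, three providers connecting within the window call the scheduler three times but cause a single reload.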
|
|
226
|
+
|
|
227
|
+
## Dynamic tool registration
|
|
228
|
+
|
|
229
|
+
When a provider connects or disconnects, ※ tap:
|
|
230
|
+
|
|
231
|
+
1. Merges all provider tools with the existing tap tools
|
|
232
|
+
2. Calls `session.registerTools()` to update the in-memory handler map
|
|
233
|
+
3. Calls `session.rpc.extensions.reload()` to make the CLI pick up the new tools
|
|
234
|
+
|
|
235
|
+
This happens automatically — providers just connect and their tools appear.
|
|
236
|
+
|
|
237
|
+
After binding, providers can also send `tools.update` to replace their own tool list without reconnecting. ※ tap validates the new definitions, rejects conflicts without changing the active list, and uses the same debounced `registerTools()` + extension reload path on success.
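
The "reject without changing the active list" rule amounts to an all-or-nothing replace. The sketch below models it with a plain map from tool name to owner; `applyToolsUpdate` and the registry shape are hypothetical, chosen only to illustrate the semantics described above.

```javascript
// registry: Map of tool name -> owner id ("tap" for built-in tools).
// Returns a new registry on success, or the untouched original on conflict.
function applyToolsUpdate(registry, providerId, newTools) {
  for (const tool of newTools) {
    const owner = registry.get(tool.name);
    if (owner !== undefined && owner !== providerId) {
      // Conflict: leave the previously registered list active.
      return { ok: false, code: "TOOL_CONFLICT", tool: tool.name, registry };
    }
  }
  // No conflicts: drop this provider's old tools, then add the new list.
  const next = new Map([...registry].filter(([, owner]) => owner !== providerId));
  for (const tool of newTools) next.set(tool.name, providerId);
  return { ok: true, registry: next };
}
```

Validating the whole list before mutating anything is what keeps a rejected update from leaving a half-applied tool set.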
|
|
238
|
+
|
|
239
|
+
## Further reading
|
|
240
|
+
|
|
241
|
+
- [Core Profile spec](./docs/recipes/provider-interface-core-profile.md) — Full protocol specification with state machine, error codes, and payload limits
|
|
242
|
+
- [Test provider example](./examples/test-provider.mjs) — A runnable example you can try immediately
|
|
@@ -0,0 +1,303 @@
|
|
|
1
|
+
# Recipe: Adaptive Agent — Self-Tuning Behavior via Session Observation
|
|
2
|
+
|
|
3
|
+
## The insight
|
|
4
|
+
|
|
5
|
+
Every Copilot session starts from zero. The AI doesn't know that you corrected it about the import style yesterday. It doesn't know that you always run tests after editing test files. It doesn't know that the last 3 times it suggested `jsonwebtoken`, you replaced it with the project's custom JWT library.
|
|
6
|
+
|
|
7
|
+
Skills can encode known rules. But they can't **discover** rules by observing what happens in sessions. The Adaptive Agent watches the session — tool calls, user corrections, assistant mistakes — and builds a living knowledge base that rewrites the system prompt via transform callbacks. The agent gets better over time, not because someone wrote instructions, but because the extension observed and learned.
|
|
8
|
+
|
|
9
|
+
## Why skills can't do this
|
|
10
|
+
|
|
11
|
+
1. A skill is static text. It can say "use custom JWT library." But someone has to write that rule. The adaptive agent discovers it by watching you replace `jsonwebtoken` twice.
|
|
12
|
+
2. A skill can't watch `assistant.message` events to detect when the AI makes a mistake. The extension can.
|
|
13
|
+
3. A skill can't watch `user.message` events to detect correction patterns ("no, actually..."). The extension can.
|
|
14
|
+
4. A skill can't rewrite its own content. Transform callbacks can modify the system prompt every turn based on accumulated observations.
|
|
15
|
+
|
|
16
|
+
## Architecture
|
|
17
|
+
|
|
18
|
+
```
|
|
19
|
+
┌─────────────────────────────────────────────────────────────┐
|
|
20
|
+
│ Copilot CLI session │
|
|
21
|
+
│ │
|
|
22
|
+
│ session.on("user.message") ─────┐ │
|
|
23
|
+
│ session.on("assistant.message")──┤ │
|
|
24
|
+
│ onPostToolUse ───────────────────┤ │
|
|
25
|
+
│ ▼ │
|
|
26
|
+
│ ┌──────────────────┐ │
|
|
27
|
+
│ │ Observer Module │ │
|
|
28
|
+
│ │ │ │
|
|
29
|
+
│ │ Detects: │ │
|
|
30
|
+
│ │ • Corrections │ │
|
|
31
|
+
│ │ • Patterns │ │
|
|
32
|
+
│ │ • Failures │ │
|
|
33
|
+
│ │ • Preferences │ │
|
|
34
|
+
│ └────────┬─────────┘ │
|
|
35
|
+
│ │ │
|
|
36
|
+
│ ▼ │
|
|
37
|
+
│ ┌──────────────────┐ │
|
|
38
|
+
│ │ Memory Store │ │
|
|
39
|
+
│ │ (workspace/ │ │
|
|
40
|
+
│ │ memory.json) │ │
|
|
41
|
+
│ └────────┬─────────┘ │
|
|
42
|
+
│ │ │
|
|
43
|
+
│ ▼ │
|
|
44
|
+
│ ┌──────────────────┐ │
|
|
45
|
+
│ │ Transform │ │
|
|
46
|
+
│ │ Callbacks │──► system prompt │
|
|
47
|
+
│ │ (every turn) │ rewritten with │
|
|
48
|
+
│ │ │ learned rules │
|
|
49
|
+
│ └──────────────────┘ │
|
|
50
|
+
└─────────────────────────────────────────────────────────────┘
|
|
51
|
+
│
|
|
52
|
+
│ onSessionEnd
|
|
53
|
+
▼
|
|
54
|
+
┌──────────────────┐
|
|
55
|
+
│ Distill │
|
|
56
|
+
│ PromptEmitter │
|
|
57
|
+
│ (one-time) │
|
|
58
|
+
│ │
|
|
59
|
+
│ Summarize what │
|
|
60
|
+
│ was learned │
|
|
61
|
+
│ this session │
|
|
62
|
+
│ → persist to │
|
|
63
|
+
│ memory.json │
|
|
64
|
+
└──────────────────┘
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## Components
|
|
68
|
+
|
|
69
|
+
### 1. Observer module (event listeners)
|
|
70
|
+
|
|
71
|
+
Hooks into three event sources to watch the session:
|
|
72
|
+
|
|
73
|
+
```js
|
|
74
|
+
// Watch user messages for correction patterns
|
|
75
|
+
session.on("user.message", (event) => {
|
|
76
|
+
const msg = event.data.content?.toLowerCase() ?? "";
|
|
77
|
+
// Detect corrections: "no", "actually", "don't use", "wrong", "instead"
|
|
78
|
+
if (correctionPattern.test(msg)) {
|
|
79
|
+
observer.recordCorrection({
|
|
80
|
+
userMessage: event.data.content,
|
|
81
|
+
// The previous assistant message is what was wrong
|
|
82
|
+
previousAssistantAction: observer.lastAssistantAction,
|
|
83
|
+
timestamp: Date.now()
|
|
84
|
+
});
|
|
85
|
+
}
|
|
86
|
+
});
|
|
87
|
+
|
|
88
|
+
// Watch assistant messages to track what the AI does
|
|
89
|
+
session.on("assistant.message", (event) => {
|
|
90
|
+
observer.lastAssistantAction = {
|
|
91
|
+
content: event.data.content,
|
|
92
|
+
toolRequests: event.data.toolRequests
|
|
93
|
+
};
|
|
94
|
+
});
|
|
95
|
+
|
|
96
|
+
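// Watch tool calls to track workflow patterns
// (fragment: onPostToolUse is a hook entry in the extension's hooks object, not a standalone statement)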
|
|
97
|
+
onPostToolUse: ({ toolName, toolArgs, result }) => {
|
|
98
|
+
observer.recordToolUse({
|
|
99
|
+
tool: toolName,
|
|
100
|
+
args: toolArgs,
|
|
101
|
+
succeeded: result.type === "success",
|
|
102
|
+
file: toolArgs?.path || toolArgs?.file,
|
|
103
|
+
timestamp: Date.now()
|
|
104
|
+
});
|
|
105
|
+
}
|
|
106
|
+
```
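
The listener above refers to a `correctionPattern` that the recipe does not define. One possible definition, based on the phrases named in its comment, is sketched here; the exact word list is an assumption and would likely need tuning to reduce false positives.

```javascript
// Hypothetical definition of correctionPattern. Word boundaries keep "no"
// from matching inside words such as "know" or "nothing"; [’'] accepts
// both straight and curly apostrophes.
const correctionPattern =
  /\b(no|nope|actually|wrong|instead|don[’']t use|do not use)\b/i;
```

A keyword regex is a coarse signal. A message like "no problem, thanks" would still match, which is why the recipe lists observation quality as an open question.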
|
|
107
|
+
|
|
108
|
+
### 2. Pattern detection
|
|
109
|
+
|
|
110
|
+
The observer accumulates raw events. A PromptEmitter on idle periodically distills patterns:
|
|
111
|
+
|
|
112
|
+
```
|
|
113
|
+
prompt: |
|
|
114
|
+
Review these raw observations from the current session and extract
|
|
115
|
+
durable learnings. Only output learnings that:
|
|
116
|
+
- Are specific to THIS codebase (not generic advice)
|
|
117
|
+
- Were clearly demonstrated at least once
|
|
118
|
+
- Would prevent a future mistake or save time
|
|
119
|
+
|
|
120
|
+
Format each as a single instruction sentence.
|
|
121
|
+
Output nothing if no clear learnings emerged.
|
|
122
|
+
|
|
123
|
+
Observations:
|
|
124
|
+
{{corrections}}
|
|
125
|
+
{{tool_failures}}
|
|
126
|
+
{{repeated_sequences}}
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
Example output:
|
|
130
|
+
```json
|
|
131
|
+
[
|
|
132
|
+
"Use the custom JWT library at src/lib/jwt.ts instead of jsonwebtoken — user corrected this.",
|
|
133
|
+
"Always run npm test after editing files in test/ — user does this manually every time.",
|
|
134
|
+
"The staging environment uses port 3001, not 3000 — the agent used the wrong port twice."
|
|
135
|
+
]
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
### 3. Memory store (workspace persistence)
|
|
139
|
+
|
|
140
|
+
Learnings accumulate in `workspace/memory.json`:
|
|
141
|
+
|
|
142
|
+
```json
|
|
143
|
+
{
|
|
144
|
+
"schemaVersion": 1,
|
|
145
|
+
"learnings": [
|
|
146
|
+
{
|
|
147
|
+
"rule": "Use src/lib/jwt.ts instead of jsonwebtoken for JWT operations",
|
|
148
|
+
"confidence": 0.9,
|
|
149
|
+
"observations": 2,
|
|
150
|
+
"firstSeen": "2026-04-24T10:00:00Z",
|
|
151
|
+
"lastSeen": "2026-04-25T14:30:00Z",
|
|
152
|
+
"source": "user-correction"
|
|
153
|
+
},
|
|
154
|
+
{
|
|
155
|
+
"rule": "Run npm test after editing files in test/",
|
|
156
|
+
"confidence": 0.7,
|
|
157
|
+
"observations": 3,
|
|
158
|
+
"firstSeen": "2026-04-24T11:00:00Z",
|
|
159
|
+
"lastSeen": "2026-04-26T09:00:00Z",
|
|
160
|
+
"source": "repeated-pattern"
|
|
161
|
+
},
|
|
162
|
+
{
|
|
163
|
+
"rule": "Staging environment is on port 3001",
|
|
164
|
+
"confidence": 0.8,
|
|
165
|
+
"observations": 2,
|
|
166
|
+
"firstSeen": "2026-04-25T14:00:00Z",
|
|
167
|
+
"lastSeen": "2026-04-25T14:05:00Z",
|
|
168
|
+
"source": "tool-failure-correction"
|
|
169
|
+
}
|
|
170
|
+
],
|
|
171
|
+
"lastDistilled": "2026-04-26T09:30:00Z"
|
|
172
|
+
}
|
|
173
|
+
```
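
Because the memory file is edited by code across sessions, the reader should tolerate a missing, corrupt, or outdated file. The sketch below parses the text of `memory.json` defensively and leaves file I/O to the caller; `parseMemory`, `emptyMemory`, and the fallback-to-empty policy are assumptions, not part of the recipe.

```javascript
const SCHEMA_VERSION = 1; // matches "schemaVersion" in the example above

function emptyMemory() {
  return { schemaVersion: SCHEMA_VERSION, learnings: [], lastDistilled: null };
}

// Parses memory.json text. Falls back to an empty store when the text is
// not valid JSON, has an unexpected schemaVersion, or lacks a learnings array.
function parseMemory(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch {
    return emptyMemory();
  }
  if (data?.schemaVersion !== SCHEMA_VERSION || !Array.isArray(data.learnings)) {
    return emptyMemory();
  }
  // Keep only entries that have the fields the transform callback relies on.
  const learnings = data.learnings.filter(
    (l) => typeof l?.rule === "string" && typeof l?.confidence === "number"
  );
  return { schemaVersion: SCHEMA_VERSION, learnings, lastDistilled: data.lastDistilled ?? null };
}
```

A `readMemoryStore()` like the one used later in this recipe could read the file as UTF-8 text and pass it through `parseMemory`, treating a missing file the same as an empty string.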
|
|
174
|
+
|
|
175
|
+
### 4. Transform callback (system prompt rewriting)
|
|
176
|
+
|
|
177
|
+
Every turn, the learned rules are injected into the system prompt:
|
|
178
|
+
|
|
179
|
+
```js
|
|
180
|
+
registerTransformCallbacks(new Map([
|
|
181
|
+
["custom_instructions", (current) => {
|
|
182
|
+
const memory = readMemoryStore();
|
|
183
|
+
if (!memory.learnings.length) return current;
|
|
184
|
+
|
|
185
|
+
const rules = memory.learnings
|
|
186
|
+
.filter(l => l.confidence >= 0.5) // new learnings start at 0.5, so they are injected from the next session
|
|
187
|
+
.sort((a, b) => b.confidence - a.confidence)
|
|
188
|
+
.slice(0, 15) // cap to avoid prompt bloat
|
|
189
|
+
.map(l => `- ${l.rule}`)
|
|
190
|
+
.join("\n");
|
|
191
|
+
|
|
192
|
+
return current + "\n\n" +
|
|
193
|
+
"## Learned from previous sessions\n\n" +
|
|
194
|
+
"These rules were learned by observing the user's corrections " +
|
|
195
|
+
"and patterns. Follow them unless the user explicitly " +
|
|
196
|
+
"asks otherwise.\n\n" + rules;
|
|
197
|
+
}]
|
|
198
|
+
]));
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
### 5. Session-end distillation
|
|
202
|
+
|
|
203
|
+
When the session ends, a one-time PromptEmitter reviews the raw observations and updates the memory store:
|
|
204
|
+
|
|
205
|
+
```js
|
|
206
|
+
onSessionEnd: async () => {
|
|
207
|
+
const observations = observer.getSessionObservations();
|
|
208
|
+
if (observations.length === 0) return;
|
|
209
|
+
|
|
210
|
+
// Fire a one-time prompt to distill learnings
|
|
211
|
+
const distilled = await distillLearnings(observations);
|
|
212
|
+
|
|
213
|
+
// Merge with existing memory
|
|
214
|
+
const memory = readMemoryStore();
|
|
215
|
+
for (const learning of distilled) {
|
|
216
|
+
const existing = memory.learnings.find(
|
|
217
|
+
l => semanticallySimilar(l.rule, learning.rule)
|
|
218
|
+
);
|
|
219
|
+
if (existing) {
|
|
220
|
+
existing.confidence = Math.min(1.0, existing.confidence + 0.1);
|
|
221
|
+
existing.observations += 1;
|
|
222
|
+
existing.lastSeen = new Date().toISOString();
|
|
223
|
+
} else {
|
|
224
|
+
memory.learnings.push({
|
|
225
|
+
rule: learning.rule,
|
|
226
|
+
confidence: 0.5, // new learnings start at 0.5
|
|
227
|
+
observations: 1,
|
|
228
|
+
firstSeen: new Date().toISOString(),
|
|
229
|
+
lastSeen: new Date().toISOString(),
|
|
230
|
+
source: learning.source
|
|
231
|
+
});
|
|
232
|
+
}
|
|
233
|
+
}
|
|
234
|
+
|
|
235
|
+
// Decay old learnings that haven't been reinforced
|
|
236
|
+
for (const l of memory.learnings) {
|
|
237
|
+
const daysSinceLastSeen = (Date.now() - new Date(l.lastSeen).getTime()) / 86400000; // ms per day
|
|
238
|
+
if (daysSinceLastSeen > 30) {
|
|
239
|
+
l.confidence -= 0.1;
|
|
240
|
+
}
|
|
241
|
+
}
|
|
242
|
+
|
|
243
|
+
// Prune low-confidence learnings
|
|
244
|
+
memory.learnings = memory.learnings.filter(l => l.confidence > 0.2);
|
|
245
|
+
writeMemoryStore(memory);
|
|
246
|
+
}
|
|
247
|
+
```
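
The merge step depends on `semanticallySimilar`, which the recipe leaves open. As a placeholder, the sketch below uses Jaccard overlap of lowercase tokens. This is a swapped-in heuristic, not a true semantic comparison; the tokenizer, the 0.5 threshold, and the function bodies are all assumptions.

```javascript
// Splits a rule into lowercase tokens, keeping path-like tokens such as
// "src/lib/jwt.ts" intact.
function tokenize(rule) {
  return new Set(rule.toLowerCase().match(/[a-z0-9_./-]+/g) ?? []);
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function semanticallySimilar(a, b, threshold = 0.5) {
  const tokensA = tokenize(a);
  const tokensB = tokenize(b);
  if (tokensA.size === 0 || tokensB.size === 0) return false;
  let shared = 0;
  for (const token of tokensA) {
    if (tokensB.has(token)) shared += 1;
  }
  const union = tokensA.size + tokensB.size - shared;
  return shared / union >= threshold;
}
```

Token overlap treats differently worded rules about the same thing as unrelated, so an embedding-based comparison may be a better fit once the basic loop works.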
|
|
248
|
+
|
|
249
|
+
## Example: How a learning forms
|
|
250
|
+
|
|
251
|
+
```
|
|
252
|
+
Session 1:
|
|
253
|
+
Agent suggests: import jwt from 'jsonwebtoken'
|
|
254
|
+
User says: "no, use the custom one at src/lib/jwt.ts"
|
|
255
|
+
Observer records: correction, jsonwebtoken → src/lib/jwt.ts
|
|
256
|
+
Session ends → memory.json gets: { rule: "Use src/lib/jwt.ts...", confidence: 0.5 }
|
|
257
|
+
|
|
258
|
+
Session 2:
|
|
259
|
+
Transform callback injects the rule into system prompt
|
|
260
|
+
Agent correctly uses: import { sign } from './lib/jwt.ts'
|
|
261
|
+
No correction needed → confidence stays at 0.5
|
|
262
|
+
|
|
263
|
+
Session 3:
|
|
264
|
+
Different context — agent is writing a new auth endpoint
|
|
265
|
+
Agent uses src/lib/jwt.ts without being told
|
|
266
|
+
User says nothing (implicit approval) → confidence bumps to 0.6
|
|
267
|
+
|
|
268
|
+
Session 5:
|
|
269
|
+
Confidence at 0.7. The rule is now firmly established.
|
|
270
|
+
The agent never makes this mistake again in this repo.
|
|
271
|
+
Nobody wrote an instruction. It was learned.
|
|
272
|
+
```
|
|
273
|
+
|
|
274
|
+
## What gets learned (categories)
|
|
275
|
+
|
|
276
|
+
| Category | Detection method | Example |
|
|
277
|
+
|---|---|---|
|
|
278
|
+
| **Library preferences** | User corrects import/require | "Use date-fns not moment" |
|
|
279
|
+
| **Workflow sequences** | Repeated tool call patterns | "Run tests after editing test files" |
|
|
280
|
+
| **Environment facts** | Tool failures + corrections | "Staging is port 3001" |
|
|
281
|
+
| **Code conventions** | User rewrites AI output | "Use single quotes not double" |
|
|
282
|
+
| **Architecture rules** | User rejects suggestions | "Don't put business logic in controllers" |
|
|
283
|
+
| **Command preferences** | User overrides commands | "Use pnpm not npm in this repo" |
|
|
284
|
+
|
|
285
|
+
## Phased delivery
|
|
286
|
+
|
|
287
|
+
| Phase | Scope |
|
|
288
|
+
|---|---|
|
|
289
|
+
| **1. Observer + raw logging** | Hook into user.message, assistant.message, onPostToolUse. Log to EventStream. |
|
|
290
|
+
| **2. Memory store** | workspace/memory.json with read/write. Load on session start. |
|
|
291
|
+
| **3. Transform callback** | Inject learned rules into custom_instructions section every turn. |
|
|
292
|
+
| **4. Session-end distillation** | PromptEmitter at session end to distill raw observations into learnings. |
|
|
293
|
+
| **5. Confidence decay** | Time-based decay for stale learnings. Reinforcement on reuse. |
|
|
294
|
+
| **6. User control** | Tool to list/remove/edit learned rules: `tap_memory_list`, `tap_memory_forget`. |
|
|
295
|
+
|
|
296
|
+
## Open questions
|
|
297
|
+
|
|
298
|
+
- **Privacy** — learnings are repo-scoped by default. Should they ever be user-global?
|
|
299
|
+
- **Conflict resolution** — what if two sessions produce contradictory learnings?
|
|
300
|
+
- **Prompt budget** — how many learned rules before the system prompt gets too long? Cap at 15? 20?
|
|
301
|
+
- **Semantic similarity** — how to detect that two rules are about the same thing? Exact match? Embedding?
|
|
302
|
+
- **Observation quality** — not every "no" is a correction. How to reduce false positives?
|
|
303
|
+
- **User trust** — should learned rules be surfaced for approval before taking effect?
|