usesteady 0.1.0-alpha.1 → 0.1.0-alpha.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +67 -658
- package/dist/server-admin.d.ts +11 -0
- package/dist/server-admin.d.ts.map +1 -0
- package/dist/server-admin.js +188 -0
- package/dist/server-admin.js.map +1 -0
- package/dist/server.d.ts.map +1 -1
- package/dist/server.js +432 -7
- package/dist/server.js.map +1 -1
- package/dist/src/execution/session-db.d.ts +134 -0
- package/dist/src/execution/session-db.d.ts.map +1 -0
- package/dist/src/execution/session-db.js +345 -0
- package/dist/src/execution/session-db.js.map +1 -0
- package/dist/src/friction/payout-ledger.d.ts +3 -0
- package/dist/src/friction/payout-ledger.d.ts.map +1 -1
- package/dist/src/friction/payout-ledger.js +13 -3
- package/dist/src/friction/payout-ledger.js.map +1 -1
- package/dist/src/shell/cli/main.d.ts.map +1 -1
- package/dist/src/shell/cli/main.js +25 -10
- package/dist/src/shell/cli/main.js.map +1 -1
- package/dist/src/shell/cli/use-steady.js +13 -9
- package/dist/src/shell/cli/use-steady.js.map +1 -1
- package/dist/src/types/execution.d.ts +25 -0
- package/dist/src/types/execution.d.ts.map +1 -0
- package/dist/src/types/execution.js +9 -0
- package/dist/src/types/execution.js.map +1 -0
- package/package.json +2 -1
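The alpha.1 README text removed in this diff describes the Response Planner as a deterministic lookup table from intent state to response mode. As a minimal TypeScript sketch, that table can be written as typed data; the state and mode strings come from the README, while the names `IntentState`, `ResponseMode`, `RESPONSE_PLAN`, and `planResponse` are hypothetical and not taken from the package's exports.

```typescript
// Hypothetical types: the state and mode strings come from the README's
// Response Planner table; the type and function names are illustrative.
type IntentState =
  | "unsafe"
  | "non_literal"
  | "ambiguous"
  | "incomplete"
  | "guided_recovery"
  | "clear";

type ResponseMode = "refuse" | "ignore" | "clarify" | "guide" | "execute";

// Record<IntentState, ResponseMode> makes the compiler reject a missing or
// misspelled state, so the table stays exhaustive as states are added.
const RESPONSE_PLAN: Record<IntentState, ResponseMode> = {
  unsafe: "refuse",
  non_literal: "ignore",
  ambiguous: "clarify",
  incomplete: "guide",
  guided_recovery: "guide",
  clear: "execute",
};

// A pure lookup with no branching logic, so the decision is auditable at a glance.
function planResponse(state: IntentState): ResponseMode {
  return RESPONSE_PLAN[state];
}

console.log(planResponse("guided_recovery")); // prints: guide
```

Keeping the mapping as data rather than a chain of conditionals is what makes it auditable in the sense the README describes: every state and its mode are visible in one place.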
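The removed README's CVG section gives a level-to-flag table, says bad results throw (rule C3), and requires an exhaustive mapping for any new presentation family (rule C4). The sketch below encodes the table with an exhaustive `switch` and adds a throwing consistency check. `flagsForLevel` and `assertFlagsConsistent` are hypothetical names, and the check itself (a confirm-blocking signal must surface; a surfaced signal must differentiate) is inferred from the table's pattern rather than documented.

```typescript
// Hypothetical names: the levels and flag names come from the README's CVG
// table; the function names are illustrative.
type ControlVisibilityLevel = "blocking" | "attention" | "normal";

interface ControlVisibilityFlags {
  must_block_confirm: boolean;
  must_surface: boolean;
  must_differentiate: boolean;
}

// Exhaustive mapping (rule C4): adding a level without a case fails to compile,
// because `level` would no longer narrow to `never` in the default branch.
function flagsForLevel(level: ControlVisibilityLevel): ControlVisibilityFlags {
  switch (level) {
    case "blocking":
      return { must_block_confirm: true, must_surface: true, must_differentiate: true };
    case "attention":
      return { must_block_confirm: false, must_surface: true, must_differentiate: true };
    case "normal":
      return { must_block_confirm: false, must_surface: false, must_differentiate: true };
    default: {
      const unreachable: never = level;
      throw new Error(`unknown control visibility level: ${String(unreachable)}`);
    }
  }
}

// Detect-and-throw (rule C3): an inconsistent flag set is reported, never repaired.
// The implication chain is an assumption read off the table above.
function assertFlagsConsistent(flags: ControlVisibilityFlags): void {
  if (flags.must_block_confirm && !flags.must_surface) {
    throw new Error("a confirm-blocking signal must also be surfaced");
  }
  if (flags.must_surface && !flags.must_differentiate) {
    throw new Error("a surfaced signal must also be differentiated");
  }
}

for (const level of ["blocking", "attention", "normal"] as const) {
  assertFlagsConsistent(flagsForLevel(level));
}
```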
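The removed README defines a UCP envelope as carrying a version, kind, id, timestamp, and a SHA-256 hash derived from all fields, which makes it tamper-evident. The sketch below shows one way to do that with Node's `crypto.createHash`. The `payload` field name, the `"1"` version string, and the sorted-key JSON canonicalization are assumptions, since the README does not say how fields are serialized before hashing; `makeEnvelope` and `verifyEnvelope` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: the README lists version, kind, id, timestamp and a hash;
// `payload` is an assumed field name.
interface UnhashedEnvelope<P> {
  version: string;
  kind: string;
  id: string;
  timestamp: string;
  payload: P;
}

interface UcpEnvelope<P> extends UnhashedEnvelope<P> {
  hash: string;
}

// Assumed canonicalization: JSON with object keys sorted recursively, so the
// same content always serializes to the same bytes. JSON-compatible values only.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  const record = value as Record<string, unknown>;
  const entries = Object.keys(record)
    .sort()
    .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
  return `{${entries.join(",")}}`;
}

function hashFields(fields: UnhashedEnvelope<unknown>): string {
  return createHash("sha256").update(canonicalize(fields)).digest("hex");
}

function makeEnvelope<P>(kind: string, id: string, payload: P, timestamp: string): UcpEnvelope<P> {
  const fields: UnhashedEnvelope<P> = { version: "1", kind, id, timestamp, payload };
  return { ...fields, hash: hashFields(fields) };
}

// Tamper evidence: recompute the hash from the remaining fields and compare.
function verifyEnvelope(envelope: UcpEnvelope<unknown>): boolean {
  const { hash, ...fields } = envelope;
  return hashFields(fields) === hash;
}

const envelope = makeEnvelope(
  "ucp.cursor_handoff.v1",
  "handoff-1",
  { file: "src/components/Button.tsx" },
  "2024-01-01T00:00:00.000Z",
);
console.log(verifyEnvelope(envelope)); // prints: true
```

Canonicalization matters because `JSON.stringify` follows property insertion order: without sorting keys, two envelopes with identical content could hash differently.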
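The removed README lists four preconditions for `observeIntentPattern` and shows its result feeding `applyInteractionEvent`. A minimal sketch of both as pure functions follows. Only the two function names, the `ObservedIntentPatterns` fields, and the preconditions come from the README; `IntakeResultLike`, `IntakeTrace`, the event's `type` tag, and `InteractionContract` are hypothetical shapes for illustration.

```typescript
// Hypothetical shapes around the README's function names and preconditions.
type IntentCategory = "visual_color" | "text_change" | "config_change";

type ObservedIntentPatterns = {
  visual_color: number;
  text_change: number;
  config_change: number;
};

interface IntakeResultLike {
  mode: "refuse" | "ignore" | "clarify" | "guide" | "execute";
  guidance?: { interpretation?: { category: string } };
}

interface IntakeTrace {
  bridgeFired: boolean;
}

interface InteractionEvent {
  type: "intent_pattern_observed";
  category: IntentCategory;
}

interface InteractionContract {
  readonly observedIntentPatterns: ObservedIntentPatterns;
}

const CATEGORIES: readonly string[] = ["visual_color", "text_change", "config_change"];

function isIntentCategory(category: string): category is IntentCategory {
  return CATEGORIES.indexOf(category) !== -1;
}

// Returns an event only when all four README conditions hold.
function observeIntentPattern(result: IntakeResultLike, trace?: IntakeTrace): InteractionEvent | null {
  if (result.mode !== "guide") return null; // 1. guide mode only
  const interpretation = result.guidance?.interpretation;
  if (interpretation === undefined) return null; // 2. interpretation present
  if (trace === undefined || trace.bridgeFired !== true) return null; // 3. bridge actually ran
  const category = interpretation.category;
  if (!isIntentCategory(category)) return null; // 4. one of the three categories
  return { type: "intent_pattern_observed", category };
}

// Pure update: builds a new contract and leaves the input untouched.
function applyInteractionEvent(contract: InteractionContract, event: InteractionEvent): InteractionContract {
  const patterns = contract.observedIntentPatterns;
  const next: ObservedIntentPatterns = { ...patterns };
  next[event.category] = patterns[event.category] + 1;
  return { ...contract, observedIntentPatterns: next };
}

let contract: InteractionContract = {
  observedIntentPatterns: { visual_color: 0, text_change: 0, config_change: 0 },
};
const event = observeIntentPattern(
  { mode: "guide", guidance: { interpretation: { category: "visual_color" } } },
  { bridgeFired: true },
);
if (event !== null) {
  contract = applyInteractionEvent(contract, event);
}
console.log(contract.observedIntentPatterns.visual_color); // prints: 1
```

The last few lines mirror the README's own usage snippet: observation yields `null` unless every condition holds, and the contract is replaced rather than mutated.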
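The removed README specifies `applyGuidanceOrdering` through two adjustments (a defensive sort that puts `read_first` ahead of `use_exact_format`, and a label swap once a category reaches `FAMILIARITY_THRESHOLD = 3`) plus eight hard invariants. The sketch below is one way to satisfy them. The `GuidanceStep` and `GuidancePayload` shapes, relabeling by swapping the label prefix from the README's table, and ranking every non-`read_first` step equally are assumptions.

```typescript
// Hypothetical shapes: `nextSteps`, `missing`, the step types, the threshold
// and the label table come from the README; everything else is assumed.
type IntentCategory = "visual_color" | "text_change" | "config_change";
type ObservedIntentPatterns = Record<IntentCategory, number>;

interface GuidanceStep {
  type: string; // e.g. "read_first", "use_exact_format"
  label: string;
}

interface GuidancePayload {
  nextSteps: readonly GuidanceStep[];
  missing: readonly string[];
  interpretation?: { category: IntentCategory };
}

const FAMILIARITY_THRESHOLD = 3;

// [default prefix, familiar prefix]; the `: read "<file>"` suffix is kept as-is.
const READ_FIRST_PREFIXES: Record<IntentCategory, readonly [string, string]> = {
  visual_color: [
    "Read the component file first to find the current color value",
    "Find the current color in the component file",
  ],
  text_change: ["Read the file first to find the current text", "Find the current text in the file"],
  config_change: [
    "Read the config file first to find the current setting",
    "Find the current setting in the config file",
  ],
};

// Assumed ranking: read_first sorts ahead of every other step type.
function stepRank(step: GuidanceStep): number {
  return step.type === "read_first" ? 0 : 1;
}

function applyGuidanceOrdering(
  guidance: GuidancePayload,
  patterns: ObservedIntentPatterns,
): GuidancePayload {
  // Invariant 7: no read_first step means no effect at all.
  if (!guidance.nextSteps.some((step) => step.type === "read_first")) return guidance;

  // Invariant 8: sort on (rank, original index) so ties keep their order.
  const ordered = guidance.nextSteps
    .map((step, index) => ({ step, index }))
    .sort((a, b) => stepRank(a.step) - stepRank(b.step) || a.index - b.index)
    .map(({ step }) => step);

  // Invariant 5: only the current interpretation's own count is consulted.
  const category = guidance.interpretation?.category;
  const prefixes =
    category !== undefined && patterns[category] >= FAMILIARITY_THRESHOLD
      ? READ_FIRST_PREFIXES[category]
      : undefined;

  // Invariant 6: read_first is relabeled, never dropped.
  const nextSteps = ordered.map((step) => {
    if (prefixes === undefined || step.type !== "read_first") return step;
    const [defaultPrefix, familiarPrefix] = prefixes;
    if (!step.label.startsWith(defaultPrefix)) return step;
    return { ...step, label: familiarPrefix + step.label.slice(defaultPrefix.length) };
  });

  // Invariants 1-3: same step count; `missing` and `interpretation` pass through.
  return { ...guidance, nextSteps };
}
```

Sorting on the original index as a tiebreaker keeps the result stable without depending on the engine's `Array.prototype.sort` guarantees, which is how invariant 8 is covered here.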
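The removed README describes the Cursor product session as a flat immutable state machine governed by rules P1 to P6: terminal states are returned unchanged, scope answers are candidate-bounded, approval is not delivery, and delivery from any phase other than `"approved"` is blocked. The sketch below models those rules. Which phases count as terminal, the `GateOutcome` shape, and approving only from `"prepared"` (conflict acceptance is left out) are assumptions, and the `gate` callback is a hypothetical stand-in for the real delivery gate.

```typescript
// Hypothetical model: phase names and rules P1-P4 come from the README;
// terminal phases and the gate's shape are assumptions.
type Phase =
  | "prepared"
  | "conflict"
  | "approved"
  | "scope_question"
  | "accepted"
  | "exec_error"
  | "blocked";

interface SessionState {
  readonly phase: Phase;
  readonly candidates: readonly string[]; // files H may pick when the gate asks about scope
  readonly targetFile?: string;
}

// Stand-in for the delivery gate, which owns eligibility (P6).
type GateOutcome =
  | { phase: "accepted" | "exec_error" }
  | { phase: "scope_question"; candidates: readonly string[] };

function isTerminal(phase: Phase): boolean {
  return phase === "accepted" || phase === "exec_error" || phase === "blocked";
}

// P3: approval records consent only; nothing is delivered here.
function approve(state: SessionState): SessionState {
  if (state.phase !== "prepared") return state; // P1: terminal states come back as the same object
  return { ...state, phase: "approved" };
}

// P4: delivery from any phase other than "approved" is blocked.
function deliver(state: SessionState, gate: (s: SessionState) => GateOutcome): SessionState {
  if (isTerminal(state.phase)) return state; // P1
  if (state.phase !== "approved") return { ...state, phase: "blocked" };
  const outcome = gate(state); // P6: the gate decides, the session only records
  return outcome.phase === "scope_question"
    ? { ...state, phase: "scope_question", candidates: outcome.candidates }
    : { ...state, phase: outcome.phase };
}

// P2: the answer must be one of the gate's candidates; anything else is rejected.
function answerScope(state: SessionState, file: string): SessionState {
  if (state.phase !== "scope_question") return state;
  if (state.candidates.indexOf(file) === -1) {
    throw new Error(`"${file}" is not one of the scope candidates`);
  }
  return { ...state, phase: "approved", targetFile: file, candidates: [] };
}
```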
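The removed README describes PRV as a fast lexical check for the markers `again`, `continue`, `previous`, and `same as`, returning clarify when an input needs prior state that does not exist. A minimal sketch follows; `runPrv`, the `hasPriorRun` context flag, and whole-word regular-expression matching are assumptions about how such a check could work, not the package's code.

```typescript
// Hypothetical sketch: the four markers come from the README; the context
// shape and result type are assumptions. Phrases like "same file" are left
// to Context Alignment, per the README's boundary note.
const PRV_MARKERS: readonly string[] = ["again", "continue", "previous", "same as"];

interface SessionContext {
  hasPriorRun: boolean;
}

type PrvResult = { verdict: "pass" } | { verdict: "clarify"; marker: string };

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-word, case-insensitive matching, so "previously" or "continuation" do not trigger.
function markerPattern(marker: string): RegExp {
  return new RegExp(`\\b${escapeRegExp(marker).replace(/ /g, "\\s+")}\\b`, "i");
}

function runPrv(input: string, ctx: SessionContext): PrvResult {
  if (ctx.hasPriorRun) return { verdict: "pass" }; // prior state exists: nothing is missing
  for (const marker of PRV_MARKERS) {
    if (markerPattern(marker).test(input)) return { verdict: "clarify", marker };
  }
  return { verdict: "pass" };
}

console.log(runPrv("run again", { hasPriorRun: false }).verdict); // prints: clarify
```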
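The removed README orders the Intent Interpretation Bridge's classifiers by priority (config 10, color 20, text 30) so that "toggle dark mode" resolves to config, requires text changes to match both a verb and a target, and returns `null` when no safe category applies. The sketch below shows a priority-ordered registry with first-match-wins semantics. The word lists are small hypothetical samples, and including "dark" in the color list is an assumption made so the priority ordering has something to decide.

```typescript
// Hypothetical sketch: categories, priorities and the "verb AND target" rule
// come from the README; the word lists are illustrative samples only.
type IntentCategory = "visual_color" | "text_change" | "config_change";

interface IntentClassifier {
  category: IntentCategory;
  priority: number; // lower number is checked first
  matches: (words: readonly string[]) => boolean;
}

function hasAny(words: readonly string[], vocabulary: readonly string[]): boolean {
  return words.some((word) => vocabulary.indexOf(word) !== -1);
}

const REGISTRY: readonly IntentClassifier[] = [
  {
    category: "text_change",
    priority: 30,
    // Tighter match: needs a text-change verb AND a text-content target.
    matches: (w) =>
      hasAny(w, ["change", "rename", "reword", "update"]) &&
      hasAny(w, ["heading", "label", "copy", "title", "placeholder"]),
  },
  {
    category: "visual_color",
    priority: 20,
    // Assumed to include "dark", so "toggle dark mode" would match color too.
    matches: (w) => hasAny(w, ["blue", "red", "dark", "darker", "lighter", "button", "background", "header"]),
  },
  {
    category: "config_change",
    priority: 10,
    matches: (w) => hasAny(w, ["toggle", "enable", "disable", "port", "timeout", "flag", "env"]),
  },
];

// First match in priority order wins; no match returns null instead of a guess.
function classifyIntent(input: string): IntentCategory | null {
  const words = input.toLowerCase().split(/[^a-z0-9]+/).filter((word) => word.length > 0);
  const ordered = REGISTRY.slice().sort((a, b) => a.priority - b.priority);
  for (const classifier of ordered) {
    if (classifier.matches(words)) return classifier.category;
  }
  return null;
}

console.log(classifyIntent("toggle dark mode")); // prints: config_change
```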
package/README.md
CHANGED
|
@@ -1,724 +1,133 @@
|
|
|
1
|
-
#
|
|
1
|
+
# UseSteady — Review AI actions before they run
|
|
2
2
|
|
|
3
|
-
UseSteady
|
|
4
|
-
|
|
5
|
-
**Core system:** deterministic intake + understanding engine, Cursor and Claude execution runtimes, multi-step workflow coordinator, CLI shell, React web UI, and history/audit layer. No LLM in the control path. No cloud dependency.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## What this does in one sentence
|
|
10
|
-
|
|
11
|
-
Given a natural-language input and a session context, the system returns one of five verbs: **refuse, ignore, clarify, guide, or execute** ? plus structured data explaining why.
|
|
12
|
-
|
|
13
|
-
Natural language never executes directly. Ever.
|
|
14
|
-
|
|
15
|
-
---
|
|
16
|
-
|
|
17
|
-
## Build phases
|
|
18
|
-
|
|
19
|
-
| Phase | Name | Status |
|
|
20
|
-
|-------|------|--------|
|
|
21
|
-
| 1 | Intake — PRV, Safety, Context Alignment, Disambiguation, Completion | ✅ Locked |
|
|
22
|
-
| 2–3 | Understand — Intent Interpretation Bridge, interpreter system | ✅ Locked |
|
|
23
|
-
| 4 | UCP Core — canonical envelope format, hashing, persistence, provenance chain | ✅ Locked |
|
|
24
|
-
| 4C | Silent Guidance — selector expansion (operation + investigation), evidence loop | ✅ Locked |
|
|
25
|
-
| 5 | Present Layer — reminder presentation, formatIntakeResult, presentFromInput | ✅ Frozen |
|
|
26
|
-
| 5B | Control Visibility (CVG) — authority-signal classifier, assertion layer | ✅ Baseline locked |
|
|
27
|
-
| 5C | Boundary Explanation — why-routing surfaced to H | ✅ Baseline locked |
|
|
28
|
-
| 5D | Cross-layer Contradiction Visibility | ✅ Baseline locked |
|
|
29
|
-
| 6 | Execution Layer — Cursor seam (artifact → gate → adapter) | ✅ Locked |
|
|
30
|
-
| 6B | Cursor Product Session — phase state machine, terminal guards, session invariants | ✅ Baseline locked |
|
|
31
|
-
| 6C | Session Resilience — interruption handling, partial-session recovery | ✅ Baseline locked |
|
|
32
|
-
| 7 | UCP Alignment — cursor envelope types, timeline, provenance chain extension | ✅ Locked |
|
|
33
|
-
| 8A | Claude Managed Agents — seam design, authority model, tool policy | ✅ Phase A frozen |
|
|
34
|
-
| 8B | Claude Delivery Gate — ClaudeAgentPlugin, handoff persistence, stub adapter | ✅ Baseline locked |
|
|
35
|
-
| 8C | Claude API Adapter — real Anthropic client behind ClaudeAgentPlugin interface | ✅ Baseline locked |
|
|
36
|
-
| 8D | Claude Product Session — session wiring, parity with Cursor path | ✅ Baseline locked |
|
|
37
|
-
| 9A–9F | Product Shell — CLI, workflow coordinator, workflow shell, UCP persistence | ✅ Frozen |
|
|
38
|
-
| 10A–10C | History / Audit — two-tier read model, history shell | ✅ Frozen |
|
|
39
|
-
| 11B–11C | Product Slice Validation — Safe Refactor Workflow, CLI surface validation | ✅ Frozen |
|
|
40
|
-
| 11D | UI Design — reviewing phase, raw input anchor, targetFiles scope | ✅ Frozen |
|
|
41
|
-
| 11A-Web | Web UI — React shell on frozen contracts, Express API bridge | ✅ Frozen |
|
|
42
|
-
| 11E | Workflow Builder — natural-language entry model | ✅ Frozen |
|
|
43
|
-
| 12 | Simulation — persona-based validation across all interaction states | ✅ Passed |
|
|
44
|
-
|
|
45
|
-
**Correctly deferred (not yet built):**
|
|
46
|
-
reminder scheduler / persistence, timezone resolution,
|
|
47
|
-
`cursor_artifact.v1`, `ucp.claude_session.v1`, `ucp.claude_result.v1`,
|
|
48
|
-
Session 7 vocabulary validation, `networkAccess: "allow_limited"` semantics,
|
|
49
|
-
Claude session resumability, mobile layout.
|
|
50
|
-
|
|
51
|
-
---
|
|
52
|
-
|
|
53
|
-
## Core pipeline
|
|
54
|
-
|
|
55
|
-
Every input passes through the same sequence. No step is skipped. No step can be reordered.
|
|
3
|
+
UseSteady shows you exactly what AI will do — before it executes anything.
|
|
56
4
|
|
|
57
5
|
```
|
|
58
|
-
|
|
59
|
-
?
|
|
60
|
-
?
|
|
61
|
-
PRV ? Is this input obviously missing required prior state?
|
|
62
|
-
? clarify if yes
|
|
63
|
-
?
|
|
64
|
-
Safety Gate ? Is this input unsafe to process at all?
|
|
65
|
-
? refuse if yes
|
|
66
|
-
?
|
|
67
|
-
Context Alignment ? Is this a social message or a context reference with no context?
|
|
68
|
-
? ignore / clarify if yes
|
|
69
|
-
?
|
|
70
|
-
Disambiguation ? Does this input contain a term with multiple valid meanings?
|
|
71
|
-
? clarify if yes
|
|
72
|
-
?
|
|
73
|
-
Completion ? Is this input specific enough to be actionable?
|
|
74
|
-
? (receives full context ? resolves "run again" etc.)
|
|
75
|
-
? guide if incomplete or vague
|
|
76
|
-
?
|
|
77
|
-
Intent Interpretation ? What does the user appear to be trying to do? (guided_recovery only)
|
|
78
|
-
Bridge ? advisory ? improves guidance labels, never invents missing facts
|
|
79
|
-
?
|
|
80
|
-
?
|
|
81
|
-
Response Planner ? Maps intent state to a final response mode
|
|
82
|
-
?
|
|
83
|
-
?
|
|
84
|
-
(if execute)
|
|
85
|
-
Change Interpretation ? What will the human experience from this change? (advisory)
|
|
86
|
-
?
|
|
87
|
-
?
|
|
88
|
-
Guidance Ordering ? applyGuidanceOrdering (B2 ? session-aware, advisory only)
|
|
89
|
-
[if guidance exists] reorders read_first before use_exact_format
|
|
90
|
-
shortens read_first label when category is familiar (?3 obs)
|
|
91
|
-
?
|
|
92
|
-
?
|
|
93
|
-
IntakeResult ? { mode, reason, signal, intentState, guidance?, interpretation? }
|
|
94
|
-
?
|
|
95
|
-
?
|
|
96
|
-
Presentation Layer ? formatIntakeResult ? PresentationOutput
|
|
97
|
-
badge, headline, category, confidence, steps, missing
|
|
6
|
+
npx usesteady
|
|
98
7
|
```
|
|
99
8
|
|
|
100
9
|
---
|
|
101
10
|
|
|
102
|
-
##
|
|
103
|
-
|
|
104
|
-
### UCP ? UseSteady Control Protocol
|
|
105
|
-
**What it is:** The canonical message envelope format. Every payload in the system has a version, kind, id, timestamp, and a SHA-256 hash derived from all fields.
|
|
106
|
-
|
|
107
|
-
**Why it matters:** Envelopes are tamper-evident. The hash makes the payload verifiable without trusting memory or state.
|
|
108
|
-
|
|
109
|
-
**One-liner:** All messages have a stable identity and a content hash.
|
|
110
|
-
|
|
111
|
-
---
|
|
112
|
-
|
|
113
|
-
### PRV ? Pre-Response Validation
|
|
114
|
-
**What it is:** A fast lexical check that runs before anything else. Looks for high-signal markers (`again`, `continue`, `previous`, `same as`) that mean the input requires prior state.
|
|
115
|
-
|
|
116
|
-
**Why it matters:** If you say "run again" and there is no prior run, the system must clarify ? not guess. PRV enforces this at the front door.
|
|
117
|
-
|
|
118
|
-
**One-liner:** Stops obvious context-dependent requests before they go anywhere.
|
|
119
|
-
|
|
120
|
-
**Boundary with Context Alignment:** PRV is lexical (word-level). Context Alignment is semantic (phrase-level). "same file", "that result", "my project" are handled by Context Alignment, not PRV.
|
|
121
|
-
|
|
122
|
-
---
|
|
123
|
-
|
|
124
|
-
### Safety Gate
|
|
125
|
-
**What it is:** A deterministic, registry-driven chain of detectors. Runs after PRV, before anything else. Each detector has an `id`, a `priority`, and labeled patterns that produce a `matchedPattern` and `detectorId` on block.
|
|
126
|
-
|
|
127
|
-
**Why it matters:** Unsafe inputs ? destructive actions, credential access, rule bypass, arbitrary script execution ? are refused here. No unsafe input reaches interpretation or execution.
|
|
128
|
-
|
|
129
|
-
**One-liner:** Blocks dangerous inputs early and tells you exactly why.
|
|
130
|
-
|
|
131
|
-
**Inspectable fields on a block result:**
|
|
132
|
-
- `detectorId` ? which detector fired
|
|
133
|
-
- `matchedPattern` ? which specific phrase triggered it
|
|
134
|
-
- `reason` ? the risk category enum
|
|
135
|
-
- `note` ? plain-English explanation
|
|
136
|
-
|
|
137
|
-
---
|
|
138
|
-
|
|
139
|
-
### Context Alignment
|
|
140
|
-
**What it is:** Classifies the input into one of three states: `aligned` (normal task), `non_literal` (social/greeting), or `hard_mismatch` (references prior context that isn't available).
|
|
141
|
-
|
|
142
|
-
**Why it matters:** "Good morning" and "use same file" should never reach the planner. Context Alignment handles both correctly.
|
|
143
|
-
|
|
144
|
-
**One-liner:** Identifies social messages and unresolvable context references.
|
|
145
|
-
|
|
146
|
-
---
|
|
147
|
-
|
|
148
|
-
### Disambiguation
|
|
149
|
-
**What it is:** A registry of detectors that flag inputs with multiple valid interpretations. When a term is ambiguous, it returns bounded options ? never a silent rewrite.
|
|
150
|
-
|
|
151
|
-
**Why it matters:** "change button" could mean text, color, behavior, or handler. The system must ask, not guess.
|
|
152
|
-
|
|
153
|
-
**One-liner:** Detects terms with multiple meanings and surfaces the options.
|
|
154
|
-
|
|
155
|
-
**Note:** `unknown` from disambiguation is an internal state ? it means "no ambiguity detected, proceed to completion." It is not returned as a public signal.
|
|
156
|
-
|
|
157
|
-
---
|
|
158
|
-
|
|
159
|
-
### Completion
|
|
160
|
-
**What it is:** The authority on executability. Determines whether the input is actionable (`complete`), missing a specific required field (`incomplete`), or too vague to act on without format guidance (`guided_recovery`).
|
|
161
|
-
|
|
162
|
-
**Why it matters:** Completion receives full session context, so "run again" resolves to `complete` when a prior run exists. Disambiguation returning `unknown` does not block completion from returning `complete`.
|
|
163
|
-
|
|
164
|
-
**One-liner:** Decides whether the input is ready to execute ? and if not, explains exactly what's needed.
|
|
165
|
-
|
|
166
|
-
**Semantic contract (must not be collapsed):**
|
|
167
|
-
- `incomplete` = intent is clear, a required field is missing (e.g. "commit my changes" ? needs a message)
|
|
168
|
-
- `guided_recovery` = intent is vague, a safe format path exists (e.g. "make button blue" ? needs file + values)
|
|
169
|
-
|
|
170
|
-
**Written contract:** Completion is the sole authority on executability. Disambiguation `unknown` does not block it. Context-dependent executable inputs are resolved here.
|
|
171
|
-
|
|
172
|
-
---
|
|
173
|
-
|
|
174
|
-
### Change Interpretation (v1)
|
|
175
|
-
**What it is:** An advisory layer that runs only when mode is `execute`. Describes what the human will experience from the change ? in plain English, with impact notes and confidence.
|
|
176
|
-
|
|
177
|
-
**Why it matters:** The system can now say "this changes the background color from blue-500 to red-500 in Button.tsx ? visual change only, no logic impact" rather than just "execute." That is meaningful information for the user and for any downstream reviewer.
|
|
178
|
-
|
|
179
|
-
**One-liner:** Tells you what the change means, not just that it's allowed.
|
|
180
|
-
|
|
181
|
-
**v1 scope (deterministic, no inference):**
|
|
182
|
-
- Tailwind color class changes (`bg-`, `text-`, `border-`, etc.) ? confidence: high
|
|
183
|
-
- CSS color value changes (hex, rgb, hsl, named colors, CSS property declarations) ? confidence: high/medium
|
|
184
|
-
- Config value changes (numbers, URLs, booleans in .json/.yaml/.env files) ? confidence: high/medium
|
|
185
|
-
- Text literal changes (human-readable UI copy) ? confidence: medium
|
|
186
|
-
|
|
187
|
-
**Returns `null`** for any input outside these three categories. Does not guess.
|
|
188
|
-
|
|
189
|
-
---
|
|
190
|
-
|
|
191
|
-
### Intent Interpretation Bridge
|
|
192
|
-
**What it is:** An advisory layer that runs only when `CompletionResult.kind === "guided_recovery"`. It classifies the user's broad intent (visual color change, text/copy change, configuration change) and uses that classification to rewrite guidance labels into human-readable, context-aware form.
|
|
193
|
-
|
|
194
|
-
**Why it matters:** Before this layer, guided recovery returned generic format instructions. After it, the same instructions are prefaced with "Read the component file first to find the current color value" ? which tells the user *why* they're being asked to read before patching. The meaning is still safe; nothing is invented.
|
|
195
|
-
|
|
196
|
-
**One-liner:** Tells you what the user appears to be trying to do, so guided recovery reads like help rather than syntax documentation.
|
|
197
|
-
|
|
198
|
-
**Hard invariant (locked):** Interpretation can improve guidance, but it can never manufacture executability.
|
|
199
|
-
|
|
200
|
-
**What it classifies:**
|
|
201
|
-
- `visual_color` ? named colors, comparative terms (darker/lighter), color/style targets (button, background, header)
|
|
202
|
-
- `text_change` ? text-change verb AND text-content target (heading, label, copy, title, placeholder)
|
|
203
|
-
- `config_change` ? strong config verbs (toggle/enable/disable) or config nouns (port, timeout, flag, env)
|
|
204
|
-
|
|
205
|
-
**What it never does:**
|
|
206
|
-
- Guess the file path
|
|
207
|
-
- Guess the current or new value
|
|
208
|
-
- Guess any token, class name, or key
|
|
209
|
-
- Change the mode from `guide` to `execute`
|
|
210
|
-
|
|
211
|
-
**Returns `null`** (no enrichment) if the input does not match any safe category. Original guidance is returned unchanged.
|
|
212
|
-
|
|
213
|
-
**Priority ordering in the registry:**
|
|
214
|
-
1. Config (priority 10) ? must beat color for "toggle dark mode"
|
|
215
|
-
2. Color (priority 20) ? named colors + UI target terms
|
|
216
|
-
3. Text (priority 30) ? requires both verb AND target (tighter match)
|
|
217
|
-
|
|
218
|
-
---
|
|
219
|
-
|
|
220
|
-
### Presentation Layer
|
|
221
|
-
**What it is:** A pure translation layer that converts an `IntakeResult` into a `PresentationOutput` ? the last-mile data structure a consumer can render directly, without re-running or re-interpreting anything.
|
|
222
|
-
|
|
223
|
-
**Why it matters:** Without this layer, every consumer needs to know how to branch on `mode`, which interpretation family to use, where to find confidence, and how to format step labels. The presentation layer handles that once, correctly, and deterministically.
|
|
11
|
+
## Why
|
|
224
12
|
|
|
225
|
-
|
|
13
|
+
AI tools can generate code, run commands, and modify files — often before you fully understand what will change.
|
|
226
14
|
|
|
227
|
-
|
|
228
|
-
- `badge` ? mode indicator: `REFUSE` | `IGNORE` | `CLARIFY` | `GUIDE` | `EXECUTE`
|
|
229
|
-
- `headline` ? the first sentence to show; sourced from interpretation summary (when available) or system reason
|
|
230
|
-
- `category` ? human-readable display name for the interpretation category (when available)
|
|
231
|
-
- `confidence` ? `"High confidence"` / `"Medium confidence"` / `"Low confidence"` (when available)
|
|
232
|
-
- `steps` ? formatted step labels from `guidance.nextSteps` (guide mode only)
|
|
233
|
-
- `missing` ? list of missing fields from `guidance.missing` (guide mode only)
|
|
15
|
+
UseSteady adds a review layer:
|
|
234
16
|
|
|
235
|
-
**
|
|
236
|
-
-
|
|
237
|
-
-
|
|
238
|
-
-
|
|
17
|
+
- **SYSTEM WILL** — the exact change, not a summary
|
|
18
|
+
- **Risk level** — LOW / MEDIUM / HIGH, derived from what is actually changing
|
|
19
|
+
- **WHY explanation** — what this does and why it was triggered
|
|
20
|
+
- **Approve / Reject** — per step, before anything runs
|
|
239
21
|
|
|
240
22
|
---
|
|
241
23
|
|
|
242
|
-
|
|
243
|
-
**What it is:** A presentation-only selector layer that maps inputs to operation or investigation mode when interpretation does not apply. It runs inside the Present Layer ? never inside intake.
|
|
244
|
-
|
|
245
|
-
**Why it matters:** 86% "unknown" at the presentation level is honest, not a failure. But when a pattern recurs across sessions (C1/C3 evidence), it can be given a structured guidance voice without involving the interpreter. Silent Guidance covers ops/BI vocabulary (scale, export, deploy, inspect, trace) that belongs to presentation, not execution.
|
|
246
|
-
|
|
247
|
-
**Hard invariants:** no patch leakage, deterministic output, investigation > operation precedence when both match, no template may contain `replace ? with ?`.
|
|
248
|
-
|
|
249
|
-
---
|
|
24
|
+
## Example
|
|
250
25
|
|
|
251
|
-
### Control Visibility & Boundary Integrity (CVG ? Phase 5B)
|
|
252
|
-
**What it is:** A presentation-layer safety contract. Not a UI concern. A programmatic classifier that maps any presentation state to a `ControlVisibilityResult` ? a structured record of the signal's authority tier (`blocking` / `attention` / `normal`) and three derived boolean flags.
|
|
253
|
-
|
|
254
|
-
**Why it matters:** This is the formal guard against the Helix-style failure mode where critical authority signals blend into normal presentation. With CVG in place, a `conflict_detected` state cannot be visually or programmatically equivalent to a `ready_to_confirm` state.
|
|
255
|
-
|
|
256
|
-
**One-liner:** Makes higher-authority signals structurally distinct from lower-authority signals ? and throws immediately if that contract is violated.
|
|
257
|
-
|
|
258
|
-
**Locked rules (C1?C4):**
|
|
259
|
-
- **C1** ? CVG derives from existing presentation state only. No re-evaluation. No inference.
|
|
260
|
-
- **C2** ? CVG may assert, but never route. It never changes the caller's output.
|
|
261
|
-
- **C3** ? CVG may detect violations, but never repair them silently. Bad results throw.
|
|
262
|
-
- **C4** ? Any new presentation family must extend `ControlVisibilityInput`, add an exhaustive mapping, and pass uniqueness + invariant tests before shipping.
|
|
263
|
-
|
|
264
|
-
**Level ? flag assignment:**
|
|
265
|
-
|
|
266
|
-
| Level | `must_block_confirm` | `must_surface` | `must_differentiate` |
|
|
267
|
-
|-------|---------------------|----------------|----------------------|
|
|
268
|
-
| `blocking` | `true` | `true` | `true` |
|
|
269
|
-
| `attention` | `false` | `true` | `true` |
|
|
270
|
-
| `normal` | `false` | `false` | `true` |
|
|
271
|
-
|
|
272
|
-
**Hooks:** `presentFromInput()` and `renderReminder()` run CVG eval + assert on every call. Output shapes are unchanged.
|
|
273
|
-
|
|
274
|
-
---
|
|
275
|
-
|
|
276
|
-
### Cursor Execution Seam (Phase 6)
|
|
277
|
-
**What it is:** The governed handoff path from a prepared edit artifact to real filesystem mutation. Consists of four components that form a strict one-way delivery pipe:
|
|
278
|
-
|
|
279
|
-
```
|
|
280
|
-
CursorHandoffArtifact
|
|
281
|
-
? CursorDeliveryGate (eligibility check + UCP persistence)
|
|
282
|
-
? CursorEditorPlugin (transport interface)
|
|
283
|
-
? CursorInProcessAdapter (exact-match filesystem edit)
|
|
284
26
|
```
|
|
27
|
+
$ npx usesteady
|
|
285
28
|
|
|
286
|
-
|
|
29
|
+
UseSteady
|
|
287
30
|
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
- `ucp.cursor_handoff.v1` persists before send, or delivery is blocked
|
|
291
|
-
- Raw input never crosses the seam ? Cursor cannot reinterpret what it never receives
|
|
292
|
-
- Exactly one string match required ? `ambiguous_match` is a first-class refusal
|
|
293
|
-
- Scope refusal is recoverable only through H narrowing ? never automatic file choice
|
|
294
|
-
- OCD constrains only; it never proposes `allowedFiles`
|
|
31
|
+
AI can propose changes.
|
|
32
|
+
You approve before they run.
|
|
295
33
|
|
|
296
|
-
|
|
34
|
+
Type your request:
|
|
35
|
+
> Update button color
|
|
297
36
|
|
|
298
|
-
|
|
299
|
-
**What it is:** A flat immutable state machine that connects the execution engine to human interaction flows. One session = one edit attempt. Consumers receive a `CursorSessionState` after each transition and route based on `phase`.
|
|
37
|
+
Generating execution plan...
|
|
300
38
|
|
|
301
|
-
|
|
39
|
+
SYSTEM WILL
|
|
302
40
|
|
|
303
|
-
|
|
41
|
+
1. Modify src/components/Button.tsx
|
|
42
|
+
Replace bg-blue-500 → bg-indigo-600
|
|
304
43
|
|
|
305
|
-
|
|
306
|
-
- **P1** ? Terminal states are immutable. All transitions return the same state object unchanged after termination.
|
|
307
|
-
- **P2** ? Scope clarification is candidate-bounded. `answerScope()` rejects any file not in `scopeQuestion.candidates`.
|
|
308
|
-
- **P3** ? Approval is never execution. `approve()` ? `"approved"`. `deliver()` is a separate explicit act.
|
|
309
|
-
- **P4** ? Delivery requires approved. `deliver()` from any other phase ? `"blocked"`.
|
|
310
|
-
- **P5** ? Session carries state but has no independent authority.
|
|
311
|
-
- **P6** ? All authority lives in intake, OCD/policy, H, and the delivery gate. The session calls them; it does not replace them.
|
|
44
|
+
RISK: LOW
|
|
312
45
|
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
### Interaction Contract
|
|
316
|
-
**What it is:** An immutable record of how this user prefers to receive responses ? ambiguity tolerance, explanation style, guidance format. Updated by pure functions from observed events.
|
|
317
|
-
|
|
318
|
-
**Why it matters:** The system can adapt to a user who consistently corrects misinterpretations, without requiring a profile UI or persistent identity.
|
|
46
|
+
WHY
|
|
47
|
+
Updates shared button styling.
|
|
319
48
|
|
|
320
|
-
|
|
49
|
+
[a] Approve [r] Reject
|
|
321
50
|
|
|
322
|
-
|
|
323
|
-
A counter field on the contract that records how many times each intent category was seen in guided recovery flows where the Intent Interpretation Bridge actually fired.
|
|
51
|
+
> a
|
|
324
52
|
|
|
325
|
-
|
|
326
|
-
type ObservedIntentPatterns = {
|
|
327
|
-
visual_color: number; // guide + visual_color classification + bridgeFired
|
|
328
|
-
text_change: number; // guide + text_change classification + bridgeFired
|
|
329
|
-
config_change: number; // guide + config_change classification + bridgeFired
|
|
330
|
-
};
|
|
331
|
-
```
|
|
332
|
-
|
|
333
|
-
**How to observe:** Call `observeIntentPattern(result, trace)` after a pipeline run. It returns an `InteractionEvent | null`. If non-null, pass it to `applyInteractionEvent(contract, event)` to get an updated contract.
|
|
53
|
+
✓ Step approved — Button.tsx updated.
|
|
334
54
|
|
|
335
|
-
|
|
336
|
-
|
|
337
|
-
const event = observeIntentPattern(result, trace);
|
|
338
|
-
if (event !== null) {
|
|
339
|
-
contract = applyInteractionEvent(contract, event);
|
|
340
|
-
}
|
|
55
|
+
All steps reviewed.
|
|
56
|
+
Return to your workflow to continue.
|
|
341
57
|
```
|
|
342
58
|
|
|
343
|
-
|
|
344
|
-
1. `observeIntentPattern` returns null unless **all four** conditions are met:
|
|
345
|
-
- `result.mode === "guide"`
|
|
346
|
-
- `result.guidance?.interpretation` is defined
|
|
347
|
-
- `trace.bridgeFired === true`
|
|
348
|
-
- interpretation category is one of `visual_color`, `text_change`, `config_change`
|
|
349
|
-
2. `observedIntentPatterns` is **advisory only**. It influences guidance ordering and label emphasis in B2. It may **never** affect: PRV, Safety Gate, Context Alignment, Disambiguation, Completion, response mode, intentState, or interpretation category/confidence.
|
|
350
|
-
3. No observation event fires on `execute`, `refuse`, `clarify`, or `ignore` flows.
|
|
351
|
-
4. No observation event fires when `bridgeFired: false` or when no trace is provided.
|
|
352
|
-
|
|
353
|
-
**Why `bridgeFired` is required (not just interpretation presence):** The trace is explicit confirmation that the bridge ran and completed. Requiring it prevents the observation from being triggered by any future path that might attach interpretation from a different source.
|
|
354
|
-
|
|
355
|
-
---
|
|
356
|
-
|
|
357
|
-
### Session-Aware Guidance Ordering (Phase B2)
|
|
358
|
-
**What it is:** A purely presentational step that runs after `enrichGuidance` and applies two adjustments to the `GuidancePayload` based on `observedIntentPatterns`:
|
|
359
|
-
|
|
360
|
-
1. **Step type ordering** ? `read_first` is guaranteed before `use_exact_format` (defensive sort; already true for enriched paths, but now an explicit invariant).
|
|
361
|
-
|
|
362
|
-
2. **Label emphasis** ? when `patterns[category] >= FAMILIARITY_THRESHOLD` (3 observations), the `read_first` step label shortens from its explanatory form to an action-first form:
|
|
59
|
+
### High risk example
|
|
363
60
|
|
|
364
|
-
| Category | Default label (0 observations) | Familiar label (?3 observations) |
|
|
365
|
-
|---|---|---|
|
|
366
|
-
| `visual_color` | `Read the component file first to find the current color value: read "<file>"` | `Find the current color in the component file: read "<file>"` |
|
|
367
|
-
| `text_change` | `Read the file first to find the current text: read "<file>"` | `Find the current text in the file: read "<file>"` |
|
|
368
|
-
| `config_change` | `Read the config file first to find the current setting: read "<file>"` | `Find the current setting in the config file: read "<file>"` |
|
|
369
|
-
|
|
370
|
-
**The "first" word is the learner cue.** New users need it. Experienced users don't.
|
|
371
|
-
|
|
372
|
-
**Hard invariants (must not be violated):**
|
|
373
|
-
1. Step count never changes ? no step is added or removed.
|
|
374
|
-
2. `missing[]` never changes.
|
|
375
|
-
3. `interpretation` (category, confidence, summary, basis) never changes.
|
|
376
|
-
4. `mode`, `reason`, `signal`, `intentState` are decided before this function runs ? it cannot affect them.
|
|
377
|
-
5. A high count for one category never touches another category's labels.
|
|
378
|
-
6. `read_first` is never suppressed, only relabeled.
|
|
379
|
-
7. If no `read_first` step exists in the guidance, the function has no effect.
|
|
380
|
-
8. Tie-breaking: stable sort ? same-type steps preserve their original relative order.
|
|
381
|
-
|
|
382
|
-
**`FAMILIARITY_THRESHOLD = 3`** ? enough observations to distinguish deliberate repeated use from accidental. Below threshold, nothing changes.
|
|
383
|
-
|
|
384
|
-
---
|
|
385
|
-
|
|
386
|
-
### Response Planner
|
|
387
|
-
**What it is:** A deterministic mapping from intent state to response mode. No logic ? just a lookup table with documented reasoning.
|
|
388
|
-
|
|
389
|
-
**Why it matters:** The decision of what to do is explicit, auditable, and separate from the understanding that produced it.
|
|
390
|
-
|
|
391
|
-
**Mapping:**
|
|
392
|
-
| Intent state | Response mode |
|
|
393
|
-
|---|---|
|
|
394
|
-
| unsafe | refuse |
|
|
395
|
-
| non_literal | ignore |
|
|
396
|
-
| ambiguous | clarify |
|
|
397
|
-
| incomplete | guide |
|
|
398
|
-
| guided_recovery | guide |
|
|
399
|
-
| clear | execute |
|
|
400
|
-
|
|
401
|
-
---
|
|
402
|
-
|
|
403
|
-
## Locked behavior invariants
|
|
404
|
-
|
|
405
|
-
These were confirmed by a persona friction pass against the live system. They are product invariants, not guidelines. Future contributors must not violate them.
|
|
406
|
-
|
|
407
|
-
Any code change that violates an invariant below is a breaking change ? same as the written contracts above.
|
|
408
|
-
|
|
409
|
-
1. **Intent Interpretation is narrower than user language.** The bridge only classifies what it can safely claim. If no color word, text-change verb+target, or config signal is present, it returns null and stays silent. Helpfulness without evidence is overclaiming.
|
|
410
|
-
|
|
411
|
-
2. **Null is a valid success state.** A bridge result of null means "no safe interpretation exists." It is not an error. It is not a fallback. It is the correct answer when the signal is absent. Do not replace null with a guess.
|
|
412
|
-
|
|
413
|
-
3. **Vague intent never yields high confidence.** High confidence is reserved for inputs where the evidence is unambiguous and specific. No vague natural-language request (without exact values) may produce `confidence: "high"`. Medium or low are the correct ceiling.
|
|
414
|
-
|
|
415
|
-
4. **Incomplete intent does not use the bridge.** When `CompletionResult.kind === "incomplete"`, the intent is already unambiguous ? a required field is simply missing. Running intent interpretation on these inputs would add noise, not meaning. The bridge skips them.
|
|
416
|
-
|
|
417
|
-
5. **Bridge output may improve labels, not truth.** The bridge may rewrite step labels to be context-aware. It may never invent file paths, token names, current values, or new values. The `missing[]` array must remain identical before and after enrichment.
|
|
418
|
-
|
|
419
|
-
6. **Executable interpretation and guided interpretation are separate families.** `InterpretationResult` (change interpretation, at `IntakeResult.interpretation`) describes what a structured patch command means. `IntentInterpretation` (intent bridge, at `IntakeResult.guidance.interpretation`) describes what a vague request appears to be attempting. These must never be merged or confused. One is for execute mode. One is for guide mode.
|
|
420
|
-
|
|
421
|
-
7. **Observation events fire only when the bridge actually ran.** `observeIntentPattern` requires `trace.bridgeFired === true` as an explicit precondition. Without a trace, no observation fires. This ensures the session observation only tracks meaningful guided interpretations ? not generic noise, incomplete paths, or short-circuited flows.
|
|
422
|
-
|
|
423
|
-
8. **Observed patterns are advisory only ? they never affect mode or truth.** `observedIntentPatterns` counts accumulate silently. They influence guidance ordering and label emphasis (B2). They must never influence `mode`, `intentState`, `safetyVerdict`, `completionKind`, or `interpretation category/confidence`. Same input + same context ? same mode, always, regardless of prior observation counts.
|
|
424
|
-
|
|
425
|
-
9. **Guidance ordering may only reorder and relabel ? never add, remove, or suppress.** `applyGuidanceOrdering` may change the position and label text of existing steps. It may never add a new step, remove an existing step, remove a `read_first` step, or suppress any step that was present in the input guidance. Step count in equals step count out, always.
|
|
426
|
-
|
|
427
|
-
10. **Cross-category isolation is absolute.** A high count in `text_change` never affects `visual_color` labels, and vice versa. Category-specific familiar labels are applied only when the current guidance's `interpretation.category` matches the pattern that exceeded the threshold.
|
|
428
|
-
|
|
429
|
-
---
|
|
430
|
-
|
|
-## Integration boundaries
-
-### Intake + Present seam (HTTP API / CLI)
-
-```
-User input
-→
-runIntakeWithTrace(input, ctx) → this package
-observeIntentPattern + applyInteractionEvent → this package
-presentFromInput(input, intakeResult) → this package
-→ PresentResult (kind: "reminder" | "intake")
-→ consumed by CLI shell (src/shell/cli/main.ts)
-or API server (server.ts → HTTP JSON → React UI at ui/)
 ```
+SYSTEM WILL
 
-
+1. Permanently delete src/utils/deprecated.ts
 
-
+Removes deprecated.ts and all its exports
+Any import of this file will fail immediately after deletion
+No automatic recovery — requires git revert if this was a mistake
 
-
-User input
-→ runIntakeWithUCP()
-↓
-CursorProductSession.submit() → prepareCursorExecution() inside
-→ phase: "prepared" | "conflict"
-→ H reviews display, accepts conflict (if any)
-↓
-CursorProductSession.approve() → approveArtifact() inside
-→ phase: "approved"
-↓
-CursorProductSession.deliver() → deliverCursorExecution() → CursorDeliveryGate inside
-→ phase: "accepted" | "scope_question" | "exec_error" | "blocked"
-↓
-Filesystem (real file mutation, only on "accepted")
-```
-
-Every stage is independently testable. The gate is the enforcement point between H and filesystem. Nothing below the gate is reachable without a persisted `ucp.cursor_handoff.v1`.
+RISK: HIGH
 
-
-
-intake → sole mode authority
-OCD policy → sole constraint authority
-H → sole approval authority
-delivery gate → sole eligibility authority
-session → state carrier only (no authority of its own)
-```
+WHY
+Removes deprecated utilities no longer used.
 
-
+⚠ HIGH RISK — cannot be undone without version control
 
+[a] Approve (high risk)
+[r] Reject
 ```
-User input
-→ runIntakeWithUCP() → intake: sole mode authority
-↓
-buildClaudeHandoffArtifact()
-→ executionDomain derived here (A1), never reclassified downstream
-→ toolPolicy.networkAccess = "deny" (V1 lock, A2)
-→ H reviews, approves → approveClaudeArtifact()
-↓
-ClaudeDeliveryGate.deliver()
-→ checks: eligibility, networkAccess === "deny" (A2), filesystemMode, allowedTools
-→ persists ucp.claude_handoff.v1 BEFORE Claude is called
-↓
-ClaudeAgentPlugin.receive() → only call gate makes into Claude (A4)
-→ returns: accepted | refused_due_to_scope | refused_due_to_execution_error
-→ unknown kinds → refused_due_to_execution_error (fail-closed)
-↓
-gate persists ucp.claude_receipt.v1 or ucp.claude_refused.v1
-```
-
-Four Phase A locked truths enforced in the gate:
-- **A1**: `executionDomain` is mapper-derived; gate validates presence only
-- **A2**: `networkAccess: "allow_limited"` → `blocked_tool_policy` (V1 hard block)
-- **A3**: Session interruption → `refused_due_to_execution_error` (non-resumable in V1)
-- **A4**: No Claude→Intake callback path; `ClaudeAgentPlugin` is one-way only
-
----
-
-## Governance documents
-
-| Document | What it governs |
-|---|---|
-| [`docs/cursor-v1-baseline.md`](docs/cursor-v1-baseline.md) | Cursor seam + product session invariants |
-| [`docs/cvg-baseline.md`](docs/cvg-baseline.md) | Control Visibility & Boundary Integrity invariants |
-| [`docs/claude-agent-phase-a-architecture.md`](docs/claude-agent-phase-a-architecture.md) | Claude Managed Agents Phase A freeze |
-| [`docs/architecture/MODEL_INSERTION_POLICY.md`](docs/architecture/MODEL_INSERTION_POLICY.md) | Where AI models may and may not be used: authority/advisory split, forbidden patterns, CI enforcement |
-
-The Model Insertion Policy is a first-class invariant document alongside CVG and the Cursor baseline. Any change that introduces an AI model at any layer must comply with it before implementation.
-
----
-
-## Non-goals
-
-This project does not:
-- Generate responses (that is the consumer's job)
-- Execute anything
-- Connect to any external service
-- Use an LLM at any stage to make control decisions
-- Correct user input silently
-- Guess missing values
-- Produce fuzzy matches
 
 ---
 
-##
+## What makes it different
 
-
+- No silent execution
+- No vague summaries
+- No hidden changes
 
-
-|---|---|
-| PRV | pass |
-| Safety | allow |
-| Context Alignment | aligned |
-| Disambiguation | unknown (no ambiguity) |
-| Completion | guided_recovery; missing: [file path, current value, new value] |
-| Intent Interpretation | visual_color: "This appears to be a color/style change request." |
-| Response Planner | **guide** |
-| Guidance | missing: [file path, current value, new value] *(unchanged)* |
-| | `Read the component file first to find the current color value: read "<file>"` |
-| | `Then patch it: replace "<current color>" with "<new color>" in "<file>"` |
-| | interpretation: { category: visual_color, confidence: medium } |
-
-**What the user sees (before bridge):** "Start by reading the file, then use: replace \"\" with \"\" in \"\""
-
-**What the user sees (after bridge):** "This appears to be a color/style change request. Read the component file first to find the current color value, then patch it: replace \"<current color>\" with \"<new color>\" in \"<file>\""
-
-The user now knows *why* they're reading first and what kind of change they're making. No file or value was invented.
-
----
-
-### "run rm -rf /" (Alex: destructive command)
-
-| Stage | Result |
-|---|---|
-| PRV | pass |
-| Safety | **block**; detectorId: destructive_mass_action, matchedPattern: "rm -rf" |
-| Response Planner | **refuse** |
-
-**What the user sees:** "This input cannot be processed. It matches a pattern associated with irreversible bulk data destruction."
-
----
-
-### "run again" with no prior session (Priya: missing context)
-
-| Stage | Result |
-|---|---|
-| PRV | **clarify**; 'again' requires prior context, none available |
-| Response Planner | **clarify** |
-
-**What the user sees:** "This request depends on prior context, but none is available. What would you like to run?"
+You see what changes, why it changes, and how risky it is — before anything runs.
 
 ---
 
-
-
-| Stage | Result |
-|---|---|
-| PRV | pass (prior session exists) |
-| Safety | allow |
-| Context Alignment | aligned |
-| Disambiguation | unknown |
-| Completion | **complete** (rerun_previous rule resolves it) |
-| Response Planner | **execute** |
-
-**What the user sees:** The system re-runs the last command without asking again.
-
----
+## Install
 
-
+```bash
+# No install needed — run directly
+npx usesteady
 
-
-
-
-| Safety | allow |
-| Context Alignment | aligned |
-| Disambiguation | **ambiguous**; options: ["loops = circular logic", "loops = iteration patterns"] |
-| Response Planner | **clarify** |
+# Or install globally
+npm install -g usesteady
+```
 
-
+Requires Node.js 18+. Runs fully local. No cloud, no API key.
 
 ---
 
-
+## Core idea
 
-
-|---|---|
-| PRV | pass |
-| Safety | allow |
-| Context Alignment | aligned |
-| Disambiguation | unknown |
-| Completion | **incomplete**; missing: [commit message] |
-| Response Planner | **guide** |
-| Guidance | next step: commit "" |
+**AI proposes. You approve. Then it runs.**
 
-
+Like `git diff` — but for AI actions before they execute.
 
 ---
 
-
+## Language
 
-|
+| Term | Meaning |
 |---|---|
-|
-|
-|
-|
-|
-| Response Planner | **execute** |
-| Interpretation | category: tailwind_color_change, confidence: high |
-| | summary: "Changes background color from 'bg-blue-500' to 'bg-red-500' in Button.tsx." |
-| | impact: ["Visual change only; no logic, data, or behavior impact."] |
-
-**What the user sees:** The change is applied. The result includes a plain-English description: "Changes background color from blue-500 to red-500 in Button.tsx; visual change only."
+| `SYSTEM WILL` | The exact operation that will run — not a summary |
+| `SYSTEM SUGGESTS` | Options shown when AI is unsure — not a guess |
+| `Approve` / `Reject` | Your decision, per step |
+| `Revert last approval` | Undo a decision (not a filesystem change — nothing has run yet) |
+| `Risk: LOW / MEDIUM / HIGH` | Derived from what is actually changing |
 
 ---
 
-##
-
-These are non-negotiable invariants of the system. Any code change that violates one is a breaking change.
-
-1. **Completion is the sole authority on executability.** Disambiguation `unknown` does not block completion from returning `complete`.
-
-2. **Guide mode must carry structured guidance.** No consumer should need to re-run completion to render next steps.
-
-3. **Completion receives context.** Any context-dependent executable intent must be resolvable in completion.
-
-4. **PRV is lexical. Context Alignment is semantic.** No semantic reasoning in PRV.
-
-5. **Public output exposes final outcome, not internal intermediates.** Disambiguation `unknown` is not a public signal.
-
-6. **Interpretation is advisory.** It does not change the mode decision. It describes what the human will experience.
-
-7. **`incomplete` ≠ `guided_recovery`.** These are distinct states with distinct semantics. Do not collapse them.
-
-8. **Interpretation can improve guidance, but it can never manufacture executability.** The Intent Interpretation Bridge runs only for `guided_recovery`. It may improve labels. It may never guess values, file paths, or tokens. It may never change the mode decision.
-
----
-
-## Test summary
-
-```
-2321 tests passing across 53 test files.
-0 failures.
-```
-
-| Area | Files | What is covered |
-|------|-------|-----------------|
-| `tests/ucp/` | 4 | Envelope hash determinism, structure, persistence, timeline, phase 7 alignment |
-| `tests/prv/` | 1 | Lexical markers, narrowed patterns, context boundary |
-| `tests/safety/` | 1 | Blocking, allowing, detectorId + matchedPattern inspectability |
-| `tests/understand/` | 12 | Context alignment, disambiguation, completion, change interpretation, intent bridge (unit + friction + pressure), silent guidance, session-aware guidance ordering |
-| `tests/interaction/` | 2 | Default contract, updater, selectors, observation events |
-| `tests/intake/` | 10 | All five modes, guidance contract, interpretation, branch validation, session validation |
-| `tests/present/` | 4 | Badge/headline/category/confidence, reminder presenter + renderer, coordinator routing, CVG signal mapping + flag assignment + uniqueness + assertion invariants + integration |
-| `tests/cursor/` | 3 | Delivery gate (eligibility, persistence, OCD, glob-matcher), in-process adapter (real FS, round-trip scope proof), execution coordinator (two-phase, consumer helpers) |
-| `tests/product/` | 2 | Session invariants S1–S8, phase gates, terminal state preservation, approve → deliver; session resilience (snapshot, staleness, summary) |
-| `tests/execution/` | 8 | Reminder execution parse-once, validator, artifact-first design, UCP reminder chain |
-| `tests/claude/` | 1 | Claude delivery gate: A1–A4 invariants, three delivery paths, UCP chain integrity, fail-closed unknowns, session interruption (A3) |
-
----
-
-## How to safely extend the system
-
-Three rules. All three apply to every change.
-
-**1. Extend only one layer at a time.**
-Each layer has a defined scope and a defined boundary. An intake change is an intake change. A presentation change is a presentation change. A change that touches two layers simultaneously is a seam violation: it means the boundary was unclear, not that the rule is wrong. Clarify the boundary first, then make the change in the correct layer.
-
-**2. Update baseline → annotation → tests. In that order.**
-Before writing code: update the relevant baseline document (`docs/cursor-v1-baseline.md`, `docs/cvg-baseline.md`, or the written contracts in this file) to state what the new invariant is. Then add the comment-level annotation to the source file at the point where the rule could be violated. Then write the test that proves it. If you cannot write the invariant before the code, the design is not clear enough yet.
-
-**3. Never introduce authority into the Present or Session layers.**
-The Present layer formats state. The Session layer carries state. Neither layer may make a decision that belongs to intake, OCD policy, H, or the delivery gate. The test is simple: if the new code changes what *happens* based on what *was presented*, it belongs to a different layer. If it only changes what *is shown* based on what *already happened*, it belongs here.
+## You see SYSTEM WILL before anything runs.
 
-
-Models may only be inserted as language adapters in advisory layers. They may never participate in routing, classification, approval, scope selection, gate decisions, or UCP persistence. See [`docs/architecture/MODEL_INSERTION_POLICY.md`](docs/architecture/MODEL_INSERTION_POLICY.md) for the complete authority/advisory split, forbidden patterns, implementation constraints, and CI enforcement requirements.
+That is the guarantee. Every step. No exceptions.
 
 ---
 
-
-
-| Item | Status |
-|------|--------|
-| Change Interpretation: structural code changes | Not in v1 scope |
-| Change Interpretation: multi-file changes | Not in v1 scope |
-| Intent Interpretation Bridge: file search / token lookup | Intentionally excluded; bridge never searches |
-| Intent Interpretation Bridge: `incomplete` enrichment | Intentionally excluded; incomplete already has specific guidance |
-| Interaction Contract: persistence | In-memory only; no storage layer yet |
-| Reminder scheduler / persistence | Reminder execution is complete; scheduling is deferred |
-| Timezone resolution | Deferred; time is extracted as text, not resolved |
-| `ucp.cursor_artifact.v1` | Reserved slot between `cursor_receipt` and `execution_trace`; deferred until Cursor produces a real diff object |
-| IPC / remote Cursor transport | In-process adapter proves the contract; remote transport follows same `CursorEditorPlugin` interface |
-| Persistent run store | ✅ Shipped PI-4 Iter 1 — file-backed store (`server-store.ts`) |
-| Real-time frame streaming | ✅ Shipped PI-4 Iter 3 — SSE endpoint (`GET /api/workflow/:runId/events`) |
-| Claude real API adapter | `ClaudeStubAdapter` used in web server; real Anthropic wiring requires `@anthropic-ai/sdk` install |
+[usesteady.dev](https://usesteady.dev) · Apache 2.0