usesteady 0.1.0-alpha.1 → 0.1.0-alpha.2

package/README.md CHANGED
@@ -1,724 +1,133 @@
1
- # usesteady-core-v2
1
+ # UseSteady — Review AI actions before they run
2
2
 
3
- UseSteady V1 controlled refactor engine with step-by-step approval and full audit trail.
4
-
5
- **Core system:** deterministic intake + understanding engine, Cursor and Claude execution runtimes, multi-step workflow coordinator, CLI shell, React web UI, and history/audit layer. No LLM in the control path. No cloud dependency.
6
-
7
- ---
8
-
9
- ## What this does in one sentence
10
-
11
- Given a natural-language input and a session context, the system returns one of five verbs: **refuse, ignore, clarify, guide, or execute**, plus structured data explaining why.
12
-
13
- Natural language never executes directly. Ever.
14
-
15
- ---
16
-
17
- ## Build phases
18
-
19
- | Phase | Name | Status |
20
- |-------|------|--------|
21
- | 1 | Intake — PRV, Safety, Context Alignment, Disambiguation, Completion | ✅ Locked |
22
- | 2–3 | Understand — Intent Interpretation Bridge, interpreter system | ✅ Locked |
23
- | 4 | UCP Core — canonical envelope format, hashing, persistence, provenance chain | ✅ Locked |
24
- | 4C | Silent Guidance — selector expansion (operation + investigation), evidence loop | ✅ Locked |
25
- | 5 | Present Layer — reminder presentation, formatIntakeResult, presentFromInput | ✅ Frozen |
26
- | 5B | Control Visibility (CVG) — authority-signal classifier, assertion layer | ✅ Baseline locked |
27
- | 5C | Boundary Explanation — why-routing surfaced to H | ✅ Baseline locked |
28
- | 5D | Cross-layer Contradiction Visibility | ✅ Baseline locked |
29
- | 6 | Execution Layer — Cursor seam (artifact → gate → adapter) | ✅ Locked |
30
- | 6B | Cursor Product Session — phase state machine, terminal guards, session invariants | ✅ Baseline locked |
31
- | 6C | Session Resilience — interruption handling, partial-session recovery | ✅ Baseline locked |
32
- | 7 | UCP Alignment — cursor envelope types, timeline, provenance chain extension | ✅ Locked |
33
- | 8A | Claude Managed Agents — seam design, authority model, tool policy | ✅ Phase A frozen |
34
- | 8B | Claude Delivery Gate — ClaudeAgentPlugin, handoff persistence, stub adapter | ✅ Baseline locked |
35
- | 8C | Claude API Adapter — real Anthropic client behind ClaudeAgentPlugin interface | ✅ Baseline locked |
36
- | 8D | Claude Product Session — session wiring, parity with Cursor path | ✅ Baseline locked |
37
- | 9A–9F | Product Shell — CLI, workflow coordinator, workflow shell, UCP persistence | ✅ Frozen |
38
- | 10A–10C | History / Audit — two-tier read model, history shell | ✅ Frozen |
39
- | 11B–11C | Product Slice Validation — Safe Refactor Workflow, CLI surface validation | ✅ Frozen |
40
- | 11D | UI Design — reviewing phase, raw input anchor, targetFiles scope | ✅ Frozen |
41
- | 11A-Web | Web UI — React shell on frozen contracts, Express API bridge | ✅ Frozen |
42
- | 11E | Workflow Builder — natural-language entry model | ✅ Frozen |
43
- | 12 | Simulation — persona-based validation across all interaction states | ✅ Passed |
44
-
45
- **Correctly deferred (not yet built):**
46
- reminder scheduler / persistence, timezone resolution,
47
- `cursor_artifact.v1`, `ucp.claude_session.v1`, `ucp.claude_result.v1`,
48
- Session 7 vocabulary validation, `networkAccess: "allow_limited"` semantics,
49
- Claude session resumability, mobile layout.
50
-
51
- ---
52
-
53
- ## Core pipeline
54
-
55
- Every input passes through the same sequence. No step is skipped. No step can be reordered.
3
+ UseSteady shows you exactly what AI will do before it executes anything.
56
4
 
57
5
  ```
58
- input + context
-   │
-   ▼
- PRV: Is this input obviously missing required prior state?
-   │ clarify if yes
-   ▼
- Safety Gate: Is this input unsafe to process at all?
-   │ refuse if yes
-   ▼
- Context Alignment: Is this a social message or a context reference with no context?
-   │ ignore / clarify if yes
-   ▼
- Disambiguation: Does this input contain a term with multiple valid meanings?
-   │ clarify if yes
-   ▼
- Completion: Is this input specific enough to be actionable?
-   │ (receives full context; resolves "run again" etc.)
-   │ guide if incomplete or vague
-   ▼
- Intent Interpretation Bridge: What does the user appear to be trying to do? (guided_recovery only)
-   │ advisory; improves guidance labels, never invents missing facts
-   ▼
- Response Planner: Maps intent state to a final response mode
-   │
-   ▼
- (if execute)
- Change Interpretation: What will the human experience from this change? (advisory)
-   │
-   ▼
- Guidance Ordering: applyGuidanceOrdering (B2; session-aware, advisory only)
-   [if guidance exists] reorders read_first before use_exact_format
-   shortens read_first label when category is familiar (≥3 obs)
-   │
-   ▼
- IntakeResult: { mode, reason, signal, intentState, guidance?, interpretation? }
-   │
-   ▼
- Presentation Layer: formatIntakeResult → PresentationOutput
-   badge, headline, category, confidence, steps, missing
6
+ npx usesteady
98
7
  ```
99
8
 
100
9
  ---
101
10
 
102
- ## Layers
103
-
104
- ### UCP: UseSteady Control Protocol
105
- **What it is:** The canonical message envelope format. Every payload in the system has a version, kind, id, timestamp, and a SHA-256 hash derived from all fields.
106
-
107
- **Why it matters:** Envelopes are tamper-evident. The hash makes the payload verifiable without trusting memory or state.
108
-
109
- **One-liner:** All messages have a stable identity and a content hash.
110
-
111
- ---
112
-
113
- ### PRV: Pre-Response Validation
114
- **What it is:** A fast lexical check that runs before anything else. Looks for high-signal markers (`again`, `continue`, `previous`, `same as`) that mean the input requires prior state.
115
-
116
- **Why it matters:** If you say "run again" and there is no prior run, the system must clarify, not guess. PRV enforces this at the front door.
117
-
118
- **One-liner:** Stops obvious context-dependent requests before they go anywhere.
119
-
120
- **Boundary with Context Alignment:** PRV is lexical (word-level). Context Alignment is semantic (phrase-level). "same file", "that result", "my project" are handled by Context Alignment, not PRV.
121
-
122
- ---
123
-
124
- ### Safety Gate
125
- **What it is:** A deterministic, registry-driven chain of detectors. Runs after PRV, before anything else. Each detector has an `id`, a `priority`, and labeled patterns that produce a `matchedPattern` and `detectorId` on block.
126
-
127
- **Why it matters:** Unsafe inputs (destructive actions, credential access, rule bypass, arbitrary script execution) are refused here. No unsafe input reaches interpretation or execution.
128
-
129
- **One-liner:** Blocks dangerous inputs early and tells you exactly why.
130
-
131
- **Inspectable fields on a block result:**
132
- - `detectorId`: which detector fired
- - `matchedPattern`: which specific phrase triggered it
- - `reason`: the risk category enum
- - `note`: plain-English explanation
136
-
137
- ---
138
-
139
- ### Context Alignment
140
- **What it is:** Classifies the input into one of three states: `aligned` (normal task), `non_literal` (social/greeting), or `hard_mismatch` (references prior context that isn't available).
141
-
142
- **Why it matters:** "Good morning" and "use same file" should never reach the planner. Context Alignment handles both correctly.
143
-
144
- **One-liner:** Identifies social messages and unresolvable context references.
145
-
146
- ---
147
-
148
- ### Disambiguation
149
- **What it is:** A registry of detectors that flag inputs with multiple valid interpretations. When a term is ambiguous, it returns bounded options, never a silent rewrite.
150
-
151
- **Why it matters:** "change button" could mean text, color, behavior, or handler. The system must ask, not guess.
152
-
153
- **One-liner:** Detects terms with multiple meanings and surfaces the options.
154
-
155
- **Note:** `unknown` from disambiguation is an internal state: it means "no ambiguity detected, proceed to completion." It is not returned as a public signal.
156
-
157
- ---
158
-
159
- ### Completion
160
- **What it is:** The authority on executability. Determines whether the input is actionable (`complete`), missing a specific required field (`incomplete`), or too vague to act on without format guidance (`guided_recovery`).
161
-
162
- **Why it matters:** Completion receives full session context, so "run again" resolves to `complete` when a prior run exists. Disambiguation returning `unknown` does not block completion from returning `complete`.
163
-
164
- **One-liner:** Decides whether the input is ready to execute and, if not, explains exactly what's needed.
165
-
166
- **Semantic contract (must not be collapsed):**
167
- - `incomplete` = intent is clear, a required field is missing (e.g. "commit my changes" → needs a message)
- - `guided_recovery` = intent is vague, a safe format path exists (e.g. "make button blue" → needs file + values)
169
-
170
- **Written contract:** Completion is the sole authority on executability. Disambiguation `unknown` does not block it. Context-dependent executable inputs are resolved here.
171
-
172
- ---
173
-
174
- ### Change Interpretation (v1)
175
- **What it is:** An advisory layer that runs only when mode is `execute`. Describes what the human will experience from the change, in plain English, with impact notes and confidence.
176
-
177
- **Why it matters:** The system can now say "this changes the background color from blue-500 to red-500 in Button.tsx; visual change only, no logic impact" rather than just "execute." That is meaningful information for the user and for any downstream reviewer.
178
-
179
- **One-liner:** Tells you what the change means, not just that it's allowed.
180
-
181
- **v1 scope (deterministic, no inference):**
182
- - Tailwind color class changes (`bg-`, `text-`, `border-`, etc.), confidence: high
- - CSS color value changes (hex, rgb, hsl, named colors, CSS property declarations), confidence: high/medium
- - Config value changes (numbers, URLs, booleans in .json/.yaml/.env files), confidence: high/medium
- - Text literal changes (human-readable UI copy), confidence: medium
186
-
187
- **Returns `null`** for any input outside these three categories. Does not guess.
188
-
189
- ---
190
-
191
- ### Intent Interpretation Bridge
192
- **What it is:** An advisory layer that runs only when `CompletionResult.kind === "guided_recovery"`. It classifies the user's broad intent (visual color change, text/copy change, configuration change) and uses that classification to rewrite guidance labels into human-readable, context-aware form.
193
-
194
- **Why it matters:** Before this layer, guided recovery returned generic format instructions. After it, the same instructions are prefaced with "Read the component file first to find the current color value" ? which tells the user *why* they're being asked to read before patching. The meaning is still safe; nothing is invented.
195
-
196
- **One-liner:** Tells you what the user appears to be trying to do, so guided recovery reads like help rather than syntax documentation.
197
-
198
- **Hard invariant (locked):** Interpretation can improve guidance, but it can never manufacture executability.
199
-
200
- **What it classifies:**
201
- - `visual_color`: named colors, comparative terms (darker/lighter), color/style targets (button, background, header)
- - `text_change`: text-change verb AND text-content target (heading, label, copy, title, placeholder)
- - `config_change`: strong config verbs (toggle/enable/disable) or config nouns (port, timeout, flag, env)
204
-
205
- **What it never does:**
206
- - Guess the file path
207
- - Guess the current or new value
208
- - Guess any token, class name, or key
209
- - Change the mode from `guide` to `execute`
210
-
211
- **Returns `null`** (no enrichment) if the input does not match any safe category. Original guidance is returned unchanged.
212
-
213
- **Priority ordering in the registry:**
214
- 1. Config (priority 10): must beat color for "toggle dark mode"
- 2. Color (priority 20): named colors + UI target terms
- 3. Text (priority 30): requires both verb AND target (tighter match)
217
-
218
- ---
219
-
220
- ### Presentation Layer
221
- **What it is:** A pure translation layer that converts an `IntakeResult` into a `PresentationOutput`, the last-mile data structure a consumer can render directly, without re-running or re-interpreting anything.
222
-
223
- **Why it matters:** Without this layer, every consumer needs to know how to branch on `mode`, which interpretation family to use, where to find confidence, and how to format step labels. The presentation layer handles that once, correctly, and deterministically.
11
+ ## Why
224
12
 
225
- **One-liner:** Takes a decided `IntakeResult` and formats it for display.
13
+ AI tools can generate code, run commands, and modify files, often before you fully understand what will change.
226
14
 
227
- **`PresentationOutput` fields:**
228
- - `badge`: mode indicator: `REFUSE` | `IGNORE` | `CLARIFY` | `GUIDE` | `EXECUTE`
- - `headline`: the first sentence to show; sourced from interpretation summary (when available) or system reason
- - `category`: human-readable display name for the interpretation category (when available)
- - `confidence`: `"High confidence"` / `"Medium confidence"` / `"Low confidence"` (when available)
- - `steps`: formatted step labels from `guidance.nextSteps` (guide mode only)
- - `missing`: list of missing fields from `guidance.missing` (guide mode only)
15
+ UseSteady adds a review layer:
234
16
 
235
- **What it never does:**
236
- - Re-runs intake, completion, or interpretation
237
- - Adds new decisions or modifies existing ones
238
- - Invents values, categories, or steps not already in `IntakeResult`
17
+ - **SYSTEM WILL**: the exact change, not a summary
18
+ - **Risk level** — LOW / MEDIUM / HIGH, derived from what is actually changing
19
+ - **WHY explanation**: what this does and why it was triggered
20
+ - **Approve / Reject** per step, before anything runs
239
21
 
240
22
  ---
241
23
 
242
- ### Silent Guidance
243
- **What it is:** A presentation-only selector layer that maps inputs to operation or investigation mode when interpretation does not apply. It runs inside the Present Layer, never inside intake.
244
-
245
- **Why it matters:** 86% "unknown" at the presentation level is honest, not a failure. But when a pattern recurs across sessions (C1/C3 evidence), it can be given a structured guidance voice without involving the interpreter. Silent Guidance covers ops/BI vocabulary (scale, export, deploy, inspect, trace) that belongs to presentation, not execution.
246
-
247
- **Hard invariants:** no patch leakage, deterministic output, investigation > operation precedence when both match, no template may contain `replace ? with ?`.
248
-
249
- ---
24
+ ## Example
250
25
 
251
- ### Control Visibility & Boundary Integrity (CVG, Phase 5B)
252
- **What it is:** A presentation-layer safety contract. Not a UI concern. A programmatic classifier that maps any presentation state to a `ControlVisibilityResult`: a structured record of the signal's authority tier (`blocking` / `attention` / `normal`) and three derived boolean flags.
253
-
254
- **Why it matters:** This is the formal guard against the Helix-style failure mode where critical authority signals blend into normal presentation. With CVG in place, a `conflict_detected` state cannot be visually or programmatically equivalent to a `ready_to_confirm` state.
255
-
256
- **One-liner:** Makes higher-authority signals structurally distinct from lower-authority signals, and throws immediately if that contract is violated.
257
-
258
- **Locked rules (C1–C4):**
- - **C1**: CVG derives from existing presentation state only. No re-evaluation. No inference.
- - **C2**: CVG may assert, but never route. It never changes the caller's output.
- - **C3**: CVG may detect violations, but never repair them silently. Bad results throw.
- - **C4**: Any new presentation family must extend `ControlVisibilityInput`, add an exhaustive mapping, and pass uniqueness + invariant tests before shipping.
263
-
264
- **Level → flag assignment:**
265
-
266
- | Level | `must_block_confirm` | `must_surface` | `must_differentiate` |
267
- |-------|---------------------|----------------|----------------------|
268
- | `blocking` | `true` | `true` | `true` |
269
- | `attention` | `false` | `true` | `true` |
270
- | `normal` | `false` | `false` | `true` |
271
-
272
- **Hooks:** `presentFromInput()` and `renderReminder()` run CVG eval + assert on every call. Output shapes are unchanged.
273
-
274
- ---
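The level-to-flag table above can be read as an exhaustive lookup plus an assertion that throws on mismatch. The following is a minimal TypeScript sketch under stated assumptions: `AuthorityLevel`, `ControlFlags`, `FLAGS_BY_LEVEL`, `flagsFor`, and `assertFlags` are hypothetical names chosen for illustration and are not taken from the package; only the flag names and their values come from the table.

```ts
// Hypothetical names throughout; only the flag values mirror the table above.
type AuthorityLevel = "blocking" | "attention" | "normal";

interface ControlFlags {
  must_block_confirm: boolean;
  must_surface: boolean;
  must_differentiate: boolean;
}

// Record<AuthorityLevel, ...> fails to compile if a level is left unmapped.
const FLAGS_BY_LEVEL: Record<AuthorityLevel, ControlFlags> = {
  blocking: { must_block_confirm: true, must_surface: true, must_differentiate: true },
  attention: { must_block_confirm: false, must_surface: true, must_differentiate: true },
  normal: { must_block_confirm: false, must_surface: false, must_differentiate: true },
};

// Derive flags from the level only, with no re-evaluation (in the spirit of C1).
function flagsFor(level: AuthorityLevel): ControlFlags {
  return { ...FLAGS_BY_LEVEL[level] };
}

// Detect a mismatch and throw instead of repairing it (in the spirit of C3).
function assertFlags(level: AuthorityLevel, flags: ControlFlags): void {
  const expected = FLAGS_BY_LEVEL[level];
  for (const key of Object.keys(expected) as (keyof ControlFlags)[]) {
    if (flags[key] !== expected[key]) {
      throw new Error(`CVG violation: ${key} does not match level "${level}"`);
    }
  }
}
```

Keeping the mapping in one typed table means a new level cannot be added without also deciding its three flags, which is roughly what rule C4 asks for.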
275
-
276
- ### Cursor Execution Seam (Phase 6)
277
- **What it is:** The governed handoff path from a prepared edit artifact to real filesystem mutation. Consists of four components that form a strict one-way delivery pipe:
278
-
279
- ```
280
- CursorHandoffArtifact
281
-   → CursorDeliveryGate (eligibility check + UCP persistence)
-   → CursorEditorPlugin (transport interface)
-   → CursorInProcessAdapter (exact-match filesystem edit)
284
26
  ```
27
+ $ npx usesteady
285
28
 
286
- **Why it matters:** Natural language never directly edits files. Every edit goes through: intake (mode authority), OCD policy (constraint authority), H approval (sole approval authority), delivery gate (eligibility enforcement), and then, only then, the adapter. No layer can bypass another.
29
+ UseSteady
287
30
 
288
- **Key invariants (I1–I10):**
- - `prepareCursorExecution()` is side-effect-free: nothing touches disk before H approval
- - `ucp.cursor_handoff.v1` persists before send, or delivery is blocked
- - Raw input never crosses the seam: Cursor cannot reinterpret what it never receives
- - Exactly one string match required: `ambiguous_match` is a first-class refusal
- - Scope refusal is recoverable only through H narrowing, never automatic file choice
- - OCD constrains only; it never proposes `allowedFiles`
31
+ AI can propose changes.
32
+ You approve before they run.
295
33
 
296
- ---
34
+ Type your request:
35
+ > Update button color
297
36
 
298
- ### Cursor Product Session (Phase 6B)
299
- **What it is:** A flat immutable state machine that connects the execution engine to human interaction flows. One session = one edit attempt. Consumers receive a `CursorSessionState` after each transition and route based on `phase`.
37
+ Generating execution plan...
300
38
 
301
- **Phases:** `idle` → `prepared` | `conflict` → `approved` → `accepted` | `scope_question` | `exec_error` | `blocked` | `rejected` | `not_execute` | `intake_failed`
39
+ SYSTEM WILL
302
40
 
303
- **Why it matters:** The session makes the governed edit workflow operable from any shell (CLI, UI, plugin) without each shell needing to re-implement the authority logic. The session holds state; authority stays upstream.
41
+ 1. Modify src/components/Button.tsx
42
+ Replace bg-blue-500 → bg-indigo-600
304
43
 
305
- **Baseline truths (P1–P6):**
- - **P1**: Terminal states are immutable. All transitions return the same state object unchanged after termination.
- - **P2**: Scope clarification is candidate-bounded. `answerScope()` rejects any file not in `scopeQuestion.candidates`.
- - **P3**: Approval is never execution. `approve()` → `"approved"`. `deliver()` is a separate explicit act.
- - **P4**: Delivery requires approved. `deliver()` from any other phase → `"blocked"`.
- - **P5**: Session carries state but has no independent authority.
- - **P6**: All authority lives in intake, OCD/policy, H, and the delivery gate. The session calls them; it does not replace them.
44
+ RISK: LOW
312
45
 
313
- ---
314
-
315
- ### Interaction Contract
316
- **What it is:** An immutable record of how this user prefers to receive responses: ambiguity tolerance, explanation style, guidance format. Updated by pure functions from observed events.
317
-
318
- **Why it matters:** The system can adapt to a user who consistently corrects misinterpretations, without requiring a profile UI or persistent identity.
46
+ WHY
47
+ Updates shared button styling.
319
48
 
320
- **One-liner:** Tracks how the user communicates, without storing personal data.
49
+ [a] Approve [r] Reject
321
50
 
322
- **`observedIntentPatterns` (Phase B1, advisory session signal):**
323
- A counter field on the contract that records how many times each intent category was seen in guided recovery flows where the Intent Interpretation Bridge actually fired.
51
+ > a
324
52
 
325
- ```ts
326
- type ObservedIntentPatterns = {
327
- visual_color: number; // guide + visual_color classification + bridgeFired
328
- text_change: number; // guide + text_change classification + bridgeFired
329
- config_change: number; // guide + config_change classification + bridgeFired
330
- };
331
- ```
332
-
333
- **How to observe:** Call `observeIntentPattern(result, trace)` after a pipeline run. It returns an `InteractionEvent | null`. If non-null, pass it to `applyInteractionEvent(contract, event)` to get an updated contract.
53
+ ✓ Step approved — Button.tsx updated.
334
54
 
335
- ```ts
336
- const { result, trace } = runIntakeWithTrace(input, ctx);
337
- const event = observeIntentPattern(result, trace);
338
- if (event !== null) {
339
- contract = applyInteractionEvent(contract, event);
340
- }
55
+ All steps reviewed.
56
+ Return to your workflow to continue.
341
57
  ```
342
58
 
343
- **Hard invariants (must not be violated):**
344
- 1. `observeIntentPattern` returns null unless **all four** conditions are met:
345
- - `result.mode === "guide"`
346
- - `result.guidance?.interpretation` is defined
347
- - `trace.bridgeFired === true`
348
- - interpretation category is one of `visual_color`, `text_change`, `config_change`
349
- 2. `observedIntentPatterns` is **advisory only**. It influences guidance ordering and label emphasis in B2. It may **never** affect: PRV, Safety Gate, Context Alignment, Disambiguation, Completion, response mode, intentState, or interpretation category/confidence.
350
- 3. No observation event fires on `execute`, `refuse`, `clarify`, or `ignore` flows.
351
- 4. No observation event fires when `bridgeFired: false` or when no trace is provided.
352
-
353
- **Why `bridgeFired` is required (not just interpretation presence):** The trace is explicit confirmation that the bridge ran and completed. Requiring it prevents the observation from being triggered by any future path that might attach interpretation from a different source.
354
-
355
- ---
356
-
357
- ### Session-Aware Guidance Ordering (Phase B2)
358
- **What it is:** A purely presentational step that runs after `enrichGuidance` and applies two adjustments to the `GuidancePayload` based on `observedIntentPatterns`:
359
-
360
- 1. **Step type ordering**: `read_first` is guaranteed before `use_exact_format` (defensive sort; already true for enriched paths, but now an explicit invariant).
361
-
362
- 2. **Label emphasis**: when `patterns[category] >= FAMILIARITY_THRESHOLD` (3 observations), the `read_first` step label shortens from its explanatory form to an action-first form:
59
+ ### High risk example
363
60
 
364
- | Category | Default label (0 observations) | Familiar label (≥3 observations) |
365
- |---|---|---|
366
- | `visual_color` | `Read the component file first to find the current color value: read "<file>"` | `Find the current color in the component file: read "<file>"` |
367
- | `text_change` | `Read the file first to find the current text: read "<file>"` | `Find the current text in the file: read "<file>"` |
368
- | `config_change` | `Read the config file first to find the current setting: read "<file>"` | `Find the current setting in the config file: read "<file>"` |
369
-
370
- **The "first" word is the learner cue.** New users need it. Experienced users don't.
371
-
372
- **Hard invariants (must not be violated):**
373
- 1. Step count never changes: no step is added or removed.
374
- 2. `missing[]` never changes.
375
- 3. `interpretation` (category, confidence, summary, basis) never changes.
376
- 4. `mode`, `reason`, `signal`, `intentState` are decided before this function runs ? it cannot affect them.
377
- 5. A high count for one category never touches another category's labels.
378
- 6. `read_first` is never suppressed, only relabeled.
379
- 7. If no `read_first` step exists in the guidance, the function has no effect.
380
- 8. Tie-breaking: stable sort; same-type steps preserve their original relative order.
381
-
382
- **`FAMILIARITY_THRESHOLD = 3`**: enough observations to distinguish deliberate repeated use from accidental. Below threshold, nothing changes.
383
-
384
- ---
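The two adjustments described in this section (a stable reorder and a threshold-based relabel) can be sketched as follows. This is an illustration under assumptions, not the package's `applyGuidanceOrdering`: the `Step` shape, `orderSteps`, `readFirstLabel`, and `READ_FIRST_LABELS` are hypothetical names, and the sketch models only the two step types named above. The label text is copied from the table.

```ts
// Hypothetical sketch: names and shapes are assumptions, not the package's API.
type StepType = "read_first" | "use_exact_format";
type Category = "visual_color" | "text_change" | "config_change";

interface Step {
  type: StepType;
  label: string;
}

const FAMILIARITY_THRESHOLD = 3;

// Label prefixes copied from the table above; the file name is appended later.
const READ_FIRST_LABELS: Record<Category, { initial: string; familiar: string }> = {
  visual_color: {
    initial: "Read the component file first to find the current color value",
    familiar: "Find the current color in the component file",
  },
  text_change: {
    initial: "Read the file first to find the current text",
    familiar: "Find the current text in the file",
  },
  config_change: {
    initial: "Read the config file first to find the current setting",
    familiar: "Find the current setting in the config file",
  },
};

const RANK: Record<StepType, number> = { read_first: 0, use_exact_format: 1 };

// Reorder only: the copy keeps the input untouched and the step count unchanged.
// Array.prototype.sort is stable (ES2019+), so same-type steps keep their order.
function orderSteps(steps: readonly Step[]): Step[] {
  return [...steps].sort((a, b) => RANK[a.type] - RANK[b.type]);
}

// Relabel only: the count for this category alone decides which form is used.
function readFirstLabel(category: Category, observations: number, file: string): string {
  const labels = READ_FIRST_LABELS[category];
  const prefix = observations >= FAMILIARITY_THRESHOLD ? labels.familiar : labels.initial;
  return `${prefix}: read "${file}"`;
}
```

Because `readFirstLabel` receives a single category's count, a high count in one category cannot change another category's label, which matches the isolation invariant.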
385
-
386
- ### Response Planner
387
- **What it is:** A deterministic mapping from intent state to response mode. No logic ? just a lookup table with documented reasoning.
388
-
389
- **Why it matters:** The decision of what to do is explicit, auditable, and separate from the understanding that produced it.
390
-
391
- **Mapping:**
392
- | Intent state | Response mode |
393
- |---|---|
394
- | unsafe | refuse |
395
- | non_literal | ignore |
396
- | ambiguous | clarify |
397
- | incomplete | guide |
398
- | guided_recovery | guide |
399
- | clear | execute |
400
-
401
- ---
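Since the planner is described as a lookup table with no logic, the mapping above can be sketched as a typed record. This is a minimal sketch under assumptions: `IntentState`, `ResponseMode`, `RESPONSE_PLAN`, and `planResponse` are hypothetical names for illustration; only the six state-to-mode pairs come from the table.

```ts
// Hypothetical names; the six pairs are taken directly from the mapping table.
type IntentState =
  | "unsafe"
  | "non_literal"
  | "ambiguous"
  | "incomplete"
  | "guided_recovery"
  | "clear";

type ResponseMode = "refuse" | "ignore" | "clarify" | "guide" | "execute";

// Record<IntentState, ...> makes the compiler reject a missing or misspelled state.
const RESPONSE_PLAN: Record<IntentState, ResponseMode> = {
  unsafe: "refuse",
  non_literal: "ignore",
  ambiguous: "clarify",
  incomplete: "guide",
  guided_recovery: "guide",
  clear: "execute",
};

// Pure lookup: same state in, same mode out.
function planResponse(state: IntentState): ResponseMode {
  return RESPONSE_PLAN[state];
}
```

Note that only `clear` maps to `execute`, and both `incomplete` and `guided_recovery` map to `guide`; the two states stay distinct upstream even though they share a mode.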
402
-
403
- ## Locked behavior invariants
404
-
405
- These were confirmed by a persona friction pass against the live system. They are product invariants, not guidelines. Future contributors must not violate them.
406
-
407
- Any code change that violates an invariant below is a breaking change, the same as the written contracts above.
408
-
409
- 1. **Intent Interpretation is narrower than user language.** The bridge only classifies what it can safely claim. If no color word, text-change verb+target, or config signal is present, it returns null and stays silent. Helpfulness without evidence is overclaiming.
410
-
411
- 2. **Null is a valid success state.** A bridge result of null means "no safe interpretation exists." It is not an error. It is not a fallback. It is the correct answer when the signal is absent. Do not replace null with a guess.
412
-
413
- 3. **Vague intent never yields high confidence.** High confidence is reserved for inputs where the evidence is unambiguous and specific. No vague natural-language request (without exact values) may produce `confidence: "high"`. Medium or low are the correct ceiling.
414
-
415
- 4. **Incomplete intent does not use the bridge.** When `CompletionResult.kind === "incomplete"`, the intent is already unambiguous: a required field is simply missing. Running intent interpretation on these inputs would add noise, not meaning. The bridge skips them.
416
-
417
- 5. **Bridge output may improve labels, not truth.** The bridge may rewrite step labels to be context-aware. It may never invent file paths, token names, current values, or new values. The `missing[]` array must remain identical before and after enrichment.
418
-
419
- 6. **Executable interpretation and guided interpretation are separate families.** `InterpretationResult` (change interpretation, at `IntakeResult.interpretation`) describes what a structured patch command means. `IntentInterpretation` (intent bridge, at `IntakeResult.guidance.interpretation`) describes what a vague request appears to be attempting. These must never be merged or confused. One is for execute mode. One is for guide mode.
420
-
421
- 7. **Observation events fire only when the bridge actually ran.** `observeIntentPattern` requires `trace.bridgeFired === true` as an explicit precondition. Without a trace, no observation fires. This ensures the session observation only tracks meaningful guided interpretations, not generic noise, incomplete paths, or short-circuited flows.
422
-
423
- 8. **Observed patterns are advisory only; they never affect mode or truth.** `observedIntentPatterns` counts accumulate silently. They influence guidance ordering and label emphasis (B2). They must never influence `mode`, `intentState`, `safetyVerdict`, `completionKind`, or `interpretation category/confidence`. Same input + same context → same mode, always, regardless of prior observation counts.
424
-
425
- 9. **Guidance ordering may only reorder and relabel; it may never add, remove, or suppress.** `applyGuidanceOrdering` may change the position and label text of existing steps. It may never add a new step, remove an existing step, remove a `read_first` step, or suppress any step that was present in the input guidance. Step count in equals step count out, always.
426
-
427
- 10. **Cross-category isolation is absolute.** A high count in `text_change` never affects `visual_color` labels, and vice versa. Category-specific familiar labels are applied only when the current guidance's `interpretation.category` matches the pattern that exceeded the threshold.
428
-
429
- ---
430
-
431
- ## Integration boundaries
432
-
433
- ### Intake + Present seam (HTTP API / CLI)
434
-
435
- ```
436
- User input
437
-
438
- runIntakeWithTrace(input, ctx) → this package
439
- observeIntentPattern + applyInteractionEvent → this package
440
- presentFromInput(input, intakeResult) → this package
441
- → PresentResult (kind: "reminder" | "intake")
442
- → consumed by CLI shell (src/shell/cli/main.ts)
443
- or API server (server.ts → HTTP JSON → React UI at ui/)
444
61
  ```
62
+ SYSTEM WILL
445
63
 
446
- CVG assertions run inside `presentFromInput()` and `renderReminder()` — zero-cost for consumers, throws on invalid states.
64
+ 1. Permanently delete src/utils/deprecated.ts
447
65
 
448
- ### Cursor execution seam (governed edit path)
66
+ Removes deprecated.ts and all its exports
67
+ Any import of this file will fail immediately after deletion
68
+ No automatic recovery — requires git revert if this was a mistake
449
69
 
450
- ```
451
- User input
452
-   │ runIntakeWithUCP()
-   ▼
- CursorProductSession.submit() → prepareCursorExecution() inside
-   │ phase: "prepared" | "conflict"
-   │ H reviews display, accepts conflict (if any)
-   ▼
- CursorProductSession.approve() → approveArtifact() inside
-   │ phase: "approved"
-   ▼
- CursorProductSession.deliver() → deliverCursorExecution() → CursorDeliveryGate inside
-   │ phase: "accepted" | "scope_question" | "exec_error" | "blocked"
-   ▼
- Filesystem (real file mutation, only on "accepted")
465
- ```
466
-
467
- Every stage is independently testable. The gate is the enforcement point between H and filesystem. Nothing below the gate is reachable without a persisted `ucp.cursor_handoff.v1`.
70
+ RISK: HIGH
468
71
 
469
- Authority distribution:
470
- ```
471
- intake: sole mode authority
- OCD policy: sole constraint authority
- H: sole approval authority
- delivery gate: sole eligibility authority
- session: state carrier only (no authority of its own)
476
- ```
72
+ WHY
73
+ Removes deprecated utilities no longer used.
477
74
 
478
- ### Claude Managed Agents seam (Phase 8B, scaffold locked)
75
+ HIGH RISK: cannot be undone without version control
479
76
 
77
+ [a] Approve (high risk)
78
+ [r] Reject
480
79
  ```
481
- User input
482
-   │ runIntakeWithUCP() (intake: sole mode authority)
-   ▼
- buildClaudeHandoffArtifact()
-   │ executionDomain derived here (A1), never reclassified downstream
-   │ toolPolicy.networkAccess = "deny" (V1 lock, A2)
-   │ H reviews, approves → approveClaudeArtifact()
-   ▼
- ClaudeDeliveryGate.deliver()
-   │ checks: eligibility, networkAccess === "deny" (A2), filesystemMode, allowedTools
-   │ persists ucp.claude_handoff.v1 BEFORE Claude is called
-   ▼
- ClaudeAgentPlugin.receive() (only call gate makes into Claude, A4)
-   │ returns: accepted | refused_due_to_scope | refused_due_to_execution_error
-   │ unknown kinds → refused_due_to_execution_error (fail-closed)
-   ▼
497
- gate persists ucp.claude_receipt.v1 or ucp.claude_refused.v1
498
- ```
499
-
500
- Four Phase A locked truths enforced in the gate:
501
- - **A1**: `executionDomain` is mapper-derived; gate validates presence only
501
- - **A2**: `networkAccess: "allow_limited"` → `blocked_tool_policy` (V1 hard block)
502
- - **A3**: Session interruption → `refused_due_to_execution_error` (non-resumable in V1)
503
- - **A4**: No Claude→Intake callback path; `ClaudeAgentPlugin` is one-way only
505
-
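The four gate rules in the removed section can be sketched roughly as follows. This is a minimal illustration, not the package's code: the `HandoffArtifact` shape, the `Plugin` signature, and the `blocked_missing_domain` outcome are hypothetical names introduced here, and UCP persistence is left out entirely. Only `blocked_tool_policy` and the three plugin result kinds come from the README text.

```typescript
// Hypothetical shapes: field names follow the README wording, but the real
// artifact and plugin types are not shown in this diff and may differ.
type NetworkAccess = "deny" | "allow_limited";

interface HandoffArtifact {
  executionDomain?: string; // A1: derived upstream by the mapper
  toolPolicy: { networkAccess: NetworkAccess };
}

type PluginResult =
  | "accepted"
  | "refused_due_to_scope"
  | "refused_due_to_execution_error";

// "blocked_missing_domain" is an assumed name for the A1 failure case.
type GateOutcome = PluginResult | "blocked_missing_domain" | "blocked_tool_policy";

// A4: the plugin is one-way. It receives an artifact and returns a result kind.
type Plugin = (artifact: HandoffArtifact) => string;

const KNOWN_KINDS: readonly string[] = [
  "accepted",
  "refused_due_to_scope",
  "refused_due_to_execution_error",
];

function deliver(artifact: HandoffArtifact, plugin: Plugin): GateOutcome {
  // A1: validate presence only; the gate never reclassifies the domain.
  if (!artifact.executionDomain) return "blocked_missing_domain";

  // A2: anything other than "deny" is a hard block.
  if (artifact.toolPolicy.networkAccess !== "deny") return "blocked_tool_policy";

  let kind: string;
  try {
    kind = plugin(artifact);
  } catch {
    // A3: an interrupted call is treated as a non-resumable refusal.
    return "refused_due_to_execution_error";
  }

  // Fail-closed: unknown result kinds map to an execution error.
  return KNOWN_KINDS.indexOf(kind) !== -1
    ? (kind as PluginResult)
    : "refused_due_to_execution_error";
}
```

The point of the sketch is ordering: every check that can block runs before the plugin is called, and every unexpected plugin behavior collapses into a refusal rather than an acceptance.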
506
- ---
507
-
508
- ## Governance documents
509
-
510
- | Document | What it governs |
511
- |---|---|
512
- | [`docs/cursor-v1-baseline.md`](docs/cursor-v1-baseline.md) | Cursor seam + product session invariants |
513
- | [`docs/cvg-baseline.md`](docs/cvg-baseline.md) | Control Visibility & Boundary Integrity invariants |
514
- | [`docs/claude-agent-phase-a-architecture.md`](docs/claude-agent-phase-a-architecture.md) | Claude Managed Agents Phase A freeze |
515
- | [`docs/architecture/MODEL_INSERTION_POLICY.md`](docs/architecture/MODEL_INSERTION_POLICY.md) | Where AI models may and may not be used: authority/advisory split, forbidden patterns, CI enforcement |
516
-
517
- The Model Insertion Policy is a first-class invariant document alongside CVG and the Cursor baseline. Any change that introduces an AI model at any layer must comply with it before implementation.
518
-
519
- ---
520
-
521
- ## Non-goals
522
-
523
- This project does not:
524
- - Generate responses (that is the consumer's job)
525
- - Execute anything
526
- - Connect to any external service
527
- - Use an LLM at any stage to make control decisions
528
- - Correct user input silently
529
- - Guess missing values
530
- - Produce fuzzy matches
531
80
 
532
81
  ---
533
82
 
534
- ## Example flows
83
+ ## What makes it different
535
84
 
536
- ### "make button blue" (Maya: vague style change, with bridge)
85
+ - No silent execution
86
+ - No vague summaries
87
+ - No hidden changes
537
88
 
538
- | Stage | Result |
539
- |---|---|
540
- | PRV | pass |
541
- | Safety | allow |
542
- | Context Alignment | aligned |
543
- | Disambiguation | unknown (no ambiguity) |
544
- | Completion | guided_recovery → missing: [file path, current value, new value] |
545
- | Intent Interpretation | visual_color → "This appears to be a color/style change request." |
546
- | Response Planner | **guide** |
547
- | Guidance | missing: [file path, current value, new value] *(unchanged)* |
548
- | | `Read the component file first to find the current color value: read "<file>"` |
549
- | | `Then patch it: replace "<current color>" with "<new color>" in "<file>"` |
550
- | | interpretation: { category: visual_color, confidence: medium } |
551
-
552
- **What the user sees (before bridge):** "Start by reading the file, then use: replace \"\" with \"\" in \"\""
553
-
554
- **What the user sees (after bridge):** "This appears to be a color/style change request. Read the component file first to find the current color value, then patch it: replace \"<current color>\" with \"<new color>\" in \"<file>\""
555
-
556
- The user now knows *why* they're reading first and what kind of change they're making. No file or value was invented.
557
-
558
- ---
559
-
560
- ### "run rm -rf /" (Alex: destructive command)
561
-
562
- | Stage | Result |
563
- |---|---|
564
- | PRV | pass |
565
- | Safety | **block** → detectorId: destructive_mass_action, matchedPattern: "rm -rf" |
566
- | Response Planner | **refuse** |
567
-
568
- **What the user sees:** "This input cannot be processed. It matches a pattern associated with irreversible bulk data destruction."
569
-
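A deterministic safety check of the kind this flow describes might look like the sketch below. It is an assumption-laden illustration: the `SafetyResult` shape, the `DETECTORS` table, and `checkSafety` are hypothetical names, and only the `destructive_mass_action` id and the `rm -rf` pattern are taken from the table above. The matching is plain substring lookup, in keeping with the stated non-goals of no LLM in the control path and no fuzzy matches.

```typescript
// Hypothetical result shape; the real safety layer's types are not shown here.
interface SafetyResult {
  decision: "allow" | "block";
  detectorId?: string;
  matchedPattern?: string;
}

// Assumed detector table. Only this one id/pattern pair appears in the README.
const DETECTORS: ReadonlyArray<{ id: string; patterns: readonly string[] }> = [
  { id: "destructive_mass_action", patterns: ["rm -rf"] },
];

function checkSafety(input: string): SafetyResult {
  const text = input.toLowerCase();
  for (const detector of DETECTORS) {
    for (const pattern of detector.patterns) {
      // Literal substring match: no scoring, no model, no guessing.
      if (text.indexOf(pattern) !== -1) {
        return { decision: "block", detectorId: detector.id, matchedPattern: pattern };
      }
    }
  }
  return { decision: "allow" };
}
```

Returning the detector id and the matched pattern alongside the decision is what makes a block inspectable: a consumer can show exactly which rule fired without re-running the check.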
570
- ---
571
-
572
- ### "run again" with no prior session (Priya: missing context)
573
-
574
- | Stage | Result |
575
- |---|---|
576
- | PRV | **clarify** → 'again' requires prior context, none available |
577
- | Response Planner | **clarify** |
578
-
579
- **What the user sees:** "This request depends on prior context, but none is available. What would you like to run?"
89
+ You see what changes, why it changes, and how risky it is — before anything runs.
580
90
 
581
91
  ---
582
92
 
583
- ### "run again" WITH a prior session (Priya: context-aware rerun)
584
-
585
- | Stage | Result |
586
- |---|---|
587
- | PRV | pass (prior session exists) |
588
- | Safety | allow |
589
- | Context Alignment | aligned |
590
- | Disambiguation | unknown |
591
- | Completion | **complete** (rerun_previous rule resolves it) |
592
- | Response Planner | **execute** |
593
-
594
- **What the user sees:** The system re-runs the last command without asking again.
595
-
596
- ---
93
+ ## Install
597
94
 
598
- ### "find loops in the approval rules" (disambiguation)
95
+ ```bash
96
+ # No install needed — run directly
97
+ npx usesteady
599
98
 
600
- | Stage | Result |
601
- |---|---|
602
- | PRV | pass |
603
- | Safety | allow |
604
- | Context Alignment | aligned |
605
- | Disambiguation | **ambiguous** → options: ["loops = circular logic", "loops = iteration patterns"] |
606
- | Response Planner | **clarify** |
99
+ # Or install globally
100
+ npm install -g usesteady
101
+ ```
607
102
 
608
- **What the user sees:** "Did you mean (a) circular references in the approval flow, or (b) iteration patterns in the approval code?"
103
+ Requires Node.js 18+. Runs fully local. No cloud, no API key.
609
104
 
610
105
  ---
611
106
 
612
- ### "commit my changes" (incomplete: missing message)
107
+ ## Core idea
613
108
 
614
- | Stage | Result |
615
- |---|---|
616
- | PRV | pass |
617
- | Safety | allow |
618
- | Context Alignment | aligned |
619
- | Disambiguation | unknown |
620
- | Completion | **incomplete** → missing: [commit message] |
621
- | Response Planner | **guide** |
622
- | Guidance | next step: commit "" |
109
+ **AI proposes. You approve. Then it runs.**
623
110
 
624
- **What the user sees:** "A commit message is required. Use: commit \"\""
111
+ Like `git diff` but for AI actions before they execute.
625
112
 
626
113
  ---
627
114
 
628
- ### replace "bg-blue-500" with "bg-red-500" in src/Button.tsx (Maya: exact change with interpretation)
115
+ ## Language
629
116
 
630
- | Stage | Result |
117
+ | Term | Meaning |
631
118
  |---|---|
632
- | PRV | pass |
633
- | Safety | allow |
634
- | Context Alignment | aligned |
635
- | Disambiguation | unknown |
636
- | Completion | **complete** |
637
- | Response Planner | **execute** |
638
- | Interpretation | category: tailwind_color_change, confidence: high |
639
- | | summary: "Changes background color from 'bg-blue-500' to 'bg-red-500' in Button.tsx." |
640
- | | impact: ["Visual change only; no logic, data, or behavior impact."] |
641
-
642
- **What the user sees:** The change is applied. The result includes a plain-English description: "Changes background color from blue-500 to red-500 in Button.tsx; visual change only."
119
+ | `SYSTEM WILL` | The exact operation that will run — not a summary |
120
+ | `SYSTEM SUGGESTS` | Options shown when AI is unsure — not a guess |
121
+ | `Approve` / `Reject` | Your decision, per step |
122
+ | `Revert last approval` | Undo a decision (not a filesystem change — nothing has run yet) |
123
+ | `Risk: LOW / MEDIUM / HIGH` | Derived from what is actually changing |
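One way to picture the terms in this table is as a small per-step review record. The sketch below is hypothetical: `Step`, `Review`, and their methods are illustrative names, not the usesteady API, and it models only the decision state described above (nothing here executes anything).

```typescript
// Hypothetical model of the vocabulary in the table; not the package's API.
type Risk = "LOW" | "MEDIUM" | "HIGH";
type Decision = "pending" | "approved" | "rejected";

interface Step {
  systemWill: string; // the exact operation, not a summary
  why: string;
  risk: Risk;
  decision: Decision;
}

class Review {
  readonly steps: Step[];
  private approvals: number[] = []; // indices of approved steps, in order

  constructor(proposals: Array<Omit<Step, "decision">>) {
    // Every proposed step starts undecided.
    this.steps = proposals.map((p) => ({ ...p, decision: "pending" as Decision }));
  }

  private at(index: number): Step {
    const step = this.steps[index];
    if (step === undefined) throw new RangeError(`no step at index ${index}`);
    return step;
  }

  approve(index: number): void {
    this.at(index).decision = "approved";
    this.approvals.push(index);
  }

  reject(index: number): void {
    this.at(index).decision = "rejected";
  }

  // Undoes a decision only. Nothing has run yet, so there is no filesystem
  // change to roll back.
  revertLastApproval(): boolean {
    while (this.approvals.length > 0) {
      const step = this.at(this.approvals.pop() as number);
      if (step.decision === "approved") {
        step.decision = "pending";
        return true;
      }
    }
    return false;
  }

  // Only explicitly approved steps are eligible to run.
  runnable(): Step[] {
    return this.steps.filter((s) => s.decision === "approved");
  }
}
```

In this model a step that is still pending is never runnable, which mirrors the README's claim that approval happens per step and before execution.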
643
124
 
644
125
  ---
645
126
 
646
- ## Written contracts
647
-
648
- These are non-negotiable invariants of the system. Any code change that violates one is a breaking change.
649
-
650
- 1. **Completion is the sole authority on executability.** Disambiguation `unknown` does not block completion from returning `complete`.
651
-
652
- 2. **Guide mode must carry structured guidance.** No consumer should need to re-run completion to render next steps.
653
-
654
- 3. **Completion receives context.** Any context-dependent executable intent must be resolvable in completion.
655
-
656
- 4. **PRV is lexical. Context Alignment is semantic.** No semantic reasoning in PRV.
657
-
658
- 5. **Public output exposes final outcome, not internal intermediates.** Disambiguation `unknown` is not a public signal.
659
-
660
- 6. **Interpretation is advisory.** It does not change the mode decision. It describes what the human will experience.
661
-
662
- 7. **`incomplete` ≠ `guided_recovery`.** These are distinct states with distinct semantics. Do not collapse them.
663
-
664
- 8. **Interpretation can improve guidance, but it can never manufacture executability.** The Intent Interpretation Bridge runs only for `guided_recovery`. It may improve labels. It may never guess values, file paths, or tokens. It may never change the mode decision.
665
-
666
- ---
667
-
668
- ## Test summary
669
-
670
- ```
671
- 2321 tests passing across 53 test files.
672
- 0 failures.
673
- ```
674
-
675
- | Area | Files | What is covered |
676
- |------|-------|-----------------|
677
- | `tests/ucp/` | 4 | Envelope hash determinism, structure, persistence, timeline, phase 7 alignment |
678
- | `tests/prv/` | 1 | Lexical markers, narrowed patterns, context boundary |
679
- | `tests/safety/` | 1 | Blocking, allowing, detectorId + matchedPattern inspectability |
680
- | `tests/understand/` | 12 | Context alignment, disambiguation, completion, change interpretation, intent bridge (unit + friction + pressure), silent guidance, session-aware guidance ordering |
681
- | `tests/interaction/` | 2 | Default contract, updater, selectors, observation events |
682
- | `tests/intake/` | 10 | All five modes, guidance contract, interpretation, branch validation, session validation |
683
- | `tests/present/` | 4 | Badge/headline/category/confidence, reminder presenter + renderer, coordinator routing, CVG signal mapping + flag assignment + uniqueness + assertion invariants + integration |
684
- | `tests/cursor/` | 3 | Delivery gate (eligibility, persistence, OCD, glob-matcher), in-process adapter (real FS, round-trip scope proof), execution coordinator (two-phase, consumer helpers) |
685
- | `tests/product/` | 2 | Session invariants S1–S8, phase gates, terminal state preservation, approve → deliver; session resilience (snapshot, staleness, summary) |
686
- | `tests/execution/` | 8 | Reminder execution parse-once, validator, artifact-first design, UCP reminder chain |
687
- | `tests/claude/` | 1 | Claude delivery gate: A1–A4 invariants, three delivery paths, UCP chain integrity, fail-closed unknowns, session interruption (A3) |
688
-
689
- ---
690
-
691
- ## How to safely extend the system
692
-
693
- Four rules. All four apply to every change.
694
-
695
- **1. Extend only one layer at a time.**
696
- Each layer has a defined scope and a defined boundary. An intake change is an intake change. A presentation change is a presentation change. A change that touches two layers simultaneously is a seam violation: it means the boundary was unclear, not that the rule is wrong. Clarify the boundary first, then make the change in the correct layer.
697
-
698
- **2. Update baseline → annotation → tests. In that order.**
699
- Before writing code: update the relevant baseline document (`docs/cursor-v1-baseline.md`, `docs/cvg-baseline.md`, or the written contracts in this file) to state what the new invariant is. Then add the comment-level annotation to the source file at the point where the rule could be violated. Then write the test that proves it. If you cannot write the invariant before the code, the design is not clear enough yet.
700
-
701
- **3. Never introduce authority into the Present or Session layers.**
702
- The Present layer formats state. The Session layer carries state. Neither layer may make a decision that belongs to intake, OCD policy, H, or the delivery gate. The test is simple: if the new code changes what *happens* based on what *was presented*, it belongs to a different layer. If it only changes what *is shown* based on what *already happened*, it belongs here.
127
+ ## You see SYSTEM WILL before anything runs.
703
128
 
704
- **4. If the change involves an AI model, consult the Model Insertion Policy first.**
705
- Models may only be inserted as language adapters in advisory layers. They may never participate in routing, classification, approval, scope selection, gate decisions, or UCP persistence. See [`docs/architecture/MODEL_INSERTION_POLICY.md`](docs/architecture/MODEL_INSERTION_POLICY.md) for the complete authority/advisory split, forbidden patterns, implementation constraints, and CI enforcement requirements.
129
+ That is the guarantee. Every step. No exceptions.
706
130
 
707
131
  ---
708
132
 
709
- ## What is intentionally deferred
710
-
711
- | Item | Status |
712
- |------|--------|
713
- | Change Interpretation: structural code changes | Not in v1 scope |
713
- | Change Interpretation: multi-file changes | Not in v1 scope |
714
- | Intent Interpretation Bridge: file search / token lookup | Intentionally excluded; bridge never searches |
715
- | Intent Interpretation Bridge: `incomplete` enrichment | Intentionally excluded; incomplete already has specific guidance |
717
- | Interaction Contract ? persistence | In-memory only; no storage layer yet |
718
- | Reminder scheduler / persistence | Reminder execution is complete; scheduling is deferred |
719
- | Timezone resolution | Deferred; time is extracted as text, not resolved |
720
- | `ucp.cursor_artifact.v1` | Reserved slot between `cursor_receipt` and `execution_trace`; deferred until Cursor produces a real diff object |
721
- | IPC / remote Cursor transport | In-process adapter proves the contract; remote transport follows same `CursorEditorPlugin` interface |
722
- | Persistent run store | ✅ Shipped PI-4 Iter 1 — file-backed store (`server-store.ts`) |
723
- | Real-time frame streaming | ✅ Shipped PI-4 Iter 3 — SSE endpoint (`GET /api/workflow/:runId/events`) |
724
- | Claude real API adapter | `ClaudeStubAdapter` used in web server; real Anthropic wiring requires `@anthropic-ai/sdk` install |
133
+ [usesteady.dev](https://usesteady.dev) · Apache 2.0