@xenonbyte/da-vinci-workflow 0.1.14 → 0.1.15
- package/CHANGELOG.md +9 -1
- package/README.md +17 -0
- package/README.zh-CN.md +17 -0
- package/SKILL.md +13 -0
- package/commands/claude/dv/design.md +2 -0
- package/commands/claude/dv/verify.md +2 -0
- package/commands/codex/prompts/dv-design.md +2 -0
- package/commands/codex/prompts/dv-verify.md +1 -0
- package/commands/gemini/dv/design.toml +2 -0
- package/commands/gemini/dv/verify.toml +1 -0
- package/docs/mcp-aware-gate-implementation.md +291 -0
- package/docs/mcp-aware-gate-tests.md +244 -0
- package/docs/mcp-aware-gate.md +246 -0
- package/docs/mode-use-cases.md +2 -0
- package/docs/prompt-presets/desktop-app.md +3 -0
- package/docs/prompt-presets/mobile-app.md +3 -0
- package/docs/prompt-presets/tablet-app.md +3 -0
- package/docs/prompt-presets/web-app.md +3 -0
- package/docs/workflow-examples.md +9 -4
- package/docs/zh-CN/mcp-aware-gate-implementation.md +290 -0
- package/docs/zh-CN/mcp-aware-gate-tests.md +244 -0
- package/docs/zh-CN/mcp-aware-gate.md +249 -0
- package/docs/zh-CN/mode-use-cases.md +3 -0
- package/docs/zh-CN/prompt-presets/desktop-app.md +3 -0
- package/docs/zh-CN/prompt-presets/mobile-app.md +3 -0
- package/docs/zh-CN/prompt-presets/tablet-app.md +3 -0
- package/docs/zh-CN/prompt-presets/web-app.md +3 -0
- package/docs/zh-CN/workflow-examples.md +9 -4
- package/lib/audit.js +348 -0
- package/lib/cli.js +47 -1
- package/lib/mcp-runtime-gate.js +342 -0
- package/package.json +3 -2
- package/references/artifact-templates.md +26 -1
- package/references/checkpoints.md +66 -1
- package/references/design-inputs.md +2 -1
- package/references/pencil-design-to-code.md +8 -0
- package/scripts/test-mcp-runtime-gate.js +199 -0
package/CHANGELOG.md
CHANGED
@@ -2,7 +2,15 @@
 
 ## Unreleased
 
-
+## v0.1.15 - 2026-03-27
+
+### Changed
+- MCP-aware runtime gate now has a first implementation slice: a pure evaluator, runtime-gate recording shape, and workflow hooks that require live source convergence checks before terminal completion claims
+- `da-vinci audit` now distinguishes `integrity` and `completion` modes so mid-workflow sanity checks do not masquerade as terminal completion gates
+- completion guidance now blocks terminal `design complete` or `workflow complete` claims unless the registered project-local `.pen` source is shell-visible, standard artifacts exist, and the completion gate passes
+- design-source rules now reject unnamed live editors such as `new` as persisted project sources and explicitly block screenshot or markdown pollution inside `.da-vinci/designs/`
+- prompt presets, workflow examples, and mode guides now state that screenshot exports belong under `.da-vinci/changes/<change-id>/exports/` and cannot replace the `.pen` source of truth
+- Pencil-operation guidance now treats repeated unsupported-property rollbacks on the same anchor surface as unstable progress instead of acceptable forward motion
 
 ## v0.1.14 - 2026-03-27
 
package/README.md
CHANGED
@@ -414,9 +414,26 @@ Useful commands:
 ```bash
 da-vinci status
 da-vinci validate-assets
+da-vinci audit --mode integrity /abs/path/to/project
+da-vinci audit --mode completion --change <change-id> /abs/path/to/project
 da-vinci uninstall --platform codex,claude,gemini
 ```
 
+`da-vinci audit` has two intended modes:
+
+- `--mode integrity`: a mid-workflow filesystem-truth check for missing baseline artifacts, misplaced exports, polluted `.da-vinci/designs/`, and missing persisted `.pen` sources
+- `--mode completion`: a strict pre-completion gate for one change scope; use `--change <change-id>` and treat any failure as blocking
+
+Both modes check the most common workflow-integrity failures in a project:
+
+- missing standard Da Vinci artifacts
+- missing shell-visible project-local `.pen` sources
+- pollution inside `.da-vinci/designs/`
+- screenshot exports stored in the wrong place
+- empty or partial change scaffolds
+
+When Pencil MCP is active, Da Vinci now also expects an MCP runtime gate record in `pencil-design.md` before terminal completion claims. That runtime gate checks live editor/source convergence separately from filesystem audit.
+
 Installation targets:
 
 - Codex prompts: `~/.codex/prompts/`
package/README.zh-CN.md
CHANGED
@@ -343,9 +343,26 @@ da-vinci install --platform codex,claude,gemini
 ```bash
 da-vinci status
 da-vinci validate-assets
+da-vinci audit --mode integrity /abs/path/to/project
+da-vinci audit --mode completion --change <change-id> /abs/path/to/project
 da-vinci uninstall --platform codex,claude,gemini
 ```
 
+`da-vinci audit` now has two main modes:
+
+- `--mode integrity`: suited to checking filesystem truth while work is in progress, such as missing baseline artifacts, wrong export paths, a polluted `.da-vinci/designs/`, or a project-local `.pen` that was never written to disk
+- `--mode completion`: suited to a strict check before claiming completion; use it with `--change <change-id>` and treat any failure as blocking
+
+Both modes check the most common workflow-integrity problems in a project:
+
+- missing standard Da Vinci artifacts
+- missing shell-visible project-local `.pen` design sources
+- a polluted `.da-vinci/designs/` directory
+- screenshot exports written to the wrong location
+- change scaffolds that are only empty directories or only half written
+
+When Pencil MCP is available, Da Vinci now also requires the MCP runtime gate result to be recorded in `pencil-design.md` before any terminal completion claim. This gate checks live editor/source convergence, which is a separate responsibility from the filesystem audit.
+
 Installation targets:
 
 - Codex prompts: `~/.codex/prompts/`
package/SKILL.md
CHANGED
@@ -216,16 +216,24 @@ Default completion rule:
 - if the request is `design-only`, stop after design artifacts and bindings
 - otherwise assume `full-delivery` and continue through implementation and verification
 
+Do not report `design complete`, `workflow complete`, or any equivalent terminal state unless the completion gate in `references/checkpoints.md` is satisfied.
+When shell access is available, prefer `da-vinci audit --mode integrity <project-path>` during active workflow work and `da-vinci audit --mode completion --change <change-id> <project-path>` before any terminal completion claim.
+
 ## Pencil Generation Rules
 
 During active Pencil work:
 
+- do not begin anchor-surface generation until the required discovery and design-source artifacts exist in their standard locations for the active mode
 - keep `.da-vinci/designs/` reserved for project-local `.pen` files; do not write workflow markdown such as inventories, proposals, or checkpoints into that directory
 - on `redesign-from-code`, write a short structural-delta note for each anchor surface explaining how the new composition differs from the current XML or layout grouping
 - after the first successful Pencil write, verify that the registered project-local `.pen` path exists as a shell-visible file before treating the design source as persistent
+- after the first successful Pencil write, run the MCP runtime gate when Pencil MCP is available and record the result in `pencil-design.md`
+- do not treat an unnamed live editor such as `new` as a persisted project design source; reconcile it to the registered project-local `.pen` path before the design pass is considered traceable
 - use only Pencil-supported properties; do not emit web- or CSS-only layout properties such as `flex` or `margin`
+- if unsupported Pencil properties cause repeated rolled-back batches on the same anchor surface, treat that pass as unstable and fix the schema usage before expanding further
 - on complex redesigns, turn approved anchor surfaces into a small shared primitive family before broad page expansion
 - apply the resolved form-factor-specific layout hygiene profile before passing screenshot review on any anchor surface or other approval candidate
+- exported screenshots are review artifacts only; place them under `.da-vinci/changes/<change-id>/exports/` and never treat them as a substitute for the project-local `.pen` source
 - screenshot review is binding: if the review calls out hierarchy, spacing, clarity, inconsistency, or unresolved-placeholder issues, revise the screen before treating the checkpoint as `PASS`
 
 ## Load References On Demand
@@ -573,6 +581,11 @@ When Pencil is available through MCP:
 - Before mapping or implementation closes, verify both:
 - the `.pen` path is readable through MCP
 - the same path exists as a shell-visible file inside the project
+- Before broad expansion or terminal completion, run the MCP runtime gate:
+- evaluate source convergence from the active editor, registered `.pen` path, and shell-visible `.pen` file
+- evaluate screen presence for claimed anchor and review target ids
+- evaluate review execution for approved surfaces
+- append the runtime gate result to `pencil-design.md`
 
 When Pencil is not available:
 
@@ -18,3 +18,5 @@ Create or update:
 - `pencil-design.md`
 
 Run the `design checkpoint` before locking implementation tasks.
+If Pencil MCP is active, run the MCP runtime gate after the first successful Pencil write and record it in `pencil-design.md`.
+Before reporting `design complete` or `workflow complete`, run `da-vinci audit --mode completion --change <change-id> <project-path>` and treat any failure as blocking.
@@ -12,3 +12,5 @@ Output should move the work toward:
 - `pencil-design.md`
 
 Use Pencil-backed structure as the design source when available.
+If Pencil MCP is active, run the MCP runtime gate after the first successful Pencil write and record it in `pencil-design.md`.
+Before claiming `design complete` or `workflow complete`, run `da-vinci audit --mode completion --change <change-id> <project-path>` and treat any failure as blocking.
@@ -11,4 +11,6 @@ Create or update:
 - `pencil-design.md`
 
 Use Pencil-backed page coverage as the source of presentation truth.
+If Pencil MCP is active, run the MCP runtime gate after the first successful Pencil write and record it in `pencil-design.md`.
+Before reporting `design complete` or `workflow complete`, run `da-vinci audit --mode completion --change <change-id> <project-path>` and treat any failure as blocking.
 """
@@ -0,0 +1,291 @@
+# MCP-Aware Gate Implementation Design
+
+This document turns the MCP-aware gate proposal into an implementation design.
+
+It still does not commit to writing code.
+
+## Scope
+
+This design covers only the first implementation slice:
+
+- runtime source convergence
+- runtime screen presence
+- runtime review execution
+- completion blocking when runtime truth and filesystem truth diverge
+
+It does not cover:
+
+- automatic `.pen` reconstruction
+- CLI access to live MCP state
+- session persistence or transport work
+
+## Design Goal
+
+Add a narrow runtime checkpoint that can stop false completion claims caused by live-editor drift.
+
+The gate should catch cases like:
+
+- active editor is still `new`
+- anchor screens exist only in the live session
+- node ids used for screenshots do not exist in the current editor
+- the workflow claims completion before runtime state and filesystem state converge
+
+## Existing Constraints
+
+The current architecture already provides:
+
+- filesystem `audit`
+- checkpoint rules in `references/checkpoints.md`
+- artifact expectations in `design-registry.md` and `pencil-design.md`
+- MCP access to active editor state and screen nodes
+
+The current architecture does not provide:
+
+- a CLI bridge to MCP runtime state
+- a stable session id outside the active agent context
+
+That means the MCP-aware gate must be executed inside the agent workflow while MCP tools are live.
+
+## Implementation Placement
+
+### Primary insertion points
+
+1. After the first successful Pencil write in a design pass.
+2. Before any terminal `design complete` or `workflow complete` claim.
+
+### Secondary insertion point
+
+3. Before broad expansion beyond approved anchor surfaces when the design pass depends on screenshot-reviewed anchors.
+
+### Why these points
+
+- after first write: catches `new`-editor drift early
+- before completion: catches false success claims
+- before broad expansion: prevents weak runtime state from spreading into more screens
+
+## Owning Workflow Stage
+
+The runtime gate should be owned by the design phase, not the CLI.
+
+That means:
+
+- design routes should execute it while Pencil MCP is available
+- verify routes may re-check it if design completion is being claimed
+- build routes should not become the primary owner of runtime gate logic
+
+## Input Sources
+
+### MCP inputs
+
+Required:
+
+- active editor state
+- top-level nodes
+- targeted node reads for claimed anchor surfaces
+
+Expected MCP operations:
+
+- `pencil.get_editor_state`
+- `pencil.batch_get`
+
+### Filesystem inputs
+
+Required:
+
+- shell-visible `.pen` existence
+- registered `.pen` path from `design-registry.md`
+- declared reviewed screens and screenshot targets from `pencil-design.md`
+
+Expected shell or file reads:
+
+- read `design-registry.md`
+- read `pencil-design.md`
+- check registered `.pen` path on disk
+
+## Runtime Snapshot Model
+
+The runtime gate should build one structured snapshot in memory:
+
+```md
+runtime snapshot
+- activeEditor
+- topLevelScreenIds
+- topLevelScreenNames
+- registeredPenPath
+- shellVisiblePenExists
+- claimedAnchorIds
+- claimedReviewedScreenIds
+- reviewTargets
+```
+
+The evaluator should only depend on this snapshot.
+
+That keeps the implementation testable without needing a real live Pencil session for every case.
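The snapshot fields listed in the Runtime Snapshot Model could be assembled into a plain object along the following lines. This is a minimal sketch, not the package's actual code: the helper name `buildRuntimeSnapshot`, the shapes of its inputs (`editorState`, `registry`, `design`), and the example path and node ids are all assumptions made for illustration. Only the returned field names come from the design.

```javascript
// Hypothetical builder: only the returned field names come from the design.
// The input shapes (editorState, registry, design) are illustrative
// assumptions, not the real MCP response or artifact formats.
function buildRuntimeSnapshot({ editorState, registry, design, penExists }) {
  const topLevel = editorState.topLevelNodes || [];
  return Object.freeze({
    activeEditor: editorState.activeEditor || "",
    topLevelScreenIds: topLevel.map((node) => node.id),
    topLevelScreenNames: topLevel.map((node) => node.name),
    registeredPenPath: registry.penPath || "",
    shellVisiblePenExists: Boolean(penExists),
    claimedAnchorIds: [...(design.anchorIds || [])],
    claimedReviewedScreenIds: [...(design.reviewedScreenIds || [])],
    reviewTargets: [...(design.reviewTargets || [])],
  });
}

// Synthetic example; the path and node ids are made up.
const snapshot = buildRuntimeSnapshot({
  editorState: {
    activeEditor: ".da-vinci/designs/app.pen",
    topLevelNodes: [{ id: "a1", name: "Home" }],
  },
  registry: { penPath: ".da-vinci/designs/app.pen" },
  design: { anchorIds: ["a1"], reviewedScreenIds: ["a1"], reviewTargets: ["a1"] },
  penExists: true,
});

console.log(snapshot.activeEditor === snapshot.registeredPenPath); // true
```

Because the evaluator would receive only this object, a test can build one by hand without any live Pencil session.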
+
+## Evaluation Stages
+
+### Stage 1: Source Convergence
+
+Checks:
+
+- active editor is not `new`
+- registered `.pen` path exists in `design-registry.md`
+- registered `.pen` path exists on disk
+- active editor and registered source do not obviously diverge
+
+Result rules:
+
+- `PASS`: runtime source and registered source converge
+- `WARN`: no new live edits happened yet, or a documented deferred baseline is still being used
+- `BLOCK`: runtime source is unnamed, missing, or diverged
+
+### Stage 2: Screen Presence
+
+Checks:
+
+- claimed anchor ids exist in live MCP state
+- claimed reviewed screens exist in live MCP state
+- screenshot targets resolve in the active document
+
+Result rules:
+
+- `PASS`: claimed design output is traceable to live editor nodes
+- `WARN`: screen naming drift exists but ids are still traceable
+- `BLOCK`: claimed screens or targets do not resolve
+
+### Stage 3: Review Execution
+
+Checks:
+
+- each approved anchor has a reviewed screen id or screenshot target
+- runtime review records align with the current live editor
+- review blockers were not ignored
+
+Result rules:
+
+- `PASS`: runtime review is credible
+- `WARN`: review exists but requires follow-up before expansion
+- `BLOCK`: approval claim is unsupported by runtime evidence
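The three stages can be sketched as pure functions over a snapshot object. This is a simplified illustration under stated assumptions, not the evaluator shipped in `lib/mcp-runtime-gate.js`: the function names are hypothetical, divergence is approximated by a plain string comparison between the active editor and the registered path, review targets are assumed to be node ids, and the `WARN` rules are left out because the listed snapshot fields do not carry the facts they depend on. The final status is taken as the worst stage status.

```javascript
// Severity order used to pick the worst stage result.
const RANK = { PASS: 0, WARN: 1, BLOCK: 2 };

// Stage 1: an unnamed, unregistered, unpersisted, or diverged source blocks.
function evaluateSourceConvergence(s) {
  if (!s.activeEditor || s.activeEditor === "new") return "BLOCK";
  if (!s.registeredPenPath || !s.shellVisiblePenExists) return "BLOCK";
  // Assumption: divergence is approximated by comparing the two path strings.
  if (s.activeEditor !== s.registeredPenPath) return "BLOCK";
  return "PASS";
}

// Stage 2: every claimed id and review target must resolve in live state.
function evaluateScreenPresence(s) {
  const live = new Set(s.topLevelScreenIds);
  const claimed = [
    ...s.claimedAnchorIds,
    ...s.claimedReviewedScreenIds,
    ...s.reviewTargets,
  ];
  return claimed.every((id) => live.has(id)) ? "PASS" : "BLOCK";
}

// Stage 3: every claimed anchor needs a reviewed screen id or review target.
function evaluateReviewExecution(s) {
  const reviewed = new Set([...s.claimedReviewedScreenIds, ...s.reviewTargets]);
  return s.claimedAnchorIds.every((id) => reviewed.has(id)) ? "PASS" : "BLOCK";
}

// Final status is the most severe of the three stage results.
function evaluateRuntimeGate(s) {
  const stages = {
    sourceConvergence: evaluateSourceConvergence(s),
    screenPresence: evaluateScreenPresence(s),
    reviewExecution: evaluateReviewExecution(s),
  };
  const finalStatus = Object.values(stages).reduce(
    (worst, status) => (RANK[status] > RANK[worst] ? status : worst),
    "PASS"
  );
  return { ...stages, finalStatus };
}

// Synthetic snapshot; the path and ids are made up.
const healthy = {
  activeEditor: ".da-vinci/designs/app.pen",
  topLevelScreenIds: ["a1", "a2"],
  registeredPenPath: ".da-vinci/designs/app.pen",
  shellVisiblePenExists: true,
  claimedAnchorIds: ["a1"],
  claimedReviewedScreenIds: ["a1"],
  reviewTargets: ["a1"],
};

console.log(evaluateRuntimeGate(healthy).finalStatus); // PASS
console.log(evaluateRuntimeGate({ ...healthy, activeEditor: "new" }).finalStatus); // BLOCK
```

Keeping each stage as its own function lets a test assert the per-stage status as well as the final one.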
+
+## Recording Strategy
+
+Do not introduce a new artifact family.
+
+Append a structured section to `pencil-design.md`:
+
+```md
+## MCP Runtime Gate
+- Time:
+- Active editor:
+- Registered `.pen` path:
+- Shell-visible `.pen` path:
+- Claimed anchor ids:
+- Reviewed screen ids:
+- Source convergence: PASS | WARN | BLOCK
+- Screen presence: PASS | WARN | BLOCK
+- Review execution: PASS | WARN | BLOCK
+- Final runtime gate status: PASS | WARN | BLOCK
+- Notes:
+```
+
+### Why `pencil-design.md`
+
+- it already records source path, screens, screenshots, and design notes
+- it is the closest existing artifact to runtime design truth
+- it avoids scattering checkpoint state across ad hoc files
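A writer for the record template in the Recording Strategy section might look like the sketch below. It is an illustration only: the function names `renderRuntimeGateRecord` and `appendRuntimeGateRecord` are hypothetical, the values written after each label (for example `missing` when the file is absent) are assumptions, and the timestamp and path in the demo are made up. The demo writes into a temporary directory so no real project file is touched.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Renders one record with the labels from the template. What goes after each
// label (for example "missing" for an absent file) is an assumption.
function renderRuntimeGateRecord(snapshot, result, time, notes) {
  const penOnDisk = snapshot.shellVisiblePenExists
    ? snapshot.registeredPenPath
    : "missing";
  return [
    "## MCP Runtime Gate",
    "- Time: " + time,
    "- Active editor: " + snapshot.activeEditor,
    "- Registered `.pen` path: " + snapshot.registeredPenPath,
    "- Shell-visible `.pen` path: " + penOnDisk,
    "- Claimed anchor ids: " + snapshot.claimedAnchorIds.join(", "),
    "- Reviewed screen ids: " + snapshot.claimedReviewedScreenIds.join(", "),
    "- Source convergence: " + result.sourceConvergence,
    "- Screen presence: " + result.screenPresence,
    "- Review execution: " + result.reviewExecution,
    "- Final runtime gate status: " + result.finalStatus,
    "- Notes: " + notes,
    "",
  ].join("\n");
}

// Append-only write: existing sections of the file are never rewritten.
function appendRuntimeGateRecord(designFile, record) {
  fs.appendFileSync(designFile, "\n" + record);
}

// Demo against a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "dv-gate-"));
const designFile = path.join(dir, "pencil-design.md");
fs.writeFileSync(designFile, "# Pencil Design\n");

appendRuntimeGateRecord(
  designFile,
  renderRuntimeGateRecord(
    {
      activeEditor: "new",
      registeredPenPath: ".da-vinci/designs/app.pen",
      shellVisiblePenExists: false,
      claimedAnchorIds: [],
      claimedReviewedScreenIds: [],
    },
    {
      sourceConvergence: "BLOCK",
      screenPresence: "PASS",
      reviewExecution: "PASS",
      finalStatus: "BLOCK",
    },
    "2026-03-27T00:00:00Z",
    "active editor is still new"
  )
);

const written = fs.readFileSync(designFile, "utf8");
console.log(written);
```

Using `fs.appendFileSync` rather than a read-modify-write cycle keeps the writer append-only, which matches the recording requirement stated in the design.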
+
+## Failure Handling
+
+When runtime gate returns `BLOCK`:
+
+- do not continue to broad multi-screen expansion
+- do not claim design completion
+- do not claim workflow completion
+- record the mismatch explicitly in `pencil-design.md`
+
+When runtime gate returns `WARN`:
+
+- allow continuation only when the warning does not create source ambiguity
+- do not allow terminal completion unless the warning is explicitly resolved or accepted by the workflow rules
+
+## Interaction With Filesystem Audit
+
+The runtime gate should run first.
+
+Then:
+
+- if runtime gate is `BLOCK`, stop immediately
+- if runtime gate is `PASS` or acceptable `WARN`, run filesystem completion audit before terminal completion
+
+That yields this order:
+
+1. runtime gate
+2. filesystem completion audit
+3. completion claim
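The ordering of runtime gate, filesystem completion audit, and completion claim can be sketched as a single decision function. This is a hypothetical helper, not part of the package: the name `decideCompletion`, its argument shape, and the `warnAccepted` flag are assumptions. The filesystem audit is injected as a callback (in practice it would wrap `da-vinci audit --mode completion --change <change-id> <project-path>`), and the case where the gate could not run at all is deliberately out of scope here.

```javascript
// Hypothetical decision helper: names and argument shape are assumptions.
// `runAudit` is an injected callback that returns true when the filesystem
// completion audit passes.
function decideCompletion({ runtimeStatus, warnAccepted = false, runAudit }) {
  if (!["PASS", "WARN", "BLOCK"].includes(runtimeStatus)) {
    throw new TypeError("unknown runtime gate status: " + runtimeStatus);
  }
  // A BLOCK stops immediately; the audit is not even started.
  if (runtimeStatus === "BLOCK") {
    return { allowed: false, auditRan: false, reason: "runtime gate BLOCK" };
  }
  // A WARN only proceeds when it was explicitly resolved or accepted.
  if (runtimeStatus === "WARN" && !warnAccepted) {
    return { allowed: false, auditRan: false, reason: "unresolved runtime gate WARN" };
  }
  // Only now run the filesystem completion audit.
  const auditPassed = Boolean(runAudit());
  return auditPassed
    ? { allowed: true, auditRan: true, reason: "runtime gate and completion audit passed" }
    : { allowed: false, auditRan: true, reason: "filesystem completion audit failed" };
}

// Example with a stub audit that counts how often it is invoked.
let auditRuns = 0;
const passingAudit = () => {
  auditRuns += 1;
  return true;
};

console.log(decideCompletion({ runtimeStatus: "BLOCK", runAudit: passingAudit }).allowed); // false
console.log(auditRuns); // 0
console.log(decideCompletion({ runtimeStatus: "PASS", runAudit: passingAudit }).allowed); // true
```

Injecting the audit as a callback keeps the decision itself deterministic and independent of how the CLI is invoked.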
+
+## Minimal Pseudoflow
+
+```md
+1. perform first successful Pencil write
+2. read active editor via MCP
+3. read claimed anchor ids from `pencil-design.md`
+4. read registered `.pen` path from `design-registry.md`
+5. check shell-visible `.pen`
+6. read live nodes for claimed anchors
+7. evaluate source convergence
+8. evaluate screen presence
+9. evaluate review execution when relevant
+10. append runtime gate results to `pencil-design.md`
+11. if terminal completion is being claimed, run filesystem completion audit
+12. only report completion if both layers pass
+```
+
+## Boundary Decisions
+
+### When Pencil MCP is unavailable
+
+Do not try to emulate runtime gate.
+
+Instead:
+
+- record that MCP runtime gate could not run
+- fall back to filesystem audit plus documented constraints
+- do not describe the runtime gate as passed
+
+### When no anchor ids are recorded yet
+
+The runtime gate may run a reduced source-convergence-only check after the first Pencil write.
+
+It should not pretend screen-presence or review-execution checks were completed.
+
+### When no new Pencil edits happened
+
+Use `WARN` or skip runtime gate rather than fabricating a pass.
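The three boundary decisions can be read as a small planning step that runs before any evaluation. The sketch below is one hypothetical way to encode them: the function name `planRuntimeGate`, the mode labels (`unavailable`, `skipped`, `reduced`, `full`), the stage names, and the precedence between the cases are all assumptions made for illustration.

```javascript
// Hypothetical planner for the boundary cases. Mode names, stage names, and
// the precedence between cases are assumptions made for illustration.
function planRuntimeGate({ mcpAvailable, newPencilEdits, claimedAnchorIds }) {
  if (!mcpAvailable) {
    // Never emulate the gate; record that it could not run and fall back.
    return {
      mode: "unavailable",
      stages: [],
      mayReportFullPass: false,
      fallback: "filesystem audit plus documented constraints",
    };
  }
  if (!newPencilEdits) {
    // Nothing new to check: skip (or WARN) rather than fabricate a pass.
    return { mode: "skipped", stages: [], mayReportFullPass: false, fallback: null };
  }
  if (claimedAnchorIds.length === 0) {
    // Reduced check: only source convergence is evaluated.
    return {
      mode: "reduced",
      stages: ["sourceConvergence"],
      mayReportFullPass: false,
      fallback: null,
    };
  }
  return {
    mode: "full",
    stages: ["sourceConvergence", "screenPresence", "reviewExecution"],
    mayReportFullPass: true,
    fallback: null,
  };
}

const afterFirstWrite = planRuntimeGate({
  mcpAvailable: true,
  newPencilEdits: true,
  claimedAnchorIds: [],
});
console.log(afterFirstWrite.mode); // reduced
console.log(afterFirstWrite.stages); // [ 'sourceConvergence' ]
```

The `mayReportFullPass` flag makes it explicit that a reduced, skipped, or unavailable run can never be recorded as a full pass.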
+
+## Non-Functional Requirements
+
+The first implementation should be:
+
+- deterministic
+- append-only in artifact recording
+- easy to unit-test from a runtime snapshot object
+- independent from CLI transport changes
+
+## Implementation Steps
+
+Recommended order:
+
+1. define a runtime snapshot shape
+2. define a pure evaluator over that snapshot
+3. add a writer that appends runtime gate results to `pencil-design.md`
+4. call the gate from design-phase runtime checkpoints
+5. wire terminal completion to require both runtime gate and filesystem completion audit
+
+## Deferred Work
+
+Do not include these in the first implementation:
+
+- auto-repair of editor/source mismatch
+- multi-session state reconciliation
+- CLI-facing live runtime commands
+- generalized checkpoint orchestration engine
@@ -0,0 +1,244 @@
+# MCP-Aware Gate Test Design
+
+This document defines how to test the future MCP-aware gate implementation.
+
+It is a design for validation coverage, not a test implementation.
+
+## Test Goal
+
+Prove that the MCP-aware gate:
+
+- blocks false runtime completion states
+- does not replace filesystem audit
+- does not misclassify healthy runtime design sessions
+- behaves predictably when MCP runtime facts are incomplete
+
+## Test Layers
+
+The test plan should cover three layers.
+
+### 1. Evaluator unit tests
+
+Test the pure runtime-gate evaluator against synthetic snapshot inputs.
+
+Why:
+
+- fastest feedback
+- deterministic
+- does not depend on a live Pencil session
+
+### 2. Runtime integration tests
+
+Test the runtime-gate caller against real or fixture-backed MCP responses.
+
+Why:
+
+- proves that active editor and node reads are interpreted correctly
+
+### 3. End-to-end workflow tests
+
+Test that runtime gate and filesystem audit cooperate correctly in realistic workflow states.
+
+Why:
+
+- prevents regressions where runtime gate passes but completion still lies
+
+## Core Test Categories
+
+### A. Healthy runtime source convergence
+
+Expected:
+
+- active editor is a named project-local `.pen`
+- registered `.pen` path matches
+- shell-visible `.pen` exists
+- result is `PASS`
+
+### B. Live editor still `new`
+
+Expected:
+
+- source convergence is `BLOCK`
+- completion is blocked even if screens exist in the live editor
+
+### C. Registered `.pen` path missing on disk
+
+Expected:
+
+- source convergence or completion status becomes `BLOCK`
+- runtime gate does not treat the live editor alone as sufficient
+
+### D. Live screens exist but are not persisted
+
+Example:
+
+- active editor contains `Splash`, `Home`, `SafeBox`
+- shell-visible `.pen` does not exist
+
+Expected:
+
+- runtime gate can prove screens exist
+- source convergence is still `BLOCK`
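Categories A through D lend themselves to a table of synthetic snapshots checked with Node's built-in `assert` module. The sketch below is illustrative only: `evaluateSourceConvergence` is a stand-in written for this example (a real test would import the package's evaluator), the `.pen` path is made up, and matching the active editor against the registered path by string equality is an assumption.

```javascript
const { strictEqual } = require("node:assert");

// Stand-in for the stage under test, just enough to make the cases run.
function evaluateSourceConvergence(s) {
  const named = Boolean(s.activeEditor) && s.activeEditor !== "new";
  const persisted = Boolean(s.registeredPenPath) && s.shellVisiblePenExists === true;
  const matches = s.activeEditor === s.registeredPenPath;
  return named && persisted && matches ? "PASS" : "BLOCK";
}

const pen = ".da-vinci/designs/app.pen"; // made-up path

const cases = [
  {
    name: "A. healthy source convergence",
    snapshot: { activeEditor: pen, registeredPenPath: pen, shellVisiblePenExists: true },
    expected: "PASS",
  },
  {
    name: "B. live editor still new",
    snapshot: {
      activeEditor: "new",
      registeredPenPath: pen,
      shellVisiblePenExists: true,
      topLevelScreenNames: ["Home"], // screens exist, but that must not help
    },
    expected: "BLOCK",
  },
  {
    name: "C. registered path missing on disk",
    snapshot: { activeEditor: pen, registeredPenPath: pen, shellVisiblePenExists: false },
    expected: "BLOCK",
  },
  {
    name: "D. live screens exist but are not persisted",
    snapshot: {
      activeEditor: pen,
      registeredPenPath: pen,
      shellVisiblePenExists: false,
      topLevelScreenNames: ["Splash", "Home", "SafeBox"],
    },
    expected: "BLOCK",
  },
];

// Each case name is passed as the assertion message so failures are readable.
for (const { name, snapshot, expected } of cases) {
  strictEqual(evaluateSourceConvergence(snapshot), expected, name);
}
console.log("checked " + cases.length + " cases"); // checked 4 cases
```

A table like this keeps each blocker branch visible as one row, which makes it easier to confirm that every branch is covered.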
+
+### E. Claimed anchor id missing
+
+Example:
+
+- `pencil-design.md` claims `mCZ1G`
+- current live editor does not contain `mCZ1G`
+
+Expected:
+
+- screen presence is `BLOCK`
+
+### F. Screenshot target missing
+
+Expected:
+
+- review execution is `BLOCK`
+
+### G. Review blockers ignored
+
+Example:
+
+- screenshot review records blocker-level layout-hygiene issues
+- workflow still marks anchor as approved
+
+Expected:
+
+- review execution is `BLOCK`
+
+### H. Source mismatch with explicit reconciliation
+
+Example:
+
+- active editor path and registered path differ
+- `pencil-design.md` records a documented reconciliation
+
+Expected:
+
+- evaluator returns `WARN` or `PASS` only if the reconciliation rules are satisfied
+
+### I. No new Pencil edits yet
+
+Expected:
+
+- runtime gate may return `WARN` or partial result
+- must not fabricate full `PASS`
+
+### J. Pencil MCP unavailable
+
+Expected:
+
+- runtime gate reports unavailable or skipped state
+- workflow does not treat runtime gate as passed
+- filesystem audit remains usable
+
+## Required End-to-End Scenarios
+
+### Scenario 1: Real failure pattern from Cipher
+
+State:
+
+- active editor is `new`
+- live screens exist
+- `.da-vinci/designs/` is polluted
+- no shell-visible `.pen`
+
+Expected:
+
+- runtime gate blocks completion
+- filesystem completion audit also fails
+
+### Scenario 2: Healthy redesign completion
+
+State:
+
+- active editor is the registered `.pen`
+- anchor ids exist
+- screenshot review exists
+- shell-visible `.pen` exists
+- completion audit passes
+
+Expected:
+
+- runtime gate passes
+- completion audit passes
+- terminal completion is allowed
+
+### Scenario 3: Runtime pass, filesystem fail
+
+State:
+
+- active editor is healthy
+- shell-visible `.pen` exists
+- but `.da-vinci/designs/` contains extra PNG or markdown files
+
+Expected:
+
+- runtime gate may pass
+- filesystem completion audit fails
+- terminal completion is blocked
+
+### Scenario 4: Filesystem pass, runtime fail
+
+State:
+
+- shell-visible `.pen` exists
+- files look healthy
+- active editor is still `new` or points elsewhere
+
+Expected:
+
+- runtime gate fails
+- completion is blocked
+
+## Assertions That Must Exist
+
+For every runtime-gate result, tests should assert:
+
+- final status
+- per-layer status
+- which blocking condition fired
+- whether completion would be allowed
+- whether artifact recording shape stays stable
+
+## Artifact Recording Tests
+
+Verify that runtime gate output:
+
+- appends to `pencil-design.md`
+- does not create a new artifact family
+- does not write into `.da-vinci/designs/`
+- does not overwrite unrelated sections
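The four artifact recording checks can be exercised against a throwaway directory. This is a sketch under stated assumptions: `appendRuntimeGateRecord` is a stand-in writer defined only for this example, the change id `demo` is made up, and placing `pencil-design.md` under `.da-vinci/changes/<change-id>/` is an assumption, since the text here does not say where that file lives.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { ok, deepStrictEqual } = require("node:assert");

// Stand-in writer under test; a real test would import the package's writer.
function appendRuntimeGateRecord(designFile, record) {
  fs.appendFileSync(designFile, "\n" + record + "\n");
}

// Throwaway project layout. The location of pencil-design.md is an assumption.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "dv-record-"));
const changeDir = path.join(root, ".da-vinci", "changes", "demo");
const designsDir = path.join(root, ".da-vinci", "designs");
fs.mkdirSync(changeDir, { recursive: true });
fs.mkdirSync(designsDir, { recursive: true });

const designFile = path.join(changeDir, "pencil-design.md");
const before = "# Pencil Design\n\n## Screens\n- Home\n";
fs.writeFileSync(designFile, before);

appendRuntimeGateRecord(
  designFile,
  "## MCP Runtime Gate\n- Final runtime gate status: BLOCK"
);

const after = fs.readFileSync(designFile, "utf8");
ok(after.startsWith(before)); // unrelated sections are untouched
ok(after.includes("## MCP Runtime Gate")); // the record was appended
deepStrictEqual(fs.readdirSync(changeDir), ["pencil-design.md"]); // no new artifact files
deepStrictEqual(fs.readdirSync(designsDir), []); // nothing written into designs
console.log("recording checks passed");
```

Comparing the file prefix against the pre-write content is a cheap way to prove the write was append-only without parsing the markdown.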
+
+## Regression Cases
+
+These regressions should be covered:
+
+1. runtime gate accidentally behaves like a CLI-only audit
+2. runtime gate silently treats `new` as acceptable
+3. runtime gate treats PNG exports as source evidence
+4. runtime gate passes when anchor ids are missing
+5. runtime gate is skipped but reported as passed
+6. runtime gate writes ad hoc markdown files outside the allowed artifact path
+
+## Recommended Test Matrix
+
+Minimum matrix:
+
+- form factor: mobile first
+- design stage: first Pencil write / approved anchor / terminal completion
+- source state: healthy / `new` / diverged / missing `.pen`
+- screen state: present / missing / stale ids
+- review state: reviewed / missing / blocker ignored
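If the minimum matrix is read as a full cross product of its axes, the case list can be generated rather than written by hand. That reading is an assumption, as is treating "mobile first" as a single `mobile` value for now; the helper name `expandMatrix` and the axis keys are hypothetical.

```javascript
// Axis values copied from the minimum matrix; the key names are made up.
const matrix = {
  formFactor: ["mobile"],
  designStage: ["first Pencil write", "approved anchor", "terminal completion"],
  sourceState: ["healthy", "new", "diverged", "missing .pen"],
  screenState: ["present", "missing", "stale ids"],
  reviewState: ["reviewed", "missing", "blocker ignored"],
};

// Cartesian product: each axis multiplies the rows built so far.
function expandMatrix(axes) {
  return Object.entries(axes).reduce(
    (rows, [axis, values]) =>
      rows.flatMap((row) => values.map((value) => ({ ...row, [axis]: value }))),
    [{}]
  );
}

const testCases = expandMatrix(matrix);
console.log(testCases.length); // 108
console.log(testCases[0]);
```

With 1 x 3 x 4 x 3 x 3 combinations this yields 108 cases, so a real suite would probably prune combinations that cannot occur together.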
+
+## Exit Criteria
+
+The implementation should not be considered ready unless all of the following are true:
+
+1. evaluator unit tests cover all blocker branches
+2. at least one runtime integration test proves live editor detection works
+3. the Cipher-like failure scenario is blocked
+4. a healthy redesign scenario passes
+5. runtime gate plus filesystem audit together prevent false terminal completion