@captain_z/zsk-skills 1.6.1 → 1.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/demo/SKILL.md CHANGED
@@ -15,6 +15,7 @@ actions:
  - resume
  - terminate
  - complete
+ - run optimized
  - run playwright
  - run computer-use
  - run hybrid
@@ -35,7 +36,19 @@ triggers:
 
  # Demo
 
- Development demo before formal testing. Build a module-grouped, audience-ready demo outline, run rehearsed Playwright demonstration steps, record demo-only issues, preserve evidence, and leave reusable Playwright scenarios when possible.
+ Development demo before formal testing. Build a module-grouped, audience-ready demo outline, generate Playwright cases from formal raw testing inputs through `test-plan.json`, run rehearsed Playwright demonstration steps, record demo-only issues, preserve evidence, and leave reusable Playwright scenarios when possible.
+
+ Default `/demo` execution follows the optimized SOP:
+
+ ```text
+ tests.raw_cases / sources.testing, normally .raws/testing
+ -> Browser Use observation handoff for logged-in/current-page state and locator hints
+ -> zsk/agent-written test-plan.json in tests.derived_cases
+ -> Playwright generator/CLI/Skills-written .spec.ts in tests.automated.e2e
+ -> Playwright Test/UI/debug execution and evidence under the configured issue evidence path
+ ```
+
+ Browser Use is observation-only in this path. It may describe URL, page title, visible targets, role/label hints, auth/session notes, and privacy concerns. It must not write repo artifacts, `test-plan.json`, or final `.spec.ts` files. Optimized `/demo` must not silently fall back to Playwright MCP, Computer Use, or the legacy hybrid bridge.
 
  Demo should execute and refine Playwright scenarios that were planned earlier by spec/design/task. It should not be the first stage that invents UI scenarios unless upstream artifacts are missing, in which case the gap must be recorded as a resource blocker or learning proposal.
 
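The optimized SOP above is a strict pipeline: each stage needs the previous stage's output, and a missing input stops the run instead of switching tools. A minimal TypeScript sketch of that guard follows. The `DemoInputs` shape, the stage names, and `checkOptimizedSop` are hypothetical illustrations, not zsk's actual API.

```typescript
// Hypothetical model of the optimized /demo SOP; names are illustrative, not zsk's API.
type Stage = "raw-cases" | "browser-use-observation" | "test-plan" | "spec";

type AuthHandoff = "storageState" | "persistent-context" | "cdp" | "fixture-login" | "not-required";

interface DemoInputs {
  rawCases: string[];        // paths resolved from tests.raw_cases or sources.testing
  authHandoff?: AuthHandoff; // what Playwright can reproduce from the Browser Use observation
  testPlanPath?: string;     // test-plan.json written under tests.derived_cases
  specPath?: string;         // .spec.ts written under tests.automated.e2e
}

type SopResult =
  | { ok: true; stages: Stage[] }
  | { ok: false; stoppedAt: Stage; blocker: string };

// Checks each stage in order. A gap produces a blocker; there is deliberately
// no branch that switches to Playwright MCP, Computer Use, or the hybrid bridge.
function checkOptimizedSop(input: DemoInputs): SopResult {
  const done: Stage[] = [];
  if (input.rawCases.length === 0) {
    return { ok: false, stoppedAt: "raw-cases", blocker: "no formal raw test cases found" };
  }
  done.push("raw-cases");
  if (input.authHandoff === undefined) {
    return { ok: false, stoppedAt: "browser-use-observation", blocker: "no reproducible auth/session handoff" };
  }
  done.push("browser-use-observation");
  if (!input.testPlanPath) {
    return { ok: false, stoppedAt: "test-plan", blocker: "test-plan.json was not generated" };
  }
  done.push("test-plan");
  if (!input.specPath) {
    return { ok: false, stoppedAt: "spec", blocker: "no generated .spec.ts to execute" };
  }
  done.push("spec");
  return { ok: true, stages: done }; // ready for Playwright Test/UI/debug execution
}
```

A blocked result is the only alternative to a ready one, which mirrors the rule that optimized mode records a blocker rather than silently changing lanes.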
@@ -66,7 +79,7 @@ Every demo step must have complete source alignment:
 
  Do not include orphan demo steps that cannot be traced to those sources. Do not omit required function points from the PRD/SRS unless the omission is called out as a known gap with owner and reason.
 
- If a formal test case and structured page information or Browser Use handoff are available, demo may generate or refresh a Playwright case before running it. Prefer `aria_snapshot + Browser Use handoff + test case + getByRole` over screenshot-driven generation.
+ If a formal test case and Browser Use handoff are available, demo should generate or refresh a Playwright case before running it. Prefer `raw test case + Browser Use observation + test-plan.json + Playwright locator generation` over screenshot-driven generation. Use the legacy hybrid lane explicitly when MCP or Computer Use is required.
 
  ## Operating Constraints
 
@@ -74,6 +87,8 @@ If a formal test case and structured page information or Browser Use handoff are
  - Do not invent flows when spec/design/task resources are missing; record the resource gap.
  - Do not use Browser Use when Playwright can reproduce the flow with storageState, a persistent context, or CDP.
  - Do not use Computer Use when deterministic Playwright, Browser Use, or app APIs are sufficient.
+ - Do not let Browser Use generate final Playwright specs; it supplies observation only.
+ - Do not use Playwright MCP or Computer Use in the default optimized SOP.
  - Do not mark ready-for-testing with P0 blockers or untriaged core P1 issues.
  - Every claim needs reusable scenario evidence, screenshot/trace evidence, or a documented manual run.
 
@@ -85,6 +100,8 @@ The skill is the conversational entrypoint. The harness/CLI is the execution aut
  demo start -> zsk demo start
  demo pause -> zsk demo pause
  demo resume -> zsk demo resume
+ demo run -> zsk demo run --optimized
+ demo run optimized -> zsk demo run --optimized
  demo run playwright -> zsk demo run --playwright
  demo run computer-use -> zsk demo run --computer-use
  demo run hybrid -> zsk demo run --hybrid
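The skill-to-CLI mapping can be treated as a plain lookup. A sketch, assuming only the phrases shown in this diff (plus `demo complete` from the hunk header); `DEMO_COMMANDS` and `toCliArgs` are hypothetical helpers, and the resulting argv would still be executed by the zsk harness.

```typescript
// Phrase -> argv pairs copied from the mapping above; the helper itself is hypothetical.
const DEMO_COMMANDS: Record<string, string[]> = {
  "demo start": ["zsk", "demo", "start"],
  "demo pause": ["zsk", "demo", "pause"],
  "demo resume": ["zsk", "demo", "resume"],
  "demo run": ["zsk", "demo", "run", "--optimized"], // new default in 1.7.0
  "demo run optimized": ["zsk", "demo", "run", "--optimized"],
  "demo run playwright": ["zsk", "demo", "run", "--playwright"],
  "demo run computer-use": ["zsk", "demo", "run", "--computer-use"],
  "demo run hybrid": ["zsk", "demo", "run", "--hybrid"],
  "demo complete": ["zsk", "demo", "complete"],
};

// Normalizes case and whitespace, then returns the argv (or undefined if unknown).
function toCliArgs(phrase: string): string[] | undefined {
  const key = phrase.trim().toLowerCase().replace(/\s+/g, " ");
  return Object.prototype.hasOwnProperty.call(DEMO_COMMANDS, key) ? DEMO_COMMANDS[key] : undefined;
}
```

The notable change in 1.7.0 is that a bare `demo run` now resolves to `--optimized`, so `toCliArgs("demo run")` and `toCliArgs("demo run optimized")` return the same argv.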
@@ -96,17 +113,22 @@ demo complete -> zsk demo complete
  ## Automation Priority
 
  1. Prefer deterministic scripts or app/test APIs when they are sufficient.
- 2. Prefer Playwright CLI/Test/UI mode for visible, repeatable demo performance, controllable stop/pause, screenshots, traces, reports, and scenario execution.
- 3. Use Playwright MCP when structured accessibility snapshots or locator planning are enough before execution.
- 4. Use Browser Use when an already logged-in browser, SSO session, extension, persistent profile, or human-like page goal identification is the relevant state source.
- 5. Use Computer Use when the surface is visual/system-level rather than reliably scriptable web DOM.
- 6. Use the hybrid lane when understanding and execution should be split: Playwright MCP, Browser Use, or Computer Use decides; Playwright executes when feasible and records evidence.
+ 2. Default to optimized Playwright generation: raw testing inputs plus Browser Use observation become `test-plan.json`, then Playwright CLI/Skills generate and execute `.spec.ts`.
+ 3. Prefer Playwright CLI/Test/UI mode for visible, repeatable demo performance, controllable stop/pause, screenshots, traces, reports, and scenario execution.
+ 4. Use Browser Use when an already logged-in browser, SSO session, extension, persistent profile, or human-like page goal identification is the relevant state source; keep it observation-only.
+ 5. Use the explicit legacy hybrid lane when understanding and execution should be split across Playwright MCP, Browser Use, or Computer Use.
+ 6. Use Computer Use when the surface is visual/system-level rather than reliably scriptable web DOM.
 
- Browser Use or Computer Use must record why scripts, Playwright CLI/MCP, storageState, persistent context, or CDP were insufficient.
+ Browser Use or Computer Use outside optimized mode must record why scripts, Playwright CLI, storageState, persistent context, or CDP were insufficient.
 
  ## Tool Bridge
 
- The hybrid lane exchanges structured artifacts:
+ The optimized lane exchanges structured artifacts:
+
+ - `test-plan.json`: source-aligned test intent, preconditions, Browser Use observation summary, locator hints, steps, assertions, auth handoff, risks, and generated spec target.
+ - `.spec.ts`: final executable Playwright test generated from `test-plan.json`, following Playwright locator and assertion best practices.
+
+ The legacy hybrid lane exchanges structured artifacts:
 
  - `operation-plan.json`: page understanding from Playwright MCP, Browser Use, or Computer Use; next operation; target intent; candidate locator(s); auth/session handoff; confidence; and fallback note.
  - `playwright-execution.json`: selector/action attempted, result, screenshot/trace/video paths, UI mode/report link, scenario update, and issue link.
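Read together, the Automation Priority list and the Tool Bridge section say which lane runs and which artifacts it exchanges. A rough TypeScript sketch under stated assumptions: the `DemoContext` fields, the `browser_use`/`computer_use` tool labels, and both functions are hypothetical; only the artifact file names come from this release.

```typescript
type Lane = "script" | "optimized" | "playwright" | "hybrid" | "computer-use";

interface DemoContext {
  scriptCoversFlow: boolean; // a deterministic script or app/test API is sufficient
  explicitLane?: Lane;       // e.g. the user asked for `demo run hybrid`
}

// Artifact names from the Tool Bridge section; other lanes are left unspecified.
const LANE_ARTIFACTS: Partial<Record<Lane, string[]>> = {
  optimized: ["test-plan.json", ".spec.ts"],
  hybrid: ["operation-plan.json", "playwright-execution.json"],
};

// Scripts win when sufficient; otherwise `/demo` defaults to the optimized lane.
// Legacy hybrid and Computer Use are only reached by explicit request.
function chooseLane(ctx: DemoContext): Lane {
  if (ctx.explicitLane !== undefined) return ctx.explicitLane;
  if (ctx.scriptCoversFlow) return "script";
  return "optimized";
}

// Outside optimized mode, Browser Use or Computer Use must carry a recorded reason
// why scripts, Playwright CLI, storageState, persistent context, or CDP fell short.
function needsInsufficiencyNote(lane: Lane, tools: string[]): boolean {
  if (lane === "optimized") return false;
  return tools.indexOf("browser_use") !== -1 || tools.indexOf("computer_use") !== -1;
}
```

Keeping the hybrid and Computer Use lanes behind `explicitLane` is one way to honor "no silent fallback" while still leaving those lanes available.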
@@ -130,9 +152,9 @@ Demo has two sub-phases:
 
  The handoff loop is:
 
- 1. Identify: Browser Use or Playwright MCP identifies the human goal, current page state, likely control to click/type, and candidate locators. Browser Use is preferred when Playwright cannot know which visible element matches the user's intent.
- 2. Persist state: Browser Use observations must be written to local artifacts before handoff, including URL, candidate locator text, storage/session hint, profile source, visible login state, and privacy note.
- 3. Pre-write: Claude/agent converts source evidence plus Browser Use observations into Playwright specs, preferring role-based locators and including any auth handoff that Playwright can reproduce.
+ 1. Identify: Browser Use identifies the human goal, current page state, likely control to click/type, and candidate locators when Playwright cannot know which visible element matches the user's intent. In legacy hybrid mode, Playwright MCP or Computer Use may also identify the next operation.
+ 2. Persist state: zsk or the agent generator persists Browser Use observations into `test-plan.json` or a separate handoff note before spec generation, including URL, candidate locator text, storage/session hint, profile source, visible login state, and privacy note. Browser Use itself is not the repo writer.
+ 3. Pre-write: zsk or the agent generator converts source evidence plus Browser Use observations into `test-plan.json` and then Playwright specs, preferring role-based locators and including any auth handoff that Playwright can reproduce.
  4. Rehearse: Playwright runs the pre-written cases with `storageState`, persistent context, CDP, or fixture login when available, then records trace/UI/report evidence.
  5. Perform: Demo Show uses the rehearsed Playwright cases; successful runs are promoted to reusable verify/regression scenarios.
 
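Step 4 of the handoff loop rehearses with `storageState`, persistent context, CDP, or fixture login. This release's reference orders the login-state options as: fixture login or `storageState`, then persistent context or CDP, then Browser Use observation, then Computer Use. A hypothetical sketch of that ordering; every type and function name here is illustrative.

```typescript
// Hypothetical labels for the login-state options, in preference order.
type AuthStrategy =
  | "fixture-or-storageState"
  | "persistent-context-or-cdp"
  | "browser-use-observation"
  | "computer-use";

interface AuthSituation {
  controlledTestAccount: boolean;      // fixture login or saved storageState is available
  dedicatedProfileAcceptable: boolean; // a Playwright-owned profile or CDP attach is fine
  existingUserSession: boolean;        // user's logged-in profile, SSO, or extension session
  stateOnlyVisualOrSystem: boolean;    // state unreachable through browser automation
}

// Returns the first strategy that applies, or undefined so the caller can
// record a resource blocker instead of guessing.
function pickAuthStrategy(s: AuthSituation): AuthStrategy | undefined {
  if (s.controlledTestAccount) return "fixture-or-storageState";
  if (s.dedicatedProfileAcceptable) return "persistent-context-or-cdp";
  if (s.existingUserSession) return "browser-use-observation";
  if (s.stateOnlyVisualOrSystem) return "computer-use";
  return undefined;
}
```

In Playwright itself the first two strategies correspond to `browser.newContext({ storageState })`, `browserType.launchPersistentContext(userDataDir)`, and `browserType.connectOverCDP(endpointURL)`. In optimized mode a `computer-use` result would have to become a blocker, since that mode forbids Computer Use.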
@@ -140,17 +162,16 @@ Repeat the loop until the demo function point is passed, paused, or converted in
 
  ## Scenario Generation
 
- The agent should synthesize Playwright cases before the external demo from structured page info, source evidence, and Browser Use state handoff:
+ The agent should synthesize Playwright cases before the external demo from raw test cases, source evidence, and Browser Use state handoff:
 
  1. Load SRS/spec/design rows, formal QA cases, existing automation/e2e cases, and relevant unit-test assertions/fixtures.
- 2. Read Playwright MCP `aria_snapshot` or accessibility tree; if Browser Use supplied login/session state, map it to Playwright `storageState`, persistent context, CDP, or documented manual setup.
- 3. Use Browser Use to map human intent to visible controls and candidate locators when Playwright cannot infer which control to operate.
- 4. Generate a Playwright spec using role-first locators, explicit auth bootstrap, and assertions derived from the source evidence.
- 5. Save the pre-write spec under `docs/{module}/scenarios/`.
- 6. Rehearse it with Playwright UI/trace/report evidence and preserve trace/screenshots/video.
- 7. Use the rehearsed case for Demo Show; do not improvise new clicks during the external demo unless the prepared case is blocked and the blocker is recorded.
+ 2. Read Browser Use observation when login/session/current-page state or human-intent locator mapping is needed; map auth to Playwright `storageState`, persistent context, CDP, or documented manual setup.
+ 3. Generate `test-plan.json` under `tests.derived_cases` with source links, step intent, locator hints, assertions, auth handoff, risks, and generated spec target.
+ 4. Generate a Playwright spec under `tests.automated.e2e` using role-first locators, explicit auth bootstrap, and assertions derived from source evidence.
+ 5. Rehearse it with Playwright UI/trace/report evidence and preserve trace/screenshots/video under the configured issue evidence path.
+ 6. Use the rehearsed case for Demo Show; do not improvise new clicks during the external demo unless the prepared case is blocked and the blocker is recorded.
 
- Browser Use is reserved for existing authenticated browser/profile state. Computer Use is reserved for visual/system-level cases where the structured page tree, Browser Use, and Playwright CLI/MCP evidence are insufficient.
+ Playwright MCP and Computer Use are legacy hybrid fallbacks only. Browser Use is reserved for existing authenticated browser/profile state and human-intent locator mapping, and remains observation-only in optimized mode.
 
  CLI example:
 
@@ -158,7 +179,6 @@ CLI example:
  zsk demo scenario generate \
  -m checkout \
  --test-case .raws/testing/checkout-happy-path.md \
- --snapshot .raws/testing/checkout.aria.md \
  --name "Checkout happy path"
  ```
 
package/demo/harness.yaml CHANGED
@@ -22,7 +22,25 @@ checks:
  - demo-session
  - issue-taxonomy
  - scenario-preservation
+ - optimized-test-plan
  - tool-bridge
+ optimized:
+ testPlan:
+ role: source-aligned-intermediate-contract
+ input: tests.raw_cases
+ fallbackInput: sources.testing
+ output: test-plan.json
+ playwrightSpec:
+ role: generated-executable-scenario
+ input: test-plan.json
+ output: "*.spec.ts"
+ browserUse:
+ role: observation-only
+ writesRepoArtifacts: false
+ forbidden:
+ - playwright_mcp
+ - computer_use
+ - operation-plan.json
  bridge:
  playwrightCli:
  role: low-token-execute-screenshot-trace
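The new `optimized` harness block lists forbidden tools and artifacts and marks Browser Use as a non-writer. Because the YAML indentation is flattened in this diff, the nesting below is an assumption, as are the `RunRecord` shape, the `browser_use` writer label, and `checkOptimizedRun`; this is only a sketch of what an `optimized-test-plan` check could verify.

```typescript
// Assumed shape of the `optimized` harness block (nesting is a guess).
interface OptimizedHarness {
  browserUse: { role: "observation-only"; writesRepoArtifacts: boolean };
  forbidden: string[];
}

const optimizedHarness: OptimizedHarness = {
  browserUse: { role: "observation-only", writesRepoArtifacts: false },
  forbidden: ["playwright_mcp", "computer_use", "operation-plan.json"],
};

// Hypothetical record of what a run did; tool and writer labels are illustrative.
interface RunRecord {
  toolsUsed: string[];
  artifactsWritten: { path: string; writer: string }[];
}

// Collects violations a check such as `optimized-test-plan` might report.
function checkOptimizedRun(h: OptimizedHarness, run: RunRecord): string[] {
  const violations: string[] = [];
  for (const tool of run.toolsUsed) {
    if (h.forbidden.indexOf(tool) !== -1) violations.push(`forbidden tool: ${tool}`);
  }
  for (const artifact of run.artifactsWritten) {
    // Compare on the file name so nested paths still match forbidden artifact names.
    const fileName = artifact.path.split("/").pop() || artifact.path;
    if (h.forbidden.indexOf(fileName) !== -1) violations.push(`forbidden artifact: ${artifact.path}`);
    if (artifact.writer === "browser_use" && !h.browserUse.writesRepoArtifacts) {
      violations.push(`Browser Use wrote ${artifact.path}`);
    }
  }
  return violations;
}
```

Returning a list rather than failing on the first hit lets one check report every violation in a run at once.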
@@ -1,13 +1,35 @@
  # Demo Automation Reference
 
- Use deterministic local scripts first when they fully cover the flow. For real UI demo automation, prefer the hybrid bridge:
+ Use deterministic local scripts first when they fully cover the flow. For test-case-driven web demos, `/demo` defaults to the optimized SOP:
+
+ ```text
+ tests.raw_cases / sources.testing
+ -> Browser Use observation handoff
+ -> test-plan.json
+ -> Playwright .spec.ts
+ -> Playwright Test/UI/debug evidence
+ ```
+
+ The optimized lane does not use Playwright MCP, Computer Use, or bridge artifacts. Use the legacy hybrid bridge only when optimized Playwright generation cannot represent the page or state.
 
  - Playwright: perform the visible demo run, allow controlled pause/termination, record screenshots/traces/video/reports, preserve scenario cases, and support reproducible auth through fixtures, `storageState`, persistent contexts, or CDP.
  - Playwright MCP: inspect structured accessibility snapshots and produce low-ambiguity operation plans.
- - Browser Use: identify the human-intent target and operate an existing or persistent logged-in browser profile when SSO, extensions, CAPTCHA-adjacent flows, or human browser state matters.
+ - Browser Use: observe the human-intent target and existing or persistent logged-in browser profile when SSO, extensions, CAPTCHA-adjacent flows, or human browser state matters.
  - Computer Use: understand visual/human-like or system-level context when DOM, ARIA, CDP, and Browser Use are insufficient.
 
- Use Playwright-only for stable scenarios. Use Browser Use for stateful browser sessions. Use Computer Use-only as an explicit visual/system fallback.
+ Use optimized mode for raw-test-case to Playwright generation. Use Playwright-only for stable scenarios. Use Browser Use for observation of stateful browser sessions. Use Computer Use-only as an explicit visual/system fallback.
+
+ ## Optimized SOP
+
+ In optimized mode:
+
+ 1. Read formal test cases from `tests.raw_cases` or `sources.testing`; these usually point into `.raws/testing`.
+ 2. Use Browser Use only to observe the logged-in/current page: URL, page title, visible controls, role/label hints, auth/session note, and privacy note.
+ 3. Have zsk or the agent generator write `test-plan.json` under `tests.derived_cases`.
+ 4. Generate final Playwright `.spec.ts` under `tests.automated.e2e`.
+ 5. Execute or rehearse with Playwright CLI/Test/UI mode and store evidence under the configured issue evidence directory.
+
+ Browser Use must not write repo artifacts, `test-plan.json`, or final `.spec.ts`. If raw cases, auth state, or generated output are missing, optimized mode records a blocker or exits non-successfully; it must not silently fall back to Playwright MCP, Computer Use, or legacy bridge behavior.
 
  ## Playwright Surfaces
 
@@ -17,7 +39,7 @@ Use the Playwright tool that matches the demo job:
  | --- | --- |
  | Playwright Test | Preserve reusable scenario cases and rerun them for smoke, verify, and regression. |
  | Playwright CLI/UI/Report | Visible demo performance, controlled stop/pause, token-efficient browser control, live session inspection, and replayable reports/traces. |
- | Playwright MCP | Agent-facing structured accessibility snapshots for page understanding and operation planning. |
+ | Playwright MCP | Legacy hybrid-only structured accessibility snapshots for page understanding and operation planning. Not used by optimized mode. |
  | Playwright Library | Deterministic browser scripts for screenshots, PDF, network interception, and custom evidence capture. |
 
  ## Browser State And Login
@@ -26,14 +48,14 @@ Do not use Computer Use just to keep a login session. Prefer this order:
 
  1. Playwright fixture login or `storageState` for controlled test accounts.
  2. Playwright persistent context or CDP connection when a dedicated browser profile is acceptable.
- 3. Browser Use when the user already has a logged-in browser/profile, SSO state, or extension-dependent session that should be preserved.
+ 3. Browser Use observation when the user already has a logged-in browser/profile, SSO state, or extension-dependent session that should be preserved.
  4. Computer Use only when the required state is visual/system-level or not reachable through browser automation.
 
  Browser Use runs must record the browser/profile/session source and whether credentials or personal data were visible.
 
  ## Browser Use To Playwright Handoff
 
- Browser Use should not be a throwaway visit. When it identifies what to click or which logged-in state matters, persist a handoff artifact:
+ Browser Use should not be a throwaway visit. When it identifies what to click or which logged-in state matters, preserve a structured handoff in the optimized `test-plan.json` or a separate agent note consumed by the zsk generator:
 
  - URL and page title.
  - Human goal and current page summary.
@@ -41,7 +63,7 @@ Browser Use should not be a throwaway visit. When it identifies what to click or
  - Login/profile/session source and privacy note.
  - Whether Playwright should use `storageState`, persistent context, CDP, or manual fixture setup.
 
- Playwright then performs the visible demo run from that handoff. If the run succeeds, promote the path into a Playwright spec; if it fails, pause the demo with selector/auth/session diagnostics instead of losing the Browser Use observation.
+ Playwright then performs the visible demo run from that handoff. Browser Use itself must not write the final Playwright spec. If the run succeeds, promote the path into a Playwright spec; if it fails, pause the demo with selector/auth/session diagnostics instead of losing the Browser Use observation.
 
  ## Pre-write Demo Cases
 
@@ -50,7 +72,7 @@ Demo should not discover the path live in front of the audience. Prepare it firs
  1. Collect source evidence: SRS, spec/design rows, formal QA cases, existing automation/e2e cases, and unit-test assertions/fixtures.
  2. Write a flow-first demo outline: first show the starting state and primary goal, then the core happy path, then dependent function points, then required branch or edge scenarios, then the final state and evidence. Each row needs function/business point, scenario, source alignment, Playwright case, presenter words, visible result, and next step.
  3. Browser Use captures login/profile state, page intent, visible targets, and candidate locators.
- 4. The agent maps the source evidence to those targets and writes Playwright pre-write specs under the configured scenario directory.
+ 4. zsk or the agent generator maps the source evidence to those targets, writes `test-plan.json`, and then writes Playwright pre-write specs under the configured scenario directory.
  5. Playwright rehearses the specs with UI mode, trace, video, or HTML report enabled.
  6. The external demo runs the rehearsed specs, so every step is visible, controllable, and stoppable.
  7. Any live drift pauses the demo and creates diagnostics; it does not erase the Browser Use handoff.
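Steps 3 and 4 split responsibilities: Browser Use supplies the observation, and zsk or the agent generator records it in `test-plan.json`. A sketch of that handoff with a completeness check follows. The field names loosely follow the handoff list in this release (URL and title, human goal, session source, privacy note, whether personal data was visible), and `toPlanHandoff` is hypothetical.

```typescript
// Hypothetical shape of a Browser Use observation handed to the generator.
interface BrowserUseObservation {
  url: string;
  title: string;
  humanGoal: string;
  candidateLocators: string[];   // e.g. role/label hints such as 'button "Place order"'
  sessionSource: string;         // browser/profile/session that was observed
  sensitiveDataVisible: boolean; // whether credentials or personal data were visible
  privacyNote: string;
}

type AuthExpectation = "storageState" | "persistent-context" | "cdp" | "manual-fixture";

interface PlanHandoff {
  source: "browser-use";
  observation: BrowserUseObservation;
  authExpectation: AuthExpectation;
}

// Run by zsk or the agent generator (not Browser Use). Incomplete observations
// are rejected so the gap surfaces as diagnostics instead of a half-filled plan.
function toPlanHandoff(obs: BrowserUseObservation, auth: AuthExpectation): PlanHandoff {
  const missing: string[] = [];
  if (!obs.url) missing.push("url");
  if (!obs.title) missing.push("title");
  if (!obs.sessionSource) missing.push("sessionSource");
  if (!obs.privacyNote) missing.push("privacyNote");
  if (missing.length > 0) {
    throw new Error(`incomplete Browser Use observation: ${missing.join(", ")}`);
  }
  return { source: "browser-use", observation: obs, authExpectation: auth };
}
```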
@@ -91,7 +113,18 @@ Keep detailed resources, handoff notes, and evidence tables below the flow. They
 
  ## Tool-call Bridge
 
- `operation-plan.json` should contain:
+ Optimized `test-plan.json` should contain:
+
+ - source raw case paths
+ - auth/storageState expectation
+ - Browser Use observation summary
+ - test data
+ - step intent/action/locator hints
+ - assertions
+ - generated spec target
+ - risks and blocker notes
+
+ Legacy hybrid `operation-plan.json` should contain:
 
  - current page summary
  - target function point
@@ -100,7 +133,7 @@ Keep detailed resources, handoff notes, and evidence tables below the flow. They
  - risk/confidence
  - fallback note
 
- Prefer `aria_snapshot + agent decision + getByRole` as the minimum stable loop:
+ In optimized mode, prefer `raw test case + Browser Use observation + test-plan.json + getByRole` as the minimum stable loop. In legacy hybrid mode, `aria_snapshot + agent decision + getByRole` is still valid:
 
  1. `aria_snapshot` gives the agent a compact semantic page tree.
  2. Browser Use fills the gap when the semantic tree does not reveal which visible control matches the human intent.
@@ -114,10 +147,11 @@ Use screenshots only when semantic snapshots or authenticated Browser Use observ
  When formal test cases exist, the agent should generate Playwright specs before the external demo:
 
  1. Parse the test case steps and expected results.
- 2. Map each step to the aria/accessibility snapshot.
- 3. Choose locators in this order: role, label, placeholder, test id, text.
- 4. Generate a spec with web-first assertions.
- 5. Mark tags: `demo`, `verify`, `regression` as appropriate.
+ 2. Map each step to Browser Use observation and locator hints when available.
+ 3. Write `test-plan.json` as the intermediate contract.
+ 4. Choose locators in this order: role, label, placeholder, test id, text.
+ 5. Generate a spec with web-first assertions.
+ 6. Mark tags: `demo`, `verify`, `regression` as appropriate.
 
  This turns demo automation into reusable test assets instead of one-off clicking.
 
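The locator order in this list (role, label, placeholder, test id, text) maps directly onto Playwright's built-in locator methods `getByRole`, `getByLabel`, `getByPlaceholder`, `getByTestId`, and `getByText`. A sketch that renders the best available hint as source text for a generated spec; the `LocatorHint` shape and the `render`/`bestLocator` helpers are hypothetical stand-ins for whatever `test-plan.json` actually stores.

```typescript
// Hypothetical shape of a locator hint stored in a test-plan.json step.
type LocatorHint =
  | { kind: "role"; role: string; name: string }
  | { kind: "label"; text: string }
  | { kind: "placeholder"; text: string }
  | { kind: "testId"; id: string }
  | { kind: "text"; text: string };

// Priority from the list above: role, label, placeholder, test id, text.
const LOCATOR_PRIORITY: LocatorHint["kind"][] = ["role", "label", "placeholder", "testId", "text"];

// Renders one hint as Playwright locator source code (JSON.stringify quotes and escapes).
function render(h: LocatorHint): string {
  if (h.kind === "role") return `page.getByRole(${JSON.stringify(h.role)}, { name: ${JSON.stringify(h.name)} })`;
  if (h.kind === "label") return `page.getByLabel(${JSON.stringify(h.text)})`;
  if (h.kind === "placeholder") return `page.getByPlaceholder(${JSON.stringify(h.text)})`;
  if (h.kind === "testId") return `page.getByTestId(${JSON.stringify(h.id)})`;
  return `page.getByText(${JSON.stringify(h.text)})`;
}

// Picks the highest-priority hint present, or undefined when there are no hints.
function bestLocator(hints: LocatorHint[]): string | undefined {
  for (const kind of LOCATOR_PRIORITY) {
    for (const h of hints) {
      if (h.kind === kind) return render(h);
    }
  }
  return undefined;
}
```

An `undefined` result is a natural place to record a locator blocker in the plan rather than falling back to screenshot-driven targeting.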
@@ -127,6 +161,8 @@ The generator should produce executable skeletons from structured input. Minimum
  test case markdown + aria snapshot
  ```
 
+ For optimized mode, replace `aria snapshot` with a Browser Use observation handoff when available.
+
  Minimum viable output:
 
  ```ts
@@ -147,7 +183,7 @@ await page.getByRole("button", { name: "Add" }).click()
 
  When a demo step is stable and reusable:
 
- 1. Save or update a scenario under `skills/demo/scenarios/` or the project configured scenario directory.
+ 1. Save or update a scenario under the project configured scenario directory.
  2. Link the scenario from `docs/{module}/demo-report.md`.
  3. Link screenshots and traces from `.issues/{module}/demo/_evidence/`.
  4. Mark reuse targets: `smoke`, `verify`, `regression`.
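The `test-plan.json` field list in this release (source raw cases, auth expectation, steps with locator hints, assertions, tags, spec target) is enough to emit a spec skeleton. A minimal generator sketch follows. The `TestPlan` field names and `generateSpec` are hypothetical. The emitted code uses real `@playwright/test` APIs (`test.use`, `page.goto`, `expect(...).toBeVisible()`), and the `{ tag: [...] }` test option assumes Playwright 1.42 or later.

```typescript
// Hypothetical test-plan.json shape following the field list in this release.
interface PlanStep {
  intent: string;
  action: "goto" | "click" | "fill";
  locator?: string; // rendered Playwright locator, e.g. page.getByRole("button", { name: "Add" })
  value?: string;   // URL for goto, input text for fill
}

interface TestPlan {
  name: string;
  sourceRawCases: string[];
  storageState?: string; // auth/storageState expectation
  steps: PlanStep[];
  assertions: string[];  // locators expected to be visible at the end
  tags: string[];        // e.g. demo, verify, regression
  specTarget: string;    // where the caller writes the generated .spec.ts
}

// Emits spec source text; zsk or the agent generator, not Browser Use, would write it to disk.
function generateSpec(plan: TestPlan): string {
  const out: string[] = [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `// Generated from: ${plan.sourceRawCases.join(", ")}`,
  ];
  if (plan.storageState) {
    out.push(`test.use({ storageState: ${JSON.stringify(plan.storageState)} });`);
  }
  const tags = JSON.stringify(plan.tags.map((t) => `@${t}`));
  out.push(``, `test(${JSON.stringify(plan.name)}, { tag: ${tags} }, async ({ page }) => {`);
  for (const step of plan.steps) {
    if (step.action !== "goto" && !step.locator) {
      throw new Error(`step "${step.intent}" has no locator hint`);
    }
    out.push(`  // ${step.intent}`);
    if (step.action === "goto") out.push(`  await page.goto(${JSON.stringify(step.value || "/")});`);
    else if (step.action === "click") out.push(`  await ${step.locator}.click();`);
    else out.push(`  await ${step.locator}.fill(${JSON.stringify(step.value || "")});`);
  }
  // Web-first assertions retry until the locator is visible or the timeout expires.
  for (const locator of plan.assertions) out.push(`  await expect(${locator}).toBeVisible();`);
  out.push(`});`, ``);
  return out.join("\n");
}
```

Failing fast on a step without a locator hint keeps an incomplete plan from producing a spec that only breaks during rehearsal.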
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@captain_z/zsk-skills",
- "version": "1.6.1",
+ "version": "1.7.0",
  "description": "ZNorth Standard Kit — core harness-first skill content package",
  "license": "MIT",
  "files": [