npm - @captain_z/zsk-skills - Versions diffs - 1.6.1 → 1.7.0 - Mend

@captain_z/zsk-skills 1.6.1 → 1.7.0


Files changed (4)

package/demo/SKILL.md +41 -21
package/demo/harness.yaml +18 -0
package/demo/references/automation.md +51 -15
package/package.json +1 -1

package/demo/SKILL.md CHANGED

@@ -15,6 +15,7 @@ actions:
   - resume
   - terminate
   - complete
+  - run optimized
   - run playwright
   - run computer-use
   - run hybrid
@@ -35,7 +36,19 @@ triggers:
 # Demo
-Development demo before formal testing. Build a module-grouped, audience-ready demo outline, run rehearsed Playwright demonstration steps, record demo-only issues, preserve evidence, and leave reusable Playwright scenarios when possible.
+Development demo before formal testing. Build a module-grouped, audience-ready demo outline, generate Playwright cases from formal raw testing inputs through `test-plan.json`, run rehearsed Playwright demonstration steps, record demo-only issues, preserve evidence, and leave reusable Playwright scenarios when possible.
+Default `/demo` execution follows the optimized SOP:
+```text
+tests.raw_cases / sources.testing, normally .raws/testing
+  -> Browser Use observation handoff for logged-in/current-page state and locator hints
+  -> zsk/agent-written test-plan.json in tests.derived_cases
+  -> Playwright generator/CLI/Skills-written .spec.ts in tests.automated.e2e
+  -> Playwright Test/UI/debug execution and evidence under the configured issue evidence path
+```
+Browser Use is observation-only in this path. It may describe URL, page title, visible targets, role/label hints, auth/session notes, and privacy concerns. It must not write repo artifacts, `test-plan.json`, or final `.spec.ts` files. Optimized `/demo` must not silently fall back to Playwright MCP, Computer Use, or the legacy hybrid bridge.
 Demo should execute and refine Playwright scenarios that were planned earlier by spec/design/task. It should not be the first stage that invents UI scenarios unless upstream artifacts are missing, in which case the gap must be recorded as a resource blocker or learning proposal.
@@ -66,7 +79,7 @@ Every demo step must have complete source alignment:
 Do not include orphan demo steps that cannot be traced to those sources. Do not omit required function points from the PRD/SRS unless the omission is called out as a known gap with owner and reason.
-If a formal test case and structured page information or Browser Use handoff are available, demo may generate or refresh a Playwright case before running it. Prefer `aria_snapshot + Browser Use handoff + test case + getByRole` over screenshot-driven generation.
+If a formal test case and Browser Use handoff are available, demo should generate or refresh a Playwright case before running it. Prefer `raw test case + Browser Use observation + test-plan.json + Playwright locator generation` over screenshot-driven generation. Use the legacy hybrid lane explicitly when MCP or Computer Use is required.
 ## Operating Constraints
@@ -74,6 +87,8 @@ If a formal test case and structured page information or Browser Use handoff are
 - Do not invent flows when spec/design/task resources are missing; record the resource gap.
 - Do not use Browser Use when Playwright can reproduce the flow with storageState, a persistent context, or CDP.
 - Do not use Computer Use when deterministic Playwright, Browser Use, or app APIs are sufficient.
+- Do not let Browser Use generate final Playwright specs; it supplies observation only.
+- Do not use Playwright MCP or Computer Use in the default optimized SOP.
 - Do not mark ready-for-testing with P0 blockers or untriaged core P1 issues.
 - Every claim needs reusable scenario evidence, screenshot/trace evidence, or a documented manual run.
@@ -85,6 +100,8 @@ The skill is the conversational entrypoint. The harness/CLI is the execution aut
 demo start              -> zsk demo start
 demo pause              -> zsk demo pause
 demo resume             -> zsk demo resume
+demo run                -> zsk demo run --optimized
+demo run optimized      -> zsk demo run --optimized
 demo run playwright     -> zsk demo run --playwright
 demo run computer-use   -> zsk demo run --computer-use
 demo run hybrid         -> zsk demo run --hybrid
@@ -96,17 +113,22 @@ demo complete           -> zsk demo complete
 ## Automation Priority
 1. Prefer deterministic scripts or app/test APIs when they are sufficient.
-2. Prefer Playwright CLI/Test/UI mode for visible, repeatable demo performance, controllable stop/pause, screenshots, traces, reports, and scenario execution.
-3. Use Playwright MCP when structured accessibility snapshots or locator planning are enough before execution.
-4. Use Browser Use when an already logged-in browser, SSO session, extension, persistent profile, or human-like page goal identification is the relevant state source.
-5. Use Computer Use when the surface is visual/system-level rather than reliably scriptable web DOM.
-6. Use the hybrid lane when understanding and execution should be split: Playwright MCP, Browser Use, or Computer Use decides; Playwright executes when feasible and records evidence.
+2. Default to optimized Playwright generation: raw testing inputs plus Browser Use observation become `test-plan.json`, then Playwright CLI/Skills generate and execute `.spec.ts`.
+3. Prefer Playwright CLI/Test/UI mode for visible, repeatable demo performance, controllable stop/pause, screenshots, traces, reports, and scenario execution.
+4. Use Browser Use when an already logged-in browser, SSO session, extension, persistent profile, or human-like page goal identification is the relevant state source; keep it observation-only.
+5. Use the explicit legacy hybrid lane when understanding and execution should be split across Playwright MCP, Browser Use, or Computer Use.
+6. Use Computer Use when the surface is visual/system-level rather than reliably scriptable web DOM.
-Browser Use or Computer Use must record why scripts, Playwright CLI/MCP, storageState, persistent context, or CDP were insufficient.
+Browser Use or Computer Use outside optimized mode must record why scripts, Playwright CLI, storageState, persistent context, or CDP were insufficient.
 ## Tool Bridge
-The hybrid lane exchanges structured artifacts:
+The optimized lane exchanges structured artifacts:
+- `test-plan.json`: source-aligned test intent, preconditions, Browser Use observation summary, locator hints, steps, assertions, auth handoff, risks, and generated spec target.
+- `.spec.ts`: final executable Playwright test generated from `test-plan.json`, following Playwright locator and assertion best practices.
+The legacy hybrid lane exchanges structured artifacts:
 - `operation-plan.json`: page understanding from Playwright MCP, Browser Use, or Computer Use; next operation; target intent; candidate locator(s); auth/session handoff; confidence; and fallback note.
 - `playwright-execution.json`: selector/action attempted, result, screenshot/trace/video paths, UI mode/report link, scenario update, and issue link.
@@ -130,9 +152,9 @@ Demo has two sub-phases:
 The handoff loop is:
-1. Identify: Browser Use or Playwright MCP identifies the human goal, current page state, likely control to click/type, and candidate locators. Browser Use is preferred when Playwright cannot know which visible element matches the user's intent.
-2. Persist state: Browser Use observations must be written to local artifacts before handoff, including URL, candidate locator text, storage/session hint, profile source, visible login state, and privacy note.
-3. Pre-write: Claude/agent converts source evidence plus Browser Use observations into Playwright specs, preferring role-based locators and including any auth handoff that Playwright can reproduce.
+1. Identify: Browser Use identifies the human goal, current page state, likely control to click/type, and candidate locators when Playwright cannot know which visible element matches the user's intent. In legacy hybrid mode, Playwright MCP or Computer Use may also identify the next operation.
+2. Persist state: zsk or the agent generator persists Browser Use observations into `test-plan.json` or a separate handoff note before spec generation, including URL, candidate locator text, storage/session hint, profile source, visible login state, and privacy note. Browser Use itself is not the repo writer.
+3. Pre-write: zsk or the agent generator converts source evidence plus Browser Use observations into `test-plan.json` and then Playwright specs, preferring role-based locators and including any auth handoff that Playwright can reproduce.
 4. Rehearse: Playwright runs the pre-written cases with `storageState`, persistent context, CDP, or fixture login when available, then records trace/UI/report evidence.
 5. Perform: Demo Show uses the rehearsed Playwright cases; successful runs are promoted to reusable verify/regression scenarios.
@@ -140,17 +162,16 @@ Repeat the loop until the demo function point is passed, paused, or converted in
 ## Scenario Generation
-The agent should synthesize Playwright cases before the external demo from structured page info, source evidence, and Browser Use state handoff:
+The agent should synthesize Playwright cases before the external demo from raw test cases, source evidence, and Browser Use state handoff:
 1. Load SRS/spec/design rows, formal QA cases, existing automation/e2e cases, and relevant unit-test assertions/fixtures.
-2. Read Playwright MCP `aria_snapshot` or accessibility tree; if Browser Use supplied login/session state, map it to Playwright `storageState`, persistent context, CDP, or documented manual setup.
-3. Use Browser Use to map human intent to visible controls and candidate locators when Playwright cannot infer which control to operate.
-4. Generate a Playwright spec using role-first locators, explicit auth bootstrap, and assertions derived from the source evidence.
-5. Save the pre-write spec under `docs/{module}/scenarios/`.
-6. Rehearse it with Playwright UI/trace/report evidence and preserve trace/screenshots/video.
-7. Use the rehearsed case for Demo Show; do not improvise new clicks during the external demo unless the prepared case is blocked and the blocker is recorded.
+2. Read Browser Use observation when login/session/current-page state or human-intent locator mapping is needed; map auth to Playwright `storageState`, persistent context, CDP, or documented manual setup.
+3. Generate `test-plan.json` under `tests.derived_cases` with source links, step intent, locator hints, assertions, auth handoff, risks, and generated spec target.
+4. Generate a Playwright spec under `tests.automated.e2e` using role-first locators, explicit auth bootstrap, and assertions derived from source evidence.
+5. Rehearse it with Playwright UI/trace/report evidence and preserve trace/screenshots/video under the configured issue evidence path.
+6. Use the rehearsed case for Demo Show; do not improvise new clicks during the external demo unless the prepared case is blocked and the blocker is recorded.
-Browser Use is reserved for existing authenticated browser/profile state. Computer Use is reserved for visual/system-level cases where the structured page tree, Browser Use, and Playwright CLI/MCP evidence are insufficient.
+Playwright MCP and Computer Use are legacy hybrid fallbacks only. Browser Use is reserved for existing authenticated browser/profile state and human-intent locator mapping, and remains observation-only in optimized mode.
 CLI example:
@@ -158,7 +179,6 @@ CLI example:
 zsk demo scenario generate \
   -m checkout \
   --test-case .raws/testing/checkout-happy-path.md \
-  --snapshot .raws/testing/checkout.aria.md \
   --name "Checkout happy path"
 ```
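
The `test-plan.json` contract described in this file's diff can be pictured as a typed record plus a small validity check. The sketch below is only an illustration: the field names, the `validateTestPlan` helper, and the `tests/e2e/...` path are hypothetical, inferred from the prose in the diff and not taken from the zsk schema.

```typescript
// Hypothetical shape for test-plan.json. Field names are illustrative only;
// the real zsk schema is not shown in this diff.
interface TestPlanStep {
  intent: string;
  locatorHint?: string;
  assertion?: string;
}

interface TestPlan {
  sourceCases: string[]; // raw case paths, normally under .raws/testing
  preconditions: string[];
  observationSummary: string; // Browser Use observation, read-only input
  authHandoff: "storageState" | "persistentContext" | "cdp" | "manual";
  steps: TestPlanStep[];
  risks: string[];
  specTarget: string; // where the generated .spec.ts should be written
}

// Returns a list of problems; an empty list means the plan looks usable.
function validateTestPlan(plan: TestPlan): string[] {
  const problems: string[] = [];
  if (plan.sourceCases.length === 0) {
    problems.push("no source raw case linked (orphan plan)");
  }
  if (plan.steps.length === 0) {
    problems.push("plan has no steps");
  } else if (!plan.steps.some((step) => Boolean(step.assertion))) {
    // Assumption: a usable plan carries at least one assertion.
    problems.push("plan has no assertions");
  }
  if (!/\.spec\.ts$/.test(plan.specTarget)) {
    problems.push("generated spec target must be a .spec.ts file");
  }
  return problems;
}

const examplePlan: TestPlan = {
  sourceCases: [".raws/testing/checkout-happy-path.md"],
  preconditions: ["test account is logged in"],
  observationSummary: "Checkout page open; 'Place order' button visible",
  authHandoff: "storageState",
  steps: [
    {
      intent: "submit the order",
      locatorHint: "button named Place order",
      assertion: "confirmation heading is visible",
    },
  ],
  risks: [],
  specTarget: "tests/e2e/checkout-happy-path.spec.ts", // hypothetical path
};

console.log(validateTestPlan(examplePlan)); // prints []
```

Keeping the check separate from spec generation means a plan with no linked source case can be rejected before any `.spec.ts` is written, which matches the rule against orphan demo steps.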

package/demo/harness.yaml CHANGED

@@ -22,7 +22,25 @@ checks:
   - demo-session
   - issue-taxonomy
   - scenario-preservation
+  - optimized-test-plan
   - tool-bridge
+optimized:
+  testPlan:
+    role: source-aligned-intermediate-contract
+    input: tests.raw_cases
+    fallbackInput: sources.testing
+    output: test-plan.json
+  playwrightSpec:
+    role: generated-executable-scenario
+    input: test-plan.json
+    output: "*.spec.ts"
+  browserUse:
+    role: observation-only
+    writesRepoArtifacts: false
+  forbidden:
+    - playwright_mcp
+    - computer_use
+    - operation-plan.json
 bridge:
   playwrightCli:
     role: low-token-execute-screenshot-trace
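
One way to read the new `optimized` block is as a policy that a run either satisfies or violates. The following is a minimal sketch under stated assumptions: `RunRecord`, `checkOptimizedRun`, and the tool identifiers `browser_use` and `playwright_test` are hypothetical, and only the `forbidden` entries and the `writesRepoArtifacts: false` flag come from the YAML above.

```typescript
// Hypothetical in-memory mirror of the `optimized` block in harness.yaml.
interface OptimizedPolicy {
  forbidden: string[];
  browserUseWritesRepoArtifacts: boolean;
}

// Hypothetical summary of what one demo run did.
interface RunRecord {
  toolsUsed: string[];
  artifactsWritten: string[];
  browserUseWrote: string[]; // files written by Browser Use itself
}

const optimizedPolicy: OptimizedPolicy = {
  forbidden: ["playwright_mcp", "computer_use", "operation-plan.json"],
  browserUseWritesRepoArtifacts: false,
};

// Collects every violation instead of stopping at the first one.
function checkOptimizedRun(record: RunRecord, policy: OptimizedPolicy): string[] {
  const violations: string[] = [];
  const seen = record.toolsUsed.concat(record.artifactsWritten);
  for (const item of seen) {
    if (policy.forbidden.indexOf(item) !== -1) {
      violations.push(`forbidden in optimized mode: ${item}`);
    }
  }
  if (!policy.browserUseWritesRepoArtifacts && record.browserUseWrote.length > 0) {
    violations.push(
      `Browser Use is observation-only but wrote: ${record.browserUseWrote.join(", ")}`
    );
  }
  return violations;
}

const cleanRun: RunRecord = {
  toolsUsed: ["browser_use", "playwright_test"],
  artifactsWritten: ["test-plan.json"],
  browserUseWrote: [],
};

console.log(checkOptimizedRun(cleanRun, optimizedPolicy)); // prints []
```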

package/demo/references/automation.md CHANGED

@@ -1,13 +1,35 @@
 # Demo Automation Reference
-Use deterministic local scripts first when they fully cover the flow. For real UI demo automation, prefer the hybrid bridge:
+Use deterministic local scripts first when they fully cover the flow. For test-case-driven web demos, `/demo` defaults to the optimized SOP:
+```text
+tests.raw_cases / sources.testing
+  -> Browser Use observation handoff
+  -> test-plan.json
+  -> Playwright .spec.ts
+  -> Playwright Test/UI/debug evidence
+```
+The optimized lane does not use Playwright MCP, Computer Use, or bridge artifacts. Use the legacy hybrid bridge only when optimized Playwright generation cannot represent the page or state.
 - Playwright: perform the visible demo run, allow controlled pause/termination, record screenshots/traces/video/reports, preserve scenario cases, and support reproducible auth through fixtures, `storageState`, persistent contexts, or CDP.
 - Playwright MCP: inspect structured accessibility snapshots and produce low-ambiguity operation plans.
-- Browser Use: identify the human-intent target and operate an existing or persistent logged-in browser profile when SSO, extensions, CAPTCHA-adjacent flows, or human browser state matters.
+- Browser Use: observe the human-intent target and existing or persistent logged-in browser profile when SSO, extensions, CAPTCHA-adjacent flows, or human browser state matters.
 - Computer Use: understand visual/human-like or system-level context when DOM, ARIA, CDP, and Browser Use are insufficient.
-Use Playwright-only for stable scenarios. Use Browser Use for stateful browser sessions. Use Computer Use-only as an explicit visual/system fallback.
+Use optimized mode for raw-test-case to Playwright generation. Use Playwright-only for stable scenarios. Use Browser Use for observation of stateful browser sessions. Use Computer Use-only as an explicit visual/system fallback.
+## Optimized SOP
+In optimized mode:
+1. Read formal test cases from `tests.raw_cases` or `sources.testing`; these usually point into `.raws/testing`.
+2. Use Browser Use only to observe the logged-in/current page: URL, page title, visible controls, role/label hints, auth/session note, and privacy note.
+3. Have zsk or the agent generator write `test-plan.json` under `tests.derived_cases`.
+4. Generate final Playwright `.spec.ts` under `tests.automated.e2e`.
+5. Execute or rehearse with Playwright CLI/Test/UI mode and store evidence under the configured issue evidence directory.
+Browser Use must not write repo artifacts, `test-plan.json`, or final `.spec.ts`. If raw cases, auth state, or generated output are missing, optimized mode records a blocker or exits non-successfully; it must not silently fall back to Playwright MCP, Computer Use, or legacy bridge behavior.
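
The no-silent-fallback rule can be sketched as a gate that reports blockers rather than switching lanes. Everything named here (`OptimizedInputs`, `gateOptimizedRun`, the blocker strings) is hypothetical; the three missing-input conditions are the ones listed in the paragraph above.

```typescript
// Hypothetical description of what an optimized run has available.
interface OptimizedInputs {
  rawCases: string[]; // formal cases from tests.raw_cases or sources.testing
  authStateAvailable: boolean;
  generatedSpecs: string[]; // .spec.ts files produced from test-plan.json
}

type GateOutcome =
  | { status: "ready" }
  | { status: "blocked"; blockers: string[] };

// Never falls back to another lane: missing inputs become recorded blockers.
function gateOptimizedRun(inputs: OptimizedInputs): GateOutcome {
  const blockers: string[] = [];
  if (inputs.rawCases.length === 0) blockers.push("missing raw cases");
  if (!inputs.authStateAvailable) blockers.push("missing auth state");
  if (inputs.generatedSpecs.length === 0) blockers.push("missing generated output");
  if (blockers.length === 0) {
    return { status: "ready" };
  }
  return { status: "blocked", blockers };
}

const outcome = gateOptimizedRun({
  rawCases: [".raws/testing/checkout-happy-path.md"],
  authStateAvailable: false,
  generatedSpecs: [],
});
console.log(outcome.status); // prints "blocked"
```

A caller would map `blocked` to a non-zero exit code, so the legacy hybrid lane only runs when someone requests it explicitly.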
 ## Playwright Surfaces
@@ -17,7 +39,7 @@ Use the Playwright tool that matches the demo job:
 | --- | --- |
 | Playwright Test | Preserve reusable scenario cases and rerun them for smoke, verify, and regression. |
 | Playwright CLI/UI/Report | Visible demo performance, controlled stop/pause, token-efficient browser control, live session inspection, and replayable reports/traces. |
-| Playwright MCP | Agent-facing structured accessibility snapshots for page understanding and operation planning. |
+| Playwright MCP | Legacy hybrid-only structured accessibility snapshots for page understanding and operation planning. Not used by optimized mode. |
 | Playwright Library | Deterministic browser scripts for screenshots, PDF, network interception, and custom evidence capture. |
 ## Browser State And Login
@@ -26,14 +48,14 @@ Do not use Computer Use just to keep a login session. Prefer this order:
 1. Playwright fixture login or `storageState` for controlled test accounts.
 2. Playwright persistent context or CDP connection when a dedicated browser profile is acceptable.
-3. Browser Use when the user already has a logged-in browser/profile, SSO state, or extension-dependent session that should be preserved.
+3. Browser Use observation when the user already has a logged-in browser/profile, SSO state, or extension-dependent session that should be preserved.
 4. Computer Use only when the required state is visual/system-level or not reachable through browser automation.
 Browser Use runs must record the browser/profile/session source and whether credentials or personal data were visible.
 ## Browser Use To Playwright Handoff
-Browser Use should not be a throwaway visit. When it identifies what to click or which logged-in state matters, persist a handoff artifact:
+Browser Use should not be a throwaway visit. When it identifies what to click or which logged-in state matters, preserve a structured handoff in the optimized `test-plan.json` or a separate agent note consumed by the zsk generator:
 - URL and page title.
 - Human goal and current page summary.
@@ -41,7 +63,7 @@ Browser Use should not be a throwaway visit. When it identifies what to click or
 - Login/profile/session source and privacy note.
 - Whether Playwright should use `storageState`, persistent context, CDP, or manual fixture setup.
-Playwright then performs the visible demo run from that handoff. If the run succeeds, promote the path into a Playwright spec; if it fails, pause the demo with selector/auth/session diagnostics instead of losing the Browser Use observation.
+Playwright then performs the visible demo run from that handoff. Browser Use itself must not write the final Playwright spec. If the run succeeds, promote the path into a Playwright spec; if it fails, pause the demo with selector/auth/session diagnostics instead of losing the Browser Use observation.
 ## Pre-write Demo Cases
@@ -50,7 +72,7 @@ Demo should not discover the path live in front of the audience. Prepare it firs
 1. Collect source evidence: SRS, spec/design rows, formal QA cases, existing automation/e2e cases, and unit-test assertions/fixtures.
 2. Write a flow-first demo outline: first show the starting state and primary goal, then the core happy path, then dependent function points, then required branch or edge scenarios, then the final state and evidence. Each row needs function/business point, scenario, source alignment, Playwright case, presenter words, visible result, and next step.
 3. Browser Use captures login/profile state, page intent, visible targets, and candidate locators.
-4. The agent maps the source evidence to those targets and writes Playwright pre-write specs under the configured scenario directory.
+4. zsk or the agent generator maps the source evidence to those targets, writes `test-plan.json`, and then writes Playwright pre-write specs under the configured scenario directory.
 5. Playwright rehearses the specs with UI mode, trace, video, or HTML report enabled.
 6. The external demo runs the rehearsed specs, so every step is visible, controllable, and stoppable.
 7. Any live drift pauses the demo and creates diagnostics; it does not erase the Browser Use handoff.
@@ -91,7 +113,18 @@ Keep detailed resources, handoff notes, and evidence tables below the flow. They
 ## Tool-call Bridge
-`operation-plan.json` should contain:
+Optimized `test-plan.json` should contain:
+- source raw case paths
+- auth/storageState expectation
+- Browser Use observation summary
+- test data
+- step intent/action/locator hints
+- assertions
+- generated spec target
+- risks and blocker notes
+Legacy hybrid `operation-plan.json` should contain:
 - current page summary
 - target function point
@@ -100,7 +133,7 @@ Keep detailed resources, handoff notes, and evidence tables below the flow. They
 - risk/confidence
 - fallback note
-Prefer `aria_snapshot + agent decision + getByRole` as the minimum stable loop:
+In optimized mode, prefer `raw test case + Browser Use observation + test-plan.json + getByRole` as the minimum stable loop. In legacy hybrid mode, `aria_snapshot + agent decision + getByRole` is still valid:
 1. `aria_snapshot` gives the agent a compact semantic page tree.
 2. Browser Use fills the gap when the semantic tree does not reveal which visible control matches the human intent.
@@ -114,10 +147,11 @@ Use screenshots only when semantic snapshots or authenticated Browser Use observ
 When formal test cases exist, the agent should generate Playwright specs before the external demo:
 1. Parse the test case steps and expected results.
-2. Map each step to the aria/accessibility snapshot.
-3. Choose locators in this order: role, label, placeholder, test id, text.
-4. Generate a spec with web-first assertions.
-5. Mark tags: `demo`, `verify`, `regression` as appropriate.
+2. Map each step to Browser Use observation and locator hints when available.
+3. Write `test-plan.json` as the intermediate contract.
+4. Choose locators in this order: role, label, placeholder, test id, text.
+5. Generate a spec with web-first assertions.
+6. Mark tags: `demo`, `verify`, `regression` as appropriate.
 This turns demo automation into reusable test assets instead of one-off clicking.
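
The locator order in the list above (role, label, placeholder, test id, text) can be sketched as a small function that emits Playwright locator source text. The `LocatorHints` shape and `chooseLocator` are hypothetical helpers for illustration; the emitted calls (`getByRole`, `getByLabel`, `getByPlaceholder`, `getByTestId`, `getByText`) are standard Playwright `Page` methods.

```typescript
// Hypothetical hints, as a test plan step might carry them.
interface LocatorHints {
  role?: { role: string; name: string };
  label?: string;
  placeholder?: string;
  testId?: string;
  text?: string;
}

// Emits locator source text in priority order: role, label, placeholder, test id, text.
function chooseLocator(hints: LocatorHints): string {
  if (hints.role) {
    const role = JSON.stringify(hints.role.role);
    const name = JSON.stringify(hints.role.name);
    return `page.getByRole(${role}, { name: ${name} })`;
  }
  if (hints.label) return `page.getByLabel(${JSON.stringify(hints.label)})`;
  if (hints.placeholder) return `page.getByPlaceholder(${JSON.stringify(hints.placeholder)})`;
  if (hints.testId) return `page.getByTestId(${JSON.stringify(hints.testId)})`;
  if (hints.text) return `page.getByText(${JSON.stringify(hints.text)})`;
  // No hint at all: surface the gap instead of guessing a selector.
  throw new Error("no locator hint available; record a blocker");
}

// A role hint wins even when a text hint is also present.
console.log(chooseLocator({ role: { role: "button", name: "Add" }, text: "Add" }));
// prints page.getByRole("button", { name: "Add" })
```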
@@ -127,6 +161,8 @@ The generator should produce executable skeletons from structured input. Minimum
 test case markdown + aria snapshot
 ```
+For optimized mode, replace `aria snapshot` with a Browser Use observation handoff when available.
 Minimum viable output:
 ```ts
@@ -147,7 +183,7 @@ await page.getByRole("button", { name: "Add" }).click()
 When a demo step is stable and reusable:
-1. Save or update a scenario under `skills/demo/scenarios/` or the project configured scenario directory.
+1. Save or update a scenario under the project configured scenario directory.
 2. Link the scenario from `docs/{module}/demo-report.md`.
 3. Link screenshots and traces from `.issues/{module}/demo/_evidence/`.
 4. Mark reuse targets: `smoke`, `verify`, `regression`.

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@captain_z/zsk-skills",
-  "version": "1.6.1",
+  "version": "1.7.0",
   "description": "ZNorth Standard Kit — core harness-first skill content package",
   "license": "MIT",
   "files": [