gm-skill 0.1.1 → 0.1.2

@@ -0,0 +1,118 @@
1
+ ---
2
+ name: planning
3
+ description: State machine orchestrator. Mutable discovery, PRD construction, and full PLAN→EXECUTE→EMIT→VERIFY→COMPLETE lifecycle. Invoke at session start and on any new unknown.
4
+ allowed-tools: Skill
5
+ ---
6
+
7
+ # Planning — PLAN phase
8
+
9
+ Translate the request into `.gm/prd.yml` and hand to `gm-execute`. Re-enter on any new unknown in any phase.
10
+
11
+ A `@<discipline>` sigil in the request scopes recall, codesearch, and memorize calls during PLAN to that discipline's store. Without one, retrievals fan across default plus enabled disciplines and writes land in default only.
12
+
13
+ Cross-cutting dispositions (autonomy, fix-on-sight, nothing-fake, browser-witness, scope, recall, memorize) live in `gm` SKILL.md; this skill only carries what is unique to PLAN.
14
+
15
+ ## Transitions
16
+
17
+ - PLAN done → `gm-execute`
18
+ - New unknown anywhere in chain → re-enter PLAN
19
+ - EXECUTE unresolvable after 2 passes → PLAN
20
+ - VERIFY: `.prd` empty + git clean + pushed → `update-docs`; else → `gm-execute`
21
+
22
+ Cannot stop while `.gm/prd.yml` has items, git is dirty, or commits are unpushed.
23
+
24
+ ## Orient
25
+
26
+ Open every plan with one parallel pack of `exec:recall` + `exec:codesearch` against the request's nouns. Hits land as `weak_prior`; misses confirm the unknown is fresh. The pack runs in one message.
27
+
28
+ **Auto-recall injection (skills-only platforms)**: derive a 2–6 word query from the request's nouns (subject, verb objects, key domain terms). Call `exec:recall <query>` at PLAN start before writing `.gm/prd.yml`, inline. This replaces the prompt-submit hook's auto-recall for platforms without hook infrastructure. Recall hits are injected as context into mutable discovery and PRD item acceptance criteria.
29
+
30
+ ## Mutable discovery
31
+
32
+ For each aspect of the work, ask: what do I not know, what could go wrong, what depends on what, what am I assuming. Unwitnessed assumptions are mutables.
33
+
34
+ Fault surfaces to scan: file existence, API shape, data format, dep versions, runtime behavior, env differences, error conditions, concurrency, integration seams, backwards compat, rollback paths, CI correctness.
35
+
36
+ Tag every item with a route family (grounding | reasoning | state | execution | observability | boundary | representation) and cross-reference the 16-failure taxonomy. `governance` skill holds the table.
37
+
38
+ `existingImpl=UNKNOWN` is the default; resolve via `exec:codesearch` before adding the item. An existing concern routes to consolidation, not addition.
39
+
40
+ Plan exits when zero new unknowns surfaced last pass AND every item has acceptance criteria AND deps are mapped.
41
+
42
+ ## .gm/mutables.yml — co-equal with .gm/prd.yml
43
+
44
+ Every unknown surfaced during PLAN lands as an entry in `.gm/mutables.yml` the same pass. Live during work, deleted when empty. Hook-gated: Write/Edit/NotebookEdit and `git commit`/`git push` are hard-blocked while any entry has `status: unknown`; turn-stop is hard-blocked the same way.
45
+
46
+ ```yaml
47
+ - id: kebab-id
48
+ claim: One-line statement of what is assumed
49
+ witness_method: exec:codesearch <query> | exec:nodejs import | exec:recall <query> | Read <path>
50
+ witness_evidence: ""
51
+ status: unknown
52
+ ```
53
+
54
+ `status: unknown` → `witnessed` only when `witness_evidence` is filled with concrete proof (file:line, codesearch hit, dispatched test output). Resolution lives in gm-execute. PRD items reference mutables via optional `mutables: [id1, id2]` field; an item is blocked while any referenced mutable is unresolved.
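
The blocking rule above can be sketched as a small predicate. This is a hypothetical helper, not code from the package: it assumes both files have already been parsed into arrays shaped like the YAML shown here, that `witnessed` is the only resolved status, and that a reference to a missing entry counts as unresolved.

```javascript
// Hypothetical helper: returns the ids of mutables that still block a PRD item.
// Assumes `mutables` is the parsed content of .gm/mutables.yml and that an
// entry is resolved only when its status is "witnessed".
function unresolvedMutables(item, mutables) {
  const byId = new Map(mutables.map((m) => [m.id, m]));
  return (item.mutables || []).filter((id) => {
    const entry = byId.get(id);
    // Assumption: a reference to an entry that does not exist is unresolved.
    return !entry || entry.status !== 'witnessed';
  });
}

// Illustrative data only; ids and evidence are made up for the example.
const mutables = [
  { id: 'api-shape', status: 'witnessed', witness_evidence: 'src/api.js:42' },
  { id: 'dep-version', status: 'unknown', witness_evidence: '' },
];
const item = { id: 'add-endpoint', mutables: ['api-shape', 'dep-version'] };

console.log(unresolvedMutables(item, mutables)); // [ 'dep-version' ]
```

An item with no `mutables` field yields an empty list, which matches the field being optional.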
55
+
56
+ ## .prd format
57
+
58
+ Path: `./.gm/prd.yml`. Write via the Write tool or by emitting a nodejs spool file (`in/nodejs/<N>.js`) that calls `fs.writeFileSync`. Delete the file when empty.
59
+
60
+ ```yaml
61
+ - id: kebab-id
62
+ subject: Imperative verb phrase
63
+ status: pending
64
+ description: Precise criterion
65
+ effort: small|medium|large
66
+ category: feature|bug|refactor|infra
67
+ route_family: grounding|reasoning|state|execution|observability|boundary|representation
68
+ load: 0.0-1.0
69
+ failure_modes: []
70
+ route_fit: unexamined|examined|dominant
71
+ authorization: none|weak_prior|witnessed
72
+ blocking: []
73
+ blockedBy: []
74
+ acceptance:
75
+ - binary criterion
76
+ edge_cases:
77
+ - failure mode
78
+ ```
79
+
80
+ `load` is consequence-if-wrong: 0.9 = headline collapses, 0.7 = sub-argument rebuilt, 0.4 = local patch, 0.1 = nothing breaks. Verification budget = `load × (1 − tier_confidence)`. λ>0.75 must reach witnessed before EMIT.
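
As a worked example of the budget formula, here is a minimal sketch. The function names are hypothetical, `tierConfidence` is assumed to be a number in [0, 1] (the text does not say how it is derived), and λ is assumed to denote `load`.

```javascript
// Sketch of: verification budget = load × (1 − tier_confidence).
// Both inputs are assumed to lie in [0, 1].
function verificationBudget(load, tierConfidence) {
  for (const [name, v] of [['load', load], ['tierConfidence', tierConfidence]]) {
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`${name} must be a number in [0, 1]`);
    }
  }
  return load * (1 - tierConfidence);
}

// Assumption: the λ threshold in the text applies to `load`.
function mustWitnessBeforeEmit(load) {
  return load > 0.75;
}

console.log(verificationBudget(0.9, 0.5)); // 0.45
console.log(mustWitnessBeforeEmit(0.9)); // true
console.log(mustWitnessBeforeEmit(0.7)); // false
```

Under these assumptions a high-load item with low confidence gets the largest budget, and full confidence drives the budget to zero regardless of load.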
81
+
82
+ `status`: pending → in_progress → completed (then remove). `effort`: small <15min | medium <45min | large >1h.
83
+
84
+ ## Parallel subagent launch
85
+
86
+ After `.prd` is written, launch up to 3 parallel `gm:gm` subagents for independent items in one message. Browser tasks serialize.
87
+
88
+ ```
89
+ Agent(subagent_type="gm:gm", prompt="Work on .prd item: <id>. .prd path: <path>. Item: <full YAML>.")
90
+ ```
91
+
92
+ Items not parallelizable → invoke `gm-execute` directly.
93
+
94
+ ## Observability gates in the plan
95
+
96
+ Server: every subsystem exposes `/debug/<subsystem>`; structured logs `{subsystem, severity, ts}`. Client: `window.__debug` live registry; modules register on mount. `console.log` is not observability. Discovery of a gap during PLAN adds a `.prd` item the same pass — never deferred.
97
+
98
+ `window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry fights the discipline — extend the registry instead.
99
+
100
+ ## Test discipline encoded in the plan
101
+
102
+ One `test.js` at project root, 200-line hard cap, real data, real system. No fixtures, mocks, or scattered tests. A second test runner under any name in any directory is a smuggled parallel surface.
103
+
104
+ The 200 lines are a *budget* for maximum surface coverage, not a target. Subsystems get one combined group each — names joined with `+` (`home+config+skin`, `mcp+swe+distributions+account+credpool`). When a new subsystem's failure mode overlaps an existing group's side-effects, fold the assertion in rather than creating a new group. When `wc -l test.js > 200`, the discipline is *merge groups + drop redundancy*, never split.
105
+
106
+ ## Execution norms encoded in the plan
107
+
108
+ Code execution AND utility verbs both write to `.gm/exec-spool/in/<lang-or-verb>/<N>.<ext>`. Languages live under `in/<lang>/` (nodejs, python, bash, typescript, go, rust, c, cpp, java, deno); verbs live under `in/<verb>/` (codesearch, recall, memorize, wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health). The spool watcher runs the file and streams to `out/<N>.out` (stdout) + `out/<N>.err` (stderr) line-by-line, then writes `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion. Both streams return as systemMessage with `--- stdout ---` / `--- stderr ---` separators. `in/` and `out/` are wiped at session start and at real-exit session end. Only `git` (and `gh`) run directly via Bash; never `Bash(node/npm/npx/bun)`, never `Bash(exec:<anything>)`. Spool paths in nodejs files are platform-literal — use `os.tmpdir()` and `path.join`. The spool enforces per-task timeouts; on timeout, partial output is preserved and the watcher emits `[exec timed out after Nms; partial output above]`.
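
The spool layout described above can be illustrated with a small path helper. This is a sketch under stated assumptions, not the package's dispatcher: `writeSpoolTask` is a hypothetical name, the caller supplies `<N>` and the extension because the text does not say how they are chosen, and the demo writes into a throwaway directory rather than a real `.gm/exec-spool`.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes one task file at <root>/in/<lang-or-verb>/<N>.<ext>
// and returns the paths where the watcher is described as writing its results.
function writeSpoolTask(root, kind, n, ext, body) {
  const dir = path.join(root, 'in', kind);
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${n}.${ext}`);
  fs.writeFileSync(file, body, 'utf8');
  return {
    file,
    stdout: path.join(root, 'out', `${n}.out`),
    stderr: path.join(root, 'out', `${n}.err`),
    meta: path.join(root, 'out', `${n}.json`),
  };
}

// Demo against a temporary directory; nothing here starts or talks to a watcher.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'spool-demo-'));
const task = writeSpoolTask(root, 'nodejs', 1, 'js', "console.log('hi');\n");
console.log(path.relative(root, task.file));
```

Building every path with `path.join` keeps the separators correct on each platform, which is the point of the platform-literal note above.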
109
+
110
+ `exec:codesearch` only; Grep/Glob/Find/Explore are hook-blocked. Start with two words, change or add one word per pass, and make at least four attempts before concluding the target is absent.
111
+
112
+ Pack runs use `Promise.allSettled`, each idea its own try/catch, under 12s per call.
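
One way to read this rule is sketched below. `runPack`, `withTimeout`, and the `{ name, run }` probe shape are hypothetical names chosen for illustration, and the 12000 ms default is an interpretation of "under 12s per call".

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs independent probes together. Each probe has its own try/catch, so one
// failure or timeout never rejects the whole pack.
async function runPack(probes, perCallMs = 12000) {
  const settled = await Promise.allSettled(
    probes.map(async ({ name, run }) => {
      try {
        return { name, ok: true, value: await withTimeout(run(), perCallMs) };
      } catch (err) {
        return { name, ok: false, error: err.message };
      }
    })
  );
  // allSettled is kept as a second safety net in case a wrapper itself rejects.
  return settled.map((s) =>
    s.status === 'fulfilled' ? s.value : { ok: false, error: String(s.reason) }
  );
}

// Illustrative probes; real ones would dispatch recall or codesearch calls.
runPack([
  { name: 'fast', run: async () => 'hit' },
  { name: 'broken', run: async () => { throw new Error('no socket'); } },
]).then((results) => console.log(results));
```

The timeout only stops waiting on a slow probe; it does not cancel the underlying work, which would need cooperation from the probe itself.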
113
+
114
+ ## Dev workflow encoded in the plan
115
+
116
+ No comments. 200-line per-file cap. Fail loud. No duplication. Scan before edit. AGENTS.md edits route through the memorize sub-agent only. CHANGELOG.md gets one entry per commit.
117
+
118
+ Minimal-code process, stop at the first that resolves: native → library → structure (map / pipeline) → write.
@@ -0,0 +1,63 @@
1
+ ---
2
+ name: gm
3
+ description: Orchestrator dispatching PLAN→EXECUTE→EMIT→VERIFY→UPDATE-DOCS skill chain; spool-driven task execution with session isolation
4
+ allowed-tools: Skill
5
+ compatible-platforms:
6
+ - gm-cc
7
+ - gm-gc
8
+ - gm-oc
9
+ - gm-kilo
10
+ - gm-codex
11
+ - gm-copilot-cli
12
+ - gm-vscode
13
+ - gm-cursor
14
+ - gm-zed
15
+ - gm-jetbrains
16
+ end-to-end: true
17
+ ---
18
+
19
+ # GM — Orchestrator
20
+
21
+ Invoke `planning` immediately. Phases cascade: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS.
22
+
23
+ The user's request is authorization. When scope is unclear, pick the maximum reachable shape and declare it — the user can interrupt. Doubts resolve via witnessed probe or recall, never by asking back except for destructive-irreversible actions uncovered by the PRD.
24
+
25
+ **What ships runs**: no stubs, mocks, placeholder returns, fixture-only paths, or demo-mode short-circuits. Real input through real code into real output. A shim is allowed only when delegating to real upstream behavior.
26
+
27
+ **CI is the build**: for Rust crates and the gm publish chain, push triggers CI auto-watch. Green signals authority. Local cargo build is not a witness.
28
+
29
+ **Every issue surfaces this turn**: pre-existing breaks, lint failures, drift, broken deps, stale generated files — all become PRD items and finish before COMPLETE.
30
+
31
+ **LLM provider**: acptoapi (127.0.0.1:4800) is the preferred provider when available. rs-plugkit session_start spawns acptoapi daemon and auto-detects ACP agents (opencode, kilo-code, codex, gemini-cli, qwen-code). All downstream platforms (rs-learn, freddie, gm-skill daemon mode) read OPENAI_BASE_URL environment variable and default to 127.0.0.1:4800. Anthropic SDK is fallback only when acptoapi socket is unavailable (CI, headless mode).
32
+
33
+ **rs-learn failure contract**: exec:memorize, exec:recall, and exec:codesearch failures must be reported explicitly with error details to the user. Fallback to AGENTS.md for memory preservation when socket/network unavailable. Never silently absorb errors because memory preservation requires explicit fallback. This rule applies across all phases (PLAN through UPDATE-DOCS).
34
+
35
+ **Spool dispatch chain**: write to `.gm/exec-spool/in/<lang>/<N>.<ext>` or `in/<verb>/<N>.txt`. Watcher executes and streams `out/<N>.out` + `out/<N>.err` + `out/<N>.json` metadata. Languages: nodejs, python, bash, typescript, go, rust, c, cpp, java, deno. Verbs: codesearch, recall, memorize, wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health.
36
+
37
+ **Session isolation**: SESSION_ID environment variable (or uuid fallback) threads through task dispatch for cleanup scope. rs-exec RPC handlers verify session_id match on all task-scoped operations.
38
+
39
+ **Code does mechanics; meaning routes through textprocessing skill**: summarize, classify, extract intent, rewrite, translate, semantic dedup, rank, label — all via `Agent(subagent_type='gm:textprocessing', ...)`.
40
+
41
+ **Recall before fresh execution**: before witnessing unknown via execution, recall first. Hits arrive as weak_prior; empty results confirm fresh unknown.
42
+
43
+ **Memorize is the back-half of witness**: resolution incomplete until fact lives outside this context window. Fire `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')` alongside witness, in parallel, never blocking.
44
+
45
+ **Parallel independent items**: up to 3 `gm:gm` subagents per message for independent PRD items. Serial for dependent items — no re-asking between them.
46
+
47
+ **Terse response**: fragments OK. `[thing] [action] [reason]. [next step].` Code, commits, PRs use normal prose.
48
+
49
+ ## End-to-End Phase Chaining (Skills-Based Platforms)
50
+
51
+ When `end-to-end: true` is present in SKILL.md frontmatter, skill output includes structured JSON on stdout (final line):
52
+
53
+ ```json
54
+ {"nextSkill": "gm-execute" | "gm-emit" | "gm-complete" | "update-docs" | null, "context": {PRD and state dict}, "phase": "PLAN" | "EXECUTE" | "EMIT" | "COMPLETE"}
55
+ ```
56
+
57
+ Platform adapters (vscode, cursor, zed, jetbrains) that support `end-to-end: true` detection:
58
+ 1. Invoke `Skill(skill="gm:gm")`
59
+ 2. Parse stdout for trailing JSON blob
60
+ 3. If `nextSkill` is non-null, invoke `Skill(skill="gm:<nextSkill>")` with context dict auto-passed
61
+ 4. Repeat until `nextSkill` is null
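
The four steps above can be sketched as an adapter loop. Everything named here is an assumption for illustration: `invokeSkill(name, context)` stands in for the platform's `Skill(...)` call and is assumed to resolve to the skill's stdout, the JSON is assumed to sit alone on the final line, and the hop limit of 6 is a guard added for the sketch.

```javascript
// Parses the last non-empty line of stdout as JSON; throws if it is not JSON.
function parseTrailingJson(stdout) {
  const lines = stdout.trimEnd().split('\n');
  return JSON.parse(lines[lines.length - 1]);
}

// Hypothetical adapter loop: start at gm:gm, follow nextSkill until it is null.
async function runChain(invokeSkill, maxHops = 6) {
  let skill = 'gm';
  let context = {};
  const visited = [];
  for (let hop = 0; hop < maxHops && skill; hop++) {
    visited.push(skill);
    const result = parseTrailingJson(await invokeSkill(`gm:${skill}`, context));
    context = result.context;
    skill = result.nextSkill;
  }
  if (skill) throw new Error(`chain still pending after ${maxHops} hops`);
  return { visited, context };
}

// Scripted stand-in for a real platform call, for illustration only.
const scripted = {
  'gm:gm': { nextSkill: 'gm-execute', context: { step: 1 }, phase: 'PLAN' },
  'gm:gm-execute': { nextSkill: null, context: { step: 2 }, phase: 'EXECUTE' },
};
const fakeInvoke = async (name) => `log line\n${JSON.stringify(scripted[name])}\n`;

runChain(fakeInvoke).then(({ visited }) => console.log(visited.join(' -> ')));
// gm -> gm-execute
```

A skill that pretty-prints its JSON across several lines would need a more tolerant parser than the final-line rule assumed here.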
62
+
63
+ This collapses 5 manual skill invocations into 1 user invocation + 4 transparent auto-dispatches, achieving perceived single-flow parity with gm-cc's subagent orchestration.
@@ -0,0 +1,113 @@
1
+ const fs = require('fs');
2
+ const path = require('path');
3
+
4
+ async function gmSkill(input) {
5
+ const context = {
6
+ request: input.request || '',
7
+ taskId: input.taskId || require('crypto').randomUUID(),
8
+ sessionId: process.env.SESSION_ID || require('crypto').randomUUID(),
9
+ timestamp: new Date().toISOString(),
10
+ phases: [],
11
+ prd: [],
12
+ mutables: {},
13
+ };
14
+
15
+ const skillsDir = path.join(__dirname);
16
+ const gmDir = path.dirname(path.dirname(skillsDir));
17
+
18
+ let currentSkill = 'planning';
19
+ let chainDepth = 0;
20
+ const maxDepth = 6;
21
+
22
+   console.error(`[gm] orchestrator starting; request="${context.request}"; task=${context.taskId}`);
23
+
24
+ while (currentSkill && chainDepth < maxDepth) {
25
+ chainDepth++;
26
+ console.error(`[gm] depth=${chainDepth}; invoking ${currentSkill}`);
27
+
28
+ const skillPath = path.join(skillsDir, currentSkill, 'index.js');
29
+ if (!fs.existsSync(skillPath)) {
30
+ console.error(`[gm] ERROR: skill not found: ${skillPath}`);
31
+ return {
32
+ nextSkill: null,
33
+ context,
34
+ phase: 'ERROR',
35
+ error: `Skill ${currentSkill} not found`,
36
+ };
37
+ }
38
+
39
+ try {
40
+ const skillModule = require(skillPath);
41
+ const result = await skillModule(input, context);
42
+
43
+ if (!result || typeof result !== 'object') {
44
+ console.error(`[gm] ERROR: skill returned invalid result`);
45
+ return {
46
+ nextSkill: null,
47
+ context,
48
+ phase: 'ERROR',
49
+ error: `Skill ${currentSkill} returned invalid result`,
50
+ };
51
+ }
52
+
53
+ context.phases.push({
54
+ skill: currentSkill,
55
+ phase: result.phase,
56
+ timestamp: new Date().toISOString(),
57
+ });
58
+
59
+ currentSkill = result.nextSkill || null;
60
+
61
+ if (!currentSkill) {
62
+ console.error(`[gm] chain complete`);
63
+ return {
64
+ nextSkill: null,
65
+ context,
66
+ phase: 'COMPLETE',
67
+ };
68
+ }
69
+
70
+ input = { ...input, context };
71
+ } catch (error) {
72
+ console.error(`[gm] ERROR in ${currentSkill}:`, error.message);
73
+ return {
74
+ nextSkill: null,
75
+ context,
76
+ phase: 'ERROR',
77
+ error: `${currentSkill} failed: ${error.message}`,
78
+ };
79
+ }
80
+ }
81
+
82
+ if (chainDepth >= maxDepth) {
83
+ console.error(`[gm] ERROR: max chain depth exceeded`);
84
+ return {
85
+ nextSkill: null,
86
+ context,
87
+ phase: 'ERROR',
88
+ error: 'Skill chain exceeded maximum depth',
89
+ };
90
+ }
91
+
92
+ return {
93
+ nextSkill: null,
94
+ context,
95
+ phase: 'COMPLETE',
96
+ };
97
+ }
98
+
99
+ if (require.main === module) {
100
+ const input = {
101
+ request: process.argv[2] || 'test task',
102
+ context: null,
103
+ };
104
+
105
+ gmSkill(input).then(result => {
106
+ console.log(JSON.stringify(result, null, 2));
107
+ }).catch(err => {
108
+ console.error('Fatal error:', err);
109
+ process.exit(1);
110
+ });
111
+ }
112
+
113
+ module.exports = gmSkill;
@@ -0,0 +1,106 @@
1
+ ---
2
+ name: gm-complete
3
+ description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
4
+ ---
5
+
6
+ # GM COMPLETE — Verify, then close
7
+
8
+ Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
9
+
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
11
+
12
+ ## Transitions
13
+
14
+ - `.prd` items remain → `gm-execute`
15
+ - `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
16
+ - Broken file output → `gm-emit`
17
+ - Wrong logic → `gm-execute`
18
+ - New unknown or wrong requirements → `planning`
19
+
20
+ Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
21
+
22
+ ## Mutables that must resolve before COMPLETE
23
+
24
+ - `witnessed_e2e` — real end-to-end run with witnessed output
25
+ - `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
26
+ - `git_clean` — `git status --porcelain` returns empty
27
+ - `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
28
+ - `ci_passed` — every GitHub Actions run reaches `conclusion: success`
29
+ - `mutables_resolved` — `.gm/mutables.yml` deleted OR every entry `status: witnessed`. Stop hook hard-blocks turn-stop while any entry is `status: unknown`.
30
+ - `prd_empty` — `.gm/prd.yml` deleted AFTER residual scan: enumerate every in-spirit reachable residual surfaced this session; any hit re-enters `planning`, appends PRD items, executes. Empty PRD is necessary, not sufficient — done = empty PRD AND zero reachable in-spirit residuals. Out-of-spirit-or-unreachable residuals are named in the response and skipped; everything else is this turn's work.
31
+ - `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
32
+ - `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
33
+
34
+ ## End-to-end verification
35
+
36
+ Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification. Write the e2e probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
37
+
38
+ ```
39
+ const { fn } = await import('/abs/path/to/module.js');
40
+ console.log(await fn(realInput));
41
+ ```
42
+
43
+ After every success, enumerate what remains — never stop at first green.
44
+
45
+ ## Browser validation gate
46
+
47
+ Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
48
+
49
+ Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
50
+
51
+ Protocol: boot the real server (or open the static page) on a known URL — witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)` asserting the specific invariant the change was supposed to establish — instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
52
+
53
+ Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
54
+
55
+ Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
56
+
57
+ Pre-flight: run `git diff --name-only origin/main..HEAD` directly via Bash, then dispatch a nodejs spool file that reads the diff list and filters lines matching `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
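
The filtering half of the pre-flight can be sketched as follows. The regex is the one quoted above; the function name and the idea of passing the diff output in as a string are assumptions for the example.

```javascript
// Pattern copied from the pre-flight description. It is unanchored on the left,
// so `client/` and `docs/` match anywhere in a path, not only at the root.
const BROWSER_SURFACE = /client\/|docs\/|\.html$|\.glsl$|\.frag$|\.vert$/;

// Hypothetical helper: takes the raw output of `git diff --name-only` and
// returns the paths that count as browser-facing.
function browserFacingPaths(diffOutput) {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && BROWSER_SURFACE.test(line));
}

// Illustrative paths only.
const diff = ['server/index.js', 'client/app.js', 'docs/index.html', 'README.md'].join('\n');
console.log(browserFacingPaths(diff)); // [ 'client/app.js', 'docs/index.html' ]
```

An empty result is what a `BROWSER EXEMPT` tag would need to cite; any hit means the browser gate applies.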
58
+
59
+ ## Integration test gate
60
+
61
+ Write to `.gm/exec-spool/in/nodejs/<N>.js`:
62
+
63
+ ```
64
+ const { execSync } = require('child_process');
65
+ try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
66
+ catch (e) { console.error('FAIL'); process.exit(1); }
67
+ ```
68
+
69
+ Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
70
+
71
+ ## Git enforcement
72
+
73
+ Run directly via Bash:
74
+
75
+ ```
76
+ git status --porcelain
77
+ git log origin/main..HEAD --oneline
78
+ ```
79
+
80
+ Both must return empty. Local commit without push is not complete.
81
+
82
+ ## CI is automated
83
+
84
+ The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
85
+
86
+ ## Hygiene sweep
87
+
88
+ 1. Files >200 lines → split
89
+ 2. Comments in code → remove
90
+ 3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
91
+ 4. Mock / stub / simulation files → delete
92
+ 5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
93
+ 6. Duplicate concern → regress to `planning` with restructuring instructions
94
+ 7. Hardcoded values → derive from ground truth
95
+ 8. Fallback / demo modes → remove, fail loud
96
+ 9. TODO.md → empty or deleted
97
+ 10. CHANGELOG.md → entries for this session
98
+ 11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
99
+ 12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
100
+ 13. Deploy / publish → if deployable, deploy; if npm package, publish
101
+ 14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
102
+ 15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
103
+
104
+ ## Completion
105
+
106
+ All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.
@@ -0,0 +1,118 @@
1
+ const fs = require('fs');
2
+ const path = require('path');
3
+ const yaml = require('yaml');
4
+ const git = require('../../lib/git.js');
5
+
6
+ async function completeSkill(input, parentContext) {
7
+ const context = parentContext || {
8
+ request: input.request || '',
9
+ taskId: input.taskId || require('crypto').randomUUID(),
10
+ sessionId: process.env.SESSION_ID || require('crypto').randomUUID(),
11
+ };
12
+
13
+ const gmDir = path.join(process.cwd(), '.gm');
14
+ const prdPath = path.join(gmDir, 'prd.yml');
15
+ const mutablesPath = path.join(gmDir, 'mutables.yml');
16
+
17
+ console.error(`[gm-complete] COMPLETE phase starting`);
18
+
19
+ let prd = [];
20
+
21
+ try {
22
+ if (fs.existsSync(prdPath)) {
23
+ const prdContent = fs.readFileSync(prdPath, 'utf8');
24
+ prd = yaml.parse(prdContent) || [];
25
+ }
26
+ } catch (err) {
27
+ console.error(`[gm-complete] ERROR reading prd.yml:`, err.message);
28
+ }
29
+
30
+ const verifications = {
31
+ gitClean: false,
32
+ gitPushed: false,
33
+ testsPassed: false,
34
+ prdEmpty: false,
35
+ mutablesResolved: false,
36
+ };
37
+
38
+ console.error(`[gm-complete] Running verifications...`);
39
+
40
+ try {
41
+ const statusResult = await git.status(context.sessionId);
42
+ verifications.gitClean = statusResult.ok && !statusResult.isDirty;
43
+ console.error(`[gm-complete] git clean: ${verifications.gitClean}`);
44
+ } catch (err) {
45
+ console.error(`[gm-complete] git status check failed:`, err.message);
46
+ }
47
+
48
+ try {
49
+ const logResult = await git.log(context.sessionId, 100);
50
+ verifications.gitPushed = logResult.ok && logResult.commits.length >= 0;
51
+ console.error(`[gm-complete] git log retrieved: ${verifications.gitPushed ? 'success' : 'failed'}`);
52
+ } catch (err) {
53
+ console.error(`[gm-complete] git log check failed:`, err.message);
54
+ }
55
+
56
+ if (fs.existsSync('test.js')) {
57
+ try {
58
+       require('child_process').execSync('node test.js', { stdio: 'pipe', timeout: 30000 });
59
+ verifications.testsPassed = true;
60
+ console.error(`[gm-complete] tests passed: true`);
61
+ } catch (err) {
62
+ console.error(`[gm-complete] tests failed:`, err.message);
63
+ }
64
+ } else {
65
+ verifications.testsPassed = true;
66
+ console.error(`[gm-complete] no test.js found, skipping tests`);
67
+ }
68
+
69
+ verifications.prdEmpty = !fs.existsSync(prdPath) || prd.length === 0;
70
+ console.error(`[gm-complete] prd empty: ${verifications.prdEmpty}`);
71
+
72
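+   verifications.mutablesResolved = !fs.existsSync(mutablesPath) || [].concat(yaml.parse(fs.readFileSync(mutablesPath, 'utf8')) || []).every(m => m && m.status === 'witnessed');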
73
+ console.error(`[gm-complete] mutables resolved: ${verifications.mutablesResolved}`);
74
+
75
+ const allVerified = Object.values(verifications).every(v => v);
76
+
77
+ if (!allVerified) {
78
+ console.error(`[gm-complete] Verifications failed, checking for incomplete work...`);
79
+ if (prd.length > 0) {
80
+ console.error(`[gm-complete] PRD still has items, returning to EXECUTE`);
81
+ return {
82
+ nextSkill: 'gm-execute',
83
+ context: { ...context, verifications },
84
+ phase: 'COMPLETE',
85
+ };
86
+ }
87
+ }
88
+
89
+   console.error(`[gm-complete] ${allVerified ? 'All verifications passed' : 'Verifications incomplete with empty PRD'}`);
90
+
91
+ const completeState = {
92
+ timestamp: new Date().toISOString(),
93
+ verifications,
94
+ ready: allVerified,
95
+ };
96
+
97
+ fs.writeFileSync(path.join(gmDir, 'complete-state.json'), JSON.stringify(completeState, null, 2), 'utf8');
98
+
99
+ context.verifications = verifications;
100
+
101
+ return {
102
+ nextSkill: allVerified ? 'update-docs' : null,
103
+ context,
104
+ phase: 'COMPLETE',
105
+ };
106
+ }
107
+
108
+ if (require.main === module) {
109
+ const input = { request: process.argv[2] || 'default task' };
110
+ completeSkill(input).then(result => {
111
+ console.log(JSON.stringify(result, null, 2));
112
+ }).catch(err => {
113
+ console.error('Fatal error:', err);
114
+ process.exit(1);
115
+ });
116
+ }
117
+
118
+ module.exports = completeSkill;
@@ -0,0 +1,106 @@
1
+ ---
2
+ name: gm-complete
3
+ description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
4
+ ---
5
+
6
+ # GM COMPLETE — Verify, then close
7
+
8
+ Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
9
+
10
+ Cross-cutting dispositions live in `gm` SKILL.md.
11
+
12
+ ## Transitions
13
+
14
+ - `.prd` items remain → `gm-execute`
15
+ - `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
16
+ - Broken file output → `gm-emit`
17
+ - Wrong logic → `gm-execute`
18
+ - New unknown or wrong requirements → `planning`
19
+
20
+ Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
21
+
22
+ ## Mutables that must resolve before COMPLETE
23
+
24
+ - `witnessed_e2e` — real end-to-end run with witnessed output
25
+ - `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
26
+ - `git_clean` — `git status --porcelain` returns empty
27
+ - `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
28
+ - `ci_passed` — every GitHub Actions run reaches `conclusion: success`
29
+ - `mutables_resolved` — `.gm/mutables.yml` deleted OR every entry `status: witnessed`. Stop hook hard-blocks turn-stop while any entry is `status: unknown`.
30
+ - `prd_empty` — `.gm/prd.yml` deleted AFTER residual scan: enumerate every in-spirit reachable residual surfaced this session; any hit re-enters `planning`, appends PRD items, executes. Empty PRD is necessary, not sufficient — done = empty PRD AND zero reachable in-spirit residuals. Out-of-spirit-or-unreachable residuals are named in the response and skipped; everything else is this turn's work.
31
+ - `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
32
+ - `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
33
+
34
+ ## End-to-end verification
35
+
36
+ Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification. Write the e2e probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
37
+
38
+ ```
39
+ const { fn } = await import('/abs/path/to/module.js');
40
+ console.log(await fn(realInput));
41
+ ```
42
+
43
+ After every success, enumerate what remains — never stop at first green.
44
+
45
+ ## Browser validation gate
46
+
47
+ Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
48
+
49
+ Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
50
+
51
+ Protocol: boot the real server (or open the static page) on a known URL — witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)` asserting the specific invariant the change was supposed to establish — instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
52
+
53
+ Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
54
+
55
+ Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
56
+
57
+ Pre-flight: run `git diff --name-only origin/main..HEAD` directly via Bash, then dispatch a nodejs spool file that reads the diff list and filters lines matching `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
58
+
59
+ ## Integration test gate
60
+
61
+ Write to `.gm/exec-spool/in/nodejs/<N>.js`:
62
+
63
+ ```
64
+ const { execSync } = require('child_process');
65
+ try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
66
+ catch (e) { console.error('FAIL'); process.exit(1); }
67
+ ```
68
+
69
+ Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
70
+
71
+ ## Git enforcement
72
+
73
+ Run directly via Bash:
74
+
75
+ ```
76
+ git status --porcelain
77
+ git log origin/main..HEAD --oneline
78
+ ```
79
+
80
+ Both must return empty. Local commit without push is not complete.
81
+
82
+ ## CI is automated
83
+
84
+ The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
85
+
86
+ ## Hygiene sweep
87
+
88
+ 1. Files >200 lines → split
89
+ 2. Comments in code → remove
90
+ 3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
91
+ 4. Mock / stub / simulation files → delete
92
+ 5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
93
+ 6. Duplicate concern → regress to `planning` with restructuring instructions
94
+ 7. Hardcoded values → derive from ground truth
95
+ 8. Fallback / demo modes → remove, fail loud
96
+ 9. TODO.md → empty or deleted
97
+ 10. CHANGELOG.md → entries for this session
98
+ 11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
99
+ 12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
100
+ 13. Deploy / publish → if deployable, deploy; if npm package, publish
101
+ 14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
102
+ 15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
103
+
104
+ ## Completion
105
+
106
+ All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.