gm-copilot-cli 2.0.187 → 2.0.189

@@ -1,57 +1,70 @@
1
1
  ---
2
2
  name: gm-execute
3
- description: EXECUTE phase. Hypothesis proving, chain decomposition, import-based debugging, browser protocols, ground truth enforcement. Invoke when entering EXECUTE or snaking back from EMIT/VERIFY.
3
+ description: EXECUTE phase. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning, restart chain from PLAN.
4
4
  ---
5
5
 
6
6
  # GM EXECUTE — Resolving Every Unknown
7
7
 
8
- You are in the **EXECUTE** phase. Every mutable must resolve to KNOWN via witnessed execution before advancing.
8
+ You are in the **EXECUTE** phase. Resolve every named mutable via witnessed execution. Any new unknown = stop, snake to `planning`, restart chain.
9
9
 
10
10
  **GRAPH POSITION**: `PLAN → [EXECUTE] → EMIT → VERIFY → COMPLETE`
11
- - **Entry chain**: prompt-submit hook → `gm` skill → `planning` → `gm-execute` (here). Also entered via snake from EMIT or VERIFY.
11
+ - **Entry**: .prd exists with all unknowns named. Entered from `planning` or via snake from EMIT/VERIFY.
12
12
 
13
13
  ## TRANSITIONS
14
14
 
15
- **FORWARD (ladders)**:
16
- - All mutables resolved to KNOWN → invoke `gm-emit` skill
15
+ **FORWARD**: All mutables KNOWN → invoke `gm-emit` skill
17
16
 
18
- **BACKWARD (snakes), when to re-enter here**:
19
- - From EMIT: pre-emit debugging reveals logic error, hypothesis was wrong → snake back, re-run execution with corrected approach
20
- - From VERIFY: end-to-end debugging reveals runtime failure not caught in execution → snake back, re-execute with real system state
21
- - Self-loop: mutables still UNKNOWN after a pass → re-invoke `gm-execute` with broader debug scope. Never add stages.
17
+ **SELF-LOOP**: Mutable still UNKNOWN after one pass → re-run with different angle (max 2 passes then snake)
22
18
 
23
- **WHEN TO SNAKE BACK TO PLAN instead**: discovered hidden dependencies that require .prd restructure → invoke `planning` skill
24
-
25
- **Sub-skills** (invoke from within EXECUTE):
26
- - Code exploration → invoke `code-search` skill
27
- - Browser/UI debugging → invoke `agent-browser` skill
28
- - Servers/workers/daemons → invoke `process-management` skill
19
+ **BACKWARD**:
20
+ - New unknown discovered → invoke `planning` skill immediately, restart chain
21
+ - From EMIT: logic error → re-enter here, re-resolve mutable
22
+ - From VERIFY: runtime failure → re-enter here, re-resolve with real system state
29
23
 
30
24
  ## MUTABLE DISCIPLINE
31
25
 
32
- Enumerate every unknown as a named mutable. Each: name, expected value, current value, resolution method. Execute → witness → assign → compare. Zero variance = resolved. Unresolved = absolute barrier. Never narrate past an unresolved mutable. Trigger a snake if stuck.
26
+ Each mutable: name | expected | current | resolution method. Execute → witness → assign → compare. Zero variance = resolved. Unresolved after 2 passes = new unknown = snake to `planning`. Never narrate past an unresolved mutable.
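The mutable record and the two-pass limit described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `resolveMutable`, the `witness` callback, and the `server-port` example are hypothetical names, not part of the package.

```javascript
// Hypothetical sketch of the mutable discipline: execute, witness, assign, compare.
// `resolveMutable` and `witness` are illustrative names, not a gm API.
function resolveMutable(mutable, witness, maxPasses = 2) {
  for (let pass = 1; pass <= maxPasses; pass++) {
    const current = witness(pass);        // execute and witness real output
    mutable.current = current;            // assign
    if (current === mutable.expected) {   // compare: zero variance = resolved
      return { ...mutable, status: 'KNOWN', passes: pass };
    }
  }
  // Still unresolved after maxPasses: treat it as a new unknown and snake to planning.
  return { ...mutable, status: 'UNKNOWN', passes: maxPasses, next: 'planning' };
}

// Example mutable: name | expected | current | resolution method.
const portMutable = {
  name: 'server-port',
  expected: 3000,
  current: undefined,
  method: 'read config via exec:nodejs',
};

// Stand-in for two witnessed runs; a real witness would execute code.
const observations = [8080, 3000];
const resolvedPort = resolveMutable(portMutable, (pass) => observations[pass - 1]);
console.log(resolvedPort.status, resolvedPort.passes);
```

A strict `===` comparison is used here as one possible reading of "zero variance"; structured values would need a deep comparison instead.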
33
27
 
34
- ## EXECUTION DENSITY
28
+ ## CODE EXECUTION
35
29
 
36
- Each run ≤15s, packed with every related hypothesis. Group all related unknowns into one run. Never one idea per run. Witnessed output = ground truth. Narrated assumption = nothing.
30
+ **exec:<lang> is the only way to run code.** Bash tool body: `exec:<lang>\n<code>`
37
31
 
38
- **Parallel waves**: Launch ≤3 `gm:gm` subagents per wave via Task tool. Independent items simultaneously. Sequential execution of independent items = violation.
32
+ `exec:nodejs` (default) | `exec:bash` | `exec:python` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`
39
33
 
40
- ## CHAIN DECOMPOSITION
34
+ Lang auto-detected if omitted. `cwd` sets directory. File I/O via exec:nodejs + require('fs'). Only git in bash directly. `Bash(node/npm/npx/bun)` = violations.
41
35
 
42
- Break every multi-step operation before running end-to-end:
43
- 1. Number every distinct step (parse → validate → transform → write → confirm)
44
- 2. Per step: input shape, output shape, success condition, failure condition
45
- 3. Run step 1 in isolation → witness → assign mutable → proceed only when KNOWN
46
- 4. Run step 2 with step 1's witnessed output. Repeat for each step.
47
- 5. Debug adjacent pairs (1+2, 2+3...) for handoff correctness
48
- 6. Only after all pairs pass: run full chain
36
+ **Background tasks** (auto-backgrounded when execution exceeds 15s):
37
+ ```
38
+ exec:sleep
39
+ <task_id> [seconds]
40
+ ```
41
+ ```
42
+ exec:status
43
+ <task_id>
44
+ ```
45
+ ```
46
+ exec:close
47
+ <task_id>
48
+ ```
49
49
 
50
- Step failure → debug that step only, re-run from there. Never skip forward.
50
+ **Runner** (PM2-backed; all activity visible in `pm2 list` and `pm2 monit` in user terminal):
51
+ ```
52
+ exec:runner
53
+ start|stop|status
54
+ ```
55
+
56
+ ## CODEBASE EXPLORATION
57
+
58
+ ```
59
+ exec:codesearch
60
+ <natural language description of what you need>
61
+ ```
62
+
63
+ Alias: `exec:search`. Glob, Grep, Read-for-discovery, Explore, WebSearch = blocked.
51
64
 
52
65
  ## IMPORT-BASED DEBUGGING
53
66
 
54
- Always import actual codebase modules. Never rewrite logic inline — that debugs your reimplementation, not the real code.
67
+ Always import actual codebase modules. Never rewrite logic inline.
55
68
 
56
69
  ```
57
70
  exec:nodejs
@@ -61,41 +74,48 @@ console.log(await fn(realInput));
61
74
 
62
75
  Witnessed import output = resolved mutable. Reimplemented output = UNKNOWN.
63
76
 
64
- ## TOOL REFERENCE
77
+ ## EXECUTION DENSITY
65
78
 
66
- **`exec:<lang>`**: THE ONLY WAY TO RUN CODE. Bash tool body: `exec:<lang>\n<code>`. Languages: `exec:nodejs` (default) | `exec:python` | `exec:bash` | `exec:typescript` | `exec:go` | `exec:rust` | `exec:c` | `exec:cpp` | `exec:java` | `exec:deno` | `exec:cmd`. `cwd` sets directory. File I/O via exec:nodejs with require('fs'). Only git directly in bash.
79
+ Pack every related hypothesis into one run. Each run ≤15s. Witnessed output = ground truth. Narrated assumption = nothing.
67
80
 
68
- `Bash(node ...)` `Bash(npm ...)` `Bash(npx ...)` `Bash(bun ...)` = violations. Use `exec:<lang>`.
81
+ Parallel waves: ≤3 `gm:gm` subagents via Task tool; independent items run simultaneously, never sequentially.
69
82
 
70
- **`code-search`**: Invoke `code-search` skill. MANDATORY for all exploration. Glob/Grep/Read/Explore/WebSearch blocked. Fallback: `bun x codebasesearch <query>`.
83
+ ## CHAIN DECOMPOSITION
84
+
85
+ Break every multi-step operation before running end-to-end:
86
+ 1. Number every distinct step
87
+ 2. Per step: input shape, output shape, success condition, failure mode
88
+ 3. Run each step in isolation — witness — assign mutable — KNOWN before next
89
+ 4. Debug adjacent pairs for handoff correctness
90
+ 5. Only when all pairs pass: run full chain end-to-end
71
91
 
72
- **`agent-browser`**: Invoke `agent-browser` skill. Escalation: (1) `exec:agent-browser\n<js>` first → (2) skill + `__gm` globals → (3) navigate/click → (4) screenshot last resort.
92
+ Step failure revealing new unknown → snake to `planning`.
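The decomposition steps above can be sketched as a small runner that feeds each step the previous step's witnessed output and stops at the first failure. The step names (`parse`, `validate`, `transform`) and the `runChain` helper are hypothetical and only illustrate the idea; in the real workflow each step would import actual codebase modules.

```javascript
// Hypothetical steps, each with a run function and a success condition.
const chainSteps = [
  { name: 'parse', run: (s) => JSON.parse(s), ok: (out) => typeof out === 'object' && out !== null },
  { name: 'validate', run: (o) => ({ ...o, valid: Number.isInteger(o.count) }), ok: (out) => out.valid === true },
  { name: 'transform', run: (o) => o.count * 2, ok: (out) => out === 8 },
];

// Run steps one at a time; each step receives the previous witnessed output.
function runChain(steps, input) {
  const witnessed = [];
  let value = input;
  for (const step of steps) {
    let output;
    try {
      output = step.run(value);
    } catch (err) {
      // A thrown error is a step failure: report it, never skip forward.
      return { failedAt: step.name, witnessed, error: String(err) };
    }
    witnessed.push({ step: step.name, output });
    if (!step.ok(output)) return { failedAt: step.name, witnessed };
    value = output; // handoff to the next step
  }
  return { failedAt: null, witnessed, result: value };
}

const chainReport = runChain(chainSteps, '{"count": 4}');
console.log(chainReport.failedAt, chainReport.result);
```

Because every handoff uses witnessed output rather than assumed output, a failure points at one step or one adjacent pair instead of the whole chain.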
73
93
 
74
- **`process-management`** Invoke `process-management` skill. MANDATORY for all servers/workers/daemons. Pre-check before start. Delete on completion.
94
+ ## BROWSER DEBUGGING
75
95
 
76
- ## BROWSER DEBUGGING SCAFFOLD
96
+ Invoke `agent-browser` skill. Escalation — exhaust each before advancing:
97
+ 1. `exec:agent-browser\n<js>` — query DOM/state. Always first.
98
+ 2. `agent-browser` skill + `__gm` globals — instrument and capture
99
+ 3. navigate/click/type — only when real events required
100
+ 4. screenshot — last resort
77
101
 
78
- Inject before any browser state assertion:
102
+ `__gm` scaffold:
79
103
  ```js
80
104
  window.__gm = { captures: [], log: (...a) => window.__gm.captures.push({t:Date.now(),a}), assert: (l,c) => { window.__gm.captures.push({l,pass:!!c,val:c}); return !!c; }, dump: () => JSON.stringify(window.__gm.captures,null,2) };
81
105
  ```
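As a rough usage sketch of the scaffold above: the snippet below re-declares the same object and records one log entry and one assertion. The `win` alias is an assumption added only so the example also runs outside a browser, and the logged values are made up for illustration.

```javascript
// Assumption for this sketch: fall back to globalThis when no browser window exists.
const win = typeof window !== 'undefined' ? window : globalThis;

// Same shape as the scaffold above, written against `win`.
win.__gm = {
  captures: [],
  log: (...a) => win.__gm.captures.push({ t: Date.now(), a }),
  assert: (l, c) => { win.__gm.captures.push({ l, pass: !!c, val: c }); return !!c; },
  dump: () => JSON.stringify(win.__gm.captures, null, 2),
};

// Record an observation, then a labelled pass/fail check.
win.__gm.log('cart loaded', { items: 2 });
const titleOk = win.__gm.assert('title is set', 'Checkout'.length > 0);

// dump() returns every capture as JSON, ready to be read back as witnessed output.
console.log(titleOk, win.__gm.captures.length);
console.log(win.__gm.dump());
```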
82
106
 
83
- ## DUAL-SIDE DEBUGGING
84
-
85
- Backend via `exec:nodejs`, frontend via `agent-browser` + `__gm`. Neither substitutes the other. Single-side = UNKNOWN mutable = blocked gate.
86
-
87
107
  ## GROUND TRUTH
88
108
 
89
- Real services, real API responses, real timing. On discovering mocks/fakes/stubs: delete immediately, implement real paths. No .test.js/.spec.js files. No mock files. Delete on discovery.
109
+ Real services, real data, real timing. Mocks/fakes/stubs = delete immediately. No .test.js/.spec.js. Delete on discovery.
90
110
 
91
111
  ## CONSTRAINTS
92
112
 
93
- **Never**: `Bash(node/npm/npx/bun/python)` | fake data | mock files | Glob/Grep/Explore for discovery | puppeteer/playwright | screenshot before JS exhausted | independent items sequentially
113
+ **Never**: `Bash(node/npm/npx/bun)` | fake data | mock files | Glob/Grep/Explore | sequential independent items | absorb surprises silently
94
114
 
95
- **Always**: import real modules | witness every hypothesis | delete mocks on discovery | fix immediately | snake back when blocked
115
+ **Always**: witness every hypothesis | import real modules | snake to planning on any new unknown | fix immediately on discovery
96
116
 
97
117
  ---
98
118
 
99
119
  **→ FORWARD**: All mutables KNOWN → invoke `gm-emit` skill.
100
- **↩ SNAKE to EXECUTE**: hypothesis wrong → re-invoke `gm-execute` with corrected approach.
101
- **↩ SNAKE to PLAN**: .prd needs restructure → invoke `planning` skill.
120
+ **↺ SELF-LOOP**: Still UNKNOWN → re-run (max 2 passes).
121
+ **↩ SNAKE to PLAN**: Any new unknown → invoke `planning` skill, restart chain.
@@ -1,82 +1,84 @@
1
1
  ---
2
2
  name: planning
3
- description: PRD construction for work planning. Compulsory in PLAN phase. Builds .prd file as frozen dependency graph of every possible work item before execution begins. Triggers on any new task, multi-step work, or when gm enters PLAN state.
3
+ description: Mutable discovery and PRD construction. Invoke at session start and any time new unknowns surface during execution. Loop until no new mutables are discovered.
4
4
  allowed-tools: Write
5
5
  ---
6
6
 
7
- # PRD Construction
7
+ # PRD Construction — Mutable Discovery Loop
8
8
 
9
- You are in the **PLAN** phase. Build the .prd before any execution begins.
9
+ You are in the **PLAN** phase. Your job is to discover every unknown before execution begins.
10
10
 
11
11
  **GRAPH POSITION**: `[PLAN] → EXECUTE → EMIT → VERIFY → COMPLETE`
12
- - **Session entry chain**: prompt-submit hook → `gm` skill → `planning` skill (here).
12
+ - **Entry chain**: prompt-submit hook → `gm` skill → `planning` skill (here).
13
+ - **Also entered**: any time a new unknown surfaces in EXECUTE, EMIT, or VERIFY.
13
14
 
14
15
  ## TRANSITIONS
15
16
 
16
- **FORWARD (ladders)**:
17
- - .prd written → invoke `gm-execute` skill to begin EXECUTE
17
+ **FORWARD**:
18
+ - No new mutables discovered in latest pass → .prd is complete → invoke `gm-execute` skill
18
19
 
19
- **BACKWARD (snakes), when to return here**:
20
- - From EXECUTE: discovered unknowns require .prd restructure → re-invoke `planning` skill, revise .prd, re-enter EXECUTE
21
- - From EMIT: scope changed, current .prd items no longer match what needs to be done → re-invoke `planning` skill
22
- - From VERIFY: end-to-end reveals requirements were wrong → re-invoke `planning` skill, rewrite affected items
20
+ **SELF-LOOP (stay in PLAN)**:
21
+ - Each planning pass may surface new unknowns → add them to .prd → plan again
22
+ - Loop until a full pass produces zero new items
23
+ - Do not advance to EXECUTE while unknowns remain discoverable through reasoning alone
23
24
 
24
- **When to snake back to PLAN**: requirements changed | discovered hidden dependencies | .prd items are wrong/missing | scope expanded beyond current .prd
25
+ **BACKWARD (snakes back here from later phases)**:
26
+ - From EXECUTE: execution reveals an unknown not in .prd → snake here, add it, re-plan
27
+ - From EMIT: scope shifted mid-write → snake here, revise affected items, re-plan
28
+ - From VERIFY: end-to-end reveals requirement was wrong → snake here, rewrite items, re-plan
25
29
 
26
- ## Purpose
30
+ ## WHAT PLANNING MEANS
27
31
 
28
- The `.prd` is the single source of truth for remaining work. A frozen dependency graph capturing every possible item: steps, substeps, edge cases, corner cases, dependencies, transitive dependencies, unknowns, assumptions, decisions, trade-offs, acceptance criteria, scenarios, failure paths, recovery paths, integration points, state transitions, error conditions, boundary conditions, configuration variants, environment differences, backwards compatibility, rollback paths, verification steps.
32
+ Planning = exhaustive mutable discovery. For every aspect of the task ask:
33
+ - What do I not know? → name it as a mutable
34
+ - What could go wrong? → name it as an edge case item
35
+ - What depends on what? → map blocking/blockedBy
36
+ - What assumptions am I making? → validate each as a mutable
29
37
 
30
- Longer is better. Missing items means missing work.
38
+ **Iterate until**: a full reasoning pass adds zero new items to .prd.
31
39
 
32
- ## File Rules
40
+ Categories of unknowns to enumerate: file existence | API shape | data format | dependency versions | runtime behavior | environment differences | error conditions | concurrency | integration points | backwards compatibility | rollback paths | deployment steps | verification criteria
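The "iterate until a full pass adds zero new items" rule amounts to a fixed-point loop. A minimal sketch, assuming a hypothetical `reasoningPass` function that proposes items from the current .prd; the `maxPasses` guard and the `followUps` table are additions for this illustration and do not appear in the skill text.

```javascript
// Repeat discovery until one full pass adds nothing new.
// `reasoningPass` is a hypothetical callback that returns candidate items.
function discoverUntilStable(seedItems, reasoningPass, maxPasses = 20) {
  const prd = new Map(seedItems.map((item) => [item.id, item]));
  for (let pass = 1; pass <= maxPasses; pass++) {
    const found = reasoningPass([...prd.values()]).filter((item) => !prd.has(item.id));
    if (found.length === 0) {
      return { items: [...prd.values()], passes: pass }; // stable: ready to advance
    }
    for (const item of found) prd.set(item.id, item);    // add, then plan again
  }
  throw new Error('discovery did not converge within maxPasses');
}

// Made-up example: each known item may imply further unknowns.
const followUps = { 'api-shape': ['auth-scheme'], 'auth-scheme': ['token-expiry'] };
const examplePass = (items) =>
  items.flatMap((item) => (followUps[item.id] || []).map((id) => ({ id, status: 'pending' })));

const plan = discoverUntilStable([{ id: 'api-shape', status: 'pending' }], examplePass);
console.log(plan.items.map((item) => item.id), plan.passes);
```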
33
41
 
34
- Path: exactly `./.prd` in current working directory. No variants. Valid JSON.
42
+ ## .PRD SCHEMA
35
43
 
36
- ## Item Schema
44
+ Path: exactly `./.prd` in current working directory. Valid JSON array.
37
45
 
38
46
  ```json
39
47
  {
40
48
  "id": "descriptive-kebab-id",
41
- "subject": "Imperative verb describing outcome",
49
+ "subject": "Imperative verb phrase — what must be true when done",
42
50
  "status": "pending",
43
- "description": "What must be true when this is done",
44
- "blocking": ["ids-this-prevents"],
45
- "blockedBy": ["ids-that-must-finish-first"],
51
+ "description": "Precise completion criterion",
52
+ "blocking": ["ids this prevents from starting"],
53
+ "blockedBy": ["ids that must complete first"],
46
54
  "effort": "small|medium|large",
47
- "category": "feature|bug|refactor|docs|infra",
48
- "acceptance": ["measurable criteria"],
49
- "edge_cases": ["known complications"]
55
+ "category": "feature|bug|refactor|infra",
56
+ "acceptance": ["measurable, binary criteria"],
57
+ "edge_cases": ["known failure modes and boundary conditions"]
50
58
  }
51
59
  ```
52
60
 
53
- **Subject**: imperative form. **Status**: `pending` → `in_progress` → `completed`. **Effort**: `small` (<15min) | `medium` (<45min) | `large` (1h+). **Blocking/blockedBy**: bidirectional, every dependency explicit.
61
+ **Status flow**: `pending` → `in_progress` → `completed` (completed items are removed from file).
62
+ **Effort**: `small` = single execution, under 15min | `medium` = 2-3 rounds, under 45min | `large` = multiple rounds, over 1h.
63
+ **blocking/blockedBy**: always bidirectional. Every dependency must be explicit in both directions.
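The bidirectional rule can be checked mechanically before advancing. The following is a minimal sketch, assuming the .prd has already been parsed into an array of items with the schema above; `findDependencyErrors` is a hypothetical helper, and flagging references to ids that are absent from the file is an assumption of this sketch.

```javascript
// Report every dependency that is not declared in both directions.
function findDependencyErrors(items) {
  const byId = new Map(items.map((item) => [item.id, item]));
  const errors = [];
  for (const item of items) {
    for (const target of item.blocking) {
      const other = byId.get(target);
      if (!other) errors.push(`${item.id} blocks unknown id ${target}`);
      else if (!other.blockedBy.includes(item.id)) errors.push(`${target} is missing blockedBy ${item.id}`);
    }
    for (const source of item.blockedBy) {
      const other = byId.get(source);
      if (!other) errors.push(`${item.id} is blocked by unknown id ${source}`);
      else if (!other.blocking.includes(item.id)) errors.push(`${source} is missing blocking ${item.id}`);
    }
  }
  return errors;
}

// Made-up items: the second one forgets to declare its blockedBy entry.
const sampleItems = [
  { id: 'define-schema', blocking: ['write-migration'], blockedBy: [] },
  { id: 'write-migration', blocking: [], blockedBy: [] },
];
console.log(findDependencyErrors(sampleItems));
```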
54
64
 
55
- ## Construction
65
+ ## EXECUTION WAVES
56
66
 
57
- 1. Enumerate every possible unknown as a work item.
58
- 2. Map every possible dependency (blocking/blockedBy).
59
- 3. Group independent items into parallel waves (max 3 per wave).
60
- 4. Capture every edge case as either a separate item or edge_case field.
61
- 5. Write `./.prd` to disk.
62
- 6. **FREEZE**: no additions after creation. Only mutation: removing finished items.
67
+ Independent items (empty `blockedBy`) run in parallel waves of ≤3 subagents.
68
+ - Find all pending items with empty `blockedBy`
69
+ - Launch ≤3 parallel `gm:gm` subagents via Task tool
70
+ - Each subagent handles one item: resolves it, witnesses output, removes from .prd
71
+ - After each wave: check newly unblocked items, launch next wave
72
+ - Never run independent items sequentially. Never launch more than 3 at once.
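The wave rules above can be sketched as a scheduler that repeatedly takes up to three unblocked items. This only plans the order; launching `gm:gm` subagents through the Task tool is outside the sketch. `planWaves` and the sample ids are hypothetical.

```javascript
// Group items into waves of at most maxPerWave, honoring blockedBy.
function planWaves(items, maxPerWave = 3) {
  const remaining = new Map(items.map((item) => [item.id, new Set(item.blockedBy)]));
  const waves = [];
  while (remaining.size > 0) {
    // Ready items are those with no unresolved blockers.
    const ready = [...remaining].filter(([, deps]) => deps.size === 0).map(([id]) => id);
    if (ready.length === 0) throw new Error('dependency cycle or unknown blocker');
    const wave = ready.slice(0, maxPerWave);
    for (const id of wave) remaining.delete(id);
    // Completed items unblock whatever was waiting on them.
    for (const deps of remaining.values()) {
      for (const id of wave) deps.delete(id);
    }
    waves.push(wave);
  }
  return waves;
}

// Made-up .prd contents: four independent items and one blocked item.
const waveItems = [
  { id: 'a', blockedBy: [] },
  { id: 'b', blockedBy: [] },
  { id: 'c', blockedBy: [] },
  { id: 'd', blockedBy: [] },
  { id: 'e', blockedBy: ['a', 'b'] },
];
console.log(planWaves(waveItems));
```

The sketch assumes every item in a wave completes before the next wave is planned; an item that fails or surfaces a new unknown would instead snake back to planning.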
63
73
 
64
- ## Execution
74
+ ## COMPLETION CRITERION
65
75
 
66
- 1. Find all `pending` items with empty `blockedBy`.
67
- 2. Launch ≤3 parallel subagents (`subagent_type: gm:gm`) per wave.
68
- 3. Each subagent completes one item, verifies via witnessed execution.
69
- 4. On completion: remove item from `.prd`, write updated file.
70
- 5. Check for newly unblocked items. Launch next wave.
71
- 6. Continue until `.prd` is empty.
76
+ .prd is ready when: one full reasoning pass produces zero new items AND all items have explicit acceptance criteria AND all dependencies are mapped.
72
77
 
73
- Never execute independent items sequentially. Never launch more than 3 at once.
74
-
75
- ## Completion
76
-
77
- `.prd` must be empty at COMPLETE. Skip this skill if task is trivially single-step (under 5 minutes, no dependencies, no unknowns).
78
+ **Skip planning entirely** if: task is single-step, trivially bounded, zero unknowns, under 5 minutes.
78
79
 
79
80
  ---
80
81
 
81
- **→ FORWARD**: .prd written → invoke `gm-execute` skill.
82
- **↩ SNAKE**: re-invoke `planning` if requirements change at any later phase.
82
+ **→ FORWARD**: No new mutables → invoke `gm-execute` skill.
83
+ **↺ SELF-LOOP**: New items discovered → add to .prd → plan again.
84
+ **↩ SNAKE here**: New unknown surfaces in any later phase → add it, re-plan, re-advance.
package/tools.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "gm",
3
- "version": "2.0.187",
3
+ "version": "2.0.189",
4
4
  "description": "State machine agent with hooks, skills, and automated git enforcement",
5
5
  "tools": [
6
6
  {