gm-skill 0.1.2 → 2.0.1081
- package/AGENTS.md +1 -0
- package/LICENSE +21 -0
- package/README.md +20 -84
- package/agents/gm.md +22 -0
- package/agents/memorize.md +100 -0
- package/agents/research-worker.md +36 -0
- package/agents/textprocessing.md +47 -0
- package/bin/bootstrap.js +702 -0
- package/bin/plugkit.js +136 -0
- package/bin/plugkit.sha256 +7 -0
- package/bin/plugkit.version +1 -0
- package/bin/plugkit.wasm +0 -0
- package/bin/plugkit.wasm.sha256 +1 -0
- package/bin/rtk.sha256 +6 -0
- package/bin/rtk.version +1 -0
- package/gm-plugkit/bootstrap.js +694 -0
- package/gm-plugkit/cli.js +48 -0
- package/gm-plugkit/index.js +12 -0
- package/gm-plugkit/package.json +26 -0
- package/gm-plugkit/plugkit-wasm-wrapper.js +190 -0
- package/gm-plugkit/plugkit.sha256 +6 -0
- package/gm-plugkit/plugkit.version +1 -0
- package/gm.json +27 -0
- package/lang/browser.js +45 -0
- package/lang/ssh.js +166 -0
- package/lib/browser-spool-handler.js +130 -0
- package/lib/browser.js +131 -0
- package/lib/codeinsight.js +109 -0
- package/lib/daemon-bootstrap.js +253 -132
- package/lib/git.js +0 -1
- package/lib/learning.js +169 -0
- package/lib/skill-bootstrap.js +406 -0
- package/lib/spool-dispatch.js +100 -0
- package/lib/spool.js +87 -49
- package/lib/wasm-host.js +241 -0
- package/package.json +38 -20
- package/prompts/bash-deny.txt +22 -0
- package/prompts/pre-compact.txt +21 -0
- package/prompts/prompt-submit.txt +83 -0
- package/prompts/session-start.txt +15 -0
- package/scripts/run-hook.sh +7 -0
- package/scripts/watch-cascade.js +166 -0
- package/skills/browser/SKILL.md +80 -0
- package/skills/code-search/SKILL.md +48 -0
- package/skills/create-lang-plugin/SKILL.md +121 -0
- package/skills/gm/SKILL.md +10 -49
- package/skills/gm-complete/SKILL.md +16 -87
- package/skills/gm-emit/SKILL.md +17 -50
- package/skills/gm-execute/SKILL.md +18 -69
- package/skills/gm-skill/SKILL.md +43 -0
- package/skills/gm-skill/index.js +21 -0
- package/skills/governance/SKILL.md +97 -0
- package/skills/pages/SKILL.md +208 -0
- package/skills/planning/SKILL.md +21 -97
- package/skills/research/SKILL.md +43 -0
- package/skills/ssh/SKILL.md +71 -0
- package/skills/textprocessing/SKILL.md +40 -0
- package/skills/update-docs/SKILL.md +24 -43
- package/gm-complete.SKILL.md +0 -106
- package/gm-emit.SKILL.md +0 -70
- package/gm-execute.SKILL.md +0 -88
- package/gm.SKILL.md +0 -63
- package/index.js +0 -1
- package/lib/index.js +0 -37
- package/lib/loader.js +0 -66
- package/lib/manifest.js +0 -99
- package/lib/prepare.js +0 -14
- package/planning.SKILL.md +0 -118
- package/skills/gm/index.js +0 -113
- package/skills/gm-complete/index.js +0 -118
- package/skills/gm-complete.SKILL.md +0 -106
- package/skills/gm-emit/index.js +0 -90
- package/skills/gm-emit.SKILL.md +0 -70
- package/skills/gm-execute/index.js +0 -91
- package/skills/gm-execute.SKILL.md +0 -88
- package/skills/gm.SKILL.md +0 -63
- package/skills/planning/index.js +0 -107
- package/skills/planning.SKILL.md +0 -118
- package/skills/update-docs/index.js +0 -108
- package/skills/update-docs.SKILL.md +0 -66
- package/test-build.js +0 -29
- package/test-e2e.js +0 -117
- package/test-unified.js +0 -24
- package/test.js +0 -89
- package/update-docs.SKILL.md +0 -66
package/skills/planning/SKILL.md
CHANGED
|
@@ -1,118 +1,42 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: planning
|
|
3
3
|
description: State machine orchestrator. Mutable discovery, PRD construction, and full PLAN→EXECUTE→EMIT→VERIFY→COMPLETE lifecycle. Invoke at session start and on any new unknown.
|
|
4
|
-
allowed-tools: Skill
|
|
4
|
+
allowed-tools: Skill, Read, Write
|
|
5
5
|
---
|
|
6
6
|
|
|
7
|
-
#
|
|
7
|
+
# planning — PLAN
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
Every turn begins with prior memory already loaded by auto-recall. PLAN adds targeted reconnaissance on top of that injection. Before any unknown is named as absent, it has been searched for. Before an abstraction is designed, the codebase has been checked for one that already exists.
|
|
10
10
|
|
|
11
|
-
|
|
11
|
+
## ORIENT
|
|
12
12
|
|
|
13
|
-
|
|
13
|
+
The first action of PLAN is a parallel pack: 3–5 `exec:recall` calls and 3–5 `exec:codesearch` calls against the request's nouns, dispatched in one message. Hits become weak_prior — still witnessed before adoption. Misses confirm the unknown is fresh. The pack is free relative to the duplicated discovery and disagree-with-prior-witness risk it prevents. Serial probing of nouns one-at-a-time is the failure mode this discipline guards against.
|
|
14
14
|
|
|
15
|
-
|
|
15
|
+
Spool the pack as the opening move:
|
|
16
16
|
|
|
17
|
-
- PLAN done → `gm-execute`
|
|
18
|
-
- New unknown anywhere in chain → re-enter PLAN
|
|
19
|
-
- EXECUTE unresolvable after 2 passes → PLAN
|
|
20
|
-
- VERIFY: `.prd` empty + git clean + pushed → `update-docs`; else → `gm-execute`
|
|
21
|
-
|
|
22
|
-
Cannot stop while `.gm/prd.yml` has items, git is dirty, or commits are unpushed.
|
|
23
|
-
|
|
24
|
-
## Orient
|
|
25
|
-
|
|
26
|
-
Open every plan with one parallel pack of `exec:recall` + `exec:codesearch` against the request's nouns. Hits land as `weak_prior`; misses confirm the unknown is fresh. The pack runs in one message.
|
|
27
|
-
|
|
28
|
-
**Auto-recall injection (skills-only platforms)**: derive a 2–6 word query from the request's nouns (subject, verb objects, key domain terms). Call `exec:recall <query>` at PLAN start before writing `.gm/prd.yml`, inline. This replaces the prompt-submit hook's auto-recall for platforms without hook infrastructure. Recall hits are injected as context into mutable discovery and PRD item acceptance criteria.
|
|
29
|
-
|
|
30
|
-
## Mutable discovery
|
|
31
|
-
|
|
32
|
-
For each aspect of the work, ask: what do I not know, what could go wrong, what depends on what, what am I assuming. Unwitnessed assumptions are mutables.
|
|
33
|
-
|
|
34
|
-
Fault surfaces to scan: file existence, API shape, data format, dep versions, runtime behavior, env differences, error conditions, concurrency, integration seams, backwards compat, rollback paths, CI correctness.
|
|
35
|
-
|
|
36
|
-
Tag every item with a route family (grounding | reasoning | state | execution | observability | boundary | representation) and cross-reference the 16-failure taxonomy. `governance` skill holds the table.
|
|
37
|
-
|
|
38
|
-
`existingImpl=UNKNOWN` is the default; resolve via `exec:codesearch` before adding the item. An existing concern routes to consolidation, not addition.
|
|
39
|
-
|
|
40
|
-
Plan exits when zero new unknowns surfaced last pass AND every item has acceptance criteria AND deps are mapped.
|
|
41
|
-
|
|
42
|
-
## .gm/mutables.yml — co-equal with .gm/prd.yml
|
|
43
|
-
|
|
44
|
-
Every unknown surfaced during PLAN lands as an entry in `.gm/mutables.yml` the same pass. Live during work, deleted when empty. Hook-gated: Write/Edit/NotebookEdit and `git commit`/`git push` are hard-blocked while any entry has `status: unknown`; turn-stop is hard-blocked the same way.
|
|
45
|
-
|
|
46
|
-
```yaml
|
|
47
|
-
- id: kebab-id
|
|
48
|
-
claim: One-line statement of what is assumed
|
|
49
|
-
witness_method: exec:codesearch <query> | exec:nodejs import | exec:recall <query> | Read <path>
|
|
50
|
-
witness_evidence: ""
|
|
51
|
-
status: unknown
|
|
52
17
|
```
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
```yaml
|
|
61
|
-
- id: kebab-id
|
|
62
|
-
subject: Imperative verb phrase
|
|
63
|
-
status: pending
|
|
64
|
-
description: Precise criterion
|
|
65
|
-
effort: small|medium|large
|
|
66
|
-
category: feature|bug|refactor|infra
|
|
67
|
-
route_family: grounding|reasoning|state|execution|observability|boundary|representation
|
|
68
|
-
load: 0.0-1.0
|
|
69
|
-
failure_modes: []
|
|
70
|
-
route_fit: unexamined|examined|dominant
|
|
71
|
-
authorization: none|weak_prior|witnessed
|
|
72
|
-
blocking: []
|
|
73
|
-
blockedBy: []
|
|
74
|
-
acceptance:
|
|
75
|
-
- binary criterion
|
|
76
|
-
edge_cases:
|
|
77
|
-
- failure mode
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
`load` is consequence-if-wrong: 0.9 = headline collapses, 0.7 = sub-argument rebuilt, 0.4 = local patch, 0.1 = nothing breaks. Verification budget = `load × (1 − tier_confidence)`. λ>0.75 must reach witnessed before EMIT.
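The budget formula in the removed text above can be worked through numerically. This is a minimal sketch, not code from the package: the function names are hypothetical, and it assumes that `λ` in the removed line refers to `load` and that `tier_confidence` is a number between 0 and 1.

```javascript
// Hypothetical helper: verification budget = load × (1 − tier_confidence).
function verificationBudget(load, tierConfidence) {
  if (load < 0 || load > 1 || tierConfidence < 0 || tierConfidence > 1) {
    throw new RangeError('load and tierConfidence are expected in [0, 1]');
  }
  return load * (1 - tierConfidence);
}

// Assumption: "λ>0.75 must reach witnessed before EMIT" is a threshold on load.
function mustBeWitnessedBeforeEmit(load) {
  return load > 0.75;
}

// A headline-level item (load 0.9) held at 50% confidence.
console.log(verificationBudget(0.9, 0.5));
console.log(mustBeWitnessedBeforeEmit(0.9)); // true
console.log(mustBeWitnessedBeforeEmit(0.4)); // false
```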
|
|
81
|
-
|
|
82
|
-
`status`: pending → in_progress → completed (then remove). `effort`: small <15min | medium <45min | large >1h.
|
|
83
|
-
|
|
84
|
-
## Parallel subagent launch
|
|
85
|
-
|
|
86
|
-
After `.prd` is written, up to 3 parallel `gm:gm` subagents for independent items in one message. Browser tasks serialize.
|
|
87
|
-
|
|
88
|
-
```
|
|
89
|
-
Agent(subagent_type="gm:gm", prompt="Work on .prd item: <id>. .prd path: <path>. Item: <full YAML>.")
|
|
18
|
+
.gm/exec-spool/in/recall/1.txt "<noun phrase 1>"
|
|
19
|
+
.gm/exec-spool/in/recall/2.txt "<noun phrase 2>"
|
|
20
|
+
.gm/exec-spool/in/recall/3.txt "<noun phrase 3>"
|
|
21
|
+
.gm/exec-spool/in/codesearch/1.txt "<two-word phrase 1>"
|
|
22
|
+
.gm/exec-spool/in/codesearch/2.txt "<two-word phrase 2>"
|
|
23
|
+
.gm/exec-spool/in/codesearch/3.txt "<two-word phrase 3>"
|
|
90
24
|
```
|
|
91
25
|
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
## Observability gates in the plan
|
|
95
|
-
|
|
96
|
-
Server: every subsystem exposes `/debug/<subsystem>`; structured logs `{subsystem, severity, ts}`. Client: `window.__debug` live registry; modules register on mount. `console.log` is not observability. Discovery of a gap during PLAN adds a `.prd` item the same pass — never deferred.
|
|
97
|
-
|
|
98
|
-
`window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry fights the discipline — extend the registry instead.
|
|
99
|
-
|
|
100
|
-
## Test discipline encoded in the plan
|
|
101
|
-
|
|
102
|
-
One `test.js` at project root, 200-line hard cap, real data, real system. No fixtures, mocks, or scattered tests. A second test runner under any name in any directory is a smuggled parallel surface.
|
|
26
|
+
All in one message. Read `out/*.json` together.
|
|
103
27
|
|
|
104
|
-
|
|
28
|
+
## Maximal Cover
|
|
105
29
|
|
|
106
|
-
|
|
30
|
+
Scope-exceeds-reach is a planning condition, not a stopping condition. The covering family is the plan. Enumerate every bounded subset of the request witnessable from this session; write the family into `.gm/prd.yml` with the dependency graph explicit. Residuals within the spirit of the ask AND reachable from this session are self-authorized — expand the PRD and declare the read in one line ("treating X as in-spirit because Y"). Only out-of-spirit or unreachable residuals are name-and-stop.
|
|
107
31
|
|
|
108
|
-
|
|
32
|
+
## Mutables File
|
|
109
33
|
|
|
110
|
-
`
|
|
34
|
+
`.gm/mutables.yml` is co-equal with `.gm/prd.yml`. Every unknown surfaced lands as a row with `status: unknown`. The hook layer hard-blocks Write, Edit, `git commit`, `git push`, and stop while any row remains unknown. Rows flip to `witnessed` only when `witness_evidence` carries concrete proof — file:line, codesearch hit, exec output snippet. Narrative resolution is rejected on read. PLAN exits only at ε = 0 on the final pass.
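The gate described above can be pictured as a small check over already-parsed rows. This is a sketch under stated assumptions, not the hook's actual implementation: `gateState` and `canFlipToWitnessed` are hypothetical names, the rows are assumed to carry the `id`, `status`, and `witness_evidence` fields from the earlier YAML shape, and the evidence check only tests for non-empty text (it cannot tell concrete proof from narrative).

```javascript
// Hypothetical: report whether any row still blocks Write/Edit/commit/push/stop.
function gateState(rows) {
  const unknownIds = rows
    .filter((row) => row.status === 'unknown')
    .map((row) => row.id);
  return { blocked: unknownIds.length > 0, unknownIds };
}

// Hypothetical: a row may flip to witnessed only with non-empty evidence.
// Simplification: real validation would also reject narrative-only text.
function canFlipToWitnessed(row) {
  return typeof row.witness_evidence === 'string'
    && row.witness_evidence.trim() !== '';
}

const rows = [
  { id: 'spool-path', status: 'witnessed', witness_evidence: 'lib/spool.js:12' },
  { id: 'api-shape', status: 'unknown', witness_evidence: '' },
];

console.log(gateState(rows).blocked);     // true
console.log(canFlipToWitnessed(rows[1])); // false
```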
|
|
111
35
|
|
|
112
|
-
|
|
36
|
+
## Dispatch
|
|
113
37
|
|
|
114
|
-
|
|
38
|
+
`phase-status` to read FSM state. `transition` to advance. `mutable-resolve` to mark witnessed (auto-fires memorize). Plus the usual `recall`, `codesearch`, `memorize`, `health`, language stems.
|
|
115
39
|
|
|
116
|
-
|
|
40
|
+
## Transition
|
|
117
41
|
|
|
118
|
-
|
|
42
|
+
Read `out/<N>.json::nextSkill`. Invoke `Skill(skill="gm:<nextSkill>")` immediately.
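A minimal sketch of the transition read, assuming the output file is a JSON object with a top-level `nextSkill` field. The function name is hypothetical, and mapping an absent or `null` field to "no further skill" is an assumption of this sketch.

```javascript
// Hypothetical: turn the text of an out/<N>.json file into the skill id to invoke.
function nextSkillId(jsonText) {
  const parsed = JSON.parse(jsonText);
  const next = parsed.nextSkill;
  // Assumption: null or a missing field means there is nothing left to invoke.
  if (next === null || next === undefined) return null;
  if (typeof next !== 'string' || next === '') {
    throw new TypeError('nextSkill must be a non-empty string or null');
  }
  return `gm:${next}`;
}

console.log(nextSkillId('{"nextSkill":"gm-execute"}')); // gm:gm-execute
console.log(nextSkillId('{"nextSkill":null}'));         // null
```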
|
package/skills/research/SKILL.md
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: research
|
|
3
|
+
description: Web research via parallel subagent fan-out. Use when a question needs the live web, spans multiple sources, requires comparison across vendors/papers/repos, or would saturate a single context window. Skip for one-page lookups answerable by a single WebFetch.
|
|
4
|
+
allowed-tools: Skill
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Research
|
|
8
|
+
|
|
9
|
+
Declare the discipline before fetching anything. The first line of work names the `@<discipline>` the corpus belongs to — fresh material lives at `.gm/disciplines/<name>/corpus/raw/`, concise rewrites at `.gm/disciplines/<name>/corpus/concise/<chunk-id>.md`, the merged synthesis at `.gm/disciplines/<name>/deep-understanding.md`. A run without a discipline declaration falls back to the default store and the orchestrator says so in one line.
|
|
10
|
+
|
|
11
|
+
Lead orchestrates. Workers fetch. Findings converge on disk. The lead never reads pages — workers do.
|
|
12
|
+
|
|
13
|
+
Treat a thousand-document corpus with the same care as a codebase. Material above roughly ten pages — about eight thousand tokens — splits at paragraph boundaries into chunks each owned by one parallel `gm:research-worker` (haiku model). Each worker emits a fact-preserving concise rewrite to `corpus/concise/<chunk-id>.md` — every claim, number, name, caveat from the source survives, prose density rises, citations stay attached. Once every chunk returns, the lead merges into `deep-understanding.md` enumerating the opportunities the corpus opens and the reasonable opinionations it forces. Concise files and the merged document auto-ingest via `exec:memorize @<name>` so the next pass recalls instead of re-fetching.
|
|
14
|
+
|
|
15
|
+
Effort matches stakes. A single fact is one short fetch. A vendor comparison is a handful of workers, each owning one vendor. A landscape survey is ten or more, each owning one axis. Spending a fan-out on a fact wastes tokens; spending a fact-fetch on a landscape under-delivers.
|
|
16
|
+
|
|
17
|
+
Breadth first, depth on demand. Open with a wide sweep that maps the terrain, then commit deep dives only where the sweep surfaces something load-bearing. A narrow opening misses the alternative the user actually needed.
|
|
18
|
+
|
|
19
|
+
## Worker contract
|
|
20
|
+
|
|
21
|
+
Each worker receives the precise question it owns, the shape of the answer (bullets, table row, prose paragraph), the boundary of what it must not pursue, and the destination path under `.gm/research/<slug>/<worker-id>.md`. Workers write structured findings to disk and return only a path plus a one-line summary. The lead reads the paths it cares about; the rest stay on disk. Returning full prose through the agent boundary burns context that the synthesis pass needs.
|
|
22
|
+
|
|
23
|
+
Workers run in parallel — independent questions launch in one message, never serialized.
|
|
24
|
+
|
|
25
|
+
## Citations
|
|
26
|
+
|
|
27
|
+
A claim without a source URL is a hallucination waiting to be quoted. Workers attach the URL and the quoted span beside every non-trivial assertion. The lead refuses to lift a claim into the final answer if its citation field is empty.
|
|
28
|
+
|
|
29
|
+
## Source quality
|
|
30
|
+
|
|
31
|
+
Vendor docs, RFCs, primary repos, dated blog posts from named authors, and academic preprints beat aggregator pages. When two sources disagree, the older primary usually beats the newer aggregator.
|
|
32
|
+
|
|
33
|
+
## Convergence
|
|
34
|
+
|
|
35
|
+
Synthesis happens once, after all workers return. Mid-flight summarisation truncates findings the next worker would have built on. If a worker's return reveals a new axis the original plan missed, expand the fan-out — do not stretch an existing worker past its brief.
|
|
36
|
+
|
|
37
|
+
## When not to fan out
|
|
38
|
+
|
|
39
|
+
One question, one page, one fetch. A single `WebFetch` answers it. The fan-out machinery has overhead; spending it on a lookup is the same mistake as skipping it on a survey.
|
|
40
|
+
|
|
41
|
+
## Handoff
|
|
42
|
+
|
|
43
|
+
Final answer cites every load-bearing claim, names the workers' output paths for audit, and surfaces disagreements rather than averaging them away.
|
package/skills/ssh/SKILL.md
ADDED
|
@@ -0,0 +1,71 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ssh
|
|
3
|
+
description: Run shell commands on remote SSH hosts via exec:ssh. Reads targets from ~/.claude/ssh-targets.json. Use for deploying, monitoring, or controlling remote machines.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# exec:ssh — Remote SSH execution
|
|
7
|
+
|
|
8
|
+
Runs shell commands on a remote host. No manual connection needed.
|
|
9
|
+
|
|
10
|
+
## Setup
|
|
11
|
+
|
|
12
|
+
`~/.claude/ssh-targets.json`:
|
|
13
|
+
|
|
14
|
+
```json
|
|
15
|
+
{
|
|
16
|
+
"default": { "host": "192.168.1.10", "port": 22, "username": "pi", "password": "pass" },
|
|
17
|
+
"prod": { "host": "10.0.0.1", "username": "ubuntu", "keyPath": "/home/user/.ssh/id_rsa" }
|
|
18
|
+
}
|
|
19
|
+
```
|
|
20
|
+
|
|
21
|
+
`host` and `username` required. `port` defaults to 22. Auth: `password` OR `keyPath` + optional `passphrase`.
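The field rules above can be sketched as a validator for one entry of `ssh-targets.json`. This is illustrative only: `normalizeTarget` is a hypothetical name, not part of `exec:ssh`, and preferring `keyPath` when both auth fields are present is an assumption the skill text does not state.

```javascript
// Hypothetical helper: validate one target entry and apply the port default.
function normalizeTarget(name, entry) {
  if (!entry || typeof entry !== 'object') {
    throw new Error(`target "${name}" is missing or not an object`);
  }
  const { host, username, port = 22, password, keyPath, passphrase } = entry;
  if (!host || !username) {
    throw new Error(`target "${name}" needs both host and username`);
  }
  if (!password && !keyPath) {
    throw new Error(`target "${name}" needs password or keyPath`);
  }
  // Assumption: key auth wins when both password and keyPath are given.
  const auth = keyPath
    ? { keyPath, ...(passphrase ? { passphrase } : {}) }
    : { password };
  return { host, username, port, ...auth };
}

const prod = { host: '10.0.0.1', username: 'ubuntu', keyPath: '/home/user/.ssh/id_rsa' };
console.log(normalizeTarget('prod', prod).port); // 22
```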
|
|
22
|
+
|
|
23
|
+
## Usage
|
|
24
|
+
|
|
25
|
+
```
|
|
26
|
+
exec:ssh
|
|
27
|
+
<shell command>
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
Named host with `@name` on the first line:
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
exec:ssh
|
|
34
|
+
@prod
|
|
35
|
+
sudo systemctl restart myapp
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Process persistence
|
|
39
|
+
|
|
40
|
+
SSH kills child processes on close. To survive disconnect:
|
|
41
|
+
|
|
42
|
+
```
|
|
43
|
+
exec:ssh
|
|
44
|
+
sudo systemctl reset-failed myunit 2>/dev/null; systemd-run --unit=myunit bash -c 'your-command'
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
Unique unit name per launch:
|
|
48
|
+
|
|
49
|
+
```
|
|
50
|
+
exec:ssh
|
|
51
|
+
systemd-run --unit=job-$(date +%s) bash -c 'nohup myprogram &'
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
No-systemd fallback:
|
|
55
|
+
|
|
56
|
+
```
|
|
57
|
+
exec:ssh
|
|
58
|
+
setsid nohup bash -c 'myprogram > /tmp/out.log 2>&1' &
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
## Dependency
|
|
62
|
+
|
|
63
|
+
Requires `ssh2` in `~/.claude/gm-tools`. Write to `.gm/exec-spool/in/nodejs/<N>.js`:
|
|
64
|
+
|
|
65
|
+
```js
|
|
66
|
+
const { execSync } = require('child_process');
|
|
67
|
+
const path = require('path');
|
|
68
|
+
const os = require('os');
|
|
69
|
+
const gmTools = path.join(os.homedir(), '.claude', 'gm-tools');
|
|
70
|
+
execSync('npm install ssh2', { cwd: gmTools, stdio: 'inherit' });
|
|
71
|
+
```
|
package/skills/textprocessing/SKILL.md
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: textprocessing
|
|
3
|
+
description: The required surface for any text task whose correctness depends on understanding. Code does mechanics; this skill does meaning. Invoked via Agent(subagent_type='gm:textprocessing', model='haiku', ...).
|
|
4
|
+
allowed-tools: Skill
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Textprocessing — Meaning goes through Haiku
|
|
8
|
+
|
|
9
|
+
## Invocation
|
|
10
|
+
|
|
11
|
+
```
|
|
12
|
+
Agent(subagent_type='gm:textprocessing', model='haiku', prompt='## INPUT\n<body>\n\n## INSTRUCTION\n<task>')
|
|
13
|
+
```
|
|
14
|
+
|
|
15
|
+
Background fire-and-forget: add `run_in_background=true`. Batch: N parallel `Agent` calls in one message, one per item.
|
|
16
|
+
|
|
17
|
+
## The split
|
|
18
|
+
|
|
19
|
+
Mechanics stay in code: char/word/token/line counts, byte length, split on delimiter, exact-string find/replace, regex match/extract, sort, group-by-key, dedup-by-equality, lower/uppercase, JSON parse/stringify, base64, URL encode, hash, diff, format/pretty-print.
|
|
20
|
+
|
|
21
|
+
Meaning goes through this skill: summarize, classify, extract entities or intents, rewrite for tone or audience, translate, semantic dedup (same meaning, different words), rank or score by quality, label by topic, decide whether two texts are about the same thing, paraphrase, simplify, expand outline → prose, headline-from-body, body-from-headline, fact-from-passage, sentiment, toxicity, relevance, similarity-by-meaning.
|
|
22
|
+
|
|
23
|
+
The bar: would a human have to *read and understand* the text to do this correctly? Yes → skill. No → code. A keyword-list, a regex on phrases like "important", or a string-similarity ratio loop deciding meaning is a stub of this skill. Replace it with one (or N parallel) Agent calls.
|
|
24
|
+
|
|
25
|
+
## Batch
|
|
26
|
+
|
|
27
|
+
Independent items run in parallel — one Agent call per item, all in one message. The runner Promise-allSettles. Sequential calls are wasteful when items don't depend on each other.
|
|
28
|
+
|
|
29
|
+
For one large body exceeding a single-prompt budget, the *caller* chunks deterministically (paragraph, section, fixed token count), fans out one Agent per chunk, and merges with a final reducer Agent if cross-chunk synthesis is needed. The agent itself never chunks — it processes whatever it receives in one shot.
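The caller-side chunking described above might look like the following. It is a sketch with a hypothetical function name, and it measures the budget in characters rather than tokens, since the skill text names no tokenizer. A single paragraph larger than the budget is kept whole rather than split mid-paragraph.

```javascript
// Hypothetical: pack whole paragraphs into chunks of at most maxChars each.
function chunkByParagraph(body, maxChars) {
  const paragraphs = body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);
  const chunks = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && candidate.length > maxChars) {
      // Adding this paragraph would overflow: close the chunk, start a new one.
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkByParagraph('aaa\n\nbbb\n\nccc', 8));
// [ 'aaa\n\nbbb', 'ccc' ]
```

Because the split depends only on the input text and the budget, the same body always yields the same chunks, so one Agent call per chunk can be dispatched in a single message and the results merged in order.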
|
|
30
|
+
|
|
31
|
+
## Output contract
|
|
32
|
+
|
|
33
|
+
Plain-text instruction → plain-text output, no fences, no labels. JSON instruction → exactly that JSON, parseable by `JSON.parse`. Multi-document input requested as a list → one entry per input doc in the same order. Ambiguous shape → defaults to plain text. Empty input → empty output.
|
|
34
|
+
|
|
35
|
+
## Constraints
|
|
36
|
+
|
|
37
|
+
- Model fixed at `haiku`. Escalate to opus only when haiku output fails an acceptance check.
|
|
38
|
+
- One transform per call. Three parallel calls beats one prompt asking for "summarize AND classify AND translate".
|
|
39
|
+
- Idempotent: same input + same instruction → same output, modulo sampling. Strict determinism callers specify `temperature=0` in the prompt.
|
|
40
|
+
- Output is the deliverable. No commentary, no "here is your output".
|
package/skills/update-docs/SKILL.md
CHANGED
|
@@ -1,66 +1,47 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: update-docs
|
|
3
3
|
description: UPDATE-DOCS phase. Refresh README.md, AGENTS.md, and docs/index.html to reflect changes made this session. Commits and pushes doc updates. Terminal phase — declares COMPLETE.
|
|
4
|
+
allowed-tools: Skill, Read, Write
|
|
4
5
|
---
|
|
5
6
|
|
|
6
|
-
#
|
|
7
|
+
# update-docs — UPDATE-DOCS
|
|
7
8
|
|
|
8
|
-
|
|
9
|
+
Docs reflect the current state of the system, not its history. Every rule in AGENTS.md is a present-tense statement about what must or must-not be the case in code now. Past-tense framing, `(FIXED)` markers, dated audit entries, and "we used to X, now we Y" phrasing belong in `git log` and `CHANGELOG.md` — never in AGENTS.md.
|
|
9
10
|
|
|
10
|
-
|
|
11
|
+
## AGENTS.md and CLAUDE.md
|
|
11
12
|
|
|
12
|
-
|
|
13
|
-
|
|
14
|
-
What changed — run directly via Bash:
|
|
13
|
+
Edits to AGENTS.md and CLAUDE.md route through the memorize subagent only — never inline-edit. Invocation:
|
|
15
14
|
|
|
16
15
|
```
|
|
17
|
-
|
|
18
|
-
git diff HEAD~1 --stat
|
|
16
|
+
Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<fact>')
|
|
19
17
|
```
|
|
20
18
|
|
|
21
|
-
|
|
19
|
+
The classifier rejects changelog-shaped facts from AGENTS.md ingestion (rs-learn still accepts them). Multiple facts → multiple parallel Agent calls in one message.
|
|
22
20
|
|
|
23
|
-
|
|
24
|
-
const fs = require('fs');
|
|
25
|
-
['README.md', 'AGENTS.md', 'docs/index.html', 'gm-starter/agents/gm.md'].forEach(f => {
|
|
26
|
-
try { console.log(`=== ${f} ===\n` + fs.readFileSync(f, 'utf8')); }
|
|
27
|
-
catch(e) { console.log(`MISSING: ${f}`); }
|
|
28
|
-
});
|
|
29
|
-
```
|
|
21
|
+
## README.md
|
|
30
22
|
|
|
31
|
-
|
|
23
|
+
Refresh to reflect the surface a new reader actually encounters. Remove stale install steps, version pins, and features that no longer exist. Add what was added this session if it changes the public surface.
|
|
32
24
|
|
|
33
|
-
|
|
34
|
-
- **AGENTS.md** — via `Agent(subagent_type='gm:memorize', model='haiku', run_in_background=true, prompt='## CONTEXT TO MEMORIZE\n<learnings>')`. Never inline-edit.
|
|
35
|
-
- **docs/index.html** — `PHASES` array, platform lists, state machine diagram
|
|
36
|
-
- **gm-starter/agents/gm.md** — skill chain line if new skills added
|
|
25
|
+
## docs/index.html
|
|
37
26
|
|
|
38
|
-
|
|
27
|
+
Regenerate or hand-edit to reflect the same surface. Site builds run from `site/`; the deployed `/` route renders from `site/content/pages/home.yaml` via flatspace. Landing edits go through `site/theme.mjs` (Hero) and the YAML — never `site/index.html` directly. `docs/styles.css` is generated from `site/input.css`; append to the source, not the output.
|
|
39
28
|
|
|
40
|
-
|
|
41
|
-
const content = require('fs').readFileSync('/abs/path/file.md', 'utf8');
|
|
42
|
-
console.log(content.includes('expectedString'), content.length);
|
|
43
|
-
```
|
|
29
|
+
## CHANGELOG.md
|
|
44
30
|
|
|
45
|
-
|
|
31
|
+
One entry per commit landed this session. The commit message line plus a one-sentence "why" — no recipe, no narration. CHANGELOG carries the history that AGENTS.md refuses.
|
|
46
32
|
|
|
47
|
-
|
|
48
|
-
git add README.md docs/index.html gm-starter/agents/gm.md
|
|
49
|
-
git commit -m "docs: update documentation to reflect session changes"
|
|
50
|
-
git push -u origin HEAD
|
|
51
|
-
```
|
|
33
|
+
## Commit and Push
|
|
52
34
|
|
|
53
|
-
|
|
35
|
+
Stage doc updates only — never bundle them with code changes from earlier phases (those committed at their own time). One commit, present-tense imperative subject. Push to main. The push triggers the docs pipeline if the repo has one.
|
|
54
36
|
|
|
55
|
-
|
|
37
|
+
## COMPLETE
|
|
56
38
|
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
64
|
-
```
|
|
39
|
+
This is the terminal phase. After push lands, the chain signals COMPLETE. No further `Skill()` invocation; the orchestrator records the chain as concluded.
|
|
40
|
+
|
|
41
|
+
## Dispatch
|
|
42
|
+
|
|
43
|
+
`phase-status`, `transition` to COMPLETE.
|
|
44
|
+
|
|
45
|
+
## Transition
|
|
65
46
|
|
|
66
|
-
|
|
47
|
+
`nextSkill: null` from the orchestrator means the chain is done. End of skill body.
|
package/gm-complete.SKILL.md
DELETED
|
@@ -1,106 +0,0 @@
|
|
|
1
|
-
---
|
|
2
|
-
name: gm-complete
|
|
3
|
-
description: VERIFY and COMPLETE phase. End-to-end system verification and git enforcement. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
|
-
---
|
|
5
|
-
|
|
6
|
-
# GM COMPLETE — Verify, then close
|
|
7
|
-
|
|
8
|
-
Entry: EMIT gates clear, from `gm-emit`. Exit: `.prd` deleted + test.js green + pushed + CI green → `update-docs`.
|
|
9
|
-
|
|
10
|
-
Cross-cutting dispositions live in `gm` SKILL.md.
|
|
11
|
-
|
|
12
|
-
## Transitions
|
|
13
|
-
|
|
14
|
-
- `.prd` items remain → `gm-execute`
|
|
15
|
-
- `.prd` empty AND test.js green AND pushed AND CI green → `update-docs`
|
|
16
|
-
- Broken file output → `gm-emit`
|
|
17
|
-
- Wrong logic → `gm-execute`
|
|
18
|
-
- New unknown or wrong requirements → `planning`
|
|
19
|
-
|
|
20
|
-
Failure triage: broken output to EMIT, wrong logic to EXECUTE, new unknown to PLAN. Never patch around surprises.
|
|
21
|
-
|
|
22
|
-
## Mutables that must resolve before COMPLETE
|
|
23
|
-
|
|
24
|
-
- `witnessed_e2e` — real end-to-end run with witnessed output
|
|
25
|
-
- `browser_validated` — for any change touching client / UI / browser-facing code, see gate below. test.js + node-side imports DO NOT satisfy this gate.
|
|
26
|
-
- `git_clean` — `git status --porcelain` returns empty
|
|
27
|
-
- `git_pushed` — `git log origin/main..HEAD --oneline` returns empty
|
|
28
|
-
- `ci_passed` — every GitHub Actions run reaches `conclusion: success`
|
|
29
|
-
- `mutables_resolved` — `.gm/mutables.yml` deleted OR every entry `status: witnessed`. Stop hook hard-blocks turn-stop while any entry is `status: unknown`.
|
|
30
|
-
- `prd_empty` — `.gm/prd.yml` deleted AFTER residual scan: enumerate every in-spirit reachable residual surfaced this session; any hit re-enters `planning`, appends PRD items, executes. Empty PRD is necessary, not sufficient — done = empty PRD AND zero reachable in-spirit residuals. Out-of-spirit-or-unreachable residuals are named in the response and skipped; everything else is this turn's work.
|
|
31
|
-
- `stress_suite_clear` — change walked through M1–D1 (governance), none flunked
|
|
32
|
-
- `hidden_decision_posture` — open → down_weighted → closed only when CI is green AND stress suite is clear
|
|
33
|
-
|
|
34
|
-
## End-to-end verification
|
|
35
|
-
|
|
36
|
-
Real system, real data, witness actual output. Doc updates, "saying done", and screenshots alone are not verification. Write the e2e probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):
|
|
37
|
-
|
|
38
|
-
```
|
|
39
|
-
const { fn } = await import('/abs/path/to/module.js');
|
|
40
|
-
console.log(await fn(realInput));
|
|
41
|
-
```
|
|
42
|
-
|
|
43
|
-
After every success, enumerate what remains — never stop at first green.
|
|
44
|
-
|
|
45
|
-
## Browser validation gate
|
|
46
|
-
|
|
47
|
-
Required when this session changed any code that runs in a browser: anything under `client/`, UI components, shaders, page-loaded JS, served HTML, gh-pages assets, dev-server endpoints, or any module imported into the page bundle.
|
|
48
|
-
|
|
49
|
-
Trigger detection (any one): `git diff --name-only origin/main..HEAD` includes paths under `client/`, `apps/*/index.js` with client export, `docs/`, `*.html`, shader files, or any file imported by a browser entry; new/changed export consumed by `window.*` or rendered in DOM/canvas/WebGL; visual, layout, animation, input, network-on-page, or shader behavior altered.
|
|
50
|
-
|
|
51
|
-
Protocol: boot the real server (or open the static page) on a known URL — witness HTTP 200. `exec:browser` → `page.goto(url)` → wait for app init by polling for the global the change affects (`window.__app.<system>`). Probe via `page.evaluate(() => …)` asserting the specific invariant the change was supposed to establish — instance counts, scene meshes, DOM nodes, render stats, network frames. Capture witnessed numbers in the response — "looks fine" is not a witness. Failures route to `gm-execute` (logic) or `gm-emit` (output) — never paper over.
|
|
52
|
-
|
|
53
|
-
Long-running probes split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
|
|
54
|
-
|
|
55
|
-
Exempt only when: change is server-only with zero browser-facing surface, OR the repository has no browser surface at all (pure CLI / library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence.
|
|
56
|
-
|
|
57
|
-
Pre-flight: run `git diff --name-only origin/main..HEAD` directly via Bash, then dispatch a nodejs spool file that reads the diff list and filters lines matching `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session → mandatory regression to `gm-execute`.
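The filtering half of the pre-flight step can be sketched as below, using the pattern quoted in the removed text. The function name is hypothetical, and the sketch takes the `git diff --name-only` output as a string instead of reading it from a spool file.

```javascript
// Pattern copied from the pre-flight description above.
const BROWSER_SURFACE = /client\/|docs\/|\.html$|\.glsl$|\.frag$|\.vert$/;

// Hypothetical: return the changed paths that touch a browser-facing surface.
function browserFacingPaths(diffOutput) {
  return diffOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line !== '' && BROWSER_SURFACE.test(line));
}

const diff = 'lib/spool.js\nclient/app.js\ndocs/index.html\nshaders/sky.frag\n';
console.log(browserFacingPaths(diff));
// [ 'client/app.js', 'docs/index.html', 'shaders/sky.frag' ]
```

A non-empty result would correspond to the "any hit" condition in the text; whether an `exec:browser` block ran in the session has to be checked separately.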
|
|
58
|
-
|
|
59
|
-
## Integration test gate
|
|
60
|
-
|
|
61
|
-
Write to `.gm/exec-spool/in/nodejs/<N>.js`:
|
|
62
|
-
|
|
63
|
-
```
|
|
64
|
-
const { execSync } = require('child_process');
|
|
65
|
-
try { execSync('node test.js', { stdio: 'inherit', timeout: 30000 }); console.log('PASS'); }
|
|
66
|
-
catch (e) { console.error('FAIL'); process.exit(1); }
|
|
67
|
-
```
|
|
68
|
-
|
|
69
|
-
Failure → `gm-execute`. No test.js in a repo with testable surface → `gm-execute` to create it.
|
|
70
|
-
|
|
71
|
-
## Git enforcement
|
|
72
|
-
|
|
73
|
-
Run directly via Bash:
|
|
74
|
-
|
|
75
|
-
```
|
|
76
|
-
git status --porcelain
|
|
77
|
-
git log origin/main..HEAD --oneline
|
|
78
|
-
```
|
|
79
|
-
|
|
80
|
-
Both must return empty. Local commit without push is not complete.

## CI is automated

The Stop hook watches Actions for the pushed HEAD. Do not call `gh run list` manually. All-green → Stop approves with CI summary in next-turn context. Failure → Stop blocks with run names + IDs; investigate via `gh run view <id> --log-failed`, fix, push, hook re-watches. Deadline 180s (override `GM_CI_WATCH_SECS`); slow jobs get a "still in progress" approve.
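
How the hook actually parses the override is not shown in this file. As a rough sketch under that caveat, a deadline reader with a 180-second default could look like this; `ciWatchSeconds` is a hypothetical name and the validation rules are assumptions.

```javascript
const DEFAULT_CI_WATCH_SECS = 180;

// Hypothetical reader: returns the deadline in seconds, failing loudly on bad input.
function ciWatchSeconds(env) {
  const raw = env.GM_CI_WATCH_SECS;
  if (raw === undefined || raw === '') return DEFAULT_CI_WATCH_SECS;
  const secs = Number(raw);
  if (!Number.isInteger(secs) || secs <= 0) {
    throw new Error(`GM_CI_WATCH_SECS must be a positive integer, got "${raw}"`);
  }
  return secs;
}

console.log(ciWatchSeconds({})); // 180
console.log(ciWatchSeconds({ GM_CI_WATCH_SECS: '600' })); // 600
```

In real use the argument would be `process.env`.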

## Hygiene sweep

1. Files >200 lines → split
2. Comments in code → remove
3. Scattered test files (`.test.js`, `.spec.js`, `__tests__/`, `fixtures/`, `mocks/`) → delete, consolidate into root `test.js`
4. Mock / stub / simulation files → delete
5. Unnecessary doc files (not CHANGELOG, CLAUDE, README, TODO.md) → delete
6. Duplicate concern → regress to `planning` with restructuring instructions
7. Hardcoded values → derive from ground truth
8. Fallback / demo modes → remove, fail loud
9. TODO.md → empty or deleted
10. CHANGELOG.md → entries for this session
11. Observability gaps → server subsystems expose `/debug/<subsystem>`; client modules register in `window.__debug`
12. Memorize → every fact from verification handed off via background `Agent(memorize)` at moment of resolution
13. Deploy / publish → if deployable, deploy; if npm package, publish
14. GitHub Pages → check `.github/workflows/pages.yml` + `docs/index.html` exist; invoke `pages` skill if absent
15. Governance stress-suite → walk change through M1, F1, C1, H1, S1, B1, A1, D1; any flunk regresses to the owning phase
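
Item 1 of the sweep is the easiest to mechanise. The sketch below is one possible check, assuming file contents are already loaded into a name-to-text map; `countLines` and `oversizedFiles` are hypothetical helpers, and whether a trailing newline counts as an extra line is a choice made here, not something the sweep defines.

```javascript
// Counts lines; a trailing newline does not start a new line.
function countLines(text) {
  if (text === '') return 0;
  const newlines = text.split('\n').length - 1;
  return text.endsWith('\n') ? newlines : newlines + 1;
}

// Returns the names of files whose line count exceeds the limit.
function oversizedFiles(files, limit = 200) {
  return Object.entries(files)
    .filter(([, text]) => countLines(text) > limit)
    .map(([name]) => name);
}

const result = oversizedFiles({
  'big.js': 'x\n'.repeat(201),
  'ok.js': 'x\n'.repeat(200),
});
console.log(result); // [ 'big.js' ]
```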

## Completion

All true at once: witnessed e2e | browser_validated when client work touched | failure paths exercised | test.js passes | `.prd` deleted | git clean and pushed | CI green | hygiene sweep clean | TODO.md gone | CHANGELOG.md updated.
package/gm-emit.SKILL.md
DELETED
@@ -1,70 +0,0 @@
---
name: gm-emit
description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
---

# GM EMIT — Write and verify from disk

Entry: every mutable KNOWN, from `gm-execute` or re-entered from VERIFY. Exit: gates clear → `gm-complete`.

Cross-cutting dispositions live in `gm` SKILL.md.

## Transitions

- All gates clear → `gm-complete`
- Post-emit variance with known cause → fix in-band, re-verify, stay in EMIT
- Pre-emit reveals known logic error → `gm-execute`
- Pre-emit reveals new unknown OR post-emit variance with unknown cause OR scope changed → `planning`
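
The transition list reads naturally as a lookup table. The sketch below is illustrative: the outcome keys are hypothetical labels invented for this example, and `'gm-emit'` stands for "stay in EMIT".

```javascript
// Hypothetical outcome labels mapped to the phase named in the list above.
const EMIT_TRANSITIONS = Object.freeze({
  gatesClear: 'gm-complete',
  postEmitVarianceKnownCause: 'gm-emit',
  preEmitKnownLogicError: 'gm-execute',
  preEmitNewUnknown: 'planning',
  postEmitVarianceUnknownCause: 'planning',
  scopeChanged: 'planning',
});

// Throws on an unrecognised outcome instead of falling back to a default.
function nextPhase(outcome) {
  if (!Object.prototype.hasOwnProperty.call(EMIT_TRANSITIONS, outcome)) {
    throw new Error(`unknown EMIT outcome: ${outcome}`);
  }
  return EMIT_TRANSITIONS[outcome];
}

console.log(nextPhase('scopeChanged')); // planning
```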

## Legitimacy gate (before pre-emit run)

For every claim landing in a file, answer five questions:

1. Earned specificity — does it trace to `authorization=witnessed`, or is it inflated from a weak prior?
2. Repair legality — is a local patch dressed as structural repair? Downgrade scope or regress to PLAN.
3. Lawful downgrade — can a weaker, true statement replace it? Prefer the downgrade.
4. Alternative-route suppression — is a live competing route being silenced? Preserve it.
5. Strongest objection — what would the sharpest reviewer pushback be? Articulate it. Cannot articulate = have not understood the alternatives → `gm-execute`.

Any failure regresses to `gm-execute` to witness what was missing, or `planning` if the gap is structural.

## Pre-emit run

Mandatory before writing any file. Write the probe to the spool (`.gm/exec-spool/in/nodejs/<N>.js`):

```
const { fn } = await import('/abs/path/to/module.js');
console.log(await fn(realInput));
```

Import the actual module from disk to witness current behavior as the baseline. Run the proposed logic in isolation without writing — witness with real inputs and with real error inputs. Match expected → write. Unexpected → new unknown → `planning`.

## Writing

Use the Write tool, or a nodejs spool file with `require('fs')`. Write only when every gate mutable resolves simultaneously.

## Post-emit verification

Re-import from disk — in-memory state is stale and inadmissible. Run identical inputs as pre-emit; output must match the baseline exactly. Known variance → fix and re-verify (self-loop). Unknown variance → `planning`.
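
The exact-match comparison and its two failure routes can be sketched as below. `postEmitVerdict` and its return labels are hypothetical; the sketch assumes the pre-emit and post-emit outputs are plain values that Node's `util.isDeepStrictEqual` can compare.

```javascript
const { isDeepStrictEqual } = require('util');

// Hypothetical verdict: exact match passes; variance routes by whether its cause is known.
function postEmitVerdict(baseline, postEmit, causeKnown) {
  if (isDeepStrictEqual(baseline, postEmit)) return 'match';
  return causeKnown ? 'fix-and-reverify' : 'planning';
}

console.log(postEmitVerdict({ count: 3 }, { count: 3 }, false)); // match
console.log(postEmitVerdict({ count: 3 }, { count: 4 }, true)); // fix-and-reverify
console.log(postEmitVerdict({ count: 3 }, { count: '3' }, false)); // planning
```

The last call shows why a strict comparison is used: `3` and `'3'` differ, so the output does not match the baseline exactly.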

## Mutables gate

Before pre-emit run, read `.gm/mutables.yml`. Any entry with `status: unknown` → regress to `gm-execute`. The pre-tool-use hook hard-blocks Write/Edit/NotebookEdit while unresolved entries exist; trying to emit anyway returns deny. Zero unresolved is the precondition for every question in the legitimacy gate.
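
The schema of `.gm/mutables.yml` is not shown in this file, so the sketch below assumes the entries have already been parsed into an array of objects with `status` and `witness_evidence` fields. The `id` field and both function names are hypothetical.

```javascript
// Returns entries that are not witnessed or that lack evidence text.
function unresolvedMutables(entries) {
  return (entries ?? []).filter((entry) => {
    const evidence = typeof entry.witness_evidence === 'string'
      ? entry.witness_evidence.trim()
      : '';
    return entry.status !== 'witnessed' || evidence === '';
  });
}

// An empty or absent list clears the gate.
function mutablesGateClear(entries) {
  return unresolvedMutables(entries).length === 0;
}

const pending = unresolvedMutables([
  { id: 'a', status: 'witnessed', witness_evidence: 'exit code 0' },
  { id: 'b', status: 'unknown' },
  { id: 'c', status: 'witnessed', witness_evidence: '  ' },
]);
console.log(pending.map((entry) => entry.id)); // [ 'b', 'c' ]
```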

## Gate (all true at once)

- `.gm/mutables.yml` empty/absent OR every entry `status: witnessed` with filled `witness_evidence`
- Legitimacy gate passed; no refused collapse
- Pre-emit passed with real inputs and real error inputs
- Post-emit matches pre-emit exactly
- Hot-reloadable; errors throw with context (no `|| default`, no `catch { return null }`, no fallbacks)
- No mocks, fakes, stubs, or scattered test files (delete on discovery)
- Any behavior change has a corresponding assertion in `test.js` — a change no test catches is a change you cannot prove
- Browser-facing change → post-emit verify includes a live `exec:browser` witness (boot server → `page.goto` → `page.evaluate` asserting the invariant the change established). Node-side import + test.js does not satisfy this — the final gate runs again in `gm-complete`.
- Files ≤ 200 lines
- No duplicate concern (run `exec:codesearch` for the primary concern after writing; overlap → `planning`)
- No comments, no hardcoded values, no adjectives in identifiers, no unnecessary files
- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules register in `window.__debug`
- Structure: no if/else where dispatch suffices; no one-liners that obscure; no reinvented APIs
- Every fact resolved this phase memorized via background `Agent(memorize)`
- CHANGELOG.md updated; TODO.md cleared or deleted