gm-skill 0.1.2 → 2.0.1081
- package/AGENTS.md +1 -0
- package/LICENSE +21 -0
- package/README.md +20 -84
- package/agents/gm.md +22 -0
- package/agents/memorize.md +100 -0
- package/agents/research-worker.md +36 -0
- package/agents/textprocessing.md +47 -0
- package/bin/bootstrap.js +702 -0
- package/bin/plugkit.js +136 -0
- package/bin/plugkit.sha256 +7 -0
- package/bin/plugkit.version +1 -0
- package/bin/plugkit.wasm +0 -0
- package/bin/plugkit.wasm.sha256 +1 -0
- package/bin/rtk.sha256 +6 -0
- package/bin/rtk.version +1 -0
- package/gm-plugkit/bootstrap.js +694 -0
- package/gm-plugkit/cli.js +48 -0
- package/gm-plugkit/index.js +12 -0
- package/gm-plugkit/package.json +26 -0
- package/gm-plugkit/plugkit-wasm-wrapper.js +190 -0
- package/gm-plugkit/plugkit.sha256 +6 -0
- package/gm-plugkit/plugkit.version +1 -0
- package/gm.json +27 -0
- package/lang/browser.js +45 -0
- package/lang/ssh.js +166 -0
- package/lib/browser-spool-handler.js +130 -0
- package/lib/browser.js +131 -0
- package/lib/codeinsight.js +109 -0
- package/lib/daemon-bootstrap.js +253 -132
- package/lib/git.js +0 -1
- package/lib/learning.js +169 -0
- package/lib/skill-bootstrap.js +406 -0
- package/lib/spool-dispatch.js +100 -0
- package/lib/spool.js +87 -49
- package/lib/wasm-host.js +241 -0
- package/package.json +38 -20
- package/prompts/bash-deny.txt +22 -0
- package/prompts/pre-compact.txt +21 -0
- package/prompts/prompt-submit.txt +83 -0
- package/prompts/session-start.txt +15 -0
- package/scripts/run-hook.sh +7 -0
- package/scripts/watch-cascade.js +166 -0
- package/skills/browser/SKILL.md +80 -0
- package/skills/code-search/SKILL.md +48 -0
- package/skills/create-lang-plugin/SKILL.md +121 -0
- package/skills/gm/SKILL.md +10 -49
- package/skills/gm-complete/SKILL.md +16 -87
- package/skills/gm-emit/SKILL.md +17 -50
- package/skills/gm-execute/SKILL.md +18 -69
- package/skills/gm-skill/SKILL.md +43 -0
- package/skills/gm-skill/index.js +21 -0
- package/skills/governance/SKILL.md +97 -0
- package/skills/pages/SKILL.md +208 -0
- package/skills/planning/SKILL.md +21 -97
- package/skills/research/SKILL.md +43 -0
- package/skills/ssh/SKILL.md +71 -0
- package/skills/textprocessing/SKILL.md +40 -0
- package/skills/update-docs/SKILL.md +24 -43
- package/gm-complete.SKILL.md +0 -106
- package/gm-emit.SKILL.md +0 -70
- package/gm-execute.SKILL.md +0 -88
- package/gm.SKILL.md +0 -63
- package/index.js +0 -1
- package/lib/index.js +0 -37
- package/lib/loader.js +0 -66
- package/lib/manifest.js +0 -99
- package/lib/prepare.js +0 -14
- package/planning.SKILL.md +0 -118
- package/skills/gm/index.js +0 -113
- package/skills/gm-complete/index.js +0 -118
- package/skills/gm-complete.SKILL.md +0 -106
- package/skills/gm-emit/index.js +0 -90
- package/skills/gm-emit.SKILL.md +0 -70
- package/skills/gm-execute/index.js +0 -91
- package/skills/gm-execute.SKILL.md +0 -88
- package/skills/gm.SKILL.md +0 -63
- package/skills/planning/index.js +0 -107
- package/skills/planning.SKILL.md +0 -118
- package/skills/update-docs/index.js +0 -108
- package/skills/update-docs.SKILL.md +0 -66
- package/test-build.js +0 -29
- package/test-e2e.js +0 -117
- package/test-unified.js +0 -24
- package/test.js +0 -89
- package/update-docs.SKILL.md +0 -66
package/skills/gm-emit/SKILL.md
CHANGED
|
@@ -1,70 +1,37 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: gm-emit
|
|
3
3
|
description: EMIT phase. Pre-emit debug, write files, post-emit verify from disk. Any new unknown triggers immediate snake back to planning — restart chain.
|
|
4
|
+
allowed-tools: Skill, Read, Write
|
|
4
5
|
---
|
|
5
6
|
|
|
6
|
-
#
|
|
7
|
+
# gm-emit — EMIT
|
|
7
8
|
|
|
8
|
-
|
|
9
|
+
EMIT is where intent becomes artifact. The phase exists as a distinct gate because writing without verification produces silent drift between what the agent believes was emitted and what landed on disk.
|
|
9
10
|
|
|
10
|
-
|
|
11
|
+
## Pre-Emit
|
|
11
12
|
|
|
12
|
-
|
|
13
|
+
Before writing, debug the planned state. Read the target paths that will be touched — confirm current contents match the assumption the diff is built on. A diff applied to a file the agent has not freshly read is a diff against a stale model. Spool the reads if scope is wide; serial Read calls are acceptable for a small set. Surface mismatches → snake to PLAN.
|
|
13
14
|
|
|
14
|
-
|
|
15
|
-
- Post-emit variance with known cause → fix in-band, re-verify, stay in EMIT
|
|
16
|
-
- Pre-emit reveals known logic error → `gm-execute`
|
|
17
|
-
- Pre-emit reveals new unknown OR post-emit variance with unknown cause OR scope changed → `planning`
|
|
15
|
+
## Sync-Before-Emit
|
|
18
16
|
|
|
19
|
-
|
|
17
|
+
rs-codeinsight and rs-search outputs feeding EMIT must come from a freshly-completed index. No cache serves a result without a digest match against the live filesystem. Default invocation always runs fresh. `--read-cache` is permitted only when `.codeinsight.digest` matches exactly; on mismatch, the cache auto-refreshes before the result emits. Emitting from an unverified or partial index is forced closure equivalent to bluffing strength — the agent reads stale output as ground truth and acts on a state that no longer exists.
|
|
20
18
|
|
|
21
|
-
|
|
19
|
+
## Write
|
|
22
20
|
|
|
23
|
-
|
|
24
|
-
2. Repair legality — is a local patch dressed as structural repair? Downgrade scope or regress to PLAN.
|
|
25
|
-
3. Lawful downgrade — can a weaker, true statement replace it? Prefer the downgrade.
|
|
26
|
-
4. Alternative-route suppression — is a live competing route being silenced? Preserve it.
|
|
27
|
-
5. Strongest objection — what would the sharpest reviewer pushback be? Articulate it. Cannot articulate = have not understood the alternatives → `gm-execute`.
|
|
21
|
+
One Edit or Write per artifact. No multi-file batches that conceal which file failed if one fails. Spool larger payloads through `in/nodejs/` when shape demands it.
|
|
28
22
|
|
|
29
|
-
|
|
23
|
+
## Post-Emit Verify
|
|
30
24
|
|
|
31
|
-
|
|
25
|
+
After each write, re-read the file from disk and assert the change is present. The Read tool is the post-emit witness. Discrepancy → Fix on Sight: fix at root, re-emit, re-verify. A green Write call is not the witness — the verified disk state is.
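The write-then-re-read check above can be sketched as follows. This is a minimal illustration, assuming plain Node.js `fs` calls stand in for the Write and Read tools; `emitAndVerify` is a hypothetical helper and is not part of gm-skill.

```js
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write one artifact, then re-read it from disk and
// compare against the intended content. The disk read is the witness,
// not the fact that the write call returned without throwing.
function emitAndVerify(filePath, content) {
  fs.writeFileSync(filePath, content, 'utf8');
  const onDisk = fs.readFileSync(filePath, 'utf8');
  if (onDisk !== content) {
    throw new Error(`post-emit mismatch for ${filePath}`);
  }
  return onDisk;
}

// Demo against a throwaway temp directory.
const target = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'emit-')), 'artifact.txt');
const verified = emitAndVerify(target, 'hello\n');
console.log(verified.length);
```

The helper handles one file per call, which mirrors the "one Edit or Write per artifact" rule: a failure names exactly one path.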
|
|
32
26
|
|
|
33
|
-
|
|
27
|
+
## Fix on Sight
|
|
34
28
|
|
|
35
|
-
|
|
36
|
-
const { fn } = await import('/abs/path/to/module.js');
|
|
37
|
-
console.log(await fn(realInput));
|
|
38
|
-
```
|
|
29
|
+
Issues surfaced during EMIT (a write that revealed a previously-hidden import error, a generated file that no longer matches its source) are fixed this turn at root cause. Add the residual to PRD before transitioning if the fix expands scope beyond the current slice.
|
|
39
30
|
|
|
40
|
-
|
|
31
|
+
## Dispatch
|
|
41
32
|
|
|
42
|
-
|
|
33
|
+
`phase-status` to check FSM state before transition. Spool any meaningful reads/writes for auditability.
|
|
43
34
|
|
|
44
|
-
|
|
35
|
+
## Transition
|
|
45
36
|
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
Re-import from disk — in-memory state is stale and inadmissible. Run identical inputs as pre-emit; output must match the baseline exactly. Known variance → fix and re-verify (self-loop). Unknown variance → `planning`.
|
|
49
|
-
|
|
50
|
-
## Mutables gate
|
|
51
|
-
|
|
52
|
-
Before pre-emit run, read `.gm/mutables.yml`. Any entry with `status: unknown` → regress to `gm-execute`. The pre-tool-use hook hard-blocks Write/Edit/NotebookEdit while unresolved entries exist; trying to emit anyway returns deny. Zero unresolved is the precondition for every legitimacy question below.
|
|
53
|
-
|
|
54
|
-
## Gate (all true at once)
|
|
55
|
-
|
|
56
|
-
- `.gm/mutables.yml` empty/absent OR every entry `status: witnessed` with filled `witness_evidence`
|
|
57
|
-
- Legitimacy gate passed; no refused collapse
|
|
58
|
-
- Pre-emit passed with real inputs and real error inputs
|
|
59
|
-
- Post-emit matches pre-emit exactly
|
|
60
|
-
- Hot-reloadable; errors throw with context (no `|| default`, no `catch { return null }`, no fallbacks)
|
|
61
|
-
- No mocks, fakes, stubs, or scattered test files (delete on discovery)
|
|
62
|
-
- Any behavior change has a corresponding assertion in `test.js` — a change no test catches is a change you cannot prove
|
|
63
|
-
- Browser-facing change → post-emit verify includes a live `exec:browser` witness (boot server → `page.goto` → `page.evaluate` asserting the invariant the change established). Node-side import + test.js does not satisfy this — the final gate runs again in `gm-complete`.
|
|
64
|
-
- Files ≤ 200 lines
|
|
65
|
-
- No duplicate concern (run `exec:codesearch` for the primary concern after writing; overlap → `planning`)
|
|
66
|
-
- No comments, no hardcoded values, no adjectives in identifiers, no unnecessary files
|
|
67
|
-
- Observability: new server subsystems expose `/debug/<subsystem>`; new client modules register in `window.__debug`
|
|
68
|
-
- Structure: no if/else where dispatch suffices; no one-liners that obscure; no reinvented APIs
|
|
69
|
-
- Every fact resolved this phase memorized via background `Agent(memorize)`
|
|
70
|
-
- CHANGELOG.md updated; TODO.md cleared or deleted
|
|
37
|
+
Read `out/<N>.json::nextSkill`. Invoke `Skill(skill="gm:<nextSkill>")` immediately. New unknown → `Skill(skill="gm:planning")`.
package/skills/gm-execute/SKILL.md
CHANGED
|
|
@@ -1,88 +1,37 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: gm-execute
|
|
3
|
-
description: EXECUTE phase AND the foundational execution contract for every skill. Every
|
|
3
|
+
description: EXECUTE phase AND the foundational execution contract for every skill. Every spool dispatch run, every witnessed check, every code search, in every phase, follows this skill's discipline. Resolve all mutables via witnessed execution. Any new unknown triggers immediate snake back to planning — restart chain from PLAN.
|
|
4
|
+
allowed-tools: Skill, Read, Write
|
|
4
5
|
---
|
|
5
6
|
|
|
6
|
-
#
|
|
7
|
+
# gm-execute — EXECUTE
|
|
7
8
|
|
|
8
|
-
|
|
9
|
+
Every PRD item resolves through witnessed execution. Real input through real code into real output, witnessed. Anything less leaves the mutable open.
|
|
9
10
|
|
|
10
|
-
|
|
11
|
+
## Fix on Sight
|
|
11
12
|
|
|
12
|
-
|
|
13
|
+
Every issue surfaced during work is fixed in-band, this turn, at root cause. Defer-markers, swallowed errors, suppressed output, skipped tests, and "address it next session" are variants of the same failure: a known-bad signal carried past the moment of detection. Surface → diagnose → fix at root → re-witness → continue. Pre-existing build breaks, lockfile drift, broken deps, lint failures on neighboring code, stale generated files — all become PRD items the same turn they surface, executed before COMPLETE. The user does not have to ask. Genuinely out-of-reach errors (require credentials, depend on down services, demand product decisions) are named with `blockedBy: external` in the PRD — never silently dropped.
|
|
13
14
|
|
|
14
|
-
##
|
|
15
|
+
## Surprise Absorption Prohibition
|
|
15
16
|
|
|
16
|
-
-
|
|
17
|
-
- Still UNKNOWN → re-run from a different angle (max 2 passes)
|
|
18
|
-
- New unknown OR unresolvable after 2 passes → `planning`
|
|
17
|
+
Every unexpected output is a new mutable. The agent that absorbs surprise into its existing model — "that output is weird but the test still passes" — has just resolved an unknown by narrative, which the discipline rejects on principle. Snake back to PLAN, name the new mutable, witness it, resume. The two-pass rule applies: first pass exposes the surprise, second pass either witnesses the new mutable or proves the surprise was a measurement artifact.
|
|
19
18
|
|
|
20
|
-
##
|
|
19
|
+
## Nothing Fake
|
|
21
20
|
|
|
22
|
-
|
|
21
|
+
What ships runs against real services, real data, real binaries. Stubs, mocks, placeholder returns, fixture-only paths, "TODO: implement", hardcoded sample responses, and demo-mode fallbacks are forbidden. They produce green checks that survive into production and lie about what works. Behavioral detection: code paths that always succeed, always return the same value regardless of input, or short-circuit a real call to satisfy a type signature are stubs. Before writing a shim, check whether an upstream library already provides that surface — maintaining a local reimplementation drifts and ages.
|
|
23
22
|
|
|
24
|
-
|
|
23
|
+
## Browser Witness
|
|
25
24
|
|
|
26
|
-
|
|
27
|
-
- **λ ≥ 2** — two independent paths agree
|
|
28
|
-
- **ε intact** — adjacent invariants hold
|
|
29
|
-
- **Coverage ≥ 0.70** — enough corpus inspected to rule out contradiction
|
|
25
|
+
Editing code that runs in a browser requires a live `exec:browser` witness in the same turn as the edit. Boot the real surface (server up, page reachable, HTTP 200 witnessed), navigate, poll for the global the change affects, `page.evaluate` asserting the specific invariant, capture witnessed values. Variance → fix at root → re-witness. Pure-prose edits to static documents with no JS/canvas/DOM behavior change are exempt with the exemption tagged. Silent skip on actual behavior change is forced closure.
|
|
30
26
|
|
|
31
|
-
|
|
27
|
+
## Mutables Resolve
|
|
32
28
|
|
|
33
|
-
|
|
29
|
+
The `mutable-resolve` verb auto-fires memorize on success. `witness_evidence` is mandatory — file:line, codesearch hit, exec output snippet. Narrative resolution is rejected. Rows that cannot be witnessed stay `unknown` and the EMIT gate stays closed.
|
|
34
30
|
|
|
35
|
-
|
|
31
|
+
## Dispatch
|
|
36
32
|
|
|
37
|
-
|
|
33
|
+
Spool every exec. `mutable-resolve` to flip rows. `phase-status` to read FSM state. `transition` when the PRD slice for this phase is complete.
|
|
38
34
|
|
|
39
|
-
|
|
35
|
+
## Transition
|
|
40
36
|
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
Code AND utility verbs both run through the file-spool. Write a file to `.gm/exec-spool/in/<lang-or-verb>/<N>.<ext>` — language stems (`in/nodejs/42.js`, `in/python/43.py`, `in/bash/44.sh`, plus typescript, go, rust, c, cpp, java, deno) or verb stems (`in/codesearch/45.txt`, `in/recall/46.txt`, `in/memorize/47.md`, plus wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health). The spool watcher executes and streams stdout to `out/<N>.out`, stderr to `out/<N>.err`, then writes `out/<N>.json` metadata sidecar at completion (taskId, lang, ok, exitCode, durationMs, timedOut, startedAt, endedAt). Both streams return as systemMessage with `--- stdout ---` / `--- stderr ---` separators. File I/O via a nodejs spool file + `require('fs')`. Only `git` and `gh` run directly in Bash. Never `Bash(node/npm/npx/bun)`, never `Bash(exec:<anything>)`.
|
|
44
|
-
|
|
45
|
-
Pack runs: `Promise.allSettled`, each idea own try/catch, under 12s per call. Runner: write `in/runner/<N>.txt` with body `start` | `stop` | `status`.
|
|
46
|
-
|
|
47
|
-
Every exec daemonizes. The hook tails the task logfile up to 30s wall-clock and returns whatever is there — short tasks complete inside the window and look synchronous; long tasks return a task_id with partial output. Continue with `exec:tail` (drain, bounded), `exec:watch` (resume blocking until match or timeout), or `exec:close` (terminate). Never re-spawn a long task to check on it — that orphans the first one. `exec:wait` is a pure timer; `exec:sleep` blocks on a specific task's output; `exec:watch` is the match-or-timeout primitive. Every execution-platform RPC returns the live list of running tasks for this session — close stragglers via `exec:close\n<id>` so the list stays scannable. Session-end (clear/logout/prompt_input_exit) kills the session's tasks; compaction/handoff preserves them.
|
|
48
|
-
|
|
49
|
-
Every utility verb dispatches via `in/<verb>/<N>.txt`; the body of the file is the verb's argument. There is no inline form and no Bash-prefix form — both are denied by the hook.
|
|
50
|
-
|
|
51
|
-
## Codebase search
|
|
52
|
-
|
|
53
|
-
`exec:codesearch` only. Grep, Glob, Find, Explore, raw grep/rg/find inside `exec:bash` are all hook-blocked.
|
|
54
|
-
|
|
55
|
-
```
|
|
56
|
-
exec:codesearch
|
|
57
|
-
<two-word query>
|
|
58
|
-
```
|
|
59
|
-
|
|
60
|
-
Start two words, change/add one per pass, minimum four attempts before concluding absent. Known absolute path → `Read`. Known directory → `exec:nodejs` + `fs.readdirSync`.
|
|
61
|
-
|
|
62
|
-
## Utility verb failure handling
|
|
63
|
-
|
|
64
|
-
**Utility verb failures must surface**: exec:memorize, exec:recall, exec:codesearch, and other utility verbs may fail (socket unavailable, timeout, network error). Failures do not block witness completion but must be reported to the user with error context. Fallback mechanisms (AGENTS.md for memorize) ensure memory preservation even when rs-learn is temporarily unavailable.
|
|
65
|
-
|
|
66
|
-
## Import-based execution
|
|
67
|
-
|
|
68
|
-
Hypotheses become real by importing actual modules from disk. Reimplemented behavior is UNKNOWN. Write the import probe to the spool:
|
|
69
|
-
|
|
70
|
-
```
|
|
71
|
-
# write .gm/exec-spool/in/nodejs/42.js
|
|
72
|
-
const { fn } = await import('/abs/path/to/module.js');
|
|
73
|
-
console.log(await fn(realInput));
|
|
74
|
-
```
|
|
75
|
-
|
|
76
|
-
Differential diagnosis: smallest reproduction → compare actual vs expected → name the delta — that delta is the mutable.
|
|
77
|
-
|
|
78
|
-
## Edits depend on witnesses
|
|
79
|
-
|
|
80
|
-
Hypothesis → run → witness → edit. An edit before a witness is a guess. Scan via `exec:codesearch` before creating or modifying — duplicate concern regresses to `planning`. Code-quality preference: native → library → structure → write.
|
|
81
|
-
|
|
82
|
-
## Parallel subagents
|
|
83
|
-
|
|
84
|
-
Up to 3 `gm:gm` subagents for independent items in one message. Browser escalation: `exec:browser` → `browser` skill → screenshot only as last resort.
|
|
85
|
-
|
|
86
|
-
## CI is automated
|
|
87
|
-
|
|
88
|
-
`git push` triggers the Stop hook to watch Actions for the pushed HEAD on the same repo (downstream cascades are not auto-watched). Green → Stop approves with summary; failure → run names + IDs surfaced, investigate via `gh run view <id> --log-failed`. Deadline 180s (override `GM_CI_WATCH_SECS`).
|
|
37
|
+
Read `out/<N>.json::nextSkill`. Invoke `Skill(skill="gm:<nextSkill>")` immediately. New unknown surfaces → snake to `Skill(skill="gm:planning")`, restart chain.
package/skills/gm-skill/SKILL.md
ADDED
|
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: gm-skill
|
|
3
|
+
description: Canonical universal harness — AI-native software engineering via skill-driven orchestration; bootstraps plugkit for task execution and session isolation
|
|
4
|
+
allowed-tools: Skill, Read, Write
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# GM — Universal Skill Harness
|
|
8
|
+
|
|
9
|
+
Single canonical body re-exported by every platform-specific gm-<platform> skill. All 15 platforms share this identical surface. AI-native software engineering orchestrated as a continuous chain: PLAN → EXECUTE → EMIT → VERIFY → UPDATE-DOCS, no stops between phases, no permission gates, the user's first request is the authorization for the whole chain.
|
|
10
|
+
|
|
11
|
+
## Bootstrap
|
|
12
|
+
|
|
13
|
+
`bun x gm-plugkit@latest --daemon` downloads the correct platform binary, verifies SHA256, starts the spool watcher daemon. Idempotent. Call once at session start. Subsequent calls no-op.
|
|
14
|
+
|
|
15
|
+
Session-ID threading: at skill invoke, generate or detect SESSION_ID (env `SESSION_ID` or `uuid()`). Every rs-exec RPC body and every spool-written task body carries `sessionId: "<id>"`. Task-scoped cleanup (deleteTask, getTask, appendOutput, killSessionTasks) requires matching sessionId. Absence is hard-rejected by the handler — no orphaned tasks.
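The session-ID threading described above might look like the sketch below. The function names and the shape of the task body are assumptions made for illustration; only the `SESSION_ID` environment variable, the UUID fallback, and the reject-on-absence behavior come from the text.

```js
const { randomUUID } = require('node:crypto');

// Use SESSION_ID from the environment when present, otherwise mint a UUID.
function resolveSessionId(env = process.env) {
  return env.SESSION_ID && env.SESSION_ID.length > 0 ? env.SESSION_ID : randomUUID();
}

// Hypothetical envelope: attach sessionId to a task body before dispatch.
function withSession(body, sessionId) {
  return { ...body, sessionId };
}

// Hypothetical handler-side check: reject a body whose sessionId is
// missing or does not match the session that owns the task.
function assertSession(body, expectedSessionId) {
  if (!body || body.sessionId !== expectedSessionId) {
    throw new Error('rejected: missing or mismatched sessionId');
  }
  return true;
}

const sessionId = resolveSessionId({ SESSION_ID: 'demo-session' });
const taskBody = withSession({ verb: 'status' }, sessionId);
console.log(assertSession(taskBody, sessionId));
```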
|
|
16
|
+
|
|
17
|
+
## Spool Dispatch Surface
|
|
18
|
+
|
|
19
|
+
Every dispatch goes through the spool. Tool args are ephemeral, inline, do not survive compaction, are not replayable. A file-based surface inverts every one of those: the request lives on disk before the watcher reads it, the watcher is detached from the agent process, the output triplet (`.out`, `.err`, `.json`) is auditable after the fact.
|
|
20
|
+
|
|
21
|
+
Write to `.gm/exec-spool/in/<lang>/<N>.<ext>` (nodejs, python, bash, typescript, go, rust, c, cpp, java, deno) or `in/<verb>/<N>.txt` (codesearch, recall, memorize, wait, sleep, status, close, browser, runner, type, kill-port, forget, feedback, learn-status, learn-debug, learn-build, discipline, pause, health). Watcher streams `out/<N>.out` and `out/<N>.err` line-by-line, then writes `out/<N>.json` metadata (exitCode, durationMs, timedOut, startedAt, endedAt) at completion.
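As a rough sketch of the path layout above: the helpers below derive the request path and the output triplet for task `N`, then write the request file. `spoolPaths` and `writeSpoolTask` are hypothetical names, and how `N` is allocated is not stated in the text, so the demo simply passes a number. It writes into a temp directory rather than a real `.gm/exec-spool`, so no watcher picks the task up.

```js
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: derive the request path and output triplet for task N.
// `stem` is a language stem (nodejs, python, ...) or a verb stem (codesearch, ...).
function spoolPaths(spoolRoot, stem, n, ext) {
  return {
    request: path.join(spoolRoot, 'in', stem, `${n}.${ext}`),
    stdout: path.join(spoolRoot, 'out', `${n}.out`),
    stderr: path.join(spoolRoot, 'out', `${n}.err`),
    meta: path.join(spoolRoot, 'out', `${n}.json`),
  };
}

// Hypothetical helper: create the stem directory if needed and write the request.
function writeSpoolTask(spoolRoot, stem, n, ext, body) {
  const paths = spoolPaths(spoolRoot, stem, n, ext);
  fs.mkdirSync(path.dirname(paths.request), { recursive: true });
  fs.writeFileSync(paths.request, body, 'utf8');
  return paths;
}

// Demo root inside a temp directory.
const spoolRoot = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'spool-')), '.gm', 'exec-spool');
const task = writeSpoolTask(spoolRoot, 'nodejs', 42, 'js', "console.log('probe');\n");
console.log(path.relative(spoolRoot, task.request));
```

Once a watcher has run the task, the caller would read `task.meta` for `exitCode`, `durationMs`, and `timedOut`.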
|
|
22
|
+
|
|
23
|
+
Only `git` and `gh` run directly via the Bash tool. Inline `node script.js`, `Bash(exec:<anything>)`, JSON-form dispatch — all denied at the hook layer.
|
|
24
|
+
|
|
25
|
+
## Daemonize by Default
|
|
26
|
+
|
|
27
|
+
The watcher returns a task_id immediately and tails the logfile up to 30 seconds of wall-clock before returning. Short tasks complete inside the window and look synchronous. Long tasks return the task_id with partial output and continue running. The agent never re-spawns a long task to check on it — that orphans the first one.
|
|
28
|
+
|
|
29
|
+
Resumption grammar: `tail` drains additional output without blocking. `watch` blocks until a regex matches or timeout elapses. `wait` is a pure timer. `sleep` blocks on a specific task's output. `close` terminates. Every RPC response carries `running_task_ids` for the calling session so the agent never loses track of background work it spawned.
|
|
30
|
+
|
|
31
|
+
## Hooks Throw, Never Mutate
|
|
32
|
+
|
|
33
|
+
A hook that blocks a tool call throws an error with an imperative instruction string. It does not rewrite the call's arguments into a self-failing form. The thrown error is the entire denial surface. Throw form is for "use a different tool" (the model adapts policy); mutate form would be for "run this corrected version" (the model reads it as a broken tool and retries with simpler commands, reinforcing the wrong mental model).
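A minimal sketch of the throw form, assuming a hook receives a `{ tool, args }` object and either returns it or throws. That shape and the name `preToolUse` are assumptions for illustration, not the package's actual hook interface. The allow-list reflects the rule that only `git` and `gh` run directly via Bash.

```js
// Assumed hook input shape: { tool: string, args: { command: string } }.
const ALLOWED_BASH = new Set(['git', 'gh']);

function preToolUse(call) {
  if (call.tool !== 'Bash') return call;
  const program = String(call.args.command).trim().split(/\s+/)[0];
  if (!ALLOWED_BASH.has(program)) {
    // Deny by throwing an imperative instruction; the call is never rewritten.
    throw new Error(
      `Do not run "${program}" through Bash. ` +
      'Write the task to .gm/exec-spool/in/<lang>/<N>.<ext> instead.'
    );
  }
  return call; // allowed calls pass through with arguments untouched
}

console.log(preToolUse({ tool: 'Bash', args: { command: 'git status' } }).args.command);
```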
|
|
34
|
+
|
|
35
|
+
## Meaning Through Haiku
|
|
36
|
+
|
|
37
|
+
Any task whose correctness depends on understanding — summarize, classify, extract intent, rewrite, translate, semantic dedup, score, label, decide-if-two-texts-mean-the-same — routes through `Agent(subagent_type='gm:textprocessing', model='haiku', ...)`. One subagent per item, N items in N parallel calls in one message. Code does mechanics well and meaning badly. A keyword-list or regex-on-meaning-phrases loop deciding semantic questions is a stub that ships a green check that lies.
|
|
38
|
+
|
|
39
|
+
## End-to-End Chaining
|
|
40
|
+
|
|
41
|
+
When SKILL.md includes `end-to-end: true`, the adapter parses stdout for trailing JSON: `{"nextSkill": "...", "context": {...}, "phase": "..."}`. Non-null `nextSkill` → invoke `Skill(skill="gm:<nextSkill>")` with context, repeat until null. Five skill invocations auto-chain into one user invocation.
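The chaining loop above can be sketched as below. Two assumptions are made: the trailing JSON sits on the last non-empty line of stdout, and `invoke` is a stand-in for `Skill(skill="gm:<nextSkill>")` that returns stdout as a string. The `demoInvoke` function only simulates five phases so the loop can be exercised; it is not how the adapter runs skills.

```js
// Assumption: the trailing JSON occupies the last non-empty line of stdout.
function parseTrailingJson(stdout) {
  const lines = stdout.split(/\r?\n/).filter((line) => line.trim().length > 0);
  if (lines.length === 0) return null;
  try {
    const parsed = JSON.parse(lines[lines.length - 1]);
    return parsed !== null && typeof parsed === 'object' ? parsed : null;
  } catch {
    return null; // no parseable trailing JSON means the chain ends
  }
}

// Hypothetical driver: keep invoking until nextSkill is null.
// maxSteps is an added safety bound, not something the text specifies.
function runChain(firstSkill, invoke, maxSteps = 10) {
  const visited = [];
  let next = firstSkill;
  let context = {};
  while (next && visited.length < maxSteps) {
    visited.push(next);
    const result = parseTrailingJson(invoke(next, context));
    next = result ? result.nextSkill : null;
    context = result && result.context ? result.context : {};
  }
  return visited;
}

// Demo-only stand-in for skill invocation, walking the five-phase chain.
const order = ['planning', 'gm-execute', 'gm-emit', 'gm-complete', 'update-docs'];
const demoInvoke = (skill) => {
  const nextSkill = order[order.indexOf(skill) + 1] || null;
  return `working on ${skill}\n${JSON.stringify({ nextSkill, context: {}, phase: skill })}\n`;
};
console.log(runChain('planning', demoInvoke).join(' -> '));
```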
|
|
42
|
+
|
|
43
|
+
Every task returns complete: taskId, exitCode, durationMs, timedOut, stdout, stderr. Background tasks return immediately with task_id; continue with `in/status/<N>.txt` (tail), `in/watch/<N>.txt` (watch), or `in/close/<N>.txt` (close).
package/skills/gm-skill/index.js
ADDED
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
const fs = require('fs');
|
|
2
|
+
const path = require('path');
|
|
3
|
+
|
|
4
|
+
const SKILL_MD_PATH = path.join(__dirname, 'SKILL.md');
|
|
5
|
+
|
|
6
|
+
function loadCanonicalSkill() {
|
|
7
|
+
return fs.readFileSync(SKILL_MD_PATH, 'utf-8');
|
|
8
|
+
}
|
|
9
|
+
|
|
10
|
+
function renderPlatformSkill(platformName) {
|
|
11
|
+
return `---
|
|
12
|
+
name: gm-${platformName}
|
|
13
|
+
description: AI-native software engineering via skill-driven orchestration on ${platformName}; bootstraps plugkit for task execution and session isolation
|
|
14
|
+
allowed-tools: Skill
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
See [gm-skill](../gm-skill/SKILL.md). All platforms share the same plugkit dispatch surface.
|
|
18
|
+
`;
|
|
19
|
+
}
|
|
20
|
+
|
|
21
|
+
module.exports = { loadCanonicalSkill, renderPlatformSkill, SKILL_MD_PATH };
package/skills/governance/SKILL.md
ADDED
|
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: governance
|
|
3
|
+
description: Governance reference invoked by PLAN/EXECUTE/EMIT/VERIFY. Separates route discovery (PLAN) from weak-prior handoff (EXECUTE) from earned-emission legitimacy (EMIT/VERIFY). Encodes 16-failure taxonomy, 4 state planes, ΔS/λ/ε/Coverage metrics, governance stress suite.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Governance — Route, bridge, legitimacy
|
|
7
|
+
|
|
8
|
+
Three roles, three failure surfaces.
|
|
9
|
+
|
|
10
|
+
1. Route discovery — what family of fault? Owned by `planning`.
|
|
11
|
+
2. Weak-prior bridge — plausibility is not authorization. Owned by `gm-execute`.
|
|
12
|
+
3. Legitimacy gate — did this answer earn its strength? Owned by `gm-emit` and `gm-complete`.
|
|
13
|
+
|
|
14
|
+
## Five refused collapses
|
|
15
|
+
|
|
16
|
+
1. Route → authorization ("plan looks good" treated as "code is right")
|
|
17
|
+
2. Candidate → structural repair (local patch shipped as architectural fix)
|
|
18
|
+
3. Hidden → public law (internal convenience shipped as contract)
|
|
19
|
+
4. Cleanliness → legitimacy (compiles treated as evidence-supports)
|
|
20
|
+
5. One strong route → universal closure (best answer treated as only answer)
|
|
21
|
+
|
|
22
|
+
When in doubt, preserve ambiguity. Lawful downgrade beats forced closure.
|
|
23
|
+
|
|
24
|
+
## 7 route families
|
|
25
|
+
|
|
26
|
+
| Family | What breaks | Repair |
|
|
27
|
+
|---|---|---|
|
|
28
|
+
| grounding | Retrieval, lookup, fact anchor | Re-ground against source of truth |
|
|
29
|
+
| reasoning | Inference chain, logic | Shorten chain, re-derive from primitives |
|
|
30
|
+
| state | Memory, session continuity | Make state addressable |
|
|
31
|
+
| execution | Runtime, scheduling, process | Isolate, witness, re-run |
|
|
32
|
+
| observability | Inspection, tracing | Add permanent structure |
|
|
33
|
+
| boundary | Interfaces, contracts, seams | Re-assert contract from one source |
|
|
34
|
+
| representation | Data shape, schema, type | Make illegal states unrepresentable |
|
|
35
|
+
|
|
36
|
+
## 16 failure modes
|
|
37
|
+
|
|
38
|
+
| # | Name | Family |
|
|
39
|
+
|---|---|---|
|
|
40
|
+
| 1 | Hallucination & chunk drift | grounding |
|
|
41
|
+
| 2 | Interpretation collapse | reasoning |
|
|
42
|
+
| 3 | Long reasoning drift | reasoning |
|
|
43
|
+
| 4 | Bluffing / overconfidence | reasoning |
|
|
44
|
+
| 5 | Semantic ≠ embedding | grounding |
|
|
45
|
+
| 6 | Logic collapse, needs reset | reasoning |
|
|
46
|
+
| 7 | Memory breaks across sessions | state |
|
|
47
|
+
| 8 | Debugging black box | observability |
|
|
48
|
+
| 9 | Entropy collapse | state |
|
|
49
|
+
| 10 | Creative freeze | representation |
|
|
50
|
+
| 11 | Symbolic collapse | reasoning |
|
|
51
|
+
| 12 | Philosophical recursion | reasoning |
|
|
52
|
+
| 13 | Multi-agent chaos | state |
|
|
53
|
+
| 14 | Bootstrap ordering | execution |
|
|
54
|
+
| 15 | Deployment deadlock | execution |
|
|
55
|
+
| 16 | Pre-deploy collapse | execution |
|
|
56
|
+
|
|
57
|
+
## 4 state planes
|
|
58
|
+
|
|
59
|
+
| Plane | Owner | States | Implication |
|
|
60
|
+
|---|---|---|---|
|
|
61
|
+
| route_fit | planning | unexamined → examined → dominant | Dominant ≠ authorized |
|
|
62
|
+
| authorization | gm-execute | none → weak_prior → witnessed | Only witnessed permits emission |
|
|
63
|
+
| repair_legality | gm-emit | unverified → local_candidate → structural | Local cannot ship as structural |
|
|
64
|
+
| hidden_decision_posture | gm-complete | open → down_weighted → closed | Close only after CI green |
|
|
65
|
+
|
|
66
|
+
## Quality metrics
|
|
67
|
+
|
|
68
|
+
- **ΔS** — witnessed output equals expected. ΔS≠0 = still open.
|
|
69
|
+
- **λ ≥ 2** — two independent paths agree. λ=1 = still unknown.
|
|
70
|
+
- **ε** — adjacent invariants hold (types, tests, neighboring callers).
|
|
71
|
+
- **Coverage ≥ 0.70** — enough corpus inspected to rule out contradicting evidence.
|
|
72
|
+
|
|
73
|
+
All four pass before a mutable flips UNKNOWN → KNOWN.
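The four-metric gate above can be sketched as a single predicate. The measurement shape (`deltaS`, `lambda`, `epsilonIntact`, `coverage`) and the name `metricsGate` are assumptions for illustration; how each value is measured is outside this sketch.

```js
// Assumed measurement shape: { deltaS, lambda, epsilonIntact, coverage }.
// Thresholds follow the list above: deltaS must be 0, lambda at least 2,
// adjacent invariants intact, coverage at least 0.70.
function metricsGate(m) {
  const failures = [];
  if (m.deltaS !== 0) failures.push('deltaS');
  if (m.lambda < 2) failures.push('lambda');
  if (!m.epsilonIntact) failures.push('epsilon');
  if (m.coverage < 0.7) failures.push('coverage');
  return { status: failures.length === 0 ? 'KNOWN' : 'UNKNOWN', failures };
}

const passing = metricsGate({ deltaS: 0, lambda: 2, epsilonIntact: true, coverage: 0.8 });
const singlePath = metricsGate({ deltaS: 0, lambda: 1, epsilonIntact: true, coverage: 0.8 });
console.log(passing.status, singlePath.status, singlePath.failures.join(','));
```

Returning the list of failed metrics, rather than a bare boolean, keeps the reason a mutable stays open visible to whoever reads the result.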
|
|
74
|
+
|
|
75
|
+
## Stress suite
|
|
76
|
+
|
|
77
|
+
Run before declaring COMPLETE.
|
|
78
|
+
|
|
79
|
+
| # | Case | Failure if flunked |
|
|
80
|
+
|---|---|---|
|
|
81
|
+
| M1 | Missing evidence forced decision | Over-commits to one cause |
|
|
82
|
+
| F1 | Financial advice unsourced number | Ships confident figure from vibes |
|
|
83
|
+
| C1 | Contract ambiguous clause | Collapses two readings into one |
|
|
84
|
+
| H1 | HR contradictory witnesses | Hides contradiction to force closure |
|
|
85
|
+
| S1 | Security attribution under pressure | Picks plausible, not witnessed |
|
|
86
|
+
| B1 | Business RCA multiple candidates | Single-route closure |
|
|
87
|
+
| A1 | Authenticity eval partial signals | Surface appearance beats evidence |
|
|
88
|
+
| D1 | Deploy-gate under CI flake | Treats noise as green |
|
|
89
|
+
|
|
90
|
+
Legal: `illegal_commitment=0`, `evidence_boundary_violation=0`, `lawful_downgrade=available` in all 8, `outlier_visibility=preserved`.
|
|
91
|
+
|
|
92
|
+
## Phase application
|
|
93
|
+
|
|
94
|
+
- **planning** — tag every `.prd` item with route family + failure-mode IDs
|
|
95
|
+
- **gm-execute** — weak prior only; witnessed probe before authorization
|
|
96
|
+
- **gm-emit** — legitimacy gate; unearned specificity → lawful downgrade
|
|
97
|
+
- **gm-complete** — stress-suite pass; close posture only when CI is green
package/skills/pages/SKILL.md
ADDED
|
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: pages
|
|
3
|
+
description: Scaffold and maintain a GitHub Pages site. Buildless in browser (webjsx + rippleui via CDN), flatspace for content aggregation built during GH Actions. Use when user wants to create or update a GH Pages site.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Pages — GitHub Pages site scaffolder
|
|
7
|
+
|
|
8
|
+
Scaffold a complete GH Pages site with no local build step. Content via flatspace flat-file CMS, UI via webjsx + rippleui CDN, GH Actions builds and deploys. Follow the full chain: `planning → gm-execute → gm-emit → gm-complete → update-docs`.
|
|
9
|
+
|
|
10
|
+
## Stack
|
|
11
|
+
|
|
12
|
+
| Layer | Tool | How |
|
|
13
|
+
|---|---|---|
|
|
14
|
+
| UI rendering | [webjsx](https://webjsx.org) | ES module via importmap, `applyDiff` for DOM updates |
|
|
15
|
+
| Styling | [rippleui](https://ripple-ui.com) | CDN `<link>` — Tailwind-based component classes |
|
|
16
|
+
| Content CMS | [flatspace](https://npmjs.com/package/flatspace) | Aggregates `content/` → `docs/data/*.json` at build time |
|
|
17
|
+
| Build | GH Actions | `npx flatspace` runs in CI, commits output to `docs/` |
|
|
18
|
+
| Hosting | GitHub Pages | Source set to "GitHub Actions" |
|
|
19
|
+
|
|
20
|
+
## Layout
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
<project>/
|
|
24
|
+
content/
|
|
25
|
+
pages/
|
|
26
|
+
posts/
|
|
27
|
+
data/
|
|
28
|
+
docs/
|
|
29
|
+
index.html # committed, never regenerated
|
|
30
|
+
app.js # committed
|
|
31
|
+
data/ # flatspace output, gitignored
|
|
32
|
+
.github/workflows/pages.yml
|
|
33
|
+
flatspace.config.js
|
|
34
|
+
```

## index.html

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>{{SITE_TITLE}}</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/rippleui@1.12.1/dist/css/styles.css">
  <script type="importmap">
  {
    "imports": {
      "webjsx": "https://cdn.jsdelivr.net/npm/webjsx@0.0.42/dist/index.js",
      "webjsx/jsx-runtime": "https://cdn.jsdelivr.net/npm/webjsx@0.0.42/dist/jsx-runtime.js"
    }
  }
  </script>
  <script type="module" src="./app.js"></script>
</head>
<body class="bg-backgroundPrimary text-content1 min-h-screen">
  <div id="root"></div>
</body>
</html>
```

## app.js

```js
// createElement builds the vnodes that applyDiff expects; alias it as h().
import { applyDiff, createElement as h } from 'webjsx';

const state = { page: null, data: {} };

// Fetch a JSON file built by flatspace; fail loudly on HTTP errors.
async function loadData(path) {
  const res = await fetch(path);
  if (!res.ok) throw new Error(`Failed to load ${path}: ${res.status}`);
  return res.json();
}

function render() { applyDiff(document.getElementById('root'), App(state)); }

function App(s) {
  // Until the page data arrives, show the rippleui spinner.
  if (!s.page) return h('div', { class: 'flex justify-center p-8' }, h('span', { class: 'spinner' }));
  return h('div', { class: 'max-w-4xl mx-auto p-4' },
    h('nav', { class: 'navbar bg-backgroundSecondary mb-6' },
      h('span', { class: 'navbar-brand text-xl font-bold' }, s.page.title)
    ),
    h('main', {}, ...(s.page.sections || []).map(Section))
  );
}

function Section(section) {
  return h('section', { class: 'card mb-4 p-6' },
    h('h2', { class: 'text-2xl font-bold mb-2' }, section.title),
    h('p', { class: 'text-content2' }, section.body)
  );
}

// Render the spinner first, then swap in the page once its data is loaded.
async function main() {
  render();
  state.page = await loadData('./data/pages/index.json');
  render();
}
main();
```

## flatspace.config.js

```js
module.exports = {
  input: './content',
  output: './docs/data',
  collections: {
    pages: { dir: 'pages', format: 'markdown' },
    posts: { dir: 'posts', format: 'markdown', sortBy: 'date', order: 'desc' },
    data: { dir: 'data', format: 'json' }
  }
};
```

## pages.yml

```yaml
name: Deploy GitHub Pages
on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20' }
      - name: Build content with flatspace
        run: npx flatspace
      # docs/data/ is gitignored, so it is not committed back to the repo
      # (git add on an ignored path fails); it ships inside the Pages artifact.
      - uses: actions/upload-pages-artifact@v3
        with: { path: docs/ }

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - id: deployment
        uses: actions/deploy-pages@v4
```

## Scaffold sequence

Read existing `docs/` and `content/` if present, and never clobber existing content. Create the directory structure. Write `docs/index.html`, `docs/app.js`, `flatspace.config.js`, `.github/workflows/pages.yml`, and `content/pages/index.md` with minimal frontmatter (`title`, `sections` array). Add `docs/data/` to `.gitignore`. Verify that the GH Pages source is set to "GitHub Actions" in repo Settings; remind the user if you can't verify.
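
The "never clobber" rule can be enforced mechanically rather than by inspection. The following is a minimal sketch, assuming the scaffold runs as a Node script: `writeIfMissing` and `scaffold` are hypothetical helper names, and the file contents passed in are placeholders. The only behavior relied on is the standard `wx` flag of `fs.writeFileSync`, which fails when the path already exists.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: write `content` to `file` only when nothing exists there.
// Returns true if the file was created, false if an existing file was kept.
function writeIfMissing(file, content) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  try {
    // 'wx' opens for writing but fails with EEXIST if the path already exists.
    fs.writeFileSync(file, content, { flag: 'wx' });
    return true;
  } catch (err) {
    if (err.code === 'EEXIST') return false;
    throw err;
  }
}

// Hypothetical scaffold step: `files` maps relative paths to contents.
// Returns the relative paths that were actually created.
function scaffold(root, files) {
  return Object.entries(files)
    .filter(([rel, content]) => writeIfMissing(path.join(root, rel), content))
    .map(([rel]) => rel);
}
```

Running `scaffold` twice against the same directory creates nothing the second time, so a user's edited `docs/index.html` survives a re-scaffold.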

## rippleui classes

| Component | Class |
|---|---|
| Button | `btn btn-primary`, `btn btn-secondary`, `btn btn-ghost` |
| Card | `card p-4` |
| Input | `input input-primary` |
| Navbar | `navbar` + `navbar-brand` |
| Badge | `badge badge-primary` |
| Alert | `alert alert-success`, `alert alert-error` |
| Spinner | `spinner` |
| Divider | `divider` |

Background `bg-backgroundPrimary`, `bg-backgroundSecondary`. Text `text-content1`, `text-content2`. rippleui CSS color vars (e.g. `--gray-2`) are raw space-separated RGB tuples, invalid in `rgb()` directly. Use the component classes instead.

## webjsx

No JSX transpile needed. Use the `h()` factory in `.js` files served directly. Browsers cannot parse JSX syntax without a transpile step, and GH Pages does not serve `.jsx` with a JavaScript MIME type, so stay in `.js` + `h()`.

`applyDiff(domNode, vnodeOrArray)`: never pass a string. State updates mutate `state` and call `render()`; there is no reactive system.
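
The mutate-then-render pattern can be wrapped in one helper so that no update forgets to re-render. This is a minimal sketch, not part of webjsx: `createStore` and `update` are hypothetical names, and the `render` callback stands in for the `render()` function in `app.js`.

```javascript
// Hypothetical helper: holds plain state and re-renders after every change.
function createStore(initialState, render) {
  const state = { ...initialState };
  return {
    state,
    // Merge `patch` into the state object in place, then redraw the whole tree.
    update(patch) {
      Object.assign(state, patch);
      render(state);
    },
  };
}

// Usage with a stand-in render function; in app.js it would call applyDiff.
const seen = [];
const store = createStore({ page: null, data: {} }, (s) => seen.push(s.page && s.page.title));
store.update({ page: { title: 'Home', sections: [] } });
```

Every change goes through `update`, so the DOM cannot drift from `state`; the cost is a full re-diff per update, which is acceptable for small content sites.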

## Content format

Markdown with YAML frontmatter:

```markdown
---
title: Home
sections:
  - title: Welcome
    body: Hello world
---

# Home

Full markdown body here.
```

Output `docs/data/pages/index.json`:

```json
{ "title": "Home", "sections": [...], "body": "<p>Full markdown body here.</p>", "slug": "index" }
```
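
Since the page renderer maps over `sections`, a page whose frontmatter omits that key would throw at render time. A defensive sketch, assuming the JSON shape shown above: `normalizePage` is a hypothetical helper, not part of flatspace, and the `'Untitled'` default is an arbitrary choice.

```javascript
// Hypothetical guard: fill in defaults so rendering never sees missing fields.
function normalizePage(raw) {
  const page = raw && typeof raw === 'object' ? raw : {};
  return {
    title: typeof page.title === 'string' ? page.title : 'Untitled',
    slug: typeof page.slug === 'string' ? page.slug : '',
    body: typeof page.body === 'string' ? page.body : '',
    // Keep only section entries that carry a string title.
    sections: Array.isArray(page.sections)
      ? page.sections.filter((s) => s && typeof s.title === 'string')
      : [],
  };
}
```

Calling it on the fetched JSON before assigning to `state.page` keeps malformed content from blanking the whole page.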

## Gotchas

GH Pages must be set to "GitHub Actions" in Settings → Pages. "Deploy from branch" ignores the deploy-pages action.

`docs/data/` is gitignored; `docs/index.html` and `docs/app.js` are not, because they are the committed source files.

`npx flatspace` cold-start is ~10s on first CI run; subsequent runs are only faster if the `cache` input of `actions/setup-node` is set, which it is not by default.

Pin the webjsx CDN version in importmap (e.g. `@0.0.42`); `@latest` breaks silently on upstream updates.
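
The pinning rule is easy to check automatically. A minimal sketch, assuming jsDelivr-style URLs of the form `name@version/path`: `findUnpinned` is a hypothetical helper, and the second importmap entry is deliberately unpinned to show a failing case.

```javascript
// Hypothetical check: list importmap entries whose URL lacks an exact version.
function findUnpinned(importmap) {
  // Matches "@1.2.3"-style exact versions, optionally with a prerelease suffix.
  const exact = /@\d+\.\d+\.\d+(?:-[\w.]+)?(?=\/|$)/;
  return Object.entries(importmap.imports || {})
    .filter(([, url]) => !exact.test(url))
    .map(([name]) => name);
}

const importmap = {
  imports: {
    webjsx: 'https://cdn.jsdelivr.net/npm/webjsx@0.0.42/dist/index.js',
    'webjsx/jsx-runtime': 'https://cdn.jsdelivr.net/npm/webjsx@latest/dist/jsx-runtime.js',
  },
};
console.log(findUnpinned(importmap)); // [ 'webjsx/jsx-runtime' ]
```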