gm-codex 2.0.726 → 2.0.788
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.codex-plugin/plugin.json +1 -1
- package/README.md +21 -2
- package/agents/memorize.md +13 -0
- package/agents/research-worker.md +34 -0
- package/bin/plugkit.sha256 +6 -6
- package/bin/plugkit.version +1 -1
- package/cli.js +81 -0
- package/gm.json +2 -2
- package/hooks/hooks.spec.json +73 -0
- package/install.js +82 -0
- package/package.json +6 -4
- package/plugin.json +1 -1
- package/skills/gm/SKILL.md +45 -22
- package/skills/gm-complete/SKILL.md +7 -1
- package/skills/gm-emit/SKILL.md +4 -0
- package/skills/gm-execute/SKILL.md +16 -0
- package/skills/planning/SKILL.md +64 -12
- package/skills/research/SKILL.md +45 -0
- package/skills/update-docs/SKILL.md +4 -0
- package/uninstall.js +43 -0
- package/.github/workflows/publish-npm.yml +0 -44
package/README.md
CHANGED
@@ -8,7 +8,18 @@
 bun x gm-codex@latest
 ```
 
-
+Installs the plugin to `~/.codex/plugins/gm-codex` AND wires `~/.codex/config.toml` so Codex auto-loads hooks, MCP servers, and skills on next start. No manual TOML editing required. Idempotent — re-run to upgrade.
+
+### What gets registered in `config.toml`
+
+Inside a managed block fenced by `# >>> gm-codex managed` / `# <<< gm-codex managed` sentinels:
+
+- `[features].codex_hooks = true`
+- `[[hooks.<Event>]]` blocks for `PreToolUse`, `PostToolUse`, `SessionStart`, `UserPromptSubmit`, `Stop` — pointing at the bundled `plugkit` and node hook scripts under the install dir
+- `[mcp_servers.<id>]` for any MCP servers declared in bundled `.mcp.json`
+- `[[skills.config]]` entries for every bundled skill folder
+
+Content outside the managed block is preserved verbatim. The installer never edits user-authored sections.
 
 ### Repository Installation (Project-Specific)
 
@@ -18,7 +29,15 @@ npm install gm-codex
 npx gm-codex-install
 ```
 
-
+Copies plugin assets into `<project>/.codex/plugins/gm-codex` and writes the same managed block into `<project>/.codex/config.toml` (project-trusted layer).
+
+### Uninstall
+
+```bash
+npx gm-codex-uninstall
+```
+
+Removes the plugin directory and strips the managed block from `config.toml`, leaving any user-authored content untouched.
 
 ### Manual Installation
 
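The uninstaller itself (`uninstall.js`, +43 lines in the file list) is not shown in this diff. As a rough sketch of what "strips the managed block" can mean, assuming the same sentinel strings that `cli.js` and `install.js` write, the following removes the block and tidies the blank lines around it. `removeManagedBlock` is a hypothetical name, not the package's actual code.

```javascript
// Hypothetical sketch of the uninstall-side strip (uninstall.js is not shown).
// Sentinels are the strings cli.js and install.js write.
const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
const SENTINEL_END = '# <<< gm-codex managed';

// Removes the managed block, sentinels included, and collapses the blank-line
// padding around it. With either sentinel missing the text is returned unchanged,
// so a hand-edited or half-written file is never truncated.
function removeManagedBlock(text) {
  const start = text.indexOf(SENTINEL_START);
  if (start === -1) return text;
  const end = text.indexOf(SENTINEL_END, start);
  if (end === -1) return text;
  const before = text.slice(0, start).replace(/\n+$/, '');
  const after = text.slice(end + SENTINEL_END.length).replace(/^\n+/, '');
  if (!before) return after; // block was at the top of the file
  if (!after) return before + '\n'; // block was at the end
  return before + '\n\n' + after;
}
```

A file that contained only the managed block comes back empty; user lines on either side survive with one blank line between them.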
package/agents/memorize.md
CHANGED
@@ -11,6 +11,19 @@ Writes facts to two places only: **AGENTS.md** (non-obvious technical caveats) a
 Resolve at start of every run:
 
 - **Project root** = `process.cwd()` when invoked. `AGENTS.md` is `<project root>/AGENTS.md`.
+- **Reach check** = run `gh api repos/<owner>/<repo> --jq .permissions.push` on `<project root>`'s `git remote get-url origin`. Cache the answer for the run. If the result is anything other than literal `true` (false, no remote, non-github URL, gh CLI missing, gh not authed, repo private and inaccessible), the project is **out-of-reach**.
+
+## STEP 0: SCOPE GUARD — DO NOT POLLUTE OUT-OF-REACH PROJECTS
+
+If the reach check returns out-of-reach:
+
+- **Do** ingest classified facts into rs-learn (Step 2) — rs-learn is per-user, not per-project, so private notes about a project the user is reading-but-not-owning are safe there.
+- **Do not** read or edit `<project root>/AGENTS.md` (Step 3). Skip the file entirely.
+- **Do not** run the AGENTS.md ↔ rs-learn migration audit (Step 4). The audit edits AGENTS.md.
+
+Reason: agents running in a cwd that points at a third-party repo (e.g. running Claude inside a checkout of `nousresearch/hermes-agent` while building a downstream port) must not write project-specific notes into the upstream project's AGENTS.md. That AGENTS.md belongs to the upstream maintainers. Personal porting notes belong in the user's downstream repo's AGENTS.md, or — when the work spans multiple repos and there's no clean home — in rs-learn only.
+
+When the reach check returns **in-reach**, proceed normally with all four steps below.
 
 ## STEP 1: CLASSIFY
 
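The reach check described in the memorize hunk can be sketched in Node as two subprocess calls, assuming `git` and the GitHub CLI are on `PATH`. `parseGithubRemote` and `isInReach` are hypothetical helpers. The URL patterns cover only the common HTTPS and SSH remote forms, so anything else (including `https://user@github.com/...`) falls through to out-of-reach, which matches the agent's conservative default.

```javascript
// Sketch of the memorize agent's reach check. Assumes `git` and `gh` are on PATH;
// parseGithubRemote and isInReach are hypothetical helper names.
const { spawnSync } = require('node:child_process');

// owner/repo from https://github.com/o/r(.git), git@github.com:o/r(.git) or
// ssh://git@github.com/o/r(.git); null for any other host or shape.
function parseGithubRemote(url) {
  const m = String(url).trim().match(
    /^(?:https:\/\/github\.com\/|git@github\.com:|ssh:\/\/git@github\.com\/)([^\/\s]+)\/([^\/\s]+?)(?:\.git)?\/?$/
  );
  return m ? { owner: m[1], repo: m[2] } : null;
}

// Only the literal output "true" is in-reach. No remote, a non-GitHub URL, a
// missing or unauthenticated gh, an inaccessible repo, or "false" all return false.
function isInReach(projectRoot) {
  const remote = spawnSync('git', ['remote', 'get-url', 'origin'], { cwd: projectRoot, encoding: 'utf8' });
  if (remote.error || remote.status !== 0) return false;
  const gh = parseGithubRemote(remote.stdout);
  if (!gh) return false;
  const res = spawnSync('gh', ['api', `repos/${gh.owner}/${gh.repo}`, '--jq', '.permissions.push'], { encoding: 'utf8' });
  if (res.error || res.status !== 0) return false;
  return res.stdout.trim() === 'true';
}
```

The per-run cache the agent spec asks for is left to the caller.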
package/agents/research-worker.md
@@ -0,0 +1,34 @@
+---
+name: research-worker
+description: Focused single-thread web investigator. Spawned in parallel by the research skill. Owns one question, returns one path.
+agent: true
+allowed-tools: WebFetch, WebSearch, Bash
+---
+
+# Research Worker
+
+One question. One context. One file on disk. One-line return.
+
+## Brief shape
+
+The spawning prompt names: the question, the answer shape expected, the explicit out-of-scope boundary, and the destination path `.gm/research/<slug>/<worker-id>.md`. If any of those is missing or ambiguous, treat that as the first finding — record what was unclear and stop, rather than guessing scope.
+
+## Investigation
+
+Open with a `WebSearch` broad enough to map sources, narrow enough to exclude obviously off-topic results. Pick the two or three highest-quality hits — primary docs, dated authored posts, RFCs, source repos — and `WebFetch` each. Aggregator pages, content farms, and undated listicles are last resort, flagged as such when used.
+
+Stop fetching when the question is answered to the shape requested. Extra fetches past sufficiency burn tokens the orchestrator needs for synthesis.
+
+## Output
+
+Write the findings file with: the question restated, the answer in the requested shape, every non-trivial claim followed by `[source: <url>]` and a quoted span, a `Sources` section listing every URL touched with a one-line quality note, and an `Unresolved` section naming anything the brief asked for that the search did not yield.
+
+A claim without an inline source URL is a defect; remove it before writing the file.
+
+## Return
+
+Return only: the absolute path to the findings file, and a single sentence summarising the headline answer. Never return the full findings inline — the orchestrator reads from disk.
+
+## Boundary
+
+Do not chase tangents that surface mid-investigation, however interesting. Note them in `Unresolved` so the orchestrator can decide whether to fan out a new worker. Stretching past the brief is the worker-side equivalent of the orchestrator skipping fan-out.
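The research-worker Output rules read like a lint, so here is a minimal sketch of one. Beyond what the agent spec says, it assumes the two trailing sections use `## Sources` and `## Unresolved` headings and that each claim is a `- ` bullet above them. It checks only for an inline `[source: <url>]` tag, not for the quoted span. `lintFindings` is a hypothetical name.

```javascript
// Sketch of a findings-file lint. Heading names and the "claims are bullets"
// convention are assumptions, not part of the research-worker spec.
const SOURCE_TAG = /\[source:\s*https?:\/\/[^\]\s]+\]/;

function lintFindings(markdown) {
  const lines = markdown.split(/\r?\n/);
  const defects = [];
  const sourcesAt = lines.findIndex((l) => /^##\s+Sources\s*$/.test(l));
  const unresolvedAt = lines.findIndex((l) => /^##\s+Unresolved\s*$/.test(l));
  if (sourcesAt === -1) defects.push('missing "Sources" section');
  if (unresolvedAt === -1) defects.push('missing "Unresolved" section');
  // Claims are only checked above the first of the two trailing sections.
  const trailing = [sourcesAt, unresolvedAt].filter((i) => i !== -1);
  const claimsEnd = trailing.length ? Math.min(...trailing) : lines.length;
  for (let i = 0; i < claimsEnd; i++) {
    if (/^\s*[-*]\s+\S/.test(lines[i]) && !SOURCE_TAG.test(lines[i])) {
      defects.push(`line ${i + 1}: claim without inline [source: <url>]`);
    }
  }
  return defects;
}
```

An empty result means the file passes these structural checks; it says nothing about source quality.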
package/bin/plugkit.sha256
CHANGED
@@ -1,6 +1,6 @@
-
-
-
-
-
-
+882cf8f24ae1755f1e503dca8b6010f3f53bf9325c0e58e059b0d9834066a151 plugkit-win32-x64.exe
+d440cd5d3e24c08325c8b7cd45b68abe3f2fcd7195f7ee25e1eaa01c610e2971 plugkit-win32-arm64.exe
+b30b0f6ad8d516283402e8f06a2ae1dae70fb4381ed9e1993751fb1baa7dff6e plugkit-darwin-x64
+2f451f3783af406ba060f06d2ae34653f68b3731f30a0a6e269cf5f8c37372d5 plugkit-darwin-arm64
+6ea361e856a42d69e21cc2037bbe949f3e96fdf5b00412f526b1a09d08bf104f plugkit-linux-x64
+2caa17f4c162ad8fbd2f92ca9432344449313ffc8f75983210ada7ed481cbc03 plugkit-linux-arm64
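The checksum file pairs a SHA-256 digest with a per-platform binary name. A sketch of checking a binary against it follows, assuming the `plugkit-<platform>-<arch>` naming above maps directly onto Node's `process.platform` and `process.arch` values (`win32`, `darwin`, `linux`; `x64`, `arm64`). The helper names are hypothetical, and this diff does not show whether the package verifies hashes itself.

```javascript
// Sketch: verify a plugkit binary against bin/plugkit.sha256. Helper names are
// hypothetical; the naming scheme is inferred from the entries listed above.
const crypto = require('node:crypto');
const fs = require('node:fs');

// Parses "<64 hex chars> <file name>" lines into Map(fileName -> digest).
// Accepts one or more spaces and the optional "*" binary marker of sha256sum.
function parseChecksums(text) {
  const sums = new Map();
  for (const line of text.split(/\r?\n/)) {
    const m = line.trim().match(/^([0-9a-f]{64})\s+\*?(\S+)$/i);
    if (m) sums.set(m[2], m[1].toLowerCase());
  }
  return sums;
}

// plugkit-<platform>-<arch>, with ".exe" on Windows (e.g. plugkit-win32-arm64.exe).
function binaryName(platform = process.platform, arch = process.arch) {
  return `plugkit-${platform}-${arch}${platform === 'win32' ? '.exe' : ''}`;
}

// Hashes the whole file in memory; fine for a single CLI binary.
function verifyFile(filePath, expectedHex) {
  const actual = crypto.createHash('sha256').update(fs.readFileSync(filePath)).digest('hex');
  return actual === expectedHex.toLowerCase();
}
```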
package/bin/plugkit.version
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.256
package/cli.js
CHANGED
@@ -28,6 +28,87 @@ try {
 
 filesToCopy.forEach(([src, dst]) => copyRecursive(path.join(srcDir, src), path.join(destDir, dst)));
 
+const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
+const SENTINEL_END = '# <<< gm-codex managed';
+const tomlString = (s) => '"' + String(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
+const expand = (cmd, root) => String(cmd).split('${CODEX_PLUGIN_ROOT}').join(root);
+function buildHooksToml(hooksJson, root) {
+const hooks = (hooksJson && hooksJson.hooks) || {};
+const lines = [];
+for (const event of Object.keys(hooks)) {
+for (const group of (hooks[event] || [])) {
+const matcher = group.matcher || '*';
+const entries = group.hooks || [];
+if (!entries.length) continue;
+lines.push('', '[[hooks.' + event + ']]', 'matcher = ' + tomlString(matcher));
+for (const e of entries) {
+lines.push('', '[[hooks.' + event + '.hooks]]');
+lines.push('type = ' + tomlString(e.type || 'command'));
+lines.push('command = ' + tomlString(expand(e.command, root)));
+const t = typeof e.timeout === 'number' ? Math.max(1, Math.round(e.timeout / 1000)) : 60;
+lines.push('timeout = ' + t);
+}
+}
+}
+return lines.join('\n');
+}
+function buildMcpToml(mcpJson) {
+const servers = (mcpJson && mcpJson.mcpServers) || {};
+const lines = [];
+for (const id of Object.keys(servers)) {
+const s = servers[id];
+lines.push('', '[mcp_servers.' + id + ']');
+if (s.command) lines.push('command = ' + tomlString(s.command));
+if (Array.isArray(s.args)) lines.push('args = [' + s.args.map(tomlString).join(', ') + ']');
+if (s.cwd) lines.push('cwd = ' + tomlString(s.cwd));
+if (s.url) lines.push('url = ' + tomlString(s.url));
+if (s.env && typeof s.env === 'object') {
+lines.push('', '[mcp_servers.' + id + '.env]');
+for (const k of Object.keys(s.env)) lines.push(k + ' = ' + tomlString(s.env[k]));
+}
+}
+return lines.join('\n');
+}
+function buildSkillsToml(skillsDir) {
+if (!fs.existsSync(skillsDir)) return '';
+const lines = [];
+for (const ent of fs.readdirSync(skillsDir, { withFileTypes: true })) {
+if (!ent.isDirectory()) continue;
+const sp = path.join(skillsDir, ent.name);
+if (!fs.existsSync(path.join(sp, 'SKILL.md'))) continue;
+lines.push('', '[[skills.config]]', 'path = ' + tomlString(sp), 'enabled = true');
+}
+return lines.join('\n');
+}
+function stripManagedBlock(content) {
+if (!content) return '';
+const i = content.indexOf(SENTINEL_START);
+if (i === -1) return content;
+const j = content.indexOf(SENTINEL_END, i);
+if (j === -1) return content;
+return (content.slice(0, i).replace(/\n*$/, '\n') + content.slice(j + SENTINEL_END.length).replace(/^\n+/, '')).replace(/\n{3,}/g, '\n\n');
+}
+function buildBlock(root) {
+const hooksJson = fs.existsSync(path.join(root, 'hooks', 'hooks.json')) ? JSON.parse(fs.readFileSync(path.join(root, 'hooks', 'hooks.json'), 'utf8')) : { hooks: {} };
+const mcpJson = fs.existsSync(path.join(root, '.mcp.json')) ? JSON.parse(fs.readFileSync(path.join(root, '.mcp.json'), 'utf8')) : { mcpServers: {} };
+const parts = [SENTINEL_START, '', '[features]', 'codex_hooks = true', buildHooksToml(hooksJson, root), buildMcpToml(mcpJson), buildSkillsToml(path.join(root, 'skills')), '', SENTINEL_END];
+return parts.filter(p => p !== '').join('\n').replace(/\n{3,}/g, '\n\n') + '\n';
+}
+function mergeCodexToml(configPath, root) {
+const existing = fs.existsSync(configPath) ? fs.readFileSync(configPath, 'utf8') : '';
+const stripped = stripManagedBlock(existing).replace(/\s+$/, '');
+const block = buildBlock(root);
+const next = stripped ? stripped + '\n\n' + block : block;
+fs.mkdirSync(path.dirname(configPath), { recursive: true });
+fs.writeFileSync(configPath, next);
+}
+try {
+mergeCodexToml(path.join(homeDir, '.codex', 'config.toml'), destDir);
+console.log('✓ wired ~/.codex/config.toml (managed block)');
+} catch (e) {
+console.warn('Warning: failed to wire codex config.toml:', e.message);
+}
+
 const destPath = process.platform === 'win32' ? destDir.replace(/\\/g, '/') : destDir;
 console.log(`✓ gm-codex ${isUpgrade ? 'upgraded' : 'installed'} to ${destPath}`);
 console.log('Restart Codex to activate.');
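To exercise the README's "idempotent, re-run to upgrade" claim, the pure-string part of the merge can be lifted out of `cli.js`. `stripManagedBlock` below is copied from the hunk; `mergeText` is a hypothetical name for the body of `mergeCodexToml` with the `fs` calls removed, and the sample block is a minimal stand-in for what `buildBlock` returns.

```javascript
// String-only restatement of stripManagedBlock + mergeCodexToml from cli.js,
// with fs calls removed so the merge can be run on plain strings.
const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
const SENTINEL_END = '# <<< gm-codex managed';

function stripManagedBlock(content) {
  if (!content) return '';
  const i = content.indexOf(SENTINEL_START);
  if (i === -1) return content;
  const j = content.indexOf(SENTINEL_END, i);
  if (j === -1) return content;
  return (content.slice(0, i).replace(/\n*$/, '\n') +
    content.slice(j + SENTINEL_END.length).replace(/^\n+/, '')).replace(/\n{3,}/g, '\n\n');
}

// Same expression mergeCodexToml uses to build `next` before writing it.
function mergeText(existing, block) {
  const stripped = stripManagedBlock(existing).replace(/\s+$/, '');
  return stripped ? stripped + '\n\n' + block : block;
}
```

Running the merge twice yields the same text, and a newer block replaces the old one rather than being added alongside it. One side effect of the concatenation in `stripManagedBlock`: user content that sat *after* the managed block ends up ahead of it on the next run, joined to the preceding user content by a single newline.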
package/gm.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "gm",
-"version": "2.0.
+"version": "2.0.788",
 "description": "State machine agent with hooks, skills, and automated git enforcement",
 "author": "AnEntrypoint",
 "license": "MIT",
@@ -23,5 +23,5 @@
 "publishConfig": {
 "access": "public"
 },
-"plugkitVersion": "0.1.
+"plugkitVersion": "0.1.256"
 }
package/hooks/hooks.spec.json
@@ -0,0 +1,73 @@
+{
+"schemaVersion": 1,
+"description": "Hook spec for gm Codex extension",
+"envVar": "CODEX_PLUGIN_ROOT",
+"plugkitInvoker": "node",
+"events": [
+{
+"eventKey": "PreToolUse",
+"commands": [
+{
+"kind": "plugkit",
+"subcommand": "pre-tool-use",
+"timeout": 3600
+},
+{
+"kind": "js",
+"file": "pre-tool-use-hook.js",
+"timeout": 2000
+}
+]
+},
+{
+"eventKey": "PostToolUse",
+"commands": [
+{
+"kind": "js",
+"file": "post-tool-use-hook.js",
+"timeout": 3000
+}
+]
+},
+{
+"eventKey": "SessionStart",
+"commands": [
+{
+"kind": "plugkit",
+"subcommand": "session-start",
+"timeout": 180000
+}
+]
+},
+{
+"eventKey": "UserPromptSubmit",
+"commands": [
+{
+"kind": "plugkit",
+"subcommand": "prompt-submit",
+"timeout": 60000
+},
+{
+"kind": "js",
+"file": "prompt-submit-hook.js",
+"timeout": 3000
+}
+]
+},
+{
+"eventKey": "Stop",
+"commands": [
+{
+"kind": "plugkit",
+"subcommand": "stop",
+"timeout": 15000
+},
+{
+"kind": "plugkit",
+"subcommand": "stop-git",
+"timeout": 210000
+}
+]
+}
+]
+}
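`buildHooksToml` converts each hook's `timeout` with `Math.max(1, Math.round(e.timeout / 1000))`, so it reads these spec values as milliseconds. The step that turns `hooks.spec.json` into `hooks/hooks.json` is not part of this diff; assuming the numbers pass through unchanged, the sketch below applies the same expression to a spec-shaped object. `tomlTimeouts` is a hypothetical name.

```javascript
// Applies the buildHooksToml timeout expression (ms -> whole seconds, min 1,
// default 60 when absent) to every command in a hooks.spec.json-shaped object.
// Assumes spec "timeout" values reach hooks/hooks.json unchanged.
function tomlTimeouts(spec) {
  const out = [];
  for (const ev of spec.events || []) {
    for (const cmd of ev.commands || []) {
      const t = typeof cmd.timeout === 'number' ? Math.max(1, Math.round(cmd.timeout / 1000)) : 60;
      out.push({ event: ev.eventKey, name: cmd.subcommand || cmd.file, seconds: t });
    }
  }
  return out;
}
```

Applied to the full spec above, this gives 4, 2, 3, 180, 60, 3, 15 and 210 seconds in file order; the `pre-tool-use` plugkit entry (3600) is the only value that does not divide evenly and gets rounded.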
package/install.js
CHANGED
@@ -50,6 +50,88 @@ function install() {
 try { fs.copyFileSync(path.join(sourceDir, 'README.md'), path.join(codexDir, 'README.md')); } catch {}
 try { fs.copyFileSync(path.join(sourceDir, 'CLAUDE.md'), path.join(codexDir, 'CLAUDE.md')); } catch {}
 try { fs.copyFileSync(path.join(sourceDir, 'AGENTS.md'), path.join(codexDir, 'AGENTS.md')); } catch {}
+
+const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
+const SENTINEL_END = '# <<< gm-codex managed';
+const tomlString = (s) => '"' + String(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
+const expand = (cmd, root) => String(cmd).split('${CODEX_PLUGIN_ROOT}').join(root);
+function buildHooksToml(hooksJson, root) {
+const hooks = (hooksJson && hooksJson.hooks) || {};
+const lines = [];
+for (const event of Object.keys(hooks)) {
+for (const group of (hooks[event] || [])) {
+const matcher = group.matcher || '*';
+const entries = group.hooks || [];
+if (!entries.length) continue;
+lines.push('', '[[hooks.' + event + ']]', 'matcher = ' + tomlString(matcher));
+for (const e of entries) {
+lines.push('', '[[hooks.' + event + '.hooks]]');
+lines.push('type = ' + tomlString(e.type || 'command'));
+lines.push('command = ' + tomlString(expand(e.command, root)));
+const t = typeof e.timeout === 'number' ? Math.max(1, Math.round(e.timeout / 1000)) : 60;
+lines.push('timeout = ' + t);
+}
+}
+}
+return lines.join('\n');
+}
+function buildMcpToml(mcpJson) {
+const servers = (mcpJson && mcpJson.mcpServers) || {};
+const lines = [];
+for (const id of Object.keys(servers)) {
+const s = servers[id];
+lines.push('', '[mcp_servers.' + id + ']');
+if (s.command) lines.push('command = ' + tomlString(s.command));
+if (Array.isArray(s.args)) lines.push('args = [' + s.args.map(tomlString).join(', ') + ']');
+if (s.cwd) lines.push('cwd = ' + tomlString(s.cwd));
+if (s.url) lines.push('url = ' + tomlString(s.url));
+if (s.env && typeof s.env === 'object') {
+lines.push('', '[mcp_servers.' + id + '.env]');
+for (const k of Object.keys(s.env)) lines.push(k + ' = ' + tomlString(s.env[k]));
+}
+}
+return lines.join('\n');
+}
+function buildSkillsToml(skillsDir) {
+if (!fs.existsSync(skillsDir)) return '';
+const lines = [];
+for (const ent of fs.readdirSync(skillsDir, { withFileTypes: true })) {
+if (!ent.isDirectory()) continue;
+const sp = path.join(skillsDir, ent.name);
+if (!fs.existsSync(path.join(sp, 'SKILL.md'))) continue;
+lines.push('', '[[skills.config]]', 'path = ' + tomlString(sp), 'enabled = true');
+}
+return lines.join('\n');
+}
+function stripManagedBlock(content) {
+if (!content) return '';
+const i = content.indexOf(SENTINEL_START);
+if (i === -1) return content;
+const j = content.indexOf(SENTINEL_END, i);
+if (j === -1) return content;
+return (content.slice(0, i).replace(/\n*$/, '\n') + content.slice(j + SENTINEL_END.length).replace(/^\n+/, '')).replace(/\n{3,}/g, '\n\n');
+}
+function buildBlock(root) {
+const hooksJson = fs.existsSync(path.join(root, 'hooks', 'hooks.json')) ? JSON.parse(fs.readFileSync(path.join(root, 'hooks', 'hooks.json'), 'utf8')) : { hooks: {} };
+const mcpJson = fs.existsSync(path.join(root, '.mcp.json')) ? JSON.parse(fs.readFileSync(path.join(root, '.mcp.json'), 'utf8')) : { mcpServers: {} };
+const parts = [SENTINEL_START, '', '[features]', 'codex_hooks = true', buildHooksToml(hooksJson, root), buildMcpToml(mcpJson), buildSkillsToml(path.join(root, 'skills')), '', SENTINEL_END];
+return parts.filter(p => p !== '').join('\n').replace(/\n{3,}/g, '\n\n') + '\n';
+}
+function mergeCodexToml(configPath, root) {
+const existing = fs.existsSync(configPath) ? fs.readFileSync(configPath, 'utf8') : '';
+const stripped = stripManagedBlock(existing).replace(/\s+$/, '');
+const block = buildBlock(root);
+const next = stripped ? stripped + '\n\n' + block : block;
+fs.mkdirSync(path.dirname(configPath), { recursive: true });
+fs.writeFileSync(configPath, next);
+}
+try {
+mergeCodexToml(path.join(projectRoot, '.codex', 'config.toml'), codexDir);
+console.log('✓ wired ~/.codex/config.toml (managed block)');
+} catch (e) {
+console.warn('Warning: failed to wire codex config.toml:', e.message);
+}
+
 }
 
 install();
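Both installers embed absolute paths (the install dir inside hook commands, each skill folder in `[[skills.config]]`) in TOML basic strings, where a backslash starts an escape sequence. On Windows that makes the backslash doubling in `tomlString` essential. The snippet reuses `tomlString` verbatim from the hunk; `skillEntry` and the sample path are made up for illustration.

```javascript
// tomlString copied from the hunk above; skillEntry and the sample path are
// illustrative only. path.win32 builds a Windows path on any host OS.
const path = require('node:path');
const tomlString = (s) => '"' + String(s).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';

// One [[skills.config]] table in the same shape buildSkillsToml emits.
function skillEntry(skillPath) {
  return ['[[skills.config]]', 'path = ' + tomlString(skillPath), 'enabled = true'].join('\n');
}

const winPath = path.win32.join('C:\\Users\\me\\.codex\\plugins\\gm-codex', 'skills', 'gm');
console.log(skillEntry(winPath));
// [[skills.config]]
// path = "C:\\Users\\me\\.codex\\plugins\\gm-codex\\skills\\gm"
// enabled = true
```

Because TOML basic strings share the `\\` and `\"` escapes with JSON, `JSON.parse` round-trips the output for strings like this one. `tomlString` does not escape newlines or other control characters, which TOML requires inside basic strings; paths rarely contain them, but an `env` value copied from `.mcp.json` could.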
package/package.json
CHANGED
@@ -1,13 +1,14 @@
 {
 "name": "gm-codex",
-"version": "2.0.
+"version": "2.0.788",
 "description": "State machine agent with hooks, skills, and automated git enforcement",
 "author": "AnEntrypoint",
 "license": "MIT",
 "main": "plugin.json",
 "bin": {
 "gm-codex": "./cli.js",
-"gm-codex-install": "./install.js"
+"gm-codex-install": "./install.js",
+"gm-codex-uninstall": "./uninstall.js"
 },
 "repository": {
 "type": "git",
@@ -41,7 +42,8 @@
 "plugin.json",
 "gm.json",
 "cli.js",
-"install.js"
+"install.js",
+"uninstall.js"
 ],
 "keywords": [
 "codex",
@@ -51,4 +53,4 @@
 "automation",
 "gm"
 ]
-}
+}
package/plugin.json
CHANGED
package/skills/gm/SKILL.md
CHANGED
@@ -41,36 +41,59 @@ Multiple facts → parallel Agent calls in ONE message. End-of-turn: scan for un
 
 ## AUTONOMY — HARD RULE
 
-
+A written PRD is the user's authorization. Once it exists, EXECUTE owns the work to COMPLETE. Resolve every doubt that arises during execution by witnessed probe, by recall, or by re-reading the PRD — never by asking the user. Any question whose answer the agent could obtain itself is a question the agent owes itself, not the user.
 
-
-- "Should I continue with X?" / "Want me to do Y next?" / "Want me to also Z?"
-- "This is a lot — should I do A first and confirm?" / "Two options: A or B, which?"
-- Pre-confirmation before multi-file edits when scope is already clear
-- Stopping after partial completion to summarize and await direction
+**FINISH ALL REMAINING STEPS — HARD RULE**: when a request enumerates or implies multiple work items ("all", "any", "everything", "the rest", "remaining"), or after a covering family is constructed under MAXIMAL COVER, the agent finishes every witnessable item in the same turn. Stopping after one item to ask "which next?" is forbidden — the answer is *all of them*, in one chain, until `.gm/prd.yml` is empty and git is clean and pushed. Mid-chain "should I…", "want me to…", "which would you like…" prompts are forced closure; replace them with the next skill invocation.
 
-
-- Destructive-irreversible decision with no prior context AND no PRD coverage
-- User intent genuinely ambiguous AND cannot be inferred from PRD/memory/code
-- Channel: prefer `exec:pause` (renames .gm/prd.yml → .gm/prd.paused.yml; question lives in header). In-conversation asking is last-resort only.
+Asking is permitted only as a last resort, when the next action is destructive-irreversible AND the PRD does not cover it, OR when user intent is genuinely irrecoverable from PRD, memory, code, AND the public web. The channel is structured: `exec:pause` (renames `.gm/prd.yml` → `.gm/prd.paused.yml`, question in header). In-conversation asking is last-resort beneath last-resort.
 
-
+**Web-search before pause / before user-ask — HARD RULE.** Before `exec:pause` or any in-conversation question whose answer plausibly exists on the public web (missing artifact, prebuilt binary, library status, build recipe, version compatibility, upstream issue, "does X exist for Y"), fire `WebSearch` and at least one targeted `WebFetch` first. Pause/ask only after the web pass returns empty, or returns candidates the agent has witnessed and rejected. Pausing on a question the web could have answered is forced closure dressed as humility — re-enter planning, web-search, and resume. Genuine user-only questions: private credentials, preference among already-surfaced viable options, destructive-irreversible authorization.
 
-
+The size of the task, the cost of context, and the duration of CI are never grounds to ask.
 
-
+## MAXIMAL COVER — HARD RULE
 
-
+When scope exceeds reach, construct a maximal cover of the request — not a refusal, not one slice with the rest deferred to "follow-up."
 
-
-- "Honest stop —" / "Stopping for a hard, honest call" / "Stopping here, unambiguously"
-- "I cannot do this from inside this conversation"
-- "I have to push back before writing a PRD I can't fulfill"
-- "Pretending I can would be the most expensive kind of lie"
-- "I don't have a working browser tool" / "X is unavailable in this environment" (when exec:browser, exec:codesearch, Read, etc. are in fact available)
-- Any preamble that announces inability before attempting the bounded subset
+Refusal is forced closure. So is *distributed* refusal — shipping a single bounded subset while other witnessable subsets exist. Both bypass witnessed execution.
 
-**Required move when scope exceeds reach**:
+**Required move when scope exceeds reach**: construct a *covering family* of bounded subsets — every subset of the request that is witnessable from this session — and write the family into the PRD. Execute every member.
+
+**Residuals the agent judges within the spirit of the original ask AND completable from this session are self-authorized — expand the PRD with them and execute, do not stop to ask.** The judgment is the agent's honest read of what the user probably wanted, paired with reachability from this session. Only residuals genuinely outside the original ask, or genuinely unreachable, are name-and-stop. When expanding under self-authorization, the agent declares its judgment in the response ("treating X as in-scope because Y") so the user can correct mid-chain. Silent expansion without the declaration is the failure mode this rule guards against.
+
+The discipline is enforced by what is delivered, not by which words appear. Before closing the turn, check that committed work + named out-of-spirit residuals equals the witnessable closure of the request. Anything witnessable that falls in neither set means the cover is not yet maximal — re-enter planning to expand it. The cover is *maximal*, not *complete*: completeness would require reaching scope outside the session, which is dishonest. Maximality reaches everything inside the session, which is the whole obligation.
+
+## FIX ON SIGHT — HARD RULE
+
+Every issue surfaced during work is fixed in-band, this turn, at root cause. Defer-markers, swallowed errors, suppressed output, skipped tests, and "address it next session" are all variants of the same failure: a known-bad signal carried past the moment of detection. Each is a small forced closure.
+
+Surface → diagnose → fix at root cause → re-witness → continue. If the fix uncovers a new unknown, regress to `planning`. If the fix is itself genuinely out-of-scope-irreversible, the residual goes into `.gm/prd.yml` *before* moving on — narration is not a substitute for an item.
+
+A skill chain that ships while ignoring a known-bad signal is forced closure (see MAXIMAL COVER).
+
+## BROWSER WITNESS — HARD RULE
+
+Editing code that runs in a browser requires a live `exec:browser` witness in the same turn as the edit. The witness does not defer to a later phase; later phases re-witness on top, they do not replace this one.
+
+Protocol on every client edit:
+1. Boot the real surface — server up, page reachable, HTTP 200 witnessed.
+2. `exec:browser` → navigate → poll for the global the change affects.
+3. `page.evaluate` asserting the specific invariant the change establishes. Capture the witnessed numbers in the response.
+4. Variance from expectation → fix at root cause, re-witness (FIX ON SIGHT). Never advance on unwitnessed client behavior.
+
+Pure-prose edits to static documents with no JS/canvas/DOM behavior change are exempt; tag the exemption explicitly with the reason so the skip is auditable. Silent skip on actual behavior change is forced closure.
+
+This rule fires in EXECUTE (witness on edit), EMIT (post-emit verify), and VERIFY (final gate). All three.
+
+## NOTHING FAKE — HARD RULE
+
+What ships runs against real services, real data, real binaries. Stubs, mocks, placeholder returns, fixture-only paths, "TODO: implement", `return null /* fake */`, hardcoded sample responses, and demo-mode fallbacks are forbidden in source the user will run. They produce green checks that survive into production and lie about what works.
+
+Scaffolding and shims are permitted when they call through to real behavior — an empty file laid down before its body, a thin adapter wrapping an upstream API, a build target that compiles but is wired to nothing yet *and is the only callsite of itself*. The test is whether the artifact, executed, would do the thing it claims. If it would not, it is a stub.
+
+Before writing a shim or adapter, the agent asks whether an existing library or tool already provides the same surface. Maintaining a local reimplementation of something an upstream package solves is its own failure mode — the shim drifts, ages, and accumulates the bugs the upstream already fixed. If a published package fits, the shim becomes one line of import.
+
+Stub detection is by behavior, not by keyword: code paths that always succeed, always return the same value regardless of input, or short-circuit a real call to satisfy a type signature, are stubs. Comments asserting realness do not make code real. The witnessing rule that closes a mutable also closes this one — until real input has produced real output through the new code, it is provisional, and shipping provisional code as done is forced closure.
 
 ## EXECUTION ORDER
 
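NOTHING FAKE defines stubs by behavior, including code paths that "always return the same value regardless of input". A cheap probe for that one property calls a function with several distinct inputs and reports whether every result is deep-equal. This is a heuristic sketch, not part of the skill: `looksConstant` is a hypothetical name, it handles only synchronous functions, and a legitimately constant function will also trip it, so a hit means "inspect", not "proven stub".

```javascript
// Heuristic stub probe (sketch): a function whose output never varies across
// distinct inputs is flagged for inspection. Synchronous functions only.
const { isDeepStrictEqual } = require('node:util');

function looksConstant(fn, inputs) {
  if (inputs.length < 2) throw new Error('need at least two distinct inputs');
  const outputs = inputs.map((input) => fn(input));
  // Deep equality, so fresh-but-identical objects ({ ok: true }) still count as "same".
  return outputs.every((out) => isDeepStrictEqual(out, outputs[0]));
}
```

A handler that returns a hardcoded sample response, for instance, is flagged; one that actually computes from its input is not.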
|
@@ -59,7 +59,9 @@ Required protocol:
|
|
|
59
59
|
|
|
60
60
|
Long-running probes: split into navigate-call → `exec:wait N` → probe-call to stay under the per-call budget. Do not stack multi-second `setTimeout` inside one `exec:browser` invocation.
|
|
61
61
|
|
|
62
|
-
Exempt only when: change is server-only with zero browser-facing surface, OR repository has no browser surface at all (pure CLI/library).
|
|
62
|
+
Exempt only when: change is server-only with zero browser-facing surface, OR repository has no browser surface at all (pure CLI/library). Exemption requires explicit tag in the response: `BROWSER EXEMPT: <reason — must reference diff paths showing zero browser-facing surface>`. Silent skip = forced-closure failure. Default posture is NOT exempt — burden is on the agent to prove exemption with diff evidence, not to assume it.
|
|
63
|
+
|
|
64
|
+
**Pre-flight check before declaring complete**: run `git diff --name-only origin/main..HEAD` and grep for `client/|docs/|\.html$|\.glsl$|\.frag$|\.vert$`. Any hit AND no `exec:browser` block in this session = mandatory regression to `gm-execute` to witness the change live. No exceptions for "small CSS tweak" or "obvious string change".
|
|
63
65
|
|
|
64
66
|
## INTEGRATION TEST GATE
|
|
65
67
|
|
|
@@ -89,6 +91,10 @@ Stop hook watches all GitHub Actions runs for the pushed HEAD. Do not call `gh r
 - Failure → Stop blocks with run names+IDs → investigate with `gh run view <id> --log-failed`, fix, push, hook re-watches
 - Deadline 180s (override `GM_CI_WATCH_SECS`) → slow jobs get "still in progress" approve
 
+## FIX ON SIGHT — HARD RULE
+
+Any issue surfaced during verify (test.js failure, browser-validation mismatch, CI red, git-status dirt, hygiene-sweep finding, stress-suite flunk, observability gap) is fixed in-band before declaring complete. Never paper over, never `.skip`, never ship-and-followup, never silence stderr/CI signals. Failure routes to the owning phase: broken output → `gm-emit` | wrong logic → `gm-execute` | new unknown → `planning`. Never declare COMPLETE while a known-bad signal is live.
+
 ## HYGIENE SWEEP
 
 Before declaring complete:
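The routing line in FIX ON SIGHT is a lookup rather than a branch chain. A sketch as a `Map`, with illustrative failure-kind labels (gm-codex does not define these identifiers): unknown kinds throw instead of falling through, and completion is refused while any live signal remains.

```javascript
// Failure kinds are illustrative labels, not identifiers gm-codex defines.
const OWNING_PHASE = new Map([
  ['broken-output', 'gm-emit'],
  ['wrong-logic', 'gm-execute'],
  ['new-unknown', 'planning'],
]);

function routeFailure(kind) {
  const phase = OWNING_PHASE.get(kind);
  // Fail loud: an unroutable failure is itself a bug, not a silent default.
  if (!phase) throw new Error(`unroutable failure kind: ${kind}`);
  return phase;
}

// COMPLETE is only legal when no known-bad signal is still live.
function completionCheck(signals) {
  const live = signals.filter((s) => s.live);
  return { complete: live.length === 0, next: live.map((s) => routeFailure(s.kind)) };
}
```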
package/skills/gm-emit/SKILL.md
CHANGED
@@ -67,6 +67,10 @@ console.log(await fn(realInput));
 - All facts resolved this phase memorized via background Agent(memorize)
 - CHANGELOG.md updated; TODO.md cleared/deleted
 
+## FIX ON SIGHT — HARD RULE
+
+Pre-emit run, post-emit run, or legitimacy gate surfaces ANY issue (failing assertion, stderr, type/lint error, unexpected variance, broken import, runtime throw) → fix at root cause this turn, re-run pre-emit AND post-emit, advance only when all gates pass simultaneously. Never write-and-promise-fix-later, never `try/catch`-to-hide, never `.skip`, never silence with redirection. Known variance → fix and re-verify (self-loop). Unknown variance → regress to `planning`.
+
 ## CODE EXECUTION
 
 `exec:<lang>` only. File writes via exec:nodejs + require('fs'). Never Bash(node/npm/npx/bun).
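One way to read "advance only when all gates pass simultaneously": run every gate in order, collect every failure, and advance only when the failure list is empty. A sketch with hypothetical gate names; thrown errors are recorded and returned to the caller, not hidden.

```javascript
// Gates run in order (pre-emit before post-emit) and every result is reported.
// Gate names and bodies are hypothetical stand-ins for the real checks.
async function runGates(gates) {
  const failures = [];
  for (const [name, gate] of gates) {
    try {
      const ok = await gate();
      if (ok !== true) failures.push({ gate: name, reason: `returned ${JSON.stringify(ok)}` });
    } catch (err) {
      // Recorded and surfaced, never swallowed: a throw is a failed gate.
      failures.push({ gate: name, reason: err instanceof Error ? err.message : String(err) });
    }
  }
  return { advance: failures.length === 0, failures };
}
```

After a fix, the whole list is re-run rather than only the gate that failed, so a fix that breaks an earlier gate is caught in the same pass.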
package/skills/gm-execute/SKILL.md
CHANGED
@@ -118,6 +118,22 @@ Triggers: exec output answers prior unknown | CI log reveals root cause | code r
 
 N facts → N parallel Agent calls in ONE message. End-of-turn self-check mandatory.
 
+## FIX ON SIGHT — HARD RULE
+
+Issue surfaced mid-execution (failing test, exec stderr, broken import, runtime exception, lint/type error, deprecation warning, unexpected output) is fixed THIS turn, at root cause, in-band. Never `// TODO`, never `try/catch`-to-swallow, never `2>/dev/null`, never `.skip`, never "out of scope" inside the same file. Re-witness after fix. New unknown surfaced by the fix → regress to `planning`. Genuine out-of-scope → write a `.gm/prd.yml` item before continuing.
+
+## BROWSER WITNESS — HARD RULE
+
+Editing browser-facing code (under `client/`, `docs/`, `*.html`, shaders, page-bundle imports, served JS/CSS, gh-pages assets, anything imported by a browser entry, anything visible in DOM/canvas/WebGL) → live `exec:browser` witness in THIS phase, same turn as the edit. Not deferred to EMIT, not deferred to VERIFY — those layers re-witness on top, they don't replace this one.
+
+Protocol on every client edit:
+1. Boot server / open page → HTTP 200 witnessed
+2. `exec:browser` → `page.goto(url)` → poll the affected global (`window.__app.<system>`, `window.__debug.<module>`)
+3. `page.evaluate` asserting the specific invariant the change establishes — capture numbers
+4. Variance → fix at root cause, re-witness (FIX ON SIGHT). Never advance to EMIT on unwitnessed client behavior.
+
+Forbidden: `node test.js` green as a substitute | screenshot without evaluate | "I'll check it in VERIFY" then skipping | committing a client diff without an `exec:browser` block this turn. Exempt only for server-only / no-browser repos; tag the exemption explicitly.
+
 ## CONSTRAINTS
 
 **Never**: Bash(node/npm/npx/bun) | fake data | mocks | scattered tests | fallbacks | Grep/Glob/Find/Explore | sequential independent items | respond mid-phase | edit before witnessing | duplicate code | if/else where dispatch suffices | one-liners that obscure | reinvent native/library
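Steps 2 and 3 of the browser-witness protocol, sketched against a Playwright- or Puppeteer-style `page` object (both expose `page.goto(url)` and `page.evaluate(fn, arg)`). The global path (for example `__app.physics`) and the `{ ok, ...numbers }` shape the invariant returns are assumptions for illustration, not gm-codex conventions.

```javascript
// Sketch of protocol steps 2-3. `page` is any object exposing
// page.goto(url) and page.evaluate(fn, arg), as Playwright and Puppeteer pages do.
async function witness(page, url, globalPath, invariant, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  await page.goto(url);
  const deadline = Date.now() + timeoutMs;
  // Step 2: poll until window.<globalPath> exists in the page.
  for (;;) {
    const present = await page.evaluate(
      (p) => p.split('.').reduce((obj, key) => (obj == null ? undefined : obj[key]), window) !== undefined,
      globalPath,
    );
    if (present) break;
    if (Date.now() > deadline) throw new Error(`timed out waiting for window.${globalPath}`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Step 3: assert the invariant in-page and keep the numbers it reports.
  const result = await page.evaluate(invariant, globalPath);
  if (!result || result.ok !== true) {
    throw new Error(`invariant failed: ${JSON.stringify(result)}`);
  }
  return result;
}
```

With a real browser the invariant is serialized into the page, so it must be self-contained and cannot close over Node variables; data the check needs goes through the `arg` parameter.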
package/skills/planning/SKILL.md
CHANGED
@@ -43,23 +43,55 @@ Runs until: .gm/prd.yml empty AND git clean AND all pushes confirmed AND CI gree
 
 ## AUTONOMY — HARD RULE
 
-PRD written → execute to COMPLETE without asking the user.
+PRD written → execute to COMPLETE without asking the user. Doubts that arise during execution are resolved by witnessed probe, by recall, or by re-reading the PRD — never by asking. Any question whose answer is reachable from the agent's tools belongs to the agent, not the user.
 
-Asking
+Asking is last-resort: destructive-irreversible without PRD coverage, OR user intent irrecoverable from PRD/memory/code/web. Channel: `exec:pause` (renames `prd.yml` → `prd.paused.yml`; question in header). In-conversation asking is last-resort beneath last-resort.
 
-**
+**WEB-SEARCH BEFORE PAUSE — HARD RULE.** Before `exec:pause` for any blocking question whose answer could plausibly exist on the public web — missing artifact, unknown library/API, prebuilt binary, version compatibility, build recipe, upstream status — fire `WebSearch` and at least one `WebFetch` first. Only after the web pass returns empty (or returns options the agent then witnesses and rejects) is `exec:pause` legitimate. Pausing on a question the web could have answered is forced closure dressed up as humility — fix on sight by re-entering planning, web-searching, and resuming. The only questions that genuinely require user-ask are ones the public web cannot answer: private credentials, intent/preference between viable options the agent has *already surfaced*, destructive-irreversible authorization.
 
-
+**Cannot stop while**: `.gm/prd.yml` has items | git uncommitted | git unpushed.
 
-
+## MAXIMAL COVER — HARD RULE
 
-
+When scope exceeds reach, expand the cover. Don't refuse. Don't ship one slice with the rest abandoned as "follow-up" — that's distributed refusal: the same failure dressed up as triage.
 
-Required
+**Required move when scope exceeds reach**: construct a *covering family* — every bounded subset of the request that is witnessable from this session — and write the family into the PRD as separate items, with the dependency graph explicit so independent members parallelize. Execute every member.
 
-
+**Residuals the agent judges within the spirit of the original ask AND completable from this session are self-authorized — expand the PRD with them and execute, do not stop to ask.** The judgment is the agent's honest read of what the user probably wanted, paired with reachability from this session. Only residuals genuinely outside the original ask, or genuinely unreachable from this session, are name-and-stop. When expanding under self-authorization, the agent declares its judgment in the response ("treating X as in-scope because Y") so the user can correct mid-chain. Silent expansion without the declaration is the failure mode this rule guards against.
 
-
+Enforcement is on what is delivered, not on which words appear. Before closing the turn, check that committed work + named out-of-spirit residuals = witnessable closure of the request. Gap = cover not yet maximal → re-enter PLAN to expand.
+
+## FIX ON SIGHT — HARD RULE
+
+Every issue surfaced during planning, execution, or verification is fixed in-band, the same session, at root cause. A known-bad signal carried past the moment of detection — by deferral, suppression, silencing, skipping, or "next time" narration — is a small forced closure.
+
+Surface → diagnose → fix → re-witness → continue. New unknown surfaced by the fix → regress here. Genuinely out-of-scope-irreversible → the residual goes into `.gm/prd.yml` *before* moving on; narration is not a substitute for an item.
+
+## BROWSER WITNESS — HARD RULE
+
+A `.prd` item that touches browser-facing code is not plan-complete unless its acceptance criteria include a live `exec:browser` witness with a `page.evaluate` assertion against the specific invariant the change establishes. "Manual verification", "test.js passes", and "browser test optional" are all unwitnessed and therefore unacceptable.
+
+The trigger is functional, not a path-list: any change whose effect is observable in the DOM, canvas, WebGL surface, network frames captured by the page, or any global the page exposes, requires the browser witness. Pure-prose edits to static documents with no behavior change are exempt; the exemption is tagged on the item with the reason.
+
+Propagation: EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. The plan must encode the rule so all three layers fire.
+
+## NOTHING FAKE — HARD RULE
+
+Plan items resolve when real input flows through real code into real output. Stubs, mocks, placeholder returns, fixture-only branches, "TODO: implement", and demo-mode short-circuits do not count as resolution — they are mutables wearing closed-status disguise.
+
+Acceptance criteria must witness behavior, not the existence of a function with the right name. "X is implemented" is not acceptance; "X called with real Y produces real Z" is. The agent that satisfies the criterion via a stub has built something that will lie when production calls it.
+
+Scaffolding and shims are permitted only when the shim *delegates* to real behavior — wraps an upstream API, calls a real subprocess, hits a real disk. Before adding a shim, the plan asks whether a published library or tool already provides that surface; maintaining a local reimplementation of an upstream solution is its own failure mode and the shim line should usually become an import line.
+
+The fake-detection test is behavioral: would the code, executed against the inputs it claims to accept, produce the outputs it claims to produce? If the answer requires "after we fill in the body" or "once X is wired up", the plan item is open, not done.
+
+## ORIENT — HARD RULE
+
+Open every plan with a parallel pack of `exec:recall` and `exec:codesearch` against the request's nouns. Hits land as `weak_prior`; misses confirm the unknown is fresh. The pack runs in one message — never serially. The agent that skips orient pays the same cost in fresh probes a turn later, plus the price of disagreeing with its own prior witness.
+
+## PRD — HARD RULE
+
+`./.gm/prd.yml` is the authorization. It is written before EXECUTE fires for any task that touches more than one line in one file. The cost of writing it equals the cost of skipping it; what the file buys is durable trace, resumability, and the cover-maximality check.
 
 ## PLAN PHASE — MUTABLE DISCOVERY
 
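A covering family with an explicit dependency graph can be scheduled as waves: each wave holds the items whose dependencies are all met by earlier waves, so the members of one wave can run in parallel. The `{ id, blockedBy }` item shape is assumed here for illustration, since the body of the `.PRD FORMAT` section is not part of this diff.

```javascript
// Groups PRD items into parallel waves (Kahn-style layering).
// Item shape { id, blockedBy } is an assumption for illustration.
function parallelWaves(items) {
  const ids = new Set(items.map((it) => it.id));
  const remaining = new Map(items.map((it) => [it.id, new Set(it.blockedBy || [])]));
  for (const [id, deps] of remaining) {
    for (const dep of deps) {
      if (!ids.has(dep)) throw new Error(`${id} depends on unknown item ${dep}`);
    }
  }
  const waves = [];
  while (remaining.size > 0) {
    // Everything with no outstanding dependency is ready now.
    const ready = [...remaining].filter(([, deps]) => deps.size === 0).map(([id]) => id);
    if (ready.length === 0) {
      throw new Error(`dependency cycle among: ${[...remaining.keys()].join(', ')}`);
    }
    waves.push(ready);
    for (const id of ready) remaining.delete(id);
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
  }
  return waves;
}
```

A cycle or a dangling dependency throws instead of being silently dropped, which keeps an unreachable member from vanishing out of the cover.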
@@ -82,7 +114,7 @@ Client: `window.__debug` live registry; modules register on mount.
 
 `console.log` ≠ observability. Discovery of gap → add .prd item immediately, never deferred.
 
-**No parallel
+**No parallel observability surfaces.** `window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry fights the discipline — extend the registry instead.
 
 ## .PRD FORMAT
 
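A minimal sketch of a single `window.__debug` registry where modules register on mount. Storing a snapshot function rather than a value keeps reads live. The method names (`register`, `get`, `list`) are hypothetical; the diff only establishes that exactly one registry exists.

```javascript
// Minimal sketch of a single in-page registry. Method names are hypothetical.
function installDebugRegistry(root = globalThis) {
  if (root.__debug) return root.__debug; // one registry per page, never a second surface
  const modules = new Map();
  const registry = {
    // Called on mount; returns an unregister function for unmount.
    register(name, snapshot) {
      if (typeof snapshot !== 'function') {
        throw new TypeError(`${name}: snapshot must be a function`);
      }
      modules.set(name, snapshot);
      return () => modules.delete(name);
    },
    // Reads are live: the snapshot function runs at query time.
    get(name) {
      const snapshot = modules.get(name);
      return snapshot ? snapshot() : undefined;
    },
    list() {
      return [...modules.keys()];
    },
  };
  root.__debug = registry;
  return registry;
}
```

In the browser, `installDebugRegistry(window)` runs at boot; a module calls `window.__debug.register('physics', () => ({ bodies: world.bodies.length }))` on mount (`world` is a placeholder) and invokes the returned function on unmount.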
@@ -124,6 +156,12 @@ Not parallelizable → invoke `gm-execute` directly.
 
 `exec:<lang>` only via Bash. File I/O via exec:nodejs + fs. Git directly in Bash. Never Bash(node/npm/npx/bun).
 
+File paths in `exec:nodejs` are platform-literal — node's `fs` does not auto-resolve POSIX shortcuts like `/tmp`. Use `os.tmpdir()` and `path.join` for portable temp paths; reserve `/tmp/...` for `exec:bash` heredocs where the shell rewrites the prefix.
+
+Every `exec:<lang>` and `exec:bash` call should pass `--timeout-ms <ms>` (mandatory at the rs-exec RPC layer; the CLI emits a transitional warning when omitted). On timeout, partial streamed output is preserved and the runner emits `[exec timed out after Nms; partial output above]` — treat the marker as authoritative truncation and re-issue with a higher budget rather than retrying blindly.
+
+**Utility-verb syntax**: `exec:wait`, `exec:sleep`, `exec:status`, `exec:close`, `exec:pause`, `exec:type`, `exec:runner`, `exec:kill-port`, `exec:recall`, `exec:memorize`, `exec:forget` all take their argument on the **next line** (heredoc body), never inline. `exec:status\n<task_id>` polls one task; bare `exec:status` lists all. `exec:sleep\n<task_id>` blocks until the task produces output or completes. `exec:close\n<task_id>` terminates and removes a task. Inline forms (`exec:status <id>`) are denied by the hook.
+
 `exec:codesearch` only — Glob/Grep/Find/Explore hook-blocked. Start 2 words → change/add one per pass → minimum 4 attempts before concluding absent.
 
 Pack runs: Promise.allSettled for parallel. Each idea own try/catch. Under 12s per call.
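The pack-run rule (parallel via `Promise.allSettled`, a try/catch per idea, a per-call budget) sketched in Node. `packRun` and `withTimeout` are illustrative names, and the 12000 ms default mirrors the "under 12s per call" line. `Promise.race` only stops waiting on a slow idea; it does not cancel the underlying work.

```javascript
// Rejects if `promise` has not settled within `ms`; clears its timer either way.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Runs every idea in parallel; each has its own try/catch and budget.
async function packRun(ideas, budgetMs = 12000) {
  const settled = await Promise.allSettled(
    Object.entries(ideas).map(async ([name, idea]) => {
      try {
        // Promise.resolve().then(idea) turns a synchronous throw into a rejection.
        const value = await withTimeout(Promise.resolve().then(idea), budgetMs, name);
        return { name, value };
      } catch (err) {
        return { name, error: err.message };
      }
    }),
  );
  return settled.map((s) => (s.status === 'fulfilled' ? s.value : { error: String(s.reason) }));
}
```

One failing or slow idea is reported in its own slot and never sinks the rest of the pack.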
@@ -136,9 +174,23 @@ No comments. No scattered test files. 200-line limit per file. Fail loud. No dup
 
 ## SINGLE INTEGRATION TEST POLICY
 
-One `test.js` at project root. 200-line
+One `test.js` at project root. **200-line hard cap.** No fixtures, mocks, or scattered test files under any naming convention. Plain assertions, real data, real system. `gm-complete` runs it. Failure = regression to EXECUTE.
+
+Any second test runner — under any name, in any directory — is a smuggled parallel surface and fights the discipline. If a behavior needs to be exercised in-page, register it in `window.__debug` and assert via `test.js`.
+
+**Purpose: maximum surface coverage in 200 lines.** test.js is a budget, not a target. Every line should witness a load-bearing behavior; redundant assertions are dead weight. Subsystems get *one* group each — combined groups (e.g. `profiles+observability+auth+context+cron+batch`) are the norm, not the exception. As thoth grew from 17 → 21 → 14 named groups while the surface tripled, the win came from collapsing per-subsystem groups into multi-subsystem ones.
+
+**Use overlap to exclude.** When subsystem A's test exercises B as a side effect, B does not need its own group — drop the redundant assertion. Examples that have proven out:
+- The agent-machine tool-loop test exercises bash dispatch → no separate bash test needed beyond a smoke-call inside the tools+toolsets group.
+- The dashboard test asserts the API surface AND that the registry has ≥N tools → covers tool registration coverage.
+- The plugins+memory group exercises observability metrics + achievements → no need for a separate plugins-extra group.
+- The gateway test exercising one platform plus a platform-stub-shape loop covers all 18 adapters in one group.
+
+**Adding a new subsystem:** first try to fold its assertion into the closest existing group. Only create a new group when the subsystem's failure mode is genuinely orthogonal (e.g. compressor's iterative-update behavior is not exercised by any other group). Test surface should grow linearly with subsystem count, not multiplicatively, and the line budget is the forcing function.
+
+**Pattern that works:** name combined groups by joining their subsystems with `+`, e.g. `home+config+skin`, `mcp+swe+distributions+account+credpool`, `env+pi+cli+tui+setup+website`. Future readers see the coverage at a glance. A group title with 4–6 components is healthy; a group with 1 component should be questioned.
 
-**
+**Hygiene at edit time:** every change to test.js prefers compaction over expansion. If `wc -l test.js > 200`, the discipline is *not* "split" — it's "merge groups + drop redundancy" until it fits. If the budget is genuinely insufficient for the load-bearing surface, the right move is to question whether the assertion is load-bearing, not to lift the cap.
 
 ## RESPONSE POLICY
 
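A hypothetical skeleton for the combined-group style in a root `test.js`: groups run sequentially, each named by joining its subsystems with `+`, and a failing group is reported rather than aborting the remaining groups. `lineCount` counts newline characters the way `wc -l` does, so the 200-line cap can be checked from inside the file. None of these helper names come from gm-codex.

```javascript
// Hypothetical skeleton for a single root test.js with combined groups.
// Groups run one after another; a failing group is reported, not skipped.
async function runGroups(groups) {
  const results = [];
  for (const [name, fn] of groups) {
    try {
      await fn();
      results.push({ name, ok: true });
    } catch (err) {
      results.push({ name, ok: false, error: err.message });
    }
  }
  return results;
}

// Group names join subsystems with '+'; single-component names get questioned.
function singleComponentGroups(names) {
  return names.filter((name) => name.split('+').length < 2);
}

// Counts newline characters, matching what `wc -l` reports.
function lineCount(source) {
  return (source.match(/\n/g) || []).length;
}
```

A real `test.js` would end by printing the results, setting `process.exitCode = 1` on any failure, and comparing `lineCount(fs.readFileSync(__filename, 'utf8'))` against 200.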
package/skills/research/SKILL.md
ADDED
@@ -0,0 +1,45 @@
+---
+name: research
+description: Web research via parallel subagent fan-out. Use when a question needs the live web, spans multiple sources, requires comparison across vendors/papers/repos, or would saturate a single context window. Skip for one-page lookups answerable by a single WebFetch.
+allowed-tools: Task, WebFetch, WebSearch
+---
+
+# Research
+
+Lead orchestrates. Workers fetch. Findings converge. The lead never reads pages — workers do.
+
+## Shape of the work
+
+Breadth first, depth on demand. Open with a wide sweep that maps the terrain; commit deep dives only where the sweep surfaces something load-bearing. A narrow opening misses the alternative the user actually needed.
+
+Effort matches stakes. A single fact warrants one short fetch. A vendor comparison warrants a handful of workers, each owning one vendor. A landscape survey warrants ten or more, each owning one axis. Spending a fan-out on a fact wastes tokens; spending a fact-fetch on a landscape under-delivers.
+
+Workers run in parallel. Independent questions launch in one message, never serialized. Serial fan-out is the default failure mode — guard against it explicitly.
+
+## Worker contract
+
+Each worker receives: the precise question it owns, the shape of the answer (bullets, table row, prose paragraph), the boundary of what it must not pursue, and the destination path under `.gm/research/<slug>/<worker-id>.md` for its findings. Vague briefs produce vague returns; the lead's job is to make the brief unambiguous before spawning.
+
+Workers write structured findings to disk and return only a path plus a one-line summary. The lead reads paths it cares about; the rest stay on disk. Returning full prose through the agent boundary burns context that the synthesis pass needs.
+
+## Citations
+
+A claim without a source URL is a hallucination waiting to be quoted. Workers attach the URL and the quoted span beside every non-trivial assertion. The lead refuses to lift a claim into the final answer if its citation field is empty.
+
+## Source quality
+
+Content farms and SEO-optimised listicles outrank primary sources on most queries. Prefer vendor docs, RFCs, primary repos, dated blog posts from named authors, and academic preprints over aggregator pages. When two sources disagree, the older primary usually beats the newer aggregator.
+
+## Convergence
+
+Synthesis happens once, after all workers return. Mid-flight summarisation truncates findings the next worker would have built on. The lead holds the question, the workers' paths, and the synthesis prompt; everything else lives on disk.
+
+If a worker's return reveals a new axis the original plan missed, expand the fan-out — do not stretch an existing worker past its brief.
+
+## When not to fan out
+
+One question, one page, one fetch. A single `WebFetch` answers it. The fan-out machinery has overhead; spending it on a lookup is the same mistake as skipping it on a survey.
+
+## Handoff
+
+Final answer cites every load-bearing claim, names the workers' output paths for audit, and surfaces disagreements between sources rather than averaging them away.
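The worker contract maps onto two small operations: a worker writes its findings to `.gm/research/<slug>/<worker-id>.md` and hands back only `{ path, summary }`, and the lead lifts only claims that carry both a URL and a quoted span. The sketch models workers as plain async functions for illustration; in the skill they are `Task` subagents, and the bullet layout inside the findings file is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Path convention from the worker contract.
function findingsPath(root, slug, workerId) {
  return path.join(root, '.gm', 'research', slug, `${workerId}.md`);
}

// Worker side: persist findings, return only a path and a one-line summary.
// The bullet layout inside the file is an assumed format.
function writeFindings(root, slug, workerId, claims) {
  const file = findingsPath(root, slug, workerId);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  const lines = claims.map((c) => `- ${c.text}\n  source: ${c.url || '(none)'}\n  quote: ${c.quote || '(none)'}`);
  fs.writeFileSync(file, lines.join('\n') + '\n');
  return { path: file, summary: `${claims.length} claims, ${liftableClaims(claims).length} cited` };
}

// Lead side: only claims with both a source URL and a quoted span are liftable.
function liftableClaims(claims) {
  return claims.filter((c) => Boolean(c.url) && Boolean(c.quote));
}

// Independent briefs launch together; no worker waits on a sibling.
function fanOut(briefs, runWorker) {
  return Promise.all(briefs.map((brief) => runWorker(brief)));
}
```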
package/skills/update-docs/SKILL.md
CHANGED
@@ -52,6 +52,10 @@ git commit -m "docs: update documentation to reflect session changes"
 git push -u origin HEAD
 ```
 
+## FIX ON SIGHT — HARD RULE
+
+Doc-write surfaces a stale claim, broken link, missing file referenced, or contradiction with disk → fix at root cause this turn (update doc to match disk, or fix code if disk is wrong). Never leave a known-false claim in docs. Push surfaces a CI failure → fix and re-push, do not declare complete.
+
 ## Fidelity Rules
 
 Every claim verifiable against disk: phase names match frontmatter, platform names match `platforms/`, file paths exist, constraint counts accurate. Unverifiable section → remove, don't speculate.
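A rough sketch of the "file paths exist" fidelity check: collect backticked tokens that look like relative paths and report the ones missing on disk. The path-detection heuristic is an assumption; it will miss unquoted paths and can occasionally flag version strings.

```javascript
const fs = require('fs');
const path = require('path');

// Pulls `backticked` tokens that look like relative file paths out of a doc.
// Heuristic (assumed): contains '/' or ends in an extension; no globs/placeholders.
function referencedPaths(markdown) {
  const found = new Set();
  for (const m of markdown.matchAll(/`([^`\s]+)`/g)) {
    const token = m[1];
    if (/[*<>|:]/.test(token)) continue; // skip globs, placeholders, exec:verbs, URLs
    if (token.includes('/') || /\.[a-z0-9]+$/i.test(token)) {
      found.add(token.replace(/\/$/, ''));
    }
  }
  return [...found];
}

// Returns the referenced paths that do not exist relative to repoRoot.
function missingPaths(markdown, repoRoot) {
  return referencedPaths(markdown).filter((p) => !fs.existsSync(path.join(repoRoot, p)));
}
```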
package/uninstall.js
ADDED
@@ -0,0 +1,43 @@
+#!/usr/bin/env node
+const fs = require('fs');
+const path = require('path');
+const os = require('os');
+
+const homeDir = process.env.HOME || process.env.USERPROFILE || os.homedir();
+const pluginDir = path.join(homeDir, '.codex', 'plugins', 'gm-codex');
+const configPath = path.join(homeDir, '.codex', 'config.toml');
+const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
+const SENTINEL_END = '# <<< gm-codex managed';
+
+function stripManagedBlock(content) {
+  if (!content) return '';
+  const i = content.indexOf(SENTINEL_START);
+  if (i === -1) return content;
+  const j = content.indexOf(SENTINEL_END, i);
+  if (j === -1) return content;
+  return (content.slice(0, i).replace(/\n*$/, '\n') + content.slice(j + SENTINEL_END.length).replace(/^\n+/, '')).replace(/\n{3,}/g, '\n\n');
+}
+
+try {
+  if (fs.existsSync(pluginDir)) {
+    fs.rmSync(pluginDir, { recursive: true, force: true });
+    console.log('✓ removed ' + pluginDir);
+  }
+  if (fs.existsSync(configPath)) {
+    const before = fs.readFileSync(configPath, 'utf8');
+    const after = stripManagedBlock(before);
+    if (after !== before) {
+      if (after.trim() === '') {
+        fs.unlinkSync(configPath);
+        console.log('✓ removed empty ' + configPath);
+      } else {
+        fs.writeFileSync(configPath, after);
+        console.log('✓ stripped managed block from ' + configPath);
+      }
+    }
+  }
+  console.log('gm-codex uninstalled.');
+} catch (e) {
+  console.error('Uninstall failed:', e.message);
+  process.exit(1);
+}
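The sentinel pair also makes an idempotent install checkable. Below is a hypothetical upsert counterpart (not the contents of `install.js`, which this diff does not show) next to `stripManagedBlock` copied from `uninstall.js` above, so the round trip can be exercised: upserting twice yields the same file, and stripping restores the original.

```javascript
const SENTINEL_START = '# >>> gm-codex managed (do not edit between sentinels)';
const SENTINEL_END = '# <<< gm-codex managed';

// Hypothetical: replaces an existing managed block in place, else appends one.
function upsertManagedBlock(content, body) {
  const block = `${SENTINEL_START}\n${body.trim()}\n${SENTINEL_END}\n`;
  const i = content.indexOf(SENTINEL_START);
  if (i === -1) {
    const prefix = content.trim() === '' ? '' : content.replace(/\n*$/, '\n\n');
    return prefix + block;
  }
  const j = content.indexOf(SENTINEL_END, i);
  // A start sentinel without an end is a damaged file; refuse rather than guess.
  if (j === -1) throw new Error('unterminated gm-codex managed block');
  const rest = content.slice(j + SENTINEL_END.length).replace(/^\n+/, '');
  return content.slice(0, i) + block + (rest ? '\n' + rest : '');
}

// Copied from uninstall.js above so the round trip can be checked.
function stripManagedBlock(content) {
  if (!content) return '';
  const i = content.indexOf(SENTINEL_START);
  if (i === -1) return content;
  const j = content.indexOf(SENTINEL_END, i);
  if (j === -1) return content;
  return (content.slice(0, i).replace(/\n*$/, '\n') + content.slice(j + SENTINEL_END.length).replace(/^\n+/, '')).replace(/\n{3,}/g, '\n\n');
}
```

Because everything between the sentinels is regenerated wholesale, user settings outside the block survive both upgrade and uninstall.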
package/.github/workflows/publish-npm.yml
DELETED
@@ -1,44 +0,0 @@
-name: Publish to npm
-
-on:
-  push:
-    branches:
-      - main
-  workflow_dispatch:
-
-jobs:
-  publish:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-
-      - uses: actions/setup-node@v4
-        with:
-          node-version: '22'
-          registry-url: 'https://registry.npmjs.org'
-
-      - name: Publish to npm
-        run: |
-          PACKAGE=$(jq -r '.name' package.json)
-          VERSION=$(jq -r '.version' package.json)
-          echo "Package: $PACKAGE@$VERSION"
-
-          # Skip if this exact version is already on npm
-          PUBLISHED=$(npm view "$PACKAGE@$VERSION" version 2>/dev/null || echo "")
-          if [ "$PUBLISHED" = "$VERSION" ]; then
-            echo "✅ $PACKAGE@$VERSION already published - skipping"
-            exit 0
-          fi
-
-          echo "Publishing $PACKAGE@$VERSION..."
-          npm publish --access public 2>&1 | tee /tmp/npm-out.log; EXIT=${PIPESTATUS[0]}
-          if [ "$EXIT" != "0" ]; then
-            if grep -q "cannot publish over\|previously published" /tmp/npm-out.log; then
-              echo "⚠️ Version already published, skipping"
-            else
-              exit "$EXIT"
-            fi
-          fi
-          echo "✅ Published $PACKAGE@$VERSION"
-        env:
-          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}