gm-qwen 2.0.754 → 2.0.756
- package/bin/plugkit.sha256 +6 -6
- package/bin/plugkit.version +1 -1
- package/gm.json +2 -2
- package/package.json +1 -1
- package/skills/gm/SKILL.md +18 -37
- package/skills/planning/SKILL.md +14 -16
package/bin/plugkit.sha256
CHANGED
@@ -1,6 +1,6 @@
-
-
-
-
-
-
+fb30957c13634cf6c70fae0c43d06ea08d750ba0acf6aa458eb8d746b3b55b27 plugkit-win32-x64.exe
+9ec0d6fc086a491d6ac6fa220f83c93771bb36d2c14fdc3585b9343b08bd4536 plugkit-win32-arm64.exe
+764d862ab475b0168ee760b5fb825443bec02baeeaeb0f4d619a5296e821c30b plugkit-darwin-x64
+0acab629fa681cd9682fc10abc21064f1813d550669d81cac3949c9439215cd3 plugkit-darwin-arm64
+df52e7f94176e9fb0bdb1b1bdb00d8a1b1d622fe44ee27625dcabba579c35c52 plugkit-linux-x64
+0b07cb8c3af4e6829b1d9b2584252f88dcf93d7bb30ee3816344d2a71810c8eb plugkit-linux-arm64
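Each added line in the manifest above pairs a SHA-256 digest with a platform-specific binary name. A minimal sketch of how a consumer might check a downloaded binary against such a manifest, assuming the `<hex digest> <filename>` layout shown in the added lines. `parseManifest` and `verifyBinary` are hypothetical helper names, not part of plugkit, and the payload is an in-memory stand-in for a real download.

```javascript
const { createHash } = require('node:crypto');

// Parse "<64-hex-digest> <filename>" lines into a Map of filename -> digest.
function parseManifest(text) {
  const entries = new Map();
  for (const line of text.split('\n')) {
    const match = line.trim().match(/^([0-9a-f]{64})\s+(\S+)$/);
    if (match) entries.set(match[2], match[1]);
  }
  return entries;
}

// Return true only when the bytes hash to the digest recorded for that file.
function verifyBinary(manifest, filename, bytes) {
  const expected = manifest.get(filename);
  if (!expected) return false;
  const actual = createHash('sha256').update(bytes).digest('hex');
  return actual === expected;
}

// Usage with an in-memory stand-in for a downloaded binary (hypothetical data).
const payload = Buffer.from('example payload');
const digest = createHash('sha256').update(payload).digest('hex');
const manifest = parseManifest(`${digest}  plugkit-linux-x64\n`);
console.log(verifyBinary(manifest, 'plugkit-linux-x64', payload));
```

A file name missing from the manifest is treated as a failed check rather than a pass, so an unlisted platform binary is never accepted silently.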
package/bin/plugkit.version
CHANGED
@@ -1 +1 @@
-0.1.
+0.1.251
package/gm.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "gm",
-"version": "2.0.
+"version": "2.0.756",
 "description": "State machine agent with hooks, skills, and automated git enforcement",
 "author": "AnEntrypoint",
 "license": "MIT",
@@ -23,5 +23,5 @@
 "publishConfig": {
 "access": "public"
 },
-"plugkitVersion": "0.1.
+"plugkitVersion": "0.1.251"
 }
package/package.json
CHANGED
package/skills/gm/SKILL.md
CHANGED
@@ -41,62 +41,43 @@ Multiple facts → parallel Agent calls in ONE message. End-of-turn: scan for un
 
 ## AUTONOMY — HARD RULE
 
-
+A written PRD is the user's authorization. Once it exists, EXECUTE owns the work to COMPLETE. Resolve every doubt that arises during execution by witnessed probe, by recall, or by re-reading the PRD — never by asking the user. Any question whose answer the agent could obtain itself is a question the agent owes itself, not the user.
 
-
-- "Should I continue with X?" / "Want me to do Y next?" / "Want me to also Z?"
-- "This is a lot — should I do A first and confirm?" / "Two options: A or B, which?"
-- Pre-confirmation before multi-file edits when scope is already clear
-- Stopping after partial completion to summarize and await direction
+Asking is permitted only as a last resort, when the next action is destructive-irreversible AND the PRD does not cover it, OR when user intent is genuinely irrecoverable from PRD, memory, and code. The channel is structured: `exec:pause` (renames `.gm/prd.yml` → `.gm/prd.paused.yml`, question in header). In-conversation asking is last-resort beneath last-resort.
 
-
-- Destructive-irreversible decision with no prior context AND no PRD coverage
-- User intent genuinely ambiguous AND cannot be inferred from PRD/memory/code
-- Channel: prefer `exec:pause` (renames .gm/prd.yml → .gm/prd.paused.yml; question lives in header). In-conversation asking is last-resort only.
-
-A long task is not a reason to ask. Context limits are not a reason to ask. CI cascade time is not a reason to ask. Just emit the PRD and execute.
+The size of the task, the cost of context, and the duration of CI are never grounds to ask.
 
 ## LAWFUL DOWNGRADE — HARD RULE
 
-Per paper III §2.5
+Per paper III §2.5 and paper IV §2: lawful downgrade — writing a weaker, true statement in place of a stronger, unearned one — is always available; forced closure never is.
 
-Refusal is forced closure.
+Refusal is forced closure. So is *distributed* refusal — attempting one bounded subset of the request and treating the rest as "follow-up" when other witnessable subsets exist. Both bypass witnessed execution; both are the inverse of bluffing strength.
 
-**
-- "Honest stop —" / "Stopping for a hard, honest call" / "Stopping here, unambiguously"
-- "I cannot do this from inside this conversation"
-- "I have to push back before writing a PRD I can't fulfill"
-- "Pretending I can would be the most expensive kind of lie"
-- "I don't have a working browser tool" / "X is unavailable in this environment" (when exec:browser, exec:codesearch, Read, etc. are in fact available)
-- Any preamble that announces inability before attempting the bounded subset
+**Required move when scope exceeds reach** (paper IV §2.3): construct a *covering family* of bounded subsets — every subset of the request that is witnessable from this session — and write the family into the PRD. Execute every member. Single-subset delivery is legitimate only when no other witnessable subset exists; otherwise it is distributed refusal under another name. At end-of-turn, name the residual complement explicitly, with the reason each excluded piece falls outside the witnessable closure.
 
-
+The discipline is enforced by what is delivered, not by which words appear. Before closing the turn, check that the union of committed work plus named complement equals the witnessable closure of the request. Anything witnessable that falls in neither set means the cover is not yet maximal — re-enter planning to expand it.
 
 ## FIX ON SIGHT — HARD RULE
 
-
-
-Forbidden: `// TODO fix later` | catching to ignore | `2>/dev/null` to hide | "out of scope" when scope is the same file | "we can address that next session" | leaving a red CI to ship docs | merging around a failing assertion | adding `.skip` / `xit` / `it.todo`.
+Every issue surfaced during work is fixed in-band, this turn, at root cause. Defer-markers, swallowed errors, suppressed output, skipped tests, and "address it next session" are all variants of the same failure: a known-bad signal carried past the moment of detection. Each is a small forced closure.
 
-
+Surface → diagnose → fix at root cause → re-witness → continue. If the fix uncovers a new unknown, regress to `planning`. If the fix is itself genuinely out-of-scope-irreversible, the residual goes into `.gm/prd.yml` *before* moving on — narration is not a substitute for an item.
 
-A skill chain that
+A skill chain that ships while ignoring a known-bad signal is forced closure (see LAWFUL DOWNGRADE).
 
 ## BROWSER WITNESS — HARD RULE
 
-
-
-Mandatory protocol (every client edit):
-1. Boot the real server / open the static page → witness HTTP 200
-2. `exec:browser` → `page.goto(url)` → poll for the global the change affects (`window.__app.<system>`, `window.__debug.<module>`)
-3. `page.evaluate(() => …)` asserting the specific invariant the change established — instance counts, scene meshes, DOM nodes, render stats, network frames
-4. Capture witnessed numbers in the response. "Looks fine" / "should work" / "node test passes" = NOT a witness
+Editing code that runs in a browser requires a live `exec:browser` witness in the same turn as the edit. The witness does not defer to a later phase; later phases re-witness on top, they do not replace this one.
 
-
+Protocol on every client edit:
+1. Boot the real surface — server up, page reachable, HTTP 200 witnessed.
+2. `exec:browser` → navigate → poll for the global the change affects.
+3. `page.evaluate` asserting the specific invariant the change establishes. Capture the witnessed numbers in the response.
+4. Variance from expectation → fix at root cause, re-witness (FIX ON SIGHT). Never advance on unwitnessed client behavior.
 
-
+Pure-prose edits to static documents with no JS/canvas/DOM behavior change are exempt; tag the exemption explicitly with the reason so the skip is auditable. Silent skip on actual behavior change is forced closure.
 
-This rule fires in EXECUTE (witness on edit), EMIT (post-emit verify), and VERIFY (final gate). All three.
+This rule fires in EXECUTE (witness on edit), EMIT (post-emit verify), and VERIFY (final gate). All three.
 
 ## EXECUTION ORDER
 
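The end-of-turn check in the LAWFUL DOWNGRADE rule above amounts to set arithmetic: anything witnessable that is neither committed nor named in the complement is a gap in the cover. A minimal sketch of that check, assuming each bounded subset of a request can be identified by a string id. `checkCover` and the item names are hypothetical illustrations, not part of gm.

```javascript
// Hypothetical model: every bounded subset of the request is a string id.
// gaps = witnessable subsets that are neither committed nor named in the complement.
function checkCover(witnessable, committed, namedComplement) {
  const accounted = new Set([...committed, ...namedComplement]);
  const gaps = [...witnessable].filter((id) => !accounted.has(id));
  return { maximal: gaps.length === 0, gaps };
}

// Illustrative data only.
const witnessable = new Set(['api-routes', 'client-render', 'docs']);
const committed = new Set(['api-routes', 'client-render']);
const namedComplement = new Set(['prod-deploy']); // excluded, with a stated reason

const result = checkCover(witnessable, committed, namedComplement);
console.log(result); // { maximal: false, gaps: [ 'docs' ] }
```

A non-empty `gaps` list corresponds to the rule's instruction to re-enter planning and expand the cover before closing the turn.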
package/skills/planning/SKILL.md
CHANGED
@@ -43,35 +43,33 @@ Runs until: .gm/prd.yml empty AND git clean AND all pushes confirmed AND CI gree
 
 ## AUTONOMY — HARD RULE
 
-PRD written → execute to COMPLETE without asking the user.
+PRD written → execute to COMPLETE without asking the user. Doubts that arise during execution are resolved by witnessed probe, by recall, or by re-reading the PRD — never by asking. Any question whose answer is reachable from the agent's tools belongs to the agent, not the user.
 
-Asking
+Asking is last-resort: destructive-irreversible without PRD coverage, OR user intent irrecoverable from PRD/memory/code. Channel: `exec:pause` (renames `prd.yml` → `prd.paused.yml`; question in header). In-conversation asking is last-resort beneath last-resort.
 
-**Cannot stop while**:
+**Cannot stop while**: `.gm/prd.yml` has items | git uncommitted | git unpushed.
 
 ## LAWFUL DOWNGRADE — HARD RULE
 
-Per paper III §2.5: lawful downgrade is always available; forced closure (refusal) is never available.
+Per paper III §2.5 and paper IV §2: lawful downgrade is always available; forced closure (refusal) is never available. Refusal has two shapes — total (no work delivered) and distributed (one bounded subset delivered, the rest abandoned despite being witnessable). Both bypass witnessed execution.
 
-
+**Required move when scope exceeds reach**: construct a *covering family* — every bounded subset of the request that is witnessable from this session — and write the family into the PRD as separate items, with the dependency graph explicit so independent members parallelize. Execute every member. Single-subset delivery is legitimate only when no other witnessable subset exists. The residual complement is named at end-of-turn, with the reason each excluded piece falls outside the witnessable closure.
 
-
+Enforcement is on what is delivered, not on which words appear. Before closing the turn, check that committed work + named complement = witnessable closure of the request. Gap = cover not yet maximal → re-enter PLAN to expand.
 
 ## FIX ON SIGHT — HARD RULE
 
-Every issue surfaced during planning, execution, or verification
+Every issue surfaced during planning, execution, or verification is fixed in-band, the same session, at root cause. A known-bad signal carried past the moment of detection — by deferral, suppression, silencing, skipping, or "next time" narration — is a small forced closure.
 
-Surface → diagnose
+Surface → diagnose → fix → re-witness → continue. New unknown surfaced by the fix → regress here. Genuinely out-of-scope-irreversible → the residual goes into `.gm/prd.yml` *before* moving on; narration is not a substitute for an item.
 
 ## BROWSER WITNESS — HARD RULE
 
-
+A `.prd` item that touches browser-facing code is not plan-complete unless its acceptance criteria include a live `exec:browser` witness with a `page.evaluate` assertion against the specific invariant the change establishes. "Manual verification", "test.js passes", and "browser test optional" are all unwitnessed and therefore unacceptable.
 
-
+The trigger is functional, not a path-list: any change whose effect is observable in the DOM, canvas, WebGL surface, network frames captured by the page, or any global the page exposes, requires the browser witness. Pure-prose edits to static documents with no behavior change are exempt; the exemption is tagged on the item with the reason.
 
-
-
-This propagates: EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. Plan must encode it so all three layers fire.
+Propagation: EXECUTE witnesses on edit, EMIT re-witnesses post-write, VERIFY runs the final gate. The plan must encode the rule so all three layers fire.
 
 ## SKIP PLANNING (DEFAULT for small work)
 
@@ -98,7 +96,7 @@ Client: `window.__debug` live registry; modules register on mount.
 
 `console.log` ≠ observability. Discovery of gap → add .prd item immediately, never deferred.
 
-**No parallel
+**No parallel observability surfaces.** Per paper II §5.4, `window.__debug` is THE in-page registry; `test.js` at project root is the sole out-of-page test asset. Any new file whose purpose is to exercise, smoke-test, demo, or sandbox in-page behavior outside that registry is a parallel surface that fights the discipline — extend the registry instead.
 
 ## .PRD FORMAT
 
@@ -152,9 +150,9 @@ No comments. No scattered test files. 200-line limit per file. Fail loud. No dup
 
 ## SINGLE INTEGRATION TEST POLICY
 
-One `test.js` at project root. 200-line max. No
+One `test.js` at project root. 200-line max. No fixtures, mocks, or scattered test files under any naming convention. Plain assertions, real data, real system. `gm-complete` runs it. Failure = regression to EXECUTE.
 
-
+Any second test runner — under any name, in any directory — is a smuggled parallel surface and fights the discipline. If a behavior needs to be exercised in-page, register it in `window.__debug` and assert via `test.js`.
 
 ## RESPONSE POLICY
 