gm-qwen 2.0.780 → 2.0.782
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/gm.json +1 -1
- package/package.json +1 -1
- package/skills/gm/SKILL.md +3 -1
- package/skills/planning/SKILL.md +18 -2
package/gm.json
CHANGED
package/package.json
CHANGED
package/skills/gm/SKILL.md
CHANGED
|
@@ -45,7 +45,9 @@ A written PRD is the user's authorization. Once it exists, EXECUTE owns the work
|
|
|
45
45
|
|
|
46
46
|
**FINISH ALL REMAINING STEPS — HARD RULE**: when a request enumerates or implies multiple work items ("all", "any", "everything", "the rest", "remaining"), or after a covering family is constructed under MAXIMAL COVER, the agent finishes every witnessable item in the same turn. Stopping after one item to ask "which next?" is forbidden — the answer is *all of them*, in one chain, until `.gm/prd.yml` is empty and git is clean and pushed. Mid-chain "should I…", "want me to…", "which would you like…" prompts are forced closure; replace them with the next skill invocation.
|
|
47
47
|
|
|
48
|
-
Asking is permitted only as a last resort, when the next action is destructive-irreversible AND the PRD does not cover it, OR when user intent is genuinely irrecoverable from PRD, memory,
|
|
48
|
+
Asking is permitted only as a last resort, when the next action is destructive-irreversible AND the PRD does not cover it, OR when user intent is genuinely irrecoverable from PRD, memory, code, AND the public web. The channel is structured: `exec:pause` (renames `.gm/prd.yml` → `.gm/prd.paused.yml`, question in header). In-conversation asking is last-resort beneath last-resort.
|
|
49
|
+
|
|
50
|
+
**Web-search before pause / before user-ask — HARD RULE.** Before `exec:pause` or any in-conversation question whose answer plausibly exists on the public web (missing artifact, prebuilt binary, library status, build recipe, version compatibility, upstream issue, "does X exist for Y"), fire `WebSearch` and at least one targeted `WebFetch` first. Pause/ask only after the web pass returns empty, or returns candidates the agent has witnessed and rejected. Pausing on a question the web could have answered is forced closure dressed as humility — re-enter planning, web-search, and resume. Genuine user-only questions: private credentials, preference among already-surfaced viable options, destructive-irreversible authorization.
|
|
49
51
|
|
|
50
52
|
The size of the task, the cost of context, and the duration of CI are never grounds to ask.
|
|
51
53
|
|
package/skills/planning/SKILL.md
CHANGED
|
@@ -45,7 +45,9 @@ Runs until: .gm/prd.yml empty AND git clean AND all pushes confirmed AND CI gree
|
|
|
45
45
|
|
|
46
46
|
PRD written → execute to COMPLETE without asking the user. Doubts that arise during execution are resolved by witnessed probe, by recall, or by re-reading the PRD — never by asking. Any question whose answer is reachable from the agent's tools belongs to the agent, not the user.
|
|
47
47
|
|
|
48
|
-
Asking is last-resort: destructive-irreversible without PRD coverage, OR user intent irrecoverable from PRD/memory/code. Channel: `exec:pause` (renames `prd.yml` → `prd.paused.yml`; question in header). In-conversation asking is last-resort beneath last-resort.
|
|
48
|
+
Asking is last-resort: destructive-irreversible without PRD coverage, OR user intent irrecoverable from PRD/memory/code/web. Channel: `exec:pause` (renames `prd.yml` → `prd.paused.yml`; question in header). In-conversation asking is last-resort beneath last-resort.
|
|
49
|
+
|
|
50
|
+
**WEB-SEARCH BEFORE PAUSE — HARD RULE.** Before `exec:pause` for any blocking question whose answer could plausibly exist on the public web — missing artifact, unknown library/API, prebuilt binary, version compatibility, build recipe, upstream status — fire `WebSearch` and at least one `WebFetch` first. Only after the web pass returns empty (or returns options the agent then witnesses and rejects) is `exec:pause` legitimate. Pausing on a question the web could have answered is forced closure dressed up as humility — fix on sight by re-entering planning, web-searching, and resuming. The only questions that genuinely require user-ask are ones the public web cannot answer: private credentials, intent/preference between viable options the agent has *already surfaced*, destructive-irreversible authorization.
|
|
49
51
|
|
|
50
52
|
**Cannot stop while**: `.gm/prd.yml` has items | git uncommitted | git unpushed.
|
|
51
53
|
|
|
@@ -162,10 +164,24 @@ No comments. No scattered test files. 200-line limit per file. Fail loud. No dup
|
|
|
162
164
|
|
|
163
165
|
## SINGLE INTEGRATION TEST POLICY
|
|
164
166
|
|
|
165
|
-
One `test.js` at project root. 200-line
|
|
167
|
+
One `test.js` at project root. **200-line hard cap.** No fixtures, mocks, or scattered test files under any naming convention. Plain assertions, real data, real system. `gm-complete` runs it. Failure = regression to EXECUTE.
|
|
166
168
|
|
|
167
169
|
Any second test runner — under any name, in any directory — is a smuggled parallel surface and fights the discipline. If a behavior needs to be exercised in-page, register it in `window.__debug` and assert via `test.js`.
|
|
168
170
|
|
|
171
|
+
**Purpose: maximum surface coverage in 200 lines.** test.js is a budget, not a target. Every line should witness a load-bearing behavior; redundant assertions are dead weight. Subsystems get *one* group each — combined groups (e.g. `profiles+observability+auth+context+cron+batch`) are the norm, not the exception. As thoth grew from 17 → 21 → 14 named groups while the surface tripled, the win came from collapsing per-subsystem groups into multi-subsystem ones.
|
|
172
|
+
|
|
173
|
+
**Use overlap to exclude.** When subsystem A's test exercises B as a side effect, B does not need its own group — drop the redundant assertion. Examples that have proven out:
|
|
174
|
+
- The agent-machine tool-loop test exercises bash dispatch → no separate bash test needed beyond a smoke-call inside the tools+toolsets group.
|
|
175
|
+
- The dashboard test asserts the API surface AND that the registry has ≥N tools → covers tool registration coverage.
|
|
176
|
+
- The plugins+memory group exercises observability metrics + achievements → no need for a separate plugins-extra group.
|
|
177
|
+
- The gateway test exercising one platform plus a platform-stub-shape loop covers all 18 adapters in one group.
|
|
178
|
+
|
|
179
|
+
**Adding a new subsystem:** first try to fold its assertion into the closest existing group. Only create a new group when the subsystem's failure mode is genuinely orthogonal (e.g. compressor's iterative-update behavior is not exercised by any other group). Test surface should grow linearly with subsystem count, not multiplicatively, and the line budget is the forcing function.
|
|
180
|
+
|
|
181
|
+
**Pattern that works:** name combined groups by joining their subsystems with `+`, e.g. `home+config+skin`, `mcp+swe+distributions+account+credpool`, `env+pi+cli+tui+setup+website`. Future readers see the coverage at a glance. A group title with 4–6 components is healthy; a group with 1 component should be questioned.
|
|
182
|
+
|
|
183
|
+
**Hygiene at edit time:** every change to test.js prefers compaction over expansion. If `wc -l test.js > 200`, the discipline is *not* "split" — it's "merge groups + drop redundancy" until it fits. If the budget is genuinely insufficient for the load-bearing surface, the right move is to question whether the assertion is load-bearing, not to lift the cap.
|
|
184
|
+
|
|
169
185
|
## RESPONSE POLICY
|
|
170
186
|
|
|
171
187
|
Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].` Code/commits/PRs = normal prose.
|