gm-qwen 2.0.781 → 2.0.782
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/gm.json +1 -1
- package/package.json +1 -1
- package/skills/planning/SKILL.md +15 -1
package/gm.json
CHANGED
package/package.json
CHANGED
package/skills/planning/SKILL.md
CHANGED
|
@@ -164,10 +164,24 @@ No comments. No scattered test files. 200-line limit per file. Fail loud. No dup
|
|
|
164
164
|
|
|
165
165
|
## SINGLE INTEGRATION TEST POLICY
|
|
166
166
|
|
|
167
|
-
One `test.js` at project root. 200-line
|
|
167
|
+
One `test.js` at project root. **200-line hard cap.** No fixtures, mocks, or scattered test files under any naming convention. Plain assertions, real data, real system. `gm-complete` runs it. Failure = regression to EXECUTE.
|
|
168
168
|
|
|
169
169
|
Any second test runner — under any name, in any directory — is a smuggled parallel surface and fights the discipline. If a behavior needs to be exercised in-page, register it in `window.__debug` and assert via `test.js`.
|
|
170
170
|
|
|
171
|
+
**Purpose: maximum surface coverage in 200 lines.** test.js is a budget, not a target. Every line should witness a load-bearing behavior; redundant assertions are dead weight. Subsystems get *one* group each — combined groups (e.g. `profiles+observability+auth+context+cron+batch`) are the norm, not the exception. As thoth grew from 17 → 21 → 14 named groups while the surface tripled, the win came from collapsing per-subsystem groups into multi-subsystem ones.
|
|
172
|
+
|
|
173
|
+
**Use overlap to exclude.** When subsystem A's test exercises B as a side effect, B does not need its own group — drop the redundant assertion. Examples that have proven out:
|
|
174
|
+
- The agent-machine tool-loop test exercises bash dispatch → no separate bash test needed beyond a smoke-call inside the tools+toolsets group.
|
|
175
|
+
- The dashboard test asserts the API surface AND that the registry has ≥N tools → covers tool registration coverage.
|
|
176
|
+
- The plugins+memory group exercises observability metrics + achievements → no need for a separate plugins-extra group.
|
|
177
|
+
- The gateway test exercising one platform plus a platform-stub-shape loop covers all 18 adapters in one group.
|
|
178
|
+
|
|
179
|
+
**Adding a new subsystem:** first try to fold its assertion into the closest existing group. Only create a new group when the subsystem's failure mode is genuinely orthogonal (e.g. compressor's iterative-update behavior is not exercised by any other group). Test surface should grow linearly with subsystem count, not multiplicatively, and the line budget is the forcing function.
|
|
180
|
+
|
|
181
|
+
**Pattern that works:** name combined groups by joining their subsystems with `+`, e.g. `home+config+skin`, `mcp+swe+distributions+account+credpool`, `env+pi+cli+tui+setup+website`. Future readers see the coverage at a glance. A group title with 4–6 components is healthy; a group with 1 component should be questioned.
|
|
182
|
+
|
|
183
|
+
**Hygiene at edit time:** every change to test.js prefers compaction over expansion. If `wc -l test.js > 200`, the discipline is *not* "split" — it's "merge groups + drop redundancy" until it fits. If the budget is genuinely insufficient for the load-bearing surface, the right move is to question whether the assertion is load-bearing, not to lift the cap.
|
|
184
|
+
|
|
171
185
|
## RESPONSE POLICY
|
|
172
186
|
|
|
173
187
|
Terse. Drop filler. Fragments OK. Pattern: `[thing] [action] [reason]. [next step].` Code/commits/PRs = normal prose.
|