xtrm-tools 0.5.47 → 0.5.48

@@ -1,6 +1,6 @@
  {
  "name": "xtrm-tools",
- "version": "0.5.47",
+ "version": "0.5.48",
  "description": "xtrm-tools: dual-runtime workflow enforcement (Claude Code + Pi) — hooks, extensions, skills, and MCP servers",
  "author": {
  "name": "jaggers"
package/cli/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "xtrm-cli",
- "version": "0.5.47",
+ "version": "0.5.48",
  "description": "Claude Code tools installer (skills, hooks, MCP servers)",
  "main": "./dist/index.js",
  "type": "module",
@@ -9,6 +9,7 @@
  "xt": "dist/index.cjs"
  },
  "scripts": {
+ "prebuild": "node -e \"if(process.cwd().includes('/.xtrm/worktrees/')){console.error('ERROR: Do not run npm run build from a worktree — dist paths will be contaminated.\\nRun from the main repo: cd <repo-root>/cli && npm run build');process.exit(1)}\"",
  "build": "tsup",
  "dev": "tsx src/index.ts",
  "typecheck": "tsc --noEmit",
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "xtrm-tools",
- "version": "0.5.47",
+ "version": "0.5.48",
  "description": "Claude Code tools installer (skills, hooks, MCP servers)",
  "license": "MIT",
  "type": "module",
@@ -206,3 +206,260 @@ Agent closes a feature issue that was done ad-hoc. No test issue found. Agent:
  2. Picks strategy
  3. Creates test issue as child of same parent
  4. Documents what to assert based on the actual code
+
+ ## Anti-Pattern Checklist
+
+ Run this checklist at both trigger points (planning and closure review). Flag any anti-patterns in the test issue description before closing.
+
+ ### 1. Assertion-free tests
+ **Detect**: Test body calls functions/methods but has no `assert`, `expect`, or equivalent statement.
+ **Fix**: Add at least one meaningful assertion. If the goal is "doesn't throw", assert that explicitly — `with pytest.raises(...)` or `expect(() => fn()).not.toThrow()`.
+
+ ### 2. Tautological assertions
+ **Detect**: The assertion can only fail if the test framework itself is broken. E.g. `assert result == result`, `expect(true).toBe(true)`, asserting a value against the same expression used to produce it.
+ **Fix**: Assert against a concrete expected value derived independently from the production code. If you can't state what the expected value is without running the code, the test has no falsifiable claim.
+
+ ### 3. Context leakage / shared mutable state
+ **Detect**: Tests share module-level variables, database rows, file state, or global config without reset between runs. Symptoms: tests pass individually but fail in suite order.
+ **Fix**: Use fixtures with setup/teardown (`beforeEach`/`afterEach`, pytest fixtures with function scope). Every test starts from a clean slate.
+
+ ### 4. Over-mocking internal collaborators
+ **Detect**: Mocks are patching classes or functions that live in the same module under test — not external services. The test validates that internal wiring was called, not that the observable outcome is correct.
+ **Fix**: Only mock at system boundaries (HTTP clients, file I/O, external services). Test internal collaborators by letting them run. If they're hard to instantiate, extract the pure logic and test that directly.
+
+ ### 5. Tests that cannot fail under realistic regressions
+ **Detect**: Remove the core logic being tested and re-read the test — would it still pass? If yes, the test provides no protection. Common form: only testing the happy path of a function whose bug would only appear in error paths.
+ **Fix**: Add at least one negative-path or edge-case assertion that would catch the most likely regression. Consult the implementation for obvious failure modes.
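The first two checklist items can be illustrated with a small Python sketch. It is not part of the package diff: `merge_config` is a hypothetical function invented for the illustration, and the test names are arbitrary. The first two tests show the anti-patterns, the last two show falsifiable alternatives.

```python
# Hypothetical function under test; not taken from xtrm-tools.
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict in which overrides win over defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def test_assertion_free():
    # Anti-pattern 1: the code runs, but nothing is claimed about the result.
    merge_config({"a": 1}, {"a": 2})


def test_tautological():
    # Anti-pattern 2: the result is compared with itself, so this cannot fail.
    result = merge_config({"a": 1}, {"a": 2})
    assert result == result


def test_falsifiable():
    # The expected value is written out by hand, independently of merge_config.
    assert merge_config({"a": 1, "b": 3}, {"a": 2}) == {"a": 2, "b": 3}


def test_does_not_mutate_inputs():
    # An explicit claim about a side effect that a regression could break.
    defaults = {"a": 1}
    merge_config(defaults, {"a": 2})
    assert defaults == {"a": 1}
```

In pytest, a "does not raise" expectation is usually expressed by simply calling the function inside the test, since any uncaught exception fails the test; `pytest.raises` is the tool for the opposite claim, that an exception is raised.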
+
+ ## Priority Heuristics
+
+ Test issues inherit priority from their implementation issues with bounded adjustment. The table below gives the deterministic mapping.
+
+ | Implementation risk | Test issue priority | Examples |
+ |---|---|---|
+ | Security / auth / protocol compat | P0 (equal) | Auth token validation, schema migration safety, API contract |
+ | Regression-critical boundary path | P0–P1 (equal) | Client URL routing, CLI exit codes used by external tooling |
+ | High-business-impact core logic | P1 (equal or +0) | Pricing computations, session state transitions |
+ | Standard domain logic | P2 (+0 or +1) | Config merge, output formatters, parsers |
+ | Low-risk internals / non-critical adapters | P3 (+1) | Helper utilities, optional UI formatting |
+ | Polish / test debt cleanup | P4 | Improving existing test coverage, test naming |
+
+ **Inheritance rule**: start from the implementation issue's priority. Apply +1 if the test is covering a well-understood path with low regression risk. Never go lower than P2 for boundary or shell layer tests — integration tests are load-bearing.
+
+ **Equal priority examples**:
+ - Impl is P1 (auth endpoint) → test issue is P1 (auth contract test must ship with the feature)
+ - Impl is P0 (critical fix) → test issue is P0 (regression test must land in same PR)
+
+ **+1 priority examples**:
+ - Impl is P2 (output formatter) → test issue is P3 (unit tests are useful but not blocking)
+ - Impl is P3 (optional config key) → test issue is P4 (test debt, tackle in cleanup)
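One possible reading of the inheritance rule, sketched in Python. This is an illustration and not code from the package: the function name `derive_test_priority` and its parameters are hypothetical, priorities are modelled as integers 0 to 4, and the "never lower than P2" clause is interpreted as a cap applied to the final value for boundary and shell tests.

```python
def derive_test_priority(impl_priority: int, layer: str, low_regression_risk: bool) -> int:
    """Derive a test issue priority (0 = P0 ... 4 = P4) from its implementation issue.

    Assumptions made for this sketch:
    - layer is one of "core", "boundary", "shell"
    - the +1 adjustment is applied only when the path is low risk
    - boundary and shell tests are capped at P2 after the adjustment
    """
    if not 0 <= impl_priority <= 4:
        raise ValueError(f"priority out of range: {impl_priority}")
    if layer not in ("core", "boundary", "shell"):
        raise ValueError(f"unknown layer: {layer!r}")

    # Start from the implementation priority, optionally relaxed by one step.
    priority = impl_priority + 1 if low_regression_risk else impl_priority
    priority = min(priority, 4)  # P4 is the lowest priority in the table

    # Boundary and shell tests never drop below P2.
    if layer in ("boundary", "shell"):
        priority = min(priority, 2)
    return priority
```

Under these assumptions the examples above come out as expected: a P1 auth endpoint stays at P1, a P2 output formatter in the core layer with low regression risk becomes P3, and a P3 optional config key becomes P4.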
+
+ ## Definition of Done Templates
+
+ Use these templates verbatim in test issue descriptions. Replace `<...>` placeholders.
+
+ ### Core layer DoD
+
+ ```
+ Layer: core
+ Strategy: <unit | property-based | example-based>
+ Covers: <impl issue IDs>
+
+ Assertions required:
+ - [ ] Positive path: <expected output for valid input>
+ - [ ] Negative path: <expected error/output for invalid input>
+ - [ ] Edge cases explicitly enumerated: <list: empty input, zero, max boundary, ...>
+ - [ ] Invariants/properties included: <e.g. "result is always sorted", "output length == input length">
+
+ Fixture policy:
+ - [ ] No shared mutable state between tests
+ - [ ] Deterministic fixtures (no random, no time.now() without injection)
+ - [ ] Each test constructs its own input independently
+
+ Done when: all assertions above are implemented and passing in CI.
+ ```
+
+ ### Boundary layer DoD
+
+ ```
+ Layer: boundary
+ Strategy: <live-contract | recorded-fixture | mock (last resort)>
+ Covers: <impl issue IDs>
+
+ Assertions required:
+ - [ ] Schema/contract assertions: <field presence, types, required vs optional>
+ - [ ] Error codes and retry/fallback: <e.g. 404→empty list, 500→raises ServiceError>
+ - [ ] Drift-safe: assertions check field presence and types, not brittle internal structure
+ - [ ] Live-first policy documented: <live | recorded-fixture | mock — reason for choice>
+
+ Done when: contract assertions pass against live service (or recorded fixture if live unavailable).
+ Fallback documented in issue if live is not accessible.
+ ```
+
+ ### Shell layer DoD
+
+ ```
+ Layer: shell
+ Strategy: integration (subprocess or function-level wiring test)
+ Covers: <impl issue IDs>
+
+ Assertions required:
+ - [ ] End-to-end observable outcomes: <what the user sees — output format, exit code>
+ - [ ] Failure-mode UX: <error messages, non-zero exit codes, stderr vs stdout>
+ - [ ] Cross-component wiring: <core + boundary are called and integrated correctly>
+ - [ ] At least one real-data scenario (not mocked) if service is accessible
+
+ Done when: integration tests run against real components (not mocked internals) and cover
+ both success and at least one failure path.
+ ```
+
+ ## Critical-Path Coverage
+
+ Do not frame test issues around coverage percentages. Frame them around critical paths and risk rationale.
+
+ Every test issue description must include a **critical path map**:
+
+ ```
+ Critical paths covered:
+ - <path 1 and risk rationale>
+ - <path 2 and risk rationale>
+
+ Known deferred paths (with follow-up refs):
+ - <path not covered yet> → follow-up: <bd issue ID or "to be created">
+ ```
+
+ **Why**: a 90% line-coverage number says nothing about whether the one path that processes payments is tested. A critical path map forces explicit reasoning about what matters and what was skipped.
+
+ **What counts as a critical path**:
+ - Any path that involves auth, money, data loss, or external contract compliance
+ - Any path exercised by the user-facing CLI commands described in the issue
+ - Any path explicitly mentioned in the implementation issue's acceptance criteria
+
+ **What to do with deferred paths**:
+ - Document them — don't silently skip
+ - Create a follow-up test issue if the deferred path is P2 or higher risk
+ - Reference the follow-up issue ID in the current test issue's description
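The requirement that every description carries a critical path map could be checked mechanically. The following Python sketch is only an illustration of that idea and is not part of the package; the helper name `missing_critical_path_sections` and its matching rules (heading prefix followed by at least one `- ` bullet) are assumptions.

```python
# Heading prefixes taken from the critical path map template.
REQUIRED_HEADINGS = ("Critical paths covered:", "Known deferred paths")


def missing_critical_path_sections(description: str) -> list[str]:
    """Return the required headings that are absent or have no bullet beneath them."""
    lines = [line.strip() for line in description.splitlines()]
    missing = []
    for heading in REQUIRED_HEADINGS:
        # Find the first line that starts with the heading prefix.
        index = next((i for i, line in enumerate(lines) if line.startswith(heading)), None)
        if index is None:
            missing.append(heading)
            continue
        # The first non-blank line after the heading must be a bullet.
        following = [line for line in lines[index + 1:] if line]
        if not following or not following[0].startswith("- "):
            missing.append(heading)
    return missing
```

An empty return value means both parts of the map are present; the check says nothing about whether the listed paths are the right ones, which remains a judgment call.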
+
+ ## Advisory vs Enforcement Boundary
+
+ This skill is advisory. It recommends test strategy, creates test issues, and flags anti-patterns. It does not block code execution or enforce pass/fail decisions — that is the job of hooks and quality gates.
+
+ | Concern | Who owns it | How enforced |
+ |---|---|---|
+ | Test strategy selection (TDD vs contract vs unit) | This skill | Recommendation only |
+ | Anti-pattern detection in test issues | This skill | Checklist in issue description |
+ | Priority assignment | This skill | Heuristics table above |
+ | DoD template in issue description | This skill | Template pasted into bd issue |
+ | CI test pass/fail | quality-gates hook | PostToolUse hook blocks on test failures |
+ | Test file lint/type correctness | quality-gates hook | ESLint + mypy on every edit |
+ | Branch not mergeable without tests | Not enforced | Human review — no automated gate today |
+ | Claiming work without test issue existing | Not enforced | Human judgment — skill creates test issue at closure if missing |
+
+ **Example — advisory boundary in practice**:
+
+ You are planning tests for `.14` (async HTTP client). This skill:
+ - Classifies as boundary layer ✓
+ - Recommends live-contract tests ✓
+ - Creates a test issue with DoD template ✓
+ - Flags if you try to describe tests that only mock the HTTP layer ✓ (anti-pattern 4)
+
+ It does NOT:
+ - Block `.14` from closing if the test issue isn't done
+ - Fail the build if the test issue is open
+ - Require approval before the implementation is merged
+
+ The test issue is a tracked commitment, not a gate. Gating is opt-in via `bd dep` dependencies you set up during planning.
+
+ ## v1.1 Format Examples
+
+ ### Example A — Planning phase, boundary + shell epic
+
+ Epic: "Implement gitnexus MCP sync in xtrm install"
+
+ Children: `.1` (MCP config writer), `.2` (sync-on-install integration), `.3` (CLI `xtrm mcp` command)
+
+ Classification:
+ - `.1` → boundary (writes to `.mcp.json`, file I/O)
+ - `.2` → shell (orchestrates install flow)
+ - `.3` → shell (CLI command)
+
+ Test issues created:
+
+ ```
+ bd create "Test: MCP config writer — contract tests for .mcp.json output" \
+ -t task -p 2 --parent <epic> \
+ -d "Layer: boundary
+ Strategy: example-based (file I/O, no external service)
+ Covers: .1
+
+ Assertions required:
+ - [ ] Positive path: valid servers config produces correct .mcp.json structure
+ - [ ] Negative path: invalid server entry raises validation error
+ - [ ] Edge cases: empty servers list, duplicate server names, existing .mcp.json is merged not overwritten
+ - [ ] Drift-safe: assert on field presence (name, command, args), not internal object identity
+
+ Critical paths covered:
+ - gitnexus server entry written with correct stdio transport — risk: wrong transport breaks MCP
+ - existing user entries preserved during merge — risk: data loss
+
+ Known deferred paths:
+ - test with malformed existing .mcp.json → follow-up: to be created (P3)
+
+ Done when: all assertions pass, no shared state between tests."
+ ```
+
+ ```
+ bd create "Test: xtrm install MCP sync + xtrm mcp CLI — integration tests" \
+ -t task -p 2 --parent <epic> \
+ -d "Layer: shell
+ Strategy: integration (subprocess)
+ Covers: .2, .3
+
+ Assertions required:
+ - [ ] End-to-end: xtrm install writes correct .mcp.json in temp project dir
+ - [ ] CLI: xtrm mcp list outputs expected server names
+ - [ ] Failure-mode: xtrm mcp add with duplicate name exits non-zero with clear error
+ - [ ] Cross-component: install flow calls MCP writer with correct config
+
+ Critical paths covered:
+ - full install → .mcp.json present and readable by Claude Code — risk: MCP servers not available
+ - CLI add + list roundtrip — risk: user cannot inspect installed servers
+
+ Known deferred paths:
+ - test with no write permission on project dir → follow-up: to be created (P4)
+
+ Done when: integration tests run against real file system in temp dir, no mocked internals."
+ ```
+
+ ---
+
+ ### Example B — Closure gate, core layer, implementation diverged
+
+ Closing `.22` (config merge logic). Existing test issue `.31` was written before implementation.
+
+ What `.22` actually built:
+ - Added precedence chain: env > file > defaults (original plan had only file > defaults)
+ - Added type coercion for boolean env vars ("true"/"false" → bool)
+ - Removed support for `.xtrm.yaml` (only `.xtrm/config.json` now)
+
+ Updated test issue `.31`:
+
+ ```
+ bd update xtrm-31 --notes "Scope updated after .22 completed:
+ + Add test: env var takes precedence over file config (new precedence chain)
+ + Add test: 'true'/'false' env vars coerced to bool correctly
+ + Add test: 'TRUE', '1', '0' edge cases for bool coercion
+ + Remove test: .xtrm.yaml loading (format removed in .22)
+
+ Anti-pattern check:
+ - [ ] tautological: none detected
+ - [ ] over-mocking: env injection via monkeypatch only, no internal mocking
+ - [ ] shared state: each test resets env via fixture
+
+ Critical paths covered:
+ - env > file > defaults chain — risk: wrong precedence silently overrides user config
+ - bool coercion — risk: 'false' string treated as truthy in Python
+
+ Known deferred paths:
+ - test with missing HOME dir (pathlib resolution edge case) → follow-up: xtrm-4x (P4)"
+ ```
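The bool-coercion risk named in Example B is easy to demonstrate in Python. The sketch below is illustrative only and not taken from the package: `coerce_bool` is a hypothetical helper, and the accepted spellings (`true`/`1` and `false`/`0`, case-insensitive) are an assumption, since the diff lists the edge cases to test but not the behaviour expected for each.

```python
# Assumed spellings; the real coercion rules in .22 are not shown in the diff.
_TRUE_VALUES = {"true", "1"}
_FALSE_VALUES = {"false", "0"}


def coerce_bool(raw: str) -> bool:
    """Coerce an environment-variable string to bool, rejecting unknown spellings."""
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


def test_plain_bool_treats_false_string_as_truthy():
    # The risk itself: any non-empty string is truthy in Python.
    assert bool("false") is True


def test_coercion_cases():
    # Expected values are written out by hand, one per listed edge case.
    cases = {"true": True, "false": False, "TRUE": True, "1": True, "0": False}
    for raw, expected in cases.items():
        assert coerce_bool(raw) is expected
```

Raising on unrecognised input is one design choice among several; a real implementation might instead fall back to a default, in which case the negative-path test would assert on that default.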