xtrm-tools 0.5.46 → 0.5.48
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +1 -1
- package/README.md +5 -5
- package/cli/dist/index.cjs +9728 -9967
- package/cli/dist/index.cjs.map +1 -1
- package/cli/package.json +2 -1
- package/config/instructions/agents-top.md +2 -4
- package/config/instructions/claude-top.md +2 -4
- package/config/pi/extensions/beads/index.ts +18 -81
- package/config/pi/extensions/xtrm-ui/format.ts +93 -0
- package/config/pi/extensions/xtrm-ui/index.ts +725 -12
- package/hooks/beads-claim-sync.mjs +15 -96
- package/hooks/beads-gate-messages.mjs +2 -4
- package/hooks/beads-gate-utils.mjs +0 -18
- package/hooks/statusline.mjs +5 -3
- package/hooks/tsconfig-cache.json +2 -12
- package/package.json +1 -1
- package/plugins/xtrm-tools/.claude-plugin/plugin.json +1 -1
- package/plugins/xtrm-tools/hooks/beads-claim-sync.mjs +15 -96
- package/plugins/xtrm-tools/hooks/beads-gate-messages.mjs +2 -4
- package/plugins/xtrm-tools/hooks/beads-gate-utils.mjs +0 -18
- package/plugins/xtrm-tools/hooks/statusline.mjs +5 -3
- package/plugins/xtrm-tools/hooks/tsconfig-cache.json +2 -12
- package/plugins/xtrm-tools/skills/planning/SKILL.md +75 -20
- package/plugins/xtrm-tools/skills/test-planning/SKILL.md +257 -0
- package/plugins/xtrm-tools/skills/using-xtrm/SKILL.md +1 -1
- package/plugins/xtrm-tools/skills/xt-debugging/SKILL.md +149 -0
- package/plugins/xtrm-tools/skills/xt-end/SKILL.md +28 -0
- package/skills/planning/SKILL.md +75 -20
- package/skills/test-planning/SKILL.md +257 -0
- package/skills/using-xtrm/SKILL.md +1 -1
- package/skills/xt-debugging/SKILL.md +149 -0
- package/skills/xt-end/SKILL.md +28 -0
- package/plugins/xtrm-tools/skills/gitnexus-debugging/SKILL.md +0 -85
- package/skills/gitnexus-debugging/SKILL.md +0 -85
|
@@ -62,16 +62,33 @@ If the request is under 8 words or the scope is unclear, ask **one** clarifying
|
|
|
62
62
|
|
|
63
63
|
Use GitNexus and Serena to understand the landscape. No file edits.
|
|
64
64
|
|
|
65
|
+
### GitNexus-first protocol (mandatory when available)
|
|
66
|
+
|
|
65
67
|
```bash
|
|
66
|
-
# Find relevant execution flows
|
|
68
|
+
# 1) Find relevant execution flows by concept
|
|
67
69
|
gitnexus_query({query: "<concept related to task>"})
|
|
68
70
|
|
|
69
|
-
#
|
|
71
|
+
# 2) Get full caller/callee/process context for likely symbols
|
|
70
72
|
gitnexus_context({name: "<affected symbol>"})
|
|
71
73
|
|
|
72
|
-
#
|
|
74
|
+
# 3) Assess blast radius before locking the implementation plan
|
|
73
75
|
gitnexus_impact({target: "<symbol to change>", direction: "upstream"})
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
### Refactor planning checks (when rename/extract/move is in scope)
|
|
79
|
+
|
|
80
|
+
```bash
|
|
81
|
+
# Preview safe multi-file rename plan first
|
|
82
|
+
gitnexus_rename({symbol_name: "<old>", new_name: "<new>", dry_run: true})
|
|
83
|
+
|
|
84
|
+
# Confirm context before extraction/split plans
|
|
85
|
+
gitnexus_context({name: "<symbol to extract/split>"})
|
|
86
|
+
gitnexus_impact({target: "<symbol to extract/split>", direction: "upstream"})
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### Serena symbol-level inspection (targeted reads)
|
|
74
90
|
|
|
91
|
+
```bash
|
|
75
92
|
# Map a file without reading all of it
|
|
76
93
|
get_symbols_overview("path/to/relevant/file.ts")
|
|
77
94
|
|
|
@@ -79,11 +96,44 @@ get_symbols_overview("path/to/relevant/file.ts")
|
|
|
79
96
|
find_symbol("SymbolName", include_body=true)
|
|
80
97
|
```
|
|
81
98
|
|
|
99
|
+
### Fallback when GitNexus MCP tools are unavailable
|
|
100
|
+
|
|
101
|
+
If MCP GitNexus tools are unavailable, use the GitNexus CLI first, then Serena symbol exploration if needed.
|
|
102
|
+
|
|
103
|
+
```bash
|
|
104
|
+
# Verify index freshness / repository indexing
|
|
105
|
+
npx gitnexus status
|
|
106
|
+
npx gitnexus list
|
|
107
|
+
|
|
108
|
+
# Concept and architecture exploration
|
|
109
|
+
npx gitnexus query "<concept or symptom>" --limit 5
|
|
110
|
+
npx gitnexus context "<symbolName>"
|
|
111
|
+
|
|
112
|
+
# Blast radius before committing to a plan
|
|
113
|
+
npx gitnexus impact "<symbolName>" --direction upstream --depth 3
|
|
114
|
+
|
|
115
|
+
# If index is stale
|
|
116
|
+
npx gitnexus analyze
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
Notes:
|
|
120
|
+
- In this environment, `detect_changes` and `rename` are available via MCP tools, not GitNexus CLI subcommands.
|
|
121
|
+
- If both MCP and CLI are unavailable, fall back to Serena search + symbols and state this explicitly in your plan output.
|
|
122
|
+
|
|
123
|
+
```bash
|
|
124
|
+
search_for_pattern("<concept or symbol>")
|
|
125
|
+
get_symbols_overview("path/to/relevant/file.ts")
|
|
126
|
+
find_symbol("<candidate symbol>", include_body=true)
|
|
127
|
+
find_referencing_symbols("<symbol>", "path/to/file.ts")
|
|
128
|
+
```
|
|
129
|
+
|
|
82
130
|
**Capture from exploration:**
|
|
83
131
|
- Which files/symbols will be affected
|
|
132
|
+
- Which execution flows/processes are involved (from `gitnexus_query`/`gitnexus_context`)
|
|
84
133
|
- What existing patterns to follow (naming, structure, error handling)
|
|
85
134
|
- Any d=1 dependents that require updates when you change a symbol
|
|
86
|
-
- Risk level: if CRITICAL or HIGH → warn user before proceeding
|
|
135
|
+
- Risk level from impact analysis: if CRITICAL or HIGH → warn user before proceeding
|
|
136
|
+
- If GitNexus fallback path was used, explicitly call it out in the handoff
|
|
87
137
|
|
|
88
138
|
---
|
|
89
139
|
|
|
@@ -101,6 +151,7 @@ Think through the plan before writing any bd commands. Use structured CoT:
|
|
|
101
151
|
3. What are the dependencies? (what must be done before X can start?)
|
|
102
152
|
4. What can run in parallel? (independent tasks → no deps between them)
|
|
103
153
|
5. What are the risks? (complex areas, unclear spec, risky refactors)
|
|
154
|
+
6. What is the blast-radius summary from GitNexus? (direct callers, affected processes, risk level)
|
|
104
155
|
</thinking>
|
|
105
156
|
|
|
106
157
|
<plan>
|
|
@@ -244,7 +295,13 @@ test-planning will:
|
|
|
244
295
|
|
|
245
296
|
## Phase 6 — Handoff
|
|
246
297
|
|
|
247
|
-
Present the board and transition to implementation
|
|
298
|
+
Present the board and transition to implementation.
|
|
299
|
+
|
|
300
|
+
Include a short **Architecture & Impact Summary** in your handoff message:
|
|
301
|
+
- Key execution flows/processes involved
|
|
302
|
+
- Top d=1 dependents to watch
|
|
303
|
+
- Highest observed risk (LOW/MEDIUM/HIGH/CRITICAL)
|
|
304
|
+
- Whether GitNexus-first or fallback exploration was used
|
|
248
305
|
|
|
249
306
|
```bash
|
|
250
307
|
# Show the full board
|
|
@@ -292,30 +349,26 @@ Then begin work on the first task. The planning phase is complete.
|
|
|
292
349
|
### Example 2 — Bug fix with investigation
|
|
293
350
|
|
|
294
351
|
<example>
|
|
295
|
-
<scenario>User: "bd close
|
|
352
|
+
<scenario>User: "bd close doesn't commit my changes"</scenario>
|
|
296
353
|
|
|
297
354
|
<exploration>
|
|
298
|
-
gitnexus_query({query: "bd close
|
|
355
|
+
gitnexus_query({query: "bd close commit workflow"})
|
|
299
356
|
→ finds: beads-claim-sync.mjs, close event handler
|
|
300
|
-
find_symbol("
|
|
301
|
-
→ discovers:
|
|
357
|
+
find_symbol("main", include_body=true)
|
|
358
|
+
→ discovers: bd close sets closed-this-session KV only; no git commit
|
|
302
359
|
</exploration>
|
|
303
360
|
|
|
304
361
|
<thinking>
|
|
305
|
-
|
|
306
|
-
|
|
307
|
-
|
|
308
|
-
Single task, no phases needed.
|
|
362
|
+
bd close does NOT auto-commit (removed in xtrm-wr0o).
|
|
363
|
+
Correct workflow: bd close <id>, then git add + git commit separately, then xt end.
|
|
364
|
+
No issue needed — this is expected behavior.
|
|
309
365
|
</thinking>
|
|
310
366
|
|
|
311
367
|
<bd_command>
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
'git add -A' scoped to expected paths before committing.
|
|
317
|
-
AC: [ ] auto-commit includes new untracked files [ ] existing behavior preserved"
|
|
318
|
-
--type=bug --priority=1
|
|
368
|
+
# No issue needed — explain the correct workflow to the user:
|
|
369
|
+
# 1. bd close <id> --reason="..." ← closes issue
|
|
370
|
+
# 2. git add . && git commit -m "..." ← commit changes manually
|
|
371
|
+
# 3. xt end ← push, PR, merge, worktree cleanup
|
|
319
372
|
</bd_command>
|
|
320
373
|
</example>
|
|
321
374
|
|
|
@@ -343,6 +396,8 @@ Before presenting the plan to the user:
|
|
|
343
396
|
- [ ] Every issue has context / what / AC / notes
|
|
344
397
|
- [ ] Dependencies are correct (A blocks B when B needs A's output)
|
|
345
398
|
- [ ] No task is more than "one session" of work (split if needed)
|
|
399
|
+
- [ ] GitNexus evidence captured (query/context/impact) or fallback path explicitly stated
|
|
400
|
+
- [ ] If refactor scope exists, rename/extract safety checks were included in plan
|
|
346
401
|
- [ ] test-planning was invoked (or scheduled as next step)
|
|
347
402
|
- [ ] First implementation issue is ready to claim
|
|
348
403
|
|
|
@@ -206,3 +206,260 @@ Agent closes a feature issue that was done ad-hoc. No test issue found. Agent:
|
|
|
206
206
|
2. Picks strategy
|
|
207
207
|
3. Creates test issue as child of same parent
|
|
208
208
|
4. Documents what to assert based on the actual code
|
|
209
|
+
|
|
210
|
+
## Anti-Pattern Checklist
|
|
211
|
+
|
|
212
|
+
Run this checklist at both trigger points (planning and closure review). Flag any anti-patterns in the test issue description before closing.
|
|
213
|
+
|
|
214
|
+
### 1. Assertion-free tests
|
|
215
|
+
**Detect**: Test body calls functions/methods but has no `assert`, `expect`, or equivalent statement.
|
|
216
|
+
**Fix**: Add at least one meaningful assertion. If the goal is "doesn't throw", assert that explicitly — `with pytest.raises(...)` or `expect(() => fn()).not.toThrow()`.
|
|
217
|
+
|
|
218
|
+
### 2. Tautological assertions
|
|
219
|
+
**Detect**: The assertion can only fail if the test framework itself is broken. E.g. `assert result == result`, `expect(true).toBe(true)`, asserting a value against the same expression used to produce it.
|
|
220
|
+
**Fix**: Assert against a concrete expected value derived independently from the production code. If you can't state what the expected value is without running the code, the test has no falsifiable claim.
|
|
221
|
+
|
|
222
|
+
### 3. Context leakage / shared mutable state
|
|
223
|
+
**Detect**: Tests share module-level variables, database rows, file state, or global config without reset between runs. Symptoms: tests pass individually but fail in suite order.
|
|
224
|
+
**Fix**: Use fixtures with setup/teardown (`beforeEach`/`afterEach`, pytest fixtures with function scope). Every test starts from a clean slate.
|
|
225
|
+
|
|
226
|
+
### 4. Over-mocking internal collaborators
|
|
227
|
+
**Detect**: Mocks are patching classes or functions that live in the same module under test — not external services. The test validates that internal wiring was called, not that the observable outcome is correct.
|
|
228
|
+
**Fix**: Only mock at system boundaries (HTTP clients, file I/O, external services). Test internal collaborators by letting them run. If they're hard to instantiate, extract the pure logic and test that directly.
|
|
229
|
+
|
|
230
|
+
### 5. Tests that cannot fail under realistic regressions
|
|
231
|
+
**Detect**: Remove the core logic being tested and re-read the test — would it still pass? If yes, the test provides no protection. Common form: only testing the happy path of a function whose bug would only appear in error paths.
|
|
232
|
+
**Fix**: Add at least one negative-path or edge-case assertion that would catch the most likely regression. Consult the implementation for obvious failure modes.
|
|
233
|
+
|
|
234
|
+
## Priority Heuristics
|
|
235
|
+
|
|
236
|
+
Test issues inherit priority from their implementation issues with bounded adjustment. The table below gives the deterministic mapping.
|
|
237
|
+
|
|
238
|
+
| Implementation risk | Test issue priority | Examples |
|
|
239
|
+
|---|---|---|
|
|
240
|
+
| Security / auth / protocol compat | P0 (equal) | Auth token validation, schema migration safety, API contract |
|
|
241
|
+
| Regression-critical boundary path | P0–P1 (equal) | Client URL routing, CLI exit codes used by external tooling |
|
|
242
|
+
| High-business-impact core logic | P1 (equal or +0) | Pricing computations, session state transitions |
|
|
243
|
+
| Standard domain logic | P2 (+0 or +1) | Config merge, output formatters, parsers |
|
|
244
|
+
| Low-risk internals / non-critical adapters | P3 (+1) | Helper utilities, optional UI formatting |
|
|
245
|
+
| Polish / test debt cleanup | P4 | Improving existing test coverage, test naming |
|
|
246
|
+
|
|
247
|
+
**Inheritance rule**: start from the implementation issue's priority. Apply +1 if the test is covering a well-understood path with low regression risk. Never go lower than P2 for boundary or shell layer tests — integration tests are load-bearing.
|
|
248
|
+
|
|
249
|
+
**Equal priority examples**:
|
|
250
|
+
- Impl is P1 (auth endpoint) → test issue is P1 (auth contract test must ship with the feature)
|
|
251
|
+
- Impl is P0 (critical fix) → test issue is P0 (regression test must land in same PR)
|
|
252
|
+
|
|
253
|
+
**+1 priority examples**:
|
|
254
|
+
- Impl is P2 (output formatter) → test issue is P3 (unit tests are useful but not blocking)
|
|
255
|
+
- Impl is P3 (optional config key) → test issue is P4 (test debt, tackle in cleanup)
|
|
256
|
+
|
|
257
|
+
## Definition of Done Templates
|
|
258
|
+
|
|
259
|
+
Use these templates verbatim in test issue descriptions. Replace `<...>` placeholders.
|
|
260
|
+
|
|
261
|
+
### Core layer DoD
|
|
262
|
+
|
|
263
|
+
```
|
|
264
|
+
Layer: core
|
|
265
|
+
Strategy: <unit | property-based | example-based>
|
|
266
|
+
Covers: <impl issue IDs>
|
|
267
|
+
|
|
268
|
+
Assertions required:
|
|
269
|
+
- [ ] Positive path: <expected output for valid input>
|
|
270
|
+
- [ ] Negative path: <expected error/output for invalid input>
|
|
271
|
+
- [ ] Edge cases explicitly enumerated: <list: empty input, zero, max boundary, ...>
|
|
272
|
+
- [ ] Invariants/properties included: <e.g. "result is always sorted", "output length == input length">
|
|
273
|
+
|
|
274
|
+
Fixture policy:
|
|
275
|
+
- [ ] No shared mutable state between tests
|
|
276
|
+
- [ ] Deterministic fixtures (no random, no time.now() without injection)
|
|
277
|
+
- [ ] Each test constructs its own input independently
|
|
278
|
+
|
|
279
|
+
Done when: all assertions above are implemented and passing in CI.
|
|
280
|
+
```
|
|
281
|
+
|
|
282
|
+
### Boundary layer DoD
|
|
283
|
+
|
|
284
|
+
```
|
|
285
|
+
Layer: boundary
|
|
286
|
+
Strategy: <live-contract | recorded-fixture | mock (last resort)>
|
|
287
|
+
Covers: <impl issue IDs>
|
|
288
|
+
|
|
289
|
+
Assertions required:
|
|
290
|
+
- [ ] Schema/contract assertions: <field presence, types, required vs optional>
|
|
291
|
+
- [ ] Error codes and retry/fallback: <e.g. 404→empty list, 500→raises ServiceError>
|
|
292
|
+
- [ ] Drift-safe: assertions check field presence and types, not brittle internal structure
|
|
293
|
+
- [ ] Live-first policy documented: <live | recorded-fixture | mock — reason for choice>
|
|
294
|
+
|
|
295
|
+
Done when: contract assertions pass against live service (or recorded fixture if live unavailable).
|
|
296
|
+
Fallback documented in issue if live is not accessible.
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
### Shell layer DoD
|
|
300
|
+
|
|
301
|
+
```
|
|
302
|
+
Layer: shell
|
|
303
|
+
Strategy: integration (subprocess or function-level wiring test)
|
|
304
|
+
Covers: <impl issue IDs>
|
|
305
|
+
|
|
306
|
+
Assertions required:
|
|
307
|
+
- [ ] End-to-end observable outcomes: <what the user sees — output format, exit code>
|
|
308
|
+
- [ ] Failure-mode UX: <error messages, non-zero exit codes, stderr vs stdout>
|
|
309
|
+
- [ ] Cross-component wiring: <core + boundary are called and integrated correctly>
|
|
310
|
+
- [ ] At least one real-data scenario (not mocked) if service is accessible
|
|
311
|
+
|
|
312
|
+
Done when: integration tests run against real components (not mocked internals) and cover
|
|
313
|
+
both success and at least one failure path.
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
## Critical-Path Coverage
|
|
317
|
+
|
|
318
|
+
Do not frame test issues around coverage percentages. Frame them around critical paths and risk rationale.
|
|
319
|
+
|
|
320
|
+
Every test issue description must include a **critical path map**:
|
|
321
|
+
|
|
322
|
+
```
|
|
323
|
+
Critical paths covered:
|
|
324
|
+
- <path 1 and risk rationale>
|
|
325
|
+
- <path 2 and risk rationale>
|
|
326
|
+
|
|
327
|
+
Known deferred paths (with follow-up refs):
|
|
328
|
+
- <path not covered yet> → follow-up: <bd issue ID or "to be created">
|
|
329
|
+
```
|
|
330
|
+
|
|
331
|
+
**Why**: a 90% line-coverage number says nothing about whether the one path that processes payments is tested. A critical path map forces explicit reasoning about what matters and what was skipped.
|
|
332
|
+
|
|
333
|
+
**What counts as a critical path**:
|
|
334
|
+
- Any path that involves auth, money, data loss, or external contract compliance
|
|
335
|
+
- Any path exercised by the user-facing CLI commands described in the issue
|
|
336
|
+
- Any path explicitly mentioned in the implementation issue's acceptance criteria
|
|
337
|
+
|
|
338
|
+
**What to do with deferred paths**:
|
|
339
|
+
- Document them — don't silently skip
|
|
340
|
+
- Create a follow-up test issue if the deferred path is P2 or higher risk
|
|
341
|
+
- Reference the follow-up issue ID in the current test issue's description
|
|
342
|
+
|
|
343
|
+
## Advisory vs Enforcement Boundary
|
|
344
|
+
|
|
345
|
+
This skill is advisory. It recommends test strategy, creates test issues, and flags anti-patterns. It does not block code execution or enforce pass/fail decisions — that is the job of hooks and quality gates.
|
|
346
|
+
|
|
347
|
+
| Concern | Who owns it | How enforced |
|
|
348
|
+
|---|---|---|
|
|
349
|
+
| Test strategy selection (TDD vs contract vs unit) | This skill | Recommendation only |
|
|
350
|
+
| Anti-pattern detection in test issues | This skill | Checklist in issue description |
|
|
351
|
+
| Priority assignment | This skill | Heuristics table above |
|
|
352
|
+
| DoD template in issue description | This skill | Template pasted into bd issue |
|
|
353
|
+
| CI test pass/fail | quality-gates hook | PostToolUse hook blocks on test failures |
|
|
354
|
+
| Test file lint/type correctness | quality-gates hook | ESLint + mypy on every edit |
|
|
355
|
+
| Branch not mergeable without tests | Not enforced | Human review — no automated gate today |
|
|
356
|
+
| Claiming work without test issue existing | Not enforced | Human judgment — skill creates test issue at closure if missing |
|
|
357
|
+
|
|
358
|
+
**Example — advisory boundary in practice**:
|
|
359
|
+
|
|
360
|
+
You are planning tests for `.14` (async HTTP client). This skill:
|
|
361
|
+
- Classifies as boundary layer ✓
|
|
362
|
+
- Recommends live-contract tests ✓
|
|
363
|
+
- Creates a test issue with DoD template ✓
|
|
364
|
+
- Flags if you try to describe tests that only mock the HTTP layer ✓ (anti-pattern 4)
|
|
365
|
+
|
|
366
|
+
It does NOT:
|
|
367
|
+
- Block `.14` from closing if the test issue isn't done
|
|
368
|
+
- Fail the build if the test issue is open
|
|
369
|
+
- Require approval before the implementation is merged
|
|
370
|
+
|
|
371
|
+
The test issue is a tracked commitment, not a gate. Gating is opt-in via `bd dep` dependencies you set up during planning.
|
|
372
|
+
|
|
373
|
+
## v1.1 Format Examples
|
|
374
|
+
|
|
375
|
+
### Example A — Planning phase, boundary + shell epic
|
|
376
|
+
|
|
377
|
+
Epic: "Implement gitnexus MCP sync in xtrm install"
|
|
378
|
+
|
|
379
|
+
Children: `.1` (MCP config writer), `.2` (sync-on-install integration), `.3` (CLI `xtrm mcp` command)
|
|
380
|
+
|
|
381
|
+
Classification:
|
|
382
|
+
- `.1` → boundary (writes to `.mcp.json`, file I/O)
|
|
383
|
+
- `.2` → shell (orchestrates install flow)
|
|
384
|
+
- `.3` → shell (CLI command)
|
|
385
|
+
|
|
386
|
+
Test issues created:
|
|
387
|
+
|
|
388
|
+
```
|
|
389
|
+
bd create "Test: MCP config writer — contract tests for .mcp.json output" \
|
|
390
|
+
-t task -p 2 --parent <epic> \
|
|
391
|
+
-d "Layer: boundary
|
|
392
|
+
Strategy: example-based (file I/O, no external service)
|
|
393
|
+
Covers: .1
|
|
394
|
+
|
|
395
|
+
Assertions required:
|
|
396
|
+
- [ ] Positive path: valid servers config produces correct .mcp.json structure
|
|
397
|
+
- [ ] Negative path: invalid server entry raises validation error
|
|
398
|
+
- [ ] Edge cases: empty servers list, duplicate server names, existing .mcp.json is merged not overwritten
|
|
399
|
+
- [ ] Drift-safe: assert on field presence (name, command, args), not internal object identity
|
|
400
|
+
|
|
401
|
+
Critical paths covered:
|
|
402
|
+
- gitnexus server entry written with correct stdio transport — risk: wrong transport breaks MCP
|
|
403
|
+
- existing user entries preserved during merge — risk: data loss
|
|
404
|
+
|
|
405
|
+
Known deferred paths:
|
|
406
|
+
- test with malformed existing .mcp.json → follow-up: to be created (P3)
|
|
407
|
+
|
|
408
|
+
Done when: all assertions pass, no shared state between tests."
|
|
409
|
+
```
|
|
410
|
+
|
|
411
|
+
```
|
|
412
|
+
bd create "Test: xtrm install MCP sync + xtrm mcp CLI — integration tests" \
|
|
413
|
+
-t task -p 2 --parent <epic> \
|
|
414
|
+
-d "Layer: shell
|
|
415
|
+
Strategy: integration (subprocess)
|
|
416
|
+
Covers: .2, .3
|
|
417
|
+
|
|
418
|
+
Assertions required:
|
|
419
|
+
- [ ] End-to-end: xtrm install writes correct .mcp.json in temp project dir
|
|
420
|
+
- [ ] CLI: xtrm mcp list outputs expected server names
|
|
421
|
+
- [ ] Failure-mode: xtrm mcp add with duplicate name exits non-zero with clear error
|
|
422
|
+
- [ ] Cross-component: install flow calls MCP writer with correct config
|
|
423
|
+
|
|
424
|
+
Critical paths covered:
|
|
425
|
+
- full install → .mcp.json present and readable by Claude Code — risk: MCP servers not available
|
|
426
|
+
- CLI add + list roundtrip — risk: user cannot inspect installed servers
|
|
427
|
+
|
|
428
|
+
Known deferred paths:
|
|
429
|
+
- test with no write permission on project dir → follow-up: to be created (P4)
|
|
430
|
+
|
|
431
|
+
Done when: integration tests run against real file system in temp dir, no mocked internals."
|
|
432
|
+
```
|
|
433
|
+
|
|
434
|
+
---
|
|
435
|
+
|
|
436
|
+
### Example B — Closure gate, core layer, implementation diverged
|
|
437
|
+
|
|
438
|
+
Closing `.22` (config merge logic). Existing test issue `.31` was written before implementation.
|
|
439
|
+
|
|
440
|
+
What `.22` actually built:
|
|
441
|
+
- Added precedence chain: env > file > defaults (original plan had only file > defaults)
|
|
442
|
+
- Added type coercion for boolean env vars ("true"/"false" → bool)
|
|
443
|
+
- Removed support for `.xtrm.yaml` (only `.xtrm/config.json` now)
|
|
444
|
+
|
|
445
|
+
Updated test issue `.31`:
|
|
446
|
+
|
|
447
|
+
```
|
|
448
|
+
bd update xtrm-31 --notes "Scope updated after .22 completed:
|
|
449
|
+
+ Add test: env var takes precedence over file config (new precedence chain)
|
|
450
|
+
+ Add test: 'true'/'false' env vars coerced to bool correctly
|
|
451
|
+
+ Add test: 'TRUE', '1', '0' edge cases for bool coercion
|
|
452
|
+
+ Remove test: .xtrm.yaml loading (format removed in .22)
|
|
453
|
+
|
|
454
|
+
Anti-pattern check:
|
|
455
|
+
- [ ] tautological: none detected
|
|
456
|
+
- [ ] over-mocking: env injection via monkeypatch only, no internal mocking
|
|
457
|
+
- [ ] shared state: each test resets env via fixture
|
|
458
|
+
|
|
459
|
+
Critical paths covered:
|
|
460
|
+
- env > file > defaults chain — risk: wrong precedence silently overrides user config
|
|
461
|
+
- bool coercion — risk: 'false' string treated as truthy in Python
|
|
462
|
+
|
|
463
|
+
Known deferred paths:
|
|
464
|
+
- test with missing HOME dir (pathlib resolution edge case) → follow-up: xtrm-4x (P4)"
|
|
465
|
+
```
|
|
@@ -78,7 +78,7 @@ gitnexus_impact({target: "parseComposeServices", direction: "upstream"})
|
|
|
78
78
|
get_symbols_overview("hooks/init.ts") # map file
|
|
79
79
|
find_symbol("parseComposeServices", include_body=True) # read just this
|
|
80
80
|
replace_symbol_body("parseComposeServices", newBody) # Serena edit
|
|
81
|
-
bd close bd-xyz --reason="Fix YAML parse edge case" # close
|
|
81
|
+
bd close bd-xyz --reason="Fix YAML parse edge case" # close issue
|
|
82
82
|
xt end # push, PR, merge, cleanup
|
|
83
83
|
```
|
|
84
84
|
|
|
@@ -0,0 +1,149 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: xt-debugging
|
|
3
|
+
description: Complete debugging workflow — error analysis, log interpretation, performance profiling, and GitNexus call-chain tracing. Use when investigating bugs, errors, crashes, or performance issues.
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# xt-debugging
|
|
7
|
+
|
|
8
|
+
Systematic debugging using the GitNexus knowledge graph for call-chain tracing, combined with error analysis, log interpretation, and performance profiling.
|
|
9
|
+
|
|
10
|
+
## When to Use
|
|
11
|
+
|
|
12
|
+
- "Why is this function failing?"
|
|
13
|
+
- "Trace where this error comes from"
|
|
14
|
+
- "This endpoint returns 500"
|
|
15
|
+
- Investigating crashes, unexpected behavior, or regressions
|
|
16
|
+
- Performance issues
|
|
17
|
+
- Reading logs or stack traces
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Prerequisites
|
|
22
|
+
|
|
23
|
+
GitNexus must be indexed before starting. If you see "index is stale" or no results:
|
|
24
|
+
|
|
25
|
+
```bash
|
|
26
|
+
npx gitnexus analyze # re-index the repo (run this, then retry)
|
|
27
|
+
npx gitnexus status # verify freshness
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## Phase 1 — Triage
|
|
33
|
+
|
|
34
|
+
Understand the symptom before touching any code.
|
|
35
|
+
|
|
36
|
+
1. Read the full error message and stack trace
|
|
37
|
+
2. Identify the suspect symbol (function, class, endpoint)
|
|
38
|
+
3. Check for regressions — what changed recently?
|
|
39
|
+
```
|
|
40
|
+
gitnexus_detect_changes({scope: "compare", base_ref: "main"})
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Phase 2 — Knowledge Graph Investigation
|
|
46
|
+
|
|
47
|
+
```
|
|
48
|
+
1. gitnexus_query({query: "<error text or symptom>"}) → Related execution flows + symbols
|
|
49
|
+
2. gitnexus_context({name: "<suspect>"}) → Callers, callees, process participation
|
|
50
|
+
3. READ gitnexus://repo/{name}/process/{processName} → Full step-by-step execution trace
|
|
51
|
+
4. gitnexus_cypher({...}) → Custom call chain if needed
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
### Patterns by Symptom
|
|
55
|
+
|
|
56
|
+
| Symptom | Approach |
|
|
57
|
+
|---------|----------|
|
|
58
|
+
| Error message | `query` for error text → `context` on throw site |
|
|
59
|
+
| Wrong return value | `context` on function → trace callees for data flow |
|
|
60
|
+
| Intermittent failure | `context` → look for external calls, async deps, race conditions |
|
|
61
|
+
| Performance issue | `context` → find hot-path symbols with many callers |
|
|
62
|
+
| Recent regression | `detect_changes` to see what changed |
|
|
63
|
+
|
|
64
|
+
### Example — "Payment endpoint returns 500 intermittently"
|
|
65
|
+
|
|
66
|
+
```
|
|
67
|
+
1. gitnexus_query({query: "payment error handling"})
|
|
68
|
+
→ Processes: CheckoutFlow, ErrorHandling
|
|
69
|
+
→ Symbols: validatePayment, handlePaymentError
|
|
70
|
+
|
|
71
|
+
2. gitnexus_context({name: "validatePayment"})
|
|
72
|
+
→ Outgoing calls: verifyCard, fetchRates (external API!)
|
|
73
|
+
|
|
74
|
+
3. READ gitnexus://repo/my-app/process/CheckoutFlow
|
|
75
|
+
→ Step 3: validatePayment → calls fetchRates (external, no timeout)
|
|
76
|
+
|
|
77
|
+
4. Root cause: fetchRates has no timeout → intermittent failures under load
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Phase 3 — Root Cause Analysis
|
|
83
|
+
|
|
84
|
+
1. **Reproduce** — Identify minimal reproduction steps
|
|
85
|
+
2. **Trace data flow** — Follow the value/control flow from input to error
|
|
86
|
+
3. **Isolate** — Narrow to the smallest failing unit
|
|
87
|
+
4. **Hypothesize** — Form explicit hypothesis before reading more code
|
|
88
|
+
5. **Confirm** — Verify hypothesis against source, not just symptoms
|
|
89
|
+
|
|
90
|
+
Common root cause categories:
|
|
91
|
+
- **Null/undefined** — missing guard, wrong assumption about data shape
|
|
92
|
+
- **Race condition** — async ordering, missing await, shared mutable state
|
|
93
|
+
- **External dependency** — timeout, API contract change, env difference
|
|
94
|
+
- **Type mismatch** — serialization, casting, implicit coercion
|
|
95
|
+
- **Configuration** — env var missing, wrong default, deployment drift
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
## Phase 4 — Log Analysis
|
|
100
|
+
|
|
101
|
+
1. Read the full log output (not just the last line)
|
|
102
|
+
2. Identify: timestamps, error levels, request IDs, correlation tokens
|
|
103
|
+
3. Correlate events across log lines to build timeline
|
|
104
|
+
4. Look for: first occurrence, frequency, affected subset, preceding events
|
|
105
|
+
5. Summarize: what happened, when, why, and what was affected
|
|
106
|
+
|
|
107
|
+
---
|
|
108
|
+
|
|
109
|
+
## Phase 5 — Performance Profiling
|
|
110
|
+
|
|
111
|
+
1. **Measure baseline** — Never optimize blind
|
|
112
|
+
```bash
|
|
113
|
+
time <command>
|
|
114
|
+
```
|
|
115
|
+
2. **Profile** — Language-appropriate tools:
|
|
116
|
+
- Node.js: `--prof`, `clinic.js`, `0x`
|
|
117
|
+
- Python: `cProfile`, `py-spy`, `line_profiler`
|
|
118
|
+
- Go: `pprof`
|
|
119
|
+
3. **Identify hotspot** — Usually 1–2 functions account for >80% of time; use `gitnexus_context` to confirm call frequency
|
|
120
|
+
4. **Fix the bottleneck** — Minimal targeted change
|
|
121
|
+
5. **Verify** — Measure again, compare against baseline
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## Phase 6 — Remediation
|
|
126
|
+
|
|
127
|
+
1. **Fix** — Minimal change that addresses the confirmed root cause
|
|
128
|
+
2. **Verify** — Run the failing case to confirm fix
|
|
129
|
+
3. **Regression test** — Add a test that would have caught this bug
|
|
130
|
+
4. **Check blast radius** — `gitnexus_impact({target: "fixedSymbol", direction: "upstream"})`
|
|
131
|
+
5. **Pre-commit scope check** — `gitnexus_detect_changes({scope: "staged"})`
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
## Checklist
|
|
136
|
+
|
|
137
|
+
```
|
|
138
|
+
- [ ] Read full error / stack trace
|
|
139
|
+
- [ ] Identify suspect symbol
|
|
140
|
+
- [ ] gitnexus_detect_changes to check for regressions
|
|
141
|
+
- [ ] gitnexus_query for error text or symptom
|
|
142
|
+
- [ ] gitnexus_context on suspect (callers, callees, processes)
|
|
143
|
+
- [ ] Trace execution flow via process resource
|
|
144
|
+
- [ ] Read source files to confirm root cause
|
|
145
|
+
- [ ] Form explicit hypothesis before fixing
|
|
146
|
+
- [ ] Verify fix against failing reproduction
|
|
147
|
+
- [ ] Add regression test
|
|
148
|
+
- [ ] gitnexus_detect_changes() before committing
|
|
149
|
+
```
|
|
@@ -78,6 +78,33 @@ If changes cannot be classified safely, stop with `BLOCKED_DIRTY_UNCLASSIFIED_CH
|
|
|
78
78
|
|
|
79
79
|
---
|
|
80
80
|
|
|
81
|
+
## Stage 2.5 — Scope Verification
|
|
82
|
+
|
|
83
|
+
Before committing or pushing, verify that branch changes match the expected scope of the session's closed issues. **Never skip this step** — unreviewed scope is the primary source of oversized or unintended PRs.
|
|
84
|
+
|
|
85
|
+
Run:
|
|
86
|
+
```bash
|
|
87
|
+
git diff --stat origin/main..HEAD 2>/dev/null || git diff --stat $(git merge-base HEAD main)..HEAD
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
For each significantly changed symbol, check blast radius:
|
|
91
|
+
```bash
|
|
92
|
+
npx gitnexus impact <symbol-name> # upstream dependants — what else breaks
|
|
93
|
+
npx gitnexus impact <symbol-name> -d downstream # downstream dependencies
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
For Claude agents with MCP access, also run:
|
|
97
|
+
```
|
|
98
|
+
gitnexus_detect_changes({scope: "compare", base_ref: "main"})
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
**Rules:**
|
|
102
|
+
- Changes clearly tied to session issues → continue
|
|
103
|
+
- Files unrelated to session issues → classify as overscoped (handle in Stage 3D)
|
|
104
|
+
- `npx gitnexus impact` returns HIGH or CRITICAL risk on a changed symbol → **stop and report to user** before continuing
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
81
108
|
## Stage 3 — Dry Run and Anomaly Detection
|
|
82
109
|
|
|
83
110
|
Run:
|
|
@@ -260,6 +287,7 @@ If linkage had to be inferred from commits rather than detected by `xt end`, say
|
|
|
260
287
|
|
|
261
288
|
The autonomous rule is simple:
|
|
262
289
|
- normalize the session
|
|
290
|
+
- verify scope with gitnexus before committing
|
|
263
291
|
- dry-run
|
|
264
292
|
- auto-fix predictable anomalies
|
|
265
293
|
- rerun dry-run
|