context-mode 1.0.53 → 1.0.56

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. package/.claude-plugin/marketplace.json +2 -2
  2. package/.claude-plugin/plugin.json +1 -1
  3. package/.openclaw-plugin/openclaw.plugin.json +1 -1
  4. package/.openclaw-plugin/package.json +1 -1
  5. package/README.md +103 -32
  6. package/build/adapters/antigravity/index.d.ts +1 -3
  7. package/build/adapters/antigravity/index.js +0 -30
  8. package/build/adapters/claude-code/hooks.d.ts +18 -0
  9. package/build/adapters/claude-code/hooks.js +23 -0
  10. package/build/adapters/claude-code/index.d.ts +1 -3
  11. package/build/adapters/claude-code/index.js +48 -35
  12. package/build/adapters/client-map.js +1 -0
  13. package/build/adapters/codex/index.d.ts +1 -3
  14. package/build/adapters/codex/index.js +1 -31
  15. package/build/adapters/cursor/index.d.ts +1 -3
  16. package/build/adapters/cursor/index.js +0 -11
  17. package/build/adapters/detect.d.ts +1 -0
  18. package/build/adapters/detect.js +18 -2
  19. package/build/adapters/gemini-cli/index.d.ts +1 -3
  20. package/build/adapters/gemini-cli/index.js +0 -30
  21. package/build/adapters/kiro/index.d.ts +1 -3
  22. package/build/adapters/kiro/index.js +0 -30
  23. package/build/adapters/openclaw/index.d.ts +1 -3
  24. package/build/adapters/openclaw/index.js +0 -38
  25. package/build/adapters/opencode/index.d.ts +5 -4
  26. package/build/adapters/opencode/index.js +37 -41
  27. package/build/adapters/types.d.ts +1 -14
  28. package/build/adapters/vscode-copilot/index.d.ts +1 -3
  29. package/build/adapters/vscode-copilot/index.js +0 -32
  30. package/build/adapters/zed/index.d.ts +1 -3
  31. package/build/adapters/zed/index.js +0 -30
  32. package/build/cli.js +12 -28
  33. package/build/executor.d.ts +0 -1
  34. package/build/executor.js +28 -16
  35. package/build/openclaw-plugin.js +12 -34
  36. package/build/opencode-plugin.d.ts +1 -0
  37. package/build/opencode-plugin.js +5 -9
  38. package/build/runtime.js +29 -11
  39. package/build/server.d.ts +2 -0
  40. package/build/server.js +69 -61
  41. package/build/store.d.ts +4 -3
  42. package/build/store.js +101 -34
  43. package/build/truncate.d.ts +4 -17
  44. package/build/truncate.js +4 -52
  45. package/cli.bundle.mjs +184 -157
  46. package/configs/codex/AGENTS.md +19 -0
  47. package/configs/kilo/AGENTS.md +58 -0
  48. package/configs/kilo/kilo.json +10 -0
  49. package/hooks/core/tool-naming.mjs +1 -0
  50. package/hooks/ensure-deps.mjs +80 -2
  51. package/hooks/pretooluse.mjs +25 -20
  52. package/hooks/routing-block.mjs +10 -1
  53. package/hooks/session-snapshot.bundle.mjs +13 -13
  54. package/hooks/sessionstart.mjs +25 -1
  55. package/openclaw.plugin.json +1 -1
  56. package/package.json +1 -1
  57. package/server.bundle.mjs +159 -129
  58. package/skills/context-mode-ops/SKILL.md +111 -0
  59. package/skills/context-mode-ops/agent-teams.md +198 -0
  60. package/skills/context-mode-ops/communication.md +224 -0
  61. package/skills/context-mode-ops/release.md +199 -0
  62. package/skills/context-mode-ops/review-pr.md +269 -0
  63. package/skills/context-mode-ops/tdd.md +329 -0
  64. package/skills/context-mode-ops/triage-issue.md +218 -0
  65. package/skills/context-mode-ops/validation.md +238 -0
  66. package/start.mjs +5 -52
@@ -0,0 +1,329 @@
# Test-Driven Development

<tdd_enforcement>
THIS FILE IS MANDATORY. Every agent, every Staff Engineer, every Architect MUST follow this.
If you skip TDD, your work will be REJECTED. There are no exceptions.
Do NOT write implementation code before you have a failing test.
</tdd_enforcement>

> Source: [mattpocock/skills/tdd](https://github.com/mattpocock/skills/tree/main/tdd) — embedded with context-mode enforcement.

## Philosophy

**Core principle**: Tests should verify behavior through public interfaces, not implementation details. Code can change entirely; tests shouldn't.

**Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it. A good test reads like a specification — "user can checkout with valid cart" tells you exactly what capability exists. These tests survive refactors because they don't care about internal structure.

**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means (like querying a database directly instead of using the interface). The warning sign: your test breaks when you refactor, but behavior hasn't changed. If you rename an internal function and tests fail, those tests were testing implementation, not behavior.

## Anti-Pattern: Horizontal Slices

**DO NOT write all tests first, then all implementation.** This is "horizontal slicing" — treating RED as "write all tests" and GREEN as "write all code."

This produces **crap tests**:

- Tests written in bulk test _imagined_ behavior, not _actual_ behavior
- You end up testing the _shape_ of things (data structures, function signatures) rather than user-facing behavior
- Tests become insensitive to real changes — they pass when behavior breaks, fail when behavior is fine
- You outrun your headlights, committing to test structure before understanding the implementation

**Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle. Because you just wrote the code, you know exactly what behavior matters and how to verify it.

```
WRONG (horizontal):
RED: test1, test2, test3, test4, test5
GREEN: impl1, impl2, impl3, impl4, impl5

RIGHT (vertical):
RED→GREEN: test1→impl1
RED→GREEN: test2→impl2
RED→GREEN: test3→impl3
...
```

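One full vertical slice can be sketched like this (illustrative only — `parseFlag` is a hypothetical function, and the vitest wrapper the suite normally uses is elided in favor of bare assertions):

```typescript
// GREEN: minimal implementation, written only AFTER the assertions below
// were seen to fail (RED) with parseFlag undefined.
function parseFlag(args: string[], name: string): boolean {
  return args.includes(`--${name}`);
}

// The test exercises only the public interface — it survives any
// internal refactor of parseFlag.
console.assert(parseFlag(["--verbose"], "verbose") === true);
console.assert(parseFlag([], "verbose") === false);
```

The next slice (say, `--name=value` parsing) would get its own RED→GREEN cycle rather than being specified up front.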
## Workflow

### 1. Planning

Before writing any code:

- [ ] Identify what behaviors need to change or be added
- [ ] List the behaviors to test (not implementation steps)
- [ ] Identify opportunities for deep modules (small interface, deep implementation)
- [ ] Design interfaces for testability

**You can't test everything.** Focus testing effort on critical paths and complex logic, not every possible edge case.

### 2. Tracer Bullet

For the first behavior:

```
RED: Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
```

Then refactor:

- [ ] Extract duplication
- [ ] Deepen modules (move complexity behind simple interfaces)
- [ ] Apply SOLID principles where natural
- [ ] Consider what new code reveals about existing code
- [ ] Run tests after each refactor step

**Never refactor while RED.** Get to GREEN first.

### 3. Next Behavior

```
RED: Write next test → fails
GREEN: Minimal code to pass → passes
```

Refactor again. Repeat until all behaviors are covered.

---

## Good and Bad Tests

### Good Tests (Integration-Style)

```typescript
// GOOD: Tests observable behavior
test("user can checkout with valid cart", async () => {
  const cart = createCart();
  cart.add(product);
  const result = await checkout(cart, paymentMethod);
  expect(result.status).toBe("confirmed");
});
```

Characteristics:

- Tests behavior users/callers care about
- Uses public API only
- Survives internal refactors
- Describes WHAT, not HOW
- One logical assertion per test

### Bad Tests (Implementation-Coupled)

```typescript
// BAD: Tests implementation details
test("checkout calls paymentService.process", async () => {
  const processSpy = jest.spyOn(paymentService, "process");
  await checkout(cart, payment);
  expect(processSpy).toHaveBeenCalledWith(cart.total);
});
```

Red flags:

- Mocking internal collaborators
- Testing private methods
- Asserting on call counts/order
- Test breaks when refactoring without behavior change
- Test name describes HOW not WHAT
- Verifying through external means instead of interface

```typescript
// BAD: Bypasses interface to verify
test("createUser saves to database", async () => {
  await createUser({ name: "Alice" });
  const row = await db.query("SELECT * FROM users WHERE name = ?", ["Alice"]);
  expect(row).toBeDefined();
});

// GOOD: Verifies through interface
test("createUser makes user retrievable", async () => {
  const user = await createUser({ name: "Alice" });
  const retrieved = await getUser(user.id);
  expect(retrieved.name).toBe("Alice");
});
```

---

## When to Mock

Mock at **system boundaries** only:

- External APIs (payment, email, etc.)
- Databases (sometimes — prefer test DB)
- Time/randomness
- File system (sometimes)

Don't mock:

- Your own classes/modules
- Internal collaborators
- Anything you control

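Time, for example, can be kept at the boundary by injecting a clock instead of calling `Date.now()` inside the logic (a sketch; `Clock` and `isExpired` are hypothetical names, not part of this codebase):

```typescript
// Injected time source: production passes Date.now, tests pass a stub.
type Clock = { now: () => number };

// Expiry logic depends on time, but the source is a parameter,
// so no mocking framework is needed.
function isExpired(expiresAtMs: number, clock: Clock): boolean {
  return clock.now() >= expiresAtMs;
}

// Deterministic in tests: freeze the clock at a known instant.
const frozenClock: Clock = { now: () => 1_000 };
console.assert(isExpired(999, frozenClock) === true);
console.assert(isExpired(1_001, frozenClock) === false);
```

In production code this would be called as `isExpired(deadline, { now: Date.now })`.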
### Designing for Mockability

**1. Use dependency injection**

Pass external dependencies in rather than creating them internally:

```typescript
// Easy to mock
function processPayment(order, paymentClient) {
  return paymentClient.charge(order.total);
}

// Hard to mock
function processPayment(order) {
  const client = new StripeClient(process.env.STRIPE_KEY);
  return client.charge(order.total);
}
```

**2. Prefer SDK-style interfaces over generic fetchers**

Create specific functions for each external operation instead of one generic function with conditional logic:

```typescript
// GOOD: Each function is independently mockable
const api = {
  getUser: (id) => fetch(`/users/${id}`),
  getOrders: (userId) => fetch(`/users/${userId}/orders`),
  createOrder: (data) =>
    fetch("/orders", { method: "POST", body: JSON.stringify(data) }),
};

// BAD: Mocking requires conditional logic inside the mock
const api = {
  fetch: (endpoint, options) => fetch(endpoint, options),
};
```

The SDK approach means:

- Each mock returns one specific shape
- No conditional logic in test setup
- Easier to see which endpoints a test exercises
- Type safety per endpoint

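In a test, the SDK shape lets you substitute a stub with no branching (a sketch; the `UserApi` type and `greet` function are hypothetical):

```typescript
// The narrow interface the code under test depends on.
type UserApi = { getUser: (id: string) => Promise<{ name: string }> };

// Hypothetical code under test: only touches the interface.
async function greet(api: UserApi, id: string): Promise<string> {
  const user = await api.getUser(id);
  return `Hello, ${user.name}!`;
}

// Stub returns one specific shape — no endpoint-string conditionals.
const stubApi: UserApi = { getUser: async () => ({ name: "Alice" }) };

greet(stubApi, "u1").then((greeting) => {
  console.assert(greeting === "Hello, Alice!");
});
```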
---

## Interface Design for Testability

Good interfaces make testing natural:

1. **Accept dependencies, don't create them**

   ```typescript
   // Testable
   function processOrder(order, paymentGateway) {}

   // Hard to test
   function processOrder(order) {
     const gateway = new StripeGateway();
   }
   ```

2. **Return results, don't produce side effects**

   ```typescript
   // Testable
   function calculateDiscount(cart): Discount {}

   // Hard to test
   function applyDiscount(cart): void {
     cart.total -= discount;
   }
   ```

3. **Small surface area**
   - Fewer methods = fewer tests needed
   - Fewer params = simpler test setup

---

## Deep Modules

From "A Philosophy of Software Design":

**Deep module** = small interface + lots of implementation

```
┌─────────────────────┐
│   Small Interface   │ ← Few methods, simple params
├─────────────────────┤
│                     │
│                     │
│ Deep Implementation │ ← Complex logic hidden
│                     │
│                     │
└─────────────────────┘
```

**Shallow module** = large interface + little implementation (avoid)

```
┌─────────────────────────────────┐
│         Large Interface         │ ← Many methods, complex params
├─────────────────────────────────┤
│       Thin Implementation       │ ← Just passes through
└─────────────────────────────────┘
```

When designing interfaces, ask:

- Can I reduce the number of methods?
- Can I simplify the parameters?
- Can I hide more complexity inside?

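In code, a deep module might look like this (an illustrative sketch — context-mode's own modules are not necessarily structured this way): a two-method cache interface hiding all eviction bookkeeping.

```typescript
// Small interface: two methods, simple params.
interface Cache {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Deep implementation: LRU eviction hidden entirely behind the interface.
// Callers never see the recency bookkeeping.
function createLruCache(capacity: number): Cache {
  const entries = new Map<string, string>();
  return {
    get(key) {
      if (!entries.has(key)) return undefined;
      const value = entries.get(key)!;
      // Re-insert so Map iteration order reflects recency of use.
      entries.delete(key);
      entries.set(key, value);
      return value;
    },
    set(key, value) {
      entries.delete(key);
      entries.set(key, value);
      if (entries.size > capacity) {
        // Evict the least recently used entry (first in iteration order).
        entries.delete(entries.keys().next().value!);
      }
    },
  };
}

const cache = createLruCache(2);
cache.set("a", "1");
cache.set("b", "2");
cache.get("a");      // touch "a" so "b" becomes least recently used
cache.set("c", "3"); // evicts "b"
console.assert(cache.get("b") === undefined);
console.assert(cache.get("a") === "1");
```

Swapping the eviction policy later would not break a single caller — the test surface is just `get` and `set`.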
---

## Refactor Candidates

After each TDD cycle, look for:

- **Duplication** → Extract function/class
- **Long methods** → Break into private helpers (keep tests on public interface)
- **Shallow modules** → Combine or deepen
- **Feature envy** → Move logic to where data lives
- **Primitive obsession** → Introduce value objects
- **Existing code** the new code reveals as problematic

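For primitive obsession, a value object turns an implicit convention into a checked invariant (a sketch; `Money` is a hypothetical type, not part of this codebase):

```typescript
// Value object: amount and currency travel together, so a discount in EUR
// can no longer be silently applied to a USD total.
class Money {
  constructor(
    readonly amount: number,
    readonly currency: string,
  ) {}

  add(other: Money): Money {
    if (other.currency !== this.currency) {
      throw new Error(`Cannot add ${other.currency} to ${this.currency}`);
    }
    return new Money(this.amount + other.amount, this.currency);
  }
}

// Before: checkout(total: number, discount: number) — units are implicit.
// After: the type carries the invariant.
const total = new Money(100, "USD").add(new Money(25, "USD"));
console.assert(total.amount === 125 && total.currency === "USD");
```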
---

## context-mode Specific Rules

### CONTRIBUTING.md Is the Authority

**Read `CONTRIBUTING.md` before writing any test.** It defines:

- Test file organization (which file to put your test in)
- TDD workflow (Red-Green-Refactor)
- Output quality comparison (before/after)
- Local development setup

**Do NOT create new test files.** `CONTRIBUTING.md` has the complete test file mapping. Add your tests to the existing file that covers the same domain. If no file fits, ask the maintainer.

### CI Builds Bundles — You Don't

**Do NOT run `npm run build` or `npm run bundle`.** Bundle files (`server.bundle.mjs`, `cli.bundle.mjs`) are generated by GitHub CI automatically. Never create, modify, or push bundle files. You only run:

```bash
npm test             # vitest — validates behavior
npm run typecheck    # tsc --noEmit — validates types
```

That's it. No build. No bundle. CI handles the rest.

### TDD Enforcement in Subagents

Every Staff Engineer agent MUST include this in their prompt:

```
MANDATORY TDD — your work will be REJECTED without this:
1. Read CONTRIBUTING.md for test file organization — do NOT create new test files
2. Write a failing test FIRST in the correct existing test file
3. Run: npx vitest run tests/{file} — MUST FAIL
4. Write minimal code to pass
5. Run: npx vitest run tests/{file} — MUST PASS
6. Refactor if needed, tests stay green
7. Report RED→GREEN evidence:
   "RED: test 'detects opencode via env var' — FAIL (expected)"
   "GREEN: added env check in detect.ts — PASS"
Without this evidence, your PR is auto-rejected.
```
@@ -0,0 +1,218 @@
# Triage Issue Workflow

## Trigger

User says: "triage issue #N", "fix issue #N", "analyze issue #N"

## Step-by-Step

### 1. Gather Intelligence (ONE batch call)

Use `ctx_batch_execute` to gather everything in ONE call:

```javascript
{
  commands: [
    { label: "issue-body", command: "gh issue view {N} --json title,body,labels,state,comments,author,createdAt" },
    { label: "issue-comments", command: "gh issue view {N} --comments" },
    { label: "recent-related-prs", command: "gh pr list --state all --limit 10 --json number,title,state,headRefName" },
    { label: "source-tree", command: "find src -type f -name '*.ts' | sort" },
    { label: "test-tree", command: "find tests -type f -name '*.test.ts' | sort" },
    { label: "open-issues", command: "gh issue list --state open --limit 20 --json number,title,labels" }
  ],
  queries: [
    "issue title description problem",
    "affected adapter platform",
    "error message stack trace",
    "environment variables mentioned",
    "OS platform specific",
    "related PRs and issues"
  ]
}
```

+
32
+ ### 2. Classify Domains
33
+
34
+ From the gathered intelligence, identify:
35
+
36
+ - [ ] **Affected adapters** — which of the 12 platforms?
37
+ - [ ] **Affected OS** — macOS, Linux, Windows, or all?
38
+ - [ ] **Core modules** — server, store, executor, session, hooks?
39
+ - [ ] **Issue type** — bug, feature request, question, discussion?
40
+ - [ ] **Severity** — breaking (can't use tool), degraded (works but wrong), cosmetic
41
+
42
+ ### 3. Spawn Agent Army
43
+
44
+ Based on classification, spawn from [agent-teams.md](agent-teams.md):
45
+
46
+ ```
47
+ ALWAYS spawn:
48
+ ├── Context Mode Architect (reviews everything)
49
+ ├── QA Engineer (runs all tests)
50
+ ├── DX Engineer (checks user-facing quality)
51
+
52
+ IF adapter X is affected:
53
+ ├── {X} Architect
54
+ ├── {X} Staff Engineer
55
+
56
+ IF OS-specific:
57
+ ├── OS Compatibility Architect
58
+ ├── {macOS|Linux|Windows} Staff Engineer
59
+
60
+ IF domain-specific:
61
+ ├── {Domain} Architect
62
+ └── (Staff Engineer if code changes needed)
63
+ ```
64
+
65
+ **Example: Issue #208 "CLI upgrade full support for Opencode/Kilocode"**
66
+ ```
67
+ Agents to spawn:
68
+ 1. Context Mode Architect
69
+ 2. QA Engineer
70
+ 3. DX Engineer
71
+ 4. OpenCode Architect
72
+ 5. OpenCode Staff Engineer
73
+ 6. Kilo Architect
74
+ 7. Kilo Staff Engineer
75
+ 8. Hooks Architect (CLI upgrade touches hooks)
76
+ 9. OS Compatibility Architect (CLI runs on all OS)
77
+ ```
78
+
79
+ ### 4. Investigation Phase (Parallel)
80
+
81
+ All agents investigate simultaneously:
82
+
83
+ **Architects** research:
84
+ - Read relevant source files
85
+ - Check if claimed behavior actually exists
86
+ - Validate ENV vars against real platform docs (use WebSearch + Context7)
87
+ - Review related closed issues for prior art
88
+ - Report: FINDINGS with specific file:line references
89
+
90
+ **Staff Engineers** prepare (TDD-first per [tdd.md](tdd.md)):
91
+ - Read the code that needs changing
92
+ - **RED**: Write a failing test that reproduces the bug / specifies new behavior
93
+ - Run test — verify it **FAILS** (if it passes, the test is useless)
94
+ - **GREEN**: Write minimal code to make the test pass
95
+ - Run test — verify it **PASSES**
96
+ - **REFACTOR**: Clean up while keeping tests green
97
+ - Repeat for each behavior (vertical slices, never horizontal)
98
+ - Run full affected adapter tests
99
+ - Report: DRAFT_FIX with RED→GREEN evidence for each behavior
100
+
101
+ ### 5. Ping-Pong Review
102
+
103
+ Route Staff Engineer outputs to their paired Architects:
104
+
105
+ ```
106
+ EM reads Staff Engineer result
107
+ → Sends to Architect via Agent(SendMessage)
108
+ → Architect reviews: APPROVED or CHANGES_NEEDED
109
+ → If CHANGES_NEEDED: route back to Staff Engineer
110
+ → Max 2 rounds, then EM decides
111
+ ```
112
+
113
+ ### 6. Validate (QA Engineer)
114
+
115
+ QA Engineer runs the full validation matrix:
116
+
117
+ ```shell
118
+ # All adapter tests
119
+ npx vitest run tests/adapters/
120
+
121
+ # Core tests
122
+ npx vitest run tests/core/
123
+
124
+ # Full suite
125
+ npm test
126
+
127
+ # TypeScript
128
+ npm run typecheck
129
+ ```
130
+
131
+ Report as a matrix:
132
+
133
+ ```
134
+ Adapter Tests:
135
+ ✓ claude-code ✓ gemini-cli ✓ opencode
136
+ ✓ openclaw ✓ kilo ✓ codex
137
+ ✓ vscode-copilot ✓ cursor ✓ antigravity
138
+ ✓ kiro ✓ pi ✓ zed
139
+
140
+ Core Tests: ✓ routing ✓ search ✓ server ✓ cli
141
+ TypeScript: ✓ no errors
142
+ Full Suite: ✓ 47/47 passed
143
+ ```
144
+
145
+ ### 7. Push Directly to `next`
146
+
147
+ **Do NOT open a PR.** Push fixes directly to the `next` branch:
148
+
149
+ ```bash
150
+ # Ensure we're on next
151
+ git checkout next
152
+ git pull origin next
153
+
154
+ # Apply changes from worktree agents
155
+ # ... (merge worktree changes)
156
+
157
+ # Commit with issue reference
158
+ git commit -m "fix: {concise description} (closes #{N})
159
+
160
+ - {what was broken}
161
+ - {what was fixed}
162
+ - {which adapters/modules affected}
163
+
164
+ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
165
+
166
+ # Push to next
167
+ git push origin next
168
+ ```
169
+
170
+ ### 8. Comment on Issue & Close
171
+
172
+ After pushing to `next`, comment and **close the issue immediately**:
173
+
174
+ ```bash
175
+ gh issue comment {N} --body "$(cat <<'EOF'
176
+ Hey @{author}! 👋
177
+
178
+ We investigated this and pushed a fix to the `next` branch ({commit_sha}).
179
+
180
+ **What was happening:** {technical explanation of the root cause}
181
+
182
+ **What we fixed:** {technical explanation of the fix}
183
+
184
+ **Affected area:** {adapter/module names}
185
+
186
+ This will ship in the next release. Once it's out, could you please test it in your setup and let us know if it resolves the issue? 🙏
187
+
188
+ If the fix doesn't work for you, feel free to reopen this issue.
189
+
190
+ Thanks for reporting this!
191
+ EOF
192
+ )"
193
+
194
+ # Close the issue — fix is pushed, job done
195
+ gh issue close {N}
196
+ ```
197
+
198
+ ## Decision Tree: Fix vs. Wontfix vs. Needs Info
199
+
200
+ ```
201
+ Issue is clear and reproducible?
202
+ ├── YES → Fix it (steps 3-8 above)
203
+ ├── UNCLEAR → Comment asking for reproduction steps
204
+ │ └── Template: "Could you share the exact command/config that triggers this?"
205
+ └── BY DESIGN → Explain why, close with "working as intended" label
206
+ └── Be kind — explain the design decision
207
+ ```
208
+
209
+ ## Edge Cases
210
+
211
+ ### Issue references a feature that doesn't exist
212
+ The issue author may have been told by an LLM that a feature exists when it doesn't. Use [validation.md](validation.md) ENV verification to catch this. Comment explaining the misunderstanding kindly.
213
+
214
+ ### Issue is a duplicate
215
+ Link to the original issue, close as duplicate, thank the reporter.
216
+
217
+ ### Issue is actually a feature request
218
+ Re-label, add to backlog discussion, don't close — let the community weigh in.