context-mode 1.0.62 → 1.0.63
- package/.claude-plugin/marketplace.json +2 -2
- package/.claude-plugin/plugin.json +1 -1
- package/.openclaw-plugin/openclaw.plugin.json +1 -1
- package/.openclaw-plugin/package.json +1 -1
- package/build/executor.d.ts +0 -3
- package/build/executor.js +5 -15
- package/build/server.js +3 -14
- package/cli.bundle.mjs +81 -81
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
- package/server.bundle.mjs +73 -73
- package/skills/context-mode-ops/SKILL.md +21 -0
- package/skills/context-mode-ops/triage-issue.md +55 -7
- package/skills/context-mode-ops/validation.md +69 -0
@@ -7,6 +7,26 @@ description: Manage context-mode GitHub issues, PRs, releases, and marketing wit

 Parallel subagent army for issue triage, PR review, and releases.

+## Claim Verification: BLOCKING GATE
+
+<claim_verification_enforcement>
+STOP. Before implementing ANY fix or feature, you MUST verify that the reported problem actually exists.
+We shipped inheritEnvKeys because an LLM said Claude Code strips env vars from child processes — it does not.
+We got burned shipping a fix for an unverified claim. Never again.
+
+RULE: No code without proof. Every bug must be reproduced. Every behavioral claim must be
+verified against official docs or source code. LLM knowledge about platform behavior is NOT evidence.
+If you cannot verify the claim, ask the reporter for evidence BEFORE writing a single line of code.
+</claim_verification_enforcement>
+
+**Read [validation.md](validation.md) Problem Verification section FIRST.** Summary:
+
+1. **Bug reports**: Reproduce locally or request reproduction steps. No repro = no fix.
+2. **Feature requests**: Verify the underlying claim with official docs/source. Never trust LLM assertions about how platforms behave.
+3. **Performance claims**: Benchmark it. "Should be faster" is not evidence.
+4. **Cannot verify?** Comment on the issue asking for `ctx-debug.sh` output and repro steps. Do NOT implement speculatively.
+5. Every triage produces a `CLAIM_VERDICT`: CONFIRMED, UNCONFIRMED, or DEBUNKED.
+
 ## TDD-First: BLOCKING GATE

 <tdd_enforcement>
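As an illustration of the rule that performance claims need measured numbers rather than "should be faster", a before/after timing could be sketched in TypeScript as below. This is not part of the package diff: `meanMillis`, `compareTimings`, and the two sample functions are hypothetical names, and `Date.now()` only has millisecond resolution, so a real benchmark would likely want a finer timer and more iterations.

```typescript
// Run `fn` `iterations` times and return the mean duration per call in milliseconds.
function meanMillis(fn: () => unknown, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  return (Date.now() - start) / iterations;
}

// Hypothetical evidence record: the two measured numbers a perf claim should cite.
interface TimingEvidence {
  beforeMs: number;
  afterMs: number;
}

// Measure the current implementation and the proposed one under the same load.
function compareTimings(
  before: () => unknown,
  after: () => unknown,
  iterations: number
): TimingEvidence {
  return {
    beforeMs: meanMillis(before, iterations),
    afterMs: meanMillis(after, iterations),
  };
}

// Sample workloads (assumed for illustration): a loop versus a closed-form sum.
const loopSum = (): number => {
  let total = 0;
  for (let i = 0; i < 10000; i++) total += i;
  return total;
};
const closedFormSum = (): number => (9999 * 10000) / 2;

const timings = compareTimings(loopSum, closedFormSum, 200);
```

The resulting `timings` object is the kind of number pair that could be pasted into an issue as evidence; the actual values depend on the machine and are not predicted here.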
@@ -74,6 +94,7 @@ Never use curl/wget to GitHub API. `gh` handles auth, pagination, and rate limit
 ## Validation (Every Workflow)

 Before shipping ANY change, validate per [validation.md](validation.md):
+- [ ] **Problem verified** — claim reproduced or confirmed with hard evidence (CLAIM_VERDICT logged)
 - [ ] ENV vars verified against real platform source (not LLM hallucinations)
 - [ ] All 12 adapter tests pass: `npx vitest run tests/adapters/`
 - [ ] TypeScript compiles: `npm run typecheck`
@@ -76,7 +76,49 @@ Agents to spawn:
 9. OS Compatibility Architect (CLI runs on all OS)
 ```

-### 4.
+### 4. Claim Verification — BLOCKING GATE
+
+<claim_verification_enforcement>
+STOP. Before ANY agent writes implementation code, the claim in the issue MUST be verified
+with hard evidence. We shipped inheritEnvKeys because an LLM said Claude Code strips env vars
+— it doesn't. We got burned shipping a fix for an unverified claim. Never again.
+</claim_verification_enforcement>
+
+**Every issue makes a claim. Verify it BEFORE coding.**
+
+| Issue Type | Required Evidence | How to Get It |
+|------------|-------------------|---------------|
+| **Bug report** | Reproduce locally with a failing test or command | Run the exact steps from the report. If it doesn't fail, the bug may not exist. |
+| **Feature request claiming behavior X** | Prove behavior X actually happens | Check official docs, source code, or web search. NOT LLM knowledge — LLMs hallucinate platform behavior. |
+| **Feature request claiming perf issue** | Benchmark the actual impact | Measure before/after. No "it should be faster" — show numbers. |
+| **"Tool X sets env var Y"** | Find it in official source | `ctx_fetch_and_index` the platform's docs/source. Grep their repo. If you can't find it, it probably doesn't exist. |
+
+**Verification Steps:**
+
+1. **Architect agents** must produce a `CLAIM_VERDICT` before any Staff Engineer writes code:
+```
+CLAIM: "{exact claim from the issue}"
+EVIDENCE: {link to official doc, source file, or reproduction output}
+VERDICT: CONFIRMED | UNCONFIRMED | HALLUCINATED
+```
+
+2. If `VERDICT: UNCONFIRMED` — do NOT implement. Instead, comment on the issue:
+```
+We couldn't reproduce/verify this claim. Could you provide:
+- Debug output from: npx context-mode doctor (or ctx-debug.sh)
+- Exact steps to reproduce
+- Platform version and OS
+
+We want to fix this but need to confirm the problem exists first.
+```
+
+3. If `VERDICT: HALLUCINATED` — the reporter (or their LLM) made up a behavior that doesn't exist. Comment kindly explaining the misunderstanding. Close with "working as intended" if appropriate.
+
+4. Only `VERDICT: CONFIRMED` proceeds to the Investigation Phase below.
+
+**The `ctx-debug.sh` script exists for exactly this purpose.** When in doubt, ask the reporter to run it and paste the output.
+
+### 5. Investigation Phase (Parallel)

 All agents investigate simultaneously:

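The gate described in the verification steps of this hunk can be sketched as a small routing function. This is an illustrative sketch, not code from the package: the verdict names come from the `CLAIM_VERDICT` template, while `nextStep`, `mayWriteCode`, and the step labels are hypothetical.

```typescript
// Verdict names as they appear in the CLAIM_VERDICT template of the triage workflow.
type TriageVerdict = "CONFIRMED" | "UNCONFIRMED" | "HALLUCINATED";

// Hypothetical labels for what happens next; only the routing follows the steps.
type NextStep = "investigate" | "request-evidence" | "explain-and-maybe-close";

// Map a verdict to the follow-up the verification steps prescribe.
function nextStep(verdict: TriageVerdict): NextStep {
  switch (verdict) {
    case "CONFIRMED":
      // Step 4: only a confirmed claim proceeds to the Investigation Phase.
      return "investigate";
    case "UNCONFIRMED":
      // Step 2: do not implement; ask the reporter for debug output and repro steps.
      return "request-evidence";
    case "HALLUCINATED":
      // Step 3: explain the misunderstanding kindly, close if appropriate.
      return "explain-and-maybe-close";
  }
}

// The blocking gate itself: implementation code is allowed only after CONFIRMED.
function mayWriteCode(verdict: TriageVerdict): boolean {
  return nextStep(verdict) === "investigate";
}
```

Modeling the verdict as a closed union means a new verdict value cannot be added without the compiler pointing at the switch, which suits a gate that is meant to be blocking.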
@@ -98,7 +140,7 @@ All agents investigate simultaneously:
 - Run full affected adapter tests
 - Report: DRAFT_FIX with RED→GREEN evidence for each behavior

-###
+### 6. Ping-Pong Review

 Route Staff Engineer outputs to their paired Architects:

@@ -110,7 +152,7 @@ EM reads Staff Engineer result
 → Max 2 rounds, then EM decides
 ```

-###
+### 7. Validate (QA Engineer)

 QA Engineer runs the full validation matrix:

@@ -142,7 +184,7 @@ TypeScript: ✓ no errors
 Full Suite: ✓ 47/47 passed
 ```

-###
+### 8. Push Directly to `next`

 **Do NOT open a PR.** Push fixes directly to the `next` branch:

@@ -167,7 +209,7 @@ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
 git push origin next
 ```

-###
+### 9. Comment on Issue & Close

 After pushing to `next`, comment and **close the issue immediately**:

@@ -198,8 +240,14 @@ gh issue close {N}
 ## Decision Tree: Fix vs. Wontfix vs. Needs Info

 ```
-Issue
-├── YES →
+Issue makes a claim about platform behavior?
+├── YES → Run Claim Verification (Step 4) FIRST
+│ ├── CONFIRMED → Fix it (steps 5-9 above)
+│ ├── UNCONFIRMED → Request evidence (ctx-debug.sh output, repro steps)
+│ └── HALLUCINATED → Explain kindly, close if appropriate
+│
+Issue is clear and reproducible (no behavioral claim)?
+├── YES → Fix it (steps 5-9 above)
 ├── UNCLEAR → Comment asking for reproduction steps
 │ └── Template: "Could you share the exact command/config that triggers this?"
 └── BY DESIGN → Explain why, close with "working as intended" label
@@ -2,6 +2,74 @@

 Cross-cutting validation rules used by ALL workflows (triage, review, release).

+## Problem Verification — FIRST GATE
+
+<problem_verification_enforcement>
+This is the FIRST validation step, before anything else. We shipped inheritEnvKeys because
+we trusted an LLM claim that Claude Code strips environment variables — it does not.
+We got burned shipping a fix for an unverified claim. Never again.
+Every bug report, feature request, and behavioral claim MUST be proven true before code is written.
+</problem_verification_enforcement>
+
+### For Bug Reports
+
+**Reproduce it or reject it.** Run the exact reproduction steps from the issue. If it doesn't fail, the bug may not exist.
+
+```
+Step 1: Extract the claimed reproduction steps from the issue
+Step 2: Run them locally (use ctx_execute or a test)
+Step 3: Record the ACTUAL output
+Step 4: Compare actual vs. claimed behavior
+Step 5: VERDICT:
+→ REPRODUCED: Bug is real, proceed to fix
+→ NOT_REPRODUCED: Ask reporter for ctx-debug.sh output and exact repro steps
+→ INVALID: Reporter's environment is misconfigured, help them fix it
+```
+
+### For Feature Requests
+
+**Verify the underlying claim.** Feature requests always contain an implicit claim ("X behaves this way", "Y is slow", "Z doesn't support W"). Prove the claim first.
+
+```
+Step 1: Identify the claim (e.g., "Claude Code strips env vars from child processes")
+Step 2: Find HARD EVIDENCE — official docs, source code, or measured benchmarks
+→ Use ctx_fetch_and_index on official docs/repos
+→ Use ctx_execute to run actual tests
+→ NEVER trust LLM knowledge about platform behavior — LLMs hallucinate this constantly
+Step 3: VERDICT:
+→ CONFIRMED: Claim is true, proceed to design
+→ UNCONFIRMED: Cannot verify — ask reporter for evidence before implementing
+→ DEBUNKED: Claim is false — comment on issue explaining the misunderstanding
+```
+
+### Requesting Evidence from Reporters
+
+When a claim cannot be verified, comment on the issue BEFORE implementing:
+
+```markdown
+We want to address this but need to verify the underlying behavior first.
+Could you provide:
+1. Output from: `npx context-mode doctor` (or run `ctx-debug.sh`)
+2. Exact reproduction steps
+3. Platform version, adapter, and OS
+
+We'll investigate as soon as we can confirm the issue. Thanks for reporting!
+```
+
+### Evidence Log
+
+Every triage MUST produce a verification entry:
+
+```
+CLAIM: "{exact claim}"
+SOURCE: {issue number or PR}
+EVIDENCE: {link to doc, test output, or benchmark result}
+VERDICT: CONFIRMED | UNCONFIRMED | DEBUNKED
+ACTION: {proceed | request-info | close-as-invalid}
+```
+
+---
+
 ## ENV Variable Verification

 LLMs frequently hallucinate environment variables. Every ENV var in an issue or PR must be verified.
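The Evidence Log entry added in this hunk (CLAIM, SOURCE, EVIDENCE, VERDICT, ACTION) can be modeled as a typed record with a formatter. The sketch below is illustrative only and is not taken from the package: `EvidenceLogEntry` and `formatEvidenceLog` are hypothetical names, the source and evidence strings are placeholders, and pairing `DEBUNKED` with `close-as-invalid` is an assumption based on the order in which the log format lists verdicts and actions.

```typescript
// Verdicts and actions exactly as listed in the Evidence Log format.
type Verdict = "CONFIRMED" | "UNCONFIRMED" | "DEBUNKED";
type Action = "proceed" | "request-info" | "close-as-invalid";

// Hypothetical record type; one field per line of the log entry.
interface EvidenceLogEntry {
  claim: string;    // exact claim, quoted from the issue
  source: string;   // issue number or PR
  evidence: string; // link to doc, test output, or benchmark result
  verdict: Verdict;
  action: Action;
}

// Render an entry in the five-line text layout of the Evidence Log.
function formatEvidenceLog(entry: EvidenceLogEntry): string {
  return [
    `CLAIM: "${entry.claim}"`,
    `SOURCE: ${entry.source}`,
    `EVIDENCE: ${entry.evidence}`,
    `VERDICT: ${entry.verdict}`,
    `ACTION: ${entry.action}`,
  ].join("\n");
}

// Example entry for the claim the skill files cite as false.
// The source and evidence values are placeholders, not real references.
const exampleEntry = formatEvidenceLog({
  claim: "Claude Code strips env vars from child processes",
  source: "{issue number or PR}",
  evidence: "{link to doc, test output, or benchmark result}",
  verdict: "DEBUNKED",
  action: "close-as-invalid", // assumed pairing with DEBUNKED
});
```

Keeping the verdict and action as string-literal unions means a typo such as `"DEBUNKD"` fails at compile time instead of producing a malformed log entry.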
@@ -229,6 +297,7 @@ npm run typecheck

 Every change, regardless of workflow, must pass:

+- [ ] **Problem verified** — CLAIM_VERDICT is CONFIRMED with hard evidence (this is gate zero)
 - [ ] `npm run typecheck` — 0 errors
 - [ ] `npm test` — all pass
 - [ ] Adapter tests — all 12 pass (or N/A if untouched)