context-mode 1.0.62 → 1.0.63

@@ -7,6 +7,26 @@ description: Manage context-mode GitHub issues, PRs, releases, and marketing wit
 
  Parallel subagent army for issue triage, PR review, and releases.
 
+ ## Claim Verification: BLOCKING GATE
+
+ <claim_verification_enforcement>
+ STOP. Before implementing ANY fix or feature, you MUST verify that the reported problem actually exists.
+ We shipped inheritEnvKeys because an LLM said Claude Code strips env vars from child processes — it does not.
+ We got burned shipping a fix for an unverified claim. Never again.
+
+ RULE: No code without proof. Every bug must be reproduced. Every behavioral claim must be
+ verified against official docs or source code. LLM knowledge about platform behavior is NOT evidence.
+ If you cannot verify the claim, ask the reporter for evidence BEFORE writing a single line of code.
+ </claim_verification_enforcement>
+
+ **Read [validation.md](validation.md) Problem Verification section FIRST.** Summary:
+
+ 1. **Bug reports**: Reproduce locally or request reproduction steps. No repro = no fix.
+ 2. **Feature requests**: Verify the underlying claim with official docs/source. Never trust LLM assertions about how platforms behave.
+ 3. **Performance claims**: Benchmark it. "Should be faster" is not evidence.
+ 4. **Cannot verify?** Comment on the issue asking for `ctx-debug.sh` output and repro steps. Do NOT implement speculatively.
+ 5. Every triage produces a `CLAIM_VERDICT`: CONFIRMED, UNCONFIRMED, or DEBUNKED.
+
  ## TDD-First: BLOCKING GATE
 
  <tdd_enforcement>
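
The claim-verification gate added in the hunk above can be sketched in TypeScript. This is a minimal illustration, not code from the context-mode package: `ClaimVerdict`, `TriageResult`, and `mayImplement` are hypothetical names, and only the three verdict values come from the summary list. Requiring at least one recorded piece of evidence is an assumption drawn from the "No code without proof" rule.

```typescript
// Hypothetical sketch: none of these names exist in context-mode itself.
// The three verdict values are taken from the summary list in the hunk above.
type ClaimVerdict = "CONFIRMED" | "UNCONFIRMED" | "DEBUNKED";

interface TriageResult {
  claim: string;       // exact claim quoted from the issue
  evidence: string[];  // links to docs, source files, or reproduction output
  verdict: ClaimVerdict;
}

// Assumption: implementation may start only when the claim is CONFIRMED and
// at least one piece of evidence is recorded. A verdict without evidence is
// treated as unverified.
function mayImplement(result: TriageResult): boolean {
  return result.verdict === "CONFIRMED" && result.evidence.length > 0;
}

// The claim named in the hunk above, before anyone has checked it.
const envVarClaim: TriageResult = {
  claim: "Claude Code strips env vars from child processes",
  evidence: [],
  verdict: "UNCONFIRMED",
};

console.log(mayImplement(envVarClaim)); // false
```

Keeping the gate as a single boolean function makes it easy to call before any agent is allowed to write implementation code.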
@@ -74,6 +94,7 @@ Never use curl/wget to GitHub API. `gh` handles auth, pagination, and rate limit
  ## Validation (Every Workflow)
 
  Before shipping ANY change, validate per [validation.md](validation.md):
+ - [ ] **Problem verified** — claim reproduced or confirmed with hard evidence (CLAIM_VERDICT logged)
  - [ ] ENV vars verified against real platform source (not LLM hallucinations)
  - [ ] All 12 adapter tests pass: `npx vitest run tests/adapters/`
  - [ ] TypeScript compiles: `npm run typecheck`
@@ -76,7 +76,49 @@ Agents to spawn:
  9. OS Compatibility Architect (CLI runs on all OS)
  ```
 
- ### 4. Investigation Phase (Parallel)
+ ### 4. Claim Verification — BLOCKING GATE
+
+ <claim_verification_enforcement>
+ STOP. Before ANY agent writes implementation code, the claim in the issue MUST be verified
+ with hard evidence. We shipped inheritEnvKeys because an LLM said Claude Code strips env vars
+ — it doesn't. We got burned shipping a fix for an unverified claim. Never again.
+ </claim_verification_enforcement>
+
+ **Every issue makes a claim. Verify it BEFORE coding.**
+
+ | Issue Type | Required Evidence | How to Get It |
+ |------------|-------------------|---------------|
+ | **Bug report** | Reproduce locally with a failing test or command | Run the exact steps from the report. If it doesn't fail, the bug may not exist. |
+ | **Feature request claiming behavior X** | Prove behavior X actually happens | Check official docs, source code, or web search. NOT LLM knowledge — LLMs hallucinate platform behavior. |
+ | **Feature request claiming perf issue** | Benchmark the actual impact | Measure before/after. No "it should be faster" — show numbers. |
+ | **"Tool X sets env var Y"** | Find it in official source | `ctx_fetch_and_index` the platform's docs/source. Grep their repo. If you can't find it, it probably doesn't exist. |
+
+ **Verification Steps:**
+
+ 1. **Architect agents** must produce a `CLAIM_VERDICT` before any Staff Engineer writes code:
+ ```
+ CLAIM: "{exact claim from the issue}"
+ EVIDENCE: {link to official doc, source file, or reproduction output}
+ VERDICT: CONFIRMED | UNCONFIRMED | HALLUCINATED
+ ```
+
+ 2. If `VERDICT: UNCONFIRMED` — do NOT implement. Instead, comment on the issue:
+ ```
+ We couldn't reproduce/verify this claim. Could you provide:
+ - Debug output from: npx context-mode doctor (or ctx-debug.sh)
+ - Exact steps to reproduce
+ - Platform version and OS
+
+ We want to fix this but need to confirm the problem exists first.
+ ```
+
+ 3. If `VERDICT: HALLUCINATED` — the reporter (or their LLM) made up a behavior that doesn't exist. Comment kindly explaining the misunderstanding. Close with "working as intended" if appropriate.
+
+ 4. Only `VERDICT: CONFIRMED` proceeds to the Investigation Phase below.
+
+ **The `ctx-debug.sh` script exists for exactly this purpose.** When in doubt, ask the reporter to run it and paste the output.
+
+ ### 5. Investigation Phase (Parallel)
 
  All agents investigate simultaneously:
 
@@ -98,7 +140,7 @@ All agents investigate simultaneously:
  - Run full affected adapter tests
  - Report: DRAFT_FIX with RED→GREEN evidence for each behavior
 
- ### 5. Ping-Pong Review
+ ### 6. Ping-Pong Review
 
  Route Staff Engineer outputs to their paired Architects:
 
@@ -110,7 +152,7 @@ EM reads Staff Engineer result
  → Max 2 rounds, then EM decides
  ```
 
- ### 6. Validate (QA Engineer)
+ ### 7. Validate (QA Engineer)
 
  QA Engineer runs the full validation matrix:
 
@@ -142,7 +184,7 @@ TypeScript: ✓ no errors
  Full Suite: ✓ 47/47 passed
  ```
 
- ### 7. Push Directly to `next`
+ ### 8. Push Directly to `next`
 
  **Do NOT open a PR.** Push fixes directly to the `next` branch:
 
@@ -167,7 +209,7 @@ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
  git push origin next
  ```
 
- ### 8. Comment on Issue & Close
+ ### 9. Comment on Issue & Close
 
  After pushing to `next`, comment and **close the issue immediately**:
 
@@ -198,8 +240,14 @@ gh issue close {N}
  ## Decision Tree: Fix vs. Wontfix vs. Needs Info
 
  ```
- Issue is clear and reproducible?
- ├── YES → Fix it (steps 3-8 above)
+ Issue makes a claim about platform behavior?
+ ├── YES → Run Claim Verification (Step 4) FIRST
+ │ ├── CONFIRMED → Fix it (steps 5-9 above)
+ │ ├── UNCONFIRMED → Request evidence (ctx-debug.sh output, repro steps)
+ │ └── HALLUCINATED → Explain kindly, close if appropriate
+
+ Issue is clear and reproducible (no behavioral claim)?
+ ├── YES → Fix it (steps 5-9 above)
  ├── UNCLEAR → Comment asking for reproduction steps
  │ └── Template: "Could you share the exact command/config that triggers this?"
  └── BY DESIGN → Explain why, close with "working as intended" label
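
The revised decision tree in the hunk above can be read as a small routing function. The sketch below is illustrative only: `TriagedIssue`, `NextAction`, and `nextAction` are hypothetical names, and the action labels are shorthand for the tree's right-hand sides rather than identifiers from the package. It uses the `HALLUCINATED` verdict spelling that appears in this tree.

```typescript
// Hypothetical sketch of the decision tree above; all names are illustrative.
type PlatformVerdict = "CONFIRMED" | "UNCONFIRMED" | "HALLUCINATED";
type Clarity = "YES" | "UNCLEAR" | "BY_DESIGN";

type NextAction =
  | "fix"
  | "request-evidence"
  | "explain-close-if-appropriate"
  | "ask-for-repro-steps"
  | "close-working-as-intended";

// An issue either makes a claim about platform behavior (and so carries a
// verdict from Claim Verification) or it does not (and is judged on clarity).
type TriagedIssue =
  | { kind: "platform-claim"; verdict: PlatformVerdict }
  | { kind: "plain"; clarity: Clarity };

function nextAction(issue: TriagedIssue): NextAction {
  if (issue.kind === "platform-claim") {
    // First branch of the tree: Claim Verification (Step 4) decides.
    if (issue.verdict === "CONFIRMED") return "fix";
    if (issue.verdict === "UNCONFIRMED") return "request-evidence";
    return "explain-close-if-appropriate"; // HALLUCINATED
  }
  // Second branch: no behavioral claim, so clarity decides.
  if (issue.clarity === "YES") return "fix";
  if (issue.clarity === "UNCLEAR") return "ask-for-repro-steps";
  return "close-working-as-intended"; // BY_DESIGN
}

console.log(nextAction({ kind: "platform-claim", verdict: "UNCONFIRMED" })); // request-evidence
```

Modeling the two branches as a discriminated union means a verdict cannot be attached to an issue that makes no platform claim, which mirrors the tree's structure.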
@@ -2,6 +2,74 @@
 
  Cross-cutting validation rules used by ALL workflows (triage, review, release).
 
+ ## Problem Verification — FIRST GATE
+
+ <problem_verification_enforcement>
+ This is the FIRST validation step, before anything else. We shipped inheritEnvKeys because
+ we trusted an LLM claim that Claude Code strips environment variables — it does not.
+ We got burned shipping a fix for an unverified claim. Never again.
+ Every bug report, feature request, and behavioral claim MUST be proven true before code is written.
+ </problem_verification_enforcement>
+
+ ### For Bug Reports
+
+ **Reproduce it or reject it.** Run the exact reproduction steps from the issue. If it doesn't fail, the bug may not exist.
+
+ ```
+ Step 1: Extract the claimed reproduction steps from the issue
+ Step 2: Run them locally (use ctx_execute or a test)
+ Step 3: Record the ACTUAL output
+ Step 4: Compare actual vs. claimed behavior
+ Step 5: VERDICT:
+ → REPRODUCED: Bug is real, proceed to fix
+ → NOT_REPRODUCED: Ask reporter for ctx-debug.sh output and exact repro steps
+ → INVALID: Reporter's environment is misconfigured, help them fix it
+ ```
+
+ ### For Feature Requests
+
+ **Verify the underlying claim.** Feature requests always contain an implicit claim ("X behaves this way", "Y is slow", "Z doesn't support W"). Prove the claim first.
+
+ ```
+ Step 1: Identify the claim (e.g., "Claude Code strips env vars from child processes")
+ Step 2: Find HARD EVIDENCE — official docs, source code, or measured benchmarks
+ → Use ctx_fetch_and_index on official docs/repos
+ → Use ctx_execute to run actual tests
+ → NEVER trust LLM knowledge about platform behavior — LLMs hallucinate this constantly
+ Step 3: VERDICT:
+ → CONFIRMED: Claim is true, proceed to design
+ → UNCONFIRMED: Cannot verify — ask reporter for evidence before implementing
+ → DEBUNKED: Claim is false — comment on issue explaining the misunderstanding
+ ```
+
+ ### Requesting Evidence from Reporters
+
+ When a claim cannot be verified, comment on the issue BEFORE implementing:
+
+ ```markdown
+ We want to address this but need to verify the underlying behavior first.
+ Could you provide:
+ 1. Output from: `npx context-mode doctor` (or run `ctx-debug.sh`)
+ 2. Exact reproduction steps
+ 3. Platform version, adapter, and OS
+
+ We'll investigate as soon as we can confirm the issue. Thanks for reporting!
+ ```
+
+ ### Evidence Log
+
+ Every triage MUST produce a verification entry:
+
+ ```
+ CLAIM: "{exact claim}"
+ SOURCE: {issue number or PR}
+ EVIDENCE: {link to doc, test output, or benchmark result}
+ VERDICT: CONFIRMED | UNCONFIRMED | DEBUNKED
+ ACTION: {proceed | request-info | close-as-invalid}
+ ```
+
+ ---
+
  ## ENV Variable Verification
 
  LLMs frequently hallucinate environment variables. Every ENV var in an issue or PR must be verified.
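
The Evidence Log template added earlier in this hunk could be produced by a small helper like the one below. This is a sketch under stated assumptions, not part of context-mode: `EvidenceLogEntry`, `ACTION_FOR_VERDICT`, and `renderEvidenceLog` are hypothetical names. The pairing of each verdict with an action is also an assumption, since the template lists both sets in the same order but never states the mapping.

```typescript
// Hypothetical sketch; the five output fields mirror the Evidence Log template.
type LogVerdict = "CONFIRMED" | "UNCONFIRMED" | "DEBUNKED";
type LogAction = "proceed" | "request-info" | "close-as-invalid";

interface EvidenceLogEntry {
  claim: string;     // exact claim quoted from the issue or PR
  source: string;    // issue number or PR
  evidence: string;  // link to doc, test output, or benchmark result
  verdict: LogVerdict;
}

// Assumed pairing: the source lists verdicts and actions in matching order
// but does not spell out which action belongs to which verdict.
const ACTION_FOR_VERDICT: Record<LogVerdict, LogAction> = {
  CONFIRMED: "proceed",
  UNCONFIRMED: "request-info",
  DEBUNKED: "close-as-invalid",
};

// Render one entry in the template's line order.
function renderEvidenceLog(entry: EvidenceLogEntry): string {
  return [
    `CLAIM: "${entry.claim}"`,
    `SOURCE: ${entry.source}`,
    `EVIDENCE: ${entry.evidence}`,
    `VERDICT: ${entry.verdict}`,
    `ACTION: ${ACTION_FOR_VERDICT[entry.verdict]}`,
  ].join("\n");
}

// Placeholder values in braces are left unfilled on purpose.
const sampleEntry: EvidenceLogEntry = {
  claim: "Claude Code strips env vars from child processes",
  source: "{issue number or PR}",
  evidence: "{link to doc, test output, or benchmark result}",
  verdict: "DEBUNKED",
};

console.log(renderEvidenceLog(sampleEntry));
```

Deriving `ACTION` from `VERDICT` rather than accepting it as free input keeps the two fields from contradicting each other in a logged entry.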
@@ -229,6 +297,7 @@ npm run typecheck
 
  Every change, regardless of workflow, must pass:
 
+ - [ ] **Problem verified** — CLAIM_VERDICT is CONFIRMED with hard evidence (this is gate zero)
  - [ ] `npm run typecheck` — 0 errors
  - [ ] `npm test` — all pass
  - [ ] Adapter tests — all 12 pass (or N/A if untouched)