npm - context-mode - Versions diffs - 1.0.62 → 1.0.63 - Mend

context-mode 1.0.62 → 1.0.63

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14) hide show

package/.claude-plugin/marketplace.json +2 -2
package/.claude-plugin/plugin.json +1 -1
package/.openclaw-plugin/openclaw.plugin.json +1 -1
package/.openclaw-plugin/package.json +1 -1
package/build/executor.d.ts +0 -3
package/build/executor.js +5 -15
package/build/server.js +3 -14
package/cli.bundle.mjs +81 -81
package/openclaw.plugin.json +1 -1
package/package.json +1 -1
package/server.bundle.mjs +73 -73
package/skills/context-mode-ops/SKILL.md +21 -0
package/skills/context-mode-ops/triage-issue.md +55 -7
package/skills/context-mode-ops/validation.md +69 -0

package/skills/context-mode-ops/SKILL.md CHANGED Viewed

@@ -7,6 +7,26 @@ description: Manage context-mode GitHub issues, PRs, releases, and marketing wit
 Parallel subagent army for issue triage, PR review, and releases.
+## Claim Verification: BLOCKING GATE
+<claim_verification_enforcement>
+STOP. Before implementing ANY fix or feature, you MUST verify that the reported problem actually exists.
+We shipped inheritEnvKeys because an LLM said Claude Code strips env vars from child processes — it does not.
+We got burned shipping a fix for an unverified claim. Never again.
+RULE: No code without proof. Every bug must be reproduced. Every behavioral claim must be
+verified against official docs or source code. LLM knowledge about platform behavior is NOT evidence.
+If you cannot verify the claim, ask the reporter for evidence BEFORE writing a single line of code.
+</claim_verification_enforcement>
+**Read [validation.md](validation.md) Problem Verification section FIRST.** Summary:
+1. **Bug reports**: Reproduce locally or request reproduction steps. No repro = no fix.
+2. **Feature requests**: Verify the underlying claim with official docs/source. Never trust LLM assertions about how platforms behave.
+3. **Performance claims**: Benchmark it. "Should be faster" is not evidence.
+4. **Cannot verify?** Comment on the issue asking for `ctx-debug.sh` output and repro steps. Do NOT implement speculatively.
+5. Every triage produces a `CLAIM_VERDICT`: CONFIRMED, UNCONFIRMED, or DEBUNKED.
 ## TDD-First: BLOCKING GATE
 <tdd_enforcement>
@@ -74,6 +94,7 @@ Never use curl/wget to GitHub API. `gh` handles auth, pagination, and rate limit
 ## Validation (Every Workflow)
 Before shipping ANY change, validate per [validation.md](validation.md):
+- [ ] **Problem verified** — claim reproduced or confirmed with hard evidence (CLAIM_VERDICT logged)
 - [ ] ENV vars verified against real platform source (not LLM hallucinations)
 - [ ] All 12 adapter tests pass: `npx vitest run tests/adapters/`
 - [ ] TypeScript compiles: `npm run typecheck`

package/skills/context-mode-ops/triage-issue.md CHANGED Viewed

@@ -76,7 +76,49 @@ Agents to spawn:
 9. OS Compatibility Architect (CLI runs on all OS)
 ```
-### 4. Investigation Phase (Parallel)
+### 4. Claim Verification — BLOCKING GATE
+<claim_verification_enforcement>
+STOP. Before ANY agent writes implementation code, the claim in the issue MUST be verified
+with hard evidence. We shipped inheritEnvKeys because an LLM said Claude Code strips env vars
+— it doesn't. We got burned shipping a fix for an unverified claim. Never again.
+</claim_verification_enforcement>
+**Every issue makes a claim. Verify it BEFORE coding.**
+| Issue Type | Required Evidence | How to Get It |
+|------------|-------------------|---------------|
+| **Bug report** | Reproduce locally with a failing test or command | Run the exact steps from the report. If it doesn't fail, the bug may not exist. |
+| **Feature request claiming behavior X** | Prove behavior X actually happens | Check official docs, source code, or web search. NOT LLM knowledge — LLMs hallucinate platform behavior. |
+| **Feature request claiming perf issue** | Benchmark the actual impact | Measure before/after. No "it should be faster" — show numbers. |
+| **"Tool X sets env var Y"** | Find it in official source | `ctx_fetch_and_index` the platform's docs/source. Grep their repo. If you can't find it, it probably doesn't exist. |
+**Verification Steps:**
+1. **Architect agents** must produce a `CLAIM_VERDICT` before any Staff Engineer writes code:
+   ```
+   CLAIM: "{exact claim from the issue}"
+   EVIDENCE: {link to official doc, source file, or reproduction output}
+   VERDICT: CONFIRMED | UNCONFIRMED | HALLUCINATED
+   ```
+2. If `VERDICT: UNCONFIRMED` — do NOT implement. Instead, comment on the issue:
+   ```
+   We couldn't reproduce/verify this claim. Could you provide:
+   - Debug output from: npx context-mode doctor (or ctx-debug.sh)
+   - Exact steps to reproduce
+   - Platform version and OS
+   We want to fix this but need to confirm the problem exists first.
+   ```
+3. If `VERDICT: HALLUCINATED` — the reporter (or their LLM) made up a behavior that doesn't exist. Comment kindly explaining the misunderstanding. Close with "working as intended" if appropriate.
+4. Only `VERDICT: CONFIRMED` proceeds to the Investigation Phase below.
+**The `ctx-debug.sh` script exists for exactly this purpose.** When in doubt, ask the reporter to run it and paste the output.
+### 5. Investigation Phase (Parallel)
 All agents investigate simultaneously:
@@ -98,7 +140,7 @@ All agents investigate simultaneously:
 - Run full affected adapter tests
 - Report: DRAFT_FIX with RED→GREEN evidence for each behavior
-### 5. Ping-Pong Review
+### 6. Ping-Pong Review
 Route Staff Engineer outputs to their paired Architects:
@@ -110,7 +152,7 @@ EM reads Staff Engineer result
   → Max 2 rounds, then EM decides
 ```
-### 6. Validate (QA Engineer)
+### 7. Validate (QA Engineer)
 QA Engineer runs the full validation matrix:
@@ -142,7 +184,7 @@ TypeScript:    ✓ no errors
 Full Suite:    ✓ 47/47 passed
 ```
-### 7. Push Directly to `next`
+### 8. Push Directly to `next`
 **Do NOT open a PR.** Push fixes directly to the `next` branch:
@@ -167,7 +209,7 @@ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
 git push origin next
 ```
-### 8. Comment on Issue & Close
+### 9. Comment on Issue & Close
 After pushing to `next`, comment and **close the issue immediately**:
@@ -198,8 +240,14 @@ gh issue close {N}
 ## Decision Tree: Fix vs. Wontfix vs. Needs Info
 ```
-Issue is clear and reproducible?
-├── YES → Fix it (steps 3-8 above)
+Issue makes a claim about platform behavior?
+├── YES → Run Claim Verification (Step 4) FIRST
+│   ├── CONFIRMED → Fix it (steps 5-9 above)
+│   ├── UNCONFIRMED → Request evidence (ctx-debug.sh output, repro steps)
+│   └── HALLUCINATED → Explain kindly, close if appropriate
+│
+Issue is clear and reproducible (no behavioral claim)?
+├── YES → Fix it (steps 5-9 above)
 ├── UNCLEAR → Comment asking for reproduction steps
 │   └── Template: "Could you share the exact command/config that triggers this?"
 └── BY DESIGN → Explain why, close with "working as intended" label

package/skills/context-mode-ops/validation.md CHANGED Viewed

@@ -2,6 +2,74 @@
 Cross-cutting validation rules used by ALL workflows (triage, review, release).
+## Problem Verification — FIRST GATE
+<problem_verification_enforcement>
+This is the FIRST validation step, before anything else. We shipped inheritEnvKeys because
+we trusted an LLM claim that Claude Code strips environment variables — it does not.
+We got burned shipping a fix for an unverified claim. Never again.
+Every bug report, feature request, and behavioral claim MUST be proven true before code is written.
+</problem_verification_enforcement>
+### For Bug Reports
+**Reproduce it or reject it.** Run the exact reproduction steps from the issue. If it doesn't fail, the bug may not exist.
+```
+Step 1: Extract the claimed reproduction steps from the issue
+Step 2: Run them locally (use ctx_execute or a test)
+Step 3: Record the ACTUAL output
+Step 4: Compare actual vs. claimed behavior
+Step 5: VERDICT:
+  → REPRODUCED: Bug is real, proceed to fix
+  → NOT_REPRODUCED: Ask reporter for ctx-debug.sh output and exact repro steps
+  → INVALID: Reporter's environment is misconfigured, help them fix it
+```
+### For Feature Requests
+**Verify the underlying claim.** Feature requests always contain an implicit claim ("X behaves this way", "Y is slow", "Z doesn't support W"). Prove the claim first.
+```
+Step 1: Identify the claim (e.g., "Claude Code strips env vars from child processes")
+Step 2: Find HARD EVIDENCE — official docs, source code, or measured benchmarks
+  → Use ctx_fetch_and_index on official docs/repos
+  → Use ctx_execute to run actual tests
+  → NEVER trust LLM knowledge about platform behavior — LLMs hallucinate this constantly
+Step 3: VERDICT:
+  → CONFIRMED: Claim is true, proceed to design
+  → UNCONFIRMED: Cannot verify — ask reporter for evidence before implementing
+  → DEBUNKED: Claim is false — comment on issue explaining the misunderstanding
+```
+### Requesting Evidence from Reporters
+When a claim cannot be verified, comment on the issue BEFORE implementing:
+```markdown
+We want to address this but need to verify the underlying behavior first.
+Could you provide:
+1. Output from: `npx context-mode doctor` (or run `ctx-debug.sh`)
+2. Exact reproduction steps
+3. Platform version, adapter, and OS
+We'll investigate as soon as we can confirm the issue. Thanks for reporting!
+```
+### Evidence Log
+Every triage MUST produce a verification entry:
+```
+CLAIM: "{exact claim}"
+SOURCE: {issue number or PR}
+EVIDENCE: {link to doc, test output, or benchmark result}
+VERDICT: CONFIRMED | UNCONFIRMED | DEBUNKED
+ACTION: {proceed | request-info | close-as-invalid}
+```
+---
 ## ENV Variable Verification
 LLMs frequently hallucinate environment variables. Every ENV var in an issue or PR must be verified.
@@ -229,6 +297,7 @@ npm run typecheck
 Every change, regardless of workflow, must pass:
+- [ ] **Problem verified** — CLAIM_VERDICT is CONFIRMED with hard evidence (this is gate zero)
 - [ ] `npm run typecheck` — 0 errors
 - [ ] `npm test` — all pass
 - [ ] Adapter tests — all 12 pass (or N/A if untouched)