npm - @heart-of-gold/toolkit - Versions diffs - 0.1.20 → 0.1.22 - Mend

@heart-of-gold/toolkit 0.1.20 → 0.1.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8) hide show

package/package.json +1 -1
package/plugins/deep-thought/skills/brainstorm/SKILL.md +30 -0
package/plugins/deep-thought/skills/plan/SKILL.md +44 -8
package/plugins/deep-thought/skills/review/SKILL.md +9 -2
package/plugins/guide/skills/claude-code/SKILL.md +90 -190
package/plugins/marvin/skills/compound/SKILL.md +5 -1
package/plugins/marvin/skills/work/SKILL.md +11 -0
package/src/index.ts +1 -1

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@heart-of-gold/toolkit",
-  "version": "0.1.20",
+  "version": "0.1.22",
   "type": "module",
   "description": "Cross-platform installer for Heart of Gold skills — works with Codex, OpenCode, Pi, Claude Code, and more",
   "bin": {

package/plugins/deep-thought/skills/brainstorm/SKILL.md CHANGED Viewed

@@ -112,6 +112,13 @@ Launch research agents **in parallel**:
 **If no relevant findings:** Say so. Don't invent relevance.
+If the task is design-heavy, copy-heavy, or boundary-sensitive, also surface:
+- relevant references already present in the repo
+- anti-references or known bad patterns from past work
+- whether a preview artifact will likely be required before autonomous implementation
+The goal is not only "what exists?" It is also "what should the future plan pull toward and stay away from?"
 **Exit:** Findings presented. User has seen what exists before exploring approaches.
 ---
@@ -129,6 +136,16 @@ Ask one question at a time. Start broad (purpose, users), narrow to specifics (c
 **If any open questions emerge:** You MUST ask the user about each one. Do not assume answers or defer them silently.
+If the chosen approach depends on taste, hierarchy, copy quality, workshop framing, or boundary judgment, you MUST also capture before leaving this phase:
+- target outcome
+- anti-goals
+- references
+- anti-references
+- tone or taste rules
+- representative proof slice
+- explicit rejection criteria
+- whether preview artifacts will be required
 **Exit when:**
 - The approach is clear and the user signals a decision
 - You've explored enough to choose (2-3 approaches with tradeoffs)
@@ -223,6 +240,19 @@ related:
 ## Why This Approach
 {Decision rationale — what it optimizes for, why alternatives were rejected}
+## Subjective Contract (when needed)
+- Target outcome: {What the result should feel or read like}
+- Anti-goals: {What it must not become}
+- References: {Positive models or repo examples}
+- Anti-references: {Patterns or tones to avoid}
+- Tone or taste rules: {Editorial, design, or teaching constraints}
+- Rejection criteria: {Concrete reasons to say the result is wrong}
+## Preview And Proof Slice (when needed)
+- Proof slice: {One representative slice to prove first}
+- Required preview artifacts: {HTML mockup, ASCII preview, screenshot comp, etc.}
+- Rollout rule: {When this can propagate broadly}
 ## Key Design Decisions
 ### Q1: {Decision topic} — RESOLVED

package/plugins/deep-thought/skills/plan/SKILL.md CHANGED Viewed

@@ -105,6 +105,13 @@ Search past plans for similar features. Surface proven patterns and past risks:
 See `../knowledge/active-memory-integration.md` for retrieval patterns.
+If the task is design-heavy, copy-heavy, or boundary-sensitive, also search for:
+- existing preview artifacts or mockups
+- repo docs that define tone, design, or boundary rules
+- prior plans that succeeded or drifted on subjective work
+Surface both positive models and anti-models. Strong autonomous planning needs references and anti-references, not just similar code.
 **Exit:** Codebase patterns known, past solutions surfaced, constraints identified.
 ### Autonomy Gate (Medium Challenge)
@@ -175,18 +182,32 @@ confidence: high | medium | low
 **All plans include:**
 1. **Title and one-line summary**
 2. **Problem Statement** — What's wrong or missing? Why does this matter?
-3. **Proposed Solution** — High-level approach
-4. **Implementation Tasks** — Checkboxes with dependency ordering. These become the tracker for `/work`.
-5. **Acceptance Criteria** — How do we know it's done? Measurable, testable.
+3. **Target End State** — What should be true when this lands?
+4. **Scope and Non-Goals** — What is explicitly outside this change?
+5. **Proposed Solution** — High-level approach
+6. **Implementation Tasks** — Checkboxes with dependency ordering. These become the tracker for `/work`.
+7. **Acceptance Criteria** — How do we know it's done? Measurable, testable.
 **Standard and detailed plans also include:**
-6. **Decision Rationale** — Why this approach? Alternatives considered? Tradeoffs?
-7. **Assumptions** — What must be true for this plan to work? (see Assumption Audit below)
-8. **Risk Analysis** — What could go wrong? How do we mitigate it?
+8. **Decision Rationale** — Why this approach? Alternatives considered? Tradeoffs?
+9. **Constraints and Boundaries** — Architectural, editorial, release, privacy, or operating rules that stay fixed
+10. **Assumptions** — What must be true for this plan to work? (see Assumption Audit below)
+11. **Risk Analysis** — What could go wrong? How do we mitigate it?
 **Detailed plans also include:**
-9. **Phased Implementation** — Phases with exit criteria per phase
-10. **References** — Links to brainstorm, relevant code, past solutions
+12. **Phased Implementation** — Phases with exit criteria per phase
+13. **References** — Links to brainstorm, relevant code, past solutions
+**For subjective or boundary-sensitive plans, also include:**
+- **Target outcome** — the felt or perceived result, not just the structural change
+- **Anti-goals** — what the result must not become
+- **References** — positive models or repo examples
+- **Anti-references** — patterns or tones to avoid
+- **Tone or taste rules** — explicit editorial, design, or teaching constraints
+- **Representative proof slice** — one slice to prove before propagation
+- **Rollout rule** — when it is safe to spread the pattern
+- **Rejection criteria** — what makes the result wrong even if it compiles
+- **Required preview artifacts** — the concrete previews needed before autonomous `work` starts
 **Confidence calibration (stated in frontmatter and body):**
 - **High:** Clear requirements + existing codebase patterns + bounded scope
@@ -234,6 +255,21 @@ Before finalizing, identify the assumptions the plan depends on and run the Recu
 See `../knowledge/discovery-patterns.md` → "Recursive Why" for the loop technique.
+### Subjective Contract And Preview Gate
+If the task materially changes UI, copy, information architecture, facilitation flow, or a trust boundary judged partly by human review, the plan must additionally include:
+- target outcome
+- anti-goals
+- references
+- anti-references
+- tone or taste rules
+- representative proof slice
+- rollout rule
+- rejection criteria
+- required preview artifacts
+For design-heavy tasks, autonomous `work` should not start until at least one preview artifact exists. Acceptable preview forms include an HTML/static mockup, a terminal-friendly structural preview, or another concrete representation that makes drift obvious before implementation. The plan should also say who reviews the preview and what failure sends the work back to planning.
 **Exit:** Plan document written.
 ---

package/plugins/deep-thought/skills/review/SKILL.md CHANGED Viewed

@@ -76,6 +76,7 @@ Then gather project context:
 2. Check what the change touches — auth, scoring, data, migrations, money → extra scrutiny
 3. Read the related plan or brainstorm if referenced in commits or PR description
 4. Note what's tested and what's not in the diff
+5. If the work is design-heavy, copy-heavy, or boundary-sensitive, check whether the plan included non-goals, references, anti-references, proof-slice discipline, and preview artifacts
 **Exit:** Conventions loaded, risk areas identified, related context read.
@@ -118,13 +119,16 @@ See `../knowledge/socratic-patterns.md` for CoVe technique details.
 ### For Document Reviews
-Read the full document — don't skim. Evaluate against five criteria:
+Read the full document — don't skim. Evaluate against eight criteria:
 1. **Does it explain WHY?** Decision rationale, not just "we'll use X."
 2. **Are risks identified?** What could go wrong? Mitigations?
 3. **Is the scope clear?** Explicit in/out of scope?
 4. **Are acceptance criteria measurable?** Testable? "Users can do X" not "the system is good."
-5. **Is it actionable?** Could someone start `/work` from this right now?
+5. **Are constraints and non-goals explicit?** Could autonomous work stay inside the lane?
+6. **Is propagation discipline clear?** Proof slice first or broad rollout?
+7. **If the work is subjective or boundary-sensitive, is the contract explicit?** References, anti-references, tone rules, rejection criteria.
+8. **If the work is design-heavy, is there a preview gate?** Concrete preview artifacts and a review step before implementation.
 ### For Architecture Reviews
@@ -220,6 +224,9 @@ Use **AskUserQuestion** with:
 - [ ] Checked scope (clear in/out)
 - [ ] Checked criteria (measurable, testable)
 - [ ] Checked actionability (can `/work` start from this?)
+- [ ] Checked non-goals and constraints (especially for autonomous work)
+- [ ] Checked subjective contract completeness when applicable
+- [ ] Checked preview artifacts and preview gate when applicable
 ## When NOT to Use /review

package/plugins/guide/skills/claude-code/SKILL.md CHANGED Viewed

@@ -5,259 +5,159 @@ description: Use when the user asks to run Claude Code CLI (`claude`, `claude re
 # Claude Code CLI Skill Guide
-Use this skill to hand a bounded task to Claude Code from the current harness, capture the result, and summarize it back to the user.
+Use this skill to run Claude Code directly, capture Claude's actual output, and summarize it back to the user.
-This skill is task-based, not review-only. Anthropic's `plan` mode is a read-only analysis mode, but because Claude Code frames it as "Plan Mode," do not treat it as the default for reviews. Prefer `default` for normal reviews and `acceptEdits` for implementation work unless you specifically want strict read-only behavior.
-The canonical execution path is direct `claude` CLI usage, mirroring the `codex` skill's direct `codex exec` pattern. Keep the workflow simple:
-- Ask which model to use.
-- Run one direct `claude` command.
-- If it works, summarize Claude's output.
-- If it fails or hangs, report that clearly and stop.
-Do not turn a failed Claude run into an improvised shell pipeline, background polling loop, or manual artifact-concatenation workaround unless the user explicitly asks for that style of invocation.
+Mirror the `codex` skill's style: short, direct, and command-first. Do not improvise. Do not use the wrapper as the normal path.
 ## Available Models
 | Model | Best for |
 | --- | --- |
-| `default` | Anthropic's recommended default for the current account and environment |
 | `sonnet` | Default recommendation for most coding, review, and analysis tasks |
 | `opus` | Stronger reasoning for ambiguous, high-stakes, or architecture-heavy work |
 | `haiku` | Fast, lightweight follow-ups and narrow questions |
-| `opusplan` | Hybrid mode that uses `opus` for planning and `sonnet` for execution |
-Prefer aliases over hardcoded snapshot names in this skill, because Anthropic documents aliases as the stable Claude Code interface and moves them forward as newer snapshots ship.
+| `default` | Anthropic's account/environment default |
+| `opusplan` | Planning-heavy tasks that intentionally mix planning and execution |
-Default recommendation: `sonnet` for most tasks, `opus` when depth matters more than speed, and `opusplan` when the task naturally alternates between planning and execution.
+Default recommendation: `sonnet` for most tasks, `opus` when depth matters more than speed.
 ## Permission Modes
-Choose the permission mode based on what Claude Code needs to do:
+Choose the permission mode yourself based on the task. Do not ask the user unless they explicitly want to control it.
-| Mode | Use when | Notes |
-| --- | --- | --- |
-| `plan` | You explicitly want strict read-only analysis with no command execution or edits | Best for exploration, planning, or artifact-only analysis; not the default recommendation for review |
-| `default` | Claude may need to inspect the repo more freely and you can tolerate permission prompts | Good interactive default; less reliable in headless automation unless permissions are pre-arranged |
-| `acceptEdits` | Claude should be able to make local code changes | Strong default for implementation, refactors, and fixes |
-| `bypassPermissions` | Full automation is required in a trusted sandbox and the user explicitly approves it | High risk; do not default to this |
+| Mode | Use when |
+| --- | --- |
+| `default` | Normal review, analysis, debugging, and repo inspection |
+| `acceptEdits` | Claude should edit files or implement changes |
+| `plan` | The user explicitly wants strict read-only planning / analysis |
+| `bypassPermissions` | Only with explicit user approval in a trusted environment |
-`plan` is not the general default for this skill. Use it only when you specifically want Claude in strict read-only analysis mode.
+Hard rule: do not use `plan` for ordinary reviews just because it sounds safe. For normal reviews, use `default`.
 ## Running a Task
-1. Always ask the user which model alias to use (default: `sonnet`) before running Claude Code, unless the user already specified the model.
-2. Ask for permission mode in the same prompt when the user did not specify it. Default to:
-   - `default` for reviews and analysis
-   - `acceptEdits` for implementation or refactoring work
-   - `plan` only when the user explicitly wants strict read-only analysis
-3. Decide whether Claude should receive the artifact directly or discover it itself:
-   - Pass diffs, logs, or file contents via stdin for safe read-only review
-   - Let Claude inspect the working tree when the task requires tool use
-4. Assemble the command with the appropriate options:
-   - `-p, --print` for non-interactive output
-   - `--output-format <text|json|stream-json>`
+1. Ask the user which model to use (default: `sonnet`) in one short question, unless they already specified it.
+2. Infer the permission mode from the task:
+   - review / analysis / debugging: `default`
+   - implementation / refactor / fix: `acceptEdits`
+   - strict read-only planning requested by the user: `plan`
+3. Pick a sane `--max-turns`:
+   - narrow question or follow-up: `3`
+   - normal file or diff review: `6`
+   - multi-directory repo review or investigation: `8`
+   - implementation / refactor: `10` to `12`
+4. Do not add `--effort high` by default. Use the CLI default unless the user explicitly wants deeper reasoning or the task is unusually ambiguous.
+5. Assemble the direct `claude` command with the appropriate options:
+   - `-p, --print`
+   - `--output-format text` for short runs that should complete quickly
+   - `--output-format stream-json --verbose` for long repo reviews, investigations, or any run where you need progress visibility
    - `--model <MODEL>`
    - `--permission-mode <MODE>`
-   - `--effort <low|medium|high|max>`
-   - `--max-turns <N>` to bound cost and runtime
-   - `--allowedTools` / `--disallowedTools` to constrain tool access
-   - `--add-dir <DIR>` when Claude must read outside the current working directory
-   - `--name <NAME>` when you want a stable, human-readable session
-   - `"your prompt here"` as the final positional argument
-5. Prefer `--output-format text` for human-readable summaries and `--output-format json` for automation or machine parsing.
-6. In headless automation, prefer explicit permissions:
-   - `default` for most review and analysis runs
-   - `acceptEdits` for edit-capable runs
-   - `plan` only when you specifically want read-only analysis with no command execution
-7. Run the command, capture stdout/stderr, and summarize the outcome for the user.
-8. If Claude Code does not actually return output, stop and report that failure. Do not substitute your own review or analysis and present it as if it came from Claude Code.
-9. Do not add fallback layers automatically. One direct Claude attempt is the normal path. A retry is acceptable only when the user asks for it or when you have a concrete reason to change one thing, such as model, permission mode, or execution environment.
+   - `--max-turns <N>`
+   - `"your prompt here"` immediately after `-p` when using variadic flags like `--allowedTools`
+   - `--allowedTools <TOOL> [<TOOL> ...]` only to pre-approve tools without prompting
+   - `--disallowedTools <TOOL> [<TOOL> ...]` when you actually need to prevent specific tools
+6. Run the command, capture the output, and summarize Claude's actual result.
+7. If Claude does not actually return output, stop and report failure plainly.
 ## Codex Execution
-When this skill is used from Codex, the skill cannot bypass Codex sandbox policy on its own. Instead, instruct Codex to choose the correct execution path:
-- Prefer the direct `claude` command patterns in this skill. Do not silently substitute the bundled wrapper for the normal path just because it exists.
-- In this environment, actual `claude -p` runs have been verified to hang inside the Codex sandbox but succeed outside it. Treat reviewer-approved elevated execution as the default practical path for real Claude invocations.
-- Use the sandboxed path only when the user explicitly wants to test sandbox behavior or when you have a concrete reason to believe the sandbox limitation no longer applies.
-- If Claude Code needs network access, auth refresh, access outside the writable sandbox, or the in-sandbox run fails or hangs for likely sandbox reasons, rerun via `exec_command` with `sandbox_permissions: "require_escalated"`.
-- If you verify that direct `claude -p` works outside the sandbox but hangs inside it in the current environment, stop treating the sandboxed path as normal. For the rest of that Codex session, prefer reviewer-approved elevated execution as the default path for actual Claude invocations.
-- Use a concise justification that tells the reviewer why Claude Code needs elevation, for example:
-  - "Do you want me to run Claude Code with network access so it can authenticate and complete this review?"
-  - "Do you want me to rerun Claude Code outside the sandbox because the in-sandbox headless run appears blocked by permissions?"
-- Prefer a narrow approval request. Ask for elevation only for the Claude invocation that needs it, not unrelated commands.
-- If the task is high-risk, restate the chosen Claude permission mode when asking for elevation.
+When this skill is used from Codex, use the execution path that actually works here.
-## Best Practices
+- For real Claude runs in this environment, prefer `exec_command` with `sandbox_permissions: "require_escalated"`.
+- Do not treat the sandboxed path as normal. Direct `claude -p` has been verified to work outside the sandbox and to hang or misbehave inside it.
+- For long repo reviews from Codex, prefer `--output-format stream-json --verbose` so progress is visible. Plain `text` output can remain silent for a long time even when Claude is actively working.
+- Use a narrow approval request tied to the Claude invocation, for example:
+  - "Do you want me to run Claude Code outside the sandbox so it can complete this review?"
+  - "Do you want me to run Claude Code with network/auth access so it can complete this task?"
+- Once elevated Claude execution is known to work for the current task, keep using that path. Do not retry the sandboxed path again during the same task.
-- Use `--max-turns` for automation. It keeps review and implementation runs bounded.
-- Use `--output-format json` when another agent or script needs structured output.
-- Use `--allowedTools` to make headless runs more reliable and safer.
-- Prefer letting Claude inspect the repo directly for normal review and implementation tasks.
-- Pass diffs or logs via stdin only when the user explicitly wants artifact-only review or when direct repo access is intentionally unavailable.
-- Use `acceptEdits` for real implementation work; do not force read-only `plan` mode onto edit tasks.
-- Use `bypassPermissions` only in a trusted sandbox and only with explicit user approval.
-- Use `-r latest -p` for follow-up instead of re-explaining the entire task.
-- Avoid giant `cat file1 file2 ... | claude ...` constructions as a default strategy. They are brittle, hard to inspect, and a poor substitute for direct Claude access to the repo.
+## Recommended Command Patterns
-## Recommended Patterns
+### 1. Normal repo review
-### 1. Review the repo directly
-Use this as the normal review path:
+Use this as the default review path:
 ```bash
 claude -p \
-  --output-format text \
+  "Review this repository and return findings ordered by severity with file references where possible." \
+  --output-format stream-json \
+  --verbose \
   --model sonnet \
   --permission-mode default \
-  --max-turns 4 \
-  "Review how the workshop skill is designed in this repo and whether it should be improved. Return findings ordered by severity."
+  --max-turns 8 \
+  --allowedTools "Read" "Grep" "Glob" "LS" "Bash(git status:*)" "Bash(git diff:*)" \
+  --disallowedTools "Edit" "Write" "NotebookEdit"
 ```
-### 2. Review a provided diff only
-Use stdin only when you intentionally want a bounded artifact review:
+### 2. Review current changes only
 ```bash
 git diff --staged | claude -p \
+  "Review this diff and return findings ordered by severity with concise explanations." \
   --output-format text \
   --model sonnet \
   --permission-mode default \
-  --max-turns 1 \
-  "Review this diff. Return findings ordered by severity with file paths and concise explanations."
+  --max-turns 6
 ```
-### 3. Ask Claude to inspect the repo and analyze with tighter tool bounds
+### 3. Multi-directory repo investigation
-Use this when Claude should inspect the repo but you want tighter limits:
+Use this when the prompt names several directories or asks for deeper inspection:
 ```bash
 claude -p \
-  --output-format text \
-  --model sonnet \
+  "Inspect the relevant parts of this repo, evaluate the design, and return evidence-based findings ordered by severity." \
+  --output-format stream-json \
+  --verbose \
+  --model opus \
   --permission-mode default \
-  --max-turns 4 \
-  --allowedTools "Read,Grep,Glob,Bash(git diff:*),Bash(git status:*)" \
-  "Analyze the current changes and explain the main design risks."
+  --max-turns 8 \
+  --allowedTools "Read" "Grep" "Glob" "LS" "Bash(git status:*)" "Bash(git diff:*)" \
+  --disallowedTools "Edit" "Write" "NotebookEdit"
 ```
-### 4. Ask Claude to implement or refactor
-Use `acceptEdits` when Claude should change files:
+### 4. Implementation or refactor
 ```bash
 claude -p \
+  "Implement the requested change in this repo, keep the diff minimal, and summarize what changed." \
   --output-format text \
   --model sonnet \
   --permission-mode acceptEdits \
-  --effort high \
-  --max-turns 8 \
-  "Implement the requested change in the current repo, keep the diff minimal, and summarize what changed."
-```
-### 5. Resume the latest Claude Code session
-```bash
-claude -r latest -p \
-  --output-format text \
-  "Focus only on the migration risk you mentioned earlier. What is the safest rollout plan?"
-```
-### 6. Request structured output for automation
-```bash
-claude -p \
-  --output-format json \
-  --model sonnet \
-  --permission-mode default \
-  --max-turns 1 \
-  "Summarize the provided diff as JSON with keys: verdict, findings, risks."
+  --max-turns 12
 ```
-### 7. Use strict read-only analysis mode explicitly
-Use `plan` only when you want Claude prevented from executing commands or editing files:
+### 5. Resume the latest session
 ```bash
-git diff --staged | claude -p \
-  --output-format text \
-  --model sonnet \
-  --permission-mode plan \
-  --max-turns 1 \
-  "Analyze this diff and explain the key risks. Keep the response concise."
+claude -r latest -p "Focus only on the open issue you identified earlier and propose the safest next step." \
+  --output-format text
 ```
-## Optional Wrapper
-This skill also ships a convenience wrapper at `scripts/run-claude-code.sh`.
-Use it as a thin helper around the documented CLI patterns above. It supports:
-- `--check`
-- `--prompt`
-- `--model`
-- `--permission-mode`
-- `--effort`
-- `--max-turns`
-- `--output-format`
-- `--allowed-tools`
-- `--disallowed-tools`
-- `--add-dir`
-- `--resume`
-- `--continue`
-- `--name`
-- `--no-session-persistence`
-- `--verbose`
-- `--timeout-seconds`
-Do not treat the wrapper as the default execution path for this skill. Use it only when:
-- the user explicitly asks to use the wrapper
-- you are debugging Claude Code hangs or invocation edge cases
-- you need its optional timeout helper for diagnosis
-Otherwise, use the direct `claude` commands in this document.
 ## Following Up
-- After every Claude Code command, confirm whether the user wants a follow-up, a resume, or a different model / permission mode.
-- Restate the chosen model, effort, and permission mode when proposing a retry.
-- Keep resume prompts narrow. Avoid restating the whole task unless context was lost.
-## Critical Evaluation of Claude Code Output
-Claude Code is a colleague, not an authority.
-### Guidelines
-- Trust your own knowledge when confident.
-- Push back if Claude Code makes a claim that conflicts with the code or docs in front of you.
-- Research disagreements before accepting them for high-impact decisions.
-- Do not defer blindly on models, permissions, or fast-moving best practices.
-### When Claude Code Seems Wrong
-1. State the disagreement clearly to the user.
-2. Provide evidence from the codebase, documentation, or your own verification.
-3. Optionally resume the Claude Code session with a corrective prompt:
-```bash
-claude -r latest -p \
-  --output-format text \
-  "I disagree with your earlier conclusion because [evidence]. Re-evaluate only that point."
-```
-4. Treat the exchange as a discussion, not a correction ritual.
+- After every Claude run, tell the user which model and permission mode were used.
+- Tell the user they can ask to resume the Claude session or rerun with a different model.
+- Keep follow-up prompts narrow.
 ## Error Handling
 - Stop and report failures whenever `claude --version` or a `claude -p` command exits non-zero.
-- If Claude Code hangs or times out, report that Claude did not complete in this environment. Do not continue with a self-authored substitute while implying it came from Claude.
-- If Claude Code reports permission issues, choose the correct permission mode or constrain tools explicitly.
-- If a headless run gets stuck on permissions, either switch to `plan`, use `acceptEdits`, or provide `--allowedTools` / a permission prompt tool.
-- In this environment, prefer reviewer-approved elevated execution for real Claude runs instead of treating sandbox execution as the normal path.
-- In Codex, if the likely cause is sandboxing or network denial, rerun with reviewer-approved `require_escalated` execution instead of repeatedly retrying the same sandboxed command.
-- In Codex, if you have already confirmed that elevated execution works and sandboxed execution hangs, do not re-run the sandboxed path again during the same task.
-- Do not use `bypassPermissions` unless the user explicitly approves it.
-- If the direct `claude` path fails, report that failure directly. Do not route around it by pretending a wrapper run or a self-authored review is equivalent to Claude output.
-- Do not automatically fall back to concatenating many files into stdin just because a direct Claude run failed once.
+- If Claude reaches `max turns`, report that plainly. Do not paraphrase it as a hang.
+- If `--output-format text` stays silent during a long repo review, do not assume it is hung. Re-run or start with `--output-format stream-json --verbose` to confirm whether Claude is actively working.
+- If Claude truly hangs or returns no output even in streaming mode, report that plainly and stop.
+- Do not substitute your own review and imply it came from Claude.
+- Do not automatically switch to `plan` after a failed `default` run.
+- Do not pass the prompt after `--allowedTools`, because this CLI can consume it as another allowed tool and then fail with "Input must be provided..."
+- Do not describe `--allowedTools` as a strict read-only sandbox. It pre-approves tools; it does not, by itself, ban other tools.
+- Do not automatically fall back to giant `cat file1 file2 ... | claude ...` pipelines.
+- Do not use the bundled wrapper unless the user explicitly asks for it or you are debugging the invocation itself.
+## Critical Evaluation
+Claude Code is a colleague, not an authority.
+- Trust your own technical judgment when the repo contradicts Claude.
+- Verify high-impact claims against the code or docs.
+- If needed, resume the Claude session with a corrective prompt instead of silently accepting a bad conclusion.

package/plugins/marvin/skills/compound/SKILL.md CHANGED Viewed

@@ -59,6 +59,8 @@ Auto-detect capture type:
   3. label: "Learning", description: "Pattern, preference, or principle for CLAUDE.md or memory"
 - multiSelect: false
+Repeated workflow friction counts as captureable knowledge. If the same clarification keeps happening during `brainstorm`, `plan`, `work`, or `review`, capture whether it belongs in repo doctrine or in reusable toolkit workflow guidance.
 **Exit:** Capture type determined.
 ---
@@ -119,7 +121,8 @@ Determine the right location:
 ### Learning Capture
 - **Project-specific:** Add to project CLAUDE.md or memory files
-- **Toolkit-wide:** Add to the relevant plugin's `../knowledge/` directory
+- **Toolkit-wide:** Add to the relevant plugin's `../knowledge/` directory or skill docs
+- **Hybrid:** Split repo-local truth from toolkit-wide truth explicitly instead of mixing them
 - Keep it concise — one pattern per entry
 **Exit:** Knowledge extracted and structured.
@@ -220,6 +223,7 @@ Before delivering, verify:
 - [ ] **Verified:** Root cause confirmed, fix tested (not just proposed)
 - [ ] **No duplicates:** Searched existing docs — this is genuinely new
 - [ ] **Correctly typed:** Solution vs context doc vs learning — right structure and location
+- [ ] **Layered correctly:** Repo-local rule vs toolkit-wide workflow truth is explicit
 - [ ] **No application code modified** — knowledge documents only
 ## What Makes This Heart of Gold

package/plugins/marvin/skills/work/SKILL.md CHANGED Viewed

@@ -56,6 +56,15 @@ Prefer the harness's structured choice UI when available. Otherwise present conc
 **If anything in the plan is unclear:**
 Ask clarifying questions now — better to ask than build wrong.
+If the plan authorizes design-heavy, copy-heavy, or boundary-sensitive work, verify before leaving Phase 0 that the plan already includes:
+- target outcome and anti-goals
+- references and anti-references
+- proof-slice or rollout rule
+- explicit rejection criteria
+- preview artifacts when the task depends on human judgment
+If those items are missing, stop and return to planning. Do not invent the missing subjective contract during implementation.
 ### Autonomy Activation
 When a plan file is provided: **default to autonomous mode.** Plans are pre-approved decisions — execute without re-litigating them.
@@ -137,6 +146,8 @@ while (unchecked tasks remain):
 **Stage specific files — never `git add .`**
+If the plan requires a proof slice, do not propagate beyond that slice until its verification passes. If preview review or plan adherence fails, update the plan instead of improvising the missing contract in code.
 **Exit:** All tasks checked off, all tests passing.
 ---

package/src/index.ts CHANGED Viewed

@@ -7,7 +7,7 @@ import { targetsCommand } from "./commands/targets";
 const main = defineCommand({
   meta: {
     name: "heart-of-gold",
-    version: "0.1.20",
+    version: "0.1.22",
     description:
       "Cross-platform installer for Heart of Gold skills — Codex, OpenCode, Pi, Claude Code, and more",
   },