@heart-of-gold/toolkit 0.1.20 → 0.1.22

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@heart-of-gold/toolkit",
3
- "version": "0.1.20",
3
+ "version": "0.1.22",
4
4
  "type": "module",
5
5
  "description": "Cross-platform installer for Heart of Gold skills — works with Codex, OpenCode, Pi, Claude Code, and more",
6
6
  "bin": {
@@ -112,6 +112,13 @@ Launch research agents **in parallel**:
112
112
 
113
113
  **If no relevant findings:** Say so. Don't invent relevance.
114
114
 
115
+ If the task is design-heavy, copy-heavy, or boundary-sensitive, also surface:
116
+ - relevant references already present in the repo
117
+ - anti-references or known bad patterns from past work
118
+ - whether a preview artifact will likely be required before autonomous implementation
119
+
120
+ The goal is not only "what exists?" It is also "what should the future plan pull toward and stay away from?"
121
+
115
122
  **Exit:** Findings presented. User has seen what exists before exploring approaches.
116
123
 
117
124
  ---
@@ -129,6 +136,16 @@ Ask one question at a time. Start broad (purpose, users), narrow to specifics (c
129
136
 
130
137
  **If any open questions emerge:** You MUST ask the user about each one. Do not assume answers or defer them silently.
131
138
 
139
+ If the chosen approach depends on taste, hierarchy, copy quality, workshop framing, or boundary judgment, you MUST also capture before leaving this phase:
140
+ - target outcome
141
+ - anti-goals
142
+ - references
143
+ - anti-references
144
+ - tone or taste rules
145
+ - representative proof slice
146
+ - explicit rejection criteria
147
+ - whether preview artifacts will be required
148
+
132
149
  **Exit when:**
133
150
  - The approach is clear and the user signals a decision
134
151
  - You've explored enough to choose (2-3 approaches with tradeoffs)
@@ -223,6 +240,19 @@ related:
223
240
  ## Why This Approach
224
241
  {Decision rationale — what it optimizes for, why alternatives were rejected}
225
242
 
243
+ ## Subjective Contract (when needed)
244
+ - Target outcome: {What the result should feel or read like}
245
+ - Anti-goals: {What it must not become}
246
+ - References: {Positive models or repo examples}
247
+ - Anti-references: {Patterns or tones to avoid}
248
+ - Tone or taste rules: {Editorial, design, or teaching constraints}
249
+ - Rejection criteria: {Concrete reasons to say the result is wrong}
250
+
251
+ ## Preview And Proof Slice (when needed)
252
+ - Proof slice: {One representative slice to prove first}
253
+ - Required preview artifacts: {HTML mockup, ASCII preview, screenshot comp, etc.}
254
+ - Rollout rule: {When this can propagate broadly}
255
+
226
256
  ## Key Design Decisions
227
257
 
228
258
  ### Q1: {Decision topic} — RESOLVED
@@ -105,6 +105,13 @@ Search past plans for similar features. Surface proven patterns and past risks:
105
105
 
106
106
  See `../knowledge/active-memory-integration.md` for retrieval patterns.
107
107
 
108
+ If the task is design-heavy, copy-heavy, or boundary-sensitive, also search for:
109
+ - existing preview artifacts or mockups
110
+ - repo docs that define tone, design, or boundary rules
111
+ - prior plans that succeeded or drifted on subjective work
112
+
113
+ Surface both positive models and anti-models. Strong autonomous planning needs references and anti-references, not just similar code.
114
+
108
115
  **Exit:** Codebase patterns known, past solutions surfaced, constraints identified.
109
116
 
110
117
  ### Autonomy Gate (Medium Challenge)
@@ -175,18 +182,32 @@ confidence: high | medium | low
175
182
  **All plans include:**
176
183
  1. **Title and one-line summary**
177
184
  2. **Problem Statement** — What's wrong or missing? Why does this matter?
178
- 3. **Proposed Solution** — High-level approach
179
- 4. **Implementation Tasks** — Checkboxes with dependency ordering. These become the tracker for `/work`.
180
- 5. **Acceptance Criteria** — How do we know it's done? Measurable, testable.
185
+ 3. **Target End State** — What should be true when this lands?
186
+ 4. **Scope and Non-Goals** — What is explicitly outside this change?
187
+ 5. **Proposed Solution** — High-level approach
188
+ 6. **Implementation Tasks** — Checkboxes with dependency ordering. These become the tracker for `/work`.
189
+ 7. **Acceptance Criteria** — How do we know it's done? Measurable, testable.
181
190
 
182
191
  **Standard and detailed plans also include:**
183
- 6. **Decision Rationale** — Why this approach? Alternatives considered? Tradeoffs?
184
- 7. **Assumptions** — What must be true for this plan to work? (see Assumption Audit below)
185
- 8. **Risk Analysis** — What could go wrong? How do we mitigate it?
192
+ 8. **Decision Rationale** — Why this approach? Alternatives considered? Tradeoffs?
193
+ 9. **Constraints and Boundaries** — Architectural, editorial, release, privacy, or operating rules that stay fixed
194
+ 10. **Assumptions** — What must be true for this plan to work? (see Assumption Audit below)
195
+ 11. **Risk Analysis** — What could go wrong? How do we mitigate it?
186
196
 
187
197
  **Detailed plans also include:**
188
- 9. **Phased Implementation** — Phases with exit criteria per phase
189
- 10. **References** — Links to brainstorm, relevant code, past solutions
198
+ 12. **Phased Implementation** — Phases with exit criteria per phase
199
+ 13. **References** — Links to brainstorm, relevant code, past solutions
200
+
201
+ **For subjective or boundary-sensitive plans, also include:**
202
+ - **Target outcome** — the felt or perceived result, not just the structural change
203
+ - **Anti-goals** — what the result must not become
204
+ - **References** — positive models or repo examples
205
+ - **Anti-references** — patterns or tones to avoid
206
+ - **Tone or taste rules** — explicit editorial, design, or teaching constraints
207
+ - **Representative proof slice** — one slice to prove before propagation
208
+ - **Rollout rule** — when it is safe to spread the pattern
209
+ - **Rejection criteria** — what makes the result wrong even if it compiles
210
+ - **Required preview artifacts** — the concrete previews needed before autonomous `work` starts
190
211
 
191
212
  **Confidence calibration (stated in frontmatter and body):**
192
213
  - **High:** Clear requirements + existing codebase patterns + bounded scope
@@ -234,6 +255,21 @@ Before finalizing, identify the assumptions the plan depends on and run the Recu
234
255
 
235
256
  See `../knowledge/discovery-patterns.md` → "Recursive Why" for the loop technique.
236
257
 
258
+ ### Subjective Contract And Preview Gate
259
+
260
+ If the task materially changes UI, copy, information architecture, facilitation flow, or a trust boundary judged partly by human review, the plan must additionally include:
261
+ - target outcome
262
+ - anti-goals
263
+ - references
264
+ - anti-references
265
+ - tone or taste rules
266
+ - representative proof slice
267
+ - rollout rule
268
+ - rejection criteria
269
+ - required preview artifacts
270
+
271
+ For design-heavy tasks, autonomous `work` should not start until at least one preview artifact exists. Acceptable preview forms include an HTML/static mockup, a terminal-friendly structural preview, or another concrete representation that makes drift obvious before implementation. The plan should also say who reviews the preview and what failure sends the work back to planning.
272
+
237
273
  **Exit:** Plan document written.
238
274
 
239
275
  ---
@@ -76,6 +76,7 @@ Then gather project context:
76
76
  2. Check what the change touches — auth, scoring, data, migrations, money → extra scrutiny
77
77
  3. Read the related plan or brainstorm if referenced in commits or PR description
78
78
  4. Note what's tested and what's not in the diff
79
+ 5. If the work is design-heavy, copy-heavy, or boundary-sensitive, check whether the plan included non-goals, references, anti-references, proof-slice discipline, and preview artifacts
79
80
 
80
81
  **Exit:** Conventions loaded, risk areas identified, related context read.
81
82
 
@@ -118,13 +119,16 @@ See `../knowledge/socratic-patterns.md` for CoVe technique details.
118
119
 
119
120
  ### For Document Reviews
120
121
 
121
- Read the full document — don't skim. Evaluate against five criteria:
122
+ Read the full document — don't skim. Evaluate against eight criteria:
122
123
 
123
124
  1. **Does it explain WHY?** Decision rationale, not just "we'll use X."
124
125
  2. **Are risks identified?** What could go wrong? Mitigations?
125
126
  3. **Is the scope clear?** Explicit in/out of scope?
126
127
  4. **Are acceptance criteria measurable?** Testable? "Users can do X" not "the system is good."
127
- 5. **Is it actionable?** Could someone start `/work` from this right now?
128
+ 5. **Are constraints and non-goals explicit?** Could autonomous work stay inside the lane?
129
+ 6. **Is propagation discipline clear?** Proof slice first or broad rollout?
130
+ 7. **If the work is subjective or boundary-sensitive, is the contract explicit?** References, anti-references, tone rules, rejection criteria.
131
+ 8. **If the work is design-heavy, is there a preview gate?** Concrete preview artifacts and a review step before implementation.
128
132
 
129
133
  ### For Architecture Reviews
130
134
 
@@ -220,6 +224,9 @@ Use **AskUserQuestion** with:
220
224
  - [ ] Checked scope (clear in/out)
221
225
  - [ ] Checked criteria (measurable, testable)
222
226
  - [ ] Checked actionability (can `/work` start from this?)
227
+ - [ ] Checked non-goals and constraints (especially for autonomous work)
228
+ - [ ] Checked subjective contract completeness when applicable
229
+ - [ ] Checked preview artifacts and preview gate when applicable
223
230
 
224
231
  ## When NOT to Use /review
225
232
 
@@ -5,259 +5,159 @@ description: Use when the user asks to run Claude Code CLI (`claude`, `claude re
5
5
 
6
6
  # Claude Code CLI Skill Guide
7
7
 
8
- Use this skill to hand a bounded task to Claude Code from the current harness, capture the result, and summarize it back to the user.
8
+ Use this skill to run Claude Code directly, capture Claude's actual output, and summarize it back to the user.
9
9
 
10
- This skill is task-based, not review-only. Anthropic's `plan` mode is a read-only analysis mode, but because Claude Code frames it as "Plan Mode," do not treat it as the default for reviews. Prefer `default` for normal reviews and `acceptEdits` for implementation work unless you specifically want strict read-only behavior.
11
-
12
- The canonical execution path is direct `claude` CLI usage, mirroring the `codex` skill's direct `codex exec` pattern. Keep the workflow simple:
13
-
14
- - Ask which model to use.
15
- - Run one direct `claude` command.
16
- - If it works, summarize Claude's output.
17
- - If it fails or hangs, report that clearly and stop.
18
-
19
- Do not turn a failed Claude run into an improvised shell pipeline, background polling loop, or manual artifact-concatenation workaround unless the user explicitly asks for that style of invocation.
10
+ Mirror the `codex` skill's style: short, direct, and command-first. Do not improvise. Do not use the wrapper as the normal path.
20
11
 
21
12
  ## Available Models
22
13
 
23
14
  | Model | Best for |
24
15
  | --- | --- |
25
- | `default` | Anthropic's recommended default for the current account and environment |
26
16
  | `sonnet` | Default recommendation for most coding, review, and analysis tasks |
27
17
  | `opus` | Stronger reasoning for ambiguous, high-stakes, or architecture-heavy work |
28
18
  | `haiku` | Fast, lightweight follow-ups and narrow questions |
29
- | `opusplan` | Hybrid mode that uses `opus` for planning and `sonnet` for execution |
30
-
31
- Prefer aliases over hardcoded snapshot names in this skill, because Anthropic documents aliases as the stable Claude Code interface and moves them forward as newer snapshots ship.
19
+ | `default` | Anthropic's account/environment default |
20
+ | `opusplan` | Planning-heavy tasks that intentionally mix planning and execution |
32
21
 
33
- Default recommendation: `sonnet` for most tasks, `opus` when depth matters more than speed, and `opusplan` when the task naturally alternates between planning and execution.
22
+ Default recommendation: `sonnet` for most tasks, `opus` when depth matters more than speed.
34
23
 
35
24
  ## Permission Modes
36
25
 
37
- Choose the permission mode based on what Claude Code needs to do:
26
+ Choose the permission mode yourself based on the task. Do not ask the user unless they explicitly want to control it.
38
27
 
39
- | Mode | Use when | Notes |
40
- | --- | --- | --- |
41
- | `plan` | You explicitly want strict read-only analysis with no command execution or edits | Best for exploration, planning, or artifact-only analysis; not the default recommendation for review |
42
- | `default` | Claude may need to inspect the repo more freely and you can tolerate permission prompts | Good interactive default; less reliable in headless automation unless permissions are pre-arranged |
43
- | `acceptEdits` | Claude should be able to make local code changes | Strong default for implementation, refactors, and fixes |
44
- | `bypassPermissions` | Full automation is required in a trusted sandbox and the user explicitly approves it | High risk; do not default to this |
28
+ | Mode | Use when |
29
+ | --- | --- |
30
+ | `default` | Normal review, analysis, debugging, and repo inspection |
31
+ | `acceptEdits` | Claude should edit files or implement changes |
32
+ | `plan` | The user explicitly wants strict read-only planning / analysis |
33
+ | `bypassPermissions` | Only with explicit user approval in a trusted environment |
45
34
 
46
- `plan` is not the general default for this skill. Use it only when you specifically want Claude in strict read-only analysis mode.
35
+ Hard rule: do not use `plan` for ordinary reviews just because it sounds safe. For normal reviews, use `default`.
47
36
 
48
37
  ## Running a Task
49
- 1. Always ask the user which model alias to use (default: `sonnet`) before running Claude Code, unless the user already specified the model.
50
- 2. Ask for permission mode in the same prompt when the user did not specify it. Default to:
51
- - `default` for reviews and analysis
52
- - `acceptEdits` for implementation or refactoring work
53
- - `plan` only when the user explicitly wants strict read-only analysis
54
- 3. Decide whether Claude should receive the artifact directly or discover it itself:
55
- - Pass diffs, logs, or file contents via stdin for safe read-only review
56
- - Let Claude inspect the working tree when the task requires tool use
57
- 4. Assemble the command with the appropriate options:
58
- - `-p, --print` for non-interactive output
59
- - `--output-format <text|json|stream-json>`
38
+
39
+ 1. Ask the user which model to use (default: `sonnet`) in one short question, unless they already specified it.
40
+ 2. Infer the permission mode from the task:
41
+ - review / analysis / debugging: `default`
42
+ - implementation / refactor / fix: `acceptEdits`
43
+ - strict read-only planning requested by the user: `plan`
44
+ 3. Pick a sane `--max-turns`:
45
+ - narrow question or follow-up: `3`
46
+ - normal file or diff review: `6`
47
+ - multi-directory repo review or investigation: `8`
48
+ - implementation / refactor: `10` to `12`
49
+ 4. Do not add `--effort high` by default. Use the CLI default unless the user explicitly wants deeper reasoning or the task is unusually ambiguous.
50
+ 5. Assemble the direct `claude` command with the appropriate options:
51
+ - `-p, --print`
52
+ - `--output-format text` for short runs that should complete quickly
53
+ - `--output-format stream-json --verbose` for long repo reviews, investigations, or any run where you need progress visibility
60
54
  - `--model <MODEL>`
61
55
  - `--permission-mode <MODE>`
62
- - `--effort <low|medium|high|max>`
63
- - `--max-turns <N>` to bound cost and runtime
64
- - `--allowedTools` / `--disallowedTools` to constrain tool access
65
- - `--add-dir <DIR>` when Claude must read outside the current working directory
66
- - `--name <NAME>` when you want a stable, human-readable session
67
- - `"your prompt here"` as the final positional argument
68
- 5. Prefer `--output-format text` for human-readable summaries and `--output-format json` for automation or machine parsing.
69
- 6. In headless automation, prefer explicit permissions:
70
- - `default` for most review and analysis runs
71
- - `acceptEdits` for edit-capable runs
72
- - `plan` only when you specifically want read-only analysis with no command execution
73
- 7. Run the command, capture stdout/stderr, and summarize the outcome for the user.
74
- 8. If Claude Code does not actually return output, stop and report that failure. Do not substitute your own review or analysis and present it as if it came from Claude Code.
75
- 9. Do not add fallback layers automatically. One direct Claude attempt is the normal path. A retry is acceptable only when the user asks for it or when you have a concrete reason to change one thing, such as model, permission mode, or execution environment.
56
+ - `--max-turns <N>`
57
+ - `"your prompt here"` immediately after `-p` when using variadic flags like `--allowedTools`
58
+ - `--allowedTools <TOOL> [<TOOL> ...]` only to pre-approve tools without prompting
59
+ - `--disallowedTools <TOOL> [<TOOL> ...]` when you actually need to prevent specific tools
60
+ 6. Run the command, capture the output, and summarize Claude's actual result.
61
+ 7. If Claude does not actually return output, stop and report failure plainly.
76
62
 
77
63
  ## Codex Execution
78
64
 
79
- When this skill is used from Codex, the skill cannot bypass Codex sandbox policy on its own. Instead, instruct Codex to choose the correct execution path:
80
-
81
- - Prefer the direct `claude` command patterns in this skill. Do not silently substitute the bundled wrapper for the normal path just because it exists.
82
- - In this environment, actual `claude -p` runs have been verified to hang inside the Codex sandbox but succeed outside it. Treat reviewer-approved elevated execution as the default practical path for real Claude invocations.
83
- - Use the sandboxed path only when the user explicitly wants to test sandbox behavior or when you have a concrete reason to believe the sandbox limitation no longer applies.
84
- - If Claude Code needs network access, auth refresh, access outside the writable sandbox, or the in-sandbox run fails or hangs for likely sandbox reasons, rerun via `exec_command` with `sandbox_permissions: "require_escalated"`.
85
- - If you verify that direct `claude -p` works outside the sandbox but hangs inside it in the current environment, stop treating the sandboxed path as normal. For the rest of that Codex session, prefer reviewer-approved elevated execution as the default path for actual Claude invocations.
86
- - Use a concise justification that tells the reviewer why Claude Code needs elevation, for example:
87
- - "Do you want me to run Claude Code with network access so it can authenticate and complete this review?"
88
- - "Do you want me to rerun Claude Code outside the sandbox because the in-sandbox headless run appears blocked by permissions?"
89
- - Prefer a narrow approval request. Ask for elevation only for the Claude invocation that needs it, not unrelated commands.
90
- - If the task is high-risk, restate the chosen Claude permission mode when asking for elevation.
65
+ When this skill is used from Codex, use the execution path that actually works here.
91
66
 
92
- ## Best Practices
67
+ - For real Claude runs in this environment, prefer `exec_command` with `sandbox_permissions: "require_escalated"`.
68
+ - Do not treat the sandboxed path as normal. Direct `claude -p` has been verified to work outside the sandbox and to hang or misbehave inside it.
69
+ - For long repo reviews from Codex, prefer `--output-format stream-json --verbose` so progress is visible. Plain `text` output can remain silent for a long time even when Claude is actively working.
70
+ - Use a narrow approval request tied to the Claude invocation, for example:
71
+ - "Do you want me to run Claude Code outside the sandbox so it can complete this review?"
72
+ - "Do you want me to run Claude Code with network/auth access so it can complete this task?"
73
+ - Once elevated Claude execution is known to work for the current task, keep using that path. Do not retry the sandboxed path again during the same task.
93
74
 
94
- - Use `--max-turns` for automation. It keeps review and implementation runs bounded.
95
- - Use `--output-format json` when another agent or script needs structured output.
96
- - Use `--allowedTools` to make headless runs more reliable and safer.
97
- - Prefer letting Claude inspect the repo directly for normal review and implementation tasks.
98
- - Pass diffs or logs via stdin only when the user explicitly wants artifact-only review or when direct repo access is intentionally unavailable.
99
- - Use `acceptEdits` for real implementation work; do not force read-only `plan` mode onto edit tasks.
100
- - Use `bypassPermissions` only in a trusted sandbox and only with explicit user approval.
101
- - Use `-r latest -p` for follow-up instead of re-explaining the entire task.
102
- - Avoid giant `cat file1 file2 ... | claude ...` constructions as a default strategy. They are brittle, hard to inspect, and a poor substitute for direct Claude access to the repo.
75
+ ## Recommended Command Patterns
103
76
 
104
- ## Recommended Patterns
77
+ ### 1. Normal repo review
105
78
 
106
- ### 1. Review the repo directly
107
-
108
- Use this as the normal review path:
79
+ Use this as the default review path:
109
80
 
110
81
  ```bash
111
82
  claude -p \
112
- --output-format text \
83
+ "Review this repository and return findings ordered by severity with file references where possible." \
84
+ --output-format stream-json \
85
+ --verbose \
113
86
  --model sonnet \
114
87
  --permission-mode default \
115
- --max-turns 4 \
116
- "Review how the workshop skill is designed in this repo and whether it should be improved. Return findings ordered by severity."
88
+ --max-turns 8 \
89
+ --allowedTools "Read" "Grep" "Glob" "LS" "Bash(git status:*)" "Bash(git diff:*)" \
90
+ --disallowedTools "Edit" "Write" "NotebookEdit"
117
91
  ```
118
92
 
119
- ### 2. Review a provided diff only
120
-
121
- Use stdin only when you intentionally want a bounded artifact review:
93
+ ### 2. Review current changes only
122
94
 
123
95
  ```bash
124
96
  git diff --staged | claude -p \
97
+ "Review this diff and return findings ordered by severity with concise explanations." \
125
98
  --output-format text \
126
99
  --model sonnet \
127
100
  --permission-mode default \
128
- --max-turns 1 \
129
- "Review this diff. Return findings ordered by severity with file paths and concise explanations."
101
+ --max-turns 6
130
102
  ```
131
103
 
132
- ### 3. Ask Claude to inspect the repo and analyze with tighter tool bounds
104
+ ### 3. Multi-directory repo investigation
133
105
 
134
- Use this when Claude should inspect the repo but you want tighter limits:
106
+ Use this when the prompt names several directories or asks for deeper inspection:
135
107
 
136
108
  ```bash
137
109
  claude -p \
138
- --output-format text \
139
- --model sonnet \
110
+ "Inspect the relevant parts of this repo, evaluate the design, and return evidence-based findings ordered by severity." \
111
+ --output-format stream-json \
112
+ --verbose \
113
+ --model opus \
140
114
  --permission-mode default \
141
- --max-turns 4 \
142
- --allowedTools "Read,Grep,Glob,Bash(git diff:*),Bash(git status:*)" \
143
- "Analyze the current changes and explain the main design risks."
115
+ --max-turns 8 \
116
+ --allowedTools "Read" "Grep" "Glob" "LS" "Bash(git status:*)" "Bash(git diff:*)" \
117
+ --disallowedTools "Edit" "Write" "NotebookEdit"
144
118
  ```
145
119
 
146
- ### 4. Ask Claude to implement or refactor
147
-
148
- Use `acceptEdits` when Claude should change files:
120
+ ### 4. Implementation or refactor
149
121
 
150
122
  ```bash
151
123
  claude -p \
124
+ "Implement the requested change in this repo, keep the diff minimal, and summarize what changed." \
152
125
  --output-format text \
153
126
  --model sonnet \
154
127
  --permission-mode acceptEdits \
155
- --effort high \
156
- --max-turns 8 \
157
- "Implement the requested change in the current repo, keep the diff minimal, and summarize what changed."
158
- ```
159
-
160
- ### 5. Resume the latest Claude Code session
161
-
162
- ```bash
163
- claude -r latest -p \
164
- --output-format text \
165
- "Focus only on the migration risk you mentioned earlier. What is the safest rollout plan?"
166
- ```
167
-
168
- ### 6. Request structured output for automation
169
-
170
- ```bash
171
- claude -p \
172
- --output-format json \
173
- --model sonnet \
174
- --permission-mode default \
175
- --max-turns 1 \
176
- "Summarize the provided diff as JSON with keys: verdict, findings, risks."
128
+ --max-turns 12
177
129
  ```
178
130
 
179
- ### 7. Use strict read-only analysis mode explicitly
180
-
181
- Use `plan` only when you want Claude prevented from executing commands or editing files:
131
+ ### 5. Resume the latest session
182
132
 
183
133
  ```bash
184
- git diff --staged | claude -p \
185
- --output-format text \
186
- --model sonnet \
187
- --permission-mode plan \
188
- --max-turns 1 \
189
- "Analyze this diff and explain the key risks. Keep the response concise."
134
+ claude -r latest -p "Focus only on the open issue you identified earlier and propose the safest next step." \
135
+ --output-format text
190
136
  ```
191
137
 
192
- ## Optional Wrapper
193
-
194
- This skill also ships a convenience wrapper at `scripts/run-claude-code.sh`.
195
-
196
- Use it as a thin helper around the documented CLI patterns above. It supports:
197
-
198
- - `--check`
199
- - `--prompt`
200
- - `--model`
201
- - `--permission-mode`
202
- - `--effort`
203
- - `--max-turns`
204
- - `--output-format`
205
- - `--allowed-tools`
206
- - `--disallowed-tools`
207
- - `--add-dir`
208
- - `--resume`
209
- - `--continue`
210
- - `--name`
211
- - `--no-session-persistence`
212
- - `--verbose`
213
- - `--timeout-seconds`
214
-
215
- Do not treat the wrapper as the default execution path for this skill. Use it only when:
216
-
217
- - the user explicitly asks to use the wrapper
218
- - you are debugging Claude Code hangs or invocation edge cases
219
- - you need its optional timeout helper for diagnosis
220
-
221
- Otherwise, use the direct `claude` commands in this document.
222
-
223
138
  ## Following Up
224
139
 
225
- - After every Claude Code command, confirm whether the user wants a follow-up, a resume, or a different model / permission mode.
226
- - Restate the chosen model, effort, and permission mode when proposing a retry.
227
- - Keep resume prompts narrow. Avoid restating the whole task unless context was lost.
228
-
229
- ## Critical Evaluation of Claude Code Output
230
-
231
- Claude Code is a colleague, not an authority.
232
-
233
- ### Guidelines
234
- - Trust your own knowledge when confident.
235
- - Push back if Claude Code makes a claim that conflicts with the code or docs in front of you.
236
- - Research disagreements before accepting them for high-impact decisions.
237
- - Do not defer blindly on models, permissions, or fast-moving best practices.
238
-
239
- ### When Claude Code Seems Wrong
240
- 1. State the disagreement clearly to the user.
241
- 2. Provide evidence from the codebase, documentation, or your own verification.
242
- 3. Optionally resume the Claude Code session with a corrective prompt:
243
-
244
- ```bash
245
- claude -r latest -p \
246
- --output-format text \
247
- "I disagree with your earlier conclusion because [evidence]. Re-evaluate only that point."
248
- ```
249
-
250
- 4. Treat the exchange as a discussion, not a correction ritual.
140
+ - After every Claude run, tell the user which model and permission mode were used.
141
+ - Tell the user they can ask to resume the Claude session or rerun with a different model.
142
+ - Keep follow-up prompts narrow.
251
143
 
252
144
  ## Error Handling
253
145
 
254
146
  - Stop and report failures whenever `claude --version` or a `claude -p` command exits non-zero.
255
- - If Claude Code hangs or times out, report that Claude did not complete in this environment. Do not continue with a self-authored substitute while implying it came from Claude.
256
- - If Claude Code reports permission issues, choose the correct permission mode or constrain tools explicitly.
257
- - If a headless run gets stuck on permissions, either switch to `plan`, use `acceptEdits`, or provide `--allowedTools` / a permission prompt tool.
258
- - In this environment, prefer reviewer-approved elevated execution for real Claude runs instead of treating sandbox execution as the normal path.
259
- - In Codex, if the likely cause is sandboxing or network denial, rerun with reviewer-approved `require_escalated` execution instead of repeatedly retrying the same sandboxed command.
260
- - In Codex, if you have already confirmed that elevated execution works and sandboxed execution hangs, do not re-run the sandboxed path again during the same task.
261
- - Do not use `bypassPermissions` unless the user explicitly approves it.
262
- - If the direct `claude` path fails, report that failure directly. Do not route around it by pretending a wrapper run or a self-authored review is equivalent to Claude output.
263
- - Do not automatically fall back to concatenating many files into stdin just because a direct Claude run failed once.
147
+ - If Claude reaches `max turns`, report that plainly. Do not paraphrase it as a hang.
148
+ - If `--output-format text` stays silent during a long repo review, do not assume it is hung. Re-run or start with `--output-format stream-json --verbose` to confirm whether Claude is actively working.
149
+ - If Claude truly hangs or returns no output even in streaming mode, report that plainly and stop.
150
+ - Do not substitute your own review and imply it came from Claude.
151
+ - Do not automatically switch to `plan` after a failed `default` run.
152
+ - Do not pass the prompt after `--allowedTools`, because this CLI can consume it as another allowed tool and then fail with "Input must be provided..."
153
+ - Do not describe `--allowedTools` as a strict read-only sandbox. It pre-approves tools; it does not, by itself, ban other tools.
154
+ - Do not automatically fall back to giant `cat file1 file2 ... | claude ...` pipelines.
155
+ - Do not use the bundled wrapper unless the user explicitly asks for it or you are debugging the invocation itself.
156
+
157
+ ## Critical Evaluation
158
+
159
+ Claude Code is a colleague, not an authority.
160
+
161
+ - Trust your own technical judgment when the repo contradicts Claude.
162
+ - Verify high-impact claims against the code or docs.
163
+ - If needed, resume the Claude session with a corrective prompt instead of silently accepting a bad conclusion.
@@ -59,6 +59,8 @@ Auto-detect capture type:
59
59
  3. label: "Learning", description: "Pattern, preference, or principle for CLAUDE.md or memory"
60
60
  - multiSelect: false
61
61
 
62
+ Repeated workflow friction counts as captureable knowledge. If the same clarification keeps happening during `brainstorm`, `plan`, `work`, or `review`, capture whether it belongs in repo doctrine or in reusable toolkit workflow guidance.
63
+
62
64
  **Exit:** Capture type determined.
63
65
 
64
66
  ---
@@ -119,7 +121,8 @@ Determine the right location:
119
121
  ### Learning Capture
120
122
 
121
123
  - **Project-specific:** Add to project CLAUDE.md or memory files
122
- - **Toolkit-wide:** Add to the relevant plugin's `../knowledge/` directory
124
+ - **Toolkit-wide:** Add to the relevant plugin's `../knowledge/` directory or skill docs
125
+ - **Hybrid:** Split repo-local truth from toolkit-wide truth explicitly instead of mixing them
123
126
  - Keep it concise — one pattern per entry
124
127
 
125
128
  **Exit:** Knowledge extracted and structured.
@@ -220,6 +223,7 @@ Before delivering, verify:
220
223
  - [ ] **Verified:** Root cause confirmed, fix tested (not just proposed)
221
224
  - [ ] **No duplicates:** Searched existing docs — this is genuinely new
222
225
  - [ ] **Correctly typed:** Solution vs context doc vs learning — right structure and location
226
+ - [ ] **Layered correctly:** Repo-local rule vs toolkit-wide workflow truth is explicit
223
227
  - [ ] **No application code modified** — knowledge documents only
224
228
 
225
229
  ## What Makes This Heart of Gold
@@ -56,6 +56,15 @@ Prefer the harness's structured choice UI when available. Otherwise present conc
56
56
  **If anything in the plan is unclear:**
57
57
  Ask clarifying questions now — better to ask than build wrong.
58
58
 
59
+ If the plan authorizes design-heavy, copy-heavy, or boundary-sensitive work, verify before leaving Phase 0 that the plan already includes:
60
+ - target outcome and anti-goals
61
+ - references and anti-references
62
+ - proof-slice or rollout rule
63
+ - explicit rejection criteria
64
+ - preview artifacts when the task depends on human judgment
65
+
66
+ If those items are missing, stop and return to planning. Do not invent the missing subjective contract during implementation.
67
+
59
68
  ### Autonomy Activation
60
69
 
61
70
  When a plan file is provided: **default to autonomous mode.** Plans are pre-approved decisions — execute without re-litigating them.
@@ -137,6 +146,8 @@ while (unchecked tasks remain):
137
146
 
138
147
  **Stage specific files — never `git add .`**
139
148
 
149
+ If the plan requires a proof slice, do not propagate beyond that slice until its verification passes. If preview review or plan adherence fails, update the plan instead of improvising the missing contract in code.
150
+
140
151
  **Exit:** All tasks checked off, all tests passing.
141
152
 
142
153
  ---
package/src/index.ts CHANGED
@@ -7,7 +7,7 @@ import { targetsCommand } from "./commands/targets";
7
7
  const main = defineCommand({
8
8
  meta: {
9
9
  name: "heart-of-gold",
10
- version: "0.1.20",
10
+ version: "0.1.22",
11
11
  description:
12
12
  "Cross-platform installer for Heart of Gold skills — Codex, OpenCode, Pi, Claude Code, and more",
13
13
  },