npm - claude-dev-env - Versions diffs - 1.38.0 → 1.39.0 - Mend

claude-dev-env 1.38.0 → 1.39.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (271)

package/CLAUDE.md +10 -36
package/_shared/pr-loop/audit-reply-template.md +147 -0
package/_shared/pr-loop/fix-protocol.md +25 -4
package/_shared/pr-loop/gh-payloads.md +37 -50
package/_shared/pr-loop/scripts/code_rules_gate.py +0 -60
package/_shared/pr-loop/scripts/config/post_audit_thread_constants.py +189 -0
package/_shared/pr-loop/scripts/post_audit_thread.py +947 -0
package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +0 -19
package/_shared/pr-loop/scripts/tests/test_post_audit_thread.py +923 -0
package/_shared/pr-loop/scripts/tests/test_post_audit_thread_constants.py +127 -0
package/_shared/pr-loop/state-schema.md +1 -1
package/agents/clean-coder.md +2 -2
package/bin/install.mjs +6 -7
package/bin/install.test.mjs +8 -0
package/commands/doc-gist.md +16 -0
package/commands/plan.md +0 -2
package/commands/review-plan.md +1 -1
package/docs/CODE_RULES.md +122 -2
package/hooks/blocking/bot_mention_comment_blocker.py +75 -0
package/hooks/blocking/code_rules_enforcer.py +1236 -161
package/hooks/blocking/convergence_gate_blocker.py +130 -0
package/hooks/blocking/destructive_command_blocker.py +74 -0
package/hooks/blocking/gh_body_arg_blocker.py +30 -0
package/hooks/blocking/md_to_html_blocker.py +119 -0
package/hooks/blocking/test_bot_mention_comment_blocker.py +131 -0
package/hooks/blocking/test_code_rules_enforcer.py +21 -0
package/hooks/blocking/test_code_rules_enforcer_any_exempt_files.py +70 -0
package/hooks/blocking/test_code_rules_enforcer_any_imports_and_cast.py +92 -0
package/hooks/blocking/test_code_rules_enforcer_banned_import_alias.py +143 -0
package/hooks/blocking/test_code_rules_enforcer_banned_prefixes.py +152 -0
package/hooks/blocking/test_code_rules_enforcer_bare_except.py +120 -0
package/hooks/blocking/test_code_rules_enforcer_boundary_types.py +175 -0
package/hooks/blocking/test_code_rules_enforcer_cap_meta.py +0 -1
package/hooks/blocking/test_code_rules_enforcer_collection_prefix.py +50 -0
package/hooks/blocking/test_code_rules_enforcer_docstring_format.py +255 -0
package/hooks/blocking/test_code_rules_enforcer_inline_tuple_string_magic.py +130 -0
package/hooks/blocking/test_code_rules_enforcer_stub_implementations.py +141 -0
package/hooks/blocking/test_code_rules_enforcer_test_branching.py +143 -0
package/hooks/blocking/test_code_rules_enforcer_thin_wrapper_files.py +169 -0
package/hooks/blocking/test_code_rules_enforcer_todo_markers.py +99 -0
package/hooks/blocking/test_code_rules_enforcer_typed_dict_pairs.py +141 -0
package/hooks/blocking/test_code_rules_enforcer_unused_imports.py +158 -0
package/hooks/blocking/test_convergence_gate_blocker.py +63 -0
package/hooks/blocking/test_destructive_command_blocker.py +146 -0
package/hooks/blocking/test_destructive_command_blocker_no_verify.py +102 -0
package/hooks/blocking/test_gh_body_arg_blocker.py +45 -0
package/hooks/blocking/test_md_to_html_blocker.py +317 -0
package/hooks/config/any_type_config.py +7 -0
package/hooks/config/banned_identifiers_constants.py +11 -0
package/hooks/config/blocking_check_limits.py +38 -0
package/hooks/config/bot_mention_comment_blocker_constants.py +20 -0
package/hooks/config/code_rules_enforcer_constants.py +53 -0
package/hooks/config/convergence_branch_constants.py +9 -0
package/hooks/config/doc_gist_auto_publish_constants.py +18 -0
package/hooks/config/html_companion_constants.py +20 -0
package/hooks/config/inline_tuple_string_magic_constants.py +22 -0
package/hooks/config/test_banned_identifiers_constants.py +17 -0
package/hooks/hooks.json +28 -20
package/hooks/pyproject.toml +69 -0
package/hooks/validators/mypy_integration.py +47 -1
package/hooks/validators/run_all_validators.py +3 -3
package/hooks/validators/test_mypy_integration.py +50 -1
package/hooks/workflow/doc_gist_auto_publish.py +144 -0
package/hooks/workflow/md_to_html_companion.py +365 -0
package/hooks/workflow/test_doc_gist_auto_publish.py +117 -0
package/hooks/workflow/test_md_to_html_companion.py +452 -0
package/package.json +1 -1
package/rules/gh-body-file.md +2 -0
package/scripts/Install-SweepEmptyDirs.ps1 +111 -0
package/scripts/check.ps1 +106 -0
package/scripts/config/timing.py +11 -0
package/scripts/sweep_empty_dirs.py +138 -0
package/scripts/sync_to_cursor/rules.py +1 -1
package/scripts/test_sweep_empty_dirs.py +183 -0
package/skills/_shared/pr-loop/prompts/pr-consistency-audit.xml +323 -0
package/skills/_shared/pr-loop/scripts/_cli_utils.py +22 -0
package/skills/_shared/pr-loop/scripts/_path_resolver.py +165 -0
package/skills/_shared/pr-loop/scripts/_xml_utils.py +20 -0
package/skills/_shared/pr-loop/scripts/build_audit_prompt.py +182 -0
package/skills/_shared/pr-loop/scripts/build_fix_prompt.py +185 -0
package/skills/_shared/pr-loop/scripts/config/__init__.py +0 -0
package/skills/_shared/pr-loop/scripts/config/path_resolver_constants.py +78 -0
package/skills/_shared/pr-loop/scripts/init_loop_state.py +135 -0
package/skills/_shared/pr-loop/scripts/teardown_worktrees.py +175 -0
package/skills/_shared/pr-loop/scripts/write_audit_outcomes.py +182 -0
package/skills/_shared/pr-loop/scripts/write_fix_outcomes.py +206 -0
package/skills/bugteam/CONSTRAINTS.md +21 -22
package/skills/bugteam/EXAMPLES.md +3 -3
package/skills/bugteam/PROMPTS.md +227 -67
package/skills/bugteam/SKILL.md +114 -455
package/skills/bugteam/reference/README.md +1 -1
package/skills/bugteam/reference/audit-and-teammates.md +112 -39
package/skills/bugteam/reference/audit-contract.md +4 -22
package/skills/bugteam/reference/copilot-gap-analysis.md +8 -5
package/skills/bugteam/reference/design-rationale.md +2 -2
package/skills/bugteam/reference/github-pr-reviews.md +50 -57
package/skills/bugteam/reference/obstacles/audit-assign-ids.md +13 -0
package/skills/bugteam/reference/obstacles/audit-capture-excerpts.md +13 -0
package/skills/bugteam/reference/obstacles/audit-walk-categories.md +13 -0
package/skills/bugteam/reference/obstacles/audit-write-xml.md +13 -0
package/skills/bugteam/reference/obstacles/fix-append-summary.md +13 -0
package/skills/bugteam/reference/obstacles/fix-apply-fixes.md +13 -0
package/skills/bugteam/reference/obstacles/fix-git-add-commit.md +13 -0
package/skills/bugteam/reference/obstacles/fix-git-push.md +13 -0
package/skills/bugteam/reference/obstacles/fix-post-reply.md +13 -0
package/skills/bugteam/reference/obstacles/fix-publish-summary.md +13 -0
package/skills/bugteam/reference/obstacles/fix-py-compile.md +13 -0
package/skills/bugteam/reference/obstacles/fix-read-files.md +13 -0
package/skills/bugteam/reference/obstacles/fix-resolve-thread.md +13 -0
package/skills/bugteam/reference/obstacles/fix-test-suite.md +13 -0
package/skills/bugteam/reference/obstacles/fix-violation-count.md +13 -0
package/skills/bugteam/reference/obstacles/fix-write-xml.md +13 -0
package/skills/bugteam/reference/team-setup.md +106 -9
package/skills/bugteam/reference/teardown-publish-permissions.md +39 -8
package/skills/bugteam/scripts/README.md +60 -0
package/skills/bugteam/scripts/_claude_permissions_common.py +358 -0
package/skills/bugteam/scripts/bugteam_code_rules_gate.py +976 -0
package/skills/bugteam/scripts/bugteam_fix_hookspath.py +375 -0
package/skills/bugteam/scripts/bugteam_preflight.py +294 -0
package/skills/bugteam/scripts/config/bugteam_code_rules_gate_constants.py +25 -0
package/skills/bugteam/scripts/config/bugteam_fix_hookspath_constants.py +26 -0
package/skills/bugteam/scripts/config/bugteam_preflight_constants.py +35 -0
package/skills/bugteam/scripts/config/claude_permissions_common_constants.py +20 -0
package/skills/bugteam/scripts/config/probe_code_rules_enforcer_check_constants.py +12 -0
package/skills/bugteam/scripts/config/windows_safe_rmtree_constants.py +7 -0
package/skills/bugteam/scripts/grant_project_claude_permissions.py +175 -0
package/skills/bugteam/scripts/probe_code_rules_enforcer_check.py +107 -0
package/skills/bugteam/scripts/revoke_project_claude_permissions.py +220 -0
package/skills/bugteam/scripts/test__claude_permissions_common.py +112 -0
package/skills/bugteam/scripts/test_bugteam_code_rules_gate.py +400 -0
package/skills/bugteam/scripts/test_bugteam_fix_hookspath.py +384 -0
package/skills/bugteam/scripts/test_bugteam_preflight.py +268 -0
package/skills/bugteam/scripts/test_claude_permissions_common.py +195 -0
package/skills/bugteam/scripts/test_grant_project_claude_permissions.py +55 -0
package/skills/bugteam/scripts/test_probe_code_rules_enforcer_check.py +76 -0
package/skills/bugteam/scripts/test_revoke_project_claude_permissions.py +55 -0
package/skills/bugteam/scripts/test_windows_safe_rmtree.py +108 -0
package/skills/bugteam/scripts/windows_safe_rmtree.py +100 -0
package/skills/bugteam/test_skill_additions.py +1 -11
package/skills/code/SKILL.md +176 -0
package/skills/doc-gist/SKILL.md +99 -0
package/skills/doc-gist/references/examples/01-exploration-code-approaches.html +453 -0
package/skills/doc-gist/references/examples/02-exploration-visual-designs.html +515 -0
package/skills/doc-gist/references/examples/03-code-review-pr.html +638 -0
package/skills/doc-gist/references/examples/04-code-understanding.html +491 -0
package/skills/doc-gist/references/examples/05-design-system.html +629 -0
package/skills/doc-gist/references/examples/06-component-variants.html +605 -0
package/skills/doc-gist/references/examples/07-prototype-animation.html +455 -0
package/skills/doc-gist/references/examples/08-prototype-interaction.html +396 -0
package/skills/doc-gist/references/examples/09-slide-deck.html +592 -0
package/skills/doc-gist/references/examples/10-svg-illustrations.html +492 -0
package/skills/doc-gist/references/examples/11-status-report.html +528 -0
package/skills/doc-gist/references/examples/12-incident-report.html +596 -0
package/skills/doc-gist/references/examples/13-flowchart-diagram.html +395 -0
package/skills/doc-gist/references/examples/14-research-feature-explainer.html +381 -0
package/skills/doc-gist/references/examples/15-research-concept-explainer.html +368 -0
package/skills/doc-gist/references/examples/16-implementation-plan.html +702 -0
package/skills/doc-gist/references/examples/17-pr-writeup.html +595 -0
package/skills/doc-gist/references/examples/18-editor-triage-board.html +573 -0
package/skills/doc-gist/references/examples/19-editor-feature-flags.html +663 -0
package/skills/doc-gist/references/examples/20-editor-prompt-tuner.html +722 -0
package/skills/doc-gist/references/examples/README.md +5 -0
package/skills/doc-gist/scripts/config/__init__.py +0 -0
package/skills/doc-gist/scripts/config/gist_upload_constants.py +16 -0
package/skills/doc-gist/scripts/gist_upload.py +177 -0
package/skills/doc-gist/scripts/test_gist_upload.py +51 -0
package/skills/findbugs/SKILL.md +68 -2
package/skills/monitor-open-prs/SKILL.md +13 -32
package/skills/monitor-open-prs/test_skill_contract.py +0 -11
package/skills/pr-consistency-audit/SKILL.md +112 -0
package/skills/pr-consistency-audit/reference/detection-rules.md +96 -0
package/skills/pr-consistency-audit/reference/illustrations.md +78 -0
package/skills/pr-converge/SKILL.md +227 -23
package/skills/pr-converge/config/__init__.py +0 -0
package/skills/pr-converge/config/constants.py +62 -0
package/skills/pr-converge/reference/convergence-gates.md +138 -44
package/skills/pr-converge/reference/examples.md +43 -11
package/skills/pr-converge/reference/fix-protocol.md +6 -5
package/skills/pr-converge/reference/ground-rules.md +5 -3
package/skills/pr-converge/reference/multi-pr-orchestration.md +44 -19
package/skills/pr-converge/reference/obstacles/fix-post-replies.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-publish-summary.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-push.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-read-filelines.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-reset-state.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-resolve-threads.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-spawn-clean-coder.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-stage-commit.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-trigger-bugbot.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-write-test.md +13 -0
package/skills/pr-converge/reference/per-tick.md +90 -31
package/skills/pr-converge/reference/state-schema.md +22 -1
package/skills/pr-converge/reference/stop-conditions.md +9 -7
package/skills/pr-converge/scripts/README.md +34 -46
package/skills/pr-converge/scripts/check_bugbot_ci.py +174 -0
package/skills/pr-converge/scripts/check_convergence.py +497 -0
package/skills/pr-converge/scripts/check_pending_reviews.py +154 -0
package/skills/pr-converge/scripts/config/pr_converge_constants.py +118 -0
package/skills/pr-converge/scripts/fetch_copilot_reviews.py +134 -0
package/skills/pr-converge/scripts/post_fix_reply.py +168 -0
package/skills/pr-converge/workflows/schedule-wakeup-loop.md +5 -12
package/skills/qbug/SKILL.md +132 -27
package/skills/session-log/SKILL.md +216 -114
package/skills/session-tidy/SKILL.md +1 -1
package/skills/skill-builder/SKILL.md +138 -56
package/skills/skill-builder/references/delegation-map.md +72 -113
package/skills/skill-builder/references/progressive-disclosure.md +122 -0
package/skills/skill-builder/references/self-audit-checklist.md +92 -0
package/skills/skill-builder/references/skill-types.md +228 -0
package/skills/skill-builder/references/thariq-x-post-skills.json +33 -0
package/skills/skill-builder/templates/gap-analysis.md +15 -8
package/skills/skill-builder/workflows/improve-skill.md +86 -57
package/skills/skill-builder/workflows/new-skill.md +80 -168
package/skills/skill-builder/workflows/polish-skill.md +78 -54
package/skills/structure-prompt/SKILL.md +50 -0
package/skills/structure-prompt/reference/adversarial-tuning.md +62 -0
package/skills/structure-prompt/reference/block-classification.md +27 -0
package/skills/structure-prompt/reference/canonical-case.md +48 -0
package/skills/structure-prompt/reference/citation-depth.md +70 -0
package/skills/structure-prompt/reference/cleanup.md +33 -0
package/skills/structure-prompt/reference/constraints.md +33 -0
package/skills/structure-prompt/reference/directives.md +37 -0
package/skills/structure-prompt/reference/examples.md +72 -0
package/skills/structure-prompt/reference/instantiation.md +51 -0
package/skills/structure-prompt/reference/output-contract.md +72 -0
package/skills/structure-prompt/reference/per-category.md +23 -0
package/skills/structure-prompt/reference/persona.md +38 -0
package/skills/structure-prompt/reference/research.md +33 -0
package/skills/structure-prompt/reference/structure.md +28 -0
package/agents/code-standards-agent.md +0 -93
package/agents/groq-coder.md +0 -113
package/agents/plan-executor.md +0 -226
package/agents/project-docs-analyzer.md +0 -53
package/agents/project-structure-organizer-agent.md +0 -72
package/agents/skill-to-agent-converter.md +0 -370
package/agents/skill-writer-agent.md +0 -470
package/agents/user-docs-writer.md +0 -67
package/agents/workflow-visual-documenter.md +0 -82
package/commands/readability-review.md +0 -20
package/hooks/mypy.ini +0 -2
package/hooks/notification/attention_needed_notify.py +0 -71
package/hooks/notification/claude_notification_handler.py +0 -67
package/hooks/notification/notification_utils.py +0 -267
package/hooks/notification/subagent_complete_notify.py +0 -381
package/hooks/notification/test_attention_needed_notify.py +0 -47
package/hooks/notification/test_claude_notification_handler.py +0 -54
package/hooks/notification/test_notification_utils.py +0 -91
package/hooks/notification/test_subagent_complete_notify.py +0 -79
package/scripts/config/groq_bugteam_config.py +0 -230
package/scripts/config/test_groq_bugteam_config.py +0 -83
package/scripts/config/test_spec_implementer_prompt.py +0 -32
package/scripts/groq_bugteam.README.md +0 -131
package/scripts/groq_bugteam.py +0 -647
package/scripts/groq_bugteam_dotenv.py +0 -40
package/scripts/groq_bugteam_spec.py +0 -226
package/scripts/test_groq_bugteam.py +0 -529
package/scripts/test_groq_bugteam_apply_fix_from_spec.py +0 -426
package/scripts/test_groq_bugteam_dotenv.py +0 -66
package/scripts/test_groq_bugteam_spec.py +0 -338
package/skills/bugteam/SKILL_EVALS.md +0 -309
package/skills/dream/SKILL.md +0 -118
package/skills/ingest/SKILL.md +0 -40
package/skills/npm-creator/SKILL.md +0 -187
package/skills/readability-review/SKILL.md +0 -127
package/skills/resume-review/SKILL.md +0 -261
package/skills/rule-audit/SKILL.md +0 -307
package/skills/rule-creator/SKILL.md +0 -150
package/skills/searching-obsidian-vault/SKILL.md +0 -131
package/skills/skill-writer/REFERENCE.md +0 -284
package/skills/skill-writer/SKILL.md +0 -222
package/skills/tdd-team/SKILL.md +0 -128

package/skills/skill-builder/references/self-audit-checklist.md ADDED

@@ -0,0 +1,92 @@
+# Self-Audit Checklist
+Mandatory post-build verification. Every item must pass before a skill is delivered. Run after writing a new skill, improving an existing one, or polishing.
+Source synthesis: [Anthropic best practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices), [Lessons from Building Claude Code](thariq-x-post-skills.json), model skills (bugteam, pr-converge).
+## Core quality
+- [ ] **Conciseness** — Only context Claude doesn't already have. Every line justifies its token cost.
+  > "Default assumption: Claude is already very smart."
+- [ ] **Degree of freedom** — Matches task fragility. Low for narrow bridges, high for open fields.
+  > "Match the level of specificity to the task's fragility and variability."
+- [ ] **Naming convention** — Name uses gerund form (verb-ing) unless it's a well-known acronym or proper name. Lowercase, numbers, hyphens only. Max 64 chars. No reserved words.
+  > "Use consistent naming patterns to make Skills easier to reference."
+- [ ] **Description field** — Third person. Includes what AND when. Specific trigger phrases. Max 1024 chars. No XML tags.
+  > "The description is critical for skill selection: Claude uses it to choose the right Skill from potentially 100+ available Skills."
+- [ ] **SKILL.md body under 500 lines**
+  > "Keep SKILL.md body under 500 lines for optimal performance."
+- [ ] **One level deep** — All reference files link directly from SKILL.md. No nested references.
+  > "Claude may partially read files when they're referenced from other referenced files."
+- [ ] **TOC on files over 100 lines** — Every reference file >100 lines has a table of contents.
+  > "This ensures Claude can see the full scope of available information even when previewing with partial reads."
+- [ ] **No time-sensitive claims** — Or isolated in "old patterns" section.
+  > "Don't include information that will become outdated."
+- [ ] **Consistent terminology** — One term per concept throughout.
+  > "Consistency helps Claude understand and follow instructions."
+- [ ] **Forward slashes only** — File paths use `/`, not `\`.
+  > "Unix-style paths work across all platforms."
+- [ ] **Default provided, not options menu** — One recommended approach, escape hatch for special cases.
+  > "Don't present multiple approaches unless necessary. Provide a default with escape hatch."
+- [ ] **Gotchas section present** — Highest-signal content. Built from real failure observations.
+  > "The highest-signal content in any skill is the Gotchas section."
+- [ ] **Doesn't state the obvious** — Pushes Claude out of defaults, doesn't re-teach what Claude knows.
+  > "Focus on information that pushes Claude out of its normal way of thinking."
+- [ ] **Not railroading** — Gives information and flexibility, not rigid scripts.
+  > "Give Claude the information it needs, but give it the flexibility to adapt to the situation."
+- [ ] **When-this-applies section** — Trigger conditions clear. Refusal cases with exact response text.
+  > bugteam pattern — "Refusals — first match wins; respond with the quoted line exactly and stop."
+- [ ] **File index present** — Every file in the package listed with its purpose.
+  > "Tell Claude what files are in your skill, and it will read them at appropriate times."
+- [ ] **Concrete examples** — Input/output pairs or exit scenarios, not abstract descriptions.
+  > "Examples help Claude understand the desired style and level of detail more clearly than descriptions alone."
+- [ ] **Workflows have checklists** — Multi-step processes include copyable `[ ]` checklists.
+  > "For particularly complex workflows, provide a checklist that Claude can copy into its response and check off as it progresses."
+- [ ] **Feedback loops where quality-critical** — Run validator → fix → repeat pattern.
+  > "This pattern greatly improves output quality."
+- [ ] **Constraints separated** — Non-negotiables in CONSTRAINTS.md or equivalent section.
+  > bugteam pattern — constraints file with design rationale.
+- [ ] **Folder map at bottom** — Lists directories and their purposes.
+  > pr-converge pattern — "Folder map" section.
+## Skill-type-specific
+- [ ] **Skill type classified** — Fits one of 9 types. Folder structure matches type recommendation.
+  > "The best skills fit cleanly into one; the more confusing ones straddle several."
+- [ ] **Domain layout appropriate** — If multiple domains, organized by domain (reference/finance.md, reference/sales.md).
+  > Pattern 2 — domain-specific organization.
+## Code and scripts (if applicable)
+- [ ] **Scripts solve, don't punt** — Error handling explicit, no raw exceptions for Claude to figure out.
+  > "Handle error conditions rather than punting to Claude."
+- [ ] **No voodoo constants** — Every magic number has a documented justification.
+  > "Configuration parameters should be justified and documented."
+- [ ] **Execute vs read intent clear** — "Run script.py" (execute) vs "See script.py for algorithm" (read).
+  > "Make clear in your instructions whether Claude should execute the script or read it as reference."
+- [ ] **Dependencies listed** — Required packages stated, verified as available.
+  > "List required packages in your SKILL.md and verify they're available."
+- [ ] **MCP tools fully qualified** — `ServerName:tool_name` format.
+  > "Always use fully qualified tool names to avoid 'tool not found' errors."
+- [ ] **Plan-validate-execute for high-stakes ops** — Verifiable intermediate outputs before destructive actions.
+  > "Catches errors early: validation finds problems before changes are applied."
+## Setup and memory (if applicable)
+- [ ] **Setup instructions clear** — config.json pattern or AskUserQuestion for initial context.
+  > "If the config is not set up, the agent can then ask the user for information."
+- [ ] **Persistent data uses `${CLAUDE_PLUGIN_DATA}`** — Not stored in skill directory itself.
+  > "Data stored in the skill directory may be deleted when you upgrade the skill."
+## Composition and measurement (if applicable)
+- [ ] **Skill dependencies documented** — Skills this one composes with are named.
+  > "You can just reference other skills by name, and the model will invoke them if they are installed."
+- [ ] **Hooks declared** — If the skill registers hooks, their purpose and scope are stated.
+  > "Skills can include hooks that are only activated when the skill is called."
+---
+## Usage
+Copy this checklist into your response after building. Check off each item. Any item that fails → fix before delivering. Any item marked "if applicable" that doesn't apply → mark N/A with a one-line reason.
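
Several items in the checklist above are mechanical enough to check with a script rather than by eye: the name's character set and 64-character cap, the 1024-character description cap with no XML tags, and the 500-line SKILL.md body limit. What follows is a minimal sketch of such a check and is not part of the package. `audit_skill` and its constants are hypothetical names, and the gerund and reserved-word rules are left out because the checklist does not enumerate them.

```python
import re

# Limits taken from the checklist items above.
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024
MAX_BODY_LINES = 500

# Lowercase letters, digits, and hyphens only (hypothetical pattern for this sketch).
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
# Anything that looks like an opening or closing XML tag.
XML_TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def audit_skill(name: str, description: str, body: str) -> list[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    if not NAME_PATTERN.fullmatch(name):
        failures.append("name: use lowercase letters, numbers, and hyphens only")
    if len(name) > MAX_NAME_CHARS:
        failures.append(f"name: longer than {MAX_NAME_CHARS} characters")
    if not description.strip():
        failures.append("description: missing")
    if len(description) > MAX_DESCRIPTION_CHARS:
        failures.append(f"description: longer than {MAX_DESCRIPTION_CHARS} characters")
    if XML_TAG_PATTERN.search(description):
        failures.append("description: contains XML tags")
    if len(body.splitlines()) > MAX_BODY_LINES:
        failures.append(f"body: more than {MAX_BODY_LINES} lines")
    return failures


if __name__ == "__main__":
    for failure in audit_skill("Bad_Name", "<b>Builds skills</b>", "line\n" * 501):
        print(failure)
```

Returning a list of failures rather than raising on the first one fits the checklist's "fix before delivering" loop: every failed item is reported in a single pass. Judgment items such as conciseness or a Gotchas section still need a human or model read.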

package/skills/skill-builder/references/skill-types.md ADDED

@@ -0,0 +1,228 @@
+# Skill Types
+Source: [Lessons from Building Claude Code: How We Use Skills](thariq-x-post-skills.json)
+> "After cataloging all of our skills, we noticed they cluster into a few recurring categories. The best skills fit cleanly into one; the more confusing ones straddle several."
+## Type taxonomy
+For each type: what it is, recommended folder structure, primary needs.
+### 1. Library & API Reference
+Skills that explain how to correctly use a library, CLI, or SDK.
+> "These could be both for internal libraries or common libraries that Claude Code sometimes has trouble with."
+**Examples:** `billing-lib` (internal billing library: edge cases, footguns), `internal-platform-cli` (every subcommand with usage examples), `frontend-design` (design system guidance)
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Quick start + gotchas
+├── reference/        # API surface, method signatures
+│   └── api.md
+└── examples/         # Copy-pasteable code snippets
+    └── snippets.md
+```
+**Primary needs:** Reference docs, gotchas, code examples.
+---
+### 2. Product Verification
+Skills that describe how to test or verify that code is working.
+> "Verification skills are extremely useful for ensuring Claude's output is correct. It can be worth having an engineer spend a week just making your verification skills excellent."
+**Examples:** `signup-flow-driver`, `checkout-verifier`, `tmux-cli-driver`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Verification workflow + checklists
+├── scripts/          # Verification scripts, assertions
+│   ├── verify.py
+│   └── assert_state.py
+└── reference/        # Expected states, test data
+    └── expected-behavior.md
+```
+**Primary needs:** Scripts for verification, assertion libraries, state-checking patterns.
+---
+### 3. Data Fetching & Analysis
+Skills that connect to data and monitoring stacks.
+> "These skills might include libraries to fetch your data with credentials, specific dashboard ids, etc. as well as instructions on common workflows or ways to get data."
+**Examples:** `funnel-query`, `cohort-compare`, `grafana`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Common queries + gotchas
+├── reference/        # Table schemas, dashboard IDs, query patterns
+│   ├── schemas.md
+│   └── dashboards.md
+└── scripts/          # Data fetching helpers
+    └── query_helpers.py
+```
+**Primary needs:** Schema references, query patterns, credential setup instructions.
+---
+### 4. Business Process & Team Automation
+Skills that automate repetitive workflows into one command.
+> "For these skills, saving previous results in log files can help the model stay consistent and reflect on previous executions of the workflow."
+**Examples:** `standup-post`, `create-<ticket-system>-ticket`, `weekly-recap`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Workflow steps + templates
+├── templates/        # Output templates
+│   └── post-template.md
+└── scripts/          # Automation helpers
+    └── fetch_activity.py
+```
+**Primary needs:** Workflow steps, output templates, state persistence.
+---
+### 5. Code Scaffolding & Templates
+Skills that generate framework boilerplate.
+> "They are especially useful when your scaffolding has natural language requirements that can't be purely covered by code."
+**Examples:** `new-<framework>-workflow`, `new-migration`, `create-app`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Scaffolding workflow
+├── templates/        # File templates (copy + fill)
+│   ├── handler.py.tmpl
+│   └── test.py.tmpl
+└── scripts/          # Scaffolding scripts
+    └── scaffold.py
+```
+**Primary needs:** Templates, naming conventions, file placement rules.
+---
+### 6. Code Quality & Review
+Skills that enforce code quality and help review code.
+> "These can include deterministic scripts or tools for maximum robustness. You may want to run these skills automatically as part of hooks or inside of a GitHub Action."
+**Examples:** `adversarial-review`, `code-style`, `testing-practices`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Review protocol + severity rubric
+├── reference/        # Category definitions, examples
+│   └── categories.md
+└── scripts/          # Deterministic validators
+    └── lint_check.py
+```
+**Primary needs:** Rubrics, scripts for deterministic checks, severity classification.
+---
+### 7. CI/CD & Deployment
+Skills that help fetch, push, and deploy code.
+> "These skills may reference other skills to collect data."
+**Examples:** `babysit-pr`, `deploy-<service>`, `cherry-pick-prod`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Deployment workflow + safety gates
+├── workflows/        # Sub-workflows for different scenarios
+│   └── rollback.md
+└── scripts/          # Deployment scripts
+    ├── smoke_test.py
+    └── rollout.py
+```
+**Primary needs:** Safety gates, rollback procedures, step-by-step checklists.
+---
+### 8. Runbooks
+Skills that take a symptom and walk through investigation.
+> "Skills that take a symptom (such as a Slack thread, alert, or error signature), walk through a multi-tool investigation, and produce a structured report."
+**Examples:** `<service>-debugging`, `oncall-runner`, `log-correlator`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Symptom → investigation mapping
+├── reference/        # Query patterns, known issues
+│   ├── query-patterns.md
+│   └── known-issues.md
+└── templates/        # Report template
+    └── finding-template.md
+```
+**Primary needs:** Symptom-to-query mapping, known issues catalog, report format.
+---
+### 9. Infrastructure Operations
+Skills that perform routine maintenance and operational procedures.
+> "Some of which involve destructive actions that benefit from guardrails. These make it easier for engineers to follow best practices in critical operations."
+**Examples:** `<resource>-orphans`, `dependency-management`, `cost-investigation`
+**Folder structure:**
+```
+skill-name/
+├── SKILL.md          # Operation steps + safety confirmations
+├── reference/        # Resource naming patterns, policies
+│   └── resources.md
+└── scripts/          # Cleanup/investigation scripts
+    └── find_orphans.py
+```
+**Primary needs:** Safety guardrails, confirmation gates, resource identification patterns.
+---
+## Routing table
+| User says... | Likely type |
+|---|---|
+| "Claude keeps using the wrong API" | 1. Library & API Reference |
+| "I need to verify Claude's output" | 2. Product Verification |
+| "Claude needs to query our data" | 3. Data Fetching & Analysis |
+| "Automate this repetitive workflow" | 4. Business Process & Team Automation |
+| "Generate boilerplate for new X" | 5. Code Scaffolding & Templates |
+| "Enforce code quality / review PRs" | 6. Code Quality & Review |
+| "Deploy / push / merge automation" | 7. CI/CD & Deployment |
+| "Investigate / debug when X happens" | 8. Runbooks |
+| "Manage infrastructure / cleanup" | 9. Infrastructure Operations |
+When the skill straddles multiple types, pick the dominant one for folder structure and note the secondary influence.
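
Once a type has been chosen from the routing table above, its recommended folder structure can be created mechanically. The sketch below assumes a hypothetical `scaffold_skill` helper and hypothetical type keys (`library-api-reference`, `code-quality-review`, `runbooks`). Only three of the nine layouts are shown, with file names copied from the folder structures listed earlier in this file. It is an illustration, not a script shipped in the package.

```python
import tempfile
from pathlib import Path

# Relative file paths per skill type, copied from the folder structures above.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
SKILL_LAYOUTS = {
    "library-api-reference": ["SKILL.md", "reference/api.md", "examples/snippets.md"],
    "code-quality-review": ["SKILL.md", "reference/categories.md", "scripts/lint_check.py"],
    "runbooks": [
        "SKILL.md",
        "reference/query-patterns.md",
        "reference/known-issues.md",
        "templates/finding-template.md",
    ],
}


def scaffold_skill(root: Path, skill_name: str, skill_type: str) -> list[Path]:
    """Create empty placeholder files for the chosen type and return their paths."""
    if skill_type not in SKILL_LAYOUTS:
        raise ValueError(f"unknown skill type: {skill_type!r}")
    created = []
    for relative in SKILL_LAYOUTS[skill_type]:
        target = root / skill_name / relative
        # Create intermediate directories such as reference/ or scripts/ on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
        created.append(target)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_skill(Path(tmp), "code-style", "code-quality-review"):
            print(path.relative_to(tmp).as_posix())
```

For a skill that straddles several types, one option is to scaffold the dominant type and add the secondary type's directories by hand, which matches the guidance to pick the dominant type for folder structure.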

package/skills/skill-builder/references/thariq-x-post-skills.json ADDED

@@ -0,0 +1,33 @@
+{
+  "post": {
+    "url": "https://x.com/trq212/status/2033949937936085378",
+    "author": {
+      "name": "Thariq",
+      "handle": "@trq212",
+      "profile_image": "https://pbs.twimg.com/profile_images/1976939058741039104/r3GgzqRh_bigger.jpg"
+    },
+    "title": "Lessons from Building Claude Code: How We Use Skills",
+    "date": "12:53 PM · Mar 17, 2026",
+    "metrics": {
+      "replies": 383,
+      "reposts": "2.8K",
+      "likes": "16K",
+      "views": "6.8M",
+      "bookmarks": "44K"
+    },
+    "content": "Lessons from Building Claude Code: How We Use Skills\n\nSkills have become one of the most used extension points in Claude Code. They're flexible, easy to make, and simple to distribute.\n\nBut this flexibility also makes it hard to know what works best. What type of skills are worth making? What's the secret to writing a good skill? When do you share them with others?\n\nWe've been using skills in Claude Code extensively at Anthropic with hundreds of them in active use. These are the lessons we've learned about using skills to accelerate our development.\n\nWhat are Skills?\nIf you're new to skills, I'd recommend reading our docs or watching our newest course on new Skilljar on Agent Skills, this post will assume you already have some familiarity with skills.\n\nA common misconception we hear about skills is that they are \"just markdown files\", but the most interesting part of skills is that they're not just text files. They're folders that can include scripts, assets, data, etc. that the agent can discover, explore and manipulate.\n\nIn Claude Code, skills also have a wide variety of configuration options including registering dynamic hooks.\n\nWe've found that some of the most interesting skills in Claude Code use these configuration options and folder structure creatively.\n\nTypes of Skills\nAfter cataloging all of our skills, we noticed they cluster into a few recurring categories. The best skills fit cleanly into one; the more confusing ones straddle several. This isn't a definitive list, but it is a good way to think about if you're missing any inside of your org.\n\n1. Library & API Reference\nSkills that explain how to correctly use a library, CLI, or SDKs. These could be both for internal libraries or common libraries that Claude Code sometimes has trouble with. 
These skills often included a folder of reference code snippets and a list of gotchas for Claude to avoid when writing a script.\nExamples:\n- billing-lib — your internal billing library: edge cases, footguns, etc.\n- internal-platform-cli — every subcommand of your internal CLI wrapper with examples on when to use them\n- frontend-design — make Claude better at your design system\n\n2. Product Verification\nSkills that describe how to test or verify that your code is working. These are often paired with an external tool like playwright, tmux, etc. for doing the verification.\nVerification skills are extremely useful for ensuring Claude's output is correct. It can be worth having an engineer spend a week just making your verification skills excellent.\nConsider techniques like having Claude record a video of its output so you can see exactly what it tested, or enforcing programmatic assertions on state at each step. These are often done by including a variety of scripts in the skill.\nExamples:\n- signup-flow-driver — runs through signup → email verify → onboarding in a headless browser, with hooks for asserting state at each step\n- checkout-verifier — drives the checkout UI with Stripe test cards, verifies the invoice actually lands in the right state\n- tmux-cli-driver — for interactive CLI testing where the thing you're verifying needs a TTY\n\n3. Data Fetching & Analysis\nSkills that connect to your data and monitoring stacks. These skills might include libraries to fetch your data with credentials, specific dashboard ids, etc. as well as instructions on common workflows or ways to get data.\nExamples:\n- funnel-query — \"which events do I join to see signup → activation → paid\" plus the table that actually has the canonical user_id\n- cohort-compare — compare two cohorts' retention or conversion, flag statistically significant deltas, link to the segment definitions\n- grafana — datasource UIDs, cluster names, problem → dashboard lookup table\n\n4. 
Business Process & Team Automation\nSkills that automate repetitive workflows into one command. These skills are usually fairly simple instructions but might have more complicated dependencies on other skills or MCPs. For these skills, saving previous results in log files can help the model stay consistent and reflect on previous executions of the workflow.\nExamples:\n- standup-post — aggregates your ticket tracker, GitHub activity, and prior Slack → formatted standup, delta-only\n- create-<ticket-system>-ticket — enforces schema (valid enum values, required fields) plus post-creation workflow (ping reviewer, link in Slack)\n- weekly-recap — merged PRs + closed tickets + deploys → formatted recap post\n\n5. Code Scaffolding & Templates\nSkills that generate framework boilerplate for a specific function in codebase. You might combine these skills with scripts that can be composed. They are especially useful when your scaffolding has natural language requirements that can't be purely covered by code.\nExamples:\n- new-<framework>-workflow — scaffolds a new service/workflow/handler with your annotations\n- new-migration — your migration file template plus common gotchas\n- create-app — new internal app with your auth, logging, and deploy config pre-wired\n\n6. Code Quality & Review\nSkills that enforce code quality inside of your org and help review code. These can include deterministic scripts or tools for maximum robustness. You may want to run these skills automatically as part of hooks or inside of a GitHub Action.\n- adversarial-review — spawns a fresh-eyes subagent to critique, implements fixes, iterates until findings degrade to nitpicks\n- code-style — enforces code style, especially styles that Claude does not do well by default.\n- testing-practices — instructions on how to write tests and what to test.\n\n7. CI/CD & Deployment\nSkills that help you fetch, push, and deploy code inside of your codebase. 
These skills may reference other skills to collect data.\nExamples:\n- babysit-pr — monitors a PR → retries flaky CI → resolves merge conflicts → enables auto-merge\n- deploy-<service> — build → smoke test → gradual traffic rollout with error-rate comparison → auto-rollback on regression\n- cherry-pick-prod — isolated worktree → cherry-pick → conflict resolution → PR with template\n\n8. Runbooks\nSkills that take a symptom (such as a Slack thread, alert, or error signature), walk through a multi-tool investigation, and produce a structured report.\nExamples:\n- <service>-debugging — maps symptoms → tools → query patterns for your highest-traffic services\n- oncall-runner — fetches the alert → checks the usual suspects → formats a finding\n- log-correlator — given a request ID, pulls matching logs from every system that might have touched it\n\n9. Infrastructure Operations\nSkills that perform routine maintenance and operational procedures — some of which involve destructive actions that benefit from guardrails. These make it easier for engineers to follow best practices in critical operations.\nExamples:\n- <resource>-orphans — finds orphaned pods/volumes → posts to Slack → soak period → user confirms → cascading cleanup\n- dependency-management — your org's dependency approval workflow\n- cost-investigation — \"why did our storage/egress bill spike\" with the specific buckets and query patterns\n\nTips for Making Skills\nOnce you've decided on the skill to make, how do you write it? These are some of the best practices, tips, and tricks we've found.\nWe also recently released Skill Creator to make it easier to create skills in Claude Code.\n\nDon't State the Obvious\nClaude Code knows a lot about your codebase, and Claude knows a lot about coding, including many default opinions. 
If you're publishing a skill that is primarily about knowledge, try to focus on information that pushes Claude out of its normal way of thinking.\nThe frontend design skill is a great example — it was built by one of the engineers at Anthropic by iterating with customers on improving Claude's design taste, avoiding classic patterns like the Inter font and purple gradients.\n\nBuild a Gotchas Section\nThe highest-signal content in any skill is the Gotchas section. These sections should be built up from common failure points that Claude runs into when using your skill. Ideally, you will update your skill over time to capture these gotchas.\n\nUse the File System & Progressive Disclosure\nLike we said earlier, a skill is a folder, not just a markdown file. You should think of the entire file system as a form of context engineering and progressive disclosure. Tell Claude what files are in your skill, and it will read them at appropriate times.\nThe simplest form of progressive disclosure is to point to other markdown files for Claude to use. For example, you may split detailed function signatures and usage examples into references/api.md.\nAnother example: if your end output is a markdown file, you might include a template file for it in assets/ to copy and use.\nYou can have folders of references, scripts, examples, etc., which help Claude work more effectively.\n\nAvoid Railroading Claude\nClaude will generally try to stick to your instructions, and because Skills are so reusable you'll want to be careful of being too specific in your instructions. Give Claude the information it needs, but give it the flexibility to adapt to the situation.\n\nThink through the Setup\nSome skills may need to be set up with context from the user. 
For example, if you are making a skill that posts your standup to Slack, you may want Claude to ask which Slack channel to post it in.\nA good pattern to do this is to store this setup information in a config.json file in the skill directory like the above example. If the config is not set up, the agent can then ask the user for information.\nIf you want the agent to present structured, multiple choice questions you can instruct Claude to use the AskUserQuestion tool.\n\nThe Description Field Is For the Model\nWhen Claude Code starts a session, it builds a listing of every available skill with its description. This listing is what Claude scans to decide \"is there a skill for this request?\" Which means the description field is not a summary — it's a description of when to trigger this skill.\n\nMemory & Storing Data\nSome skills can include a form of memory by storing data within them. You could store data in anything as simple as an append only text log file or JSON files, or as complicated as a SQLite database.\nFor example, a standup-post skill might keep a standups.log with every post it's written, which means the next time you run it, Claude reads its own history and can tell what's changed since yesterday.\nData stored in the skill directory may be deleted when you upgrade the skill, so you should store this in a stable folder; as of today we provide `${CLAUDE_PLUGIN_DATA}` as a stable folder per plugin to store data in.\n\nStore Scripts & Generate Code\nOne of the most powerful tools you can give Claude is code. Giving Claude scripts and libraries lets Claude spend its turns on composition, deciding what to do next rather than reconstructing boilerplate.\nFor example, in your data science skill you might have a library of functions to fetch data from your event source. 
In order for Claude to do complex analysis, you could give it a set of helper functions like so:\nClaude can then generate scripts on the fly to compose this functionality to do more advanced analysis for prompts like \"What happened on Tuesday?\"\n\nOn Demand Hooks\nSkills can include hooks that are only activated when the skill is called, and last for the duration of the session. Use this for more opinionated hooks that you don't want to run all the time, but are extremely useful sometimes.\nFor example:\n- /careful — blocks rm -rf, DROP TABLE, force-push, kubectl delete via PreToolUse matcher on Bash. You only want this when you know you're touching prod — having it always on would drive you insane\n- /freeze — blocks any Edit/Write that's not in a specific directory. Useful when debugging: \"I want to add logs but I keep accidentally 'fixing' unrelated\"\n\nDistributing Skills\nOne of the biggest benefits of Skills is that you can share them with the rest of your team.\nThere are two ways you might share skills with others:\n- check your skills into your repo (under ./.claude/skills)\n- make a plugin and have a Claude Code Plugin marketplace where users can upload and install plugins (read more on the documentation here)\nFor smaller teams working across relatively few repos, checking your skills into repos works well. But every skill that is checked in also adds a little bit to the context of the model. As you scale, an internal plugin marketplace allows you to distribute skills and let your team decide which ones to install.\n\nManaging a Marketplace\nHow do you decide which skills go in a marketplace? How do people submit them?\nWe don't have a centralized team that decides; instead we try and find the most useful skills organically. 
If you have a skill that you want people to try out, you can upload it to a sandbox folder in GitHub and point people to it in Slack or other forums.\nOnce a skill has gotten traction (which is up to the skill owner to decide), they can put in a PR to move it into the marketplace.\nA note of warning, it can be quite easy to create bad or redundant skills, so making sure you have some method of curation before release is important.\n\nComposing Skills\nYou may want to have skills that depend on each other. For example, you may have a file upload skill that uploads a file, and a CSV generation skill that makes a CSV and uploads it. This sort of dependency management is not natively built into marketplaces or skills yet, but you can just reference other skills by name, and the model will invoke them if they are installed.\n\nMeasuring Skills\nTo understand how a skill is doing, we use a PreToolUse hook that lets us log skill usage within the company (example code here). This means we can find skills that are popular or are undertriggering compared to our expectations.\n\nConclusion\nSkills are incredibly powerful, flexible tools for agents, but it's still early and we're all figuring out how to use them best.\nThink of this more as a grab bag of useful tips that we've seen work than a definitive guide. The best way to understand skills is to get started, experiment, and see what works for you. Most of ours began as a few lines and a single gotcha, and got better because people kept adding to them as Claude hit new edge cases.\nI hope this was helpful, let me know if you have any questions.",
+    "images": [
+      "https://x.com/trq212/article/2033949937936085378/media/2033787061128581120",
+      "https://x.com/trq212/article/2033949937936085378/media/2033778969078861826",
+      "https://x.com/trq212/article/2033949937936085378/media/2033949742137544704",
+      "https://x.com/trq212/article/2033949937936085378/media/2033779922590961669",
+      "https://x.com/trq212/article/2033949937936085378/media/2033780423952896002",
+      "https://x.com/trq212/article/2033949937936085378/media/2033780654052413443",
+      "https://x.com/trq212/article/2033949937936085378/media/2033780772872851462",
+      "https://x.com/trq212/article/2033949937936085378/media/2033780836705964036",
+      "https://x.com/trq212/article/2033949937936085378/media/2033947639721693189",
+      "https://x.com/trq212/article/2033949937936085378/media/2033781427637293056",
+      "https://x.com/trq212/article/2033949937936085378/media/2033781485233491968"
+    ]
+  }
+}
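
Reviewer note (not part of the package): the article text captured above describes a `/careful` on-demand hook that blocks `rm -rf`, `DROP TABLE`, force-push, and `kubectl delete` on Bash tool calls. A minimal sketch of the matching step might look like the following. The `BLOCKED_PATTERNS` list, the regexes, and the `find_blocked_reason` name are all hypothetical and chosen for illustration; this is not the code of `destructive_command_blocker.py` or of the skill the article refers to, and the wiring into a PreToolUse hook is deliberately left out.

```python
import re
from typing import Optional

# Hypothetical patterns, one per command family named in the article.
# Each entry pairs a compiled regex with a human-readable reason.
BLOCKED_PATTERNS = [
    # rm with a short-flag cluster containing both r and f (e.g. -rf, -fr)
    (re.compile(r"\brm\s+-\w*(r\w*f|f\w*r)"), "recursive force delete"),
    (re.compile(r"\bdrop\s+table\b", re.IGNORECASE), "DROP TABLE"),
    # git push with --force (including --force-with-lease) or -f
    (re.compile(r"\bgit\s+push\b.*(--force\b|\s-f\b)"), "force push"),
    (re.compile(r"\bkubectl\s+delete\b"), "kubectl delete"),
]


def find_blocked_reason(command: str) -> Optional[str]:
    """Return the reason a shell command looks destructive, or None if it passes."""
    for pattern, reason in BLOCKED_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

A hook script built on this would likely read the proposed command, call `find_blocked_reason`, and refuse the tool call when a reason comes back. Regex matching is a coarse filter: long-form flags such as `rm --recursive --force` slip past this particular sketch, so a real guard would probably need a broader pattern set or proper argument parsing.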

package/skills/skill-builder/templates/gap-analysis.md CHANGED

@@ -1,8 +1,16 @@
 # Gap Analysis: [Skill Name]
+## Skill Type
+[One of the 9 types from skill-types.md. Determines folder structure.]
 ## Task Description
-[What the user is trying to accomplish -- the capability this skill should provide]
+[What the user is trying to accomplish — the capability this skill should provide]
+## Degree of Freedom
+[High | Medium | Low] — [Reasoning based on task fragility and variability]
 ## Gaps Identified
@@ -29,13 +37,12 @@
 ## Patterns
-- [Recurring themes across gaps -- e.g., "Claude consistently lacks knowledge about X"]
-- [Common failure modes -- e.g., "Without guidance, Claude chooses library A when library B is required"]
+- [Recurring themes across gaps — e.g., "Claude consistently lacks knowledge about X"]
+- [Common failure modes — e.g., "Without guidance, Claude chooses library A when library B is required"]
 - [Context that was repeatedly provided manually]
-## Candidate Eval Scenarios
+## Initial Gotcha Candidates
-- [Task that would expose Gap 1 -- becomes the seed for an eval]
-- [Task that would expose Gap 2]
-- [Task that would expose multiple gaps simultaneously]
-- [Edge case that tests boundary behavior]
+- [Failure pattern distilled to one line — "Claude will try to use X when it should use Y"]
+- [Another failure pattern — "Without explicit instruction, Claude skips the validation step"]
+- [Edge case that could become a gotcha]

package/skills/skill-builder/workflows/improve-skill.md CHANGED

@@ -9,89 +9,118 @@ Observation-first flow for iterating on an existing skill.
 ---
-## Phase 1: Observe
+## Step 1: Observe
-**Goal:** Document the existing skill's current behavior by running it on real tasks.
+**Goal:** Document what’s wrong with the current skill by watching it in action or gathering user reports.
-> "Use the Skill in real workflows: Give Claude B (with the Skill loaded) actual tasks, not test scenarios"
+> "Use the Skill in real workflows: Give Claude B (with the Skill loaded) actual tasks, not test scenarios."
-### Process
+### Option A: User has observed issues
-1. Identify the skill to improve. Read its current SKILL.md and any reference files.
+Ask:
+- "What specific issue did you observe?"
+- "Can you give me a concrete task where the skill underperformed?"
+- "Is this a triggering issue (skill doesn’t activate), a quality issue (skill activates but produces poor results), or a scope issue (skill does the wrong thing)?"
-2. Ask the user what prompted the improvement:
-   - "What specific issue did you observe?"
-   - "Can you give me a concrete task where the skill underperformed?"
-   - "Is this a triggering issue (skill does not activate), a quality issue (skill activates but produces poor results), or a scope issue (skill does the wrong thing)?"
+### Option B: No observations yet — spawn a test
-3. Run the existing skill on 2-3 real tasks. For each, spawn a subagent:
+Spawn a subagent with the existing skill on a real task (see delegation-map.md for the spawn pattern). Read the transcript when complete.
-   ```
-   Execute this task using the skill at [path-to-existing-skill]:
-   - Read the skill at [path]/SKILL.md and follow its instructions
-   - Task: [realistic task from user]
-   - Save outputs to: [skill-name]-workspace/observation/task-[N]/outputs/
-   - Save transcript to: [skill-name]-workspace/observation/task-[N]/transcript.md
-   ```
+### Transcript analysis
-4. Analyze the transcripts. Document observations:
-   - Where did the skill work well?
-   - Where did it fail or produce subpar results?
-   - Did Claude B follow the skill's instructions as written?
-   - Did Claude B ignore any sections or files?
-   - Did Claude B explore in unexpected directions?
+> "Watch for unexpected exploration paths, missed connections, overreliance on certain sections, and ignored content."
-5. Generate a gap analysis (same template as new-skill Phase 1) focused on the delta between current behavior and desired behavior.
+Document:
+- Where did the skill work well?
+- Where did it fail or produce subpar results?
+- Did Claude B follow the skill’s instructions as written?
+- Did Claude B ignore any sections or files?
+- Did Claude B explore in unexpected directions?
+- What would a gotcha have prevented?
-**Output:** `[skill-name]-workspace/gap-analysis.md` with observation-based gaps
+**Output:** Observation notes with specific failure examples.
 ---
-## Phase 2-6: Follow the New Skill Workflow
+## Step 2: Diagnose
-From here, follow the same phases as `${CLAUDE_SKILL_DIR}/workflows/new-skill.md`, starting at Phase 2 (Build Evals).
+**Goal:** Classify each failure so you know which best practice to apply.
-### Collaborative package orchestration (Phases 2–6)
+Failure classification:
-Whenever Phases 2–6 will touch **multiple files**, **progressive disclosure layout**, or use **checkpointed file-by-file rollout**, treat this as **required** before expanding or rewriting the tree:
+| Symptom | Diagnosis | Apply |
+|---|---|---|
+| Skill never activates when it should | Description missing trigger phrases or too vague | Principles: Description field |
+| Skill activates when it shouldn’t | Description too broad, no refusal cases | Principles: Constraints and refusal cases |
+| Claude reads wrong files first | Structure not intuitive, hub doesn’t guide well | Progressive disclosure |
+| Claude ignores a companion file | File not signaled in SKILL.md or poorly linked | File index, hub pattern |
+| Claude over-explains basics | Skill states what Claude already knows | Principles: Concision |
+| Claude follows instructions too rigidly | Skill railroads instead of guiding | Principles: Degree of freedom |
+| Claude makes same mistake repeatedly | Missing gotcha | Principles: Gotchas |
+| Claude errors on script execution | Script doesn’t handle errors, missing deps | Principles: Scripts |
+| Output format is wrong | Missing template or examples | Principles: Templates and examples |
-1. Read `skill-refinement-package.md` from the installed prompt-generator skill, typically at `~/.claude/skills/prompt-generator/templates/skill-refinement-package.md` (source dependency: [jl-cmd/prompt-generator](https://github.com/jl-cmd/prompt-generator)).
-2. Run `/prompt-generator` with that template’s token table filled: set `[[BASELINE_SKILL_ROOT]]` to the existing skill directory, `[[WORKSPACE_ROOT]]` to your iteration workspace (in-place or snapshot per user preference), and `[[DESIGN_INPUT_GLOB]]` to this workflow’s observation-based `gap-analysis.md` when it exists.
+**Output:** Diagnosis per failure — which best practice was violated.
-Use `skill-from-ground-up.md` **only** for **greenfield** packages where no baseline skill directory exists yet; use `skill-refinement-package.md` for every refinement anchored to an existing skill.
+---
+## Step 3: Apply Patterns
+**Goal:** Fix each diagnosed failure by applying the specific best practice that addresses it.
+> "Only change what the feedback demands. Do not reorganize working content."
+For each diagnosis from Step 2:
+1. Read the relevant section in `${CLAUDE_SKILL_DIR}/references/progressive-disclosure.md` or the SKILL.md principles.
+2. Make the minimum change that addresses the failure.
+3. Verify the fix doesn’t break anything that was working.
+Delegate larger rewrites to `/skill-writer` using the refine-skill handoff from delegation-map.md.
+**Output:** Modified skill files with targeted fixes.
+---
+## Step 4: Capture Gotchas
+**Goal:** Every observation is a gotcha candidate. Accumulate them.
-Key differences from the new-skill flow:
+> "Ideally, you will update your skill over time to capture these gotchas."
-- **Phase 2 (Build Evals):** Evals should test the specific issues observed in Phase 1, not hypothetical gaps.
+For each failure observed in Step 1:
-- **Phase 3 (Write Skill):** Instead of writing from scratch, invoke `/skill-writer` with:
+1. Distill it to a one-line gotcha: what went wrong and the signal that should have prevented it.
+2. Add it to the skill’s Gotchas section.
+3. If the failure mode is about skill-builder itself (not the skill being improved), add it to skill-builder’s own Gotchas section.
-  ```
-  Refine this existing skill based on observation findings.
+**Output:** Updated gotchas in the skill’s SKILL.md (and potentially skill-builder’s SKILL.md).
+---
-  Current SKILL.md: [reference or paste current skill]
-  Gap analysis: [reference observation-based gaps]
-  Eval scenarios: [reference evals]
+## Step 5: Self-Audit
-  Constraint: Preserve what works. Only change what the observations demand.
-  ```
+**Goal:** Re-verify the modified skill against all best practices.
-- **Phase 4 (Test):** The baseline is the CURRENT skill (snapshot it before editing). Compare old-skill vs new-skill, not with-skill vs without-skill.
+Same process as new-skill Step 5:
+1. Read `${CLAUDE_SKILL_DIR}/references/self-audit-checklist.md`.
+2. Check every item. Fix failures. Re-check.
+3. Pay special attention to items that overlap with the diagnosis from Step 2 — those were the failures; confirm they’re now fixed.
+**Output:** Completed checklist, all PASS or N/A.
+---
-  Before making any changes, snapshot the existing skill:
-  ```bash
-  cp -r [skill-path] [workspace]/skill-snapshot/
-  ```
+## Step 6: Deliver
-  Then for baseline runs, point subagents at the snapshot:
-  ```
-  Execute this task using the ORIGINAL skill at [workspace]/skill-snapshot/:
-  - Read the skill and follow its instructions
-  - Task: [eval prompt]
-  - Save outputs to: [workspace]/iteration-N/eval-[name]/old_skill/outputs/
-  - Save transcript to: [workspace]/iteration-N/eval-[name]/old_skill/transcript.md
-  ```
+**Goal:** Hand off the improved skill with delta summary.
-- **Phase 5 (Iterate):** Same process. The improvement loop compares new version against the snapshot.
+Present to the user:
-- **Phase 6 (Polish):** Same process. Run description optimization if triggering was an issue.
+1. **What was observed** — summary of failures from Step 1.
+2. **What was diagnosed** — which best practices were violated.
+3. **What changed** — delta summary (files modified, lines added/removed).
+4. **New gotchas added** — list of gotchas captured.
+5. **Audit summary** — post-fix audit results.
+6. **Suggested re-test** — a concrete task to verify the fix with Claude B.
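
Reviewer note (not part of the package): Step 6 asks for a delta summary that includes lines added and removed. One way to get those counts from the before and after contents of a skill file is Python's standard `difflib` module. The `summarize_delta` helper below is a hypothetical illustration under that assumption; the workflow itself does not prescribe any tool for this.

```python
import difflib


def summarize_delta(before: str, after: str) -> dict:
    """Count lines added and removed between two versions of a file's text."""
    added = 0
    removed = 0
    # ndiff prefixes every output line with a two-character code:
    # "+ " for added, "- " for removed, "  " for unchanged, "? " for hints.
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            removed += 1
    return {"added": added, "removed": removed}
```

For example, calling `summarize_delta(old_text, new_text)` on a snapshot of `SKILL.md` taken before Step 3 and on the edited file gives the per-file numbers for item 3 of the hand-off. A changed line counts once as removed and once as added, which matches how unified diffs usually report it.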