npm - claude-dev-env - Versions diffs - 1.38.0 → 1.39.0 - Mend

claude-dev-env 1.38.0 → 1.39.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (271)

package/CLAUDE.md +10 -36
package/_shared/pr-loop/audit-reply-template.md +147 -0
package/_shared/pr-loop/fix-protocol.md +25 -4
package/_shared/pr-loop/gh-payloads.md +37 -50
package/_shared/pr-loop/scripts/code_rules_gate.py +0 -60
package/_shared/pr-loop/scripts/config/post_audit_thread_constants.py +189 -0
package/_shared/pr-loop/scripts/post_audit_thread.py +947 -0
package/_shared/pr-loop/scripts/tests/test_code_rules_gate.py +0 -19
package/_shared/pr-loop/scripts/tests/test_post_audit_thread.py +923 -0
package/_shared/pr-loop/scripts/tests/test_post_audit_thread_constants.py +127 -0
package/_shared/pr-loop/state-schema.md +1 -1
package/agents/clean-coder.md +2 -2
package/bin/install.mjs +6 -7
package/bin/install.test.mjs +8 -0
package/commands/doc-gist.md +16 -0
package/commands/plan.md +0 -2
package/commands/review-plan.md +1 -1
package/docs/CODE_RULES.md +122 -2
package/hooks/blocking/bot_mention_comment_blocker.py +75 -0
package/hooks/blocking/code_rules_enforcer.py +1236 -161
package/hooks/blocking/convergence_gate_blocker.py +130 -0
package/hooks/blocking/destructive_command_blocker.py +74 -0
package/hooks/blocking/gh_body_arg_blocker.py +30 -0
package/hooks/blocking/md_to_html_blocker.py +119 -0
package/hooks/blocking/test_bot_mention_comment_blocker.py +131 -0
package/hooks/blocking/test_code_rules_enforcer.py +21 -0
package/hooks/blocking/test_code_rules_enforcer_any_exempt_files.py +70 -0
package/hooks/blocking/test_code_rules_enforcer_any_imports_and_cast.py +92 -0
package/hooks/blocking/test_code_rules_enforcer_banned_import_alias.py +143 -0
package/hooks/blocking/test_code_rules_enforcer_banned_prefixes.py +152 -0
package/hooks/blocking/test_code_rules_enforcer_bare_except.py +120 -0
package/hooks/blocking/test_code_rules_enforcer_boundary_types.py +175 -0
package/hooks/blocking/test_code_rules_enforcer_cap_meta.py +0 -1
package/hooks/blocking/test_code_rules_enforcer_collection_prefix.py +50 -0
package/hooks/blocking/test_code_rules_enforcer_docstring_format.py +255 -0
package/hooks/blocking/test_code_rules_enforcer_inline_tuple_string_magic.py +130 -0
package/hooks/blocking/test_code_rules_enforcer_stub_implementations.py +141 -0
package/hooks/blocking/test_code_rules_enforcer_test_branching.py +143 -0
package/hooks/blocking/test_code_rules_enforcer_thin_wrapper_files.py +169 -0
package/hooks/blocking/test_code_rules_enforcer_todo_markers.py +99 -0
package/hooks/blocking/test_code_rules_enforcer_typed_dict_pairs.py +141 -0
package/hooks/blocking/test_code_rules_enforcer_unused_imports.py +158 -0
package/hooks/blocking/test_convergence_gate_blocker.py +63 -0
package/hooks/blocking/test_destructive_command_blocker.py +146 -0
package/hooks/blocking/test_destructive_command_blocker_no_verify.py +102 -0
package/hooks/blocking/test_gh_body_arg_blocker.py +45 -0
package/hooks/blocking/test_md_to_html_blocker.py +317 -0
package/hooks/config/any_type_config.py +7 -0
package/hooks/config/banned_identifiers_constants.py +11 -0
package/hooks/config/blocking_check_limits.py +38 -0
package/hooks/config/bot_mention_comment_blocker_constants.py +20 -0
package/hooks/config/code_rules_enforcer_constants.py +53 -0
package/hooks/config/convergence_branch_constants.py +9 -0
package/hooks/config/doc_gist_auto_publish_constants.py +18 -0
package/hooks/config/html_companion_constants.py +20 -0
package/hooks/config/inline_tuple_string_magic_constants.py +22 -0
package/hooks/config/test_banned_identifiers_constants.py +17 -0
package/hooks/hooks.json +28 -20
package/hooks/pyproject.toml +69 -0
package/hooks/validators/mypy_integration.py +47 -1
package/hooks/validators/run_all_validators.py +3 -3
package/hooks/validators/test_mypy_integration.py +50 -1
package/hooks/workflow/doc_gist_auto_publish.py +144 -0
package/hooks/workflow/md_to_html_companion.py +365 -0
package/hooks/workflow/test_doc_gist_auto_publish.py +117 -0
package/hooks/workflow/test_md_to_html_companion.py +452 -0
package/package.json +1 -1
package/rules/gh-body-file.md +2 -0
package/scripts/Install-SweepEmptyDirs.ps1 +111 -0
package/scripts/check.ps1 +106 -0
package/scripts/config/timing.py +11 -0
package/scripts/sweep_empty_dirs.py +138 -0
package/scripts/sync_to_cursor/rules.py +1 -1
package/scripts/test_sweep_empty_dirs.py +183 -0
package/skills/_shared/pr-loop/prompts/pr-consistency-audit.xml +323 -0
package/skills/_shared/pr-loop/scripts/_cli_utils.py +22 -0
package/skills/_shared/pr-loop/scripts/_path_resolver.py +165 -0
package/skills/_shared/pr-loop/scripts/_xml_utils.py +20 -0
package/skills/_shared/pr-loop/scripts/build_audit_prompt.py +182 -0
package/skills/_shared/pr-loop/scripts/build_fix_prompt.py +185 -0
package/skills/_shared/pr-loop/scripts/config/__init__.py +0 -0
package/skills/_shared/pr-loop/scripts/config/path_resolver_constants.py +78 -0
package/skills/_shared/pr-loop/scripts/init_loop_state.py +135 -0
package/skills/_shared/pr-loop/scripts/teardown_worktrees.py +175 -0
package/skills/_shared/pr-loop/scripts/write_audit_outcomes.py +182 -0
package/skills/_shared/pr-loop/scripts/write_fix_outcomes.py +206 -0
package/skills/bugteam/CONSTRAINTS.md +21 -22
package/skills/bugteam/EXAMPLES.md +3 -3
package/skills/bugteam/PROMPTS.md +227 -67
package/skills/bugteam/SKILL.md +114 -455
package/skills/bugteam/reference/README.md +1 -1
package/skills/bugteam/reference/audit-and-teammates.md +112 -39
package/skills/bugteam/reference/audit-contract.md +4 -22
package/skills/bugteam/reference/copilot-gap-analysis.md +8 -5
package/skills/bugteam/reference/design-rationale.md +2 -2
package/skills/bugteam/reference/github-pr-reviews.md +50 -57
package/skills/bugteam/reference/obstacles/audit-assign-ids.md +13 -0
package/skills/bugteam/reference/obstacles/audit-capture-excerpts.md +13 -0
package/skills/bugteam/reference/obstacles/audit-walk-categories.md +13 -0
package/skills/bugteam/reference/obstacles/audit-write-xml.md +13 -0
package/skills/bugteam/reference/obstacles/fix-append-summary.md +13 -0
package/skills/bugteam/reference/obstacles/fix-apply-fixes.md +13 -0
package/skills/bugteam/reference/obstacles/fix-git-add-commit.md +13 -0
package/skills/bugteam/reference/obstacles/fix-git-push.md +13 -0
package/skills/bugteam/reference/obstacles/fix-post-reply.md +13 -0
package/skills/bugteam/reference/obstacles/fix-publish-summary.md +13 -0
package/skills/bugteam/reference/obstacles/fix-py-compile.md +13 -0
package/skills/bugteam/reference/obstacles/fix-read-files.md +13 -0
package/skills/bugteam/reference/obstacles/fix-resolve-thread.md +13 -0
package/skills/bugteam/reference/obstacles/fix-test-suite.md +13 -0
package/skills/bugteam/reference/obstacles/fix-violation-count.md +13 -0
package/skills/bugteam/reference/obstacles/fix-write-xml.md +13 -0
package/skills/bugteam/reference/team-setup.md +106 -9
package/skills/bugteam/reference/teardown-publish-permissions.md +39 -8
package/skills/bugteam/scripts/README.md +60 -0
package/skills/bugteam/scripts/_claude_permissions_common.py +358 -0
package/skills/bugteam/scripts/bugteam_code_rules_gate.py +976 -0
package/skills/bugteam/scripts/bugteam_fix_hookspath.py +375 -0
package/skills/bugteam/scripts/bugteam_preflight.py +294 -0
package/skills/bugteam/scripts/config/bugteam_code_rules_gate_constants.py +25 -0
package/skills/bugteam/scripts/config/bugteam_fix_hookspath_constants.py +26 -0
package/skills/bugteam/scripts/config/bugteam_preflight_constants.py +35 -0
package/skills/bugteam/scripts/config/claude_permissions_common_constants.py +20 -0
package/skills/bugteam/scripts/config/probe_code_rules_enforcer_check_constants.py +12 -0
package/skills/bugteam/scripts/config/windows_safe_rmtree_constants.py +7 -0
package/skills/bugteam/scripts/grant_project_claude_permissions.py +175 -0
package/skills/bugteam/scripts/probe_code_rules_enforcer_check.py +107 -0
package/skills/bugteam/scripts/revoke_project_claude_permissions.py +220 -0
package/skills/bugteam/scripts/test__claude_permissions_common.py +112 -0
package/skills/bugteam/scripts/test_bugteam_code_rules_gate.py +400 -0
package/skills/bugteam/scripts/test_bugteam_fix_hookspath.py +384 -0
package/skills/bugteam/scripts/test_bugteam_preflight.py +268 -0
package/skills/bugteam/scripts/test_claude_permissions_common.py +195 -0
package/skills/bugteam/scripts/test_grant_project_claude_permissions.py +55 -0
package/skills/bugteam/scripts/test_probe_code_rules_enforcer_check.py +76 -0
package/skills/bugteam/scripts/test_revoke_project_claude_permissions.py +55 -0
package/skills/bugteam/scripts/test_windows_safe_rmtree.py +108 -0
package/skills/bugteam/scripts/windows_safe_rmtree.py +100 -0
package/skills/bugteam/test_skill_additions.py +1 -11
package/skills/code/SKILL.md +176 -0
package/skills/doc-gist/SKILL.md +99 -0
package/skills/doc-gist/references/examples/01-exploration-code-approaches.html +453 -0
package/skills/doc-gist/references/examples/02-exploration-visual-designs.html +515 -0
package/skills/doc-gist/references/examples/03-code-review-pr.html +638 -0
package/skills/doc-gist/references/examples/04-code-understanding.html +491 -0
package/skills/doc-gist/references/examples/05-design-system.html +629 -0
package/skills/doc-gist/references/examples/06-component-variants.html +605 -0
package/skills/doc-gist/references/examples/07-prototype-animation.html +455 -0
package/skills/doc-gist/references/examples/08-prototype-interaction.html +396 -0
package/skills/doc-gist/references/examples/09-slide-deck.html +592 -0
package/skills/doc-gist/references/examples/10-svg-illustrations.html +492 -0
package/skills/doc-gist/references/examples/11-status-report.html +528 -0
package/skills/doc-gist/references/examples/12-incident-report.html +596 -0
package/skills/doc-gist/references/examples/13-flowchart-diagram.html +395 -0
package/skills/doc-gist/references/examples/14-research-feature-explainer.html +381 -0
package/skills/doc-gist/references/examples/15-research-concept-explainer.html +368 -0
package/skills/doc-gist/references/examples/16-implementation-plan.html +702 -0
package/skills/doc-gist/references/examples/17-pr-writeup.html +595 -0
package/skills/doc-gist/references/examples/18-editor-triage-board.html +573 -0
package/skills/doc-gist/references/examples/19-editor-feature-flags.html +663 -0
package/skills/doc-gist/references/examples/20-editor-prompt-tuner.html +722 -0
package/skills/doc-gist/references/examples/README.md +5 -0
package/skills/doc-gist/scripts/config/__init__.py +0 -0
package/skills/doc-gist/scripts/config/gist_upload_constants.py +16 -0
package/skills/doc-gist/scripts/gist_upload.py +177 -0
package/skills/doc-gist/scripts/test_gist_upload.py +51 -0
package/skills/findbugs/SKILL.md +68 -2
package/skills/monitor-open-prs/SKILL.md +13 -32
package/skills/monitor-open-prs/test_skill_contract.py +0 -11
package/skills/pr-consistency-audit/SKILL.md +112 -0
package/skills/pr-consistency-audit/reference/detection-rules.md +96 -0
package/skills/pr-consistency-audit/reference/illustrations.md +78 -0
package/skills/pr-converge/SKILL.md +227 -23
package/skills/pr-converge/config/__init__.py +0 -0
package/skills/pr-converge/config/constants.py +62 -0
package/skills/pr-converge/reference/convergence-gates.md +138 -44
package/skills/pr-converge/reference/examples.md +43 -11
package/skills/pr-converge/reference/fix-protocol.md +6 -5
package/skills/pr-converge/reference/ground-rules.md +5 -3
package/skills/pr-converge/reference/multi-pr-orchestration.md +44 -19
package/skills/pr-converge/reference/obstacles/fix-post-replies.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-publish-summary.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-push.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-read-filelines.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-reset-state.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-resolve-threads.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-spawn-clean-coder.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-stage-commit.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-trigger-bugbot.md +13 -0
package/skills/pr-converge/reference/obstacles/fix-write-test.md +13 -0
package/skills/pr-converge/reference/per-tick.md +90 -31
package/skills/pr-converge/reference/state-schema.md +22 -1
package/skills/pr-converge/reference/stop-conditions.md +9 -7
package/skills/pr-converge/scripts/README.md +34 -46
package/skills/pr-converge/scripts/check_bugbot_ci.py +174 -0
package/skills/pr-converge/scripts/check_convergence.py +497 -0
package/skills/pr-converge/scripts/check_pending_reviews.py +154 -0
package/skills/pr-converge/scripts/config/pr_converge_constants.py +118 -0
package/skills/pr-converge/scripts/fetch_copilot_reviews.py +134 -0
package/skills/pr-converge/scripts/post_fix_reply.py +168 -0
package/skills/pr-converge/workflows/schedule-wakeup-loop.md +5 -12
package/skills/qbug/SKILL.md +132 -27
package/skills/session-log/SKILL.md +216 -114
package/skills/session-tidy/SKILL.md +1 -1
package/skills/skill-builder/SKILL.md +138 -56
package/skills/skill-builder/references/delegation-map.md +72 -113
package/skills/skill-builder/references/progressive-disclosure.md +122 -0
package/skills/skill-builder/references/self-audit-checklist.md +92 -0
package/skills/skill-builder/references/skill-types.md +228 -0
package/skills/skill-builder/references/thariq-x-post-skills.json +33 -0
package/skills/skill-builder/templates/gap-analysis.md +15 -8
package/skills/skill-builder/workflows/improve-skill.md +86 -57
package/skills/skill-builder/workflows/new-skill.md +80 -168
package/skills/skill-builder/workflows/polish-skill.md +78 -54
package/skills/structure-prompt/SKILL.md +50 -0
package/skills/structure-prompt/reference/adversarial-tuning.md +62 -0
package/skills/structure-prompt/reference/block-classification.md +27 -0
package/skills/structure-prompt/reference/canonical-case.md +48 -0
package/skills/structure-prompt/reference/citation-depth.md +70 -0
package/skills/structure-prompt/reference/cleanup.md +33 -0
package/skills/structure-prompt/reference/constraints.md +33 -0
package/skills/structure-prompt/reference/directives.md +37 -0
package/skills/structure-prompt/reference/examples.md +72 -0
package/skills/structure-prompt/reference/instantiation.md +51 -0
package/skills/structure-prompt/reference/output-contract.md +72 -0
package/skills/structure-prompt/reference/per-category.md +23 -0
package/skills/structure-prompt/reference/persona.md +38 -0
package/skills/structure-prompt/reference/research.md +33 -0
package/skills/structure-prompt/reference/structure.md +28 -0
package/agents/code-standards-agent.md +0 -93
package/agents/groq-coder.md +0 -113
package/agents/plan-executor.md +0 -226
package/agents/project-docs-analyzer.md +0 -53
package/agents/project-structure-organizer-agent.md +0 -72
package/agents/skill-to-agent-converter.md +0 -370
package/agents/skill-writer-agent.md +0 -470
package/agents/user-docs-writer.md +0 -67
package/agents/workflow-visual-documenter.md +0 -82
package/commands/readability-review.md +0 -20
package/hooks/mypy.ini +0 -2
package/hooks/notification/attention_needed_notify.py +0 -71
package/hooks/notification/claude_notification_handler.py +0 -67
package/hooks/notification/notification_utils.py +0 -267
package/hooks/notification/subagent_complete_notify.py +0 -381
package/hooks/notification/test_attention_needed_notify.py +0 -47
package/hooks/notification/test_claude_notification_handler.py +0 -54
package/hooks/notification/test_notification_utils.py +0 -91
package/hooks/notification/test_subagent_complete_notify.py +0 -79
package/scripts/config/groq_bugteam_config.py +0 -230
package/scripts/config/test_groq_bugteam_config.py +0 -83
package/scripts/config/test_spec_implementer_prompt.py +0 -32
package/scripts/groq_bugteam.README.md +0 -131
package/scripts/groq_bugteam.py +0 -647
package/scripts/groq_bugteam_dotenv.py +0 -40
package/scripts/groq_bugteam_spec.py +0 -226
package/scripts/test_groq_bugteam.py +0 -529
package/scripts/test_groq_bugteam_apply_fix_from_spec.py +0 -426
package/scripts/test_groq_bugteam_dotenv.py +0 -66
package/scripts/test_groq_bugteam_spec.py +0 -338
package/skills/bugteam/SKILL_EVALS.md +0 -309
package/skills/dream/SKILL.md +0 -118
package/skills/ingest/SKILL.md +0 -40
package/skills/npm-creator/SKILL.md +0 -187
package/skills/readability-review/SKILL.md +0 -127
package/skills/resume-review/SKILL.md +0 -261
package/skills/rule-audit/SKILL.md +0 -307
package/skills/rule-creator/SKILL.md +0 -150
package/skills/searching-obsidian-vault/SKILL.md +0 -131
package/skills/skill-writer/REFERENCE.md +0 -284
package/skills/skill-writer/SKILL.md +0 -222
package/skills/tdd-team/SKILL.md +0 -128

package/skills/skill-builder/workflows/new-skill.md CHANGED

@@ -1,235 +1,147 @@
 # New Skill Workflow
-Full evaluation-driven lifecycle for building a new skill from scratch.
+Best-practice-driven lifecycle for building a skill from scratch.
 ## Prerequisites
 - The user has a task or domain they want to capture as a skill
 - No existing skill for this capability (or intentionally starting fresh)
-### Ground-up package layout (required before multi-file implementation)
-When the outcome includes **ARCHITECTURE.md**, **REFERENCE / EXAMPLES / WORKFLOWS**, and **`evals/*.json`** under a workspace (Anthropic-style progressive disclosure plus checkpointed rollout):
-1. Read `prompt-generator/templates/skill-from-ground-up.md` from the installed `~/.claude/skills/` tree (provided by [@jl-cmd/prompt-generator](https://github.com/jl-cmd/prompt-generator)).
-2. Run `/prompt-generator` using that template (substitute tokens per its table) **before** Phase 3 expands the repo; align the XML scope block with this workflow’s workspace and evidence rules.
-3. Keep Phase 1–2 artifacts honest: eval prompts and expectations stay grounded in **real** user scenarios; the template reinforces eval rows that reference pasted or explicitly approved evidence only.
-Skip this block only when the user explicitly wants a **single-file** SKILL.md with no staged package plan.
-Refinements to an **existing** skill package use `prompt-generator/templates/skill-refinement-package.md` instead (see `improve-skill.md`).
----
-## Phase 1: Identify Gaps
-**Goal:** Document what fails or requires repeated context when working without a skill.
-### Process
-1. Have a guided conversation to uncover gaps. Explore these areas:
-   - "What task were you doing when you realized you needed a skill?"
-   - "What context did you repeatedly provide to Claude?"
-   - "Where did Claude fail or produce subpar results without guidance?"
-   - "What domain knowledge was missing?"
-   - "What specific format or structure did you need?"
-   - "Were there tools or scripts that needed to be used in a particular way?"
-   - "What rules or constraints did Claude violate?"
-2. As patterns emerge, probe for eval-worthy scenarios:
-   - "Can you give me a concrete example of a task where this failed?"
-   - "What would success look like for that specific task?"
-   - "Are there edge cases where the right approach changes?"
-3. Generate `gap-analysis.md` from the conversation using the template at `${CLAUDE_SKILL_DIR}/templates/gap-analysis.md`. Fill in all sections from what was discussed.
-4. Review the gap analysis with the user. Confirm completeness before moving to Phase 2.
-**Output:** `[skill-name]-workspace/gap-analysis.md`
 ---
-## Phase 2: Build Evals
-**Goal:** Create 3+ evaluation scenarios that test the identified gaps. Establish a baseline.
+## Step 1: Classify
-### Process
+**Goal:** Determine the skill type. Type dictates folder structure.
-1. Transform each gap into at least one eval scenario. Each scenario needs:
-   - A realistic user prompt (detailed and specific, like a real request)
-   - A description of what success looks like
-   - Objectively verifiable expectations (assertions)
+1. Read `${CLAUDE_SKILL_DIR}/references/skill-types.md`.
-2. Draft evals using the schema at `${CLAUDE_SKILL_DIR}/templates/eval-scenario.json`. Ensure:
-   - Minimum 3 scenarios (official requirement)
-   - Every identified gap has at least one scenario testing it
-   - Expectations are objectively verifiable, not subjective
-   - Prompts sound like things a real user would say
+2. Ask the user about the skill’s purpose:
-3. Review eval scenarios with the user. Adjust until both sides are satisfied.
+   > "What will this skill help Claude do?"
-4. Save to `[skill-name]-workspace/evals/evals.json`.
+   Match the answer against the 9 types. If ambiguous, present the top 2-3 matches and ask the user to choose.
-5. **Establish baseline.** For each eval, spawn a subagent WITHOUT any skill:
+3. Record the classification: type number, type name, recommended folders.
-   ```
-   Execute this task with NO skill loaded:
-   - Task: [eval prompt]
-   - Input files: [eval files if any, or "none"]
-   - Save all output files to: [workspace]/iteration-0/eval-[name]/without_skill/outputs/
-   - Save a complete transcript of your work to: [workspace]/iteration-0/eval-[name]/without_skill/transcript.md
-   ```
-   Spawn all baseline runs in parallel. Capture timing data when each completes.
-6. Grade baseline results using the skill-creator grading agent. See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for exact grading invocation.
-**Output:** `[skill-name]-workspace/evals/evals.json` and baseline results in `iteration-0/`
+**Output:** Type classification with folder plan.
 ---
-## Phase 3: Write Minimal Skill
+## Step 2: Scaffold
-**Goal:** Create just enough skill content to address the documented gaps and pass evaluations.
+**Goal:** Create the folder structure. Every skill starts with the same skeleton plus type-specific additions.
-### Process
+1. Create the skill directory if it doesn’t exist.
-1. Invoke `/skill-writer` with this context:
+2. Create the minimum structure:
    ```
-   Create a skill based on this gap analysis and eval scenarios.
-   Gap analysis: [reference or paste gap-analysis.md]
-   Eval scenarios: [reference or paste evals.json expected_output and expectations]
-   Baseline failures: [summarize what Claude got wrong in iteration-0]
-   Constraint: Write the minimum instructions needed to address these specific gaps.
-   Every line must serve a documented gap. Do not over-document.
+   skill-name/
+   ├── SKILL.md          # Hub — every skill has this
    ```
-2. `/skill-writer` will run its workflow: classify type, set degree of freedom, ask clarifying questions, produce the SKILL.md artifact.
+3. Add type-specific directories based on Step 1 classification (see `${CLAUDE_SKILL_DIR}/references/skill-types.md` for the folder recommendations per type).
-3. Review the draft with the user:
-   - "Does this address all the gaps we identified?"
-   - "Is anything here unnecessary or over-engineered?"
-   - "Would this pass our eval scenarios?"
+4. Verify the scaffold matches the type recommendation.
-4. Save the skill to its target directory.
+> "As your Skill grows, you can bundle additional content that Claude loads only when needed."
-**Output:** The skill's SKILL.md (and optional reference files)
+**Output:** Directory tree with SKILL.md stub.
 ---
-## Phase 4: Test (Feedback Loop)
+## Step 3: Gather
-**Goal:** Run the skill on eval scenarios, compare against baseline, identify remaining gaps.
+**Goal:** Collect domain knowledge, failure patterns, and gotchas from the user.
-### Process
+> "Build a Gotchas Section — these sections should be built up from common failure points that Claude runs into when using your skill."
-1. **Spawn all runs in parallel.** For each eval scenario, launch a with-skill subagent:
+### Interview questions
-   ```
-   Execute this task:
-   - Read the skill at [path-to-skill]/SKILL.md and follow its instructions
-   - Task: [eval prompt from evals.json]
-   - Input files: [eval files if any, or "none"]
-   - Save all output files to: [workspace]/iteration-N/eval-[name]/with_skill/outputs/
-   - Save a complete transcript of your work to: [workspace]/iteration-N/eval-[name]/with_skill/transcript.md
-   ```
+Ask the user:
-   For iteration-1, the without-skill baseline already exists from Phase 2.
+1. "What task were you doing when you realized you needed a skill?"
+2. "What context did you repeatedly provide to Claude?"
+3. "Where did Claude fail or produce subpar results without guidance?"
+4. "What does Claude consistently get wrong about this domain?"
+5. "What specific format or structure do you need in the output?"
+6. "Are there rules or constraints Claude must never violate?"
+7. "What tools, scripts, or libraries does Claude need to use?"
+8. "Does this skill need to run differently for different models (Haiku vs Opus)?"
-2. **While runs are in progress**, review and refine assertions if needed based on what was learned from the baseline.
+### Generate gap analysis
-3. **When runs complete**, immediately capture timing data (`total_tokens`, `duration_ms`) to `timing.json` in each run directory. This data is only available in the task completion notification.
+Use the template at `${CLAUDE_SKILL_DIR}/templates/gap-analysis.md`. Fill in:
-4. **Grade each run** using the skill-creator grading agent. See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for the grading process.
+- Skill type and degree of freedom
+- Task description
+- Gaps identified (what failed, what was needed)
+- Recurring patterns across gaps
+- Initial gotcha candidates
-5. **Aggregate into benchmark** using skill-creator's aggregation script. See delegation-map.md for the exact command.
+### Assess degree of freedom
-6. **Launch the eval viewer** using skill-creator's generate_review.py. See delegation-map.md for the exact command. For iteration 2+, include `--previous-workspace` to show diffs.
+> "Match the level of specificity to the task’s fragility and variability."
-7. Tell the user to review in the viewer:
-   - "Outputs" tab: click through each test case, leave feedback
-   - "Benchmark" tab: quantitative comparison (pass rates, timing, tokens)
+| Degree | When | Example |
+|---|---|---|
+| High | Multiple valid approaches, context-dependent | Code review guidelines |
+| Medium | Preferred pattern exists, some variation ok | Report generation with template |
+| Low | Fragile operations, consistency critical | Database migration with exact script |
-8. Wait for the user to complete their review.
+Record the assessment with reasoning.
-**Output:** `grading.json`, `benchmark.json`, `feedback.json` in the iteration directory
+**Output:** Completed gap analysis, initial gotchas list, degree-of-freedom assessment.
 ---
-## Phase 5: Iterate
-**Goal:** Refine the skill based on observed Claude B behavior and user feedback.
+## Step 4: Write
-### Process
+**Goal:** Produce the skill package — SKILL.md and companion files.
-1. Read `feedback.json` from the viewer. Empty feedback means the user was satisfied with that test case.
+Delegate to `/skill-writer` using the structured handoff from `${CLAUDE_SKILL_DIR}/references/delegation-map.md`.
-2. Read transcripts from Phase 4 runs. Watch for the signals the official docs highlight:
-   - **Unexpected exploration paths** -- Claude B read files in an order you did not anticipate
-   - **Missed connections** -- Claude B did not follow references to important files
-   - **Overreliance on certain sections** -- content that should be promoted to SKILL.md
-   - **Ignored content** -- files Claude B never accessed (may be unnecessary or poorly signaled)
-   - **Repeated work across test cases** -- all subagents wrote similar helper scripts (bundle them into the skill)
+The handoff must include: skill type, folder structure, gap analysis, initial gotchas, degree of freedom, constraints.
-3. Synthesize observations into actionable improvements. For each piece of feedback, identify the specific skill change that would fix it.
+After skill-writer produces the draft:
-4. Apply improvements. For significant changes, re-invoke `/skill-writer` with:
-   ```
-   Refine this existing skill based on testing observations.
+1. Verify it follows the hub layout (principle → gotchas → when-applies → process → file index → folder map).
+2. Verify SKILL.md body is under 500 lines.
+3. Verify all references are one level deep.
+4. Verify files over 100 lines have a TOC.
-   Current SKILL.md: [reference or paste]
-   User feedback: [from feedback.json -- only non-empty entries]
-   Behavioral observations: [from transcript analysis]
+Fix structural issues before proceeding.
-   Specific issues to address:
-   1. [Issue]
-   2. [Issue]
+**Output:** Complete skill package at the target directory.
-   Constraint: Only change what the feedback demands. Do not reorganize working content.
-   ```
+---
-5. Key principles for this phase (from the official docs):
-   - **Generalize from feedback** -- the skill will be used across many different prompts, not just these test cases
-   - **Keep the prompt lean** -- remove instructions that are not pulling their weight
-   - **Explain the why** -- theory of mind beats rigid MUSTs
-   - **Bundle repeated work** -- if subagents all wrote similar scripts, add them to the skill
+## Step 5: Self-Audit
-6. Return to Phase 4 with the refined skill. Continue iterating until:
-   - User feedback is all empty (satisfied with every test case)
-   - Pass rates meet acceptable thresholds
-   - No meaningful progress between iterations
+**Goal:** Verify every best practice is satisfied before delivery.
----
+1. Read `${CLAUDE_SKILL_DIR}/references/self-audit-checklist.md`.
+2. Copy the checklist into your response.
+3. Check every item against the built skill. For each: PASS, FAIL with file:line evidence, or N/A with reason.
+4. Every FAIL must be fixed before proceeding. Apply fixes, then re-check that item.
+5. When all items are PASS or N/A, proceed to Step 6.
-## Phase 6: Polish
+For an independent check, spawn a subagent to run the audit (see delegation-map.md).
-**Goal:** Optimize the skill description for triggering accuracy and run final validation.
+**Output:** Completed checklist with all items PASS or N/A.
-### Process
+---
-1. **Description optimization.** Follow the process in `${CLAUDE_SKILL_DIR}/workflows/polish-skill.md`.
+## Step 6: Deliver
-2. **Final validation.** Run the skill-writer self-check rubric against the finished skill:
-   - [ ] Description is third person with trigger phrases
-   - [ ] Under 500 lines
-   - [ ] States what to do in positive terms (not prohibition-heavy)
-   - [ ] Degree of freedom matches task fragility
-   - [ ] Progressive disclosure used (heavy content in separate files)
-   - [ ] Examples are concrete, not abstract
-   - [ ] Frontmatter fields are valid
-   - [ ] One skill = one capability
+**Goal:** Hand off the finished skill with full documentation.
-3. **Final checklist** from the official Anthropic docs:
-   - [ ] At least 3 evaluation scenarios created and passing
-   - [ ] Tested with real usage scenarios
-   - [ ] Skill solves documented gaps (not imagined requirements)
-   - [ ] Iterative refinement based on observed behavior (not assumptions)
+Present to the user:
-4. Present the finished skill to the user with:
-   - Final benchmark comparison (latest iteration vs baseline)
-   - Summary of gaps addressed
-   - Any remaining limitations or known edge cases
+1. **File map** — every file created, with its purpose.
+2. **Skill type** — classification and why it fits.
+3. **Degree of freedom** — assessment and reasoning.
+4. **Gotchas seeded** — initial gotchas captured.
+5. **Audit summary** — "All 38 items: N passed, M N/A."
+6. **Maintenance notes** — what to watch for in future usage that might warrant iteration.
+7. **Suggested first test** — a concrete task to try with Claude B.

package/skills/skill-builder/workflows/polish-skill.md CHANGED

@@ -4,89 +4,113 @@ Final optimization pass for a skill that is functionally complete.
 ## Prerequisites
-- The skill passes its evaluation scenarios
+- The skill has been used and observed
 - The user is satisfied with output quality
 - This is the final step before the skill is considered done
-### Package-aware polish (recommended)
+---
+## Step 1: Description Audit
+**Goal:** Verify the description field is optimized for model discovery.
+> "The description is critical for skill selection: Claude uses it to choose the right Skill from potentially 100+ available Skills."
+> "The description field is not a summary — it's a description of when to trigger."
-When the polish pass will touch **more than frontmatter alone** (for example `REFERENCE.md`, `EXAMPLES.md`, `WORKFLOWS.md`, link structure, or eval JSON), or the user wants **checkpointed** multi-file updates alongside description work:
+Check each requirement:
+- [ ] **Third person.** "Processes Excel files" not "I can help you process Excel files."
+- [ ] **Includes what AND when.** Both the capability and trigger contexts.
+- [ ] **Specific trigger phrases.** Different phrasings of the same intent should all match.
+- [ ] **Under 1024 characters.** Hard limit.
+- [ ] **No XML tags.**
+- [ ] **Distinguishable from similar skills.** If two skills overlap, the descriptions must make the boundary clear.
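The mechanical items in this checklist (length, XML tags, third person) can be approximated in code. The following is a minimal sketch and not part of the package: the function name `audit_description`, the first-person word list, and the tag pattern are all hypothetical, and it treats a description longer than 1024 characters as a failure, which is an assumption about where the boundary sits. The remaining items (what AND when, trigger phrases, distinguishability) still need human judgment.

```python
import re

# Limit quoted in the checklist; treating "longer than 1024" as the failure
# condition is an assumption of this sketch.
MAX_DESCRIPTION_CHARS = 1024

# Hypothetical heuristics, not taken from the package.
FIRST_PERSON_PATTERN = re.compile(r"\b(?:I|me|my|we|our)\b", re.IGNORECASE)
XML_TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def audit_description(description: str) -> list[str]:
    """Return the failed mechanical checks; an empty list means none failed."""
    failures = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        failures.append("over 1024 characters")
    if XML_TAG_PATTERN.search(description):
        failures.append("contains XML tags")
    if FIRST_PERSON_PATTERN.search(description):
        failures.append("not third person")
    return failures
```

Calling `audit_description("I can help you process Excel files.")` flags the first-person phrasing, while the checklist's own example "Processes Excel files" passes all three checks.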
+### Trigger phrase review
+Generate 10 variations of the user's intent:
+- Formal and casual phrasings
+- Cases where the user doesn't explicitly name the skill but clearly needs it
+- Cases where this skill competes with another but should win
-1. Read `prompt-generator/templates/skill-refinement-package.md` (repository path: `skills/prompt-generator/templates/skill-refinement-package.md` in [jl-cmd/prompt-generator](https://github.com/jl-cmd/prompt-generator)).
-2. Run `/prompt-generator` with tokens filled so `ARCHITECTURE.md` records baseline inventory, planned deltas for polish, and evidence rules for any new trigger or behavior evals.
+For each, answer: would the current description cause Claude to select this skill?
-Purely **single-field** `description` edits with no structural package changes can skip this block.
+Also check 5 near-miss phrasings — adjacent domains where this skill should NOT trigger. Verify the description doesn't cause false activation.
+### Fix issues
+If the description fails any check, revise it. Show before/after with the specific change and why it improves discovery.
+**Output:** Verified description (and revised version if changes were made).
 ---
-## Step 1: Description Optimization
+## Step 2: Progressive Disclosure Audit
-Optimize the skill's description for triggering accuracy using the skill-creator's trigger eval system.
+**Goal:** Verify the file structure follows all progressive disclosure rules.
-### Generate trigger eval queries
+> "Keep SKILL.md body under 500 lines."
-Create 20 eval queries: 10 should-trigger and 10 should-not-trigger.
+Check:
-**Should-trigger queries (10):** Different phrasings of the same intent. Include:
-- Formal and casual variations
-- Cases where the user does not explicitly name the skill but clearly needs it
-- Uncommon use cases
-- Cases where this skill competes with another but should win
+- [ ] SKILL.md body under 500 lines.
+- [ ] All reference files link directly from SKILL.md (one level deep).
+- [ ] Every file over 100 lines has a table of contents.
+- [ ] File index in SKILL.md lists every companion file with its purpose.
+- [ ] Forward slashes only in all paths.
+- [ ] File names are descriptive (`form_validation_rules.md`, not `doc2.md`).
+- [ ] Scripts clearly marked as execute vs read-as-reference.
-**Should-not-trigger queries (10):** Near-misses that share keywords but need something different. Include:
-- Adjacent domains with overlapping terminology
-- Ambiguous phrasing where naive keyword matching would falsely trigger
-- Tasks that touch the skill's domain but in a context where another tool is better
+### Fix structural issues
-All queries must be realistic -- detailed, specific, with file paths, personal context, casual speech. Not abstract one-liners.
+If any check fails, restructure. Common fixes:
+- SKILL.md too long → move sections to companion files, leave links.
+- Nested references → surface all links to SKILL.md.
+- Missing TOC → add to files over 100 lines.
-### Review with user
+**Output:** Verified file structure (and restructured files if changes were made).
-Present the eval set using the skill-creator's HTML review template. See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for the exact process.
+---
-The user can edit queries, toggle should-trigger, and add/remove entries.
+## Step 3: Gotcha Freshness
-### Run optimization loop
+**Goal:** Ensure gotchas reflect current observations.
-See `${CLAUDE_SKILL_DIR}/references/delegation-map.md` for the exact command. The loop:
-1. Splits eval set into 60% train / 40% held-out test
-2. Evaluates current description (3 runs per query for reliability)
-3. Proposes improvements based on failures
-4. Re-evaluates on both train and test
-5. Iterates up to 5 times
-6. Selects best description by test score (avoids overfitting)
+> "Ideally, you will update your skill over time to capture these gotchas."
-### Apply result
+- Review the skill's Gotchas section.
+- Check against recent usage: are there new failure modes not yet captured?
+- Remove gotchas for issues that no longer occur (the skill fixed them).
+- Verify each gotcha is actionable — a reader should know what to avoid and why.
-Update the skill's SKILL.md frontmatter with the optimized description. Show the user before/after with scores.
+**Output:** Updated gotchas section (and any new gotchas for skill-builder itself).
 ---
-## Step 2: Final Validation
+## Step 4: Full Self-Audit
+**Goal:** Complete 38-point checklist pass.
-Run the skill-writer self-check rubric:
+Same as new-skill Step 5 and improve-skill Step 5:
-- [ ] Description is third person with trigger phrases
-- [ ] SKILL.md body under 500 lines
-- [ ] States what to do in positive terms (not prohibition-heavy)
-- [ ] Degree of freedom matches task fragility
-- [ ] Progressive disclosure used (heavy content in separate files)
-- [ ] No time-sensitive claims unless clearly dated
-- [ ] Examples are concrete, not abstract
-- [ ] Frontmatter fields are valid per official docs
-- [ ] One skill = one capability
-- [ ] Consistent terminology throughout
-- [ ] File references are one level deep from SKILL.md
-- [ ] Files over 100 lines have a table of contents
+1. Read `${CLAUDE_SKILL_DIR}/references/self-audit-checklist.md`.
+2. Check every item. Fix failures. Re-check.
+3. All items must be PASS or N/A.
+**Output:** Completed checklist.
 ---
-## Step 3: Final Summary
+## Step 5: Deliver
+**Goal:** Final summary of the polished skill.
-Present the finished skill to the user:
+Present to the user:
-1. **Benchmark summary:** Final pass rate vs baseline, with delta
-2. **Gaps addressed:** Map each original gap to the skill content that addresses it
-3. **Description optimization:** Before/after trigger accuracy scores
-4. **Known limitations:** Anything the skill does not handle (scope boundaries)
-5. **Maintenance notes:** What to watch for in future usage that might warrant re-iteration
+1. **Description** — final version, confirmed trigger phrases.
+2. **File structure** — folder map with line counts.
+3. **Gotchas** — current gotcha count and most recent additions.
+4. **Audit summary** — "All 38 items: N passed, M N/A."
+5. **Before/after** — description changes if any, structural changes if any.
+6. **Maintenance notes** — what to watch for, when to re-audit.

package/skills/structure-prompt/SKILL.md ADDED

@@ -0,0 +1,50 @@
+---
+name: structure-prompt
+description: >-
+  Restructure any user-provided prompt — order blocks correctly, replace persona
+  framing with task constraints, enforce per-category dispositions, replace
+  ceremony directives with measurable constraints, expand placeholder tokens
+  into real values via the sibling rubric or AskUserQuestion, add file:line
+  citations for identifiers that appear in the data body, mark the canonical
+  sub-bucket with ⭐, and sharpen generic adversarial-pass phrasing into a
+  category-specific failure-mode noun. Trigger when the user invokes
+  /structure-prompt, pastes a prompt and asks to optimize it, asks for a
+  "minimally invasive edit" to a prompt artifact, or asks to "tighten this
+  prompt."
+---
+# structure-prompt
+One pass per invocation. Classify each block of the input prompt, apply the matching spoke rules, and emit the rewritten prompt as a single fenced block (paste mode) or rewrite the file in place (file-path mode).
+## Pre-flight
+The input prompt arrives as the user's message body, as a fenced block within it, or as a file path argument. Treat the entire input as the artifact under optimization.
+## First invocation of a session
+Read [`reference/block-classification.md`](reference/block-classification.md), then [`reference/research.md`](reference/research.md), then [`reference/output-contract.md`](reference/output-contract.md).
+## Match situation, read spoke
+| Situation | Read |
+|---|---|
+| Starting any optimization | [`reference/block-classification.md`](reference/block-classification.md) |
+| A spoke needs information that isn't in the input | [`reference/research.md`](reference/research.md) |
+| Input contains a fenced code block, diff, dump, transcript, or single content region ≥ 500 characters, OR blocks appear out of canonical sequence (mission, metadata, framework, questions, output spec, data body) | [`reference/structure.md`](reference/structure.md) |
+| Input opens with a role assignment ("You are…", "Act as…", "Imagine you are…", "As a…", "Pretend to be…", "Role:…") | [`reference/persona.md`](reference/persona.md) |
+| Input names 2+ categories, surfaces, sub-buckets, items, checks, or criteria the agent processes | [`reference/per-category.md`](reference/per-category.md) |
+| Input contains performance directives ("be thorough", "think step by step", "you are an expert", "please", "kindly") | [`reference/directives.md`](reference/directives.md) |
+| Input contains narrative directives ("try to", "look at", "make sure", "consider", "be sure to", "think about") | [`reference/constraints.md`](reference/constraints.md) |
+| Input contains placeholder tokens (`[REPO/ARTIFACT]`, `[INLINE THE FULL ARTIFACT HERE]`, `[N]`, etc.) | [`reference/instantiation.md`](reference/instantiation.md) |
+| Sub-bucket bullets reference identifiers from the data body without `file:line` citations | [`reference/citation-depth.md`](reference/citation-depth.md) |
+| Framework has 5+ sub-buckets and no ⭐ canonical-case marker | [`reference/canonical-case.md`](reference/canonical-case.md) |
+| Output spec contains generic adversarial-pass phrasing ("missed at least N bugs/findings") | [`reference/adversarial-tuning.md`](reference/adversarial-tuning.md) |
+| Input has typos, mixed bullet styles, untagged code blocks, trailing whitespace, blank-line runs, or non-sequential heading levels | [`reference/cleanup.md`](reference/cleanup.md) |
+| Situation doesn't match any spoke above | [`reference/examples.md`](reference/examples.md) |
+| Emitting the rewritten prompt | [`reference/output-contract.md`](reference/output-contract.md) |
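A few rows of the table above key off literal phrases, so their matching can be sketched as simple pattern checks. This is a rough illustration only: the function `matching_spokes`, the word-boundary matching, and the placeholder regex are assumptions of this sketch, the skill itself is a set of instructions rather than code, and rows that depend on structure (block order, sub-bucket counts, citations) are not covered.

```python
import re

# Phrase lists copied from the table rows; the matching strategy is assumed.
ROLE_OPENER_PATTERN = re.compile(
    r"^(?:you are|act as|imagine you are|as a|pretend to be)\b|^role:",
    re.IGNORECASE,
)
PERFORMANCE_DIRECTIVES = (
    "be thorough", "think step by step", "you are an expert", "please", "kindly",
)
NARRATIVE_DIRECTIVES = (
    "try to", "look at", "make sure", "consider", "be sure to", "think about",
)
# Hypothetical shape for placeholder tokens such as [REPO/ARTIFACT] or [N].
PLACEHOLDER_PATTERN = re.compile(r"\[[A-Z][A-Z /]*\]")


def _contains_phrase(phrases: tuple[str, ...], text: str) -> bool:
    """True when any phrase occurs in text as whole words."""
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


def matching_spokes(prompt: str) -> list[str]:
    """Return the reference files whose phrase-based situation matches."""
    stripped = prompt.strip()
    lowered = stripped.lower()
    spokes = []
    if ROLE_OPENER_PATTERN.search(stripped):
        spokes.append("reference/persona.md")
    if _contains_phrase(PERFORMANCE_DIRECTIVES, lowered):
        spokes.append("reference/directives.md")
    if _contains_phrase(NARRATIVE_DIRECTIVES, lowered):
        spokes.append("reference/constraints.md")
    if PLACEHOLDER_PATTERN.search(stripped):
        spokes.append("reference/instantiation.md")
    return spokes
```

Whole-word matching keeps an opener like "As always" from being read as the "As a…" role assignment, though a phrase list this short will still produce false positives on ordinary prose.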
+## Folder map
+- `SKILL.md` — this hub.
+- `reference/` — rule detail per situation.

package/skills/structure-prompt/reference/adversarial-tuning.md ADDED

@@ -0,0 +1,62 @@
+# Sharpen the adversarial-pass phrasing
+The output spec usually closes with an adversarial second-pass instruction like *assume your first pass missed at least 3 P1 bugs across these N sub-buckets — find them*. When that phrase uses a generic noun (`bugs`, `findings`, `issues`, `problems`), the skill replaces the noun with one that names the category's specific failure mode.
+## Detection
+The fix fires when the output spec contains a phrase matching this shape, with a generic noun:
+- "missed at least `<number>` [bugs / findings / issues / problems]" — optionally preceded by a severity tier (`P0` or `P1`) when the framework uses tiered findings.
+A noun is "generic" when it could apply to any audit category. A noun is "specific" when it names the failure mode of the category.
+## How to derive the specific noun
+Read the mission line and the framework header. Pull the category's domain from there. Match against this lookup:
+| Category domain | Specific failure-mode noun |
+|---|---|
+| API contracts (signatures, return types, callback shape) | contract drifts |
+| Selector / query / engine compatibility | engine-version incompatibilities |
+| Resource cleanup (handles, locks, subscriptions) | leaked resources |
+| Scoping and ordering | scope or ordering bugs |
+| Dead code | dead code paths |
+| Silent failures (swallowed exceptions, dropped errors) | silent failures |
+| Bounds and overflow | bounds or overflow bugs |
+| Security boundaries | trust-boundary violations |
+| Concurrency | concurrency hazards |
+| Code rules compliance | rule violations |
+| Codebase conflicts (incomplete propagation) | parallel sites that should have been updated alongside the diff |
+When the category sits outside this list, derive the noun from the framework's most prominent axis name (e.g., a framework whose axes all name "selectors" → "selector incompatibilities").
+## Procedure
+1. Find the adversarial-pass sentence in the output spec.
+2. Identify the generic noun in that sentence.
+3. Replace it with the specific noun from the table or framework.
+4. Keep the rest of the sentence intact: count (e.g., "3"), severity tier (e.g., "P1") when the original phrase carries one, and the closing "find them".
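The four steps above can be sketched as a single regex substitution. This is a minimal illustration, not code from the package: `sharpen_adversarial_noun` is a hypothetical helper, the pattern assumes the count is written in digits and the tier, when present, is `P0` or `P1`, and the specific noun is passed in by the caller rather than derived from the lookup table.

```python
import re
from typing import Optional

GENERIC_NOUNS = ("bugs", "findings", "issues", "problems")

# Group 1 keeps the count and optional severity tier; group 2 is the generic noun.
ADVERSARIAL_PATTERN = re.compile(
    r"(missed at least \d+ (?:P[01] )?)(" + "|".join(GENERIC_NOUNS) + r")\b"
)


def sharpen_adversarial_noun(
    output_spec: str, specific_noun: str
) -> tuple[str, Optional[str]]:
    """Replace the generic noun; return (text, replaced noun or None)."""
    match = ADVERSARIAL_PATTERN.search(output_spec)
    if match is None:
        # Either there is no adversarial phrase or its noun is already specific.
        return output_spec, None
    rewritten = (
        output_spec[: match.start(2)] + specific_noun + output_spec[match.end(2):]
    )
    return rewritten, match.group(2)
```

Only the noun span is rewritten, so the count, the tier, and the rest of the sentence pass through unchanged, which matches the preservation rule in step 4. The returned noun tells the caller which disposition note to emit.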
+## Examples
+Before (generic):
+> "assume your first pass missed at least 3 P1 bugs across these 7 sub-buckets — find them"
+After (Category B):
+> "assume your first pass missed at least 3 P1 engine-version incompatibilities across these 7 sub-buckets — find them"
+After (Category K):
+> "assume your first pass missed at least 3 P1 parallel sites that should have been updated alongside the diff across these 7 sub-buckets — find them"
+After (Category C):
+> "assume your first pass missed at least 3 P1 leaked resources across these 7 sub-buckets — find them"
+## What stays put
+When the adversarial phrase already names a specific failure mode, the noun stays. The skill changes only generic nouns.
+The count (e.g., 3) and severity tier (e.g., P1) stay intact when the original phrase carries them. Some categories name a noun that doesn't fit the P-tier model — Codebase Conflicts ("parallel sites that should have been updated alongside the diff") is the canonical example — but preservation still applies: if the original phrase includes a tier, the rewritten phrase includes it too. The rule is preservation, not insertion or removal.
+## Disposition reporting
+Every outcome emits an action note via the mechanism that [`output-contract.md`](output-contract.md) defines. When the noun was replaced: `> Gap: Adversarial-pass noun sharpened — "bugs" → "<specific noun>".` When the phrase already carries a specific noun: `> Gap: Adversarial-pass noun verified — "<specific noun>" already specific.` Silent pass is forbidden — see the [no silent action](output-contract.md#disposition-invariants) invariant.

package/skills/structure-prompt/reference/block-classification.md ADDED

@@ -0,0 +1,27 @@
+# Block classification
+Every input prompt decomposes into six block types. Tag each region of the input as exactly one type before applying any spoke rules.
+## Block types
+**Mission block.** One sentence stating what the agent does. The opening directive of the prompt.
+**Metadata block.** Identifiers, SHAs, PR numbers, target paths, ID prefixes, scope flags, mode toggles. Short atomic facts the agent uses as parameters.
+**Framework block.** The checklist, sub-bucket list, surface list, category list, or step list the agent processes. Multi-item structures with named entries.
+**Questions block.** Cross-cutting questions, synthesis questions, or open questions the agent answers after completing the framework.
+**Output spec block.** The format the agent's output takes — totals header, per-item shape, ordering, severity tags, locator format, length cap, lead phrase, closing phrase.
+**Data body block.** Any of:
+- Fenced code block (triple backtick) that sits INSIDE the prompt content — not the outer paste-mode fence that wraps the entire prompt artifact
+- Diff, file dump, transcript, log, table, or document inlined as content
+- Any single content region of 500 characters or more that the agent inspects rather than acts on
+## Tagging procedure
+1. Read the input prompt top to bottom.
+2. Annotate each region with exactly one tag.
+3. Confirm every content region is either tagged with one of the six block types or part of a gap-report block. Gap-note lines (`> Gap:`) and `<!-- gap-report:` comment blocks from a prior invocation form a passthrough region — preserved in place during classification and reordering, not re-tagged. During emission, the gap-report region is deterministically replaced by the current run's gap notes per [`output-contract.md`](output-contract.md). The gap-report region sits at the end of the prompt and carries no classification tag.
+4. Proceed to the matching spoke.
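The data-body criteria are the most mechanical part of the classification, so they can be approximated in a few lines. This is a hedged sketch with a hypothetical `is_data_body` helper: it assumes the outer paste-mode fence has already been stripped, it recognizes diffs only by `diff --git` or `@@ ` line starts, and it cannot tell whether the agent inspects a region rather than acts on it, so the 500-character rule is applied on length alone.

```python
DATA_BODY_MIN_CHARS = 500
FENCE = "`" * 3  # triple backtick, built this way to keep the example fence intact

# Assumed markers for an inlined diff; other dump formats are not detected.
DIFF_LINE_PREFIXES = ("diff --git", "@@ ")


def is_data_body(region: str) -> bool:
    """Heuristically decide whether a content region is a data body block."""
    stripped = region.strip()
    if stripped.startswith(FENCE):
        return True
    if any(line.startswith(DIFF_LINE_PREFIXES) for line in stripped.splitlines()):
        return True
    # Length alone stands in for "inspected rather than acted on".
    return len(stripped) >= DATA_BODY_MIN_CHARS
```

A long framework block would be misclassified by the length test, so in practice the result is a hint to confirm during step 2 of the tagging procedure rather than a final tag.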