npm - supered - Versions diffs - 0.1.3 → 0.2.1 - Mend

supered 0.1.3 → 0.2.1


Files changed (28)

package/.claude-plugin/plugin.json +1 -1
package/.codex-plugin/plugin.json +1 -1
package/.cursor-plugin/plugin.json +1 -1
package/README.md +19 -0
package/bin/supered.mjs +8 -25
package/docs/evals/README.md +34 -0
package/docs/evals/baseline-results.json +132 -0
package/docs/evals/scenarios.json +212 -0
package/docs/index.html +23 -0
package/docs/skill-design-principles.md +25 -0
package/docs/styles.css +40 -0
package/docs/which-skill.md +33 -0
package/gemini-extension.json +1 -1
package/install.sh +37 -1
package/lib/eval-pack.js +101 -0
package/lib/host-install.js +88 -0
package/lib/manifest.js +7 -73
package/lib/package-verification.js +113 -0
package/lib/release-bundle.js +135 -0
package/lib/supered-policy.js +32 -0
package/package.json +1 -1
package/skills/build-in-slices/SKILL.md +71 -15
package/skills/make-a-map/SKILL.md +77 -15
package/skills/prove-the-change/SKILL.md +73 -15
package/skills/shape-the-task/SKILL.md +67 -14
package/skills/ship-the-work/SKILL.md +75 -15
package/skills/trace-the-fault/SKILL.md +71 -13
package/skills/using-supered/SKILL.md +79 -11

package/skills/build-in-slices/SKILL.md CHANGED

@@ -1,27 +1,83 @@
 ---
 name: build-in-slices
-description: Use when implementing features or repository changes that should be built incrementally and verified as they land.
+description: Use when implementing code, docs, package, site, or repo changes that can regress if done as one large edit.
 ---
 # Build In Slices
-Build one meaningful slice at a time. Each slice should make the repo more complete without hiding unfinished work behind broad claims.
+Build in slices so progress remains reviewable and mistakes stay small. The unit of work is not "a file"; it is a behavior plus the evidence that behavior changed.
-## Steps
+## Trigger
-1. Start with the smallest test or validator that can catch drift.
-2. Run it and observe the failure when practical.
-3. Implement the slice.
-4. Re-run the relevant check.
-5. Continue only after the evidence matches the claim.
+Use this for feature work, skill rewrites, docs that affect installation, CLI behavior, package metadata, release automation, browser-visible pages, tests, or any change where a broad edit could hide a broken assumption.
-## Slice Examples
+## Do Not Use When
-- Add manifest tests before writing manifest code.
-- Add CLI behavior before expanding documentation.
-- Add one skill file, validate it, then add the next.
-- Update docs only after the product surface is real.
+Do not use this as an excuse to create dozens of microscopic commits. Do not slice by activity only, such as "edit files" then "test later". Do not keep code that was written before the test when test-first proof is feasible.
-## Evidence
+## Required Inputs
-Every slice should leave behind a file diff and a command result that explains what changed.
+- Current plan or shaped task.
+- Existing test and validation commands.
+- Files likely to change.
+- Definition of done for the current slice.
+- Known user constraints and any dirty worktree state.
+## Operating Procedure
+1. Choose one behavior to change.
+2. Add or update the smallest automated check that would fail without the change. If automation is not practical, define a manual proof such as screenshot, HTTP response, or CLI transcript.
+3. Run the check and read the failure. If it passes before implementation, the check is weak or the behavior already exists.
+4. Implement the minimum change that makes the check pass.
+5. Run the targeted check again. Fix the implementation, not the test, unless the test is demonstrably wrong.
+6. Refactor only after the check is green.
+7. Run a broader verification set before moving to a new risk area.
+8. Update the user with the evidence, not just the intent.
+## Output Contract
+For each meaningful slice, record:
+```text
+Slice:
+Expected red check:
+Change:
+Green check:
+Files touched:
+Next risk:
+```
+## Guardrails
+- Stop if the worktree contains unrelated changes that could be staged accidentally.
+- Do not batch unrelated behavior under one "cleanup" slice.
+- Do not update docs to claim a feature before the feature or command exists.
+- Never say a slice is done without a fresh check.
+- Keep generated artifacts out of git unless they are intentional release assets.
+## Failure Modes
+- **Test-after bias:** The agent writes code first, then a test that mirrors the code. Fix by deleting or ignoring the implementation and writing the test from desired behavior.
+- **Mega-slice:** Many files change before any check runs. Fix by backing up to the smallest failing check.
+- **Doc drift:** README changes before CLI behavior exists. Fix by implementing and verifying command behavior first.
+- **Green but irrelevant:** The check passes but does not cover the new behavior. Fix by making the assertion user-visible.
+## Quality Gates
+- The slice has one primary behavior.
+- The proof would have caught the old state.
+- The diff is small enough to review.
+- Broader verification still passes before handoff.
+## Example
+Good:
+```text
+Slice: npm install path.
+Expected red check: package test fails because publishConfig/files are missing.
+Change: add package metadata and verify-package script.
+Green check: npm test and npm run verify-package pass.
+```
+Bad: "Rewrite the package, docs, CLI, release notes, and site" followed by one final test run.
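
The red, change, green order in the new Operating Procedure can be sketched in code. This is a minimal illustration and not part of the supered package: `runSlice`, its `check` and `implement` callbacks, and the toy `pkg` object are all hypothetical names chosen for the example.

```javascript
// Hypothetical helper: runs one slice through the red -> change -> green order.
// `check` returns true when the behavior is present; `implement` applies the change.
function runSlice({ name, check, implement }) {
  // Step 3: the check must fail before the change, or it proves nothing.
  if (check()) {
    return { name, status: "weak-check", note: "check passed before implementation" };
  }
  // Step 4: apply the minimum change.
  implement();
  // Step 5: re-run the same targeted check.
  return check()
    ? { name, status: "green" }
    : { name, status: "still-red", note: "fix the implementation, not the test" };
}

// Usage: a toy object stands in for real package metadata.
const pkg = {};
const result = runSlice({
  name: "npm install path",
  check: () => Array.isArray(pkg.files) && pkg.files.length > 0,
  implement: () => { pkg.files = ["bin", "lib", "skills"]; },
});
console.log(result.status); // green
```

A check that already passes before the change is reported as `weak-check`, which mirrors step 3: either the check is weak or the behavior already exists.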

package/skills/make-a-map/SKILL.md CHANGED

@@ -1,27 +1,89 @@
 ---
 name: make-a-map
-description: Use when an approved task needs a small execution plan with concrete files, checks, and handoff points.
+description: Use when a direction is approved but the implementation path, sequencing, files, or checks are not yet clear.
 ---
 # Make A Map
-Turn an approved brief into a short implementation map. Prefer a handful of crisp steps over a sprawling plan.
+Make a map after the task is shaped and before broad edits begin. A good map is short, sequenced by risk, and specific enough that the next command is obvious.
-## Steps
+## Trigger
-1. Inspect the repo before planning.
-2. Identify the files that must change.
-3. Split work into slices that can be verified independently.
-4. Put the highest-risk slice first.
-5. Name the exact verification commands.
+Use this when the work spans multiple files, needs tests, has ordering dependencies, touches packaging or release behavior, or could be done in several plausible ways. Also use it when the user asks for a plan after approving a direction.
-## Plan Rules
+## Do Not Use When
-- One active step at a time.
-- No ornamental refactors.
-- Keep tasks scoped to the user's request.
-- Update the map as facts change.
+Do not write a plan before inspecting the repo. Do not create a plan for a single obvious edit unless risk is high. Do not keep planning after implementation evidence has invalidated the map; update it.
-## Evidence
+## Required Inputs
-A good map names the next command, the next file, and the reason the step matters.
+- Approved brief or clearly understood request.
+- Repo structure and relevant files.
+- Available commands: tests, validation, build, lint, browser checks, publish checks.
+- Known risks and dependencies.
+- User constraints on scope, timing, or release.
+## Operating Procedure
+1. Inspect before planning: list files, read package scripts, check git status, and identify existing patterns.
+2. Write a small set of steps, usually three to seven. Each step should change behavior or reduce risk.
+3. Put the highest-risk unknown first: failing tests, package metadata, browser rendering, auth, external API, migration, or destructive operation.
+4. For each step, name the file area and the check that proves it.
+5. Keep only one step `in_progress`. Update the map when a fact changes.
+6. Prefer vertical slices: test plus implementation plus verification. Avoid "edit all docs" followed by "hope it works".
+7. Mark deferred work explicitly so it does not leak into the current task.
+## Output Contract
+```text
+Plan:
+1. Step:
+   Files:
+   Check:
+   Risk reduced:
+2. ...
+Deferred:
+Stop conditions:
+```
+Use compact bullets in chat. Use a formal checklist only for multi-step work.
+## Guardrails
+- Stop if the plan contains a step that cannot be verified.
+- Do not include unrelated refactors, dependency upgrades, or formatting churn.
+- Do not plan a push, release, or publish before naming the authentication and verification requirements.
+- Do not let tests be a final cleanup step. Put the first meaningful test before implementation when practical.
+## Failure Modes
+- **Inventory plan:** "Read files, edit files, test" adds no value. Fix by naming specific files and proof.
+- **Risk-last sequencing:** The hardest unknown appears at the end. Fix by moving it first.
+- **No stop condition:** The agent keeps going after a failed check. Fix by saying what blocks progress.
+- **Unowned dependencies:** The plan assumes tools or credentials exist. Fix by checking them early.
+## Quality Gates
+- Every step has a check.
+- The plan can survive interruption because status is explicit.
+- The plan honors the user's scope.
+- The first step produces useful evidence, even if the implementation fails.
+## Example
+Good:
+```text
+1. Add failing quality tests for skill structure.
+   Files: tests/skill-quality.test.mjs
+   Check: npm test fails on current thin skills.
+   Risk reduced: defines "robust" before rewriting.
+2. Rewrite skills against the tested structure.
+   Files: skills/*/SKILL.md
+   Check: npm test passes.
+3. Verify package and site.
+   Files: package metadata, docs if changed
+   Check: npm run verify-package && npm run verify-site.
+```
+Bad: "Improve all skills, update docs, run tests" with no file boundaries or proof.
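
Two of the rules added in this hunk, "Every step has a check" and "Keep only one step `in_progress`", are mechanical enough to check automatically. The sketch below assumes a hypothetical plan shape (`step`, `files`, `check`, `status`); the package itself defines the plan only as a text template.

```javascript
// Hypothetical plan shape: { step, files, check, status }.
// Returns a list of problems; an empty list means the plan passes these two gates.
function validatePlan(steps) {
  const problems = [];
  steps.forEach((s, i) => {
    if (!s.check || !s.check.trim()) problems.push(`step ${i + 1} has no check`);
    if (!s.files || !s.files.trim()) problems.push(`step ${i + 1} names no files`);
  });
  // Only one step may be active at a time.
  const active = steps.filter((s) => s.status === "in_progress").length;
  if (active > 1) problems.push(`${active} steps are in_progress; keep only one`);
  return problems;
}

const plan = [
  { step: "Add failing quality tests", files: "tests/skill-quality.test.mjs",
    check: "npm test fails on current thin skills", status: "in_progress" },
  { step: "Rewrite skills", files: "skills/*/SKILL.md",
    check: "npm test passes", status: "pending" },
  { step: "Improve docs", files: "", check: "", status: "in_progress" },
];
console.log(validatePlan(plan));
// [ 'step 3 has no check', 'step 3 names no files', '2 steps are in_progress; keep only one' ]
```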

package/skills/prove-the-change/SKILL.md CHANGED

@@ -1,27 +1,85 @@
 ---
 name: prove-the-change
-description: Use before claiming work is complete, tests pass, a bug is fixed, or a public handoff is ready.
+description: Use when a claim may be made that work is done, correct, fixed, tested, published, deployed, visible, or ready.
 ---
 # Prove The Change
-Claims need evidence. Run fresh verification close to the final answer.
+Evidence comes before claims. This skill prevents the agent from converting confidence, memory, or a partial check into a completion statement.
-## Steps
+## Trigger
-1. List the claims you are about to make.
-2. Pick the command or inspection that proves each claim.
-3. Run the checks.
-4. Read the output and note failures honestly.
-5. Report only what the evidence supports.
+Use this before final answers, after fixes, before commits, before releases, before saying tests pass, after publishing, after browser-visible work, and whenever the user asks "is it done?" or "does it work?"
-## Common Proof
+## Do Not Use When
-- `npm test` for JavaScript packages.
-- `npm run validate` for Supered bundle integrity.
-- `git status --short` for clean or expected working tree state.
-- A browser or screenshot check for visible UI work.
+Do not use stale command output from a previous turn. Do not use this to invent checks after the fact while skipping the actual command. Do not claim the whole product is verified when only one layer was checked.
-## Evidence
+## Required Inputs
-The final response should include the checks run and the result.
+- Claims you are about to make.
+- Commands or observations that can prove each claim.
+- Current git state.
+- External targets when relevant: npm registry, GitHub release, Pages URL, CI run, browser screenshot, API response.
+- Known limitations or blocked checks.
+## Operating Procedure
+1. Write the claims in plain language. Example: "Tests pass", "npm package is live", "site renders on mobile".
+2. Map each claim to evidence:
+   - code behavior: targeted test plus broader suite
+   - package integrity: `npm pack`, tarball inspection, install from tarball
+   - publish: registry/API lookup and real install from clean temp dir
+   - website: HTTP status, content check, browser screenshot
+   - repository state: `git status --short --branch`, remote run status
+3. Run the full command fresh. Read exit code and output.
+4. If any check fails, state the failure and either fix it or downgrade the claim.
+5. Report evidence compactly in the final response.
+6. Include residual risk only when it matters to the user's next decision.
+## Output Contract
+```text
+Claims checked:
+Commands run:
+Results:
+Remote checks:
+Not checked:
+Conclusion:
+```
+For routine coding work, the final answer can be shorter, but it must still name the meaningful checks.
+## Guardrails
+- Stop if you catch yourself writing "should", "probably", "looks", or "I think" in a completion claim.
+- Never claim a remote publish from a local dry-run.
+- Never call CI green until the remote run conclusion is success.
+- Do not hide failed checks behind passing checks.
+- Do not ask the user to verify something you can verify with available tools.
+## Failure Modes
+- **Partial proof:** Unit tests pass, but package install fails. Fix by checking the layer you changed.
+- **Wrong environment:** A command works inside the repo but not from a clean temp directory. Fix by testing like a user.
+- **Remote lag:** GitHub Pages or npm cache has not updated. Fix by polling or stating pending status.
+- **Overclaim:** "Everything works" after one command. Fix by listing only proven claims.
+## Quality Gates
+- Every final claim has a fresh check.
+- The strongest check matches the user's real use path.
+- Failures are reported plainly.
+- The final answer distinguishes local, CI, and remote evidence.
+## Example
+Good:
+```text
+Claim: npx install works.
+Check: run `npx --yes supered@latest install --target codex --dest <tmp>` from an empty temp directory.
+Result: installed expected skill files.
+```
+Bad: "npm publish dry-run passed, so npx works" before publishing or testing the registry package.
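
The first guardrail in this hunk names four hedging phrases that signal an unproven completion claim. A minimal sketch of that scan follows; `findHedges` is a hypothetical helper and not something the diff shows the package shipping.

```javascript
// Phrases the Guardrails section names as signs of an unproven claim.
const HEDGES = ["should", "probably", "looks", "i think"];

// Returns the hedging phrases found in a completion claim.
function findHedges(claim) {
  const text = claim.toLowerCase();
  // Word boundaries keep "looks" from matching inside a longer word.
  return HEDGES.filter((h) => new RegExp(`\\b${h}\\b`).test(text));
}

console.log(findHedges("Tests pass: npm test exited 0.")); // []
console.log(findHedges("I think the package should install fine.")); // [ 'should', 'i think' ]
```

A non-empty result is a prompt to run the missing check or downgrade the claim. It is not evidence by itself.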

package/skills/shape-the-task/SKILL.md CHANGED

@@ -1,32 +1,85 @@
 ---
 name: shape-the-task
-description: Use when the user has a rough idea and the agent needs to turn it into a short, buildable brief before implementation.
+description: Use when a request is ambiguous, broad, underspecified, exploratory, or has multiple plausible interpretations.
 ---
 # Shape The Task
-Shape the work before building. The goal is a brief that is small enough to act on and concrete enough to test.
+Shape the task before building. The goal is not a long spec; it is a compact, testable brief that prevents the agent from solving the wrong problem with confidence.
-## Steps
+## Trigger
-1. Name the product or change.
-2. Write the target user and the job they need done.
-3. List the smallest useful feature set.
-4. Note constraints, exclusions, and risk.
-5. Confirm the implementation surface: files, package, repo, app, or deployment.
+Use this when the user gives a rough product idea, asks for "similar to X", says "make it better", asks for "best in market", names an outcome without acceptance criteria, or mixes product, design, code, and publishing concerns in one request.
-## Brief Format
+## Do Not Use When
+Do not use this when the user provided a precise bug, file, command, and expected result. Do not ask broad discovery questions if the repo or provided artifact can answer them. Do not turn a small implementation request into a product strategy exercise.
+## Required Inputs
+- User's literal request and any attached files or links.
+- Current repo state, if this is a local project.
+- Reference products or competitors, if named.
+- Constraints: platform, audience, deadline, budget, target quality, or release channel.
+- Evidence required for "done".
+## Operating Procedure
+1. Capture the request in the user's words, then rewrite it as a concrete outcome.
+2. Identify the audience and job-to-be-done. If the audience is unknown, infer from the product context and mark it as an assumption.
+3. Separate must-have outcomes from nice-to-have ideas. Keep the first build small enough to verify.
+4. Find constraints from local context before asking: package type, framework, assets, existing tests, publish target, branch state.
+5. Name risks that could change implementation: legal/copyright, auth, payments, external accounts, destructive operations, user data, or brand assets.
+6. Define acceptance criteria that can be checked by commands, screenshots, API responses, docs review, or user inspection.
+7. Ask at most one focused question only if the answer would materially change the work. Otherwise proceed with stated assumptions.
+## Output Contract
 Use this shape when helpful:
 ```text
 Outcome:
 Audience:
-Must include:
-Can skip:
-Evidence:
+Must have:
+Can defer:
+Assumptions:
+Risks:
+Acceptance checks:
+First slice:
 ```
-## Evidence
+## Guardrails
+- Stop if the brief contains vague adjectives without observable checks, such as "premium", "robust", or "world class". Translate them into criteria.
+- Do not overfit to a reference product. Preserve inspiration, avoid copying text, assets, or protected expression.
+- Do not ask the user to restate information available in the repo, issue, file, or link.
+- Never proceed if the task implies publishing, payment, credentials, or destructive actions without a clear target and proof path.
+## Failure Modes
+- **Question flood:** The agent asks five questions before inspecting context. Fix by reading the repo/artifact first.
+- **Spec bloat:** The brief becomes a roadmap. Fix by naming the smallest useful version.
+- **False certainty:** The agent hides assumptions. Fix by listing assumptions explicitly.
+- **Aesthetic-only criteria:** The brief says "looks good." Fix by adding viewport, accessibility, content, and screenshot checks.
+## Quality Gates
+- The brief can be read in under one minute.
+- Every must-have has a verification method.
+- The first slice is small enough to complete and test.
+- The user can correct the direction before major work begins.
+## Example
+Good:
+```text
+Outcome: Create a public installable agent-skill kit named Supered.
+Audience: developers using coding agents.
+Must have: installable skills, docs, validation, public repo.
+Can defer: marketplace submission.
+Acceptance checks: npm test, package validation, public URL, install smoke test.
+First slice: create manifest tests and package skeleton.
+```
-Before moving to implementation, the agent should be able to explain what will be built, what will not be built, and how success will be checked.
+Bad: "Build something like Superpowers but better" with no audience, checks, or boundaries.
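
The gate "Every must-have has a verification method" and the guardrail on vague adjectives can both be expressed as a small review function. The brief shape below (`outcome`, `mustHave` with `item` and `check`) is an assumption made for this sketch; the skill defines the brief only as a text template.

```javascript
// Adjectives the Guardrails section says must be translated into criteria.
const VAGUE = ["premium", "robust", "world class"];

// Hypothetical brief shape: an outcome string plus must-haves paired with checks.
function reviewBrief(brief) {
  const issues = [];
  const outcome = brief.outcome.toLowerCase();
  for (const word of VAGUE) {
    if (outcome.includes(word)) issues.push(`translate "${word}" into a check`);
  }
  for (const { item, check } of brief.mustHave) {
    if (!check) issues.push(`must-have "${item}" has no verification method`);
  }
  return issues;
}

const brief = {
  outcome: "Create a robust installable agent-skill kit",
  mustHave: [
    { item: "installable skills", check: "install smoke test" },
    { item: "docs", check: "" },
  ],
};
console.log(reviewBrief(brief));
// [ 'translate "robust" into a check', 'must-have "docs" has no verification method' ]
```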

package/skills/ship-the-work/SKILL.md CHANGED

@@ -1,28 +1,88 @@
 ---
 name: ship-the-work
-description: Use when preparing local work for GitHub: staging, committing, pushing, creating a repository, or handing off a public link.
+description: Use when work may be committed, pushed, tagged, released, published, deployed, or handed off publicly.
 ---
 # Ship The Work
-Shipping is the last implementation slice. Treat it with the same care as code.
+Shipping is product behavior. A commit, tag, package, release, or public URL is not done until the intended remote target confirms it.
-## Steps
+## Trigger
-1. Inspect `git status --short`.
-2. Review the changed files for accidental scope.
-3. Run the relevant verification commands.
-4. Stage only intended files.
-5. Commit with a short, specific message.
-6. Push to the intended public repository.
-7. Report the branch, commit, repository URL, and checks.
+Use this for staging, committing, pushing, creating GitHub repos, creating tags or releases, publishing npm packages, enabling Pages, deploying sites, or giving the user a public link.
+## Do Not Use When
+Do not ship uncommitted exploratory work. Do not publish a package or release because local tests pass. Do not push if the worktree contains unrelated user changes unless the user explicitly includes them.
+## Required Inputs
+- Intended remote target: repository, branch, registry, package, release, or site.
+- Clean understanding of changed files.
+- Verification commands for the changed surface.
+- Auth status for GitHub, npm, deploy provider, or other remote.
+- Version/tag strategy.
+- Rollback or correction path when possible.
+## Operating Procedure
+1. Inspect `git status --short --branch`.
+2. Review the diff and staged scope. Separate unrelated local changes.
+3. Run relevant checks before staging or publishing:
+   - code/package: `npm test`, `npm run validate`, `npm run verify-package`
+   - website: `npm run verify-site`
+   - shell installer: `sh -n install.sh` and install smoke checks
+4. Scan for obvious secrets in intended files.
+5. Stage only intended files. Check `git diff --cached --stat` and `git diff --cached --check`.
+6. Commit with a concise message that names the user-visible change.
+7. Push and verify remote branch state.
+8. For releases, create the release/tag only after the matching commit is pushed.
+9. For npm, verify name/version availability, run dry-run publish, publish, then verify `npm view` and real `npx` install from a clean temp directory.
+10. For sites, poll deployment until the public URL returns expected content.
+## Output Contract
+```text
+Changed:
+Checks before ship:
+Commit:
+Remote:
+Release/package/site:
+Post-ship verification:
+Blocked or follow-up:
+```
 ## Guardrails
-- Do not stage unrelated user changes silently.
-- Do not push secrets, local credentials, or generated noise.
-- Do not call a repo public until GitHub confirms visibility.
+- Stop if auth is missing; do not fake a publish with dry-run output.
+- Never paste or request secrets, OTPs, passwords, or tokens in chat.
+- Do not tag or announce a release before the remote artifact exists.
+- Do not stage ignored generated output unless it is intentionally shipped.
+- Do not call a repository public until the hosting service says it is not private.
+## Failure Modes
+- **Premature announcement:** README advertises `npx` before npm publish succeeds. Fix by publishing first or holding the push.
+- **Mixed worktree:** Unrelated files get staged. Fix by explicit path staging.
+- **Local-only release:** A tag exists locally but not remotely. Fix by checking remote release metadata.
+- **Unverified install:** Package publishes but CLI bin fails. Fix by testing from a clean temp directory.
+## Quality Gates
+- Local and remote states match.
+- Public artifacts are reachable.
+- The final answer includes commit/release/package URLs when relevant.
+- The user knows exactly what, if anything, remains blocked.
+## Example
+Good:
-## Evidence
+```text
+Checks: npm test, validate, smoke-install, verify-package, verify-site.
+Commit: 89f9a8a Prepare npm package install path.
+Package: npm view supered@latest confirms bin.
+Post-ship: npx install from empty temp dir succeeds.
+```
-The handoff should include a GitHub URL and the exact validation that ran before the push.
+Bad: pushing a tag after `npm publish` failed on OTP, then telling the user `npx` is ready.
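
Steps 1, 2, and 5 of the new Operating Procedure depend on separating intended files from unrelated local changes. The sketch below parses the text of `git status --short --branch` to list paths outside the intended scope. `unrelatedChanges` is a hypothetical helper, and it deliberately ignores renames and quoted paths.

```javascript
// Finds changed paths that are not part of the intended ship scope.
// Input is the text output of `git status --short --branch`.
// Renames ("old -> new") and quoted paths are not handled in this sketch.
function unrelatedChanges(statusText, intendedPaths) {
  const intended = new Set(intendedPaths);
  return statusText
    .split("\n")
    .filter((line) => line.length > 3 && !line.startsWith("## ")) // skip branch header
    .map((line) => line.slice(3)) // drop the two status columns and the space
    .filter((path) => !intended.has(path));
}

const sample = [
  "## main...origin/main",
  " M package.json",
  "M  lib/manifest.js",
  "?? scratch/notes.txt",
].join("\n");

console.log(unrelatedChanges(sample, ["package.json", "lib/manifest.js"]));
// [ 'scratch/notes.txt' ]
```

A non-empty result corresponds to the "Mixed worktree" failure mode, where the fix is explicit path staging.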

package/skills/trace-the-fault/SKILL.md CHANGED

@@ -1,26 +1,84 @@
 ---
 name: trace-the-fault
-description: Use when debugging broken behavior, failing tests, confusing output, or deployment failures with an unknown cause.
+description: Use when behavior is broken, flaky, surprising, failing, inconsistent, or explained only by guesses.
 ---
 # Trace The Fault
-Debug from symptom to cause. Do not patch randomly.
+Trace from symptom to cause. The goal is not to make the error disappear; it is to understand the responsible boundary well enough that the fix is small and the original symptom is rechecked.
-## Steps
+## Trigger
-1. Reproduce the symptom.
-2. State what changed, what stayed the same, and what is unknown.
-3. Add the smallest observation that distinguishes likely causes.
-4. Follow the evidence to the responsible boundary.
-5. Fix the cause and keep the reproduction as a check when possible.
+Use this for failing tests, broken CLI commands, deployment failures, browser rendering bugs, package publishing errors, auth failures, flaky behavior, unexpected output, or any moment where the agent is tempted to say "probably".
+## Do Not Use When
+Do not use this for known, mechanical edits where cause and fix are already proven. Do not skip reproduction because the fix looks obvious. Do not treat third-party service errors as solved until the service response changes.
+## Required Inputs
+- Exact symptom: command, error, screenshot, log, failing assertion, or user report.
+- Recent changes and relevant environment details.
+- A way to reproduce or observe the symptom.
+- Boundaries to inspect: code, config, package, network, auth, docs, browser, CI.
+- A verification command for the original symptom.
+## Operating Procedure
+1. Reproduce the symptom as directly as possible. Capture the exact command and output.
+2. Classify the failure: missing file, wrong assumption, dependency, environment, auth, network, syntax, behavior, test design, or service state.
+3. List what is known, what is unknown, and what changed recently.
+4. Form one narrow hypothesis at a time. Prefer observations that split possibilities, such as checking published tarball contents before editing package code.
+5. Add instrumentation or targeted commands only where they reduce uncertainty.
+6. Fix the cause at the responsible boundary. Avoid broad rewrites.
+7. Re-run the original reproduction. Then run nearby regression checks.
+8. If the root cause was a missing guard, add a test or validator so it stays fixed.
+## Output Contract
+```text
+Symptom:
+Reproduction:
+Known facts:
+Hypothesis tested:
+Root cause:
+Fix:
+Original symptom check:
+Regression check:
+```
 ## Guardrails
-- Do not make multiple unrelated fixes at once.
-- Do not explain a failure before reproducing it.
-- Do not delete evidence until the fix is verified.
+- Stop if you cannot reproduce and the issue is not urgent; gather more evidence before patching.
+- Do not make multiple unrelated fixes in one debug pass.
+- Do not delete logs or screenshots until the final check passes.
+- Do not call an auth, publish, or deploy issue fixed until the remote system confirms it.
+- Never change tests just to make them pass unless the test itself is proven wrong.
+## Failure Modes
+- **Guess patching:** The agent edits likely files without proving the boundary. Fix by running a discriminating observation.
+- **Symptom swap:** The original error disappears but a new unexamined error appears. Fix by tracing the new symptom.
+- **Local-only proof:** The local check passes but CI/npm/GitHub still fails. Fix by verifying the actual remote target.
+- **Overbroad fix:** The agent rewrites architecture for a config typo. Fix at the smallest responsible boundary.
+## Quality Gates
+- Original symptom was reproduced or explicitly could not be reproduced.
+- Root cause is stated as a boundary, not a vibe.
+- The fix is narrower than the investigation.
+- Original symptom and at least one regression check pass.
+## Example
+Good:
-## Evidence
+```text
+Symptom: npx supered@latest fails with "command not found" in repo.
+Observation: published tarball has correct bin.
+Hypothesis: npm exec inside same-named checkout shadows registry bin.
+Check: run npx from empty temp directory.
+Result: registry install works.
+```
-Completion requires the original symptom to be checked again after the fix.
+Bad: changing `package.json` repeatedly without inspecting the actual published tarball or execution directory.
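
The Quality Gates added in this hunk can be read as a completion check over the Output Contract fields. A minimal sketch follows, assuming a hypothetical report object whose property names (`reproduction`, `couldNotReproduce`, `rootCause`, `originalSymptomCheck`, `regressionCheck`) are invented here to mirror those fields.

```javascript
// Hypothetical report shape mirroring the Output Contract fields.
// Returns whether the trace is ready to be called fixed, plus any failed gates.
function traceGate(report) {
  const failures = [];
  if (!report.reproduction && !report.couldNotReproduce) {
    failures.push("symptom neither reproduced nor marked unreproducible");
  }
  if (!report.rootCause) failures.push("root cause not stated");
  if (report.originalSymptomCheck !== "pass") failures.push("original symptom not rechecked");
  if (report.regressionCheck !== "pass") failures.push("no passing regression check");
  return { ready: failures.length === 0, failures };
}

// Usage: a report that stops after rechecking the original symptom.
const report = {
  reproduction: "npx supered@latest run inside the repo checkout",
  rootCause: "same-named checkout shadows the registry bin",
  originalSymptomCheck: "pass",
};
console.log(traceGate(report));
// { ready: false, failures: [ 'no passing regression check' ] }
```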