npm - opencode-skills-collection - Versions diffs - 3.1.0 → 3.1.1 - Mend

opencode-skills-collection 3.1.0 → 3.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (277)

package/bundled-skills/yao-meta-skill/references/reference-scan.md ADDED

@@ -0,0 +1,126 @@
+# Reference Scan Strategy
+Use a short benchmark scan before deep authoring. The goal is to borrow durable patterns from strong reference objects without importing their prose, weight, or brand language into the new skill.
+## Source Priority
+Reference scan has three layers, and they must not be treated equally:
+1. **External Benchmark Scan**
+   - primary source of patterns
+   - use high-star public GitHub repos, official docs, strong public examples, and world-class products
+   - this layer defines the upper bound for quality
+2. **User Reference Intake**
+   - a taste-and-standard layer
+   - ask whether the user has reference repos, products, pages, prompts, or systems they admire
+   - learn only the pattern, structure, boundary, or quality bar; never copy wording or private material
+3. **Local Fit Check**
+   - secondary calibration layer
+   - use local files only for naming, privacy, compatibility, migration, and library-fit constraints
+   - this layer should not define the main design pattern
+External sources should lead. User references should sharpen direction. Local files should calibrate.
+## Default Visibility
+Reference synthesis should be silent by default.
+- do the benchmark scan and pattern synthesis in the background
+- convert the result into a recommendation for the user
+- surface the full evidence only to authors and reviewers
+- only expose the underlying tradeoffs to the user when intent is still uncertain or a real design conflict needs a decision
+## Why This Step Exists
+A new skill often fails because it starts from an isolated idea instead of a proven pattern. A controlled reference scan improves the package before it grows:
+- better boundary design
+- cleaner folder and metadata choices
+- more realistic quality gates
+- stronger portability decisions
+- better alignment with the user's own taste and quality bar
+## The Rule
+Reference scanning is mandatory for:
+- `Production` skills
+- `Library` skills
+- `Governed` skills
+- meta-skills or packaging-heavy skills
+Reference scanning is optional for:
+- `Scaffold` skills
+- one-person exploratory skills
+## Scope Limit
+Do not turn this into open-ended research.
+- scan at most `3-5` reference objects
+- pick from no more than `3` categories
+- extract patterns, not long copied content
+- stop as soon as the borrow plan is clear
+- prefer at least `2` external benchmark objects before treating the scan as complete
+- if the user provides references, record what they admire and what should explicitly not be copied
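The limits above are easy to check mechanically. The following is a minimal sketch, assuming a scan is recorded as a list of reference objects with a category and an external flag; `ReferenceObject` and `scan_scope_notes` are hypothetical names used only for illustration, not part of the package.

```python
from dataclasses import dataclass

# Limits taken from the Scope Limit list above.
MAX_OBJECTS = 5
MAX_CATEGORIES = 3
MIN_EXTERNAL = 2  # stated as a preference in the text, so reported as a note


@dataclass(frozen=True)
class ReferenceObject:
    """Hypothetical record for one scanned reference object."""
    name: str
    category: str   # e.g. "method", "structure", "execution"
    external: bool  # True for external benchmark objects


def scan_scope_notes(objects: list) -> list:
    """Return scope notes for a scan; an empty list means the scan is in scope."""
    notes = []
    if len(objects) > MAX_OBJECTS:
        notes.append(f"too many reference objects: {len(objects)} > {MAX_OBJECTS}")
    categories = {obj.category for obj in objects}
    if len(categories) > MAX_CATEGORIES:
        notes.append(f"too many categories: {len(categories)} > {MAX_CATEGORIES}")
    external_count = sum(1 for obj in objects if obj.external)
    if external_count < MIN_EXTERNAL:
        notes.append(f"only {external_count} external benchmark object(s); prefer {MIN_EXTERNAL}+")
    return notes
```

An empty result signals that the scan stayed inside the stated limits; it says nothing about whether the borrow plan itself is clear.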
+## Pattern Acceptance
+Borrowed ideas must pass a lightweight pattern test before they shape the first package.
+- `recurrence`: the pattern appears in more than one serious context or source
+- `generativity`: the pattern can guide new cases, not just explain one example
+- `distinctiveness`: the pattern is more specific than generic good advice
+- `boundary`: the pattern names where not to apply it or what cost it introduces
+For low-risk scaffold work, generativity plus boundary clarity can be enough. For production, library, and governed work, require more evidence before the pattern changes the package shape.
+See [Pattern Extraction Doctrine](pattern-extraction-doctrine.md).
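The pattern test can be sketched as a small predicate. This is an illustration only: the text says higher tiers "require more evidence" without naming a threshold, so the sketch assumes that means all four criteria. `pattern_accepted` is a hypothetical helper name.

```python
CRITERIA = frozenset({"recurrence", "generativity", "distinctiveness", "boundary"})
# For low-risk scaffold work the text accepts generativity plus boundary clarity.
SCAFFOLD_MINIMUM = frozenset({"generativity", "boundary"})


def pattern_accepted(passed: set, tier: str) -> bool:
    """Decide whether a borrowed pattern may shape the first package."""
    unknown = set(passed) - CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    if tier == "scaffold":
        return SCAFFOLD_MINIMUM <= set(passed)
    # Assumption: "more evidence" is read here as passing every criterion.
    return CRITERIA <= set(passed)
```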
+## Reference Categories
+Choose the smallest relevant set:
+- `method`: loops, evaluation discipline, iteration structure
+- `structure`: package anatomy, resource boundaries, metadata patterns
+- `execution`: operator flow, scripts, initialization and validation experience
+- `portability`: neutral metadata, adapters, degradation strategy
+- `domain`: workflow-specific patterns from a top example in the same problem space
+## Output Format
+A good scan produces a short report with:
+1. current skill anchor
+2. scan focus
+3. external benchmark objects
+4. user-supplied references
+5. local fit constraints
+6. what to borrow
+7. what not to borrow
+8. a compact borrow plan
+## What To Borrow
+Borrow:
+- repeatable loops
+- clear boundary patterns
+- proven gate choices
+- portable metadata ideas
+- clear operator-facing flows
+- high-signal examples that show the finished experience, not just the internal method
+Do not borrow:
+- source-specific branding
+- long copied prose
+- unnecessary directories
+- quality gates that exceed the skill's risk tier
+- platform lock-in disguised as best practice
+- local historical habits that are weaker than public top-tier benchmarks
+## Design Principle
+The scan is successful only if it raises skill quality faster than it raises context cost. If benchmark material makes the new skill heavier without making it more reliable, discard it.

package/bundled-skills/yao-meta-skill/references/regression-cause-taxonomy.md ADDED

@@ -0,0 +1,80 @@
+# Regression Cause Taxonomy
+This taxonomy explains how iteration regressions are classified when a description candidate is evaluated for promotion.
+## Core Principle
+A candidate should not be judged only by aggregate pass/fail counts. The iteration system should explain why a candidate was blocked, kept behind the current description, or promoted.
+## Cause Tags
+### `no_candidate_outperformed_current`
+The selected winner is still the current description. No candidate earned promotion.
+### `visible_holdout_regression`
+The candidate regressed on the visible holdout suite by adding false positives or false negatives.
+### `blind_holdout_regression`
+The candidate regressed on blind holdout prompts. This blocks promotion because the failure is not only local to the tuning loop.
+### `current_holdout_gap_present`
+The current or selected winner still carries a visible holdout miss. The promotion decision may still stay on `keep_current`, but the iteration bundle should show the unresolved gap.
+### `current_holdout_risk`
+The visible holdout calibration still looks risky even when promotion is not blocked. This is an audit signal that the route boundary needs future work.
+### `judge_blind_regression`
+The rubric judge found worse blind-holdout behavior than the current or baseline description.
+### `judge_blind_low_agreement`
+The judge-backed blind evaluation did not produce enough agreement confidence to support promotion.
+### `adversarial_regression`
+The candidate performed worse on adversarial holdout prompts that simulate route collisions or disguised requests.
+### `adversarial_overlap_risk`
+The adversarial calibration layer reports an `overlap` risk band, meaning route boundaries are too weak for safe promotion.
+### `adversarial_watch_risk`
+The adversarial calibration layer reports a non-failing but cautionary risk band such as `watch` or `tight`.
+### `family_instability`
+At least one tracked family stops being clean under blind, judge-backed blind, or adversarial evaluation.
+### `route_confusion`
+The route confusion matrix shows route theft or misrouting between sibling skills.
+### `route_ambiguity`
+The route confusion matrix reports ambiguous cases near the configured margin-warning threshold.
+### `longer_without_gain`
+The candidate is materially longer than the current description without producing a better route outcome.
+### `promotion_ready`
+The candidate passed every promotion gate and is eligible for review and promotion.
+## Usage
+These cause tags should appear in:
+- promotion decisions
+- iteration bundles
+- regression histories
+- human review summaries
+They are intended to make iteration auditable rather than merely descriptive.
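One way to make the tags operational is to split them into blocking tags and audit-only tags, then derive a decision from the set a candidate carries. The sketch below is an illustrative reading of the tag descriptions, not the package's actual promotion policy: the split itself, the `promote` decision label, and `summarize_causes` are all assumptions.

```python
# Assumption: tags whose descriptions mention regression, blocking, or unsafe
# promotion are treated as blocking; cautionary tags are treated as audit-only.
BLOCKING_TAGS = frozenset({
    "visible_holdout_regression",
    "blind_holdout_regression",
    "judge_blind_regression",
    "judge_blind_low_agreement",
    "adversarial_regression",
    "adversarial_overlap_risk",
    "family_instability",
    "route_confusion",
    "longer_without_gain",
})
AUDIT_TAGS = frozenset({
    "current_holdout_gap_present",
    "current_holdout_risk",
    "adversarial_watch_risk",
    "route_ambiguity",
})


def summarize_causes(tags) -> dict:
    """Turn a candidate's cause tags into a decision plus an audit trail."""
    tags = set(tags)
    blockers = sorted(tags & BLOCKING_TAGS)
    audit = sorted(tags & AUDIT_TAGS)
    # Promote only when the candidate is marked ready and nothing blocks it.
    ready = "promotion_ready" in tags and not blockers
    decision = "promote" if ready else "keep_current"
    return {"decision": decision, "blockers": blockers, "audit": audit}
```

Keeping the audit tags in the result, even on a promotion, matches the intent that unresolved gaps stay visible in the iteration bundle.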

package/bundled-skills/yao-meta-skill/references/resource-boundaries.md ADDED

@@ -0,0 +1,120 @@
+# Resource Boundary Spec
+This spec defines where information belongs inside a skill package.
+## Principle
+Keep the main skill small enough to route and execute clearly. Move detail out of `SKILL.md` as soon as it stops helping routing or branch selection.
+Do not add structure for imagined future needs. A folder, script, eval, or governance file belongs in the package only when it reduces current ambiguity, execution burden, route risk, or maintenance risk.
+## Context Budget Tiers
+Use the lightest budget that still fits the package.
+- `scaffold`: `700` initial-load tokens
+- `production`: `1000` initial-load tokens
+- `library`: `1300` initial-load tokens
+- `governed`: `1300` initial-load tokens
+If `manifest.json` sets `context_budget_tier`, that tier overrides the default budget derived from lifecycle or maturity metadata. This allows a high-governance skill to keep a stricter initial-load budget than its lifecycle label alone would imply.
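The override rule can be sketched as a lookup in which an explicit `context_budget_tier` wins over the tier implied by other metadata. This is a minimal sketch, assuming the manifest's `maturity` field holds one of the four tier names; `initial_load_budget` is a hypothetical helper.

```python
# Budgets in initial-load tokens, taken from the tier list above.
BUDGETS = {"scaffold": 700, "production": 1000, "library": 1300, "governed": 1300}


def initial_load_budget(manifest: dict) -> int:
    """Resolve the initial-load token budget for a parsed manifest."""
    # An explicit tier overrides the default derived from maturity metadata.
    # Assumption: `maturity` uses the same names as the budget tiers.
    tier = manifest.get("context_budget_tier") or manifest.get("maturity")
    if tier not in BUDGETS:
        raise ValueError(f"cannot resolve a budget tier from {tier!r}")
    return BUDGETS[tier]
```

For example, a governed skill that declares `"context_budget_tier": "production"` resolves to `1000` tokens instead of `1300`.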
+## Placement Rules
+### Put content in `SKILL.md` when it is:
+- part of the trigger surface
+- part of the core execution skeleton
+- part of the output contract
+- necessary for branch selection or safe defaults
+### Put content in `references/` when it is:
+- domain guidance
+- long examples
+- policy material
+- schemas or templates humans or agents may read on demand
+### Put content in `scripts/` when it is:
+- deterministic
+- repetitive
+- brittle if rewritten from prose
+- easier to validate as code than as instructions
+### Put content in `evals/` when:
+- the skill is reused enough that routing mistakes matter
+- near-neighbor confusion is likely
+- quality claims should be reproducible
+### Put content in `assets/` when:
+- the package includes output artifacts, examples, or static files that should not bloat prompt context
+## Anti-Patterns
+Avoid these:
+- storing long policy text directly in `SKILL.md`
+- adding `references/` with no files that are actually used
+- adding `scripts/` for logic that is still best expressed in prose
+- adding `evals/` for one-off or disposable skills
+- creating every folder by default even when empty
+- keeping folders that are neither referenced in `SKILL.md` nor declared as factory components
+- adding broad configuration knobs before a real variation exists
+- adding governance or reports to make a scaffold look mature when no reuse pressure exists
+## Heuristics
+### `SKILL.md`
+- should stay focused
+- should not become the full knowledge base
+- should mention any optional directory that materially affects execution
+### `references/`
+- should earn their keep
+- should usually be named and discoverable from `SKILL.md`
+### `scripts/`
+- should exist only when deterministic logic or formatting logic is real
+- should be referenced explicitly from `SKILL.md` when required for execution
+### `evals/`
+- should exist when routing or quality claims need to be defended
+- should be skipped for disposable personal drafts
+## Unused Resource Detection
+`resource_boundary_check.py` warns when a non-empty optional directory appears decorative, that is, when all of the following hold:
+- the directory exists and contains files
+- the main `SKILL.md` does not reference it
+- and the directory is not declared in `manifest.json` factory components
+This protects the package from looking more sophisticated than it actually is.
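The three conditions can be sketched as follows. This is not the logic of `resource_boundary_check.py`, which is not shown here; it is a simplified illustration that treats a mention of `<dir>/` in `SKILL.md` as a reference and assumes a hypothetical `factory_components` list in `manifest.json`.

```python
import json
from pathlib import Path

OPTIONAL_DIRS = ("references", "scripts", "evals", "assets")


def decorative_dirs(skill_dir: Path) -> list:
    """List optional directories that hold files but are never referenced."""
    skill_text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    manifest_path = skill_dir / "manifest.json"
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Hypothetical key name; the real manifest schema may differ.
    declared = set(manifest.get("factory_components", []))

    flagged = []
    for name in OPTIONAL_DIRS:
        directory = skill_dir / name
        has_files = directory.is_dir() and any(p.is_file() for p in directory.rglob("*"))
        referenced = f"{name}/" in skill_text
        if has_files and not referenced and name not in declared:
            flagged.append(name)
    return flagged
```

Empty directories are deliberately not flagged here, since the warning described above applies only to non-empty ones.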
+## Quality Density
+The checker also reports `quality_density`, a local signal for how much governance and quality evidence is packed into the initial load budget.
+It combines:
+- governance score
+- presence of evals
+- presence of reports
+- references and scripts
+- interface and manifest metadata
+- failure or test evidence
+Higher density means the package is staying lean while still proving quality.
+## Quality Intent
+The best skill is not the one with the most files. The best skill is the smallest package that still makes the recurring job reliable, reusable, and auditable.
+See [Authoring Discipline](authoring-discipline.md) for the author and reviewer rules that keep resource growth tied to a real user goal.

package/bundled-skills/yao-meta-skill/references/review-studio-method.md ADDED

@@ -0,0 +1,87 @@
+# Review Studio 2.0 Method
+Review Studio is the release-facing audit surface for a skill package. It does not replace the detailed reports; it turns them into one reviewer decision page.
+## Purpose
+- Show release blockers and warnings before the package deepens.
+- Link every gate back to a concrete evidence artifact.
+- Generate review actions for every blocker and warning, with source-fix location and verification command.
+- Make human warning acceptance auditable through a waiver ledger.
+- Make reviewer comments auditable through an annotation ledger tied to gates, source/report paths, and optional line numbers.
+- Keep review flow vertical: summary first, gates second, supporting details after.
+- Avoid hiding output quality, runtime, trust, portfolio, and operating-loop issues across separate pages.
+## Required Gates
+1. Intent Canvas: intent confidence and unresolved input/output/exclusion gaps.
+2. Trigger Lab: route scorecard, misroutes, ambiguous cases, and near-neighbor safety.
+3. Output Lab: with-skill vs baseline delta, execution mode, timing/token evidence, case count, file-backed cases, near-neighbor cases, boundary cases, blind A/B review pack evidence, and reviewer adjudication status.
+4. Context Budget: initial load, budget tier, warnings, and quality density.
+5. Runtime Matrix: target conformance pass/fail and degradation notes.
+6. Trust Report: secret scan, script surface, dependency pinning, network/interactive flags, and package hash.
+7. Permission Gates: reviewer-approved capability scope, reason, expiry, evidence, and target-enforcement notes.
+8. Runtime Permission Probes: packaged adapter permission contracts, native-enforcement flags, metadata fallback notes, and residual risks.
+9. Skill Atlas: route collisions, stale skills, owner gaps, and no-route opportunities.
+10. Operations Loop: local-first metadata-only adoption, missed-trigger, bad-output, script-error, and review-drift signals.
+11. Review Waivers: human risk approvals, active coverage, expired approvals, invalid records, and expiry policy.
+12. Registry Audit: package metadata, install evidence, compatibility entries, and archive/source checksums.
+13. Release Notes: promotion status, migration notes, known gaps, and next move.
+## Gate Semantics
+- `pass`: evidence is present and the gate is satisfied.
+- `warn`: review can continue, but the issue must be visible before release.
+- `block`: do not claim production, library, governed, or public readiness until fixed.
+For library and governed skills, Output Lab should have at least five cases and cover file-backed, near-neighbor, and boundary scenarios.
+Production, library, and governed reviews should also show a blind A/B review pack. The Review Studio gate may warn when scorecard evidence exists but no blind pack is present, because the package can prove assertions but not yet reduce reviewer bias.
+When `reports/output_execution_runs.json` exists, Review Studio should show the number of variant runs, command-executed runs, model-executed runs, recorded fixtures, timing-observed runs, and token-estimated runs. Recorded fixtures are valid reproducibility evidence, but they must not be described as model-executed output evidence.
+When `reports/output_review_adjudication.json` exists, Review Studio should show reviewed pairs and pending pairs. Pending reviewer decisions are acceptable as an explicit state, but they must not be counted as agreement or human review evidence. For production, library, and governed packages, pending reviewer decisions should keep the Output Lab in `warn` until reviewer decisions are recorded or the warning is explicitly accepted in the waiver ledger. Invalid adjudication records should block release because they make the blind review audit untrustworthy.
+The Operations Loop must never display raw telemetry logs. It should link only to `reports/adoption_drift_report.md`; privacy or schema violations are blockers.
+The Runtime Permission Probes gate is evaluated after packaging, because it reads generated target adapters. A missing probe can warn in lighter modes, but governed release should not claim target permission readiness without `reports/runtime_permission_probes.json`.
+The Review Waivers gate must never convert a blocker into a pass. Waivers only cover warning-level risks, require reviewer, reason, scope, and expiry, and must link only to `reports/review_waivers.md`.
+Review Annotations are not waivers. They are reviewer comments attached to a Review Studio gate plus a relative source/report path and optional line number. Use them to preserve review context, requested edits, and source-line notes. Open blocker annotations should make the Review Studio decision `blocked` until the annotation is resolved or deferred with reviewer rationale. Open warning annotations can move the package into review, but they do not create gate-specific `review_actions`; actions remain reserved for non-pass gates.
+## Review Actions
+Every non-pass gate must produce a `review_actions` entry in `reports/review-studio.json`. When all gates pass, `review_actions` should be an empty list and the page should explicitly say there are no blocker or warning actions.
+Each action must include:
+- `gate_key`
+- `status`
+- `summary`
+- `why`
+- `source_fix`
+- `source_refs`
+- `evidence`
+- `verification_command`
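The rule that every non-pass gate yields exactly one action with the fields above can be sketched like this. The sketch assumes each gate record already carries those fields; `build_review_actions` and the shape of the gate records are hypothetical.

```python
REQUIRED_ACTION_KEYS = (
    "gate_key",
    "status",
    "summary",
    "why",
    "source_fix",
    "source_refs",
    "evidence",
    "verification_command",
)


def build_review_actions(gates: list) -> list:
    """Build `review_actions`: one entry per non-pass gate, empty if all pass."""
    actions = []
    for gate in gates:
        if gate["status"] == "pass":
            continue  # passing gates never produce an action
        missing = [key for key in REQUIRED_ACTION_KEYS if key not in gate]
        if missing:
            raise ValueError(f"gate {gate.get('gate_key')!r} lacks fields: {missing}")
        actions.append({key: gate[key] for key in REQUIRED_ACTION_KEYS})
    return actions
```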
+`source_refs` must be structured entries with relative `path`, human label, kind, existence flag, best-effort line number, matched pattern, short source excerpt, and relative link when the file exists. They should point to the smallest useful report or source file, not just a broad directory. The HTML page should render the excerpt next to the link so reviewers can understand why a line anchor matters before opening the full artifact.
+The HTML page should render these actions before the detailed supporting sections so a reviewer can move directly from warning to fix. Action entries do not change gate count or score; they make the current decision more operational.
+For `world-class-evidence`, the action should also expose an evidence-step card for every pending evidence key. Each card should show the submission path, template path, blocked source checks, command handoff, first runbook steps, provenance requirements, success checks, evidence artifacts, and privacy boundary. These cards are collection guidance only; they must not count as accepted evidence or change world-class readiness.
+## Review Annotations
+`reports/review_annotations.json` is the structured ledger, and `reports/review_annotations.md` is the human-readable review note surface. Each annotation should include:
+- `gate_key`
+- `target_path`
+- `line` when a useful source line exists
+- `severity`
+- `status`
+- `reviewer`
+- `body`
+- optional `suggested_action`
+The ledger should reject absolute paths or paths that escape the skill directory. Missing target files are allowed as visible evidence gaps, not hidden failures.
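The path rule can be checked without touching the filesystem, which also keeps missing target files allowed. The following is a minimal sketch, assuming `target_path` values are POSIX-style strings; `is_safe_target_path` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def is_safe_target_path(target_path: str) -> bool:
    """Accept only relative paths that stay inside the skill directory."""
    path = PurePosixPath(target_path)
    if path.is_absolute() or not path.parts:
        return False
    depth = 0
    for part in path.parts:
        if part == "..":
            depth -= 1
            if depth < 0:
                return False  # the path climbs above the skill directory
        else:
            depth += 1
    return True
```

Because the check is purely lexical, it does not follow symlinks; a stricter variant could resolve the path against the skill directory before accepting it.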

package/bundled-skills/yao-meta-skill/references/review-waiver-method.md ADDED

@@ -0,0 +1,76 @@
+# Review Waiver Method
+Review waivers make human risk acceptance explicit. They are not a way to hide problems; they are a local audit record for warning-level risks that the reviewer intentionally accepts for a bounded release window.
+## When To Use
+Use a waiver when:
+- Review Studio shows a warning that is understood and intentionally accepted.
+- The warning cannot be fixed before release without a worse tradeoff.
+- A reviewer can name the reason, scope, evidence, and expiry date.
+Do not use a waiver for blocker gates in v0. Blockers must be fixed before production, library, governed, or public readiness is claimed. In governed mode, missing or invalid high-permission approvals are blockers and should be fixed in `security/permission_policy.json`, not waived.
+## Required Fields
+Every waiver must include the following fields; `evidence` is optional and `scope` has a default:
+- `gate_key`: the Review Studio gate being accepted.
+- `decision`: `accepted-risk`, `false-positive`, or `temporary-exception`.
+- `reviewer`: the accountable human or team.
+- `reason`: a concrete reason of at least 20 characters.
+- `created_at`: ISO date.
+- `expires_at`: ISO date.
+- `evidence`: optional path or note that explains the decision.
+- `scope`: default `current-release`.
+## Gate Key Policy
+The waiver ledger must track the Review Studio gate universe explicitly:
+- `review_studio_gate_keys`: every gate Review Studio can render.
+- `waiverable_gate_keys`: warning gates that may receive bounded human acceptance.
+- `non_waivable_gate_keys`: gates that must not be accepted through a waiver.
+When Review Studio adds or renames a gate, update the waiver gate policy and tests in the same change. `review-waivers` and `world-class-evidence` stay non-waivable: the first is the waiver mechanism itself, and the second can only be satisfied by accepted ledger evidence.
+## Release Semantics
+- Invalid waiver records block Review Studio.
+- Expired waiver records stay visible but no longer cover warnings.
+- Active waivers cover only the exact gate key they name.
+- A warning without an active waiver remains visible as a warning.
+- Raw user prompts, outputs, credentials, and private transcripts must not be stored in waiver reasons.
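The field rules and release semantics combine into a small state check. This is a sketch under stated assumptions: a waiver is treated as active through its expiry date, and an expiry earlier than the creation date is treated as invalid; neither detail is specified above. `waiver_state` and `covers` are hypothetical helpers, not functions from `scripts/yao.py`.

```python
from datetime import date

DECISIONS = {"accepted-risk", "false-positive", "temporary-exception"}
REQUIRED_FIELDS = ("gate_key", "decision", "reviewer", "reason", "created_at", "expires_at")
MIN_REASON_CHARS = 20


def waiver_state(waiver: dict, today: date) -> str:
    """Classify a waiver record as "invalid", "expired", or "active"."""
    if any(not waiver.get(field) for field in REQUIRED_FIELDS):
        return "invalid"
    if waiver["decision"] not in DECISIONS:
        return "invalid"
    if len(waiver["reason"]) < MIN_REASON_CHARS:
        return "invalid"
    try:
        created = date.fromisoformat(waiver["created_at"])
        expires = date.fromisoformat(waiver["expires_at"])
    except (TypeError, ValueError):
        return "invalid"
    if expires < created:
        return "invalid"  # assumption: not stated in the method
    # Assumption: the waiver still applies on the expiry date itself.
    return "expired" if expires < today else "active"


def covers(waiver: dict, gate_key: str, today: date) -> bool:
    """An active waiver covers only the exact gate key it names."""
    return waiver_state(waiver, today) == "active" and waiver["gate_key"] == gate_key
```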
+## Commands
+Render or validate the ledger:
+```bash
+python3 scripts/render_review_waivers.py .
+```
+Add a bounded approval:
+```bash
+python3 scripts/yao.py review-waivers . \
+  --add-waiver \
+  --gate-key trust-report \
+  --reviewer "Yao Team" \
+  --reason "Network-capable scripts are documented and bounded for this release." \
+  --expires-at 2026-09-30
+```
+For a non-governed release where `permission-gates` is only a warning, the same command can name `--gate-key permission-gates`. Governed releases must instead provide reviewer, scope, reason, expiry, evidence, and target-enforcement fields in `security/permission_policy.json`.
+Review Studio reads `reports/review_waivers.json` and links to `reports/review_waivers.md`.
+## Candidate Actions
+The waiver report also surfaces current candidate actions from local evidence:
+- waiverable warning candidates, such as an `output-lab` warning caused by pending reviewer decisions or missing provider-backed runs
+- non-waivable boundaries, especially `world-class-evidence`, where pending ledger evidence cannot be converted into completion by a waiver
+A waiver can make a bounded warning auditable for a release window. It cannot count as provider-backed evidence, human adjudication, native runtime enforcement, external telemetry, or public world-class readiness.

package/bundled-skills/yao-meta-skill/references/runtime-conformance-method.md ADDED

@@ -0,0 +1,21 @@
+# Runtime Conformance Method
+Runtime conformance turns platform compatibility from a packaging afterthought into a release gate.
+## Purpose
+Use this check when a skill is packaged for OpenAI, Claude, Agent Skills, VS Code / Copilot, or generic targets. The goal is not to prove that every runtime behaves identically. The goal is to prove that the package exposes enough metadata, files, and degradation notes for each runtime to consume it safely.
+## V0 Checks
+- `SKILL.md` exists and has frontmatter `name` and `description`.
+- `description` stays within the 1024-character limit used by common Agent Skills clients.
+- `manifest.json` includes name, version, owner, maturity, status, review cadence, and target platforms.
+- `agents/interface.yaml` includes display text, default prompt, activation mode, execution context, trust metadata, adapter targets, and degradation strategy.
+- Skill IR exists and matches the frontmatter name and description.
+- Resources named by Skill IR are relative paths and resolve inside the package.
+- Unsupported or lossy target behavior is represented by a degradation note.
+## Reviewer Gate
+A reviewer should be able to see a target matrix with pass/fail status, failures, warnings, and artifact paths. Any failed target blocks library, governed, or team-distributed release for that target.
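The first two V0 checks can be sketched with a deliberately naive frontmatter reader. It assumes single-line `key: value` pairs between `---` delimiters; real frontmatter is YAML and would need a proper YAML parser. `parse_frontmatter` and `skill_md_failures` are hypothetical names.

```python
MAX_DESCRIPTION_CHARS = 1024


def parse_frontmatter(text: str) -> dict:
    """Read single-line `key: value` pairs from a leading `---` block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the block as absent


def skill_md_failures(text: str) -> list:
    """Return failures for the `SKILL.md` name and description checks."""
    fields = parse_frontmatter(text)
    failures = []
    for key in ("name", "description"):
        if not fields.get(key):
            failures.append(f"missing frontmatter field: {key}")
    if len(fields.get("description", "")) > MAX_DESCRIPTION_CHARS:
        failures.append("description exceeds 1024 characters")
    return failures
```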

package/bundled-skills/yao-meta-skill/references/skill-archetypes.md ADDED

@@ -0,0 +1,86 @@
+# Skill Archetypes
+Use these archetypes to decide what kind of skill you are building before you decide how many files or gates to add.
+## Scaffold
+Purpose:
+- quick packaging for a real but still exploratory workflow
+Default assets:
+- `SKILL.md`
+- `agents/interface.yaml`
+Use when:
+- reuse is plausible but not proven
+- failure cost is low
+- the workflow is still changing
+## Production
+Purpose:
+- compact skill for team reuse
+Default assets:
+- lean `SKILL.md`
+- `agents/interface.yaml`
+- selective `references/`
+- selective `evals/`
+Use when:
+- route mistakes waste team time
+- a checklist or focused script improves reliability
+## Library
+Purpose:
+- shared capability with visible evidence and portability expectations
+Default assets:
+- route evals
+- packaging checks
+- manifest metadata
+- public reports
+Use when:
+- the skill will be reused across teams or clients
+- the skill is likely to have near-neighbor route collisions
+## Governed
+Purpose:
+- high-trust skill with explicit ownership and review
+Default assets:
+- lifecycle metadata
+- governance score
+- review cadence
+- regression history
+- governed examples or policy references
+Use when:
+- the skill is operationally sensitive
+- the skill influences incident, release, compliance, or organizational standards
+## Anti-Archetypes
+Do not force a request into a skill archetype when it is really:
+- a one-off answer
+- a document
+- a brainstorm
+- an implementation task with no reusable process
+See [Non-Skill Decision Tree](non-skill-decision-tree.md).

package/bundled-skills/yao-meta-skill/references/skill-atlas-method.md ADDED

@@ -0,0 +1,35 @@
+# Skill Atlas Method
+Skill Atlas is the 2.0 operating layer for a workspace that contains many skills.
+## Purpose
+Single-skill quality is not enough for a team library. A skill portfolio also needs to reveal route collisions, stale ownership, duplicate resources, and repeated no-route opportunities.
+## V0 Checks
+- Catalog every `SKILL.md` under a workspace.
+- Extract name, description, owner, maturity, targets, updated date, and review cadence.
+- Detect similar descriptions as route-overlap candidates.
+- Detect duplicate skill names.
+- Detect shared script/reference filenames as dependency signals.
+- Flag missing owner or review metadata.
+- Flag stale skills based on `updated_at` and `review_cadence`.
+- Extract no-route opportunities from failure notes.
+- Read aggregate adoption drift reports and flag telemetry drift without reading raw telemetry logs.
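Two of these checks, duplicate names and similar descriptions, can be sketched with the standard library alone. The method does not say how similarity is measured, so this sketch swaps in `difflib.SequenceMatcher` with a hypothetical `0.8` threshold; `duplicate_names` and `overlap_candidates` are illustrative helpers operating on already-cataloged records.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def duplicate_names(skills: list) -> list:
    """Return skill names that appear more than once in the catalog."""
    counts = Counter(skill["name"] for skill in skills)
    return sorted(name for name, count in counts.items() if count > 1)


def overlap_candidates(skills: list, threshold: float = 0.8) -> list:
    """Return name pairs whose descriptions look similar enough to collide."""
    pairs = []
    for first, second in combinations(skills, 2):
        ratio = SequenceMatcher(
            None, first["description"].lower(), second["description"].lower()
        ).ratio()
        # Assumption: the threshold is a tunable guess, not a documented value.
        if ratio >= threshold:
            pairs.append((first["name"], second["name"]))
    return pairs
```

Character-level similarity is a crude signal; it only nominates candidates for the route-overlap review and does not decide that a collision exists.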
+## Scope Policy
+Atlas keeps a full catalog, but release gates should distinguish actionable library skills from examples and test fixtures.
+Use `skill_atlas/policy.json` to mark path prefixes as non-actionable when they are intentionally retained as examples, evolution snapshots, embedded generated skills, or validator fixtures. Non-actionable items still appear in the full report, route matrix, stale list, and owner gap list, but Review Studio should use the actionable counts for release readiness.
+## Telemetry Link
+Atlas may read each skill's aggregate `reports/adoption_drift_report.json` to surface portfolio drift signals such as no telemetry for production/library/governed skills, missed triggers, bad outputs, missing resources, script errors, and review-overdue counts. It must not read or package `reports/telemetry_events.jsonl`; raw telemetry remains local-only evidence owned by the skill.
+Write drift output to `skill_atlas/drift_signals.json`. Non-actionable scopes stay visible in that file and in the HTML report, but only actionable drift signals should affect release readiness.
+## Reviewer Gate
+Use Atlas before promoting a single skill into a shared library. If an actionable route collision, missing owner, stale governed skill, or telemetry drift signal appears, fix the portfolio boundary before adding more local complexity to one skill. Non-actionable issues should stay visible as evidence, not as release blockers.