npm - opencode-skills-collection - Versions diffs - 3.1.0 → 3.1.1 - Mend

opencode-skills-collection 3.1.0 → 3.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (277)

package/bundled-skills/yao-meta-skill/references/operating-modes.md ADDED

@@ -0,0 +1,107 @@
+# Operating Modes
+This playbook expands the compact mode routing in `SKILL.md`.
+## Scaffold
+Use when:
+- the skill is exploratory
+- the workflow is personal or short-lived
+- eval and packaging cost would exceed reuse value
+Default deliverables:
+- `SKILL.md`
+- `agents/interface.yaml`
+- `references/` only when a small amount of deferred reading is clearly helpful
+Avoid:
+- automatic `scripts/`, `evals/`, or `manifest.json`
+- packaging targets the user did not ask for
+## Production
+Use when:
+- the skill will be reused by a team
+- routing mistakes would waste time
+- a small amount of deterministic automation improves reliability
+Default deliverables:
+- lean `SKILL.md`
+- `agents/interface.yaml`
+- `references/` for policies, checklists, or examples
+- `scripts/` only when deterministic logic is real
+- `evals/` when trigger or output quality should be checked
+- `manifest.json` when lifecycle metadata matters
+Minimum gates:
+- `resource_boundary_check.py`
+- `validate_skill.py`
+- `trigger_eval.py` when route confusion is plausible
+## Library
+Use when:
+- the skill is organizationally important
+- the package will be shared broadly
+- maintenance and portability matter
+- the skill itself shapes how other skills are created or governed
+Default deliverables:
+- trigger positives, negatives, and near neighbors
+- packaging expectations
+- maintenance metadata
+- visible regression evidence
+- governance review readiness
+Minimum gates:
+- `resource_boundary_check.py`
+- `governance_check.py`
+- `trigger_eval.py`
+- `cross_packager.py` for requested targets
+## Governed
+Use when:
+- the skill affects incident, release, compliance, security, or organizational standards
+- external distribution, public claims, or high-permission scripts require reviewable evidence
+- wrong output or wrong activation can cause operational, legal, trust, or reputational harm
+Default deliverables:
+- everything required for Library
+- explicit owner, lifecycle, review cadence, and expiry-aware approvals
+- trust/security reports for scripts, dependencies, permissions, secrets, and package hash
+- output eval evidence with blind review status and reviewer-visible boundaries
+- world-class or public-claim evidence ledger when public readiness is claimed
+Minimum gates:
+- Library gates
+- `trust_check.py`
+- runtime permission probes for packaged adapters
+- review waiver ledger for accepted warning-level risk
+- Review Studio before release
+- claim guard before public world-class language
+## Escalation Rules
+- stay in Scaffold unless reuse is clearly real
+- move to Production when team reuse or route confusion matters
+- move to Library when the skill becomes shared infrastructure
+- move to Governed when the skill needs explicit risk ownership, high-permission review, or public-claim evidence
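Sketch (not part of the package): the escalation rules above can be read as a small decision function. The helper name `choose_mode` and its three boolean parameters are hypothetical, chosen only to mirror the four rules.

```python
def choose_mode(team_reuse_or_route_confusion: bool = False,
                shared_infrastructure: bool = False,
                needs_risk_ownership: bool = False) -> str:
    """Return the mode the escalation rules point to."""
    # Check the most demanding condition first so the strictest need wins.
    if needs_risk_ownership:
        return "Governed"
    if shared_infrastructure:
        return "Library"
    if team_reuse_or_route_confusion:
        return "Production"
    # Default: stay in Scaffold unless reuse is clearly real.
    return "Scaffold"
```

Calling `choose_mode()` with no signals returns `"Scaffold"`, which matches the first rule: nothing escalates by default.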
+## Context Discipline
+- a mode upgrade does not justify a larger `SKILL.md`
+- higher rigor should mostly add better references, reports, evals, and metadata
+- if a mode upgrade bloats the initial load, move detail out before adding more checks

package/bundled-skills/yao-meta-skill/references/output-eval-method.md ADDED

@@ -0,0 +1,113 @@
+# Output Eval Method
+Output Eval Lab checks whether a skill improves the final user-facing result, not only whether it routes correctly.
+## When To Use
+Use output evals for production, library, governed, or team-distributed skills. Scaffold skills can start with one smoke case, but production and above should show a positive with-skill vs baseline signal before promotion.
+## Case Design
+Each case should include:
+- a real prompt or task shape
+- any required input files
+- a baseline output that represents doing the task without the skill
+- a with-skill output that represents the skill-guided behavior
+- assertions that can be checked without subjective guessing
+- optional human review notes for taste, completeness, or judgment
+## Assertion Rules
+Prefer assertions that catch material quality:
+- required deliverable paths
+- required sections or contracts
+- required boundary or exclusion language
+- required evidence paths
+- forbidden generic placeholders
+- forbidden unsafe actions
+Avoid assertions that only reward wording memorization. If a case can pass by parroting one phrase while failing the real job, the assertion is too narrow.
+## Score Reading
+The first v0 scorecard reports:
+- baseline pass rate
+- with-skill pass rate
+- absolute delta
+- failed assertions and failure taxonomy
+- execution mode, timing, and token evidence when `reports/output_execution_runs.md` is generated
+- blind A/B review pack count
+- recommended next fixes
+Production promotion should require the with-skill pass rate to beat baseline and should explain every failed assertion.
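Sketch (not part of the package): one way to compute the first three scorecard numbers and the promotion condition from per-assertion results. The function name and the dictionary keys are assumptions for illustration; the actual report schema is not shown in this diff.

```python
def scorecard(baseline: list[bool], with_skill: list[bool]) -> dict:
    """Summarize assertion results as pass rates and an absolute delta."""
    if not baseline or not with_skill:
        raise ValueError("both result lists need at least one assertion")
    baseline_rate = sum(baseline) / len(baseline)
    with_skill_rate = sum(with_skill) / len(with_skill)
    return {
        "baseline_pass_rate": baseline_rate,
        "with_skill_pass_rate": with_skill_rate,
        "absolute_delta": with_skill_rate - baseline_rate,
        # Promotion rule from the text: with-skill must beat baseline.
        "beats_baseline": with_skill_rate > baseline_rate,
    }


# Four assertions per variant: baseline passes 1, with-skill passes 3.
card = scorecard([True, False, False, False], [True, True, True, False])
```

Here `card["absolute_delta"]` is `0.5`. The sketch leaves out the second half of the promotion rule, explaining every failed assertion, because that needs the failure taxonomy rather than arithmetic.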
+## Execution Evidence
+Run execution evidence after the scorecard:
+```bash
+python3 scripts/yao.py output-exec
+```
+By default, this records the current case outputs as `recorded_fixture`. That is useful for reproducibility, but it is not model-executed evidence. To collect real run evidence, pass `--runner-command` with a command or JSON string list. The runner receives a JSON request on stdin and should return JSON with:
+- `output`
+- optional `execution_kind`: `command` or `model`
+- optional `provider` and `model`
+- optional `usage.input_tokens`, `usage.output_tokens`, and `usage.total_tokens`
+Only runs that return provider/model metadata or `execution_kind: "model"` should count as model-executed. If token usage is absent, the report may estimate tokens, but the estimate must be labeled as estimated.
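Sketch (not part of the package): how a report step might apply the two rules above to a runner's JSON result. It assumes that either a `provider` or a `model` field counts as "provider/model metadata", and it uses a rough four-characters-per-token estimate; both are illustrative assumptions, as are the function name and the returned keys.

```python
def classify_run(result: dict) -> dict:
    """Label a runner result as model-executed or not, with a token source."""
    # Assumption: either field alone counts as provider/model metadata.
    has_metadata = bool(result.get("provider") or result.get("model"))
    model_executed = has_metadata or result.get("execution_kind") == "model"

    usage = result.get("usage") or {}
    if "total_tokens" in usage:
        tokens, token_source = usage["total_tokens"], "observed"
    else:
        # Assumption: ~4 characters per token, used only as a rough estimate.
        tokens = max(1, len(result.get("output", "")) // 4)
        token_source = "estimated"  # the estimate must stay labeled

    return {
        "model_executed": model_executed,
        "total_tokens": tokens,
        "token_source": token_source,
    }
```

A fixture-style result such as `{"output": "abcdefgh"}` comes back with `model_executed` set to `False` and `token_source` set to `"estimated"`.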
+For local release-gate smoke evidence without external model credentials, use the deterministic runner:
+```bash
+python3 scripts/yao.py output-exec --runner-command '["python3","scripts/local_output_eval_runner.py"]'
+```
+This verifies the command-runner contract, timing capture, grading path, and failure handling. It must not be described as provider-backed model evidence.
+For provider-backed evidence, use the bundled provider runner with real credentials:
+```bash
+YAO_OUTPUT_EVAL_MODEL=gpt-4.1-mini \
+OPENAI_API_KEY=... \
+python3 scripts/yao.py output-exec --provider-runner openai
+```
+The provider runner calls an OpenAI Responses API compatible endpoint, reads input files relative to `evals/output/`, returns `execution_kind: "model"`, and records observed token usage when the provider returns usage fields. If the API key or model is missing, the runner must fail instead of falling back to fixtures or pretending model evidence exists. Use `--provider-base-url` only for reviewed compatible endpoints; non-default HTTPS hosts require `--allow-custom-base-url`, and plain HTTP is allowed only with `--allow-insecure-localhost` for local test servers.
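Sketch (not part of the package): the base-URL policy in the paragraph above, written as a standalone check. The default host `api.openai.com`, the set of local hostnames, and the function name are assumptions; the bundled runner's actual defaults are not shown in this diff.

```python
from urllib.parse import urlparse

DEFAULT_HOST = "api.openai.com"  # assumption: the runner's default endpoint host
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}  # assumption: what counts as local


def base_url_allowed(url: str,
                     allow_custom_base_url: bool = False,
                     allow_insecure_localhost: bool = False) -> bool:
    """Apply the HTTPS / custom-host / local-HTTP rules to a base URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme == "https":
        # Non-default HTTPS hosts need an explicit opt-in.
        return host == DEFAULT_HOST or allow_custom_base_url
    if parsed.scheme == "http":
        # Plain HTTP only for local test servers, and only with the flag.
        return allow_insecure_localhost and host in LOCAL_HOSTS
    return False
```

The check denies by default: an unknown scheme or a remote plain-HTTP host is rejected regardless of flags.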
+## Blind A/B Review
+Every output eval run should also generate:
+- `reports/output_blind_review_pack.md`
+- `reports/output_blind_review_pack.json`
+- `reports/output_blind_answer_key.json`
+The review pack must hide whether Variant A or Variant B came from the baseline or the skill-guided output. The answer key is separate audit evidence and should only be opened after a reviewer has made a judgment.
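Sketch (not part of the package): one way to keep the review pack blind while still producing a separate answer key. The function name and the entry shapes are hypothetical; only the A/B labeling and the pack/key split come from the text.

```python
import random


def build_blind_pack(case_id, baseline_output, with_skill_output, rng):
    """Return (pack_entry, answer_key_entry) with a random A/B assignment."""
    sources = [("baseline", baseline_output), ("with_skill", with_skill_output)]
    rng.shuffle(sources)  # the reviewer cannot infer the source from the label
    pack_entry = {"case_id": case_id, "variants": {}}
    key_entry = {"case_id": case_id, "sources": {}}
    for label, (source, text) in zip("AB", sources):
        pack_entry["variants"][label] = text   # shown to the reviewer
        key_entry["sources"][label] = source   # stored separately for audit
    return pack_entry, key_entry


pack, key = build_blind_pack("case-1", "plain draft", "guided draft", random.Random())
```

The pack entry carries only labels and text, so writing it and the key entry to different files preserves the separation the section asks for.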
+## Reviewer Adjudication
+After blind review, record reviewer choices in `reports/output_review_decisions.json` with `reviewer`, `reviewed_at`, `winner_variant`, optional `confidence`, and a required rubric-based `reason`, then run:
+```bash
+python3 scripts/adjudicate_output_review.py --write-template
+python3 scripts/yao.py output-review
+```
+The adjudication report writes:
+- `reports/output_review_decisions.json`
+- `reports/output_review_adjudication.json`
+- `reports/output_review_adjudication.md`
+When no reviewer decisions exist, the report should say the cases are pending and Review Studio should link to the decisions template. Do not count pending cases as human agreement. Only a real `winner_variant` of `A` or `B` with reviewer metadata and a non-empty `reason` should contribute to agreement rate, disagreement count, and reviewer judgment count.
+The adjudication report must preserve blind-review integrity: pending and invalid decisions should show the expected winner as hidden. Only reveal `expected_winner_variant` after a valid reviewer decision with rationale exists for that case.
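Sketch (not part of the package): the counting and hiding rules above as a small function. It assumes decisions and the answer key are both dictionaries keyed by case id, that the key maps each case to the expected winning variant, and that "reviewer metadata" means both `reviewer` and `reviewed_at`; the real file layouts are not shown in this diff.

```python
def is_valid_decision(decision: dict) -> bool:
    """A decision counts only with a real winner, metadata, and a reason."""
    return (
        decision.get("winner_variant") in {"A", "B"}
        and bool(decision.get("reviewer"))
        and bool(decision.get("reviewed_at"))
        and bool(str(decision.get("reason", "")).strip())
    )


def adjudicate(decisions: dict, answer_key: dict) -> dict:
    rows, judged, agreements = [], 0, 0
    for case_id, expected in answer_key.items():
        decision = decisions.get(case_id)
        if decision is None or not is_valid_decision(decision):
            status = "pending" if decision is None else "invalid"
            # Blind-review integrity: never reveal the expected winner here.
            rows.append({"case_id": case_id, "status": status,
                         "expected_winner_variant": "hidden"})
            continue
        judged += 1
        agreements += decision["winner_variant"] == expected
        rows.append({"case_id": case_id, "status": "judged",
                     "expected_winner_variant": expected})
    return {
        "rows": rows,
        "reviewer_judgment_count": judged,
        "disagreement_count": judged - agreements,
        # No valid judgments means no agreement rate, not 0% or 100%.
        "agreement_rate": agreements / judged if judged else None,
    }
```

Pending and invalid cases appear in the rows but never in the counts, so an empty decisions file yields an agreement rate of `None` rather than a misleading number.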
+## Anti-Overfitting
+Keep a small public smoke set and a separate holdout set. Rotate real failures into the taxonomy instead of editing only the prompt that failed. Add near-neighbor cases whenever the output looks good but the boundary is still unclear.

package/bundled-skills/yao-meta-skill/references/output-quality-risk.md ADDED

@@ -0,0 +1,41 @@
+# Output Quality Risk
+Use this layer when a generated skill produces user-facing artifacts such as tutorials, reports, Markdown pages, screenshots, tables, code snippets, or research summaries.
+## Principle
+A skill is not complete just because it can route and execute. It also needs to predict how its final output can fail in small but visible ways, then constrain those failures before the user sees them.
+## Common Failure Modes
+- generic headings that make a tutorial feel templated
+- dense footnotes or citation markers that interrupt reading
+- tables that render poorly or hide decisions inside long cells
+- screenshot references that point to the wrong state, crop, or missing asset
+- polished summaries that lose the user's actual audience or scenario
+- commands or snippets that omit working directory, inputs, outputs, or side effects
+## Required Author Behavior
+Before finalizing a generated skill:
+1. infer the most likely output families from the job and target output
+2. generate `reports/output-risk-profile.md`
+3. generate `reports/artifact-design-profile.md` when the output is a report, tutorial, viewer, dashboard, screenshot, Markdown page, or visual artifact
+4. add output-specific constraints to the generated skill's operating frame
+5. expose the risk and design profiles in the review viewer
+6. treat unresolved output risks as iteration candidates instead of pretending the first version is complete
+## Self-Repair Rule
+Every output-facing skill should do a final pass for:
+- specificity: headings, titles, and summaries fit the actual domain
+- readability: Markdown, tables, and lists remain pleasant to scan
+- evidence hygiene: citations support real claims without clutter
+- visual truthfulness: screenshots and images are real, relevant, and correctly described
+- execution clarity: commands and snippets name their assumptions and expected results
+## Reviewer Rule
+Reviewers should approve the skill only when the likely output mistakes are visible and the generated package contains a reasonable self-repair path for the highest-risk family.

package/bundled-skills/yao-meta-skill/references/output-visual-quality.md ADDED

@@ -0,0 +1,53 @@
+# Output Visual Quality
+Use this checklist before approving a generated skill that produces reports, tutorials, HTML pages, screenshots, Markdown deliverables, or slide-like artifacts.
+## Common Visual Failures
+- generic headings such as Overview, Key Points, Summary, or Next Steps when the user's domain needs sharper section names
+- large citation or footnote clusters that break sentence flow
+- Markdown tables with paragraph-length cells or weak hierarchy
+- screenshots captured from the wrong state, viewport, crop, or missing asset
+- HTML reports that look like raw JSON converted into cards
+- repeated cards with identical weight, making the page hard to scan
+- decorative gradients, shadows, or glass effects that do not serve the content
+- mobile layouts that collapse into long undifferentiated blocks
+## Design Quality Gates
+### P0 Must Fix
+- no absolute `/Users/...` paths in final HTML
+- no placeholder titles, labels, screenshots, or source notes
+- no invented screenshots, charts, citations, or visual evidence
+- no table with paragraph-length cells when bullets or cards would scan better
+- no fixed design palette copied from another skill without content justification
+### P1 Should Fix
+- title and section headings use domain nouns and the target outcome
+- each report has one clear first-screen explanation of what it is for
+- visual hierarchy separates decisions, evidence, risks, and next actions
+- dense content is split across sections instead of squeezed into one block
+- reviewer-only detail is present but not pushed into the user's main reading flow
+### P2 Polish
+- typography roles are consistent
+- whitespace rhythm supports reading speed
+- cards, tables, and callouts are used for different semantic jobs
+- source notes are grouped where they preserve flow
+- mobile, print, and static-file viewing are considered when relevant
+## Self-Repair Pass
+Before handoff, scan the generated artifact for:
+1. heading specificity
+2. table readability
+3. citation density
+4. screenshot truthfulness
+5. local path leakage
+6. placeholder remnants
+7. mobile scanability
+8. reviewer-visible evidence
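Sketch (not part of the package): two of the self-repair checks above, local path leakage and placeholder remnants, are mechanical enough to automate with regular expressions. The placeholder word list and the function name are assumptions; the other six checks need human or model judgment and are not attempted here.

```python
import re

# Absolute macOS-style home paths, as named in the P0 gate.
LOCAL_PATH = re.compile(r"/Users/[^\s\"'<>]+")
# Assumption: a small illustrative list of placeholder markers.
PLACEHOLDER = re.compile(r"\b(?:TODO|TBD|lorem ipsum|placeholder)\b", re.IGNORECASE)


def scan_artifact(text: str) -> dict:
    """Return the matches for the two mechanically checkable items."""
    return {
        "local_path_leakage": LOCAL_PATH.findall(text),
        "placeholder_remnants": PLACEHOLDER.findall(text),
    }


html = '<img src="/Users/alice/report/shot.png"> <h1>TODO title</h1>'
findings = scan_artifact(html)
```

An artifact passes these two checks when both lists come back empty.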

package/bundled-skills/yao-meta-skill/references/packaging-contracts.md ADDED

@@ -0,0 +1,70 @@
+# Packaging Contracts
+`cross_packager.py` is not just an export helper. It validates platform contracts and embeds target compiler output from `compile_skill.py`.
+## Current Targets
+- `openai`
+- `claude`
+- `generic`
+## Contract Shape
+Each target contract defines:
+- required output fields
+- required output files
+- field mapping from the neutral source metadata
+- compiled contract from Skill IR
+- target transform metadata, including generated files and unsupported features
+- portable execution metadata
+- trust-boundary metadata
+- permission contract metadata from the trust report
+- target-specific permission representation and reviewer notes
+- target-native behavior contract for native surface, activation policy, resource strategy, script strategy, permission enforcement, install scope, review artifacts, and fallback behavior
+- degradation strategy metadata
+## Failure Handling
+When `--expectations` is provided:
+- missing required files cause exit code `2`
+- missing required fields cause exit code `2`
+- validation failures are emitted in the JSON report
+After packaging, run `scripts/probe_runtime_permissions.py` against the generated package directory. Packaging creates the permission metadata; the runtime permission probe verifies that each target adapter exposes the contract, target-specific representation, native-enforcement flag, operator note, and residual metadata-fallback risk.
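Sketch (not part of the package): the `--expectations` failure rules above, reduced to a pure function. The expectations shape (`required_files`, `required_fields`), the report keys, and the success exit code `0` are assumptions; only exit code `2` and the JSON-report behavior come from the text.

```python
def check_expectations(package_files: set, adapter: dict, expectations: dict):
    """Return (exit_code, report) for a packaged target."""
    failures = []
    for path in expectations.get("required_files", []):
        if path not in package_files:
            failures.append({"kind": "missing_file", "name": path})
    for field in expectations.get("required_fields", []):
        if field not in adapter:
            failures.append({"kind": "missing_field", "name": field})
    # Validation failures travel in the report; the exit code signals them.
    report = {"ok": not failures, "failures": failures}
    return (2 if failures else 0), report
```

Collecting every failure before exiting, rather than stopping at the first, lets a reviewer see the full set of gaps in one run.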
+## Source Of Truth
+The platform-neutral semantic source is Skill IR when it exists:
+- `reports/skill-ir.json`
+- `skill-ir/examples/<skill-name>.json`
+The structural validation sources remain:
+- `SKILL.md`
+- `agents/interface.yaml`
+Target-specific metadata is generated through `scripts/compile_skill.py` and
+then embedded at packaging time. The adapter must carry `compiler`,
+`compiled_contract`, `permission_contract`, `target_permission_contract`,
+`target_native_contract`, `target_transform`, `ir_source`, `ir_schema_version`,
+`job_to_be_done`, `semantic_contract`, and `semantic_parity` so reviewers can
+see whether the target preserved the core skill meaning or fell back to
+frontmatter-only metadata.
+## Portability Model
+The packaging layer now preserves these portable semantics from the neutral source:
+- activation
+- execution
+- trust
+- permissions
+- degradation
+- platform-neutral skill meaning from Skill IR
+- target-specific native behavior notes for activation, resources, scripts, permission enforcement, install scope, review artifacts, and fallback behavior
+- target-specific compile notes for generated files, adapter mode, preserved semantics, and unsupported features
+This means portability is not just "can it export a file?" but also "does the exported target preserve the source package's activation and safety assumptions?"

package/bundled-skills/yao-meta-skill/references/pattern-extraction-doctrine.md ADDED

@@ -0,0 +1,76 @@
+# Pattern Extraction Doctrine
+Use this doctrine when a skill borrows ideas from GitHub repositories, products, papers, experts, or user-supplied references. The goal is to extract durable patterns, not copy surface style.
+## Principle
+A borrowed pattern must improve the current skill's reliability, clarity, or portability faster than it increases context cost.
+## Four-Gate Pattern Test
+Accept a pattern only when it passes enough of these gates for the skill's risk tier.
+### 1. Recurrence
+The pattern appears in more than one serious example, source, workflow, or usage scenario.
+Use it to reject one-off tricks that look impressive but have no durable signal.
+### 2. Generativity
+The pattern can guide a new case, not just explain the original example.
+Use it to prefer operating principles, decision rules, and workflow loops over anecdotes.
+### 3. Distinctiveness
+The pattern is more specific than generic good advice.
+Use it to reject empty claims such as "be clear", "be useful", or "add quality" unless the reference shows how.
+### 4. Boundary
+The pattern has a known limit: when not to apply it, what not to borrow, or what cost it introduces.
+Use it to prevent reference scans from becoming unbounded feature expansion.
+## Acceptance Rule
+- `Scaffold`: accept a pattern when it has generativity and boundary clarity.
+- `Production`: require recurrence, generativity, and boundary clarity.
+- `Library`: require recurrence, generativity, distinctiveness, and boundary clarity.
+- `Governed`: require all four gates plus reviewer-visible evidence.
+When the evidence is not strong enough, move the pattern into an iteration candidate instead of the first package.
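Sketch (not part of the package): the acceptance rule above as a lookup table plus one check. The gate identifiers, the function name, and the two return values are hypothetical labels for illustration.

```python
ALL_GATES = {"recurrence", "generativity", "distinctiveness", "boundary"}

REQUIRED_GATES = {
    "Scaffold": {"generativity", "boundary"},
    "Production": {"recurrence", "generativity", "boundary"},
    "Library": ALL_GATES,
    "Governed": ALL_GATES,  # plus reviewer-visible evidence, checked below
}


def accept_pattern(mode: str, passed_gates, has_reviewer_evidence: bool = False) -> str:
    """Return "accept", or "iteration_candidate" when evidence is too weak."""
    missing = REQUIRED_GATES[mode] - set(passed_gates)
    if mode == "Governed" and not has_reviewer_evidence:
        return "iteration_candidate"
    return "iteration_candidate" if missing else "accept"
```

A pattern that passes only generativity and boundary is accepted for `Scaffold` but deferred for `Production`, which additionally requires recurrence.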
+## What To Borrow
+Borrow:
+- high-signal workflow loops
+- evidence-backed quality gates
+- repeatable review checkpoints
+- crisp output shapes
+- boundary language that prevents route confusion
+- portability patterns that preserve meaning across environments
+## What Not To Borrow
+Do not borrow:
+- source branding
+- long prose
+- roleplay style that does not match the target skill
+- heavy research workflows for low-risk scaffolds
+- platform-specific assumptions hidden inside a general method
+- impressive examples that cannot be verified or generalized
+## Reviewer Questions
+Before accepting a borrowed pattern, ask:
+- Where else does this pattern appear?
+- What new case can it help solve?
+- What makes it more specific than generic advice?
+- When should this skill refuse to use it?
+- What file, report, eval, or checklist proves it earned its weight?

package/bundled-skills/yao-meta-skill/references/platform-capability-matrix.md ADDED

@@ -0,0 +1,49 @@
+# Platform Capability Matrix
+This matrix describes the current packaging targets and their support level.
+| Target | Metadata Adapter | Compiler Contract | Native Behavior Contract | Output Contract | Snapshot Test | Portability Semantics | Notes |
+| --- | --- | --- | --- | --- | --- | --- | --- |
+| `openai` | Yes | Yes | Yes | Yes | Yes | activation, execution, trust, permissions, degradation, native behavior | Generates `targets/openai/agents/openai.yaml` |
+| `claude` | Yes | Yes | Yes | Yes | Yes | activation, execution, trust, permissions, degradation, native behavior | Generates `targets/claude/README.md` plus adapter metadata |
+| `generic` | Yes | Yes | Yes | Yes | Yes | activation, execution, trust, permissions, degradation, native behavior | Uses neutral adapter metadata only |
+| `agent-skills-compatible` | Neutral source | Yes | Yes | Source-compatible | Yes | activation, execution, trust, permissions, degradation, native behavior | Keeps canonical `SKILL.md` plus `agents/interface.yaml` source shape |
+| `vscode` | Yes | Yes | Yes | Yes | Yes | activation, execution, trust, permissions, degradation, native behavior, install scope | Generates `targets/vscode/README.md` plus adapter metadata for VS Code / Copilot Agent Skills review |
+## Current Support Model
+- `openai`: strongest metadata adapter support with an explicit compiler contract.
+- `claude`: lightweight compatibility adapter with an explicit compiler contract and fallback notes.
+- `generic`: lowest-friction export for neutral Agent Skills consumers.
+- `agent-skills-compatible`: canonical source shape with compiler evidence for review and distribution.
+- `vscode`: VS Code / Copilot Agent Skills adapter that preserves the neutral source package and documents user/project scope plus workspace-trust review notes.
+- runtime permission probes currently report metadata fallback for generated targets; no target is claimed as native-enforced until a client or installer integration can actually enforce the permission model.
+## Portable Semantics
+Each target now preserves:
+- activation mode and optional path filters
+- execution context and shell choice
+- trust tier and remote inline-execution policy
+- permission contract for network, file-write, subprocess, and interactive script surfaces
+- target-native behavior contract for native surface, activation policy, resource strategy, script strategy, permission enforcement, install scope, review artifacts, and fallback behavior
+- degradation strategy for unsupported client behavior
+- generated-file mapping and adapter mode from `reports/compiled_targets.json`
+## Explicit Non-Goals
+This project does not yet implement:
+- client SDK integration
+- provider-specific execution logic
+- provider-native installer actions or account-level activation changes
+- native runtime permission enforcement
+## Degradation Rule
+If a target cannot support a source feature directly:
+1. preserve the neutral source package
+2. emit a minimal adapter
+3. document the fallback in the target output

package/bundled-skills/yao-meta-skill/references/prompt-engineering-doctrine.md ADDED

@@ -0,0 +1,76 @@
+# Prompt Engineering Doctrine
+Use this doctrine when a skill creates, improves, audits, or relies on prompts, role instructions, conversation scripts, writing systems, teaching guides, analysis instructions, or reusable task templates.
+## Principle
+Prompt quality is a skill-design input, not a long prompt to paste into `SKILL.md`.
+The useful abstraction is not a fixed RTF template. The useful abstraction is a compact reasoning layer:
+- understand the real need behind the request
+- choose the right task type and complexity
+- map role, task, and format into skill structure
+- score the prompt-facing behavior before the skill is treated as reusable
+## Need Model
+Before writing a prompt-heavy skill, identify:
+- explicit need: what the user clearly asked for
+- implicit need: what the context suggests but the user did not name
+- scenario: where and how the output will be used
+- user level: beginner, practitioner, expert, reviewer, or operator
+- success standard: what proves the output worked
+If any of these change the package boundary, ask one focused clarification. If they only affect implementation detail, record the assumption in a report instead of interrupting the user.
+## Task Families
+- creative generation: content, ideas, campaigns, variants, concepts
+- analytical reasoning: diagnosis, comparison, synthesis, decision support
+- execution operation: workflow steps, task completion, standardized operations
+- teaching guidance: explanation, curriculum, walkthrough, coaching
+- dialogue interaction: support, interview, roleplay, discovery, coaching
+- prompt engineering: prompt creation, prompt improvement, prompt review, prompt libraries
+## Complexity
+- simple: one output, few constraints, low ambiguity
+- medium: multiple steps, some judgment, moderate standards
+- complex: multiple inputs, tradeoffs, high-quality output expectations
+- expert: domain expertise, evaluation, governance, or safety-sensitive use
+Complexity should control how much structure is added. It should not justify bloating the entrypoint.
+## RTF To Skill Mapping
+| Prompt Layer | Skill Layer | Reviewer Question |
+| --- | --- | --- |
+| Role | operating stance, expertise, tone | Does the agent identity match the job and user level? |
+| Task | workflow, gates, scripts, references | Are the steps executable and verifiable? |
+| Format | output contract, examples, reports | Is the hand-back useful, readable, and testable? |
+## Quality Matrix
+Score prompt-facing behavior on:
+- completeness: enough context, constraints, and outputs are specified
+- clarity: wording is unambiguous and easy to execute
+- consistency: role, task, format, examples, and boundaries agree
+- practicality: the output can be used without hidden assumptions
+- specificity: language fits the user's domain instead of generic prompt jargon
+Treat innovation as optional. A reusable skill should first be clear, reliable, and specific.
+## Anti-Patterns
+- copying a full meta-prompt into `SKILL.md`
+- adding an elaborate persona when the workflow only needs a narrow capability
+- asking the user for every possible field instead of the few fields that change design
+- producing a polished prompt that lacks tests, examples, or output checks
+- using RTF labels as decoration without tying them to skill behavior
+## Reviewer Rule
+For prompt-heavy skills, reviewers should see the need model, task family, complexity, RTF-to-skill mapping, and quality matrix. If those are absent, the package may still run but its prompt behavior is not governed.

package/bundled-skills/yao-meta-skill/references/qa-ladder.md ADDED

@@ -0,0 +1,57 @@
+# QA Ladder
+Use the smallest quality gate set that still protects the user from likely failure.
+## Basic
+Use when:
+- the skill is disposable or exploratory
+- the route is obvious
+- there is little downside to imperfect output
+Recommended checks:
+- structure sanity
+- naming alignment
+- a quick read for boundary clarity
+## Standard
+Use when:
+- the skill will be reused
+- near-neighbor prompts are plausible
+- references or scripts could drift from the main instructions
+Recommended checks:
+- `validate_skill.py`
+- `resource_boundary_check.py`
+- a small trigger prompt set
+- one description optimization pass when route wording is still unstable
+- one realistic output example
+## Advanced
+Use when:
+- the skill is shared infrastructure
+- packaging or routing errors would be costly
+- you want evidence that the skill stays healthy over time
+Recommended checks:
+- description optimization suite with dev and holdout cases
+- family-based trigger regression
+- failure and anti-pattern regression
+- governance scoring
+- packaging contract validation
+- regression history and result reporting
+## Escalation Heuristics
+- add trigger eval before writing more instruction detail
+- add boundary checks before adding more folders
+- add governance and history once the skill becomes a maintained asset
+- do not add advanced checks only for optics