npm - @skill-graph/cli - Versions diffs - 0.5.7 → 0.5.8 - Mend

@skill-graph/cli 0.5.7 → 0.5.8

Files changed (83)

package/CHANGELOG.md +27 -3
package/README.md +40 -14
package/SKILL_GRAPH.md +2 -2
package/bin/skill-graph.js +118 -2
package/docs/ADOPTION.md +1 -1
package/docs/PRIMER.md +6 -5
package/docs/QUICKSTART-30MIN.md +2 -2
package/docs/SKILL_AUDIT_CHECKLIST.md +1 -1
package/docs/SKILL_METADATA_PROTOCOL.md +2 -2
package/docs/_archived/marketplace-publication-priority-2026-05-18.md +1 -1
package/docs/_drafts/0.5.8-release-prep.md +164 -0
package/docs/field-reference.generated.md +1 -1
package/docs/field-reference.md +2 -2
package/docs/manifest-field-mapping.md +3 -3
package/docs/marketplace-publication-queue.generated.md +2 -2
package/docs/plans/scripts-roadmap.md +2 -2
package/docs/positioning.md +88 -0
package/docs/research/skill-comprehension-eval-research.md +5 -5
package/docs/research/skill-demand-gap-roadmap-2026-05-16.md +215 -0
package/docs/status.generated.md +48 -0
package/examples/audits/context-graph/findings.md +59 -0
package/examples/audits/context-graph/scorecard.md +22 -0
package/examples/audits/context-graph/verdict.md +33 -0
package/examples/evals/a11y.json +45 -13
package/examples/evals/api-design.json +18 -5
package/examples/evals/code-review.json +18 -5
package/examples/evals/data-modeling.json +18 -5
package/examples/evals/database-migration.json +18 -5
package/examples/evals/debugging.json +37 -11
package/examples/evals/dependency-architecture.json +18 -5
package/examples/evals/design-system-architecture.json +18 -5
package/examples/evals/error-tracking.json +18 -5
package/examples/evals/event-contract-design.json +18 -5
package/examples/evals/form-ux-architecture.json +18 -5
package/examples/evals/framework-fit-analysis.json +18 -5
package/examples/evals/graph-audit.json +55 -13
package/examples/evals/information-architecture.json +18 -5
package/examples/evals/interaction-feedback.json +18 -5
package/examples/evals/interaction-patterns.json +18 -5
package/examples/evals/layout-composition.json +18 -5
package/examples/evals/lint-overlay.json +38 -11
package/examples/evals/microcopy.json +18 -5
package/examples/evals/observability-modeling.json +18 -5
package/examples/evals/pattern-recognition.json +32 -9
package/examples/evals/performance-engineering.json +18 -5
package/examples/evals/refactor.json +41 -12
package/examples/evals/semiotics.json +18 -5
package/examples/evals/skill-infrastructure.json +32 -9
package/examples/evals/skill-router.json +42 -13
package/examples/evals/system-interface-contracts.json +18 -5
package/examples/evals/task-analysis.json +18 -5
package/examples/evals/testing-strategy.json +36 -11
package/examples/evals/type-safety.json +251 -66
package/examples/evals/visual-design-foundations.json +18 -5
package/examples/evals/webhook-integration.json +18 -5
package/examples/fixture-skills/README.md +47 -0
package/examples/fixture-skills/comprehension-full/SKILL.md +79 -0
package/examples/fixture-skills/minimal-capability/SKILL.md +51 -0
package/examples/fixture-skills/with-grounding/SKILL.md +78 -0
package/examples/fixture-skills/with-relations/SKILL.md +87 -0
package/examples/skills.manifest.sample.json +1722 -446
package/marketplace/README.md +1 -1
package/marketplace/skills/a11y/SKILL.md +1 -1
package/marketplace/skills/best-practice/SKILL.md +211 -0
package/marketplace/skills/context-graph/SKILL.md +1 -1
package/marketplace/skills/debugging/SKILL.md +1 -1
package/marketplace/skills/graph-audit/SKILL.md +3 -1
package/marketplace/skills/postgres-rls/SKILL.md +284 -0
package/marketplace/skills/refactor/SKILL.md +1 -1
package/marketplace/skills/skill-infrastructure/SKILL.md +2 -0
package/marketplace/skills/skill-router/SKILL.md +3 -1
package/marketplace/skills/testing-strategy/SKILL.md +1 -1
package/package.json +3 -1
package/scripts/__tests__/test-marketplace-export.js +6 -2
package/scripts/__tests__/test-v3-1-alias-contract.js +3 -3
package/scripts/build-status-doc.js +177 -0
package/scripts/check-doc-drift.js +224 -0
package/scripts/check-markdown-links.js +34 -4
package/scripts/check-mirror-freeze.js +270 -0
package/scripts/export-marketplace-skills.js +35 -6
package/scripts/lib/audit-prompt-builder.js +3 -3
package/scripts/lib/parse-frontmatter.js +2 -2
package/scripts/skill-audit.js +7 -9

package/docs/_drafts/0.5.8-release-prep.md ADDED

@@ -0,0 +1,164 @@
+# 0.5.8 Release Prep — Review Before Commit/Publish
+> **Status:** DRAFT — not yet applied to `package.json` or `CHANGELOG.md`.
+> **Prepared:** 2026-05-19 by Karpathy-loop Phase 2 Item 34.
+> **Action required:** review the proposed CHANGELOG entry and version bump, then either apply directly via the commands at the bottom or hand-edit.
+## Proposed version bump
+`package.json`:
+```diff
+-  "version": "0.5.7",
++  "version": "0.5.8",
+```
+## Proposed CHANGELOG entry
+Insert as a new section between the existing `## [Unreleased]` header and the next `## [0.5.0]` section (the existing `[Unreleased]` block, currently containing pre-Phase-2 entries, should be **promoted** to the body of `## [0.5.8]`; a fresh empty `## [Unreleased]` header replaces it):
+```markdown
+## [Unreleased]
+## [0.5.8] — 2026-05-19
+The "Karpathy-loop Phase 2" release. Ships the deterministic-drift sentinels,
+the mirror-freeze linter, the generated status doc, the `doctor` subcommand,
+the first hermetic test fixture, and the positioning + onboarding rework that
+Phase 2 specified. All Phase 1 audit gates remained green throughout; the
+release also includes the 33 Phase 1 truth-repair commits.
+### Added
+- **`scripts/check-doc-drift.js`** — schema_version drift sentinel that
+  scans active docs for stale `schema_version: N` references and refuses
+  to teach an old contract as current. Reads the active version from
+  `schemas/skill.schema.json`; allowlists `_archived/`, `docs/migrations/`,
+  `examples/`, `CHANGELOG.md`, and `vN → vM` migration sections. Flags:
+  `--json`, `--quiet`, `--include-warn`.
+- **`scripts/check-mirror-freeze.js`** — fails on active-source / active-
+  package claims in the two docs-only deprecation mirrors
+  (`skill-metadata-protocol`, `skill-audit-loop`). Detects `@skill-graph/
+  protocol`, `@skill-graph/audit`, `src/`, `lib/schemas/`, `schemas/skill.v*.
+  schema.json`, and `evals/` references that contradict the post-ADR-0009
+  docs-only status. File-level banner detection skips files explicitly
+  stamped as historical / frozen / deprecation-mirror snapshots.
+- **`scripts/build-status-doc.js` + `docs/status.generated.md`** — single-
+  source-of-truth status snapshot that pulls live values from `package.json`,
+  the schema, the manifest, and the four deterministic check scripts.
+  Replaces hand-maintained "Latest release" lines and ad-hoc skill-count
+  claims. Flags: `--check`, `--stdout`, `--no-checks`.
+- **`skill-graph doctor`** — new CLI subcommand grouping all six deterministic
+  checks (links, protocol, drift, mirror-freeze, lint, manifest) into one
+  pass with a single summary table. Recommended first command for bug
+  reports. Flags: `--json`, `--bail`, `--skip <name>`.
+- **`examples/fixture-skills/`** — hermetic test fixtures for `@skill-graph/cli`
+  package tests. Ships `minimal-capability` (bare-minimum v6 frontmatter,
+  passes lint with zero errors). Three more planned (`with-grounding`,
+  `with-relations`, `comprehension-full`) are sketched in the directory
+  README; each will land as a separate commit when the sub-field shapes
+  are aligned with the lint's expectations.
+- **`docs/positioning.md`** — explicit positioning of Skill Graph in the
+  agent-skills ecosystem. Names what Skill Graph is **not** (a runtime,
+  a marketplace, a tool platform) before naming what it **is** (an
+  authoring + audit-time contract with a Karpathy keep-or-revert loop).
+  Compares against Anthropic Skills, the Agent Skills spec, MCP, A2A,
+  Smithery, Composio, and AGENTS.md. Includes the "60-second pitch",
+  the "what this is not" matrix, the "when to reach for Skill Graph"
+  gate, and the layered ecosystem map.
+- **README hero rework** — three new sections above the ecosystem
+  diagram: "Is this for me?" (yes-if / no-if qualifying signals for a
+  30-second visitor), "What makes this different" (Karpathy keep-or-revert
+  loop applied to skill libraries, with explicit citation), and a
+  surfacing of `docs/quality-doctrine.md` as the codified quality bar.
+- **README two-onboarding-paths framing** — explicit routing of authors
+  to `docs/QUICKSTART-30MIN.md` for hands-on first-skill authoring and
+  to `docs/PRIMER.md` for the conceptual model first. Synthesis Item 48
+  resolved as "link, do not merge" because they serve different reader
+  genres.
+### Changed
+- **`scripts/check-markdown-links.js`** — broken links in `_archived/`
+  paths now emit warnings instead of failing the build. Active docs
+  remain strict. New `--strict-archived` flag elevates archived warnings
+  to errors (useful when migrating an archive batch). Synthesis Item 49.
+- **README "External Context"** — replaced 5-bullet list of tangential
+  references with a "Where Skill Graph fits" section that links directly
+  to `docs/positioning.md` and surfaces the unique angle in one line.
+- **`bin/skill-graph.js`** — `evolve` subcommand labeled `[PREVIEW ·
+  monorepo-only]` in both the README quickstart and the CLI `--help`.
+  Aligns the discoverability surface with the dispatcher's existing
+  not-standalone-compatible error message.
+### Documented
+- **`docs/_archived/marketplace-publication-priority-2026-05-18.md`** —
+  the archived plan's broken `./adr/...` link was fixed to `../adr/...`
+  (Phase 1 Item 1).
+- **Mirror governance** — five mirror docs banner-stamped as frozen
+  deprecation snapshots: `skill-metadata-protocol/docs/{concept-map,
+  field-reference, field-reference.generated, skill-metadata-protocol}.md`
+  and `skill-metadata-protocol/CONTRIBUTING.md` + `skill-audit-loop/{
+  CONTRIBUTING, SKILL_AUDIT_LOOP, SKILL_AUDIT_CHECKLIST}.md` (Phase 1
+  Items 30, 25 + Phase 2 Item 36 follow-ups).
+- **`docs/plans/skill-graph-oss-docs-refresh.md`** (lives in the
+  Development workspace, not this repo) — banner-stamped as superseded
+  by ADR 0009 (Phase 2 Item 41).
+### Carried forward from earlier [Unreleased]
+[Move the existing pre-Phase-2 [Unreleased] entries here — i.e. the
+publication-classification ledger, the publish.yml workflow, the
+stability promotion lint check, the marketplace publication priority
+doc removal, the cross-repo version reconciliation, the schema_version 4/5/6
+changes, and the skill audit loop reframing.]
+```
+## Apply the diff (when ready)
+```bash
+cd /Users/jacobbalslev/Development/skill-graph
+# 1. Bump version
+sed -i '' 's/"version": "0.5.7"/"version": "0.5.8"/' package.json
+# 2. Hand-edit CHANGELOG.md per the proposed entry above. The mechanical
+#    steps are:
+#      - Move existing [Unreleased] body to "## [0.5.8] — 2026-05-19" with
+#        a "Carried forward from earlier [Unreleased]" sub-section.
+#      - Add the new Phase 2 entries from this draft.
+#      - Insert a fresh empty "## [Unreleased]" header above [0.5.8].
+# 3. Regenerate status doc so it reflects the new version
+node scripts/build-status-doc.js
+# 4. Verify everything's clean
+node bin/skill-graph.js doctor
+# 5. Commit with --only
+git commit --only -m "chore(release): cut 0.5.8 — Karpathy-loop Phase 2
+Promotes [Unreleased] to [0.5.8] and adds Phase 2 entries:
+drift sentinel, mirror-freeze linter, status doc, doctor subcommand,
+fixture-skills, positioning doc, README hero rework, evolve PREVIEW
+labeling, and the _archived/ link-check policy.
+Karpathy-loop Phase 2 item 34/55." -- package.json CHANGELOG.md docs/status.generated.md
+# 6. Tag + publish (only after committing and pushing)
+git tag v0.5.8
+git push origin main --tags
+# CI workflow .github/workflows/publish.yml publishes on tag push.
+```
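Step 4 above relies on `skill-graph doctor`, which the proposed CHANGELOG entry describes as one pass over the named checks with `--bail` and `--skip <name>` flags. A minimal sketch of that aggregation follows. The `runDoctor` function, the `{ name, run }` check shape, and the row format are hypothetical illustrations, not the code in `bin/skill-graph.js`.

```javascript
// Hypothetical sketch of a doctor-style aggregator; not the shipped CLI code.
// Each check is assumed to be { name, run } where run() returns true on success.
function runDoctor(checks, { bail = false, skip = [] } = {}) {
  const rows = [];
  for (const check of checks) {
    if (skip.includes(check.name)) {
      rows.push({ name: check.name, status: 'skipped' });
      continue;
    }
    let passed;
    try {
      passed = check.run() === true;
    } catch (error) {
      passed = false; // a throwing check counts as a failure
    }
    rows.push({ name: check.name, status: passed ? 'pass' : 'fail' });
    if (!passed && bail) break; // stop at the first failure when bail is set
  }
  return { ok: rows.every((row) => row.status !== 'fail'), rows };
}

// Usage with stand-in checks (the real checks shell out to the scripts).
const summary = runDoctor(
  [
    { name: 'links', run: () => true },
    { name: 'drift', run: () => true },
  ],
  { skip: ['links'] }
);
console.table(summary.rows);
```

Returning rows instead of printing inside the loop keeps a JSON output mode and a table output mode on the same data.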
+## Open questions for the user
+1. Should I keep all pre-Phase-2 [Unreleased] entries inside [0.5.8] (the
+   "kitchen-sink" choice), or split them across separate intermediate versions
+   (e.g. 0.5.6 / 0.5.7 / 0.5.8) to chronicle when they actually shipped?
+   The package.json version has already bumped 0.5.0 → 0.5.7 without
+   corresponding CHANGELOG entries, so there's accumulated drift.
+2. Confirm the release tag will trigger `.github/workflows/publish.yml`.
+   First-publish prereqs (npm org + `NPM_TOKEN` secret) are tracked in the
+   existing `## [Unreleased]` SH-6111 entry.
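The drift sentinel in the proposed CHANGELOG entry scans active docs for stale `schema_version: N` references and skips allowlisted paths. The sketch below shows one way such a scan could work, assuming the active version is already known. `findStaleSchemaVersions`, the allowlist patterns, and the finding shape are hypothetical, and the `vN → vM` migration-section exemption is left out.

```javascript
// Hypothetical sketch of a schema_version drift scan; not scripts/check-doc-drift.js.
// Allowlist entries mirror the paths named in the CHANGELOG entry (patterns are assumptions).
const ALLOWLISTED = [
  /(^|\/)_archived\//,
  /^docs\/migrations\//,
  /^examples\//,
  /^CHANGELOG\.md$/,
];

function isAllowlisted(relPath) {
  return ALLOWLISTED.some((pattern) => pattern.test(relPath));
}

// Returns one finding per line that names a schema_version other than the active one.
function findStaleSchemaVersions(relPath, text, activeVersion) {
  if (isAllowlisted(relPath)) return [];
  const findings = [];
  text.split('\n').forEach((line, index) => {
    const match = /schema_version:\s*(\d+)/.exec(line);
    if (match && Number(match[1]) !== activeVersion) {
      findings.push({ file: relPath, line: index + 1, found: Number(match[1]) });
    }
  });
  return findings;
}

// Usage: a doc that still teaches version 4 while 6 is active.
console.log(findStaleSchemaVersions('docs/field-reference.md', 'schema_version: 4', 6));
```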

package/docs/field-reference.generated.md CHANGED

@@ -1,6 +1,6 @@
 # Skill Graph Field Reference (Generated)
-> **Generated from** `schemas/skill.schema.json` on 2026-05-18 by `scripts/build-field-reference.js`.
+> **Generated from** `schemas/skill.schema.json` on 2026-05-19 by `scripts/build-field-reference.js`.
 > **Do not edit by hand.** The canonical prose reference is [`docs/field-reference.md`](field-reference.md).
 > **Predicate glossary:** [`docs/glossary.md`](glossary.md).
 > **JSON-LD @context:** [`schemas/skill.context.jsonld`](../schemas/skill.context.jsonld).

package/docs/field-reference.md CHANGED

@@ -13,7 +13,7 @@ The field reference is split across three coordinated documents. Use whichever f
 | Doc | Genre | When to read |
 |---|---|---|
 | [`field-reference.md`](field-reference.md) (this doc) | **Hand-curated prose reference.** Field-by-field, with worked examples, lint notes, and cross-cutting guidance. | When authoring or reviewing a SKILL.md and you want examples and "when to use" rules alongside the schema-canonical definition. |
-| [`field-reference.generated.md`](field-reference.generated.md) | **Auto-generated index.** Built from `schemas/skill.v4.schema.json` description strings by `scripts/build-field-reference.js`. Drift-free against the schema. | When you want the machine-guaranteed list of every field, every type, every pattern, every enum value. The fastest way to verify what the schema actually accepts today. |
+| [`field-reference.generated.md`](field-reference.generated.md) | **Auto-generated index.** Built from `schemas/skill.v6.schema.json` description strings by `scripts/build-field-reference.js`. Drift-free against the schema. | When you want the machine-guaranteed list of every field, every type, every pattern, every enum value. The fastest way to verify what the schema actually accepts today. |
 | [`field-rationale.md`](field-rationale.md) | **Hand-authored "why this field" rationale.** Covers the ~10 fields whose meaning is non-obvious from the schema description (`scope`, `eval_artifacts`, `eval_state`, `routing_eval`, `relations.depends_on`, `relations.verify_with`, `relations.broader`, `grounding.evidence_priority`, `lifecycle.review_cadence`, `portability.readiness`). | When you understand *what* a field stores but want to know *why the field exists at all* and *what the common confusion looks like*. |
 The schema is the single source of truth for shape; this doc is the source of truth for prose; `field-rationale.md` is the source of truth for design intent. Lint check C7 (in `scripts/check-protocol-consistency.js`) verifies the generated index stays in sync with the schema description strings — running `node scripts/build-field-reference.js --check` against the live schema must succeed before commit.
@@ -34,7 +34,7 @@ The schema is the single source of truth for shape; this doc is the source of tr
 **Example.**
 ```yaml
-schema_version: 4
+schema_version: 6
 ```
 **When to use.** Always — this is a required field.
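The context lines above mention that `node scripts/build-field-reference.js --check` must succeed before commit. A check mode of this kind usually regenerates the document in memory and compares it with the file on disk. The sketch below illustrates that idea under one loud assumption: that the dated `Generated from` line is ignored, so a date-only change does not count as drift. `checkGenerated` and `stripVolatile` are hypothetical names, not functions from the package.

```javascript
// Hypothetical sketch of a "--check" comparison for a generated doc.
// Assumption: the dated "Generated from" banner line is excluded from the comparison.
function stripVolatile(text) {
  return text
    .split('\n')
    .filter((line) => !/^> \*\*Generated from\*\*/.test(line))
    .join('\n');
}

// Compares the on-disk doc with a freshly generated one.
function checkGenerated(onDisk, regenerated) {
  return stripVolatile(onDisk) === stripVolatile(regenerated)
    ? { ok: true }
    : { ok: false, reason: 'generated doc is out of sync with its source' };
}

// Usage: a caller would exit non-zero when ok is false.
const outcome = checkGenerated('# Doc\nfield: a', '# Doc\nfield: a\nfield: b');
if (!outcome.ok) console.error(outcome.reason);
```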

package/docs/manifest-field-mapping.md CHANGED

@@ -148,8 +148,8 @@ Three versions coexist in a manifest ecosystem:
 | Version | Lives in | Meaning |
 |---|---|---|
 | Authored skill `version` | Per-skill frontmatter `version` field | Version of the skill's content (e.g. `1.2.0` means the skill has been iterated twice since its initial publish). |
-| Authored schema version | Per-skill frontmatter `schema_version` field | Version of the `skill.schema.json` contract the skill was authored against. Currently `4` after the v4 naming cleanup. |
-| Manifest schema version | Manifest root `schema_version` field | Version of the `manifest.schema.json` contract the manifest was generated against. Currently `4`. |
+| Authored skill schema version | Per-skill frontmatter `schema_version` field | Version of the `skill.schema.json` contract the skill was authored against. The active value is `6` (skill schema, advanced through v5 to v6 marketplace fields). |
+| Manifest schema version | Manifest root `schema_version` field | Version of the `manifest.schema.json` contract the manifest was generated against. The active value is `4` — manifest contract shape has not changed since v4 even though skill schema has advanced. |
 ### When to bump `schema_version`
@@ -302,7 +302,7 @@ The `skill-metadata-template` starter (`examples/skill-metadata-template.md`) is
 ```yaml
 ---
-schema_version: 4
+schema_version: 6
 name: skill-metadata-template
 description: "Authoring template for new Skill Metadata Protocol skills. ..."
 version: 1.0.0

package/docs/marketplace-publication-queue.generated.md CHANGED

@@ -1,6 +1,6 @@
 # Marketplace Publication Queue (generated)
-Generated: 2026-05-18T10:55:00.441Z
+Generated: 2026-05-18T13:56:43.757Z
 Schema: 2.0.0
 Generator: skill-audit-loop@0.3.0
 Source ledger: `skill-graph/data/publication-classification.json`
@@ -14,7 +14,7 @@ Source ledger: `skill-graph/data/publication-classification.json`
 - Publishable candidates: 136
 - Sales-Hub-bound (excluded): 53
 - Personal-infra (excluded): 27
-- Already published (in OSS export): 140
+- Already published (in OSS export): 142
 - Tier counts: S=18 A=29 B=42 C=47
 - Ledger entries total: 216

package/docs/plans/scripts-roadmap.md CHANGED

@@ -77,14 +77,14 @@ Purpose:
 Shipping today. Two modes:
 1. **Stub mode** (default) — runs `scripts/skill-lint.js` and seeds `audits/<skill>/{findings,verdict,scorecard}.md` with lint-derived findings and human-TODO placeholders for the seven qualitative dimensions.
-2. **`--graded` mode** — on top of the stub, composes per-dimension prompts via `scripts/lib/audit-prompt-builder.js` and calls an external model CLI (`--grader-cli`, default `claude -p`) once per dimension. Each response must return a `<verdict>…</verdict>` JSON block matching the fixed schema (dimension / score / verdict / justification / findings[]). The runner parses, validates, and merges these into the artifact files, replacing TODO placeholders with structured PASS / PASS WITH FIXES / FAIL verdicts with evidence quotes.
+2. **`--graded` mode** — on top of the stub, composes per-dimension prompts via `scripts/lib/audit-prompt-builder.js` and calls an explicit external grader CLI (`--grader-cli "<command>"`) once per dimension. Each response must return a `<verdict>…</verdict>` JSON block matching the fixed schema (dimension / score / verdict / justification / findings[]). The runner parses, validates, and merges these into the artifact files, replacing TODO placeholders with structured PASS / PASS WITH FIXES / FAIL verdicts with evidence quotes.
 Related files:
 - `scripts/lib/audit-prompt-builder.js` — dimension registry, context collector, prompt composer, response parser, verdict aggregator
 - `scripts/lib/mock-grader.js` — deterministic canned-response grader for CI smoke-tests and reproducible examples (prints fixed `<verdict>` blocks per dimension)
-Grader CLI discipline: the runner NEVER embeds API keys. Auth is delegated to the external CLI on the host (see `.claude/rules/cli-first.md`). `--grader-cli "claude -p"` and `--grader-cli "codex exec"` both work when the respective CLIs are authenticated.
+Grader CLI discipline: the runner NEVER embeds API keys, never selects a default provider, and never writes model-routing metadata into Skill Graph project artifacts. Auth and provider choice are delegated to the explicit external CLI command supplied by the caller.
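The `--graded` mode above requires each grader response to carry a `<verdict>…</verdict>` JSON block with the keys dimension, score, verdict, justification, and findings. A minimal sketch of extracting and validating such a block follows. `parseVerdict` is a hypothetical helper, not the parser in `scripts/lib/audit-prompt-builder.js`, and the type checks are assumptions beyond the key names and verdict labels quoted in the text.

```javascript
// Hypothetical sketch of parsing one grader response; not the shipped parser.
const VERDICTS = ['PASS', 'PASS WITH FIXES', 'FAIL'];
const REQUIRED_KEYS = ['dimension', 'score', 'verdict', 'justification', 'findings'];

function parseVerdict(response) {
  const match = /<verdict>([\s\S]*?)<\/verdict>/.exec(response);
  if (!match) throw new Error('no <verdict> block in grader response');
  const parsed = JSON.parse(match[1]); // throws on malformed JSON
  for (const key of REQUIRED_KEYS) {
    if (!(key in parsed)) throw new Error(`verdict is missing "${key}"`);
  }
  if (!VERDICTS.includes(parsed.verdict)) {
    throw new Error(`unknown verdict "${parsed.verdict}"`);
  }
  if (!Array.isArray(parsed.findings)) {
    throw new Error('findings must be an array'); // assumption: findings[] is a JSON array
  }
  return parsed;
}

// Usage with a canned response, similar in spirit to a mock grader.
const sample =
  'notes <verdict>{"dimension":"eval","score":4,"verdict":"PASS",' +
  '"justification":"ok","findings":[]}</verdict>';
console.log(parseVerdict(sample).verdict);
```

Rejecting a response outright when the block is missing or malformed keeps a partial answer from being merged into the audit artifacts.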
 ## Suggested Follow-on Scripts

package/docs/positioning.md ADDED

@@ -0,0 +1,88 @@
+# Positioning — Skill Graph in the Agent-Skills Ecosystem
+> **One-line:** Skill Graph is the **authoring contract + audit loop** for agent skills — it sits *above* plain `SKILL.md` files and *beside* protocols like MCP and A2A. It does not host, dispatch, or execute skills.
+## What this answers
+If you came in via the README hero and asked "where does this fit?", this doc names every adjacent project and draws the boundary. The goal is to make the
+unique angle of Skill Graph legible in under 60 seconds.
+## The 60-second pitch
+Most agent-skill projects answer one of these questions:
+1. **How does an agent find and load a skill?** (Anthropic Skills, agent-skills registries)
+2. **How does an agent call a tool?** (MCP)
+3. **How do agents talk to each other?** (A2A)
+4. **Where do skills get packaged and discovered?** (Smithery, marketplaces)
+5. **How does an agent author orchestrate skill calls in production?** (Composio, agent platforms)
+Skill Graph answers a **different** question: **how do you keep a library of skills correct over time?**
+It does this by giving every skill a structured contract (`SKILL.md` frontmatter), a graph of typed edges between skills (`relations.*`), a deterministic audit loop (`audit → improve → evaluate`), and a Karpathy-style keep-or-revert discipline for changes. The contract is portable; the audit loop is the unique mechanism.
+## The "what this is not" matrix
+Skill Graph is not a competitor to MCP, A2A, Anthropic Skills, Smithery, or Composio. They sit at different layers:
+| Project | What it is | What Skill Graph is in that frame |
+|---|---|---|
+| **[Anthropic Skills](https://docs.anthropic.com/en/docs/claude-code/skills)** | A discovery + loading convention: agent-runtime auto-loads `SKILL.md` files from a directory and decides what to invoke. | Skill Graph **extends** the `SKILL.md` shape with structured metadata (relations, grounding, eval health) so a library of 100+ skills stays coherent. A plain `SKILL.md` still loads everywhere; Skill Graph adds validation, drift checks, and graph queries on top. |
+| **[Agent Skills spec](https://agentskills.my/specification)** | A portable `SKILL.md` packaging format with a draft cross-runtime contract. | Skill Graph **emits** Agent-Skills-compatible exports (see [`skill-graph export`](../README.md#quick-start)). The Skill Metadata Protocol adds typed relations and audit fields the base spec does not require, but the export is plain. |
+| **[Model Context Protocol (MCP)](https://modelcontextprotocol.io)** | A *runtime* protocol: how an agent client calls a tool server (function calling, structured tools, resources, prompts). | Skill Graph is **build-time and authoring-time**, not runtime. Skills can describe how an agent should use an MCP server (in their body or `relations.depends_on`), but Skill Graph itself does not implement MCP. |
+| **[Agent-to-Agent (A2A)](https://google.github.io/A2A/latest/specification/)** | A *runtime* protocol: how one agent delegates a task to another agent and exchanges capability cards. | Skill Graph's metadata could serve as the capability descriptor an A2A agent advertises, but Skill Graph does not implement A2A delegation, transport, or task lifecycle. |
+| **[Smithery](https://smithery.ai)** | A *registry*: searchable directory of MCP servers and agent skills with install confidence and compatibility signals. | Skill Graph's marketplace export pipeline (`marketplace/skills/`) is one upstream that *feeds* a registry like Smithery; Skill Graph itself is the authoring source, not the registry. |
+| **[Composio](https://docs.composio.dev)** | An *agent tooling platform*: hosted integrations, auth, and tool execution for production agents. | Skill Graph and Composio operate at different layers entirely. A Composio agent could include Skill-Graph-published skills as part of its task brief; Skill Graph does not host or dispatch. |
+| **[AGENTS.md](https://agents.md)** | A *repository-local* convention for instructing coding agents (Claude, Cursor, Codex, etc.) on how to work in a specific repo. | Skill Graph can **ingest** an `AGENTS.md` as a context source for a codebase-scope skill's `grounding` field, but it does not compete with the convention. Skill Graph is repository-portable; AGENTS.md is repository-local. |
+## The Karpathy axis (what makes Skill Graph itself distinct)
+Most of the above projects are concerned with the *what* and *how* of agent skills. Skill Graph adds an **opinionated discipline** for the *when* — when a skill is changed:
+- **One field, one commit, one keep-or-revert decision** ([Karpathy's autoresearch](https://github.com/karpathy/autoresearch) loop, applied to skill libraries instead of training scripts).
+- Every change has a hard pass/fail gate (a deterministic check script that turns red or green).
+- Failed changes auto-revert. The lesson is recorded; the field's truth is preserved.
+That discipline is the unique angle. The metadata schema is the substrate that makes it possible — without typed fields, you can't have a deterministic gate. The audit loop is the mechanism. The result is a skill library that drifts less, even as it grows.
+## When to reach for Skill Graph
+Reach for it when:
+- You have **more than ~5 skills** and they have started to depend on, verify, or exclude one another.
+- You want **deterministic checks** for skill correctness (schema, paths, eval health) and not just LLM-as-grader.
+- You want **graph queries** over the library — "what skills depend on this one?", "what's the boundary between X and Y?", "which skills verify this one?"
+- You want a **single audit loop** that produces a fingerprint per skill (`audit_verdict`, `eval_score`, `drift_status`) you can ship in the skill's own frontmatter.
+Do NOT reach for it when:
+- You have 1–3 skills and a plain folder is enough.
+- You want a hosted skill marketplace (use Smithery, agentskills.io).
+- You want an agent runtime (use Claude Code, Cursor, Codex, etc.).
+- You want a tool execution platform (use Composio, your agent runtime's tool layer).
+## Where this puts Skill Graph
+In the ecosystem map, Skill Graph sits in the **author-time + audit-time** layer:
+```
+runtime layer            Anthropic Skills | MCP | A2A | Composio | Smithery
+                              ↑   ↑     ↑       ↑          ↑
+                              |   |     |       |          |  (consumes published skills)
+author/audit-time layer       └───┴─────┴───────┴──────────┘
+                                          ↑
+                              ┌───────────┴────────────┐
+                              |   Skill Graph (here)   |
+                              | + Karpathy audit loop  |
+                              └────────────────────────┘
+```
+The runtime layer answers "how does this skill execute?" The author/audit-time layer answers "how does this library stay correct?" Both are needed; both are different products.
+## Related reading
+- [`docs/SKILL_METADATA_PROTOCOL.md`](SKILL_METADATA_PROTOCOL.md) — the contract.
+- [`SKILL_AUDIT_LOOP.md`](SKILL_AUDIT_LOOP.md) — the four operations (audit, improve, evaluate, evolve).
+- [`docs/quality-doctrine.md`](quality-doctrine.md) — what "improve" means in this discipline.
+- [`docs/adr/0009-sibling-repo-deprecation.md`](adr/0009-sibling-repo-deprecation.md) — why the protocol, audit, and CLI live in one repo now.
+- [Karpathy autoresearch](https://github.com/karpathy/autoresearch) — the keep-or-revert loop applied to LLM training scripts that Skill Graph borrows for skill libraries.
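The positioning doc lists graph queries such as "what skills depend on this one?" as a reason to adopt typed `relations.*` edges. A minimal sketch of that reverse-dependency query over in-memory records follows. The record shape, with `relations.depends_on` as an array of skill names, is an assumption for illustration, and `dependentsOf` is a hypothetical helper, not a command the CLI is known to expose.

```javascript
// Hypothetical sketch of a reverse-dependency query; the record shape is assumed.
function dependentsOf(skills, target) {
  return skills
    .filter((skill) => {
      const edges = (skill.relations && skill.relations.depends_on) || [];
      return edges.includes(target);
    })
    .map((skill) => skill.name)
    .sort();
}

// Usage with illustrative records (names borrowed from the marketplace file list).
const library = [
  { name: 'refactor', relations: { depends_on: ['testing-strategy', 'debugging'] } },
  { name: 'a11y', relations: { depends_on: ['testing-strategy'] } },
  { name: 'debugging', relations: {} },
  { name: 'testing-strategy' },
];
console.log(dependentsOf(library, 'testing-strategy'));
```

The query stays a plain filter because the edges are typed fields. With free-form prose in the skill body, the same question would need text search or a model call.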

package/docs/research/skill-comprehension-eval-research.md CHANGED

@@ -53,7 +53,7 @@ The report does not propose a new schema version, does not redesign lint or drif
 ## 2. What the repo currently evaluates
-The Skill Graph's evaluation discipline is described in [`AGENTS.MD` § Evaluation Discipline (lines 143–189)](/Users/jacobbalslev/Development/skill-graph/AGENTS.MD). It names four layers, each with its own definition of "good":
+The Skill Graph's evaluation discipline is described in [`AGENTS.md` § Evaluation Discipline (lines 143–189)](/Users/jacobbalslev/Development/skill-graph/AGENTS.md). It names four layers, each with its own definition of "good":
 | Layer | Question | Surface | Deterministic? |
 |---|---|---|---|
@@ -171,7 +171,7 @@ The result: a skill author who fills the `concept` block does not know what a co
 ### 3.2 No verbatim-copy detector
-A failure mode the protocol's quality doctrine warns against — "evals that paraphrase the skill body back to itself" ([`AGENTS.MD` line 186](/Users/jacobbalslev/Development/skill-graph/AGENTS.MD)) — has no test. The current eval files frequently use prompts like:
+A failure mode the protocol's quality doctrine warns against — "evals that paraphrase the skill body back to itself" ([`AGENTS.md` line 186](/Users/jacobbalslev/Development/skill-graph/AGENTS.md)) — has no test. The current eval files frequently use prompts like:
 > "According to the X skill's Y section, what is the correct primitive…"
 > (e.g. [`examples/evals/a11y.json:8-16`](/Users/jacobbalslev/Development/skill-graph/examples/evals/a11y.json))
@@ -1045,7 +1045,7 @@ The 10 cases cover all 9 rubric dimensions:
 Transfer mix: 6 near-transfer (C1, C3, C4, C7-#1, C8, C9), 4 far-transfer (C2, C5, C6, C7-#2). The schema's heaviest-weighted dimensions (mental_model 1.5, boundary 1.5) both have far-transfer coverage. Misconception has two cases — one near (the Zod question, a domain-internal misconception) and one far (the C++ comparison, a cross-language misconception), exercising the misconception primitive across surface variations.
-The minimum threshold (`AGENTS.MD § Evaluation Discipline`: "≥7 realistic scenarios per skill") is exceeded; this file would also satisfy the existing audit-loop's Eval dimension if the grader is updated to accept the comprehension-extension keys.
+The minimum threshold (`AGENTS.md § Evaluation Discipline`: "≥7 realistic scenarios per skill") is exceeded; this file would also satisfy the existing audit-loop's Eval dimension if the grader is updated to accept the comprehension-extension keys.
 ### 6.4 What this worked example does NOT cover
@@ -1169,7 +1169,7 @@ Update [`SKILL_AUDIT_CHECKLIST.md`](https://github.com/jacob-balslev/skill-audit
 **Effort.** ~half day.
-**Change.** Add a new section to `docs/skill-metadata-protocol.md` and `docs/field-reference.md § concept` explaining the rubric. Cross-reference from `AGENTS.MD § Evaluation Discipline` so the protocol-level discipline is discoverable. Add the new optional eval-file keys (`comprehension_dimension`, `concept_field`, `transfer`, `expected_behaviors`) to `docs/field-reference.md` with the same field-level treatment as existing keys.
+**Change.** Add a new section to `docs/skill-metadata-protocol.md` and `docs/field-reference.md § concept` explaining the rubric. Cross-reference from `AGENTS.md § Evaluation Discipline` so the protocol-level discipline is discoverable. Add the new optional eval-file keys (`comprehension_dimension`, `concept_field`, `transfer`, `expected_behaviors`) to `docs/field-reference.md` with the same field-level treatment as existing keys.
 **Why this is leverage-8.** Without documentation, the rubric is folklore. With documentation in the canonical reference doc, authors discover it during their normal skill-authoring flow.
@@ -1397,7 +1397,7 @@ Recommendation, not decision: defer until R3 ships. Once the grader runs and pro
 This report examined the following **24 repo files** (23 inside `skill-graph/` plus 1 in the parent `Development/skills/` workspace, added in the 2026-05-16 revision):
-1. [`/Users/jacobbalslev/Development/skill-graph/AGENTS.MD`](/Users/jacobbalslev/Development/skill-graph/AGENTS.MD) — full read; especially §§ Evaluation Discipline, Skill Audit Loop, What the Skill Graph Is
+1. [`/Users/jacobbalslev/Development/skill-graph/AGENTS.md`](/Users/jacobbalslev/Development/skill-graph/AGENTS.md) — full read; especially §§ Evaluation Discipline, Skill Audit Loop, What the Skill Graph Is
 2. [`https://github.com/jacob-balslev/skill-metadata-protocol`](https://github.com/jacob-balslev/skill-metadata-protocol) — full read
 3. [`https://github.com/jacob-balslev/skill-audit-loop/blob/main/SKILL_AUDIT_LOOP.md`](https://github.com/jacob-balslev/skill-audit-loop/blob/main/SKILL_AUDIT_LOOP.md) — full read
 4. [`https://github.com/jacob-balslev/skill-audit-loop/blob/main/SKILL_AUDIT_CHECKLIST.md`](https://github.com/jacob-balslev/skill-audit-loop/blob/main/SKILL_AUDIT_CHECKLIST.md) — full read