npm - @minhpnq1807/contextos - Versions diffs - 0.5.51 → 0.5.53 - Mend

@minhpnq1807/contextos 0.5.51 → 0.5.53

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

package/CHANGELOG.md CHANGED

@@ -1,7 +1,22 @@
 # Changelog
-## 0.5.51
+## 0.5.53
+- **Optional adapter positioning:** Clarified that ContextOS core works standalone and that `code-review-graph`, `codegraph`, and `agent-memory` are optional adapters. Skill Router scoring now exposes separate `importGraphScore`, `externalGraphScore`, and `memoryScore` fields so missing adapters degrade to zero score instead of becoming install/runtime requirements.
+- **Adapter-aware benchmark update:** Updated the Skill Router formula to reserve explicit weights for local import graph, optional external graph, and optional memory adapters. The 52-case internal benchmark now reports Top-1 Accuracy 94.2%, Top-3 Recall 94.2%, False Positive Rate 0.0%, Confidence Calibration 100.0%, and Negative Gate Accuracy 100.0%.
+- **Release safety docs:** Added a README safety model covering standalone install, optional adapters, fail-open hooks, local-only telemetry, no hook network calls, and no postinstall behavior.
+- **Launch roadmap template:** Added a GitHub issue template for release hardening, README polish, benchmarks, optional adapters, setup, and telemetry roadmap work.
+## 0.5.52
+- **Release candidate polish:** Updated README positioning around ContextOS as a runtime context router, added npm/CI/license badges, a same-prompt/different-repo demo section, a benchmark table, a 30-second install callout, and an AGENTS.md vs RAG vs ContextOS comparison table.
+- **Non-interactive setup safety:** `ctx setup --yes` now defaults to Codex instead of failing with no selected agents, and skips the community skill installer when no TTY is available so release/install smoke tests can complete unattended.
+- **Hot MCP scorer:** `ctx-mcp` now preloads the local embedding pipeline and exposes `ctx_health`/bridge health so prompt hooks only call semantic scoring when the long-running scorer is ready.
+- **Skill Router v2:** Skill suggestions now combine semantic similarity with prompt triggers, dependency evidence, config-file evidence, negative triggers, and confidence explanations. Optional `skill.yaml` metadata beside `SKILL.md` can define positive/negative triggers and related skills.
+- **Confidence calibration:** Skill Router confidence is now calibrated separately from ranking. Prompt-only or semantic-only matches are capped, prompt+project-evidence matches are promoted to medium confidence, dependency+file evidence promotes to high confidence, negative signals cap confidence, and `ctx skills doctor` shows `high`/`medium`/`low` bands.
+- **Skill doctor:** Added `ctx skills doctor -- "task"` to explain selected skills with semantic score, prompt trigger score, project evidence, file evidence, negative signals, and final confidence.
+- **Skill routing eval:** Added `eval/skill-routing` fixtures and `ctx benchmark --skills` to report top-1 accuracy, top-3 recall, false positive rate, confidence calibration, and negative gate accuracy for evidence-based skill routing.
+- **Expanded Skill Router benchmark:** Expanded the eval from the initial 6-case smoke set to 52 cases across deployment, auth, database, testing, mobile, and adversarial negative gates. Current local benchmark: Top-1 Accuracy 92.3%, Top-3 Recall 94.2%, False Positive Rate 0.0%, Confidence Calibration 100.0%, Negative Gate Accuracy 100.0%.
 - **Faster prompt fallback:** Direct prompt-hook fallback now skips embedding work and uses a shorter timeout, so context injection can still return deterministic rule, file, skill, and workflow candidates when MCP or semantic scoring is unavailable.
 - **Shared skill index fallback:** Skill discovery now warms a shared global skill index and searches it when the workspace-specific skill index has no matches, improving reuse across projects.
 - **Agent-visible skill dedupe:** Community skill installs and skill sync now remove duplicate skills visible through shared, Codex, and Antigravity roots while preserving unique agent-specific skills.
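
The changelog entries above report Top-1 Accuracy moving from 92.3% (0.5.52) to 94.2% (0.5.53) on the same 52-case eval. As a sanity check, those figures are consistent with 48 and then 49 correct top-1 picks out of 52; the counts are inferred from the percentages and are not stated in the changelog. The sketch below shows one way such top-k rates could be computed. `topKRate`, `formatPercent`, and the `{ expected, ranked }` case shape are hypothetical and are not taken from the package's `run-eval.js`; negative-gate cases (where no skill is expected) are not modeled here.

```javascript
// Hypothetical case shape: { expected: "skill-id", ranked: ["skill-id", ...] }.
// Illustration only; this is not the eval code shipped in eval/skill-routing.
function topKRate(cases, k) {
  if (cases.length === 0) return 0;
  // A case counts as a hit when the expected skill is among the first k ranked skills.
  const hits = cases.filter((c) => c.ranked.slice(0, k).includes(c.expected)).length;
  return hits / cases.length;
}

// Format a 0..1 rate with one decimal place, as the changelog does.
function formatPercent(rate) {
  return `${(rate * 100).toFixed(1)}%`;
}

// Toy data, not real benchmark fixtures.
const toyCases = [
  { expected: "eas", ranked: ["eas", "mobile-deployment"] },
  { expected: "vercel-deployment", ranked: ["github-actions-ci-cd", "vercel-deployment"] },
];

console.log(formatPercent(topKRate(toyCases, 1))); // 50.0%
console.log(formatPercent(topKRate(toyCases, 3))); // 100.0%

// Inferred counts that match the reported percentages.
console.log(formatPercent(48 / 52)); // 92.3%
console.log(formatPercent(49 / 52)); // 94.2%
```

Read this way, the 0.5.53 change corresponds to one additional case ranked correctly at position 1, while Top-3 Recall is unchanged.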

package/README.md CHANGED

@@ -1,8 +1,12 @@
 # ContextOS
-Codex ignores the middle of your `AGENTS.md`. ContextOS fixes that.
+Runtime context router for coding agents.
-It ranks your project rules against the current prompt, injects the right ones at the moment the agent starts work, suggests relevant files/skills/workflows, and reports what the agent actually followed after the task.
+Rules, files, skills, workflows, and evidence: injected before the agent writes code.
+[![npm version](https://img.shields.io/npm/v/@minhpnq1807/contextos.svg)](https://www.npmjs.com/package/@minhpnq1807/contextos)
+[![CI](https://github.com/khovan123/contextOS/actions/workflows/ci.yml/badge.svg)](https://github.com/khovan123/contextOS/actions/workflows/ci.yml)
+[![license: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
 ```text
 WITHOUT ContextOS
@@ -17,11 +21,38 @@ WITH ContextOS
          -> report followed / ignored / unknown
 ```
+ContextOS is not another `AGENTS.md` loader. It is a runtime context router for coding agents: it chooses the task-relevant rules, files, skills, workflows, and evidence before the agent starts editing.
 Published package: [`@minhpnq1807/contextos`](https://www.npmjs.com/package/@minhpnq1807/contextos)
 ## Demo
-![ContextOS actual terminal demo](docs/demo/contextos-demo.gif)
+![ContextOS demo: same prompt, different repo, correct skills](docs/demo/contextos-demo.gif)
+Same prompt. Different repo. Correct skills.
+```bash
+ctx skills doctor -- "fix deployed"
+```
+| Repo evidence | Expected route |
+| --- | --- |
+| `eas.json`, `expo`, `react-native` | `eas`, `mobile-deployment`, `github-actions-ci-cd` |
+| `vercel.json`, `next`, GitHub workflow | `vercel-deployment`, `github-actions-ci-cd`, `env-secret-management` |
+| ContextOS repo with no app deploy evidence | no deployment skill selected |
+Skill Router internal fixture benchmark:
+| Metric | Result |
+| --- | ---: |
+| Cases | 52 |
+| Top-1 Accuracy | 94.2% |
+| Top-3 Recall | 94.2% |
+| False Positive Rate | 0.0% |
+| Confidence Calibration | 100.0% |
+| Negative Gate Accuracy | 100.0% |
+This is an internal fixture benchmark, not an external real-world benchmark. It is designed to prove the router behavior across controlled Expo/EAS, Next/Vercel, Docker, Railway/Render, Firebase, auth, database, testing, mobile, and adversarial negative-gate cases.
 Example hook context injected before the agent works:
@@ -51,6 +82,8 @@ Runtime telemetry: code-review-graph, code-review-graph.query_graph_tool
 ## Quick Install
+Install in 30 seconds:
 ```bash
 npm install -g @minhpnq1807/contextos
 ctx setup
@@ -104,6 +137,29 @@ The problem is not that agents cannot read `AGENTS.md`. The problem is that larg
 | Sync | Rules/MCP via Ruler, skills via skillshare, workflows via ContextOS. |
 | Evidence | Stop hooks persist `followed`, `ignored`, `unknown`, and runtime telemetry for explicit reports. |
+## Comparison
+| Approach | What it gives the agent | Main gap |
+| --- | --- | --- |
+| Plain `AGENTS.md` | Static repo instructions. | Important rules get buried or ignored when the task changes. |
+| Generic RAG | Semantically related files or snippets. | It usually does not route skills/workflows or prove rule compliance. |
+| ContextOS | Task-routed rules, files, skills, workflows, and evidence. | Requires local setup and warm indexes for best results. |
+## Safety Model
+ContextOS is designed to be OSS-friendly and low-friction:
+| Guarantee | Behavior |
+| --- | --- |
+| Standalone by default | `ctx setup` works without `code-review-graph`, `codegraph`, or `agent-memory`. |
+| Optional adapters | Graph and memory backends add signal when available; missing adapters contribute score `0`. |
+| Fail-open hooks | Prompt hooks return local context or nothing instead of blocking the agent when MCP, embeddings, graph, or memory is unavailable. |
+| Local-only telemetry | Reports, prompt history, evidence, and telemetry stay under `~/.ctx/contextos/`. |
+| No hook network calls | Prompt and stop hooks do not call external services. Install/warm commands may download the local embedding model when explicitly run. |
+| No postinstall surprise | `npm install` only installs the CLI. Setup runs only when you call `ctx setup`. |
+Positioning: ContextOS works standalone and gets smarter when graph or memory adapters are available.
 ## Quick Commands
 | Command | Use it for |
@@ -114,6 +170,7 @@ The problem is not that agents cannot read `AGENTS.md`. The problem is that larg
 | `ctx evidence` | Show why each rule was marked followed/ignored/unknown. |
 | `ctx stats` | Show workspace-level usage and effectiveness metrics. |
 | `ctx benchmark -- "task"` | Compare raw AGENTS.md ordering vs ContextOS scheduling. |
+| `ctx benchmark --skills` | Run the Skill Router eval benchmark. |
 | `ctx sync --rules` | Sync AGENTS/Ruler/MCP config across agents. |
 | `ctx sync --skills` | Sync skills across agents through skillshare. |
 | `ctx sync --workflows` | Sync workflow markdown across Claude/Codex/Antigravity. |
@@ -225,6 +282,14 @@ Restart Antigravity or `agy` after installing.
 The embedding model is mandatory. `ctx install` checks `~/.ctx/contextos/models` first and downloads the MiniLM model only when the required local files are missing. It intentionally fails if the model cannot be prepared, because otherwise the first prompt hook would have to cold-load or download the model.
+ContextOS keeps the embedding model hot inside `ctx-mcp`. Prompt hooks never cold-load transformers; if the MCP bridge is unavailable or the model is still warming, hooks fail open with lightweight scoring. Current local smoke metrics:
+```text
+MCP warm p95: 15-58ms observed
+Hook lightweight fallback: 0.69s
+MCP embedding hot startup: 477ms
+```
 During install, ContextOS prints a 0-100 progress indicator. The longest stage is usually embedding warmup; if the model is already cached, install skips the download and only refreshes vectors.
 Verify the published package in any project:
@@ -418,7 +483,7 @@ This warning comes from a transitive dependency in the local embedding/WASM stac
 | `ctx install --inject` | Installs ContextOS with explicit injection mode. | You want to be explicit in scripts or docs. | Same runtime behavior as the default install mode; if combined with `--quiet`, `--inject` wins. |
 | `ctx install --copy` | Copies only the plugin payload to `$CODEX_HOME/plugins/ctx`. | Legacy local development or manual plugin experiments. | Does not sync the active marketplace, rebuild indexes, register MCP, or install global hooks. Prefer `ctx refresh` for active local updates. |
 | `ctx setup` | Runs the first-run setup wizard. | You want the recommended onboarding flow after `npm install -g @minhpnq1807/contextos`. | Installs selected agents, optionally syncs Ruler rules/MCP and skillshare skills, asks which prompt sections to show, then prints next steps. |
-| `ctx setup --yes` | Runs setup with defaults non-interactively. | You want scriptable all-agent setup. | Uses `codex,claude,agy`, enables injection, syncs rules, syncs skills, and passes `--yes` to dependency setup prompts. |
+| `ctx setup --yes` | Runs setup with defaults non-interactively. | You want scriptable Codex setup. | Uses `codex`, enables injection, syncs rules, syncs skills, skips interactive community-skill installation when no TTY is available, and passes `--yes` to dependency setup prompts. Use `--agents codex,claude,agy` for multi-agent setup. |
 | `ctx setup --agents <list>` | Runs setup for selected agents. | You want only part of the default set. | Accepts comma-separated `codex`, `claude`, `agy`, or `antigravity`. |
 | `ctx setup --no-rules` | Skips Ruler sync during setup. | You only want hooks/MCP install and maybe skill sync. | Does not run `ctx sync --rules`. |
 | `ctx setup --no-skills` | Skips skillshare sync during setup. | You do not want shared skills configured. | Does not run `ctx sync --skills`. |
@@ -428,6 +493,7 @@ This warning comes from a transitive dependency in the local embedding/WASM stac
 | `ctx evidence` | Shows detailed evidence behind the last report for the current workspace. | You want to inspect why a rule was marked `followed`, `ignored`, `unknown`, or `unmeasurable`. | Prints a compact evidence table plus per-rule detail tables. |
 | `ctx stats` | Shows aggregate runtime metrics for the current workspace. | You want to know whether ContextOS is active and useful over time. | Prints sectioned tables for prompt/report counts, injection rate, efficiency, rule outcomes, hook events, last prompt, and last report. |
 | `ctx benchmark -- "task"` | Compares baseline AGENTS.md ordering with ContextOS task-aware scheduling. | You want a before/after signal for lost-in-the-middle risk. | Prints tables for parsed/actionable/filtered rules, baseline middle-risk, scheduled high/mid rules, recency reminder status, and top scored rules. |
+| `ctx benchmark --skills` | Runs the Skill Router eval benchmark. | You want evidence for skill routing accuracy and negative gates. | Prints top-1 accuracy, top-3 recall, false positive rate, confidence calibration, and negative gate accuracy across `eval/skill-routing` fixtures. |
 | `ctx sync --rules` | Syncs project rules and MCP servers through Ruler. | You want Codex, Claude Code, and Antigravity to share one project rule/MCP source of truth. | Ensures `.ruler/ruler.toml`, injects `ctx-mcp`, imports existing MCP servers from Codex and project `.mcp.json`, runs `ruler apply --agents codex,claude,antigravity`, mirrors MCP servers to Antigravity MCP configs, and verifies generated config. |
 | `ctx sync --rules --agents <list>` | Syncs only selected agents through Ruler. | You want to update one or two agents without touching the others. | Accepts comma-separated values such as `codex`, `claude`, `agy`, `antigravity`, or `codex,claude,agy`; `agy` is normalized to Ruler's `antigravity`. |
 | `ctx sync --rules --dry-run` | Previews Ruler sync without writing files or running apply. | You want to inspect behavior before changing project config. | Prints the same flow with dry-run status. |
@@ -492,7 +558,17 @@ These files are local telemetry only. Hooks do not make network calls.
 ## Project Understanding
-ContextOS does not try to replace `code-review-graph`. It uses it as the project-understanding layer when the target repo has already built a graph database.
+ContextOS works standalone. The core path is local rules, file embeddings, import graph expansion, skill routing, workflow routing, and evidence capture.
+Project graph and memory backends are optional adapters:
+| Adapter | What it adds | Required? |
+| --- | --- | --- |
+| `code-review-graph` | Blast radius, semantic node search, and test relationships. | No |
+| `codegraph` | Symbol/call graph context once its MCP schema is stable. | No |
+| `agent-memory` / `agentmemory` | Prior task history, decisions, and recurring bug-fix context. | No |
+ContextOS does not require `code-review-graph`, `codegraph`, or `agent-memory` to install or run. It gets smarter when those backends are available; when they are missing, the adapter scores stay at zero and the hook continues with local context.
 For file suggestions, ContextOS now runs a local RAG-style retrieval pass:
@@ -502,12 +578,12 @@ prompt
   -> ctx-mcp reads AGENTS.md and scores rules with local MiniLM
   -> query the persisted file-vector index in embeddings.db for semantic file candidates
   -> expand candidates through relative import graph links
-  -> query code-review-graph semantic_search_nodes with seed entity names
-  -> merge and deduplicate semantic, import-graph, and code-review-graph matches
+  -> optionally query code-review-graph semantic_search_nodes with seed entity names
+  -> merge and deduplicate semantic, import-graph, and optional graph matches
   -> inject top suggested files with graph evidence reasons
 ```
-This keeps the hook fast and local while still using graph semantics when available. The graph search path is visible in runtime data through file reasons such as `graph:content-moderation.service`.
+This keeps the hook fast and local while still using graph semantics when available. The graph search path is visible in runtime data through file reasons such as `graph:content-moderation.service`. When no graph adapter is available, file suggestions still use local file vectors and import graph expansion.
 Prompt scoring does not walk the repository for file candidates or import expansion. `ctx install` and `ctx embeddings warm` rebuild the persisted file-vector index and one-hop import adjacency index by walking source paths once; prompt hooks query those indexes directly. Rules, files, skills, and workflows are scored concurrently with `Promise.all()`.
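
The README hunk above describes merging and deduplicating semantic, import-graph, and optional graph matches before injecting file suggestions. A minimal sketch of that merge step follows, assuming each source yields `{ path, score, reason }` objects and that a missing adapter simply contributes no candidates. The function name `mergeFileCandidates`, the candidate shape, and the "keep the highest score" rule are all assumptions made for illustration; the diff does not show the package's actual merge logic.

```javascript
// Hypothetical merge step: dedupe file candidates by path across sources.
// Each source is an array of { path, score, reason }, or undefined when an
// optional adapter (for example an external graph) is not available.
function mergeFileCandidates(...sources) {
  const byPath = new Map();
  for (const candidates of sources) {
    for (const { path, score, reason } of candidates || []) {
      const existing = byPath.get(path);
      if (!existing) {
        byPath.set(path, { path, score, reasons: [reason] });
      } else {
        // Assumption: keep the best score and accumulate distinct reasons.
        existing.score = Math.max(existing.score, score);
        if (!existing.reasons.includes(reason)) existing.reasons.push(reason);
      }
    }
  }
  // Highest-scoring files first.
  return [...byPath.values()].sort((a, b) => b.score - a.score);
}

const semantic = [{ path: "src/a.js", score: 0.8, reason: "semantic" }];
const importGraph = [
  { path: "src/b.js", score: 0.5, reason: "import:src/a.js" },
  { path: "src/a.js", score: 0.6, reason: "import:src/b.js" },
];
const externalGraph = undefined; // no graph adapter installed

console.log(mergeFileCandidates(semantic, importGraph, externalGraph));
```

With no external graph adapter, the result still contains the local semantic and import-graph candidates, which matches the fail-open behavior the README describes.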
@@ -521,13 +597,71 @@ Injected prompt sections are intentionally compact: rules show only detected rul
 Codex may flatten newlines in its `UserPromptSubmit hook (completed)` preview. The injected `additionalContext` payload remains multiline; this is a Codex preview display limitation.
-Skill ranking is semantic-only. ContextOS builds a fused query from the user prompt plus a cached project profile, then compares that vector with cached skill vectors:
+Skill ranking uses Skill Router v2. ContextOS still starts with semantic retrieval, but final confidence is evidence-based:
+```text
+final_score =
+  semantic_score * 0.30
++ prompt_trigger_score * 0.20
++ project_evidence_score * 0.20
++ file_config_score * 0.10
++ import_graph_score * 0.10
++ external_graph_score * 0.05
++ memory_score * 0.05
+- negative_penalty * 0.20
+```
+`external_graph_score` is supplied by optional project graph adapters such as `code-review-graph` or `codegraph`. `memory_score` is reserved for optional memory adapters such as `agent-memory`. Without those adapters, both scores are `0`.
+Skill metadata can live beside `SKILL.md` as `skill.yaml`:
+```yaml
+id: eas
+name: Expo EAS Deployment
+positive_triggers:
+  prompts: [eas, expo build, deployed, android, ios]
+  files: [eas.json, app.json, app.config.ts]
+  dependencies: [expo, eas-cli]
+negative_triggers:
+  dependencies: [next, vite]
+  files: [vercel.json]
+related_skills:
+  - mobile-deployment
+  - github-actions-ci-cd
+  - env-secret-management
+```
+The project profile is built from bounded root/workspace `package.json` metadata, dependencies, scripts, detected languages, recent git files, and config files such as `eas.json`, `app.json`, `vercel.json`, and `.github/workflows/*`. ContextOS only gives high confidence to domain-specific skills when project evidence supports them. For example, `fix deployed` can rank `eas` highly in an Expo project with `eas.json` and `expo`, but a Next.js/Vercel project should route to Vercel and CI/CD deployment skills instead. Skill catalogs are deduplicated by normalized skill name before indexing and rendering.
+Use `ctx skills doctor -- "task"` to inspect routing:
+```bash
+ctx skills doctor -- "fix deployed"
+```
+The doctor output shows semantic score, prompt triggers, dependency/file evidence, negative signals, and final confidence for each selected skill.
+Confidence is calibrated separately from ranking and includes a band:
+```text
+high: >= 0.85
+medium: 0.65-0.84
+low: < 0.65
+```
+Use `ctx benchmark --skills` to run the local Skill Router benchmark. The eval lives in `eval/skill-routing` and currently covers 52 cases across deployment, auth, database, testing, mobile, and adversarial negative gates.
+Current local benchmark:
 ```text
-embed(prompt + project profile) -> cosine -> embed(skill name + description)
+Cases: 52
+Top-1 Accuracy: 94.2%
+Top-3 Recall: 94.2%
+False Positive Rate: 0.0%
+Confidence Calibration: 100.0%
+Negative Gate Accuracy: 100.0%
 ```
-The project profile is an embeddable string built from bounded root/workspace `package.json` metadata, dependencies, scripts, detected languages, and recent git files. It is cached under the ContextOS workspace data directory and invalidated when package metadata or git `HEAD` changes. ContextOS does not maintain a skill taxonomy or domain gate list for ranking; if the skill index is cold for a large catalog, prompt hooks fail open instead of falling back to arbitrary keyword matches. Skill catalogs are deduplicated by normalized skill name before indexing and rendering.
+The benchmark includes same-prompt/different-repo checks such as `fix deployed` in Expo/EAS, Next/Vercel, and ContextOS itself, plus adversarial cases like `expo-with-vercel-json` where `eas` is expected and `vercel-deployment` must be rejected.
 After `ctx refresh`, ContextOS invalidates the private hook bridge socket so prompts fall back to direct scoring until Codex restarts the long-running `ctx-mcp` process. Hook clients also discard a same-inode socket if an older bridge revision is detected.
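
The Skill Router v2 formula and confidence bands in the hunk above can be sketched as plain JavaScript. The weights and band thresholds are copied from the README text; the camelCase field names follow what `ctx skills doctor` prints in `bin/ctx.js`. The function names `finalScore` and `confidenceBand` are hypothetical, and because the README says confidence is calibrated separately from ranking, the band function here takes an already-calibrated confidence value rather than the raw score. Treating signals as values in the 0 to 1 range is also an assumption.

```javascript
// Weights as listed in the README formula.
const WEIGHTS = {
  semanticScore: 0.30,
  promptTriggerScore: 0.20,
  projectEvidenceScore: 0.20,
  fileConfigScore: 0.10,
  importGraphScore: 0.10,
  externalGraphScore: 0.05, // optional graph adapter; 0 when missing
  memoryScore: 0.05,        // optional memory adapter; 0 when missing
};
const NEGATIVE_WEIGHT = 0.20;

// Hypothetical helper: weighted sum of signals minus the negative penalty.
// Missing fields count as 0, so absent adapters never block scoring.
function finalScore(signals) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    total += weight * Number(signals[name] || 0);
  }
  return total - NEGATIVE_WEIGHT * Number(signals.negativePenalty || 0);
}

// Band thresholds from the README: high >= 0.85, medium 0.65-0.84, low < 0.65.
function confidenceBand(confidence) {
  if (confidence >= 0.85) return "high";
  if (confidence >= 0.65) return "medium";
  return "low";
}

// Every local signal at 1.0 and no optional adapters installed.
const localOnly = finalScore({
  semanticScore: 1,
  promptTriggerScore: 1,
  projectEvidenceScore: 1,
  fileConfigScore: 1,
  importGraphScore: 1,
});
console.log(localOnly.toFixed(2)); // 0.90
console.log(confidenceBand(0.7));  // medium
```

One consequence of these weights is that the optional adapters together account for at most 0.10 of the score, so a standalone install can still reach 0.90 from local signals alone.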
@@ -541,10 +675,10 @@ CONTEXTOS_EMBEDDINGS=0            disable embedding rule scoring
 CONTEXTOS_MCP_CONNECT_TIMEOUT_MS=100 stale ctx-mcp socket connect timeout
 CONTEXTOS_MCP_BRIDGE_TIMEOUT_MS=2000 ctx-mcp hook bridge timeout
 CONTEXTOS_HOOK_DEADLINE_MS=8500 hard fail-open deadline for prompt hooks
-CONTEXTOS_DIRECT_FALLBACK_TIMEOUT_MS=6000 direct scoring timeout when the bridge is unavailable
+CONTEXTOS_DIRECT_FALLBACK_TIMEOUT_MS=2500 direct scoring timeout when the bridge is unavailable
 CONTEXTOS_HOOK_EMBEDDING_TIMEOUT_MS=500 rule embedding timeout during hook direct fallback
 CONTEXTOS_EMBEDDING_TIMEOUT_MS=800 embedding scoring timeout inside ctx-mcp/debug
-CONTEXTOS_HOOK_SKILL_EMBEDDING_TIMEOUT_MS=2000 skill retrieval timeout during hook direct fallback
+CONTEXTOS_HOOK_SKILL_EMBEDDING_TIMEOUT_MS=2000 skill retrieval timeout when embeddings are enabled
 CONTEXTOS_SKILL_EMBEDDING_TIMEOUT_MS=2000 skill retrieval timeout inside ctx-mcp/debug
 CONTEXTOS_FILE_EMBEDDINGS=0       disable file-path embedding retrieval
 CONTEXTOS_HOOK_FILE_EMBEDDING_TIMEOUT_MS=500 file retrieval timeout during hook direct fallback
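
The timeout variables listed above are plain millisecond values read from the environment. The `skillsDoctor` function in `bin/ctx.js` reads its own timeout as `Number(process.env.CONTEXTOS_SKILL_DOCTOR_TIMEOUT_MS || 3000)`. The sketch below is a slightly more defensive variant of that pattern. `readTimeoutMs` is a hypothetical helper, and treating the values shown in the README list (such as `2500` and `8500`) as the defaults is an assumption.

```javascript
// Hypothetical helper: read a millisecond timeout from an env-like object,
// falling back when the variable is unset, empty, non-numeric, or not positive.
function readTimeoutMs(env, name, fallbackMs) {
  const parsed = Number(env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// Fallback values are taken from the README list and assumed to be defaults.
const directFallbackMs = readTimeoutMs(process.env, "CONTEXTOS_DIRECT_FALLBACK_TIMEOUT_MS", 2500);
const hookDeadlineMs = readTimeoutMs(process.env, "CONTEXTOS_HOOK_DEADLINE_MS", 8500);

console.log({ directFallbackMs, hookDeadlineMs });
```

Unlike the bare `Number(value || fallback)` form, this variant also falls back when the variable is set to something non-numeric, which would otherwise produce `NaN`.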

package/bin/ctx.js CHANGED

@@ -19,6 +19,7 @@ import { scoreContext } from "../plugins/ctx/lib/score-context.js";
 import { defaultDataRoot, workspaceDataDir, workspaceMarkerPath } from "../plugins/ctx/lib/workspace-data.js";
 import { installMcpTelemetryProxies } from "../plugins/ctx/lib/mcp-proxy-install.js";
 import { benchmarkWorkspace, formatBenchmark } from "../plugins/ctx/lib/benchmark.js";
+import { formatSkillRoutingBenchmark, runSkillRoutingEval } from "../eval/skill-routing/run-eval.js";
 import { copyDir, copyPackageRoot, syncPackageRoot } from "../plugins/ctx/lib/package-install.js";
 import { installClaudeHooks } from "../plugins/ctx/lib/claude-hooks.js";
 import { installClaudeMcp } from "../plugins/ctx/lib/claude-mcp.js";
@@ -30,7 +31,7 @@ import { readCodexMcpServers, syncRules } from "../plugins/ctx/lib/ruler-sync.js
 import { detectGraphStrategy, embedCodeReviewGraph, formatCodeReviewGraphEmbedding, formatGraphStrategy } from "../plugins/ctx/lib/graph-strategy.js";
 import { writeInnerGitignore, ensureRootGitignore } from "../plugins/ctx/lib/gitignore.js";
 import { dedupeAgentVisibleSkills, repairSkillSymlinks, syncSkills, detectExistingSkills } from "../plugins/ctx/lib/skillshare-sync.js";
-import { scanSkills, warmSkillEmbeddings } from "../plugins/ctx/lib/skill-discoverer.js";
+import { diagnoseSkills, scanSkills, warmSkillEmbeddings } from "../plugins/ctx/lib/skill-discoverer.js";
 import { parsePassthroughArgs, runPassthrough } from "../plugins/ctx/lib/passthrough.js";
 import { parseAgentList, parseSetupArgs, setupSummaryLines } from "../plugins/ctx/lib/setup-wizard.js";
 import { multiSelect } from "../plugins/ctx/lib/multi-select.js";
@@ -193,6 +194,7 @@ Usage:
   ctx evidence                                      Show evidence from last report
   ctx stats                                         Show workspace statistics
   ctx benchmark -- "task"                           Benchmark workspace for a task
+  ctx benchmark --skills                            Run skill routing eval benchmark
   ctx sync --rules                                  Sync AGENTS.md rules to all agents
   ctx sync --rules --agents <names>                 Sync rules to specific agents only
   ctx sync --rules --dry-run                        Preview rule sync without writing
@@ -207,6 +209,7 @@ Usage:
   ctx sync --workflows --agents <names>             Sync workflows to specific agents
   ctx sync --workflows --dry-run                    Preview workflow sync without writing
   ctx skills                                        Browse community skill libraries
+  ctx skills doctor -- "task"                       Explain skill routing for a task
   ctx skills --agents <names>                       Filter skills for specific agents
   ctx skills --refresh                              Force refresh skill library cache
   ctx --config                                      Choose prompt context sections to show
@@ -650,6 +653,38 @@ async function debug(task) {
   console.log(scheduled.additionalContext || "(empty)");
 }
+async function skillsDoctor(task) {
+  if (!String(task || "").trim()) throw new Error('Usage: ctx skills doctor -- "task"');
+  const result = await diagnoseSkills({
+    cwd: process.cwd(),
+    prompt: task,
+    dataDir: contextOSDataDir(),
+    skills: scanSkills({ cwd: process.cwd() }),
+    limit: outputConfigLimits(loadOutputConfig({ dataRoot: contextOSDataDir() })).skills,
+    timeoutMs: Number(process.env.CONTEXTOS_SKILL_DOCTOR_TIMEOUT_MS || 3000)
+  });
+  console.log("ContextOS skill doctor");
+  console.log(`cwd: ${result.cwd}`);
+  console.log(`prompt: ${result.prompt}`);
+  console.log("");
+  console.log("Project evidence:");
+  console.log(`dependencies: ${result.projectEvidence.dependencies.slice(0, 30).join(", ") || "(none)"}`);
+  console.log(`files: ${result.projectEvidence.files.slice(0, 30).join(", ") || "(none)"}`);
+  console.log("");
+  console.log("Skills:");
+  if (!result.skills.length) {
+    console.log("(none)");
+    return;
+  }
+  for (const skill of result.skills) {
+    console.log(`${Number(skill.confidence || skill.score || 0).toFixed(2)}  ${skill.confidenceBand || "low"}  ${skill.name}`);
+    console.log(`      semantic:${Number(skill.semanticScore || 0).toFixed(2)} prompt:${Number(skill.promptTriggerScore || 0).toFixed(2)} project:${Number(skill.projectEvidenceScore || 0).toFixed(2)} files:${Number(skill.fileConfigScore || 0).toFixed(2)} import:${Number(skill.importGraphScore || 0).toFixed(2)} graph:${Number(skill.externalGraphScore || skill.graphScore || 0).toFixed(2)} memory:${Number(skill.memoryScore || 0).toFixed(2)} negative:${Number(skill.negativePenalty || 0).toFixed(2)}`);
+    if (skill.evidence?.length) console.log(`      evidence: ${skill.evidence.join(", ")}`);
+    if (skill.negativeEvidence?.length) console.log(`      rejected signals: ${skill.negativeEvidence.join(", ")}`);
+  }
+}
 async function warmEmbeddings(task, { syncMarketplace = true, quiet = false } = {}) {
   const warmResult = await warmWorkspaceIndexes({ task });
   const marketplaceSync = syncMarketplace ? syncActiveCodexMarketplace() : null;
@@ -874,15 +909,21 @@ async function setup({ args = [], cwd = process.cwd() } = {}) {
     const totalExisting = existing.reduce((sum, e) => sum + e.count, 0);
     if (totalExisting === 0) {
       console.log("");
-      console.log(`${YELLOW}⚠${RESET}  No skills found on this machine.`);
-      console.log(`${DIM}│${RESET}  Install community skills to get started.`);
+      console.log("⚠  No skills found on this machine.");
+      console.log("│  Install community skills to get started.");
       console.log("");
-      const installed = await runCommunitySkillInstaller(options.agents);
-      if (installed > 0) {
+      if (options.yes || !process.stdin.isTTY) {
+        console.log("│  Skipping community skill installer in non-interactive setup.");
+        console.log("│  Run: ctx skills");
         console.log("");
-        console.log("◇ Re-syncing skills after install...");
-        await doSyncSkills();
+      } else {
+        const installed = await runCommunitySkillInstaller(options.agents);
+        if (installed > 0) {
+          console.log("");
+          console.log("◇ Re-syncing skills after install...");
+          await doSyncSkills();
+        }
       }
     }
   }
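
The hunk above guards the community skill installer with `options.yes || !process.stdin.isTTY`. The same condition can be isolated into a small predicate to make the non-interactive behavior easy to check. `shouldSkipCommunityInstaller` is a hypothetical name used for illustration and does not appear in the package.

```javascript
// Hypothetical helper mirroring the condition `options.yes || !process.stdin.isTTY`.
// In Node.js, process.stdin.isTTY is true for an interactive terminal and
// undefined when stdin is a pipe or file, so a missing value counts as non-TTY.
function shouldSkipCommunityInstaller({ yes = false, isTTY = false } = {}) {
  return Boolean(yes) || !isTTY;
}

console.log(shouldSkipCommunityInstaller({ yes: true, isTTY: true }));   // true: --yes always skips
console.log(shouldSkipCommunityInstaller({ yes: false, isTTY: false })); // true: no terminal attached
console.log(shouldSkipCommunityInstaller({ yes: false, isTTY: true }));  // false: interactive run

// How the current process would be classified.
console.log(shouldSkipCommunityInstaller({ yes: false, isTTY: Boolean(process.stdin.isTTY) }));
```

Checking both the flag and the TTY state is what lets unattended install smoke tests complete without waiting on a prompt, as the 0.5.52 changelog entry describes.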
@@ -981,11 +1022,21 @@ try {
   } else if (command === "stats") {
     console.log(formatStats(loadStats(contextOSWorkspaceDataDir())));
   } else if (command === "benchmark") {
+    if (args.includes("--skills")) {
+      console.log(formatSkillRoutingBenchmark(await runSkillRoutingEval({ rootDir })));
+    } else {
     const marker = args.indexOf("--");
     const task = marker >= 0 ? args.slice(marker + 1).join(" ") : args.slice(1).join(" ");
     if (!task.trim()) throw new Error('Usage: ctx benchmark -- "task"');
     console.log(formatBenchmark(benchmarkWorkspace({ cwd: process.cwd(), task })));
+    }
   } else if (command === "skills") {
+    if (args[1] === "doctor") {
+      const marker = args.indexOf("--");
+      const task = marker >= 0 ? args.slice(marker + 1).join(" ") : args.slice(2).join(" ");
+      await skillsDoctor(task);
+      process.exitCode = 0;
+    } else {
     // Interactive community skill library selector + installer
     const agentsFlag = args.indexOf("--agents");
     const forceRefresh = args.includes("--refresh");
@@ -1020,6 +1071,7 @@ try {
       }));
     }
     console.log("");
+    }
   } else if (command === "sync") {
     if (args.includes("--workflows")) {
       await syncWorkflows({