@minhpnq1807/contextos 0.5.53 → 0.6.1
- package/.codex/skills/contextos-community/SKILL.md +15 -0
- package/.codex/skills/contextos-community/skill.yaml +20 -0
- package/.codex/skills/contextos-release/SKILL.md +15 -0
- package/.codex/skills/contextos-release/skill.yaml +20 -0
- package/.codex/skills/contextos-routing/SKILL.md +15 -0
- package/.codex/skills/contextos-routing/skill.yaml +20 -0
- package/.codex/workflows/primary.md +13 -0
- package/.codex/workflows/release.md +12 -0
- package/CHANGELOG.md +13 -0
- package/README.md +100 -2
- package/bin/ctx.js +12 -0
- package/community-skills/README.md +42 -0
- package/community-skills/_template/SKILL.md +15 -0
- package/community-skills/_template/skill.yaml +20 -0
- package/community-skills/eas/SKILL.md +15 -0
- package/community-skills/eas/skill.yaml +23 -0
- package/community-skills/jwt-auth/SKILL.md +15 -0
- package/community-skills/jwt-auth/skill.yaml +22 -0
- package/community-skills/oauth-google/SKILL.md +15 -0
- package/community-skills/oauth-google/skill.yaml +22 -0
- package/community-skills/prisma/SKILL.md +15 -0
- package/community-skills/prisma/skill.yaml +22 -0
- package/community-skills/redis/SKILL.md +15 -0
- package/community-skills/redis/skill.yaml +22 -0
- package/community-skills/vercel/SKILL.md +15 -0
- package/community-skills/vercel/skill.yaml +22 -0
- package/docs/demo/agents-lost-middle.gif +0 -0
- package/docs/demo/agents-lost-middle.txt +28 -0
- package/docs/demo/contextos-ready.gif +0 -0
- package/docs/demo/contextos-ready.txt +20 -0
- package/docs/demo/same-prompt-different-context.gif +0 -0
- package/docs/demo/same-prompt-different-context.txt +26 -0
- package/docs/launch-demos.md +127 -0
- package/docs/roadmap.md +285 -0
- package/eval/hallucination/run-leaderboard.js +183 -0
- package/package.json +5 -1
- package/plugins/ctx/.codex-plugin/plugin.json +1 -1
- package/plugins/ctx/lib/certification.js +223 -0
package/.codex/skills/contextos-community/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: ContextOS Community
+description: Maintain Community Skill Packs, certification docs, launch demos, and distribution-oriented ContextOS assets.
+---
+
+# ContextOS Community
+
+Use this skill for Community Skill Packs, ContextOS Ready certification, launch demos, roadmap docs, and contributor-facing assets.
+
+## Workflow
+
+1. Keep community assets small and PR-friendly.
+2. Prefer docs, examples, and tests over hosted infrastructure.
+3. Ensure every skill pack has triggers, evidence, negative triggers, and workflow.
+4. Validate docs and package inclusion before committing.
package/.codex/skills/contextos-community/skill.yaml
ADDED
@@ -0,0 +1,20 @@
+id: contextos-community
+name: ContextOS Community
+description: Maintain Community Skill Packs, certification docs, launch demos, and distribution assets.
+positive_triggers:
+  prompts: [community skill packs, contextos ready, certified, badge, roadmap, launch demo, contributor]
+  files: [community-skills/README.md, docs/roadmap.md, docs/launch-demos.md, README.md]
+  dependencies: [vitest]
+evidence:
+  files: [community-skills/README.md, docs/roadmap.md, README.md]
+  dependencies: [vitest]
+negative_triggers:
+  prompts: [runtime scorer, mcp transport, embedding model]
+workflow:
+  - Keep community assets small and PR-friendly.
+  - Prefer docs, examples, and tests over hosted infrastructure.
+  - Ensure every skill pack has triggers, evidence, negative triggers, and workflow.
+  - Validate docs and package inclusion before committing.
+related_skills:
+  - community-management
+  - technical-writing
package/.codex/skills/contextos-release/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: ContextOS Release
+description: Prepare ContextOS npm package releases, changelog updates, plugin validation, tags, and GitHub release safety checks.
+---
+
+# ContextOS Release
+
+Use this skill when bumping versions, preparing changelog entries, validating package contents, or publishing ContextOS.
+
+## Workflow
+
+1. Update README and CHANGELOG before tagging.
+2. Verify package and plugin versions stay aligned.
+3. Run build, plugin validation, MCP smoke, tests, and `npm pack --dry-run`.
+4. Commit, tag, push, and verify the GitHub release/npm publish path.
package/.codex/skills/contextos-release/skill.yaml
ADDED
@@ -0,0 +1,20 @@
+id: contextos-release
+name: ContextOS Release
+description: Prepare ContextOS npm releases, changelog updates, tags, and package validation.
+positive_triggers:
+  prompts: [release, bump version, changelog, npm publish, tag, package, validate plugin]
+  files: [package.json, plugins/ctx/.codex-plugin/plugin.json, CHANGELOG.md, README.md]
+  dependencies: [vitest]
+evidence:
+  files: [package.json, plugins/ctx/.codex-plugin/plugin.json, CHANGELOG.md]
+  dependencies: [vitest]
+negative_triggers:
+  prompts: [application feature, dashboard UI, backend API]
+workflow:
+  - Update README and CHANGELOG before tagging.
+  - Verify package and plugin versions stay aligned.
+  - Run build, plugin validation, MCP smoke, tests, and npm pack dry-run.
+  - Commit, tag, push, and verify release automation.
+related_skills:
+  - npm-package-release
+  - github-actions-ci-cd
package/.codex/skills/contextos-routing/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: ContextOS Routing
+description: Work on ContextOS rule, file, skill, workflow, and evidence routing without adding broad runtime complexity.
+---
+
+# ContextOS Routing
+
+Use this skill when changing ContextOS routing behavior, prompt hook output, scoring, evidence, or MCP scorer paths.
+
+## Workflow
+
+1. Inspect the existing routing module before changing behavior.
+2. Keep prompt-hook hot paths bounded and fail-open.
+3. Add focused tests for ranking, output, or telemetry behavior.
+4. Run `npm test`, `npm run build`, and `npm run validate:plugin` when the change affects runtime.
package/.codex/skills/contextos-routing/skill.yaml
ADDED
@@ -0,0 +1,20 @@
+id: contextos-routing
+name: ContextOS Routing
+description: Work on ContextOS rule, file, skill, workflow, and evidence routing.
+positive_triggers:
+  prompts: [contextos routing, prompt hook, skill router, workflow router, evidence, scoring, ctx-mcp]
+  files: [plugins/ctx/lib/score-context.js, plugins/ctx/lib/skill-discoverer.js, plugins/ctx/lib/workflow-discoverer.js, plugins/ctx/lib/prompt-hook.js]
+  dependencies: [@modelcontextprotocol/sdk]
+evidence:
+  files: [plugins/ctx/lib/score-context.js, plugins/ctx/lib/skill-discoverer.js, plugins/ctx/mcp/server.js]
+  dependencies: [@modelcontextprotocol/sdk, @xenova/transformers]
+negative_triggers:
+  prompts: [dashboard, cloud service, hosted marketplace]
+workflow:
+  - Inspect the existing routing module before changing behavior.
+  - Keep prompt-hook hot paths bounded and fail-open.
+  - Add focused tests for ranking, output, or telemetry behavior.
+  - Run npm test plus build and plugin validation for runtime changes.
+related_skills:
+  - mcp-tool-developer
+  - agent-memory-mcp
package/.codex/workflows/primary.md
ADDED
@@ -0,0 +1,13 @@
+# Primary ContextOS Workflow
+
+Use this workflow for feature implementation, debugging, routing changes, and test fixes in ContextOS.
+
+planner -> researcher -> tester -> code-reviewer -> docs-manager
+
+## Steps
+
+1. Confirm the affected ContextOS surface: CLI, prompt hook, MCP server, router, setup, docs, or tests.
+2. Inspect the existing module and nearby tests before patching.
+3. Keep runtime behavior local-first, bounded, and fail-open.
+4. Add focused tests for the changed behavior.
+5. Run the relevant validation commands before commit.
package/.codex/workflows/release.md
ADDED
@@ -0,0 +1,12 @@
+# ContextOS Release Workflow
+
+Use this workflow for version bumps, changelog updates, package validation, tags, and release checks.
+
+planner -> tester -> docs-manager -> code-reviewer
+
+## Steps
+
+1. Update package and plugin versions together when releasing.
+2. Update README and CHANGELOG with user-visible behavior.
+3. Run tests, build, plugin validation, MCP smoke, and package dry-run.
+4. Commit, tag, push, and verify the release automation result.
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,18 @@
 # Changelog
 
+## 0.6.1
+
+- **Hallucination Leaderboard:** Added `ctx leaderboard --hallucination` and `npm run leaderboard:hallucination` to compare raw prompt-only skill guesses against ContextOS evidence-routed skill selection across 20 fixture tasks.
+
+## 0.6.0
+
+- **Launch demo framing:** Added Agent Hallucination Benchmark messaging, same-prompt/same-model/different-context copy, and `docs/launch-demos.md` with three short demo scripts: hallucination benchmark, AGENTS.md lost-in-the-middle, and repo-aware skills.
+- **Roadmap template expansion:** Extended the launch roadmap issue template with Hallucination Benchmark, Agent Replay, and Community Skill Packs areas.
+- **Roadmap docs:** Added `docs/roadmap.md` covering Hallucination Leaderboard, Agent Replay, Community Skill Packs, and ContextOS Ready certification without committing to dashboard/cloud work.
+- **Community Skill Packs:** Added the initial `community-skills/` seed packs for EAS, Vercel, Prisma, Redis, Google OAuth, and JWT auth with `SKILL.md`, `skill.yaml`, contribution docs, and routing contract tests.
+- **ContextOS Ready:** Added `ctx doctor` to score repository readiness across project rules, skill packs, and workflows, plus README badge/docs and certification tests.
+- **Auto Skill Extraction roadmap:** Documented `ctx skill generate` as a research direction for detecting reusable repository skills and drafting publishable skill packs from project evidence.
+
 ## 0.5.53
 
 - **Optional adapter positioning:** Clarified that ContextOS core works standalone and that `code-review-graph`, `codegraph`, and `agent-memory` are optional adapters. Skill Router scoring now exposes separate `importGraphScore`, `externalGraphScore`, and `memoryScore` fields so missing adapters degrade to zero score instead of becoming install/runtime requirements.
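Reviewer note on the 0.5.53 entry: the "missing adapters degrade to zero score" behavior can be pictured with a small sketch. Only the three field names (`importGraphScore`, `externalGraphScore`, `memoryScore`) come from the changelog; the `adapterBoost` helper, the weights, and the combining rule are assumptions made for illustration and are not taken from the package source, which this diff does not show.

```javascript
// Hypothetical sketch: the weights and the weighted-sum rule are illustrative
// assumptions. Only the three score field names appear in the changelog.
const ADAPTER_WEIGHTS = {
  importGraphScore: 0.5,
  externalGraphScore: 0.3,
  memoryScore: 0.2,
};

function adapterBoost(scores = {}) {
  let total = 0;
  for (const [field, weight] of Object.entries(ADAPTER_WEIGHTS)) {
    const value = Number(scores[field]);
    // A missing or non-numeric adapter score contributes zero instead of throwing.
    total += weight * (Number.isFinite(value) ? value : 0);
  }
  return total;
}

console.log(adapterBoost({ importGraphScore: 1 })); // only one adapter available
console.log(adapterBoost({}));                      // no adapters installed
```

The point of the design is that an absent adapter is indistinguishable from an adapter that found nothing, so the core path never needs the optional packages installed.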
package/README.md
CHANGED
@@ -6,6 +6,7 @@ Rules, files, skills, workflows, and evidence: injected before the agent writes
 
 [](https://www.npmjs.com/package/@minhpnq1807/contextos)
 [](https://github.com/khovan123/contextOS/actions/workflows/ci.yml)
+[](#contextos-ready)
 [](LICENSE)
 
 ```text
@@ -27,9 +28,9 @@ Published package: [`@minhpnq1807/contextos`](https://www.npmjs.com/package/@min
 
 ## Demo
 
-
 
-Same prompt.
+Same prompt. Same model. Different context.
 
 ```bash
 ctx skills doctor -- "fix deployed"
@@ -41,6 +42,38 @@ ctx skills doctor -- "fix deployed"
 | `vercel.json`, `next`, GitHub workflow | `vercel-deployment`, `github-actions-ci-cd`, `env-secret-management` |
 | ContextOS repo with no app deploy evidence | no deployment skill selected |
 
+More 10-second demos:
+
+| Demo | GIF |
+| --- | --- |
+| AGENTS.md Lost In The Middle | [docs/demo/agents-lost-middle.gif](docs/demo/agents-lost-middle.gif) |
+| ContextOS Ready Gold | [docs/demo/contextos-ready.gif](docs/demo/contextos-ready.gif) |
+
+## Agent Hallucination Benchmark
+
+Generic agents often guess deployment tooling from the prompt alone:
+
+```text
+Prompt: Fix deployment
+Raw agent guess: Vercel, Docker, Railway
+```
+
+ContextOS routes from project evidence instead:
+
+```text
+Detected evidence:
+- eas.json
+- expo dependency
+- GitHub workflow
+
+Selected skills:
+- eas
+- mobile-deployment
+- github-actions-ci-cd
+```
+
+That is the core launch demo: same prompt, same model, different repo context, correct skills.
+
 Skill Router internal fixture benchmark:
 
 | Metric | Result |
@@ -54,6 +87,19 @@ Skill Router internal fixture benchmark:
 
 This is an internal fixture benchmark, not an external real-world benchmark. It is designed to prove the router behavior across controlled Expo/EAS, Next/Vercel, Docker, Railway/Render, Firebase, auth, database, testing, mobile, and adversarial negative-gate cases.
 
+Hallucination leaderboard:
+
+```bash
+ctx leaderboard --hallucination
+```
+
+Current local result across 20 fixture tasks and 12 repo contexts:
+
+| System | Correct Skill |
+| --- | ---: |
+| Raw Agent | 10.0% |
+| ContextOS + Codex | 80.0% |
+
 Example hook context injected before the agent works:
 
 ```text
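Reviewer note on the leaderboard table: assuming each of the 20 fixture tasks is scored as a single hit or miss, the README percentages correspond to 2 of 20 correct for the raw agent and 16 of 20 for ContextOS. The sketch below only reproduces that arithmetic. The `accuracy` and `formatPercent` helpers and the per-task records are hypothetical and are not taken from `eval/hallucination/run-leaderboard.js`, whose contents this diff does not show.

```javascript
// Hypothetical sketch: per-task records are synthetic placeholders built only
// to reproduce the hit counts implied by the README table.
function accuracy(results) {
  if (results.length === 0) return 0;
  const correct = results.filter((r) => r.expected === r.selected).length;
  return (100 * correct) / results.length;
}

function formatPercent(value) {
  return `${value.toFixed(1)}%`;
}

// Build 20 placeholder tasks where the first `hits` selections match.
function makeResults(hits) {
  return Array.from({ length: 20 }, (_, i) => ({
    expected: "eas",
    selected: i < hits ? "eas" : "vercel",
  }));
}

console.log(formatPercent(accuracy(makeResults(2))));  // 10.0%
console.log(formatPercent(accuracy(makeResults(16)))); // 80.0%
```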
@@ -126,6 +172,13 @@ Developers put real operating instructions in `AGENTS.md`: use this graph tool b
 
 The problem is not that agents cannot read `AGENTS.md`. The problem is that large context windows bury the important rule in the middle, where attention is weak. ContextOS turns a static rules file into task-aware runtime context.
 
+The next visible demo is not another feature. It is showing the pain in a few seconds:
+
+```text
+Raw agent: guesses from the prompt.
+ContextOS: routes from repo evidence.
+```
+
 ## What ContextOS Does
 
 | Layer | What happens |
@@ -160,17 +213,60 @@ ContextOS is designed to be OSS-friendly and low-friction:
 
 Positioning: ContextOS works standalone and gets smarter when graph or memory adapters are available.
 
+## Roadmap
+
+ContextOS is not heading toward a dashboard-first product. The next work is focused on making the existing local runtime more visible and reusable:
+
+| Next | Why |
+| --- | --- |
+| Hallucination Leaderboard | Compare raw agent guesses vs ContextOS evidence-routed recommendations across the same repos and tasks. |
+| Agent Replay | Turn telemetry into a readable post-task narrative: prompt, selected skills, followed rules, suggested files, touched files, efficiency. |
+| Community Skill Packs | Let contributors PR ContextOS-ready skills with triggers, evidence, negative gates, and workflows before building a larger hub. |
+| ContextOS Ready | Define a repository readiness badge for AGENTS.md, skills, workflows, and evidence quality. |
+| Auto Skill Extraction | Research `ctx skill generate` so ContextOS can detect reusable skills from a repo and propose publishable skill packs. |
+
+See [docs/roadmap.md](docs/roadmap.md) for the current roadmap notes.
+
+## Community Skill Packs
+
+ContextOS starts the community loop with [`community-skills/`](community-skills/) instead of a hosted marketplace. The seed packs are `eas`, `vercel`, `prisma`, `redis`, `oauth-google`, and `jwt-auth`.
+
+Each pack contains a model-visible `SKILL.md` plus `skill.yaml` routing metadata with prompt triggers, project evidence, negative triggers, and a short workflow. Contributors can PR new packs by copying [`community-skills/_template/`](community-skills/_template/).
+
+## ContextOS Ready
+
+`ctx doctor` scores whether a repository is ready for ContextOS-style agent routing:
+
+```bash
+ctx doctor
+```
+
+```text
+Repository Score
+
+Rules: 92
+Skills: 88
+Workflows: 84
+
+Overall:
+ContextOS Ready Gold
+```
+
+The score checks project `AGENTS.md` rules, project skill packs under `.codex/skills/` or `.agents/skills/`, and project workflows under `.codex/workflows/` or `.claude/workflows/`. Use the badge only after `ctx doctor` reports Bronze, Silver, or Gold.
+
 ## Quick Commands
 
 | Command | Use it for |
 | --- | --- |
 | `ctx setup` | Recommended first-run install flow. |
 | `ctx debug -- "Recheck authen flow"` | Preview what ContextOS would inject. |
+| `ctx doctor` | Score repository readiness for the `ContextOS Ready` badge. |
 | `ctx report` | Show the last task's compliance summary. |
 | `ctx evidence` | Show why each rule was marked followed/ignored/unknown. |
 | `ctx stats` | Show workspace-level usage and effectiveness metrics. |
 | `ctx benchmark -- "task"` | Compare raw AGENTS.md ordering vs ContextOS scheduling. |
 | `ctx benchmark --skills` | Run the Skill Router eval benchmark. |
+| `ctx leaderboard --hallucination` | Compare raw prompt-only guesses vs ContextOS routing. |
 | `ctx sync --rules` | Sync AGENTS/Ruler/MCP config across agents. |
 | `ctx sync --skills` | Sync skills across agents through skillshare. |
 | `ctx sync --workflows` | Sync workflow markdown across Claude/Codex/Antigravity. |
@@ -489,11 +585,13 @@ This warning comes from a transitive dependency in the local embedding/WASM stac
 | `ctx setup --no-skills` | Skips skillshare sync during setup. | You do not want shared skills configured. | Does not run `ctx sync --skills`. |
 | `ctx setup --quiet` | Runs setup in measurement-only mode. | You want reports/stats without visible injected prompt context. | Installs hooks with prompt context injection disabled. |
 | `ctx debug -- "task"` | Runs the scheduler locally for a fake prompt. | You want to see which AGENTS.md rules and files ContextOS would inject before using Codex. | Prints rule scores, scoring reasons, suggested files, and final `additionalContext`. |
+| `ctx doctor` | Scores repository ContextOS readiness. | You want to add or verify a `ContextOS Ready` badge. | Prints Rules, Skills, Workflows, Overall tier, evidence, and next recommendations. |
 | `ctx report` | Shows the last Stop-hook compliance report for the current workspace. | An agent task has finished and you want the summary again. | Prints sectioned tables for summary, rule outcomes, suggested files, and runtime telemetry from `~/.ctx/contextos/workspaces/<workspace-id>/last-report.json`. |
 | `ctx evidence` | Shows detailed evidence behind the last report for the current workspace. | You want to inspect why a rule was marked `followed`, `ignored`, `unknown`, or `unmeasurable`. | Prints a compact evidence table plus per-rule detail tables. |
 | `ctx stats` | Shows aggregate runtime metrics for the current workspace. | You want to know whether ContextOS is active and useful over time. | Prints sectioned tables for prompt/report counts, injection rate, efficiency, rule outcomes, hook events, last prompt, and last report. |
 | `ctx benchmark -- "task"` | Compares baseline AGENTS.md ordering with ContextOS task-aware scheduling. | You want a before/after signal for lost-in-the-middle risk. | Prints tables for parsed/actionable/filtered rules, baseline middle-risk, scheduled high/mid rules, recency reminder status, and top scored rules. |
 | `ctx benchmark --skills` | Runs the Skill Router eval benchmark. | You want evidence for skill routing accuracy and negative gates. | Prints top-1 accuracy, top-3 recall, false positive rate, confidence calibration, and negative gate accuracy across `eval/skill-routing` fixtures. |
+| `ctx leaderboard --hallucination` | Compares raw prompt-only skill guesses with ContextOS evidence routing. | You want launch evidence for the hallucination problem. | Runs 20 fixture tasks across 10+ repo contexts and prints Raw Agent vs ContextOS correctness plus sample failures. |
 | `ctx sync --rules` | Syncs project rules and MCP servers through Ruler. | You want Codex, Claude Code, and Antigravity to share one project rule/MCP source of truth. | Ensures `.ruler/ruler.toml`, injects `ctx-mcp`, imports existing MCP servers from Codex and project `.mcp.json`, runs `ruler apply --agents codex,claude,antigravity`, mirrors MCP servers to Antigravity MCP configs, and verifies generated config. |
 | `ctx sync --rules --agents <list>` | Syncs only selected agents through Ruler. | You want to update one or two agents without touching the others. | Accepts comma-separated values such as `codex`, `claude`, `agy`, `antigravity`, or `codex,claude,agy`; `agy` is normalized to Ruler's `antigravity`. |
 | `ctx sync --rules --dry-run` | Previews Ruler sync without writing files or running apply. | You want to inspect behavior before changing project config. | Prints the same flow with dry-run status. |
package/bin/ctx.js
CHANGED
@@ -20,6 +20,7 @@ import { defaultDataRoot, workspaceDataDir, workspaceMarkerPath } from "../plugi
 import { installMcpTelemetryProxies } from "../plugins/ctx/lib/mcp-proxy-install.js";
 import { benchmarkWorkspace, formatBenchmark } from "../plugins/ctx/lib/benchmark.js";
 import { formatSkillRoutingBenchmark, runSkillRoutingEval } from "../eval/skill-routing/run-eval.js";
+import { formatHallucinationLeaderboard, runHallucinationLeaderboard } from "../eval/hallucination/run-leaderboard.js";
 import { copyDir, copyPackageRoot, syncPackageRoot } from "../plugins/ctx/lib/package-install.js";
 import { installClaudeHooks } from "../plugins/ctx/lib/claude-hooks.js";
 import { installClaudeMcp } from "../plugins/ctx/lib/claude-mcp.js";
@@ -41,6 +42,7 @@ import { checkForUpdate } from "../plugins/ctx/lib/update-notifier.js";
 import { fetchSkillsForAgents, printSkillRecommendations, getAllLibraries, getInstallCommands } from "../plugins/ctx/lib/skill-library.js";
 import { invalidateCtxMcpSocket } from "../plugins/ctx/lib/ctx-mcp-client.js";
 import { runPrefixedCommand } from "../plugins/ctx/lib/shell-runner.js";
+import { formatContextOSReady, inspectContextOSReady } from "../plugins/ctx/lib/certification.js";
 
 /**
  * Run a shell command with all output lines prefixed by │
@@ -190,11 +192,13 @@ Usage:
 ctx setup --no-skills Skip skill sync
 ctx setup --quiet Quiet mode (minimal output)
 ctx debug -- "task" Debug a task with ContextOS tracing
+ctx doctor Score repository ContextOS readiness
 ctx report Show last ContextOS compliance report
 ctx evidence Show evidence from last report
 ctx stats Show workspace statistics
 ctx benchmark -- "task" Benchmark workspace for a task
 ctx benchmark --skills Run skill routing eval benchmark
+ctx leaderboard --hallucination Compare raw agent guesses vs ContextOS routing
 ctx sync --rules Sync AGENTS.md rules to all agents
 ctx sync --rules --agents <names> Sync rules to specific agents only
 ctx sync --rules --dry-run Preview rule sync without writing
@@ -1001,6 +1005,8 @@ try {
     const task = marker >= 0 ? args.slice(marker + 1).join(" ") : args.slice(1).join(" ");
     if (!task.trim()) throw new Error('Usage: ctx debug -- "task"');
     await debug(task);
+  } else if (command === "doctor") {
+    console.log(formatContextOSReady(inspectContextOSReady({ cwd: process.cwd() })));
   } else if (command === "refresh") {
     await refresh();
   } else if (command === "autowarm") {
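Reviewer note on `ctx doctor`: the diff wires the command to `inspectContextOSReady` but does not show `plugins/ctx/lib/certification.js` in this section. As a rough picture of what the README describes (checking `AGENTS.md`, skill packs under `.codex/skills/` or `.agents/skills/`, and workflows under `.codex/workflows/` or `.claude/workflows/`), here is a minimal presence check over a list of repository paths. The `inspectPaths` helper and its boolean report are assumptions for illustration. The real command reports 0-100 scores and a tier, which this sketch does not attempt to reproduce.

```javascript
// Hypothetical sketch: a presence-only check over repo-relative paths.
// The locations come from the README text; everything else is illustrative.
const READINESS_LOCATIONS = {
  rules: ["AGENTS.md"],
  skills: [".codex/skills/", ".agents/skills/"],
  workflows: [".codex/workflows/", ".claude/workflows/"],
};

function inspectPaths(paths) {
  const report = {};
  for (const [area, candidates] of Object.entries(READINESS_LOCATIONS)) {
    report[area] = candidates.some((candidate) =>
      candidate.endsWith("/")
        ? paths.some((p) => p.startsWith(candidate)) // directory: any file beneath it
        : paths.includes(candidate)                  // file: exact match
    );
  }
  return report;
}

console.log(inspectPaths(["AGENTS.md", ".codex/skills/contextos-release/SKILL.md"]));
```

Taking a path list instead of touching the filesystem keeps the sketch easy to test, and a caller could feed it the output of any directory walk.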
@@ -1030,6 +1036,12 @@ try {
       if (!task.trim()) throw new Error('Usage: ctx benchmark -- "task"');
       console.log(formatBenchmark(benchmarkWorkspace({ cwd: process.cwd(), task })));
     }
+  } else if (command === "leaderboard") {
+    if (args.includes("--hallucination")) {
+      console.log(formatHallucinationLeaderboard(await runHallucinationLeaderboard({ rootDir })));
+    } else {
+      throw new Error("Usage: ctx leaderboard --hallucination");
+    }
   } else if (command === "skills") {
     if (args[1] === "doctor") {
       const marker = args.indexOf("--");
package/community-skills/README.md
ADDED
@@ -0,0 +1,42 @@
+# Community Skill Packs
+
+Community Skill Packs are the first public contribution loop for ContextOS.
+
+This is intentionally smaller than a hosted Hub. Contributors can open a PR that adds one folder with:
+
+```text
+community-skills/<skill-id>/
+  SKILL.md
+  skill.yaml
+```
+
+`SKILL.md` is the model-visible operating guide. `skill.yaml` is the routing contract used by the Skill Router.
+
+## Contract
+
+Every pack should define:
+
+- `name`: Human-readable skill name.
+- `positive_triggers`: Prompt, file, and dependency signals that should select the skill.
+- `evidence`: Project files and dependencies that support high confidence routing.
+- `negative_triggers`: Signals that should reduce confidence or reject the skill.
+- `workflow`: Short execution checklist for the agent.
+
+Use `_template/` when adding a new pack.
+
+## Seed Packs
+
+- `eas`: Expo EAS build, submit, preview, and production deployment.
+- `vercel`: Next.js and Vercel deployment debugging.
+- `prisma`: Prisma schema, migration, generated client, and query debugging.
+- `redis`: Redis cache, TTL, rate limiting, queue, and session work.
+- `oauth-google`: Google OAuth login and callback flows.
+- `jwt-auth`: JWT access token, refresh token, guard, and middleware work.
+
+## Contribution Rules
+
+- Keep triggers specific enough to avoid false positives.
+- Add negative triggers for adjacent but wrong platforms.
+- Prefer project evidence over broad prompt words.
+- Keep workflows compact and action-oriented.
+- Do not require optional graph or memory adapters.
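Reviewer note on the Contract section: the five required fields listed in the community README lend themselves to a simple pre-PR check. The sketch below assumes a `skill.yaml` has already been parsed into a plain object by some YAML parser (not shown here). The `missingContractFields` helper is hypothetical and is not the package's routing contract test, which this section of the diff does not include.

```javascript
// Hypothetical sketch: reports which contract fields are absent or empty in a
// parsed skill.yaml object. Field names come from the community README.
const REQUIRED_FIELDS = ["name", "positive_triggers", "evidence", "negative_triggers", "workflow"];

function missingContractFields(pack) {
  return REQUIRED_FIELDS.filter((field) => {
    const value = pack[field];
    if (value == null) return true;
    if (Array.isArray(value)) return value.length === 0;
    if (typeof value === "object") return Object.keys(value).length === 0;
    return String(value).trim() === "";
  });
}

// Object form of the `_template/skill.yaml` values shown in this diff.
const templatePack = {
  id: "example-skill",
  name: "Example Skill",
  positive_triggers: { prompts: ["example task", "example failure"] },
  evidence: { files: ["example.config.js"] },
  negative_triggers: { prompts: ["unrelated task"] },
  workflow: ["Confirm project evidence before selecting this skill."],
};

console.log(missingContractFields(templatePack));     // []
console.log(missingContractFields({ name: "Draft" })); // the four other fields
```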
package/community-skills/_template/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: Example Skill
+description: Replace this with the specific task capability this skill helps with.
+---
+
+# Example Skill
+
+Use this skill when the prompt and project evidence match `skill.yaml`.
+
+## Workflow
+
+1. Confirm the matching files, dependencies, or config exist.
+2. Inspect the smallest relevant implementation surface.
+3. Apply the change using the project's existing patterns.
+4. Run the narrowest useful verification command.
package/community-skills/_template/skill.yaml
ADDED
@@ -0,0 +1,20 @@
+id: example-skill
+name: Example Skill
+description: Replace this with a concise skill description.
+positive_triggers:
+  prompts: [example task, example failure]
+  files: [example.config.js]
+  dependencies: [example-package]
+evidence:
+  files: [example.config.js]
+  dependencies: [example-package]
+negative_triggers:
+  prompts: [unrelated task]
+  files: [wrong.config.js]
+  dependencies: [wrong-package]
+workflow:
+  - Confirm project evidence before selecting this skill.
+  - Inspect the smallest relevant files.
+  - Make the targeted change.
+  - Run focused verification.
+related_skills: []
package/community-skills/eas/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: Expo EAS Deployment
+description: Fix Expo EAS builds, submit flows, preview builds, production releases, and mobile deployment failures.
+---
+
+# Expo EAS Deployment
+
+Use this skill for Expo or React Native projects with EAS evidence such as `eas.json`, `app.json`, `app.config.ts`, `expo`, or `eas-cli`.
+
+## Workflow
+
+1. Inspect `eas.json`, app config, package scripts, and CI workflow files.
+2. Identify whether the failure is credentials, profile config, native dependency, bundling, or submit configuration.
+3. Patch the smallest config or package boundary that explains the log.
+4. Verify with the relevant EAS, Expo, or package build command.
package/community-skills/eas/skill.yaml
ADDED
@@ -0,0 +1,23 @@
+id: eas
+name: Expo EAS Deployment
+description: Fix Expo EAS builds, submit, Android and iOS deployment failures.
+positive_triggers:
+  prompts: [eas, expo build, deployed, deploy, android, ios, submit, preview, production, qr, connect]
+  files: [eas.json, app.json, app.config.js, app.config.ts]
+  dependencies: [expo, eas-cli, expo-router, react-native]
+evidence:
+  files: [eas.json, app.json, app.config.js, app.config.ts, .github/workflows/*]
+  dependencies: [expo, eas-cli, expo-router, react-native]
+negative_triggers:
+  dependencies: [next, vite, vercel]
+  files: [vercel.json, next.config.js, next.config.ts]
+workflow:
+  - Inspect EAS profiles, app config, package scripts, and CI workflows.
+  - Classify the failure as credentials, build profile, native dependency, bundling, or submit config.
+  - Patch only the config or package boundary that matches the log.
+  - Verify with the relevant Expo or EAS command.
+related_skills:
+  - mobile-deployment
+  - github-actions-ci-cd
+  - env-secret-management
+  - build-log-debugging
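Reviewer note on the `eas` routing contract: the interplay of prompt triggers, project evidence, and negative triggers can be illustrated with a deliberately simplified gate. The values below are a subset of the `eas/skill.yaml` fields above. The `shouldSelect` helper is hypothetical: it returns a hard yes/no, whereas the README describes the Skill Router as score-based, with negative triggers that may only reduce confidence. Glob evidence such as `.github/workflows/*` is left out because this sketch does exact matching only.

```javascript
// Hypothetical sketch: a boolean gate, not the package's Skill Router.
const easPack = {
  id: "eas",
  positive_triggers: { prompts: ["eas", "expo build", "deployed", "deploy"] },
  evidence: { files: ["eas.json", "app.json"], dependencies: ["expo", "eas-cli"] },
  negative_triggers: { files: ["vercel.json"], dependencies: ["next", "vite", "vercel"] },
};

function shouldSelect(pack, { prompt, files = [], dependencies = [] }) {
  const text = prompt.toLowerCase();
  const hasAny = (wanted = [], present = []) => wanted.some((w) => present.includes(w));

  const promptHit = pack.positive_triggers.prompts.some((p) => text.includes(p));
  const evidenceHit =
    hasAny(pack.evidence.files, files) || hasAny(pack.evidence.dependencies, dependencies);
  const negativeHit =
    hasAny(pack.negative_triggers.files, files) ||
    hasAny(pack.negative_triggers.dependencies, dependencies);

  // Select only when the prompt matches, the repo backs it up, and no
  // adjacent-platform signal is present.
  return promptHit && evidenceHit && !negativeHit;
}

// Same prompt, different repo context.
console.log(shouldSelect(easPack, { prompt: "fix deployed", files: ["eas.json"], dependencies: ["expo"] }));    // true
console.log(shouldSelect(easPack, { prompt: "fix deployed", files: ["vercel.json"], dependencies: ["next"] })); // false
```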
package/community-skills/jwt-auth/SKILL.md
ADDED
@@ -0,0 +1,15 @@
+---
+name: JWT Auth
+description: Implement and debug JWT login, access tokens, refresh tokens, guards, middleware, and authorization headers.
+---
+
+# JWT Auth
+
+Use this skill when the prompt mentions JWT, access tokens, refresh tokens, bearer auth, guards, or token verification with project evidence such as `jsonwebtoken` or auth middleware.
+
+## Workflow
+
+1. Inspect token signing, verification, expiry, refresh rotation, and guard or middleware wiring.
+2. Confirm how tokens are stored and sent by clients.
+3. Patch the narrow auth boundary without changing unrelated authorization policy.
+4. Verify with focused auth tests and typecheck.
@@ -0,0 +1,22 @@
+id: jwt-auth
+name: JWT Auth
+description: Implement and debug JWT authentication, refresh tokens, access tokens, guards, middleware, and login authorization.
+positive_triggers:
+  prompts: [jwt, auth, token, access token, refresh token, bearer, login, guard, middleware]
+  files: [auth.guard.ts, jwt.strategy.ts, middleware.ts]
+  dependencies: [jsonwebtoken, jose, @nestjs/jwt, passport-jwt]
+evidence:
+  files: [auth.guard.ts, jwt.strategy.ts, middleware.ts, .env.example]
+  dependencies: [jsonwebtoken, jose, @nestjs/jwt, passport-jwt]
+negative_triggers:
+  prompts: [google oauth, social login, saml]
+  dependencies: [next-auth, @auth/core, passport-google-oauth20]
+workflow:
+  - Inspect signing, verification, expiry, refresh rotation, and guard or middleware wiring.
+  - Confirm token storage and client authorization headers.
+  - Patch the narrow auth boundary without changing unrelated authorization policy.
+  - Verify with focused auth tests and typecheck.
+related_skills:
+  - oauth-google
+  - backend-security-coder
+  - api-security-best-practices
@@ -0,0 +1,15 @@
+---
+name: Google OAuth
+description: Implement and debug Google OAuth login, callback routes, Auth.js or NextAuth providers, scopes, and account linking.
+---
+
+# Google OAuth
+
+Use this skill when the prompt mentions Google sign-in or OAuth and the repo has OAuth evidence such as `next-auth`, `@auth/core`, Passport Google OAuth, or auth callback routes.
+
+## Workflow
+
+1. Inspect provider configuration, callback URL handling, scopes, secrets, and session creation.
+2. Verify that frontend login entrypoints and backend callback routes agree.
+3. Patch the smallest auth boundary and keep existing session/token conventions.
+4. Verify with focused auth tests, typecheck, or a local OAuth callback path when available.
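Step 2 of the workflow asks whether the login entrypoint and the backend callback route agree. One way to check this mechanically is to read the `redirect_uri` out of the generated login URL and compare it with the route the backend serves, as in the sketch below. The helper names, the client ID, and the example callback paths are illustrative assumptions; the authorization endpoint and query parameter names are those of Google's OAuth 2.0 web-server flow.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_login_url(client_id: str, redirect_uri: str, scopes: List[str], state: str) -> str:
    """Assemble the URL a frontend login button would send the browser to."""
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "response_type": "code",
            "scope": " ".join(scopes),
            "state": state,
        }
    )
    return f"{GOOGLE_AUTH_ENDPOINT}?{query}"


def redirect_matches(login_url: str, backend_callback: str) -> bool:
    """Compare the redirect_uri in the login URL with the backend route.

    The comparison is an exact string match: scheme, host, port, path,
    and any trailing slash all have to be identical.
    """
    params = parse_qs(urlparse(login_url).query)
    return params.get("redirect_uri", [None])[0] == backend_callback


if __name__ == "__main__":
    url = build_login_url(
        client_id="example-client-id",  # placeholder value
        redirect_uri="http://localhost:3000/api/auth/callback/google",
        scopes=["openid", "email", "profile"],
        state="random-state",
    )
    print(redirect_matches(url, "http://localhost:3000/api/auth/callback/google"))
    print(redirect_matches(url, "http://localhost:3000/auth/google/callback"))
```

A mismatch here typically surfaces as a `redirect_uri_mismatch` error from Google, so comparing the two strings locally is a cheap first check before inspecting provider settings.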
@@ -0,0 +1,22 @@
+id: oauth-google
+name: Google OAuth
+description: Implement and debug Google OAuth login, OAuth callback routes, Auth.js, NextAuth, Passport providers, and social sign-in.
+positive_triggers:
+  prompts: [oauth, google login, google sign in, callback, social login, auth provider]
+  files: [auth.config.ts, auth.ts, pages/api/auth/*, app/api/auth/*]
+  dependencies: [next-auth, @auth/core, passport-google-oauth20]
+evidence:
+  files: [auth.config.ts, auth.ts, pages/api/auth/*, app/api/auth/*, .env.example]
+  dependencies: [next-auth, @auth/core, passport, passport-google-oauth20]
+negative_triggers:
+  prompts: [jwt only, password login, refresh token only]
+  dependencies: [jsonwebtoken]
+workflow:
+  - Inspect provider config, callback URLs, scopes, secrets, and session creation.
+  - Verify frontend login entrypoints and backend callback routes agree.
+  - Patch the smallest auth boundary while preserving session conventions.
+  - Verify with focused auth tests, typecheck, or local callback flow.
+related_skills:
+  - jwt-auth
+  - env-secret-management
+  - auth-implementation-patterns
@@ -0,0 +1,15 @@
+---
+name: Prisma
+description: Debug Prisma schemas, migrations, generated client issues, relations, transactions, and slow ORM queries.
+---
+
+# Prisma
+
+Use this skill when the repo contains Prisma evidence such as `prisma/schema.prisma`, `@prisma/client`, or `prisma`.
+
+## Workflow
+
+1. Inspect `prisma/schema.prisma`, generated client usage, migrations, and package scripts.
+2. Determine whether the issue is schema shape, migration state, generated types, query logic, or transaction behavior.
+3. Patch the schema or service layer in the smallest compatible step.
+4. Verify with Prisma generate/migrate checks and the relevant test or typecheck.
@@ -0,0 +1,22 @@
+id: prisma
+name: Prisma
+description: Debug Prisma schema, migrations, generated client, relations, and ORM queries.
+positive_triggers:
+  prompts: [prisma, migration, schema, database, query, relation, transaction, generated client]
+  files: [prisma/schema.prisma]
+  dependencies: [prisma, @prisma/client]
+evidence:
+  files: [prisma/schema.prisma, prisma/migrations/*]
+  dependencies: [prisma, @prisma/client]
+negative_triggers:
+  dependencies: [mongoose, mongodb, sequelize, typeorm]
+  files: [drizzle.config.ts]
+workflow:
+  - Inspect schema, migrations, generated client usage, and package scripts.
+  - Classify the issue as schema, migration, generated type, query, or transaction behavior.
+  - Patch the smallest schema or service boundary.
+  - Verify with Prisma generate or migrate checks plus focused tests.
+related_skills:
+  - database
+  - nestjs-module
+  - integration-testing
@@ -0,0 +1,15 @@
+---
+name: Redis
+description: Add and debug Redis caching, TTLs, sessions, queues, rate limits, and cache invalidation.
+---
+
+# Redis
+
+Use this skill when the repo has Redis evidence such as `redis`, `ioredis`, BullMQ, session stores, or cache modules.
+
+## Workflow
+
+1. Inspect Redis client setup, cache keys, TTL policy, and invalidation paths.
+2. Confirm whether Redis is used for cache, queue, session, rate limit, or pub/sub behavior.
+3. Patch the smallest client/service boundary and preserve existing key conventions.
+4. Verify with focused tests or a local command that exercises the Redis path.
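The workflow's first step groups cache keys, TTL policy, and invalidation paths because the three usually have to be read together. The sketch below shows a cache-aside read with a TTL and an invalidate-on-write path. To stay runnable without a Redis server it uses `InMemoryStore`, a hypothetical stand-in that exposes only `get`, `set(key, value, ex=seconds)`, and `delete`; the key format and function names are likewise assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for a Redis client, limited to get/set(ex=)/delete."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries behave as if the key had never been set.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def profile_key(user_id: int) -> str:
    """Single place that defines the key convention, so reads and writes agree."""
    return f"user:{user_id}:profile"


def get_profile(store: InMemoryStore, user_id: int,
                load: Callable[[int], str], ttl_seconds: float = 60) -> str:
    """Cache-aside read: try the cache, fall back to the loader, then populate."""
    key = profile_key(user_id)
    cached = store.get(key)
    if cached is not None:
        return cached
    value = load(user_id)
    store.set(key, value, ex=ttl_seconds)
    return value


def update_profile(store: InMemoryStore, user_id: int,
                   save: Callable[[int, str], None], new_value: str) -> None:
    """Write to the source of truth first, then drop the stale cache entry."""
    save(user_id, new_value)
    store.delete(profile_key(user_id))
```

Routing both the read and the write through `profile_key` is what keeps the invalidation path aligned with the existing key convention; a bug where the two paths build different keys leaves stale data in place until the TTL expires.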