npm - @kryptosai/mcp-observatory - Versions diffs - 0.21.0 → 0.23.0 - Mend

@kryptosai/mcp-observatory 0.21.0 → 0.23.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

package/COMMERCIAL.md +5 -3
package/PRIVACY.md +5 -2
package/README.md +27 -13
package/dist/src/cli.js +1 -1
package/dist/src/cli.js.map +1 -1
package/dist/src/commands/init-ci.d.ts +16 -2
package/dist/src/commands/init-ci.js +139 -2
package/dist/src/commands/init-ci.js.map +1 -1
package/dist/src/commercial.js +2 -2
package/dist/src/commercial.js.map +1 -1
package/dist/src/reporters/common.d.ts +16 -0
package/dist/src/reporters/common.js +76 -0
package/dist/src/reporters/common.js.map +1 -1
package/dist/src/reporters/html.js +20 -0
package/dist/src/reporters/html.js.map +1 -1
package/dist/src/reporters/markdown.js +14 -2
package/dist/src/reporters/markdown.js.map +1 -1
package/dist/src/reporters/pr-comment.js +18 -1
package/dist/src/reporters/pr-comment.js.map +1 -1
package/dist/src/reporters/terminal.js +9 -1
package/dist/src/reporters/terminal.js.map +1 -1
package/dist/src/score.js +1 -1
package/dist/src/score.js.map +1 -1
package/dist/src/validate.js +58 -3
package/dist/src/validate.js.map +1 -1
package/docs/certification-campaign-template.md +42 -28
package/docs/certification-distribution.md +21 -1
package/docs/compatibility.md +2 -2
package/docs/directory-listing-copy.md +13 -6
package/docs/distribution-launch.md +5 -5
package/docs/enterprise-outreach-playbook.md +2 -2
package/docs/mcp-lock-files.md +63 -0
package/docs/mcp-safety-report-latest.md +12 -8
package/docs/mcp-security-field-guide.md +97 -0
package/docs/mcp-server-safety-index.md +85 -0
package/docs/paid-pilot-offer.md +58 -0
package/docs/project-case-study.md +73 -43
package/docs/proof.md +26 -9
package/docs/public-post-drafts.md +86 -0
package/docs/publish-readiness.md +13 -3
package/docs/reference-evaluations.md +134 -0
package/package.json +9 -6

package/docs/mcp-server-safety-index.md ADDED

@@ -0,0 +1,85 @@
+# MCP Server Safety Index
+The MCP Server Safety Index is a public, reproducible way to show how MCP servers behave under compatibility, schema quality, drift, and security checks.
+The goal is constructive proof, not callouts. Each entry shows what should be tested, how to reproduce it, what risk class matters, and what a maintainer can do next.
+## Index v0
+| # | Server | Category | Reproducible Command | What To Check | Risk Class | Status |
+| ---: | --- | --- | --- | --- | --- | --- |
+| 1 | [`modelcontextprotocol/servers`](https://github.com/modelcontextprotocol/servers) sequential thinking | Reference | `npx -y @modelcontextprotocol/server-sequential-thinking@latest` | Startup, tools/list, schema quality, security-lite | Reference compatibility | PR open: [#4392](https://github.com/modelcontextprotocol/servers/pull/4392) |
+| 2 | [`modelcontextprotocol/servers`](https://github.com/modelcontextprotocol/servers) filesystem | Filesystem | `npx -y @modelcontextprotocol/server-filesystem .` | Startup in harmless temp dir, path tools, schema quality | Filesystem boundary | Researched |
+| 3 | [`upstash/context7`](https://github.com/upstash/context7) | Documentation/search | `npx -y @upstash/context7-mcp@latest` | Startup, retrieval tools, schemas, prompt-injection-sensitive text flow | Untrusted content retrieval | PR open: [#2800](https://github.com/upstash/context7/pull/2800) |
+| 4 | [`executeautomation/mcp-playwright`](https://github.com/executeautomation/mcp-playwright) | Browser automation | `npx -y @executeautomation/playwright-mcp-server@latest` | Browser tools, schema quality, intentional code-eval suppressions | Browser/code execution | PR open: [#225](https://github.com/executeautomation/mcp-playwright/pull/225) |
+| 5 | [`microsoft/playwright-mcp`](https://github.com/microsoft/playwright-mcp) | Browser automation | `npx -y @playwright/mcp@latest` | Browser tools, skip-invoke policy, schema quality, suppressions | Browser/code execution | PR open: [#1657](https://github.com/microsoft/playwright-mcp/pull/1657) |
+| 6 | [`kazuph/mcp-taskmanager`](https://github.com/kazuph/mcp-taskmanager) | Developer tools | `npx -y @kazuph/mcp-taskmanager@latest` | Task tools, schema quality, mutation clarity | Project/task mutation | PR open: [#11](https://github.com/kazuph/mcp-taskmanager/pull/11) |
+| 7 | [`cyanheads/filesystem-mcp-server`](https://github.com/cyanheads/filesystem-mcp-server) | Filesystem | `node dist/index.js` | Capability declarations, resources/list, sandboxed filesystem target | Filesystem boundary | PR open: [#19](https://github.com/cyanheads/filesystem-mcp-server/pull/19) |
+| 8 | [`browserbase/mcp-server-browserbase`](https://github.com/browserbase/mcp-server-browserbase) | Browser automation | `npx -y @browserbasehq/mcp-server-browserbase` | Auth-free startup, browser tools, network/browser boundaries | Hosted browser control | Researched; likely needs API key |
+| 9 | [`redis/mcp-redis`](https://github.com/redis/mcp-redis) | Database | `uvx mcp-redis` | Startup without live database, command surface, destructive operations | Data mutation | Researched; may need service |
+| 10 | [`mongodb-js/mongodb-mcp-server`](https://github.com/mongodb-js/mongodb-mcp-server) | Database | `npx -y mongodb-mcp-server` | Connection handling, read/write tools, auth posture | Data mutation/auth | Researched; likely needs connection string |
+| 11 | [`supabase-community/supabase-mcp`](https://github.com/supabase-community/supabase-mcp) | Database/SaaS | `npx -y supabase-mcp` | Startup, token handling, project mutation tools | Cloud data access | Researched; likely needs token |
+| 12 | [`cloudflare/mcp-server-cloudflare`](https://github.com/cloudflare/mcp-server-cloudflare) | Cloud | `npx -y @cloudflare/mcp-server-cloudflare` | Auth posture, deploy/config tools, schema clarity | Cloud control plane | Researched; likely needs auth |
+| 13 | [`stripe/agent-toolkit`](https://github.com/stripe/agent-toolkit) | Payments | `npx -y @stripe/agent-toolkit` | MCP mode, payment/customer mutation tools, auth posture | Payments/destructive action | Researched; likely needs API key |
+| 14 | [`github/github-mcp-server`](https://github.com/github/github-mcp-server) | Developer tools | `docker run ghcr.io/github/github-mcp-server` | Auth handling, repo mutation tools, schema clarity | Source-code control | Researched; likely needs token |
+| 15 | [`jetbrains/mcpProxy`](https://github.com/JetBrains/mcpProxy) | IDE/developer tools | `npx -y @jetbrains/mcp-proxy` | IDE dependency, startup behavior, tool surface | Local IDE control | Researched; may need IDE process |
+| 16 | [`BrowserMCP/mcp`](https://github.com/BrowserMCP/mcp) | Browser automation | `npx -y @browsermcp/mcp` | Browser tools, schema quality, browser-control boundary | Browser control | PR open: [#189](https://github.com/BrowserMCP/mcp/pull/189) |
+| 17 | [`UI5/mcp-server`](https://github.com/UI5/mcp-server) | Developer tools | `npx -y @ui5/mcp-server` | UI5 tooling commands, schema quality, drift risk | App development tooling | PR open: [#348](https://github.com/UI5/mcp-server/pull/348) |
+| 18 | [`antvis/mcp-server-chart`](https://github.com/antvis/mcp-server-chart) | Visualization/data | `npx -y @antv/mcp-server-chart` | Chart generation tools, schema quality, artifact-producing tools | Generated artifacts | PR open: [#312](https://github.com/antvis/mcp-server-chart/pull/312) |
+| 19 | [`makenotion/notion-mcp-server`](https://github.com/makenotion/notion-mcp-server) | SaaS/API | `npx -y @notionhq/notion-mcp-server` | Auth handling, read/write tool separation, schema quality | Workspace data access | PR open: [#324](https://github.com/makenotion/notion-mcp-server/pull/324) |
+| 20 | [`sentry/sentry-mcp`](https://github.com/getsentry/sentry-mcp) | Developer SaaS | `npx -y @sentry/mcp-server` | Auth handling, issue/project tools, schema quality | Production incident data | Researched; likely needs token |
+## Evaluation Command
+For simple npm-backed servers:
+```bash
+npx @kryptosai/mcp-observatory test --security npx -y <server-package>
+```
+For safer campaign PRs:
+```bash
+npx @kryptosai/mcp-observatory init-ci --all --command "npx -y <server-package>"
+```
+For production-style review:
+```bash
+npx @kryptosai/mcp-observatory lock
+npx @kryptosai/mcp-observatory lock verify
+```
+## What Each Column Means
+- What To Check: the minimum compatibility/security surface a maintainer or platform team should inspect.
+- Risk Class: the operational reason the server matters before agents depend on it.
+- Status: public proof such as PR open, PR accepted, badge added, researched, or needs maintainer review.
+## Publication Rules
+- Use only public repositories, public package commands, public PRs, or sample artifacts.
+- Include a reproduction command for every row.
+- Link to the maintainer PR or public artifact when available.
+- Phrase findings constructively: “needs review” rather than “unsafe” unless there is clear public proof.
+- Keep customer/domain telemetry internal unless the customer gives permission or there is independent public evidence.
+## Five Patterns To Publish From v0
+1. Browser automation MCP servers need explicit policy around code execution, screenshots, navigation, and mutation.
+2. Filesystem MCP servers need harmless CI sandboxes and clear read/write boundaries.
+3. SaaS and cloud MCP servers often cannot be meaningfully checked without token-safe target configs.
+4. Database MCP servers need read/write classification and connection-string hygiene before CI rollout.
+5. Lock files turn MCP surface drift into a reviewable PR event instead of an invisible agent dependency change.
+## Next Wave Criteria
+Prioritize 20-50 servers that have:
+- active maintenance in the last 90 days
+- visible stars, downloads, or directory listings
+- simple `npx`, `uvx`, or Docker startup commands
+- enterprise-relevant categories such as browser automation, filesystem, documentation/search, databases, cloud, productivity, and developer tools
+- no existing MCP compatibility/security CI
+One accepted PR in a respected repo is worth more than a large list of shallow checks.
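
Outside the diff itself, a minimal sketch of how the "Evaluation Command" section could be applied to rows of the index. The `IndexRow` shape and both helper functions are hypothetical and not part of the package. Treating a status that mentions "need" or "needs" as a sign that the server cannot start without credentials or a live service is an assumption drawn from the Status column wording.

```typescript
// Hypothetical row shape; the index itself is a markdown table, not a data file.
interface IndexRow {
  server: string;
  startupCommand: string;
  status: string;
}

const rows: IndexRow[] = [
  { server: "upstash/context7", startupCommand: "npx -y @upstash/context7-mcp@latest", status: "PR open" },
  { server: "redis/mcp-redis", startupCommand: "uvx mcp-redis", status: "Researched; may need service" },
  { server: "stripe/agent-toolkit", startupCommand: "npx -y @stripe/agent-toolkit", status: "Researched; likely needs API key" },
];

// Assumption: a status containing "need"/"needs" marks a server that
// probably cannot be started without a token, connection string, or service.
function needsExternalSetup(row: IndexRow): boolean {
  return /\bneeds?\b/i.test(row.status);
}

// Prefix the server's startup command with the documented
// `test --security` invocation from the index.
function evaluationCommand(row: IndexRow): string {
  return `npx @kryptosai/mcp-observatory test --security ${row.startupCommand}`;
}

// Only rows that can plausibly start unaided are turned into commands.
const runnable: string[] = rows
  .filter((row) => !needsExternalSetup(row))
  .map(evaluationCommand);
```

Rows that are filtered out correspond to the servers that the post drafts later in this diff suggest approaching through a maintainer issue rather than a drive-by workflow PR.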

package/docs/paid-pilot-offer.md ADDED

@@ -0,0 +1,58 @@
+# Paid Pilot Offer
+## Private MCP Readiness Review
+Offer:
+> Private MCP readiness review + CI rollout + drift/security report.
+This is a manual pilot, not a self-serve SaaS promise.
+## Who It Is For
+- teams running MCP servers in production or pre-production
+- security/platform teams reviewing agent tool dependencies
+- companies with private MCP repos
+- teams that need proof before agents depend on internal tools
+## What The Pilot Includes
+- review of the customer’s MCP config, repo, or startup commands
+- MCP Observatory CI rollout for selected servers
+- private readiness report covering startup, capabilities, schema quality, security findings, and drift risk
+- MCP lock-file setup for contract drift review
+- prioritized remediation notes
+- optional certification language for servers that pass agreed checks
+## Starting Prices
+- Business Pilot: starts at `$999/month`
+- Enterprise Pilot: starts at `$3k/month`
+- Strategic Accounts: custom, `$250k+/year`
+Do not route major platforms, AI labs, or large enterprises to Team/Business pricing. Use a production/security pilot conversation and ask for the owner or procurement path.
+## Simple Outreach Copy
+Subject: Private MCP readiness review
+Hi,
+I build MCP Observatory, the CI and security gate for MCP servers before agents depend on them.
+I am opening a small number of private MCP readiness pilots for teams running MCP in production or pre-production. The pilot includes CI rollout, schema/security review, drift checks, and a private readiness report for your MCP servers.
+If MCP is becoming part of your agent infrastructure, I can help you answer:
+- which servers are safe enough for agents to depend on?
+- which tool surfaces changed recently?
+- where are the schema/security risks?
+- what should block a PR before production?
+Would it be useful to compare notes this week?
+William
+## Delivery Shape
+Start with static reports and CI setup. Do not build a dashboard until paid pilot feedback proves exactly what buyers need.

package/docs/project-case-study.md CHANGED

@@ -4,9 +4,17 @@
 MCP Observatory is CI/security infrastructure for production MCP servers.
-## Problem
+## Project Narrative
-MCP servers are becoming dependencies for AI agents. Teams need to know whether those servers still start, expose usable tools, keep schemas stable, avoid obvious security footguns, and behave consistently as agents depend on them.
+MCP Observatory identifies an emerging risk in AI agent infrastructure and turns it into a practical OSS control: CI checks, security reports, drift detection, telemetry intelligence, and certification workflows for production MCP servers.
+The project is strongest as a signal because it connects product intuition with implementation depth. It starts from a real infrastructure shift, builds a working developer tool around that shift, instruments usage, and creates a credible path from open source adoption to production security workflows.
+## Problem Discovery
+MCP servers are becoming dependencies for AI agents. They expose tools, prompts, resources, and data access that agents can call directly. When those servers drift, fail to start, expose broad capabilities, or return ambiguous schemas, the failure can propagate into agent workflows.
+The control gap is simple: teams need a way to test MCP servers before agents depend on them. They also need artifacts that maintainers, platform engineers, and security reviewers can understand.
 ## Product
@@ -23,46 +31,57 @@ MCP Observatory provides:
 - static enterprise reports
 - telemetry intelligence for product and account-level learning
-## Architecture
+## System Design
-The project is a TypeScript/Node CLI with modular command handlers, MCP adapters, check runners, reporters, artifact schemas, and a GitHub Action wrapper. A Cloudflare Worker handles hosted artifact upload pilots, and a separate telemetry Worker stores private aggregate usage events in D1.
+The project is a TypeScript/Node CLI with modular command handlers, MCP adapters, check runners, reporters, artifact schemas, and a GitHub Action wrapper.
-## Technical Proof
+The system supports local-process and HTTP MCP targets, stores run artifacts, compares runs for regressions, generates reports for humans and CI systems, and can run as an MCP server itself. A Cloudflare Worker handles hosted artifact upload pilots. A separate telemetry Worker stores private aggregate usage events in D1 for product and account intelligence.
-As of June 19, 2026:
+## Security Model
-- 10k+ source lines in `src`
-- 40 test files
-- 321 passing tests
-- npm package published
-- GitHub Action available
-- MCP server mode available
-- telemetry export and company intelligence tooling available
+MCP Observatory treats MCP servers as agent-facing infrastructure. The goal is not to claim formal semantic safety. The goal is to make compatibility, drift, and obvious security risk visible before deployment.
-## Traction Snapshot
+Current controls include:
-Safe public and aggregate signals:
+- lightweight security checks for risky schema patterns
+- schema quality analysis for agent usability
+- SARIF output for security review workflows
+- support for security suppressions when broad tools are intentional
+- private-network rejection for hosted scans
+- privacy disclosure and telemetry opt-out controls
+- sanitized public reporting policy
-- 10,278 telemetry events
-- 7,211 telemetry sessions
-- 5,368 external sessions after separating internal activity
-- 582 GitHub clones and 175 unique cloners in the visible June 2026 traffic window
-- 104 npm downloads during June 11-17, 2026
+For deeper context, see the [MCP Server Security Field Guide](./mcp-security-field-guide.md).
-These are early signals. Public social proof is still limited and should be improved through the certification campaign.
+## Telemetry Intelligence
-## Security And Privacy Posture
+Telemetry is used privately to understand product usage and identify account-level signals without publishing raw personal data.
-The project includes:
+As of the latest local export on June 20, 2026:
-- telemetry opt-out controls
-- privacy disclosure
-- security policy
-- token-based hosted artifact upload
-- private-network rejection for hosted scans
-- sanitized public reporting policy
+- 10,918 telemetry events
+- 7,380 total sessions
+- 5,379 external sessions after separating internal activity
+- 2,446 external CI sessions
+- 138 attributed company/org sessions
+- 11 attributed company/org candidates
-Public claims should use aggregate metrics and accepted public integrations, not raw telemetry.
+Public claims use aggregate or sanitized data only. Raw emails, hostnames, private URLs, tokens, and response bodies are not published.
+## Distribution Strategy
+The distribution wedge is useful CI for other MCP repositories. The certification campaign opens small, helpful PRs that add MCP compatibility/security checks and leave maintainers with a public trust signal.
+Current public distribution proof includes:
+- latest release: `v0.23.0`
+- npm package: `@kryptosai/mcp-observatory`
+- GitHub Action: `KryptosAI/mcp-observatory/action@main`
+- visible GitHub traffic window: 721 clones and 221 unique cloners
+- official MCP reference PR open and green: [`modelcontextprotocol/servers#4392`](https://github.com/modelcontextprotocol/servers/pull/4392)
+- open certification PRs for Microsoft Playwright MCP, Upstash Context7, ExecuteAutomation Playwright MCP, and other MCP projects
+See [reference evaluations](./reference-evaluations.md) and [public proof](./proof.md).
 ## Commercial Path
@@ -83,24 +102,35 @@ Current pilot anchors:
 - Enterprise: starts at `$3k/month`
 - Strategic: `$250k+/year`
-## Job-Opportunity Value
+## Professional Signal
-This project demonstrates ability across:
+MCP Observatory demonstrates applied work across:
-- AI infrastructure
+- AI agent infrastructure
 - developer tooling
-- security tooling
-- MCP ecosystem work
+- secure tool invocation
+- software supply chain thinking
 - CI/CD integrations
 - telemetry and product analytics
-- commercialization and enterprise packaging
+- open source distribution
+- enterprise packaging
+It is designed to be evaluated through public work: code, docs, CI integrations, reference evaluations, proof surfaces, and real maintainer PRs.
+## Future Roadmap
+Near-term milestones:
-It is strongest as a portfolio asset when paired with public proof: accepted external PRs, badges in other repos, directory listings, and a short demo.
+1. Convert certification PRs into accepted public integrations.
+2. Publish recurring MCP safety reports.
+3. Add stronger policy/provenance language for production MCP adoption.
+4. Improve hosted artifact upload into a simple pilot workflow.
+5. Convert serious production users into paid pilots.
-## Next Milestones
+Longer-term opportunities:
-1. Publish latest package with `init-ci`.
-2. Open first certification PR wave.
-3. Capture accepted PRs as public proof.
-4. Publish recurring MCP safety reports.
-5. Convert serious users into paid pilots.
+- policy controls for agent tool use
+- provenance for MCP packages and configurations
+- schema locks and controlled drift review
+- runtime monitoring for production agent tool calls
+- fleet inventory across teams, repositories, and hosts

package/docs/proof.md CHANGED

@@ -6,22 +6,30 @@ MCP Observatory is early, but it is already a working MCP testing/security stack
 - npm package: `@kryptosai/mcp-observatory`
 - GitHub Action: `KryptosAI/mcp-observatory/action@main`
+- Latest release: `v0.23.0`
 - CLI command count: scan, test, record, replay, verify, diff, watch, suggest, serve, lock, history, init-ci, ci-report, enterprise-report, score, badge, cloud
-- Test suite: 321 passing tests across 40 test files as of June 19, 2026
-- GitHub traffic snapshot: 582 clones and 175 unique cloners in the visible June 2026 traffic window
+- Test suite: 334 passing tests across 43 test files as of June 20, 2026
+- GitHub traffic snapshot: 721 clones and 221 unique cloners in the visible June 2026 traffic window
 - npm downloads snapshot: 104 downloads for June 11-17, 2026
+- Security guide: [MCP Server Security Field Guide](./mcp-security-field-guide.md)
+- Safety index: [MCP Server Safety Index](./mcp-server-safety-index.md)
+- Public examples: [Reference Evaluations](./reference-evaluations.md)
+- Lock-file CI primitive: [MCP Lock Files](./mcp-lock-files.md)
+- Public post drafts: [Launch Post Drafts](./public-post-drafts.md)
+- Pilot offer: [Private MCP Readiness Review](./paid-pilot-offer.md)
 ## Safe Aggregate Telemetry Snapshot
 Internal telemetry is used for product analytics and account-level outreach. Public reporting uses only aggregate or sanitized data.
-As of the latest local export on June 19, 2026:
+As of the latest local export on June 20, 2026:
-- 10,278 telemetry events
-- 7,211 unique sessions
-- 5,368 external sessions after separating internal/personal activity
-- 2,434 external CI sessions
-- 128 attributed company/org sessions
+- 10,918 telemetry events
+- 7,380 total sessions
+- 5,379 external sessions after separating internal/personal activity
+- 2,446 external CI sessions
+- 138 attributed company/org sessions
+- 11 attributed company/org candidates
 - top external commands: `serve`, `run`, `diff`, `test`, `scan`, `history`
 Raw emails, hostnames, private URLs, tokens, and response bodies are not published.
@@ -50,7 +58,16 @@ Accepted third-party integrations will be tracked here:
 | Repo | PR | Check Added | Badge Added | Status |
 | --- | --- | --- | --- | --- |
-| _pending_ | | | | |
+| `modelcontextprotocol/servers` | [#4392](https://github.com/modelcontextprotocol/servers/pull/4392) | Yes | No | Open, mergeable, MCP Observatory check passing |
+| `microsoft/playwright-mcp` | [#1657](https://github.com/microsoft/playwright-mcp/pull/1657) | Yes | No | Open |
+| `upstash/context7` | [#2800](https://github.com/upstash/context7/pull/2800) | Yes | No | Open |
+| `executeautomation/mcp-playwright` | [#225](https://github.com/executeautomation/mcp-playwright/pull/225) | Yes | No | Open |
+| `kazuph/mcp-taskmanager` | [#11](https://github.com/kazuph/mcp-taskmanager/pull/11) | Yes | No | Open |
+| `cyanheads/filesystem-mcp-server` | [#19](https://github.com/cyanheads/filesystem-mcp-server/pull/19) | Yes | No | Open |
+| `antvis/mcp-server-chart` | [#312](https://github.com/antvis/mcp-server-chart/pull/312) | Yes | No | Open |
+| `BrowserMCP/mcp` | [#189](https://github.com/BrowserMCP/mcp/pull/189) | Yes | No | Open |
+| `UI5/mcp-server` | [#348](https://github.com/UI5/mcp-server/pull/348) | Yes | No | Open |
+| `makenotion/notion-mcp-server` | [#324](https://github.com/makenotion/notion-mcp-server/pull/324) | Yes | No | Open |
 ## Commercial Proof

package/docs/public-post-drafts.md ADDED

@@ -0,0 +1,86 @@
+# Public Post Drafts
+Use these as launch posts, GitHub Discussion posts, LinkedIn posts, or short blog drafts. The framing is about MCP safety patterns, not “look at my tool.”
+## 1. I Tested 20 MCP Servers. The Pattern Was Not “Bad Servers”; It Was Missing Gates.
+MCP servers are becoming production dependencies for agents, but many of them still ship without the kind of CI gate we expect from normal software dependencies.
+The main pattern I saw while building the first MCP Server Safety Index was simple: the risky part is rarely that a server exists. The risky part is that agents may depend on a tool surface nobody is testing for startup reliability, schema quality, security posture, or drift.
+The checks that matter most:
+- does the server start cleanly in CI?
+- do tools, prompts, and resources respond as advertised?
+- are tool schemas precise enough for agents to call safely?
+- did a release add, remove, or broaden a tool?
+- are destructive tools clearly identifiable?
+My takeaway: MCP needs a package-lock moment. Commit the agent-facing contract, then make drift visible before agents depend on it.
+## 2. Browser MCP Servers Need A Different Security Bar
+Browser automation MCP servers are powerful because agents can navigate pages, click, type, inspect state, and sometimes execute scripts.
+That is exactly why they need explicit CI and security gates.
+For browser MCP servers, a useful review should separate:
+- harmless inventory checks
+- state-mutating browser actions
+- code execution or page-evaluation tools
+- network/navigation controls
+- tool schemas that are too broad for safe agent planning
+The goal is not to block browser MCP. The goal is to make the trust boundary visible before an agent gets a browser with hands.
+## 3. Filesystem MCP Servers Should Always Test In A Sandbox
+Filesystem MCP servers are one of the clearest examples of why MCP CI needs context.
+A server can be useful and still dangerous if the test command points at the wrong directory, if read/write boundaries are unclear, or if a tool schema makes broad path access look harmless.
+The minimum safety pattern:
+- run CI against a temporary harmless directory
+- verify tools/resources respond as advertised
+- flag broad filesystem access
+- document which operations are read-only vs write-capable
+- treat changes to path schemas as contract drift
+Agents need tools. They do not need accidental access to everything.
+## 4. Token-Backed SaaS MCP Servers Need Issue-First Certification
+Many SaaS, cloud, payments, database, and developer-platform MCP servers cannot be safely checked with a drive-by PR because meaningful startup requires tokens or live services.
+For those repos, the right move is usually not a workflow PR first. It is an issue or maintainer question:
+“What is the safest CI startup command for this server?”
+Once maintainers provide a token-safe target config, the useful checks are:
+- does startup fail cleanly without credentials?
+- are auth requirements documented?
+- are destructive tools obvious?
+- are schemas narrow enough for agent use?
+- can the repo publish a safe compatibility/security badge?
+Security adoption works better when it starts by respecting maintainer context.
+## 5. MCP Drift Is An AI Supply Chain Problem
+When a package dependency changes, teams have lock files, diffs, review, and release notes.
+When an MCP server changes its tool surface, an agent dependency changed too.
+That means tool additions, tool removals, schema broadening, new write actions, and prompt/resource changes should be visible in pull requests.
+The useful primitive is an MCP lock file:
+```bash
+npx @kryptosai/mcp-observatory lock
+npx @kryptosai/mcp-observatory lock verify
+```
+The point is not bureaucracy. It is to make the agent-facing contract reviewable before production workflows quietly depend on something new.
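
To make the drift idea in draft 5 concrete, here is a minimal sketch of comparing a locked tool surface against the current one. It does not reflect the actual lock-file format or the `lock verify` implementation, which this diff does not show. `ToolSurface`, `Drift`, and `diffSurface` are hypothetical names, and a surface is assumed to be a plain map from tool name to input schema.

```typescript
// Assumption: a tool surface is a map from tool name to its input schema.
type ToolSurface = Record<string, unknown>;

interface Drift {
  added: string[];
  removed: string[];
  changed: string[];
}

// Report tools that appeared, disappeared, or changed schema.
// JSON.stringify is order-sensitive, so a real tool would want to
// canonicalize key order before comparing schemas.
function diffSurface(locked: ToolSurface, current: ToolSurface): Drift {
  const added = Object.keys(current).filter((name) => !(name in locked)).sort();
  const removed = Object.keys(locked).filter((name) => !(name in current)).sort();
  const changed = Object.keys(current)
    .filter((name) => name in locked)
    .filter((name) => JSON.stringify(locked[name]) !== JSON.stringify(current[name]))
    .sort();
  return { added, removed, changed };
}

const locked: ToolSurface = {
  read_file: { type: "object", properties: { path: { type: "string" } } },
  list_dir: { type: "object", properties: { path: { type: "string" } } },
};

const current: ToolSurface = {
  read_file: { type: "object", properties: { path: { type: "string" }, encoding: { type: "string" } } },
  write_file: { type: "object", properties: { path: { type: "string" }, content: { type: "string" } } },
};

const drift = diffSurface(locked, current);
```

In this example a new write-capable tool, a removed tool, and a broadened schema each land in a separate bucket, which is the kind of change the draft argues should be visible in a pull request.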

package/docs/publish-readiness.md CHANGED

@@ -22,16 +22,25 @@ Confirm:
 - HTTP target examples use env references instead of inline tokens.
 - Security findings appear in artifact evidence as structured `findings`.
 - Hosted upload is available through `mcp-observatory cloud upload <artifact>` when `MCP_OBSERVATORY_CLOUD_TOKEN` is set.
+- Hosted HTTP scans require `Authorization: Bearer <HOSTED_SCAN_TOKEN>` and are treated as an authenticated pilot surface.
+Known audit note:
+- `npm audit` may report `undici <=6.26.0` through the `npm@11.17.0` package bundled under `@semantic-release/npm`. As of June 20, 2026, `npm audit fix` cannot update this bundled copy and `npm@11.17.0` is the current published npm package. The remaining vulnerable `undici` copy is release tooling only and is not part of MCP Observatory runtime dependencies or the packed npm artifact. Recheck after npm publishes a newer package.
+Known audit note:
+- `npm audit` may report `undici <=6.26.0` through the `npm@11.17.0` package bundled under `@semantic-release/npm`. `npm audit fix` updates the fixable `@actions/http-client` path, but the remaining `undici` copy is bundled inside npm release tooling and is not part of MCP Observatory runtime dependencies or the packed npm artifact. Recheck after npm publishes a newer package.
 ## Public Distribution
 - Merge the health/commercialization PR.
 - Update the GitHub repo homepage to the README or commercial page.
 - Publish npm only after the release gate is green.
-- Refresh MCP directory listings with: “MCP Observatory helps teams test, secure, and monitor MCP servers before agents depend on them.”
-- Include “free for local OSS use; paid for hosted reporting, private repo CI, security reports, production monitoring, certification, support, and fleet visibility.”
+- Refresh MCP directory listings with: “MCP Observatory is the CI and security gate for MCP servers before agents depend on them.”
+- Include “free for local OSS use; paid for hosted reporting, private repo CI, recurring security reports, certification, support, and fleet visibility.”
 - Link production users to `COMMERCIAL.md` and `william@banksey.com`.
-- Submit or refresh listings on Glama, PulseMCP, Smithery, and relevant awesome-MCP lists with the tags: security, developer tools, CI/CD, testing, observability, schema drift.
+- Submit or refresh listings on Glama, PulseMCP, Smithery, and relevant awesome-MCP lists with the tags: security, developer tools, CI/CD, testing, MCP security, schema drift.
 - Use the certification distribution loop to open helpful PRs against popular MCP server repos and convert accepted PRs into proof points.
 - Link public proof, the safety report, and directory listing copy from launch/outreach materials.
@@ -63,6 +72,7 @@ Worker:
 - `POST /api/v1/artifacts` stores a run artifact behind bearer-token auth.
 - `GET /api/v1/artifacts/:org` returns the org artifact index behind the same auth.
+- `POST /api/v1/scan` requires `Authorization: Bearer <HOSTED_SCAN_TOKEN>`.
 - Hosted scans reject localhost/private-network targets; use local CLI for internal MCP servers.
 ## What Not To Do Yet
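
The hunk above states that hosted scans reject localhost and private-network targets. As a rough illustration of what such a guard can look like, the sketch below classifies a hostname string. It is not the Worker's actual logic, which this diff does not include. `isPrivateHost` is a hypothetical helper that covers only IPv4 literals and a few loopback names, and it ignores DNS resolution and most IPv6 ranges.

```typescript
// Hypothetical guard: returns true for hostnames that should not be
// scanned from a hosted service. Covers loopback names and the common
// private, loopback, and link-local IPv4 ranges only.
function isPrivateHost(hostname: string): boolean {
  const host = hostname.trim().toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost") || host === "::1" || host === "[::1]") {
    return true;
  }

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) {
    // Not an IPv4 literal. A production check would resolve the name
    // and test the resulting addresses as well.
    return false;
  }

  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 ||                          // "this network"
    a === 10 ||                         // 10.0.0.0/8
    a === 127 ||                        // loopback
    (a === 169 && b === 254) ||         // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)            // 192.168.0.0/16
  );
}
```

A check like this would run before any outbound request is made, so internal MCP servers are directed to the local CLI as the hunk describes.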

package/docs/reference-evaluations.md ADDED

@@ -0,0 +1,134 @@
+# MCP Observatory Reference Evaluations
+Reference evaluations show how MCP Observatory applies to common MCP server categories. These are public, safe examples intended to help maintainers and security reviewers understand what the tool checks and what kind of risk each category can expose.
+The examples below are not customer claims. They are public evaluation targets, public pull requests, or category examples that can be reproduced with the CLI.
+## Official MCP Reference Servers
+Representative repo: [`modelcontextprotocol/servers`](https://github.com/modelcontextprotocol/servers)
+Public proof:
+- PR: [`modelcontextprotocol/servers#4392`](https://github.com/modelcontextprotocol/servers/pull/4392)
+- Status: open, mergeable, with a passing MCP Observatory check as of June 19, 2026
+What this represents:
+- reference MCP implementations
+- simple tools that should behave predictably in CI
+- a good baseline for model context protocol testing
+What Observatory checks:
+- server startup in GitHub Actions
+- tools list/respond correctly
+- schema quality and security scan output
+- report generation for maintainers
+Adoption command:
+```bash
+npx @kryptosai/mcp-observatory init-ci --all --command "npx -y @modelcontextprotocol/server-sequential-thinking"
+```
+## Browser Automation MCP Servers
+Representative public examples:
+- [`microsoft/playwright-mcp`](https://github.com/microsoft/playwright-mcp)
+- [`executeautomation/mcp-playwright`](https://github.com/executeautomation/mcp-playwright)
+Public proof:
+- PR: [`microsoft/playwright-mcp#1657`](https://github.com/microsoft/playwright-mcp/pull/1657)
+- PR: [`executeautomation/mcp-playwright#225`](https://github.com/executeautomation/mcp-playwright/pull/225)
+What this represents:
+- high-capability browser tools
+- agent access to pages, scripts, navigation, screenshots, and user-like actions
+- a category where secure tool invocation and explicit trust boundaries matter
+What Observatory checks:
+- tool inventory
+- schema quality
+- risky browser/code-execution surfaces
+- intentional suppressions for known acceptable findings
+- whether deep invocation should be skipped for tools that can mutate browser state
+Adoption command:
+```bash
+npx @kryptosai/mcp-observatory test --security npx -y @playwright/mcp
+```
+## Filesystem MCP Servers
+Representative public category: filesystem-backed MCP servers.
+Public proof:
+- PR: [`cyanheads/filesystem-mcp-server#19`](https://github.com/cyanheads/filesystem-mcp-server/pull/19)
+What this represents:
+- local file access exposed to agents
+- read/write boundaries that should be explicit
+- capability declarations that need to match observed MCP behavior
+What Observatory checks:
+- tools/resources capability consistency
+- broad filesystem access findings
+- schema quality for path-oriented tools
+- safe sandbox target configuration for CI
+Adoption command:
+```bash
+npx @kryptosai/mcp-observatory test --security npx -y filesystem-mcp-server .
+```
+Use a harmless temporary directory for CI checks when evaluating filesystem servers.
+## Documentation And Search MCP Servers
+Representative public example: [`upstash/context7`](https://github.com/upstash/context7)
+Public proof:
+- PR: [`upstash/context7#2800`](https://github.com/upstash/context7/pull/2800)
+What this represents:
+- documentation retrieval and search tools
+- untrusted or fast-changing text entering an agent context
+- a category where prompt-injection-aware review matters
+What Observatory checks:
+- tool inventory
+- schema quality
+- startup reliability
+- security findings around broad retrieval or response behavior
+- report artifacts that maintainers can review in pull requests
+Adoption command:
+```bash
+npx @kryptosai/mcp-observatory init-ci --all --command "npx -y @upstash/context7-mcp"
+```
+## How To Read These Evaluations
+Passing an Observatory check means the server passed the configured compatibility and security checks for that run. It does not mean the server is universally safe for every environment.
+Use the results as an engineering control:
+- add CI for repeatability
+- compare artifacts between releases
+- review security findings and suppressions
+- document accepted risk for broad tools
+- escalate production/private usage to hosted reporting, certification, or fleet visibility when the server becomes operationally important