@dotsetlabs/bellwether 0.10.0
- package/CHANGELOG.md +291 -0
- package/LICENSE +21 -0
- package/README.md +739 -0
- package/dist/auth/credentials.d.ts +64 -0
- package/dist/auth/credentials.js +218 -0
- package/dist/auth/index.d.ts +6 -0
- package/dist/auth/index.js +6 -0
- package/dist/auth/keychain.d.ts +64 -0
- package/dist/auth/keychain.js +268 -0
- package/dist/baseline/ab-testing.d.ts +80 -0
- package/dist/baseline/ab-testing.js +236 -0
- package/dist/baseline/ai-compatibility-scorer.d.ts +95 -0
- package/dist/baseline/ai-compatibility-scorer.js +606 -0
- package/dist/baseline/calibration.d.ts +77 -0
- package/dist/baseline/calibration.js +136 -0
- package/dist/baseline/category-matching.d.ts +85 -0
- package/dist/baseline/category-matching.js +289 -0
- package/dist/baseline/change-impact-analyzer.d.ts +98 -0
- package/dist/baseline/change-impact-analyzer.js +592 -0
- package/dist/baseline/comparator.d.ts +64 -0
- package/dist/baseline/comparator.js +916 -0
- package/dist/baseline/confidence.d.ts +55 -0
- package/dist/baseline/confidence.js +122 -0
- package/dist/baseline/converter.d.ts +61 -0
- package/dist/baseline/converter.js +585 -0
- package/dist/baseline/dependency-analyzer.d.ts +89 -0
- package/dist/baseline/dependency-analyzer.js +567 -0
- package/dist/baseline/deprecation-tracker.d.ts +133 -0
- package/dist/baseline/deprecation-tracker.js +322 -0
- package/dist/baseline/diff.d.ts +55 -0
- package/dist/baseline/diff.js +1584 -0
- package/dist/baseline/documentation-scorer.d.ts +205 -0
- package/dist/baseline/documentation-scorer.js +466 -0
- package/dist/baseline/embeddings.d.ts +118 -0
- package/dist/baseline/embeddings.js +251 -0
- package/dist/baseline/error-analyzer.d.ts +198 -0
- package/dist/baseline/error-analyzer.js +721 -0
- package/dist/baseline/evaluation/evaluator.d.ts +42 -0
- package/dist/baseline/evaluation/evaluator.js +323 -0
- package/dist/baseline/evaluation/expanded-dataset.d.ts +45 -0
- package/dist/baseline/evaluation/expanded-dataset.js +1164 -0
- package/dist/baseline/evaluation/golden-dataset.d.ts +58 -0
- package/dist/baseline/evaluation/golden-dataset.js +717 -0
- package/dist/baseline/evaluation/index.d.ts +15 -0
- package/dist/baseline/evaluation/index.js +15 -0
- package/dist/baseline/evaluation/types.d.ts +186 -0
- package/dist/baseline/evaluation/types.js +8 -0
- package/dist/baseline/external-dependency-detector.d.ts +181 -0
- package/dist/baseline/external-dependency-detector.js +524 -0
- package/dist/baseline/golden-output.d.ts +162 -0
- package/dist/baseline/golden-output.js +636 -0
- package/dist/baseline/health-scorer.d.ts +174 -0
- package/dist/baseline/health-scorer.js +451 -0
- package/dist/baseline/incremental-checker.d.ts +97 -0
- package/dist/baseline/incremental-checker.js +174 -0
- package/dist/baseline/index.d.ts +31 -0
- package/dist/baseline/index.js +42 -0
- package/dist/baseline/migration-generator.d.ts +137 -0
- package/dist/baseline/migration-generator.js +554 -0
- package/dist/baseline/migrations.d.ts +60 -0
- package/dist/baseline/migrations.js +197 -0
- package/dist/baseline/performance-tracker.d.ts +214 -0
- package/dist/baseline/performance-tracker.js +577 -0
- package/dist/baseline/pr-comment-generator.d.ts +117 -0
- package/dist/baseline/pr-comment-generator.js +546 -0
- package/dist/baseline/response-fingerprint.d.ts +127 -0
- package/dist/baseline/response-fingerprint.js +728 -0
- package/dist/baseline/response-schema-tracker.d.ts +129 -0
- package/dist/baseline/response-schema-tracker.js +420 -0
- package/dist/baseline/risk-scorer.d.ts +54 -0
- package/dist/baseline/risk-scorer.js +434 -0
- package/dist/baseline/saver.d.ts +89 -0
- package/dist/baseline/saver.js +554 -0
- package/dist/baseline/scenario-generator.d.ts +151 -0
- package/dist/baseline/scenario-generator.js +905 -0
- package/dist/baseline/schema-compare.d.ts +86 -0
- package/dist/baseline/schema-compare.js +557 -0
- package/dist/baseline/schema-evolution.d.ts +189 -0
- package/dist/baseline/schema-evolution.js +467 -0
- package/dist/baseline/semantic.d.ts +203 -0
- package/dist/baseline/semantic.js +908 -0
- package/dist/baseline/synonyms.d.ts +60 -0
- package/dist/baseline/synonyms.js +386 -0
- package/dist/baseline/telemetry.d.ts +165 -0
- package/dist/baseline/telemetry.js +294 -0
- package/dist/baseline/test-pruner.d.ts +120 -0
- package/dist/baseline/test-pruner.js +387 -0
- package/dist/baseline/types.d.ts +449 -0
- package/dist/baseline/types.js +5 -0
- package/dist/baseline/version.d.ts +138 -0
- package/dist/baseline/version.js +206 -0
- package/dist/cache/index.d.ts +5 -0
- package/dist/cache/index.js +5 -0
- package/dist/cache/response-cache.d.ts +151 -0
- package/dist/cache/response-cache.js +287 -0
- package/dist/ci/index.d.ts +60 -0
- package/dist/ci/index.js +342 -0
- package/dist/cli/commands/auth.d.ts +12 -0
- package/dist/cli/commands/auth.js +352 -0
- package/dist/cli/commands/badge.d.ts +3 -0
- package/dist/cli/commands/badge.js +74 -0
- package/dist/cli/commands/baseline-accept.d.ts +15 -0
- package/dist/cli/commands/baseline-accept.js +178 -0
- package/dist/cli/commands/baseline-migrate.d.ts +12 -0
- package/dist/cli/commands/baseline-migrate.js +164 -0
- package/dist/cli/commands/baseline.d.ts +14 -0
- package/dist/cli/commands/baseline.js +449 -0
- package/dist/cli/commands/beta.d.ts +10 -0
- package/dist/cli/commands/beta.js +231 -0
- package/dist/cli/commands/check.d.ts +11 -0
- package/dist/cli/commands/check.js +820 -0
- package/dist/cli/commands/cloud/badge.d.ts +3 -0
- package/dist/cli/commands/cloud/badge.js +74 -0
- package/dist/cli/commands/cloud/diff.d.ts +6 -0
- package/dist/cli/commands/cloud/diff.js +79 -0
- package/dist/cli/commands/cloud/history.d.ts +6 -0
- package/dist/cli/commands/cloud/history.js +102 -0
- package/dist/cli/commands/cloud/link.d.ts +9 -0
- package/dist/cli/commands/cloud/link.js +119 -0
- package/dist/cli/commands/cloud/login.d.ts +7 -0
- package/dist/cli/commands/cloud/login.js +499 -0
- package/dist/cli/commands/cloud/projects.d.ts +6 -0
- package/dist/cli/commands/cloud/projects.js +44 -0
- package/dist/cli/commands/cloud/shared.d.ts +7 -0
- package/dist/cli/commands/cloud/shared.js +42 -0
- package/dist/cli/commands/cloud/teams.d.ts +8 -0
- package/dist/cli/commands/cloud/teams.js +169 -0
- package/dist/cli/commands/cloud/upload.d.ts +8 -0
- package/dist/cli/commands/cloud/upload.js +181 -0
- package/dist/cli/commands/contract.d.ts +11 -0
- package/dist/cli/commands/contract.js +280 -0
- package/dist/cli/commands/discover.d.ts +3 -0
- package/dist/cli/commands/discover.js +82 -0
- package/dist/cli/commands/eval.d.ts +9 -0
- package/dist/cli/commands/eval.js +187 -0
- package/dist/cli/commands/explore.d.ts +11 -0
- package/dist/cli/commands/explore.js +437 -0
- package/dist/cli/commands/feedback.d.ts +9 -0
- package/dist/cli/commands/feedback.js +174 -0
- package/dist/cli/commands/golden.d.ts +12 -0
- package/dist/cli/commands/golden.js +407 -0
- package/dist/cli/commands/history.d.ts +10 -0
- package/dist/cli/commands/history.js +202 -0
- package/dist/cli/commands/init.d.ts +9 -0
- package/dist/cli/commands/init.js +219 -0
- package/dist/cli/commands/interview.d.ts +3 -0
- package/dist/cli/commands/interview.js +903 -0
- package/dist/cli/commands/link.d.ts +10 -0
- package/dist/cli/commands/link.js +169 -0
- package/dist/cli/commands/login.d.ts +7 -0
- package/dist/cli/commands/login.js +499 -0
- package/dist/cli/commands/preset.d.ts +33 -0
- package/dist/cli/commands/preset.js +297 -0
- package/dist/cli/commands/profile.d.ts +33 -0
- package/dist/cli/commands/profile.js +286 -0
- package/dist/cli/commands/registry.d.ts +11 -0
- package/dist/cli/commands/registry.js +146 -0
- package/dist/cli/commands/shared.d.ts +79 -0
- package/dist/cli/commands/shared.js +196 -0
- package/dist/cli/commands/teams.d.ts +8 -0
- package/dist/cli/commands/teams.js +169 -0
- package/dist/cli/commands/test.d.ts +9 -0
- package/dist/cli/commands/test.js +500 -0
- package/dist/cli/commands/upload.d.ts +8 -0
- package/dist/cli/commands/upload.js +223 -0
- package/dist/cli/commands/validate-config.d.ts +6 -0
- package/dist/cli/commands/validate-config.js +35 -0
- package/dist/cli/commands/verify.d.ts +11 -0
- package/dist/cli/commands/verify.js +283 -0
- package/dist/cli/commands/watch.d.ts +12 -0
- package/dist/cli/commands/watch.js +253 -0
- package/dist/cli/index.d.ts +3 -0
- package/dist/cli/index.js +178 -0
- package/dist/cli/interactive.d.ts +47 -0
- package/dist/cli/interactive.js +216 -0
- package/dist/cli/output/terminal-reporter.d.ts +19 -0
- package/dist/cli/output/terminal-reporter.js +104 -0
- package/dist/cli/output.d.ts +226 -0
- package/dist/cli/output.js +438 -0
- package/dist/cli/utils/env.d.ts +5 -0
- package/dist/cli/utils/env.js +14 -0
- package/dist/cli/utils/progress.d.ts +59 -0
- package/dist/cli/utils/progress.js +206 -0
- package/dist/cli/utils/server-context.d.ts +10 -0
- package/dist/cli/utils/server-context.js +36 -0
- package/dist/cloud/auth.d.ts +144 -0
- package/dist/cloud/auth.js +374 -0
- package/dist/cloud/client.d.ts +24 -0
- package/dist/cloud/client.js +65 -0
- package/dist/cloud/http-client.d.ts +38 -0
- package/dist/cloud/http-client.js +215 -0
- package/dist/cloud/index.d.ts +23 -0
- package/dist/cloud/index.js +25 -0
- package/dist/cloud/mock-client.d.ts +107 -0
- package/dist/cloud/mock-client.js +545 -0
- package/dist/cloud/types.d.ts +515 -0
- package/dist/cloud/types.js +15 -0
- package/dist/config/defaults.d.ts +160 -0
- package/dist/config/defaults.js +169 -0
- package/dist/config/loader.d.ts +24 -0
- package/dist/config/loader.js +122 -0
- package/dist/config/template.d.ts +42 -0
- package/dist/config/template.js +647 -0
- package/dist/config/validator.d.ts +2112 -0
- package/dist/config/validator.js +658 -0
- package/dist/constants/cloud.d.ts +107 -0
- package/dist/constants/cloud.js +110 -0
- package/dist/constants/core.d.ts +521 -0
- package/dist/constants/core.js +556 -0
- package/dist/constants/testing.d.ts +1283 -0
- package/dist/constants/testing.js +1568 -0
- package/dist/constants.d.ts +10 -0
- package/dist/constants.js +10 -0
- package/dist/contract/index.d.ts +6 -0
- package/dist/contract/index.js +5 -0
- package/dist/contract/validator.d.ts +177 -0
- package/dist/contract/validator.js +574 -0
- package/dist/cost/index.d.ts +6 -0
- package/dist/cost/index.js +5 -0
- package/dist/cost/tracker.d.ts +134 -0
- package/dist/cost/tracker.js +313 -0
- package/dist/discovery/discovery.d.ts +16 -0
- package/dist/discovery/discovery.js +173 -0
- package/dist/discovery/types.d.ts +51 -0
- package/dist/discovery/types.js +2 -0
- package/dist/docs/agents.d.ts +3 -0
- package/dist/docs/agents.js +995 -0
- package/dist/docs/contract.d.ts +51 -0
- package/dist/docs/contract.js +1681 -0
- package/dist/docs/generator.d.ts +4 -0
- package/dist/docs/generator.js +4 -0
- package/dist/docs/html-reporter.d.ts +9 -0
- package/dist/docs/html-reporter.js +757 -0
- package/dist/docs/index.d.ts +10 -0
- package/dist/docs/index.js +11 -0
- package/dist/docs/junit-reporter.d.ts +18 -0
- package/dist/docs/junit-reporter.js +210 -0
- package/dist/docs/report.d.ts +14 -0
- package/dist/docs/report.js +44 -0
- package/dist/docs/sarif-reporter.d.ts +19 -0
- package/dist/docs/sarif-reporter.js +335 -0
- package/dist/docs/shared.d.ts +35 -0
- package/dist/docs/shared.js +162 -0
- package/dist/docs/templates.d.ts +12 -0
- package/dist/docs/templates.js +76 -0
- package/dist/errors/index.d.ts +6 -0
- package/dist/errors/index.js +6 -0
- package/dist/errors/retry.d.ts +92 -0
- package/dist/errors/retry.js +323 -0
- package/dist/errors/types.d.ts +321 -0
- package/dist/errors/types.js +584 -0
- package/dist/index.d.ts +32 -0
- package/dist/index.js +32 -0
- package/dist/interview/dependency-resolver.d.ts +11 -0
- package/dist/interview/dependency-resolver.js +32 -0
- package/dist/interview/interviewer.d.ts +232 -0
- package/dist/interview/interviewer.js +1939 -0
- package/dist/interview/mock-response-generator.d.ts +7 -0
- package/dist/interview/mock-response-generator.js +102 -0
- package/dist/interview/orchestrator.d.ts +237 -0
- package/dist/interview/orchestrator.js +1296 -0
- package/dist/interview/rate-limiter.d.ts +15 -0
- package/dist/interview/rate-limiter.js +55 -0
- package/dist/interview/response-validator.d.ts +10 -0
- package/dist/interview/response-validator.js +132 -0
- package/dist/interview/schema-inferrer.d.ts +8 -0
- package/dist/interview/schema-inferrer.js +71 -0
- package/dist/interview/schema-test-generator.d.ts +71 -0
- package/dist/interview/schema-test-generator.js +834 -0
- package/dist/interview/smart-value-generator.d.ts +155 -0
- package/dist/interview/smart-value-generator.js +554 -0
- package/dist/interview/stateful-test-runner.d.ts +19 -0
- package/dist/interview/stateful-test-runner.js +106 -0
- package/dist/interview/types.d.ts +561 -0
- package/dist/interview/types.js +2 -0
- package/dist/llm/anthropic.d.ts +41 -0
- package/dist/llm/anthropic.js +355 -0
- package/dist/llm/client.d.ts +123 -0
- package/dist/llm/client.js +42 -0
- package/dist/llm/factory.d.ts +38 -0
- package/dist/llm/factory.js +145 -0
- package/dist/llm/fallback.d.ts +140 -0
- package/dist/llm/fallback.js +379 -0
- package/dist/llm/index.d.ts +18 -0
- package/dist/llm/index.js +15 -0
- package/dist/llm/ollama.d.ts +37 -0
- package/dist/llm/ollama.js +330 -0
- package/dist/llm/openai.d.ts +25 -0
- package/dist/llm/openai.js +320 -0
- package/dist/llm/token-budget.d.ts +161 -0
- package/dist/llm/token-budget.js +395 -0
- package/dist/logging/logger.d.ts +70 -0
- package/dist/logging/logger.js +130 -0
- package/dist/metrics/collector.d.ts +106 -0
- package/dist/metrics/collector.js +547 -0
- package/dist/metrics/index.d.ts +7 -0
- package/dist/metrics/index.js +7 -0
- package/dist/metrics/prometheus.d.ts +20 -0
- package/dist/metrics/prometheus.js +241 -0
- package/dist/metrics/types.d.ts +209 -0
- package/dist/metrics/types.js +5 -0
- package/dist/persona/builtins.d.ts +54 -0
- package/dist/persona/builtins.js +219 -0
- package/dist/persona/index.d.ts +8 -0
- package/dist/persona/index.js +8 -0
- package/dist/persona/loader.d.ts +30 -0
- package/dist/persona/loader.js +190 -0
- package/dist/persona/types.d.ts +144 -0
- package/dist/persona/types.js +5 -0
- package/dist/persona/validation.d.ts +94 -0
- package/dist/persona/validation.js +332 -0
- package/dist/prompts/index.d.ts +5 -0
- package/dist/prompts/index.js +5 -0
- package/dist/prompts/templates.d.ts +180 -0
- package/dist/prompts/templates.js +431 -0
- package/dist/registry/client.d.ts +49 -0
- package/dist/registry/client.js +191 -0
- package/dist/registry/index.d.ts +7 -0
- package/dist/registry/index.js +6 -0
- package/dist/registry/types.d.ts +140 -0
- package/dist/registry/types.js +6 -0
- package/dist/scenarios/evaluator.d.ts +43 -0
- package/dist/scenarios/evaluator.js +206 -0
- package/dist/scenarios/index.d.ts +10 -0
- package/dist/scenarios/index.js +9 -0
- package/dist/scenarios/loader.d.ts +20 -0
- package/dist/scenarios/loader.js +285 -0
- package/dist/scenarios/types.d.ts +153 -0
- package/dist/scenarios/types.js +8 -0
- package/dist/security/index.d.ts +17 -0
- package/dist/security/index.js +18 -0
- package/dist/security/payloads.d.ts +61 -0
- package/dist/security/payloads.js +268 -0
- package/dist/security/security-tester.d.ts +42 -0
- package/dist/security/security-tester.js +582 -0
- package/dist/security/types.d.ts +166 -0
- package/dist/security/types.js +8 -0
- package/dist/transport/base-transport.d.ts +59 -0
- package/dist/transport/base-transport.js +38 -0
- package/dist/transport/http-transport.d.ts +67 -0
- package/dist/transport/http-transport.js +238 -0
- package/dist/transport/mcp-client.d.ts +141 -0
- package/dist/transport/mcp-client.js +496 -0
- package/dist/transport/sse-transport.d.ts +88 -0
- package/dist/transport/sse-transport.js +316 -0
- package/dist/transport/stdio-transport.d.ts +43 -0
- package/dist/transport/stdio-transport.js +238 -0
- package/dist/transport/types.d.ts +125 -0
- package/dist/transport/types.js +16 -0
- package/dist/utils/concurrency.d.ts +123 -0
- package/dist/utils/concurrency.js +213 -0
- package/dist/utils/formatters.d.ts +16 -0
- package/dist/utils/formatters.js +37 -0
- package/dist/utils/index.d.ts +8 -0
- package/dist/utils/index.js +8 -0
- package/dist/utils/jsonpath.d.ts +87 -0
- package/dist/utils/jsonpath.js +326 -0
- package/dist/utils/markdown.d.ts +113 -0
- package/dist/utils/markdown.js +265 -0
- package/dist/utils/network.d.ts +14 -0
- package/dist/utils/network.js +17 -0
- package/dist/utils/sanitize.d.ts +92 -0
- package/dist/utils/sanitize.js +191 -0
- package/dist/utils/semantic.d.ts +194 -0
- package/dist/utils/semantic.js +1051 -0
- package/dist/utils/smart-truncate.d.ts +94 -0
- package/dist/utils/smart-truncate.js +361 -0
- package/dist/utils/timeout.d.ts +153 -0
- package/dist/utils/timeout.js +205 -0
- package/dist/utils/yaml-parser.d.ts +58 -0
- package/dist/utils/yaml-parser.js +86 -0
- package/dist/validation/index.d.ts +32 -0
- package/dist/validation/index.js +32 -0
- package/dist/validation/semantic-test-generator.d.ts +50 -0
- package/dist/validation/semantic-test-generator.js +176 -0
- package/dist/validation/semantic-types.d.ts +66 -0
- package/dist/validation/semantic-types.js +94 -0
- package/dist/validation/semantic-validator.d.ts +38 -0
- package/dist/validation/semantic-validator.js +340 -0
- package/dist/verification/index.d.ts +6 -0
- package/dist/verification/index.js +5 -0
- package/dist/verification/types.d.ts +133 -0
- package/dist/verification/types.js +5 -0
- package/dist/verification/verifier.d.ts +30 -0
- package/dist/verification/verifier.js +309 -0
- package/dist/version.d.ts +19 -0
- package/dist/version.js +48 -0
- package/dist/workflow/auto-generator.d.ts +27 -0
- package/dist/workflow/auto-generator.js +513 -0
- package/dist/workflow/discovery.d.ts +40 -0
- package/dist/workflow/discovery.js +195 -0
- package/dist/workflow/executor.d.ts +82 -0
- package/dist/workflow/executor.js +611 -0
- package/dist/workflow/index.d.ts +10 -0
- package/dist/workflow/index.js +10 -0
- package/dist/workflow/loader.d.ts +24 -0
- package/dist/workflow/loader.js +194 -0
- package/dist/workflow/state-tracker.d.ts +98 -0
- package/dist/workflow/state-tracker.js +424 -0
- package/dist/workflow/types.d.ts +337 -0
- package/dist/workflow/types.js +5 -0
- package/package.json +94 -0
- package/schemas/bellwether-check.schema.json +651 -0
package/README.md
ADDED
@@ -0,0 +1,739 @@
# Bellwether

[](https://github.com/dotsetlabs/bellwether/actions)
[](https://www.npmjs.com/package/@dotsetlabs/bellwether)
[](https://docs.bellwether.sh)

> **Catch MCP server drift before your users do. Zero LLM required.**

Bellwether detects structural changes in your [MCP (Model Context Protocol)](https://modelcontextprotocol.io/) server using **schema comparison**. No LLM needed. Free. Deterministic.

## Quick Start

```bash
# Install
npm install -g @dotsetlabs/bellwether

# Initialize configuration (required before any other command)
bellwether init npx @mcp/your-server

# Check for drift (free, fast, deterministic)
bellwether check

# Save baseline for drift detection
bellwether baseline save

# Optional: Explore behavior with LLM
bellwether explore
```

That's it. `check` needs no API keys, incurs no LLM costs, and produces deterministic results.

## CI/CD Integration

Add drift detection to every PR:

```yaml
# .github/workflows/bellwether.yml
name: MCP Drift Detection
on: [pull_request]

jobs:
  bellwether:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx @dotsetlabs/bellwether init --preset ci npx @mcp/your-server
      - run: npx @dotsetlabs/bellwether check --fail-on-drift
```

Commit `bellwether.yaml` to your repo so CI always has your config. No secrets needed for `check`. Runs in seconds.

### Exit Codes

The `check` command returns granular exit codes for CI/CD pipelines:

| Code | Meaning | CI Action |
|:-----|:--------|:----------|
| `0` | No changes detected | Pass |
| `1` | Info-level changes only | Handle per your CI policy |
| `2` | Warning-level changes | Handle per your CI policy |
| `3` | Breaking changes | Always fail |
| `4` | Runtime error | Fail |
| `5` | Low confidence (when `check.sampling.failOnLowConfidence` is true) | Fail |
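
Most CI systems fail a step on any nonzero exit code, so you may want a wrapper that tolerates some codes. The sketch below assumes a hypothetical policy: pass on `0` and `1`, and fail on everything else, including unknown codes. `ci_decision` is an illustrative helper, not part of Bellwether.

```bash
#!/usr/bin/env bash
# Hypothetical policy helper: map a `bellwether check` exit code to a CI decision.
# Assumption: info-level drift (1) is tolerated; warnings (2) and above fail.
ci_decision() {
  case "$1" in
    0) echo "pass" ;;      # no changes detected
    1) echo "pass" ;;      # info-level changes only (policy choice)
    2) echo "fail" ;;      # warning-level changes (policy choice)
    3|4|5) echo "fail" ;;  # breaking changes, runtime error, low confidence
    *) echo "fail" ;;      # unknown code: fail closed
  esac
}

# Usage in a CI step (captures the code without tripping `bash -e`):
#   code=0; bellwether check || code=$?
#   [ "$(ci_decision "$code")" = "pass" ] || exit "$code"
```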

## What Bellwether Detects

Check mode detects when your MCP server changes:

| Change Type | Example | Detected |
|:------------|:--------|:---------|
| **Tool added** | New `delete_file` tool appears | Yes |
| **Tool removed** | `write_file` tool disappears | Yes |
| **Schema changed** | Parameter `path` becomes required | Yes |
| **Description changed** | Tool help text updated | Yes |
| **Tool renamed** | `read` becomes `read_file` | Yes |
| **Performance regression** | Tool latency increased >10% | Yes |
| **Performance confidence** | Statistical reliability of metrics | Yes |
| **Security vulnerabilities** | SQL injection accepted (when `check.security.enabled` is on) | Yes |
| **Response schema changes** | Response fields added/removed | Yes |
| **Unstable schemas** | Inconsistent response structures | Yes |
| **Error trends** | New error types, increasing errors | Yes |

This catches the changes that break AI agent workflows.

## Documentation

**[docs.bellwether.sh](https://docs.bellwether.sh)** - Full documentation including:

- [Quick Start](https://docs.bellwether.sh/quickstart)
- [CLI Reference](https://docs.bellwether.sh/cli/init)
- [Test Modes](https://docs.bellwether.sh/concepts/test-modes)
- [CI/CD Integration](https://docs.bellwether.sh/guides/ci-cd)
- [Cloud Features](https://docs.bellwether.sh/cloud)

## Configuration

All settings are configured in `bellwether.yaml`. Create one with:

```bash
bellwether init npx @mcp/your-server # Default (free, fast)
bellwether init --preset ci npx @mcp/server # Optimized for CI/CD
bellwether init --preset security npx @mcp/server # Security-focused exploration
bellwether init --preset thorough npx @mcp/server # Comprehensive exploration
bellwether init --preset local npx @mcp/server # Exploration with local Ollama
```

The generated config file is fully documented with all available options.

### Environment Variable Interpolation

Reference environment variables in your config:

```yaml
server:
  command: "npx @mcp/your-server"
  env:
    API_KEY: "${API_KEY}"
    DEBUG: "${DEBUG:-false}" # With default value
```

This allows committing `bellwether.yaml` to version control without exposing secrets.
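
The bash sketch below expands the `${VAR}` and `${VAR:-default}` forms in a string, which makes their behavior concrete. It assumes POSIX-shell semantics, where the default applies when the variable is unset *or* empty. Bellwether's own config loader may treat empty values differently. `interpolate` is a hypothetical helper for illustration only, and it requires bash.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of ${NAME} / ${NAME:-default} expansion (not Bellwether's loader).
# Assumes POSIX-shell semantics: the default applies when NAME is unset or empty.
interpolate() {
  local s="$1" out="" match name has_default default val
  # Literal "${", a variable name, an optional ":-default", then a literal "}"
  local re='[$][{]([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?[}]'
  while [[ $s =~ $re ]]; do
    match="${BASH_REMATCH[0]}"
    name="${BASH_REMATCH[1]}"
    has_default="${BASH_REMATCH[2]}"
    default="${BASH_REMATCH[3]}"
    val="${!name-}"                       # indirect lookup; empty if unset
    if [[ -z $val && -n $has_default ]]; then
      val="$default"
    fi
    out+="${s%%"$match"*}$val"            # text before the match, then the value
    s="${s#*"$match"}"                    # continue scanning after the match
  done
  printf '%s\n' "$out$s"
}

# Example: with API_KEY=abc123 set and DEBUG unset,
#   interpolate 'key=${API_KEY} debug=${DEBUG:-false}'   # prints: key=abc123 debug=false
```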

## Commands

### Check Command (Recommended for CI)

```bash
bellwether init npx @mcp/your-server
bellwether check
```

- **Zero LLM** - No API keys required
- **Free** - No token costs
- **Deterministic** - Same input = same output
- **Fast** - Runs in seconds (use `check.parallel` in config for more speed)
- **Output** - Writes `CONTRACT.md` to `output.docsDir` and `bellwether-check.json` to `output.dir` (filenames configurable via `output.files.contractDoc` and `output.files.checkReport`)
- **CI-Optimized** - Granular exit codes (0-5), JUnit/SARIF output formats

#### Check Mode Enhancements

- **Stateful testing** for create → use → delete chains
- **External service handling** (skip, mock, or fail when credentials are missing)
- **Response assertions** for semantic validation of outputs
- **Rate limiting** to avoid 429s on production servers

Example configuration:

```yaml
check:
  statefulTesting:
    enabled: true
    maxChainLength: 5
    shareOutputsBetweenTools: true

  externalServices:
    mode: skip # skip | mock | fail
    services:
      plaid:
        enabled: false
        sandboxCredentials:
          clientId: "${PLAID_CLIENT_ID}"
          secret: "${PLAID_SECRET}"

  assertions:
    enabled: true
    strict: false
    infer: true

  rateLimit:
    enabled: false
    requestsPerSecond: 10
    burstLimit: 20
    backoffStrategy: exponential
    maxRetries: 3
```
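
With `backoffStrategy: exponential`, each retry typically waits about twice as long as the one before. The bash sketch below prints such a schedule for a given retry count. The base delay is supplied by the caller because the README does not state Bellwether's actual base delay or whether it adds jitter. `backoff_schedule` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the wait (ms) before each retry under exponential backoff.
# Assumption: delays double from a caller-supplied base; Bellwether's real values may differ.
backoff_schedule() {
  local base_ms="$1" max_retries="$2"
  local delay="$base_ms" attempt=1 out=""
  while [ "$attempt" -le "$max_retries" ]; do
    out="${out}${out:+ }${delay}"   # space-separate entries after the first
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  printf '%s\n' "$out"
}

# backoff_schedule 1000 3   # prints: 1000 2000 4000 (assumed 1000 ms base, maxRetries: 3)
```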

#### Check Report Schema

`bellwether-check.json` includes a `$schema` pointer and is validated before writing.
Schema URL:

```
https://unpkg.com/@dotsetlabs/bellwether/schemas/bellwether-check.schema.json
```

### Explore Command (Optional)

```bash
bellwether init --preset local npx @mcp/your-server # Uses local Ollama (free)
# or
bellwether init --preset thorough npx @mcp/server # Uses OpenAI (requires API key)

bellwether explore
```

- Requires an LLM (Ollama for free local use, or OpenAI/Anthropic)
- Multi-persona testing (technical writer, security tester, QA, novice)
- Generates `AGENTS.md` documentation (filename configurable via `output.files.agentsDoc`)
- Better suited than `check` to local development and deep exploration

### Core Commands

```bash
# Initialize configuration (creates bellwether.yaml)
bellwether init npx @mcp/server
bellwether init --preset ci npx @mcp/server

# Validate configuration (no tests)
bellwether validate-config

# Check for drift (free, fast, deterministic)
bellwether check # Uses server.command from config
bellwether check npx @mcp/server # Override server command
bellwether check --fail-on-drift # Override baseline.failOnDrift from config
bellwether check --format junit # JUnit XML output for CI
bellwether check --format sarif # SARIF output for GitHub Code Scanning

# Configure performance, parallelism, incremental, and security in bellwether.yaml
# (check.parallel, check.incremental, check.security, check.sampling)

# Explore behavior (LLM-powered)
bellwether explore # Uses server.command from config
bellwether explore npx @mcp/server # Override server command

# Discover server capabilities
bellwether discover npx @mcp/server

# Watch mode (re-check on file changes, uses config)
bellwether watch

# Search MCP Registry
bellwether registry filesystem
bellwether registry database --limit 5

# Generate verification report
bellwether verify --tier gold

# Validate against contracts
bellwether contract generate npx @mcp/server
bellwether contract validate npx @mcp/server
bellwether contract show # Display current contract

# Manage golden outputs (deterministic regression tests)
bellwether golden save --tool my_tool --args '{"id":"123"}'
bellwether golden compare
```

### Baseline Commands

```bash
# Save test results as baseline
bellwether baseline save
bellwether baseline save ./my-baseline.json

# Compare test results against baseline
bellwether baseline compare --fail-on-drift # Uses baseline.comparePath or baseline.path from config
bellwether baseline compare ./baseline.json --ignore-version-mismatch # Force compare incompatible versions

# Show baseline contents
bellwether baseline show
bellwether baseline show ./baseline.json --json

# Compare two baseline files
bellwether baseline diff v1.json v2.json
bellwether baseline diff v1.json v2.json --ignore-version-mismatch # Force compare incompatible versions

# Migrate baseline to current format version
bellwether baseline migrate ./bellwether-baseline.json
bellwether baseline migrate ./baseline.json --dry-run
bellwether baseline migrate ./baseline.json --info

# Accept drift as intentional (update baseline)
bellwether baseline accept --reason "Intentional API change"
bellwether baseline accept --dry-run # Preview without saving
bellwether baseline accept --force # Required for breaking changes
```

### Baseline Format Versioning

Baselines use semantic versioning (e.g., `1.0.0`) for the format version:

- **Major version** - Breaking contract changes (removed fields, type changes)
- **Minor version** - New optional fields (backwards compatible)
- **Patch version** - Bug fixes in baseline generation

**Compatibility rules:**
- Same major version = Compatible (can compare baselines)
- Different major version = Incompatible (requires migration)
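
These compatibility rules reduce to comparing the major component of two format versions. The minimal bash sketch below assumes plain `MAJOR.MINOR.PATCH` strings with an optional leading `v`. `baseline_compatible` is a hypothetical helper, not a Bellwether command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the "same major version = compatible" rule.
# Accepts versions like 1.0.0 or v1.0.0; prints "compatible" or "incompatible".
baseline_compatible() {
  local a="${1#v}" b="${2#v}"            # drop an optional leading "v"
  if [ "${a%%.*}" = "${b%%.*}" ]; then   # compare everything before the first "."
    echo "compatible"
  else
    echo "incompatible"                  # would need `bellwether baseline migrate`
  fi
}
```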

When comparing baselines with incompatible versions, the CLI will show an error:

```
Cannot compare baselines with incompatible format versions: v1.0.0 vs v2.0.0.
Use 'bellwether baseline migrate' to upgrade the older baseline,
or use --ignore-version-mismatch to force comparison (results may be incorrect).
```

To upgrade older baselines:

```bash
# Check if migration is needed
bellwether baseline migrate ./baseline.json --info

# Preview changes without writing
bellwether baseline migrate ./baseline.json --dry-run

# Perform migration
bellwether baseline migrate ./baseline.json
```

### Cloud Commands

```bash
# Authenticate with Bellwether Cloud
bellwether login
bellwether login --status
bellwether login --logout

# Manage team selection (for multi-team users)
bellwether teams # List your teams
bellwether teams switch # Interactive team selection
bellwether teams switch <id> # Switch to specific team
bellwether teams current # Show current active team

# Link project to cloud
bellwether link
bellwether link --status
bellwether link --unlink

# List cloud projects
bellwether projects
bellwether projects --json

# Upload baseline to cloud
bellwether upload
bellwether upload --ci --fail-on-drift

# View baseline version history
bellwether history
bellwether history --limit 20

# Compare cloud baseline versions
bellwether diff 1 2

# Get verification badge
bellwether badge --markdown
```

### Auth Commands

```bash
# Manage LLM API keys (stored in system keychain)
bellwether auth # Interactive API key setup
bellwether auth status # Show configured providers
bellwether auth add openai # Add a specific provider key
bellwether auth remove openai # Remove a specific provider key
bellwether auth clear # Remove all stored keys
```

## Security Testing

Run deterministic security vulnerability testing on your MCP tools:

```bash
# Enable security testing in config
bellwether init --preset security npx @mcp/your-server
bellwether check
```

### Security Categories

| Category | Description | CWE |
|:---------|:------------|:----|
| `sql_injection` | SQL injection payloads | CWE-89 |
| `xss` | Cross-site scripting payloads | CWE-79 |
| `path_traversal` | Path traversal attempts | CWE-22 |
| `command_injection` | Command injection payloads | CWE-78 |
| `ssrf` | Server-side request forgery | CWE-918 |
| `error_disclosure` | Sensitive error disclosure | CWE-209 |

### Security Baseline

Security findings are stored in your baseline and compared across runs:

```bash
# Enable security testing in config, then run check
bellwether check
bellwether baseline save

# On next run, compare security posture
bellwether check
# Reports: new findings, resolved findings, risk score changes
```

### Output

Security findings appear in:

- **CONTRACT.md** - Security Baseline section with findings and risk scores
- **Drift reports** - New/resolved findings when comparing baselines
- **SARIF format** - Integrates with GitHub Code Scanning

### Risk Levels

| Level | Score Range | Description |
|:------|:------------|:------------|
| Critical | 80-100 | Immediate action required |
| High | 60-79 | Serious vulnerability |
| Medium | 40-59 | Moderate risk |
| Low | 20-39 | Minor concern |
| Info | 0-19 | Informational |
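
The score bands can be expressed as a simple threshold lookup. This is a minimal sketch assuming scores are numbers from 0 to 100 and that each band's lower bound is inclusive (so a fractional score such as 79.5 falls into High); `riskLevelForScore` and the lowercase level names are hypothetical.

```typescript
// Hypothetical mapping from a 0-100 risk score to the levels in the table above.
type RiskLevel = "critical" | "high" | "medium" | "low" | "info";

function riskLevelForScore(score: number): RiskLevel {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`Risk score must be between 0 and 100, got ${score}`);
  }
  // Lower bounds are inclusive: 80 is critical, 79.5 is high.
  if (score >= 80) return "critical";
  if (score >= 60) return "high";
  if (score >= 40) return "medium";
  if (score >= 20) return "low";
  return "info";
}

console.log(riskLevelForScore(85)); // critical
console.log(riskLevelForScore(42)); // medium
console.log(riskLevelForScore(5)); // info
```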

## Semantic Validation

The check command automatically infers semantic types from parameter names and descriptions, then generates targeted validation tests.

### Inferred Semantic Types

| Type | Example Parameters | Validation |
|:-----|:-------------------|:-----------|
| `date_iso8601` | `created_date`, `birth_day` | YYYY-MM-DD format |
| `datetime` | `created_at`, `updated_at` | ISO 8601 datetime |
| `timestamp` | `unix_epoch`, `time_ms` | Positive integer |
| `email` | `user_email`, `contact_email` | Valid email format |
| `url` | `website_url`, `api_endpoint` | Valid URL format |
| `identifier` | `user_id`, `order_uuid` | Non-empty string |
| `ip_address` | `server_ip`, `client_ip` | IPv4 or IPv6 |
| `phone` | `phone_number`, `mobile` | At least 7 digits |
| `percentage` | `tax_rate`, `progress` | Numeric value |
| `amount_currency` | `total_price`, `balance` | Numeric value |
| `file_path` | `file_path`, `directory` | Path string |
| `json` | `config_data`, `payload` | Valid JSON |
| `base64` | `encoded_data`, `b64_content` | Valid base64 |
| `regex` | `filter_pattern`, `regex` | Valid regex |

### How It Works

1. **Inference**: Parameters are analyzed based on name patterns and descriptions
2. **Test Generation**: Invalid values are generated for each inferred type
3. **Validation**: Tests verify that tools properly reject invalid semantic values
4. **Documentation**: Inferred types appear in CONTRACT.md

Semantic validation runs automatically as part of `bellwether check`; no additional flags are needed.
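
The inference and test-generation steps can be sketched as ordered name-pattern rules plus a table of deliberately invalid values. This is a hypothetical illustration only: the regular expressions cover just four rows of the type table, ignore parameter descriptions (which the real inference also uses), and the invalid values are examples, not Bellwether's actual test inputs.

```typescript
// Hypothetical sketch of name-based semantic type inference plus
// invalid-value generation. The patterns are illustrative, not Bellwether's rules.
type SemanticType = "email" | "url" | "date_iso8601" | "ip_address" | "unknown";

// Ordered rules: the first matching pattern wins.
const NAME_RULES: Array<[RegExp, SemanticType]> = [
  [/email/i, "email"],
  [/(url|endpoint)$/i, "url"],
  [/(_date|_day)$/i, "date_iso8601"],
  [/(^|_)ip(_|$)/i, "ip_address"],
];

function inferSemanticType(paramName: string): SemanticType {
  for (const [pattern, semanticType] of NAME_RULES) {
    if (pattern.test(paramName)) return semanticType;
  }
  return "unknown";
}

// Example values that a tool should reject for each inferred type.
const INVALID_VALUES: Record<SemanticType, string[]> = {
  email: ["not-an-email", "user@"],
  url: ["not a url", "://missing-scheme"],
  date_iso8601: ["2024-13-45", "yesterday"],
  ip_address: ["999.1.1.1", "localhost:abc"],
  unknown: [],
};

function invalidValuesFor(paramName: string): string[] {
  return INVALID_VALUES[inferSemanticType(paramName)];
}

console.log(inferSemanticType("user_email")); // email
console.log(inferSemanticType("api_endpoint")); // url
console.log(inferSemanticType("birth_day")); // date_iso8601
console.log(inferSemanticType("server_ip")); // ip_address
console.log(invalidValuesFor("contact_email").join(", ")); // not-an-email, user@
```

Each generated value would then be passed to the tool (step 3); a tool that accepts it instead of returning an error is missing validation for that parameter.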

## Response Schema Tracking

The check command tracks response schema consistency across multiple test samples, detecting when tools return inconsistent or evolving response structures.

### What It Tracks

| Aspect | Detection | Impact |
|:-------|:----------|:-------|
| **Field consistency** | Fields appearing inconsistently across samples | Schema instability |
| **Type changes** | Field types varying between responses | Breaking changes |
| **Required changes** | Fields becoming required/optional | Contract changes |
| **Schema evolution** | Structural changes between baselines | API drift |

### Stability Grades

| Grade | Confidence | Meaning |
|:------|:-----------|:--------|
| A | 95%+ | Fully stable, consistent responses |
| B | 85%+ | Mostly stable, minor variations |
| C | 70%+ | Moderately stable, some inconsistency |
| D | 50%+ | Unstable, significant variations |
| F | <50% | Very unstable, unreliable responses |
| N/A | - | Insufficient samples (< 3) |
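
A sketch of the grade table, assuming confidence is a fraction from 0 to 1, each threshold is inclusive, and the confidence measure is the share of samples matching the most common response structure. That measure is an assumption for illustration; the README does not specify how confidence is computed, and both function names are hypothetical.

```typescript
type StabilityGrade = "A" | "B" | "C" | "D" | "F" | "N/A";

// Assumed confidence measure (not Bellwether's documented formula): the share
// of samples whose response structure matches the most common structure.
function consistencyConfidence(structureKeys: string[]): number {
  if (structureKeys.length === 0) return 0;
  const counts = new Map<string, number>();
  let maxCount = 0;
  for (const key of structureKeys) {
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    if (next > maxCount) maxCount = next;
  }
  return maxCount / structureKeys.length;
}

// Map confidence (0-1) to the grades in the table above.
function stabilityGrade(confidence: number, sampleCount: number): StabilityGrade {
  if (sampleCount < 3) return "N/A"; // too few samples to grade
  if (confidence >= 0.95) return "A";
  if (confidence >= 0.85) return "B";
  if (confidence >= 0.7) return "C";
  if (confidence >= 0.5) return "D";
  return "F";
}

// Each string summarizes the top-level fields seen in one response sample.
const samples = ["id,name,price", "id,name,price", "id,name,price", "id,name"];
console.log(stabilityGrade(consistencyConfidence(samples), samples.length)); // C
```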

### Breaking vs Non-Breaking Changes

**Breaking changes** (fail CI):

- Fields removed from responses
- Types changed to incompatible types (e.g., `string` → `number`)
- Previously optional fields becoming required

**Non-breaking changes** (warning):

- New fields added
- Required fields becoming optional
- Compatible type widening (e.g., `integer` → `number`)
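
The classification rules can be sketched as a comparison between a field's old and new definitions. The sketch assumes a simplified, hypothetical schema model (a JSON type name plus a `required` flag per field) and treats `integer` → `number` as the only compatible widening; Bellwether's real comparator may recognize more cases.

```typescript
// Simplified, hypothetical model of a response field.
interface FieldSchema {
  type: string; // JSON Schema type name, e.g. "string", "integer", "number"
  required: boolean;
}

type ChangeKind = "breaking" | "non-breaking" | "unchanged";

// Type changes treated as compatible widening (only integer -> number here).
const WIDENINGS = new Set(["integer->number"]);

function classifyFieldChange(before?: FieldSchema, after?: FieldSchema): ChangeKind {
  if (!before && !after) return "unchanged";
  if (!before) return "non-breaking"; // new field added
  if (!after) return "breaking"; // field removed from responses
  const typeChanged = before.type !== after.type;
  if (typeChanged && !WIDENINGS.has(`${before.type}->${after.type}`)) {
    return "breaking"; // incompatible type change, e.g. string -> number
  }
  if (!before.required && after.required) return "breaking"; // optional became required
  if (typeChanged || before.required !== after.required) {
    return "non-breaking"; // compatible widening, or required became optional
  }
  return "unchanged";
}

console.log(classifyFieldChange({ type: "string", required: true }, { type: "number", required: true })); // breaking
console.log(classifyFieldChange({ type: "integer", required: true }, { type: "number", required: true })); // non-breaking
console.log(classifyFieldChange(undefined, { type: "string", required: false })); // non-breaking
```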

### Output

Schema evolution findings appear in:

- **CONTRACT.md** - Schema Stability section with grades and consistency metrics
- **Drift reports** - Structure changes, breaking changes, stability changes
- **JUnit/SARIF** - Test cases for schema evolution issues

Response schema tracking runs automatically during `bellwether check`; no additional flags are needed.

## Error Analysis

The check command analyzes tool errors, detecting likely root causes and suggesting remediations.

### What It Analyzes

| Aspect | Detection | Impact |
|:-------|:----------|:-------|
| **HTTP status codes** | Parses 4xx/5xx codes from messages | Error categorization |
| **Root cause** | Infers cause from error patterns | Debugging guidance |
| **Remediation** | Generates fix suggestions | Actionable solutions |
| **Transient errors** | Identifies retryable errors | Retry strategies |
| **Error trends** | Tracks errors across baselines | Regression detection |

### Error Categories

| Category | HTTP Codes | Description |
|:---------|:-----------|:------------|
| Validation Error | 400 | Invalid input or missing parameters |
| Authentication Error | 401, 403 | Auth or permission failure |
| Not Found | 404 | Resource does not exist |
| Conflict | 409 | Resource state conflict |
| Rate Limited | 429 | Too many requests |
| Server Error | 5xx | Internal server error |
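
The category table amounts to a lookup on the status code. The sketch below assumes the code has already been found in the error message (the regular expression is a simplistic stand-in for the real parsing) and treats 429 and 5xx as transient, which is a common retry convention rather than documented Bellwether behavior; all names are hypothetical.

```typescript
type ErrorCategory =
  | "Validation Error"
  | "Authentication Error"
  | "Not Found"
  | "Conflict"
  | "Rate Limited"
  | "Server Error"
  | "Unknown";

// Simplistic, hypothetical extraction of a 4xx/5xx code from message text.
function extractStatusCode(message: string): number | undefined {
  const match = /\b([45]\d{2})\b/.exec(message);
  return match ? Number(match[1]) : undefined;
}

// Map a status code to the categories in the table above.
function categorizeStatus(status: number | undefined): ErrorCategory {
  if (status === undefined) return "Unknown";
  if (status === 400) return "Validation Error";
  if (status === 401 || status === 403) return "Authentication Error";
  if (status === 404) return "Not Found";
  if (status === 409) return "Conflict";
  if (status === 429) return "Rate Limited";
  if (status >= 500 && status <= 599) return "Server Error";
  return "Unknown";
}

// Assumption: rate limits and server errors are worth retrying.
function isTransient(category: ErrorCategory): boolean {
  return category === "Rate Limited" || category === "Server Error";
}

const category = categorizeStatus(extractStatusCode("Request failed with status 429"));
console.log(category, isTransient(category)); // Rate Limited true
```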

### Error Trend Detection

When comparing baselines, Bellwether tracks:

- **New error types** - Errors that didn't occur before
- **Resolved errors** - Errors that no longer occur
- **Increasing errors** - Error frequency grew by more than 50%
- **Decreasing errors** - Error frequency dropped by more than 50%
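
Trend classification compares how often each error type occurred in the two baselines. This minimal sketch assumes frequency is an occurrence count per error type from comparable runs and applies the 50% thresholds strictly, so a change of exactly 50% counts as stable; `classifyErrorTrend` is a hypothetical name.

```typescript
type ErrorTrend = "new" | "resolved" | "increasing" | "decreasing" | "stable";

// Hypothetical classification of one error type's count between two baselines.
function classifyErrorTrend(previousCount: number, currentCount: number): ErrorTrend {
  if (previousCount === 0 && currentCount === 0) return "stable";
  if (previousCount === 0) return "new";
  if (currentCount === 0) return "resolved";
  // Relative change, e.g. 4 -> 7 is +0.75 (75%).
  const change = (currentCount - previousCount) / previousCount;
  if (change > 0.5) return "increasing";
  if (change < -0.5) return "decreasing";
  return "stable";
}

console.log(classifyErrorTrend(0, 3)); // new
console.log(classifyErrorTrend(4, 0)); // resolved
console.log(classifyErrorTrend(4, 7)); // increasing
console.log(classifyErrorTrend(4, 5)); // stable
```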

### Output

Error analysis findings appear in:

- **CONTRACT.md** - Error Analysis section with root causes and remediations
- **Drift reports** - Error trend changes between baselines
- **JUnit/SARIF** - Test cases for error trend issues

Error analysis runs automatically during `bellwether check`; no additional flags are needed.

## Performance Confidence

The check command calculates statistical confidence for performance metrics, indicating how reliable your performance baselines are.

### What It Measures

| Metric | Description | Impact |
|:-------|:------------|:-------|
| **Sample count** | Number of latency measurements | More samples = higher confidence |
| **Standard deviation** | Variability in response times | Lower = more consistent |
| **Coefficient of variation** | Relative variability (stdDev / mean) | Lower = more predictable |

### Confidence Levels

| Level | Requirements | Meaning |
|:------|:-------------|:--------|
| HIGH | 10+ samples, CV ≤ 30% | Reliable baseline for regression detection |
| MEDIUM | 5+ samples, CV ≤ 50% | Moderately reliable, use with caution |
| LOW | < 5 samples or CV > 50% | Unreliable baseline, collect more data |
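
The confidence table combines sample count with the coefficient of variation (standard deviation divided by mean). This sketch assumes latencies in milliseconds and uses the sample standard deviation (n - 1 denominator); the README does not say which variant Bellwether uses, and the function names are hypothetical.

```typescript
type ConfidenceLevel = "HIGH" | "MEDIUM" | "LOW";

// Coefficient of variation: sample standard deviation divided by the mean.
function coefficientOfVariation(samplesMs: number[]): number {
  const n = samplesMs.length;
  if (n < 2) return Number.POSITIVE_INFINITY; // sample std dev is undefined for fewer than 2 samples
  const mean = samplesMs.reduce((sum, x) => sum + x, 0) / n;
  if (mean === 0) return Number.POSITIVE_INFINITY;
  const variance = samplesMs.reduce((sum, x) => sum + (x - mean) ** 2, 0) / (n - 1);
  return Math.sqrt(variance) / mean;
}

// Apply the sample-count and CV thresholds from the table above.
function confidenceLevel(samplesMs: number[]): ConfidenceLevel {
  const cv = coefficientOfVariation(samplesMs);
  if (samplesMs.length >= 10 && cv <= 0.3) return "HIGH";
  if (samplesMs.length >= 5 && cv <= 0.5) return "MEDIUM";
  return "LOW";
}

const steady = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]; // mean 100 ms, CV about 0.018
console.log(confidenceLevel(steady)); // HIGH
console.log(confidenceLevel([100, 150, 90])); // LOW (fewer than 5 samples)
```

Because the CV is relative, a tool averaging 10 ms with a 5 ms standard deviation (CV 0.5) is judged less predictable than one averaging 500 ms with a 50 ms standard deviation (CV 0.1).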

### Why It Matters

Performance regressions detected with low confidence may not be real:

- **Few samples**: Random variation can look like a regression
- **High variability**: The tool's performance may be inherently inconsistent

When confidence is low, the CLI recommends:

```
Increase `check.sampling.minSamples` for reliable baselines
```

### Output

Confidence information appears in:

- **CONTRACT.md** - Performance Baseline section with confidence column
- **Drift reports** - Regression markers indicate reliability
- **JUnit/SARIF** - Test cases for low confidence tools
- **GitHub Actions** - Annotations for confidence warnings

### Example Output

```
─── Performance Regressions ───
! read_file: 100ms → 150ms (+50%) (low confidence)
! write_file: 200ms → 250ms (+25%)

Note: Some tools have low confidence metrics.
Run with more samples for reliable baselines: read_file
```

Performance confidence runs automatically during `bellwether check`; no additional flags are needed.

## Documentation Quality Scoring

The check command calculates a documentation quality score for your MCP server, evaluating how well tools and parameters are documented.

### What It Measures

| Component | Weight | Description |
|:----------|:-------|:------------|
| **Description Coverage** | 30% | Percentage of tools with descriptions |
| **Description Quality** | 30% | Length, clarity, and actionable language |
| **Parameter Documentation** | 25% | Percentage of parameters with descriptions |
| **Example Coverage** | 15% | Percentage of tools with schema examples |

### Grade Thresholds

| Grade | Score Range | Meaning |
|:------|:------------|:--------|
| A | 90-100 | Excellent documentation |
| B | 80-89 | Good documentation |
| C | 70-79 | Acceptable documentation |
| D | 60-69 | Poor documentation |
| F | 0-59 | Failing documentation |
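
The weights and grade thresholds combine into a single score roughly as follows. This sketch assumes each component is already a 0-100 sub-score and that the weighted sum is rounded to an integer; both assumptions, and the function names, are illustrative rather than Bellwether's documented implementation.

```typescript
// Hypothetical sketch: each component is a 0-100 sub-score, weighted per the
// "What It Measures" table (30% / 30% / 25% / 15%).
interface DocumentationComponents {
  descriptionCoverage: number;
  descriptionQuality: number;
  parameterDocumentation: number;
  exampleCoverage: number;
}

type DocGrade = "A" | "B" | "C" | "D" | "F";

function documentationScore(c: DocumentationComponents): number {
  const weighted =
    c.descriptionCoverage * 0.3 +
    c.descriptionQuality * 0.3 +
    c.parameterDocumentation * 0.25 +
    c.exampleCoverage * 0.15;
  return Math.round(weighted);
}

// Grade thresholds from the table above.
function documentationGrade(score: number): DocGrade {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

const score = documentationScore({
  descriptionCoverage: 100,
  descriptionQuality: 80,
  parameterDocumentation: 90,
  exampleCoverage: 50,
});
console.log(score, documentationGrade(score)); // 84 B
```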

### Quality Criteria

Descriptions are scored based on:

- **Length**: At least 50 characters for "good", 20+ for "acceptable"
- **Imperative verbs**: Starting with action words (Creates, Gets, Deletes)
- **Behavior description**: Mentioning what the tool returns or provides
- **Examples/specifics**: Including "e.g.", "example", or "such as"
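
The criteria can be approximated with simple string checks. In this hypothetical sketch, the verb list and keyword patterns are illustrative assumptions based on the examples above; it reports each criterion separately rather than reproducing Bellwether's actual quality sub-score.

```typescript
// Hypothetical heuristics mirroring the criteria above; the verb list and
// keyword patterns are illustrative assumptions.
const ACTION_VERBS = ["Creates", "Gets", "Deletes", "Returns", "Lists", "Updates"];

interface DescriptionAssessment {
  lengthRating: "good" | "acceptable" | "short";
  startsWithActionVerb: boolean;
  describesBehavior: boolean;
  hasExamples: boolean;
}

function assessDescription(description: string): DescriptionAssessment {
  const text = description.trim();
  const firstWord = text.split(/\s+/)[0] ?? "";
  return {
    // 50+ characters is "good", 20+ is "acceptable".
    lengthRating: text.length >= 50 ? "good" : text.length >= 20 ? "acceptable" : "short",
    startsWithActionVerb: ACTION_VERBS.includes(firstWord),
    describesBehavior: /\b(returns|provides)\b/i.test(text),
    hasExamples: /\b(e\.g\.|example|such as)/i.test(text),
  };
}

const result = assessDescription(
  "Gets the current weather for a city and returns the temperature, e.g. 18 degrees.",
);
console.log(result.lengthRating, result.startsWithActionVerb, result.describesBehavior, result.hasExamples); // good true true true
```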

### Issue Types

| Issue | Severity | Description |
|:------|:---------|:------------|
| Missing Description | Error | Tool has no description |
| Short Description | Warning | Description under 20 characters |
| Missing Param Description | Warning | Parameter has no description |
| No Examples | Info | Schema has no examples |

### Output

Documentation scores appear in:

- **CONTRACT.md** - Documentation Quality section with breakdown
- **Drift reports** - Score changes between baselines
- **JUnit/SARIF** - Test cases for documentation issues
- **GitHub Actions** - Annotations for quality degradation

### Example Output

```
─── Documentation Quality ───
✓ Score: 60 → 85 (+25)
Grade: D → B
✓ Issues fixed: 3

─── Statistics ───
Documentation score: 85/100 (B)
Documentation change: +25
```

Documentation quality scoring runs automatically during `bellwether check`; no additional flags are needed.

## Custom Test Scenarios

Define deterministic tests in `bellwether-tests.yaml`:

```yaml
version: "1"
scenarios:
  - tool: get_weather
    args:
      location: "San Francisco"
    assertions:
      - path: "content[0].text"
        condition: "contains"
        value: "temperature"
```
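
Each assertion reads a value at `path` in the tool result and applies `condition` to it. A minimal sketch of that evaluation, assuming the standard MCP tool-result shape (`{ content: [{ type: "text", text: "..." }] }`), paths made of dot-separated keys with `[n]` indexes, and only the `contains` condition shown above; `resolvePath` and `evaluateAssertion` are hypothetical, not Bellwether's scenario engine.

```typescript
// Hypothetical evaluator for a scenario assertion like the one above.
interface Assertion {
  path: string;
  condition: "contains";
  value: string;
}

// Resolve paths such as "content[0].text" against a plain object.
function resolvePath(root: unknown, path: string): unknown {
  // "content[0].text" -> ["content", "0", "text"]
  const segments = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let current: unknown = root;
  for (const segment of segments) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

function evaluateAssertion(result: unknown, assertion: Assertion): boolean {
  const actual = resolvePath(result, assertion.path);
  return typeof actual === "string" && actual.includes(assertion.value);
}

const toolResult = {
  content: [{ type: "text", text: "San Francisco: temperature 18 C, partly cloudy" }],
};
const ok = evaluateAssertion(toolResult, {
  path: "content[0].text",
  condition: "contains",
  value: "temperature",
});
console.log(ok); // true
```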

Reference the file in your config:

```yaml
# bellwether.yaml
scenarios:
  path: "./bellwether-tests.yaml"
  only: true # Run only scenarios, no LLM tests
```

Then run:

```bash
bellwether check # Run scenarios as part of check
bellwether explore # Run scenarios as part of explore
```

## Presets

| Preset | Optimized For | Description |
|:-------|:--------------|:------------|
| (default) | check | Zero LLM, free, deterministic |
| `ci` | check | Optimized for CI/CD, fails on drift |
| `security` | explore | Security + technical personas, OpenAI |
| `thorough` | explore | All 4 personas, workflow discovery |
| `local` | explore | Local Ollama, free, private |

Use with: `bellwether init --preset <name> npx @mcp/server`

## GitHub Action

```yaml
- name: Detect Behavioral Drift
  uses: dotsetlabs/bellwether/action@v1
  with:
    server-command: 'npx @mcp/your-server'
    baseline-path: './bellwether-baseline.json'
    fail-on-severity: 'warning'
```

See [action/README.md](./action/README.md) for full documentation.

## Environment Variables

| Variable | Description |
|:---------|:------------|
| `OPENAI_API_KEY` | OpenAI API key (explore command) |
| `ANTHROPIC_API_KEY` | Anthropic API key (explore command) |
| `OLLAMA_BASE_URL` | Ollama server URL (default: `http://localhost:11434`) |
| `BELLWETHER_SESSION` | Cloud session token for CI/CD |
| `BELLWETHER_API_URL` | Cloud API URL (default: `https://api.bellwether.sh`) |
| `BELLWETHER_TEAM_ID` | Override active team for cloud operations (multi-team CI/CD) |
| `BELLWETHER_REGISTRY_URL` | Registry API URL override (for self-hosted registries) |

See [.env.example](./.env.example) for full documentation.
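
If you read these variables from your own tooling (for example, a CI wrapper script), the documented defaults can be applied as below. This is a hypothetical sketch: it treats empty strings as unset, covers only a subset of the table, and says nothing about how Bellwether itself orders environment variables against config-file values.

```typescript
// Hypothetical sketch of resolving a few of these variables with the documented defaults.
interface BellwetherEnv {
  ollamaBaseUrl: string;
  apiUrl: string;
  teamId?: string;
  sessionToken?: string;
}

function resolveEnv(env: Record<string, string | undefined>): BellwetherEnv {
  // `||` treats empty strings as unset (an assumption of this sketch).
  return {
    ollamaBaseUrl: env["OLLAMA_BASE_URL"] || "http://localhost:11434",
    apiUrl: env["BELLWETHER_API_URL"] || "https://api.bellwether.sh",
    teamId: env["BELLWETHER_TEAM_ID"] || undefined,
    sessionToken: env["BELLWETHER_SESSION"] || undefined,
  };
}

// In Node.js you would pass process.env; a literal object is used here for illustration.
const resolved = resolveEnv({ BELLWETHER_TEAM_ID: "team_123" });
console.log(resolved.apiUrl); // https://api.bellwether.sh
console.log(resolved.teamId); // team_123
```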

## Development

```bash
git clone https://github.com/dotsetlabs/bellwether
cd bellwether/cli
npm install
npm run build
npm test

# Run locally
./dist/cli/index.js check npx @mcp/server
./dist/cli/index.js explore npx @mcp/server
```

## License

MIT License - see [LICENSE](./LICENSE) for details.

---

<p align="center">
Built by <a href="https://dotsetlabs.com">Dotset Labs LLC</a>
</p>