@dotsetlabs/bellwether 0.10.0
- package/CHANGELOG.md +291 -0
- package/LICENSE +21 -0
- package/README.md +739 -0
- package/dist/auth/credentials.d.ts +64 -0
- package/dist/auth/credentials.js +218 -0
- package/dist/auth/index.d.ts +6 -0
- package/dist/auth/index.js +6 -0
- package/dist/auth/keychain.d.ts +64 -0
- package/dist/auth/keychain.js +268 -0
- package/dist/baseline/ab-testing.d.ts +80 -0
- package/dist/baseline/ab-testing.js +236 -0
- package/dist/baseline/ai-compatibility-scorer.d.ts +95 -0
- package/dist/baseline/ai-compatibility-scorer.js +606 -0
- package/dist/baseline/calibration.d.ts +77 -0
- package/dist/baseline/calibration.js +136 -0
- package/dist/baseline/category-matching.d.ts +85 -0
- package/dist/baseline/category-matching.js +289 -0
- package/dist/baseline/change-impact-analyzer.d.ts +98 -0
- package/dist/baseline/change-impact-analyzer.js +592 -0
- package/dist/baseline/comparator.d.ts +64 -0
- package/dist/baseline/comparator.js +916 -0
- package/dist/baseline/confidence.d.ts +55 -0
- package/dist/baseline/confidence.js +122 -0
- package/dist/baseline/converter.d.ts +61 -0
- package/dist/baseline/converter.js +585 -0
- package/dist/baseline/dependency-analyzer.d.ts +89 -0
- package/dist/baseline/dependency-analyzer.js +567 -0
- package/dist/baseline/deprecation-tracker.d.ts +133 -0
- package/dist/baseline/deprecation-tracker.js +322 -0
- package/dist/baseline/diff.d.ts +55 -0
- package/dist/baseline/diff.js +1584 -0
- package/dist/baseline/documentation-scorer.d.ts +205 -0
- package/dist/baseline/documentation-scorer.js +466 -0
- package/dist/baseline/embeddings.d.ts +118 -0
- package/dist/baseline/embeddings.js +251 -0
- package/dist/baseline/error-analyzer.d.ts +198 -0
- package/dist/baseline/error-analyzer.js +721 -0
- package/dist/baseline/evaluation/evaluator.d.ts +42 -0
- package/dist/baseline/evaluation/evaluator.js +323 -0
- package/dist/baseline/evaluation/expanded-dataset.d.ts +45 -0
- package/dist/baseline/evaluation/expanded-dataset.js +1164 -0
- package/dist/baseline/evaluation/golden-dataset.d.ts +58 -0
- package/dist/baseline/evaluation/golden-dataset.js +717 -0
- package/dist/baseline/evaluation/index.d.ts +15 -0
- package/dist/baseline/evaluation/index.js +15 -0
- package/dist/baseline/evaluation/types.d.ts +186 -0
- package/dist/baseline/evaluation/types.js +8 -0
- package/dist/baseline/external-dependency-detector.d.ts +181 -0
- package/dist/baseline/external-dependency-detector.js +524 -0
- package/dist/baseline/golden-output.d.ts +162 -0
- package/dist/baseline/golden-output.js +636 -0
- package/dist/baseline/health-scorer.d.ts +174 -0
- package/dist/baseline/health-scorer.js +451 -0
- package/dist/baseline/incremental-checker.d.ts +97 -0
- package/dist/baseline/incremental-checker.js +174 -0
- package/dist/baseline/index.d.ts +31 -0
- package/dist/baseline/index.js +42 -0
- package/dist/baseline/migration-generator.d.ts +137 -0
- package/dist/baseline/migration-generator.js +554 -0
- package/dist/baseline/migrations.d.ts +60 -0
- package/dist/baseline/migrations.js +197 -0
- package/dist/baseline/performance-tracker.d.ts +214 -0
- package/dist/baseline/performance-tracker.js +577 -0
- package/dist/baseline/pr-comment-generator.d.ts +117 -0
- package/dist/baseline/pr-comment-generator.js +546 -0
- package/dist/baseline/response-fingerprint.d.ts +127 -0
- package/dist/baseline/response-fingerprint.js +728 -0
- package/dist/baseline/response-schema-tracker.d.ts +129 -0
- package/dist/baseline/response-schema-tracker.js +420 -0
- package/dist/baseline/risk-scorer.d.ts +54 -0
- package/dist/baseline/risk-scorer.js +434 -0
- package/dist/baseline/saver.d.ts +89 -0
- package/dist/baseline/saver.js +554 -0
- package/dist/baseline/scenario-generator.d.ts +151 -0
- package/dist/baseline/scenario-generator.js +905 -0
- package/dist/baseline/schema-compare.d.ts +86 -0
- package/dist/baseline/schema-compare.js +557 -0
- package/dist/baseline/schema-evolution.d.ts +189 -0
- package/dist/baseline/schema-evolution.js +467 -0
- package/dist/baseline/semantic.d.ts +203 -0
- package/dist/baseline/semantic.js +908 -0
- package/dist/baseline/synonyms.d.ts +60 -0
- package/dist/baseline/synonyms.js +386 -0
- package/dist/baseline/telemetry.d.ts +165 -0
- package/dist/baseline/telemetry.js +294 -0
- package/dist/baseline/test-pruner.d.ts +120 -0
- package/dist/baseline/test-pruner.js +387 -0
- package/dist/baseline/types.d.ts +449 -0
- package/dist/baseline/types.js +5 -0
- package/dist/baseline/version.d.ts +138 -0
- package/dist/baseline/version.js +206 -0
- package/dist/cache/index.d.ts +5 -0
- package/dist/cache/index.js +5 -0
- package/dist/cache/response-cache.d.ts +151 -0
- package/dist/cache/response-cache.js +287 -0
- package/dist/ci/index.d.ts +60 -0
- package/dist/ci/index.js +342 -0
- package/dist/cli/commands/auth.d.ts +12 -0
- package/dist/cli/commands/auth.js +352 -0
- package/dist/cli/commands/badge.d.ts +3 -0
- package/dist/cli/commands/badge.js +74 -0
- package/dist/cli/commands/baseline-accept.d.ts +15 -0
- package/dist/cli/commands/baseline-accept.js +178 -0
- package/dist/cli/commands/baseline-migrate.d.ts +12 -0
- package/dist/cli/commands/baseline-migrate.js +164 -0
- package/dist/cli/commands/baseline.d.ts +14 -0
- package/dist/cli/commands/baseline.js +449 -0
- package/dist/cli/commands/beta.d.ts +10 -0
- package/dist/cli/commands/beta.js +231 -0
- package/dist/cli/commands/check.d.ts +11 -0
- package/dist/cli/commands/check.js +820 -0
- package/dist/cli/commands/cloud/badge.d.ts +3 -0
- package/dist/cli/commands/cloud/badge.js +74 -0
- package/dist/cli/commands/cloud/diff.d.ts +6 -0
- package/dist/cli/commands/cloud/diff.js +79 -0
- package/dist/cli/commands/cloud/history.d.ts +6 -0
- package/dist/cli/commands/cloud/history.js +102 -0
- package/dist/cli/commands/cloud/link.d.ts +9 -0
- package/dist/cli/commands/cloud/link.js +119 -0
- package/dist/cli/commands/cloud/login.d.ts +7 -0
- package/dist/cli/commands/cloud/login.js +499 -0
- package/dist/cli/commands/cloud/projects.d.ts +6 -0
- package/dist/cli/commands/cloud/projects.js +44 -0
- package/dist/cli/commands/cloud/shared.d.ts +7 -0
- package/dist/cli/commands/cloud/shared.js +42 -0
- package/dist/cli/commands/cloud/teams.d.ts +8 -0
- package/dist/cli/commands/cloud/teams.js +169 -0
- package/dist/cli/commands/cloud/upload.d.ts +8 -0
- package/dist/cli/commands/cloud/upload.js +181 -0
- package/dist/cli/commands/contract.d.ts +11 -0
- package/dist/cli/commands/contract.js +280 -0
- package/dist/cli/commands/discover.d.ts +3 -0
- package/dist/cli/commands/discover.js +82 -0
- package/dist/cli/commands/eval.d.ts +9 -0
- package/dist/cli/commands/eval.js +187 -0
- package/dist/cli/commands/explore.d.ts +11 -0
- package/dist/cli/commands/explore.js +437 -0
- package/dist/cli/commands/feedback.d.ts +9 -0
- package/dist/cli/commands/feedback.js +174 -0
- package/dist/cli/commands/golden.d.ts +12 -0
- package/dist/cli/commands/golden.js +407 -0
- package/dist/cli/commands/history.d.ts +10 -0
- package/dist/cli/commands/history.js +202 -0
- package/dist/cli/commands/init.d.ts +9 -0
- package/dist/cli/commands/init.js +219 -0
- package/dist/cli/commands/interview.d.ts +3 -0
- package/dist/cli/commands/interview.js +903 -0
- package/dist/cli/commands/link.d.ts +10 -0
- package/dist/cli/commands/link.js +169 -0
- package/dist/cli/commands/login.d.ts +7 -0
- package/dist/cli/commands/login.js +499 -0
- package/dist/cli/commands/preset.d.ts +33 -0
- package/dist/cli/commands/preset.js +297 -0
- package/dist/cli/commands/profile.d.ts +33 -0
- package/dist/cli/commands/profile.js +286 -0
- package/dist/cli/commands/registry.d.ts +11 -0
- package/dist/cli/commands/registry.js +146 -0
- package/dist/cli/commands/shared.d.ts +79 -0
- package/dist/cli/commands/shared.js +196 -0
- package/dist/cli/commands/teams.d.ts +8 -0
- package/dist/cli/commands/teams.js +169 -0
- package/dist/cli/commands/test.d.ts +9 -0
- package/dist/cli/commands/test.js +500 -0
- package/dist/cli/commands/upload.d.ts +8 -0
- package/dist/cli/commands/upload.js +223 -0
- package/dist/cli/commands/validate-config.d.ts +6 -0
- package/dist/cli/commands/validate-config.js +35 -0
- package/dist/cli/commands/verify.d.ts +11 -0
- package/dist/cli/commands/verify.js +283 -0
- package/dist/cli/commands/watch.d.ts +12 -0
- package/dist/cli/commands/watch.js +253 -0
- package/dist/cli/index.d.ts +3 -0
- package/dist/cli/index.js +178 -0
- package/dist/cli/interactive.d.ts +47 -0
- package/dist/cli/interactive.js +216 -0
- package/dist/cli/output/terminal-reporter.d.ts +19 -0
- package/dist/cli/output/terminal-reporter.js +104 -0
- package/dist/cli/output.d.ts +226 -0
- package/dist/cli/output.js +438 -0
- package/dist/cli/utils/env.d.ts +5 -0
- package/dist/cli/utils/env.js +14 -0
- package/dist/cli/utils/progress.d.ts +59 -0
- package/dist/cli/utils/progress.js +206 -0
- package/dist/cli/utils/server-context.d.ts +10 -0
- package/dist/cli/utils/server-context.js +36 -0
- package/dist/cloud/auth.d.ts +144 -0
- package/dist/cloud/auth.js +374 -0
- package/dist/cloud/client.d.ts +24 -0
- package/dist/cloud/client.js +65 -0
- package/dist/cloud/http-client.d.ts +38 -0
- package/dist/cloud/http-client.js +215 -0
- package/dist/cloud/index.d.ts +23 -0
- package/dist/cloud/index.js +25 -0
- package/dist/cloud/mock-client.d.ts +107 -0
- package/dist/cloud/mock-client.js +545 -0
- package/dist/cloud/types.d.ts +515 -0
- package/dist/cloud/types.js +15 -0
- package/dist/config/defaults.d.ts +160 -0
- package/dist/config/defaults.js +169 -0
- package/dist/config/loader.d.ts +24 -0
- package/dist/config/loader.js +122 -0
- package/dist/config/template.d.ts +42 -0
- package/dist/config/template.js +647 -0
- package/dist/config/validator.d.ts +2112 -0
- package/dist/config/validator.js +658 -0
- package/dist/constants/cloud.d.ts +107 -0
- package/dist/constants/cloud.js +110 -0
- package/dist/constants/core.d.ts +521 -0
- package/dist/constants/core.js +556 -0
- package/dist/constants/testing.d.ts +1283 -0
- package/dist/constants/testing.js +1568 -0
- package/dist/constants.d.ts +10 -0
- package/dist/constants.js +10 -0
- package/dist/contract/index.d.ts +6 -0
- package/dist/contract/index.js +5 -0
- package/dist/contract/validator.d.ts +177 -0
- package/dist/contract/validator.js +574 -0
- package/dist/cost/index.d.ts +6 -0
- package/dist/cost/index.js +5 -0
- package/dist/cost/tracker.d.ts +134 -0
- package/dist/cost/tracker.js +313 -0
- package/dist/discovery/discovery.d.ts +16 -0
- package/dist/discovery/discovery.js +173 -0
- package/dist/discovery/types.d.ts +51 -0
- package/dist/discovery/types.js +2 -0
- package/dist/docs/agents.d.ts +3 -0
- package/dist/docs/agents.js +995 -0
- package/dist/docs/contract.d.ts +51 -0
- package/dist/docs/contract.js +1681 -0
- package/dist/docs/generator.d.ts +4 -0
- package/dist/docs/generator.js +4 -0
- package/dist/docs/html-reporter.d.ts +9 -0
- package/dist/docs/html-reporter.js +757 -0
- package/dist/docs/index.d.ts +10 -0
- package/dist/docs/index.js +11 -0
- package/dist/docs/junit-reporter.d.ts +18 -0
- package/dist/docs/junit-reporter.js +210 -0
- package/dist/docs/report.d.ts +14 -0
- package/dist/docs/report.js +44 -0
- package/dist/docs/sarif-reporter.d.ts +19 -0
- package/dist/docs/sarif-reporter.js +335 -0
- package/dist/docs/shared.d.ts +35 -0
- package/dist/docs/shared.js +162 -0
- package/dist/docs/templates.d.ts +12 -0
- package/dist/docs/templates.js +76 -0
- package/dist/errors/index.d.ts +6 -0
- package/dist/errors/index.js +6 -0
- package/dist/errors/retry.d.ts +92 -0
- package/dist/errors/retry.js +323 -0
- package/dist/errors/types.d.ts +321 -0
- package/dist/errors/types.js +584 -0
- package/dist/index.d.ts +32 -0
- package/dist/index.js +32 -0
- package/dist/interview/dependency-resolver.d.ts +11 -0
- package/dist/interview/dependency-resolver.js +32 -0
- package/dist/interview/interviewer.d.ts +232 -0
- package/dist/interview/interviewer.js +1939 -0
- package/dist/interview/mock-response-generator.d.ts +7 -0
- package/dist/interview/mock-response-generator.js +102 -0
- package/dist/interview/orchestrator.d.ts +237 -0
- package/dist/interview/orchestrator.js +1296 -0
- package/dist/interview/rate-limiter.d.ts +15 -0
- package/dist/interview/rate-limiter.js +55 -0
- package/dist/interview/response-validator.d.ts +10 -0
- package/dist/interview/response-validator.js +132 -0
- package/dist/interview/schema-inferrer.d.ts +8 -0
- package/dist/interview/schema-inferrer.js +71 -0
- package/dist/interview/schema-test-generator.d.ts +71 -0
- package/dist/interview/schema-test-generator.js +834 -0
- package/dist/interview/smart-value-generator.d.ts +155 -0
- package/dist/interview/smart-value-generator.js +554 -0
- package/dist/interview/stateful-test-runner.d.ts +19 -0
- package/dist/interview/stateful-test-runner.js +106 -0
- package/dist/interview/types.d.ts +561 -0
- package/dist/interview/types.js +2 -0
- package/dist/llm/anthropic.d.ts +41 -0
- package/dist/llm/anthropic.js +355 -0
- package/dist/llm/client.d.ts +123 -0
- package/dist/llm/client.js +42 -0
- package/dist/llm/factory.d.ts +38 -0
- package/dist/llm/factory.js +145 -0
- package/dist/llm/fallback.d.ts +140 -0
- package/dist/llm/fallback.js +379 -0
- package/dist/llm/index.d.ts +18 -0
- package/dist/llm/index.js +15 -0
- package/dist/llm/ollama.d.ts +37 -0
- package/dist/llm/ollama.js +330 -0
- package/dist/llm/openai.d.ts +25 -0
- package/dist/llm/openai.js +320 -0
- package/dist/llm/token-budget.d.ts +161 -0
- package/dist/llm/token-budget.js +395 -0
- package/dist/logging/logger.d.ts +70 -0
- package/dist/logging/logger.js +130 -0
- package/dist/metrics/collector.d.ts +106 -0
- package/dist/metrics/collector.js +547 -0
- package/dist/metrics/index.d.ts +7 -0
- package/dist/metrics/index.js +7 -0
- package/dist/metrics/prometheus.d.ts +20 -0
- package/dist/metrics/prometheus.js +241 -0
- package/dist/metrics/types.d.ts +209 -0
- package/dist/metrics/types.js +5 -0
- package/dist/persona/builtins.d.ts +54 -0
- package/dist/persona/builtins.js +219 -0
- package/dist/persona/index.d.ts +8 -0
- package/dist/persona/index.js +8 -0
- package/dist/persona/loader.d.ts +30 -0
- package/dist/persona/loader.js +190 -0
- package/dist/persona/types.d.ts +144 -0
- package/dist/persona/types.js +5 -0
- package/dist/persona/validation.d.ts +94 -0
- package/dist/persona/validation.js +332 -0
- package/dist/prompts/index.d.ts +5 -0
- package/dist/prompts/index.js +5 -0
- package/dist/prompts/templates.d.ts +180 -0
- package/dist/prompts/templates.js +431 -0
- package/dist/registry/client.d.ts +49 -0
- package/dist/registry/client.js +191 -0
- package/dist/registry/index.d.ts +7 -0
- package/dist/registry/index.js +6 -0
- package/dist/registry/types.d.ts +140 -0
- package/dist/registry/types.js +6 -0
- package/dist/scenarios/evaluator.d.ts +43 -0
- package/dist/scenarios/evaluator.js +206 -0
- package/dist/scenarios/index.d.ts +10 -0
- package/dist/scenarios/index.js +9 -0
- package/dist/scenarios/loader.d.ts +20 -0
- package/dist/scenarios/loader.js +285 -0
- package/dist/scenarios/types.d.ts +153 -0
- package/dist/scenarios/types.js +8 -0
- package/dist/security/index.d.ts +17 -0
- package/dist/security/index.js +18 -0
- package/dist/security/payloads.d.ts +61 -0
- package/dist/security/payloads.js +268 -0
- package/dist/security/security-tester.d.ts +42 -0
- package/dist/security/security-tester.js +582 -0
- package/dist/security/types.d.ts +166 -0
- package/dist/security/types.js +8 -0
- package/dist/transport/base-transport.d.ts +59 -0
- package/dist/transport/base-transport.js +38 -0
- package/dist/transport/http-transport.d.ts +67 -0
- package/dist/transport/http-transport.js +238 -0
- package/dist/transport/mcp-client.d.ts +141 -0
- package/dist/transport/mcp-client.js +496 -0
- package/dist/transport/sse-transport.d.ts +88 -0
- package/dist/transport/sse-transport.js +316 -0
- package/dist/transport/stdio-transport.d.ts +43 -0
- package/dist/transport/stdio-transport.js +238 -0
- package/dist/transport/types.d.ts +125 -0
- package/dist/transport/types.js +16 -0
- package/dist/utils/concurrency.d.ts +123 -0
- package/dist/utils/concurrency.js +213 -0
- package/dist/utils/formatters.d.ts +16 -0
- package/dist/utils/formatters.js +37 -0
- package/dist/utils/index.d.ts +8 -0
- package/dist/utils/index.js +8 -0
- package/dist/utils/jsonpath.d.ts +87 -0
- package/dist/utils/jsonpath.js +326 -0
- package/dist/utils/markdown.d.ts +113 -0
- package/dist/utils/markdown.js +265 -0
- package/dist/utils/network.d.ts +14 -0
- package/dist/utils/network.js +17 -0
- package/dist/utils/sanitize.d.ts +92 -0
- package/dist/utils/sanitize.js +191 -0
- package/dist/utils/semantic.d.ts +194 -0
- package/dist/utils/semantic.js +1051 -0
- package/dist/utils/smart-truncate.d.ts +94 -0
- package/dist/utils/smart-truncate.js +361 -0
- package/dist/utils/timeout.d.ts +153 -0
- package/dist/utils/timeout.js +205 -0
- package/dist/utils/yaml-parser.d.ts +58 -0
- package/dist/utils/yaml-parser.js +86 -0
- package/dist/validation/index.d.ts +32 -0
- package/dist/validation/index.js +32 -0
- package/dist/validation/semantic-test-generator.d.ts +50 -0
- package/dist/validation/semantic-test-generator.js +176 -0
- package/dist/validation/semantic-types.d.ts +66 -0
- package/dist/validation/semantic-types.js +94 -0
- package/dist/validation/semantic-validator.d.ts +38 -0
- package/dist/validation/semantic-validator.js +340 -0
- package/dist/verification/index.d.ts +6 -0
- package/dist/verification/index.js +5 -0
- package/dist/verification/types.d.ts +133 -0
- package/dist/verification/types.js +5 -0
- package/dist/verification/verifier.d.ts +30 -0
- package/dist/verification/verifier.js +309 -0
- package/dist/version.d.ts +19 -0
- package/dist/version.js +48 -0
- package/dist/workflow/auto-generator.d.ts +27 -0
- package/dist/workflow/auto-generator.js +513 -0
- package/dist/workflow/discovery.d.ts +40 -0
- package/dist/workflow/discovery.js +195 -0
- package/dist/workflow/executor.d.ts +82 -0
- package/dist/workflow/executor.js +611 -0
- package/dist/workflow/index.d.ts +10 -0
- package/dist/workflow/index.js +10 -0
- package/dist/workflow/loader.d.ts +24 -0
- package/dist/workflow/loader.js +194 -0
- package/dist/workflow/state-tracker.d.ts +98 -0
- package/dist/workflow/state-tracker.js +424 -0
- package/dist/workflow/types.d.ts +337 -0
- package/dist/workflow/types.js +5 -0
- package/package.json +94 -0
- package/schemas/bellwether-check.schema.json +651 -0
package/CHANGELOG.md
ADDED
@@ -0,0 +1,291 @@
# Changelog

All notable changes to this project will be documented in this file.

## [0.10.0] - 2026-01-24

### Features

- **Smart test value generation**: New intelligent value generator that produces semantically valid test inputs by:
  - Recognizing patterns in field names (dates, emails, URLs, phone numbers, IDs, etc.)
  - Respecting JSON Schema `format` fields
  - Generating syntactically correct values more likely to be accepted by real tools
- **Stateful testing**: Tests can now share outputs between tool calls
  - Tool responses are parsed and stored in a shared state map
  - Subsequent tool calls can inject values from prior outputs (e.g., IDs created by one tool used by another)
  - Configurable via `check.statefulTesting.enabled` and `check.statefulTesting.shareOutputsBetweenTools`
  - Maximum chain length configurable via `check.statefulTesting.maxChainLength`
- **Rate limiting**: Token bucket rate limiter for tool calls
  - Configurable requests per second and burst limits
  - Exponential or linear backoff strategies
  - Automatic retry on rate limit errors
  - Enabled via `check.rateLimit.enabled` in config
- **Response assertions**: Semantic validation of tool responses
  - Automatic schema inference from successful responses
  - Configurable strict mode for assertion failures
  - Assertion results tracked per interaction and aggregated per tool
  - Enabled via `check.assertions.enabled` in config
- **External service detection enhancements**: Improved detection with confidence levels
  - `confirmed`: Error messages from the service were observed
  - `likely`: Strong evidence from tool name/description patterns
  - `possible`: Weak evidence, partial matches
  - Evidence breakdown for transparency (fromErrorMessage, fromToolName, fromDescription)
  - Service configuration status tracking (configured, sandboxAvailable, mockAvailable)
- **Warmup runs**: Skip initial runs before timing samples to account for cold starts
  - Configurable 0-5 warmup runs via `check.warmupRuns`
- **Config validation warnings**: Non-blocking warnings for configuration issues
  - Displayed before check runs without failing
  - Helps catch common misconfigurations early
- **Tool-by-tool progress reporting**: Live progress shows reliability and timing per tool as they complete
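
The rate-limiting entry above names a token bucket with a configurable rate and burst limit. As a rough illustration of that technique, the sketch below refills tokens in proportion to elapsed time and caps them at the burst size. The class name `TokenBucket`, its constructor parameters, and the injectable clock are assumptions made for this example and are not taken from bellwether's own `rate-limiter` module.

```typescript
// Hypothetical token bucket; names and constructor shape are illustrative assumptions.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly requestsPerSecond: number;
  private readonly burstLimit: number;
  private readonly now: () => number;

  constructor(
    requestsPerSecond: number,
    burstLimit: number,
    now: () => number = () => Date.now(),
  ) {
    this.requestsPerSecond = requestsPerSecond;
    this.burstLimit = burstLimit;
    this.now = now;
    this.tokens = burstLimit; // start full so an initial burst is allowed
    this.lastRefillMs = now();
  }

  // Credit tokens for the time elapsed since the last refill, capped at the burst limit.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.burstLimit,
      this.tokens + elapsedSeconds * this.requestsPerSecond,
    );
    this.lastRefillMs = current;
  }

  // Take one token if available; returns false when the caller should wait.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until at least one token is available (0 if one is ready now).
  msUntilNextToken(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.requestsPerSecond) * 1000);
  }
}
```

A caller that receives `false` from `tryAcquire()` can wait `msUntilNextToken()` milliseconds before trying again.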

### Enhanced CONTRACT.md Output

- **Quick Reference table enhancements**: Now includes P50 latency, confidence indicators
- **Metrics legend section**: Explains confidence levels and reliability calculations
- **Validation testing section**: Separate metrics for validation tests vs happy-path tests
- **Issues detected section**: Aggregated summary of detected issues across tools
- **Stateful testing section**: Shows state sharing relationships between tools
- **External service configuration section**: Documents detected external services and their status
- **Response assertions section**: Documents inferred schemas and assertion rules
- **Skipped tool handling**: Tools skipped due to missing external service config are documented

### Configuration Changes

- **New config options**:
  - `check.warmupRuns` - Number of warmup runs before timing (default: 0)
  - `check.smartTestValues` - Enable smart value generation (default: true)
  - `check.statefulTesting.*` - Stateful testing configuration
  - `check.externalServices.*` - External service handling (skip/mock/test modes)
  - `check.assertions.*` - Response assertion configuration
  - `check.rateLimit.*` - Rate limiting configuration
  - `check.metrics.countValidationAsSuccess` - Count validation rejections as success (default: true)
  - `check.metrics.separateValidationMetrics` - Separate validation from happy-path metrics (default: true)
  - `baseline.savePath` - Separate path for saving baselines (default: `.bellwether/bellwether-baseline.json`)
- **Changed defaults**:
  - `check.sampling.minSamples`: 3 → 10 (more samples for statistical confidence)
  - `check.sampling.targetConfidence`: 'medium' → 'low' (match the lower sample count)
  - `workflows.autoGenerate`: true → false (explicit opt-in for workflow discovery)
  - `workflows.requireSuccessfulDependencies`: new option (default: true)
- **Parallel testing + stateful testing**: Parallel mode automatically disabled when stateful testing is enabled (state sharing requires sequential execution)
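
The last entry says parallel mode is switched off whenever stateful testing is enabled. One way to picture that rule is a small resolver over the `check` settings. The `CheckConfig` shape and the `resolveConcurrency` function are hypothetical; only the option names and the 1-10 worker range come from this changelog.

```typescript
// Hypothetical config shape built from option names mentioned in this changelog.
interface CheckConfig {
  parallel: boolean;
  parallelWorkers: number;
  statefulTesting: {
    enabled: boolean;
    shareOutputsBetweenTools: boolean;
    maxChainLength: number;
  };
}

// Decide how many tools may be tested at once.
function resolveConcurrency(check: CheckConfig): { workers: number; reason: string } {
  // State sharing needs a fixed call order, so stateful testing wins over parallel mode.
  if (check.statefulTesting.enabled) {
    return { workers: 1, reason: "stateful testing requires sequential execution" };
  }
  if (!check.parallel) {
    return { workers: 1, reason: "parallel testing disabled" };
  }
  // Clamp to the 1-10 worker range documented for 0.8.0.
  const workers = Math.min(10, Math.max(1, Math.floor(check.parallelWorkers)));
  return { workers, reason: "parallel testing enabled" };
}
```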

### GitHub Action

- **Simplified inputs**: Removed CLI-flag-style inputs that are now config-only:
  - Removed: `fail-on-drift`, `parallel`, `parallel-workers`, `incremental`, `incremental-cache-hours`, `performance-threshold`, `security`
  - These are now configured in `bellwether.yaml` only
- **Improved config path handling**: Action now properly resolves config paths and copies existing configs when needed
- **New exit code**: Added exit code 5 for low-confidence results
- **Updated output descriptions**: Clarified severity levels and exit codes
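
For CI scripts that branch on the check command's exit status, a lookup table can make the codes easier to read. The sketch below takes the meanings of codes 0 through 4 from the 0.8.0 entry in this changelog and code 5 from the entry above. The names `EXIT_CODE_MEANINGS`, `describeExitCode`, and `isDriftCode` are illustrative and not part of the package.

```typescript
// Exit code meanings as listed in this changelog (0-4 from 0.8.0, 5 from 0.10.0).
const EXIT_CODE_MEANINGS = new Map<number, string>([
  [0, "clean (no changes)"],
  [1, "info-level changes (non-breaking)"],
  [2, "warning-level changes"],
  [3, "breaking changes"],
  [4, "runtime error"],
  [5, "low-confidence results"],
]);

// Human-readable label for an exit code; unknown codes are reported as such.
function describeExitCode(code: number): string {
  return EXIT_CODE_MEANINGS.get(code) ?? `unknown exit code ${code}`;
}

// Codes 1-3 report detected changes, as opposed to errors or low confidence.
function isDriftCode(code: number): boolean {
  return code >= 1 && code <= 3;
}
```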

### Documentation

- **README updates**: Added documentation for previously undocumented commands:
  - `auth add <provider>` and `auth remove <provider>` for managing LLM API keys
  - `baseline accept` command for accepting drift as intentional
  - `contract show` command for displaying generated CONTRACT.md
  - `teams current` command for showing active team
- **Website documentation**: Updated guides for configuration, CI/CD, workflows, and output formats

### Fixes

- **Fixed `-p` flag conflict**: Removed `-p` short flag from `init --preset` to avoid conflict with `upload -p/--project`. Use `--preset` for init command
- **Fixed stdio transport write error handling**: Added error handling for `output.write()` in stdio transport to properly emit errors when subprocess pipe breaks (EPIPE)
- **Fixed watch command signal handler cleanup**: Signal handlers (SIGINT/SIGTERM) are now properly removed on cleanup to prevent handler accumulation
- **Added debug logging to silent catches**: Silent catch blocks in Ollama client now log debug messages for better troubleshooting
- **Fixed minSamples override**: User's `minSamples` config is now respected exactly instead of being overridden by `targetConfidence` minimum
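
The signal handler fix above concerns a common pitfall: listeners registered on every start but never removed. A generic way to avoid that accumulation is to return a cleanup function that removes the exact handler reference that was registered. The sketch below assumes a minimal `SignalSource` interface, defined here for illustration (Node's `process` object exposes `on` and `off` methods of this kind). It is not the watch command's actual code.

```typescript
type ShutdownSignal = "SIGINT" | "SIGTERM";

// Minimal, hypothetical interface for anything that can register signal listeners.
interface SignalSource {
  on(signal: ShutdownSignal, handler: () => void): unknown;
  off(signal: ShutdownSignal, handler: () => void): unknown;
}

// Register one handler for both signals and return a function that removes it again.
function installShutdownHandlers(source: SignalSource, onShutdown: () => void): () => void {
  const signals: ShutdownSignal[] = ["SIGINT", "SIGTERM"];
  // Keep a single function reference so off() can find the listener that on() added.
  const handler = (): void => onShutdown();
  for (const signal of signals) {
    source.on(signal, handler);
  }
  return () => {
    for (const signal of signals) {
      source.off(signal, handler);
    }
  };
}
```

Calling the returned function on every stop or restart keeps the listener count constant, however many times the handlers are installed.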

## [0.9.0] - 2026-01-23

### Documentation

- **Full documentation alignment**: Updated CLI docs, website guides, and README to match the config-first workflow and current command structure
- **New CLI references**: Added documentation for `bellwether golden` and `bellwether contract`
- **Cloud + registry updates**: Clarified config requirements, defaults, and registry overrides across cloud/registry pages

### GitHub Action

- **Action docs refresh**: Updated inputs, examples, and output filenames to match current action behavior
- **Config-first guidance**: Clarified config requirements and output directory expectations

### Developer Experience

- **Comprehensive .env example**: Added registry URL override and updated guidance for environment configuration

## [0.8.1] - 2026-01-22

### Features

- **Expanded credential resolution**: API keys can now be loaded from `.env` files
  - Project `.env` file (`./.env` in current working directory)
  - Global `.env` file (`~/.bellwether/.env`)
  - Resolution order: config → custom env var → standard env var → project .env → global .env → keychain
  - `bellwether auth status` now shows which `.env` file provided the key
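
The resolution order above is a first-match-wins chain. The sketch below walks the six sources in the documented order and returns the first non-empty value together with the source that supplied it. The source labels, the `resolveApiKey` function, and the choice to treat blank strings as unset are assumptions for illustration. The real lookup in `auth/credentials` may differ.

```typescript
// Labels for the six sources named in the changelog (the label strings are hypothetical).
type CredentialSource =
  | "config"
  | "custom-env"
  | "standard-env"
  | "project-dotenv"
  | "global-dotenv"
  | "keychain";

// Documented order: config → custom env var → standard env var → project .env → global .env → keychain.
const RESOLUTION_ORDER: CredentialSource[] = [
  "config",
  "custom-env",
  "standard-env",
  "project-dotenv",
  "global-dotenv",
  "keychain",
];

type CredentialReaders = Partial<Record<CredentialSource, () => string | undefined>>;

// Return the first non-empty key and where it came from, or undefined if none is set.
function resolveApiKey(
  readers: CredentialReaders,
): { key: string; source: CredentialSource } | undefined {
  for (const source of RESOLUTION_ORDER) {
    const read = readers[source];
    const value = read ? read() : undefined;
    // Assumption: blank values count as "not set" and fall through to the next source.
    if (value !== undefined && value.trim() !== "") {
      return { key: value, source };
    }
  }
  return undefined;
}
```

Returning the source alongside the key is what would let a status command report which `.env` file provided it.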

### Fixes

- **Fixed check mode LLM dependency**: Check mode no longer creates an LLM orchestrator, removing unnecessary dependency on LLM configuration for schema-only validation
- **Fixed parallel tool testing config**: The `parallelTools` config flag is now properly respected; when disabled, uses sequential execution (concurrency=1)
- **Fixed `baselineExists()` for directories**: Now correctly returns `false` for directories instead of `true`
- **Fixed stdio transport error handling**: Invalid JSON in newline-delimited mode now emits an error event for consistent behavior with Content-Length mode
- **Fixed baseline-accept command tests**: Resolved 13 failing tests in `baseline-accept.test.ts`
  - Fixed schema hash mismatches by using computed `'empty'` hash for tools with empty interactions
  - Fixed integrity hash verification by computing valid hashes with `recalculateIntegrityHash()`
  - Fixed property order in test baselines to match Zod schema order (required for deterministic JSON serialization)
  - Fixed report path from `.bellwether/bellwether-check.json` to `bellwether-check.json`
  - Added missing `responseFingerprint` field to baseline fixtures to match `createBaseline()` output

## [0.8.0] - 2026-01-22

### Features

- **Granular exit codes**: Check command now returns semantic exit codes for CI/CD:
  - `0` = Clean (no changes)
  - `1` = Info-level changes (non-breaking)
  - `2` = Warning-level changes
  - `3` = Breaking changes
  - `4` = Runtime error
- **JUnit/SARIF output formats**: New `--format` option supports `junit` and `sarif` for CI integration
  - JUnit XML for Jenkins, GitLab CI, CircleCI test reporting
  - SARIF 2.1.0 for GitHub Code Scanning with rule IDs BWH001-BWH004
- **Configurable severity thresholds**: New `baseline.severity` config section
  - `minimumSeverity` - Filter changes below a severity level
  - `failOnSeverity` - CI failure threshold
  - `suppressWarnings` - Hide warning-level changes
  - `aspectOverrides` - Custom severity per change aspect
- **Parallel tool testing**: New `--parallel` and `--parallel-workers` options for faster checks
  - Tests tools concurrently with configurable worker count (1-10)
  - Uses mutex for MCP client serialization
- **Incremental checking**: New `--incremental` option to only test tools with changed schemas
  - Compares current schemas against baseline
  - Reuses cached fingerprints for unchanged tools
  - Significantly faster for large servers
- **Performance regression detection**: Track and compare tool latency
  - Captures P50/P95 latency and success rate per tool
  - New `--performance-threshold` option (default: 10%)
  - Flags tools with latency regression exceeding threshold
- **Enhanced CONTRACT.md**: Richer generated documentation
  - Quick reference table with success rates
  - Performance baseline section with latency metrics
  - Example usage from successful interactions (up to 2 per tool)
  - Categorized error patterns (Permission, NotFound, Validation, Timeout, Network)
  - Error summary section aggregating patterns across tools
- **Detailed schema diff**: Property-level schema change detection
  - Wired existing `compareSchemas()` into baseline comparison
  - Shows specific property additions, removals, and type changes
- **Edge case handling**: Improved robustness for enterprise workloads
  - Circular reference detection in schemas
  - Unicode normalization for property names
  - Binary content detection
  - Payload size limits (1MB schema, 10MB baseline, 5MB response)
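
The performance regression entry above combines two pieces of arithmetic: a latency percentile (P50/P95) and a percentage threshold. The sketch below shows one plausible reading. It uses the nearest-rank percentile method, which is an assumption, since the changelog does not say how percentiles are computed. The function names are also illustrative.

```typescript
// Nearest-rank percentile over latency samples (method chosen for illustration).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] as number;
}

// True when current latency exceeds the baseline by more than thresholdPercent.
function hasLatencyRegression(
  baselineMs: number,
  currentMs: number,
  thresholdPercent: number = 10, // matches the documented default of 10%
): boolean {
  // Assumption: without a positive baseline there is nothing to compare against.
  if (baselineMs <= 0) return false;
  const increasePercent = ((currentMs - baselineMs) * 100) / baselineMs;
  return increasePercent > thresholdPercent;
}
```

With a 100 ms baseline and the default threshold, a current latency of 115 ms would be flagged while 105 ms would not.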

### Configuration

- New `check:` section in `bellwether.yaml`:
  ```yaml
  check:
    incremental: false
    incrementalCacheHours: 168
    parallel: false
    parallelWorkers: 4
    performanceThreshold: 10
  ```
- New `baseline.severity:` section for configurable thresholds
- CI preset now enables parallel testing by default
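
For readers who consume this configuration programmatically, the `check:` section above might be modeled as shown below. The key names and values are copied from the YAML snippet and the 1-10 worker range comes from the changelog; the `CheckConfig` interface, `baseCheckConfig`, `validateCheckConfig`, and the remaining range checks are hypothetical and only illustrate one way to validate such a section.

```typescript
// Keys and values mirror the YAML snippet in the changelog. The interface and
// function names are hypothetical and are not part of Bellwether's API.
interface CheckConfig {
  incremental: boolean;
  incrementalCacheHours: number;
  parallel: boolean;
  parallelWorkers: number;
  performanceThreshold: number;
}

const baseCheckConfig: CheckConfig = {
  incremental: false,
  incrementalCacheHours: 168, // 168 hours = one week
  parallel: false,
  parallelWorkers: 4,
  performanceThreshold: 10, // percent
};

/** Merges overrides onto the base values and rejects out-of-range settings. */
function validateCheckConfig(overrides: Partial<CheckConfig>): CheckConfig {
  const config: CheckConfig = { ...baseCheckConfig, ...overrides };
  const errors: string[] = [];

  // The 1-10 range is stated in the changelog.
  if (!Number.isInteger(config.parallelWorkers) || config.parallelWorkers < 1 || config.parallelWorkers > 10) {
    errors.push("parallelWorkers must be an integer from 1 to 10");
  }
  // ASSUMPTION: the two checks below are plausible sanity rules, not documented ones.
  if (config.incrementalCacheHours <= 0) {
    errors.push("incrementalCacheHours must be positive");
  }
  if (config.performanceThreshold < 0) {
    errors.push("performanceThreshold must not be negative");
  }

  if (errors.length > 0) throw new Error(errors.join("; "));
  return config;
}

console.log(validateCheckConfig({ parallel: true, parallelWorkers: 8 }));
```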

### CLI Options

- `--format <fmt>` - Output format: text, json, compact, github, markdown, junit, sarif
- `--parallel` - Enable parallel tool testing
- `--parallel-workers <n>` - Number of concurrent workers (1-10)
- `--incremental` - Only test tools with changed schemas
- `--incremental-cache-hours <hours>` - Cache validity for incremental checking
- `--performance-threshold <n>` - Performance regression threshold (%)
- `--min-severity <level>` - Minimum severity to report
- `--fail-on-severity <level>` - CI failure threshold
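
The interaction between `--min-severity` and `--fail-on-severity` can be pictured with a small sketch. The changelog does not list the accepted level names, so the three levels and their ordering below are assumptions, as are the `atLeast` and `evaluate` helpers; the sketch only shows the general idea of one threshold controlling what is reported and another controlling whether CI fails.

```typescript
// ASSUMPTION: these level names and their ordering are illustrative only;
// the changelog does not enumerate the values accepted by the CLI.
const SEVERITY_ORDER = ["info", "warning", "breaking"] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

/** True when `level` is at or above `min` in the assumed ordering. */
function atLeast(level: Severity, min: Severity): boolean {
  return SEVERITY_ORDER.indexOf(level) >= SEVERITY_ORDER.indexOf(min);
}

/**
 * Hypothetical evaluation step: `minSeverity` filters what is shown, while
 * `failOnSeverity` decides whether the run should fail.
 */
function evaluate(found: Severity[], minSeverity: Severity, failOnSeverity: Severity) {
  const reported = found.filter((s) => atLeast(s, minSeverity));
  // Design choice in this sketch: failure is judged on all findings,
  // not only on the ones that passed the reporting filter.
  const shouldFail = found.some((s) => atLeast(s, failOnSeverity));
  return { reported, shouldFail };
}

const result = evaluate(["info", "warning"], "warning", "breaking");
console.log(result);
```

Here only the `warning` finding would be reported, and the run would not fail because nothing reaches the assumed `breaking` level.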

### Documentation

- Updated all CLI documentation with new options
- Added output formats guide with JUnit/SARIF examples
- Added parallel and incremental checking documentation
- Updated CI/CD guide with new exit codes and severity thresholds
- Updated baselines documentation with performance metrics
- Updated GitHub Action documentation with new inputs/outputs

## [0.7.1] - 2026-01-22

### Improvements

- **Reduced npm package size**: Excluded source maps from published package (682 kB → 445 kB, 35% smaller)
- **Added CHANGELOG.md to package**: Now included in npm package for version history visibility

### Fixes

- Replaced `console.warn()` with structured logger in baseline loading for consistent log level filtering
- Removed unused function parameters in `cloud/client.ts` and `baseline/deprecation-tracker.ts`

## [0.7.0] - 2026-01-21

### Features

- **Drift acceptance workflow**: New `baseline accept` command to accept detected drift as intentional with full audit trail
  - Records who, when, why, and what changes were accepted
  - `--reason` option to document why drift was accepted
  - `--accepted-by` option to record who accepted (for CI/CD bots)
  - `--dry-run` option to preview acceptance without writing
  - `--force` flag required for accepting breaking changes
- **Accept drift during check**: New `--accept-drift` and `--accept-reason` flags for the check command to accept drift in one step
- **Acceptance metadata in baselines**: Baselines now include optional `acceptance` field with full audit trail for compliance and team visibility

### Fixes

- Fixed Date deserialization for `acceptance.acceptedAt` when loading baselines from JSON
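
The class of bug behind this fix is that `JSON.parse` returns dates as plain strings, so a `Date` field has to be revived explicitly after loading. The sketch below illustrates that pattern only; apart from `acceptance.acceptedAt`, which the changelog names, the `Baseline` and `Acceptance` shapes, the `acceptedBy` and `reason` fields, and `parseBaseline` are hypothetical.

```typescript
// Hypothetical shapes: only `acceptance.acceptedAt` is named in the changelog.
interface Acceptance {
  acceptedAt: Date;
  acceptedBy?: string;
  reason?: string;
}

interface Baseline {
  acceptance?: Acceptance;
}

/** Parses baseline JSON and converts `acceptance.acceptedAt` back into a Date. */
function parseBaseline(json: string): Baseline {
  const raw = JSON.parse(json) as {
    acceptance?: { acceptedAt: string; acceptedBy?: string; reason?: string };
  };
  if (!raw.acceptance) return {};

  const acceptedAt = new Date(raw.acceptance.acceptedAt);
  if (Number.isNaN(acceptedAt.getTime())) {
    throw new Error("invalid acceptance.acceptedAt");
  }
  return { acceptance: { ...raw.acceptance, acceptedAt } };
}

// Round trip: JSON.stringify turns the Date into an ISO 8601 string.
const original: Baseline = {
  acceptance: { acceptedAt: new Date("2026-01-21T12:00:00.000Z"), reason: "intentional rename" },
};
const restored = parseBaseline(JSON.stringify(original));
console.log(restored.acceptance?.acceptedAt instanceof Date); // true
```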

### Documentation

- Added `baseline accept` subcommand documentation
- Updated `check` command docs with `--accept-drift` and `--accept-reason` options
- Added acceptance workflow options to CI/CD integration guide

## [0.6.1] - 2026-01-21

### Features

- **Verify command cloud submission**: Added `--project` option to submit verification results directly to Bellwether Cloud
- **Progress display**: Added progress bar for verification runs showing interview progress

### Changes

- **Default LLM models updated**: Changed OpenAI default to `gpt-4.1-nano` (budget-friendly, non-reasoning) and Ollama default to `qwen3:8b`
- **Preset providers updated**: Security and thorough presets now use Anthropic provider by default
- **Verify command**: Now requires config file; added `--config` option for explicit config path

### Documentation

- Added `cloud/diff.md` documentation for comparing baseline versions
- Updated documentation across all CLI commands with improved examples
- Enhanced verify command documentation with cloud submission examples

### Fixes

- Fixed test mocks to match updated default models and configurations

## [0.6.0] - 2026-01-20

Initial public beta release of Bellwether CLI.

### Features

- **Two testing modes**: `bellwether check` for free, deterministic schema validation and `bellwether explore` for LLM-powered behavioral exploration
- **Check mode**: Zero-cost structural drift detection without LLM dependencies, generates `CONTRACT.md`
- **Explore mode**: Multi-persona exploration with OpenAI, Anthropic, or Ollama, generates `AGENTS.md`
- **Four built-in personas**: Technical Writer, Security Tester, QA Engineer, and Novice User for comprehensive coverage
- **Baseline management**: Save, compare, and track schema changes over time with `bellwether baseline` commands
- **Drift detection**: Catch breaking changes before production with configurable severity levels
- **Workflow testing**: Define multi-step tool sequences with assertions and argument mapping
- **Custom scenarios**: YAML-based test definitions for repeatable validation
- **Watch mode**: Continuous testing during development with `bellwether watch`
- **MCP Registry integration**: Search and discover MCP servers with `bellwether registry`
- **Cloud integration**: Team collaboration, history tracking, and CI/CD support via Bellwether Cloud
- **Secure credential storage**: System keychain integration for API keys with `bellwether auth`
- **Multiple transports**: Support for stdio, SSE, and streamable-http MCP connections

package/LICENSE
ADDED
@@ -0,0 +1,21 @@

MIT License

Copyright (c) 2026 Dotset Labs LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.