bmad-method-test-architecture-enterprise 1.17.1 → 1.18.0
@@ -12,7 +12,7 @@
 "name": "bmad-method-test-architecture-enterprise",
 "source": "./",
 "description": "Master Test Architect module for quality strategy, test automation, CI/CD quality gates, and structured testing education. Part of the BMad Method ecosystem.",
-"version": "1.
+"version": "1.18.0",
 "author": {
 "name": "Murat K Ozcan (TEA Creator) & Brian (BMad) Madison"
 },
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
 "$schema": "https://json.schemastore.org/package.json",
 "name": "bmad-method-test-architecture-enterprise",
-"version": "1.
+"version": "1.18.0",
 "description": "Master Test Architect for quality strategy, test automation, and release gates",
 "keywords": [
 "bmad",
@@ -0,0 +1,73 @@
+# Confidence Gate
+
+## Principle
+
+When generating tests, scaffolding fixtures, classifying risk, or proposing any non-trivial test artifact, emit a confidence assessment before writing code. If confidence is below the threshold, stop and ask the user instead of generating plausible-looking output built on guesses.
+
+## Rationale
+
+The failure mode of LLM-generated tests is rarely "refused to try" — it is "generated something plausible that passes locally and breaks silently in CI." Hallucinated selectors, invented endpoint paths, fabricated risk scores, and reverse-engineered schemas all produce code that looks correct and tests nothing real. A confidence gate makes that failure mode loud by forcing the agent to declare its evidence and its unknowns before any artifact is committed.
+
+## Required output shape
+
+Every non-trivial test artifact proposal must include:
+
+```
+Confidence: <1-10>
+Rationale: <one or two sentences citing concrete evidence from the repo or contract>
+Unknowns: <bulleted list of things the agent does not know>
+```
+
+The Rationale must cite a file path, a contract document, an existing pattern, or a captured observation. Vague rationale ("based on standard patterns", "looks similar to other tests") is not evidence and forces the score down.
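As an illustration of the output shape above (not part of the published package), here is a minimal TypeScript sketch of a record and formatter for the three fields. The names `ConfidenceAssessment` and `formatAssessment` are hypothetical, and two details are assumptions rather than anything the fragment states: the score is treated as an integer, and an empty Unknowns list is rendered as `none`.

```typescript
// Hypothetical names: the fragment defines only the text template,
// not a type or a function.
interface ConfidenceAssessment {
  confidence: number; // 1-10; assumed here to be an integer
  rationale: string;  // one or two sentences citing concrete evidence
  unknowns: string[]; // things the agent does not know
}

function formatAssessment(a: ConfidenceAssessment): string {
  if (!Number.isInteger(a.confidence) || a.confidence < 1 || a.confidence > 10) {
    throw new RangeError(`Confidence must be an integer from 1 to 10, got ${a.confidence}`);
  }
  if (a.rationale.trim() === "") {
    // A score with no Rationale is the "vanity score" anti-pattern.
    throw new Error("Rationale must not be empty");
  }
  // Assumption: render an empty list as "none" rather than leaving it blank.
  const unknowns =
    a.unknowns.length > 0 ? a.unknowns.map((u) => `\n- ${u}`).join("") : " none";
  return `Confidence: ${a.confidence}\nRationale: ${a.rationale.trim()}\nUnknowns:${unknowns}`;
}
```

This sketch only enforces that a Rationale is present. Whether it actually cites a file path or contract still needs a human or a stricter check.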
+
+## Threshold rule
+
+- **Confidence ≥ 7** — proceed with generation.
+- **Confidence 5–6** — proceed but surface the assumptions to the user in the output so they can correct mid-flight.
+- **Confidence < 5** — STOP. Do not generate. Ask the user to resolve the most-blocking Unknown first.
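The three bands of the threshold rule can be read as a small decision function. The sketch below is illustrative only and not part of the package. `GateAction` and `decideGate` are hypothetical names, and rejecting non-integer scores is an assumption made so that no value falls between the 5–6 and ≥ 7 bands.

```typescript
// Hypothetical names; the fragment states the rule in prose only.
type GateAction = "proceed" | "proceed-with-assumptions" | "stop-and-ask";

function decideGate(confidence: number): GateAction {
  // Assumption: scores are integers in 1..10, so every score maps to one band.
  if (!Number.isInteger(confidence) || confidence < 1 || confidence > 10) {
    throw new RangeError(`Confidence must be an integer from 1 to 10, got ${confidence}`);
  }
  if (confidence >= 7) return "proceed";                  // generate
  if (confidence >= 5) return "proceed-with-assumptions"; // generate, surface assumptions
  return "stop-and-ask";                                  // do not generate; ask the user
}
```

For example, `decideGate(6)` returns `"proceed-with-assumptions"` and `decideGate(4)` returns `"stop-and-ask"`.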
+
+## When to apply
+
+Apply the gate when generating or proposing:
+
+- **Selectors and page objects.** Must have explored the live application via `playwright-cli` or read existing page object patterns. Confidence < 5 if neither.
+- **Endpoint paths and request shapes.** Must have read the OpenAPI / Swagger contract or existing endpoint enums. Confidence < 5 if the endpoint is being invented.
+- **Risk classification (test-design, NFR).** Must cite probability and impact evidence. Confidence < 5 if scoring is vibes-based.
+- **Fixture composition.** Must understand existing `mergeTests` patterns and fixture boundaries in the repo. Confidence < 5 if composing blindly.
+- **Schema authoring (Zod, Ajv, JSON Schema).** Must have a documented contract source (OpenAPI, JSON schema, existing schema file). Confidence < 5 if reverse-engineering from a single sample response.
+- **Data factories.** Must understand the production data shape and constraints. Confidence < 5 if guessing field validity rules.
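Each bullet above pairs an artifact kind with the evidence it requires and forces the score below 5 when that evidence is missing. One possible way to encode that pairing is sketched below. It is illustrative only and not part of the package. The names `ArtifactKind`, `requiredEvidence`, and `applyEvidenceCap` are hypothetical, and capping at exactly 4 is an assumption, since the fragment says only "Confidence < 5".

```typescript
// Hypothetical names; the evidence descriptions paraphrase the bullets above.
type ArtifactKind =
  | "selectors"
  | "endpoints"
  | "risk-classification"
  | "fixtures"
  | "schemas"
  | "data-factories";

const requiredEvidence: Record<ArtifactKind, string> = {
  "selectors": "live exploration via playwright-cli or existing page object patterns",
  "endpoints": "OpenAPI / Swagger contract or existing endpoint enums",
  "risk-classification": "probability and impact evidence",
  "fixtures": "existing mergeTests patterns and fixture boundaries in the repo",
  "schemas": "documented contract source (OpenAPI, JSON schema, existing schema file)",
  "data-factories": "production data shape and constraints",
};

// Returns the score to report plus the missing evidence, if any.
function applyEvidenceCap(
  kind: ArtifactKind,
  proposed: number,
  evidence: string[],
): { confidence: number; missing: string | null } {
  const hasEvidence = evidence.some((e) => e.trim() !== "");
  if (hasEvidence) return { confidence: proposed, missing: null };
  // Assumption: "Confidence < 5" is enforced by capping the score at 4.
  return { confidence: Math.min(proposed, 4), missing: requiredEvidence[kind] };
}
```

The `missing` string could seed the single concrete question the gate asks the user when the score lands below the threshold.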
+
+## When NOT to apply
+
+- Mechanical refactors with clear scope (rename a variable, add a tag, update an import).
+- Reading or summarizing existing artifacts.
+- Producing reports from already-gathered data.
+- Trivial test additions that copy an existing pattern exactly.
+
+The gate exists to prevent fabrication, not to bureaucratize obvious work.
+
+## Anti-patterns
+
+❌ **Vanity scores.** `Confidence: 9` with no Rationale, or Rationale that does not cite evidence. Score the evidence, not the optimism.
+
+❌ **Listing then ignoring Unknowns.** Listing unknowns and then proceeding anyway when Confidence is below threshold. If the gate is below threshold, the only valid next action is to ask the user.
+
+❌ **Asking generically.** Asking "should I proceed?" instead of resolving the most-blocking Unknown with a concrete one-sentence question.
+
+❌ **Inflating to clear the bar.** Adjusting Confidence upward to avoid the stop rule. If the evidence is weak, the score is weak; resolve the evidence, not the number.
+
+## Patterns that work
+
+✅ **Cite the source.** "Confidence: 8 — Rationale: read `src/openapi/users.yaml` line 142-167 and existing schema at `tests/api/users.schema.ts`."
+
+✅ **One concrete Unknown.** When below threshold, ask one specific question: "Is `POST /users/{id}/role` documented anywhere? I can't find it in the OpenAPI spec and there are no existing tests for it."
+
+✅ **Promote evidence.** When the user answers the Unknown, the Rationale gets stronger and Confidence rises legitimately. The gate is a feedback loop, not a checkpoint.
+
+## Related fragments
+
+- `test-quality.md` — Definition of Done for tests; the gate protects DoD compliance.
+- `risk-governance.md` — risk scoring discipline that informs Rationale for risk-related gates.
+- `probability-impact.md` — scoring scales used in risk-related Rationale.
+- `selector-resilience.md` — selector confidence specifically.
+- `playwright-cli.md` — the sanctioned exploration tool that promotes selector Confidence.
@@ -647,6 +647,7 @@ test('admin action', async ({ page }) => {
 - `data-factories.md` - Isolated, parallel-safe data patterns
 - `fixture-architecture.md` - Setup extraction and cleanup
 - `test-levels-framework.md` - Choosing appropriate test granularity for speed
+- `confidence-gate.md` - Agent reliability gate that protects DoD compliance during LLM-assisted test generation
 
 ## Core Quality Checklist
 
@@ -50,3 +50,4 @@ webhook-waiting,Webhook Waiting and Querying,"waitFor, waitForCount, getReceived
 webhook-timeout-error,WebhookTimeoutError Debugging,"templateName, timeoutMs, totalReceived, receivedWebhooks, matcherDetails, toJSON — inspect what arrived vs what was expected","webhook,debugging,errors,playwright-utils",extended,knowledge/webhook-timeout-error.md
 webhook-providers,Webhook Provider Patterns,"WireMock (deleteById supported), MockServer (deleteById no-op), Mockoon (deleteById no-op, 100-entry limit), custom WebhookProvider interface","webhook,providers,playwright-utils,wiremock,mockserver,mockoon",extended,knowledge/webhook-providers.md
 webhook-risk,Webhook Testing Risk Guidance,"When webhook tests are required, P2×I3 default risk score, complete test checklist, failure patterns and mitigations, TA assessment checklist","webhook,risk,assessment,event-driven,async,playwright-utils,governance",core,knowledge/webhook-risk-guidance.md
+confidence-gate,Confidence Gate,"1-10 confidence scoring with stop-and-ask rule below threshold for selectors, endpoints, risk classification, fixtures, schemas, and data factories — prevents agent fabrication","reliability,agent-safety,generation,quality,governance",core,knowledge/confidence-gate.md