sumo_qa-0.1.0-py3-none-any.whl

Files changed (45)
  1. sumo_qa/__init__.py +3 -0
  2. sumo_qa/__main__.py +4 -0
  3. sumo_qa/_data/knowledge/approaches.md +46 -0
  4. sumo_qa/_data/knowledge/classifications.md +54 -0
  5. sumo_qa/_data/knowledge/principles.md +79 -0
  6. sumo_qa/_data/knowledge/specialty_tools.md +54 -0
  7. sumo_qa/_data/knowledge/techniques.md +87 -0
  8. sumo_qa/_data/knowledge/test_data/auth/sample_accounts.yaml +38 -0
  9. sumo_qa/_data/knowledge/test_data/billing/sample_invoices.yaml +38 -0
  10. sumo_qa/_data/skills/qa-answering-testing-question/SKILL.md +82 -0
  11. sumo_qa/_data/skills/qa-creating-test-plan/SKILL.md +114 -0
  12. sumo_qa/_data/skills/qa-deciding-approach/SKILL.md +104 -0
  13. sumo_qa/_data/skills/qa-executing-qa-rollout/SKILL.md +112 -0
  14. sumo_qa/_data/skills/qa-executing-qa-rollout/prompts/implementer-prompt.md +60 -0
  15. sumo_qa/_data/skills/qa-executing-qa-rollout/prompts/quality-reviewer-prompt.md +56 -0
  16. sumo_qa/_data/skills/qa-executing-qa-rollout/prompts/spec-reviewer-prompt.md +51 -0
  17. sumo_qa/_data/skills/qa-finding-test-data/SKILL.md +97 -0
  18. sumo_qa/_data/skills/qa-finishing-qa-work/SKILL.md +134 -0
  19. sumo_qa/_data/skills/qa-implementing-with-tdd/SKILL.md +116 -0
  20. sumo_qa/_data/skills/qa-planning-qa-rollout/SKILL.md +160 -0
  21. sumo_qa/_data/skills/qa-preparing-for-work/SKILL.md +87 -0
  22. sumo_qa/_data/skills/qa-reviewing-before-merge/SKILL.md +115 -0
  23. sumo_qa/_data/skills/qa-strengthening-tests/SKILL.md +116 -0
  24. sumo_qa/_data/skills/sumo-qa-strategising/SKILL.md +119 -0
  25. sumo_qa/_data/skills/using-sumo-qa/SKILL.md +140 -0
  26. sumo_qa/_data/standards/packs/istqb_v1.yml +135 -0
  27. sumo_qa/_data/standards/packs/qa_shift_left_v1.yml +51 -0
  28. sumo_qa/_data/standards/rules/change_rules.yaml +237 -0
  29. sumo_qa/debug_capture.py +51 -0
  30. sumo_qa/knowledge_loaders.py +138 -0
  31. sumo_qa/rules.py +130 -0
  32. sumo_qa/server.py +316 -0
  33. sumo_qa/skill_prompts.py +119 -0
  34. sumo_qa/standards.py +127 -0
  35. sumo_qa/tdm_catalogue.py +179 -0
  36. sumo_qa/tdm_models.py +121 -0
  37. sumo_qa/tdm_service.py +378 -0
  38. sumo_qa/tdm_validation.py +151 -0
  39. sumo_qa/tools.py +174 -0
  40. sumo_qa-0.1.0.dist-info/METADATA +125 -0
  41. sumo_qa-0.1.0.dist-info/RECORD +45 -0
  42. sumo_qa-0.1.0.dist-info/WHEEL +4 -0
  43. sumo_qa-0.1.0.dist-info/entry_points.txt +2 -0
  44. sumo_qa-0.1.0.dist-info/licenses/LICENSE +202 -0
  45. sumo_qa-0.1.0.dist-info/licenses/NOTICE +3 -0
sumo_qa/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ """QA Shift-Left MCP server package."""
2
+
3
+ __version__ = "0.1.0"
sumo_qa/__main__.py ADDED
@@ -0,0 +1,4 @@
1
+ from sumo_qa.server import main
2
+
3
+ if __name__ == "__main__":
4
+ main()
sumo_qa/_data/knowledge/approaches.md ADDED
@@ -0,0 +1,46 @@
1
+ # Canonical QA approaches
2
+
3
+ Eight canonical approaches the host LLM picks from when deciding the shape
4
+ of QA work. The LLM may invent a new approach if the situation genuinely
5
+ needs one, but it must explain why none of these eight fit.
6
+
7
+ ## strategy-orchestration
8
+ Repo-wide / policy-shaped ask: "design a test strategy", "audit our coverage",
9
+ "design our pyramid", "rollout to other services", "minimum viable QA setup".
10
+ Do NOT force per-change output. Next step is loading the
11
+ `sumo-qa-strategising` skill.
12
+
13
+ ## tdd-scaffold
14
+ Greenfield-ish change adding behaviour. Plan -> scaffold -> red ->
15
+ implement -> green. Fits when production code is being written for new
16
+ functionality.
17
+
18
+ ## regression-first
19
+ Bug fix on existing code. Reproduce the defect as one failing test, fix it,
20
+ confirm green, run targeted regression. Fits when the user describes a
21
+ specific defect on existing behaviour.
22
+
23
+ ## coverage-first-then-refactor
24
+ Behaviour-preserving refactor (rename, extract method, split module, restructure
25
+ without changing outputs). Audit existing coverage and add characterization
26
+ tests BEFORE refactoring. Tests pin current behaviour so the refactor doesn't
27
+ silently change outputs.
28
+
29
+ ## strengthen-test-coverage
30
+ Strengthen existing tests on UNCHANGED production code. Mutation-testing
31
+ follow-up, raise-coverage tasks, killing weak assertions. Production code
32
+ stays still. Equivalent mutants get suppressed in tool config rather than
33
+ chased with tautological tests.
34
+
35
+ ## verify-existing
36
+ Config-only or trivial tweak that doesn't merit new tests. Run the existing
37
+ suite plus a smoke test. Fits when a config bump or mechanical edit needs
38
+ confirmation, not new coverage.
39
+
40
+ ## no-tests-recommended
41
+ Pure docs / typos / comments. Build + lint, no QA test work. The honest
42
+ senior-QA answer when the change has no behavioural surface.
43
+
44
+ ## spike-first-then-tests
45
+ Exploratory prototype. Defer test discipline until the design settles. The
46
+ deliverable is a captured-conditions-and-fit record, not scaffolded tests.
sumo_qa/_data/knowledge/classifications.md ADDED
@@ -0,0 +1,54 @@
1
+ # Canonical change classifications
2
+
3
+ Ten canonical classifications used to shape testing strategy. The host LLM picks
4
+ which apply to a given change by reasoning over the user's intent and target
5
+ paths. The catalogue below is authoritative — do not invent classifications
6
+ not in this list.
7
+
8
+ ## api_contract_change
9
+ A change that adds, removes, or modifies a public API surface (HTTP endpoint,
10
+ gRPC method, public library function, event schema). Risk: downstream
11
+ consumers break on signature drift.
12
+
13
+ ## business_logic_change
14
+ A change to domain rules, calculations, decision logic, or state machines.
15
+ Risk: incorrect outcomes for valid inputs.
16
+
17
+ ## security_change
18
+ A change touching authentication, authorisation, secrets handling, encryption,
19
+ input sanitisation, rate limiting, audit logging. Risk: privilege escalation,
20
+ data leak, regression of a security control.
21
+
22
+ ## performance_change
23
+ A change motivated by latency, throughput, memory, or resource consumption,
24
+ including caching, batching, query plan changes, and indexing.
25
+ Risk: regression in p99 or memory profile under load.
26
+
27
+ ## frontend_change
28
+ A change to UI components, page layout, accessibility tree, client-side
29
+ interaction, or rendering. Risk: visual / interaction regressions, a11y
30
+ regressions.
31
+
32
+ ## infrastructure_change
33
+ A change to deployment, IaC, runtime configuration, networking, or platform-
34
+ level concerns (Kubernetes manifests, Terraform, Docker, CI). Risk:
35
+ environment-only failures invisible in unit tests.
36
+
37
+ ## test_change
38
+ A change exclusively to test code or test fixtures, with no production code
39
+ movement. Includes mutation-testing follow-up, raise-coverage tasks,
40
+ strengthening weak assertions, and refactoring tests. Risk: false confidence
41
+ if tests become tautological.
42
+
43
+ ## docs_change
44
+ A change to documentation, comments, README, or any non-executable artefact.
45
+ Risk: minimal — typically no QA test work needed beyond build/lint.
46
+
47
+ ## config_change
48
+ A change to configuration files (YAML, JSON, env files, feature flags) where
49
+ the configuration is consumed at runtime by existing code paths. Risk:
50
+ behaviour drift via the config without code review catching it.
51
+
52
+ ## data_migration
53
+ A change that transforms persisted data (schema migration, backfill, ETL).
54
+ Risk: data loss, broken referential integrity, partial migrations.
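Review note on classifications.md: the file says the catalogue is authoritative and that labels outside it must not be invented. A minimal sketch of how a caller could enforce that is given here. The label set mirrors the ten headings above; the constant and the function `split_classifications` are hypothetical names for illustration and are not taken from the package source.

```python
# Hypothetical helper: the labels mirror the headings in classifications.md,
# but this constant and function are illustrations, not package code.
CANONICAL_CLASSIFICATIONS = frozenset({
    "api_contract_change",
    "business_logic_change",
    "security_change",
    "performance_change",
    "frontend_change",
    "infrastructure_change",
    "test_change",
    "docs_change",
    "config_change",
    "data_migration",
})


def split_classifications(picked):
    """Separate catalogue labels from invented ones, preserving input order."""
    known = [label for label in picked if label in CANONICAL_CLASSIFICATIONS]
    invented = [label for label in picked if label not in CANONICAL_CLASSIFICATIONS]
    return known, invented


known, invented = split_classifications(
    ["security_change", "config_change", "ui_polish_change"]
)
print(known)     # ['security_change', 'config_change']
print(invented)  # ['ui_polish_change']
```

Returning the invented labels instead of raising lets the caller decide whether to reject the pick or ask for a re-classification.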
sumo_qa/_data/knowledge/principles.md ADDED
@@ -0,0 +1,79 @@
1
+ # QA principles — ISTQB Foundation, Advanced, ISO 25010
2
+
3
+ ## ISTQB Foundation — the seven testing principles
4
+
5
+ 1. Testing shows the presence of defects, not their absence.
6
+ 2. Exhaustive testing is impossible; use risk and prioritisation.
7
+ 3. Early testing saves time and money — shift left.
8
+ 4. Defects cluster — concentrate effort where defect history is dense.
9
+ 5. Pesticide paradox — the same tests stop finding defects; refresh
10
+ assertions and add new techniques.
11
+ 6. Testing is context-dependent — safety-critical, regulated, web,
12
+ mobile, AI all warrant different mixes.
13
+ 7. Absence-of-errors fallacy — validate fitness for use, not just
14
+ code-level correctness.
15
+
16
+ ## ISTQB Advanced
17
+
18
+ - **Test Manager:** risk-based testing (likelihood x impact), shaping
19
+ coverage to where risk is highest, accepting low-risk areas with
20
+ thinner tests, entry/exit criteria, test estimation.
21
+ - **Test Analyst:** black-box / experience-based technique mastery,
22
+ tester independence, defect taxonomy.
23
+ - **Technical Test Analyst:** white-box / structural coverage, code
24
+ analysis, performance / security / reliability test design.
25
+
26
+ ## ISO/IEC 25010 quality characteristics
27
+
28
+ - functional suitability
29
+ - performance efficiency
30
+ - compatibility
31
+ - usability
32
+ - reliability
33
+ - security
34
+ - maintainability
35
+ - portability
36
+
37
+ Pick the characteristics the change actually threatens; do not list them all.
38
+
39
+ ## Test levels and the pyramid
40
+
41
+ unit -> component integration -> system -> system integration -> acceptance.
42
+ Shape the mix to the risk and the change.
43
+
44
+ ## Test types (orthogonal to levels)
45
+
46
+ - functional
47
+ - non-functional (performance, security, accessibility, reliability,
48
+ compatibility, usability)
49
+ - white-box / structural
50
+ - change-related (confirmation + regression)
51
+
52
+ ## Static testing
53
+
54
+ Review (informal walkthrough, technical review, inspection) and static
55
+ analysis (linters, type checkers, SAST). Often the cheapest defect removal.
56
+ A code review or a stricter linter rule can be the right answer instead of
57
+ "add more tests".
58
+
59
+ ## Senior-QA disciplines
60
+
61
+ - Decide the SHAPE of the work first (single change vs repo-wide strategy;
62
+ bug vs greenfield vs refactor vs strengthen-existing-tests vs spike vs
63
+ config tweak vs docs). Wrong shape = wrong-shaped tests.
64
+ - Reach for the smallest useful test set that gives release confidence.
65
+ Avoid generic advice; tie every recommendation to a specific risk.
66
+ - When the user asks about strategy / audit / pyramid / rollout, that is a
67
+ strategy ask, not a single change. Don't force per-change output.
68
+ - When the user describes work that doesn't change production code (mutation-
69
+ testing follow-up, raise-coverage, kill surviving mutants, tighten weak
70
+ assertions), do NOT scaffold tests against new behaviour. Strengthen
71
+ existing tests. Suppress equivalent mutants in tool config rather than
72
+ chasing them.
73
+ - Critical paths (auth, authorization, payment, billing, encryption, rate
74
+ limiting, anything where a regression hits money, security, or customer
75
+ trust) warrant tighter coverage and at least one boundary test per rule.
76
+ - Honest TDD: red phase first. Tests that fail BEFORE production code is
77
+ written. Never bless a change as merge-ready without evidence.
78
+ - Static testing counts. A code review or a stricter linter rule can be
79
+ the right answer instead of "add more tests".
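Review note on principles.md: the Test Manager bullet describes risk-based testing as likelihood x impact. A small worked sketch of that arithmetic may help. The 1 to 5 scales, the `Risk` class, and the sample risks are assumptions made for illustration; the file itself does not prescribe a scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # 1 (rare) .. 5 (almost certain); the scale is an assumption
    impact: int      # 1 (cosmetic) .. 5 (money, security, trust); also assumed

    @property
    def score(self) -> int:
        # Risk exposure as the product of likelihood and impact.
        return self.likelihood * self.impact


def prioritise(risks):
    """Highest score first; ties keep their input order because sorted is stable."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


risks = [
    Risk("typo in help text", likelihood=4, impact=1),
    Risk("refund issued twice", likelihood=2, impact=5),
    Risk("login lockout bypass", likelihood=3, impact=5),
]
ranked = prioritise(risks)
for r in ranked:
    print(r.score, r.name)
```

Here the frequent but harmless typo scores 4 and lands last, which matches the file's advice to accept low-risk areas with thinner tests.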
sumo_qa/_data/knowledge/specialty_tools.md ADDED
@@ -0,0 +1,54 @@
1
+ # Specialty + tool fit — category primer
2
+
3
+ Category-fit primer, **NOT** a brand whitelist. Each section says WHEN that
4
+ specialty category applies. Brand names are illustrative — pick the best fit
5
+ from your knowledge of the user's stack, verify currency via web search when
6
+ uncertain. Once chosen, **install and set the tool up** (package manager,
7
+ framework CLI, config edit, or MCP server — whichever path is shortest) and
8
+ write the first tests against the named risks. Empty selection is acceptable.
9
+
10
+ ---
11
+
12
+ **Token TTL / signature / claim validation** — JWT/JOSE/OAuth, signature
13
+ verify, claims, expiry. Examples: JJWT, Auth0 java-jwt, jose4j.
14
+
15
+ **HTTP DAST — new endpoint / auth filter** — header/CORS/auth bypasses on a
16
+ real HTTP surface (not in-process pure functions). Examples: OWASP ZAP, Burp Suite.
17
+
18
+ **Static security analysis** — hard-coded secrets, `alg=none`, SQLi, unsafe
19
+ deserialisation. Examples: Semgrep, Snyk, SonarQube.
20
+
21
+ **REST contract drift** — HTTP service with external consumers. Examples:
22
+ Pact (consumer-driven), Spring Cloud Contract, Schemathesis (OpenAPI fuzzing).
23
+
24
+ **Async / event contract drift** — handlers consuming events whose schemas
25
+ may drift. Examples: Schemathesis, AsyncAPI test runners.
26
+
27
+ **Frontend visual / interaction** — UI needing end-to-end browser coverage.
28
+ Examples: Cypress, Playwright. (MCP servers exist for some; package-manager
29
+ install is usually shorter.)
30
+
31
+ **Frontend a11y** — keyboard nav, screen readers, ARIA, contrast. Examples:
32
+ axe-core (often via Playwright), Pa11y.
33
+
34
+ **Mobile UI** — mobile app surface. Examples: Appium, Maestro, Detox, XCUITest, Espresso.
35
+
36
+ **Performance / load** — hot path with articulated SLO (p95, RPS). Without a
37
+ budget it's theatre. Examples: k6, Locust, Gatling, JMeter.
38
+
39
+ **Mutation testing** — coverage looks good but assertion strength suspect.
40
+ Examples: Pitest (JVM), Stryker (JS/TS/.NET/Scala), MutPy / mutmut (Python).
41
+
42
+ **Property-based** — invariant across many inputs (commutativity, idempotency,
43
+ round-trip, monotonicity). Examples: Hypothesis (Python), jqwik (JVM),
44
+ fast-check (JS/TS), ScalaCheck.
45
+
46
+ **AI / LLM behaviour** — probabilistic surfaces: prompts, RAG, agents.
47
+ Examples: Promptfoo, DeepEval, Ragas, TruLens, Evidently.
48
+
49
+ ---
50
+
51
+ Discipline: pick by fit, not familiarity. Verify currency before naming.
52
+ Once chosen, set the tool up yourself (install, config, scaffold the first
53
+ tests) — don't hand the user a list of commands. Confirm before installing
54
+ dependencies. Empty selection is honest; most changes don't need this.
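Review note on specialty_tools.md: the property-based entry names round-trip as one kind of invariant. The sketch here checks a round-trip property with a hand-rolled loop over the standard library's `random` module. It stands in for a real library such as Hypothesis and has none of that library's shrinking or strategy features. The record shape and function names are hypothetical.

```python
import json
import random


def random_record(rng):
    """Build a small JSON-safe dict from a seeded generator."""
    return {
        "id": rng.randint(0, 10**6),
        "tags": [
            rng.choice(["paid", "pending", "locked"])
            for _ in range(rng.randint(0, 4))
        ],
        "amount_cents": rng.randint(-10**9, 10**9),
    }


def check_round_trip(trials=200, seed=0):
    """Property: decoding an encoded record yields an equal record."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        record = random_record(rng)
        assert json.loads(json.dumps(record)) == record, record
    return trials


checked = check_round_trip()
print(f"round-trip held for {checked} generated records")
```

The point of the shape is that the test states one invariant and lets generated inputs probe it, instead of listing hand-picked examples.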
sumo_qa/_data/knowledge/techniques.md ADDED
@@ -0,0 +1,87 @@
1
+ # Test design techniques
2
+
3
+ Pick one technique per named risk. The catalogue below is authoritative —
4
+ do not invent techniques not in this list. If a risk needs something not
5
+ catalogued, flag it as a gap rather than confabulating.
6
+
7
+ ## Black-box
8
+
9
+ ### equivalence partitioning
10
+ Group inputs into classes that share behaviour; pick one representative
11
+ per class. Use when the input space is large but partitions clearly.
12
+
13
+ ### boundary value analysis
14
+ Test the values immediately above, at, and below boundaries (off-by-one,
15
+ limits, capacity thresholds). Defects cluster at boundaries.
16
+
17
+ ### decision tables
18
+ Enumerate every combination of input conditions and the expected output.
19
+ Use when business rules are conjunctions of multiple conditions.
20
+
21
+ ### state transition testing
22
+ Model the system as a finite state machine; test every legal transition
23
+ and a sample of illegal ones. Use for stateful components.
24
+
25
+ ### pairwise / orthogonal arrays
26
+ When a feature has many parameters with many values each, test every pair
27
+ of value combinations rather than the full Cartesian product.
28
+
29
+ ### classification trees
30
+ Hierarchical refinement of equivalence partitions, useful when partitions
31
+ have sub-partitions.
32
+
33
+ ### use case testing
34
+ Walk through end-to-end user scenarios. Catches integration defects unit
35
+ tests miss.
36
+
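Review note on the black-box section: boundary value analysis is described as testing the values immediately above, at, and below each boundary. A minimal sketch follows, assuming an inclusive integer range. The quantity rule (1 to 100 items) and both function names are hypothetical.

```python
def boundary_values(low, high, step=1):
    """Values just below, at, and just above each end of an inclusive range."""
    return [low - step, low, low + step, high - step, high, high + step]


def is_valid_quantity(qty):
    # Hypothetical rule under test: orders accept 1..100 items inclusive.
    return 1 <= qty <= 100


# Map each boundary probe to the observed outcome.
cases = {qty: is_valid_quantity(qty) for qty in boundary_values(1, 100)}
for qty, accepted in cases.items():
    print(qty, accepted)
```

Six probes cover both edges of the range; an off-by-one mistake such as `1 < qty` would flip the result for the value 1.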
37
+ ## White-box / structural
38
+
39
+ ### statement coverage
40
+ Every statement executes at least once.
41
+
42
+ ### branch coverage
43
+ Every branch (if/else, switch arm, loop entry/exit) is exercised both ways.
44
+
45
+ ### decision coverage
46
+ Every boolean decision evaluates to both true and false.
47
+
48
+ ### MC-DC (modified condition / decision coverage)
49
+ Each condition independently affects the decision outcome. Required for
50
+ safety-critical software.
51
+
52
+ ### data-flow coverage
53
+ Every definition-use pair is exercised.
54
+
55
+ ## Experience-based
56
+
57
+ ### error guessing
58
+ Senior-QA judgment on where defects historically appear in similar systems.
59
+
60
+ ### exploratory testing charters
61
+ Time-boxed, mission-driven sessions documenting findings.
62
+
63
+ ### checklist-based testing
64
+ Reusable list of common pitfalls (e.g. OWASP Top 10).
65
+
66
+ ## Static
67
+
68
+ ### review (walkthrough / technical review / inspection)
69
+ Code review with varying formality. Cheapest defect removal.
70
+
71
+ ### static analysis
72
+ Linters, type checkers, SAST scanners. Catches whole classes of defects
73
+ without execution.
74
+
75
+ ## Property-based
76
+
77
+ ### property-based testing
78
+ Generate inputs satisfying invariants and check the output preserves the
79
+ invariant. Tools: Hypothesis (Python), jqwik (JVM), fast-check (JS).
80
+ Fits pure-function refactors and algorithms.
81
+
82
+ ## Mutation
83
+
84
+ ### mutation testing
85
+ Mutate the production code, run the test suite, kill the mutants. Surfaces
86
+ weak assertions. Tools: Pitest (JVM), Stryker (JS / .NET), mutmut (Python).
87
+ Fits strengthen-test-coverage approach.
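Review note on the mutation section: the file says mutation testing surfaces weak assertions. The sketch here shows the idea by hand, without a mutation tool. One operator is changed to create a mutant, and two small suites are run against it. The overdue rule and all function names are hypothetical; a real tool such as mutmut would generate the mutants automatically.

```python
def is_overdue(days_late):
    """Original rule: an invoice is overdue once it is more than 0 days late."""
    return days_late > 0


def is_overdue_mutant(days_late):
    """Mutant: the operator > was changed to >=, as a mutation tool might do."""
    return days_late >= 0


def weak_suite(fn):
    # Only checks a value far from the boundary.
    return fn(30) is True


def strong_suite(fn):
    # Adds the boundary itself and a value below it.
    return fn(30) is True and fn(0) is False and fn(-1) is False


def killed(suite, mutant):
    """A mutant is killed when the suite fails against it."""
    return not suite(mutant)


print("weak suite kills mutant:", killed(weak_suite, is_overdue_mutant))
print("strong suite kills mutant:", killed(strong_suite, is_overdue_mutant))
```

Both suites pass on the original function, so line coverage alone cannot tell them apart; only the boundary assertion in the strong suite notices the changed operator.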
sumo_qa/_data/knowledge/test_data/auth/sample_accounts.yaml ADDED
@@ -0,0 +1,38 @@
1
+ entries:
2
+ - id: auth-mfa-required-001
3
+ environment: integration
4
+ domain: auth
5
+ scenario_tags:
6
+ - mfa_required
7
+ - active_account
8
+ known_valid_for:
9
+ - mfa enforcement testing
10
+ - active account login
11
+ constraints:
12
+ - Reset MFA state after test execution.
13
+ - Revalidate before release testing if MFA policy changed.
14
+ owner: identity-platform
15
+ last_validated_at: "2026-05-06T09:00:00Z"
16
+ confidence: high
17
+ source: qa-curated
18
+ validation_source: mock-heuristic-validator
19
+ notes: Active account with enforced MFA for login-flow testing in integration.
20
+
21
+ - id: auth-locked-account-001
22
+ environment: integration
23
+ domain: auth
24
+ scenario_tags:
25
+ - account_locked
26
+ - failed_login
27
+ known_valid_for:
28
+ - locked account rejection
29
+ - failed-login lockout testing
30
+ constraints:
31
+ - Use a fresh impersonation token for each test run.
32
+ - Expected failure must be account-locked, not invalid credentials.
33
+ owner: identity-platform
34
+ last_validated_at: "2026-05-04T11:30:00Z"
35
+ confidence: high
36
+ source: qa-curated
37
+ validation_source: mock-heuristic-validator
38
+ notes: Useful for validating that locked-account rejection is not conflated with invalid credentials.
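Review note on sample_accounts.yaml: each entry carries a `last_validated_at` timestamp and a constraint to revalidate before release testing. A sketch of a freshness check over that field is given here. The 14-day window and the function names are assumptions; the package's own validation in tdm_validation.py is not shown in this part of the diff and may work differently. The entry is written as a plain dict so the example needs no YAML parser.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(stamp):
    """Parse an ISO 8601 stamp; the Z suffix is rewritten for older Pythons."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def is_stale(entry, now, max_age=timedelta(days=14)):
    """True when the entry was last validated more than max_age ago."""
    return now - parse_utc(entry["last_validated_at"]) > max_age


# Field names copied from the YAML above; the max_age default is an assumption.
entry = {
    "id": "auth-locked-account-001",
    "last_validated_at": "2026-05-04T11:30:00Z",
}

soon = datetime(2026, 5, 10, tzinfo=timezone.utc)
later = datetime(2026, 6, 1, tzinfo=timezone.utc)
print(is_stale(entry, soon))   # False
print(is_stale(entry, later))  # True
```

Passing `now` in explicitly keeps the check deterministic, which matters when the check itself is under test.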
sumo_qa/_data/knowledge/test_data/billing/sample_invoices.yaml ADDED
@@ -0,0 +1,38 @@
1
+ entries:
2
+ - id: billing-paid-invoice-001
3
+ environment: integration
4
+ domain: billing
5
+ scenario_tags:
6
+ - invoice_paid
7
+ - refund_eligible
8
+ known_valid_for:
9
+ - paid invoice refund flow
10
+ - invoice state validation
11
+ constraints:
12
+ - Reset payment state before re-run.
13
+ - Release any holds after test execution.
14
+ owner: billing-platform
15
+ last_validated_at: "2026-05-05T15:15:00Z"
16
+ confidence: high
17
+ source: qa-curated
18
+ validation_source: mock-heuristic-validator
19
+ notes: Known-good paid invoice for refund-flow testing in integration.
20
+
21
+ - id: billing-pending-due-boundary-001
22
+ environment: integration
23
+ domain: billing
24
+ scenario_tags:
25
+ - invoice_pending
26
+ - boundary_due_date
27
+ known_valid_for:
28
+ - pending invoice due-date boundary
29
+ - boundary-value validation
30
+ constraints:
31
+ - Confirm clock skew is within tolerance before release sign-off.
32
+ - Avoid if promotion testing is not intended.
33
+ owner: billing-platform
34
+ last_validated_at: "2026-04-20T10:00:00Z"
35
+ confidence: medium
36
+ source: qa-curated
37
+ validation_source: mock-heuristic-validator
38
+ notes: Stable pending invoice for due-date boundary checks; also covers boundary-value technique.
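Review note on sample_invoices.yaml: entries are described by `domain`, `scenario_tags`, and `confidence`. A sketch of how a lookup over those fields might select test data follows. The function `find_entries`, the confidence ordering, and the `low` level are assumptions for illustration; only `high` and `medium` appear in the sample files.

```python
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}  # ordering is an assumption


def find_entries(entries, domain, required_tags, min_confidence="medium"):
    """Return ids whose domain matches and whose tags include every required tag."""
    floor = CONFIDENCE_RANK[min_confidence]
    return [
        e["id"]
        for e in entries
        if e["domain"] == domain
        and set(required_tags) <= set(e["scenario_tags"])
        and CONFIDENCE_RANK[e["confidence"]] >= floor
    ]


# Two entries reduced to the fields the lookup reads.
entries = [
    {"id": "billing-paid-invoice-001", "domain": "billing",
     "scenario_tags": ["invoice_paid", "refund_eligible"], "confidence": "high"},
    {"id": "billing-pending-due-boundary-001", "domain": "billing",
     "scenario_tags": ["invoice_pending", "boundary_due_date"], "confidence": "medium"},
]

print(find_entries(entries, "billing", ["invoice_pending"]))
print(find_entries(entries, "billing", ["invoice_pending"], min_confidence="high"))
```

Raising the confidence floor to `high` excludes the medium-confidence boundary entry, so the second call returns an empty list.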
sumo_qa/_data/skills/qa-answering-testing-question/SKILL.md ADDED
@@ -0,0 +1,82 @@
1
+ ---
2
+ name: qa-answering-testing-question
3
+ description: Use when the user asks a generic testing question — "how do I test this?", "what should I check for X?" — that doesn't fit a more specific QA skill. Cites a principle or technique from the loaded catalogue rather than producing generic advice.
4
+ ---
5
+
6
+ # Answering a testing question
7
+
8
+ **Announce at start:** *"Answering with a cited principle and technique."*
9
+
10
+ ## Output discipline (mandatory)
11
+
12
+ **Never surface internal taxonomy labels in user-facing output.** No "Classification: X", "Approach: Y", "Per the checklist", "Step 3 of 6". The taxonomy is internal scaffolding; translate to natural English when the meaning matters to the user — *"this is a behaviour change in pricing"*, not *"Classification: business_logic_change"*. If you catch yourself typing a label, delete it.
13
+
14
+ ## Output economy (mandatory)
15
+
16
+ Spend output tokens on findings, not framing.
17
+
18
+ - **Don't preamble the work.** The host already shows tool calls — present findings, don't narrate *"I'll first read X, then Y, then deliver Z."*
19
+ - **One question per turn.** Don't follow a question with *"shall I proceed or clarify first?"* — the question IS the gate.
20
+ - **No self-narration.** *"Let me now..."* / *"I'm going to..."* → just do it.
21
+ - **Don't restate the user's input.** They know what they asked.
22
+ - **Section headings only when there are genuinely multiple sections.** A 3-line scope check doesn't need a `## Scope` heading.
23
+ - **Tables only when comparing >2 things on >2 axes.** Otherwise prose is shorter.
24
+ - **No closing pleasantries.** No *"happy to dig deeper"* / *"let me know if you want X"* — the next-skill handoff at the bottom of every skill is where routing lives.
25
+
26
+ ## The Iron Law
27
+ NO ANSWER WITHOUT A CITED PRINCIPLE OR TECHNIQUE.
28
+
29
+ Every answer ties to a named ISTQB principle and a named test design technique from the loaded catalogue; specialty tools come from your ecosystem knowledge, with `specialty_tools.md` confirming the category fits.
30
+
31
+ ## When to Use
32
+
33
+ `qa-deciding-approach` routes here when the user's intent is question-shaped but doesn't fit a more specific skill:
34
+
35
+ - "how do I test this service?"
36
+ - "what should I check for X feature?"
37
+ - "any QA suggestions for this design?"
38
+ - "what's the right test type for this?"
39
+
40
+ For "create a plan" / "prep for work" / "review my changes" → use the more specific skills.
41
+
42
+ ## Checklist
43
+ You MUST create a TodoWrite item per checklist item and complete in order:
44
+
45
+ 1. Read the user's question verbatim.
46
+ 2. Read any code/paths/specs the user supplied (host's file tools).
47
+ 3. Call `sumo_qa_load_principles()` and `sumo_qa_load_techniques()`. Read both catalogues.
48
+ 4. Identify the QA shape the question implies: what's the actual concern (correctness / regression / coverage / risk surface)?
49
+ 5. Pick at least one principle that shapes the answer (cite by number or name). Pick at least one technique that fits the concern.
50
+ 6. If the question implies a specialty surface (security, performance, contract, a11y, UI, etc.), recommend the best-fit tool from your knowledge of the ecosystem anchored to the user's stack. `sumo_qa_load_specialty_tools()` is a category-fit primer, NOT a brand whitelist. Verify currency with web search if uncertain. Offer to install and scaffold the first tests; confirm before installing dependencies.
51
+ 7. Synthesise the answer: 3-7 sentences, naming the principle/technique/tool. Conversational, not a JSON blob.
52
+ 8. If the question is actually a prep/plan/review/strategy in disguise, escalate: stop, route to the matching skill.
53
+
54
+ ## Process Flow
55
+
56
+ See the Checklist above — that's the flow.
57
+
58
+ ## Red Flags
59
+
60
+ | Thought | Reality |
61
+ |---|---|
62
+ | "Just say 'add unit tests and integration tests'" | Generic. Pick a technique from the catalogue (boundary value, decision table, etc.). |
63
+ | "Mention security as a consideration" | Name the actual surface AND the right tool for it (HTTP DAST scanner / SAST tool / token-validation harness — pick from your knowledge by fit). Bare "consider security" is not senior-QA. |
64
+ | "I'll cite a principle by paraphrasing — saves loading the catalogue" | Principles are catalogue-authoritative. Use the catalogue's wording. (Tool brand picks are different — those come from your knowledge of the ecosystem.) |
65
+ | "I'll restrict tool recommendations to the names in `specialty_tools.md`" | The primer is a category check, not a brand whitelist. Recommend the best fit from your knowledge; the listed names are illustrative. |
66
+ | "User asked a planning question — I'll answer inline" | Route to `qa-preparing-for-work` or `qa-creating-test-plan`. Don't reinvent. |
67
+ | "Answer should be 20+ sentences for completeness" | 3-7 sentences. Senior QA answers concisely. |
68
+
69
+ ## Examples
70
+
71
+ ### Good
72
+
73
+ User: "how should I test a new feature that re-orders user feeds?"
74
+ Answer cites ISTQB Principle 4 (defects cluster — feed ordering is a hotspot), names decision-table for the ordering rules and equivalence-partitioning for feed sizes, suggests k6 if scale matters, and asks the user to confirm scale before adding performance work.
75
+
76
+ ### Bad
77
+
78
+ "You should add unit tests, integration tests, and consider edge cases. Maybe test performance too." — no cited principle, no named technique, no specialty tool named by fit.
79
+
80
+ ## Next skill in the chain
81
+
82
+ Terminal skill — the answer is the deliverable. If the question turns out to be a disguised plan / review / strategy ask, stop and route to the matching specific skill.
sumo_qa/_data/skills/qa-creating-test-plan/SKILL.md ADDED
@@ -0,0 +1,114 @@
1
+ ---
2
+ name: qa-creating-test-plan
3
+ description: Use when the user asks for a formal test plan, entry/exit criteria, or a phased QA approach for a piece of work. Walk the user through scope → risks → entry criteria → phases → exit criteria → residual risks one section at a time, getting confirmation before each step. Heavier than qa-preparing-for-work; use when the work is tracked or formally reviewed.
4
+ ---
5
+
6
+ # Creating a Test Plan
7
+
8
+ Help the user turn a piece of upcoming work into a phased ISTQB-style test plan through natural collaborative dialogue. Walk through scope, risks, criteria, and phases one section at a time, confirming with them after each, until the full plan is on the page. The user has domain context the AI can't infer — surface it through questions, don't assume it.
9
+
10
+ **Announce at start:** *"Building the formal test plan."*
11
+
12
+ ## Output discipline (mandatory)
13
+
14
+ **Never surface internal taxonomy labels in user-facing output.** No "Classification: X", "Approach: Y", "Per the checklist", "Step 3 of 6". The taxonomy is internal scaffolding; translate to natural English when the meaning matters to the user — *"this is a behaviour change in pricing"*, not *"Classification: business_logic_change"*. If you catch yourself typing a label, delete it.
15
+
16
+ Inherits the global discipline from `using-sumo-qa` (knowledge authority hierarchy, internal scaffolding stays internal, specialty-tool fit).
17
+
18
+ ## Output economy (mandatory)
19
+
20
+ Spend output tokens on findings, not framing.
21
+
22
+ - **Don't preamble the work.** The host already shows tool calls — present findings, don't narrate *"I'll first read X, then Y, then deliver Z."*
23
+ - **One question per turn.** Don't follow a question with *"shall I proceed or clarify first?"* — the question IS the gate.
24
+ - **No self-narration.** *"Let me now..."* / *"I'm going to..."* → just do it.
25
+ - **Don't restate the user's input.** They know what they asked.
26
+ - **Section headings only when there are genuinely multiple sections.** A 3-line scope check doesn't need a `## Scope` heading.
27
+ - **Tables only when comparing >2 things on >2 axes.** Otherwise prose is shorter.
28
+ - **No closing pleasantries.** No *"happy to dig deeper"* / *"let me know if you want X"* — the next-skill handoff at the bottom of every skill is where routing lives.
29
+
30
+ <HARD-GATE>
31
+ Do NOT emit a test plan in a single message. Walk through the sections one at a time, getting the user's confirmation or correction between each. A test plan dumped in one turn is a wishlist; a test plan built collaboratively is reviewable.
32
+ </HARD-GATE>
33
+
34
+ ## The Iron Law
35
+
36
+ **NO PLAN WITHOUT EXPLICIT ENTRY AND EXIT CRITERIA.** A document missing either is a wishlist, not a plan.
37
+
38
+ ## When to Use
39
+
40
+ User intents that trigger this skill:
41
+
42
+ - "create a test plan for X"
43
+ - "draft the formal QA plan I should follow"
44
+ - "give me entry/exit criteria for X"
45
+ - "I'm starting a major feature — plan QA properly"
46
+
47
+ Distinct from `qa-preparing-for-work` (lighter prep brief, no formal entry/exit gates) — use this when the work is tracked, formally reviewed, or large enough to warrant phased execution.
48
+
49
+ ## Checklist
50
+
51
+ You MUST work through these in order. Steps 1–3 are AI-only homework (no user questions). The user's confirmation gates steps 4 onward.
52
+
53
+ 1. **Extract scope hints from intent** *(no user question)* — re-read the user's intent verbatim. Identify keywords / paths / domain terms that point at where the work lives.
54
+
55
+ 2. **Walk the repo for the scope** *(no user question)* — use the host's file tools. Find where the production code lives, existing tests, related callers, classification signal. Don't ask the user where things are.
56
+
57
+ 3. **Load the catalogues** *(no user question)* — call `sumo_qa_load_standards`, `sumo_qa_load_rules`, `sumo_qa_load_techniques`, `sumo_qa_load_specialty_tools`. Internal only.
58
+
59
+ 4. **Confirm scope, only for the AMBIGUOUS parts** — present a short paragraph of what you FOUND (file paths, callers, existing tests). Then ask ONE focused question for whatever the code DIDN'T make clear. If exploration left nothing ambiguous, skip the question and move to step 5.
60
+
61
+ 5. **Propose named risks (one message, ask after)** — 3–7 named risks, each anchored in evidence you actually saw (file path, class name, domain term). NOT generic. Ask: *"do these match how you'd describe the risks?"*
62
+
63
+ 6. **Pick technique per risk** — name one technique per risk from the techniques catalogue. Present as a table: risk → technique. Ask: *"do these technique choices fit?"*
64
+
65
+ 7. **Recommend specialty tools (if any), and offer to set them up** — pick the best-fit tool from your knowledge of the ecosystem anchored to the user's stack. The primer is a category check, NOT a brand whitelist. Verify currency with web search if unsure. Offer to install and scaffold the first tests against the named risks. Confirm before installing dependencies. Empty list is acceptable.
66
+
67
+ 8. **Entry criteria — what must be true to START testing** — 3–5 observable preconditions (API spec frozen, test data loaded, feature flag default off, etc.).
68
+
69
+ 9. **Phases + deliverables** — propose analysis / design / execution / completion phases with concrete deliverables per phase.
70
+
71
+ 10. **Exit criteria — what must be true to SHIP** — observable exit criteria (all named risks have ≥1 passing test, no Sev-1/2 open, perf under p95 budget). Tautologies like "tests pass" are forbidden.
72
+
73
+ 11. **Residual risks accepted at exit** — name 1–3 risks you're NOT covering and why (out of scope, accepted cost, mitigated elsewhere).
74
+
75
+ 12. **Final plan** — assemble the confirmed sections into one document. Offer to write to a file (e.g. `docs/qa-plans/<topic>.md`) or surface inline. Confirm before writing.
76
+
77
+ ## Process Flow
78
+
79
+ See the Checklist above — that's the flow.
80
+
81
+ ## Red Flags — STOP and rework
82
+
83
+ | Thought | Reality |
84
+ |---|---|
85
+ | "I'll draft the whole plan and they can react" | Iron-Law-adjacent failure. Walk through one section at a time. |
86
+ | "I'll ask the user where the code lives instead of looking" | The code is in the repo. Read it. Ask only for what the code didn't make clear. |
87
+ | "I'll list 4 clarifying questions in one message to be thorough" | Overwhelming. ONE primary question per turn. |
88
+ | "Skip exit criteria — they'll know when it's done" | Then it's not a plan. Iron Law violated. |
89
+ | "Entry criteria: 'tests are green'" | Tautology. Entry criteria are about the world before testing — feature complete, data available, environments stand up. |
90
+ | "Add a phase called 'edge cases'" | Phases are analysis / design / execution / completion. "Edge cases" is a phase only in a junior QA's plan. |
91
+ | "Residual risks: 'none'" | Every plan has residual risks. Naming "none" means you didn't think about what could still go wrong post-ship. |
92
+ | "Mutation testing on a UI redesign" | Wrong tool fit. Pick from the catalogue based on the actual risk surface. |
93
+ | "Tests cover all behaviour" | "All behaviour" is not measurable. Exit criteria must be observable. |
94
+ | "User said create a plan — I'll just create it without asking" | The user has context you don't. Walk through the sections; their corrections shape the plan. |
95
+
96
+ ## Examples
97
+
98
+ ### Good (explore first, ask only what's ambiguous)
99
+
100
+ > **User:** "Create a test plan for the new tax-calculation feature."
101
+ >
102
+ > **AI (after internal repo walk):** *"Found the tax engine at `tax/TaxEngine.kt` with jurisdiction handlers under `tax/jurisdictions/`. Called from `pricing/Calculator.kt` upstream; `invoicing/InvoiceBuilder.kt` consumes it. Existing tests cover 4 jurisdictions. **Is scope just the tax engine, or also the upstream pricing-pipeline integration?**"*
103
+
104
+ ### Bad (single-shot dump)
105
+
106
+ > **AI:** *"Here's a plan: Phases — planning, testing, deployment. Tests — happy path, edge cases, integration. Entry — code complete. Exit — tests pass. Done."*
107
+ >
108
+ > Generic phases, no risks named, tautological exit, no collaboration. Iron Law violated.
109
+
110
+ ## Next skill in the chain
111
+
112
+ When the plan is signed off → `qa-planning-qa-rollout` to break the phases into bite-sized, dispatchable tasks ready for subagent execution.
113
+
114
+ If the user wants to act on a single phase directly rather than dispatch it → route to the matching execution skill instead (`qa-implementing-with-tdd` for new behaviour / regressions, `qa-strengthening-tests` for mutation follow-up, `qa-reviewing-before-merge` for review-shaped phases).
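Review note on qa-creating-test-plan/SKILL.md: the skill requires explicit entry and exit criteria, forbids tautologies such as "tests pass", and treats an empty residual-risk list as a red flag. Those three rules can be sketched as a mechanical check. The `PlanDraft` class, the `plan_problems` function, and the exact tautology list are hypothetical; the skill itself relies on dialogue with the user, not on a validator like this.

```python
from dataclasses import dataclass, field

# Phrases the skill text calls out as tautological; the set is illustrative.
TAUTOLOGIES = {"tests pass", "tests are green", "tests cover all behaviour"}


@dataclass
class PlanDraft:
    entry_criteria: list = field(default_factory=list)
    exit_criteria: list = field(default_factory=list)
    residual_risks: list = field(default_factory=list)


def plan_problems(plan):
    """List rule violations; an empty list means the draft passes these checks."""
    problems = []
    if not plan.entry_criteria:
        problems.append("missing entry criteria")
    if not plan.exit_criteria:
        problems.append("missing exit criteria")
    for criterion in plan.entry_criteria + plan.exit_criteria:
        if criterion.strip().lower() in TAUTOLOGIES:
            problems.append(f"tautological criterion: {criterion!r}")
    if not plan.residual_risks:
        problems.append("no residual risks named")
    return problems


bad = PlanDraft(entry_criteria=["code complete"], exit_criteria=["Tests pass"])
good = PlanDraft(
    entry_criteria=["API spec frozen", "test data loaded"],
    exit_criteria=["every named risk has at least one passing test"],
    residual_risks=["clock skew across regions is out of scope"],
)
print(plan_problems(bad))
print(plan_problems(good))
```

A string match can only catch the literal tautologies listed; judging whether a criterion is observable still needs a reviewer.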