@qa-gentic/stlc-agents 1.0.17 → 1.0.18

@@ -0,0 +1,283 @@
1
+ # Orchestration Rules — Multi-Step Pipeline Agents
2
+
3
+ > **Universal rules file for coding agents (Claude, Copilot, Cursor, Windsurf, Gemini, etc.)**
4
+ > Place this file in your project root or `.ai/` folder. Reference it in your prompt with:
5
+ > `"Apply all rules from ORCHESTRATION_RULES.md before executing any step."`
6
+
7
+ ---
8
+
9
+ ## 1. Core Principle
10
+
11
+ Every intermediate output is an **input contract** for the next step — not a done state.
12
+ A step is only complete when its output has been validated against its spec and confirmed
13
+ fit for downstream consumption. Never proceed on a "good enough" assumption.
14
+
15
+ ---
16
+
17
+ ## 2. Mandatory Behaviours
18
+
19
+ ### 2.1 Explicit Task Breakdown
20
+
21
+ - Before executing, decompose the full task into named steps using a todo/task list tool
22
+ (e.g. `manage_todo_list`, GitHub Copilot Tasks, Cursor task panel).
23
+ - Each step must have a declared **input**, **action**, and **output spec**.
24
+ - Mark steps as `[ ] pending`, `[~] in-progress`, `[x] done`, `[!] blocked` — update in real time.
25
+ - Never treat a step as in-progress and done simultaneously.
26
+
27
+ ### 2.2 No Skipping Intermediate Steps
28
+
29
+ - If a step produces data that the next step consumes, that data must be:
30
+ 1. **Extracted** from raw output (not left embedded)
31
+ 2. **Structured** into the agreed schema
32
+ 3. **Validated** against the checkpoint gate
33
+ 4. **Explicitly handed off** as a named artefact
34
+ - Do NOT jump ahead assuming the downstream step can infer missing data.
35
+ - Do NOT proceed if a required input is absent or malformed.
36
+
37
+ ### 2.3 Checkpoint Gates Are Blocking — Pre-Flight Required
38
+
39
+ Gates are not a post-generation reflection. They run **before** output is produced.
40
+
41
+ **Before generating output for any step, you MUST:**
42
+ 1. Output the pre-flight checklist with your intended answers filled in.
43
+ 2. Only if all items are YES — proceed to generate the output.
44
+ 3. If any item is NO — stop, state what is missing, and wait for the user.
45
+
46
+ This is not optional. Generating output first and checking after is a rule violation.
47
+
48
+ Pre-flight format (required before every step):
49
+ ```
50
+ PRE-FLIGHT: Step [N] — [Step Name]
51
+ [ ] Input artefact "[name]" received from Step [N-1]?
52
+ [ ] Input matches expected schema?
53
+ [ ] [Step-specific countable check, e.g. "11 scenarios in scenario_inventory?"]
54
+ [ ] [Any tool or selector availability check]
55
+ → PROCEED / → BLOCKED: [state what is missing]
56
+ ```
57
+
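As a rough illustration of a blocking pre-flight gate, the checklist above can be modelled as data that is evaluated before anything is generated. This is only a sketch: `PreFlightItem` and `run_preflight` are hypothetical names and are not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreFlightItem:
    """One checklist line: the question and whether it was answered YES."""
    question: str
    passed: bool


def run_preflight(step: str, items: list[PreFlightItem]) -> str:
    """Return a PROCEED line only if every item passed, otherwise a BLOCKED line."""
    missing = [item.question for item in items if not item.passed]
    if missing:
        # Any NO answer blocks the step and names what is missing.
        return f"PRE-FLIGHT: {step} -> BLOCKED: " + "; ".join(missing)
    return f"PRE-FLIGHT: {step} -> PROCEED"


checklist = [
    PreFlightItem('Input artefact "scenario_inventory" received?', True),
    PreFlightItem("Input matches expected schema?", True),
    PreFlightItem("11 scenarios in scenario_inventory?", False),
]
result = run_preflight("Step 3 - Generate step definitions", checklist)
print(result)
```

The point of the design is that generation code is only reachable after `run_preflight` returns a PROCEED line, so a failed check cannot be noticed after the fact.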
58
+ ### 2.4 Data Handoff Must Be Explicit
59
+
60
+ - Output data in full — not summarised, not truncated.
61
+ - Use a consistent schema (JSON, YAML, or named list) — do not change shape between steps.
62
+ - Name the artefact (e.g. `context_map`, `test_case_list`, `scenario_inventory`).
63
+ - The receiving step must reference the artefact by name, not re-derive it.
64
+
65
+ ### 2.5 No Placeholder or Stub Outputs
66
+
67
+ This rule applies at generation time, not reflection time. The agent must not produce
68
+ stubs and then acknowledge the violation — it must prevent them before generating.
69
+
70
+ - Never produce output containing `TODO`, `placeholder`, `// implement later`,
71
+ `throw new Error('pending')`, or any empty method/step body.
72
+ - Before generating a file, state the **expected item count** (e.g. number of step
73
+ definitions, number of test cases). The generated file must match that count exactly.
74
+ - If a step cannot produce a complete output, declare it `[!] blocked` and stop.
75
+ - Partial outputs passed downstream cause compounding failures and wasted tokens.
76
+
77
+ **Countable verification pattern (required for code generation steps):**
78
+ ```
79
+ Expected: [N] step definitions (from scenario_inventory)
80
+ Generating: [N] step definitions
81
+ Verify after: count implemented bodies — must equal [N], zero empty
82
+ ```
83
+
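The countable verification pattern can be sketched as a small post-generation check. This assumes the generated step definitions are available as a mapping from step name to body text; `STUB_MARKERS` and `verify_count` are hypothetical names used for illustration only.

```python
# Markers taken from rule 2.5; any of them in a body counts as a stub.
STUB_MARKERS = ("TODO", "placeholder", "// implement later", "throw new Error('pending')")


def verify_count(expected: list[str], generated: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    for name in expected:
        body = generated.get(name)
        if body is None:
            problems.append(f"missing: {name}")
        elif not body.strip():
            problems.append(f"empty body: {name}")
        elif any(marker in body for marker in STUB_MARKERS):
            problems.append(f"stub marker: {name}")
    if len(generated) != len(expected):
        problems.append(f"count mismatch: expected {len(expected)}, got {len(generated)}")
    return problems


expected_steps = ["user opens login page", "user submits credentials"]
generated_steps = {
    "user opens login page": "await loginPage.open();",
    "user submits credentials": "// TODO",
}
problems = verify_count(expected_steps, generated_steps)
print(problems)
```

A non-empty result corresponds to marking the step `[!] blocked` instead of handing the artefact downstream.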
84
+ ### 2.6 Query-Driven Data Capture (Snapshot / Scraping Steps)
85
+
86
+ - Navigation is NOT the deliverable — **structured data extraction** is.
87
+ - For every screen or page visited, immediately extract all required fields before moving on.
88
+ - Do not defer extraction to a later step.
89
+ - Capture only what downstream steps need (defined by the step's output spec).
90
+ - Validate coverage: every field required by downstream must be present in the captured data.
91
+
92
+ ### 2.7 Split Generation Steps to Prevent Silent Stubs
93
+
94
+ For any step that generates code or structured output consumed by a subsequent step,
95
+ split it into two sub-steps:
96
+
97
+ - **[N]a — Signatures only:** generate method/step signatures (names, parameters) with no bodies.
98
+ Output as an inventory list. This makes the expected count explicit and visible.
99
+ - **[N]b — Implement each signature:** implement every item from the [N]a inventory.
100
+ No body may be left empty. Reference `context_map` or equivalent for all selectors/data.
101
+
102
+ This forces a visible count before implementation begins, eliminating silent stub generation.
103
+
104
+ ### 2.8 Token Efficiency
105
+
106
+ - Avoid re-deriving data already produced in a prior step.
107
+ - Reference prior artefacts by name; do not re-fetch or re-generate unless a gate failed.
108
+ - If rework is needed, state which gate failed, what was missing, and what the corrected output is.
109
+
110
+ ---
111
+
112
+ ## 3. Error Handling
113
+
114
+ | Situation | Required Action |
115
+ |---|---|
116
+ | Gate fails | STOP. Report failed items. Wait for user input or resolution. |
117
+ | Required input missing | STOP. Name the missing input. Do not guess. |
118
+ | Tool call returns empty | STOP. Report. Do not silently continue. |
119
+ | Partial output produced | Mark step `[!] blocked`. Do not pass partial output downstream. |
120
+ | Schema mismatch | STOP. Show expected vs actual schema. Do not transform silently. |
121
+ | Ambiguous instruction | Ask one clarifying question before proceeding. Do not assume. |
122
+ | Stub/TODO found in output | STOP. Do not accept the output. Regenerate from signatures. |
123
+ | Count mismatch (generated vs expected) | STOP. List which items are missing. Do not proceed. |
124
+
125
+ ---
126
+
127
+ ## 4. Checkpoint Gate Template
128
+
129
+ Run this **before** generating output — not after.
130
+
131
+ ```
132
+ CHECKPOINT GATE [N] — [Step Name]
133
+ ---------------------------------------
134
+ PRE-FLIGHT (run before generating):
135
+ [ ] Input artefact received and named
136
+ [ ] Input matches expected schema
137
+ [ ] Expected output count stated: [N items]
138
+ [ ] All required tools/selectors available
139
+
140
+ POST-GENERATION (run before handing off):
141
+ [ ] Actual output count matches expected: [N of N]
142
+ [ ] No stubs, TODOs, or empty bodies in output
143
+ [ ] All items from upstream list are accounted for
144
+ [ ] Output is in agreed schema / format
145
+
146
+ RESULT: [ ] PASS — hand off artefact to Step [N+1]
147
+ [ ] FAIL — stop and report: [list what failed]
148
+ ```
149
+
150
+ ---
151
+
152
+ ## 5. Step Definition Template
153
+
154
+ ```
155
+ STEP [N] — [Name]
156
+ ---------------------------------------
157
+ Tool / Agent: [name of tool, MCP server, or agent]
158
+
159
+ Input (required):
160
+ - [Named artefact from Step N-1]
161
+ - [Any other required input]
162
+
163
+ Pre-flight check:
164
+ - State expected output count before generating
165
+ - Confirm all inputs available and schema-valid
166
+
167
+ Action:
168
+ [Precise description — not vague verbs like "process" or "handle"]
169
+ If code generation: split into [N]a (signatures) and [N]b (implementations)
170
+
171
+ Output spec (the contract):
172
+ - Artefact name: [e.g. context_map]
173
+ - Format: [JSON | YAML | list | file | etc.]
174
+ - Required fields: [enumerate them]
175
+ - Coverage requirement: [e.g. one implementation per Gherkin step]
176
+
177
+ Checkpoint Gate: → run Gate [N] template above
178
+ ```
179
+
180
+ ---
181
+
182
+ ## 6. Anti-Patterns (Never Do These)
183
+
184
+ | Anti-Pattern | Why It Fails | Correct Behaviour |
185
+ |---|---|---|
186
+ | Generating output then checking the gate | Gate runs after stubs already exist — violation acknowledged but not prevented | Run pre-flight checklist before generating |
187
+ | Treating gate as a reflection step | Agent notices violation after the fact; output is already committed | Gate is a pre-condition, not a review |
188
+ | Skipping data extraction after capture | Downstream step receives raw/unstructured input and must infer | Extract and structure data immediately after capture |
189
+ | Jumping to generation without verified inputs | Output based on inference, not facts — stubs and errors result | Validate inputs at gate before calling the generator |
190
+ | Treating "good enough" output as done | Errors compound; rework costs more tokens than doing it right | Validate against spec before marking a step complete |
191
+ | Producing stubs with TODO | Downstream steps receive incomplete contracts and silently fail | Block the step; declare it incomplete; stop |
192
+ | Re-deriving upstream data in a downstream step | Wasted tokens; divergence risk if re-derivation differs | Reference the named artefact from the prior step |
193
+ | Proceeding past a failed gate | Snowballing failures requiring full rework | Stop at the gate; surface the gap; wait for resolution |
194
+ | Single atomic generation step for code | No visible count before generation — stubs go undetected | Split into signatures ([N]a) then implementations ([N]b) |
195
+
196
+ ---
197
+
198
+ ## 7. Orchestration Health Checks
199
+
200
+ Run at the start of any multi-step task:
201
+
202
+ - [ ] Are all steps named and sequenced in the task list?
203
+ - [ ] Does each step have a declared input and output spec?
204
+ - [ ] Does each step have a defined pre-flight and post-generation gate?
205
+ - [ ] Are code generation steps split into signatures + implementations?
206
+ - [ ] Are all required tools / MCP servers available?
207
+ - [ ] Are named artefacts from prior steps available as inputs?
208
+
209
+ If any health check fails before execution begins, resolve it first.
210
+
211
+ ---
212
+
213
+ ## 8. Agent-Specific Integration Notes
214
+
215
+ ### Claude (claude.ai / API)
216
+ - Reference this file in your system prompt or project instructions.
217
+ - Use `manage_todo_list` for step tracking.
218
+ - Attach this file as a project document so it persists across sessions.
219
+ - Pre-flight checklists tend to work reliably because Claude outputs its reasoning before making tool calls.
220
+
221
+ ### GitHub Copilot (VS Code / JetBrains)
222
+ - Add to `.github/copilot-instructions.md` or reference in your workspace prompt.
223
+ - **Critical:** Copilot treats rules as advisory context — gates are not enforced at runtime.
224
+ Mitigate by scoping rules to file types (e.g. `When generating *.steps.ts files, you must...`).
225
+ - Always include the pre-flight checklist directly in your chat message for the current step,
226
+ not just in the rules file. Copilot applies in-message instructions more reliably than
227
+ file-level rules for generation constraints.
228
+ - Use the countable verification pattern (section 2.5) explicitly in each chat prompt:
229
+ "There are 11 scenarios. Generate exactly 11 step definitions. State the count before writing."
230
+ - For step definition files: add a file-type-scoped rule to `.github/copilot-instructions.md`:
231
+ ```
232
+ When generating Playwright step definition files (*.steps.ts):
233
+ 1. Count Given/When/Then steps in the linked .feature file and state the count first.
234
+ 2. Every step body must contain real implementation — no TODO, no throw pending, no empty bodies.
235
+ 3. If a selector is missing from context_map, name the missing step and stop. Do not stub it.
236
+ ```
237
+
238
+ ### Cursor
239
+ - Place in `.cursor/rules/` as `orchestration.mdc` (set scope: `always`).
240
+ - Or add to `.cursorrules` in the project root.
241
+ - Cursor applies project-level rules more consistently than Copilot for generation steps.
242
+ - Use `@file` references in chat to explicitly pull the rules into context per step.
243
+
244
+ ### Windsurf (Codeium)
245
+ - Place in `.windsurf/rules.md` or reference in the global rules panel.
246
+ - Windsurf's Cascade agent picks up project-level markdown rules automatically.
247
+
248
+ ### Gemini CLI / Vertex AI Agent Builder
249
+ - Reference via system instruction or as a grounding document.
250
+ - Use the Step Definition Template when constructing task configs.
251
+
252
+ ---
253
+
254
+ ## 9. Quick Reference Card
255
+
256
+ ```
257
+ BEFORE EACH STEP:
258
+ 1. Output the pre-flight checklist — fill in all items.
259
+ 2. If all YES → state expected output count → generate.
260
+ 3. If any NO → stop and report.
261
+
262
+ AFTER EACH STEP:
263
+ 1. Run post-generation gate — count actual vs expected.
264
+ 2. If PASS → hand off named artefact to next step.
265
+ 3. If FAIL → stop, report, regenerate.
266
+
267
+ FOR CODE GENERATION:
268
+ 1. Generate signatures/names only first ([N]a).
269
+ 2. State the count from [N]a.
270
+ 3. Implement every item ([N]b) — zero empty bodies allowed.
271
+
272
+ NEVER:
273
+ - Generate output then check the gate.
274
+ - Proceed past a failed gate.
275
+ - Pass unstructured or partial data downstream.
276
+ - Produce stubs and acknowledge them — prevent them.
277
+ ```
278
+
279
+ ---
280
+
281
+ *Version 1.1 — Updated to add pre-flight gate enforcement, countable verification,
282
+ split generation step pattern, and Copilot-specific stub prevention guidance.
283
+ Root cause addressed: gates were post-generation reflections, not pre-generation blockers.*
package/README.md CHANGED
@@ -1,11 +1,12 @@
1
1
  # @qa-gentic/stlc-agents
2
2
 
3
- > AI-powered QA STLC automation — from Azure DevOps **or Jira Cloud** work item to self-healing Playwright TypeScript in a Helix-QA project.
3
+ > AI-powered QA STLC automation — from Azure DevOps **or Jira Cloud** work item state change to self-healing Playwright TypeScript in a Helix-QA project.
4
4
 
5
- Works with **GitHub Copilot** (VS Code Agent mode), **Claude Code**, **Cursor**, and **Windsurf**.
5
+ Works with **GitHub Copilot** (VS Code Agent mode), **Claude Code**, **Cursor**, and **Windsurf**.
6
+ Also runs fully headless via the **webhook bridge** — no human in the loop required.
6
7
 
7
- ![npm version](https://img.shields.io/badge/npm-v1.0.10-blue)
8
- ![PyPI version](https://img.shields.io/badge/pypi-v1.0.10-blue)
8
+ ![npm version](https://img.shields.io/badge/npm-v1.0.17-blue)
9
+ ![PyPI version](https://img.shields.io/badge/pypi-v1.0.17-blue)
9
10
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
10
11
  [![Node.js >=18](https://img.shields.io/badge/node-%3E%3D18-brightgreen)](https://nodejs.org)
11
12
  [![Python >=3.10](https://img.shields.io/badge/python-%3E%3D3.10-blue)](https://python.org)
@@ -14,19 +15,103 @@ Works with **GitHub Copilot** (VS Code Agent mode), **Claude Code**, **Cursor**,
14
15
 
15
16
  ## What It Does
16
17
 
17
- Five Python MCP servers cover the full QA Software Test Life Cycle:
18
+ Five Python MCP servers cover the full QA Software Test Life Cycle. A sixth server (Playwright MCP) drives a real browser during code generation.
18
19
 
19
- | Agent | Input | Output |
20
- |---|---|---|
21
- | `qa-test-case-manager` | ADO PBI / Bug / Feature ID | Manual test cases created & linked via TestedBy-Forward |
22
- | `qa-gherkin-generator` | ADO or Jira Epic / Feature / PBI / Bug ID | `.feature` file attached to work item or saved locally |
23
- | `qa-playwright-generator` | Gherkin + live browser AX tree | `locators.ts` + page objects + step defs |
24
- | `qa-helix-writer` | Generated `.ts` files + `helix_root` | Files written to Helix-QA directory layout on disk |
25
- | `qa-jira-manager` | Jira Story / Bug / Task / Epic | Test cases in Jira + full pipeline to Helix-QA |
20
+ | Agent | Server name | Input | Output |
21
+ |---|---|---|---|
22
+ | Agent 1 | `qa-test-case-manager` | ADO PBI / Bug / Feature | Manual test cases created & linked (TestedBy-Forward), deduped on re-trigger |
23
+ | Agent 2 | `qa-gherkin-generator` | ADO Feature / PBI / Bug | `.feature` file validated and attached to work item |
24
+ | Agent 3 | `qa-playwright-generator` | Gherkin + optional AX-tree `context_map` | `locators.ts` + page objects + step defs (cached, retrieved via `get_generated_files`) |
25
+ | Agent 4 | `qa-helix-writer` | Generated `.ts` files + `helix_root` | Files written to Helix-QA directory layout, never overwrites |
26
+ | Agent 5 | `qa-jira-manager` | Jira Story / Bug / Task | Test cases created & linked in Jira, `.feature` attached to issue, deduped on re-trigger |
27
+
28
+ ---
29
+
30
+ ## End-to-End Flow
31
+
32
+ ### Webhook-triggered (headless)
33
+
34
+ A work item state change in ADO or Jira fires a webhook POST to the bridge. The bridge normalises the payload and routes it through the pipeline automatically.
35
+
36
+ ```
37
+ ADO Service Hook / Jira Webhook
38
+ │ POST
39
+
40
+ webhook_bridge/server.py (FastAPI — qa-stlc-serve)
41
+
42
+ ├─ parsers.py raw payload → normalised event dict
43
+ ├─ state_router.py event → STAGE_TEST_CASES | STAGE_FULL_PIPELINE | SKIP
44
+
45
+ └─ ci_runner/pipeline.py
46
+ ├─ run_test_cases() → Agent 1 (ADO) or Agent 5 (Jira)
47
+ └─ run_post_done() → Agent 2 → Agent 3 → Agent 4
48
+ ```
49
+
50
+ ### State → stage mapping
51
+
52
+ | Platform | Work item state | Stage | Agents called |
53
+ |---|---|---|---|
54
+ | ADO | `Approved` / `Committed` | `STAGE_TEST_CASES` | Agent 1 only |
55
+ | ADO | `Done` | `STAGE_FULL_PIPELINE` | Agents 2 → 3 → 4 |
56
+ | Jira | `In Progress` / `Selected for Development` | `STAGE_TEST_CASES` | Agent 5 only |
57
+ | Jira | `Done` | `STAGE_FULL_PIPELINE` | Agents 2 → 3 → 4 |
58
+
59
+ Any other state is silently dropped. State names are configurable in `state_router.py`.
60
+
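The table above amounts to a lookup from platform and state to a stage. The sketch below shows one way to express it; the names `STATE_TO_STAGE` and `route_event`, and the event keys `platform` and `state`, are assumptions for illustration, and the real `state_router.py` may be structured differently.

```python
STAGE_TEST_CASES = "STAGE_TEST_CASES"
STAGE_FULL_PIPELINE = "STAGE_FULL_PIPELINE"
SKIP = "SKIP"

# (platform, work item state) -> stage, mirroring the table above.
STATE_TO_STAGE = {
    ("ado", "Approved"): STAGE_TEST_CASES,
    ("ado", "Committed"): STAGE_TEST_CASES,
    ("ado", "Done"): STAGE_FULL_PIPELINE,
    ("jira", "In Progress"): STAGE_TEST_CASES,
    ("jira", "Selected for Development"): STAGE_TEST_CASES,
    ("jira", "Done"): STAGE_FULL_PIPELINE,
}


def route_event(event: dict) -> str:
    """Map a normalised event to a stage; unknown states are skipped."""
    key = (event.get("platform", "").lower(), event.get("state", ""))
    return STATE_TO_STAGE.get(key, SKIP)


print(route_event({"platform": "ado", "state": "Done"}))
print(route_event({"platform": "jira", "state": "In Review"}))
```

Keeping the mapping in a single dictionary makes the configurable state names easy to change without touching the routing logic.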
61
+ ### STAGE_TEST_CASES — Agent 1 / Agent 5
62
+
63
+ ```
64
+ fetch_work_item / fetch_jira_issue
65
+ ↓
66
+ [LLM] generate_test_cases() ← pipeline bridges fetch → LLM → create
67
+ ↓
68
+ create_deduped_test_cases() ← internally: get_linked → filter dupes → create_and_link
69
+ ```
70
+
71
+ On re-trigger, titles already linked are skipped (case-insensitive, stop-word normalised). Only net-new test cases are created.
72
+
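A minimal sketch of title-based deduplication, assuming normalisation means lower-casing, dropping punctuation, and removing a small stop-word list. The stop-word set and the function names are hypothetical; the package's actual normalisation rules are not shown in this diff.

```python
import re

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "that"}


def normalise_title(title: str) -> str:
    """Lower-case, keep alphanumeric words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def filter_net_new(candidates: list[str], linked: list[str]) -> list[str]:
    """Keep only candidates whose normalised title is not already linked."""
    seen = {normalise_title(t) for t in linked}
    net_new = []
    for title in candidates:
        key = normalise_title(title)
        if key not in seen:
            seen.add(key)  # also guards against duplicates within the batch
            net_new.append(title)
    return net_new


linked_titles = ["Verify the login page loads"]
candidate_titles = ["verify login page loads", "Verify logout clears the session"]
net_new = filter_net_new(candidate_titles, linked_titles)
print(net_new)
```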
73
+ ### STAGE_FULL_PIPELINE — Agents 2 → 3 → 4
26
74
 
27
- A sixth server **Playwright MCP** (`http://localhost:8931/mcp`) drives a real browser during code generation, replacing hand-authored locators with accessibility-tree-derived, zero-hallucination selectors.
75
+ First error stops the chain and surfaces in the result dict. Subsequent agents are not called.
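The stop-on-first-error behaviour can be sketched as a short-circuiting loop over stages. The shape of the result dict and the name `run_chain` are assumptions made for this example, not the pipeline's actual interface.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def run_chain(stages: list[tuple[str, Stage]], event: dict) -> dict:
    """Run stages in order; the first exception stops the chain."""
    result = {"completed": [], "error": None}
    payload = event
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            # Surface the failure and do not call any later stage.
            result["error"] = {"stage": name, "message": str(exc)}
            break
        result["completed"].append(name)
    return result


calls = []


def gherkin(payload: dict) -> dict:
    calls.append("agent2")
    return {**payload, "gherkin": "Feature: x"}


def playwright(payload: dict) -> dict:
    calls.append("agent3")
    raise RuntimeError("validation failed")


def helix(payload: dict) -> dict:
    calls.append("agent4")
    return payload


outcome = run_chain(
    [("agent2", gherkin), ("agent3", playwright), ("agent4", helix)],
    {"id": 1},
)
print(outcome)
```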
28
76
 
29
- > **ADO or Jira?** Both pipelines run the full STLC: fetch → analyse → test cases → Gherkin → Playwright → Helix-QA. ADO uses `qa-test-case-manager` as the entry point; Jira uses `qa-jira-manager`. Agents 2–4 (`qa-gherkin-generator`, `qa-playwright-generator`, `qa-helix-writer`) are shared by both pipelines.
77
+ **Agent 2 — Gherkin**
78
+
79
+ ```
80
+ ADO Feature : fetch_feature_hierarchy → [LLM] generate_gherkin
81
+ ADO PBI/Bug : fetch_work_item_for_gherkin → [LLM] generate_gherkin
82
+ Jira : fetch_jira_issue (via Agent 5) → [LLM] generate_gherkin
83
+
84
+ validate_gherkin_content() ← structural check before attach
85
+ ↓ (if invalid → pipeline stops, returns validation errors)
86
+ ADO Feature : attach_gherkin_to_feature()
87
+ ADO PBI/Bug : attach_gherkin_to_work_item()
88
+ Jira : attach_gherkin_to_issue() ← uploads via Jira attachment API
89
+ ```
90
+
91
+ The `generate_and_attach_gherkin` composite tool wraps validate + attach for CI/headless callers.
92
+
93
+ **Agent 3 — Playwright**
94
+
95
+ ```
96
+ generate_playwright_code(gherkin_content, page_class_name)
97
+ ↓ returns cache_key (files held in-memory)
98
+ get_generated_files(cache_key)
99
+ ↓ returns { "path/to/file.ts": "content", ... }
100
+ ```
101
+
102
+ `page_class_name` is derived from `event["title"]` (ADO) or `event["summary"]` (Jira) — camel-cased, max 4 words. Without a `context_map`, locators are Gherkin-inferred (stability=0). Pass `app_url` to embed a snapshot hint comment in `locators.ts`.
103
+
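One plausible reading of "camel-cased, max 4 words" is sketched below. It assumes upper camel case (as is usual for class names) and that non-alphanumeric characters act as word separators; `derive_page_class_name` is a hypothetical helper, and the real derivation may differ, for example by appending a suffix.

```python
import re


def derive_page_class_name(title: str, max_words: int = 4) -> str:
    """Take the first `max_words` alphanumeric words and join them in upper camel case."""
    words = re.findall(r"[A-Za-z0-9]+", title)[:max_words]
    return "".join(w[:1].upper() + w[1:].lower() for w in words)


name = derive_page_class_name("User can reset password via email link")
print(name)
```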
104
+ **Agent 4 — Helix writer**
105
+
106
+ ```
107
+ inspect_helix_project(helix_root)
108
+ ↓ framework_state: absent | partial | present
109
+ write_helix_files(helix_root, files, mode)
110
+ mode = "scaffold_and_tests" if absent or partial
111
+ mode = "tests_only" if present
112
+ ```
113
+
114
+ Agent 4 handles all deduplication and conflict renaming internally. No file is ever overwritten.
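The mode selection above, together with a never-overwrite policy, might look like the following sketch. The numeric-suffix renaming scheme and both function names are assumptions; the diff does not show how Agent 4 actually renames conflicting files.

```python
from pathlib import PurePosixPath


def select_mode(framework_state: str) -> str:
    """Choose the write mode from the inspected framework state."""
    if framework_state in ("absent", "partial"):
        return "scaffold_and_tests"
    if framework_state == "present":
        return "tests_only"
    raise ValueError(f"unknown framework_state: {framework_state!r}")


def non_conflicting_path(path: str, existing: set[str]) -> str:
    """Return `path`, or a suffixed variant, so that no existing file is overwritten."""
    if path not in existing:
        return path
    p = PurePosixPath(path)
    counter = 1
    while True:
        candidate = str(p.with_name(f"{p.stem}_{counter}{p.suffix}"))
        if candidate not in existing:
            return candidate
        counter += 1


existing_files = {"src/pages/LoginPage.ts"}
print(select_mode("partial"))
print(non_conflicting_path("src/pages/LoginPage.ts", existing_files))
```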
30
115
 
31
116
  ---
32
117
 
@@ -34,11 +119,9 @@ A sixth server — **Playwright MCP** (`http://localhost:8931/mcp`) — drives a
34
119
 
35
120
  ```bash
36
121
  # 1. Install the CLI + npm package globally
37
- # You will be prompted to choose your integration: ado / jira / both
122
+ # You will be prompted to choose an integration: ado / jira / both
38
123
  npm install -g @qa-gentic/stlc-agents
39
- ```
40
124
 
41
- ```bash
42
125
  # 2. Bootstrap your project
43
126
  qa-stlc init --vscode --integration ado # GitHub Copilot / VS Code — ADO
44
127
  qa-stlc init --vscode --integration jira # GitHub Copilot / VS Code — Jira
@@ -49,16 +132,17 @@ qa-stlc init --integration ado
49
132
  # 3. Scaffold a new Playwright + Cucumber + TypeScript QA project
50
133
  qa-stlc scaffold --name my-qa-project
51
134
 
52
- # 4. Start the Playwright browser server (required for code generation)
135
+ # 4. Start the Playwright browser server (required for live-locator generation)
53
136
  npx @playwright/mcp@latest --port 8931
54
137
  ```
55
138
 
56
- `qa-stlc init` does four things:
139
+ `qa-stlc init` does five things:
57
140
 
58
- 1. `pip install qa-gentic-stlc-agents` — installs all five Python MCP servers
141
+ 1. `pip install qa-gentic-stlc-agents` — installs all five Python MCP servers + rules files
59
142
  2. Copies skill files to `.github/copilot-instructions/` (and `.claude/` if not `--vscode`)
60
143
  3. Copies custom agent files to `.github/agents/`
61
- 4. Writes `.vscode/mcp.json` with all six servers configured
144
+ 4. Copies `ORCHESTRATION_RULES.md` to project root for reference during multi-step tasks
145
+ 5. Writes `.vscode/mcp.json` (or `.mcp.json`) with all six servers configured
62
146
 
63
147
  ---
64
148
 
@@ -78,12 +162,15 @@ npx @playwright/mcp@latest --port 8931
78
162
  ADO_ORGANIZATION_URL=https://dev.azure.com/your-org
79
163
  ADO_PROJECT_NAME=YourProject
80
164
  ADO_PAT=your-personal-access-token
81
- APP_BASE_URL=your-app-base-url
82
- APP_EMAIL=your-test-email@example.com
83
- APP_PASSWORD=your-test-password
165
+ APP_BASE_URL=https://your-app.example.com
166
+
167
+ # LLM — pick one provider:
168
+ AI_HEALING_PROVIDER=anthropic
169
+ AI_HEALING_API_KEY=sk-ant-...
170
+ # or: OPENAI_API_KEY / AZURE_OPENAI_API_KEY / GITHUB_TOKEN (Copilot)
84
171
  ```
85
172
 
86
- **Jira `.env` vars (additional):**
173
+ **Jira additional `.env` vars:**
87
174
 
88
175
  ```env
89
176
  JIRA_CLIENT_ID=your-atlassian-oauth-client-id
@@ -91,6 +178,14 @@ JIRA_CLIENT_SECRET=your-atlassian-oauth-client-secret
91
178
  JIRA_CLOUD_ID=your-atlassian-cloud-id
92
179
  ```
93
180
 
181
+ **Webhook bridge additional `.env` vars:**
182
+
183
+ ```env
184
+ WEBHOOK_SECRET=your-shared-secret
185
+ HELIX_PROJECT_ROOT=/path/to/helix-qa # where Agent 4 writes files
186
+ PLAYWRIGHT_MCP_URL=http://localhost:8931/mcp # leave blank to skip Agent 3
187
+ ```
188
+
94
189
  ---
95
190
 
96
191
  ## CLI Commands
@@ -101,56 +196,83 @@ JIRA_CLOUD_ID=your-atlassian-cloud-id
101
196
  | `qa-stlc scaffold [--name n] [--dir path]` | Copy full Playwright + Cucumber + TypeScript boilerplate to a new project |
102
197
  | `qa-stlc skills [--target claude\|vscode\|cursor\|windsurf]` | Copy skill files to the correct AI coding agent directory |
103
198
  | `qa-stlc mcp-config [--vscode] [--print]` | Write `.vscode/mcp.json` or `.mcp.json` with all servers configured |
104
- | `qa-stlc verify` | Check that all six MCP servers are reachable |
199
+ | `qa-stlc verify` | Check that all MCP servers are reachable and auth is cached |
200
+ | `qa-stlc-serve [--host] [--port] [--reload]` | Start the webhook bridge (FastAPI) |
201
+
202
+ ---
203
+
204
+ ## Orchestration Rules
205
+
206
+ Running `qa-stlc init` copies `ORCHESTRATION_RULES.md` to your project root. The file also ships inside both the npm (`node_modules/@qa-gentic/stlc-agents/`) and pip (`site-packages/stlc_agents/`) installations.
207
+
208
+ Refer to this file in multi-step QA workflows to ensure:
209
+ - **Step breakdown:** every task is decomposed into named steps with explicit inputs/outputs
210
+ - **Pre-flight gates:** checklist validation runs *before* generation, not after
211
+ - **No skipped steps:** intermediate data is structured and handed off explicitly, never inferred
212
+ - **Countable verification:** code generation steps state expected item counts before implementation
213
+ - **Zero stubs:** no partial outputs with TODOs or empty bodies passed downstream
214
+
215
+ Integration per AI coding agent:
216
+
217
+ - **GitHub Copilot (VS Code):** Add to `.github/copilot-instructions.md` or reference in your workspace prompt
218
+ - **Claude Code:** Reference in `.claude/instructions.md` or as a project document
219
+ - **Cursor:** Add to `.cursor/rules/orchestration.mdc` (scope: always)
220
+ - **Windsurf:** Reference in `.windsurf/rules.md` or global rules panel
221
+
222
+ See section 8 of `ORCHESTRATION_RULES.md` for agent-specific integration notes.
105
223
 
106
224
  ---
107
225
 
108
226
  ## Tool Reference
109
227
 
110
- ### qa-test-case-manager _(Azure DevOps)_
228
+ ### Agent 1 — `qa-test-case-manager` _(Azure DevOps)_
111
229
 
112
230
  | Tool | Description |
113
231
  |---|---|
114
- | `fetch_work_item` | Fetch a PBI, Bug, or Feature with acceptance criteria, coverage hints, and existing TC count. Returns `epic_not_supported` for Epics; `confirmation_required: true` for Features. |
115
- | `get_linked_test_cases` | List all test cases already linked to a work item (deduplication). |
116
- | `create_and_link_test_cases` | Create structured manual test cases in ADO and link via TestedBy-Forward. |
232
+ | `fetch_work_item` | Fetch a PBI, Bug, or Feature with acceptance criteria, coverage hints, and existing TC count. Returns `epic_not_supported` for Epics. |
233
+ | `get_linked_test_cases` | List all test cases linked via TestedBy-Forward (used by dedup). |
234
+ | `create_and_link_test_cases` | Create structured manual test cases and link to work item. Feature confirmation gate fires unless `confirmed=true`. |
235
+ | `create_deduped_test_cases` | **Headless/webhook tool.** Internally calls `get_linked_test_cases`, filters duplicates (normalised title match), then calls `create_and_link_test_cases` on net-new only. Safe to call on every re-trigger. |
117
236
 
118
- ### qa-gherkin-generator _(Azure DevOps)_
237
+ ### Agent 2 — `qa-gherkin-generator` _(Azure DevOps)_
119
238
 
120
239
  | Tool | Description |
121
240
  |---|---|
122
- | `fetch_feature_hierarchy` | Fetch a Feature and all child PBIs/Bugs with acceptance criteria and test case steps. |
123
- | `fetch_work_item_for_gherkin` | Fetch a PBI or Bug with parent Feature context and suggested file name. |
124
- | `attach_gherkin_to_feature` | Validate and attach a `.feature` file to a Feature work item. |
125
- | `attach_gherkin_to_work_item` | Validate and attach a `.feature` file to a PBI or Bug. |
126
- | `validate_gherkin_content` | Structural validation returns `valid: bool` + `errors` + `warnings`. |
241
+ | `fetch_feature_hierarchy` | Fetch a Feature + all child PBIs/Bugs + coverage hints. |
242
+ | `fetch_work_item_for_gherkin` | Fetch a single PBI or Bug with parent Feature context. |
243
+ | `validate_gherkin_content` | Structural check: @smoke/@regression tags, scenario count (5–10 feature / 3–9 work_item), every scenario has `When`, no duplicate titles. |
244
+ | `attach_gherkin_to_feature` | Validate + upload + link `.feature` to a Feature work item. |
245
+ | `attach_gherkin_to_work_item` | Validate + upload + link `.feature` to a PBI or Bug. |
246
+ | `generate_and_attach_gherkin` | **Headless/webhook composite.** Accepts pre-generated `gherkin_content`, validates, attaches. Returns `status: validation_failed` with errors if invalid — pipeline must re-generate. |
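The structural rules listed for `validate_gherkin_content` can be approximated with a few regular expressions. This sketch is not the tool's implementation: it assumes that at least one of `@smoke` or `@regression` must appear somewhere in the file, and the names `SCENARIO_LIMITS` and `validate_gherkin` are hypothetical.

```python
import re

# Allowed scenario counts per scope, taken from the table above.
SCENARIO_LIMITS = {"feature": (5, 10), "work_item": (3, 9)}


def validate_gherkin(content: str, scope: str = "work_item") -> list[str]:
    """Return a list of structural errors; an empty list means valid."""
    errors = []
    if not re.search(r"@(smoke|regression)\b", content):
        errors.append("no @smoke or @regression tag found")
    # Everything before the first scenario header is preamble and is dropped.
    blocks = re.split(r"(?m)^[ \t]*Scenario(?: Outline)?:", content)[1:]
    low, high = SCENARIO_LIMITS[scope]
    if not low <= len(blocks) <= high:
        errors.append(f"scenario count {len(blocks)} outside {low}-{high}")
    seen = set()
    for block in blocks:
        lines = block.splitlines()
        title = lines[0].strip() if lines else ""
        if title in seen:
            errors.append(f"duplicate scenario title: {title}")
        seen.add(title)
        if not any(line.strip().startswith("When ") for line in lines[1:]):
            errors.append(f"scenario has no When step: {title}")
    return errors


sample = """@regression
Feature: Login

  Scenario: Valid login
    Given the login page is open
    When the user submits valid credentials
    Then the dashboard is shown

  Scenario: Invalid login
    Given the login page is open
    When the user submits a wrong password
    Then an error is shown

  Scenario: Locked account
    Given the account is locked
    Then a lockout message is shown
"""
errors = validate_gherkin(sample)
print(errors)
```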
127
247
 
128
- ### qa-playwright-generator _(ADO + Jira)_
248
+ ### Agent 3 — `qa-playwright-generator` _(ADO + Jira)_
129
249
 
130
250
  | Tool | Description |
131
251
  |---|---|
132
- | `generate_playwright_code` | Generate `locators.ts`, `*Page.ts`, `*.steps.ts`, and `cucumber-profile.js` from Gherkin + live AX-tree `context_map`. Hard-blocks if `context_map` is absent. |
133
- | `scaffold_locator_repository` | Generate the five Helix-QA healing infrastructure files. |
252
+ | `generate_playwright_code` | Generate `locators.ts`, `*Page.ts`, `*.steps.ts`, `cucumber-profile.js`. Returns `cache_key`. Optional `context_map` for AX-tree-verified locators; optional `app_url` embeds snapshot hint. |
253
+ | `get_generated_files` | Retrieve full file content by `cache_key` from Agent 3's in-memory cache. |
254
+ | `scaffold_locator_repository` | Generate the five Helix-QA healing infrastructure files (`LocatorHealer`, `TimingHealer`, `VisualIntentChecker`, `LocatorRepository`, `HealingDashboard`). Call once per project. |
134
255
  | `validate_gherkin_steps` | Check for duplicate step strings and missing `When` steps. |
135
- | `attach_code_to_work_item` | Attach delta Playwright TypeScript files to an ADO work item. |
256
+ | `attach_code_to_work_item` | Attach generated TypeScript delta files to an ADO work item. |
136
257
 
137
- ### qa-helix-writer _(ADO + Jira)_
258
+ ### Agent 4 — `qa-helix-writer` _(ADO + Jira)_
138
259
 
139
260
  | Tool | Description |
140
261
  |---|---|
141
- | `inspect_helix_project` | Returns `framework_state` (`present` / `partial` / `absent`) and `recommendation`. |
262
+ | `inspect_helix_project` | Returns `framework_state`: `present` / `partial` / `absent`. Drives `mode` selection in pipeline. |
263
+ | `write_helix_files` | Write files to Helix-QA layout. `mode=scaffold_and_tests` for new/partial projects; `mode=tests_only` for existing. Never overwrites; deduplicates and conflict-renames. |
142
264
  | `list_helix_tree` | Full directory listing of a Helix-QA project. |
143
265
  | `read_helix_file` | Read an existing file for overlap detection. |
144
- | `write_helix_files` | Write generated files to the correct Helix-QA paths. |
145
266
 
146
- ### qa-jira-manager _(Jira Cloud)_
267
+ ### Agent 5 — `qa-jira-manager` _(Jira Cloud)_
147
268
 
148
269
  | Tool | Description |
149
270
  |---|---|
150
271
  | `fetch_jira_issue` | Fetch a Story, Bug, or Task with acceptance criteria and coverage hints. Returns `epic_use_hierarchy` for Epics. |
151
- | `fetch_jira_epic_hierarchy` | Fetch an Epic with all child issues (Stories, Tasks, Bugs, Sub-tasks). |
152
- | `create_and_link_test_cases` | Create test case issues in Jira (type `Test`, falls back to `Task`) and link via `is tested by`. Steps stored as ADF table — no Xray required. |
153
- | `get_linked_test_cases` | List all issues linked via `is tested by` / `Test` link type (deduplication). |
272
+ | `get_linked_test_cases` | List all issues linked via `is tested by` / `Test` link type (used by dedup). |
273
+ | `create_and_link_test_cases` | Create test case issues in Jira (type `Test`, falls back to `Task`) and link. Steps stored as ADF table — no Xray required. Epic confirmation gate fires unless `confirmed=true`. |
274
+ | `create_deduped_test_cases` | **Headless/webhook tool.** Internally calls `get_linked_test_cases`, filters duplicates (normalised summary match), then calls `create_and_link_test_cases` on net-new only. Safe on every re-trigger. |
275
+ | `attach_gherkin_to_issue` | Upload a `.feature` file as an attachment to a Jira issue via the Jira attachment API. File is named `{issue_key}_{summary_kebab}_regression.feature`. |
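
The "normalised summary match" that `create_deduped_test_cases` is described as performing might look roughly like the sketch below. The normalisation rule (lower-case, punctuation collapsed to spaces) and the `summary` dictionary key are assumptions made for illustration; the package's real rule is not shown in this diff.

```python
import re


def normalise_summary(summary: str) -> str:
    """Lower-case and collapse every non-alphanumeric run to one space (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", " ", summary.lower()).strip()


def filter_net_new(candidates: list[dict], linked: list[dict]) -> list[dict]:
    """Return the candidates whose normalised summary is not already linked.

    `candidates` and `linked` are hypothetical lists of {"summary": ...} dicts
    standing in for generated test cases and get_linked_test_cases results.
    """
    seen = {normalise_summary(tc["summary"]) for tc in linked}
    net_new = []
    for tc in candidates:
        key = normalise_summary(tc["summary"])
        if key not in seen:
            seen.add(key)  # also drops duplicates inside the candidate batch
            net_new.append(tc)
    return net_new
```

Because the filter depends only on what is already linked, running it again after a successful create returns an empty list, which is the property that makes a webhook re-trigger safe.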
154
276
 
155
277
  ---
156
278
 
@@ -166,25 +288,84 @@ Healed selectors persist in `LocatorRepository`. All AI suggestions require huma
166
288
 
167
289
  ---
168
290
 
169
- ## Workflow Comparison: ADO vs Jira
291
+ ## Webhook / Auto-Trigger Setup
292
+
293
+ ### Local dev
294
+
295
+ ```bash
296
+ # 1. Install webhook extras
297
+ pip install "qa-gentic-stlc-agents[webhook]"
298
+
299
+ # 2. Start the bridge
300
+ make serve
301
+
302
+ # 3. Expose via ngrok
303
+ ngrok http 8080
304
+
305
+ # 4. Register hooks
306
+ make register-ado-hooks BRIDGE_URL=https://xxxx.ngrok.io
307
+ make register-jira-hooks BRIDGE_URL=https://xxxx.ngrok.io
308
+ ```
309
+
310
+ ### Azure Functions deploy
311
+
312
+ ```bash
313
+ make deploy-azure FUNC_APP=stlc-webhook-bridge
314
+ make register-ado-hooks BRIDGE_URL=https://stlc-webhook-bridge.azurewebsites.net/api
315
+ make register-jira-hooks BRIDGE_URL=https://stlc-webhook-bridge.azurewebsites.net/api
316
+ ```
317
+
318
+ ### Docker
319
+
320
+ ```bash
321
+ make docker-build-webhook
322
+ make docker-run-webhook
323
+ ```
324
+
325
+ ### LLM provider selection
326
+
327
+ | Provider | Env var(s) | Default model |
328
+ |---|---|---|
329
+ | `anthropic` | `AI_HEALING_API_KEY` or `ANTHROPIC_API_KEY` | `claude-haiku-4-5-20251001` |
330
+ | `copilot` | `GITHUB_TOKEN` | `gpt-4o` |
331
+ | `openai` | `OPENAI_API_KEY` | `gpt-4o-mini` |
332
+ | `azure-openai` | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` | `gpt-4o-mini` |
333
+ | `ollama` | `OLLAMA_HOST` (optional) | `llama3.2` |
334
+
335
+ Override model at any time: `LLM_MODEL=claude-sonnet-4-6`
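
As a rough sketch of how the table and the `LLM_MODEL` override fit together, the lookup below copies the defaults and env var names from the table. How the bridge actually selects a provider is not shown in this diff, so the provider is passed in explicitly; `resolve_model` and `missing_env` are hypothetical helpers.

```python
import os
from typing import Mapping, Optional

# Default model per provider, copied from the table above.
DEFAULT_MODELS = {
    "anthropic": "claude-haiku-4-5-20251001",
    "copilot": "gpt-4o",
    "openai": "gpt-4o-mini",
    "azure-openai": "gpt-4o-mini",
    "ollama": "llama3.2",
}

# Each inner tuple lists alternatives; any one of them satisfies the requirement.
REQUIRED_ENV = {
    "anthropic": [("AI_HEALING_API_KEY", "ANTHROPIC_API_KEY")],
    "copilot": [("GITHUB_TOKEN",)],
    "openai": [("OPENAI_API_KEY",)],
    "azure-openai": [("AZURE_OPENAI_API_KEY",), ("AZURE_OPENAI_ENDPOINT",)],
    "ollama": [],  # OLLAMA_HOST is optional
}


def resolve_model(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return LLM_MODEL if set, otherwise the provider's default model."""
    env = os.environ if env is None else env
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    return env.get("LLM_MODEL") or DEFAULT_MODELS[provider]


def missing_env(provider: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """List the required variables (or alternatives) that are not set."""
    env = os.environ if env is None else env
    missing = []
    for alternatives in REQUIRED_ENV[provider]:
        if not any(env.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

Passing `env` as a plain mapping keeps the lookup testable without touching the real process environment.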
336
+
337
+ ### Token cost (headless, Haiku default)
338
+
339
+ | Stage | Tokens / work item | Approx cost |
340
+ |---|---|---|
341
+ | Test case generation (state → Approved) | ~8,000–15,000 | $0.01–$0.03 |
342
+ | Gherkin generation (state → Done) | ~10,000–18,000 | $0.01–$0.04 |
343
+ | Playwright TS generation (state → Done) | ~20,000–35,000 | $0.03–$0.07 |
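
The cost column is token count multiplied by a per-token rate. The sketch below shows that arithmetic with rates supplied by the caller; the example rates are placeholders assumed for illustration, not figures taken from this package or from any provider's price list.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Return the estimated spend in USD for one work item.

    Rates are USD per million tokens and must be supplied by the caller;
    nothing here is tied to a specific provider's pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    return total / 1_000_000


# Placeholder rates, assumed for illustration only.
cost = estimate_cost_usd(10_000, 2_000, input_usd_per_mtok=1.0, output_usd_per_mtok=5.0)
print(f"${cost:.2f}")
```

Input and output tokens are kept separate because providers typically price them differently, so a single blended rate would hide most of the variation between stages.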
344
+
345
+ ---
346
+
347
+ ## ADO vs Jira — Side by Side
170
348
 
171
349
  | Step | Azure DevOps | Jira Cloud |
172
350
  |---|---|---|
351
+ | Trigger | Service Hook: `workitem.updated` | Webhook: `jira:issue_updated` |
173
352
  | Fetch issue | `fetch_work_item` | `fetch_jira_issue` |
174
- | Fetch Epic hierarchy | Not supported — warn user | `fetch_jira_epic_hierarchy` |
175
- | Check for duplicates | `get_linked_test_cases` | `get_linked_test_cases` |
176
- | Create & link test cases | `create_and_link_test_cases` | `create_and_link_test_cases` |
177
- | Generate Gherkin | `fetch_work_item_for_gherkin` `attach_gherkin_to_work_item` | `qa-gherkin-generator` with Jira AC |
178
- | Generate Playwright | `generate_playwright_code` (shared) | `generate_playwright_code` (shared) |
353
+ | Check duplicates | `get_linked_test_cases` | `get_linked_test_cases` |
354
+ | Create test cases (interactive) | `create_and_link_test_cases` | `create_and_link_test_cases` |
355
+ | Create test cases (headless) | `create_deduped_test_cases` | `create_deduped_test_cases` |
356
+ | Gherkin attach | `attach_gherkin_to_feature` / `attach_gherkin_to_work_item` | `attach_gherkin_to_issue` |
357
+ | Headless Gherkin composite | `generate_and_attach_gherkin` | `generate_and_attach_gherkin` + `attach_gherkin_to_issue` |
358
+ | Playwright generation | `generate_playwright_code` (shared) | `generate_playwright_code` (shared) |
179
359
  | Write to disk | `write_helix_files` (shared) | `write_helix_files` (shared) |
180
- | Authentication | MSAL silent + browser (`~/.msal-cache/`) | OAuth 2.0 (3LO) + browser (`~/.jira-cache/`) |
181
- | Link relation | `TestedBy-Forward` | `is tested by` / `Test` link type |
360
+ | Link relation | `TestedBy-Forward` | `is tested by` / `Test` |
361
+ | Auth | MSAL silent + browser (`~/.msal-cache/`) | OAuth 2.0 3LO + browser (`~/.jira-cache/`) |
182
362
 
183
363
  ---
184
364
 
185
365
  ## Run Tests
186
366
 
187
367
  ```bash
368
+ # Full suite
188
369
  ENABLE_SELF_HEALING=true \
189
370
  HEALING_DASHBOARD_PORT=7890 \
190
371
  APP_BASE_URL=<your-app-base-url> \
@@ -204,8 +385,7 @@ cucumber-js --config=config/cucumber.js -p <feature_profile> --tags "@smoke"
204
385
  - [ARCHITECTURE-JIRA.md](ARCHITECTURE-JIRA.md) — Full technical architecture, Jira pipeline
205
386
  - [WALKTHROUGH-ADO.md](WALKTHROUGH-ADO.md) — End-to-end walkthrough, ADO pipeline
206
387
  - [WALKTHROUGH-JIRA.md](WALKTHROUGH-JIRA.md) — End-to-end walkthrough, Jira pipeline
207
- - [PEER-DEV-PRESENTATION.md](PEER-DEV-PRESENTATION.md) — Developer team overview
208
- - [MANAGEMENT-ROI.md](MANAGEMENT-ROI.md) — ROI, quality impact, and cost analysis
388
+ - [WEBHOOK.md](WEBHOOK.md) — Webhook bridge setup, deployment, and state trigger customisation
209
389
 
210
390
  ---
211
391
 
@@ -52,7 +52,15 @@ const info = (s) => console.log(`${C.cyan}→${C.reset} ${s}`);
52
52
  const warn = (s) => console.log(`${C.yellow}⚠${C.reset} ${s}`);
53
53
  const d = (s) => `${C.dim}${s}${C.reset}`;
54
54
 
55
- console.log(`\n${b("QA STLC Agents")} v${pkg.version} — post-install\n`);
55
+ console.log(`
56
+ ${b("QA STLC Agents")} v${pkg.version} — post-install
57
+
58
+ ${d("This npm package includes:")}
59
+ • Five Python MCP servers for Azure DevOps + Jira Cloud
60
+ • Skill files for AI coding agents (Claude Code, Copilot, Cursor, Windsurf)
61
+ • ORCHESTRATION_RULES.md — reference guide for multi-step QA workflows
62
+ • Command-line tools: qa-stlc init, qa-stlc scaffold, qa-stlc skills, etc.
63
+ `);
56
64
 
57
65
  // ── 1. Find Python ────────────────────────────────────────────────────────────
58
66
  const pythonCandidates = ["python3", "python"];
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@qa-gentic/stlc-agents",
3
- "version": "1.0.17",
3
+ "version": "1.0.18",
4
4
  "description": "QA STLC Agents — five MCP servers + skills for AI-powered test case, Gherkin, Playwright generation, and Helix-QA file writing against Azure DevOps and Jira Cloud. Full pipeline for both: fetch → test cases → Gherkin → Playwright → Helix-QA. Works with Claude Code, GitHub Copilot, Cursor, Windsurf.",
5
5
  "keywords": [
6
6
  "playwright",
@@ -37,11 +37,24 @@
37
37
  "src/",
38
38
  "skills/",
39
39
  ".github/agents/",
40
- "README.md"
40
+ "README.md",
41
+ "ORCHESTRATION_RULES.md"
41
42
  ],
42
43
  "scripts": {
43
44
  "postinstall": "node ./bin/postinstall.js"
44
45
  },
46
+ "_comment": "Diff to apply to package.json — add the cost command to qa-stlc.js",
47
+
48
+ "diff": {
49
+ "bin/qa-stlc.js": {
50
+ "add_require": "const cmdCost = require('../src/cli/cmd-cost');",
51
+ "add_command": {
52
+ "after": "// ── scaffold (or any existing last command) ──────────────────────────────",
53
+ "insert": "\n// ── cost ─────────────────────────────────────────────────────────────────\nprogram\n .command('cost')\n .description('Show token usage and cost for the current or past sessions.\\n' +\n 'Reads session logs from ~/.qa-stlc/cost-*.jsonl\\n' +\n 'Written automatically by the MCP servers on every tool call.')\n .option('--all', 'Show all sessions (default: last session only)')\n .option('--session <id>', 'Show a specific session by ID')\n .option('--json', 'Output raw JSON')\n .action(cmdCost);\n"
54
+ }
55
+ }
56
+ },
57
+ "full_command_to_add_to_qa-stlc.js": "// ── cost ─────────────────────────────────────────────────────────────────\nprogram\n .command('cost')\n .description(\n 'Show token usage and cost for the current or past pipeline sessions.\\n' +\n 'Reads logs from ~/.qa-stlc/cost-*.jsonl written by the MCP servers.\\n' +\n 'Each MCP tool call logs tokens, cost, and latency automatically.'\n )\n .option('--all', 'Show all sessions (not just the last one)')\n .option('--session <id>', 'Show a specific session by its ID')\n .option('--json', 'Emit raw JSON instead of a formatted table')\n .action(cmdCost);",
45
58
  "dependencies": {
46
59
  "commander": "^12.0.0",
47
60
  "which": "^4.0.0"
@@ -1,7 +1,7 @@
1
1
  /**
2
2
  * cmd-init.js — `qa-stlc init`
3
3
  *
4
- * Full bootstrap: install Python agents + skills + MCP config.
4
+ * Full bootstrap: install Python agents + skills + ORCHESTRATION_RULES.md + MCP config.
5
5
  * Accepts --integration <ado|jira|both>. When omitted, reads
6
6
  * ~/.qa-stlc/integration (written by postinstall) or prompts interactively.
7
7
  */
@@ -89,7 +89,23 @@ module.exports = async function init(opts) {
89
89
  }
90
90
  ok("qa-gentic-stlc-agents installed.");
91
91
 
92
- // ── 4. Install skills ─────────────────────────────────────────────────────
92
+ // ── 4. Copy ORCHESTRATION_RULES.md to project root ─────────────────────────
93
+ info("Installing ORCHESTRATION_RULES.md to project root…");
94
+ try {
95
+ const npmPkgDir = path.dirname(require.resolve("@qa-gentic/stlc-agents/package.json"));
96
+ const srcRules = path.join(npmPkgDir, "ORCHESTRATION_RULES.md");
97
+ const destRules = path.join(process.cwd(), "ORCHESTRATION_RULES.md");
98
+ if (fs.existsSync(srcRules)) {
99
+ fs.copyFileSync(srcRules, destRules);
100
+ ok("ORCHESTRATION_RULES.md copied to project root.");
101
+ } else {
102
+ warn("ORCHESTRATION_RULES.md not found in npm package (expected in development).");
103
+ }
104
+ } catch (e) {
105
+ // Silently skip if not available (common in dev environments before full build)
106
+ }
107
+
108
+ // ── 5. Install skills ─────────────────────────────────────────────────────
93
109
  info("Installing skills…");
94
110
  const skillTarget = opts.vscode ? "vscode" : "claude";
95
111
  await cmdSkills({ target: skillTarget, integration });
@@ -132,6 +148,7 @@ ${C.bold}Setup complete.${C.reset}
132
148
  ${C.dim}Integration:${C.reset} ${C.bold}${integration}${C.reset}
133
149
  ${C.dim}MCP config :${C.reset} ${mcpLocation}
134
150
  ${C.dim}Skills :${C.reset} ${skillsLocation}
151
+ ${C.dim}Rules :${C.reset} ORCHESTRATION_RULES.md ${C.dim}(project root — reference for multi-step workflows)${C.reset}
135
152
 
136
153
  ${C.bold}Start Playwright MCP${C.reset} ${C.dim}(keep running in a separate terminal):${C.reset}
137
154
 
@@ -169,7 +169,10 @@ function buildClaudeConfig(pythonBin, playwrightPort, integration) {
169
169
  }
170
170
  }
171
171
 
172
- servers["playwright"] = { type: "url", url: `ws://localhost:${playwrightPort}` };
172
+ servers["playwright"] = {
173
+ command: "npx",
174
+ args: ["@playwright/mcp@latest", "--isolated"],
175
+ };
173
176
 
174
177
  return { config: { mcpServers: servers }, missing };
175
178
  }
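
For orientation, the `playwright` entry these builders now emit has roughly the shape sketched below. Python is used here only to render the JSON. The field values are copied from this hunk and the VS Code hunk that follows; `playwright_entry` is a hypothetical helper, not part of the CLI.

```python
import json


def playwright_entry(vscode: bool) -> dict:
    """Build the stdio launch entry for the Playwright MCP server."""
    entry = {"command": "npx", "args": ["@playwright/mcp@latest", "--isolated"]}
    if vscode:
        # The VS Code variant declares the transport explicitly.
        entry = {"type": "stdio", **entry}
    return entry


# Claude config nests servers under "mcpServers"; VS Code uses "servers".
claude_config = {"mcpServers": {"playwright": playwright_entry(vscode=False)}}
vscode_config = {"servers": {"playwright": playwright_entry(vscode=True)}}

print(json.dumps(claude_config, indent=2))
print(json.dumps(vscode_config, indent=2))
```

In both variants the client launches the server itself through `npx`, which is why the earlier `ws://` and `http://` URLs, and the separate terminal they required, are no longer part of the generated config.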
@@ -190,14 +193,7 @@ function buildVscodeConfig(pythonBin, playwrightPort, integration) {
190
193
  type: "stdio",
191
194
  command: bin,
192
195
  args: [],
193
- env: {
194
- // Azure DevOps auth passthrough (original)
195
- "AZURE_TENANT_ID": "${env:AZURE_TENANT_ID}",
196
- "AZURE_CLIENT_ID": "${env:AZURE_CLIENT_ID}",
197
- "AZURE_CLIENT_SECRET": "${env:AZURE_CLIENT_SECRET}",
198
- // Cost tracking passthrough (new)
199
- ...COST_ENV,
200
- },
196
+ env: { ...COST_ENV },
201
197
  };
202
198
  } else {
203
199
  missing.push(name);
@@ -236,8 +232,9 @@ function buildVscodeConfig(pythonBin, playwrightPort, integration) {
236
232
  }
237
233
 
238
234
  servers["playwright"] = {
239
- type: "http",
240
- url: `http://localhost:${playwrightPort}/mcp`,
235
+ type: "stdio",
236
+ command: "npx",
237
+ args: ["@playwright/mcp@latest", "--isolated"],
241
238
  };
242
239
 
243
240
  return { config: { servers }, missing };
@@ -246,9 +243,8 @@ function buildVscodeConfig(pythonBin, playwrightPort, integration) {
246
243
  function printNextSteps(mode, playwrightPort) {
247
244
  const isVscode = mode === "vscode";
248
245
  console.log(`
249
- ${C.dim}Start Playwright MCP before running generation workflows:${C.reset}
250
- npx @playwright/mcp@latest --port ${playwrightPort}
251
- ${C.dim}headless (CI): npx @playwright/mcp@latest --headless --port ${playwrightPort}${C.reset}
246
+ ${C.dim}Playwright MCP is auto-started by the MCP framework (--isolated, no manual start needed).
247
+ For CI/headless: set PLAYWRIGHT_MCP_URL or start manually with --headless --isolated --port ${playwrightPort}${C.reset}
252
248
 
253
249
  ${isVscode
254
250
  ? `Reload VS Code window — all MCP servers will appear in the MCP panel.`
@@ -9,6 +9,8 @@
9
9
  * windsurf → .windsurf/rules/
10
10
  * both → claude + vscode
11
11
  * print → stdout only
12
+ *
13
+ * Also installs ORCHESTRATION_RULES.md to project root (multi-step workflow reference).
12
14
  */
13
15
  "use strict";
14
16
 
@@ -35,10 +37,11 @@ function readIntegrationPrefSk() {
35
37
  }
36
38
 
37
39
  // Resolve the skills bundled with this npm package
38
- const PKG_ROOT = path.resolve(__dirname, "../..");
39
- const SKILLS_DIR = path.join(PKG_ROOT, "skills");
40
- const BEHAVIOR_MD = path.join(SKILLS_DIR, "AGENT-BEHAVIOR.md");
41
- const AGENTS_DIR = path.join(PKG_ROOT, ".github", "agents");
40
+ const PKG_ROOT = path.resolve(__dirname, "../..");
41
+ const SKILLS_DIR = path.join(PKG_ROOT, "skills");
42
+ const BEHAVIOR_MD = path.join(SKILLS_DIR, "AGENT-BEHAVIOR.md");
43
+ const ORCHESTRATION_MD = path.join(PKG_ROOT, "ORCHESTRATION_RULES.md");
44
+ const AGENTS_DIR = path.join(PKG_ROOT, ".github", "agents");
42
45
 
43
46
  /** Copy a file, creating parent dirs as needed. */
44
47
  function cp(src, dest) {
@@ -105,6 +108,15 @@ function installAgents(integration) {
105
108
  agentFiles(integration).forEach((f) => info(path.basename(f)));
106
109
  }
107
110
 
111
+ /** Install ORCHESTRATION_RULES.md to project root for multi-step workflow reference. */
112
+ function installOrchestrationRules() {
113
+ if (!fs.existsSync(ORCHESTRATION_MD)) return;
114
+ const dest = path.join(CWD, "ORCHESTRATION_RULES.md");
115
+ cp(ORCHESTRATION_MD, dest);
116
+ ok(`ORCHESTRATION_RULES.md installed → project root`);
117
+ info("Reference this file for multi-step QA workflow best practices");
118
+ }
119
+
108
120
  function installClaude(integration) {
109
121
  const dest = path.join(CWD, ".claude", "skills");
110
122
  // Copy entire skill directory (preserves references/ subdirectory)
@@ -118,6 +130,7 @@ function installClaude(integration) {
118
130
  info("AGENT-BEHAVIOR.md → .claude/AGENT-BEHAVIOR.md");
119
131
  skillEntries(integration).forEach((e) => info(`${e.name}/SKILL.md`));
120
132
  installAgents(integration);
133
+ installOrchestrationRules();
121
134
  printPlaywrightHint();
122
135
  }
123
136
 
@@ -134,6 +147,7 @@ function installVscode(integration) {
134
147
  info("AGENT-BEHAVIOR.md → .github/copilot-instructions/AGENT-BEHAVIOR.md");
135
148
  skillEntries(integration).forEach((e) => info(`${e.name}/SKILL.md`));
136
149
  installAgents(integration);
150
+ installOrchestrationRules();
137
151
  printPlaywrightHint();
138
152
  }
139
153
 
@@ -146,6 +160,7 @@ function installCursor(integration) {
146
160
  cp(BEHAVIOR_MD, path.join(dest, "AGENT-BEHAVIOR.md"));
147
161
  ok(`Skills installed → .cursor/rules/`);
148
162
  installAgents(integration);
163
+ installOrchestrationRules();
149
164
  printPlaywrightHint();
150
165
  }
151
166
 
@@ -158,6 +173,7 @@ function installWindsurf(integration) {
158
173
  cp(BEHAVIOR_MD, path.join(dest, "AGENT-BEHAVIOR.md"));
159
174
  ok(`Skills installed → .windsurf/rules/`);
160
175
  installAgents(integration);
176
+ installOrchestrationRules();
161
177
  printPlaywrightHint();
162
178
  }
163
179
 
@@ -165,6 +181,7 @@ function printSkills() {
165
181
  console.log("\nAvailable skills:\n");
166
182
  skillEntries().forEach((e) => console.log(` ${e.name}/SKILL.md`));
167
183
  console.log(` AGENT-BEHAVIOR.md`);
184
+ console.log(` ORCHESTRATION_RULES.md (workflow reference)`);
168
185
  }
169
186
 
170
187
  /** Recursively copy a directory tree. */
@@ -0,0 +1,154 @@
1
+ """
2
+ install_hook.py — stlc_agents.shared.install_hook
3
+ ─────────────────────────────────────────────────────
4
+ Called automatically by postinstall.js after `pip install qa-gentic-stlc-agents`.
5
+ Also callable manually: python -m stlc_agents.shared.install_hook
6
+
7
+ Applies the cost tracking patch to all 5 MCP server files by importing
8
+ and running the same logic as scripts/apply_cost_tracking.py, but resolved
9
+ relative to the installed package location (works in site-packages, .venv, etc.).
10
+ """
11
+
12
+ from __future__ import annotations
13
+ import re
14
+ import sys
15
+ from pathlib import Path
16
+
17
+
18
+ SERVERS = [
19
+ ("agent_gherkin_generator", "qa-gherkin-generator"),
20
+ ("agent_test_case_manager", "qa-test-case-manager"),
21
+ ("agent_playwright_generator", "qa-playwright-generator"),
22
+ ("agent_helix_writer", "qa-helix-writer"),
23
+ ("agent_jira_manager", "qa-jira-manager"),
24
+ ]
25
+
26
+ IMPORT_MARKER = "from stlc_agents.shared.cost_tracker import track"
27
+ TIME_IMPORT = "import time"
28
+
29
+ OLD_RETURN = (
30
+ 'return [types.TextContent(type="text", '
31
+ 'text=json.dumps(result, indent=2, ensure_ascii=False))]'
32
+ )
33
+ NEW_RETURN = "return track(result, tool_name=name, server={server!r}, t0=t0)"
34
+
35
+ OLD_ERR_BLOCK = """\
36
+ return [types.TextContent(
37
+ type="text",
38
+ text=json.dumps({"error": str(exc), "tool": name}, indent=2),
39
+ )]"""
40
+
41
+ NEW_ERR_BLOCK = """\
42
+ err_result = {{"error": str(exc), "tool": name}}
43
+ return track(err_result, tool_name=name, server={server!r}, t0=t0)"""
44
+
45
+
46
+ def _root() -> Path:
47
+ """Resolve the stlc_agents package root regardless of install method."""
48
+ import stlc_agents
49
+ return Path(stlc_agents.__file__).parent
50
+
51
+
52
+ def patch_server(agent_dir: str, server_name: str, root: Path) -> str:
53
+ """Patch one server file. Returns 'patched' | 'already_patched' | 'not_found' | 'no_change'."""
54
+ path = root / agent_dir / "server.py"
55
+ if not path.exists():
56
+ return "not_found"
57
+
58
+ src = path.read_text(encoding="utf-8")
59
+ if IMPORT_MARKER in src:
60
+ return "already_patched"
61
+
62
+ original = src
63
+
64
+ # 1. import time
65
+ if TIME_IMPORT not in src:
66
+ src = src.replace("import sys\n", "import sys\nimport time\n", 1)
67
+
68
+ # 2. cost_tracker import — after last `from stlc_agents...` line
69
+ last_match = None
70
+ for m in re.finditer(r"^from stlc_agents\..+\n", src, re.MULTILINE):
71
+ last_match = m
72
+ if last_match:
73
+ pos = last_match.end()
74
+ src = src[:pos] + "from stlc_agents.shared.cost_tracker import track\n" + src[pos:]
75
+ else:
76
+ src = src.replace(
77
+ "from dotenv import load_dotenv\n",
78
+ "from dotenv import load_dotenv\nfrom stlc_agents.shared.cost_tracker import track\n",
79
+ 1,
80
+ )
81
+
82
+ # 3. t0 = time.monotonic() inside call_tool()
83
+ src = src.replace(
84
+ "@app.call_tool()\nasync def call_tool(name: str, arguments: dict)"
85
+ " -> list[types.TextContent]:\n try:\n",
86
+ "@app.call_tool()\nasync def call_tool(name: str, arguments: dict)"
87
+ " -> list[types.TextContent]:\n t0 = time.monotonic()\n try:\n",
88
+ 1,
89
+ )
90
+
91
+ # 4. Replace all result return lines
92
+ new_ret = NEW_RETURN.format(server=server_name)
93
+ for indent in (" ", " ", " "):
94
+ src = src.replace(f"{indent}{OLD_RETURN}", f"{indent}{new_ret}")
95
+
96
+ # 5. Replace error-path block
97
+ src = src.replace(OLD_ERR_BLOCK, NEW_ERR_BLOCK.format(server=server_name))
98
+
99
+ if src == original:
100
+ return "no_change"
101
+
102
+ path.write_text(src, encoding="utf-8")
103
+ return "patched"
104
+
105
+
106
+ def apply_cost_tracking() -> None:
107
+ """Entry point — called by postinstall.js and the console_script."""
108
+ root = _root()
109
+
110
+ ok = "\x1b[32m✓\x1b[0m"
111
+ skip = "\x1b[33m–\x1b[0m"
112
+ err = "\x1b[31m✗\x1b[0m"
113
+
114
+ print("\n stlc-agents · Activating cost tracking on MCP servers...\n")
115
+
116
+ any_patched = False
117
+ for agent_dir, server_name in SERVERS:
118
+ status = patch_server(agent_dir, server_name, root)
119
+ if status == "patched":
120
+ print(f" {ok} {agent_dir} ({server_name})")
121
+ any_patched = True
122
+ elif status == "already_patched":
123
+ print(f" {skip} {agent_dir} — already active")
124
+ elif status == "not_found":
125
+ print(f" {err} {agent_dir}/server.py — not found (skip)")
126
+ elif status == "no_change":
127
+ print(f" {skip} {agent_dir} — no matching pattern (manual patch needed)")
128
+
129
+ print()
130
+ if any_patched:
131
+ print(" Cost tracking is now active. On every MCP tool call you will see:")
132
+ print(" [stlc-cost] <server> · <tool> ~<N>K tokens $<cost> (session: $<total>)")
133
+ print()
134
+ print(" Session logs: ~/.qa-stlc/cost-<session-id>.jsonl")
135
+ print(" View report: qa-stlc cost")
136
+ print(" View all: qa-stlc cost --all")
137
+ print()
138
+ print(" Environment variables:")
139
+ print(" STLC_COST_TRACKING=false disable output")
140
+ print(" STLC_CODING_AGENT_MODEL=<model> set your agent's model for exact pricing")
141
+ print(" e.g. claude-sonnet-4-6 | claude-opus-4-6 | gpt-4o")
142
+ print(" STLC_COST_LOG_DIR=<path> change log directory")
143
+ else:
144
+ print(" No servers were patched: each was already active, not found, or needs a manual patch.")
145
+
146
+ print()
147
+
148
+
149
+ def main() -> None:
150
+ apply_cost_tracking()
151
+
152
+
153
+ if __name__ == "__main__":
154
+ main()
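
The import-insertion step of `patch_server` can be tried in isolation on an in-memory string. The sketch below mirrors steps 2 and the `already_patched` guard only; the sample module names are hypothetical, and the `dotenv` fallback is deliberately left out.

```python
import re

IMPORT_LINE = "from stlc_agents.shared.cost_tracker import track\n"


def add_tracker_import(src: str) -> str:
    """Insert the cost_tracker import after the last `from stlc_agents...` line.

    Returns the source unchanged when the import is already present or when
    no anchor line exists (the real hook falls back to a dotenv anchor).
    """
    if IMPORT_LINE.strip() in src:
        return src  # already patched, so the operation is idempotent
    last = None
    for match in re.finditer(r"^from stlc_agents\..+\n", src, re.MULTILINE):
        last = match
    if last is None:
        return src
    return src[: last.end()] + IMPORT_LINE + src[last.end():]


# Hypothetical server source used only to exercise the helper.
sample = (
    "import sys\n"
    "from stlc_agents.shared.auth import get_token\n"
    "from stlc_agents.shared.util import helper\n"
    "\n"
    "app = object()\n"
)
patched = add_tracker_import(sample)
print(patched)
```

Running the helper a second time on `patched` returns it unchanged, which is the same property that lets the install hook run on every `pip install` without stacking duplicate imports.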