@yasserkhanorg/e2e-agents 0.9.0 → 0.11.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (93)
  1. package/README.md +112 -584
  2. package/dist/agent/api_catalog.d.ts +11 -0
  3. package/dist/agent/api_catalog.d.ts.map +1 -0
  4. package/dist/agent/api_catalog.js +210 -0
  5. package/dist/agent/llm_agents_flow.d.ts +15 -0
  6. package/dist/agent/llm_agents_flow.d.ts.map +1 -0
  7. package/dist/agent/llm_agents_flow.js +434 -0
  8. package/dist/agent/native_flow.d.ts +6 -0
  9. package/dist/agent/native_flow.d.ts.map +1 -0
  10. package/dist/agent/native_flow.js +179 -0
  11. package/dist/agent/pipeline.d.ts +2 -25
  12. package/dist/agent/pipeline.d.ts.map +1 -1
  13. package/dist/agent/pipeline.js +30 -1329
  14. package/dist/agent/pipeline_types.d.ts +54 -0
  15. package/dist/agent/pipeline_types.d.ts.map +1 -0
  16. package/dist/agent/pipeline_types.js +4 -0
  17. package/dist/agent/pipeline_utils.d.ts +12 -0
  18. package/dist/agent/pipeline_utils.d.ts.map +1 -0
  19. package/dist/agent/pipeline_utils.js +156 -0
  20. package/dist/agent/process_runner.d.ts +10 -0
  21. package/dist/agent/process_runner.d.ts.map +1 -0
  22. package/dist/agent/process_runner.js +92 -0
  23. package/dist/agent/spec_generator.d.ts +5 -0
  24. package/dist/agent/spec_generator.d.ts.map +1 -0
  25. package/dist/agent/spec_generator.js +253 -0
  26. package/dist/agent/validation_runner.d.ts +5 -0
  27. package/dist/agent/validation_runner.d.ts.map +1 -0
  28. package/dist/agent/validation_runner.js +77 -0
  29. package/dist/agentic/playwright_runner.js +1 -1
  30. package/dist/cli/commands/analyze.d.ts +3 -0
  31. package/dist/cli/commands/analyze.d.ts.map +1 -0
  32. package/dist/cli/commands/analyze.js +77 -0
  33. package/dist/cli/commands/feedback.d.ts +3 -0
  34. package/dist/cli/commands/feedback.d.ts.map +1 -0
  35. package/dist/cli/commands/feedback.js +39 -0
  36. package/dist/cli/commands/finalize.d.ts +3 -0
  37. package/dist/cli/commands/finalize.d.ts.map +1 -0
  38. package/dist/cli/commands/finalize.js +41 -0
  39. package/dist/cli/commands/generate.d.ts +4 -0
  40. package/dist/cli/commands/generate.d.ts.map +1 -0
  41. package/dist/cli/commands/generate.js +108 -0
  42. package/dist/cli/commands/heal.d.ts +3 -0
  43. package/dist/cli/commands/heal.d.ts.map +1 -0
  44. package/dist/cli/commands/heal.js +60 -0
  45. package/dist/cli/commands/impact.d.ts +4 -0
  46. package/dist/cli/commands/impact.d.ts.map +1 -0
  47. package/dist/cli/commands/impact.js +26 -0
  48. package/dist/cli/commands/llm_health.d.ts +2 -0
  49. package/dist/cli/commands/llm_health.d.ts.map +1 -0
  50. package/dist/cli/commands/llm_health.js +38 -0
  51. package/dist/cli/commands/plan.d.ts +4 -0
  52. package/dist/cli/commands/plan.d.ts.map +1 -0
  53. package/dist/cli/commands/plan.js +83 -0
  54. package/dist/cli/commands/traceability.d.ts +4 -0
  55. package/dist/cli/commands/traceability.d.ts.map +1 -0
  56. package/dist/cli/commands/traceability.js +77 -0
  57. package/dist/cli/parse_args.d.ts +6 -0
  58. package/dist/cli/parse_args.d.ts.map +1 -0
  59. package/dist/cli/parse_args.js +216 -0
  60. package/dist/cli/types.d.ts +70 -0
  61. package/dist/cli/types.d.ts.map +1 -0
  62. package/dist/cli/types.js +4 -0
  63. package/dist/cli/usage.d.ts +2 -0
  64. package/dist/cli/usage.d.ts.map +1 -0
  65. package/dist/cli/usage.js +86 -0
  66. package/dist/cli.js +26 -1057
  67. package/dist/esm/agent/api_catalog.js +199 -0
  68. package/dist/esm/agent/llm_agents_flow.js +421 -0
  69. package/dist/esm/agent/native_flow.js +175 -0
  70. package/dist/esm/agent/pipeline.js +8 -1307
  71. package/dist/esm/agent/pipeline_types.js +3 -0
  72. package/dist/esm/agent/pipeline_utils.js +146 -0
  73. package/dist/esm/agent/process_runner.js +83 -0
  74. package/dist/esm/agent/spec_generator.js +249 -0
  75. package/dist/esm/agent/validation_runner.js +73 -0
  76. package/dist/esm/agentic/playwright_runner.js +1 -1
  77. package/dist/esm/cli/commands/analyze.js +74 -0
  78. package/dist/esm/cli/commands/feedback.js +36 -0
  79. package/dist/esm/cli/commands/finalize.js +38 -0
  80. package/dist/esm/cli/commands/generate.js +105 -0
  81. package/dist/esm/cli/commands/heal.js +57 -0
  82. package/dist/esm/cli/commands/impact.js +23 -0
  83. package/dist/esm/cli/commands/llm_health.js +35 -0
  84. package/dist/esm/cli/commands/plan.js +80 -0
  85. package/dist/esm/cli/commands/traceability.js +73 -0
  86. package/dist/esm/cli/parse_args.js +210 -0
  87. package/dist/esm/cli/types.js +3 -0
  88. package/dist/esm/cli/usage.js +83 -0
  89. package/dist/esm/cli.js +20 -1051
  90. package/dist/esm/mcp-server.js +18 -1
  91. package/dist/mcp-server.d.ts.map +1 -1
  92. package/dist/mcp-server.js +17 -0
  93. package/package.json +2 -4
package/README.md CHANGED
@@ -1,19 +1,16 @@
1
1
  # @yasserkhanorg/e2e-agents
2
2
 
3
- Framework-agnostic LLM provider library with MCP server for autonomous E2E testing.
3
+ AI-powered E2E test impact analysis, generation, and healing for frontend repositories.
4
4
 
5
5
  [![npm](https://img.shields.io/npm/v/%40yasserkhanorg%2Fe2e-agents)](https://www.npmjs.com/package/@yasserkhanorg/e2e-agents)
6
6
  [![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)
7
7
  [![GitHub](https://img.shields.io/badge/github-yasserfaraazkhan%2Fe2e--agents-blue?logo=github)](https://github.com/yasserfaraazkhan/e2e-agents)
8
8
 
9
- ## Overview
9
+ ## What It Does
10
10
 
11
- Pluggable LLM provider abstraction for test automation with:
12
- - **Anthropic Claude** — Advanced reasoning, vision support
13
- - **OpenAI GPT** — Official OpenAI API integration
14
- - **Ollama** — Free, local, privacy-first
15
- - **MCP Server** — 6 tools for test discovery, generation, and healing
16
- - **Custom Providers** — Extend with any OpenAI-compatible API
11
+ Given a git diff, `e2e-ai-agents` determines which E2E test flows are impacted, identifies coverage gaps, and can generate or heal Playwright tests — all from the CLI.
12
+
13
+ **Pipeline:** `impact` → `plan` → `generate` → `heal` → `finalize`
17
14
 
18
15
  ## Installation
19
16
 
@@ -21,666 +18,197 @@ Pluggable LLM provider abstraction for test automation with:
21
18
  npm install @yasserkhanorg/e2e-agents
22
19
  ```
23
20
 
24
- ## Module Formats (CJS + ESM)
25
-
26
- This package ships both CommonJS and ESM builds:
27
- - `require('@yasserkhanorg/e2e-agents')` loads the CommonJS build from `dist/index.js`.
28
- - `import ... from '@yasserkhanorg/e2e-agents'` loads the ESM build from `dist/esm/index.js`.
29
- - `./mcp` follows the same pattern (`dist/mcp-server.js` for CJS, `dist/esm/mcp-server.js` for ESM).
30
-
31
- Node.js >= 20 is required.
32
-
33
- ## Quick Links
34
-
35
- 📖 **[Comprehensive Guide](E2E_AI_TESTING.md)** - In-depth documentation including:
36
- - How to use e2e-ai-agents in your projects
37
- - Real-world examples for Playwright, Cypress, Selenium
38
- - How Mattermost uses this package
39
- - Cost optimization and best practices
40
-
41
- ## Quick Start
42
-
43
- ### Use Claude
44
-
45
- ```typescript
46
- import { AnthropicProvider } from '@yasserkhanorg/e2e-agents';
47
-
48
- const claude = new AnthropicProvider({
49
- apiKey: process.env.ANTHROPIC_API_KEY
50
- });
51
-
52
- const response = await claude.generateText('Analyze test failure');
53
- console.log(response.text);
54
- console.log(`Cost: $${response.cost.toFixed(4)}`);
55
- ```
56
-
57
- ### Use OpenAI
58
-
59
- ```typescript
60
- import { OpenAIProvider } from '@yasserkhanorg/e2e-agents';
61
-
62
- const openai = new OpenAIProvider({
63
- apiKey: process.env.OPENAI_API_KEY,
64
- model: 'gpt-4'
65
- });
66
-
67
- const response = await openai.generateText('Summarize test failure');
68
- console.log(response.text);
69
- ```
70
-
71
- Tip: for accurate OpenAI cost tracking, set `costPer1MInputTokens` and `costPer1MOutputTokens` in the `OpenAIProvider` config.
72
-
73
- ### Use Ollama (Free)
74
-
75
- ```typescript
76
- import { OllamaProvider } from '@yasserkhanorg/e2e-agents';
77
-
78
- const ollama = new OllamaProvider({
79
- model: 'deepseek-r1:7b'
80
- });
81
-
82
- const response = await ollama.generateText('Generate test case');
83
- console.log(response.text); // Free!
84
- ```
85
-
86
- ### Use Custom Provider (OpenAI-compatible endpoint)
87
-
88
- ```typescript
89
- import { CustomProvider } from '@yasserkhanorg/e2e-agents';
90
-
91
- const custom = new CustomProvider({
92
- baseUrl: 'https://your-llm-gateway.example.com/v1',
93
- auth: { Authorization: `Bearer ${process.env.CUSTOM_API_KEY}` },
94
- model: 'your-model-name',
95
- requestFormat: 'openai'
96
- });
97
-
98
- const response = await custom.generateText('Generate test case');
99
- console.log(response.text);
100
- ```
101
-
102
- `requestFormat` can be `'openai'`, `'anthropic'`, or `'custom'` (with `transformRequest`/`transformResponse`).
103
-
104
- ### Factory Pattern
105
-
106
- ```typescript
107
- import { LLMProviderFactory } from '@yasserkhanorg/e2e-agents';
21
+ Requires Node.js >= 20. Ships both CommonJS and ESM builds.
108
22
 
109
- // Auto-detect from environment
110
- const provider = LLMProviderFactory.create({
111
- type: 'anthropic',
112
- config: { apiKey: process.env.ANTHROPIC_API_KEY }
113
- });
114
- ```
23
+ ## CLI Commands
115
24
 
116
- ### Hybrid Mode (Free + Premium)
25
+ ```bash
26
+ # Analyze which flows are impacted by code changes
27
+ npx e2e-ai-agents impact --path /path/to/webapp
117
28
 
118
- ```typescript
119
- const provider = LLMProviderFactory.createHybrid({
120
- primary: { type: 'ollama', config: { model: 'deepseek-r1:7b' } },
121
- fallback: { type: 'anthropic', config: { apiKey: process.env.ANTHROPIC_API_KEY } },
122
- useFallbackFor: ['vision'] // Only use Claude for vision
123
- });
29
+ # Generate a coverage plan with gap analysis
30
+ npx e2e-ai-agents plan --path /path/to/webapp
124
31
 
125
- await provider.generateText('Analyze code'); // Uses Ollama (free)
126
- await provider.analyzeImage([...], 'Compare screenshots'); // Uses Claude (vision)
127
- ```
32
+ # Generate tests for uncovered gaps (requires plan output)
33
+ npx e2e-ai-agents generate --path /path/to/webapp
128
34
 
129
- ## CLI: Impact and Gap Analysis
35
+ # Heal flaky/failing specs from a Playwright report
36
+ npx e2e-ai-agents heal --path /path/to/webapp --traceability-report ./playwright-report.json
130
37
 
131
- Run AI-driven impact analysis or gap analysis on any frontend repo.
38
+ # Stage generated tests, commit, and open a PR
39
+ npx e2e-ai-agents finalize-generated-tests --path /path/to/webapp --create-pr
132
40
 
133
- ```bash
134
- npx e2e-ai-agents impact --path /path/to/webapp
135
- npx e2e-ai-agents gap --path /path/to/webapp
136
- npx e2e-ai-agents plan --path /path/to/webapp
137
- npx e2e-ai-agents suggest --path /path/to/webapp --mattermost
138
- npx e2e-ai-agents generate --path /path/to/webapp --pipeline
139
- npx e2e-ai-agents heal --path /path/to/webapp --traceability-report ./playwright-report.json
140
- npx e2e-ai-agents suggest --path /path/to/webapp
141
- npx e2e-ai-agents approve-and-generate --path /path/to/webapp
142
- npx e2e-ai-agents finalize-generated-tests --path /path/to/webapp
143
- npx e2e-ai-agents feedback --path /path/to/webapp --feedback-input ./feedback.json
41
+ # Ingest test execution data for traceability
144
42
  npx e2e-ai-agents traceability-capture --path /path/to/webapp --traceability-report ./playwright-report.json
145
43
  npx e2e-ai-agents traceability-ingest --path /path/to/webapp --traceability-input ./traceability-input.json
146
- ```
147
-
148
- Local approval workflow (dev/QA + AI) with one review artifact:
149
-
150
- ```bash
151
- # 1) Suggest and generate local review + pending approval JSON
152
- node scripts/local-impact-workflow.js suggest --config ./e2e-ai-agents.config.json --since master
153
44
 
154
- # 2) Approve or reject after review
155
- node scripts/local-impact-workflow.js approve --config ./e2e-ai-agents.config.json --decision approve --note "QA approved"
45
+ # Ingest recommendation feedback for calibration
46
+ npx e2e-ai-agents feedback --path /path/to/webapp --feedback-input ./feedback.json
156
47
 
157
- # 3) Generate/heal only after approval
158
- node scripts/local-impact-workflow.js generate --config ./e2e-ai-agents.config.json --since master --pipeline-dry-run
159
- # Generates in MCP-only mode by default (AI generation/healing only).
160
- # Optional: tune MCP timeout per call:
161
- # node scripts/local-impact-workflow.js generate --config ./e2e-ai-agents.config.json --since master --pipeline-mcp-timeout-ms 120000
48
+ # Test LLM provider connectivity
49
+ npx e2e-ai-agents llm-health
162
50
  ```
163
51
 
164
- Generated local artifacts:
165
- - `<tests-root>/.e2e-ai-agents/local-impact-review.md`
166
- - `<tests-root>/.e2e-ai-agents/local-impact-approval.json`
167
-
168
- If tests live outside the app root:
52
+ `plan` and `suggest` are aliases. Use `--help` for all available flags.
169
53
 
170
- ```bash
171
- npx e2e-ai-agents impact --path /path/to/webapp --tests-root /path/to/e2e-tests
172
- ```
54
+ ## Configuration
173
55
 
174
- Optional config file `e2e-ai-agents.config.json` (JSON):
56
+ Create `e2e-ai-agents.config.json` in your project (auto-discovered):
175
57
 
176
58
  ```json
177
59
  {
178
60
  "path": ".",
179
- "profile": "default",
61
+ "profile": "mattermost",
180
62
  "testsRoot": ".",
181
- "flowCatalogPath": ".e2e-ai-agents/flows.json",
182
63
  "mode": "impact",
183
64
  "framework": "auto",
184
- "timeLimitMinutes": 10,
185
- "budget": { "maxUSD": 2, "maxTokens": 20000 },
186
- "artifacts": { "mode": "commit", "specsDir": ".e2e-ai-agents/reports" },
187
- "selectors": { "patchOnApply": true },
188
- "testDiscovery": { "patterns": ["tests/**/*.spec.ts"] },
189
- "flowDiscovery": {
190
- "patterns": ["channels/src/components/**/*.{tsx,jsx}"],
191
- "exclude": ["**/components/**/stories/**"]
192
- },
193
- "catalogScoring": {
194
- "priorityScores": { "P0": 10, "P1": 6, "P2": 3 },
195
- "fileMatchWeight": 1
196
- },
65
+ "git": { "since": "origin/master" },
197
66
  "impact": {
198
- "allowFallback": false,
199
- "dependencyGraph": {
200
- "enabled": true,
201
- "maxDepth": 3,
202
- "maxExpandedFiles": 1000,
203
- "filePatterns": ["**/*.{ts,tsx,js,jsx}"],
204
- "excludePatterns": ["**/node_modules/**", "**/.git/**", "**/dist/**", "**/build/**"],
205
- "aliasRoots": ["src", "channels/src"],
206
- "pathAliases": {
207
- "@app/*": ["src/*"],
208
- "@channels/*": ["channels/src/*"]
209
- }
210
- },
211
- "traceability": {
212
- "enabled": true,
213
- "manifestPath": ".e2e-ai-agents/traceability.json",
214
- "minSignalsPerTest": 1
215
- },
216
- "subsystemRisk": {
217
- "enabled": false,
218
- "mapPath": ".e2e-ai-agents/subsystem-risk-map.json",
219
- "maxRulesPerFile": 4
220
- },
221
- "aiFlow": {
222
- "enabled": true,
223
- "strict": true,
224
- "provider": "anthropic",
225
- "contextFiles": [
226
- "CLAUDE.OPTIONAL.md",
227
- ".claude/CLAUDE.OPTIONAL.md"
228
- ],
229
- "maxFilesPerRequest": 220,
230
- "maxFlowsPerRequest": 80,
231
- "maxTokens": 4000,
232
- "temperature": 0
233
- },
234
- "aiMapping": {
235
- "enabled": false,
236
- "provider": "anthropic",
237
- "contextFiles": [
238
- "CLAUDE.OPTIONAL.md",
239
- ".claude/CLAUDE.OPTIONAL.md"
240
- ],
241
- "maxFlowsPerRequest": 30,
242
- "maxCandidateTests": 400,
243
- "maxTokens": 4000,
244
- "temperature": 0
245
- }
67
+ "dependencyGraph": { "enabled": true, "maxDepth": 3 },
68
+ "traceability": { "enabled": true },
69
+ "aiFlow": { "enabled": true, "provider": "anthropic" }
246
70
  },
247
71
  "pipeline": {
248
72
  "enabled": false,
249
73
  "scenarios": 3,
250
74
  "outputDir": "specs/functional/ai-assisted",
251
- "heal": true,
252
- "mcp": false,
253
- "mcpAllowFallback": false,
254
- "mcpOnly": false,
255
- "mcpCommandTimeoutMs": 180000,
256
- "mcpRetries": 1
75
+ "mcp": false
257
76
  },
258
- "llm": { "provider": "anthropic", "fallback": "ollama" },
259
77
  "policy": {
260
- "minConfidenceForTargeted": 60,
261
- "safeMergeMinConfidence": 85,
262
- "forceFullOnWarningsAtOrAbove": 2,
263
- "forceFullOnP0WithGaps": true,
264
- "forceFullOnRiskyFiles": true,
265
- "riskyFilePatterns": ["**/auth/**", "**/permissions/**", "**/security/**", "**/*.sql"],
266
- "enforcementMode": "advisory",
78
+ "enforcementMode": "block",
267
79
  "blockOnActions": ["must-add-tests"]
268
- },
269
- "flags": { "defaultState": "on" },
270
- "audience": { "defaultRoles": ["member"] },
271
- "blastRadius": {
272
- "memberBonus": 1,
273
- "guestBonus": 1,
274
- "adminOnlyPenalty": -1,
275
- "flagOffPenalty": -2
276
80
  }
277
81
  }
278
82
  ```
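Editorial sketch, not part of the package diff: the sample config can be sanity-checked before a run against the values the README lists for `profile` and `policy.enforcementMode`. The `AgentConfig` shape and the `checkConfig` name are hypothetical and cover only a few of the fields shown; the package's real schema may differ.

```typescript
// Hypothetical shape covering only a few fields from the sample config.
interface AgentConfig {
  path?: string;
  profile?: string;
  testsRoot?: string;
  policy?: { enforcementMode?: string; blockOnActions?: string[] };
}

// Values the README names for these two options.
const PROFILES: readonly string[] = ["default", "mattermost"];
const ENFORCEMENT_MODES: readonly string[] = ["advisory", "warn", "block"];

// Returns human-readable problems; an empty list means the checked fields look fine.
function checkConfig(config: AgentConfig): string[] {
  const problems: string[] = [];
  if (config.profile !== undefined && PROFILES.indexOf(config.profile) === -1) {
    problems.push(`unknown profile: ${config.profile}`);
  }
  const mode = config.policy?.enforcementMode;
  if (mode !== undefined && ENFORCEMENT_MODES.indexOf(mode) === -1) {
    problems.push(`unknown policy.enforcementMode: ${mode}`);
  }
  return problems;
}

const sample: AgentConfig = JSON.parse(
  '{"path": ".", "profile": "mattermost", "policy": {"enforcementMode": "block"}}'
);
console.log(checkConfig(sample)); // the sample values raise no problems
```

Returning a list rather than throwing lets a caller report every problem at once.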
279
83
 
280
- Notes:
281
- - If no framework config is found, provide `testDiscovery.patterns` or `--patterns`.
282
- - Use `flowDiscovery.patterns` or `--flow-patterns` to customize flow scanning.
283
- - Use `testsRoot` when tests live outside the app root.
284
- - Use `flowCatalogPath` or `--flow-catalog` to provide a flow catalog for deterministic P0/P1 mapping.
285
- - Impact mode expects a git diff; use `--since` or add `"impact": { "allowFallback": true }` to fall back to scanning.
286
- - Impact analysis now uses static reverse dependency graph expansion (configurable via `impact.dependencyGraph`) to propagate changed-file impact, including alias imports via `aliasRoots` and `pathAliases`.
287
- - Impact analysis can use coverage-style traceability manifests (`impact.traceability`) for file->test mapping with heuristic fallback for uncovered flows.
288
- - Impact analysis can run AI-first flow mapping (`impact.aiFlow`) so impacted flows and priorities come from LLM reasoning rather than heuristic scoring.
289
- - Impact analysis can use optional Anthropic-powered AI mapping (`impact.aiMapping`) to map impacted flows to existing tests when traceability is missing/low; context is loaded from optional markdown files such as `CLAUDE.OPTIONAL.md`.
290
- - Impact analysis can apply subsystem-aware risk boosts and priority floors from a map (`impact.subsystemRisk`) to capture known high-blast-radius areas.
291
- - Diffing is computed from `merge-base(<since>, HEAD)` when available, which is the standard PR-impact baseline.
292
- - Reports are written under `testsRoot/.e2e-ai-agents/reports` (or app root if `testsRoot` is not set).
293
- - Use `approve-and-generate` for explicit approval before generating/healing tests.
294
- - Selector/data-testid patches are only applied when `--apply` is passed.
295
- - `plan` is a direct alias for `suggest`.
296
- - `generate` is a direct alias for `approve-and-generate`.
297
- - Mattermost-first strict mode is available with `--mattermost` (or `"profile": "mattermost"` in config).
298
- - In Mattermost mode, heuristic-only test mapping is treated as insufficient evidence and recommendations are escalated to broad runs.
299
- - `heal` targets flaky/failed specs from a Playwright JSON report (`--traceability-report`).
300
- - `--apply` remains available as a legacy shortcut for direct `gap` execution.
301
- - Use `--pipeline` to run the Playwright generation pipeline.
302
- - If `e2e-test-gen-cli.ts` exists in `testsRoot`, it is used as the advanced runner.
303
- - If it is absent, `@yasserkhanorg/e2e-agents` falls back to package-native generation with strategy-based templates, quality guardrails (`no test.describe`, single tag), and iterative heal attempts.
304
- - `--pipeline-mcp` now attempts the official Playwright Test Agent loop first (planner/generator/healer) using:
305
- - `npx playwright init-agents --loop=claude --prompts`
306
- - `.mcp.json` (`playwright run-test-mcp-server`)
307
- - `claude -p` non-interactive orchestration
308
- - In MCP mode, fallback is strict by default: if official agent setup fails, generation stops instead of silently degrading.
309
- - Use `--pipeline-mcp-allow-fallback` (or config `pipeline.mcpAllowFallback=true`) only when you explicitly want fallback generation.
310
- - MCP prerequisites: Playwright config in `testsRoot` and Claude CLI installed/authenticated.
311
- - Use `--pipeline-mcp-timeout-ms` (or config `pipeline.mcpCommandTimeoutMs`) to limit per-command MCP wait time and fail fast in strict mode.
312
- - Use `--pipeline-mcp-retries` (or config `pipeline.mcpRetries`) to retry transient MCP failures while staying in AI-only mode.
313
- - Official MCP outputs are validated against discovered local API surface (`pw.*`, `pw.testBrowser.*`, `channelsPage.*`) to block invented methods (for example `pw.mainClient.*`).
314
- - If fallback is enabled and official MCP agent execution is unavailable, pipeline falls back to `e2e-test-gen` (if present) or package-native generation with warnings in report output.
315
- - `impact/gap` pipeline output now includes `pipeline.mcp` (`requested`, `active`, `backend`) so MCP activation is explicit.
316
- - `suggest` writes `.e2e-ai-agents/plan.json` with `runSet` (`smoke|targeted|full`) and confidence.
317
- - `suggest` also writes `.e2e-ai-agents/ci-summary.md` with CI status: `run-now`, `must-add-tests`, or `safe-to-merge`.
318
- - CLI policy overrides: `--policy-min-confidence`, `--policy-safe-merge-confidence`, `--policy-force-full-on-warnings`, `--policy-risky-patterns`, `--policy-enforcement-mode`, `--policy-block-actions`.
319
- - GitHub Actions output wiring: `--github-output $GITHUB_OUTPUT`.
320
- - Optional merge gating: `--fail-on-must-add-tests` exits non-zero when uncovered P0/P1 gaps are detected. Leave this flag unset for advisory-only mode.
321
- - `suggest` now appends run metrics to `.e2e-ai-agents/metrics.jsonl` and writes aggregated `.e2e-ai-agents/metrics-summary.json`.
322
- - `impact/gap` now include actionable `testSuggestions` with linked source files and skeleton test code.
323
- - `impact/gap` now include `impactModel` metadata (`flowMapping`, `testMapping`, `confidenceClass`, traceability stats, dependency graph stats).
324
- - `impact/gap` now include `runMetadata` (run id/timestamps/duration/since ref) for auditability.
325
- - `impact/gap` now include optional `impactModel.subsystemRisk` stats (map status, matched files/rules, boosted flows).
326
- - `impact/gap` pipeline result rows now include failure taxonomy (`failureCategory`, `failureCode`) when generation/heal fails.
327
- - `feedback` appends outcomes to `.e2e-ai-agents/feedback.json` and recomputes `.e2e-ai-agents/calibration.json`.
328
- - `feedback` also computes intelligent flaky scores into `.e2e-ai-agents/flaky-tests.json`.
329
- - `traceability-capture` converts Playwright JSON execution report + optional coverage map into `.e2e-ai-agents/traceability-input.json`.
330
- - `traceability-ingest` merges CI execution mappings into `.e2e-ai-agents/traceability.json` and persists rolling counts in `.e2e-ai-agents/traceability-state.json`.
331
- - Traceability capture flags: `--traceability-report`, `--traceability-capture-output`, `--traceability-coverage-map`, `--traceability-changed-files`.
332
- - Traceability ingest tuning flags: `--traceability-min-hits`, `--traceability-max-files-per-test`, `--traceability-max-age-days`.
333
- - Optional ownership routing for flaky alerts: `.e2e-ai-agents/subsystem-owners.json`.
334
- - `suggest` automatically consumes optional operational manifests:
335
- - `.e2e-ai-agents/flaky-tests.json`
336
- - `.e2e-ai-agents/quality-gates.json`
337
- - `plan.json` includes `nextActions` commands for run/approve-and-generate/heal/finalize/PR handoff.
338
- - `finalize-generated-tests` stages generated artifacts from `gap.json`, commits, and can open a PR with `--create-pr`.
339
- - Generated Mattermost Playwright tests use standalone `test(...)` style (no `test.describe`) and a single tag string.
340
-
341
- Programmatic API:
84
+ Key options:
342
85
 
343
- ```typescript
344
- import {analyzeImpact, findGaps, recommendTests, captureTraceability, ingestTraceability} from '@yasserkhanorg/e2e-agents';
86
+ - **`testsRoot`** — path to tests when they live outside the app root
87
+ - **`profile`** — `default` or `mattermost` (strict mode with escalation for heuristic-only mappings)
88
+ - **`impact.dependencyGraph`** — static reverse dependency graph for transitive impact
89
+ - **`impact.traceability`** — file-to-test mapping from CI execution data
90
+ - **`impact.aiFlow`** — LLM-powered flow mapping (requires `ANTHROPIC_API_KEY`)
91
+ - **`pipeline.mcp`** — use Playwright MCP server for browser-aware generation/healing
92
+ - **`policy.enforcementMode`** — `advisory`, `warn`, or `block`
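Editorial sketch, not part of the package diff: one way to picture the "static reverse dependency graph for transitive impact" with a `maxDepth` limit is a breadth-first walk over importers. This is not the package's implementation; the graph shape, the `expandImpact` name, and the file paths are made up for illustration.

```typescript
// Hypothetical input: for each file, the files that import it (reverse edges).
type ReverseGraph = Record<string, string[]>;

// Walk importers breadth-first from the changed files, stopping after maxDepth hops.
function expandImpact(graph: ReverseGraph, changed: string[], maxDepth: number): string[] {
  const seen: Record<string, boolean> = {};
  for (const file of changed) seen[file] = true;
  let frontier = changed.slice();
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const importer of graph[file] ?? []) {
        if (!seen[importer]) {
          seen[importer] = true;
          next.push(importer);
        }
      }
    }
    frontier = next;
  }
  return Object.keys(seen).sort();
}

const graph: ReverseGraph = {
  "src/utils/format.ts": ["src/components/message.tsx"],
  "src/components/message.tsx": ["src/components/thread.tsx"],
  "src/components/thread.tsx": ["src/app.tsx"],
};

// With maxDepth 2, the walk reaches message.tsx and thread.tsx but not app.tsx.
console.log(expandImpact(graph, ["src/utils/format.ts"], 2));
```

A depth cap trades completeness for speed: deeper importers are ignored, which keeps the impacted set small on large repositories.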
345
93
 
346
- await analyzeImpact({path: '/path/to/webapp'});
347
- await findGaps({path: '/path/to/webapp'});
348
- const suggestion = await recommendTests({path: '/path/to/webapp'});
349
- console.log(suggestion.plan.runSet);
94
+ ## CI Integration
350
95
 
351
- const captured = captureTraceability({
352
- path: '/path/to/webapp',
353
- testsRoot: '/path/to/e2e-tests/playwright',
354
- reportPath: '/path/to/playwright-report.json',
355
- });
96
+ ### GitHub Actions
356
97
 
357
- ingestTraceability({
358
- path: '/path/to/webapp',
359
- testsRoot: '/path/to/e2e-tests/playwright',
360
- payload: JSON.parse(require('fs').readFileSync(captured.outputPath, 'utf8')),
361
- });
98
+ ```yaml
99
+ - name: Run E2E coverage check
100
+ env:
101
+ ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
102
+ run: |
103
+ npx e2e-ai-agents plan \
104
+ --config ./e2e-ai-agents.config.json \
105
+ --since origin/${{ github.base_ref }} \
106
+ --fail-on-must-add-tests \
107
+ --github-output "$GITHUB_OUTPUT"
362
108
  ```
363
109
 
364
- Feedback API:
110
+ The `plan` command writes:
111
+ - `.e2e-ai-agents/plan.json` — structured plan with `runSet`, `confidence`, `decision`
112
+ - `.e2e-ai-agents/ci-summary.md` — markdown summary for PR comments
113
+ - `.e2e-ai-agents/metrics-summary.json` — run metrics
365
114
 
366
- ```typescript
367
- import {appendFeedbackAndRecompute} from '@yasserkhanorg/e2e-agents';
368
-
369
- appendFeedbackAndRecompute('/path/to/webapp', {
370
- timestamp: new Date().toISOString(),
371
- runSet: 'targeted',
372
- recommendedTests: ['specs/channels/realtime.spec.ts'],
373
- executedTests: ['specs/channels/realtime.spec.ts'],
374
- failedTests: ['specs/channels/realtime.spec.ts'],
375
- escapedFailures: []
376
- });
377
- ```
115
+ Use `--fail-on-must-add-tests` to exit non-zero when uncovered P0/P1 gaps exist. Use `--github-output` to expose outputs to subsequent workflow steps.
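Editorial sketch, not part of the package diff: a later CI step could read `plan.json` and apply its own gate. Only the field names `runSet`, `confidence`, and `decision` come from the README; their types, the sample values, and the `shouldBlock` helper are assumptions. In particular, it is assumed here that `decision` holds an action name such as `must-add-tests`, matching the entries in `policy.blockOnActions`.

```typescript
// Assumed shape: only the three field names come from the README; the types are guesses.
interface PlanSummary {
  runSet: string;
  confidence: number;
  decision: string;
}

// Decide whether a CI step should fail, given the actions configured to block.
function shouldBlock(plan: PlanSummary, blockOnActions: string[]): boolean {
  return blockOnActions.indexOf(plan.decision) !== -1;
}

// In a real step this text would be read from .e2e-ai-agents/plan.json.
const planText = '{"runSet": "targeted", "confidence": 72, "decision": "must-add-tests"}';
const plan: PlanSummary = JSON.parse(planText);

console.log(shouldBlock(plan, ["must-add-tests"]) ? "block merge" : "allow merge");
```

Keeping the gate as a pure function of the parsed plan makes it easy to unit-test without touching the file system.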
378
116
 
379
- Traceability ingest API:
117
+ See [examples/github-actions/](examples/github-actions/) for a complete workflow template.
380
118
 
381
- ```typescript
382
- import {ingestTraceability} from '@yasserkhanorg/e2e-agents';
383
-
384
- ingestTraceability({
385
- path: '/path/to/webapp',
386
- testsRoot: '/path/to/e2e-tests/playwright',
387
- payload: {
388
- runs: [
389
- {
390
- test: 'specs/channels/channels.switch.spec.ts',
391
- touchedFiles: ['channels/src/components/channel_switcher/channel_switcher.tsx']
392
- }
393
- ]
394
- },
395
- options: {minHits: 2}
396
- });
397
- ```
119
+ ## Pipeline Modes
398
120
 
399
- Automation API:
121
+ ### Package Native (default)
400
122
 
401
- ```typescript
402
- import {handoffGeneratedTests} from '@yasserkhanorg/e2e-agents';
123
+ Strategy-based Playwright test templates with quality guardrails (no `test.describe`, single tag) and iterative heal attempts.
403
124
 
404
- handoffGeneratedTests({
405
- appPath: '/path/to/webapp',
406
- testsRoot: '/path/to/e2e-tests/playwright',
407
- createPr: true,
408
- });
409
- ```
125
+ ### MCP Mode (`--pipeline-mcp`)
410
126
 
411
- CI integration template:
127
+ Uses the official Playwright Test Agent loop (planner/generator/healer) with Claude CLI orchestration. Validates generated specs against discovered local API surface to block hallucinated methods.
412
128
 
413
- - [GitHub Actions example](examples/github-actions/pr-impact.yml)
414
- - The example uses Node 22 (`actions/setup-node@v4` with `node-version: 22`).
415
- - The example captures Playwright JSON output via `traceability-capture` and ingests it with `traceability-ingest`.
416
- - Feedback payload example: [examples/feedback.sample.json](examples/feedback.sample.json)
417
- - Subsystem owners example: [examples/subsystem-owners.sample.json](examples/subsystem-owners.sample.json)
418
- - Traceability ingest payload schema: [schemas/traceability-input.schema.json](schemas/traceability-input.schema.json)
419
- - Traceability ingest payload example: [examples/traceability-input.sample.json](examples/traceability-input.sample.json)
420
- - Traceability manifest example: [examples/traceability.sample.json](examples/traceability.sample.json)
421
- - Subsystem risk map schema: [schemas/subsystem-risk-map.schema.json](schemas/subsystem-risk-map.schema.json)
422
- - Subsystem risk map example: [examples/subsystem-risk-map.sample.json](examples/subsystem-risk-map.sample.json)
423
- - End-to-end verification steps: [examples/verification/README.md](examples/verification/README.md)
424
- - Impact checklist playbook: [examples/verification/IMPACT_ANALYSIS_CHECKLIST.md](examples/verification/IMPACT_ANALYSIS_CHECKLIST.md)
425
- - Checklist validator command: `npm run impact:checklist -- --root <tests-root>`
129
+ - **`--pipeline-mcp-only`** — fail if MCP setup fails (no silent fallback)
130
+ - **`--pipeline-mcp-allow-fallback`** — fall back to package-native if MCP unavailable
131
+ - **`--pipeline-mcp-timeout-ms`** — per-command timeout
132
+ - **`--pipeline-mcp-retries`** — retry count for transient failures
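Editorial sketch, not part of the package diff: the validation of generated specs against the discovered local API surface can be pictured as an allowlist check. This is an assumption about the general idea, not the package's code; the regular expression, the function names, and the sample method names are hypothetical.

```typescript
// Collect dotted call chains such as `pw.testBrowser.login` from generated spec text.
function extractCalls(specSource: string): string[] {
  const pattern = /\b((?:pw|channelsPage)(?:\.[A-Za-z_$][\w$]*)+)\s*\(/g;
  const calls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(specSource)) !== null) {
    calls.push(match[1]);
  }
  return calls;
}

// Return every call whose full dotted name is missing from the known API surface.
function findUnknownCalls(specSource: string, knownApi: string[]): string[] {
  return extractCalls(specSource).filter((call) => knownApi.indexOf(call) === -1);
}

// Hypothetical discovered surface and a generated spec with one invented method.
const knownApi = ["pw.testBrowser.login", "channelsPage.goto"];
const generatedSpec = `
  await pw.testBrowser.login(user);
  await pw.mainClient.createPost("hello");
  await channelsPage.goto();
`;

console.log(findUnknownCalls(generatedSpec, knownApi));
```

A text-level check like this is cheap but approximate; a stricter variant could parse the spec into an AST instead of using a regular expression.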
426
133
 
427
- Traceability manifest example (`.e2e-ai-agents/traceability.json`):
134
+ ### Agentic Generation (`generate` command)
428
135
 
429
- ```json
430
- {
431
- "schemaVersion": "1.0.0",
432
- "tests": [
433
- {
434
- "test": "specs/channels/channels.switch.spec.ts",
435
- "touchedFiles": ["channels/src/components/channel_switcher/channel_switcher.tsx"]
436
- }
437
- ]
438
- }
439
- ```
136
+ LLM-powered generate-run-fix loop: generates a spec, runs it, analyzes failures, and iterates up to `--max-attempts` times.
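Editorial sketch, not part of the package diff: the generate-run-fix loop can be outlined with the LLM call and the test runner injected as plain functions. The `RunResult` shape, the `generateWithRetries` name, and the stub generator and runner are hypothetical. `maxAttempts` stands in for the `--max-attempts` flag.

```typescript
// Result of running one generated spec; the shape is hypothetical.
interface RunResult {
  passed: boolean;
  failureLog: string;
}

// Generic generate-run-fix loop with the generator and runner injected as functions.
async function generateWithRetries(
  generate: (previousFailure: string | null) => Promise<string>,
  run: (spec: string) => Promise<RunResult>,
  maxAttempts: number
): Promise<{ spec: string | null; attempts: number }> {
  let failure: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const spec = await generate(failure); // feed the last failure back into generation
    const result = await run(spec);
    if (result.passed) return { spec, attempts: attempt };
    failure = result.failureLog;
  }
  return { spec: null, attempts: maxAttempts }; // attempts exhausted without a passing spec
}

// Stub generator and runner: the first draft fails, the revised draft passes.
const fakeGenerate = async (failure: string | null) =>
  failure === null ? "draft spec" : "fixed spec";
const fakeRun = async (spec: string): Promise<RunResult> =>
  spec === "fixed spec"
    ? { passed: true, failureLog: "" }
    : { passed: false, failureLog: "locator not found" };

generateWithRetries(fakeGenerate, fakeRun, 3).then((outcome) => console.log(outcome));
```

Injecting the generator and runner keeps the loop itself testable without an LLM key or a browser.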
440
137
 
441
- Traceability ingest input example (`traceability-input.json`):
138
+ ## LLM Providers
442
139
 
443
- ```json
444
- {
445
- "runs": [
446
- {
447
- "test": "specs/channels/channels.switch.spec.ts",
448
- "touchedFiles": ["channels/src/components/channel_switcher/channel_switcher.tsx"]
449
- }
450
- ]
451
- }
452
- ```
453
-
454
- Flow catalog entries can also include optional audience and flag metadata:
140
+ Used internally for AI enrichment, test generation, and healing.
455
141
 
456
- ```json
457
- {
458
- "id": "messaging.realtime",
459
- "priority": "P0",
460
- "audience": ["member", "guest"],
461
- "flags": [
462
- "EnableSomething",
463
- { "name": "EnableEnterpriseOnly", "source": "config", "defaultState": "off" }
464
- ],
465
- "tests": ["specs/functional/channels/realtime.spec.ts"]
466
- }
467
- ```
468
-
469
- ## Extending with Custom Frameworks
142
+ ```bash
143
+ # Anthropic (default)
144
+ export ANTHROPIC_API_KEY=sk-ant-...
470
145
 
471
- ### 1. Create Custom Provider
146
+ # OpenAI
147
+ export OPENAI_API_KEY=sk-...
472
148
 
473
- ```typescript
474
- import { LLMProvider } from '@yasserkhanorg/e2e-agents';
475
-
476
- export class MyCustomProvider implements LLMProvider {
477
- async generateText(prompt: string) {
478
- // Your API call here
479
- return {
480
- text: '...',
481
- cost: 0.001,
482
- tokens: { input: 100, output: 50 }
483
- };
484
- }
485
-
486
- async analyzeImage(images, prompt) {
487
- throw new Error('Vision not supported');
488
- }
489
-
490
- async streamText(prompt) {
491
- // Generator implementation
492
- yield 'chunk1';
493
- yield 'chunk2';
494
- }
495
-
496
- getUsageStats() {
497
- return { /* ... */ };
498
- }
499
- }
149
+ # Ollama (free, local)
150
+ export OLLAMA_BASE_URL=http://localhost:11434
151
+ export OLLAMA_MODEL=deepseek-r1:7b
500
152
  ```
501
153
 
502
- ### 2. Register with Factory
154
+ Programmatic provider usage:
503
155
 
504
156
  ```typescript
505
- import { LLMProviderFactory } from '@yasserkhanorg/e2e-agents';
506
-
507
- LLMProviderFactory.register('my-provider', (config) => {
508
- return new MyCustomProvider(config);
509
- });
157
+ import { AnthropicProvider } from '@yasserkhanorg/e2e-agents';
510
158
 
511
- // Use it
512
- const provider = LLMProviderFactory.create({
513
- type: 'my-provider',
514
- config: { apiKey: '...' }
159
+ const claude = new AnthropicProvider({
160
+ apiKey: process.env.ANTHROPIC_API_KEY
515
161
  });
162
+ const response = await claude.generateText('Analyze test failure');
516
163
  ```
517
164
 
518
- ### 3. Integrate with Test Framework
165
+ A factory pattern with auto-detection, a hybrid mode (free local + premium fallback), and custom OpenAI-compatible endpoints are also supported. See the [provider API exports](src/index.ts) for full details.
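
As an illustration of what environment-based auto-detection could look like, here is a minimal sketch. It only uses the environment variable names shown above; the function `detectProvider`, the `ProviderChoice` type, and the precedence order (Anthropic, then OpenAI, then Ollama) are assumptions and not the package's documented behavior.

```typescript
// Environment variables named in the README; all are optional.
interface ProviderEnv {
  ANTHROPIC_API_KEY?: string;
  OPENAI_API_KEY?: string;
  OLLAMA_BASE_URL?: string;
  OLLAMA_MODEL?: string;
}

// Hypothetical result type for this sketch.
type ProviderChoice =
  | { type: 'anthropic'; apiKey: string }
  | { type: 'openai'; apiKey: string }
  | { type: 'ollama'; baseUrl: string; model: string | undefined };

// Picks the first configured provider. The precedence here is an assumption.
function detectProvider(env: ProviderEnv): ProviderChoice | undefined {
  if (env.ANTHROPIC_API_KEY) {
    return { type: 'anthropic', apiKey: env.ANTHROPIC_API_KEY };
  }
  if (env.OPENAI_API_KEY) {
    return { type: 'openai', apiKey: env.OPENAI_API_KEY };
  }
  if (env.OLLAMA_BASE_URL) {
    return { type: 'ollama', baseUrl: env.OLLAMA_BASE_URL, model: env.OLLAMA_MODEL };
  }
  // Nothing configured: let the caller decide how to report this.
  return undefined;
}
```

In a Node.js process this would typically be called as `detectProvider(process.env)`.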
519
166
 
520
- ```typescript
521
- // Playwright example
522
- import { test } from '@playwright/test';
523
- import { LLMProviderFactory } from '@yasserkhanorg/e2e-agents';
167
+ ## MCP Server
524
168
 
525
- const llm = LLMProviderFactory.create({
526
- type: 'anthropic',
527
- config: { apiKey: process.env.ANTHROPIC_API_KEY }
528
- });
529
-
530
- test('use LLM to verify UI', async ({ page }) => {
531
- await page.goto('https://example.com');
532
- const screenshot = await page.screenshot();
533
-
534
- const analysis = await llm.analyzeImage(
535
- [{ data: screenshot.toString('base64'), mimeType: 'image/png' }],
536
- 'Is the login button visible and correctly styled?'
537
- );
538
-
539
- console.log(analysis.text);
540
- });
541
- ```
542
-
543
- ## MCP Server Integration
544
-
545
- For Playwright test agents (v1.56+):
169
+ Exposes 6 tools for test agents (Playwright v1.56+):
546
170
 
547
171
  ```typescript
548
172
  import { E2EAgentsMCPServer } from '@yasserkhanorg/e2e-agents/mcp';
549
173
 
550
174
  const server = new E2EAgentsMCPServer();
551
- const tools = server.getTools();
552
-
553
- // Available tools:
554
- // - discover_tests: Find tests needed for code changes
555
- // - read_file: Read repository files
556
- // - write_file: Create/update test files
557
- // - run_tests: Execute tests
558
- // - get_git_changes: Detect changed files
559
- // - get_repository_context: Gather metadata
560
- ```
561
-
562
- ## Configuration
563
-
564
- ### Environment Variables
565
-
566
- ```bash
567
- ANTHROPIC_API_KEY=sk-ant-...
568
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
569
-
570
- OPENAI_API_KEY=sk-...
571
- OPENAI_MODEL=gpt-4
572
- OPENAI_BASE_URL=https://api.openai.com/v1
573
- OPENAI_ORG_ID=org_...
574
-
575
- OLLAMA_BASE_URL=http://localhost:11434
576
- OLLAMA_MODEL=deepseek-r1:7b
577
- ```
578
-
579
- Note: If `OLLAMA_BASE_URL` points to the root host (for example, `http://localhost:11434`), it will be normalized to `/v1`.
580
-
581
- ### Setup
582
-
583
- **Claude:**
584
- 1. Get key: https://console.anthropic.com
585
- 2. Export: `export ANTHROPIC_API_KEY=sk-ant-...`
586
-
587
- **OpenAI:**
588
- 1. Get key: https://platform.openai.com
589
- 2. Export: `export OPENAI_API_KEY=sk-...`
590
-
591
- **Ollama:**
592
- 1. Install: `curl -fsSL https://ollama.com/install.sh | sh`
593
- 2. Pull: `ollama pull deepseek-r1:7b`
594
- 3. Run: `ollama serve`
595
-
596
- ## Error Handling
597
-
598
- ```typescript
599
- import { LLMProviderError, UnsupportedCapabilityError } from '@yasserkhanorg/e2e-agents';
600
-
601
- try {
602
- await provider.analyzeImage([...], 'Analyze');
603
- } catch (error) {
604
- if (error instanceof UnsupportedCapabilityError) {
605
- console.log(`Not supported by: ${error.provider}`);
606
- } else if (error instanceof LLMProviderError) {
607
- console.log(`API error: ${error.message}`);
608
- }
609
- }
610
- ```
611
-
612
- ## Performance Comparison
613
-
614
- | Feature | Claude | OpenAI | Ollama |
615
- |---------|--------|--------|--------|
616
- | Vision | ✅ | ✅ (model dependent) | ❌ |
617
- | Cost | $3-15/1M tokens | Model dependent | Free |
618
- | Speed | ~800ms | ~1000ms | ~3000ms |
619
- | Streaming | ✅ | ✅ | ✅ |
620
- | Local | ❌ | ❌ | ✅ |
621
-
622
- ## Cost Optimization
623
-
624
- ```typescript
625
- const stats = provider.getUsageStats();
626
- console.log(`Tokens: ${stats.totalTokens.toLocaleString()}`);
627
- console.log(`Cost: $${stats.totalCost.toFixed(2)}`);
628
- console.log(`Avg speed: ${stats.averageResponseTimeMs.toFixed(0)}ms`);
175
+ // Tools: discover_tests, read_file, write_file, run_tests, get_git_changes, get_repository_context
629
176
  ```
630
177
 
631
- ## Performance & Optimization (v0.3.0+)
178
+ Security: `write_file` is restricted to test spec files (`*.spec.ts`, `*.test.ts`) and the `.e2e-ai-agents/` directory. Path traversal and symlink escape are blocked. Rate limited to 100 requests/minute.
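
A path check in the spirit of the `write_file` restriction might look like the sketch below. It is not the server's actual code: `normalizeRelative` and `isWriteAllowed` are hypothetical helpers, the sketch assumes POSIX-style relative paths, and it does not cover symlink escape, which would require resolving real paths on disk.

```typescript
// Normalizes a relative POSIX-style path. Returns undefined if the path is
// absolute, contains backslashes or NUL bytes, or climbs above the root.
function normalizeRelative(p: string): string | undefined {
  if (p.startsWith('/') || p.includes('\\') || p.includes('\0')) {
    return undefined;
  }
  const out: string[] = [];
  for (const segment of p.split('/')) {
    if (segment === '' || segment === '.') {
      continue;
    }
    if (segment === '..') {
      if (out.length === 0) {
        return undefined; // would escape the root
      }
      out.pop();
      continue;
    }
    out.push(segment);
  }
  return out.join('/');
}

// Allows writes only to spec/test files or anywhere under .e2e-ai-agents/.
function isWriteAllowed(relativePath: string): boolean {
  const normalized = normalizeRelative(relativePath);
  if (!normalized) {
    return false;
  }
  if (normalized.startsWith('.e2e-ai-agents/')) {
    return true;
  }
  return normalized.endsWith('.spec.ts') || normalized.endsWith('.test.ts');
}
```

Normalizing before matching matters: a path such as `specs/../src/app.ts` must be judged by where it actually lands, not by how it is spelled.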
632
179
 
633
- ### Logging Configuration
180
+ ## Traceability
634
181
 
635
- Control logging verbosity with the `LOG_LEVEL` environment variable:
182
+ Build file-to-test mappings from CI execution data:
636
183
 
637
- ```bash
638
- # Production: errors only
639
- LOG_LEVEL=ERROR npm start
640
-
641
- # Development: all messages
642
- LOG_LEVEL=DEBUG npm start
643
- ```
644
-
645
- Supported levels: `ERROR`, `WARN`, `INFO`, `DEBUG` (default: `INFO`)
646
-
647
- ### Caching
184
+ 1. **Capture**: extract test-file relationships from Playwright JSON reports
185
+ 2. **Ingest**: merge into a rolling manifest (`.e2e-ai-agents/traceability.json`)
186
+ 3. **Query**: impact analysis uses the manifest to map changed files to relevant tests
648
187
 
649
- Repository context and analysis data are cached internally by the tool.
650
- No public cache API is exposed; caching behavior is automatic.
188
+ Tuning flags: `--traceability-min-hits`, `--traceability-max-files-per-test`, `--traceability-max-age-days`.
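
To make the ingest and query steps concrete, here is a minimal in-memory sketch. The run shape (`test`, `touchedFiles`) follows the traceability input example from the previous README; everything else is assumed for illustration, including the `Manifest` layout, the function names, and the reading of `--traceability-min-hits` and `--traceability-max-age-days` as a hit-count threshold and a staleness cutoff.

```typescript
// One observed test run: which source files the test touched.
interface TraceRun {
  test: string;
  touchedFiles: string[];
}

// Rolling evidence for a single file -> test edge (hypothetical layout).
interface Edge {
  hits: number;
  lastSeenMs: number;
}

// file path -> test path -> edge
type Manifest = Map<string, Map<string, Edge>>;

// Ingest: merge new runs into the rolling manifest.
function ingest(manifest: Manifest, runs: TraceRun[], nowMs: number): void {
  runs.forEach((run) => {
    // De-duplicate files so one run counts at most one hit per file.
    new Set(run.touchedFiles).forEach((file) => {
      let tests = manifest.get(file);
      if (!tests) {
        tests = new Map<string, Edge>();
        manifest.set(file, tests);
      }
      const edge = tests.get(run.test);
      if (edge) {
        edge.hits += 1;
        edge.lastSeenMs = nowMs;
      } else {
        tests.set(run.test, { hits: 1, lastSeenMs: nowMs });
      }
    });
  });
}

// Query: map changed files to tests with enough recent evidence.
function testsForChanges(
  manifest: Manifest,
  changedFiles: string[],
  minHits: number,
  maxAgeDays: number,
  nowMs: number,
): string[] {
  const cutoffMs = nowMs - maxAgeDays * 24 * 60 * 60 * 1000;
  const selected = new Set<string>();
  changedFiles.forEach((file) => {
    const tests = manifest.get(file);
    if (!tests) {
      return;
    }
    tests.forEach((edge, test) => {
      if (edge.hits >= minHits && edge.lastSeenMs >= cutoffMs) {
        selected.add(test);
      }
    });
  });
  return Array.from(selected).sort();
}
```

A higher hit threshold trades recall for precision, while the age cutoff drops mappings that have not been observed recently.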
651
189
 
652
- ### Performance Metrics (v0.3.0)
190
+ Schema: [schemas/traceability-input.schema.json](schemas/traceability-input.schema.json)
653
191
 
654
- Improvements from code quality refactoring:
192
+ ## Artifacts
655
193
 
656
- - **40% faster** stats calculation (incremental updates)
657
- - **30% faster** API key validation (pre-compiled patterns)
658
- - **90% faster** repository context (cache hits)
659
- - **15% smaller** bundle size (code deduplication)
660
- - **44 comprehensive tests** (80%+ coverage)
194
+ | File | Written by | Purpose |
195
+ |------|-----------|---------|
196
+ | `plan.json` | `plan` | Coverage plan with gaps, decisions, metrics |
197
+ | `ci-summary.md` | `plan` | Markdown for PR comments |
198
+ | `metrics.jsonl` | `plan` | Append-only run metrics |
199
+ | `metrics-summary.json` | `plan` | Aggregated metrics |
200
+ | `traceability.json` | `traceability-ingest` | File-to-test manifest |
201
+ | `traceability-state.json` | `traceability-ingest` | Rolling counts |
202
+ | `feedback.json` | `feedback` | Recommendation outcomes |
203
+ | `calibration.json` | `feedback` | Precision/recall calibration |
204
+ | `flaky-tests.json` | `feedback` | Flaky test scores |
205
+ | `agentic-summary.json` | `generate` | Agentic generation results |
661
206
 
662
- See [CHANGELOG.md](CHANGELOG.md) for detailed improvements.
663
-
664
- ## Learn More
665
-
666
- For comprehensive documentation on:
667
- - Real-world usage examples
668
- - Integration with different frameworks
669
- - How Mattermost uses e2e-ai-agents in production
670
- - Cost optimization strategies
671
- - Security features and best practices
672
-
673
- 👉 **See [E2E_AI_TESTING.md](E2E_AI_TESTING.md)**
207
+ All artifacts are written under `<testsRoot>/.e2e-ai-agents/`.
674
208
 
675
209
  ## Production Usage
676
210
 
677
- This package is used in production by Mattermost for:
678
- - ✅ Automated test generation
679
- - ✅ Test validation and healing
680
- - ✅ UI screenshot analysis
681
- - ✅ Test data generation
682
-
683
- See the [Mattermost e2e-test-gen implementation](https://github.com/mattermost/mattermost/tree/master/e2e-tests/playwright) for a complete example.
211
+ Used by [Mattermost](https://github.com/mattermost/mattermost) for CI-integrated E2E coverage gating, test generation, and spec healing. See the [Mattermost Playwright integration](https://github.com/mattermost/mattermost/tree/master/e2e-tests/playwright) for a real-world example.
684
212
 
685
213
  ## License
686
214