gaia-framework 1.65.1 → 1.66.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/CLAUDE.md +16 -1
  2. package/README.md +2 -2
  3. package/_gaia/_config/global.yaml +1 -1
  4. package/_gaia/core/engine/workflow.xml +6 -0
  5. package/_gaia/core/protocols/review-gate-check.xml +29 -1
  6. package/_gaia/lifecycle/knowledge/brownfield/config-contradiction-scan.md +137 -0
  7. package/_gaia/lifecycle/knowledge/brownfield/dead-code-scan.md +179 -0
  8. package/_gaia/lifecycle/knowledge/brownfield/test-execution-scan.md +209 -0
  9. package/_gaia/lifecycle/skills/document-rulesets.md +91 -6
  10. package/_gaia/lifecycle/templates/brownfield-scan-doc-code-prompt.md +219 -0
  11. package/_gaia/lifecycle/templates/brownfield-scan-hardcoded-prompt.md +169 -0
  12. package/_gaia/lifecycle/templates/brownfield-scan-integration-seam-prompt.md +127 -0
  13. package/_gaia/lifecycle/templates/brownfield-scan-runtime-behavior-prompt.md +141 -0
  14. package/_gaia/lifecycle/templates/brownfield-scan-security-prompt.md +212 -0
  15. package/_gaia/lifecycle/templates/gap-entry-schema.md +247 -0
  16. package/_gaia/lifecycle/templates/infra-prd-template.md +356 -0
  17. package/_gaia/lifecycle/templates/platform-prd-template.md +431 -0
  18. package/_gaia/lifecycle/templates/prd-template.md +70 -0
  19. package/_gaia/lifecycle/workflows/4-implementation/add-feature/checklist.md +1 -1
  20. package/_gaia/lifecycle/workflows/4-implementation/add-feature/instructions.xml +2 -3
  21. package/_gaia/lifecycle/workflows/4-implementation/add-stories/checklist.md +5 -0
  22. package/_gaia/lifecycle/workflows/4-implementation/add-stories/instructions.xml +73 -1
  23. package/_gaia/lifecycle/workflows/4-implementation/create-story/instructions.xml +1 -1
  24. package/_gaia/lifecycle/workflows/4-implementation/retrospective/instructions.xml +21 -1
  25. package/_gaia/lifecycle/workflows/4-implementation/retrospective/workflow.yaml +1 -1
  26. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/checklist.md +12 -0
  27. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/instructions.xml +244 -4
  28. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/workflow.yaml +1 -0
  29. package/bin/gaia-framework.js +8 -6
  30. package/gaia-install.sh +28 -20
  31. package/package.json +1 -1
package/CLAUDE.md CHANGED
@@ -1,5 +1,5 @@
 
- # GAIA Framework v1.65.1
+ # GAIA Framework v1.66.0
 
  This project uses the **GAIA** (Generative Agile Intelligence Architecture) framework — an AI agent framework for Claude Code that orchestrates software product development through 26 specialized agents, 65 workflows, and 8 shared skills.
 
@@ -149,6 +149,21 @@ Run `/gaia-run-all-reviews` to execute all six reviews sequentially via subagent
 
  If any review fails, the story returns to `in-progress`. The Review Gate table in the story file tracks progress.
 
+ ### Infra Review Gate Substitutions
+
+ For infrastructure stories (those whose `traces_to` field contains `IR-###`, `OR-###`, or `SR-###` requirement IDs), 4 of the 6 review gates use adapted criteria. Code Review and Security Review remain unchanged for all story types.
+
+ | Standard Gate | Infra Equivalent | Change |
+ |---|---|---|
+ | Code Review | IaC Code Review | Unchanged — same workflow, IaC expertise expected |
+ | QA Tests | Policy-as-Code Validation | Checkov/tfsec/OPA pass replaces unit/integration test pass |
+ | Security Review | Security Review | Unchanged — critical for infrastructure |
+ | Test Automation | Plan Validation + Drift Checks | terraform plan assertions replace automated test coverage |
+ | Test Review | Policy Review | OPA/Rego coverage replaces test quality review |
+ | Performance Review | Cost Review + Scaling Validation | Cost analysis and autoscaling validation replace load testing |
+
+ **Detection mechanism:** The `review-gate-check` protocol reads the story's `traces_to` field and checks the requirement ID prefix. Each story is evaluated independently — platform projects with mixed stories get per-story gate selection based on their own requirement prefix.
+
  ## Memory Hygiene
 
  Agent memory sidecars accumulate decisions across sessions. Run `/gaia-memory-hygiene` periodically (recommended before each sprint) to detect stale, contradicted, or orphaned entries by cross-referencing sidecar decisions against current planning and architecture artifacts.
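The per-story gate selection described above can be sketched as follows. This is a minimal illustration of the prefix rule only; the helper name `gate_type` is hypothetical, and the real logic lives in the `review-gate-check` protocol, not in a Python API.

```python
# Requirement ID prefixes that mark a story as infrastructure (per CLAUDE.md).
INFRA_PREFIXES = ("IR-", "OR-", "SR-")

def gate_type(traces_to):
    """Return "infra" if any traced requirement ID carries an infra prefix,
    otherwise "standard" (including an empty or absent traces_to)."""
    if any(rid.startswith(INFRA_PREFIXES) for rid in (traces_to or [])):
        return "infra"
    return "standard"
```

A story tracing to `[IR-001, FR-128]` would select the infra gate set, while one tracing only to `FR-`/`NFR-` IDs keeps the standard gates.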
package/README.md CHANGED
@@ -1,6 +1,6 @@
  # GAIA — Generative Agile Intelligence Architecture
 
- [![Framework](https://img.shields.io/badge/framework-v1.65.1-blue)]()
+ [![Framework](https://img.shields.io/badge/framework-v1.66.0-blue)]()
  [![License](https://img.shields.io/badge/license-AGPL--3.0-green)]()
  [![Agents](https://img.shields.io/badge/agents-26-purple)]()
  [![Workflows](https://img.shields.io/badge/workflows-73-orange)]()
@@ -460,7 +460,7 @@ The single source of truth is `_gaia/_config/global.yaml`:
 
  ```yaml
  framework_name: "GAIA"
- framework_version: "1.65.1"
+ framework_version: "1.66.0"
  user_name: "your-name"
  project_name: "your-project"
  ```
package/_gaia/_config/global.yaml CHANGED
@@ -3,7 +3,7 @@
  # After modifying this file, run /gaia-build-configs to regenerate resolved configs.
 
  framework_name: "GAIA"
- framework_version: "1.65.1"
+ framework_version: "1.66.0"
 
  # User settings
  user_name: "jlouage"
package/_gaia/core/engine/workflow.xml CHANGED
@@ -50,6 +50,12 @@ execution modes (normal/yolo/planning), checkpoints, and quality gates.
  <action>Resolve {installed_path} from workflow.yaml location</action>
  <action>Resolve {date} to current date</action>
  <action>Ask user for any remaining unresolved variables</action>
+
+ <!-- Template Resolution (ADR-020, FR-101) — custom/templates/ overrides _gaia/lifecycle/templates/ -->
+ <!-- Resolution order for template reads: custom/templates/(unknown) > _gaia/lifecycle/templates/(unknown) -->
+ <!-- Resolution order for template writes: custom/templates/ ONLY — NEVER _gaia/lifecycle/templates/ -->
+ <action>Resolve template paths: If workflow.yaml declares a 'template' field, extract the template filename from the fully resolved template path (e.g., extract "story-template.md" from "{project-root}/_gaia/lifecycle/templates/story-template.md"). Check if {project-root}/custom/templates/(unknown) exists and is non-empty (file size > 0 bytes). If yes: the custom template overrides the framework default — replace the resolved template variable with the custom path ({project-root}/custom/templates/(unknown)). The custom template takes full precedence and completely replaces the framework default (no merge). If no (custom/templates/ directory does not exist, the specific file is not found, or the file is empty / 0 bytes): keep the original resolved framework path unchanged. No error, no warning on fallback — this is silent. If workflow.yaml has no 'template' field, skip template resolution entirely.</action>
+ <action>Template write-path mandate: Any workflow that writes or modifies template files MUST write to {project-root}/custom/templates/, NEVER to {project-root}/_gaia/lifecycle/templates/. Framework default templates are read-only.</action>
  </step>
 
  <step n="2" title="Preflight Validation">
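The read-side resolution order in the template-resolution change can be sketched as follows. This is an illustrative helper under the assumptions stated in the diff (custom template wins only if present and larger than 0 bytes; fallback is silent); `resolve_template` is not a framework API.

```python
import os

def resolve_template(project_root, framework_path):
    """Prefer custom/templates/<name> when it exists and is non-empty;
    otherwise fall back silently to the framework default (read-only)."""
    name = os.path.basename(framework_path)
    custom = os.path.join(project_root, "custom", "templates", name)
    if os.path.isfile(custom) and os.path.getsize(custom) > 0:
        return custom  # custom template fully replaces the default (no merge)
    return framework_path  # silent fallback: no error, no warning
```

Writes go the other way: workflows that modify templates write only under `custom/templates/`, never under `_gaia/lifecycle/templates/`.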
package/_gaia/core/protocols/review-gate-check.xml CHANGED
@@ -2,16 +2,44 @@
  <description>
  Shared protocol invoked by each review workflow after updating its Review Gate row.
  Checks if ALL reviews have passed and transitions story to done if so.
+ Supports infrastructure review gate adaptations (ADR-022 §10.16.6):
+ stories tracing to infra requirements (IR-/OR-/SR- prefixes) use adapted gate criteria.
  </description>
  <critical>
  <mandate>The Review Gate table must have EXACTLY 6 rows: Code Review, QA Tests, Security Review, Test Automation, Test Review, Performance Review. No other rows are valid. If extra rows exist, remove them.</mandate>
+ <mandate>For infrastructure stories, 4 of 6 gates use adapted criteria. Code Review and Security Review remain unchanged for all story types.</mandate>
  </critical>
- <step n="1" title="Read Review Gate">
+
+ <!-- Infra Review Gate Substitutions (FR-128, ADR-022 §10.16.6)
+ When a story traces to infrastructure requirements, the following gate
+ criteria are substituted. The gate row NAMES in the Review Gate table
+ stay the same (for compatibility), but the review workflows apply
+ infra-specific criteria instead of standard application criteria.
+
+ | Standard Gate | Infra Equivalent | Changed? |
+ | Code Review | IaC Code Review | Unchanged — same workflow, IaC expertise expected |
+ | QA Tests | Policy-as-Code Validation | Checkov/tfsec/OPA pass replaces unit/integration test pass |
+ | Security Review | Security Review | Unchanged — critical for infrastructure |
+ | Test Automation | Plan Validation + Drift Checks | terraform plan assertions replace automated test coverage |
+ | Test Review | Policy Review | OPA/Rego coverage replaces test quality review |
+ | Performance Review | Cost Review + Scaling Validation | Cost analysis and autoscaling validation replace load testing |
+ -->
+
+ <step n="1" title="Read Review Gate and Determine Gate Type">
  <action>Read the story file's Review Gate table</action>
  <action>If Review Gate section is missing: initialize it with EXACTLY 6 rows — Code Review (PENDING), QA Tests (PENDING), Security Review (PENDING), Test Automation (PENDING), Test Review (PENDING), Performance Review (PENDING). Do NOT add any other rows.</action>
  <action>If Review Gate table has extra rows beyond the 6 valid ones: remove the invalid rows</action>
  <action>Parse each row: Review name, Status (PENDING | PASSED | FAILED), Report link</action>
+
+ <!-- Infra Gate Detection (FR-129): per-story gate type selection based on requirement ID prefix -->
+ <action>Read the story file's YAML frontmatter traces_to field (e.g., traces_to: [IR-001, FR-128])</action>
+ <action>Determine gate_type for this individual story by scanning its traces_to entries:
+ - If ANY entry has an IR-, OR-, or SR- prefix → set gate_type = "infra"
+ - If entries have only FR- or NFR- prefixes (or traces_to is empty/absent) → set gate_type = "standard"
+ Each story is evaluated independently — in platform projects with mixed stories, each story gets the gate set matching its own requirement prefix, not a single gate set for the whole project.</action>
+ <action if="gate_type == infra">Log: "Infra review gates detected for this story (traces to IR-/OR-/SR- requirements). Applying infrastructure gate criteria: QA Tests → Policy-as-Code Validation, Test Automation → Plan Validation + Drift Checks, Performance Review → Cost Review + Scaling Validation, Test Review → Policy Review. Code Review and Security Review remain unchanged."</action>
  </step>
+
  <step n="2" title="Evaluate Gate and Transition">
  <critical>
  <mandate>You MUST execute the transition even if the gate was already fully passed before this run. The purpose is to ensure story status matches gate state.</mandate>
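The evaluation rule the protocol enforces (exactly six valid rows, any FAILED row sends the story back to `in-progress`, all PASSED transitions it to `done`) can be sketched as follows. The helper is hypothetical, and returning `None` for a still-pending gate is an assumption: the protocol only specifies the fail and full-pass transitions.

```python
# The six valid Review Gate rows, per the protocol's mandate.
GATES = ["Code Review", "QA Tests", "Security Review",
         "Test Automation", "Test Review", "Performance Review"]

def next_status(rows):
    """rows maps gate name -> PENDING | PASSED | FAILED.
    Extra rows are ignored (the protocol removes them); missing rows
    are treated as PENDING (the protocol initializes them)."""
    statuses = [rows.get(g, "PENDING") for g in GATES]
    if "FAILED" in statuses:
        return "in-progress"  # any failure returns the story
    if all(s == "PASSED" for s in statuses):
        return "done"         # full gate pass transitions the story
    return None               # gates still pending: no transition (assumed)
```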
package/_gaia/lifecycle/knowledge/brownfield/config-contradiction-scan.md ADDED
@@ -0,0 +1,137 @@
+ # Config Contradiction Scanner — Subagent Prompt Template
+
+ > Brownfield deep analysis scan subagent for detecting contradictory configuration values across files.
+ > Reference: Architecture ADR-021, Section 10.15.2, 10.15.3, 10.15.5, ADR-022 §10.16.5
+ > Infra-awareness: E12-S6 — applies infra-specific patterns when project_type is infrastructure or platform.
+
+ ## Subagent Invocation
+
+ **Input variables:**
+ - `{tech_stack}` — Detected technology stack from Step 1 discovery
+ - `{project-path}` — Absolute path to the project source code directory
+ - `{project_type}` — Project type: `application`, `infrastructure`, or `platform`
+
+ **Output file:** `{planning_artifacts}/brownfield-scan-config-contradiction.md`
+
+ ## Subagent Prompt
+
+ ```
+ You are a Config Contradiction Scanner for brownfield project analysis. Your task is to discover config files in the target project, build key-value maps, cross-reference values across files, and report contradictions using the standardized gap schema.
+
+ ### Inputs
+ - Tech stack: {tech_stack}
+ - Project path: {project-path}
+ - Project type: {project_type}
+ - Gap schema reference: Read _gaia/lifecycle/templates/gap-entry-schema.md for the output format
+
+ ### Step 1: Config File Discovery
+
+ Discover config files using glob patterns. Apply both generic and stack-specific patterns.
+
+ **Generic patterns (always apply):**
+ - `**/*.yaml`, `**/*.yml` — YAML config files
+ - `**/*.json` — JSON config files (exclude package-lock.json, yarn.lock)
+ - `**/*.env` and `**/.env*` — Environment variable files
+ - `**/*.toml` — TOML config files (exclude Pipfile.lock)
+ - `**/*.ini` — INI config files
+ - `**/*.properties` — Java properties files
+ - `**/config*.xml` — XML config files
+
+ **Exclusion patterns (always apply):**
+ - `node_modules/`, `vendor/`, `dist/`, `build/`, `.git/`
+ - Lock files: `package-lock.json`, `yarn.lock`, `Pipfile.lock`, `go.sum`, `pnpm-lock.yaml`
+ - Test fixtures and mock data directories
+
+ **Stack-specific patterns (apply based on {tech_stack}):**
+
+ #### Java/Spring
+ - `application.yml`, `application.properties`, `bootstrap.yml`
+ - `application-{profile}.yml`, `application-{profile}.properties`
+ - `src/main/resources/**/*.properties`, `src/main/resources/**/*.yml`
+
+ #### Node/Express
+ - `.env`, `.env.production`, `.env.development`, `.env.test`, `.env.local`
+ - `config/` directory contents
+ - `package.json` scripts section
+
+ #### Python/Django
+ - `settings.py`, `settings/*.py`
+ - `.env`, `pyproject.toml` tool sections
+ - `config.py`, `config/*.py`
+
+ #### Go/Gin
+ - `config.yaml`, `config.json`, `config.toml`
+ - `.env`
+ - Struct tags with `json:` / `mapstructure:` bindings
+
+ ### Step 1b: Infrastructure Config File Discovery (E12-S6)
+
+ **Apply ONLY when {project_type} is `infrastructure` or `platform`.**
+
+ In addition to the generic and stack-specific patterns above, scan for infrastructure configuration files:
+
+ #### Terraform
+ - `**/*.tf` — Terraform configuration files
+ - `**/*.tfvars` — Terraform variable files (terraform.tfvars, *.auto.tfvars)
+ - `**/*.tfvars.json` — JSON-format Terraform variables
+ - `**/terraform.tfstate` — State files (check for drift, do not parse fully)
+ - `**/backend.tf` — Backend configuration
+
+ #### Helm / Kubernetes
+ - `**/values.yaml`, `**/values-*.yaml` — Helm values files (values.yaml, values-dev.yaml, values-prod.yaml)
+ - `**/Chart.yaml` — Helm chart metadata
+ - `**/templates/**/*.yaml` — Helm templates (scan for hardcoded values vs template refs)
+ - `**/*.yaml` in directories matching `k8s/`, `kubernetes/`, `manifests/`, `deploy/`
+
+ #### Kustomize
+ - `**/kustomization.yaml`, `**/kustomization.yml` — Kustomize configs
+ - `**/overlays/**/*.yaml` — Kustomize overlay patches (detect contradictions between base and overlays)
+ - `**/base/**/*.yaml` — Kustomize base resources
+
+ #### Docker / Compose
+ - `**/Dockerfile*` — Dockerfile variants
+ - `**/docker-compose*.yml`, `**/docker-compose*.yaml` — Compose files
+ - `**/.dockerignore` — Docker ignore files
+
+ #### CI/CD
+ - `.github/workflows/**/*.yml` — GitHub Actions workflows
+ - `**/.gitlab-ci.yml` — GitLab CI config
+ - `**/Jenkinsfile*` — Jenkins pipelines
+ - `**/.circleci/config.yml` — CircleCI config
+
+ **Infra contradiction detection focus areas:**
+ - Same variable defined differently across terraform.tfvars files for different environments
+ - Helm values.yaml contradicting kustomize overlay values for the same resource
+ - Port numbers, resource limits, replica counts, and image tags inconsistent across environments
+ - Backend configuration (S3 bucket, DynamoDB table) mismatched between Terraform state backends
+
+ ### Step 2: Build Key-Value Maps
+
+ For each discovered config file, extract a key-value map:
+ - Parse structured formats (YAML, JSON, TOML, INI, properties) into nested key paths
+ - For .env files: parse KEY=VALUE pairs
+ - For Terraform files: extract variable defaults, locals, and resource attributes
+ - For Helm values: extract the full values tree
+ - For kustomize overlays: extract patch operations and their target values
+
+ ### Step 3: Cross-Reference and Detect Contradictions
+
+ Compare key-value maps across files:
+ - Same key path with different values across files = contradiction
+ - Environment-specific overrides that conflict with defaults
+ - Port/host/URL mismatches between services
+ - For infra projects: resource specification mismatches between environments
+
+ ### Step 4: Output
+
+ Format each contradiction as a gap entry using the standardized schema:
+ - category: `config-contradiction`
+ - For infra-specific contradictions (terraform.tfvars, values.yaml, kustomize): also tag with infra context in the description
+ - id: `GAP-CONFIG-{seq}` — sequential numbering starting at 001
+ - verified_by: `machine-detected`
+ - Budget: max 70 entries, truncate low-severity entries if exceeded
+ ```
+
+ ## Output File
+
+ Write all findings to: `{planning_artifacts}/brownfield-scan-config-contradiction.md`
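Steps 2 and 3 of this scanner (flatten each config into dotted key paths, then compare values for the same path across files) can be sketched as follows. Both helpers are illustrative; the scanner itself does this with LLM analysis rather than a parser.

```python
def flatten(tree, prefix=""):
    """Flatten a parsed config tree into dotted key paths (Step 2)."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat

def contradictions(file_maps):
    """file_maps: {filename: nested config dict}. Return a list of
    (key_path, {filename: value}) where the same key path resolves to
    different values across files (Step 3)."""
    seen = {}
    for fname, tree in file_maps.items():
        for key, value in flatten(tree).items():
            seen.setdefault(key, {})[fname] = value
    # repr() keeps unhashable values (lists, dicts) comparable
    return [(k, v) for k, v in seen.items()
            if len(set(map(repr, v.values()))) > 1]
```

For example, `values.yaml` declaring `service.port: 8080` while `values-prod.yaml` declares `service.port: 9090` surfaces as one contradiction entry.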
package/_gaia/lifecycle/knowledge/brownfield/dead-code-scan.md ADDED
@@ -0,0 +1,179 @@
+ # Dead Code & Dead State Scanner — Subagent Prompt Template
+
+ > Brownfield deep analysis scan subagent for detecting dead code, dead state, and abandoned functionality.
+ > Reference: Architecture ADR-021, Section 10.15.2, 10.15.3, 10.15.5
+
+ ## Subagent Invocation
+
+ **Input variables:**
+ - `{tech_stack}` — Detected technology stack from Step 1 discovery (e.g., "Java/Spring", "Node/Express", "Python/Django", "Go/Gin")
+ - `{project-path}` — Absolute path to the project source code directory
+
+ **Output file:** `{planning_artifacts}/brownfield-scan-dead-code.md`
+
+ **Invocation model:** Spawned via Agent tool in a single message alongside 6 other deep analysis scan subagents (parallel execution per architecture 10.15.2).
+
+ ## Subagent Prompt
+
+ ```
+ You are a Dead Code & Dead State Scanner for brownfield project analysis. Your task is to discover dead code, unused state, and abandoned functionality in the target project using LLM-based static analysis (grep/glob/read), then report findings using the standardized gap schema format.
+
+ ### Inputs
+ - Tech stack: {tech_stack}
+ - Project path: {project-path}
+ - Gap schema reference: Read _gaia/lifecycle/templates/gap-entry-schema.md for the output format
+
+ ### Step 1: Universal Dead Code Detection
+
+ Apply these detection patterns regardless of tech stack.
+
+ #### 1.1 Unreachable Code Paths
+ Scan for code that can never execute:
+ - Code after unconditional `return`, `throw`, `exit`, `break`, `continue` statements
+ - Unreachable switch/match branches (default after exhaustive cases)
+ - Dead branches behind constant `false` conditions (`if (false)`, `if (0)`)
+ - Functions defined but never called anywhere in the project
+
+ #### 1.2 Unused Exports, Functions, and Classes
+ Cross-reference declarations against usage across the entire project:
+ - Grep for all exported symbols (functions, classes, constants, types)
+ - Cross-reference each export against import/require/usage statements in other files
+ - A declaration with zero references across the project is definitely unused (confidence: high)
+ - A declaration referenced only in the same file where it is defined may be dead if not exported
+
+ #### 1.3 Commented-Out Code Blocks (>5 Lines)
+ Scan for blocks of more than 5 consecutive commented lines that contain code patterns:
+ - Function definitions, class declarations, control flow (if/else, for, while, switch)
+ - Variable assignments, return statements, import/require statements
+ - Threshold is strictly greater than 5 lines — exactly 5 lines does NOT trigger detection
+ - Distinguish code comments from documentation comments (JSDoc, Javadoc, docstrings)
+
+ #### 1.4 Unused Database Artifacts (Dead State)
+ Cross-reference migration files against ORM models and query patterns:
+ - Tables or columns defined in migration files but not referenced in any ORM model, query builder, or raw SQL
+ - Indexes on columns/tables that are no longer queried
+ - Seed data for tables that are no longer used
+
+ #### 1.5 Feature Flag Staleness
+ Identify feature flags that are permanently on or permanently off:
+ - Flag variables assigned a constant value (true/false) with no conditional reassignment anywhere
+ - Feature gate checks where the flag value is always the same at every call site
+ - Determination is based on static analysis of the codebase only — no commit history analysis required
+
+ ### Step 2: Stack-Aware Pattern Detection
+
+ Apply patterns based on the detected {tech_stack}. For multi-stack projects (monorepos), apply all relevant stack patterns — each stack's patterns apply only to files matching that stack's file extensions, preventing cross-contamination.
+
+ #### Java/Spring
+ - Unused `@Service`, `@Repository`, `@Component` beans — annotated classes with no `@Autowired` or constructor injection anywhere in the project
+ - Unused `@Scheduled` methods — scheduled task methods that are defined but their containing bean is never loaded
+ - Orphaned `@Entity` classes — JPA entities not referenced by any repository or query
+ - Unused Spring `@Configuration` beans — config classes that declare beans never injected
+ - Confidence: set to `medium` for Spring beans (XML config or component scan may inject dynamically)
+
+ #### Node/Express
+ - Unused `module.exports` or `export` declarations — exported symbols never imported elsewhere
+ - Orphaned route handlers — handler functions defined but not registered in any router
+ - Unused middleware — middleware functions defined but not applied to any route or app
+ - Dead `require()` or `import` in index/barrel files — re-exported modules never consumed
+ - Unused npm scripts — scripts in package.json never referenced by other scripts or CI
+
+ #### Python/Django
+ - Unused views — view functions or classes defined in views.py but not mapped in any `urlpatterns`
+ - Unused serializers — serializer classes defined but never used in any view or viewset
+ - Orphaned management commands — commands defined but never invoked in scripts or docs
+ - Dead Celery tasks — task functions decorated with `@shared_task` or `@app.task` but never called via `.delay()` or `.apply_async()`
+ - Unused Django model methods — methods on models never called outside the model file
+
+ #### Go/Gin
+ - Unexported functions with no callers in the same package — lowercase functions never referenced
+ - Unused handler functions — HTTP handler functions not registered in any router group
+ - Dead `init()` blocks — init functions in files that are never imported
+ - Unused struct methods — methods on types never called anywhere in the project
+ - Unused interface implementations — types implementing interfaces but never used polymorphically
+
+ ### Step 3: Confidence Level Assignment
+
+ Assign confidence levels to distinguish between "definitely unused" and "possibly unused":
+
+ - **`high`** — Zero references found anywhere in the project. The code is definitely unused based on static analysis. No dynamic import, reflection, or metaprogramming patterns could reference it.
+ - **`medium`** — No direct references found, but dynamic import patterns exist in the project (e.g., `require(variable)`, `importlib.import_module()`, Spring component scanning). The code is possibly unused but dynamic references cannot be ruled out.
+ - **`low`** — The code appears unused, but reflection, metaprogramming, or runtime code generation patterns are present (e.g., Java reflection, Python `getattr()`, Go `reflect` package). Cannot confidently determine usage status.
+
+ Include a note in the `description` field explaining why certainty is limited for medium and low confidence findings.
+
+ ### Step 4: Format Output
+
+ Format all findings as gap entries using the standardized gap entry schema format:
+
+ - `category`: always `"dead-code"`
+ - `verified_by`: always `"machine-detected"`
+ - `id`: sequential `GAP-DEAD-CODE-001`, `GAP-DEAD-CODE-002`, etc.
+ - `confidence`: per Step 3 classification
+
+ Example gap entry structure:
+ ```yaml
+ gap:
+   id: "GAP-DEAD-CODE-001"
+   category: "dead-code"
+   severity: "medium"
+   title: "Unused exported function processLegacyData()"
+   description: "Function is exported but never imported elsewhere. Zero references — definitely unused."
+   evidence:
+     file: "src/utils/legacy.js"
+     line: 42
+   recommendation: "Remove the unused function or mark as deprecated."
+   verified_by: "machine-detected"
+   confidence: "high"
+ ```
+
+ All required fields must be populated:
+ - `id` — unique identifier in format `GAP-DEAD-CODE-{seq}` (zero-padded 3-digit sequence)
+ - `category` — always `"dead-code"`
+ - `severity` — impact level (critical/high/medium/low)
+ - `title` — one-line summary (max 80 chars)
+ - `description` — detailed explanation including evidence and confidence rationale
+ - `evidence` — composite object with `file` (relative path) and `line` (line number)
+ - `recommendation` — actionable fix suggestion
+ - `verified_by` — always `"machine-detected"`
+ - `confidence` — detection certainty (high/medium/low)
+
+ **Severity classification:**
+ - **critical:** Dead code that masks active security vulnerabilities or causes resource leaks
+ - **high:** Large dead code blocks (>50 lines) or dead database state causing confusion
+ - **medium:** Unused functions, classes, or exports (standard dead code)
+ - **low:** Small commented-out blocks, unused imports, stale feature flags
+
+ ### Step 5: Budget Control
+
+ Use structured schema format (~100 tokens per gap entry) — no prose descriptions.
+
+ - Maximum ~70 gap entries in the output (per NFR-024)
+ - If more than 70 findings are detected, include the 70 highest-severity entries
+ - When approaching the budget limit, prioritize higher-severity findings and summarize remaining as a count
+ - Append a budget summary section:
+ ```
+ ## Budget Summary
+ Total gaps detected: {N}. Showing top 70 by severity. Omitted: {N-70} entries ({breakdown by severity}).
+ ```
+
+ Write the complete output to: `{planning_artifacts}/brownfield-scan-dead-code.md`
+
+ The output file should have this structure:
+ ```markdown
+ # Brownfield Scan: Dead Code & Dead State
+
+ > Scanner: Dead Code & Dead State Scanner
+ > Tech Stack: {tech_stack}
+ > Date: {date}
+ > Files Scanned: {count}
+
+ ## Findings
+
+ {gap entries in standardized schema format}
+
+ ## Budget Summary (if applicable)
+
+ {truncation details if >70 entries}
+ ```
+ ```
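The strict ">5 lines" rule in section 1.3 of this scanner is easy to get wrong off by one, so here is a sketch of it. The helper and its code-pattern heuristic are illustrative assumptions (the scanner uses grep-based analysis, not this function); note that exactly 5 commented lines does not trigger.

```python
import re

# Rough heuristic for "looks like code": control flow, definitions, assignment.
CODE_HINT = re.compile(r"\b(if|for|while|return|function|class|import)\b|=")

def commented_code_blocks(lines, min_lines=5):
    """Return (start, end) 1-based line ranges of runs of MORE than
    `min_lines` consecutive '//' or '#' comment lines containing code
    patterns. Exactly `min_lines` lines does NOT trigger detection."""
    blocks, run = [], []

    def close_run():
        if len(run) > min_lines and any(CODE_HINT.search(t) for _, t in run):
            blocks.append((run[0][0], run[-1][0]))
        run.clear()

    for i, line in enumerate(lines, 1):
        stripped = line.lstrip()
        if stripped.startswith(("//", "#")):
            run.append((i, stripped.lstrip("/#").strip()))
        else:
            close_run()
    close_run()
    return blocks
```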
@@ -0,0 +1,209 @@
1
+ # Test Execution Scan — Brownfield Subagent Prompt
2
+
3
+ > **Version:** 1.0.0
4
+ > **Story:** E11-S9
5
+ > **Traces to:** FR-110, US-37, ADR-021
6
+ > **Category:** runtime-behavior (test failures map to runtime-behavior per gap-entry-schema.md)
7
+ > **Output format:** Standardized gap entry schema (`_gaia/lifecycle/templates/gap-entry-schema.md`)
8
+
9
+ ## Objective
10
+
11
+ Run the existing test suite at `{project-path}` during brownfield discovery. Capture test failures as gap entries conforming to the gap schema. This scan is **non-blocking** — failures do not halt the brownfield workflow.
12
+
13
+ ## Test Runner Auto-Detection
14
+
15
+ Detect the test runner by checking for the following files at `{project-path}`. Use the **priority order** below — select the first matching runner. For monorepo/polyglot projects, detect **all** matching runners and execute them sequentially.
16
+
17
+ ### Detection Priority Order
18
+
19
+ | Priority | File Check | Condition | Runner Command |
20
+ |----------|-----------|-----------|----------------|
21
+ | 1 | `package.json` | Has `scripts.test` defined AND value is not `"echo \"Error: no test specified\""` | `npm test` |
22
+ | 2 | `pytest.ini` / `pyproject.toml` / `setup.cfg` | `pytest.ini` exists, OR `pyproject.toml` contains `[tool.pytest]`, OR `setup.cfg` contains `[tool:pytest]` | `pytest` |
23
+ | 3 | `pom.xml` | File exists | `mvn test` |
24
+ | 4 | `build.gradle` / `build.gradle.kts` | Either file exists | `gradle test` |
25
+ | 5 | `go.mod` | File exists | `go test ./...` |
26
+ | 6 | `pubspec.yaml` | File exists | `flutter test` |
27
+
28
+ ### No Test Suite Detected (AC6)
29
+
30
+ If no test runner is detected at `{project-path}`, produce a single info-level gap entry:
31
+
32
+ ```yaml
33
+ id: "GAP-TEST-INFO-001"
34
+ category: "runtime-behavior"
35
+ severity: "info"
36
+ title: "No test suite detected"
37
+ description: "No recognized test runner configuration found at {project-path}. The project has no automated tests or uses an unsupported test framework."
38
+ evidence:
39
+ file: "{project-path}"
40
+ line: 0
41
+ recommendation: "Add a test framework (Jest, pytest, JUnit, etc.) and write initial unit tests for critical paths."
42
+ verified_by: "machine-detected"
43
+ confidence: "high"
44
+ ```
45
+
46
+ Proceed without error after logging this gap.
47
+
48
+ ## Test Execution with Timeout (AC3)
49
+
50
+ Execute each detected runner with a configurable timeout (default **5 minutes** / 300 seconds).
51
+
52
+ ```bash
53
+ timeout 300 npm test 2>&1
54
+ ```
55
+
56
+ ### Timeout Behavior
57
+
58
+ - If the timeout is exceeded, terminate the process gracefully
59
+ - Capture partial results from stdout/stderr up to the timeout point
60
+ - Log a warning-level gap entry noting the timeout:
61
+
62
+ ```yaml
63
+ id: "GAP-TEST-{seq}"
64
+ category: "runtime-behavior"
65
+ severity: "medium"
66
+ title: "Test suite timed out after 5 minutes"
67
+ description: "Test execution exceeded the 300s timeout. Partial results captured: {N} tests ran before timeout."
68
+ evidence:
69
+ file: "{test-config-file}"
70
+ line: 0
71
+ recommendation: "Investigate slow tests. Consider splitting the test suite or increasing the timeout for CI."
72
+ verified_by: "machine-detected"
73
+ confidence: "medium"
74
+ ```
75
+
76
+ ### Sequential Execution for Multiple Runners (AC9)
77
+
78
+ For monorepo or polyglot projects with multiple test runners detected:
79
+ 1. Execute each detected runner sequentially (not in parallel)
80
+ 2. Aggregate results across all runners
81
+ 3. Include the runner name in the `description` field of each gap entry (e.g., "npm test: ...")
82
+ 4. Use a shared sequence counter across all runners for gap entry IDs

## Output Parsing (AC4)

After each test run completes (or times out), parse the output to extract metrics:
- **Total** test count
- **Passing** count
- **Failing** count
- **Skipped** count
- **Error messages** for each failing test

### Parsing Patterns by Runner

**Jest/Mocha/Vitest:**
- Summary line: `Tests: N passed, N failed, N total` (Jest) or `Tests N passed | N failed` (Vitest)
- Individual: `FAIL src/path/to/test.js` / `PASS src/path/to/test.js`
- Exit code 1 = failures present

**pytest:**
- Summary line: `N passed, N failed, N error`
- Individual: `FAILED test_file.py::test_name`

**Maven Surefire:**
- Summary in `target/surefire-reports/` XML files
- Console: `Tests run: N, Failures: N, Errors: N, Skipped: N`

**Go test:**
- Per-test: `--- FAIL: TestName` / `--- PASS: TestName`
- Summary: `FAIL` or `ok` per package

**Flutter test:**
- Summary: `All tests passed!` or `N tests failed`
- Per-test: `FAILED: test description`
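
As an illustration, summary-line extraction for two of the runners above can be done with regexes (the patterns are loose approximations of typical output, not exhaustive):

```python
import re

def parse_summary(output, runner):
    """Return (passed, failed, total) from a runner's summary line."""
    passed = re.search(r"(\d+) passed", output)
    failed = re.search(r"(\d+) failed", output)
    n_passed = int(passed.group(1)) if passed else 0
    n_failed = int(failed.group(1)) if failed else 0
    if runner == "jest":
        # e.g. "Tests:       2 failed, 5 passed, 7 total"
        total = re.search(r"(\d+) total", output)
        n_total = int(total.group(1)) if total else n_passed + n_failed
    elif runner == "pytest":
        # e.g. "2 failed, 5 passed in 1.23s" (no explicit total)
        n_total = n_passed + n_failed
    else:
        raise ValueError(f"unsupported runner: {runner}")
    return n_passed, n_failed, n_total
```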

## Infrastructure Error Detection (AC8)

Before converting failures to gap entries, check if the error is an infrastructure dependency failure rather than an actual test failure.

**Infrastructure error heuristics:**
- Pattern match stderr/stdout for: `ECONNREFUSED`, `connection refused`, `missing environment variable`, `ENOENT`, `docker`, `database connection`, `redis`, `ETIMEDOUT`, `EHOSTUNREACH`
- Exit codes indicating non-test errors (e.g., process crash, missing binary)
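
The pattern heuristic above reduces to a case-insensitive substring scan; a minimal sketch (the pattern list mirrors this section and is deliberately not exhaustive):

```python
# Substrings suggesting a missing dependency rather than a test-logic failure.
INFRA_PATTERNS = [
    "ECONNREFUSED", "connection refused", "missing environment variable",
    "ENOENT", "docker", "database connection", "redis",
    "ETIMEDOUT", "EHOSTUNREACH",
]

def detect_infra_error(output):
    """Return the first matching infrastructure pattern, or None."""
    lowered = output.lower()
    for pattern in INFRA_PATTERNS:
        if pattern.lower() in lowered:
            return pattern
    return None
```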

If an infrastructure error is detected:
- Do NOT convert to test failure gap entries
- Instead, log a **medium-severity** gap entry:

```yaml
id: "GAP-TEST-{seq}"
category: "runtime-behavior"
severity: "medium"
title: "Test infrastructure dependency unavailable"
description: "Test execution failed due to infrastructure dependency: {detected_pattern}. This is not a test logic failure."
evidence:
  file: "{test-config-file}"
  line: 0
recommendation: "Ensure required infrastructure (databases, caches, external services) is available before running tests. Consider using test doubles for external dependencies."
verified_by: "machine-detected"
confidence: "medium"
```

## Gap Entry Conversion (AC5)

For each failing test, produce a gap entry conforming to the standardized gap schema:

### ID Format

- `GAP-TEST-{seq}` where `{seq}` is a zero-padded 3-digit sequence (001, 002, ...)
- Example: `GAP-TEST-001`, `GAP-TEST-002`

### Severity Mapping by Test Type

Infer test type from file path patterns:

| File Path Pattern | Test Type | Severity |
|-------------------|-----------|----------|
| `test/unit/`, `tests/unit/`, `__tests__/`, `*.unit.test.*` | unit | medium |
| `test/integration/`, `tests/integration/`, `*.integration.test.*` | integration | high |
| `test/e2e/`, `tests/e2e/`, `test/end-to-end/`, `*.e2e.test.*` | e2e | critical |
| Cannot be determined | default | medium |
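
The table above translates into an ordered rule list; a sketch, checked most-severe first so an ambiguous path gets the higher severity (function and rule names are illustrative):

```python
from fnmatch import fnmatch

SEVERITY_RULES = [  # checked in order: e2e, then integration, then unit
    (("test/e2e/", "tests/e2e/", "test/end-to-end/", "*.e2e.test.*"), "critical"),
    (("test/integration/", "tests/integration/", "*.integration.test.*"), "high"),
    (("test/unit/", "tests/unit/", "__tests__/", "*.unit.test.*"), "medium"),
]

def severity_for(test_path):
    """Map a failing test's file path to a gap severity (default: medium)."""
    filename = test_path.rsplit("/", 1)[-1]
    for patterns, severity in SEVERITY_RULES:
        for pat in patterns:
            if "*" in pat:
                if fnmatch(filename, pat):  # glob patterns match the filename
                    return severity
            elif pat in test_path:          # directory patterns match anywhere
                return severity
    return "medium"  # test type cannot be determined
```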

### Gap Entry Template

```yaml
id: "GAP-TEST-{seq}"
category: "runtime-behavior"
severity: "{severity_from_test_type}"
title: "Failing test: {test_name}"
description: "{runner_name}: {error_message}"
evidence:
  file: "{test_file_path}"
  line: "{line_number_if_available}"
recommendation: "Fix the failing test or update the test to match current behavior."
verified_by: "machine-detected"
confidence: "high"
```

## Token Budget Control (AC7)

Per NFR-024, the total output must stay within the 40K token framework budget.

- Each gap entry averages ~100 tokens
- If test output produces more than 70 gap entries, truncate:
  - Keep the 70 highest-severity gap entries (critical > high > medium > low)
  - Add a summary line: `<!-- TRUNCATED: {N} additional test failures omitted to stay within NFR-024 token budget -->`
- If raw test output exceeds budget before parsing, truncate the raw output and parse what is available
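
The truncation rule can be sketched as a severity-ranked cut (the 70-entry cap follows from the ~100-token average against the 40K budget; the function name is illustrative):

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

def truncate_gaps(gaps, cap=70):
    """Keep at most `cap` entries, preferring higher severities (the stable
    sort preserves original order within a severity). Returns the kept
    entries plus a truncation marker, or None if nothing was dropped."""
    if len(gaps) <= cap:
        return gaps, None
    ranked = sorted(gaps, key=lambda g: SEVERITY_RANK.get(g["severity"], 9))
    marker = (f"<!-- TRUNCATED: {len(gaps) - cap} additional test failures "
              "omitted to stay within NFR-024 token budget -->")
    return ranked[:cap], marker
```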

## Output File

Write all gap entries to: `{planning_artifacts}/brownfield-scan-test-execution.md`

Format:
```markdown
# Brownfield Scan: Test Execution

> Scan type: test-execution
> Runner(s): {detected_runners}
> Date: {date}

## Test Metrics

| Runner | Total | Passed | Failed | Skipped |
|--------|-------|--------|--------|---------|
| {runner} | {n} | {n} | {n} | {n} |

## Gap Entries

{YAML gap entries here}
```