npm - @ryuenn3123/agentic-senior-core - Versions diffs - 2.0.13 → 2.0.15 - Mend

@ryuenn3123/agentic-senior-core 2.0.13 → 2.0.15

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/.agent-context/blueprints/laravel-api.md +13 -3
package/.agent-context/prompts/init-project.md +1 -1
package/.agent-context/stacks/php.md +19 -5
package/.agent-context/state/benchmark-evidence-bundle.json +390 -0
package/.agent-context/state/benchmark-reproducibility.json +85 -0
package/.cursorrules +1 -1
package/.windsurfrules +1 -1
package/README.md +17 -0
package/package.json +2 -1
package/scripts/benchmark-evidence-bundle.mjs +175 -0
package/scripts/validate.mjs +2 -0

package/.agent-context/blueprints/laravel-api.md CHANGED

@@ -1,12 +1,12 @@
 # Blueprint: Laravel API
-> PHP backend API service using Laravel 12, PHP 8.5+, Form Requests, Eloquent, and Scribe for docs.
+> PHP backend API service using Laravel 13, PHP 8.3+, Form Requests, Eloquent, and Scribe for docs.
 ## Tech Stack
 | Layer | Technology |
 |-------|-----------|
-| Framework | Laravel 12 |
+| Framework | Laravel 13 |
 | Validation | Form Requests |
 | ORM | Eloquent |
 | Migration | Laravel Migrations |
@@ -15,6 +15,14 @@
 | Formatting | Laravel Pint |
 | API docs | Scribe |
+## Laravel 13 Upgrade Guardrails
+- Target `laravel/framework:^13.0` with PHP 8.3+.
+- Use `PreventRequestForgery` when explicitly disabling or excluding CSRF middleware in tests and routes.
+- Keep `upsert` calls explicit with a non-empty `uniqueBy` value for MySQL and MariaDB paths.
+- Decide cache object strategy up front: primitive payloads, or explicit `serializable_classes` allow-list.
+- For existing Laravel 12 projects, keep framework-12-compatible middleware and APIs until upgrade is done; treat this blueprint as target-state guidance.
 ---
 ## Project Structure
@@ -210,7 +218,9 @@ final class UserResource extends JsonResource
 ## Scaffolding Checklist
-- [ ] Create Laravel project: `composer create-project laravel/laravel`
+- [ ] Create Laravel project: `composer create-project laravel/laravel:^13.0`
+- [ ] Confirm core dependencies: `laravel/framework:^13.0`, `laravel/tinker:^3.0`, `phpunit/phpunit:^12.0`, `pestphp/pest:^4.0`
+- [ ] Optional AI workflow: install `laravel/boost:^2.0` and run `php artisan boost:install`
 - [ ] Set up modular structure under `app/Modules/`
 - [ ] Create shared error handler with consistent JSON responses
 - [ ] Create shared `ApiResponse` trait for standard response format

package/.agent-context/prompts/init-project.md CHANGED

@@ -74,7 +74,7 @@ Every dependency MUST be justified per rules/efficiency-vs-hype.md.
 | `api-nextjs` | Next.js App Router API project |
 | `nestjs-logic` | NestJS backend service |
 | `fastapi-service` | Python FastAPI backend service |
-| `laravel-api` | PHP Laravel 12 API |
+| `laravel-api` | PHP Laravel 13 API |
 | `spring-boot-api`| Java Spring Boot 4 API |
 | `go-service` | Go chi HTTP service |
 | `aspnet-api` | C# ASP.NET Minimal API |

package/.agent-context/stacks/php.md CHANGED

@@ -3,9 +3,9 @@
 > PHP 8.x is a different language from PHP 5.
 > If your AI writes PHP without type declarations, reject it immediately.
-## Language Version: PHP 8.5+ (Latest Stable)
+## Language Version: PHP 8.3+ (Laravel 13 Baseline, 8.5 Recommended)
-PHP 8.5 is stable since November 2025. Use modern PHP features including the pipe operator (`|>`), `Clone With`, and readonly classes.
+Laravel 13 requires PHP 8.3+. Use PHP 8.5 when your runtime supports it, but avoid forcing 8.5-only syntax in shared packages unless project constraints explicitly require it.
 ### Strict Types Everywhere
 ```php
@@ -42,7 +42,7 @@ enum OrderStatus: string {
 }
 ```
-### Readonly Properties and Classes (PHP 8.2+) and Pipe Operator (PHP 8.5+)
+### Readonly Properties and Classes (PHP 8.2+)
 ```php
 // Readonly for DTOs and value objects
 readonly class CreateUserDto {
@@ -52,8 +52,11 @@ readonly class CreateUserDto {
         public int $age,
     ) {}
 }
+```
-// Pipe operator for cleaner function chains (PHP 8.5)
+### Optional on PHP 8.5+: Pipe Operator
+```php
+// Use when your project runtime is locked to PHP 8.5+
 $result = $input
     |> 'trim'
     |> 'strtolower'
@@ -151,7 +154,7 @@ parameters:
 | Need | Library | Why |
 |------|---------|-----|
-| Framework | Laravel 12 | Most productive PHP framework, auto eager loading, GraphQL |
+| Framework | Laravel 13 | Most productive PHP framework with AI SDK, JSON:API resources, and stronger security defaults |
 | Validation | Laravel Form Requests | Built-in, declarative |
 | ORM | Eloquent | Convention over configuration |
 | Testing | PHPUnit / Pest | Pest preferred for readability |
@@ -164,6 +167,17 @@ parameters:
 ---
+## Laravel 13 Guardrails
+- Use `PreventRequestForgery` for explicit CSRF middleware references (old aliases still exist but are deprecated).
+- Ensure `upsert(..., uniqueBy: ...)` always passes a non-empty `uniqueBy` value.
+- Prefer first-party JSON:API resources when you need JSON:API-compliant responses.
+- If caching objects, configure `cache.serializable_classes` allow-list explicitly.
+- For AI-assisted Laravel projects, use `laravel/boost` `^2.0` and run `php artisan boost:install`.
+- Laravel 12 projects are still supported: keep `VerifyCsrfToken` and avoid 13-only API assumptions until framework upgrade is complete.
+---
 ## Banned Patterns
 | Pattern | Why | Alternative |

package/.agent-context/state/benchmark-evidence-bundle.json ADDED

@@ -0,0 +1,390 @@
+{
+  "generatedAt": "2026-04-13T15:56:01.200Z",
+  "reportName": "benchmark-evidence-bundle",
+  "phase": "v2.5.1",
+  "passed": true,
+  "failureCount": 0,
+  "methodology": {
+    "deterministicRuntime": {
+      "timezone": "UTC",
+      "locale": "C",
+      "nodeMajor": "22",
+      "lineEndings": "LF-preferred",
+      "shellNotes": "PowerShell and POSIX shells are supported; prefer portable commands for benchmark reruns."
+    },
+    "scenarioCount": 4,
+    "commandCount": 5
+  },
+  "rerunInstructions": [
+    "Run npm run benchmark:detection to regenerate detection benchmark output.",
+    "Run npm run benchmark:gate to validate benchmark anti-regression thresholds.",
+    "Run npm run benchmark:intelligence to validate benchmark watchlist freshness.",
+    "Run npm run benchmark:bundle to emit a reproducible benchmark evidence bundle."
+  ],
+  "commandExamples": [
+    "npm run benchmark:detection",
+    "npm run benchmark:gate",
+    "npm run benchmark:intelligence",
+    "npm run benchmark:bundle",
+    "node ./scripts/benchmark-evidence-bundle.mjs --stdout-only"
+  ],
+  "rawInputs": {
+    "scenarios": [
+      {
+        "id": "planning",
+        "category": "planning",
+        "inputReferences": [
+          ".agent-context/state/architecture-map.md",
+          ".agent-context/state/dependency-map.md",
+          ".agent-context/rules/architecture.md"
+        ],
+        "expectedSignals": [
+          "clear sequencing",
+          "risk mapping",
+          "rollback path"
+        ],
+        "primaryCommand": "npm run benchmark:detection"
+      },
+      {
+        "id": "refactor",
+        "category": "refactor",
+        "inputReferences": [
+          "tests/cli-smoke.test.mjs",
+          "scripts/validate.mjs"
+        ],
+        "expectedSignals": [
+          "regression awareness",
+          "small safe diffs",
+          "test-backed changes"
+        ],
+        "primaryCommand": "npm run benchmark:gate"
+      },
+      {
+        "id": "security",
+        "category": "security",
+        "inputReferences": [
+          ".agent-context/rules/security.md",
+          "scripts/forbidden-content-check.mjs"
+        ],
+        "expectedSignals": [
+          "secret hygiene",
+          "unsafe pattern detection",
+          "release blocking on risk"
+        ],
+        "primaryCommand": "npm run gate:release"
+      },
+      {
+        "id": "delivery",
+        "category": "delivery",
+        "inputReferences": [
+          "scripts/release-gate.mjs",
+          "scripts/benchmark-intelligence.mjs",
+          ".agent-context/state/benchmark-watchlist.json"
+        ],
+        "expectedSignals": [
+          "release readiness",
+          "competitive coverage",
+          "SLA freshness"
+        ],
+        "primaryCommand": "npm run benchmark:intelligence"
+      }
+    ],
+    "benchmarkThresholds": {
+      "minimumTop1Accuracy": 0.9,
+      "maximumManualCorrectionRate": 0.12,
+      "maximumTop1AccuracyDrop": 0.02,
+      "maximumManualCorrectionIncrease": 0.03,
+      "previousReleaseBaseline": {
+        "top1Accuracy": 0.9167,
+        "manualCorrectionRate": 0.0833
+      }
+    },
+    "benchmarkWatchlist": [
+      {
+        "repository": "sickn33/antigravity-awesome-skills",
+        "owner": "core-architecture",
+        "lastReviewedAt": "2026-04-02"
+      },
+      {
+        "repository": "github/awesome-copilot",
+        "owner": "core-architecture",
+        "lastReviewedAt": "2026-04-02"
+      },
+      {
+        "repository": "MiniMax-AI/skills",
+        "owner": "frontend-governance",
+        "lastReviewedAt": "2026-04-02"
+      }
+    ]
+  },
+  "rubric": {
+    "benchmarkThresholds": {
+      "minimumTop1Accuracy": 0.9,
+      "maximumManualCorrectionRate": 0.12,
+      "maximumTop1AccuracyDrop": 0.02,
+      "maximumManualCorrectionIncrease": 0.03
+    },
+    "intelligenceSlaDays": 14
+  },
+  "outputs": {
+    "detectionBenchmark": {
+      "generatedAt": "2026-04-13T15:56:01.040Z",
+      "fixtureCount": 12,
+      "top1Accuracy": 0.9167,
+      "manualCorrectionRate": 0.0833,
+      "fixtures": [
+        {
+          "fixtureName": "typescript-basic",
+          "expectedStack": "typescript.md",
+          "detectedStack": "typescript.md",
+          "confidenceGap": 0.94,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "typescript-next",
+          "expectedStack": "typescript.md",
+          "detectedStack": "typescript.md",
+          "confidenceGap": 0.97,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "python-poetry",
+          "expectedStack": "python.md",
+          "detectedStack": "python.md",
+          "confidenceGap": 0.96,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "python-requirements",
+          "expectedStack": "python.md",
+          "detectedStack": "python.md",
+          "confidenceGap": 0.78,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "java-maven",
+          "expectedStack": "java.md",
+          "detectedStack": "java.md",
+          "confidenceGap": 0.95,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "java-gradle",
+          "expectedStack": "java.md",
+          "detectedStack": "java.md",
+          "confidenceGap": 0.84,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "php-composer",
+          "expectedStack": "php.md",
+          "detectedStack": "php.md",
+          "confidenceGap": 0.95,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "go-module",
+          "expectedStack": "go.md",
+          "detectedStack": "go.md",
+          "confidenceGap": 0.96,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "dotnet-solution",
+          "expectedStack": "csharp.md",
+          "detectedStack": "csharp.md",
+          "confidenceGap": 0.95,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "rust-cargo",
+          "expectedStack": "rust.md",
+          "detectedStack": "rust.md",
+          "confidenceGap": 0.96,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "ruby-gemfile",
+          "expectedStack": "ruby.md",
+          "detectedStack": "ruby.md",
+          "confidenceGap": 0.95,
+          "needsManualCorrection": false,
+          "isCorrect": true
+        },
+        {
+          "fixtureName": "mixed-ts-python",
+          "expectedStack": "typescript.md",
+          "detectedStack": "python.md",
+          "confidenceGap": 0.02,
+          "needsManualCorrection": true,
+          "isCorrect": false
+        }
+      ]
+    },
+    "benchmarkGate": {
+      "generatedAt": "2026-04-13T15:56:01.144Z",
+      "gateName": "benchmark-gate",
+      "passed": true,
+      "failureCount": 0,
+      "benchmarkResult": {
+        "fixtureCount": 12,
+        "top1Accuracy": 0.9167,
+        "manualCorrectionRate": 0.0833
+      },
+      "thresholds": {
+        "minimumTop1Accuracy": 0.9,
+        "maximumManualCorrectionRate": 0.12,
+        "maximumTop1AccuracyDrop": 0.02,
+        "maximumManualCorrectionIncrease": 0.03,
+        "previousReleaseBaseline": {
+          "top1Accuracy": 0.9167,
+          "manualCorrectionRate": 0.0833
+        }
+      },
+      "results": [
+        {
+          "checkName": "minimum-top1-accuracy",
+          "passed": true,
+          "details": "top1Accuracy=0.9167 minimum=0.9"
+        },
+        {
+          "checkName": "maximum-manual-correction-rate",
+          "passed": true,
+          "details": "manualCorrectionRate=0.0833 maximum=0.12"
+        },
+        {
+          "checkName": "maximum-top1-accuracy-drop",
+          "passed": true,
+          "details": "drop=0 maximum=0.02"
+        },
+        {
+          "checkName": "maximum-manual-correction-increase",
+          "passed": true,
+          "details": "increase=0 maximum=0.03"
+        }
+      ]
+    },
+    "benchmarkIntelligence": {
+      "generatedAt": "2026-04-13T15:56:01.192Z",
+      "reportName": "benchmark-intelligence",
+      "passed": true,
+      "failureCount": 0,
+      "reviewSlaDays": 14,
+      "watchlist": [
+        {
+          "repository": "sickn33/antigravity-awesome-skills",
+          "owner": "core-architecture",
+          "lastReviewedAt": "2026-04-02",
+          "ageInDays": 11,
+          "stale": false
+        },
+        {
+          "repository": "github/awesome-copilot",
+          "owner": "core-architecture",
+          "lastReviewedAt": "2026-04-02",
+          "ageInDays": 11,
+          "stale": false
+        },
+        {
+          "repository": "MiniMax-AI/skills",
+          "owner": "frontend-governance",
+          "lastReviewedAt": "2026-04-02",
+          "ageInDays": 11,
+          "stale": false
+        }
+      ],
+      "results": [
+        {
+          "checkName": "required-benchmark-repository",
+          "repository": "sickn33/antigravity-awesome-skills",
+          "passed": true,
+          "details": "sickn33/antigravity-awesome-skills is present in watchlist"
+        },
+        {
+          "checkName": "required-benchmark-repository",
+          "repository": "github/awesome-copilot",
+          "passed": true,
+          "details": "github/awesome-copilot is present in watchlist"
+        },
+        {
+          "checkName": "required-benchmark-repository",
+          "repository": "MiniMax-AI/skills",
+          "passed": true,
+          "details": "MiniMax-AI/skills is present in watchlist"
+        },
+        {
+          "checkName": "watchlist-owner-defined",
+          "repository": "sickn33/antigravity-awesome-skills",
+          "passed": true,
+          "details": "Owner core-architecture is defined"
+        },
+        {
+          "checkName": "review-sla-compliance",
+          "repository": "sickn33/antigravity-awesome-skills",
+          "passed": true,
+          "details": "ageInDays=11 slaDays=14"
+        },
+        {
+          "checkName": "watchlist-owner-defined",
+          "repository": "github/awesome-copilot",
+          "passed": true,
+          "details": "Owner core-architecture is defined"
+        },
+        {
+          "checkName": "review-sla-compliance",
+          "repository": "github/awesome-copilot",
+          "passed": true,
+          "details": "ageInDays=11 slaDays=14"
+        },
+        {
+          "checkName": "watchlist-owner-defined",
+          "repository": "MiniMax-AI/skills",
+          "passed": true,
+          "details": "Owner frontend-governance is defined"
+        },
+        {
+          "checkName": "review-sla-compliance",
+          "repository": "MiniMax-AI/skills",
+          "passed": true,
+          "details": "ageInDays=11 slaDays=14"
+        }
+      ]
+    }
+  },
+  "executions": [
+    {
+      "scriptPath": "scripts/detection-benchmark.mjs",
+      "exitCode": 0,
+      "parseError": null,
+      "stderr": null,
+      "reportName": null,
+      "passed": null
+    },
+    {
+      "scriptPath": "scripts/benchmark-gate.mjs",
+      "exitCode": 0,
+      "parseError": null,
+      "stderr": null,
+      "reportName": "benchmark-gate",
+      "passed": true
+    },
+    {
+      "scriptPath": "scripts/benchmark-intelligence.mjs",
+      "exitCode": 0,
+      "parseError": null,
+      "stderr": null,
+      "reportName": "benchmark-intelligence",
+      "passed": true
+    }
+  ]
+}
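
The gate numbers recorded in the bundle above can be reproduced by hand. The following is a minimal sketch, not the package's implementation: the source of `scripts/detection-benchmark.mjs` and `scripts/benchmark-gate.mjs` is not part of this diff, so the helper names (`roundTo4`, `summarizeFixtures`, `evaluateGate`), the four-decimal rounding, and the clamping of drops to zero are all assumptions chosen to match the recorded values.

```javascript
// Hypothetical helpers; only the field names and check names come from the bundle JSON.

// Assumption: metrics are rounded to four decimals, as the recorded 0.9167 / 0.0833 suggest.
function roundTo4(value) {
  return Math.round(value * 10000) / 10000;
}

// Derive the headline metrics from a list of fixture results.
function summarizeFixtures(fixtures) {
  const fixtureCount = fixtures.length;
  const correctCount = fixtures.filter((fixture) => fixture.isCorrect).length;
  const correctionCount = fixtures.filter((fixture) => fixture.needsManualCorrection).length;
  return {
    fixtureCount,
    top1Accuracy: roundTo4(correctCount / fixtureCount),
    manualCorrectionRate: roundTo4(correctionCount / fixtureCount),
  };
}

// Evaluate the four checks listed under outputs.benchmarkGate.results.
// Assumption: improvements over the baseline are clamped to a drop/increase of 0.
function evaluateGate(result, thresholds) {
  const baseline = thresholds.previousReleaseBaseline;
  const drop = Math.max(0, roundTo4(baseline.top1Accuracy - result.top1Accuracy));
  const increase = Math.max(0, roundTo4(result.manualCorrectionRate - baseline.manualCorrectionRate));
  return [
    {
      checkName: 'minimum-top1-accuracy',
      passed: result.top1Accuracy >= thresholds.minimumTop1Accuracy,
      details: `top1Accuracy=${result.top1Accuracy} minimum=${thresholds.minimumTop1Accuracy}`,
    },
    {
      checkName: 'maximum-manual-correction-rate',
      passed: result.manualCorrectionRate <= thresholds.maximumManualCorrectionRate,
      details: `manualCorrectionRate=${result.manualCorrectionRate} maximum=${thresholds.maximumManualCorrectionRate}`,
    },
    {
      checkName: 'maximum-top1-accuracy-drop',
      passed: drop <= thresholds.maximumTop1AccuracyDrop,
      details: `drop=${drop} maximum=${thresholds.maximumTop1AccuracyDrop}`,
    },
    {
      checkName: 'maximum-manual-correction-increase',
      passed: increase <= thresholds.maximumManualCorrectionIncrease,
      details: `increase=${increase} maximum=${thresholds.maximumManualCorrectionIncrease}`,
    },
  ];
}

// Same shape as the bundle: 11 correct fixtures plus the mixed-ts-python miss.
const sampleFixtures = Array.from({ length: 11 }, () => ({ isCorrect: true, needsManualCorrection: false }));
sampleFixtures.push({ isCorrect: false, needsManualCorrection: true });

const sampleThresholds = {
  minimumTop1Accuracy: 0.9,
  maximumManualCorrectionRate: 0.12,
  maximumTop1AccuracyDrop: 0.02,
  maximumManualCorrectionIncrease: 0.03,
  previousReleaseBaseline: { top1Accuracy: 0.9167, manualCorrectionRate: 0.0833 },
};

const sampleSummary = summarizeFixtures(sampleFixtures);
const sampleChecks = evaluateGate(sampleSummary, sampleThresholds);
console.log(sampleSummary); // top1Accuracy 0.9167, manualCorrectionRate 0.0833
console.log(sampleChecks.map((check) => `${check.checkName}: ${check.passed}`));
```

With 12 fixtures, a single miss already puts accuracy at 0.9167, and a second miss would give 10/12 ≈ 0.8333, which falls below the 0.9 minimum. The gate therefore tolerates exactly one incorrect detection at the current fixture count.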

package/.agent-context/state/benchmark-reproducibility.json ADDED

@@ -0,0 +1,85 @@
+{
+  "version": "1.0.0",
+  "phase": "v2.5.1",
+  "updatedAt": "2026-04-13",
+  "deterministicRuntime": {
+    "timezone": "UTC",
+    "locale": "C",
+    "nodeMajor": "22",
+    "lineEndings": "LF-preferred",
+    "shellNotes": "PowerShell and POSIX shells are supported; prefer portable commands for benchmark reruns."
+  },
+  "scenarios": [
+    {
+      "id": "planning",
+      "category": "planning",
+      "inputReferences": [
+        ".agent-context/state/architecture-map.md",
+        ".agent-context/state/dependency-map.md",
+        ".agent-context/rules/architecture.md"
+      ],
+      "expectedSignals": [
+        "clear sequencing",
+        "risk mapping",
+        "rollback path"
+      ],
+      "primaryCommand": "npm run benchmark:detection"
+    },
+    {
+      "id": "refactor",
+      "category": "refactor",
+      "inputReferences": [
+        "tests/cli-smoke.test.mjs",
+        "scripts/validate.mjs"
+      ],
+      "expectedSignals": [
+        "regression awareness",
+        "small safe diffs",
+        "test-backed changes"
+      ],
+      "primaryCommand": "npm run benchmark:gate"
+    },
+    {
+      "id": "security",
+      "category": "security",
+      "inputReferences": [
+        ".agent-context/rules/security.md",
+        "scripts/forbidden-content-check.mjs"
+      ],
+      "expectedSignals": [
+        "secret hygiene",
+        "unsafe pattern detection",
+        "release blocking on risk"
+      ],
+      "primaryCommand": "npm run gate:release"
+    },
+    {
+      "id": "delivery",
+      "category": "delivery",
+      "inputReferences": [
+        "scripts/release-gate.mjs",
+        "scripts/benchmark-intelligence.mjs",
+        ".agent-context/state/benchmark-watchlist.json"
+      ],
+      "expectedSignals": [
+        "release readiness",
+        "competitive coverage",
+        "SLA freshness"
+      ],
+      "primaryCommand": "npm run benchmark:intelligence"
+    }
+  ],
+  "rerunInstructions": [
+    "Run npm run benchmark:detection to regenerate detection benchmark output.",
+    "Run npm run benchmark:gate to validate benchmark anti-regression thresholds.",
+    "Run npm run benchmark:intelligence to validate benchmark watchlist freshness.",
+    "Run npm run benchmark:bundle to emit a reproducible benchmark evidence bundle."
+  ],
+  "commandExamples": [
+    "npm run benchmark:detection",
+    "npm run benchmark:gate",
+    "npm run benchmark:intelligence",
+    "npm run benchmark:bundle",
+    "node ./scripts/benchmark-evidence-bundle.mjs --stdout-only"
+  ]
+}
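
The rerun instructions above describe `npm run benchmark:intelligence` as validating watchlist freshness, and the evidence bundle in this release records `ageInDays=11 slaDays=14` for entries last reviewed on 2026-04-02. The sketch below shows one way such a freshness check could be computed. It is an illustration under assumptions: `computeAgeInDays` and `checkReviewSla` are hypothetical names, and treating an entry as stale only when its age exceeds the SLA (rather than equals it) is a guess, since `scripts/benchmark-intelligence.mjs` is not shown in this diff.

```javascript
const MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed since the review date.
// A date-only ISO string such as "2026-04-02" is parsed as UTC midnight,
// which matches the UTC timezone pinned in deterministicRuntime.
function computeAgeInDays(lastReviewedAt, now) {
  const reviewedAtMs = Date.parse(lastReviewedAt);
  return Math.floor((now.getTime() - reviewedAtMs) / MILLISECONDS_PER_DAY);
}

// Assumption: an entry is stale only once its age is strictly greater than the SLA.
function checkReviewSla(watchlistEntry, now, slaDays) {
  const ageInDays = computeAgeInDays(watchlistEntry.lastReviewedAt, now);
  return {
    repository: watchlistEntry.repository,
    ageInDays,
    stale: ageInDays > slaDays,
    details: `ageInDays=${ageInDays} slaDays=${slaDays}`,
  };
}

// Timestamp taken from the benchmarkIntelligence report in the evidence bundle.
const reportTime = new Date('2026-04-13T15:56:01.192Z');
const sampleEntry = { repository: 'github/awesome-copilot', lastReviewedAt: '2026-04-02' };
console.log(checkReviewSla(sampleEntry, reportTime, 14)); // ageInDays 11, stale false
```

Passing `now` in as a parameter instead of calling `new Date()` inside the helper keeps the check deterministic, which matters for a benchmark that is meant to be rerun and compared.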

package/.cursorrules CHANGED

@@ -1,6 +1,6 @@
 # AGENTIC-SENIOR-CORE DYNAMIC GOVERNANCE RULESET
-Generated by Agentic-Senior-Core CLI v2.0.13
+Generated by Agentic-Senior-Core CLI v2.0.15
 Timestamp: 2026-04-08T14:58:53.570Z
 Selected profile: beginner
 Selected policy file: .agent-context/policies/llm-judge-threshold.json

package/.windsurfrules CHANGED

@@ -1,6 +1,6 @@
 # AGENTIC-SENIOR-CORE DYNAMIC GOVERNANCE RULESET
-Generated by Agentic-Senior-Core CLI v2.0.13
+Generated by Agentic-Senior-Core CLI v2.0.15
 Timestamp: 2026-04-08T14:58:53.570Z
 Selected profile: beginner
 Selected policy file: .agent-context/policies/llm-judge-threshold.json

package/README.md CHANGED

@@ -244,6 +244,23 @@ Reproduce and refresh this table:
 npm run benchmark:token
 ```
+### Benchmark Evidence Bundle (V2.5.1 Baseline)
+Generate a reproducible benchmark evidence artifact (inputs, rubric, rerun instructions, and outputs):
+```bash
+npm run benchmark:bundle
+```
+This command writes:
+- `.agent-context/state/benchmark-evidence-bundle.json`
+For CI pipelines that only need stdout JSON:
+```bash
+node ./scripts/benchmark-evidence-bundle.mjs --stdout-only
+```
 ### Install and Setup Choices
 The CLI now supports a smaller decision surface for first-time setup:

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@ryuenn3123/agentic-senior-core",
-  "version": "2.0.13",
+  "version": "2.0.15",
   "type": "module",
   "description": "Force your AI Agent to code like a Staff Engineer, not a Junior.",
   "bin": {
@@ -48,6 +48,7 @@
     "sbom:generate": "node ./scripts/generate-sbom.mjs",
     "benchmark:detection": "node ./scripts/detection-benchmark.mjs",
     "benchmark:token": "node ./scripts/token-optimization-benchmark.mjs",
+    "benchmark:bundle": "node ./scripts/benchmark-evidence-bundle.mjs",
     "benchmark:gate": "node ./scripts/benchmark-gate.mjs",
     "benchmark:intelligence": "node ./scripts/benchmark-intelligence.mjs",
     "report:quality-trend": "node ./scripts/quality-trend-report.mjs",

package/scripts/benchmark-evidence-bundle.mjs ADDED

@@ -0,0 +1,175 @@
+#!/usr/bin/env node
+/**
+ * benchmark-evidence-bundle.mjs
+ *
+ * V2.5.1 reproducibility baseline artifact.
+ * Aggregates benchmark inputs, rubric, command examples, and outputs
+ * into a single machine-readable evidence bundle.
+ */
+import { existsSync, readFileSync } from 'node:fs';
+import fs from 'node:fs/promises';
+import { spawnSync } from 'node:child_process';
+import { dirname, join, resolve } from 'node:path';
+import { fileURLToPath } from 'node:url';
+const SCRIPT_FILE_PATH = fileURLToPath(import.meta.url);
+const SCRIPT_DIR = dirname(SCRIPT_FILE_PATH);
+const REPOSITORY_ROOT = resolve(SCRIPT_DIR, '..');
+const ARGUMENT_FLAGS = new Set(process.argv.slice(2));
+const isStdoutOnlyMode = ARGUMENT_FLAGS.has('--stdout-only');
+const REPRO_PROFILE_PATH = join(REPOSITORY_ROOT, '.agent-context', 'state', 'benchmark-reproducibility.json');
+const BENCHMARK_THRESHOLD_PATH = join(REPOSITORY_ROOT, '.agent-context', 'state', 'benchmark-thresholds.json');
+const BENCHMARK_WATCHLIST_PATH = join(REPOSITORY_ROOT, '.agent-context', 'state', 'benchmark-watchlist.json');
+const OUTPUT_PATH = join(REPOSITORY_ROOT, '.agent-context', 'state', 'benchmark-evidence-bundle.json');
+function readJsonOrNull(filePath) {
+  if (!existsSync(filePath)) {
+    return null;
+  }
+  try {
+    return JSON.parse(readFileSync(filePath, 'utf8'));
+  } catch {
+    return null;
+  }
+}
+function runJsonScript(scriptRelativePath) {
+  const absoluteScriptPath = join(REPOSITORY_ROOT, scriptRelativePath);
+  const executionResult = spawnSync('node', [absoluteScriptPath], {
+    cwd: REPOSITORY_ROOT,
+    encoding: 'utf8',
+    maxBuffer: 1024 * 1024 * 10,
+  });
+  const stdoutContent = (executionResult.stdout || '').trim();
+  const stderrContent = (executionResult.stderr || '').trim();
+  const exitCode = typeof executionResult.status === 'number' ? executionResult.status : 1;
+  if (!stdoutContent) {
+    return {
+      scriptPath: scriptRelativePath,
+      exitCode,
+      parsedReport: null,
+      parseError: 'Script produced no stdout JSON payload',
+      stderr: stderrContent,
+    };
+  }
+  try {
+    return {
+      scriptPath: scriptRelativePath,
+      exitCode,
+      parsedReport: JSON.parse(stdoutContent),
+      parseError: null,
+      stderr: stderrContent,
+    };
+  } catch (jsonParseError) {
+    const parseErrorMessage = jsonParseError instanceof Error ? jsonParseError.message : String(jsonParseError);
+    return {
+      scriptPath: scriptRelativePath,
+      exitCode,
+      parsedReport: null,
+      parseError: parseErrorMessage,
+      stderr: stderrContent,
+    };
+  }
+}
+function summarizeExecution(scriptExecutionResult) {
+  return {
+    scriptPath: scriptExecutionResult.scriptPath,
+    exitCode: scriptExecutionResult.exitCode,
+    parseError: scriptExecutionResult.parseError,
+    stderr: scriptExecutionResult.stderr || null,
+    reportName: scriptExecutionResult.parsedReport?.reportName || scriptExecutionResult.parsedReport?.gateName || null,
+    passed: typeof scriptExecutionResult.parsedReport?.passed === 'boolean'
+      ? scriptExecutionResult.parsedReport.passed
+      : null,
+  };
+}
+function buildRubricSummary(thresholdConfiguration, intelligenceReport) {
+  return {
+    benchmarkThresholds: {
+      minimumTop1Accuracy: thresholdConfiguration?.minimumTop1Accuracy ?? null,
+      maximumManualCorrectionRate: thresholdConfiguration?.maximumManualCorrectionRate ?? null,
+      maximumTop1AccuracyDrop: thresholdConfiguration?.maximumTop1AccuracyDrop ?? null,
+      maximumManualCorrectionIncrease: thresholdConfiguration?.maximumManualCorrectionIncrease ?? null,
+    },
+    intelligenceSlaDays: intelligenceReport?.reviewSlaDays ?? null,
+  };
+}
+async function runBenchmarkEvidenceBundle() {
+  const reproducibilityProfile = readJsonOrNull(REPRO_PROFILE_PATH);
+  const thresholdConfiguration = readJsonOrNull(BENCHMARK_THRESHOLD_PATH);
+  const watchlistConfiguration = readJsonOrNull(BENCHMARK_WATCHLIST_PATH);
+  const detectionBenchmarkExecution = runJsonScript('scripts/detection-benchmark.mjs');
+  const benchmarkGateExecution = runJsonScript('scripts/benchmark-gate.mjs');
+  const benchmarkIntelligenceExecution = runJsonScript('scripts/benchmark-intelligence.mjs');
+  const executionSummaries = [
+    summarizeExecution(detectionBenchmarkExecution),
+    summarizeExecution(benchmarkGateExecution),
+    summarizeExecution(benchmarkIntelligenceExecution),
+  ];
+  const failureCount = executionSummaries.filter((executionSummary) => {
+    if (executionSummary.parseError) {
+      return true;
+    }
+    if (typeof executionSummary.passed === 'boolean') {
+      return executionSummary.passed === false;
+    }
+    return executionSummary.exitCode !== 0;
+  }).length;
+  const evidenceBundleReport = {
+    generatedAt: new Date().toISOString(),
+    reportName: 'benchmark-evidence-bundle',
+    phase: 'v2.5.1',
+    passed: failureCount === 0,
+    failureCount,
+    methodology: {
+      deterministicRuntime: reproducibilityProfile?.deterministicRuntime || null,
+      scenarioCount: Array.isArray(reproducibilityProfile?.scenarios) ? reproducibilityProfile.scenarios.length : 0,
+      commandCount: Array.isArray(reproducibilityProfile?.commandExamples) ? reproducibilityProfile.commandExamples.length : 0,
+    },
+    rerunInstructions: Array.isArray(reproducibilityProfile?.rerunInstructions)
+      ? reproducibilityProfile.rerunInstructions
+      : [],
+    commandExamples: Array.isArray(reproducibilityProfile?.commandExamples)
+      ? reproducibilityProfile.commandExamples
+      : [],
+    rawInputs: {
+      scenarios: Array.isArray(reproducibilityProfile?.scenarios) ? reproducibilityProfile.scenarios : [],
+      benchmarkThresholds: thresholdConfiguration,
+      benchmarkWatchlist: Array.isArray(watchlistConfiguration?.repositories)
+        ? watchlistConfiguration.repositories
+        : [],
+    },
+    rubric: buildRubricSummary(thresholdConfiguration, benchmarkIntelligenceExecution.parsedReport),
+    outputs: {
+      detectionBenchmark: detectionBenchmarkExecution.parsedReport,
+      benchmarkGate: benchmarkGateExecution.parsedReport,
+      benchmarkIntelligence: benchmarkIntelligenceExecution.parsedReport,
+    },
+    executions: executionSummaries,
+  };
+  if (!isStdoutOnlyMode) {
+    await fs.writeFile(OUTPUT_PATH, JSON.stringify(evidenceBundleReport, null, 2) + '\n', 'utf8');
+  }
+  console.log(JSON.stringify(evidenceBundleReport, null, 2));
+  process.exit(evidenceBundleReport.passed ? 0 : 1);
+}
+runBenchmarkEvidenceBundle();
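
The `failureCount` filter in the script above applies three rules in order: a parse error always counts as a failure, an explicit boolean `passed` takes precedence over the exit code, and the exit code is consulted only when the report carries no `passed` field. The sketch below restates that predicate as a standalone function so it can be exercised without spawning child processes. The name `isFailedExecution` is hypothetical, and the last sample entry is invented for illustration; the first three mirror the `executions` array in the published bundle.

```javascript
// Same decision order as the filter callback in runBenchmarkEvidenceBundle().
function isFailedExecution(executionSummary) {
  if (executionSummary.parseError) {
    return true; // unreadable or missing stdout JSON
  }
  if (typeof executionSummary.passed === 'boolean') {
    return executionSummary.passed === false; // the report's own verdict wins
  }
  return executionSummary.exitCode !== 0; // no verdict in the report: fall back to exit code
}

const sampleExecutions = [
  // detection-benchmark emits no `passed` field, so only its exit code is checked.
  { scriptPath: 'scripts/detection-benchmark.mjs', exitCode: 0, parseError: null, passed: null },
  { scriptPath: 'scripts/benchmark-gate.mjs', exitCode: 0, parseError: null, passed: true },
  { scriptPath: 'scripts/benchmark-intelligence.mjs', exitCode: 0, parseError: null, passed: true },
  // Invented entry: a script that exited 0 but printed nothing parseable.
  { scriptPath: 'scripts/hypothetical.mjs', exitCode: 0, parseError: 'Script produced no stdout JSON payload', passed: null },
];

const failedPaths = sampleExecutions.filter(isFailedExecution).map((summary) => summary.scriptPath);
console.log(failedPaths); // [ 'scripts/hypothetical.mjs' ]
```

One consequence of this ordering is that a script reporting `passed: true` with a non-zero exit code is not counted as a failure, while a zero exit code with empty stdout is.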

package/scripts/validate.mjs CHANGED

@@ -148,6 +148,7 @@ async function validateRequiredFiles() {
     'scripts/validate.mjs',
     'scripts/llm-judge.mjs',
     'scripts/detection-benchmark.mjs',
+    'scripts/benchmark-evidence-bundle.mjs',
     'scripts/benchmark-gate.mjs',
     'scripts/benchmark-intelligence.mjs',
     'scripts/governance-weekly-report.mjs',
@@ -173,6 +174,7 @@ async function validateRequiredFiles() {
     'docs/v1.7-issue-breakdown.md',
     'docs/v1.8-operations-playbook.md',
     'docs/v2-upgrade-playbook.md',
+    '.agent-context/state/benchmark-reproducibility.json',
     '.agent-context/state/benchmark-watchlist.json',
     '.agent-context/state/skill-platform.json',
     '.agent-context/skills/index.json',