npm - @skyramp/mcp - Versions diffs - 0.2.0 → 0.2.1-rc.1 - Mend

@skyramp/mcp 0.2.0 → 0.2.1-rc.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/build/prompts/test-maintenance/drift-analysis-prompt.js +87 -98
package/build/prompts/test-maintenance/drift-analysis-prompt.test.js +60 -92
package/build/prompts/test-maintenance/driftAnalysisSections.js +197 -139
package/build/prompts/testbot/testbot-prompts.js +7 -4
package/build/prompts/testbot/testbot-prompts.test.js +22 -17
package/build/services/TestDiscoveryService.js +9 -39
package/build/tools/test-management/actionsTool.js +148 -166
package/build/tools/test-management/analyzeChangesTool.js +10 -2
package/build/tools/test-management/analyzeTestHealthTool.js +22 -10
package/package.json +1 -1

package/build/prompts/test-maintenance/driftAnalysisSections.js CHANGED

@@ -2,37 +2,42 @@
  * Modular section builders for the Drift Analysis prompt,
  * mirroring the recommendationSections.ts pattern.
  */
-export function buildActionDecisionMatrix() {
+import { AUTH_MIDDLEWARE_PATTERNS_STR } from "../../utils/workspaceAuth.js";
+export function buildActionDecisionTree() {
     return `<decision_rules>
-## Action Decision Tree
+**Before the numbered checks, apply the scope gate:** can this test's service or interface boundary actually reach the changed code? If the answer is clearly no — the test targets a definitively different service, a read-only replica, a completely separate microservice, or a different protocol — assign IGNORE and stop. A failed scope gate is terminal: it does not produce a signal subject to severity comparison below. When reachability is uncertain (dynamic serializers, inherited base models, conditional field exposure), use VERIFY instead of IGNORE.
-For each existing test, work through these checks in order — the first match wins:
+**Before working through any individual check, do a single pass over the entire diff** to record all high-signal patterns. For each matched diff line write one line: \`{pattern type} — "{diff line}" — affects {endpoint/test}\`. Build this detection list first — it is your working artifact for all tests. An action with no entry in this list is unsupported.
-1. **All endpoints the test covers were removed** → **DELETE**
-2. **Some endpoints removed, some renamed** → **UPDATE**
-3. **New response field added to a covered endpoint** → **UPDATE** — the test needs a new assertion even if existing assertions still pass
-4. **Shape change breaks assertions (field-level: ≤2 fields changed, renamed, or type-swapped)** → **UPDATE**
-   **Shape change restructures the root response (flat→nested, new wrapper object, root key renamed, ≥50% of test assertions broken)** → **REGENERATE**
-5. **Auth added or auth method changed** → **UPDATE**
-   **Auth removed** → **VERIFY**
-6. **No breaking changes detected** → **IGNORE** or **VERIFY** for minor drift
+High-signal patterns to look for in the pre-scan:
+- Route removed or renamed: \`- @app.route\`, \`- router.get\`, \`- @GetMapping\`, or paired \`-\`/\`+\` on a route decorator
+- Field type/removal: \`- field: int\`/\`+ field: string\`, \`- "responseField":\`
+- Status/enum change: \`- return 200\`/\`+ return 201\`, \`- status: "active"\`/\`+ status: "enabled"\`
+- Root wrapper change: \`- return Response({...})\`/\`+ return Response({"data":{...},"meta":{...}})\`
+- New field in serializer/view/output formatter: \`+ "newField":\` or \`+ newField =\` inside a Response or serializer
+- New field in model/migration only (no serializer): \`+ newField = Column(...)\`
+- Auth added/removed/changed: \`+ @require_auth\`, \`- @require_auth\`, token-type change
+- Scope narrowed: \`+ requireRole\`, \`+ raise PermissionError\`, \`+ if not is_owner\`, \`+ [x for x in xs if x.owner == caller_id]\`
+- Behavioral: \`+ raise ValidationError\`/\`+ HTTPException(409)\` on new \`if\`, \`+ VALID_TRANSITIONS\`, sync→async (\`- return 200\`/\`+ return 202\`), formula change (\`- total = a - b\`/\`+ total = a + tax - b\`)
+Then, for each test where the changed code *is* reachable, work through the individual checks below using your pre-built detection list. Collect **all** matching signals, then assign the single highest-severity action across all matches. Severity order (highest first): **DELETE > REGENERATE > UPDATE > VERIFY > IGNORE**. **Before assigning UPDATE, REGENERATE, or DELETE, quote the specific diff line(s) that triggered it in the rationale. If you cannot point to a diff line this test's endpoint can observe, the action is IGNORE or VERIFY, not UPDATE.**
 Rules:
-- DELETE when all covered endpoints no longer exist; REGENERATE when they still exist but changed drastically.
-- REGENERATE means: the top-level response shape changed (flat→nested, new wrapper object added, root key renamed), OR ≥50% of the test's assertions reference fields that were removed or restructured. In all other cases, prefer UPDATE.
-- Prefer UPDATE over REGENERATE when changes are field-level (≤2 fields added, removed, renamed, or type-swapped).
-- Prefer IGNORE over VERIFY when all changed files are unrelated to the test's endpoint.
+- Collect all signals; assign the highest-severity action across them. Include all matched signals in the rationale and all matching \`updateInstructions\` — a diff that renames a path AND adds a field requires both a URL patch and a new assertion.
+- DELETE when all covered endpoints no longer exist; REGENERATE when they still exist but the root response wrapper changed and essentially every assertion is now invalid. In all other cases, prefer UPDATE.
+- REGENERATE when the root response wrapper changed (flat→nested, new envelope object, root key renamed) and essentially every assertion is invalid — if you can describe the fix as patching N specific paths, it is UPDATE regardless of how many paths there are.
+- Prefer IGNORE over VERIFY when all changed files are unrelated to the test's endpoint. Exception: if the diff touches a shared serializer, base model, or response formatter that the test's endpoint uses, prefer VERIFY even if no route file changed.
 - ADD actions belong in the next step — complete this assessment with IGNORE / VERIFY / UPDATE / REGENERATE / DELETE only.
 <examples>
 <example>
-Diff adds one field to a response object and renames a URL path segment:
+Diff adds one field to a serializer and renames a URL path segment:
 \`\`\`
 - @app.route("/users/<id>/orders")
 + @app.route("/users/<id>/purchases")
-+ "total_items": len(order.items)
++ "total_items": len(order.items)  # inside the serializer this test hits
 \`\`\`
-→ **UPDATE**: path rename + one new field — both are field-level changes. Patch the URL and add an assertion for \`total_items\`.
+→ **UPDATE**: serializer signal confirmed in diff (total_items added to the same serializer) + path rename. Patch the URL and add an assertion for \`total_items\`.
 </example>
 <example>
 Diff wraps the entire response in a new envelope object:
@@ -40,135 +45,197 @@ Diff wraps the entire response in a new envelope object:
 - return Response({"id": ..., "status": ..., "items": [...]})
 + return Response({"data": {"id": ..., "status": ..., "items": [...]}, "meta": {"page": 1}})
 \`\`\`
-→ **REGENERATE**: root shape changed from a flat object to \`{data, meta}\`. Every existing assertion (e.g. \`response["id"]\`, \`response["status"]\`) is broken — rewrite the test from scratch.
+→ **REGENERATE**: root wrapper changed — every existing assertion (e.g. \`response["id"]\`) is broken. Rewrite from scratch.
+</example>
+<example>
+Diff adds a field to a model/migration only — project uses explicit serializers (DRF, FastAPI, etc.):
+\`\`\`
++ sort_order = Column(Integer, nullable=True)  # in models.py
+\`\`\`
+No serializer or field-inclusion list changed.
+→ **VERIFY**: model-only signal in a project with explicit serializers — cannot confirm from the diff whether the field is included in the serializer's \`fields\` list and therefore exposed in API responses.
+</example>
+<example>
+Diff adds a field to a schema/model only — project has no explicit serializer layer (ORM fields passed through directly):
+\`\`\`
++ sort_order: {type: 'integer', nullable: true}  # in db/schema.js
+\`\`\`
+No serializer file changed. No \`fields =\` or field-exclusion list exists for this resource.
+→ **UPDATE**: project has no serializer gate, so new schema columns are auto-exposed in API responses. Augment the test to assert the new field is present and round-trips correctly.
+</example>
+<example>
+Diff changes a status code in the handler:
+\`\`\`
+- res.status(200).json(...)
++ res.status(201).json(...)
+\`\`\`
+→ **UPDATE**: the test asserts \`toBe(200)\` which now fails. Patch the status assertion.
+</example>
+<example>
+Diff adds a role gate to a route the test covers:
+\`\`\`
++ if (user.role !== "owner") {
++   return res.status(403).json({ error: "forbidden_role" });
++ }
+\`\`\`
+→ **UPDATE**: the test's existing token now gets 403. Send a token with sufficient role and add a 403 negative assertion for the restricted role. (Authorization scope narrowed — not an auth-mechanism change; the Auth/AuthZ and Behavioral Contract checks cover this.)
+</example>
+<example>
+Diff adds a state-transition guard:
+\`\`\`
++ const VALID_TRANSITIONS = { draft: ["review"], review: ["published"] };
++ if (!VALID_TRANSITIONS[currentStatus]?.includes(newStatus)) {
++   throw new HTTPException(409, "invalid_transition");
++ }
+\`\`\`
+→ **UPDATE**: an integration test that previously posted \`draft→published\` directly now gets 409. Chain through the valid states (draft→review→published) and add a 409 assertion for the direct skip.
 </example>
 </examples>
 </decision_rules>`;
 }
+// Retained for backwards compatibility — no longer rendered in the prompt.
+// Diff signals are now inlined into each individual check function.
+/** @deprecated use the individual check functions; this function is no longer part of the prompt */
 export function buildBreakingChangePatterns() {
-    return `## Breaking Change Patterns to Detect
+    return "";
+}
+export function buildCheckEndpointExistence() {
+    return `Does the endpoint the test targets still exist in the codebase?
-Scan the diff lines for these high-signal patterns:
+Diff signals to look for:
+- Route removed: \`- @app.route("/path")\`, \`- router.get("/path")\`, \`- @GetMapping("/path")\`
+- Route renamed: paired \`-\` and \`+\` on a route decorator with a different path
-### Endpoint-level breaking changes
-- \`-  @app.route("/old-path")\` / \`+  @app.route("/new-path")\` — renamed endpoint
-- \`-  router.get("/old")\` / \`+  router.get("/new")\` — renamed route
-- \`-  @GetMapping("/old")\` / \`+  @GetMapping("/new")\` — Spring rename
-- Lines removing a route decorator entirely (endpoint removed)
+Actions:
+- ALL endpoints the test covers were removed → DELETE (the entire test file is obsolete)
+- SOME methods removed but others remain → UPDATE (remove test functions for deleted methods, keep the rest)
+- Endpoint renamed → UPDATE (path substitution; supply \`renamedEndpoints\`)`;
+}
+export function buildCheckResponseShape() {
+    return `Has the request body or response structure changed in a way that breaks the test?
-### Request/response shape changes
-- Field type changes: \`- field: int\` → \`+ field: string\`
-- Required field added: \`+ required: [..., "newField"]\`
+Diff signals to look for:
+- Field type change: \`- field: int\` / \`+ field: string\`
+- Required request body field added: \`+ required: [..., "newField"]\`
+- Required query param added with no default
 - Response field removed: \`- "responseField":\`
-- Enum value changes: \`- status: "active"\` → \`+ status: "enabled"\`
-### Auth changes
-- \`+ @require_auth\`, \`+ @login_required\`, \`+ middleware(authMiddleware)\`
-- \`- @require_auth\` (auth removed)
-- Token type changed: Bearer → Cookie
-### Status code changes
-- \`- return 200\` → \`+ return 201\`
-- \`- status_code=200\` → \`+ status_code=204\`
-- \`- res.status(201)\` → \`+ res.status(200)\`
-### Additive response field changes (non-breaking but coverage gap)
-These do NOT break existing assertions but leave the new field untested. Always flag as UPDATE for covered endpoints.
-- \`+ "newField": queryset.filter(...).count()\` added inside a \`Response({...})\` or \`res.json({...})\`
-- \`+ newField = serializers.XXXField()\` added to a serializer used by a tested endpoint
-- \`+ "newField":\` added to a response body dict returned by the endpoint
-- New key added inside an existing dict/object returned by the endpoint`;
+- Enum value changed: \`- status: "active"\` / \`+ status: "enabled"\`
+- Status code changed: \`- return 200\` / \`+ return 201\`
+- Root wrapper added: \`- return Response({...})\` / \`+ return Response({"data": {...}, "meta": {...}})\`
+Actions:
+- Type changes, new required fields, removed asserted fields, status/enum changes → UPDATE
+- Root response wrapper changed and essentially every assertion is now invalid → REGENERATE
+**UPDATE vs REGENERATE — the deciding question is whether the root response wrapper changed:**
+- **REGENERATE** only when a new top-level envelope object wraps the entire payload or the root key is renamed so that essentially every existing assertion must change.
+- **UPDATE** for everything else. If you can describe the fix as "patch these N assertion paths", it is UPDATE regardless of how many paths there are.
+- When every assertion in the file is invalid, it is REGENERATE. When you can still patch individual paths, it is UPDATE.`;
 }
-export function buildTestAssessmentGuidelines() {
-    return `## Per-Test Assessment (4 Checks)
-For each existing test file, run these checks:
-### Check A: Endpoint existence
-Does the endpoint the test targets still exist in the codebase?
-- If ALL endpoints the test covers were removed → action: DELETE (the entire test file is obsolete)
-- If SOME methods were removed but others remain → action: UPDATE (remove the test functions for deleted methods, keep the rest)
-- If the endpoint was renamed → action: UPDATE (path substitution)
-### Check B: Request/response shape (breaking changes)
-Has the request body or response structure changed in a way that breaks the test?
-- Compare test's expected fields against current schema/model definitions
-- Type changes (string→int, int→string) on individual fields → action: UPDATE
-- Type change restructures the root object or makes the entire request body invalid → action: REGENERATE
-- New required fields the test doesn't send → action: UPDATE
-- Response fields the test asserts on have been removed → action: UPDATE
-- ≥50% of the test's assertions reference fields that were removed or restructured → action: REGENERATE
-**UPDATE vs REGENERATE:** choose UPDATE when changes are field-level (≤2 fields added, removed, renamed, or type-swapped). Choose REGENERATE only when the root response shape changed (flat→nested, new wrapper object, root key renamed) or ≥50% of assertions are broken.
-### Check B2: Additive response field changes (coverage gaps)
-**Even if existing assertions still pass**, does the diff add a new field to the response of an endpoint this test already covers?
-- Look at the diff for lines like \`+ "newField":\` or \`+ newField =\` inside a view/serializer this test hits
-- If YES → action: UPDATE
-- This applies even when the test only checks status codes — the test should be extended to cover the new field
-- A new response field on a covered endpoint always triggers UPDATE — even when existing assertions still pass.
-### Check C: Auth changes
-Has the authentication mechanism for this endpoint changed?
-- Auth added where none existed → action: UPDATE
-- Auth method changed (bearer→cookie) → action: UPDATE
-- Auth removed → action: VERIFY
-### Check D: Assign action
-Based on the above, choose the action (IGNORE / VERIFY / UPDATE / REGENERATE / DELETE) and provide a 1-2 sentence rationale.
-- If Check B2 flagged an additive field → action must be UPDATE, even if Checks B/C found no breaking changes.`;
+export function buildCheckAuthAndAuthorization() {
+    return `Has the authentication or authorization for this endpoint changed?
+**Authentication mechanism**
+Diff signals to look for: ${AUTH_MIDDLEWARE_PATTERNS_STR}. Also: \`@requiresRole\`/\`@Protected\`, \`validateToken\`/\`checkPermission\`/\`verifyHMAC\`, imports from auth/security packages, \`- @require_auth\` (removal), token type change (Bearer → Cookie).
+Actions:
+- Auth added where none existed → UPDATE (test would 401/403 on every request without the new credential)
+- Auth method changed (e.g. bearer→cookie) → UPDATE (test sends the wrong credential type)
+- Auth removed and test asserts a 401/403 response that will no longer fire → UPDATE
+- Auth removed and test does not assert on auth responses → VERIFY (endpoint may now be intentionally public)
+**Authorization scope** (same credential, narrower access)
+Diff signals to look for: \`+ requireRole\`, \`+ requireCreateX\`/\`requireDeleteX\`, \`+ assert_*_scope\`, \`+ ALLOWED_ROLES.includes\`/\`ASSIGNABLE_ROLES\`, \`+ if not is_owner\`, \`+ raise PermissionError\`/\`HTTPException(403)\`, \`+ [x for x in xs if x.owner == caller_id]\`, new role-carrying request header (e.g. \`X-Workspace-Role\`).
+Actions:
+- Role or ownership gate added → UPDATE (test's existing token may now get 403; send a sufficient-role token and add a 403 negative assertion)
+- Caller-identity filtering added → UPDATE (test's token now returns a subset; adjust expectations or use an admin-scope token)
+Do NOT assign IGNORE just because the auth *mechanism* is unchanged — scope narrowing breaks a token-valid test.`;
 }
-export function buildAddRecommendationGuidelines() {
-    return `## ADD — New Tests for New Endpoints
+export function buildCheckBehavioralContract() {
+    return `Has the endpoint's BEHAVIOR changed while its response shape stayed the same? A test can break even when no field was added or removed.
+Diff signals and actions:
+- **Validation tightened**: \`+ raise ValidationError\`/\`+ throw new ValidationError\` gated on field value, \`+ Field(pattern=...)\`/\`ge=\`/\`le=\`/\`max_length=\` on an existing field → UPDATE (fix the payload to satisfy the new constraint; add the 4xx negative case)
+- **New conditional rejection / state guard**: \`+ raise HTTPException(status_code=409)\`/\`+ res.status(409)\` inside a new \`if\`, \`+ VALID_TRANSITIONS\`, \`+ allowed_states = ...\` → UPDATE (chain through valid states; assert the rejection status for the now-illegal path)
+- **Sync → async**: \`- return 200 result\` / \`+ return 202 {job_id}\` → UPDATE (assert \`202\` and the job/id field; remove old result-field assertions from the immediate response)
+- **Computed-field formula changed**: \`- total = a - b\` / \`+ total = a + tax - b\` on an existing asserted field → UPDATE; describe the new formula in \`updateInstructions\` and provide the recomputed expected value where inputs are known from the diff
+- **Behavior gated on a request header**: old shape returned only when a version header is sent; new shape is now the default → UPDATE (migrate assertions to the new default shape, or pin the old shape by sending the version header)
-**ADD applies only when:**
-- The diff introduces a brand-new route that has **no existing test coverage at all**, OR
-- The diff introduces a new auth path, error branch, or fundamentally separate scenario that no existing test covers.
+**Reachability for behavioral changes:** The service/interface scope gate still applies — if the test targets a completely different service or protocol, IGNORE. However, do NOT use the absence of a route or serializer file in the diff as grounds for IGNORE. Behavioral changes (new error branches, new validation, status-code changes) are observable from any test calling the same endpoint, even when the logic lives in an internal handler, middleware, or utility file rather than the route file itself.`;
+}
+export function buildCheckAssignAction() {
+    return `Based on the above checks, choose the action (IGNORE / VERIFY / UPDATE / REGENERATE / DELETE) and provide a 1-2 sentence rationale.
+**Every action requires a specific rationale — including IGNORE:**
+- UPDATE / REGENERATE / DELETE: quote the specific diff line that triggered it.
+- VERIFY: name the uncertain element (e.g. "shared serializer, cannot confirm field exposure without reading the file").
+- IGNORE: name the specific reason the changed code cannot reach this test's endpoint (e.g. "diff only touches \`auth/session-service.js\` — this test targets \`/api/v1/orders\` which has no session dependency"). Generic "unrelated endpoint" or "service boundary" without a diff reference is not sufficient.
+- If the Additive Fields check flagged a new field with serializer signal confirmed in the diff → action is UPDATE. If the Additive Fields check returned VERIFY (model/schema only, serializer not in diff) → action remains VERIFY.
+- **Service/layer scope gate is terminal:** If the changed code is clearly not reachable through the service or base URL this test targets, assign IGNORE — this overrides all other checks. When reachability is uncertain, assign VERIFY rather than IGNORE.
+- **Pre-commit verification — confirm all three before finalizing UPDATE/REGENERATE/DELETE:**
+  1. You can quote a specific diff line this test's endpoint observes that triggered the action.
+  2. The changed code is reachable through this test's service and base URL.
+  3. For REGENERATE: every assertion in the file is invalid, not just some — if you can patch N paths, it is UPDATE.
+  If any check fails, downgrade to VERIFY or IGNORE.
+- **For user-written (external) tests** marked \`[external]\` in the test list:
+  - UPDATE is permitted — targeted edits (fix renamed URL, add assertion for new field).
+  - REGENERATE and DELETE are **not permitted** — assign those actions in your recommendations but \`skyramp_actions\` will surface them as report-only findings for the developer to act on. Do NOT attempt to rewrite or delete a user-authored test file.`;
+}
+export function buildCheckAdditiveFields() {
+    return `Even if existing assertions still pass, new response fields on a covered endpoint may need a new assertion.
-**Use UPDATE instead of ADD when:**
-- The resource already has existing tests and the diff only adds a new HTTP method — add the new method's test cases to the existing file.
-- The endpoint existed before this diff but lacks tests — log it in \`additionalRecommendations\` and skip it; pre-existing coverage gaps are out of scope for ADD.
+Diff signals to look for:
+- \`+ "newField": ...\` inside a \`Response({...})\`, \`res.json({...})\`, serializer class, or output formatter → serializer signal confirmed
+- \`+ newField = serializers.XXXField()\` in a serializer used by this endpoint → serializer signal confirmed
+- \`+ newField = Column(...)\` or \`+ newField:\` in a model/migration only, with no serializer change → model-only signal
-**Test type priority by HTTP method:**
-| Method | Recommended test types |
-|--------|----------------------|
-| POST / PUT / PATCH | integration, contract |
-| GET | contract, smoke |
-| DELETE | integration, smoke |
+Actions:
+- Serializer signal confirmed → UPDATE (add an assertion for the new field)
+- Model/schema only, serializer not in diff, **project has explicit serializer layer** (a separate serializer file, Pydantic schema, field allowlist, or \`fields =\` definition controls what gets exposed) → VERIFY (cannot confirm from diff alone whether the field is included)
+- Model/schema only, serializer not in diff, **project has no explicit serializer gate** (no field allowlist or exclusion for this resource; ORM/model fields are passed through directly) → UPDATE (field is auto-exposed in responses; augment the test)
-Use a unique descriptive filename for every new test file. For a resource with existing tests, update the existing file — always prefer UPDATE over creating a new file.`;
+To determine which applies: check whether the repo has a serializer file or field-inclusion list for this resource. If none exists in the diff context, prefer UPDATE. When genuinely uncertain, prefer VERIFY.`;
 }
 export function buildUpdateExecutionRules() {
     return `<execution_rules>
-## Update Execution Rules
+**UPDATE means edit-in-place — never use a generation tool for UPDATE**
+UPDATE instructs you to modify the existing file using the Edit tool. Do NOT call \`skyramp_contract_test_generation\`, \`skyramp_integration_test_generation\`, or any other generation tool for an UPDATE action — generation tools create a new file and will overwrite or duplicate the existing one. Only use generation tools for REGENERATE actions.
 When applying UPDATE actions to existing test files, follow these rules in addition to the drift-detected changes:
-### Test file ordering
+**Test file ordering**
 Place mutation test functions (PATCH, PUT, POST) **before** any DELETE test function targeting the same resource. DELETE removes the resource — any mutation call after it will 404. When inserting a new mutation test, place it above the DELETE function and above the DELETE call in the \`if __name__ == "__main__"\` block (or equivalent runner entrypoint).
-### Happy path first
+**Happy path first**
 When adding a new HTTP method (PUT, PATCH, POST) to an existing test file, always include a 2xx success assertion first. Error-path tests (404, 422) may follow, but the happy path case is required.
-### All test files for a resource
+**All test files for a resource**
 When a diff adds a new HTTP method to a resource, UPDATE covers **all** existing test files for that resource — contract, integration, and UI. Apply UPDATE to every file the analyze tool reported for that resource path; do not stop after updating the first one.
-### PATCH/PUT with child collections
+**PATCH/PUT with child collections**
 Child collection arrays (e.g. \`items\`, \`products\`, \`line_items\`) drive computed totals — a test that omits them cannot catch the most common mutation bugs. When the request/response includes a child collection:
 1. Include the child array with at least one item containing the FK field (e.g. \`product_id\`) and a \`quantity\` field.
 2. Assert each item's FK field and \`quantity\` match the sent values.
 3. Assert the top-level computed total (e.g. \`total_amount\`) equals the expected math from the items.
-### REGENERATE
+**REGENERATE**
 Call the appropriate generation tool to replace the existing test from scratch. Use the same filename so it overwrites the old file.
-### DELETE
-Remove the test file when ALL endpoints it covers were removed from the codebase. If only SOME methods were removed, use UPDATE instead — remove the test functions for deleted methods and keep the rest.
+**DELETE**
+Assign DELETE when ALL endpoints the test covers were removed from the codebase. \`skyramp_actions\` surfaces DELETE as a report-only finding — it does not delete the file automatically. The developer must delete the obsolete test file manually. If only SOME methods were removed, use UPDATE instead — remove the test functions for deleted methods and keep the rest.
-### Test data isolation
+**Test data isolation**
 Never use hardcoded resource IDs (e.g. \`order_id=1\`) in any test step, including GET or DELETE steps. Always create required resources via prior POST steps and chain IDs dynamically. Use timestamp-based unique names for created resources (e.g. \`"Product-\${int(time.time())}"\`) to prevent collisions across test runs.
-### Enhance assertions after UPDATE
+**Assertion quality**
+When adding assertions, assert response *values* (field equals expected), not just field presence or status code — match the assertion depth the test already uses for other fields.
+**Enhance assertions after UPDATE**
 Call \`skyramp_enhance_assertions\` with \`testFile\` set to the absolute path of the test file you just updated, \`enhanceType: "maintenance"\`, and the matching \`testType\` based on the file you are editing:
 - **Integration test file** (multi-step chained requests): call with \`testType: "integration"\`
 - **Contract-provider test file** (single endpoint with \`beforeAll\`/\`afterAll\` setup, provider mode): call with \`testType: "contract"\`. Skip for consumer-mode contract tests.
@@ -177,44 +244,35 @@ Call \`skyramp_enhance_assertions\` with \`testFile\` set to the absolute path o
 Then apply every instruction returned by the tool to the test file.
 </execution_rules>`;
 }
-export function buildDriftOutputChecklist(existingTestCount, newEndpointCount, inlineMode = false, stateFile) {
-    const finalStep = inlineMode
-        ? `### Final step
-Apply all maintenance actions (UPDATE / REGENERATE / DELETE) directly by editing the test files. Apply IGNORE, VERIFY, UPDATE, REGENERATE, or DELETE only — ADD is handled in the next task.`
-        : `### Final step
-After completing all assessments above, call \`skyramp_actions\` with \`stateFile: "${stateFile ?? "<stateFile>"}"\` and a \`recommendations\` entry for every test assessed. For each entry include: \`testFile\` (absolute path as reported by the analysis tools), \`action\`, \`rationale\`, \`updateInstructions\` (free-form summary of what this test must change — new fields to assert, constraint details, auth changes, new request params, or any other drift specifics; \`skyramp_actions\` passes this directly to the downstream LLM editing the file), and \`renamedEndpoints\` (for path-rename updates).
-Call \`skyramp_actions\` as the sole final action — skip all other file writes.`;
-    const existingTestHeader = inlineMode
-        ? "### Existing tests (reported by skyramp_analyze_changes)"
-        : `### Existing tests (${existingTestCount} total)`;
-    const existingTestSection = `${existingTestHeader}
+export function buildDriftOutputChecklist(existingTestCount, newEndpointCount, stateFile) {
+    const finalStep = `**Final step**
+After completing all assessments above, call \`skyramp_actions\` with \`stateFile: "${stateFile ?? "<stateFile>"}"\` and a \`recommendations\` entry for every test assessed. For each entry include:
+- \`testFile\` (absolute path as reported by the analysis tools)
+- \`action\`
+- \`rationale\` — one sentence naming the diff line or pattern that triggered the action
+- \`updateInstructions\` — written for the downstream LLM that will edit the file; name the specific fields, types, paths, or constraints that must change. Example: "Added stock_count: int (ge=0, default=0) to ProductBase. Test hits GET /products — assert stock_count is present and non-negative." Vague instructions produce incomplete edits — be specific.
+- \`renamedEndpoints\` (for path-rename updates)
+For \`[external]\` tests: include them in \`recommendations[]\` with the assessed action. \`skyramp_actions\` will apply UPDATE edits and surface REGENERATE/DELETE as report-only findings — it will never rewrite or delete a user-authored file.
+Do not edit or regenerate files before calling \`skyramp_actions\`. After calling it, follow its emitted instructions to apply UPDATE edits and run REGENERATE tool calls.`;
+    const existingTestSection = `**Existing tests (${existingTestCount} total)**
 For each existing test:
 - **IGNORE/VERIFY tests**: one line each: \`{testFile} — IGNORE\` or \`{testFile} — VERIFY\`. Rationale omitted for brevity.
 - **UPDATE/REGENERATE/DELETE tests**: output the full block:
 \`\`\`
 Test: {testFile}
 Action: {UPDATE | REGENERATE | DELETE}
-Rationale: {1-2 sentence explanation}
+Rationale: {action} because {quoted diff signal}; affects {assertion/path}
 \`\`\`
 Focus your analysis on tests that need action — keep reasoning for unchanged tests to a single line.`;
-    const newEndpointSection = inlineMode
-        ? ""
-        : newEndpointCount > 0
-            ? `### New endpoints (${newEndpointCount} detected)
-For EACH new endpoint, output:
-\`\`\`
-Endpoint: {METHOD} {path}
-Action: ADD
-Test types: {contract | integration | smoke | ...}
-Rationale: {1 sentence}
-\`\`\``
-            : `### New endpoints
-No new endpoints detected in this diff.`;
-    const sections = [existingTestSection, newEndpointSection, finalStep].filter(s => s.length > 0);
+    const newEndpointSection = newEndpointCount > 0
+        ? `**New endpoints (${newEndpointCount} detected)**
+For EACH new endpoint, output one line: \`{METHOD} {path} — {recommended test types} — {1 sentence rationale}\`
+Do NOT include new endpoints in the \`recommendations[]\` passed to \`skyramp_actions\` — ADD is handled separately by the test generation tools.`
+        : `**New endpoints** — none detected in this diff.`;
+    const sections = [existingTestSection, newEndpointSection, finalStep];
     return `<output_format>
-## Output Checklist
 Complete ALL of the following:
 ${sections.join("\n\n")}
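
Reviewer note: the **Final step** text above lists the fields each `recommendations` entry should carry. A minimal sketch of assembling such a payload follows, assuming a hypothetical `buildRecommendation` helper that is not part of the package. The field names come from the prompt text; the validation rules, the test file paths, and the shape of `renamedEndpoints` are assumptions, since the diff does not show the `skyramp_actions` schema.

```javascript
// Hypothetical helper, not part of @skyramp/mcp. Field names are taken from
// the prompt text in this diff; everything else is an illustrative assumption.
const DRIFT_ACTIONS = ["UPDATE", "REGENERATE", "DELETE", "VERIFY", "IGNORE"];

function buildRecommendation({ testFile, action, rationale, updateInstructions, renamedEndpoints }) {
    if (!DRIFT_ACTIONS.includes(action)) {
        throw new Error(`Unknown action: ${action}`);
    }
    const entry = { testFile, action, rationale };
    // Assumption: updateInstructions is only mandatory for UPDATE/REGENERATE,
    // matching the testbot prompt's step 3b.
    if (action === "UPDATE" || action === "REGENERATE") {
        if (!updateInstructions || updateInstructions.trim().length === 0) {
            throw new Error(`${action} requires specific updateInstructions`);
        }
        entry.updateInstructions = updateInstructions;
    }
    // Assumption: renamedEndpoints is passed through untouched when present;
    // its real shape is not shown in this diff.
    if (renamedEndpoints !== undefined) {
        entry.renamedEndpoints = renamedEndpoints;
    }
    return entry;
}

// Example payload with hypothetical test file paths.
const payload = {
    stateFile: "<stateFile>",
    recommendations: [
        buildRecommendation({
            testFile: "/repo/tests/products_smoke_test.py",
            action: "UPDATE",
            rationale: "UPDATE because the diff adds stock_count to ProductBase; affects the GET /products response assertions",
            updateInstructions: "Added stock_count: int (ge=0, default=0) to ProductBase. Assert stock_count is present and non-negative.",
        }),
        buildRecommendation({
            testFile: "/repo/tests/orders_contract_test.py",
            action: "IGNORE",
            rationale: "Test targets a service the diff cannot reach",
        }),
    ],
};

console.log(JSON.stringify(payload, null, 2));
```

Rejecting an empty `updateInstructions` early mirrors the prompt's warning that vague instructions produce incomplete edits. New endpoints are deliberately left out of the payload, as the prompt states that ADD is handled by the test generation tools.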

package/build/prompts/testbot/testbot-prompts.js CHANGED

@@ -2,7 +2,6 @@ import { z } from "zod";
 import { logger } from "../../utils/logger.js";
 import { AnalyticsService } from "../../services/AnalyticsService.js";
 import { MAX_TESTS_TO_GENERATE, MAX_RECOMMENDATIONS, MAX_CRITICAL_TESTS, PATH_PARAM_UUID_GUIDANCE, AUTH_CONFLICT_ERROR_MSG, } from "../test-recommendation/recommendationSections.js";
-import { buildDriftAnalysisPrompt } from "../test-maintenance/drift-analysis-prompt.js";
 import { getTraceRecordingPromptText } from "../../playwright/traceRecordingPrompt.js";
 import { isContractConsumerModeEnabled } from "../../utils/featureFlags.js";
 import { resolveServiceDetailsRef } from "../../utils/utils.js";
@@ -66,9 +65,13 @@ Use those recommendations as your baseline. Only add or remove tests that the us
    **If \`skyramp_analyze_changes\` returns an error:** retry once only if the error is transient (timeout, network blip, temporary unavailability) — do NOT retry for permanent errors (invalid repository path, missing required parameter, authentication failure). If it fails again, call \`skyramp_submit_report\` with a minimal valid payload: leave all test arrays empty and add the error to \`issuesFound\`. Refer to the \`skyramp_submit_report\` schema for required fields. Do NOT attempt Task 2 without a valid stateFile.
    **If all changed files are non-application** (CI/CD, docs, lock files, config) → skip to Task 3 (Submit Report) with empty arrays and a single \`issuesFound\` entry explaining why (same format as the zero-test path below).
-3. **Maintain existing tests** using the rules in \`<drift_analysis_rules>\` below. For each existing test reported by \`skyramp_analyze_changes\`, score it and choose the action exactly as directed by the Action Decision Matrix in \`<drift_analysis_rules>\`. Only read test files that require action per that matrix — do NOT read files that will be IGNORED. **Do NOT read source files (routers, models, CRUD, components) — all the information you need is in the \`skyramp_analyze_changes\` output and the diff.** When reading multiple test files, **read them all in a single parallel batch** — do NOT read them one at a time. Apply actions directly. Results go in \`testMaintenance\`.
+3. **Maintain existing tests:**
-${buildDriftAnalysisPrompt({ existingTests: [], scannedEndpoints: [], repositoryPath })}
+   a. Call \`skyramp_analyze_test_health\` with \`stateFile\` (from step 2). Follow every instruction in the returned \`<drift_analysis_rules>\` block — use the Action Decision Tree, apply the Breaking Change Patterns, and work through each check (Endpoint Existence, Response Shape, Additive Fields, Auth/AuthZ, Behavioral Contract, Assign Action). **Do NOT read source files** — all information you need is in the \`skyramp_analyze_changes\` output and the diff. When reading multiple test files that require action, **read them all in a single parallel batch**.
+   b. For each test scored UPDATE or REGENERATE, write \`updateInstructions\` (a concise description of what must change) **before** calling \`skyramp_actions\`. This articulation step prevents the LLM from letting file content override diff-based reasoning.
+   c. Call \`skyramp_actions\` with \`stateFile\` (from step 2) and your \`recommendations[]\` — one entry per test assessed, including IGNORE and VERIFY. The tool returns file content for each UPDATE/REGENERATE test — apply the edits. Results go in \`testMaintenance\`.
 4. **Code review:** From the \`skyramp_analyze_changes\` output and the existing test files you read for maintenance, note any logic bugs. Do NOT read additional source files just for code review — use what is already available from the analysis and test file reads. Common patterns to flag:
    - Computed fields not recalculated after mutation (e.g. \`total_amount\` unchanged after items are added/removed)
@@ -331,7 +334,7 @@ Call \`skyramp_submit_report\` with \`summaryOutputFile\`: "${summaryOutputFile}
 - **additionalRecommendations**: AT MOST ${maxRecommendations - maxGenerate} items.
   - For \`testType: "contract"\` entries: **\`primaryEndpoint\` is required** (e.g. \`"GET /api/v1/users/{user_id}"\`). The tool will reject the submission without it — do not omit it or you will be forced to resubmit.
   - For \`testType: "integration"\` or \`"e2e"\` entries: omit \`primaryEndpoint\` — use \`description\` to list the endpoints involved instead.
-- **testMaintenance**: Use \`[]\` **only** if no existing Skyramp tests were found in the repository. If existing tests were found (any score), include one entry per test. Set \`action\` to the exact drift action you chose from the Action Decision Matrix (\`UPDATE\`, \`REGENERATE\`, \`DELETE\`, \`VERIFY\`, or \`IGNORE\`). For UPDATE/REGENERATE/DELETE tests that were modified and executed, populate all fields from real before/after execution results. For VERIFY/IGNORE tests (not modified), derive \`beforeStatus\` from the \`skyramp_analyze_test_health\` health score (typically \`"Pass"\` if drift score is 0 and no health issues were flagged), set \`afterStatus\` to \`"Skipped"\`, and use \`afterDetails\` to explain why (e.g. "IGNORE: drift score 0 — endpoint not modified in this PR"). Do **not** add entries for tests that were not returned by the health analysis.
+- **testMaintenance**: Use \`[]\` **only** if no existing Skyramp tests were found in the repository. If existing tests were found (any score), include one entry per test. Set \`action\` to the exact drift action assigned by the Action Decision Tree (\`UPDATE\`, \`REGENERATE\`, \`DELETE\`, \`VERIFY\`, or \`IGNORE\`). For UPDATE/REGENERATE/DELETE tests that were modified and executed, populate all fields from real before/after execution results. For VERIFY/IGNORE tests (not modified), derive \`beforeStatus\` from the drift assessment you performed in step 3 (typically \`"Pass"\` if no drift was detected), set \`afterStatus\` to \`"Skipped"\`, and use \`afterDetails\` to explain why (e.g. "IGNORE: no drift detected — endpoint not modified in this PR"). Do **not** add entries for tests that were not assessed in step 3.
 ---
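
Reviewer note: the revised `testMaintenance` guidance describes how unmodified VERIFY/IGNORE tests should be reported. A rough sketch of building such an entry follows, assuming a hypothetical `skippedMaintenanceEntry` helper. Only `action`, `beforeStatus`, `afterStatus`, and `afterDetails` are named in the prompt text; the `testFile` key and the example path are assumptions, and the real `skyramp_submit_report` schema may require additional fields.

```javascript
// Hypothetical helper, not part of @skyramp/mcp. It covers only the
// VERIFY/IGNORE case, where the test was not modified or re-executed.
function skippedMaintenanceEntry({ testFile, action, reason, beforeStatus = "Pass" }) {
    if (action !== "VERIFY" && action !== "IGNORE") {
        throw new Error(`Expected VERIFY or IGNORE, got ${action}`);
    }
    return {
        testFile, // assumption: the key identifying the test is not shown in the diff
        action,
        // "Pass" is the typical value when no drift was detected in step 3;
        // callers can override it when the assessment says otherwise.
        beforeStatus,
        afterStatus: "Skipped",
        afterDetails: `${action}: ${reason}`,
    };
}

const entry = skippedMaintenanceEntry({
    testFile: "/repo/tests/orders_contract_test.py",
    action: "IGNORE",
    reason: "no drift detected; endpoint not modified in this PR",
});

console.log(entry);
```

Entries for UPDATE/REGENERATE/DELETE tests are intentionally out of scope here, because the prompt requires those to be populated from real before/after execution results.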

package/build/prompts/testbot/testbot-prompts.test.js CHANGED

@@ -202,35 +202,40 @@ describe("uiCredentials in getTestbotPrompt", () => {
             .toThrow("</ui-credentials>");
     });
 });
-describe("drift analysis inline embedding", () => {
-    beforeAll(() => { process.env.SKYRAMP_FEATURE_TESTBOT = "1"; });
-    afterAll(() => { delete process.env.SKYRAMP_FEATURE_TESTBOT; });
+describe("drift analysis — runtime tool call (step 3)", () => {
+    // The build-time embed of buildDriftAnalysisPrompt was replaced with a
+    // runtime instruction: LLM calls skyramp_analyze_test_health then skyramp_actions.
     function basePrompt() {
         return getTestbotPrompt(baseArgs.prTitle, baseArgs.prDescription, baseArgs.summaryOutputFile, baseArgs.repositoryPath);
     }
-    it("wraps inline drift rules in XML tags", () => {
+    it("step 3 instructs the LLM to call skyramp_analyze_test_health", () => {
         const prompt = basePrompt();
-        expect(prompt).toContain("<drift_analysis_rules>");
-        expect(prompt).toContain("</drift_analysis_rules>");
+        expect(prompt).toContain("skyramp_analyze_test_health");
     });
-    it("does not include a persona statement inside the inline XML block", () => {
+    it("step 3 instructs the LLM to call skyramp_actions with recommendations[]", () => {
         const prompt = basePrompt();
-        const start = prompt.indexOf("<drift_analysis_rules>");
-        const end = prompt.indexOf("</drift_analysis_rules>");
-        const block = prompt.slice(start, end);
-        expect(block).not.toContain("You are acting as a Skyramp Integration Architect");
+        expect(prompt).toContain("skyramp_actions");
+        expect(prompt).toContain("recommendations[]");
     });
-    it("drift_analysis_rules block appears inside Task 1, before Task 2", () => {
+    it("step 3 appears inside Task 1, before Task 2", () => {
         const prompt = basePrompt();
         const task1Pos = prompt.indexOf("## Task 1");
-        const rulesPos = prompt.indexOf("<drift_analysis_rules>");
+        const healthPos = prompt.indexOf("skyramp_analyze_test_health");
         const task2Pos = prompt.indexOf("## Task 2");
-        expect(rulesPos).toBeGreaterThan(task1Pos);
-        expect(rulesPos).toBeLessThan(task2Pos);
+        expect(healthPos).toBeGreaterThan(task1Pos);
+        expect(healthPos).toBeLessThan(task2Pos);
     });
-    it("Task 1 step 3 prose references drift_analysis_rules tag", () => {
+    it("does not contain the build-time embedded drift_analysis_rules content (Action Decision Tree)", () => {
+        // The rules are now fetched at runtime via skyramp_analyze_test_health —
+        // the <drift_analysis_rules> tag may appear as a reference in prose,
+        // but the actual rule content (Action Decision Tree) must not be baked in.
         const prompt = basePrompt();
-        expect(prompt).toContain("rules in `<drift_analysis_rules>`");
+        expect(prompt).not.toContain("Action Decision Tree\n\nFor each existing test");
+        expect(prompt).not.toContain("Update Execution Rules\n\nWhen applying UPDATE actions");
+    });
+    it("does not contain a persona statement (no nested identity from old embed)", () => {
+        const prompt = basePrompt();
+        expect(prompt).not.toContain("You are acting as a Skyramp Integration Architect");
     });
 });
 describe("UI grounding via Task 2 capture-act-capture", () => {