npm - @skyramp/mcp - Versions diffs - 0.0.60-rc.1 → 0.0.60-rc.2 - Mend

@skyramp/mcp 0.0.60-rc.1 → 0.0.60-rc.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/build/prompts/test-recommendation/recommendationSections.js CHANGED

@@ -62,62 +62,60 @@ export function buildTestPatternGuidelines() {
   effects occur (e.g., POST /orders triggers email notification)`;
 }
 export function buildTestQualityCriteria() {
-    return `## ⭐ MANDATORY — Meaningful Integration & E2E Tests
-**What makes a MEANINGFUL integration test (all 3 required):**
-1. **Cross-resource data flow** — Step A creates data that Step B depends on (e.g., create
-   product → create order referencing that product's ID → verify order contains correct product).
-   A test that just does CRUD on a single resource is NOT an integration test.
-2. **Realistic request bodies** — Use domain-appropriate data from the source code schemas.
-   Product tests should use real field names (name, price, category), not placeholders.
-   Order tests should reference actual product IDs from prior steps, not hardcoded "1".
-3. **Verification assertions** — Each test MUST verify the response DATA, not just status
-   codes. After creating an order, GET it and verify it contains the correct product_id,
-   quantity, and computed total. After searching products, verify the results match the query.
-**What makes a MEANINGFUL E2E test:**
-1. **Realistic user journey** — A real user flow from start to finish: browse products →
-   search/filter → add to cart → checkout. NOT just "navigate to /products and check 200".
-2. **Frontend-to-backend validation** — Verify that frontend actions trigger the correct API
-   calls and that the UI reflects the backend state correctly.
-3. **Domain-specific scenarios** — If the PR adds search, test: enter search term → verify
-   results appear → click result → verify detail page loads with correct data.
-**What makes a MEANINGFUL UI test:**
-1. **Component behavior** — Test that UI components render correctly and respond to user
-   interactions (clicks, typing, form submissions).
-2. **Visual state changes** — Test loading states, error states, empty states.
-3. **User interaction flows** — Test complete interaction sequences: fill form → validate
-   inputs → submit → see confirmation. NOT just "page renders".
-4. **Accessibility** — Verify keyboard navigation, ARIA labels, and focus management.`;
+    return `## What Makes a Good Test
+**Integration tests** should demonstrate cross-resource data flow — step A creates data
+that step B depends on (e.g., create product \u2192 create order referencing that product's ID \u2192
+verify order contains correct product). Single-resource CRUD alone is not an integration test.
+Use realistic request bodies from source code schemas and verify response data, not just
+status codes.
+**E2E tests** should follow realistic user journeys end-to-end: browse products \u2192 search \u2192
+add to cart \u2192 checkout. Verify that frontend actions trigger the correct API calls and
+that the UI reflects backend state.
+**UI tests** should exercise component behavior and interaction flows: fill form \u2192 validate
+inputs \u2192 submit \u2192 see confirmation. Include visual state changes (loading, error, empty)
+and accessibility checks.`;
 }
 export function buildGenerationRules(isUIOnlyPR) {
-    return `**RULE 1 (non-negotiable):** Every "workflow" scenario below MUST become an integration
-test recommendation. But ALSO: if the pre-drafted scenarios don't reflect actual resource
-relationships in the code, **replace them** with scenarios that reflect the REAL data model.
-**RULE 2: Priority ordering — integration tests DOMINATE, but UI/E2E are MANDATORY for frontend changes:**
-1. **Multi-resource integration tests** (HIGHEST PRIORITY) — one for EACH cross-resource
-   workflow scenario. Generate AT LEAST 2 when 2+ resources exist.
-2. **CRUD lifecycle integration tests** — one for EACH resource with new/changed endpoints.
-3. **E2E tests** — one for EACH distinct user flow spanning frontend → backend.
-4. **UI tests** — one for EACH changed frontend component/page with meaningful interactions.
-   **If no Playwright trace exists, still recommend the test and provide step-by-step instructions
-   for the user to record a trace using \`skyramp_start_trace_collection\` with \`playwright: true\`.**
-5. **Fuzz tests** — one for EACH POST/PUT endpoint with request body validation.
-6. **Contract tests** — one for EACH endpoint with a new/changed response schema.
-7. **NEVER generate smoke tests. Zero smoke tests. Not even one.**
-**RULE 2a (MANDATORY): When changed files include frontend components/pages/views:**
-- At least 1 of the top 7 MUST be a UI test for a changed component.
-- At least 1 of the top 7 MUST be an E2E test for a user flow involving the changed UI.
-- If Playwright trace is missing, set \`frontendTrace\` to a message like:
-  "Requires Playwright recording — run \`skyramp_start_trace_collection\` with playwright:true"
-**RULE 3: No duplicate coverage.**
-Do NOT generate a test if an existing test already covers that endpoint × test type.
-If \`products_integration_test.py\` already exists covering CRUD products, recommend a
-multi-resource workflow that INCLUDES products alongside other resources instead.`;
+    return `## Generation Guidelines
+**Scenario fidelity:** Every workflow scenario should reflect the actual resource
+relationships in the code. If the pre-drafted scenarios don't match the real data model,
+replace them with accurate ones.
+**Priority ordering by PR type:**
+${isUIOnlyPR ? `This is a **UI-only PR**. The most valuable tests are UI and E2E tests.
+If Playwright traces exist for the changed pages, prioritize UI/E2E tests in the top 4.
+If no traces exist, UI/E2E tests are still the highest-value recommendations — rank them
+in the top 7 with scenario steps and trace recording instructions. The testbot will not
+generate tests without traces, so all 7 become additionalRecommendations.
+1. **UI tests** — per changed component/page
+2. **E2E tests** — per user flow spanning frontend to backend
+3. **Integration tests** — only when the changed UI calls backend APIs
+` : `1. **Multi-resource integration tests** — one per cross-resource workflow (2-3 max).
+2. **Fuzz tests** — per POST/PUT endpoint with complex request bodies. Tests boundary values,
+   type coercion, missing/extra fields, and edge cases the schema allows.
+3. **Contract tests** — per endpoint with new/changed response schemas. Validates the response
+   structure matches expectations (field types, required fields, nested objects).
+4. **E2E tests** — per distinct user flow if the API serves a frontend or client
+5. **CRUD lifecycle integration tests** — only for resources with new/changed endpoints
+   where multi-resource tests don't already cover them.
+`}When no Playwright trace exists, still recommend the test with instructions for recording
+a trace using \`skyramp_start_trace_collection\` with \`playwright: true\`.
+**Mixed PRs with frontend changes:** Include at least 1 E2E or UI test in the top 7,
+ranked by value regardless of trace availability. If traces exist, place it in the top 4.
+If no traces, it can still rank highly — the testbot will handle trace-dependent generation.
+**Before finalizing:** Check that the top 4 aren't filled with CRUD tests for unchanged
+resources when PR-relevant tests exist lower in the ranking. Swap if needed.
+**No duplicate coverage.** If an existing test already covers an endpoint + test type,
+recommend a multi-resource workflow that includes that endpoint alongside others instead.`;
 }
 export function buildToolWorkflows(authHeaderValue) {
     return `## How to Generate Tests — Tool Workflows
@@ -126,20 +124,24 @@ export function buildToolWorkflows(authHeaderValue) {
 **For multi-endpoint workflows (integration tests) — Scenario → Integration pipeline:**
 1. Call \`skyramp_scenario_test_generation\` once per step: \`scenarioName\`, \`destination\`,
-   \`baseURL\`, \`method\`, \`path\`, \`requestBody\`, \`statusCode\`, \`authHeader: "${authHeaderValue}"\`.
+   \`baseURL\`, \`method\`, \`path\`, \`requestBody\`, \`authHeader: "${authHeaderValue}"\`.
+   \`statusCode\` is optional — defaults: POST→201, DELETE→204, GET/PUT/PATCH→200. Only override for non-standard codes.
    **OpenAPI spec is NOT required.** \`apiSchema\` is OPTIONAL — omit it if no spec exists.
-   \`requestBody\` MUST use realistic field values from source code schemas (Zod, Pydantic, DTOs).
-   Never send \`{}\` — inspect the source code to determine the correct request body shape.
+   \`requestBody\` should use realistic field values from source code schemas (Zod, Pydantic, DTOs).
+   Inspect the source code to determine the correct request body shape — avoid sending \`{}\`.
    Use unique names with timestamp suffix to avoid conflicts on re-runs.
    For GET/PUT/DELETE with path IDs, use a placeholder — chaining resolves the real ID.
-2. Produces a \`scenario_<name>.json\` capturing the multi-step flow.
+2. Produces a \`scenario_<name>.json\` in the same \`outputDir\` as the test files (not \`.skyramp/\`).
 3. Call \`skyramp_integration_test_generation\` with \`scenarioFile\` AND \`authHeader: "${authHeaderValue}"\`.
-   Do NOT pass \`chainingKey\` — auto-set to \`response.id\`.
+   Do NOT pass \`chainingKey\` — defaults to \`response.id\`. After generation, the testbot
+   will verify and fix path param chaining in the generated test.
 **For single-endpoint tests (contract/fuzz):**
 \`skyramp_{type}_test_generation\` with \`endpointURL\` (full URL incl. base + path), \`method\`,
 \`authHeader: "${authHeaderValue}"\`, and \`requestData\` from source code schemas.
-**OpenAPI is NOT required** — \`endpointURL\` is sufficient. Only pass \`apiSchema\` if one exists.
+If an OpenAPI spec exists, ALSO pass \`apiSchema\` — it enables schema-aware validation
+(contract tests verify response structure, fuzz tests generate smarter boundary values).
+Without a spec, \`endpointURL\` alone is sufficient.
 **For UI tests (no Playwright recording):**
 1. \`skyramp_start_trace_collection\` (playwright: true)
@@ -152,58 +154,73 @@ Same trace flow, pass both trace file and playwright zip to \`skyramp_e2e_test_g
 }
 export function buildCoverageChecklist(openApiSpec, isUIOnlyPR, hasFrontendChanges, authHeaderValue, topN) {
     const specNote = openApiSpec
-        ? `\n**OpenAPI Spec**: \`${openApiSpec.path}\` — pass \`apiSchema: "${openApiSpec.path}"\` to ALL tool calls.\n`
+        ? `\n**OpenAPI Spec available**: \`${openApiSpec.path}\`
+Use it actively:
+- **Contract tests**: pass \`apiSchema: "${openApiSpec.path}"\` — the CLI validates response schemas against the spec.
+- **Fuzz tests**: pass \`apiSchema: "${openApiSpec.path}"\` — the CLI generates boundary values from schema constraints.
+- **Integration tests**: pass \`apiSchema\` to \`skyramp_scenario_test_generation\` — it extracts destination and request/response shapes.
+- **Single-endpoint tests**: pass both \`endpointURL\` AND \`apiSchema\` for schema-aware generation.
+\n`
         : "";
     const distribution = isUIOnlyPR
-        ? `- ≥50% UI tests, ≥30% E2E tests, remaining: integration. 0% fuzz/contract/smoke.`
+        ? `- Prioritize UI tests (≥3), then E2E tests (≥2), then integration only if UI calls APIs. 0% smoke.`
         : hasFrontendChanges
-            ? `- ≥30% integration, ≥20% E2E, ≥15% UI (at least 1 UI + 1 E2E MANDATORY). Remaining: fuzz + contract. 0% smoke.`
-            : `- ≥60% integration (multi-resource FIRST, then CRUD). Remaining: fuzz + contract. 0% smoke.`;
-    const skipUI = isUIOnlyPR ? " (Skip for UI-only PRs)" : "";
+            ? `- Mix: integration (2-3), E2E (1-2), UI (1-2), fuzz or contract (1). 0% smoke.`
+            : `- Mix: integration (2-3, multi-resource first), fuzz (1-2), contract (1-2), E2E (1 if user-facing flows exist). 0% smoke.`;
     return `## Coverage Checklist
 ${specNote}
-For EACH endpoint, recommend ALL applicable types **that don't already exist**:
-1. **Integration** (HIGHEST) — per workflow scenario + per resource CRUD.
-2. **E2E** — per user flow spanning frontend → backend.
-3. **UI** — per changed component/page.${isUIOnlyPR ? " **PRIMARY for this PR.**" : ""}
-4. **Fuzz** — per POST/PUT with request body.${skipUI}
-5. **Contract** — per new/changed response schema.${skipUI}
-6. **NO smoke tests.**
-## For Each Recommendation:
+${isUIOnlyPR ? `**UI-only PR** — This PR has no backend changes. Focus on UI and E2E tests.
+With Playwright traces: prioritize UI tests (one per changed component) and E2E tests
+(one per page-level user flow). Integration tests are relevant only if the UI calls APIs.
+Without traces: recommend UI/E2E tests with scenario steps and trace recording instructions
+(\`skyramp_start_trace_collection\` with \`playwright: true\`). The testbot will skip generation
+entirely for frontend-only PRs without traces — all recommendations become additional
+recommendations in the report. Skip fuzz, contract, and smoke tests.
+` : `For each endpoint, recommend the most valuable test types — aim for variety:
+1. **Integration** — multi-resource workflows (not just single-resource CRUD)
+2. **Fuzz** — POST/PUT endpoints with request bodies (validates edge cases, type safety)
+3. **Contract** — endpoints with new/changed response schemas (validates structure)
+4. **E2E** — user flows spanning frontend to backend${hasFrontendChanges ? " (include at least 1 for this PR)" : ""}
+5. **UI** — changed frontend components${hasFrontendChanges ? " (include at least 1)" : ""}
+6. No smoke tests.
+Do NOT recommend 7 integration tests — diversify across test types.
+`}
+## For Each Recommendation Include:
 1. Test type  2. Priority (high/medium/low)  3. Target endpoint/scenario
-4. **What it validates** (business logic, not just "tests the endpoint")
-5. **Skyramp tool call details** — exact tool + key params for zero-editing execution
+4. What it validates (business logic, not just "tests the endpoint")
+5. Skyramp tool call details — exact tool + key params for zero-editing execution
 6. For integration/E2E: reference draftedScenario by scenarioName
-## When Artifacts Are Missing — NEVER mark "blocked", NEVER skip the recommendation:
-- **No OpenAPI spec** → use \`endpointURL\` (full URL) and \`requestBody\` from source code.
-  Scenario generation, contract, fuzz, and integration tests ALL work without OpenAPI.
-- **No Playwright recording** → recommend the UI/E2E test anyway. Provide step-by-step
-  instructions for the user to record a trace with \`skyramp_start_trace_collection\`.
-- **No backend trace** → recommend the test anyway. Use scenario generation pipeline instead.
-## Generate Many, Show Few
-Internally consider ALL possible tests from the endpoint catalog (endpoints × interaction types
-+ scenarios). Then select the top ${topN} most valuable, ranked by priority. In your output, include:
-- \`totalConsidered: <number>\` — how many candidate tests you evaluated
-- The curated top ${topN} recommendations, ranked #1 (highest value) to #${topN}
-- The top 4 will be generated and executed; recommendations #5-#${topN} will be reported but not generated.
-  Therefore, ensure the top 4 are the most impactful tests. For mixed PRs with frontend changes,
-  the top 4 MUST include at least 1 E2E or UI test alongside integration tests.
-Prioritize integration/E2E, fill with fuzz/contract.
+## When Artifacts Are Missing
+Recommend the test anyway — never mark it "blocked":
+- **No OpenAPI spec** \u2192 use \`endpointURL\` and \`requestBody\` from source code
+- **No Playwright recording** \u2192 provide trace recording instructions
+- **No backend trace** \u2192 use the scenario generation pipeline
+## Select the Top ${topN}
+Consider all possible tests (endpoints \u00d7 interaction types + scenarios), then select the
+top ${topN} most valuable. Include \`totalConsidered\` count in your output. The top 4 will
+be generated; recommendations #5-${topN} will appear in the report but won't be generated,
+so ensure the top 4 are the highest-impact tests.
+**Before outputting, verify:**
+${isUIOnlyPR ? `- If traces exist, at least 2 of the top 4 should be UI/E2E tests.
+- Without traces, all 7 become additionalRecommendations (no generation). Rank UI/E2E highest.
+- Avoid CRUD tests for unchanged resources the UI doesn't call.` : `- If the PR includes frontend changes, include at least 1 E2E/UI test in the top 4.
+- CRUD tests for unchanged resources should not displace PR-relevant tests in the top 4.`}
+- Each integration scenario's step sequence should be logically valid — preconditions
+  met by prior steps.
+Preferred ordering: ${isUIOnlyPR ? "UI \u2192 E2E \u2192 integration (if UI calls APIs)." : "integration \u2192 fuzz \u2192 contract \u2192 E2E \u2192 UI."}
 ${distribution}
-## MANDATORY RULES:
-1. "high"/"medium"/"low" only — no numeric scores.
-2. Never mark "blocked". 3. No file creation. 4. Order: integration → E2E → UI → fuzz → contract.
-5. Reference draftedScenarios by name. 6. Reference interactions by description.
-7. Every recommendation = enough detail for direct tool invocation.
-**FINAL CHECK:** Count: workflow scenarios → integration tests, resources → CRUD tests,
-user flows → E2E, components → UI, POST/PUT → fuzz${skipUI}, schemas → contract${skipUI}.
-Total must be ≥ ${topN}.
+Each recommendation should include enough detail for direct tool invocation.
+Reference draftedScenarios by name and interactions by description.
+Use "high"/"medium"/"low" for priority — no numeric scores.
+Total candidates should be \u2265 ${topN}.
 Generate recommendations now.`;
 }
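
The new prompt text in this file states two rules in prose. First, `statusCode` defaults by HTTP method (POST→201, DELETE→204, GET/PUT/PATCH→200). Second, the top 4 ranked recommendations are generated and the rest are only reported, except that a UI-only PR without Playwright traces generates nothing. The following is a minimal JavaScript sketch of those two rules, not code from the package. The helper names `defaultStatusCode` and `splitRecommendations` and the option names `hasTraces` and `generateCount` are hypothetical and chosen for illustration only.

```javascript
// Hypothetical helpers (not part of @skyramp/mcp): they only mirror rules
// that the prompt text above states in prose.

// Defaults quoted in the prompt: POST→201, DELETE→204, GET/PUT/PATCH→200.
const DEFAULT_STATUS_BY_METHOD = new Map([
  ["POST", 201],
  ["DELETE", 204],
  ["GET", 200],
  ["PUT", 200],
  ["PATCH", 200],
]);

// Returns the explicit override when given, otherwise the per-method default.
function defaultStatusCode(method, override) {
  if (override !== undefined) return override;
  const code = DEFAULT_STATUS_BY_METHOD.get(String(method).toUpperCase());
  if (code === undefined) {
    throw new Error(`No default status code for method: ${method}`);
  }
  return code;
}

// Splits a ranked list into tests to generate and report-only recommendations.
// Assumption: `ranked` is already ordered from highest to lowest value.
function splitRecommendations(
  ranked,
  { isUIOnlyPR = false, hasTraces = false, generateCount = 4 } = {}
) {
  // UI-only PR without traces: nothing is generated, everything is reported.
  if (isUIOnlyPR && !hasTraces) {
    return { toGenerate: [], additionalRecommendations: [...ranked] };
  }
  return {
    toGenerate: ranked.slice(0, generateCount),
    additionalRecommendations: ranked.slice(generateCount),
  };
}
```

Under these assumptions, `defaultStatusCode("post")` returns 201, and calling `splitRecommendations` on seven ranked items with default options yields four items to generate and three report-only items.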

package/build/prompts/testbot/testbot-prompts.js CHANGED

@@ -9,138 +9,102 @@ function getTestbotPrompt(prTitle, prDescription, diffFile, testDirectory, summa
 <TEST DIRECTORY>${testDirectory}</TEST DIRECTORY>
 <REPOSITORY PATH>${repositoryPath}</REPOSITORY PATH>
-For all the following work, use the tools offered by Skyramp MCP server.
-Then perform ALL of the following tasks. Every task is MANDATORY — do NOT skip any task based on your own judgment unless the task itself gives you an explicit condition to skip.
-## Task 1: Recommend New Tests (MANDATORY — but skip if no application code changed)
-Read the diff at \`${diffFile}\`. Classify EVERY changed file using these categories:
-**Non-application files (DO NOT generate tests for these):**
-- CI/CD workflow files (.github/workflows/*.yml, .gitlab-ci.yml, Jenkinsfile, etc.)
-- Markdown documentation (.md files, README, CHANGELOG, CONTRIBUTING, etc.)
-- Dependency lock files (package-lock.json, yarn.lock, Pipfile.lock, poetry.lock, Gemfile.lock, go.sum, etc.)
-- Configuration-only files (.gitignore, .editorconfig, .prettierrc, renovate.json, dependabot.yml, etc.)
-- License files (LICENSE, NOTICE, etc.)
-**Application source code (generate tests for these):**
-- Routes, controllers, handlers, API endpoints
-- Models, schemas, validators, serializers, DTOs
-- Business logic, services, middleware, utilities
-- Test helpers and test fixtures
-- Any file with a source extension (.py, .ts, .js, .java, .go, .rb, .cs, .kt, .swift, etc.) that is NOT in the non-application list above
-**SKIP RULE — THIS IS MANDATORY:**
-If EVERY changed file in the diff falls into the "non-application files" category above, you MUST skip steps 1–6 entirely. Do NOT call \`skyramp_analyze_repository\`, do NOT generate any tests. Instead, proceed directly to Task 2. In your report, state: "Task 1 skipped: PR contains only non-application changes (CI/docs/config)."
-**When in doubt:** If even ONE changed file looks like it could be application source code, run steps 1–6.
-1. Call \`skyramp_analyze_repository\` with:
-   - \`repositoryPath\`: "${repositoryPath}"
-   - \`analysisScope\`: "current_branch_diff"${baseBranch ? `\n   - \`baseBranch\`: "${baseBranch}"` : ''}
-2. MANDATORY: Call \`skyramp_recommend_tests\` with the \`sessionId\` returned by \`skyramp_analyze_repository\`. This returns 7 test recommendations ranked by value. You will:
-   - **Generate and execute the TOP 4** highest-value tests.
-   - **Report the remaining 3** as \`additionalRecommendations\` in the final report (do NOT generate or execute them).
-3. Generate the **TOP 4** recommended tests (by priority rank from step 2).
-   **CRITICAL: Integration tests are the HIGHEST VALUE tests. The top 4 MUST include at least
-   2 integration tests (1 multi-resource workflow + 1 CRUD lifecycle).
-   NEVER generate smoke tests — they provide zero value.**
-   **Generate in this order:**
-   a. **Integration tests FIRST** (highest value) — generate ALL integration tests from top 4.
-      Multi-resource workflows are MORE VALUABLE than single-resource CRUD.
-      For multi-resource: call \`skyramp_scenario_test_generation\` for each step in the workflow
-      (e.g., create product \u2192 create order referencing product \u2192 verify), then call
-      \`skyramp_integration_test_generation\`. Follow chaining verification instructions.
-      For CRUD lifecycle: call \`skyramp_scenario_test_generation\` for each CRUD step
-      (create \u2192 read \u2192 update \u2192 delete), then \`skyramp_integration_test_generation\`.
-   b. **E2E tests** — generate E2E tests from top 4 for frontend user flows.
-   c. **UI tests** — generate UI tests from top 4 for changed frontend components/pages.
-      If Playwright recording is missing, you MUST still include the UI test recommendation
-      in the report. Provide the step-by-step UI scenario and explicitly instruct the user:
-      "To generate this UI test, record a Playwright trace by running
-      \`skyramp_start_trace_collection\` with \`playwright: true\`, perform the UI interactions
-      described below, then run \`skyramp_stop_trace_collection\` and \`skyramp_ui_test_generation\`."
-   d. **Fuzz tests** — generate fuzz tests from top 4 for POST/PUT endpoints.
-   e. **Contract tests** — generate contract tests from top 4 for new/changed schemas.
-   f. **NEVER generate smoke tests. Not a single one. Zero smoke tests.**
-   **Do NOT duplicate existing tests.** Before generating, check if a test file already exists
-   for that endpoint \u00d7 test type. Do NOT skip integration/E2E tests because they require
-   more tool calls \u2014 they are the most valuable tests and MUST be generated.
-4. Use Skyramp MCP to execute the generated tests and validate the results.
-5. **E2E / UI Test Generation from Trace Files**: Search the repository for existing Skyramp trace files that can be used for E2E or UI test generation. Look for:
-   - Backend trace files: files matching patterns like \`**/skyramp*trace*.json\`, \`**/skyramp-traces.json\`, or \`**/*trace*.json\` in test directories
-   - Playwright UI trace files: files matching patterns like \`**/skyramp*playwright*.zip\`, \`**/*playwright*.zip\`, or \`**/*ui*trace*.zip\`
-   Search in the test directory (\`${testDirectory}\`), the repository root, and any \`.skyramp/\` directories.
-   - If you find BOTH a backend trace file AND a Playwright trace ZIP, call \`skyramp_e2e_test_generation\` with both files to generate an E2E test.
-   - If you find ONLY a Playwright trace ZIP (no backend trace), call \`skyramp_ui_test_generation\` with the Playwright file to generate a UI test.
-   - When generating E2E/UI tests, use the same language and framework as other tests in the repository. Default to Python with pytest if no convention is detected.
-   - Execute any generated E2E/UI tests to validate them. Note: Playwright browsers are pre-installed in the CI environment.
-**IMPORTANT — Endpoint Renames:** If the diff shows an endpoint path was renamed (e.g. \`/products\` changed to \`/items\`) and existing tests already cover that endpoint under the old name, do NOT generate new tests for the renamed endpoint. The existing tests will be updated with the new path in Task 2 (Test Maintenance). Only generate new tests for genuinely new endpoints that have no existing test coverage under any name.
-## Task 2: Existing Test Maintenance (MANDATORY — DO NOT SKIP)
-**WARNING: You MUST complete this task even if Task 1 generated tests successfully. Task 1 and Task 2 are INDEPENDENT — completing Task 1 does NOT mean you can skip Task 2. The report WILL be considered incomplete without test maintenance results.**
-You MUST always run the steps below. Do NOT skip this task based on your own assessment of whether tests exist or are relevant — use the tools to determine that.
-1. Call \`skyramp_discover_tests\` with \`repositoryPath\`: "${repositoryPath}" to find all existing Skyramp-generated tests.
-   You may skip the rest of this task ONLY if it explicitly returns zero Skyramp-generated tests.
-2. **Baseline — check for parallel CI first:**
-   a. Read the workflow files in \`.github/workflows/\` and check if any workflow (other than the Skyramp Testbot workflow) is triggered on \`pull_request\` AND runs tests against the test directory (look for commands like \`pytest\`, \`jest\`, \`npm test\`, \`go test\`, \`skyramp test\`, or similar test execution commands).
-   b. If such a workflow exists, run: \`gh run list --commit $(git rev-parse HEAD) --workflow <workflow-filename> --json status,conclusion --limit 1\` to check if it has completed for the current commit.
-   c. If the parallel workflow completed successfully — record beforeStatus as "Pass" for the discovered tests and note "baseline from CI workflow <workflow-name>" in beforeDetails. Skip to step 3.
-   d. If the parallel workflow completed with failure — record beforeStatus as "Fail" and capture the failure context in beforeDetails. Skip to step 3.
-   e. If no parallel test workflow exists, it hasn't completed yet, or the \`gh\` command fails for any reason (e.g. permissions, CLI not available) — execute ALL discovered tests AS-IS (before any modifications) using \`skyramp_execute_tests_batch\` or \`skyramp_execute_test\`. Record each test's status and details as the "before" results. In beforeDetails, describe the execution result (e.g. "Pass (10.8s)" or "Fail (404 Not Found)"). If you could not query CI, just note "unable to query existing CI pipeline" — do NOT expose internal details like authentication errors.
-3. Call \`skyramp_analyze_test_drift\` with the \`stateFile\` returned by \`skyramp_discover_tests\`.
-4. Call \`skyramp_calculate_health_scores\` with the \`stateFile\` from the previous step.
-5. Call \`skyramp_actions\` with the updated \`stateFile\`. This tool returns instructions describing what needs to change in each test file — it does NOT modify the files itself.
-6. **You MUST modify the existing test files in-place using your file editing tools.** Read the instructions from \`skyramp_actions\`, cross-reference with the code diff, and edit each test file directly.
-   - If \`skyramp_actions\` returns endpoint rename mappings (old path → new path), apply them as simple find-and-replace on the test file URLs. Do NOT regenerate or restructure the test — only update the paths.
-   - If \`skyramp_actions\` suggests file renames (e.g. \`products_smoke_test.py\` → \`items_smoke_test.py\`), rename the files using \`git mv\` after updating their content.
-   - The goal is to fix the discovered tests so they pass with the new code, preserving the original test structure and logic. Do NOT create new test files as a substitute for fixing existing ones.
-7. Execute the modified tests using Skyramp MCP and validate the results. This includes E2E and UI tests — Playwright browsers are pre-installed in the CI environment, so E2E/UI test execution is fully supported. Record each test's status and details as the "after" results.
-8. For each maintained test, report BOTH the before and after results in the \`testMaintenance\` array of the report (using the fileName, beforeStatus, beforeDetails, afterStatus, afterDetails fields), so the user has full visibility into whether the code change or the existing test was at fault.
-## Task 3: Submit Report (MANDATORY)
-**CHECKPOINT: Before submitting, verify you completed BOTH tasks:**
-- Task 1: Did you call \`skyramp_recommend_tests\` and generate tests? (or skip if no application code changed?)
-- Task 2: Did you call \`skyramp_discover_tests\`? If it found tests, did you run drift analysis and report before/after results in \`testMaintenance\`?
-**If you skipped Task 2, GO BACK and complete it now before submitting.**
-After completing Tasks 1 and 2, you MUST call the Skyramp MCP tool "skyramp_submit_report" to submit your report.
-Pass '${summaryOutputFile}' as the summaryOutputFile parameter.
-For the commitMessage parameter, write a succinct summary (under 72 chars) of what you did, without any prefix. Examples:
-- "add contract tests for /products endpoint"
-- "add multiple integration tests including cross-resource workflow"
-- "add integration and e2e tests for new /reviews endpoint"
-Do NOT write the report to a file yourself. Do NOT skip this step. The skyramp_submit_report tool is the ONLY way to submit the report.
-**additionalRecommendations:** For the remaining recommendations (ranked #5-#7 from step 2) that you did NOT generate, include them in the \`additionalRecommendations\` array. For each, provide:
-   - \`testType\`, \`scenarioName\`, \`priority\` (high/medium/low), \`description\` (why it is valuable)
-   - \`steps\`: ordered sequence — each step has \`description\`, and for API steps: \`method\`, \`path\`, \`expectedStatusCode\`, \`requestBody\` (example values), \`responseBody\` (key fields to verify)
-   - \`openApiSpec\`: path to spec file if one exists in the repo
-   - \`backendTrace\`: path to backend trace file if found (used by E2E and integration tests)
-   - \`frontendTrace\`: path to Playwright/UI trace file if found (used by UI tests; E2E tests need BOTH backend + frontend traces) of what the test would cover and why it is valuable.
-## Report Guidelines
-**businessCaseAnalysis:** Base this ONLY on facts from the PR title, description, and what the tools reported. If \`skyramp_analyze_repository\` reported 0 new endpoints, do NOT claim new endpoints were added — instead describe the change accurately (e.g. "frontend changes to consume existing API endpoints", "refactored service layer", "updated test configuration"). Never infer new backend endpoints from frontend fetch/API calls in the diff.
-When reporting test results, if you chose to skip executing a test, you MUST explain WHY you skipped it.
-NEVER use the phrase "CI timeout" or imply a timeout occurred unless a tool call actually timed out.
-Instead, set the status to "Skipped" and provide an honest reason in the details, for example:
-- "Skipped: no code changes affect this endpoint"
-- "Skipped: skyramp_discover_tests found no existing Skyramp tests"
-- "Skipped: only CI/config changes in this PR, no API changes"
-Reminder: Use the Skyramp MCP tools available to you for test analysis, generation, and execution.`;
+Use the Skyramp MCP server tools for all tasks below.
+## Task 1: Recommend & Generate New Tests
+Read the diff at \`${diffFile}\`. Skip Task 1 if all changed files are non-application
+(CI/CD, docs, lock files, config). Otherwise proceed:
+### Steps
+1. Call \`skyramp_analyze_repository\` with \`repositoryPath\`: "${repositoryPath}", \`analysisScope\`: "current_branch_diff"${baseBranch ? `\n   , \`baseBranch\`: "${baseBranch}"` : ''}
+2. Call \`skyramp_recommend_tests\` with the returned \`sessionId\`.
+   It returns 7 ranked recommendations. Generate the top 4, report the remaining 3
+   as \`additionalRecommendations\`.
+3. **Generate** at most 4 tests from the top 4 recommendations. Stop after 4.
+   Keep a list of every file the CLI creates (test files AND scenario JSON files).
+   **Frontend-only PRs** (no backend/API changes): only generate tests if relevant
+   Playwright traces exist. If no traces are available, skip generation entirely and
+   move all 7 recommendations to \`additionalRecommendations\` with scenario steps and
+   trace recording instructions. Do not generate integration tests for unchanged backend
+   APIs just to fill the quota — those tests don't validate the PR's changes.
+   **How to generate each type:**
+   - **Integration**: call \`skyramp_scenario_test_generation\` per step, then
+     \`skyramp_integration_test_generation\` with the scenario file.
+     The scenario JSON is written to the same \`outputDir\` as the test files
+     (e.g. \`tests/scenario_<name>.json\`), not \`.skyramp/\`.
+   - **Contract**: call \`skyramp_contract_test_generation\` with \`endpointURL\`, \`method\`,
+     and \`requestData\` for POST/PUT endpoints.
+     Pass \`apiSchema\` if an OpenAPI spec exists — it validates response structure.
+   - **Fuzz**: call \`skyramp_fuzz_test_generation\` with \`endpointURL\`, \`method\`, \`requestData\`.
+     Pass \`apiSchema\` if available — it generates smarter boundary values.
+   - **E2E/UI**: only generate when relevant Playwright traces exist (see step 5).
+     Without traces, move the test to \`additionalRecommendations\` with scenario steps
+     and trace recording instructions instead.
+   - Skip smoke tests entirely.
+   **Scenario quality:** Before generating, verify each step's preconditions are met by
+   prior steps. For example, you can't update a membership that was never created — check
+   the controller code for existence checks and ensure the scenario creates records first.
+   **Filenames:** Pass a descriptive \`--output\` name per test to avoid CLI overwrites.
+4. **Execute** the generated tests and record results.
+5. **Trace search** for E2E/UI: look in \`\${testDirectory}\`, repo root, and \`.skyramp/\` for
+   trace files (\`*trace*.json\`, \`*playwright*.zip\`). Only use a trace if it covers code
+   changed in this PR and targets localhost — skip traces for external hosts or unrelated code.
+   With relevant traces: backend + Playwright → \`skyramp_e2e_test_generation\`,
+   Playwright only → \`skyramp_ui_test_generation\`.
+**After generation, fix chaining only.** The CLI may use literal/hardcoded IDs instead
+of dynamic values from prior responses. Fix these two cases:
+1. **Path params:** variables like \`product_id = 'product_id'\` → use the response accessor
+   (e.g. \`getResponseValue(response, "response.id")\` in TS, \`skyramp.get_response_value(response, "id")\` in Python).
+2. **Request body refs:** hardcoded IDs in request bodies (e.g. \`"product_id": 1\`) → replace
+   with the dynamic ID extracted from the prior POST response (e.g. \`product_id\` variable or
+   \`dataOverride\`/\`data_override\` for the field).
+Change ONLY chaining-related values (path param assignments and body ID references).
+Preserve everything else exactly as the CLI generated it — headers, auth code, assertions,
+imports, and all other request body fields.
+## Task 2: Existing Test Maintenance
+Run this task regardless of Task 1 outcome — even if Task 1 was skipped or generated zero tests.
+1. Call \`skyramp_discover_tests\` with \`repositoryPath\`: "${repositoryPath}".
+2. If zero Skyramp tests found, report \`testMaintenance\` as an empty array with
+   a note in \`issuesFound\`: "No existing Skyramp tests found for maintenance."
+3. If tests exist:
+   a. Baseline them (from CI status or by executing).
+   b. Run \`skyramp_analyze_test_drift\` → \`skyramp_calculate_health_scores\` → \`skyramp_actions\`.
+   c. Apply actions (path renames, schema updates) in-place. Do not regenerate.
+   d. Execute modified tests. Report before/after in \`testMaintenance\`.
+## Task 3: Submit Report
+Verify Tasks 1 and 2 are complete, then call \`skyramp_submit_report\` with
+\`summaryOutputFile\`: "${summaryOutputFile}".
+\`commitMessage\`: under 72 chars, e.g. "add integration tests for /products and /orders"
+**newTestsCreated** — list every generated test file (at most 4):
+   \`testType\`, \`endpoint\`, \`fileName\`, \`description\`, \`scenarioFile\`, \`traceFile\`, \`frontendTrace\`
+   Use the actual file path returned by the generation tool for \`scenarioFile\`.
+   Include scenario JSON files in the git commit alongside test files.
+   Every test file in the commit should appear here. If you over-generated, delete extras first.
+   If no tests were generated (e.g. frontend-only PR without traces), pass an empty array.
+**additionalRecommendations** — remaining recommendations not generated:
+   \`testType\`, \`scenarioName\`, \`priority\`, \`description\`, \`steps\`, artifact paths
+**businessCaseAnalysis** — based only on PR data and tool outputs.`;
 }
 export function registerTestbotPrompt(server) {
     logger.info("Registering testbot prompt");
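
Reviewer note on the "fix chaining only" instructions in the hunk above: the sketch below is one possible reading of them, not code taken from the package. The `getResponseValue` function here is a hypothetical local stand-in that resolves a dotted path against a plain object; the real Skyramp accessor may behave differently. The response object and its values are made up for illustration.

```javascript
// Hypothetical stand-in for a response accessor. This is NOT the Skyramp
// library's implementation; it only resolves a dotted path such as
// "response.id" against the body of a plain response object.
function getResponseValue(response, dottedPath) {
  return dottedPath
    .split(".")
    .reduce(
      (value, key) => (value == null ? undefined : value[key]),
      { response: response.body }
    );
}

// Made-up response from an earlier POST /products step.
const productResponse = { statusCode: 201, body: { id: 42, name: "widget" } };

// Before: hardcoded values of the kind the prompt says the CLI may emit.
const hardcodedPath = "/products/1";
const hardcodedOrderBody = { product_id: 1, quantity: 2 };

// After: only the chaining-related values change.
const productId = getResponseValue(productResponse, "response.id");
const chainedPath = `/products/${productId}`;
const chainedOrderBody = { ...hardcodedOrderBody, product_id: productId };

console.log(chainedPath); // /products/42
console.log(chainedOrderBody);
```

The spread keeps every other request body field as generated, which matches the prompt's requirement to leave non-chaining fields untouched.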

package/build/services/ScenarioGenerationService.js CHANGED

@@ -21,8 +21,6 @@ export class ScenarioGenerationService {
                 };
             }
             // Handle file writing
-            //add hyphen to the scenario name
-            //make file in tmp directory
             const scenarioName = params.scenarioName.replace(/ /g, "-").toLowerCase();
             const fileName = `scenario_${scenarioName}.json`;
             const filePath = path.join(params.outputDir, fileName);
@@ -130,7 +128,7 @@ ${JSON.stringify(traceRequest, null, 2)}
         const timestamp = new Date().toISOString();
         const method = params.method;
         const path = params.path;
-        const statusCode = params.statusCode || 200;
+        const statusCode = params.statusCode;
         const requestBody = params.requestBody ||
             (method === "GET" || method === "DELETE" ? "" : "{}");
         let responseBody = params.responseBody;
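
Reviewer note on the scenario file naming kept as context in the first hunk above: a minimal sketch of what the two expressions appear to produce. The `scenarioFileName` wrapper is hypothetical; only the `replace(/ /g, "-").toLowerCase()` call and the `scenario_<name>.json` pattern come from the diff.

```javascript
// Hypothetical wrapper around the two expressions visible in the hunk.
function scenarioFileName(scenarioName) {
  // Each space character becomes a hyphen, then the name is lowercased.
  const normalized = scenarioName.replace(/ /g, "-").toLowerCase();
  return `scenario_${normalized}.json`;
}

console.log(scenarioFileName("Collections Links")); // scenario_collections-links.json
console.log(scenarioFileName("Create  Order"));     // scenario_create--order.json
```

Only literal spaces are replaced, so a double space yields a double hyphen, and other characters such as tabs or slashes pass through unchanged.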

package/build/tools/generate-tests/generateIntegrationRestTool.js CHANGED

@@ -17,7 +17,6 @@ const integrationTestSchema = z
         .describe("Path to the scenario file to be used for test generation. This file is generated by the skyramp_scenario_test_generation tool.")
         .optional(),
     ...codeRefactoringSchema.shape,
-    ...baseTestSchema,
     endpointURL: baseTestSchema.endpointURL.default(""),
 })
     .omit({ method: true }).shape;
@@ -56,12 +55,16 @@ multi-resource workflows often need **different chaining per step**. You MUST:
      (e.g., \`/products/{product_id}\` uses the product's ID, not the order's ID)
    - POST calls that create a child resource must include the **parent's ID** in the request body
      or path (e.g., \`POST /orders\` body should include \`product_id\` from the products POST response)
-4. **Fix any issues found**:
-   - Replace hardcoded IDs (like \`/products/1\`) with the dynamic variable from the POST response
+4. **Fix ONLY chaining** (path params AND request body ID references) — nothing else:
+   - Replace hardcoded path IDs (like \`/products/1\`) with the dynamic variable from the POST response
+   - Replace hardcoded IDs in request bodies (like \`"product_id": 1\`) with the dynamic variable
+     (use \`data_override\`/\`dataOverride\` or direct variable substitution)
    - Rename duplicate variable names so each resource has its own ID variable
-   - Ensure request bodies reference the correct chained IDs
    - For Python: use \`skyramp.get_response_value(response_N, "id")\` to extract and f-strings for paths
-   - For TypeScript: use \`response_N.body.id\` or the appropriate response accessor
+   - For TypeScript: use \`getResponseValue(response, "response.id")\` or the appropriate accessor
+   ⚠️ **Preserve everything else exactly as generated** — do not add, remove, or modify
+   auth headers, cookies, tokens, env vars, imports, assertions, or non-chaining request body fields.
+   The CLI output for auth/headers is intentional.
 **Example fix for a products → orders workflow:**
 \`\`\`

package/build/tools/generate-tests/generateScenarioRestTool.js CHANGED

@@ -36,8 +36,7 @@ const scenarioTestSchema = {
         .describe("JSON string of the response body parsed by AI from the scenario"),
     statusCode: z
         .number()
-        .optional()
-        .describe("HTTP status code (e.g., 200, 201, 204) parsed by AI from the scenario"),
+        .describe("Expected HTTP status code. Read status codes from the API schema file else defaults: POST→201, DELETE→204, GET/PUT/PATCH→200."),
     outputDir: baseSchema.shape.outputDir,
     authHeader: z
         .string()
@@ -83,8 +82,6 @@ The AI should parse the natural language scenario and provide:
 - AI-parsed HTTP method and path (required)
 - AI-parsed request/response bodies (optional)
-**IMPORTANT: If an apiSchema parameter (OpenAPI/Swagger file path or URL) is provided, DO NOT attempt to read or analyze the file contents. These files can be very large. Simply pass the path/URL to the tool - the backend will handle reading and processing the schema file.**
 **Note:** This tool generates one request at a time. Call multiple times for multi-step scenarios.
 **CRITICAL - Integration Test Generation After Scenario Creation:**
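
Reviewer note on the `statusCode` change: the field is no longer optional in this schema, and the `|| 200` fallback was removed from ScenarioGenerationService.js in the same release, so the caller is now expected to supply a value. The sketch below shows one way a caller might apply the defaults named in the new description. Both function names are hypothetical, and throwing for methods the description does not list is an assumption of this sketch.

```javascript
// Hypothetical helper: the per-method defaults named in the new
// statusCode description (POST→201, DELETE→204, GET/PUT/PATCH→200).
function defaultStatusCode(method) {
  switch (method.toUpperCase()) {
    case "POST":
      return 201;
    case "DELETE":
      return 204;
    case "GET":
    case "PUT":
    case "PATCH":
      return 200;
    default:
      // The description lists no default for other methods (assumption: fail loudly).
      throw new Error(`No default status code listed for ${method}`);
  }
}

// Hypothetical resolver: prefer a status code read from the API schema,
// and fall back to the per-method default only when none was found.
function resolveStatusCode(method, schemaStatusCode) {
  return schemaStatusCode ?? defaultStatusCode(method);
}

console.log(resolveStatusCode("POST", undefined)); // 201
console.log(resolveStatusCode("DELETE", 200));     // 200
```

Using `??` rather than `||` means only `null` or `undefined` triggers the fallback, so an explicit schema value is never overridden.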

package/build/tools/submitReportTool.js CHANGED

@@ -15,6 +15,10 @@ const newTestSchema = z.object({
     testType: z.string().describe("Type of test created: Smoke, Contract, Integration, etc."),
     endpoint: z.string().describe("HTTP verb and path, e.g. 'GET /api/v1/products'"),
     fileName: z.string().describe("Name of the generated test file"),
+    description: z.string().optional().describe("What the test scenario covers, e.g. 'Creates a collection, adds a link, then verifies the link exists'"),
+    scenarioFile: z.string().optional().describe("Path to the scenario JSON file if one was generated (e.g. 'tests/scenario_collections-links.json')"),
+    traceFile: z.string().optional().describe("Path to the backend trace file if used or created"),
+    frontendTrace: z.string().optional().describe("Path to the Playwright/UI trace file if used or created"),
 });
 const descriptionSchema = z.object({
     description: z.string().describe("One-line description"),
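
Reviewer note on the four fields added to `newTestSchema`: a sketch of what a `newTestsCreated` entry might look like after this change. The field names come from the hunk above; every value is made up, and `checkNewTestEntry` is a hand-rolled, hypothetical check used here only to keep the example free of dependencies. It is not how the package validates input.

```javascript
// Field names taken from newTestSchema in the hunk above.
const REQUIRED_FIELDS = ["testType", "endpoint", "fileName"];
const OPTIONAL_FIELDS = ["description", "scenarioFile", "traceFile", "frontendTrace"];

// Hypothetical dependency-free check: required fields must be strings,
// optional fields must be strings when present.
function checkNewTestEntry(entry) {
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof entry[field] !== "string") problems.push(`missing or non-string: ${field}`);
  }
  for (const field of OPTIONAL_FIELDS) {
    if (entry[field] !== undefined && typeof entry[field] !== "string") {
      problems.push(`non-string: ${field}`);
    }
  }
  return problems;
}

// Illustrative entry; the file names and description are invented.
const newTestEntry = {
  testType: "Integration",
  endpoint: "POST /api/v1/products",
  fileName: "products_orders_integration_test.py",
  description: "Creates a product, creates an order for it, then verifies the order",
  scenarioFile: "tests/scenario_products-orders.json",
};

console.log(checkNewTestEntry(newTestEntry).length); // 0
```

Because the new fields are optional, entries written against the rc.1 shape (only `testType`, `endpoint`, `fileName`) would still pass a check like this one.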

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@skyramp/mcp",
-  "version": "0.0.60-rc.1",
+  "version": "0.0.60-rc.2",
   "main": "build/index.js",
   "type": "module",
   "bin": {