npm - @skyramp/mcp - Versions diffs - 0.1.0-rc.6 → 0.1.1 - Mend

@skyramp/mcp 0.1.0-rc.6 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21)

package/build/playwright/registerPlaywrightTools.js +2 -0
package/build/prompts/test-recommendation/analysisOutputPrompt.js +12 -2
package/build/prompts/test-recommendation/test-recommendation-prompt.js +146 -27
package/build/prompts/test-recommendation/test-recommendation-prompt.test.js +202 -5
package/build/prompts/testbot/testbot-prompts.js +30 -14
package/build/services/TestDiscoveryService.js +417 -58
package/build/services/TestDiscoveryService.test.js +361 -0
package/build/tools/test-management/actionsTool.js +4 -1
package/build/tools/test-management/analyzeChangesTool.js +76 -9
package/build/tools/test-management/analyzeTestHealthTool.js +6 -2
package/build/types/RepositoryAnalysis.js +1 -0
package/build/types/TestAnalysis.js +6 -1
package/build/utils/docker.test.js +1 -1
package/build/utils/routeParsers.js +7 -0
package/build/utils/routeParsers.test.js +29 -1
package/build/utils/versions.js +1 -1
package/node_modules/playwright/lib/common/expectBundleImpl.js +221 -221
package/node_modules/playwright/lib/mcp/browser/tools/extensionFrames.js +180 -0
package/node_modules/playwright/lib/mcp/browser/tools/keyboard.js +2 -2
package/node_modules/playwright/lib/utilsBundleImpl.js +49 -49
package/package.json +2 -2

package/build/prompts/testbot/testbot-prompts.js

@@ -51,7 +51,7 @@ Use those recommendations as your baseline. Only add or remove tests that the us
 1. Call \`skyramp_analyze_changes\` with \`repositoryPath\`: "${repositoryPath}", \`scope\`: "branch_diff", \`topN\`: ${maxRecommendations}, \`maxGenerate\`: ${maxGenerate}${baseBranch ? `, \`baseBranch\`: "${baseBranch}"` : ""}${prNumber ? `, \`prNumber\`: ${prNumber}` : ""}${stateOutputFile ? `, \`stateOutputFile\`: "${stateOutputFile}"` : ""} — discovers existing Skyramp tests, scans endpoints changed in the diff, loads workspace config, and returns ${maxRecommendations} ranked ADD recommendations (${maxGenerate} to generate, ${maxRecommendations - maxGenerate} as additional).${prNumber ? " Uses PR comment history to avoid re-recommending already-generated tests." : ""}
    **If \`skyramp_analyze_changes\` returns an error:** retry once only if the error is transient (timeout, network blip, temporary unavailability) — do NOT retry for permanent errors (invalid repository path, missing required parameter, authentication failure). If it fails again, call \`skyramp_submit_report\` with a minimal valid payload: leave all test arrays empty and add the error to \`issuesFound\`. Refer to the \`skyramp_submit_report\` schema for required fields. Do NOT attempt Task 2 without a valid stateFile.
-   **If all changed files are non-application** (CI/CD, docs, lock files, config) → skip to Task 3 (Submit Report) with empty arrays.
+   **If all changed files are non-application** (CI/CD, docs, lock files, config) → skip to Task 3 (Submit Report) with empty arrays and a single \`issuesFound\` entry explaining why (same format as the zero-test path below).
 2. **Maintain existing tests** using the rules in \`<drift_analysis_rules>\` below. For each existing test reported by \`skyramp_analyze_changes\`, score it and choose the action exactly as directed by the Action Decision Matrix in \`<drift_analysis_rules>\`. Only read test files that require action per that matrix — do NOT read files that will be IGNORED. **Do NOT read source files (routers, models, CRUD, components) — all the information you need is in the \`skyramp_analyze_changes\` output and the diff.** When reading multiple test files, **read them all in a single parallel batch** — do NOT read them one at a time. Apply actions directly. Results go in \`testMaintenance\`.
@@ -87,18 +87,29 @@ ${task1Section}
 ## Task 2: Generate New Tests
-${userPrompt ? "" : "Drift-based maintenance (Task 1) is complete. This step only processes the GENERATE list. Exception: if a GENERATE item targets a resource with an existing contract test, UPDATE that file instead (see covered-resource handling below) — this is a generation-driven edit, not a maintenance re-run."}
+${userPrompt ? "Generate only the tests that the user requested from the Additional Recommendations. The rules below still apply." : "Drift-based maintenance (Task 1) is complete. This step only processes the GENERATE list. Exception: if a GENERATE item targets a resource with an existing `[skyramp]` contract test, UPDATE that test file (see covered-resource handling below) — a new test case added to an existing file counts toward the budget and is reported in `newTestsCreated`."}
 - **MANDATORY — use the pre-ranked GENERATE list as-is**: The Execution Plan's GENERATE section governs ADD actions. You MUST generate exactly those scenarios in the exact order listed. Do NOT substitute, rename, or replace a GENERATE item. If parameter grounding uncovers a distinct bug-catching scenario not already in the GENERATE or ADDITIONAL list, generate it after all planned GENERATE items are complete and report it in \`newTestsCreated\` — this is an additional test driven by source-code analysis and does not count against the GENERATE budget.
 - Scenario JSON files are always new files — always generate them for new methods. Every generated scenario JSON must have a corresponding new integration test generated from it via \`skyramp_integration_test_generation\`.
-- **Covered-resource handling (aligns with Execution Plan Step 0):** When a GENERATE item targets a resource that already has an existing test file of the **same test type** (e.g. existing contract test → GENERATE contract test for same resource):
-  - **Contract tests**: UPDATE the existing file (add the new method's test cases). Report in \`testMaintenance\`, NOT \`newTestsCreated\`. This does NOT count toward the budget — advance to the next candidate.
-  - **Integration/scenario tests**: Always generate as a new file via the scenario pipeline (\`skyramp_batch_scenario_test_generation\` → \`skyramp_integration_test_generation\`), even if an existing integration test covers the same resource. A new multi-step scenario (e.g. create → PATCH → verify recalculation) is a distinct test file. Report in \`newTestsCreated\` and count toward the budget.
-  - **UI tests**: Always generate as a new file. Report in \`newTestsCreated\`.
+- Covered-resource handling (aligns with Execution Plan Step 0): When a GENERATE item targets a resource that already has an existing test file covering the same endpoint:
+  - If the existing test source is \`[external]\`, skip the resource entirely — the external test already provides coverage. Do NOT UPDATE, REGENERATE, or DELETE external tests.
+  - If the existing test is tagged \`[skyramp]\`, apply type-specific rules:
+    - Contract tests: UPDATE the existing Skyramp test file (add the new method's test cases). A new test case is a new test even if the file already exists — report in \`newTestsCreated\` and count toward the budget.
+    - Integration/scenario tests: Always generate as a new file via the scenario pipeline (\`skyramp_batch_scenario_test_generation\` → \`skyramp_integration_test_generation\`), even if an existing integration test covers the same resource. A new multi-step scenario (e.g. create → PATCH → verify recalculation) is a distinct test file. Report in \`newTestsCreated\` and count toward the budget.
+    - UI tests: Always generate as a new file. Report in \`newTestsCreated\`.
   Keep advancing until you have created exactly ${maxGenerate} new test files OR exhausted all candidates.
-- **Example**: If enrichment reveals that sending \`discount_value\` without \`discount_type\` silently orphans the value (a concrete bug), complete all planned GENERATE items first, then generate this discovered scenario as an extra test and report it in \`newTestsCreated\`.
-- **Total generated**: Follow the **"Budget: N generate"** line in the Execution Plan. Process every GENERATE-tagged item in order. Items that become UPDATEs (covered resource) do not count — backfill from ADDITIONAL candidates (highest-ranked first) until \`newTestsCreated\` reaches ${maxGenerate} or all candidates are exhausted.
-- **UI test priority**: If the diff contains frontend/UI changes (e.g. \`.tsx\`, \`.jsx\`, \`.vue\`, \`.svelte\` files), you MUST attempt to generate at least one UI test. Use \`browser_navigate\` to the app's base URL — if the app responds, record a trace and generate the test. Only skip if the app is unreachable. This takes priority over generating additional backend-only tests.
+- Example: If enrichment reveals that sending \`discount_value\` without \`discount_type\` silently orphans the value (a concrete bug), complete all planned GENERATE items first, then generate this discovered scenario as an extra test and report it in \`newTestsCreated\`.
+- Total generated: Follow the "Budget: N generate" line in the Execution Plan. Process every GENERATE-tagged item in order. Backfill from ADDITIONAL candidates (highest-ranked first) until \`newTestsCreated\` reaches ${maxGenerate} or all candidates are exhausted.
+- **UI test priority**: If the diff contains frontend/UI changes (e.g. \`.tsx\`, \`.jsx\`, \`.vue\`, \`.svelte\` files), you MUST attempt to generate at least one UI test. Use \`browser_navigate\` to the app's base URL — if the app responds, record a trace and generate the test.
+  **Skip only if one of these conditions is met:**
+  - **(a) App is unreachable** — \`browser_navigate\` fails or connection is refused.
+  - **(b) Unintegrated non-route component** — the changed file is a leaf component (not a framework route/entrypoint) that has no integration point in the running app. To confirm:
+    1. Grep for the component's exported name AND its module path/filename across all production source files (excluding \`*.test.*\`, \`*.spec.*\`, \`*.stories.*\`, \`__tests__/\` directories — only production code imports count).
+    2. If no production file imports, re-exports, or renders it, the component has no DOM node in the running app → unintegrated.
+    3. **Exception**: if the same PR also adds a route/page file (e.g. under Next.js \`pages/\` or \`app/\`) that imports the component, the route IS the integration point — test through it.
+  **Never** apply the unintegrated heuristic to framework route/entrypoint files themselves — those are always reachable by convention.
+  **Never** generate tests for unrelated pages as a substitute for an unintegrated component.
+  This rule takes priority over generating additional backend-only tests.
 - **Always generate a test for critical bugs, even if it will fail.** When a GENERATE-tagged item targets a page or endpoint with a known bug, do NOT skip it because you expect the test to fail — a failing test that documents a bug is more valuable than a text-only description. This applies within the existing GENERATE budget; do not add extra tests beyond the plan.
    - For UI rendering bugs: navigate to the broken page and add a \`browser_assert\` that verifies the page rendered its expected content (e.g. assert the page heading is visible). The assertion will fail on the broken page, which is the correct outcome — it documents the bug as a failing test.
    - The assertion MUST target the broken page itself, not a different page that works. If \`/orders/{id}/edit\` crashes, assert on \`/orders/{id}/edit\` (e.g. "Edit Order" heading visible), NOT on \`/orders\`.
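
The confirmation steps for skip condition (b) in the hunk above amount to a reference search that counts only production files. The sketch below is one possible reading of those steps, not code from the package: the function name `findProductionReferences`, the in-memory `sources` map, and the sample paths are all hypothetical, and a plain substring match stands in for the grep the prompt describes.

```javascript
// Paths the prompt excludes from the search:
// *.test.*, *.spec.*, *.stories.*, and __tests__/ directories.
const NON_PRODUCTION = /(\.test\.|\.spec\.|\.stories\.|(^|\/)__tests__\/)/;

// Hypothetical helper: returns the production files that mention either the
// component's exported name or its module filename (without extension).
// `sources` maps a file path to that file's text.
function findProductionReferences(exportedName, modulePath, sources) {
  const baseName = modulePath.split("/").pop().replace(/\.[^.]+$/, "");
  return Object.entries(sources)
    // Skip the component's own file and every non-production file.
    .filter(([path]) => path !== modulePath && !NON_PRODUCTION.test(path))
    // Grep-like match on the exported name or the module filename.
    .filter(([, text]) => text.includes(exportedName) || text.includes(baseName))
    .map(([path]) => path);
}

// Illustrative data only: the component is referenced solely from test files.
const sources = {
  "src/ui/DiscountBadge.tsx": "export function DiscountBadge() { return null; }",
  "src/ui/DiscountBadge.test.tsx": 'import { DiscountBadge } from "./DiscountBadge";',
  "src/ui/__tests__/badge.tsx": 'import { DiscountBadge } from "../DiscountBadge";',
  "src/pages/orders.tsx": "export default function Orders() { return null; }",
};

const references = findProductionReferences(
  "DiscountBadge",
  "src/ui/DiscountBadge.tsx",
  sources
);
console.log(references.length === 0 ? "unintegrated" : "integrated", references);
```

An empty result corresponds to step 2 (no production file imports, re-exports, or renders the component). If the same PR added a route file that imports the component, that file would appear in the result, which matches the exception in step 3. A substring match over-reports for generic filenames such as `index`, so a real check would likely need to inspect import statements.
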
@@ -197,8 +208,7 @@ If a test **generation** tool call fails:
 2. If it fails again, **skip** that candidate and move to the next ranked candidate.
 3. If all candidates in the GENERATE set fail, fall back to generating the **simplest possible test**: a single contract test for the highest-scored endpoint (GET → 200 or POST → 201).
    **Exception — frontend-only PRs**: If the diff modifies ONLY frontend files (\`.tsx\`, \`.jsx\`, \`.vue\`, \`.svelte\`, \`.css\`, \`.html\`) AND browser recording was not possible, do NOT generate a backend fallback contract test — it is irrelevant to the PR. Instead move ALL GENERATE candidates to \`additionalRecommendations\` and proceed to Task 3.
-4. You MUST generate **at least 1 test** for any PR that touches application code. Zero generated tests is NOT acceptable — unless the frontend-only exception above applies.
-5. Log skipped candidates in \`issuesFound\` with the error message.
+4. Log skipped candidates in \`issuesFound\` with the error message.
 If a test **execution** (\`skyramp_execute_test\`) fails for a newly generated test:
 1. Read the error output to diagnose the root cause (4xx on prereq step, assertion mismatch, floating-point precision, 500 from app bug, timeout, etc.).
@@ -235,12 +245,18 @@ Do not make any changes other than the assertion enhancements described above. F
 **Before calling \`skyramp_submit_report\` — mandatory count check:**
 If you skipped here due to non-application changes (per Task 1), submit with empty arrays — the count checks below do not apply.
-**If you generated zero new tests because the PR has no testable behavioral surface (cosmetic/docs/style/dependency-only):**
+**If you generated zero new tests because the PR has no testable behavioral surface:**
+This applies when the diff contains ONLY changes with no observable API or UI behavior change. Examples:
+- Cosmetic/docs/style: JSDoc updates, CSS reformats, comment-only changes
+- Dependency-only: version bumps with no API surface change
+- Dead code / unintegrated utility or component: a new helper function, utility, or UI component added to the codebase but not imported, mounted, or rendered anywhere — use this classification only after confirming the new symbol does not appear as an import or render call in any other source file; do NOT classify as dead code based solely on the diff. For UI components specifically: an unintegrated component has no DOM node in the running app and cannot be browser-tested regardless of how complex its logic is
+- Config-only: linter rules, build config, environment variable additions with no runtime behavior change
+In these cases:
 - \`newTestsCreated\` must be \`[]\`
-- Add exactly one entry to \`issuesFound\`: \`"No testable behavioral surface detected: <brief reason, e.g. 'JSDoc-only changes with no endpoint modifications', 'CSS reformat with no logic changes', 'dependency version bump with no API surface change'>. Zero new tests generated by design."\`
+- \`issuesFound\` must be \`[]\` — do NOT add a "No testable behavioral surface" entry; the business case already explains the abstention
 - \`businessCaseAnalysis\` must be a one-sentence summary of what the PR actually does (do NOT leave it blank)
 - \`additionalRecommendations\` must be \`[]\` — do NOT recommend tests for a no-surface PR
-- A blank \`issuesFound\` when tests were intentionally skipped will lose report quality points
 Otherwise: in \`newTestsCreated\`, you must have exactly ${maxGenerate} budget-counting new tests for the planned GENERATE items. Only new files (ADD) created for those planned GENERATE items count toward this ${maxGenerate} target — GENERATE items converted to UPDATE do not. You may also include at most one additional discovered-scenario file in \`newTestsCreated\` (the bug-catching test generated after all planned items); that extra test does **not** count against the ${maxGenerate} budget. If you have fewer than ${maxGenerate} budget-counting new tests, backfill from the remaining ADDITIONAL candidates before proceeding. Only proceed with fewer than ${maxGenerate} budget-counting new tests if all candidates failed after retry AND the fallback single-contract test also failed.
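
For the zero-test path in the hunk above, the 0.1.1 prompt requires three empty arrays plus a non-blank `businessCaseAnalysis`. A minimal sketch of those four fields follows. Only the field names come from the diff; the helper `buildNoSurfaceReport` is hypothetical, and the real `skyramp_submit_report` schema may require additional fields that are not shown here.

```javascript
// Hypothetical helper: assembles the report fields that the 0.1.1 prompt
// prescribes for a PR with no testable behavioral surface.
function buildNoSurfaceReport(businessCaseAnalysis) {
  const summary = String(businessCaseAnalysis ?? "").trim();
  if (summary.length === 0) {
    // The prompt says businessCaseAnalysis must not be left blank.
    throw new Error("businessCaseAnalysis must not be blank");
  }
  return {
    newTestsCreated: [],           // zero new tests by design
    issuesFound: [],               // 0.1.1: no "No testable behavioral surface" entry
    businessCaseAnalysis: summary, // one-sentence summary of what the PR does
    additionalRecommendations: [], // no recommendations for a no-surface PR
  };
}

const report = buildNoSurfaceReport(
  "Reformats CSS on the orders page with no logic changes."
);
console.log(JSON.stringify(report, null, 2));
```

Under 0.1.0-rc.6 the same path required exactly one `issuesFound` entry. In 0.1.1 that explanation moves into `businessCaseAnalysis`, while the non-application path in Task 1 still carries a single `issuesFound` entry.
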