@skyramp/mcp 0.2.0 → 0.2.1-rc.1
- package/build/prompts/test-maintenance/drift-analysis-prompt.js +87 -98
- package/build/prompts/test-maintenance/drift-analysis-prompt.test.js +60 -92
- package/build/prompts/test-maintenance/driftAnalysisSections.js +197 -139
- package/build/prompts/testbot/testbot-prompts.js +7 -4
- package/build/prompts/testbot/testbot-prompts.test.js +22 -17
- package/build/services/TestDiscoveryService.js +9 -39
- package/build/tools/test-management/actionsTool.js +148 -166
- package/build/tools/test-management/analyzeChangesTool.js +10 -2
- package/build/tools/test-management/analyzeTestHealthTool.js +22 -10
- package/package.json +1 -1
@@ -2,37 +2,42 @@
  * Modular section builders for the Drift Analysis prompt,
  * mirroring the recommendationSections.ts pattern.
  */
-
+ import { AUTH_MIDDLEWARE_PATTERNS_STR } from "../../utils/workspaceAuth.js";
+ export function buildActionDecisionTree() {
  return `<decision_rules>
-
+ **Before the numbered checks, apply the scope gate:** can this test's service or interface boundary actually reach the changed code? If the answer is clearly no — the test targets a definitively different service, a read-only replica, a completely separate microservice, or a different protocol — assign IGNORE and stop. A failed scope gate is terminal: it does not produce a signal subject to severity comparison below. When reachability is uncertain (dynamic serializers, inherited base models, conditional field exposure), use VERIFY instead of IGNORE.

- For each
+ **Before working through any individual check, do a single pass over the entire diff** to record all high-signal patterns. For each matched diff line write one line: \`{pattern type} — "{diff line}" — affects {endpoint/test}\`. Build this detection list first — it is your working artifact for all tests. An action with no entry in this list is unsupported.

-
-
-
-
-
-
-
-
+ High-signal patterns to look for in the pre-scan:
+ - Route removed or renamed: \`- @app.route\`, \`- router.get\`, \`- @GetMapping\`, or paired \`-\`/\`+\` on a route decorator
+ - Field type/removal: \`- field: int\`/\`+ field: string\`, \`- "responseField":\`
+ - Status/enum change: \`- return 200\`/\`+ return 201\`, \`- status: "active"\`/\`+ status: "enabled"\`
+ - Root wrapper change: \`- return Response({...})\`/\`+ return Response({"data":{...},"meta":{...}})\`
+ - New field in serializer/view/output formatter: \`+ "newField":\` or \`+ newField =\` inside a Response or serializer
+ - New field in model/migration only (no serializer): \`+ newField = Column(...)\`
+ - Auth added/removed/changed: \`+ @require_auth\`, \`- @require_auth\`, token-type change
+ - Scope narrowed: \`+ requireRole\`, \`+ raise PermissionError\`, \`+ if not is_owner\`, \`+ [x for x in xs if x.owner == caller_id]\`
+ - Behavioral: \`+ raise ValidationError\`/\`+ HTTPException(409)\` on new \`if\`, \`+ VALID_TRANSITIONS\`, sync→async (\`- return 200\`/\`+ return 202\`), formula change (\`- total = a - b\`/\`+ total = a + tax - b\`)
+
+ Then, for each test where the changed code *is* reachable, work through the individual checks below using your pre-built detection list. Collect **all** matching signals, then assign the single highest-severity action across all matches. Severity order (highest first): **DELETE > REGENERATE > UPDATE > VERIFY > IGNORE**. **Before assigning UPDATE, REGENERATE, or DELETE, quote the specific diff line(s) that triggered it in the rationale. If you cannot point to a diff line this test's endpoint can observe, the action is IGNORE or VERIFY, not UPDATE.**

  Rules:
- -
- -
- -
- - Prefer IGNORE over VERIFY when all changed files are unrelated to the test's endpoint.
+ - Collect all signals; assign the highest-severity action across them. Include all matched signals in the rationale and all matching \`updateInstructions\` — a diff that renames a path AND adds a field requires both a URL patch and a new assertion.
+ - DELETE when all covered endpoints no longer exist; REGENERATE when they still exist but the root response wrapper changed and essentially every assertion is now invalid. In all other cases, prefer UPDATE.
+ - REGENERATE when the root response wrapper changed (flat→nested, new envelope object, root key renamed) and essentially every assertion is invalid — if you can describe the fix as patching N specific paths, it is UPDATE regardless of how many paths there are.
+ - Prefer IGNORE over VERIFY when all changed files are unrelated to the test's endpoint. Exception: if the diff touches a shared serializer, base model, or response formatter that the test's endpoint uses, prefer VERIFY even if no route file changed.
  - ADD actions belong in the next step — complete this assessment with IGNORE / VERIFY / UPDATE / REGENERATE / DELETE only.

  <examples>
  <example>
- Diff adds one field to a
+ Diff adds one field to a serializer and renames a URL path segment:
  \`\`\`
  - @app.route("/users/<id>/orders")
  + @app.route("/users/<id>/purchases")
- + "total_items": len(order.items)
+ + "total_items": len(order.items) # inside the serializer this test hits
  \`\`\`
- → **UPDATE**:
+ → **UPDATE**: serializer signal confirmed in diff (total_items added to the same serializer) + path rename. Patch the URL and add an assertion for \`total_items\`.
  </example>
  <example>
  Diff wraps the entire response in a new envelope object:
@@ -40,135 +45,197 @@ Diff wraps the entire response in a new envelope object:
  - return Response({"id": ..., "status": ..., "items": [...]})
  + return Response({"data": {"id": ..., "status": ..., "items": [...]}, "meta": {"page": 1}})
  \`\`\`
- → **REGENERATE**: root
+ → **REGENERATE**: root wrapper changed — every existing assertion (e.g. \`response["id"]\`) is broken. Rewrite from scratch.
+ </example>
+ <example>
+ Diff adds a field to a model/migration only — project uses explicit serializers (DRF, FastAPI, etc.):
+ \`\`\`
+ + sort_order = Column(Integer, nullable=True) # in models.py
+ \`\`\`
+ No serializer or field-inclusion list changed.
+ → **VERIFY**: model-only signal in a project with explicit serializers — cannot confirm from the diff whether the field is included in the serializer's \`fields\` list and therefore exposed in API responses.
+ </example>
+ <example>
+ Diff adds a field to a schema/model only — project has no explicit serializer layer (ORM fields passed through directly):
+ \`\`\`
+ + sort_order: {type: 'integer', nullable: true} # in db/schema.js
+ \`\`\`
+ No serializer file changed. No \`fields =\` or field-exclusion list exists for this resource.
+ → **UPDATE**: project has no serializer gate, so new schema columns are auto-exposed in API responses. Augment the test to assert the new field is present and round-trips correctly.
+ </example>
+ <example>
+ Diff changes a status code in the handler:
+ \`\`\`
+ - res.status(200).json(...)
+ + res.status(201).json(...)
+ \`\`\`
+ → **UPDATE**: the test asserts \`toBe(200)\` which now fails. Patch the status assertion.
+ </example>
+ <example>
+ Diff adds a role gate to a route the test covers:
+ \`\`\`
+ + if (user.role !== "owner") {
+ + return res.status(403).json({ error: "forbidden_role" });
+ + }
+ \`\`\`
+ → **UPDATE**: the test's existing token now gets 403. Send a token with sufficient role and add a 403 negative assertion for the restricted role. (Authorization scope narrowed — not an auth-mechanism change; the Auth/AuthZ and Behavioral Contract checks cover this.)
+ </example>
+ <example>
+ Diff adds a state-transition guard:
+ \`\`\`
+ + const VALID_TRANSITIONS = { draft: ["review"], review: ["published"] };
+ + if (!VALID_TRANSITIONS[currentStatus]?.includes(newStatus)) {
+ + throw new HTTPException(409, "invalid_transition");
+ + }
+ \`\`\`
+ → **UPDATE**: an integration test that previously posted \`draft→published\` directly now gets 409. Chain through the valid states (draft→review→published) and add a 409 assertion for the direct skip.
  </example>
  </examples>
  </decision_rules>`;
  }
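A minimal sketch of the selection rule described in the added `<decision_rules>` text: the scope gate is applied first, then the highest-severity action across all collected signals wins. The function name `assignAction`, the `reachable` values, and the `{ action, diffLine }` signal shape are assumptions made for illustration; none of them appear in the package. Returning VERIFY for uncertain reachability is one reading of the prompt text, not confirmed behavior.

```javascript
// Severity order quoted from the prompt text, listed lowest to highest.
const SEVERITY = ["IGNORE", "VERIFY", "UPDATE", "REGENERATE", "DELETE"];

// Hypothetical helper: `reachable` is "yes", "no", or "uncertain";
// `signals` is an array of { action, diffLine } collected for one test.
function assignAction(reachable, signals) {
  // A failed scope gate is terminal: no severity comparison happens.
  if (reachable === "no") return "IGNORE";
  // Assumption: uncertain reachability yields VERIFY rather than IGNORE.
  if (reachable === "uncertain") return "VERIFY";
  // Otherwise keep the single highest-severity action across all signals.
  return signals.reduce(
    (best, s) => (SEVERITY.indexOf(s.action) > SEVERITY.indexOf(best) ? s.action : best),
    "IGNORE"
  );
}
```

With a path rename (UPDATE) and a model-only field (VERIFY) collected for the same test, this sketch returns UPDATE, which matches the first example in the prompt.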
+ // Retained for backwards compatibility — no longer rendered in the prompt.
+ // Diff signals are now inlined into each individual check function.
+ /** @deprecated use the individual check functions; this function is no longer part of the prompt */
  export function buildBreakingChangePatterns() {
- return
+ return "";
+ }
+ export function buildCheckEndpointExistence() {
+ return `Does the endpoint the test targets still exist in the codebase?

-
+ Diff signals to look for:
+ - Route removed: \`- @app.route("/path")\`, \`- router.get("/path")\`, \`- @GetMapping("/path")\`
+ - Route renamed: paired \`-\` and \`+\` on a route decorator with a different path

-
- -
- -
- -
-
+ Actions:
+ - ALL endpoints the test covers were removed → DELETE (the entire test file is obsolete)
+ - SOME methods removed but others remain → UPDATE (remove test functions for deleted methods, keep the rest)
+ - Endpoint renamed → UPDATE (path substitution; supply \`renamedEndpoints\`)`;
+ }
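The three actions in the Endpoint Existence check can be read as a small decision function. The sketch below is illustrative only: `endpointExistenceAction`, the `"METHOD /path"` string form, and the `renamed` map are hypothetical and not part of the package, and the real shape of `renamedEndpoints` is not shown in this diff.

```javascript
// Hypothetical inputs:
//   covered - endpoints a test exercises, e.g. ["GET /orders", "DELETE /orders"]
//   removed - endpoints the diff deleted outright
//   renamed - map of old endpoint -> new endpoint for paired -/+ route lines
function endpointExistenceAction(covered, removed, renamed = {}) {
  const gone = covered.filter((endpoint) => removed.includes(endpoint));
  // ALL covered endpoints removed: the whole test file is obsolete.
  if (covered.length > 0 && gone.length === covered.length) return "DELETE";
  // SOME removed: drop only the test functions for the deleted methods.
  if (gone.length > 0) return "UPDATE";
  // Renamed: a path substitution is enough.
  if (covered.some((endpoint) => endpoint in renamed)) return "UPDATE";
  // No signal from this check; other checks decide.
  return null;
}
```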
+ export function buildCheckResponseShape() {
+ return `Has the request body or response structure changed in a way that breaks the test?

-
- - Field type
- - Required field added: \`+ required: [..., "newField"]\`
+ Diff signals to look for:
+ - Field type change: \`- field: int\` / \`+ field: string\`
+ - Required request body field added: \`+ required: [..., "newField"]\`
+ - Required query param added with no default
  - Response field removed: \`- "responseField":\`
- - Enum value
-
-
-
-
- -
-
-
-
- -
- -
-
- ### Additive response field changes (non-breaking but coverage gap)
- These do NOT break existing assertions but leave the new field untested. Always flag as UPDATE for covered endpoints.
- - \`+ "newField": queryset.filter(...).count()\` added inside a \`Response({...})\` or \`res.json({...})\`
- - \`+ newField = serializers.XXXField()\` added to a serializer used by a tested endpoint
- - \`+ "newField":\` added to a response body dict returned by the endpoint
- - New key added inside an existing dict/object returned by the endpoint`;
+ - Enum value changed: \`- status: "active"\` / \`+ status: "enabled"\`
+ - Status code changed: \`- return 200\` / \`+ return 201\`
+ - Root wrapper added: \`- return Response({...})\` / \`+ return Response({"data": {...}, "meta": {...}})\`
+
+ Actions:
+ - Type changes, new required fields, removed asserted fields, status/enum changes → UPDATE
+ - Root response wrapper changed and essentially every assertion is now invalid → REGENERATE
+
+ **UPDATE vs REGENERATE — the deciding question is whether the root response wrapper changed:**
+ - **REGENERATE** only when a new top-level envelope object wraps the entire payload or the root key is renamed so that essentially every existing assertion must change.
+ - **UPDATE** for everything else. If you can describe the fix as "patch these N assertion paths", it is UPDATE regardless of how many paths there are.
+ - When every assertion in the file is invalid, it is REGENERATE. When you can still patch individual paths, it is UPDATE.`;
  }
- export function
- return
-
-
-
-
-
-
- -
- -
-
-
-
-
-
-
-
-
- -
-
-
-
- ### Check B2: Additive response field changes (coverage gaps)
- **Even if existing assertions still pass**, does the diff add a new field to the response of an endpoint this test already covers?
- - Look at the diff for lines like \`+ "newField":\` or \`+ newField =\` inside a view/serializer this test hits
- - If YES → action: UPDATE
- - This applies even when the test only checks status codes — the test should be extended to cover the new field
- - A new response field on a covered endpoint always triggers UPDATE — even when existing assertions still pass.
-
- ### Check C: Auth changes
- Has the authentication mechanism for this endpoint changed?
- - Auth added where none existed → action: UPDATE
- - Auth method changed (bearer→cookie) → action: UPDATE
- - Auth removed → action: VERIFY
-
- ### Check D: Assign action
- Based on the above, choose the action (IGNORE / VERIFY / UPDATE / REGENERATE / DELETE) and provide a 1-2 sentence rationale.
- - If Check B2 flagged an additive field → action must be UPDATE, even if Checks B/C found no breaking changes.`;
+ export function buildCheckAuthAndAuthorization() {
+ return `Has the authentication or authorization for this endpoint changed?
+
+ **Authentication mechanism**
+
+ Diff signals to look for: ${AUTH_MIDDLEWARE_PATTERNS_STR}. Also: \`@requiresRole\`/\`@Protected\`, \`validateToken\`/\`checkPermission\`/\`verifyHMAC\`, imports from auth/security packages, \`- @require_auth\` (removal), token type change (Bearer → Cookie).
+
+ Actions:
+ - Auth added where none existed → UPDATE (test would 401/403 on every request without the new credential)
+ - Auth method changed (e.g. bearer→cookie) → UPDATE (test sends the wrong credential type)
+ - Auth removed and test asserts a 401/403 response that will no longer fire → UPDATE
+ - Auth removed and test does not assert on auth responses → VERIFY (endpoint may now be intentionally public)
+
+ **Authorization scope** (same credential, narrower access)
+
+ Diff signals to look for: \`+ requireRole\`, \`+ requireCreateX\`/\`requireDeleteX\`, \`+ assert_*_scope\`, \`+ ALLOWED_ROLES.includes\`/\`ASSIGNABLE_ROLES\`, \`+ if not is_owner\`, \`+ raise PermissionError\`/\`HTTPException(403)\`, \`+ [x for x in xs if x.owner == caller_id]\`, new role-carrying request header (e.g. \`X-Workspace-Role\`).
+
+ Actions:
+ - Role or ownership gate added → UPDATE (test's existing token may now get 403; send a sufficient-role token and add a 403 negative assertion)
+ - Caller-identity filtering added → UPDATE (test's token now returns a subset; adjust expectations or use an admin-scope token)
+
+ Do NOT assign IGNORE just because the auth *mechanism* is unchanged — scope narrowing breaks a token-valid test.`;
  }
- export function
- return
+ export function buildCheckBehavioralContract() {
+ return `Has the endpoint's BEHAVIOR changed while its response shape stayed the same? A test can break even when no field was added or removed.
+
+ Diff signals and actions:
+ - **Validation tightened**: \`+ raise ValidationError\`/\`+ throw new ValidationError\` gated on field value, \`+ Field(pattern=...)\`/\`ge=\`/\`le=\`/\`max_length=\` on an existing field → UPDATE (fix the payload to satisfy the new constraint; add the 4xx negative case)
+ - **New conditional rejection / state guard**: \`+ raise HTTPException(status_code=409)\`/\`+ res.status(409)\` inside a new \`if\`, \`+ VALID_TRANSITIONS\`, \`+ allowed_states = ...\` → UPDATE (chain through valid states; assert the rejection status for the now-illegal path)
+ - **Sync → async**: \`- return 200 result\` / \`+ return 202 {job_id}\` → UPDATE (assert \`202\` and the job/id field; remove old result-field assertions from the immediate response)
+ - **Computed-field formula changed**: \`- total = a - b\` / \`+ total = a + tax - b\` on an existing asserted field → UPDATE; describe the new formula in \`updateInstructions\` and provide the recomputed expected value where inputs are known from the diff
+ - **Behavior gated on a request header**: old shape returned only when a version header is sent; new shape is now the default → UPDATE (migrate assertions to the new default shape, or pin the old shape by sending the version header)

- **
-
-
+ **Reachability for behavioral changes:** The service/interface scope gate still applies — if the test targets a completely different service or protocol, IGNORE. However, do NOT use the absence of a route or serializer file in the diff as grounds for IGNORE. Behavioral changes (new error branches, new validation, status-code changes) are observable from any test calling the same endpoint, even when the logic lives in an internal handler, middleware, or utility file rather than the route file itself.`;
+ }
+ export function buildCheckAssignAction() {
+ return `Based on the above checks, choose the action (IGNORE / VERIFY / UPDATE / REGENERATE / DELETE) and provide a 1-2 sentence rationale.
+
+ **Every action requires a specific rationale — including IGNORE:**
+ - UPDATE / REGENERATE / DELETE: quote the specific diff line that triggered it.
+ - VERIFY: name the uncertain element (e.g. "shared serializer, cannot confirm field exposure without reading the file").
+ - IGNORE: name the specific reason the changed code cannot reach this test's endpoint (e.g. "diff only touches \`auth/session-service.js\` — this test targets \`/api/v1/orders\` which has no session dependency"). Generic "unrelated endpoint" or "service boundary" without a diff reference is not sufficient.
+
+ - If the Additive Fields check flagged a new field with serializer signal confirmed in the diff → action is UPDATE. If the Additive Fields check returned VERIFY (model/schema only, serializer not in diff) → action remains VERIFY.
+ - **Service/layer scope gate is terminal:** If the changed code is clearly not reachable through the service or base URL this test targets, assign IGNORE — this overrides all other checks. When reachability is uncertain, assign VERIFY rather than IGNORE.
+ - **Pre-commit verification — confirm all three before finalizing UPDATE/REGENERATE/DELETE:**
+ 1. You can quote a specific diff line this test's endpoint observes that triggered the action.
+ 2. The changed code is reachable through this test's service and base URL.
+ 3. For REGENERATE: every assertion in the file is invalid, not just some — if you can patch N paths, it is UPDATE.
+ If any check fails, downgrade to VERIFY or IGNORE.
+ - **For user-written (external) tests** marked \`[external]\` in the test list:
+ - UPDATE is permitted — targeted edits (fix renamed URL, add assertion for new field).
+ - REGENERATE and DELETE are **not permitted** — assign those actions in your recommendations but \`skyramp_actions\` will surface them as report-only findings for the developer to act on. Do NOT attempt to rewrite or delete a user-authored test file.`;
+ }
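The pre-commit verification and the `[external]` rule in `buildCheckAssignAction` can be sketched as a single finalization step. Everything named here (`finalizeAction`, the `checks` object and its fields, the `reportOnly` flag) is hypothetical. The prompt only says to downgrade "to VERIFY or IGNORE", so the choice of IGNORE for unreachable code and VERIFY for a missing diff quote is an assumption of this sketch.

```javascript
// Hypothetical finalization step for one test's candidate action.
// checks: { quotedDiffLine, reachable, everyAssertionInvalid, external }
function finalizeAction(candidate, checks) {
  const needsProof = ["UPDATE", "REGENERATE", "DELETE"];
  if (!needsProof.includes(candidate)) {
    return { action: candidate, reportOnly: false };
  }
  // Check 2: changed code must be reachable through this test's service.
  if (!checks.reachable) return { action: "IGNORE", reportOnly: false };
  // Check 1: a specific diff line must be quotable (assumed downgrade: VERIFY).
  if (!checks.quotedDiffLine) return { action: "VERIFY", reportOnly: false };
  // Check 3: REGENERATE only when every assertion is invalid, else UPDATE.
  let action = candidate;
  if (action === "REGENERATE" && !checks.everyAssertionInvalid) action = "UPDATE";
  // External tests: REGENERATE/DELETE are still assigned but are report-only.
  const reportOnly = Boolean(checks.external) && (action === "REGENERATE" || action === "DELETE");
  return { action, reportOnly };
}
```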
+ export function buildCheckAdditiveFields() {
+ return `Even if existing assertions still pass, new response fields on a covered endpoint may need a new assertion.

-
- -
- -
+ Diff signals to look for:
+ - \`+ "newField": ...\` inside a \`Response({...})\`, \`res.json({...})\`, serializer class, or output formatter → serializer signal confirmed
+ - \`+ newField = serializers.XXXField()\` in a serializer used by this endpoint → serializer signal confirmed
+ - \`+ newField = Column(...)\` or \`+ newField:\` in a model/migration only, with no serializer change → model-only signal

-
-
-
-
- | GET | contract, smoke |
- | DELETE | integration, smoke |
+ Actions:
+ - Serializer signal confirmed → UPDATE (add an assertion for the new field)
+ - Model/schema only, serializer not in diff, **project has explicit serializer layer** (a separate serializer file, Pydantic schema, field allowlist, or \`fields =\` definition controls what gets exposed) → VERIFY (cannot confirm from diff alone whether the field is included)
+ - Model/schema only, serializer not in diff, **project has no explicit serializer gate** (no field allowlist or exclusion for this resource; ORM/model fields are passed through directly) → UPDATE (field is auto-exposed in responses; augment the test)

-
+ To determine which applies: check whether the repo has a serializer file or field-inclusion list for this resource. If none exists in the diff context, prefer UPDATE. When genuinely uncertain, prefer VERIFY.`;
  }
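The Additive Fields check distinguishes a confirmed serializer signal from a model-only signal, and the latter depends on whether the project has an explicit serializer layer. A rough sketch of that branching follows; the function name and the three input flags are hypothetical, introduced only to make the rule concrete.

```javascript
// Hypothetical classifier for a newly added field on a covered endpoint.
//   serializerSignal           - field added in a Response/serializer/formatter
//   modelOnlySignal            - field added in a model/migration only
//   hasExplicitSerializerLayer - true, false, or undefined when unknown
function additiveFieldAction({ serializerSignal, modelOnlySignal, hasExplicitSerializerLayer }) {
  // Field is visibly part of the response: add an assertion for it.
  if (serializerSignal) return "UPDATE";
  if (modelOnlySignal) {
    // No serializer gate: model fields are auto-exposed in responses.
    if (hasExplicitSerializerLayer === false) return "UPDATE";
    // Explicit serializer layer, or genuinely uncertain: cannot confirm exposure.
    return "VERIFY";
  }
  // No additive-field signal from this check.
  return null;
}
```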
  export function buildUpdateExecutionRules() {
  return `<execution_rules>
-
+ **UPDATE means edit-in-place — never use a generation tool for UPDATE**
+ UPDATE instructs you to modify the existing file using the Edit tool. Do NOT call \`skyramp_contract_test_generation\`, \`skyramp_integration_test_generation\`, or any other generation tool for an UPDATE action — generation tools create a new file and will overwrite or duplicate the existing one. Only use generation tools for REGENERATE actions.

  When applying UPDATE actions to existing test files, follow these rules in addition to the drift-detected changes:

-
+ **Test file ordering**
  Place mutation test functions (PATCH, PUT, POST) **before** any DELETE test function targeting the same resource. DELETE removes the resource — any mutation call after it will 404. When inserting a new mutation test, place it above the DELETE function and above the DELETE call in the \`if __name__ == "__main__"\` block (or equivalent runner entrypoint).

-
+ **Happy path first**
  When adding a new HTTP method (PUT, PATCH, POST) to an existing test file, always include a 2xx success assertion first. Error-path tests (404, 422) may follow, but the happy path case is required.

-
+ **All test files for a resource**
  When a diff adds a new HTTP method to a resource, UPDATE covers **all** existing test files for that resource — contract, integration, and UI. Apply UPDATE to every file the analyze tool reported for that resource path; do not stop after updating the first one.

-
+ **PATCH/PUT with child collections**
  Child collection arrays (e.g. \`items\`, \`products\`, \`line_items\`) drive computed totals — a test that omits them cannot catch the most common mutation bugs. When the request/response includes a child collection:
  1. Include the child array with at least one item containing the FK field (e.g. \`product_id\`) and a \`quantity\` field.
  2. Assert each item's FK field and \`quantity\` match the sent values.
  3. Assert the top-level computed total (e.g. \`total_amount\`) equals the expected math from the items.

-
+ **REGENERATE**
  Call the appropriate generation tool to replace the existing test from scratch. Use the same filename so it overwrites the old file.

-
-
+ **DELETE**
+ Assign DELETE when ALL endpoints the test covers were removed from the codebase. \`skyramp_actions\` surfaces DELETE as a report-only finding — it does not delete the file automatically. The developer must delete the obsolete test file manually. If only SOME methods were removed, use UPDATE instead — remove the test functions for deleted methods and keep the rest.

-
+ **Test data isolation**
  Never use hardcoded resource IDs (e.g. \`order_id=1\`) in any test step, including GET or DELETE steps. Always create required resources via prior POST steps and chain IDs dynamically. Use timestamp-based unique names for created resources (e.g. \`"Product-\${int(time.time())}"\`) to prevent collisions across test runs.

-
+ **Assertion quality**
+ When adding assertions, assert response *values* (field equals expected), not just field presence or status code — match the assertion depth the test already uses for other fields.
+
+ **Enhance assertions after UPDATE**
  Call \`skyramp_enhance_assertions\` with \`testFile\` set to the absolute path of the test file you just updated, \`enhanceType: "maintenance"\`, and the matching \`testType\` based on the file you are editing:
  - **Integration test file** (multi-step chained requests): call with \`testType: "integration"\`
  - **Contract-provider test file** (single endpoint with \`beforeAll\`/\`afterAll\` setup, provider mode): call with \`testType: "contract"\`. Skip for consumer-mode contract tests.
@@ -177,44 +244,35 @@ Call \`skyramp_enhance_assertions\` with \`testFile\` set to the absolute path o
  Then apply every instruction returned by the tool to the test file.
  </execution_rules>`;
  }
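Two of the execution rules, "Test file ordering" and "Test data isolation", lend themselves to a short illustration. The sketch below assumes a hypothetical list of test steps shaped as `{ method, name }`; `uniqueName` and `orderSteps` are illustrative helpers and are not functions of the package.

```javascript
// Hypothetical helper: timestamp-based unique name to avoid collisions across runs.
function uniqueName(prefix, now = Date.now()) {
  return `${prefix}-${now}`;
}

// Hypothetical helper: keep the relative order of steps but move DELETE steps
// last, so PATCH/PUT/POST never run against an already-deleted resource.
// Array.prototype.sort is stable (ES2019+), so ties keep their original order.
function orderSteps(steps) {
  const rank = (step) => (step.method === "DELETE" ? 1 : 0);
  return [...steps].sort((a, b) => rank(a) - rank(b));
}

// Usage: the product name is generated per run, and the mutation precedes DELETE.
const steps = orderSteps([
  { method: "POST", name: uniqueName("Product") },
  { method: "DELETE", name: "delete product" },
  { method: "PATCH", name: "update product" },
]);
console.log(steps.map((step) => step.method).join(" -> "));
```

Moving every DELETE to the end is stricter than the rule requires (it only concerns steps targeting the same resource), but it keeps the sketch short.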
- export function buildDriftOutputChecklist(existingTestCount, newEndpointCount,
- const finalStep =
-
-
-
-
-
-
-
-
-
-
+ export function buildDriftOutputChecklist(existingTestCount, newEndpointCount, stateFile) {
+ const finalStep = `**Final step**
+ After completing all assessments above, call \`skyramp_actions\` with \`stateFile: "${stateFile ?? "<stateFile>"}"\` and a \`recommendations\` entry for every test assessed. For each entry include:
+ - \`testFile\` (absolute path as reported by the analysis tools)
+ - \`action\`
+ - \`rationale\` — one sentence naming the diff line or pattern that triggered the action
+ - \`updateInstructions\` — written for the downstream LLM that will edit the file; name the specific fields, types, paths, or constraints that must change. Example: "Added stock_count: int (ge=0, default=0) to ProductBase. Test hits GET /products — assert stock_count is present and non-negative." Vague instructions produce incomplete edits — be specific.
+ - \`renamedEndpoints\` (for path-rename updates)
+
+ For \`[external]\` tests: include them in \`recommendations[]\` with the assessed action. \`skyramp_actions\` will apply UPDATE edits and surface REGENERATE/DELETE as report-only findings — it will never rewrite or delete a user-authored file.
+
+ Do not edit or regenerate files before calling \`skyramp_actions\`. After calling it, follow its emitted instructions to apply UPDATE edits and run REGENERATE tool calls.`;
+ const existingTestSection = `**Existing tests (${existingTestCount} total)**
  For each existing test:
  - **IGNORE/VERIFY tests**: one line each: \`{testFile} — IGNORE\` or \`{testFile} — VERIFY\`. Rationale omitted for brevity.
  - **UPDATE/REGENERATE/DELETE tests**: output the full block:
  \`\`\`
  Test: {testFile}
  Action: {UPDATE | REGENERATE | DELETE}
- Rationale: {
+ Rationale: {action} because {quoted diff signal}; affects {assertion/path}
  \`\`\`
  Focus your analysis on tests that need action — keep reasoning for unchanged tests to a single line.`;
- const newEndpointSection =
- ?
-
-
-
-
- Endpoint: {METHOD} {path}
- Action: ADD
- Test types: {contract | integration | smoke | ...}
- Rationale: {1 sentence}
- \`\`\``
- : `### New endpoints
- No new endpoints detected in this diff.`;
- const sections = [existingTestSection, newEndpointSection, finalStep].filter(s => s.length > 0);
+ const newEndpointSection = newEndpointCount > 0
+ ? `**New endpoints (${newEndpointCount} detected)**
+ For EACH new endpoint, output one line: \`{METHOD} {path} — {recommended test types} — {1 sentence rationale}\`
+ Do NOT include new endpoints in the \`recommendations[]\` passed to \`skyramp_actions\` — ADD is handled separately by the test generation tools.`
+ : `**New endpoints** — none detected in this diff.`;
+ const sections = [existingTestSection, newEndpointSection, finalStep];
  return `<output_format>
- ## Output Checklist
-
  Complete ALL of the following:

  ${sections.join("\n\n")}
@@ -2,7 +2,6 @@ import { z } from "zod";
  import { logger } from "../../utils/logger.js";
  import { AnalyticsService } from "../../services/AnalyticsService.js";
  import { MAX_TESTS_TO_GENERATE, MAX_RECOMMENDATIONS, MAX_CRITICAL_TESTS, PATH_PARAM_UUID_GUIDANCE, AUTH_CONFLICT_ERROR_MSG, } from "../test-recommendation/recommendationSections.js";
- import { buildDriftAnalysisPrompt } from "../test-maintenance/drift-analysis-prompt.js";
  import { getTraceRecordingPromptText } from "../../playwright/traceRecordingPrompt.js";
  import { isContractConsumerModeEnabled } from "../../utils/featureFlags.js";
  import { resolveServiceDetailsRef } from "../../utils/utils.js";
@@ -66,9 +65,13 @@ Use those recommendations as your baseline. Only add or remove tests that the us
 **If \`skyramp_analyze_changes\` returns an error:** retry once only if the error is transient (timeout, network blip, temporary unavailability) — do NOT retry for permanent errors (invalid repository path, missing required parameter, authentication failure). If it fails again, call \`skyramp_submit_report\` with a minimal valid payload: leave all test arrays empty and add the error to \`issuesFound\`. Refer to the \`skyramp_submit_report\` schema for required fields. Do NOT attempt Task 2 without a valid stateFile.
 **If all changed files are non-application** (CI/CD, docs, lock files, config) → skip to Task 3 (Submit Report) with empty arrays and a single \`issuesFound\` entry explaining why (same format as the zero-test path below).
 
-3. **Maintain existing tests
+3. **Maintain existing tests:**
 
-
+a. Call \`skyramp_analyze_test_health\` with \`stateFile\` (from step 2). Follow every instruction in the returned \`<drift_analysis_rules>\` block — use the Action Decision Tree, apply the Breaking Change Patterns, and work through each check (Endpoint Existence, Response Shape, Additive Fields, Auth/AuthZ, Behavioral Contract, Assign Action). **Do NOT read source files** — all information you need is in the \`skyramp_analyze_changes\` output and the diff. When reading multiple test files that require action, **read them all in a single parallel batch**.
+
+b. For each test scored UPDATE or REGENERATE, write \`updateInstructions\` (a concise description of what must change) **before** calling \`skyramp_actions\`. This articulation step prevents the LLM from letting file content override diff-based reasoning.
+
+c. Call \`skyramp_actions\` with \`stateFile\` (from step 2) and your \`recommendations[]\` — one entry per test assessed, including IGNORE and VERIFY. The tool returns file content for each UPDATE/REGENERATE test — apply the edits. Results go in \`testMaintenance\`.
 
 4. **Code review:** From the \`skyramp_analyze_changes\` output and the existing test files you read for maintenance, note any logic bugs. Do NOT read additional source files just for code review — use what is already available from the analysis and test file reads. Common patterns to flag:
 - Computed fields not recalculated after mutation (e.g. \`total_amount\` unchanged after items are added/removed)
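Reviewer note on step 3 above: the new wording requires one `recommendations[]` entry per assessed test (IGNORE and VERIFY included) and requires `updateInstructions` to be written for every UPDATE or REGENERATE entry before `skyramp_actions` is called. The sketch below checks a payload against those two rules. It is an illustration under assumptions: the `testFile` key and the `findRecommendationProblems` helper are hypothetical, and the real `skyramp_actions` input schema is not shown in this diff.

```javascript
// Illustrative only: `testFile` and this helper are assumptions; only `action`
// and `updateInstructions` are named in the prompt text.
const DRIFT_ACTIONS = ["UPDATE", "REGENERATE", "DELETE", "VERIFY", "IGNORE"];
const NEEDS_INSTRUCTIONS = new Set(["UPDATE", "REGENERATE"]);

function findRecommendationProblems(assessedTests, recommendations) {
  const problems = [];
  const byFile = new Map(recommendations.map((rec) => [rec.testFile, rec]));
  for (const testFile of assessedTests) {
    const rec = byFile.get(testFile);
    if (!rec) {
      // Every assessed test needs an entry, even when the action is IGNORE or VERIFY.
      problems.push(`${testFile}: missing entry`);
    } else if (!DRIFT_ACTIONS.includes(rec.action)) {
      problems.push(`${testFile}: unknown action ${rec.action}`);
    } else if (NEEDS_INSTRUCTIONS.has(rec.action) && !(rec.updateInstructions || "").trim()) {
      problems.push(`${testFile}: ${rec.action} requires updateInstructions`);
    }
  }
  return problems;
}

const problems = findRecommendationProblems(
  ["orders_test.py", "users_test.py", "health_test.py"],
  [
    { testFile: "orders_test.py", action: "UPDATE", updateInstructions: "Assert the new status field." },
    { testFile: "users_test.py", action: "REGENERATE" },
  ]
);
console.log(problems);
```

In this example the second entry lacks `updateInstructions` and the third test has no entry at all, so both are reported.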
@@ -331,7 +334,7 @@ Call \`skyramp_submit_report\` with \`summaryOutputFile\`: "${summaryOutputFile}
 - **additionalRecommendations**: AT MOST ${maxRecommendations - maxGenerate} items.
 - For \`testType: "contract"\` entries: **\`primaryEndpoint\` is required** (e.g. \`"GET /api/v1/users/{user_id}"\`). The tool will reject the submission without it — do not omit it or you will be forced to resubmit.
 - For \`testType: "integration"\` or \`"e2e"\` entries: omit \`primaryEndpoint\` — use \`description\` to list the endpoints involved instead.
-- **testMaintenance**: Use \`[]\` **only** if no existing Skyramp tests were found in the repository. If existing tests were found (any score), include one entry per test. Set \`action\` to the exact drift action
+- **testMaintenance**: Use \`[]\` **only** if no existing Skyramp tests were found in the repository. If existing tests were found (any score), include one entry per test. Set \`action\` to the exact drift action assigned by the Action Decision Tree (\`UPDATE\`, \`REGENERATE\`, \`DELETE\`, \`VERIFY\`, or \`IGNORE\`). For UPDATE/REGENERATE/DELETE tests that were modified and executed, populate all fields from real before/after execution results. For VERIFY/IGNORE tests (not modified), derive \`beforeStatus\` from the drift assessment you performed in step 3 (typically \`"Pass"\` if no drift was detected), set \`afterStatus\` to \`"Skipped"\`, and use \`afterDetails\` to explain why (e.g. "IGNORE: no drift detected — endpoint not modified in this PR"). Do **not** add entries for tests that were not assessed in step 3.
 
 ---
 
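Reviewer note on the `testMaintenance` wording above: for VERIFY and IGNORE tests, the entry takes `beforeStatus` from the drift assessment, sets `afterStatus` to `"Skipped"`, and explains the skip in `afterDetails`. A minimal sketch of such an entry follows. The `buildSkippedEntry` helper and the `testFile` key are hypothetical; the full `skyramp_submit_report` schema is not shown in this diff and may require other fields.

```javascript
// Illustrative only: buildSkippedEntry and `testFile` are assumptions. The field
// names action, beforeStatus, afterStatus and afterDetails come from the prompt text.
function buildSkippedEntry(testFile, action, reason, beforeStatus = "Pass") {
  if (action !== "VERIFY" && action !== "IGNORE") {
    // UPDATE/REGENERATE/DELETE entries must carry real before/after execution results.
    throw new Error(`${action} entries need execution results, not a skipped entry`);
  }
  return {
    testFile,
    action,
    beforeStatus, // derived from the step 3 drift assessment; "Pass" when no drift was found
    afterStatus: "Skipped", // the test was not modified or re-run
    afterDetails: `${action}: ${reason}`,
  };
}

const skippedEntry = buildSkippedEntry(
  "users_test.py",
  "IGNORE",
  "no drift detected, endpoint not modified in this PR"
);
console.log(skippedEntry);
```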
@@ -202,35 +202,40 @@ describe("uiCredentials in getTestbotPrompt", () => {
 .toThrow("</ui-credentials>");
 });
 });
-describe("drift analysis
-
-
+describe("drift analysis — runtime tool call (step 3)", () => {
+// The build-time embed of buildDriftAnalysisPrompt was replaced with a
+// runtime instruction: LLM calls skyramp_analyze_test_health then skyramp_actions.
 function basePrompt() {
 return getTestbotPrompt(baseArgs.prTitle, baseArgs.prDescription, baseArgs.summaryOutputFile, baseArgs.repositoryPath);
 }
-it("
+it("step 3 instructs the LLM to call skyramp_analyze_test_health", () => {
 const prompt = basePrompt();
-expect(prompt).toContain("
-expect(prompt).toContain("</drift_analysis_rules>");
+expect(prompt).toContain("skyramp_analyze_test_health");
 });
-it("
+it("step 3 instructs the LLM to call skyramp_actions with recommendations[]", () => {
 const prompt = basePrompt();
-
-
-const block = prompt.slice(start, end);
-expect(block).not.toContain("You are acting as a Skyramp Integration Architect");
+expect(prompt).toContain("skyramp_actions");
+expect(prompt).toContain("recommendations[]");
 });
-it("
+it("step 3 appears inside Task 1, before Task 2", () => {
 const prompt = basePrompt();
 const task1Pos = prompt.indexOf("## Task 1");
-const
+const healthPos = prompt.indexOf("skyramp_analyze_test_health");
 const task2Pos = prompt.indexOf("## Task 2");
-expect(
-expect(
+expect(healthPos).toBeGreaterThan(task1Pos);
+expect(healthPos).toBeLessThan(task2Pos);
 });
-it("
+it("does not contain the build-time embedded drift_analysis_rules content (Action Decision Tree)", () => {
+// The rules are now fetched at runtime via skyramp_analyze_test_health —
+// the <drift_analysis_rules> tag may appear as a reference in prose,
+// but the actual rule content (Action Decision Tree) must not be baked in.
 const prompt = basePrompt();
-expect(prompt).toContain("
+expect(prompt).not.toContain("Action Decision Tree\n\nFor each existing test");
+expect(prompt).not.toContain("Update Execution Rules\n\nWhen applying UPDATE actions");
+});
+it("does not contain a persona statement (no nested identity from old embed)", () => {
+const prompt = basePrompt();
+expect(prompt).not.toContain("You are acting as a Skyramp Integration Architect");
 });
 });
 describe("UI grounding via Task 2 capture-act-capture", () => {
|