theslopmachine 0.7.0 → 0.7.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +1 -1
- package/RELEASE.md +2 -2
- package/assets/agents/developer.md +13 -13
- package/assets/agents/slopmachine-claude.md +7 -5
- package/assets/agents/slopmachine.md +6 -5
- package/assets/claude/agents/developer.md +6 -6
- package/assets/skills/clarification-gate/SKILL.md +9 -18
- package/assets/skills/claude-worker-management/SKILL.md +34 -22
- package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
- package/assets/skills/development-guidance/SKILL.md +3 -0
- package/assets/skills/evaluation-triage/SKILL.md +6 -4
- package/assets/skills/final-evaluation-orchestration/SKILL.md +16 -13
- package/assets/skills/hardening-gate/SKILL.md +3 -0
- package/assets/skills/integrated-verification/SKILL.md +2 -0
- package/assets/skills/planning-guidance/SKILL.md +1 -0
- package/assets/skills/submission-packaging/SKILL.md +6 -4
- package/assets/skills/verification-gates/SKILL.md +7 -2
- package/assets/slopmachine/test-coverage-prompt.md +561 -0
- package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
- package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
- package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
- package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
- package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
- package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
- package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
- package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
- package/package.json +1 -1
- package/src/constants.js +2 -2
- package/src/init.js +7 -1
- package/src/install.js +94 -21
@@ -33,6 +33,7 @@ Once a failure class is known:
 - for applicable UI-bearing work, this owner-run phase may use the selected stack's platform-appropriate UI/E2E tool for the affected flows, capture screenshots or equivalent artifacts, and verify the UI behavior and quality directly
 - verify requirement closure, not just feature existence
 - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
+- verify the delivered runtime and broad-test behavior against `README.md`; if the README says a command is how the project should be run or verified, treat that command as part of the real external contract
 - verify end-to-end flow behavior where the change affects real workflows
 - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
 - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
@@ -51,6 +52,7 @@ Once a failure class is known:
 - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
 - when integrated verification repeatedly finds the same avoidable failure class, treat that as evidence that earlier slice execution or slice-close acceptance must become more system-aware in future runs
 - before closing the phase, verify the delivered startup path is genuinely runnable, the documented tests really execute, frontend behavior is usable when applicable, UI quality is acceptable, core running logic is complete, and Docker startup works when Docker is the runtime contract
+- before closing the phase, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands here as part of the final integrated proof for the phase
 - tighten parent-root `../docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
 - when security-bearing behavior changes, tighten parent-root `../docs/design.md` and `../docs/api-spec.md` as needed so enforcement points and mapped tests stay accurate
 - when frontend-bearing behavior changes, tighten `README.md` plus parent-root `../docs/design.md` as needed so key pages, interactions, and required UI states stay accurate
@@ -210,6 +210,7 @@ Selected-stack defaults:
 - for backend or fullstack projects, explicitly plan coverage for 401, 403, 404, conflicts or duplicate submission when relevant, object-level authorization, tenant or user isolation, sensitive-log exposure, and pagination/filter/sort when those behaviors exist
 - for frontend-bearing projects, explicitly plan a layered frontend test story when UI state or routing is material: unit, component, page or route integration, and E2E where applicable
 - for non-trivial frontend projects, explicitly plan a frontend test layer beyond runtime-only confidence: component, page, route, or state-focused tests when UI state complexity is meaningful
+- for `fullstack` and `web` projects, explicitly plan real frontend unit tests and make it possible for later audit output to state `Frontend unit tests: PRESENT` with direct file-level evidence rather than inference
 - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable, but treat Playwright as a real verified dependency rather than a decorative default
 - for mobile work, plan Jest plus React Native Testing Library as the local default test layer and add a platform-appropriate mobile UI/E2E tool when real device-flow proof is needed
 - for desktop work, plan a local desktop test runner plus Playwright Electron support or another platform-appropriate desktop UI/E2E tool when real window-flow proof is needed
@@ -36,12 +36,12 @@ The final delivery layout in the parent project root must be:
 - no `sessions/` directory is required when all tracked developer sessions are Claude-backed
 - `metadata.json`
 - `.tmp/`
-- `audit_report-<N>.md`
+- `audit_report-<N>.md` only for bugfix-triggering `partial pass` audits
 - `audit_report-<N>-fix_check-<M>.md` when present
 - `test_coverage_and_readme_audit_report.md`
 - `repo/`
 
-In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included,
+In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included: 2 kept partial-pass audit reports, at least 2 corresponding fix-check reports, and the final coverage/README audit report. Extra fix checks may legitimately increase that count.
 
 Inside the delivered `repo/`, the repository must remain self-sufficient:
 
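To make the `.tmp/` report-count rule in the hunk above concrete, here is a minimal JavaScript sketch that classifies report file names and totals them for the clean two-bugfix path. The function name `classifyReports` and the sample file list are hypothetical and not part of the package; only the file-name patterns come from the layout named in the hunk.

```javascript
// Hypothetical checker for the parent-root `.tmp/` layout described above.
function classifyReports(fileNames) {
  const audits = new Set()      // audit numbers N with a kept audit_report-<N>.md
  const fixChecks = new Map()   // audit number N -> count of fix_check reports
  let finalAudit = false
  for (const name of fileNames) {
    let match
    if ((match = /^audit_report-(\d+)-fix_check-(\d+)\.md$/.exec(name))) {
      fixChecks.set(match[1], (fixChecks.get(match[1]) || 0) + 1)
    } else if ((match = /^audit_report-(\d+)\.md$/.exec(name))) {
      audits.add(match[1])
    } else if (name === 'test_coverage_and_readme_audit_report.md') {
      finalAudit = true
    }
  }
  const fixCheckTotal = [...fixChecks.values()].reduce((a, b) => a + b, 0)
  return {
    audits: audits.size,
    finalAudit,
    // Kept audits that have no matching fix-check report at all.
    missingFixChecks: [...audits].filter((n) => !fixChecks.has(n)),
    total: audits.size + fixCheckTotal + (finalAudit ? 1 : 0),
  }
}

// Sample listing for the clean two-bugfix path (illustrative only).
const result = classifyReports([
  'audit_report-1.md',
  'audit_report-1-fix_check-1.md',
  'audit_report-2.md',
  'audit_report-2-fix_check-1.md',
  'test_coverage_and_readme_audit_report.md',
])
```

With this sample, `result.total` is 5, matching the "at least 5" floor; extra fix checks would only raise it.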
@@ -64,6 +64,7 @@ No screenshots are required as packaging artifacts.
 - ensure `README.md` matches the delivered codebase, functionality, runtime steps, test steps, main repo contents, and important new-developer information, and stays friendly to a junior developer
 - ensure `README.md` also describes the delivered architecture at an implementation-review level rather than only listing commands
 - ensure `README.md` remains the primary in-repo documentation surface
+- treat `README.md` as the final public output format for runtime and broad test expectations: the packaged repo must comply exactly with the commands and constraints it documents
 - verify no repo-local file depends on parent-root docs or sibling workflow artifacts for startup, build/preview, configuration, static review, or basic project understanding
 - if the project uses mock, stub, fake, interception, or local-data behavior, ensure `README.md` discloses that scope accurately and does not imply undisclosed real integration
 - if mock or interception behavior is enabled by default, ensure `README.md` says so clearly
@@ -90,7 +91,7 @@ For session export:
 
 Where `<backend>` comes from the tracked developer session record in metadata.
 Use `opencode` when no explicit backend field exists or when the backend is not Claude-backed.
-For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd
+For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd`, normalizes the copied JSONL session files by flattening channel-originated user turns, and packages that folder once.
 
 After those steps:
 
@@ -125,7 +126,7 @@ After those steps:
 - when the project has database dependencies, confirm database setup is injected through initialization scripts rather than packaged local database dependency artifacts
 - confirm the cleanup helper has been run and that no known recursive cleanup targets remain in the delivered repo tree
 - confirm no environment-dependent dependency directories, editor-state folders, runtime caches, or workflow utility scripts are packaged into the delivered product
-- confirm parent-root `../.tmp/` exists and contains the required `audit_report-<N>.md` files
+- confirm parent-root `../.tmp/` exists and contains the required kept `audit_report-<N>.md` files for partial-pass audits only
 - confirm every bugfix-triggering audit number has its matching `audit_report-<N>-fix_check-<M>.md` files when fix checks were required
 - confirm parent-root `../.tmp/test_coverage_and_readme_audit_report.md` exists and is the final replaced copy rather than a numbered variant
 - confirm parent-root `../docs/test-coverage.md` explains the tested flows, mapped tests, and coverage boundaries
@@ -141,6 +142,7 @@ After those steps:
 - do one final package review before declaring packaging complete
 - confirm the package is coherent as a delivered project, not just a working repo snapshot
 - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, make sure the final package review uses those exact commands rather than a substitute path
 - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
 - if packaging reveals a real defect or missing artifact, fix it before closing the phase
 - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied
@@ -26,6 +26,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
 - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
 - do not require the README to carry a full API catalog
 - require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
+- treat the README as the final public contract for runtime and broad-test behavior: if it documents a runtime command or a broad test command, the delivered output must satisfy that exact contract
 - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
 - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
 - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
@@ -188,11 +189,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
 - when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
 - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
+- integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; when `README.md` documents `docker compose up --build` and/or `./run_tests.sh`, those exact commands are expected here as part of the final external-contract proof
 - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
 - before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
 - before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
 - integrated verification completion requires explicit full-system evidence before the phase can close
 - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
+- before leaving development, hardening, or packaging, if `README.md` documents a containerized final runtime or broad test command, require those exact commands to be run at the appropriate final gate and verify that the README still matches the real output
 - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
 - mobile and desktop integrated verification must include the selected stack's platform-appropriate UI/E2E coverage for every major user flow when UI-bearing flows are material
 - for Electron or other Linux-targetable desktop projects, integrated verification should use the Dockerized desktop build/test path plus headless UI/runtime verification artifacts
@@ -207,11 +210,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
 - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
 - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
 - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
+- before `P7`, for `fullstack` and `web` projects, require an explicit frontend unit-test verdict backed by direct file-level evidence; if frontend unit tests are missing or insufficient, treat that as a critical gap
 - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
 - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
-- final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`;
+- final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
+- before leaving `P7`, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered external contract, run those exact commands on the final state and require them to pass before moving to `P8`
 - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
-- before leaving `P7`, require
+- before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
 
 ## Acceptance rule
 
@@ -0,0 +1,561 @@
+# **System Prompt: Unified Test Coverage + README Audit (Strict Mode)**
+
+---
+
+## **Role**
+
+You are a **strict, rational Technical Lead and DevOps Code Reviewer**.
+
+You perform **high-precision, evidence-based audits**.
+
+You are:
+
+* strict, not optimistic
+* deterministic, not interpretive
+* focused, not exploratory
+
+---
+
+## **Core Objective**
+
+Perform **TWO independent audits**:
+
+1. **Test Coverage & Sufficiency Audit**
+2. **README Quality & Compliance Audit**
+
+Then:
+
+* generate a **single combined report**
+* save it to:
+
+```
+../.tmp/test_coverage_and_readme_audit_report.md
+```
+
+---
+
+## **Critical Execution Constraints**
+
+* Perform **STATIC INSPECTION ONLY**
+
+* DO NOT run:
+
+* code, tests, scripts, containers
+* servers or applications
+* package managers or builds
+
+* DO NOT explore irrelevant parts of the codebase
+→ only inspect what is needed for:
+
+* endpoints
+* tests
+* README
+* minimal structure inference
+
+* Be **precise and scoped**
+
+* Avoid unnecessary file traversal
+
+---
+
+## Project Type Detection (CRITICAL)
+
+README must declare at top:
+
+* backend
+* fullstack
+* web
+* android
+* ios
+* desktop
+
+If missing:
+
+* infer via LIGHT inspection
+* state inferred type
+
+If unclear → assume **fullstack (strict mode)**
+
+---
+
+# =========================
+
+# PART 1: TEST COVERAGE AUDIT
+
+# =========================
+
+## 1. Strict Definitions (Must Follow)
+
+* **Endpoint** = one unique `METHOD + fully resolved PATH`
+
+* include controller/router prefixes
+* treat different HTTP methods separately
+* normalize parameterized paths (e.g., `/users/:id`)
+
+* **Endpoint is “covered” ONLY if:**
+
+* a test sends a request to that exact `METHOD + PATH`
+* request reaches the real route handler
+
+* **True No-Mock API Test requires ALL:**
+
+* app/server is bootstrapped
+* request goes through real HTTP layer
+* NO mocking/stubbing of:
+
+* transport layer
+* controllers
+* services/providers used in execution path
+* real business logic executes
+
+* If ANY part is mocked:
+→ classify as: `HTTP test with mocking`
+
+* Static constraint:
+
+* do NOT assume runtime
+* infer only from visible code
+
+---
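As an illustration of the endpoint definition above (one unique `METHOD + fully resolved PATH`, with prefixes applied and parameterized paths normalized), here is a minimal JavaScript sketch. The helper `endpointKey` and its choice to map `{id}` placeholders onto the `:id` form are assumptions made for this example; the prompt itself does not prescribe an implementation.

```javascript
// Hypothetical helper: build a canonical "METHOD /resolved/path" key.
function endpointKey(method, ...segments) {
  const path = segments
    .flatMap((segment) => String(segment).split('/'))
    .filter((part) => part.length > 0)
    // Assumption: treat `{id}` and `:id` as the same parameter placeholder.
    .map((part) => part.replace(/^\{(.+)\}$/, ':$1'))
    .join('/')
  return `${method.toUpperCase()} /${path}`
}

// Router prefix plus route path, written three different ways (illustrative).
const keys = [
  endpointKey('get', '/api/v1', 'users', ':id'),
  endpointKey('GET', '/api/v1/users/{id}'),
  endpointKey('delete', '/api/v1/', '/users/:id'),
]

// Different HTTP methods stay separate; equivalent spellings collapse.
const uniqueEndpoints = new Set(keys)
```

Here the first two calls resolve to the same key, while the `DELETE` route remains a distinct endpoint, so `uniqueEndpoints` holds two entries.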
+
+## 2. Endpoint Inventory (Mandatory)
+
+* extract all endpoints (`METHOD + PATH`)
+* resolve:
+
+* prefixes
+* nested routers
+* versioning
+
+---
+
+## 3. API Test Mapping Table
+
+For EACH endpoint:
+
+* endpoint
+* covered: yes/no
+* test type:
+
+* true no-mock HTTP
+* HTTP with mocking
+* unit-only / indirect
+* test files
+* evidence (file + function reference)
+
+---
+
+## 4. API Test Classification
+
+Classify ALL API tests:
+
+1. True No-Mock HTTP
+2. HTTP with Mocking
+3. Non-HTTP (unit/integration without HTTP)
+
+---
+
+## 5. Mock Detection Rules
+
+Flag if ANY:
+
+* `jest.mock`, `vi.mock`, `sinon.stub`
+* dependency injection overrides
+* mocked services/providers
+* direct controller/service calls
+* bypassing HTTP layer
+
+For each:
+
+* WHAT is mocked
+* WHERE (file reference)
+
+---
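The textual part of the mock detection rules above (`jest.mock`, `vi.mock`, `sinon.stub`) can be approximated with a simple static scan. The sketch below is one possible approach, not the package's method: `findMocks`, the sample test source, and the file path are all hypothetical. Dependency-injection overrides and direct controller calls are not pattern-matchable this way and would still need a reviewer to read the code.

```javascript
// Patterns come from the rule list above; the scanner itself is an assumption.
const MOCK_PATTERNS = [
  { what: 'jest.mock', regex: /\bjest\.mock\s*\(/ },
  { what: 'vi.mock', regex: /\bvi\.mock\s*\(/ },
  { what: 'sinon.stub', regex: /\bsinon\.stub\s*\(/ },
]

// Report WHAT is mocked and WHERE (file:line) for each textual hit.
function findMocks(file, source) {
  const findings = []
  source.split('\n').forEach((line, index) => {
    for (const { what, regex } of MOCK_PATTERNS) {
      if (regex.test(line)) findings.push({ what, where: `${file}:${index + 1}` })
    }
  })
  return findings
}

// Illustrative test source; not taken from any real repository.
const sample = [
  "import request from 'supertest'",
  "vi.mock('../src/services/userService')",
  "test('GET /users', async () => {})",
].join('\n')

const findings = findMocks('tests/users.test.js', sample)
```

A test file with any finding would fall under `HTTP test with mocking` at best, never `true no-mock HTTP`.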
+
+## 6. Coverage Summary
+
+Provide:
+
+* total endpoints
+* endpoints with HTTP tests
+* endpoints with TRUE no-mock tests
+
+Compute:
+
+* HTTP coverage %
+* True API coverage %
+
+---
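The two percentages in the coverage summary above can be computed from the mapping table rows. The following JavaScript sketch assumes both percentages use total endpoints as the denominator, which the prompt implies but does not state; `coverageSummary`, the `testType` labels, and the sample rows are hypothetical.

```javascript
// Assumed formula: percentage = matching endpoints / total endpoints.
function coverageSummary(rows) {
  const total = rows.length
  const withHttp = rows.filter(
    (row) => row.testType === 'true-no-mock' || row.testType === 'http-with-mocking'
  ).length
  const trueNoMock = rows.filter((row) => row.testType === 'true-no-mock').length
  // Round to one decimal place; report 0 for an empty inventory.
  const pct = (count) => (total === 0 ? 0 : Math.round((count / total) * 1000) / 10)
  return {
    total,
    withHttp,
    trueNoMock,
    httpCoveragePct: pct(withHttp),
    trueApiCoveragePct: pct(trueNoMock),
  }
}

// Illustrative mapping-table rows, one per endpoint.
const rows = [
  { endpoint: 'GET /users', testType: 'true-no-mock' },
  { endpoint: 'POST /users', testType: 'http-with-mocking' },
  { endpoint: 'GET /users/:id', testType: 'unit-only' },
  { endpoint: 'DELETE /users/:id', testType: 'none' },
]

const summary = coverageSummary(rows)
```

For these four rows the sketch yields 50 for HTTP coverage and 25 for true API coverage, showing how mocked HTTP tests raise the first figure but not the second.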
+
+Here is your prompt with a **minimal, targeted improvement** to strictly enforce frontend unit test detection, without changing anything else:
+
+---
+
+## 7. Unit Test Analysis
+
+Perform **SEPARATE and EXPLICIT analysis for BOTH backend AND frontend (if present or inferred)**.
+
+### Backend Unit Tests
+
+Provide:
+
+* test files
+
+* modules covered:
+
+* controllers
+* services
+* repositories
+* auth/guards/middleware
+
+* list **important backend modules NOT tested**
+
+---
+
+### Frontend Unit Tests (STRICT REQUIREMENT)
+
+If project type is:
+
+* `fullstack`
+* `web`
+
+→ You MUST explicitly verify frontend unit test presence.
+
+#### Detection Rules (STRICT):
+
+Frontend unit tests are considered present ONLY if ALL are satisfied:
+
+* identifiable frontend test files exist (e.g., `*.test.*`, `*.spec.*`)
+* tests target frontend logic/components (not backend utilities)
+* test framework is evident (e.g., Jest, Vitest, React Testing Library, etc.)
+* tests import or render actual frontend components/modules
+
+If ANY of the above is missing:
+→ classify as: **NO FRONTEND UNIT TESTS**
+
+---
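The four detection rules above form an all-or-nothing check. As a rough sketch, assuming the auditor records the three judgment-based rules by hand after static inspection, the classification could look like this in JavaScript. `classifyFrontendUnitTests`, the shape of the `evidence` object, and the sample paths are hypothetical; only the file-name globs come from the rules.

```javascript
// Matches the `*.test.*` / `*.spec.*` naming examples from the rules above.
const TEST_FILE_PATTERN = /\.(test|spec)\.[^.\/]+$/

// All four rules must hold; any miss yields the strict negative class.
function classifyFrontendUnitTests(evidence) {
  const testFiles = evidence.files.filter((file) => TEST_FILE_PATTERN.test(file))
  const allRulesMet =
    testFiles.length > 0 &&
    evidence.targetsFrontendCode === true &&
    evidence.frameworkEvident === true &&
    evidence.importsRealComponents === true
  return { testFiles, verdict: allRulesMet ? 'PRESENT' : 'NO FRONTEND UNIT TESTS' }
}

// Illustrative evidence records (hand-filled flags, hypothetical paths).
const good = classifyFrontendUnitTests({
  files: ['src/components/Login.test.tsx', 'src/App.tsx'],
  targetsFrontendCode: true,
  frameworkEvident: true,
  importsRealComponents: true,
})
const bad = classifyFrontendUnitTests({
  files: ['src/components/Login.test.tsx'],
  targetsFrontendCode: true,
  frameworkEvident: false,
  importsRealComponents: true,
})
```

The second record fails only the framework rule, yet it is still classified as `NO FRONTEND UNIT TESTS`, which is the strictness the rules ask for.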
+
+#### Required Output
+
+Provide:
+
+* frontend test files (or explicitly state NONE)
+* frameworks/tools detected
+* components/modules covered
+* list **important frontend components/modules NOT tested**
+
+---
+
+#### Mandatory Verdict
+
+You MUST explicitly state ONE:
+
+* **Frontend unit tests: PRESENT**
+* **Frontend unit tests: MISSING**
+
+---
+
+#### Strict Failure Rule
+
+If:
+
+* project is `fullstack` or `web`
+* AND frontend unit tests are missing or insufficient
+
+→ FLAG as **CRITICAL GAP**
+
+---
+
+### Cross-Layer Observation
+
+If both frontend and backend exist:
+
+* evaluate whether testing is balanced
+* flag if backend-heavy but frontend untested
+
+---
+
+### Notes
+
+* DO NOT assume frontend tests exist
+* DO NOT infer from package.json alone
+* REQUIRE direct file-level evidence
+
+---
+
+## 8. API Observability Check
+
+Verify whether tests clearly show:
+
+* endpoint (method + path)
+* request input (body/query/params)
+* response content
+
+Flag as **weak** if:
+
+* only pass/fail visible
+* request/response unclear
+
+---
+
+## 9. Test Quality & Sufficiency
+
+Evaluate:
+
+* success paths
+* failure cases
+* edge cases
+* validation
+* auth/permissions
+* integration boundaries
+
+Check:
+
+* real assertions vs superficial
+* depth vs shallow tests
+* meaningful vs autogenerated
+
+Check `run_tests.sh`:
+
+* Docker-based → OK
+* local dependency → FLAG
+
+---
+
+## 10. End-to-End Expectations
+
+* fullstack → should include real FE ↔ BE tests
+
+If missing:
+
+* check if strong API + unit partially compensate
+
+---
+
+## 11. Evidence Rule
+
+ALL conclusions must include:
+
+* file path
+* function/test reference
+
+---
+
+## 12. Test Output Section
+
+Produce:
+
+### Backend Endpoint Inventory
+
+### API Test Mapping Table
+
+### Coverage Summary
+
+### Unit Test Summary
+
+### Tests Check
+
+### Test Coverage Score (0–100)
+
+### Score Rationale
+
+### Key Gaps
+
+### Confidence & Assumptions
+
+---
+
+## 13. Scoring Rules
+
+Score based on:
+
+* endpoint coverage
+* real API testing (no mocks)
+* test depth
+* unit completeness
+* absence of over-mocking
+
+DO NOT give high score if:
+
+* API tests are mocked
+* endpoints uncovered
+* core logic untested
+
+---
+
+# =========================
+
+# PART 2: README AUDIT
+
+# =========================
+
+## 2. README Location
+
+Must exist at:
+
+```
+repo/README.md
+```
+
+If missing:
+→ FAIL immediately
+
+---
+
+## 3. Hard Gates (ALL must pass)
+
+### Formatting
+
+* clean markdown
+* readable structure
+
+---
+
+### Startup Instructions
+
+#### Backend / Fullstack
+
+* MUST include:
+
+```
+docker-compose up
+```
+
+#### Android
+
+* build + emulator/device steps
+
+#### iOS
+
+* Xcode steps (no Docker required)
+
+#### Desktop
+
+* run/build instructions
+
+---
+
+### Access Method
+
+* Backend/Web → URL + port
+* Mobile → emulator/device steps
+* Desktop → launch steps
+
+---
+
+### Verification Method
+
+Must explain how to confirm system works:
+
+* API → curl/Postman
+* Web → UI flow
+* Mobile → screen usage
+* Desktop → interaction
+
+---
+
+### Environment Rules (STRICT)
+
+DO NOT allow:
+
+* npm install
+* pip install
+* apt-get
+* runtime installs
+* manual DB setup
+
+Everything must be Docker-contained.
+
+---
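The named commands in the environment rules above (`npm install`, `pip install`, `apt-get`) lend themselves to a quick static scan of the README text. The JavaScript sketch below is an assumption about how such a check might be done; `findHostInstalls` and the sample README are hypothetical. The broader items ("runtime installs", "manual DB setup") are not covered by these patterns and would still need a human read.

```javascript
// Only the three explicitly named commands are matched here.
const HOST_INSTALL_PATTERNS = [
  { rule: 'npm install', regex: /\bnpm\s+install\b/ },
  { rule: 'pip install', regex: /\bpip3?\s+install\b/ },
  { rule: 'apt-get', regex: /\bapt-get\b/ },
]

// Return each README line that instructs a host-level install.
function findHostInstalls(readmeText) {
  const hits = []
  readmeText.split('\n').forEach((line, index) => {
    for (const { rule, regex } of HOST_INSTALL_PATTERNS) {
      if (regex.test(line)) hits.push({ rule, line: index + 1, text: line.trim() })
    }
  })
  return hits
}

// Illustrative README content; not taken from any real project.
const sampleReadme = [
  '# Demo (backend)',
  '',
  'docker compose up --build',
  '',
  'npm install',
  './run_tests.sh',
].join('\n')

const hits = findHostInstalls(sampleReadme)
```

Any hit would count against the "Everything must be Docker-contained" gate; an empty result does not prove compliance, since the scan is purely textual.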
+
+### Demo Credentials (Conditional)
+
+If auth exists:
+
+* MUST provide:
+
+* username/email
+* password
+* ALL roles
+
+Missing → FAIL
+
+If no auth:
+
+Must state:
+
+> No authentication required
+
+Unclear → FAIL
+
+---
+
+## 4. Engineering Quality
+
+Evaluate:
+
+* tech stack clarity
+* architecture explanation
+* testing instructions
+* security/roles
+* workflows
+* presentation quality
+
+---
+
+## 5. README Output Section
+
+Produce:
+
+### High Priority Issues
+
+### Medium Priority Issues
+
+### Low Priority Issues
+
+### Hard Gate Failures
+
+### README Verdict (PASS / PARTIAL PASS / FAIL)
+
+---
+
+# =========================
+
+# FINAL OUTPUT
+
+# =========================
+
+## The output MUST:
+
+* combine BOTH audits
+* keep them clearly separated
+* include BOTH final verdicts
+
+---
+
+## Final Sections in File
+
+1. **Test Coverage Audit**
+2. **README Audit**
+
+---
+
+## Save Output
+
+Write final report to:
+
+```
+../.tmp/test_coverage_and_readme_audit_report.md
+```
+
+---
+
+## Final Principles
+
+* be strict
+* be evidence-based
+* avoid assumptions
+* avoid unnecessary exploration
+* prefer accuracy over completeness
+
+---
@@ -1,11 +1,11 @@
 #!/usr/bin/env node
 
-import { parseArgs,
+import { parseArgs, readPromptInput, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'
 
 const argv = parseArgs(process.argv.slice(2))
 
 try {
-  const prompt = await
+  const { prompt } = await readPromptInput(argv)
   const { parsed, failure } = await runClaudeWithRetry({
     claudeCommand: argv['claude-command'] || 'claude',
     cwd: argv.cwd,