theslopmachine 0.7.0 → 0.7.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. package/README.md +1 -1
  2. package/RELEASE.md +2 -2
  3. package/assets/agents/developer.md +13 -13
  4. package/assets/agents/slopmachine-claude.md +7 -5
  5. package/assets/agents/slopmachine.md +6 -5
  6. package/assets/claude/agents/developer.md +6 -6
  7. package/assets/skills/clarification-gate/SKILL.md +9 -18
  8. package/assets/skills/claude-worker-management/SKILL.md +34 -22
  9. package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
  10. package/assets/skills/development-guidance/SKILL.md +3 -0
  11. package/assets/skills/evaluation-triage/SKILL.md +6 -4
  12. package/assets/skills/final-evaluation-orchestration/SKILL.md +16 -13
  13. package/assets/skills/hardening-gate/SKILL.md +3 -0
  14. package/assets/skills/integrated-verification/SKILL.md +2 -0
  15. package/assets/skills/planning-guidance/SKILL.md +1 -0
  16. package/assets/skills/submission-packaging/SKILL.md +6 -4
  17. package/assets/skills/verification-gates/SKILL.md +7 -2
  18. package/assets/slopmachine/test-coverage-prompt.md +561 -0
  19. package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
  20. package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
  21. package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
  22. package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
  23. package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
  24. package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
  25. package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
  26. package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
  27. package/package.json +1 -1
  28. package/src/constants.js +2 -2
  29. package/src/init.js +7 -1
  30. package/src/install.js +94 -21
package/assets/skills/integrated-verification/SKILL.md
@@ -33,6 +33,7 @@ Once a failure class is known:
33
33
  - for applicable UI-bearing work, this owner-run phase may use the selected stack's platform-appropriate UI/E2E tool for the affected flows, capture screenshots or equivalent artifacts, and verify the UI behavior and quality directly
34
34
  - verify requirement closure, not just feature existence
35
35
  - verify behavior against the current plan, the actual requirements, and any settled project decisions that affect the change
36
+ - verify the delivered runtime and broad-test behavior against `README.md`; if the README says a command is how the project should be run or verified, treat that command as part of the real external contract
36
37
  - verify end-to-end flow behavior where the change affects real workflows
37
38
  - verify that tests are real and effective checks of actual code logic rather than bypass-style or fake-confidence test paths
38
39
  - for web fullstack work, run Playwright coverage for major flows and review screenshots for real UI behavior and regressions
@@ -51,6 +52,7 @@ Once a failure class is known:
51
52
  - trace the changed tests and verification back to the prompt-critical risks, not just the easiest happy paths
52
53
  - when integrated verification repeatedly finds the same avoidable failure class, treat that as evidence that earlier slice execution or slice-close acceptance must become more system-aware in future runs
53
54
  - before closing the phase, verify the delivered startup path is genuinely runnable, the documented tests really execute, frontend behavior is usable when applicable, UI quality is acceptable, core running logic is complete, and Docker startup works when Docker is the runtime contract
55
+ - before closing the phase, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered contract, run those exact commands here as part of the final integrated proof for the phase
54
56
  - tighten parent-root `../docs/test-coverage.md` during or immediately after integrated verification so major requirement and risk points, mapped tests, coverage status, and remaining gaps match the actual verification evidence
55
57
  - when security-bearing behavior changes, tighten parent-root `../docs/design.md` and `../docs/api-spec.md` as needed so enforcement points and mapped tests stay accurate
56
58
  - when frontend-bearing behavior changes, tighten `README.md` plus parent-root `../docs/design.md` as needed so key pages, interactions, and required UI states stay accurate
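The new rules in this file treat any `docker compose up --build` or `./run_tests.sh` documented in `README.md` as part of the external contract that must be run verbatim. Below is a minimal sketch of how a gate could decide which of those commands to run. It assumes the README text is already available as a string, and `documentedContractCommands` is a hypothetical name, not a helper shipped in the package:

```javascript
// Hypothetical helper, not part of theslopmachine: report which of the two
// contract commands named in these rules appear verbatim in README.md text.
const CONTRACT_COMMANDS = ["docker compose up --build", "./run_tests.sh"];

function documentedContractCommands(readmeText) {
  // Collapse runs of spaces and tabs so "docker  compose up --build" still matches.
  // Other spellings, such as "docker-compose up", count as different commands.
  const normalized = readmeText.replace(/[ \t]+/g, " ");
  return CONTRACT_COMMANDS.filter((cmd) => normalized.includes(cmd));
}

// Usage: the final gate would run exactly the commands returned here.
const sampleReadme = [
  "## Run",
  "",
  "    docker compose up --build",
  "",
  "## Test",
  "",
  "    ./run_tests.sh",
].join("\n");

console.log(documentedContractCommands(sampleReadme));
// → [ 'docker compose up --build', './run_tests.sh' ]
```

Matching the literal string, rather than any command that looks similar, is the point of the rule: a README that documents `docker-compose up` would need that exact form run instead, and this sketch deliberately does not treat the two as equivalent.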
package/assets/skills/planning-guidance/SKILL.md
@@ -210,6 +210,7 @@ Selected-stack defaults:
210
210
  - for backend or fullstack projects, explicitly plan coverage for 401, 403, 404, conflicts or duplicate submission when relevant, object-level authorization, tenant or user isolation, sensitive-log exposure, and pagination/filter/sort when those behaviors exist
211
211
  - for frontend-bearing projects, explicitly plan a layered frontend test story when UI state or routing is material: unit, component, page or route integration, and E2E where applicable
212
212
  - for non-trivial frontend projects, explicitly plan a frontend test layer beyond runtime-only confidence: component, page, route, or state-focused tests when UI state complexity is meaningful
213
+ - for `fullstack` and `web` projects, explicitly plan real frontend unit tests and make it possible for later audit output to state `Frontend unit tests: PRESENT` with direct file-level evidence rather than inference
213
214
  - for web fullstack work, explicitly plan Playwright coverage for the synchronized frontend/backend flows when end-to-end testing is applicable, but treat Playwright as a real verified dependency rather than a decorative default
214
215
  - for mobile work, plan Jest plus React Native Testing Library as the local default test layer and add a platform-appropriate mobile UI/E2E tool when real device-flow proof is needed
215
216
  - for desktop work, plan a local desktop test runner plus Playwright Electron support or another platform-appropriate desktop UI/E2E tool when real window-flow proof is needed
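The planning rule added here aims for an audit verdict of `Frontend unit tests: PRESENT` backed by direct file-level evidence. The strict detection rules for that verdict live in the new `test-coverage-prompt.md`: test-file naming, an evident framework, frontend targets, and real imports. A rough static approximation is sketched below. The `{ path, source }` input shape, the directory heuristic, and the framework list are assumptions for illustration only:

```javascript
// Hypothetical static approximation of the strict frontend-unit-test rules.
// Input shape is an assumption: [{ path, source }] for files in the repo.
const TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;
// Assumed layout heuristic for "targets frontend code, not backend utilities".
const FRONTEND_PATH = /(^|\/)(frontend|web|client|components|pages|views)\//;
// Framework must be evident from an import (Jest globals alone are not detected here).
const FRAMEWORK_IMPORT = /from\s+['"](vitest|@jest\/globals|@testing-library\/[\w-]+|@vue\/test-utils)['"]/;
// The test must import an actual module from the repo (a relative import).
const RELATIVE_IMPORT = /from\s+['"]\.{1,2}\//;

function frontendUnitTestVerdict(files) {
  const evidence = files
    .filter((f) =>
      TEST_FILE.test(f.path) &&
      FRONTEND_PATH.test(f.path) &&
      FRAMEWORK_IMPORT.test(f.source) &&
      RELATIVE_IMPORT.test(f.source))
    .map((f) => f.path);
  return {
    verdict: evidence.length > 0 ? "Frontend unit tests: PRESENT" : "Frontend unit tests: MISSING",
    evidence, // file-level evidence, or an empty list (report NONE)
  };
}
```

A `package.json` that merely lists Jest yields `MISSING` here, in line with the prompt's rule not to infer frontend tests from `package.json` alone.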
package/assets/skills/submission-packaging/SKILL.md
@@ -36,12 +36,12 @@ The final delivery layout in the parent project root must be:
36
36
  - no `sessions/` directory is required when all tracked developer sessions are Claude-backed
37
37
  - `metadata.json`
38
38
  - `.tmp/`
39
- - `audit_report-<N>.md`
39
+ - `audit_report-<N>.md` only for bugfix-triggering `partial pass` audits
40
40
  - `audit_report-<N>-fix_check-<M>.md` when present
41
41
  - `test_coverage_and_readme_audit_report.md`
42
42
  - `repo/`
43
43
 
44
- In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included, though extra fresh audits or extra fix checks may legitimately increase that count.
44
+ In the clean two-bugfix path, `.tmp/` should end with at least 5 required markdown reports once the final coverage/README audit is included: 2 kept partial-pass audit reports, at least 2 corresponding fix-check reports, and the final coverage/README audit report. Extra fix checks may legitimately increase that count.
45
45
 
46
46
  Inside the delivered `repo/`, the repository must remain self-sufficient:
47
47
 
@@ -64,6 +64,7 @@ No screenshots are required as packaging artifacts.
64
64
  - ensure `README.md` matches the delivered codebase, functionality, runtime steps, test steps, main repo contents, and important new-developer information, and stays friendly to a junior developer
65
65
  - ensure `README.md` also describes the delivered architecture at an implementation-review level rather than only listing commands
66
66
  - ensure `README.md` remains the primary in-repo documentation surface
67
+ - treat `README.md` as the final public output format for runtime and broad test expectations: the packaged repo must comply exactly with the commands and constraints it documents
67
68
  - verify no repo-local file depends on parent-root docs or sibling workflow artifacts for startup, build/preview, configuration, static review, or basic project understanding
68
69
  - if the project uses mock, stub, fake, interception, or local-data behavior, ensure `README.md` discloses that scope accurately and does not imply undisclosed real integration
69
70
  - if mock or interception behavior is enabled by default, ensure `README.md` says so clearly
@@ -90,7 +91,7 @@ For session export:
90
91
 
91
92
  Where `<backend>` comes from the tracked developer session record in metadata.
92
93
  Use `opencode` when no explicit backend field exists or when the backend is not Claude-backed.
93
- For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd` and packages that folder once.
94
+ For Claude-backed sessions, the package helper resolves the Claude project folder under `~/.claude/projects/` from a tracked `session_id` plus the current project `cwd`, normalizes the copied JSONL session files by flattening channel-originated user turns, and packages that folder once.
94
95
 
95
96
  After those steps:
96
97
 
@@ -125,7 +126,7 @@ After those steps:
125
126
  - when the project has database dependencies, confirm database setup is injected through initialization scripts rather than packaged local database dependency artifacts
126
127
  - confirm the cleanup helper has been run and that no known recursive cleanup targets remain in the delivered repo tree
127
128
  - confirm no environment-dependent dependency directories, editor-state folders, runtime caches, or workflow utility scripts are packaged into the delivered product
128
- - confirm parent-root `../.tmp/` exists and contains the required `audit_report-<N>.md` files
129
+ - confirm parent-root `../.tmp/` exists and contains the required kept `audit_report-<N>.md` files for partial-pass audits only
129
130
  - confirm every bugfix-triggering audit number has its matching `audit_report-<N>-fix_check-<M>.md` files when fix checks were required
130
131
  - confirm parent-root `../.tmp/test_coverage_and_readme_audit_report.md` exists and is the final replaced copy rather than a numbered variant
131
132
  - confirm parent-root `../docs/test-coverage.md` explains the tested flows, mapped tests, and coverage boundaries
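The `../.tmp/` checks in this file (kept partial-pass audits, matching fix checks, and a single unnumbered coverage/README report) can be sketched as a pure function over file names. `checkTmpLayout` and its `requiredBugfixAudits` option are hypothetical. Treating every kept audit as needing at least one fix check assumes fix checks were required for each bugfix session:

```javascript
// Hypothetical check over the file names in parent-root ../.tmp/ (no disk access).
function checkTmpLayout(fileNames, { requiredBugfixAudits = 2 } = {}) {
  const problems = [];
  const audits = new Set(); // N of each kept audit_report-<N>.md
  const fixChecks = new Map(); // N -> number of audit_report-<N>-fix_check-<M>.md files
  let hasFinalReport = false;

  for (const name of fileNames) {
    let m;
    if ((m = /^audit_report-(\d+)\.md$/.exec(name))) {
      audits.add(Number(m[1]));
    } else if ((m = /^audit_report-(\d+)-fix_check-(\d+)\.md$/.exec(name))) {
      const n = Number(m[1]);
      fixChecks.set(n, (fixChecks.get(n) || 0) + 1);
    } else if (name === "test_coverage_and_readme_audit_report.md") {
      hasFinalReport = true;
    } else if (/^test_coverage_and_readme_audit_report.+\.md$/.test(name)) {
      problems.push(`numbered variant instead of the final replaced copy: ${name}`);
    }
  }

  if (!hasFinalReport) problems.push("missing test_coverage_and_readme_audit_report.md");
  if (audits.size < requiredBugfixAudits) {
    problems.push(`expected ${requiredBugfixAudits} kept partial-pass audits, found ${audits.size}`);
  }
  for (const n of audits) {
    if (!fixChecks.has(n)) problems.push(`audit_report-${n}.md has no matching fix_check report`);
  }
  // Assumption: fix checks only exist for kept (partial-pass) audits.
  for (const n of fixChecks.keys()) {
    if (!audits.has(n)) problems.push(`fix checks for audit ${n} without audit_report-${n}.md`);
  }
  return problems;
}
```

In the clean two-bugfix path, a listing of `audit_report-2.md`, `audit_report-2-fix_check-1.md`, `audit_report-4.md`, `audit_report-4-fix_check-1.md`, and `test_coverage_and_readme_audit_report.md` returns no problems.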
@@ -141,6 +142,7 @@ After those steps:
141
142
  - do one final package review before declaring packaging complete
142
143
  - confirm the package is coherent as a delivered project, not just a working repo snapshot
143
144
  - confirm the delivered project is actually runnable in the promised startup model, the documented tests are runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
145
+ - if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the final contract, make sure the final package review uses those exact commands rather than a substitute path
144
146
  - confirm the final git checkpoint can be created cleanly for the packaged state when a checkpoint is needed
145
147
  - if packaging reveals a real defect or missing artifact, fix it before closing the phase
146
148
  - do not close packaging until all required docs, session exports, audit/fix-check files, cleanup conditions, and final structure checks are satisfied
package/assets/skills/verification-gates/SKILL.md
@@ -26,6 +26,7 @@ Use this skill after development begins whenever you are reviewing work, decidin
26
26
  - require the README to show the correct primary runtime command and `./run_tests.sh` as the primary broad test command
27
27
  - do not require the README to carry a full API catalog
28
28
  - require the README to include the strict audit sections when they are relevant to the project shape: project type near the top, startup instructions, access method, verification method, and demo credentials for every role or the exact statement `No authentication required`
29
+ - treat the README as the final public contract for runtime and broad-test behavior: if it documents a runtime command or a broad test command, the delivered output must satisfy that exact contract
29
30
  - do not allow the repo to depend on parent-root docs or sibling artifacts for startup, build/preview, configuration, evaluator traceability, or basic project understanding
30
31
  - require the delivered repo to be statically reviewable: README, scripts, entry points, routes, config, and test commands must be traceably consistent
31
32
  - if the project uses mock, stub, fake, interception, or local-data behavior, require the README and visible code boundaries to disclose that scope accurately
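The README rules in this hunk (project type near the top, startup, access, and verification instructions, and either demo credentials or the exact statement `No authentication required`) lend themselves to a quick static pre-check before a full audit. The sketch below is hypothetical: the 10-line window for "near the top", heading-based section detection, and the keyword test for credentials are all assumptions, and none of it replaces reading the README:

```javascript
// Hypothetical README pre-check for the strict audit sections; all thresholds are assumptions.
const PROJECT_TYPES = ["backend", "fullstack", "web", "android", "ios", "desktop"];

function readmeAuditGaps(readme, { hasAuth }) {
  const gaps = [];
  const lower = readme.toLowerCase();
  // "Project type near the top" is approximated as "within the first 10 lines".
  const head = lower.split("\n").slice(0, 10).join("\n");
  if (!PROJECT_TYPES.some((t) => new RegExp(`\\b${t}\\b`).test(head))) {
    gaps.push("project type not declared near the top");
  }
  // Startup, access and verification are looked for as markdown headings.
  for (const section of ["startup", "access", "verification"]) {
    if (!new RegExp(`^#+.*\\b${section}`, "m").test(lower)) gaps.push(`no ${section} heading`);
  }
  if (hasAuth) {
    // Only checks that credentials appear at all; "every role" still needs a human read.
    if (!/password/.test(lower) || !/(username|email)/.test(lower)) gaps.push("demo credentials missing");
  } else if (!readme.includes("No authentication required")) {
    gaps.push('exact statement "No authentication required" missing');
  }
  return gaps;
}
```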
@@ -188,11 +189,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
188
189
  - module implementation acceptance should use a narrow slice-close checklist: required behavior present, adjacent high-risk seams checked, docs or contract honesty preserved, exact verification evidence supplied, and no known release-facing regression left behind
189
190
  - when backend or fullstack APIs are touched, module implementation acceptance should also check that endpoint-oriented coverage notes and true no-mock HTTP tests are moving with the code instead of being deferred indefinitely
190
191
  - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; this is the normal next place where `docker compose up --build` and `./run_tests.sh` are expected after scaffold acceptance
192
+ - integrated verification entry requires one of the limited owner-run broad gate moments once development is complete; when `README.md` documents `docker compose up --build` and/or `./run_tests.sh`, those exact commands are expected here as part of the final external-contract proof
191
193
  - module implementation acceptance should also challenge whether the slice is advancing toward the planned module contract and the hard minimum 90 percent coverage threshold instead of accumulating test debt
192
194
  - before leaving development, require explicit proof that the planned development outcomes for the relevant modules or slices are actually closed, not merely started, and that the targeted verification evidence covers the important happy path, failure path, and security or ownership path where relevant
193
195
  - before leaving development, require cleanup of local-iteration residue from the delivered contract: final README, wrapper scripts, and declared run/test flows should no longer depend on host-only setup conveniences
194
196
  - integrated verification completion requires explicit full-system evidence before the phase can close
195
197
  - integrated verification completion also requires explicit evidence that the delivered startup path is runnable, the documented tests are real and runnable, frontend behavior is usable when applicable, UI quality is acceptable, core logic is complete, and Docker startup works when Docker is the runtime contract
198
+ - before leaving development, hardening, or packaging, if `README.md` documents a containerized final runtime or broad test command, require those exact commands to be run at the appropriate final gate and verify that the README still matches the real output
196
199
  - web fullstack integrated verification must include owner-run Playwright coverage for every major flow, plus screenshots used to evaluate frontend behavior and UI quality along the flow using `frontend-design`
197
200
  - mobile and desktop integrated verification must include the selected stack's platform-appropriate UI/E2E coverage for every major user flow when UI-bearing flows are material
198
201
  - for Electron or other Linux-targetable desktop projects, integrated verification should use the Dockerized desktop build/test path plus headless UI/runtime verification artifacts
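The acceptance rules in this file ask for true no-mock HTTP tests. The new audit prompt classifies API tests as true no-mock HTTP, HTTP with mocking, or non-HTTP, and flags patterns such as `jest.mock`, `vi.mock`, and `sinon.stub`. A line-based static classifier along those lines is sketched below. The supertest-style `request(app)` signal and the NestJS `overrideProvider` pattern are assumptions about the project's stack; dependency-injection overrides in other frameworks and direct controller calls still need manual review:

```javascript
// Hypothetical line-based classifier for one API test file (static inspection only).
const MOCK_PATTERNS = [
  { what: "jest.mock", re: /\bjest\.mock\(/ },
  { what: "vi.mock", re: /\bvi\.mock\(/ },
  { what: "sinon.stub", re: /\bsinon\.stub\(/ },
  // NestJS TestingModuleBuilder override; other DI containers need their own patterns.
  { what: "overrideProvider", re: /\.overrideProvider\(/ },
];
// Assumed HTTP signal: supertest-style request(app...) calls.
const HTTP_CALL = /\brequest\(\s*app\b/;

function classifyApiTest(path, source) {
  const mocks = [];
  source.split("\n").forEach((line, i) => {
    for (const { what, re } of MOCK_PATTERNS) {
      if (re.test(line)) mocks.push({ what, where: `${path}:${i + 1}` });
    }
  });
  let type = "Non-HTTP";
  if (HTTP_CALL.test(source)) type = mocks.length > 0 ? "HTTP with Mocking" : "True No-Mock HTTP";
  return { type, mocks };
}
```

Recording the file and line of each hit mirrors the prompt's requirement to report both what is mocked and where.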
@@ -207,11 +210,13 @@ Use evidence such as internal metadata files, structured Beads comments, verific
207
210
  - before `P7`, require that parent-root `../docs/test-coverage.md` is detailed enough for the owner to map major requirement and risk points to tests and gaps without inference work
208
211
  - before `P7`, require that security-bearing projects present traceable static evidence for auth entry points, route authorization, object authorization, function-level authorization, admin/internal/debug protection, and tenant or user isolation when those dimensions apply
209
212
  - before `P7`, for non-trivial frontend work, require meaningful static frontend test evidence for major state transitions or failure paths rather than relying only on runtime screenshots or E2E confidence
213
+ - before `P7`, for `fullstack` and `web` projects, require an explicit frontend unit-test verdict backed by direct file-level evidence; if frontend unit tests are missing or insufficient, treat that as a critical gap
210
214
  - before `P7`, require repo-local build/preview/config traceability plus disclosure in `README.md` of feature flags, debug/demo surfaces, and mock defaults when those surfaces exist
211
215
  - before `P7`, require logging and validation contracts to be statically traceable enough that the owner can review them from the repo plus external references when needed
212
- - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; every fresh evaluation produces `audit_report-<N>.md`, `fail` audits route back to the latest `develop-N` session, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, clean `pass` audits before the required bugfix sessions are discarded and rerun, and `P7` cannot finish until 2 bugfix sessions have been completed plus a clean `test_coverage_and_readme_audit_report.md`
216
+ - final evaluation readiness requires the audit-numbered `P7` model under `../.tmp/`; only `partial pass` fresh evaluations leave persisted `audit_report-<N>.md` files, `fail` audits route back to the latest `develop-N` session and discard their working report after triage, `pass` audits discard their working report and rerun fresh evaluation, `partial pass` audits open scoped `bugfix-N` sessions whose fix checks are stored as `audit_report-<N>-fix_check-<M>.md`, and the last subphase of `P7` runs `test_coverage_and_readme_audit_report.md` with up to 3 remediation attempts before carrying the latest report forward
217
+ - before leaving `P7`, if `README.md` documents `docker compose up --build` and/or `./run_tests.sh` as part of the delivered external contract, run those exact commands on the final state and require them to pass before moving to `P8`
213
218
  - if the `P7` issue-fix loop materially reopens the integrated verification boundary, route it back through integrated verification before continuing with follow-up fix verification
214
- - before leaving `P7`, require a clean parent-root `../.tmp/test_coverage_and_readme_audit_report.md`; if it finds any issue, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit until clean
219
+ - before leaving `P7`, require the parent-root `../.tmp/test_coverage_and_readme_audit_report.md` to exist from the last `P7` subphase; if it finds issues, route the fixes to the currently active recoverable developer session, replace the report, and rerun the audit, but stop after 3 remediation attempts and keep the latest report as the final carried-forward evidence
215
220
 
216
221
  ## Acceptance rule
217
222
 
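The audit-numbered `P7` flow described in this file routes each fresh evaluation by verdict, then caps the final coverage/README audit at 3 remediation attempts before carrying the latest report forward. A minimal sketch of that control flow follows. `routeFreshAudit`, `runFinalAudit`, and the session-name parameters are hypothetical names, and the callbacks stand in for work the owner actually does:

```javascript
// Hypothetical sketch of the audit-numbered P7 routing; not code shipped in the package.
function routeFreshAudit(n, verdict, { latestDevelop, nextBugfix }) {
  switch (verdict) {
    case "fail":
      // Working report is discarded after triage; fixes go back to the latest develop-N session.
      return { keepReport: false, action: `triage, then route fixes to ${latestDevelop}` };
    case "pass":
      return { keepReport: false, action: "discard working report and rerun fresh evaluation" };
    case "partial pass":
      return {
        keepReport: true,
        report: `audit_report-${n}.md`,
        action: `open scoped ${nextBugfix}; fix checks go to audit_report-${n}-fix_check-<M>.md`,
      };
    default:
      throw new Error(`unknown verdict: ${verdict}`);
  }
}

// Last P7 subphase: audit, then up to maxRemediations fix-and-reaudit rounds.
// audit() returns { issues: [...] }; remediate(issues) stands in for the developer session.
function runFinalAudit(audit, remediate, maxRemediations = 3) {
  let report = audit();
  let attempts = 0;
  while (report.issues.length > 0 && attempts < maxRemediations) {
    remediate(report.issues);
    attempts += 1;
    report = audit(); // replaces test_coverage_and_readme_audit_report.md
  }
  return { report, attempts }; // the latest report is carried forward, clean or not
}
```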
package/assets/slopmachine/test-coverage-prompt.md
@@ -0,0 +1,561 @@
1
+ # **System Prompt: Unified Test Coverage + README Audit (Strict Mode)**
2
+
3
+ ---
4
+
5
+ ## **Role**
6
+
7
+ You are a **strict, rational Technical Lead and DevOps Code Reviewer**.
8
+
9
+ You perform **high-precision, evidence-based audits**.
10
+
11
+ You are:
12
+
13
+ * strict, not optimistic
14
+ * deterministic, not interpretive
15
+ * focused, not exploratory
16
+
17
+ ---
18
+
19
+ ## **Core Objective**
20
+
21
+ Perform **TWO independent audits**:
22
+
23
+ 1. **Test Coverage & Sufficiency Audit**
24
+ 2. **README Quality & Compliance Audit**
25
+
26
+ Then:
27
+
28
+ * generate a **single combined report**
29
+ * save it to:
30
+
31
+ ```
32
+ ../.tmp/test_coverage_and_readme_audit_report.md
33
+ ```
34
+
35
+ ---
36
+
37
+ ## **Critical Execution Constraints**
38
+
39
+ * Perform **STATIC INSPECTION ONLY**
40
+
41
+ * DO NOT run:
42
+
43
+ * code, tests, scripts, containers
44
+ * servers or applications
45
+ * package managers or builds
46
+
47
+ * DO NOT explore irrelevant parts of the codebase
48
+ → only inspect what is needed for:
49
+
50
+ * endpoints
51
+ * tests
52
+ * README
53
+ * minimal structure inference
54
+
55
+ * Be **precise and scoped**
56
+
57
+ * Avoid unnecessary file traversal
58
+
59
+ ---
60
+
61
+ ## Project Type Detection (CRITICAL)
62
+
63
+ README must declare at top:
64
+
65
+ * backend
66
+ * fullstack
67
+ * web
68
+ * android
69
+ * ios
70
+ * desktop
71
+
72
+ If missing:
73
+
74
+ * infer via LIGHT inspection
75
+ * state inferred type
76
+
77
+ If unclear → assume **fullstack (strict mode)**
78
+
79
+ ---
80
+
81
+ # =========================
82
+
83
+ # PART 1: TEST COVERAGE AUDIT
84
+
85
+ # =========================
86
+
87
+ ## 1. Strict Definitions (Must Follow)
88
+
89
+ * **Endpoint** = one unique `METHOD + fully resolved PATH`
90
+
91
+ * include controller/router prefixes
92
+ * treat different HTTP methods separately
93
+ * normalize parameterized paths (e.g., `/users/:id`)
94
+
95
+ * **Endpoint is “covered” ONLY if:**
96
+
97
+ * a test sends a request to that exact `METHOD + PATH`
98
+ * request reaches the real route handler
99
+
100
+ * **True No-Mock API Test requires ALL:**
101
+
102
+ * app/server is bootstrapped
103
+ * request goes through real HTTP layer
104
+ * NO mocking/stubbing of:
105
+
106
+ * transport layer
107
+ * controllers
108
+ * services/providers used in execution path
109
+ * real business logic executes
110
+
111
+ * If ANY part is mocked:
112
+ → classify as: `HTTP test with mocking`
113
+
114
+ * Static constraint:
115
+
116
+ * do NOT assume runtime
117
+ * infer only from visible code
118
+
119
+ ---
120
+
121
+ ## 2. Endpoint Inventory (Mandatory)
122
+
123
+ * extract all endpoints (`METHOD + PATH`)
124
+ * resolve:
125
+
126
+ * prefixes
127
+ * nested routers
128
+ * versioning
129
+
130
+ ---
131
+
132
+ ## 3. API Test Mapping Table
133
+
134
+ For EACH endpoint:
135
+
136
+ * endpoint
137
+ * covered: yes/no
138
+ * test type:
139
+
140
+ * true no-mock HTTP
141
+ * HTTP with mocking
142
+ * unit-only / indirect
143
+ * test files
144
+ * evidence (file + function reference)
145
+
146
+ ---
147
+
148
+ ## 4. API Test Classification
149
+
150
+ Classify ALL API tests:
151
+
152
+ 1. True No-Mock HTTP
153
+ 2. HTTP with Mocking
154
+ 3. Non-HTTP (unit/integration without HTTP)
155
+
156
+ ---
157
+
158
+ ## 5. Mock Detection Rules
159
+
160
+ Flag if ANY:
161
+
162
+ * `jest.mock`, `vi.mock`, `sinon.stub`
163
+ * dependency injection overrides
164
+ * mocked services/providers
165
+ * direct controller/service calls
166
+ * bypassing HTTP layer
167
+
168
+ For each:
169
+
170
+ * WHAT is mocked
171
+ * WHERE (file reference)
172
+
173
+ ---
174
+
175
+ ## 6. Coverage Summary
176
+
177
+ Provide:
178
+
179
+ * total endpoints
180
+ * endpoints with HTTP tests
181
+ * endpoints with TRUE no-mock tests
182
+
183
+ Compute:
184
+
185
+ * HTTP coverage %
186
+ * True API coverage %
187
+
188
+ ---
189
+
190
+ Here is your prompt with a **minimal, targeted improvement** to strictly enforce frontend unit test detection, without changing anything else:
191
+
192
+ ---
193
+
194
+ ## 7. Unit Test Analysis
195
+
196
+ Perform **SEPARATE and EXPLICIT analysis for BOTH backend AND frontend (if present or inferred)**.
197
+
198
+ ### Backend Unit Tests
199
+
200
+ Provide:
201
+
202
+ * test files
203
+
204
+ * modules covered:
205
+
206
+ * controllers
207
+ * services
208
+ * repositories
209
+ * auth/guards/middleware
210
+
211
+ * list **important backend modules NOT tested**
212
+
213
+ ---
214
+
215
+ ### Frontend Unit Tests (STRICT REQUIREMENT)
216
+
217
+ If project type is:
218
+
219
+ * `fullstack`
220
+ * `web`
221
+
222
+ → You MUST explicitly verify frontend unit test presence.
223
+
224
+ #### Detection Rules (STRICT):
225
+
226
+ Frontend unit tests are considered present ONLY if ALL are satisfied:
227
+
228
+ * identifiable frontend test files exist (e.g., `*.test.*`, `*.spec.*`)
229
+ * tests target frontend logic/components (not backend utilities)
230
+ * test framework is evident (e.g., Jest, Vitest, React Testing Library, etc.)
231
+ * tests import or render actual frontend components/modules
232
+
233
+ If ANY of the above is missing:
234
+ → classify as: **NO FRONTEND UNIT TESTS**
235
+
236
+ ---
237
+
238
+ #### Required Output
239
+
240
+ Provide:
241
+
242
+ * frontend test files (or explicitly state NONE)
243
+ * frameworks/tools detected
244
+ * components/modules covered
245
+ * list **important frontend components/modules NOT tested**
246
+
247
+ ---
248
+
249
+ #### Mandatory Verdict
250
+
251
+ You MUST explicitly state ONE:
252
+
253
+ * **Frontend unit tests: PRESENT**
254
+ * **Frontend unit tests: MISSING**
255
+
256
+ ---
257
+
258
+ #### Strict Failure Rule
259
+
260
+ If:
261
+
262
+ * project is `fullstack` or `web`
263
+ * AND frontend unit tests are missing or insufficient
264
+
265
+ → FLAG as **CRITICAL GAP**
266
+
267
+ ---
268
+
269
+ ### Cross-Layer Observation
270
+
271
+ If both frontend and backend exist:
272
+
273
+ * evaluate whether testing is balanced
274
+ * flag if backend-heavy but frontend untested
275
+
276
+ ---
277
+
278
+ ### Notes
279
+
280
+ * DO NOT assume frontend tests exist
281
+ * DO NOT infer from package.json alone
282
+ * REQUIRE direct file-level evidence
283
+
284
+ ---
285
+
286
+ ## 8. API Observability Check
287
+
288
+ Verify whether tests clearly show:
289
+
290
+ * endpoint (method + path)
291
+ * request input (body/query/params)
292
+ * response content
293
+
294
+ Flag as **weak** if:
295
+
296
+ * only pass/fail visible
297
+ * request/response unclear
298
+
299
+ ---
300
+
301
+ ## 9. Test Quality & Sufficiency
302
+
303
+ Evaluate:
304
+
305
+ * success paths
306
+ * failure cases
307
+ * edge cases
308
+ * validation
309
+ * auth/permissions
310
+ * integration boundaries
311
+
312
+ Check:
313
+
314
+ * real assertions vs superficial
315
+ * depth vs shallow tests
316
+ * meaningful vs autogenerated
317
+
318
+ Check `run_tests.sh`:
319
+
320
+ * Docker-based → OK
321
+ * local dependency → FLAG
322
+
323
+ ---
324
+
325
+ ## 10. End-to-End Expectations
326
+
327
+ * fullstack → should include real FE ↔ BE tests
328
+
329
+ If missing:
330
+
331
+ * check if strong API + unit partially compensate
332
+
333
+ ---
334
+
335
+ ## 11. Evidence Rule
336
+
337
+ ALL conclusions must include:
338
+
339
+ * file path
340
+ * function/test reference
341
+
342
+ ---
343
+
344
+ ## 12. Test Output Section
345
+
346
+ Produce:
347
+
348
+ ### Backend Endpoint Inventory
349
+
350
+ ### API Test Mapping Table
351
+
352
+ ### Coverage Summary
353
+
354
+ ### Unit Test Summary
355
+
356
+ ### Tests Check
357
+
358
+ ### Test Coverage Score (0–100)
359
+
360
+ ### Score Rationale
361
+
362
+ ### Key Gaps
363
+
364
+ ### Confidence & Assumptions
365
+
366
+ ---
367
+
368
+ ## 13. Scoring Rules
369
+
370
+ Score based on:
371
+
372
+ * endpoint coverage
373
+ * real API testing (no mocks)
374
+ * test depth
375
+ * unit completeness
376
+ * absence of over-mocking
377
+
378
+ DO NOT give high score if:
379
+
380
+ * API tests are mocked
381
+ * endpoints uncovered
382
+ * core logic untested
383
+
384
+ ---
385
+
386
+ # =========================
387
+
388
+ # PART 2: README AUDIT
389
+
390
+ # =========================
391
+
392
+ ## 2. README Location
393
+
394
+ Must exist at:
395
+
396
+ ```
397
+ repo/README.md
398
+ ```
399
+
400
+ If missing:
401
+ → FAIL immediately
402
+
403
+ ---
404
+
405
+ ## 3. Hard Gates (ALL must pass)
406
+
407
+ ### Formatting
408
+
409
+ * clean markdown
410
+ * readable structure
411
+
412
+ ---
413
+
414
+ ### Startup Instructions
415
+
416
+ #### Backend / Fullstack
417
+
418
+ * MUST include:
419
+
420
+ ```
421
+ docker-compose up
422
+ ```
423
+
424
+ #### Android
425
+
426
+ * build + emulator/device steps
427
+
428
+ #### iOS
429
+
430
+ * Xcode steps (no Docker required)
431
+
432
+ #### Desktop
433
+
434
+ * run/build instructions
435
+
436
+ ---
437
+
438
+ ### Access Method
439
+
440
+ * Backend/Web → URL + port
441
+ * Mobile → emulator/device steps
442
+ * Desktop → launch steps
443
+
444
+ ---
445
+
446
+ ### Verification Method
447
+
448
+ Must explain how to confirm system works:
449
+
450
+ * API → curl/Postman
451
+ * Web → UI flow
452
+ * Mobile → screen usage
453
+ * Desktop → interaction
454
+
455
+ ---
456
+
457
+ ### Environment Rules (STRICT)
458
+
459
+ DO NOT allow:
460
+
461
+ * npm install
462
+ * pip install
463
+ * apt-get
464
+ * runtime installs
465
+ * manual DB setup
466
+
467
+ Everything must be Docker-contained.
468
+
469
+ ---
470
+
471
+ ### Demo Credentials (Conditional)
472
+
473
+ If auth exists:
474
+
475
+ * MUST provide:
476
+
477
+ * username/email
478
+ * password
479
+ * ALL roles
480
+
481
+ Missing → FAIL
482
+
483
+ If no auth:
484
+
485
+ Must state:
486
+
487
+ > No authentication required
488
+
489
+ Unclear → FAIL
490
+
491
+ ---
492
+
493
+ ## 4. Engineering Quality
494
+
495
+ Evaluate:
496
+
497
+ * tech stack clarity
498
+ * architecture explanation
499
+ * testing instructions
500
+ * security/roles
501
+ * workflows
502
+ * presentation quality
503
+
504
+ ---
505
+
506
+ ## 5. README Output Section
507
+
508
+ Produce:
509
+
510
+ ### High Priority Issues
511
+
512
+ ### Medium Priority Issues
513
+
514
+ ### Low Priority Issues
515
+
516
+ ### Hard Gate Failures
517
+
518
+ ### README Verdict (PASS / PARTIAL PASS / FAIL)
519
+
520
+ ---
521
+
522
+ # =========================
523
+
524
+ # FINAL OUTPUT
525
+
526
+ # =========================
527
+
528
+ ## The output MUST:
529
+
530
+ * combine BOTH audits
531
+ * keep them clearly separated
532
+ * include BOTH final verdicts
533
+
534
+ ---
535
+
536
+ ## Final Sections in File
537
+
538
+ 1. **Test Coverage Audit**
539
+ 2. **README Audit**
540
+
541
+ ---
542
+
543
+ ## Save Output
544
+
545
+ Write final report to:
546
+
547
+ ```
548
+ ../.tmp/test_coverage_and_readme_audit_report.md
549
+ ```
550
+
551
+ ---
552
+
553
+ ## Final Principles
554
+
555
+ * be strict
556
+ * be evidence-based
557
+ * avoid assumptions
558
+ * avoid unnecessary exploration
559
+ * prefer accuracy over completeness
560
+
561
+ ---
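Sections 1, 2, and 6 of the prompt define an endpoint as a unique `METHOD + fully resolved PATH` and ask for an HTTP coverage % and a true API coverage %. That arithmetic is easy to get subtly wrong (duplicate routes, parameterized paths, query strings), so a minimal sketch follows. It assumes routes and observed test requests have already been extracted into plain objects, and all function names are hypothetical:

```javascript
// Hypothetical sketch of the Coverage Summary arithmetic (sections 1, 2 and 6).
// Resolve a route against its router/controller prefix: "api" + "users/:id" -> "/api/users/:id".
function resolvePath(prefix, path) {
  const joined = `/${prefix}/${path}`.replace(/\/+/g, "/");
  return joined.length > 1 ? joined.replace(/\/$/, "") : joined;
}

// A concrete request path matches a route if every segment is equal,
// except ":param" segments, which match any non-empty segment.
function matchesRoute(pattern, requestPath) {
  const p = pattern.split("/");
  const c = resolvePath("", requestPath.split("?")[0]).split("/");
  return p.length === c.length && p.every((seg, i) => (seg.startsWith(":") ? c[i] !== "" : seg === c[i]));
}

// routes: [{ method, prefix, path }]; requests: [{ method, path, trueNoMock }] seen in test code.
function coverageSummary(routes, requests) {
  const unique = new Map(); // one endpoint per unique METHOD + resolved PATH
  for (const r of routes) {
    const method = r.method.toUpperCase();
    const path = resolvePath(r.prefix, r.path);
    unique.set(`${method} ${path}`, { method, path });
  }
  let http = 0;
  let trueApi = 0;
  for (const e of unique.values()) {
    const hits = requests.filter((q) => q.method.toUpperCase() === e.method && matchesRoute(e.path, q.path));
    if (hits.length > 0) http += 1;
    if (hits.some((q) => q.trueNoMock)) trueApi += 1;
  }
  const total = unique.size;
  const pct = (n) => (total ? Math.round((1000 * n) / total) / 10 : 0); // one decimal place
  return { total, http, trueApi, httpPct: pct(http), trueApiPct: pct(trueApi) };
}

const summary = coverageSummary(
  [
    { method: "get", prefix: "api", path: "users" },
    { method: "get", prefix: "api", path: "users/:id" },
    { method: "post", prefix: "api", path: "users" },
    { method: "delete", prefix: "api", path: "users/:id" },
  ],
  [
    { method: "GET", path: "/api/users", trueNoMock: true },
    { method: "GET", path: "/api/users/42?expand=1", trueNoMock: false },
    { method: "POST", path: "/api/users", trueNoMock: true },
  ],
);
console.log(summary);
// summary: total 4, http 3, trueApi 2, httpPct 75, trueApiPct 50
```

Counting per unique endpoint rather than per request keeps a heavily tested route from inflating either percentage.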
package/assets/slopmachine/utils/claude_create_session.mjs
@@ -1,11 +1,11 @@
1
1
  #!/usr/bin/env node
2
2
 
3
- import { parseArgs, readPrompt, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'
3
+ import { parseArgs, readPromptInput, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'
4
4
 
5
5
  const argv = parseArgs(process.argv.slice(2))
6
6
 
7
7
  try {
8
- const prompt = await readPrompt(argv['prompt-file'])
8
+ const { prompt } = await readPromptInput(argv)
9
9
  const { parsed, failure } = await runClaudeWithRetry({
10
10
  claudeCommand: argv['claude-command'] || 'claude',
11
11
  cwd: argv.cwd,