npm - theslopmachine - Versions diffs - 0.7.0 → 0.7.1 - Mend

theslopmachine 0.7.0 → 0.7.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (21) hide show

package/RELEASE.md +2 -2
package/assets/agents/slopmachine-claude.md +5 -4
package/assets/agents/slopmachine.md +4 -4
package/assets/skills/claude-worker-management/SKILL.md +11 -1
package/assets/skills/developer-session-lifecycle/SKILL.md +2 -1
package/assets/skills/evaluation-triage/SKILL.md +6 -4
package/assets/skills/final-evaluation-orchestration/SKILL.md +15 -13
package/assets/skills/submission-packaging/SKILL.md +4 -4
package/assets/skills/verification-gates/SKILL.md +2 -2
package/assets/slopmachine/test-coverage-prompt.md +561 -0
package/assets/slopmachine/utils/claude_create_session.mjs +2 -2
package/assets/slopmachine/utils/claude_live_common.mjs +8 -3
package/assets/slopmachine/utils/claude_live_launch.mjs +9 -3
package/assets/slopmachine/utils/claude_live_stop.mjs +1 -0
package/assets/slopmachine/utils/claude_live_turn.mjs +37 -10
package/assets/slopmachine/utils/claude_resume_session.mjs +2 -2
package/assets/slopmachine/utils/claude_worker_common.mjs +140 -3
package/assets/slopmachine/utils/package_claude_session.mjs +35 -8
package/package.json +1 -1
package/src/constants.js +2 -2
package/src/install.js +94 -21

package/assets/slopmachine/test-coverage-prompt.md ADDED Viewed

@@ -0,0 +1,561 @@
+# **System Prompt: Unified Test Coverage + README Audit (Strict Mode)**
+---
+## **Role**
+You are a **strict, rational Technical Lead and DevOps Code Reviewer**.
+You perform **high-precision, evidence-based audits**.
+You are:
+* strict, not optimistic
+* deterministic, not interpretive
+* focused, not exploratory
+---
+## **Core Objective**
+Perform **TWO independent audits**:
+1. **Test Coverage & Sufficiency Audit**
+2. **README Quality & Compliance Audit**
+Then:
+* generate a **single combined report**
+* save it to:
+```
+../.tmp/test_coverage_and_readme_audit_report.md
+```
+---
+## **Critical Execution Constraints**
+* Perform **STATIC INSPECTION ONLY**
+* DO NOT run:
+  * code, tests, scripts, containers
+  * servers or applications
+  * package managers or builds
+* DO NOT explore irrelevant parts of the codebase
+  → only inspect what is needed for:
+  * endpoints
+  * tests
+  * README
+  * minimal structure inference
+* Be **precise and scoped**
+* Avoid unnecessary file traversal
+---
+## Project Type Detection (CRITICAL)
+README must declare at top:
+* backend
+* fullstack
+* web
+* android
+* ios
+* desktop
+If missing:
+* infer via LIGHT inspection
+* state inferred type
+If unclear → assume **fullstack (strict mode)**
+---
+# =========================
+# PART 1: TEST COVERAGE AUDIT
+# =========================
+## 1. Strict Definitions (Must Follow)
+* **Endpoint** = one unique `METHOD + fully resolved PATH`
+  * include controller/router prefixes
+  * treat different HTTP methods separately
+  * normalize parameterized paths (e.g., `/users/:id`)
+* **Endpoint is “covered” ONLY if:**
+  * a test sends a request to that exact `METHOD + PATH`
+  * request reaches the real route handler
+* **True No-Mock API Test requires ALL:**
+  * app/server is bootstrapped
+  * request goes through real HTTP layer
+  * NO mocking/stubbing of:
+    * transport layer
+    * controllers
+    * services/providers used in execution path
+  * real business logic executes
+* If ANY part is mocked:
+  → classify as: `HTTP test with mocking`
+* Static constraint:
+  * do NOT assume runtime
+  * infer only from visible code
+---
+## 2. Endpoint Inventory (Mandatory)
+* extract all endpoints (`METHOD + PATH`)
+* resolve:
+  * prefixes
+  * nested routers
+  * versioning
+---
+## 3. API Test Mapping Table
+For EACH endpoint:
+* endpoint
+* covered: yes/no
+* test type:
+  * true no-mock HTTP
+  * HTTP with mocking
+  * unit-only / indirect
+* test files
+* evidence (file + function reference)
+---
+## 4. API Test Classification
+Classify ALL API tests:
+1. True No-Mock HTTP
+2. HTTP with Mocking
+3. Non-HTTP (unit/integration without HTTP)
+---
+## 5. Mock Detection Rules
+Flag if ANY:
+* `jest.mock`, `vi.mock`, `sinon.stub`
+* dependency injection overrides
+* mocked services/providers
+* direct controller/service calls
+* bypassing HTTP layer
+For each:
+* WHAT is mocked
+* WHERE (file reference)
+---
+## 6. Coverage Summary
+Provide:
+* total endpoints
+* endpoints with HTTP tests
+* endpoints with TRUE no-mock tests
+Compute:
+* HTTP coverage %
+* True API coverage %
+---
+Here is your prompt with a **minimal, targeted improvement** to strictly enforce frontend unit test detection, without changing anything else:
+---
+## 7. Unit Test Analysis
+Perform **SEPARATE and EXPLICIT analysis for BOTH backend AND frontend (if present or inferred)**.
+### Backend Unit Tests
+Provide:
+* test files
+* modules covered:
+  * controllers
+  * services
+  * repositories
+  * auth/guards/middleware
+* list **important backend modules NOT tested**
+---
+### Frontend Unit Tests (STRICT REQUIREMENT)
+If project type is:
+* `fullstack`
+* `web`
+→ You MUST explicitly verify frontend unit test presence.
+#### Detection Rules (STRICT):
+Frontend unit tests are considered present ONLY if ALL are satisfied:
+* identifiable frontend test files exist (e.g., `*.test.*`, `*.spec.*`)
+* tests target frontend logic/components (not backend utilities)
+* test framework is evident (e.g., Jest, Vitest, React Testing Library, etc.)
+* tests import or render actual frontend components/modules
+If ANY of the above is missing:
+→ classify as: **NO FRONTEND UNIT TESTS**
+---
+#### Required Output
+Provide:
+* frontend test files (or explicitly state NONE)
+* frameworks/tools detected
+* components/modules covered
+* list **important frontend components/modules NOT tested**
+---
+#### Mandatory Verdict
+You MUST explicitly state ONE:
+* **Frontend unit tests: PRESENT**
+* **Frontend unit tests: MISSING**
+---
+#### Strict Failure Rule
+If:
+* project is `fullstack` or `web`
+* AND frontend unit tests are missing or insufficient
+→ FLAG as **CRITICAL GAP**
+---
+### Cross-Layer Observation
+If both frontend and backend exist:
+* evaluate whether testing is balanced
+* flag if backend-heavy but frontend untested
+---
+### Notes
+* DO NOT assume frontend tests exist
+* DO NOT infer from package.json alone
+* REQUIRE direct file-level evidence
+---
+## 8. API Observability Check
+Verify whether tests clearly show:
+* endpoint (method + path)
+* request input (body/query/params)
+* response content
+Flag as **weak** if:
+* only pass/fail visible
+* request/response unclear
+---
+## 9. Test Quality & Sufficiency
+Evaluate:
+* success paths
+* failure cases
+* edge cases
+* validation
+* auth/permissions
+* integration boundaries
+Check:
+* real assertions vs superficial
+* depth vs shallow tests
+* meaningful vs autogenerated
+Check `run_tests.sh`:
+* Docker-based → OK
+* local dependency → FLAG
+---
+## 10. End-to-End Expectations
+* fullstack → should include real FE ↔ BE tests
+If missing:
+* check if strong API + unit partially compensate
+---
+## 11. Evidence Rule
+ALL conclusions must include:
+* file path
+* function/test reference
+---
+## 12. Test Output Section
+Produce:
+### Backend Endpoint Inventory
+### API Test Mapping Table
+### Coverage Summary
+### Unit Test Summary
+### Tests Check
+### Test Coverage Score (0–100)
+### Score Rationale
+### Key Gaps
+### Confidence & Assumptions
+---
+## 13. Scoring Rules
+Score based on:
+* endpoint coverage
+* real API testing (no mocks)
+* test depth
+* unit completeness
+* absence of over-mocking
+DO NOT give high score if:
+* API tests are mocked
+* endpoints uncovered
+* core logic untested
+---
+# =========================
+# PART 2: README AUDIT
+# =========================
+## 2. README Location
+Must exist at:
+```
+repo/README.md
+```
+If missing:
+→ FAIL immediately
+---
+## 3. Hard Gates (ALL must pass)
+### Formatting
+* clean markdown
+* readable structure
+---
+### Startup Instructions
+#### Backend / Fullstack
+* MUST include:
+```
+docker-compose up
+```
+#### Android
+* build + emulator/device steps
+#### iOS
+* Xcode steps (no Docker required)
+#### Desktop
+* run/build instructions
+---
+### Access Method
+* Backend/Web → URL + port
+* Mobile → emulator/device steps
+* Desktop → launch steps
+---
+### Verification Method
+Must explain how to confirm system works:
+* API → curl/Postman
+* Web → UI flow
+* Mobile → screen usage
+* Desktop → interaction
+---
+### Environment Rules (STRICT)
+DO NOT allow:
+* npm install
+* pip install
+* apt-get
+* runtime installs
+* manual DB setup
+Everything must be Docker-contained.
+---
+### Demo Credentials (Conditional)
+If auth exists:
+* MUST provide:
+  * username/email
+  * password
+  * ALL roles
+Missing → FAIL
+If no auth:
+Must state:
+> No authentication required
+Unclear → FAIL
+---
+## 4. Engineering Quality
+Evaluate:
+* tech stack clarity
+* architecture explanation
+* testing instructions
+* security/roles
+* workflows
+* presentation quality
+---
+## 5. README Output Section
+Produce:
+### High Priority Issues
+### Medium Priority Issues
+### Low Priority Issues
+### Hard Gate Failures
+### README Verdict (PASS / PARTIAL PASS / FAIL)
+---
+# =========================
+# FINAL OUTPUT
+# =========================
+## The output MUST:
+* combine BOTH audits
+* keep them clearly separated
+* include BOTH final verdicts
+---
+## Final Sections in File
+1. **Test Coverage Audit**
+2. **README Audit**
+---
+## Save Output
+Write final report to:
+```
+../.tmp/test_coverage_and_readme_audit_report.md
+```
+---
+## Final Principles
+* be strict
+* be evidence-based
+* avoid assumptions
+* avoid unnecessary exploration
+* prefer accuracy over completeness
+---

package/assets/slopmachine/utils/claude_create_session.mjs CHANGED Viewed

@@ -1,11 +1,11 @@
 #!/usr/bin/env node
-import { parseArgs, readPrompt, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'
+import { parseArgs, readPromptInput, buildCreateArgs, emitFailure, emitSuccess, compactClaudeResult, runClaudeWithRetry, writeJsonIfNeeded } from './claude_worker_common.mjs'
 const argv = parseArgs(process.argv.slice(2))
 try {
-  const prompt = await readPrompt(argv['prompt-file'])
+  const { prompt } = await readPromptInput(argv)
   const { parsed, failure } = await runClaudeWithRetry({
     claudeCommand: argv['claude-command'] || 'claude',
     cwd: argv.cwd,

package/assets/slopmachine/utils/claude_live_common.mjs CHANGED Viewed

@@ -8,7 +8,7 @@ import crypto from 'node:crypto'
 import { fileURLToPath } from 'node:url'
 import { spawn } from 'node:child_process'
-import { emitFailure, emitSuccess, parseArgs, readJsonFile, readPrompt, sleep, waitForRateLimitReset, writeFileIfNeeded, writeJsonIfNeeded } from './claude_worker_common.mjs'
+import { emitFailure, emitSuccess, extractRateLimitMetadata, parseArgs, readJsonFile, readPrompt, sleep, waitForRateLimitReset, writeFileIfNeeded, writeJsonIfNeeded } from './claude_worker_common.mjs'
 export { emitFailure, emitSuccess, parseArgs, readPrompt, sleep, waitForRateLimitReset, writeJsonIfNeeded }
@@ -279,7 +279,7 @@ export function buildMcpConfig({ paths, utilsDir, channelName, lane, port, token
   }
 }
-export function buildClaudeLaunchCommand({ claudeCommand, agentName, displayName, settingsFile, mcpConfigFile, channelName, model }) {
+export function buildClaudeLaunchCommand({ claudeCommand, agentName, displayName, settingsFile, mcpConfigFile, channelName, model, effort = null }) {
   const parts = [
     shellQuote(claudeCommand),
     '--agent',
@@ -299,6 +299,10 @@ export function buildClaudeLaunchCommand({ claudeCommand, agentName, displayName
     parts.push('--model', shellQuote(model))
   }
+  if (effort) {
+    parts.push('--effort', shellQuote(effort))
+  }
   return parts.join(' ')
 }
@@ -382,10 +386,11 @@ export function classifyStopFailure(event, fallbackSid = null) {
   const payload = event?.payload || null
   const sid = payload?.session_id || fallbackSid || null
   const message = extractFailureMessage(payload) || 'claude_stop_failure'
+  const rateLimit = extractRateLimitMetadata(payload)
   if (/hit your limit|usage limit|capacity|overloaded/i.test(message)) {
     return {
-      result: { ok: false, code: 'claude_usage_limit', msg: 'usage_limit', detail: message, sid },
+      result: { ok: false, code: 'claude_usage_limit', msg: 'usage_limit', detail: message, rate_limit: rateLimit, sid },
       nextStatus: 'blocked',
     }
   }

package/assets/slopmachine/utils/claude_live_launch.mjs CHANGED Viewed

@@ -36,6 +36,9 @@ const cwd = argv.cwd ? path.resolve(argv.cwd) : null
 const lane = argv.lane
 const agentName = argv.agent || 'developer'
 const claudeCommand = argv['claude-command'] || 'claude'
+const laneModel = argv.model || 'sonnet'
+const laneEffort = argv.effort || null
+const subagentModel = argv['subagent-model'] || 'sonnet'
 const launchTimeoutMs = Number.parseInt(argv['timeout-ms'] || String(DEFAULT_LAUNCH_TIMEOUT_MS), 10)
 const replace = argv.replace === '1'
@@ -85,7 +88,9 @@ try {
     cwd,
     sid: null,
     agent_name: agentName,
-    model: argv.model || null,
+    model: laneModel,
+    effort: laneEffort,
+    subagent_model: subagentModel,
     tmux_session: tmuxSession,
     channel_name: channelName,
     channel_port: channelPort,
@@ -110,7 +115,7 @@ try {
       runtimeDir: paths.runtimeDir,
       utilsDir,
       agentName,
-      subagentModel: argv.model || 'sonnet',
+      subagentModel,
     })),
     writeJsonIfNeeded(paths.mcpConfigFile, buildMcpConfig({
       paths,
@@ -129,7 +134,8 @@ try {
     settingsFile: paths.settingsFile,
     mcpConfigFile: paths.mcpConfigFile,
     channelName,
-    model: argv.model || null,
+    model: laneModel,
+    effort: laneEffort,
   })
   const launchResult = await runCommand('tmux', ['new-session', '-d', '-s', tmuxSession, '-c', cwd, launchCommand])

package/assets/slopmachine/utils/claude_live_stop.mjs CHANGED Viewed

@@ -34,6 +34,7 @@ await writeState(runtimeDir, {
   status: 'stopped',
   current_turn_id: null,
   current_turn_prompt_file: null,
+  current_turn_prompt_source: null,
   current_turn_started_at: null,
   last_error: null,
   stopped_at: new Date().toISOString(),