specweave 1.0.354 → 1.0.355

package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "specweave",
3
- "version": "1.0.354",
3
+ "version": "1.0.355",
4
4
  "description": "Spec-driven development framework for AI coding agents. Works with Claude Code, Codex, Antigravity, Cursor, Copilot & more. 100+ skills, 49 CLI commands, verified skill certification, autonomous execution, and living documentation.",
5
5
  "type": "module",
6
6
  "main": "dist/index.js",
@@ -316,6 +316,24 @@ Plan review MUST NOT block other agents. Review plans as they arrive — agents
316
316
 
317
317
  For very large features, the team lead MAY split work into multiple increments per domain for better tracking and independent closure. Decide this during initial analysis (Step 1), before spawning agents.
318
318
 
319
+ ### Task Cap Per Agent (CRITICAL — Context Overflow Prevention)
320
+
321
+ **Maximum 15 tasks per agent.** Agents with more tasks accumulate too much context in auto-mode, leading to extended thinking loops and stuck agents.
322
+
323
+ When distributing tasks from the master spec:
324
+ 1. Count tasks per domain
325
+ 2. If a domain has >15 tasks: **split into 2 agents** (e.g., `jira-agent-a`, `jira-agent-b`) with non-overlapping task ranges
326
+ 3. If splitting isn't natural, group tasks into phases and create 2 increments per domain
327
+
328
+ ```
329
+ Domain tasks analysis:
330
+ Frontend: 12 tasks -> 1 agent (OK)
331
+ Backend: 8 tasks -> 1 agent (OK)
332
+ JIRA: 23 tasks -> SPLIT into 2 agents (tasks 1-12, tasks 13-23)
333
+ ```
334
+
335
+ **Why**: Each auto-mode iteration adds context (spec reads, edits, test outputs). At 20+ tasks, accumulated context causes the model to enter extended thinking (30+ min) and effectively hang. The 15-task cap keeps agents within a safe context budget.
336
+
319
337
  ---
320
338
 
321
339
  ## 4. Agent Spawn Prompt Templates
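The **Task Cap Per Agent** rule added in the first SKILL.md hunk above can be sketched as a small planning function. This is a minimal illustrative sketch and not part of specweave. `planAgents`, `AgentAssignment` and the `<domain>-agent-<letter>` naming are hypothetical, and the naming simply follows the `jira-agent-a` example. The sketch assumes tasks are numbered 1..n. It also generalizes "split into 2 agents" to `ceil(n / 15)` agents, which is an assumption, so that a domain with more than 30 tasks still stays under the cap.

```typescript
// Hypothetical sketch of the 15-task cap; not a specweave API.
const MAX_TASKS_PER_AGENT = 15;

interface AgentAssignment {
  agentName: string; // e.g. "jira-agent-a"
  firstTask: number; // 1-based, inclusive
  lastTask: number; // 1-based, inclusive
}

function planAgents(domain: string, taskCount: number): AgentAssignment[] {
  if (taskCount <= MAX_TASKS_PER_AGENT) {
    return [{ agentName: `${domain}-agent`, firstTask: 1, lastTask: taskCount }];
  }
  // Assumption: use as many agents as the cap requires (2 for 16..30 tasks).
  const agentCount = Math.ceil(taskCount / MAX_TASKS_PER_AGENT);
  const base = Math.floor(taskCount / agentCount);
  const remainder = taskCount % agentCount; // the first `remainder` agents take one extra task
  const plan: AgentAssignment[] = [];
  let next = 1;
  for (let i = 0; i < agentCount; i++) {
    const size = base + (i < remainder ? 1 : 0);
    plan.push({
      agentName: `${domain}-agent-${String.fromCharCode(97 + i)}`, // a, b, c, ...
      firstTask: next,
      lastTask: next + size - 1, // contiguous, non-overlapping ranges
    });
    next += size;
  }
  return plan;
}

// The domain analysis from the example: frontend 12, backend 8, JIRA 23.
for (const [domain, count] of [["frontend", 12], ["backend", 8], ["jira", 23]] as const) {
  const ranges = planAgents(domain, count).map((a) => `${a.agentName}: tasks ${a.firstTask}-${a.lastTask}`);
  console.log(ranges.join(", "));
}
```

With 23 JIRA tasks, this produces `jira-agent-a` for tasks 1-12 and `jira-agent-b` for tasks 13-23, which matches the split in the example. Splitting evenly, rather than filling the first agent to 15, keeps the per-agent context budgets close to each other.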
@@ -494,36 +512,38 @@ Task({
494
512
 
495
513
  ## 8. Quality Gates
496
514
 
497
- Every agent MUST run quality validation before signaling completion.
515
+ Quality gates are split: agents handle tests, and the team-lead handles closure (grill, done, judge-llm). Keeping closure out of the agents prevents the context overflow caused by loading 4+ additional skill definitions on top of an already-full auto-mode context.
498
516
 
499
- ### Per-Agent Quality Gate
517
+ ### Per-Agent Quality Gate (Lightweight)
500
518
 
501
519
  ```
502
520
  Agent Workflow:
503
- 1. Execute all assigned tasks (prefer /sw:auto for autonomous execution)
521
+ 1. Execute all assigned tasks via /sw:auto --simple
504
522
  2. Run all tests for owned code (unit + integration + E2E)
505
523
  3. Run linter/type-check for owned code
506
- 4. Run /sw:grill
507
- 5. If tests fail -> fix issues and repeat from step 2. Do NOT signal completion until all tests pass.
508
- 6. If /sw:grill passes -> attempt closure via /sw:done
509
- 7. If /sw:grill fails -> fix issues, repeat from step 2
510
- 8. Signal COMPLETION via SendMessage
524
+ 4. If tests fail -> fix issues and repeat from step 2
525
+ 5. Do NOT signal completion until all tests pass
526
+ 6. Signal COMPLETION via SendMessage (include task count, test results summary)
527
+ 7. Do NOT run /sw:grill or /sw:done; the team-lead handles closure centrally
511
528
  ```
512
529
 
513
- ### Orchestrator Quality Gate
530
+ **Why agents don't run /sw:done**: The /sw:done skill invokes 4 sub-skills (grill, judge-llm, sync-docs, qa), each loading a full SKILL.md. After 15+ tasks of auto-mode context, this pushes agents into extended thinking (30+ min hangs). Centralizing closure on the team-lead (which has a cleaner context) avoids this.
514
531
 
515
- After all agents complete, the orchestrator (team lead) runs a final validation:
532
+ ### Orchestrator Quality Gate (Centralized Closure)
533
+
534
+ After all agents complete, the team-lead runs closure **centrally** for each increment:
516
535
 
517
536
  ```
518
537
  Orchestrator Final Check:
519
538
  1. All agents signaled COMPLETION
520
539
  2. No unresolved BLOCKING_ISSUE messages
521
540
  3. Run full test suite (all domains combined)
522
- 4. Run /sw:grill on the combined increment
523
- 5. Run /sw:done --auto <id> for each increment in dependency order
524
- 6. If any /sw:done --auto fails, report the failure and continue with remaining increments
525
- 7. If all pass -> /sw:team-merge
526
- 8. If failures -> identify owning agent, send fix request via SendMessage
541
+ 4. For EACH increment in dependency order:
542
+ a. Run /sw:grill on the increment
543
+ b. Run /sw:done --auto <id>
544
+ c. If /sw:done fails, report the failure and continue with remaining increments
545
+ 5. If all pass -> /sw:team-merge
546
+ 6. If failures -> identify owning agent, send fix request via SendMessage
527
547
  ```
528
548
 
529
549
  ### Grill Checklist per Domain
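The **Orchestrator Quality Gate (Centralized Closure)** loop in the hunk above works as follows: it closes increments in dependency order, reports failures, and keeps going. The TypeScript below is a hedged sketch of that loop. `ClosureRunner.grill` and `ClosureRunner.done` are hypothetical stand-ins for invoking `/sw:grill` and `/sw:done --auto <id>`, and the `Increment` shape with `owner` and `dependsOn` is assumed. The source does not say what happens when grill fails, so this sketch treats a grill failure like a done failure, reports it, and skips done for that increment.

```typescript
// Hypothetical sketch; grill/done stand in for /sw:grill and /sw:done --auto <id>.
type StepResult = { ok: true } | { ok: false; reason: string };

interface ClosureRunner {
  grill(id: string): StepResult;
  done(id: string): StepResult;
}

interface Increment {
  id: string;
  owner: string; // agent that owns the increment, used for fix requests
  dependsOn: string[];
}

interface ClosureFailure {
  id: string;
  owner: string;
  reason: string;
}

// Depth-first topological sort: dependencies are closed before their dependents.
function dependencyOrder(increments: Increment[]): Increment[] {
  const byId = new Map<string, Increment>();
  for (const inc of increments) byId.set(inc.id, inc);
  const state = new Map<string, "visiting" | "done">();
  const ordered: Increment[] = [];
  const visit = (inc: Increment): void => {
    const s = state.get(inc.id);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`Dependency cycle at increment ${inc.id}`);
    state.set(inc.id, "visiting");
    for (const depId of inc.dependsOn) {
      const dep = byId.get(depId);
      if (dep) visit(dep); // dependencies outside this team run are ignored
    }
    state.set(inc.id, "done");
    ordered.push(inc);
  };
  for (const inc of increments) visit(inc);
  return ordered;
}

function closeAll(increments: Increment[], runner: ClosureRunner): ClosureFailure[] {
  const failures: ClosureFailure[] = [];
  for (const inc of dependencyOrder(increments)) {
    const grill = runner.grill(inc.id);
    // Assumption: a failed grill is reported like a failed done, and done is skipped.
    const result = grill.ok ? runner.done(inc.id) : grill;
    if (result.ok === false) {
      failures.push({ id: inc.id, owner: inc.owner, reason: result.reason }); // report, then continue
    }
  }
  return failures; // empty -> /sw:team-merge; otherwise send a fix request to each owner
}
```

Returning the failures with their `owner` covers the last step of the checklist: the team-lead can address each fix request to the owning agent through SendMessage.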
@@ -539,6 +559,36 @@ Orchestrator Final Check:
539
559
 
540
560
  ---
541
561
 
562
+ ## 8b. Agent Timeout and Stuck Detection
563
+
564
+ Agents can get stuck in extended thinking if their context overflows. The team-lead MUST monitor for stuck agents.
565
+
566
+ ### Timeout Rules
567
+
568
+ | Condition | Action |
569
+ |-----------|--------|
570
+ | Agent idle >20 min after last message | Send `STATUS_CHECK` message to agent |
571
+ | No response to STATUS_CHECK within 5 min | Declare agent stuck |
572
+ | Agent stuck | Log a warning, proceed with the other agents, and handle the stuck agent's increment manually during /sw:team-merge |
573
+ | All agents stuck | STOP team, report to user |
574
+
575
+ ### Stuck Agent Recovery
576
+
577
+ When an agent is declared stuck:
578
+ 1. Do NOT wait for it — proceed with closure of other agents' increments
579
+ 2. Note the stuck agent's increment ID and last known task progress
580
+ 3. During /sw:team-merge, the stuck agent's increment is left open for manual completion
581
+ 4. Send a `shutdown_request` to the stuck agent to free its resources
582
+
583
+ ### Preventing Stuck Agents
584
+
585
+ - Enforce the 15-task cap (Section 3b)
586
+ - Agents use `--simple` flag in auto-mode (reduces context per iteration)
587
+ - Agents do NOT run /sw:done (team-lead handles closure centrally)
588
+ - If an agent's task count exceeds 15 despite the cap, the team-lead should split it before spawning
589
+
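The **Timeout Rules** table in section 8b is a small state machine over two timers. Below is a minimal sketch of that logic as a pure function. It uses a hypothetical clock measured in minutes since the team started. `AgentWatch`, `evaluateAgent` and `evaluateTeam` are illustrative names and not a specweave API. The sketch also makes two assumptions. First, any message that arrives after the STATUS_CHECK counts as a response. Second, "all agents stuck" means every agent that has not yet signaled COMPLETION is stuck.

```typescript
// Thresholds from the Timeout Rules table, in minutes.
const IDLE_BEFORE_STATUS_CHECK_MIN = 20;
const STATUS_CHECK_GRACE_MIN = 5;

interface AgentWatch {
  name: string;
  completed: boolean; // agent already signaled COMPLETION
  lastMessageAt: number; // minutes since team start (hypothetical clock)
  statusCheckSentAt?: number; // set when the team-lead sent STATUS_CHECK
}

type WatchAction = "none" | "send_status_check" | "declare_stuck";

function evaluateAgent(agent: AgentWatch, now: number): WatchAction {
  const checkAt = agent.statusCheckSentAt;
  if (checkAt !== undefined && agent.lastMessageAt <= checkAt) {
    // No message since STATUS_CHECK: stuck once the grace period has passed.
    return now - checkAt > STATUS_CHECK_GRACE_MIN ? "declare_stuck" : "none";
  }
  // Otherwise measure idle time from the agent's last message.
  return now - agent.lastMessageAt > IDLE_BEFORE_STATUS_CHECK_MIN ? "send_status_check" : "none";
}

function evaluateTeam(agents: AgentWatch[], now: number) {
  const running = agents.filter((a) => !a.completed);
  const actions = running.map((a) => ({ name: a.name, action: evaluateAgent(a, now) }));
  // Assumption: "all agents stuck" = every agent still running is stuck -> STOP team.
  const stopTeam = actions.length > 0 && actions.every((a) => a.action === "declare_stuck");
  return { actions, stopTeam };
}
```

Because the function has no side effects, the team-lead's loop stays in charge of the actual steps. That loop would send `STATUS_CHECK` and record `statusCheckSentAt`, and for an agent declared stuck it would apply the Stuck Agent Recovery steps.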
590
+ ---
591
+
542
592
  ## 9. Workflow Summary
543
593
 
544
594
  ```
@@ -556,9 +606,10 @@ Orchestrator Final Check:
556
606
  │ │ └── Wait for CONTRACT_READY after approval
557
607
  │ └── Phase 2: Spawn backend + frontend + testing
558
608
  │ └── Receive PLAN_READY, review & approve via SendMessage
559
- ├── Step 5: Monitor progress via SendMessage
560
- ├── Step 6: Quality gates (each agent runs /sw:grill)
561
- └── Step 7: Merge and close (/sw:team-merge)
609
+ ├── Step 5: Monitor progress via SendMessage (timeout: 20min idle → STATUS_CHECK)
610
+ ├── Step 6: Agents signal COMPLETION (tests pass, no /sw:grill or /sw:done on agents)
611
+ ├── Step 7: Team-lead runs /sw:grill + /sw:done --auto per increment (centralized closure)
612
+ └── Step 8: Merge and close (/sw:team-merge)
562
613
  ```
563
614
 
564
615
  **IMPORTANT**: The intended entry point is: `/sw:increment` → `/sw:do` (detects 3+ domains) → `/sw:team-lead`.
@@ -596,6 +647,8 @@ To execute, run without --dry-run.
596
647
  | **Agent stuck on trust folder** | Agent spawned without `bypassPermissions` | ALWAYS use `mode: "bypassPermissions"` — NEVER `mode: "plan"`. Trust prompts require interactive input agents cannot provide |
597
648
  | **Agents editing same files** | Overlapping file ownership patterns | Review ownership map; reassign conflicting files to a single owner; use `--dry-run` to validate before launch |
598
649
  | **Token cost too high** | Too many agents or overly large prompts | Reduce `--max-agents`; use `--domains` to limit scope; split feature into smaller increments |
650
+ | **Agent stuck in extended thinking** | Too many tasks (>15) causing context overflow | Enforce 15-task cap per agent; split large domains into 2 agents; agents use `--simple` mode |
651
+ | **Agent hung on /sw:done** | Closure loads 4+ skill definitions into already-full context | Agents should NOT run /sw:done — team-lead handles closure centrally |
599
652
  | **Contract agent takes too long** | Large schema or complex type system | Set a timeout in the agent prompt; if stuck >15 min, check agent output and consider splitting the contract work |
600
653
  | **Phase 2 starts before Phase 1 finishes** | CONTRACT_READY not received yet | Ensure upstream agents send CONTRACT_READY via SendMessage before team-lead spawns downstream |
601
654
  | **Agent fails mid-task** | Build error, test failure, or dependency issue | Send message to agent to fix; restart the agent with `/sw:auto` on its increment |
@@ -42,13 +42,12 @@ WORKFLOW:
42
42
  content: "PLAN_READY: [increment path]. [summary of planned tasks and files].",
43
43
  summary: "Backend plan ready for review" })
44
44
  9. WAIT for "PLAN_APPROVED" message. If "PLAN_REJECTED", revise and re-submit.
45
- 10. Execute tasks autonomously: prefer /sw:auto for autonomous execution
45
+ 10. Execute tasks autonomously: /sw:auto --simple (minimal context mode to prevent context overflow)
46
46
  11. Generate or update OpenAPI spec if API routes change
47
47
  12. Run all tests for owned code (unit + integration): npm test
48
- 13. Run quality gate: /sw:grill
49
- 14. Do NOT signal completion until all tests pass
50
- 15. After auto completes, attempt closure via /sw:done
51
- 16. Signal completion via SendMessage to team-lead
48
+ 13. Do NOT signal completion until all tests pass
49
+ 14. Signal COMPLETION via SendMessage to team-lead with summary of tasks done and test results
50
+ 15. Do NOT run /sw:done or /sw:grill yourself — team-lead handles closure centrally
52
51
 
53
52
  RULES:
54
53
  - WRITE only to files you own (listed above)
@@ -33,13 +33,12 @@ WORKFLOW:
33
33
  8. WAIT for "PLAN_APPROVED" message. If "PLAN_REJECTED", revise and re-submit.
34
34
  9. Generate Prisma migration: npx prisma migrate dev --name <migration-name>
35
35
  10. Write seed data if needed
36
- 11. Execute tasks autonomously: prefer /sw:auto for autonomous execution
36
+ 11. Execute tasks autonomously: /sw:auto --simple (minimal context mode to prevent context overflow)
37
37
  12. Run all tests for owned code (migration, seed): npm test
38
- 13. Run quality gate: /sw:grill
39
- 14. Do NOT signal completion until all tests pass
40
- 15. Signal CONTRACT_READY with schema details via SendMessage to team-lead
41
- 16. After auto completes, attempt closure via /sw:done
42
- 17. Signal completion via SendMessage to team-lead
38
+ 13. Do NOT signal completion until all tests pass
39
+ 14. Signal CONTRACT_READY with schema details via SendMessage to team-lead
40
+ 15. Signal COMPLETION via SendMessage to team-lead with summary of tasks done and test results
41
+ 16. Do NOT run /sw:done or /sw:grill yourself — team-lead handles closure centrally
43
42
 
44
43
  RULES:
45
44
  - WRITE only to files you own (listed above)
@@ -44,12 +44,11 @@ WORKFLOW:
44
44
  content: "PLAN_READY: [increment path]. [summary of planned tasks and files].",
45
45
  summary: "Frontend plan ready for review" })
46
46
  9. WAIT for "PLAN_APPROVED" message. If "PLAN_REJECTED", revise and re-submit.
47
- 10. Execute tasks autonomously: prefer /sw:auto for autonomous execution
47
+ 10. Execute tasks autonomously: /sw:auto --simple (minimal context mode to prevent context overflow)
48
48
  11. Run all tests for owned code (unit + integration): npm test
49
- 12. Run quality gate: /sw:grill
50
- 13. Do NOT signal completion until all tests pass
51
- 14. After auto completes, attempt closure via /sw:done
52
- 15. Signal completion via SendMessage to team-lead
49
+ 12. Do NOT signal completion until all tests pass
50
+ 13. Signal COMPLETION via SendMessage to team-lead with summary of tasks done and test results
51
+ 14. Do NOT run /sw:done or /sw:grill yourself — team-lead handles closure centrally
53
52
 
54
53
  RULES:
55
54
  - WRITE only to files you own (listed above)
@@ -34,13 +34,12 @@ WORKFLOW:
34
34
  8. WAIT for "PLAN_APPROVED" message. If "PLAN_REJECTED", revise and re-submit.
35
35
  9. Implement auth/authz middleware if needed
36
36
  10. Add input validation and sanitization
37
- 11. Execute tasks autonomously: prefer /sw:auto for autonomous execution
37
+ 11. Execute tasks autonomously: /sw:auto --simple (minimal context mode to prevent context overflow)
38
38
  12. Run all tests for owned code (security tests): npm test
39
39
  13. Run security audit tools (npm audit, dependency check)
40
- 14. Run quality gate: /sw:grill
41
- 15. Do NOT signal completion until all tests pass
42
- 16. After auto completes, attempt closure via /sw:done
43
- 17. Signal completion with security findings summary via SendMessage to team-lead
40
+ 14. Do NOT signal completion until all tests pass
41
+ 15. Signal COMPLETION via SendMessage to team-lead with summary of tasks done, test results, and security findings
42
+ 16. Do NOT run /sw:done or /sw:grill yourself — team-lead handles closure centrally
44
43
 
45
44
  RULES:
46
45
  - WRITE only to files you own (listed above)
@@ -40,12 +40,11 @@ WORKFLOW:
40
40
  9. Write unit tests for new services/components
41
41
  10. Write integration tests for API endpoints
42
42
  11. Write E2E tests for user journeys
43
- 12. Execute tasks autonomously: prefer /sw:auto for autonomous execution
43
+ 12. Execute tasks autonomously: /sw:auto --simple (minimal context mode to prevent context overflow)
44
44
  13. Run all tests (unit + integration + E2E): npm test && npx playwright test
45
45
  14. Do NOT signal completion until all tests pass -- if tests fail, fix and repeat
46
- 15. Run quality gate: /sw:grill
47
- 16. After auto completes, attempt closure via /sw:done
48
- 17. Signal completion via SendMessage to team-lead
46
+ 15. Signal COMPLETION via SendMessage to team-lead with summary of tasks done and test results
47
+ 16. Do NOT run /sw:done or /sw:grill yourself — team-lead handles closure centrally
49
48
 
50
49
  RULES:
51
50
  - WRITE only to test files (listed above)
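All five agent templates above now end the same way. The agent signals COMPLETION "with summary of tasks done and test results" only after all tests pass, and the security agent adds its findings. The TypeScript below is a hedged sketch of how an agent might assemble that message. It builds only the `content` and `summary` fields that are visible in the templates' `SendMessage` calls. The other `SendMessage` fields are not shown in this diff and are left out. `buildCompletionMessage`, `CompletionReport` and the `COMPLETION: <path>.` wording are hypothetical, and the wording mirrors the `PLAN_READY: [increment path].` convention.

```typescript
interface CompletionReport {
  incrementPath: string;
  tasksDone: number;
  tasksTotal: number;
  testsPassed: number;
  testsFailed: number;
  securityFindings?: string[]; // only the security agent reports these
}

// Builds the content/summary fields used in the templates' SendMessage calls.
function buildCompletionMessage(agentLabel: string, r: CompletionReport): { content: string; summary: string } {
  if (r.testsFailed > 0) {
    // Mirrors "Do NOT signal completion until all tests pass".
    throw new Error(`${r.testsFailed} failing test(s): fix and re-run before signaling COMPLETION`);
  }
  const parts = [
    `COMPLETION: ${r.incrementPath}.`,
    `Tasks done: ${r.tasksDone}/${r.tasksTotal}.`,
    `Tests: ${r.testsPassed} passed, 0 failed.`,
  ];
  if (r.securityFindings && r.securityFindings.length > 0) {
    parts.push(`Security findings: ${r.securityFindings.join("; ")}.`);
  }
  return {
    content: parts.join(" "),
    summary: `${agentLabel} complete, ready for centralized closure`,
  };
}
```

Refusing to build the message while tests are failing makes the "do not signal early" rule hard to skip by accident. No closure step appears here, because /sw:grill and /sw:done now run on the team-lead.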
@@ -38,13 +38,17 @@
38
38
  <!-- SECTION:orchestration required -->
39
39
  ## Workflow Orchestration
40
40
 
41
- ### 1. Plan Before Code
41
+ ### 1. Plan Before Code (MANDATORY)
42
42
 
43
- BEFORE implementing ANY non-trivial task (3+ steps):
43
+ BEFORE implementing ANY task, create an increment FIRST:
44
44
  1. Create increment: spec.md (WHAT/WHY) + plan.md (HOW) + tasks.md (checklist)
45
45
  2. Get user approval before implementing
46
46
  3. If something goes sideways → STOP and re-plan
47
47
 
48
+ **No exceptions for "simple" tasks** — "simple", "quick", "basic" still require an increment. The only exception: user explicitly says "don't create an increment."
49
+
50
+ **Setup/config actions are NOT implementation** — "connect github", "setup sync", "import issues" → use the matching setup command directly, not the increment workflow.
51
+
48
52
  See **Task Format** and **User Story Format** sections for templates.
49
53
 
50
54
  ### 2. Verify Before Done
@@ -84,6 +88,7 @@ Good: npm run build → node script.js → Success
84
88
  - Avoid over-engineering and premature optimization
85
89
  - One function = one responsibility
86
90
  - If you can delete code and tests still pass, delete it
91
+ - **Match tooling to complexity** — simple tasks (calculator, todo) need 0 domain plugins and vanilla code. Don't load heavyweight frameworks, design systems, or i18n for trivial features
87
92
 
88
93
  ### No Laziness
89
94
  - Don't leave TODO comments for "later"
@@ -139,6 +144,8 @@ Good: npm run build → node script.js → Success
139
144
  | `/sw-github:sync 0001` | Sync increment to GitHub issue |
140
145
  | `/sw-jira:sync 0001` | Sync to Jira |
141
146
  | `/sw-ado:sync 0001` | Sync to Azure DevOps |
147
+ | `/sw:sync-setup` | Connect GitHub/Jira/ADO integration |
148
+ | `/sw:import` | Import issues from external tools |
142
149
  <!-- /SECTION -->
143
150
 
144
151
  ---
@@ -9,6 +9,10 @@
9
9
  |---|---|
10
10
  | **"RESTART REQUIRED"** | ALL tools blocked → STOP, wait for restart |
11
11
  | **"SKILL FIRST"** | Call shown skill FIRST → chain domain skills → implement |
12
+
13
+ **"SKILL FIRST" is mandatory** — "simple", "quick", "basic" are NOT opt-out phrases. The ONLY exception: user explicitly says "don't create an increment" or similar. Perceived simplicity never overrides hook instructions.
14
+
15
+ **Setup actions are NOT implementation** — "connect github", "setup sync", "import issues" → route to the matching setup skill (`sw:sync-setup`, `sw:import`, `sw:progress-sync`), NOT `/sw:increment`.
12
16
  <!-- /SECTION -->
13
17
 
14
18
  <!-- SECTION:header required -->
@@ -28,6 +32,12 @@
28
32
  2. **Implementation**: Invoke domain skill per tech (React → `frontend:architect`, .NET → `backend:dotnet`, Stripe → `payments:payment-core`, etc.)
29
33
  3. **Closure**: `sw:grill` runs automatically via `/sw:done`
30
34
 
35
+ **Complexity gate** — before chaining domain skills:
36
+ 1. **Tech stack specified?** → Chain ONLY the matching skill. If unspecified, ASK or default to minimal (vanilla JS/HTML, simple Express)
37
+ 2. **Complexity triage** → Simple (calculator, todo) = 0 domain plugins. Medium (auth, dashboard) = 1-2. Complex (SaaS) = full chain
38
+ 3. **Sanity check** → Would a senior engineer use this tool for this task? If obviously not, don't invoke it
39
+ 4. **Never** load all available plugins for a domain — pick ONE per domain based on the actual tech stack
40
+
31
41
  If auto-activation fails, invoke explicitly: `Skill({ skill: "name" })`
32
42
  <!-- /SECTION -->
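The **Complexity gate** added above reduces to three mechanical rules. Pick plugins only for the stated stack. Cap the number of plugins by complexity tier. Never pick more than one plugin per domain. The sketch below encodes those rules in TypeScript under stated assumptions. `selectDomainSkills` and `SkillCandidate` are hypothetical, and the candidate list is supplied by the caller. The skill names come from the examples in the Implementation step. Two things are left out of the code. When the stack is unspecified, the sketch picks the "default to minimal" option and does not ASK. The "would a senior engineer use this" sanity check remains a human judgment and is not modeled.

```typescript
type Complexity = "simple" | "medium" | "complex";

interface SkillCandidate {
  domain: string; // e.g. "frontend"
  tech: string; // technology the skill targets, e.g. "React"
  skill: string; // e.g. "frontend:architect"
}

// Hypothetical encoding of the complexity gate; the sanity check stays a human judgment.
function selectDomainSkills(
  complexity: Complexity,
  techStack: string[] | undefined, // undefined = user did not specify a stack
  candidates: SkillCandidate[],
): string[] {
  // Simple tasks and unspecified stacks get no domain plugins (vanilla/minimal default).
  if (complexity === "simple" || techStack === undefined || techStack.length === 0) return [];
  const wanted = new Set(techStack.map((t) => t.toLowerCase()));
  const onePerDomain = new Map<string, string>();
  for (const c of candidates) {
    // Chain ONLY skills matching the stated stack, and never more than one per domain.
    if (wanted.has(c.tech.toLowerCase()) && !onePerDomain.has(c.domain)) {
      onePerDomain.set(c.domain, c.skill);
    }
  }
  const chosen = Array.from(onePerDomain.values());
  return complexity === "medium" ? chosen.slice(0, 2) : chosen; // medium = 1-2, complex = full chain
}
```

For a calculator, the result is an empty list whatever the stack. For a medium task described as React + .NET + Stripe, the cap keeps only the first two matching skills.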
33
43
 
@@ -55,6 +65,10 @@ SpecWeave auto-detects product descriptions and routes to `/sw:increment`:
55
65
  **Signals** (5+ = auto-route): Project name | Features list (3+) | Tech stack | Timeline/MVP | Problem statement | Business model
56
66
 
57
67
  **Opt-out phrases**: "Just brainstorm first" | "Don't plan yet" | "Quick discussion" | "Let's explore ideas"
68
+
69
+ **NOT opt-out phrases**: "simple" | "quick" | "basic" | "small" — these still require `/sw:increment`
70
+
71
+ **Setup/config requests bypass auto-detection** → route directly to the matching skill (e.g., `sw:sync-setup`, `sw:import`)
58
72
  <!-- /SECTION -->
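The routing rules added across these CLAUDE.md hunks fit into one ordered decision:

1. Setup/config requests go straight to their skill.
2. An explicit "don't create an increment" skips the increment.
3. The brainstorm phrases are opt-outs.
4. Everything else goes to `/sw:increment`, including requests described as "simple", "quick", "basic" or "small".

The sketch below encodes that order in TypeScript. It is deliberately naive: `routeRequest` is hypothetical and uses case-insensitive substring matching on the phrases quoted in the source, not real intent detection. It also does not model the "5+ signals" product-description threshold.

```typescript
type Route =
  | { kind: "skill"; name: string } // setup/config: bypass the increment workflow
  | { kind: "discuss" } // brainstorm opt-out phrase
  | { kind: "no-increment" } // user explicitly declined an increment
  | { kind: "increment" }; // default for implementation requests

// Phrase -> skill pairs taken from the examples above (not an exhaustive list).
const SETUP_ROUTES: ReadonlyArray<readonly [string, string]> = [
  ["connect github", "sw:sync-setup"],
  ["setup sync", "sw:sync-setup"],
  ["import issues", "sw:import"],
];
const BRAINSTORM_OPT_OUTS = ["just brainstorm first", "don't plan yet", "quick discussion", "let's explore ideas"];

function routeRequest(request: string): Route {
  const text = request.toLowerCase();
  for (const [phrase, skill] of SETUP_ROUTES) {
    if (text.includes(phrase)) return { kind: "skill", name: skill };
  }
  if (text.includes("don't create an increment")) return { kind: "no-increment" };
  if (BRAINSTORM_OPT_OUTS.some((p) => text.includes(p))) return { kind: "discuss" };
  // "simple", "quick", "basic", "small" are intentionally NOT checked: they still need an increment.
  return { kind: "increment" };
}
```

In this sketch, "Build a simple calculator" routes to `increment`, "Connect GitHub to this repo" routes to `sw:sync-setup`, and "Quick discussion about caching" is treated as a brainstorm opt-out.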
59
73
 
60
74
  <!-- SECTION:metarule required -->
@@ -123,6 +137,8 @@ Good: npm run build → node script.js → Success
123
137
  | `/sw:done` | Close |
124
138
  | `/sw:progress-sync` | Sync progress to all external tools |
125
139
  | `/sw-github:push` | Push progress to GitHub |
140
+ | `/sw:sync-setup` | Connect GitHub/Jira/ADO integration |
141
+ | `/sw:import` | Import issues from external tools |
126
142
 
127
143
  **Natural language**: "Let's build X" → `/sw:increment` | "What's status?" → `/sw:progress` | "We're done" → `/sw:done` | "Ship while sleeping" → `/sw:auto`
128
144
 
@@ -251,7 +267,7 @@ Plugins load automatically. Manual: `vskill install --repo anton-abyzov/vskill -
251
267
  <!-- SECTION:principles -->
252
268
  ## Principles
253
269
 
254
- 1. **Spec-first**: `/sw:increment` before coding
270
+ 1. **Spec-first**: `/sw:increment` before coding — mandatory for ALL implementation requests, no exceptions unless user explicitly opts out
255
271
  2. **Docs = truth**: Specs guide implementation
256
272
  3. **Simplicity First**: Minimal code, minimal impact
257
273
  4. **No Laziness**: Root causes, senior standards